Crate combine

This crate contains parser combinators, roughly based on the Haskell libraries parsec and attoparsec.

A parser in this library can be described as a function which takes some input and, if it is successful, returns a value together with the remaining input. A parser combinator is a function which takes one or more parsers and returns a new parser. For instance, the many parser can be used to convert a parser for single digits into one that parses multiple digits. By modeling parsers in this way, it becomes easy to compose complex parsers in an almost declarative way.
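
To make the description above concrete, here is a minimal sketch in plain Rust of a parser as a function and of many as a combinator. It is a conceptual model only: the ParseOutcome alias and the hand-written digit and many functions are assumptions made for illustration, not combine's actual types or implementations, which are trait based and report errors instead of returning None.

```rust
// Conceptual model only: these are NOT combine's real types.
// A parser takes input and, on success, returns a value plus the remaining input.
type ParseOutcome<'a, T> = Option<(T, &'a str)>;

/// Parses a single ASCII digit from the front of `input`.
fn digit(input: &str) -> ParseOutcome<'_, char> {
    let c = input.chars().next()?;
    if c.is_ascii_digit() {
        Some((c, &input[c.len_utf8()..]))
    } else {
        None
    }
}

/// A combinator: takes a parser and returns a new parser that applies it
/// zero or more times, collecting the values it produced.
fn many<'a, T>(
    parser: impl Fn(&'a str) -> ParseOutcome<'a, T>,
) -> impl Fn(&'a str) -> ParseOutcome<'a, Vec<T>> {
    move |mut input: &'a str| {
        let mut values = Vec::new();
        // Stop at the first failure and hand back whatever input is left.
        while let Some((value, rest)) = parser(input) {
            values.push(value);
            input = rest;
        }
        Some((values, input))
    }
}
```

With these definitions, many(digit) applied to "123abc" yields the digits '1', '2' and '3' together with the remaining input "abc", and it succeeds with an empty collection when the input does not start with a digit.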

§Overview

combine limits itself to creating LL(1) parsers (it is possible to opt in to LL(k) parsing using the attempt combinator), which makes the parsers easy to reason about in both function and performance while sacrificing some generality. Besides making the parsers you construct easier to reason about, combine also uses the knowledge that it is an LL parser to automatically construct good error messages.

extern crate combine;
use combine::{Parser, EasyParser};
use combine::stream::position;
use combine::parser::char::{digit, letter};
const MSG: &'static str = r#"Parse error at line: 1, column: 1
Unexpected `|`
Expected digit or letter
"#;

fn main() {
    // Wrapping a `&str` with `position::Stream` provides automatic line and column tracking. If
    // `position::Stream` was not used the positions would instead only be pointers into the `&str`
    if let Err(err) = digit().or(letter()).easy_parse(position::Stream::new("|")) {
        assert_eq!(MSG, format!("{}", err));
    }
}
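
The LL(1) restriction and the role of attempt described in the overview can be illustrated with a small model in plain Rust. This is a sketch under stated assumptions: Outcome, literal, or and attempt below are hypothetical stand-ins written for this example and do not reproduce combine's real types or signatures. The model only captures the rule that an alternative is tried solely when the previous parser failed without consuming input.

```rust
// Conceptual model only; the names below are hypothetical, not combine's types.
#[derive(Debug, PartialEq)]
enum Outcome<'a, T> {
    /// Success: the value and the remaining input.
    Ok(T, &'a str),
    /// Failed before consuming any input, so an alternative may still be tried.
    PeekErr,
    /// Failed after consuming input: the parser is committed and gives up.
    CommitErr,
}

/// Matches `word` one character at a time, committing once the first character matched.
fn literal<'a>(word: &'static str) -> impl Fn(&'a str) -> Outcome<'a, &'static str> {
    move |input: &'a str| {
        let mut rest = input.chars();
        for (i, expected) in word.chars().enumerate() {
            if rest.next() != Some(expected) {
                return if i == 0 { Outcome::PeekErr } else { Outcome::CommitErr };
            }
        }
        Outcome::Ok(word, rest.as_str())
    }
}

/// LL(1) choice: `second` runs only if `first` failed without consuming input.
fn or<'a, T>(
    first: impl Fn(&'a str) -> Outcome<'a, T>,
    second: impl Fn(&'a str) -> Outcome<'a, T>,
) -> impl Fn(&'a str) -> Outcome<'a, T> {
    move |input: &'a str| match first(input) {
        Outcome::PeekErr => second(input),
        other => other,
    }
}

/// Opt in to backtracking: a committed failure is reported as an uncommitted one.
fn attempt<'a, T>(
    parser: impl Fn(&'a str) -> Outcome<'a, T>,
) -> impl Fn(&'a str) -> Outcome<'a, T> {
    move |input: &'a str| match parser(input) {
        Outcome::CommitErr => Outcome::PeekErr,
        other => other,
    }
}
```

In this model, or(literal("let"), literal("lex")) fails on the input "lex" because the first alternative has already consumed "le" when it hits the mismatch. Wrapping the first alternative as attempt(literal("let")) turns that failure into an uncommitted one, so the second alternative gets to run from the start of the input and succeeds.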

This library is currently split into a few core modules:

  • parser is where you will find all the parsers that combine provides. It contains the core Parser trait as well as several submodules such as sequence or choice which each contain several parsers aimed at a specific niche.

  • stream contains the second most important trait next to Parser. Streams represent the data source which is being parsed such as &[u8], &str or iterators.

  • easy contains combine’s default “easy” error and stream handling. If you use the easy_parse method to start your parsing these are the types that are used.

  • error contains the types and traits that make up combine’s error handling. Unless you need to customize the errors your parsers return you should not need to use this module much.

§Examples

extern crate combine;
use combine::parser::char::{spaces, digit, char};
use combine::{many1, sep_by, Parser, EasyParser};
use combine::stream::easy;

fn main() {
    //Parse spaces first and use the with method to only keep the result of the next parser
    let integer = spaces()
        //parse a string of digits into an i32
        .with(many1(digit()).map(|string: String| string.parse::<i32>().unwrap()));

    //Parse integers separated by commas, skipping whitespace
    let mut integer_list = sep_by(integer, spaces().skip(char(',')));

    //Call parse with the input to execute the parser
    let input = "1234, 45,78";
    let result: Result<(Vec<i32>, &str), easy::ParseError<&str>> =
        integer_list.easy_parse(input);
    match result {
        Ok((value, _remaining_input)) => println!("{:?}", value),
        Err(err) => println!("{}", err)
    }
}

If we need a parser that is mutually recursive or if we want to export a reusable parser the parser! macro can be used. In effect it makes it possible to return a parser without naming the type of the parser (which can be very large due to combine’s trait based approach). While it is possible to avoid naming the type without the macro, those solutions require either allocation (Box<dyn Parser<Input, Output = O, PartialState = P>>) or impl Trait in the return position. The macro thus threads the needle and makes it possible to have non-allocating, anonymous parsers on stable Rust.

#[macro_use]
extern crate combine;
use combine::parser::char::{char, letter, spaces};
use combine::{between, choice, many1, parser, sep_by, Parser, EasyParser};
use combine::error::{ParseError, StdParseResult};
use combine::stream::{Stream, Positioned};
use combine::stream::position;

#[derive(Debug, PartialEq)]
pub enum Expr {
    Id(String),
    Array(Vec<Expr>),
    Pair(Box<Expr>, Box<Expr>)
}

// `impl Parser` can be used to create reusable parsers with zero overhead
fn expr_<Input>() -> impl Parser< Input, Output = Expr>
    where Input: Stream<Token = char>,
{
    let word = many1(letter());

    // A parser which skips past whitespace.
    // Since we aren't interested in knowing that our expression parser
    // could have accepted additional whitespace between the tokens we also silence the error.
    let skip_spaces = || spaces().silent();

    //Creates a parser which parses a char and skips any trailing whitespace
    let lex_char = |c| char(c).skip(skip_spaces());

    let comma_list = sep_by(expr(), lex_char(','));
    let array = between(lex_char('['), lex_char(']'), comma_list);

    //We can use tuples to run several parsers in sequence
    //The resulting type is a tuple containing each parsers output
    let pair = (lex_char('('),
                expr(),
                lex_char(','),
                expr(),
                lex_char(')'))
                   .map(|t| Expr::Pair(Box::new(t.1), Box::new(t.3)));

    choice((
        word.map(Expr::Id),
        array.map(Expr::Array),
        pair,
    ))
        .skip(skip_spaces())
}

// As this expression parser needs to be able to call itself recursively `impl Parser` can't
// be used on its own as that would cause an infinitely large type. We can avoid this by using
// the `parser!` macro which erases the inner type and the size of that type entirely which
// lets it be used recursively.
//
// (This macro does not use `impl Trait` which means it can be used in rust < 1.26 as well to
// emulate `impl Parser`)
parser!{
    fn expr[Input]()(Input) -> Expr
    where [Input: Stream<Token = char>]
    {
        expr_()
    }
}

fn main() {
    let result = expr()
        .parse("[[], (hello, world), [rust]]");
    let expr = Expr::Array(vec![
          Expr::Array(Vec::new())
        , Expr::Pair(Box::new(Expr::Id("hello".to_string())),
                     Box::new(Expr::Id("world".to_string())))
        , Expr::Array(vec![Expr::Id("rust".to_string())])
    ]);
    assert_eq!(result, Ok((expr, "")));
}

Modules§

easy (feature: std)
Stream wrapper which provides an informative and easy to use error type.
error
Error types and traits which define what kind of errors combine parsers may emit
future_ext
parser
A collection of both concrete parsers as well as parser combinators.
stream
Streams are similar to the Iterator trait in that they represent some sequential set of items which can be retrieved one by one. Where Streams differ is that they are allowed to return errors instead of just None and if they implement the RangeStreamOnce trait they are also capable of returning multiple items at the same time, usually in the form of a slice.
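
The difference from Iterator described for the stream module can be sketched with a pair of small traits. This is a hypothetical model in plain Rust: TokenStream, RangeTokenStream and StreamError are names invented for this example, and combine's real StreamOnce and RangeStreamOnce traits differ in detail (for instance in their error and position types).

```rust
// Hypothetical model; combine's real stream traits differ in detail.
#[derive(Debug, PartialEq)]
enum StreamError {
    EndOfInput,
}

trait TokenStream {
    type Token;
    /// Like `Iterator::next`, but reports an error instead of just `None`.
    fn uncons(&mut self) -> Result<Self::Token, StreamError>;
}

trait RangeTokenStream: TokenStream {
    type Range;
    /// Takes `len` items at once, here as a borrowed slice, without copying them.
    fn uncons_range(&mut self, len: usize) -> Result<Self::Range, StreamError>;
}

impl<'a> TokenStream for &'a [u8] {
    type Token = u8;

    fn uncons(&mut self) -> Result<u8, StreamError> {
        let slice: &'a [u8] = *self;
        let (&first, rest) = slice.split_first().ok_or(StreamError::EndOfInput)?;
        *self = rest;
        Ok(first)
    }
}

impl<'a> RangeTokenStream for &'a [u8] {
    type Range = &'a [u8];

    fn uncons_range(&mut self, len: usize) -> Result<&'a [u8], StreamError> {
        let slice: &'a [u8] = *self;
        if len > slice.len() {
            // Leave the stream untouched when the request cannot be satisfied.
            return Err(StreamError::EndOfInput);
        }
        let (head, rest) = slice.split_at(len);
        *self = rest;
        Ok(head)
    }
}
```

Starting from the bytes b"abcd", one uncons call yields b'a', a following uncons_range(2) yields the slice b"bc" borrowed from the original input, and asking for more items than remain returns an error while leaving the stream where it was.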

Macros§

choice
Takes a number of parsers and tries to apply them each in order. Fails if all the parsers fail or if an applied parser fails after it has committed to its parse.
decode (feature: std)
Parses an instance of std::io::Read as a &[u8] without reading the entire file into memory.
decode_futures_03 (feature: futures-io-03)
Parses an instance of futures::io::AsyncRead as a &[u8] without reading the entire file into memory.
decode_tokio (feature: tokio)
Parses an instance of tokio::io::AsyncRead as a &[u8] without reading the entire file into memory.
decode_tokio_02 (feature: tokio-02)
Parses an instance of tokio::io::AsyncRead as a &[u8] without reading the entire file into memory.
decode_tokio_03 (feature: tokio-03)
Parses an instance of tokio::io::AsyncRead as a &[u8] without reading the entire file into memory.
dispatch
dispatch! allows a parser to be constructed depending on earlier input, without forcing each branch to have the same type of parser
opaque
Convenience macro over opaque.
parser
Declares a named parser which can easily be reused.
struct_parser
Sequences multiple parsers and builds a struct out of them.

Enums§

ParseResult
A Result type which has the committed status flattened into the result. Conversions to and from std::result::Result can be done using result.into() or From::from(result)

Traits§

EasyParser (feature: std)
Provides the easy_parse method which provides good error messages by default
ParseError
Trait which defines a combine parse error.
Parser
By implementing the Parser trait a type says that it can be used to parse an input stream into the type Output.
Positioned
A type which has a position.
RangeStream
A RangeStream is an extension of Stream which allows for zero copy parsing.
RangeStreamOnce
A RangeStreamOnce is an extension of StreamOnce which allows for zero copy parsing.
Stream
A stream of tokens which can be duplicated
StreamOnce
StreamOnce represents a sequence of items that can be extracted one by one.

Functions§

any
Parses any token.
attempt
attempt(p) behaves as p except it always acts as p peeked instead of committed on its parse.
between
Parses open followed by parser followed by close. Returns the value of parser.
chainl1
Parses p 1 or more times separated by op. The value returned is the one produced by the left associative application of the function returned by the parser op.
chainr1
Parses p one or more times separated by op. The value returned is the one produced by the right associative application of the function returned by op.
choice
Takes a tuple, a slice or an array of parsers and tries to apply them each in order. Fails if all the parsers fail or if an applied parser consumes input before failing.
count
Parses parser from zero up to count times.
count_min_max
Parses parser from min to max times (including min and max).
eof
Succeeds only if the stream is at end of input, fails otherwise.
from_str
Takes a parser that outputs a string like value (&str, String, &[u8] or Vec<u8>) and parses it using std::str::FromStr. Errors if the output of parser is not UTF-8 or if FromStr::from_str returns an error.
look_ahead
look_ahead(p) acts as p but doesn’t consume input on success.
many
Parses p zero or more times returning a collection with the values from p.
many1
Parses p one or more times returning a collection with the values from p.
none_of
Extract one token and succeeds if it is not part of tokens.
not_followed_by
Succeeds only if parser fails. Never consumes any input.
one_of
Extract one token and succeeds if it is part of tokens.
optional
Parses parser and outputs Some(value) if it succeeds, None if it fails without consuming any input. Fails if parser fails after having committed some input.
parser
Wraps a function, turning it into a parser.
position
Parser which just returns the current position in the stream.
produce
Always returns the value produced by calling f.
satisfy
Parses a token and succeeds depending on the result of predicate.
satisfy_map
Parses a token and passes it to predicate. If predicate returns Some the parser succeeds and returns the value inside the Option. If predicate returns None the parser fails without consuming any input.
sep_by
Parses parser zero or more times separated by separator, returning a collection with the values from parser.
sep_by1
Parses parser one or more times separated by separator, returning a collection with the values from parser.
sep_end_by
Parses parser zero or more times separated and ended by separator, returning a collection with the values from parser.
sep_end_by1
Parses parser one or more times separated and ended by separator, returning a collection with the values from parser.
skip_count
Parses parser from zero up to count times skipping the output of parser.
skip_count_min_max
Parses parser from min to max times (including min and max) skipping the output of parser.
skip_many
Parses p zero or more times ignoring the result.
skip_many1
Parses p one or more times ignoring the result.
token
Parses a character and succeeds if the character is equal to c.
tokens
Parses multiple tokens.
tokens_cmp
Parses multiple tokens.
unexpected
Always fails with message as an unexpected error. Never consumes any input.
unexpected_any
Always fails with message as an unexpected error. Never consumes any input.
value
Always returns the value v without consuming any input.
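
The difference between the left associative application described for chainl1 and the right associative application described for chainr1 can be shown with a small worked example. The sketch below is plain Rust and does not call combine: fold_left, fold_right and sub are hypothetical helpers that only mimic how the parsed operands and operator functions would be combined once parsing is done.

```rust
/// The binary function an operator parser might return for `-`.
fn sub(a: i64, b: i64) -> i64 {
    a - b
}

/// Left associative application, as described for `chainl1`: ((a op b) op c).
fn fold_left(first: i64, rest: &[(fn(i64, i64) -> i64, i64)]) -> i64 {
    rest.iter().fold(first, |acc, (op, rhs)| op(acc, *rhs))
}

/// Right associative application, as described for `chainr1`: (a op (b op c)).
fn fold_right(first: i64, rest: &[(fn(i64, i64) -> i64, i64)]) -> i64 {
    match rest.split_first() {
        None => first,
        Some(((op, rhs), tail)) => op(first, fold_right(*rhs, tail)),
    }
}
```

For the operand sequence 1, 2, 3 joined by subtraction, the left associative fold computes (1 - 2) - 3 = -4, while the right associative fold computes 1 - (2 - 3) = 2.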

Type Aliases§

StdParseResult
A type alias over the specific Result type used by parsers to indicate whether they were successful or not. O is the type that is output on success. Input is the specific stream type used in the parser.