Crate menhir

The Menhir LR(1) parser generator

Menhir is an LR(1) parser generator. It compiles LR(1) grammar specifications down to executable code. It also offers many advanced features.

This crate is a wrapper. It bundles a version of Menhir that can produce Rust parsers, together with Rust bindings that make the generator easier to use from a Cargo build.rs build script.

This is the reference documentation of the wrapper API. Using a simple but complete example, it explains how to invoke the generator to produce Rust code and how to link that code with existing Rust programs. To learn how to write complex Menhir grammars, see the Menhir manual: writing Rust parsers is the same as writing OCaml parsers, except that semantic actions contain Rust code. To learn how to interact with the generated code from the rest of your Rust code, see the documentation of the menhir_runtime crate.

How to generate a Rust parser

A simple Rust parser

Below is a simple Menhir grammar whose semantic actions are written in Rust and which can be compiled into a Rust parser:

%token <i32> INT
%token PLUS MINUS TIMES DIV LPAR RPAR EOL

%left PLUS MINUS        /* lowest precedence */
%left TIMES DIV         /* medium precedence */
%nonassoc UMINUS        /* highest precedence */

/* Rust's type inference is not as good as OCaml's...
 * All non-terminal types must be declared. */
%start <i32> main
%type  <i32> expr

%%

main:
    | e = expr EOL                  { e }

expr:
    | i = INT                       { i }
    | LPAR e = expr RPAR            { e }
    | e1 = expr PLUS e2 = expr      { e1 + e2 }
    | e1 = expr MINUS e2 = expr     { e1 - e2 }
    | e1 = expr TIMES e2 = expr     { e1 * e2 }
    | e1 = expr DIV e2 = expr       { e1 / e2 }
    | MINUS e = expr %prec UMINUS   { - e }

This should look familiar to users of Yacc-like parser generators. Otherwise, see the Menhir manual.
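To make the effect of the precedence declarations concrete, here is a small hand-written evaluator for the same expressions. It is only a sketch: it uses precedence climbing rather than the LR(1) automaton Menhir actually generates, and the Token enum, Eval struct and eval_main function are hypothetical names chosen for illustration, not part of the generated code.

```rust
/// Hypothetical stand-in for the token type derived from the %token
/// declarations; the real type lives in the generated parser.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Token { INT(i32), PLUS, MINUS, TIMES, DIV, LPAR, RPAR, EOL }

/// Level of a binary operator, mirroring the two %left lines:
/// PLUS/MINUS bind loosest, TIMES/DIV bind tighter.
fn binary_level(tok: Token) -> Option<u8> {
    match tok {
        Token::PLUS | Token::MINUS => Some(1),
        Token::TIMES | Token::DIV => Some(2),
        _ => None,
    }
}

/// Level used for unary minus (the UMINUS pseudo-token): tighter than
/// every binary operator.
const UMINUS_LEVEL: u8 = 3;

struct Eval<'a> { toks: &'a [Token], pos: usize }

impl<'a> Eval<'a> {
    fn peek(&self) -> Option<Token> { self.toks.get(self.pos).copied() }

    fn bump(&mut self) -> Option<Token> {
        let t = self.peek();
        if t.is_some() { self.pos += 1; }
        t
    }

    /// Evaluates an expression whose binary operators all have level >= min.
    fn expr(&mut self, min: u8) -> Option<i32> {
        let mut lhs = match self.bump()? {
            Token::INT(i) => i,
            Token::LPAR => {
                let v = self.expr(1)?;
                if self.bump()? != Token::RPAR { return None; }
                v
            }
            Token::MINUS => -self.expr(UMINUS_LEVEL)?,
            _ => return None,
        };
        while let Some(op) = self.peek() {
            let level = match binary_level(op) {
                Some(l) if l >= min => l,
                _ => break,
            };
            self.pos += 1;
            // Left associativity: the right operand may only contain
            // operators that bind strictly tighter than `op`.
            let rhs = self.expr(level + 1)?;
            lhs = match op {
                Token::PLUS => lhs + rhs,
                Token::MINUS => lhs - rhs,
                Token::TIMES => lhs * rhs,
                // Unlike the grammar's `e1 / e2`, this sketch reports
                // division by zero as a failure instead of panicking.
                Token::DIV => { if rhs == 0 { return None; } lhs / rhs }
                _ => unreachable!(),
            };
        }
        Some(lhs)
    }
}

/// Evaluates `expr EOL`, like the `main` rule; None means a syntax error.
pub fn eval_main(toks: &[Token]) -> Option<i32> {
    let mut e = Eval { toks, pos: 0 };
    let v = e.expr(1)?;
    if e.bump() == Some(Token::EOL) && e.peek().is_none() { Some(v) } else { None }
}
```

Calling eval_main on the tokens for 1 - 2 + 3 followed by EOL groups the expression as (1 - 2) + 3, which is what the %left declaration on PLUS and MINUS asks for. A token stream without the trailing EOL is rejected, just as the main rule requires.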

Building the parser from Cargo

Let's assume we wrote the above parser into a src/parser.rsy file (Menhir grammars that contain Rust code must use the .rsy extension).

The first step is to tell Cargo that we need the generator at compile time, using the build-dependencies section of the Cargo.toml file. We also need to tell Cargo that we use a custom build script:

[package]
# ...
build = "build.rs"

[build-dependencies]
menhir = "*"

Then, we write a build.rs build script at the root of the Cargo project that uses the menhir wrapper crate to run the generator:

extern crate menhir;

fn main() {
    menhir::process_file("src/parser.rsy", &[]);
    menhir::cargo_rustc_flags().unwrap();
}

The first line of main runs Menhir on the grammar file and produces a Rust file in our package's OUT_DIR. The second argument of process_file is an array of additional flags to pass to Menhir, given as MenhirOption values. The second line prints output that Cargo interprets and that tells it where to find the menhir_runtime crate when compiling the generated code.
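As a rough illustration of the two mechanisms involved (a file written to OUT_DIR, and lines printed for Cargo to interpret), the sketch below uses only the standard library. Both helper names are hypothetical and not part of the menhir crate. The assumed naming scheme is that the generated file takes the grammar's file name with a .rs extension. The rerun-if-changed directive is a standard Cargo feature; whether the wrapper already prints it is not stated here.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: the path where the generated parser is expected,
/// assuming it is named after the grammar file with a `.rs` extension.
pub fn generated_path(out_dir: &Path, grammar: &Path) -> Option<PathBuf> {
    // `file_stem` drops the directory and the final extension (`.rsy`).
    let mut name = grammar.file_stem()?.to_os_string();
    name.push(".rs");
    Some(out_dir.join(name))
}

/// Hypothetical helper: a line a build script can print so that Cargo
/// re-runs it whenever the grammar file changes.
pub fn rerun_directive(grammar: &Path) -> String {
    format!("cargo:rerun-if-changed={}", grammar.display())
}
```

In a build script, out_dir would typically come from std::env::var("OUT_DIR"), and the directive would be emitted with println!.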

Using the generated code

We can then include the generated code wherever we want in our Rust project by using the include! macro. In our case, the grammar file was named parser.rsy, so the generated Rust file is named parser.rs by default. The generated code contains a lot of items, so it is recommended to wrap them in a module to avoid polluting the namespace:

mod parser {
    include!(concat!(env!("OUT_DIR"), "/parser.rs"));
}

We can then use the generated code from the parser module we just created. For each entry point in the grammar, Menhir generates a type that exposes a run function. This is an example of how to use it to run the generated parser, whose entry point non-terminal is called main:

use parser::Token::*;
fn main() {
    // The main rule is `expr EOL`, so the token stream must end with EOL.
    let input = vec![INT(1), MINUS, INT(2), PLUS, INT(3), EOL].into_iter().enumerate();
    let lexer = menhir_runtime::IteratorLexer::new(input);
    match parser::main::run(lexer) {
        Ok(value) => println!("successful parsing of {}", value),
        Err(_) => println!("syntax error"),
    }
}

See the documentation of the menhir_runtime crate for more information on the lexer interface or the way to use the reported error values.
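In a real program the tokens usually come from text rather than from a hard-coded vector. The following is a minimal hand-written lexer sketch that produces (position, token) pairs, the same shape the enumerate() call yields in the example above. The Token enum here is a hypothetical stand-in for the generated parser::Token, and using byte offsets as positions is an assumption; check the menhir_runtime documentation for what the lexer interface actually expects.

```rust
/// Hypothetical stand-in for the generated `parser::Token` type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Token { INT(i32), PLUS, MINUS, TIMES, DIV, LPAR, RPAR, EOL }

/// Splits `src` into (byte offset, token) pairs.
/// On failure, returns the byte offset of the offending input.
pub fn tokenize(src: &str) -> Result<Vec<(usize, Token)>, usize> {
    let bytes = src.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        let tok = match bytes[i] {
            // Skip blanks; a newline is significant and becomes EOL.
            b' ' | b'\t' | b'\r' => { i += 1; continue; }
            b'\n' => Token::EOL,
            b'+' => Token::PLUS,
            b'-' => Token::MINUS,
            b'*' => Token::TIMES,
            b'/' => Token::DIV,
            b'(' => Token::LPAR,
            b')' => Token::RPAR,
            b'0'..=b'9' => {
                // Consume the whole run of ASCII digits.
                while i < bytes.len() && bytes[i].is_ascii_digit() { i += 1; }
                // Parsing fails if the literal does not fit in an i32.
                let n = src[start..i].parse::<i32>().map_err(|_| start)?;
                out.push((start, Token::INT(n)));
                continue;
            }
            _ => return Err(start),
        };
        i += 1;
        out.push((start, tok));
    }
    Ok(out)
}
```

Under the assumptions above, the resulting vector could be turned into an iterator with into_iter() and handed to the lexer wrapper in place of the hard-coded input.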

Fast iteration over the grammar

While developing a Menhir grammar, it is often useful to call the Menhir executable directly. For example, when trying to resolve a conflict in the grammar, one might want to recompile the grammar each time it changes, or add and remove flags, without modifying the build.rs script and re-running the whole cargo build process every time. Direct invocation is also very useful with the new error reporting system: --list-errors can take tens of seconds to complete, so it should not be invoked automatically from the build.rs script but manually, when the grammar changes. For this purpose, the wrapper crate exposes the link_binary function. Call it from your build script like this:

menhir::link_binary().unwrap();

It creates a symlink (soft link) to the Menhir binary in the root of the Cargo project (the directory the CARGO_MANIFEST_DIR environment variable points to). We can then use this symlink just like the Menhir binary. Don't forget the --rust and --no-stdlib flags; you will probably need them to compile a Rust parser:

./menhir --rust --no-stdlib --some-other-flags src/parser.rsy

Reexports

pub use MenhirOption::*;
pub use OnlyPreprocessOption::*;
pub use SuggestOption::*;

Enums

MenhirOption

An option (flag) to be passed to the Menhir generator

OnlyPreprocessOption

Argument to the OnlyPreprocess flag

SuggestOption

Argument to the Suggest flag

Constants

MENHIR_BINARY

The location of the Menhir binary

Functions

add_option

Add a MenhirOption to a Menhir command

cargo_rustc_flags

Instructs Cargo where to find the menhir_runtime crate

compile_errors

Convenience function over run that compiles error files

link_binary

Links the Menhir binary to the MANIFEST directory of the calling crate

process_file

Convenience function over run that just compiles the grammar with the default options

run

Run Menhir with the given options and the given grammar file