Crate menhir
The Menhir LR(1) parser generator
Menhir is an LR(1) parser generator. It compiles LR(1) grammar specifications down to executable code. It offers a lot of advanced features such as:
- Full LR(1) parsing, not just LALR
- Parameterized non-terminals
- Inlining of grammar productions
- Conflict explaining in terms of the grammar
- Powerful error reporting
This crate is a wrapper that contains a version of Menhir that can produce
Rust parsers, as well as Rust bindings that make the generator easier to use
from a Cargo build.rs build script.
This is the reference documentation of the wrapper API. It explains how to
invoke the generator to generate Rust code and how to link that code with
existing Rust programs with a simple complete example. To learn more on how
to actually write complex Menhir grammars, look at the Menhir manual.
Writing Rust parsers is the same as writing OCaml parsers, except that
semantic actions contain Rust code. To learn more on how to interact with
the generated code from the rest of your Rust code, see the documentation of
the menhir_runtime crate.
How to generate a Rust parser
A simple Rust parser
Below is a simple Menhir grammar whose semantic actions are written in Rust and which can be compiled into a Rust parser:
%token <i32> INT
%token PLUS MINUS TIMES DIV LPAR RPAR EOL

%left PLUS MINUS        /* lowest precedence */
%left TIMES DIV         /* medium precedence */
%nonassoc UMINUS        /* highest precedence */

/* Rust's type inference is not as good as OCaml's...
 * All non-terminal types must be declared. */
%start <i32> main
%type <i32> expr

%%

main:
    | e = expr EOL                  { e }

expr:
    | i = INT                       { i }
    | LPAR e = expr RPAR            { e }
    | e1 = expr PLUS e2 = expr      { e1 + e2 }
    | e1 = expr MINUS e2 = expr     { e1 - e2 }
    | e1 = expr TIMES e2 = expr     { e1 * e2 }
    | e1 = expr DIV e2 = expr       { e1 / e2 }
    | MINUS e = expr %prec UMINUS   { - e }
This should look familiar to users of Yacc-like parser generators. Otherwise, see the Menhir manual.
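To make the precedence declarations concrete, here is a minimal hand-written evaluator that computes the same values as the semantic actions of the grammar. It is only a sketch for illustration and not the code Menhir generates: the Token enum, the Parser struct and the parse_main function are hypothetical stand-ins, and precedence climbing replaces the LR(1) automaton.

```rust
// Hypothetical stand-in for the token type declared by the %token lines.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token { INT(i32), PLUS, MINUS, TIMES, DIV, LPAR, RPAR, EOL }

struct Parser<'a> {
    tokens: &'a [Token],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn peek(&self) -> Option<Token> {
        self.tokens.get(self.pos).copied()
    }

    fn next(&mut self) -> Option<Token> {
        let token = self.peek();
        self.pos += 1;
        token
    }

    // Precedence levels follow the grammar: 1 for PLUS/MINUS,
    // 2 for TIMES/DIV, 3 for unary minus (UMINUS).
    fn expr(&mut self, min_prec: u8) -> Option<i32> {
        let mut lhs = match self.next()? {
            Token::INT(i) => i,
            Token::LPAR => {
                let e = self.expr(1)?;
                if self.next()? != Token::RPAR {
                    return None;
                }
                e
            }
            // Unary minus binds tighter than every binary operator.
            Token::MINUS => -self.expr(3)?,
            _ => return None,
        };
        loop {
            let prec = match self.peek() {
                Some(Token::PLUS) | Some(Token::MINUS) => 1,
                Some(Token::TIMES) | Some(Token::DIV) => 2,
                _ => break,
            };
            if prec < min_prec {
                break;
            }
            let op = self.next()?;
            // Parsing the right operand one level higher gives left associativity.
            let rhs = self.expr(prec + 1)?;
            lhs = match op {
                Token::PLUS => lhs + rhs,
                Token::MINUS => lhs - rhs,
                Token::TIMES => lhs * rhs,
                // Unlike the plain `/` in the grammar, division by zero yields None here.
                Token::DIV => lhs.checked_div(rhs)?,
                _ => unreachable!(),
            };
        }
        Some(lhs)
    }
}

// Mirrors the `main: e = expr EOL` rule; None stands for a syntax error.
fn parse_main(tokens: &[Token]) -> Option<i32> {
    let mut parser = Parser { tokens, pos: 0 };
    let value = parser.expr(1)?;
    if parser.next()? != Token::EOL {
        return None;
    }
    Some(value)
}

fn main() {
    let input = [Token::INT(1), Token::PLUS, Token::INT(2), Token::TIMES, Token::INT(3), Token::EOL];
    println!("{:?}", parse_main(&input));
}
```

Because TIMES sits on a higher precedence level than PLUS, the input in main is evaluated as 1 + (2 * 3), which is the grouping the %left declarations ask Menhir for.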
Building the parser from Cargo
Let's assume we wrote the above parser into a src/parser.rsy file (Menhir
grammars that contain Rust code must use the .rsy extension).
The first step is to tell Cargo that we need the generator at compile time,
using build-dependencies in the Cargo.toml file. We also need to tell Cargo
that we use a custom build script:
[package]
# ...
build = "build.rs"
[build-dependencies]
menhir = "*"
Then, we write a build.rs build script at the root of the Cargo project
that uses the menhir wrapper crate to run the generator:
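extern crate menhir;

fn main() {
    // Run Menhir on the grammar; the generated Rust file goes to OUT_DIR.
    menhir::process_file("src/parser.rsy", &[]);
    // Tell Cargo where to find the menhir_runtime crate.
    menhir::cargo_rustc_flags().unwrap();
}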
The first call runs Menhir on the grammar file and produces a Rust file in
our package OUT_DIR. The second argument to this function is an array of
additional flags to be passed to Menhir as MenhirOption values.
The second call prints output that Cargo interprets and that tells it where
to find the menhir_runtime crate when compiling the generated code.
Using the generated code
We can then include the generated code wherever we want in our Rust project
by using the include! macro. In our case, the parser file was named
parser.rsy, so the generated Rust file will be named parser.rs by default.
The generated code contains a lot of items, so it's recommended to wrap them
in a module to avoid polluting the namespace:
mod parser {
    include!(concat!(env!("OUT_DIR"), "/parser.rs"));
}
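The naming rule described above (parser.rsy becoming parser.rs inside OUT_DIR) can be sketched as a small path helper. This is an illustration only: generated_path is a hypothetical function that is not part of the wrapper API, and the "target/out" fallback is an assumption so that the sketch also runs outside of Cargo.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical helper: where the generated file is expected to land, assuming
/// the default naming (same file name as the grammar, with a .rs extension).
fn generated_path(out_dir: &Path, grammar: &Path) -> Option<PathBuf> {
    // Only the file name of the grammar matters, not its directory.
    let file_name = grammar.file_name()?;
    Some(out_dir.join(file_name).with_extension("rs"))
}

fn main() {
    // Cargo sets OUT_DIR for build scripts; the fallback is for standalone runs.
    let out_dir = env::var("OUT_DIR").unwrap_or_else(|_| String::from("target/out"));
    match generated_path(Path::new(&out_dir), Path::new("src/parser.rsy")) {
        Some(path) => println!("generated parser expected at {}", path.display()),
        None => println!("the grammar path has no file name"),
    }
}
```

The string passed to include! has to name the same file, which is why the grammar file name and the "/parser.rs" suffix must stay in sync.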
We can then use the generated code from the parser module we just created.
For each entry point in the grammar, Menhir generates a type that exposes a
run function. This is an example of how to use it to run the generated
parser, whose entry point non-terminal is called main:
use parser::Token::*;

fn main() {
    // enumerate() pairs each token with its position. The input ends with EOL
    // because the main rule of the grammar expects it.
    let input = vec![INT(1), MINUS, INT(2), PLUS, INT(3), EOL].into_iter().enumerate();
    let lexer = menhir_runtime::IteratorLexer::new(input);
    match parser::main::run(lexer) {
        Ok(value) => println!("successful parsing of {}", value),
        Err(_) => println!("syntax error")
    }
}
See the documentation of the menhir_runtime crate for more information
on the lexer interface or the way to use the reported error values.
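The example above feeds the parser a hard-coded token vector. In practice the tokens usually come from text, so here is a minimal hand-written lexer sketch for the calculator tokens. The Token enum is a local stand-in for the generated parser::Token (an assumption, since the generated type is not shown here), and lex is a hypothetical helper rather than something provided by menhir_runtime.

```rust
// Hypothetical stand-in for the token type of the generated parser.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token { INT(i32), PLUS, MINUS, TIMES, DIV, LPAR, RPAR, EOL }

/// Turns the input text into tokens. On failure, returns the byte offset of
/// the offending character (or of the integer literal that overflowed).
fn lex(input: &str) -> Result<Vec<Token>, usize> {
    let mut tokens = Vec::new();
    let mut chars = input.char_indices().peekable();
    while let Some(&(pos, c)) = chars.peek() {
        let token = match c {
            // Skip blanks between tokens.
            ' ' | '\t' => {
                chars.next();
                continue;
            }
            // Accumulate a decimal integer literal, checking for overflow.
            '0'..='9' => {
                let mut value: i32 = 0;
                while let Some(&(_, d)) = chars.peek() {
                    match d.to_digit(10) {
                        Some(digit) => {
                            value = value
                                .checked_mul(10)
                                .and_then(|v| v.checked_add(digit as i32))
                                .ok_or(pos)?;
                            chars.next();
                        }
                        None => break,
                    }
                }
                tokens.push(Token::INT(value));
                continue;
            }
            '+' => Token::PLUS,
            '-' => Token::MINUS,
            '*' => Token::TIMES,
            '/' => Token::DIV,
            '(' => Token::LPAR,
            ')' => Token::RPAR,
            '\n' => Token::EOL,
            _ => return Err(pos),
        };
        chars.next();
        tokens.push(token);
    }
    Ok(tokens)
}

fn main() {
    match lex("1 - 2 + 3\n") {
        // enumerate() yields (position, token) pairs, the same shape as the
        // iterator handed to IteratorLexer::new in the example above.
        Ok(tokens) => {
            for (position, token) in tokens.into_iter().enumerate() {
                println!("{}: {:?}", position, token);
            }
        }
        Err(offset) => println!("unexpected character at byte {}", offset),
    }
}
```

With the real generated Token type, the result of lex(...).into_iter().enumerate() could presumably be passed to the lexer constructor in place of the hard-coded vector; check the menhir_runtime documentation for the exact interface.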
Fast iteration over the grammar
When developing a Menhir grammar, it is often useful to call the Menhir
executable directly. For example, when resolving a conflict in the grammar,
one might want to recompile the grammar each time it changes, or add and
remove flags, without modifying the build.rs script and rerunning the whole
cargo build process every time. Direct invocation is also very useful with
the new error reporting system: --list-errors can take tens of seconds to
complete, so it should be invoked manually when the grammar changes rather
than automatically from the build.rs script. For this purpose, the wrapper
crate exposes the link_binary function. We call it this way from our build
script:
menhir::link_binary().unwrap();
It creates a symlink (soft link) to the menhir binary in the root of the
Cargo project (the directory the CARGO_MANIFEST_DIR environment variable
points to). We can then use this symlink just like the Menhir binary. Don't
forget the --rust and --no-stdlib flags; you will probably need them to
compile a Rust parser:
./menhir --rust --no-stdlib --some-other-flags src/parser.rsy
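When the same invocation is repeated often, the argument list can be assembled by a small helper. The sketch below only builds and prints the command line; menhir_args is a hypothetical function, not part of the wrapper crate, and it assumes the --rust and --no-stdlib flags are always wanted, as suggested above.

```rust
/// Hypothetical helper: the arguments for a manual Menhir run on a Rust grammar.
fn menhir_args(grammar: &str, extra_flags: &[&str]) -> Vec<String> {
    // Flags needed to compile a Rust parser come first.
    let mut args = vec![String::from("--rust"), String::from("--no-stdlib")];
    // Then any extra flags, such as --list-errors.
    args.extend(extra_flags.iter().map(|flag| flag.to_string()));
    // The grammar file goes last.
    args.push(grammar.to_string());
    args
}

fn main() {
    let args = menhir_args("src/parser.rsy", &["--list-errors"]);
    // Print the command line instead of running it, so the sketch works
    // even when the ./menhir symlink has not been created yet.
    println!("./menhir {}", args.join(" "));
}
```

To actually run the command, the same vector could be handed to std::process::Command::new("./menhir").args(&args).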
Reexports
pub use MenhirOption::*;
pub use OnlyPreprocessOption::*;
pub use SuggestOption::*;
Enums
MenhirOption | An option (flag) to be passed to the Menhir generator
OnlyPreprocessOption | Argument to the
SuggestOption | Argument to the
Constants
MENHIR_BINARY | The location of the Menhir binary
Functions
add_option | Add a
cargo_rustc_flags | Instructs Cargo where to find the menhir_runtime
compile_errors | Convenience function over
link_binary | Links the Menhir binary to the
process_file | Convenience function over
run | Run Menhir with the given options and the given grammar file