Next: , Previous: Language and Grammar, Up: Concepts


1.2 From Formal Rules to Bison Input

A formal grammar is a mathematical construct. To define the language for Bison, you must write a file expressing the grammar in Bison syntax: a Bison grammar file. See Bison Grammar Files.

A nonterminal symbol in the formal grammar is represented in Bison input as an identifier, like an identifier in C. By convention, it should be in lower case, such as expr, stmt or declaration.

The Bison representation for a terminal symbol is also called a token type. Token types as well can be represented as C-like identifiers. By convention, these identifiers should be upper case to distinguish them from nonterminals: for example, INTEGER, IDENTIFIER, IF or RETURN. A terminal symbol that stands for a particular keyword in the language should be named after that keyword converted to upper case. The terminal symbol error is reserved for error recovery. See Symbols.

A terminal symbol can also be represented as a character literal, just like a C character constant. You should do this whenever a token is just a single character (parenthesis, plus-sign, etc.): use that same character in a literal as the terminal symbol for that token.

A third way to represent a terminal symbol is with a C string constant containing several characters. See Symbols, for more information.

The grammar rules also have an expression in Bison syntax. For example, here is the Bison rule for a C return statement. The semicolon in quotes is a literal character token, representing part of the C syntax for the statement; the naked semicolon, and the colon, are Bison punctuation used in every rule.

     stmt: RETURN expr ';' ;

See Syntax of Grammar Rules.