Generalized LR Parsing (Bison 3.8.1)

5.9 Generalized LR (GLR) Parsing

Bison produces deterministic parsers that choose uniquely when to reduce and which reduction to apply based on a summary of the preceding input and on one extra token of lookahead. As a result, normal Bison handles a proper subset of the family of context-free languages. Ambiguous grammars, since they have strings with more than one possible sequence of reductions cannot have deterministic parsers in this sense. The same is true of languages that require more than one symbol of lookahead, since the parser lacks the information necessary to make a decision at the point it must be made in a shift/reduce parser. Finally, as previously mentioned (see Mysterious Conflicts), there are languages where Bison’s default choice of how to summarize the input seen so far loses necessary information.

When you use the ‘%glr-parser’ declaration in your grammar file, Bison generates a parser that uses a different algorithm, called Generalized LR (or GLR). A Bison GLR parser uses the same basic algorithm for parsing as an ordinary Bison parser, but behaves differently in cases where there is a shift/reduce conflict that has not been resolved by precedence rules (see Operator Precedence) or a reduce/reduce conflict. When a GLR parser encounters such a situation, it effectively splits into a several parsers, one for each possible shift or reduction. These parsers then proceed as usual, consuming tokens in lock-step. Some of the stacks may encounter other conflicts and split further, with the result that instead of a sequence of states, a Bison GLR parsing stack is what is in effect a tree of states.

In effect, each stack represents a guess as to what the proper parse is. Additional input may indicate that a guess was wrong, in which case the appropriate stack silently disappears. Otherwise, the semantics actions generated in each stack are saved, rather than being executed immediately. When a stack disappears, its saved semantic actions never get executed. When a reduction causes two stacks to become equivalent, their sets of semantic actions are both saved with the state that results from the reduction. We say that two stacks are equivalent when they both represent the same sequence of states, and each pair of corresponding states represents a grammar symbol that produces the same segment of the input token stream.

Whenever the parser makes a transition from having multiple states to having one, it reverts to the normal deterministic parsing algorithm, after resolving and executing the saved-up actions. At this transition, some of the states on the stack will have semantic values that are sets (actually multisets) of possible actions. The parser tries to pick one of the actions by first finding one whose rule has the highest dynamic precedence, as set by the ‘%dprec’ declaration. Otherwise, if the alternative actions are not ordered by precedence, but there the same merging function is declared for both rules by the ‘%merge’ declaration, Bison resolves and evaluates both and then calls the merge function on the result. Otherwise, it reports an ambiguity.

It is possible to use a data structure for the GLR parsing tree that permits the processing of any LR(1) grammar in linear time (in the size of the input), any unambiguous (not necessarily LR(1)) grammar in quadratic worst-case time, and any general (possibly ambiguous) context-free grammar in cubic worst-case time. However, Bison currently uses a simpler data structure that requires time proportional to the length of the input times the maximum number of stacks required for any prefix of the input. Thus, really ambiguous or nondeterministic grammars can require exponential time and space to process. Such badly behaving examples, however, are not generally of practical interest. Usually, nondeterminism in a grammar is local—the parser is “in doubt” only for a few tokens at a time. Therefore, the current data structure should generally be adequate. On LR(1) portions of a grammar, in particular, it is only slightly slower than with the deterministic LR(1) Bison parser.

For a more detailed exposition of GLR parsers, see Scott 2000.