Sometimes reduce/reduce conflicts can occur that don’t look warranted. Here is an example:
%% def: param_spec return_spec ','; param_spec: type | name_list ':' type ;
return_spec: type | name ':' type ;
name: "id"; name_list: name | name ',' name_list ;
It would seem that this grammar can be parsed with only a single token of
lookahead: when a
param_spec is being read, an
"id" is a
name if a comma or colon follows, or a
type if another
"id" follows. In other words, this grammar is LR(1). Yet Bison
finds one reduce/reduce conflict, for which counterexample generation
(see section Generation of Counterexamples) would find a nonunifying example.
This is because Bison does not handle all LR(1) grammars by default,
for historical reasons.
In this grammar, two contexts, that after an
"id" at the beginning
param_spec and likewise at the beginning of a
return_spec, are similar enough that Bison assumes they are the
They appear similar because the same set of rules would be
active—the rule for reducing to a
name and that for reducing to
type. Bison is unable to determine at that stage of processing
that the rules would require different lookahead tokens in the two
contexts, so it makes a single parser state for them both. Combining
the two contexts causes a conflict later. In parser terminology, this
occurrence means that the grammar is not LALR(1).
For many practical grammars (specifically those that fall into the non-LR(1) class), the limitations of LALR(1) result in difficulties beyond just mysterious reduce/reduce conflicts. The best way to fix all these problems is to select a different parser table construction algorithm. Either IELR(1) or canonical LR(1) would suffice, but the former is more efficient and easier to debug during development. See section LR Table Construction, for details.
If you instead wish to work around LALR(1)’s limitations, you
can often fix a mysterious conflict by identifying the two parser states
that are being confused, and adding something to make them look
distinct. In the above example, adding one rule to
return_spec as follows makes the problem go away:
… return_spec: type | name ':' type | "id" "bogus" /* This rule is never used. */ ;
This corrects the problem because it introduces the possibility of an
additional active rule in the context after the
"id" at the beginning of
return_spec. This rule is not active in the corresponding context
param_spec, so the two contexts receive distinct parser states.
As long as the token
"bogus" is never generated by
the added rule cannot alter the way actual input is parsed.
In this particular example, there is another way to solve the problem:
rewrite the rule for
return_spec to use
instead of via
name. This also causes the two confusing
contexts to have different sets of active rules, because the one for
return_spec activates the altered rule for
rather than the one for
param_spec: type | name_list ':' type ;
return_spec: type | "id" ':' type ;
For a more detailed exposition of LALR(1) parsers and parser generators, see DeRemer 1982.