Next: Visualizing Your Parser, Previous: Generation of Counterexamples, Up: Debugging Your Parser [Contents][Index]

Bison parsers are *shift/reduce automata* (see The Bison Parser Algorithm). In some
cases (much more frequent than one would hope), looking at this automaton is
required to tune or simply fix a parser.

The textual file is generated when the options `--report` or
`--verbose` are specified, see Invoking Bison. Its name is made by
removing ‘`.tab.c`’ or ‘`.c`’ from the parser implementation file
name, and adding ‘`.output`’ instead. Therefore, if the grammar file is
`foo.y`, then the parser implementation file is called `foo.tab.c`
by default. As a consequence, the verbose output file is called
`foo.output`.

The following grammar file, `calc.y`, will be used in the sequel:

%union { int ival; const char *sval; }

%token <ival> NUM %nterm <ival> exp

%token <sval> STR %nterm <sval> useless

%left '+' '-' %left '*'

%%

exp: exp '+' exp | exp '-' exp | exp '*' exp | exp '/' exp | NUM ;

useless: STR; %%

`bison`

reports:

calc.y:warning: 1 nonterminal useless in grammar [-Wother] calc.y:warning: 1 rule useless in grammar [-Wother] calc.y:19.1-7:warning: nonterminal useless in grammar: useless [-Wother] 19 |useless: STR;|^~~~~~~calc.y:warning: 7 shift/reduce conflicts [-Wconflicts-sr] calc.y:note: rerun with option '-Wcounterexamples' to generate conflict counterexamples

Going back to the calc example, when given `--report=state`,
in addition to `calc.tab.c`, it creates a file `calc.output`
with contents detailed below. The order of the output and the exact
presentation might vary, but the interpretation is the same.

The first section reports useless tokens, nonterminals and rules. Useless nonterminals and rules are removed in order to produce a smaller parser, but useless tokens are preserved, since they might be used by the scanner (note the difference between “useless” and “unused” below):

Nonterminals useless in grammar useless Terminals unused in grammar STR Rules useless in grammar 6 useless: STR

The next section lists states that still have conflicts.

State 8 conflicts: 1 shift/reduce State 9 conflicts: 1 shift/reduce State 10 conflicts: 1 shift/reduce State 11 conflicts: 4 shift/reduce

Then Bison reproduces the exact grammar it used:

Grammar 0 $accept: exp $end 1 exp: exp '+' exp 2 | exp '-' exp 3 | exp '*' exp 4 | exp '/' exp 5 | NUM

and reports the uses of the symbols:

Terminals, with rules where they appear $end (0) 0 '*' (42) 3 '+' (43) 1 '-' (45) 2 '/' (47) 4 error (256) NUM <ival> (258) 5 STR <sval> (259)

Nonterminals, with rules where they appear $accept (9) on left: 0 exp <ival> (10) on left: 1 2 3 4 5 on right: 0 1 2 3 4

Bison then proceeds onto the automaton itself, describing each state with
its set of *items*, also known as *dotted rules*. Each item is a
production rule together with a point (‘`.`’) marking the location of the
input cursor.

State 0 0 $accept: • exp $end NUM shift, and go to state 1 exp go to state 2

This reads as follows: “state 0 corresponds to being at the very
beginning of the parsing, in the initial rule, right before the start
symbol (here, `exp`

). When the parser returns to this state right
after having reduced a rule that produced an `exp`

, the control
flow jumps to state 2. If there is no such transition on a nonterminal
symbol, and the lookahead is a `NUM`

, then this token is shifted onto
the parse stack, and the control flow jumps to state 1. Any other
lookahead triggers a syntax error.”

Even though the only active rule in state 0 seems to be rule 0, the
report lists `NUM`

as a lookahead token because `NUM`

can be
at the beginning of any rule deriving an `exp`

. By default Bison
reports the so-called *core* or *kernel* of the item set, but if
you want to see more detail you can invoke `bison`

with
`--report=itemset` to list the derived items as well:

State 0 0 $accept: • exp $end 1 exp: • exp '+' exp 2 | • exp '-' exp 3 | • exp '*' exp 4 | • exp '/' exp 5 | • NUM NUM shift, and go to state 1 exp go to state 2

In the state 1…

State 1 5 exp: NUM • $default reduce using rule 5 (exp)

the rule 5, ‘`exp: NUM;`’, is completed. Whatever the lookahead token
(‘`$default`’), the parser will reduce it. If it was coming from State
0, then, after this reduction it will return to state 0, and will jump to
state 2 (‘`exp: go to state 2`’).

State 2 0 $accept: exp • $end 1 exp: exp • '+' exp 2 | exp • '-' exp 3 | exp • '*' exp 4 | exp • '/' exp $end shift, and go to state 3 '+' shift, and go to state 4 '-' shift, and go to state 5 '*' shift, and go to state 6 '/' shift, and go to state 7

In state 2, the automaton can only shift a symbol. For instance, because of
the item ‘`exp: exp • '+' exp`’, if the lookahead is ‘`+`’ it is
shifted onto the parse stack, and the automaton jumps to state 4,
corresponding to the item ‘`exp: exp '+' • exp`’. Since there is no
default action, any lookahead not listed triggers a syntax error.

The state 3 is named the *final state*, or the *accepting
state*:

State 3 0 $accept: exp $end • $default accept

the initial rule is completed (the start symbol and the end-of-input were read), the parsing exits successfully.

The interpretation of states 4 to 7 is straightforward, and is left to the reader.

State 4 1 exp: exp '+' • exp NUM shift, and go to state 1 exp go to state 8 State 5 2 exp: exp '-' • exp NUM shift, and go to state 1 exp go to state 9 State 6 3 exp: exp '*' • exp NUM shift, and go to state 1 exp go to state 10 State 7 4 exp: exp '/' • exp NUM shift, and go to state 1 exp go to state 11

As was announced in beginning of the report, ‘`State 8 conflicts:
1 shift/reduce`’:

State 8 1 exp: exp • '+' exp 1 | exp '+' exp • 2 | exp • '-' exp 3 | exp • '*' exp 4 | exp • '/' exp '*' shift, and go to state 6 '/' shift, and go to state 7 '/' [reduce using rule 1 (exp)] $default reduce using rule 1 (exp)

Indeed, there are two actions associated to the lookahead ‘`/`’:
either shifting (and going to state 7), or reducing rule 1. The
conflict means that either the grammar is ambiguous, or the parser lacks
information to make the right decision. Indeed the grammar is
ambiguous, as, since we did not specify the precedence of ‘`/`’, the
sentence ‘`NUM + NUM / NUM`’ can be parsed as ‘`NUM + (NUM /
NUM)`’, which corresponds to shifting ‘`/`’, or as ‘`(NUM + NUM) /
NUM`’, which corresponds to reducing rule 1.

Because in deterministic parsing a single decision can be made, Bison arbitrarily chose to disable the reduction, see Shift/Reduce Conflicts. Discarded actions are reported between square brackets.

Note that all the previous states had a single possible action: either
shifting the next token and going to the corresponding state, or
reducing a single rule. In the other cases, i.e., when shifting
*and* reducing is possible or when *several* reductions are
possible, the lookahead is required to select the action. State 8 is
one such state: if the lookahead is ‘`*`’ or ‘`/`’ then the action
is shifting, otherwise the action is reducing rule 1. In other words,
the first two items, corresponding to rule 1, are not eligible when the
lookahead token is ‘`*`’, since we specified that ‘`*`’ has higher
precedence than ‘`+`’. More generally, some items are eligible only
with some set of possible lookahead tokens. When run with
`--report=lookahead`, Bison specifies these lookahead tokens:

State 8 1 exp: exp • '+' exp 1 | exp '+' exp • [$end, '+', '-', '/'] 2 | exp • '-' exp 3 | exp • '*' exp 4 | exp • '/' exp '*' shift, and go to state 6 '/' shift, and go to state 7 '/' [reduce using rule 1 (exp)] $default reduce using rule 1 (exp)

Note however that while ‘`NUM + NUM / NUM`’ is ambiguous (which results in
the conflicts on ‘`/`’), ‘`NUM + NUM * NUM`’ is not: the conflict was
solved thanks to associativity and precedence directives. If invoked with
`--report=solved`, Bison includes information about the solved
conflicts in the report:

Conflict between rule 1 and token '+' resolved as reduce (%left '+'). Conflict between rule 1 and token '-' resolved as reduce (%left '-'). Conflict between rule 1 and token '*' resolved as shift ('+' < '*').

When given `--report=counterexamples`, `bison`

will generate
counterexamples within the report, augmented with the corresponding items
(see Generation of Counterexamples).

shift/reduce conflict on token '/': 1 exp: exp '+' exp • 4 exp: exp • '/' exp

Example: exp '+' exp • '/' exp Shift derivation exp ↳ 1: exp '+' exp ↳ 4: exp • '/' exp Example: exp '+' exp • '/' exp Reduce derivation exp ↳ 4: exp '/' exp ↳ 1: exp '+' exp •

This shows two separate derivations in the grammar for the same `exp`

:
‘`e1 + e2 / e3`’. The derivations show how your rules would parse the
given example. Here, the first derivation completes a reduction when seeing
‘`/`’, causing ‘`e1 + e2`’ to be grouped as an `exp`

. The second
derivation shifts on ‘`/`’, resulting in ‘`e2 / e3`’ being grouped as
an `exp`

. Therefore, it is easy to see that adding
precedence/associativity directives would fix this conflict.

The remaining states are similar:

State 9 1 exp: exp • '+' exp 2 | exp • '-' exp 2 | exp '-' exp • 3 | exp • '*' exp 4 | exp • '/' exp '*' shift, and go to state 6 '/' shift, and go to state 7 '/' [reduce using rule 2 (exp)] $default reduce using rule 2 (exp)

State 10 1 exp: exp • '+' exp 2 | exp • '-' exp 3 | exp • '*' exp 3 | exp '*' exp • 4 | exp • '/' exp '/' shift, and go to state 7 '/' [reduce using rule 3 (exp)] $default reduce using rule 3 (exp)

State 11 1 exp: exp • '+' exp 2 | exp • '-' exp 3 | exp • '*' exp 4 | exp • '/' exp 4 | exp '/' exp • '+' shift, and go to state 4 '-' shift, and go to state 5 '*' shift, and go to state 6 '/' shift, and go to state 7 '+' [reduce using rule 4 (exp)] '-' [reduce using rule 4 (exp)] '*' [reduce using rule 4 (exp)] '/' [reduce using rule 4 (exp)] $default reduce using rule 4 (exp)

Observe that state 11 contains conflicts not only due to the lack of
precedence of ‘`/`’ with respect to ‘`+`’, ‘`-`’, and ‘`*`’, but
also because the associativity of ‘`/`’ is not specified.

Bison may also produce an HTML version of this output, via an XML file and XSLT processing (see Visualizing your parser in multiple formats).