Understanding the automaton (Wisent Parser Development)

Previous: Grammar debugging, Up: Conflicts [Contents][Index]

2.4.2 Understanding the automaton

This section (took from the manual of Bison 1.49) describes how to use the verbose report printed by wisent-compile-grammar to understand the generated automaton, to tune or fix a grammar.

We will use the following example:

(let ((wisent-verbose-flag t)) ;; Print a verbose report!
  (wisent-compile-grammar
   '((NUM STR)                          ; %token NUM STR

     ((left ?+ ?-)                      ; %left '+' '-';
      (left ?*))                        ; %left '*'

     (exp                               ; exp:
      ((exp ?+ exp))                    ;    exp '+' exp
      ((exp ?- exp))                    ;  | exp '-' exp
      ((exp ?* exp))                    ;  | exp '*' exp
      ((exp ?/ exp))                    ;  | exp '/' exp
      ((NUM))                           ;  | NUM
      )                                 ;  ;

     (useless                           ; useless:
      ((STR))                           ;    STR
      )                                 ;  ;
     )
   'nil)                                ; no %start declarations
  )

When evaluating the above expression, grammar compilation first issues the following two clear messages:

Grammar contains 1 useless nonterminals and 1 useless rules
Grammar contains 7 shift/reduce conflicts

The *wisent-log* buffer details things!

The first section reports conflicts that were solved using precedence and/or associativity:

Conflict in state 7 between rule 1 and token '+' resolved as reduce.
Conflict in state 7 between rule 1 and token '-' resolved as reduce.
Conflict in state 7 between rule 1 and token '*' resolved as shift.
Conflict in state 8 between rule 2 and token '+' resolved as reduce.
Conflict in state 8 between rule 2 and token '-' resolved as reduce.
Conflict in state 8 between rule 2 and token '*' resolved as shift.
Conflict in state 9 between rule 3 and token '+' resolved as reduce.
Conflict in state 9 between rule 3 and token '-' resolved as reduce.
Conflict in state 9 between rule 3 and token '*' resolved as reduce.

The next section reports useless tokens, nonterminal and rules (note that useless tokens might be used by the scanner):

Useless nonterminals:

   useless


Terminals which are not used:

   STR


Useless rules:

#6     useless: STR;

The next section lists states that still have conflicts:

State 7 contains 1 shift/reduce conflict.
State 8 contains 1 shift/reduce conflict.
State 9 contains 1 shift/reduce conflict.
State 10 contains 4 shift/reduce conflicts.

The next section reproduces the grammar used:

Grammar

  Number, Rule
  1       exp -> exp '+' exp
  2       exp -> exp '-' exp
  3       exp -> exp '*' exp
  4       exp -> exp '/' exp
  5       exp -> NUM

And reports the uses of the symbols:

Terminals, with rules where they appear

$EOI (-1)
error (1)
NUM (2) 5
STR (3) 6
'+' (4) 1
'-' (5) 2
'*' (6) 3
'/' (7) 4


Nonterminals, with rules where they appear

exp (8)
    on left: 1 2 3 4 5, on right: 1 2 3 4

The report then details the automaton itself, describing each state with it set of items, also known as pointed rules. Each item is a production rule together with a point (marked by ‘.’) that the input cursor.

state 0

    NUM shift, and go to state 1

    exp go to state 2

State 0 corresponds to being at the very beginning of the parsing, in the initial rule, right before the start symbol (‘exp’). When the parser returns to this state right after having reduced a rule that produced an ‘exp’, it jumps to state 2. If there is no such transition on a nonterminal symbol, and the lookahead is a ‘NUM’, then this token is shifted on the parse stack, and the control flow jumps to state 1. Any other lookahead triggers a parse error.

In the state 1...

state 1

    exp  ->  NUM .   (rule 5)

    $default    reduce using rule 5 (exp)

the rule 5, ‘exp: NUM;’, is completed. Whatever the lookahead (‘$default’), the parser will reduce it. If it was coming from state 0, then, after this reduction it will return to state 0, and will jump to state 2 (‘exp: go to state 2’).

state 2

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    $EOI        shift, and go to state 11
    '+' shift, and go to state 3
    '-' shift, and go to state 4
    '*' shift, and go to state 5
    '/' shift, and go to state 6

In state 2, the automaton can only shift a symbol. For instance, because of the item ‘exp -> exp . '+' exp’, if the lookahead if ‘+’, it will be shifted on the parse stack, and the automaton control will jump to state 3, corresponding to the item ‘exp -> exp . '+' exp’:

state 3

    exp  ->  exp '+' . exp   (rule 1)

    NUM shift, and go to state 1

    exp go to state 7

Since there is no default action, any other token than those listed above will trigger a parse error.

The interpretation of states 4 to 6 is straightforward:

state 4

    exp  ->  exp '-' . exp   (rule 2)

    NUM shift, and go to state 1

    exp go to state 8



state 5

    exp  ->  exp '*' . exp   (rule 3)

    NUM shift, and go to state 1

    exp go to state 9



state 6

    exp  ->  exp '/' . exp   (rule 4)

    NUM shift, and go to state 1

    exp go to state 10

As was announced in beginning of the report, ‘State 7 contains 1 shift/reduce conflict.’:

state 7

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp '+' exp .   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    '*' shift, and go to state 5
    '/' shift, and go to state 6

    '/' [reduce using rule 1 (exp)]
    $default    reduce using rule 1 (exp)

Indeed, there are two actions associated to the lookahead ‘/’: either shifting (and going to state 6), or reducing rule 1. The conflict means that either the grammar is ambiguous, or the parser lacks information to make the right decision. Indeed the grammar is ambiguous, as, since we did not specify the precedence of ‘/’, the sentence ‘NUM + NUM / NUM’ can be parsed as ‘NUM + (NUM / NUM)’, which corresponds to shifting ‘/’, or as ‘(NUM + NUM) / NUM’, which corresponds to reducing rule 1.

Because in LALR(1) parsing a single decision can be made, Wisent arbitrarily chose to disable the reduction, see Conflicts. Discarded actions are reported in between square brackets.

Note that all the previous states had a single possible action: either shifting the next token and going to the corresponding state, or reducing a single rule. In the other cases, i.e., when shifting and reducing is possible or when several reductions are possible, the lookahead is required to select the action. State 7 is one such state: if the lookahead is ‘*’ or ‘/’ then the action is shifting, otherwise the action is reducing rule 1. In other words, the first two items, corresponding to rule 1, are not eligible when the lookahead is ‘*’, since we specified that ‘*’ has higher precedence that ‘+’. More generally, some items are eligible only with some set of possible lookaheads.

States 8 to 10 are similar:

state 8

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp '-' exp .   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    '*' shift, and go to state 5
    '/' shift, and go to state 6

    '/' [reduce using rule 2 (exp)]
    $default    reduce using rule 2 (exp)


state 9

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp '*' exp .   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    '/' shift, and go to state 6

    '/' [reduce using rule 3 (exp)]
    $default    reduce using rule 3 (exp)


state 10

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)
    exp  ->  exp '/' exp .   (rule 4)

    '+' shift, and go to state 3
    '-' shift, and go to state 4
    '*' shift, and go to state 5
    '/' shift, and go to state 6

    '+' [reduce using rule 4 (exp)]
    '-' [reduce using rule 4 (exp)]
    '*' [reduce using rule 4 (exp)]
    '/' [reduce using rule 4 (exp)]
    $default    reduce using rule 4 (exp)

Observe that state 10 contains conflicts due to the lack of precedence of ‘/’ wrt ‘+’, ‘-’, and ‘*’, but also because the associativity of ‘/’ is not specified.

Finally, the state 11 (plus 12) is named the final state, or the accepting state:

state 11

    $EOI        shift, and go to state 12



state 12

    $default    accept

The end of input is shifted ‘$EOI shift,’ and the parser exits successfully (‘go to state 12’, that terminates).