Next: , Previous: Simple GLR Parsers, Up: GLR Parsers


1.5.2 Using GLR to Resolve Ambiguities

Let's consider an example, vastly simplified from a C++ grammar.

     %{
       #include <stdio.h>
       #define YYSTYPE char const *
       int yylex (void);
       void yyerror (char const *);
     %}
     
     %token TYPENAME ID
     
     %right '='
     %left '+'
     
     %glr-parser
     
     %%
     
     prog:
       %empty
     | prog stmt   { printf ("\n"); }
     ;
     
     stmt:
       expr ';'  %dprec 1
     | decl      %dprec 2
     ;
     
     expr:
       ID               { printf ("%s ", $$); }
     | TYPENAME '(' expr ')'
                        { printf ("%s <cast> ", $1); }
     | expr '+' expr    { printf ("+ "); }
     | expr '=' expr    { printf ("= "); }
     ;
     
     decl:
       TYPENAME declarator ';'
                        { printf ("%s <declare> ", $1); }
     | TYPENAME declarator '=' expr ';'
                        { printf ("%s <init-declare> ", $1); }
     ;
     
     declarator:
       ID               { printf ("\"%s\" ", $1); }
     | '(' declarator ')'
     ;

This models a problematic part of the C++ grammar—the ambiguity between certain declarations and statements. For example,

     T (x) = y+z;

parses as either an expr or a stmt (assuming that ‘T’ is recognized as a TYPENAME and ‘x’ as an ID). Bison detects this as a reduce/reduce conflict between the rules expr : ID and declarator : ID, which it cannot resolve at the time it encounters x in the example above. Since this is a GLR parser, it therefore splits the problem into two parses, one for each choice of resolving the reduce/reduce conflict. Unlike the example from the previous section (see Simple GLR Parsers), however, neither of these parses “dies,” because the grammar as it stands is ambiguous. One of the parsers eventually reduces stmt : expr ';' and the other reduces stmt : decl, after which both parsers are in an identical state: they've seen ‘prog stmt’ and have the same unprocessed input remaining. We say that these parses have merged.

At this point, the GLR parser requires a specification in the grammar of how to choose between the competing parses. In the example above, the two %dprec declarations specify that Bison is to give precedence to the parse that interprets the example as a decl, which implies that x is a declarator. The parser therefore prints

     "x" y z + T <init-declare>

The %dprec declarations only come into play when more than one parse survives. Consider a different input string for this parser:

     T (x) + y;

This is another example of using GLR to parse an unambiguous construct, as shown in the previous section (see Simple GLR Parsers). Here, there is no ambiguity (this cannot be parsed as a declaration). However, at the time the Bison parser encounters x, it does not have enough information to resolve the reduce/reduce conflict (again, between x as an expr or a declarator). In this case, no precedence declaration is used. Again, the parser splits into two, one assuming that x is an expr, and the other assuming x is a declarator. The second of these parsers then vanishes when it sees +, and the parser prints

     x T <cast> y +

Suppose that instead of resolving the ambiguity, you wanted to see all the possibilities. For this purpose, you must merge the semantic actions of the two possible parsers, rather than choosing one over the other. To do so, you could change the declaration of stmt as follows:

     stmt:
       expr ';'  %merge <stmtMerge>
     | decl      %merge <stmtMerge>
     ;

and define the stmtMerge function as:

     static YYSTYPE
     stmtMerge (YYSTYPE x0, YYSTYPE x1)
     {
       printf ("<OR> ");
       return "";
     }

with an accompanying forward declaration in the C declarations at the beginning of the file:

     %{
       #define YYSTYPE char const *
       static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1);
     %}

With these declarations, the resulting parser parses the first example as both an expr and a decl, and prints

     "x" y z + T <init-declare> x T <cast> y z + = <OR>

Bison requires that all of the productions that participate in any particular merge have identical ‘%merge’ clauses. Otherwise, the ambiguity would be unresolvable, and the parser will report an error during any parse that results in the offending merge.