[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2. A simple example for expressions

Consider the following yacc grammar for a simple expression language:

 
%token INT FLOAT

%%

expr: INT
    | FLOAT
    | '(' expr ')'
    | expr '+' expr
    | expr '-' expr
    | expr '*' expr
    | expr '/' expr
    | '-' expr
    ;

(We will ignore the problems of precedence and associativity and assume that the reader is familiar with how to resolve such issues in yacc grammars).

There are 7 types of nodes for this grammar: ‘intnum’, ‘floatnum’, ‘plus’, ‘minus’, ‘multiply’, ‘divide’, and ‘negate’. They are defined in treecc as follows:

 
%node expression %abstract %typedef

%node binary expression %abstract =
{
    expression *expr1;
    expression *expr2;
}

%node unary expression %abstract =
{
    expression *expr;
}

%node intnum expression =
{
    int num;
}

%node floatnum expression =
{
    float num;
}

%node plus binary
%node minus binary
%node multiply binary
%node divide binary
%node negate unary

We have introduced three extra node types that refer to any expression, binary expressions, and unary expressions. These can be seen as superclasses in an OO-style framework. We have declared these node types as ‘abstract’ because the yacc grammar will not be permitted to create instances of these classes directly.

The ‘binary’, ‘unary’, ‘intnum’, and ‘floatnum’ node types have field definitions associated with them. These have a similar syntax to C struct declarations.

The yacc grammar is augmented as follows to build the parse tree:

 
%union {
    expression *node;
    int         inum;
    float       fnum;
}

%token INT FLOAT

%type <node> expr
%type <inum> INT
%type <fnum> FLOAT

%%

expr: INT               { $$ = intnum_create($1); }
    | FLOAT             { $$ = floatnum_create($1); }
    | '(' expr ')'      { $$ = $2; }
    | expr '+' expr     { $$ = plus_create($1, $3); }
    | expr '-' expr     { $$ = minus_create($1, $3); }
    | expr '*' expr     { $$ = multiply_create($1, $3); }
    | expr '/' expr     { $$ = divide_create($1, $3); }
    | '-' expr          { $$ = negate_create($2); }
    ;

The treecc translator generates the ‘*_create’ functions so that the rest of the compiler can build the necessary data structures on demand. The parameters to the ‘*_create’ functions are identical in type and order to the members of the structure for that node type.

Because ‘expression’, ‘binary’, and ‘unary’ are abstract, there will be no ‘*_create’ functions associated with them. This will help the programmer catch certain kinds of errors.

The type that is returned from a ‘*_create’ function is the first superclass of the node that has a ‘%typedef’ keyword associated with it; ‘expression *’ in this case.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.1 Storing extra information

Normally we will want to store extra information with a node beyond that which is extracted by the yacc grammar. In our expression example, we probably want to store type information in the nodes so that we can determine if the whole expression is integer or floating point during semantic analysis. We can add type information to the ‘expression’ node type as follows:

 
%node expression %abstract %typedef =
{
    %nocreate type_code type;
}

The ‘%nocreate’ flag indicates that the field should not be passed to the ‘*_create’ functions as a parameter. i.e. it provides semantic information that isn't present in the grammar. When nodes are created, any fields that are declared as ‘%nocreate’ will be undefined in value. A default value can be specified as follows:

 
%node expression %abstract %typedef =
{
    %nocreate type_code type = {int_type};
}

Default values must be enclosed in ‘{’ and ‘}’ because they are pieces of code in the underlying source language (C, C++, etc), instead of tokens in the treecc syntax. Any legitimate expression in the underlying source language may be used.

We also need to arrange for ‘type_code’ to be declared. One way to do this is by adding a ‘%decls’ section to the front of the treecc input file:

 
%decls %{

typedef enum
{
    int_type,
    float_type

} type_code;

%}

We could have introduced the definition by placing a ‘#include’ directive into the ‘%decls’ section instead, or by defining a treecc enumerated type:

 
%enum type_code =
{
    int_type,
    float_type
}

Now that we have these definitions, type-inferencing can be implemented as follows:

 
%operation void infer_type(expression *e)

infer_type(binary)
{
    infer_type(e->expr1);
    infer_type(e->expr2);

    if(e->expr1->type == float_type || e->expr2->type == float_type)
    {
        e->type = float_type;
    }
    else
    {
        e->type = int_type;
    }
}

infer_type(unary)
{
    infer_type(e->expr);
    e->type = e->expr->type;
}

infer_type(intnum)
{
    e->type = int_type;
}

This example demonstrates using the abstract node types ‘binary’ and ‘unary’ to define operations on all subclasses. The treecc translator will generate code for a full C function called ‘infer_type’ that incorporates all of the cases.

But hang on a second! What happened to ‘floatnum’? Where did it go? It turns out that treecc will catch this. It will report an error to the effect that ‘node type `floatnum' is not handled in operation `infer_type'’. Here is its definition:

 
infer_type(floatnum)
{
    e->type = float_type;
}

As we can see, treecc has just caught a bug in the language implementation and reported it to us as soon as we introduced it.

Let's now extend the language with a ‘power’ operator:

 
yacc:

expr: expr '^' expr     { $$ = create_power($1, $3); }
    ;

treecc:

%node power binary

That's all there is to it! When treecc re-translates the input file, it will modify the definition of ‘infer_type’ to include the extra case for ‘power’ nodes. Because ‘power’ is a subclass of ‘binary’, treecc already knows how to perform type inferencing for the new node and it doesn't warn us about a missing declaration.

What if we wanted to restrict the second argument of ‘power’ to be an integer value? We can add the following case to ‘infer_type’:

 
infer_type(power)
{
    infer_type(e->expr1);
    infer_type(e->expr2);

    if(e->expr2->type != int_type)
    {
        error("second argument to `^' is not an integer");
    }

    e->type = e->expr1->type;
}

The translator now notices that there is a more specific implementation of ‘infer_type’ for ‘power’, and won't use the ‘binary’ case for it.

The most important thing to realise here is that the translator always checks that there are sufficient declarations for ‘infer_type’ to cover all relevant node types. If it detects a lack, it will immediately raise an error to the user. This allows tree coverage problems to be found a lot sooner than with the traditional approach.

See section Full expression example code, for a complete listing of the above example files.


[ << ] [ >> ]           [Top] [Contents] [Index] [ ? ]

This document was generated by Klaus Treichel on January, 18 2009 using texi2html 1.78.