Grammar format (Wisent Parser Development)

2.1 Grammar format

To be accepted by Wisent, a context-free grammar must follow a particular format: it must be represented as an Emacs Lisp list of the form:

(terminals assocs . non-terminals)
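As a minimal sketch, here is a hypothetical toy grammar for sums of numbers written in this format. The names `my-sum-grammar` and `NUMBER` are illustrative assumptions, not part of Wisent. Because non-terminals is the tail of a dotted list, plain `car`, `cadr` and `cddr` recover the three parts described below. The character terminal ?+ prints as its character code, 43.

```elisp
;; Hypothetical toy grammar in the (terminals assocs . non-terminals) format.
(defvar my-sum-grammar
  '((NUMBER ?+)                      ; terminals
    ((left ?+))                      ; assocs
    (exp ((NUMBER))                  ; exp: NUMBER
         ((exp ?+ exp) (+ $1 $3))))  ;    | exp '+' exp
  "Illustrative grammar only; not part of Wisent.")

(car my-sum-grammar)   ; terminals     => (NUMBER 43)
(cadr my-sum-grammar)  ; assocs        => ((left 43))
(cddr my-sum-grammar)  ; non-terminals => ((exp ((NUMBER)) ((exp 43 exp) (+ $1 $3))))
```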

terminals

Is the list of terminal symbols used in the grammar.

assocs

Specifies the associativity of terminals. It is nil when no associativity is defined. Otherwise it is an alist of (assoc-type . assoc-value) elements.

assoc-type must be one of the symbols default-prec, nonassoc, left or right. When assoc-type is default-prec, assoc-value must be nil or t (the default). Otherwise, assoc-value is a list of tokens that must already have been declared in terminals.

For details, see (bison)Contextual Precedence, in the Bison manual.
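The following sketch shows two hypothetical helpers, `my-assoc-type` and `my-default-prec`, that read an assocs alist. Neither is a Wisent function. The sketch assumes that each element is written as a list such as `(left ?+ ?-)` or `(default-prec t)`, as in the precedence examples of this section, and that character tokens like ?+ are used as terminals.

```elisp
;; Hypothetical helpers (not part of Wisent) that read an assocs alist.
(defun my-assoc-type (terminal assocs)
  "Return the associativity declared for TERMINAL in ASSOCS, or nil.
The result is one of the symbols `left', `right' or `nonassoc'."
  (catch 'found
    (dolist (entry assocs)
      ;; Only left/right/nonassoc entries list tokens; skip default-prec.
      (when (and (memq (car entry) '(left right nonassoc))
                 (memq terminal (cdr entry)))
        (throw 'found (car entry))))
    nil))

(defun my-default-prec (assocs)
  "Return the default-prec setting of ASSOCS, which is t when undeclared."
  (let ((entry (assq 'default-prec assocs)))
    (if entry (cadr entry) t)))

(my-assoc-type ?- '((default-prec t) (left ?+ ?-) (left ?*)))  ; => left
(my-assoc-type ?/ '((left ?+ ?-)))                              ; => nil
(my-default-prec '((nonassoc ?<)))                              ; => t
(my-default-prec '((default-prec nil) (left ?+)))               ; => nil
```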

non-terminals

Is the list of nonterminal definitions. Each definition has the form:

(nonterm . rules)

Where nonterm is the nonterminal symbol being defined and rules is the list of rules that describe this nonterminal. Each rule is a list:

(components [precedence] [action])

Where:

components

Is a list of various terminals and nonterminals that are put together by this rule.

For example,

(exp ((exp ?+ exp))          ;; exp: exp '+' exp
     )                       ;;    ;

Says that two groupings of type ‘exp’, with a ‘+’ token in between, can be combined into a larger grouping of type ‘exp’.

By convention, a nonterminal symbol should be in lower case, such as ‘exp’, ‘stmt’ or ‘declaration’. Terminal symbols should be upper case to distinguish them from nonterminals: for example, ‘INTEGER’, ‘IDENTIFIER’, ‘IF’ or ‘RETURN’. A terminal symbol that represents a particular keyword in the language is conventionally the same as that keyword converted to upper case. The terminal symbol error is reserved for error recovery.

Middle-rule actions can be scattered among the components. Usually, only the rule's action is provided (see action).

If the components of a rule are nil, the rule matches the empty string. For example, here is how to define a comma-separated sequence of zero or more ‘exp’ groupings:

(expseq  (nil)               ;; expseq: ;; empty
         ((expseq1))         ;;       | expseq1
         )                   ;;       ;

(expseq1 ((exp))             ;; expseq1: exp
         ((expseq1 ?, exp))  ;;        | expseq1 ',' exp
         )                   ;;        ;
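Because an empty rule is simply a rule whose components are nil, it can be detected with `assq`. The helper `my-empty-rule-nonterms` below is a hypothetical sketch, not a Wisent function. It only finds nonterminals that have a directly empty rule. It does not compute nonterminals that can derive the empty string indirectly through other nonterminals.

```elisp
;; Hypothetical helper (not part of Wisent).
(defun my-empty-rule-nonterms (non-terminals)
  "Return the nonterminals in NON-TERMINALS that have an empty rule."
  (let ((result nil))
    (dolist (def non-terminals (nreverse result))
      ;; DEF is (nonterm . rules).  An empty rule is one whose car (its
      ;; components) is nil, so `assq' with key nil finds it.
      (when (assq nil (cdr def))
        (push (car def) result)))))

(my-empty-rule-nonterms
 '((expseq  (nil) ((expseq1)))
   (expseq1 ((exp)) ((expseq1 ?, exp)))))   ; => (expseq)
```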

precedence

Assigns the rule the precedence of the given terminal item. This overrides the precedence that would otherwise be deduced for the rule, namely that of the last terminal in its components. Note that only terminals declared in assocs have a precedence level. The altered rule precedence then affects how conflicts involving that rule are resolved.

precedence is an optional vector of one terminal item.

Here is how precedence solves the problem of unary minus. First, declare a precedence for a fictitious terminal symbol named UMINUS. There are no tokens of this type, but the symbol serves to stand for its precedence:

…
((default-prec t) ;; This is the default
 (left ?+ ?-)
 (left ?*)
 (left UMINUS))

Now the precedence of UMINUS can be used in specific rules:

(exp     …                   ;; exp:    …
         ((exp ?- exp))      ;;         | exp '-' exp
         …                   ;;         …
         ((?- exp) [UMINUS]) ;;         | '-' exp %prec UMINUS
         …                   ;;         …
         )                   ;;         ;

If you forget to append [UMINUS] to the rule for unary minus, Wisent silently assumes that minus has its usual precedence. This kind of problem can be tricky to debug, since one typically discovers the mistake only by testing the code.

Using the (default-prec nil) declaration makes it easier to discover this kind of problem systematically. It causes rules that lack a precedence modifier to have no precedence, even if the last terminal symbol mentioned in their components has a declared precedence.

If (default-prec nil) is in effect, you must specify a precedence for every rule that participates in precedence conflict resolution. You will then see every shift/reduce conflict until you tell Wisent how to resolve it, either by changing your grammar or by adding an explicit precedence. This will probably add declarations to the grammar, but it helps protect against incorrect rule precedences.

The effect of (default-prec nil) can be reversed by giving (default-prec t), which is the default.

For more details, see (bison)Contextual Precedence, in the Bison manual.
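The choice described above can be sketched as a small function. `my-rule-prec-terminal` and `my-terminals` are hypothetical names, and the function follows this section's description rather than Wisent's compiler source, which may differ in details. An explicit [TERMINAL] vector always wins. Otherwise the last terminal among the components is used, but only when default-prec is non-nil. The returned terminal still has no precedence level unless it is declared in assocs.

```elisp
;; Hypothetical helper (not part of Wisent) following the rules described above.
(defun my-rule-prec-terminal (rule terminals default-prec)
  "Return the terminal whose precedence RULE takes, or nil.
An explicit [TERMINAL] vector always wins.  Otherwise, when DEFAULT-PREC
is non-nil, the last terminal among the components of RULE is used."
  (let ((prec (cadr rule)))
    (cond ((vectorp prec) (aref prec 0))
          (default-prec
           (let ((last-terminal nil))
             ;; Middle-rule actions and nonterminals are not in TERMINALS.
             (dolist (item (car rule) last-terminal)
               (when (memq item terminals)
                 (setq last-terminal item)))))
          (t nil))))

(defvar my-terminals '(NUMBER ?+ ?- ?* UMINUS)
  "Terminals of a hypothetical calculator grammar.")

(my-rule-prec-terminal '((exp ?- exp)) my-terminals t)         ; => 45, i.e. ?-
(my-rule-prec-terminal '((?- exp) [UMINUS]) my-terminals t)    ; => UMINUS
(my-rule-prec-terminal '((exp ?- exp)) my-terminals nil)       ; => nil
(my-rule-prec-terminal '((?- exp) [UMINUS]) my-terminals nil)  ; => UMINUS
```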

It is important to understand that assocs declarations define associativity and also assign a precedence level to terminals. All terminals declared in the same left, right or nonassoc association get the same precedence level. The precedence level increases with each new association.

On the other hand, precedence explicitly assigns the precedence level of the given terminal to a rule.
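The level numbering described above can be sketched as a table built from assocs. The helper `my-precedence-table` is hypothetical, not Wisent's internal representation. Starting the numbering at 1 is an arbitrary choice of this sketch, since only the relative order of levels matters.

```elisp
;; Hypothetical helper (not part of Wisent).
(defun my-precedence-table (assocs)
  "Return an alist of (TERMINAL LEVEL . TYPE) entries built from ASSOCS.
Each left, right or nonassoc entry starts a new, higher LEVEL;
default-prec entries are skipped."
  (let ((level 0)
        (table nil))
    (dolist (entry assocs (nreverse table))
      (when (memq (car entry) '(left right nonassoc))
        (setq level (1+ level))
        ;; Every terminal of the same association shares LEVEL.
        (dolist (terminal (cdr entry))
          (push (cons terminal (cons level (car entry))) table))))))

(my-precedence-table '((default-prec t) (left ?+ ?-) (left ?*) (left UMINUS)))
;; => ((43 1 . left) (45 1 . left) (42 2 . left) (UMINUS 3 . left))
```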

action

An action is an optional Emacs Lisp function call, like this:

(identity $1)

The result of an action determines the semantic value of a rule.

From an implementation standpoint, the function call will be embedded in a lambda expression, and several useful local variables will be defined:

$n: Where n is a positive integer. As in Bison, the value of $n is the semantic value of the nth element of components, starting from 1. It can be of any Lisp data type.
$regionn: Where n is a positive integer. For each $n variable defined there is a corresponding $regionn variable. Its value is a pair (start-pos . end-pos) that represents the start and end positions (in the lexical input stream) of the $n value. It can be nil when the component positions are not available, for example for a component that matches the empty string.
$region: Its value is a pair (leftmost-pos . rightmost-pos) holding the leftmost and rightmost positions of the input data matched by all the components of the rule. It can be nil when component positions are not available.
$nterm: This variable is initialized with the nonterminal symbol (nonterm) the rule belongs to. It can be useful for improving error reporting or debugging. It is also used to automatically provide incremental re-parse entry points for Semantic tags (see How to use Wisent with Semantic).
$action: The value of $action is the symbolic name of the current semantic action (see Debugging semantic actions).
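To illustrate how $n names can refer to component values, here is a sketch that wraps an action in a lambda expression whose parameters are $1 … $N. This is an assumption-laden model, not how Wisent itself builds or calls semantic actions. The helper `my-action-function` is hypothetical, and $regionn, $region, $nterm and $action are deliberately left out.

```elisp
;; Illustrative only: this is not how Wisent itself builds semantic actions.
(defun my-action-function (action n)
  "Return a function of N arguments that evaluates ACTION with $1..$N bound.
Only the $n variables are modeled; $regionn, $region, $nterm and $action
are left out of this sketch."
  (let ((params nil))
    (dotimes (i n)
      (push (intern (format "$%d" (1+ i))) params))
    ;; Build (lambda ($1 ... $N) ACTION) and evaluate it with lexical binding.
    (eval `(lambda ,(nreverse params) ,action) t)))

;; Semantic values of the three components of exp: exp '+' exp
(funcall (my-action-function '(+ $1 $3) 3) 2 ?+ 5)  ; => 7
```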

When an action is not specified, a default one is supplied: (identity $1). This means that the default semantic value of a rule is the value of its first component. The exception is a rule matching the empty string, for which the default action returns nil.
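The default-action rule can be sketched as a helper that returns the action a rule effectively uses. `my-effective-action` is hypothetical, not a Wisent function. For an empty rule it returns nil, which here stands for an action whose value is nil. The helper skips the optional [precedence] vector before looking for the action.

```elisp
;; Hypothetical helper (not part of Wisent) that applies the default-action rule.
(defun my-effective-action (rule)
  "Return the action of RULE, or the default action when it has none.
For an empty rule with no action, return nil (its semantic value)."
  (let ((rest (cdr rule)))
    (when (vectorp (car rest))          ; skip an optional [precedence] vector
      (setq rest (cdr rest)))
    (cond (rest (car rest))             ; explicit action
          ((null (car rule)) nil)       ; empty rule: semantic value is nil
          (t '(identity $1)))))         ; value of the first component

(my-effective-action '((exp ?+ exp) (+ $1 $3)))  ; => (+ $1 $3)
(my-effective-action '((expseq1)))               ; => (identity $1)
(my-effective-action '((?- exp) [UMINUS]))       ; => (identity $1)
(my-effective-action '(nil))                     ; => nil
```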