To be accepted by Wisent, a context-free grammar must follow a particular format: it must be represented as an Emacs Lisp list of the form:
(terminals assocs . non-terminals)
terminals
Is the list of terminal symbols used in the grammar.
assocs
Specifies the associativity of terminals. It is nil when there is no associativity defined, or an alist of (assoc-type . assoc-value) elements.
assoc-type must be one of the default-prec, nonassoc, left or right symbols. When assoc-type is default-prec, assoc-value must be nil or t (the default). Otherwise it is a list of tokens which must have been previously declared in terminals.
For details, see (bison)Contextual Precedence, in the Bison manual.
non-terminals
Is the list of nonterminal definitions. Each definition has the form:
(nonterm . rules)
Where nonterm is the nonterminal symbol being defined and rules is the list of rules that describe this nonterminal. Each rule is a list:
(components [precedence] [action])
components
Is a list of various terminals and nonterminals that are put together by this rule. For example:
    (exp ((exp ?+ exp))          ;; exp: exp '+' exp
         )                       ;;    ;
Says that two groupings of type ‘exp’, with a ‘+’ token in between, can be combined into a larger grouping of type ‘exp’.
By convention, a nonterminal symbol should be in lower case, such as
‘exp’, ‘stmt’ or ‘declaration’. Terminal symbols
should be upper case to distinguish them from nonterminals: for
example, ‘INTEGER’, ‘IDENTIFIER’, ‘IF’ or
‘RETURN’. A terminal symbol that represents a particular keyword
in the language is conventionally the same as that keyword converted
to upper case. The terminal symbol error is reserved for error recovery.
Middle-rule actions can be scattered among the components. Usually only a final action is provided (see action).
If components in a rule is
nil, it means that the rule
can match the empty string. For example, here is how to define a
comma-separated sequence of zero or more ‘exp’ groupings:
    (expseq  (nil)                  ;; expseq:  ;; empty
             ((expseq1))            ;;        | expseq1
             )                      ;;        ;

    (expseq1 ((exp))                ;; expseq1: exp
             ((expseq1 ?, exp))     ;;        | expseq1 ',' exp
             )                      ;;        ;
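Putting the pieces described so far together, a complete grammar list might look like the following sketch. The variable name my-expseq-grammar, the NUMBER terminal and the exp definition are assumptions made for illustration; assocs is nil because no associativity is declared.

```elisp
;; Hypothetical grammar in the (terminals assocs . non-terminals) format.
;; Nothing here is taken from the Wisent sources.
(defconst my-expseq-grammar
  '(;; terminals: NUMBER is an assumed token name, ?, is the comma character
    (NUMBER ?,)
    ;; assocs: nil, no associativity defined
    nil
    ;; non-terminals: each entry is (nonterm . rules)
    (expseq  (nil)                  ;; expseq:  ;; empty
             ((expseq1)))           ;;        | expseq1
    (expseq1 ((exp))                ;; expseq1: exp
             ((expseq1 ?, exp)))    ;;        | expseq1 ',' exp
    (exp     ((NUMBER))))           ;; exp:     NUMBER
  "Illustrative grammar for a comma-separated sequence of numbers.")

;; The three parts can be taken apart with ordinary list functions.
(car my-expseq-grammar)   ; terminals
(cadr my-expseq-grammar)  ; assocs
(cddr my-expseq-grammar)  ; non-terminals
```

Because the non-terminals are the tail of the list rather than a nested element, cddr (not caddr) retrieves them.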
precedence
Assigns the rule the precedence of the given terminal item, overriding the precedence that would otherwise be deduced for it, namely that of the last terminal in the rule. Note that only terminals declared in assocs have a precedence level. The altered rule precedence then affects how conflicts involving that rule are resolved.
precedence is an optional vector of one terminal item.
Here is how precedence solves the problem of unary minus.
First, declare a precedence for a fictitious terminal symbol named
UMINUS. There are no tokens of this type, but the symbol
serves to stand for its precedence:
    …
    ((default-prec t) ;; This is the default
     (left '+' '-')
     (left '*')
     (left UMINUS))
Now the precedence of
UMINUS can be used in specific rules:
    (exp    …                       ;; exp:    …
            ((exp ?- exp))          ;;         | exp '-' exp
            …                       ;;         …
            ((?- exp) [UMINUS])     ;;         | '-' exp %prec UMINUS
            …                       ;;         …
            )                       ;;         ;
If you forget to append
[UMINUS] to the rule for unary minus,
Wisent silently assumes that minus has its usual precedence. This
kind of problem can be tricky to debug, since one typically discovers
the mistake only by testing the code.
The (default-prec nil) declaration makes it easier to discover this kind of problem systematically. It causes rules that lack a precedence modifier to have no precedence, even if the last terminal symbol mentioned in their components has a declared precedence.
If (default-prec nil) is in effect, you must specify precedence for all rules that participate in precedence conflict resolution. You will then see every shift/reduce conflict until you tell Wisent how to resolve it, either by changing your grammar or by adding an explicit precedence. This will probably add declarations to the grammar, but it helps to protect against incorrect rule precedences.
The effect of (default-prec nil) can be reversed by giving (default-prec t), which is the default.
For more details, see (bison)Contextual Precedence, in the Bison manual.
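As a sketch of the (default-prec nil) style, the grammar below gives every rule that takes part in conflict resolution an explicit precedence vector. The symbolic terminal names (NUMBER, PLUS, MINUS, TIMES), the variable name and the helper function are assumptions made for illustration; they are not part of Wisent. Note that the fictitious UMINUS still appears in terminals, since tokens used in assocs must be declared there.

```elisp
;; Hypothetical grammar written for (default-prec nil).
(defconst my-calc-grammar
  '(;; terminals; UMINUS is fictitious but must still be declared
    (NUMBER PLUS MINUS TIMES UMINUS)
    ;; assocs
    ((default-prec nil)
     (left PLUS MINUS)
     (left TIMES)
     (left UMINUS))
    ;; non-terminals
    (exp ((NUMBER))
         ((exp PLUS exp)  [PLUS])
         ((exp MINUS exp) [MINUS])
         ((exp TIMES exp) [TIMES])
         ((MINUS exp)     [UMINUS])))
  "Illustrative calculator grammar with explicit rule precedences.")

(defun my-rules-without-precedence (grammar)
  "Return the rules of GRAMMAR that carry no precedence vector.
Each result has the form (NONTERM . RULE).  Illustrative helper only."
  (let (result)
    (dolist (def (cddr grammar))
      (dolist (rule (cdr def))
        ;; A rule is (components [precedence] [action]); the precedence,
        ;; when present, is a vector in second position.
        (unless (vectorp (cadr rule))
          (push (cons (car def) rule) result))))
    (nreverse result)))

(my-rules-without-precedence my-calc-grammar)
;; => ((exp (NUMBER)))
```

Listing the rules without a precedence vector is a quick way to review which rules would have no precedence at all once (default-prec nil) is in effect.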
It is important to understand that assocs declarations define associativity but also assign a precedence level to terminals. All terminals declared in the same left, right or nonassoc association get the same precedence level. The precedence level is increased at each new association.
On the other hand, precedence explicitly assigns the precedence level of the given terminal to a rule.
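The way precedence levels follow the order of the associations can be illustrated with a small helper. This is only a sketch: my-precedence-levels is a hypothetical function, not Wisent's implementation, the symbolic token names are assumed, and the level numbers (starting at 1) are arbitrary; only their relative order matters.

```elisp
(defun my-precedence-levels (assocs)
  "Return a list of (TOKEN LEVEL ASSOC-TYPE) entries for ASSOCS.
Illustrative only; this is not how Wisent stores precedence."
  (let ((level 0) result)
    (dolist (entry assocs)
      (let ((type (car entry)))
        ;; default-prec is a switch, not an association, so it does
        ;; not open a new precedence level.
        (unless (eq type 'default-prec)
          (setq level (1+ level))
          (dolist (token (cdr entry))
            (push (list token level type) result)))))
    (nreverse result)))

(my-precedence-levels '((default-prec t)
                        (left PLUS MINUS)
                        (left TIMES)
                        (left UMINUS)))
;; => ((PLUS 1 left) (MINUS 1 left) (TIMES 2 left) (UMINUS 3 left))
```

PLUS and MINUS share a level because they are declared in the same association, while UMINUS, declared last, ends up with the highest level.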
action
An action is an optional Emacs Lisp function call, like this:
    (identity $1)
The result of an action determines the semantic value of a rule.
From an implementation standpoint, the function call will be embedded in a lambda expression, and several useful local variables will be defined:
$n
Where n is a positive integer. As in Bison, the value of $n is the semantic value of the nth element of components, starting from 1. It can be of any Lisp data type.
$regionn
Where n is a positive integer. For each $n variable defined there is a corresponding $regionn variable. Its value is a pair (start-pos . end-pos) that represents the start and end positions (in the lexical input stream) of the $n value. It can be nil when the component positions are not available, as for an empty string component.
$region
Its value is the leftmost and rightmost positions of input data matched by all components in the rule. This is a pair (leftmost-pos . rightmost-pos). It can be nil when component positions are not available.
$nterm
This variable is initialized with the nonterminal symbol (nonterm) the rule belongs to. It can be useful for improving error reporting or debugging. It is also used to automatically provide incremental re-parse entry points for Semantic tags (see Wisent Semantic).
$action
The value of $action is the symbolic name of the current semantic action (see Debugging actions).
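To make the role of the $n variables concrete, the following sketch evaluates an action with $1, $2, ... bound to given semantic values. The names my-sum-rule and my-run-action, and the PLUS terminal, are hypothetical; the wrapper is a simplified stand-in, since the lambda expression Wisent actually generates differs in detail and also provides the region variables.

```elisp
;; A rule with an explicit action: (components [precedence] [action]).
(defconst my-sum-rule
  '((exp PLUS exp) (+ $1 $3))
  "Hypothetical rule whose value is the sum of its first and third components.")

(defun my-run-action (action values)
  "Evaluate ACTION with $1, $2, ... bound to the elements of VALUES.
Simplified illustration; not the wrapper Wisent builds."
  (let ((n 0) bindings)
    (dolist (value values)
      (setq n (1+ n))
      ;; Build a binding such as ($1 'VALUE) for each component.
      (push (list (intern (format "$%d" n)) (list 'quote value)) bindings))
    (eval `(let ,(nreverse bindings) ,action) t)))

;; Semantic values of the three components: 2, the PLUS token, 3.
(my-run-action (cadr my-sum-rule) '(2 PLUS 3))
;; => 5
```

The result of the action, here 5, becomes the semantic value of the rule.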
When an action is not specified, a default is supplied: (identity $1). This means that the default semantic value of a rule is the value of its first component. The exception is a rule matching the empty string, for which the default action is to return nil.