Canonical LR, IELR, and LALR can suffer from a couple of problems upon encountering a syntax error. First, the parser might perform additional parser stack reductions before discovering the syntax error. Such reductions can perform user semantic actions that are unexpected because they are based on an invalid token, and they cause error recovery to begin in a different syntactic context than the one in which the invalid token was encountered. Second, when verbose error messages are enabled (see Error Reporting), the expected token list in the syntax error message can both contain invalid tokens and omit valid tokens.
The culprits for the above problems are
%nonassoc, default reductions
in inconsistent states (see Default Reductions), and parser state
merging. Because IELR and LALR merge parser states, they suffer the most.
Canonical LR can suffer only if
%nonassoc is used or if default
reductions are enabled for inconsistent states.
LAC (Lookahead Correction) is a new mechanism within the parsing algorithm
that solves these problems for canonical LR, IELR, and LALR without
%nonassoc, default reductions, or state merging. You can
enable LAC with the
%define parse.lac directive.
Enable LAC to improve syntax error handling.
(This feature is experimental. More user feedback will help to stabilize it. Moreover, it is currently only available for deterministic parsers in C.)
Conceptually, the LAC mechanism is straight-forward. Whenever the parser fetches a new token from the scanner so that it can determine the next parser action, it immediately suspends normal parsing and performs an exploratory parse using a temporary copy of the normal parser state stack. During this exploratory parse, the parser does not perform user semantic actions. If the exploratory parse reaches a shift action, normal parsing then resumes on the normal parser stacks. If the exploratory parse reaches an error instead, the parser reports a syntax error. If verbose syntax error messages are enabled, the parser must then discover the list of expected tokens, so it performs a separate exploratory parse for each token in the grammar.
There is one subtlety about the use of LAC. That is, when in a consistent parser state with a default reduction, the parser will not attempt to fetch a token from the scanner because no lookahead is needed to determine the next parser action. Thus, whether default reductions are enabled in consistent states (see Default Reductions) affects how soon the parser detects a syntax error: immediately when it reaches an erroneous token or when it eventually needs that token as a lookahead to determine the next parser action. The latter behavior is probably more intuitive, so Bison currently provides no way to achieve the former behavior while default reductions are enabled in consistent states.
Thus, when LAC is in use, for some fixed decision of whether to enable default reductions in consistent states, canonical LR and IELR behave almost exactly the same for both syntactically acceptable and syntactically unacceptable input. While LALR still does not support the full language-recognition power of canonical LR and IELR, LAC at least enables LALR’s syntax error handling to correctly reflect LALR’s language-recognition power.
There are a few caveats to consider when using LAC:
IELR plus LAC does have one shortcoming relative to canonical LR. Some parsers generated by Bison can loop infinitely. LAC does not fix infinite parsing loops that occur between encountering a syntax error and detecting it, but enabling canonical LR or disabling default reductions sometimes does.
Because of internationalization considerations, Bison-generated parsers limit the size of the expected token list they are willing to report in a verbose syntax error message. If the number of expected tokens exceeds that limit, the list is simply dropped from the message. Enabling LAC can increase the size of the list and thus cause the parser to drop it. Of course, dropping the list is better than reporting an incorrect list.
Because LAC requires many parse actions to be performed twice, it can have a performance penalty. However, not all parse actions must be performed twice. Specifically, during a series of default reductions in consistent states and shift actions, the parser never has to initiate an exploratory parse. Moreover, the most time-consuming tasks in a parse are often the file I/O, the lexical analysis performed by the scanner, and the user’s semantic actions, but none of these are performed during the exploratory parse. Finally, the base of the temporary stack used during an exploratory parse is a pointer into the normal parser state stack so that the stack is never physically copied. In our experience, the performance penalty of LAC has proved insignificant for practical grammars.
While the LAC algorithm shares techniques that have been recognized in the parser community for years, for the publication that introduces LAC, see Denny 2010 May.