Glossary (Bison 3.8.1)

Appendix B Glossary

Accepting state

A state whose only action is the accept action. The accepting state is thus a consistent state. See Understanding Your Parser.

Backus-Naur Form (BNF; also called “Backus Normal Form”)

Formal method of specifying context-free grammars originally proposed by John Backus, and slightly improved by Peter Naur in his 1960-01-02 committee document contributing to what became the Algol 60 report. See Languages and Context-Free Grammars.

Consistent state

A state containing only one possible action. See Default Reductions.

Context-free grammars

Grammars specified as rules that can be applied regardless of context. Thus, if there is a rule which says that an integer can be used as an expression, integers are allowed anywhere an expression is permitted. See Languages and Context-Free Grammars.

Counterexample

A sequence of tokens and/or nonterminals, with one dot, that demonstrates a conflict. The dot marks the place where the conflict occurs.

A unifying counterexample is a single string that has two different parses; its existence proves that the grammar is ambiguous. When a unifying counterexample cannot be found in reasonable time, a nonunifying counterexample is built: two different strings sharing the prefix up to the dot.

See Generation of Counterexamples.

Default reduction

The reduction that a parser should perform if the current parser state contains no other action for the lookahead token. In permitted parser states, Bison declares the reduction with the largest lookahead set to be the default reduction and removes that lookahead set. See Default Reductions.

Defaulted state

A consistent state with a default reduction. See Default Reductions.

Dynamic allocation

Allocation of memory that occurs during execution, rather than at compile time or on entry to a function.

Empty string

Analogous to the empty set in set theory, the empty string is a character string of length zero.

Finite-state stack machine

A “machine” that has discrete states in which it is said to exist at each instant in time. As input to the machine is processed, the machine moves from state to state as specified by the logic of the machine. In the case of the parser, the input is the language being parsed, and the states correspond to various stages in the grammar rules. See The Bison Parser Algorithm.

Generalized LR (GLR)

A parsing algorithm that can handle all context-free grammars, including those that are not LR(1). It resolves situations that Bison’s deterministic parsing algorithm cannot by effectively splitting off multiple parsers, trying all possible parsers, and discarding those that fail in the light of additional right context. See Generalized LR (GLR) Parsing.

Grouping

A language construct that is (in general) grammatically divisible; for example, ‘expression’ or ‘declaration’ in C. See Languages and Context-Free Grammars.

IELR(1) (Inadequacy Elimination LR(1))

A minimal LR(1) parser table construction algorithm. That is, given any context-free grammar, IELR(1) generates parser tables with the full language-recognition power of canonical LR(1) but with nearly the same number of parser states as LALR(1). This reduction in parser states is often an order of magnitude. More importantly, because canonical LR(1)’s extra parser states may contain duplicate conflicts in the case of non-LR(1) grammars, the number of conflicts for IELR(1) is often an order of magnitude less as well. This can significantly reduce the complexity of developing a grammar. See LR Table Construction.

Infix operator

An arithmetic operator that is placed between the operands on which it performs some operation.

Input stream

A continuous flow of data between devices or programs.

Kind

“Token” and “symbol” are each overloaded to mean either a grammar symbol (kind) or all parse info (kind, value, location) associated with occurrences of that grammar symbol from the input. To disambiguate:

- We use “token kind” and “symbol kind” to mean both grammar symbols and the values that represent them in a base programming language (C, C++, etc.). The names of the types of these values are typically token_kind_t, or token_kind_type, or TokenKind, depending on the programming language.
- We use “token” and “symbol” without the word “kind” to mean parsed occurrences, and we append the word “type” to refer to the types that represent them in a base programming language.

In summary: When you see “kind”, interpret “symbol” or “token” to mean a grammar symbol. When you don’t see “kind” (including when you see “type”), interpret “symbol” or “token” to mean a parsed symbol.

LAC (Lookahead Correction)

A parsing mechanism that fixes the problem of delayed syntax error detection, which is caused by LR state merging, default reductions, and the use of %nonassoc. Delayed syntax error detection results in unexpected semantic actions, initiation of error recovery in the wrong syntactic context, and an incorrect list of expected tokens in a verbose syntax error message. See LAC.

Language construct

One of the typical usage schemas of the language. For example, one of the constructs of the C language is the if statement. See Languages and Context-Free Grammars.

Left associativity

Operators having left associativity are analyzed from left to right: ‘a+b+c’ first computes ‘a+b’ and then combines with ‘c’. See Operator Precedence.
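
Grouping only matters when the operator is not associative in the mathematical sense, so subtraction shows the effect more clearly than addition. The following is a minimal sketch in C, whose binary minus is itself left-associative; the function names are hypothetical and exist only for this illustration.

```c
/* Left-associative grouping: (a - b) - c. */
int group_left(int a, int b, int c)
{
  return (a - b) - c;
}

/* Right-associative grouping, shown for contrast: a - (b - c). */
int group_right(int a, int b, int c)
{
  return a - (b - c);
}

/* No explicit parentheses: C groups binary minus from the left,
   so this computes the same value as group_left. */
int ungrouped(int a, int b, int c)
{
  return a - b - c;
}
```

With the arguments 10, 4, 3, the left grouping and the ungrouped form both give 3, while the right grouping gives 9.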

Left recursion

A rule whose result symbol is also its first component symbol; for example, ‘expseq1 : expseq1 ',' exp;’. See Recursive Rules.

Left-to-right parsing

Parsing a sentence of a language by analyzing it token by token from left to right. See The Bison Parser Algorithm.

Lexical analyzer (scanner)

A function that reads an input stream and returns tokens one by one. See The Lexical Analyzer Function yylex.
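
As a rough sketch of the idea, the function below scans a string and hands back one token per call. It is not the interface Bison expects from yylex: the name next_token, the enum values, and the choice to read from a string cursor instead of a file are all assumptions made to keep the example small.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token kinds for this sketch only. */
enum tok { TOK_EOF = 0, TOK_NUM = 258 };

/* Read one token starting at *input and advance *input past it.
   Returns TOK_EOF at end of input, TOK_NUM for a number (its value
   is stored in *value), or the character code for anything else. */
int next_token(const char **input, double *value)
{
  const char *p = *input;

  /* Skip blanks between tokens. */
  while (*p == ' ' || *p == '\t')
    p++;

  if (*p == '\0')
    {
      *input = p;
      return TOK_EOF;
    }

  /* A number starts with a digit; strtod then consumes at least it. */
  if (isdigit((unsigned char) *p))
    {
      char *end;
      *value = strtod(p, &end);
      *input = end;
      return TOK_NUM;
    }

  /* Any other character is returned as a single-character token. */
  *input = p + 1;
  return (unsigned char) *p;
}
```

Calling next_token repeatedly on the string "12 + 3.5" yields a number, the character '+', another number, and then TOK_EOF.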

Lexical tie-in

A flag, set by actions in the grammar rules, which alters the way tokens are parsed. See Lexical Tie-ins.

Literal string token

A token which consists of two or more fixed characters. See Symbols, Terminal and Nonterminal.

Lookahead token

A token already read but not yet shifted. See Lookahead Tokens.

LALR(1)

The class of context-free grammars that Bison (like most other parser generators) can handle by default; a subset of LR(1). See Mysterious Conflicts.

LR(1)

The class of context-free grammars in which at most one token of lookahead is needed to disambiguate the parsing of any piece of input.

Nonterminal symbol

A grammar symbol standing for a grammatical construct that can be expressed through rules in terms of smaller constructs; in other words, a construct that is not a token. See Symbols, Terminal and Nonterminal.

Parser

A function that recognizes valid sentences of a language by analyzing the syntax structure of a set of tokens passed to it from a lexical analyzer.

Postfix operator

An arithmetic operator that is placed after the operands upon which it performs some operation.

Reduction

Replacing a string of nonterminals and/or terminals with a single nonterminal, according to a grammar rule. See The Bison Parser Algorithm.

Reentrant

A reentrant subprogram is a subprogram which can be invoked any number of times in parallel, without interference between the various invocations. See A Pure (Reentrant) Parser.
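
The difference usually comes down to where the state lives. The sketch below contrasts a function that keeps hidden static state with one that receives all of its state from the caller; the names are hypothetical and unrelated to Bison's generated code.

```c
/* Not reentrant: the counter is hidden static state shared by
   every caller, so overlapping invocations interfere. */
int next_id_static(void)
{
  static int counter = 0;
  return ++counter;
}

/* Caller-owned state for the reentrant variant. */
struct id_state
{
  int counter;
};

/* Reentrant: everything the function touches is reached through
   the argument, so independent states never interfere. */
int next_id_r(struct id_state *st)
{
  return ++st->counter;
}
```

Two callers that each own a struct id_state get independent sequences from next_id_r, whereas all callers of next_id_static share one sequence.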

Reverse Polish Notation

A language in which all operators are postfix operators.
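
Because every operator follows its operands, such a language can be evaluated with nothing more than a stack. The following is a minimal sketch of that idea, assuming space-separated unsigned numbers and the four binary operators; rpn_eval and RPN_STACK_MAX are hypothetical names, and this is not the rpcalc example from the manual.

```c
#include <ctype.h>
#include <stdlib.h>

#define RPN_STACK_MAX 64

/* Evaluate an expression such as "3 4 + 2 *".
   Returns 0 and stores the value in *result on success, or -1 on a
   malformed expression, an unknown operator, or stack overflow. */
int rpn_eval(const char *s, double *result)
{
  double stack[RPN_STACK_MAX];
  int depth = 0;

  while (*s)
    {
      if (isspace((unsigned char) *s))
        {
          s++;
        }
      else if (isdigit((unsigned char) *s))
        {
          /* Operand: push its value. */
          char *end;
          if (depth == RPN_STACK_MAX)
            return -1;
          stack[depth++] = strtod(s, &end);
          s = end;
        }
      else
        {
          /* Operator: pop two operands, push the result. */
          double rhs, lhs;
          if (depth < 2)
            return -1;
          rhs = stack[--depth];
          lhs = stack[--depth];
          switch (*s++)
            {
            case '+': stack[depth++] = lhs + rhs; break;
            case '-': stack[depth++] = lhs - rhs; break;
            case '*': stack[depth++] = lhs * rhs; break;
            case '/': stack[depth++] = lhs / rhs; break;
            default: return -1;
            }
        }
    }

  /* A well-formed expression leaves exactly one value. */
  if (depth != 1)
    return -1;
  *result = stack[0];
  return 0;
}
```

For instance, "3 4 + 2 *" evaluates to 14, and "1 +" is rejected because the operator finds only one operand on the stack.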

Right recursion

A rule whose result symbol is also its last component symbol; for example, ‘expseq1: exp ',' expseq1;’. See Recursive Rules.

Semantics

In computer languages, the semantics are specified by the actions taken for each instance of the language, i.e., the meaning of each statement. See Defining Language Semantics.

Shift

A parser is said to shift when it chooses to read further input from the stream rather than immediately reducing an already-recognized rule. See The Bison Parser Algorithm.
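
To make the interplay of shifting and reducing concrete, here is a deliberately simplified sketch for the toy grammar exp: NUM | exp '+' NUM, where NUM is a single digit. It inspects the top of the stack directly and reduces whenever a rule's right-hand side is present. Bison's generated parsers are driven by state tables and a lookahead token instead, so this is an illustration of the two actions, not of Bison's implementation; all names are hypothetical.

```c
#include <stddef.h>

/* Grammar symbols used by this sketch. */
enum sym { NUM, PLUS, EXP };

/* Counts of the actions taken while parsing. */
struct trace
{
  int shifts;
  int reductions;
};

/* Parse INPUT (digits and '+' only, no spaces) with the grammar
     exp: NUM | exp '+' NUM
   Returns 1 if the input is accepted, 0 on a syntax error. */
int parse_sum(const char *input, struct trace *tr)
{
  enum sym stack[128];
  size_t top = 0;

  tr->shifts = 0;
  tr->reductions = 0;

  for (;;)
    {
      if (top >= 3 && stack[top - 3] == EXP
          && stack[top - 2] == PLUS && stack[top - 1] == NUM)
        {
          /* Reduce by exp: exp '+' NUM. */
          top -= 3;
          stack[top++] = EXP;
          tr->reductions++;
        }
      else if (top == 1 && stack[0] == NUM)
        {
          /* Reduce by exp: NUM. */
          stack[0] = EXP;
          tr->reductions++;
        }
      else if (*input == '\0')
        {
          /* Accept only if the whole input reduced to one exp. */
          return top == 1 && stack[0] == EXP;
        }
      else if (top == sizeof stack / sizeof stack[0])
        {
          return 0;
        }
      else if (*input >= '0' && *input <= '9')
        {
          /* Shift a NUM token. */
          stack[top++] = NUM;
          input++;
          tr->shifts++;
        }
      else if (*input == '+')
        {
          /* Shift a '+' token. */
          stack[top++] = PLUS;
          input++;
          tr->shifts++;
        }
      else
        {
          return 0;
        }
    }
}
```

On the input "1+2+3" this sketch performs five shifts (one per token) and three reductions (one per rule application) before accepting.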

Single-character literal

A single character that is recognized and interpreted as is. See From Formal Rules to Bison Input.

Start symbol

The nonterminal symbol that stands for a complete valid utterance in the language being parsed. The start symbol is usually listed as the first nonterminal symbol in a language specification. See The Start-Symbol.

Symbol kind

A (finite) enumeration of the grammar symbols, as processed by the parser. See Symbols, Terminal and Nonterminal.

Symbol table

A data structure where symbol names and associated data are stored during parsing to allow for recognition and use of existing information in repeated uses of a symbol. See Multi-Function Calculator: mfcalc.
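
One simple way to build such a structure is a singly linked list searched by name. The sketch below assumes that representation; struct sym_entry, sym_lookup, and sym_insert are hypothetical names chosen for this illustration and are not the definitions used in the mfcalc example.

```c
#include <stdlib.h>
#include <string.h>

/* One table entry: a name, its associated data, and a link. */
struct sym_entry
{
  char *name;
  double value;
  struct sym_entry *next;
};

/* Return the entry for NAME in TABLE, or NULL if it is absent. */
struct sym_entry *sym_lookup(struct sym_entry *table, const char *name)
{
  for (struct sym_entry *p = table; p; p = p->next)
    if (strcmp(p->name, name) == 0)
      return p;
  return NULL;
}

/* Add NAME with VALUE at the head of *TABLE.
   Returns the new entry, or NULL if allocation fails. */
struct sym_entry *sym_insert(struct sym_entry **table, const char *name,
                             double value)
{
  struct sym_entry *rec = malloc(sizeof *rec);
  if (!rec)
    return NULL;

  /* Keep a private copy of the name. */
  rec->name = malloc(strlen(name) + 1);
  if (!rec->name)
    {
      free(rec);
      return NULL;
    }
  strcpy(rec->name, name);

  rec->value = value;
  rec->next = *table;
  *table = rec;
  return rec;
}
```

A parser action would typically call sym_lookup when it meets an identifier and fall back to sym_insert on first use, so later occurrences find the stored data. A linked list keeps the sketch short; a hash table is the usual choice once the number of symbols grows.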

Syntax error

An error encountered during parsing of an input stream due to invalid syntax. See Error Recovery.

Terminal symbol

A grammar symbol that has no rules in the grammar and therefore is grammatically indivisible. The piece of text it represents is a token. See Languages and Context-Free Grammars.

Token

A basic, grammatically indivisible unit of a language. The symbol that describes a token in the grammar is a terminal symbol. The input of the Bison parser is a stream of tokens which comes from the lexical analyzer. See Symbols, Terminal and Nonterminal.

Token kind

A (finite) enumeration of the grammar terminals, as discriminated by the scanner. See Symbols, Terminal and Nonterminal.

Unreachable state

A parser state to which there does not exist a sequence of transitions from the parser’s start state. A state can become unreachable during conflict resolution. See Unreachable States.