

10.1.1 A Simple C++ Example

This tutorial about C++ parsers is based on a simple, self-contained example (7). The following sections are the reference manual for Bison with C++; the last one shows a full-blown example (see A Complete C++ Example).

To keep the code readable, our example uses C++14. This is not required: Bison also supports the original C++98 standard.

A Bison file has three parts. In the first part, the prologue, we start by making sure that we run a recent enough version of Bison, and that we generate C++.

%require "3.2"
%language "c++"

Let’s dive directly into the middle part: the grammar. Our input is a simple list of strings, which we display once parsing is done.

%%
result:
  list  { std::cout << $1 << '\n'; }
;

%nterm <std::vector<std::string>> list;
list:
  %empty     { /* Generates an empty string list */ }
| list item  { $$ = $1; $$.push_back ($2); }
;

We used a vector of strings as a semantic value! To use genuine C++ objects as semantic values (not just PODs), we cannot rely on the union that Bison uses by default to store them; we need variants (see C++ Variants):

%define api.value.type variant
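
To see why a plain union is awkward for such types, consider the following sketch. It is not Bison's implementation of variants; the class slot and its members are hypothetical names chosen for illustration. A union member with a non-trivial constructor, such as std::string, must have its lifetime started and ended by hand, and something must remember which member is active. This is the kind of bookkeeping that variants take over.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical illustration only; this is not how Bison implements variants.
class slot
{
public:
  slot () : number_ (0), holds_text_ (false) {}
  slot (const slot&) = delete;
  slot& operator= (const slot&) = delete;

  // The destructor must end the lifetime of the active member by hand.
  ~slot () { reset (); }

  // Start the lifetime of the string member with placement new.
  void set_text (const std::string& s)
  {
    reset ();
    new (&text_) std::string (s);
    holds_text_ = true;
  }

  void set_number (int n)
  {
    reset ();
    number_ = n;
  }

  const std::string& text () const
  {
    assert (holds_text_);  // Reading the inactive member is undefined.
    return text_;
  }

  int number () const
  {
    assert (!holds_text_);
    return number_;
  }

private:
  // End the lifetime of the string if it is the active member.
  void reset ()
  {
    if (holds_text_)
      {
        using std::string;
        text_.~string ();
        holds_text_ = false;
      }
  }

  union
  {
    int number_;
    std::string text_;
  };
  bool holds_text_;
};
```

Every additional non-trivial type would need the same care in set, reset, and the destructor, which is why the default union is limited to simple types in practice.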

The rule for result needs to print a vector of strings, so in the prologue we add:

%code
{
  // Print a list of strings.
  auto
  operator<< (std::ostream& o, const std::vector<std::string>& ss)
    -> std::ostream&
  {
    o << '{';
    const char *sep = "";
    for (const auto& s: ss)
      {
        o << sep << s;
        sep = ", ";
      }
    return o << '}';
  }
}

You may want to move it into the yy namespace to avoid leaking it into the global namespace. We recommend that you keep the actions simple and move details into auxiliary functions, as we did with operator<<.
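
As a minimal sketch of that suggestion, the following standalone code places the operator in a namespace. The namespace demo stands in for yy, and the helper render is hypothetical; neither is part of the example's sources. Code written inside the namespace finds the operator through ordinary unqualified lookup.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical namespace standing in for yy.
namespace demo
{
  // Print a list of strings, as in the prologue above.
  auto
  operator<< (std::ostream& o, const std::vector<std::string>& ss)
    -> std::ostream&
  {
    o << '{';
    const char *sep = "";
    for (const auto& s: ss)
      {
        o << sep << s;
        sep = ", ";
      }
    return o << '}';
  }

  // Hypothetical helper: being inside the namespace, it sees the
  // operator above without any qualification.
  auto render (const std::vector<std::string>& ss) -> std::string
  {
    std::ostringstream os;
    os << ss;
    return os.str ();
  }
}
```

Both argument types belong to std, so argument-dependent lookup alone does not find an operator declared in another namespace. Code outside the namespace that writes os << ss would need a using-declaration, which is the price of not exposing the operator globally.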

Our list of strings will be built from two types of items, numbers and strings:

%nterm <std::string> item;
%token <std::string> TEXT;
%token <int> NUMBER;
item:
  TEXT
| NUMBER  { $$ = std::to_string ($1); }
;

In the case of TEXT, the implicit default action applies: $$ = $1.
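
To make the effect of these actions concrete, here is a sketch that replays them as ordinary C++ functions. The function names are hypothetical and Bison generates nothing of the sort; each function simply mirrors one semantic action, and build_example_list applies them in the order the reductions would occur for the input used later in this tutorial.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helpers mirroring the semantic actions of the grammar.

// item: TEXT  (implicit default action, $$ = $1)
auto item_from_text (std::string text) -> std::string
{
  return text;
}

// item: NUMBER  { $$ = std::to_string ($1); }
auto item_from_number (int n) -> std::string
{
  return std::to_string (n);
}

// list: list item  { $$ = $1; $$.push_back ($2); }
auto append_item (std::vector<std::string> list, std::string item)
  -> std::vector<std::string>
{
  list.push_back (std::move (item));
  return list;
}

// Replay the reductions for: TEXT NUMBER NUMBER NUMBER TEXT.
auto build_example_list () -> std::vector<std::string>
{
  std::vector<std::string> list;  // list: %empty
  list = append_item (std::move (list),
                      item_from_text ("I have three numbers for you."));
  for (int n = 1; n <= 3; ++n)
    list = append_item (std::move (list), item_from_number (n));
  list = append_item (std::move (list),
                      item_from_text ("And that's all!"));
  return list;
}
```

Because the rule for list is left-recursive, each reduction extends the list built so far by one item, so the items end up in input order.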


Our scanner deserves some attention. The traditional interface of yylex is not type-safe: since the token kind and the token value are not correlated, you may return a NUMBER with a string as its semantic value. To avoid this, we use token constructors (see Complete Symbols). This directive:

%define api.token.constructor

requests that Bison generate the functions make_TEXT and make_NUMBER, as well as make_YYEOF for the end of input.
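
The idea behind token constructors can be sketched with a simplified model. The class symbol and the enumeration kind below are hypothetical; the real parser::symbol_type generated by Bison is different. What the sketch shows is the principle: a symbol can only be built through a factory whose parameter type matches its kind, so the kind and the value cannot disagree.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified model; not Bison's generated code.
enum class kind { TEXT, NUMBER, YYEOF };

class symbol
{
public:
  // The factories are the only way to build a symbol.
  static symbol make_TEXT (std::string s)
  {
    return symbol (kind::TEXT, std::move (s), 0);
  }

  static symbol make_NUMBER (int n)
  {
    return symbol (kind::NUMBER, "", n);
  }

  static symbol make_YYEOF ()
  {
    return symbol (kind::YYEOF, "", 0);
  }

  kind get_kind () const { return kind_; }

  const std::string& text () const
  {
    assert (kind_ == kind::TEXT);
    return text_;
  }

  int number () const
  {
    assert (kind_ == kind::NUMBER);
    return number_;
  }

private:
  // Private, so that kind and value cannot be combined freely.
  symbol (kind k, std::string s, int n)
    : kind_ (k), text_ (std::move (s)), number_ (n)
  {}

  kind kind_;
  std::string text_;
  int number_;
};
```

With this model, a call such as symbol::make_NUMBER ("three") is rejected by the compiler, whereas the traditional interface would accept the equivalent mistake silently.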

Everything is in place for our scanner:

%code
{
  namespace yy
  {
    // Return the next token.
    auto yylex () -> parser::symbol_type
    {
      static int count = 0;
      switch (int stage = count++)
        {
        case 0:
          return parser::make_TEXT ("I have three numbers for you.");
        case 1: case 2: case 3:
          return parser::make_NUMBER (stage);
        case 4:
          return parser::make_TEXT ("And that's all!");
        default:
          return parser::make_YYEOF ();
        }
    }
  }
}
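
The scanner above hard-codes its tokens. As a sketch of what a scanner reading actual input might look like, the following standalone code splits a stream into whitespace-separated words and classifies each one. The types kind and token are hypothetical stand-ins for the generated parser::symbol_type, and the classification rule (a word made only of digits is a NUMBER) is an assumption made for this illustration. In a grammar file, the return statements would call parser::make_TEXT, parser::make_NUMBER and parser::make_YYEOF instead.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for parser::symbol_type.
enum class kind { TEXT, NUMBER, YYEOF };

struct token
{
  kind k;
  std::string text;  // Meaningful only for kind::TEXT.
  int number;        // Meaningful only for kind::NUMBER.
};

// Return the next token: a word made only of digits is a NUMBER,
// any other word is a TEXT, and the end of input is YYEOF.
auto next_token (std::istream& in) -> token
{
  std::string word;
  if (!(in >> word))
    return {kind::YYEOF, "", 0};

  bool all_digits = true;
  for (unsigned char c: word)
    if (!std::isdigit (c))
      all_digits = false;

  if (all_digits)
    // std::stoi throws std::out_of_range for values too large for int.
    return {kind::NUMBER, "", std::stoi (word)};
  return {kind::TEXT, word, 0};
}

// Collect every token of the input, including the final YYEOF.
auto scan_all (const std::string& input) -> std::vector<token>
{
  std::istringstream in (input);
  std::vector<token> tokens;
  do
    tokens.push_back (next_token (in));
  while (tokens.back ().k != kind::YYEOF);
  return tokens;
}
```

For instance, scan_all ("hello 42 x1") yields a TEXT, a NUMBER, another TEXT (since x1 is not made only of digits), and finally YYEOF.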

In the epilogue, the third part of a Bison grammar file, we put the remaining simple details: the error reporting function and the main function.

%%
namespace yy
{
  // Report an error to the user.
  auto parser::error (const std::string& msg) -> void
  {
    std::cerr << msg << '\n';
  }
}

int main ()
{
  yy::parser parse;
  return parse ();
}

Compile, and run!

$ bison simple.yy -o simple.cc
$ g++ -std=c++14 simple.cc -o simple
$ ./simple
{I have three numbers for you., 1, 2, 3, And that's all!}

Footnotes

(7) The sources of this example are available as examples/c++/simple.yy.

