
2.1.3 The rpcalc Lexical Analyzer

The lexical analyzer’s job is low-level parsing: converting characters or sequences of characters into tokens. The Bison parser gets its tokens by calling the lexical analyzer. See The Lexical Analyzer Function yylex.

Only a simple lexical analyzer is needed for the RPN calculator. It skips blanks and tabs, then reads numbers as double values and returns them as NUM tokens. Any other character that is not part of a number is a token of its own, and the token code for such a single-character token is the character itself.

The lexical analyzer function returns a numeric code that represents a token kind. The text that stands for a token kind in the Bison rules is also a C expression for that kind's numeric code, in one of two ways. If the token kind is a character literal, its numeric code is that of the character, so the lexical analyzer can use the same character literal to express the number. If the token kind is an identifier, Bison defines that identifier as a C enumeration constant whose value is the appropriate code. In this example, therefore, NUM becomes an enumeration constant that yylex can use.
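To see both cases side by side, here is a small sketch. The enum is a simplified stand-in for the one Bison generates (the real one has more members), and the value 258 for NUM is illustrative; in rpcalc both come from Bison. The helper is_binary_operator is hypothetical: it only shows that character literals such as '+' and the identifier NUM can be written directly as C expressions for token codes.

```c
/* Simplified stand-in for the token enumeration Bison generates;
   in a real parser these definitions come from Bison, not from you. */
enum yytokentype
{
  YYEOF = 0,  /* end of input */
  NUM = 258   /* identifier token kind; value is illustrative */
};

/* Hypothetical helper: the same spellings used in the grammar rules
   ('+', '-', ...) are usable as case labels for token codes. */
int
is_binary_operator (int token)
{
  switch (token)
    {
    case '+': case '-': case '*': case '/': case '^':
      return 1;   /* single-character token: code is the character */
    default:
      return 0;   /* NUM, '\n', and everything else */
    }
}
```

Because identifier token kinds start above the range of character codes, a value such as NUM never collides with a single-character token.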

The semantic value of the token (if it has one) is stored in the global variable yylval, which is where the Bison parser looks for it. (The C data type of yylval is YYSTYPE, which was defined as double at the beginning of the grammar via ‘%define api.value.type {double}’; see Declarations for rpcalc.)
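The hand-off through yylval can be sketched without Bison. In the sketch below, YYSTYPE, yylval and NUM are stand-ins for what Bison declares in rpcalc, scripted_yylex is a hypothetical lexer that replays the tokens of the input "2 3 +", and consume_tokens is a hypothetical consumer playing the parser's role: after each NUM it reads the semantic value from yylval. It is not the code Bison generates.

```c
/* Stand-ins: in rpcalc, Bison declares these itself
   (YYSTYPE is double because of %define api.value.type {double}). */
typedef double YYSTYPE;
YYSTYPE yylval;
enum { YYEOF = 0, NUM = 258 };  /* NUM's value is illustrative */

/* Hypothetical scripted lexer for the input "2 3 +": like rpcalc's
   yylex, it stores each number in yylval before returning NUM. */
static int step;

int
scripted_yylex (void)
{
  switch (step++)
    {
    case 0: yylval = 2.0; return NUM;
    case 1: yylval = 3.0; return NUM;
    case 2: return '+';
    default: return YYEOF;
    }
}

/* Hypothetical consumer standing in for the parser: it pulls tokens
   one at a time and, on NUM, fetches the value from yylval. */
double
consume_tokens (void)
{
  double stack[16];
  int top = 0;
  int tok;
  while ((tok = scripted_yylex ()) > 0)
    {
      if (tok == NUM && top < 16)
        stack[top++] = yylval;           /* semantic value from yylval */
      else if (tok == '+' && top >= 2)
        {
          top--;
          stack[top - 1] += stack[top];  /* add the top two values */
        }
    }
  return top > 0 ? stack[top - 1] : 0.0;
}
```

Calling consume_tokens once evaluates the scripted input to 5; the token code alone says "a number arrived", and yylval says which number.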

When it reaches end-of-input, the lexical analyzer returns the token kind YYEOF, whose code is zero. (Bison treats any nonpositive value as indicating end-of-input.)
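The nonpositive rule can be illustrated with a token-draining loop. count_tokens, lex_ending_with_zero and lex_ending_with_eof are hypothetical names, and 258 stands in for NUM; the loop stops at the first code that is zero or negative, so a lexer ending with 0 and one ending with EOF (a negative int) produce the same token count.

```c
#include <stdio.h>  /* EOF */

/* Hypothetical driver loop applying the rule stated above:
   any token code <= 0 means end-of-input. */
int
count_tokens (int (*lex) (void))
{
  int n = 0;
  while (lex () > 0)
    n++;
  return n;
}

/* Two hypothetical scripted lexers yielding NUM (stand-in 258), '+'
   and '\n'; one signals the end with 0, the other with EOF. */
static const int ends_with_zero[] = { 258, '+', '\n', 0 };
static const int ends_with_eof[] = { 258, '+', '\n', EOF };
static int zero_pos, eof_pos;

int
lex_ending_with_zero (void)
{
  return ends_with_zero[zero_pos++];
}

int
lex_ending_with_eof (void)
{
  return ends_with_eof[eof_pos++];
}
```

rpcalc still returns YYEOF explicitly, which states the intent more clearly than relying on EOF being negative.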

Here is the code for the lexical analyzer:

/* The lexical analyzer stores a double floating point number
   in yylval and returns the token NUM, or returns the numeric
   code of the character read if not a number.  It skips all
   blanks and tabs, and returns YYEOF (0) for end-of-input.
   yylval, NUM and YYEOF are provided by Bison. */

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

int
yylex (void)
{
  int c = getchar ();
  /* Skip white space. */
  while (c == ' ' || c == '\t')
    c = getchar ();
  /* Process numbers. */
  if (c == '.' || isdigit (c))
    {
      /* Push the first character back so scanf sees the whole number. */
      ungetc (c, stdin);
      if (scanf ("%lf", &yylval) != 1)
        abort ();
      return NUM;
    }
  /* Return end-of-input. */
  else if (c == EOF)
    return YYEOF;
  /* Return a single char. */
  else
    return c;
}
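To watch the lexer turn an input line into tokens, the sketch below copies the logic of yylex but reads from a FILE * argument instead of stdin, so it can be fed prepared text. The names yylex_from and open_input are hypothetical, and YYSTYPE, yylval, YYEOF and NUM are stand-ins for the definitions Bison supplies (NUM's value is illustrative).

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for what Bison provides in the real rpcalc. */
typedef double YYSTYPE;
YYSTYPE yylval;
enum { YYEOF = 0, NUM = 258 };

/* Hypothetical variant of rpcalc's yylex that reads from IN. */
int
yylex_from (FILE *in)
{
  int c = getc (in);
  while (c == ' ' || c == '\t')       /* skip blanks and tabs */
    c = getc (in);
  if (c == '.' || isdigit (c))
    {
      ungetc (c, in);                 /* let fscanf read the whole number */
      if (fscanf (in, "%lf", &yylval) != 1)
        abort ();                     /* as in rpcalc, e.g. a lone '.' */
      return NUM;
    }
  else if (c == EOF)
    return YYEOF;
  else
    return c;                         /* single-character token */
}

/* Hypothetical helper: puts TEXT in a temporary file and rewinds it. */
FILE *
open_input (const char *text)
{
  FILE *f = tmpfile ();
  if (f == NULL)
    abort ();
  fputs (text, f);
  rewind (f);
  return f;
}
```

For the input "3 4.5\t+\n", successive calls return NUM (with yylval set to 3), NUM (4.5), '+', '\n' and finally YYEOF. The newline comes back as a token of its own because the rpcalc grammar uses '\n' to end each input line.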
