The GNU 3DLDF Language Page

Author: Laurence D. Finston

This copyright notice applies to the text and source code of this web site, and the graphics that appear on it. The software described in this text has its own copyright notice and license, which can be found in the distribution itself.

Permission is granted to copy, distribute, and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of this license is included in the file COPYING.TXT

Last updated: September 19, 2007

Top

Parser

Introduction to the Grammar

Declarations

Assignments

Contact

Parser

The GNU 3DLDF language is implemented by means of a parser. This is actually a C++ function called yyparse(), which is generated by a package called GNU Bison from a parser input file.

Parser input files for Bison have a special form. They consist largely of rules written in a special syntax with associated actions, written in C or C++. Bison reads such an input file and generates a file C or C++ code with the definition of yyparse().

Bison expects a single input file. In the case of 3DLDF, this file is generated by ctangle from a large set of CWEB files. These are the files in the directory 3DLDF-1.2.0/CWEB/ with names of the pattern p*.w, where * stands for any sequence of characters. Names that fit this pattern are, for example, parser.w, pbsndecl.w, pldfdcl.w, etc.

There are a couple of steps involved, but when ctangle and Bison are finished, there's a file of C++ called parser.c++ and two header files called parser.h and parser.h++. The header files are included in some of the other source files, and parser.c++ is compiled and linked with the other object files to create the executable 3dldf.

Bison also generates the file parser.output as a by-product of its operation. It's not needed for building the package, but I include it in the distribution as a convenience. It contains information about the parser function, including a list of the grammar rules in Backus-Naur form. It is an important source of information for anybody who wants to learn to use the interactive version of GNU 3DLDF, since I have not yet rewritten the manual. See Introduction to the Grammar for more information.

If you want to start learning about the source code of the parser, the place to start is parser.w. This file includes all of the other source files for the parser.

Back to contents
Back to top
Back to main page

Introduction to the Grammar

The file parser.output may seem intimidating at first glance, but it's not that hard to learn to interpret. And once you do, the grammar of the 3DLDF language will be an open book to you.

Please note that parser.output changes with every change to the grammar, and this happens frequently. So the examples on this page may not correspond exactly to what's in the most recent version of parser.output. However, I will mostly be talking about general principles here, so it shouldn't matter.

parser.output starts out with a copyright notice and a license. After that, you'll find a line with the text Terminals which are not used followed by a long list of words in capital letters. Another long list of lines beginning with the word State follows. Finally, you get to a line containing only the word Grammar. We've arrived!

The text in this section is a numbered list of the grammar rules of the 3DLDF language in a special form called Backus-Naur form. Rule 0 is automatically generated by Bison. Rule 1 is the so-called start symbol of the language. It looks like this:

1 program: statement_list END

This means that in 3DLDF, a program is a statement list followed by the token END.

In Bison, a token, also known as a terminal symbol or just terminal for short, is a basic unit of a language. They are conventionally written in capital letters. Other parts of speech are called non-terminal symbols or just non-terminals. They are built up out of tokens in the course of parsing input. The tokens come from the scanner, which is a function called yyscan(), but that's another story.

In Rule 1, program and statement_list are written in lowercase letters, so they are non-terminal symbols. In addition, there is a colon with program on the left-hand side (LHS) and statement_list END on the right-hand side (RHS). The colon can be read as is, i.e., program is statement_list followed by END. Another way to read this is statement_list END reduces to program. This is really what yyparse() is doing: It tries to match the sequence of tokens it receives from the scanner to rules. When it does, it goes into a particular state. These states pile up on a stack until they can be reduced. Then, they can begin to pile up again. Ultimately, a valid input will reduce to the state corresponding to Rule 1, i.e., the rule for the start symbol, program.

The next two rules look like this:

2 statement_list: /* empty */
3               | statement_list statement

This tells us what a statement_list is: It's either empty, or a statement_list followed by a statement. /* and */ enclose a comment, so Rule 2 really has nothing at all on its right-hand side. The | symbol means or.

Rules 2 and 3 could be restated as follows:

2 statement_list: /* empty */
3 statement_list: statement_list statement

The two formulations are completely equivalent.

Rules 2 and 3 are an example of recursion. Let's say yyparse() starts parsing an input and it gets a statement (whatever that is). It can now try to reduce to the state corresponding to Rule 3. All we have is a statement, but it matches, because nothing is a valid statement_list.

Now we scan another statement. We can match Rule 3 again, because we've got a statement_list from before, followed by our current statement. So, in effect, we're appending statements onto a statement_list.

To stop, all we have to do is get the scanner to pass an END token to the parser. This is easy. The string end, written as a separate word in the input, following a statement, will do the trick.

In fact, this works for most tokens: There's usually a character or string that corresponds or maps to them. For example, these are the rules for statement:

 4 statement: SEMI_COLON
 5          | END_INPUT
 6          | equation SEMI_COLON
 7          | declaration SEMI_COLON
 8          | assignment SEMI_COLON
 9          | operation_assignment SEMI_COLON
10          | macro_definition
11          | command SEMI_COLON
12          | conditional
13          | loop
14          | macro_call
15          | let_statement SEMI_COLON

The token SEMI_COLON maps to the character ; and END_INPUT maps to the string end_input. In fact, END_INPUT also maps to endinput, so end_input and endinput can be used interchangeably. In other words, they are synonyms.

The exception is tokens used for declaring objects of types defined in the 3DLDF language. For example, when an input contains the line
point p;
the string point doesn't map to the token POINT, which does exist, but rather to the token POINT_DECLARATOR. The same applies to boolean, transform, path, and all of the other types defined in the 3DLDF language.

So what strings do map to POINT, PATH, etc.? Well, none at all. When one is needed, the parser passes it to the scanner, which passes it back to the parser. I call this faking a token, although such tokens are no less real than ones created by scanning input. The reason for this is the separation between the rules and the actions. Information about the type of a variable can be found in an action, but it's not possible to influence the course of parsing except by means of the stream of tokens passed to the parsing function. Faking tokens is a means of doing this.

Most kinds of statements will require their own chapter. However, a few of them are so simple that I can explain them here.

4 statement: SEMI_COLON
This is the magic rule that makes it possible to have extra semi-colons between statements.

5 statement: END_INPUT
When the parser receives an END_INPUT token, it stops reading from the current input stream. This rule is actually a bit more complicated than it appears at first glance. The behavior of 3DLDF depends on what the current input source is and whether it's using multiple threads to read input. I will explain about this elsewhere.

6 statement: equation SEMI_COLON
Equations à la METAFONT do not yet exist in 3DLDF, so this rule is currently non-functional.

NEXT CHAPTER:

Declarations

Back to contents
Back to top
Back to main page

The GNU 3DLDF Language Page

Table of Contents

Parser

Introduction to the Grammar