Lexical and datum syntax

The syntax of Scheme code is organized in three levels:

  1. the lexical syntax that describes how a program text is split into a sequence of lexemes,

  2. the datum syntax, formulated in terms of the lexical syntax, that structures the lexeme sequence as a sequence of syntactic data, where a syntactic datum is a recursively structured entity,

  3. the program syntax, formulated in terms of the datum syntax, that imposes further structure on syntactic data and assigns meaning to them.

Syntactic data (also called external representations) double as a notation for objects, and the read and write procedures can be used for reading and writing syntactic data, converting between their textual representation and the corresponding objects. Each syntactic datum represents a corresponding datum value. A syntactic datum can be used in a program to obtain the corresponding datum value using quote.
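The following is a minimal sketch of this correspondence, not a normative example. It assumes an R6RS implementation that provides the (rnrs) composite library; the variable names are illustrative. It reads a syntactic datum from text, obtains the same datum value with quote, and writes the object back out as text.

```scheme
(import (rnrs))

;; Read one syntactic datum from text, producing the corresponding object.
(define port (open-string-input-port "(8 13)"))
(define value (read port))

;; quote yields the corresponding datum value from within a program.
(define quoted '(8 13))

;; Write the object back out as an external representation.
(define text
  (call-with-string-output-port
    (lambda (out) (write value out))))

(display (equal? value quoted)) (newline)   ; prints #t
(display text) (newline)                    ; prints (8 13)
```

Here the text is converted to an object by read, and the object is converted back to text by write, while quote gives a program direct access to the same datum value.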

Scheme source code consists of syntactic data and (non-significant) comments. Syntactic data in Scheme source code are called forms. (A form nested inside another form is called a subform.) Consequently, Scheme’s syntax has the property that any sequence of characters that is a form is also a syntactic datum representing some object. This can lead to confusion, since it may not be obvious out of context whether a given sequence of characters is intended to be a representation of objects or the text of a program. It is also a source of power, since it facilitates writing programs such as interpreters or compilers that treat programs as objects (or vice versa).
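As an illustration of treating a program as an object, the sketch below handles the same datum first as a list and then as a form to be evaluated. It assumes an R6RS implementation that provides the (rnrs) and (rnrs eval) libraries; the variable names are illustrative.

```scheme
(import (rnrs) (rnrs eval))

;; A quoted form is an ordinary list: a symbol followed by two numbers.
(define form '(+ 8 13))

;; Treated as an object, the form can be taken apart like any other list.
(define first-element (car form))            ; the symbol +

;; Treated as program text, the same datum can be evaluated.
(define sum (eval form (environment '(rnrs))))

;; A program can also construct a new form from an existing one.
(define product-form (cons '* (cdr form)))   ; the list (* 8 13)
(define product (eval product-form (environment '(rnrs))))

(display sum) (newline)       ; prints 21
(display product) (newline)   ; prints 104
```

Whether (+ 8 13) denotes a three-element list or a procedure call depends only on how the program uses it, which is the ambiguity, and the power, described above.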

A datum value may have several different external representations. For example, both #e28.000 and #x1c are syntactic data representing the exact integer object 28, and the syntactic data (8 13), ( 08 13 ), (8 . (13 . ())) all represent a list containing the exact integer objects 8 and 13. Syntactic data that represent equal objects (in the sense of equal?) are always equivalent as forms of a program.
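The examples above can be checked with a short sketch. It assumes an R6RS implementation that provides the (rnrs) library; read-from-string is a hypothetical helper defined here, not a standard procedure.

```scheme
(import (rnrs))

;; Hypothetical helper: read one syntactic datum from a string.
(define (read-from-string text)
  (read (open-string-input-port text)))

;; Two external representations of the exact integer object 28.
(define n1 (read-from-string "#e28.000"))
(define n2 (read-from-string "#x1c"))

;; Three external representations of the same two-element list.
(define l1 (read-from-string "(8 13)"))
(define l2 (read-from-string "( 08 13 )"))
(define l3 (read-from-string "(8 . (13 . ()))"))

(display (and (exact? n1) (= n1 n2 28))) (newline)        ; prints #t
(display (and (equal? l1 l2) (equal? l2 l3))) (newline)   ; prints #t
```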

Because of the close correspondence between syntactic data and datum values, we sometimes use the term datum for either a syntactic datum or a datum value when the exact meaning is apparent from the context.