6.8.11.1 Syntax Table Basics

Parsing is the process of converting a raw string of characters, such as you would type in during algebraic entry, into a Calc formula. Calc’s parser works in two stages. First, the input is broken down into tokens, such as words, numbers, and punctuation symbols like ‘+’, ‘:=’, and ‘+/-’. Space between tokens is ignored (except when it serves to separate adjacent words). Next, the parser matches this string of tokens against various built-in syntactic patterns, such as “an expression followed by ‘+’ followed by another expression” or “a name followed by ‘(’, zero or more expressions separated by commas, and ‘)’.”

A syntax table is a list of user-defined syntax rules, which allow you to specify new patterns to define your own favorite input notations. Calc’s parser always checks the syntax table for the current language mode, then the table for the Normal language mode, before it uses its built-in rules to parse an algebraic formula you have entered. Each syntax rule should go on its own line; it consists of a pattern, a ‘:=’ symbol, and a Calc formula with an optional condition. (Syntax rules resemble algebraic rewrite rules, but the notation for patterns is completely different.)

A syntax pattern is a list of tokens, separated by spaces. Except for a few special symbols, tokens in syntax patterns are matched literally, from left to right. For example, the rule,

foo ( ) := 2+3

would cause Calc to parse the formula ‘4+foo()*5’ as if it were ‘4+(2+3)*5’. Notice that the parentheses were written as two separate tokens in the rule. As a result, the rule works for both ‘foo()’ and ‘foo (  )’. If we had written the rule as ‘foo () := 2+3’, then Calc would treat ‘()’ as a single, indivisible token, so that ‘foo( )’ would not be recognized by the rule. (It would be parsed as a regular zero-argument function call instead.) In fact, this rule would also make trouble for the rest of Calc’s parser: an unrelated formula like ‘bar()’ would now be tokenized into ‘bar ()’ instead of ‘bar ( )’, so that the standard parser for function calls would no longer recognize it!
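The effect of introducing ‘()’ as a token can be illustrated with a small tokenizer. The following Python sketch is not Calc’s implementation; it assumes a simple longest-match strategy for punctuation, and the names tokenize and builtin are hypothetical.

```python
import re

def tokenize(text, punct_tokens):
    """Split text into words, numbers, and punctuation tokens.

    Punctuation is tried longest-first, so a multi-character token such as
    '()' wins over '(' whenever it is present in punct_tokens.
    (Assumption: this only approximates how Calc chooses tokens.)
    """
    ordered = sorted(punct_tokens, key=len, reverse=True)
    punct = "|".join(re.escape(p) for p in ordered)
    pattern = re.compile(
        r"\s*(?:([A-Za-z_]\w*)|(\d+(?:\.\d+)?)|(" + punct + r"))"
    )
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at position {pos}: {text[pos:]!r}")
        # Exactly one of the three groups matched; lastindex identifies it.
        tokens.append(m.group(m.lastindex))
        pos = m.end()
    return tokens

builtin = {"(", ")", "+", "*", ","}

print(tokenize("bar()", builtin))            # ['bar', '(', ')']
print(tokenize("bar()", builtin | {"()"}))   # ['bar', '()']
print(tokenize("bar( )", builtin | {"()"}))  # ['bar', '(', ')']
```

With ‘()’ in the token set, ‘bar()’ no longer yields the separate ‘(’ and ‘)’ tokens that a function-call pattern expects, while ‘bar( )’ still does.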

While it is possible to make a token with a mixture of letters and punctuation symbols, this is not recommended. It is better to break it into several tokens, as we did with ‘foo()’ above.

The symbol ‘#’ in a syntax pattern matches any Calc expression. On the right-hand side, the things that matched the ‘#’s can be referred to as ‘#1’, ‘#2’, and so on (where ‘#1’ refers to whatever matched the leftmost ‘#’ in the pattern). For example, these rules match a user-defined function, prefix operator, infix operator, and postfix operator, respectively:

foo ( # ) := myfunc(#1)
foo # := myprefix(#1)
# foo # := myinfix(#1,#2)
# foo := mypostfix(#1)

Thus ‘foo(3)’ will parse as ‘myfunc(3)’, and ‘2+3 foo’ will parse as ‘mypostfix(2+3)’.

It is important to write the first two rules in the order shown, because Calc tries rules in order from first to last. If the pattern ‘foo #’ came first, it would match anything that could match the ‘foo ( # )’ rule, since an expression in parentheses is itself a valid expression. Thus the ‘foo ( # )’ rule would never get to match anything. Likewise, the last two rules must be written in the order shown or else ‘3 foo 4’ will be parsed as ‘mypostfix(3) * 4’. (Of course, the best way to avoid these ambiguities is not to use the same symbol in more than one way at the same time! In case you’re not convinced, try the following exercise: How will the above rules parse the input ‘foo(3,4)’, if at all? Work it out for yourself, then try it in Calc and see.)
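The first-to-last ordering can be illustrated with a toy matcher. This Python sketch is only a model: it assumes that ‘#’ matches any non-empty run of tokens with balanced parentheses (a crude stand-in for “any Calc expression”), and it matches a pattern against the whole input rather than inside a larger parse. Because of that, it reproduces the ‘foo ( # )’ versus ‘foo #’ case but not the infix/postfix interaction described above. The function names are hypothetical.

```python
def balanced(run):
    """True if the parentheses in a token run are properly nested."""
    depth = 0
    for tok in run:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

def match(pattern, tokens):
    """Return the token runs bound to each '#', or None if no match."""
    if not pattern:
        return [] if not tokens else None
    head, rest = pattern[0], pattern[1:]
    if head != "#":
        # Literal token: must equal the next input token.
        if tokens and tokens[0] == head:
            return match(rest, tokens[1:])
        return None
    # '#': try every balanced, non-empty prefix, shortest first.
    for end in range(1, len(tokens) + 1):
        run = tokens[:end]
        if balanced(run):
            sub = match(rest, tokens[end:])
            if sub is not None:
                return [run] + sub
    return None

def first_match(rules, tokens):
    """Try rules in order and return (name, bindings) for the first hit."""
    for pattern, name in rules:
        bindings = match(pattern.split(), tokens)
        if bindings is not None:
            return name, bindings
    return None

tokens = ["foo", "(", "3", ")"]
good = [("foo ( # )", "myfunc"), ("foo #", "myprefix")]
bad = [("foo #", "myprefix"), ("foo ( # )", "myfunc")]

print(first_match(good, tokens))  # ('myfunc', [['3']])
print(first_match(bad, tokens))   # ('myprefix', [['(', '3', ')']])
```

In the second ordering the ‘foo ( # )’ rule is never reached, because the parenthesized run already satisfies ‘foo #’.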

Calc is quite flexible about what sorts of patterns are allowed. The only rule is that every pattern must begin with a literal token (like ‘foo’ in the first two patterns above), or with a ‘#’ followed by a literal token (as in the last two patterns). After that, any mixture is allowed, although putting two ‘#’s in a row will not be very useful since two expressions with nothing between them will be parsed as one expression that uses implicit multiplication.

As a more practical example, Maple uses the notation ‘sum(a(i), i=1..10)’ for sums, which Calc’s Maple mode doesn’t recognize at present. To handle this syntax, we simply add the rule,

sum ( # , # = # .. # ) := sum(#1,#2,#3,#4)

to the Maple mode syntax table. As another example, C mode can’t read assignment operators like ‘++’ and ‘*=’. We can define these operators quite easily:

# *= # := muleq(#1,#2)
# ++ := postinc(#1)
++ # := preinc(#1)

To complete the job, we would use corresponding composition functions and Z C to cause these functions to display in their respective Maple and C notations. (Note that the C example ignores issues of operator precedence, which are discussed in the next section.)

You can enclose any token in quotes to prevent its usual interpretation in syntax patterns:

# ":=" # := becomes(#1,#2)

Quotes also allow you to include spaces in a token, although once again it is generally better to use two tokens than one token with an embedded space. To include an actual quotation mark in a quoted token, precede it with a backslash. (This also works to include backslashes in tokens.)

# "bad token" # "/\"\\" # := silly(#1,#2,#3)

This will parse ‘3 bad token 4 /"\ 5’ to ‘silly(3,4,5)’.
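The quoting and backslash conventions can be made concrete with a short pattern splitter. This is a Python sketch of the rules as stated here, not Calc’s own reader; split_pattern is a hypothetical name, and the sketch does not record whether a token was quoted, which a real reader would need in order to tell a quoted ‘":="’ from the rule separator.

```python
def split_pattern(pattern):
    """Split a syntax pattern into tokens, honoring quotes and backslashes."""
    tokens = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch.isspace():
            i += 1
        elif ch == '"':
            i += 1  # skip the opening quote
            buf = []
            while i < n and pattern[i] != '"':
                if pattern[i] == "\\" and i + 1 < n:
                    i += 1  # drop the backslash, keep the next character
                buf.append(pattern[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated quoted token")
            i += 1  # skip the closing quote
            tokens.append("".join(buf))
        else:
            # Unquoted token: runs up to the next whitespace.
            start = i
            while i < n and not pattern[i].isspace():
                i += 1
            tokens.append(pattern[start:i])
    return tokens

# The pattern from the rule above, written as a raw string.
print(split_pattern(r'# "bad token" # "/\"\\" #'))
# ['#', 'bad token', '#', '/"\\', '#']
```

The fourth element printed is Python’s escaped spelling of the three-character token consisting of a slash, a quotation mark, and a backslash.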

The token ‘#’ has a predefined meaning in Calc’s formula parser; it is not valid to use ‘"#"’ in a syntax rule. However, longer tokens that include the ‘#’ character are allowed. Also, while ‘"$"’ and ‘"\""’ are allowed as tokens, their presence in the syntax table will prevent those characters from working in their usual ways (referring to stack entries and quoting strings, respectively).

Finally, the notation ‘%%’ anywhere in a syntax table causes the rest of the line to be ignored as a comment.
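As an illustration of the overall layout of a syntax table (one rule per line, ‘%%’ comments, and a ‘:=’ separating pattern from formula), here is a Python sketch that reads such a table into pattern/formula pairs. It takes “anywhere” literally and cuts at the first ‘%%’ even inside quotes, which is an assumption; the function names are hypothetical and this is not how Calc itself reads the table.

```python
def strip_comment(line):
    """Drop everything from the first '%%' to the end of the line."""
    return line.split("%%", 1)[0].rstrip()

def split_rule(line):
    """Split a rule at the first ':=' that is not inside a quoted token."""
    in_quote = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quote and ch == "\\":
            i += 2  # skip the backslash and the character it protects
            continue
        if ch == '"':
            in_quote = not in_quote
        elif not in_quote and line.startswith(":=", i):
            return line[:i].strip(), line[i + 2:].strip()
        i += 1
    raise ValueError(f"no ':=' separator in rule: {line!r}")

def parse_table(text):
    """Return a list of (pattern, formula) pairs, skipping blank lines."""
    rules = []
    for raw in text.splitlines():
        line = strip_comment(raw)
        if line.strip():
            rules.append(split_rule(line))
    return rules

table = '''
%% C-style assignment operators
# *= # := muleq(#1,#2)    %% multiply-assign
# ":=" # := becomes(#1,#2)
'''

for pattern, formula in parse_table(table):
    print(repr(pattern), "->", repr(formula))
# '# *= #' -> 'muleq(#1,#2)'
# '# ":=" #' -> 'becomes(#1,#2)'
```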