The matcher language is a declarative language for specifying a matcher procedure. A matcher procedure is a procedure that accepts a single parser-buffer argument and returns a boolean value indicating whether the match it performs was successful. If the match succeeds, the internal pointer of the parser buffer is moved forward over the matched text. If the match fails, the internal pointer is unchanged.
For example, here is a matcher procedure that matches the character `a':
(lambda (b) (match-parser-buffer-char b #\a))
Here is another example that matches two given characters, c1 and c2, in sequence:
(lambda (b)
  (let ((p (get-parser-buffer-pointer b)))
    (if (match-parser-buffer-char b c1)
        (if (match-parser-buffer-char b c2)
            #t
            (begin
              (set-parser-buffer-pointer! b p)
              #f))
        #f)))
This code is clear, but it has lots of details that get in the way of understanding what it is doing. Here is the same example in the matcher language:
(*matcher (seq (char c1) (char c2)))
This is much simpler and more intuitive. And it generates virtually the same code:
(pp (*matcher (seq (char c1) (char c2))))
-| (lambda (#[b1])
-|   (let ((#[p1] (get-parser-buffer-pointer #[b1])))
-|     (and (match-parser-buffer-char #[b1] c1)
-|          (if (match-parser-buffer-char #[b1] c2)
-|              #t
-|              (begin
-|                (set-parser-buffer-pointer! #[b1] #[p1])
-|                #f)))))
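To see such a matcher procedure in action, it can be applied to a parser buffer built from a string. The following is a minimal sketch, assuming MIT/GNU Scheme with the parser option loaded via (load-option '*parser). The names match-two and try-match are hypothetical helpers introduced here for illustration; they are not part of the library.

```scheme
;; Assumption: the matcher language is provided by the *parser option.
(load-option '*parser)

;; Hypothetical example: c1 and c2 are ordinary Scheme variables.
(define match-two
  (let ((c1 #\a) (c2 #\b))
    (*matcher (seq (char c1) (char c2)))))

;; Hypothetical helper: run MATCHER on TEXT and report the boolean result
;; together with the next unconsumed character (#f if none remains).
(define (try-match matcher text)
  (let* ((buffer (string->parser-buffer text))
         (matched? (matcher buffer)))
    (list matched? (peek-parser-buffer-char buffer))))

(try-match match-two "abc")   ; match succeeds; the pointer now sits before #\c
(try-match match-two "acb")   ; match fails; the pointer is back at #\a
```

The second call illustrates the rule stated at the start of this section: a failed match leaves the internal pointer unchanged, even though the first character was matched along the way.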
Now that we have seen an example of the language, it's time to look at the detail. The *matcher special form is the interface between the matcher language and Scheme.
In the form (*matcher mexp), the operand mexp is an expression in the matcher language. The *matcher expression expands into Scheme code that implements a matcher procedure.
Here are the predefined matcher expressions. New matcher expressions can be defined using the macro facility (see Parser-language Macros). We will start with the primitive expressions.
The char, char-ci, not-char, and not-char-ci expressions match a given character. In each case, the expression operand is a Scheme expression that must evaluate to a character at run time. The `-ci' expressions do case-insensitive matching. The `not-' expressions match any character other than the given one.
The string and string-ci expressions match a given string. The expression operand is a Scheme expression that must evaluate to a string at run time. The string-ci expression does case-insensitive matching.
The char-set expression matches a single character that is a member of a given character set. The expression operand is a Scheme expression that must evaluate to a character set at run time.
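The character, string, and character-set primitives can be combined in a single matcher. The following is a small sketch, assuming MIT/GNU Scheme with the parser option loaded; match-let-prefix is a hypothetical name chosen for this illustration.

```scheme
;; Assumption: the matcher language is provided by the *parser option.
(load-option '*parser)

;; Hypothetical matcher: the word "let" in any letter case, then one
;; whitespace character, then any single character except a semicolon.
(define match-let-prefix
  (*matcher
   (seq (string-ci "let")
        (char-set char-set:whitespace)
        (not-char #\;))))

(match-let-prefix (string->parser-buffer "LET x"))   ; all three parts match
(match-let-prefix (string->parser-buffer "let;"))    ; fails: #\; is not whitespace
```

Each operand here is an ordinary Scheme expression evaluated at run time, so the literal string and the character set could equally be variables.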
The end-of-input expression is successful only when there are no more characters available to be matched.
The discard-matched expression always successfully matches the null string. However, it isn't meant to be used as a matching expression; it is used for its effect. discard-matched causes all of the buffered text prior to this point to be discarded (i.e. it calls discard-parser-buffer-head! on the parser buffer).
Note that discard-matched may not be used in certain places in a matcher expression. The reason for this is that it deliberately discards information needed for backtracking, so it may not be used in a place where subsequent backtracking will need to back over it. As a rule of thumb, use discard-matched only in the last operand of an alt expression (including any alt expressions in which it is indirectly contained).
In addition to the above primitive expressions, there are two convenient abbreviations. A character literal (e.g. `#\A') is a legal primitive expression, and is equivalent to a char expression with that literal as its operand (e.g. `(char #\A)'). Likewise, a string literal is equivalent to a string expression (e.g. `(string "abc")').
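The two abbreviations can be checked side by side. This is a minimal sketch, assuming MIT/GNU Scheme with the parser option loaded; match-long, match-short, and matches? are hypothetical names used only here.

```scheme
;; Assumption: the matcher language is provided by the *parser option.
(load-option '*parser)

;; The same pattern written with explicit primitives and with literals.
(define match-long  (*matcher (seq (char #\A) (string "abc"))))
(define match-short (*matcher (seq #\A "abc")))

;; Hypothetical helper: apply MATCHER to a fresh buffer over TEXT.
(define (matches? matcher text)
  (matcher (string->parser-buffer text)))

(matches? match-long "Aabc")    ; succeeds
(matches? match-short "Aabc")   ; succeeds in the same way
(matches? match-short "aabc")   ; fails: char matching is case-sensitive
```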
Next there are several combinator expressions. These closely correspond to similar combinators in regular expressions. Parameters named mexp are arbitrary expressions in the matcher language.
The seq expression matches each mexp operand in sequence. For example,
(seq (char-set char-set:alphabetic) (char-set char-set:numeric))
matches an alphabetic character followed by a numeric character, such as `H4'.
Note that if there are no mexp operands, the seq expression successfully matches the null string.
The alt expression attempts to match each mexp operand in order from left to right. The first one that successfully matches becomes the match for the entire alt expression.
The alt expression participates in backtracking. If one of the mexp operands matches, but the overall match in which this expression is embedded fails, the backtracking mechanism will cause the alt expression to try the remaining mexp operands. For example, if the expression
(seq (alt "ab" "a") "b")
is matched against the text `abc', the alt expression will initially match its first operand. But it will then fail to match the second operand of the seq expression. This will cause the alt to be restarted, at which time it will match `a', and the overall match will succeed.
Note that if there are no mexp operands, the alt match will always fail.
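The backtracking behavior described for alt can be observed directly. The following is a sketch under the assumption that MIT/GNU Scheme's parser option is loaded; match-ab-b and try-match are hypothetical names introduced for this illustration.

```scheme
;; Assumption: the matcher language is provided by the *parser option.
(load-option '*parser)

(define match-ab-b
  (*matcher (seq (alt "ab" "a") "b")))

;; Hypothetical helper: run MATCHER on TEXT and report the boolean result
;; together with the next unconsumed character (#f if none remains).
(define (try-match matcher text)
  (let* ((buffer (string->parser-buffer text))
         (matched? (matcher buffer)))
    (list matched? (peek-parser-buffer-char buffer))))

;; alt first takes "ab", the trailing "b" then fails on #\c, and alt is
;; retried with "a"; the overall match covers "ab" and stops before #\c.
(try-match match-ab-b "abc")

;; Neither alternative can be followed by "b" here, so the match fails
;; and the pointer stays at the start of the text.
(try-match match-ab-b "ac")
```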
The * expression matches zero or more occurrences of the mexp operand. (Consequently this match always succeeds.)
The * expression participates in backtracking; if it matches N occurrences of mexp, but the overall match fails, it will backtrack to N-1 occurrences and continue. If the overall match continues to fail, the * expression will continue to backtrack until there are no occurrences left.
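A short sketch of this backtracking, assuming MIT/GNU Scheme with the parser option loaded; match-as-then-ab and matches? are hypothetical names used only for this example.

```scheme
;; Assumption: the matcher language is provided by the *parser option.
(load-option '*parser)

;; Zero or more #\a characters, followed by the string "ab".
(define match-as-then-ab
  (*matcher (seq (* #\a) "ab")))

;; Hypothetical helper: apply MATCHER to a fresh buffer over TEXT.
(define (matches? matcher text)
  (matcher (string->parser-buffer text)))

;; The * first consumes all three #\a characters, which leaves only "b"
;; for the string "ab". Backing off to two occurrences lets "ab" match.
(matches? match-as-then-ab "aaab")

;; With zero occurrences available and no "ab" to follow, the match fails.
(matches? match-as-then-ab "b")
```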
The + expression matches one or more occurrences of the mexp operand. It is equivalent to
(seq mexp (* mexp))
The ? expression matches zero or one occurrence of the mexp operand. It is equivalent to
(alt mexp (seq))
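The repetition and option combinators are often used together. As an illustration, here is a sketch of a matcher for an optionally signed run of digits, assuming MIT/GNU Scheme with the parser option loaded; match-integer and matches? are hypothetical names.

```scheme
;; Assumption: the matcher language is provided by the *parser option.
(load-option '*parser)

;; Hypothetical matcher: an optional sign, then one or more digits.
(define match-integer
  (*matcher
   (seq (? (alt #\+ #\-))
        (+ (char-set char-set:numeric)))))

;; Hypothetical helper: apply MATCHER to a fresh buffer over TEXT.
(define (matches? matcher text)
  (matcher (string->parser-buffer text)))

(matches? match-integer "-42")   ; sign and digits both match
(matches? match-integer "7")     ; the optional sign matches zero times
(matches? match-integer "+")     ; fails: at least one digit is required
```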
The sexp expression allows arbitrary Scheme code to be embedded inside a matcher. The expression operand must evaluate to a matcher procedure at run time; the procedure is called to match the parser buffer. For example,
(*matcher (seq "a" (sexp parse-foo) "b"))
expands to
(lambda (#[b1])
  (let ((#[p1] (get-parser-buffer-pointer #[b1])))
    (and (match-parser-buffer-char #[b1] #\a)
         (if (parse-foo #[b1])
             (if (match-parser-buffer-char #[b1] #\b)
                 #t
                 (begin
                   (set-parser-buffer-pointer! #[b1] #[p1])
                   #f))
             (begin
               (set-parser-buffer-pointer! #[b1] #[p1])
               #f)))))
The case in which expression is a symbol is so common that it has an abbreviation: `(sexp symbol)' may be abbreviated as just symbol.
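As a sketch of the symbol abbreviation, a hand-written matcher procedure can be dropped into a matcher expression by name. This assumes MIT/GNU Scheme with the parser option loaded; match-vowel, match-bvt, and matches? are hypothetical names introduced for this example.

```scheme
;; Assumption: the matcher language is provided by the *parser option.
(load-option '*parser)

;; A hand-written matcher procedure: it takes a parser buffer and returns
;; a boolean, advancing the pointer only when it matches a vowel.
(define vowels (string->char-set "aeiou"))
(define (match-vowel buffer)
  (match-parser-buffer-char-in-set buffer vowels))

;; The bare symbol match-vowel stands for (sexp match-vowel).
(define match-bvt
  (*matcher (seq "b" match-vowel "t")))

;; Hypothetical helper: apply MATCHER to a fresh buffer over TEXT.
(define (matches? matcher text)
  (matcher (string->parser-buffer text)))

(matches? match-bvt "bit")   ; succeeds
(matches? match-bvt "bxt")   ; fails: #\x is not in the vowel set
```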
The with-pointer expression fetches the parser buffer's internal pointer (using get-parser-buffer-pointer), binds it to identifier, and then matches the pattern specified by mexp. Identifier must be a symbol.
This is meant to be used in conjunction with sexp, as a way to capture a pointer to a part of the input stream that is outside the sexp expression. An example of the use of with-pointer appears above (see with-pointer example).
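For a self-contained illustration, the following sketch captures the starting pointer and then uses an embedded sexp procedure to inspect the text matched so far. It assumes MIT/GNU Scheme with the parser option loaded, and it relies on get-parser-buffer-tail returning the characters between a saved pointer and the current position. The names keywords, match-keyword, and matches? are hypothetical.

```scheme
;; Assumption: the matcher language is provided by the *parser option.
(load-option '*parser)

;; Hypothetical keyword table for this example.
(define keywords '("if" "then" "else"))

;; Match a run of letters, then accept it only if the text between the
;; captured pointer START and the current position is a known keyword.
(define match-keyword
  (*matcher
   (with-pointer start
     (seq (+ (char-set char-set:alphabetic))
          (sexp (lambda (buffer)
                  (if (member (get-parser-buffer-tail buffer start) keywords)
                      #t
                      #f)))))))

;; Hypothetical helper: apply MATCHER to a fresh buffer over TEXT.
(define (matches? matcher text)
  (matcher (string->parser-buffer text)))

(matches? match-keyword "then")   ; succeeds: "then" is in the table
(matches? match-keyword "123")    ; fails: no alphabetic characters
```

The embedded procedure consumes no input; it only returns a boolean, which is enough to satisfy the matcher-procedure contract.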