A PEG parser takes a string as input and attempts to parse it as a given nonterminal. The key idea of the PEG implementation is that every nonterminal is just a function that takes a string as an argument and attempts to parse that string as its nonterminal. The functions always start from the beginning, but a parse is considered successful if there is material left over at the end.
This makes it easy to model different PEG parsing operations. For
instance, consider the PEG grammar
"ab", which could also be
(and "a" "b"). It matches the string “ab”. Here’s how
that might be implemented in the PEG style:
(define (match-and-a-b str) (match-a str) (match-b str))
As you can see, the use of functions provides an easy way to model
sequencing. In a similar way, one could model
(or a b) with
something like the following:
(define (match-or-a-b str) (or (match-a str) (match-b str)))
Here the semantics of a PEG
or expression map naturally onto
or operator. This function will attempt to run
(match-a str), and return its result if it succeeds. Otherwise it
Of course, the code above wouldn’t quite work. We need some way for the parsing functions to communicate. The actual interface used is below.
A parsing function takes three arguments - a string, the length of that string, and the position in that string it should start parsing at. In effect, the parsing functions pass around substrings in pieces - the first argument is a buffer of characters, and the second two give a range within that buffer that the parsing function should look at.
Parsing functions return either #f, if they failed to match their nonterminal, or a list whose first element must be an integer representing the final position in the string they matched and whose cdr can be any other data the function wishes to return, or ’() if it doesn’t have any more data.
The one caveat is that if the extra data it returns is a list, any
adjacent strings in that list will be appended by
instance, if a parsing function returns
(13 ("a" "b" "c")),
match-pattern will take
(13 ("abc")) as its value.
For example, here is a function to match “ab” using the actual interface.
(define (match-a-b str len pos) (and (<= (+ pos 2) len) (string= str "ab" pos (+ pos 2)) (list (+ pos 2) '()))) ; we return no extra information
The above function can be used to match a string by running
(match-pattern match-a-b "ab").
PEG expressions, such as those in a
define-peg-pattern form, are
interpreted internally in two steps.
First, any string PEG is expanded into an s-expression PEG by the code
(ice-9 peg string-peg) module.
Then, then s-expression PEG that results is compiled into a parsing
function by the
(ice-9 peg codegen) module. In particular, the
compile-peg-pattern is called on the s-expression. It then
decides what to do based on the form it is passed.
The PEG syntax can be expanded by providing
options for what to do with its forms. The extended syntax will be
associated with a symbol, for instance
my-parsing-form, and will
be called on all PEG expressions of the form
The parsing function should take two arguments. The first will be a
syntax object containing a list with all of the arguments to the form
(but not the form’s name), and the second will be the
capture-type argument that is passed to
New functions can be registered by calling
symbol function), where
symbol is the symbol that will indicate
a form of this type and
function is the code generating function
described above. The function
add-peg-compiler! is exported from
(ice-9 peg codegen) module.