
In addition to providing standard regular-expression support, MIT/GNU Scheme also provides the REXP abstraction. This is an alternative way to write regular expressions that is easier to read and understand than the standard notation. Regular expressions written in this notation can be translated into the standard notation.

The REXP abstraction is a set of combinators that are composed into a complete regular expression. Each combinator directly corresponds to a particular piece of regular-expression notation. For example, the expression `(rexp-any-char)` corresponds to the `.` character in standard regular-expression notation, while `(rexp* rexp)` corresponds to the `*` character.

The primary advantages of REXP are that it makes the nesting structure of regular expressions explicit, and that it simplifies the description of complex regular expressions by allowing them to be built up using straightforward combinators.
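As a rough illustration of how the combinators nest, the sketch below builds a pattern for a line that starts with `foo` or `bar` and continues with one or more word-constituent characters. The name `line-pattern` is hypothetical, and the example assumes an MIT/GNU Scheme release that provides the REXP procedures documented in this section.

```scheme
;; Sketch only: `line-pattern` is an illustrative name, not part of the API.
(define line-pattern
  (rexp-sequence
   (rexp-line-start)                 ; ^
   (rexp-alternatives "foo" "bar")   ; either literal string
   (rexp+ (rexp-word-char))          ; one or more \w
   (rexp-line-end)))                 ; $

;; Translate to standard notation to inspect what was built.
(define line-pattern-string (rexp->regexp line-pattern))
```

The nesting of the Scheme expression mirrors the nesting of the pattern, which is the readability advantage described above.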

— procedure: **rexp?**` object`

Returns `#t` if `object` is a REXP expression, or `#f` otherwise. A REXP is one of: a string, which represents the pattern matching that string; a character set, which represents the pattern matching a character in that set; or an object returned by calling one of the procedures defined here.
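The three kinds of REXP listed above can be checked with a short sketch. The variable names are hypothetical, and `string->char-set` is assumed to be available for building the character set.

```scheme
;; Each definition holds the result of applying rexp? to one kind of object.
(define string-is-rexp   (rexp? "abc"))                     ; a string
(define charset-is-rexp  (rexp? (string->char-set "abc")))  ; a character set
(define combined-is-rexp (rexp? (rexp-any-char)))           ; a combinator result
(define number-is-rexp   (rexp? 42))                        ; none of the above
```

According to the description, the first three should be true and the last false.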

— procedure: **rexp->regexp**` rexp`

Converts `rexp` to standard regular-expression notation, returning a newly-allocated string.

— procedure: **rexp-compile**` rexp`

Converts `rexp` to standard regular-expression notation, then compiles it and returns the compiled result. Equivalent to `(re-compile-pattern (rexp->regexp rexp) #f)`.
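The relationship between the two conversion procedures can be sketched as follows. The names `digits`, `digits-string`, `compiled-direct`, and `compiled-by-hand` are hypothetical, and `string->char-set` is assumed to be available.

```scheme
;; One or more decimal digits, using a character set as the REXP.
(define digits (rexp+ (string->char-set "0123456789")))

;; Standard-notation string for the same pattern.
(define digits-string (rexp->regexp digits))

;; Two routes to a compiled pattern; the description above says
;; they are equivalent.
(define compiled-direct  (rexp-compile digits))
(define compiled-by-hand (re-compile-pattern digits-string #f))
```

Using `rexp-compile` avoids the intermediate string when only the compiled pattern is needed.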

— procedure: **rexp-any-char**

Returns a REXP that matches any single character except a newline. This is equivalent to the `.` construct.

— procedure: **rexp-line-start**

Returns a REXP that matches the start of a line. This is equivalent to the `^` construct.

— procedure: **rexp-line-end**

Returns a REXP that matches the end of a line. This is equivalent to the `$` construct.

— procedure: **rexp-string-start**

Returns a REXP that matches the start of the text being matched. This is equivalent to the `` \` `` construct.

— procedure: **rexp-string-end**

Returns a REXP that matches the end of the text being matched. This is equivalent to the `\'` construct.

— procedure: **rexp-word-edge**

Returns a REXP that matches the start or end of a word. This is equivalent to the `\b` construct.

— procedure: **rexp-not-word-edge**

Returns a REXP that matches anywhere that is not the start or end of a word. This is equivalent to the `\B` construct.

— procedure: **rexp-word-start**

Returns a REXP that matches the start of a word. This is equivalent to the `\<` construct.

— procedure: **rexp-word-end**

Returns a REXP that matches the end of a word. This is equivalent to the `\>` construct.

— procedure: **rexp-word-char**

Returns a REXP that matches any word-constituent character. This is equivalent to the `\w` construct.

— procedure: **rexp-not-word-char**

Returns a REXP that matches any character that isn't a word constituent. This is equivalent to the `\W` construct.
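The zero-argument procedures above are typically combined with strings inside `rexp-sequence`. As a minimal sketch, the following builds a pattern for the word `scheme` appearing as a whole word, and another for a word-constituent character at the very start of the text. The names `whole-word` and `leading-word-char` are hypothetical.

```scheme
;; "scheme" delimited by word boundaries on both sides.
(define whole-word
  (rexp-sequence (rexp-word-start) "scheme" (rexp-word-end)))

;; A word-constituent character at the start of the text being matched.
(define leading-word-char
  (rexp-sequence (rexp-string-start) (rexp-word-char)))
```

Note that each of these procedures is called with no arguments to obtain its REXP.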

The next two procedures accept a `syntax-type` argument specifying the syntax class to be matched against. This argument is a symbol selected from the following list. Each symbol is followed by the equivalent character used in standard regular-expression notation. `whitespace` (space character), `punctuation` (`.`), `word` (`w`), `symbol` (`_`), `open` (`(`), `close` (`)`), `quote` (`'`), `string-delimiter` (`"`), `math-delimiter` (`$`), `escape` (`\`), `char-quote` (`/`), `comment-start` (`<`), `comment-end` (`>`).

— procedure: **rexp-syntax-char**` syntax-type`

Returns a REXP that matches any character of type `syntax-type`. This is equivalent to the `\s` construct.

— procedure: **rexp-not-syntax-char**` syntax-type`

Returns a REXP that matches any character not of type `syntax-type`. This is equivalent to the `\S` construct.
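A brief sketch of the two syntax-class procedures, using the `whitespace` symbol from the list above. The names `ws`, `non-ws`, and `ws-then-non-ws` are hypothetical.

```scheme
;; A character in the whitespace syntax class, and its complement.
(define ws     (rexp-syntax-char 'whitespace))
(define non-ws (rexp-not-syntax-char 'whitespace))

;; One whitespace character followed by one non-whitespace character.
(define ws-then-non-ws (rexp-sequence ws non-ws))
```

The `syntax-type` argument is passed as a quoted symbol, not as the single-character code used in standard notation.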

— procedure: **rexp-sequence**` rexp ...`

Returns a REXP that matches each `rexp` argument in sequence. If no `rexp` argument is supplied, the result matches the null string. This is equivalent to concatenating the regular expressions corresponding to each `rexp` argument.

— procedure: **rexp-alternatives**` rexp ...`

Returns a REXP that matches any of the `rexp` arguments. This is equivalent to concatenating the regular expressions corresponding to each `rexp` argument, separating them by the `\|` construct.

— procedure: **rexp-group**` rexp ...`

`rexp-group` is like `rexp-sequence`, except that the result is marked as a match group. This is equivalent to the `\(` ... `\)` construct.
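The three structural combinators can be combined as in the following sketch, which describes a `key=value` pair with the key and the value each marked as a match group. The name `key-value` and the particular keys are hypothetical.

```scheme
;; Group 1: the key, one of two literal alternatives.
;; Group 2: the value, one or more word-constituent characters.
(define key-value
  (rexp-sequence
   (rexp-group (rexp-alternatives "name" "id"))
   "="
   (rexp-group (rexp+ (rexp-word-char)))))

;; With no arguments, rexp-sequence yields a REXP for the null string.
(define empty-pattern (rexp-sequence))
```

Using `rexp-group` only where a match group is wanted keeps the group numbering of the resulting regular expression predictable.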

The next three procedures in principle accept a single REXP argument. For convenience, they accept multiple arguments, which are converted into a single argument by `rexp-group`. Note, however, that if only one REXP argument is supplied, and it's very simple, no grouping occurs.

— procedure: **rexp***` rexp ...`

Returns a REXP that matches zero or more instances of the pattern matched by the `rexp` arguments. This is equivalent to the `*` construct.

— procedure: **rexp+**` rexp ...`

Returns a REXP that matches one or more instances of the pattern matched by the `rexp` arguments. This is equivalent to the `+` construct.
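The single-argument and multiple-argument forms of the repetition procedures can be sketched as follows. All names are hypothetical.

```scheme
;; Single simple argument: repetition applies directly to it.
(define any-run (rexp* (rexp-any-char)))

;; Multiple arguments: they are first combined into one argument,
;; so the repetition applies to "a" followed by "b" as a unit.
(define ab-star (rexp* "a" "b"))   ; zero or more repetitions
(define ab-plus (rexp+ "a" "b"))   ; one or more repetitions

;; Inspect the translations to see where grouping was introduced.
(define ab-star-string (rexp->regexp ab-star))
```

Comparing the translated strings for the single-argument and multiple-argument cases is a quick way to see when grouping is introduced.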