

18.2.1 Syntax Bits

In any particular syntax for regular expressions, some characters are always special, others are sometimes special, and others are never special. The particular syntax that Regex recognizes for a given regular expression depends on the current syntax (as set by re_set_syntax) when the pattern buffer of that regular expression was compiled.

You get a pattern buffer by compiling a regular expression. See GNU Pattern Buffers, for more information on pattern buffers. See GNU Regular Expression Compiling, and BSD Regular Expression Compiling, for more information on compiling.

Regex considers the current syntax to be a collection of bits; we refer to these bits as syntax bits. In most cases, they affect which characters represent which operators. We describe the meanings of the operators to which we refer in Common Operators and GNU Operators.
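As a minimal sketch of the point that the syntax is fixed when the pattern buffer is compiled, the following assumes the GNU regex interface exposed by glibc's <regex.h> (which needs _GNU_SOURCE). The helper name match_after_switch and the -3 return value are hypothetical and serve only this illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes the GNU regex API in glibc's <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN while COMPILE_SYNTAX is current,
   make LATER_SYNTAX current, and only then match at the start of TEXT.
   Returns the match length, -1 for no match, or -3 if compilation fails. */
static int match_after_switch(reg_syntax_t compile_syntax,
                              reg_syntax_t later_syntax,
                              const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *err;
    int n;

    assert(pattern != NULL && text != NULL);
    memset(&buf, 0, sizeof buf);             /* let Regex allocate the buffer */
    saved = re_set_syntax(compile_syntax);   /* returns the previous syntax */
    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(later_syntax);             /* too late to affect buf */
    if (err != NULL) {
        re_set_syntax(saved);
        return -3;
    }
    n = (int) re_match(&buf, text, (regoff_t) strlen(text), 0, NULL);
    regfree(&buf);
    re_set_syntax(saved);                    /* restore the global syntax */
    return n;
}
```

Because a syntax is just a set of bits, a custom one can be built with bitwise OR: match_after_switch (RE_NO_BK_PARENS | RE_NO_BK_VBAR, 0, "(a|b)c", "bc") should return 2, whereas compiling under syntax 0 leaves the same pattern a string of ordinary characters, whatever syntax is set afterwards.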

For reference, here is the complete list of syntax bits, in alphabetical order:

RE_BACKSLASH_ESCAPE_IN_LISTS

If this bit is set, then ‘\’ inside a list (see List Operators ([] and [^])) quotes (makes ordinary, if it’s special) the following character; if this bit isn’t set, then ‘\’ is an ordinary character inside lists. (See The Backslash Character, for what ‘\’ does outside of lists.)
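The effect of this bit can be sketched as follows, assuming glibc's <regex.h> with _GNU_SOURCE. The helper match_len and the COMPILE_FAILED sentinel are hypothetical names introduced for the illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes the GNU regex API in glibc's <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

enum { COMPILE_FAILED = -3 };   /* hypothetical; distinct from re_match's -1/-2 */

/* Hypothetical helper: compile PATTERN under SYNTAX and return the length of
   the match anchored at the start of TEXT, -1 if there is no match, or
   COMPILE_FAILED if PATTERN is invalid under SYNTAX. */
static int match_len(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *err;
    int n;

    assert(pattern != NULL && text != NULL);
    memset(&buf, 0, sizeof buf);        /* let Regex allocate the buffer */
    saved = re_set_syntax(syntax);      /* the syntax is read at compile time */
    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);               /* restore the previous syntax */
    if (err != NULL)
        return COMPILE_FAILED;
    n = (int) re_match(&buf, text, (regoff_t) strlen(text), 0, NULL);
    regfree(&buf);
    return n;
}
```

The C string "[\\]]" is the regular expression ‘[\]]’. With RE_BACKSLASH_ESCAPE_IN_LISTS it is a list containing only ‘]’; without the bit it is a list containing ‘\’ followed by an ordinary ‘]’, so it matches the two-character string ‘\]’ instead.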

RE_BK_PLUS_QM

If this bit is set, then ‘\+’ represents the match-one-or-more operator and ‘\?’ represents the match-zero-or-one operator; if this bit isn’t set, then ‘+’ represents the match-one-or-more operator and ‘?’ represents the match-zero-or-one operator. This bit is irrelevant if RE_LIMITED_OPS is set.
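A short sketch of how RE_BK_PLUS_QM moves the two operators between the plain and the backslashed characters, assuming glibc's <regex.h> with _GNU_SOURCE; match_len and COMPILE_FAILED are hypothetical names used only here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes the GNU regex API in glibc's <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

enum { COMPILE_FAILED = -3 };   /* hypothetical; distinct from re_match's -1/-2 */

/* Hypothetical helper: compile PATTERN under SYNTAX and return the length of
   the match anchored at the start of TEXT, -1 if there is no match, or
   COMPILE_FAILED if PATTERN is invalid under SYNTAX. */
static int match_len(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *err;
    int n;

    assert(pattern != NULL && text != NULL);
    memset(&buf, 0, sizeof buf);        /* let Regex allocate the buffer */
    saved = re_set_syntax(syntax);      /* the syntax is read at compile time */
    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);               /* restore the previous syntax */
    if (err != NULL)
        return COMPILE_FAILED;
    n = (int) re_match(&buf, text, (regoff_t) strlen(text), 0, NULL);
    regfree(&buf);
    return n;
}
```

With the bit unset, match_len (0, "ab+", "abbb") should return 4. With the bit set, the same pattern is three ordinary characters and ‘ab\+’ is the repetition. The call match_len (RE_BK_PLUS_QM, "ab\\?", "abb") should return 2, since ‘\?’ matches at most one ‘b’.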

RE_CHAR_CLASSES

If this bit is set, then you can use character classes in lists; if this bit isn’t set, then you can’t.
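The following sketch shows one way to observe RE_CHAR_CLASSES, assuming glibc's <regex.h> with _GNU_SOURCE. The helper match_len and the COMPILE_FAILED sentinel are hypothetical, and the behavior described for the unset case is what glibc appears to do rather than something this manual guarantees.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes the GNU regex API in glibc's <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

enum { COMPILE_FAILED = -3 };   /* hypothetical; distinct from re_match's -1/-2 */

/* Hypothetical helper: compile PATTERN under SYNTAX and return the length of
   the match anchored at the start of TEXT, -1 if there is no match, or
   COMPILE_FAILED if PATTERN is invalid under SYNTAX. */
static int match_len(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *err;
    int n;

    assert(pattern != NULL && text != NULL);
    memset(&buf, 0, sizeof buf);        /* let Regex allocate the buffer */
    saved = re_set_syntax(syntax);      /* the syntax is read at compile time */
    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);               /* restore the previous syntax */
    if (err != NULL)
        return COMPILE_FAILED;
    n = (int) re_match(&buf, text, (regoff_t) strlen(text), 0, NULL);
    regfree(&buf);
    return n;
}
```

With the bit set, ‘[[:digit:]]+’ matches a run of digits. With it unset, ‘[[:digit:]]’ no longer names a class: the list ends at the first ‘]’ and holds the ordinary characters ‘[’, ‘:’, ‘d’, ‘i’, ‘g’ and ‘t’, and the final ‘]’ is ordinary too.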

RE_CONTEXT_INDEP_ANCHORS

If this bit is set, then ‘^’ and ‘$’ are special anywhere outside a list; if this bit isn’t set, then these characters are special only in certain contexts. See The Match-beginning-of-line Operator (^), and The Match-end-of-line Operator ($).

RE_CONTEXT_INDEP_OPS

If this bit is set, then certain characters are special anywhere outside a list; if this bit isn’t set, then those characters are special only in some contexts and are ordinary elsewhere. Specifically, if this bit isn’t set then ‘*’, and (if the syntax bit RE_LIMITED_OPS isn’t set) ‘+’ and ‘?’ (or ‘\+’ and ‘\?’, depending on the syntax bit RE_BK_PLUS_QM) represent repetition operators only if they’re not first in a regular expression or just after an open-group or alternation operator. The same holds for ‘{’ (or ‘\{’, depending on the syntax bit RE_NO_BK_BRACES) if it is the beginning of a valid interval and the syntax bit RE_INTERVALS is set.
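The context-dependent case (bit unset) can be sketched like this, assuming glibc's <regex.h> with _GNU_SOURCE. The names match_len and COMPILE_FAILED are hypothetical. Only the unset behavior is exercised, since that is the case spelled out above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes the GNU regex API in glibc's <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

enum { COMPILE_FAILED = -3 };   /* hypothetical; distinct from re_match's -1/-2 */

/* Hypothetical helper: compile PATTERN under SYNTAX and return the length of
   the match anchored at the start of TEXT, -1 if there is no match, or
   COMPILE_FAILED if PATTERN is invalid under SYNTAX. */
static int match_len(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *err;
    int n;

    assert(pattern != NULL && text != NULL);
    memset(&buf, 0, sizeof buf);        /* let Regex allocate the buffer */
    saved = re_set_syntax(syntax);      /* the syntax is read at compile time */
    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);               /* restore the previous syntax */
    if (err != NULL)
        return COMPILE_FAILED;
    n = (int) re_match(&buf, text, (regoff_t) strlen(text), 0, NULL);
    regfree(&buf);
    return n;
}
```

Under syntax 0, in ‘*ab*’ the first ‘*’ is ordinary because it is first in the regular expression, while the second is a repetition operator, so match_len (0, "*ab*", "*abbb") should return 5. The same holds for a ‘*’ just after ‘\(’ or ‘\|’.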

RE_CONTEXT_INVALID_DUP

If this bit is set, then an open-interval operator cannot occur at the start of a regular expression, or immediately after an alternation, open-group or close-interval operator.

RE_CONTEXT_INVALID_OPS

If this bit is set, then repetition and alternation operators can’t be in certain positions within a regular expression. Specifically, the regular expression is invalid if it has:

  • a repetition operator first in the regular expression or just after a match-beginning-of-line, open-group, or alternation operator; or
  • an alternation operator first or last in the regular expression, just before a match-end-of-line operator, or just after an alternation or open-group operator.

If this bit isn’t set, then you can put the characters representing the repetition and alternation operators anywhere in a regular expression. Whether or not they will in fact be operators in certain positions depends on other syntax bits.
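As a sketch of the repetition-operator rule, the code below assumes glibc's <regex.h> with _GNU_SOURCE and uses the hypothetical names match_len and COMPILE_FAILED. It checks only the repetition case; whether a given implementation also rejects misplaced alternation operators is not assumed here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes the GNU regex API in glibc's <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

enum { COMPILE_FAILED = -3 };   /* hypothetical; distinct from re_match's -1/-2 */

/* Hypothetical helper: compile PATTERN under SYNTAX and return the length of
   the match anchored at the start of TEXT, -1 if there is no match, or
   COMPILE_FAILED if PATTERN is invalid under SYNTAX. */
static int match_len(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *err;
    int n;

    assert(pattern != NULL && text != NULL);
    memset(&buf, 0, sizeof buf);        /* let Regex allocate the buffer */
    saved = re_set_syntax(syntax);      /* the syntax is read at compile time */
    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);               /* restore the previous syntax */
    if (err != NULL)
        return COMPILE_FAILED;
    n = (int) re_match(&buf, text, (regoff_t) strlen(text), 0, NULL);
    regfree(&buf);
    return n;
}
```

With RE_CONTEXT_INVALID_OPS alone, ‘*a’ and ‘\(*a\)’ should fail to compile, while ‘a*’ is still valid. With syntax 0, ‘*a’ compiles and the leading ‘*’ is ordinary.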

RE_DEBUG

If this bit is set, and the regex library was compiled with -DDEBUG, then internal debugging is turned on; if unset, then it is turned off.

RE_DOT_NEWLINE

If this bit is set, then the match-any-character operator matches a newline; if this bit isn’t set, then it doesn’t.

RE_DOT_NOT_NULL

If this bit is set, then the match-any-character operator doesn’t match a null character; if this bit isn’t set, then it does.
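RE_DOT_NEWLINE and RE_DOT_NOT_NULL can be tried together in a sketch like the one below, which assumes glibc's <regex.h> with _GNU_SOURCE. The helper match_len_n is a hypothetical name; it takes an explicit text length so that the text may contain a null character.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes the GNU regex API in glibc's <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

enum { COMPILE_FAILED = -3 };   /* hypothetical; distinct from re_match's -1/-2 */

/* Hypothetical helper: compile PATTERN under SYNTAX and return the length of
   the match anchored at the start of the TEXT_LEN bytes at TEXT, -1 if there
   is no match, or COMPILE_FAILED if PATTERN is invalid under SYNTAX. */
static int match_len_n(reg_syntax_t syntax, const char *pattern,
                       const char *text, int text_len)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *err;
    int n;

    assert(pattern != NULL && text != NULL && text_len >= 0);
    memset(&buf, 0, sizeof buf);        /* let Regex allocate the buffer */
    saved = re_set_syntax(syntax);      /* the syntax is read at compile time */
    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);               /* restore the previous syntax */
    if (err != NULL)
        return COMPILE_FAILED;
    n = (int) re_match(&buf, text, (regoff_t) text_len, 0, NULL);
    regfree(&buf);
    return n;
}
```

For the three bytes ‘a’, newline, ‘b’, the pattern ‘a.b’ should match only when RE_DOT_NEWLINE is set. For ‘a’, null, ‘b’, it should match unless RE_DOT_NOT_NULL is set.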

RE_HAT_LISTS_NOT_NEWLINE

If this bit is set, nonmatching lists ‘[^...]’ do not match newline; if not set, they do.

RE_ICASE

If this bit is set, then ignore case when matching; otherwise, case is significant.
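A minimal sketch of RE_ICASE, assuming glibc's <regex.h> with _GNU_SOURCE and the default "C" locale; match_len and COMPILE_FAILED are hypothetical names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes the GNU regex API in glibc's <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

enum { COMPILE_FAILED = -3 };   /* hypothetical; distinct from re_match's -1/-2 */

/* Hypothetical helper: compile PATTERN under SYNTAX and return the length of
   the match anchored at the start of TEXT, -1 if there is no match, or
   COMPILE_FAILED if PATTERN is invalid under SYNTAX. */
static int match_len(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *err;
    int n;

    assert(pattern != NULL && text != NULL);
    memset(&buf, 0, sizeof buf);        /* let Regex allocate the buffer */
    saved = re_set_syntax(syntax);      /* the syntax is read at compile time */
    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);               /* restore the previous syntax */
    if (err != NULL)
        return COMPILE_FAILED;
    n = (int) re_match(&buf, text, (regoff_t) strlen(text), 0, NULL);
    regfree(&buf);
    return n;
}
```

Here match_len (RE_ICASE, "hello", "HeLLo") should return 5, and the same call with syntax 0 should return -1.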

RE_INTERVALS

If this bit is set, then Regex recognizes interval operators; if this bit isn’t set, then it doesn’t.

RE_INVALID_INTERVAL_ORD

If this bit is set, a syntactically invalid interval is treated as a string of ordinary characters. For example, the extended regular expression ‘a{1’ is treated as ‘a\{1’.
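The ‘a{1’ example can be checked with a sketch such as the following, assuming glibc's <regex.h> with _GNU_SOURCE. The names match_len and COMPILE_FAILED are hypothetical. The syntax also sets RE_INTERVALS and RE_NO_BK_BRACES so that an unescaped ‘{’ opens an interval.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes the GNU regex API in glibc's <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

enum { COMPILE_FAILED = -3 };   /* hypothetical; distinct from re_match's -1/-2 */

/* Hypothetical helper: compile PATTERN under SYNTAX and return the length of
   the match anchored at the start of TEXT, -1 if there is no match, or
   COMPILE_FAILED if PATTERN is invalid under SYNTAX. */
static int match_len(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *err;
    int n;

    assert(pattern != NULL && text != NULL);
    memset(&buf, 0, sizeof buf);        /* let Regex allocate the buffer */
    saved = re_set_syntax(syntax);      /* the syntax is read at compile time */
    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);               /* restore the previous syntax */
    if (err != NULL)
        return COMPILE_FAILED;
    n = (int) re_match(&buf, text, (regoff_t) strlen(text), 0, NULL);
    regfree(&buf);
    return n;
}
```

A valid interval such as ‘a{2}’ behaves the same with or without RE_INVALID_INTERVAL_ORD. The unterminated ‘a{1’ should be rejected without the bit and should match the literal text ‘a{1’ with it.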

RE_LIMITED_OPS

If this bit is set, then Regex doesn’t recognize the match-one-or-more, match-zero-or-one or alternation operators; if this bit isn’t set, then it does.

RE_NEWLINE_ALT

If this bit is set, then newline represents the alternation operator; if this bit isn’t set, then newline is ordinary.

RE_NO_BK_BRACES

If this bit is set, then ‘{’ represents the open-interval operator and ‘}’ represents the close-interval operator; if this bit isn’t set, then ‘\{’ represents the open-interval operator and ‘\}’ represents the close-interval operator. This bit is relevant only if RE_INTERVALS is set.

RE_NO_BK_PARENS

If this bit is set, then ‘(’ represents the open-group operator and ‘)’ represents the close-group operator; if this bit isn’t set, then ‘\(’ represents the open-group operator and ‘\)’ represents the close-group operator.
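To illustrate how RE_NO_BK_PARENS swaps the roles of ‘(’ and ‘\(’, here is a sketch that assumes glibc's <regex.h> with _GNU_SOURCE; match_len and COMPILE_FAILED are hypothetical names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes the GNU regex API in glibc's <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

enum { COMPILE_FAILED = -3 };   /* hypothetical; distinct from re_match's -1/-2 */

/* Hypothetical helper: compile PATTERN under SYNTAX and return the length of
   the match anchored at the start of TEXT, -1 if there is no match, or
   COMPILE_FAILED if PATTERN is invalid under SYNTAX. */
static int match_len(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *err;
    int n;

    assert(pattern != NULL && text != NULL);
    memset(&buf, 0, sizeof buf);        /* let Regex allocate the buffer */
    saved = re_set_syntax(syntax);      /* the syntax is read at compile time */
    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);               /* restore the previous syntax */
    if (err != NULL)
        return COMPILE_FAILED;
    n = (int) re_match(&buf, text, (regoff_t) strlen(text), 0, NULL);
    regfree(&buf);
    return n;
}
```

With the bit set, ‘(ab)*’ repeats the group, so it should match ‘abab’ at the start of ‘ababx’. With syntax 0 the equivalent pattern is ‘\(ab\)*’, and a bare ‘(ab)’ matches those four characters literally.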

RE_NO_BK_REFS

If this bit is set, then Regex doesn’t recognize ‘\digit’ as the back-reference operator; if this bit isn’t set, then it does.
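The sketch below compares the two settings of RE_NO_BK_REFS, assuming glibc's <regex.h> with _GNU_SOURCE; match_len and COMPILE_FAILED are hypothetical names. That the escaped digit matches itself when back-references are disabled is what glibc appears to do, and is an assumption as far as other implementations are concerned.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes the GNU regex API in glibc's <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

enum { COMPILE_FAILED = -3 };   /* hypothetical; distinct from re_match's -1/-2 */

/* Hypothetical helper: compile PATTERN under SYNTAX and return the length of
   the match anchored at the start of TEXT, -1 if there is no match, or
   COMPILE_FAILED if PATTERN is invalid under SYNTAX. */
static int match_len(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *err;
    int n;

    assert(pattern != NULL && text != NULL);
    memset(&buf, 0, sizeof buf);        /* let Regex allocate the buffer */
    saved = re_set_syntax(syntax);      /* the syntax is read at compile time */
    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);               /* restore the previous syntax */
    if (err != NULL)
        return COMPILE_FAILED;
    n = (int) re_match(&buf, text, (regoff_t) strlen(text), 0, NULL);
    regfree(&buf);
    return n;
}
```

Under syntax 0, ‘\(a\)\1’ should match ‘aa’, because ‘\1’ refers back to the first group. With RE_NO_BK_REFS set it should no longer match ‘aa’.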

RE_NO_BK_VBAR

If this bit is set, then ‘|’ represents the alternation operator; if this bit isn’t set, then ‘\|’ represents the alternation operator. This bit is irrelevant if RE_LIMITED_OPS is set.
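The interaction between RE_NO_BK_VBAR and RE_LIMITED_OPS can be sketched as follows, assuming glibc's <regex.h> with _GNU_SOURCE; match_len and COMPILE_FAILED are hypothetical names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes the GNU regex API in glibc's <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

enum { COMPILE_FAILED = -3 };   /* hypothetical; distinct from re_match's -1/-2 */

/* Hypothetical helper: compile PATTERN under SYNTAX and return the length of
   the match anchored at the start of TEXT, -1 if there is no match, or
   COMPILE_FAILED if PATTERN is invalid under SYNTAX. */
static int match_len(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *err;
    int n;

    assert(pattern != NULL && text != NULL);
    memset(&buf, 0, sizeof buf);        /* let Regex allocate the buffer */
    saved = re_set_syntax(syntax);      /* the syntax is read at compile time */
    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);               /* restore the previous syntax */
    if (err != NULL)
        return COMPILE_FAILED;
    n = (int) re_match(&buf, text, (regoff_t) strlen(text), 0, NULL);
    regfree(&buf);
    return n;
}
```

With RE_NO_BK_VBAR set, ‘cat|dog’ should match ‘dog’; with syntax 0 the alternation is written ‘cat\|dog’. Adding RE_LIMITED_OPS removes the operator altogether, so ‘cat|dog’ then matches only that literal seven-character text.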

RE_NO_EMPTY_RANGES

If this bit is set, then a regular expression with a range whose ending point collates lower than its starting point is invalid; if this bit isn’t set, then Regex considers such a range to be empty.
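A reversed range such as ‘z-a’ shows the difference. The sketch assumes glibc's <regex.h> with _GNU_SOURCE and the default "C" locale; match_len and COMPILE_FAILED are hypothetical names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes the GNU regex API in glibc's <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

enum { COMPILE_FAILED = -3 };   /* hypothetical; distinct from re_match's -1/-2 */

/* Hypothetical helper: compile PATTERN under SYNTAX and return the length of
   the match anchored at the start of TEXT, -1 if there is no match, or
   COMPILE_FAILED if PATTERN is invalid under SYNTAX. */
static int match_len(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *err;
    int n;

    assert(pattern != NULL && text != NULL);
    memset(&buf, 0, sizeof buf);        /* let Regex allocate the buffer */
    saved = re_set_syntax(syntax);      /* the syntax is read at compile time */
    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);               /* restore the previous syntax */
    if (err != NULL)
        return COMPILE_FAILED;
    n = (int) re_match(&buf, text, (regoff_t) strlen(text), 0, NULL);
    regfree(&buf);
    return n;
}
```

Under syntax 0 the list ‘[xz-a]’ should behave like ‘[x]’, because the range contributes nothing. With RE_NO_EMPTY_RANGES the pattern should fail to compile.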

RE_NO_GNU_OPS

If this bit is set, GNU regex operators are not recognized; otherwise, they are.

RE_NO_POSIX_BACKTRACKING

If this bit is set, then Regex succeeds as soon as it matches the whole pattern, without further backtracking. As a result, a match may not be the leftmost longest; see What Gets Matched?, for an explanation of leftmost-longest matching.

RE_NO_SUB

If this bit is set, then no_sub will be set to one during re_compile_pattern. This causes matching and searching routines not to record substring match information.
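The effect on the pattern buffer can be inspected directly. The sketch assumes glibc's <regex.h> with _GNU_SOURCE, where the no_sub field of struct re_pattern_buffer is visible under that name; the helper compiled_no_sub is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes the GNU regex API in glibc's <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN under SYNTAX and return the value of
   the buffer's no_sub field afterwards, or -1 if compilation fails. */
static int compiled_no_sub(reg_syntax_t syntax, const char *pattern)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *err;
    int flag;

    assert(pattern != NULL);
    memset(&buf, 0, sizeof buf);        /* let Regex allocate the buffer */
    saved = re_set_syntax(syntax);      /* the syntax is read at compile time */
    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);               /* restore the previous syntax */
    if (err != NULL)
        return -1;
    flag = buf.no_sub;                  /* set by re_compile_pattern */
    regfree(&buf);
    return flag;
}
```

For a pattern with a group, such as ‘a\(b\)c’, compiled_no_sub should return 1 when RE_NO_SUB is in the syntax and 0 otherwise.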

RE_UNMATCHED_RIGHT_PAREN_ORD

If this bit is set and the regular expression has no matching open-group operator, then Regex considers what would otherwise be a close-group operator (based on how RE_NO_BK_PARENS is set) to match ‘)’.
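As a sketch of the unmatched close-group case, assuming glibc's <regex.h> with _GNU_SOURCE; match_len and COMPILE_FAILED are hypothetical names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes the GNU regex API in glibc's <regex.h> */
#endif
#include <assert.h>
#include <regex.h>
#include <string.h>

enum { COMPILE_FAILED = -3 };   /* hypothetical; distinct from re_match's -1/-2 */

/* Hypothetical helper: compile PATTERN under SYNTAX and return the length of
   the match anchored at the start of TEXT, -1 if there is no match, or
   COMPILE_FAILED if PATTERN is invalid under SYNTAX. */
static int match_len(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    reg_syntax_t saved;
    const char *err;
    int n;

    assert(pattern != NULL && text != NULL);
    memset(&buf, 0, sizeof buf);        /* let Regex allocate the buffer */
    saved = re_set_syntax(syntax);      /* the syntax is read at compile time */
    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);               /* restore the previous syntax */
    if (err != NULL)
        return COMPILE_FAILED;
    n = (int) re_match(&buf, text, (regoff_t) strlen(text), 0, NULL);
    regfree(&buf);
    return n;
}
```

With RE_NO_BK_PARENS alone, ‘a)’ should be rejected because the ‘)’ has no open-group operator to close. Adding RE_UNMATCHED_RIGHT_PAREN_ORD makes it match the text ‘a)’, while properly paired groups keep working.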

