14.3.8 The Back-reference Operator (\digit)
If the syntax bit RE_NO_BK_REF isn't set, then Regex recognizes
back references. A back reference matches a specified preceding group.
The back reference operator is represented by ‘\digit’
anywhere after the end of a regular expression's digit-th
group (see Grouping Operators).
digit must be between ‘1’ and ‘9’. The matcher assigns
numbers 1 through 9 to the first nine groups it encounters. By using
one of ‘\1’ through ‘\9’ after the corresponding group's
close-group operator, you can match a substring identical to the
one that the group matched.
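The sketch below is a rough illustration only. It uses Python's `re` module as a stand-in for GNU Regex. That is an assumption: `re` is a different engine, although in its default syntax ‘(’, ‘)’ and ‘\1’ through ‘\9’ play the roles described here. The helper `matches_whole` is hypothetical. It only asks whether a pattern matches an entire string.

```python
import re


def matches_whole(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if `pattern` matches all of `text`.

    Python's re module stands in for GNU Regex here.
    """
    return re.fullmatch(pattern, text) is not None


# Groups are numbered in the order their open-group operators appear,
# so in '(x)(y)\2\1' group 1 is 'x' and group 2 is 'y'.
print(matches_whole(r"(x)(y)\2\1", "xyyx"))  # True: \2 repeats 'y', then \1 repeats 'x'
print(matches_whole(r"(x)(y)\2\1", "xyxy"))  # False: the copies appear in the wrong order
```

Keep one difference in mind. Python also accepts multi-digit references such as ‘\10’, whereas digit is limited to ‘1’ through ‘9’ here.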
Back references match according to the following (in all examples below,
‘(’ represents the open-group, ‘)’ the close-group, ‘{’
the open-interval and ‘}’ the close-interval operator):
- If the group matches a substring, the back reference matches an
identical substring. For example, ‘(a)\1’ matches ‘aa’ and
‘(bana)na\1bo\1’ matches ‘bananabanabobana’. Likewise,
‘(.*)\1’ matches any (newline-free if the syntax bit
RE_DOT_NEWLINE isn't set) string that is composed of two
identical halves; the ‘(.*)’ matches the first half and the
‘\1’ matches the second half.
- If the group matches more than once (as it might if followed
by, e.g., a repetition operator), then the back reference matches the
substring the group last matched. For example,
‘((a*)b)*\1\2’ matches ‘aabababa’: first, group 1 (the
outer one) matches ‘aab’ and group 2 (the inner one) matches
‘aa’. Then group 1 matches ‘ab’ and group 2 matches
‘a’. So, ‘\1’ matches ‘ab’ and ‘\2’ matches
‘a’.
- If the group doesn't participate in a match, i.e., it is part of an
alternative not taken or a repetition operator allows zero repetitions
of it, then the back reference makes the whole match fail. For example,
‘(one()|two())-and-(three\2|four\3)’ matches ‘one-and-three’
and ‘two-and-four’, but not ‘one-and-four’ or
‘two-and-three’. To see why, suppose the pattern has matched
‘one-and-’: its group 2 has matched the empty string, and its
group 3 has not participated in the match. If it then matches
‘four’, it tries to back reference group 3 (because ‘\3’ follows
‘four’ in the pattern), and the match fails because group 3 didn't
participate in the match.
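The three rules can be checked against the examples above with a short sketch. It assumes Python's `re` module as a stand-in for GNU Regex. `re.fullmatch` requires the pattern to cover the whole string, which mirrors the meaning of "matches" in these examples. Treat `re.DOTALL` only as a rough analogue of setting RE_DOT_NEWLINE.

```python
import re

# Rule 1: the back reference matches the same substring the group matched.
print(bool(re.fullmatch(r"(a)\1", "aa")))                         # True
print(bool(re.fullmatch(r"(bana)na\1bo\1", "bananabanabobana")))  # True
print(re.fullmatch(r"(.*)\1", "abcabc").group(1))                 # abc

# By default '.' does not match a newline; re.DOTALL lifts that,
# roughly as setting RE_DOT_NEWLINE does in GNU Regex.
print(bool(re.fullmatch(r"(.*)\1", "a\nba\nb")))                  # False
print(bool(re.fullmatch(r"(.*)\1", "a\nba\nb", re.DOTALL)))       # True

# Rule 2: after a repeated group, the reference uses the last match.
m = re.fullmatch(r"((a*)b)*\1\2", "aabababa")
print(m.group(1), m.group(2))                                     # ab a

# Rule 3: a reference to a group that did not participate fails.
pat = r"(one()|two())-and-(three\2|four\3)"
for text in ["one-and-three", "two-and-four", "one-and-four", "two-and-three"]:
    print(text, bool(re.fullmatch(pat, text)))
# one-and-three True, two-and-four True, one-and-four False, two-and-three False

# Zero repetitions of a group also leave it unset, so '\1' fails here.
print(re.fullmatch(r"(a)*b\1", "b"))                              # None
```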
You can use a back reference as an argument to a repetition operator. For
example, ‘(a(b))\2*’ matches ‘a’ followed by two or more
‘b’s. Similarly, ‘(a(b))\2{3}’ matches ‘abbbb’.
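The following minimal sketch applies repetition operators to a back reference. It assumes Python's `re` module as a stand-in for GNU Regex. The variable names `star` and `interval` are illustrative.

```python
import re

star = r"(a(b))\2*"        # 'ab', then zero or more further copies of group 2 ('b')
interval = r"(a(b))\2{3}"  # 'ab', then exactly three further copies of group 2

for text in ["a", "ab", "abb", "abbbb"]:
    print(text, bool(re.fullmatch(star, text)), bool(re.fullmatch(interval, text)))
# a False False
# ab True False
# abb True False
# abbbb True True
```

Note that ‘ab’ alone matches the starred pattern, because ‘\2*’ permits zero copies.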
If there is no preceding digit-th subexpression, the regular
expression is invalid.
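Python's `re` module behaves similarly. It rejects at compile time any pattern whose reference has no preceding, already-closed group. The sketch below illustrates this, with `re` standing in for GNU Regex, and the helper `compiles` is hypothetical. GNU Regex reports such errors through its own interface, which is not shown here.

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re accepts `pattern`."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(compiles(r"(a)\1"))  # True: group 1 is closed before the reference
print(compiles(r"(a)\2"))  # False: there is no group 2
print(compiles(r"\1(a)"))  # False: group 1 comes after the reference
print(compiles(r"(a\1)"))  # False: group 1 is still open at the reference
```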