

18.3.8 The Back-reference Operator (\digit)

If the syntax bit RE_NO_BK_REFS isn’t set, then Regex recognizes back-references. A back-reference matches the same text that a specified preceding group matched. The back-reference operator is represented by ‘\digit’ anywhere after the end of a regular expression’s digit-th group (see Grouping Operators (() or \(\))).

digit must be between ‘1’ and ‘9’. The matcher assigns numbers 1 through 9 to the first nine groups it encounters. By using one of ‘\1’ through ‘\9’ after the corresponding group’s close-group operator, you can match a substring identical to the one that the group matched.
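As an illustration, here is a minimal sketch. It assumes a POSIX system whose regcomp and regexec are provided by GNU Regex, as in the GNU C library, and it uses POSIX basic syntax. In that syntax the open-group and close-group operators are written ‘\(’ and ‘\)’, and each backslash is doubled inside a C string literal. The helper name bre_full_match is hypothetical. Wrapping the pattern in ‘^’ and ‘$’ is a choice made here so that only whole-string matches count.

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if the POSIX basic regular expression
   PATTERN matches the whole of SUBJECT, 0 if it does not, and -1 if the
   pattern cannot be compiled or some other error occurs.  */
int bre_full_match(const char *pattern, const char *subject)
{
    /* Wrap the pattern in ^...$ so that only whole-string matches count.  */
    size_t size = strlen(pattern) + 3;   /* '^', '$' and the NUL byte */
    char *anchored = malloc(size);
    if (anchored == NULL)
        return -1;
    snprintf(anchored, size, "^%s$", pattern);

    regex_t re;
    /* cflags == 0 selects POSIX basic syntax, where groups are \( \)
       and \1 through \9 are back-references.  */
    int rc = regcomp(&re, anchored, 0);
    free(anchored);
    if (rc != 0)
        return -1;

    /* Only success or failure is needed, so no submatches are requested.  */
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

With the pattern ‘\(ab*\)x\1’, the subject ‘abbxabb’ matches because ‘\1’ must repeat exactly the ‘abb’ that the group matched. The subject ‘abbxab’ does not match.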

Back-references match according to the following (in all examples below, ‘(’ represents the open-group, ‘)’ the close-group, ‘{’ the open-interval and ‘}’ the close-interval operator):

You can use a back-reference as an argument to a repetition operator. For example, ‘(a(b))\2*’ matches ‘a’ followed by one or more ‘b’s: the group itself matches ‘ab’, and ‘\2*’ adds zero or more further ‘b’s. Similarly, ‘(a(b))\2{3}’ matches ‘abbbb’.
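The two repetition examples can be checked with a small sketch. It makes the same assumptions as before: POSIX regcomp and regexec backed by GNU Regex, and basic syntax, in which the examples are written ‘\(a\(b\)\)\2*’ and ‘\(a\(b\)\)\2\{3\}’. The hypothetical helper bre_prefix_match_len reports how long the match starting at the beginning of the subject is.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the length of the match of the POSIX basic
   regular expression PATTERN that starts at the beginning of SUBJECT,
   or -1 if there is no such match or PATTERN does not compile.  */
long bre_prefix_match_len(const char *pattern, const char *subject)
{
    /* Prefix '^' so that the match must start at offset 0.  */
    size_t size = strlen(pattern) + 2;   /* '^' and the NUL byte */
    char *anchored = malloc(size);
    if (anchored == NULL)
        return -1;
    anchored[0] = '^';
    memcpy(anchored + 1, pattern, size - 1);   /* also copies the NUL */

    regex_t re;
    int rc = regcomp(&re, anchored, 0);   /* 0: POSIX basic syntax */
    free(anchored);
    if (rc != 0)
        return -1;

    regmatch_t m[1];                      /* m[0] describes the whole match */
    rc = regexec(&re, subject, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;
    /* POSIX asks for the longest match at the leftmost starting point.  */
    return (long)(m[0].rm_eo - m[0].rm_so);
}
```

Against ‘ab’, the ‘\2*’ pattern matches 2 characters, because zero repetitions of the back-reference are allowed. The ‘\2\{3\}’ pattern needs exactly the five characters ‘abbbb’ and fails on the shorter ‘abbb’.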

If there is no preceding digit-th subexpression, the regular expression is invalid.
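Through the POSIX interface, an invalid back-reference is reported at compile time. POSIX defines the regcomp error code REG_ESUBREG for an invalid number in ‘\digit’. The sketch below again assumes basic syntax and uses a hypothetical helper, bre_compile_status, that returns regcomp’s result and can also fetch the matching message with regerror.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN as a POSIX basic regular expression
   and return regcomp's result code (0 on success).  If MSG is not NULL, a
   readable description of that code is stored there via regerror.  */
int bre_compile_status(const char *pattern, char *msg, size_t msgsize)
{
    regex_t re;
    int rc = regcomp(&re, pattern, 0);     /* 0: POSIX basic syntax */
    if (msg != NULL && msgsize > 0)
        regerror(rc, &re, msg, msgsize);   /* describe rc in words */
    if (rc == 0)
        regfree(&re);                      /* free only a compiled pattern */
    return rc;
}
```

Here ‘\(a\)\1’ compiles, while ‘\(a\)\2’ is rejected with REG_ESUBREG because the pattern has no second group.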

Back-references can greatly slow down matching, as they can generate exponentially many matching possibilities that can consume both time and memory to explore. Also, the POSIX specification for back-references is at times unclear. Furthermore, many regular expression implementations have back-reference bugs that can cause programs to return incorrect answers or even crash, and fixing these bugs has often been low-priority: for example, as of 2020 the GNU C library bug database contained back-reference bugs 52, 10844, 11053, 24269 and 25322, with little sign of forthcoming fixes. Luckily, back-references are rarely useful and it should be little trouble to avoid them in practical applications.

