Next: , Up: Repetition Operators


14.3.4.1 The Match-zero-or-more Operator (*)

This operator repeats the smallest possible preceding regular expression as many times as necessary (including zero) to match the pattern. ‘*’ represents this operator. For example, ‘o*’ matches any string made up of zero or more ‘o’s. Since this operator operates on the smallest preceding regular expression, ‘fo*’ has a repeating ‘o’, not a repeating ‘fo’. So, ‘fo*’ matches ‘f’, ‘fo’, ‘foo’, and so on.

Since the match-zero-or-more operator is a suffix operator, it may be useless as such when no regular expression precedes it. This is the case when it:

Three different things can happen in these cases:

  1. If the syntax bit RE_CONTEXT_INVALID_OPS is set, then the regular expression is invalid.
  2. If RE_CONTEXT_INVALID_OPS isn't set, but RE_CONTEXT_INDEP_OPS is, then ‘*’ represents the match-zero-or-more operator (which then operates on the empty string).
  3. Otherwise, ‘*’ is ordinary.

The matcher processes a match-zero-or-more operator by first matching as many repetitions of the smallest preceding regular expression as it can. Then it continues to match the rest of the pattern.

If it can't match the rest of the pattern, it backtracks (as many times as necessary), each time discarding one of the matches until it can either match the entire pattern or be certain that it cannot get a match. For example, when matching ‘ca*ar’ against ‘caaar’, the matcher first matches all three ‘a’s of the string with the ‘a*’ of the regular expression. However, it cannot then match the final ‘ar’ of the regular expression against the final ‘r’ of the string. So it backtracks, discarding the match of the last ‘a’ in the string. It can then match the remaining ‘ar’.