3.1 Fundamental Structure

In regular expressions, the characters ‘.?*+{|()[\^$’ are special characters and have uses described below. All other characters are ordinary characters, and each ordinary character is a regular expression that matches itself.

The period ‘.’ matches any single character. It is unspecified whether ‘.’ matches an encoding error.
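As a minimal sketch of ordinary characters and the period, the following assumes GNU grep invoked with -E (extended syntax) and -x (select only lines matched in full). The variable name dot_matches and the sample words are illustrative only.

```shell
# 'c' and 't' are ordinary characters and match themselves;
# '.' matches any single character, including a literal period.
dot_matches=$(printf '%s\n' 'cat' 'cut' 'c.t' 'ct' 'coat' | grep -E -x 'c.t')
printf '%s\n' "$dot_matches"
# prints:
# cat
# cut
# c.t
```

The lines ‘ct’ and ‘coat’ are rejected because ‘.’ must match exactly one character.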

A regular expression may be followed by one of several repetition operators; the operators beginning with ‘{’ are called interval expressions.

‘?’
     The preceding item is optional and is matched at most once.

‘*’
     The preceding item is matched zero or more times.

‘+’
     The preceding item is matched one or more times.

‘{n}’
     The preceding item is matched exactly n times.

‘{n,}’
     The preceding item is matched n or more times.

‘{,m}’
     The preceding item is matched at most m times. This is a GNU extension.

‘{n,m}’
     The preceding item is matched at least n times, but not more than m times.
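The repetition operators can be compared side by side with a short sketch. It assumes GNU grep with -E and -x; the helper name show and the test strings are hypothetical, chosen so that the letter ‘a’ occurs zero to four times between ‘x’ and ‘y’.

```shell
# Candidate lines: "a" repeated 0, 1, 2, 3 and 4 times.
candidates() { printf '%s\n' 'xy' 'xay' 'xaay' 'xaaay' 'xaaaay'; }

# Print, on one line, the candidates that the pattern matches in full.
show() { candidates | grep -E -x "$1" | paste -s -d ' ' -; }

show 'xa?y'      # xy xay
show 'xa*y'      # xy xay xaay xaaay xaaaay
show 'xa+y'      # xay xaay xaaay xaaaay
show 'xa{2}y'    # xaay
show 'xa{2,}y'   # xaay xaaay xaaaay
show 'xa{,2}y'   # xy xay xaay  (GNU extension)
show 'xa{1,3}y'  # xay xaay xaaay
```

In each pattern the operator applies only to the ‘a’ immediately before it, not to the whole of ‘xa’.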

The empty regular expression matches the empty string. Two regular expressions may be concatenated; the resulting regular expression matches any string formed by joining a string that matches the first expression with a string that matches the second.
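A small sketch of the empty expression and of concatenation, again assuming GNU grep with -E. The -c option prints a count of selected lines; the variable names are illustrative only.

```shell
# Without -x a line is selected if any part of it matches. Every line
# contains the empty string, so the empty pattern selects all three lines.
empty_any=$(printf '%s\n' 'ab' '' 'ba' | grep -E -c '')

# With -x the whole line must match, so only the empty line is selected.
empty_whole=$(printf '%s\n' 'ab' '' 'ba' | grep -E -c -x '')

# 'a?b+' is the concatenation of 'a?' and 'b+': an optional "a"
# followed by one or more "b".
concat=$(printf '%s\n' 'ab' 'b' 'abb' 'a' 'ba' | grep -E -x 'a?b+' | paste -s -d ' ' -)

echo "$empty_any $empty_whole"   # 3 1
echo "$concat"                   # ab b abb
```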

Two regular expressions may be joined by the infix operator ‘|’. The resulting regular expression matches any string matching either of the two expressions, which are called alternatives.
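For illustration, a minimal alternation example, assuming GNU grep with -E and -x. The variable name alt and the sample words are hypothetical.

```shell
# A line is selected if it matches either alternative in full.
alt=$(printf '%s\n' 'cat' 'dog' 'bird' 'catdog' | grep -E -x 'cat|dog' | paste -s -d ' ' -)
echo "$alt"   # cat dog
```

The line ‘catdog’ is not selected because neither alternative matches it as a whole.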

Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A whole expression may be enclosed in parentheses to override these precedence rules and form a subexpression. An unmatched ‘)’ is treated as an ordinary character and matches just itself.
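The precedence rules can be checked with a sketch that runs each pattern with and without parentheses. It assumes GNU grep with -E and -x; the helper names words and match are hypothetical.

```shell
words() { printf '%s\n' 'ab' 'abb' 'abab' 'a' 'b' 'ac' 'bc'; }

# Print, on one line, the words that the pattern matches in full.
match() { words | grep -E -x "$1" | paste -s -d ' ' -; }

match 'ab+'     # repetition applies to 'b' only:           ab abb
match '(ab)+'   # parentheses make 'ab' the repeated item:  ab abab
match 'a|bc'    # concatenation binds tighter than '|':     a bc
match '(a|b)c'  # parentheses group the alternation:        ac bc
```

Without parentheses, ‘a|bc’ reads as "either ‘a’ or ‘bc’", not as "‘a’ or ‘b’, followed by ‘c’".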

Not every character string is a valid regular expression. See Problematic Regular Expressions.