
D. Querying using regular expressions

See also 2.4.3 Query expressions.

Unfortunately, we do not have room in this manual for a complete exposition on regular expressions. The following is a basic summary of some regular expressions you might wish to use.

NOTE: When you use query expressions containing regular expressions as part of an ordinary query-pr shell command line, you need to quote them with single quotes ('), as in the examples below; otherwise the shell will try to interpret the special characters used, with unpredictable results.
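The effect of quoting can be inspected without running query-pr at all. The following is a minimal sketch using Python's standard `shlex` module, which only approximates how a POSIX shell splits and unquotes words; the variable names are illustrative.

```python
import shlex

# The arguments query-pr should receive, one list element per argument.
argv = ["query-pr", "--expr", 'State="o|a"', "--format", "full"]

# shlex.join adds the single quotes that a POSIX shell needs around the expression.
command_line = shlex.join(argv)
print(command_line)

# Splitting the quoted command line again recovers the original arguments.
round_trip = shlex.split(command_line)
print(round_trip == argv)

# Without the single quotes, word splitting strips the inner double quotes,
# so the expression no longer contains a quoted value.
unquoted = shlex.split('query-pr --expr State="o|a" --format full')
print(unquoted[2])
```

Building the argument list first and quoting it mechanically avoids having to reason about which characters a particular shell treats as special.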

See section `Regular Expression Syntax' in Regex, for details on regular expression syntax. Also see section `Syntax of Regular Expressions' in GNU Emacs Manual, but beware that the syntax for regular expressions in Emacs is slightly different.

All search criteria options to query-pr rely on regular expression syntax to construct their search patterns. For example,

 
query-pr --expr 'State="open"' --format full

matches all PRs whose `>State:' values match the regular expression `open'.

We can substitute the expression `o' for `open', according to GNU regular expression syntax. This matches all values of `>State:' which begin with the letter `o'.

We see that

 
query-pr --expr 'State="o"' --format full

is equivalent to

 
query-pr --expr 'State="open"' --format full

in this case, since the only value for `>State:' which matches the expression `o' is `open'. `State="o"' also matches `o', `oswald', and even `oooooo', but none of those values are valid states for a Problem Report in default GNATS installations.
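The behaviour described above can be tried outside GNATS. The following is a minimal sketch in Python, assuming that Python's `re.match` (which only matches at the start of a string) is a reasonable stand-in for the matching query-pr performs here. The list of states and the helper name `matching_states` are assumptions made for illustration; a real GNATS database defines its own states.

```python
import re

# Assumed state names for illustration; a real GNATS database defines its own.
STATES = ["open", "analyzed", "suspended", "feedback", "closed"]


def matching_states(pattern, states=STATES):
    """Return the states that `pattern` matches starting at their first character."""
    regex = re.compile(pattern)
    # re.match succeeds only if the match starts at position 0 of the string.
    return [s for s in states if regex.match(s)]


print(matching_states("o"))
print(matching_states("open"))

# Values that are not valid states would match `o` as well.
print(bool(re.match("o", "oswald")))
```

With this list of states, both patterns select the same single state, which is why the two queries return the same Problem Reports.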

We can also use the expression operator `|' to signify a logical OR, such that

 
query-pr --expr 'State="o|a"' --format full

matches all `open' or `analyzed' Problem Reports.

Regular expression syntax considers a regexp token surrounded with parentheses, as in `(regexp)', to be a group. This means that `(ab)*' matches any number (including zero) of contiguous instances of `ab'. Matches include `', `ab', and `ababab'.

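Regular expression syntax considers a set of characters surrounded with square brackets, as in `[abc]', to be a list, which matches any single character in the list. This means that `Char[lb]' matches `Charl' or `Charb', but a list cannot select among whole words. To match any of the words `Charley', `Charlene', or `Charbroiled', use a group with the `|' operator, as in `Char(ley|lene|broiled)' (case is significant; `charbroiled' is not matched).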
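Groups and lists behave differently, and the difference is easy to check in a few lines. The following is a sketch in Python, whose `re` module is not the GNU regex library used by GNATS but agrees with it on these constructs; the helper name `whole` is illustrative. A group repeats or alternates whole sub-expressions, while a bracketed list matches exactly one character.

```python
import re

# A group repeated with `*`: zero or more contiguous copies of "ab".
group = re.compile(r"(ab)*")
# A list: "g" followed by exactly one character out of c, d, or a.
one_of = re.compile(r"g[cda]")
# Alternation inside a group: one whole alternative after "Char".
words = re.compile(r"Char(ley|lene|broiled)")


def whole(regex, text):
    """Return True if `regex` matches all of `text`, not just a prefix."""
    return regex.fullmatch(text) is not None


for text in ["", "ab", "ababab", "aba"]:
    print(repr(text), whole(group, text))

for text in ["gc", "gcc"]:
    print(repr(text), whole(one_of, text))

for text in ["Charley", "Charlene", "Charbroiled", "charbroiled"]:
    print(repr(text), whole(words, text))
```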

Using groups and lists, we see that

 
query-pr --expr 'Category="gcc|gdb|gas"' --format full

is equivalent to

 
query-pr --expr 'Category="g(cc|db|as)"' --format full

and is also very similar to

 
query-pr --expr 'Category="g[cda]"' --format full

with the exception that this last search matches any values which begin with `gc', `gd', or `ga'.
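The three category patterns can be compared side by side. The sketch below again uses Python's `re.match` as a stand-in for the matching done by query-pr; the category names are hypothetical and were chosen only to show where the list form is broader than the other two.

```python
import re

# Hypothetical category names, chosen only to exercise the patterns.
CATEGORIES = ["gcc", "gdb", "gas", "gawk", "gcov", "g++", "ld"]

PATTERNS = ["gcc|gdb|gas", "g(cc|db|as)", "g[cda]"]


def matching_categories(pattern, categories=CATEGORIES):
    """Return the categories that `pattern` matches starting at their first character."""
    regex = re.compile(pattern)
    return [c for c in categories if regex.match(c)]


for pattern in PATTERNS:
    print(pattern, matching_categories(pattern))
```

The first two patterns select the same categories, while the list form also picks up any other category whose second letter is `c', `d', or `a'.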

The `.' character is known as a wildcard. `.' matches any single character (except newlines). `*' matches the previous character, list, or group any number of times, including zero. Therefore, we can understand `.*' to mean "match zero or more instances of any character."

 
query-pr --expr 'State=".*a"' --format full

matches all values for `>State:' which contain an `a'. (These include `analyzed' and `feedback'.)

Another way to understand what wildcards do is to follow them on their search for matching text. By our syntax, `.*' matches any character any number of times, including zero. Therefore, `.*a' searches for any group of characters which ends with `a', ignoring the rest of the field. `.*a' matches `analyzed' as well as `feedback', since each contains an `a'.
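The wildcard query can be checked against a list of states. This is a sketch under the same assumptions as before: Python's `re.match` stands in for the anchored match, and the state names are assumed for illustration rather than taken from a real database.

```python
import re

# Assumed state names for illustration; a real GNATS database defines its own.
STATES = ["open", "analyzed", "suspended", "feedback", "closed"]

# ".*" may consume any run of characters before the required "a".
contains_a = re.compile(r".*a")

hits = [s for s in STATES if contains_a.match(s)]
print(hits)

# Without the leading ".*", an anchored match requires the value to begin with "a".
begins_with_a = [s for s in STATES if re.match(r"a", s)]
print(begins_with_a)
```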

Note: When using `fieldtype:Text' or `fieldtype:Multitext' (see section 2.4.3 Query expressions), you do not have to specify the token `.*' at the beginning of your expression to match the entire field. For the technically minded, this is because these queries use `re_search' rather than `re_match'. `re_match' anchors the search at the beginning of the field, while `re_search' does not anchor the search.

For example, to search in the >Description: field for the text

 
The defrobulator component returns a nil value.

we can use

 
query-pr --expr 'fieldtype:Multitext="defrobulator.*nil"' --format full
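The distinction between anchored and unanchored matching has a close analogue in Python, where `re.match` anchors at the start of the string and `re.search` does not. The sketch below uses those two functions as stand-ins for `re_match' and `re_search'; it illustrates the idea only and does not reproduce GNATS internals.

```python
import re

description = "The defrobulator component returns a nil value."
pattern = re.compile(r"defrobulator.*nil")

# Anchored, like re_match: the match must begin at the first character.
anchored = pattern.match(description)
# Unanchored, like re_search: the match may begin anywhere in the text.
unanchored = pattern.search(description)

print(anchored)
print(unanchored.group(0))

# An anchored match needs a leading ".*" to skip over "The ".
anchored_with_prefix = re.match(r".*defrobulator.*nil", description)
print(anchored_with_prefix is not None)
```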

To also match newlines, we have to include the expression `(.|^M)' instead of just a dot (`.'). `(.|^M)' matches "any single character except a newline (`.') or (`|') any newline (`^M')." This means that to search for the text

 
The defrobulator component enters the bifrabulator routine
and returns a nil value.

we must use

 
query-pr --expr 'fieldtype:Multitext="defrobulator(.|^M)*nil"' \
         --format full
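The need for an explicit newline alternative can be reproduced in Python, where `.' also refuses to match a newline by default. This is a sketch only: Python writes the newline as `\n`, so no literal control character has to be typed, and the variable names are illustrative.

```python
import re

description = (
    "The defrobulator component enters the bifrabulator routine\n"
    "and returns a nil value."
)

# "." stops at the line break, so this pattern cannot reach "nil" on the next line.
dot_only = re.compile(r"defrobulator.*nil")
# "(.|\n)" accepts either an ordinary character or a newline.
dot_or_newline = re.compile(r"defrobulator(.|\n)*nil")

print(dot_only.search(description))
print(dot_or_newline.search(description) is not None)
```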

To generate the newline character `^M', type the following depending on your shell:

csh
`control-V control-M'

tcsh
`control-V control-J'

sh (or bash)
Use the RETURN key, as in

 
(.|
)

Again, see section `Regular Expression Syntax' in Regex, for a much more complete discussion on regular expression syntax.



This document was generated by Yngve Svendsen on January, 9 2002 using texi2html