18.6.1.8 Using Registers

A group in a regular expression can match a (possibly empty) substring of the string that the regular expression as a whole matched. The matcher remembers the beginning and end of the substring matched by each group.

To find out what the groups matched, pass a non-null regs argument to a GNU matching or searching function (see GNU Matching and GNU Searching), that is, the address of a structure of the following type, as defined in regex.h:

struct re_registers
{
  unsigned num_regs;
  regoff_t *start;
  regoff_t *end;
};

Except for (possibly) the num_regs’th element (see below), the ith element of the start and end arrays records information about the ith group in the pattern. (They’re declared as C pointers, but this is only because not all C compilers accept zero-length arrays; conceptually, it is simplest to think of them as arrays.)

The start and end arrays are allocated in one of two ways. The simplest and perhaps most useful is to let the matcher (re)allocate enough space to record information for all the groups in the regular expression. If re_set_registers is not called before searching or matching, then the matcher allocates two arrays each of 1 + re_nsub elements (re_nsub is another field in the pattern buffer; see GNU Pattern Buffers). The extra element is set to -1. Then on subsequent calls with the same pattern buffer and regs arguments, the matcher reallocates more space if necessary.
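As a rough sketch of this first approach, the following C fragment lets the matcher allocate the register arrays and then reads the offsets of one group. It assumes glibc's regex.h with _GNU_SOURCE defined, and it selects RE_SYNTAX_POSIX_EXTENDED so that ‘(’ and ‘)’ act as the group operators. The helper name group_span is hypothetical and not part of the library.

```c
#define _GNU_SOURCE   /* assumption: glibc, which exposes the GNU interface this way */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: search TEXT for PATTERN and copy the offsets of
   group GROUP into *START and *END.  Returns 0 on success, -1 if the
   pattern fails to compile, nothing matches, or GROUP does not exist.  */
int
group_span (const char *pattern, const char *text, unsigned group,
            regoff_t *start, regoff_t *end)
{
  struct re_pattern_buffer buf;
  struct re_registers regs;
  int status = -1;

  assert (pattern != NULL && text != NULL && start != NULL && end != NULL);

  /* A zeroed pattern buffer has no fastmap and no translate table.  */
  memset (&buf, 0, sizeof buf);
  memset (&regs, 0, sizeof regs);

  /* Select a syntax in which plain parentheses group, then restore it.  */
  reg_syntax_t old = re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  const char *err = re_compile_pattern (pattern, strlen (pattern), &buf);
  re_set_syntax (old);
  if (err != NULL)
    return -1;

  /* re_set_registers is never called, so the matcher allocates
     regs.start and regs.end itself on a successful match.  */
  regoff_t len = (regoff_t) strlen (text);
  if (re_search (&buf, text, len, 0, len, &regs) >= 0
      && group <= buf.re_nsub)
    {
      *start = regs.start[group];
      *end = regs.end[group];
      status = 0;
    }

  /* The arrays belong to the caller once allocated; free (NULL) is harmless.  */
  free (regs.start);
  free (regs.end);
  regfree (&buf);
  return status;
}
```

For instance, calling group_span ("(a+)(b*)", "xaab", 1, &s, &e) should leave s at 1 and e at 3, since the first group matches ‘aa’.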

The function:

void
re_set_registers (struct re_pattern_buffer *buffer,
                  struct re_registers *regs,
                  size_t num_regs,
                  regoff_t *starts, regoff_t *ends)

sets regs to hold num_regs registers, storing them in starts and ends. Subsequent matches using buffer and regs will use this memory for recording register information. starts and ends must be allocated with malloc, and must each be at least num_regs * sizeof (regoff_t) bytes long.

If num_regs is zero, then subsequent matches should allocate their own register data.

Unless this function is called, the first search or match using buffer will allocate its own register data, without freeing the old data.
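The second approach might look like the sketch below, in which the caller supplies the register arrays through re_set_registers. It again assumes glibc's regex.h with _GNU_SOURCE defined and the RE_SYNTAX_POSIX_EXTENDED syntax. The helper name span_with_own_registers and the capacity NUM_REGS are hypothetical choices made for illustration. The call is placed after compilation on the assumption that compiling may reset the buffer's register bookkeeping.

```c
#define _GNU_SOURCE   /* assumption: glibc, which exposes the GNU interface this way */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

enum { NUM_REGS = 8 };   /* arbitrary capacity chosen for this sketch */

/* Hypothetical helper: search TEXT for PATTERN using register arrays
   supplied by the caller, then copy register GROUP into *START and *END.
   Returns 0 on success, -1 on any failure.  */
int
span_with_own_registers (const char *pattern, const char *text,
                         unsigned group, regoff_t *start, regoff_t *end)
{
  struct re_pattern_buffer buf;
  struct re_registers regs;
  int status = -1;

  assert (pattern != NULL && text != NULL && start != NULL && end != NULL);

  memset (&buf, 0, sizeof buf);
  memset (&regs, 0, sizeof regs);

  reg_syntax_t old = re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  const char *err = re_compile_pattern (pattern, strlen (pattern), &buf);
  re_set_syntax (old);
  if (err != NULL)
    return -1;

  /* Both arrays come from malloc and hold NUM_REGS elements each.  */
  regoff_t *starts = malloc (NUM_REGS * sizeof (regoff_t));
  regoff_t *ends = malloc (NUM_REGS * sizeof (regoff_t));
  if (starts != NULL && ends != NULL)
    {
      re_set_registers (&buf, &regs, NUM_REGS, starts, ends);

      regoff_t len = (regoff_t) strlen (text);
      if (re_search (&buf, text, len, 0, len, &regs) >= 0
          && group <= buf.re_nsub && group < regs.num_regs)
        {
          *start = regs.start[group];
          *end = regs.end[group];
          status = 0;
        }

      /* The matcher may have reallocated the arrays, so free them
         through regs rather than through the original pointers.  */
      free (regs.start);
      free (regs.end);
    }
  else
    {
      free (starts);
      free (ends);
    }

  regfree (&buf);
  return status;
}
```

Freeing through regs.start and regs.end, rather than through the pointers originally passed in, guards against the case where the matcher needed more registers than NUM_REGS and reallocated the arrays.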

The following examples illustrate the information recorded in the re_registers structure. (In all of them, ‘(’ represents the open-group and ‘)’ the close-group operator. The first character in the string string is at index 0.)
