

16.1.2 Iterating through strings

For complex string processing, string functions may not be enough, and you need to iterate through a string while processing each (possibly multibyte) character or encoding error in turn. Gnulib has several modules for iterating forward through a string in this way. Backward iteration, that is, from the string’s end to start, is not provided, as it is too hairy in general.

The choice of modules depends on the application’s needs. The mbiter module family is more suitable for applications that treat some sequences of two or more bytes as a single encoding error, and for applications that need to support obsolescent encodings on non-GNU platforms, such as CP864, EBCDIC, Johab, and Shift JIS. In this module family, mbuiter and mbuiterf are more suitable than mbiter and mbiterf when arguments are C strings, lengths are not already known, and it is highly likely that only the first few multibyte characters need to be inspected.

The mcel module is simpler and can be faster than the mbiter family, and is more suitable for applications that do not need the mbiter family’s special features.

The mcel-prefer module is like mcel except that it also causes some other modules, such as mbscasecmp, to use mcel rather than the mbiter family. This can be simpler and faster. However, it does not support the obsolescent encodings, and it may behave differently on data containing encoding errors where behavior is unspecified or undefined, because in mcel each encoding error is a single byte whereas in the mbiter family a single encoding error can contain two or more bytes.
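The single-byte treatment of encoding errors can be illustrated without Gnulib. The following is a minimal sketch in standard C that uses mbrtowc rather than any Gnulib module; the names scan_counts and count_units are hypothetical and exist only for this illustration. It walks a byte array forward, counting each decoded character and counting each offending byte as its own encoding error. It assumes a stateless encoding, so it restarts from the initial shift state on every call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical result type for this sketch; not part of Gnulib.  */
struct scan_counts
{
  size_t chars;   /* successfully decoded characters */
  size_t errors;  /* encoding errors, one per offending byte */
};

/* Scan the N bytes at S in the current locale's encoding, counting
   characters and encoding errors.  Each encoding error consumes
   exactly one byte, and decoding then resumes at the next byte.  */
struct scan_counts
count_units (char const *s, size_t n)
{
  assert (s != NULL);
  struct scan_counts c = { 0, 0 };
  char const *p = s;
  char const *lim = s + n;

  while (p < lim)
    {
      /* Assumption: the encoding is stateless, so each call may
         start from the initial shift state.  */
      mbstate_t state;
      memset (&state, 0, sizeof state);

      wchar_t wc;
      size_t len = mbrtowc (&wc, p, (size_t) (lim - p), &state);

      if (len == (size_t) -1 || len == (size_t) -2)
        {
          /* Invalid or truncated sequence: one byte, one error.  */
          c.errors++;
          p++;
        }
      else
        {
          /* mbrtowc returns 0 for a NUL character, which still
             occupies one byte of input.  */
          c.chars++;
          p += len ? len : 1;
        }
    }
  return c;
}
```

A caller that wants the locale's multibyte encoding would first call setlocale (LC_ALL, ""); without that call the program runs in the "C" locale. An iterator that instead groups several invalid bytes into one encoding error, as the mbiter family can, would report a smaller error count on the same malformed input.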

If a package uses mcel-prefer, it may also want to give gnulib-tool one or more of the options --avoid=mbiter, --avoid=mbiterf, --avoid=mbuiter and --avoid=mbuiterf, to avoid packaging modules that are not needed.
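As an illustration, an invocation along the following lines would import mcel-prefer while avoiding all four modules. This is a sketch that assumes the package lists its modules on the gnulib-tool command line; a real package would normally name its other modules here as well.

```shell
# Import mcel-prefer and keep the mbiter family out of the package.
gnulib-tool --import \
  --avoid=mbiter --avoid=mbiterf \
  --avoid=mbuiter --avoid=mbuiterf \
  mcel-prefer
```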

