1.4 Choice of in-memory representation of strings
There are three ways of representing strings in the memory of a running
program.
- As ‘char *’ strings. Such strings are represented in the locale encoding.
This approach is employed when the program does little text processing.
When some Unicode-aware processing is needed, a string is converted to
Unicode on the fly and back to the locale encoding afterwards.
- As UTF-8, UTF-16, or UTF-32 strings. This implies that conversion from
the locale encoding to Unicode is performed on input, and in the opposite
direction on output. This approach is employed when the program does
a significant amount of text processing, or when the program has multiple
threads operating on the same data but in different locales.
- As ‘wchar_t *’ strings, a.k.a. “wide strings”. This approach is misguided;
see The wchar_t mess.
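The first approach (locale-encoded ‘char *’ strings, converted to Unicode only around a Unicode-aware step) can be sketched with POSIX iconv(3). This assumes that nl_langinfo (CODESET) names the locale encoding in a form iconv_open() accepts, which is typically the case on glibc. libunistring's <uniconv.h> offers ready-made conversion functions; plain iconv is used here only to keep the sketch self-contained. The helper names convert and truncate_chars, and the "keep at most N characters" operation, are hypothetical and serve only as illustration.

```c
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>

/* Convert the NUL-terminated string IN from encoding FROM to encoding TO.
   Returns a freshly malloc'd NUL-terminated string, or NULL on failure
   (unsupported conversion, invalid input, or out of memory). */
static char *convert (const char *in, const char *to, const char *from)
{
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return NULL;

  size_t inleft = strlen (in);
  size_t cap = 4 * inleft + 16;        /* initial guess, doubled on E2BIG */
  char *out = malloc (cap);
  char *inp = (char *) in;             /* POSIX iconv() takes char ** */
  char *outp = out;
  size_t outleft = cap - 1;            /* keep one byte for the NUL */
  if (out == NULL)
    goto fail;

  for (;;)
    {
      size_t r = iconv (cd, &inp, &inleft, &outp, &outleft);
      if (r != (size_t) -1)
        r = iconv (cd, NULL, NULL, &outp, &outleft); /* flush shift state */
      if (r != (size_t) -1)
        break;                         /* everything converted */
      if (errno != E2BIG)
        goto fail;                     /* EILSEQ or EINVAL: bad input */
      /* Output buffer full: double it and resume where iconv() stopped. */
      size_t used = outp - out;
      char *bigger = realloc (out, 2 * cap);
      if (bigger == NULL)
        goto fail;
      out = bigger;
      cap *= 2;
      outp = out + used;
      outleft = cap - 1 - used;
    }
  *outp = '\0';
  iconv_close (cd);
  return out;

fail:
  free (out);
  iconv_close (cd);
  return NULL;
}

/* Hypothetical Unicode-aware step: keep at most N characters of the
   locale-encoded string S without cutting a multibyte character in half.
   Returns a malloc'd string in the locale encoding, or NULL on failure. */
static char *truncate_chars (const char *s, size_t n)
{
  const char *codeset = nl_langinfo (CODESET);
  char *u8 = convert (s, "UTF-8", codeset);      /* locale -> Unicode */
  if (u8 == NULL)
    return NULL;

  size_t i, count = 0;
  for (i = 0; u8[i] != '\0'; i++)
    if (((unsigned char) u8[i] & 0xC0) != 0x80)  /* not a continuation byte */
      {
        if (count == n)
          break;                   /* u8[i] would start character n + 1 */
        count++;
      }
  u8[i] = '\0';

  char *result = convert (u8, codeset, "UTF-8"); /* Unicode -> locale */
  free (u8);
  return result;
}
```

A real program would call setlocale (LC_ALL, "") once at startup so that nl_langinfo (CODESET) reports the user's encoding. In a UTF-8 locale, for example, truncate_chars ("héllo", 2) yields "hé". Note that each Unicode-aware call pays for two conversions, which is acceptable only when such calls are rare.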
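With the second approach, conversion happens only at the input and output boundaries (for instance with iconv(3)), and everything in between works on Unicode data directly. The sketch below assumes the internal representation is UTF-8 held in uint8_t buffers. It illustrates why this suits threads running in different locales: decoding UTF-8 needs no LC_CTYPE information, whereas the standard mbrtowc() interprets bytes according to the current locale. The names utf8_decode and utf8_length are hypothetical. A real program would more likely use libunistring's <unistr.h> routines than a hand-written decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from the UTF-8 buffer S of LEN bytes.
   Stores it in *CP and returns the number of bytes consumed,
   or 0 if S does not start with a valid UTF-8 sequence. */
static size_t utf8_decode (const uint8_t *s, size_t len, uint32_t *cp)
{
  if (len == 0)
    return 0;
  uint8_t b = s[0];
  size_t n;
  uint32_t c, min;
  if (b < 0x80)                    { *cp = b; return 1; }
  else if (b >= 0xC2 && b <= 0xDF) { n = 2; c = b & 0x1F; min = 0x80; }
  else if (b >= 0xE0 && b <= 0xEF) { n = 3; c = b & 0x0F; min = 0x800; }
  else if (b >= 0xF0 && b <= 0xF4) { n = 4; c = b & 0x07; min = 0x10000; }
  else
    return 0;                      /* continuation byte or invalid lead */
  if (len < n)
    return 0;                      /* truncated sequence */
  for (size_t i = 1; i < n; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;                  /* expected a continuation byte */
      c = (c << 6) | (s[i] & 0x3F);
    }
  /* Reject overlong forms, UTF-16 surrogates and values above U+10FFFF. */
  if (c < min || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
    return 0;
  *cp = c;
  return n;
}

/* Count the code points in a UTF-8 buffer; returns (size_t) -1 if the
   buffer is not valid UTF-8.  No locale is consulted, so every thread
   gets the same answer for the same data. */
static size_t utf8_length (const uint8_t *s, size_t len)
{
  size_t count = 0;
  while (len > 0)
    {
      uint32_t cp;
      size_t n = utf8_decode (s, len, &cp);
      if (n == 0)
        return (size_t) -1;
      s += n;
      len -= n;
      count++;
    }
  return count;
}
```

For example, the six bytes 'h', C3 A9, E2 82 AC (“hé€”) count as three characters regardless of the locale, while the overlong sequence C0 AF is rejected.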