As m4 reads its input, it separates it into tokens. A
token is either a name, a quoted string, or any single character that
is not part of either a name or a string. Input to m4 can also
contain comments. GNU m4 does not yet understand
multibyte locales; all operations are byte-oriented rather than
character-oriented (although if your locale uses a single-byte
encoding, such as ISO-8859-1, you will not notice a difference).
However, m4 is eight-bit clean, so you can
use non-ASCII characters in quoted strings (see Changequote),
comments (see Changecom), and macro names (see Indir), with the
exception of the NUL character (the zero byte ‘'\0'’).
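
The token classes described above can be illustrated with a small
sketch. It is written in Python rather than m4, and it is only a
simplified model, not the scanner GNU m4 actually uses. It assumes the
default quote delimiters (` and '), the default comment delimiters
(# and newline), and names made of ASCII letters, digits, and
underscores that do not start with a digit. The function name
tokenize is hypothetical. The sketch works on bytes, mirroring the
byte-oriented behavior noted above.

```python
def tokenize(data: bytes) -> list:
    """Split input bytes into (kind, text) pairs.

    Simplified model of m4 tokenization under default syntax; this is an
    illustrative assumption, not GNU m4's real implementation.
    """
    tokens = []
    i, n = 0, len(data)
    while i < n:
        c = data[i:i + 1]
        if c == b"#":
            # A comment runs from '#' through the next newline (inclusive),
            # or to the end of input if there is no newline.
            end = data.find(b"\n", i)
            j = n if end < 0 else end + 1
            kind = "comment"
        elif c == b"`":
            # A quoted string; quote delimiters nest, so track the depth.
            depth, j = 1, i + 1
            while j < n and depth > 0:
                b = data[j:j + 1]
                if b == b"`":
                    depth += 1
                elif b == b"'":
                    depth -= 1
                j += 1
            if depth > 0:
                raise ValueError("end of input inside quoted string")
            kind = "string"  # token text keeps its outer delimiters here
        elif c.isalpha() or c == b"_":
            # A name: bytes.isalpha()/isalnum() accept ASCII characters only.
            j = i + 1
            while j < n and (data[j:j + 1].isalnum() or data[j:j + 1] == b"_"):
                j += 1
            kind = "name"
        else:
            # Anything else is a single-byte token of its own.
            j = i + 1
            kind = "char"
        tokens.append((kind, data[i:j]))
        i = j
    return tokens
```

For example, calling tokenize on the UTF-8 encoding of "foo(`bar', é)"
yields the name foo, the quoted string `bar', and single-byte tokens
for the punctuation and the space. The unquoted é becomes two separate
one-byte tokens, since the sketch, like m4, sees bytes rather than
characters.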