
12.9 Unicode Character Codes

You can specify Unicode characters with escape sequences called universal character names. They can appear in individual character constants, in string constants (see String Constants), and even in C identifiers. Use the ‘\u’ escape sequence followed by exactly four hexadecimal digits for a 16-bit Unicode character code. If the code value is too big for 16 bits, use the ‘\U’ escape sequence followed by exactly eight hexadecimal digits for a 32-bit Unicode character code. For example,

\u6C34      /* 16-bit code, four hex digits */
\U0010ABCD  /* 32-bit code, eight hex digits */

One way to use these is in UTF-8 string constants (see UTF-8 String Constants). For instance,

u8"fóó \u6C34 \U0010ABCD"
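As a rough illustration of what the compiler stores for such a constant, the sketch below exposes the bytes of a UTF-8 string holding only ‘\u6C34’. The function name water_utf8_byte is hypothetical, chosen only for this example. The cast is there because the element type of a ‘u8’ string is char in C11 but may be an unsigned type in newer standards.

```c
#include <stddef.h>

/* Hypothetical helper: return byte I of the UTF-8 string constant
   u8"\u6C34".  The string occupies four bytes: three for the
   character and one for the terminating null.  */
unsigned char
water_utf8_byte (size_t i)
{
  /* The cast works whether the elements are char or an unsigned type.  */
  const unsigned char *bytes = (const unsigned char *) u8"\u6C34";
  return bytes[i];
}
```

Calling water_utf8_byte with 0, 1 and 2 should give 0xE6, 0xB0 and 0xB4, which is the UTF-8 encoding of code 6C34.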

You can also use them in wide character constants (see Wide Character Constants), like this:

u'\u6C34'      /* 16-bit code */
U'\U0010ABCD'  /* 32-bit code */

and in wide string constants (see Wide String Constants), like this:

u"\u6C34\u6C33"  /* 16-bit code */
U"\U0010ABCD"    /* 32-bit code */
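The following sketch checks, under the assumption of a C11 or later compiler, that these constants hold the codes written in them and that each string has one element per character plus a terminating null. The macro ARRAY_LEN and the functions wide16_len and wide32_len are hypothetical names used only here.

```c
#include <stddef.h>

/* Number of elements in an array, counting a string's terminating null.  */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Each character constant has the numeric value of its code.  */
_Static_assert (u'\u6C34' == 0x6C34, "u'' holds the 16-bit code");
_Static_assert (U'\U0010ABCD' == 0x10ABCD, "U'' holds the 32-bit code");

/* Two 16-bit characters plus the null.  */
size_t
wide16_len (void)
{
  return ARRAY_LEN (u"\u6C34\u6C33");
}

/* One 32-bit character plus the null.  */
size_t
wide32_len (void)
{
  return ARRAY_LEN (U"\U0010ABCD");
}
```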

And in an identifier:

int foo\u6C34bar = 0;
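An identifier spelled this way behaves like any other. The sketch below assumes a compiler that accepts universal character names in identifiers (recent versions of GCC do), and bump_counter is a hypothetical function name.

```c
/* A file-scope variable whose name contains a universal character name.  */
static int foo\u6C34bar = 0;

/* Hypothetical helper: increment the variable and return its new value.  */
int
bump_counter (void)
{
  foo\u6C34bar += 1;
  return foo\u6C34bar;
}
```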

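Codes in the range of D800 through DFFF are not valid: Unicode reserves them as surrogates for UTF-16, so they do not stand for characters. Codes less than 00A0 are also forbidden, except for 0024, 0040, and 0060 (‘$’, ‘@’, and ‘`’). The forbidden codes below 00A0 are ordinary ASCII characters and control characters, which you can write directly or specify with other escape sequences (see Character Constants).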
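As a small check of the three permitted codes below 00A0, this sketch compares each one with the ordinary character it names. It assumes a C11 or later compiler with an ASCII-compatible execution character set, and ucn_matches_ascii is a hypothetical function name.

```c
/* The three permitted codes below 00A0 name ordinary ASCII characters.  */
_Static_assert ('\u0024' == '$', "0024 is the dollar sign");
_Static_assert ('\u0040' == '@', "0040 is the at sign");
_Static_assert ('\u0060' == '`', "0060 is the grave accent");

/* Hypothetical helper: the same comparison, done at run time.
   Returns 1 if all three codes match, otherwise 0.  */
int
ucn_matches_ascii (void)
{
  return '\u0024' == '$' && '\u0040' == '@' && '\u0060' == '`';
}
```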