char32_t type

The ISO C and POSIX standard creators then introduced the
char32_t type. In ISO C 11, it was conceptually a “32-bit wide
character” type. In ISO C 23, its semantics has been further
specified: A char32_t value is a Unicode code point.
Thus, the char32_t type is not affected by the problems that plague
the wchar_t type.
The char32_t type and its API are defined in the <uchar.h>
header file.
ISO C and POSIX specify only the basic functions for the char32_t
type, namely conversion of a single character (mbrtoc32 and
c32rtomb). For convenience, Gnulib adds APIs for classification
and case conversion of characters.
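The single-character conversion functions can be combined into simple string helpers. The following is a minimal sketch that uses only the standard mbrtoc32 and c32rtomb functions from <uchar.h>; the helper names decode_to_c32 and encode_c32 are hypothetical and are not part of ISO C, POSIX, or Gnulib. It assumes that the input is a NUL-terminated string in the encoding of the current locale.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: decode the multibyte string SRC (in the encoding
   of the current locale) into at most DST_CAP char32_t values.
   Returns the number of characters stored, or (size_t) -1 if SRC
   contains an invalid or incomplete sequence or DST is too small.  */
size_t
decode_to_c32 (const char *src, char32_t *dst, size_t dst_cap)
{
  assert (src != NULL && dst != NULL);

  mbstate_t state;
  memset (&state, 0, sizeof state);   /* start in the initial state */

  size_t remaining = strlen (src);
  size_t count = 0;
  while (remaining > 0)
    {
      char32_t c;
      size_t ret = mbrtoc32 (&c, src, remaining, &state);
      if (ret == (size_t) -1 || ret == (size_t) -2)
        return (size_t) -1;           /* invalid or incomplete input */
      if (ret == 0)
        break;                        /* a null character was decoded */
      if (count == dst_cap)
        return (size_t) -1;           /* destination too small */
      dst[count++] = c;
      /* (size_t) -3 means that C came from bytes consumed earlier, so
         the input pointer must not advance.  This sketch does not
         drain such pending characters after the final byte.  */
      if (ret != (size_t) -3)
        {
          src += ret;
          remaining -= ret;
        }
    }
  return count;
}

/* Hypothetical helper: encode the single character C into BUF, which
   must have room for MB_LEN_MAX bytes.  Returns the number of bytes
   written, or (size_t) -1 if C cannot be represented in the locale's
   encoding.  */
size_t
encode_c32 (char32_t c, char *buf)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);
  return c32rtomb (buf, c, &state);
}
```

A caller would typically invoke setlocale (LC_ALL, "") first, so that both helpers operate in the user's locale encoding; without that call, the program runs in the "C" locale, where only a limited character set is guaranteed to convert.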
GNU libunistring can also be used on char32_t values. Since
char32_t is the same as uint32_t, all u32_*
functions of GNU libunistring are applicable to arrays of
char32_t values.
On glibc systems, use of the 32-bit wide strings (char32_t[]) is
exactly as efficient as the use of the older wide strings
(wchar_t[]). This is possible because on glibc, wchar_t
values were already 32 bits wide and always held Unicode code points.
There, mbrtoc32 is just an alias of mbrtowc. The Gnulib
*c32* functions are optimized so that on glibc systems they
immediately redirect to the corresponding *wc* functions.
Gnulib implements the ISO C 23 semantics of char32_t when you
import the ‘uchar-h-c23’ module. Without this module, it implements
only the ISO C 11 semantics; the effect is that on some platforms
(macOS, FreeBSD, NetBSD, Solaris) a char32_t value is the same
as a wchar_t value, not a Unicode code point. Thus, when you
want to pass char32_t values to GNU libunistring or to some
Unicode-centric Gnulib functions, you need the ‘uchar-h-c23’ module in
order to do so without portability problems.