

4.3.1 Iterating over a Unicode string

The following functions inspect and return details about the first character in a Unicode string.

Function: int u8_mblen (const uint8_t *s, size_t n)
Function: int u16_mblen (const uint16_t *s, size_t n)
Function: int u32_mblen (const uint32_t *s, size_t n)

Returns the length (number of units) of the first character in s, where s consists of at most n units. Returns 0 if that character is the NUL character. Returns -1 upon failure.

This function is similar to mblen, except that it operates on a Unicode string and that s must not be NULL.
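As a rough illustration of the three kinds of return value (positive length, 0 for NUL, -1 for failure), the sketch below walks a UTF-32 buffer until it reaches a NUL character or a failure. To stay self-contained it does not link against the library: demo_u32_mblen is a hypothetical stand-in written from the description above, and its handling of n == 0 (returning -1) is an assumption. The helper chars_before_nul is likewise invented for this example. Real code would call u32_mblen from <unistr.h> instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in following the convention described for u32_mblen:
   1 for a valid non-NUL character, 0 for NUL, -1 upon failure.
   Returning -1 when n == 0 is an assumption of this sketch. */
int demo_u32_mblen(const uint32_t *s, size_t n)
{
    if (n == 0)
        return -1;
    uint32_t c = s[0];
    if (c == 0)
        return 0;
    /* Valid code points exclude surrogates and values above 0x10FFFF. */
    if (c < 0xd800 || (c >= 0xe000 && c < 0x110000))
        return 1;
    return -1;
}

/* Counts the characters that precede the first NUL or the first failure.
   *failed is set to 1 if the walk stopped because of a failure. */
size_t chars_before_nul(const uint32_t *s, size_t n, int *failed)
{
    size_t count = 0;
    *failed = 0;
    while (n > 0) {
        int len = demo_u32_mblen(s, n);
        if (len < 0) {          /* invalid unit: stop and report it */
            *failed = 1;
            break;
        }
        if (len == 0)           /* NUL character: end of the string */
            break;
        s += len;
        n -= (size_t) len;
        count++;
    }
    return count;
}
```

The loop tests for 0 and -1 separately because only a positive return value can be used to advance the pointer.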

Function: int u8_mbtouc (ucs4_t *puc, const uint8_t *s, size_t n)
Function: int u16_mbtouc (ucs4_t *puc, const uint16_t *s, size_t n)
Function: int u32_mbtouc (ucs4_t *puc, const uint32_t *s, size_t n)

Returns the length (number of units) of the first character in s, putting its ucs4_t representation in *puc. Upon failure, *puc is set to 0xfffd (U+FFFD, the replacement character), and the number of units to skip is returned, so that the caller can continue with the rest of the string.

The number of available units, n, must be > 0.

This function fails if an invalid sequence of units is encountered at the beginning of s, or if additional units (after the n provided units) would be needed to form a character.

This function is similar to mbtowc, except that it operates on a Unicode string, puc and s must not be NULL, n must be > 0, and the NUL character is not treated specially.
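Because the return value is always a usable number of units, even upon failure, a decoding loop needs no separate error branch. The sketch below shows that pattern on UTF-16. It is self-contained rather than linked against the library: demo_u16_mbtouc is a hypothetical stand-in written to follow the convention described above (including the assumption that a truncated surrogate pair consumes the remaining units), and count_chars is a helper invented for this example. Real code would call u16_mbtouc from <unistr.h> in the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in following the convention described for u16_mbtouc:
   always returns a number of units to advance; sets *puc to 0xfffd on failure. */
int demo_u16_mbtouc(uint32_t *puc, const uint16_t *s, size_t n)
{
    assert(n > 0);                      /* documented precondition */
    uint16_t c = s[0];
    if (c < 0xd800 || c >= 0xe000) {    /* not a surrogate: one unit */
        *puc = c;
        return 1;
    }
    if (c < 0xdc00) {                   /* high surrogate */
        if (n < 2) {                    /* more units would be needed */
            *puc = 0xfffd;
            return (int) n;
        }
        if (s[1] >= 0xdc00 && s[1] < 0xe000) {
            *puc = 0x10000 + ((uint32_t) (c - 0xd800) << 10) + (s[1] - 0xdc00);
            return 2;
        }
    }
    *puc = 0xfffd;                      /* lone or misplaced surrogate */
    return 1;
}

/* Counts the characters in s[0..n-1]; *fffd_count receives how many of them
   decoded to U+FFFD. NUL is counted like any other character. */
size_t count_chars(const uint16_t *s, size_t n, size_t *fffd_count)
{
    size_t count = 0;
    *fffd_count = 0;
    while (n > 0) {
        uint32_t uc;
        int len = demo_u16_mbtouc(&uc, s, n);
        if (uc == 0xfffd)
            (*fffd_count)++;
        s += len;
        n -= (size_t) len;
        count++;
    }
    return count;
}
```

Note that with this convention a genuine U+FFFD in the input cannot be told apart from a decoding failure; a caller that needs that distinction would use the mbtoucr variants.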

Function: int u8_mbtouc_unsafe (ucs4_t *puc, const uint8_t *s, size_t n)
Function: int u16_mbtouc_unsafe (ucs4_t *puc, const uint16_t *s, size_t n)
Function: int u32_mbtouc_unsafe (ucs4_t *puc, const uint32_t *s, size_t n)

This function is identical to u8_mbtouc/u16_mbtouc/u32_mbtouc. Earlier versions of this function performed fewer range-checks on the sequence of units.

Function: int u8_mbtoucr (ucs4_t *puc, const uint8_t *s, size_t n)
Function: int u16_mbtoucr (ucs4_t *puc, const uint16_t *s, size_t n)
Function: int u32_mbtoucr (ucs4_t *puc, const uint32_t *s, size_t n)

Returns the length (number of units) of the first character in s, putting its ucs4_t representation in *puc. Upon failure, *puc is set to 0xfffd, and the return value is -1 for an invalid sequence of units or -2 for an incomplete sequence of units.

The number of available units, n, must be > 0.

This function is similar to u8_mbtouc/u16_mbtouc/u32_mbtouc, except that the return value gives more details about the failure, similar to mbrtowc.
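The distinction between -1 and -2 matters mostly when input arrives in chunks: an incomplete sequence at the end of a chunk is not an error, it only means more units are needed. The sketch below illustrates this for UTF-8. It is self-contained rather than linked against the library: demo_u8_mbtoucr is a hypothetical, simplified stand-in that follows the return convention described above (it may classify some truncated ill-formed inputs as incomplete where the library reports them as invalid), and scan_chunk is a helper invented for this example. Real code would call u8_mbtoucr from <unistr.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in following the convention described for
   u8_mbtoucr: length on success, -1 if invalid, -2 if incomplete. */
int demo_u8_mbtoucr(uint32_t *puc, const uint8_t *s, size_t n)
{
    assert(n > 0);                      /* documented precondition */
    uint8_t c = s[0];
    int len;
    uint32_t uc, min;

    if (c < 0x80) { *puc = c; return 1; }
    else if (c >= 0xc2 && c < 0xe0) { len = 2; uc = c & 0x1f; min = 0x80; }
    else if (c >= 0xe0 && c < 0xf0) { len = 3; uc = c & 0x0f; min = 0x800; }
    else if (c >= 0xf0 && c < 0xf5) { len = 4; uc = c & 0x07; min = 0x10000; }
    else { *puc = 0xfffd; return -1; }  /* byte cannot start a character */

    for (int i = 1; i < len; i++) {
        if ((size_t) i >= n) { *puc = 0xfffd; return -2; }      /* ran out */
        if ((s[i] & 0xc0) != 0x80) { *puc = 0xfffd; return -1; }
        uc = (uc << 6) | (s[i] & 0x3f);
    }
    /* Reject overlong forms, surrogates, and values above 0x10FFFF. */
    if (uc < min || (uc >= 0xd800 && uc < 0xe000) || uc > 0x10ffff) {
        *puc = 0xfffd;
        return -1;
    }
    *puc = uc;
    return len;
}

/* Counts the complete characters in one chunk. *pending receives the number
   of trailing units that form an incomplete character and should be carried
   over to the next chunk. Returns -1 if an invalid sequence is found. */
long scan_chunk(const uint8_t *s, size_t n, size_t *pending)
{
    long count = 0;
    *pending = 0;
    while (n > 0) {
        uint32_t uc;
        int r = demo_u8_mbtoucr(&uc, s, n);
        if (r == -2) { *pending = n; break; }   /* wait for more input */
        if (r == -1) return -1;                 /* genuinely malformed */
        s += r;
        n -= (size_t) r;
        count++;
    }
    return count;
}
```

A caller reading from a stream would prepend the pending units to the next chunk before scanning again, and would treat pending units at end of input as an error.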
