

4.3 Elementary string functions

The following functions inspect and return details about the first character in a Unicode string.

— Function: int u8_mblen (const uint8_t *s, size_t n)
— Function: int u16_mblen (const uint16_t *s, size_t n)
— Function: int u32_mblen (const uint32_t *s, size_t n)

Returns the length (number of units) of the first character in s, examining at most n units. Returns 0 if that character is NUL, and -1 upon failure.

This function is similar to mblen, except that it operates on a Unicode string and that s must not be NULL.
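To make the return convention concrete, here is a minimal from-scratch sketch of the UTF-8 case. The name sketch_u8_mblen is hypothetical and this is not libunistring's implementation. It assumes the usual well-formedness rules for UTF-8 (no overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not part of libunistring: returns the number of
   bytes in the first character of s, 0 for NUL, or -1 if the first n
   bytes do not begin with a well-formed UTF-8 character. */
int sketch_u8_mblen (const uint8_t *s, size_t n)
{
  if (n == 0)
    return -1;
  uint8_t c = s[0];
  if (c < 0x80)
    return c != 0 ? 1 : 0;                 /* ASCII; NUL yields 0 */
  if (c >= 0xc2 && c < 0xe0)
    return (n >= 2 && (s[1] ^ 0x80) < 0x40) ? 2 : -1;
  if (c >= 0xe0 && c < 0xf0)
    return (n >= 3
            && (s[1] ^ 0x80) < 0x40 && (s[2] ^ 0x80) < 0x40
            && (c >= 0xe1 || s[1] >= 0xa0)  /* reject overlong forms */
            && (c != 0xed || s[1] < 0xa0))  /* reject surrogates */
           ? 3 : -1;
  if (c >= 0xf0 && c < 0xf5)
    return (n >= 4
            && (s[1] ^ 0x80) < 0x40 && (s[2] ^ 0x80) < 0x40
            && (s[3] ^ 0x80) < 0x40
            && (c >= 0xf1 || s[1] >= 0x90)  /* reject overlong forms */
            && (c < 0xf4 || s[1] < 0x90))   /* reject values above U+10FFFF */
           ? 4 : -1;
  return -1;                               /* continuation or invalid lead byte */
}
```

With the three-byte encoding of U+20AC (E2 82 AC), this sketch returns 3 for n = 3 and -1 for n = 2, since the character does not fit in the units examined.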

— Function: int u8_mbtouc_unsafe (ucs4_t *puc, const uint8_t *s, size_t n)
— Function: int u16_mbtouc_unsafe (ucs4_t *puc, const uint16_t *s, size_t n)
— Function: int u32_mbtouc_unsafe (ucs4_t *puc, const uint32_t *s, size_t n)

Returns the length (number of units) of the first character in s, putting its ucs4_t representation in *puc. Upon failure, *puc is set to 0xfffd, and an appropriate number of units is returned.

The number of available units, n, must be > 0.

This function is similar to mbtowc, except that it operates on a Unicode string, puc and s must not be NULL, n must be > 0, and the NUL character is not treated specially.
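Because these functions always store a character (U+FFFD on failure) and return a unit count of at least 1, a caller can decode a whole buffer with a simple loop, NUL included. The sketch below shows that pattern with a hypothetical decoder, sketch_u8_mbtouc, which validates its input and, as a simplifying assumption, consumes exactly one unit on failure; the real functions may return a different count. The ucs4_t typedef mirrors the one in libunistring's <unitypes.h>.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ucs4_t;   /* mirrors libunistring's <unitypes.h> */

/* Hypothetical decoder: stores the first character of s in *puc and
   returns its length. On failure it stores U+FFFD and returns 1.
   Requires n > 0; NUL is decoded like any other character. */
int sketch_u8_mbtouc (ucs4_t *puc, const uint8_t *s, size_t n)
{
  uint8_t c = s[0];
  size_t len;
  ucs4_t uc;

  if (c < 0x80)                   { *puc = c; return 1; }
  else if (c >= 0xc2 && c < 0xe0) { len = 2; uc = c & 0x1f; }
  else if (c >= 0xe0 && c < 0xf0) { len = 3; uc = c & 0x0f; }
  else if (c >= 0xf0 && c < 0xf5) { len = 4; uc = c & 0x07; }
  else                            { *puc = 0xfffd; return 1; }

  if (n < len)
    { *puc = 0xfffd; return 1; }          /* truncated sequence */
  for (size_t i = 1; i < len; i++)
    {
      if ((s[i] ^ 0x80) >= 0x40)
        { *puc = 0xfffd; return 1; }      /* not a continuation byte */
      uc = (uc << 6) | (s[i] & 0x3f);
    }
  /* Reject overlong forms, surrogates, and values above U+10FFFF. */
  if ((len == 3 && (uc < 0x800 || (uc >= 0xd800 && uc < 0xe000)))
      || (len == 4 && (uc < 0x10000 || uc > 0x10ffff)))
    { *puc = 0xfffd; return 1; }
  *puc = uc;
  return (int) len;
}

/* Decodes all n bytes at s into out (room for n code points needed) and
   returns the number of code points written. The decoder always returns
   at least 1, so the loop always makes progress. */
size_t decode_all (ucs4_t *out, const uint8_t *s, size_t n)
{
  size_t count = 0;
  while (n > 0)
    {
      int len = sketch_u8_mbtouc (&out[count++], s, n);
      s += len;
      n -= (size_t) len;
    }
  return count;
}
```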

— Function: int u8_mbtouc (ucs4_t *puc, const uint8_t *s, size_t n)
— Function: int u16_mbtouc (ucs4_t *puc, const uint16_t *s, size_t n)
— Function: int u32_mbtouc (ucs4_t *puc, const uint32_t *s, size_t n)

This function is like u8_mbtouc_unsafe (and its u16 and u32 counterparts), except that it detects an invalid character even if the library is compiled without --enable-safety.

— Function: int u8_mbtoucr (ucs4_t *puc, const uint8_t *s, size_t n)
— Function: int u16_mbtoucr (ucs4_t *puc, const uint16_t *s, size_t n)
— Function: int u32_mbtoucr (ucs4_t *puc, const uint32_t *s, size_t n)

Returns the length (number of units) of the first character in s, putting its ucs4_t representation in *puc. Upon failure, *puc is set to 0xfffd, and the return value is -1 for an invalid sequence of units or -2 for an incomplete one.

The number of available units, n, must be > 0.

This function is similar to u8_mbtouc, except that its return value, like that of mbrtowc, gives more detail about the failure.
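The distinction between -1 and -2 matters mostly for streaming input: -2 means the buffer ends in the middle of a character, so the caller can keep those units and retry once more input arrives. The hypothetical sketch_u8_mbtoucr below illustrates that convention for UTF-8; it is written from scratch and is not the library's code.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ucs4_t;   /* mirrors libunistring's <unitypes.h> */

/* Hypothetical sketch: returns the character's length, -1 for an invalid
   sequence, or -2 if the n available bytes are a valid but truncated
   prefix of a character. Requires n > 0. */
int sketch_u8_mbtoucr (ucs4_t *puc, const uint8_t *s, size_t n)
{
  uint8_t c = s[0];
  size_t len;
  ucs4_t uc;

  *puc = 0xfffd;                          /* overwritten on success */
  if (c < 0x80)                   { *puc = c; return 1; }
  else if (c >= 0xc2 && c < 0xe0) { len = 2; uc = c & 0x1f; }
  else if (c >= 0xe0 && c < 0xf0) { len = 3; uc = c & 0x0f; }
  else if (c >= 0xf0 && c < 0xf5) { len = 4; uc = c & 0x07; }
  else                            return -1;

  for (size_t i = 1; i < len; i++)
    {
      if (i >= n)
        return -2;                        /* input ends mid-character */
      uint8_t b = s[i];
      if ((b ^ 0x80) >= 0x40)
        return -1;                        /* not a continuation byte */
      if (i == 1
          && ((c == 0xe0 && b < 0xa0)     /* overlong 3-byte form */
              || (c == 0xed && b >= 0xa0) /* UTF-16 surrogate */
              || (c == 0xf0 && b < 0x90)  /* overlong 4-byte form */
              || (c == 0xf4 && b >= 0x90)))  /* above U+10FFFF */
        return -1;
      uc = (uc << 6) | (b & 0x3f);
    }
  *puc = uc;
  return (int) len;
}
```

For instance, the bytes E2 82 followed by the end of the buffer give -2, whereas E2 41 give -1, because 0x41 can never continue a multibyte character.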

The following function stores a Unicode character as a Unicode string in memory.

— Function: int u8_uctomb (uint8_t *s, ucs4_t uc, int n)
— Function: int u16_uctomb (uint16_t *s, ucs4_t uc, int n)
— Function: int u32_uctomb (uint32_t *s, ucs4_t uc, int n)

Puts the multibyte character represented by uc in s, returning its length. Returns -1 upon failure, or -2 if the number of available units, n, is too small. The latter case cannot occur if n >= 6, 2, or 1 for u8_uctomb, u16_uctomb, or u32_uctomb, respectively.

This function is similar to wctomb, except that it operates on a Unicode string, s must not be NULL, and the argument n must be specified.
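As an illustration of the -1/-2 convention, here is a from-scratch UTF-8 encoder with the same shape as u8_uctomb. The name sketch_u8_uctomb is hypothetical, and treating surrogates and values above U+10FFFF as failures is an assumption about what "failure" covers.

```c
#include <stdint.h>

typedef uint32_t ucs4_t;   /* mirrors libunistring's <unitypes.h> */

/* Hypothetical sketch: writes the UTF-8 encoding of uc to s and returns
   its length, -1 if uc is not a valid scalar value, or -2 if n is too
   small to hold the encoding. */
int sketch_u8_uctomb (uint8_t *s, ucs4_t uc, int n)
{
  static const uint8_t lead[5] = { 0, 0x00, 0xc0, 0xe0, 0xf0 };
  int len;

  if (uc < 0x80)
    len = 1;
  else if (uc < 0x800)
    len = 2;
  else if (uc < 0x10000)
    {
      if (uc >= 0xd800 && uc < 0xe000)
        return -1;                        /* surrogates are not characters */
      len = 3;
    }
  else if (uc < 0x110000)
    len = 4;
  else
    return -1;                            /* beyond U+10FFFF */

  if (n < len)
    return -2;
  /* Fill continuation bytes (10xxxxxx) from the end, 6 bits each. */
  for (int i = len - 1; i > 0; i--)
    {
      s[i] = (uint8_t) (0x80 | (uc & 0x3f));
      uc >>= 6;
    }
  s[0] = (uint8_t) (lead[len] | uc);      /* remaining high bits */
  return len;
}
```

U+20AC encodes as E2 82 AC, so a buffer of n = 2 units is too small and yields -2.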

The following functions copy Unicode strings in memory.

— Function: uint8_t * u8_cpy (uint8_t *dest, const uint8_t *src, size_t n)
— Function: uint16_t * u16_cpy (uint16_t *dest, const uint16_t *src, size_t n)
— Function: uint32_t * u32_cpy (uint32_t *dest, const uint32_t *src, size_t n)

Copies n units from src to dest.

This function is similar to memcpy, except that it operates on Unicode strings.
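The count n is in units, not bytes or characters. A hypothetical sketch_u16_cpy built on memcpy makes the scaling explicit; it assumes the return value is dest, as with memcpy. The sample string holds two characters, U+20AC and U+1F600, but three units, because U+1F600 is stored as a surrogate pair in UTF-16.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of u16_cpy: n counts uint16_t units, so the number
   of bytes copied is n * sizeof (uint16_t). Returns dest (assumed). */
uint16_t *sketch_u16_cpy (uint16_t *dest, const uint16_t *src, size_t n)
{
  memcpy (dest, src, n * sizeof (uint16_t));
  return dest;
}
```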

— Function: uint8_t * u8_move (uint8_t *dest, const uint8_t *src, size_t n)
— Function: uint16_t * u16_move (uint16_t *dest, const uint16_t *src, size_t n)
— Function: uint32_t * u32_move (uint32_t *dest, const uint32_t *src, size_t n)

Copies n units from src to dest, guaranteeing correct behavior for overlapping memory areas.

This function is similar to memmove, except that it operates on Unicode strings.
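Overlapping copies arise naturally when editing a string in place, for example when shifting its tail to make room for an inserted character. The sketch below uses a hypothetical sketch_u32_move built on memmove and a hypothetical helper insert_at; neither is part of libunistring.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t ucs4_t;   /* mirrors libunistring's <unitypes.h> */

/* Hypothetical sketch of u32_move: overlap-safe copy of n units. */
uint32_t *sketch_u32_move (uint32_t *dest, const uint32_t *src, size_t n)
{
  memmove (dest, src, n * sizeof (uint32_t));
  return dest;
}

/* Inserts uc at index pos of a UTF-32 string of len units, shifting the
   tail right by one unit. buf must have room for len + 1 units. Source
   and destination overlap, so a memcpy-style copy would be undefined. */
void insert_at (uint32_t *buf, size_t len, size_t pos, ucs4_t uc)
{
  sketch_u32_move (buf + pos + 1, buf + pos, len - pos);
  buf[pos] = uc;
}
```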

The following function fills a Unicode string.

— Function: uint8_t * u8_set (uint8_t *s, ucs4_t uc, size_t n)
— Function: uint16_t * u16_set (uint16_t *s, ucs4_t uc, size_t n)
— Function: uint32_t * u32_set (uint32_t *s, ucs4_t uc, size_t n)

Sets the first n units of s to uc. Since uc should be a character that occupies only one unit, these are also the first n characters.

This function is similar to memset, except that it operates on Unicode strings.
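In UTF-8 only U+0000 through U+007F occupy a single unit, so u8_set is effectively limited to ASCII fill characters. The hypothetical sketch_u8_set below returns NULL for anything else; that way of handling the precondition is an assumption, not documented library behavior.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t ucs4_t;   /* mirrors libunistring's <unitypes.h> */

/* Hypothetical sketch of u8_set: fills n bytes with a one-byte character.
   Characters that need more than one UTF-8 unit are refused. */
uint8_t *sketch_u8_set (uint8_t *s, ucs4_t uc, size_t n)
{
  if (uc >= 0x80)
    return NULL;
  memset (s, (int) uc, n);
  return s;
}
```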

The following function compares two Unicode strings of the same length.

— Function: int u8_cmp (const uint8_t *s1, const uint8_t *s2, size_t n)
— Function: int u16_cmp (const uint16_t *s1, const uint16_t *s2, size_t n)
— Function: int u32_cmp (const uint32_t *s1, const uint32_t *s2, size_t n)

Compares s1 and s2, each of length n, lexicographically. Returns a negative value if s1 compares smaller than s2, a positive value if s1 compares larger than s2, or 0 if they compare equal.

This function is similar to memcmp, except that it operates on Unicode strings.
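For UTF-8, a byte-wise comparison already orders strings by code point, because the encoding preserves code point order. The hypothetical sketch_u8_cmp below therefore just calls memcmp. This is a comparison of encoded values, not a locale-aware collation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of u8_cmp: byte order equals code point order for
   well-formed UTF-8, so memcmp gives the lexicographic result directly. */
int sketch_u8_cmp (const uint8_t *s1, const uint8_t *s2, size_t n)
{
  return memcmp (s1, s2, n);
}
```

For example, U+00E9 (C3 A9) compares greater than "zz" (7A 7A), matching the code point order U+00E9 > U+007A.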

The following function compares two Unicode strings of possibly different lengths.

— Function: int u8_cmp2 (const uint8_t *s1, size_t n1, const uint8_t *s2, size_t n2)
— Function: int u16_cmp2 (const uint16_t *s1, size_t n1, const uint16_t *s2, size_t n2)
— Function: int u32_cmp2 (const uint32_t *s1, size_t n1, const uint32_t *s2, size_t n2)

Compares s1, of length n1, and s2, of length n2, lexicographically. Returns a negative value if s1 compares smaller than s2, a positive value if s1 compares larger than s2, or 0 if they compare equal.

This function is similar to the gnulib function memcmp2, except that it operates on Unicode strings.
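The rule follows memcmp2: compare the common prefix, and if it is equal, the shorter string sorts first. A hypothetical UTF-8 sketch of that rule, not the library's code:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of u8_cmp2: compares the common prefix, then
   breaks ties by length (a proper prefix sorts first). */
int sketch_u8_cmp2 (const uint8_t *s1, size_t n1,
                    const uint8_t *s2, size_t n2)
{
  int cmp = memcmp (s1, s2, n1 < n2 ? n1 : n2);
  if (cmp == 0)
    cmp = (n1 > n2) - (n1 < n2);   /* -1, 0, or 1 */
  return cmp;
}
```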

The following function searches for a given Unicode character.

— Function: uint8_t * u8_chr (const uint8_t *s, size_t n, ucs4_t uc)
— Function: uint16_t * u16_chr (const uint16_t *s, size_t n, ucs4_t uc)
— Function: uint32_t * u32_chr (const uint32_t *s, size_t n, ucs4_t uc)

Searches the first n units of s for uc. Returns a pointer to the first occurrence of uc in s, or NULL if uc does not occur in s.

This function is similar to memchr, except that it operates on Unicode strings.
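Because uc may occupy several units, a search cannot simply look for a single unit as memchr does. One approach, shown in the hypothetical sketch_u8_chr below, is to encode uc and search for the resulting byte sequence. In well-formed UTF-8 a match of a complete encoded character can only start on a character boundary. This is an illustration, not necessarily how libunistring implements the search.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t ucs4_t;   /* mirrors libunistring's <unitypes.h> */

/* Hypothetical sketch of u8_chr: encodes uc as UTF-8, then scans the
   first n bytes of s for that byte sequence. */
uint8_t *sketch_u8_chr (const uint8_t *s, size_t n, ucs4_t uc)
{
  uint8_t enc[4];
  size_t len;

  if (uc < 0x80)
    { enc[0] = (uint8_t) uc; len = 1; }
  else if (uc < 0x800)
    { enc[0] = (uint8_t) (0xc0 | (uc >> 6)); len = 2; }
  else if (uc < 0x10000)
    { enc[0] = (uint8_t) (0xe0 | (uc >> 12)); len = 3; }
  else if (uc < 0x110000)
    { enc[0] = (uint8_t) (0xf0 | (uc >> 18)); len = 4; }
  else
    return NULL;
  for (size_t i = 1; i < len; i++)   /* continuation bytes, high bits first */
    enc[i] = (uint8_t) (0x80 | ((uc >> (6 * (len - 1 - i))) & 0x3f));

  for (size_t i = 0; i + len <= n; i++)
    if (memcmp (s + i, enc, len) == 0)
      return (uint8_t *) (s + i);     /* cast away const, as memchr does */
  return NULL;
}
```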

The following function counts the number of Unicode characters.

— Function: size_t u8_mbsnlen (const uint8_t *s, size_t n)
— Function: size_t u16_mbsnlen (const uint16_t *s, size_t n)
— Function: size_t u32_mbsnlen (const uint32_t *s, size_t n)

Counts and returns the number of Unicode characters in the first n units of s.

This function is similar to the gnulib function mbsnlen, except that it operates on Unicode strings.
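For well-formed UTF-8, counting characters amounts to counting the bytes that are not continuation bytes (10xxxxxx). The hypothetical sketch_u8_mbsnlen below relies on that property and assumes valid input; it does not model how the library counts invalid sequences.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of u8_mbsnlen for well-formed UTF-8: each character
   has exactly one non-continuation byte, so count those bytes. */
size_t sketch_u8_mbsnlen (const uint8_t *s, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if ((s[i] & 0xc0) != 0x80)
      count++;
  return count;
}
```

For example, "a€b" occupies 5 units in UTF-8 but contains 3 characters.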