

5 Conversions between Unicode and encodings <uniconv.h>

This include file declares functions for converting between Unicode strings and char * strings in locale encoding or in other specified encodings.

The following function returns the locale encoding.

Function: const char * locale_charset ()

Determines the current locale’s character encoding, and canonicalizes it into one of the canonical names listed in localcharset.h. If the canonical name cannot be determined, the result is a non-canonical name.

The result must not be freed; it is statically allocated.

The result of this function can be used as an argument to the iconv_open function in GNU libc, in GNU libiconv, or in the gnulib-provided wrapper around the native iconv_open function. It is not guaranteed to work as an argument to the native iconv_open function directly.

The handling of unconvertible characters during the conversions can be parametrized through the following enumeration type:

Type: enum iconv_ilseq_handler

This type specifies how unconvertible characters in the input are handled.

Constant: enum iconv_ilseq_handler iconveh_error

This handler causes the function to return with errno set to EILSEQ.

Constant: enum iconv_ilseq_handler iconveh_question_mark

This handler produces one question mark ‘?’ per unconvertible character.

Constant: enum iconv_ilseq_handler iconveh_replacement_character

This handler produces one U+FFFD per unconvertible character if that fits in the target encoding, otherwise one question mark ‘?’ per unconvertible character.

Constant: enum iconv_ilseq_handler iconveh_escape_sequence

This handler produces an escape sequence \uxxxx or \Uxxxxxxxx for each unconvertible character.

The following functions convert between strings in a specified encoding and Unicode strings.

Function: uint8_t * u8_conv_from_encoding (const char *fromcode, enum iconv_ilseq_handler handler, const char *src, size_t srclen, size_t *offsets, uint8_t *resultbuf, size_t *lengthp)
Function: uint16_t * u16_conv_from_encoding (const char *fromcode, enum iconv_ilseq_handler handler, const char *src, size_t srclen, size_t *offsets, uint16_t *resultbuf, size_t *lengthp)
Function: uint32_t * u32_conv_from_encoding (const char *fromcode, enum iconv_ilseq_handler handler, const char *src, size_t srclen, size_t *offsets, uint32_t *resultbuf, size_t *lengthp)

Converts an entire string, possibly including NUL bytes, from the encoding fromcode to UTF-8, UTF-16, or UTF-32 encoding, respectively. fromcode is as for the iconv_open function.

The input is in the memory region between src (inclusive) and src + srclen (exclusive).

If offsets is not NULL, it should point to an array of srclen elements of type size_t; this array is filled with offsets into the result, i.e. the character starting at src[i] corresponds to the character starting at result[offsets[i]], and all other offsets are set to (size_t)(-1).

resultbuf and *lengthp should be a scratch buffer and its size, or resultbuf can be NULL.

May erase the contents of the memory at resultbuf.

If successful: The resulting Unicode string (non-NULL) is returned and its length stored in *lengthp. The resulting string is resultbuf if no dynamic memory allocation was necessary, or a freshly allocated memory block otherwise.

In case of error: NULL is returned and errno is set. Particular errno values: EINVAL, EILSEQ, ENOMEM.

Function: char * u8_conv_to_encoding (const char *tocode, enum iconv_ilseq_handler handler, const uint8_t *src, size_t srclen, size_t *offsets, char *resultbuf, size_t *lengthp)
Function: char * u16_conv_to_encoding (const char *tocode, enum iconv_ilseq_handler handler, const uint16_t *src, size_t srclen, size_t *offsets, char *resultbuf, size_t *lengthp)
Function: char * u32_conv_to_encoding (const char *tocode, enum iconv_ilseq_handler handler, const uint32_t *src, size_t srclen, size_t *offsets, char *resultbuf, size_t *lengthp)

Converts an entire Unicode string, possibly including NUL units, from UTF-8, UTF-16, or UTF-32 encoding, respectively, to the encoding tocode. tocode is as for the iconv_open function.

The input is in the memory region between src (inclusive) and src + srclen (exclusive).

If offsets is not NULL, it should point to an array of srclen elements of type size_t; this array is filled with offsets into the result, i.e. the character starting at src[i] corresponds to the character starting at result[offsets[i]], and all other offsets are set to (size_t)(-1).

resultbuf and *lengthp should be a scratch buffer and its size, or resultbuf can be NULL.

May erase the contents of the memory at resultbuf.

If successful: The resulting string in encoding tocode (non-NULL) is returned and its length stored in *lengthp. The resulting string is resultbuf if no dynamic memory allocation was necessary, or a freshly allocated memory block otherwise.

In case of error: NULL is returned and errno is set. Particular errno values: EINVAL, EILSEQ, ENOMEM.

The following functions convert between NUL terminated strings in a specified encoding and NUL terminated Unicode strings.

Function: uint8_t * u8_strconv_from_encoding (const char *string, const char *fromcode, enum iconv_ilseq_handler handler)
Function: uint16_t * u16_strconv_from_encoding (const char *string, const char *fromcode, enum iconv_ilseq_handler handler)
Function: uint32_t * u32_strconv_from_encoding (const char *string, const char *fromcode, enum iconv_ilseq_handler handler)

Converts a NUL terminated string from a given encoding.

The result is malloc allocated, or NULL (with errno set) in case of error.

Particular errno values: EILSEQ, ENOMEM.

Function: char * u8_strconv_to_encoding (const uint8_t *string, const char *tocode, enum iconv_ilseq_handler handler)
Function: char * u16_strconv_to_encoding (const uint16_t *string, const char *tocode, enum iconv_ilseq_handler handler)
Function: char * u32_strconv_to_encoding (const uint32_t *string, const char *tocode, enum iconv_ilseq_handler handler)

Converts a NUL terminated string to a given encoding.

The result is malloc allocated, or NULL (with errno set) in case of error.

Particular errno values: EILSEQ, ENOMEM.

The following functions are shorthands that convert between NUL terminated strings in locale encoding and NUL terminated Unicode strings.

Function: uint8_t * u8_strconv_from_locale (const char *string)
Function: uint16_t * u16_strconv_from_locale (const char *string)
Function: uint32_t * u32_strconv_from_locale (const char *string)

Converts a NUL terminated string from the locale encoding.

The result is malloc allocated, or NULL (with errno set) in case of error.

Particular errno values: ENOMEM.

Function: char * u8_strconv_to_locale (const uint8_t *string)
Function: char * u16_strconv_to_locale (const uint16_t *string)
Function: char * u32_strconv_to_locale (const uint32_t *string)

Converts a NUL terminated string to the locale encoding.

The result is malloc allocated, or NULL (with errno set) in case of error.

Particular errno values: ENOMEM.

