

12.3 Normalization of strings

The Unicode standard defines four normalization forms for Unicode strings. The following type is used to denote a normalization form.

— Type: uninorm_t

An object of type uninorm_t denotes a Unicode normalization form. This is a scalar type; its values can be compared with ==.

The following constants denote the four normalization forms.

— Macro: uninorm_t UNINORM_NFD

Denotes normalization form D: canonical decomposition.

— Macro: uninorm_t UNINORM_NFC

Denotes normalization form C: canonical decomposition, then canonical composition.

— Macro: uninorm_t UNINORM_NFKD

Denotes normalization form KD: compatibility decomposition.

— Macro: uninorm_t UNINORM_NFKC

Denotes normalization form KC: compatibility decomposition, then canonical composition.

The following functions operate on uninorm_t objects.

— Function: bool uninorm_is_compat_decomposing (uninorm_t nf)

Tests whether the normalization form nf performs compatibility decomposition.

— Function: bool uninorm_is_composing (uninorm_t nf)

Tests whether the normalization form nf includes canonical composition.

— Function: uninorm_t uninorm_decomposing_form (uninorm_t nf)

Returns the decomposing variant of the normalization form nf. That is, it maps NFC and NFD to NFD, and NFKC and NFKD to NFKD.

The following functions apply a Unicode normalization form to a Unicode string.

— Function: uint8_t * u8_normalize (uninorm_t nf, const uint8_t *s, size_t n, uint8_t *resultbuf, size_t *lengthp)
— Function: uint16_t * u16_normalize (uninorm_t nf, const uint16_t *s, size_t n, uint16_t *resultbuf, size_t *lengthp)
— Function: uint32_t * u32_normalize (uninorm_t nf, const uint32_t *s, size_t n, uint32_t *resultbuf, size_t *lengthp)

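Returns the normalization form nf of the string s, which consists of n units. These functions follow the library's general convention for the (resultbuf, lengthp) argument pair: if resultbuf is not NULL and the result fits into *lengthp units, the result is stored in resultbuf and resultbuf is returned; otherwise a freshly allocated string is returned. In both cases, *lengthp is set to the length of the result in units. Upon failure, NULL is returned and errno is set.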