

13.1 Decomposition of Unicode characters

The following enumerated values are the possible types of decomposition of a Unicode character.

Constant: int UC_DECOMP_CANONICAL

Denotes canonical decomposition.

Constant: int UC_DECOMP_FONT

UCD marker: <font>. Denotes a font variant (e.g. a blackletter form).

Constant: int UC_DECOMP_NOBREAK

UCD marker: <noBreak>. Denotes a no-break version of a space or hyphen.

Constant: int UC_DECOMP_INITIAL

UCD marker: <initial>. Denotes an initial presentation form (Arabic).

Constant: int UC_DECOMP_MEDIAL

UCD marker: <medial>. Denotes a medial presentation form (Arabic).

Constant: int UC_DECOMP_FINAL

UCD marker: <final>. Denotes a final presentation form (Arabic).

Constant: int UC_DECOMP_ISOLATED

UCD marker: <isolated>. Denotes an isolated presentation form (Arabic).

Constant: int UC_DECOMP_CIRCLE

UCD marker: <circle>. Denotes an encircled form.

Constant: int UC_DECOMP_SUPER

UCD marker: <super>. Denotes a superscript form.

Constant: int UC_DECOMP_SUB

UCD marker: <sub>. Denotes a subscript form.

Constant: int UC_DECOMP_VERTICAL

UCD marker: <vertical>. Denotes a vertical layout presentation form.

Constant: int UC_DECOMP_WIDE

UCD marker: <wide>. Denotes a wide (or zenkaku) compatibility character.

Constant: int UC_DECOMP_NARROW

UCD marker: <narrow>. Denotes a narrow (or hankaku) compatibility character.

Constant: int UC_DECOMP_SMALL

UCD marker: <small>. Denotes a small variant form (CNS compatibility).

Constant: int UC_DECOMP_SQUARE

UCD marker: <square>. Denotes a CJK squared font variant.

Constant: int UC_DECOMP_FRACTION

UCD marker: <fraction>. Denotes a vulgar fraction form.

Constant: int UC_DECOMP_COMPAT

UCD marker: <compat>. Denotes an otherwise unspecified compatibility character.

The following constant denotes the maximum length of the decomposition of a single Unicode character.

Macro: unsigned int UC_DECOMPOSITION_MAX_LENGTH

This macro expands to a constant that is the required size, in ucs4_t elements, of the buffer passed to the uc_decomposition and uc_canonical_decomposition functions.

The following functions decompose a Unicode character.

Function: int uc_decomposition (ucs4_t uc, int *decomp_tag, ucs4_t *decomposition)

Returns the character decomposition mapping of the Unicode character uc. decomposition must point to an array of at least UC_DECOMPOSITION_MAX_LENGTH ucs4_t elements.

When a decomposition exists, decomposition[0..n-1] and *decomp_tag are filled and n is returned. Otherwise -1 is returned.

Function: int uc_canonical_decomposition (ucs4_t uc, ucs4_t *decomposition)

Returns the canonical character decomposition mapping of the Unicode character uc. decomposition must point to an array of at least UC_DECOMPOSITION_MAX_LENGTH ucs4_t elements.

When a decomposition exists, decomposition[0..n-1] is filled and n is returned. Otherwise -1 is returned.

Note: This function returns the (simple) “canonical decomposition” of uc. If you want the “full canonical decomposition” of uc, that is, the recursive application of “canonical decomposition”, use the function u*_normalize with argument UNINORM_NFD instead.

