Decomposition of characters (GNU libunistring)

13.1 Decomposition of Unicode characters

The following enumerated values are the possible types of decomposition of a Unicode character.

Constant: int UC_DECOMP_CANONICAL: Denotes canonical decomposition.

Constant: int UC_DECOMP_FONT: UCD marker: <font>. Denotes a font variant (e.g. a blackletter form).

Constant: int UC_DECOMP_NOBREAK: UCD marker: <noBreak>. Denotes a no-break version of a space or hyphen.

Constant: int UC_DECOMP_INITIAL: UCD marker: <initial>. Denotes an initial presentation form (Arabic).

Constant: int UC_DECOMP_MEDIAL: UCD marker: <medial>. Denotes a medial presentation form (Arabic).

Constant: int UC_DECOMP_FINAL: UCD marker: <final>. Denotes a final presentation form (Arabic).

Constant: int UC_DECOMP_ISOLATED: UCD marker: <isolated>. Denotes an isolated presentation form (Arabic).

Constant: int UC_DECOMP_CIRCLE: UCD marker: <circle>. Denotes an encircled form.

Constant: int UC_DECOMP_SUPER: UCD marker: <super>. Denotes a superscript form.

Constant: int UC_DECOMP_SUB: UCD marker: <sub>. Denotes a subscript form.

Constant: int UC_DECOMP_VERTICAL: UCD marker: <vertical>. Denotes a vertical layout presentation form.

Constant: int UC_DECOMP_WIDE: UCD marker: <wide>. Denotes a wide (or zenkaku) compatibility character.

Constant: int UC_DECOMP_NARROW: UCD marker: <narrow>. Denotes a narrow (or hankaku) compatibility character.

Constant: int UC_DECOMP_SMALL: UCD marker: <small>. Denotes a small variant form (CNS compatibility).

Constant: int UC_DECOMP_SQUARE: UCD marker: <square>. Denotes a CJK squared font variant.

Constant: int UC_DECOMP_FRACTION: UCD marker: <fraction>. Denotes a vulgar fraction form.

Constant: int UC_DECOMP_COMPAT: UCD marker: <compat>. Denotes an otherwise unspecified compatibility character.

The following constant denotes the maximum size of the decomposition of a single Unicode character.

Macro: unsigned int UC_DECOMPOSITION_MAX_LENGTH: This macro expands to a constant that is the required size, in ucs4_t elements, of the buffer passed to the uc_decomposition and uc_canonical_decomposition functions.

The following functions decompose a Unicode character.

Function: int uc_decomposition (ucs4_t uc, int *decomp_tag, ucs4_t *decomposition)

Returns the character decomposition mapping of the Unicode character uc. decomposition must point to an array of at least UC_DECOMPOSITION_MAX_LENGTH ucs4_t elements.

When a decomposition exists, decomposition[0..n-1] and *decomp_tag are filled and n is returned. Otherwise -1 is returned.

Function: int uc_canonical_decomposition (ucs4_t uc, ucs4_t *decomposition)

Returns the canonical character decomposition mapping of the Unicode character uc. decomposition must point to an array of at least UC_DECOMPOSITION_MAX_LENGTH ucs4_t elements.

When a decomposition exists, decomposition[0..n-1] is filled and n is returned. Otherwise -1 is returned.

Note: This function returns the (simple) “canonical decomposition” of uc. If you want the “full canonical decomposition” of uc, that is, the recursive application of “canonical decomposition”, use the function u*_normalize with argument UNINORM_NFD instead.