
The following enumerated values are the possible types of decomposition of a Unicode character.

- Constant:
*int* **UC_DECOMP_CANONICAL**
Denotes canonical decomposition.

- Constant:
*int* **UC_DECOMP_FONT**
UCD marker: `<font>`. Denotes a font variant (e.g. a blackletter form).

- Constant:
*int* **UC_DECOMP_NOBREAK**
UCD marker: `<noBreak>`. Denotes a no-break version of a space or hyphen.

- Constant:
*int* **UC_DECOMP_INITIAL**
UCD marker: `<initial>`. Denotes an initial presentation form (Arabic).

- Constant:
*int* **UC_DECOMP_MEDIAL**
UCD marker: `<medial>`. Denotes a medial presentation form (Arabic).

- Constant:
*int* **UC_DECOMP_FINAL**
UCD marker: `<final>`. Denotes a final presentation form (Arabic).

- Constant:
*int* **UC_DECOMP_ISOLATED**
UCD marker: `<isolated>`. Denotes an isolated presentation form (Arabic).

- Constant:
*int* **UC_DECOMP_CIRCLE**
UCD marker: `<circle>`. Denotes an encircled form.

- Constant:
*int* **UC_DECOMP_SUPER**
UCD marker: `<super>`. Denotes a superscript form.

- Constant:
*int* **UC_DECOMP_SUB**
UCD marker: `<sub>`. Denotes a subscript form.

- Constant:
*int* **UC_DECOMP_VERTICAL**
UCD marker: `<vertical>`. Denotes a vertical layout presentation form.

- Constant:
*int* **UC_DECOMP_WIDE**
UCD marker: `<wide>`. Denotes a wide (or zenkaku) compatibility character.

- Constant:
*int* **UC_DECOMP_NARROW**
UCD marker: `<narrow>`. Denotes a narrow (or hankaku) compatibility character.

- Constant:
*int* **UC_DECOMP_SMALL**
UCD marker: `<small>`. Denotes a small variant form (CNS compatibility).

- Constant:
*int* **UC_DECOMP_SQUARE**
UCD marker: `<square>`. Denotes a CJK squared font variant.

- Constant:
*int* **UC_DECOMP_FRACTION**
UCD marker: `<fraction>`. Denotes a vulgar fraction form.

- Constant:
*int* **UC_DECOMP_COMPAT**
UCD marker: `<compat>`. Denotes an otherwise unspecified compatibility character.

The following constant denotes the maximum size of decomposition of a single Unicode character.

- Macro:
*unsigned int* **UC_DECOMPOSITION_MAX_LENGTH**
This macro expands to a constant that is the required size, in `ucs4_t` elements, of the buffer passed to the `uc_decomposition` and `uc_canonical_decomposition` functions.

The following functions decompose a Unicode character.

- Function:
*int* **uc_decomposition** (`ucs4_t uc, int *decomp_tag, ucs4_t *decomposition`)
Returns the character decomposition mapping of the Unicode character `uc`. `decomposition` must point to an array of at least `UC_DECOMPOSITION_MAX_LENGTH` `ucs4_t` elements.

When a decomposition exists, `decomposition[0..n-1]` and `*decomp_tag` are filled and `n` is returned. Otherwise -1 is returned.

- Function:
*int* **uc_canonical_decomposition** (`ucs4_t uc, ucs4_t *decomposition`)
Returns the canonical character decomposition mapping of the Unicode character `uc`. `decomposition` must point to an array of at least `UC_DECOMPOSITION_MAX_LENGTH` `ucs4_t` elements.

When a decomposition exists, `decomposition[0..n-1]` is filled and `n` is returned. Otherwise -1 is returned.

Note: This function returns the (simple) “canonical decomposition” of `uc`. If you want the “full canonical decomposition” of `uc`, that is, the recursive application of “canonical decomposition”, use the function `u*_normalize` with argument `UNINORM_NFD` instead.