8.2 Canonical combining class

Every Unicode character or code point has a canonical combining class assigned to it.

Essentially, the canonical combining class indicates the priority with which a combining character is attached to its base character. Characters whose canonical combining class is 0 are the base characters, and characters whose class is greater than 0 are the combining characters. Combining characters are rendered near, attached to, or around their base character, and those with smaller combining classes are attached "first", that is, closer to the base character.

The canonical combining class of a character is a number in the range 0..255. The possible values are described in the Unicode Character Database, https://www.unicode.org/Public/UNIDATA/UCD.html. The list of named values below is not definitive; more values can be added in future versions.
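
To make the "priority" idea concrete, the following is a minimal sketch of how a run of combining characters might be put into ascending class order while base characters stay in place, in the spirit of the canonical ordering used by Unicode normalization. The names `toy_ccc` and `reorder_marks` are hypothetical, and the lookup table covers only three code points; it is an illustration, not the library's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup covering only a few code points, for illustration.
   Real code would query a full table, e.g. through uc_combining_class. */
int toy_ccc(uint32_t uc)
{
    switch (uc) {
    case 0x0301: return 230; /* COMBINING ACUTE ACCENT: Above */
    case 0x0323: return 220; /* COMBINING DOT BELOW: Below */
    case 0x0334: return 1;   /* COMBINING TILDE OVERLAY: Overlay */
    default:     return 0;   /* treated as a base character here */
    }
}

/* Stable insertion sort of each run of combining characters (class > 0)
   by ascending class. Characters with class 0 are never moved, and no
   combining character is moved across one. */
void reorder_marks(uint32_t *s, size_t n, int (*ccc)(uint32_t))
{
    for (size_t i = 1; i < n; i++) {
        uint32_t c = s[i];
        int cls = ccc(c);
        if (cls == 0)
            continue;
        size_t j = i;
        /* Move left past marks with a strictly greater class; stop at a
           base character or at a mark with an equal or smaller class. */
        while (j > 0 && ccc(s[j - 1]) > cls) {
            s[j] = s[j - 1];
            j--;
        }
        s[j] = c;
    }
}
```

Taking the lookup as a function pointer keeps the sketch independent of any particular table. Comparing with "strictly greater" keeps marks of equal class in their original relative order.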

Constant: int UC_CCC_NR

The canonical combining class value for “Not Reordered” characters. The value is 0.

Constant: int UC_CCC_OV

The canonical combining class value for “Overlay” characters.

Constant: int UC_CCC_NK

The canonical combining class value for “Nukta” characters.

Constant: int UC_CCC_KV

The canonical combining class value for “Kana Voicing” characters.

Constant: int UC_CCC_VR

The canonical combining class value for “Virama” characters.

Constant: int UC_CCC_ATBL

The canonical combining class value for “Attached Below Left” characters.

Constant: int UC_CCC_ATB

The canonical combining class value for “Attached Below” characters.

Constant: int UC_CCC_ATA

The canonical combining class value for “Attached Above” characters.

Constant: int UC_CCC_ATAR

The canonical combining class value for “Attached Above Right” characters.

Constant: int UC_CCC_BL

The canonical combining class value for “Below Left” characters.

Constant: int UC_CCC_B

The canonical combining class value for “Below” characters.

Constant: int UC_CCC_BR

The canonical combining class value for “Below Right” characters.

Constant: int UC_CCC_L

The canonical combining class value for “Left” characters.

Constant: int UC_CCC_R

The canonical combining class value for “Right” characters.

Constant: int UC_CCC_AL

The canonical combining class value for “Above Left” characters.

Constant: int UC_CCC_A

The canonical combining class value for “Above” characters.

Constant: int UC_CCC_AR

The canonical combining class value for “Above Right” characters.

Constant: int UC_CCC_DB

The canonical combining class value for “Double Below” characters.

Constant: int UC_CCC_DA

The canonical combining class value for “Double Above” characters.

Constant: int UC_CCC_IS

The canonical combining class value for “Iota Subscript” characters.

The following functions associate canonical combining classes with their names.

Function: const char * uc_combining_class_name (int ccc)

Returns the abbreviated name of a canonical combining class. Returns NULL if the canonical combining class is a numeric value without a name.

Function: const char * uc_combining_class_long_name (int ccc)

Returns the long name of a canonical combining class. Returns NULL if the canonical combining class is a numeric value without a name.

Function: int uc_combining_class_byname (const char *ccc_name)

Returns the canonical combining class given by name, e.g. "BL", or by long name, e.g. "Below Left". This lookup ignores spaces, underscores, and hyphens as word separators and is case-insensitive.

The following function looks up the canonical combining class of a character.

Function: int uc_combining_class (ucs4_t uc)

Returns the canonical combining class of a Unicode character.

