

8.1.1 The object-oriented API for general category

Type: uc_general_category_t

This data type denotes a general category value. It is an immediate type that can be copied by simple assignment, without involving memory allocation. It is not an array type.

The following are the predefined general category values. Additional general categories may be added in the future.

The UC_CATEGORY_* constants reflect the systematic general category values assigned by the Unicode Consortium, whereas the other UC_* macros are aliases, for use when readable code is preferred.

Constant: uc_general_category_t UC_CATEGORY_L
Macro: uc_general_category_t UC_LETTER

This represents the general category “Letter”.

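Constant: uc_general_category_t UC_CATEGORY_LC
Macro: uc_general_category_t UC_CASED_LETTER

This represents the general category “Letter, cased”, that is, the union of “Letter, uppercase”, “Letter, lowercase”, and “Letter, titlecase”.
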
Constant: uc_general_category_t UC_CATEGORY_Lu
Macro: uc_general_category_t UC_UPPERCASE_LETTER

This represents the general category “Letter, uppercase”.

Constant: uc_general_category_t UC_CATEGORY_Ll
Macro: uc_general_category_t UC_LOWERCASE_LETTER

This represents the general category “Letter, lowercase”.

Constant: uc_general_category_t UC_CATEGORY_Lt
Macro: uc_general_category_t UC_TITLECASE_LETTER

This represents the general category “Letter, titlecase”.

Constant: uc_general_category_t UC_CATEGORY_Lm
Macro: uc_general_category_t UC_MODIFIER_LETTER

This represents the general category “Letter, modifier”.

Constant: uc_general_category_t UC_CATEGORY_Lo
Macro: uc_general_category_t UC_OTHER_LETTER

This represents the general category “Letter, other”.

Constant: uc_general_category_t UC_CATEGORY_M
Macro: uc_general_category_t UC_MARK

This represents the general category “Mark”.

Constant: uc_general_category_t UC_CATEGORY_Mn
Macro: uc_general_category_t UC_NON_SPACING_MARK

This represents the general category “Mark, nonspacing”.

Constant: uc_general_category_t UC_CATEGORY_Mc
Macro: uc_general_category_t UC_COMBINING_SPACING_MARK

This represents the general category “Mark, spacing combining”.

Constant: uc_general_category_t UC_CATEGORY_Me
Macro: uc_general_category_t UC_ENCLOSING_MARK

This represents the general category “Mark, enclosing”.

Constant: uc_general_category_t UC_CATEGORY_N
Macro: uc_general_category_t UC_NUMBER

This represents the general category “Number”.

Constant: uc_general_category_t UC_CATEGORY_Nd
Macro: uc_general_category_t UC_DECIMAL_DIGIT_NUMBER

This represents the general category “Number, decimal digit”.

Constant: uc_general_category_t UC_CATEGORY_Nl
Macro: uc_general_category_t UC_LETTER_NUMBER

This represents the general category “Number, letter”.

Constant: uc_general_category_t UC_CATEGORY_No
Macro: uc_general_category_t UC_OTHER_NUMBER

This represents the general category “Number, other”.

Constant: uc_general_category_t UC_CATEGORY_P
Macro: uc_general_category_t UC_PUNCTUATION

This represents the general category “Punctuation”.

Constant: uc_general_category_t UC_CATEGORY_Pc
Macro: uc_general_category_t UC_CONNECTOR_PUNCTUATION

This represents the general category “Punctuation, connector”.

Constant: uc_general_category_t UC_CATEGORY_Pd
Macro: uc_general_category_t UC_DASH_PUNCTUATION

This represents the general category “Punctuation, dash”.

Constant: uc_general_category_t UC_CATEGORY_Ps
Macro: uc_general_category_t UC_OPEN_PUNCTUATION

This represents the general category “Punctuation, open”, a.k.a. “start punctuation”.

Constant: uc_general_category_t UC_CATEGORY_Pe
Macro: uc_general_category_t UC_CLOSE_PUNCTUATION

This represents the general category “Punctuation, close”, a.k.a. “end punctuation”.

Constant: uc_general_category_t UC_CATEGORY_Pi
Macro: uc_general_category_t UC_INITIAL_QUOTE_PUNCTUATION

This represents the general category “Punctuation, initial quote”.

Constant: uc_general_category_t UC_CATEGORY_Pf
Macro: uc_general_category_t UC_FINAL_QUOTE_PUNCTUATION

This represents the general category “Punctuation, final quote”.

Constant: uc_general_category_t UC_CATEGORY_Po
Macro: uc_general_category_t UC_OTHER_PUNCTUATION

This represents the general category “Punctuation, other”.

Constant: uc_general_category_t UC_CATEGORY_S
Macro: uc_general_category_t UC_SYMBOL

This represents the general category “Symbol”.

Constant: uc_general_category_t UC_CATEGORY_Sm
Macro: uc_general_category_t UC_MATH_SYMBOL

This represents the general category “Symbol, math”.

Constant: uc_general_category_t UC_CATEGORY_Sc
Macro: uc_general_category_t UC_CURRENCY_SYMBOL

This represents the general category “Symbol, currency”.

Constant: uc_general_category_t UC_CATEGORY_Sk
Macro: uc_general_category_t UC_MODIFIER_SYMBOL

This represents the general category “Symbol, modifier”.

Constant: uc_general_category_t UC_CATEGORY_So
Macro: uc_general_category_t UC_OTHER_SYMBOL

This represents the general category “Symbol, other”.

Constant: uc_general_category_t UC_CATEGORY_Z
Macro: uc_general_category_t UC_SEPARATOR

This represents the general category “Separator”.

Constant: uc_general_category_t UC_CATEGORY_Zs
Macro: uc_general_category_t UC_SPACE_SEPARATOR

This represents the general category “Separator, space”.

Constant: uc_general_category_t UC_CATEGORY_Zl
Macro: uc_general_category_t UC_LINE_SEPARATOR

This represents the general category “Separator, line”.

Constant: uc_general_category_t UC_CATEGORY_Zp
Macro: uc_general_category_t UC_PARAGRAPH_SEPARATOR

This represents the general category “Separator, paragraph”.

Constant: uc_general_category_t UC_CATEGORY_C
Macro: uc_general_category_t UC_OTHER

This represents the general category “Other”.

Constant: uc_general_category_t UC_CATEGORY_Cc
Macro: uc_general_category_t UC_CONTROL

This represents the general category “Other, control”.

Constant: uc_general_category_t UC_CATEGORY_Cf
Macro: uc_general_category_t UC_FORMAT

This represents the general category “Other, format”.

Constant: uc_general_category_t UC_CATEGORY_Cs
Macro: uc_general_category_t UC_SURROGATE

This represents the general category “Other, surrogate”. All code points in this category are invalid characters.

Constant: uc_general_category_t UC_CATEGORY_Co
Macro: uc_general_category_t UC_PRIVATE_USE

This represents the general category “Other, private use”.

Constant: uc_general_category_t UC_CATEGORY_Cn
Macro: uc_general_category_t UC_UNASSIGNED

This represents the general category “Other, not assigned”. Some code points in this category are invalid characters.

The following functions combine general categories, as in a Boolean algebra, except that there is no ‘not’ operation.

Function: uc_general_category_t uc_general_category_or (uc_general_category_t category1, uc_general_category_t category2)

Returns the union of two general categories. This corresponds to the union of the two sets of characters.

Function: uc_general_category_t uc_general_category_and (uc_general_category_t category1, uc_general_category_t category2)

Returns the intersection of two general categories as bit masks. This does not correspond to the intersection of the two sets of characters.

Function: uc_general_category_t uc_general_category_and_not (uc_general_category_t category1, uc_general_category_t category2)

Returns the intersection of a general category with the complement of a second general category, as bit masks. This does not correspond to the intersection with complement, when viewing the categories as sets of characters.

The following functions associate general categories with their name.

Function: const char * uc_general_category_name (uc_general_category_t category)

Returns the name of a general category, more precisely, the abbreviated name. Returns NULL if the general category corresponds to a bit mask that does not have a name.

Function: const char * uc_general_category_long_name (uc_general_category_t category)

Returns the long name of a general category. Returns NULL if the general category corresponds to a bit mask that does not have a name.

Function: uc_general_category_t uc_general_category_byname (const char *category_name)

Returns the general category given by name, e.g. "Lu", or by long name, e.g. "Uppercase Letter". This lookup is case-insensitive and treats spaces, underscores, and hyphens as interchangeable word separators.

The following functions view general categories as sets of Unicode characters.

Function: uc_general_category_t uc_general_category (ucs4_t uc)

Returns the general category of a Unicode character.

This function uses a big table.

Function: bool uc_is_general_category (ucs4_t uc, uc_general_category_t category)

Tests whether a Unicode character belongs to a given category. The category argument can be a predefined general category or the combination of several predefined general categories.

