

8.1.1 The object oriented API for general category

— Type: uc_general_category_t

This data type denotes a general category value. It is an immediate type that can be copied by simple assignment, without involving memory allocation. It is not an array type.

The following are the predefined general category values. Additional general categories may be added in the future.

— Constant: uc_general_category_t UC_CATEGORY_L
— Constant: uc_general_category_t UC_CATEGORY_Lu
— Constant: uc_general_category_t UC_CATEGORY_Ll
— Constant: uc_general_category_t UC_CATEGORY_Lt
— Constant: uc_general_category_t UC_CATEGORY_Lm
— Constant: uc_general_category_t UC_CATEGORY_Lo
— Constant: uc_general_category_t UC_CATEGORY_M
— Constant: uc_general_category_t UC_CATEGORY_Mn
— Constant: uc_general_category_t UC_CATEGORY_Mc
— Constant: uc_general_category_t UC_CATEGORY_Me
— Constant: uc_general_category_t UC_CATEGORY_N
— Constant: uc_general_category_t UC_CATEGORY_Nd
— Constant: uc_general_category_t UC_CATEGORY_Nl
— Constant: uc_general_category_t UC_CATEGORY_No
— Constant: uc_general_category_t UC_CATEGORY_P
— Constant: uc_general_category_t UC_CATEGORY_Pc
— Constant: uc_general_category_t UC_CATEGORY_Pd
— Constant: uc_general_category_t UC_CATEGORY_Ps
— Constant: uc_general_category_t UC_CATEGORY_Pe
— Constant: uc_general_category_t UC_CATEGORY_Pi
— Constant: uc_general_category_t UC_CATEGORY_Pf
— Constant: uc_general_category_t UC_CATEGORY_Po
— Constant: uc_general_category_t UC_CATEGORY_S
— Constant: uc_general_category_t UC_CATEGORY_Sm
— Constant: uc_general_category_t UC_CATEGORY_Sc
— Constant: uc_general_category_t UC_CATEGORY_Sk
— Constant: uc_general_category_t UC_CATEGORY_So
— Constant: uc_general_category_t UC_CATEGORY_Z
— Constant: uc_general_category_t UC_CATEGORY_Zs
— Constant: uc_general_category_t UC_CATEGORY_Zl
— Constant: uc_general_category_t UC_CATEGORY_Zp
— Constant: uc_general_category_t UC_CATEGORY_C
— Constant: uc_general_category_t UC_CATEGORY_Cc
— Constant: uc_general_category_t UC_CATEGORY_Cf
— Constant: uc_general_category_t UC_CATEGORY_Cs
— Constant: uc_general_category_t UC_CATEGORY_Co
— Constant: uc_general_category_t UC_CATEGORY_Cn

The following are alias names for predefined general category values.

— Macro: uc_general_category_t UC_LETTER

This is another name for UC_CATEGORY_L.

— Macro: uc_general_category_t UC_UPPERCASE_LETTER

This is another name for UC_CATEGORY_Lu.

— Macro: uc_general_category_t UC_LOWERCASE_LETTER

This is another name for UC_CATEGORY_Ll.

— Macro: uc_general_category_t UC_TITLECASE_LETTER

This is another name for UC_CATEGORY_Lt.

— Macro: uc_general_category_t UC_MODIFIER_LETTER

This is another name for UC_CATEGORY_Lm.

— Macro: uc_general_category_t UC_OTHER_LETTER

This is another name for UC_CATEGORY_Lo.

— Macro: uc_general_category_t UC_MARK

This is another name for UC_CATEGORY_M.

— Macro: uc_general_category_t UC_NON_SPACING_MARK

This is another name for UC_CATEGORY_Mn.

— Macro: uc_general_category_t UC_COMBINING_SPACING_MARK

This is another name for UC_CATEGORY_Mc.

— Macro: uc_general_category_t UC_ENCLOSING_MARK

This is another name for UC_CATEGORY_Me.

— Macro: uc_general_category_t UC_NUMBER

This is another name for UC_CATEGORY_N.

— Macro: uc_general_category_t UC_DECIMAL_DIGIT_NUMBER

This is another name for UC_CATEGORY_Nd.

— Macro: uc_general_category_t UC_LETTER_NUMBER

This is another name for UC_CATEGORY_Nl.

— Macro: uc_general_category_t UC_OTHER_NUMBER

This is another name for UC_CATEGORY_No.

— Macro: uc_general_category_t UC_PUNCTUATION

This is another name for UC_CATEGORY_P.

— Macro: uc_general_category_t UC_CONNECTOR_PUNCTUATION

This is another name for UC_CATEGORY_Pc.

— Macro: uc_general_category_t UC_DASH_PUNCTUATION

This is another name for UC_CATEGORY_Pd.

— Macro: uc_general_category_t UC_OPEN_PUNCTUATION

This is another name for UC_CATEGORY_Ps (“start punctuation”).

— Macro: uc_general_category_t UC_CLOSE_PUNCTUATION

This is another name for UC_CATEGORY_Pe (“end punctuation”).

— Macro: uc_general_category_t UC_INITIAL_QUOTE_PUNCTUATION

This is another name for UC_CATEGORY_Pi.

— Macro: uc_general_category_t UC_FINAL_QUOTE_PUNCTUATION

This is another name for UC_CATEGORY_Pf.

— Macro: uc_general_category_t UC_OTHER_PUNCTUATION

This is another name for UC_CATEGORY_Po.

— Macro: uc_general_category_t UC_SYMBOL

This is another name for UC_CATEGORY_S.

— Macro: uc_general_category_t UC_MATH_SYMBOL

This is another name for UC_CATEGORY_Sm.

— Macro: uc_general_category_t UC_CURRENCY_SYMBOL

This is another name for UC_CATEGORY_Sc.

— Macro: uc_general_category_t UC_MODIFIER_SYMBOL

This is another name for UC_CATEGORY_Sk.

— Macro: uc_general_category_t UC_OTHER_SYMBOL

This is another name for UC_CATEGORY_So.

— Macro: uc_general_category_t UC_SEPARATOR

This is another name for UC_CATEGORY_Z.

— Macro: uc_general_category_t UC_SPACE_SEPARATOR

This is another name for UC_CATEGORY_Zs.

— Macro: uc_general_category_t UC_LINE_SEPARATOR

This is another name for UC_CATEGORY_Zl.

— Macro: uc_general_category_t UC_PARAGRAPH_SEPARATOR

This is another name for UC_CATEGORY_Zp.

— Macro: uc_general_category_t UC_OTHER

This is another name for UC_CATEGORY_C.

— Macro: uc_general_category_t UC_CONTROL

This is another name for UC_CATEGORY_Cc.

— Macro: uc_general_category_t UC_FORMAT

This is another name for UC_CATEGORY_Cf.

— Macro: uc_general_category_t UC_SURROGATE

This is another name for UC_CATEGORY_Cs. All code points in this category are invalid characters.

— Macro: uc_general_category_t UC_PRIVATE_USE

This is another name for UC_CATEGORY_Co.

— Macro: uc_general_category_t UC_UNASSIGNED

This is another name for UC_CATEGORY_Cn. Some code points in this category are invalid characters.

The following functions combine general categories, as in a Boolean algebra, except that there is no ‘not’ operation.

— Function: uc_general_category_t uc_general_category_or (uc_general_category_t category1, uc_general_category_t category2)

Returns the union of two general categories. This corresponds to the union of the two sets of characters.

— Function: uc_general_category_t uc_general_category_and (uc_general_category_t category1, uc_general_category_t category2)

Returns the intersection of two general categories as bit masks. This does not correspond to the intersection of the two sets of characters.

— Function: uc_general_category_t uc_general_category_and_not (uc_general_category_t category1, uc_general_category_t category2)

Returns the intersection of a general category with the complement of a second general category, as bit masks. This does not correspond to the intersection with the complement when the categories are viewed as sets of characters.

The following functions associate general categories with their names.

— Function: const char * uc_general_category_name (uc_general_category_t category)

Returns the name of a general category. Returns NULL if the general category corresponds to a bit mask that does not have a name.

— Function: uc_general_category_t uc_general_category_byname (const char *category_name)

Returns the general category given by name, e.g. "Lu".

The following functions view general categories as sets of Unicode characters.

— Function: uc_general_category_t uc_general_category (ucs4_t uc)

Returns the general category of a Unicode character.

This function uses a big table.

— Function: bool uc_is_general_category (ucs4_t uc, uc_general_category_t category)

Tests whether a Unicode character belongs to a given category. The category argument can be a predefined general category or the combination of several predefined general categories.