8.1 General category

Every Unicode character or code point has a general category assigned to it. This classification is important for most algorithms that work on Unicode text.

The GNU libunistring library provides two kinds of API for working with general categories. The object-oriented API uses a variable to denote each predefined general category value, or a combination of such values. The low-level API uses a bit mask instead. The advantage of the object-oriented API is that if only a few predefined general category values are used, the data tables are relatively small. When you combine general category values (using uc_general_category_or, uc_general_category_and, or uc_general_category_and_not), or when you use the low-level bit masks, a big table is used that holds the complete general category information for all Unicode characters.
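The set semantics of combining categories can be illustrated with a small, self-contained toy model. This is only a sketch and not libunistring's implementation: every toy_ function and TOY_ constant below is hypothetical, the mask values are made up, and the classifier covers ASCII only. The three combinators merely mirror the roles of uc_general_category_or, uc_general_category_and, and uc_general_category_and_not as union, intersection, and difference of category sets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-bit-per-category masks. The values are illustrative
   and are NOT the constants defined by <unictype.h>. */
enum {
  TOY_MASK_LU = 1 << 0,  /* uppercase letter */
  TOY_MASK_LL = 1 << 1,  /* lowercase letter */
  TOY_MASK_ND = 1 << 2,  /* decimal digit */
  TOY_MASK_ZS = 1 << 3   /* space separator */
};

/* Toy classifier: returns the single category bit of an ASCII
   character, or 0 for anything this sketch does not cover. */
uint32_t toy_category_mask(uint32_t uc)
{
  if (uc >= 'A' && uc <= 'Z') return TOY_MASK_LU;
  if (uc >= 'a' && uc <= 'z') return TOY_MASK_LL;
  if (uc >= '0' && uc <= '9') return TOY_MASK_ND;
  if (uc == ' ') return TOY_MASK_ZS;
  return 0;
}

/* Union: characters that belong to either set of categories. */
uint32_t toy_or(uint32_t a, uint32_t b) { return a | b; }

/* Intersection: categories present in both sets. */
uint32_t toy_and(uint32_t a, uint32_t b) { return a & b; }

/* Difference: categories in a that are not in b. */
uint32_t toy_and_not(uint32_t a, uint32_t b) { return a & ~b; }

/* Membership test: does the character's category bit fall in mask? */
bool toy_is_in(uint32_t uc, uint32_t mask)
{
  return (toy_category_mask(uc) & mask) != 0;
}
```

For example, toy_or(TOY_MASK_LU, TOY_MASK_LL) describes "any cased letter" in this model, and removing TOY_MASK_LU from it with toy_and_not leaves only the lowercase letters. Because any bit combination can be requested, a membership test of this kind needs the category of every character, which is why the text above notes that combined categories and bit masks rely on the complete table.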