10.2 Word break property

This is a lower-level API. The word break property is defined in Unicode Standard Annex #29, section “Word Boundaries”; see http://www.unicode.org/reports/tr29/#Word_Boundaries. It is used to determine the word breaks in a string.

The following are the possible values of the word break property. More values may be added in the future.

— Constant: int WBP_OTHER
— Constant: int WBP_CR
— Constant: int WBP_LF
— Constant: int WBP_NEWLINE
— Constant: int WBP_EXTEND
— Constant: int WBP_FORMAT
— Constant: int WBP_KATAKANA
— Constant: int WBP_ALETTER
— Constant: int WBP_MIDNUMLET
— Constant: int WBP_MIDLETTER
— Constant: int WBP_MIDNUM
— Constant: int WBP_NUMERIC
— Constant: int WBP_EXTENDNUMLET

The following function looks up the word break property of a character.

— Function: int uc_wordbreak_property (ucs4_t uc)

Returns the Word_Break property of the Unicode character uc, as one of the WBP_* constants listed above.