

11.2 Word break property

This is a lower-level API. The word break property is defined in Unicode Standard Annex #29, section “Word Boundaries” (see https://www.unicode.org/reports/tr29/#Word_Boundaries). It is used to determine the word breaks in a string.

The following are the possible values of the word break property. More values may be added in the future.

Constant: int WBP_OTHER
Constant: int WBP_CR
Constant: int WBP_LF
Constant: int WBP_NEWLINE
Constant: int WBP_EXTEND
Constant: int WBP_FORMAT
Constant: int WBP_KATAKANA
Constant: int WBP_ALETTER
Constant: int WBP_MIDNUMLET
Constant: int WBP_MIDLETTER
Constant: int WBP_MIDNUM
Constant: int WBP_NUMERIC
Constant: int WBP_EXTENDNUMLET
Constant: int WBP_RI
Constant: int WBP_DQ
Constant: int WBP_SQ
Constant: int WBP_HL
Constant: int WBP_ZWJ
Constant: int WBP_EB
Constant: int WBP_EM
Constant: int WBP_GAZ
Constant: int WBP_EBG
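As a concrete reference for some of these values, the following is a minimal sketch that classifies ASCII characters. The function ascii_word_break_property and the enum ascii_wbp are hypothetical and not part of libunistring; the enumerators stand in for the corresponding WBP_* constants, whose actual numeric values come from <uniwbrk.h>. The mapping follows the Unicode Word_Break property data for U+0000..U+007F. In Unicode 11 and later, U+0020 SPACE has its own value, WSegSpace, which is not in the list above; like every ASCII character not handled explicitly, the sketch maps it to Other.

```c
/* Hypothetical stand-ins for a subset of the WBP_* constants.
   The real constants are defined in <uniwbrk.h> and need not have
   these numeric values. */
enum ascii_wbp
{
  AWBP_OTHER,
  AWBP_CR,
  AWBP_LF,
  AWBP_NEWLINE,
  AWBP_ALETTER,
  AWBP_NUMERIC,
  AWBP_MIDLETTER,
  AWBP_MIDNUM,
  AWBP_MIDNUMLET,
  AWBP_EXTENDNUMLET,
  AWBP_DQ,
  AWBP_SQ
};

/* Word_Break value of an ASCII character (hypothetical helper).
   Only meaningful for c in 0x00..0x7F; non-ASCII characters need the
   full Unicode table, i.e. uc_wordbreak_property. */
static enum ascii_wbp
ascii_word_break_property (unsigned int c)
{
  if (c == '\r')
    return AWBP_CR;
  if (c == '\n')
    return AWBP_LF;
  if (c == '\v' || c == '\f')              /* U+000B, U+000C */
    return AWBP_NEWLINE;
  if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
    return AWBP_ALETTER;
  if (c >= '0' && c <= '9')
    return AWBP_NUMERIC;
  switch (c)
    {
    case ':':  return AWBP_MIDLETTER;      /* COLON */
    case ',':                              /* COMMA */
    case ';':  return AWBP_MIDNUM;         /* SEMICOLON */
    case '.':  return AWBP_MIDNUMLET;      /* FULL STOP */
    case '_':  return AWBP_EXTENDNUMLET;   /* LOW LINE */
    case '"':  return AWBP_DQ;             /* QUOTATION MARK */
    case '\'': return AWBP_SQ;             /* APOSTROPHE */
    default:   return AWBP_OTHER;          /* includes U+0020 SPACE */
    }
}
```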

The following function looks up the word break property of a character.

Function: int uc_wordbreak_property (ucs4_t uc)

Returns the Word_Break property of the Unicode character uc, as one of the WBP_* values listed above.
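To illustrate how property values such as those returned by uc_wordbreak_property are used to find word breaks, here is a simplified sketch of the UAX #29 boundary rules applied to a sequence of already looked-up values. The enum wb and the functions below are hypothetical stand-ins, not libunistring API. The sketch assumes the input contains no Extend, Format or ZWJ characters (rule WB4 is not applied) and omits the Hebrew_Letter, Katakana, emoji, regional-indicator and WSegSpace rules, so it is not a complete implementation of the algorithm.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the WBP_* constants used below. */
enum wb
{
  WB_OTHER, WB_CR, WB_LF, WB_NEWLINE, WB_ALETTER, WB_NUMERIC,
  WB_MIDLETTER, WB_MIDNUM, WB_MIDNUMLET, WB_SQ, WB_EXTENDNUMLET
};

static bool is_newline_class (enum wb v)
{
  return v == WB_CR || v == WB_LF || v == WB_NEWLINE;
}

/* MidLetter or MidNumLetQ (MidNumLet | Single_Quote), used by WB6/WB7. */
static bool is_mid_letter (enum wb v)
{
  return v == WB_MIDLETTER || v == WB_MIDNUMLET || v == WB_SQ;
}

/* MidNum or MidNumLetQ, used by WB11/WB12. */
static bool is_mid_num (enum wb v)
{
  return v == WB_MIDNUM || v == WB_MIDNUMLET || v == WB_SQ;
}

/* Returns true if a word boundary is allowed before position i of the
   property sequence p[0..n-1].  Rules are tried in UAX #29 order, so
   earlier rules take precedence over later ones. */
static bool wb_break_before (const enum wb *p, size_t n, size_t i)
{
  if (i == 0 || i >= n)
    return true;                                          /* WB1, WB2 */
  enum wb a = p[i - 1], b = p[i];
  if (a == WB_CR && b == WB_LF)
    return false;                                         /* WB3 */
  if (is_newline_class (a) || is_newline_class (b))
    return true;                                          /* WB3a, WB3b */
  if (a == WB_ALETTER && b == WB_ALETTER)
    return false;                                         /* WB5 */
  if (a == WB_ALETTER && is_mid_letter (b)
      && i + 1 < n && p[i + 1] == WB_ALETTER)
    return false;                                         /* WB6 */
  if (i >= 2 && p[i - 2] == WB_ALETTER && is_mid_letter (a)
      && b == WB_ALETTER)
    return false;                                         /* WB7 */
  if ((a == WB_NUMERIC || a == WB_ALETTER) && b == WB_NUMERIC)
    return false;                                         /* WB8, WB9 */
  if (a == WB_NUMERIC && b == WB_ALETTER)
    return false;                                         /* WB10 */
  if (i >= 2 && p[i - 2] == WB_NUMERIC && is_mid_num (a)
      && b == WB_NUMERIC)
    return false;                                         /* WB11 */
  if (a == WB_NUMERIC && is_mid_num (b)
      && i + 1 < n && p[i + 1] == WB_NUMERIC)
    return false;                                         /* WB12 */
  if ((a == WB_ALETTER || a == WB_NUMERIC || a == WB_EXTENDNUMLET)
      && b == WB_EXTENDNUMLET)
    return false;                                         /* WB13a */
  if (a == WB_EXTENDNUMLET && (b == WB_ALETTER || b == WB_NUMERIC))
    return false;                                         /* WB13b */
  return true;                                            /* WB999 */
}
```

For example, the values for "can't" (four ALetter characters with a Single_Quote in between) yield no interior boundary, because WB6 and WB7 keep the apostrophe inside the word, and "3.14" stays together through WB11 and WB12. In practice, libunistring's string-level functions such as u8_wordbreaks perform this work for whole strings; the property lookup is mainly useful when you need to implement custom or tailored rules.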

