MIT/GNU Scheme’s character-set abstraction is used to represent groups of characters, such as the letters or digits. A character set may contain any character. Alternatively, a character set can be treated as a set of code points.
Implementation note: MIT/GNU Scheme allows any “bitless” character to be stored in a character set; operations that accept characters automatically strip their bucky bits.
Returns #t if object is a character set, otherwise it returns #f.
Returns #t if char is in char-set, otherwise it returns #f.
Returns #t if code-point is in char-set, otherwise it returns #f.
Returns a procedure of one argument that returns #t if its argument is a character in char-set, otherwise it returns #f.
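The membership tests described above can be sketched as follows. The procedure headers are not shown in this section, so the names used here (char-set?, char-in-set?, code-point-in-set?, char-set-predicate) are assumptions taken from MIT/GNU Scheme's naming and should be checked against the installed release.

```scheme
;; Assumed names: char-set?, char-in-set?, code-point-in-set?, char-set-predicate.
(define vowels (char-set "aeiou"))   ; a string element contributes its characters

(char-set? vowels)                   ; => #t
(char-set? "aeiou")                  ; => #f, a string is not a character set
(char-in-set? #\e vowels)            ; => #t
(char-in-set? #\z vowels)            ; => #f
(code-point-in-set? #x61 vowels)     ; => #t, #x61 is the code point of #\a

;; Package the membership test as a one-argument procedure.
(define vowel? (char-set-predicate vowels))
(vowel? #\o)                         ; => #t
(vowel? #\k)                         ; => #f
```

A predicate built this way is convenient to pass to higher-order procedures that expect a character test.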
Calls predicate once on each Unicode code point, and returns a character set containing exactly the code points for which predicate returns a true value.
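As a minimal sketch, assuming the procedure described above is named compute-char-set (the header is not shown here), a set of ASCII digits can be built from a predicate on code points. The name ascii-digits is illustrative only.

```scheme
;; Assumed name: compute-char-set. The predicate receives each code point
;; as an exact integer and keeps those in the range #x30..#x39 ("0".."9").
(define ascii-digits
  (compute-char-set
   (lambda (code-point)
     (<= #x30 code-point #x39))))

(char-in-set? #\7 ascii-digits)      ; => #t
(char-in-set? #\a ascii-digits)      ; => #f
```

Because the predicate is called for every Unicode code point, this form suits sets that are computed once and reused.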
The next procedures represent a character set as a code-point list, which is a list of code-point range elements. A code-point range is either a Unicode code point, or a pair (start . end) that specifies a contiguous range of code points. Both start and end must be exact nonnegative integers less than or equal to #x110000, and start must be less than or equal to end. The range specifies all of the code points greater than or equal to start and strictly less than end.
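The half-open convention for ranges is easy to get wrong by one, so here is a small sketch of it. It uses char-set to build the set and assumes code-point-in-set? as the name of the code-point membership test; upper-ascii is an illustrative name.

```scheme
;; A range (start . end) includes start and excludes end.
;; #x41 is "A", #x5A is "Z", and #x5B is "[".
(define upper-ascii (char-set (cons #x41 #x5B)))

(code-point-in-set? #x41 upper-ascii)   ; => #t, start is included
(code-point-in-set? #x5A upper-ascii)   ; => #t, last code point below end
(code-point-in-set? #x5B upper-ascii)   ; => #f, end is excluded
```

This is also why end may be as large as #x110000, one past the highest code point.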
Returns a new character set consisting of the characters specified by elements. The procedure char-set takes these elements as multiple arguments, while char-set* takes them as a single list-valued argument; in all other respects these procedures are identical.
An element can take several forms, each of which specifies one or more characters to include in the resulting character set: a character includes itself; a string includes all of the characters it contains; a character set includes its members; or a code-point range includes the corresponding characters.
In addition, an element may be a symbol from the following table, which represents the characters as shown:
|Name|Unicode character specification|
||Alphabetic = True|
||Alphabetic = True \| Numeric_Type = Decimal|
||Cased = True|
||Lowercase = True|
||Numeric_Type = Decimal|
||General_Category != (Cs \| Cn)|
||Uppercase = True|
||White_Space = True|
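The following sketch mixes the non-symbol element forms (a character, a string, a character set, and a code-point range) and shows that char-set and char-set* differ only in how the elements are passed. It assumes char-set=? and char-in-set? as the names of the equality and membership tests; the variable names are illustrative.

```scheme
(define lower-vowels (char-set "aeiou"))

;; Elements given as separate arguments.
(define a (char-set #\_ "xyz" lower-vowels (cons #x30 #x3A)))

;; The same elements given as one list.
(define b (char-set* (list #\_ "xyz" lower-vowels (cons #x30 #x3A))))

(char-set=? a b)          ; => #t
(char-in-set? #\9 a)      ; => #t, #x39 lies inside the range #x30..#x3A
(char-in-set? #\u a)      ; => #t, contributed by lower-vowels
(char-in-set? #\: a)      ; => #f, #x3A is the excluded end of the range
```

The list-taking form is the natural choice when the elements are computed at run time.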
Returns a code-point list specifying the contents of char-set. The returned list consists of numerically sorted, disjoint, and non-abutting code-point ranges.
Returns #t if char-set-1 and char-set-2 contain exactly the same characters, otherwise it returns #f.
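Since a code-point list is itself a valid list of elements, a character set can be taken apart and rebuilt. The sketch below assumes the names char-set->code-points and char-set=? for the two procedures described above. It does not show the exact printed form of the list.

```scheme
(define cs (char-set "hello" (cons #x30 #x3A)))

;; Decompose into sorted, disjoint ranges, then rebuild from that list.
(define ranges (char-set->code-points cs))
(define rebuilt (char-set* ranges))

(char-set=? cs rebuilt)               ; => #t, the round trip preserves contents
(char-set=? cs (char-set "hello"))    ; => #f, the digits are missing
```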
Returns a character set that’s the inverse of char-set. That is, the returned character set contains exactly those characters that aren’t in char-set.
These procedures compute the respective set union, set intersection, and set difference of their arguments.
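A brief sketch of the set operations, assuming the procedures are named char-set-union, char-set-intersection, char-set-difference, and char-set-invert (the headers are not shown in this section). All variable names are illustrative.

```scheme
(define letters (char-set (cons #x61 #x7B)))   ; "a" through "z"
(define vowels  (char-set "aeiou"))
(define digits  (char-set (cons #x30 #x3A)))   ; "0" through "9"

(define alnum      (char-set-union letters digits))
(define consonants (char-set-difference letters vowels))
(define common     (char-set-intersection letters vowels))
(define non-vowels (char-set-invert vowels))

(char-in-set? #\5 alnum)         ; => #t
(char-in-set? #\b consonants)    ; => #t
(char-in-set? #\e consonants)    ; => #f
(char-set=? common vowels)       ; => #t, every vowel is a letter
(char-in-set? #\e non-vowels)    ; => #f
(char-in-set? #\5 non-vowels)    ; => #t
```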
These procedures correspond to char-set-intersection but take a single argument that’s a list of character sets rather than multiple character-set arguments.
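As a sketch of the list-taking variants, assuming they are named char-set-union* and char-set-intersection* (by analogy with char-set*; the names do not appear in this section):

```scheme
;; Assumed names: char-set-union*, char-set-intersection*.
(define sets
  (list (char-set "abc") (char-set "bcd") (char-set "cde")))

(char-set=? (char-set-union* sets) (char-set "abcde"))      ; => #t
(char-set=? (char-set-intersection* sets) (char-set "c"))   ; => #t
```

These forms avoid an explicit apply when the character sets are already collected in a list.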
These constants are the character sets corresponding to
Returns #t if char-set contains only 8-bit code points (i.e., ISO 8859-1 characters), otherwise it returns #f.
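A minimal sketch of the 8-bit test, assuming the procedure is named 8-bit-char-set? (the header is not shown here). Code points are passed as integers to avoid depending on source-file encoding.

```scheme
;; Assumed name: 8-bit-char-set?.
;; #xE9 (e with acute accent) is below #x100, so it is an ISO 8859-1 character.
(8-bit-char-set? (char-set "abc" #xE9))     ; => #t

;; #x3BB (Greek small letter lambda) is outside the 8-bit range.
(8-bit-char-set? (char-set "abc" #x3BB))    ; => #f
```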