String Comparison (Guile Reference Manual)

Warning: This is the manual of the legacy Guile 2.2 series. You may want to read the manual of the current stable series instead.

6.6.5.7 String Comparison

The procedures in this section are similar to the character ordering predicates (see Characters), but are defined on character sequences.

The first set is specified in R5RS and has names that end in ?. The second set is specified in SRFI-13 and has names that do not end in ?.

The predicates ending in -ci ignore character case when comparing strings. For now, case-insensitive comparison is done using the R5RS rules, where every lower-case character that has a single-character upper-case form is converted to upper case before comparison. See the (ice-9 i18n) module for locale-dependent string comparison.

Scheme Procedure: string=? s1 s2 s3 …

Lexicographic equality predicate; return #t if all strings are the same length and contain the same characters in the same positions, otherwise return #f.

The procedure string-ci=? treats upper and lower case letters as though they were the same character, but string=? treats upper and lower case as distinct characters.
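As a brief illustration, the following sketch contrasts the two equality predicates. The strings are arbitrary sample values chosen for this example, not taken from the manual.

```scheme
;; Every argument must match exactly for string=? to return #t.
(string=? "guile" "guile" "guile")      ; ⇒ #t
(string=? "guile" "Guile")              ; ⇒ #f, #\g and #\G are distinct
(string=? "guile" "guile ")             ; ⇒ #f, the lengths differ

;; string-ci=? treats upper and lower case as the same character.
(string-ci=? "guile" "GUILE" "Guile")   ; ⇒ #t
```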

Scheme Procedure: string<? s1 s2 s3 …: Lexicographic ordering predicate; return #t if, for every pair of consecutive string arguments str_i and str_i+1, str_i is lexicographically less than str_i+1.

Scheme Procedure: string<=? s1 s2 s3 …: Lexicographic ordering predicate; return #t if, for every pair of consecutive string arguments str_i and str_i+1, str_i is lexicographically less than or equal to str_i+1.

Scheme Procedure: string>? s1 s2 s3 …: Lexicographic ordering predicate; return #t if, for every pair of consecutive string arguments str_i and str_i+1, str_i is lexicographically greater than str_i+1.

Scheme Procedure: string>=? s1 s2 s3 …: Lexicographic ordering predicate; return #t if, for every pair of consecutive string arguments str_i and str_i+1, str_i is lexicographically greater than or equal to str_i+1.

Scheme Procedure: string-ci=? s1 s2 s3 …: Case-insensitive string equality predicate; return #t if all strings are the same length and their component characters match (ignoring case) at each position; otherwise return #f.

Scheme Procedure: string-ci<? s1 s2 s3 …: Case-insensitive lexicographic ordering predicate; return #t if, for every pair of consecutive string arguments str_i and str_i+1, str_i is lexicographically less than str_i+1 regardless of case.

Scheme Procedure: string-ci<=? s1 s2 s3 …: Case-insensitive lexicographic ordering predicate; return #t if, for every pair of consecutive string arguments str_i and str_i+1, str_i is lexicographically less than or equal to str_i+1 regardless of case.

Scheme Procedure: string-ci>? s1 s2 s3 …: Case-insensitive lexicographic ordering predicate; return #t if, for every pair of consecutive string arguments str_i and str_i+1, str_i is lexicographically greater than str_i+1 regardless of case.

Scheme Procedure: string-ci>=? s1 s2 s3 …: Case-insensitive lexicographic ordering predicate; return #t if, for every pair of consecutive string arguments str_i and str_i+1, str_i is lexicographically greater than or equal to str_i+1 regardless of case.
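The R5RS ordering predicates above can be sketched with a few sample calls. The strings are illustrative values only; the results follow from comparing the strings character by character.

```scheme
;; Each consecutive pair of arguments must satisfy the relation.
(string<? "apple" "banana" "cherry")    ; ⇒ #t
(string<? "apple" "cherry" "banana")    ; ⇒ #f, "cherry" is not less than "banana"
(string<=? "apple" "apple" "banana")    ; ⇒ #t
(string<? "app" "apple")                ; ⇒ #t, a proper prefix sorts first

;; Case matters for the plain predicates: #\Z precedes #\a.
(string<? "Zebra" "apple")              ; ⇒ #t
(string-ci<? "Zebra" "apple")           ; ⇒ #f
(string-ci>? "Zebra" "apple")           ; ⇒ #t
```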

Scheme Procedure: string-compare s1 s2 proc_lt proc_eq proc_gt [start1 [end1 [start2 [end2]]]]
C Function: scm_string_compare (s1, s2, proc_lt, proc_eq, proc_gt, start1, end1, start2, end2): Apply proc_lt, proc_eq, or proc_gt to the mismatch index, depending on whether s1 is less than, equal to, or greater than s2. The mismatch index is the largest index i such that s1[j] = s2[j] for every 0 <= j < i; that is, i is the first position at which the strings do not match.
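A minimal sketch of string-compare follows. The helper describe-comparison is a hypothetical name introduced only for this example; it passes three procedures that tag the mismatch index with the outcome of the comparison.

```scheme
;; Hypothetical helper: report the outcome together with the mismatch index.
(define (describe-comparison s1 s2)
  (string-compare s1 s2
                  (lambda (i) (list 'less i))      ; called when s1 < s2
                  (lambda (i) (list 'equal i))     ; called when s1 = s2
                  (lambda (i) (list 'greater i)))) ; called when s1 > s2

;; "ap" matches; #\p is less than #\r at index 2.
(describe-comparison "apple" "apricot")   ; ⇒ (less 2)

;; "pea" matches; #\r is greater than #\c at index 3.
(describe-comparison "pear" "peach")      ; ⇒ (greater 3)
```

Only one of the three procedures is called, and its return value becomes the value of the string-compare call.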

Scheme Procedure: string-compare-ci s1 s2 proc_lt proc_eq proc_gt [start1 [end1 [start2 [end2]]]]
C Function: scm_string_compare_ci (s1, s2, proc_lt, proc_eq, proc_gt, start1, end1, start2, end2): Apply proc_lt, proc_eq, or proc_gt to the mismatch index, depending on whether s1 is less than, equal to, or greater than s2. The character comparison is done case-insensitively. The mismatch index is the largest index i such that s1[j] = s2[j], ignoring case, for every 0 <= j < i; that is, i is the first position where the lowercased letters do not match.
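The difference between the two comparison procedures can be sketched as follows. The helper names tag-less, tag-equal, and tag-greater are hypothetical, introduced only for this example.

```scheme
;; Hypothetical helpers that tag the mismatch index with the outcome.
(define (tag-less i)    (list 'less i))
(define (tag-equal i)   (list 'equal i))
(define (tag-greater i) (list 'greater i))

;; Ignoring case, "ap" matches and the first difference is at index 2.
(string-compare-ci "Apple" "aPRICOT" tag-less tag-equal tag-greater)
;; ⇒ (less 2)

;; The case-sensitive variant already differs at index 0 (#\A precedes #\a).
(string-compare "Apple" "aPRICOT" tag-less tag-equal tag-greater)
;; ⇒ (less 0)
```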

Scheme Procedure: string= s1 s2 [start1 [end1 [start2 [end2]]]]
C Function: scm_string_eq (s1, s2, start1, end1, start2, end2): Return #f if s1 and s2 are not equal, a true value otherwise.

Scheme Procedure: string<> s1 s2 [start1 [end1 [start2 [end2]]]]
C Function: scm_string_neq (s1, s2, start1, end1, start2, end2): Return #f if s1 and s2 are equal, a true value otherwise.

Scheme Procedure: string< s1 s2 [start1 [end1 [start2 [end2]]]]
C Function: scm_string_lt (s1, s2, start1, end1, start2, end2): Return #f if s1 is greater than or equal to s2, a true value otherwise.

Scheme Procedure: string> s1 s2 [start1 [end1 [start2 [end2]]]]
C Function: scm_string_gt (s1, s2, start1, end1, start2, end2): Return #f if s1 is less than or equal to s2, a true value otherwise.

Scheme Procedure: string<= s1 s2 [start1 [end1 [start2 [end2]]]]
C Function: scm_string_le (s1, s2, start1, end1, start2, end2): Return #f if s1 is greater than s2, a true value otherwise.

Scheme Procedure: string>= s1 s2 [start1 [end1 [start2 [end2]]]]
C Function: scm_string_ge (s1, s2, start1, end1, start2, end2): Return #f if s1 is less than s2, a true value otherwise.
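The optional start and end arguments restrict the comparison to a substring of each argument. The sketch below uses arbitrary sample strings; because these procedures are documented as returning "a true value" rather than #t, the results are used only as conditions.

```scheme
;; Compare "foo" (s1, indices 0 up to 3) with "foo" (s2, indices 3 up to 6).
(if (string= "foobar" "barfoo" 0 3 3 6) 'same 'different)   ; ⇒ same

;; Without the index arguments, the whole strings are compared.
(if (string<> "foobar" "barfoo") 'different 'same)          ; ⇒ different

;; Ordering predicates: treat the result as a boolean.
(if (string< "abc" "abd") 'before 'not-before)              ; ⇒ before
(if (string>= "abc" "abd") 'not-before 'before)             ; ⇒ before
```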

Scheme Procedure: string-ci= s1 s2 [start1 [end1 [start2 [end2]]]]
C Function: scm_string_ci_eq (s1, s2, start1, end1, start2, end2): Return #f if s1 and s2 are not equal, a true value otherwise. The character comparison is done case-insensitively.

Scheme Procedure: string-ci<> s1 s2 [start1 [end1 [start2 [end2]]]]
C Function: scm_string_ci_neq (s1, s2, start1, end1, start2, end2): Return #f if s1 and s2 are equal, a true value otherwise. The character comparison is done case-insensitively.

Scheme Procedure: string-ci< s1 s2 [start1 [end1 [start2 [end2]]]]
C Function: scm_string_ci_lt (s1, s2, start1, end1, start2, end2): Return #f if s1 is greater than or equal to s2, a true value otherwise. The character comparison is done case-insensitively.

Scheme Procedure: string-ci> s1 s2 [start1 [end1 [start2 [end2]]]]
C Function: scm_string_ci_gt (s1, s2, start1, end1, start2, end2): Return #f if s1 is less than or equal to s2, a true value otherwise. The character comparison is done case-insensitively.

Scheme Procedure: string-ci<= s1 s2 [start1 [end1 [start2 [end2]]]]
C Function: scm_string_ci_le (s1, s2, start1, end1, start2, end2): Return #f if s1 is greater than s2, a true value otherwise. The character comparison is done case-insensitively.

Scheme Procedure: string-ci>= s1 s2 [start1 [end1 [start2 [end2]]]]
C Function: scm_string_ci_ge (s1, s2, start1, end1, start2, end2): Return #f if s1 is less than s2, a true value otherwise. The character comparison is done case-insensitively.
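The case-insensitive SRFI-13 predicates accept the same optional index arguments. A short sketch with arbitrary sample strings, again using the results only as conditions:

```scheme
;; Compare the first five characters of s1 ("Hello") with "HELLO".
(if (string-ci= "Hello, World" "HELLO" 0 5) 'match 'no-match)   ; ⇒ match

;; Ignoring case, "apple" sorts before "BANANA" ...
(if (string-ci< "apple" "BANANA") 'before 'not-before)          ; ⇒ before

;; ... but case-sensitively #\a follows #\B, so string< disagrees.
(if (string< "apple" "BANANA") 'before 'not-before)             ; ⇒ not-before
```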

Scheme Procedure: string-hash s [bound [start [end]]]
C Function: scm_substring_hash (s, bound, start, end): Compute a hash value for s. The optional argument bound is a non-negative exact integer specifying the range of the hash function. A positive value restricts the return value to the range [0,bound).

Scheme Procedure: string-hash-ci s [bound [start [end]]]
C Function: scm_substring_hash_ci (s, bound, start, end): Compute a hash value for s, ignoring case. The optional argument bound is a non-negative exact integer specifying the range of the hash function. A positive value restricts the return value to the range [0,bound).
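The following sketch shows the hash procedures with a bound. The value 100 and the names bound and h are arbitrary choices for this example; the actual hash values are implementation details, so only their properties are checked.

```scheme
(define bound 100)                        ; arbitrary range for this example
(define h (string-hash "guile" bound))

;; The result is an exact integer in the range [0,bound).
(and (integer? h) (exact? h) (<= 0 h) (< h bound))   ; ⇒ #t

;; Equal strings hash to the same value.
(= h (string-hash "guile" bound))                    ; ⇒ #t

;; start and end select a substring: indices 4 up to 9 of "gnu guile".
(= h (string-hash "gnu guile" bound 4 9))            ; ⇒ #t

;; string-hash-ci gives the same value for strings that differ only in case.
(= (string-hash-ci "Guile" bound)
   (string-hash-ci "GUILE" bound))                   ; ⇒ #t
```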

Because the same visual appearance of an abstract Unicode character can be obtained via multiple sequences of Unicode characters, even the case-insensitive string comparison functions described above may return #f when presented with strings containing different representations of the same character. For example, the Unicode character “LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE” can be represented with a single character (U+1E69) or by the character “LATIN SMALL LETTER S” (U+0073) followed by the combining marks “COMBINING DOT BELOW” (U+0323) and “COMBINING DOT ABOVE” (U+0307).

For this reason, it is often desirable to ensure that the strings to be compared use a mutually consistent representation for every character. The Unicode standard defines two methods of normalizing the contents of strings: decomposition, which breaks composite characters into a sequence of constituent characters with an ordering defined by the Unicode Standard; and composition, which performs the converse.

There are two decomposition operations. “Canonical decomposition” produces character sequences that share the same visual appearance as the original characters, while “compatibility decomposition” produces ones whose visual appearances may differ from the originals but which represent the same abstract character.
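The difference between the two decompositions can be sketched with the normalization procedures documented later in this section. The example character, U+FB01 LATIN SMALL LIGATURE FI, is chosen here for illustration; it has a compatibility decomposition but no canonical one.

```scheme
;; U+FB01 is the single-character ligature "fi".
(define ligature (string (integer->char #xfb01)))

;; Canonical decomposition leaves the ligature unchanged.
(string=? (string-normalize-nfd ligature) ligature)   ; ⇒ #t

;; Compatibility decomposition replaces it with the two letters f and i.
(string-normalize-nfkd ligature)                      ; ⇒ "fi"
```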

These operations are encapsulated in the following set of normalization forms:

NFD: Characters are decomposed to their canonical forms.
NFKD: Characters are decomposed to their compatibility forms.
NFC: Characters are decomposed to their canonical forms, then composed.
NFKC: Characters are decomposed to their compatibility forms, then composed.

The functions below put their arguments into one of the forms described above.

Scheme Procedure: string-normalize-nfd s
C Function: scm_string_normalize_nfd (s): Return the NFD normalized form of s.

Scheme Procedure: string-normalize-nfkd s
C Function: scm_string_normalize_nfkd (s): Return the NFKD normalized form of s.

Scheme Procedure: string-normalize-nfc s
C Function: scm_string_normalize_nfc (s): Return the NFC normalized form of s.

Scheme Procedure: string-normalize-nfkc s
C Function: scm_string_normalize_nfkc (s): Return the NFKC normalized form of s.
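The example from the discussion above can be sketched as follows: the two representations of LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE compare as different until both are normalized to the same form. The variable names composed and decomposed are chosen for this example.

```scheme
;; One precomposed character, U+1E69.
(define composed (string (integer->char #x1e69)))

;; The letter s followed by COMBINING DOT BELOW and COMBINING DOT ABOVE.
(define decomposed
  (string #\s (integer->char #x0323) (integer->char #x0307)))

;; Without normalization the strings differ, even ignoring case.
(string=? composed decomposed)                        ; ⇒ #f
(string-ci=? composed decomposed)                     ; ⇒ #f

;; After normalizing both to the same form they compare equal.
(string=? (string-normalize-nfc composed)
          (string-normalize-nfc decomposed))          ; ⇒ #t

;; NFC yields the single character; NFD yields the three-character sequence.
(string-length (string-normalize-nfc decomposed))     ; ⇒ 1
(string-length (string-normalize-nfd composed))       ; ⇒ 3
```

Which form to normalize to matters less than using the same form on both sides of the comparison.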