String Internals

Guile stores each string in memory as a contiguous array of Unicode code points along with an associated set of attributes. If every code point in a string has an integer value between 0 and 255 inclusive, the array uses one byte per code point: the string is stored as ISO-8859-1 (also known as Latin-1). If any code point in the string has an integer value greater than 255, the array uses four bytes per code point: the string is stored as UTF-32.

Conversion between the one-byte-per-code-point and four-bytes-per-code-point representations happens automatically as necessary.
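As a small illustration of the automatic conversion, the sketch below stores a code point above 255 into a string that was narrow until then. It relies on the debugging procedure string-bytes-per-char documented in this section. The variable name s is only for illustration, and the results in the comments assume the representation described here, which may differ in other releases.

```scheme
;; A fresh, mutable string whose code points all fit in one byte.
(define s (make-string 3 #\a))
(string-bytes-per-char s)            ; => 1

;; Storing U+03BB (Greek small letter lambda, code point 955) forces
;; the buffer into the four-bytes-per-code-point form.
(string-set! s 0 (integer->char #x3bb))
(string-bytes-per-char s)            ; => 4
```

The string is built with make-string rather than written as a literal because string literals are read-only and cannot be passed to string-set!.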

No API is provided to set the internal representation of strings; however, there is a pair of procedures available to query it. These are debugging procedures. Using them in production code is discouraged, since the details of Guile’s internal representation of strings may change from release to release.

Scheme Procedure: string-bytes-per-char str
C Function: scm_string_bytes_per_char (str)

Return the number of bytes used to encode a Unicode code point in string str. The result is one or four.
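The following sketch probes the boundary between the two representations, assuming the storage rules described above. The names narrow and wide are hypothetical and chosen only for this example.

```scheme
;; Code point 255 is the largest that fits the one-byte form.
(define narrow (string (integer->char 255)))
;; Code point 256 is the smallest that needs the four-byte form.
(define wide (string (integer->char 256)))

(string-bytes-per-char narrow)                       ; => 1
(string-bytes-per-char wide)                         ; => 4

;; Appending a wide string to a narrow one yields a wide result,
;; since a single code point above 255 is enough to require it.
(string-bytes-per-char (string-append "abc" wide))   ; => 4
```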

Scheme Procedure: %string-dump str
C Function: scm_sys_string_dump (str)

Return an association list containing debugging information for str. The association list has the following entries:

The string itself.

The start index of the string within its stringbuf.

The length of the string.

The parent string if this string is a substring; otherwise #f.

#t if the string is read-only.

A new string containing the characters of this string’s stringbuf.

The number of characters in this stringbuf.

#t if this stringbuf is shared.

#t if this stringbuf’s characters are stored in a 32-bit buffer, or #f if they are stored in an 8-bit buffer.
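One way to inspect these entries at the REPL is to walk the returned association list and print each pair. This is a minimal sketch: show-string-dump is a hypothetical helper, not part of Guile, and no output is shown because the exact entries may change between releases.

```scheme
;; Hypothetical helper: print every entry of the debugging alist,
;; one "key: value" line per entry.
(define (show-string-dump str)
  (for-each (lambda (entry)
              (format #t "~a: ~s~%" (car entry) (cdr entry)))
            (%string-dump str)))

;; Dump a freshly allocated copy of a short string.
(show-string-dump (string-copy "hello"))
```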
