When creating a Scheme string from a C string or when converting a Scheme string to a C string, the concept of character encoding becomes important.
In C, a string is just a sequence of bytes, and the character encoding describes the relation between these bytes and the actual characters that make up the string. For Scheme strings, character encoding is not an issue (most of the time), since in Scheme you usually treat strings as character sequences, not byte sequences.
Converting to C and converting from C each have their own challenges.
When converting from C to Scheme, it is important that the sequence of bytes in the C string be valid with respect to its encoding. ASCII strings, for example, can’t have any bytes greater than 127. An ASCII byte greater than 127 is considered ill-formed and cannot be converted into a Scheme character.
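The ASCII rule above can be checked before handing bytes to a conversion function. The following is a minimal sketch in plain C; the helper name is_well_formed_ascii is hypothetical and not part of Guile.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a Guile function): returns true when every
   byte in buf[0..len) is a valid ASCII code, that is, at most 127. */
static bool is_well_formed_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > 127)
            return false;   /* ill-formed with respect to ASCII */
    }
    return true;
}
```

A caller could use such a check to reject or repair input before conversion, rather than relying on the error raised during conversion.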
Problems can occur in the reverse operation as well. Not all character encodings can hold all possible Scheme characters. Some encodings, like ASCII for example, can only describe a small subset of all possible characters. So, when converting to C, one must first decide what to do with Scheme characters that can’t be represented in the C string.
Converting a Scheme string to a C string will often allocate fresh
memory to hold the result. You must take care that this memory is
properly freed eventually. In many cases, this can be achieved by
using scm_dynwind_free inside an appropriate dynwind context.
See Dynamic Wind.
Creates a new Scheme string that has the same contents as str when interpreted in the character encoding of the current locale.
For scm_from_locale_string, str must be null-terminated.
For scm_from_locale_stringn, len specifies the length of
str in bytes, and str does not need to be null-terminated.
If len is
(size_t)-1, then str does need to be
null-terminated and the real length will be found with strlen.
If the C string is ill-formed, an error will be raised.
Note that these functions should not be used to convert C string
constants, because there is no guarantee that the current locale will
match that of the execution character set, used for string and character
constants. Most modern C compilers use UTF-8 by default, so to convert
C string constants we recommend scm_from_utf8_string.
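scm_take_locale_string and scm_take_locale_stringn are like
scm_from_locale_string and scm_from_locale_stringn,
respectively, but also free str with free eventually.
Thus, you can use these functions when you would free str anyway
immediately after creating the Scheme string. In certain cases, Guile
can then use str directly as its internal representation.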
Returns a C string with the same contents as str in the character
encoding of the current locale. The C string must be freed with
free eventually, maybe by using scm_dynwind_free.
See Dynamic Wind.
For scm_to_locale_string, the returned string is
null-terminated and an error is signalled when str contains
#\nul characters.
For scm_to_locale_stringn and lenp not NULL,
str might contain
#\nul characters and the length of the
returned string in bytes is stored in *lenp. The
returned string will not be null-terminated in this case. If
lenp is NULL, scm_to_locale_stringn behaves like
scm_to_locale_string.
If a character in str cannot be represented in the character encoding of the current locale, the default port conversion strategy is used. See Ports, for more on conversion strategies.
If the conversion strategy is
error, an error will be raised. If it is
substitute, a replacement character, such as a question
mark, will be inserted in its place. If it is
escape, a hex
escape will be inserted in its place.
Puts str as a C string in the current locale encoding into the
memory pointed to by buf. The buffer at buf has room for
max_len bytes and
scm_to_locale_stringbuf will never store
more than that. No terminating
'\0' will be stored.
The return value of
scm_to_locale_stringbuf is the number of
bytes that are needed for all of str, regardless of whether
buf was large enough to hold them. Thus, when the return value
is larger than max_len, only max_len bytes have been
stored and you probably need to try again with a larger buffer.
For most situations, string conversion should occur using the current
locale, such as with the functions above. But there may be cases where
one wants to convert strings from a character encoding other than the
locale’s character encoding. For these cases, the lower-level functions
scm_to_stringn and scm_from_stringn are provided. These
functions should seldom be necessary if one is properly using locales.
scm_t_string_failed_conversion_handler is an enumerated type that can
take one of three values:
SCM_FAILED_CONVERSION_ERROR,
SCM_FAILED_CONVERSION_QUESTION_MARK, and
SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE. They are used to indicate
a strategy for handling characters that cannot be converted to or from a
given character encoding.
SCM_FAILED_CONVERSION_ERROR indicates
that a conversion should throw an error if some characters cannot be
converted.
SCM_FAILED_CONVERSION_QUESTION_MARK indicates that a
conversion should replace unconvertible characters with the question
mark character.
SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE
requests that a conversion should replace an unconvertible character
with an escape sequence.
While all three strategies apply when converting Scheme strings to C,
only SCM_FAILED_CONVERSION_ERROR and
SCM_FAILED_CONVERSION_QUESTION_MARK can be used when converting C
strings to Scheme.
This function returns a newly allocated C string from the Guile string str. The length of the returned string in bytes will be returned in lenp. The character encoding of the C string is passed as the ASCII, null-terminated C string encoding. The handler parameter gives a strategy for dealing with characters that cannot be converted into encoding.
If lenp is
NULL, this function will return a null-terminated C
string. It will throw an error if the string contains a null
character.
The Scheme interface to this function is
string->bytevector, from the
ice-9 iconv module. See Representing Strings as Bytes.
This function returns a Scheme string from the C string str. The
length in bytes of the C string is input as len. The encoding of the C
string is passed as the ASCII, null-terminated C string
encoding.
The handler parameter suggests a strategy for dealing with
unconvertible characters.
The Scheme interface to this function is
bytevector->string.
See Representing Strings as Bytes.
The following conversion functions are provided as a convenience for the most commonly used encodings.
Return a Scheme string from the null-terminated C string str, which is ISO-8859-1-, UTF-8-, or UTF-32-encoded. These functions should be used to convert hard-coded C string constants into Scheme strings.
Return a Scheme string from C string str, which is ISO-8859-1-,
UTF-8-, or UTF-32-encoded, of length len. len is the number
of bytes pointed to by str for
scm_from_latin1_stringn and
scm_from_utf8_stringn; it is the number of elements (code points)
in str in the case of
scm_from_utf32_stringn.
Return a newly allocated, ISO-8859-1-, UTF-8-, or UTF-32-encoded C string
from Scheme string str. An error is thrown when str
cannot be converted to the specified encoding. If lenp is
NULL, the returned C string will be null-terminated, and an error
will be thrown if the C string would otherwise contain null
characters. If lenp is not
NULL, the string is not null-terminated,
and the length of the returned string is returned in lenp. The length
returned is the number of bytes for
scm_to_latin1_stringn and
scm_to_utf8_stringn; it is the number of elements (code points)
for scm_to_utf32_stringn.
It is not often the case, but sometimes when you are dealing with the implementation details of a port, you need to encode and decode strings according to the encoding and conversion strategy of the port. There are some convenience functions for that purpose as well.
These functions are like scm_from_stringn and friends, except they take their
encoding and conversion strategy from a given port object.