6.6.6.6 Interpreting Bytevector Contents as Unicode Strings

Bytevector contents can also be interpreted as Unicode strings encoded in one of the most commonly available encoding formats. See Representing Strings as Bytes, for a more generic interface.

(utf8->string (u8-list->bytevector '(99 97 102 101)))
⇒ "cafe"

(string->utf8 "café") ;; LATIN SMALL LETTER E WITH ACUTE
⇒ #vu8(99 97 102 195 169)

Scheme Procedure: string-utf8-length str
C Function: SCM scm_string_utf8_length (str)
C Function: size_t scm_c_string_utf8_length (str)

Return the number of bytes in the UTF-8 representation of str.
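As a small illustration of the difference between character count and byte count, the following sketch assumes a Guile version that provides string-utf8-length and a source file read as UTF-8. The variable names are chosen for this example only.

```scheme
(use-modules (rnrs bytevectors))

;; "café" has four characters, but é needs two bytes in UTF-8.
(define s "café")

(define char-count (string-length s))        ; 4 characters
(define byte-count (string-utf8-length s))   ; 5 bytes

;; The byte count agrees with the length of the encoded bytevector.
(define encoded-length (bytevector-length (string->utf8 s)))
```

Using string-utf8-length avoids allocating the intermediate bytevector when only the size is needed.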

Scheme Procedure: string->utf8 str
Scheme Procedure: string->utf16 str [endianness]
Scheme Procedure: string->utf32 str [endianness]
C Function: scm_string_to_utf8 (str)
C Function: scm_string_to_utf16 (str, endianness)
C Function: scm_string_to_utf32 (str, endianness)

Return a newly allocated bytevector that contains the UTF-8, UTF-16, or UTF-32 (also known as UCS-4) encoding of str. For UTF-16 and UTF-32, endianness should be the symbol big or little; when omitted, it defaults to big endian.
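The effect of the endianness argument can be sketched with a short ASCII string, where each character's code appears in the last (big endian) or first (little endian) byte of its code unit. The variable names below are illustrative only.

```scheme
(use-modules (rnrs bytevectors))

;; One byte per character for ASCII text.
(define utf8-bytes (string->utf8 "hi"))                 ; #vu8(104 105)

;; Two bytes per character; big endian when endianness is omitted.
(define utf16-default (string->utf16 "hi"))             ; #vu8(0 104 0 105)
(define utf16-little (string->utf16 "hi" 'little))      ; #vu8(104 0 105 0)

;; Four bytes per character.
(define utf32-big (string->utf32 "hi" 'big))            ; #vu8(0 0 0 104 0 0 0 105)
```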

Scheme Procedure: utf8->string utf
Scheme Procedure: utf16->string utf [endianness]
Scheme Procedure: utf32->string utf [endianness]
C Function: scm_utf8_to_string (utf)
C Function: scm_utf16_to_string (utf, endianness)
C Function: scm_utf32_to_string (utf, endianness)

Return a newly allocated string that contains the UTF-8-, UTF-16-, or UTF-32-decoded contents of the bytevector utf. For UTF-16 and UTF-32, endianness should be the symbol big or little; when omitted, it defaults to big endian.
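A minimal round-trip sketch follows. It assumes the decoder is given the same endianness that was used for encoding; the variable names are made up for this example.

```scheme
(use-modules (rnrs bytevectors))

;; Encode as UTF-16 little endian, then decode with the same endianness.
(define le-bytes (string->utf16 "café" 'little))
(define round-trip (utf16->string le-bytes 'little))    ; "café"

;; With endianness omitted, the bytes are read as big endian.
(define from-utf32
  (utf32->string (u8-list->bytevector '(0 0 0 104 0 0 0 105))))  ; "hi"
```

Decoding with a different endianness than the one used for encoding will generally not reproduce the original string, so it is worth passing the argument explicitly whenever the data is not big endian.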