7.6.2.15 Transcoders

The transcoder facilities are exported by (rnrs io ports).

Several different Unicode encoding schemes describe standard ways to encode characters and strings as byte sequences and to decode those sequences. Within this document, a codec is an immutable Scheme object that represents a Unicode or similar encoding scheme.

An end-of-line style is a symbol that, if it is not none, describes how a textual port transcodes representations of line endings.

A transcoder is an immutable Scheme object that combines a codec with an end-of-line style and a method for handling decoding errors. Each transcoder represents some specific bidirectional (but not necessarily lossless), possibly stateful translation between byte sequences and Unicode characters and strings. Every transcoder can operate in the input direction (bytes to characters) or in the output direction (characters to bytes). A transcoder parameter name means that the corresponding argument must be a transcoder.

A binary port is a port that supports binary I/O, does not have an associated transcoder, and does not support textual I/O. A textual port is a port that supports textual I/O and does not support binary I/O. A textual port may or may not have an associated transcoder.

Scheme Procedure: latin-1-codec
Scheme Procedure: utf-8-codec
Scheme Procedure: utf-16-codec

These are predefined codecs for the ISO 8859-1, UTF-8, and UTF-16 encoding schemes.

A call to any of these procedures returns a value that is equal in the sense of eqv? to the result of any other call to the same procedure.
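The identity guarantee above can be checked directly. The following is a minimal sketch, assuming Guile, where the library is loaded with use-modules (an R6RS top-level program would use import instead); the variable names are illustrative only.

```scheme
(use-modules (rnrs io ports))

;; Two separate calls to the same codec procedure.
(define first-call (utf-8-codec))
(define second-call (utf-8-codec))

;; Per the text above, the two results are equal in the sense of eqv?.
(define same-codec? (eqv? first-call second-call))
```

Because codecs compare with eqv?, the result of transcoder-codec can be tested against a predefined codec without any dedicated comparison procedure.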

Scheme Syntax: eol-style eol-style-symbol

eol-style-symbol should be a symbol whose name is one of lf, cr, crlf, nel, crnel, ls, and none.

The form evaluates to the corresponding symbol. If the name of eol-style-symbol is not one of these symbols, the effect and result are implementation-dependent; in particular, the result may be an eol-style symbol acceptable as an eol-style argument to make-transcoder. If it is not acceptable as an eol-style argument to make-transcoder, an exception is raised.
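As a brief illustration, the form is written with a bare identifier rather than a quoted symbol. This sketch assumes Guile's use-modules; the variable names are hypothetical.

```scheme
(use-modules (rnrs io ports))

;; eol-style is syntax: its operand is an unquoted name.
(define windows-style (eol-style crlf))
(define no-conversion (eol-style none))

;; The form evaluates to the corresponding symbol.
(define crlf-is-symbol? (eq? windows-style 'crlf))
(define none-is-symbol? (eq? no-conversion 'none))
```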

All eol-style symbols except none describe a specific line-ending encoding:

lf
    linefeed

cr
    carriage return

crlf
    carriage return, linefeed

nel
    next line

crnel
    carriage return, next line

ls
    line separator

For a textual port whose transcoder has the eol-style symbol none, no conversion occurs. For a textual input port, any eol-style symbol other than none means that all of the above line-ending encodings are recognized and translated into a single linefeed. For a textual output port, none and lf are equivalent. On output, linefeed characters are encoded according to the specified eol-style symbol, and all other characters that participate in possible line endings are encoded as is.
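The input-side rule can be sketched in plain Scheme. The procedure below is a hypothetical helper, not part of (rnrs io ports), and it works on strings rather than ports; it only illustrates which character sequences collapse into a single linefeed when the eol-style is anything other than none.

```scheme
;; Hypothetical illustration of the input rule; not a library procedure.
(define nel (integer->char #x85))     ; next line, U+0085
(define ls  (integer->char #x2028))   ; line separator, U+2028

(define (normalize-line-endings str)
  (let loop ((chars (string->list str)) (acc '()))
    (cond
     ((null? chars) (list->string (reverse acc)))
     ;; cr alone, cr+lf, and cr+nel each count as one line ending.
     ((char=? (car chars) #\return)
      (let ((rest (cdr chars)))
        (if (and (pair? rest)
                 (or (char=? (car rest) #\newline)   ; linefeed, U+000A
                     (char=? (car rest) nel)))
            (loop (cdr rest) (cons #\newline acc))
            (loop rest (cons #\newline acc)))))
     ;; nel and ls on their own are line endings too.
     ((or (char=? (car chars) nel) (char=? (car chars) ls))
      (loop (cdr chars) (cons #\newline acc)))
     ;; Everything else, including a plain linefeed, passes through.
     (else (loop (cdr chars) (cons (car chars) acc))))))

(define normalized (normalize-line-endings "a\r\nb\rc\nd"))  ; "a\nb\nc\nd"
```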

Note: Only the name of eol-style-symbol is significant.

Scheme Procedure: native-eol-style

Returns the default end-of-line style of the underlying platform, e.g., lf on Unix and crlf on Windows.

Condition Type: &i/o-decoding
Scheme Procedure: make-i/o-decoding-error port
Scheme Procedure: i/o-decoding-error? obj

This condition type could be defined by

(define-condition-type &i/o-decoding &i/o-port
  make-i/o-decoding-error i/o-decoding-error?)

An exception with this type is raised when one of the operations for textual input from a port encounters a sequence of bytes that cannot be translated into a character or string by the input direction of the port’s transcoder.

When such an exception is raised, the port’s position is past the invalid encoding.
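A handler for this condition might look like the sketch below. It assumes Guile's use-modules and that the implementation reports invalid input as described above; decode-or-false and strict-utf-8 are hypothetical names introduced for the example.

```scheme
(use-modules (rnrs io ports) (rnrs bytevectors) (rnrs exceptions))

;; A transcoder whose error-handling mode is raise.
(define strict-utf-8
  (make-transcoder (utf-8-codec) (eol-style none) (error-handling-mode raise)))

;; Hypothetical helper: decode bv, or return #f on a decoding error.
(define (decode-or-false bv)
  (guard (e ((i/o-decoding-error? e) #f))
    (bytevector->string bv strict-utf-8)))

;; 104 105 are the UTF-8 bytes of "hi".
(define ok (decode-or-false (u8-list->bytevector '(104 105))))

;; The byte #xFF never occurs in valid UTF-8, so this input is
;; expected to trigger the &i/o-decoding condition.
(define bad (decode-or-false (u8-list->bytevector '(255))))
```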

Condition Type: &i/o-encoding
Scheme Procedure: make-i/o-encoding-error port char
Scheme Procedure: i/o-encoding-error? obj
Scheme Procedure: i/o-encoding-error-char condition

This condition type could be defined by

(define-condition-type &i/o-encoding &i/o-port
  make-i/o-encoding-error i/o-encoding-error?
  (char i/o-encoding-error-char))

An exception with this type is raised when one of the operations for textual output to a port encounters a character that cannot be translated into bytes by the output direction of the port’s transcoder. char is the character that could not be encoded.
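The char field makes it possible to report which character caused the failure. The following is a sketch under the same assumptions (Guile's use-modules, an implementation that signals the condition as described); first-unencodable-char and strict-latin-1 are hypothetical names.

```scheme
(use-modules (rnrs io ports) (rnrs exceptions))

;; A Latin-1 transcoder whose error-handling mode is raise.
(define strict-latin-1
  (make-transcoder (latin-1-codec) (eol-style none) (error-handling-mode raise)))

;; Hypothetical helper: return the character reported by the
;; &i/o-encoding condition, or #f if the whole string encodes cleanly.
(define (first-unencodable-char str)
  (guard (e ((i/o-encoding-error? e) (i/o-encoding-error-char e)))
    (string->bytevector str strict-latin-1)
    #f))

;; U+03BB (Greek small letter lambda) has no Latin-1 encoding.
(define lambda-char (integer->char #x3BB))

(define clean (first-unencodable-char "plain"))
(define offender (first-unencodable-char (string #\a lambda-char)))
```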

Scheme Syntax: error-handling-mode error-handling-mode-symbol

error-handling-mode-symbol should be a symbol whose name is one of ignore, raise, and replace. The form evaluates to the corresponding symbol. If the name of error-handling-mode-symbol is not one of these, the effect and result are implementation-dependent: the result may be an error-handling-mode symbol acceptable as a handling-mode argument to make-transcoder. If it is not acceptable as a handling-mode argument to make-transcoder, an exception is raised.

Note: Only the name of error-handling-mode-symbol is significant.

The error-handling mode of a transcoder specifies the behavior of textual I/O operations in the presence of encoding or decoding errors.

If a textual input operation encounters an invalid or incomplete character encoding, and the error-handling mode is ignore, an appropriate number of bytes of the invalid encoding are ignored and decoding continues with the following bytes.

If the error-handling mode is replace, the replacement character U+FFFD is injected into the data stream, an appropriate number of bytes are ignored, and decoding continues with the following bytes.

If the error-handling mode is raise, an exception with condition type &i/o-decoding is raised.

If a textual output operation encounters a character it cannot encode, and the error-handling mode is ignore, the character is ignored and encoding continues with the next character.

If the error-handling mode is replace, a codec-specific replacement character is emitted by the transcoder, and encoding continues with the next character. The replacement character is U+FFFD for transcoders whose codec is one of the Unicode encodings, but is the ? character for the Latin-1 encoding.

If the error-handling mode is raise, an exception with condition type &i/o-encoding is raised.
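The replace mode can be illustrated in both directions. This is a sketch assuming Guile's use-modules; the names are hypothetical, and the number of replacement characters produced for a given invalid sequence may vary between implementations, so the example only relies on the behavior stated above.

```scheme
(use-modules (rnrs io ports) (rnrs bytevectors))

(define lenient-utf-8
  (make-transcoder (utf-8-codec) (eol-style none) (error-handling-mode replace)))
(define lenient-latin-1
  (make-transcoder (latin-1-codec) (eol-style none) (error-handling-mode replace)))

;; Input direction: #xFF is invalid in UTF-8, so U+FFFD is expected
;; to appear between "a" (97) and "b" (98).
(define decoded
  (bytevector->string (u8-list->bytevector '(97 255 98)) lenient-utf-8))

;; Output direction: U+03BB has no Latin-1 encoding, so the ? character
;; (byte 63) is expected in its place.
(define encoded
  (string->bytevector (string #\a (integer->char #x3BB)) lenient-latin-1))
```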

Scheme Procedure: make-transcoder codec
Scheme Procedure: make-transcoder codec eol-style
Scheme Procedure: make-transcoder codec eol-style handling-mode

codec must be a codec; eol-style, if present, an eol-style symbol; and handling-mode, if present, an error-handling-mode symbol.

eol-style may be omitted, in which case it defaults to the native end-of-line style of the underlying platform. handling-mode may be omitted, in which case it defaults to replace. The result is a transcoder with the behavior specified by its arguments.

Scheme Procedure: native-transcoder

Returns an implementation-dependent transcoder that represents a possibly locale-dependent “native” transcoding.

Scheme Procedure: transcoder-codec transcoder
Scheme Procedure: transcoder-eol-style transcoder
Scheme Procedure: transcoder-error-handling-mode transcoder

These are accessors for transcoder objects; when applied to a transcoder returned by make-transcoder, they return the codec, eol-style, and handling-mode arguments, respectively.
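For example, a transcoder can be built with explicit arguments or with the defaults and then inspected through the accessors. This is a minimal sketch assuming Guile's use-modules; the variable names are illustrative only.

```scheme
(use-modules (rnrs io ports))

;; All three arguments given explicitly.
(define explicit
  (make-transcoder (latin-1-codec) (eol-style crlf) (error-handling-mode raise)))

(define explicit-codec-ok?
  (eqv? (transcoder-codec explicit) (latin-1-codec)))
(define explicit-eol (transcoder-eol-style explicit))              ; crlf
(define explicit-mode (transcoder-error-handling-mode explicit))   ; raise

;; Only the codec given: the other two fields take their defaults,
;; the native end-of-line style and replace.
(define defaulted (make-transcoder (utf-8-codec)))

(define default-eol-is-native?
  (eq? (transcoder-eol-style defaulted) (native-eol-style)))
(define default-mode (transcoder-error-handling-mode defaulted))   ; replace
```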

Scheme Procedure: bytevector->string bytevector transcoder

Returns the string that results from transcoding the bytevector according to the input direction of the transcoder.

Scheme Procedure: string->bytevector string transcoder

Returns the bytevector that results from transcoding the string according to the output direction of the transcoder.
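The two conversion procedures are inverses of each other for text that the codec can represent. The sketch below assumes Guile's use-modules and uses an eol-style of none so that no line-ending conversion is involved; the variable names are hypothetical.

```scheme
(use-modules (rnrs io ports) (rnrs bytevectors))

(define utf-8 (make-transcoder (utf-8-codec) (eol-style none)))

;; A two-character string: U+03BB (Greek small letter lambda), then "y".
(define text (string (integer->char #x3BB) #\y))

;; Output direction: characters to bytes.
;; U+03BB is encoded in UTF-8 as the two bytes #xCE #xBB.
(define bytes (string->bytevector text utf-8))
(define byte-list (bytevector->u8-list bytes))   ; (206 187 121)

;; Input direction: bytes back to characters.
(define round-trip (bytevector->string bytes utf-8))
(define round-trip-ok? (string=? round-trip text))
```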