Several different Unicode encoding schemes describe standard ways to encode characters and strings as byte sequences and to decode those sequences. Within this document, a codec is an immutable Scheme object that represents a Unicode or similar encoding scheme.
An end-of-line style is a symbol that, if it is not none,
describes how a textual port transcodes representations of line endings.
A transcoder is an immutable Scheme object that combines a codec with an end-of-line style and a method for handling decoding errors. Each transcoder represents some specific bidirectional (but not necessarily lossless), possibly stateful translation between byte sequences and Unicode characters and strings. Every transcoder can operate in the input direction (bytes to characters) or in the output direction (characters to bytes). A transcoder parameter name means that the corresponding argument must be a transcoder.
A binary port is a port that supports binary I/O, does not have an associated transcoder and does not support textual I/O. A textual port is a port that supports textual I/O, and does not support binary I/O. A textual port may or may not have an associated transcoder.
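The distinction between binary and textual ports can be sketched as follows. This is an illustrative example, not part of the specification text: it assumes the R6RS `(rnrs)` composite library, and it uses `open-bytevector-input-port`, `binary-port?`, `textual-port?`, and `port-transcoder` from `(rnrs io ports)`, none of which are described in this section.

```scheme
(import (rnrs))

(define bytes (u8-list->bytevector '(104 105)))   ; the bytes of "hi" in UTF-8

;; No transcoder argument: the result is a binary port.
(define bin (open-bytevector-input-port bytes))

;; With a transcoder argument: the result is a textual port.
(define txt
  (open-bytevector-input-port bytes (make-transcoder (utf-8-codec))))

;; (binary-port? textual-port? port-transcoder) for the binary port
(define bin-kind
  (list (binary-port? bin) (textual-port? bin) (port-transcoder bin)))

;; (binary-port? textual-port?) for the textual port
(define txt-kind
  (list (binary-port? txt) (textual-port? txt)))
```

Under the definitions above, `bin-kind` should be `(#t #f #f)` and `txt-kind` should be `(#f #t)`.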
These are predefined codecs for the ISO 8859-1, UTF-8, and UTF-16 encoding schemes.
A call to any of these procedures returns a value that is equal in the sense of
eqv? to the result of any other call to the same procedure.
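As a small sketch of this guarantee, assuming the three procedures carry their R6RS names `latin-1-codec`, `utf-8-codec`, and `utf-16-codec` (the names do not appear in the text above):

```scheme
(import (rnrs))

;; Repeated calls to the same codec procedure yield eqv? results.
(define same-latin-1? (eqv? (latin-1-codec) (latin-1-codec)))
(define same-utf-8?   (eqv? (utf-8-codec)   (utf-8-codec)))
(define same-utf-16?  (eqv? (utf-16-codec)  (utf-16-codec)))
```

One consequence is that a codec obtained from a transcoder can be compared against a fresh call with `eqv?` to find out which encoding scheme it represents.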
eol-style-symbol should be a symbol whose name is one of
lf, cr, crlf, nel, crnel, ls, and none. The form evaluates to the corresponding symbol. If the name of eol-style-symbol is not one of these symbols, the effect and result are implementation-dependent; in particular, the result may be an eol-style symbol acceptable as an eol-style argument to
make-transcoder. Otherwise, an exception is raised.

All eol-style symbols except
none describe a specific line-ending encoding:
lf - linefeed
cr - carriage return
crlf - carriage return, linefeed
nel - next line
crnel - carriage return, next line
ls - line separator
For a textual port with a transcoder, and whose transcoder has an eol-style symbol
none, no conversion occurs. For a textual input port, any eol-style symbol other than none means that all of the above line-ending encodings are recognized and are translated into a single linefeed. For a textual output port, none and lf are equivalent. Linefeed characters are encoded according to the specified eol-style symbol, and all other characters that participate in possible line endings are encoded as is.

Note: Only the name of eol-style-symbol is significant.
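The two directions of line-ending translation can be illustrated with a short sketch. It assumes the `(rnrs)` library and uses `bytevector->string` and `string->bytevector` from `(rnrs io ports)`, which are not described in this section, as a convenient way to run a transcoder over a whole bytevector or string.

```scheme
(import (rnrs))

;; The eol-style form evaluates to the symbol it names.
(define style (eol-style crlf))

(define tx (make-transcoder (utf-8-codec) style))

;; Input direction: the CR LF pair (bytes 13 10) should be read
;; as a single linefeed character.
(define decoded
  (bytevector->string (u8-list->bytevector '(97 13 10 98)) tx))

;; Output direction: each linefeed should be written as CR LF.
(define encoded
  (bytevector->u8-list (string->bytevector "a\nb" tx)))
```

Following the rules stated above, `decoded` is expected to be the three-character string `"a\nb"` and `encoded` the byte list `(97 13 10 98)`.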
Returns the default end-of-line style of the underlying platform, e.g.,
lf on Unix and crlf on Windows.
This condition type could be defined by

  (define-condition-type &i/o-decoding &i/o-port
    make-i/o-decoding-error i/o-decoding-error?)

An exception with this type is raised when one of the operations for textual input from a port encounters a sequence of bytes that cannot be translated into a character or string by the input direction of the port's transcoder.
When such an exception is raised, the port's position is past the invalid encoding.
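A minimal sketch of catching a decoding error, assuming the `(rnrs)` library and using `bytevector->string` from `(rnrs io ports)` to drive the decoder. The byte `#xFF` is chosen because it cannot occur in well-formed UTF-8.

```scheme
(import (rnrs))

;; A transcoder that raises on malformed input instead of repairing it.
(define strict
  (make-transcoder (utf-8-codec)
                   (eol-style none)
                   (error-handling-mode raise)))

;; "hi" followed by a byte that is invalid in UTF-8.
(define bad-bytes (u8-list->bytevector '(104 105 255)))

;; guard intercepts the &i/o-decoding condition and maps it to a symbol.
(define outcome
  (guard (e ((i/o-decoding-error? e) 'decoding-error))
    (bytevector->string bad-bytes strict)))
```

With the raise mode in effect, `outcome` is expected to be the symbol `decoding-error` rather than a string.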
This condition type could be defined by

  (define-condition-type &i/o-encoding &i/o-port
    make-i/o-encoding-error i/o-encoding-error?
    (char i/o-encoding-error-char))

An exception with this type is raised when one of the operations for textual output to a port encounters a character that cannot be translated into bytes by the output direction of the port's transcoder. Char is the character that could not be encoded.
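The encoding case can be sketched in the same way. This example assumes the `(rnrs)` library, the R6RS name `latin-1-codec`, and `string->bytevector` from `(rnrs io ports)`. U+03BB (Greek small letter lambda) is used because it has no Latin-1 encoding.

```scheme
(import (rnrs))

(define strict-latin-1
  (make-transcoder (latin-1-codec)
                   (eol-style none)
                   (error-handling-mode raise)))

;; Try to encode "a" followed by U+03BB; the handler extracts the
;; offending character from the condition object.
(define offending
  (guard (e ((i/o-encoding-error? e) (i/o-encoding-error-char e)))
    (string->bytevector "a\x3BB;" strict-latin-1)
    #f))   ; reached only if no exception is raised
```

Here `offending` is expected to be the character U+03BB, the value carried in the condition's char field.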
error-handling-mode-symbol should be a symbol whose name is one of
ignore, raise, and replace. The form evaluates to the corresponding symbol. If error-handling-mode-symbol is not one of these identifiers, effect and result are implementation-dependent: The result may be an error-handling-mode symbol acceptable as a handling-mode argument to make-transcoder. If it is not acceptable as a handling-mode argument to make-transcoder, an exception is raised.

Note: Only the name of error-handling-mode-symbol is significant.

The error-handling mode of a transcoder specifies the behavior of textual I/O operations in the presence of encoding or decoding errors.
If a textual input operation encounters an invalid or incomplete character encoding, and the error-handling mode is
ignore, an appropriate number of bytes of the invalid encoding are ignored and decoding continues with the following bytes.

If the error-handling mode is
replace, the replacement character U+FFFD is injected into the data stream, an appropriate number of bytes are ignored, and decoding continues with the following bytes.

If the error-handling mode is
raise, an exception with condition type &i/o-decoding is raised.

If a textual output operation encounters a character it cannot encode, and the error-handling mode is
ignore, the character is ignored and encoding continues with the next character. If the error-handling mode is replace, a codec-specific replacement character is emitted by the transcoder, and encoding continues with the next character. The replacement character is U+FFFD for transcoders whose codec is one of the Unicode encodings, but is the ? character for the Latin-1 encoding. If the error-handling mode is raise, an exception with condition type &i/o-encoding is raised.
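The ignore and replace modes can be compared side by side. This is a sketch under the assumptions that the `(rnrs)` library is available and that the implementation skips exactly the one invalid byte; the helper `utf-8-with` is a hypothetical name introduced for illustration.

```scheme
(import (rnrs))

;; Hypothetical helper: a UTF-8 transcoder with the given handling mode.
(define (utf-8-with mode)
  (make-transcoder (utf-8-codec) (eol-style none) mode))

;; "h", an invalid byte (#xFF), then "i".
(define bad-bytes (u8-list->bytevector '(104 255 105)))

;; Input direction: drop the bad byte, or substitute U+FFFD for it.
(define ignored
  (bytevector->string bad-bytes (utf-8-with (error-handling-mode ignore))))
(define replaced
  (bytevector->string bad-bytes (utf-8-with (error-handling-mode replace))))

;; Output direction: Latin-1 cannot encode U+03BB, so replace mode
;; should emit the ? character (byte 63) in its place.
(define latin-1-replace
  (make-transcoder (latin-1-codec)
                   (eol-style none)
                   (error-handling-mode replace)))
(define encoded
  (bytevector->u8-list (string->bytevector "a\x3BB;" latin-1-replace)))
```

By the rules above, `ignored` should be `"hi"`, `replaced` should be `"h"`, U+FFFD, `"i"`, and `encoded` should be `(97 63)`.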
codec must be a codec; eol-style, if present, an eol-style symbol; and handling-mode, if present, an error-handling-mode symbol.
eol-style may be omitted, in which case it defaults to the native end-of-line style of the underlying platform. Handling-mode may be omitted, in which case it defaults to
replace. The result is a transcoder with the behavior specified by its arguments.
Returns an implementation-dependent transcoder that represents a possibly locale-dependent “native” transcoding.
These are accessors for transcoder objects; when applied to a transcoder returned by
make-transcoder, they return the codec, eol-style, and handling-mode arguments, respectively.
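The constructor defaults and the accessors can be checked together. This sketch assumes the accessors carry their R6RS names `transcoder-codec`, `transcoder-eol-style`, and `transcoder-error-handling-mode`, and that the platform default is obtained with `native-eol-style`; these names do not appear in the text above.

```scheme
(import (rnrs))

;; All three arguments supplied explicitly.
(define explicit
  (make-transcoder (latin-1-codec)
                   (eol-style lf)
                   (error-handling-mode raise)))

;; Only the codec supplied; eol-style and handling-mode take defaults.
(define defaulted (make-transcoder (utf-8-codec)))

;; The accessors return the arguments given to make-transcoder.
(define explicit-parts
  (list (eqv? (transcoder-codec explicit) (latin-1-codec))
        (transcoder-eol-style explicit)
        (transcoder-error-handling-mode explicit)))

;; The defaults are the native end-of-line style and the replace mode.
(define defaulted-parts
  (list (eq? (transcoder-eol-style defaulted) (native-eol-style))
        (transcoder-error-handling-mode defaulted)))
```

Given the defaults described above, `explicit-parts` should be `(#t lf raise)` and `defaulted-parts` should be `(#t replace)`.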