@documentencoding enc: Set Input Encoding

In the default case, the input and output document encoding are assumed to be UTF-8, the vast global character encoding, expressed in 8-bit bytes. UTF-8 is compatible with 7-bit ASCII. It is recommended to use UTF-8 encoding for Texinfo manuals.
The @documentencoding command declares the input document encoding, and also affects the encoding of the output. If your document encoding is not the default, write it on a line by itself near the beginning of the file, with a valid encoding specification following:
@documentencoding enc
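For example, a manual written in a legacy encoding might begin like this (the title is illustrative):

```texinfo
\input texinfo
@documentencoding ISO-8859-1
@settitle Sample Manual
```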
UTF-8 should always be the best choice for the encoding. Texinfo still supports additional encodings, mainly for compatibility with older manuals:
US-ASCII
Character encoding based on the English alphabet.
ISO-8859-1
ISO-8859-15
ISO-8859-2
These specify the pre-UTF-8 standard encodings for Western European (the first two) and Eastern European languages (the third), respectively. ISO 8859-15 replaces some little-used characters from 8859-1 (e.g., precomposed fractions) with more commonly needed ones, such as the Euro symbol (€).
A full description of the encodings is beyond our scope here; one useful reference is http://czyborra.com/charsets/iso8859.html.
koi8-r
This was a commonly used encoding for the Russian language before UTF-8.
koi8-u
This was a commonly used encoding for the Ukrainian language before UTF-8.
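The practical difference between these encodings is how many bytes each non-ASCII character occupies, and which characters are representable at all. A small Python sketch (not part of Texinfo itself) illustrates this for ‘é’ and the Euro sign:

```python
# Encode the same characters under several of the encodings above.
# ISO-8859-1 uses one byte for 'é' but cannot represent the Euro
# sign; ISO-8859-15 and UTF-8 can.
e_acute = "é"
euro = "€"

print(e_acute.encode("utf-8"))        # two bytes: b'\xc3\xa9'
print(e_acute.encode("iso-8859-1"))   # one byte:  b'\xe9'
print(euro.encode("iso-8859-15"))     # one byte:  b'\xa4'
try:
    euro.encode("iso-8859-1")
except UnicodeEncodeError:
    print("no Euro sign in ISO-8859-1")
```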
In Info output, a so-called ‘Local Variables’ section (see File Variables in The GNU Emacs Manual) is output including the output encoding. This allows Info readers to set the encoding appropriately. It looks like this:
Local Variables:
coding: UTF-8
End:
By default, for Info and plain text output, texi2any outputs accent constructs and special characters (such as @'e) as the actual UTF-8 sequence or 8-bit character in the output encoding where possible. If this is not possible, or if the option --disable-encoding is given, an ASCII transliteration is used instead.
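The idea of an ASCII transliteration can be sketched in Python. This is a simplified scheme based on Unicode decomposition, not necessarily the exact transliteration that texi2any performs:

```python
import unicodedata

def ascii_transliterate(s):
    # Decompose accented characters (NFD), then drop the combining
    # marks, which cannot be represented in ASCII.
    decomposed = unicodedata.normalize("NFD", s)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(ascii_transliterate("café"))   # cafe
```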
In HTML output, a ‘<meta>’ tag is output, in the ‘<head>’ section of the HTML, that specifies the output encoding. Web servers and browsers cooperate to use this information so the correct encoding is used to display the page, if supported by the system. That looks like this:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
In HTML and LaTeX output, if OUTPUT_CHARACTERS is set (see Other Customization Variables), accent constructs and special characters, such as @'e or ``, are output as the actual UTF-8 sequence or 8-bit character in the output encoding where possible. Otherwise, HTML entities are used for those characters in HTML, and LaTeX macros are used in LaTeX.
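For instance, the word ‘café’ might appear in generated HTML either as an entity reference or as a literal character. This is a schematic illustration, not verbatim texi2any output:

```html
<!-- OUTPUT_CHARACTERS unset: entity reference -->
caf&eacute;
<!-- OUTPUT_CHARACTERS set: literal UTF-8 character -->
café
```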
In DocBook output, if the encoding is different from UTF-8, an encoding attribute is added to the XML declaration. If OUTPUT_CHARACTERS is set (see Other Customization Variables), accent constructs such as @'e are output as the actual 8-bit or UTF-8 character in the output encoding where possible. Otherwise XML entities are used for those constructs.
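For example, the XML declaration of a DocBook file generated from a non-UTF-8 document encoding might look like this (schematic):

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
```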
In TeX output, the characters which are supported in the standard Computer Modern fonts are output accordingly. For example, this means using constructed accents rather than precomposed glyphs. Using a missing character generates a warning message, as does specifying an unimplemented encoding.
Although modern TeX systems support nearly every script in use in the world, this wide-ranging support is not available in texinfo.tex, and it’s not feasible to duplicate or incorporate all that effort.
In LaTeX output, code loading the ‘inputenc’ package is output based on the encoding. This, by itself, does not ensure that all the characters from the input document can be subsequently output. The fonts used in the default case should cover the specific Texinfo glyphs, but not all the possible encoded characters. You may need to load different fonts in the preamble and use \DeclareUnicodeCharacter with a UTF-8 encoding. For example:

@latex
\DeclareUnicodeCharacter{017B}{\.Z}
@end latex
Cross-references between Info files in different character encodings with non-ASCII characters in node names fail. We strongly recommend using UTF-8 only as the encoding for manuals with non-ASCII characters in the destinations of cross-references.
texi2any supports more encodings for Texinfo manuals, potentially all the encodings supported by both Perl and iconv (see Generic Charset Conversion in The GNU C Library). The support in output formats may be lacking, however, especially for LaTeX output.
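Encoding names are also normalized differently by different libraries, which is one reason support can vary across tools. As a rough illustration, using Python's codec registry rather than Perl's or iconv's:

```python
import codecs

# Look up several encoding names; aliases are resolved to a
# canonical codec name, which may differ from the input spelling.
for name in ("UTF-8", "ISO-8859-15", "koi8-r"):
    print(name, "->", codecs.lookup(name).name)
```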