14.2 @documentencoding enc: Set Input Encoding

By default, the input and output document encodings are assumed to be UTF-8, the universal character encoding, expressed in 8-bit bytes. UTF-8 is compatible with 7-bit ASCII. Using UTF-8 for Texinfo manuals is recommended.

The @documentencoding command declares the input document encoding and also affects the encoding of the output. If your document is not in the default encoding, write this command on a line by itself near the beginning of the file, followed by a valid encoding name:

@documentencoding enc
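
For example, a manual written in ISO-8859-1 (Latin-1) might begin like this; the file name and title here are only illustrative:

\input texinfo
@setfilename sample.info
@settitle Sample Manual
@documentencoding ISO-8859-1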

UTF-8 should always be the best choice of encoding. Texinfo still supports additional encodings, mainly for compatibility with older manuals(8):

US-ASCII

The basic 7-bit character encoding, based on the English alphabet.

ISO-8859-1
ISO-8859-15
ISO-8859-2

These specify the pre-UTF-8 standard encodings for Western European (the first two) and Eastern European languages (the third), respectively. ISO 8859-15 replaces some little-used characters from 8859-1 (e.g., precomposed fractions) with more commonly needed ones, such as the Euro symbol (€).

A full description of the encodings is beyond our scope here; one useful reference is http://czyborra.com/charsets/iso8859.html.

koi8-r

This was a commonly used encoding for the Russian language before UTF-8.

koi8-u

This was a commonly used encoding for the Ukrainian language before UTF-8.

In Info output, a so-called ‘Local Variables’ section (see File Variables in The GNU Emacs Manual) specifying the output encoding is appended, so that Info readers can set the encoding appropriately. It looks like this:

Local Variables:
coding: UTF-8
End:

By default, for Info and plain text output, texi2any outputs accent constructs and special characters (such as @'e) as the actual UTF-8 sequence or 8-bit character in the output encoding where possible. If this is not possible, or if the option --disable-encoding is given, an ASCII transliteration is used instead.
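
For instance, assuming a manual sample.texi (a hypothetical file) containing the word caf@'e, the two behaviors compare as follows; the exact transliteration shown is indicative:

texi2any sample.texi
    => the Info file contains ‘café’ (the UTF-8 sequence for é)
texi2any --disable-encoding sample.texi
    => the Info file contains the ASCII transliteration ‘cafe'’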

In HTML output, a ‘<meta>’ tag specifying the output encoding is written in the ‘<head>’ section of the HTML. Web servers and browsers use this information so that the page is displayed in the correct encoding, where supported. It looks like this:

<meta http-equiv="Content-Type" content="text/html;
     charset=utf-8">

In HTML and LaTeX output, if OUTPUT_CHARACTERS is set (see Other Customization Variables), accent constructs and special characters, such as @'e or ``, are output as the actual UTF-8 sequence or 8-bit character in the output encoding where possible. Otherwise, HTML entities are used for those characters in HTML, and LaTeX macros are used in LaTeX.
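
For example, the customization variable can be set on the texi2any command line with -c; the manual name here is hypothetical. Without OUTPUT_CHARACTERS, @'e typically appears in the HTML as an entity such as &eacute;; with it set, it appears as the literal character é:

texi2any --html sample.texi
texi2any --html -c OUTPUT_CHARACTERS=1 sample.texi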

In DocBook output, if the encoding is different from UTF-8, an encoding attribute is added to the XML declaration. If OUTPUT_CHARACTERS is set (see Other Customization Variables), accent constructs such as @'e are output as the actual 8-bit or UTF-8 character in the output encoding where possible. Otherwise XML entities are used for those constructs.
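
For example, with ‘@documentencoding ISO-8859-1’ the DocBook output might begin with an XML declaration like this (the exact form may vary):

<?xml version="1.0" encoding="iso-8859-1"?>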

In TeX output, the characters which are supported in the standard Computer Modern fonts are output accordingly. For example, this means using constructed accents rather than precomposed glyphs. Using a missing character generates a warning message, as does specifying an unimplemented encoding.

Although modern TeX systems support nearly every script in use in the world, this wide-ranging support is not available in texinfo.tex, and it’s not feasible to duplicate or incorporate all that effort.

In LaTeX output, code loading the ‘inputenc’ package is output based on the encoding. This by itself does not ensure that all the characters from the input document can subsequently be output. The fonts used in the default case should cover the specific Texinfo glyphs, but not all possible encoded characters. You may need to load different fonts in the preamble and use \DeclareUnicodeCharacter with a UTF-8 encoding. For example:

@latex
\DeclareUnicodeCharacter{017B}{\.Z}
@end latex
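
Here 017B is the Unicode code point of Ż (Latin capital letter Z with dot above), and \.Z is the LaTeX accent command producing that glyph.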

Cross-references between Info files in different character encodings fail when node names contain non-ASCII characters. We strongly recommend using only UTF-8 as the encoding for manuals with non-ASCII characters in the destinations of cross-references.


Footnotes

(8) texi2any supports more encodings for Texinfo manuals, potentially all the encodings supported by both Perl and iconv (see Generic Charset Conversion in The GNU C Library). The support in output formats may be lacking, however, especially for LaTeX output.