19.5 Internationalization of Document Strings

texi2any writes fixed strings into the output document at various places: cross-references, page footers, the help page, alternate text for images, and so on. The string chosen depends on the value of documentlanguage at the time the string is output (see @documentlanguage ll[_cc]: Set the Document Language, for the Texinfo command interface).

The Gettext framework is used for those strings (see Gettext). The libintl-perl package is used as the gettext implementation; more specifically, the pure Perl implementation is used, so Texinfo can support consistent behavior across all platforms and installations, which would not otherwise be possible. libintl-perl is included in the Texinfo distribution and always installed, to ensure that it is available if needed. It is also possible to use the system gettext (the choice can be made at build-time).

The Gettext domain ‘texinfo_document’ is used for the strings. Translated strings are written as Texinfo, and may include @-commands. In translated strings, the varying parts of the string are not usually denoted by %s and the like, but by ‘{arg_name}’. (This convention is common for gettext in Perl and is fully supported in GNU Gettext; see Perl Format Strings in GNU Gettext.) For example, in the following, ‘{section}’ will be replaced by the section name:

see {section}

These Perl-style brace format strings are used for two reasons: first, changing the order of printf arguments has only been available since Perl 5.8.0; second, and more importantly, the order of arguments is unpredictable, since @-command expansion may lead to different orders depending on the output format.
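As an illustration of why named placeholders help, here is a minimal Python sketch of brace substitution. It is not the texi2any implementation (which is written in Perl); the function name substitute_args, the second template, and the sample values are hypothetical.

```python
import re

def substitute_args(template, args):
    """Replace each {name} in template with args[name].

    Names without a value in args are left untouched, so braces that
    belong to @-commands with ordinary text are not disturbed.
    """
    def replace(match):
        name = match.group(1)
        return args.get(name, match.group(0))
    return re.sub(r"\{(\w+)\}", replace, template)

# The string from the manual.
print(substitute_args("see {section}", {"section": "Overview"}))

# Two hypothetical templates using the same arguments in different orders.
# Because arguments are looked up by name, the caller does not change.
args = {"section": "Overview", "book": "Texinfo"}
print(substitute_args("section {section} in {book}", args))
print(substitute_args("in {book}, section {section}", args))
```

With positional %s arguments, the second template would need the caller to pass its arguments in a different order; with names, only the translated string changes.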

The expansion of a translation string is done like this:

  1. First, the string is translated. The locale is documentlanguage.documentencoding.

    If the documentlanguage has the form ‘ll_CC’, that is tried first, and then just ‘ll’.

    To cope with the possibility of having multiple encodings, a special use of the us-ascii locale encoding is also possible. If the ‘ll’ locale in the current encoding does not exist, and the encoding is not us-ascii, then us-ascii is tried.

    The idea is that if there is a us-ascii encoding, it means that all the characters in the charset may be expressed as @-commands. For example, there is a fr.us-ascii locale that can accommodate any encoding, since all the Latin 1 characters have associated @-commands. On the other hand, Japanese has only a translation ja.utf-8, since there are no @-commands for Japanese characters.

    The us-ascii locales are not needed much now that UTF-8 is used for most documents. Note that accented characters are required to be expressed as @-commands in the us-ascii locales, which may be inconvenient for translators.

  2. Next, the string is expanded as Texinfo, and converted. The arguments are substituted; for example, ‘{arg_name}’ is replaced by the corresponding actual argument.
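The locale lookup in step 1 can be sketched as a list of candidate locale names, tried in order. This is only a sketch in Python: the function name candidate_locales is hypothetical, and the exact interleaving of the us-ascii fallback with the ‘ll_CC’ and ‘ll’ forms is an assumption, since the text above does not spell it out.

```python
def candidate_locales(documentlanguage, documentencoding):
    """Return locale names to try, most specific first.

    Assumed ordering: for each language form (ll_CC, then ll), try the
    document encoding first, then us-ascii if the encoding differs.
    """
    languages = [documentlanguage]
    if "_" in documentlanguage:
        # 'll_CC' is tried first, and then just 'll'.
        languages.append(documentlanguage.split("_", 1)[0])

    candidates = []
    for language in languages:
        candidates.append(f"{language}.{documentencoding}")
        if documentencoding.lower() != "us-ascii":
            # Fallback to a translation written with @-commands only.
            candidates.append(f"{language}.us-ascii")
    return candidates

print(candidate_locales("fr_FR", "utf-8"))
print(candidate_locales("ja", "utf-8"))
```

The first candidate for which a translation exists would be used; if none exists, the untranslated string is the natural fallback.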

In the following example, ‘{date}’, ‘{program_homepage}’ and ‘{program}’ are the arguments of the string. Since they are used in @uref, their order in the output is not predictable, so they are substituted only after the expansion:

Generated on @emph{{date}} using
@uref{{program_homepage}, @emph{{program}}}.
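To make the ordering issue concrete, the following Python sketch converts the string above with two toy converters and then substitutes the arguments. The converters are hypothetical stand-ins loosely modeled on HTML and plain-text output, not the texi2any converters, and the sample values are made up for the illustration.

```python
import re

TEMPLATE = ("Generated on @emph{{date}} using "
            "@uref{{program_homepage}, @emph{{program}}}.")

# Matches the two arguments of @uref once inner @emph has been converted.
UREF = r"@uref\{(\{\w+\}), ((?:[^{}]|\{\w+\})*)\}"

def to_html(texi):
    """Toy HTML conversion: the URL comes first, inside the href."""
    texi = re.sub(r"@emph\{(\{\w+\})\}", r"<em>\1</em>", texi)
    return re.sub(UREF, r'<a href="\1">\2</a>', texi)

def to_plain(texi):
    """Toy plain-text conversion: the text comes first, then the URL."""
    texi = re.sub(r"@emph\{(\{\w+\})\}", r"_\1_", texi)
    return re.sub(UREF, r"\2 (\1)", texi)

def placeholder_order(converted):
    """List the {name} placeholders in the order they appear."""
    return re.findall(r"\{(\w+)\}", converted)

def fill(converted, values):
    """Substitute the named arguments after conversion."""
    return re.sub(r"\{(\w+)\}", lambda m: values[m.group(1)], converted)

values = {  # sample values, for illustration only
    "date": "2024-01-01",
    "program": "texi2any",
    "program_homepage": "https://www.gnu.org/software/texinfo/",
}

for convert in (to_html, to_plain):
    converted = convert(TEMPLATE)
    print(placeholder_order(converted))
    print(fill(converted, values))
```

In this sketch the HTML form places ‘{program_homepage}’ before ‘{program}’ while the plain-text form places it after, which is why positional arguments would not work here.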

This approach is admittedly a bit complicated. It is useful for two reasons. First, translations can be made available in different encodings, for those encodings that can be covered by @-commands. Second, the formatting for some commands can be specified independently of the output format while still being language-dependent. For example, the ‘@pxref’ translation string can be like this:

see {node_file_href} section `{section}' in @cite{{book}}

This allows a string to be specified once, independently of the output format, while keeping rich formatting, and it can still be translated appropriately in many languages.