The basic concept behind GNUN is that localization of HTML articles is similar to localization of computer programs [1]. In articles, as in programs, not every string is considered translatable, so translatable strings must be identified first, and then collected in a file (called “PO template”) for translation. Articles, like programs, tend to change over time, but not every change in the sources calls for a translation update. Sometimes the change does not affect the translatable strings, but sometimes it does. So, translators need a way to identify those changes and apply the appropriate updates to the translation.
The GNU gettext package already provides the infrastructure needed for maintaining translations using PO files. See Introduction in GNU gettext tools, for a basic overview. GNUnited Nations fills the gaps to apply this infrastructure to articles on the http://gnu.org web site [2].
The following diagram summarizes the relation between the files handled by GNUN. It is followed by somewhat detailed explanations, which you should read while keeping an eye on the diagram. Having a clear understanding of these interrelations will surely help translators and web maintainers.
.---<--- * Original ARTICLE.html
|
|   .---> ARTICLE.pot ---> * ARTICLE.LANG.po --->---.
`---+                                               |
    `--->---.   .------<----------------------------'
            |   |
            |   `---.
            |       +---> Translated ARTICLE.LANG.html
            `-------'
The indication ‘*’ appears in two places in this picture, and means that the corresponding file is intended to be edited by humans. The author or web maintainer edits the original article.html, and translators edit article.lang.po. All other files are regenerated by GNUN, and any manual changes to them will be lost on the next run.
Arrows denote dependency relations between files, where a change in one file will affect the other. Those automatic changes will be applied by running ‘make -C server/gnun’. This is the primary way to invoke GNUN, since it is implemented as a set of recipes for GNU make.
First, GNUN extracts all translatable strings from the original English article article.html into article.pot. The resulting file is suitable for manipulation with the various GNU ‘gettext’ utilities. It contains all original article strings, and all translations are set to empty. The letter t in .pot marks this as a Template PO file, not yet oriented towards any particular language.
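The extraction step can be pictured with a small sketch. It is not how GNUN does it (the real conversion is performed by po4a and keeps inline markup); it only assumes that each paragraph or heading becomes one message. The names ParagraphCollector and to_pot are made up for this illustration, and the PO header entry is omitted.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of paragraphs and headings as translatable strings.

    Simplification: inline markup such as <a> or <em> is dropped here,
    and nested block elements are not handled.
    """

    BLOCKS = {"p", "h1", "h2", "h3", "h4"}

    def __init__(self):
        super().__init__()
        self.strings = []
        self._current = None  # pieces of the block being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCKS:
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag in self.BLOCKS and self._current is not None:
            # Collapse line breaks and runs of spaces into single spaces.
            text = " ".join("".join(self._current).split())
            if text:
                self.strings.append(text)
            self._current = None


def to_pot(html):
    """Return template entries: every msgid filled in, every msgstr empty."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    entries = []
    for s in collector.strings:
        escaped = s.replace("\\", "\\\\").replace('"', '\\"')
        entries.append(f'msgid "{escaped}"\nmsgstr ""\n')
    return "\n".join(entries)


sample = """<html><head><title>Example</title></head>
<body>
<h2>What is free software?</h2>
<p>Free software is a
matter of liberty.</p>
</body></html>"""
print(to_pot(sample))
```

Each printed entry pairs an English msgid with an empty msgstr, which is what makes the file a template rather than a translation.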
The first time, though, there is no article.lang.po yet, so a translator must create article.lang.po from article.pot, where lang represents the target language. See New Translation, for details.
Then comes the initial translation of messages in article.lang.po. Translation itself is a subject whose complexity goes far beyond the scope of this manual. Nevertheless, a few hints are given in another chapter of this manual.
It is possible to make GNUN get translations for common strings from dedicated PO files, so-called compendia. See Compendia, for more information.
You may use any compatible PO editor to add translated messages into the PO file. See Editing in GNU gettext tools, for more information.
When the PO file actually exists (hopefully populated with initial translations), GNUN generates the article.lang.html file. It takes its structure from the original article.html, but all translatable strings are replaced with their translations specified in article.lang.po.
Original articles sometimes change. A new paragraph is added or a tiny change in the wording is introduced. Also, some articles are dynamic in nature, like ones containing news entries or a list of other articles. If the original article changes, GNUN will automatically rebuild article.pot, and will merge the changes into article.lang.po. Any outdated translations will be marked as “fuzzy”, and any new strings will be added with empty translations, waiting to be translated. In the same run article.lang.html will be rebuilt, with the affected strings in the translation replaced by the original English text until the translation teams update them in article.lang.po.
Those changes in the original article that do not affect the translatable strings (or just delete whole strings) will not lead to changes in article.lang.po. Thus, no action from translators will be needed. article.lang.html will be automatically regenerated to reflect the changes.
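The update cycle described above can be sketched as follows. This is only an illustration of the idea, not the algorithm used by the gettext tools: the similarity test from Python's difflib and its cutoff of 0.6 are assumptions, the function names merge and text_for_page are made up, and obsolete entries are ignored.

```python
import difflib


def merge(old_po, new_msgids, cutoff=0.6):
    """Merge old translations into the strings of a regenerated template.

    old_po maps msgid -> msgstr; the result maps each current msgid
    to a (msgstr, is_fuzzy) pair.
    """
    translated = [msgid for msgid, msgstr in old_po.items() if msgstr]
    merged = {}
    for msgid in new_msgids:
        if msgid in old_po:
            # Unchanged string: keep the translation as it is.
            merged[msgid] = (old_po[msgid], False)
            continue
        close = difflib.get_close_matches(msgid, translated, n=1, cutoff=cutoff)
        if close:
            # Slightly changed string: reuse the old translation, flagged fuzzy.
            merged[msgid] = (old_po[close[0]], True)
        else:
            # New string: empty translation, waiting for a translator.
            merged[msgid] = ("", False)
    return merged


def text_for_page(msgid, entry):
    """Use the English original unless a non-fuzzy translation exists."""
    msgstr, is_fuzzy = entry
    return msgstr if msgstr and not is_fuzzy else msgid


# Placeholder translations stand in for real translated text.
old_po = {
    "Free software is a matter of liberty.": "<translation of sentence 1>",
    "Join us": "<translation of 'Join us'>",
}
new_msgids = [
    "Free software is a matter of liberty, not price.",
    "Join us",
    "Donate",
]
merged = merge(old_po, new_msgids)
for msgid in new_msgids:
    print(repr(text_for_page(msgid, merged[msgid])))
```

In this example the reworded sentence is marked fuzzy and “Donate” is new, so both appear in English on the generated page, while “Join us” keeps its translation.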
The POT for every article under GNUN’s control is kept in the ‘www’ repository under a special directory po/, which is a sub-directory of the relevant directory in the ‘www’ tree. So, for ‘http://www.gnu.org/philosophy/free-sw.html’ that is philosophy/po/. In addition to free-sw.pot, this directory holds the canonical source of every translation, like free-sw.bg.po, free-sw.ca.po, etc. For more details, see Files and Directories.
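The naming scheme can be written down as a small helper. The function po_paths is hypothetical and covers only the layout described in this section (an article in a sub-directory of the ‘www’ tree).

```python
import posixpath


def po_paths(article, lang):
    """Return (POT path, PO path) for an article path relative to the www root."""
    directory, filename = posixpath.split(article)
    stem, _ext = posixpath.splitext(filename)
    po_dir = posixpath.join(directory, "po")
    pot = posixpath.join(po_dir, stem + ".pot")
    po = posixpath.join(po_dir, f"{stem}.{lang}.po")
    return pot, po


print(po_paths("philosophy/free-sw.html", "bg"))
# → ('philosophy/po/free-sw.pot', 'philosophy/po/free-sw.bg.po')
```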
Several additional features are implemented, like automatic update of the list of available translations. For example, if a new free-sw.ja.po translation is added, the list of translations included in free-sw.html and in every translated free-sw.lang.html is updated. This saves a lot of tedious, repetitive work and eliminates a source of mistakes. There is a basic infrastructure to “inject” general information about a translation team, such as a note on how to contact the team or how to report a bug or a suggestion for improvement. Translators’ credits are also handled, as well as translators’ notes, if any.
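How the list of translations might be derived from the files in po/ is sketched below. GNUN's actual mechanism is not described in this section; the function available_languages and the assumed shape of language codes (two or three lowercase letters, optionally followed by a hyphen and a variant) are illustrative only.

```python
import re


def available_languages(filenames, article):
    """Return the sorted language codes that have a PO file for `article`."""
    pattern = re.compile(re.escape(article) + r"\.([a-z]{2,3}(?:-[a-z]+)?)\.po")
    codes = set()
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:
            codes.add(match.group(1))
    return sorted(codes)


po_dir = ["free-sw.pot", "free-sw.bg.po", "free-sw.ca.po", "free-doc.bg.po"]
before = available_languages(po_dir, "free-sw")
after = available_languages(po_dir + ["free-sw.ja.po"], "free-sw")
print(before, after)
```

Adding free-sw.ja.po is enough for ja to appear in the computed list, so no page has to be edited by hand.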
[1] Actually, it is a lot closer to localization of software documentation, where typically strings (also known as “messages” in gettext’s context) are longer than strings in programs. Nevertheless, all points raised still apply.
[2] The process of converting HTML to PO and the other way around is performed using po4a (“po for anything”), see http://po4a.alioth.debian.org.