1.2 What GNUnited Nations is and Should be

The basic concept behind GNUN is that localization of HTML articles is similar to localization of computer programs[1]. In articles, as in programs, not every string is considered translatable, so translatable strings must be identified first, and then collected in a file (called a “PO template”) for translation. Articles, like programs, tend to change over time, but not every change in the sources calls for a translation update. Sometimes the change does not affect the translatable strings, but sometimes it does. So, translators must have the means to identify those changes and apply the appropriate updates to the translation.

The GNU gettext package already provides the needed infrastructure for maintaining translations using PO files. See Introduction, for a basic overview. GNUnited Nations fills the gaps to apply this infrastructure to articles on the http://gnu.org web site.[2]

The following diagram summarizes the relation between the files handled by GNUN. It is followed by somewhat detailed explanations, which you should read while keeping an eye on the diagram. Having a clear understanding of these interrelations will surely help translators and web maintainers.

     .---<--- * Original ARTICLE.html
     |
     |   .---> ARTICLE.pot ---> * ARTICLE.LANG.po --->---.
     `---+                                               |
         `--->---.   .------<----------------------------'
                 |   |
                 |   `---.
                 |       +---> Translated ARTICLE.LANG.html
                 `-------'

The indication ‘*’ appears in two places in this picture, and means that the corresponding file is intended to be edited by humans. The author or web maintainer edits the original article.html, and translators edit article.lang.po. All other files are regenerated by GNUN, and any manual changes to them will be lost on the next run.

Arrows denote dependency relations between files: a change in the file at the tail of an arrow affects the file at its head. Those automatic changes are applied by running ‘make -C server/gnun’. This is the primary way to invoke GNUN, since it is implemented as a set of recipes for GNU make.

First, GNUN extracts all translatable strings from the original English article article.html into article.pot. The resulting file is suitable for manipulation with the various GNU gettext utilities. It contains all original article strings, and all translations are set to empty. The letter t in .pot marks this as a Template PO file, not yet oriented towards any particular language.
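To make "all translations are set to empty" concrete, here is a minimal Python sketch that formats a list of already extracted strings as template entries. It is an illustration only, not how GNUN or po4a produce article.pot: the function names are hypothetical, the strings are assumed to be extracted beforehand, and the header entry and source-reference comments that a real template carries are left out.

```python
def po_escape(text: str) -> str:
    """Escape a string for use inside a double-quoted PO string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def pot_entries(strings) -> str:
    """Format strings as template entries with empty translations.

    Repeated strings are emitted once, since a msgid is expected to be
    unique within a PO file (message contexts are ignored here).
    """
    seen = set()
    blocks = []
    for s in strings:
        if s in seen:
            continue
        seen.add(s)
        # Every entry starts out untranslated: msgstr is empty.
        blocks.append(f'msgid "{po_escape(s)}"\nmsgstr ""\n')
    return "\n".join(blocks)


if __name__ == "__main__":
    print(pot_entries(["Free Software", 'The "four freedoms"']))
```

A translator later fills in each msgstr in the language-specific copy of this file; the template itself stays untranslated.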

The first time, though, there is no article.lang.po yet, so a translator must manually copy article.pot to article.lang.po, where lang represents the target language. See New Translation, for details.
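The manual copy step can be sketched in Python as follows. This covers only the copy itself, under the assumption that the PO file lives next to its template; the function name is hypothetical, and the New Translation section remains the reference for the actual procedure.

```python
import shutil
from pathlib import Path


def start_translation(pot_path: str, lang: str) -> Path:
    """Copy ARTICLE.pot to ARTICLE.LANG.po in the same directory.

    Refuses to overwrite an existing PO file, because that file is
    edited by translators and must not be clobbered.
    """
    pot = Path(pot_path)
    if pot.suffix != ".pot":
        raise ValueError(f"not a PO template: {pot}")
    po = pot.with_name(f"{pot.stem}.{lang}.po")
    if po.exists():
        raise FileExistsError(f"translation already started: {po}")
    shutil.copyfile(pot, po)
    return po
```

For example, `start_translation("philosophy/po/free-sw.pot", "bg")` would create philosophy/po/free-sw.bg.po, provided the template exists and no Bulgarian PO file is there yet.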

Then comes the initial translation of messages in article.lang.po. Translation in itself is a whole matter, whose complexity far exceeds the scope of this manual. Nevertheless, a few hints are given in another chapter of this manual.

You may use any compatible PO editor to add translated messages into the PO file. See Editing, for more information.

When the PO file actually exists (hopefully populated with initial translations), GNUN generates the article.lang.html file. It takes its structure from the original article.html, but all translatable strings are replaced with their translations specified in article.lang.po.

Original articles sometimes change. A new paragraph is added, or a tiny change in the wording is introduced. Also, some articles are dynamic in nature, like ones containing news entries or a list of other articles. If the original article changes, GNUN will automatically rebuild article.pot, and will merge the changes into article.lang.po. Any outdated translations will be marked as fuzzy, and any new strings will be added with empty translations, waiting to be translated. In the same run article.lang.html will be rebuilt, so the relevant strings in the translation will be substituted with the original English text until the translation teams update them in article.lang.po.
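The update cycle described above can be modelled with a toy Python sketch. It is not the algorithm used by GNUN or by the gettext tools: the similarity rule (difflib with a 0.6 cutoff), the dictionary representation, and the function names are all assumptions made for illustration.

```python
import difflib


def merge(old_po: dict, new_msgids: list) -> dict:
    """Toy merge of an existing translation with a regenerated template.

    old_po maps msgid -> msgstr. The result maps every msgid of the new
    template to a (msgstr, fuzzy) pair.
    """
    merged = {}
    for msgid in new_msgids:
        if msgid in old_po:
            # Unchanged string: keep the translation as it is.
            merged[msgid] = (old_po[msgid], False)
            continue
        # Assumed similarity rule: reuse the closest old translation,
        # but flag it as fuzzy so that a translator reviews it.
        close = difflib.get_close_matches(msgid, list(old_po), n=1, cutoff=0.6)
        if close:
            merged[msgid] = (old_po[close[0]], True)
        else:
            # Entirely new string: empty translation, waiting for work.
            merged[msgid] = ("", False)
    return merged


def text_to_show(msgid: str, merged: dict) -> str:
    """Return the translation, or the English text if it is missing or fuzzy."""
    msgstr, fuzzy = merged[msgid]
    return msgstr if msgstr and not fuzzy else msgid
```

With an old translation of "Free software respects users." and a reworded original "Free software respects its users.", `merge` carries the old translation over flagged as fuzzy, and `text_to_show` falls back to the English sentence until a translator updates the entry.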

Those changes in the original article that do not affect the translatable strings will not lead to changes in article.lang.po. Thus, no actions from translators will be needed. article.lang.html will be automatically regenerated to reflect the changes.

The POT for every article under GNUN's control is kept in the ‘www’ repository under a special directory po/, which is a sub-directory of the relevant directory in the ‘www’ tree. So, for <http://www.gnu.org/philosophy/free-sw.html> that is philosophy/po/. Besides free-sw.pot, this directory holds the canonical source of every translation, like free-sw.bg.po, free-sw.ca.po, etc.
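The naming convention can be expressed as a small Python sketch that maps an article path in the ‘www’ tree to its template and translation files. The helper names are hypothetical; only the layout itself comes from the description above.

```python
from pathlib import PurePosixPath


def po_dir(article: str) -> PurePosixPath:
    """Return the po/ sub-directory that belongs to an article."""
    return PurePosixPath(article).parent / "po"


def pot_path(article: str) -> PurePosixPath:
    """Return the path of the PO template for an article."""
    return po_dir(article) / (PurePosixPath(article).stem + ".pot")


def po_path(article: str, lang: str) -> PurePosixPath:
    """Return the path of the PO file for one language."""
    return po_dir(article) / f"{PurePosixPath(article).stem}.{lang}.po"


if __name__ == "__main__":
    print(pot_path("philosophy/free-sw.html"))       # philosophy/po/free-sw.pot
    print(po_path("philosophy/free-sw.html", "bg"))  # philosophy/po/free-sw.bg.po
```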

Several additional features are implemented, like automatic update of the list of available translations. For example, if a new translation is added and the list of translations in free-sw.html is updated, all translated free-sw.lang.html files will be regenerated. This saves a lot of tedious, repetitive work. There is a basic infrastructure to “inject” general information about a translation team, like a note on how to contact the team, or how to report a bug or a suggestion for improvement. Translators' credits are also handled, as well as translators' notes, if any.

GNUN can be extended, and new features will certainly be added. The TODO file currently lists some of them, but new ideas pop up quite often. The plan is to make a solid foundation and develop front-ends—a web front-end, possibly based on Pootle, a statistics facility, probably a wiki compiler, and more.


Footnotes

[1] Actually, it is much closer to localization of software documentation, where strings (also known as “messages” in gettext's context) are typically longer than strings in programs. Nevertheless, all points raised still apply.

[2] The process of converting HTML to PO and the other way around is performed using po4a (“po for anything”), see http://po4a.alioth.debian.org.