Prioritizing messages (GNU gettext utilities)

12.7 Prioritizing messages: How to determine which messages to translate first

A translator sometimes has only a limited amount of time per week to spend on a package, and some packages have quite large message catalogs (over 1000 messages). Therefore she wishes to translate first the messages that are most visible to the user, or that occur most frequently. This section describes how to determine these "most urgent" messages. The same method also applies to determining the "next most urgent" messages after the message catalog has already been partially translated.

In the first step, she uses the programs as a user would. While she does this, the GNU gettext library logs to a file the not-yet-translated messages for which the program requested a translation.

In the second step, she uses PO mode (or another PO file editor) to translate precisely this set of messages.

Here are more details. The GNU libintl library (but not the corresponding functions in GNU libc) supports the environment variable GETTEXT_LOG_UNTRANSLATED, whose value is the name of a log file. GNU libintl logs into this file the messages for which gettext() and related functions couldn’t find a translation. If the file doesn’t exist, it is created as needed. On systems with GNU libc, a shared library ‘preloadable_libintl.so’ is provided that can be used with the ELF ‘LD_PRELOAD’ mechanism; preloading it makes the programs call libintl’s gettext() functions, which do the logging, instead of the ones in libc.

So, in the first step, the translator uses these commands on systems with GNU libc:

$ LD_PRELOAD=/usr/local/lib/preloadable_libintl.so
$ export LD_PRELOAD
$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
$ export GETTEXT_LOG_UNTRANSLATED

and these commands on other systems:

$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
$ export GETTEXT_LOG_UNTRANSLATED
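The two variants above differ only in LD_PRELOAD. A minimal sketch that covers both cases by setting LD_PRELOAD only when the preloadable library is actually present; the path /usr/local/lib/preloadable_libintl.so is taken from the commands above and is an assumption that may differ on your installation:

```shell
# Path of the preloadable library (assumed install location, adjust as needed).
libintl_preload=/usr/local/lib/preloadable_libintl.so

# Preload GNU libintl only where the library exists. Without it, only
# programs that are linked against GNU libintl will log.
if [ -f "$libintl_preload" ]; then
  LD_PRELOAD=$libintl_preload
  export LD_PRELOAD
fi

# Log file that receives the untranslated messages.
GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
export GETTEXT_LOG_UNTRANSLATED
```

Because the variables must remain set in the current shell, the snippet should be sourced (for example `. ./gettext-log.sh`, a hypothetical file name) rather than executed as a separate script.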

Then she uses and peruses the programs. (It is a good and recommended practice to use the programs for which you provide translations: it gives you the needed context.) When done, she removes the environment variables:

$ unset LD_PRELOAD
$ unset GETTEXT_LOG_UNTRANSLATED
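Instead of exporting the variables for the whole session and unsetting them afterwards, they can also be set for a single command. A sketch using a hypothetical helper function `with_gettext_log` (it is not part of gettext), with the same paths as above:

```shell
# Hypothetical helper: run one command with untranslated-message logging
# enabled, leaving the current shell's environment unchanged.
with_gettext_log () {
  if [ -f /usr/local/lib/preloadable_libintl.so ]; then
    # Systems with GNU libc: also preload libintl for this command only.
    LD_PRELOAD=/usr/local/lib/preloadable_libintl.so \
    GETTEXT_LOG_UNTRANSLATED="$HOME/gettextlogused" "$@"
  else
    GETTEXT_LOG_UNTRANSLATED="$HOME/gettextlogused" "$@"
  fi
}
```

For example, `with_gettext_log ls --help` runs `ls` with logging enabled; nothing needs to be unset afterwards.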

The second step starts with removing duplicates:

$ msguniq $HOME/gettextlogused > missing.po

The result is a PO file, but it needs some preprocessing before it can be used with a PO file editor. First, it is a multi-domain PO file, containing messages from many translation domains. Second, it lacks all translator comments and source references. Here is how to get a list of the affected translation domains:

$ sed -n -e 's,^domain "\(.*\)"$,\1,p' < missing.po | sort | uniq
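To see what this pipeline extracts, here it is run on a tiny hypothetical excerpt of a multi-domain PO file (the domains and messages are invented for illustration). `sort -u` is used as a shorter equivalent of `sort | uniq`:

```shell
# Hypothetical multi-domain input, invented for illustration only.
cat > sample-missing.po <<'EOF'
domain "tar"
msgid "Cannot open"
msgstr ""

domain "coreutils"
msgid "missing operand"
msgstr ""
EOF

# Print the argument of every 'domain "..."' line, sorted, each once.
sed -n -e 's,^domain "\(.*\)"$,\1,p' < sample-missing.po | sort -u
```

This prints `coreutils` and `tar`, one per line.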

Then the translator can handle the domains one by one. For simplicity, let’s use shell variables to denote the language, domain and source package.

$ lang=nl             # your language
$ domain=coreutils    # the name of the domain to be handled
$ package=/usr/src/gnu/coreutils-4.5.4   # the package where it comes from

She takes the latest copy of $lang.po from the Translation Project, or from the package (in most cases, $package/po/$lang.po), or creates a fresh one if she’s the first translator (see Creating a New PO File), and saves it as $domain.$lang.po. She then uses the following commands to mark the non-urgent messages as "obsolete": msggrep extracts this domain’s messages from missing.po (the grep drops the domain line), and msgattrib marks as obsolete every message of $domain.$lang.po that is not listed in that file. (This doesn’t mean that these messages, translated and untranslated ones, will go away. It simply means that the PO file editor will ignore them in the following editing session.)
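The choice of starting file can be scripted. A minimal sketch with a hypothetical helper `start_po`, assuming the package ships its translation as $package/po/$lang.po and its template as $package/po/$domain.pot (layouts vary between packages); it does not fetch anything from the Translation Project:

```shell
# Hypothetical helper: create the working file $domain.$lang.po in the
# current directory, using the variables lang, domain and package.
start_po () {
  if [ -f "$package/po/$lang.po" ]; then
    # Reuse the translation shipped with the package.
    cp "$package/po/$lang.po" "$domain.$lang.po"
  else
    # First translator: initialize a new PO file from the template.
    msginit --input="$package/po/$domain.pot" --locale="$lang" \
      --output-file="$domain.$lang.po" --no-translator
  fi
}
```

If a newer copy is available from the Translation Project, that copy should be saved as $domain.$lang.po instead.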

$ msggrep --domain=$domain missing.po | grep -v '^domain' \
  > $domain-missing.po
$ msgattrib --set-obsolete --ignore-file $domain-missing.po $domain.$lang.po \
  > $domain.$lang-urgent.po
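Obsolete entries are written with a `#~` prefix, so a plain grep gives a rough count of the entries that remain active in the urgent file. A sketch run on a tiny hypothetical file; the active count includes the PO header entry (`msgid ""`). For precise statistics, `msgfmt --statistics` can be used instead.

```shell
# Hypothetical excerpt of an urgent file: a header entry, one active
# entry and one entry marked obsolete by msgattrib.
cat > sample-urgent.po <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "missing operand"
msgstr ""

#~ msgid "invalid option"
#~ msgstr "ongeldige optie"
EOF

# Active entries start with 'msgid '; obsolete ones with '#~ msgid '.
active=$(grep -c '^msgid ' sample-urgent.po)
obsolete=$(grep -c '^#~ msgid ' sample-urgent.po)
echo "active (incl. header): $active, obsolete: $obsolete"
```

For this sample it prints `active (incl. header): 2, obsolete: 1`.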

Then she translates $domain.$lang-urgent.po using a PO file editor (see Editing PO Files). (It is not known whether KBabel and gtranslator also preserve obsolete messages, as they should.) Finally she restores the non-urgent messages (with their earlier translations, for those which were already translated) through this command:

$ msgmerge --no-fuzzy-matching $domain.$lang-urgent.po $package/po/$domain.pot \
  > $domain.$lang.po

Then she can submit $domain.$lang.po and proceed to the next domain.