GNU gettext utilities
@mdate: 2023-06-17T13:22:03Z
This manual documents the GNU gettext tools and the GNU libintl library, version 0.22.

Table of Contents

1 Introduction 1.1 The Purpose of GNU gettext 1.2 I18n, L10n, and Such 1.3 Aspects in Native Language Support 1.4 Files Conveying Translations 1.5 Overview of GNU gettext 2 The User's View 2.1 Operating System Installation 2.2 Setting the Locale Used by GUI Programs 2.3 Setting the Locale through Environment Variables 2.3.1 Locale Names 2.3.2 Locale Environment Variables 2.3.3 Specifying a Priority List of Languages 2.4 Obtaining good output in a Windows console 2.5 Installing Translations for Particular Programs 3 The Format of PO Files 4 Preparing Program Sources 4.1 Importing the gettext declaration 4.2 Triggering gettext Operations 4.3 Preparing Translatable Strings 4.4 How Marks Appear in Sources 4.5 Marking Translatable Strings 4.6 Special Comments preceding Keywords 4.7 Special Cases of Translatable Strings 4.8 Letting Users Report Translation Bugs 4.9 Marking Proper Names for Translation 4.10 Preparing Library Sources 5 Making the PO Template File 5.1 Invoking the xgettext Program 5.1.1 Input file location 5.1.2 Output file location 5.1.3 Choice of input file language 5.1.4 Input file interpretation 5.1.5 Operation mode 5.1.6 Language specific options 5.1.7 Output details 5.1.8 Informative output 6 Creating a New PO File 6.1 Invoking the msginit Program 6.1.1 Input file location 6.1.2 Output file location 6.1.3 Input file syntax 6.1.4 Output details 6.1.5 Informative output 6.2 Filling in the Header Entry 7 Updating Existing PO Files 7.1 Invoking the msgmerge Program 7.1.1 Input file location 7.1.2 Operation mode 7.1.3 Output file location 7.1.4 Output file location in update mode 7.1.5 Operation modifiers 7.1.6 Input file syntax 7.1.7 Output details 7.1.8 Informative output 8 Editing PO Files 8.1 KDE's PO File Editor 8.2 GNOME's PO File Editor 8.3 Emacs's PO
File Editor 8.3.1 Completing GNU gettext Installation 8.3.2 Main PO mode Commands 8.3.3 Entry Positioning 8.3.4 Normalizing Strings in Entries 8.3.5 Translated Entries 8.3.6 Fuzzy Entries 8.3.7 Untranslated Entries 8.3.8 Obsolete Entries 8.3.9 Modifying Translations 8.3.10 Modifying Comments 8.3.11 Details of Sub Edition 8.3.12 C Sources Context 8.3.13 Consulting Auxiliary PO Files 8.4 Using Translation Compendia 8.4.1 Creating Compendia 8.4.1.1 Concatenate PO Files 8.4.1.2 Extract a Message Subset from a PO File 8.4.2 Using Compendia 8.4.2.1 Initialize a New Translation File 8.4.2.2 Update an Existing Translation File 9 Manipulating PO Files 9.1 Invoking the msgcat Program 9.1.1 Input file location 9.1.2 Output file location 9.1.3 Message selection 9.1.4 Input file syntax 9.1.5 Output details 9.1.6 Informative output 9.2 Invoking the msgconv Program 9.2.1 Input file location 9.2.2 Output file location 9.2.3 Conversion target 9.2.4 Input file syntax 9.2.5 Output details 9.2.6 Informative output 9.3 Invoking the msggrep Program 9.3.1 Input file location 9.3.2 Output file location 9.3.3 Message selection 9.3.4 Input file syntax 9.3.5 Output details 9.3.6 Informative output 9.3.7 Examples 9.4 Invoking the msgfilter Program 9.4.1 Input file location 9.4.2 Output file location 9.4.3 The filter 9.4.4 Useful filter-option s when the filter is ‘ sed ' 9.4.5 Built-in filter s 9.4.6 Input file syntax 9.4.7 Output details 9.4.8 Informative output 9.4.9 Examples 9.5 Invoking the msguniq Program 9.5.1 Input file location 9.5.2 Output file location 9.5.3 Message selection 9.5.4 Input file syntax 9.5.5 Output details 9.5.6 Informative output 9.6 Invoking the msgcomm Program 9.6.1 Input file location 9.6.2 Output file location 9.6.3 Message selection 9.6.4 Input file syntax 9.6.5 Output details 9.6.6 Informative output 9.7 Invoking the msgcmp Program 9.7.1 Input file location 9.7.2 Operation modifiers 9.7.3 Input file syntax 9.7.4 Informative output 9.8 Invoking the msgattrib 
Program 9.8.1 Input file location 9.8.2 Output file location 9.8.3 Message selection 9.8.4 Attribute manipulation 9.8.5 Input file syntax 9.8.6 Output details 9.8.7 Informative output 9.9 Invoking the msgen Program 9.9.1 Input file location 9.9.2 Output file location 9.9.3 Input file syntax 9.9.4 Output details 9.9.5 Informative output 9.10 Invoking the msgexec Program 9.10.1 Input file location 9.10.2 Input file syntax 9.10.3 Informative output 9.11 Highlighting parts of PO files 9.11.1 The --color option 9.11.2 The environment variable TERM 9.11.3 The --style option 9.11.4 Style rules for PO files 9.11.5 Customizing less for viewing PO files 9.12 Other tools for manipulating PO files 9.13 Writing your own programs that process PO files 9.13.1 Error Handling 9.13.2 po_file_t API 9.13.3 po_message_iterator_t API 9.13.4 po_message_t API 9.13.5 PO Header Entry API 9.13.6 po_filepos_t API 9.13.7 Format Type API 9.13.8 Checking API 10 Producing Binary MO Files 10.1 Invoking the msgfmt Program 10.1.1 Input file location 10.1.2 Operation mode 10.1.3 Output file location 10.1.4 Output file location in Java mode 10.1.5 Output file location in C# mode 10.1.6 Output file location in Tcl mode 10.1.7 Desktop Entry mode operations 10.1.8 XML mode operations 10.1.9 Input file syntax 10.1.10 Input file interpretation 10.1.11 Output details 10.1.12 Informative output 10.2 Invoking the msgunfmt Program 10.2.1 Operation mode 10.2.2 Input file location 10.2.3 Input file location in Java mode 10.2.4 Input file location in C# mode 10.2.5 Input file location in Tcl mode 10.2.6 Output file location 10.2.7 Output details 10.2.8 Informative output 10.3 The Format of GNU MO Files 11 The Programmer's View 11.1 About catgets 11.1.1 The Interface 11.1.2 Problems with the catgets Interface?! 
11.2 About gettext 11.2.1 The Interface 11.2.2 Solving Ambiguities 11.2.3 Locating Message Catalog Files 11.2.4 How to specify the output character set gettext uses 11.2.5 Using contexts for solving ambiguities 11.2.6 Additional functions for plural forms 11.2.7 Optimization of the *gettext functions 11.3 Comparing the Two Interfaces 11.4 Using libintl.a in own programs 11.5 Being a gettext grok 11.6 Temporary Notes for the Programmers Chapter 11.6.1 Temporary - Two Possible Implementations 11.6.2 Temporary - About catgets 11.6.3 Temporary - Why a single implementation 11.6.4 Temporary - Notes 12 The Translator's View 12.1 Introduction 0 12.2 Introduction 1 12.3 Discussions 12.4 Organization 12.4.1 Central Coordination 12.4.2 National Teams 12.4.2.1 Sub-Cultures 12.4.2.2 Organizational Ideas 12.4.3 Mailing Lists 12.5 Information Flow 12.6 Translating plural forms 12.7 Prioritizing messages: How to determine which messages to translate first 13 The Maintainer's View 13.1 Flat or Non-Flat Directory Structures 13.2 Prerequisite Works 13.3 Invoking the gettextize Program 13.4 Files You Must Create or Alter 13.4.1 POTFILES.in in po/ 13.4.2 LINGUAS in po/ 13.4.3 Makevars in po/ 13.4.4 Extending Makefile in po/ 13.4.5 configure.ac at top level 13.4.6 config.guess , config.sub at top level 13.4.7 mkinstalldirs at top level 13.4.8 aclocal.m4 at top level 13.4.9 config.h.in at top level 13.4.10 Makefile.in at top level 13.4.11 Makefile.in in src/ 13.4.12 gettext.h in lib/ 13.5 Autoconf macros for use in configure.ac 13.5.1 AM_GNU_GETTEXT in gettext.m4 13.5.2 AM_GNU_GETTEXT_VERSION in gettext.m4 13.5.3 AM_GNU_GETTEXT_NEED in gettext.m4 13.5.4 AM_PO_SUBDIRS in po.m4 13.5.5 AM_XGETTEXT_OPTION in po.m4 13.5.6 AM_ICONV in iconv.m4 13.6 Integrating with Version Control Systems 13.6.1 Avoiding version mismatch in distributed development 13.6.2 Files to put under version control 13.6.3 Put PO Files under Version Control 13.6.4 Invoking the autopoint Program 13.6.4.1 Options 13.6.4.2 
Informative output 13.7 Creating a Distribution Tarball 14 The Installer's and Distributor's View 15 Other Programming Languages 15.1 The Language Implementor's View 15.2 The Programmer's View 15.3 The Translator's View 15.3.1 C Format Strings 15.3.2 Objective C Format Strings 15.3.3 C++ Format Strings 15.3.4 Python Format Strings 15.3.5 Java Format Strings 15.3.6 C# Format Strings 15.3.7 JavaScript Format Strings 15.3.8 Scheme Format Strings 15.3.9 Lisp Format Strings 15.3.10 Emacs Lisp Format Strings 15.3.11 librep Format Strings 15.3.12 Ruby Format Strings 15.3.13 Shell Format Strings 15.3.14 awk Format Strings 15.3.15 Lua Format Strings 15.3.16 Object Pascal Format Strings 15.3.17 Smalltalk Format Strings 15.3.18 Qt Format Strings 15.3.19 Qt Format Strings 15.3.20 KDE Format Strings 15.3.21 KUIT Format Strings 15.3.22 Boost Format Strings 15.3.23 Tcl Format Strings 15.3.24 Perl Format Strings 15.3.25 PHP Format Strings 15.3.26 GCC internal Format Strings 15.3.27 GFC internal Format Strings 15.3.28 YCP Format Strings 15.4 The Maintainer's View 15.5 Individual Programming Languages 15.5.1 C, C++, Objective C 15.5.2 Python 15.5.3 Java 15.5.4 C# 15.5.5 JavaScript 15.5.6 GNU guile - Scheme 15.5.7 GNU clisp - Common Lisp 15.5.8 GNU clisp C sources 15.5.9 Emacs Lisp 15.5.10 librep 15.5.11 Ruby 15.5.12 sh - Shell Script 15.5.12.1 Preparing Shell Scripts for Internationalization 15.5.12.2 Contents of gettext.sh 15.5.12.3 Invoking the gettext program 15.5.12.4 Invoking the ngettext program 15.5.12.5 Invoking the envsubst program 15.5.12.6 Invoking the eval_gettext function 15.5.12.7 Invoking the eval_ngettext function 15.5.12.8 Invoking the eval_pgettext function 15.5.12.9 Invoking the eval_npgettext function 15.5.13 bash - Bourne-Again Shell Script 15.5.14 GNU awk 15.5.15 Lua 15.5.16 Pascal - Free Pascal Compiler 15.5.17 GNU Smalltalk 15.5.18 Vala 15.5.19 wxWidgets library 15.5.20 Tcl - Tk's scripting language 15.5.21 Perl 15.5.21.1 General Problems Parsing Perl Code 
15.5.21.2 Which keywords will xgettext look for? 15.5.21.3 How to Extract Hash Keys 15.5.21.4 What are Strings And Quote-like Expressions? 15.5.21.5 Invalid Uses Of String Interpolation 15.5.21.6 Valid Uses Of String Interpolation 15.5.21.7 When To Use Parentheses 15.5.21.8 How To Grok with Long Lines 15.5.21.9 Bugs, Pitfalls, And Things That Do Not Work 15.5.22 PHP Hypertext Preprocessor 15.5.23 Pike 15.5.24 GNU Compiler Collection sources 15.5.25 YCP - YaST2 scripting language 16 Other Data Formats 16.1 Internationalizable Data Formats 16.1.1 POT - Portable Object Template 16.1.2 Resource String Table 16.1.3 Glade - GNOME user interface description 16.1.4 GSettings - GNOME user configuration schema 16.1.5 AppData - freedesktop.org application description 16.1.6 Preparing Rules for XML Internationalization 16.1.6.1 Two Use-cases of Translated Strings in XML 16.2 Localized Data Formats 16.2.1 Editable Message Catalogs 16.2.1.1 PO - Portable Object 16.2.1.2 Java .properties 16.2.1.3 NeXTstep/GNUstep .strings 16.2.2 Compiled Message Catalogs 16.2.2.1 MO - Machine Object 16.2.2.2 Java ResourceBundle 16.2.2.3 C# Satellite Assembly 16.2.2.4 C# Resource 16.2.2.5 Tcl message catalog 16.2.2.6 Qt message catalog 16.2.3 Desktop Entry files 16.2.3.1 How to handle icons in Desktop Entry files 16.2.4 XML files 17 Concluding Remarks 17.1 History of GNU gettext 17.2 Notes on the Free Translation Project 17.2.1 INSTALL Matters 17.2.2 Using This Package 17.2.3 Translating Teams 17.2.4 Available Packages 17.2.5 Using gettext in new packages 17.3 Related Readings Appendix A Language Codes A.1 Usual Language Codes A.2 Rare Language Codes Appendix B Country Codes Appendix C Licenses C.1 GNU GENERAL PUBLIC LICENSE C.2 GNU LESSER GENERAL PUBLIC LICENSE C.3 GNU Free Documentation License Program Index Option Index Variable Index PO Mode Index Autoconf Macro Index General Index

1 Introduction

This chapter explains the goals sought in the creation of GNU gettext and the free Translation Project. Then, it explains a few broad concepts around Native Language Support, and positions message translation with regard to other aspects of national and cultural variance, as they apply to programs. It also surveys those files used to convey the translations. It explains how the various tools interact in the initial generation of these files, and later, how the maintenance cycle should usually operate.

In this manual, we use he when speaking of the programmer or maintainer, she when speaking of the translator, and they when speaking of the installers or end users of the translated program. This is only a convenience for clarifying the documentation. It is absolutely not meant to imply that some roles are more appropriate to males or females. Besides, as you might guess, GNU gettext is meant to be useful for people using computers, whatever their sex, race, religion or nationality!

Please submit suggestions and corrections either in the bug tracker at https://savannah.gnu.org/projects/gettext or by email to bug-gettext@gnu.org. Please include the manual's edition number and update date in your messages.

1.1 The Purpose of GNU gettext

Usually, programs are written and documented in English, and use English at execution time to interact with users. This is true not only of GNU software, but also of a great deal of proprietary and free software. Using a common language is quite handy for communication between developers, maintainers and users from all countries.

On the other hand, most people are less comfortable with English than with their own native language, and would prefer to use their mother tongue for day-to-day work, as far as possible.
Many would simply love to see their computer screen showing a lot less of English, and far more of their own language.

However, to many people, this dream might appear so far-fetched that they may believe it is not even worth spending time thinking about it. They have no confidence at all that the dream might ever come true. Yet some have not lost hope, and have organized themselves. The Translation Project is a formalization of this hope into a workable structure, which has a good chance to get all of us nearer the achievement of a truly multi-lingual set of programs.

GNU gettext is an important step for the Translation Project, as it is an asset on which we may build many other steps. This package offers to programmers, translators and even users, a well integrated set of tools and documentation. Specifically, the GNU gettext utilities are a set of tools that provides a framework within which other free packages may produce multi-lingual messages. These tools include

* A set of conventions about how programs should be written to support message catalogs.
* A directory and file naming organization for the message catalogs themselves.
* A runtime library supporting the retrieval of translated messages.
* A few stand-alone programs to massage in various ways the sets of translatable strings, or already translated strings.
* A library supporting the parsing and creation of files containing translated messages.
* A special mode for Emacs which helps prepare these sets and bring them up to date.

GNU gettext is designed to minimize the impact of internationalization on program sources, keeping this impact as small and hardly noticeable as possible. Internationalization has better chances of succeeding if it is very lightweight, or at least appears to be so, when looking at program sources.

The Translation Project also uses the GNU gettext distribution as a vehicle for documenting its structure and methods.
This goes beyond the strict technicalities of documenting GNU gettext proper. By so doing, translators will find in a single place, as far as possible, all they need to know for properly doing their translating work. This supplemental documentation might also help programmers, and even curious users, in understanding how GNU gettext is related to the remainder of the Translation Project, and consequently, have a glimpse at the big picture.

1.2 I18n, L10n, and Such

Two long words appear all the time when we discuss support of native language in programs, and these words have a precise meaning, worth being explained here, once and for all in this document. The words are internationalization and localization. Many people, tired of writing these long words over and over again, have taken the habit of writing i18n and l10n instead, quoting the first and last letter of each word, and replacing the run of intermediate letters by a number merely telling how many such letters there are. But in this manual, for the sake of clarity, we will patiently write the names in full, each time…

By internationalization, one refers to the operation by which a program, or a set of programs turned into a package, is made aware of and able to support multiple languages. This is a generalization process, by which the programs are untied from calling only English strings or other English specific habits, and connected to generic ways of doing the same, instead. Program developers may use various techniques to internationalize their programs. Some of these have been standardized. GNU gettext offers one of these standards. See The Programmer's View.
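The abbreviation rule behind i18n and l10n described above (first letter, count of intermediate letters, last letter) is mechanical enough to express in a few lines of C. The following is only an illustrative sketch; the function name numeronym is made up for this example and is not part of GNU gettext.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GNU gettext.
   Writes the abbreviation of `word` into `out` (capacity `cap`):
   first letter, number of intermediate letters, last letter.
   Returns 0 on success, -1 if the word has fewer than three letters
   or if `out` is too small. */
int numeronym(const char *word, char *out, size_t cap)
{
    size_t len = strlen(word);
    if (len < 3)
        return -1;                      /* nothing to abbreviate */
    int n = snprintf(out, cap, "%c%zu%c", word[0], len - 2, word[len - 1]);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Called with "internationalization" (20 letters) the buffer receives "i18n", and with "localization" (12 letters) it receives "l10n".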
By localization, one means the operation by which, in a set of programs already internationalized, one gives the program all needed information so that it can adapt itself to handle its input and output in a fashion which is correct for some native language and cultural habits. This is a particularisation process, by which generic methods already implemented in an internationalized program are used in specific ways. The programming environment puts several functions at the programmer's disposal which allow this runtime configuration.

The formal description of a specific set of cultural habits for some country, together with all associated translations targeted to the same native language, is called the locale for this language or country. Users achieve localization of programs by setting proper values to special environment variables, prior to executing those programs, identifying which locale should be used.

In fact, locale message support is only one component of the cultural data that makes up a particular locale. There are a whole host of routines and functions provided to aid programmers in developing internationalized software and which allow them to access the data stored in a particular locale. When someone presently refers to a particular locale, they are obviously referring to the data stored within that particular locale. Similarly, if a programmer is referring to “accessing the locale routines”, they are referring to the complete suite of routines that access all of the locale's information.

One uses the expression Native Language Support, or merely NLS, for speaking of the overall activity or feature encompassing both internationalization and localization, allowing for multi-lingual interactions in a program. In a nutshell, one could say that internationalization is the operation by which further localizations are made possible.
Also, very roughly said, when it comes to multi-lingual messages, internationalization is usually taken care of by programmers, and localization is usually taken care of by translators.

1.3 Aspects in Native Language Support

For a totally multi-lingual distribution, there are many things to translate beyond output messages.

* As of today, GNU gettext offers a complete toolset for translating messages output by C programs. Perl scripts and shell scripts will also need to be translated. Even if there are today some hooks by which this can be done, these hooks are not integrated as well as they should be.
* Some programs, like autoconf or bison, are able to produce other programs (or scripts). Even if the generating programs themselves are internationalized, the generated programs they produce may need internationalization on their own, and this indirect internationalization could be automated right from the generating program. In fact, quite usually, generating and generated programs could be internationalized independently, as the effort needed is fairly orthogonal.
* A few programs include textual tables which might need translation themselves, independently of the strings contained in the program itself. For example, RFC 1345 gives an English description for each character which the recode program is able to reconstruct at execution. Since these descriptions are extracted from the RFC by mechanical means, translating them properly would require a prior translation of the RFC itself.
* Almost all programs accept options, which are often worded out so to be descriptive for the English readers; one might want to consider offering translated versions for program options as well.
* Many programs read, interpret, compile, or are somewhat driven by input files which are texts containing keywords, identifiers, or replies which are inherently translatable.
  For example, one may want gcc to allow diacriticized characters in identifiers or use translated keywords; ‘rm -i’ might accept something other than ‘y’ or ‘n’ for replies, etc. Even if the program will eventually make most of its output in the foreign languages, one has to decide whether the input syntax, option values, etc., are to be localized or not.
* The manual accompanying a package, as well as all documentation files in the distribution, could surely be translated, too. Translating a manual, with the intent of later keeping up with updates, is a major undertaking in itself, generally.

As we already stressed, translation is only one aspect of locales. Other internationalization aspects are system services and are handled in GNU libc. There are many attributes that are needed to define a country's cultural conventions. These attributes include, besides the country's native language, the formatting of the date and time, the representation of numbers, the symbols for currency, etc. These local rules are termed the country's locale. The locale represents the knowledge needed to support the country's native attributes.

There are a few major areas which may vary between countries and hence, define what a locale must describe. The following list helps putting multi-lingual messages into the proper context of other tasks related to locales. See the GNU libc manual for details.

Characters and Codesets

The codeset most commonly used throughout the USA and most English speaking parts of the world is the ASCII codeset. However, there are many characters needed by various locales that are not found within this codeset. The 8-bit ISO 8859-1 code set has most of the special characters needed to handle the major European languages. However, in many cases, choosing ISO 8859-1 is nevertheless not adequate: it doesn't even handle the major European currency.
Hence each locale will need to specify which codeset it needs to use and will need to have the appropriate character handling routines to cope with the codeset.

Currency

The symbols used vary from country to country as does the position used by the symbol. Software needs to be able to transparently display currency figures in the native mode for each locale.

Dates

The format of date varies between locales. For example, Christmas day in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia. Other countries might use ISO 8601 dates, etc. Time of the day may be noted as hh:mm, hh.mm, or otherwise. Some locales require time to be specified in 24-hour mode rather than as AM or PM. Further, the nature and yearly extent of the Daylight Saving correction vary widely between countries.

Numbers

Numbers can be represented differently in different locales. For example, the following numbers are all written correctly for their respective locales:

    12,345.67       English
    12.345,67       German
     12345,67       French
    1,2345.67       Asia

Some programs could go further and use different unit systems, like English units or Metric units, or even take into account variants about how numbers are spelled in full.

Messages

The most obvious area is the language support within a locale. This is where GNU gettext provides the means for developers and users to easily change the language that the software uses to communicate to the user.

These areas of cultural conventions are called locale categories. It is an unfortunate term; locale aspects or locale feature categories would be a better term, because each “locale category” describes an area or task that requires localization. The concrete data that describes the cultural conventions for such an area and for a particular culture is also called a locale category.
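To make the notion of locale categories concrete, here is a minimal C sketch built on the standard setlocale and localeconv functions from <locale.h>. Each category (LC_NUMERIC, LC_TIME, LC_MONETARY, and on POSIX systems LC_MESSAGES) can be selected separately. The helper names select_category and decimal_separator are invented for this illustration, and which locale names are accepted depends on what is installed on the system.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: select one locale category and return the name
   that is now in effect. An empty `name` ("") asks the C library to take
   the setting from the environment. If the requested locale is not
   available, fall back to the portable "C" locale. */
const char *select_category(int category, const char *name)
{
    const char *actual = setlocale(category, name);
    if (actual == NULL)
        actual = setlocale(category, "C");
    return actual;
}

/* Hypothetical helper: the decimal separator of the current
   LC_NUMERIC category, e.g. "." or ",". */
const char *decimal_separator(void)
{
    return localeconv()->decimal_point;
}

/* Format `value` with two decimals; printf-style functions honor the
   decimal separator of the current LC_NUMERIC category. */
int format_number(char *out, size_t cap, double value)
{
    return snprintf(out, cap, "%.2f", value);
}
```

A program would typically call select_category(LC_ALL, "") once at startup so that every category follows the user's environment; selecting only LC_NUMERIC changes number formatting while leaving messages and dates untouched.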
In this sense, a locale is composed of several locale categories: the locale category describing the codeset, the locale category describing the formatting of numbers, the locale category containing the translated messages, and so on.

Components of locale outside of message handling are standardized in the ISO C standard and the POSIX:2001 standard (also known as the SUSV3 specification). GNU libc fully implements this, and most other modern systems provide a more or less reasonable support for at least some of the missing components.

1.4 Files Conveying Translations

The letters PO in .po files mean Portable Object, to distinguish it from .mo files, where MO stands for Machine Object. This paradigm, as well as the PO file format, is inspired by the NLS standard developed by Uniforum, and first implemented by Sun in their Solaris system.

PO files are meant to be read and edited by humans, and associate each original, translatable string of a given package with its translation in a particular target language. A single PO file is dedicated to a single target language. If a package supports many languages, there is one such PO file per language supported, and each package has its own set of PO files. These PO files are best created by the xgettext program, and later updated or refreshed through the msgmerge program. Program xgettext extracts all marked messages from a set of C files and initializes a PO file with empty translations. Program msgmerge takes care of adjusting PO files between releases of the corresponding sources, commenting obsolete entries, initializing new ones, and updating all source line references. Files ending with .pot are a kind of base translation file found in distributions, in PO file format.

MO files are meant to be read by programs, and are binary in nature.
A few systems already offer tools for creating and handling MO files as part of the Native Language Support coming with the system, but the format of these MO files is often different from system to system, and non-portable. The tools already provided with these systems don't support all the features of GNU gettext. Therefore GNU gettext uses its own format for MO files. Files ending with .gmo are really MO files, when it is known that these files use the GNU format.

1.5 Overview of GNU gettext

The following diagram summarizes the relation between the files handled by GNU gettext and the tools acting on these files. It is followed by somewhat detailed explanations, which you should read while keeping an eye on the diagram. Having a clear understanding of these interrelations will surely help programmers, translators and maintainers.

Original C Sources ───> Preparation ───> Marked C Sources ───╮
                                                             │
              ╭─────────<─── GNU gettext Library             │
╭─── make <───┤                                              │
│             ╰─────────<────────────────────┬───────────────╯
│                                            │
│   ╭─────<─── PACKAGE.pot <─── xgettext <───╯   ╭───<─── PO Compendium
│   │                                            │              ↑
│   │                                            ╰───╮          │
│   ╰───╮                                            ├───> PO editor ───╮
│       ├────> msgmerge ──────> LANG.po ────>────────╯                  │
│   ╭───╯                                                               │
│   │                                                                   │
│   ╰─────────────<───────────────╮                                     │
│                                 ├─── New LANG.po <────────────────────╯
│   ╭─── LANG.gmo <─── msgfmt <───╯
│   │
│   ╰───> install ───> /.../LANG/PACKAGE.mo ───╮
│                                              ├───> "Hello world!"
╰───────> install ───> /.../bin/PROGRAM ───────╯

As a programmer, the first step to bringing GNU gettext into your package is identifying, right in the C sources, those strings which are meant to be translatable, and those which are untranslatable. This tedious job can be done a little more comfortably using emacs PO mode, but you can use any means familiar to you for modifying your C sources. Besides this, some other simple, standard changes are needed to properly initialize the translation library.
See Preparing Program Sources, for more information about all this.

For newly written software the strings of course can and should be marked while writing it. The gettext approach makes this very easy. Simply put the following lines at the beginning of each file or in a central header file:

    #define _(String) (String)
    #define N_(String) String
    #define textdomain(Domain)
    #define bindtextdomain(Package, Directory)

Doing this allows you to prepare the sources for internationalization. Later when you feel ready for the step to use the gettext library simply replace these definitions by the following:

    #include <libintl.h>
    #define _(String) gettext (String)
    #define gettext_noop(String) String
    #define N_(String) gettext_noop (String)

and link against libintl.a or libintl.so. Note that on GNU systems, you don't need to link with libintl because the gettext library functions are already contained in GNU libc. That is all you have to change.

Once the C sources have been modified, the xgettext program is used to find and extract all translatable strings, and create a PO template file out of all these. This package.pot file contains all original program strings. It has sets of pointers to exactly where in C sources each string is used. All translations are set to empty. The letter t in .pot marks this as a Template PO file, not yet oriented towards any particular language. See Invoking the xgettext Program, for more details about how one calls the xgettext program. If you are really lazy, you might be interested in working a lot more right away, and preparing the whole distribution setup (see The Maintainer's View). By doing so, you spare yourself typing the xgettext command, as make should now generate the proper things automatically for you!

The first time through, there is no lang.po yet, so the msgmerge step may be skipped and replaced by a mere copy of package.pot to lang.po, where lang represents the target language. See Creating a New PO File for details.
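As an illustration of the programmer's side described above, the sketch below combines the second set of macro definitions with the usual initialization calls (setlocale, bindtextdomain, textdomain). The domain name "hello-demo", the directory "/usr/share/locale" and the function names are assumptions made up for this example; a real package would take them from its build configuration. When no message catalog is installed for the domain, gettext simply returns the original English string.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

#define _(String) gettext (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

/* Hypothetical values, chosen only for this sketch. */
#define PACKAGE   "hello-demo"
#define LOCALEDIR "/usr/share/locale"

/* N_() only marks these strings so that xgettext extracts them;
   a static initializer cannot call gettext() itself. */
static const char *const weekday_names[] = {
    N_("Monday"), N_("Tuesday"), N_("Wednesday")
};

/* One-time setup: follow the user's locale, then tell the library
   where the catalogs live and which domain to use by default. */
void init_i18n(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain(PACKAGE, LOCALEDIR);
    textdomain(PACKAGE);
}

/* A string that is marked and translated in one step. */
const char *greeting(void)
{
    return _("Hello world!");
}

/* A string marked with N_() earlier and translated at the point of use.
   The caller must pass an index below 3. */
const char *weekday(size_t index)
{
    return gettext(weekday_names[index]);
}
```

A main function would call init_i18n() once and then print greeting() with puts or printf. Running xgettext with the keyword options for _ and N_ over such a file is what would collect "Hello world!" and the weekday names into the package.pot template.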
Then comes the initial translation of messages. Translation in itself is a whole matter, still exclusively meant for humans, and whose complexity far overwhelms the level of this manual. Nevertheless, a few hints are given in some other chapter of this manual (see The Translator's View). You will also find there indications about how to contact translating teams, or become part of them, for sharing your translating concerns with others who target the same native language.

While adding the translated messages into the lang.po PO file, if you are not using one of the dedicated PO file editors (see Editing PO Files), you are on your own for ensuring that your efforts fully respect the PO file format, and quoting conventions (see The Format of PO Files). This is surely not an impossible task, as this is the way many people handled PO files around 1995. On the other hand, by using a PO file editor, most details of PO file format are taken care of for you, but you have to acquire some familiarity with the PO file editor itself.

If some common translations have already been saved into a compendium PO file, translators may use PO mode for initializing untranslated entries from the compendium, and also save selected translations into the compendium, updating it (see Using Translation Compendia). Compendium files are meant to be exchanged between members of a given translation team.

Programs, or packages of programs, are dynamic in nature: users write bug reports and suggestions for improvements, maintainers react by modifying programs in various ways. The fact that a package has already been internationalized should not make maintainers shy of adding new strings, or modifying strings already translated. They just do their job the best they can. For the Translation Project to work smoothly, it is important that maintainers do not carry translation concerns on their already loaded shoulders, and that translators be kept as free as possible of programming concerns.
The only concern maintainers should have is carefully marking new strings as translatable, when they should be; they need not otherwise worry about them being translated, as this will come in proper time.

Consequently, when programs and their strings are adjusted in various ways by maintainers, and for matters usually unrelated to translation, xgettext would construct package.pot files which are evolving over time, so the translations carried by lang.po are slowly fading out of date. It is important for translators (and even maintainers) to understand that package translation is a continuous process in the lifetime of a package, and not something which is done once and for all at the start. After an initial burst of translation activity for a given package, interventions are needed once in a while, because here and there, translated entries become obsolete, and new untranslated entries appear, needing translation.

The msgmerge program has the purpose of refreshing an already existing lang.po file, by comparing it with a newer package.pot template file, extracted by xgettext out of recent C sources. The refreshing operation adjusts all references to C source locations for strings, since these strings move as programs are modified. Also, msgmerge comments out as obsolete, in lang.po, those already translated entries which are no longer used in the program sources (see Obsolete Entries). It finally discovers new strings and inserts them in the resulting PO file as untranslated entries (see Untranslated Entries). See Invoking the msgmerge Program, for more information about what msgmerge really does.

Whatever route or means taken, the goal is to obtain an updated lang.po file offering translations for all strings.

The temporal mobility, or fluidity of PO files, is an integral part of the translation game, and should be well understood, and accepted.
People resisting it will have a hard time participating in the Translation Project, or will give a hard time to other participants! In particular, maintainers should relax and include all available official PO files in their distributions, even if these have not recently been updated, without exerting pressure on the translator teams to get the job done. The pressure should rather come from the community of users speaking a particular language, and maintainers should consider themselves fairly relieved of any concern about the adequacy of translation files. On the other hand, translators should reasonably try updating the PO files they are responsible for, while the package is undergoing pretest, prior to an official distribution.

Once the PO file is complete and dependable, the msgfmt program is used for turning the PO file into a machine-oriented format, which may yield efficient retrieval of translations by the programs of the package, whenever needed at runtime (see The Format of GNU MO Files). See Invoking the msgfmt Program, for more information about all modes of execution for the msgfmt program.

Finally, the modified and marked C sources are compiled and linked with the GNU gettext library, usually through the operation of make, given a suitable Makefile exists for the project, and the resulting executable is installed somewhere users will find it. The MO files themselves should also be properly installed. Given the appropriate environment variables are set (see Setting the Locale through Environment Variables), the program should localize itself automatically, whenever it executes.

The remainder of this manual has the purpose of explaining in depth the various steps outlined above.
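As a rough picture of where these steps lead, the following sketch shows what the program side might look like once the real library is used. The package name "hello-sketch" and the directory are hypothetical placeholders (a real package normally receives PACKAGE and LOCALEDIR from config.h or the Makefile), and init_i18n and greeting are names invented for this example. On systems other than GNU, linking may additionally require libintl.

```c
#include <libintl.h>
#include <locale.h>

#define _(String) gettext (String)

/* Hypothetical values, used only when the build system supplies none. */
#ifndef PACKAGE
# define PACKAGE "hello-sketch"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/share/locale"
#endif

/* Adopt the user's locale and tell gettext where the MO files live. */
void init_i18n(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain(PACKAGE, LOCALEDIR);
    textdomain(PACKAGE);
}

/* Returns the translation when a catalog for the current locale is
   installed, and the original string otherwise. */
const char *greeting(void)
{
    return _("Hello world!");
}
```

When no MO file for the package is installed, gettext hands back the original string, so the program keeps working in English until translations arrive.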
2 The User's View

Nowadays, when users log into a computer, they usually find that all their programs show messages in their native language – at least for users of languages with an active free software community, like French or German; to a lesser extent for languages with a smaller participation in free software and the GNU project, like Hindi and Filipino.

How does this work? How can the user influence the language that is used by the programs? This chapter answers these questions.

  Operating System Installation
  Setting the Locale Used by GUI Programs
  Setting the Locale through Environment Variables
  Obtaining good output in a Windows console
  Installing Translations for Particular Programs

2.1 Operating System Installation

The default language is often already specified during operating system installation. When the operating system is installed, the installer typically asks for the language used for the installation process and, separately, for the language to use in the installed system. Some OS installers only ask for the language once.

This determines the system-wide default language for all users. But the installers often give the possibility to install extra localizations for additional languages. For example, the localizations of KDE (the K Desktop Environment) and LibreOffice are often bundled separately, as one installable package per language.

At this point it is good to consider the intended use of the machine: If it is a machine designated for personal use, additional localizations are probably not necessary. If, however, the machine is in use in an organization or company that has international relationships, one can consider the needs of guest users. If you have a guest from abroad, for a week, what could be his preferred locales?
It may be worth installing these additional localizations ahead of time, since they cost only a bit of disk space at this point.

The system-wide default language is the locale configuration that is used when a new user account is created. But the user can have his own locale configuration that is different from the one of the other users of the same machine. He can specify it, typically after the first login, as described in the next section.

2.2 Setting the Locale Used by GUI Programs

The immediately available programs in a user's desktop come from a group of programs called a “desktop environment”; it usually includes the window manager, a web browser, a text editor, and more. The most common free desktop environments are KDE, GNOME, and Xfce.

The locale used by GUI programs of the desktop environment can be specified in a configuration screen called “control center”, “language settings” or “country settings”.

Individual GUI programs that are not part of the desktop environment can have their locale specified either in a settings panel, or through environment variables.

For some programs, it is possible to specify the locale through environment variables, possibly even to a different locale than the desktop's locale. This means, instead of starting a program through a menu or from the file system, you can start it from the command-line, after having set some environment variables. The environment variables can be those specified in the next section (Setting the Locale through Environment Variables); for some versions of KDE, however, the locale is specified through a variable KDE_LANG, rather than LANG or LC_ALL.
2.3 Setting the Locale through Environment Variables

As a user, if your language has been installed for this package, in the simplest case, you only have to set the LANG environment variable to the appropriate ‘ll_CC’ combination. For example, let's suppose that you speak German and live in Germany. At the shell prompt, merely execute ‘setenv LANG de_DE’ (in csh), ‘export LANG; LANG=de_DE’ (in sh) or ‘export LANG=de_DE’ (in bash). This can be done from your .login or .profile file, once and for all.

  Locale Names
  Locale Environment Variables
  Specifying a Priority List of Languages

2.3.1 Locale Names

A locale name usually has the form ‘ll_CC’. Here ‘ll’ is an ISO 639 two-letter language code, and ‘CC’ is an ISO 3166 two-letter country code. For example, for German in Germany, ll is de, and CC is DE. You find a list of the language codes in appendix Language Codes and a list of the country codes in appendix Country Codes.

You might think that the country code specification is redundant. But in fact, some languages have dialects in different countries. For example, ‘de_AT’ is used for Austria, and ‘pt_BR’ for Brazil. The country code serves to distinguish the dialects.

Many locale names have an extended syntax ‘ll_CC.encoding’ that also specifies the character encoding. These are in use because between 2000 and 2005, most users switched to locales in UTF-8 encoding. For example, the German locale on glibc systems is nowadays ‘de_DE.UTF-8’. The older name ‘de_DE’ still refers to the German locale as of 2000 that stores characters in ISO-8859-1 encoding – a text encoding that cannot even accommodate the Euro currency sign.
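The structure of such names can be illustrated with a small sketch that splits a name into its parts. The struct and function names are hypothetical, and the code is not how any particular C library parses locale names; it only mirrors the ‘ll_CC.encoding’ syntax described above, plus the optional ‘@variant’ suffix covered in the following paragraphs.

```c
#include <string.h>

/* Parts of a locale name; a part that is absent is left empty. */
struct locale_parts {
    char language[16];
    char territory[16];
    char codeset[32];
    char modifier[32];
};

/* Copies src into dst if it fits; returns 0 on success, -1 otherwise. */
static int copy_part(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (len >= dst_size)
        return -1;
    memcpy(dst, src, len + 1);
    return 0;
}

/* Splits "ll_CC.encoding@variant" into its parts, working from the
   right: first the @variant, then the .encoding, then the _CC.
   Returns 0 on success, -1 if the name or one of its parts is too long. */
int split_locale_name(const char *name, struct locale_parts *out)
{
    char buf[96];
    char *sep;

    memset(out, 0, sizeof *out);
    if (copy_part(buf, sizeof buf, name) != 0)
        return -1;

    sep = strchr(buf, '@');
    if (sep != NULL) {
        *sep = '\0';
        if (copy_part(out->modifier, sizeof out->modifier, sep + 1) != 0)
            return -1;
    }
    sep = strchr(buf, '.');
    if (sep != NULL) {
        *sep = '\0';
        if (copy_part(out->codeset, sizeof out->codeset, sep + 1) != 0)
            return -1;
    }
    sep = strchr(buf, '_');
    if (sep != NULL) {
        *sep = '\0';
        if (copy_part(out->territory, sizeof out->territory, sep + 1) != 0)
            return -1;
    }
    return copy_part(out->language, sizeof out->language, buf);
}
```

For ‘de_DE.UTF-8’ this yields the language de, the territory DE and the codeset UTF-8, with an empty modifier.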
Some locale names use ‘ll_CC@variant’ instead of ‘ll_CC’. The ‘@variant’ can denote any kind of characteristic that is not already implied by the language ll and the country CC. It can denote a particular monetary unit. For example, on glibc systems, ‘de_DE@euro’ denotes the locale that uses the Euro currency, in contrast to the older locale ‘de_DE’ which implies the use of the currency before 2002. It can also denote a dialect of the language, or the script used to write text (for example, ‘sr_RS@latin’ uses the Latin script, whereas ‘sr_RS’ uses the Cyrillic script to write Serbian), or the orthography rules, or similar.

On other systems, some variations of this scheme are used, such as ‘ll’. You can get the list of locales supported by your system for your language by running the command ‘locale -a | grep '^ll'’.

There is also a special locale, called ‘C’. When it is used, it disables all localization: in this locale, all programs standardized by POSIX use English messages and an unspecified character encoding (often US-ASCII, but sometimes also ISO-8859-1 or UTF-8, depending on the operating system).

2.3.2 Locale Environment Variables

A locale is composed of several locale categories, see Aspects in Native Language Support. When a program looks up locale dependent values, it does this according to the following environment variables, in priority order:

  1. LANGUAGE
  2. LC_ALL
  3. LC_xxx, according to selected locale category: LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, LC_MONETARY, LC_MESSAGES, ...
  4. LANG

Variables whose value is set but is empty are ignored in this lookup.

LANG is the normal environment variable for specifying a locale.
As a user, you normally set this variable (unless some of the other variables have already been set by the system, in /etc/profile or similar initialization files).

LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, LC_MONETARY, LC_MESSAGES, and so on, are the environment variables meant to override LANG, each affecting a single locale category only. For example, assume you are a Swedish user in Spain, and you want your programs to handle numbers and dates according to Spanish conventions, and only the messages should be in Swedish. Then you could create a locale named ‘sv_ES’ or ‘sv_ES.UTF-8’ by use of the localedef program. But it is simpler, and achieves the same effect, to set the LANG variable to es_ES.UTF-8 and the LC_MESSAGES variable to sv_SE.UTF-8; these two locales come already preinstalled with the operating system.

LC_ALL is an environment variable that overrides all of these. It is typically used in scripts that run particular programs. For example, configure scripts generated by GNU autoconf use LC_ALL to make sure that the configuration tests don't operate in locale dependent ways.

Some systems, unfortunately, set LC_ALL in /etc/profile or in similar initialization files. As a user, you therefore have to unset this variable if you want to set LANG and optionally some of the other LC_xxx variables.

The LANGUAGE variable is described in the next subsection.

2.3.3 Specifying a Priority List of Languages

Not all programs have translations for all languages. By default, an English message is shown in place of a nonexistent translation. If you understand other languages, you can set up a priority list of languages. This is done through a different environment variable, called LANGUAGE.
GNU gettext gives preference to LANGUAGE over LC_ALL and LANG for the purpose of message handling, but you still need to have LANG (or LC_ALL) set to the primary language; this is required by other parts of the system libraries. For example, some Swedish users who would rather read translations in German than English when Swedish is not available set LANGUAGE to ‘sv:de’ while leaving LANG set to ‘sv_SE’.

Special advice for Norwegian users: The language code for Norwegian bokmål changed from ‘no’ to ‘nb’ in 2003. During the transition period, while some message catalogs for this language are installed under ‘nb’ and some older ones under ‘no’, it is recommended for Norwegian users to set LANGUAGE to ‘nb:no’ so that both newer and older translations are used.

In the LANGUAGE environment variable, but not in the other environment variables, ‘ll_CC’ combinations can be abbreviated as ‘ll’ to denote the language's main dialect. For example, ‘de’ is equivalent to ‘de_DE’ (German as spoken in Germany), and ‘pt’ to ‘pt_PT’ (Portuguese as spoken in Portugal) in this context.

Note: The variable LANGUAGE is ignored if the locale is set to ‘C’. In other words, you have to first enable localization, by setting LANG (or LC_ALL) to a value other than ‘C’, before you can use a language priority list through the LANGUAGE variable.

2.4 Obtaining good output in a Windows console

On Windows, consoles such as the one started by the cmd.exe program do input and output in an encoding, called “OEM code page”, that is different from the encoding that text-mode programs usually use, called “ANSI code page”. (Note: This problem does not exist for Cygwin consoles; these consoles do input and output in the UTF-8 encoding.)
As a workaround, you may request that the programs produce output in this “OEM” encoding. To do so, set the environment variable OUTPUT_CHARSET to the “OEM” encoding, through a command such as

    set OUTPUT_CHARSET=CP850

Note: This has an effect only on strings looked up in message catalogs; other categories of text are usually not affected by this setting. Note also that this environment variable also affects output sent to a file or to a pipe; output to a file is most often expected to be in the “ANSI” or in the UTF-8 encoding.

Here are examples of the “ANSI” and “OEM” code pages:

    Territories                   ANSI encoding   OEM encoding
    Western Europe                CP1252          CP850
    Slavic countries (Latin 2)    CP1250          CP852
    Baltic countries              CP1257          CP775
    Russia                        CP1251          CP866

2.5 Installing Translations for Particular Programs

Languages are not equally well supported in all packages using GNU gettext, and more translations are added over time. Usually, you use the translations that are shipped with the operating system or with particular packages that you install afterwards. But you can also install newer localizations directly. For doing this, you will need an understanding where each localization file is stored on the file system.

For programs that participate in the Translation Project, you can start looking for translations here: https://translationproject.org/team/index.html.

For programs that are part of the KDE project, the starting point is: https://l10n.kde.org/.

For programs that are part of the GNOME project, the starting point is: https://wiki.gnome.org/TranslationProject.

For other programs, you may check whether the program's source code package contains some ll.po files; often they are kept together in a directory called po/. Each ll.po file contains the message translations for the language whose abbreviation is ll.
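As a hint about where a compiled catalog usually ends up, the sketch below builds the conventional path of an MO file from a catalog directory, a language code and a package name. The layout directory/ll/LC_MESSAGES/package.mo is the one gettext-based programs commonly search, but the directory itself varies between systems, so the values in the usage note are assumptions. The function name build_mo_path is hypothetical.

```c
#include <stdio.h>

/* Writes "<localedir>/<lang>/LC_MESSAGES/<package>.mo" into buf.
   Returns 0 on success, -1 if buf is too small or formatting fails. */
int build_mo_path(char *buf, size_t size, const char *localedir,
                  const char *lang, const char *package)
{
    int n = snprintf(buf, size, "%s/%s/LC_MESSAGES/%s.mo",
                     localedir, lang, package);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

Called with "/usr/share/locale", "de" and "hello", the function produces /usr/share/locale/de/LC_MESSAGES/hello.mo; whether that is the right directory on a given machine has to be checked locally.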
3 The Format of PO Files

The GNU gettext toolset helps programmers and translators in producing, updating and using translation files, mainly those PO files which are textual, editable files. This chapter explains the format of PO files.

A PO file is made up of many entries, each entry holding the relation between an original untranslated string and its corresponding translation. All entries in a given PO file usually pertain to a single project, and all translations are expressed in a single target language. One PO file entry has the following schematic structure:

    white-space
    #  translator-comments
    #. extracted-comments
    #: reference…
    #, flag…
    #| msgid previous-untranslated-string
    msgid untranslated-string
    msgstr translated-string

The general structure of a PO file should be well understood by the translator. When using PO mode, very little has to be known about the format details, as PO mode takes care of them for her.

A simple entry can look like this:

    #: lib/error.c:116
    msgid "Unknown system error"
    msgstr "Error desconegut del sistema"

Entries begin with some optional white space. Usually, when generated through GNU gettext tools, there is exactly one blank line between entries. Then comments follow, on lines all starting with the character #. There are two kinds of comments: those which have some white space immediately following the # (the translator comments), which are created and maintained exclusively by the translator, and those which have some non-white character just after the # (the automatic comments), which are created and maintained automatically by GNU gettext tools. Comment lines starting with #. contain comments given by the programmer, directed at the translator; these comments are called extracted comments because the xgettext program extracts them from the program's source code.
Comment lines starting with #: contain references to the program's source code. Comment lines starting with #, contain flags; more about these below. Comment lines starting with #| contain the previous untranslated string for which the translator gave a translation.

All comments, of either kind, are optional.

References to the program's source code, in lines that start with #:, are of the form file_name:line_number or just file_name. If the file_name contains spaces, it is enclosed within Unicode characters U+2068 and U+2069.

After white space and comments, entries show two strings, namely first the untranslated string as it appears in the original program sources, and then, the translation of this string. The original string is introduced by the keyword msgid, and the translation, by msgstr. The two strings, untranslated and translated, are quoted in various ways in the PO file, using " delimiters and \ escapes, but the translator does not really have to pay attention to the precise quoting format, as PO mode fully takes care of quoting for her.

The msgid strings, as well as automatic comments, are produced and managed by other GNU gettext tools, and PO mode does not provide means for the translator to alter these. The most she can do is merely deleting them, and only by deleting the whole entry. On the other hand, the msgstr string, as well as translator comments, are really meant for the translator, and PO mode gives her the full control she needs.

The comment lines beginning with #, are special because they are not completely ignored by the programs as comments generally are. The comma separated list of flags is used by the msgfmt program to give the user some better diagnostic messages. Currently there are two forms of flags defined:

fuzzy
    This flag can be generated by the msgmerge program or it can be inserted by the translator herself. It shows that the msgstr string might not be a correct translation (anymore).
    Only the translator can judge if the translation requires further modification, or is acceptable as is. Once satisfied with the translation, she then removes this fuzzy attribute. The msgmerge program inserts this when it has combined the msgid and msgstr entries after fuzzy search only. See Fuzzy Entries.

c-format
no-c-format
    These flags should not be added by a human. Instead only the xgettext program adds them. In an automated PO file processing system as proposed here, the user's changes would be thrown away again as soon as the xgettext program generates a new template file.

    The c-format flag indicates that the untranslated string and the translation are supposed to be C format strings. The no-c-format flag indicates that they are not C format strings, even though the untranslated string happens to look like a C format string (with ‘%’ directives).

    When the c-format flag is given for a string the msgfmt program does some more tests to check the validity of the translation. See Invoking the msgfmt Program, Special Comments preceding Keywords and C Format Strings.

objc-format
no-objc-format
    Likewise for Objective C, see Objective C Format Strings.

c++-format
no-c++-format
    Likewise for C++, see C++ Format Strings.

python-format
no-python-format
    Likewise for Python, see Python Format Strings.

python-brace-format
no-python-brace-format
    Likewise for Python brace, see Python Format Strings.

java-format
no-java-format
    Likewise for Java MessageFormat format strings, see Java Format Strings.

java-printf-format
no-java-printf-format
    Likewise for Java printf format strings, see Java Format Strings.

csharp-format
no-csharp-format
    Likewise for C#, see C# Format Strings.

javascript-format
no-javascript-format
    Likewise for JavaScript, see JavaScript Format Strings.

scheme-format
no-scheme-format
    Likewise for Scheme, see Scheme Format Strings.

lisp-format
no-lisp-format
    Likewise for Lisp, see Lisp Format Strings.
elisp-format
no-elisp-format
    Likewise for Emacs Lisp, see Emacs Lisp Format Strings.

librep-format
no-librep-format
    Likewise for librep, see librep Format Strings.

ruby-format
no-ruby-format
    Likewise for Ruby, see Ruby Format Strings.

sh-format
no-sh-format
    Likewise for Shell, see Shell Format Strings.

awk-format
no-awk-format
    Likewise for awk, see awk Format Strings.

lua-format
no-lua-format
    Likewise for Lua, see Lua Format Strings.

object-pascal-format
no-object-pascal-format
    Likewise for Object Pascal, see Object Pascal Format Strings.

smalltalk-format
no-smalltalk-format
    Likewise for Smalltalk, see Smalltalk Format Strings.

qt-format
no-qt-format
    Likewise for Qt, see Qt Format Strings.

qt-plural-format
no-qt-plural-format
    Likewise for Qt plural forms, see Qt Format Strings.

kde-format
no-kde-format
    Likewise for KDE, see KDE Format Strings.

boost-format
no-boost-format
    Likewise for Boost, see Boost Format Strings.

tcl-format
no-tcl-format
    Likewise for Tcl, see Tcl Format Strings.

perl-format
no-perl-format
    Likewise for Perl, see Perl Format Strings.

perl-brace-format
no-perl-brace-format
    Likewise for Perl brace, see Perl Format Strings.

php-format
no-php-format
    Likewise for PHP, see PHP Format Strings.

gcc-internal-format
no-gcc-internal-format
    Likewise for the GCC sources, see GCC internal Format Strings.

gfc-internal-format
no-gfc-internal-format
    Likewise for the GNU Fortran Compiler sources, see GFC internal Format Strings.

ycp-format
no-ycp-format
    Likewise for YCP, see YCP Format Strings.

It is also possible to have entries with a context specifier. They look like this:

    white-space
    #  translator-comments
    #. extracted-comments
    #: reference…
    #, flag…
    #| msgctxt previous-context
    #| msgid previous-untranslated-string
    msgctxt context
    msgid untranslated-string
    msgstr translated-string

The context serves to disambiguate messages with the same untranslated-string.
It is possible to have several entries with the same untranslated-string in a PO file, provided that they each have a different context. Note that an empty context string and an absent msgctxt line do not mean the same thing.

A different kind of entries is used for translations which involve plural forms.

    white-space
    #  translator-comments
    #. extracted-comments
    #: reference…
    #, flag…
    #| msgid previous-untranslated-string-singular
    #| msgid_plural previous-untranslated-string-plural
    msgid untranslated-string-singular
    msgid_plural untranslated-string-plural
    msgstr[0] translated-string-case-0
    ...
    msgstr[N] translated-string-case-n

Such an entry can look like this:

    #: src/msgcmp.c:338 src/po-lex.c:699
    #, c-format
    msgid "found %d fatal error"
    msgid_plural "found %d fatal errors"
    msgstr[0] "s'ha trobat %d error fatal"
    msgstr[1] "s'han trobat %d errors fatals"

Here also, a msgctxt context can be specified before msgid, like above.

Here, additional kinds of flags can be used:

range:
    This flag is followed by a range of non-negative numbers, using the syntax range: minimum-value..maximum-value. It designates the possible values that the numeric parameter of the message can take. In some languages, translators may produce slightly better translations if they know that the value can only take on values between 0 and 10, for example.

The previous-untranslated-string is optionally inserted by the msgmerge program, at the same time when it marks a message fuzzy. It helps the translator to see which changes were done by the developers on the untranslated-string.

It happens that some lines, usually whitespace or comments, follow the very last entry of a PO file. Such lines are not part of any entry, and will be dropped when the PO file is processed by the tools, or may disturb some PO file editors.
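On the C side, a plural entry such as the "found %d fatal error" example earlier in this chapter typically originates from a call to ngettext, the libintl function that takes a singular form, a plural form and a count. The sketch below is only an illustration of that relationship: the function name format_fatal_errors is hypothetical, and the code is not taken from the msgcmp.c or po-lex.c sources referenced in the entry.

```c
#include <libintl.h>
#include <stdio.h>

/* Formats the number of fatal errors into buf. ngettext() picks the
   singular or plural msgid, or a translated msgstr[N] when a catalog
   is available. Returns the value returned by snprintf(). */
int format_fatal_errors(char *buf, size_t size, unsigned long count)
{
    /* The two literals are the msgid and msgid_plural of the entry. */
    const char *format = ngettext("found %d fatal error",
                                  "found %d fatal errors", count);
    return snprintf(buf, size, format, (int) count);
}
```

When no translation is found, ngettext falls back to the English rule: the first string for a count of one, the second string for every other count.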
The remainder of this section may be safely skipped by those using a PO file editor, yet it may be interesting for everybody to have a better idea of the precise format of a PO file. On the other hand, those wishing to modify PO files by hand should carefully continue reading on.

An empty untranslated-string is reserved to contain the header entry with the meta information (see Filling in the Header Entry). This header entry should be the first entry of the file. The empty untranslated-string is reserved for this purpose and must not be used anywhere else.

Each of untranslated-string and translated-string respects the C syntax for a character string, including the surrounding quotes and embedded backslashed escape sequences, except that universal character escape sequences (\u and \U) are not allowed. When the time comes to write multi-line strings, one should not use escaped newlines. Instead, a closing quote should follow the last character on the line to be continued, and an opening quote should resume the string at the beginning of the following PO file line. For example:

    msgid ""
    "Here is an example of how one might continue a very long string\n"
    "for the common case the string represents multi-line output.\n"

In this example, the empty string is used on the first line, to allow better alignment of the H from the word ‘Here’ over the f from the word ‘for’. In this example, the msgid keyword is followed by three strings, which are meant to be concatenated. Concatenating the empty string does not change the resulting overall string, but it is a way for us to comply with the necessity of msgid to be followed by a string on the same line, while keeping the multi-line presentation left-justified, as we find this to be a cleaner disposition. The empty string could have been omitted, but only if the string starting with ‘Here’ was promoted on the first line, right after msgid.
It was not really necessary either to switch between the two last quoted strings immediately after the newline ‘\n’; the switch could have occurred after any other character. We just did it this way because it is neater.

One should carefully distinguish between end of lines marked as ‘\n’ inside quotes, which are part of the represented string, and end of lines in the PO file itself, outside string quotes, which have no influence on the represented string.

Outside strings, white lines and comments may be used freely. Comments start at the beginning of a line with ‘#’ and extend until the end of the PO file line. Comments written by translators should have the initial ‘#’ immediately followed by some white space. If the ‘#’ is not immediately followed by white space, this comment is most likely generated and managed by specialized GNU tools, and might disappear or be replaced unexpectedly when the PO file is given to msgmerge.

For a PO file to be valid, no two entries without msgctxt may have the same untranslated-string or untranslated-string-singular. Similarly, no two entries may have the same msgctxt and the same untranslated-string or untranslated-string-singular.

4 Preparing Program Sources

For the programmer, changes to the C source code fall into three categories. First, you have to make the localization functions known to all modules needing message translation. Second, you should properly trigger the operation of GNU gettext when the program initializes, usually from the main function. Last, you should identify, adjust and mark all constant strings in your program needing translation.
Importing the gettext declaration Triggering gettext Operations Preparing Translatable Strings How Marks Appear in Sources Marking Translatable Strings Special Comments preceding Keywords Special Cases of Translatable Strings Letting Users Report Translation Bugs Marking Proper Names for Translation Preparing Library Sources Next: Triggering gettext Operations , Up: Preparing Program Sources [ Contents ][ Index ] 4.1 Importing the gettext declaration Presuming that your set of programs, or package, has been adjusted so all needed GNU gettext files are available, and your Makefile files are adjusted (see The Maintainer's View ), each C module having translated C strings should contain the line: #include <libintl.h> Similarly, each C module containing printf() / fprintf() /... calls with a format string that could be a translated C string (even if the C string comes from a different C module) should contain the line: #include <libintl.h> Next: Preparing Translatable Strings , Previous: Importing the gettext declaration , Up: Preparing Program Sources [ Contents ][ Index ] 4.2 Triggering gettext Operations The initialization of locale data should be done with more or less the same code in every program, as demonstrated below: int main (int argc, char *argv[]) { … setlocale (LC_ALL, ""); bindtextdomain (PACKAGE, LOCALEDIR); textdomain (PACKAGE); … } PACKAGE and LOCALEDIR should be provided either by config.h or by the Makefile. For now consult the gettext or hello sources for more information. The use of LC_ALL might not be appropriate for you. LC_ALL includes all locale categories and especially LC_CTYPE . This latter category is responsible for determining character classes with the isalnum etc. functions from ctype.h which could especially for programs, which process some kind of input language, be wrong. For example this would mean that a source code using the ç (c-cedilla character) is runnable in France but not in the U.S. 
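The initialization sequence described above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name init_i18n and the fallback values "hello" and "/usr/share/locale" are hypothetical, and a real package would take PACKAGE and LOCALEDIR from config.h or from the Makefile.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical fallbacks for illustration only; a real package gets
   these from config.h or from -D flags set by the Makefile.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/share/locale"
#endif

/* Perform the three initialization calls in the usual order.
   Returns 0 on success, -1 if one of the text domain calls failed.  */
int
init_i18n (void)
{
  /* A NULL result only means that the requested locale is not
     available; the program then keeps running in the "C" locale.  */
  setlocale (LC_ALL, "");

  /* Tell gettext where the message catalogs of PACKAGE are installed.  */
  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return -1;

  /* Make PACKAGE the default domain for later gettext calls.  */
  if (textdomain (PACKAGE) == NULL)
    return -1;

  return 0;
}
```

Such a helper would be called once, near the start of main, before the first translated message is produced.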
Some systems also have problems parsing numbers with the scanf functions when a locale other than the "C" locale is in effect. The standards say that formats in addition to the one known in the "C" locale might be recognized. But some systems seem to reject numbers in the "C" locale format. In some situations, it might also be a problem with the notation itself, which makes it impossible to recognize whether the number is in the "C" locale or the local format. This can happen if thousands separator characters are used. Some locales define this character, according to the national conventions, to be '.', which is the same character used in the "C" locale to denote the decimal point.

So it is sometimes necessary to replace the LC_ALL line in the code above by a sequence of setlocale lines

    {
      …
      setlocale (LC_CTYPE, "");
      setlocale (LC_MESSAGES, "");
      …
    }

On all POSIX conformant systems the locale categories LC_CTYPE, LC_MESSAGES, LC_COLLATE, LC_MONETARY, LC_NUMERIC, and LC_TIME are available. On some systems which are only ISO C compliant, LC_MESSAGES is missing, but a substitute for it is defined in GNU gettext's <libintl.h> and in GNU gnulib's <locale.h>.

Note that changing the LC_CTYPE also affects the functions declared in the <ctype.h> standard header and some functions declared in the <string.h> and <stdlib.h> standard headers. If this is not desirable in your application (for example in a compiler's parser), you can use a set of substitute functions which hardwire the C locale, such as found in the modules ‘c-ctype’, ‘c-strcase’, ‘c-strcasestr’, ‘c-strtod’, ‘c-strtold’ in the GNU gnulib source distribution.
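As a rough illustration of the two ideas above, the following sketch sets only the two categories needed for translated messages and shows what a character test that hardwires the C locale could look like. Both function names are hypothetical. ascii_isalnum is a hand-written stand-in, not the gnulib ‘c-ctype’ API, and it assumes an ASCII-compatible execution character set.

```c
#include <libintl.h>
#include <locale.h>

/* Enable only the categories needed for translated messages;
   LC_NUMERIC, LC_TIME and the others stay in the "C" locale.  */
void
init_locale_for_messages (void)
{
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
}

/* Hypothetical locale-independent replacement for isalnum: the result
   never depends on LC_CTYPE.  Assumes ASCII, where the letters of each
   case are contiguous.  */
int
ascii_isalnum (int c)
{
  return (c >= '0' && c <= '9')
         || (c >= 'a' && c <= 'z')
         || (c >= 'A' && c <= 'Z');
}
```

A parser for an input language would call ascii_isalnum instead of isalnum, so that a byte such as 0xE7 (ç in Latin-1) is classified the same way in every locale.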
It is also possible to switch the locale forth and back between the environment dependent locale and the C locale, but this approach is normally avoided because a setlocale call is expensive, because it is tedious to determine the places where a locale switch is needed in a large program's source, and because switching a locale is not multithread-safe. Next: How Marks Appear in Sources , Previous: Triggering gettext Operations , Up: Preparing Program Sources [ Contents ][ Index ] 4.3 Preparing Translatable Strings Before strings can be marked for translations, they sometimes need to be adjusted. Usually preparing a string for translation is done right before marking it, during the marking phase which is described in the next sections. What you have to keep in mind while doing that is the following. Decent English style. Entire sentences. Split at paragraphs. Use format strings instead of string concatenation. Use placeholders in format strings instead of embedded URLs. Use placeholders in format strings instead of programmer-defined format string directives. Avoid unusual markup and unusual control characters. Let's look at some examples of these guidelines. Decent English style Translatable strings should be in good English style. If slang language with abbreviations and shortcuts is used, often translators will not understand the message and will produce very inappropriate translations. "%s: is parameter\n" This is nearly untranslatable: Is the displayed item a parameter or the parameter? "No match" The ambiguity in this message makes it unintelligible: Is the program attempting to set something on fire? Does it mean "The given object does not match the template"? Does it mean "The template does not fit for any of the objects"? In both cases, adding more words to the message will help both the translator and the English speaking user. Entire sentences Translatable strings should be entire sentences. 
It is often not possible to translate single verbs or adjectives in a substitutable way. printf ("File %s is %s protected", filename, rw ? "write" : "read"); Most translators will not look at the source and will thus only see the string "File %s is %s protected" , which is unintelligible. Change this to printf (rw ? "File %s is write protected" : "File %s is read protected", filename); This way the translator will not only understand the message, she will also be able to find the appropriate grammatical construction. A French translator for example translates "write protected" like "protected against writing". Entire sentences are also important because in many languages, the declination of some word in a sentence depends on the gender or the number (singular/plural) of another part of the sentence. There are usually more interdependencies between words than in English. The consequence is that asking a translator to translate two half-sentences and then combining these two half-sentences through dumb string concatenation will not work, for many languages, even though it would work for English. That's why translators need to handle entire sentences. Often sentences don't fit into a single line. If a sentence is output using two subsequent printf statements, like this printf ("Locale charset \"%s\" is different from\n", lcharset); printf ("input file charset \"%s\".\n", fcharset); the translator would have to translate two half sentences, but nothing in the POT file would tell her that the two half sentences belong together. It is necessary to merge the two printf statements so that the translator can handle the entire sentence at once and decide at which place to insert a line break in the translation (if at all): printf ("Locale charset \"%s\" is different from\n\ input file charset \"%s\".\n", lcharset, fcharset); You may now ask: how about two or more adjacent sentences? 
Like in this case:

    puts ("Apollo 13 scenario: Stack overflow handling failed.");
    puts ("On the next stack overflow we will crash!!!");

Should these two statements be merged into a single one? I would recommend merging them if the two sentences are related to each other, because then it makes it easier for the translator to understand and translate both. On the other hand, if one of the two messages is a stereotypic one, occurring in other places as well, you will do a favour to the translator by not merging the two. (Identical messages occurring in several places are combined by xgettext, so the translator has to handle them once only.)

Split at paragraphs

Translatable strings should be limited to one paragraph; don't let a single message be longer than ten lines. The reason is that when the translatable string changes, the translator is faced with the task of updating the entire translated string. Maybe only a single word will have changed in the English string, but the translator doesn't see that (with the current translation tools), therefore she has to proofread the entire message.

Many GNU programs have a ‘--help’ output that extends over several screen pages. It is a courtesy towards the translators to split such a message into several ones of five to ten lines each. While doing that, you can also attempt to split the documented options into groups, such as the input options, the output options, and the informative output options. This will help every user to find the option he is looking for.
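The advice on ‘--help’ output might look as follows in code. This is a minimal sketch: the program name frob, its options, and the function print_help are all invented for illustration. Each option group is a separate translatable message, so a change to one group leaves the translations of the others untouched.

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Print the help text of a hypothetical program to STREAM.
   Returns 0 on success, -1 if writing to STREAM failed.  */
int
print_help (FILE *stream)
{
  /* The program name is passed through a placeholder.  */
  fprintf (stream, _("Usage: %s [OPTION]... FILE...\n"), "frob");
  fputs ("\n", stream);

  /* One message per option group, each only a few lines long.  */
  fputs (_("Input options:\n"
           "  -f, --from=FILE     read input from FILE\n"
           "  -e, --encoding=ENC  assume input encoding ENC\n"),
         stream);
  fputs ("\n", stream);

  fputs (_("Output options:\n"
           "  -o, --output=FILE   write output to FILE\n"),
         stream);
  fputs ("\n", stream);

  fputs (_("Informative output:\n"
           "  -h, --help          display this help and exit\n"
           "  -V, --version       output version information and exit\n"),
         stream);

  return ferror (stream) ? -1 : 0;
}
```

The blank lines between groups are written separately, so that they do not become part of any translatable string.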
No string concatenation Hardcoded string concatenation is sometimes used to construct English strings: strcpy (s, "Replace "); strcat (s, object1); strcat (s, " with "); strcat (s, object2); strcat (s, "?"); In order to present to the translator only entire sentences, and also because in some languages the translator might want to swap the order of object1 and object2 , it is necessary to change this to use a format string: sprintf (s, "Replace %s with %s?", object1, object2); A similar case is compile time concatenation of strings. The ISO C 99 include file <inttypes.h> contains a macro PRId64 that can be used as a formatting directive for outputting an ‘ int64_t ' integer through printf . It expands to a constant string, usually "d" or "ld" or "lld" or something like this, depending on the platform. Assume you have code like printf ("The amount is %0" PRId64 "\n", number); The gettext tools and library have special support for these <inttypes.h> macros. You can therefore simply write printf (gettext ("The amount is %0" PRId64 "\n"), number); The PO file will contain the string "The amount is %0<PRId64>\n". The translators will provide a translation containing "%0<PRId64>" as well, and at runtime the gettext function's result will contain the appropriate constant string, "d" or "ld" or "lld". This works only for the predefined <inttypes.h> macros. If you have defined your own similar macros, let's say ‘ MYPRId64 ', that are not known to xgettext , the solution for this problem is to change the code like this: char buf1[100]; sprintf (buf1, "%0" MYPRId64, number); printf (gettext ("The amount is %s\n"), buf1); This means, you put the platform dependent code in one statement, and the internationalization code in a different statement. Note that a buffer length of 100 is safe, because all available hardware integer types are limited to 128 bits, and to print a 128 bit integer one needs at most 54 characters, regardless whether in decimal, octal or hexadecimal. 
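The two-statement approach for macros unknown to xgettext can be sketched as a self-contained function. The names format_amount and MYPRId64 are hypothetical, and MYPRId64 is simply defined in terms of PRId64 here. The sketch uses snprintf rather than sprintf, a deliberate deviation from the text above, so that both buffers are bounded.

```c
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical project-specific macro that xgettext does not know.  */
#define MYPRId64 PRId64

/* Format NUMBER into OUT, a buffer of OUTSIZE bytes.
   Returns the value of the final snprintf call.  */
int
format_amount (char *out, size_t outsize, int64_t number)
{
  char buf1[100];

  /* Platform dependent step: not translated.  */
  snprintf (buf1, sizeof buf1, "%0" MYPRId64, number);

  /* Internationalized step: translators only ever see "%s".  */
  return snprintf (out, outsize, gettext ("The amount is %s\n"), buf1);
}
```

A caller would pass a buffer of its own, for example format_amount (line, sizeof line, total), and then print line.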
All this applies to other programming languages as well. For example, in Java and C#, string concatenation is very frequently used, because it is a compiler built-in operator. Like in C, in Java, you would change System.out.println("Replace "+object1+" with "+object2+"?"); into a statement involving a format string: System.out.println( MessageFormat.format("Replace {0} with {1}?", new Object[] { object1, object2 })); Similarly, in C#, you would change Console.WriteLine("Replace "+object1+" with "+object2+"?"); into a statement involving a format string: Console.WriteLine( String.Format("Replace {0} with {1}?", object1, object2)); No embedded URLs It is good to not embed URLs in translatable strings, for several reasons: It avoids possible mistakes during copy and paste. Translators cannot translate the URLs or, by mistake, use the URLs from other packages that are present in their compendium. When the URLs change, translators don't need to revisit the translation of the string. The same holds for email addresses. So, you would change fputs (_("GNU GPL version 3 <https://gnu.org/licenses/gpl.html>\n"), stream); to fprintf (stream, _("GNU GPL version 3 <%s>\n"), "https://gnu.org/licenses/gpl.html"); No programmer-defined format string directives The GNU C Library's <printf.h> facility and the C++ standard library's <format> header file make it possible for the programmer to define their own format string directives. However, such format directives cannot be used in translatable strings, for two reasons: There is no reference documentation for format strings with such directives, that the translators could consult. They would therefore have to guess where the directive starts and where it ends. An ‘ msgfmt -c ' invocation cannot check whether the translator has produced a compatible translation of the format string. 
As a consequence, when a format string contains a programmer-defined directive, the program may crash at runtime when it uses the translated format string. To avoid this situation, you need to move the formatting with the custom directive into a format string that does not get translated. For example, assuming code that makes use of a %r directive: fprintf (stream, _("The contents is: %r"), data); you would rewrite it to: char *tmp; if (asprintf (&tmp, "%r", data) < 0) error (...); fprintf (stream, _("The contents is: %s"), tmp); free (tmp); Similarly, in C++, assuming you have defined a custom formatter for the type of data , the code cout << format (_("The contents is: {:#$#}"), data); should be rewritten to: string tmp = format ("{:#$#}", data); cout << format (_("The contents is: {}"), tmp); No unusual markup Unusual markup or control characters should not be used in translatable strings. Translators will likely not understand the particular meaning of the markup or control characters. For example, if you have a convention that ‘ | ' delimits the left-hand and right-hand part of some GUI elements, translators will often not understand it without specific comments. It might be better to have the translator translate the left-hand and right-hand part separately. Another example is the ‘ argp ' convention to use a single ‘ \v ' (vertical tab) control character to delimit two sections inside a string. This is flawed. Some translators may convert it to a simple newline, some to blank lines. With some PO file editors it may not be easy to even enter a vertical tab control character. So, you cannot be sure that the translation will contain a ‘ \v ' character, at the corresponding position. The solution is, again, to let the translator translate two separate strings and combine at run-time the two translated strings with the ‘ \v ' required by the convention. HTML markup, however, is common enough that it's probably ok to use in translatable strings. 
But please bear in mind that the GNU gettext tools don't verify that the translations are well-formed HTML. Next: Marking Translatable Strings , Previous: Preparing Translatable Strings , Up: Preparing Program Sources [ Contents ][ Index ] 4.4 How Marks Appear in Sources All strings requiring translation should be marked in the C sources. Marking is done in such a way that each translatable string appears to be the sole argument of some function or preprocessor macro. There are only a few such possible functions or macros meant for translation, and their names are said to be marking keywords. The marking is attached to strings themselves, rather than to what we do with them. This approach has more uses. A blatant example is an error message produced by formatting. The format string needs translation, as well as some strings inserted through some ‘ %s ' specification in the format, while the result from sprintf may have so many different instances that it is impractical to list them all in some ‘ error_string_out() ' routine, say. This marking operation has two goals. The first goal of marking is for triggering the retrieval of the translation, at run time. The keyword is possibly resolved into a routine able to dynamically return the proper translation, as far as possible or wanted, for the argument string. Most localizable strings are found in executable positions, that is, attached to variables or given as parameters to functions. But this is not universal usage, and some translatable strings appear in structured initializations. See Special Cases of Translatable Strings . The second goal of the marking operation is to help xgettext at properly extracting all translatable strings when it scans a set of program sources and produces PO file templates. The canonical keyword for marking translatable strings is ‘ gettext ', it gave its name to the whole GNU gettext package. 
For packages making only light use of the ‘ gettext ' keyword, macro or function, it is easily used as is . However, for packages using the gettext interface more heavily, it is usually more convenient to give the main keyword a shorter, less obtrusive name. Indeed, the keyword might appear on a lot of strings all over the package, and programmers usually do not want nor need their program sources to remind them forcefully, all the time, that they are internationalized. Further, a long keyword has the disadvantage of using more horizontal space, forcing more indentation work on sources for those trying to keep them within 79 or 80 columns. Many packages use ‘ _ ' (a simple underline) as a keyword, and write ‘ _("Translatable string") ' instead of ‘ gettext ("Translatable string") '. Further, the coding rule, from GNU standards, wanting that there is a space between the keyword and the opening parenthesis is relaxed, in practice, for this particular usage. So, the textual overhead per translatable string is reduced to only three characters: the underline and the two parentheses. However, even if GNU gettext uses this convention internally, it does not offer it officially. The real, genuine keyword is truly ‘ gettext ' indeed. It is fairly easy for those wanting to use ‘ _ ' instead of ‘ gettext ' to declare: #include <libintl.h> #define _(String) gettext (String) instead of merely using ‘ #include <libintl.h> '. The marking keywords ‘ gettext ' and ‘ _ ' take the translatable string as sole argument. It is also possible to define marking functions that take it at another argument position. It is even possible to make the marked argument position depend on the total number of arguments of the function call; this is useful in C++. All this is achieved using xgettext 's ‘ --keyword ' option. How to pass such an option to xgettext , assuming that gettextize is used, is described in Makevars in po/ and AM_XGETTEXT_OPTION in po.m4 . 
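To make the keyword discussion concrete, here is a small sketch that combines the ‘_’ shorthand with a marking function taking the translatable string at its second argument position. The function log_message is hypothetical. For xgettext to extract its second argument, an option such as --keyword=log_message:2 would be needed, alongside --keyword=_ for the shorthand.

```c
#include <libintl.h>
#include <stdio.h>

/* Short marking keyword; it also triggers the translation at run time.  */
#define _(String) gettext (String)

/* Hypothetical helper whose second argument is the translatable string.
   It translates MSGID itself, so callers pass the plain English string
   and rely on xgettext having been told about this keyword.  */
int
log_message (FILE *stream, const char *msgid)
{
  return fputs (gettext (msgid), stream);
}

/* Example of the short keyword in use.  */
const char *
greeting (void)
{
  return _("Translatable string");
}
```

A call such as log_message (stderr, "Disk is full\n") then needs no ‘_()’ around the string, because the keyword option already marks that argument for extraction.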
Note also that long strings can be split across lines, into multiple adjacent string tokens. Automatic string concatenation is performed at compile time according to ISO C and ISO C++; xgettext also supports this syntax.

In C++, marking a C++ format string requires a small code change, because the first argument to std::format must be a constant expression. For example,

    std::format ("{} {}!", "Hello", "world")

needs to be changed to

    std::vformat (gettext ("{} {}!"), std::make_format_args("Hello", "world"))

Later on, the maintenance is relatively easy. If, as a programmer, you add or modify a string, you will have to ask yourself if the new or altered string requires translation, and include it within ‘_()’ if you think it should be translated. For example, ‘"%s"’ is an example of a string not requiring translation. But ‘"%s: %d"’ does require translation, because in French, unlike in English, it's customary to put a space before a colon.

4.5 Marking Translatable Strings

In PO mode, one set of features is meant more for the programmer than for the translator, and allows him to interactively mark which strings, in a set of program sources, are translatable, and which are not. Even if it is a fairly easy job for a programmer to find and mark such strings by other means, using any editor of his choice, PO mode makes this work more comfortable. Further, this gives translators who feel a little like programmers, or programmers who feel a little like translators, a tool letting them work at marking translatable strings in the program sources, while simultaneously producing a set of translations in some language, for the package being internationalized.

The set of program sources, targeted by the PO mode commands described here, should have an Emacs tags table constructed for your project, prior to using these PO file commands.
This is easy to do. In any shell window, change the directory to the root of your project, then execute a command resembling:

    etags src/*.[hc] lib/*.[hc]

presuming here you want to process all .h and .c files from the src/ and lib/ directories. This command will explore all said files and create a TAGS file in your root directory, somewhat summarizing the contents using a special file format Emacs can understand.

For packages following the GNU coding standards, there is a make goal tags or TAGS which constructs the tag files in all directories and for all files containing source code.

Once your TAGS file is ready, the following commands assist the programmer in marking translatable strings in his set of sources. But these commands are necessarily driven from within a PO file window, and it is likely that you do not even have such a PO file yet. This is not a problem at all, as you may safely open a new, empty PO file, mainly for using these commands. This empty PO file will slowly fill in while you mark strings as translatable in your program sources.

,
    Search through program sources for a string which looks like a candidate for translation (po-tags-search).

M-,
    Mark the last string found with ‘_()’ (po-mark-translatable).

M-.
    Mark the last string found with a keyword taken from a set of possible keywords. This command with a prefix allows some management of these keywords (po-select-mark-and-mark).

The , (po-tags-search) command searches for the next occurrence of a string which looks like a possible candidate for translation, and displays the program source in another Emacs window, positioned in such a way that the string is near the top of this other window. If the string is too big to fit whole in this window, it is positioned so only its end is shown. In any case, the cursor is left in the PO file window. If the shown string would be better presented differently in different native languages, you may mark it using M-, or M-. .
Otherwise, you might rather ignore it and skip to the next string by merely repeating the , command.

A string is a good candidate for translation if it contains a sequence of three or more letters. A string containing at most two letters in a row will be considered as a candidate if it has more letters than non-letters. The command disregards strings containing no letters, or isolated letters only. It also disregards strings within comments, or strings already marked with some keyword PO mode knows (see below).

If you have never told Emacs about some TAGS file to use, the command will request that you specify one from the minibuffer, the first time you use the command. You may later change your TAGS file by using the regular Emacs command M-x visit-tags-table, which will ask you to name the precise TAGS file you want to use. See Tag Tables in The Emacs Editor.

Each time you use the , command, the search resumes from where it was left by the previous search, and goes through all program sources, obeying the TAGS file, until all sources have been processed. However, by giving a prefix argument to the command (C-u ,), you may request that the search be restarted all over again from the first program source; but in this case, strings that you recently marked as translatable will be automatically skipped.

Using this , command does not prevent the use of other regular Emacs tags commands. For example, regular tags-search or tags-query-replace commands may be used without disrupting the independent , search sequence. However, as implemented, the initial , command (or the , command used with a prefix) might also reinitialize the regular Emacs tags searching to the first tags file; this reinitialization might be considered spurious.

The M-, (po-mark-translatable) command will mark the recently found string with the ‘_’ keyword. The M-.
( po-select-mark-and-mark ) command will request that you type one keyword from the minibuffer and use that keyword for marking the string. Both commands will automatically create a new PO file untranslated entry for the string being marked, and make it the current entry (making it easy for you to immediately proceed to its translation, if you feel like doing it right away). It is possible that the modifications made to the program source by M-, or M-. render some source line longer than 80 columns, forcing you to break and re-indent this line differently. You may use the O command from PO mode, or any other window changing command from Emacs, to break out into the program source window, and do any needed adjustments. You will have to use some regular Emacs command to return the cursor to the PO file window, if you want command , for the next string, say. The M-. command has a few built-in speedups, so you do not have to explicitly type all keywords all the time. The first such speedup is that you are presented with a preferred keyword, which you may accept by merely typing RET at the prompt. The second speedup is that you may type any non-ambiguous prefix of the keyword you really mean, and the command will complete it automatically for you. This also means that PO mode has to know all your possible keywords, and that it will not accept mistyped keywords. If you reply ? to the keyword request, the command gives a list of all known keywords, from which you may choose. When the command is prefixed by an argument ( C-u M-. ) , it inhibits updating any program source or PO file buffer, and does some simple keyword management instead. In this case, the command asks for a keyword, written in full, which becomes a new allowed keyword for later M-. commands. Moreover, this new keyword automatically becomes the preferred keyword for later commands. By typing an already known keyword in response to C-u M-. , one merely changes the preferred keyword and does nothing more. 
All keywords known for M-. are recognized by the , command when scanning for strings, and strings already marked by any of those known keywords are automatically skipped. If many PO files are opened simultaneously, each one has its own independent set of known keywords. There is no provision in PO mode, currently, for deleting a known keyword; you have to quit the file (maybe using q) and reopen it afresh. When a PO file is newly brought up in an Emacs window, only ‘gettext’ and ‘_’ are known as keywords, and ‘gettext’ is preferred for the M-. command. In fact, it would not be useful to prefer ‘_’, as this one is already built into the M-, command.

4.6 Special Comments preceding Keywords

In C programs strings are often used within calls of functions from the printf family. The special thing about these format strings is that they can contain format specifiers introduced with %. Assume we have the code

    printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));

A possible German translation for the above string might be:

    "%d Zeichen lang ist die Zeichenkette `%s'"

A C programmer, even if he cannot speak German, will recognize that there is something wrong here. The order of the two format specifiers is changed, but of course the arguments in the printf call have not. This will most probably lead to problems because now the length of the string is regarded as the address.

To prevent errors at runtime caused by translations, the msgfmt tool can check statically whether the arguments in the original and the translation string match in type and number. If this is not the case and the ‘-c’ option has been passed to msgfmt, msgfmt will give an error and refuse to produce a MO file. Thus consistent use of ‘msgfmt -c’ will catch the error, so that it cannot cause problems at runtime.
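A translation can change the order of the directives safely by using the POSIX numbered-argument notation, in which ‘%1$s’ refers to the first argument and ‘%2$d’ to the second. The following sketch assumes a printf implementation with POSIX positional directives. The function describe_string is hypothetical, and it receives the format as a parameter to stand in for the result of a gettext call.

```c
#include <stdio.h>
#include <string.h>

/* Write a description of S into OUT, a buffer of OUTSIZE bytes.
   FORMAT plays the role of the (possibly translated) format string:
   it must consume a string first and an int second, either in that
   order or through the positional directives %1$s and %2$d.  */
int
describe_string (char *out, size_t outsize, const char *format,
                 const char *s)
{
  /* The argument order is fixed in the source code; only the format
     string decides where each argument appears in the output.  */
  return snprintf (out, outsize, format, s, (int) strlen (s));
}
```

With the English format "String `%s' has %d characters\n" the arguments are used in source order. With "%2$d Zeichen lang ist die Zeichenkette `%1$s'" the same call prints the length first and the string second.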
If the word order in the above German translation would be correct one would have to write "%2$d Zeichen lang ist die Zeichenkette `%1$s'" The routines in msgfmt know about this special notation. Because not all strings in a program will be format strings, it is not useful for msgfmt to test all the strings in the .po file. This might cause problems because the string might contain what looks like a format specifier, but the string is not used in printf . Therefore xgettext adds a special tag to those messages it thinks might be a format string. There is no absolute rule for this, only a heuristic. In the .po file the entry is marked using the c-format flag in the #, comment line (see The Format of PO Files ). The careful reader now might say that this again can cause problems. The heuristic might guess it wrong. This is true and therefore xgettext knows about a special kind of comment which lets the programmer take over the decision. If in the same line as or the immediately preceding line to the gettext keyword the xgettext program finds a comment containing the words xgettext:c-format , it will mark the string in any case with the c-format flag. This kind of comment should be used when xgettext does not recognize the string as a format string but it really is one and it should be tested. Please note that when the comment is in the same line as the gettext keyword, it must be before the string to be translated. Also note that a comment such as xgettext:c-format applies only to the first string in the same or the next line, not to multiple strings. This situation happens quite often. The printf function is often called with strings which do not contain a format specifier. Of course one would normally use fputs but it does happen. In this case xgettext does not recognize this as a format string but what happens if the translation introduces a valid format specifier? 
The printf function will try to access one of the parameters but none exists because the original code does not pass any parameters.

xgettext of course could make a wrong decision the other way round, i.e. a string marked as a format string actually is not a format string. In this case msgfmt might give too many warnings and would prevent translating the .po file. The method to prevent this wrong decision is similar to the one used above, only the comment to use must contain the string xgettext:no-c-format.

If a string is marked with c-format and this is not correct, the user can find out who is responsible for the decision. See Invoking the xgettext Program to see how the --debug option can be used for solving this problem.

4.7 Special Cases of Translatable Strings

The attentive reader might now point out that it is not always possible to mark translatable strings with gettext or something like this. Consider the following case:

    {
      static const char *messages[] = {
        "some very meaningful message",
        "and another one"
      };
      const char *string;
      …
      string
        = index > 1 ? "a default message" : messages[index];

      fputs (string, stdout);
      …
    }

While it is no problem to mark the string "a default message", it is not possible to mark the string initializers for messages. What is to be done? We have to fulfill two tasks. First we have to mark the strings so that the xgettext program (see Invoking the xgettext Program) can find them, and second we have to translate the strings at runtime before printing them.

The first task can be fulfilled by creating a new keyword, which names a no-op. For the second we have to mark all access points to a string from the array.
So one solution can look like this:

    #define gettext_noop(String) String

    {
      static const char *messages[] = {
        gettext_noop ("some very meaningful message"),
        gettext_noop ("and another one")
      };
      const char *string;
      …
      string
        = index > 1 ? gettext ("a default message") : gettext (messages[index]);

      fputs (string, stdout);
      …
    }

Please convince yourself that the string which is written by fputs is translated in any case. How to make xgettext aware of the additional keyword gettext_noop is explained in Invoking the xgettext Program.

The above is of course not the only solution. You could also come up with the following one:

    #define gettext_noop(String) String

    {
      static const char *messages[] = {
        gettext_noop ("some very meaningful message"),
        gettext_noop ("and another one")
      };
      const char *string;
      …
      string
        = index > 1 ? gettext_noop ("a default message") : messages[index];

      fputs (gettext (string), stdout);
      …
    }

But this has a drawback. The programmer has to take care to use gettext_noop for the string "a default message". Using gettext there could in rare cases have unpredictable results.

One advantage is that you need not perform control flow analysis to make sure the output is really translated in any case. But this analysis is generally not very difficult. If it ever is, you can use this second method.

4.8 Letting Users Report Translation Bugs

Code sometimes has bugs, but translations sometimes have bugs too. The users need to be able to report them. Reporting translation bugs to the programmer or maintainer of a package is not very useful, since the maintainer must never change a translation, except on behalf of the translator. Hence the translation bugs must be reported to the translators.
Here is a way to organize this so that the maintainer does not need to forward translation bug reports, nor even keep a list of the addresses of the translators or their translation teams.

Every program has a place where it shows the bug report address. For GNU programs, it is the code which handles the “--help” option, typically in a function called “usage”. In this place, instruct the translator to add her own bug reporting address. For example, if that code has a statement

    printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);

you can add some translator instructions like this:

    /* TRANSLATORS: The placeholder indicates the bug-reporting address
       for this package.  Please add _another line_ saying
       "Report translation bugs to <...>\n" with the address for translation
       bugs (typically your translation team's web or email address).  */
    printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);

These will be extracted by ‘xgettext’, leading to a .pot file that contains this:

    #. TRANSLATORS: The placeholder indicates the bug-reporting address
    #. for this package.  Please add _another line_ saying
    #. "Report translation bugs to <...>\n" with the address for translation
    #. bugs (typically your translation team's web or email address).
    #: src/hello.c:178
    #, c-format
    msgid "Report bugs to <%s>.\n"
    msgstr ""

4.9 Marking Proper Names for Translation

Should names of persons, cities, locations etc. be marked for translation or not? People who only know languages that can be written with Latin letters (English, Spanish, French, German, etc.) are tempted to say “no”, because names usually do not change when transported between these languages. However, in general when translating from one script to another, names are translated too, usually phonetically or by transliteration.
For example, Russian or Greek names are converted to the Latin alphabet when being translated to English, and English or French names are converted to the Katakana script when being translated to Japanese. This is necessary because the speakers of the target language in general cannot read the script the name is originally written in.

As a programmer, you should therefore make sure that names are marked for translation, with a special comment telling the translators that it is a proper name and how to pronounce it. In its simple form, it looks like this:

    printf (_("Written by %s.\n"),
            /* TRANSLATORS: This is a proper name.  See the gettext
               manual, section Names.  Note this is actually a non-ASCII
               name: The first name is (with Unicode escapes)
               "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
               Pronunciation is like "fraa-swa pee-nar".  */
            _("Francois Pinard"));

The GNU gnulib library offers a module ‘propername’ (https://www.gnu.org/software/gnulib/MODULES.html#module=propername) which automatically appends the original name, in parentheses, to the translated name. For names that cannot be written in ASCII, it also frees the translator from the task of entering the appropriate non-ASCII characters if no script change is needed. In this more comfortable form, it looks like this:

    printf (_("Written by %s and %s.\n"),
            proper_name ("Ulrich Drepper"),
            /* TRANSLATORS: This is a proper name.  See the gettext
               manual, section Names.  Note this is actually a non-ASCII
               name: The first name is (with Unicode escapes)
               "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
               Pronunciation is like "fraa-swa pee-nar".  */
            proper_name_utf8 ("Francois Pinard", "Fran\303\247ois Pinard"));

You can also write the original name directly in Unicode (rather than with Unicode escapes or HTML entities) and denote the pronunciation using the International Phonetic Alphabet (see https://en.wikipedia.org/wiki/International_Phonetic_Alphabet).
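To make the "original name in parentheses" behaviour concrete, here is a minimal sketch of the string handling involved. It is not the gnulib implementation: name_with_original is a hypothetical helper invented for this illustration, and it assumes the caller has already obtained the translated name (for instance from gettext) and supplies a buffer that is large enough.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT part of gnulib or libintl.
   Combines a translated name with its original spelling and writes the
   result into BUF (at most BUFSIZE bytes, always NUL-terminated).  */
static const char *
name_with_original (const char *translation, const char *original,
                    char *buf, size_t bufsize)
{
  if (strstr (translation, original) != NULL)
    /* Either the name was left untranslated, or the translator already
       included the original spelling: use the translation unchanged.  */
    snprintf (buf, bufsize, "%s", translation);
  else
    /* Otherwise append the original spelling in parentheses.  */
    snprintf (buf, bufsize, "%s (%s)", translation, original);
  return buf;
}
```

With this sketch, a transliterated name such as "Fransua Pinar" for the original "Francois Pinard" is shown as "Fransua Pinar (Francois Pinard)", while an untranslated name is shown once, without parentheses. The real module may apply further checks that this sketch does not attempt.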
As a translator, you should use some care when translating names, because it is frustrating if people see their names mutilated or distorted.

If your language uses the Latin script, all you need to do is to reproduce the name as perfectly as you can within the usual character set of your language. In this particular case, this means to provide a translation containing the c-cedilla character. If your language uses a different script and the people speaking it don't usually read Latin words, it means transliteration. If the programmer used the simple case, you should still give, in parentheses, the original writing of the name, for the sake of the people that do read the Latin script. If the programmer used the ‘propername’ module mentioned above, you don't need to give the original writing of the name in parentheses, because the program will already do so. Here is an example, using Greek as the target script:

    #. This is a proper name.  See the gettext
    #. manual, section Names.  Note this is actually a non-ASCII
    #. name: The first name is (with Unicode escapes)
    #. "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
    #. Pronunciation is like "fraa-swa pee-nar".
    msgid "Francois Pinard"
    msgstr "φρασοα πιναρ"
           " (Francois Pinard)"

Because translation of names is such a sensitive domain, it is a good idea to test your translation before submitting it.

4.10 Preparing Library Sources

When you are preparing a library, not a program, for the use of gettext, only a few details are different. Here we assume that the library has a translation domain and a POT file of its own. (If it uses the translation domain and POT file of the main program, then the previous sections apply without changes.)

The library code doesn't call setlocale (LC_ALL, ""). It's the responsibility of the main program to set the locale.
The library's documentation should mention this fact, so that developers of programs using the library are aware of it.

The library code doesn't call textdomain (PACKAGE), because it would interfere with the text domain set by the main program.

The initialization code for a program was

    setlocale (LC_ALL, "");
    bindtextdomain (PACKAGE, LOCALEDIR);
    textdomain (PACKAGE);

For a library it is reduced to

    bindtextdomain (PACKAGE, LOCALEDIR);

If your library's API doesn't already have an initialization function, you need to create one, containing at least the bindtextdomain invocation. However, you usually don't need to export and document this initialization function: It is sufficient that all entry points of the library call the initialization function if it hasn't been called before. The typical idiom used to achieve this is a static boolean variable that indicates whether the initialization function has been called. If the library is meant to be used in multithreaded applications, this variable needs to be marked volatile, so that its value gets propagated between threads. Like this:

    static volatile bool libfoo_initialized;

    static void
    libfoo_initialize (void)
    {
      bindtextdomain (PACKAGE, LOCALEDIR);
      libfoo_initialized = true;
    }

    /* This function is part of the exported API.  */
    struct foo *
    create_foo (...)
    {
      /* Must ensure the initialization is performed.  */
      if (!libfoo_initialized)
        libfoo_initialize ();
      ...
    }

    /* This function is part of the exported API.  The argument must be
       non-NULL and have been created through create_foo().  */
    int
    foo_refcount (struct foo *argument)
    {
      /* No need to invoke the initialization function here, because
         create_foo() must already have been called before.  */
      ...
    }

The more general solution for initialization functions, POSIX pthread_once, is not needed in this case.

The usual declaration of the ‘_’ macro in each source file was

    #include <libintl.h>
    #define _(String) gettext (String)

for a program.
For a library, which has its own translation domain, it reads like this:

    #include <libintl.h>
    #define _(String) dgettext (PACKAGE, String)

In other words, dgettext is used instead of gettext. Similarly, the dngettext function should be used in place of the ngettext function.

5 Making the PO Template File

After preparing the sources, the programmer creates a PO template file. This section explains how to use xgettext for this purpose.

xgettext creates a file named domainname.po. You should then rename it to domainname.pot. (Why doesn't xgettext create it under the name domainname.pot right away? The answer is: for historical reasons. When xgettext was specified, the distinction between a PO file and PO file template was fuzzy, and the suffix ‘.pot’ wasn't in use at that time.)

5.1 Invoking the xgettext Program

    xgettext [option] [inputfile] …

The xgettext program extracts translatable strings from given input files.

5.1.1 Input file location

‘inputfile …’
    Input files.

‘-f file’
‘--files-from=file’
    Read the names of the input files from file instead of getting them from the command line.

‘-D directory’
‘--directory=directory’
    Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.

If inputfile is ‘-’, standard input is read.

5.1.2 Output file location

‘-d name’
‘--default-domain=name’
    Use name.po for output (instead of messages.po).

‘-o file’
‘--output=file’
    Write output to specified file (instead of name.po or messages.po).
‘-p dir’
‘--output-dir=dir’
    Output files will be placed in directory dir.

If the output file is ‘-’ or ‘/dev/stdout’, the output is written to standard output.

5.1.3 Choice of input file language

‘-L name’
‘--language=name’
    Specifies the language of the input files. The supported languages are C, C++, ObjectiveC, PO, Shell, Python, Lisp, EmacsLisp, librep, Scheme, Smalltalk, Java, JavaProperties, C#, awk, YCP, Tcl, Perl, PHP, Ruby, GCC-source, NXStringTable, RST, RSJ, Glade, Lua, JavaScript, Vala, GSettings, Desktop.

‘-C’
‘--c++’
    This is a shorthand for --language=C++.

By default the language is guessed depending on the input file name extension.

5.1.4 Input file interpretation

‘--from-code=name’
    Specifies the encoding of the input files. This option is needed only if some untranslated message strings or their corresponding comments contain non-ASCII characters. Note that Tcl and Glade input files are always assumed to be in UTF-8, regardless of this option.

By default the input files are assumed to be in ASCII.

5.1.5 Operation mode

‘-j’
‘--join-existing’
    Join messages with existing file.

‘-x file’
‘--exclude-file=file’
    Entries from file are not extracted. file should be a PO or POT file.

‘-c[tag]’
‘--add-comments[=tag]’
    Place comment blocks starting with tag and preceding keyword lines in the output file. Without a tag, the option means to put all comment blocks preceding keyword lines in the output file.

    Note that comment blocks are only extracted if there is no program code between the comment and the string that gets extracted. For example, in the following C source code:

        /* This is the first comment.  */
        gettext ("foo");

        /* This is the second comment: not extracted */
        gettext (
          "bar");

        gettext (
          /* This is the third comment.  */
          "baz");

        /* This is the fourth comment.  */

        gettext ("I love blank lines in my programs");

    the second comment line will not be extracted, because there is a line with some tokens between the comment line and the line that contains the string. But the fourth comment is extracted, because between it and the line with the string there is merely a blank line.

‘--check[=CHECK]’
    Perform a syntax check on msgid and msgid_plural. The supported checks are:

    ‘ellipsis-unicode’
        Prefer Unicode ellipsis character over ASCII ...

    ‘space-ellipsis’
        Prohibit whitespace before an ellipsis character

    ‘quote-unicode’
        Prefer Unicode quotation marks over ASCII "'`

    ‘bullet-unicode’
        Prefer Unicode bullet character over ASCII * or -

    The option has an effect on all input files. To enable or disable checks for a certain string, you can mark it with an xgettext: special comment in the source file. For example, if you specify the --check=space-ellipsis option, but want to suppress the check on a particular string, add the following comment:

        /* xgettext: no-space-ellipsis-check */
        gettext ("We really want a space before ellipsis here ...");

    The xgettext: comment can be followed by flags separated with a comma. The possible flags are of the form ‘[no-]name-check’, where name is the name of a valid syntax check. If a flag is prefixed by no-, the meaning is negated.

    Some tests apply the checks to each sentence within the msgid, rather than the whole string. xgettext detects the end of sentence by performing a pattern match, which usually looks for a period followed by a certain number of spaces. The number is specified with the --sentence-end option.

‘--sentence-end[=TYPE]’
    The supported values are:

    ‘single-space’
        Expect at least one whitespace after a period

    ‘double-space’
        Expect at least two whitespaces after a period

5.1.6 Language specific options

‘-a’
‘--extract-all’
    Extract all strings.
    This option has an effect with most languages, namely C, C++, ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Java, C#, awk, Tcl, Perl, PHP, GCC-source, Glade, Lua, JavaScript, Vala, GSettings.

‘-k[keywordspec]’
‘--keyword[=keywordspec]’
    Specify keywordspec as an additional keyword to be looked for. Without a keywordspec, the option means to not use default keywords.

    If keywordspec is a C identifier id, xgettext looks for strings in the first argument of each call to the function or macro id. If keywordspec is of the form ‘id:argnum’, xgettext looks for strings in the argnum-th argument of the call. If keywordspec is of the form ‘id:argnum1,argnum2’, xgettext looks for strings in the argnum1-th argument and in the argnum2-th argument of the call, and treats them as singular/plural variants for a message with plural handling. Also, if keywordspec is of the form ‘id:contextargnumc,argnum’ or ‘id:argnum,contextargnumc’, xgettext treats strings in the contextargnum-th argument as a context specifier. And, as a special-purpose support for GNOME, if keywordspec is of the form ‘id:argnumg’, xgettext recognizes the argnum-th argument as a string with context, using the GNOME glib syntax ‘"msgctxt|msgid"’.

    Furthermore, if keywordspec is of the form ‘id:…,totalnumargst’, xgettext recognizes this argument specification only if the number of actual arguments is equal to totalnumargs. This is useful for disambiguating overloaded function calls in C++.

    Finally, if keywordspec is of the form ‘id:argnum...,"xcomment"’, xgettext, when extracting a message from the specified argument strings, adds an extracted comment xcomment to the message. Note that when used through a normal shell command line, the double-quotes around the xcomment need to be escaped.
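The keywordspec forms described above can be illustrated with a few project-specific wrappers. This is only a sketch: the names N_, log_text, count_text, ctx_lookup and C_ are hypothetical and chosen for this example, and the comments show which --keyword option would be passed to xgettext for each of them. The context macro assumes the convention that a message with a context is looked up under the key formed by the context, the byte \004, and the msgid.

```c
#include <libintl.h>

/* No-op marker for strings in initializers (first argument).
   Extract with:  xgettext --keyword=N_  */
#define N_(String) String

/* Hypothetical wrapper whose message is the 2nd argument.
   Extract with:  xgettext --keyword=log_text:2  */
static const char *
log_text (int level, const char *msgid)
{
  (void) level;                 /* unused in this sketch */
  return gettext (msgid);
}

/* Hypothetical singular/plural wrapper (arguments 1 and 2).
   Extract with:  xgettext --keyword=count_text:1,2  */
static const char *
count_text (const char *singular, const char *plural, unsigned long n)
{
  return ngettext (singular, plural, n);
}

/* Hypothetical context lookup.  KEY is "context\004msgid"; when no
   translation is found, gettext returns KEY itself, and the plain
   MSGID is used instead.  */
static const char *
ctx_lookup (const char *key, const char *msgid)
{
  const char *translation = gettext (key);
  return translation == key ? msgid : translation;
}

/* Context in argument 1, message in argument 2 (string literals only).
   Extract with:  xgettext --keyword=C_:1c,2  */
#define C_(Ctx, Msgid) ctx_lookup (Ctx "\004" Msgid, Msgid)
```

A call such as C_("menu", "Open") would then be extracted with the context "menu" and the msgid "Open", while log_text (1, N_("disk full")) contributes the msgid "disk full". Without an installed catalog, all of these wrappers simply return the untranslated string.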
    This option has an effect with most languages, namely C, C++, ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Java, C#, awk, Tcl, Perl, PHP, GCC-source, Glade, Lua, JavaScript, Vala, GSettings, Desktop.

    The default keyword specifications, which are always looked for if not explicitly disabled, are language dependent. They are:

    - For C, C++, and GCC-source: gettext, dgettext:2, dcgettext:2, ngettext:1,2, dngettext:2,3, dcngettext:2,3, gettext_noop, and pgettext:1c,2, dpgettext:2c,3, dcpgettext:2c,3, npgettext:1c,2,3, dnpgettext:2c,3,4, dcnpgettext:2c,3,4.
    - For Objective C: Like for C, and also NSLocalizedString, _, NSLocalizedStaticString, __.
    - For Shell scripts: gettext, ngettext:1,2, eval_gettext, eval_ngettext:1,2, eval_pgettext:1c,2, eval_npgettext:1c,2,3.
    - For Python: gettext, ugettext, dgettext:2, ngettext:1,2, ungettext:1,2, dngettext:2,3, _.
    - For Lisp: gettext, ngettext:1,2, gettext-noop.
    - For EmacsLisp: _.
    - For librep: _.
    - For Scheme: gettext, ngettext:1,2, gettext-noop.
    - For Java: GettextResource.gettext:2, GettextResource.ngettext:2,3, GettextResource.pgettext:2c,3, GettextResource.npgettext:2c,3,4, gettext, ngettext:1,2, pgettext:1c,2, npgettext:1c,2,3, getString.
    - For C#: GetString, GetPluralString:1,2, GetParticularString:1c,2, GetParticularPluralString:1c,2,3.
    - For awk: dcgettext, dcngettext:1,2.
    - For Tcl: ::msgcat::mc.
    - For Perl: gettext, %gettext, $gettext, dgettext:2, dcgettext:2, ngettext:1,2, dngettext:2,3, dcngettext:2,3, gettext_noop.
    - For PHP: _, gettext, dgettext:2, dcgettext:2, ngettext:1,2, dngettext:2,3, dcngettext:2,3.
    - For Glade 1: label, title, text, format, copyright, comments, preview_text, tooltip.
    - For Lua: _, gettext.gettext, gettext.dgettext:2, gettext.dcgettext:2, gettext.ngettext:1,2, gettext.dngettext:2,3, gettext.dcngettext:2,3.
    - For JavaScript: _, gettext, dgettext:2, dcgettext:2, ngettext:1,2, dngettext:2,3, pgettext:1c,2, dpgettext:2c,3.
    - For Vala: _, Q_, N_, NC_, dgettext:2, dcgettext:2, ngettext:1,2, dngettext:2,3, dpgettext:2c,3, dpgettext2:2c,3.
    - For Desktop: Name, GenericName, Comment, Keywords.

    To disable the default keyword specifications, the option ‘-k’ or ‘--keyword’ or ‘--keyword=’, without a keywordspec, can be used.

‘--flag=word:arg:flag’
    Specifies additional flags for strings occurring as part of the arg-th argument of the function word. The possible flags are the possible format string indicators, such as ‘c-format’, and their negations, such as ‘no-c-format’, possibly prefixed with ‘pass-’.

    The meaning of --flag=function:arg:lang-format is that in language lang, the specified function expects as arg-th argument a format string. (For those of you familiar with GCC function attributes, --flag=function:arg:c-format is roughly equivalent to the declaration ‘__attribute__ ((__format__ (__printf__, arg, ...)))’ attached to function in a C source file.) For example, if you use the ‘error’ function from GNU libc, you can specify its behaviour through --flag=error:3:c-format. The effect of this specification is that xgettext will mark as format strings all gettext invocations that occur as arg-th argument of function. This is useful when such strings contain no format string directives: together with the checks done by ‘msgfmt -c’ it will ensure that translators cannot accidentally use format string directives that would lead to a crash at runtime.

    The meaning of --flag=function:arg:pass-lang-format is that in language lang, if the function call occurs in a position that must yield a format string, then its arg-th argument must yield a format string of the same type as well. (If you know GCC function attributes, the --flag=function:arg:pass-c-format option is roughly equivalent to the declaration ‘__attribute__ ((__format_arg__ (arg)))’ attached to function in a C source file.)
    For example, if you use the ‘_’ shortcut for the gettext function, you should use --flag=_:1:pass-c-format. The effect of this specification is that xgettext will propagate a format string requirement for a _("string") call to its first argument, the literal "string", and thus mark it as a format string. This is useful when such strings contain no format string directives: together with the checks done by ‘msgfmt -c’ it will ensure that translators cannot accidentally use format string directives that would lead to a crash at runtime.

    This option has an effect with most languages, namely C, C++, ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Scheme, Java, C#, awk, YCP, Tcl, Perl, PHP, GCC-source, Lua, JavaScript, Vala.

‘-T’
‘--trigraphs’
    Understand ANSI C trigraphs for input. This option has an effect only with the languages C, C++, ObjectiveC.

‘--qt’
    Recognize Qt format strings. This option has an effect only with the language C++.

‘--kde’
    Recognize KDE 4 format strings. This option has an effect only with the language C++.

‘--boost’
    Recognize Boost format strings. This option has an effect only with the language C++.

‘--debug’
    Use the flags c-format and possible-c-format to show who was responsible for marking a message as a format string. The latter form is used if the xgettext program decided, the former form is used if the programmer prescribed it.

    By default only the c-format form is used. The translator should not have to care about these details.

This implementation of xgettext is able to process a few awkward cases, like strings in preprocessor macros, ANSI concatenation of adjacent strings, and escaped end of lines for continued strings.

5.1.7 Output details

‘--color’
‘--color=when’
    Specify whether or when to use colors and other text attributes. See The --color option for details.

‘--style=style_file’
    Specify the CSS style rule file to use for --color. See The --style option for details.
‘--force-po’
    Always write an output file even if no message is defined.

‘-i’
‘--indent’
    Write the .po file using indented style.

‘--no-location’
    Do not write ‘#: filename:line’ lines. Note that using this option makes it harder for technically skilled translators to understand each message's context.

‘-n’
‘--add-location=type’
    Generate ‘#: filename:line’ lines (default).

    The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).

‘--strict’
    Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn't support the GNU extensions.

‘--properties-output’
    Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn't support plural forms and silently drops obsolete messages.

‘--stringtable-output’
    Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn't support plural forms.

‘--its=file’
    Use ITS rules defined in file. Note that this is only effective with XML files.

‘--itstool’
    Write out comments recognized by itstool (http://itstool.org). Note that this is only effective with XML files.

‘-w number’
‘--width=number’
    Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line's width (= number of screen columns) is less than or equal to the given number.

‘--no-wrap’
    Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.

‘-s’
‘--sort-output’
    Generate sorted output (deprecated).
    Note that using this option makes it much harder for the translator to understand each message's context.

‘-F’
‘--sort-by-file’
    Sort output by file location.

‘--omit-header’
    Don't write header with ‘msgid ""’ entry.

    This is useful for testing purposes because it eliminates a source of variance for generated .gmo files. With --omit-header, two invocations of xgettext on the same files with the same options at different times are guaranteed to produce the same results.

    Note that using this option will lead to an error if the resulting file would not entirely be in ASCII.

‘--copyright-holder=string’
    Set the copyright holder in the output. string should be the copyright holder of the surrounding package. (Note that the msgid strings, extracted from the package's sources, belong to the copyright holder of the package.) Translators are expected to transfer or disclaim the copyright for their translations, so that package maintainers can distribute them without legal risk. If string is empty, the output files are marked as being in the public domain; in this case, the translators are expected to disclaim their copyright, again so that package maintainers can distribute them without legal risk.

    The default value for string is the Free Software Foundation, Inc., simply because xgettext was first used in the GNU project.

‘--foreign-user’
    Omit FSF copyright in output. This option is equivalent to ‘--copyright-holder=''’. It can be useful for packages outside the GNU project that want their translations to be in the public domain.

‘--package-name=package’
    Set the package name in the header of the output.

‘--package-version=version’
    Set the package version in the header of the output. This option has an effect only if the ‘--package-name’ option is also used.

‘--msgid-bugs-address=email@address’
    Set the reporting address for msgid bugs.
    This is the email address or URL to which the translators shall report bugs in the untranslated strings:

    - Strings which are not entire sentences; see the maintainer guidelines in Preparing Translatable Strings.
    - Strings which use unclear terms or require additional context to be understood.
    - Strings which make invalid assumptions about notation of date, time or money.
    - Pluralisation problems.
    - Incorrect English spelling.
    - Incorrect formatting.

    It can be your email address, or a mailing list address where translators can write to without being subscribed, or the URL of a web page through which the translators can contact you.

    The default value is empty, which means that translators will be clueless! Don't forget to specify this option.

‘-m[string]’
‘--msgstr-prefix[=string]’
    Use string (or "" if not specified) as prefix for msgstr values.

‘-M[string]’
‘--msgstr-suffix[=string]’
    Use string (or "" if not specified) as suffix for msgstr values.

5.1.8 Informative output

‘-h’
‘--help’
    Display this help and exit.

‘-V’
‘--version’
    Output version information and exit.

‘-v’
‘--verbose’
    Increase verbosity level.

6 Creating a New PO File

When starting a new translation, the translator creates a file called LANG.po, as a copy of the package.pot template file with modifications in the initial comments (at the beginning of the file) and in the header entry (the first entry, near the beginning of the file).

The easiest way to do so is by use of the ‘msginit’ program. For example:

    $ cd PACKAGE-VERSION
    $ cd po
    $ msginit

The alternative way is to do the copy and modifications by hand. To do so, the translator copies package.pot to LANG.po. Then she modifies the initial comments and the header entry of this file.
6.1 Invoking the msginit Program

    msginit [option]

The msginit program creates a new PO file, initializing the meta information with values from the user's environment.

Here are more details. The following header fields of a PO file are automatically filled, when possible.

‘Project-Id-Version’
    The value is guessed from the configure script or any other files in the current directory.

‘PO-Revision-Date’
    The value is taken from the POT-Creation-Date in the input POT file, or the current date is used.

‘Last-Translator’
    The value is taken from user's password file entry and the mailer configuration files.

‘Language-Team, Language’
    These values are set according to the current locale and the predefined list of translation teams.

‘MIME-Version, Content-Type, Content-Transfer-Encoding’
    These values are set according to the content of the POT file and the current locale. If the POT file contains charset=UTF-8, it means that the POT file contains non-ASCII characters, and we keep the UTF-8 encoding. Otherwise, when the POT file is plain ASCII, we use the locale's encoding.

‘Plural-Forms’
    The value is first looked up from the embedded table.

    As an experimental feature, you can instruct msginit to use the information from Unicode CLDR, by setting the GETTEXTCLDRDIR environment variable. The program will look for a file named common/supplemental/plurals.xml under that directory. You can get the CLDR data from http://cldr.unicode.org/.

6.1.1 Input file location

‘-i inputfile’
‘--input=inputfile’
    Input POT file.

If no inputfile is given, the current directory is searched for the POT file. If it is ‘-’, standard input is read.

6.1.2 Output file location

‘-o file’
‘--output-file=file’
    Write output to specified PO file.
If no output file is given, it depends on the ‘--locale’ option or the user's locale setting. If it is ‘-’, the results are written to standard output.

6.1.3 Input file syntax

‘-P’
‘--properties-input’
    Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.

‘--stringtable-input’
    Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.

6.1.4 Output details

‘-l ll_CC[.encoding]’
‘--locale=ll_CC[.encoding]’
    Set target locale. ll should be a language code, and CC should be a country code. The optional part .encoding specifies the encoding of the locale; most often this part is .UTF-8. The command ‘locale -a’ can be used to output a list of all installed locales. The default is the user's locale setting.

‘--no-translator’
    Declares that the PO file will not have a human translator and is instead automatically generated.

‘--color’
‘--color=when’
    Specify whether or when to use colors and other text attributes. See The --color option for details.

‘--style=style_file’
    Specify the CSS style rule file to use for --color. See The --style option for details.

‘-p’
‘--properties-output’
    Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn't support plural forms and silently drops obsolete messages.

‘--stringtable-output’
    Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn't support plural forms.

‘-w number’
‘--width=number’
    Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line's width (= number of screen columns) is less than or equal to the given number.

‘--no-wrap’
    Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines.
    Only file reference lines which are wider than the output page width will be split.

6.1.5 Informative output

‘-h’
‘--help’
    Display this help and exit.
‘-V’
‘--version’
    Output version information and exit.

6.2 Filling in the Header Entry

The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and "FIRST AUTHOR <EMAIL@ADDRESS>, YEAR" ought to be replaced by sensible information. This can be done in any text editor; if Emacs is used and it switched to PO mode automatically (because it has recognized the file's suffix), you can disable it by typing M-x fundamental-mode.

Modifying the header entry can already be done using PO mode: in Emacs, type M-x po-mode RET and then RET again to start editing the entry. You should fill in the following fields.

Project-Id-Version
    This is the name and version of the package. Fill it in if it has not already been filled in by xgettext.
Report-Msgid-Bugs-To
    This has already been filled in by xgettext. It contains an email address or URL where you can report bugs in the untranslated strings:
    - Strings which are not entire sentences, see the maintainer guidelines in Preparing Translatable Strings.
    - Strings which use unclear terms or require additional context to be understood.
    - Strings which make invalid assumptions about notation of date, time or money.
    - Pluralisation problems.
    - Incorrect English spelling.
    - Incorrect formatting.
POT-Creation-Date
    This has already been filled in by xgettext.
PO-Revision-Date
    You don't need to fill this in. It will be filled by the PO file editor when you save the file.
Last-Translator
    Fill in your name and email address (without double quotes).
Language-Team
    Fill in the English name of the language, and the email address or homepage URL of the language team you are part of.
    Before starting a translation, it is a good idea to get in touch with your translation team, not only to make sure you don't do duplicated work, but also to coordinate difficult linguistic issues.
    In the Free Translation Project, each translation team has its own mailing list. The up-to-date list of teams can be found at the Free Translation Project's homepage, https://translationproject.org/ , in the "Teams" area.
Language
    Fill in the language code of the language. This can be in one of three forms:
    - ‘ll’, an ISO 639 two-letter language code (lowercase). See Language Codes for the list of codes.
    - ‘ll_CC’, where ‘ll’ is an ISO 639 two-letter language code (lowercase) and ‘CC’ is an ISO 3166 two-letter country code (uppercase). The country code specification is not redundant: Some languages have dialects in different countries. For example, ‘de_AT’ is used for Austria, and ‘pt_BR’ for Brazil. The country ...
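
The two forms of the Language value that are visible in this excerpt, ‘ll’ and ‘ll_CC’, can be checked mechanically. The following is a minimal Python sketch and not part of gettext: the helper name split_language and the regular expression are assumptions made for illustration. It covers only the two forms quoted above, and the third form, which is cut off in this excerpt, is deliberately not handled.

```python
import re

# Hypothetical helper, not part of gettext. It accepts only the two forms
# of the Language field quoted in the excerpt:
#   ll     two lowercase letters (ISO 639 language code)
#   ll_CC  the same, followed by "_" and two uppercase letters (ISO 3166)
_LANGUAGE_RE = re.compile(r"^(?P<ll>[a-z]{2})(?:_(?P<cc>[A-Z]{2}))?$")


def split_language(value):
    """Return (language, country); country is None for the plain 'll' form."""
    match = _LANGUAGE_RE.match(value)
    if match is None:
        raise ValueError(f"not of the form ll or ll_CC: {value!r}")
    return match.group("ll"), match.group("cc")


if __name__ == "__main__":
    for sample in ("de", "de_AT", "pt_BR"):
        print(sample, "->", split_language(sample))
```

The sketch checks only the shape of the value. It does not verify that the codes are actually registered in ISO 639 or ISO 3166, and it rejects any value outside the two forms shown.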
http://www.gnu.org/savannah-checkouts/gnu/gettext/manual/gettext.html