Marking translatable strings in an XML file is done through a separate
"rule" file, making use of the Internationalization Tag Set standard
(ITS, http://www.w3.org/TR/its20/). The currently supported ITS
data categories are: ‘Translate’, ‘Localization Note’,
‘Elements Within Text’, and ‘Preserve Space’. In addition to
them, xgettext also recognizes the following extended data
categories:
‘Context’
This data category associates msgctxt with the extracted text. In
the global rule, the contextRule element contains the following:
- A selector attribute. It contains an absolute selector
  that selects the nodes to which this rule applies.
- A contextPointer attribute that contains a relative
  selector pointing to a node that holds the msgctxt value.
- A textPointer attribute that contains a relative
  selector pointing to a node that holds the msgid value.
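As a minimal sketch of a global ‘Context’ rule, assume a hypothetical document in which some p elements carry a context attribute. That attribute and the gt namespace prefix are choices made for this illustration only. Under those assumptions, the rule below would take the msgctxt value from the attribute:

```xml
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="2.0">
  <!-- Hypothetical input: <p context="menu">Open</p> -->
  <!-- selector is absolute; contextPointer is relative to each selected p. -->
  <gt:contextRule selector="//message/p[@context]"
                  contextPointer="@context"/>
</its:rules>
```

With such a rule, the text "Open" would be extracted with the msgctxt "menu". The textPointer attribute is left out here, so the msgid comes from the selected node itself.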
‘Escape Special Characters’
This data category indicates whether the special XML characters
(<, >, &, ") are escaped with entity
references. In the global rule, the escapeRule element contains
the following:
- A selector attribute. It contains an absolute selector
  that selects the nodes to which this rule applies.
- An escape attribute with the value yes or no.
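A minimal sketch of an ‘Escape Special Characters’ rule follows. The gt prefix and the //message/p selector are assumptions chosen for illustration; only the element and attribute names come from the description above:

```xml
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="2.0">
  <!-- Leave <, >, & and " unescaped in strings taken from these nodes. -->
  <gt:escapeRule selector="//message/p" escape="no"/>
</its:rules>
```

Setting escape="yes" instead would request entity references for the same characters.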
‘Extended Preserve Space’
This data category extends the standard ‘Preserve Space’ data
category with the additional value ‘trim’. This value means that
leading and trailing whitespace of the content is removed, but
whitespace in the middle is not normalized. In the global rule, the
preserveSpaceRule element contains the following:
- A selector attribute. It contains an absolute selector
  that selects the nodes to which this rule applies.
- A space attribute with the value default,
  preserve, or trim.
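The ‘trim’ value could be requested with a rule along these lines. This is a sketch only: the gt prefix and the selector are illustrative assumptions.

```xml
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="2.0">
  <!-- Strip leading and trailing whitespace; keep inner whitespace as is. -->
  <gt:preserveSpaceRule selector="//message/p" space="trim"/>
</its:rules>
```

For an element whose content is spread over several indented lines, only the whitespace before the first word and after the last word would be dropped. The line breaks in between would remain in the msgid.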
All of these extended data categories can only be expressed with global
rules, and the rule elements must be in the
https://www.gnu.org/s/gettext/ns/its/extensions/1.0 namespace.
Given the following XML document in a file messages.xml:
<?xml version="1.0"?>
<messages>
  <message>
    <p>A translatable string</p>
  </message>
  <message>
    <p translatable="no">A non-translatable string</p>
  </message>
</messages>
To extract the first text content ("A translatable string"), but not the second ("A non-translatable string"), the following ITS rules can be used:
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:translateRule selector="/messages" translate="no"/>
  <its:translateRule selector="//message/p" translate="yes"/>
  <!-- If 'p' has an attribute 'translatable' with the value 'no', then
       the content is not translatable. -->
  <its:translateRule selector="//message/p[@translatable = 'no']"
                     translate="no"/>
</its:rules>
‘xgettext’ needs another file, called a "locating rule" file, to associate an ITS rule file with an XML file. If the above ITS file is saved as messages.its, the locating rule file would look like:
<?xml version="1.0"?>
<locatingRules>
  <locatingRule name="Messages" pattern="*.xml">
    <documentRule localName="messages" target="messages.its"/>
  </locatingRule>
  <locatingRule name="Messages" pattern="*.msg" target="messages.its"/>
</locatingRules>
The locatingRule element must have a pattern attribute,
which denotes either a literal file name or a wildcard pattern of the
XML file. The locatingRule element can have a child
documentRule element, which adds checks on the content of the XML
file.
The first rule matches any file with the .xml file extension, but it only applies to XML files whose root element is ‘<messages>’.
The second rule indicates that the same ITS rule file is also
applicable to any file with the .msg file extension. The
optional name attribute of locatingRule allows choosing
rules by name, typically with xgettext’s -L option.
The associated ITS rule file is indicated by the target attribute
of locatingRule or documentRule. If it is specified in a
documentRule element, the parent locatingRule shouldn’t
have the target attribute.
Locating rule files must have the .loc file extension. Both ITS
rule files and locating rule files must be installed in the
$prefix/share/gettext/its directory. Once those files are
properly installed, xgettext can extract translatable strings
from the matching XML files.
For XML, there are two use cases for translated strings. In the first, the translated strings are consumed directly by programs. In the second, the translated strings are merged back into the original XML document. In the former case, special characters in the extracted strings shouldn’t be escaped, while in the latter case they should. To control whether special characters are escaped, the ‘Escape Special Characters’ data category can be used.
To merge the translations, the ‘msgfmt’ program can be used with
the option --xml. See msgfmt Invocation, for more details
about how one calls the ‘msgfmt’ program. ‘msgfmt’’s
--xml option doesn’t perform character escaping, so translated
strings can have arbitrary XML constructs, such as elements for markup.
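As a rough sketch of what a merge could produce for messages.xml, assume a French catalog that translates "A translatable string", and assume that the merge adds each translation as a sibling element marked with an xml:lang attribute. Both the French text and the exact output layout are assumptions made for this illustration:

```xml
<?xml version="1.0"?>
<messages>
  <message>
    <p>A translatable string</p>
    <!-- Annotation for this sketch only: the element below is the merged translation. -->
    <p xml:lang="fr">Une chaîne traduisible</p>
  </message>
  <message>
    <p translatable="no">A non-translatable string</p>
  </message>
</messages>
```

The second p element is left alone because the ITS rules mark it as not translatable, so it never reaches the catalog.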
Note that the file name matching is done after
removing any .in suffix from the input file name. Thus the
pattern attribute must not include a pattern matching .in.
For example, if the input file name is foo.msg.in, the pattern
should be either *.msg or just *, rather than
*.in.
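A locating rule for such a template file could be sketched as follows, reusing the messages.its rule file and the rule name from the earlier example:

```xml
<?xml version="1.0"?>
<locatingRules>
  <!-- Applies to foo.msg and to foo.msg.in alike, because the .in suffix
       is removed from the input file name before the pattern is matched. -->
  <locatingRule name="Messages" pattern="*.msg" target="messages.its"/>
</locatingRules>
```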