Hardcoded string concatenation is sometimes used to construct English strings:
strcpy (s, "Replace ");
strcat (s, object1);
strcat (s, " with ");
strcat (s, object2);
strcat (s, "?");
In order to present to the translator only entire sentences, and also
because in some languages the translator might want to swap the order
of object1 and object2, it is necessary to change this
to use a format string:
sprintf (s, "Replace %s with %s?", object1, object2);
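The difference can be sketched in a few lines of Python, chosen here only for brevity. This is an illustration under stated assumptions, not part of any gettext API: the `CATALOG` dictionary and the `translate` helper are hypothetical stand-ins for a real message catalog, and the "translations" are English paraphrases that merely imitate a language with a different word order.

```python
# Hypothetical stand-in for a message catalog; a real program would use gettext.
# The entries imitate a target language that names the second object first.
CATALOG = {
    "Replace ": "Swap out ",
    " with ": " for ",
    "?": "?",
    "Replace {0} with {1}?": "Use {1} in place of {0}?",
}


def translate(msgid):
    """Return the catalog entry for msgid, or msgid itself if there is none."""
    return CATALOG.get(msgid, msgid)


def by_concatenation(object1, object2):
    # The translator only sees three fragments and cannot reorder the objects.
    return (translate("Replace ") + object1 + translate(" with ")
            + object2 + translate("?"))


def by_format_string(object1, object2):
    # The translator sees one whole sentence and may move {0} and {1} freely.
    return translate("Replace {0} with {1}?").format(object1, object2)


print(by_concatenation("foo", "bar"))
print(by_format_string("foo", "bar"))
```

With concatenation, the order of the two objects is fixed by the program; with the format string, it is decided by the translated sentence.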
In many programming languages, a particular operator denotes string concatenation at runtime (or possibly at compile time, if the compiler supports that).
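For example, in C++, string concatenation of std::string objects is denoted by the ‘+’ operator; in Java and C#, string concatenation is likewise denoted by the ‘+’ operator.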
So, for example, in Java, you would change
System.out.println("Replace "+object1+" with "+object2+"?");
into a statement involving a format string:
System.out.println(
    MessageFormat.format("Replace {0} with {1}?",
                         new Object[] { object1, object2 }));
Similarly, in C#, you would change
Console.WriteLine("Replace "+object1+" with "+object2+"?");
into a statement involving a format string:
Console.WriteLine(
    String.Format("Replace {0} with {1}?", object1, object2));
In some programming languages, it is possible to have strings with embedded expressions. The expressions can refer to variables of the program. The value of such an expression is converted to a string and inserted in place of the expression; but no formatting function is called.
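In Python, f-strings: f"Hello, {name}!".
In C#, interpolated strings: $"Hello, {name}!".
In JavaScript, template literals: `Hello, ${name}!`.
In Ruby, interpolated strings: "Hello, #{name}!".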
"Hello, $name!" or "Hello, ${name}!".
In D, interpolation expression sequences: i"Hello, $(name)!".
"Hello, $name!".
"Hello, $name!".
"Hello, $name!".
These cases are effectively string concatenation as well, just with a different syntax.
So, for example, in Python, you would change
print (f'Replace {object1.name} with {object2.name}?')
into a statement involving a format string:
print ('Replace %(name1)s with %(name2)s?'
       % { 'name1': object1.name, 'name2': object2.name })
or equivalently
print ('Replace {name1} with {name2}?'
       .format(name1 = object1.name, name2 = object2.name))
And in JavaScript, you would change
print (`Replace ${object1.name} with ${object2.name}?`)
into a statement involving a format string:
print ('Replace %s with %s?'.format(object1.name, object2.name))
Specifically in JavaScript, an alternative is to use a tagged template literal:
print (tag`Replace ${object1.name} with ${object2.name}?`)
and pass an option ‘--tag=tag:format’ to xgettext.
Format strings with embedded named references are different: they are
suitable for internationalization, because a call to the gettext
function (which returns a translated format string) can be inserted
before the argument values are substituted for the placeholders.
The format string types that allow embedded named references are:
<inttypes.h> macros
A similar case is compile time concatenation of strings. The ISO C 99
include file <inttypes.h> contains a macro PRId64 that
can be used as a formatting directive for outputting an ‘int64_t’
integer through printf. It expands to a constant string, usually
"d" or "ld" or "lld" or something like this, depending on the platform.
Assume you have code like
printf ("The amount is %0" PRId64 "\n", number);
The gettext tools and library have special support for these
<inttypes.h> macros. You can therefore simply write
printf (gettext ("The amount is %0" PRId64 "\n"), number);
The PO file will contain the string "The amount is %0<PRId64>\n".
The translators will provide a translation containing "%0<PRId64>"
as well, and at runtime the gettext function’s result will
contain the appropriate constant string, "d" or "ld" or "lld".
This works only for the predefined <inttypes.h> macros. If
you have defined your own similar macros, let’s say ‘MYPRId64’,
that are not known to xgettext, the solution for this problem
is to change the code like this:
char buf1[100];
sprintf (buf1, "%0" MYPRId64, number);
printf (gettext ("The amount is %s\n"), buf1);
This means that you put the platform-dependent code in one statement and the internationalization code in a different statement. Note that a buffer length of 100 is safe, because all available hardware integer types are limited to 128 bits, and printing a 128-bit integer requires at most 54 characters, regardless of whether it is printed in decimal, octal, or hexadecimal.