Different countries and cultures have varying conventions for how to
communicate. These conventions range from very simple ones, such as the
format for representing dates and times, to very complex ones, such as
the language spoken. Provided the programs are written to obey the
choice of conventions, they will follow the conventions preferred by the
user. GNU Smalltalk provides two packages to help you do so. The
I18N package covers both internationalization and
multilingualization; the lighter-weight Iconv package
covers only the latter, as it is a prerequisite for correct
internationalization.
Multilingualizing software means programming it to be able to
support languages from every part of the world. In particular, it
includes understanding multi-byte character sets (such as UTF-8)
and Unicode characters whose code point (the equivalent of the
ASCII value) is above 127. To this end, GNU Smalltalk provides the
UnicodeString class that stores its data as 32-bit Unicode
values. In addition, Character supports all of the more than
one million code points available in Unicode.
Loading the I18N package improves this support through the
EncodedStream class (1), which interprets and transcodes
non-ASCII Unicode characters. This support is mostly transparent,
because base classes such as Character and UnicodeString are
enhanced to use it. Sending printString to an instance of
UnicodeString will convert Unicode characters so that they
are printed correctly in the current locale. For example,
`$<279> printNl' will print a small Latin letter `e' with
a dot above when the I18N package is loaded.
Dually, you can convert
ByteArray objects to
Unicode with a single method call. If the current locale's encoding is
UTF-8, `#[196 151] asUnicodeString' will return a Unicode string
with the same character as above, the small Latin letter `e' with
a dot above.
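The two directions described above can be tried with a short sketch like the following. It assumes that the I18N package is installed and can be loaded with PackageLoader, and that the current locale's encoding is UTF-8; under any other encoding the two bytes will not decode to a single character.

```smalltalk
| decoded |
"Load the I18N package (assumed to be installed) so that
 Unicode characters are transcoded when printed."
PackageLoader fileInPackage: 'I18N'.

"From Unicode to the locale: build the character from its
 code point and print it."
(Character codePoint: 279) printNl.

"From bytes to Unicode: 196 151 is the UTF-8 encoding of
 code point 279, so this relies on a UTF-8 locale."
decoded := #[196 151] asUnicodeString.
decoded printNl.
decoded size printNl.
```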
The implementation of multilingualization support is not yet
complete. For example, methods such as
isLetter do not yet recognize Unicode characters.
You need to exercise some care, or your program will be buggy when
Unicode characters are used. In particular, Characters must
not be compared with == (2) and should
be printed on a Stream with
display: rather than nextPut:.
Also, Characters need to be created with
the class method
codePoint: if you are referring to their Unicode value;
codePoint: is also the only method to create
characters that is accepted by the ANSI Standard for Smalltalk.
The method value:, instead, should be used if you are referring
to a byte in a particular encoding. This subtle difference means
that, for example, the last two of the following examples will fail:
    "Correct. Use #value: with Strings, #codePoint: with UnicodeString."
    String with: (Character value: 65)
    String with: (Character value: 128)
    UnicodeString with: (Character codePoint: 65)
    UnicodeString with: (Character codePoint: 128)

    "Correct. Only works for characters in the 0-127 range, which may
     be considered as defensive programming."
    String with: (Character codePoint: 65)

    "Dubious, and only works for characters in the 0-127 range. With
     UnicodeString, probably you always want #codePoint:."
    UnicodeString with: (Character value: 65)

    "Fails, we try to use a high character in a String"
    String with: (Character codePoint: 128)

    "Fails, we try to use an encoding in a Unicode string"
    UnicodeString with: (Character value: 128)
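The rules about comparing and printing characters can be illustrated with a small sketch. It assumes the I18N package is installed, and the variable names a and b are only illustrative. What appears on the Transcript depends on the current locale's encoding, so no output is shown for that step.

```smalltalk
| a b |
"Load I18N (assumed to be installed) so that characters are
 transcoded to the locale's encoding when displayed."
PackageLoader fileInPackage: 'I18N'.

"Create both characters from their Unicode code point."
a := Character codePoint: 279.
b := Character codePoint: 279.

"Compare by value with =, not by identity with ==."
(a = b) printNl.

"Write the character on a stream with display:."
Transcript display: a; nl.
```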
Internationalizing software, instead, means programming it to be able to adapt to the user's favorite conventions. These conventions can get pretty complex; for example, the user might specify the locale `espana-castellano' for most purposes, but specify the locale `usa-english' for currency formatting: this might make sense if the user is a Spanish-speaking American, working in Spanish, but representing monetary amounts in US dollars. You can see that this system is simple but, at the same time, very complete. This manual, however, is not the right place for a thorough discussion of how a user would set up their system for these conventions; for more information, refer to your operating system's manual or to the GNU C library's manual.
GNU Smalltalk inherits from ISO C the concept of a locale, that is, a
collection of conventions, one convention for each purpose, and maps each of
these purposes to a Smalltalk class defined by the I18N package.
These classes form a small hierarchy with class
Locale as its root:
LcMonetary and LcMonetaryISO format currency amounts.
LcTime formats dates and times.
LcMessages translates your program's output. Of course, the package can't automatically translate your program's output messages into other languages; the only way you can support output in the user's favorite language is to translate these messages by hand. The package does, though, provide methods to easily handle translations into multiple languages.
Basic usage of the I18N package involves a single selector, the
question mark (?), which is a rarely used yet valid character for
a Smalltalk binary message. The meaning of the question mark selector
is “How do you say ... under your convention?”. You can send
? either to a specific instance of a subclass of Locale,
or to the class itself; in the latter case, the rules for the default locale
(which is specified via environment variables) apply. You might say,
for example, LcTime ? Date today or
germanMonetaryLocale ? account balance. This syntax can be
confusing at first, but turns out to be convenient because of its
consistency and overall simplicity.
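As a minimal sketch of the question mark selector sent to the classes themselves, the following asks the default locale to format today's date and a sample amount. It assumes that the I18N package is installed, that its classes are reachable as I18N.LcTime and I18N.LcMonetaryISO, and that a number is an acceptable argument for the monetary class. The value 1234 is arbitrary, and the strings displayed depend on the environment's locale settings.

```smalltalk
"Load the I18N package (assumed to be installed)."
PackageLoader fileInPackage: 'I18N'.

"How do you write today's date under the default conventions?"
(I18N.LcTime ? Date today) displayNl.

"How do you write this amount under the default conventions?
 1234 is an arbitrary sample value."
(I18N.LcMonetaryISO ? 1234) displayNl.
```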
Here is how ? works for different classes:
Sent to LcMessages, ? answers an LcMessagesDomain that retrieves translations from the specified file.
Sent to an LcMessagesDomain, ? retrieves the translation of the given string. (3)
These two packages provide much more functionality, including more advanced formatting options, support for Unicode, and conversion to and from several character sets. For more information, refer to Multilingual and international support with Iconv and I18N.
As an aside, the representation of locales that the package uses is exactly the same as the C library's, which has many advantages: the burden of maintaining locale data is removed from GNU Smalltalk's maintainers; the need to keep two copies of the same data is removed from GNU Smalltalk's users; and finally, uniformity of the conventions assumed by different internationalized programs is guaranteed to the end user.
In addition, the representation of translated strings is the standard
MO file format adopted by the GNU gettext library.
(1) All the classes mentioned in this section reside in the I18N namespace.
(2) Character equality with = will be as fast as with ==.
(3) The ? selector does not apply to the LcMessagesDomain class itself, but only to its
instances. This is because LcMessagesDomain is not a subclass of Locale.