Cyrillic and Japanese - The Plotutils Package

A.2 Cyrillic and Japanese fonts

The built-in fonts discussed in the previous section include Cyrillic and Japanese vector fonts. This section explains how these fonts are encoded, i.e., how their character maps are laid out. You may use the plotfont utility to display the character map for any font, including the Cyrillic and Japanese vector fonts. See plotfont.

The HersheyCyrillic and HersheyCyrillic-Oblique fonts use an encoding called KOI8-R, a superset of ASCII that has become the de facto standard for Unix and networking applications in the former Soviet Union. Their printable ASCII characters resemble those of the HersheySerif vector font, but their upper halves differ. The byte range 0xc0...0xdf contains lower-case Cyrillic characters and the byte range 0xe0...0xff contains upper-case Cyrillic characters. Additional Cyrillic characters (the lower-case and upper-case forms of the letter `yo') are located at 0xa3 and 0xb3, respectively. For more on the encoding scheme, see the official KOI8-R Web page and Internet RFC 1489, which is available in many places, including the Information Sciences Institute.
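The byte ranges above can be checked with a short script. What follows is a minimal sketch in Python, which is not part of the plotutils package. It assumes Python's built-in koi8_r codec, and the helper names koi8r_class and describe are hypothetical, chosen only for this illustration.

```python
def koi8r_class(byte):
    """Classify one KOI8-R byte using the ranges given in this section."""
    if 0x20 <= byte <= 0x7E:
        return "ascii"      # printable ASCII, as in HersheySerif
    if 0xC0 <= byte <= 0xDF:
        return "lower"      # lower-case Cyrillic
    if 0xE0 <= byte <= 0xFF:
        return "upper"      # upper-case Cyrillic
    if byte in (0xA3, 0xB3):
        return "extra"      # the two additional Cyrillic characters
    return "other"          # not covered by the description above


def describe(text):
    """Encode text as KOI8-R and pair each byte (in hex) with its class."""
    return [(hex(b), koi8r_class(b)) for b in text.encode("koi8_r")]


# Print the byte and class for each character of a short Cyrillic string.
for item in describe("\u0401\u0436"):
    print(item)
```

A string encoded in this way is what a program would pass to a text-drawing routine when one of the HersheyCyrillic fonts is selected.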

The HersheyEUC font is a vector font that is used for displaying Japanese text. It uses the 8-bit EUC-JP encoding. EUC stands for `extended Unix code', a scheme for encoding Japanese, and also other character sets (e.g., Greek and Cyrillic), as multibyte character strings. The format of EUC strings is explained in Ken Lunde's Understanding Japanese Information Processing (O'Reilly, 1993), which contains much additional information on Japanese text processing. See also his on-line supplement, and his more recent book CJKV Information Processing (O'Reilly, 1999).

In the HersheyEUC font, characters in the printable ASCII range, 0x20...0x7e, are similar to those of HersheySerif (their encoding is `JIS Roman', an ASCII variant standardized by the Japanese Industrial Standards Committee). In addition, each successive pair of bytes in the 0xa1...0xfe range defines a single character in the JIS X0208 standard. The characters in the JIS X0208 standard include Japanese syllabic characters (Hiragana and Katakana), ideographic characters (Kanji), Roman, Greek, and Cyrillic alphabets, punctuation marks, and miscellaneous symbols. For example, the JIS X0208 standard indexes the 83 Hiragana as 0x2421...0x2473. To obtain the EUC code for any JIS X0208 character, add 0x80 to each of its two bytes (i.e., `set the high bit' on each byte). So the first of the 83 Hiragana (0x2421) is encoded as the successive pair of bytes 0xa4 and 0xa1.
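The conversion from a JIS X0208 index to an EUC byte pair is simple enough to sketch in a few lines. The following Python fragment is an illustration only and is not part of the plotutils package. The function name jis_to_euc is hypothetical, and the final check assumes Python's built-in euc_jp codec.

```python
def jis_to_euc(jis_code):
    """Convert a two-byte JIS X0208 index (e.g. 0x2421) to EUC bytes."""
    high, low = jis_code >> 8, jis_code & 0xFF
    # Both bytes of a JIS X0208 index lie in the range 0x21...0x7e.
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError("not a JIS X0208 index: %#06x" % jis_code)
    # Setting the high bit is the same as adding 0x80 to each byte.
    return bytes([high | 0x80, low | 0x80])


# The 83 Hiragana, indexed 0x2421...0x2473, as a single EUC byte string.
hiragana_euc = b"".join(jis_to_euc(code) for code in range(0x2421, 0x2474))

print(jis_to_euc(0x2421).hex())              # first Hiragana as hex bytes
print(len(hiragana_euc) // 2)                # number of two-byte characters
print(len(hiragana_euc.decode("euc_jp")))    # same count after decoding
```

Since every byte of a converted pair is at least 0xa1, such pairs can be mixed freely with JIS Roman characters in the same string without ambiguity.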

The implementation of the JIS X0208 standard in the HersheyEUC font is based on Dr. Hershey's digitizations, and is complete enough to be useful. All 83 Hiragana and 86 Katakana are available, though the little-used `half-width Katakana' are not supported. Also, 603 Kanji are available, including 596 of the 2965 JIS Level 1 (i.e., frequently used) Kanji. The Hiragana, the Katakana, and the available Kanji all have the same width. The file kanji.doc, which on most systems is installed in /usr/share/libplot or /usr/local/share/libplot, lists the 603 available Kanji. Each JIS X0208 character that is unavailable will be drawn as an `undefined character' glyph (a bundle of horizontal lines).

The eight Hewlett–Packard vector fonts in the ArcANK and StickANK typefaces are also used for displaying Japanese text. They are available when producing HP-GL/2 output, or HP-GL output for the HP7550A graphics plotter and the HP758x, HP7595A and HP7596A drafting plotters. That is, they are available only if HPGL_VERSION is "2" (the default) or "1.5".

ANK stands for Alphabet, Numerals, and Katakana. The ANK fonts use a special mixed encoding. The lower half of each font uses the JIS Roman encoding, and the upper half contains half-width Katakana. Half-width Katakana are simplified Katakana that may need to be equipped with diacritical marks. The diacritical marks are included in the encoding as separate characters.
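The treatment of diacritical marks as separate characters can be illustrated with a short Python sketch, which is not part of the plotutils package. The exact byte layout of the upper half is not given in this section. The sketch assumes that it follows the JIS X0201 arrangement, in which half-width Katakana occupy 0xa1...0xdf in the same order as the Unicode half-width forms U+FF61...U+FF9F. The function name ank_encode is hypothetical.

```python
# Assumption: the upper half follows JIS X0201, so that half-width
# Katakana at 0xa1...0xdf correspond one-to-one to U+FF61...U+FF9F.
FIRST_BYTE = 0xA1
LAST_BYTE = 0xDF
FIRST_CODEPOINT = 0xFF61


def ank_encode(text):
    """Encode ASCII and half-width Katakana as single bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Lower half: passed through unchanged. JIS Roman is an
            # ASCII variant, so a few glyphs may differ from ASCII.
            out.append(cp)
            continue
        byte = cp - FIRST_CODEPOINT + FIRST_BYTE
        if not FIRST_BYTE <= byte <= LAST_BYTE:
            raise ValueError("not a half-width Katakana: %r" % ch)
        out.append(byte)
    return bytes(out)


# Half-width KA followed by the voiced sound mark, which together
# represent the syllable GA: two characters, hence two bytes.
print(ank_encode("\uff76\uff9e").hex())
```

Under this assumption a voiced syllable takes two bytes, one for the base Katakana and one for the mark that follows it.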