

5.1.9 Input Encodings

The groff command’s -k option calls the preconv preprocessor to perform input character encoding conversions. Input to the GNU troff formatter itself, on the other hand, must be in one of two encodings it can recognize.

cp1047

The code page 1047 input encoding works only on EBCDIC platforms (and conversely, the other input encodings don’t work with EBCDIC); the file cp1047.tmac is loaded at startup.

latin1

ISO Latin-1, an encoding for Western European languages, is the default input encoding on non-EBCDIC platforms; the file latin1.tmac is loaded at startup.

Any document that is encoded in ISO 646:1991 (a descendant of USAS X3.4-1968 or “US-ASCII”), or, equivalently, uses only code points from the “C0 Controls” and “Basic Latin” parts of the Unicode character set, is also a valid ISO Latin-1 document; the standards are interchangeable in their first 128 code points.
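
The overlap in the first 128 code points can be checked outside of groff. The following is a minimal sketch in Python, which is not part of groff; it assumes only Python's built-in "ascii" and "latin-1" codecs, and the variable names are illustrative.

```python
# Decode every byte in 0x00-0x7F under both codecs and compare the results.
ascii_bytes = bytes(range(128))
as_ascii = ascii_bytes.decode("ascii")
as_latin1 = ascii_bytes.decode("latin-1")
same_in_low_half = as_ascii == as_latin1

# A byte above 0x7F is valid Latin-1 (here, e with acute accent, U+00E9)
# but is rejected by the ASCII codec.
e_acute = b"\xe9".decode("latin-1")
try:
    b"\xe9".decode("ascii")
    ascii_accepts_high_byte = True
except UnicodeDecodeError:
    ascii_accepts_high_byte = False

print(same_in_low_half, ascii_accepts_high_byte)
```

The reverse does not hold: a Latin-1 document that uses any byte above 0x7F is no longer a valid US-ASCII document.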

Other encodings are supported by means of macro packages.

latin2

To use ISO Latin-2, an encoding for Central and Eastern European languages, invoke ‘.mso latin2.tmac’ at the beginning of your document or supply ‘-mlatin2’ as a command-line argument to groff.

latin5

To use ISO Latin-5, an encoding for the Turkish language, invoke ‘.mso latin5.tmac’ at the beginning of your document or supply ‘-mlatin5’ as a command-line argument to groff.

latin9

ISO Latin-9 succeeds Latin-1; it includes a Euro sign and better glyph coverage for French. To use this encoding, invoke ‘.mso latin9.tmac’ at the beginning of your document or supply ‘-mlatin9’ as a command-line argument to groff.
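
To see exactly where Latin-9 departs from Latin-1, the two code pages can be compared byte by byte. This is a sketch in Python rather than anything groff itself does; it assumes Python's "latin-1" and "iso8859-15" codecs correspond to the encodings that latin1.tmac and latin9.tmac handle.

```python
# Collect every byte value whose character differs between the two encodings.
differing = {
    byte: (bytes([byte]).decode("latin-1"), bytes([byte]).decode("iso8859-15"))
    for byte in range(256)
    if bytes([byte]).decode("latin-1") != bytes([byte]).decode("iso8859-15")
}

for byte, (in_latin1, in_latin9) in sorted(differing.items()):
    print(f"0x{byte:02X}: Latin-1 {in_latin1!r} -> Latin-9 {in_latin9!r}")
```

Only eight positions differ. Among them are 0xA4, which becomes the Euro sign, and 0xBC through 0xBE, which become the French letters Œ, œ, and Ÿ.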

Some characters from an input encoding may not be available with a particular output driver, or their glyphs may not have representation in the font used. For terminal devices, fallbacks are defined, like ‘EUR’ for the Euro sign and ‘(C)’ for the copyright sign. For typesetter devices, you may need to “mount” fonts that support glyphs required by the document. See Font Positions.

Because a Euro glyph was not historically defined in PostScript fonts, groff comes with a font called freeeuro.pfa that provides the Euro in several styles. Standard PostScript fonts contain the glyphs from Latin-5 and Latin-9 that Latin-1 lacks, so these encodings are supported for the ps and pdf output devices as groff ships, while Latin-2 is not.

Unicode supports characters from all other input encodings; the utf8 output driver for terminals therefore does as well. The DVI output driver supports the Latin-2 and Latin-9 encodings if the command-line option -mec is used as well.
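
The claim that Unicode covers the ISO Latin encodings named above can be illustrated with a round trip through UTF-8. The sketch below uses Python's codecs, not groff, so it says nothing about which glyphs a given output driver or font can render; the mapping of names to codec identifiers is an assumption of this example. The EBCDIC code page 1047 is left out because this sketch does not assume a codec for it.

```python
# Hypothetical name-to-codec table for the ISO Latin encodings in this section.
codecs_by_name = {
    "Latin-1": "iso8859-1",
    "Latin-2": "iso8859-2",
    "Latin-5": "iso8859-9",
    "Latin-9": "iso8859-15",
}

all_bytes = bytes(range(256))
round_trips = {}
for name, codec in codecs_by_name.items():
    text = all_bytes.decode(codec)   # every byte maps to a Unicode character
    utf8 = text.encode("utf-8")      # which UTF-8 can always represent
    round_trips[name] = utf8.decode("utf-8").encode(codec) == all_bytes

print(round_trips)
```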

