Text String Format - The Plotutils Package


A.4 Text string format and escape sequences

Text strings that are drawn by the GNU libplot library and by applications built on it, such as graph, plot, pic2plot, tek2plot, and plotfont, must consist of printable characters. No embedded control characters, such as newlines or carriage returns, are allowed. Technically, a character is `printable' if it comes from either of the two byte ranges 0x20...0x7e and 0xa0...0xff. The former is the printable ASCII range and the latter is the printable `8-bit' range.
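A program can check the printable-character rule before it passes a label to libplot. Below is a minimal C sketch. is_plottable_string is a hypothetical helper and is not part of the libplot API. Escape sequences pass the check, because the backslash (0x5c) is printable ASCII.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of libplot: returns true if every byte of
   s is printable, i.e. lies in 0x20..0x7e (printable ASCII) or 0xa0..0xff
   (printable 8-bit). Newlines, carriage returns and other control bytes
   make it return false. */
static bool is_plottable_string(const char *s)
{
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        bool ascii = (*p >= 0x20 && *p <= 0x7e);
        bool eight_bit = (*p >= 0xa0);   /* an unsigned char never exceeds 0xff */
        if (!ascii && !eight_bit)
            return false;
    }
    return true;
}
```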

Text strings may, however, include embedded `escape sequences' that shift the font, append subscripts or superscripts, or include non-ASCII characters and mathematical symbols. As a consequence, the axis labels on a plot prepared with graph may include such features. So may the text strings that pic2plot uses to label objects.

The format of the escape sequences will look familiar to anyone who has used the TeX, troff, or groff document formatters. Each escape sequence begins with a backslash followed by two additional characters. A few of them, such as the Hershey-glyph and Japanese-character escapes described at the end of this section, are followed by further digits. The most frequently used escape sequences are as follows.

"\sp": start superscript mode
"\ep": end superscript mode
"\sb": start subscript mode
"\eb": end subscript mode
"\mk": mark position
"\rt": return to marked position

For example, the string "x\sp2\ep" would be interpreted as `x squared'. Subscripts on subscripts, etc., are allowed. Subscripts and superscripts may be vertically aligned by judicious use of the "\mk" and "\rt" escape sequences. For example, "a\mk\sbi\eb\rt\sp2\ep" produces "a sub i squared", with the exponent `2' placed immediately above the subscript.
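In C source code, every backslash in a label must be doubled. For example, "x\sp2\ep" is written "x\\sp2\\ep". The sketch below uses such literals. It is a hypothetical lint function, not something libplot provides, and it checks that superscript and subscript escapes nest properly. It treats each start/end pair like a pair of brackets. That is an assumption, and libplot itself may tolerate a mode left open at the end of a string. Treat a false result as a warning rather than an error.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lint, not libplot's parser: returns true if every "\sp" is
   closed by "\ep" and every "\sb" by "\eb", in properly nested order. */
static bool scripts_balanced(const char *s)
{
    char pending[64];      /* 'p' = open superscript, 'b' = open subscript */
    size_t depth = 0;

    while (*s != '\0') {
        if (s[0] == '\\' && s[1] != '\0' && s[2] != '\0') {
            if (strncmp(s, "\\sp", 3) == 0 || strncmp(s, "\\sb", 3) == 0) {
                if (depth == sizeof pending)
                    return false;          /* deeper than this sketch allows */
                pending[depth++] = s[2];
            } else if (strncmp(s, "\\ep", 3) == 0 || strncmp(s, "\\eb", 3) == 0) {
                if (depth == 0 || pending[depth - 1] != s[2])
                    return false;          /* end with no matching start */
                depth--;
            }
            s += 3;                        /* skip the whole escape */
        } else {
            s++;
        }
    }
    return depth == 0;
}
```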

There are also escape sequences that switch from font to font within a typeface. For an enumeration of the fonts within each typeface, see Text Fonts. Suppose, for example, that the current font is Times-Roman, which is font #1 in the `Times' typeface. In the string "A \f2very\f1 well labeled axis", the word `very' appears in Times-Italic rather than Times-Roman, because Times-Italic is font #2 in the typeface. Font-switching escape sequences are of the form "\fn", where n is the number of the font to be switched to. For compatibility with troff and groff, "\fR", "\fI", and "\fB" are equivalent to "\f1", "\f2", and "\f3", respectively. "\fP" switches back to the previously used font (only one font is remembered). There is currently no support for switching between fonts in different typefaces.
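The following hypothetical C function illustrates the font-switching rules. It is not libplot code. It reports which font number is in effect at the end of a label. It assumes single-digit font numbers. It also assumes that "\fP" swaps the current and previous fonts. The text above says only that one font is remembered, so libplot's exact behavior after repeated "\fP" escapes may differ.

```c
/* Hypothetical tracker, not libplot code: returns the font number in
   effect at the end of s, given the starting font number. Handles
   "\f1".."\f9", "\fR", "\fI", "\fB" and "\fP"; other escapes that happen
   to start with f (such as "\fa" or "\f/") are skipped. Assumes "\fP"
   swaps the current and previous fonts. */
static int final_font(const char *s, int font)
{
    int previous = font;

    while (*s != '\0') {
        if (s[0] == '\\' && s[1] == 'f' && s[2] != '\0') {
            int next = -1;
            char c = s[2];
            if (c >= '1' && c <= '9') next = c - '0';
            else if (c == 'R')        next = 1;
            else if (c == 'I')        next = 2;
            else if (c == 'B')        next = 3;
            else if (c == 'P')        next = previous;
            if (next >= 0) {          /* a real font switch */
                previous = font;
                font = next;
            }
            s += 3;
        } else if (s[0] == '\\' && s[1] != '\0' && s[2] != '\0') {
            s += 3;                   /* some other escape: skip it */
        } else {
            s++;
        }
    }
    return font;
}
```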

There are also a few escape sequences for horizontal shifts, which are useful for improving horizontal alignment, such as when shifting between italic and non-italic fonts. "\r1", "\r2", "\r4", "\r6", "\r8", and "\r^" are escape sequences that shift right by 1 em, 1/2 em, 1/4 em, 1/6 em, 1/8 em, and 1/12 em, respectively. "\l1", "\l2", "\l4", "\l6", "\l8", and "\l^" are similar, but shift left instead of right. "A \fIvery\r^\fP well labeled axis" would look slightly better than "A \fIvery\fP well labeled axis".
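The shift amounts listed above can be tabulated. shift_in_ems is a hypothetical helper that returns the signed shift in ems. The result is positive for the "\r" escapes and negative for the "\l" escapes. Some escapes begin with r or l but are not shifts, such as "\rt", "\r!" or "\li". For these the helper returns 0.0.

```c
#include <string.h>

/* Hypothetical helper: signed horizontal shift, in ems, of a shift escape
   (positive = right, negative = left); 0.0 for any other string. */
static double shift_in_ems(const char *esc)
{
    double amount;

    if (strlen(esc) != 3 || esc[0] != '\\' || (esc[1] != 'r' && esc[1] != 'l'))
        return 0.0;
    switch (esc[2]) {
    case '1': amount = 1.0;        break;
    case '2': amount = 1.0 / 2.0;  break;
    case '4': amount = 1.0 / 4.0;  break;
    case '6': amount = 1.0 / 6.0;  break;
    case '8': amount = 1.0 / 8.0;  break;
    case '^': amount = 1.0 / 12.0; break;
    default:  return 0.0;          /* e.g. "\rt", "\rg", "\r!" are not shifts */
    }
    return esc[1] == 'r' ? amount : -amount;
}
```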

Square roots are handled with the aid of a special pair of escape sequences, together with the "\mk" and "\rt" sequences discussed above. A square root symbol is begun with "\sr", and continued arbitrarily far to the right with the overbar (`run') escape sequence, "\rn". For example, the string "\sr\mk\rn\rn\rtab" would be plotted as `the square root of ab'. To adjust the length of the overbar, you may need to experiment with the number of times "\rn" appears.
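The overbar length has to be found by experiment, so it can help to generate the string in code. make_sqrt_label is a hypothetical C helper. It emits "\sr\mk", then a chosen number of "\rn" escapes, then "\rt" and the radicand.

```c
#include <string.h>

/* Hypothetical helper: writes "\sr\mk", `runs` copies of "\rn", "\rt" and
   the radicand into buf. Returns 0, or -1 if runs is negative or buf is
   too small. */
static int make_sqrt_label(char *buf, size_t size, const char *radicand, int runs)
{
    size_t need;

    if (runs < 0)
        return -1;
    /* "\sr" + "\mk" + runs * "\rn" + "\rt" + radicand + NUL */
    need = 3 + 3 + 3 * (size_t)runs + 3 + strlen(radicand) + 1;
    if (need > size)
        return -1;
    strcpy(buf, "\\sr\\mk");
    for (int i = 0; i < runs; i++)
        strcat(buf, "\\rn");
    strcat(buf, "\\rt");
    strcat(buf, radicand);
    return 0;
}
```

With radicand "ab" and two runs, this produces the string "\sr\mk\rn\rn\rtab" from the example above.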

To underline a string, use the underline escape sequence "\ul" one or more times, together with the "\mk"..."\rt" pair, just as for square roots. For example, "\mk\ul\ul\ul\rtabc" yields an underlined "abc". To adjust the length of the underline, you may need to experiment with the number of times "\ul" appears. You may also need one or more of the horizontal shifts described above. For example, if the "HersheySerif" font is used, "\mk\ul\ul\l8\ul\rtabc" yields a better underline than "\mk\ul\ul\ul\rtabc".
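The underline recipes above follow a general pattern. You mark the position, draw one or more overlay escapes, return to the mark, and then draw the text. overlay_label is a hypothetical C helper for that pattern.

```c
#include <string.h>

/* Hypothetical helper: writes "\mk", the n overlay escapes, "\rt" and then
   text into buf, so the overlays are drawn over the text.
   Returns 0, or -1 if buf is too small. */
static int overlay_label(char *buf, size_t size,
                         const char *const overlays[], size_t n,
                         const char *text)
{
    size_t need = 3 + 3 + strlen(text) + 1;   /* "\mk", "\rt", text, NUL */

    for (size_t i = 0; i < n; i++)
        need += strlen(overlays[i]);
    if (need > size)
        return -1;
    strcpy(buf, "\\mk");
    for (size_t i = 0; i < n; i++)
        strcat(buf, overlays[i]);
    strcat(buf, "\\rt");
    strcat(buf, text);
    return 0;
}
```

For instance, the four overlays "\ul", "\ul", "\l8", "\ul" with the text "abc" produce "\mk\ul\ul\l8\ul\rtabc", the HersheySerif variant above.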

Besides the preceding escape sequences, there are escape sequences for the printable non-ASCII characters in each of the built-in ISO-Latin-1 fonts. These are all the built-in fonts except the symbol fonts, the HersheyCyrillic fonts, HersheyEUC, and ZapfDingbats. The non-ASCII characters include accented letters, among others. Such `8-bit' characters, in the 0xa0...0xff byte range, may be included directly in a text string. If your terminal does not permit this, you may use the corresponding escape sequences instead.

There are escape sequences for the mathematical symbols and Greek characters in the symbol fonts, as well. This is how the symbol fonts are usually accessed. Which symbol font the mathematical symbols and Greek characters are taken from depends on whether your current font is a Hershey font or a non-Hershey font. They are taken from the HersheySerifSymbol font or the HersheySansSymbol font in the former case, and from the Symbol font in the latter.

The following are the escape sequences that provide access to the non-ASCII characters of the current font, provided that it is an ISO-Latin-1 font. Each escape sequence is followed by the position of the corresponding character in the ISO-Latin-1 encoding (in decimal), and the official Postscript name of the character. Most names should be self-explanatory. For example, `eacute' is a lower-case `e', equipped with an acute accent.

"\r!": [161] exclamdown
"\ct": [162] cent
"\Po": [163] sterling
"\Cs": [164] currency
"\Ye": [165] yen
"\bb": [166] brokenbar
"\sc": [167] section
"\ad": [168] dieresis
"\co": [169] copyright
"\Of": [170] ordfeminine
"\Fo": [171] guillemotleft
"\no": [172] logicalnot
"\hy": [173] hyphen
"\rg": [174] registered
"\a-": [175] macron
"\de": [176] degree
"\+-": [177] plusminus
"\S2": [178] twosuperior
"\S3": [179] threesuperior
"\aa": [180] acute
"\*m": [181] mu
"\ps": [182] paragraph
"\md": [183] periodcentered
"\ac": [184] cedilla
"\S1": [185] onesuperior
"\Om": [186] ordmasculine
"\Fc": [187] guillemotright
"\14": [188] onequarter
"\12": [189] onehalf
"\34": [190] threequarters
"\r?": [191] questiondown
"\`A": [192] Agrave
"\'A": [193] Aacute
"\^A": [194] Acircumflex
"\~A": [195] Atilde
"\:A": [196] Adieresis
"\oA": [197] Aring
"\AE": [198] AE
"\,C": [199] Ccedilla
"\`E": [200] Egrave
"\'E": [201] Eacute
"\^E": [202] Ecircumflex
"\:E": [203] Edieresis
"\`I": [204] Igrave
"\'I": [205] Iacute
"\^I": [206] Icircumflex
"\:I": [207] Idieresis
"\-D": [208] Eth
"\~N": [209] Ntilde
"\`O": [210] Ograve
"\'O": [211] Oacute
"\^O": [212] Ocircumflex
"\~O": [213] Otilde
"\:O": [214] Odieresis
"\mu": [215] multiply
"\/O": [216] Oslash
"\`U": [217] Ugrave
"\'U": [218] Uacute
"\^U": [219] Ucircumflex
"\:U": [220] Udieresis
"\'Y": [221] Yacute
"\TP": [222] Thorn
"\ss": [223] germandbls
"\`a": [224] agrave
"\'a": [225] aacute
"\^a": [226] acircumflex
"\~a": [227] atilde
"\:a": [228] adieresis
"\oa": [229] aring
"\ae": [230] ae
"\,c": [231] ccedilla
"\`e": [232] egrave
"\'e": [233] eacute
"\^e": [234] ecircumflex
"\:e": [235] edieresis
"\`i": [236] igrave
"\'i": [237] iacute
"\^i": [238] icircumflex
"\:i": [239] idieresis
"\Sd": [240] eth
"\~n": [241] ntilde
"\`o": [242] ograve
"\'o": [243] oacute
"\^o": [244] ocircumflex
"\~o": [245] otilde
"\:o": [246] odieresis
"\di": [247] divide
"\/o": [248] oslash
"\`u": [249] ugrave
"\'u": [250] uacute
"\^u": [251] ucircumflex
"\:u": [252] udieresis
"\'y": [253] yacute
"\Tp": [254] thorn
"\:y": [255] ydieresis
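Suppose a program receives ISO-Latin-1 text but should emit only 7-bit ASCII labels. It can rewrite the 8-bit bytes as the escape sequences in the table above. latin1_to_escapes is a hypothetical sketch with only five table entries. A complete version would list every character from 161 to 255.

```c
#include <string.h>

/* Hypothetical converter: copies src to dst, replacing selected 8-bit
   ISO-Latin-1 bytes with escape sequences from the table above and
   copying every other byte unchanged. Returns 0, or -1 if dst is too small. */
static int latin1_to_escapes(char *dst, size_t size, const char *src)
{
    static const struct {
        unsigned char byte;
        const char *esc;
    } table[] = {
        { 0xb0, "\\de" },   /* [176] degree */
        { 0xb1, "\\+-" },   /* [177] plusminus */
        { 0xdf, "\\ss" },   /* [223] germandbls */
        { 0xe9, "\\'e" },   /* [233] eacute */
        { 0xfc, "\\:u" },   /* [252] udieresis */
    };
    size_t used = 0;

    if (size == 0)
        return -1;
    for (const unsigned char *p = (const unsigned char *)src; *p != '\0'; p++) {
        char single[2] = { (char)*p, '\0' };
        const char *piece = single;          /* default: copy the byte as is */
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
            if (table[i].byte == *p) {
                piece = table[i].esc;
                break;
            }
        }
        size_t len = strlen(piece);
        if (used + len + 1 > size)
            return -1;                       /* leave room for the NUL */
        memcpy(dst + used, piece, len);
        used += len;
    }
    dst[used] = '\0';
    return 0;
}
```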

The following are the escape sequences that provide access to mathematical symbols and Greek characters in the current symbol font, whether HersheySerifSymbol or HersheySansSymbol (for Hershey fonts) or Symbol (for Postscript fonts). Each escape sequence is followed by the position (in octal) of the corresponding character in the symbol encoding, and the official Postscript name of the character. Characters listed as NONE have no escape sequence. Many escape sequences and names should be self-explanatory. "\*a" represents a lower-case Greek alpha, for example. For a table displaying each of the characters below, see the Postscript Language Reference Manual.

"\fa": [0042] universal
"\te": [0044] existential
"\st": [0047] suchthat
"\**": [0052] asteriskmath
"\=~": [0100] congruent
"\*A": [0101] Alpha
"\*B": [0102] Beta
"\*X": [0103] Chi
"\*D": [0104] Delta
"\*E": [0105] Epsilon
"\*F": [0106] Phi
"\*G": [0107] Gamma
"\*Y": [0110] Eta
"\*I": [0111] Iota
"\+h": [0112] theta1
"\*K": [0113] Kappa
"\*L": [0114] Lambda
"\*M": [0115] Mu
"\*N": [0116] Nu
"\*O": [0117] Omicron
"\*P": [0120] Pi
"\*H": [0121] Theta
"\*R": [0122] Rho
"\*S": [0123] Sigma
"\*T": [0124] Tau
"\*U": [0125] Upsilon
"\ts": [0126] sigma1
"\*W": [0127] Omega
"\*C": [0130] Xi
"\*Q": [0131] Psi
"\*Z": [0132] Zeta
"\tf": [0134] therefore
"\pp": [0136] perpendicular
"\ul": [0137] underline
"\rx": [0140] radicalex
"\*a": [0141] alpha
"\*b": [0142] beta
"\*x": [0143] chi
"\*d": [0144] delta
"\*e": [0145] epsilon
"\*f": [0146] phi
"\*g": [0147] gamma
"\*y": [0150] eta
"\*i": [0151] iota
"\+f": [0152] phi1
"\*k": [0153] kappa
"\*l": [0154] lambda
"\*m": [0155] mu
"\*n": [0156] nu
"\*o": [0157] omicron
"\*p": [0160] pi
"\*h": [0161] theta
"\*r": [0162] rho
"\*s": [0163] sigma
"\*t": [0164] tau
"\*u": [0165] upsilon
"\+p": [0166] omega1
"\*w": [0167] omega
"\*c": [0170] xi
"\*q": [0171] psi
"\*z": [0172] zeta
"\ap": [0176] similar
"\+U": [0241] Upsilon1
"\fm": [0242] minute
"\<=": [0243] lessequal
"\f/": [0244] fraction
"\if": [0245] infinity
"\Fn": [0246] florin
"\CL": [0247] club
"\DI": [0250] diamond
"\HE": [0251] heart
"\SP": [0252] spade
"\<>": [0253] arrowboth
"\<-": [0254] arrowleft
"\ua": [0255] arrowup
"\->": [0256] arrowright
"\da": [0257] arrowdown
"\de": [0260] degree
"\+-": [0261] plusminus
"\sd": [0262] second
"\>=": [0263] greaterequal
"\mu": [0264] multiply
"\pt": [0265] proportional
"\pd": [0266] partialdiff
"\bu": [0267] bullet
"\di": [0270] divide
"\!=": [0271] notequal
"\==": [0272] equivalence
"\~~": [0273] approxequal
"\..": [0274] ellipsis
NONE: [0275] arrowvertex
"\an": [0276] arrowhorizex
"\CR": [0277] carriagereturn
"\Ah": [0300] aleph
"\Im": [0301] Ifraktur
"\Re": [0302] Rfraktur
"\wp": [0303] weierstrass
"\c*": [0304] circlemultiply
"\c+": [0305] circleplus
"\es": [0306] emptyset
"\ca": [0307] cap
"\cu": [0310] cup
"\SS": [0311] superset
"\ip": [0312] reflexsuperset
"\n<": [0313] notsubset
"\SB": [0314] subset
"\ib": [0315] reflexsubset
"\mo": [0316] element
"\nm": [0317] notelement
"\/_": [0320] angle
"\gr": [0321] nabla
"\rg": [0322] registerserif
"\co": [0323] copyrightserif
"\tm": [0324] trademarkserif
"\PR": [0325] product
"\sr": [0326] radical
"\md": [0327] dotmath
"\no": [0330] logicalnot
"\AN": [0331] logicaland
"\OR": [0332] logicalor
"\hA": [0333] arrowdblboth
"\lA": [0334] arrowdblleft
"\uA": [0335] arrowdblup
"\rA": [0336] arrowdblright
"\dA": [0337] arrowdbldown
"\lz": [0340] lozenge
"\la": [0341] angleleft
"\RG": [0342] registersans
"\CO": [0343] copyrightsans
"\TM": [0344] trademarksans
"\SU": [0345] summation
NONE: [0346] parenlefttp
NONE: [0347] parenleftex
NONE: [0350] parenleftbt
"\lc": [0351] bracketlefttp
NONE: [0352] bracketleftex
"\lf": [0353] bracketleftbt
"\lt": [0354] bracelefttp
"\lk": [0355] braceleftmid
"\lb": [0356] braceleftbt
"\bv": [0357] braceex
"\eu": [0360] euro
"\ra": [0361] angleright
"\is": [0362] integral
NONE: [0363] integraltp
NONE: [0364] integralex
NONE: [0365] integralbt
NONE: [0366] parenrighttp
NONE: [0367] parenrightex
NONE: [0370] parenrightbt
"\rc": [0371] bracketrighttp
NONE: [0372] bracketrightex
"\rf": [0373] bracketrightbt
"\RT": [0374] bracerighttp
"\rk": [0375] bracerightmid
"\rb": [0376] bracerightbt
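Some Greek escapes do not use the obvious letters: chi is "\*x" and xi is "\*c". A lookup by name can therefore prevent mistakes. The hypothetical greek_escape function below holds a few rows of the table above, and the other rows can be added in the same way.

```c
#include <string.h>

/* Hypothetical lookup: maps a Greek letter name (capitalized for upper
   case, as in the Postscript names) to its escape sequence.
   Returns NULL for names not in this partial table. */
static const char *greek_escape(const char *name)
{
    static const struct { const char *name; const char *esc; } table[] = {
        { "alpha", "\\*a" }, { "beta",  "\\*b" }, { "gamma",  "\\*g" },
        { "Delta", "\\*D" }, { "theta", "\\*h" }, { "lambda", "\\*l" },
        { "mu",    "\\*m" }, { "pi",    "\\*p" }, { "sigma",  "\\*s" },
        { "chi",   "\\*x" }, { "xi",    "\\*c" }, { "Omega",  "\\*W" },
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].esc;
    return NULL;
}
```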

Finally, there are escape sequences that apply only if the current font is a Hershey font. Most of these escape sequences provide access to special symbols that belong to no font, and are accessible by no other means. These symbols are of two sorts: miscellaneous, and astronomical or zodiacal. The escape sequences for the miscellaneous symbols are as follows.

"\dd": daggerdbl
"\dg": dagger
"\hb": hbar
"\li": lineintegral
"\IB": interbang
"\Lb": lambdabar
"\~-": modifiedcongruent
"\-+": minusplus
"\||": parallel
"\s-": [variant form of s]

The final escape sequence in the table above, "\s-", yields a letter rather than a symbol. It is provided because in some Hershey fonts, the shape of the lower-case letter `s' differs if it is the last letter in a word. This is the case for HersheyGothicGerman. The German word "besonders", for example, should be written as "besonder\s-" if it is to be rendered correctly in this font. The same is true for the two Hershey symbol fonts, with their Greek alphabets (in Greek text, lower-case final `s' is different from lower-case non-final `s'). In Hershey fonts where there is no distinction between final and non-final `s', "s" and "\s-" are equivalent.
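For HersheyGothicGerman, a program can mark final s's automatically. The hypothetical mark_final_s function below treats an `s' as word-final when the next byte is not an ASCII letter, as judged by isalpha in the default C locale. It copies existing three-character escape sequences, such as "\ss", unchanged. This is a heuristic sketch, not a libplot feature.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: copies src to dst, replacing each word-final
   lower-case 's' with "\s-" and leaving escape sequences intact.
   Returns 0, or -1 if dst is too small. */
static int mark_final_s(char *dst, size_t size, const char *src)
{
    size_t used = 0;
    const char *p = src;

    if (size == 0)
        return -1;
    while (*p != '\0') {
        const char *piece;   /* bytes to emit */
        size_t len;          /* how many bytes to emit */
        size_t advance;      /* how many bytes of src to consume */

        if (p[0] == '\\' && p[1] != '\0' && p[2] != '\0') {
            piece = p;                        /* existing escape: copy as is */
            len = advance = 3;
        } else if (p[0] == 's' && !isalpha((unsigned char)p[1])) {
            piece = "\\s-";                   /* word-final s */
            len = 3;
            advance = 1;
        } else {
            piece = p;
            len = advance = 1;
        }
        if (used + len + 1 > size)
            return -1;
        memcpy(dst + used, piece, len);
        used += len;
        p += advance;
    }
    dst[used] = '\0';
    return 0;
}
```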

The escape sequences for the astronomical symbols, including the signs for the twelve constellations of the zodiac, are listed in the following table. We stress that, like the preceding miscellaneous escape sequences, they apply only if the current font is a Hershey font.

"\SO": sun
"\ME": mercury
"\VE": venus
"\EA": earth
"\MA": mars
"\JU": jupiter
"\SA": saturn
"\UR": uranus
"\NE": neptune
"\PL": pluto
"\LU": moon
"\CT": comet
"\ST": star
"\AS": ascendingnode
"\DE": descendingnode
"\AR": aries
"\TA": taurus
"\GE": gemini
"\CA": cancer
"\LE": leo
"\VI": virgo
"\LI": libra
"\SC": scorpio
"\SG": sagittarius
"\CP": capricornus
"\AQ": aquarius
"\PI": pisces

The preceding miscellaneous and astronomical symbols are not the only special non-font symbols that can be used if the current font is a Hershey font. The entire library of glyphs digitized by Allen Hershey is built into GNU libplot. So text strings may include any Hershey glyph. Each of the available Hershey glyphs is identified by a four-digit number. Standard Hershey glyph #1 would be specified as "\#H0001". The standard Hershey glyphs range from "\#H0001" to "\#H3999", with a number of gaps. Some additional glyphs designed by others appear in the "\#H4000"..."\#H4194" range. Syllabic Japanese characters (Kana) are located in the "\#H4195"..."\#H4399" range.
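Hershey glyph escapes always use four zero-padded digits, so snprintf can generate them easily. The hypothetical hershey_escape helper below rejects numbers outside 1...4399, the span covered by the ranges above. A number inside that span may still fall in one of the gaps.

```c
#include <stdio.h>

/* Hypothetical formatter: writes "\#Hnnnn" for Hershey glyph n into buf,
   which needs at least 8 bytes (7 characters plus NUL).
   Returns 0, or -1 if n is out of range or buf is too small. */
static int hershey_escape(char *buf, size_t size, int n)
{
    if (n < 1 || n > 4399 || size < 8)
        return -1;
    snprintf(buf, size, "\\#H%04d", n);
    return 0;
}
```

Concatenating the results for 744 and 745 gives the shamrock and fleur-de-lys from the example below.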

You may order a table of nearly all the Hershey glyphs in the "\#H0001"..."\#H3999" range from the U.S. National Technical Information Service, at +1 703 487 4650. Ask for item number PB251845; the current price is about US$40. By way of example, the string

     "\#H0744\#H0745\#H0001\#H0002\#H0003\#H0869\#H0907\#H2330\#H2331"

when drawn will display a shamrock, a fleur-de-lys, cartographic (small) letters A, B, C, a bell, a large circle, a treble clef, and a bass clef. Again, this assumes that the current font is a Hershey font.

You may also use Japanese syllabic characters (Hiragana and Katakana) and ideographic characters (Kanji) when drawing strings in any Hershey font. In all, 603 Kanji are available; these are the same Kanji that are available in the HersheyEUC font. The Japanese characters are indexed according to the JIS X0208 standard for Japanese typography, which represents each character by a two-byte sequence. The file kanji.doc, which is distributed along with the GNU plotting utilities, lists the available Kanji. On most systems it is installed in /usr/share/libplot or /usr/local/share/libplot.

Each JIS X0208 character is specified by an escape sequence that expresses its two-byte code as four hexadecimal digits, such as "\#J357e". Both bytes must be in the 0x21...0x7e range in order to define a JIS X0208 character. Kanji are located at "\#J3021" and above. Characters appearing elsewhere in the JIS X0208 encoding may be accessed similarly. For example, Hiragana and Katakana are located in the "\#J2421"..."\#J257e" range, and Roman characters in the "\#J2321"..."\#J237e" range. The file kana.doc, which is installed in the same directory as kanji.doc, lists the encodings of the Hiragana and Katakana. For more on the JIS X0208 standard, see Ken Lunde's Understanding Japanese Information Processing (O'Reilly, 1993), and his on-line supplement.
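A JIS X0208 escape can also be built from the character's two bytes. jis_escape is a hypothetical helper that enforces the 0x21...0x7e range stated above. It writes lower-case hexadecimal digits to match the examples in this section. This section does not say whether libplot also accepts upper-case digits.

```c
#include <stdio.h>

/* Hypothetical formatter: writes "\#Jxxyy" for the JIS X0208 character
   with bytes hi and lo into buf, which needs at least 8 bytes.
   Returns 0, or -1 if either byte is outside 0x21..0x7e or buf is too small. */
static int jis_escape(char *buf, size_t size, unsigned hi, unsigned lo)
{
    if (hi < 0x21 || hi > 0x7e || lo < 0x21 || lo > 0x7e || size < 8)
        return -1;
    snprintf(buf, size, "\\#J%02x%02x", hi, lo);
    return 0;
}
```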

The Kanji numbering used in A. N. Nelson's Modern Reader's Japanese-English Character Dictionary, a longtime standard, is also supported. (This dictionary is published by C. E. Tuttle and Co., with ISBN 0-8048-0408-7. A revised edition [ISBN 0-8048-2036-8] appeared in 1997, but uses a different numbering.) `Nelson' escape sequences for Kanji are similar to JIS X0208 escape sequences, but use four decimal instead of four hexadecimal digits. The file kanji.doc gives the correspondence between the JIS numbering scheme and the Nelson numbering scheme. For example, "\#N0001" is equivalent to "\#J306c". The file kanji.doc also gives the positions of the available Kanji in the Unicode encoding.

All available Kanji have the same width, which is the same as that of the syllabic Japanese characters (Hiragana and Katakana). Each Kanji that is not available will print as an `undefined character' glyph (a bundle of horizontal lines). The same is true for non-Kanji JIS X0208 characters that are not available.