
12.10 Inserting Unicode: @U

The command @U{hex} inserts a representation of the Unicode character U+hex. For example, @U{0132} inserts the Dutch ‘IJ’ ligature (poorly shown here as simply the two letters ‘I’ and ‘J’).

The hex value should be at least four hex digits; leading zeros are not added. In general, hex must specify a valid, ordinary Unicode character; e.g., U+10FFFF (the very last code point) is a noncharacter by definition, and thus cannot be inserted this way.
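The constraints on hex can be illustrated with a small Python sketch. It is not how texinfo.tex or texi2any checks the argument. The function name check_u_argument and the exact set of checks (surrogates and Unicode noncharacters) are assumptions made for illustration only.

```python
import re


def check_u_argument(arg):
    """Return a list of problems with a hypothetical @U{...} argument.

    Illustrative only: the checks Texinfo really performs may differ.
    """
    if not re.fullmatch(r"[0-9A-Fa-f]+", arg):
        return ["not a hexadecimal number"]
    problems = []
    if len(arg) < 4:
        # Leading zeros are not added for you, so write 0132, not 132.
        problems.append("fewer than four hex digits")
    code_point = int(arg, 16)
    if code_point > 0x10FFFF:
        problems.append("beyond the last code point U+10FFFF")
    elif 0xD800 <= code_point <= 0xDFFF:
        problems.append("surrogate code point")
    elif 0xFDD0 <= code_point <= 0xFDEF or (code_point & 0xFFFE) == 0xFFFE:
        # U+FDD0..U+FDEF and every code point ending in FFFE or FFFF
        # are noncharacters; this includes U+10FFFF.
        problems.append("noncharacter")
    return problems


for candidate in ("0132", "132", "10FFFF"):
    print(candidate, check_u_argument(candidate))
```

Under these assumptions, ‘0132’ passes, ‘132’ is flagged as too short, and ‘10FFFF’ is flagged as a noncharacter.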

@U is useful for inserting occasional glyphs for which Texinfo has no dedicated command, while allowing the Texinfo source to remain purely 7-bit ASCII for maximum portability.

This command has many limitations—the same limitations as inserting Unicode characters in UTF-8 or another binary form. First and most importantly, TeX knows nothing about most of Unicode. Supporting specific additional glyphs upon request is possible, but it’s not viable for texinfo.tex to support whole additional scripts (Japanese, Urdu, …). The @U command does nothing to change this. If the specified character is not supported in TeX, an error is given. (See @documentencoding.)

In HTML, XML, and DocBook, the output from @U is always a numeric character reference of the form ‘&#xhex;’, as in ‘&#x0132;’ for the example above. This should work even when an HTML document uses some other encoding (say, Latin 1) and the given character is not supported in that encoding.
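As a rough illustration of why the ‘&#xhex;’ form survives any document encoding, the following Python sketch builds the reference and encodes it as Latin 1. The helper name u_to_html is hypothetical and is not part of Texinfo. The sketch only mirrors the output form described above.

```python
import html


def u_to_html(hex_digits):
    """Build a reference of the form &#xhex;, keeping the digits as written."""
    return f"&#x{hex_digits};"


reference = u_to_html("0132")
print(reference)  # &#x0132;

# The reference consists of ASCII characters only, so it can be stored in
# a Latin 1 document even though U+0132 itself is not in Latin 1.
print(reference.encode("latin-1"))  # b'&#x0132;'

# A browser-like decoder turns the reference back into the one character.
decoded = html.unescape(reference)
print(len(decoded), hex(ord(decoded)))  # 1 0x132
```

Encoding the character itself as Latin 1 would fail, which is the case the reference form avoids.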

In Info and plain text, if the document encoding is specified explicitly to be UTF-8, the output will be the UTF-8 representation of the character U+hex (presuming it’s a valid character). In all other cases, the output is the ASCII sequence ‘U+hex’, as in the six ASCII characters ‘U+0132’ for the example above.
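The two cases for Info and plain text can be summarized in a short Python sketch. The function u_to_plaintext and its document_encoding parameter are hypothetical stand-ins: None models a document with no explicit @documentencoding, and the code is not taken from texi2any.

```python
def u_to_plaintext(hex_digits, document_encoding=None):
    """Model the Info and plain text output of @U{hex_digits}.

    document_encoding is None when the document declares no encoding.
    """
    if document_encoding is not None and document_encoding.lower() == "utf-8":
        # Explicit UTF-8: emit the character itself.
        return chr(int(hex_digits, 16))
    # Every other case: emit the ASCII sequence U+hex.
    return f"U+{hex_digits}"


print(u_to_plaintext("0132", "UTF-8").encode("utf-8"))  # b'\xc4\xb2'
print(u_to_plaintext("0132", "ISO-8859-1"))             # U+0132
print(u_to_plaintext("0132"))                           # U+0132
```

With UTF-8 declared, the example character becomes the two bytes C4 B2. Otherwise the output is the six ASCII characters ‘U+0132’.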

That’s all. No magic!
