11.10 Inserting Unicode: @U

The command @U{hex} inserts a representation of the Unicode character U+hex. For example, @U{0132} inserts the Dutch ‘IJ’ ligature (‘Ĳ’).

The hex value should be at least four hex digits; leading zeros are not added for you, so write @U{0132} rather than @U{132}. In general, hex must specify a valid normal Unicode character; e.g., U+10FFFF (the very last code point) is a noncharacter by definition, and thus cannot be inserted this way.
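The constraints above can be sketched as a small check. This is only an illustration, not the logic texi2any or texinfo.tex actually uses: the function name is hypothetical, "should be at least four hex digits" is treated as a hard rule for simplicity, and rejecting surrogates and every noncharacter is an assumption about what "valid normal Unicode character" covers.

```python
import re

def is_insertable_code_point(hex_digits: str) -> bool:
    """Rough check of the @U argument rules (hypothetical helper)."""
    # At least four hex digits; nothing pads the value with zeros.
    if not re.fullmatch(r"[0-9A-Fa-f]{4,}", hex_digits):
        return False
    cp = int(hex_digits, 16)
    # Beyond the last Unicode code point.
    if cp > 0x10FFFF:
        return False
    # Assumption: surrogate code points are not "normal" characters.
    if 0xD800 <= cp <= 0xDFFF:
        return False
    # Noncharacters: U+FDD0..U+FDEF and the last two code points of
    # every plane (U+FFFE, U+FFFF, ..., U+10FFFE, U+10FFFF).
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return False
    return True
```

Under these assumptions, "0132" passes, "132" fails for having too few digits, and "10FFFF" fails as a noncharacter.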

@U is useful for inserting occasional glyphs for which Texinfo has no dedicated command, while allowing the Texinfo source to remain purely 7-bit ASCII for maximum portability.

This command has many limitations, the same ones that apply to inserting Unicode characters in UTF-8 or another binary form. First and most importantly, TeX knows nothing about most of Unicode. Supporting specific additional glyphs upon request is possible, but it’s not viable for texinfo.tex to support whole additional scripts (Japanese, Urdu, …). The @U command does nothing to change this. If the specified character is not supported in TeX, an error is given. LaTeX output has more possibilities regarding UTF-8, but could require extra code to load fonts and declare how UTF-8 characters are output. (See @documentencoding enc: Set Input Encoding.)

In HTML and DocBook, the output from @U is always an entity reference of the form ‘&#xhex;’, as in ‘&#x0132;’ for the example above. This should work even when an HTML document uses some other encoding (say, Latin 1) and the given character is not supported in that encoding.
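To see why the reference form is safe in a Latin 1 document, the following sketch builds the reference for the example above and compares it with the raw character. The helper name html_reference is hypothetical and stands in for whatever the converter does internally; html.unescape is used only to show what an HTML parser would resolve the reference to.

```python
import html

def html_reference(hex_digits: str) -> str:
    """Build the '&#xhex;' form; the digits are passed through unchanged."""
    return f"&#x{hex_digits};"

ref = html_reference("0132")       # reference for the example above
ref.encode("latin-1")              # pure ASCII, so it fits a Latin 1 page
decoded = html.unescape(ref)       # the character the reference stands for

try:
    decoded.encode("latin-1")      # the raw character itself does not fit
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False
```

The reference consists only of ASCII characters, so it survives any ASCII-compatible page encoding, while the character it names cannot be written directly in Latin 1.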

In Info and plain text, if the output encoding is not UTF-8, the output is the ASCII sequence ‘U+hex’, as in the six ASCII characters ‘U+0132’ for the example above.
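The fallback can be sketched as follows. The function name info_output is hypothetical, and the UTF-8 branch is an assumption: the text above only describes the non-UTF-8 case, so emitting the character itself under UTF-8 is a guess at the complementary behavior, not a documented rule.

```python
import codecs

def info_output(hex_digits: str, output_encoding: str) -> str:
    """Sketch of the Info/plain text result for @U{hex} (hypothetical helper)."""
    # codecs.lookup normalizes spellings such as 'UTF-8' and 'utf8'.
    if codecs.lookup(output_encoding).name == "utf-8":
        # Assumption: with UTF-8 output the character itself is written.
        return chr(int(hex_digits, 16))
    # Documented case: the ASCII sequence 'U+hex'.
    return f"U+{hex_digits}"
```

For the example above, info_output("0132", "ISO-8859-1") gives the six ASCII characters ‘U+0132’.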