21.7.2 HTML Cross-reference Node Name Expansion

As mentioned in the previous section, the key part of the HTML cross reference algorithm is the conversion of node names in the Texinfo source into strings suitable for XHTML identifiers and file names. The restrictions are similar for each: plain ASCII letters, numbers, and the ‘-’ and ‘_’ characters are all that can be used. (Although HTML anchors can contain most characters, XHTML is more restrictive.)

Cross-references in Texinfo can refer either to nodes, anchors (see @anchor: Defining Arbitrary Cross-reference Targets) or float labels (see @float [type][,label]: Floating Material). However, anchors and float labels are treated identically to nodes in this context, so we’ll continue to say “node” names for simplicity.

A special exception: the Top node (see The ‘Top’ Node and Master Menu) is always mapped to the file index.html, since that is the file web server software conventionally serves by default for a directory. However, the HTML target is still ‘Top’. Thus (in the split case):

@xref{Top,,, emacs, The GNU Emacs Manual}.
⇒ <a href="../emacs_html/index.html#Top">

The following rules convert a node name into the corresponding identifier and file name:

  1. The standard ASCII letters (a-z and A-Z) are not modified. All other characters may be changed as specified below.
  2. The standard ASCII numbers (0-9) are not modified except when a number is the first character of the node name. In that case, the ‘g_t’ prefix described in the last rule is applied.
  3. Multiple consecutive space, tab and newline characters are transformed into just one space.
  4. Leading and trailing spaces are removed.
  5. After the above has been applied, each remaining space character is converted into a ‘-’ character.
  6. Other ASCII 7-bit characters are transformed into ‘_00xx’, where xx is the ASCII character code in (lowercase) hexadecimal. This includes ‘_’, which is mapped to ‘_005f’.
  7. If the node name does not begin with a letter, the literal string ‘g_t’ is prefixed to the result. (Due to the rules above, that string can never occur otherwise; it is an arbitrary choice, standing for “GNU Texinfo”.) This is necessary because XHTML requires that identifiers begin with a letter.

For example:

@node A  node --- with _'%
⇒ A-node-_002d_002d_002d-with-_005f_0027_0025
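
The rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used by any Texinfo tool: the function name expand_node_name is hypothetical, and the sketch assumes the input is a plain 7-bit ASCII node name with no Texinfo commands in it. Characters outside that range are not covered by the rules in this section, so the sketch rejects them.

```python
import re


def expand_node_name(name: str) -> str:
    """Hypothetical helper: apply the 7-bit ASCII expansion rules to a node name."""
    # Rules 3 and 4: collapse runs of space, tab and newline into one space,
    # then drop leading and trailing spaces.
    collapsed = re.sub(r"[ \t\n]+", " ", name).strip(" ")

    pieces = []
    for ch in collapsed:
        if ch.isascii() and ch.isalnum():
            pieces.append(ch)                  # rules 1 and 2: kept unchanged
        elif ch == " ":
            pieces.append("-")                 # rule 5: space becomes '-'
        elif ord(ch) < 128:
            pieces.append("_%04x" % ord(ch))   # rule 6: '_00xx', lowercase hex
        else:
            # Assumption: non-ASCII input is outside the scope of this sketch.
            raise ValueError("character not covered by this sketch: %r" % ch)
    result = "".join(pieces)

    # Rule 7: prefix 'g_t' unless the result begins with an ASCII letter.
    first = result[:1]
    if not (first.isascii() and first.isalpha()):
        result = "g_t" + result
    return result


print(expand_node_name("A  node --- with _'%"))
```

Checking only the first character of the converted string is enough for the last rule, because letters are the only characters that the earlier rules leave as letters at the start of the name.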

Example translations of common characters:

‘_’ ⇒ ‘_005f’
‘-’ ⇒ ‘_002d’
‘A  node’ ⇒ ‘A-node’

On case-folding file systems, nodes whose names differ only by case will be mapped to the same file. In particular, as mentioned above, Top always maps to the file index.html. Thus, on a case-folding system, Top and a node named ‘Index’ will both be written to index.html. Fortunately, the targets still distinguish these cases, since HTML target names are always case-sensitive, independent of the operating system.
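
As a rough illustration of the collision just described, the following Python sketch groups node names by the file they would share on a case-folding system. The function names html_file_for and colliding_files are hypothetical, and the sketch assumes split output in which each already-expanded node name other than Top is written to a file named after it with an .html extension.

```python
def html_file_for(node: str) -> str:
    """Hypothetical: file name for an already-expanded node name (split output)."""
    # Top is the special case: it is always written to index.html.
    return "index.html" if node == "Top" else node + ".html"


def colliding_files(nodes):
    """Return {folded file name: [nodes]} for files shared by several nodes."""
    groups = {}
    for node in nodes:
        # Lowercasing stands in for what a case-folding file system does.
        groups.setdefault(html_file_for(node).lower(), []).append(node)
    return {name: members for name, members in groups.items() if len(members) > 1}


print(colliding_files(["Top", "Index", "Intro"]))
```

The targets within the shared file remain distinct (‘Top’ and ‘Index’), which is why links to either node can still be resolved.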