

Appendix B On Label Separators

Some strings contain characters whose NFKC-normalized form contains the ASCII dot (0x2E, “.”). Examples of such characters are U+2024 (ONE DOT LEADER) and U+248C (DIGIT FIVE FULL STOP). Such strings have the interesting property that their IDNA ToASCII output contains embedded dots. For example:

ToASCII (hi U+248C com) = hi5.com
ToASCII (räksmörgås U+2024 com) = xn--rksmrgs.com-l8as9u
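The two results above can be reproduced with a short Python sketch. It is not Libidn's implementation: the function name to_ascii_single_label is hypothetical, and it assumes that plain NFKC normalization is an adequate stand-in for the Nameprep step of ToASCII (real Nameprep also performs case mapping and prohibited-character checks). The whole input is deliberately treated as one label.

```python
import unicodedata


def to_ascii_single_label(label: str) -> str:
    """Simplified ToASCII for ONE label (hypothetical helper).

    Assumption: NFKC alone approximates Nameprep for these inputs.
    """
    # NFKC maps U+248C to "5." and U+2024 to ".", introducing an ASCII dot.
    normalized = unicodedata.normalize("NFKC", label)
    if normalized.isascii():
        # Nothing left to encode, so no "xn--" prefix is added.
        return normalized
    # Python's built-in "punycode" codec implements RFC 3492.
    return "xn--" + normalized.encode("punycode").decode("ascii")


for text in ("hi\u248ccom", "r\u00e4ksm\u00f6rg\u00e5s\u2024com"):
    print(to_ascii_single_label(text))
```

Because the dot only appears after normalization, nothing in this single-label path ever splits on it, which is why it ends up embedded in the output.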

These examples demonstrate the two general cases. In the first, the ASCII dot is part of an output that does not begin with the IDN prefix xn--. In the second, the dot is part of an output that does begin with xn--.

From the DNS point of view, each input string is a single label. The IDNA algorithm translates one label at a time, so the output is expected to be a single label as well. What matters here is that the DNS resolver receives the correct query. The DNS protocol does not use the dot to delimit labels on the wire; it uses length-value pairs instead. The correct queries are therefore for {7}hi5.com and {22}xn--rksmrgs.com-l8as9u, respectively.
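To make the length-value notation concrete, the following Python sketch builds the wire form of a name from a list of labels. The helper name encode_qname is hypothetical. It follows the usual DNS layout of one length octet per label, followed by the label bytes, with a zero-length root label terminating the name; the {n} notation used in this appendix omits that terminator.

```python
def encode_qname(labels):
    """Encode ASCII labels as DNS length-prefixed labels (hypothetical helper)."""
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")
        # A DNS label is limited to 63 octets and must not be empty.
        if not 1 <= len(raw) <= 63:
            raise ValueError("label length out of range: %r" % label)
        out.append(len(raw))  # length octet, e.g. 7 for "hi5.com"
        out += raw            # label bytes; a dot here is ordinary data
    out.append(0)             # root label terminates the name
    return bytes(out)


# One label that happens to contain a dot, versus two separate labels.
print(encode_qname(["hi5.com"]))
print(encode_qname(["hi5", "com"]))
```

The two calls produce different byte strings even though both names print as hi5.com, which is the distinction the paragraph above relies on.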

Some implementations (1) have decided that these input strings are potentially confusing for the user: the string hi U+248C com looks like hi5.com on systems that support Unicode properly. These implementations do not follow RFC 3490. They yield:

ToASCII (hi U+248C com) = hi5.com
ToASCII (räksmörgås U+2024 com) = xn--rksmrgs-5wao1o.com

The DNS queries they perform are for {3}hi5{3}com and {18}xn--rksmrgs-5wao1o{3}com, respectively. Arguably, this leads to a better user experience and suggests that the IDNA specification is sub-optimal in this area.
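One way to arrive at the non-conforming results is to normalize the whole string first and only then split it into labels. The Python sketch below illustrates that ordering. It is an assumption about how such output could be produced, not the actual code of any browser; the function name to_ascii_split_after_nfkc is hypothetical, and NFKC again stands in for full Nameprep.

```python
import unicodedata


def to_ascii_split_after_nfkc(name: str) -> list:
    """Normalize first, then split on the ASCII dot (not RFC 3490 behavior)."""
    normalized = unicodedata.normalize("NFKC", name)
    labels = []
    # Dots introduced by normalization now act as label separators.
    for label in normalized.split("."):
        if label.isascii():
            labels.append(label)
        else:
            # Python's built-in "punycode" codec implements RFC 3492.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return labels


print(to_ascii_split_after_nfkc("hi\u248ccom"))
print(to_ascii_split_after_nfkc("r\u00e4ksm\u00f6rg\u00e5s\u2024com"))
```

Each returned list element becomes its own length-prefixed label in the DNS query, so the lookup goes to a name under com rather than to a single label containing a dot.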


Footnotes

(1)

Notably Microsoft’s Internet Explorer and Mozilla’s Firefox, but not Apple’s Safari.

