
Appendix B On Label Separators

Some strings contain characters whose NFKC-normalized form contains the ASCII dot (0x2E, “.”). Examples of such characters are U+2024 (ONE DOT LEADER) and U+248C (DIGIT FIVE FULL STOP). Such strings have the interesting property that their IDNA ToASCII output contains embedded dots. For example:

     ToASCII (hi U+248C com) = hi5.com
     ToASCII (räksmörgås U+2024 com) = xn--rksmrgs.com-l8as9u

This demonstrates the two general cases. In the first, the ASCII dot is part of an output that does not begin with the IDN prefix xn--. In the second, the dot is part of an output that does begin with the IDN prefix xn--.
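
The two cases can be reproduced outside Libidn. The following is a minimal sketch that uses Python's standard library instead of Libidn, assuming that its encodings.idna.ToASCII function follows RFC 3490 closely enough for these inputs. The helper name to_ascii_label is hypothetical.

```python
import unicodedata
from encodings import idna

# Both characters gain an ASCII dot under NFKC normalization.
for ch in ("\u2024", "\u248c"):
    nfkc = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {nfkc!r}")


def to_ascii_label(label):
    """Hypothetical helper: apply RFC 3490 ToASCII to ONE label."""
    return idna.ToASCII(label).decode("ascii")


# Case 1: the output is plain ASCII and carries no xn-- prefix.
plain = to_ascii_label("hi\u248ccom")

# Case 2: "r\u00e4ksm\u00f6rg\u00e5s" is räksmörgås; the output starts with xn--.
prefixed = to_ascii_label("r\u00e4ksm\u00f6rg\u00e5s\u2024com")

print(plain)
print(prefixed)
```

Each call treats its argument as a single label, so any dot in the result was introduced by normalization and not by the caller.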

From the DNS point of view, each input string is a single label. The IDNA algorithm translates one label at a time, so the output is expected to be a single label as well. What matters here is that the DNS resolver receives the correct query. The DNS protocol does not use the dot to delimit labels on the wire; it uses length-value pairs instead. Thus the correct queries would be for {7}hi5.com and {22}xn--rksmrgs.com-l8as9u respectively.
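
To make the length-value encoding concrete, here is a small sketch in Python that builds the wire form of a name from a list of labels. The function name wire_name is hypothetical, and the terminating zero octet (the root label of RFC 1035) is included even though the {n} notation used in this appendix leaves it out.

```python
def wire_name(labels):
    """Encode a sequence of ASCII labels as a DNS wire-format name."""
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError("label must be 1 to 63 octets long")
        out.append(len(raw))  # length octet
        out += raw            # label octets; a dot is just another octet
    out.append(0)             # zero-length root label ends the name
    return bytes(out)


single = wire_name(["hi5.com"])    # one 7-octet label containing a dot
split = wire_name(["hi5", "com"])  # two 3-octet labels

print(single)
print(split)
```

The two results differ on the wire, which is why a resolver handed the single-label form does not query the same name as one handed the two-label form.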

Some implementations[1] have decided that these input strings are potentially confusing for the user. The string hi U+248C com looks like hi5.com on systems that support Unicode properly. These implementations do not follow RFC 3490. They yield:

     ToASCII (hi U+248C com) = hi5.com
     ToASCII (räksmörgås U+2024 com) = xn--rksmrgs-5wao1o.com

The DNS queries they perform are {3}hi5{3}com and {18}xn--rksmrgs-5wao1o{3}com respectively. Arguably, this leads to a better user experience and suggests that the IDNA specification is sub-optimal in this area.

B.1 Recommended Workaround

It has been suggested that the entire input string be normalized using NFKC before it is passed to IDNA ToASCII. You may use stringprep_utf8_nfkc_normalize or stringprep_ucs4_nfkc_normalize for this. This appears to lead to behaviour similar to that of Internet Explorer and Firefox, which would avoid the problem, but this needs to be confirmed. Feel free to discuss the issue with us.
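
The idea can be sketched as follows. This is an illustration in Python's standard library, not Libidn code: unicodedata.normalize stands in for stringprep_utf8_nfkc_normalize, encodings.idna.ToASCII stands in for the IDNA ToASCII operation, and the helper name to_ascii_nfkc_first is hypothetical. Whether this matches the browsers' behaviour in all cases is, as noted above, unconfirmed.

```python
import unicodedata
from encodings import idna


def to_ascii_nfkc_first(name):
    """Hypothetical helper: NFKC the whole name, then ToASCII each label.

    Empty labels (for example from a trailing dot) are not handled here.
    """
    normalized = unicodedata.normalize("NFKC", name)
    # Dots introduced by normalization now act as label separators.
    return [idna.ToASCII(label).decode("ascii")
            for label in normalized.split(".")]


# "r\u00e4ksm\u00f6rg\u00e5s" is räksmörgås.
print(to_ascii_nfkc_first("hi\u248ccom"))
print(to_ascii_nfkc_first("r\u00e4ksm\u00f6rg\u00e5s\u2024com"))
```

With the normalization done up front, each input yields two labels, so no label handed to ToASCII can produce an embedded dot.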

Alternative workarounds are being considered. Eventually Libidn may add a new flag to the idna_* functions that implements a recommended way to work around this problem.


[1] Notably Microsoft's Internet Explorer and Mozilla's Firefox, but not Apple's Safari.