GNU Libidn (idn) – Internationalized Domain Names command line tool
idn allows internationalized string preparation
(‘stringprep’), encoding and decoding of punycode data, and IDNA
ToASCII/ToUnicode operations to be performed on the command line.
If strings are specified on the command line, they are used as input
and the computed output is printed to standard output
If no strings are specified on the command line, the program read
data, line by line, from the standard input
stdin, and print
the computed output to standard output. What processing is performed
(e.g., ToASCII, or Punycode encode) is indicated by options. If any
errors are encountered, the execution of the applications is aborted.
All strings are expected to be encoded in the preferred charset used
by your locale. Use
--debug to find out what this charset is.
You can override the charset used by setting environment variable
To process a string that starts with
-, for example
-- to signal the end of parameters, as in
idn --quiet -a -- -foo.
idn recognizes these commands:
-h, --help Print help and exit -V, --version Print version and exit -s, --stringprep Prepare string according to nameprep profile -d, --punycode-decode Decode Punycode -e, --punycode-encode Encode Punycode -a, --idna-to-ascii Convert to ACE according to IDNA (default mode) -u, --idna-to-unicode Convert from ACE according to IDNA --allow-unassigned Toggle IDNA AllowUnassigned flag (default off) --usestd3asciirules Toggle IDNA UseSTD3ASCIIRules flag (default off) --no-tld Don't check string for TLD specific rules -n, --nfkc Normalize string according to Unicode v3.2 NFKC -p, --profile=STRING Use specified stringprep profile instead --debug Print debugging information --quiet Silent operation
The CHARSET environment variable can be used to override what character set to be used for decoding incoming data (i.e., on the command line or on the standard input stream), and to encode data to the standard output. If your system is set up correctly, however, the application will guess which character set is used automatically. Example usage:
$ CHARSET=ISO-8859-1 idn --punycode-encode ...
Standard usage, reading input from standard input. The parameter
--quiet disable printing copyright, license and usage
jas@latte:~$ idn --quiet räksmörgås.se xn--rksmrgs-5wao1o.se jas@latte:~$
Reading input from command line:
jas@latte:~$ idn --quiet räksmörgås.se blåbærgrød.no xn--rksmrgs-5wao1o.se xn--blbrgrd-fxak7p.no jas@latte:~$
Accessing a specific StringPrep profile directly:
jas@latte:~$ idn --quiet --profile=SASLprep --stringprep teßtª teßta jas@latte:~$
Getting character data encoded right, and making sure Libidn use the
same encoding, can be difficult. The reason for this is that most
systems encode character data in more than one character encoding,
UTF-8 together with
ISO-2022-JP. This problem is likely to continue to exist until
only one character encoding come out as the evolutionary winner, or
(more likely, at least to some extents) forever.
The first step to troubleshooting character encoding problems with Libidn is to use the ‘--debug’ parameter to find out which character set encoding ‘idn’ believe your locale uses.
jas@latte:~$ idn --debug --quiet "" system locale uses charset `UTF-8'. jas@latte:~$
If it prints
indicate you have not configured your locale properly. To configure
the locale, you can, for example, use ‘LANG=sv_SE.UTF-8; export
LANG’ at a
/bin/sh prompt, to set up your locale for a Swedish
UTF-8 as the encoding.
Sometimes ‘idn’ appear to be unable to translate from your system
UTF-8 (which is used internally), and you get an
error like the following:
jas@latte:~$ idn --quiet foo idn: could not convert from ISO-8859-1 to UTF-8. jas@latte:~$
The simplest explanation is that you haven’t installed the ‘iconv’ conversion tools. You can find it as a standalone library in GNU Libiconv (https://www.gnu.org/software/libiconv/). On many GNU/Linux systems, this library is part of the system, but you may have to install additional packages (e.g., ‘glibc-locale’ for Debian) to be able to use it.
Another explanation is that the error is correct and you are feeding
‘idn’ invalid data. This can happen inadvertently if you are not
careful with the character set encoding you use. For example, if your
shell run in a
ISO-8859-1 environment, and you invoke
‘idn’ with the ‘CHARSET’ environment variable as follows,
you will feed it
ISO-8859-1 characters but force it to believe
UTF-8. Naturally this will lead to an error, unless
the byte sequences happen to be valid
UTF-8. Note that even if
you don’t get an error, the output may be incorrect in this situation,
UTF-8 does not in general encode
the same characters as the same byte sequences.
jas@latte:~$ idn --quiet --debug "" system locale uses charset `ISO-8859-1'. jas@latte:~$ CHARSET=UTF-8 idn --quiet --debug räksmörgås system locale uses charset `UTF-8'. input = U+0072 input = U+4af3 input = U+006d input = U+1b29e5 input = U+0073 output = U+0078 output = U+006e output = U+002d output = U+002d output = U+0072 output = U+006d output = U+0073 output = U+002d output = U+0068 output = U+0069 output = U+0036 output = U+0064 output = U+0035 output = U+0039 output = U+0037 output = U+0035 output = U+0035 output = U+0032 output = U+0061 xn--rms-hi6d597552a jas@latte:~$
The sense moral here is to forget about ‘CHARSET’ (configure your locales properly instead) unless you know what you are doing, and if you want to use it, do it carefully, after verifying with ‘--debug’ that you get the desired results.