30.2.6 Version sort uses ASCII order, ignores locale, Unicode characters

In version sort, Unicode characters are compared byte-by-byte according to their binary representation, ignoring both their Unicode code point value and the current locale.

Most commonly, Unicode characters (e.g. Greek Small Letter Alpha U+03B1 ‘α’) are encoded as UTF-8 bytes (e.g. ‘α’ is encoded as the UTF-8 sequence 0xCE 0xB1). The encoded bytes are compared one at a time: first 0xCE (decimal value 206), then 0xB1 (decimal value 177).
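
The byte values quoted above can be checked from the shell. The following is a minimal sketch, assuming POSIX printf, od and xargs are available; the variable names are illustrative only, and ‘α’ is written as octal escapes so the result does not depend on how the script itself is encoded.

```shell
# UTF-8 encoding of U+03B1 'α', given as octal escapes (\316 = 0xCE, \261 = 0xB1).
# xargs is used only to collapse the whitespace that od emits.
alpha_hex=$(printf '\316\261' | od -An -tx1 | xargs)
alpha_dec=$(printf '\316\261' | od -An -tu1 | xargs)
# ASCII percent sign, for comparison.
percent_dec=$(printf '%%' | od -An -tu1 | xargs)

echo "alpha bytes (hex):     $alpha_hex"
echo "alpha bytes (decimal): $alpha_dec"
echo "percent (decimal):     $percent_dec"
```

The first decimal byte of ‘α’ (206) is larger than the single byte of ‘%’ (37), which is the comparison that decides the ordering in the example that follows.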

$ touch   aa    az    "a%"    "aα"

$ ls -1 -v
aa
az
a%
aα

Ignoring the first letter (‘a’), which is identical in all four names, the remaining characters are compared as follows:

‘a’ and ‘z’ are letters, and sort earlier than all other non-digit characters.

Then, percent sign ‘%’ (ASCII value 37) is compared to the first byte of the UTF-8 sequence of ‘α’, which is 0xCE (decimal value 206). The value 37 is smaller, hence ‘a%’ is listed before ‘aα’.
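
The same ordering can be reproduced without creating any files. This is a minimal sketch, assuming GNU sort, whose -V (--version-sort) option applies the version-sort comparison to input lines; the variable name is illustrative, and the script is assumed to be saved as UTF-8.

```shell
# Feed the four names in scrambled order to GNU sort's version sort.
sorted=$(printf '%s\n' 'aα' 'a%' 'az' 'aa' | sort -V)
printf '%s\n' "$sorted"
```

If the comparison behaves as described above, the names come out as aa, az, a%, aα, matching the ls -v listing, regardless of the locale in effect.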