In version sort, Unicode characters are compared byte-by-byte according to their binary representation, ignoring their Unicode value or the current locale.
Most commonly, Unicode characters are encoded as UTF-8 bytes; for
example, Greek Small Letter Alpha (U+03B1, ‘α’) is encoded as the UTF-8
sequence 0xCE 0xB1. The encoding is compared byte-by-byte: first
0xCE (decimal value 206), then
0xB1 (decimal value 177).
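The byte values quoted above can be checked with a short Python sketch. It is only an illustration of the UTF-8 encoding step, not part of version sort itself; the variable names are chosen for this example.

```python
# Encode GREEK SMALL LETTER ALPHA (U+03B1) as UTF-8 and inspect its bytes.
alpha = "\u03b1"
encoded = alpha.encode("utf-8")

# Iterating over a bytes object yields integers, one per byte.
byte_values = list(encoded)

print([hex(b) for b in byte_values])  # ['0xce', '0xb1']
print(byte_values)                    # [206, 177]
```

These two integers, in this order, are what a byte-by-byte comparison sees in place of the single character ‘α’.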
    $ touch aa az "a%" "aα"
    $ ls -1 -v
    aa
    az
    a%
    aα
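The order in this listing can be reproduced with a small Python sketch. It rests on a simplifying assumption: each name is compared as a sequence of UTF-8 bytes, and ASCII letters rank ahead of every other non-digit byte. The functions byte_rank and sketch_key are hypothetical names for this illustration, not the actual version sort implementation, and the sketch deliberately leaves out digits and any other special cases.

```python
def byte_rank(b: int) -> int:
    """Hypothetical per-byte rank: ASCII letters first, other bytes after them.

    Assumption for this sketch only: digits and special characters are not
    given any separate treatment here.
    """
    is_ascii_letter = (0x41 <= b <= 0x5A) or (0x61 <= b <= 0x7A)
    # Shift every non-letter byte past the whole letter range,
    # preserving byte-value order among the non-letters.
    return b if is_ascii_letter else b + 256


def sketch_key(name: str) -> list[int]:
    """Sort key: the per-byte ranks of the UTF-8 encoding; locale is never consulted."""
    return [byte_rank(b) for b in name.encode("utf-8")]


names = ["a%", "aα", "az", "aa"]
ordered = sorted(names, key=sketch_key)
print(ordered)  # ['aa', 'az', 'a%', 'aα']
```

In this sketch ‘%’ (37) and the first byte of ‘α’ (206) are both shifted by the same amount, so their relative order is still decided by the raw byte values.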
Ignoring the first letter (‘a’), which is identical in all
strings, the compared values are:
‘a’ and ‘z’ are letters, and sort earlier than
all other non-digit characters.
Then, the percent sign ‘%’ (ASCII value 37) is compared to the
first byte of the UTF-8 sequence of ‘α’, which is 0xCE (decimal value 206). The
value 37 is smaller, hence ‘
a%’ is listed before ‘