

30.3.3 Special handling of file extensions

GNU coreutils’ version sort algorithm implements specialized handling of file extensions (or strings that look like file names with extensions).

This nuanced implementation enables slightly more natural ordering of files.

The additional rules are:

  1. A suffix (i.e., a file extension) is defined as: a dot, followed by a letter or tilde, followed by zero or more letters, digits, or tildes (possibly repeated more than once), until the end of the string (technically, matching the regular expression (\.[A-Za-z~][A-Za-z0-9~]*)*$).
  2. If the strings contain suffixes, the suffixes are temporarily removed, and the strings are compared without them (using the algorithm above).
  3. If the suffix-less strings are identical, the suffix is restored and the entire strings are compared.
  4. If the non-suffixed strings differ, the result is returned and the suffix is effectively ignored.
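Rule 1 can be tried out with a few lines of Python. This is only a minimal sketch: the helper name split_suffix is hypothetical and is not part of coreutils, and the real implementation has further special cases that are not modelled here. The pattern is the one quoted in rule 1, with \Z used as the end-of-string anchor.

```python
import re

# Pattern from rule 1. \Z anchors the match at the very end of the string.
SUFFIX_RE = re.compile(r"(\.[A-Za-z~][A-Za-z0-9~]*)*\Z")


def split_suffix(name):
    """Return (stem, suffix); suffix is '' when the name has none.

    Illustrative helper only, not a coreutils function.
    """
    # search() finds the leftmost position from which the rest of the
    # string consists entirely of suffix components, so the suffix is
    # as long as possible.
    start = SUFFIX_RE.search(name).start()
    return name[:start], name[start:]


for name in ("hello-8.txt", "hello-8.2.txt", "hello-8.2",
             "gcc_10.8.12.7rc2.fc9.tar.bz2"):
    print(name, "->", split_suffix(name))
```

Since the empty string also matches the pattern, a name without an extension simply yields an empty suffix, which keeps rules 2 to 4 uniform.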

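Examples for rule 1:

  hello-8.txt: the suffix is ‘.txt’.
  hello-8.2.txt: the suffix is ‘.txt’ (‘.2’ is not included, because the dot is followed by a digit rather than a letter or tilde).
  gcc_10.8.12.7rc2.fc9.tar.bz2: the suffix is ‘.fc9.tar.bz2’ (‘.7rc2’ is not included, because it begins with a digit).
  hello-8.2: there is no suffix (the suffix is the empty string).

Examples for rule 2:

  Comparing hello-8.txt to hello-8.2.txt: the suffix ‘.txt’ is temporarily removed from both strings, leaving hello-8 and hello-8.2.
  Comparing gcc_10.fc9.tar.gz to gcc_10.8.12.7rc2.fc9.tar.bz2: the suffixes ‘.fc9.tar.gz’ and ‘.fc9.tar.bz2’ are temporarily removed, leaving gcc_10 and gcc_10.8.12.7rc2.

Example for rule 3:

  Comparing hello.foobar65 to hello.foobar4: the suffixes (‘.foobar65’ and ‘.foobar4’) are temporarily removed. The remaining strings are identical (‘hello’), so the suffixes are restored and the entire strings are compared (hello.foobar4 comes first).

Examples for rule 4:

  Comparing hello-8.2.txt to hello-8.10.txt: the suffixes (‘.txt’) are temporarily removed. The remaining strings (‘hello-8.2’ and ‘hello-8.10’) differ, so their comparison decides the result (hello-8.2.txt comes first) and the suffix is not consulted.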

How does the suffix-removal algorithm affect ordering results?

Consider the comparison of hello-8.txt and hello-8.2.txt.

Without the suffix-removal algorithm, the strings will be broken down to the following parts:

hello-  vs  hello-  (rule 2, all non-digit characters)
8       vs  8       (rule 3, all digit characters)
.txt    vs  .       (rule 2)
empty   vs  2
empty   vs  .txt

The comparison of the third parts (‘.txt’ vs ‘.’) will determine that the shorter string comes first, resulting in hello-8.2.txt appearing first.

Indeed this is the order in which Debian’s dpkg compares the strings.
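The breakdown into parts shown above can be reproduced with a short Python sketch. The function name parts is an assumption made for illustration; it only splits a string into alternating runs of non-digit and digit characters and does not perform any comparison.

```python
import re


def parts(name):
    """Split a string into alternating non-digit and digit runs.

    Illustrative helper only, not a coreutils function.
    """
    return re.findall(r"[0-9]+|[^0-9]+", name)


print(parts("hello-8.txt"))    # ['hello-', '8', '.txt']
print(parts("hello-8.2.txt"))  # ['hello-', '8', '.', '2', '.txt']
```

Lining up the two lists position by position gives the table above: the third parts are ‘.txt’ and ‘.’, and that is where the two strings first differ.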

A more natural result is that hello-8.txt should come before hello-8.2.txt, and this is where the suffix-removal comes into play:

The suffixes (.txt) are removed, and the remaining strings are broken down into the following parts:

hello-  vs  hello-  (rule 2, all non-digit characters)
8       vs  8       (rule 3, all digit characters)
empty   vs  .       (rule 2)
empty   vs  2

As empty strings sort before non-empty strings, hello-8.txt is ordered first.

A real-world example would be listing files such as gcc_10.fc9.tar.gz and gcc_10.8.12.7rc2.fc9.tar.bz2. Debian’s algorithm would list gcc_10.8.12.7rc2.fc9.tar.bz2 first, while ‘ls -v’ will list gcc_10.fc9.tar.gz first.
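The difference between the two orderings can be explored with the following Python sketch. It is a simplified model under stated assumptions, not the coreutils or dpkg source: debian_style_cmp and version_cmp are hypothetical names, the character weights follow the Debian-style ordering described earlier (tilde first, then end of part, then letters, then other characters), and special cases of the real implementation are omitted.

```python
import re
from functools import cmp_to_key

# Suffix pattern from rule 1; \Z anchors at the very end of the string.
SUFFIX_RE = re.compile(r"(\.[A-Za-z~][A-Za-z0-9~]*)*\Z")


def _is_digit(c):
    return "0" <= c <= "9"        # False for the empty string


def _at(s, i):
    return s[i] if i < len(s) else ""


def _order(c):
    """Weight of one character inside a non-digit part."""
    if c == "" or _is_digit(c):
        return 0                  # end of the non-digit part
    if c == "~":
        return -1                 # tilde sorts before everything
    if c.isascii() and c.isalpha():
        return ord(c)             # letters sort before other characters
    return ord(c) + 256


def debian_style_cmp(a, b):
    """Compare without suffix handling (simplified model)."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit parts: compare character by character.
        while ((i < len(a) and not _is_digit(a[i]))
               or (j < len(b) and not _is_digit(b[j]))):
            wa, wb = _order(_at(a, i)), _order(_at(b, j))
            if wa != wb:
                return -1 if wa < wb else 1
            i += 1
            j += 1
        # Digit parts: compare numerically; an empty part counts as 0.
        da = re.match(r"[0-9]*", a[i:]).group()
        db = re.match(r"[0-9]*", b[j:]).group()
        na, nb = int(da or "0"), int(db or "0")
        if na != nb:
            return -1 if na < nb else 1
        i += len(da)
        j += len(db)
    return 0


def version_cmp(a, b):
    """Compare with the suffix rules 2 to 4 applied first."""
    stem_a = a[:SUFFIX_RE.search(a).start()]
    stem_b = b[:SUFFIX_RE.search(b).start()]
    result = debian_style_cmp(stem_a, stem_b)   # rule 2
    if result != 0:
        return result                           # rule 4
    return debian_style_cmp(a, b)               # rule 3


names = ["gcc_10.8.12.7rc2.fc9.tar.bz2", "gcc_10.fc9.tar.gz"]
print(sorted(names, key=cmp_to_key(debian_style_cmp)))
print(sorted(names, key=cmp_to_key(version_cmp)))
```

In this model the first line prints the list with gcc_10.8.12.7rc2.fc9.tar.bz2 first and the second with gcc_10.fc9.tar.gz first, mirroring the two orderings described above. To check the behaviour of an installed coreutils, compare against the output of ‘ls -v’ or ‘sort -V’ directly.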

These priorities make sense for ‘ls -v’: versioned files will be listed in a more natural order.

For ‘sort -V’ these priorities might seem arbitrary. However, because the sorting code is shared between the ls and sort programs, the ordering rules are the same.

