The old database format is used by Unix locate and find
programs and earlier releases of the GNU ones.
updatedb produces this format if given the ‘--old-format’ option.
updatedb runs programs called bigram and code to
produce old-format databases. The old format differs from the new one
in the following ways. Instead of each entry starting with an
offset-differential count byte and ending with a null, byte values
from 0 through 28 indicate offset-differential counts from -14 through
14. The byte value indicating that a long offset-differential count
follows is 0x1e (30), not 0x80. The long counts are stored in host
byte order, which is not necessarily network byte order, and host
integer word size, which is usually 4 bytes. They also represent a
count 14 less than their value. The database lines have no
termination byte; the start of the next line is indicated by its first
byte having a value <= 30.
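The count encoding described above can be sketched in Python as follows. This is a minimal illustration and not the code used by locate: the function name decode_count and the constant names are hypothetical, the word size is assumed to be 4 bytes, and the long count is assumed to be a signed integer.

```python
OLD_OFFSET = 14    # every stored count is biased by 14
OLD_ESCAPE = 0x1E  # byte value 30: a long count follows
WORD_SIZE = 4      # assumed host integer size ("usually 4 bytes")


def decode_count(data: bytes, pos: int, byteorder: str = "little"):
    """Return (count, next_pos) for the count that starts at data[pos].

    byteorder is the byte order of the machine that wrote the database,
    which the format itself does not record.
    """
    first = data[pos]
    if first <= 28:
        # Short form: byte values 0..28 stand for counts -14..14.
        return first - OLD_OFFSET, pos + 1
    if first == OLD_ESCAPE:
        word = data[pos + 1 : pos + 1 + WORD_SIZE]
        if len(word) != WORD_SIZE:
            raise ValueError("truncated long count")
        # Long form: the stored word is 14 more than the real count.
        # Treating the word as signed is an assumption of this sketch.
        value = int.from_bytes(word, byteorder, signed=True)
        return value - OLD_OFFSET, pos + 1 + WORD_SIZE
    raise ValueError(f"byte {first:#x} does not start a count")
```

For example, decode_count(bytes([14]), 0) yields a count of 0, meaning the entry shares exactly as many leading bytes with its predecessor as the previous entry did.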
In addition, instead of starting with a dummy entry, the old database format starts with a 256-byte table containing the 128 most common bigrams in the file list. A bigram is a pair of adjacent bytes. Bytes in the database that have the high bit set are indexes (with the high bit cleared) into the bigram table. The bigram and offset-differential count coding makes these databases 20-25% smaller than the new format, but makes them not 8-bit clean. Any byte in a file name that falls in the ranges used for the special codes is replaced in the database by a question mark, which not coincidentally is the shell wildcard that matches a single character.
The old format therefore cannot faithfully store entries with non-ASCII characters, so it should not be used in internationalised environments. That is, most installations should not use it.
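The bigram substitution can be illustrated with a short Python sketch. It assumes that the 256-byte table stores the 128 bigrams back to back, so that index i selects bytes 2*i and 2*i+1; that layout and the function name expand_bigrams are assumptions of this example. It is meant to be applied only to the name portion of an entry, not to the count bytes.

```python
def expand_bigrams(table: bytes, encoded: bytes) -> bytes:
    """Expand bigram codes in the name portion of an old-format entry."""
    if len(table) != 256:
        raise ValueError("the bigram table must be exactly 256 bytes")
    out = bytearray()
    for byte in encoded:
        if byte & 0x80:
            # High bit set: the low 7 bits index one of the 128 bigrams.
            index = byte & 0x7F
            out += table[2 * index : 2 * index + 2]
        else:
            # Plain byte: copied through unchanged.
            out.append(byte)
    return bytes(out)


# Hypothetical table whose first two bigrams are "in" and "/u".
table = b"in" + b"/u" + bytes(252)
print(expand_bigrams(table, bytes([0x81]) + b"sr/b" + bytes([0x80])))
```

Because every byte value from 0x80 upwards is taken as a bigram index, a file name that really contains such a byte has no representation of its own, which is why it is stored as a question mark.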
Because the long counts are stored by the
code program as
native-order machine words, the database format is not easily used in
environments which differ in terms of byte order. If locate databases
are to be shared between machines, the LOCATE02 database format should
be used. This has other benefits as discussed above. However, the
length of the filename currently being processed can normally be used
to place reasonable limits on the long counts and so this information
is used by locate to help it guess the byte ordering of the old format
database. Unless it finds evidence to the contrary, locate
will assume that the byte order of the database is the same as the
native byte order of the machine running
locate. The output of
‘locate --statistics’ also includes information about the byte
order of old-format databases.
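The idea of using the current file name length to bound a long count can be sketched as follows. This is an illustrative heuristic only, not the logic locate actually implements: the function name guess_byteorder, its parameters, and the rule that the new shared-prefix length must lie between 0 and the length of the previous name are all assumptions of this example.

```python
import sys


def guess_byteorder(word: bytes, prev_prefix: int, prev_len: int,
                    native: str = sys.byteorder) -> str:
    """Guess the byte order in which a 4-byte long count was written.

    prev_prefix is the shared-prefix length of the previous entry and
    prev_len is the length of the previous file name.
    """
    plausible = []
    for order in ("little", "big"):
        # Stored long counts are 14 more than the real differential.
        count = int.from_bytes(word, order, signed=True) - 14
        # The new prefix cannot be negative or longer than the old name.
        if 0 <= prev_prefix + count <= prev_len:
            plausible.append(order)
    if len(plausible) == 1:
        # Only one interpretation fits: treat it as evidence.
        return plausible[0]
    # No evidence either way: fall back to the native byte order.
    return native
```

A word such as 00 00 00 d6 decodes to a differential of 200 when read as big-endian but to a very large negative number when read as little-endian, so against a 300-byte previous name only the big-endian reading is plausible.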
The output of ‘locate --statistics’ will give an incorrect count of the number of file names containing newlines or high-bit characters for old-format databases.
Old versions of GNU
locate fail to correctly handle very long
file names, possibly leading to security problems relating to a heap
buffer overrun. See Security Considerations for locate, for a