

4 ‘mkid’: Creating an ID Database

mkid builds an ID database. It accepts the names of files and directories on the command line, selects the files for which a scanner is enabled, then extracts and stores the tokens from those files. The resulting ID database is independent of architecture and byte order, so it can be shared among all systems.

The primary virtues of mkid are speed and high capacity. The size of the source trees it can index is limited only by available system memory. mkid's indexing algorithm is very space-efficient and exhibits excellent locality of reference, so it can operate with a working set that is only half the size of its virtual address space. A typical Unix-like operating system with 16 megabytes of system memory should be able to build an ID database covering approximately 12,000 to 14,000 source files totaling approximately 50 to 100 megabytes. A 66 MHz 486 computer can build a database of that size in approximately 10 to 15 minutes.

In a future release, mkid will be able to incrementally update an ID database much faster than it can build one from scratch. Until this feature becomes available, it might be a good idea to schedule a cron job to regularly update large ID databases during off-hours.
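As a minimal sketch of such a scheduled rebuild, the wrapper script below could be run from cron. The script name, source tree, database path, and schedule are all hypothetical; only the ‘--output’ option comes from this manual. Building into a temporary file first is a precaution chosen for this example, not something mkid requires.

```shell
#!/bin/sh
# rebuild-id.sh: rebuild an ID database from scratch (hypothetical wrapper).
# Example crontab entry (assumed schedule: 02:30 every night):
#   30 2 * * * /usr/local/bin/rebuild-id.sh

src_dir="${SRC_DIR:-/usr/src/project}"   # hypothetical source tree
db_file="${DB_FILE:-$src_dir/ID}"        # hypothetical database location
tmp_file="$db_file.new"                  # scratch file in the same directory

rebuild_id() {
    # Build into the scratch file, then replace the old database
    # only if mkid exited successfully.
    mkid --output="$tmp_file" "$src_dir" && mv -f "$tmp_file" "$db_file"
}

if ! command -v mkid >/dev/null 2>&1; then
    echo "mkid not found on PATH; skipping" >&2
elif [ ! -d "$src_dir" ]; then
    echo "source tree $src_dir does not exist; skipping" >&2
else
    rebuild_id
fi
```

Set SRC_DIR and DB_FILE in the crontab environment to point the wrapper at a different tree without editing it.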

Because mkid writes the ID file, it accepts the ‘--output’ (and ‘--file’) options as described in Writing options. Because it extracts tokens from source files, it accepts the ‘--lang-map’, ‘--include’, ‘--exclude’, and ‘--lang-option’ options, as well as the language-specific scanner options, all of which are described in Extraction options. Because it walks file trees, it handles file and directory names on its command line and the ‘--prune’ option as described in Walker options.

In addition, mkid accepts the following command-line options:

-s
--statistics
    mkid reports statistics about resource usage at the end of its run.
-v
--verbose
    mkid reports statistics about each file as it is scanned, and about the resource usage of its indexing algorithm at regular intervals.
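
To illustrate how these options combine with the ones described earlier, here is a hedged example invocation. The directory names (src, include, src/generated) are hypothetical, and the ‘--option=value’ spelling is assumed from the usual GNU long-option convention. Swap ‘--statistics’ for ‘--verbose’ to get per-file reports instead.

```shell
# Hypothetical layout: index ./src and ./include, skipping ./src/generated.
db_file="ID"

# Collect the arguments once so they can be either run or just displayed.
set -- --statistics --output="$db_file" --prune=src/generated src include

if command -v mkid >/dev/null 2>&1 && [ -d src ] && [ -d include ]; then
    # Build the database; resource statistics are reported at the end of the run.
    mkid "$@"
else
    # mkid or the example directories are missing: show the command instead.
    echo "would run: mkid $*"
fi
```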