Algorithmic Details - Perfect Hash Function Generator


4.5 Options for changing the Algorithms employed by `gperf`

‘-k selected-byte-positions’

‘--key-positions=selected-byte-positions’

Allows selection of the byte positions used in the keywords' hash function. The allowable positions range from 1 to 255, inclusive. The positions are separated by commas, e.g., ‘-k 9,4,13,14’; ranges may be used, e.g., ‘-k 2-7’; and positions may occur in any order. Furthermore, the wildcard ‘*’ causes the generated hash function to consider all byte positions in each keyword, whereas ‘$’ instructs the hash function to use the “final byte” of a keyword (this is, incidentally, the only way to use a byte position greater than 255).

For instance, the option ‘-k 1,2,4,6-10,'$'’ generates a hash function that considers positions 1,2,4,6,7,8,9,10, plus the last byte in each keyword (which may be at a different position for each keyword, obviously). Keywords shorter than some of the indicated byte positions work properly, since selected byte positions exceeding the keyword length are simply not referenced in the hash function.

Since version 2.8 of gperf, this option is not normally needed: the default byte positions are computed from the keyword set through a search that minimizes the number of byte positions.
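To make the position rules concrete, here is a hypothetical C sketch of a hash over the positions selected by ‘-k 1,2,4,6-10,'$'’. It is not gperf's generated code: the names `key_positions`, `LAST_BYTE`, `asso_value` and `selected_bytes_hash` are invented, and the associated value of each byte is assumed to be the byte code itself, whereas gperf computes a real table. It does show the two rules stated above: positions past the end of a keyword are skipped, and ‘$’ always picks the last byte. The keyword length is added as well, since gperf includes it unless ‘-n’ is given.

```c
#include <stddef.h>

/* Hypothetical encoding of '-k 1,2,4,6-10,$': 1-based positions,
   with LAST_BYTE standing for '$'. */
#define LAST_BYTE 0

static const unsigned int key_positions[] = { 1, 2, 4, 6, 7, 8, 9, 10, LAST_BYTE };

/* Hypothetical associated values: each byte maps to its own code.
   gperf would compute a real table during its search. */
static unsigned int
asso_value (unsigned char c)
{
  return c;
}

unsigned int
selected_bytes_hash (const char *str, size_t len)
{
  unsigned int hval = (unsigned int) len;   /* length is included unless '-n' */
  size_t i;

  for (i = 0; i < sizeof key_positions / sizeof key_positions[0]; i++)
    {
      unsigned int pos = key_positions[i];

      if (pos == LAST_BYTE)
        {
          if (len > 0)
            hval += asso_value ((unsigned char) str[len - 1]);
        }
      else if (pos <= len)   /* positions beyond the keyword are not referenced */
        hval += asso_value ((unsigned char) str[pos - 1]);
    }
  return hval;
}
```

With these definitions, "abcdefghijk" and "abZdefghijk" hash to the same value, because position 3 is not selected, while changing the final byte does change the hash.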

‘-D’

‘--duplicates’

Handle keywords whose selected byte sets hash to duplicate values. Duplicate hash values can occur if a keyword set contains the same name more than once with different attributes, or if the selected byte positions are not well chosen. With the ‘-D’ option, gperf treats all these keywords as part of an equivalence class and generates a perfect hash function with multiple comparisons for duplicate keywords. It is up to you to completely disambiguate the keywords by modifying the generated C code; however, gperf helps you out by organizing the output.

Using this option usually means that the generated hash function is no longer perfect. On the other hand, it permits gperf to work on keyword sets that it otherwise could not handle.
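The sketch below illustrates, under heavy assumptions, what “multiple comparisons for duplicate keywords” amounts to at lookup time. The word list, the toy hash (length plus the sum of all bytes, modulo 8), the `buckets` table and `lookup_all` are invented for illustration and do not reflect gperf's actual output format. The point is that a lookup must compare the input against every entry in its equivalence class and may find more than one match, leaving the final choice to your own code.

```c
#include <stddef.h>
#include <string.h>

#define DUP_TABLE_SIZE 8

/* Hypothetical keyword entry: the same name may appear with different attributes. */
struct keyword
{
  const char *name;
  int attribute;
};

/* Entries that share a hash value are stored next to each other. */
static const struct keyword wordlist[] =
{
  { "read", 1 },
  { "dear", 2 },   /* anagram of "read": same toy hash value */
  { "open", 3 },
  { "open", 4 }    /* same name, different attribute */
};

/* For each hash value: first wordlist index and number of entries. */
struct bucket
{
  size_t start, count;
};

static const struct bucket buckets[DUP_TABLE_SIZE] =
{
  [0] = { 0, 2 },   /* "read", "dear" */
  [6] = { 2, 2 }    /* "open", "open" */
};

/* Toy hash: length plus the sum of all bytes, modulo the table size. */
static unsigned int
dup_hash (const char *str, size_t len)
{
  unsigned int hval = (unsigned int) len;
  size_t i;

  for (i = 0; i < len; i++)
    hval += (unsigned char) str[i];
  return hval % DUP_TABLE_SIZE;
}

/* Compare STR against every keyword in its equivalence class and store up
   to MAX matches in OUT.  Returns the number of matches found. */
size_t
lookup_all (const char *str, size_t len, const struct keyword **out, size_t max)
{
  const struct bucket *b = &buckets[dup_hash (str, len)];
  size_t found = 0;
  size_t i;

  for (i = b->start; i < b->start + b->count && found < max; i++)
    if (strlen (wordlist[i].name) == len
        && memcmp (wordlist[i].name, str, len) == 0)
      out[found++] = &wordlist[i];
  return found;
}
```

Looking up "open" yields both entries (attributes 3 and 4), "dear" is found after one failed comparison against "read", and "nope" lands in the same bucket as "open" but matches nothing.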

‘-m iterations’

‘--multiple-iterations=iterations’

Try multiple choices of the ‘-i’ and ‘-j’ values and keep the best result. This increases the running time by a factor of iterations but does a good job of minimizing the generated table size.
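A rough picture of what ‘-m’ automates: run the search several times with different ‘-i’ and ‘-j’ values and keep the smallest table. The search below is a deliberately naive stand-in, not gperf's algorithm. The keyword set, the hash (length plus the associated values of the first and last bytes), the bound of 8 on associated values, and the names `toy_search`, `best_of_trials` and `struct trial` are all assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ROUNDS 1000

/* Hypothetical keyword set (every keyword is at least one byte long). */
static const char *const search_keywords[] = { "ab", "cb", "abc" };
#define SEARCH_NKEYWORDS (sizeof search_keywords / sizeof search_keywords[0])

/* Toy hash: length + associated values of the first and last bytes. */
static unsigned int
search_hash (const unsigned int asso[256], const char *s)
{
  size_t len = strlen (s);
  return (unsigned int) len + asso[(unsigned char) s[0]]
         + asso[(unsigned char) s[len - 1]];
}

/* Naive stand-in for a search (NOT gperf's algorithm): start every associated
   value at INITIAL (reduced below ASSO_MAX); on each collision, add JUMP to
   the associated value of the later keyword's first byte.  Returns the table
   size (largest hash value + 1), or 0 if no solution is found in time. */
unsigned int
toy_search (unsigned int initial, unsigned int jump, unsigned int asso_max)
{
  unsigned int asso[256];
  unsigned int mask = asso_max - 1;   /* ASSO_MAX assumed to be a power of 2 */
  size_t a, b;
  int iter, c;

  for (c = 0; c < 256; c++)
    asso[c] = initial & mask;

  for (iter = 0; iter < MAX_ROUNDS; iter++)
    {
      int collided = 0;

      for (a = 0; a < SEARCH_NKEYWORDS && !collided; a++)
        for (b = a + 1; b < SEARCH_NKEYWORDS && !collided; b++)
          if (search_hash (asso, search_keywords[a])
              == search_hash (asso, search_keywords[b]))
            {
              unsigned char first = (unsigned char) search_keywords[b][0];
              asso[first] = (asso[first] + jump) & mask;
              collided = 1;
            }

      if (!collided)
        {
          unsigned int max_hash = 0;
          for (a = 0; a < SEARCH_NKEYWORDS; a++)
            {
              unsigned int h = search_hash (asso, search_keywords[a]);
              if (h > max_hash)
                max_hash = h;
            }
          return max_hash + 1;
        }
    }
  return 0;
}

/* One (-i, -j) pair to try. */
struct trial
{
  unsigned int initial, jump;
};

/* The idea behind '-m': run several trials and keep the smallest table.
   Returns that size (0 if every trial failed) and stores its index. */
unsigned int
best_of_trials (const struct trial *trials, size_t ntrials, size_t *best_index)
{
  unsigned int best = 0;
  size_t t;

  for (t = 0; t < ntrials; t++)
    {
      unsigned int size = toy_search (trials[t].initial, trials[t].jump, 8);
      if (size != 0 && (best == 0 || size < best))
        {
          best = size;
          *best_index = t;
        }
    }
  return best;
}
```

With the trials (2, 3), (0, 5) and (0, 1), this toy search produces tables of size 11, 9 and 5, so the last trial is kept. Each extra trial costs a full search, which is the running-time factor mentioned above.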

‘-i initial-value’

‘--initial-asso=initial-value’

Provides an initial value for the associated values array. Default is 0. Increasing the initial value helps inflate the final table size, possibly leading to more time-efficient keyword lookups. Note that this option is not particularly useful when ‘-S’ (or, equivalently, ‘%switch’) is used. Also, ‘-i’ is overridden when the ‘-r’ option is used.
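Why does a larger initial value inflate the table? In this hypothetical sketch every associated value starts at `initial`, and the three keywords already have distinct lengths, so their hash values never collide and no search step is needed. Each hash value still grows by twice the initial value (one associated value for each of the two selected bytes), and the table size grows with it. The keyword set, the hash and the name `table_size_for_initial` are assumptions, not gperf internals.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical keywords with distinct lengths, so they never collide. */
static const char *const init_words[] = { "a", "ab", "abc" };

/* Set every associated value to INITIAL and return the resulting table size
   (largest hash value + 1) for the hash len + asso[first] + asso[last]. */
unsigned int
table_size_for_initial (unsigned int initial)
{
  unsigned int asso[256];
  unsigned int max_hash = 0;
  size_t c, k;

  for (c = 0; c < 256; c++)
    asso[c] = initial;
  for (k = 0; k < sizeof init_words / sizeof init_words[0]; k++)
    {
      size_t len = strlen (init_words[k]);
      unsigned int h = (unsigned int) len
                       + asso[(unsigned char) init_words[k][0]]
                       + asso[(unsigned char) init_words[k][len - 1]];
      if (h > max_hash)
        max_hash = h;
    }
  return max_hash + 1;
}
```

Here an initial value of 0 gives a table of 4 entries and an initial value of 3 gives 10, with the same three keywords.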

‘-j jump-value’

‘--jump=jump-value’

Affects the “jump value”, i.e., how far to advance the associated byte value upon collisions. The jump value is rounded up to an odd number; the default is 5. If the jump value is 0, gperf jumps by random amounts.
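The following sketch illustrates the jump value under stated assumptions. `effective_jump` applies the rounding to an odd number described above, `choose_step` substitutes a random amount for a jump of 0, and `advance_asso_value` adds a step to an associated value while wrapping below a power-of-2 bound (the power-of-2 rounding is mentioned under ‘-s’). One plausible benefit of an odd jump, which the manual does not state, is that an odd step shares no factor with a power of 2, so repeated jumps pass through every possible value before repeating; `visits_all_values` checks this. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Round the jump value up to an odd number, as described for '-j'.
   0 is left unchanged and means "use a random step each time". */
unsigned int
effective_jump (unsigned int jump)
{
  return jump == 0 ? 0 : (jump | 1u);
}

/* Step to use for one collision: the jump itself, or a random amount for 0. */
unsigned int
choose_step (unsigned int jump)
{
  return jump == 0 ? (unsigned int) rand () : jump;
}

/* Advance an associated value by STEP, wrapping below ASSO_MAX
   (assumed to be a power of 2, so masking acts as a modulus). */
unsigned int
advance_asso_value (unsigned int value, unsigned int step, unsigned int asso_max)
{
  return (value + step) & (asso_max - 1);
}

/* True if repeatedly adding STEP, starting from 0, reaches all ASSO_MAX
   values before coming back to 0. */
bool
visits_all_values (unsigned int step, unsigned int asso_max)
{
  unsigned int value = 0, count = 0;

  do
    {
      value = advance_asso_value (value, step, asso_max);
      count++;
    }
  while (value != 0);
  return count == asso_max;
}
```

With a bound of 16, a step of 5 cycles through all 16 values, while a step of 4 returns to 0 after only 4 jumps; rounding 4 up to 5 restores the full cycle.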

‘-n’

‘--no-strlen’

Instructs the generator not to include the length of a keyword when computing its hash value. This may save a few assembly instructions in the generated lookup function.
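A hypothetical illustration of what ‘-n’ removes: the toy hash below adds the associated values of the first and last bytes (assumed here to be the byte codes themselves) and, unless `include_length` is false, the keyword length. Dropping the length saves an addition, but it also discards information: under this hash, ‘ab’ and ‘acb’ differ only in length, so they collide once the length is left out. `len_hash` and its parameters are invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy hash over the first and last bytes, whose associated values are
   assumed to be the byte codes themselves.  INCLUDE_LENGTH = false mimics
   '-n'.  STR must be at least one byte long. */
unsigned int
len_hash (const char *str, size_t len, bool include_length)
{
  unsigned int hval = include_length ? (unsigned int) len : 0;

  hval += (unsigned char) str[0];
  hval += (unsigned char) str[len - 1];
  return hval;
}
```

With the length, "ab" and "acb" hash to 197 and 198; without it, both hash to 195.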

‘-r’

‘--random’

Utilizes randomness to initialize the associated values table. This frequently generates solutions faster than using deterministic initialization (which starts all associated values at 0). Furthermore, using the randomization option generally increases the size of the table.
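A minimal sketch of the two initialization strategies, assuming a 256-entry table indexed by byte value. `init_asso_values` and its parameters are hypothetical. The deterministic branch uses a caller-supplied initial value (0 by default, or the ‘-i’ value), and the random branch ignores it, matching the note under ‘-i’ that ‘-r’ overrides it. `rand` is the standard C library generator; gperf's own source of randomness may differ.

```c
#include <stdlib.h>

/* Initialize a 256-entry associated values table.  Without RANDOMIZE every
   entry gets INITIAL; with RANDOMIZE each entry gets a pseudo-random value
   below ASSO_MAX and INITIAL is ignored. */
void
init_asso_values (unsigned int asso[256], int randomize,
                  unsigned int initial, unsigned int asso_max)
{
  int c;

  for (c = 0; c < 256; c++)
    asso[c] = randomize ? (unsigned int) rand () % asso_max : initial;
}
```

Seeding with `srand` before a randomized call makes a run reproducible, which can help when comparing results across runs.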

‘-s size-multiple’

‘--size-multiple=size-multiple’

Affects the size of the generated hash table. The numeric argument for this option indicates “how many times larger or smaller” the maximum associated value range should be, relative to the number of keywords. It can be written as an integer, a floating-point number or a fraction. For example, a value of 3 means “allow the maximum associated value to be about 3 times larger than the number of input keywords”. Conversely, a value of 1/3 means “allow the maximum associated value to be about 3 times smaller than the number of input keywords”. Values smaller than 1 are useful for limiting the overall size of the generated hash table, though the option ‘-m’ is better suited to this purpose.

If the ‘generate switch’ option ‘-S’ (or, equivalently, ‘%switch’) is not enabled, the maximum associated value influences the static array table size, and a larger table should decrease the time required for an unsuccessful search, at the expense of extra table space.

The default value is 1, so the default maximum associated value is about the same size as the number of keywords (for efficiency, the maximum associated value is always rounded up to a power of 2). The actual table size may vary somewhat, since this technique is essentially a heuristic.
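As a rough model of the sizing rule just described, the sketch below scales the keyword count by the size multiple and rounds the result up to a power of 2. It follows the prose only; gperf's internal arithmetic may differ in details such as how fractional products are rounded, and the name `max_asso_value` is hypothetical.

```c
#include <limits.h>

/* Scale the keyword count by SIZE_MULTIPLE, then round up to a power of 2.
   Fractional products are truncated here, which is an assumption. */
unsigned int
max_asso_value (unsigned int nkeywords, double size_multiple)
{
  unsigned int target = (unsigned int) (nkeywords * size_multiple);
  unsigned int power = 1;

  /* Stop doubling before the value would overflow. */
  while (power < target && power <= UINT_MAX / 2)
    power <<= 1;
  return power;
}
```

Under this model, 40 keywords give a maximum associated value of 64 with the default multiple of 1, 128 with a multiple of 3, and 16 with a multiple of 1/3; 32 keywords with the default multiple stay at exactly 32.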
