Next: Address Hashing, Previous: Basic Hash Table Operations, Up: Hash Tables

Normally, hash tables automatically resize themselves according to need. Because of this, the programmer need not be concerned with management of the table's size. However, some limited control over the table's size is provided, which will be discussed below. This discussion involves two concepts, usable size and physical size, which we will now define.

The usable size of a hash table is the number of associations that the table can hold at a given time. If the number of associations in the table exceeds the usable size, the table will automatically grow, increasing the usable size to a new value that is sufficient to hold the associations.

The physical size is an abstract measure of a hash table that specifies how much space is allocated to hold the associations of the table. The physical size is always greater than or equal to the usable size. The physical size is not interesting in itself; it is interesting only for its effect on the performance of the hash table. While the average performance of a hash-table lookup is bounded by a constant, the worst-case performance is not. For a table containing a given number of associations, increasing the physical size of the table decreases the probability that worse-than-average performance will occur.

The physical size of a hash table is statistically related to the number of associations. However, it is possible to place bounds on the physical size, and from this to estimate the amount of space used by the table:

(define (hash-table-space-bounds count rehash-size rehash-threshold) (let ((tf (/ 1 rehash-threshold))) (values (if (exact-integer? rehash-size) (- (* count (+ 4 tf)) (* tf (+ rehash-size rehash-size))) (* count (+ 4 (/ tf (* rehash-size rehash-size))))) (* count (+ 4 tf)))))

What this formula shows is that, for a “normal” rehash size (that is, not an exact integer), the amount of space used by the hash table is proportional to the number of associations in the table. The constant of proportionality varies statistically, with the low bound being

(+ 4 (/ (/ 1 rehash-threshold) (* rehash-size rehash-size)))

and the high bound being

(+ 4 (/ 1 rehash-threshold))

which, for the default values of these parameters, are `4.25`

and
`5`

, respectively. Reducing the rehash size will tighten these
bounds, but increases the amount of time spent resizing, so you can see
that the rehash size gives some control over the time-space tradeoff of
the table.

The programmer can control the size of a hash table by means of three parameters:

- Each table's
`initial-size`may be specified when the table is created. - Each table has a rehash size that specifies how the size of the table is changed when it is necessary to grow or shrink the table.
- Each table has a rehash threshold that specifies the relationship of the table's physical size to its usable size.

If the programmer knows that the table will initially contain a specific
number of items, `initial-size` can be given when the table is
created. If `initial-size` is an exact non-negative integer, it
specifies the initial usable size of the hash table; the table will not
change size until the number of items in the table exceeds
`initial-size`, after which automatic resizing is enabled and
`initial-size` no longer has any effect. Otherwise, if
`initial-size` is not given or is `#f`

, the table is
initialized to an unspecified size and automatic resizing is immediately
enabled.

The rehash size specifies how much to increase the usable size of
the hash table when it becomes full. It is either an exact positive
integer, or a real number greater than one. If it is an integer, the
new size is the sum of the old size and the rehash size. Otherwise, it
is a real number, and the new size is the product of the old size and
the rehash size. Increasing the rehash size decreases the average cost
of an insertion, but increases the average amount of space used by the
table. The rehash size of a table may be altered dynamically by the
application in order to optimize the resizing of the table; for example,
if the table will grow quickly for a known period and afterwards will
not change size, performance might be improved by using a large rehash
size during the growth phase and a small one during the static phase.
The default rehash size of a newly constructed hash table is `2.0`

.

**Warning**: The use of an exact positive integer for a rehash
size is almost always undesirable; this option is provided solely for
compatibility with the Common Lisp hash-table mechanism. The reason for
this has to do with the time penalty for resizing the hash table. The
time needed to resize a hash table is proportional to the
number of associations in the table. This resizing cost is
amortized across the insertions required to fill the table to the
point where it needs to grow again. If the table grows by an amount
proportional to the number of associations, then the cost of
resizing and the increase in size are both proportional to the
number of associations, so the amortized cost of an insertion
operation is still bounded by a constant. However, if the table grows
by a constant amount, this is not true: the amortized cost of an
insertion is not bounded by a constant. Thus, using a constant rehash
size means that the average cost of an insertion increases
proportionally to the number of associations in the hash table.

The rehash threshold is a real number, between zero exclusive and
one inclusive, that specifies the ratio between a hash table's usable
size and its physical size. Decreasing the rehash threshold decreases
the probability of worse-than-average insertion, deletion, and lookup
times, but increases the physical size of the table for a given usable
size. The default rehash threshold of a newly constructed hash table is
`1`

.

— procedure: **hash-table/size**` hash-table`

Returns the usable size of

hash-tableas an exact positive integer. This is the number of associations thathash-tablecan hold before it will grow.

— procedure: **set-hash-table/rehash-size!**` hash-table x`

Xmust be either an exact positive integer, or a real number that is greater than one. Sets the rehash size ofhash-tabletoxand returns an unspecified result. This operation adjusts the “shrink threshold” of the table; the table might shrink if the number of associations is less than the new threshold.

— procedure: **set-hash-table/rehash-threshold!**` hash-table x`

Xmust be a real number between zero exclusive and one inclusive. Sets the rehash threshold ofhash-tabletoxand returns an unspecified result. This operation does not change the usable size of the table, but it usually changes the physical size of the table, which causes the table to be rehashed.