Hash tables

A hashtable is a data structure that associates keys with values. The hashtable has no intrinsic order for the (key, value) associations it contains, and supports in-place modification as the primary means of setting the contents of a hash table. Any object can be used as a key, provided a hash function and a suitable equivalence function is available. A hash function is a procedure that maps keys to exact integer objects.

The hashtable provides key lookup and destructive update in amortised constant time, provided that a good hash function is used. A hash function h is acceptable for an equivalence predicate e iff (e obj1 obj2) implies (= (h obj1) (h obj2)). A hash function h is good for a equivalence predicate e if it distributes the resulting hash values for non-equal objects (by e) as uniformly as possible over the range of hash values, especially in the case when some (non-equal) objects resemble each other by e.g. having common subsequences. This definition is vague but should be enough to assert that e.g. a constant function is not a good hash function.

Kawa provides two complete sets of functions for hashtables:

Both interfaces use the same underlying datatype, so it is possible to mix and match from both sets. That datatype implements java.util.Map. Freshly-written code should probably use the R6RS functions.

R6RS hash tables

To use these hash table functions in your Kawa program you must first:

(import (rnrs hashtables))

This section uses the hashtable parameter name for arguments that must be hashtables, and the key parameter name for arguments that must be hashtable keys.

Procedure: make-eq-hashtable

Procedure: make-eq-hashtable k

Return a newly allocated mutable hashtable that accepts arbitrary objects as keys, and compares those keys with eq?. If an argument is given, the initial capacity of the hashtable is set to approximately k elements.

Procedure: make-eqv-hashtable

Procedure: make-eqv-hashtable k

Return a newly allocated mutable hashtable that accepts arbitrary objects as keys, and compares those keys with eqv?. If an argument is given, the initial capacity of the hashtable is set to approximately k elements.

Procedure: make-hashtable hash-function equiv

Procedure: make-hashtable hash-function equiv k

hash-function and equiv must be procedures. hash-function should accept a key as an argument and should return a non–negative exact integer object. equiv should accept two keys as arguments and return a single value. Neither procedure should mutate the hashtable returned by make-hashtable.

The make-hashtable procedure returns a newly allocated mutable hashtable using hash-function as the hash function and equiv as the equivalence function used to compare keys. If a third argument is given, the initial capacity of the hashtable is set to approximately k elements.

Both hash-function and equiv should behave like pure functions on the domain of keys. For example, the string-hash and string=? procedures are permissible only if all keys are strings and the contents of those strings are never changed so long as any of them continues to serve as a key in the hashtable. Furthermore, any pair of keys for which equiv returns true should be hashed to the same exact integer objects by hash-function.

Note: Hashtables are allowed to cache the results of calling the hash function and equivalence function, so programs cannot rely on the hash function being called for every lookup or update. Furthermore any hashtable operation may call the hash function more than once.

Procedures

Procedure: hashtable? obj

Return #t if obj is a hashtable, #f otherwise.

Procedure: hashtable-size hashtable

Return the number of keys contained in hashtable as an exact integer object.

Procedure: hashtable-ref hashtable key default

Return the value in hashtable associated with key. If hashtable does not contain an association for key, default is returned.

Procedure: hashtable-set! hashtable key obj

Change hashtable to associate key with obj, adding a new association or replacing any existing association for key, and returns unspecified values.

Procedure: hashtable-delete! hashtable key

Remove any association for key within hashtable and returns unspecified values.

Procedure: hashtable-contains? hashtable key

Return #t if hashtable contains an association for key, #f otherwise.

Procedure: hashtable-update! hashtable key proc default

proc should accept one argument, should return a single value, and should not mutate hashtable.

The hashtable-update! procedure applies proc to the value in hashtable associated with key, or to default if hashtable does not contain an association for key. The hashtable is then changed to associate key with the value returned by proc.

The behavior of hashtable-update! is equivalent to the following code, but is may be (and is in Kawa) implemented more efficiently in cases where the implementation can avoid multiple lookups of the same key:

(hashtable-set!
  hashtable key
  (proc (hashtable-ref
         hashtable key default)))

Procedure: hashtable-copy hashtable

Procedure: hashtable-copy hashtable mutable

Return a copy of hashtable. If the mutable argument is provided and is true, the returned hashtable is mutable; otherwise it is immutable.

Procedure: hashtable-clear! hashtable

Procedure: hashtable-clear! hashtable k

Remove all associations from hashtable and returns unspecified values.

If a second argument is given, the current capacity of the hashtable is reset to approximately k elements.

Procedure: hashtable-keys hashtable

Return a vector of all keys in hashtable. The order of the vector is unspecified.

Procedure: hashtable-entries hashtable

Return two values, a vector of the keys in hashtable, and a vector of the corresponding values.

Example:

(let ((h (make-eqv-hashtable)))
  (hashtable-set! h 1 'one)
  (hashtable-set! h 2 'two)
  (hashtable-set! h 3 'three)
  (hashtable-entries h))
⇒ #(1 2 3) #(one two three) ; two return values

the order of the entries in the result vectors is not known.

Inspection

Procedure: hashtable-equivalence-function hashtable

Return the equivalence function used by hashtable to compare keys. For hashtables created with make-eq-hashtable and make-eqv-hashtable, returns eq? and eqv? respectively.

Procedure: hashtable-hash-function hashtable

Return the hash function used by hashtable. For hashtables created by make-eq-hashtable or make-eqv-hashtable, #f is returned.

Procedure: hashtable-mutable? hashtable

Return #t if hashtable is mutable, otherwise #f.

Hash functions

The equal-hash, string-hash, and string-ci-hash procedures of this section are acceptable as the hash functions of a hashtable only if the keys on which they are called are not mutated while they remain in use as keys in the hashtable.

Procedure: equal-hash obj

Return an integer hash value for obj, based on its structure and current contents. This hash function is suitable for use with equal? as an equivalence function.

Note: Like equal?, the equal-hash procedure must always terminate, even if its arguments contain cycles.

Procedure: string-hash string

Return an integer hash value for string, based on its current contents. This hash function is suitable for use with string=? as an equivalence function.

Procedure: string-ci-hash string

Return an integer hash value for string based on its current contents, ignoring case. This hash function is suitable for use with string-ci=? as an equivalence function.

Procedure: symbol-hash symbol

Return an integer hash value for symbol.

SRFI-69 hash tables

To use these hash table functions in your Kawa program you must first:

(require 'srfi-69)

or

(require 'hash-table)

or

(import (srfi :69 basic-hash-tables))

Type constructors and predicate

Procedure: make-hash-table [ equal? [ hash [ size-hint]]] hash-table

Create a new hash table with no associations. The equal? parameter is a predicate that should accept two keys and return a boolean telling whether they denote the same key value; it defaults to the equal? function.

The hash parameter is a hash function, and defaults to an appropriate hash function for the given equal? predicate (see the Hashing section). However, an acceptable default is not guaranteed to be given for any equivalence predicate coarser than equal?, except for string-ci=?. (The function hash is acceptable for equal?, so if you use coarser equivalence than equal? other than string-ci=?, you must always provide the function hash yourself.) (An equivalence predicate c1 is coarser than a equivalence predicate c2 iff there exist values x and y such that (and (c1 x y) (not (c2 x y))).)

The size-hint parameter can be used to suggested an approriate initial size. This option is not part of the SRFI-69 specification (though it is handled by the reference implementation), so specifying that option might be unportable.

Procedure: hash-table? obj boolean

A predicate to test whether a given object obj is a hash table.

Procedure: alist->hash-table alist [ equal? [ hash [ size-hint]]] hash-table

Takes an association list alist and creates a hash table hash-table which maps the car of every element in alist to the cdr of corresponding elements in alist. The equal?, hash, and size-hint parameters are interpreted as in make-hash-table. If some key occurs multiple times in alist, the value in the first association will take precedence over later ones. (Note: the choice of using cdr (instead of cadr) for values tries to strike balance between the two approaches: using cadr would render this procedure unusable for cdr alists, but not vice versa.)

Reflective queries

Procedure: hash-table-equivalence-function hash-table

Returns the equivalence predicate used for keys of hash-table.

Procedure: hash-table-hash-function hash-table

Returns the hash function used for keys of hash-table.

Dealing with single elements

Procedure: hash-table-ref hash-table key [ thunk ] value

This procedure returns the value associated to key in hash-table. If no value is associated to key and thunk is given, it is called with no arguments and its value is returned; if thunk is not given, an error is signalled. Given a good hash function, this operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table.

Procedure: hash-table-ref/default hash-table key default value

Evaluates to the same value as (hash-table-ref hash-table key (lambda () default)). Given a good hash function, this operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table.

Procedure: hash-table-set! hash-table key value void

This procedure sets the value associated to key in hash-table. The previous association (if any) is removed. Given a good hash function, this operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table.

Procedure: hash-table-delete! hash-table key void

This procedure removes any association to key in hash-table. It is not an error if no association for the key exists; in this case, nothing is done. Given a good hash function, this operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table.

Procedure: hash-table-exists? hash-table key boolean

This predicate tells whether there is any association to key in hash-table. Given a good hash function, this operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table.

Procedure: hash-table-update! hash-table key function [ thunk ] void

Semantically equivalent to, but may be implemented more efficiently than, the following code:

(hash-table-set! hash-table key
                 (function (hash-table-ref hash-table key thunk)))

Procedure: hash-table-update!/default hash-table key function default void

Behaves as if it evaluates to (hash-table-update! hash-table key function (lambda () default)).

Dealing with the whole contents

Procedure: hash-table-size hash-table integer

Returns the number of associations in hash-table. This operation takes constant time.

Procedure: hash-table-keys hash-table list

Returns a list of keys in hash-table. The order of the keys is unspecified.

Procedure: hash-table-values hash-table list

Returns a list of values in hash-table. The order of the values is unspecified, and is not guaranteed to match the order of keys in the result of hash-table-keys.

Procedure: hash-table-walk hash-table proc void

proc should be a function taking two arguments, a key and a value. This procedure calls proc for each association in hash-table, giving the key of the association as key and the value of the association as value. The results of proc are discarded. The order in which proc is called for the different associations is unspecified.

Procedure: hash-table-fold hash-table f init-value final-value

This procedure calls f for every association in hash-table with three arguments: the key of the association key, the value of the association value, and an accumulated value, val. The val is init-value for the first invocation of f, and for subsequent invocations of f, the return value of the previous invocation of f. The value final-value returned by hash-table-fold is the return value of the last invocation of f. The order in which f is called for different associations is unspecified.

Procedure: hash-table->alist hash-table alist

Returns an association list such that the car of each element in alist is a key in hash-table and the corresponding cdr of each element in alist is the value associated to the key in hash-table. The order of the elements is unspecified.

The following should always produce a hash table with the same mappings as a hash table h:

(alist->hash-table (hash-table->alist h)
                        (hash-table-equivalence-function h)
                        (hash-table-hash-function h))

Procedure: hash-table-copy hash-table hash-table

Returns a new hash table with the same equivalence predicate, hash function and mappings as in hash-table.

Procedure: hash-table-merge! hash-table1 hash-table2 hash-table

Adds all mappings in hash-table2 into hash-table1 and returns the resulting hash table. This function may modify hash-table1 destructively.

Hash functions

The Kawa implementation always calls these hash functions with a single parameter, and expects the result to be within the entire (32-bit signed) int range, for compatibility with standard hashCode methods.

Procedure: hash object [ bound ] integer

Produces a hash value for object in the range from 0 (inclusive) tp to bound (exclusive).

If bound is not given, the Kawa implementation returns a value within the range (- (expt 2 32)) (inclusive) to (- (expt 2 32) 1) (inclusive). It does this by calling the standard hashCode method, and returning the result as is. (If the object is the Java null value, 0 is returned.) This hash function is acceptable for equal?.

Procedure: string-hash string [ bound ] integer

The same as hash, except that the argument string must be a string. (The Kawa implementation returns the same as the hash function.)

Procedure: string-ci-hash string [ bound ] integer

The same as string-hash, except that the case of characters in string does not affect the hash value produced. (The Kawa implementation returns the same the hash function applied to the lower-cased string.)

Procedure: hash-by-identity object [ bound ] integer

The same as hash, except that this function is only guaranteed to be acceptable for eq?. Kawa uses the identityHashCode method of java.lang.System.