Next: The Poisson Distribution, Previous: The Dirichlet Distribution, Up: Random Number Distributions [Index]

Given *K* discrete events with different probabilities *P[k]*,
produce a random value *k* consistent with its probability.

The obvious way to do this is to preprocess the probability list by
generating a cumulative probability array with *K+1* elements:

C[0] = 0 C[k+1] = C[k]+P[k].

Note that this construction produces *C[K]=1*. Now choose a
uniform deviate *u* between 0 and 1, and find the value of *k*
such that *C[k] <= u < C[k+1]*.
Although this in principle requires of order *\log K* steps per
random number generation, they are fast steps, and if you use something
like *\lfloor uK \rfloor* as a starting point, you can often do
pretty well.

But faster methods have been devised. Again, the idea is to preprocess
the probability list, and save the result in some form of lookup table;
then the individual calls for a random discrete event can go rapidly.
An approach invented by G. Marsaglia (Generating discrete random variables
in a computer, Comm ACM 6, 37–38 (1963)) is very clever, and readers
interested in examples of good algorithm design are directed to this
short and well-written paper. Unfortunately, for large *K*,
Marsaglia’s lookup table can be quite large.

A much better approach is due to Alastair J. Walker (An efficient method
for generating discrete random variables with general distributions, ACM
Trans on Mathematical Software 3, 253–256 (1977); see also Knuth, v2,
3rd ed, p120–121,139). This requires two lookup tables, one floating
point and one integer, but both only of size *K*. After
preprocessing, the random numbers are generated in O(1) time, even for
large *K*. The preprocessing suggested by Walker requires
*O(K^2)* effort, but that is not actually necessary, and the
implementation provided here only takes *O(K)* effort. In general,
more preprocessing leads to faster generation of the individual random
numbers, but a diminishing return is reached pretty early. Knuth points
out that the optimal preprocessing is combinatorially difficult for
large *K*.

This method can be used to speed up some of the discrete random number
generators below, such as the binomial distribution. To use it for
something like the Poisson Distribution, a modification would have to
be made, since it only takes a finite set of *K* outcomes.

- Function:
*gsl_ran_discrete_t ****gsl_ran_discrete_preproc***(size_t*`K`, const double *`P`) -
This function returns a pointer to a structure that contains the lookup table for the discrete random number generator. The array

`P`[] contains the probabilities of the discrete events; these array elements must all be positive, but they needn’t add up to one (so you can think of them more generally as “weights”)—the preprocessor will normalize appropriately. This return value is used as an argument for the`gsl_ran_discrete`

function below.

- Function:
*size_t***gsl_ran_discrete***(const gsl_rng **`r`, const gsl_ran_discrete_t *`g`) -
After the preprocessor, above, has been called, you use this function to get the discrete random numbers.

- Function:
*double***gsl_ran_discrete_pdf***(size_t*`k`, const gsl_ran_discrete_t *`g`) -
Returns the probability

*P[k]*of observing the variable`k`. Since*P[k]*is not stored as part of the lookup table, it must be recomputed; this computation takes*O(K)*, so if`K`is large and you care about the original array*P[k]*used to create the lookup table, then you should just keep this original array*P[k]*around.

- Function:
*void***gsl_ran_discrete_free***(gsl_ran_discrete_t **`g`) -
De-allocates the lookup table pointed to by

`g`.