
These functions do various statistical computations on single
vectors. Given a numeric prefix argument, they actually pop
`n` objects from the stack and combine them into a data
vector. Each object may be either a number or a vector; if a
vector, any sub-vectors inside it are “flattened” as if by
`v a 0`; see Manipulating Vectors. By default one object
is popped, which (in order to be useful) is usually a vector.

If an argument is a variable name, and the value stored in that variable is a vector, then the stored vector is used. This method has the advantage that if your data vector is large, you can avoid the slow process of manipulating it directly on the stack.

These functions are left in symbolic form if any of their arguments are not numbers or vectors, e.g., if an argument is a formula, or a non-vector variable. However, formulas embedded within vector arguments are accepted; the result is a symbolic representation of the computation, based on the assumption that the formula does not itself represent a vector. All varieties of numbers such as error forms and interval forms are acceptable.

Some of the functions in this section also accept a single error form
or interval as an argument. They then describe a property of the
normal or uniform (respectively) statistical distribution described
by the argument. The arguments are interpreted in the same way as
the `M` argument of the random number function `k r`. In
particular, an interval with integer limits is considered an integer
distribution, so that ‘`[2 .. 6)`’ is the same as ‘`[2 .. 5]`’.
An interval with at least one floating-point limit is a continuous
distribution: ‘`[2.0 .. 6.0)`’ is *not* the same as
‘`[2.0 .. 5.0]`’!
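The following Python sketch illustrates the integer-interval rule. It is not Calc's own code. It converts an interval with integer limits and open or closed ends into the equivalent range of integers. The helper name `closed_int_range` and its boolean flags are hypothetical, and continuous intervals are deliberately not handled.

```python
def closed_int_range(lo: int, hi: int,
                     lo_closed: bool = True, hi_closed: bool = True) -> range:
    """Return the integers in an interval with integer limits.

    An open end excludes its limit, so [2 .. 6) and [2 .. 5] give the
    same set. (Hypothetical helper; continuous intervals not handled.)
    """
    first = lo if lo_closed else lo + 1
    last = hi if hi_closed else hi - 1
    return range(first, last + 1)


print(list(closed_int_range(2, 6, hi_closed=False)))  # [2, 3, 4, 5]
print(max(closed_int_range(2, 6, hi_closed=False)))   # 5
```

The largest member of the half-open interval is 5, which is why ‘`vmax([2..6))`’ is 5 rather than 6.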

The `u #` (`calc-vector-count`) [`vcount`] command
computes the number of data values represented by the inputs.
For example, ‘`vcount(1, [2, 3], [[4, 5], [], x, y])`’ returns 7.
If the argument is a single vector with no sub-vectors, this
simply computes the length of the vector.
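The flattening and counting behavior can be modeled roughly in Python. In this sketch, Python lists stand in for Calc vectors and strings stand in for symbolic variables such as `x` and `y`. Both representations are assumptions made for illustration, and the function names are hypothetical.

```python
from typing import Any, Iterator


def flatten(*args: Any) -> Iterator[Any]:
    """Yield each non-list item, descending into nested lists (like `v a 0`)."""
    for arg in args:
        if isinstance(arg, list):
            yield from flatten(*arg)
        else:
            yield arg


def vcount(*args: Any) -> int:
    """Count the data values after flattening every argument."""
    return sum(1 for _ in flatten(*args))


# Strings stand in for the symbolic variables x and y.
print(vcount(1, [2, 3], [[4, 5], [], "x", "y"]))  # 7
```

Empty sub-vectors contribute nothing to the count, and each symbol counts as one data value.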

The `u +` (`calc-vector-sum`) [`vsum`] command
computes the sum of the data values. The `u *`
(`calc-vector-prod`) [`vprod`] command computes the
product of the data values. If the input is a single flat vector,
these are the same as `V R +` and `V R *`
(see Reducing and Mapping).
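For a flat vector, summing or multiplying the data values is an ordinary left-to-right reduction. The Python sketch below uses `functools.reduce` as an analogue of `V R +` and `V R *`. It does not use Calc itself, and the sample data are arbitrary.

```python
import operator
from functools import reduce

data = [2, 3, 4]

# Reduce the flat vector with + and with *, in the spirit of V R + and V R *.
total = reduce(operator.add, data)
product = reduce(operator.mul, data)
print(total, product)  # 9 24
```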

The `u X` (`calc-vector-max`) [`vmax`] command
computes the maximum of the data values, and the `u N`
(`calc-vector-min`) [`vmin`] command computes the minimum.
If the argument is an interval, this finds the minimum or maximum
value in the interval. (Note that ‘`vmax([2..6)) = 5`’ as
described above.) If the argument is an error form, this returns
plus or minus infinity.

The `u M` (`calc-vector-mean`) [`vmean`] command
computes the average (arithmetic mean) of the data values.
If the inputs are error forms ‘`x +/- s`’,
this is the weighted mean of the ‘`x`’ values with weights ‘`1 / s^2`’.
If the inputs are not error forms, this is simply the sum of the
values divided by the count of the values.

Note that a plain number can be considered an error form with
error ‘`s = 0`’. If the input to `u M` is a mixture of
plain numbers and error forms, the result is the mean of the
plain numbers, ignoring all values with non-zero errors. (By the
above definitions it’s clear that a plain number effectively
has an infinite weight, next to which an error form with a finite
weight is completely negligible.)
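The following Python sketch shows the three cases for `u M`: plain numbers, error forms weighted by ‘`1 / s^2`’, and a mixture in which error-free values swamp the rest. It is not Calc's implementation. Passing the errors as a separate list parallel to the values is a hypothetical representation of ‘`x +/- s`’.

```python
from typing import List, Optional


def vmean(values: List[float], errors: Optional[List[float]] = None) -> float:
    """Mean of plain numbers, or weighted mean of error forms x +/- s."""
    if errors is None:
        # Plain numbers: sum divided by count.
        return sum(values) / len(values)
    # A zero error means infinite weight, so error-free values win.
    exact = [x for x, s in zip(values, errors) if s == 0]
    if exact:
        return sum(exact) / len(exact)
    # Otherwise weight each x by 1 / s^2.
    weights = [1 / s ** 2 for s in errors]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


print(vmean([1, 2, 3, 4]))              # 2.5
print(vmean([10.0, 20.0], [1.0, 2.0]))  # 12.0
print(vmean([5.0, 100.0], [0.0, 1.0]))  # 5.0
```

In the second call, the weights are 1 and 1/4, so the value with the smaller error pulls the mean toward 10.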

This function also works for distributions (error forms or
intervals). The mean of an error form ‘`a +/- b`’ is simply
‘`a`’. The mean of an interval is the mean of the minimum
and maximum values of the interval.

The `I u M` (`calc-vector-mean-error`) [`vmeane`]
command computes the mean of the data points expressed as an
error form. This includes the estimated error associated with
the mean. If the inputs are error forms, the error is the square
root of the reciprocal of the sum of the reciprocals of the squares
of the input errors. (I.e., the variance is the reciprocal of the
sum of the reciprocals of the variances.) If the inputs are plain
numbers, the error is equal to the standard deviation of the values
divided by the square root of the number of values. (This works
out to be equivalent to calculating the standard deviation and
then assuming each value’s error is equal to this standard
deviation.)
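The two error formulas for `I u M` can be written out in Python. This sketch computes only the error part and not the full error form. The function names are hypothetical.

```python
import math
import statistics
from typing import List


def mean_error_from_errors(errors: List[float]) -> float:
    """Error of the weighted mean: sqrt(1 / sum(1 / s^2))."""
    return math.sqrt(1 / sum(1 / s ** 2 for s in errors))


def mean_error_from_values(values: List[float]) -> float:
    """Plain numbers: sample standard deviation / sqrt(N)."""
    return statistics.stdev(values) / math.sqrt(len(values))


print(mean_error_from_errors([2.0, 2.0]))                # sqrt(2), about 1.414
print(mean_error_from_values([2, 4, 4, 4, 5, 5, 7, 9]))  # sqrt(4/7), about 0.756
```

Combining two measurements with error 2 gives an error of about 1.414, which is smaller than either input error.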

The `H u M` (`calc-vector-median`) [`vmedian`]
command computes the median of the data values. The values are
first sorted into numerical order; the median is the middle
value after sorting. (If the number of data values is even,
the median is taken to be the average of the two middle values.)
The median function is different from the other functions in
this section in that the arguments must all be real numbers;
variables are not accepted even when nested inside vectors.
(Otherwise it is not possible to sort the data values.) If
any of the input values are error forms, their error parts are
ignored.
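The following Python sketch implements the sort-then-pick rule for plain real numbers. It omits the step that strips error parts. Python's own `statistics.median` follows the same convention for an even number of values.

```python
from typing import List


def vmedian(values: List[float]) -> float:
    """Sort, then take the middle value or the mean of the two middle values."""
    if not values:
        raise ValueError("median of an empty vector")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(vmedian([3, 1, 2]))     # 2
print(vmedian([4, 1, 3, 2]))  # 2.5
```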

The median function also accepts distributions. For both normal (error form) and uniform (interval) distributions, the median is the same as the mean.

The `H I u M` (`calc-vector-harmonic-mean`) [`vhmean`]
command computes the harmonic mean of the data values. This is
defined as the reciprocal of the arithmetic mean of the reciprocals
of the values.
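The following Python sketch translates the harmonic-mean definition directly. For positive data, it is checked against the standard library's `statistics.harmonic_mean`. The name `vhmean` is reused only for illustration.

```python
import math
import statistics
from typing import List


def vhmean(values: List[float]) -> float:
    """Reciprocal of the arithmetic mean of the reciprocals."""
    mean_of_reciprocals = sum(1 / x for x in values) / len(values)
    return 1 / mean_of_reciprocals


data = [1.0, 2.0, 4.0]
print(vhmean(data))  # 12/7, about 1.714
# For positive data the standard library computes the same quantity.
assert math.isclose(vhmean(data), statistics.harmonic_mean(data))
```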

The `u G` (`calc-vector-geometric-mean`) [`vgmean`]
command computes the geometric mean of the data values. This
is the `n`th root of the product of the values. This is also
equal to the `exp` of the arithmetic mean of the logarithms
of the data values.
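The two equivalent formulas can be compared in Python. This sketch assumes the data values are positive, so the logarithms are real. It also requires Python 3.8 or later for `math.prod`.

```python
import math
from typing import List


def gmean_root(values: List[float]) -> float:
    """n-th root of the product of the values."""
    return math.prod(values) ** (1 / len(values))


def gmean_log(values: List[float]) -> float:
    """exp of the arithmetic mean of the logarithms."""
    return math.exp(sum(math.log(x) for x in values) / len(values))


data = [2.0, 8.0]
print(gmean_root(data), gmean_log(data))  # both about 4.0
```

The logarithmic form avoids forming a very large or very small product when the vector is long. The results can differ from each other only by floating-point rounding.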

The `H u G` [`agmean`] command computes the “arithmetic-geometric
mean” of two numbers taken from the stack. This is computed by
replacing the two numbers with their arithmetic mean and geometric
mean, then repeating until the two values converge.
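The following Python sketch shows the iteration for two non-negative floating-point numbers. The relative tolerance and the iteration cap are assumptions chosen for double precision. They are not Calc's actual stopping rule, which works at the current Calc precision.

```python
import math


def agmean(a: float, b: float, max_iter: int = 100) -> float:
    """Arithmetic-geometric mean of two non-negative numbers (sketch)."""
    for _ in range(max_iter):
        if math.isclose(a, b, rel_tol=1e-15):
            break
        # Replace the pair with its arithmetic and geometric means.
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a


print(agmean(1.0, math.sqrt(2.0)))  # about 1.19814
```

Convergence is quadratic, so only a handful of iterations are needed. The cap only guards against pathological rounding.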

The `u S` (`calc-vector-sdev`) [`vsdev`] command
computes the standard deviation
of the data values. If the values are error forms, the errors are used
as weights just as for `u M`. This is the *sample* standard
deviation, whose value is the square root of ‘`S / (N-1)`’, where ‘`S`’
is the sum of the squares of the differences between the ‘`N`’ values
and their mean.
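For unweighted plain numbers, the sample standard deviation can be sketched in Python as follows and checked against `statistics.stdev`. This sketch does not implement the error-form weighting. Raising an error for fewer than two values is a choice made here, not documented Calc behavior.

```python
import math
import statistics
from typing import List


def vsdev(values: List[float]) -> float:
    """Sample sdev: sqrt(S / (N - 1)), S = sum of squared deviations."""
    n = len(values)
    if n < 2:
        # Choice made for this sketch; N - 1 would be zero.
        raise ValueError("need at least two values")
    mean = sum(values) / n
    s = sum((x - mean) ** 2 for x in values)
    return math.sqrt(s / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(vsdev(data))  # sqrt(32/7), about 2.138
assert math.isclose(vsdev(data), statistics.stdev(data))
```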

This function also applies to distributions. The standard deviation
of a single error form is simply the error part. The standard deviation
of a continuous interval happens to equal the difference between the
limits, divided by ‘`sqrt(12)`’.
The standard deviation of an integer interval is the same as the
standard deviation of a vector of those integers.
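The following Python sketch contrasts the two interval rules, using ‘`[2.0 .. 5.0]`’ and ‘`[2 .. 5]`’ as examples. The function names are hypothetical, and each interval is passed as a pair of closed limits.

```python
import math
import statistics


def sdev_continuous(lo: float, hi: float) -> float:
    """Continuous interval [lo, hi]: (hi - lo) / sqrt(12)."""
    return (hi - lo) / math.sqrt(12)


def sdev_integer(lo: int, hi: int) -> float:
    """Integer interval [lo .. hi]: sample sdev of the vector of its integers."""
    return statistics.stdev(list(range(lo, hi + 1)))


print(sdev_continuous(2.0, 5.0))  # 3/sqrt(12), about 0.866
print(sdev_integer(2, 5))         # sdev of [2, 3, 4, 5], about 1.291
```

The two results differ noticeably, which is one reason the distinction between integer and floating-point limits matters.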

The `I u S` (`calc-vector-pop-sdev`) [`vpsdev`]
command computes the *population* standard deviation.
It is defined by the same formula as above but dividing
by ‘`N`’ instead of by ‘`N-1`’. The population standard
deviation is used when the input represents the entire set of
data values in the distribution; the sample standard deviation
is used when the input represents a sample of the set of all
data values, so that the mean computed from the input is itself
only an estimate of the true mean.

For error forms and continuous intervals, `vpsdev` works
exactly like `vsdev`. For integer intervals, it computes the
population standard deviation of the equivalent vector of integers.
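The following Python sketch computes the population standard deviation for plain numbers, dividing by ‘`N`’ instead of ‘`N-1`’, and checks it against `statistics.pstdev`. The integer interval ‘`[2 .. 5]`’ is modeled as the Python list `[2, 3, 4, 5]`.

```python
import math
import statistics
from typing import List


def vpsdev(values: List[float]) -> float:
    """Population sdev: sqrt(S / N), S = sum of squared deviations."""
    n = len(values)
    mean = sum(values) / n
    s = sum((x - mean) ** 2 for x in values)
    return math.sqrt(s / n)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(vpsdev(data))                 # 2.0
print(vpsdev(list(range(2, 6))))    # integer interval [2 .. 5]: about 1.118
assert math.isclose(vpsdev(data), statistics.pstdev(data))
```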

The `H u S` (`calc-vector-variance`) [`vvar`] and
`H I u S` (`calc-vector-pop-variance`) [`vpvar`]
commands compute the variance of the data values. The variance
is the square of the standard deviation; for a data vector, this is
the sum of the squares of the deviations of the data values from the
mean, divided by ‘`N-1`’ for `vvar` or by ‘`N`’ for `vpvar`.
(Defining the variance as the square of the standard deviation also
covers the case where the argument is a distribution.)
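For plain numbers, both variances can be sketched with one Python function that takes a flag. In this sketch, the default divides by ‘`N-1`’, as `vvar` does, and `population=True` divides by ‘`N`’, as `vpvar` does. The function and flag names are hypothetical.

```python
from typing import List


def variance(values: List[float], population: bool = False) -> float:
    """Sum of squared deviations divided by N-1 (sample) or N (population)."""
    n = len(values)
    mean = sum(values) / n
    s = sum((x - mean) ** 2 for x in values)
    return s / (n if population else n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))                   # 32/7, about 4.571 (sample)
print(variance(data, population=True))  # 4.0 (population)
```

Taking the square root of either result gives the corresponding standard deviation.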

The `vflat` algebraic function returns a vector of its
arguments, interpreted in the same way as the other functions
in this section. For example, ‘`vflat(1, [2, [3, 4]], 5)`’
returns ‘`[1, 2, 3, 4, 5]`’.