The `a F` (`calc-curve-fit`) [`fit`] command attempts
to fit a set of data (‘`x`’ and ‘`y`’ vectors of numbers) to a
straight line, polynomial, or other function of ‘`x`’. For the
moment we will consider only the case of fitting to a line, and we
will ignore the issue of whether or not the model is in fact a good
fit for the data.

In a standard linear least-squares fit, we have a set of ‘`(x,y)`’
data points that we wish to fit to the model ‘`y = m x + b`’
by adjusting the parameters ‘`m`’ and ‘`b`’ to make the ‘`y`’
values calculated from the formula be as close as possible to the actual
‘`y`’ values in the data set. (In a polynomial fit, the model is
instead, say, ‘`y = a x^3 + b x^2 + c x + d`’. In a multilinear fit,
we have data points of the form ‘`(x_1,x_2,x_3,y)`’ and our model is
‘`y = a x_1 + b x_2 + c x_3 + d`’. These will be discussed later.)

In the model formula, variables like ‘`x`’ and ‘`x_2`’ are called
the independent variables, and ‘`y`’ is the dependent
variable. Variables like ‘`m`’, ‘`a`’, and ‘`b`’ are called
the parameters of the model.

The `a F` command takes the data set to be fitted from the stack.
By default, it expects the data in the form of a matrix. For example,
for a linear or polynomial fit, this would be a 2xN matrix where the
first row is a list of ‘`x`’ values and the second row has the
corresponding ‘`y`’ values. For the multilinear fit shown above, the
matrix would have four rows (‘`x_1`’, ‘`x_2`’, ‘`x_3`’, and ‘`y`’,
respectively).

If you happen to have an Nx2 matrix instead of a 2xN matrix, just
press `v t` first to transpose the matrix.
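
If the data is being prepared outside Calc, the same rearrangement is a one-line transpose. The following is a minimal Python sketch; the variable names and the sample points are illustrative only.

```python
# N x 2 layout: one (x, y) pair per row.
points = [(1, 5), (2, 7), (3, 9), (4, 11), (5, 13)]

# zip(*points) regroups the pairs column by column, giving a 2 x N layout:
# the first row holds the x values, the second row the y values.
rows = [list(column) for column in zip(*points)]

print(rows)
```

The result has the x values in the first row and the y values in the second, which is the layout `a F` expects by default.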

After you type `a F`, Calc prompts you to select a model. For a
linear fit, press the digit `1`.

Calc then prompts you to name the variables. By default it chooses
letters near the end of the alphabet, like ‘`x`’ and ‘`y`’, for
independent variables and letters near the beginning, like ‘`a`’ and
‘`b`’, for parameters. (The dependent
variable doesn't need a name.) The two kinds of variables are separated
by a semicolon. Since you generally care more about the names of the
independent variables than of the parameters, Calc also allows you to
name only those and let the parameters use default names.

For example, suppose the data matrix

[ [ 1, 2, 3, 4,  5  ]
  [ 5, 7, 9, 11, 13 ] ]

is on the stack and we wish to do a simple linear fit. Type
`a F`, then `1` for the model, then <RET> to use
the default names. The result will be the formula ‘`3. + 2. x`’
on the stack. Calc has created the model expression `a + b x`,
then found the optimal values of ‘`a`’ and ‘`b`’ to fit the
data. (In this case, it was able to find an exact fit.) Calc then
substituted those values for ‘`a`’ and ‘`b`’ in the model
formula.
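
For readers who want to check this result outside Calc, the same fit can be reproduced with the textbook closed-form formulas for a least-squares line. This is only a minimal Python sketch; the function name `linear_fit` is made up for illustration, and nothing here is meant to describe how Calc computes the fit internally.

```python
def linear_fit(xs, ys):
    """Return (a, b) minimizing the squared error of the model y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


# Same layout as the Calc data matrix: first row x values, second row y values.
data = [[1, 2, 3, 4, 5], [5, 7, 9, 11, 13]]
a, b = linear_fit(data[0], data[1])
print(f"{a} + {b} x")
```

For this data set the sketch prints `3.0 + 2.0 x`, which agrees with the formula Calc leaves on the stack.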

The `a F` command puts two entries in the trail. One is, as
always, a copy of the result that went to the stack; the other is
a vector of the actual parameter values, written as equations:
‘`[a = 3, b = 2]`’, in case you'd rather read them in a list
than pick them out of the formula. (You can type `t y`
to move this vector to the stack; see Trail Commands.)

Specifying a different independent variable name will affect the
resulting formula: `a F 1 k <RET>` produces `3 + 2 k`.
Changing the parameter names (say, `a F 1 k;b,m <RET>`) will affect
the equations that go into the trail.

To see what happens when the fit is not exact, we could change the number 13 in the data matrix to 14 and try the fit again. The result is:

2.6 + 2.2 x

Evaluating this formula, say with `v x 5 <RET> <TAB> V M $ <RET>`, shows
a reasonably close match to the y-values in the data.

[4.8, 7., 9.2, 11.4, 13.6]
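
The same evaluation can be sketched in Python as a cross-check. The coefficients 2.6 and 2.2 are copied from the fitted formula above; the variable names are illustrative, and the values are rounded so that floating-point noise does not obscure the comparison.

```python
xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 14]   # the data with 13 changed to 14
a, b = 2.6, 2.2          # coefficients of the fitted line 2.6 + 2.2 x

# Evaluate the fitted line at each x, then compare with the actual y values.
fitted = [round(a + b * x, 6) for x in xs]
residuals = [round(y - f, 6) for y, f in zip(ys, fitted)]

print(fitted)
print(residuals)
```

The fitted values come out as 4.8, 7, 9.2, 11.4, and 13.6, and no residual exceeds 0.4 in magnitude.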

Since there is no line which passes through all the `N` data points,
Calc has chosen a line that best approximates the data points using
the method of least squares. The idea is to define the chi-square
error measure

chi^2 = sum((y_i - (a + b x_i))^2, i, 1, N)

which is clearly zero if ‘`a + b x`’ exactly fits all data points,
and increases as various ‘`a + b x_i`’ values fail to match the
corresponding ‘`y_i`’ values. There are several reasons why the
summand is squared, one of them being to ensure that
‘`chi^2 >= 0`’.
Least-squares fitting simply chooses the values of ‘`a`’ and ‘`b`’
for which the error
‘`chi^2`’
is as small as possible.
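
To make the definition concrete, here is a small Python sketch that evaluates this error measure for the inexact data set and compares the fitted line with a few nearby lines. The function name `chi_square` and the particular perturbations are chosen for illustration only.

```python
def chi_square(a, b, xs, ys):
    """Sum of squared differences between each y_i and a + b*x_i."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 14]

best = chi_square(2.6, 2.2, xs, ys)
print(f"fitted line: chi^2 = {best:.4f}")

# Nudging either parameter away from its fitted value increases the error.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    nearby = chi_square(2.6 + da, 2.2 + db, xs, ys)
    print(f"a = {2.6 + da:.1f}, b = {2.2 + db:.1f}: chi^2 = {nearby:.4f}")
```

The fitted line gives a value of about 0.4, and each of the nudged lines gives a larger one, as expected at a minimum.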

Other kinds of models do the same thing but with a different model
formula in place of ‘`a + b x_i`’.

A numeric prefix argument causes the `a F` command to take the
data in some other form than one big matrix. A positive argument `n`
will take `n` items from the stack, corresponding to the `n` rows
of a data matrix. In the linear case, `n` must be 2 since there
is always one independent variable and one dependent variable.

A prefix of zero or plain `C-u` is a compromise; Calc takes two
items from the stack, an `n`-row matrix of ‘`x`’ values, and a
vector of ‘`y`’ values. If there is only one independent variable,
the ‘`x`’ values can be either a one-row matrix or a plain vector,
in which case the `C-u` prefix is the same as a `C-u 2` prefix.