Nonlinear Least-Squares Fitting

This chapter describes functions for multidimensional nonlinear least-squares fitting. The library provides low level components for a variety of iterative solvers and convergence tests. These can be combined by the user to achieve the desired solution, with full access to the intermediate steps of the iteration. Each class of methods uses the same framework, so that you can switch between solvers at runtime without needing to recompile your program. Each instance of a solver keeps track of its own state, allowing the solvers to be used in multi-threaded programs.

The header file `gsl_multifit_nlin.h' contains prototypes for the multidimensional nonlinear fitting functions and related declarations.

Overview

The problem of multidimensional nonlinear least-squares fitting requires the minimization of the squared residuals of n functions, f_i, in p parameters, x_i,

\Phi(x) = (1/2) || F(x) ||^2
= (1/2) \sum_{i=1}^{n} f_i(x_1, ..., x_p)^2

All algorithms proceed from an initial guess using the linearization,

\psi(p) = || F(x+p) || ~=~ || F(x) + J p ||

where x is the initial point, p is the proposed step and J is the Jacobian matrix J_{ij} = d f_i / d x_j. Additional strategies are used to enlarge the region of convergence. These include requiring a decrease in the norm ||F|| on each step or using a trust region to avoid steps which fall outside the linear regime.

To perform a weighted least-squares fit of a nonlinear model Y(x,t) to data (t_i, y_i) with independent Gaussian errors \sigma_i, use function components of the following form,

f_i = (Y(x, t_i) - y_i) / \sigma_i

Note that the model parameters are denoted by x in this chapter since the non-linear least-squares algorithms are described geometrically (i.e. finding the minimum of a surface). The independent variable of any data to be fitted is denoted by t.

With the definition above the Jacobian is J_{ij} =(1 / \sigma_i) d Y_i / d x_j, where Y_i = Y(x,t_i).

Initializing the Solver

Function: gsl_multifit_fsolver * gsl_multifit_fsolver_alloc (const gsl_multifit_fsolver_type * T, size_t n, size_t p)
This function returns a pointer to a newly allocated instance of a solver of type T for n observations and p parameters. The number of observations n must be greater than or equal to parameters p.

If there is insufficient memory to create the solver then the function returns a null pointer and the error handler is invoked with an error code of GSL_ENOMEM.

Function: gsl_multifit_fdfsolver * gsl_multifit_fdfsolver_alloc (const gsl_multifit_fdfsolver_type * T, size_t n, size_t p)
This function returns a pointer to a newly allocated instance of a derivative solver of type T for n observations and p parameters. For example, the following code creates an instance of a Levenberg-Marquardt solver for 100 data points and 3 parameters,

const gsl_multifit_fdfsolver_type * T
= gsl_multifit_fdfsolver_lmder;
gsl_multifit_fdfsolver * s
= gsl_multifit_fdfsolver_alloc (T, 100, 3);

The number of observations n must be greater than or equal to parameters p.

If there is insufficient memory to create the solver then the function returns a null pointer and the error handler is invoked with an error code of GSL_ENOMEM.

Function: int gsl_multifit_fsolver_set (gsl_multifit_fsolver * s, gsl_multifit_function * f, const gsl_vector * x)
This function initializes, or reinitializes, an existing solver s to use the function f and the initial guess x.

Function: int gsl_multifit_fdfsolver_set (gsl_multifit_fdfsolver * s, gsl_multifit_function_fdf * fdf, const gsl_vector * x)
This function initializes, or reinitializes, an existing solver s to use the function and derivative fdf and the initial guess x.

Function: void gsl_multifit_fsolver_free (gsl_multifit_fsolver * s)
Function: void gsl_multifit_fdfsolver_free (gsl_multifit_fdfsolver * s)
These functions free all the memory associated with the solver s.

Function: const char * gsl_multifit_fsolver_name (const gsl_multifit_fsolver * s)
Function: const char * gsl_multifit_fdfsolver_name (const gsl_multifit_fdfsolver * s)
These functions return a pointer to the name of the solver. For example,

printf ("s is a '%s' solver\n",
gsl_multifit_fdfsolver_name (s));

would print something like s is a 'lmder' solver.

Providing the Function to be Minimized

You must provide n functions of p variables for the minimization algorithms to operate on. In order to allow for arbitrary parameters the functions are defined by the following data types:

Data Type: gsl_multifit_function
This data type defines a general system of functions with arbitrary parameters.

int (* f) (const gsl_vector * x, void * params, gsl_vector * f)
this function should store the vector result f(x,params) in f for argument x and arbitrary parameters params, returning an appropriate error code if the function cannot be computed.
size_t n
the number of functions, i.e. the number of components of the vector f.
size_t p
the number of independent variables, i.e. the number of components of the vector x.
void * params
a pointer to the arbitrary parameters of the function.

Data Type: gsl_multifit_function_fdf
This data type defines a general system of functions with arbitrary parameters and the corresponding Jacobian matrix of derivatives,

int (* f) (const gsl_vector * x, void * params, gsl_vector * f)
this function should store the vector result f(x,params) in f for argument x and arbitrary parameters params, returning an appropriate error code if the function cannot be computed.
int (* df) (const gsl_vector * x, void * params, gsl_matrix * J)
this function should store the n-by-p matrix result J_ij = d f_i(x,params) / d x_j in J for argument x and arbitrary parameters params, returning an appropriate error code if the function cannot be computed. If an analytic Jacobian is unavailable, or too expensive to compute, this function pointer may be set to NULL, in which case the Jacobian will be internally computed using finite difference approximations of the function f.
int (* fdf) (const gsl_vector * x, void * params, gsl_vector * f, gsl_matrix * J)
This function should set the values of the f and J as above, for arguments x and arbitrary parameters params. This function provides an optimization of the separate functions for f(x) and J(x)---it is always faster to compute the function and its derivative at the same time. If an analytic Jacobian is unavailable, or too expensive to compute, this function pointer may be set to NULL, in which case the Jacobian will be internally computed using finite difference approximations of the function f.
size_t n
the number of functions, i.e. the number of components of the vector f.
size_t p
the number of independent variables, i.e. the number of components of the vector x.
void * params
a pointer to the arbitrary parameters of the function.

Note that when fitting a non-linear model against experimental data, the data is passed to the functions above using the params argument and the trial best-fit parameters through the x argument.

Finite Difference Jacobian

For the algorithms which require a Jacobian matrix of derivatives of the fit functions, there are times when an analytic Jacobian may be unavailable or too expensive to compute. Therefore GSL supports approximating the Jacobian numerically using finite differences of the fit functions. This is typically done by setting the relevant function pointers of the gsl_multifit_function_fdf data type to NULL, however the following functions allow the user to access the approximate Jacobian directly if needed.

Function: int gsl_multifit_fdfsolver_dif_df (const gsl_vector * x, gsl_multifit_function_fdf * fdf, const gsl_vector * f, gsl_matrix * J)
This function takes as input the current position x with the function values computed at the current position f, along with fdf which specifies the fit function and parameters and approximates the n-by-p Jacobian J using forward finite differences: J_ij = d f_i(x,params) / d x_j = (f_i(x^*,params) - f_i(x,params)) / d x_j. where x^* has the jth element perturbed by \Delta x_j and \Delta x_j = \epsilon |x_j|, where \epsilon is the square root of the machine precision GSL_DBL_EPSILON.

Function: int gsl_multifit_fdfsolver_dif_fdf (const gsl_vector * x, gsl_multifit_function_fdf * fdf, gsl_vector * f, gsl_matrix * J)
This function computes the vector of function values f and the approximate Jacobian J at the position vector x using the system described in fdf. See gsl_multifit_fdfsolver_dif_df for a description of how the Jacobian is computed.

Iteration

The following functions drive the iteration of each algorithm. Each function performs one iteration to update the state of any solver of the corresponding type. The same functions work for all solvers so that different methods can be substituted at runtime without modifications to the code.

Function: int gsl_multifit_fsolver_iterate (gsl_multifit_fsolver * s)
Function: int gsl_multifit_fdfsolver_iterate (gsl_multifit_fdfsolver * s)
These functions perform a single iteration of the solver s. If the iteration encounters an unexpected problem then an error code will be returned. The solver maintains a current estimate of the best-fit parameters at all times.

The solver struct s contains the following entries, which can be used to track the progress of the solution:

gsl_vector * x
The current position.
gsl_vector * f
The function value at the current position.
gsl_vector * dx
The difference between the current position and the previous position, i.e. the last step, taken as a vector.
gsl_matrix * J
The Jacobian matrix at the current position (for the gsl_multifit_fdfsolver struct only)

The best-fit information also can be accessed with the following auxiliary functions,

Function: gsl_vector * gsl_multifit_fsolver_position (const gsl_multifit_fsolver * s)
Function: gsl_vector * gsl_multifit_fdfsolver_position (const gsl_multifit_fdfsolver * s)
These functions return the current position (i.e. best-fit parameters) s->x of the solver s.

Search Stopping Parameters

A minimization procedure should stop when one of the following conditions is true:

• A minimum has been found to within the user-specified precision.
• A user-specified maximum number of iterations has been reached.
• An error has occurred.

The handling of these conditions is under user control. The functions below allow the user to test the current estimate of the best-fit parameters in several standard ways.

Function: int gsl_multifit_test_delta (const gsl_vector * dx, const gsl_vector * x, double epsabs, double epsrel)

This function tests for the convergence of the sequence by comparing the last step dx with the absolute error epsabs and relative error epsrel to the current position x. The test returns GSL_SUCCESS if the following condition is achieved,

|dx_i| < epsabs + epsrel |x_i|

for each component of x and returns GSL_CONTINUE otherwise.

Function: int gsl_multifit_test_gradient (const gsl_vector * g, double epsabs)
This function tests the residual gradient g against the absolute error bound epsabs. Mathematically, the gradient should be exactly zero at the minimum. The test returns GSL_SUCCESS if the following condition is achieved,

\sum_i |g_i| < epsabs

and returns GSL_CONTINUE otherwise. This criterion is suitable for situations where the precise location of the minimum, x, is unimportant provided a value can be found where the gradient is small enough.

Function: int gsl_multifit_gradient (const gsl_matrix * J, const gsl_vector * f, gsl_vector * g)
This function computes the gradient g of \Phi(x) = (1/2) ||F(x)||^2 from the Jacobian matrix J and the function values f, using the formula g = J^T f.

High Level Driver

These routines provide a high level wrapper that combine the iteration and convergence testing for easy use.

Function: int gsl_multifit_fsolver_driver (gsl_multifit_fsolver * s, const size_t maxiter, const double epsabs, const double epsrel)
Function: int gsl_multifit_fdfsolver_driver (gsl_multifit_fdfsolver * s, const size_t maxiter, const double epsabs, const double epsrel)
These functions iterate the solver s for a maximum of maxiter iterations. After each iteration, the system is tested for convergence using gsl_multifit_test_delta with the error tolerances epsabs and epsrel.

Minimization Algorithms using Derivatives

The minimization algorithms described in this section make use of both the function and its derivative. They require an initial guess for the location of the minimum. There is no absolute guarantee of convergence--the function must be suitable for this technique and the initial guess must be sufficiently close to the minimum for it to work.

Derivative Solver: gsl_multifit_fdfsolver_lmsder
This is a robust and efficient version of the Levenberg-Marquardt algorithm as implemented in the scaled LMDER routine in MINPACK. Minpack was written by Jorge J. Moré, Burton S. Garbow and Kenneth E. Hillstrom.

The algorithm uses a generalized trust region to keep each step under control. In order to be accepted a proposed new position x' must satisfy the condition |D (x' - x)| < \delta, where D is a diagonal scaling matrix and \delta is the size of the trust region. The components of D are computed internally, using the column norms of the Jacobian to estimate the sensitivity of the residual to each component of x. This improves the behavior of the algorithm for badly scaled functions.

On each iteration the algorithm attempts to minimize the linear system |F + J p| subject to the constraint |D p| < \Delta. The solution to this constrained linear system is found using the Levenberg-Marquardt method.

The proposed step is now tested by evaluating the function at the resulting point, x'. If the step reduces the norm of the function sufficiently, and follows the predicted behavior of the function within the trust region, then it is accepted and the size of the trust region is increased. If the proposed step fails to improve the solution, or differs significantly from the expected behavior within the trust region, then the size of the trust region is decreased and another trial step is computed.

The algorithm also monitors the progress of the solution and returns an error if the changes in the solution are smaller than the machine precision. The possible error codes are,

GSL_ETOLF
the decrease in the function falls below machine precision
GSL_ETOLX
the change in the position vector falls below machine precision
GSL_ETOLG
the norm of the gradient, relative to the norm of the function, falls below machine precision
GSL_ENOPROG
the routine has made 10 or more attempts to find a suitable trial step without success (but subsequent calls can be made to continue the search).(15)

These error codes indicate that further iterations will be unlikely to change the solution from its current value.

Derivative Solver: gsl_multifit_fdfsolver_lmder
This is an unscaled version of the LMDER algorithm. The elements of the diagonal scaling matrix D are set to 1. This algorithm may be useful in circumstances where the scaled version of LMDER converges too slowly, or the function is already scaled appropriately.

Minimization Algorithms without Derivatives

There are no algorithms implemented in this section at the moment.

Computing the covariance matrix of best fit parameters

Function: int gsl_multifit_covar (const gsl_matrix * J, double epsrel, gsl_matrix * covar)
This function uses the Jacobian matrix J to compute the covariance matrix of the best-fit parameters, covar. The parameter epsrel is used to remove linear-dependent columns when J is rank deficient.

The covariance matrix is given by,

covar = (J^T J)^{-1}

and is computed by QR decomposition of J with column-pivoting. Any columns of R which satisfy

|R_{kk}| <= epsrel |R_{11}|

are considered linearly-dependent and are excluded from the covariance matrix (the corresponding rows and columns of the covariance matrix are set to zero).

If the minimisation uses the weighted least-squares function f_i = (Y(x, t_i) - y_i) / \sigma_i then the covariance matrix above gives the statistical error on the best-fit parameters resulting from the Gaussian errors \sigma_i on the underlying data y_i. This can be verified from the relation \delta f = J \delta c and the fact that the fluctuations in f from the data y_i are normalised by \sigma_i and so satisfy <\delta f \delta f^T> = I.

For an unweighted least-squares function f_i = (Y(x, t_i) - y_i) the covariance matrix above should be multiplied by the variance of the residuals about the best-fit \sigma^2 = \sum (y_i - Y(x,t_i))^2 / (n-p) to give the variance-covariance matrix \sigma^2 C. This estimates the statistical error on the best-fit parameters from the scatter of the underlying data.

Examples

The following example program fits a weighted exponential model with background to experimental data, Y = A \exp(-\lambda t) + b. The first part of the program sets up the functions expb_f and expb_df to calculate the model and its Jacobian. The appropriate fitting function is given by,

f_i = ((A \exp(-\lambda t_i) + b) - y_i)/\sigma_i

where we have chosen t_i = i. The Jacobian matrix J is the derivative of these functions with respect to the three parameters (A, \lambda, b). It is given by,

J_{ij} = d f_i / d x_j

where x_0 = A, x_1 = \lambda and x_2 = b.

/* expfit.c -- model functions for exponential + background */

struct data {
size_t n;
double * y;
double * sigma;
};

int
expb_f (const gsl_vector * x, void *data,
gsl_vector * f)
{
size_t n = ((struct data *)data)->n;
double *y = ((struct data *)data)->y;
double *sigma = ((struct data *) data)->sigma;

double A = gsl_vector_get (x, 0);
double lambda = gsl_vector_get (x, 1);
double b = gsl_vector_get (x, 2);

size_t i;

for (i = 0; i < n; i++)
{
/* Model Yi = A * exp(-lambda * i) + b */
double t = i;
double Yi = A * exp (-lambda * t) + b;
gsl_vector_set (f, i, (Yi - y[i])/sigma[i]);
}

return GSL_SUCCESS;
}

int
expb_df (const gsl_vector * x, void *data,
gsl_matrix * J)
{
size_t n = ((struct data *)data)->n;
double *sigma = ((struct data *) data)->sigma;

double A = gsl_vector_get (x, 0);
double lambda = gsl_vector_get (x, 1);

size_t i;

for (i = 0; i < n; i++)
{
/* Jacobian matrix J(i,j) = dfi / dxj, */
/* where fi = (Yi - yi)/sigma[i],      */
/*       Yi = A * exp(-lambda * i) + b  */
/* and the xj are the parameters (A,lambda,b) */
double t = i;
double s = sigma[i];
double e = exp(-lambda * t);
gsl_matrix_set (J, i, 0, e/s);
gsl_matrix_set (J, i, 1, -t * A * e/s);
gsl_matrix_set (J, i, 2, 1/s);
}
return GSL_SUCCESS;
}

int
expb_fdf (const gsl_vector * x, void *data,
gsl_vector * f, gsl_matrix * J)
{
expb_f (x, data, f);
expb_df (x, data, J);

return GSL_SUCCESS;
}

The main part of the program sets up a Levenberg-Marquardt solver and some simulated random data. The data uses the known parameters (1.0,5.0,0.1) combined with Gaussian noise (standard deviation = 0.1) over a range of 40 timesteps. The initial guess for the parameters is chosen as (0.0, 1.0, 0.0).

#include <stdlib.h>
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_blas.h>
#include <gsl/gsl_multifit_nlin.h>

#include "expfit.c"

#define N 40

void print_state (size_t iter, gsl_multifit_fdfsolver * s);

int
main (void)
{
const gsl_multifit_fdfsolver_type *T;
gsl_multifit_fdfsolver *s;
int status;
unsigned int i, iter = 0;
const size_t n = N;
const size_t p = 3;

gsl_matrix *covar = gsl_matrix_alloc (p, p);
double y[N], sigma[N];
struct data d = { n, y, sigma};
gsl_multifit_function_fdf f;
double x_init[3] = { 1.0, 0.0, 0.0 };
gsl_vector_view x = gsl_vector_view_array (x_init, p);
const gsl_rng_type * type;
gsl_rng * r;

gsl_rng_env_setup();

type = gsl_rng_default;
r = gsl_rng_alloc (type);

f.f = &expb_f;
f.df = &expb_df;
f.fdf = &expb_fdf;
f.n = n;
f.p = p;
f.params = &d;

/* This is the data to be fitted */

for (i = 0; i < n; i++)
{
double t = i;
y[i] = 1.0 + 5 * exp (-0.1 * t)
+ gsl_ran_gaussian (r, 0.1);
sigma[i] = 0.1;
printf ("data: %u %g %g\n", i, y[i], sigma[i]);
};

T = gsl_multifit_fdfsolver_lmsder;
s = gsl_multifit_fdfsolver_alloc (T, n, p);
gsl_multifit_fdfsolver_set (s, &f, &x.vector);

print_state (iter, s);

do
{
iter++;
status = gsl_multifit_fdfsolver_iterate (s);

printf ("status = %s\n", gsl_strerror (status));

print_state (iter, s);

if (status)
break;

status = gsl_multifit_test_delta (s->dx, s->x,
1e-4, 1e-4);
}
while (status == GSL_CONTINUE && iter < 500);

gsl_multifit_covar (s->J, 0.0, covar);

#define FIT(i) gsl_vector_get(s->x, i)
#define ERR(i) sqrt(gsl_matrix_get(covar,i,i))

{
double chi = gsl_blas_dnrm2(s->f);
double dof = n - p;
double c = GSL_MAX_DBL(1, chi / sqrt(dof));

printf("chisq/dof = %g\n",  pow(chi, 2.0) / dof);

printf ("A      = %.5f +/- %.5f\n", FIT(0), c*ERR(0));
printf ("lambda = %.5f +/- %.5f\n", FIT(1), c*ERR(1));
printf ("b      = %.5f +/- %.5f\n", FIT(2), c*ERR(2));
}

printf ("status = %s\n", gsl_strerror (status));

gsl_multifit_fdfsolver_free (s);
gsl_matrix_free (covar);
gsl_rng_free (r);
return 0;
}

void
print_state (size_t iter, gsl_multifit_fdfsolver * s)
{
printf ("iter: %3u x = % 15.8f % 15.8f % 15.8f "
"|f(x)| = %g\n",
iter,
gsl_vector_get (s->x, 0),
gsl_vector_get (s->x, 1),
gsl_vector_get (s->x, 2),
gsl_blas_dnrm2 (s->f));
}

The iteration terminates when the change in x is smaller than 0.0001, as both an absolute and relative change. Here are the results of running the program:

iter: 0 x=1.00000000 0.00000000 0.00000000 |f(x)|=117.349
status=success
iter: 1 x=1.64659312 0.01814772 0.64659312 |f(x)|=76.4578
status=success
iter: 2 x=2.85876037 0.08092095 1.44796363 |f(x)|=37.6838
status=success
iter: 3 x=4.94899512 0.11942928 1.09457665 |f(x)|=9.58079
status=success
iter: 4 x=5.02175572 0.10287787 1.03388354 |f(x)|=5.63049
status=success
iter: 5 x=5.04520433 0.10405523 1.01941607 |f(x)|=5.44398
status=success
iter: 6 x=5.04535782 0.10404906 1.01924871 |f(x)|=5.44397
chisq/dof = 0.800996
A      = 5.04536 +/- 0.06028
lambda = 0.10405 +/- 0.00316
b      = 1.01925 +/- 0.03782
status = success

The approximate values of the parameters are found correctly, and the chi-squared value indicates a good fit (the chi-squared per degree of freedom is approximately 1). In this case the errors on the parameters can be estimated from the square roots of the diagonal elements of the covariance matrix.

If the chi-squared value shows a poor fit (i.e. chi^2/dof >> 1) then the error estimates obtained from the covariance matrix will be too small. In the example program the error estimates are multiplied by \sqrt{\chi^2/dof} in this case, a common way of increasing the errors for a poor fit. Note that a poor fit will result from the use an inappropriate model, and the scaled error estimates may then be outside the range of validity for Gaussian errors.