Basis Splines

This chapter describes functions for the computation of smoothing basis splines (B-splines). A smoothing spline differs from an interpolating spline in that the resulting curve is not required to pass through each datapoint. For information about interpolating splines, see Interpolation.

The header file gsl_bspline.h contains the prototypes for the bspline functions and related declarations.

Overview

B-splines are commonly used as basis functions to fit smoothing curves to large data sets. To do this, the abscissa axis is broken up into some number of intervals, where the endpoints of each interval are called breakpoints. These breakpoints are then converted to knots by imposing various continuity and smoothness conditions at each interface. Given a nondecreasing knot vector

t = \{t_0, t_1, \dots, t_{n+k-1}\}

the n basis splines of order k are defined by

B_{i,1}(x) &=
  \left\{
    \begin{array}{cc}
      1, & t_i \le x < t_{i+1} \\
      0, & else
    \end{array}
  \right. \\
B_{i,k}(x) &= {(x - t_i) \over (t_{i+k-1} - t_i)} B_{i,k-1}(x) +
              {(t_{i+k} - x) \over (t_{i+k} - t_{i+1})} B_{i+1,k-1}(x)

for i = 0, \ldots, n-1. The common case of cubic B-splines is given by k = 4. The above recurrence relation can be evaluated in a numerically stable way by the de Boor algorithm.

If we define appropriate knots on an interval [a,b] then the B-spline basis functions form a complete set on that interval. Therefore we can expand a smoothing function as

f(x) = \sum_{i=0}^{n-1} c_i B_{i,k}(x)

given enough (x_j, f(x_j)) data pairs. The coefficients c_i can be readily obtained from a least-squares fit.

Initializing the B-splines solver

type gsl_bspline_workspace

The computation of B-spline functions requires a preallocated workspace.

gsl_bspline_workspace *gsl_bspline_alloc(const size_t k, const size_t nbreak)

This function allocates a workspace for computing B-splines of order k. The number of breakpoints is given by nbreak. This leads to n = nbreak + k - 2 basis functions. Cubic B-splines are specified by k = 4. The size of the workspace is O(2k^2 + 5k + nbreak).

void gsl_bspline_free(gsl_bspline_workspace *w)

This function frees the memory associated with the workspace w.

Constructing the knots vector

int gsl_bspline_knots(const gsl_vector *breakpts, gsl_bspline_workspace *w)

This function computes the knots associated with the given breakpoints and stores them internally in w->knots.

int gsl_bspline_knots_uniform(const double a, const double b, gsl_bspline_workspace *w)

This function assumes uniformly spaced breakpoints on [a,b] and constructs the corresponding knot vector using the previously specified nbreak parameter. The knots are stored in w->knots.

Evaluation of B-splines

int gsl_bspline_eval(const double x, gsl_vector *B, gsl_bspline_workspace *w)

This function evaluates all B-spline basis functions at the position x and stores them in the vector B, so that the i-th element is B_i(x). The vector B must be of length n = nbreak + k - 2. This value may also be obtained by calling gsl_bspline_ncoeffs(). Computing all the basis functions at once is more efficient than computing them individually, due to the nature of the defining recurrence relation.

int gsl_bspline_eval_nonzero(const double x, gsl_vector *Bk, size_t *istart, size_t *iend, gsl_bspline_workspace *w)

This function evaluates all potentially nonzero B-spline basis functions at the position x and stores them in the vector Bk, so that the i-th element is B_{(istart+i)}(x). The last element of Bk is B_{iend}(x). The vector Bk must be of length k. By returning only the nonzero basis functions, this function allows quantities involving linear combinations of the B_i(x) to be computed without unnecessary terms (such linear combinations occur, for example, when evaluating an interpolated function).

size_t gsl_bspline_ncoeffs(gsl_bspline_workspace *w)

This function returns the number of B-spline coefficients given by n = nbreak + k - 2.

Evaluation of B-spline derivatives

int gsl_bspline_deriv_eval(const double x, const size_t nderiv, gsl_matrix *dB, gsl_bspline_workspace *w)

This function evaluates all B-spline basis function derivatives of orders 0 through nderiv (inclusive) at the position x and stores them in the matrix dB. The (i,j)-th element of dB is d^jB_i(x)/dx^j. The matrix dB must be of size n = nbreak + k - 2 by nderiv + 1. The value n may also be obtained by calling gsl_bspline_ncoeffs(). Note that function evaluations are included as the zeroth order derivatives in dB. Computing all the basis function derivatives at once is more efficient than computing them individually, due to the nature of the defining recurrence relation.

int gsl_bspline_deriv_eval_nonzero(const double x, const size_t nderiv, gsl_matrix *dB, size_t *istart, size_t *iend, gsl_bspline_workspace *w)

This function evaluates all potentially nonzero B-spline basis function derivatives of orders 0 through nderiv (inclusive) at the position x and stores them in the matrix dB. The (i,j)-th element of dB is d^jB_{(istart+i)}(x)/dx^j. The last row of dB contains d^jB_{iend}(x)/dx^j. The matrix dB must be of size k by at least nderiv + 1. Note that function evaluations are included as the zeroth order derivatives in dB. By returning only the nonzero basis functions, this function allows quantities involving linear combinations of the B_i(x) and their derivatives to be computed without unnecessary terms.

Working with the Greville abscissae

The Greville abscissae are defined to be the mean location of k-1 consecutive knots in the knot vector for each basis spline function of order k. With the first and last knots in the gsl_bspline_workspace knot vector excluded, there are gsl_bspline_ncoeffs() Greville abscissae for any given B-spline basis. These values are often used in B-spline collocation applications and may also be called Marsden-Schoenberg points.

double gsl_bspline_greville_abscissa(size_t i, gsl_bspline_workspace *w)

Returns the location of the i-th Greville abscissa for the given B-spline basis. For the ill-defined case when k = 1, the implementation chooses to return breakpoint interval midpoints.

Examples

The following program computes a linear least squares fit to data using cubic B-spline basis functions with uniform breakpoints. The data is generated from the curve y(x) = \cos{(x)} \exp{(-x/10)} on the interval [0, 15] with Gaussian noise added.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <gsl/gsl_bspline.h>
#include <gsl/gsl_multifit.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_statistics.h>

/* number of data points to fit */
#define N        200

/* number of fit coefficients */
#define NCOEFFS  12

/* nbreak = ncoeffs + 2 - k = ncoeffs - 2 since k = 4 */
#define NBREAK   (NCOEFFS - 2)

int
main (void)
{
  const size_t n = N;
  const size_t ncoeffs = NCOEFFS;
  const size_t nbreak = NBREAK;
  size_t i, j;
  gsl_bspline_workspace *bw;
  gsl_vector *B;
  double dy;
  gsl_rng *r;
  gsl_vector *c, *w;
  gsl_vector *x, *y;
  gsl_matrix *X, *cov;
  gsl_multifit_linear_workspace *mw;
  double chisq, Rsq, dof, tss;

  gsl_rng_env_setup();
  r = gsl_rng_alloc(gsl_rng_default);

  /* allocate a cubic bspline workspace (k = 4) */
  bw = gsl_bspline_alloc(4, nbreak);
  B = gsl_vector_alloc(ncoeffs);

  x = gsl_vector_alloc(n);
  y = gsl_vector_alloc(n);
  X = gsl_matrix_alloc(n, ncoeffs);
  c = gsl_vector_alloc(ncoeffs);
  w = gsl_vector_alloc(n);
  cov = gsl_matrix_alloc(ncoeffs, ncoeffs);
  mw = gsl_multifit_linear_alloc(n, ncoeffs);

  /* this is the data to be fitted */
  for (i = 0; i < n; ++i)
    {
      double sigma;
      double xi = (15.0 / (N - 1)) * i;
      double yi = cos(xi) * exp(-0.1 * xi);

      sigma = 0.1 * yi;
      dy = gsl_ran_gaussian(r, sigma);
      yi += dy;

      gsl_vector_set(x, i, xi);
      gsl_vector_set(y, i, yi);
      gsl_vector_set(w, i, 1.0 / (sigma * sigma));

      printf("%f %f\n", xi, yi);
    }

  /* use uniform breakpoints on [0, 15] */
  gsl_bspline_knots_uniform(0.0, 15.0, bw);

  /* construct the fit matrix X */
  for (i = 0; i < n; ++i)
    {
      double xi = gsl_vector_get(x, i);

      /* compute B_j(xi) for all j */
      gsl_bspline_eval(xi, B, bw);

      /* fill in row i of X */
      for (j = 0; j < ncoeffs; ++j)
        {
          double Bj = gsl_vector_get(B, j);
          gsl_matrix_set(X, i, j, Bj);
        }
    }

  /* do the fit */
  gsl_multifit_wlinear(X, w, y, c, cov, &chisq, mw);

  dof = n - ncoeffs;
  tss = gsl_stats_wtss(w->data, 1, y->data, 1, y->size);
  Rsq = 1.0 - chisq / tss;

  fprintf(stderr, "chisq/dof = %e, Rsq = %f\n",
                   chisq / dof, Rsq);

  printf("\n\n");

  /* output the smoothed curve */
  {
    double xi, yi, yerr;

    for (xi = 0.0; xi < 15.0; xi += 0.1)
      {
        gsl_bspline_eval(xi, B, bw);
        gsl_multifit_linear_est(B, c, cov, &yi, &yerr);
        printf("%f %f\n", xi, yi);
      }
  }

  gsl_rng_free(r);
  gsl_bspline_free(bw);
  gsl_vector_free(B);
  gsl_vector_free(x);
  gsl_vector_free(y);
  gsl_matrix_free(X);
  gsl_vector_free(c);
  gsl_vector_free(w);
  gsl_matrix_free(cov);
  gsl_multifit_linear_free(mw);

  return 0;
} /* main() */

The output is shown below:

$ ./a.out > bspline.txt
chisq/dof = 1.118217e+00, Rsq = 0.989771

The data and fitted model are shown in Fig. 39.

_images/bspline.png

Fig. 39 Data (black) and fitted model (red)

References and Further Reading

Further information on the algorithms described in this section can be found in the following book,

  • C. de Boor, A Practical Guide to Splines (1978), Springer-Verlag, ISBN 0-387-90356-9.

Further information of Greville abscissae and B-spline collocation can be found in the following paper,

  • Richard W. Johnson, Higher order B-spline collocation at the Greville abscissae. Applied Numerical Mathematics. vol.: 52, 2005, 63–75.

A large collection of B-spline routines is available in the PPPACK library available at http://www.netlib.org/pppack, which is also part of SLATEC.