GNU Astronomy Utilities



12.3.28 Interpolation (interpolate.h)

During data analysis, it happens that parts of the data cannot be given a value, but one is necessary for the higher-level analysis. For example, a very bright star saturated part of your image and you need to fill in the saturated pixels with some values. Another common usage case are masked sky-lines in 1D spectra that similarly need to be assigned a value for higher-level analysis. In other situations, you might want a value in an arbitrary point: between the elements/pixels where you have data. The functions described in this section are for such operations.

The parametric interpolations discussed below are wrappers around the interpolation functions of the GNU Scientific Library (or GSL, see GNU Scientific Library). To identify the different GSL interpolation types, Gnuastro’s gnuastro/interpolate.h header file contains macros that are discussed below. The GSL wrappers provided here are not yet complete because we are too busy. If you need them, please consider helping us in adding them to Gnuastro’s library. Your contributions would be very welcome and appreciated.

Macro: GAL_INTERPOLATE_NEIGHBORS_METRIC_RADIAL
Macro: GAL_INTERPOLATE_NEIGHBORS_METRIC_MANHATTAN
Macro: GAL_INTERPOLATE_NEIGHBORS_METRIC_INVALID

The metric used to find distance for nearest neighbor interpolation. A radial metric uses the simple Euclidean function to find the distance between two pixels. A manhattan metric will always be an integer and is like steps (but is also much faster to calculate than radial metric because it does not need a square root calculation).

Macro: GAL_INTERPOLATE_NEIGHBORS_FUNC_MIN
Macro: GAL_INTERPOLATE_NEIGHBORS_FUNC_MAX
Macro: GAL_INTERPOLATE_NEIGHBORS_FUNC_MEAN
Macro: GAL_INTERPOLATE_NEIGHBORS_FUNC_MEDIAN
Macro: GAL_INTERPOLATE_NEIGHBORS_FUNC_INVALID

The various types of nearest-neighbor interpolation functions for gal_interpolate_neighbors. The names are descriptive for the operation they do, so we will not go into much more detail here. The median operator will be one of the most used, but operators like the maximum are good to fill the center of saturated stars.

Function:
gal_data_t *
gal_interpolate_neighbors (gal_data_t *input, struct gal_tile_two_layer_params *tl, uint8_t metric, size_t numneighbors, size_t numthreads, int onlyblank, int aslinkedlist, int function)

Interpolate the values in the input dataset using a calculated statistics from the distribution of their numneighbors closest neighbors. The desired statistics is determined from the func argument, which takes any of the GAL_INTERPOLATE_NEIGHBORS_FUNC_ macros (see above). This function is non-parametric and thus agnostic to the input’s number of dimension or shape of the distribution.

Distance can be defined on different metrics that are identified through metric (taking values determined by the GAL_INTERPOLATE_NEIGHBORS_METRIC_ macros described above). If onlyblank is non-zero, then only blank elements will be interpolated and pixels that already have a value will be left untouched. This function is multi-threaded and will run on numthreads threads (see gal_threads_number in Multithreaded programming (threads.h)).

tl is Gnuastro’s tessellation structure used to define tiles over an image and is fully described in Tile grid. When tl!=NULL, then it is assumed that the input->array contains one value per tile and interpolation will respect certain tessellation properties, for example, to not interpolate over channel borders.

If several datasets have the same set of blank values, you do not need to call this function multiple times. When aslinkedlist is non-zero, then input will be seen as a List of gal_data_t. In this case, the same neighbors will be used for all the datasets in the list. Of course, the values for each dataset will be different, so a different value will be written in each dataset, but the neighbor checking that is the most CPU intensive part will only be done once.

This is a non-parametric and robust function for interpolation. The interpolated values are also always within the range of the non-blank values and strong outliers do not get created. However, this type of interpolation must be used with care when there are gradients. This is because it is non-parametric and if there are not enough neighbors, step-like features can be created.

Macro: GAL_INTERPOLATE_1D_INVALID

This is just a place-holder to manage errors.

Macro: GAL_INTERPOLATE_1D_LINEAR

[From GSL:] Linear interpolation. This interpolation method does not require any additional memory.

Macro: GAL_INTERPOLATE_1D_POLYNOMIAL

[From GSL:] Polynomial interpolation. This method should only be used for interpolating small numbers of points because polynomial interpolation introduces large oscillations, even for well-behaved datasets. The number of terms in the interpolating polynomial is equal to the number of points.

Macro: GAL_INTERPOLATE_1D_CSPLINE

[From GSL:] Cubic spline with natural boundary conditions. The resulting curve is piece-wise cubic on each interval, with matching first and second derivatives at the supplied data-points. The second derivative is chosen to be zero at the first point and last point.

Macro: GAL_INTERPOLATE_1D_CSPLINE_PERIODIC

[From GSL:] Cubic spline with periodic boundary conditions. The resulting curve is piece-wise cubic on each interval, with matching first and second derivatives at the supplied data-points. The derivatives at the first and last points are also matched. Note that the last point in the data must have the same y-value as the first point, otherwise the resulting periodic interpolation will have a discontinuity at the boundary.

Macro: GAL_INTERPOLATE_1D_AKIMA

[From GSL:] Non-rounded Akima spline with natural boundary conditions. This method uses the non-rounded corner algorithm of Wodicka.

Macro: GAL_INTERPOLATE_1D_AKIMA_PERIODIC

[From GSL:] Non-rounded Akima spline with periodic boundary conditions. This method uses the non-rounded corner algorithm of Wodicka.

Macro: GAL_INTERPOLATE_1D_STEFFEN

[From GSL:] Steffen’s method276 guarantees the monotonicity of the interpolating function between the given data points. Therefore, minima and maxima can only occur exactly at the data points, and there can never be spurious oscillations between data points. The interpolated function is piece-wise cubic in each interval. The resulting curve and its first derivative are guaranteed to be continuous, but the second derivative may be discontinuous.

Function:
gsl_spline *
gal_interpolate_1d_make_gsl_spline (gal_data_t *X, gal_data_t *Y, int type_1d)

Allocate and initialize a GNU Scientific Library (GSL) 1D gsl_spline structure using the non-blank elements of Y. type_1d identifies the interpolation scheme and must be one of the GAL_INTERPOLATE_1D_* macros defined above.

If X==NULL, the X-axis is assumed to be integers starting from zero (the index of each element in Y). Otherwise, the values in X will be used to initialize the interpolation structure. Note that when given, X must not contain any blank elements and it must be sorted (in increasing order).

Each interpolation scheme needs a minimum number of elements to successfully operate. If the number of non-blank values in Y is less than this number, this function will return a NULL pointer.

To be as generic and modular as possible, GSL’s tools are low-level. Therefore before doing the interpolation, many steps are necessary (like preparing your dataset, then allocating and initializing gsl_spline). The metadata available in Gnuastro’s Generic data container (gal_data_t) make it easy to hide all those preparations within this function.

Once gsl_spline has been initialized by this function, the interpolation can be evaluated for any X value within the non-blank range of the input using gsl_spline_eval or gsl_spline_eval_e.

For example, in the small program below (sample-interp.c), we read the first two columns of the table in table.txt and feed them to this function to later estimate the values in the second column for three selected points. You can use BuildProgram to compile and run this function, see Library demo programs for more.

Contents of the table.txt file:

$ cat table.txt
0  0
1  2
3  6
4  8
6  12
8  16
9  18

Contents of the sample-interp.c file:

#include <stdio.h>
#include <stdlib.h>
#include <gnuastro/table.h>
#include <gnuastro/interpolate.h>

int
main(void)
{
  size_t i;
  gal_data_t *X, *Y;
  gsl_spline *spline;
  gsl_interp_accel *acc;
  gal_list_str_t *cols=NULL;

  /* Change the values based on your input table. */
  double points[]={1.8, 2.5, 7};

  /* Read the first two columns from `tab.txt'.
     IMPORTANT: the list is first-in-first-out, so the output
     column order is the inverse of the input order. */
  gal_list_str_add(&cols, "1", 0);
  gal_list_str_add(&cols, "2", 0);
  Y=gal_table_read("table.txt", NULL, NULL, cols,
                   GAL_TABLE_SEARCH_NAME, 0, 1, -1, 1, NULL);
  X=Y->next;

  /* Allocate the GSL interpolation accelerator and make the
     `gsl_spline' structure. */
  acc=gsl_interp_accel_alloc();
  spline=gal_interpolate_1d_make_gsl_spline(X, Y,
                                 GAL_INTERPOLATE_1D_STEFFEN);

  /* Calculate the respective value for all the given points,
     if `spline' could be allocated. */
  if(spline)
    for(i=0; i<(sizeof points)/(sizeof *points); ++i)
      printf("%f: %f\n", points[i],
             gsl_spline_eval(spline, points[i], acc));

  /* Clean up and return. */
  gal_data_free(X);
  gal_data_free(Y);
  gsl_spline_free(spline);
  gsl_interp_accel_free(acc);
  gal_list_str_free(cols, 0);
  return EXIT_SUCCESS;
}

Compile and run this program with BuildProgram to see the interpolation results for the three points within the program.

$ astbuildprog sample-interp.c --quiet
1.800000: 3.600000
2.500000: 5.000000
7.000000: 14.000000
Function:
void
gal_interpolate_1d_blank (gal_data_t *in, int type_1d)

Fill the blank elements of in using the rest of the elements and the given interpolation. The interpolation scheme can be set through type_1d, which accepts any of the GAL_INTERPOLATE_1D_* macros above. The interpolation is internally done in 64-bit floating point type (double). However the evaluated/interpolated values (originally blank) will be written (in in) with its original numeric datatype, using C’s standard type conversion.

By definition, interpolation is only defined “between” valid points. Therefore, if any number of elements on the start or end of the 1D array are blank, those elements will not be interpolated and will remain blank. To see if any blank (non-interpolated) elements remain, you can use gal_blank_present on in after this function is finished.


Footnotes

(276)

http://adsabs.harvard.edu/abs/1990A%26A...239..443S