GNU Astronomy Utilities



7.1.5.4 Fitting options

With the options below, you can customize the least squares fitting features of Statistics. For a tutorial of the usage of least squares fitting in Statistics, please see Least squares fitting. Here, we will just review the details of each option.

To activate least squares fitting in Statistics, it is necessary to use the --fit option to specify the type of fit you want to do. See the description of --fit for the various available fitting models. The fitting models that account for weights require three input columns, while the non-weighted ones only take two input columns. Here is a summary of the input columns:

  1. The first input column is assumed to be the independent variable (on the horizontal axis of a plot, or \(X\) in the equations of each fit).
  2. The second input column is assumed to be the measured value (on the vertical axis of a plot, or \(Y\) in the equation above).
  3. The third input column is only for fittings with a weight. It is assumed to be the “weight” of the measurement column. The nature of the “weight” can be set with the --fitweight option, for example, if you have the standard deviation of the error in \(Y\), you can use --fitweight=std (which is the default, so unless the default value has been changed, you will not need to set this).

If three columns are given to a model without weight, or two columns are given to a model that requires weights, Statistics will abort and inform you. Below you can see an example of fitting with the same linear model, once weighted and once without weights.

$ aststatistics table.fits --column=X,Y      --fit=linear
$ aststatistics table.fits --column=X,Y,Yerr --fit=linear-weighted

The output of the fitting can be in three modes listed below. For a complete example, see the tutorial in Least squares fitting).

Human friendly format

By default (for example, the commands above) the output is an elaborate description of the model parameters. For example, \(c_0\) and \(c_1\) in the linear model (\(Y=c_0+c_1X\)). Their covariance matrix and the reduced \(\chi^2\) of the fit are also printed on the output.

Raw numbers

If you don’t need the human friendly components of the output (which are annoying when you want to parse the outputs in some scenarios), you can use --quiet option. Only the raw output numbers will be printed.

Estimate on a custom X column

Through the --fitestimate option, you can specify an independent table column to estimate the fit (it can also take a single value). See the description of this option for more.

-f STR
--fit=STR

The name of the fitting method to use. They are based on the linear and nonlinear least-squares fitting functions of the GNU Scientific Library (GSL).

linear

\(Y=c_0+c_1X\)

linear-weighted

\(Y=c_0+c_1X\); accounting for “weights” in \(Y\).

linear-no-constant

\(Y=c_1X\).

linear-no-constant-weighted

\(Y=c_1X\); accounting for “weights” in \(Y\).

polynomial

\(Y=c_0+c_1X+c_2X^2+\cdots+c_nX^n\); the maximum required power (\(n\)) is specified by --fitmaxpower.

polynomial-weighted

\(Y=c_0+c_1X+c_2X^2+\cdots+c_nX^n\); accounting for “weights” in \(Y\). The maximum required power (\(n\)) is specified by --fitmaxpower.

polynomial-robust

\(Y=c_0+c_1X+c_2X^2+\cdots+c_nX^n\); rejects outliers. The function to use for outlier removal can be specified with the --fitrobust option described below. This model doesn’t take weights since they are calculated internally based on the outlier removal function (requires two input columns). The maximum required power (\(n\)) is specified by --fitmaxpower.

For a comprehensive review of “robust” fitting and the available functions, please see the Robust linear regression section of the GNU Scientific Library.

--fitweight=STR

The nature of the “weight” column (when a weight is necessary for the model). It can take one of the following values:

std

Standard deviation of each \(Y\) axis measurement: this is the usual “error” associated with a measurement (for example, in MakeCatalog) and is the default value to this option.

var

Variance of each \(Y\) axis measurement. Assuming a Gaussian distribution with standard deviation \(\sigma\), the variance is \(\sigma^2\).

inv-var

Inverse variance of each \(Y\) axis measurement. Assuming a Gaussian distribution with standard deviation \(\sigma\), the variance is \(1/\sigma^2\).

--fitmaxpower=INT

The maximum power (an integer) in a polynomial (\(n\) in \(Y=c_0+c_1X+c_2X^2+\cdots+c_nX^n\)). This is only relevant when one of the polynomial models is given to --fit. The fit will return \(n+1\) coefficients.

--fitrobust=STR

The function for rejecting outliers in the polynomial-robust fitting model. For a comprehensive review of “robust” fitting and the available functions, please see the Robust linear regression section of the GNU Scientific Library. This function can take the following values:

bisquare

Tukey’s biweight (bisquare) function, this is the default function. According to the GSL manual, this is a good general purpose weight function.

cauchy

Cauchy’s function (also known as the Lorentzian function). It doesn’t guarantee a unique solution, so it should be used with care.

fair

The fair function. It guarantees a unique solution and has continuous derivatives to three orders.

huber

Huber’s \(\rho\) function. This is also a good general purpose weight function for rejecting outliers, but can cause difficulty in some special scenarios.

ols

Ordinary Least Squares (OLS) solution with a constant weight of unity.

welsch

Welsch function which is useful when the residuals follow an exponential distribution.

--fitestimate=STR/FLT

Estimate the fitted function at a single point or a complete column of points. The input \(X\) axis positions to estimate the function can be specified in the following ways:

  • A real number: the fitted function will be estimated at that \(X\) position and the corresponding \(Y\) and its error will be printed to standard output.
  • self: in this mode, the same X axis column that was used in the fit will be used for estimating the fitted function. This can be useful to visually/easily check the fit, see Least squares fitting.
  • A file name: If the value is none of the above, Statistics expects it to be a file name containing a table. If the file is a FITS file, the HDU containing the table should be specified with the --fitestimatehdu option. The column of the table to use for the \(X\) axis points should be specified with the --fitestimatecol option.

The output in this mode can be customized in the following ways:

  • If a single floating point value is given --fitestimate, the fitted function will be estimated on that point and printed to standard output.
  • When nothing is given to --output, the independent column and the estimated values and errors are printed on the standard output.
  • If a file name is given to --output, the estimated table above is saved in that file. It can have any of the formats in Recognized table formats. As a FITS file, all the fit outputs (coefficients, covariance matrix and reduced \(\chi^2\)) are kept as FITS keywords in the same HDU of the estimated table. For a complete example, see Least squares fitting.

    When the covariance matrix (and thus the \(\chi^2\)) cannot be calculated (for example if you only have two rows!), the printed values on the terminal will be NaN. However, the FITS standard does not allow NaN values in keyword values! Therefore, when writing the \(\chi^2\) and covariance matrix elements into the output FITS keywords, the largest value of the 64-bit floating point type will be written: \(1.79769313486232\times10^{308}\); see Numeric data types.

  • When --quiet is given with --fitestimate, the fitted parameters are no longer printed on the standard output; they are available as FITS keywords in the file given to --output.
--fitestimatehdu=STR/INT

HDU name or counter (counting from zero) that contains the table to be used for the estimating the fitted function over many points through --fitestimate. For more on selecting a HDU, see the description of --hdu in Input/Output options.

--fitestimatecol=STR/INT

Column name or counter (counting from one) that contains the table to be used for the estimating the fitted function over many points through --fitestimate. See Selecting table columns.