GNU Astronomy Utilities



7.1.5 Invoking Statistics

Statistics prints statistical measures of an input dataset (table column or image). The executable name is aststatistics, with the following general template:

$ aststatistics [OPTION ...] InputImage.fits

One-line examples:

## Print some general statistics of input image:
$ aststatistics image.fits

## Print some general statistics of column named MAG_F160W:
$ aststatistics catalog.fits -h1 --column=MAG_F160W

## Make the histogram of the column named MAG_F160W:
$ aststatistics table.fits -cMAG_F160W --histogram

## Find the Sky value on image with a given kernel:
$ aststatistics image.fits --sky --kernel=kernel.fits

## Print Sigma-clipped results of records with a MAG_F160W
## column value between 26 and 27:
$ aststatistics cat.fits -cMAG_F160W -g26 -l27 --sigmaclip=3,0.2

## Find the polynomial (to third order) that best fits the X and Y
## columns of 'table.fits'. Robust fitting will be used to reject
## outliers. Also, estimate the fitted polynomial on the same input
## column (with errors).
$ aststatistics table.fits --fit=polynomial-robust --fitmaxpower=3 \
                -cX,Y --fitestimate=self --output=estimated.fits

## Print the median value of all records in column MAG_F160W that
## have a value greater than or equal to 3 in column PHOTO_Z:
$ aststatistics tab.txt -rPHOTO_Z -g3 -cMAG_F160W --median

## Calculate the median of the third column in the input table, but only
## for rows where the mean of the first and second columns is >5.
$ awk '($1+$2)/2 > 5 {print $3}' table.txt | aststatistics --median

Statistics can take its input dataset either from a file (image or table) or from the standard input (see Standard input). If any output file is to be created, the value given to the --output option is used as the base name for the generated files. Without --output, the input name is used to generate an output name (see Automatic output). The options described below are particular to Statistics, but for general operations it shares a large collection of options with the other Gnuastro programs; see Common options for the full list. For more on reading from standard input, see the description of the --stdintimeout option in Input/Output options. Options can also be given in configuration files; for more, see Configuration files.

The input dataset may have blank values (see Blank pixels); in this case, all blank elements are ignored during the calculation. Initially, the full dataset will be read, but it is possible to select a specific range of data elements to use in the analysis of each run. You can either directly specify a minimum and maximum value for the range of data elements to use (with --greaterequal and --lessthan), or specify the range using quantiles (with --qrange). If a range is specified, all elements outside of it are ignored before any processing.
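The bounds of such a range can be pictured with a small awk filter. This is only an illustration of an inclusive lower limit and an exclusive upper limit, not how Statistics is implemented; the function name select_range and the input values are made up for the demonstration.

```shell
# Illustration only: mimic the effect of --greaterequal=26 --lessthan=27
# on a single column of numbers (one value per line).
select_range() {
    # $1: lower bound (inclusive), $2: upper bound (exclusive).
    awk -v lo="$1" -v hi="$2" '$1 >= lo && $1 < hi { print $1 }'
}

# 26 is kept (inclusive lower bound), 27 is dropped (exclusive upper bound).
printf '%s\n' 25.9 26 26.5 27 27.3 | select_range 26 27
```

Only the values 26 and 26.5 pass the filter, so any later measurement would be made on those two elements alone.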

When no operation is requested, Statistics prints some basic properties of the input dataset on the command-line, as in the example below (run on one of the output images of make check; see footnote 199). This default behavior is designed to give you a general feeling of how the data are distributed and to help narrow down your analysis.

$ aststatistics convolve_spatial_scaled_noised.fits     \
                --greaterequal=9500 --lessthan=11000
Statistics (GNU Astronomy Utilities) X.X
-------
Input: convolve_spatial_scaled_noised.fits (hdu: 0)
Range: from (inclusive) 9500, upto (exclusive) 11000.
Unit: counts
-------
  Number of elements:                      9074
  Minimum:                                 9622.35
  Maximum:                                 10999.7
  Mode:                                    10055.45996
  Mode quantile:                           0.4001983908
  Median:                                  10093.7
  Mean:                                    10143.98257
  Standard deviation:                      221.80834
-------
Histogram:
 |                   **
 |                 ******
 |                 *******
 |                *********
 |              *************
 |              **************
 |            ******************
 |            ********************
 |          *************************** *
 |        ***************************************** ***
 |*  **************************************************************
 |-----------------------------------------------------------------

Gnuastro’s Statistics is a very general-purpose program. To make this diversity of operations easier to understand (and to see how they can be run together), we divide the operations into two types: those that do not respect the position of the elements and those that do (by tessellating the input on a tile grid, see Tessellation). The former treat the whole dataset as one and can re-arrange all the elements (for example, sort them), while the latter do their processing on each tile independently. First, we will review the operations that work on the whole dataset.
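The difference between the two types can be pictured with a toy one-dimensional dataset. The sketch below is a conceptual illustration only, not Gnuastro code: the function name tile_means, the tile size and the six input values are all made up, and any trailing incomplete tile is simply ignored.

```shell
# Conceptual illustration: one mean per tile of N consecutive elements,
# followed by the mean of the whole dataset.
tile_means() {
    # $1: number of elements per tile; values are read one per line.
    awk -v n="$1" '
        { sum += $1; all += $1; count++ }
        count % n == 0 {
            printf "tile %d mean: %g\n", count / n, sum / n
            sum = 0
        }
        END { printf "whole mean: %g\n", all / count }'
}

printf '%s\n' 1 2 3 10 20 30 | tile_means 3
```

The per-tile results keep the information that the second half of the data is much brighter than the first, while the whole-dataset mean reduces everything to a single number.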

The group of options below can be used to get single-value measurement(s) of the whole dataset. Each prints only the requested value as one field in a line/row, like the --mean and --median options. These options can be called any number of times and in any order. The outputs of all such options will be printed on one line following each other (with a space character between them). This feature makes these options very useful in scripts, or for redirecting into programs like GNU AWK for higher-level processing. These are some of the most basic measures; Gnuastro is still under heavy development and this list will grow. If you want another statistical parameter, please contact us and we will do our best to add it to this list, see Suggest new feature.
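Because the requested values arrive as space-separated fields on one line, a script can pass them straight to AWK. The following is a minimal sketch, assuming the fields appear in the order the options were given; the helper name mean_minus_median is hypothetical, and the two numbers are the Mean and Median of the sample output shown earlier, standing in for a real run.

```shell
# Hypothetical helper: read one line holding "MEAN MEDIAN" and print
# their difference to two decimals.
mean_minus_median() {
    awk '{ printf "%.2f\n", $1 - $2 }'
}

# With Gnuastro installed, the line would come from Statistics itself:
#   aststatistics image.fits --mean --median | mean_minus_median
# Here the sample values are used so the sketch runs on its own.
echo "10143.98257 10093.7" | mean_minus_median
```

A positive difference like this one indicates a mean that is pulled above the median, which is the kind of quick check these one-line outputs are meant to make easy.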


Footnotes

(199)

You can try it by running the command in the tests directory. Then open the image with a FITS viewer to get a sense of how these statistics relate to the input image/dataset.