GNU Astronomy Utilities



7.1.5.2 Single value measurements

-n
--number

Print the number of all used (non-blank and in range) elements.

--minimum

Print the minimum value of all used elements.

--maximum

Print the maximum value of all used elements.

--sum

Print the sum of all used elements.

-m
--mean

Print the mean (average) of all used elements.

-t
--std

Print the standard deviation of all used elements.

--mad

Print the median absolute deviation (MAD) of all used elements.

-E
--median

Print the median of all used elements.

-u FLT[,FLT[,...]]
--quantile=FLT[,FLT[,...]]

Print the values at the given quantiles of the input dataset. Any number of quantiles may be given and one number will be printed for each. Each quantile can be written either as a decimal number or as a fraction, but must be between zero and one (inclusive). Hence, in effect --quantile=0.25 --quantile=0.75 is equivalent to --quantile=0.25,3/4, or -u1/4,3/4.

The returned value is one of the elements from the dataset. Taking \(q\) to be your desired quantile, and \(N\) to be the total number of used (non-blank and within the given range) elements, the returned value is at the following position in the sorted array: \(round(q\times{}N)\).
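
The position rule above can be sketched with sort and AWK, without calling Statistics. This is only an illustration of the round(q×N) rule, not the program's implementation. The function name element_at_quantile is hypothetical. The text does not say whether positions are counted from zero or from one, or how the two ends are treated, so the sketch assumes positions counted from one and clamped to the range 1 to N. It accepts decimal quantiles only (not fractions like 3/4).

```shell
# Element at position round(q*N) of the sorted data.
# ASSUMPTION (not stated in the text): positions are counted from one
# and clamped to the valid range 1..N.
element_at_quantile() {
    # $1: quantile as a decimal in [0,1]; remaining arguments: the data.
    q=$1; shift
    printf '%s\n' "$@" | sort -n | awk -v q="$q" '
        { x[NR] = $1 }                    # sorted values, one-based
        END {
            pos = int(q * NR + 0.5)       # round(q*N) for q*N >= 0
            if (pos < 1)  pos = 1         # clamp the lower end
            if (pos > NR) pos = NR        # clamp the upper end
            print x[pos]
        }'
}

# Seven elements, sorted: 1 2 3 4 5 6 20.
element_at_quantile 0.5  1 2 20 4 5 6 3   # round(3.5)=4, 4th element: 4
element_at_quantile 0.75 1 2 20 4 5 6 3   # round(5.25)=5, 5th element: 5
```

Under these assumptions, a quantile of 0.5 returns the median of this odd-sized array, and the returned value is always one of the input elements, as the text describes.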

--quantfunc=FLT[,FLT[,...]]

Print the quantiles of the given values in the dataset. This option is the inverse of --quantile and operates similarly, except that the acceptable values are within the range of the dataset, not between 0 and 1. Formally it is known as the “Quantile function”.

Since the dataset is not continuous, this function will find the element of the dataset that is nearest to the given value and use its position in the sorted dataset to estimate the quantile function.

--quantofmean

Print the quantile of the mean in the dataset. This is a very good measure for detecting skewness or outliers. The concept is used by programs like NoiseChisel to identify the presence of signal in a tile of the image (because signal in noise causes skewness).

For example, take this simple array: 1 2 20 4 5 6 3. The mean is approximately 5.86. The nearest element to this mean is 6 and the quantile of 6 in this distribution is 0.8333. Here is how we got to this: in the sorted dataset (1 2 3 4 5 6 20), 6 is the 5-th element (counting from zero, since a quantile of zero corresponds to the minimum, by definition) and the maximum is the 6-th element (again, counting from zero). So the quantile of the mean in this case is \(5/6=0.8333\).

In the example above, if we had 7 instead of 20 (which was an outlier), then the mean would be 4 and the quantile of the mean would be 0.5 (which by definition, is the quantile of the median), showing no outliers. As the number of elements increases, the mean itself is less affected by a small number of outliers, but skewness can be nicely identified by the quantile of the mean.
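
The arithmetic of the two examples above can be reproduced with sort and AWK alone. This is a minimal sketch of the steps as described in the text (find the nearest element to the mean, then divide its zero-based position by the position of the maximum). It is not the implementation used by Statistics. The function names mean_of and quantile_of_value are hypothetical, and the sketch assumes at least two elements.

```shell
# Mean of the arguments, printed with four decimals.
mean_of() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf("%.4f\n", sum / NR) }'
}

# Quantile of a value: zero-based position of the nearest element in
# the sorted data, divided by the position of the maximum (N-1).
quantile_of_value() {
    # $1: the value; remaining arguments: the data (two or more).
    value=$1; shift
    printf '%s\n' "$@" | sort -n | awk -v v="$value" '
        { x[NR-1] = $1 }                  # sorted values, zero-based
        END {
            best = 0
            for (i = 1; i < NR; i++) {
                d = x[i] - v;    if (d < 0) d = -d
                b = x[best] - v; if (b < 0) b = -b
                if (d < b) best = i       # keep the nearest element
            }
            printf("%.4f\n", best / (NR - 1))
        }'
}

data="1 2 20 4 5 6 3"
m=$(mean_of $data)                # 5.8571
quantile_of_value "$m" $data      # nearest is 6, at position 5: 0.8333

data="1 2 7 4 5 6 3"              # 20 (the outlier) replaced by 7
m=$(mean_of $data)                # 4.0000
quantile_of_value "$m" $data      # nearest is 4, at position 3: 0.5000
```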

-O
--mode

Print the mode of all used elements. The mode is found through the mirror distribution, which is fully described in Appendix C of Akhlaghi and Ichikawa 2015.

This mode calculation algorithm is non-parametric, so it can fail when the dataset is not large enough (it usually needs more than about 1000 elements) or does not have a clear mode. In such cases, this option will return a value of nan (for the floating point NaN value).

As described in that paper, the easiest way to assess the quality of this mode calculation method is to use its symmetricity (see --modesym below). A better way would be to use the --mirror option to generate the histogram and cumulative frequency tables for any given mirror value (the mode in this case) as a table. If you generate plots like those shown in Figure 21 of that paper, then your mode is accurate.

--modequant

Print the quantile of the mode. You can get the actual mode value from the --mode option described above. In many cases, the absolute value of the mode is irrelevant, but its position within the distribution is important. In such cases, this option comes in handy.

--modesym

Print the symmetricity of the calculated mode. See the description of --mode for more. This mode algorithm finds the mode based on how symmetric it is, so if the symmetricity returned by this option is too low, the mode is not very accurate. See Appendix C of Akhlaghi and Ichikawa 2015 for a full description. In practice, symmetricity values larger than 0.2 are mostly good.

--modesymvalue

Print the value in the distribution where the mirror and input distributions are no longer symmetric; see --mode and Appendix C of Akhlaghi and Ichikawa 2015 for more.

--sigclip-std
--sigclip-mad
--sigclip-mean
--sigclip-number
--sigclip-median

Calculate the desired statistic after applying \(\sigma\)-clipping (see Sigma clipping, part of the tutorial Clipping outliers). \(\sigma\)-clipping configuration is done with the --sclipparams option.

Here is one scenario where this can be useful: assume you have a table and you would like to remove the rows that are outliers (not within the \(\sigma\)-clipping range). Let’s assume your table is called table.fits and you only want to keep the rows that have a value in COLUMN within the \(\sigma\)-clipped range (to \(3\sigma\), with a tolerance of 0.1). This command will return the \(\sigma\)-clipped median and standard deviation (used to define the range later).

$ aststatistics table.fits -cCOLUMN --sclipparams=3,0.1 \
                --sigclip-median --sigclip-std

You can then use the --range option of Table (see Table) to select the proper rows. But for that, you need the actual starting and ending values of the range (\(m\pm s\sigma\); where \(m\) is the median and \(s\) is the multiple of sigma to define an outlier). Therefore, the raw outputs of Statistics in the command above are not enough.

To get the starting and ending values of the non-outlier range (and put a ‘,’ between them, ready to be used in --range), pipe the result into AWK. But in AWK, we will also need the multiple of \(\sigma\), so we will define it as a shell variable (s) before calling Statistics (note how $s is used two times now):

$ s=3
$ aststatistics table.fits -cCOLUMN --sclipparams=$s,0.1 \
                --sigclip-median --sigclip-std           \
     | awk '{s='$s'; printf("%f,%f\n", $1-s*$2, $1+s*$2)}'

To pass it onto Table, we will need to keep the printed output from the command above in another shell variable (r), not print it. In Bash, you can do this by putting the whole statement within a $():

$ s=3
$ r=$(aststatistics table.fits -cCOLUMN --sclipparams=$s,0.1 \
                    --sigclip-median --sigclip-std           \
        | awk '{s='$s'; printf("%f,%f\n", $1-s*$2, $1+s*$2)}')
$ echo $r      # Just to confirm.

Now you can use Table with the --range option to only print the rows that have a value in COLUMN within the desired range:

$ asttable table.fits --range=COLUMN,$r

To save the resulting table (that is clean of outliers) in another file (for example, named cleaned.fits, it can also have a .txt suffix), just add --output=cleaned.fits to the command above.
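
If you would like to check the range arithmetic of the AWK step on its own, you can feed it two made-up numbers in place of the Statistics output. In the sketch below, the function name clip_range and the input values (a median of 10 and a standard deviation of 2) are hypothetical. The multiple of \(\sigma\) is passed with AWK's -v option instead of being spliced into the quoted program, which should give the same result.

```shell
# Print "m-s*std,m+s*std", ready to be used in Table's --range.
clip_range() {
    # $1: multiple of sigma; standard input: "median std" on one line.
    awk -v s="$1" '{ printf("%f,%f\n", $1 - s*$2, $1 + s*$2) }'
}

# Hypothetical values standing in for the output of Statistics.
echo "10.0 2.0" | clip_range 3     # prints 4.000000,16.000000
```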

--madclip-std
--madclip-mad
--madclip-mean
--madclip-number
--madclip-median

Calculate the desired statistic after applying median absolute deviation (MAD) clipping (see MAD clipping, part of the tutorial Clipping outliers). MAD-clipping configuration is done with the --mclipparams option.

These options behave similarly to the --sigclip-* options; read their description for usage examples.