Next: Binary datasets, Previous: Matching, Up: Gnuastro library [Contents][Index]

After reading a dataset into memory from a file or fully simulating it with
another process, the most common processes that will be done on it are
statistical operations to let you quantify different aspects of the
data. the functions in this section describe Gnuastro’s current set of
tools for this job. All these functions can work on any numeric data type
natively (see Numeric data types) and can also work on tiles over a
dataset. Hence the inputs and outputs are in Gnuastro’s Generic data container (`gal_data_t`

).

- Macro:
**GAL_STATISTICS_SIG_CLIP_MAX_CONVERGE** The maximum number of clips, when \(\sigma\)-clipping should be done by convergence. If the clipping does not converge before making this many clips, all sigma-clipping outputs will be NaN.

- Macro:
**GAL_STATISTICS_MODE_GOOD_SYM** The minimum acceptable symmetricity of the mode calculation. If the symmetricity of the derived mode is less than this value, all the returned values by

`gal_statistics_mode`

will have a value of NaN.

- Macro:
**GAL_STATISTICS_SORTED_NOT** - Macro:
**GAL_STATISTICS_SORTED_INCREASING** - Macro:
**GAL_STATISTICS_SORTED_DECREASING** Macros used to identify if the dataset is sorted and increasing, sorted and decreasing or not sorted.

- Macro:
**GAL_STATISTICS_BINS_INVALID** - Macro:
**GAL_STATISTICS_BINS_REGULAR** - Macro:
**GAL_STATISTICS_BINS_IRREGULAR** Macros used to identify if the regularity of the bins when defining bins.

- Function:

*gal_data_t **

**gal_statistics_number***(gal_data_t*`*input`

) Return a single-element

`uint64`

dataset containing the number of non-blank elements in`input`

.

- Function:

*gal_data_t **

**gal_statistics_minimum***(gal_data_t*`*input`

) Return a single-element dataset containing the minimum non-blank value in

`input`

. The numerical datatype of the output is the same as`input`

.

- Function:

*gal_data_t **

**gal_statistics_maximum***(gal_data_t*`*input`

) Return a single-element dataset containing the maximum non-blank value in

`input`

. The numerical datatype of the output is the same as`input`

.

- Function:

*gal_data_t **

**gal_statistics_sum***(gal_data_t*`*input`

) Return a single-element (

`double`

or`float64`

) dataset containing the sum of the non-blank values in`input`

.

- Function:

*gal_data_t **

**gal_statistics_mean***(gal_data_t*`*input`

) Return a single-element (

`double`

or`float64`

) dataset containing the mean of the non-blank values in`input`

.

- Function:

*gal_data_t **

**gal_statistics_std***(gal_data_t*`*input`

) Return a single-element (

`double`

or`float64`

) dataset containing the standard deviation of the non-blank values in`input`

.

- Function:

*gal_data_t **

**gal_statistics_mean_std***(gal_data_t*`*input`

) Return a two-element (

`double`

or`float64`

) dataset containing the mean and standard deviation of the non-blank values in`input`

. The first element of the returned dataset is the mean and the second is the standard deviation.This function will calculate both values in one pass over the dataset. Hence when both the mean and standard deviation of a dataset are necessary, this function is much more efficient than calling

`gal_statistics_mean`

and`gal_statistics_std`

separately.

- Function:

*gal_data_t **

**gal_statistics_median***(gal_data_t*`*input`

, int`inplace`

) Return a single-element dataset containing the median of the non-blank values in

`input`

. The numerical datatype of the output is the same as`input`

.Calculating the median involves sorting the dataset and removing blank values, for better performance (and less memory usage), you can give a non-zero value to the

`inplace`

argument. In this case, the sorting and removal of blank elements will be done directly on the input dataset. However, after this function the original dataset may have changed (if it wasn’t sorted or had blank values).

- Function:

*size_t*

**gal_statistics_quantile_index***(size_t*`size`

, double`quantile`

) Return the index of the element that has a quantile of

`quantile`

assuming the dataset has`size`

elements.

- Function:

*size_t*

**gal_statistics_quantile***(gal_data_t*`*input`

, double`quantile`

, int`inplace`

) Return a single-element dataset containing the value with in a quantile

`quantile`

of the non-blank values in`input`

. The numerical datatype of the output is the same as`input`

. See`gal_statistics_median`

for a description of`inplace`

.

- Function:

*size_t*

**gal_statistics_quantile_function_index***(gal_data_t*`*input`

, gal_data_t`*value`

, int`inplace`

) Return the index of the quantile function (inverse quantile) of

`input`

at`value`

. In other words, this function will return the index of the nearest element (of a sorted and non-blank)`input`

to`value`

. See`gal_statistics_median`

for a description of`inplace`

.

- Function:

*gal_data_t **

**gal_statistics_quantile_function***(gal_data_t*`*input`

, gal_data_t`*value`

, int`inplace`

) Return a single-element (

`double`

or`float64`

) dataset containing the quantile function of the non-blank values in`input`

at`value`

. In other words, this function will return the quantile of`value`

in`input`

. See`gal_statistics_median`

for a description of`inplace`

.

- Function:

*gal_data_t **

**gal_statistics_mode***(gal_data_t*`*input`

, float`mirrordist`

, int`inplace`

) Return a four-element (

`double`

or`float64`

) dataset that contains the mode of the`input`

distribution. This function implements the non-parametric algorithm to find the mode that is described in Appendix C of Akhlaghi and Ichikawa [2015].In short it compares the actual distribution and its “mirror distribution” to find the mode. In order to be efficient, you can determine how far the comparison goes away from the mirror through the

`mirrordist`

parameter (think of it as a multiple of sigma/error). See`gal_statistics_median`

for a description of`inplace`

.The output array has the following elements (in the given order, note that counting in C starts from 0).

array[0]: mode array[1]: mode quantile. array[2]: symmetricity. array[3]: value at the end of symmetricity.

- Function:

*gal_data_t **

**gal_statistics_mode_mirror_plots***(gal_data_t*`*input`

, gal_data_t`*value`

, size_t`numbins`

, int`inplace`

, double`*mirror_val`

) Make a mirrored histogram and cumulative frequency plot (with

`numbins`

) with the mirror distribution of the`input`

with a value at`value`

.The output is a list of data structures (see List of

`gal_data_t`

): the first is the bins with one bin at the mirror point, the second is the histogram with a maximum of one and the third is the cumulative frequency plot (with a maximum of one).

- Function:

*int*

**gal_statistics_is_sorted***(gal_data_t*`*input`

) Return the respective sort macro (see above) for the

`input`

dataset.

- Function:

*void*

**gal_statistics_sort_increasing***(gal_data_t*`*input`

) Sort the input dataset (in place) in an increasing order.

- Function:

*void*

**gal_statistics_sort_decreasing***(gal_data_t*`*input`

) Sort the input dataset (in place) in a decreasing order.

- Function:

*gal_data_t **

**gal_statistics_no_blank_sorted***(gal_data_t*`*input`

, int`inplace`

) Remove all the blanks and sort the input dataset. If

`inplace`

is non-zero this will happen on the input dataset (and the output will be the same as the input). However, if`inplace`

is zero, this function will allocate a new copy of the dataset that is sorted and has no blank values.

- Function:

*gal_data_t **

**gal_statistics_regular_bins***(gal_data_t*`*input`

, gal_data_t`*inrange`

, size_t`numbins`

, double`onebinstart`

) Generate an array of regularly spaced elements as a 1D array (column) of type

`double`

(i.e.,`float64`

, it has to be double to account for small differences on the bin edges). The input arguments are described below`input`

The dataset you want to apply the bins to. This is only necessary if the range argument is not complete, see below. If

`inrange`

has all the necessary information, you can pass a`NULL`

pointer for this.`inrange`

This dataset keeps the desired range along each dimension of the input data structure, it has to be in

`float`

(i.e.,`float32`

) type.- If you want the full range of the dataset (in any dimensions, then just set
`inrange`

to`NULL`

and the range will be specified from the minimum and maximum value of the dataset (`input`

cannot be`NULL`

in this case). - If there is one element for each dimension in range, then it is viewed as a quantile (Q), and the range will be: ‘Q to 1-Q’.
- If there are two elements for each dimension in range, then they are assumed to be your desired minimum and maximum values. When either of the two are NaN, the minimum and maximum will be calculated for it.

- If you want the full range of the dataset (in any dimensions, then just set
`numbins`

The number of bins: must be larger than 0.

`onebinstart`

A desired value for onebinstart. Note that with this option, the bins won’t start and end exactly on the given range values, it will be slightly shifted to accommodate this request.

- Function:

*gal_data_t **

**gal_statistics_histogram***(gal_data_t*`*input`

, gal_data_t`*bins`

, int`normalize`

, int`maxone`

) Make a histogram of all the elements in the given dataset with bin values that are defined in the

`inbins`

structure (see`gal_statistics_regular_bins`

).`inbins`

is not mandatory, if you pass a`NULL`

pointer, the bins structure will be built within this function based on the`numbins`

input. As a result, when you have already defined the bins,`numbins`

is not used.

- Function:

*gal_data_t **

**gal_statistics_cfp***(gal_data_t*`*input`

, gal_data_t`*bins`

, int`normalize`

) Make a cumulative frequency plot (CFP) of all the elements in

`input`

with bin values that are defined in the`bins`

structure (see`gal_statistics_regular_bins`

).The CFP is built from the histogram: in each bin, the value is the sum of all previous bins in the histogram. Thus, if you have already calculated the histogram before calling this function, you can pass it onto this function as the data structure in

`bins->next`

(see`List of gal_data_t`

). If`bin->next!=NULL`

, then it is assumed to be the histogram. If it is`NULL`

, then the histogram will be calculated internally and freed after the job is finished.When a histogram is given and it is normalized, the CFP will also be normalized (even if the normalized flag is not set here): note that a normalized CFP’s maximum value is 1.

- Function:

*gal_data_t **

**gal_statistics_sigma_clip***(gal_data_t*`*input`

, float`multip`

, float`param`

, int`inplace`

, int`quiet`

) Apply \(\sigma\)-clipping on a given dataset and return a dataset that contains the results. For a description of \(\sigma\)-clipping see Sigma clipping.

`multip`

is the multiple of the standard deviation (\(\sigma\) that is used to define outliers in each round of clipping).The role of

`param`

is determined based on its value. If`param`

is larger than`1`

(one), it must be an integer and will be interpretted as the number clips to do. If it is less than`1`

(one), it is interpretted as the tolerance level to stop the iteration.The output dataset has the following elements with a

`GAL_TYPE_FLOAT32`

type:array[0]: Number of points used. array[1]: Median. array[2]: Mean. array[3]: Standard deviation.

Next: Binary datasets, Previous: Matching, Up: Gnuastro library [Contents][Index]

JavaScript license information

GNU Astronomy Utilities 0.5 manual, December 2017.