GNU Astronomy Utilities



7.1.5.3 Generating histograms and cumulative freq.

The options below are for statistical operations that output more than one value. While they can be called together in one run, their outputs will be distinct (each one’s output will usually be printed on more than one line).

-A
--asciihist

Print an ASCII histogram of the usable values within the input dataset along with some basic information like the example below (from the UVUDF catalog (200)). The width and height of the histogram (in units of character widths and heights on your command-line terminal) can be set with the --numasciibins (for the width) and --asciiheight options.

For a full description of the histogram, please see Histogram and Cumulative Frequency Plot. An ASCII plot is certainly very crude and cannot be used in any publication, but it is very useful for quickly getting a general feeling of the input dataset on the command-line without having to take your hands off the keyboard (which is a major distraction!). If you want to try out the example below, you can write it all on one line and leave out the \ and extra spaces.

$ aststatistics uvudf_rafelski_2015.fits.gz --hdu=1         \
                --column=MAG_F160W --lessthan=40            \
                --asciihist --numasciibins=55
ASCII Histogram:
Number: 8593
Y: (linear: 0 to 660)
X: (linear: 17.7735 -- 31.4679, in 55 bins)
 |                                         ****
 |                                        *****
 |                                       ******
 |                                      ********
 |                                      *********
 |                                    ***********
 |                                  **************
 |                                *****************
 |                           ***********************
 |                    ********************************
 |*** ***************************************************
 |-------------------------------------------------------
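
As a rough illustration of how bin counts can be turned into a character plot like the one above, here is a minimal Python sketch. It is not the code Statistics uses: the function name `ascii_hist`, the `height` argument (standing in for --asciiheight) and the rounding rule are assumptions made for this example only.

```python
def ascii_hist(counts, height):
    """Return the text lines of a crude ASCII histogram.

    counts: one count per bin (one character column per bin).
    height: number of character rows above the axis (assumed analogue
            of --asciiheight).
    """
    peak = max(counts)
    # Number of filled rows per column; the rounding rule is an assumption.
    filled = [round(c / peak * height) for c in counts]
    lines = []
    for row in range(height, 0, -1):  # print from the top row down
        lines.append(" |" + "".join("*" if f >= row else " " for f in filled))
    lines.append(" |" + "-" * len(counts))  # horizontal axis
    return lines

lines = ascii_hist([1, 2, 4, 2], height=4)
print("\n".join(lines))
```

With four bins and four rows, the tallest bin fills its whole column and the others are scaled relative to it.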

--asciicfp

Print the cumulative frequency plot of the usable elements in the input dataset. Please see the description of --asciihist for more; the example below is from the same input table as that example. To better understand the cumulative frequency plot, please see Histogram and Cumulative Frequency Plot.

$ aststatistics uvudf_rafelski_2015.fits.gz --hdu=1         \
                --column=MAG_F160W --lessthan=40            \
                --asciicfp --numasciibins=55
ASCII Cumulative frequency plot:
Y: (linear: 0 to 8593)
X: (linear: 17.7735 -- 31.4679, in 55 bins)
 |                                                *******
 |                                             **********
 |                                            ***********
 |                                          *************
 |                                         **************
 |                                        ***************
 |                                      *****************
 |                                    *******************
 |                                ***********************
 |                         ******************************
 |*******************************************************
 |-------------------------------------------------------

-H
--histogram

Save the histogram of the usable values in the input dataset into a table. The first column is the value at the center of the bin and the second is the number of points in that bin. If the --cumulative option is also called with this option in a run, then the table will have three columns (the third is the cumulative frequency plot). With --numbins, --onebinstart, or --manualbinrange you can modify the first column's values, and with --normalize and --maxbinone you can modify the second column's values. See below for the description of each.

By default (when no --output is specified) a plain text table will be created, see Gnuastro text table format. If a FITS name is specified, you can use the common option --tableformat to have it as a FITS ASCII or FITS binary format, see Common options. This table can then be fed into your favorite plotting tool to get a much cleaner and nicer histogram than what the raw command-line can offer you (with the --asciihist option).
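
The table layout described above (bin center, count, and optionally the cumulative count) can be sketched in Python as follows. This only illustrates the column layout, it is not Gnuastro's implementation: the function name `histogram_table` is hypothetical, the bins are assumed to be equal-width between the minimum and maximum, and placing the maximum value in the last bin is an assumption of this sketch.

```python
def histogram_table(values, numbins, cumulative=False):
    """Return rows of (bin_center, count) or (bin_center, count, cumulative).

    Assumes max(values) > min(values) and equal-width bins.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / numbins
    counts = [0] * numbins
    for v in values:
        # Assumption: the maximum value is counted in the last bin.
        i = min(int((v - lo) / width), numbins - 1)
        counts[i] += 1
    rows, running = [], 0
    for i, c in enumerate(counts):
        center = lo + (i + 0.5) * width
        running += c  # running total gives the cumulative frequency
        rows.append((center, c, running) if cumulative else (center, c))
    return rows

rows = histogram_table([1, 2, 2, 3, 3, 3, 4, 5], numbins=4, cumulative=True)
for row in rows:
    print(*row)
```

The last value of the third column equals the number of input points, as in the "Y: (linear: 0 to 8593)" line of the --asciicfp example.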

--histogram2d

Save the 2D histogram of two input columns into an output file, see 2D Histograms. The output will have three columns: the first two are the coordinates of each box’s center in the first and second dimensions/columns. The third is the number of input points that fall within that box.
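
To make the three-column layout concrete, here is a small Python sketch of a 2D histogram table. It is only an illustration under stated assumptions: `histogram2d_table` is a hypothetical name, the boxes are equal-width between each column's minimum and maximum, the maximum is counted in the last box, and the row order (first dimension varying fastest) is a choice of this sketch that may differ from the real output.

```python
def histogram2d_table(xs, ys, numbins, numbins2):
    """Return rows of (x_center, y_center, count), one row per box."""
    xlo, xw = min(xs), (max(xs) - min(xs)) / numbins
    ylo, yw = min(ys), (max(ys) - min(ys)) / numbins2
    counts = {}
    for x, y in zip(xs, ys):
        # Assumption: maxima are counted in the last box of each dimension.
        i = min(int((x - xlo) / xw), numbins - 1)
        j = min(int((y - ylo) / yw), numbins2 - 1)
        counts[(i, j)] = counts.get((i, j), 0) + 1
    # Assumed row order: first dimension varies fastest.
    return [(xlo + (i + 0.5) * xw, ylo + (j + 0.5) * yw, counts.get((i, j), 0))
            for j in range(numbins2) for i in range(numbins)]

table = histogram2d_table([0, 1, 2, 2], [0, 0, 4, 4], numbins=2, numbins2=2)
for row in table:
    print(*row)
```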

-C
--cumulative

Save the cumulative frequency plot of the usable values in the input dataset into a table, similar to --histogram.

--madclip

Do median absolute deviation (MAD) clipping on the usable pixels of the input dataset. See MAD clipping for a description of MAD-clipping and Clipping outliers for a complete tutorial on clipping of outliers. The MAD-clipping parameters can be set through the --mclipparams option (see below).

-s
--sigmaclip

Do \(\sigma\)-clipping on the usable pixels of the input dataset. See Sigma clipping for a full description of \(\sigma\)-clipping and Clipping outliers for a complete tutorial on clipping of outliers. The \(\sigma\)-clipping parameters can be set through the --sclipparams option (see below).

--mirror=FLT

Make a histogram and cumulative frequency plot of the mirror distribution for the given dataset when the mirror is located at the value given to this option. The mirror distribution is fully described in Appendix C of Akhlaghi and Ichikawa 2015 and currently it is only used to calculate the mode (see --mode).

Just note that the mirror distribution is a discrete distribution like the input, so while you may give any number as the value to this option, the actual mirror value is the closest number in the input dataset to this value. If the two numbers are different, Statistics will warn you of the actual mirror value used.

This option will make a table as output. Depending on your selected name for the output, it will be either a FITS table or a plain text table (which is the default). It contains three columns: the first is the center of the bins, the second is the histogram (with the largest value set to 1) and the third is the normalized cumulative frequency plot of the mirror distribution. The bins will be positioned such that the mode is on the starting interval of one of the bins to make it symmetric around the mirror. With this output file and the input histogram (that you can generate in another run of Statistics, using the --onebinstart option), it is possible to make plots like Figure 21 of Akhlaghi and Ichikawa 2015.
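
The note above, that the actual mirror value is the closest number in the input dataset to the requested one, can be sketched in a few lines of Python. The function name `actual_mirror` is hypothetical, and how Statistics breaks ties between two equally close values is not stated here, so this sketch simply keeps the first one it finds.

```python
def actual_mirror(values, requested):
    """Return the input value closest to the requested mirror value."""
    # min() keeps the first of several equally close values (an assumption).
    return min(values, key=lambda v: abs(v - requested))

mirror = actual_mirror([1.0, 2.5, 4.0], requested=2.0)
print(mirror)
```

Here the requested value 2.0 is not in the dataset, so the closest element, 2.5, would be used as the mirror.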

The options below allow customization of the histogram and cumulative frequency plots (for the --histogram, --cumulative, --asciihist, and --asciicfp options).

--numbins

The number of bins (rows) to use in the histogram and the cumulative frequency plot tables (outputs of --histogram and --cumulative).

--numasciibins

The number of bins (characters) to use in the ASCII plots when printing the histogram and the cumulative frequency plot (outputs of --asciihist and --asciicfp).

--asciiheight

The number of lines to use when printing the ASCII histogram and cumulative frequency plot on the command-line (outputs of --asciihist and --asciicfp).

-n
--normalize

Normalize the histogram or cumulative frequency plot tables (outputs of --histogram and --cumulative). For a histogram, the sum of all bins will become one and for a cumulative frequency plot the last bin value will be one.

--maxbinone

Divide all the histogram values by the maximum bin value so it becomes one and the rest are similarly scaled. In some situations (for example, if you want to plot the histogram and cumulative frequency plot in one plot) this can be very useful.
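
The two scalings described under --normalize and --maxbinone amount to simple divisions. The following Python sketch shows them on a small list of bin counts; the function names are hypothetical and the code only mirrors the arithmetic stated above, not Gnuastro's source.

```python
def normalize_hist(counts):
    """Scale histogram counts so that they sum to one."""
    total = sum(counts)
    return [c / total for c in counts]

def normalize_cfp(cumulative):
    """Scale a cumulative frequency plot so that its last value is one."""
    return [c / cumulative[-1] for c in cumulative]

def maxbinone(counts):
    """Scale histogram counts so that the largest bin becomes one."""
    peak = max(counts)
    return [c / peak for c in counts]

counts = [1, 2, 3, 2]
cumulative = [1, 3, 6, 8]
print(normalize_hist(counts))
print(normalize_cfp(cumulative))
print(maxbinone(counts))
```

After the max-bin-one scaling, the histogram peaks at one and the normalized cumulative frequency plot ends at one, so both fit on the same vertical axis.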

--onebinstart=FLT

Make sure that one bin starts at the value given to this option. In practice, this will shift the bins used to find the histogram and cumulative frequency plot such that one bin’s lower edge becomes this value.

For example, when a histogram range includes negative and positive values and zero has a special significance in your analysis, then zero might fall somewhere inside one bin. As a result, that bin will count both positive and negative values. By setting --onebinstart=0, you can make sure that one bin will only count negative values in the vicinity of zero and the next bin will only count positive ones in that vicinity.

Note that by default, the first column of the histogram and cumulative frequency plot shows the central value of each bin. So in the example above you will not see the 0.000 in the first column; you will see two values that are symmetric around it.

If the value is not within the usable input range, this option will be ignored. When it is, this option is the last operation before the bins are finalized, therefore it has a higher priority than options like --manualbinrange.
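
The shift described for --onebinstart can be illustrated with the Python sketch below. It is a sketch under assumptions, not the program's code: `bin_edges` is a hypothetical helper, the bins are equal-width, and the shift is applied upward by the distance between the requested value and the lower edge of the bin that contains it. How values that end up outside the shifted bins are treated is not covered here.

```python
def bin_edges(lo, width, numbins, onebinstart=None):
    """Return numbins+1 equal-width bin edges, optionally shifted so that
    one bin's lower edge falls exactly on onebinstart."""
    edges = [lo + i * width for i in range(numbins + 1)]
    # As described above, a value outside the range is ignored.
    if onebinstart is None or not (edges[0] <= onebinstart <= edges[-1]):
        return edges
    # Distance from the containing bin's lower edge up to onebinstart.
    offset = (onebinstart - lo) % width
    return [e + offset for e in edges]

edges = bin_edges(-2.5, 1.0, 5, onebinstart=0.0)
centers = [(a + b) / 2 for a, b in zip(edges, edges[1:])]
print(edges)    # 0.0 is now a bin edge
print(centers)  # -0.5 and 0.5 are the two centers symmetric around zero
```

Before the shift, zero sat in the middle of the bin from -0.5 to 0.5. After it, zero is a bin edge and the bin centers (the first column of the table) contain -0.5 and 0.5 but not 0.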

--manualbinrange

Use the values given to --greaterequal and --lessthan to define the range of all bin-based calculations like the histogram. This option itself does not take any value; it just tells the program to use the values of those two options instead of the minimum and maximum values of the input. If either of the two options is not given, then the minimum or maximum will be used respectively. Therefore, if neither of them is called, calling this option is redundant.

The --onebinstart option has a higher priority than this option. In other words, --onebinstart takes effect after the range has been finalized and the initial bins have been defined, therefore it has the power to (possibly) shift the bins. If you want to manually set the range of the bins and have one bin on a special value, it is thus better to avoid --onebinstart (which may shift the bins away from the range you set).
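
The range selection described for --manualbinrange can be summarized with the short Python sketch below. The function `bin_range` and its keyword arguments are hypothetical stand-ins for the command-line options, and `values` is assumed to already hold only the usable elements.

```python
def bin_range(values, manualbinrange=False, greaterequal=None, lessthan=None):
    """Return the (low, high) range used for bin-based calculations."""
    lo, hi = min(values), max(values)  # default: range of the data
    if manualbinrange:
        # Each limit replaces the data limit only if it was given.
        if greaterequal is not None:
            lo = greaterequal
        if lessthan is not None:
            hi = lessthan
    return lo, hi

data = [18, 25, 31]
print(bin_range(data, lessthan=40))                       # (18, 31)
print(bin_range(data, manualbinrange=True, lessthan=40))  # (18, 40)
```

The first call matches the --asciihist example earlier on this page, where --lessthan=40 was given without --manualbinrange and the X axis still ran between the minimum and maximum of the data.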

--numbins2=INT

Similar to --numbins, but for the second column when a 2D histogram is requested, see --histogram2d.

--greaterequal2=FLT

Similar to --greaterequal, but for the second column when a 2D histogram is requested, see --histogram2d.

--lessthan2=FLT

Similar to --lessthan, but for the second column when a 2D histogram is requested, see --histogram2d.

--onebinstart2=FLT

Similar to --onebinstart, but for the second column when a 2D histogram is requested, see --histogram2d.


Footnotes

(200)

https://asd.gsfc.nasa.gov/UVUDF/uvudf_rafelski_2015.fits.gz