GNU Astronomy Utilities



7.1.2 2D Histograms

In Histogram and Cumulative Frequency Plot, the concept of histograms was introduced for a single dataset. But 1D histograms are only useful for viewing the distribution of a single variable (one column in a table). In many contexts, the distribution of two variables in relation to each other may be of interest. A classic example in astronomy is the color-magnitude diagram, where the horizontal axis is the luminosity or magnitude of an object and the vertical axis is its color. Scatter plots are useful for seeing such relations between the objects of interest when the number of objects is small.

As the density of points in a scatter plot increases, the points fall over each other and merge into a large connected region. This hides potentially interesting behaviors or correlations in the densest regions. This is where 2D histograms become very useful. A 2D histogram is composed of 2D bins (boxes or pixels), just as a 1D histogram consists of 1D bins (intervals along a line). The value of each box/pixel is the number of points that fall within it. Combined with a color bar, this shows the distribution clearly regardless of the density of points (for example, you can even visualize it in log-scale if you want).
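The counting described above can be sketched in a few lines of Python. This is only an illustration of the general idea of a 2D histogram with equal-width boxes spanning the full data range. It is not Gnuastro's implementation, and the function name `histogram2d` is hypothetical. Rows correspond to the vertical axis and columns to the horizontal axis.

```python
def histogram2d(xs, ys, nx, ny):
    """Count points in an nx-by-ny grid of equal-width boxes (hypothetical helper).

    The boxes span the full range of the data. Returns ny rows (vertical
    axis), each holding nx counts (horizontal axis). Assumes a non-zero
    range in both dimensions.
    """
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    # Width of each box along each dimension.
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    counts = [[0] * nx for _ in range(ny)]
    for x, y in zip(xs, ys):
        # Index of the box containing this point. The maximum value would
        # land one past the last box, so it is clamped into the last one.
        i = min(int((x - xmin) / dx), nx - 1)
        j = min(int((y - ymin) / dy), ny - 1)
        counts[j][i] += 1
    return counts


xs = [0, 0.5, 1, 1.5, 2, 2]
ys = [0, 0, 1, 1, 2, 2]
print(histogram2d(xs, ys, 2, 2))  # [[2, 0], [0, 4]]
```

Each point adds one to exactly one box, so the counts always sum to the number of input points. Dense regions therefore show up as large values instead of overlapping markers.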

Gnuastro’s Statistics program has the --histogram2d option for this task. It takes a single argument (either table or image) that specifies the format of the output 2D histogram. The two formats will be reviewed separately in the sub-sections below. But let’s start with the generalities that are common to both (related to the input, not the output).

You can specify the two columns to use with the --column (or -c) option. For example, to plot the color-magnitude diagram from a table with the MAG-R column on the horizontal axis and COLOR-G-R on the vertical axis, you can use --column=MAG-R,COLOR-G-R. The number of bins along each dimension can be set with --numbins (for the first input column) and --numbins2 (for the second input column).
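The following Python sketch illustrates the role of the two bin counts. The first named column goes on the horizontal axis and the second on the vertical axis, and each axis is divided into its own number of equal-width bins. The in-memory `table` dictionary and the `bin_edges` helper are hypothetical stand-ins for illustration only. They are not part of Gnuastro, and the sketch does not claim to reproduce how Statistics places its bin edges internally.

```python
def bin_edges(lo, hi, n):
    """Return the n+1 edges of n equal-width bins covering [lo, hi] (hypothetical helper)."""
    width = (hi - lo) / n
    return [lo + k * width for k in range(n + 1)]


# Hypothetical catalog: column name -> values (stands in for a real table file).
table = {
    "MAG-R":     [16.0, 18.5, 21.0, 23.5, 24.0],
    "COLOR-G-R": [0.0, 0.5, 1.0, 1.5, 2.0],
}

# Analogous to --column=MAG-R,COLOR-G-R: first name is the horizontal axis.
x = table["MAG-R"]
y = table["COLOR-G-R"]

# Analogous to --numbins=4 and --numbins2=2: each axis has its own bin count.
xedges = bin_edges(min(x), max(x), 4)
yedges = bin_edges(min(y), max(y), 2)
print(xedges)  # [16.0, 18.0, 20.0, 22.0, 24.0]
print(yedges)  # [0.0, 1.0, 2.0]
```

Because the two counts are independent, the resulting grid need not be square. Here it would have 4 bins horizontally and 2 vertically.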

Without specifying any range, the full range of values is used in each dimension. If you only want to focus on a certain interval of values, use the --greaterequal and --lessthan options to limit the values along the first/horizontal dimension, and the --greaterequal2 and --lessthan2 options for the second/vertical dimension.
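The range selection can be sketched in Python as follows. The sketch makes two assumptions that are not stated in the text above. First, as the option names suggest, a --greaterequal-style limit is treated as an inclusive lower bound and a --lessthan-style limit as an exclusive upper bound. Second, a point falling outside the interval in either dimension is treated as excluded from the 2D histogram. The helper names `in_range` and `select_points` are hypothetical, and `None` stands for "option not given", which leaves that side unlimited.

```python
def in_range(v, ge, lt):
    """True if ge <= v < lt; a None limit means no limit on that side."""
    return (ge is None or v >= ge) and (lt is None or v < lt)


def select_points(xs, ys, ge=None, lt=None, ge2=None, lt2=None):
    """Keep only (x, y) pairs inside both intervals (hypothetical helper).

    ge/lt limit the first (horizontal) column; ge2/lt2 limit the second
    (vertical) column. With no limits, every point is kept.
    """
    return [(x, y) for x, y in zip(xs, ys)
            if in_range(x, ge, lt) and in_range(y, ge2, lt2)]


xs = [16, 18, 20, 22, 24]
ys = [0.0, 0.5, 1.0, 1.5, 2.0]
print(select_points(xs, ys))                  # all five points
print(select_points(xs, ys, ge=18, lt=24))    # [(18, 0.5), (20, 1.0), (22, 1.5)]
print(select_points(xs, ys, ge=18, lt=24, ge2=1.0))  # [(20, 1.0), (22, 1.5)]
```

In the second call, the point at 24 is dropped because the upper limit is exclusive. In the third call, the vertical limit further removes the point whose color is below 1.0.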