GNU Astronomy Utilities




7.1.2 2D Histograms

In Histogram and Cumulative Frequency Plot, the concept of a histogram was introduced for a single dataset. However, especially when doing high-level science on tables, the distribution in a 2D space may be of interest (for example, a color-magnitude diagram). When the number of points is too large, a simple scatter plot cannot show their concentration: the points all fall over each other and just make a large connected region that hides potentially interesting behaviors. This is where 2D histograms become very useful. The desired 2D region is broken up into 2D bins (boxes) and the number of points falling within each box is returned. Displayed with a color-bar, the distribution then becomes clearly visible.

Gnuastro’s Statistics program has the --histogram2d option for this task. Its output has three columns, with one row for every box: the first column is the central coordinate of the box along the first dimension, the second is its central coordinate along the second dimension, and the third is the number of input points that fall within the box. You can specify the number of bins along each dimension through the --numbins (for the first input column) and --numbins2 (for the second input column) options. The output file can then be given to any plotting tool to visualize the distribution.
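To make the three-column output more concrete, the following is a minimal Python sketch of the binning described above. It is not Gnuastro’s implementation: the function name histogram2d, the choice of the data minimum and maximum as the bin range, and the placement of points on the upper edge into the last bin are all assumptions made for illustration. The row ordering (first dimension varying fastest) is likewise an assumption, chosen because it matches a row-wise mesh with --numbins entries per row.

```python
def histogram2d(xs, ys, numbins, numbins2):
    """Count points on a numbins x numbins2 grid.

    Returns one (x_center, y_center, count) tuple per box, with the
    first dimension varying fastest (an assumed ordering).
    """
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    if xmax == xmin or ymax == ymin:
        raise ValueError("each dimension needs a non-zero range")

    # Width of one box along each dimension.
    xwidth = (xmax - xmin) / numbins
    ywidth = (ymax - ymin) / numbins2

    # counts[j][i] is the box in column i (1st dim.), row j (2nd dim.).
    counts = [[0] * numbins for _ in range(numbins2)]
    for x, y in zip(xs, ys):
        # Points exactly on the upper edge are put in the last box
        # (an assumption of this sketch).
        i = min(int((x - xmin) / xwidth), numbins - 1)
        j = min(int((y - ymin) / ywidth), numbins2 - 1)
        counts[j][i] += 1

    # Flatten into three columns: box centers and the count.
    rows = []
    for j in range(numbins2):
        for i in range(numbins):
            rows.append((xmin + (i + 0.5) * xwidth,
                         ymin + (j + 0.5) * ywidth,
                         counts[j][i]))
    return rows


if __name__ == "__main__":
    # Four points on a 2x2 grid, printed as three columns.
    for row in histogram2d([0.0, 0.1, 0.9, 1.0],
                           [0.0, 0.1, 0.9, 1.0], 2, 2):
        print(*row)
```

With this layout, a table of numbins × numbins2 rows can be read by a plotting tool as a grid, provided the tool is told how many entries make up one row.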

For example, you can make high-quality plots within your paper (using the same LaTeX engine, thus blending very nicely with your text) using PGFPlots. Below is one such minimal example: using your favorite text editor, save it into a file (called plot.tex in the commands at its top), make the two small corrections in it, then run the commands shown at the top. This assumes that you have LaTeX installed; if not, see the respective section in Bootstrapping dependencies for the steps to install a minimally sufficient LaTeX package on your system.

The two parts that need to be corrected are marked with ’%% <--’. The first one (XXXXXXXXX) should be replaced by the value given to the --numbins option, which is the number of bins along the first dimension. The second one (FILE.txt) should be replaced with the name of the file generated by Statistics.

%% Replace 'XXXXXXXXX' with your selected number of bins in the first
%% dimension, and 'FILE.txt' with the name of the Statistics output.
%%
%% Then, assuming this file is saved as 'plot.tex', run these commands
%% to build the plot with LaTeX.
%%    mkdir tikz
%%    pdflatex -shell-escape -halt-on-error plot.tex
\documentclass{article}

%% Load PGFPlots and set it to build the figure separately in a 'tikz'
%% directory (which has to exist before LaTeX is run). This
%% "externalization" is very useful to include the commands of multiple
%% plots in the middle of your paper/report, but also have the plots
%% separately to use in slides or other scenarios.
\usepackage{pgfplots}
\usetikzlibrary{external}
\tikzexternalize
\tikzsetexternalprefix{tikz/}

%% Start the document
\begin{document}

  You can actually write a full paper here and include many figures!
  Feel free to change this text.

  %% Define the colormap.
  \pgfplotsset{
    /pgfplots/colormap={coldredux}{
      [1cm]
      rgb255(0cm)=(255,255,255)
      rgb255(2cm)=(0,192,255)
      rgb255(4cm)=(0,0,255)
      rgb255(6cm)=(0,0,0)
    }
  }

  %% Draw the plot.
  \begin{tikzpicture}
    \small
    \begin{axis}[
      width=\linewidth,
      view={0}{90},
      colorbar horizontal,
      xlabel=X axis,
      ylabel=Y axis,
      ylabel shift=-0.1cm,
      colorbar style={at={(0,1.01)}, anchor=south west,
                      xticklabel pos=upper},
    ]
      \addplot3[
        surf,
        shader=flat corner,
        mesh/ordering=rowwise,
        mesh/cols=XXXXXXXXX,     %% <-- Number of bins in 1st column.
      ] file {FILE.txt};         %% <-- Name of aststatistics output.

    \end{axis}
  \end{tikzpicture}

\end{document}
