GNU Astronomy Utilities



2.1.16 Column statistics (color-magnitude diagram)

In Working with catalogs (estimating colors) we created a single catalog containing the magnitudes of our desired clumps in all three filters, and their colors. To start with, let’s inspect the distribution of three colors with the Statistics program.

$ aststatistics cat/mags-with-color.fits -cF105W-F125W
$ aststatistics cat/mags-with-color.fits -cF105W-F160W
$ aststatistics cat/mags-with-color.fits -cF125W-F160W

This tiny and cute ASCII histogram (and the general information printed above it) gives you a crude (but very useful and fast) feeling on the distribution. You can later use Gnuastro’s Statistics program with the --histogram option to build a much more fine-grained histogram as a table to feed into your favorite plotting program for a much more accurate/appealing plot (for example, with PGFPlots in LaTeX). If you just want a specific measure, for example, the mean, median and standard deviation, you can ask for them specifically, like below:

$ aststatistics cat/mags-with-color.fits -cF105W-F160W \
                --mean --median --std

The basic statistics we measured above were just on one column. In many scenarios this is fine, but things get much more exciting if you look at the correlation of two columns with each other. For example, let’s create the color-magnitude diagram for our measured targets.

In many papers, the color-magnitude diagram is usually plotted as a scatter plot. However, scatter plots have a major limitation when there are a lot of points and they cluster together in one region of the plot: the possible correlation in that dense region is lost (because the points fall over each other). In such cases, it is much better to use a 2D histogram. In a 2D histogram, the full range in both columns is divided into discrete 2D bins (or pixels!) and we count how many objects fall in that 2D bin.

Since a 2D histogram is a pixelated space, we can simply save it as a FITS image and view it in a FITS viewer. Let’s do this in the command below. As is common with color-magnitude plots, we will put the redder magnitude on the horizontal axis and the color on the vertical axis. We will set both dimensions to have 100 bins (with --numbins for the horizontal and --numbins2 for the vertical). Also, to avoid strong outliers in any of the dimensions, we will manually set the range of each dimension with the --greaterequal, --greaterequal2, --lessthan and --lessthan2 options.

$ aststatistics cat/mags-with-color.fits -cMAG-F160W,F105W-F160W \
                --histogram2d=image --manualbinrange \
                --numbins=100  --greaterequal=22  --lessthan=30 \
                --numbins2=100 --greaterequal2=-1 --lessthan2=3 \
                --manualbinrange --output=cmd.fits

You can now open this FITS file as a normal FITS image, for example, with the command below. Try hovering/zooming over the pixels: not only will you see the number of objects in catalog that fall in each bin/pixel, but you also see the F160W magnitude and color of that pixel also (in the same place you usually see RA and Dec when hovering over an astronomical image).

$ astscript-fits-view cmd.fits --ds9scale=minmax

Having a 2D histogram as a FITS image with WCS has many great advantages. For example, just like FITS images of the night sky, you can “match” many 2D histograms that were created independently. You can add two histograms with each other, or you can use advanced features of FITS viewers to find structure in the correlation of your columns.

With the first command below, you can activate the grid feature of DS9 to actually see the coordinate grid, as well as values on each line. With the second command, DS9 will even read the labels of the axes and use them to generate an almost publication-ready plot.

$ astscript-fits-view cmd.fits --ds9scale=minmax --ds9extra="-grid yes"
$ astscript-fits-view cmd.fits --ds9scale=minmax \
           --ds9extra="-grid yes -grid type publication"

If you are happy with the grid and coloring and the rest, you can also use ds9 to save this as a JPEG image to directly use in your documents/slides with these extra DS9 options (DS9 will write the image to cmd-2d.jpeg and quit immediately afterwards):

$ astscript-fits-view cmd.fits --ds9scale=minmax \
           --ds9extra="-grid yes -grid type publication" \
           --ds9extra="-saveimage cmd-2d.jpeg -quit"

This is good for a fast progress update. But for your paper or more official report, you want to show something with higher quality. For that, you can use the PGFPlots package in LaTeX to add axes in the same font as your text, sharp grids and many other elegant/powerful features (like over-plotting interesting points and lines). But to load the 2D histogram into PGFPlots first you need to convert the FITS image into a more standard format, for example, PDF. We will use Gnuastro’s ConvertType for this, and use the sls-inverse color map (which will map the pixels with a value of zero to white):

$ astconvertt cmd.fits --colormap=sls-inverse --borderwidth=0 -ocmd.pdf

Open the resulting cmd.pdf and see the PDF. Below you can see a minimally working example of how to add axis numbers, labels and a grid to the PDF generated above. First, let’s create a new report directory to keep the LaTeX outputs, then put the minimal report’s source in a file called report.tex. Notice the xmin, xmax, ymin, ymax values and how they are the same as the range specified above.

$ mkdir report-cmd
$ mv cmd.pdf report-cmd/
$ cat report-cmd/report.tex
\documentclass{article}
\usepackage{pgfplots}
\dimendef\prevdepth=0
\begin{document}

You can write all you want here...

\begin{tikzpicture}
  \begin{axis}[
      enlargelimits=false,
      grid,
      axis on top,
      width=\linewidth,
      height=\linewidth,
      xlabel={Magnitude (F160W)},
      ylabel={Color (F105W-F160W)}]

    \addplot graphics[xmin=22, xmax=30, ymin=-1, ymax=3] {cmd.pdf};
  \end{axis}
\end{tikzpicture}
\end{document}

Run this command to build your PDF (assuming you have LaTeX and PGFPlots).

$ cd report-cmd
$ pdflatex report.tex

Open the newly created report.pdf and enjoy the exquisite quality. The improved quality, blending in with the text, vector-graphics resolution and other features make this plot pleasing to the eye, and let your readers focus on the main point of your scientific argument. PGFPlots can also built the PDF of the plot separately from the rest of the paper/report, see 2D histogram as a table for plotting for the necessary changes in the preamble.

We will not go much deeper into the Statistics program here, but there is so much more you can do with it. After finishing the tutorial, see Statistics.