GNU Astronomy Utilities



2.1.14 Measuring the dataset limits

In Segmentation and making a catalog, we created a catalog of the different objects with the image. Before measuring colors, or doing any other kind of analysis on the catalogs (and detected objects), it is very important to understand the limitations of the dataset. Without understanding the limitations of your dataset, you cannot make any physical interpretation of your results. The theory behind the calculations discussed here is thoroughly introduced in Quantifying measurement limits.

For example, with the command below, let’s sort all the detected clumps in the image by magnitude (with --sort=magnitude) and and print the magnitude and signal-to-noise ratio (S/N; with -cmagnitude,sn):

$ asttable cat/xdf-f160w.fits -hclumps -cmagnitude,sn \
           --sort=magnitude --noblank=magnitude

As you see, we have clumps with a total magnitude of almost 32! This is extremely faint! Are these things trustable? Let’s have a look at all of those with a magnitude between 31 and 32 with the command below. We are first using Table to only keep the relevant columns rows, and using Gnuastro’s DS9 region file creation script (astscript-ds9-region) to generate DS9 region files, and open DS9:

$ asttable cat/xdf-f160w.fits -hclumps -cra,dec \
           --range=magnitude,31:32  \
      | astscript-ds9-region -c1,2 --radius=0.5 \
           --command="ds9 -mecube seg/xdf-f160w.fits -zscale"

Zoom-out a little and you will see some green circles (DS9 region files) in some regions of the image. There actually does seem to be a true peak under the selected regions, but as you see, they are very small, diffuse and noisy. How reliable are the measured magnitudes? Using the S/N column from the first command above, you can see that such objects only have a signal to noise of about 2.6 (which is indeed too low for most analysis purposes)

$ asttable cat/xdf-f160w.fits -hclumps -csn \
           --range=magnitude,31:32 | aststatistics

This brings us to the first method of quantifying your dataset’s magnitude limit, which is also sometimes called detection limit (see Magnitude limit of image). To estimate the \(5\sigma\) detection limit of your dataset, you simply report the median magnitude of the objects that have a signal to noise of (approximately) five. This is very easy to calculate with the command below:

$ asttable cat/xdf-f160w.fits -hclumps --range=sn,4.8:5.2 -cmagnitude \
           | aststatistics --median
29.9949

Let’s have a look at these objects, to get a feeling of what these clump looks like:

$ asttable cat/xdf-f160w.fits -hclumps --range=sn,4.8:5.2 \
           -cra,dec,magnitude \
           | astscript-ds9-region -c1,2 --namecol=3 \
                      --width=2 --radius=0.5 \
                      --command="ds9 -mecube seg/xdf-f160w.fits -zscale"

The number you see on top of each region is the clump’s magnitude. Please go over the objects and have a close look at them! It is very important to have a feeling of what your dataset looks like, and how to interpret the numbers to associate an image with them.

Generally, they look very small with different levels of diffuse-ness! Those that are sharper make more visual sense (to be \(5\sigma\) detections), but the more diffuse ones extend over a larger area. Furthermore, the noise is measured on individual pixel measurements. However, during the reduction many exposures are co-added and stacked, mixing the pixels like a small convolution (creating “correlated noise”). Therefore you clearly see two main issues with the detection limit as defined above: it depends on the morphology, and it does not take into account the correlated noise.

A more realistic way to estimate the significance of the detection is to take its footprint, randomly place it in thousands of undetected regions of the image and use that distribution as a reference. This is technically known as upper-limit measurements. For a full discussion, see Upper limit magnitude of each detection).

Since it is for each separate object, the upper-limit measurements should be requested as extra columns in MakeCatalog’s output. For example, with the command below, let’s generate a new catalog of the F160W filter, but with two extra columns compared to the one in cat/: the upper-limit magnitude and the upper-limit multiple of sigma.

$ astmkcatalog seg/xdf-f160w.fits --ids --ra --dec --magnitude --sn \
               --zeropoint=25.94 --clumpscat --upnsigma=3 \
               --upperlimit-mag --upperlimit-sigma \
               --output=xdf-f160w.fits

Let’s compare the upper-limit magnitude with the measured magnitude of each clump:

$ asttable xdf-f160w.fits -hclumps -cmagnitude,upperlimit_mag

As you see, in almost all of the cases, the measured magnitude is sufficiently higher than the upper-limit magnitude. Let’s subtract the latter from the former to better see this difference in a third column:

$ asttable xdf-f160w.fits -hclumps -cmagnitude,upperlimit_mag \
           -c'arith upperlimit_mag magnitude -'

The ones with a positive third column (difference) show that the clump has sufficiently higher brightness than the noisy background to be usable. Let’s use Table’s Column arithmetic to find only those that have a negative difference:

$ asttable xdf-f160w.fits -hclumps -cra,dec --noblankend=3 \
      -c'arith upperlimit_mag magnitude - set-d d d 0 gt nan where'

From more than 3500 clumps, this command only gave \(\sim150\) rows (this number may slightly change on different runs due to the random nature of the upper-limit sampling37)! Let’s have a look at them:

$ asttable xdf-f160w.fits -hclumps -cra,dec --noblankend=3 \
      -c'arith upperlimit_mag magnitude - set-d d d 0 gt nan where' \
      | astscript-ds9-region -c1,2 --namecol=3 --width=2 \
                  --radius=0.5 \
                  --command="ds9 -mecube seg/xdf-f160w.fits -zscale"

You see that they are all extremely faint and diffuse/small peaks. Therefore, if an object’s magnitude is fainter than its upper-limit magnitude, you should not use the magnitude: it is not accurate! You should use the upper-limit magnitude instead (with an arrow in your plots to mark which ones are upper-limits).

But the main point (in relation to the magnitude limit) with the upper-limit, is the UPPERLIMIT_SIGMA column. you can think of this as a realistic S/N for extremely faint/diffuse/small objects). The raw S/N column is simply calculated on a pixel-by-pixel basis, however, the upper-limit sigma is produced by actually taking the label’s footprint, and randomly placing it thousands of time over un-detected parts of the image and measuring the brightness of the sky. The clump’s brightness is then divided by the standard deviation of the resulting distribution to give you exactly how significant it is (accounting for inter-pixel issues like correlated noise, which are strong in this dataset). You can actually compare the two values with the command below:

$ asttable xdf-f160w.fits -hclumps -csn,upperlimit_sigma

As you see, the second column (upper-limit sigma) is almost always less than the S/N. This clearly shows the effect of correlated noise! If you now use this column as the reference for deriving the magnitude limit, you will see that it will shift by almost 0.5 magnitudes brighter and is now more reasonable:

$ asttable xdf-f160w.fits -hclumps --range=upperlimit_sigma,4.8:5.2 \
           -cmagnitude | aststatistics --median
29.6257

We see that the \(5\sigma\) detection limit is \(\sim29.6\)! This is extremely deep! For example, in the Legacy Survey38, the \(5\sigma\) detection limit for point sources is approximately 24.5 (5 magnitudes, or 100 times, shallower than this image).

As mentioned above, an important caveat in this simple calculation is that we should only be looking at point-like objects, not simply everything. This is because the shape or radial slope of the profile has an important effect on this measurement: at the same total magnitude, a sharper object will have a higher S/N. To be more precise, we should first perform star-galaxy separation, then do this only for the objects that are classified as stars. A crude, first-order, method is to use the --axis-ratio option so MakeCatalog also measures the axis ratio, then call Table with --range=upperlimit_sigma,,4.8:5.2 and --range=axis_ratio,0.95:1 (in one command). Please do this for yourself as an exercise to see the difference with the result above.

Before continuing, let’s remove this temporarily produced catalog:

$ rm xdf-f160w.fits

Another measure of the dataset’s limit is the completeness limit (Completeness limit of each detection). This is necessary when you are looking at populations of objects over the image. You want to know until what magnitude you can be sure that you have detected an object (if it was present). As described in Completeness limit of each detection, the best way to do this is with mock images. But a crude, first order result can be obtained from the actual image: by simply plotting the histogram of the magnitudes:

$ aststatistics cat/xdf-f160w.fits -hclumps -cmagnitude
...
Histogram:
 |                                                           *
 |                                                      ** ****
 |                                                   ***********
 |                                                 *************
 |                                                ****************
 |                                             *******************
 |                                           **********************
 |                                        **************************
 |                                 *********************************
 |                      *********************************************
 |* *   ** ** **********************************************************
 |----------------------------------------------------------------------

This plot (the histogram of magnitudes; where fainter magnitudes are towards the right) is technically called the dataset’s number count plot. You see that the number of objects increases with magnitude as the magnitudes get fainter (to the right). However, beyond a certain magnitude, you see it becomes flat, and soon afterwards, the numbers suddenly drop.

Once you have your catalog, you can easily find this point with the two commands below. First we generate a histogram with fewer bins (to have more numbers in each bin). We then use AWK to find the magnitude bin where the number of points decrease compared to the previous bin. But we only do this for bins that have more than 50 items (to avoid scatter in the bright end). Finally, in Statistics, we have manually set the magnitude range and number of bins so each bin is roughly 0.5 magnitudes thick (with --greaterequal=20, --lessthan=32 and --numbins=24)

$ aststatistics cat/xdf-f160w.fits -hclumps -cmagnitude --histogram \
                --greaterequal=20 --lessthan=32 --numbins=24 \
                --output=f160w-hist.txt
$ asttable f160w-hist.txt \
           | awk '$2>50 && $2<prev{print prevbin; exit} \
                  {prev=$2; prevbin=$1}'
28.932122667631

Therefore, to first order (and very crudely!) we can say that if an object is in our field of view and has a magnitude of \(\sim29\) or brighter, we can be highly confident that we have detected it. But before continuing, let’s clean up behind ourselves:

$ rm f160w-hist.txt

Another important limiting parameter in a processed dataset is the surface brightness limit (Surface brightness limit of image). The surface brightness limit of a dataset is an important measure for extended structures (for example, when you want to look at the outskirts of galaxies). In the next tutorial, we have thoroughly described the derivation of the surface brightness limit of a dataset. So we will just show the final result here, and encourage you to follow up with that tutorial after finishing this tutorial (see Image surface brightness limit)

By default, MakeCatalog will estimate the surface brightness limit of a given dataset, and put it in the keywords of the output (all keywords starting with SBL, which is short for surface brightness limit):

$ astfits cat/xdf-f160w.fits -h1 | grep SBL

As you see, the only one with a unit of mag/arcsec^2 is SBLMAG. It contains the surface brightness limit of the input dataset over SBLAREA arcsec\(^2\) with SBLNSIG multiples of \(\sigma\). In the current version of Gnuastro, SBLAREA=100 and SBLNSIG=3, so the surface brightness limit of this image is 32.66 mag/arcsec\(^2\) (\(3\sigma\), over 100 arcsec\(^2\)). Therefore, if this default area and multiple of sigma are fine for you39 (these are the most commonly used values), you can simply read the image surface brightness limit from the catalogs produced by MakeCatalog with this command:

$ astfits cat/*.fits -h1 --keyvalue=SBLMAG

Footnotes

(37)

You can fix the random number generator seed, so you always get the same sampling, see Generating random numbers.

(38)

https://www.legacysurvey.org/dr9/description

(39)

You can change these values with the --sfmagarea and --sfmagnsigma