GNU Astronomy Utilities


Next: , Previous: , Up: Tutorials   [Contents][Index]


2.3 Detecting large extended targets

The outer wings of large and extended objects can sink into the noise very gradually and can have a large variety of shapes (for example due to tidal interactions). Therefore separating the outer boundaries of the galaxies from the noise can be particularly tricky. Besides causing an under-estimation in the total estimated brightness of the target, failure to detect such faint wings will also cause a bias in the noise measurements, thereby hampering the accuracy of any measurement on the dataset. Therefore even if they don’t constitute a significant fraction of the target’s light, or aren’t your primary target, these regions must not be ignored. In this tutorial, we’ll walk you through the strategy of detecting such targets using NoiseChisel.

Don’t start with this tutorial: If you haven’t already completed General program usage tutorial, we strongly recommend going through that tutorial before starting this one. Basic features like access to this book on the command-line, the configuration files of Gnuastro’s programs, benefiting from the modular nature of the programs, viewing multi-extension FITS files, or using NoiseChisel’s outputs are discussed in more detail there.

We’ll try to detect the faint tidal wings of the beautiful M51 group38 in this tutorial. We’ll use a dataset/image from the public Sloan Digital Sky Survey, or SDSS. Due to its more peculiar low surface brightness structure/features, we’ll focus on the dwarf companion galaxy of the group (or NGC 5195). To get the image, you can use SDSS’s Simple field search tool. As long as it is covered by the SDSS, you can find an image containing your desired target either by providing a standard name (if it has one), or its coordinates. To access the dataset we will use here, write NGC5195 in the “Object Name” field and press “Submit” button.

Type the example commands: Try to type the example commands on your terminal and use the history feature of your command-line (by pressing the “up” button to retrieve previous commands). Don’t simply copy and paste the commands shown here. This will help simulate future situations when you are processing your own datasets.

You can see the list of available filters under the color image. For this demonstration, we’ll use the r-band filter image. By clicking on the “r-band FITS” link, you can download the image. Alternatively, you can just run the following command to download it with GNU Wget39. To keep things clean, let’s also put it in a directory called ngc5195. With the -O option, we are asking Wget to save the downloaded file with a more manageable name: r.fits.bz2 (this is an r-band image of NGC 5195, which was the directory name).

$ mkdir ngc5195
$ cd ngc5195
$ topurl=https://dr12.sdss.org/sas/dr12/boss/photoObj/frames
$ wget $topurl/301/3716/6/frame-r-003716-6-0117.fits.bz2 -Or.fits.bz2

This server keeps the files in a Bzip2 compressed file format. So we’ll first decompress it with the following command. By convention, compression programs delete the original file (compressed when uncompressing, or uncompressed when compressing). To keep the original file, you can use the --keep or -k option which is available in most compression programs for this job. Here, we don’t need the compressed file any more, so we’ll just let bunzip delete it for us and keep the directory clean.

$ bunzip2 r.fits.bz2

Let’s see how NoiseChisel operates on it with its default parameters:

$ astnoisechisel r.fits -h0

As described in NoiseChisel output, NoiseChisel’s default output is a multi-extension FITS file. A method to view them effectively and easily is discussed in Viewing multiextension FITS images.

Open the output r_detected.fits file and you will immediately notice how NoiseChisel’s default configuration is not suitable for this dataset: the Sky estimation has failed so terribly that the tile grid (where the Sky was estimated, and subtracted) is visible in the first extension (input dataset subtracted by the Sky value). If you look into the third and fourth extensions (the Sky and its standard deviation) you will see how they exactly map NGC 5195! This is not good! There shouldn’t be any signature of your extended target on the Sky and its standard deviation images. After all, the Sky is suppose to be the average value in the absence of signal, see Sky value.

The fact that signal has been detected as Sky shows that you haven’t done a good detection. Generally, any time your target is much larger than the tile size and the signal is almost flat (like this case), this will happen, even if it isn’t dramatic enough to be seen in the first extension. Therefore, the best place to check the accuracy of your detection is the noise extensions (third and fourth extension) of NoiseChisel’s output.

When dominated by the background, noise has a symmetric distribution. However, signal is not symmetric (we don’t have negative signal). Therefore when non-constant signal is present in a noisy dataset, the distribution will be positively skewed. This skewness is a good measure of how much signal we have in the distribution. The skewness can be accurately measured by the difference in the mode and median, for more see Quantifying signal in a tile, and Appendix C Akhlaghi and Ichikawa [2015].

Skewness is only a proxy for signal when the signal has structure (varies per pixel). Therefore, when it is approximately constant over a whole tile, or sub-set of the image, the signal’s effect is just to shift the symmetric center of the noise distribution to the positive and there won’t be any skewness: this positive40 shift that preserves the symmetric distribution is the Sky value. When there is a gradient over the dataset, different tiles will have different constant shifts/Sky-values, for example see Figure 11 of Akhlaghi and Ichikawa [2015].

To get less scatter in measuring the mode and median (and thus better estimate the skewness), you will need a larger tile. In Gnuastro, you can see the option values (--tilesize in this case) by adding the -P option to your last command. Try it. You can clearly see that the default tile size is indeed much smaller than this (huge) dwarf galaxy. Therefore NoiseChisel was unable to identify the skewness within the tiles under NGC 5159. Recall that NoiseChisel only uses tiles with no signal/skewness to define its threshold. Because of this, the threshold has been over-estimated on those tiles and further exacerbated the non-detection of the diffuse regions. To see which tiles were used for estimating the quantile threshold (no skewness was measured), you can use NoiseChisel’s --checkqthresh option:

$ astnoisechisel r.fits -h0 --checkqthresh

Notice how this option doesn’t allow NoiseChisel to finish. NoiseChisel aborted after finding the quantile thresholds. When you call any of NoiseChisel’s --check* options, by default, it will abort as soon as all the check steps have been written in the check file (a multi-extension FITS file). To optimize the threshold-related settings for this image, we’ll be playing with this tile for the majority of this tutorial. So let’s have a closer look at it.

The first extension of r_qthresh.fits (CONVOLVED) is the convolved input image (where the threshold is defined and applied), for more on the effect of convolution and thresholding, see Sections 3.1.1 and 3.1.2 of Akhlaghi and Ichikawa [2015]. The second extension (QTHRESH_ERODE) has a blank value for all the pixels of any tile that was identified as having significant signal. Playing a little with the dynamic range of this extension, you clearly see how the non-blank tiles around NGC 5195 have a gradient. You do not want this behavior. The ultimate purpose of the next few trials will be to remove the gradient from the non-blank tiles.

The next two extensions (QTHRESH_NOERODE and QTHRESH_EXPAND) are for later steps in NoiseChisel. The same tiles are masked, but those with a value, have a different value compared to QTHRESH_ERODE. In the subsequent three extensions, you can see how the blank tiles are filled/interpolated. The subsequent three extensions show the smoothed tile values. Finally in the last extension (QTHRESH-APPLIED), you can see the effect of applying QTHRESH_ERODE on CONVOLVED (pixels with a value of 0 were below the threshold).

Skipping convolution for faster tests: The slowest step of NoiseChisel is the convolution of the input dataset. Therefore when your dataset is large (unlike the one in this test), and you are not changing the input dataset or kernel in multiple runs (as in the tests of this tutorial), it is faster to do the convolution separately once (using Convolve) and use NoiseChisel’s --convolved option to directly feed the convolved image and avoid convolution. For more on --convolved, see NoiseChisel input.

Fortunately this image is large and has a nice and clean region also (filled with very small/distant stars and galaxies). So our first solution is to increase the tile size. To identify the skewness caused by NGC 5195 on the tiles under it, we thus have to choose a tile size that is larger than the gradient of the signal. Let’s try a 100 by 100 tile size:

$ astnoisechisel r.fits -h0 --tilesize=100,100 --checkqthresh

You can clearly see the effect of this increased tile size: they are much larger. Nevertheless the higher valued tiles are still systematically surrounding NGC 5195. As a result, when flipping through the interpolated and smoothed values, you can still see the signature of the galaxy, and the ugly tile signatures are still present in QTHRESH-APPLIED. So let’s increase the tile size even further (check the result of the first before going to the second):

$ astnoisechisel r.fits -h0 --tilesize=150,150 --checkqthresh
$ astnoisechisel r.fits -h0 --tilesize=200,200 --checkqthresh

The number of tiles with a gradient does indeed decrease with a larger tile size, but you still see a gradient in the raw values. The tile signatures in the thresholded image are also still present. These are not a good sign. So, let’s keep the 200 by 200 tile size and start playing with the other constraint that we have: the acceptable distance (in quantile), between the mode and median.

The tile size is now very large (16 times the area of the default tile size). We thus have much less scatter in the estimation of the mode and median and we can afford to decrease the acceptable distance between the two. The acceptable distance can be set through the --modmedqdiff option (read as “mode-median quantile difference”). Before trying the next command, run the previous command with a -P to see what value it originally had.

$ astnoisechisel r.fits -h0 --tilesize=200,200 --modmedqdiff=0.005    \
                 --checkqthresh

But this command doesn’t finish like the previous ones. A r_qthresh.fits file was created, but instead of saying that the quantile thresholds have been applied, a long error message is printed. Please read the error message before continuing to read here.

The error message fully describes the problem and even proposes solutions. As suggested there, the ideal solution would be to use SDSS images outside of this field and add them to this one to have a larger input image (with a larger area outside the diffuse regions). But we don’t always have this luxury, so let’s keep using to this image for the rest of this tutorial.

First, open r_qthresh.fits and have a look at the successful tiles. Unlike the previous --checkqthresh outputs, this one only has four extensions: as the error message explains, the interpolation (to give values to blank tiles) has not been done. Therefore its check results aren’t present.

At the start of the error message, NoiseChisel tells us how many tiles passed the test for having no significant signal: six. Looking closely at the dataset, we see that outside NGC 5195, there is no background gradient (the background is a fixed value). Our tile sizes are also very large (and thus don’t have much scatter). So a good way to loosen up the parameters can be to simply decrease the number of neighboring tiles needed for interpolation, with --interpnumngb (read as “interpolation number of neighbors”).

$ astnoisechisel r.fits -h0 --tilesize=200,200 --modmedqdiff=0.005    \
                 --interpnumngb=6 --checkqthresh

There is no longer any significant gradient in the usable tiles and no signature of NGC 5195 exists in the interpolated and smoothed values. But before finishing the quantile threshold, let’s have a closer look at the thresholded image in QTHRESH-APPLIED. Slide the dynamic range in your FITS viewer so 0 valued pixels are black and all non-zero pixels are white. You will see that the black holes are not evenly distributed. Those that follow the tail of NGC 5195 are systematically smaller than those in the far-right of the image. This suggests that we can decrease the quantile threshold (--qthresh) even further: there is still signal down there!

$ rm r_qthresh.fits
$ astnoisechisel r.fits -h0 --tilesize=200,200 --modmedqdiff=0.005    \
                 --interpnumngb=6 --qthresh=0.2

Since the quantile threshold of the previous command was satisfactory, we finally removed --checkqthresh to let NoiseChisel proceed until completion. Looking at the DETECTIONS extension of NoiseChisel’s output, we see the right-ward edges in particular have many holes that are fully surrounded by signal and the signal stretches out in the noise very thinly. This suggests that there is still signal that can be detected. So we’ll decrease the growth quantile (for larger/deeper growth into the noise, with --detgrowquant) and increase the size of holes that can be filled (if they are fully surrounded by signal, with --detgrowmaxholesize). Since we are done with our detection, to facilitate later steps, we’ll also add the --label option so the connected regions get different labels.

$ astnoisechisel r.fits -h0 --tilesize=200,200 --modmedqdiff=0.005    \
                 --interpnumngb=6 --qthresh=0.2 --detgrowquant=0.6    \
                 --detgrowmaxholesize=10000 --label

Looking into the output, we now clearly see that the tidal features of M51 and NGC 5195 are detected nicely in the same direction as expected (towards the bottom right side of the image). However, as discussed above, the best measure of good detection is the noise, not the detections themselves. So let’s look at the Sky and its Standard deviation. The Sky standard deviation no longer has any footprint of NGC 5195. But the Sky still has a very faint shadow of the dwarf galaxy (the values on the left are larger than on the right). However, this gradient in the Sky (output of first command below) is much less (by \(\sim20\) times) than the standard deviation (output of second command). So we can stop playing with NoiseChisel here, and leave the configuration for a more accurate detection to you.

$ aststatistics r_detected.fits -hSKY --maximum --minimum       \
                | awk '{print $1-$2}'
$ aststatistics r_detected.fits -hSKY_STD --mean

Let’s see how deeply/successfully we carved out M51 and NGC 5195’s tail from the noise. For this measurement, we’ll need to estimate the average flux on the outer edges of the detection. Fortunately all this can be done with a few simple commands (and no higher-level language mini-environments) using Arithmetic and MakeCatalog.

The M51 group detection is by far the largest detection in this image. We can thus easily find the ID/label that corresponds to it. We’ll first run MakeCatalog to find the area of all the detections, then we’ll use AWK to find the ID of the largest object and keep it as a shell variable (id):

$ astmkcatalog r_detected.fits --ids --geoarea -hDETECTIONS -ocat.txt
$ id=$(awk '!/^#/{if($2>max) {id=$1; max=$2}} END{print id}' cat.txt)

To separate the outer edges of the detections, we’ll need to “erode” the detections. We’ll erode two times (one time would be too thin for such a huge object), using a maximum connectivity of 2 (8-connected neighbors). We’ll then save the output in eroded.fits.

$ astarithmetic r_detected.fits 0 gt 2 erode -hDETECTIONS -oeroded.fits

We should now just keep the pixels that have the ID of the M51 group, but a value of 0 in erode.fits. We’ll keep the output in boundary.fits.

$ astarithmetic r_detected.fits $id eq eroded.fits 0 eq and     \
                -hDETECTIONS -h1 -oboundary.fits

Open the image and have a look. You’ll see that the detected edge of the M51 group is now clearly visible. You can use boundary.fits to mark (set to blank) this boundary on the input image and get a visual feeling of how far it extends:

$ astarithmetic r.fits boundary.fits nan where -ob-masked.fits -h0

To quantify how deep we have detected the low-surface brightness regions, we’ll use the command below. In short it just divides all the 1-valued pixels of boundary.fits in the Sky subtracted input (first extension of NoiseChisel’s output) by the pixel standard deviation of the same pixel. This will give us a signal-to-noise ratio image. The mean value of this image shows the level of surface brightness that we have achieved.

You can also break the command below into multiple calls to Arithmetic and create temporary files to understand it better. However, if you have a look at Reverse polish notation and Arithmetic operators, you should be able to easily understand what your computer does when you run this command41.

$ astarithmetic r_detected.fits boundary.fits not nan where \
                r_detected.fits /                           \
                meanvalue                                   \
                -hINPUT-NO-SKY -h1 -hSKY_STD --quiet
--> 0.0511864

The outer wings where therefore non-parametrically detected until \(\rm{S/N}\approx0.05\).

In interpreting this value, you should just have in mind that NoiseChisel works based on the contiguity of signal in the pixels. Therefore the larger the object, the deeper NoiseChisel can carve it out of the noise. In other words, this reported depth, is only for this particular object and dataset, processed with this particular NoiseChisel configuration: if the M51 group in this image was larger/smaller than this, or if the image was larger/smaller, or if we had used a different configuration, we would go deeper/shallower.

The NoiseChisel configuration found here is NOT GENERIC for any large object: As you saw above, the reason we chose this particular configuration for NoiseChisel to detect the wings of the M51 group was strongly influenced by this particular object in this particular image. When low surface brightness signal takes over such a large fraction of your dataset (and you want to accurately detect/account for it), to make sure that it is successfully detected, you will need some manual checking, intervention, or customization. In other words, to make sure that your noise measurements are least affected by the signal42.

To avoid typing all these options every time you run NoiseChisel on this image, you can use Gnuastro’s configuration files, see Configuration files. For an applied example of setting/using them, see General program usage tutorial.

To continue your analysis of such datasets with extended emission, you can use Segment to identify all the “clumps” over the diffuse regions: background galaxies and foreground stars.

$ astsegment r_detected.fits

Open the output r_detected_segmented.fits as a multi-extension data cube like before and flip through the first and second extensions to see the detected clumps (all pixels with a value larger than 1). To optimize the parameters and make sure you have detected what you wanted, its highly recommended to visually inspect the detected clumps on the input image.

For visual inspection, you can make a simple shell script like below. It will first call MakeCatalog to estimate the positions of the clumps, then make an SAO ds9 region file and open ds9 with the image and region file. Recall that in a shell script, the numeric variables (like $1, $2, and $3 in the example below) represent the arguments given to the script. But when used in the AWK arguments, they refer to column numbers.

To create the shell script, using your favorite text editor, put the contents below into a file called check-clumps.sh. Recall that everything after a # is just comments to help you understand the command (so read them!). Also note that if you are copying from the PDF version of this book, fix the single quotes in the AWK command.

#! /bin/bash
set -e	   # Stop execution when there is an error.
set -u	   # Stop execution when a variable is not initialized.

# Run MakeCatalog to write the coordinates into a FITS table.
# Default output is `$1_cat.fits'.
astmkcatalog $1.fits --clumpscat --ids --ra --dec

# Use Gnuastro's Table program to read the RA and Dec columns of the
# clumps catalog (in the `CLUMPS' extension). Then pipe the columns
# to AWK for saving as a DS9 region file.
asttable $1"_cat.fits" -hCLUMPS -cRA,DEC                               \
         | awk 'BEGIN { print "# Region file format: DS9 version 4.1"; \
                        print "global color=green width=1";            \
                        print "fk5" }                                  \
                { printf "circle(%s,%s,1\")\n", $1, $2 }' > $1.reg

# Show the image (with the requested color scale) and the region file.
ds9 -geometry 1800x3000 -mecube $1.fits -zoom to fit                   \
    -scale limits $2 $3 -regions load all $1.reg

# Clean up (delete intermediate files).
rm $1"_cat.fits" $1.reg

Finally, you just have to activate the script’s executable flag with the command below. This will enable you to directly/easily call the script as a command.

$ chmod +x check-clumps.sh

This script doesn’t expect the .fits suffix of the input’s filename as the first argument. Because the script produces intermediate files (a catalog and DS9 region file, which are later deleted). However, we don’t want multiple instances of the script (on different files in the same directory) to collide (read/write to the same intermediate files). Therefore, we have used suffixes added to the input’s name to identify the intermediate files. Note how all the $1 instances in the commands (not within the AWK command43) are followed by a suffix. If you want to keep the intermediate files, put a # at the start of the last line.

The few, but high-valued, bright pixels in the central parts of the galaxies can hinder easy visual inspection of the fainter parts of the image. With the second and third arguments to this script, you can set the numerical values of the color map (first is minimum/black, second is maximum/white). You can call this script with any44 output of Segment (when --rawoutput is not used) with a command like this:

$ ./check-clumps.sh r_detected_segmented -0.1 2

Go ahead and run this command. You will see the intermediate processing being done and finally it opens SAO DS9 for you with the regions superimposed on all the extensions of Segment’s output. The script will only finish (and give you control of the command-line) when you close DS9. If you need your access to the command-line before closing DS9, add a & after the end of the command above.

While DS9 is open, slide the dynamic range (values for black and white, or minimum/maximum values in different color schemes) and zoom into various regions of the M51 group to see if you are satisfied with the detected clumps. Don’t forget that through the “Cube” window that is opened along with DS9, you can flip through the extensions and see the actual clumps also. The questions you should be asking your self are these: 1) Which real clumps (as you visually feel) have been missed? In other words, is the completeness good? 2) Are there any clumps which you feel are false? In other words, is the purity good?

Note that completeness and purity are not independent of each other, they are anti-correlated: the higher your purity, the lower your completeness and vice-versa. You can see this by playing with the purity level using the --snquant option. Run Segment as shown above again with -P and see its default value. Then increase/decrease it for higher/lower purity and check the result as before. You will see that if you want the best purity, you have to sacrifice completeness and vice versa.

One interesting region to inspect in this image is the many bright peaks around the central parts of M51. Zoom into that region and inspect how many of them have actually been detected as true clumps. Do you have a good balance between completeness and purity? Also look out far into the wings of the group and inspect the completeness and purity there.

An easier way to inspect completeness (and only completeness) is to mask all the pixels detected as clumps and visually inspecting the rest of the pixels. You can do this using Arithmetic in a command like below. For easy reading of the command, we’ll define the shell variable i for the image name and save the output in masked.fits.

$ i=r_detected_segmented.fits
$ astarithmetic $i $i 0 gt nan where -hINPUT -hCLUMPS -omasked.fits

Inspecting masked.fits, you can see some very diffuse peaks that have been missed, especially as you go farther away from the group center and into the diffuse wings. This is due to the fact that with this configuration, we have focused more on the sharper clumps. To put the focus more on diffuse clumps, you can use a wider convolution kernel. Using a larger kernel can also help in detecting the existing clumps to fainter levels (thus better separating them from the surrounding diffuse signal).

You can make any kernel easily using the --kernel option in MakeProfiles. But note that a larger kernel is also going to wash-out many of the sharp/small clumps close to the center of M51 and also some smaller peaks on the wings. Please continue playing with Segment’s configuration to obtain a more complete result (while keeping reasonable purity). We’ll finish the discussion on finding true clumps at this point.

The properties of the background objects can then easily be measured using MakeCatalog. To measure the properties of the background objects (detected as clumps over the diffuse region), you shouldn’t mask the diffuse region. When measuring clump properties with MakeCatalog, the ambient flux (from the diffuse region) is calculated and subtracted. If the diffuse region is masked, its effect on the clump brightness cannot be calculated and subtracted.

To keep this tutorial short, we’ll stop here. See General program usage tutorial and Segment for more on using Segment, producing catalogs with MakeCatalog and using those catalogs.

Finally, if this book or any of the programs in Gnuastro have been useful for your research, please cite the respective papers and share your thoughts and suggestions with us (it can be very encouraging). All Gnuastro programs have a --cite option to help you cite the authors’ work more easily. Just note that it may be necessary to cite additional papers for different programs, so please use --cite with any program that has been useful in your work.

$ astmkcatalog --cite
$ astnoisechisel --cite

Footnotes

(38)

https://en.wikipedia.org/wiki/M51_Group

(39)

To make the command easier to view on screen or in a page, we have defined the top URL of the image as the topurl shell variable. You can just replace the value of this variable with $topurl in the wget command.

(40)

In processed images, where the Sky value can be over-estimated, this constant shift can be negative.

(41)

boundary.fits (extension 1) is a binary (0 or 1 valued) image. Applying the not operator on it, just flips all its pixels. Through the where operator, we are setting all the newly 1-valued pixels in r_detected.fits (extension INPUT-NO-SKY) to NaN/blank. In the second line, we are dividing all the non-blank values by r_detected.fits (extension SKY_STD). This gives the signal-to-noise ratio for each of the pixels on the boundary. Finally, with the meanvalue operator, we are taking the mean value of all the non-blank pixels and reporting that as a single number.

(42)

In the future, we may add capabilities to optionally automate some of the choices made here, please join us in doing this if you are interested. However, given the many problems in existing “smart” solutions, such automatic changing of the configuration may cause more problems than they solve. So even when they are implemented, we would strongly recommend manual checks and intervention for a robust analysis.

(43)

In AWK, $1 refers to the first column, while in the shell script, it refers to the first argument.

(44)

Some modifications are necessary based on the input dataset: depending on the dynamic range, you have to adjust the second and third arguments. But more importantly, depending on the dataset’s world coordinate system, you have to change the region width, in the AWK command. Otherwise the circle regions can be too small/large.


Next: , Previous: , Up: Tutorials   [Contents][Index]