GNU Astronomy Utilities



2.2.2 NoiseChisel optimization

In Detecting large extended targets we downloaded the single exposure SDSS image. Let’s see how NoiseChisel operates on it with its default parameters:

$ astnoisechisel r.fits -h0

As described in NoiseChisel and Multi-Extension FITS files, NoiseChisel’s default output is a multi-extension FITS file. Open the output r_detected.fits file and have a look at the extensions, the 0-th extension is only meta-data and contains NoiseChisel’s configuration parameters. The rest are the Sky-subtracted input, the detection map, Sky values and Sky standard deviation.

$ ds9 -mecube r_detected.fits -zscale -zoom to fit

Flipping through the extensions in a FITS viewer, you will see that the first image (Sky-subtracted image) looks reasonable: there are no major artifacts due to bad Sky subtraction compared to the input. The second extension also seems reasonable with a large detection map that covers the whole of NGC5195, but also extends towards the bottom of the image where we actually see faint and diffuse signal in the input image.

Now try flipping between the DETECTIONS and SKY extensions. In the SKY extension, you’ll notice that there is still significant signal beyond the detected pixels. You can tell that this signal belongs to the galaxy because the far-right side of the image (away from M51) is dark (has lower values) and the brighter parts in the Sky image (with larger values) are just under the detections and follow a similar pattern.

The fact that signal from the galaxy remains in the SKY HDU shows that NoiseChisel can be optimized for a much better result. The SKY extension must not contain any light around the galaxy. Generally, any time your target is much larger than the tile size and the signal is very diffuse and extended at low signal-to-noise values (like this case), this will happen. Therefore, when there are large objects in the dataset, the best place to check the accuracy of your detection is the estimated Sky image.

When dominated by the background, noise has a symmetric distribution. However, signal is not symmetric (we do not have negative signal). Therefore when non-constant47 signal is present in a noisy dataset, the distribution will be positively skewed. For a demonstration, see Figure 1 of Akhlaghi and Ichikawa 2015. This skewness is a good measure of how much faint signal we have in the distribution. The skewness can be accurately measured by the difference in the mean and median (assuming no strong outliers): the more distant they are, the more skewed the dataset is. This important concept will be discussed more extensively in the next section (Skewness caused by signal and its measurement).

However, skewness is only a proxy for signal when the signal has structure (varies per pixel). Therefore, when it is approximately constant over a whole tile, or sub-set of the image, the constant signal’s effect is just to shift the symmetric center of the noise distribution to the positive and there will not be any skewness (major difference between the mean and median). This positive48 shift that preserves the symmetric distribution is the Sky value. When there is a gradient over the dataset, different tiles will have different constant shifts/Sky-values, for example, see Figure 11 of Akhlaghi and Ichikawa 2015.

To make this very large diffuse/flat signal detectable, you will therefore need a larger tile to contain a larger change in the values within it (and improve number statistics, for less scatter when measuring the mean and median). So let’s play with the tessellation a little to see how it affects the result. In Gnuastro, you can see the option values (--tilesize in this case) by adding the -P option to your last command. Try running NoiseChisel with -P to see its default tile size.

You can clearly see that the default tile size is indeed much smaller than this (huge) galaxy and its tidal features. As a result, NoiseChisel was unable to identify the skewness within the tiles under the outer parts of M51 and NGC 5159 and the threshold has been over-estimated on those tiles. To see which tiles were used for estimating the quantile threshold (no skewness was measured), you can use NoiseChisel’s --checkqthresh option:

$ astnoisechisel r.fits -h0 --checkqthresh

Did you see how NoiseChisel aborted after finding and applying the quantile thresholds? When you call any of NoiseChisel’s --check* options, by default, it will abort as soon as all the check steps have been written in the check file (a multi-extension FITS file). This allows you to focus on the problem you wanted to check as soon as possible (you can disable this feature with the --continueaftercheck option).

To optimize the threshold-related settings for this image, let’s play with this quantile threshold check image a little. Do not forget that “Good statistical analysis is not a purely routine matter, and generally calls for more than one pass through the computer” (Anscombe 1973, see Gnuastro manifesto: Science and its tools). A good scientist must have a good understanding of her tools to make a meaningful analysis. So do not hesitate in playing with the default configuration and reviewing the manual when you have a new dataset (from a new instrument) in front of you. Robust data analysis is an art, therefore a good scientist must first be a good artist. So let’s open the check image as a multi-extension cube:

$ ds9 -mecube r_qthresh.fits -zscale -cmap sls -zoom to fit

The first extension (called CONVOLVED) of r_qthresh.fits is the convolved input image where the threshold(s) is(are) defined (and later applied to). For more on the effect of convolution and thresholding, see Sections 3.1.1 and 3.1.2 of Akhlaghi and Ichikawa 2015. The second extension (QTHRESH_ERODE) has a blank/white value for all the pixels of any tile that was identified as having significant signal. The other tiles have the measured threshold over them. The next two extensions (QTHRESH_NOERODE and QTHRESH_EXPAND) are the other two quantile thresholds that are necessary in NoiseChisel’s later steps. Every step in this file is repeated on the three thresholds.

Play a little with the color bar of the QTHRESH_ERODE extension, you clearly see how the non-blank tiles around NGC 5195 have a gradient. As one line of attack against discarding too much signal below the threshold, NoiseChisel rejects outlier tiles. Go forward by three extensions to VALUE1_NO_OUTLIER and you will see that many of the tiles over the galaxy have been removed in this step. For more on the outlier rejection algorithm, see the latter half of Quantifying signal in a tile.

Even though much of the galaxy’s footprint has been rejected as outliers, there are still tiles with signal remaining: play with the DS9 color-bar and you still see a gradient near the outer tidal feature of the galaxy. Before trying to correct this, let’s look at the other extensions of this check image. We will use a * as a wild-card that can be 1, 2 or 3. In the THRESH*_INTERP extensions, you see that all the blank tiles have been interpolated using their nearest neighbors (the relevant option here is --interpnumngb). In the following THRESH*_SMOOTH extensions, you can see the tile values after smoothing (configured with --smoothwidth option). Finally, in QTHRESH-APPLIED, you see the thresholded image: pixels with a value of 1 will be eroded later, but pixels with a value of 2 will pass the erosion step un-touched.

Let’s get back to the problem of optimizing the result. You have two strategies for detecting the outskirts of the merging galaxies: 1) Increase the tile size to get more accurate measurements of skewness. 2) Strengthen the outlier rejection parameters to discard more of the tiles with signal (primarily by increasing --outliernumngb). Fortunately in this image we have a sufficiently large region on the right side of the image that the galaxy does not extend to. So we can use the more robust first solution. In situations where this does not happen (for example, if the field of view in this image was shifted to the left to have more of M51 and less sky) you are limited to a combination of the two solutions or just to the second solution.

Skipping convolution for faster tests: The slowest step of NoiseChisel is the convolution of the input dataset. Therefore when your dataset is large (unlike the one in this test), and you are not changing the input dataset or kernel in multiple runs (as in the tests of this tutorial), it is faster to do the convolution separately once (using Convolve) and use NoiseChisel’s --convolved option to directly feed the convolved image and avoid convolution. For more on --convolved, see NoiseChisel input.

To better identify the skewness caused by the flat NGC 5195 and M51 tidal features on the tiles under it, we have to choose a larger tile size. Let’s try a tile size of 100 by 100 pixels and inspect the check image.

$ astnoisechisel r.fits -h0 --tilesize=100,100 --checkqthresh
$ ds9 -mecube r_qthresh.fits -zscale -cmap sls -zoom to fit

You can clearly see the effect of this increased tile size: the tiles are much larger and when you look into VALUE1_NO_OUTLIER, you see that all the tiles are nicely grouped on the right side of the image (the farthest from M51, where we do not see a gradient in QTHRESH_ERODE). Things look good now, so let’s remove --checkqthresh and let NoiseChisel proceed with its detection.

$ astnoisechisel r.fits -h0 --tilesize=100,100
$ ds9 -mecube r_detected.fits -zscale -cmap sls -zoom to fit

The detected pixels of the DETECTIONS extension have expanded a little, but not as much. Also, the gradient in the SKY image is almost fully removed (and does not fall over M51 anymore). However, on the bottom-right of the m51 detection, we see many holes gradually increasing in size. This hints that there is still signal out there. Let’s check the next series of detection steps by adding the --checkdetection option this time:

$ astnoisechisel r.fits -h0 --tilesize=100,100 --checkdetection
$ ds9 -mecube r_detcheck.fits -zscale -cmap sls -zoom to fit

The output now has 16 extensions, showing every step that is taken by NoiseChisel. The first and second (INPUT and CONVOLVED) are clear from their names. The third (THRESHOLDED) is the thresholded image after finding the quantile threshold (last extension of the output of --checkqthresh). The fourth HDU (ERODED) is new: it is the name-stake of NoiseChisel, or eroding pixels that are above the threshold. By erosion, we mean that all pixels with a value of 1 (above the threshold) that are touching a pixel with a value of 0 (below the threshold) will be flipped to zero (or “carved” out)49. You can see its effect directly by going back and forth between the THRESHOLDED and ERODED extensions.

In the fifth extension (OPENED-AND-LABELED) the image is “opened”, which is a name for eroding once, then dilating (dilation is the inverse of erosion). This is good to remove thin connections that are only due to noise. Each separate connected group of pixels is also given its unique label here. Do you see how just beyond the large M51 detection, there are many smaller detections that get smaller as you go more distant? This hints at the solution: the default number of erosions is too much. Let’s see how many erosions take place by default (by adding -P | grep erode to the previous command)

$ astnoisechisel r.fits -h0 --tilesize=100,100 -P | grep erode

We see that the value of erode is 2. The default NoiseChisel parameters are primarily targeted to processed images (where there is correlated noise due to all the processing that has gone into the warping and stacking of raw images, see NoiseChisel optimization for detection). In those scenarios 2 erosions are commonly necessary. But here, we have a single-exposure image where there is no correlated noise (the pixels are not mixed). So let’s see how things change with only one erosion:

$ astnoisechisel r.fits -h0 --tilesize=100,100 --erode=1 \
                 --checkdetection
$ ds9 -mecube r_detcheck.fits -zscale -cmap sls -zoom to fit

Looking at the OPENED-AND-LABELED extension again, we see that the main/large detection is now much larger than before. While the immediately-outer connected regions are still present, they have decreased dramatically, so we can pass this step.

After the OPENED-AND-LABELED extension, NoiseChisel goes onto finding false detections using the undetected pixels. The process is fully described in Section 3.1.5. (Defining and Removing False Detections) of Akhlaghi and Ichikawa 2015. Please compare the extensions to what you read there and things will be very clear. In the last HDU (DETECTION-FINAL), we have the final detected pixels that will be used to estimate the Sky and its Standard deviation. We see that the main detection has indeed been detected very far out, so let’s see how the full NoiseChisel will estimate the Sky and its standard deviation (by removing --checkdetection):

$ astnoisechisel r.fits -h0 --tilesize=100,100 --erode=1
$ ds9 -mecube r_detected.fits -zscale -cmap sls -zoom to fit

The DETECTIONS extension of r_detected.fits closely follows what the DETECTION-FINAL of the check image (looks good!). If you go ahead to the SKY extension, things still look good. But it can still be improved.

Look at the DETECTIONS again, you will see the right-ward edges of M51’s detected pixels have many “holes” that are fully surrounded by signal (value of 1) and the signal stretches out in the noise very thinly (the size of the holes increases as we go out). This suggests that there is still undetected signal and that we can still dig deeper into the noise.

With the --detgrowquant option, NoiseChisel will “grow” the detections in to the noise. Its value is the ultimate limit of the growth in units of quantile (between 0 and 1). Therefore --detgrowquant=1 means no growth and --detgrowquant=0.5 means an ultimate limit of the Sky level (which is usually too much and will cover the whole image!). See Figure 2 of Akhlaghi 2019 for more on this option. Try running the previous command with various values (from 0.6 to higher values) to see this option’s effect on this dataset. For this particularly huge galaxy (with signal that extends very gradually into the noise), we will set it to 0.75:

$ astnoisechisel r.fits -h0 --tilesize=100,100 --erode=1 \
                 --detgrowquant=0.75
$ ds9 -mecube r_detected.fits -zscale -cmap sls -zoom to fit

Beyond this level (smaller --detgrowquant values), you see many of the smaller background galaxies (towards the right side of the image) starting to create thin spider-leg-like features, showing that we are following correlated noise for too much. Please try it for yourself by changing it to 0.6 for example.

When you look at the DETECTIONS extension of the command shown above, you see the wings of the galaxy being detected much farther out, But you also see many holes which are clearly just caused by noise. After growing the objects, NoiseChisel also allows you to fill such holes when they are smaller than a certain size through the --detgrowmaxholesize option. In this case, a maximum area/size of 10,000 pixels seems to be good:

$ astnoisechisel r.fits -h0 --tilesize=100,100 --erode=1 \
                 --detgrowquant=0.75 --detgrowmaxholesize=10000
$ ds9 -mecube r_detected.fits -zscale -cmap sls -zoom to fit

When looking at the raw input image (which is very “shallow”: less than a minute exposure!), you do not see anything so far out of the galaxy. You might just think to yourself that “this is all noise, I have just dug too deep and I’m following systematics”! If you feel like this, have a look at the deep images of this system in Watkins 2015, or a 12 hour deep image of this system (with a 12-inch telescope): https://i.redd.it/jfqgpqg0hfk11.jpg50. In these deeper images you clearly see how the outer edges of the M51 group follow this exact structure, below in Achieved surface brightness level, we will measure the exact level.

As the gradient in the SKY extension shows, and the deep images cited above confirm, the galaxy’s signal extends even beyond this. But this is already far deeper than what most (if not all) other tools can detect. Therefore, we will stop configuring NoiseChisel at this point in the tutorial and let you play with the other options a little more, while reading more about it in the papers: Akhlaghi and Ichikawa 2015 and 2019) and NoiseChisel. When you do find a better configuration feel free to contact us for feedback. Do not forget that good data analysis is an art, so like a sculptor, master your chisel for a good result.

To avoid typing all these options every time you run NoiseChisel on this image, you can use Gnuastro’s configuration files, see Configuration files. For an applied example of setting/using them, see Option management and configuration files.

This NoiseChisel configuration is NOT GENERIC: Do not use the configuration derived above, on another instrument’s image blindly. If you are unsure, just use the default values. As you saw above, the reason we chose this particular configuration for NoiseChisel to detect the wings of the M51 group was strongly influenced by the noise properties of this particular image. Remember NoiseChisel optimization for detection, where we looked into the very deep XDF image which had strong correlated noise?

As long as your other images have similar noise properties (from the same data-reduction step of the same instrument), you can use your configuration on any of them. But for images from other instruments, please follow a similar logic to what was presented in these tutorials to find the optimal configuration.

Smart NoiseChisel: As you saw during this section, there is a clear logic behind the optimal parameter value for each dataset. Therefore, we plan to add capabilities to (optionally) automate some of the choices made here based on the actual dataset, please join us in doing this if you are interested. However, given the many problems in existing “smart” solutions, such automatic changing of the configuration may cause more problems than they solve. So even when they are implemented, we would strongly recommend quality checks for a robust analysis.


Footnotes

(47)

by constant, we mean that it has a single value in the region we are measuring.

(48)

In processed images, where the Sky value can be over-estimated, this constant shift can be negative.

(49)

Pixels with a value of 2 are very high signal-to-noise pixels, they are not eroded, to preserve sharp and bright sources.

(50)

The image is taken from this Reddit discussion: https://www.reddit.com/r/Astronomy/comments/9d6x0q/12_hours_of_exposure_on_the_whirlpool_galaxy/