GNU Astronomy Utilities



2.10.3 MAD clipping

When clipping outliers, it is important that the used measure of dispersion is itself not strongly affected by the outliers. Previously (in Sigma clipping), we saw that the standard deviation is not a good measure of dispersion because of its strong dependency on outliers. In this section, we’ll introduce clipping operators that are based on the median absolute deviation (MAD).

The median absolute deviation is defined as the median of the differences of each element from the median of the elements. As mathematically derived in the Wikipedia page above, for a pure Gaussian distribution, the median absolute deviation will be roughly \(0.67449\sigma\). We can confirm this numerically from the images with pure noise that we created previously in Building inputs and analysis without clipping. With the first command below we can see the raw standard deviation and median absolute deviation values and the second command shows their division:

$ aststatistics build/in-1.fits --std --mad
1.00137568561776e+01 6.74662296703343e+00

$ aststatistics build/in-1.fits --std --mad | awk '{print $2/$1}'
0.673735

The algorithm of MAD-clipping is identical to \(\sigma\)-clipping, except that instead of \(\sigma\), it uses the median absolute deviation. Since the median absolute deviation is smaller than the standard deviation by roughly 0.67, if you regularly use \(3\sigma\) there, you should use \((3/0.67)\rm{MAD}=(4.48)\rm{MAD}\) when doing MAD-clipping. The usual tolerance should also be changed due to the differing nature of the median absolute deviation (based on sorted differences) in relation to the standard deviation (based on the sum of squared differences). A tolerance of 0.01 is better suited to the termination criteria of MAD-clipping.

To demonstrate the steps in practice, let’s assume you have the original script in Building inputs and analysis without clipping with the changes shown in the first command below. With the second command we’ll execute the script, and with the third command we’ll do the iterations of MAD-clipping:

$ grep '^clip_\|^profsum' script.sh
profsum=1e5
clip_operator=madclip
clip_multiple=4.48
clip_tolerance=0.01

$ bash ./script.sh

$ aststatistics build/in-9.fits --madclip
Statistics (GNU Astronomy Utilities) 0.21.6-28a1
-------
Input: build/in-9.fits (hdu: 1)
-------
4.48-MAD clipping steps until relative change in MAD
(median absolute deviation) is less than 0.01:

round number     median       MAD
1     40401      1.09295e+01  7.38609e+00
2     38789      1.04261e+01  7.03508e+00
3     38549      1.03469e+01  6.97927e+00
-------
Statistics (after clipping):
  Number of input elements:                40401
  Number of clips:                         2
  Final number of elements:                38549
  Median:                                  1.034690e+01
  Mean:                                    1.068946e+01
  Standard deviation:                      1.062083e+01
  Median Absolute Deviation:               6.979274e+00

We see that the median, mean and standard deviation after MAD-clipping is much better than the basic \(\sigma\)-clipping (see Sigma clipping): the median is now 10.3 (was 10.5 in \(\sigma\)-clipping), mean is 10.7 (was 10.11) and the standard deviation is 10.6 (was 10.12).

Let’s compare the MAD-clipped stacks with the results of the previous section. Since we want the images shown in a certain order, we’ll first construct the list of images (with a for loop that will fill the imgs variable). Note that this assumes you have ran and carefully read/understand all the commands in the previous sections (Building inputs and analysis without clipping and Sigma clipping). Tip: the three --ds9extra options ensure that the bottom row (showing the number of images used in each pixel) has the same scale and limits in all three columns.

$ imgs=""
$ p=build/stack   # 'p' is short for "prefix"
$ for m in std mean median mad number; do \
   imgs="$imgs $p-$m.fits $p-sigclip-$m.fits $p-madclip-$m.fits"; \
  done
$ astscript-fits-view $imgs --ds9extra="-tile grid layout 3 5" \
                      --ds9extra="-scale limits 1 9" \
                      --ds9extra="-frame move back -scale limits 1 9" \
                      --ds9extra="-frame move back -scale limits 1 9"

The third column shows the newly created MAD-clipped stacks. We see that the outlying circle is much more weaker in the MAD-clipped stacks than in the \(\sigma\)-clipped stacks in all measures (except for the “number” measure where the circle should be stronger).

However, unfortunately even MAD-clipping is not perfect and we still see the circle in all four cases, even with the MAD-clipped median (more clearly: after smoothing/blocking). The reason is similar to what was described in \(\sigma\)-clipping (using the original profsum=3e5: the leaked holes in the numbers image. Because the circle is not too distant from the noise some of its elements do not get clipped, and their stacked value gets systematically higher than the rest of the image. In Gnuastro, we have a fix for this that is described fully in the next section (Filled re-clipping).