
Put simply, noise can be characterized by a certain spread about a
characteristic value. In the Gaussian distribution (the model most commonly
used for noise), the spread is defined by the standard deviation about the
characteristic mean. Before continuing, let’s clarify some definitions:
*Data* is defined as the combination of signal and noise (so a
noisy image is one *data*-set). *Signal* is defined as the mean
of the noise on each element (after sky subtraction, see Sky value definition).

Let’s assume that the *background* (see Sky value definition) has been
subtracted and is zero. When a dataset doesn’t have any signal (only
noise), the mean, median and mode of its distribution are equal within
statistical errors and approximately equal to the background value. Signal
is always positive and never becomes negative; see Figure 1 in Akhlaghi and
Ichikawa (2015). Therefore, as more signal is added to the raw noise, the
mean, median and mode of the dataset (which has both signal and noise) shift
towards positive values. The mean’s shift is the largest. The median shifts
less, since it is defined based on the ordered distribution and so is not
affected by a small number of outliers. The distribution’s mode shifts the
least.
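A minimal sketch of the mean and median shifts described above, in Python with only the standard library. The noise sigma of 1, the signal amplitude of 5, the 10% fraction of affected elements and the random seed are arbitrary choices for illustration. The mode is left out here because it needs a dedicated estimator (see below).

```python
import random
import statistics

# Pure Gaussian noise with unit sigma, centered on zero (background already
# subtracted). The seed, sample size and signal values are arbitrary choices.
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Add a constant positive signal of 5 sigma to every tenth element (10%).
data = [v + 5.0 if i % 10 == 0 else v for i, v in enumerate(noise)]

mean_shift = statistics.fmean(data) - statistics.fmean(noise)
median_shift = statistics.median(data) - statistics.median(noise)
print(f"mean shift:   {mean_shift:.3f}")
print(f"median shift: {median_shift:.3f}")
```

By construction the mean moves by exactly 0.1 × 5 = 0.5. For this mixture the median is expected to move by only about 0.14 (the value m satisfying 0.9Φ(m) = 0.5). Since adding non-negative values can never lower any order statistic, the median’s shift can never be negative.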

Inverting the argument above gives us a robust method to quantify the
significance of signal in a dataset: when the mode and median of a
distribution are approximately equal, we can argue that there is no
significant signal. To allow for gradients (which are commonly present in
ground-based images), we can consider the image to be made of a grid of
tiles (see Tessellation). Hence, from the difference of the mode and
median on each tile, we can ‘detect’ the significance of signal in it. The
median of a distribution is defined to be the value of the distribution’s
middle point after sorting (its 0.5 quantile). Thus, to estimate the
presence of signal, we compare the quantile of the mode with 0.5: if the
difference is larger than the value given to the `--modmedqdiff` option,
the tile is ignored. You can read this option as “mode-median-quantile-diff”.
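The per-tile test can be sketched as follows. This is a minimal illustration, not Gnuastro’s implementation. The mode is estimated here with a swapped-in technique: the peak of a Gaussian kernel density estimate, which is *not* the algorithm of Appendix C in Akhlaghi and Ichikawa (2015). The fixed bandwidth of 0.5 assumes noise with a sigma of about 1. The threshold of 0.1 is an arbitrary value for this example, not the default of `--modmedqdiff`. The function names (`kde_mode`, `mode_quantile`, `tile_is_usable`) are hypothetical.

```python
import bisect
import math
import random


def kde_mode(values, bandwidth=0.5, ngrid=200):
    """Stand-in mode estimator (NOT the Appendix C algorithm): the peak of a
    Gaussian kernel density estimate evaluated on a regular grid."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (ngrid - 1)
    best_x, best_density = lo, -1.0
    for i in range(ngrid):
        x = lo + i * step
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def mode_quantile(values):
    """Fraction of the tile's elements that lie below its estimated mode."""
    ordered = sorted(values)
    return bisect.bisect_left(ordered, kde_mode(ordered)) / len(ordered)


def tile_is_usable(values, modmedqdiff):
    """True if the mode's quantile is within `modmedqdiff` of 0.5 (the
    median's quantile), i.e. the tile shows no significant signal."""
    return abs(mode_quantile(values) - 0.5) <= modmedqdiff


rng = random.Random(7)
noise_tile = [rng.gauss(0.0, 1.0) for _ in range(4000)]
# In this tile, 40% of the elements carry a strong positive signal.
signal_tile = [v + 6.0 if i % 5 < 2 else v for i, v in enumerate(noise_tile)]

noise_ok = tile_is_usable(noise_tile, 0.1)
signal_ok = tile_is_usable(signal_tile, 0.1)
print(noise_ok, signal_ok)
```

In the signal tile the mode stays near zero while the median moves up. As a result, only about 30% of the elements fall below the mode instead of 50%, and the tile is rejected.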

Using the input’s skewness in this way is possible because of a new algorithm to find the mode of a distribution, defined in Appendix C of Akhlaghi and Ichikawa (2015). However, the raw dataset’s distribution is noisy (noise also affects the sorting), so applying the argument above to the raw input will give a noisy result. To decrease the noise/error in estimating the mode, we use convolution (see Convolution process). Convolution decreases the range of the dataset and enhances its skewness; see Section 3.1.1 and Figure 4 in Akhlaghi and Ichikawa (2015). This enhanced skewness can be interpreted as an increase in the signal-to-noise ratio of the objects buried in the noise. Therefore, to obtain an even better measure of the presence of signal in a tile, the image can first be convolved with a given kernel.
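A one-dimensional sketch of why convolution helps. Gnuastro convolves the 2D image with a 2D kernel; the 5-element binomial kernel here is an arbitrary stand-in. For a kernel normalized to sum to 1 with weights w_i, the standard deviation of uncorrelated noise is multiplied by sqrt(Σ w_i²), which is sqrt(70/256) ≈ 0.52 for this kernel. The level of a flat, extended signal is unchanged. This shows only the signal-to-noise gain, not the skewness measurement itself.

```python
import random
import statistics

# 5-element binomial kernel approximating a Gaussian, normalized to sum to 1
# (an arbitrary choice for this illustration).
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def convolve_valid(data, kernel):
    """1D convolution, keeping only positions where the kernel fully overlaps.
    (The kernel is symmetric, so no flipping is needed.)"""
    k = len(kernel)
    return [sum(w * data[i + j] for j, w in enumerate(kernel))
            for i in range(len(data) - k + 1)]


rng = random.Random(3)
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
signal = 0.5  # faint, flat (extended) signal: half the noise sigma
data = [signal + n for n in noise]

smoothed = convolve_valid(data, KERNEL)
print(f"raw:      mean {statistics.fmean(data):.3f}, "
      f"stdev {statistics.stdev(data):.3f}")
print(f"smoothed: mean {statistics.fmean(smoothed):.3f}, "
      f"stdev {statistics.stdev(smoothed):.3f}")
```

The per-element signal-to-noise ratio of the flat signal therefore rises by roughly 1/0.52 ≈ 1.9. For sources much smaller than the kernel, the peak value is also diluted, so the gain is largest for faint, extended signal.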

Note that through the difference of the mode and median we have actually ‘detected’ data in the distribution. However, this “detection” was only based on the total distribution of the data in each tile (a much lower resolution than individual pixels). This is the main limitation of this technique. The best approach is thus to do detection over the dataset, mask all the detected pixels and use the undetected regions to estimate the sky and its standard deviation.

The mean value of the tiles that have an approximately equal mode and median will be the Sky value. However, there is one final hurdle: astronomical datasets are commonly plagued with cosmic rays. Images of cosmic rays aren’t smoothed by the atmosphere or telescope aperture, so they have sharp boundaries. Also, since they don’t occupy many pixels, they don’t affect the mode and median calculation. But their very high values can greatly bias the calculation of the mean (recall how the mean shifts the fastest in the presence of outliers); see Figure 15 in Akhlaghi and Ichikawa (2015) for one example.

The effect of outliers like cosmic rays on the mean and standard deviation
can be removed through \(\sigma\)-clipping; see Sigma clipping
for a complete explanation. Therefore, after confirming that the mode and
median are approximately equal in a tile (see Tessellation), the
final Sky value and its standard deviation are determined after
\(\sigma\)-clipping with the `--sigmaclip` option.
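A generic \(\sigma\)-clipping sketch applied to a tile contaminated by cosmic rays. The helper `sigma_clip` and its parameters `multiple`, `tolerance` and `max_iter` are hypothetical. They are not the interface of `--sigmaclip`, whose exact semantics are described in Sigma clipping. Clipping around the median and the stopping rule used here are choices made for this illustration.

```python
import random
import statistics


def sigma_clip(values, multiple=3.0, tolerance=0.1, max_iter=50):
    """Generic sigma-clipping sketch (hypothetical helper, not Gnuastro's code).

    Repeatedly discard elements farther than `multiple` standard deviations
    from the median. Stop once a clipping round reduced the standard
    deviation by no more than the fraction `tolerance`, or after `max_iter`
    rounds. Returns (mean, standard deviation, number of kept elements).
    """
    data = list(values)
    prev_std = None
    for _ in range(max_iter):
        std = statistics.pstdev(data)
        if prev_std is not None and prev_std - std <= tolerance * prev_std:
            break  # the last clip barely changed the scatter: converged
        med = statistics.median(data)
        data = [v for v in data if abs(v - med) <= multiple * std]
        prev_std = std
    return statistics.fmean(data), statistics.pstdev(data), len(data)


# A hypothetical tile: Sky level 10 with unit noise, plus 20 "cosmic ray"
# pixels that are 1000 above the Sky.
rng = random.Random(11)
tile = [rng.gauss(10.0, 1.0) for _ in range(10000)]
for i in range(0, len(tile), 500):
    tile[i] += 1000.0

raw_mean = statistics.fmean(tile)
clipped_mean, clipped_std, kept = sigma_clip(tile)
print(f"raw mean: {raw_mean:.3f}")
print(f"clipped:  mean {clipped_mean:.3f}, std {clipped_std:.3f}, kept {kept}")
```

Only 0.2% of the pixels are cosmic rays, yet they pull the raw mean up by about 2. After clipping, the mean returns to about 10. The clipped standard deviation is slightly below 1 because the 3-sigma cut also trims the extreme tails of the genuine noise.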



GNU Astronomy Utilities 0.5 manual, December 2017.