Put simply, noise can be characterized by a certain spread about a characteristic value. In the Gaussian distribution (the one most commonly used to model noise), the spread is defined by the standard deviation about the characteristic mean. Before continuing, let’s clarify some definitions: Data is defined as the combination of signal and noise (so a noisy image is one dataset). Signal is defined as the mean of the noise on each element (after sky subtraction, see Sky value definition).
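To make these definitions concrete, here is a minimal sketch, assuming a hypothetical sky level and three hypothetical signal values. Each realization of the data is one Poisson draw per element (photon-counting noise, which is close to Gaussian at these count levels). After subtracting the sky, the mean of many realizations recovers the signal, and the spread is the standard deviation about that mean.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

sky = 100.0                              # hypothetical sky level (counts)
signal = np.array([0.0, 5.0, 20.0])      # hypothetical signal in three elements

# Data = signal + noise: each row is one noisy realization of the three elements,
# drawn from a Poisson distribution whose mean is sky + signal.
data = rng.poisson(lam=sky + signal, size=(20_000, signal.size))

# After sky subtraction, the mean on each element is the signal.
estimated_signal = (data - sky).mean(axis=0)

# The spread about that mean is the noise; for Poisson counts it is ~sqrt(sky + signal).
spread = data.std(axis=0)
```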
Let’s assume that the background (see Sky value definition) has been subtracted and is zero. When a dataset doesn’t have any signal (only noise), the mean, median and mode of its distribution are equal within statistical errors and approximately equal to the background value. Signal always has a positive value and never becomes negative, see Figure 1 in Akhlaghi and Ichikawa (2015). Therefore, as more signal is added to the raw noise, the mean, median and mode of the dataset (which has both signal and noise) shift to positive values. The mean’s shift is the largest. The median shifts less, since it is defined on the ordered distribution and so is not affected by a small number of outliers. The distribution’s mode shifts the least.
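A minimal sketch of the mean and median shifts, assuming pure unit-variance Gaussian noise and a hypothetical positive signal of 3 added to 10% of the elements. The mode is not estimated here, because a reliable mode estimate needs a dedicated algorithm (Appendix C of Akhlaghi and Ichikawa 2015), which this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 100_000

# Pure noise after sky subtraction: zero-mean Gaussian with unit standard deviation.
noise = rng.normal(loc=0.0, scale=1.0, size=n)

# Hypothetical signal: positive only, present on 10% of the elements.
signal = np.zeros(n)
signal[: n // 10] = 3.0
data = signal + noise

# Shift of each statistic caused by adding the signal.
mean_shift = data.mean() - noise.mean()            # equals signal.mean() = 0.3
median_shift = np.median(data) - np.median(noise)  # positive, but smaller than mean_shift

print(f"mean shift:   {mean_shift:.3f}")
print(f"median shift: {median_shift:.3f}")
```

The mean moves by exactly the average signal, while the median, which only depends on the ordering, moves by noticeably less.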
Inverting the argument above gives us a robust method to quantify the significance of signal in a dataset. Namely, when the mode and median of a distribution are approximately equal, we can argue that there is no significant signal. To allow for gradients (which are commonly present in ground-based images), we can consider the image to be made of a grid of tiles (see Tessellation). Hence, from the difference of the mode and median on each tile, we can ‘detect’ the significance of signal in it. The median of a distribution is defined as the value of the distribution’s middle point after sorting (its 0.5 quantile). Thus, to estimate the presence of signal, we find the quantile of the mode and compare it with 0.5: if the difference is larger than the value given to the --modmedqdiff option, the tile is ignored. You can read this option as “mode-median-quantile-diff”.
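The tile test can be sketched as follows. This is not the actual implementation: the function names are hypothetical, the tiles are a plain regular grid rather than the full tessellation, and the mode is taken as an input because estimating it requires the separate algorithm discussed next. The quantile of the mode is simply the fraction of sorted elements that lie below it; `modmedqdiff` plays the role of the --modmedqdiff value.

```python
import numpy as np

def mode_quantile(values, mode):
    """Quantile of `mode` in `values`: the fraction of elements smaller than it."""
    v = np.sort(np.ravel(values))
    # searchsorted(side="left") counts the elements strictly less than `mode`.
    return np.searchsorted(v, mode, side="left") / v.size

def tile_is_usable(values, mode, modmedqdiff):
    """True when the mode's quantile is within `modmedqdiff` of 0.5 (the median),
    i.e. the tile shows no significant signal and can be used for the sky."""
    return abs(mode_quantile(values, mode) - 0.5) <= modmedqdiff

def iter_tiles(image, tile_shape):
    """Yield (tile_row, tile_col, tile) over a regular grid of non-overlapping tiles."""
    th, tw = tile_shape
    for r in range(0, image.shape[0], th):
        for c in range(0, image.shape[1], tw):
            yield r // th, c // tw, image[r:r + th, c:c + tw]
```

For a tile containing signal, the mode lies below the median, so its quantile drops below 0.5 and the tile is rejected once the gap exceeds the threshold.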
This use of the input’s skewness is possible because of a new algorithm to find the mode of a distribution, defined in Appendix C of Akhlaghi and Ichikawa (2015). However, the raw dataset’s distribution is noisy (noise also affects the sorting), so applying the argument above to the raw input gives a noisy result. To decrease the noise/error in estimating the mode, we use convolution (see Convolution process). Convolution decreases the range of the dataset and enhances its skewness, see Section 3.1.1 and Figure 4 in Akhlaghi and Ichikawa (2015). This enhanced skewness can be interpreted as an increase in the signal-to-noise ratio of the objects buried in the noise. Therefore, to obtain an even better measure of the presence of signal in a tile, the image can first be convolved with a given kernel.
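The effect of convolution on range and skewness can be checked with a small sketch. It assumes a hypothetical faint, extended object (peak equal to the noise standard deviation) buried in unit Gaussian noise, and a Gaussian kernel with a standard deviation of 2 pixels, applied separably with NumPy and zero-padded edges. As a simple skewness measure it uses (mean − median)/std, which is not the mode-based measure described above.

```python
import numpy as np

def gaussian_kernel_1d(sigma, truncate=4.0):
    """Normalized 1D Gaussian kernel sampled out to `truncate` standard deviations."""
    half = int(truncate * sigma + 0.5)
    x = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def convolve_separable(image, kernel1d):
    """2D convolution with a separable kernel: along rows, then along columns."""
    rows = np.apply_along_axis(np.convolve, 1, image, kernel1d, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel1d, mode="same")

def skew_proxy(a):
    """Simple skewness measure: (mean - median) / standard deviation."""
    return (a.mean() - np.median(a)) / a.std()

rng = np.random.default_rng(seed=3)
y, x = np.mgrid[0:200, 0:200]

# Hypothetical faint extended object buried in unit-variance noise.
obj = np.exp(-((x - 100) ** 2 + (y - 100) ** 2) / (2 * 20.0 ** 2))
data = obj + rng.normal(0.0, 1.0, size=obj.shape)

conv = convolve_separable(data, gaussian_kernel_1d(sigma=2.0))

print("range:         ", np.ptp(data), "->", np.ptp(conv))
print("skewness proxy:", skew_proxy(data), "->", skew_proxy(conv))
```

Convolution averages down the pixel-to-pixel noise far more than it alters the smooth object, so the range shrinks while the positive skewness caused by the object becomes much more prominent.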
Note that through the difference of the mode and median we have actually ‘detected’ data in the distribution. However, this “detection” was only based on the total distribution of the data in each tile (a much lower resolution than individual pixels). This is the main limitation of this technique. The best approach is thus to first run detection over the dataset, mask all the detected pixels, and use the undetected regions to estimate the sky and its standard deviation.
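The masking step can be sketched as below. The detector here is a hypothetical stand-in (a fixed threshold), used only to produce a mask; a real detection algorithm is far more elaborate. The point is only that the sky statistics are computed from the pixels the mask leaves out.

```python
import numpy as np

def detect_above(image, threshold):
    """Hypothetical stand-in detector: flag every pixel above a fixed threshold."""
    return image > threshold

def sky_from_undetected(image, detected):
    """Mean and standard deviation of the pixels NOT flagged as detected."""
    undetected = image[~detected]
    return undetected.mean(), undetected.std()

# Usage: a flat, zero-sky image with one bright detected pixel.
image = np.zeros((4, 4))
image[1, 1] = 50.0
detected = detect_above(image, threshold=10.0)
sky, sky_std = sky_from_undetected(image, detected)   # the bright pixel is excluded
```

Without the mask, the single bright pixel would raise the mean of this image to 50/16 = 3.125.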
The mean value of the tiles that have an approximately equal mode and median will be the Sky value. However, there is one final hurdle: astronomical datasets are commonly plagued with cosmic rays. Images of cosmic rays aren’t smoothed by the atmosphere or telescope aperture, so they have sharp boundaries. Also, since they don’t occupy many pixels, they don’t affect the mode and median calculation. But their very high values can greatly bias the calculation of the mean (recall how the mean shifts the fastest in the presence of outliers), see Figure 15 in Akhlaghi and Ichikawa (2015) for one example.
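A short numerical illustration, assuming a hypothetical tile of 1000 sky-subtracted noise pixels in which five pixels are replaced by very bright, hypothetical cosmic-ray hits:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

tile = rng.normal(0.0, 1.0, size=1000)   # sky-subtracted noise in one tile
hit = tile.copy()
hit[:5] = 1000.0                          # five hypothetical cosmic-ray pixels

print("mean:  ", tile.mean(), "->", hit.mean())
print("median:", np.median(tile), "->", np.median(hit))
```

Five pixels out of a thousand raise the mean by roughly 5 (five times the noise standard deviation), while the median only moves by a few ranks in the sorted list, a change of order 0.01.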
The effect of outliers like cosmic rays on the mean and standard deviation can be removed through \(\sigma\)-clipping, see Sigma clipping for a complete explanation. Therefore, after confirming that the mode and median are approximately equal in a tile (see Tessellation), the final Sky value and its standard deviation are determined after \(\sigma\)-clipping with the --sigmaclip option.
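A minimal sketch of iterative \(\sigma\)-clipping on one tile, assuming the clipping is centered on the median and stops once the relative drop in the standard deviation falls below a tolerance. The parameter names `multiple` and `tolerance` are our own and are not how --sigmaclip is parsed; see Sigma clipping for the exact rules.

```python
import numpy as np

def sigma_clip(values, multiple=3.0, tolerance=0.2, max_iter=50):
    """Iteratively drop values outside median +/- multiple * std.

    Stops when nothing is clipped, or when the relative decrease of the
    standard deviation in one step is below `tolerance`. Returns the mean
    and standard deviation of the surviving values."""
    v = np.asarray(values, dtype=float).ravel()
    for _ in range(max_iter):
        med, std = np.median(v), v.std()
        kept = v[np.abs(v - med) <= multiple * std]
        if kept.size == v.size:          # nothing clipped: converged
            break
        new_std = kept.std()
        v = kept
        if new_std == 0 or (std - new_std) / new_std < tolerance:
            break
    return v.mean(), v.std()

# Usage: unit noise in one tile, plus 20 hypothetical cosmic-ray pixels.
rng = np.random.default_rng(seed=7)
tile = rng.normal(0.0, 1.0, size=10_000)
tile[:20] = 1.0e4
sky, sky_std = sigma_clip(tile, multiple=3.0, tolerance=0.2)
```

The first pass, with the inflated standard deviation, removes the cosmic rays; the second pass only trims a few noise pixels, and the small change in standard deviation ends the loop.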