## GNU Astronomy Utilities


#### 7.1.2 Sigma clipping

Let’s assume that you have pure noise (centered on zero) with a clear Gaussian distribution (see Photon counting noise). Now let’s assume you add very bright objects (signal) to the image which have a very sharp boundary. By a sharp boundary, we mean that there is a clear cutoff (from the noise) at the pixels where the objects end. In other words, at their boundaries, the objects do not fade away into the noise. In such a case, when you plot the histogram (see Histogram and Cumulative Frequency Plot) of the distribution, the pixels belonging to those objects will be clearly separate from the pixels in parts of the image that have no signal (only noise). In the cumulative frequency plot, after a steady rise (due to the noise), you would observe a long flat region where, for a certain range of data (horizontal axis), there is no increase in the index (vertical axis).

Outliers like the example above can significantly bias the measurement of noise statistics. $$\sigma$$-clipping is defined as a way to avoid the effect of such outliers. In astronomical applications, cosmic rays (when they hit the detector at a near-normal incidence angle) are a very good example of such outliers. The tracks they leave behind in the image are not affected by the blurring caused by the atmosphere and the aperture. They are also very energetic, so their borders are usually clearly separated from the surrounding noise. $$\sigma$$-clipping is therefore very useful in removing their effect on the data. See Figure 15 in Akhlaghi and Ichikawa, 2015.

$$\sigma$$-clipping is defined as the very simple iteration below. In each iteration, the range of input data might decrease, so when the outliers satisfy the conditions above, they will be removed through this iteration. The exit criteria are discussed below.

1. Calculate the standard deviation ($$\sigma$$) and median ($$m$$) of a distribution.
2. Remove all points that lie outside the range $$m\pm\alpha\sigma$$ (smaller than $$m-\alpha\sigma$$ or larger than $$m+\alpha\sigma$$).
3. Go back to step 1, unless the selected exit criterion is reached.
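The three steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not the implementation used by the Statistics program: the function names, the default values, and the synthetic data are assumptions made for this example, and the loop simply stops after a fixed number of iterations.

```python
import random
import statistics


def sigma_clip_once(values, alpha):
    """One pass: keep only points inside median +/- alpha * sigma."""
    m = statistics.median(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    low, high = m - alpha * sigma, m + alpha * sigma
    return [v for v in values if low <= v <= high]


def sigma_clip(values, alpha=3.0, num_iterations=5):
    """Repeat the clipping pass a fixed number of times (hypothetical helper)."""
    data = list(values)
    for _ in range(num_iterations):
        kept = sigma_clip_once(data, alpha)
        if not kept:  # guard: never continue with an empty data set
            break
        data = kept
    return data


# Synthetic data (assumed for illustration): Gaussian noise centered on
# zero plus a few very bright, sharply separated outliers.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
outliers = [100.0] * 20
clipped = sigma_clip(noise + outliers, alpha=3.0, num_iterations=5)

print("points before:", len(noise) + len(outliers))
print("points after: ", len(clipped))
print("sigma before: ", statistics.pstdev(noise + outliers))
print("sigma after:  ", statistics.pstdev(clipped))
```

Because the outliers are far from the noise, the first pass already removes them; later passes only trim the extreme tails of the noise.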

The median is used as the reference instead of the mean because the mean is strongly affected by the presence of outliers, while the median is much less affected (see Quantifying signal in a tile). As you can tell from this algorithm, besides the condition above (that the signal has clear, high signal-to-noise boundaries), $$\sigma$$-clipping is only useful when the signal does not cover more than half of the full data set. If it does, the median will lie over the outliers and $$\sigma$$-clipping might remove the pixels with no signal.
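A small numerical example may make both points concrete. The values below are invented purely for illustration: in the first case a few bright outliers shift the mean strongly but leave the median on the noise, and in the second case the outliers make up more than half of the data, so the median itself lands on them.

```python
import statistics

# Invented "noise" values, centered on zero.
noise = [-2, -1, -1, 0, 0, 0, 1, 1, 2]

# Case 1: outliers are a minority of the data set.
minority = noise + [1000, 1000]
mean_minority = statistics.mean(minority)
median_minority = statistics.median(minority)

# Case 2: outliers cover more than half of the data set.
majority = noise + [1000] * 10
median_majority = statistics.median(majority)

print("mean with a few outliers:    ", mean_minority)
print("median with a few outliers:  ", median_minority)
print("median when outliers dominate:", median_majority)
```

In the second case, the clipping range $$m\pm\alpha\sigma$$ is centered on the outliers, so it is the noise that risks being removed.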

There are commonly two exit criteria to stop the $$\sigma$$-clipping iteration:

• When a certain number of iterations has taken place (second value to the --sclipparams option is larger than 1).
• When the new measured standard deviation is within a certain tolerance level of the old one (second value to the --sclipparams option is less than 1). The tolerance level is defined by:

$$\sigma_{old}-\sigma_{new} \over \sigma_{new}$$

The standard deviation is used because it is heavily influenced by the presence of outliers. Therefore, the fact that it stops changing between two iterations is a sign that the outliers have been successfully removed. Note that in each clipping, the dispersion in the distribution either decreases or stays the same, so $$\sigma_{old}\geq\sigma_{new}$$.
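The tolerance-based exit criterion can be sketched as follows. This is an illustrative Python example under stated assumptions, not the code used by the Statistics program: the function name, the `max_iterations` safety limit, and the synthetic data are all hypothetical. The loop stops once $$(\sigma_{old}-\sigma_{new})/\sigma_{new}$$ drops below the requested tolerance.

```python
import random
import statistics


def sigma_clip_tolerance(values, alpha=3.0, tolerance=0.1, max_iterations=100):
    """Clip until the relative change in sigma falls below `tolerance`.

    Returns the clipped data and the number of clipping passes done.
    `max_iterations` is only a safety limit assumed for this sketch.
    """
    data = list(values)
    sigma_old = statistics.pstdev(data)
    iterations = 0
    while iterations < max_iterations:
        m = statistics.median(data)
        kept = [v for v in data if abs(v - m) <= alpha * sigma_old]
        if not kept:  # guard: never continue with an empty data set
            break
        data = kept
        iterations += 1
        sigma_new = statistics.pstdev(data)
        if sigma_new == 0:  # all remaining points are identical
            break
        change = (sigma_old - sigma_new) / sigma_new
        sigma_old = sigma_new
        if change < tolerance:
            break
    return data, iterations


# Synthetic data (assumed for illustration): Gaussian noise plus
# bright, sharply separated outliers.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
data = noise + [100.0] * 20

clipped, iterations = sigma_clip_tolerance(data, alpha=3.0, tolerance=0.1)
print("clipping passes:", iterations)
print("final sigma:    ", statistics.pstdev(clipped))
```

In this sketch the first pass removes the bright outliers and changes $$\sigma$$ by a large factor, so the loop continues; once only noise is left, further passes barely change $$\sigma$$ and the iteration ends.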

When working on astronomical images, objects like galaxies and stars are blurred by the atmosphere and the telescope aperture, so their signal sinks into the noise very gradually. Galaxies in particular do not appear to have a clear, high signal-to-noise cutoff at all. Therefore $$\sigma$$-clipping will not be useful in removing their effect on the data. To gauge whether $$\sigma$$-clipping will be useful for your dataset, look at the histogram (see Histogram and Cumulative Frequency Plot). The ASCII histogram that is printed on the command-line with --asciihist is good enough in most cases.
