Detection options (GNU Astronomy Utilities)

7.2.2.2 Detection options

Detection is the process of separating the pixels in the image into two groups: 1) Signal, and 2) Noise. Through the parameters below, you can customize the detection process in NoiseChisel. Recall that you can always see the full list of NoiseChisel’s options with the --help option (see Getting help), or use --printparams (or -P) to see their values (see Operating mode options).

-Q FLT

--meanmedqdiff=FLT

The maximum acceptable distance between the quantiles of the mean and median in each tile, see Quantifying signal in a tile. The quantile threshold estimates are measured on tiles where the quantiles of their mean and median are less distant than the value given to this option. For example, --meanmedqdiff=0.01 means that only tiles where the mean’s quantile is between 0.49 and 0.51 (recall that the median’s quantile is 0.5) will be used.
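The tile test described above can be sketched in a few lines of Python. This is only an illustration of the idea, not NoiseChisel's implementation: the function names and the toy tiles are hypothetical, and the quantile of the mean is taken here simply as the fraction of pixels in the tile that are smaller than the mean.

```python
from bisect import bisect_left
from statistics import fmean


def mean_quantile(tile):
    """Quantile (between 0 and 1) of the tile's mean among its own pixels."""
    values = sorted(tile)
    # Fraction of pixels that are strictly smaller than the mean.
    return bisect_left(values, fmean(values)) / len(values)


def tile_passes(tile, meanmedqdiff):
    """True when the mean's quantile is within meanmedqdiff of 0.5 (the median)."""
    return abs(mean_quantile(tile) - 0.5) <= meanmedqdiff


# Hypothetical tiles: a symmetric one, and one with a single bright pixel.
flat_tile = [-3, -2, -1, 0, 1, 2, 3, 4]
skewed_tile = [-3, -2, -1, 0, 1, 2, 3, 60]

print(tile_passes(flat_tile, 0.01), tile_passes(skewed_tile, 0.01))
```

The single bright pixel in the second tile pulls the mean far above the median, so that tile is rejected, while the symmetric tile is kept.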

-a INT

--outliernumngb=INT

Number of neighboring tiles to use for outlier rejection (mostly the wings of bright stars or galaxies). For optimal detection of the wings of bright stars or galaxies, this is the most important option in NoiseChisel. This is because the extended wings of bright galaxies or stars (the PSF) can become flat over the tile. In this case, they will satisfy the --meanmedqdiff condition and pass that step. Therefore, to correctly identify such bad tiles, we need to look at the neighboring tiles. A tile that is on the wing of a bright galaxy/star will clearly be an outlier when compared with its neighbors. For more on the details of the outlier rejection algorithm, see the latter half of Quantifying signal in a tile. If this option is given a value of zero, no outlier rejection will take place.

--outliersclip=FLT,FLT

\(\sigma\)-clipping parameters for the outlier rejection of the quantile threshold. The format of the given values is similar to --sigmaclip below. In NoiseChisel, outlier rejection on tiles is used when identifying the quantile thresholds (--qthresh, --noerodequant, and --detgrowquant).

Outlier rejection is useful when the dataset contains a large and diffuse (almost flat within each tile) signal. The flatness of the profile will cause it to successfully pass the mean-median quantile difference test, so we will need to use the distribution of successful tiles for removing these false positives. For more, see the latter half of Quantifying signal in a tile.

--outliersigma=FLT

Multiple of sigma to define an outlier. If this option is given a value of zero, no outlier rejection will take place. For more see --outliersclip and the latter half of Quantifying signal in a tile.

-t FLT

--qthresh=FLT

The quantile threshold to apply to the convolved image. The detection process begins with applying a quantile threshold to each of the tiles in the small tessellation. The quantile is only calculated for tiles that do not have any significant signal within them, see Quantifying signal in a tile. Interpolation is then used to give a value to the unsuccessful tiles and it is finally smoothed.

The quantile value is a floating point value between 0 and 1. Assume that we have sorted the \(N\) data elements of a distribution (the pixels in each mesh on the convolved image). The quantile (\(q\)) of this distribution is the value of the element with an index of (the nearest integer to) \(q\times{N}\) in the sorted data set. After thresholding is complete, we will have a binary (two valued) image. The pixels above the threshold are known as foreground pixels (have a value of 1) while those which lie below the threshold are known as background (have a value of 0).
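As a minimal sketch of the definition above (not NoiseChisel's code), the quantile can be read from the sorted pixels and then used to build the binary image. The function names are hypothetical. Two details are assumptions of this sketch: the index is clamped to the last element, and a pixel exactly equal to the threshold is treated as background.

```python
def quantile_value(pixels, q):
    """Value of the element at index (nearest integer to) q*N in the sorted data."""
    values = sorted(pixels)
    # Python's round() sends exact halves to the nearest even integer.
    index = min(len(values) - 1, max(0, round(q * len(values))))
    return values[index]


def apply_threshold(pixels, threshold):
    """Binary image: 1 (foreground) above the threshold, 0 (background) otherwise."""
    return [1 if p > threshold else 0 for p in pixels]


# Hypothetical tile of ten pixels.
tile = [5, 1, 9, 3, 7, 2, 8, 4, 6, 0]
threshold = quantile_value(tile, 0.3)
binary = apply_threshold(tile, threshold)
print(threshold, binary)
```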

--smoothwidth=INT

Width of flat kernel used to smooth the interpolated quantile thresholds, see --qthresh for more.

--checkqthresh

Check the quantile threshold values on the mesh grid. A multi-extension FITS file, suffixed with _qthresh.fits, will be created showing each step of how the final quantile threshold is found. With this option, NoiseChisel will abort as soon as quantile estimation has been completed, allowing you to inspect the steps leading to the final quantile threshold. This behavior can be disabled with --continueaftercheck. By default the output will have the same pixel size as the input, but with the --oneelempertile option, only one pixel will be used for each tile (see Processing options).

The key things to remember are:

The measurements to find the thresholds are done on tiles that cover the whole image in a tessellation. Recall that you can set the size of tiles with --tilesize and check them with --checktiles. Therefore except for the first and last extensions, the rest only show tiles.
NoiseChisel ultimately has three thresholds: the quantile threshold (that you set with --qthresh), the no-erode quantile (set with --noerodequant) and the growth quantile (set with --detgrowquant). Therefore for each step, we have three extensions.

The output file will have the following extensions. Below, the extensions are put in the same order as you see in the file, with their name.

CONVOLVED

This is the input image after convolution with the kernel (which is a FWHM=2 Gaussian by default, but you can change with --kernel). Recall that the thresholds are defined on the convolved image.

QTHRESH_ERODE

QTHRESH_NOERODE

QTHRESH_EXPAND

In these three extensions, the tiles that have a quantile-of-mean more/less than 0.5 (quantile of median) \(\pm d\) are set to NaN (\(d\) is the value given to --meanmedqdiff, see Quantifying signal in a tile). Therefore the non-NaN tiles that you see here are the tiles where there is no significant skewness (changing signal) within that tile. The only differing thing between the three extensions is the values of the non-NaN tiles. These values will be used to construct the final threshold map over the whole image.

VALUE1_NO_OUTLIER

VALUE2_NO_OUTLIER

VALUE3_NO_OUTLIER

All outlier tiles have been masked. The reason for removing outliers is that the quantile-of-mean is only sensitive to signal that varies on a scale that is smaller than the tile size. Therefore the extended wings of large galaxies or bright stars (which vary on scales much larger than the tile size) will pass that test. As described in Quantifying signal in a tile outlier rejection is customized through --outliernumngb, --outliersclip and --outliersigma.

THRESH1_INTERP

THRESH2_INTERP

THRESH3_INTERP

Using the successful values that remain after the previous step, give values to all (interpolate) the tiles in the image. The interpolation is done using the nearest-neighbor method: for each tile, the N nearest neighbors are found and the median of their values is used to fill it. You can set the value of N through the --interpnumngb option.
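The nearest-neighbor interpolation can be illustrated with the following sketch. It is a simplified stand-in for what NoiseChisel does, written under several assumptions: the function name is hypothetical, unsuccessful tiles are marked with None, distances are Euclidean on the tile grid, and only the unsuccessful tiles are filled.

```python
from statistics import median


def interpolate_tiles(grid, numngb):
    """Fill None tiles with the median of the numngb nearest successful tiles."""
    valid = [(r, c, v) for r, row in enumerate(grid)
             for c, v in enumerate(row) if v is not None]
    filled = []
    for r, row in enumerate(grid):
        new_row = []
        for c, v in enumerate(row):
            if v is None:
                # Sort the successful tiles by squared distance to this tile.
                nearest = sorted(
                    valid, key=lambda t: (t[0] - r) ** 2 + (t[1] - c) ** 2
                )[:numngb]
                v = median(t[2] for t in nearest)
            new_row.append(v)
        filled.append(new_row)
    return filled


# Hypothetical 3x3 tile grid where only the corner tiles were successful.
grid = [[1.0, None, 3.0],
        [None, None, None],
        [7.0, None, 9.0]]
filled = interpolate_tiles(grid, numngb=2)
print(filled)
```

Because each filled tile takes the median of its own set of neighbors, adjacent tiles can end up with noticeably different values, which is why a smoothing step follows.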

THRESH1_SMOOTH

THRESH2_SMOOTH

THRESH3_SMOOTH

Smooth the interpolated image to remove the strong differences between touching tiles. Because we used the median value of the N nearest neighbors in the previous step, there can be strong discontinuities on the edges of tiles (which can directly show in the image after applying the threshold). The scale of the smoothing (number of nearby tiles to smooth with) is set with the --smoothwidth option.

QTHRESH-APPLIED

The pixels in this image can only have three values:

0: These pixels had a value below the quantile threshold.
1: These pixels had a value above the quantile threshold, but below the threshold for no erosion. Therefore in the next step, NoiseChisel will erode (set them to 0) these pixels if they are touching a 0-valued pixel.
2: These pixels had a value above the no-erosion threshold. So NoiseChisel will not erode these pixels; it will only apply opening to them afterwards. Recall that this was done to avoid losing sharp point-sources (like stars in space-based imaging).
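The three pixel values can be reproduced with a small sketch. The function name and the numbers are hypothetical, and a single pair of thresholds is used for simplicity, whereas in NoiseChisel the thresholds vary from tile to tile.

```python
def classify_pixel(value, erode_threshold, noerode_threshold):
    """Return 0, 1 or 2 following the three cases listed above."""
    if value > noerode_threshold:
        return 2  # above the no-erosion threshold: will not be eroded
    if value > erode_threshold:
        return 1  # above the quantile threshold: may be eroded
    return 0      # below the quantile threshold


# Hypothetical row of convolved pixel values.
row = [0.1, 0.6, 2.5, 0.4, 1.1]
labels = [classify_pixel(v, erode_threshold=0.5, noerode_threshold=2.0) for v in row]
print(labels)
```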

--blankasforeground

In the erosion and opening steps below, treat blank elements as foreground (regions above the threshold). By default, blank elements in the dataset are considered to be background, so if a foreground pixel is touching a blank element, it will be eroded. This option is irrelevant if the dataset contains no blank elements.

When there are many blank elements in the dataset, treating them as foreground will systematically erode their regions less, therefore systematically creating more false positives. So use this option (when blank values are present) with care.

-e INT

--erode=INT

The number of erosions to apply to the binary thresholded image. Erosion is simply the process of flipping (from 1 to 0) any of the foreground pixels that neighbor a background pixel. In a 2D image, there are two kinds of neighbors, 4-connected and 8-connected neighbors. In a 3D dataset, there are three: 6-connected, 18-connected, and 26-connected. You can specify which class of neighbors should be used for erosion with the --erodengb option, see below.

Erosion has the effect of shrinking the foreground pixels. To put it another way, it expands the holes. This is a founding principle in NoiseChisel: it exploits the fact that with very low thresholds, the holes in the very low surface brightness regions of an image will be smaller than regions that have no signal. Therefore by expanding those holes, we are able to separate the regions harboring signal.

--erodengb=INT

The type of neighborhood (structuring element) used in erosion, see --erode for an explanation on erosion. If the input is a 2D image, only two integer values are acceptable: 4 or 8. For a 3D input data cube, the acceptable values are: 6, 18 and 26.

In 2D 4-connectivity, the neighbors of a pixel are defined as the four pixels on the top, bottom, right and left of a pixel that share an edge with it. The 8-connected neighbors on the other hand include the 4-connected neighbors along with the other 4 pixels that share a corner with this pixel. See Figure 6 (a) and (b) in Akhlaghi and Ichikawa (2015) for a demonstration. A similar argument applies to 3D data cubes.
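To make the difference between the two 2D neighborhoods concrete, here is a minimal erosion sketch. It is illustrative only: the names are hypothetical, and neighbors that fall outside the image are simply ignored, which is an assumption of this sketch rather than a statement about NoiseChisel.

```python
# Row/column offsets of the neighbors for each 2D connectivity.
OFFSETS = {
    4: [(-1, 0), (1, 0), (0, -1), (0, 1)],
    8: [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)],
}


def erode(image, ngb=4):
    """Flip (from 1 to 0) every foreground pixel that neighbors a background pixel."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            for dr, dc in OFFSETS[ngb]:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and image[rr][cc] == 0:
                    out[r][c] = 0
                    break
    return out


# Hypothetical binary image with one background pixel in the corner.
image = [[1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 0]]
eroded4 = erode(image, ngb=4)
eroded8 = erode(image, ngb=8)
print(eroded4)
print(eroded8)
```

With 4-connectivity only the two pixels sharing an edge with the background pixel are eroded. With 8-connectivity the pixel that shares only a corner with it is eroded as well.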

--noerodequant

Pure erosion is going to carve off sharp and small objects completely out of the detected regions. This option can be used to avoid missing such sharp and small objects (which have significant pixels, but not over a large area). All pixels with a value larger than the significance level specified by this option will not be eroded during the erosion step above. However, they will undergo the erosion and dilation of the opening step below.

Like the --qthresh option, the significance level is determined using the quantile (a value between 0 and 1). Just as a reminder, in the normal distribution, \(1\sigma\), \(1.5\sigma\), and \(2\sigma\) are approximately on the 0.84, 0.93, and 0.98 quantiles.
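The correspondence between multiples of \(\sigma\) and quantiles quoted above can be checked with the cumulative distribution function of the standard normal distribution. The sketch below uses Python's standard library; the helper names are hypothetical.

```python
from statistics import NormalDist


def sigma_to_quantile(multiple):
    """Quantile of the standard normal distribution at the given multiple of sigma."""
    return NormalDist().cdf(multiple)


def quantile_to_sigma(quantile):
    """Inverse operation: the multiple of sigma that sits at the given quantile."""
    return NormalDist().inv_cdf(quantile)


for multiple in (1.0, 1.5, 2.0):
    print(multiple, round(sigma_to_quantile(multiple), 2))
```

The inverse function can be useful when you have a significance level in mind and need the quantile to give to the option.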

-p INT

--opening=INT

Depth of opening to be applied to the eroded binary image. Opening is a composite operation. When opening a binary image with a depth of \(n\), \(n\) erosions (explained in --erode) are followed by \(n\) dilations. Simply put, dilation is the inverse of erosion. When dilating an image, any background pixel that neighbors a foreground pixel is flipped (from 0 to 1) to become a foreground pixel. Dilation has the effect of fattening the foreground. Note that in NoiseChisel, the erosion which is part of opening is independent of the initial erosion that is done on the thresholded image (explained in --erode). The structuring element for the opening can be specified with the --openingngb option. Opening has the effect of removing the thin foreground connections (mostly noise) between separate foreground ‘islands’ (detections), thereby completely isolating them. Once opening is complete, we have initial detections.
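The effect of opening on a thin connection between two islands can be seen in the sketch below. It is a toy illustration with hypothetical names, and it ignores neighbors that fall outside the image (an assumption of this sketch).

```python
OFFSETS = {
    4: [(-1, 0), (1, 0), (0, -1), (0, 1)],
    8: [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)],
}


def _flip_touching(image, ngb, target):
    """Flip every pixel equal to target that touches a pixel of the other value."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != target:
                continue
            for dr, dc in OFFSETS[ngb]:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and image[rr][cc] != target:
                    out[r][c] = 1 - target
                    break
    return out


def erode(image, ngb=4):
    return _flip_touching(image, ngb, target=1)


def dilate(image, ngb=4):
    return _flip_touching(image, ngb, target=0)


def opening(image, depth, ngb=4):
    """depth erosions followed by depth dilations."""
    for _ in range(depth):
        image = erode(image, ngb)
    for _ in range(depth):
        image = dilate(image, ngb)
    return image


# Two hypothetical islands joined by a bridge that is one pixel thick.
image = [[1, 1, 1, 0, 0, 0, 1, 1, 1],
         [1, 1, 1, 1, 1, 1, 1, 1, 1],
         [1, 1, 1, 0, 0, 0, 1, 1, 1]]
opened = opening(image, depth=1, ngb=4)
for row in opened:
    print(row)
```

The erosion removes the whole bridge, and the following dilation can only regrow the pixels next to the surviving islands, so the middle of the bridge stays empty and the two islands become separate.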

--openingngb=INT

The structuring element used for opening, see --erodengb for more information about a structuring element.

--skyfracnoblank

Ignore blank pixels when estimating the fraction of undetected pixels for Sky estimation. NoiseChisel only measures the Sky over the tiles that have a sufficiently large fraction of undetected pixels (value given to --minskyfrac). By default this fraction is found by dividing the number of undetected pixels in a tile by the tile’s area. But this default behavior ignores the possibility of blank pixels. In situations where blank/masked pixels are scattered across the image and cover a large enough area, all the tiles can fail the --minskyfrac test, thus not allowing NoiseChisel to proceed. With this option, such scenarios can be fixed: the denominator of the fraction will be the number of non-blank elements in the tile, not the total tile area.
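The change in the denominator can be shown with a short sketch. The function name and the toy tile are hypothetical, blank pixels are represented with NaN, and the value of 0.7 is only an example threshold for the comparison.

```python
import math


def undetected_fraction(tile_values, tile_detected, ignore_blank=False):
    """Fraction of undetected pixels in a tile.

    tile_values: pixel values, with NaN marking blank pixels.
    tile_detected: 1 for detected pixels and 0 for undetected ones.
    """
    blank = [math.isnan(v) for v in tile_values]
    undetected = sum(1 for b, d in zip(blank, tile_detected) if not b and d == 0)
    if ignore_blank:
        denominator = len(tile_values) - sum(blank)  # non-blank elements only
    else:
        denominator = len(tile_values)               # total tile area
    return undetected / denominator if denominator else 0.0


# Hypothetical tile: 4 blank pixels, 5 undetected pixels, 1 detected pixel.
nan = float("nan")
values = [nan, nan, nan, nan, 0.1, 0.2, 0.0, -0.1, 0.3, 5.0]
detected = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

default_fraction = undetected_fraction(values, detected)
noblank_fraction = undetected_fraction(values, detected, ignore_blank=True)
print(default_fraction > 0.7, noblank_fraction > 0.7)
```

With the default denominator this tile has a fraction of 0.5 and would fail a threshold of 0.7, while counting only the non-blank elements gives 5/6 and the tile passes.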

-B FLT

--minskyfrac=FLT

Minimum fraction (value between 0 and 1) of Sky (undetected) areas in a tile. Only tiles with a fraction of undetected pixels (Sky) larger than this value will be used to estimate the Sky value. NoiseChisel uses this option value twice to estimate the Sky value: after initial detections and in the end when false detections have been removed.

Because of the PSF and their intrinsic amorphous properties, astronomical objects (except cosmic rays) never have a clear cutoff and commonly sink into the noise very slowly, even below the very low thresholds used by NoiseChisel. So when a large fraction of the area of one mesh is covered by detections, it is very plausible that their faint wings are present in the undetected regions (hence causing a bias in any measurement). To get an accurate measurement of the above parameters over the tessellation, tiles that harbor too many detected regions should be excluded. The used tiles are visible in the respective --check option of the given step.

--checkdetsky

Check the initial approximation of the sky value and its standard deviation in a FITS file ending with _detsky.fits. With this option, NoiseChisel will abort as soon as the sky value used for defining pseudo-detections is complete. This allows you to inspect the steps leading to this initial Sky estimate. This behavior can be disabled with --continueaftercheck. By default the output will have the same pixel size as the input, but with the --oneelempertile option, only one pixel will be used for each tile (see Processing options).

-s FLT,FLT

--sigmaclip=FLT,FLT

The \(\sigma\)-clipping parameters for measuring the initial and final Sky values from the undetected pixels, see Sigma clipping.

This option takes two values which are separated by a comma (,). Each value can either be written as a single number or as a fraction of two numbers (for example, 3,1/10). The first value to this option is the multiple of \(\sigma\) that will be clipped (\(\alpha\) in that section). The second value is the exit criterion. If it is less than 1, then it is interpreted as a tolerance, and if it is larger than 1, it is taken to be the fixed number of iterations. Hence, in the latter case the value must be an integer.
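The two ways of interpreting the second value can be sketched as follows. This is not Gnuastro's implementation: the function names are hypothetical, and the details are assumptions of this sketch, namely that clipping is centered on the median, that the tolerance is compared with the relative change in the standard deviation between two iterations, and that a value of exactly 1 counts as one iteration. See Sigma clipping for the actual definition.

```python
from fractions import Fraction
from statistics import median, pstdev


def parse_sigmaclip(text):
    """Parse a value like '3,1/10' or '3,0.2' into (multiple, parameter)."""
    multiple, param = (float(Fraction(part)) for part in text.split(","))
    return multiple, param


def sigma_clip(values, multiple, param):
    """Return (median, standard deviation, number of elements) after clipping."""
    data = list(values)
    max_iterations = int(param) if param >= 1 else None
    iterations = 0
    while len(data) > 1:
        center, spread = median(data), pstdev(data)
        if max_iterations is not None and iterations >= max_iterations:
            break
        kept = [v for v in data if abs(v - center) <= multiple * spread]
        iterations += 1
        if not kept:
            break
        new_spread = pstdev(kept)
        unchanged = len(kept) == len(data)
        data = kept
        if unchanged or new_spread == 0:
            break
        if max_iterations is None and (spread - new_spread) / new_spread < param:
            break
    return median(data), pstdev(data), len(data)


# Hypothetical undetected pixels around 10, with one strong outlier.
pixels = [9.9, 10.0, 10.1] * 6 + [10.0, 100.0]
by_tolerance = sigma_clip(pixels, *parse_sigmaclip("3,1/10"))
by_iterations = sigma_clip(pixels, *parse_sigmaclip("3,3"))
print(by_tolerance, by_iterations)
```

In this example both exit criteria remove the single outlier and converge to the same result.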

-R FLT

--dthresh=FLT

The detection threshold: a multiple of the initial Sky standard deviation added to the initial Sky approximation (which you can inspect with --checkdetsky). This flux threshold is applied to the initially undetected regions on the unconvolved image. The background pixels that are completely engulfed in a 4-connected foreground region are converted to foreground (holes are filled) and one opening (depth of 1) is applied over both the initially detected and undetected regions. The signal-to-noise ratios of the resulting ‘pseudo-detections’ are used to identify true vs. false detections. See Section 3.1.5 and Figure 7 in Akhlaghi and Ichikawa (2015) for a very complete explanation.

--dopening=INT

The number of openings to do after applying --dthresh.

--dopeningngb=INT

The connectivity used in the opening of --dopening. In a 2D image this must be either 4 or 8. The stronger the connectivity, the more small regions will be discarded.

--holengb=INT

The connectivity (defined by the number of neighbors) to fill holes after applying --dthresh (above) to find pseudo-detections. For example, in a 2D image it must be 4 (the neighbors that are most strongly connected) or 8 (all neighbors). The stronger the connectivity, the more strongly the hole must be enclosed. So setting a value of 8 in a 2D image means that the walls of the hole are 4-connected. If standard (near Sky level) values are given to --dthresh, setting --holengb=4 might fill the complete dataset and thus not create enough pseudo-detections.
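The role of the connectivity in hole filling can be illustrated with a flood fill of the background that starts from the image border: any background pixel that the fill cannot reach is a hole. The sketch below is only an illustration with hypothetical names. It assumes that background pixels connected to the image border are never holes.

```python
from collections import deque

OFFSETS = {
    4: [(-1, 0), (1, 0), (0, -1), (0, 1)],
    8: [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)],
}


def fill_holes(image, ngb):
    """Set to 1 every background pixel that is not connected to the border."""
    rows, cols = len(image), len(image[0])
    reached = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the flood fill with the background pixels on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and image[r][c] == 0:
                reached[r][c] = True
                queue.append((r, c))
    # Spread through the background using the requested connectivity.
    while queue:
        r, c = queue.popleft()
        for dr, dc in OFFSETS[ngb]:
            rr, cc = r + dr, c + dc
            if (0 <= rr < rows and 0 <= cc < cols
                    and image[rr][cc] == 0 and not reached[rr][cc]):
                reached[rr][cc] = True
                queue.append((rr, cc))
    return [[1 if image[r][c] == 1 or not reached[r][c] else 0
             for c in range(cols)] for r in range(rows)]


# Hypothetical ring whose wall has a diagonal gap at its top-right corner.
ring = [[0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
print(fill_holes(ring, 4)[2][2], fill_holes(ring, 8)[2][2])
```

With a connectivity of 4 the background cannot pass through the diagonal gap, so the central pixel is filled. With a connectivity of 8 the gap connects the central pixel to the outside, so it is not treated as a hole.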

--pseudoconcomp=INT

The connectivity (defined by the number of neighbors) to find individual pseudo-detections. If it is a weaker connectivity (4 in a 2D image), then pseudo-detections that are connected on the corners will be treated as separate.

-m INT

--snminarea=INT

The minimum area to calculate the Signal to noise ratio on the pseudo-detections of both the initially detected and undetected regions. When the area in a pseudo-detection is too small, the Signal to noise ratio measurements will not be accurate and their distribution will be heavily skewed to the positive. So it is best to ignore any pseudo-detection that is smaller than this area. Use --detsnhistnbins to check if this value is reasonable or not.

--checksn

Save the S/N values of the pseudo-detections (and possibly grown detections if --cleangrowndet is called) into separate tables. If --tableformat is a FITS table, each table will be written into a separate extension of one file suffixed with _detsn.fits. If it is plain text, a separate file will be made for each table (ending in _detsn_sky.txt, _detsn_det.txt and _detsn_grown.txt). For more on --tableformat see Input/Output options.

You can use these to inspect the S/N values and their distribution (in combination with the --checkdetection option to see where the pseudo-detections are). You can use Gnuastro’s Statistics to make a histogram of the distribution, or do any other analysis that helps you better understand it.

--minnumfalse=INT

The minimum number of ‘pseudo-detections’ over the undetected regions to identify a Signal-to-Noise ratio threshold. The Signal to noise ratio (S/N) of false pseudo-detections in each tile is found using the quantile of the S/N distribution of the pseudo-detections over the undetected pixels in each mesh. If the number of S/N measurements is not large enough, the quantile will not be accurate (can have large scatter). For example, if you set --snquant=0.99 (or the top 1 percent), then it is best to have at least 100 S/N measurements.

-c FLT

--snquant=FLT

The quantile of the Signal to noise ratio distribution of the pseudo-detections in each mesh to use for filling the large mesh grid. Note that this is only calculated for the large mesh grids that satisfy the minimum fraction of undetected pixels (value of --minskyfrac) and minimum number of pseudo-detections (value of --minnumfalse).
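How the quantile and the minimum number of pseudo-detections work together can be sketched as follows. The function name and the S/N values are hypothetical. Returning None when there are too few measurements is a convention of this sketch, not NoiseChisel's behavior.

```python
def sn_threshold(false_sn, snquant, minnumfalse):
    """S/N at the requested quantile of the false pseudo-detections.

    Returns None when there are fewer than minnumfalse measurements,
    because the quantile would then have a large scatter.
    """
    if len(false_sn) < minnumfalse:
        return None
    values = sorted(false_sn)
    index = min(len(values) - 1, round(snquant * len(values)))
    return values[index]


# Hypothetical S/N measurements over the undetected regions.
many = [i / 10 for i in range(1, 201)]   # 200 values from 0.1 to 20.0
few = many[:50]

print(sn_threshold(many, snquant=0.99, minnumfalse=100))
print(sn_threshold(few, snquant=0.99, minnumfalse=100))
```

With only 50 measurements the top 1 percent would be set by less than a single value, which is why such a sample is rejected here.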

--snthresh=FLT

Manually set the signal-to-noise ratio of true pseudo-detections. With this option, NoiseChisel will not attempt to find pseudo-detections over the noisy regions of the dataset, but will directly go onto applying the manually input value.

This option is useful in crowded images where there is no blank sky to find the sky pseudo-detections. You can get this value on a similarly reduced dataset (from another region of the Sky with more undetected regions).

-d FLT

--detgrowquant=FLT

Quantile limit to “grow” the final detections. As discussed in the previous options, after applying the initial quantile threshold, layers of pixels are carved off the objects to identify true signal. With this step you can return those low surface brightness layers that were carved off back to the detections. To disable growth, set the value of this option to 1.

The process is as follows: after the true detections are found, all the non-detected pixels above this quantile will be put in a list and used to “grow” the true detections (seeds of the growth). Like all quantile thresholds, this threshold is defined and applied to the convolved dataset. Afterwards, the dataset is dilated once (with minimum connectivity) to connect very thin regions on the boundary: imagine building a dam at the point rivers spill into an open sea/ocean. Finally, all holes are filled. In the geography metaphor, holes can be seen as the closed (by the dams) rivers and lakes, so this process is like turning the water in all such rivers and lakes into soil. See --detgrowmaxholesize for configuring the hole filling.
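The growth step can be sketched as a flood fill that starts from the true detections and spreads only through pixels that are above the growth threshold. This is a simplified illustration with hypothetical names: it uses a single threshold value for the whole image, grows over all 8 neighbors in 2D, and leaves out the final dilation and hole filling.

```python
from collections import deque

# All 8 neighbors of a pixel in a 2D image.
ALL_NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                 (0, 1), (1, -1), (1, 0), (1, 1)]


def grow(detected, convolved, growth_threshold):
    """Grow the detections (seeds) into touching pixels above the threshold."""
    rows, cols = len(detected), len(detected[0])
    grown = [row[:] for row in detected]
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if detected[r][c] == 1)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ALL_NEIGHBORS:
            rr, cc = r + dr, c + dc
            if (0 <= rr < rows and 0 <= cc < cols and grown[rr][cc] == 0
                    and convolved[rr][cc] > growth_threshold):
                grown[rr][cc] = 1
                queue.append((rr, cc))
    return grown


# Hypothetical single-row image: one true detection with faint wings.
convolved = [[0.1, 0.6, 0.7, 0.9, 0.7, 0.2, 0.6]]
detected = [[0, 0, 0, 1, 0, 0, 0]]
grown = grow(detected, convolved, growth_threshold=0.5)
print(grown)
```

The last pixel is above the threshold but does not touch the grown region, so it is not added: only pixels connected to a true detection are returned to it.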

Note that since the growth occurs on all neighbors of a data element, the quantile for 3D detection must be much larger than that of 2D detection. Recall that in 2D each element has 8 neighbors while in 3D there are 26 neighbors.

--detgrowmaxholesize=INT

The maximum hole size to fill during the final expansion of the true detections as described in --detgrowquant. This is necessary when the input contains many smaller objects and can be used to avoid marking blank sky regions as detections.

For example, multiple galaxies can be positioned such that they surround an empty region of sky. If all the holes are filled, the Sky region in between them will be taken as a detection which is not desired. To avoid such cases, the integer given to this option must be smaller than the hole between such objects. However, we should caution that unless the “hole” is very large, the combined faint wings of the galaxies might actually be present in between them, so be very careful in not filling such holes.

On the other hand, if you have a very large (and extended) galaxy, the diffuse wings of the galaxy may create very large holes over the detections. In such cases, a large enough value to this option will cause all such holes to be detected as part of the large galaxy and thus help in detecting it to extremely low surface brightness limits. Therefore, especially when large and extended objects are present in the image, it is recommended to give this option (very) large values. For one real-world example, see Detecting large extended targets.

--cleangrowndet

After dilation, if the signal-to-noise ratio of a detection is less than the derived pseudo-detection S/N limit, that detection will be discarded. In ideal/clean noise, a true detection’s S/N should be larger than its constituent pseudo-detections because its area is larger and it also covers more signal. However, on false detections (especially at lower --snquant values), the increase in size can cause a decrease in S/N below that threshold.

This will improve purity and not change completeness (a true detection will not be discarded), because a true detection has flux in its vicinity and dilation will catch more of that flux and increase the S/N. So on a true detection, the final S/N cannot be less than that of its pseudo-detections.

However, in many real images bad processing creates artifacts that cannot be accurately removed by the Sky subtraction. In such cases, this option will decrease the completeness (will artificially discard true detections). So this feature is not enabled by default and should be explicitly called when you know the noise is clean.

--checkdetection

Every step of the detection process will be added as an extension to a file with the suffix _det.fits. Going through each would just be a repeat of the explanations above and also of those in Akhlaghi and Ichikawa (2015). The extension label should be sufficient to recognize which step you are observing. Viewing all the steps can be the best guide in choosing the best set of parameters. With this option, NoiseChisel will abort as soon as a snapshot of all the detection process is saved. This behavior can be disabled with --continueaftercheck.

--checksky

Check the derivation of the final sky and its standard deviation values on the mesh grid. With this option, NoiseChisel will abort as soon as the sky value is estimated over the image (on each tile). This behavior can be disabled with --continueaftercheck. By default the output will have the same pixel size as the input, but with the --oneelempertile option, only one pixel will be used for each tile (see Processing options).