GNU Astronomy Utilities



7.3 Segment

Once signal is separated from noise (for example, with NoiseChisel), you have a binary dataset: each pixel is either signal (1) or noise (0). Signal (for example, every galaxy in your image) has been “detected”, but all detections have a label of 1. Therefore while we know which pixels contain signal, we still cannot find out how many galaxies they contain or which detected pixels correspond to which galaxy. At the lowest (most generic) level, detection is a kind of segmentation (segmenting the whole dataset into signal and noise, see NoiseChisel). Here, we will define segmentation only on signal: to separate sub-structure within the detections.

If the targets are clearly separated (their detected regions do not touch), a simple connected components (203) algorithm (very basic segmentation) is enough: it gives a separate label to each group of touching/connected pixels. This is such a basic and simple form of segmentation that Gnuastro’s Arithmetic program has an operator for it: see connected-components in Arithmetic operators. Assuming the binary dataset is called binary.fits, you can use it with a command like this:

$ astarithmetic binary.fits 2 connected-components

You can even do a very basic detection (a threshold, say at value 100) and segmentation in Arithmetic with a single command like below:

$ astarithmetic in.fits 100 gt 2 connected-components
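
To make the idea behind the command above more concrete, here is a minimal pure-Python sketch of thresholding followed by connected-component labeling. It is only an illustration, not Gnuastro's implementation: the function name label_components and the toy image are hypothetical. As we understand it, the 2 given before connected-components is the connectivity; on a 2D image this sketch assumes that 2 means pixels sharing an edge or a corner are neighbors (8-connected), while 1 means only shared edges count (4-connected).

```python
from collections import deque


def label_components(image, threshold, connectivity=2):
    """Label connected groups of pixels that are above `threshold`.

    Hypothetical helper for illustration only (not Gnuastro code).
    connectivity=1: neighbors share an edge (4-connected).
    connectivity=2: neighbors share an edge or a corner (8-connected).
    """
    rows, cols = len(image), len(image[0])
    if connectivity == 1:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]

    labels = [[0] * cols for _ in range(rows)]   # 0 means "not signal"
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            # Skip pixels below the threshold or already labeled.
            if image[r][c] <= threshold or labels[r][c]:
                continue
            # Start a new region and flood-fill it breadth-first.
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in offsets:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and image[nr][nc] > threshold
                            and labels[nr][nc] == 0):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels, next_label


# Toy image: two bright pixels touching only by a corner (top-left),
# and two bright pixels touching by an edge (right column).
image = [
    [150,  20,  10,  10],
    [ 30, 180,  10, 200],
    [ 10,  10,  10, 210],
]
labels8, n8 = label_components(image, 100, connectivity=2)
labels4, n4 = label_components(image, 100, connectivity=1)
print(n8, n4)   # prints: 2 3
```

With 8-connectivity the two corner-touching pixels are one region, so two regions are found; with 4-connectivity they are separate, giving three.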

However, in most astronomical situations our targets are not nicely separated, nor do they have a sharp boundary/edge (for a threshold to suffice): they touch (for example, merging galaxies), or are simply along the same line of sight (which is much more common). This causes their images to overlap.

In particular, when you do your detection with NoiseChisel, you will detect signal to very low surface brightness limits: deep into the faint wings of galaxies or bright stars (which can extend very far and irregularly from their center). Therefore, it often happens that several galaxies are detected as one large detection. Since they are touching, a simple connected components algorithm will not suffice. It is therefore necessary to do a more sophisticated segmentation and break up the detected pixels (even those that are touching) into multiple target objects as accurately as possible.

Segment will use a detection map and its corresponding dataset to find sub-structure over the detected areas and use them for its segmentation. Until Gnuastro version 0.6 (released in 2018), Segment was part of NoiseChisel. Therefore, similar to NoiseChisel, the best place to start reading about Segment and understanding what it does (with many illustrative figures) is Section 3.2 of Akhlaghi and Ichikawa 2015, and continue with Akhlaghi 2019.

In summary, Segment first finds true clumps over the detections. Clumps are associated with local maxima/minima (204) and extend over the neighboring pixels until they reach a local minimum/maximum (river/watershed). By default, Segment will use the distribution of clump signal-to-noise ratios over the undetected regions as reference to find “true” clumps over the detections. Using the undetected regions can be disabled by directly giving a signal-to-noise ratio to --clumpsnthresh.
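
The way clumps extend from local maxima until they meet at rivers can be sketched with a simplified watershed-style labeling in pure Python. This is a toy illustration under stated assumptions, not Segment's actual implementation: the names find_clumps, RIVER and UNSET are hypothetical, ties between equal pixel values are broken arbitrarily by scan order, and no signal-to-noise test is applied to decide which clumps are “true”.

```python
RIVER = 0    # pixel where two or more clumps meet
UNSET = -1   # pixel not visited yet, or outside the detection


def find_clumps(image, detected):
    """Label clumps around local maxima of `image` over detected pixels.

    Hypothetical sketch: pixels are visited from brightest to faintest.
    A pixel with no labeled neighbor starts a new clump, a pixel touching
    exactly one clump joins it, and a pixel touching several clumps
    (or only an existing river) becomes a river pixel.
    """
    rows, cols = len(image), len(image[0])
    labels = [[UNSET] * cols for _ in range(rows)]
    order = sorted(
        ((r, c) for r in range(rows) for c in range(cols) if detected[r][c]),
        key=lambda rc: -image[rc[0]][rc[1]],
    )
    next_label = 0
    for r, c in order:
        touching = set()      # clump labels among the 8 neighbors
        near_river = False
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if labels[nr][nc] > 0:
                    touching.add(labels[nr][nc])
                elif labels[nr][nc] == RIVER:
                    near_river = True
        if len(touching) == 1:
            labels[r][c] = touching.pop()     # grow the single clump
        elif len(touching) > 1 or near_river:
            labels[r][c] = RIVER              # border between clumps
        else:
            next_label += 1                   # a new local maximum
            labels[r][c] = next_label
    return labels, next_label


# A one-row "profile" with two peaks (values 5 and 6) and a dip between.
profile = [[1, 5, 2, 1, 4, 6, 3]]
detected = [[True] * 7]
clumps, num_clumps = find_clumps(profile, detected)
print(clumps, num_clumps)   # prints: [[2, 2, 2, 0, 1, 1, 1]] 2
```

The brighter peak gets label 1, the fainter peak label 2, and the dip between them, which touches both clumps, is marked as a river.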

The true clumps are then grown to a certain threshold over the detections. Based on the strength of the connections (rivers/watersheds) between the grown clumps, they are considered parts of one object or as separate objects. See Section 3.2 of Akhlaghi and Ichikawa 2015 for more. Segment’s main outputs are thus two labeled datasets: 1) clumps, and 2) objects. See Segment output for more.
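
The step from grown clumps to objects can be pictured as merging clumps whose connection is strong enough. The sketch below is a hedged illustration, not Segment's implementation: it assumes that a single "strength" number has already been measured for every pair of touching clumps, and the function name clumps_to_objects, the min_strength parameter and the example values are all hypothetical. How Segment actually measures the connections is described in the papers cited above.

```python
def clumps_to_objects(num_clumps, connections, min_strength):
    """Group clumps into objects with a union-find structure.

    Hypothetical sketch. `connections` maps a pair of touching clump
    labels (a, b) to the strength of the river between them. Pairs with
    a strength of at least `min_strength` are merged into one object.
    Returns a dict mapping each clump label to its object label.
    """
    parent = list(range(num_clumps + 1))   # index 0 is unused

    def find(x):
        # Follow parents up to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), strength in connections.items():
        if strength >= min_strength:
            parent[find(a)] = find(b)

    # Give consecutive object labels (from 1) in order of clump label.
    object_of_root = {}
    objects = {}
    for clump in range(1, num_clumps + 1):
        root = find(clump)
        if root not in object_of_root:
            object_of_root[root] = len(object_of_root) + 1
        objects[clump] = object_of_root[root]
    return objects


# Four clumps in a chain; only the middle connection is weak.
connections = {(1, 2): 4.2, (2, 3): 0.8, (3, 4): 3.1}
objects = clumps_to_objects(4, connections, min_strength=1.0)
print(objects)   # prints: {1: 1, 2: 1, 3: 2, 4: 2}
```

In this toy case clumps 1 and 2 form one object and clumps 3 and 4 another, because the connection between clumps 2 and 3 is below the chosen limit. This mirrors the two labeled outputs: every pixel keeps its clump label, and also gets the label of the object that its clump belongs to.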

To start learning about Segment, especially in relation to detection (NoiseChisel) and measurement (MakeCatalog), the recommended references are Akhlaghi and Ichikawa 2015, Akhlaghi 2016 and Akhlaghi 2019. If you have used Segment within your research, please run it with --cite to list the papers you should cite and how to acknowledge its funding sources.

Those papers cannot be updated any more but the software will evolve. For example, Segment became a separate program (from NoiseChisel) in 2018 (after those papers were published). Therefore this book is the definitive reference. Finally, in Invoking Segment, we will discuss Segment’s inputs, outputs and configuration options.


Footnotes

(203)

https://en.wikipedia.org/wiki/Connected-component_labeling

(204)

By default, the maximum is used as the first clump pixel. To define clumps based on local minima, use the --minima option.