GNU Astronomy Utilities


Next: , Previous: , Up: Gnuastro library   [Contents][Index]


10.3.19 Binary datasets (binary.h)

Binary datasets only have two (usable) values: 0 (also known as background) or 1 (also known as foreground). They are created after some binary classification is applied to the dataset. The most common is thresholding: for example in an image, pixels with a value above the threshold are given a value of 1 and those with a value less than the threshold are assigned a value of 0.

Since there is only two values, in the processing of binary images, you are usually concerned with the positioning of an element and its vicinity (neighbors). When a dataset has more than one dimension, multiple classes of immediate neighbors (that are touching the element) can be defined for each data-element. To separate these different classes of immediate neighbors, we define connectivity.

The classification is done by the distance from element center to the neighbor’s center. The nearest immediate neighbors have a connectivity of 1, the second nearest class of neighbors have a connectivity of 2 and so on. In total, the largest possible connectivity for data with ndim dimensions is ndim. For example in a 2D dataset, 4-connected neighbors (that share an edge and have a distance of 1 pixel) have a connectivity of 1. The other 4 neighbors that only share a vertice (with a distance of \(\sqrt{2}\) pixels) have a connectivity of 2. Conventionally, the class of connectivity-2 neighbors also includes the connectivity 1 neighbors, so for example we call them 8-connected neighbors in 2D datasets.

Ideally, one bit is sufficient for each element of a binary dataset. However, CPUs are not designed to work on individual bits, the smallest unit of memory addresses is a byte (containing 8 bits on modern CPUs). Therefore, in Gnuastro, the type used for binary dataset is uint8_t (see Numeric data types). Although it does take 8-times more memory, this choice offers much better performance and the some extra (useful) features.

The advantage of using a full byte for each element of a binary dataset is that you can also have other values (that will be ignored in the processing). One such common “other” value in real datasets is a blank value (to mark regions that should not be processed because there is no data). The constant GAL_BLANK_UINT8 value must be used in these cases (see Library blank values (blank.h)). Another is some temporary value(s) that can be given to a processed pixel to avoid having another copy of the dataset as in GAL_BINARY_TMP_VALUE that is described below.

Macro: GAL_BINARY_TMP_VALUE

The functions described below work on a uint8_t type dataset with values of 1 or 0 (no other pixel will be touched). However, in some cases, it is necessary to put temporary values in each element during the processing of the functions. This temporary value has a special meaning for the operation and will be operated on. So if your input datasets have values other than 0 and 1 that you don’t want these functions to work on, be sure they are not equal to this macro’s value. Note that this value is also different from GAL_BLANK_UINT8, so your input datasets may also contain blank elements.

Function:
gal_data_t *
gal_binary_erode (gal_data_t *input, size_t num, int connectivity, int inplace)

Do num erosions on the connectivity-connected neighbors of input (see above for the definition of connectivity).

If inplace is non-zero and the input’s type is GAL_TYPE_UINT8, then the erosion will be done within the input dataset and the returned pointer will be input. Otherwise, input is copied (and converted if necessary) to GAL_TYPE_UINT8 and erosion will be done on this new dataset which will also be returned. This function will only work on the elements with a value of 1 or 0. It will leave all the rest unchanged.

Erosion (inverse of dilation) is an operation in mathematical morphology where each foreground pixel that is touching a background pixel is flipped (changed to background). The connectivity value determines the definition of “touching”. Erosion will thus decrease the area of the foreground regions by one layer of pixels.

Function:
gal_data_t *
gal_binary_dilate (gal_data_t *input, size_t num, int connectivity, int inplace)

Do num dilations on the connectivity-connected neighbors of input (see above for the definition of connectivity). For more on inplace and the output, see gal_binary_erode.

Dilation (inverse of erosion) is an operation in mathematical morphology where each background pixel that is touching a foreground pixel is flipped (changed to foreground). The connectivity value determines the definition of “touching”. Dilation will thus increase the area of the foreground regions by one layer of pixels.

Function:
gal_data_t *
gal_binary_open (gal_data_t *input, size_t num, int connectivity, int inplace)

Do num openings on the connectivity-connected neighbors of input (see above for the definition of connectivity). For more on inplace and the output, see gal_binary_erode.

Opening is an operation in mathematical morphology which is defined as erosion followed by dilation (see above for the definitions of erosion and dilation). Opening will thus remove the outer structure of the foreground. In this implementation, num erosions are going to be applied on the dataset, then num dilations.

Function:
size_t
gal_binary_connected_components (gal_data_t *binary, gal_data_t **out, int connectivity)

Return the number of connected components in binary through the breadth first search algorithm (finding all pixels belonging to one component before going on to the next). Connection between two pixels is defined based on the value to connectivity. out is a dataset with the same size as binary with GAL_TYPE_INT32 type. Every pixel in out will have the label of the connected component it belongs to. The labeling of connected components starts from 1, so a label of zero is given to the input’s background pixels.

When *out!=NULL (its space is already allocated), it will be cleared (to zero) at the start of this function. Otherwise, when *out==NULL, the necessary dataset to keep the output will be allocated by this function.

binary must have a type of GAL_TYPE_UINT8, otherwise this function will abort with an error. Other than blank pixels (with a value of GAL_BLANK_UINT8 defined in Library blank values (blank.h)), all other non-zero pixels in binary will be considered as foreground (and will be labeled). Blank pixels in the input will also be blank in the output.

Function:
gal_data_t *
gal_binary_connected_adjacency_matrix (gal_data_t *adjacency, size_t *numconnected)

Find the number of connected labels and new labels based on an adjacency matrix, which must be a square binary array (type GAL_TYPE_UINT8). The returned dataset is a list of new labels for each old label. In other words, this function will find the objects that are connected (possibly through a third object) and in the output array, the respective elements for all input labels is going to have the same value. The total number of connected labels is put into the space that numconnected points to.

An adjacency matrix defines connection between two labels. For example, let’s assume we have 5 labels and we know that labels 1 and 5 are connected to label 3, but are not connected with each other. Also, labels 2 and 4 are not touching any other label. So in total we have 3 final labels: one combined object (merged from labels 1, 3, and 5) and the initial labels 2 and 4. The input adjacency matrix would look like this (note the extra row and column for a label 0 which is ignored):

            INPUT                             OUTPUT
            =====                             ======
          in_lab  1  2  3  4  5   |
                                  |       numconnected = 3
               0  0  0  0  0  0   |
in_lab 1 -->   0  0  0  1  0  0   |
in_lab 2 -->   0  0  0  0  0  0   |  Returned: new labels for the
in_lab 3 -->   0  1  0  0  0  1   |       5 initial objects
in_lab 4 -->   0  0  0  0  0  0   |   | 0 | 1 | 2 | 1 | 3 | 1 |
in_lab 5 -->   0  0  0  1  0  0   |

Although the adjacency matrix as used here is symmetric, currently this function assumes that it is filled on both sides of the diagonal.

Function:
void
gal_binary_fill_holes (gal_data_t *input)

Fill all the holes that are bounded within a 4-connected region of the binary input dataset. This function currently only works on a 2D dataset.


Next: , Previous: , Up: Gnuastro library   [Contents][Index]