Next: Labeled datasets, Previous: Statistical operations, Up: Gnuastro library [Contents][Index]

Binary datasets only have two (usable) values: 0 (also known as background) or 1 (also known as foreground). They are created after some binary classification is applied to the dataset. The most common is thresholding: for example in an image, pixels with a value above the threshold are given a value of 1 and those with a value less than the threshold are assigned a value of 0.

Since there is only two values, in the processing of binary images, you are
usually concerned with the positioning of an element and its vicinity
(neighbors). When a dataset has more than one dimension, multiple classes
of immediate neighbors (that are touching the element) can be defined for
each data-element. To separate these different classes of immediate
neighbors, we define *connectivity*.

The classification is done by the distance from element center to the
neighbor’s center. The nearest immediate neighbors have a connectivity of
1, the second nearest class of neighbors have a connectivity of 2 and so
on. In total, the largest possible connectivity for data with `ndim`

dimensions is `ndim`

. For example in a 2D dataset, 4-connected
neighbors (that share an edge and have a distance of 1 pixel) have a
connectivity of 1. The other 4 neighbors that only share a vertice (with a
distance of \(\sqrt{2}\) pixels) have a connectivity of
2. Conventionally, the class of connectivity-2 neighbors also includes the
connectivity 1 neighbors, so for example we call them 8-connected neighbors
in 2D datasets.

Ideally, one bit is sufficient for each element of a binary
dataset. However, CPUs are not designed to work on individual bits, the
smallest unit of memory addresses is a byte (containing 8 bits on modern
CPUs). Therefore, in Gnuastro, the type used for binary dataset is
`uint8_t`

(see Numeric data types). Although it does take
8-times more memory, this choice offers much better performance and the
some extra (useful) features.

The advantage of using a full byte for each element of a binary dataset is
that you can also have other values (that will be ignored in the
processing). One such common “other” value in real datasets is a blank
value (to mark regions that should not be processed because there is no
data). The constant `GAL_BLANK_UINT8`

value must be used in these
cases (see Library blank values (`blank.h`)). Another is some temporary value(s)
that can be given to a processed pixel to avoid having another copy of the
dataset as in `GAL_BINARY_TMP_VALUE`

that is described below.

- Macro:
**GAL_BINARY_TMP_VALUE** The functions described below work on a

`uint8_t`

type dataset with values of 1 or 0 (no other pixel will be touched). However, in some cases, it is necessary to put temporary values in each element during the processing of the functions. This temporary value has a special meaning for the operation and will be operated on. So if your input datasets have values other than 0 and 1 that you don’t want these functions to work on, be sure they are not equal to this macro’s value. Note that this value is also different from`GAL_BLANK_UINT8`

, so your input datasets may also contain blank elements.

- Function:

*gal_data_t **

**gal_binary_erode***(gal_data_t*`*input`

, size_t`num`

, int`connectivity`

, int`inplace`

) Do

`num`

erosions on the`connectivity`

-connected neighbors of`input`

(see above for the definition of connectivity).If

`inplace`

is non-zero*and*the input’s type is`GAL_TYPE_UINT8`

, then the erosion will be done within the input dataset and the returned pointer will be`input`

. Otherwise,`input`

is copied (and converted if necessary) to`GAL_TYPE_UINT8`

and erosion will be done on this new dataset which will also be returned. This function will only work on the elements with a value of 1 or 0. It will leave all the rest unchanged.Erosion (inverse of dilation) is an operation in mathematical morphology where each foreground pixel that is touching a background pixel is flipped (changed to background). The

`connectivity`

value determines the definition of “touching”. Erosion will thus decrease the area of the foreground regions by one layer of pixels.

- Function:

*gal_data_t **

**gal_binary_dilate***(gal_data_t*`*input`

, size_t`num`

, int`connectivity`

, int`inplace`

) Do

`num`

dilations on the`connectivity`

-connected neighbors of`input`

(see above for the definition of connectivity). For more on`inplace`

and the output, see`gal_binary_erode`

.Dilation (inverse of erosion) is an operation in mathematical morphology where each background pixel that is touching a foreground pixel is flipped (changed to foreground). The

`connectivity`

value determines the definition of “touching”. Dilation will thus increase the area of the foreground regions by one layer of pixels.

- Function:

*gal_data_t **

**gal_binary_open***(gal_data_t*`*input`

, size_t`num`

, int`connectivity`

, int`inplace`

) Do

`num`

openings on the`connectivity`

-connected neighbors of`input`

(see above for the definition of connectivity). For more on`inplace`

and the output, see`gal_binary_erode`

.Opening is an operation in mathematical morphology which is defined as erosion followed by dilation (see above for the definitions of erosion and dilation). Opening will thus remove the outer structure of the foreground. In this implementation,

`num`

erosions are going to be applied on the dataset, then`num`

dilations.

- Function:

*size_t*

**gal_binary_connected_components***(gal_data_t*`*binary`

, gal_data_t`**out`

, int`connectivity`

) -
Return the number of connected components in

`binary`

through the breadth first search algorithm (finding all pixels belonging to one component before going on to the next). Connection between two pixels is defined based on the value to`connectivity`

.`out`

is a dataset with the same size as`binary`

with`GAL_TYPE_INT32`

type. Every pixel in`out`

will have the label of the connected component it belongs to. The labeling of connected components starts from 1, so a label of zero is given to the input’s background pixels.When

`*out!=NULL`

(its space is already allocated), it will be cleared (to zero) at the start of this function. Otherwise, when`*out==NULL`

, the necessary dataset to keep the output will be allocated by this function.`binary`

must have a type of`GAL_TYPE_UINT8`

, otherwise this function will abort with an error. Other than blank pixels (with a value of`GAL_BLANK_UINT8`

defined in Library blank values (`blank.h`)), all other non-zero pixels in`binary`

will be considered as foreground (and will be labeled). Blank pixels in the input will also be blank in the output.

- Function:

*gal_data_t **

**gal_binary_connected_indexs(gal_data_t**`*binary`

, int`connectivity`

) Build a

`gal_data_t`

linked list, where each node of the list contains an array with indices of the connected regions. Therefore the arrays of each node can have a different size. Note that the indices will only be calculated on the pixels with a value of 1 and internally, it will temporarily change the values to 2 (and return them back to 1 in the end).

- Function:

*gal_data_t **

**gal_binary_connected_adjacency_matrix***(gal_data_t*`*adjacency`

, size_t`*numconnected`

) -
Find the number of connected labels and new labels based on an adjacency matrix, which must be a square binary array (type

`GAL_TYPE_UINT8`

). The returned dataset is a list of new labels for each old label. In other words, this function will find the objects that are connected (possibly through a third object) and in the output array, the respective elements for all input labels is going to have the same value. The total number of connected labels is put into the space that`numconnected`

points to.An adjacency matrix defines connection between two labels. For example, let’s assume we have 5 labels and we know that labels 1 and 5 are connected to label 3, but are not connected with each other. Also, labels 2 and 4 are not touching any other label. So in total we have 3 final labels: one combined object (merged from labels 1, 3, and 5) and the initial labels 2 and 4. The input adjacency matrix would look like this (note the extra row and column for a label 0 which is ignored):

INPUT OUTPUT ===== ====== in_lab 1 2 3 4 5 | | numconnected = 3 0 0 0 0 0 0 | in_lab 1 --> 0 0 0 1 0 0 | in_lab 2 --> 0 0 0 0 0 0 | Returned: new labels for the in_lab 3 --> 0 1 0 0 0 1 | 5 initial objects in_lab 4 --> 0 0 0 0 0 0 | | 0 | 1 | 2 | 1 | 3 | 1 | in_lab 5 --> 0 0 0 1 0 0 |

Although the adjacency matrix as used here is symmetric, currently this function assumes that it is filled on both sides of the diagonal.

- Function:

*gal_data_t **

**gal_binary_connected_adjacency_list***(gal_list_sizet_t*`**listarr`

, size_t`number`

, size_t`minmapsize`

, int`quietmmap`

, size_t`*numconnected`

) -
Find the number of connected labels and new labels based on an adjacency list. The output of this function is identical to that of

`gal_binary_connected_adjacency_matrix`

. But the major difference is that it uses a list of connected labels to each label instead of a square adjacency matrix. This is done because when the number of labels becomes very large (for example on the scale of 100,000), the adjacency matrix can consume more than 10GB of RAM!The input list has the following format: it is an array of pointers to

`gal_list_sizet_t *`

(or`gal_list_sizet_t **`

). The array has`number`

elements and each`listarr[i]`

is a linked list of`gal_list_sizet_t *`

. As a demonstration, the input of the same example in`gal_binary_connected_adjacency_matrix`

would look like below and the output of this function will be identical to there.listarr[0] = NULL listarr[1] = 3 listarr[2] = NULL listarr[3] = 1 -> 5 listarr[4] = NULL listarr[5] = 3

From this example, it is already clear that this method will consume far less memory. But because it needs to parse lists (and not easily jump between array elements), it can be slower. But in scenarios where there are too many objects (that may exceed the whole system’s RAM+SWAP), this option is a good alternative and the drop in processing speed is worth getting the job done.

Similar to

`gal_binary_connected_adjacency_matrix`

, this function will write the final number of connected labels in`numconnected`

. But since it takes no`gal_data_t *`

argument (where it can inherit the`minmapsize`

and`quietmmap`

parameters), it also needs these as input. For more on`minmapsize`

and`quietmmap`

, see Memory management.

- Function:

*gal_data_t **

**gal_binary_holes_label***(gal_data_t*`*input`

, int`connectivity`

, size_t`*numholes`

) Label all the holes in the foreground (non-zero elements in input) as independent regions. Holes are background regions (zero-valued in input) that are fully surrounded by the foreground, as defined by

`connectivity`

. The returned dataset has a 32-bit signed integer type with the size of the input. All holes in the input will have labels/counters greater or equal to`1`

. The rest of the background regions will still have a value of`0`

and the initial foreground pixels will have a value of`-1`

. The total number of holes will be written where`numholes`

points to.

- Function:

*void*

**gal_binary_holes_fill***(gal_data_t*`*input`

, int`connectivity`

, size_t`maxsize`

) Fill all the holes (0 valued pixels surrounded by 1 valued pixels) of the binary

`input`

dataset. The connectivity of the holes can be set with`connectivity`

. Holes larger than`maxsize`

are not filled. This function currently only works on a 2D dataset.

Next: Labeled datasets, Previous: Statistical operations, Up: Gnuastro library [Contents][Index]

JavaScript license information

GNU Astronomy Utilities 0.15 manual, May 2021.