In many contexts, it is desirable to slice the dataset into subsets or
tiles (overlapping or not), so that you can work on each tile
independently. One method would be to copy that region to a separately
allocated space, but in many contexts this is not necessary and can in
fact be a big burden on CPU/memory usage. The block pointer in Gnuastro’s
Generic data container (gal_data_t) is defined for such situations, where
allocation is not necessary: you just want to read the data or write to it
independently of (or in coordination with) other regions of the dataset.
Combined with parallel processing, this can greatly reduce time/memory
consumption.
See the figure below for an example: assume the larger dataset is a
contiguous block of memory that you are interpreting as a 2D array, but
you only want to work on the smaller tile region.
                  larger
    ---------------------------------
    |                               |
    |    tile                       |
    |    ----------                 |
    |    |        |                 |
    |    |_       |                 |
    |    |*|      |                 |
    |    ----------                 |
    |    tile->block = larger       |
    |_                              |
    |*|                             |
    ---------------------------------
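The indexing implied by the figure can be sketched in plain C, without any Gnuastro types. This is only an illustration under stated assumptions: the function names tile_first and tile_sum are hypothetical (they are not part of Gnuastro), the array is assumed to be row-major, and the element type is assumed to be float.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Gnuastro function): pointer to the tile's
   first element inside a row-major 2D array whose rows are
   'block_width' elements long. */
float *
tile_first(float *larger, size_t block_width, size_t row0, size_t col0)
{
  assert(larger != NULL);
  assert(col0 < block_width);
  return larger + row0 * block_width + col0;
}

/* Hypothetical helper (not a Gnuastro function): sum a 2D tile in
   place, without copying it out of the larger array. The step from one
   tile row to the next is the width of the LARGER array, not the width
   of the tile. */
double
tile_sum(const float *first, size_t tile_rows, size_t tile_cols,
         size_t block_width)
{
  double sum = 0.0;
  size_t r, c;

  assert(first != NULL);
  assert(tile_cols <= block_width);

  for (r = 0; r < tile_rows; ++r)
    {
      const float *row = first + r * block_width;
      for (c = 0; c < tile_cols; ++c)
        sum += row[c];
    }
  return sum;
}
```

For a 4 by 5 array holding the values 0 to 19, the 2 by 2 tile that starts at row 1, column 2 covers the values 7, 8, 12 and 13, so tile_sum(tile_first(larger, 5, 1, 2), 2, 2, 5) returns 40. No element is copied at any point.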
To use gal_data_t’s block concept, you allocate a gal_data_t *tile which
is initialized with the pointer to the first element in the sub-array (as
its array argument). Note that this is not necessarily the first element
in the larger array. You can set the size of the tile along with the
initialization as you please. Recall that, when given a non-NULL pointer
as array, gal_data_initialize (and thus gal_data_alloc) does not allocate
any space and just uses the given pointer for the new array element of the
gal_data_t. So your tile data structure will not be pointing to a
separately allocated space.
After the allocation is done, you just point tile->block to the larger
dataset which hosts the full block of memory. Where relevant, Gnuastro’s
library functions will check the block pointer of their input dataset to
see how to deal with dimensions and increments so they can always remain
within the tile. The tools introduced in this section are designed to help
in defining and working with tiles that are created in this manner.
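The two steps described above (pointing the tile at its first element, then setting its block pointer) can be illustrated with a small self-contained sketch. It deliberately does not use Gnuastro itself: struct sketch_data is a hypothetical, 2D-only stand-in that keeps just the three fields discussed here (array, dsize and block), and sketch_tile_define and sketch_tile_fill are illustrative names, not library functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2D-only stand-in for the gal_data_t fields discussed in
   the text. It is NOT the real gal_data_t definition. */
struct sketch_data
{
  float *array;               /* First element of this dataset or tile. */
  size_t dsize[2];            /* Number of rows, number of columns.     */
  struct sketch_data *block;  /* Hosting dataset; NULL when allocated.  */
};

/* Define 'tile' over a region of 'larger' without allocating anything:
   only a pointer, two sizes and the block pointer are set. */
void
sketch_tile_define(struct sketch_data *tile, struct sketch_data *larger,
                   size_t row0, size_t col0, size_t nrows, size_t ncols)
{
  assert(larger->block == NULL);           /* 'larger' owns the memory. */
  assert(row0 + nrows <= larger->dsize[0]);
  assert(col0 + ncols <= larger->dsize[1]);

  tile->array = larger->array + row0 * larger->dsize[1] + col0;
  tile->dsize[0] = nrows;
  tile->dsize[1] = ncols;
  tile->block = larger;
}

/* Write 'value' into every element of the tile and nothing else. The
   row increment comes from the block, so the loop never leaves the
   tile. A dataset with a NULL block is treated as contiguous. */
void
sketch_tile_fill(struct sketch_data *tile, float value)
{
  size_t r, c;
  size_t increment = tile->block ? tile->block->dsize[1] : tile->dsize[1];

  for (r = 0; r < tile->dsize[0]; ++r)
    for (c = 0; c < tile->dsize[1]; ++c)
      tile->array[r * increment + c] = value;
}
```

In this sketch, a function that receives the tile inspects tile->block to find the row increment, which mirrors in a simplified form how checking the block pointer lets a function stay within the tile.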
Since the block structure is defined as a pointer, arbitrary levels of
tessellation/grid-ing are possible (tile->block may itself be a tile in an
even larger allocated space). Therefore, just like a linked-list (see
Linked lists (list.h)), it is important to have the block pointer of the
largest (allocated) dataset set to NULL. Normally, you will not have to
worry about this, because gal_data_initialize (and thus gal_data_alloc)
will set the block element to NULL by default; just remember not to change
it. You should then only change the block element for the tiles you define
over the allocated space.
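The linked-list analogy can be made concrete with a short sketch of why the NULL terminator matters. The struct and function names below (nested_data, nested_allocated, nested_depth) are hypothetical and exist only for this illustration; they are not Gnuastro's types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in that keeps only the block pointer. */
struct nested_data
{
  struct nested_data *block;  /* Hosting dataset; NULL when allocated. */
};

/* Follow the block pointers until the allocated dataset is reached.
   This only terminates because the allocated dataset has block==NULL. */
const struct nested_data *
nested_allocated(const struct nested_data *data)
{
  assert(data != NULL);
  while (data->block != NULL)
    data = data->block;
  return data;
}

/* Number of tiling levels between 'data' and the allocated dataset
   (0 for the allocated dataset itself). */
size_t
nested_depth(const struct nested_data *data)
{
  size_t depth = 0;

  assert(data != NULL);
  for (; data->block != NULL; data = data->block)
    ++depth;
  return depth;
}
```

With an allocated dataset, an outer tile over it, and an inner tile over the outer tile, nested_allocated on the inner tile returns the allocated dataset and nested_depth returns 2. If the allocated dataset's block pointer were changed to point at something else, both loops would walk past the memory that actually hosts the data.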
Below, we will first review constructs for Independent tiles and then define the current approach to fully tessellating a dataset (covering every pixel/data-element with a non-overlapping tile grid) in Tile grid. This approach to dealing with parts of a larger block was inspired by a similarly named concept in the GNU Scientific Library (GSL); see its “Vectors and Matrices” chapter for their implementation.
GNU Astronomy Utilities 0.20 manual, April 2023.