GNU Astronomy Utilities




10.3.5.1 Generic data container (gal_data_t)

To be able to deal with any dataset (various dimensions, numeric data types, units and higher-level structures), Gnuastro defines the gal_data_t type, which is the input/output container of choice for many of the Gnuastro library’s functions. It is defined in gnuastro/data.h. If you will be using (#include’ing) those libraries, you don’t need to include this header explicitly: it is already included by any library header that uses gal_data_t.

Type (C struct): gal_data_t

The main container for datasets in Gnuastro. It can host data of any dimensionality, with any numeric data type. It is actually a structure, but it is typedef’d as a new type so you don’t have to write struct before every declaration. The actual structure is shown below, followed by a description of each element.

typedef struct gal_data_t
{
  void     *restrict array;  /* Basic array information.   */
  uint8_t             type;
  size_t              ndim;
  size_t            *dsize;
  size_t              size;
  char           *mmapname;
  size_t        minmapsize;

  int                 nwcs;  /* WCS information.           */
  struct wcsprm       *wcs;

  uint8_t             flag;  /* Content description.       */
  int               status;
  char               *name;
  char               *unit;
  char            *comment;

  int             disp_fmt;  /* For text printing.         */
  int           disp_width;
  int       disp_precision;

  struct gal_data_t  *next;  /* For higher-level datasets. */
  struct gal_data_t *block;
} gal_data_t;

The list below contains a description for each gal_data_t element.

void *restrict array

This is the pointer to the main array of the dataset, containing the raw data (values). All the other elements in this data structure are actually meta-data that enable us to use/understand the series of values in this array. It must allow data of any type (see Numeric data types), so it is defined as a void * pointer. A void * array is not directly usable in C, so you have to cast it to the proper type before using it; please see Library demo - reading a FITS image for a demonstration.
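As a minimal sketch of such a cast, the function below sums the values of a dataset, assuming they are 32-bit floats. The names demo_data_t and demo_sum_float32 are hypothetical: demo_data_t copies only the two gal_data_t elements needed here, so the example compiles without Gnuastro. In real code you would first compare the type element with the matching code from type.h (for example GAL_TYPE_FLOAT32) before casting.

```c
#include <stddef.h>

/* Hypothetical stand-in holding only the gal_data_t elements used here. */
typedef struct
{
  void   *array;   /* Raw values; their type is unknown to the compiler. */
  size_t  size;    /* Total number of elements.                         */
} demo_data_t;

/* Sum all values, assuming the caller has already checked that the
   elements are 32-bit floats. */
double
demo_sum_float32(const demo_data_t *data)
{
  /* A 'void *' cannot be dereferenced, so cast it to the real type. */
  const float *f = (const float *)data->array;
  double sum = 0.0;
  size_t i;

  for(i = 0; i < data->size; ++i)
    sum += f[i];
  return sum;
}
```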

The restrict keyword was formally introduced in C99. It tells the compiler that, for the lifetime of this pointer, only this pointer (or pointers derived from it) will be used to access the memory it points to (a pixel in an image for example)(132). This extra piece of information can greatly help compiler optimizations and thus reduce the running time of the program. But older compilers might not have this capability, so at ./configure time Gnuastro checks this feature, and if the user’s compiler doesn’t support restrict, it will be removed from this definition.
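The following generic C99 function (not part of Gnuastro; its name is hypothetical) shows the kind of promise restrict makes. Because out and in are declared restrict, the compiler may assume that writing to out[i] never changes any in[j], so it does not have to re-read in after every store and is free to vectorize the loop. Calling it with overlapping arrays would be undefined behavior.

```c
#include <stddef.h>

/* Copy 'n' values from 'in' to 'out', multiplying each by 'factor'.
   'restrict' promises the two arrays never overlap. */
void
scale_copy(float *restrict out, const float *restrict in,
           size_t n, float factor)
{
  size_t i;
  for(i = 0; i < n; ++i)
    out[i] = factor * in[i];
}
```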

uint8_t type

A fixed code (integer) used to identify the type of data in array (see Numeric data types). For the list of acceptable values to this variable, please see Library data types (type.h).

size_t ndim

The dataset’s number of dimensions.

size_t *dsize

The size of the dataset along each dimension. This is an array (with ndim elements) of positive integers in row-major order(133) (based on C). When a data file is read into memory with Gnuastro’s libraries, this array is dynamically allocated based on the number of dimensions that the dataset has.

It is important to remember that C’s row-major ordering is the opposite of the FITS standard, which uses column-major order: in the FITS standard the fastest dimension’s size is specified by NAXIS1, and slower dimensions follow. The FITS standard was defined mainly based on the FORTRAN language, whose approach to multi-dimensional arrays is the opposite of C’s (it also starts counting from 1, not 0). Hence if a FITS image has NAXIS1==20 and NAXIS2==50, the dsize array must be filled with dsize[0]==50 and dsize[1]==20.
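If you ever have to fill dsize by hand from FITS keyword values, the conversion is a simple reversal. The sketch below uses a hypothetical helper, naxis_to_dsize, and assumes naxis[0] holds NAXIS1, naxis[1] holds NAXIS2, and so on.

```c
#include <stddef.h>

/* Fill a C (row-major) 'dsize' from FITS NAXISn values, which are
   listed fastest-first: naxis[0] is NAXIS1, naxis[1] is NAXIS2, ... */
void
naxis_to_dsize(const size_t *naxis, size_t ndim, size_t *dsize)
{
  size_t i;
  for(i = 0; i < ndim; ++i)
    dsize[i] = naxis[ndim - 1 - i];
}
```

With NAXIS1==20 and NAXIS2==50 this gives dsize[0]==50 and dsize[1]==20, as in the example above.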

The fastest dimension is the one that is contiguous in memory: to increment by one along that dimension, just go to the next element in the array. As we go to slower dimensions, the number of memory cells we have to skip for an increment along that dimension becomes larger.
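The position of an element in the flat array can be computed directly from dsize. The sketch below is a generic row-major calculation; the function name coord_to_index is hypothetical, and coordinates are assumed to be 0-based and given in C order (slowest dimension first).

```c
#include <stddef.h>

/* Convert an ndim-dimensional coordinate (C order: slowest dimension
   first, counting from 0) into the position of that element in the
   flat array. */
size_t
coord_to_index(const size_t *coord, const size_t *dsize, size_t ndim)
{
  size_t d, index = 0;

  /* Horner-style accumulation: each step multiplies by the length of the
     next (faster) dimension, so the last (fastest) coordinate ends up
     with a stride of 1. */
  for(d = 0; d < ndim; ++d)
    index = index * dsize[d] + coord[d];
  return index;
}
```

For a 2D image with dsize {50, 20}, the element in row 2 and column 3 is at index 2*20+3 = 43: moving one row down skips 20 elements, while moving one column right skips only one.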

size_t size

The total number of elements in the dataset. This is actually the product of all the values in the dsize array, so it is not an independent parameter. However, low-level operations on the dataset (irrespective of its dimensionality) commonly need this number, so this element is designed to avoid calculating it every time.
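The relation between size and dsize is a plain product, sketched below with a hypothetical helper name. This sketch does not check for overflow of size_t, which would only matter for absurdly large dimensions.

```c
#include <stddef.h>

/* Total number of elements: the product of the lengths along all
   dimensions, i.e. the value cached in the 'size' element. */
size_t
total_size(const size_t *dsize, size_t ndim)
{
  size_t d, size = 1;
  for(d = 0; d < ndim; ++d)
    size *= dsize[d];
  return size;
}
```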

char *mmapname

Name of the file hosting the mmap’d contents of array. If the value of this variable is NULL, then the contents of array are actually stored in RAM, not in a file on the HDD/SSD. See the description of minmapsize below for more.

If a file is used, it will be kept in the hidden .gnuastro directory with a randomly selected name, allowing multiple arrays to be kept there at the same time. When gal_data_free is called, the randomly named file will be deleted.

size_t minmapsize

The minimum size of an array (in bytes) for its contents to be stored in a file (on the non-volatile HDD/SSD), not in RAM. This can be very useful for large datasets, which can be very memory intensive, to the point that the user’s hardware RAM might not be sufficient to keep/process them. A random filename is assigned to the array, which is available in the mmapname element of gal_data_t (above); see there for more.

When this variable has a value of 0 (zero), any allocated array will actually be in a file (not in RAM). When the value is -1 (the largest possible number in unsigned types, including size_t), the array will definitely be allocated in RAM.
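The -1 trick works because C converts a negative value to an unsigned type by wrapping around modulo 2^N, so (size_t)-1 is exactly SIZE_MAX. The sketch below shows this and a hypothetical decision helper, goes_to_file. The exact comparison Gnuastro uses internally is not stated here; the helper assumes an array goes to a file when it needs at least minmapsize bytes, following the "minimum size" wording above.

```c
#include <stddef.h>
#include <stdint.h>

/* Converting -1 to an unsigned type wraps around to that type's maximum
   value, so '(size_t)-1' equals SIZE_MAX. */
#define DEMO_MINMAP_ALWAYS_RAM   ((size_t)-1)
#define DEMO_MINMAP_ALWAYS_FILE  ((size_t)0)

/* Hypothetical helper (assumed rule): would an array of 'nbytes' bytes be
   kept in a file rather than in RAM for the given 'minmapsize'? */
int
goes_to_file(size_t nbytes, size_t minmapsize)
{
  return nbytes >= minmapsize;
}
```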

Please note that using a non-volatile file instead of RAM will significantly increase the program’s running time, especially on HDDs. So it is best to give this option very large values (depending on how much memory you will need for a given input). For example, your processing might involve a copy of the input (possibly to a wider data type which takes more bytes for each element), so take all such issues into consideration. minmapsize is actually stored in each gal_data_t, so it can be passed on to subsequent/derived datasets.

int nwcs

The number of WCS coordinate representations (for WCSLIB).

struct wcsprm *wcs

The main WCSLIB structure, keeping all the relevant information necessary for WCSLIB to do its processing and convert dataset positions into real-world positions. When it is given a NULL value, all possible WCS calculations/measurements will be ignored.

uint8_t flag

Bit-wise flags to describe general properties of the dataset. The number of bytes available in this flag is stored in the GAL_DATA_FLAG_SIZE macro. Note that you should use bit-wise operators(134) to check these flags. The currently recognized bits are stored in these macros:

GAL_DATA_FLAG_BLANK_CH

Marks that the dataset has been checked for blank values, so the value of the GAL_DATA_FLAG_HASBLANK bit is reliable. Without this bit, when a dataset doesn’t have any blank values (and this has been checked), the GAL_DATA_FLAG_HASBLANK bit will be zero, so a checker has no way to know whether this zero is real or whether no check has been done yet.

GAL_DATA_FLAG_HASBLANK

This bit has a value of 1 when the given dataset has blank values. If this bit is 0 and GAL_DATA_FLAG_BLANK_CH is 1, then the dataset has been checked and it didn’t have any blank values, so there is no more need for further checks.

GAL_DATA_FLAG_SORT_CH

Marks that the dataset has already been checked for being sorted or not, and thus that possible 0 values in GAL_DATA_FLAG_SORTED_I and GAL_DATA_FLAG_SORTED_D are meaningful.

GAL_DATA_FLAG_SORTED_I

This bit has a value of 1 when the given dataset is sorted in an increasing manner. If this bit is 0 and GAL_DATA_FLAG_SORT_CH is 1, then the dataset has been checked and wasn’t sorted (increasing), so there is no more need for further checks.

GAL_DATA_FLAG_SORTED_D

This bit has a value of 1 when the given dataset is sorted in a decreasing manner. If this bit is 0 and GAL_DATA_FLAG_SORT_CH is 1, then the dataset has been checked and wasn’t sorted (decreasing), so there is no more need for further checks.

The macro GAL_DATA_FLAG_MAXFLAG contains the largest internally used bit-position. Higher-level flags can be defined with the bit-wise shift operators using this macro to define internal flags for libraries/programs that depend on Gnuastro without causing any possible conflict with the internal flags discussed above or having to check the values manually on every release.
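Checking and setting these bits follows the usual C bit-wise idioms. In the sketch below, the DEMO_FLAG_* values and both functions are hypothetical and chosen only for illustration; real code must use the GAL_DATA_FLAG_* macros from gnuastro/data.h, whose numeric values may differ. The example shows how the "checked" bit and the "has blank" bit combine into three possible states.

```c
#include <stdint.h>

/* Hypothetical bit assignments for illustration only; use the
   GAL_DATA_FLAG_* macros from gnuastro/data.h in real code. */
#define DEMO_FLAG_BLANK_CH   ((uint8_t)1 << 0)
#define DEMO_FLAG_HASBLANK   ((uint8_t)1 << 1)

/* Return 1 if blanks are known to exist, 0 if they are known NOT to
   exist, and -1 if the dataset has not been checked yet. */
int
blank_status(uint8_t flag)
{
  if( !(flag & DEMO_FLAG_BLANK_CH) ) return -1;
  return (flag & DEMO_FLAG_HASBLANK) ? 1 : 0;
}

/* Record that a check was done and found no blank values: set the
   "checked" bit and clear the "has blank" bit. */
void
mark_checked_no_blank(uint8_t *flag)
{
  *flag |= DEMO_FLAG_BLANK_CH;
  *flag &= (uint8_t)~DEMO_FLAG_HASBLANK;
}
```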

int status

A context-specific status value for this data structure. This integer will not be set by Gnuastro’s libraries. You can use it to keep some additional information about the dataset (with integer constants), depending on your application.

char *name

The name of the dataset. If the dataset is a multi-dimensional array and read/written as a FITS image, this will be the value in the EXTNAME FITS keyword. If the dataset is a one-dimensional table column, this will be the column name. If it is set to NULL (by default), it will be ignored.

char *unit

The units of the dataset (for example BUNIT in the standard FITS keywords) that will be read from or written to files/tables along with the dataset. If it is set to NULL (by default), it will be ignored.

char *comment

Any further explanation about the dataset which will be written to any output file if present.

int disp_fmt

Format to use for printing each element of the dataset to a plain text file. The acceptable values for this element are defined in Table input output (table.h); they are based on C’s printf standards.

int disp_width

Width for printing each element of the dataset to a plain text file. The acceptable values for this element are defined in Table input output (table.h); they are based on C’s printf standards.

int disp_precision

Precision for printing each element of the dataset to a plain text file. The acceptable values for this element are defined in Table input output (table.h); they are based on C’s printf standards.
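In C’s printf family, a width and precision that are only known at run time can be passed with the * placeholder, which takes its value from an int argument. The sketch below (format_value is a hypothetical name) only shows this printf mechanism for a floating point value; how Gnuastro maps the disp_fmt codes of table.h to printf conversions is not shown here.

```c
#include <stdio.h>

/* Write 'value' into 'buf' using a run-time width and precision, as
   disp_width and disp_precision would supply. In "%*.*f" the two '*'
   are filled by the 'width' and 'precision' arguments, in that order. */
int
format_value(char *buf, size_t len, double value, int width, int precision)
{
  return snprintf(buf, len, "%*.*f", width, precision, value);
}
```

For example, a width of 8 and precision of 2 turn 3.14159 into "    3.14" (four spaces of padding).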

gal_data_t *next

Through this pointer, you can link a gal_data_t with other related datasets. For example, the different columns of a table each have one gal_data_t associated with them, and they are linked to each other using this element. There are several functions described below to facilitate using gal_data_t as a linked list. See Linked lists (list.h) for more on these wonderful high-level constructs.
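Walking such a list means following next until it is NULL. The sketch below uses a hypothetical stand-in structure, demo_col, with only a name and a next element, so it compiles without Gnuastro; with real tables you would normally use the ready-made functions in list.h instead.

```c
#include <stddef.h>

/* Hypothetical stand-in with only the elements used here. */
typedef struct demo_col
{
  const char      *name;   /* Column name.                   */
  struct demo_col *next;   /* Next column, or NULL at end.   */
} demo_col;

/* Count the columns in a NULL-terminated list. */
size_t
count_columns(const demo_col *list)
{
  size_t n = 0;
  const demo_col *tmp;

  for(tmp = list; tmp != NULL; tmp = tmp->next)
    ++n;
  return n;
}
```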

gal_data_t *block

Pointer to the start of the complete allocated block of memory. When this pointer is not NULL, the dataset is not treated as a contiguous patch of memory. Rather, it is seen as covering only a portion of the larger patch of memory that block points to. See Tessellation library (tile.h) for a more thorough explanation and functions to help work with tiles that are created from this pointer.
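The reason a tile cannot be treated as contiguous is that consecutive rows of the tile are separated by the full width of the larger block, not by the tile’s own width. The sketch below shows only this general row-major idea for a 2D float array; tile_sum and its parameters are hypothetical and are not the API of tile.h.

```c
#include <stddef.h>

/* Sum a 2D tile that lives inside a larger row-major block. 'start'
   points to the tile's first element inside the block; consecutive tile
   rows are 'block_width' elements apart, not 'tile_width'. */
double
tile_sum(const float *start, size_t tile_height, size_t tile_width,
         size_t block_width)
{
  size_t i, j;
  double sum = 0.0;

  for(i = 0; i < tile_height; ++i)
    for(j = 0; j < tile_width; ++j)
      sum += start[i * block_width + j];
  return sum;
}
```

For a 3×4 block holding the values 0 to 11, the 2×2 tile starting at row 1, column 1 covers the values 5, 6, 9 and 10.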


Footnotes

(132)

Also see https://en.wikipedia.org/wiki/Restrict.

(133)

Also see https://en.wikipedia.org/wiki/Row-_and_column-major_order.

(134)

See https://en.wikipedia.org/wiki/Bitwise_operations_in_C.

