GNU Astronomy Utilities



12.3.6.1 Generic data container (gal_data_t)

To deal with any dataset (various dimensions, numeric data types, units and higher-level structures), Gnuastro defines the gal_data_t type, which is the input/output container of choice for many of the Gnuastro library’s functions. It is defined in gnuastro/data.h. If you are using (‘#include’ing) those libraries, you do not need to include this header explicitly: it is already included by any library header that uses gal_data_t.

Type (C struct): gal_data_t

The main container for datasets in Gnuastro. It can host data of any dimensions, with any numeric data type. It is actually a structure, but typedef’d as a new type to avoid having to write struct before every declaration. The structure is shown below, followed by a description of each element.

typedef struct gal_data_t
{
  void     *restrict array;  /* Basic array information.   */
  uint8_t             type;
  size_t              ndim;
  size_t            *dsize;
  size_t              size;
  int            quietmmap;
  char           *mmapname;
  size_t        minmapsize;

  int                 nwcs;  /* WCS information.           */
  struct wcsprm       *wcs;

  uint8_t             flag;  /* Content description.       */
  int               status;
  char               *name;
  char               *unit;
  char            *comment;

  int             disp_fmt;  /* For text printing.         */
  int           disp_width;
  int       disp_precision;

  struct gal_data_t  *next;  /* For higher-level datasets. */
  struct gal_data_t *block;
} gal_data_t;

The list below contains a description for each gal_data_t element.

void *restrict array

This is the pointer to the main array of the dataset containing the raw data (values). All the other elements in this data structure are meta-data that enable us to use and understand the series of values in this array. It must allow data of any type (see Numeric data types), so it is defined as a void * pointer. A void * pointer cannot be dereferenced directly in C, so you have to cast it to the proper type before using it. Please see Library demo - reading a FITS image for a demonstration.
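As a minimal sketch of the casting described above, the code below uses a hypothetical stand-in structure (demo_data_t) and made-up type codes (DEMO_TYPE_INT32, DEMO_TYPE_FLOAT32) in place of the real gal_data_t and the codes in type.h, so that it compiles without Gnuastro. It only illustrates the pattern: inspect the type code, then convert array to the matching pointer type before reading values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes for this sketch only; the real codes are
   defined in Gnuastro's type.h with different names and values. */
enum { DEMO_TYPE_INT32 = 1, DEMO_TYPE_FLOAT32 = 2 };

/* Hypothetical stand-in holding only the elements used here. */
typedef struct
{
  void    *array;   /* Raw values, type unknown to the compiler. */
  uint8_t   type;   /* One of the DEMO_TYPE_* codes above.       */
  size_t    size;   /* Total number of elements.                 */
} demo_data_t;

/* Sum all elements as a double, whatever the stored type is. */
double
demo_sum(const demo_data_t *d)
{
  size_t i;
  double sum = 0.0;

  switch (d->type)
    {
    case DEMO_TYPE_INT32:
      {
        /* In C, void * converts to any object pointer type on
           assignment; an explicit cast would also be valid. */
        const int32_t *a = d->array;
        for (i = 0; i < d->size; ++i) sum += a[i];
      }
      break;

    case DEMO_TYPE_FLOAT32:
      {
        const float *a = d->array;
        for (i = 0; i < d->size; ++i) sum += a[i];
      }
      break;

    default:
      /* Abort in debug builds on a type code this sketch ignores. */
      assert(0 && "unsupported type code");
    }

  return sum;
}
```

The same function works on both an int32_t array and a float array, because the type element tells it how to interpret the untyped pointer.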

The restrict keyword was formally introduced in C99 and is used to tell the compiler that at any moment only this pointer will modify what it points to (a pixel in an image, for example; see footnote 257). This extra piece of information can greatly help in compiler optimizations and thus the running time of the program. But older compilers might not have this capability, so at ./configure time, Gnuastro checks this feature and if the user’s compiler does not support restrict, it will be removed from this definition.
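To illustrate what restrict promises, here is a small C99 sketch that is independent of Gnuastro; the function name demo_add is hypothetical. Both pointers are restrict-qualified, so the compiler may assume the two arrays never overlap and can reorder or vectorize the loads and stores.

```c
#include <assert.h>
#include <stddef.h>

/* Add `in` to `out` element by element. The restrict qualifiers are a
   promise by the caller that the two arrays do not overlap; calling
   this function with overlapping arrays is undefined behavior. */
void
demo_add(float *restrict out, const float *restrict in, size_t n)
{
  size_t i;

  assert(out != NULL && in != NULL);
  for (i = 0; i < n; ++i)
    out[i] += in[i];
}
```

Without restrict, the compiler has to allow for the possibility that a store to out[i] changes a later in[j], which limits how aggressively it can optimize the loop.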

uint8_t type

A fixed code (integer) used to identify the type of data in array (see Numeric data types). For the list of acceptable values to this variable, please see Library data types (type.h).

size_t ndim

The dataset’s number of dimensions.

size_t *dsize

The size of the dataset along each dimension. This is an array (with ndim elements) of positive integers in row-major order (based on C; see footnote 258). When a data file is read into memory with Gnuastro’s libraries, this array is dynamically allocated based on the number of dimensions that the dataset has.

It is important to remember that C’s row-major ordering is the opposite of the FITS standard which is in column-major order: in the FITS standard the fastest dimension’s size is specified by NAXIS1, and slower dimensions follow. The FITS standard was defined mainly based on the FORTRAN language which is the opposite of C’s approach to multi-dimensional arrays (and also starts counting from 1 not 0). Hence if a FITS image has NAXIS1==20 and NAXIS2==50, the dsize array must be filled with dsize[0]==50 and dsize[1]==20.

The fastest dimension is the one that is contiguous in memory: to increment by one along that dimension, just go to the next element in the array. As we go to slower dimensions, the number of memory cells we have to skip for an increment along that dimension becomes larger.
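The ordering rules above can be sketched in a few lines of plain C. The two helper functions below are hypothetical (they are not part of Gnuastro): the first reverses FITS NAXISn values into a C-ordered dsize, and the second computes the position of a 0-based coordinate in the flat array.

```c
#include <assert.h>
#include <stddef.h>

/* Fill a C-ordered `dsize` from FITS NAXISn values. `naxis[0]` holds
   NAXIS1 (the fastest dimension), so the order is simply reversed. */
void
demo_naxis_to_dsize(const size_t *naxis, size_t ndim, size_t *dsize)
{
  size_t i;
  for (i = 0; i < ndim; ++i)
    dsize[i] = naxis[ndim - 1 - i];
}

/* Flat index of a 0-based, C-ordered coordinate. The last dimension
   is the fastest: increasing coord[ndim-1] by one moves one element,
   while increasing a slower dimension skips more elements. */
size_t
demo_flat_index(const size_t *dsize, size_t ndim, const size_t *coord)
{
  size_t i, index = 0;

  for (i = 0; i < ndim; ++i)
    {
      assert(coord[i] < dsize[i]);
      index = index * dsize[i] + coord[i];
    }
  return index;
}
```

With NAXIS1==20 and NAXIS2==50, dsize becomes {50, 20}: a step along the second (fastest) C dimension moves one element in memory, while a step along the first moves 20 elements.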

size_t size

The total number of elements in the dataset. This is actually a multiplication of all the values in the dsize array, so it is not an independent parameter. However, low-level operations with the dataset (irrespective of its dimensions) commonly need this number, so this element is designed to avoid calculating it every time.
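For reference, the relation between size and dsize can be written as the short function below. This is only a sketch with a hypothetical name (demo_total_size), not a Gnuastro function.

```c
#include <assert.h>
#include <stddef.h>

/* Product of the lengths along all dimensions. This sketch assumes
   the dataset has at least one dimension. */
size_t
demo_total_size(const size_t *dsize, size_t ndim)
{
  size_t i, size = 1;

  assert(ndim >= 1);
  for (i = 0; i < ndim; ++i)
    size *= dsize[i];
  return size;
}
```

Keeping the result in the structure means that a loop over all elements can simply run from 0 to size-1 without knowing how many dimensions the dataset has.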

int quietmmap

When this value is zero, and the dataset must not be allocated in RAM (see mmapname and minmapsize below), a warning will be printed to inform the user when the file is created and when it is deleted. The warning includes the filename, the size in bytes, and the fact that they can toggle this behavior through the --minmapsize option in Gnuastro’s programs.

char *mmapname

Name of file hosting the mmap’d contents of array. If the value of this variable is NULL, then the contents of array are actually stored in RAM, not in a file on the HDD/SSD. See the description of minmapsize below for more.

If a file is used, it will be kept in the gnuastro_mmap directory of the running directory. Its name is randomly selected to allow multiple arrays at the same time; see the description of --minmapsize in Processing options. When gal_data_free is called, the randomly named file will be deleted.

size_t minmapsize

The minimum size of an array (in bytes) to store the contents of array as a file (on the non-volatile HDD/SSD), not in RAM. This can be very useful for large datasets, which can be very memory intensive when the user’s RAM is not sufficient to keep or process them. The array is assigned a random filename, which is stored in the mmapname element of gal_data_t (described above). minmapsize is stored in each gal_data_t, so it can be passed on to subsequent/derived datasets.

See the description of the --minmapsize option in Processing options for more on using this value.

int nwcs

The number of WCS coordinate representations (for WCSLIB).

struct wcsprm *wcs

The main WCSLIB structure keeping all the relevant information necessary for WCSLIB to do its processing and convert data-set positions into real-world positions. When it is given a NULL value, all possible WCS calculations/measurements will be ignored.

uint8_t flag

Bit-wise flags to describe general properties of the dataset. The number of bytes available in this flag is stored in the GAL_DATA_FLAG_SIZE macro. Note that you should use bit-wise operators (see footnote 259) to check these flags. The currently recognized bits are stored in these macros:

GAL_DATA_FLAG_BLANK_CH

Marks whether the dataset has been checked for blank values. When a dataset does not have any blank values, the GAL_DATA_FLAG_HASBLANK bit will be zero. But upon initialization, all bits also get a value of zero. Therefore, a checker needs this flag to see if the value in GAL_DATA_FLAG_HASBLANK is reliable (the dataset has actually been parsed for a blank value) or not.

Also, if it is necessary to re-check the presence of blank values, you just have to set this flag to zero and call gal_blank_present, for example, to parse the dataset and check for blank values. Note that for improved efficiency, when this flag is set, gal_blank_present will not actually parse the dataset; it will just use GAL_DATA_FLAG_HASBLANK.

GAL_DATA_FLAG_HASBLANK

This bit has a value of 1 when the given dataset has blank values. If this bit is 0 and GAL_DATA_FLAG_BLANK_CH is 1, then the dataset has been checked and it did not have any blank values, so there is no more need for further checks.

GAL_DATA_FLAG_SORT_CH

Marks whether the dataset has already been checked for being sorted, and thus whether the possible 0 values in GAL_DATA_FLAG_SORTED_I and GAL_DATA_FLAG_SORTED_D are meaningful. The logic behind this is similar to that in GAL_DATA_FLAG_BLANK_CH.

GAL_DATA_FLAG_SORTED_I

This bit has a value of 1 when the given dataset is sorted in an increasing manner. If this bit is 0 and GAL_DATA_FLAG_SORT_CH is 1, then the dataset has been checked and was not sorted (increasing), so there is no more need for further checks.

GAL_DATA_FLAG_SORTED_D

This bit has a value of 1 when the given dataset is sorted in a decreasing manner. If this bit is 0 and GAL_DATA_FLAG_SORT_CH is 1, then the dataset has been checked and was not sorted (decreasing), so there is no more need for further checks.

The macro GAL_DATA_FLAG_MAXFLAG contains the largest internally used bit-position. Higher-level flags can be defined with the bit-wise shift operators using this macro to define internal flags for libraries/programs that depend on Gnuastro without causing any possible conflict with the internal flags discussed above or having to check the values manually on every release.
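The flag logic described above can be sketched as follows. To keep the example compilable without Gnuastro, it defines its own DEMO_FLAG_* macros; their numeric values are assumptions made for this sketch and should not be relied on, so real code should use the GAL_DATA_FLAG_* macros from gnuastro/data.h. The application flag MYAPP_FLAG_CALIBRATED and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in bit values (assumptions for this sketch only). */
#define DEMO_FLAG_BLANK_CH  0x1
#define DEMO_FLAG_HASBLANK  0x2
#define DEMO_FLAG_SORT_CH   0x4
#define DEMO_FLAG_SORTED_I  0x8
#define DEMO_FLAG_SORTED_D  0x10
#define DEMO_FLAG_MAXFLAG   DEMO_FLAG_SORTED_D

/* A hypothetical application-level flag, placed one bit above the
   largest internal one so that it cannot collide with them. */
#define MYAPP_FLAG_CALIBRATED  (DEMO_FLAG_MAXFLAG << 1)

/* Return 1 if blanks are known to be present, 0 if they are known to
   be absent, and -1 if the dataset has not been checked yet. */
int
demo_blank_state(uint8_t flag)
{
  if ( !(flag & DEMO_FLAG_BLANK_CH) ) return -1;
  return (flag & DEMO_FLAG_HASBLANK) ? 1 : 0;
}

/* Record the result of a blank check in the flag and return it. */
uint8_t
demo_set_blank_checked(uint8_t flag, int hasblank)
{
  flag |= DEMO_FLAG_BLANK_CH;
  if (hasblank) flag |= DEMO_FLAG_HASBLANK;
  else          flag &= (uint8_t)~DEMO_FLAG_HASBLANK;
  return flag;
}
```

Because the application flag is derived from the largest internal bit by a shift, it keeps a free bit position even if the internal flags are renumbered, as long as the flag variable has room for it.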

int status

A context-specific status value for this data structure. This integer will not be set by Gnuastro’s libraries. You can use it to keep some additional information about the dataset (with integer constants), depending on your application.

char *name

The name of the dataset. If the dataset is a multi-dimensional array and read/written as a FITS image, this will be the value in the EXTNAME FITS keyword. If the dataset is a one-dimensional table column, this will be the column name. If it is set to NULL (the default), it will be ignored.

char *unit

The units of the dataset (for example, BUNIT in the standard FITS keywords) that will be read from or written to files/tables along with the dataset. If it is set to NULL (the default), it will be ignored.

char *comment

Any further explanation about the dataset which will be written to any output file if present.

int disp_fmt

Format to use for printing each element of the dataset to a plain text file. The acceptable values for this element are defined in Table input output (table.h). Based on C’s printf standards.

int disp_width

Width of printing each element of the dataset to a plain text file. The acceptable values for this element are defined in Table input output (table.h). Based on C’s printf standards.

int disp_precision

Precision of printing each element of the dataset to a plain text file. The acceptable values for this element are defined in Table input output (table.h). Based on C’s printf standards.
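Since these three elements follow C’s printf conventions, the sketch below shows how a width and a precision can be passed to printf-style functions at run time with the * modifier. The function name demo_print_float is hypothetical, and the sketch always uses the %f conversion; how the disp_fmt codes in table.h map to conversions is not shown here.

```c
#include <assert.h>
#include <stdio.h>

/* Write `value` into `buf` as a fixed-point number with the given
   minimum field width and number of digits after the decimal point.
   Returns the value returned by snprintf. */
int
demo_print_float(char *buf, size_t n, double value,
                 int width, int precision)
{
  assert(buf != NULL && n > 0);
  return snprintf(buf, n, "%*.*f", width, precision, value);
}
```

For example, a width of 10 and a precision of 3 print 3.14159 right-aligned in a ten-character field with three digits after the decimal point.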

gal_data_t *next

Through this pointer, you can link a gal_data_t with other related datasets. For example, the different columns in a dataset each have one gal_data_t associated with them, and they are linked to each other using this element. There are several functions described below to facilitate using gal_data_t as a linked list. See Linked lists (list.h) for more on these wonderful high-level constructs.
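The idea of linking datasets through next can be sketched with a hypothetical stand-in node (demo_col_t) that keeps only a name and the next pointer. The helper functions below are written for this sketch; for real code, use the list functions that Gnuastro provides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one table column. */
typedef struct demo_col_t
{
  const char        *name;   /* Column name.                        */
  struct demo_col_t *next;   /* Next column, or NULL at the end.    */
} demo_col_t;

/* Count the nodes by following `next` until it is NULL. */
size_t
demo_count_columns(const demo_col_t *list)
{
  size_t n = 0;
  const demo_col_t *c;

  for (c = list; c != NULL; c = c->next)
    ++n;
  return n;
}

/* Put `col` at the start of the list and return the new head. */
demo_col_t *
demo_prepend(demo_col_t *list, demo_col_t *col)
{
  assert(col != NULL);
  col->next = list;
  return col;
}
```

An empty list is just a NULL pointer, and a whole table can be passed to a function as a single pointer to its first column.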

gal_data_t *block

Pointer to the start of the complete allocated block of memory. When this pointer is not NULL, the dataset is not treated as a contiguous patch of memory. Rather, it is seen as covering only a portion of the larger patch of memory that block points to. See Tessellation library (tile.h) for a more thorough explanation and functions to help work with tiles that are created from this pointer.
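As a rough sketch of why a dataset with a non-NULL block cannot be treated as contiguous, the code below uses a hypothetical 2D-only stand-in structure (demo_img_t). It assumes that a tile’s array points to the tile’s first element inside the larger block and that the tile’s dsize describes only the tile; consecutive tile rows are then separated by the width of the block, not the width of the tile.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2D-only stand-in for this sketch. */
typedef struct demo_img_t
{
  float             *array;     /* First element of this dataset.    */
  size_t             dsize[2];  /* Number of rows, then columns.     */
  struct demo_img_t *block;     /* NULL: contiguous, owns its array. */
} demo_img_t;

/* Sum the elements of a full image or of a tile within one. */
double
demo_sum2d(const demo_img_t *d)
{
  size_t i, j;
  double sum = 0.0;

  /* Distance (in elements) between the starts of consecutive rows:
     the block's width for a tile, the dataset's own width otherwise. */
  size_t stride = d->block ? d->block->dsize[1] : d->dsize[1];

  assert(d->block == NULL || d->dsize[1] <= d->block->dsize[1]);
  for (i = 0; i < d->dsize[0]; ++i)
    for (j = 0; j < d->dsize[1]; ++j)
      sum += d->array[i * stride + j];
  return sum;
}
```

A single loop from 0 to size-1 over array would run past the end of each tile row into neighboring elements of the block, which is why functions working on tiles have to take the block’s dimensions into account.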


Footnotes

(257)

Also see https://en.wikipedia.org/wiki/Restrict.

(258)

Also see https://en.wikipedia.org/wiki/Row-_and_column-major_order.

(259)

See https://en.wikipedia.org/wiki/Bitwise_operations_in_C.