GNU Astronomy Utilities



12.3.12.1 Text files (txt.h)

The most universal and portable format for data storage is the plain text file. Plain text files can be viewed and edited in any text editor, or even on the command-line. This section describes some functions that help in reading from and writing to plain text files.

Lines are one of the most basic building blocks (delimiters) of a text file. Some operating systems, like Microsoft Windows, terminate their ASCII text lines with a carriage return character followed by a new-line character (two characters, also known as CRLF line terminators), while Unix-like operating systems use just a single new-line character. The functions below that read an ASCII text file are able to identify lines with both kinds of line terminators.
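To make the two kinds of line terminators concrete, here is a minimal sketch of removing either one from the end of a line. The function name strip_line_terminator is hypothetical and is not part of Gnuastro; it only illustrates the idea and is not how the library itself handles terminators.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Gnuastro): remove one trailing line
   terminator in place, either "\n" (Unix-like) or "\r\n" (CRLF).
   Returns the length of the string after the terminator is removed. */
size_t
strip_line_terminator(char *line)
{
  size_t len = strlen(line);

  if (len > 0 && line[len-1] == '\n')
    {
      /* Drop the new-line character. */
      line[--len] = '\0';

      /* In a CRLF file, a carriage return precedes the new-line. */
      if (len > 0 && line[len-1] == '\r')
        line[--len] = '\0';
    }

  return len;
}
```

After this call, a line read from a CRLF file and the same line read from a Unix-like file compare equal with strcmp.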

Gnuastro defines a simple format for metadata of table columns in a plain text file that is discussed in Gnuastro text table format. The functions to get information from, read from and write to plain text files also follow those conventions.

Macro: GAL_TXT_LINESTAT_INVALID
Macro: GAL_TXT_LINESTAT_BLANK
Macro: GAL_TXT_LINESTAT_COMMENT
Macro: GAL_TXT_LINESTAT_DATAROW

Status codes for lines in a plain text file that are returned by gal_txt_line_stat. Lines which have a # character as their first non-white character are considered to be comments. Lines with nothing but white space characters are considered blank. The remaining lines are considered as containing data.

Function:
int
gal_txt_line_stat (char *line)

Check the contents of line and see if it is a blank, comment, or data line. The returned values are the macros that start with GAL_TXT_LINESTAT.
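The classification rules described for the status codes can be sketched as below. This is only an illustration of the rules in the text: sketch_line_stat and the SKETCH_* names are hypothetical, their numeric values are arbitrary, and the "invalid" status is left out because its conditions are not described here. Real programs should call gal_txt_line_stat and compare its result with the GAL_TXT_LINESTAT_* macros.

```c
#include <ctype.h>

/* Hypothetical status codes for this sketch only; the real ones are the
   GAL_TXT_LINESTAT_* macros. */
enum sketch_linestat
{
  SKETCH_LINESTAT_BLANK,
  SKETCH_LINESTAT_COMMENT,
  SKETCH_LINESTAT_DATAROW
};

enum sketch_linestat
sketch_line_stat(const char *line)
{
  /* Skip any leading white space (including a trailing "\r\n" or "\n"
     when the line has nothing else). */
  while (isspace((unsigned char)*line)) ++line;

  /* Nothing but white space: blank line. */
  if (*line == '\0') return SKETCH_LINESTAT_BLANK;

  /* First non-white-space character is '#': comment line. */
  if (*line == '#') return SKETCH_LINESTAT_COMMENT;

  /* Anything else is treated as a row of data. */
  return SKETCH_LINESTAT_DATAROW;
}
```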

Function:
char *
gal_txt_trim_space (char *str)

Trim the white space characters before and after the given string. The operation is done within the allocated space of the string, so if you need the string untouched, please pass an allocated copy of the string to this function. The returned pointer is within the input string. If the input pointer is NULL, or the string only has white-space characters, the returned pointer will be NULL.
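The in-place behavior described above (the returned pointer lies inside the input, and the input is modified) can be sketched as follows. The name sketch_trim_space is hypothetical; this is an illustration of the documented behavior under that assumption, not Gnuastro's implementation.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the behavior described in the text. */
char *
sketch_trim_space(char *str)
{
  char *end;

  if (str == NULL) return NULL;

  /* Move the start forward past any leading white space. */
  while (isspace((unsigned char)*str)) ++str;

  /* Only white space (or an empty string) was given. */
  if (*str == '\0') return NULL;

  /* Walk back from the last character over trailing white space, then
     terminate the string there. This overwrites the caller's buffer. */
  end = str + strlen(str) - 1;
  while (end > str && isspace((unsigned char)*end)) --end;
  end[1] = '\0';

  return str;
}
```

Because the buffer is overwritten, the argument must be writable memory (an array or allocated copy), not a string literal.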

Function:
int
gal_txt_contains_string (char *full, char *match)

Return 1 if the string that match points to can be found exactly (character by character) within the string that full points to. The to-match string can be in any part of the full string. If either of the two strings has zero length or is a NULL pointer, this function will return 0.
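The return values described above, including the NULL and zero-length cases, can be sketched with the standard strstr function. The name sketch_contains_string is hypothetical and this is not necessarily how Gnuastro implements the check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the behavior described in the text. */
int
sketch_contains_string(const char *full, const char *match)
{
  /* NULL pointers and zero-length strings never match. */
  if (full == NULL || match == NULL) return 0;
  if (*full == '\0' || *match == '\0') return 0;

  /* strstr returns a non-NULL pointer when 'match' occurs anywhere
     inside 'full'. */
  return strstr(full, match) != NULL;
}
```

Note that the explicit zero-length check matters: strstr alone treats an empty match string as found.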

Function:
gal_data_t *
gal_txt_table_info (char *filename, gal_list_str_t *lines, size_t *numcols, size_t *numrows)

Store the information of each column in a text file (filename), or list of strings (lines), into an array of data structures with numcols elements (one data structure for each column), see Arrays of datasets. The total number of rows in the table is also put into the memory that numrows points to.

lines is a list of strings with each node representing one line (including the new-line character), see List of strings. It will mostly be the output of gal_txt_stdin_read, which is used to read the program’s input as separate lines from the standard input (see below). Note that filename and lines are mutually exclusive and one of them must be NULL.

This function is just for column information. Therefore it only stores meta-data like column name, units and comments. No actual data (the contents of the columns, for example the array or dsize elements) will be allocated by this function. This is a low-level function particular to reading tables in plain text format. To be generic, it is recommended to use gal_table_info, which will allow getting information from a variety of table formats based on the filename (see Table input output (table.h)).

Function:
gal_data_t *
gal_txt_table_read (char *filename, gal_list_str_t *lines, size_t numrows, gal_data_t *colinfo, gal_list_sizet_t *indexll, size_t minmapsize, int quietmmap)

Read the columns given in the list indexll from a plain text file (filename) or list of strings (lines), into a linked list of data structures (see List of size_t and List of gal_data_t). If the necessary space for each column is larger than minmapsize, do not keep it in the RAM, but in a file on the HDD/SSD. For more on minmapsize and quietmmap, see the description under the same name in Generic data container (gal_data_t).

lines is a list of strings with each node representing one line (including the new-line character), see List of strings. It will mostly be the output of gal_txt_stdin_read, which is used to read the program’s input as separate lines from the standard input (see below). Note that filename and lines are mutually exclusive and one of them must be NULL.

Note that this is a low-level function, so the output list of datasets is in the reverse order of the input linked list of indices. It is recommended to use gal_table_read for generic reading of tables in any format, see Table input output (table.h).
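Since the returned list is in the opposite order of the requested indices, a caller that needs the original order has to reverse one of the two lists. The sketch below shows an in-place reversal on a hypothetical node type (struct sketch_node, which is not a Gnuastro type); in a real program, the list functions described under List of gal_data_t and List of size_t would be the natural tools.

```c
#include <stddef.h>

/* Hypothetical node type for illustration only. */
struct sketch_node
{
  size_t index;
  struct sketch_node *next;
};

/* Reverse a singly linked list in place and return the new head. */
struct sketch_node *
sketch_list_reverse(struct sketch_node *head)
{
  struct sketch_node *prev = NULL, *next;

  while (head != NULL)
    {
      next = head->next;   /* Remember the rest of the list.       */
      head->next = prev;   /* Point this node back to the previous. */
      prev = head;         /* This node is now the reversed head.   */
      head = next;         /* Continue with the rest.               */
    }

  return prev;
}
```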

Function:
gal_data_t *
gal_txt_image_read (char *filename, gal_list_str_t *lines, size_t minmapsize, int quietmmap)

Read the 2D plain text dataset in file (filename) or list of strings (lines) into a dataset and return the dataset. If the necessary space for the image is larger than minmapsize, do not keep it in the RAM, but in a file on the HDD/SSD. For more on minmapsize and quietmmap, see the description under the same name in Generic data container (gal_data_t).

lines is a list of strings with each node representing one line (including the new-line character), see List of strings. It will mostly be the output of gal_txt_stdin_read, which is used to read the program’s input as separate lines from the standard input (see below). Note that filename and lines are mutually exclusive and one of them must be NULL.

Function:
gal_list_str_t *
gal_txt_stdin_read (long timeout_microsec)

Read the complete standard input and return a list of strings with each line (including the new-line character) as one node of that list. If the standard input is already filled (for example, connected to another program’s output with a pipe), then this function will parse the whole stream.

If the standard input is not pre-configured and the first line is typed/written in the terminal before timeout_microsec micro-seconds, it will continue parsing, with no time limit, until it reaches an end-of-file character (CTRL-D after a new-line on the keyboard). If nothing is entered before timeout_microsec micro-seconds, it will return NULL.

All the functions that can read plain text tables will accept a filename as well as a list of strings (intended to be the output of this function, for using the standard input). The reason for keeping the standard input in memory is that once something is read from the standard input, it is hard to put it back. We often need to read a text file several times: once to count how many columns it has and which ones are requested, and another time to read the desired columns. So it is easier to keep it all in allocated memory and pass it on from the start for each round.
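The idea of keeping every line in memory so it can be parsed more than once can be sketched as below. This is only an illustration: struct sketch_line and the sketch_* functions are hypothetical names, the sketch takes any FILE pointer (for example stdin), and it does not implement the timeout behavior described above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical list node: one line (with its new-line) per node. */
struct sketch_line
{
  char *str;
  struct sketch_line *next;
};

/* Read one line of any length, keeping the new-line character.
   Returns NULL at end of input or on allocation failure. */
static char *
sketch_read_one_line(FILE *fp)
{
  int c;
  size_t len = 0, cap = 64;
  char *tmp, *buf = malloc(cap);

  if (buf == NULL) return NULL;
  while ((c = fgetc(fp)) != EOF)
    {
      /* Keep room for this character and the final '\0'. */
      if (len + 2 > cap)
        {
          cap *= 2;
          tmp = realloc(buf, cap);
          if (tmp == NULL) { free(buf); return NULL; }
          buf = tmp;
        }
      buf[len++] = (char)c;
      if (c == '\n') break;
    }
  if (len == 0) { free(buf); return NULL; }
  buf[len] = '\0';
  return buf;
}

/* Read the whole stream into a list, in the same order as the input. */
struct sketch_line *
sketch_read_lines(FILE *fp)
{
  char *line;
  struct sketch_line *head = NULL, **tail = &head, *node;

  while ((line = sketch_read_one_line(fp)) != NULL)
    {
      node = malloc(sizeof *node);
      if (node == NULL) { free(line); break; }
      node->str = line;
      node->next = NULL;
      *tail = node;          /* Append to the end of the list. */
      tail = &node->next;
    }
  return head;
}
```

Once the list exists, it can be walked as many times as needed (for example once to count columns and once to read values), which is not possible with the stream itself.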

Function:
gal_list_str_t *
gal_txt_read_to_list (char *filename)

Read the contents of the given plain-text file and put each word (separated by a SPACE character) into a new node of the output list. The order of nodes in the output is the same as the input. Any new-line character at the end of a word is removed in the output list.

Function:
void
gal_txt_write (gal_data_t *cols, struct gal_fits_list_key_t **keylist, gal_list_str_t *comment, char *filename, uint8_t colinfoinstdout, int tab0_img1, int freekeys)

Write cols in a plain text file filename (table when tab0_img1==0 and image when tab0_img1==1). cols may have one or two dimensions which determines the output:

1D

cols is treated as a column and a list of datasets (see List of gal_data_t): every node in the list is written as one column in a table.

2D

cols is a two-dimensional array and cannot be treated as a list (only one 2D array can currently be written to a text file). So if cols->next!=NULL, the next nodes in the list are ignored and will not be written.

This is a low-level function for tables. It is recommended to use gal_table_write for generic writing of tables in a variety of formats, see Table input output (table.h).

It is possible to add two types of metadata to the printed table: comments and keywords. Each string in the list given to comment will be printed into the file as a separate line, starting with #. Keywords have a more specific and computer-parsable format and are passed through keylist. Each keyword is also printed in one line, but with the format below. Because of the various components in a keyword, it is necessary to use the gal_fits_list_key_t data structure. For more, see FITS header keywords.

# [key] NAME: VALUE / [UNIT] KEYWORD COMMENT.
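As an illustration of the keyword line format shown above, the sketch below builds one such line from its components with snprintf. The function sketch_format_key is hypothetical and all components are plain strings here for simplicity; it also assumes that the period at the end of the template above is sentence punctuation rather than part of the format. In practice, keywords are passed to gal_txt_write through keylist and the library writes these lines itself.

```c
#include <stdio.h>

/* Hypothetical helper: write one keyword line into 'out' following the
   template "# [key] NAME: VALUE / [UNIT] KEYWORD COMMENT".
   Returns the value of snprintf (the length the full line needs). */
int
sketch_format_key(char *out, size_t outsize, const char *name,
                  const char *value, const char *unit,
                  const char *comment)
{
  return snprintf(out, outsize, "# [key] %s: %s / [%s] %s",
                  name, value, unit, comment);
}
```

Since such a line starts with #, a reader that follows the comment rule described earlier in this section treats it as a comment line and not as a row of data.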

If filename already exists, this function will abort with an error and will not write over the existing file. Therefore, check whether the file exists before calling this function. If comment!=NULL, a # will be put at the start of each node of the list of strings and will be written in the file before the column meta-data in filename (see List of strings).
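One portable way to check for an existing file before writing is sketched below with standard C only. The name sketch_file_exists is hypothetical, and the check is approximate: fopen can also fail for a file that exists but is not readable, so a POSIX program may prefer a call like access or stat instead.

```c
#include <stdio.h>

/* Hypothetical helper: return 1 if 'filename' can be opened for
   reading (so it exists), and 0 otherwise. A NULL name (standard
   output in the text above) is reported as not existing. */
int
sketch_file_exists(const char *filename)
{
  FILE *fp;

  if (filename == NULL) return 0;

  fp = fopen(filename, "r");
  if (fp == NULL) return 0;

  fclose(fp);
  return 1;
}
```

A caller could use the result to remove the old file, pick another name, or stop with its own error message before the write is attempted.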

When filename==NULL, the column information will be printed on the standard output (command-line). When colinfoinstdout!=0 and filename==NULL (columns are printed in the standard output), the dataset metadata will also be printed in the standard output. When printing to the standard output, the column information can be piped into another program for further processing, and thus the meta-data (lines starting with a #) must be ignored. In such cases, you can print only the column values by passing 0 to colinfoinstdout.