GNU Astronomy Utilities


Next: , Previous: , Up: Tables   [Contents][Index]


4.5.2 Gnuastro text table format

Plain text files are the most generic, portable, and easiest way to (manually) create, (visually) inspect, or (manually) edit a table. In this format, the ending of a row is defined by the new-line character (a line on a text editor). So when you view it on a text editor, every row will occupy one line. The delimiters (or characters separating the columns) are white space characters (space, horizontal tab, vertical tab) and a comma (,). The only further requirement is that all rows/lines must have the same number of columns.

The columns don’t have to be exactly under each other and the rows can be arbitrarily long with different lengths. For example the following contents in a file would be interpreted as a table with 4 columns and 2 rows, with each element interpreted as a double type (see Numeric data types).

1     2.234948   128   39.8923e8
2 , 4.454        792     72.98348e7

However, the example above has no other information about the columns (it is just raw data, with no meta-data). To use this table, you have to remember what the numbers in each column represent. Also, when you want to select columns, you have to count their position within the table. This can become frustrating and prone to bad errors (getting the columns wrong) especially as the number of columns increase. It is also bad for sending to a colleague, because they will find it hard to remember/use the columns properly.

To solve these problems in Gnuastro’s programs/libraries you aren’t limited to using the column’s number, see Selecting table columns. If the columns have names, units, or comments you can also select your columns based on searches/matches in these fields, for example see Table. Also, in this manner, you can’t guide the program reading the table on how to read the numbers. As an example, the first and third columns above can be read as integer types: the first column might be an ID and the third can be the number of pixels an object occupies in an image. So there is no need to read these to columns as a double type (which takes more memory, and is slower).

In the bare-minimum example above, you also can’t use strings of characters, for example the names of filters, or some other identifier that includes non-numerical characters. In the absence of any information, only numbers can be read robustly. Assuming we read columns with non-numerical characters as string, there would still be the problem that the strings might contain space (or any delimiter) character for some rows. So, each ‘word’ in the string will be interpreted as a column and the program will abort with an error that the rows don’t have the same number of columns.

To correct for these limitations, Gnuastro defines the following convention for storing the table meta-data along with the raw data in one plain text file. The format is primarily designed for ease of reading/writing by eye/fingers, but is also structured enough to be read by a program.

When the first non-white character in a line is #, or there are no non-white characters in it, then the line will not be considered as a row of data in the table (this is a pretty standard convention in many programs, and higher level languages). In the former case, the line is interpreted as a comment. If the comment line starts with ‘# Column N:’, then it is assumed to contain information about column N (a number, counting from 1). Comment lines that don’t start with this pattern are ignored and you can use them to include any further information you want to store with the table in the text file. A column information comment is assumed to have the following format:

# Column N: NAME [UNIT, TYPE, BLANK] COMMENT

Any sequence of characters between ‘:’ and ‘[’ will be interpreted as the column name (so it can contain anything except the ‘[’ character). Anything between the ‘]’ and the end of the line is defined as a comment. Within the brackets, anything before the first ‘,’ is the units (physical units, for example km/s, or erg/s), anything before the second ‘,’ is the short type identifier (see below, and Numeric data types). Finally (still within the brackets), any non-white characters after the second ‘,’ are interpreted as the blank value for that column (see Blank pixels). Note that blank values will be stored in the same type as the column, not as a string52.

When a formatting problem occurs (for example you have specified the wrong type code, see below), or the the column was already given meta-data in a previous comment, or the column number is larger than the actual number of columns in the table (the non-commented or empty lines), then the comment information line will be ignored.

When a comment information line can be used, the leading and trailing white space characters will be stripped from all of the elements. For example in this line:

# Column 5:  column name   [km/s,    f32,-99] Redshift as speed

The NAME field will be ‘column name’ and the TYPE field will be ‘f32’. Note how all the white space characters before and after strings are not used, but those in the middle remained. Also, white space characters aren’t mandatory. Hence, in the example above, the BLANK field will be given the value of ‘-99’.

Except for the column number (N), the rest of the fields are optional. Also, the column information comments don’t have to be in order. In other words, the information for column \(N+m\) (\(m>0\)) can be given in a line before column \(N\). Also, you don’t have to specify information for all columns. Those columns that don’t have this information will be interpreted with the default settings (like the case above: values are double precision floating point, and the column has no name, unit, or comment). So these lines are all acceptable for any table (the first one, with nothing but the column number is redundant):

# Column 5:
# Column 1: ID [,i] The Clump ID.
# Column 3: mag_f160w [AB mag, f] Magnitude from the F160W filter

The data type of the column should be specified with one of the following values:

Note that the FITS binary table standard does not define the unsigned int and unsigned long types, so if you want to convert your tables to FITS binary tables, use other types. Also, note that in the FITS ASCII table, there is only one integer type (long). So if you convert a Gnuastro plain text table to a FITS ASCII table with the Table program, the type information for integers will be lost. Conversely if integer types are important for you, you have to manually set them when reading a FITS ASCII table (for example with the Table program when reading/converting into a file, or with the gnuastro/table.h library functions when reading into memory).


Footnotes

(52)

For floating point types, the nan, or inf strings (both not case-sensitive) refer to IEEE NaN (not a number) and infinity values respectively and will be stored as a floating point, so they are acceptable.


Next: , Previous: , Up: Tables   [Contents][Index]