Plain text files are the most generic and portable format, and the easiest way to (manually) create, (visually) inspect, or (manually) edit a table. In this format, the end of a row is defined by the new-line character, so when you view the file in a text editor, every row occupies one line. The delimiters (the characters separating the columns) are the white space characters (space, horizontal tab, vertical tab) and the comma (,). The only further requirement is that all rows/lines must have the same number of columns.
The columns don’t have to be exactly under each other and the rows can be
arbitrarily long with different lengths. For example the following contents
in a file would be interpreted as a table with 4 columns and 2 rows, with
each element interpreted as a
double type (see Numeric data types).
1     2.234948   128   39.8923e8
2 , 4.454        792     72.98348e7
However, the example above has no other information about the columns (it is just raw data, with no meta-data). To use this table, you have to remember what the numbers in each column represent. Also, when you want to select columns, you have to count their position within the table. This can become frustrating and error-prone (getting the columns wrong), especially as the number of columns increases. It is also a poor format for sending to a colleague, because they will find it hard to remember/use the columns properly.
To solve these problems in Gnuastro’s programs/libraries you aren’t limited
to using the column’s number, see Selecting table columns. If the
columns have names, units, or comments you can also select your columns
based on searches/matches in these fields, for example see Table.
Also, with a bare table like this, you can’t guide the program reading the
table on how to read the numbers. As an example, the first and third columns
above can be read as integer types: the first column might be an ID and the
third can be the number of pixels an object occupies in an image. So there
is no need to read these two columns as a double type (which takes more
memory, and is slower).
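To put rough numbers on the memory argument, the sketch below uses Python's `struct` module to compare the storage needed per value for a double with that of smaller integer types. The format codes are Python's own, not Gnuastro's type identifiers, and the helper name `column_bytes` is hypothetical.

```python
import struct

# With the "<" prefix, struct uses standard sizes that do not depend on
# the platform, so these numbers are the same everywhere.
SIZES = {
    "float64 (double)": struct.calcsize("<d"),
    "uint8": struct.calcsize("<B"),
    "uint16": struct.calcsize("<H"),
    "int32": struct.calcsize("<i"),
}


def column_bytes(type_name, nrows):
    """Bytes needed to hold `nrows` values of the given type."""
    return SIZES[type_name] * nrows


if __name__ == "__main__":
    nrows = 1_000_000
    for name in SIZES:
        print(f"{name:>18}: {column_bytes(name, nrows):>9} bytes")
```

For an ID or pixel-count column, any integer type wide enough for the largest value would do; which one is appropriate depends on the data.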
In the bare-minimum example above, you also can’t use strings of characters, for example the names of filters, or some other identifier that includes non-numerical characters. In the absence of any information, only numbers can be read robustly. Even if we read columns with non-numerical characters as strings, there would still be the problem that the strings might contain a space (or any other delimiter) character in some rows. Each ‘word’ in the string would then be interpreted as a column and the program would abort with an error that the rows don’t have the same number of columns.
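The splitting rule and its failure mode can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not Gnuastro's reader: the function names are hypothetical, and treating a run of consecutive delimiters as a single separator is an assumption.

```python
import re

# Delimiters named in the text: space, horizontal tab, vertical tab, comma.
# Assumption: a run of consecutive delimiters counts as one separator.
DELIMITERS = re.compile(r"[ \t\v,]+")


def split_row(line):
    """Split one line into tokens on runs of delimiter characters."""
    return [tok for tok in DELIMITERS.split(line.strip(" \t\v,\r")) if tok]


def split_rows(text):
    """Tokenize every non-empty line, requiring equal column counts."""
    rows = []
    # Split on "\n" only: str.splitlines() would also break on vertical tabs.
    for lineno, line in enumerate(text.split("\n"), start=1):
        tokens = split_row(line)
        if not tokens:
            continue  # empty line, not a data row
        if rows and len(tokens) != len(rows[0]):
            raise ValueError(
                f"line {lineno}: {len(tokens)} columns, expected {len(rows[0])}"
            )
        rows.append(tokens)
    return rows


def to_doubles(rows):
    """Default interpretation: every element is a double precision float."""
    return [[float(tok) for tok in row] for row in rows]


if __name__ == "__main__":
    raw = "1     2.234948   128   39.8923e8\n2 , 4.454        792     72.98348e7\n"
    print(to_doubles(split_rows(raw)))

    # A string containing a space turns into two tokens, so the rows no
    # longer have the same number of columns.
    with_names = "1 F160W 24.5\n2 F814W wide 23.9\n"
    try:
        split_rows(with_names)
    except ValueError as err:
        print("rejected:", err)
```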
To correct for these limitations, Gnuastro defines the following convention for storing the table meta-data along with the raw data in one plain text file. The format is primarily designed for ease of reading/writing by eye/fingers, but is also structured enough to be read by a program.
When the first non-white character in a line is #, or there are no
non-white characters in it, then the line will not be considered as a row
of data in the table (this is a pretty standard convention in many
programs, and higher level languages). In the former case, the line is
interpreted as a comment. If the comment line starts with ‘# Column N:’,
then it is assumed to contain information about column N (a number,
counting from 1). Comment lines that don’t start with
this pattern are ignored and you can use them to include any further
information you want to store with the table in the text file. A column
information comment is assumed to have the following format:
# Column N: NAME [UNIT, TYPE, BLANK] COMMENT
Any sequence of characters between ‘:’ and ‘[’ will be interpreted as the column name (so it can contain anything except the ‘[’ character). Anything between the ‘]’ and the end of the line is defined as a comment. Within the brackets, anything before the first ‘,’ is the units (physical units, for example km/s, or erg/s), anything before the second ‘,’ is the short type identifier (see below, and Numeric data types). Finally (still within the brackets), any non-white characters after the second ‘,’ are interpreted as the blank value for that column (see Blank pixels). Note that blank values will be stored in the same type as the column, not as a string.
When a formatting problem occurs (for example you have specified the wrong type code, see below), or the column was already given meta-data in a previous comment, or the column number is larger than the actual number of columns in the table (the non-commented or empty lines), then the comment information line will be ignored.
When a comment information line can be used, the leading and trailing white space characters will be stripped from all of the elements. For example in this line:
# Column 5: column name [km/s, f32,-99] Redshift as speed
The NAME field will be ‘column name’ and the TYPE field will be ‘f32’. Note
how the white space characters before and after the strings are not used,
but those in the middle remain. Also, white space characters aren’t
mandatory. Hence, in the example above, the BLANK field will be given the
value of ‘-99’.
Except for the column number (N), the rest of the fields are
optional. Also, the column information comments don’t have to be in
order. In other words, the information for column \(N+m\)
(\(m>0\)) can be given in a line before column \(N\). Also, you
don’t have to specify information for all columns. Those columns that don’t
have this information will be interpreted with the default settings (like
the case above: values are double precision floating point, and the column
has no name, unit, or comment). So these lines are all acceptable for any
table (the first one, with nothing but the column number is redundant):
# Column 5:
# Column 1: ID [,i] The Clump ID.
# Column 3: mag_f160w [AB mag, f] Magnitude from the F160W filter
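As a rough illustration of how such column information comments break down into fields, here is a minimal Python sketch. It is not Gnuastro's parser: the regular expression, the function name `parse_column_info`, and the tolerance for extra white space around ‘#’ and ‘Column’ are all assumptions made for this example. It does not validate the type code or apply the rules for ignoring a line.

```python
import re

# "# Column N: NAME [UNIT, TYPE, BLANK] COMMENT"
# Only the column number is required; the bracket part and the comment
# are optional. The name may contain anything except "[".
INFO_LINE = re.compile(
    r"^\s*#\s*Column\s+(?P<num>\d+)\s*:"
    r"(?P<name>[^\[]*)"
    r"(?:\[(?P<bracket>[^\]]*)\](?P<comment>.*))?$"
)


def parse_column_info(line):
    """Return the fields of a column information comment, or None."""
    m = INFO_LINE.match(line.rstrip("\n"))
    if m is None:
        return None  # an ordinary comment, a data line, or a malformed line
    # Inside the brackets: unit before the 1st comma, type before the 2nd,
    # blank value after it. Missing fields become empty strings.
    fields = (m.group("bracket") or "").split(",", 2)
    fields += [""] * (3 - len(fields))
    unit, type_code, blank = (f.strip() for f in fields)
    return {
        "number": int(m.group("num")),
        "name": m.group("name").strip(),
        "unit": unit,
        "type": type_code,
        "blank": blank,
        "comment": (m.group("comment") or "").strip(),
    }


if __name__ == "__main__":
    for text in (
        "# Column 5: column name [km/s, f32,-99] Redshift as speed",
        "# Column 5:",
        "# Column 1: ID [,i] The Clump ID.",
        "# Just an ordinary comment",
    ):
        print(parse_column_info(text))
```

Leading and trailing white space is stripped from each field while interior white space is kept, as in the ‘column name’ example earlier.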
The data type of the column should be specified with one of the following values:
‘strN’: for strings. The N value identifies the length of the string (how many characters it has). The start of the string on each row is the first non-delimiter character of the column that has the string type. The next N characters will be interpreted as a string and all leading and trailing white space will be removed.
If the next column’s characters are closer than N characters to the
start of the string column in that line/row, they will be considered part
of the string column. If there is a new-line character before the ending of
the space given to the string column (in other words, the string column is
the last column), then reading of the string will stop, even if the
N characters are not complete yet. See tests/table/table.txt
for one example. Therefore, the only time you have to pay attention to the
positioning and spaces given to the string column is when it is not the
last column in the table.
The only limitation in this format is that trailing and leading white space characters will be removed from the columns that are read. In most cases, this is the desired behavior, but if trailing and leading white space is critically important to your analysis, define your own starting and ending characters and remove them after the table has been read. For example in the sample table below, the two ‘|’ characters (which are arbitrary) will remain in the value of the second column and you can remove them manually later. If only the leading or only the trailing white space is important for your work, you can use just one of the ‘|’s.
# Column 1: ID [label, uc]
# Column 2: Notes [no unit, str50]
1    leading and trailing white space is ignored here    2.3442e10
2   |         but they will be preserved here        |   8.2964e11
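The reading rule for a string column can be sketched as follows, using rows laid out like the sample table above. This is a simplified illustration, not Gnuastro's implementation: the function name `read_id_and_string` is hypothetical, and it assumes the string column is the second column and that its width is already known from the ‘strN’ type.

```python
DELIMS = " \t\v,"


def read_id_and_string(line, width):
    """Split a row whose second column has the string type `str<width>`.

    Returns (first_token, string_value, remaining_text).
    """
    line = line.rstrip("\n")
    i = 0
    # Skip leading delimiters, then read the first (non-string) token.
    while i < len(line) and line[i] in DELIMS:
        i += 1
    start = i
    while i < len(line) and line[i] not in DELIMS:
        i += 1
    first = line[start:i]
    # The string starts at the next non-delimiter character ...
    while i < len(line) and line[i] in DELIMS:
        i += 1
    # ... and covers the next `width` characters, or fewer when the line
    # ends first (the string column is then the last column).
    raw = line[i:i + width]
    rest = line[i + width:].strip(DELIMS)
    # Leading and trailing white space of the string is removed.
    return first, raw.strip(), rest


if __name__ == "__main__":
    rows = [
        "1    leading and trailing white space is ignored here    2.3442e10",
        "2   |         but they will be preserved here        |   8.2964e11",
    ]
    for row in rows:
        print(read_id_and_string(row, 50))
```

In the second row the two ‘|’ characters are part of the 50 characters that are read, so the white space between them and the text survives the stripping.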
Note that the FITS binary table standard does not define the
unsigned long types, so if you want to convert your tables
to FITS binary tables, use other types. Also, note that in the FITS ASCII
table, there is only one integer type (long). So if you convert a
Gnuastro plain text table to a FITS ASCII table with the Table
program, the type information for integers will be lost. Conversely if
integer types are important for you, you have to manually set them when
reading a FITS ASCII table (for example with the Table program when
reading/converting into a file, or with the gnuastro/table.h library
functions when reading into memory).
For floating point types, the ‘nan’ and ‘inf’ strings (both not
case-sensitive) refer to IEEE NaN (not a number) and infinity values
respectively and will be stored as a floating point, so they are
acceptable.
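Python's built-in `float()` follows the same convention for these special strings, so it can serve as a quick way to see how they behave once stored as floating point values. The sketch below is only an analogy to the behavior described here, not Gnuastro code, and the function names are hypothetical.

```python
import math


def parse_float_column(tokens):
    """Convert text tokens to floats.

    Python's float() accepts 'nan' and 'inf' in any letter case.
    """
    return [float(tok) for tok in tokens]


def count_finite(values):
    """Count the values that are neither NaN nor infinite."""
    return sum(1 for v in values if math.isfinite(v))


if __name__ == "__main__":
    values = parse_float_column(["1.5", "NaN", "inf", "-INF", "nan"])
    print(values)
    print("finite values:", count_finite(values))
    # NaN never compares equal to anything, including itself, so test for
    # it with math.isnan() rather than with ==.
    print(values[1] == values[4], math.isnan(values[1]))
```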