Vector columns (GNU Astronomy Utilities)

5.3.2 Vector columns

In the most common format, each column of a table only has a single value in each row. For example, we usually have one column for the magnitude, another column for the RA (Right Ascension) and yet another column for the DEC (Declination) of a set of galaxies/stars (where each galaxy is represented by one row in the table). This common single-valued column format is sufficient in many scenarios. However, in some situations (like those below) it helps to have multiple values in each row of a column, not just one.

Conceptually: the various numbers are “connected” to each other. In other words, their order and position in relation to each other matter. Common examples in astronomy are the radial profile or the spectrum of each galaxy in your catalog. For example, each MUSE¹⁴⁵ spectrum has 3681 points (with a sampling of 1.25 Angstroms).
Handling this many measurements as separate columns in your table is tedious and error-prone: it is easy to forget to carry some of them into an output table for further analysis, to accidentally change their order, or to apply an operation to only a subset of them.
Technically: in the FITS standard, you can only store a maximum of 999 columns in a FITS table. Therefore, if you have more than 999 data points for each galaxy (like the MUSE spectra example above), it is impossible to store each point as a separate column in one table.
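
The arithmetic behind the two points above can be checked with a few lines of Python. This is an illustrative sketch, not part of Gnuastro: the 3681 points and the 1.25 Angstrom sampling are taken from the text, and the 999-column limit from the FITS standard.

```python
import math

MAX_FITS_COLUMNS = 999     # FITS limit on the number of columns per table
MUSE_POINTS = 3681         # points in one MUSE spectrum (from the text)
SAMPLING_ANGSTROM = 1.25   # spacing between adjacent points

# N evenly spaced points cover (N - 1) * spacing from first to last point.
span = (MUSE_POINTS - 1) * SAMPLING_ANGSTROM

# Minimum number of tables needed if every point were its own column
# (ignoring any other columns like RA, DEC or magnitude).
tables_needed = math.ceil(MUSE_POINTS / MAX_FITS_COLUMNS)

print(f"wavelength span: {span} Angstroms")          # 4600.0 Angstroms
print(f"tables with one column per point: {tables_needed}")  # 4
```

So one spectrum spans 4600 Angstroms, and splitting it into single-valued columns would need at least four separate FITS tables before adding any other column.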

To address these problems, the FITS standard has defined the concept of “vector” columns in its Binary table format (ASCII FITS tables don’t support vector columns, but Gnuastro’s plain-text format does, as described here). Within each row of a single vector column, we can store any number of data points (like the MUSE spectra above or the full radial profile of each galaxy). All the values in a vector column must have the same numeric data type (see Numeric data types), and the number of elements in a vector column is the same for all rows.
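
One way to picture these two rules is a small, hypothetical Python class (not part of Gnuastro, which is written in C) that refuses rows of a different length or element type. The ‘f32(2)’ label it prints mimics the notation Table uses in its metadata, as seen later in this section; the class and its method names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VectorColumn:
    """Hypothetical model of a vector column (not Gnuastro's API)."""
    name: str
    dtype: type              # Python type that every element must have
    label: str = "f32"       # type name to print, as in Table's metadata
    rows: list = field(default_factory=list)

    def append(self, values):
        values = list(values)
        # Rule 1: every row has as many elements as the first row.
        if self.rows and len(values) != len(self.rows[0]):
            raise ValueError(
                f"expected {len(self.rows[0])} elements, got {len(values)}")
        # Rule 2: every element has the same type.
        if not all(isinstance(v, self.dtype) for v in values):
            raise TypeError(f"all elements must be {self.dtype.__name__}")
        self.rows.append(values)

    def type_string(self):
        # Mimic Table's notation: type label, then elements per row.
        width = len(self.rows[0]) if self.rows else 0
        return f"{self.label}({width})"


col = VectorColumn("vector", float)
col.append([5.535, -4.464])
col.append([-2.011, 15.397])
print(col.type_string())    # f32(2)
```

Appending [1.0] afterwards raises a ValueError, because that row would have one element instead of two.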

By grouping conceptually similar data points (like a spectrum) in one vector column, we can significantly reduce the number of columns and make the table much more manageable, without losing any information! To demonstrate the vector column features of Gnuastro’s Table program, let’s start with a small, randomly generated catalog (5 rows and 3 columns). This will allow us to show the outputs of each step here, but you can apply the same concepts to vectors of any length.

With the command below, we use seq to generate a single-column table that is piped to Gnuastro’s Table program. Table then uses column arithmetic to generate three columns from that column by adding random noise with standard deviations of 2, 5 and 10 respectively (for more, see Column arithmetic). Finally, we add metadata to each column, giving each a different name (using names is always the best way to work with columns):

$ seq 1 5 \
      | asttable -c'arith $1 2  mknoise-sigma f32' \
                 -c'arith $1 5  mknoise-sigma f32' \
                 -c'arith $1 10 mknoise-sigma f32' \
                 --colmetadata=1,abc,none,"First column." \
                 --colmetadata=2,def,none,"Second column." \
                 --colmetadata=3,ghi,none,"Third column." \
                 --output=table.fits
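
For readers without Gnuastro at hand, the command above can be mimicked with Python's standard library. This is only a sketch under two assumptions: that mknoise-sigma adds Gaussian noise of the given standard deviation to each value, and that Python's random.gauss with a fixed seed is an acceptable stand-in for Gnuastro's random number generator. The numbers will therefore not match Table's, and Python floats are 64-bit rather than f32.

```python
import random

random.seed(1)  # fixed seed: the same numbers on every run

# Column name -> standard deviation of the noise added to it.
sigmas = {"abc": 2, "def": 5, "ghi": 10}

# One row per input value (like `seq 1 5`); each column adds its own noise.
table = [
    {name: value + random.gauss(0, sigma) for name, sigma in sigmas.items()}
    for value in range(1, 6)
]

for row in table:
    print("  ".join(f"{row[name]:8.3f}" for name in sigmas))
```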

With the command below, let’s have a look at the table. When you run it, your random number generator seed will be different, so your numbers will differ slightly from those shown here; for making reproducible random numbers, see Generating random numbers. The -Y option prints floating point numbers in a more readable format (without it, they are written in scientific notation; for more, see Printing floating point numbers), and -O asks Table to also print the metadata. For more on Table’s options, see Invoking Table, and for how short options can be merged (such that -Y -O is identical to -YO), see Options.

$ asttable table.fits -YO
# Column 1: abc [none,f32,] First column.
# Column 2: def [none,f32,] Second column.
# Column 3: ghi [none,f32,] Third column.
1.074           5.535         -4.464
0.606          -2.011          15.397
1.475           1.811          5.687
2.248           7.663         -7.789
6.355           17.374         6.767

We see that indeed, it has three columns, with our given names. Now, let’s assume that you want to make a two-element vector column from the values in the def and ghi columns. To do that, you can use the --tovector option like below. As the name suggests, --tovector will merge the rows of the two columns into one vector column with multiple values in each row.

$ asttable table.fits -YO --tovector=def,ghi
# Column 1: abc        [none,f32   ,] First column.
# Column 2: def-VECTOR [none,f32(2),] Vector by merging multiple cols.
1.074           5.535         -4.464
0.606          -2.011          15.397
1.475           1.811          5.687
2.248           7.663         -7.789
6.355           17.374         6.767

If you ignore the metadata, this doesn’t seem to have changed anything! Each line of numbers still has three “tokens” (we use this word to distinguish them from “columns”). But once you look at the metadata, you only see metadata for two columns, not three. Looking closely, the numeric data type of the new second column is ‘f32(2)’ (look above; previously it was f32). The (2) shows that the second column contains two numbers/tokens in each row, not one. If your vector column consisted of 3681 numbers, this would be f32(3681). Looking again at the metadata, we see that --tovector has also created a new name and comment for the new column. This is always done, to avoid confusion with the old columns.
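
Since the plain-text output separates all values with spaces, the ‘# Column’ metadata lines are what tell a reader how the tokens group into columns. The hypothetical Python helper below is not a Gnuastro tool. It assumes every type string looks like ‘f32’ or ‘f32(2)’, reads the element count from each type, and regroups the first printed line of the output above.

```python
import re


def tokens_per_column(type_string):
    """Tokens one column occupies on each printed line ('f32(2)' -> 2)."""
    match = re.fullmatch(r"[a-z]+\d*(?:\((\d+)\))?", type_string)
    if match is None:
        raise ValueError(f"unrecognized type string: {type_string!r}")
    # No '(N)' suffix means a single-valued column.
    return int(match.group(1)) if match.group(1) else 1


# Types from the '# Column' lines of the --tovector output above.
widths = [tokens_per_column(t) for t in ["f32", "f32(2)"]]

# Regroup the first printed line into its columns.
tokens = "1.074           5.535         -4.464".split()
columns, start = [], 0
for width in widths:
    columns.append(tokens[start:start + width])
    start += width

print(len(tokens), len(columns))   # 3 2
print(columns)                     # [['1.074'], ['5.535', '-4.464']]
```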

Let’s confirm that the newly added column is indeed a single column with two values. To do this, with the first command below, we write the output into a FITS table and, in the same command, give the new merged/vector column a more suitable name. The second command gives a first confirmation by printing the table’s metadata:

$ asttable table.fits -YO --tovector=def,ghi --output=vec.fits \
           --colmetadata=2,vector,nounits,"New vector column."

$ asttable vec.fits -i
--------
vec.fits (hdu: 1)
-------    -----    ----        -------
No.Name    Units    Type        Comment
-------    -----    ----        -------
1  abc     none     float32     First column.
2  vector  nounits  float32(2)  New vector column.
--------
Number of rows: 5
--------

A more robust confirmation would be to print the values in the newly added vector column. As expected, asking for a single column with --column (or -c) gives us two numbers per row/line (instead of one!).

$ asttable vec.fits -c vector -YO
# Column 1: vector [nounits,f32(2),] New vector column.
 5.535         -4.464
-2.011          15.397
 1.811          5.687
 7.663         -7.789
 17.374         6.767

If you want to keep the original single-valued columns that went into the vector column, you can use the --keepvectfin option (read it as “KEEP VECtor To/From Inputs”):

$ asttable table.fits -YO --tovector=def,ghi --keepvectfin \
           --colmetadata=4,vector,nounits,"New vector column."
# Column 1: abc    [none   ,f32   ,] First column.
# Column 2: def    [none   ,f32   ,] Second column.
# Column 3: ghi    [none   ,f32   ,] Third column.
# Column 4: vector [nounits,f32(2),] New vector column.
1.074           5.535         -4.464          5.535         -4.464
0.606          -2.011          15.397        -2.011          15.397
1.475           1.811          5.687          1.811          5.687
2.248           7.663         -7.789          7.663         -7.789
6.355           17.374         6.767          17.374         6.767
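
The bookkeeping of --tovector, with and without --keepvectfin, can be modeled in a few lines of Python. This is a hypothetical sketch, not Gnuastro's implementation: the function name is invented, and it assumes (as the two outputs above suggest) that the new column is named after the first input with a ‘-VECTOR’ suffix and appended after the remaining columns.

```python
def to_vector(columns, names, keep=False):
    """Hypothetical model of Table's --tovector (not Gnuastro's code).

    `columns` is a list of (name, values) pairs. The named columns are
    merged row by row into one vector column, which is appended at the
    end; the inputs are dropped unless `keep` is True (like --keepvectfin).
    """
    lookup = dict(columns)
    # zip() pairs the i-th value of every input column: one list per row.
    merged = [list(row) for row in zip(*(lookup[n] for n in names))]
    if keep:
        remaining = list(columns)
    else:
        remaining = [(n, v) for n, v in columns if n not in names]
    return remaining + [(names[0] + "-VECTOR", merged)]


table = [
    ("abc", [1.074, 0.606]),
    ("def", [5.535, -2.011]),
    ("ghi", [-4.464, 15.397]),
]

print([n for n, _ in to_vector(table, ["def", "ghi"])])
# ['abc', 'def-VECTOR']
print([n for n, _ in to_vector(table, ["def", "ghi"], keep=True)])
# ['abc', 'def', 'ghi', 'def-VECTOR']
```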

Now that you know how to create vector columns, let’s assume you have the inverse scenario: you want to extract one of the values of a vector column into a separate single-valued column. To do this, you can use the --fromvector option. It takes the name (or counter) of a vector column, followed by any number of integer counters (counting from 1), and extracts those elements into separate single-valued columns. For example, let’s assume you want to extract the second element of the vector column in the vec.fits file you made before:

$ asttable vec.fits --fromvector=vector,2 -YO
# Column 1: abc      [none   ,f32,] First column.
# Column 2: vector-2 [nounits,f32,] New vector column.
1.074          -4.464
0.606           15.397
1.475           5.687
2.248          -7.789
6.355           6.767
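
The extraction itself can be sketched in Python. This is hypothetical code, not Gnuastro's. The ‘vector-2’ naming follows the output above. The note that counters start from 1 comes from the text. Placing the new columns at the end, including when the input is kept, is an assumption.

```python
def from_vector(columns, name, indices, keep=False):
    """Hypothetical model of Table's --fromvector (not Gnuastro's code).

    Extracts the given 1-based elements of vector column `name` into new
    single-valued columns called '<name>-<index>', appended at the end.
    """
    vector = dict(columns)[name]
    # Counters start from 1 (as in --fromvector); Python lists from 0.
    new = [(f"{name}-{i}", [row[i - 1] for row in vector]) for i in indices]
    remaining = list(columns) if keep else [(n, v) for n, v in columns
                                            if n != name]
    return remaining + new


table = [
    ("abc", [1.074, 0.606]),
    ("vector", [[5.535, -4.464], [-2.011, 15.397]]),
]
result = from_vector(table, "vector", [2])
print(result)
# [('abc', [1.074, 0.606]), ('vector-2', [-4.464, 15.397])]
```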

Just like the case with --tovector above, if you want to keep the input vector column, use --keepvectfin. This feature is useful in scenarios where you want to select some rows based on one (or more) elements of the vector column.
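
Such a row selection boils down to the logic below. It is written in plain Python rather than with any Table option, as an illustrative sketch that uses the values printed above. In Table you would first extract the element with --fromvector, as shown above, and then select rows on that new single-valued column.

```python
# Values copied from the 'abc' and 'vector' columns shown above.
abc = [1.074, 0.606, 1.475, 2.248, 6.355]
vector = [
    [5.535, -4.464],
    [-2.011, 15.397],
    [1.811, 5.687],
    [7.663, -7.789],
    [17.374, 6.767],
]

# Row indices whose second element (counting from 1, as in --fromvector)
# is positive; Python lists count from 0, hence row[2 - 1].
selected = [i for i, row in enumerate(vector) if row[2 - 1] > 0]

print(selected)                       # [1, 2, 4]
print([abc[i] for i in selected])     # [0.606, 1.475, 6.355]
```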

Vector columns and FITS ASCII tables: As mentioned above, the FITS standard only recognizes vector columns in its Binary table format (the default FITS table format in Gnuastro). You can still use the --tableformat=fits-ascii option to write your tables in the FITS ASCII format (see Input/Output options). In this case, if a vector column is present, it will be written as separate single-element columns to avoid losing information (as if you had called --fromvector on all the elements of the vector column). A warning is printed when this happens.
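
The fallback described above can be modeled as follows. This is a hypothetical Python sketch, not what Table actually writes. Both the ‘<name>-<i>’ column names (borrowed from the --fromvector output above) and the warning text are assumptions. Each vector column is replaced, in place, by one single-valued column per element.

```python
import warnings


def expand_vectors(columns):
    """Hypothetical model of the FITS ASCII fallback (not Gnuastro's code).

    Each vector column is replaced, in place, by one single-valued column
    per element; the '<name>-<i>' names are an assumed convention.
    """
    out = []
    for name, values in columns:
        if values and isinstance(values[0], list):
            width = len(values[0])
            warnings.warn(f"'{name}' written as {width} separate columns")
            for i in range(1, width + 1):
                out.append((f"{name}-{i}", [row[i - 1] for row in values]))
        else:
            out.append((name, values))
    return out


table = [
    ("abc", [1.074, 0.606]),
    ("vector", [[5.535, -4.464], [-2.011, 15.397]]),
]
expanded = expand_vectors(table)
print([name for name, _ in expanded])   # ['abc', 'vector-1', 'vector-2']
```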

For an application of the vector column concepts introduced here on MUSE data, see the 3D data cube tutorial and in particular these two sections: 3D measurements and spectra and Extracting a single spectrum and plotting it.

Footnotes

(145)