GNU Astronomy Utilities



5.3.4 Operation precedence in Table

The Table program can do many operations on the rows and columns of the input tables, and they are not always applied in the order you call them on the command-line. This section describes which operation is done before or after which other operation. Knowing this precedence is important to avoid confusion when you ask for more than one operation. For a description of each option, please see Invoking Table. By default, column-based operations will be done first. You can ask for row-based operations to be done first with the --rowfirst option.

Pipes for different precedence: It may happen that your desired series of operations cannot be done with the precedence mentioned below (in one command). In this case, you can pipe the output of one call to asttable to another asttable. Just don’t forget to give -O (or --colinfoinstdout) to the first instance (so the column metadata are also passed to the next instance). Without metadata, all numbers will be read as double-precision (see Gnuastro text table format; recall that piping is done in plain text format), vector columns will be broken into single-valued columns, and column names, units and comments will be lost. At the end of this section, there is an example of doing this.

Input table information

The first set of operations that will be performed (if requested) is the printing of the input table information. When the following options are called, the column data are not read at all: Table simply reads the main input’s column metadata (name, units, numeric data type and comments) and the number of rows, and prints them. Table then terminates and no other operation is done. These options can therefore be added to the end of an arbitrarily long Table command when you have forgotten some information about the input table. You can then delete these options and continue writing the command (using the shell’s history to retrieve the previous command with an up-arrow key).

Only a single one of the options in this category may be called at any time. The order of checking for these options is therefore important: it is the same order in which they are described below:

Column and row information (--information or -i)

Print the list of input columns, with the metadata of each column (name, numeric data type, units and comments) on a separate row of the output. Finally, print the number of rows.

Number of columns (--info-num-cols)

Print the number of columns in the input table. Only a single integer (number of columns) is printed before Table terminates.

Number of rows (--info-num-rows)

Print the number of rows in the input table. Only a single integer (number of rows) is printed before Table terminates.
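The three information options above can be tried on a small table. The following is only a sketch: the file name demo-info.txt and its contents are made up for illustration, the header lines assume the Gnuastro text table format, and the commands are skipped when asttable is not installed.

```shell
# Build a small plain-text table (hypothetical name and contents).
cat > demo-info.txt <<'EOF'
# Column 1: ID  [counter, u8] Object identifier
# Column 2: RA  [deg, f64]    Right ascension
# Column 3: DEC [deg, f64]    Declination
1  10.10  -5.20
2  10.25  -5.31
3  10.40  -5.05
EOF

if command -v asttable >/dev/null 2>&1; then
    # Column metadata followed by the number of rows.
    asttable demo-info.txt --information
    # A single integer each, then Table terminates.
    asttable demo-info.txt --info-num-cols
    asttable demo-info.txt --info-num-rows
    # Added to the end of a longer command: according to the
    # precedence above, only the information is printed and the
    # other options are not applied.
    asttable demo-info.txt --sort=RA --head=2 --information
else
    echo "asttable not found; skipping the demonstration"
fi
```

Removing --information from the last command and running it again continues with the sorting and row selection.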

Column selection (--column)

When this option is given, only the columns given to this option (from the main input) will be used for all future steps. When --column (or -c) is not given, then all the main input’s columns will be used in the next steps.

Column-based operations

By default the following column-based operations will be done before the row-based operations in the next item. If you need to give precedence to row-based operations, use --rowfirst.

Column(s) from other file(s): --catcolumnfile

When column concatenation (addition) is requested, columns from other tables (in other files, or other HDUs of the same FITS file) will be added after the existing columns are read from the main input. In one command, you can call --catcolumnfile multiple times to allow addition of columns from many files.

Therefore you can merge the columns of various tables into one table in this step (at the start), then start adding/limiting the rows, or building vector columns. If any of the row-based operations below are requested in the same asttable command, they will also be applied to the rows of the added columns. However, the conditions to keep/reject rows can only be applied to the rows of the columns in the main input table (not the columns that are added with these options).
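As a hedged illustration of this step, the sketch below concatenates the columns of two small tables and limits the rows in the same command. The file names, column names and values are hypothetical, and the commands are skipped when asttable is not installed. The selection column (MAG) comes from the main input, as required above.

```shell
# Main input: the column used for row selection must be here.
cat > demo-main.txt <<'EOF'
# Column 1: ID  [counter, u8] Object identifier
# Column 2: MAG [mag, f32]    Magnitude
1  17.5
2  18.2
3  18.9
4  19.6
EOF

# Second table with the same number of rows (hypothetical values).
cat > demo-extra.txt <<'EOF'
# Column 1: RA  [deg, f64] Right ascension
# Column 2: DEC [deg, f64] Declination
10.10  -5.20
10.25  -5.31
10.40  -5.05
10.55  -5.12
EOF

if command -v asttable >/dev/null 2>&1; then
    # The RA and DEC columns are appended first; the --range
    # condition on MAG is then applied to the rows of all columns.
    asttable demo-main.txt --catcolumnfile=demo-extra.txt \
             --range=MAG,18:19 --colinfoinstdout
else
    echo "asttable not found; skipping the demonstration"
fi
```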

Extracting single-valued columns from vectors (--fromvector)

Once all the input columns are read into memory, if any of them are vectors, you can extract a single-valued column from the vector columns at this stage. For more on vector columns, see Vector columns.

Creating vector columns (--tovector)

After column arithmetic, there is no other way to add new columns so the --tovector operator is applied at this stage. You can use it to merge multiple columns that are available in this stage to a single vector column. For more, see Vector columns.

Column arithmetic

Once the final rows are selected in the requested order, column arithmetic is done (if requested). For more on column arithmetic, see Column arithmetic.

Row-based operations

Row-based operations only work within the rows of existing columns when they are activated. By default row-based operations are activated after column-based operations (which are mentioned above). If you need to give precedence to row-based operations, use --rowfirst.

Rows from other file(s) (--catrowfile)

With this feature, you can import rows from other tables (in other files, or other HDUs of the same FITS file). The same column selection of --column is applied to the tables given to this option. The column metadata (name, units and comments) will be taken from the main input. Two conditions are mandatory for adding rows:

  • The number of columns used from the new tables must be equal to the number of columns in memory, by the time control reaches here.
  • The data type of each column (see Numeric data types) should be the same as the respective column in memory by the time control reaches here. If the data types are different, you can use the type conversion operators of column arithmetic, which have higher precedence (and will therefore be applied before this by default). For more on type conversion, see Numerical type conversion operators and Column arithmetic.

Row selection by value in a column

The following operations select rows based on the values in them. A more complete description of each of these options is given in Invoking Table.

  • --range: only keep rows where the value in the given column is within a certain interval.
  • --inpolygon: only keep rows where the value is within the polygon of --polygon.
  • --outpolygon: only keep rows outside the polygon of --polygon.
  • --equal: only keep rows with a specified value in the given column.
  • --notequal: only keep rows without the specified value in the given column.
  • --noblank: only keep rows that are not blank in the given column(s).

These options can be called any number of times (to limit the final rows based on values in different columns for example). Since these are row-rejection operations, their internal order is irrelevant. In other words, it makes no difference if --equal is called before or after --range for example.

As a side-effect, because NaN/blank values are defined to fail on any condition, these operations will also remove rows with NaN/blank values in the specified column they are checking. Also, the columns that are used for these operations do not necessarily have to be in the final output table (you may not need the column after doing the selection based on it).

By default, these options are applied after merging columns from other tables. However, currently, the column given to these options can only come from the main input table. If you need to apply these operations on columns from --catcolumnfile, pipe the output of one instance of Table with --catcolumnfile into another instance of Table as suggested in the box above this list.

These row-based operations are applied first because the speed of later operations can be greatly affected by the number of rows. For example, if you also call the --sort option, and your row selection will result in 50 rows (from an input of 10000 rows), limiting the number of rows first will greatly speed up the sorting of your final output.
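The claim that the internal order of the value-based selections is irrelevant can be checked with a sketch like the one below. The table, its column names and the chosen interval are all hypothetical, and the commands are skipped when asttable is not installed.

```shell
# Small table to select from (hypothetical name and contents).
cat > demo-select.txt <<'EOF'
# Column 1: ID   [counter, u8] Object identifier
# Column 2: MAG  [mag, f32]    Magnitude
# Column 3: FLAG [none, u8]    Quality flag
1  17.5  0
2  18.2  1
3  18.9  1
4  19.6  0
5  18.4  1
EOF

if command -v asttable >/dev/null 2>&1; then
    # Same two conditions, written in opposite orders.
    asttable demo-select.txt --range=MAG,18:19 --equal=FLAG,1 \
             > demo-select-a.txt
    asttable demo-select.txt --equal=FLAG,1 --range=MAG,18:19 \
             > demo-select-b.txt
    # Both conditions only reject rows, so the results should match.
    if cmp -s demo-select-a.txt demo-select-b.txt; then
        echo "same output for both option orders"
    fi
else
    echo "asttable not found; skipping the demonstration"
fi
```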

Sorting (--sort)

Sort the rows based on the values in a certain column. The column to sort by can only come from the main input table columns (not columns that may have been added with --catcolumnfile).

Row selection (by position)

  • --head: keep only the requested number of top rows.
  • --tail: keep only the requested number of bottom rows.
  • --rowrandom: keep only the requested number of randomly selected rows.
  • --rowrange: keep only rows within a certain positional interval.

These options limit/select rows based on their position within the table (not their value in any certain column).
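Because sorting is listed before the position-based selections, combining --sort with --head should return the top rows of the sorted table, however the two options are ordered on the command-line. The sketch below illustrates this under the precedence described here; the file name and values are hypothetical, and the commands are skipped when asttable is not installed.

```shell
# Unsorted table (hypothetical name and contents).
cat > demo-sort.txt <<'EOF'
# Column 1: ID  [counter, u8] Object identifier
# Column 2: MAG [mag, f32]    Magnitude
1  19.6
2  17.5
3  18.9
4  18.2
5  20.1
EOF

if command -v asttable >/dev/null 2>&1; then
    # Same options, written in opposite orders.
    asttable demo-sort.txt --sort=MAG --head=2 > demo-sort-a.txt
    asttable demo-sort.txt --head=2 --sort=MAG > demo-sort-b.txt
    # The precedence (sort, then select by position) decides the
    # result, not the order on the command-line.
    if cmp -s demo-sort-a.txt demo-sort-b.txt; then
        echo "same output for both option orders"
    fi
else
    echo "asttable not found; skipping the demonstration"
fi
```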

Transpose vector columns (--transpose)

Transposing vector columns will not affect the number or metadata of columns; it will just rearrange them in their 2D structure. As a result, after transposing, the number of rows changes, as well as the number of elements in each vector column. See the description of this option in Invoking Table for more (with an example).

Column metadata (--colmetadata)

Once the structure of the final table is set, you can set the column metadata just before finishing.

Output row selection (--noblankend)

Only keep the output rows that do not have a blank value in the given column(s). For example, you may need to apply arithmetic operations on the columns (through Column arithmetic) before rejecting the undesired rows. After the arithmetic operation is done, you can use the where operator to set the undesired elements to NaN/blank and use the --noblankend option to remove their rows just before writing the output. In other scenarios, you may want to remove blank values based on columns in another table. To help readability, you can also use the final column names that you set with --colmetadata! See the example below for applying any generic value-based row selection based on --noblankend.

As an example, let’s review how Table interprets the command below. We are assuming that table.fits contains at least three columns: RA, DEC and PARAM, and you only want the RA and DEC of the rows where \(p\times 2\le 5\) (\(p\) is the value of each row in the PARAM column).

$ asttable table.fits -cRA,DEC --noblankend=MULTIP \
           -c'arith PARAM 2 x set-i i i 5 gt nan where' \
           --colmetadata=3,MULTIP,unit,"Description of column"

Due to the precedence described in this section, Table does these operations (which are independent of the order of the operations written on the command-line):

  1. At the start (with -cRA,DEC), Table reads the RA and DEC columns.
  2. Among all the operations in the command above, Column arithmetic (with -c'arith ...') has the highest precedence. So the arithmetic operation is done and stored as a new (third) column. In this arithmetic operation, we multiply all the values of the PARAM column by 2, then set all those with a value larger than 5 to NaN (for more on understanding this operation, see the ‘set-’ and ‘where’ operators in Arithmetic operators).
  3. Updating column metadata (with --colmetadata) is then done to give a name (MULTIP) to the newly calculated (third) column. During the process, besides a name, we also set a unit and description for the new column. These metadata entries are very important, so always be sure to add metadata after doing column arithmetic.
  4. The lowest precedence operation is --noblankend=MULTIP. So only rows that are not blank/NaN in the MULTIP column are kept.
  5. Finally, the output table (with three columns) is printed on the command-line (standard output). If you also want to print the column metadata, you can use the -O (or --colinfoinstdout) option. Alternatively, if you want the output in a file, you can use the --output option to save the table in FITS or plain-text format.

It may happen that your desired operation needs a separate precedence. In this case you can pipe the output of Table into another call of Table and use the -O (or --colinfoinstdout) option to preserve the metadata between the two calls.

For example, let’s assume that you want to sort the output table from the example command above based on the new MULTIP column. Since sorting is done prior to column arithmetic, you cannot do it in one command, but you can circumvent this limitation by simply piping the output (including metadata) to another call to Table:

asttable table.fits -cRA,DEC --noblankend=MULTIP --colinfoinstdout \
         -c'arith PARAM 2 x set-i i i 5 gt nan where' \
         --colmetadata=3,MULTIP,unit,"Description of column" \
         | asttable --sort=MULTIP --output=selected.fits