Column arithmetic (GNU Astronomy Utilities)

Next: Operation precedence in Table, Previous: Vector columns, Up: Table [Contents][Index]

5.3.3 Column arithmetic ¶

In many scenarios, you want to apply some kind of operation on the columns and save them in another table or feed them into another program. With Table you can do a rich set of operations on the contents of one or more columns in a table, and save the resulting values as new column(s) in the output table. For seeing the precedence of Column arithmetic in relation to other Table operators, see Operation precedence in Table.

To enable column arithmetic, the first 6 characters of the value to --column (-c) should be the activation word ‘arith ’ (note the space character in the end, after ‘arith’). After the activation word, you can use reverse polish notation to identify the operators and their operands, see Reverse polish notation. Just note that white-space characters are used between the tokens of the arithmetic expression and that they are meaningful to the command-line environment. Therefore the whole expression (including the activation word) has to be quoted on the command-line or in a shell script (see the examples below).

To identify a column you can directly use its name, or specify its number (counting from one, see Selecting table columns). When you are giving a column number, it is necessary to prefix the number with a $, similar to AWK. Otherwise the number is not distinguishable from a constant number to use in the arithmetic operation.

For example, with the command below, the first two columns of table.fits will be printed along with a third column that is the result of multiplying the first column with $10^{10}$ (for example, to convert wavelength from Meters to Angstroms). Note that without the ‘$’, it is not possible to distinguish between “1” as a column-counter, or “1” as a constant number to use in the arithmetic operation. Also note that because of the significance of $ for the command-line environment, the single-quotes are the recommended quoting method (as in an AWK expression), not double-quotes (for the significance of using single quotes see the box below).

$ asttable table.fits -c1,2 -c'arith $1 1e10 x'

Single quotes when string contains $: On the command-line, or in shell-scripts, $ is used to expand variables, for example, echo $PATH prints the value (a string of characters) in the variable PATH, it will not simply print $PATH. This operation is also permitted within double quotes, so echo "$PATH" will produce the same output. This is good when printing values, for example, in the command below, $PATH will expand to the value within it.

$ echo "My path is: $PATH"

If you actually want to return the literal string $PATH, not the value in the PATH variable (like the scenario here in column arithmetic), you should put it in single quotes like below. The printed value here will include the $, please try it to see for yourself and compare to above.

$ echo 'My path is: $PATH'

Therefore, when your column arithmetic involves the $ sign (to specify columns by number), quote your arith string with a single quotation mark. Otherwise you can use both single or double quotes.

Manipulate all columns in one call using $_all: Usually we manipulate one column in one call of column arithmetic. For instance, with the command below the elements of the AWAV column will be sumed.

$ asttable table.fits -c'arith AWAV sumvalue'

But sometimes, we want to manipulate more than one column with the same expression. For example we want to sum all the elements of all the columns. In this case we could use the following command (assuming that the table has four different AWAV-* columns):

$ asttable table.fits -c'arith AWAV-1 sumvalue' \
                      -c'arith AWAV-2 sumvalue' \
                      -c'arith AWAV-3 sumvalue' \
                      -c'arith AWAV-4 sumvalue'

To avoid repetition and mistakes, instead of using column arithmetic many times, we can use the $_all identifier. When column arithmetic confronts this special string, it will repeat the expression for all the columns in the input table. Therefore the command above can be written as:

$ asttable table.fits -c'arith $_all sumvalue'

Alternatively, if the columns have meta-data and the first two are respectively called AWAV and SPECTRUM, the command above is equivalent to the command below. Note that the character ‘$’ is no longer necessary in this scenario (because names will not be confused with numbers):

$ asttable table.fits -cAWAV,SPECTRUM -c'arith AWAV 1e10 x'

Comparison of the two commands above clearly shows why it is recommended to use column names instead of numbers. When the columns have descriptive names, the command/script actually becomes much more readable, describing the intent of the operation. It is also independent of the low-level table structure: for the second command, the column numbers of the AWAV and SPECTRUM columns in table.fits is irrelevant.

Column arithmetic changes the values of the data within the column. So the old column metadata cannot be used any more. By default the output column of the arithmetic operation will be given a generic metadata (for example, its name will be ARITH_1, which is hardly useful!). But metadata are critically important and it is good practice to always have short, but descriptive, names for each columns, units and also some comments for more explanation. To add metadata to a column, you can use the --colmetadata option that is described in Invoking Table and Operation precedence in Table.

Since the arithmetic expressions are a value to --column, it does not necessarily have to be a separate option, so the commands above are also identical to the command below (note that this only has one -c option). Just be very careful with the quoting! With the --colmetadata option, we are also giving a name, units and a comment to the third column.

$ asttable table.fits -cAWAV,SPECTRUM,'arith AWAV 1e10 x' \
           --colmetadata=3,AWAV_A,angstrom,"Wavelength (in Angstroms)"

In case you need to append columns from other tables (with --catcolumnfile), you can use those extra columns in column arithmetic also. The easiest, and most robust, way is that your columns of interest (in all files whose columns are to be merged) have different names. In this scenario, you can simply use the names of the columns you plan to append. If there are similar names, note that by default Table appends a -N to similar names (where N is the file counter given to --catcolumnfile, see the description of --catcolumnfile for more). Using column numbers can get complicated: if the number is smaller than the main input’s number of columns, the main input’s column will be used. Otherwise (when the requested column number is larger than the main input’s number of columns), the final output (after appending all the columns from all the possible files) column number will be used.

Almost all the arithmetic operators of Arithmetic operators are also supported for column arithmetic in Table. In particular, the few that are not present in the Gnuastro library¹⁴⁵ are not yet supported for column arithmetic. Besides the operators in Arithmetic operators, several operators are only available in Table to use on table columns.

wcs-to-img

Convert the given WCS positions to image/dataset coordinates based on the number of dimensions in the WCS structure of --wcshdu extension/HDU in --wcsfile. It will output the same number of columns. The first popped operand is the last FITS dimension.

For example, the two commands below (which have the same output) will produce 5 columns. The first three columns are the input table’s ID, RA and Dec columns. The fourth and fifth columns will be the pixel positions in image.fits that correspond to each RA and Dec.

$ asttable table.fits -cID,RA,DEC,'arith RA DEC wcs-to-img' \
           --wcsfile=image.fits
$ asttable table.fits -cID,RA -cDEC \
           -c'arith RA DEC wcs-to-img' --wcsfile=image.fits

img-to-wcs

Similar to wcs-to-img, except that image/dataset coordinates are converted to WCS coordinates.

eq-j2000-to-flat

Convert spherical RA and Dec (in Julian year 2000.0 equatorial coordinates; which are the most common) into RA and Dec on a flat surface based on the given reference point and projection. The full details of the operands to this operator are given below, but let’s start with a practical example to show the concept.

At (or very near) the reference point the output of this operator will be the same as the input. But as you move away from the reference point, distortions due to the particular projection will gradually cause changes in the output (when compared to the input). For example if you simply plot RA and Dec without this operator, a circular annulus on the sky will become elongated as the declination of its center goes farther from the equator. For a demonstration of the difference between curved and flat RA and Decs, see Pointings that account for sky curvature in the Tutorials chapter.

Let’s assume you want to plot a set of RA and Dec points (defined on a spherical surface) in a paper (a flat surface) and that table.fits contains the RA and Dec in columns that are called RA and DEC. With the command below, the points will be converted to flat-RA and flat-Dec using the Gnomonic projection (which is known as TAN in the FITS WSC standard; see the description of the first popped operand below):

$ asttable table.fits \
           -c'arith RA set-r DEC set-d \
                    r d r meanvalue d meanvalue TAN \
                    eq-j2000-to-flat'

As you see, the RA and Dec (r and d) are the last two operators that are popped. Before them, the reference point’s coordinates are calculated from the mean of the RA and Decs (‘r meanvalue’ and ‘d meanvalue’), and the first popped operand is the projection (TAN). We are using the mean RA and Dec as the reference point since we are assuming that this is the only set of points you want to convert. In case you have to plot multiple sets of points in the same plot, you should give the same reference point in each separate conversion; like the example below:

$ ref_ra=123.45
$ ref_dec=-6.789
$ asttable table-1.fits --output=flat-1.txt \
           -c'arith RA DEC '$ref_ra' '$ref_dec' TAN \
                    eq-j2000-to-flat'
$ asttable table-2.fits --output=flat-2.txt \
           -c'arith RA DEC '$ref_ra' '$ref_dec' TAN \
                    eq-j2000-to-flat'

This operator takes 5 operands:

The first popped operand (closest to the operator) is the standard FITS WCS projection to use; and should contain a single element (not a column). The full list of projections can be seen in the description of the --ctype option in Align pixels with WCS considering distortions. The most common projection for smaller fields of view is TAN (Gnomonic), but when your input catalog contains large portions of the sky, projections like MOL (Mollweide) should be used. This is because distortions caused by the TAN projection can become very significant after a couple of degrees from the reference point.
The second popped operand is the reference point’s declination; and should contain a single value (not a column).
The third popped operand is the reference point’s right ascension; and should contain a single value (not a column).
The fourth popped operand is the declination column of your input table (the points that will be converted).
The fifth popped operand is the right ascension column of your input table (the points that will be converted).

eq-j2000-from-flat

The inverse of eq-j2000-to-flat. In other words, you have a set of points defined on the flat RA and Dec (after the projection from spherical to flat), but you want to map them to spherical RA and Dec. For an example, see Pointings that account for sky curvature in the Gnuastro tutorials.

distance-flat

Return the distance between two points assuming they are on a flat surface. Note that each point needs two coordinates, so this operator needs four operands (currently it only works for 2D spaces). The first and second popped operands are considered to belong to one point and the third and fourth popped operands to the second point.

Each of the input points can be a single coordinate or a full table column (containing many points). In other words, the following commands are all valid:

$ asttable table.fits \
           -c'arith X1 Y1 X2 Y2 distance-flat'
$ asttable table.fits \
           -c'arith X Y 12.345 6.789 distance-flat'
$ asttable table.fits \
           -c'arith 12.345 6.789 X Y distance-flat'

In the first case we are assuming that table.fits has the following four columns X1, Y1, X2, Y2. The returned column by this operator will be the difference between two points in each row with coordinates like the following (X1, Y1) and (X2, Y2). In other words, for each row, the distance between different points is calculated. In the second and third cases (which are identical), it is assumed that table.fits has the two columns X and Y. The returned column by this operator will be the difference of each row with the fixed point at (12.345, 6.789).

distance-on-sphere

Return the spherical angular distance (along a great circle, in degrees) between the given two points. Note that each point needs two coordinates (in degrees), so this operator needs four operands. The first and second popped operands are considered to belong to one point and the third and fourth popped operands to the second point.

Each of the input points can be a single coordinate or a full table column (containing many points). In other words, the following commands are all valid:

$ asttable table.fits \
           -c'arith RA1 DEC1 RA2 DEC2 distance-on-sphere'
$ asttable table.fits \
           -c'arith RA DEC 9.876 5.432 distance-on-sphere'
$ asttable table.fits \
           -c'arith 9.876 5.432 RA DEC distance-on-sphere'

In the first case we are assuming that table.fits has the following four columns RA1, DEC1, RA2, DEC2. The returned column by this operator will be the difference between two points in each row with coordinates like the following (RA1, DEC1) and (RA2, DEC2). In other words, for each row, the angular distance between different points is calculated. In the second and third cases (which are identical), it is assumed that table.fits has the two columns RA and DEC. The returned column by this operator will be the difference of each row with the fixed point at (9.876, 5.432).

The distance (along a great circle) on a sphere between two points is calculated with the equation below, where $r_1$, $r_2$, $d_1$ and $d_2$ are the right ascensions and declinations of points 1 and 2.

$$\cos(d)=\sin(d_1)\sin(d_2)+\cos(d_1)\cos(d_2)\cos(r_1-r_2)$$

ra-to-degree

Convert the hour-wise Right Ascension (RA) string, in the sexagesimal format of _h_m_s or _:_:_, to degrees. Note that the input column has to have a string format. In FITS tables, string columns are well-defined. For plain-text tables, please follow the standards defined in Gnuastro text table format, otherwise the string column will not be read.

$ asttable catalog.fits -c'arith RA ra-to-degree'
$ asttable catalog.fits -c'arith $5 ra-to-degree'

dec-to-degree

Convert the sexagesimal Declination (Dec) string, in the format of _d_m_s or _:_:_, to degrees (a single floating point number). For more details please see the ra-to-degree operator.

degree-to-ra ¶

Convert degrees (a column with a single floating point number) to the Right Ascension, RA, string (in the sexagesimal format hours, minutes and seconds, written as _h_m_s). The output will be a string column so no further mathematical operations can be done on it. The output file can be in any format (for example, FITS or plain-text). If it is plain-text, the string column will be written following the standards described in Gnuastro text table format.

degree-to-dec ¶

Convert degrees (a column with a single floating point number) to the Declination, Dec, string (in the format of _d_m_s). See the degree-to-ra for more on the format of the output.

date-to-sec ¶

Return the number of seconds from the Unix epoch time (00:00:00 Thursday, January 1st, 1970). The input (popped) operand should be a string column in the FITS date format (most generally: YYYY-MM-DDThh:mm:ss.ddd...).

The returned operand will be named UNIXSEC (short for Unix-seconds) and will be a 64-bit, signed integer, see Numeric data types. If the input string has sub-second precision, it will be ignored because floating point numbers cannot accurately store numbers with many significant digits. To preserve sub-second precision, please use date-to-millisec.

For example, in the example below we are using this operator, in combination with the --keyvalue option of the Fits program, to sort your desired FITS files by observation date (value in the DATE-OBS keyword in example below):

$ astfits *.fits --keyvalue=DATE-OBS --colinfoinstdout \
          | asttable -cFILENAME,'arith DATE-OBS date-to-sec' \
                     --colinfoinstdout \
          | asttable --sort=UNIXSEC

If you do not need to see the Unix-seconds any more, you can add a -cFILENAME (short for --column=FILENAME) at the end. For more on --keyvalue, see Keyword inspection and manipulation.

date-to-millisec

Return the number of milli-seconds from the Unix epoch time (00:00:00 Thursday, January 1st, 1970). The input (popped) operand should be a string column in the FITS date format (most generally: YYYY-MM-DDThh:mm:ss.ddd..., where .ddd is the optional sub-second component).

The returned operand will be named UNIXMILLISEC (short for Unix milli-seconds) and will be a 64-bit, signed integer, see Numeric data types. The returned value is not a floating point type because for large numbers, floating point data types loose single-digit precision (which is important here).

Other than the units of the output, this operator behaves similarly to date-to-sec. See the description of that operator for an example.

sorted-to-interval

Given a single column (which must be already sorted and have a numeric data type), return two columns: the first returned column is the minimum and the second returned column is the maximum value of the interval of each row row. The maximum of each row is equal to the minimum of the previous row; thus creating a contiguous interval coverage of the input column’s range in all rows.

The minimum value of the first row and maximum of the last row will be smaller/larger than the respective row of the input (based on the distance to the next/previous element). This is done to ensure that if your input has a fixed interval length between all elements, the first and last intervals also have that fixed length.

For example, with the command below, we’ll use this operator on a hypothetical radial profile. Note how the intervals are contiguous even though the radius values are not equally distant (if the row with a radius of 2.5 didn’t exist, the intervals would all be the same length). For another example of the usage of this operator, see the example in the description of --customtable in MakeProfiles profile settings.

$ cat radial-profile.txt
# Column 1: RADIUS [pix,f32,] Distance to center in pixels.
# Column 2: MEAN   [ADU,f32,] Mean value at that radius.
0    100
1    40
2    30
2.5  25
3    20

$ asttable radial-profile.txt --txtf32f=fixed --txtf32p=3 \
           -c'arith RADIUS sorted-to-interval',MEAN
-0.500        0.500         100.000
0.500         1.500         40.000
1.500         2.250         30.000
2.250         2.750         25.000
2.750         3.250         20.000

Such intervals can be useful in scenarios like generating the input to --customtable in MakeProfiles (see MakeProfiles profile settings) from a radial profile (see Generate radial profile).

Footnotes

(145)

For a list of the Gnuastro library arithmetic operators, please see the macros starting with GAL_ARITHMETIC_OP and ending with the operator name in Arithmetic on datasets (arithmetic.h).