Invoking astmatch (GNU Astronomy Utilities)

7.5.2 Invoking Match

When given two catalogs, Match finds the rows that are nearest to each other within an input aperture. The executable name is astmatch, with the following general template:

$ astmatch [OPTION ...] input-1 input-2

One line examples:

## 1D wavelength match (within 5 angstroms) of the two inputs.
## The wavelengths are in the 5th and 10th columns respectively.
$ astmatch --aperture=5e-10 --ccol1=5 --ccol2=10 in1.fits in2.txt

## Find the row that is closest to (RA,DEC) of (12.3456,6.7890)
## with a maximum distance of 1 arcsecond (1/3600 degrees).
## The coordinates can also be given in sexagesimal.
$ astmatch input1.txt --ccol1=ra,dec --coord=12.3456,6.7890 \
           --aperture=1/3600

## Find matching rows of two catalogs with a circular aperture
## of radius 2 (same unit as position columns: pixels in this case).
$ astmatch input1.txt input2.fits --aperture=2 \
           --ccol1=X,Y --ccol2=IMG_X,IMG_Y

## Similar to before, but the output is created by merging various
## columns from the two inputs: columns 1, RA, DEC from the first
## input, followed by all columns starting with `MAG' and the `BRG'
## column from second input and the 10th column from first input.
$ astmatch input1.txt input2.fits --aperture=1/3600 \
           --ccol1=ra,dec --ccol2=RAJ2000,DEJ2000 \
           --outcols=a1,aRA,aDEC,b/^MAG/,bBRG,a10

## Assuming both inputs have the same column metadata (same name
## and numeric type), the output will contain all the rows of the
## first input, appended with the non-matching rows of the second
## input (good when you need to merge multiple catalogs that
## may have matching items, which you do not want to repeat).
$ astmatch input1.fits input2.fits --ccol1=RA,DEC --ccol2=RA,DEC \
           --aperture=1/3600 --notmatched --outcols=_all

## Match the two catalogs within an elliptical aperture of 1 and 2
## arc-seconds along RA and Dec respectively.
$ astmatch --aperture=1/3600,2/3600 in1.fits in2.txt

## Match the RA and DEC columns of the first input with the RA_D
## and DEC_D columns of the second within a 0.5 arcseconds aperture.
$ astmatch --ccol1=RA,DEC --ccol2=RA_D,DEC_D --aperture=0.5/3600  \
           in1.fits in2.fits

## Match in 3D (RA, Dec and Wavelength).
$ astmatch --ccol1=2,3,4 --ccol2=2,3,4 -a0.5/3600,0.5/3600,5e-10 \
           in1.fits in2.txt

Match will find the rows that are nearest to each other in two catalogs (given some coordinate columns). Alternatively, it can construct the k-d tree of one catalog to save in a FITS file for future matching of the same catalog with many others. To understand the inner working of Match and its algorithms, see Matching algorithms.

When matching, two catalogs are necessary for input. But for constructing a k-d tree, only a single catalog should be given. The input tables can be plain text tables or FITS tables; for more, see Tables. Other ways of feeding inputs are also supported:

The first catalog can also come from the standard input (for example, a pipe that feeds the output of a previous command to Match, see Standard input);
When you only want to match one point with another catalog, you can use the --coord option to avoid creating a file for the second input catalog.

Match follows the same basic behavior of all Gnuastro programs as fully described in Common program behavior. If the first input is a FITS file, the common --hdu option (see Input/Output options) should be used to identify the extension. When the second input is FITS, the extension must be specified with --hdu2.

When --quiet is not called, Match will print its various processing phases (including the number of matches found) in standard output (on the command-line). When matches are found, by default, two tables will be output (if in FITS format, as two HDUs). Each output table will contain the re-arranged rows of the respective input table. In other words, both tables will have the same number of rows, and row N in both corresponds to the Nth match between the two. If no matches are found, the columns of the output table(s) will have zero rows (with proper meta-data). The output format can be changed with the following options:

--outcols: The output will be a single table with rows chosen from either of the two inputs in any order.
--notmatched: The output tables will contain the rows that did not match between the two tables. If called with --outcols, the output will be a single table with all non-matched rows of both tables.
--logasoutput: The output will be a single table with the contents of the log file, see below.

If no output file name is given with the --output option, then Automatic output will be used to determine the output name(s). Depending on --tableformat (see Input/Output options), the output will be a (possibly multi-extension) FITS file or (possibly two) plain text file(s). Generally, giving a filename to --output is recommended.

When the --log option is called (see Operating mode options), and there was a match, Match will also create a file named astmatch.fits (or astmatch.txt, depending on --tableformat, see Input/Output options) in the directory it is run in. This log table will have three columns. The first and second columns show the matching row/record number (counting from 1) of the first and second input catalogs respectively. The third column is the distance between the two matched positions. The units of the distance are the same as the given coordinates (given the possible ellipticity, see description of --aperture below). When --logasoutput is called, no log file (with a fixed name) will be created. In this case, the output file (possibly given by the --output option) will have the contents of this log file.

--log is not thread-safe: As described above, when --logasoutput is not called, the log file has a fixed name for all calls to Match. Therefore if a separate log is requested in two simultaneous calls to Match in the same directory, both calls will try to write to the same file. This can cause problems like a corrupted log file, undefined behavior, or a crash. Remember that --log is mainly intended for debugging purposes. If you want the log file with a specific name, simply use --logasoutput (which will also be faster, since no arranging of the input columns is necessary).
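As a minimal sketch of the advice above, two simultaneous runs in the same directory can each write their log-format table to a distinct file through --logasoutput and --output. The wrapper name match_log and all file names are hypothetical, and every catalog is assumed to have columns called RA and DEC; only options described in this section are used.

```shell
# Hypothetical wrapper: match ref.fits with the catalog in $1 and write
# the log-format table (row indices and distance) into the file in $2.
match_log() {
    astmatch ref.fits "$1" --ccol1=RA,DEC --ccol2=RA,DEC \
             --aperture=1/3600 --logasoutput --output="$2"
}

# Only run when Match and the (assumed) input files are present.
if command -v astmatch >/dev/null 2>&1 \
   && [ -f ref.fits ] && [ -f a.fits ] && [ -f b.fits ]; then
    match_log a.fits log-a.fits &   # first run, in the background
    match_log b.fits log-b.fits &   # second run, in parallel
    wait                            # block until both have finished
fi
```

Because each run names its own output, neither run needs the fixed-name log file.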

-H STR

--hdu2=STR

The extension/HDU of the second input if it is a FITS file. When it is not a FITS file, this option’s value is ignored. For the first input, the common option --hdu must be used.

-k STR

--kdtree=STR

Select the algorithm and/or the way to construct or import the k-d tree. A summary of the four acceptable strings for this option is given here for completeness. However, for a much more detailed discussion on Match’s algorithms with examples, see Matching algorithms.

internal: Construct a k-d tree for the first input internally (within the same run of Match), and parallelize over the rows of the second to find the nearest points. This is the default algorithm/method used by Match (when this option is not called).
build: Only construct a k-d tree of a single input and abort. The name of the k-d tree file is the value given to --output.
CUSTOM-FITS-FILE: Use the given FITS file as a k-d tree (that was previously constructed with Match itself) of the first input, and do not construct any k-d tree internally. The FITS file should have two columns with an unsigned 32-bit integer data type and a KDTROOT keyword that contains the index of the root of the k-d tree. For more on Gnuastro’s k-d tree format, see K-d tree (kdtree.h).
disable: Do not use the k-d tree algorithm for finding the nearest neighbor, instead, use the sort-based method.
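The build and CUSTOM-FITS-FILE values are naturally used together: the tree is built once and then reused in later matches. The sketch below shows one possible workflow. The function names and file names (ref.fits, kdtree.fits, other.fits, matched.fits) are hypothetical, the coordinate columns are assumed to be called RA and DEC, and giving --ccol1 during the build step is an assumption about how the coordinate columns are selected there.

```shell
# Hypothetical helper: build the k-d tree of catalog $1 into file $2.
build_kdtree() {
    astmatch "$1" --ccol1=RA,DEC --kdtree=build --output="$2"
}

# Hypothetical helper: match catalog $1 (whose pre-built k-d tree is
# in file $2) with catalog $3, and write the result into file $4.
match_with_kdtree() {
    astmatch "$1" "$3" --ccol1=RA,DEC --ccol2=RA,DEC \
             --kdtree="$2" --aperture=1/3600 --output="$4"
}

# Only run when Match and the (assumed) input files are present.
if command -v astmatch >/dev/null 2>&1 \
   && [ -f ref.fits ] && [ -f other.fits ]; then
    build_kdtree ref.fits kdtree.fits
    match_with_kdtree ref.fits kdtree.fits other.fits matched.fits
fi
```

The second step can be repeated for many other catalogs without rebuilding the tree of ref.fits.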

--kdtreehdu=STR

The HDU of the FITS file, when a FITS file is given to the --kdtree option that was described above.

--outcols=STR[,STR,[...]]

Columns (from both inputs) to write into a single matched table output. The value to --outcols must be a comma-separated list of column identifiers (number or name, see Selecting table columns). The expected format depends on --notmatched and is explained below. By default (when --notmatched is not called), the number of rows in the output will be equal to the number of matches. However, when --notmatched is called, all the rows (from the requested columns) of the first input are placed in the output, and the not-matched rows of the second input are inserted afterwards (useful when you want to merge unique entries of multiple catalogs into one).

Default (only matching rows)

The first character of each string specifies the input catalog: a for the first and b for the second. The rest of the characters of the string will be directly used to identify the proper column(s) in the respective table. See Selecting table columns for how columns can be specified in Gnuastro.

For example, the output of --outcols=a1,bRA,bDEC will have three columns: the first column of the first input, along with the RA and DEC columns of the second input.

If the string after a or b is _all, then all the columns of the respective input file will be written in the output. For example, the command below will print all the input columns from the first catalog along with the 5th column from the second:

$ astmatch a.fits b.fits --outcols=a_all,b5

_all can be used multiple times, possibly on both inputs. Tip: if an input’s column is called _all (an unlikely name!) and you do not want all the columns from that table in the output, use its column number to avoid confusion.

Another example is given in the one-line examples above. Compared to the default case (where two tables with all their columns are saved separately), using this option is much faster: it will only read and re-arrange the necessary columns and it will write a single output table. Combined with regular expressions in large tables, this can be a very powerful and convenient way to merge various tables into one.

When --coord is given, no second catalog will be read. The second catalog will be created internally based on the values given to --coord. So column names are not defined and you can only request integer column numbers that are less than the number of coordinates given to --coord. For example, if you want to find the row matching RA of 1.2345 and Dec of 6.7890, then you should use --coord=1.2345,6.7890. But when using --outcols, you cannot give bRA, or b25.

With --notmatched

Only the column names/numbers should be given (for example, --outcols=RA,DEC,MAGNITUDE). It is assumed that both input tables have the requested column(s) and that the numerical data type of each column in one input is the same as that of the column with the same name in the other. Therefore if one input has a MAGNITUDE column with a 32-bit floating point type, but the MAGNITUDE column of the other is 64-bit floating point, Match will abort with an error. The metadata of the columns will come from the first input.

As an example, let’s assume input1.txt and input2.fits each have a different number of columns and rows. However, they both have the RA (64-bit floating point), DEC (64-bit floating point) and MAGNITUDE (32-bit floating point) columns. If input1.txt has 100 rows and input2.fits has 300 rows (such that 50 of them match within 1 arcsec of the first), then the output of the command below will have \(100+(300-50)=350\) rows and only three columns. Other columns in each catalog, which may be different, are ignored.

$ astmatch input1.txt  --ccol1=RA,DEC \
           input2.fits --ccol2=RA,DEC \
           --aperture=1/3600 \
           --notmatched --outcols=RA,DEC,MAGNITUDE
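The expected row count of this scenario can be reproduced with shell arithmetic. The variable names are illustrative only; the numbers are those of the example in this description.

```shell
n1=100      # rows in input1.txt (all are kept)
n2=300      # rows in input2.fits
nmatch=50   # rows of the second input that match a row of the first

# All rows of the first input, plus the non-matching rows of the second.
nrows=$(( n1 + (n2 - nmatch) ))
echo "$nrows"
```

Comparing this number with the row count of the actual output is a quick sanity check after merging catalogs.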

-l

--logasoutput

The output file will have the contents of the log file: indexes in the two catalogs that match with each other along with their distance, see description of the log file above.

When this option is called, a separate log file will not be created and the output will not contain any of the input columns (either as two tables containing the re-arranged columns of each input, or a single table mixing columns), only their indices in the log format.

--notmatched

Write the non-matching rows into the outputs, not the matched ones. By default, this will produce two output tables, that will not necessarily have the same number of rows. However, when called with --outcols, it is possible to import non-matching rows of the second into the first. See the description of --outcols for more.

-c INT/STR[,INT/STR]

--ccol1=INT/STR[,INT/STR]

The coordinate columns of the first input. The number of dimensions for the match is determined by the number of comma-separated values given to this option. The values can be the column number (counting from 1), exact column name or a regular expression. For more, see Selecting table columns. See the one-line examples above for some usages of this option.

-C INT/STR[,INT/STR]

--ccol2=INT/STR[,INT/STR]

The coordinate columns of the second input. See the example in --ccol1 for more.

-d FLT[,FLT]

--coord=FLT[,FLT]

Manually specify the coordinates to match against the given catalog. With this option, Match will not look for a second input file/table and will directly use the coordinates given to this option. When the coordinates are RA and Dec, the comma-separated values can either be in degrees (a single number), or sexagesimal (_h_m_ for RA, _d_m_ for Dec, or _:_:_ for both).
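To see how sexagesimal and decimal-degree values relate, the sketch below converts colon-separated strings to degrees with awk. The helper names hms_to_deg and dms_to_deg are hypothetical (they are not part of Gnuastro), and the sketch assumes the usual astronomical convention that sexagesimal RA is written in hours (one hour being 15 degrees) while Dec is written in degrees. It does not describe how Match parses its input internally.

```shell
# Hypothetical helper: hours:minutes:seconds (RA) to decimal degrees.
hms_to_deg() {
    echo "$1" | awk -F: '{ printf "%.6f\n", 15 * ($1 + $2/60 + $3/3600) }'
}

# Hypothetical helper: signed degrees:arcmin:arcsec (Dec) to decimal
# degrees. The sign is read from the text so that "-00:30:00" works.
dms_to_deg() {
    echo "$1" | awk -F: '{
        sign = ($1 ~ /^-/) ? -1 : 1
        d = ($1 < 0) ? -$1 : $1
        printf "%.6f\n", sign * (d + $2/60 + $3/3600)
    }'
}

hms_to_deg 00:49:22.944
dms_to_deg 06:47:20.4
```

Under the stated convention, the two calls at the end give the decimal form of the point used in the --coord one-line example (12.3456 and 6.7890 degrees).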

When this option is called, the output changes in the following ways: 1) when --outcols is specified, for the second input, it can only accept integer numbers that are less than the number of values given to this option, see description of that option for more. 2) By default (when --outcols is not used), only the matching row of the first table will be output (a single file), not two separate files (one for each table).

This option is good when you have a (large) catalog and only want to match a single coordinate to it (for example, to find the nearest catalog entry to your desired point). With this option, you can write the coordinates on the command-line and thus avoid the need to make a single-row file.

-a FLT[,FLT[,FLT]]

--aperture=FLT[,FLT[,FLT]]

Parameters of the aperture for matching. The values given to this option can be fractions, for example, when the position columns are in units of degrees, 1/3600 can be used to ask for one arc-second. The interpretation of the values depends on the requested dimensions (determined from --ccol1 and --ccol2) and how many values are given to this option.

When multiple objects are found within the aperture, the match is defined as the nearest one. In a multi-dimensional dataset, when the aperture is a general ellipse or ellipsoid (and not a circle or sphere), the distance is calculated in the elliptical space along the major axis. For the definition of this distance, see \(r_{el}\) in Defining an ellipse and ellipsoid.

1D match

The aperture/interval can only take one value: half of the interval around each point (maximum distance from each point).

2D match

In a 2D match, the aperture can be a circle, an ellipse aligned with the axes, or an ellipse with a rotated major axis. To simplify the usage, the shape is determined from the number of values given to this option.

1 number: for example, --aperture=2. The aperture will be a circle of the given radius. The value will be in the same units as the columns in --ccol1 and --ccol2.
2 numbers: for example, --aperture=3,4e-10. The aperture will be an ellipse (if the two numbers are different) with the respective value along each dimension. The numbers are in units of the first and second axis. In the example above, the semi-axis value along the first axis will be 3 (in units of the first coordinate) and along the second axis will be \(4\times10^{-10}\) (in units of the second coordinate). Such values can happen if you are comparing catalogs of spectra, for example. If more than one object exists in the aperture, the nearest will be found along the major axis as described in Defining an ellipse and ellipsoid.
3 numbers: for example, --aperture=2,0.6,30. The aperture will be an ellipse (if the second value is not 1). The first number is the semi-major axis, the second is the axis ratio and the third is the position angle (in degrees). If multiple matches are found within the ellipse, the distance (to find the nearest) is calculated along the major axis in the elliptical space, see Defining an ellipse and ellipsoid.
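For intuition on the two-number case, the sketch below applies the textbook test for a point inside an axis-aligned ellipse in plane geometry. It is not taken from Match's source, and the helper name in_ellipse is hypothetical. Here dx and dy are the coordinate differences between two points along the first and second axis, and a and b are the two values given to --aperture.

```shell
# Hypothetical helper: print 1 when the offset (dx,dy) lies inside an
# axis-aligned ellipse with semi-axes a and b, and 0 otherwise.
in_ellipse() {
    awk -v dx="$1" -v dy="$2" -v a="$3" -v b="$4" 'BEGIN {
        # Normalized elliptical radius: 1 is exactly on the boundary.
        r = sqrt((dx/a)^2 + (dy/b)^2)
        print ((r <= 1) ? 1 : 0)
    }'
}

# Offsets tested against the --aperture=3,4e-10 example.
in_ellipse 2 1e-10 3 4e-10
in_ellipse 2 4e-10 3 4e-10
```

Dividing each offset by its own semi-axis is what lets the two axes have entirely different units, as in the spectral example of the two-number case.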

3D match

The aperture (matching volume) can be a sphere, an ellipsoid aligned on the three axes or a general ellipsoid rotated in any direction. To simplify the usage, the shape can be determined based on the number of values given to this option.

1 number: for example, --aperture=3. The matching volume will be a sphere of the given radius. The value is in the same units as the input coordinates.
3 numbers: for example, --aperture=4,5,6e-10. The aperture will be an ellipsoid aligned on the three axes, with the respective extent along each dimension. The numbers must be in the same units as each axis. This is very similar to the two-number case of 2D inputs; see there for more.
6 numbers: for example, --aperture=4,0.5,0.6,10,20,30. The numbers represent the full general ellipsoid definition (in any orientation). For the definition of a general ellipsoid, see Defining an ellipse and ellipsoid. The first number is the semi-major axis. The second and third are the two axis ratios. The last three are the three Euler angles in units of degrees in the ZXZ order as fully described in Defining an ellipse and ellipsoid.