GNU Astronomy Utilities



12.3.21 Matching (match.h)

Matching is often necessary when two measurements of the same points have been done using different instruments (or hardware), different software or different configurations of the same software. In other words, you have two catalogs or tables, and each has N columns containing the N-dimensional “coordinate” values of each point. Each table can have other columns too, for example, one can have magnitudes in one filter, and another can have morphology measurements.

The matching functions here will use the coordinate columns of the two tables to find a permutation for each, and the total number of matched rows (\(N_{match}\)). This lets you match the two tables by position. At a higher level, you can apply the permutations to the magnitude or morphology columns to merge the catalogs over the \(N_{match}\) rows. The input and output data formats of the functions are the same and are described below, before the functions themselves. Each function also has extra arguments due to the particular algorithm it uses for the matching.

The two inputs of the functions (coord1 and coord2) must each be a List of gal_data_t. Each gal_data_t node in coord1 or coord2 should be a one-dimensional dataset (a column in a table) and all the nodes in each list must have the same number of elements (rows). In other words, each column holds the coordinates of every point along its respective dimension. The number of dimensions of the coordinates is determined by the number of gal_data_t nodes in the two input lists (which must be equal). The number of rows (the number of elements in each gal_data_t) in the columns of coord1 and coord2 can (and usually will!) be different. In summary, these functions will be happy if you use gal_table_read to read the coordinate columns of each table from a file, see Table input output (table.h).

The functions below return a singly-linked list of three 1D datasets (see List of gal_data_t); let’s call the returned list ret. The first two nodes (ret and ret->next) are permutations. In other words, the array elements of both have a type of size_t, see Permutations (permutation.h). The third node (ret->next->next) contains the calculated distance of each match and its array has a type of double. The number of matches will be put in the space pointed to by the nummatched argument. If there are no matches, the functions will return NULL.

The two permutations can be applied to the rows of the two inputs: the first one (ret) should be applied to the rows of the table containing coord1 and the second one (ret->next) to the table containing coord2. After applying the returned permutations to the inputs, the top nummatched elements of both will match with each other. The ordering of the rest of the elements is undefined (it depends on the matching function used). The third node contains the distance between the two points of each match (which may be an elliptical distance, see the discussion of “aperture” below).

The functions will not simply return the nearest neighbor as a match. This is because the nearest neighbor may be too far away to be a meaningful match! They will check the distance between each point and its nearest neighbor, and only return a match if it is within an acceptable N-dimensional distance (or “aperture”). The matching aperture is defined by the aperture array that is an input argument to the functions.

If several points of one catalog lie within this aperture of a point in the other catalog, the nearest is defined as the match. In a 2D situation (where the input lists have two nodes), for the most generic case, aperture must have three elements: the major axis length, axis ratio and position angle (see Defining an ellipse and ellipsoid). If aperture[1]==1, the aperture will be a circle of radius aperture[0] and the third value will not be used. When the aperture is an ellipse, the distances between the points are also calculated as elliptical distances (\(r_{el}\) in Defining an ellipse and ellipsoid).

Output permutations ignore internal sorting: the output permutations will correspond to the initial inputs. Therefore, even when inplace!=0 (and the function re-arranges the inputs in place), the output permutations will correspond to the original (possibly non-sorted) inputs. The reason for this is that you rarely want to permute the actual positional columns after the match. Usually, you also have other columns (such as the magnitude and morphology) and you want to find how they differ between the objects that match. Once you have the permutations, they can be applied to those other columns (see Permutations (permutation.h)) and the higher-level processing can continue. So if you do not need the coordinate columns for the rest of your analysis, it is better to set inplace=1.
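As an illustration of using the permutations on other columns, the sketch below works on plain C arrays and does not call the library (the library's own routines are described in Permutations (permutation.h)). The function names gather_matched and matched_difference are hypothetical. The sketch assumes a “gather” reading of the permutations, where row perm[i] of the original table becomes row i after the permutation is applied; please confirm this convention against the permutation section before relying on it.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the matched rows of `column' into `out' in matched order.
   ASSUMPTION: out[i] = column[perm[i]] ("gather" convention). */
static void
gather_matched(const double *column, const size_t *perm,
               size_t nummatched, double *out)
{
  size_t i;
  assert(column && perm && out);
  for (i = 0; i < nummatched; ++i)
    out[i] = column[perm[i]];
}

/* Difference of one quantity (for example a magnitude) between the
   matched rows of the two tables, under the same assumption. */
static void
matched_difference(const double *col1, const size_t *perm1,
                   const double *col2, const size_t *perm2,
                   size_t nummatched, double *diff)
{
  size_t i;
  assert(col1 && perm1 && col2 && perm2 && diff);
  for (i = 0; i < nummatched; ++i)
    diff[i] = col1[perm1[i]] - col2[perm2[i]];
}
```

Only the first nummatched elements of each permutation are read, because the ordering of the remaining rows is undefined.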

Function:
gal_data_t *
gal_match_sort_based (gal_data_t *coord1, gal_data_t *coord2, double *aperture, int sorted_by_first, int inplace, size_t minmapsize, int quietmmap, size_t *nummatched)

Use a basic sort-based match to find the matching points of two input coordinates. See the descriptions above on the format of the inputs and outputs. To speed up the search, this function will sort the input coordinates by their first column (first axis). If both are already sorted by their first column, you can avoid the sorting step by giving a non-zero value to sorted_by_first.

When sorting is necessary and inplace is non-zero, the actual input columns will be sorted. Otherwise, an internal copy of the inputs will be made, used (sorted) and later freed before returning. Therefore, when inplace==0, the inputs will remain untouched, but this function will take more time and memory. If internal allocation is necessary and the space is larger than minmapsize, the space will not be allocated in the RAM, but in a file, see the description of --minmapsize and --quietmmap in Processing options.
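The benefit of sorting by the first axis can be sketched as follows. This is not the library's implementation, only a minimal illustration of the general idea under simplifying assumptions: two dimensions, a circular aperture of radius r, both catalogs already sorted by increasing first coordinate, and no attempt to stop one point of the second catalog from being matched to several points of the first. The name window_match and the NO_MATCH marker are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NO_MATCH ((size_t)(-1))

/* For every point of catalog 1, find the nearest point of catalog 2
   within radius r. Both catalogs must be sorted by increasing x.
   match[i] is set to the index in catalog 2, or NO_MATCH. The number
   of points of catalog 1 that found a match is returned. */
static size_t
window_match(const double *x1, const double *y1, size_t n1,
             const double *x2, const double *y2, size_t n2,
             double r, size_t *match)
{
  size_t i, j, lo = 0, nmatched = 0;
  double dx, dy, d2, best;

  assert(r >= 0.0);
  for (i = 0; i < n1; ++i)
    {
      /* x1 is increasing, so the lower edge of the window never has
         to move backwards. */
      while (lo < n2 && x2[lo] < x1[i] - r) ++lo;

      /* Only points with |x2-x1| <= r can be inside the circle. */
      match[i] = NO_MATCH;
      best = r * r;
      for (j = lo; j < n2 && x2[j] <= x1[i] + r; ++j)
        {
          dx = x2[j] - x1[i];
          dy = y2[j] - y1[i];
          d2 = dx*dx + dy*dy;
          if (d2 <= best) { best = d2; match[i] = j; }
        }
      if (match[i] != NO_MATCH) ++nmatched;
    }
  return nmatched;
}
```

Because the window only slides forward, each point of the first catalog is compared with the few points of the second catalog that are close along the first axis, instead of with all of them.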

Function:
gal_data_t *
gal_match_kdtree (gal_data_t *coord1, gal_data_t *coord2, gal_data_t *coord1_kdtree, size_t kdtree_root, double *aperture, size_t numthreads, size_t minmapsize, int quietmmap, size_t *nummatched)

Use the k-d tree concept for finding matches between two catalogs, optionally in parallel (on numthreads threads). The k-d tree of the first input (coord1_kdtree) and its root index (kdtree_root) should be constructed and found before calling this function. To do this, you can use gal_kdtree_create, see K-d tree (kdtree.h). The aperture array has the same format as in gal_match_sort_based and is described at the top of this section. If coord1_kdtree==NULL, this function will return a NULL pointer and write a value of 0 in the space that nummatched points to.

The final number of matches is written in the space that nummatched points to, and the format of the returned dataset (three columns) is described above. If internal allocation is necessary and the space is larger than minmapsize, the space will not be allocated in the RAM, but in a file, see the description of --minmapsize and --quietmmap in Processing options.