Matching involves two catalogs, let’s call them catalog A (with N rows) and catalog B (with M rows).
The most basic matching algorithm that immediately comes to mind is this:
for each row in A (let’s call it \(A_i\)), go over all the rows in B (\(B_j\), where \(0 However, this basic parsing algorithm is very computationally expensive:
\(N\times M\) distances have to measured, and calculating the distance requires a square root and power of 2: in 2 dimensions it would be \(d(A_i,B_j)=\sqrt{(B_{ix}-A_{ix})^2+(B_{iy}-A_{iy})^2}\).
If an elliptical aperture is necessary, it can even get more complicated (see Defining an ellipse and ellipsoid).
Such operations are not simple, and will consume many cycles of your CPU!
As a result, this basic algorithm will become terribly slow as your datasets grow in size.
For example, when N or M exceed hundreds of thousands (which is common in the current days with datasets like the European Space Agency’s Gaia mission).
Therefore that basic parsing algorithm will take too much time and more efficient ways to find the nearest neighbor need to be found.
Gnuastro’s Match currently has the following algorithms for finding the nearest neighbor:
In this algorithm, we will use a moving window over the sorted datasets:
You can use this method by disabling the default algorithm that is described next with --kdtree=disable.
The reason the sort-based algorithm is not the default is that it has some caveats:
The k-d tree concept is much more abstract, but powerful (addressing all the caveats of the sort-based method described above.).
In short a k-d tree is a partitioning of a k-dimensional space (“k” is just a place-holder, so together with “d” for dimension, “k-d” means “any number of dimensions”!).
The k-d tree of table A is another table with the same number of rows, but only two integer columns: the integers contain the row indexs (counting from zero) of the left and right “branch” (in the “tree”) of that row.
With a k-d tree we can find the nearest point with much fewer checks (statistically: compared to always parsing everything from the top-down).
We won’t go deeper into the concept of k-d trees here and will focus on the high-level usage in of k-d trees in Match.
In case you are interested to learn more on the k-d tree concept and Gnuastro’s implementation, please see its Wikipedia page and K-d tree (kdtree.h).
When given two catalogs (like the command below), Gnuastro’s Match will internally construct a k-d tree for catalog A (the first catalog given to it) and use the k-d tree of A, for finding the nearest row in B to each row in A.
This is done in parallel on all available threads (unless you specify a certain number of threads to use with --numthreads, see Multi-threaded operations)
In scenarios where your reference (A) catalog is the same (and it is large!), you can save time by building the k-d tree of A and saving it into a file once, and simply use that k-d tree in all future matches.
The command below shows how to do the first step (to build the k-d tree and keep it in a file).
This external k-d tree (A-kdtree.fits) can be fed to Match later (to avoid having to reconstruct it every time you want to match a new catalog with A).
The commands below show how to do this by matching both B.fits and C.fits with A.fits using its pre-built k-d tree.
Note that the same --kdtree option above (which has a value of There is just one technical issue however: when there is no neighbor within the acceptable distance of the k-d tree, it is forced to parse all elements to confirm that there is no match!
Therefore if one catalog only covers a small portion (in the coordinate space) of the other catalog, the k-d tree algorithm will be forced to parse the full k-d tree for the majority of points!
This will dramatically decrease the running speed of Match.
To mitigate this, Match first divides the range of the first input in all its dimensions into bins that have a width of the requested aperture (similar to a histogram), and will only do the k-d tree based search when the point in catalog B actually falls within a bin that has at least one element in A.
In summary, here are the points to consider when selecting an algorithm, or the order of your inputs (for optimal speed, the match will be the same):
$ astmatch A.fits --ccol1=ra,dec B.fits --ccol2=RA,DEC \
--aperture=1/3600
$ astmatch A.fits --ccol1=ra,dec --kdtree=build \
--output=A-kdtree.fits
build), is now given the file name of the already-built k-d tree.
$ astmatch A.fits --ccol1=ra,dec --kdtree=A-kdtree.fits \
B.fits --ccol2=RA,DEC --aperture=1/3600 \
--output=A-B.fits
$ astmatch A.fits --ccol1=ra,dec --kdtree=A-kdtree.fits \
C.fits --ccol2=RA,DEC --aperture=1/3600 \
--output=A-C.fits
JavaScript license information
GNU Astronomy Utilities 0.24 manual, November 2025.