Invoking astquery (GNU Astronomy Utilities)


5.4.2 Invoking Query

Query provides a high-level interface for downloading subsets of data from databases. The executable name is astquery, with the following general template:

$ astquery DATABASE-NAME [OPTION...] ...

One line examples:


## Information about all datasets in ESA's GAIA database:
$ astquery gaia --information

## Only show catalogs in VizieR that have 'MUSE' in their
## description. The '-i' is short for '--information'.
$ astquery vizier -i --limitinfo=MUSE

## List of columns in 'J/A+A/608/A2/udf10' (one of the above).
$ astquery vizier --dataset=J/A+A/608/A2/udf10 -i

## ID, RA and Dec of all Gaia sources within an image.
$ astquery gaia --dataset=dr3 --overlapwith=image.fits \
           -csource_id,ra,dec

## RA, Dec and Spectroscopic redshifts of objects in SDSS DR12
## spectroscopic redshift that overlap with 'image.fits'.
$ astquery vizier --dataset=sdss12 --overlapwith=image.fits \
           -cRA_ICRS,DE_ICRS,zsp --range=zsp,1e-10,inf

## All columns of all entries in the Gaia DR3 catalog (hosted at
## VizieR) within 1 arc-minute of the given coordinate.
$ astquery vizier --dataset=gaiadr3 --output=my-gaia.fits \
           --center=113.8729761,31.9027152 --radius=1/60

## Similar to above, but only ID, RA and Dec columns for objects with
## magnitude range 10 to 15. In VizieR, this column is called 'Gmag'.
## Also, using sexagesimal coordinates instead of degrees for center.
$ astquery vizier --dataset=gaiadr3 --output=my-gaia.fits \
           --center=07h35m29.51,31d54m9.77 --radius=1/60 \
           --range=Gmag,10:15 -cDR3Name,RAJ2000,DEJ2000

Query takes a single argument which is the name of the database. For the full list of available databases and accessing them, see Available databases. There are two methods to query the databases; each is discussed more fully in its option’s description below.

Low-level: With --query you can directly give a raw query statement that is recognized by the database. This is very low level and will require a good knowledge of the database’s query language, but of course, it is much more powerful. If this option is given, the raw string is directly passed to the server and all other constraints/options (for Query’s high-level interface) are ignored.
High-level: With the high-level options (like --column, --center, --radius, --range and other constraining options below), the low-level query will be constructed automatically for the particular database. This method is limited to the generic capabilities that Query provides for all servers, so --query is more powerful. However, in this mode you do not need any knowledge of the database’s query language. You can see the internally generated query, as part of the full download command, on the terminal (if --quiet is not used) or in the 0-th extension of the output (if it is a FITS file).

The name of the downloaded output file can be set with --output. The requested output format can have any of the Recognized table formats (currently .txt or .fits). Like all Gnuastro programs, if the output is a FITS file, the zero-th/first HDU of the output will contain all the command-line options given to Query as well as the full command used to access the server. When --output is not set, the output name will be in the format of NAME-STRING.fits, where NAME is the name of the database and STRING is a randomly selected 6-character set of numbers and alphabetic characters. With this feature, a second run of astquery that is not called with --output will not over-write an already downloaded one. Generally, when calling Query more than once, it is recommended to set an output name for each call based on your project’s context.
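As an illustration of the recommendation above, the sketch below derives one output name per call from a project-specific label. The field labels and the second center are hypothetical placeholders, and the commands are only printed so that you can review them before running them yourself.

```shell
# Hypothetical fields, each written as LABEL:RA,DEC (degrees).
fields="fieldA:113.8729761,31.9027152 fieldB:53.16,-27.78"

for f in $fields; do
    label=${f%%:*}          # Text before the ':'.
    center=${f#*:}          # Text after the ':'.
    out="gaia-$label.fits"  # One predictable name per call.

    # Print the command instead of running it.
    echo "astquery gaia --dataset=dr3 --center=$center" \
         "--radius=1/60 -csource_id,ra,dec --output=$out"
done
```

With names like these, a later run never has to guess which randomly named file belongs to which field.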

The outputs of Query will have a common output format, irrespective of the used database. To achieve this, Query will ask the databases to provide a FITS table output (for larger tables, FITS can consume much less download volume). After downloading is complete, the raw downloaded file will be read into memory once by Query, and written into the file given to --output. The raw downloaded file will be deleted by default, but can be preserved with the --keeprawdownload option. This strategy avoids unnecessary surprises depending on the database. For example, some databases can return a compressed FITS table, even though we ask for FITS. But with the strategy above, the final output will be an uncompressed FITS file. The metadata that is added by Query (including the full download command) is also very useful for future usage of the downloaded data. Unfortunately, many databases do not write the input queries into their generated tables.

--dry-run

Only print the final download command to contact the server, do not actually run it. This option is good when you want to check the finally constructed query or download options given to the download program. You may also want to use the constructed command as a base to do further customizations on it and run it yourself.
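A minimal sketch of this workflow is shown below: the same options are first used with --dry-run to preview the download command, and the real call is left as a comment. The options are taken from the one-line examples above; the check for astquery being installed is only there so the snippet does not fail on systems without Gnuastro.

```shell
# Options shared by the preview and the real download.
opts="gaia --dataset=dr3 --center=113.8729761,31.9027152 --radius=1/60 -csource_id,ra,dec"

if command -v astquery >/dev/null 2>&1; then
    # Preview only: print the download command, do not run it.
    astquery $opts --dry-run
    # Once satisfied, run the same options for real:
    # astquery $opts --output=my-gaia.fits
else
    echo "astquery not found; would run: astquery $opts --dry-run"
fi
```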

-k

--keeprawdownload

Do not delete the raw downloaded file from the database. The name of the raw download will have an OUTPUT-raw-download.fits format, where OUTPUT is the base-name of the final output file (without a suffix).
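To make the naming rule concrete, the sketch below derives the expected raw-download name from a given --output value using plain shell parameter expansion. It only illustrates the OUTPUT-raw-download.fits pattern described above (assuming the suffix is everything after the last dot); it does not call Query.

```shell
output="my-gaia.fits"

# Strip the suffix (shortest match of '.*' from the end).
base="${output%.*}"

# Name that the raw download would have with --keeprawdownload.
raw="$base-raw-download.fits"
echo "$raw"    # my-gaia-raw-download.fits
```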

-i

--information

Print the information of all datasets (tables) within a database or all columns within a dataset. When --dataset is specified, the latter mode (all column information) is downloaded and printed and when it is not defined, all dataset information (within the database) is printed.

Some databases (like VizieR) contain tens of thousands of datasets, so you can limit the downloaded and printed information for available datasets with the --limitinfo option (described below). Dataset descriptions are often large and contain a lot of text (unlike column descriptions). Therefore when printing the information of all datasets within a database, the information (e.g., dataset name) will be printed on separate lines before the description. However, when printing column information, the output has the same format as a similar option in Table (see Invoking Table).

Important note to consider: the printed order of the datasets or columns is just for displaying in the printed output. You cannot ask for datasets or columns based on the printed order, you need to use dataset or column names.

-L STR

--limitinfo=STR

Limit the information that is downloaded and displayed (with --information) to those that have the string given to this option in their description. Note that this is case-sensitive. This option is only relevant when --information is also called.

Databases may have thousands (or tens of thousands) of datasets. Therefore just the metadata (information) to show with --information can be tens of megabytes (for example, the full VizieR metadata file is about 23 MB as of January 2021). Once downloaded, it can also be hard to parse manually. With --limitinfo, only the metadata of datasets that contain this string in their description will be downloaded and displayed, greatly improving the speed of finding your desired dataset.

-Q "STR"

--query="STR"

Directly specify the query to be passed onto the database. The queries will generally contain spaces and other meta-characters, so we recommend placing the query within quotation marks.
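For example, the sketch below passes a raw ADQL statement to the Gaia database. The table name gaiadr3.gaia_source and the ADQL syntax are assumptions about what the ESA Gaia server accepts (they are not defined by Query), so check the server's own documentation before relying on them. --dry-run is used so that nothing is downloaded.

```shell
# Raw ADQL statement (assumed table name and syntax; see above).
query="SELECT TOP 5 source_id, ra, dec FROM gaiadr3.gaia_source"

if command -v astquery >/dev/null 2>&1; then
    # The double quotes keep the whole query as a single argument.
    astquery gaia --query="$query" --dry-run
else
    echo "astquery not found; would run:"
    echo "astquery gaia --query=\"$query\" --dry-run"
fi
```

Keeping the statement in a variable makes the quoting explicit and lets you reuse the same query with different output names.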

-s STR

--dataset=STR

The dataset to query within the database (not compatible with --query). This option is mandatory when --query or --information are not provided. You can see the list of available datasets within a database using --information (possibly supplemented by --limitinfo). The output of --information will contain the recognized name of the datasets within that database. You can pass the recognized name directly to this option. For more on finding and using your desired database, see Available databases.

-c STR

--column=STR[,STR[,...]]

The column name(s) to retrieve from the dataset in the given order (not compatible with --query). If not given, all the dataset’s columns for the selected rows will be queried (which can be large!). This option can take multiple values in one instance (for example, --column=ra,dec,mag), or in multiple instances (for example, -cra -cdec -cmag), or mixed (for example, -cra,dec -cmag).

If you do not know the full list of the dataset’s column names a-priori, and you do not want to download all the columns (which can greatly increase your download time), you can use the --information option combined with the --dataset option, see Available databases.

-H INT

--head=INT

Only ask for the first INT rows of the finally selected columns, not all the rows. This can be good when your search can result in a large dataset, but before downloading the full volume, you want to see the top rows and get a feeling of what the whole dataset looks like.

-v FITS

--overlapwith=FITS

File name of FITS file containing an image (in the HDU given by --hdu) to use for identifying the region to query in the given database and dataset. Based on the image’s WCS and pixel size, the sky coverage of the image is estimated and values for --center and --width will be calculated internally. Hence this option cannot be used with --center, --width or --radius. Also, since it internally generates the query, it cannot be used with --query.

Note that if the image has WCS distortions and the reference point for the WCS is not within the image, the WCS will not be well-defined. Therefore the resulting catalog may not overlap, or correspond to a larger/smaller area in the sky.

-C FLT,FLT

--center=FLT,FLT

The spatial center position (mostly RA and Dec) to use for the automatically generated query (not compatible with --query). The comma-separated values can either be in degrees (a single number), or sexagesimal (_h_m_ for RA, _d_m_ for Dec, or _:_:_ for both).

The given values will be compared to two columns in the dataset, and only the rows within a certain region around this center position will be requested and downloaded. Pre-defined RA and Dec column names are defined in Query for every database; however, you can use --ccol to select other columns to use instead. The region can either be a circle around the point (configured with --radius) or a box/rectangle around the point (configured with --width).
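Query accepts sexagesimal values directly, but it can still be useful to know the corresponding degrees, for example when comparing a center with the RA and Dec columns of a downloaded table. The sketch below does the conversion with AWK; the function name sex2deg is hypothetical and is not part of Gnuastro.

```shell
# Convert a sexagesimal string to degrees.
#   $1: value like 07h35m29.51, 31d54m9.77 or 07:35:29.51
#   $2: 15 when the first field is in hours (RA), 1 for degrees.
sex2deg() {
    echo "$1" | awk -F'[hdm:]' -v f="$2" '{
        s = ($1 ~ /^-/) ? -1 : 1     # Keep the sign separately.
        a = ($1 < 0) ? -$1 : $1      # Absolute value of first field.
        printf "%.6f\n", s * (a + $2/60 + $3/3600) * f
    }'
}

sex2deg 07h35m29.51 15    # 113.872958
sex2deg 31d54m9.77 1      # 31.902714
```

The results agree with the decimal center used in the one-line examples above to within the rounding of the sexagesimal seconds.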

--ccol=STR,STR

The name of the coordinate-columns in the dataset to compare with the values given to --center. Query will use its internal defaults for each dataset (for example, RAJ2000 and DEJ2000 for VizieR data). But each dataset is treated separately and it is not guaranteed that these columns exist in all datasets. Also, more than one coordinate system/epoch may be present in a dataset and you can use this option to construct your spatial constraint based on the other coordinate systems/epochs.

-r FLT

--radius=FLT

The radius about the requested center to use for the automatically generated query (not compatible with --query). The radius is in units of degrees, but you can use simple division with this option directly on the command-line. For example, if you want a radius of 20 arc-minutes or 20 arc-seconds, you can use --radius=20/60 or --radius=20/3600 respectively (which is much more human-friendly than 0.3333 or 0.005556).
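The arithmetic behind the two values quoted above can be checked with a few lines of AWK. The helper name to_deg below is hypothetical; it only shows what the division on the command-line evaluates to, and is not needed when calling Query.

```shell
# Convert an angle to degrees.
#   $1: the value; $2: its unit, either 'arcmin' or 'arcsec'.
to_deg() {
    awk -v v="$1" -v u="$2" 'BEGIN {
        d = (u == "arcsec") ? 3600 : 60
        printf "%.6f\n", v / d
    }'
}

to_deg 20 arcmin    # 0.333333
to_deg 20 arcsec    # 0.005556
```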

-w FLT[,FLT]

--width=FLT[,FLT]

The square (or rectangle) side length (width) about the requested center to use for the automatically generated query (not compatible with --query). If only one value is given to --width the region will be a square, but if two values are given, the widths of the query box along each dimension will be different. The value(s) is (are) in the same units as the coordinate column (see --ccol, usually RA and Dec which are degrees). You can use simple division for each value directly on the command-line if you want relatively small (and more human-friendly) sizes. For example, if you want your box to be 1 arc-minute along the RA and 2 arc-minutes along Dec, you can use --width=1/60,2/60.

-g STR,FLT,FLT

--range=STR,FLT,FLT

The column name and numerical range (inclusive) of acceptable values in that column (not compatible with --query). This option can be called multiple times for applying range limits on many columns in one call (thus greatly reducing the download size). For example, when used on the ESA gaia database, you can use --range=phot_g_mean_mag,10:15 to only get rows that have a value between 10 and 15 (inclusive on both sides) in the phot_g_mean_mag column.

If you want all rows larger, or smaller, than a certain number, you can use inf as the second value, or -inf as the first value, respectively. For example, if you want objects with SDSS spectroscopic redshifts larger than 2 (from the VizieR sdss12 dataset), you can use --range=zsp,2,inf.

If you want the interval to not be inclusive on both sides, you can run astquery once and get the command that it executes. Then you can edit it to be non-inclusive on your desired side.
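As a sketch of calling --range several times, the snippet below appends one --range option per constraint and prints the resulting command for review. The phot_g_mean_mag column is the one mentioned above; parallax is an assumed column name used only for illustration, so confirm it with --information before using it.

```shell
# Base options (taken from the one-line examples above).
opts="gaia --dataset=dr3 --center=113.8729761,31.9027152 --radius=1/60"

# Each entry is COLUMN,MIN,MAX; '-inf' leaves the lower side open.
for r in "phot_g_mean_mag,10,15" "parallax,-inf,5"; do
    opts="$opts --range=$r"
done

# Print the command instead of running it.
echo "astquery $opts"
```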

-b STR[,STR]

--noblank=STR[,STR]

Only ask for rows that do not have a blank value in the STR column. This option can be called many times, and each call can have multiple column names (separated by a comma). For example, if you want the retrieved rows to not have a blank value in columns A, B, C and D, you can use --noblank=A -bB,C,D.

--sort=STR[,STR]

Ask for the server to sort the downloaded data based on the given columns. For example, let’s assume your desired catalog has column Z for redshift and column MAG_R for magnitude in the R band. When you call --sort=Z,MAG_R, it will primarily sort the rows based on the redshift, but if two objects have the same redshift, they will be sorted by magnitude. You can add as many columns as you like for higher-level sorting.
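The row-selection options can be combined in one call. The sketch below, using the sdss12 column names from the one-line examples above, asks for rows with a non-blank redshift, sorted by redshift, and limited to the first 10 rows. It is only an illustration of how the options fit together; --dry-run is used so that only the download command is printed.

```shell
# Spatial constraint and columns (from the examples above).
opts="vizier --dataset=sdss12 --center=113.8729761,31.9027152 --radius=1/60"
opts="$opts -cRA_ICRS,DE_ICRS,zsp"

# Row selection: no blank redshifts, sorted by redshift, top 10 rows.
opts="$opts --noblank=zsp --sort=zsp --head=10"

if command -v astquery >/dev/null 2>&1; then
    astquery $opts --dry-run
else
    echo "astquery not found; would run: astquery $opts --dry-run"
fi
```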