GNU Astronomy Utilities

This book documents version 0.22 of the GNU Astronomy Utilities (Gnuastro). Gnuastro provides various programs and libraries for astronomical data manipulation and analysis.

Copyright © 2015-2024 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.

To navigate easily in this web page, you can use the Next, Previous, Up and Contents links at the top and bottom of each page. Next and Previous will take you to the next or previous topic at the same level, for example, from chapter 1 to chapter 2 or vice versa. To go to the sections or subsections, you have to click on the menu entries that appear whenever a title has sub-components.


1 Introduction

GNU Astronomy Utilities (Gnuastro) is an official GNU package consisting of separate programs and libraries for the manipulation and analysis of astronomical data. All the programs share the same basic command-line user interface for the comfort of both users and developers. Gnuastro is written to comply fully with the GNU coding standards, so it integrates well with the GNU/Linux operating system. This also enables astronomers to expect a fully familiar experience in the source code, building, installing and command-line user interaction that they have seen in all the other GNU software that they use. The official and always up-to-date version of this book (or manual) is freely available under the GNU Free Documentation License in various formats (PDF, HTML, plain text, Info, and as its Texinfo source) at http://www.gnu.org/software/gnuastro/manual/.

For users who are new to the GNU/Linux environment, unless otherwise specified most of the topics in Installation and Common program behavior are common to all GNU software, for example, installation, managing command-line options or getting help (also see New to GNU/Linux?). So if you are new to this empowering environment, we encourage you to go through these chapters carefully. They can be a starting point from which you can continue to learn more from each program’s own manual and fully benefit from and enjoy this wonderful environment. Gnuastro also comes with a large set of libraries, so you can write your own programs using Gnuastro’s building blocks, see Review of library fundamentals for an introduction.

In Gnuastro, no change to any program or library will be committed to its history before it has been fully documented here first. As discussed in Gnuastro manifesto: Science and its tools, this is a founding principle of Gnuastro.


1.1 Quick start

The latest official release tarball is always available as gnuastro-latest.tar.lz. The Lzip format is used for better compression (smaller output size, thus faster download), and robust archival features and standards. For historical reasons (those users that do not yet have Lzip), the Gzip’d tarball is available at the same URL (just change the .lz suffix above to .gz; however, the Lzip’d file is recommended). See Release tarball for more details on the tarball release.

Let’s assume the downloaded tarball is in the TOPGNUASTRO directory. You can follow the commands below to download and uncompress the Gnuastro source. You need to have the lzip program for the decompression (see Dependencies from package managers). If your Tar implementation does not recognize Lzip (the third command fails), run the fourth command. Note that lines starting with ## do not need to be typed (they are only a description of the following command):

## Go into the download directory.
$ cd TOPGNUASTRO

## If you do not already have the tarball, you can download it:
$ wget http://ftp.gnu.org/gnu/gnuastro/gnuastro-latest.tar.lz

## If this fails, run the next command.
$ tar -xf gnuastro-latest.tar.lz

## Only when the previous command fails.
$ lzip -cd gnuastro-latest.tar.lz | tar -xf -

Gnuastro has three mandatory dependencies and some optional dependencies for extra functionality, see Dependencies for the full list. In Dependencies from package managers we have prepared the commands to easily install Gnuastro’s dependencies using the package managers of some operating systems. When the mandatory dependencies are ready, you can configure, compile, check and install Gnuastro on your system with the following commands. See Known issues if you encounter any complications.

$ cd gnuastro-X.X                  # Replace X.X with version number.
$ ./configure
$ make -j8                         # Replace 8 with no. CPU threads.
$ make check -j8                   # Replace 8 with no. CPU threads.
$ sudo make install

For each program there is an ‘Invoke ProgramName’ sub-section in this book which explains how the programs should be run on the command-line (for example, see Invoking Table).

In Tutorials, we have prepared some complete tutorials with common Gnuastro usage scenarios in astronomical research. They even contain links to download the necessary data, and thoroughly describe every step of the process (the science, statistics and optimal usage of the command-line). We therefore recommend reading (and running the commands in) the tutorials before starting to use Gnuastro.


1.2 Gnuastro programs list

One of the most common ways to operate Gnuastro is through its command-line programs. For some tutorials on several real-world usage scenarios, see Tutorials. The list here is just provided as a general summary for those who are new to Gnuastro.

GNU Astronomy Utilities 0.22 contains the following programs. They are sorted in alphabetical order and a short description is provided for each program. The description starts with the executable name in this font, followed by a pointer to the respective section in parentheses. Throughout this book, the programs are ordered based on their context; please see the top-level contents for contextual ordering (based on what they do).

Arithmetic

(astarithmetic, see Arithmetic) For arithmetic operations on an arbitrary (theoretically unlimited) number of datasets (images). It has a large and growing set of arithmetic, mathematical, and even statistical operators (for example, +, -, *, /, sqrt, log, min, average, median; see Arithmetic operators).
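
As a minimal sketch of the reverse-polish calling style (image.fits is a hypothetical input file; --output is Gnuastro’s common option for naming the output), the command below would take the square root of every pixel:

$ astarithmetic image.fits sqrt --output=sqrt.fits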

BuildProgram

(astbuildprog, see BuildProgram) Compile, link and run custom C programs that depend on the Gnuastro library (see Gnuastro library). This program will automatically link with the libraries that Gnuastro depends on, so there is no need to explicitly mention them every time you are compiling a Gnuastro library dependent program.
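
For example (a sketch, assuming a hypothetical myprogram.c that uses Gnuastro’s library), the single command below would compile, link and run it:

$ astbuildprog myprogram.c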

ConvertType

(astconvertt, see ConvertType) Convert astronomical data files (FITS or IMH) to and from several other standard image and data formats, for example, TXT, JPEG, EPS or PDF. Optionally, it is also possible to add vector graphics markers over the output image (for example, circles from catalogs containing RA or Dec).
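
As a sketch (image.fits is a hypothetical input; the output format is selected from the suffix given to --output), the command below would produce a PDF:

$ astconvertt image.fits --output=image.pdf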

Convolve

(astconvolve, see Convolve) Convolve (blur or smooth) data with a given kernel in the spatial or frequency domain on multiple threads. Convolve can also do deconvolution to find the appropriate kernel to PSF-match two images.
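
For example (a sketch with hypothetical file names), spatial-domain convolution with a given kernel could be run as below:

$ astconvolve image.fits --kernel=kernel.fits --domain=spatial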

CosmicCalculator

(astcosmiccal, see CosmicCalculator) Do cosmological calculations, for example, the luminosity distance, distance modulus, comoving volume and many more.
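
As a sketch, giving only a redshift would print the various cosmological measures for it (see Invoking CosmicCalculator for the options that select individual measurements):

$ astcosmiccal --redshift=2.5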

Crop

(astcrop, see Crop) Crop region(s) from one or many image(s) and stitch several images if necessary. Input coordinates can be in pixel coordinates or world coordinates.
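
For example (a sketch with a hypothetical file name and coordinates), the command below would crop a 201x201 pixel box centered on pixel (500,500):

$ astcrop image.fits --mode=img --center=500,500 --width=201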

Fits

(astfits, see Fits) View and manipulate FITS file extensions and header keywords.
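
For example (a sketch with a hypothetical file name), the first command below would list the file’s extensions (HDUs) and the second would print the header keywords of the first extension:

$ astfits image.fits
$ astfits image.fits -h1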

MakeCatalog

(astmkcatalog, see MakeCatalog) Make a catalog of a labeled image (the output of NoiseChisel). The catalogs are highly customizable and adding new calculations/columns is very straightforward.

MakeProfiles

(astmkprof, see MakeProfiles) Make mock 2D profiles in an image. The central regions of radial profiles are made with a configurable 2D Monte Carlo integration. It can also build the profiles on an over-sampled image.

Match

(astmatch, see Match) Given two input catalogs, find the rows that match with each other within a given aperture (may be an ellipse).

NoiseChisel

(astnoisechisel, see NoiseChisel) Detect signal in noise. It uses a technique to detect very faint and diffuse, irregularly shaped signal in noise (galaxies in the sky), using thresholds that are below the Sky value, see Akhlaghi and Ichikawa 2015.
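
As a sketch (hypothetical file name; in practice the options are usually tuned to the dataset), detection can be run simply as below:

$ astnoisechisel image.fits --output=detected.fits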

Query

(astquery, see Query) High-level interface to query pre-defined remote, or external databases, and directly download the required sub-tables on the command-line.

Segment

(astsegment, see Segment) Segment detected regions based on the structure of signal and the input dataset’s noise properties.

Statistics

(aststatistics, see Statistics) Statistical calculations on the input dataset (column in a table, image or datacube). This includes many operations such as generating histograms, sigma clipping, and least squares fitting.
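
For example (a sketch with a hypothetical file name), running it with no options prints a set of basic statistics and a rough histogram of the input:

$ aststatistics image.fits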

Table

(asttable, see Table) Convert FITS binary and ASCII tables into other such tables, print them on the command-line, save them in a plain text file, do arithmetic on the columns or get the FITS table information. For a full list of operations, see Operation precedence in Table.
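
As a sketch (hypothetical catalog and column names), the first command below would print the table’s column metadata and the second would print only two of its columns:

$ asttable catalog.fits --information
$ asttable catalog.fits --column=RA,DEC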

Warp

(astwarp, see Warp) Warp images to a new pixel grid. By default it will align the pixel and WCS coordinates, removing any non-linear WCS distortions. Any linear warp (projective transformation or homography) can also be applied to the input images by explicitly calling the respective operation.
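
For example (a sketch with a hypothetical file name), the first command below would align the image to the WCS coordinates (the default operation) and the second would apply a simple rotation:

$ astwarp image.fits
$ astwarp image.fits --rotate=20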

The programs listed above are designed to be highly modular and generic. Hence, they are naturally suited to lower-level operations. In Gnuastro, higher-level operations (combining multiple programs, or running a program in a special way) are done with installed Bash scripts (all prefixed with astscript-). They can be run just like a program and behave very similarly (with minor differences, see Installed scripts).

astscript-ds9-region

(See SAO DS9 region files from table) Given a table (either as a file or from standard input), create an SAO DS9 region file from the requested positional columns (WCS or image coordinates).

astscript-fits-view

(see Viewing FITS file contents with DS9 or TOPCAT) Given any number of FITS files, this script will either open SAO DS9 (for images or cubes) or TOPCAT (for tables) to view them in a graphic user interface (GUI).

astscript-pointing-simulate

(See Pointing pattern simulation) Given a table of pointings on the sky and a reference image that contains your camera’s distortions and properties, generate a stacked exposure map. This is very useful in testing the coverage of dither patterns when designing your observing strategy and it is highly customizable. See Akhlaghi 2023, or the dedicated tutorial in Pointing pattern design.

astscript-radial-profile

(See Generate radial profile) Calculate the radial profile of an object within an image. The object can be at any location in the image, various measures (median, sigma-clipped mean, etc.) can be used, and the radial distance can also be measured on any general ellipse. See Infante-Sainz et al. 2024.

astscript-color-faint-gray

(see Color images with gray faint regions) Given three images for the Red-Green-Blue (RGB) channels, this script will use the bright pixels for color and will show the faint/diffuse regions in grayscale. This greatly helps in visualizing the full dynamic range of astronomical data. See Infante-Sainz et al. 2024 or a dedicated tutorial in Color images with full dynamic range.

astscript-sort-by-night

(See Sort FITS files by night) Given a list of FITS files, and a HDU and keyword name (for a date), this script groups the files taken in the same night (possibly spanning two calendar days).

astscript-zeropoint

(see Zero point estimation) Estimate the zero point (to calibrate pixel values) of an input image using a reference image or a reference catalog. This is necessary to produce measurements with physical units from new images. See Eskandarlou et al. 2023, or a dedicated tutorial in Zero point of an image.

astscript-psf-*

The following scripts are used for estimating and subtracting the extended PSF as described in the tutorial Building the extended PSF:

astscript-psf-select-stars

(see Invoking astscript-psf-select-stars) Find all the stars within an image that are suitable for constructing an extended PSF. If the image has WCS, this script can automatically query Gaia to find the good stars.

astscript-psf-stamp

(see Invoking astscript-psf-stamp) Build a crop (stamp) of a certain width around a star at a certain coordinate in a larger image. This script will do sub-pixel re-positioning to make sure the star is centered and can optionally mask all other background sources.

astscript-psf-scale-factor

(see Invoking astscript-psf-scale-factor) Given a PSF model, and the central coordinates of a star in an image, find the scale factor that has to be multiplied by the PSF to scale it to that star.

astscript-psf-unite

(see Invoking astscript-psf-unite) Unite the various components of a PSF into one. Because of saturation and non-linearity, to get a good estimate of the extended PSF, it is necessary to construct various parts from different magnitude ranges.

astscript-psf-subtract

(see Invoking astscript-psf-subtract) Given the model of a PSF and the central coordinates of a star in the image, do sub-pixel re-positioning of the PSF, scale it to the star and subtract it from the image.


1.3 Gnuastro manifesto: Science and its tools

History of science indicates that there are always inevitably unseen faults, hidden assumptions, simplifications and approximations in all our theoretical models, data acquisition and analysis techniques. It is precisely these that will ultimately allow future generations to advance the existing experimental and theoretical knowledge through their new solutions and corrections.

In the past, scientists would gather data and process them individually to achieve an analysis, thus having a much more intricate knowledge of the data and analysis. The theoretical models also required little (if any) simulation to compare with the data. Today both methods are becoming increasingly more dependent on pre-written software. Scientists are dissociating themselves from the intricacies of reducing raw observational data in experimentation or from bringing the theoretical models to life in simulations. These ‘intricacies’ are precisely those unseen faults, hidden assumptions, simplifications and approximations that define scientific progress.

Unfortunately, most persons who have recourse to a computer for statistical analysis of data are not much interested either in computer programming or in statistical method, being primarily concerned with their own proper business. Hence the common use of library programs and various statistical packages. ... It’s time that was changed.

F.J. Anscombe. The American Statistician, Vol. 27, No. 1. 1973

Anscombe’s quartet demonstrates how four data sets with widely different shapes (when plotted) give nearly identical output from standard regression techniques. Anscombe uses this (now famous) quartet, which was introduced in the paper quoted above, to argue that “Good statistical analysis is not a purely routine matter, and generally calls for more than one pass through the computer”. Echoing Anscombe’s concern after 44 years, some of the highly recognized statisticians of our time (Leek, McShane, Gelman, Colquhoun, Nuijten and Goodman), wrote in Nature that:

We need to appreciate that data analysis is not purely computational and algorithmic – it is a human behavior.... Researchers who hunt hard enough will turn up a result that fits statistical criteria – but their discovery will probably be a false positive.

Five ways to fix statistics, Nature, 551, Nov 2017.

Users of statistical (scientific) methods (software) are therefore not passive (objective) agents in their results. It is necessary to actually understand the method, not just use it as a black box. The subjective experience gained by frequently using a method/software is not sufficient to claim an understanding of how the tool/method works and how relevant it is to the data and analysis. This kind of subjective experience is prone to serious misunderstandings about the data, what the software/statistical-method really does (especially as it gets more complicated), and thus the scientific interpretation of the result. This attitude is further encouraged through non-free software, poorly written (or non-existent) scientific software manuals, and non-reproducible papers. This approach to scientific software and methods only helps in producing dogmas and an “obscurantist faith in the expert’s special skill, and in his personal knowledge and authority”.

Program or be programmed. Choose the former, and you gain access to the control panel of civilization. Choose the latter, and it could be the last real choice you get to make.

Douglas Rushkoff. Program or be programmed, O/R Books (2010).

It is obviously impractical for any one human being to gain the intricate knowledge explained above for every step of an analysis. On the other hand, scientific data can be large and numerous, for example, images produced by telescopes in astronomy. This requires efficient algorithms. To make things worse, natural scientists have generally not been trained in the advanced software techniques, paradigms and architecture that are taught in computer science or engineering courses and thus used in most software. The GNU Astronomy Utilities are an effort to tackle this issue.

Gnuastro is not just software: this book is as important to the idea behind Gnuastro as the source code (software). This book has tried to learn from the success of the “Numerical Recipes” book in educating those who are not software engineers and computer scientists but still heavy users of computational algorithms, like astronomers. There are two major differences.

The first difference is that Gnuastro’s code and the background information are segregated: the code is kept within the actual Gnuastro software source code and the underlying explanations are given here in this book. In the source code, every non-trivial step is heavily commented and correlated with this book; it follows the same logic as this book, and all the programs follow a similar internal data, function and file structure, see Program source. Complementing the code, this book focuses on thoroughly explaining the concepts behind those codes (history, mathematics, science, software and usage advice when necessary) along with detailed instructions on how to run the programs. At the expense of frustrating “professionals” or “experts”, this book and the comments in the code also intentionally avoid jargon and abbreviations. The source code and this book are thus intimately linked, and when considered as a single entity can be thought of as a real (an actual software accompanying the algorithms) “Numerical Recipes” for astronomy.

The second major, and arguably more important, difference is that “Numerical Recipes” does not allow you to distribute any code that you have learned from it. In other words, it does not allow you to release your software’s source code if you have used their codes, you can only publicly release binaries (a black box) to the community. Therefore, while it empowers the privileged individual who has access to it, it exacerbates social ignorance. Exactly at the opposite end of the spectrum, Gnuastro’s source code is released under the GNU general public license (GPL) and this book is released under the GNU free documentation license. You are therefore free to distribute any software you create using parts of Gnuastro’s source code or text, or figures from this book, see Your rights.

With these principles in mind, Gnuastro’s developers aim to impose the minimum requirements on you (in computer science, engineering and even the mathematics behind the tools) to understand and modify any step of Gnuastro if you feel the need to do so, see Why C programming language? and Program design philosophy.

Without prior familiarity and experience with optics, it is hard to imagine how Galileo could have come up with the idea of modifying the Dutch military telescope optics to use in astronomy. Astronomical objects could not be seen with the Dutch military design of the telescope. In other words, it is unlikely that Galileo could have asked a random optician to make modifications (not understood by Galileo) to the Dutch design, to do something no astronomer of the time took seriously. In the paradigm of the day, what could be the purpose of enlarging geometric spheres (planets) or points (stars)? In that paradigm only the position and movement of the heavenly bodies was important, and that had already been accurately studied (recently by Tycho Brahe).

In the beginning of his “The Sidereal Messenger” (published in 1610) he cautions the readers on this issue and, before describing his results/observations, Galileo instructs us on how to build a suitable instrument. Without a detailed description of how he made his tools and did his observations, no reasonable person would believe his results. Before he actually saw the moons of Jupiter, the mountains on the Moon or the crescent of Venus, Galileo was “evasive” to Kepler. Science is defined by its tools/methods, not its raw results.

The same is true today: science cannot progress with a black box, or poorly released code. The source code of a research project is the new (abstracted) communication language in science, understandable by humans and computers. Source code (in any programming language) is a language/notation designed to express all the details that would be too tedious/long/frustrating to report in spoken languages like English, similar to mathematical notation.

An article about computational science [almost all sciences today] ... is not the scholarship itself, it is merely advertising of the scholarship. The Actual Scholarship is the complete software development environment and the complete set of instructions which generated the figures.

Buckheit & Donoho, Lecture Notes in Statistics, Vol 103, 1996

Today, the quality of the source code that goes into a scientific result (and the distribution of that code) is as critical to scientific vitality and integrity as the quality of the written language/English used in publishing/distributing its paper. A scientific paper will not even be reviewed by any respectable journal if it is written in poor language/English. A similar level of quality assessment is thus increasingly becoming necessary regarding the codes/methods used to derive the results of a scientific paper. For more on this, please see Akhlaghi et al. 2021.

Bjarne Stroustrup (creator of the C++ language) says: “Without understanding software, you are reduced to believing in magic”. Ken Thompson (the designer of the Unix operating system) says: “I abhor a system designed for the ‘user’ if that word is a coded pejorative meaning ‘stupid and unsophisticated’.” Certainly no scientist (user of a scientific software) would want to be considered a believer in magic, or stupid and unsophisticated.

This can happen when scientists get too distant from the raw data and methods, and are mainly discussing results. In other words, when they feel they have tamed Nature into their own high-level (abstract) models (creations), and are mainly concerned with scaling up, or industrializing those results. Roughly five years before special relativity, and about two decades before quantum mechanics fundamentally changed Physics, Lord Kelvin is quoted as saying:

There is nothing new to be discovered in physics now. All that remains is more and more precise measurement.

William Thomson (Lord Kelvin), 1900

A few years earlier Albert. A. Michelson made the following statement:

The more important fundamental laws and facts of physical science have all been discovered, and these are now so firmly established that the possibility of their ever being supplanted in consequence of new discoveries is exceedingly remote.... Our future discoveries must be looked for in the sixth place of decimals.

Albert. A. Michelson, dedication of Ryerson Physics Lab, U. Chicago 1894

If scientists are considered to be more than mere puzzle solvers (simply adding to the decimals of existing values or observing a feature in 10, 100, or 100000 more galaxies or stars, as Kelvin and Michelson clearly believed), they cannot just passively sit back and uncritically repeat the previous (observational or theoretical) methods/tools on new data. Today there is a wealth of raw telescope images ready (mostly for free) at the fingertips of anyone who is interested and has a fast enough internet connection to download them. The only thing lacking is new ways to analyze this data and dig out the treasures that remain hidden to existing methods and techniques.

New data that we insist on analyzing in terms of old ideas (that is, old models which are not questioned) cannot lead us out of the old ideas. However many data we record and analyze, we may just keep repeating the same old errors, missing the same crucially important things that the experiment was competent to find.

Jaynes, Probability theory, the logic of science. Cambridge U. Press (2003).

1.4 Your rights

The paragraphs below, in this section, belong to the GNU Texinfo manual and are not written by us! The name “Texinfo” is just changed to “GNU Astronomy Utilities” or “Gnuastro” because they are released under the same licenses and it is beautifully written to inform you of your rights.

GNU Astronomy Utilities is “free software”; this means that everyone is free to use it and free to redistribute it on certain conditions. Gnuastro is not in the public domain; it is copyrighted and there are restrictions on its distribution, but these restrictions are designed to permit everything that a good cooperating citizen would want to do. What is not allowed is to try to prevent others from further sharing any version of Gnuastro that they might get from you.

Specifically, we want to make sure that you have the right to give away copies of the programs that relate to Gnuastro, that you receive the source code or else can get it if you want it, that you can change these programs or use pieces of them in new free programs, and that you know you can do these things.

To make sure that everyone has such rights, we have to forbid you to deprive anyone else of these rights. For example, if you distribute copies of the Gnuastro related programs, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must tell them their rights.

Also, for our own protection, we must make certain that everyone finds out that there is no warranty for the programs that relate to Gnuastro. If these programs are modified by someone else and passed on, we want their recipients to know that what they have is not what we distributed, so that any problems introduced by others will not reflect on our reputation.

The full text of the licenses for the Gnuastro software and book can be respectively found in GNU Gen. Pub. License v3 and GNU Free Doc. License.


1.5 Logo of Gnuastro

Gnuastro’s logo is an abstract image of a barred spiral galaxy. The galaxy is vertically cut in half: on the left side, the beauty of a contiguous galaxy image is visible. But on the right, the image gets pixelated, and we only see the parts that are within the pixels. The pixels that are closer to the center of the galaxy (which is brighter) are also larger. But as we follow the spiral arms (and get more distant from the center), the pixels get smaller (signifying less signal).

This sharp distinction between the contiguous and pixelated view of the galaxy signifies the main struggle in science: in the “real” world, objects are not pixelated or discrete and have no noise. However, when we observe nature, we are confined and constrained by the resolution of our data collection (CCD imager in this case).

On the other hand, we read English text from the left and progress towards the right. This defines the positioning of the “real” and observed halves of the galaxy: the noise-free and contiguous half (on the left) passes through our observing tools and becomes the pixelated and noisy half (on the right). It is the job of scientific software like Gnuastro to help interpret the underlying mechanisms of the “real” universe from the pixelated and noisy data.

Gnuastro’s logo was designed by Marjan Akbari. The concept behind it was created after several design iterations with Mohammad Akhlaghi.


1.6 Naming convention

Gnuastro is a package of independent programs and a collection of libraries; here we are mainly concerned with the programs. Each program has an official name which consists of one or two words, describing what it does. These names are printed with no space, for example, NoiseChisel or Crop. On the command-line, you can run them with their executable names, which start with ast and might be an abbreviation of the official name, for example, astnoisechisel or astcrop, see Executable names.

We will use “ProgramName” for a generic official program name and astprogname for a generic executable name. In this book, the programs are classified based on what they do and thoroughly explained. An alphabetical list of the programs that are installed on your system is given in Gnuastro programs list. That list also contains the executable names and version numbers along with a one-line description.


1.7 Version numbering

Gnuastro can have two formats of version numbers, for official and unofficial releases. Official Gnuastro releases are announced on the info-gnuastro mailing list, they have a version control tag in Gnuastro’s development history, and their version numbers are formatted like “A.B”. A is a major version number, marking a significant planned achievement (for example, see GNU Astronomy Utilities 1.0), while B is a minor version number, see below for more on the distinction. Note that the numbers are not decimals, so version 2.34 is much more recent than version 2.5, which is not equal to 2.50.

Gnuastro also allows a unique version number for unofficial releases. Unofficial releases can mark any point in Gnuastro’s development history. This is done to allow astronomers to easily use any point in the version controlled history for their data-analysis and research publication. See Version controlled source for a complete introduction. This section is not just for developers and is intended to be straightforward and easy to read, so please have a look if you are interested in the cutting-edge. This unofficial version number is a meaningful and easy to read string of characters, unique to that particular point of history. With this feature, users can easily stay up to date with the most recent bug fixes and additions that are committed between official releases.

The unofficial version number is formatted like A.B.C-D. A and B are the most recent official version number. C is the number of commits that have been made after version A.B. D is the first 4 or 5 characters of the commit hash. Therefore, the unofficial version number ‘3.92.8-29c8’ corresponds to the 8th commit after the official version 3.92, whose commit hash begins with 29c8. The unofficial version number is sortable (unlike the raw hash) and, as shown above, is descriptive of the state of the unofficial release. Of course an official release is preferred for publication (since its tarballs are easily available and it has gone through more tests, making it more stable), so if an official release is announced prior to your publication’s final review, please consider updating to the official release.
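
As a rough illustration of what the two extra components mean (not necessarily how Gnuastro itself generates the string; the tag name gnuastro_vA.B below is hypothetical), they could be derived from a clone of the version controlled history with standard Git commands:

## Number of commits after the last official release (the 'C' component).
$ git rev-list --count gnuastro_vA.B..HEAD

## First few characters of the current commit hash (the 'D' component).
$ git rev-parse --short=4 HEAD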

The major version number is set by a major goal which is defined by the developers and user community before hand, for example, see GNU Astronomy Utilities 1.0. The incremental work done in minor releases are commonly small steps in achieving the major goal. Therefore, there is no limit on the number of minor releases and the difference between the (hypothetical) versions 2.927 and 3.0 can be a small (negligible to the user) improvement that finalizes the defined goals.


1.7.1 GNU Astronomy Utilities 1.0

Currently (prior to Gnuastro 1.0), the aim of Gnuastro is to have a complete system for data manipulation and analysis at least similar to IRAF. So an astronomer can take all the standard data analysis steps (starting from raw data to the final reduced product and standard post-reduction tools) with the various programs in Gnuastro.

The maintainers of each camera or detector on a telescope can provide a completely transparent shell script or Makefile to the observer for data analysis. This script can set configuration files for all the required programs to work with that particular camera. The script can then run the proper programs in the proper sequence. The user/observer can easily follow the standard shell script to understand (and modify) each step and the parameters used. Bash (or other modern GNU/Linux shells) is powerful and made for this gluing job. This will simultaneously improve performance and transparency. Shell scripts (or Makefiles) are also basic constructs that are easy to learn and readily available as part of Unix-like operating systems. If there is no program to do a desired step, Gnuastro’s libraries can be used to build specific programs.
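
A minimal sketch of such a glue script is shown below (all file and configuration names are hypothetical, and the programs and their options depend on the instrument and the desired reduction; --config is the common Gnuastro option for loading a configuration file):

#!/bin/bash
## Hypothetical per-camera configuration files hold the instrument-specific
## options; the observer only has to follow (or modify) the sequence below.
astnoisechisel exposure.fits   --config=camera-noisechisel.conf \
               --output=detections.fits
astsegment     detections.fits --config=camera-segment.conf \
               --output=segments.fits
astmkcatalog   segments.fits   --config=camera-mkcatalog.conf \
               --output=catalog.fits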

The main factor is that all observatories or projects can freely contribute to Gnuastro and all simultaneously benefit from it (since it does not belong to any particular one of them), much like how for-profit organizations (for example, RedHat, or Intel and many others) are major contributors to free and open source software for their shared benefit. Gnuastro’s copyright has been fully awarded to GNU, so it does not belong to any particular astronomer or astronomical facility or project.


1.8 New to GNU/Linux?

Some astronomers initially install and use a GNU/Linux operating system because their necessary tools can only be installed in this environment. However, the transition is not necessarily easy. To encourage you in investing the patience and time to make this transition, and actually enjoy it, we will first start with a basic introduction to GNU/Linux operating systems. Afterwards, in Command-line interface we will discuss the wonderful benefits of the command-line interface, how it beautifully complements the graphic user interface, and why it is worth the (apparently steep) learning curve. Finally a complete chapter (Tutorials) is devoted to real world scenarios of using Gnuastro (on the command-line). Therefore if you do not yet feel comfortable with the command-line we strongly recommend going through that chapter after finishing this section.

You might have already noticed that we are not using the name “Linux”, but “GNU/Linux”. Please take the time to have a look at the following essays and FAQs for a complete understanding of this very important distinction.

In short, the Linux kernel is built using the GNU C library (glibc) and the GNU compiler collection (gcc). The Linux kernel software alone is just a means for other software to access the hardware resources; it is useless alone! A normal astronomer (or scientist) will never interact with the kernel directly! For example, the command-line environment that you interact with is usually GNU Bash. It is GNU Bash that then talks to the kernel.

To better clarify, let’s use an analogy inspired by one of the links above: saying that you are “running Linux” is like saying you are “driving your engine”. The car’s engine is the main source of power in the car, no one doubts that. But you do not “drive” the engine, you drive the “car”. The engine alone is useless for transportation without the radiator, battery, transmission, wheels, chassis, seats, wind-shield, etc.

To have an operating system, you need lower-level tools (to build the kernel) and higher-level (to use it) software packages. For the Linux kernel, both the lower-level and higher-level tools are GNU. In other words, “the whole system is basically GNU with Linux loaded”.

You can replace the Linux kernel and still have the GNU shell and higher-level utilities. For example, using the “Windows Subsystem for Linux”, you can use almost all GNU tools without the original Linux kernel, but using the host Windows operating system, as in https://ubuntu.com/wsl. Alternatively, you can build a fully functional GNU-based working environment on a macOS or BSD-based operating system (using the host’s kernel and C compiler), for example, through projects like Maneage; see Akhlaghi et al. 2021 (in particular Appendix C, listing all the GNU software tools), which is exactly reproducible on macOS as well.

Therefore to acknowledge GNU’s instrumental role in the creation and usage of the Linux kernel and the operating systems that use it, we should call these operating systems “GNU/Linux”.


1.8.1 Command-line interface

One aspect of Gnuastro that might be a little troubling to new GNU/Linux users is that (at least for the time being) it only has a command-line user interface (CLI). This might be contrary to the mostly graphical user interface (GUI) experience with proprietary operating systems. Since the various actions available are not always on the screen, the command-line interface can be complicated, intimidating, and frustrating for a first-time user. This is understandable and also experienced by anyone who started using the computer (from childhood) in a graphical user interface (this includes most of Gnuastro’s authors). Here we hope to convince you of the unique benefits of this interface which can greatly enhance your productivity while complementing your GUI experience.

Through GNOME 3, most GNU/Linux based operating systems now have an advanced and useful GUI. Since the GUI was created long after the command-line, some wrongly consider the command-line to be obsolete. Both interfaces are useful for different tasks. For example, you cannot view an image, video, PDF document or web page on the command-line. On the other hand, you cannot reproduce your results easily in the GUI. Therefore they should not be regarded as rivals but as complementary user interfaces; here we will outline how the CLI can be useful in scientific programs.

You can think of the GUI as a veneer over the CLI to facilitate a small subset of all the possible CLI operations. Each click you do on the GUI can be thought of as internally running a different CLI command. So asymptotically (if a good designer can design a GUI which is able to show you all the possibilities to click on) the GUI is only as powerful as the command-line. In practice, such graphical designers are very hard to find for every program, so the GUI operations are always a subset of the internal CLI commands. For programs that are only made for the GUI, this results in not including lots of potentially useful operations. It also results in ‘interface design’ becoming a crucially important part of any GUI program. Scientists do not usually have enough resources to hire a graphical designer, and the complexity of GUI code is far greater than that of CLI code, which is harmful for scientific software, see Gnuastro manifesto: Science and its tools.

For programs that have a GUI, one action on the GUI (moving and clicking a mouse, or tapping a touchscreen) might be more efficient and easier than its CLI counterpart (typing the program name and your desired configuration). However, if you have to repeat that same action more than once, the GUI will soon become frustrating and prone to errors. Unless the designers of a particular program decided to design such a system for a particular GUI action, there is no general way to run any possible series of actions automatically on the GUI.

On the command-line, you can run any series of actions (which can come from various CLI-capable programs you have chosen yourself) in any possible permutation with one command. This allows for much more creativity and exact reproducibility that is not possible for a GUI user. For technical and scientific operations, where the same operation (using various programs) has to be done on a large set of data files, this is crucially important. It also allows exact reproducibility, which is a foundational principle of scientific results. The most common CLI (which is also known as a shell) in GNU/Linux is GNU Bash; we strongly encourage you to put aside several hours and go through this beautifully explained web page: https://flossmanuals.net/command-line/. You do not need to read or even fully understand the whole thing; only a general knowledge of the first few chapters is enough to get you going.
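
As a trivial sketch of such chaining (any CLI programs can be combined like this), the command below feeds the output of one program into another to count the FITS files in the current directory:

$ ls *.fits | wc -l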

Since the operations in the GUI are limited and they are visible, reading a manual is not that important in the GUI (most programs do not even have any!). However, to give you the creative power explained above, with a CLI program it is best if you first read the manual of any program you are using. You do not need to memorize any details; only an understanding of the generalities is needed. Once you start working, there are easier ways to remember a particular option or operation detail, see Getting help.

To experience the command-line in its full glory and not in the GUI terminal emulator, press the following keys together: CTRL+ALT+F4 to access the virtual console. To return back to your GUI, press the same keys above replacing F4 with F7 (or F1, or F2, depending on your GNU/Linux distribution). In the virtual console, the GUI, with all its distracting colors and information, is gone, enabling you to focus entirely on your actual work.

For operations that use a lot of your system’s resources (processing a large number of large astronomical images for example), the virtual console is the place to run them. This is because the GUI is not competing with your research work for your system’s RAM and CPU. Since the virtual consoles are completely independent, you can even log out of your GUI environment to give even more of your hardware resources to the programs you are running and thus reduce the operating time.

Since it uses far less system resources, the CLI is also convenient for remote access to your computer. Using secure shell (SSH) you can log in securely to your system (similar to the virtual console) from anywhere even if the connection speeds are low. There are apps for smart phones and tablets which allow you to do this.


1.9 Report a bug

According to Wikipedia “a software bug is an error, flaw, failure, or fault in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways”. So when you see that a program is crashing, not reading your input correctly, giving the wrong results, or not writing your output correctly, you have found a bug. In such cases, it is best if you report the bug to the developers. The programs will also inform you if known impossible situations occur (which are caused by something unexpected) and will ask the users to report the issue.

Prior to actually filing a bug report, it is best to search previous reports. The issue might have already been found and even solved. The best place to check if your bug has already been discussed is the bugs tracker on Gnuastro project webpage at https://savannah.gnu.org/bugs/?group=gnuastro. In the top search fields (under “Display Criteria”) set the “Open/Closed” drop-down menu to “Any” and choose the respective program or general category of the bug in “Category” and click the “Apply” button. The results colored green have already been solved and the status of those colored in red is shown in the table.

Recently corrected bugs are probably not yet publicly released because they are scheduled for the next Gnuastro stable release. If the bug is solved but not yet released and it is an urgent issue for you, you can get the version controlled source and compile that, see Version controlled source.

To solve the issue as readily as possible, please follow the following guidelines in your bug report. The How to Report Bugs Effectively and How To Ask Questions The Smart Way essays also provide some good generic advice for all software (do not contact their authors for Gnuastro’s problems). Mastering the art of giving good bug reports (like asking good questions) can greatly enhance your experience with any free and open source software. So investing the time to read through these essays will greatly reduce your frustration after you see something does not work the way you feel it is supposed to, for a large range of software, not just Gnuastro.

Be descriptive

Please provide as many details as possible and be very descriptive. Explain what you expected and what the output was: it might be that your expectation was wrong. Also please clearly state which sections of the Gnuastro book (this book), or other references you have studied to understand the problem. This can be useful in correcting the book (adding links to likely places where users will check). But more importantly, it will be encouraging for the developers, since you are showing how serious you are about the problem and that you have actually put some thought into it. “To be able to ask a question clearly is two-thirds of the way to getting it answered.” – John Ruskin (1819-1900).

Individual and independent bug reports

If you have found multiple bugs, please send them as separate (and independent) bugs (as much as possible). This will significantly help us in managing and resolving them sooner.

Reproducible bug reports

If we cannot exactly reproduce your bug, then it is very hard to resolve it. So please send us a Minimal working example along with the description. For example, in running a program, please send us the full command-line text and the output with the -P option, see Operating mode options. If it is caused only by a certain input, also send us that input file. In case the input FITS file is large, please use Crop to only crop the problematic section and make it as small as possible, so it can easily be uploaded and downloaded and not waste the archive’s storage, see Crop.
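
For example (a sketch with hypothetical file names and pixel ranges), the first command below prints the final values of all the options the program will use (to include in your report), and the second cuts out a small problematic region to attach to it:

$ astnoisechisel image.fits -P
$ astcrop image.fits --section=100:250,100:250 --output=bug-sample.fits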

There are generally two ways to inform us of bugs:

  • Send a mail to bug-gnuastro@gnu.org. Any mail you send to this address will be distributed through the bug-gnuastro mailing list. This is the simplest way to send us bug reports. The developers will then register the bug into the project web page (next choice) for you.
  • Use the Gnuastro project web page at https://savannah.gnu.org/projects/gnuastro/: There are two ways to get to the submission page as listed below. Fill in the form as described below and submit it (see Gnuastro project webpage for more on the project web page).
    • Using the top horizontal menu items, immediately under the top page title. Hovering your mouse on “Support” will open a drop-down list. Select “Submit new”. Also if you have an account in Savannah, you can choose “Bugs” in the menu items and then select “Submit new”.
    • In the main body of the page, under the “Communication tools” section, click on “Submit new item”.

Once an item has been registered in the mailing list or web page, the developers will add it to either the “Bug Tracker” or “Task Manager” trackers of the Gnuastro project web page. These two trackers can only be edited by the Gnuastro project developers, but they can be browsed by anyone, so you can follow the progress on your bug. You are most welcome to join us in developing Gnuastro, and fixing the bug you have found may be a good starting point. Gnuastro is designed to be easy for anyone to develop (see Gnuastro manifesto: Science and its tools) and there is a full chapter devoted to developing it: Developing.

Savannah’s Markup: When posting to Savannah, it helps to have the code displayed in mono-space font and a different background; you may also want to make a list of items or make some words bold. For features like these, you should use Savannah’s “Markup” guide at https://savannah.gnu.org/markup-test.php. You can access this page by clicking on the “Full Markup” link that is just beside the “Preview” button, near the box where you write your comments. As you see there, for example, when you want to highlight code, you should put it within a “+verbatim+” and “-verbatim-” environment like below:

+verbatim+
astarithmetic image.fits image_arith.fits -h1 isblank nan where
-verbatim-

Unfortunately, Savannah does not have a way to edit submitted comments. Therefore be sure to press the “Preview” button and check your report’s final format before the final submission.


1.10 Suggest new feature

We would always be happy to hear of suggested new features. For every program, there are already lists of features that we are planning to add. You can see the current list of plans from the Gnuastro project web page at https://savannah.gnu.org/projects/gnuastro/ and following “Tasks”→“Browse” on the horizontal menu at the top of the page immediately under the title, see Gnuastro project webpage. If you want to request a feature for an existing program, click on the “Display Criteria” above the list and under “Category”, choose that particular program. Under “Category” you can also see the existing suggestions for new programs or other cases like installation, documentation or libraries. Also, be sure to set the “Open/Closed” value to “Any”.

If the feature you want to suggest is not already listed in the task manager, then follow the steps that are fully described in Report a bug. Please keep in mind that the developers are all busy with their own astronomical research, and with implementing the existing “task”s or resolving bugs. Gnuastro is a volunteer effort and none of the developers are paid for their hard work. So, although we will try our best, please do not expect your suggested feature to be immediately included (in the next release of Gnuastro).

The best person to apply the exciting new feature you have in mind is you, since you have the motivation and need. In fact, Gnuastro is designed for making it as easy as possible for you to hack into it (add new features, change existing ones and so on), see Gnuastro manifesto: Science and its tools. Please have a look at the chapter devoted to developing (Developing) and start applying your desired feature. Once you have added it, you can use it for your own work and if you feel you want others to benefit from your work, you can request for it to become part of Gnuastro. You can then join the developers and start maintaining your own part of Gnuastro. If you choose to take this path of action please contact us beforehand (Report a bug) so we can avoid possible duplicate activities and get interested people in contact.

Gnuastro is a collection of low level programs: As described in Program design philosophy, a founding principle of Gnuastro is that each library or program should be basic and low-level. High level jobs should be done by running the separate programs or using separate functions in succession through a shell script or calling the libraries by higher level functions, see the examples in Tutorials. So when making the suggestions please consider how your desired job can best be broken into separate steps and modularized.


1.11 Announcements

Gnuastro has a dedicated mailing list for making announcements (info-gnuastro). Anyone can subscribe to this mailing list. Anytime there is a new stable or test release, an email will be circulated there. The email contains a summary of the overall changes along with a detailed list (from the NEWS file). This mailing list is thus the best way to stay up to date with new releases, easily learn about the updated/new features, or dependencies (see Dependencies).

To subscribe to this list, please visit https://lists.gnu.org/mailman/listinfo/info-gnuastro. Traffic (number of mails per unit time) in this list is designed to be low: only a handful of mails per year. Previous announcements are available on its archive.


1.12 Conventions

In this book we have the following conventions:

  • All commands that are to be run on the shell (command-line) prompt as the user start with a $. In case they must be run as a superuser or system administrator, they will start with a single #. If the command is in a separate line and the next line is also in the code typeface but does not have any of the $ or # signs, then it is the output of the command after it is run. As a user, you do not need to type those lines. A line that starts with ## is just a comment for explaining the command to a human reader and must not be typed.
  • If the command becomes larger than the page width, a \ is inserted in the code. If you are typing the code by hand on the command-line, you do not need to use multiple lines or add the extra space characters, so you can omit them. If you want to copy and paste these examples (highly discouraged!), then the \ should stay.

    The \ character is a shell escape character, which is commonly used to make characters that have special meaning for the shell lose that special meaning (the shell will not treat them specially if there is a \ behind them). When \ is the last visible character in a line (the next character is a new-line character), the new-line character loses its meaning. Therefore, the shell sees it as a simple white-space character, not the end of a command! This enables you to use multiple lines to write your commands, as in the sketch below.
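
    For example (a sketch with a hypothetical file name), the two commands below are identical for the shell; the \ at the end of the first line of the first command simply lets it continue on a second line:

    $ astcrop image.fits --mode=img --center=500,500 \
              --width=201
    $ astcrop image.fits --mode=img --center=500,500 --width=201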

This is not a convention, but a by-product of the PDF building process of the manual: in the PDF version of this manual, a single quote (or apostrophe) character in the commands or codes is shown like this: '. Single quotes are sometimes necessary in combination with commands like awk or sed, or when using Column arithmetic in Gnuastro’s own Table (see Column arithmetic). Therefore when typing (recommended) or copy-pasting (not recommended) the commands that have a ', please correct it to the single-quote (or apostrophe) character, otherwise the command will fail.


1.13 Acknowledgments

Gnuastro would not have been possible without scholarships and grants from several funding institutions. We thus ask that if you used Gnuastro in any of your papers/reports, please add the proper citation and acknowledge the funding agencies/projects. For details of which papers to cite (may be different for different programs) and get the acknowledgment statement to include in your paper, please run the relevant programs with the common --cite option like the example commands below (for more on --cite, please see Operating mode options).

$ astnoisechisel --cite
$ astmkcatalog --cite

Here, we will acknowledge all the institutions (and their grants) along with the people who helped make Gnuastro possible. The full list of Gnuastro authors is available at the start of this book and the AUTHORS file in the source code (both are generated automatically from the version controlled history). The plain text file THANKS, which is also distributed along with the source code, contains the list of people and institutions who played an indirect role in Gnuastro (not committed any code in the Gnuastro version controlled history).

The Japanese Ministry of Education, Culture, Sports, Science, and Technology (MEXT) scholarship for Mohammad Akhlaghi’s Masters and PhD degree in Tohoku University Astronomical Institute had an instrumental role in the long term learning and planning that made the idea of Gnuastro possible. The very critical view points of Professor Takashi Ichikawa (Mohammad’s adviser) were also instrumental in the initial ideas and creation of Gnuastro. Afterwards, the European Research Council (ERC) advanced grant 339659-MUSICOS (Principal investigator: Roland Bacon) was vital in the growth and expansion of Gnuastro. Working with Roland at the Centre de Recherche Astrophysique de Lyon (CRAL), enabled a thorough re-write of the core functionality of all libraries and programs, turning Gnuastro into the large collection of generic programs and libraries it is today. At the Instituto de Astrofisica de Canarias (IAC, and in particular in collaboration with Johan Knapen and Ignacio Trujillo), Gnuastro matured and its user base significantly grew. Work on improving Gnuastro is now continuing primarily in the Centro de Estudios de Física del Cosmos de Aragón (CEFCA), located in Teruel, Spain.

In general, we would like to gratefully thank the following people for their useful and constructive comments and suggestions (in alphabetical order by family name): Valentina Abril-melgarejo, Marjan Akbari, Carlos Allende Prieto, Hamed Altafi, Roland Bacon, Roberto Baena Gallé, Zahra Bagheri, Karl Berry, Faezeh Bidjarchian, Leindert Boogaard, Nicolas Bouché, Stefan Brüns, Fernando Buitrago, Adrian Bunk, Rosa Calvi, Mark Calabretta, Nushkia Chamba, Sergio Chueca Urzay, Tamara Civera Lorenzo, Benjamin Clement, Nima Dehdilani, Andrés Del Pino Molina, Antonio Diaz Diaz, Paola Dimauro, Alexey Dokuchaev, Pierre-Alain Duc, Alessandro Ederoclite, Elham Eftekhari, Paul Eggert, Sepideh Eskandarlou, Sílvia Farras, Juan Antonio Fernández Ontiveros, Gaspar Galaz, Andrés García-Serra Romero, Zohre Ghaffari, Thérèse Godefroy, Giulia Golini, Craig Gordon, Martin Guerrero Roncel, Madusha Gunawardhana, Bruno Haible, Stephen Hamer, Siyang He, Zahra Hosseini, Leslie Hunt, Takashi Ichikawa, Raúl Infante Sainz, Brandon Invergo, Oryna Ivashtenko, Aurélien Jarno, Lee Kelvin, Brandon Kelly, Mohammad-Reza Khellat, Johan Knapen, Geoffry Krouchi, Martin Kuemmel, Teet Kuutma, Clotilde Laigle, Floriane Leclercq, Alan Lefor, Javier Licandro, Jeremy Lim, Alejandro Lumbreras Calle, Sebastián Luna Valero, Alberto Madrigal, Guillaume Mahler, Juan Miro, Alireza Molaeinezhad, Javier Moldon, Juan Molina Tobar, Francesco Montanari, Raphael Morales, Carlos Morales Socorro, Sylvain Mottet, Dmitrii Oparin, François Ochsenbein, Bertrand Pain, William Pence, Irene Pintos Castro, Mamta Pommier, Marcel Popescu, Bob Proulx, Joseph Putko, Samane Raji, Ignacio Ruiz Cejudo, Teymoor Saifollahi, Joanna Sakowska, Elham Saremi, Nafise Sedighi, Markus Schaney, Yahya Sefidbakht, Alejandro Serrano Borlaff, Zahra Sharbaf, David Shupe, Leigh Smith, Jenny Sorce, Manuel Sánchez-Benavente, Lee Spitler, Richard Stallman, Michael Stein, Ole Streicher, Alfred M. Szmidt, Michel Tallon, Juan C. Tello, Vincenzo Testa, Éric Thiébaut, Ignacio Trujillo, Peter Teuben, David Valls-Gabaud, Jesús Varela, Aaron Watkins, Richard Wilbur, Michael H.F. Wilkinson, Christopher Willmer, Xiuqin Wu, Sara Yousefi Taemeh, Johannes Zabl. The GNU French Translation Team is also managing the French version of the top Gnuastro web page which we highly appreciate. Finally, we should thank all the (sometimes anonymous) people in various online forums who patiently answered all our small (but important) technical questions.

All work on Gnuastro has been voluntary, but the authors are most grateful to the following institutions (in chronological order) for hosting/supporting us in our research. Where necessary, these institutions have disclaimed any ownership of the parts of Gnuastro that were developed there, thus ensuring the freedom of Gnuastro for the future (see Copyright assignment). We highly appreciate their support for free software, and thus free science, and therefore a free society.

Tohoku University Astronomical Institute, Sendai, Japan.
University of Salento, Lecce, Italy.
Centre de Recherche Astrophysique de Lyon (CRAL), Lyon, France.
Instituto de Astrofisica de Canarias (IAC), Tenerife, Spain.
Centro de Estudios de Física del Cosmos de Aragón (CEFCA), Teruel, Spain.
Google Summer of Code 2020, 2021 and 2022.


2 Tutorials

To help new users have a smooth and easy start with Gnuastro, in this chapter several thoroughly elaborated tutorials, or cookbooks, are provided. These tutorials demonstrate the capabilities of different Gnuastro programs and libraries, along with tips and guidelines for the best practices of using them in various realistic situations.

We strongly recommend going through these tutorials to get a good feeling of how the programs are related (built in a modular design to be used together in a pipeline), very similar to the core Unix-based programs that they were modeled on. These tutorials will therefore help you use Gnuastro’s programs (and, more generally, the Unix-like command-line environment) effectively in your research.

The first three tutorials (General program usage tutorial and Detecting large extended targets and Building the extended PSF) use real input datasets from some of the deep Hubble Space Telescope (HST) images, the Sloan Digital Sky Survey (SDSS) and the Javalambre Photometric Local Universe Survey (J-PLUS) respectively. Their aim is to demonstrate some real-world problems that many astronomers often face and how they can be solved with Gnuastro’s programs. The fourth tutorial (Sufi simulates a detection) focuses on simulating astronomical images, which is another critical aspect of any analysis!

The ultimate aim of General program usage tutorial is to detect galaxies in a deep HST image, measure their positions and magnitudes, and select those with the strongest colors. In the process, it takes many detours to introduce you to the useful capabilities of many of the programs. So please be patient in reading it. If you do not have much time and can only try one of the tutorials, we recommend this one.

Detecting large extended targets deals with a major problem in astronomy: effectively detecting the faint outer wings of bright (and large) nearby galaxies to extremely low surface brightness levels (roughly one quarter of the local noise level in the example discussed). Besides the interesting scientific questions in these low-surface brightness features, failure to properly detect them will bias the measurements of the background objects and the survey’s noise estimates. This is an important issue, especially in wide surveys, because bright/large galaxies and stars20 cover a significant fraction of the survey area.

Building the extended PSF tackles an important problem in astronomy: how to extract the PSF of an image, to the largest possible extent, without assuming any functional form. In Gnuastro we have multiple installed scripts for this job. Their usage, and the logic behind best tuning them for each particular step, are fully described in this tutorial, on a real dataset. The tutorial concludes with subtracting that extended PSF from the science image, thus giving you a cleaner image (with no scattered light of the brighter stars) for your higher-level analysis.

Sufi simulates a detection has a fictional21 setting! It shows how Abd al-rahman Sufi (903 – 986 A.D., to whom the first recorded description of “nebulous” objects in the heavens is attributed) could have used some of Gnuastro’s programs for a realistic simulation of his observations, to see if his detection of nebulous objects was trustworthy. Because all conditions are under control in a simulated/mock environment/dataset, mock datasets can be a valuable tool to inspect the limitations of your data analysis and processing. But they need to be as realistic as possible, so this tutorial is dedicated to this important step of an analysis (simulations).

There are other tutorials also, on things that are commonly necessary in astronomical research: In Detecting lines and extracting spectra in 3D data, we use MUSE cubes (an IFU dataset) to show how you can subtract the continuum, detect emission-line features, extract spectra and build pseudo narrow-band images. In Color channels in same pixel grid we demonstrate how you can warp multiple images into a single pixel grid (often necessary with multi-wavelength data), and build a single color image. In Moiré pattern in stacking and its correction we show how you can avoid the unwanted Moiré pattern which happens when warping separate exposures to build a stacked/co-add deeper image. In Zero point of an image we review the process of estimating the zero point of an image using a reference image or catalog. Finally, in Pointing pattern design we show the process by which you can simulate a dither pattern to find the best observing strategy for your next exciting scientific project.

In these tutorials, we have intentionally avoided too many cross references to make them easier to read. For more information about a particular program, you can visit the section with the same name as the program in this book. Each program section in the subsequent chapters starts by explaining the general concepts behind what it does, for example, see Convolve. If you only want practical information on running a program, for example, its options/configuration, input(s) and output(s), please consult the subsection titled “Invoking ProgramName”, for example, see Invoking NoiseChisel. For an explanation of the conventions we use in the example codes through the book, please see Conventions.


2.1 General program usage tutorial

Measuring colors of astronomical objects in broad-band or narrow-band images is one of the most basic and common steps in astronomical analysis. Here, we will use Gnuastro’s programs to get a physical scale (area at certain redshifts) of the field we are studying, detect objects in a Hubble Space Telescope (HST) image, measure their colors, identify the ones with the strongest colors, do a visual inspection of these objects and inspect their spatial positions in the image. After this tutorial, you can also try the Detecting large extended targets tutorial which goes into a little more detail on detecting very low surface brightness signal.

During the tutorial, we will take many detours to explain, and practically demonstrate, the many capabilities of Gnuastro’s programs. In the end you will see that the things you learned during this tutorial are much more generic than this particular problem and can be used in solving a wide variety of problems involving the analysis of data (images or tables). So please do not rush, and go through the steps patiently to optimally master Gnuastro.

In this tutorial, we will use the HST eXtreme Deep Field (XDF) dataset. Like almost all astronomical surveys, this dataset is free for download and usable by the public. You will need the following tools in this tutorial: Gnuastro, SAO DS9 22, GNU Wget23, and AWK (most common implementation is GNU AWK24).

This tutorial was first prepared for the “Exploring the Ultra-Low Surface Brightness Universe” workshop (November 2017) at the ISSI in Bern, Switzerland. It was further extended in the “4th Indo-French Astronomy School” (July 2018) organized by LIO, CRAL CNRS UMR5574, UCBL, and IUCAA in Lyon, France. We are very grateful to the organizers of these workshops and the attendees for the very fruitful discussions and suggestions that made this tutorial possible.

Write the example commands manually: Try to type the example commands on your terminal manually and use the history feature of your command-line (by pressing the “up” button to retrieve previous commands). Do not simply copy and paste the commands shown here. This will help simulate future situations when you are processing your own datasets.


2.1.1 Calling Gnuastro’s programs

A handy feature of Gnuastro is that all program names start with ast. This will allow your command-line processor to easily list and auto-complete Gnuastro’s programs for you. Try typing the following command (press TAB key when you see <TAB>) to see the list:

$ ast<TAB><TAB>

Any program that starts with ast (including all Gnuastro programs) will be shown. By choosing the subsequent characters of your desired program and pressing <TAB><TAB> again, the list will narrow down and the program name will auto-complete once your input characters are unambiguous. In short, you often do not need to type the full name of the program you want to run.


2.1.2 Accessing documentation

Gnuastro contains a large number of programs and it is natural to forget the details of each program’s options or inputs and outputs. Therefore, before starting the analysis steps of this tutorial, let’s review how you can access this book to refresh your memory any time you want, without having to take your hands off the keyboard.

When you install Gnuastro, this book is also installed on your system along with all the programs and libraries, so you do not need an internet connection to access/read it. Also, by accessing this book as described below, you can be sure that it corresponds to your installed version of Gnuastro.

GNU Info25 is the program in charge of displaying the manual on the command-line (for more, see Info). To see this whole book on your command-line, please run the following command and press subsequent keys. Info has its own mini-environment, therefore we will show the keys that must be pressed in the mini-environment after a -> sign. You can also ignore anything after the # sign in the middle of the line, they are only for your information.

$ info gnuastro                # Open the top of the manual.
-> <SPACE>                     # All the book chapters.
-> <SPACE>                     # Continue down: show sections.
-> <SPACE> ...                 # Keep pressing space to go down.
-> q                           # Quit Info, return to the command-line.

The thing that greatly simplifies navigation in Info is the links (regions with an underline). You can immediately go to the next link in the page with the <TAB> key and press <ENTER> on it to go into that part of the manual. Try the commands above again, but this time also use <TAB> to go to the links and press <ENTER> on them to go to the respective section of the book. Then follow a few more links and go deeper into the book. To return to the previous page, press l (small L). If you are searching for a specific phrase in the whole book (for example, an option name), press s and type your search phrase and end it with an <ENTER>. Finally, you can return to the command line and quit Info by pressing the q key.

You do not need to start from the top of the manual every time. For example, to get to Invoking NoiseChisel, run the following command. In general, all programs have such an “Invoking ProgramName” section in this book. These sections are specifically for the description of inputs, outputs and configuration options of each program. You can access them directly for each program by giving its executable name to Info.

$ info astnoisechisel

The other sections do not have such shortcuts. To directly access them from the command-line, you need to tell Info to look into Gnuastro’s manual, then look for the specific section (an unambiguous title is necessary). For example, if you only want to review/remember NoiseChisel’s Detection options, just run the following command. Note how case is irrelevant for Info when calling a title in this manner.

$ info gnuastro "Detection options"

In general, Info is a powerful and convenient way to access this whole book with detailed information about the programs you are running. If you are not already familiar with it, please run the following command and just read along and do what it says to learn it. Do not stop until you feel sufficiently fluent in it. Please invest the half an hour’s time necessary to start using Info comfortably. It will greatly improve your productivity and you will start reaping the rewards of this investment very soon.

$ info info

As a good scientist, you need to feel comfortable playing with the features/options, and to be critical of default values rather than using them blindly. On the other hand, our human memory is limited, so it is important to be able to easily access any part of this book fast and remember the option names, what they do and their acceptable values.

If you just want the option names and a short description, calling the program with the --help option might also be a good solution like the first example below. If you know a few characters of the option name, you can feed the printed output to grep like the second or third example commands.

$ astnoisechisel --help
$ astnoisechisel --help | grep quant
$ astnoisechisel --help | grep check

2.1.3 Setup and data download

The first step in the analysis of the tutorial is to download the necessary input datasets. First, to keep things clean, let’s create a gnuastro-tutorial directory and continue all future steps in it:

$ mkdir gnuastro-tutorial
$ cd gnuastro-tutorial

We will be using the near infra-red Wide Field Camera 3 (WFC3-IR) images of the XDF survey. If you already have them in another directory (for example, XDFDIR, with the same FITS file names), you can set the download directory to be a symbolic link to XDFDIR with a command like this:

$ ln -s XDFDIR download

Otherwise, when the following images are not already present on your system, you can make a download directory and download them there.

$ mkdir download
$ cd download
$ xdfurl=http://archive.stsci.edu/pub/hlsp/xdf
$ wget $xdfurl/hlsp_xdf_hst_wfc3ir-60mas_hudf_f105w_v1_sci.fits
$ wget $xdfurl/hlsp_xdf_hst_wfc3ir-60mas_hudf_f125w_v1_sci.fits
$ wget $xdfurl/hlsp_xdf_hst_wfc3ir-60mas_hudf_f160w_v1_sci.fits
$ cd ..

In this tutorial, we will just use these three filters. Later, you may need to download more filters. To do that, you can use the shell’s for loop to download them all in series (one after the other26) with one command like the one below for the WFC3 filters. Put this command instead of the three wget commands above. Recall that all the extra spaces, backslashes (\), and new lines can be ignored if you are typing the lines on the terminal.

$ for f in f105w f125w f140w f160w; do \
    wget $xdfurl/hlsp_xdf_hst_wfc3ir-60mas_hudf_"$f"_v1_sci.fits; \
  done

2.1.4 Dataset inspection and cropping

First, let’s visually inspect the datasets we downloaded in Setup and data download. Let’s take the F160W image as an example. One of the most common programs for viewing FITS images is SAO DS9, which is usually called through the ds9 command-line program, like the command below. If you do not already have DS9 on your computer and the command below fails, please see SAO DS9.

$ ds9 download/hlsp_xdf_hst_wfc3ir-60mas_hudf_f160w_v1_sci.fits

By default, DS9 opens a relatively small window (for modern monitors) and its default scale and color bar make it very hard to see any structure in the image: everything will look black. Also, by default, it zooms into the center of the image and you need to scroll to zoom-out and see the whole thing. To avoid these problems, Gnuastro has the astscript-fits-view script:

$ astscript-fits-view \
           download/hlsp_xdf_hst_wfc3ir-60mas_hudf_f160w_v1_sci.fits

After running this command, you will see that the DS9 window fully covers the height of your monitor, it is showing the whole image, using a clearer color map, and many more useful things. In fact, you see the DS9 command that is used in your terminal27. On GNU/Linux operating systems (like Ubuntu and Fedora), you can also set your graphical user interface to use this script for opening FITS files when you click on them. For more, see the instructions in the checklist at the start of Invoking astscript-fits-view.

As you hover your mouse over the image, notice how the “Value” and positional fields on the top of the ds9 window get updated. The first thing you might notice is that when you hover the mouse over the regions with no data, they have a value of zero. The next thing might be that the dataset has a shallower and deeper component (see Quantifying measurement limits). Recall that this is a combined/reduced image of many exposures, and the parts that have more exposures are deeper. In particular, the exposure time of the deep inner region is more than 4 times the exposure time of the outer (shallower) parts.

To simplify the analysis in this tutorial, we will only be working on the deep field, so let’s crop it out of the full dataset. Fortunately the XDF survey web page (above) contains the vertices of the deep flat WFC3-IR field28. With Gnuastro’s Crop program, you can use those vertices to cut out this deep region from the larger image (to learn more about the Crop program see Crop). But before that, to keep things organized, let’s make a directory called flat-ir and keep the flat (single-depth) regions in that directory (with an ‘xdf-’ prefix for a shorter and easier filename).

$ mkdir flat-ir
$ astcrop --mode=wcs -h0 --output=flat-ir/xdf-f105w.fits \
          --polygon="53.187414,-27.779152 : 53.159507,-27.759633 : \
                     53.134517,-27.787144 : 53.161906,-27.807208" \
          download/hlsp_xdf_hst_wfc3ir-60mas_hudf_f105w_v1_sci.fits

$ astcrop --mode=wcs -h0 --output=flat-ir/xdf-f125w.fits \
          --polygon="53.187414,-27.779152 : 53.159507,-27.759633 : \
                     53.134517,-27.787144 : 53.161906,-27.807208" \
          download/hlsp_xdf_hst_wfc3ir-60mas_hudf_f125w_v1_sci.fits

$ astcrop --mode=wcs -h0 --output=flat-ir/xdf-f160w.fits \
          --polygon="53.187414,-27.779152 : 53.159507,-27.759633 : \
                     53.134517,-27.787144 : 53.161906,-27.807208" \
          download/hlsp_xdf_hst_wfc3ir-60mas_hudf_f160w_v1_sci.fits

Run the command below to have a look at the cropped images:

$ astscript-fits-view flat-ir/*.fits

You only see the deep region now; does the noise not look much cleaner? An important result of this crop is that regions with no data now have a NaN (Not-a-Number, or blank) value. Any self-respecting statistical program will ignore NaN values, so they will not affect your outputs. For example, notice how changing the DS9 color bar will not affect the NaN pixels (their colors will not change).

However, do you remember that in the downloaded files, such regions had a value of zero? That is a big problem, because zero is a number, and is thus meaningful, especially when you later want NoiseChisel to detect29 all the signal from the deep universe in this image. Generally, when you want to ignore some pixels in a dataset, and avoid higher-level ambiguities or complications, it is always best to give them blank values (not zero, or some other absurdly large or small number). Gnuastro has the Arithmetic program for such cases, and we will introduce it later in this tutorial.

In the example above, the polygon vertices are in degrees, but you can also replace them with sexagesimal30 coordinates (for example, using 03h32m44.9794 or 03:32:44.9794 instead of 53.187414, the first RA, and -27d46m44.9472 or -27:46:44.9472 instead of -27.779152, the first Dec). To further simplify things, you can even define your polygon visually as a DS9 “region”, save it as a “region file” and give that file to crop. But we need to continue, so if you are interested to learn more, see Crop.

Before closing this section, let’s just take a look at the three cropping commands we ran above. The only thing varying between the three commands is the filter name! Note how everything else is the same! In such cases, you should generally avoid repeating a command manually: it is prone to many bugs, and as you see, it is very hard to read (did you not mistakenly write a 7 as an 8?).

To simplify the command, and allow you to work on more filters, we can use the shell’s for loop as shown below. Notice how the place where the filter names (f105w, f125w and f160w) are used above, have been replaced with $f (the shell variable that for will update in every loop) below.

$ rm flat-ir/*.fits
$ for f in f105w f125w f160w; do \
    astcrop --mode=wcs -h0 --output=flat-ir/xdf-$f.fits \
            --polygon="53.187414,-27.779152 : 53.159507,-27.759633 : \
                       53.134517,-27.787144 : 53.161906,-27.807208" \
            download/hlsp_xdf_hst_wfc3ir-60mas_hudf_"$f"_v1_sci.fits; \
  done

2.1.5 Angular coverage on the sky

The cropped images in Dataset inspection and cropping are the deepest images we currently have of the sky. The first thing that comes to mind may be this: “How large is this field on the sky?”.

More accurate method: the steps mentioned in this section are primarily designed to help you get familiar with the FITS WCS standard and some shell scripting. The accuracy of this method will decrease as your image becomes larger (on the scale of degrees). For an accurate method, see Area of non-blank pixels on sky.

You can get a fast and crude answer with Gnuastro’s Fits program, using this command:

$ astfits flat-ir/xdf-f160w.fits --skycoverage

It will print the sky coverage in two formats (all numbers are in units of degrees for this image): 1) the image’s central RA and Dec and full width around that center, 2) the range of RA and Dec covered by this image. You can use these values in various online query systems. You can also use this option to automatically calculate the area covered by this image. With the --quiet option, the printed output of --skycoverage will not contain human-readable text, making it easier for automatic (computer) processing:

$ astfits flat-ir/xdf-f160w.fits --skycoverage --quiet

The second row is the coverage range along RA and Dec (compare with the outputs before using --quiet). We can thus simply subtract the second from the first column and multiply it with the difference of the fourth and third columns to calculate the image area. We will also multiply each by 60 to have the area in arc-minutes squared.

$ astfits flat-ir/xdf-f160w.fits --skycoverage --quiet \
        | awk 'NR==2{print ($2-$1)*60*($4-$3)*60}'

The returned value is \(9.06711\) arcmin\(^2\). However, this method ignores the fact that many of the image pixels are blank! In other words, the image does cover this area, but there is no data in more than half of the pixels. So let’s calculate the area coverage over which we actually have data.

The FITS world coordinate system (WCS) metadata standard contains the key to answering this question. Run the following command to see all the FITS keywords (metadata) for one of the images (almost identical with the other images because they are scaled to the same region of Sky):

$ astfits flat-ir/xdf-f160w.fits -h1

Look into the keywords grouped under the ‘World Coordinate System (WCS)’ title. These keywords define how the image relates to the outside world. In particular, the CDELT* keywords (or CDELT1 and CDELT2 in this 2D image) contain the “Coordinate DELTa” (or change in coordinate units) with a change in one pixel. But what are the units of each “world” coordinate? The CUNIT* keywords (for “Coordinate UNIT”) have the answer. In this case, both CUNIT1 and CUNIT2 have a value of deg, so both “world” coordinates are in units of degrees. We can thus conclude that the value of CDELT* is in units of degrees-per-pixel31.

With the commands below, we will use CDELT (along with the number of non-blank pixels) to find the answer of our initial question: “how much of the sky does this image cover?”. The lines starting with ## are just comments for you to read and understand each command. Do not type them on the terminal (no problem if you do, they will just not have any effect). The commands are intentionally repetitive in some places to better understand each step and also to demonstrate the beauty of command-line features like history, variables, pipes and loops (which you will commonly use as you become more proficient on the command-line).

Use shell history: Do not forget to make effective use of your shell’s history: you do not have to re-type previous command to add something to them (like the examples below). This is especially convenient when you just want to make a small change to your previous command. Press the “up” key on your keyboard (possibly multiple times) to see your previous command(s) and modify them accordingly.

Your locale does not use ‘.’ as decimal separator: on systems that do not use an English language environment, the dates, numbers, etc., can be printed in different formats (for example, ‘0.5’ can be written as ‘0,5’: with a comma). With the LC_NUMERIC line at the start of the script below, we are ensuring a unified format in the output of seq. For more, please see Numeric locale.

## Make sure that the decimal separator is a point in any environment.
$ export LC_NUMERIC=C

## See the general statistics of non-blank pixel values.
$ aststatistics flat-ir/xdf-f160w.fits

## We only want the number of non-blank pixels (add '--number').
$ aststatistics flat-ir/xdf-f160w.fits --number

## Keep the result of the command above in the shell variable `n'.
$ n=$(aststatistics flat-ir/xdf-f160w.fits --number)

## See what is stored in the shell variable `n'.
$ echo $n

## Show all the FITS keywords of this image.
$ astfits flat-ir/xdf-f160w.fits -h1

## The resolution (in degrees/pixel) is in the `CDELT' keywords.
## Only show lines that contain these characters, by feeding
## the output of the previous command to the `grep' program.
$ astfits flat-ir/xdf-f160w.fits -h1 | grep CDELT

## Since the resolution of both dimensions is (approximately) equal,
## we will only read the value of one (CDELT1) with '--keyvalue'.
$ astfits flat-ir/xdf-f160w.fits -h1 --keyvalue=CDELT1

## We do not need the file name in the output (add '--quiet').
$ astfits flat-ir/xdf-f160w.fits -h1 --keyvalue=CDELT1 --quiet

## Save it as the shell variable `r'.
$ r=$(astfits flat-ir/xdf-f160w.fits -h1 --keyvalue=CDELT1 --quiet)

## Print the values of `n' and `r'.
$ echo $n $r

## Use the number of pixels (first number passed to AWK) and
## length of each pixel's edge (second number passed to AWK)
## to estimate the area of the field in arc-minutes squared.
$ echo $n $r | awk '{print $1 * ($2*60)^2}'

The output of the last command (area of this field) is 4.03817 (or approximately 4.04) arc-minutes squared. Just for comparison, this is roughly 175 times smaller than the Moon’s angular area (it has a diameter of about 30 arc-minutes, or half a degree).
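
If you would like to check that comparison yourself, here is a quick sanity check with AWK (approximating the Moon as a disk of 15 arc-minute radius; atan2(0,-1) is simply \(\pi\)):

$ echo 4.03817 | awk '{pi=atan2(0,-1); print (pi*15^2)/$1}'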

Some FITS writers do not use the CDELT convention, making it hard to use the steps above. In such cases, you can extract the pixel scale with the --pixelscale option of Gnuastro’s Fits program like the command below. Similar to the --skycoverage option above, you can also use the --quiet option to allow easy usage of the values in scripts.

$ astfits flat-ir/xdf-f160w.fits --pixelscale
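
For example, to get only the numbers (without the human-readable text) for use within a script, you can add the --quiet option as before. It is a good idea to first look at the verbose output above, so you know what each printed number corresponds to:

$ astfits flat-ir/xdf-f160w.fits --pixelscale --quiet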

AWK for table/value processing: As you saw above AWK is a powerful and simple tool for text processing. You will see it often in shell scripts. GNU AWK (the most common implementation) comes with a free and wonderful book in the same format as this book which will allow you to master it nicely. Just like this manual, you can also access GNU AWK’s manual on the command-line whenever necessary without taking your hands off the keyboard. Just run info awk.


2.1.6 Cosmological coverage and visualizing tables

Having found the angular coverage of the dataset in Angular coverage on the sky, we can now use Gnuastro to answer a more physically motivated question: “How large is this area at different redshifts?”. To get a feeling of the tangential area that this field covers at redshift 2, you can use Gnuastro’s CosmicCalculator program (CosmicCalculator). In particular, you need the tangential distance covered by 1 arc-second as raw output. Combined with the field’s area that was measured before, we can calculate the tangential area in Mega Parsecs squared (\(Mpc^2\)).

## If your system language uses ',' (not '.') as decimal separator.
$ export LC_NUMERIC=C

## Print general cosmological properties at redshift 2 (for example).
$ astcosmiccal -z2

## When given a "Specific calculation" option, CosmicCalculator
## will just print that particular calculation. To see all such
## calculations, add a `--help' token to the previous command
## (under the same title). Note that with `--help', no processing
## is done, so you can always simply append it to remember
## something without modifying the command you want to run.
$ astcosmiccal -z2 --help

## Only print the "Tangential dist. covered by 1arcsec at z (kpc)"
## calculation, in units of kpc/arc-second.
$ astcosmiccal -z2 --arcsectandist

## It is easier to use the short (single character) version of
## this option when typing (but this is hard to read, so use
## the long version in scripts or notes you plan to archive).
$ astcosmiccal -z2 -s

## Short options can be merged (they are only a single character!)
$ astcosmiccal -sz2

## Convert this distance to kpc^2/arcmin^2 and save in `k'.
$ k=$(astcosmiccal -sz2 | awk '{print ($1*60)^2}')

## Calculate the area of the dataset in arcmin^2.
$ n=$(aststatistics flat-ir/xdf-f160w.fits --number)
$ r=$(astfits flat-ir/xdf-f160w.fits -h1 --keyvalue=CDELT1 -q)
$ a=$(echo $n $r | awk '{print $1 * ($2*60)^2 }')

## Multiply `k' and `a' and divide by 10^6 for value in Mpc^2.
$ echo $k $a | awk '{print $1 * $2 / 1e6}'

At redshift 2, this field therefore covers approximately 1.07 \(Mpc^2\). If you would like to see how this tangential area changes with redshift, you can use a shell loop like below.

$ for z in 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0; do        \
    k=$(astcosmiccal -sz$z);                                  \
    echo $z $k $a | awk '{print $1, ($2*60)^2 * $3 / 1e6}';   \
  done

Fortunately, the shell has a useful tool/program to print a sequence of numbers that is nicely called seq (short for “sequence”). You can use it instead of typing all the different redshifts in the loop above. For example, the loop below will calculate and print the tangential coverage of this field across a larger range of redshifts (0.1 to 5) and with finer increments of 0.1. For more on the LC_NUMERIC command, see Numeric locale.

## If your system language uses ',' (not '.') as decimal separator.
$ export LC_NUMERIC=C

## The loop over the redshifts
$ for z in $(seq 0.1 0.1 5); do                                  \
    k=$(astcosmiccal -z$z --arcsectandist);                      \
    echo $z $k $a | awk '{print $1, ($2*60)^2 * $3 / 1e6}';   \
  done

Have a look at the two printed columns. The first is the redshift, and the second is the area of this image at that redshift (in Mega Parsecs squared). Redshift (\(z\)) is a measure of distance in galaxy evolution and cosmology: a higher redshift corresponds to larger distance.

Now, have a look at the first few values. At \(z=0.1\) and \(z=0.5\), this image covers \(0.05 Mpc^2\) and \(0.57 Mpc^2\) respectively. This increase of coverage with redshift is expected because a fixed angle will cover a larger tangential area at larger distances. However, as you come down the list (to higher redshifts) you will notice that this relation does not hold! The largest coverage is at \(z=1.6\): at higher redshifts, the area decreases, and continues decreasing!!! In \(\Lambda{}CDM\) cosmology, this happens because of the finite speed of light and the expansion of the universe; see the Wikipedia page on the angular diameter distance.

In case you have TOPCAT, you can visualize this as a plot (if you do not have TOPCAT, see TOPCAT). To do so, first you need to save the output of the loop above into a FITS table by piping the output to Gnuastro’s Table program and giving an output name:

$ for z in $(seq 0.1 0.1 5); do                                  \
    k=$(astcosmiccal -z$z --arcsectandist);                      \
    echo $z $k $a | awk '{print $1, ($2*60)^2 * $3 / 1e6}';   \
  done | asttable --output=z-vs-tandist.fits

You can now use Gnuastro’s astscript-fits-view to open this table in TOPCAT with the command below. Do you remember this script from Dataset inspection and cropping? There, we used it to view a FITS image with DS9! This script checks whether the first dataset in the file is a table or an image and calls TOPCAT or DS9 accordingly, making it a very convenient tool to inspect the contents of all types of FITS data.

$ astscript-fits-view z-vs-tandist.fits

After TOPCAT opens, you will see the name of the table z-vs-tandist.fits in the left panel. On the top menu bar, select the “Graphics” menu, then select “Plane plot” to visualize the two columns printed above as a plot and get a better impression of the turn-over point of the image’s cosmological coverage.


2.1.7 Building custom programs with the library

In Cosmological coverage and visualizing tables, we repeated a certain calculation/output of a program multiple times using the shell’s for loop. This simple way of repeating a calculation is great when it is only necessary once. However, if you commonly need this calculation and possibly for a larger number of redshifts at higher precision, the command above can be slow. Please try it out by changing the sequence command in the previous section to ‘seq 0.1 0.01 10’. It will take about 11 seconds32! This can be improved by hundreds of times! This section will show you how.
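
If you would like to measure this timing on your own system, you can wrap the loop of the previous section in the shell’s time keyword like below (a small sketch, assuming a Bash-like shell; the output is redirected to /dev/null so only the timing is printed):

$ time for z in $(seq 0.1 0.01 10); do                           \
    k=$(astcosmiccal -z$z --arcsectandist);                      \
    echo $z $k $a | awk '{print $1, ($2*60)^2 * $3 / 1e6}';      \
  done > /dev/null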

Generally, repeated calls to a generic program (like CosmicCalculator) are slow, because a generic program can have a lot of overhead on each call. To be generic and easy to operate, CosmicCalculator has to parse the command-line and all configuration files (see Option management and configuration files) which contain human-readable characters and need a lot of pre-processing to be ready for processing by the computer. Afterwards, CosmicCalculator has to check the sanity of its inputs and check which of its many options you have asked for. All this pre-processing takes as much time as the high-level calculation you are requesting, and it has to be re-done for every redshift in your loop.

To greatly speed up the processing, you can directly access the core work-horse of CosmicCalculator without all that overhead by designing your custom program for this job. Using Gnuastro’s library, you can write your own tiny program particularly designed for this exact calculation (and nothing else!). To do that, copy and paste the following C program in a file called myprogram.c.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <gnuastro/cosmology.h>

int
main(void)
{
  double area=4.03817;          /* Area of field (arcmin^2). */
  double z, adist, tandist;     /* Temporary variables.      */

  /* Constants from Planck 2018 (arXiv:1807.06209, Table 2) */
  double H0=67.66, olambda=0.6889, omatter=0.3111, oradiation=0;

  /* Do the same thing for all redshifts (z) between 0.1 and 10. */
  for(z=0.1; z<10; z+=0.01)
    {
      /* Calculate the angular diameter distance. */
      adist=gal_cosmology_angular_distance(z, H0, olambda,
                                           omatter, oradiation);

      /* Calculate the tangential distance of one arcsecond. */
      tandist = adist * 1000 * M_PI / 3600 / 180;

      /* Print the redshift and area. */
      printf("%-5.2f %g\n", z, pow(tandist * 60,2) * area / 1e6);
    }

  /* Tell the system that everything finished successfully. */
  return EXIT_SUCCESS;
}

Then run the following command to compile your program and run it.

$ astbuildprog myprogram.c

In the command above, you used Gnuastro’s BuildProgram program. Its job is to simplify the compilation, linking and running of simple C programs that use Gnuastro’s library (like this one). BuildProgram is designed to manage Gnuastro’s dependencies, compile and link your custom program and then run it.

Did you notice how your custom program created the table almost instantaneously? Technically, it only took about 0.03 seconds! Recall that the for loop of Cosmological coverage and visualizing tables took more than 11 seconds (or \(\sim367\) times slower!).

Please run the ls command to see a listing of the files in the current directory. You will notice that a new file called myprogram has been created. This is the compiled program that was created and run by the command above (it is in binary machine code format, no longer human-readable). You can run it again to get the same results by executing it:

$ ./myprogram
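
Like the shell loop of Cosmological coverage and visualizing tables, you can also pipe your compiled program’s output into Gnuastro’s Table program to save it as a FITS table (the output name below is arbitrary):

$ ./myprogram | asttable --output=z-vs-tandist-lib.fits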

Your custom myprogram is so much more efficient than repeated calls to CosmicCalculator because in the latter, the overheads are comparable to the requested processing. For other programs that take large input datasets and do complicated processing on them, the overhead is usually negligible compared to the processing. In such cases, the libraries are only useful if you want a different/new processing compared to the functionalities in Gnuastro’s existing programs.

Gnuastro has a large library which is used extensively by all the programs. In other words, the library is like the skeleton of Gnuastro. For the full list of available functions classified by context, please see Gnuastro library. Gnuastro’s library and BuildProgram are created to make it easy for you to use these powerful features as you like. This gives you a high level of creativity, while also providing efficiency and robustness. Several other complete working examples (involving images and tables) of Gnuastro’s libraries can be seen in Library demo programs.

But for this tutorial, let’s stop discussing the libraries here and get back to Gnuastro’s already built programs (which do not need C programming). But before continuing, let’s clean up the files we do not need any more:

$ rm myprogram* z-vs-tandist*

2.1.8 Option management and configuration files

In the previous section (Cosmological coverage and visualizing tables), when you ran CosmicCalculator, you only specified the redshift with the -z2 option. You did not specify the cosmological parameters that are necessary for the calculations! Parameters like the Hubble constant (\(H_0\)) and the matter density. In spite of this, CosmicCalculator did its processing and printed results.

None of Gnuastro’s programs keep a default value internally within their code (they are all set by the user)! So where did the cosmological parameters that are necessary for its calculations come from? What were the values of those parameters? In short, they come from a configuration file (see Configuration file precedence), and the final used values can be checked/edited on the command-line. In this section we will review this important aspect of all the programs in Gnuastro.

Configuration files are an important part of all Gnuastro’s programs, especially the ones with a large number of options, so it is important to understand this part well. Once you get comfortable with configuration files, you can make good use of them in all Gnuastro programs (for example, NoiseChisel). For example, to do optimal detection on various datasets, you can have configuration files for different noise properties. The configuration of each program (besides its version) is vital for the reproducibility of your results, so it is important to manage them properly.

As we saw above, the full list of the options in all Gnuastro programs can be seen with the --help option. Try calling it with CosmicCalculator as shown below. Note how options are grouped by context to make it easier to find your desired option. However, in each group, options are ordered alphabetically.

$ astcosmiccal --help

After running the command above, please scroll up to the line where you ran this command and read through the output (it is the same format for all the programs). All options have a long format (starting with -- and a multi-character name) and some have a short format (starting with - and a single character), for more see Options. The options that expect a value have an = sign after their long version. The format of their expected value is also shown as FLT, INT or STR for floating point numbers, integer numbers, and strings (filenames for example) respectively.

You can see the values of all options that need one with the --printparams option (or its short format: -P). --printparams is common to all programs (see Common options). You can see the default cosmological parameters, from the Planck Collaboration 2020, under the # Input: title:

$ astcosmiccal -P

# Input:
 H0          67.66    # Current expansion rate (Hubble constant).
 olambda     0.6889   # Current cosmological cst. dens. per crit. dens.
 omatter     0.3111   # Current matter density per critical density.
 oradiation  0        # Current radiation density per critical density.

Let’s say you want to do the calculation in the previous section using \(H_0=70\) km/s/Mpc. To do this, just add --H0=70 after the command above (while keeping the -P). In the output, you can see that the used Hubble constant has also changed.

$ astcosmiccal -P --H0=70

Afterwards, delete the -P and add a -z2 to see the calculations with the new cosmology (or configuration).

$ astcosmiccal --H0=70 -z2

From the output of the --help option, note how the option for the Hubble constant has both short (-H) and long (--H0) formats. One final note is that the equal (=) sign is not mandatory. In the short format, the value can stick to the actual option (the short option name is just one character after all, thus easily identifiable) and in the long format, a white-space character is also enough.

$ astcosmiccal -H70    -z2
$ astcosmiccal --H0 70 -z2 --arcsectandist

When an option does not need a value and has a short format (like --arcsectandist, whose short format is -s), you can easily append it before other short options. So the last command above can also be written as:

$ astcosmiccal --H0 70 -sz2

Let’s assume that in one project, you want to only use rounded cosmological parameters (\(H_0\) of 70km/s/Mpc and matter density of 0.3). You should therefore run CosmicCalculator like this:

$ astcosmiccal --H0=70 --olambda=0.7 --omatter=0.3 -z2

But having to type these extra options every time you run CosmicCalculator will be prone to errors (typos in particular), frustrating and slow. Therefore in Gnuastro, you can put all the options and their values in a “Configuration file” and tell the programs to read the option values from there.

Let’s create a configuration file... With your favorite text editor, make a file named my-cosmology.conf (or my-cosmology.txt, the suffix does not matter for Gnuastro, but a more descriptive suffix like .conf is recommended for humans reading your code and seeing your files: this includes you, looking into your own project in a couple of months, when you have forgotten the details!). Then put the following lines inside of the plain-text file. One space between the option name and its value is enough; the values are simply placed under each other here to help readability. Also note that you should only use long option names in configuration files.

H0       70
olambda  0.7
omatter  0.3
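
If you prefer to stay on the command-line instead of opening a text editor, the same file can also be created with a single shell command (just an alternative way of writing the three lines above):

$ printf "H0       70\nolambda  0.7\nomatter  0.3\n" > my-cosmology.conf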

You can now tell CosmicCalculator to read this file for option values immediately using the --config option as shown below. Do you see how the output of the following command corresponds to the option values in my-cosmology.conf, and is therefore identical to the previous command?

$ astcosmiccal --config=my-cosmology.conf -z2

But still, having to type --config=my-cosmology.conf every time is annoying, is it not? If you need this cosmology every time you are working in a specific directory, you can use Gnuastro’s default configuration file names and avoid having to type it manually.

The default configuration files (that are checked if they exist) must be placed in the hidden .gnuastro sub-directory (in the same directory you are running the program). Their file name (within .gnuastro) must also be the same as the program’s executable name. So in the case of CosmicCalculator, the default configuration file in a given directory is .gnuastro/astcosmiccal.conf.

Let’s do this. We will first make a directory for our custom cosmology, then build a .gnuastro within it. Finally, we will copy the custom configuration file there:

$ mkdir my-cosmology
$ mkdir my-cosmology/.gnuastro
$ mv my-cosmology.conf my-cosmology/.gnuastro/astcosmiccal.conf

Once you run CosmicCalculator within my-cosmology (as shown below), you will see how your custom cosmology has been implemented without having to type anything extra on the command-line.

$ cd my-cosmology
$ astcosmiccal -P       # Your custom cosmology is printed.
$ cd ..
$ astcosmiccal -P       # The default cosmology is printed.

To further simplify the process, you can use the --setdirconf option. If you are already in your desired working directory, calling this option with the others will automatically write the final values (along with descriptions) in .gnuastro/astcosmiccal.conf. For example, try the commands below:

$ mkdir my-cosmology2
$ cd my-cosmology2
$ astcosmiccal -P
$ astcosmiccal --H0 70 --olambda=0.7 --omatter=0.3 --setdirconf
$ astcosmiccal -P
$ cd ..

Gnuastro’s programs also have default configuration files for a specific user (when run in any directory). This allows you to set a special behavior every time a program is run by a specific user. Only the directory and filename differ from the above, the rest of the process is similar to before. Finally, there are also system-wide configuration files that can be used to define the option values for all users on a system. See Configuration file precedence for a more detailed discussion.
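
If you ever need to know exactly which configuration files were read, and which value came from which file, the --checkconfig option can help (we are assuming here that your installed Gnuastro version provides it; see Operating mode options): it prints every configuration file and option value as it is parsed, before the normal processing starts.

## Assumes your Gnuastro version has the '--checkconfig' option.
$ astcosmiccal --checkconfig -P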

We will stop the discussion on configuration files here, but you can always read about them in Configuration files. Before continuing the tutorial, let’s delete the two extra directories that we do not need any more:

$ rm -rf my-cosmology*

2.1.9 Warping to a new pixel grid

We are now ready to start processing the deep HST images that were prepared in Dataset inspection and cropping. One of the most important points while using several images for data processing is that those images must have the same pixel grid. The process of changing the pixel grid is called ‘warping’. Fortunately, Gnuastro has the Warp program for warping the pixel grid (see Warp).

Warping to a different/matched pixel grid is commonly needed before higher-level analysis especially when you are using datasets from different instruments. The XDF datasets we are using here are already aligned to the same pixel grid. But let’s have a look at some of Gnuastro’s linear warping features here. For example, try rotating one of the images by 20 degrees with the first command below. With the second command, open the output and input to see how it is rotated.

$ astwarp flat-ir/xdf-f160w.fits --rotate=20

$ astscript-fits-view flat-ir/xdf-f160w.fits xdf-f160w_rotated.fits

Warp can generally be used for many kinds of pixel grid manipulation (warping), not just rotations. For example, the outputs of the commands below will respectively have larger pixels (the new resolution being one quarter of the original), be shifted by 2.8 pixels (a sub-pixel translation), have a shear of 0.2, and be tilted (projected). Run each of them and open the output file to see the effect; they will become handy for you in the future.

$ astwarp flat-ir/xdf-f160w.fits --scale=0.25
$ astwarp flat-ir/xdf-f160w.fits --translate=2.8
$ astwarp flat-ir/xdf-f160w.fits --shear=0.2
$ astwarp flat-ir/xdf-f160w.fits --project=0.001,0.0005
$ astscript-fits-view flat-ir/xdf-f160w.fits *.fits

If you need to do multiple warps, you can combine them in one call to Warp. For example, to first rotate the image, then scale it, run this command:

$ astwarp flat-ir/xdf-f160w.fits --rotate=20 --scale=0.25

If you have multiple warps, do them all in one command. Do not warp them in separate commands because the correlated noise will become too strong. As you see in the matrix that is printed when you run Warp, it merges all the warps into a single warping matrix (see Merging multiple warpings) and simply applies that (mixes the pixel values) just once. However, if you run Warp multiple times, the pixels will be mixed multiple times, creating a strong artificial blur/smoothing, or stronger correlated noise.

Recall that the merging of multiple warps is done through matrix multiplication, therefore order matters in the separate operations. At a lower level, through Warp’s --matrix option, you can directly request your desired final warp and do not have to break it up into different warps like above (see Invoking Warp).
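
For example, here is an illustrative sketch (see Invoking Warp for the exact format that --matrix expects): a uniform scaling by 0.25 along both axes, given directly as a 2 by 2 matrix, should produce the same result as the --scale=0.25 command above.

## Assumes '--matrix' takes a comma-separated 2x2 matrix (see Invoking Warp).
$ astwarp flat-ir/xdf-f160w.fits --matrix="0.25,0,0,0.25"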

Fortunately these datasets are already aligned to the same pixel grid, so you do not actually need the files that were just generated. You can safely delete them all with the following command. Here, you see why we put the processed outputs that we need later into a separate directory. In this way, the top directory can be used for temporary files for testing that you can simply delete with a generic command like below.

$ rm *.fits

2.1.10 NoiseChisel and Multi-Extension FITS files

In the previous sections, we completed a review of the basics of Gnuastro’s programs. We are now ready to do some more serious analysis on the downloaded images: extract the pixels containing signal from the image, find sub-structure of the extracted signal, do measurements over the extracted objects and analyze them (finding certain objects of interest in the image).

The first step is to separate the signal (galaxies or stars) from the background noise in the image. We will be using the results of Dataset inspection and cropping, so be sure you already have them. Gnuastro has NoiseChisel for this job. But NoiseChisel’s output is a multi-extension FITS file, therefore to better understand how to use NoiseChisel, let’s take a look at multi-extension FITS files and how you can interact with them.

In the FITS format, each extension contains a separate dataset (image in this case). You can get basic information about the extensions in a FITS file with Gnuastro’s Fits program (see Fits). To start with, let’s run NoiseChisel without any options, then use Gnuastro’s Fits program to inspect the number of extensions in this file.

$ astnoisechisel flat-ir/xdf-f160w.fits
$ astfits xdf-f160w_detected.fits

From the output list, we see that NoiseChisel’s output contains 5 extensions. The zero-th (counting from zero, with name NOISECHISEL-CONFIG) is empty: it has a value of 0 in the fourth column (which shows its size in pixels). Like NoiseChisel, in all of Gnuastro’s programs, the first (or zero-th) extension of the output only contains meta-data: data about/describing the datasets within (all) the output’s extensions. This is recommended by the FITS standard, see Fits for more. In the case of Gnuastro’s programs, this generic zero-th/meta-data extension (for the whole file) contains all the configuration options of the program that created the file.

Metadata regarding how the analysis was done (or a dataset was created) is very important for higher-level analysis and reproducibility. Therefore, let’s first take a closer look at the NOISECHISEL-CONFIG extension. If you ask for a specific extension of the FITS file, Gnuastro’s Fits program will print the header keywords (metadata) of that extension. You can either specify the HDU/extension counter (starting from 0), or its name. Therefore, the two commands below are identical for this file. We are usually tempted to use the first (shorter format), but when putting your commands into a script, please use the second format which is more human-friendly and understandable for readers of your code who may not know what is in the 0-th extension (this includes yourself in a few months!):

$ astfits xdf-f160w_detected.fits -h0
$ astfits xdf-f160w_detected.fits -hNOISECHISEL-CONFIG

The first group of FITS header keywords you see (containing the SIMPLE and BITPIX keywords; before the first empty line) are standard keywords. They are required by the FITS standard and must be present in any FITS extension. The second group starts with the input file name (value to the INPUT keyword). The rest of the keywords you see afterwards have the same name as NoiseChisel’s options, and the value used by NoiseChisel in this run is shown after the = sign. Finally, the last group (starting with DATE) contains the date and version information of Gnuastro and its dependencies that were used to generate this file. Besides the option values, these are also critical for future reproducibility of the result (you may update Gnuastro or its dependencies, and they may behave differently afterwards). The “versions and date” group of keywords are present in all Gnuastro’s FITS extension outputs, for more see Output FITS files.

Note that if a keyword name is longer than 8 characters, it is preceded by a HIERARCH keyword, and that all keyword names are in capital letters. These are all part of the FITS standard and originate from its history. But in short, both can be ignored! For example, with the commands below, let’s first see what the default values are, and then just check the value of the --detgrowquant option (using the -P option described in Option management and configuration files).

$ astnoisechisel -P
$ astnoisechisel -P | grep detgrowquant

To confirm that NoiseChisel used this value when we ran it above, let’s use grep to extract the keyword line with detgrowquant from the metadata extension. However, as you saw above, keyword names in the header are in all caps. So we need to ask grep to ignore case with the -i option.

$ astfits xdf-f160w_detected.fits -h0 | grep -i detgrowquant

In the output of the above command, you see HIERARCH at the start of the line. According to the FITS standard, HIERARCH is placed at the start of all keywords that have a name that is more than 8 characters long. Both the all-caps and the HIERARCH keyword can be annoying when you want to read/check the value. Therefore, the best solution is to use the --keyvalue option of Gnuastro’s astfits program as shown below. With it, you do not have to worry about HIERARCH or the case of the name (FITS keyword names are not case-sensitive).

$ astfits xdf-f160w_detected.fits -h0 --keyvalue=detgrowquant -q

The metadata (that is stored in the output) can later be used to exactly reproduce/understand your result, even if you have lost/forgotten the command you used to create the file. This feature is present in all of Gnuastro’s programs, not just NoiseChisel.
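
For example, if you later want to check just a few of the option values that were used (without printing the full header), you can ask for several keywords at once; the two keywords below are only an illustration:

$ astfits xdf-f160w_detected.fits -h0 --keyvalue=detgrowquant,snquant -q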

The rest of the HDUs in NoiseChisel’s output contain data. So let’s open them in a DS9 window and then describe each:

$ astscript-fits-view xdf-f160w_detected.fits

A “cube” window opens along with DS9’s main window. The buttons and horizontal scroll bar in this small new window can be used to navigate between the extensions. In this mode, all DS9’s settings (for example, zoom or color-bar) will be identical between the extensions. Try zooming into one part and flipping through the extensions to see how the galaxies were detected along with the Sky and Sky standard deviation values for that region. Just have in mind that NoiseChisel’s job is only detection (separating signal from noise). We will do segmentation on this result later to find the individual galaxies/peaks over the detected pixels.

The second extension of NoiseChisel’s output (numbered 1, named INPUT-NO-SKY) is the Sky-subtracted input that you provided. The third (DETECTIONS) is NoiseChisel’s main output which is a binary image with only two possible values for all pixels: 0 for noise and 1 for signal. Since it only has two values, to avoid taking too much space on your computer, its numeric datatype is an unsigned 8-bit integer (or uint8)33. The fourth and fifth (SKY and SKY_STD) extensions have the Sky and its standard deviation values for the input on a tile grid and were calculated over the undetected regions (for more on the importance of the Sky value, see Sky value).

Each HDU/extension in a FITS file is an independent dataset (image or table) which you can delete from the FITS file, or copy/cut to another file. For example, with the command below, you can copy NoiseChisel’s DETECTIONS HDU/extension to another file:

$ astfits xdf-f160w_detected.fits --copy=DETECTIONS -odetections.fits

There are also similar options to conveniently cut (--cut: copy, then remove from the input) or delete (--remove) HDUs from a FITS file. See HDU information and manipulation for more.
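
For example, if you no longer needed the detections.fits copy we just made, you could remove its DETECTIONS HDU in place with the command below (just an illustration; there is no need to actually run it):

$ astfits detections.fits --remove=DETECTIONS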


2.1.11 NoiseChisel optimization for detection

In NoiseChisel and Multi-Extension FITS files, we ran NoiseChisel and reviewed NoiseChisel’s output format. Now that you have a better feeling for multi-extension FITS files, let’s optimize NoiseChisel for this particular dataset.

One good way to see if you have missed any signal (small galaxies, or the wings of brighter galaxies) is to mask all the detected pixels and inspect the noise pixels. For this, you can use Gnuastro’s Arithmetic program (in particular its where operator, see Arithmetic operators). The command below will produce mask-det.fits. In it, all the pixels in the INPUT-NO-SKY extension that are flagged 1 in the DETECTIONS extension (dominated by signal, not noise) will be set to NaN.

Since the various extensions are in the same file, for each dataset we need the file and extension name. To make the command easier to read/write/understand, let’s use shell variables: ‘in’ will be used for the Sky-subtracted input image and ‘det’ will be used for the detection map. Recall that a shell variable’s value can be retrieved by adding a $ before its name, also note that the double quotations are necessary when we have white-space characters in a variable value (like this case).

$ in="xdf-f160w_detected.fits -hINPUT-NO-SKY"
$ det="xdf-f160w_detected.fits -hDETECTIONS"
$ astarithmetic $in $det nan where --output=mask-det.fits

To invert the result (only keep the detected pixels), you can flip the detection map (from 0 to 1 and vice-versa) by adding a ‘not’ after the second $det:

$ astarithmetic $in $det not nan where --output=mask-sky.fits

Look again at the DETECTIONS extension, in particular the long worm-like structure around pixel 1650 (X) and 1470 (Y). These types of long wiggly structures show that we have dug too deep into the noise, and are a signature of correlated noise. Correlated noise is created when we warp (for example, rotate) individual exposures (that are each slightly offset compared to each other) into the same pixel grid before adding them into one deeper image. During the warping, nearby pixels are mixed and the effect of this mixing on the noise (which is in every pixel) is called “correlated noise”. Correlated noise is a form of convolution and it slightly smooths the image.

In terms of the number of exposures (and thus correlated noise), the XDF dataset is by no means an ordinary dataset: it is the result of warping and adding roughly 80 separate exposures, which can create strong correlated noise/smoothing. In common surveys the number of exposures is usually 10 or less, so the default parameters need to be slightly customized for this dataset. See Figure 2 of Akhlaghi 2019 and the discussion on --detgrowquant there for more on how NoiseChisel “grow”s the detected objects and the patterns caused by correlated noise.

Let’s tweak NoiseChisel’s configuration a little to get a better result on this dataset. Do not forget that “Good statistical analysis is not a purely routine matter, and generally calls for more than one pass through the computer” (Anscombe 1973, see Gnuastro manifesto: Science and its tools). A good scientist must have a good understanding of her tools to make a meaningful analysis. So do not hesitate in playing with the default configuration and reviewing the manual when you have a new dataset (from a new instrument) in front of you. Robust data analysis is an art, therefore a good scientist must first be a good artist. Once you have found the good configuration for that particular noise pattern (instrument) you can safely use it for all new data that have a similar noise pattern.

NoiseChisel can produce “Check images” to help you visualize and inspect how each step is done. You can see all the check images it can produce with this command.

$ astnoisechisel --help | grep check

Let’s check the overall detection process to get a better feeling of what NoiseChisel is doing with the following command. To learn about NoiseChisel in more detail, please see NoiseChisel, Akhlaghi and Ichikawa 2015 and Akhlaghi 2019.

$ astnoisechisel flat-ir/xdf-f160w.fits --checkdetection

The check images/tables are also multi-extension FITS files. As you saw from the command above, when check datasets are requested, NoiseChisel will not go to the end. It will abort as soon as all the extensions of the check image are ready. Please list the extensions of the output with astfits and then open it with DS9, as we did above. If you have read the paper, you will see why there are so many extensions in the check image.

$ astfits xdf-f160w_detcheck.fits
$ astscript-fits-view xdf-f160w_detcheck.fits

In order to understand the parameters and their biases (especially as you are starting to use Gnuastro, or running it on a new dataset), it is strongly encouraged to play with the different parameters and use the respective check images to see which step is affected by your changes and how; for example, see Detecting large extended targets.

Let’s focus on one step: the OPENED_AND_LABELED extension shows the initial detection step of NoiseChisel. We see the seeds of that correlated noise structure with many small detections (a relatively early stage in the processing). Such connections at the lowest surface brightness limits usually occur when the dataset is too smoothed, the threshold is too low, or the final “growth” is too much.

As you see from the 2nd (CONVOLVED) extension, the first operation that NoiseChisel does on the data is to slightly smooth it. However, the natural correlated noise of this dataset is already one level of artificial smoothing, so further smoothing it with the default kernel may be the culprit. To see the effect, let’s use a sharper kernel as a first step to convolve/smooth the input.

By default NoiseChisel uses a Gaussian with full-width-half-maximum (FWHM) of 2 pixels. We can use Gnuastro’s MakeProfiles to build a kernel with a FWHM of 1.5 pixels (truncated at 5 times the FWHM, like the default) using the following command. MakeProfiles is a powerful tool to build any number of mock profiles on one image or independently; to learn more about its features and capabilities, see MakeProfiles.

$ astmkprof --kernel=gaussian,1.5,5 --oversample=1

Please open the output kernel.fits and have a look (it is very small and sharp). We can now tell NoiseChisel to use this instead of the default kernel with the following command (we will keep the --checkdetection option to continue checking the detection steps):

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits  \
                 --checkdetection

Open the output xdf-f160w_detcheck.fits as a multi-extension FITS file and go to the last extension (DETECTIONS-FINAL, it contains the same pixels as the final NoiseChisel output without --checkdetection). Look again at that position mentioned above (1650,1470), you see that the long wiggly structure is gone. This shows we are making progress :-).

Looking at the new OPENED_AND_LABELED extension, we see that the thin connections between smaller peaks have now significantly decreased. Going two extensions/steps ahead (in the first HOLES-FILLED), you can see that during the process of finding false pseudo-detections, too many holes have been filled: do you see how many of the brighter galaxies are connected? At this stage all holes are filled, irrespective of their size.

Looking two extensions ahead (at the first PSEUDOS-FOR-SN), you can see that there are not many pseudo-detections because of all those extended filled holes. If you look closely, you can see the number of pseudo-detections in the printed outputs of NoiseChisel (around 6400). This is another side-effect of correlated noise. To address it, we should slightly increase the pseudo-detection threshold (before changing --dthresh, run with -P to see the default value):

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits \
                 --dthresh=0.1 --checkdetection

Before visually inspecting the check image, you can already see the effect of this small change in NoiseChisel’s command-line output: notice how the number of pseudo-detections has increased to more than 7100! Open the check image now and have a look, you can see how the pseudo-detections are distributed much more evenly in the blank sky regions of the PSEUDOS-FOR-SN extension.

Maximize the number of pseudo-detections: When using NoiseChisel on datasets with a new noise-pattern (for example, going to a Radio astronomy image, or a shallow ground-based image), play with --dthresh until you get a maximal number of pseudo-detections: the total number of pseudo-detections is printed on the command-line when you run NoiseChisel, you do not even need to open a FITS viewer.

In this particular case, try --dthresh=0.2 and you will see that the total printed number decreases to around 6700 (recall that with --dthresh=0.1, it was roughly 7100). So for this type of very deep HST images, we should set --dthresh=0.1.
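
If you would like to try it, the call is the same as before, just with the new --dthresh value:

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits \
                 --dthresh=0.2 --checkdetection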

As discussed in Section 3.1.5 of Akhlaghi and Ichikawa 2015, the signal-to-noise ratio of pseudo-detections is critical for identifying/removing false detections. Getting it right is very important for an optimal detection (where you want to successfully detect the faintest and smallest objects in the image). Let’s have a look at their signal-to-noise distribution with --checksn.

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits  \
                 --dthresh=0.1 --checkdetection --checksn

The output (xdf-f160w_detsn.fits) contains two extensions with two-column tables of the pseudo-detections: those over the undetected regions (SKY_PSEUDODET_SN) and those over the detections (DET_PSEUDODET_SN). With the first command below you can see the HDUs of this file, and with the second you can see the information of the table in the first HDU (which is the default when you do not use --hdu):

$ astfits xdf-f160w_detsn.fits
$ asttable xdf-f160w_detsn.fits -i

You can see the table columns with the first command below and get a feeling of the signal-to-noise value distribution with the second command (the two Table and Statistics programs will be discussed later in the tutorial):

$ asttable xdf-f160w_detsn.fits -hSKY_PSEUDODET_SN
$ aststatistics xdf-f160w_detsn.fits -hSKY_PSEUDODET_SN -c2
... [output truncated] ...
Histogram:
 |           *
 |          ***
 |         ******
 |        *********
 |        **********
 |       *************
 |      *****************
 |     ********************
 |    **************************
 |   ********************************
 |*******************************************************   * **       *
 |----------------------------------------------------------------------

The correlated noise is again visible in the signal-to-noise distribution of sky pseudo-detections! Do you see how skewed this distribution is? In an image with less correlated noise, this distribution would be much more symmetric. A small change in the quantile will translate into a big change in the S/N value. For example, see the difference between the three 0.99, 0.95 and 0.90 quantiles with this command:

$ aststatistics xdf-f160w_detsn.fits -hSKY_PSEUDODET_SN -c2      \
                --quantile=0.99 --quantile=0.95 --quantile=0.90

We get a change of almost 2 units (which is very significant). If you run NoiseChisel with -P, you’ll see the default signal-to-noise quantile --snquant is 0.99. In effect with this option you specify the purity level you want (contamination by false detections). With the aststatistics command above, you see that a small number of extra false detections (impurity) in the final result causes a big change in completeness (you can detect more lower signal-to-noise true detections). So let’s loosen-up our desired purity level, remove the check-image options, and then mask the detected pixels like before to see if we have missed anything.

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits  \
                 --dthresh=0.1 --snquant=0.95
$ in="xdf-f160w_detected.fits -hINPUT-NO-SKY"
$ det="xdf-f160w_detected.fits -hDETECTIONS"
$ astarithmetic $in $det nan where --output=mask-det.fits

Overall it seems good, but if you play a little with the color-bar and look closer in the noise, you’ll see a few very sharp, but faint, objects that have not been detected. For example, the object around pixel (456, 1662). Despite its high valued pixels, this object was lost because erosion ignores the precise pixel values. Losing small/sharp objects like this only happens for under-sampled datasets like HST (where the pixel size is larger than the point spread function FWHM). So this will not happen on ground-based images.

To address this problem of sharp objects, we can use NoiseChisel’s --noerodequant option. All pixels above this quantile will not be eroded, thus allowing us to preserve small/sharp objects (that cover a small area, but have a lot of signal in them). Check its default value, then run NoiseChisel like below and make the mask again.
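
For example, you can check its current/default value in the same way as before:

$ astnoisechisel -P | grep noerodequant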

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits     \
                 --noerodequant=0.95 --dthresh=0.1 --snquant=0.95

This seems to be fine and the object above is now detected. We will stop editing the configuration of NoiseChisel here, but please feel free to keep looking into the data to see if you can improve it even more.

Once you have found the proper configuration for the type of images you will be using, you do not need to change it any more. The same configuration can be used for any dataset that has been similarly produced (and has a similar noise pattern). But entering all these options on every call to NoiseChisel is annoying and prone to bugs (mistakenly typing the wrong value for example). To simplify things, we will make a configuration file in a visible config directory. Then we will define the hidden .gnuastro directory (that all Gnuastro’s programs will look into for configuration files) as a symbolic link to the config directory. Finally, we will write the finalized values of the options into NoiseChisel’s standard configuration file within that directory. We will also put the kernel in a separate directory to keep the top directory clean of any files that we will not directly need later.

$ mkdir kernel config
$ ln -s config/ .gnuastro
$ mv kernel.fits kernel/noisechisel.fits
$ echo "kernel kernel/noisechisel.fits" > config/astnoisechisel.conf
$ echo "noerodequant 0.95"             >> config/astnoisechisel.conf
$ echo "dthresh      0.1"              >> config/astnoisechisel.conf
$ echo "snquant      0.95"             >> config/astnoisechisel.conf

We are now ready to finally run NoiseChisel on the three filters and keep the output in a dedicated directory (which we will call nc for simplicity).

$ rm *.fits
$ mkdir nc
$ for f in f105w f125w f160w; do \
    astnoisechisel flat-ir/xdf-$f.fits --output=nc/xdf-$f.fits; \
  done

2.1.12 NoiseChisel optimization for storage

As we showed before (in NoiseChisel and Multi-Extension FITS files), NoiseChisel’s output is a multi-extension FITS file with several images the same size as the input. As the input datasets get larger this output can become hard to manage and waste a lot of storage space. Fortunately there is a solution to this problem (which is also useful for Segment’s outputs).

In this small section we will take a short detour to show this feature. Please note that the outputs generated here are not needed for the rest of the tutorial. But first, let’s have a look at the contents/HDUs and volume of NoiseChisel’s output from NoiseChisel optimization for detection (fast answer, it is larger than 100 mega-bytes):

$ astfits nc/xdf-f160w.fits
$ ls -lh nc/xdf-f160w.fits

Two options can drastically decrease NoiseChisel’s output file size: 1) With the --rawoutput option, NoiseChisel will not create a Sky-subtracted output. After all, it is redundant: you can always generate it by subtracting the SKY extension from the input image (which you have in your database) using the Arithmetic program. 2) With the --oneelempertile, you can tell NoiseChisel to store its Sky and Sky standard deviation results with one pixel per tile (instead of many pixels per tile). So let’s run NoiseChisel with these options, then have another look at the HDUs and the over-all file size:

$ astnoisechisel flat-ir/xdf-f160w.fits --oneelempertile --rawoutput \
                 --output=nc-for-storage.fits
$ astfits nc-for-storage.fits
$ ls -lh nc-for-storage.fits

See how nc-for-storage.fits has four HDUs, while nc/xdf-f160w.fits had five HDUs? As explained above, the missing extension is INPUT-NO-SKY. Also, look at the sizes of the SKY and SKY_STD HDUs, unlike before, they are not the same size as DETECTIONS, they only have one pixel for each tile (group of pixels in raw input). Finally, you see that nc-for-storage.fits is just under 8 mega bytes (while nc/xdf-f160w.fits was 100 mega bytes)!

But we are not yet finished! You can be even more efficient in storing, archiving or transferring NoiseChisel’s output by compressing this file. Try the command below to see how NoiseChisel’s output has now shrunk to about 250 kilo-bytes while keeping all the necessary information of the original 100 mega-byte output.

$ gzip --best nc-for-storage.fits
$ ls -lh nc-for-storage.fits.gz

We can get this wonderful level of compression because NoiseChisel’s output is binary with only two values: 0 and 1. Compression algorithms are highly optimized in such scenarios.

You can open nc-for-storage.fits.gz directly in SAO DS9 or feed it to any of Gnuastro’s programs without having to decompress it. Higher-level programs that take NoiseChisel’s output (for example, Segment or MakeCatalog) can also deal with this compressed image where the Sky and its Standard deviation are one pixel-per-tile. You just have to give the “values” image as a separate option, for more, see Segment and MakeCatalog.
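
For example, you can list the HDUs of the compressed file directly (there is no need to decompress it first):

$ astfits nc-for-storage.fits.gz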

Segment (the program we will introduce in the next section for identifying sub-structure) also has similar features to optimize its output for storage. Since this file was only created for a fast detour demonstration, let’s keep our top directory clean and move to the next step:

$ rm nc-for-storage.fits.gz

2.1.13 Segmentation and making a catalog

The main output of NoiseChisel is the binary detection map (DETECTIONS extension, see NoiseChisel optimization for detection). It only has two values: 1 or 0. This is useful when studying the noise or background properties, but hardly of any use when you actually want to study the targets/galaxies in the image, especially in such a deep field where almost everything is connected. To find the galaxies over the detections, we will use Gnuastro’s Segment program:

$ mkdir seg
$ astsegment nc/xdf-f160w.fits -oseg/xdf-f160w.fits
$ astsegment nc/xdf-f125w.fits -oseg/xdf-f125w.fits
$ astsegment nc/xdf-f105w.fits -oseg/xdf-f105w.fits

Segment’s operation is very much like NoiseChisel (in fact, prior to version 0.6, it was part of NoiseChisel). For example, the output is a multi-extension FITS file (previously discussed in NoiseChisel and Multi-Extension FITS files), it has check images and uses the undetected regions as a reference (previously discussed in NoiseChisel optimization for detection). Please have a look at Segment’s multi-extension output to get a good feeling of what it has done. Do not forget to flip through the extensions in the “Cube” window.

$ astscript-fits-view seg/xdf-f160w.fits

Like NoiseChisel, the first extension is the input. The CLUMPS extension shows the true “clumps” with values that are \(\ge1\), and the diffuse regions labeled as \(-1\). Please flip between the first extension and the clumps extension and zoom-in on some of the clumps to get a feeling of what they are. In the OBJECTS extension, we see that the large detections of NoiseChisel (that may have contained many galaxies) are now broken up into separate labels. Play with the color-bar and hover your mouse over the various detections to see their different labels.

Because the clumps are not affected by the hard-to-deblend, low signal-to-noise diffuse regions, they are more robust than the objects for calculating colors. From this step onward, we will continue with clumps.

Having localized the regions of interest in the dataset, we are ready to do measurements on them with MakeCatalog. MakeCatalog is specialized and optimized for doing measurements over labeled regions of an image. In other words, through MakeCatalog, you can “reduce” an image to a table (catalog of certain properties of objects in the image). Each requested measurement (over each label) will be given a column in the output table. To see the full set of available measurements run it with --help like below (and scroll up), note that measurements are classified by context.

$ astmkcatalog --help

So let’s select the properties we want to measure in this tutorial. First of all, we need to know which measurement belongs to which object or clump, so we will start with the --ids (read as: IDs35). We also want to measure (in this order) the Right Ascension (with --ra), Declination (--dec), magnitude (--magnitude), and signal-to-noise ratio (--sn) of the objects and clumps. Furthermore, as mentioned above, we also want measurements on clumps, so we also need to call --clumpscat. The following command will make these measurements on Segment’s F160W output and write them in a catalog for each object and clump in a FITS table. For more on the zero point, see Brightness, Flux, Magnitude and Surface brightness.

$ mkdir cat
$ astmkcatalog seg/xdf-f160w.fits --ids --ra --dec --magnitude --sn \
               --zeropoint=25.94 --clumpscat --output=cat/xdf-f160w.fits

From the printed statements on the command-line, you see that MakeCatalog read all the extensions in Segment’s output for the various measurements it needed. Let’s look at the output of the command above:

$ astfits cat/xdf-f160w.fits

You will see that the output of MakeCatalog has two extensions. The first extension (OBJECTS) shows the measurements over the objects, and the second extension (CLUMPS) shows the measurements over the clumps.

To calculate colors, we also need magnitude measurements on the other filters. So let’s repeat the command above on them, just changing the file names and zero point (which we got from the XDF survey web page):

$ astmkcatalog seg/xdf-f125w.fits --ids --ra --dec --magnitude --sn \
               --zeropoint=26.23 --clumpscat --output=cat/xdf-f125w.fits

$ astmkcatalog seg/xdf-f105w.fits --ids --ra --dec --magnitude --sn \
               --zeropoint=26.27 --clumpscat --output=cat/xdf-f105w.fits

However, the galaxy properties might differ between the filters (which is the whole purpose behind observing in different filters!). Also, the noise properties and depth of the datasets differ. You can see the effect of these factors in the resulting clump catalogs, with Gnuastro’s Table program. We will go deep into working with tables in the next section, but in summary: the -i option will print information about the columns and number of rows. To see the column values, just remove the -i option. In the output of each command below, look at the Number of rows:, and note that they are different.

$ asttable cat/xdf-f105w.fits -hCLUMPS -i
$ asttable cat/xdf-f125w.fits -hCLUMPS -i
$ asttable cat/xdf-f160w.fits -hCLUMPS -i

Matching the catalogs is possible (for example, with Match). However, the measurements of each column are also done on different pixels: the clump labels can/will differ from one filter to another for one object. Please open them and focus on one object to see for yourself. This can bias the result, if you match catalogs.
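
Purely as a sketch of how such a positional match might be started (we will not use it in this tutorial, and the aperture value below is only an illustration, not a recommendation), the Match program could be called like this:

$ astmatch cat/xdf-f160w.fits --hdu=CLUMPS \
           cat/xdf-f125w.fits --hdu2=CLUMPS \
           --ccol1=RA,DEC --ccol2=RA,DEC \
           --aperture=0.0002778 --output=matched.fits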

An accurate color calculation can only be done when magnitudes are measured from the same pixels on all images and this can be done easily with MakeCatalog. In fact this is one of the reasons that NoiseChisel or Segment do not generate a catalog like most other detection/segmentation software. This gives you the freedom of selecting the pixels for measurement in any way you like (from other filters, other software, manually, etc.). Fortunately in these images, the Point spread function (PSF) is very similar, allowing us to use a single labeled image output for all filters36.

The F160W image is deeper, thus providing better detection/segmentation, and redder, thus observing smaller/older stars and representing more of the mass in the galaxies. We will thus use the F160W filter as a reference and use its segment labels to identify which pixels to use for which objects/clumps. But we will do the measurements on the sky-subtracted F105W and F125W images (using MakeCatalog’s --valuesfile option) as shown below. Notice that the only difference between these calls and the call to generate the raw F160W catalog (excluding the zero point and the output name) is the --valuesfile.

$ astmkcatalog seg/xdf-f160w.fits --ids --ra --dec --magnitude --sn \
               --valuesfile=nc/xdf-f125w.fits --zeropoint=26.23 \
               --clumpscat --output=cat/xdf-f125w-on-f160w-lab.fits

$ astmkcatalog seg/xdf-f160w.fits --ids --ra --dec --magnitude --sn \
               --valuesfile=nc/xdf-f105w.fits --zeropoint=26.27 \
               --clumpscat --output=cat/xdf-f105w-on-f160w-lab.fits

After running the commands above, look into what MakeCatalog printed on the command-line. You can see that (as requested) the object and clump pixel labels in both were taken from the respective extensions in seg/xdf-f160w.fits. However, the pixel values and pixel Sky standard deviation were taken from nc/xdf-f125w.fits and nc/xdf-f105w.fits respectively. Since we used the same labeled image on all filters, the number of rows in both catalogs are now identical. Let’s have a look:

$ asttable cat/xdf-f105w-on-f160w-lab.fits -hCLUMPS -i
$ asttable cat/xdf-f125w-on-f160w-lab.fits -hCLUMPS -i
$ asttable cat/xdf-f160w.fits -hCLUMPS -i

Finally, MakeCatalog also does basic calculations on the full dataset (independent of each labeled region but related to whole data), for example, pixel area or per-pixel surface brightness limit. They are stored as keywords in the FITS headers (or lines starting with # in plain text). This (and other ways to measure the limits of your dataset) are discussed in the next section: Measuring the dataset limits.


2.1.14 Measuring the dataset limits

In Segmentation and making a catalog, we created a catalog of the different objects within the image. Before measuring colors, or doing any other kind of analysis on the catalogs (and detected objects), it is very important to understand the limitations of the dataset. Without understanding the limitations of your dataset, you cannot make any physical interpretation of your results. The theory behind the calculations discussed here is thoroughly introduced in Quantifying measurement limits.

For example, with the command below, let’s sort all the detected clumps in the image by magnitude (with --sort=magnitude) and print the magnitude and signal-to-noise ratio (S/N; with -cmagnitude,sn):

$ asttable cat/xdf-f160w.fits -hclumps -cmagnitude,sn \
           --sort=magnitude --noblank=magnitude

As you see, we have clumps with a total magnitude of almost 32! This is extremely faint! Are these things trustable? Let’s have a look at all of those with a magnitude between 31 and 32 with the command below. We are first using Table to only keep the relevant columns and rows, then using Gnuastro’s DS9 region file creation script (astscript-ds9-region) to generate DS9 region files, and open DS9:

$ asttable cat/xdf-f160w.fits -hclumps -cra,dec \
           --range=magnitude,31:32  \
      | astscript-ds9-region -c1,2 --radius=0.5 \
           --command="ds9 -mecube seg/xdf-f160w.fits -zscale"

Zoom-out a little and you will see some green circles (DS9 region files) in some regions of the image. There actually does seem to be a true peak under the selected regions, but as you see, they are very small, diffuse and noisy. How reliable are the measured magnitudes? Using the S/N column from the first command above, you can see that such objects only have a signal-to-noise of about 2.6 (which is indeed too low for most analysis purposes):

$ asttable cat/xdf-f160w.fits -hclumps -csn \
           --range=magnitude,31:32 | aststatistics

This brings us to the first method of quantifying your dataset’s magnitude limit, which is also sometimes called detection limit (see Magnitude limit of image). To estimate the \(5\sigma\) detection limit of your dataset, you simply report the median magnitude of the objects that have a signal to noise of (approximately) five. This is very easy to calculate with the command below:

$ asttable cat/xdf-f160w.fits -hclumps --range=sn,4.8:5.2 -cmagnitude \
           | aststatistics --median
29.9949

Let’s have a look at these objects, to get a feeling of what these clumps look like:

$ asttable cat/xdf-f160w.fits -hclumps --range=sn,4.8:5.2 \
           -cra,dec,magnitude \
           | astscript-ds9-region -c1,2 --namecol=3 \
                      --width=2 --radius=0.5 \
                      --command="ds9 -mecube seg/xdf-f160w.fits -zscale"

The number you see on top of each region is the clump’s magnitude. Please go over the objects and have a close look at them! It is very important to have a feeling of what your dataset looks like, and how to interpret the numbers to associate an image with them.

Generally, they look very small with different levels of diffuse-ness! Those that are sharper make more visual sense (to be \(5\sigma\) detections), but the more diffuse ones extend over a larger area. Furthermore, the noise here is estimated from individual, independent pixel measurements. However, during the reduction many exposures are co-added and stacked, mixing the pixels like a small convolution (creating “correlated noise”). Therefore you clearly see two main issues with the detection limit as defined above: it depends on the morphology, and it does not take into account the correlated noise.

A more realistic way to estimate the significance of the detection is to take its footprint, randomly place it in thousands of undetected regions of the image and use that distribution as a reference. This is technically known as upper-limit measurements. For a full discussion, see Upper limit magnitude of each detection.

Since it is for each separate object, the upper-limit measurements should be requested as extra columns in MakeCatalog’s output. For example, with the command below, let’s generate a new catalog of the F160W filter, but with two extra columns compared to the one in cat/: the upper-limit magnitude and the upper-limit multiple of sigma.

$ astmkcatalog seg/xdf-f160w.fits --ids --ra --dec --magnitude --sn \
               --zeropoint=25.94 --clumpscat --upnsigma=3 \
               --upperlimit-mag --upperlimit-sigma \
               --output=xdf-f160w.fits

Let’s compare the upper-limit magnitude with the measured magnitude of each clump:

$ asttable xdf-f160w.fits -hclumps -cmagnitude,upperlimit_mag

As you see, in almost all of the cases, the measured magnitude is sufficiently brighter than the upper-limit magnitude. Let’s subtract the measured magnitude from the upper-limit magnitude to see this difference more clearly in a third column:

$ asttable xdf-f160w.fits -hclumps -cmagnitude,upperlimit_mag \
           -c'arith upperlimit_mag magnitude -'

The ones with a positive third column (difference) show that the clump has sufficiently higher brightness than the noisy background to be usable. Let’s use Table’s Column arithmetic to find only those that have a negative difference:

$ asttable xdf-f160w.fits -hclumps -cra,dec --noblankend=3 \
      -c'arith upperlimit_mag magnitude - set-d d d 0 gt nan where'

From more than 3500 clumps, this command only gave \(\sim150\) rows (this number may slightly change on different runs due to the random nature of the upper-limit sampling37)! Let’s have a look at them:

$ asttable xdf-f160w.fits -hclumps -cra,dec --noblankend=3 \
      -c'arith upperlimit_mag magnitude - set-d d d 0 gt nan where' \
      | astscript-ds9-region -c1,2 --namecol=3 --width=2 \
                  --radius=0.5 \
                  --command="ds9 -mecube seg/xdf-f160w.fits -zscale"

You see that they are all extremely faint and diffuse/small peaks. Therefore, if an object’s magnitude is fainter than its upper-limit magnitude, you should not use the magnitude: it is not accurate! You should use the upper-limit magnitude instead (with an arrow in your plots to mark which ones are upper-limits).

But the main point (in relation to the magnitude limit) with the upper-limit is the UPPERLIMIT_SIGMA column (you can think of this as a realistic S/N for extremely faint/diffuse/small objects). The raw S/N column is simply calculated on a pixel-by-pixel basis; however, the upper-limit sigma is produced by actually taking the label’s footprint, randomly placing it thousands of times over un-detected parts of the image and measuring the brightness of the sky. The clump’s brightness is then divided by the standard deviation of the resulting distribution to give you exactly how significant it is (accounting for inter-pixel issues like correlated noise, which are strong in this dataset). You can actually compare the two values with the command below:

$ asttable xdf-f160w.fits -hclumps -csn,upperlimit_sigma

As you see, the second column (upper-limit sigma) is almost always less than the S/N. This clearly shows the effect of correlated noise! If you now use this column as the reference for deriving the magnitude limit, you will see that it will shift by almost 0.5 magnitudes brighter and is now more reasonable:

$ asttable xdf-f160w.fits -hclumps --range=upperlimit_sigma,4.8:5.2 \
           -cmagnitude | aststatistics --median
29.6257

We see that the \(5\sigma\) detection limit is \(\sim29.6\)! This is extremely deep! For example, in the Legacy Survey38, the \(5\sigma\) detection limit for point sources is approximately 24.5 (5 magnitudes, or 100 times, shallower than this image).

As mentioned above, an important caveat in this simple calculation is that we should only be looking at point-like objects, not simply everything. This is because the shape or radial slope of the profile has an important effect on this measurement: at the same total magnitude, a sharper object will have a higher S/N. To be more precise, we should first perform star-galaxy separation, then do this only for the objects that are classified as stars. A crude, first-order method is to use the --axis-ratio option so MakeCatalog also measures the axis ratio, then call Table with --range=upperlimit_sigma,4.8:5.2 and --range=axis_ratio,0.95:1 (in one command). Please do this for yourself as an exercise to see the difference with the result above.
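
If you would like a starting point for this exercise, the two commands might look something like the sketch below (the MakeCatalog call is the same as the one above, with --axis-ratio added; the option and column names follow the description in the previous paragraph):

$ astmkcatalog seg/xdf-f160w.fits --ids --ra --dec --magnitude --sn \
               --axis-ratio --zeropoint=25.94 --clumpscat --upnsigma=3 \
               --upperlimit-mag --upperlimit-sigma --output=xdf-f160w.fits
$ asttable xdf-f160w.fits -hclumps -cmagnitude \
           --range=upperlimit_sigma,4.8:5.2 --range=axis_ratio,0.95:1 \
           | aststatistics --median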

Before continuing, let’s remove this temporarily produced catalog:

$ rm xdf-f160w.fits

Another measure of the dataset’s limit is the completeness limit (Completeness limit of each detection). This is necessary when you are looking at populations of objects over the image. You want to know down to what magnitude you can be sure that you have detected an object (if it was present). As described in Completeness limit of each detection, the best way to do this is with mock images. But a crude, first-order result can be obtained from the actual image, by simply plotting the histogram of the magnitudes:

$ aststatistics cat/xdf-f160w.fits -hclumps -cmagnitude
...
Histogram:
 |                                                           *
 |                                                      ** ****
 |                                                   ***********
 |                                                 *************
 |                                                ****************
 |                                             *******************
 |                                           **********************
 |                                        **************************
 |                                 *********************************
 |                      *********************************************
 |* *   ** ** **********************************************************
 |----------------------------------------------------------------------

This plot (the histogram of magnitudes; where fainter magnitudes are towards the right) is technically called the dataset’s number count plot. You see that the number of objects increases with magnitude as the magnitudes get fainter (to the right). However, beyond a certain magnitude, you see it becomes flat, and soon afterwards, the numbers suddenly drop.

Once you have your catalog, you can easily find this point with the two commands below. First we generate a histogram with fewer bins (to have more numbers in each bin). We then use AWK to find the magnitude bin where the number of points decreases compared to the previous bin. But we only do this for bins that have more than 50 items (to avoid scatter in the bright end). Finally, in Statistics, we have manually set the magnitude range and number of bins so each bin is roughly 0.5 magnitudes thick (with --greaterequal=20, --lessthan=32 and --numbins=24):

$ aststatistics cat/xdf-f160w.fits -hclumps -cmagnitude --histogram \
                --greaterequal=20 --lessthan=32 --numbins=24 \
                --output=f160w-hist.txt
$ asttable f160w-hist.txt \
           | awk '$2>50 && $2<prev{print prevbin; exit} \
                  {prev=$2; prevbin=$1}'
28.932122667631

Therefore, to first order (and very crudely!) we can say that if an object is in our field of view and has a magnitude of \(\sim29\) or brighter, we can be highly confident that we have detected it. But before continuing, let’s clean up behind ourselves:

$ rm f160w-hist.txt

Another important limiting parameter in a processed dataset is the surface brightness limit (Surface brightness limit of image). The surface brightness limit of a dataset is an important measure for extended structures (for example, when you want to look at the outskirts of galaxies). In the next tutorial, we have thoroughly described the derivation of the surface brightness limit of a dataset. So we will just show the final result here, and encourage you to follow up with that tutorial after finishing this one (see Image surface brightness limit).

By default, MakeCatalog will estimate the surface brightness limit of a given dataset, and put it in the keywords of the output (all keywords starting with SBL, which is short for surface brightness limit):

$ astfits cat/xdf-f160w.fits -h1 | grep SBL

As you see, the only one with a unit of mag/arcsec^2 is SBLMAG. It contains the surface brightness limit of the input dataset over SBLAREA arcsec\(^2\) with SBLNSIG multiples of \(\sigma\). In the current version of Gnuastro, SBLAREA=100 and SBLNSIG=3, so the surface brightness limit of this image is 32.66 mag/arcsec\(^2\) (\(3\sigma\), over 100 arcsec\(^2\)). Therefore, if this default area and multiple of sigma are fine for you39 (these are the most commonly used values), you can simply read the image surface brightness limit from the catalogs produced by MakeCatalog with this command:

$ astfits cat/*.fits -h1 --keyvalue=SBLMAG

2.1.15 Working with catalogs (estimating colors)

In the previous step we generated catalogs of objects and clumps over our dataset (see Segmentation and making a catalog). The catalogs are available in the two extensions of the single FITS file40. Let’s see the extensions and their basic properties with the Fits program:

$ astfits  cat/xdf-f160w.fits              # Extension information

Let’s inspect the table in each extension with Gnuastro’s Table program (see Table). We could also have used -hOBJECTS and -hCLUMPS instead of -h1 and -h2 respectively; the numbers are just used here to show that both names and numbers are possible. In the next commands, we will only use names.

$ asttable cat/xdf-f160w.fits -h1 --info   # Objects catalog info.
$ asttable cat/xdf-f160w.fits -h1          # Objects catalog columns.
$ asttable cat/xdf-f160w.fits -h2 -i       # Clumps catalog info.
$ asttable cat/xdf-f160w.fits -h2          # Clumps catalog columns.

As you see above, when given a specific table (file name and extension), Table will print the full contents of all the columns. To see the basic metadata about each column (for example, name, units and comments), simply append a --info (or -i) to the command.

To print the contents of specific column(s), just give the column number(s) (counting from 1) or the column name(s) (if they have one) to the --column (or -c) option. For example, if you just want the magnitude and signal-to-noise ratio of the clumps (in the clumps catalog), you can get it with any of the following commands:

$ asttable cat/xdf-f160w.fits -hCLUMPS --column=5,6
$ asttable cat/xdf-f160w.fits -hCLUMPS -c5,SN
$ asttable cat/xdf-f160w.fits -hCLUMPS -c5         -c6
$ asttable cat/xdf-f160w.fits -hCLUMPS -cMAGNITUDE -cSN

Similar to HDUs, when the columns have names, always use the name: it is so common to mis-write numbers or forget the order later! Using column names instead of numbers has many advantages:

  1. You do not have to worry about the order of columns in the table.
  2. It acts as a documentation in the script.
  3. Column meta-data (including a name) are not just limited to FITS tables and can also be used in plain text tables, see Gnuastro text table format.

Table also has tools to limit the displayed rows. For example, with the first command below only rows with a magnitude in the range of 28 to 29 will be shown. With the second command, you can further limit the displayed rows to rows with an S/N larger than 10 (a range between 10 to infinity). You can further sort the output rows, only show the top (or bottom) N rows, etc., see Table for more.

$ asttable cat/xdf-f160w.fits -hCLUMPS --range=MAGNITUDE,28:29
$ asttable cat/xdf-f160w.fits -hCLUMPS \
           --range=MAGNITUDE,28:29 --range=SN,10:inf

Now that you are comfortable in viewing table columns and rows, let’s look into merging columns of multiple tables into one table (which is necessary for measuring the color of the clumps). Since cat/xdf-f160w.fits and cat/xdf-f105w-on-f160w-lab.fits have exactly the same number of rows and the rows correspond to the same clump, let’s merge them to have one table with magnitudes in both filters.

We can merge columns with the --catcolumnfile option like below. You give this option a file name (which is assumed to be a table that has the same number of rows as the main input), and all the table’s columns will be concatenated/appended to the main table. Now, try it out with the commands below. We will first look at the metadata of the first table (only the CLUMPS extension). With the second command, we will concatenate the two tables and write them into two-in-one.fits, and finally, we will check the new catalog’s metadata.

$ asttable cat/xdf-f160w.fits -i -hCLUMPS
$ asttable cat/xdf-f160w.fits -hCLUMPS --output=two-in-one.fits \
           --catcolumnfile=cat/xdf-f125w-on-f160w-lab.fits \
           --catcolumnhdu=CLUMPS
$ asttable two-in-one.fits -i

By comparing the two metadata, we see that both tables have the same number of rows. But what might have attracted your attention more, is that two-in-one.fits has double the number of columns (as expected, after all, you merged both tables into one file, and did not ask for any specific column). In fact you can concatenate any number of other tables in one command, for example:

$ asttable cat/xdf-f160w.fits -hCLUMPS --output=three-in-one.fits \
           --catcolumnfile=cat/xdf-f125w-on-f160w-lab.fits \
           --catcolumnfile=cat/xdf-f105w-on-f160w-lab.fits \
           --catcolumnhdu=CLUMPS --catcolumnhdu=CLUMPS
$ asttable three-in-one.fits -i

As you see, to avoid confusion in column names, Table has intentionally appended a -1 to the column names of the first concatenated table when the column names are already present in the original table. For example, we have the original RA column, and another one called RA-1. Similarly a -2 has been added for the columns of the second concatenated table.

However, this example clearly shows a problem with this full concatenation: some columns are identical (for example, HOST_OBJ_ID and HOST_OBJ_ID-1), or not needed (for example, RA-1 and DEC-1 which are not necessary here). In such cases, you can use --catcolumns to only concatenate certain columns, not the whole table. For example, this command:

$ asttable cat/xdf-f160w.fits -hCLUMPS --output=two-in-one-2.fits \
           --catcolumnfile=cat/xdf-f125w-on-f160w-lab.fits \
           --catcolumnhdu=CLUMPS --catcolumns=MAGNITUDE
$ asttable two-in-one-2.fits -i

You see that we have now only appended the MAGNITUDE column of cat/xdf-f125w-on-f160w-lab.fits. This is what we needed to be able to later subtract the magnitudes. Let’s go ahead and add the F105W magnitudes also with the command below. Note how we need to call --catcolumnhdu once for every table that should be appended, but we only call --catcolumns once (assuming all the tables that should be appended have this column).

$ asttable cat/xdf-f160w.fits -hCLUMPS --output=three-in-one-2.fits \
           --catcolumnfile=cat/xdf-f125w-on-f160w-lab.fits \
           --catcolumnfile=cat/xdf-f105w-on-f160w-lab.fits \
           --catcolumnhdu=CLUMPS --catcolumnhdu=CLUMPS \
           --catcolumns=MAGNITUDE
$ asttable three-in-one-2.fits -i

But we are not finished yet! There is a very big problem: it is not immediately clear which one of MAGNITUDE, MAGNITUDE-1 or MAGNITUDE-2 columns belong to which filter! Right now, you know this because you just ran this command. But in one hour, you’ll start doubting yourself and will be forced to go through your command history, trying to figure out if you added F105W first, or F125W. You should never torture your future-self (or your colleagues) like this! So, let’s rename these confusing columns in the matched catalog.

Fortunately, with the --colmetadata option, you can correct the column metadata of the final table (just before it is written). It takes four values: 1) the original column name or number, 2) the new column name, 3) the column unit and 4) the column comments. Since the comments are usually human-friendly sentences and contain space characters, you should put them in double quotations like below. For example, by adding three calls of this option to the previous command, we write the filter name in the magnitude column name and description.

$ asttable cat/xdf-f160w.fits -hCLUMPS --output=three-in-one-3.fits \
        --catcolumnfile=cat/xdf-f125w-on-f160w-lab.fits \
        --catcolumnfile=cat/xdf-f105w-on-f160w-lab.fits \
        --catcolumnhdu=CLUMPS --catcolumnhdu=CLUMPS \
        --catcolumns=MAGNITUDE \
        --colmetadata=MAGNITUDE,MAG-F160W,log,"Magnitude in F160W." \
        --colmetadata=MAGNITUDE-1,MAG-F125W,log,"Magnitude in F125W." \
        --colmetadata=MAGNITUDE-2,MAG-F105W,log,"Magnitude in F105W."
$ asttable three-in-one-3.fits -i

We now have all three magnitudes in one table and can start doing arithmetic on them (to estimate colors, which are just a subtraction of magnitudes). To use column arithmetic, simply call the column selection option (--column or -c), put the value in single quotations and start the value with arith (followed by a space) like the example below. Column arithmetic uses the same “reverse polish notation” as the Arithmetic program (see Reverse polish notation), with almost all the same operators (see Arithmetic operators), and some column-specific operators (that are not available for images). In column-arithmetic, you can identify columns by number (prefixed with a $) or name, for more see Column arithmetic.

So let’s estimate one color from three-in-one-3.fits using column arithmetic. All the commands below will produce the same output, try them each and focus on the differences. Note that column arithmetic can be mixed with other ways to choose output columns (the -c option).

$ asttable three-in-one-3.fits -ocolor-cat.fits \
           -c1,2,3,4,'arith $7 $5 -'

$ asttable three-in-one-3.fits -ocolor-cat.fits \
           -c1,2,RA,DEC,'arith MAG-F125W MAG-F160W -'

$ asttable three-in-one-3.fits -ocolor-cat.fits -c1,2 \
           -cRA,DEC --column='arith MAG-F125W MAG-F160W -'

This example again highlights the important point on using column names: if you do not know the commands beforehand, you have no way of making sense of the first command: what is in columns 5 and 7? Why not subtract columns 3 and 4 from each other? Do you see how cryptic the first one is? Then look at the last one: even if you have no idea how this table was created, you immediately understand the desired operation. When you have column names, please use them. If your table does not have column names, give them names with the --colmetadata option (described above) as you are creating them. But how about the metadata for the column you just created with column arithmetic? Have a look at the column metadata of the table produced above:

$ asttable color-cat.fits -i

The name of the column produced by column arithmetic is ARITH_1! This is natural: Arithmetic has no idea what the modified column is! You could have multiplied two columns, or done much more complex transformations with many columns. Metadata cannot be set automatically; your (the human) input is necessary. To add metadata, you can use --colmetadata like before:

$ asttable three-in-one-3.fits -ocolor-cat.fits -c1,2,RA,DEC \
         --column='arith MAG-F125W MAG-F160W -' \
         --colmetadata=ARITH_1,F125W-F160W,log,"Magnitude difference"
$ asttable color-cat.fits -i

Sometimes, because of a particular way of storing data, you might need to take all input columns. If there are many columns (for example hundreds!), listing them (like above) will become annoying, buggy and time-consuming. In such cases, you can give -c_all. Upon execution, _all will be replaced with a comma-separated list of all the input columns. This allows you to add new columns easily, without having to worry about the number of input columns that you want anyway. A lower-level but more customizable method is to use the seq (sequence) command with the -s (separator) option set to ','. For example, if you have 216 columns and only want to return columns 1 and 2 as well as all the columns between 12 to 58 (inclusive), you can use the command below:

$ asttable table.fits -c1,2,$(seq -s',' 12 58)

We are now ready to make our final table. We want it to have the magnitudes in all three filters, as well as the three possible colors. Recall that by convention in astronomy, a color is defined by subtracting the redder filter’s magnitude from the bluer filter’s magnitude (bluer minus redder). In this way a larger color value corresponds to a redder object. So from the three magnitudes, we can produce three colors (as shown below). Also, because this is the final table we are creating here and want to use it later, we will store it in cat/, give it a clear name and use the --range option to only print rows with a signal-to-noise ratio (SN column, from the F160W filter) above 5.

$ asttable three-in-one-3.fits --range=SN,5,inf -c1,2,RA,DEC,SN \
         -cMAG-F160W,MAG-F125W,MAG-F105W \
         -c'arith MAG-F125W MAG-F160W -' \
         -c'arith MAG-F105W MAG-F125W -' \
         -c'arith MAG-F105W MAG-F160W -' \
         --colmetadata=SN,SN-F160W,ratio,"F160W signal to noise ratio" \
         --colmetadata=ARITH_1,F125W-F160W,log,"Color F125W-F160W." \
         --colmetadata=ARITH_2,F105W-F125W,log,"Color F105W-F125W." \
         --colmetadata=ARITH_3,F105W-F160W,log,"Color F105W-F160W." \
         --output=cat/mags-with-color.fits
$ asttable cat/mags-with-color.fits -i

The table now has all the columns we need and it has the proper metadata to let us safely use it later (without frustrating over column orders!) or passing it to colleagues.

Let’s finish this section of the tutorial with a useful tip on modifying column metadata. Above, updating/changing column metadata was done with the --colmetadata option in the same command that produced the newly created Table file. But in many situations, the table is already made and you just want to update the metadata of one column. In such cases using --colmetadata is over-kill (wasting CPU, RAM and time if the table is large) because it will load the full table data and metadata into memory, just to change the metadata and write it back into a file.

In scenarios when the table’s data does not need to be changed and you just want to set or update the metadata, it is much more efficient to use basic FITS keyword editing. For example, in the FITS standard, column names are stored in the TTYPE header keywords, so let’s have a look:

$ asttable two-in-one.fits -i
$ astfits two-in-one.fits -h1 | grep TTYPE

Changing/updating the column names is as easy as updating the values to these keywords. You do not need to touch the actual data! With the command below, we will just update the MAGNITUDE and MAGNITUDE-1 columns (which are respectively stored in the TTYPE5 and TTYPE11 keywords) by modifying the keyword values and checking the effect by listing the column metadata again:

$ astfits two-in-one.fits -h1 \
          --update=TTYPE5,MAG-F160W \
          --update=TTYPE11,MAG-F125W
$ asttable two-in-one.fits -i

You can see that the column names have indeed been changed without touching any of the data. You can do the same for the column units or comments by modifying the keywords starting with TUNIT or TCOMM.
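
For example, the command below would set the unit of the fifth column (this particular value is only an illustration, the unit may well already be set correctly), and the second command confirms the change:

$ astfits two-in-one.fits -h1 --update=TUNIT5,mag
$ asttable two-in-one.fits -i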

Generally, Gnuastro’s Table is a very useful program in data analysis and what you have seen so far is just the tip of the iceberg. But to avoid making the tutorial even longer, we will stop reviewing the features here; for more, please see Table. Before continuing, let’s just delete all the temporary FITS tables we placed in the top project directory:

$ rm *.fits

2.1.16 Column statistics (color-magnitude diagram)

In Working with catalogs (estimating colors) we created a single catalog containing the magnitudes of our desired clumps in all three filters, and their colors. To start with, let’s inspect the distribution of three colors with the Statistics program.

$ aststatistics cat/mags-with-color.fits -cF105W-F125W
$ aststatistics cat/mags-with-color.fits -cF105W-F160W
$ aststatistics cat/mags-with-color.fits -cF125W-F160W

This tiny and cute ASCII histogram (and the general information printed above it) gives you a crude (but very useful and fast) feeling for the distribution. You can later use Gnuastro’s Statistics program with the --histogram option to build a much more fine-grained histogram as a table to feed into your favorite plotting program for a more accurate/appealing plot (for example, with PGFPlots in LaTeX). If you just want a specific measure, for example, the mean, median and standard deviation, you can ask for them specifically, like below:

$ aststatistics cat/mags-with-color.fits -cF105W-F160W \
                --mean --median --std
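
As an example of the --histogram option mentioned above, a command like the one below should save a fine-grained histogram of the F105W-F160W color as a table (the number of bins and the output name are only illustrative choices):

$ aststatistics cat/mags-with-color.fits -cF105W-F160W \
                --histogram --numbins=50 \
                --output=hist-f105w-f160w.txt
$ cat hist-f105w-f160w.txt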

The basic statistics we measured above were just on one column. In many scenarios this is fine, but things get much more exciting if you look at the correlation of two columns with each other. For example, let’s create the color-magnitude diagram for our measured targets.

In many papers, the color-magnitude diagram is usually plotted as a scatter plot. However, scatter plots have a major limitation when there are a lot of points and they cluster together in one region of the plot: the possible correlation in that dense region is lost (because the points fall over each other). In such cases, it is much better to use a 2D histogram. In a 2D histogram, the full range in both columns is divided into discrete 2D bins (or pixels!) and we count how many objects fall in that 2D bin.

Since a 2D histogram is a pixelated space, we can simply save it as a FITS image and view it in a FITS viewer. Let’s do this in the command below. As is common with color-magnitude plots, we will put the redder magnitude on the horizontal axis and the color on the vertical axis. We will set both dimensions to have 100 bins (with --numbins for the horizontal and --numbins2 for the vertical). Also, to avoid strong outliers in any of the dimensions, we will manually set the range of each dimension with the --greaterequal, --greaterequal2, --lessthan and --lessthan2 options.

$ aststatistics cat/mags-with-color.fits -cMAG-F160W,F105W-F160W \
                --histogram2d=image --manualbinrange \
                --numbins=100  --greaterequal=22  --lessthan=30 \
                --numbins2=100 --greaterequal2=-1 --lessthan2=3 \
                --output=cmd.fits

You can now open this FITS file as a normal FITS image, for example, with the command below. Try hovering/zooming over the pixels: not only will you see the number of objects in the catalog that fall in each bin/pixel, but you will also see the F160W magnitude and color of that pixel (in the same place you usually see RA and Dec when hovering over an astronomical image).

$ astscript-fits-view cmd.fits --ds9scale=minmax

Having a 2D histogram as a FITS image with WCS has many great advantages. For example, just like FITS images of the night sky, you can “match” many 2D histograms that were created independently. You can add two histograms with each other, or you can use advanced features of FITS viewers to find structure in the correlation of your columns.
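
As a sketch of the last point, if you had two such 2D histograms that were built over the same bins/ranges (here with the hypothetical names cmd-a.fits and cmd-b.fits), a command like the one below should add them with Gnuastro’s Arithmetic program:

$ astarithmetic cmd-a.fits cmd-b.fits + -g1 --output=cmd-sum.fits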

With the first command below, you can activate the grid feature of DS9 to actually see the coordinate grid, as well as values on each line. With the second command, DS9 will even read the labels of the axes and use them to generate an almost publication-ready plot.

$ astscript-fits-view cmd.fits --ds9scale=minmax --ds9extra="-grid yes"
$ astscript-fits-view cmd.fits --ds9scale=minmax \
           --ds9extra="-grid yes -grid type publication"

If you are happy with the grid, coloring and the rest, you can also use DS9 to save this as a JPEG image to directly use in your documents/slides with these extra DS9 options (DS9 will write the image to cmd-2d.jpeg and quit immediately afterwards):

$ astscript-fits-view cmd.fits --ds9scale=minmax \
           --ds9extra="-grid yes -grid type publication" \
           --ds9extra="-saveimage cmd-2d.jpeg -quit"

This is good for a fast progress update. But for your paper or a more official report, you want to show something of higher quality. For that, you can use the PGFPlots package in LaTeX to add axes in the same font as your text, sharp grids and many other elegant/powerful features (like over-plotting interesting points and lines). But to load the 2D histogram into PGFPlots, you first need to convert the FITS image into a more standard format, for example, PDF. We will use Gnuastro’s ConvertType for this, with the sls-inverse color map (which will map the pixels with a value of zero to white):

$ astconvertt cmd.fits --colormap=sls-inverse --borderwidth=0 -ocmd.pdf

Open the resulting cmd.pdf and have a look. Below you can see a minimal working example of how to add axis numbers, labels and a grid to the PDF generated above. First, let’s create a new report directory to keep the LaTeX outputs, then put the minimal report’s source in a file called report.tex. Notice the xmin, xmax, ymin and ymax values and how they are the same as the ranges specified above.

$ mkdir report-cmd
$ mv cmd.pdf report-cmd/
$ cat report-cmd/report.tex
\documentclass{article}
\usepackage{pgfplots}
\dimendef\prevdepth=0
\begin{document}

You can write all you want here...

\begin{tikzpicture}
  \begin{axis}[
      enlargelimits=false,
      grid,
      axis on top,
      width=\linewidth,
      height=\linewidth,
      xlabel={Magnitude (F160W)},
      ylabel={Color (F105W-F160W)}]

    \addplot graphics[xmin=22, xmax=30, ymin=-1, ymax=3] {cmd.pdf};
  \end{axis}
\end{tikzpicture}
\end{document}

Run these commands to build your PDF (assuming you have LaTeX and PGFPlots):

$ cd report-cmd
$ pdflatex report.tex

Open the newly created report.pdf and enjoy the exquisite quality. The improved quality, blending in with the text, vector-graphics resolution and other features make this plot pleasing to the eye, and let your readers focus on the main point of your scientific argument. PGFPlots can also build the PDF of the plot separately from the rest of the paper/report; see 2D histogram as a table for plotting for the necessary changes in the preamble.

We will not go much deeper into the Statistics program here, but there is so much more you can do with it. After finishing the tutorial, see Statistics.


2.1.17 Aperture photometry

The colors we calculated in Working with catalogs (estimating colors) used a different segmentation map for each object. This might not satisfy some science cases that need the flux within a fixed area/aperture. Fortunately, Gnuastro’s modular programs make it very easy to do this type of measurement (photometry). To do this, we can ignore the labeled images of NoiseChisel or Segment and just build our own labeled image! That labeled image can then be given to MakeCatalog.

To build the aperture labels over the image we will use Gnuastro’s MakeProfiles (see MakeProfiles). But first we need a list of positions (aperture photometry needs a-priori knowledge of your target positions). So we will first read the clump positions from the F160W catalog, then use AWK to set the other parameters of each profile to be a fixed circle of radius 5 pixels (recall that we want all apertures to have an identical size/area in this scenario).

$ rm *.fits *.txt
$ asttable cat/xdf-f160w.fits -hCLUMPS -cRA,DEC \
           | awk '!/^#/{print NR, $1, $2, 5, 5, 0, 0, 1, NR, 1}' \
           > apertures.txt
$ cat apertures.txt

We can now feed this catalog into MakeProfiles using the command below to build the apertures over the image. The most important option for this particular job is --mforflatpix, it tells MakeProfiles that the values in the magnitude column should be used for each pixel of a flat profile. Without it, MakeProfiles would build the profiles such that the sum of the pixels of each profile would have a magnitude (in log-scale) of the value given in that column (what you would expect when simulating a galaxy for example). See Invoking MakeProfiles for details on the options.

$ astmkprof apertures.txt --background=flat-ir/xdf-f160w.fits \
            --clearcanvas --replace --type=int16 --mforflatpix \
            --mode=wcs --output=apertures.fits

Open apertures.fits with a FITS image viewer (like SAO DS9) and look around at the circles placed over the targets. Also open the input image and Segment’s clumps image and compare them with the positions of these circles. Where the apertures overlap, you will notice that one label has replaced the other (because of the --replace option). In the future, MakeCatalog will be able to work with overlapping labels, but currently it does not. If you are interested, please join us in completing Gnuastro with added improvements like this (see task 14750).
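
For example, a command like the one below should show the aperture labels next to the input image and Segment’s output in one call (assuming Segment’s output from the earlier sections is still in seg/xdf-f160w.fits):

$ astscript-fits-view apertures.fits flat-ir/xdf-f160w.fits \
                      seg/xdf-f160w.fits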

We can now feed the apertures.fits labeled image into MakeCatalog instead of Segment’s output, as shown below. In comparison with the previous MakeCatalog call, you will notice that there is no longer a --clumpscat option, since there is no separate “clump” image any more: each aperture is treated as a separate “object”.

$ astmkcatalog apertures.fits -h1 --zeropoint=26.27 \
               --valuesfile=nc/xdf-f105w.fits \
               --ids --ra --dec --magnitude --sn \
               --output=cat/xdf-f105w-aper.fits

This catalog has the same number of rows as the catalog produced from clumps in Working with catalogs (estimating colors). Therefore, similar to how we found colors there, you can compare the aperture and clump magnitudes, for example.

You can also change the filter name and zero point magnitude and run this command again to get the fixed-aperture magnitudes in the F160W filter and measure colors on apertures.
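
For example, something like the command below should give the F160W aperture catalog (only the values file, zero point and output name differ from the command above; 25.94 is the F160W zero point that we will also use later):

$ astmkcatalog apertures.fits -h1 --zeropoint=25.94 \
               --valuesfile=nc/xdf-f160w.fits \
               --ids --ra --dec --magnitude --sn \
               --output=cat/xdf-f160w-aper.fits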


2.1.18 Matching catalogs

In the example above, we had the luxury of generating the catalogs ourselves, and were thus able to generate them in a way that the rows match. But this is not generally the case. In many situations, you need to use catalogs from many different telescopes, or catalogs with high-level calculations that you cannot simply regenerate with the same pixels without spending a lot of time or using heavy computation. In such cases, when each catalog has the coordinates of its own objects, you can use the coordinates to match the rows with Gnuastro’s Match program (see Match).

As the name suggests, Gnuastro’s Match program will match rows based on distance (or aperture in 2D) in one, two, or three columns. For this tutorial, let’s try matching the two catalogs that were not created from the same labeled images, recall how each has a different number of rows:

$ asttable cat/xdf-f105w.fits -hCLUMPS -i
$ asttable cat/xdf-f160w.fits -hCLUMPS -i

You give Match two catalogs (from the two different filters we derived above) as arguments, and the HDUs containing them (if they are FITS files) with the --hdu and --hdu2 options. The --ccol1 and --ccol2 options specify which coordinate columns in the two catalogs should be matched with each other. With --aperture you specify the acceptable error (radius in 2D), in the same units as the columns.

$ astmatch cat/xdf-f160w.fits           cat/xdf-f105w.fits \
           --hdu=CLUMPS                 --hdu2=CLUMPS \
           --ccol1=RA,DEC               --ccol2=RA,DEC \
           --aperture=0.5/3600 \
           --output=matched.fits
$ astfits matched.fits

From the second command, you see that the output has two extensions and that both have the same number of rows. The rows in each extension are the matched rows of the respective input table: those in the first HDU come from the first input and those in the second HDU come from the second. However, their order may be different from the input tables because the rows now match: the first row in the first HDU matches the first row in the second HDU, and so on. You can also see which objects did not match with the --notmatched option, like below. Note how each extension of the output now has a different number of rows.

$ astmatch cat/xdf-f160w.fits           cat/xdf-f105w.fits \
           --hdu=CLUMPS                 --hdu2=CLUMPS \
           --ccol1=RA,DEC               --ccol2=RA,DEC \
           --aperture=0.5/3600 \
           --output=not-matched.fits    --notmatched
$ astfits not-matched.fits

The --outcols option of Match is a very convenient feature: you can use it to specify which columns from the two catalogs you want in the output (merging the two input catalogs into one). If the first character is an ‘a’, the respective matched column (number or name, similar to Table above) of the first catalog will be written in the output table. When the first character is a ‘b’, the respective column from the second catalog will be written in the output. Also, if the first character is followed by _all, then all the columns of the respective catalog will be put in the output.

$ astmatch cat/xdf-f160w.fits           cat/xdf-f105w.fits \
           --hdu=CLUMPS                 --hdu2=CLUMPS \
           --ccol1=RA,DEC               --ccol2=RA,DEC \
           --aperture=0.35/3600 \
           --outcols=a_all,bMAGNITUDE,bSN \
           --output=matched.fits
$ astfits matched.fits

2.1.19 Reddest clumps, cutouts and parallelization

As a final step, let’s go back to the original clumps-based color measurement we generated in Working with catalogs (estimating colors). We will find the objects with the strongest color and make a cutout to inspect them visually and finally, we will see how they are located on the image. With the command below, we will select the reddest objects (those with a color larger than 1.5):

$ asttable cat/mags-with-color.fits --range=F105W-F160W,1.5,inf

You can see how many there are by piping the output into wc -l:

$ asttable cat/mags-with-color.fits --range=F105W-F160W,1.5,inf | wc -l

Let’s crop the F160W image around each of these objects, but we first need a unique identifier for them. We will define this identifier using the object and clump labels (with an underscore between them) and feed the output of the command above into AWK to generate a catalog. Note that since we are making a plain-text table, we have to manually define the necessary metadata (for the string-type first column; see Gnuastro text table format).

$ echo "# Column 1: ID [name, str10] Object ID" > cat/reddest.txt
$ asttable cat/mags-with-color.fits --range=F105W-F160W,1.5,inf \
           | awk '{printf("%d_%-10d %f %f\n", $1, $2, $3, $4)}' \
           >> cat/reddest.txt

Let’s see how these objects are positioned over the dataset. DS9 has the “Regions” concept for this purpose, and you can easily build such regions from a table using Gnuastro’s astscript-ds9-region installed script, with the command below:

$ astscript-ds9-region cat/reddest.txt -c2,3 --mode=wcs \
           --command="ds9 flat-ir/xdf-f160w.fits -zscale"

We can now feed cat/reddest.txt into Gnuastro’s Crop program to get separate postage stamps for each object. To keep things clean, we will make a directory called crop-red and ask Crop to save the crops in this directory. We will also add a -f160w.fits suffix to the crops (to remind us which filter they came from). The width of the crops will be 15 arc-seconds (or 15/3600 degrees, which is the units of the WCS).

$ mkdir crop-red
$ astcrop flat-ir/xdf-f160w.fits --mode=wcs --namecol=ID \
          --catalog=cat/reddest.txt --width=15/3600,15/3600  \
          --suffix=-f160w.fits --output=crop-red

Like the MakeProfiles command in Aperture photometry, if you look at the order in which the crops are created, you will notice that they are not made in order! This is because each crop is independent of the rest, therefore crops are done in parallel, and parallel operations are asynchronous. So the order can differ in each run, but the final output is the same! In the command above, you can change f160w to f105w to make the crops in both filters. You can see all the cropped FITS files in the crop-red directory with this command:

$ astscript-fits-view crop-red/*.fits

To view the crops more easily (not having to open ds9 for each image), you can convert the FITS crops into the JPEG format with a shell loop like below.

$ cd crop-red
$ for f in *.fits; do \
    astconvertt $f --fluxlow=-0.001 --fluxhigh=0.005 --invert -ojpg; \
  done
$ cd ..
$ ls crop-red/

You can now use your general graphic user interface image viewer to flip through the images more easily, or import them into your papers/reports.

The for loop above to convert the images will do the job in series: each file is converted only after the previous one is complete. But like the crops, each JPEG image is independent, so let’s parallelize it. In other words, we want to run more than one instance of the command at any moment. To do that, we will use Make. Make is a wonderful pipeline management system, and the most common and powerful implementation is GNU Make, which has a complete manual just like this one. We cannot go into the details of Make here; for a hands-on video tutorial, see this video introduction. To do the process above in Make, please copy the contents below into a plain-text file called Makefile. Just replace the __[TAB]__ part at the start of the line with a single press of the ‘TAB’ key on your keyboard.

jpgs=$(subst .fits,.jpg,$(wildcard *.fits))
all: $(jpgs)
$(jpgs): %.jpg: %.fits
__[TAB]__astconvertt $< --fluxlow=-0.001 --fluxhigh=0.005 \
__[TAB]__            --invert -o$@

Now that the Makefile is ready, you can run Make on 12 threads using the commands below. Feel free to replace the 12 with any number of threads you have on your system (you can find out by running the nproc command on GNU/Linux operating systems):

$ make -j12

Did you notice how much faster this one was? When possible, it is always very helpful to do your analysis in parallel. You can build very complex workflows with Make (for example, see Akhlaghi 2021), so it is worth spending some time to master it.


2.1.20 FITS images in a publication

In the previous section (Reddest clumps, cutouts and parallelization), we visually inspected the positions of the reddest objects using DS9. That is very good for an interactive inspection of the objects: you can zoom-in and out, you can do measurements, etc. Once the experimentation phase of your project is complete, you want to show these objects over the whole image in a report, paper or slides.

One solution is to use DS9 itself! For example, run the astscript-fits-view command of the previous section to open DS9 with the regions over-plotted. Click on the “File” menu and select “Save Image”. In the side-menu that opens, you have multiple formats to select from. Usually for publications, we want to show the regions and text (in the colorbar) as vector graphics, so it is best to export to EPS. Once you have made the EPS, you can then convert it to PDF with the epspdf command.
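
For example, assuming you saved the exported file as ds9.eps, the conversion would be something like this:

$ epspdf ds9.eps ds9.pdf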

Another solution is to use Gnuastro’s ConvertType program. The main difference is that DS9 is a Graphic User Interface (GUI) program, so it takes relatively long (about a second) to load, and it requires many dependencies. This will slow down automatic conversion of many files, and will make your code hard to move to another operating system. DS9 does have a command-line interface that you can use to automate the creation of each file; however, its command-line interface and formats (like the “region” files) are very peculiar. ConvertType, on the other hand, has no graphic interface, so it has very few dependencies, it is fast, and it takes normal tables (in plain-text or FITS) as input. So in this concluding step of the analysis, let’s build a nice publication-ready plot, showing the positions of the reddest objects in the image for our paper.

In Reddest clumps, cutouts and parallelization, we already used ConvertType to make JPEG postage stamps. Here, we will use it to make a PDF image of the whole deep region. To start, let’s simply run ConvertType on the F160W image:

$ astconvertt flat-ir/xdf-f160w.fits -oxdf.pdf

Open the output in a PDF viewer. You see that it is almost fully black! Let’s see why this happens! First, with the two commands below, let’s calculate the maximum value, and the standard deviation of the sky in this image (using NoiseChisel’s output, which we found at the end of NoiseChisel optimization for detection). Note that NoiseChisel writes the median sky standard deviation before interpolation in the MEDSTD keyword of the SKY_STD HDU. This is more robust than the median of the Sky standard deviation image (which has gone through interpolation).

$ max=$(aststatistics nc/xdf-f160w.fits -hINPUT-NO-SKY --maximum)
$ skystd=$(astfits nc/xdf-f160w.fits -hSKY_STD --keyvalue=MEDSTD -q)

$ echo $max $skystd
58.8292 0.000410282

$ echo $max $skystd | awk '{print $1/$2}'
143387

In the last command above, we divided the maximum by the sky standard deviation. You see that the maximum value is more than \(140000\) times larger than the noise level! On the other hand, common monitors or printers usually have a maximum dynamic range of 8 bits, only allowing for \(2^8=256\) layers. This is therefore the maximum number of “layers” you can have in common display formats like JPEG, PDF or PNG! Dividing the result above by 256, we get a layer spacing of

$ echo $max $skystd | awk '{print $1/$2/256}'
560.106

In other words, the first layer (which is black) will contain all the pixel values below \(\sim560\)! So all pixels with a signal-to-noise ratio lower than \(\sim560\) will have a black color since they fall in the first layer of an 8-bit PDF (or JPEG) image. This happens because by default we are assuming a linear mapping from floating point to 8-bit integers.

To fix this, we should move to a different mapping. A good, physically motivated, mapping is surface brightness (which is in log-scale, see Brightness, Flux, Magnitude and Surface brightness). Fortunately this is very easy to do with Gnuastro’s Arithmetic program, as shown in the commands below (using the known zero point, and after calculating the pixel area in units of arcsec\(^2\)):

$ zeropoint=25.94
$ pixarcsec2=$(astfits nc/xdf-f160w.fits --pixelareaarcsec2)
$ astarithmetic nc/xdf-f160w.fits $zeropoint $pixarcsec2 counts-to-sb \
                --output=xdf-f160w-sb.fits

With the two commands below, first, let’s look at the dynamic range of the image now (dividing the maximum by the minimum), and then let’s open the image and have a look at it:

$ aststatistics xdf-f160w-sb.fits --minimum --maximum
$ astscript-fits-view xdf-f160w-sb.fits

The good news is that the dynamic range has now decreased to about 2! In other words, we can distribute the 256 layers of an 8-bit display over a much smaller range of values, and therefore better visualize the data. However, there are two important points to consider from the output of the first command and a visual inspection of the second.

  • The largest pixel value (faintest surface brightness level) in the image is \(\sim43\)! This is far too faint to be real, and is just due to noise. As discussed in Measuring the dataset limits, the \(3\sigma\) surface brightness limit of this image, over 100 arcsec\(^2\), is roughly 32.66 mag/arcsec\(^2\).
  • You see many NaN pixels in between the galaxies! These are due to the fact that the magnitude is defined on a logarithmic scale and the logarithm of a negative number is not defined.

In other words, we should replace all NaN pixels, and pixels with a surface brightness fainter than the image surface brightness limit, with this limit. With the first command below, we will extract the surface brightness limit from the catalog headers that we calculated before, and then call Arithmetic to use this limit.

$ sblimit=$(astfits cat/xdf-f160w.fits --keyvalue=SBLMAG -q)
$ astarithmetic nc/xdf-f160w.fits $zeropoint $pixarcsec2 \
                counts-to-sb set-sb \
                sb sb $sblimit gt sb isblank or $sblimit where \
                --output=xdf-f160w-sb.fits

Let’s convert this image into a PDF with the command below:

$ astconvertt xdf-f160w-sb.fits --output=xdf-f160w-sb.pdf

It is much better now and we can visualize many features of the FITS file (from the central structures of the galaxies and stars, to a little into the noise and the low surface brightness features). However, the image generally looks a little too gray! This is because of that bright star in the bottom half of the image! Stars are very sharp! So let’s manually tell ConvertType to set any pixel with a value less than (brighter than) 20 to black (and not use the minimum). We do this with the --fluxlow option:

$ astconvertt xdf-f160w-sb.fits --output=xdf-f160w-sb.pdf --fluxlow=20

We are still missing some of the diffuse flux in this PDF. This is because of those negative pixels that were set to NaN. To better show these structures, we should warp the image to larger pixels. So let’s warp it to a pixel grid where the new pixels are \(4\times4\) larger than the original pixels. But be careful that warping should be done on the original image, not on the surface brightness image; we should re-calculate the surface brightness image after the warping is done. This is because \(\log(a+b)\ne\log(a)+\log(b)\): surface brightness calculation involves a logarithm, while warping involves addition of pixel values.

$ astwarp nc/xdf-f160w.fits --scale=1/4 --centeroncorner \
          --output=xdf-f160w-warped.fits

$ pixarcsec2=$(astfits xdf-f160w-warped.fits --pixelareaarcsec2)

$ astarithmetic xdf-f160w-warped.fits $zeropoint $pixarcsec2 \
                counts-to-sb set-sb \
                sb sb $sblimit gt sb isblank or $sblimit where \
                --output=xdf-f160w-sb.fits

$ astconvertt xdf-f160w-sb.fits --output=xdf-f160w-sb.pdf --fluxlow=20

Above, we needed to re-calculate the pixel area of the warped image, but we did not need to re-calculate the surface brightness limit! The reason is that the surface brightness limit is independent of the pixel area (in its derivation, the pixel area has been accounted for). As a side-effect of the warping, the number of pixels in the image has also dramatically decreased, therefore the volume of the output PDF (in bytes) is also smaller, making your paper/report easier to upload/download or send by email. This visual resolution is still more than enough for including on top of a column in your paper!

I do not have the zero point of my image: The absolute value of the zero point is irrelevant for the finally produced PDF. We used it here because it was available and makes the numbers physically understandable. If you do not have the zero point, just set it to zero (which is also the default zero point used by MakeCatalog when it estimates the surface brightness limit). For the value of --fluxlow above, you can simply subtract \(\sim10\) from the surface brightness limit.
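
As a small sketch, assuming the shell variable sblimit already holds your surface brightness limit (however you estimated it), an AWK one-liner like the one below would give the corresponding --fluxlow value:

$ fluxlow=$(echo $sblimit | awk '{print $1-10}')
$ astconvertt sb.fits --output=sb.pdf --fluxlow=$fluxlow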

To summarize, and to keep the image for the next section in a separate directory, here are the necessary commands:

$ zeropoint=25.94
$ mkdir report-image
$ cd report-image
$ sblimit=$(astfits cat/xdf-f160w.fits --keyvalue=SBLMAG -q)
$ astwarp nc/xdf-f160w.fits --scale=1/4 --centeroncorner \
          --output=warped.fits
$ pixarcsec2=$(astfits warped.fits --pixelareaarcsec2)
$ astarithmetic warped.fits $zeropoint $pixarcsec2 \
                counts-to-sb set-sb \
                sb sb $sblimit gt sb isblank or $sblimit where \
                --output=sb.fits
$ astconvertt sb.fits --output=sb.pdf --fluxlow=20

Finally, let’s remove all the temporary files we built in the top-level tutorial directory:

$ rm *.fits *.pdf

Color images: In this tutorial we just used one of the filters and showed the surface brightness image of that single filter as a grayscale image. But the image can also be in color (using three filters) to better convey the physical properties of the objects in your image. To create an image that shows the full dynamic range of your data, see this dedicated tutorial Color images with full dynamic range.


2.1.21 Marking objects for publication

In FITS images in a publication we created a ready-to-print visualization of the FITS image used in this tutorial. However, you rarely want to show a naked image like that! You usually want to highlight some objects (that are the target of your science) over the image and show different marks for the various types of objects you are studying. In this tutorial, we will do just that: select a sub-set of the full catalog of clumps, and show them with different mark shapes and colors, while also adding some text under each mark. To add coordinates on the edges of the figure in your paper, see Annotations for figure in paper.

To start with, let’s put a red plus sign over the sub-sample of reddest clumps similar to Reddest clumps, cutouts and parallelization. First, we will need to make the table of marks. We will choose those with a color stronger than 1.5 magnitudes and a signal-to-noise ratio (in F160W) larger than 5. We also only need the RA, Dec, color and magnitude (in F160W) columns (recall that at the end of the previous section we were already in the report-image/ directory):

$ asttable cat/mags-with-color.fits --range=F105W-F160W,1.5:inf \
           --range=SN-F160W,5:inf -cRA,DEC,MAG-F160W,F105W-F160W \
           --output=reddest-cat.fits

Gnuastro’s ConvertType program also has features to add marks over the finally produced PDF. Below, we will start with the same astconvertt command of the previous section. The positions of the marks should be given as a table to the --marks option. Two other options are also mandatory: --markcoords identifies the columns that contain the coordinates of each mark and --mode specifies if the coordinates are in image or WCS coordinates.

$ astconvertt sb.fits --output=reddest.pdf --fluxlow=20 \
              --marks=reddest-cat.fits --mode=wcs \
              --markcoords=RA,DEC

Open the output reddest.pdf and see the result. You will see relatively thick red circles placed over the given coordinates. In your PDF browser, zoom-in to one of the regions, you will see that while the pixels of the background image become larger, the lines of these regions do not degrade! This is the concept/power of Vector Graphics: ideal for publication! For more on raster (pixelated) and vector (infinite-resolution) graphics, see Raster and Vector graphics.

We had planned to put a plus sign on each object. However, because we did not explicitly ask for a certain shape, ConvertType put a circle. Each mark can have its own separate shape. Shapes can be given by a name or a code. The full list of available shape names and codes is given in the description of the --markshape option of Drawing with vector graphics.

To use a different shape, we need to add a new column to the base table, containing the identifier of the desired shape for each mark. For example, the code for the plus sign is 2. With the commands below, we will add a new column with this fixed value. With the first AWK command we will make a single-column file, where all the rows have the same value. We pipe our base table into AWK, so it has the same number of rows. With the second command, we concatenate (or append) the new column with Table, and give this new column the name SHAPE (to easily refer to it later and not have to count). With the third command, we clean up behind ourselves (deleting the extra params.txt file). Finally, we use the --markshape option to tell ConvertType which column to use for the shape identifier.

$ asttable reddest-cat.fits | awk '{print 2}' > params.txt

$ asttable reddest-cat.fits --catcolumnfile=params.txt \
           --colmetadata=5,SHAPE,id,"Shape of mark" \
           --output=reddest-marks.fits
$ rm params.txt

$ astconvertt sb.fits --output=reddest.pdf --fluxlow=20 \
              --marks=reddest-marks.fits --mode=wcs \
              --markcoords=RA,DEC --markshape=SHAPE

Open the PDF and have a look! You do see red signs over the coordinates, but the thick plus-signs only become visible after you zoom in multiple times! To make them larger, you can give another column to specify the size of each mark. Let’s set the full width of the plus sign to extend 3 arcseconds. The commands are similar to the above; try to spot the differences (in particular, how we use --sizeinarcsec).

$ asttable reddest-cat.fits | awk '{print 2, 3}' > params.txt

$ asttable reddest-cat.fits --catcolumnfile=params.txt \
           --colmetadata=5,SHAPE,id,"Shape of mark" \
           --colmetadata=6,SIZE,arcsec,"Size in arcseconds" \
           --output=reddest-marks.fits
$ rm params.txt

$ astconvertt sb.fits --output=reddest.pdf --fluxlow=20 \
              --marks=reddest-marks.fits --mode=wcs \
              --markcoords=RA,DEC --markshape=SHAPE \
              --marksize=SIZE --sizeinarcsec

The power of this methodology is that each mark can be completely different! For example, let’s show the objects with a color less than 2 magnitudes with a circle, and those with a stronger color with a plus (recall that the code for a circle was 1 and that of a plus was 2). You only need to replace the first command above with the one below. Afterwards, run the rest of the commands in the last code-block.

$ asttable reddest-cat.fits -cF105W-F160W \
           | awk '{if($1<2) shape=1; else shape=2; print shape, 3}' \
           > params.txt

Have a look at the resulting reddest.pdf. You see that the circles are much larger than the plus signs. This is because the “size” of a cross is defined to be its full width, but for a circle, the value in the size column is the radius. The way each shape interprets the value of the size column is fully described under --markshape of Drawing with vector graphics. To make them more comparable, let’s set the circle sizes to be half of the cross sizes.

$ asttable reddest-cat.fits -cF105W-F160W \
           | awk '{if($1<2) {shape=1; size=1.5} \
                   else     {shape=2; size=3} \
                   print shape, size}' \
           > params.txt

Let’s make things a little more complex (and show more information in the visualization) by using color. Gnuastro recognizes the full extended web colors, for their full list (containing names and codes) see Vector graphics colors. But like everything else, an even easier way to view and select the color for your figure is on the command-line! If your terminal supports 24-bit true-color, you can see all the colors by running this command (supported on modern GNU/Linux distributions):

$ astconvertt --listcolors

We will give a “sienna” color to the objects that are fainter than 29th magnitude and a “deeppink” color to the brighter ones (while keeping the same shape definitions as before). Since there are many colors, using their codes can make the table hard to read by a human! So let’s use the color names instead of the color codes in the example below (this is also useful in other columns that require strings, like the font name).

The only intricacy is in the making of params.txt. Recall that string columns need column metadata (Gnuastro text table format). In this particular case, since the string column is the last one, we can safely use AWK’s print command. But if you have multiple string columns, to be safe it is better to use AWK’s printf and explicitly specify the number of characters in the string columns.

$ asttable reddest-cat.fits -cF105W-F160W,MAG-F160W \
           | awk 'BEGIN{print "# Column 3: COLOR [name, str8]"}\
                  {if($1<2)  {shape=1; size=1.5} \
                   else      {shape=2; size=3} \
                   if($2>29) {color="sienna"} \
                   else      {color="deeppink"} \
                   print shape, size, color}' \
           > params.txt

$ asttable reddest-cat.fits --catcolumnfile=params.txt \
           --colmetadata=5,SHAPE,id,"Shape of mark" \
           --colmetadata=6,SIZE,arcsec,"Size in arcseconds" \
           --output=reddest-marks.fits
$ rm params.txt

$ astconvertt sb.fits --output=reddest.pdf --fluxlow=20 \
              --marks=reddest-marks.fits --mode=wcs \
              --markcoords=RA,DEC --markshape=SHAPE \
              --marksize=SIZE --sizeinarcsec --markcolor=COLOR

As one final example, let’s write the magnitude of each object under it. Since the magnitude is already in the marks.fits that we produced above, it is very easy to add it (just add --marktext option to ConvertType):

$ astconvertt sb.fits --output=reddest.pdf --fluxlow=20 \
              --marks=reddest-marks.fits --mode=wcs \
              --markcoords=RA,DEC --markshape=SHAPE \
              --marksize=SIZE --sizeinarcsec \
              --markcolor=COLOR --marktext=MAG-F160W

Open the final PDF (reddest.pdf) and you will see the magnitudes written under each mark in the same color. In the case of magnitudes (where the magnitude error is usually much larger than 0.01 magnitudes), four decimals are not too meaningful. By default, for printing floating point columns, we use the compiler’s default precision (which is about 4 digits for 32-bit floating point numbers). But you can override this (to only show two digits after the decimal point) with the --marktextprecision=2 option.
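
For example, the command below is the same as the previous one, with only --marktextprecision=2 added:

$ astconvertt sb.fits --output=reddest.pdf --fluxlow=20 \
              --marks=reddest-marks.fits --mode=wcs \
              --markcoords=RA,DEC --markshape=SHAPE \
              --marksize=SIZE --sizeinarcsec \
              --markcolor=COLOR --marktext=MAG-F160W \
              --marktextprecision=2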

You can customize the written text by specifying a different line-width (for the text, different from the main mark), or even specifying a different font for each mark! You can see the full list of available fonts for the text under a mark with the first command below and with the second, you can actually see them in a custom PDF (to only show the fonts).

$ astconvertt --listfonts
$ astconvertt --showfonts

As you see, there are many ways you can customize each mark! The above examples were just the tip of the iceberg! But this section has already become long, so we will stop here (see the box at the end of this section for yet another useful example). Like above, each feature of a mark can be controlled with a column in the table of mark information. Please see Drawing with vector graphics for the full list of columns/features that you can use.

Drawing ellipses: With the commands below, you can measure the elliptical properties of the objects and visualize them in a ready-to-publish PDF (we will only show the ellipses of the largest clumps):

$ astmkcatalog ../seg/xdf-f160w.fits --ra --dec --semi-major \
               --axis-ratio --position-angle --clumpscat \
               --output=ellipseinfo.fits
$ asttable ellipseinfo.fits -hCLUMPS | awk '{print 4}' > params.txt
$ asttable ellipseinfo.fits -hCLUMPS --catcolumnfile=params.txt \
           --range=SEMI_MAJOR,10,inf -oellipse-marks.fits \
           --colmetadata=6,SHAPE,id,"Shape of mark"
$ astconvertt sb.fits --output=ellipse.pdf --fluxlow=20 \
              --marks=ellipse-marks.fits --mode=wcs \
              --markcoords=RA,DEC --markshape=SHAPE \
              --marksize=SEMI_MAJOR,AXIS_RATIO --sizeinpix \
              --markrotate=POSITION_ANGLE

To conclude this section, let us highlight an important factor to consider in vector graphics. In ConvertType, things like line width or font size are defined in units of points. In vector graphics standards, 72 points correspond to one inch. Therefore, one way you can change these factors for all the objects is to assign a larger or smaller print size to the image. The print size is just a meta-data entry, and will not affect the file’s volume in bytes! You can do this with the --widthincm option. Try adding this option and giving it very different values like 5 or 30.
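
For example, a command like the one below would print the ellipse figure of the box above with a width of 30 centimeters (everything else is unchanged):

$ astconvertt sb.fits --output=ellipse.pdf --fluxlow=20 \
              --marks=ellipse-marks.fits --mode=wcs \
              --markcoords=RA,DEC --markshape=SHAPE \
              --marksize=SEMI_MAJOR,AXIS_RATIO --sizeinpix \
              --markrotate=POSITION_ANGLE --widthincm=30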


2.1.22 Writing scripts to automate the steps

In the previous sub-sections, we went through a series of steps like downloading the necessary datasets (in Setup and data download), detecting the objects in the image, and finally selecting a particular subset of them to inspect visually (in Reddest clumps, cutouts and parallelization). To benefit most effectively from this subsection, please go through the previous sub-sections, and if you have not actually done so, we recommend doing/running them before continuing here.

Each of the sub-sections above involved several commands on the command-line. Therefore, if you want to reproduce the previous results (for example, to only change one part, and see its effect), you will have to go through all the sections above and read through them again. If you have run the commands recently, you may also have them in the history of your shell (command-line environment). You can see many of your previous commands on the shell (even if you have closed the terminal) with the history command, like this:

$ history

Try it in your terminal to see for yourself. By default in GNU Bash, it shows the last 500 commands. You can also save this “history” of previous commands to a file using shell redirection (to have it after your next 500 commands), with this command:

$ history > my-previous-commands.txt

This is a good way to temporarily keep track of every single command you ran. But in the middle of all the useful commands, you will have many extra commands, like tests that you did before/after the good output of a step (that you decided to continue working on), or an unrelated job you had to do in the middle of this project. Because of these impurities, after a few days (when you have forgotten the context: tests you did not end up using, or unrelated jobs), reading this full history will be very frustrating.

Keeping the final commands that were used in each step of an analysis is a common problem for anyone who is doing something serious with the computer. But simply keeping the most important commands in a text file is not enough: the small steps in the middle (like making a directory to keep the outputs of one step) are also important. In other words, the only way you can be sure that you are in control of your processing (and actually understand how you produced your final result) is to run the commands automatically.

Fortunately, typing commands interactively with your fingers is not the only way to operate the shell. The shell can also take its orders/commands from a plain-text file, which is called a script. When given a script, the shell will read it line-by-line as if you have actually typed it manually.

Let’s continue with an example: try typing the commands below in your shell. With these commands we are making a text file (a.txt) containing a simple \(3\times3\) matrix, converting it to a FITS image and computing its basic statistics. After the first three commands, open a.txt with a text editor to actually see the values we wrote in it, and after the fourth, open the FITS file to see the matrix as an image. a.txt is created through the shell’s redirection feature: ‘>’ overwrites the existing contents of a file, and ‘>>’ appends the new contents after the old contents.

$ echo "1 1 1" > a.txt
$ echo "1 2 1" >> a.txt
$ echo "1 1 1" >> a.txt
$ astconvertt a.txt --output=a.fits
$ aststatistics a.fits

To automate these series of commands, you should put them in a text file. But that text file must have two special features: 1) It should tell the shell what program should interpret the script. 2) The operating system should know that the file can be directly executed.

For the first, Unix-like operating systems define the shebang concept (also known as sha-bang or hashbang). In the shebang convention, the first two characters of a file should be ‘#!’. When confronted with these characters, the script will be interpreted with the program that follows them. In this case, we want to write a shell script and the most common shell program is GNU Bash which is installed in /bin/bash. So the first line of your script should be ‘#!/bin/bash’.

It may happen (rarely) that GNU Bash is in another location on your system. In other cases, you may prefer to use a non-standard version of Bash installed in another location (that has higher priority in your PATH, see Installation directory). In such cases, you can use the ‘#!/usr/bin/env bash’ shebang instead. Through the env program, this shebang will look in your PATH and use the first bash it finds to run your script. But for simplicity in the rest of the tutorial, we will continue with the ‘#!/bin/bash’ shebang.

Using your favorite text editor, make a new empty file, let’s call it my-first-script.sh. Write the GNU Bash shebang (above) as its first line. After the shebang, copy the series of commands we ran above. Just note that the ‘$’ sign at the start of every line above is the prompt of the interactive shell (you never actually typed it, remember?). Therefore, commands in a shell script should not start with a ‘$’. Once you add the commands, close the text editor and run the cat command to confirm its contents. It should look like the example below. Recall that you should only type the line that starts with a ‘$’, the lines without a ‘$’, are printed automatically on the command-line (they are the contents of your script).

$ cat my-first-script.sh
#!/bin/bash
echo "1 1 1" > a.txt
echo "1 2 1" >> a.txt
echo "1 1 1" >> a.txt
astconvertt a.txt --output=a.fits
aststatistics a.fits

The script contents are now ready, but to run it, you should activate the script file’s executable flag. In Unix-like operating systems, every file has three types of flags: read (or r), write (or w) and execute (or x). To toggle a file’s flags, you should use the chmod (for “change mode”) command. To activate a flag, you put a ‘+’ before the flag character (for example, +x). To deactivate it, you put a ‘-’ (for example, -x). In this case, you want to activate the script’s executable flag, so you should run

$ chmod +x my-first-script.sh

Your script is now ready to run/execute the series of commands. To run it, you should call it while specifying its location in the file system. Since you are currently in the same directory as the script, it is easiest to use relative addressing like below (where ‘./’ means the current directory). But before running your script, first delete the two a.txt and a.fits files that were created when you interactively ran the commands.

$ rm a.txt a.fits
$ ls
$ ./my-first-script.sh
$ ls

The script immediately prints the statistics while doing all the previous steps in the background. With the last ls, you see that it automatically re-built the a.txt and a.fits files; open them and have a look at their contents.
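
For example, you can inspect them with the commands below (the second uses the same viewer script we have used throughout this tutorial):

$ cat a.txt
$ astscript-fits-view a.fits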

An extremely useful feature of shell scripts is that the shell will ignore anything after a ‘#’ character. You can thus add descriptions/comments to the commands and make them much more useful for the future. For example, after adding comments, your script might look like this:

$ cat my-first-script.sh
#!/bin/bash

# This script is my first attempt at learning to write shell scripts.
# As a simple series of commands, I am just building a small FITS
# image, and calculating its basic statistics.

# Write the matrix into a file.
echo "1 1 1" > a.txt
echo "1 2 1" >> a.txt
echo "1 1 1" >> a.txt

# Convert the matrix to a FITS image.
astconvertt a.txt --output=a.fits

# Calculate the statistics of the FITS image.
aststatistics a.fits

Is this not much easier to read now? Comments help provide human-friendly context to the raw commands. At the time you write a script, comments may seem like an extra effort that slows you down. But in one year, you will have forgotten almost everything about your script and you will appreciate the effort so much! Think of the comments as an email to your future self, and always put a well-written description of the context/purpose (most importantly, things that are not directly clear from reading the commands) in your scripts.

The example above was a very basic and mostly redundant series of commands, just to show the basic concepts behind scripts. You can put any (arbitrarily long and complex) series of commands in a script by following the two rules: 1) add a shebang, and 2) enable the executable flag. In fact, as you continue your own research projects, you will find that any time you are dealing with more than two or three commands, keeping them in a script (and modifying that script, and running it) is much easier, and more future-proof, than typing the commands directly on the command-line and relying on things like history. Some general tips that will come in handy when writing your scripts are given after the more realistic example below.

As a more realistic example, let’s have a look at a script that will do the steps of Setup and data download and Dataset inspection and cropping. In particular note how often we are using variables to avoid repeating fixed strings of characters (usually file/directory names). This greatly helps in scaling up your project, and avoiding hard-to-find bugs that are caused by typos in those fixed strings.

$ cat gnuastro-tutorial-1.sh
#!/bin/bash


# Download the input datasets
# ---------------------------
#
# The default file names have this format (where `FILTER' differs for
# each filter):
#   hlsp_xdf_hst_wfc3ir-60mas_hudf_FILTER_v1_sci.fits
# To make the script easier to read, a prefix and suffix variable are
# used to sandwich the filter name into one short line.
dldir=download
xdfsuffix=_v1_sci.fits
xdfprefix=hlsp_xdf_hst_wfc3ir-60mas_hudf_
xdfurl=http://archive.stsci.edu/pub/hlsp/xdf

# The file name and full URLs of the input data.
f105w_in=$xdfprefix"f105w"$xdfsuffix
f160w_in=$xdfprefix"f160w"$xdfsuffix
f105w_url=$xdfurl/$f105w_in
f160w_url=$xdfurl/$f160w_in

# Go into the download directory and download the images there,
# then come back up to the top running directory.
mkdir $dldir
cd $dldir
wget $f105w_url
wget $f160w_url
cd ..


# Only work on the deep region
# ----------------------------
#
# To help in readability, each vertice of the deep/flat field is stored
# as a separate variable. They are then merged into one variable to
# define the polygon.
flatdir=flat-ir
vertice1="53.187414,-27.779152"
vertice2="53.159507,-27.759633"
vertice3="53.134517,-27.787144"
vertice4="53.161906,-27.807208"
f105w_flat=$flatdir/xdf-f105w.fits
f160w_flat=$flatdir/xdf-f160w.fits
deep_polygon="$vertice1:$vertice2:$vertice3:$vertice4"

mkdir $flatdir
astcrop --mode=wcs -h0 --output=$f105w_flat \
        --polygon=$deep_polygon $dldir/$f105w_in
astcrop --mode=wcs -h0 --output=$f160w_flat \
        --polygon=$deep_polygon $dldir/$f160w_in

The first thing you may notice is that even if you already have the downloaded input images, this script will always try to re-download them. Also, if you re-run the script, you will notice that mkdir prints an error message that the download directory already exists. Therefore, the script above is not too useful and some modifications are necessary to make it more generally useful. Here are some general tips that are often very useful when writing scripts:

Stop script if a command crashes

By default, if a command in a script crashes (aborts and fails to do what it was meant to do), the script will continue onto the next command. In GNU Bash, you can tell the shell to stop a script in the case of a crash by adding this line at the start of your script:

set -e
Check if a file/directory exists to avoid re-creating it

Conditionals are a very useful feature in scripts. One common conditional is to check if a file exists or not. Assuming the file’s name is FILENAME, you can check its existence (to avoid re-doing the commands that build it) like this:

if [ -f FILENAME ]; then
  echo "FILENAME exists"
else
  # Some commands to generate the file
  echo "done" > FILENAME
fi

To check the existence of a directory instead of a file, use -d instead of -f. To negate a conditional, use ‘!’, and note that conditionals can also be written in one line (useful when they are short).

One common scenario where you will need to check the existence of a directory is when you are making it: the default mkdir command will crash if the desired directory already exists. On some systems (including GNU/Linux distributions), mkdir has options to deal with such cases. But if you want your script to be portable, it is best to do the check yourself, like below:

if ! [ -d DIRNAME ]; then mkdir DIRNAME; fi
Avoid changing directories (with ‘cd’) within the script

You can directly read and write files within other directories. Therefore, using cd to enter a directory (like what we did above, around the wget commands), running commands there and coming back out is extra work and not good practice, because the running directory is part of the environment of a command. You can simply give the directory name before the input and output file names to use them from anywhere on the file system. See the same wget commands below for an example.

Copyright notice: A very important thing to put at the top of your script is a one-line description of what it does and its copyright information (see the example below). Here, we specify who the author(s) of this script are, in which years, and under what license others are allowed to use this file. Without it, your script has no credibility or identity, and others cannot trust, use or acknowledge your work on it. Since Gnuastro is itself licensed under a copyleft license (see Your rights and GNU Gen. Pub. License v3 or GNU GPL; the latter finishes with a template on how to add it), any script that uses Gnuastro should also have a copyleft license: we recommend the same GNU GPL v3+ like below.

Taking the above points into consideration, we can write a better version of the script above. Please compare this script with the previous one carefully to spot the differences. These are very important points that you will definitely encounter during your own research, and knowing them can greatly help your productivity, so pay close attention (even in the comments).

#!/bin/bash
# Script to download and keep the deep region of the XDF survey.
#
# Copyright (C) 2024      Your Name <yourname@email.company>
# Copyright (C) 2021-2024 Initial Author <incase@there-is.any>
#
# This script is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This script is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this script. If not, see <http://www.gnu.org/licenses/>.


# Abort the script in case of an error.
set -e


# Download the input datasets
# ---------------------------
#
# The default file names have this format (where `FILTER' differs for
# each filter):
#   hlsp_xdf_hst_wfc3ir-60mas_hudf_FILTER_v1_sci.fits
# To make the script easier to read, a prefix and suffix variable are
# used to sandwich the filter name into one short line.
dldir=download
xdfsuffix=_v1_sci.fits
xdfprefix=hlsp_xdf_hst_wfc3ir-60mas_hudf_
xdfurl=http://archive.stsci.edu/pub/hlsp/xdf

# The file name and full URLs of the input data.
f105w_in=$xdfprefix"f105w"$xdfsuffix
f160w_in=$xdfprefix"f160w"$xdfsuffix
f105w_url=$xdfurl/$f105w_in
f160w_url=$xdfurl/$f160w_in

# Make sure the download directory exists, and download the images.
if ! [ -d $dldir    ]; then mkdir $dldir; fi
if ! [ -f $dldir/$f105w_in ]; then wget $f105w_url -O $dldir/$f105w_in; fi
if ! [ -f $dldir/$f160w_in ]; then wget $f160w_url -O $dldir/$f160w_in; fi


# Crop out the deep region
# ------------------------
#
# To help in readability, each vertice of the deep/flat field is stored
# as a separate variable. They are then merged into one variable to
# define the polygon.
flatdir=flat-ir
vertice1="53.187414,-27.779152"
vertice2="53.159507,-27.759633"
vertice3="53.134517,-27.787144"
vertice4="53.161906,-27.807208"
f105w_flat=$flatdir/xdf-f105w.fits
f160w_flat=$flatdir/xdf-f160w.fits
deep_polygon="$vertice1:$vertice2:$vertice3:$vertice4"

if ! [ -d $flatdir ]; then mkdir $flatdir; fi
if ! [ -f $f105w_flat ]; then
    astcrop --mode=wcs -h0 --output=$f105w_flat \
            --polygon=$deep_polygon $dldir/$f105w_in
fi
if ! [ -f $f160w_flat ]; then
    astcrop --mode=wcs -h0 --output=$f160w_flat \
            --polygon=$deep_polygon $dldir/$f160w_in
fi

2.1.23 Citing and acknowledging Gnuastro

In conclusion, we hope this extended tutorial has been a good starting point to help in your exciting research. If this book or any of the programs in Gnuastro have been useful for your research, please cite the respective papers, and acknowledge the funding agencies that made all of this possible. Without citations, we will not be able to secure future funding to continue working on Gnuastro or improving it, so please take software citation seriously (for all the scientific software you use, not just Gnuastro).

To help you in this, all Gnuastro programs have a --cite option to facilitate the citation and acknowledgment. Just note that it may be necessary to cite additional papers for different programs, so please try it out on all the programs that you used, for example:

$ astmkcatalog --cite
$ astnoisechisel --cite

2.2 Detecting large extended targets

The outer wings of large and extended objects can sink into the noise very gradually and can have a large variety of shapes (for example, due to tidal interactions). Therefore separating the outer boundaries of the galaxies from the noise can be particularly tricky. Besides causing an under-estimation of the target’s total brightness, failure to detect such faint wings will also bias the noise measurements, thereby hampering the accuracy of any measurement on the dataset. Therefore, even if they do not constitute a significant fraction of the target’s light, or are not your primary target, these regions must not be ignored. In this tutorial, we will walk you through the strategy of detecting such targets using NoiseChisel.

Do not start with this tutorial: If you have not already completed General program usage tutorial, we strongly recommend going through that tutorial before starting this one. Basic features like access to this book on the command-line, the configuration files of Gnuastro’s programs, benefiting from the modular nature of the programs, viewing multi-extension FITS files, or using NoiseChisel’s outputs are discussed in more detail there.

We will try to detect the faint tidal wings of the beautiful M51 group44 in this tutorial. We will use a dataset/image from the public Sloan Digital Sky Survey, or SDSS. Due to its more peculiar low surface brightness structure/features, we will focus on the dwarf companion galaxy of the group (or NGC 5195).


2.2.1 Downloading and validating input data

To get the image, you can use SDSS’s simple field search tool. As long as your desired target is covered by the SDSS, you can find an image containing it either by providing a standard name (if it has one), or its coordinates. To access the dataset we will use here, write NGC5195 in the “Object Name” field and press the “Submit” button.

Type the example commands: Try to type the example commands on your terminal and use the history feature of your command-line (by pressing the “up” button to retrieve previous commands). Do not simply copy and paste the commands shown here. This will help simulate future situations when you are processing your own datasets.

You can see the list of available filters under the color image. For this demonstration, we will use the r-band filter image. By clicking on the “r-band FITS” link, you can download the image. Alternatively, you can just run the following command to download it with GNU Wget45. To keep things clean, let’s also put it in a directory called ngc5195. With the -O option, we are asking Wget to save the downloaded file with a more manageable name: r.fits.bz2 (this is an r-band image of NGC 5195, which was the directory name).

$ mkdir ngc5195
$ cd ngc5195
$ topurl=https://dr12.sdss.org/sas/dr12/boss/photoObj/frames
$ wget $topurl/301/3716/6/frame-r-003716-6-0117.fits.bz2 -Or.fits.bz2

When you want to reproduce a previous result (a known analysis, on a known dataset, to get a known result: like the case here!) it is important to verify that the file is correct: that the input file has not changed (on the remote server, or in your own archive), or there was no downloading problem. Otherwise, if the data have changed in your server/archive, and you use the same script, you will get a different result, causing a lot of confusion!

One good way to verify the contents of a file is to store its checksum in your analysis script and check it before any other operation. Checksum algorithms look into the contents of a file and calculate a fixed-length string from them. If any change (even a single bit or byte) is made within the file, the resulting string will change; for more, see Wikipedia. There are many common algorithms, but a simple one is the SHA-1 algorithm (Secure Hash Algorithm 1) that you can calculate easily with the command below (the second line is the output, and the checksum is the first/long string: it is independent of the file name).

$ sha1sum r.fits.bz2
5fb06a572c6107c72cbc5eb8a9329f536c7e7f65  r.fits.bz2

If the checksum on your computer is different from this, either the file has been incorrectly downloaded (most probable), or it has changed on SDSS servers (very unlikely46). To get a better feeling for checksums, open your favorite text editor and make a test file by writing something in it. Save it and calculate the text file’s SHA-1 checksum with sha1sum. Try renaming that file, and you will see that the checksum has not changed (checksums only look into the contents, not the name/location of the file). Then open the file with your text editor again, make a change and re-calculate its checksum; you will see that the checksum string has changed.
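
If you would like to try the small experiment above directly on the command-line (instead of a text editor), a minimal sketch could look like this (the file name and contents are arbitrary):

$ echo "my first line"  > test.txt
$ sha1sum test.txt
$ mv test.txt renamed.txt
$ sha1sum renamed.txt        ## Same checksum: only the name changed.
$ echo "a second line" >> renamed.txt
$ sha1sum renamed.txt        ## Different checksum: the contents changed.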

It is always good to keep this short checksum string with your project’s scripts and to validate your input data before using them. You can do this with a shell conditional like the one below:

filename=r.fits.bz2
expected=5fb06a572c6107c72cbc5eb8a9329f536c7e7f65
sum=$(sha1sum $filename | awk '{print $1}')
if [ $sum = $expected ]; then
  echo "$filename: validated"
else
  echo "$filename: wrong checksum!"
  exit 1
fi

Now that we know you have the same data that we wrote this tutorial with, let’s continue. The SDSS server keeps the files in the Bzip2 compressed file format (with a .bz2 suffix). So we will first decompress it with the following command to use it as a normal FITS file. By convention, compression programs delete the original file (the compressed file when uncompressing, or the uncompressed file when compressing). To keep the original file, you can use the --keep or -k option which is available in most compression programs for this job. Here, we do not need the compressed file any more, so we will just let bunzip2 delete it for us and keep the directory clean.

$ bunzip2 r.fits.bz2

2.2.2 NoiseChisel optimization

In Downloading and validating input data we downloaded the single-exposure SDSS image. Let’s see how NoiseChisel operates on it with its default parameters:

$ astnoisechisel r.fits -h0

As described in NoiseChisel and Multi-Extension FITS files, NoiseChisel’s default output is a multi-extension FITS file. Open the output r_detected.fits file and have a look at the extensions: the 0-th extension is only meta-data and contains NoiseChisel’s configuration parameters. The rest are the Sky-subtracted input, the detection map, the Sky values and the Sky standard deviation.

$ ds9 -mecube r_detected.fits -zscale -zoom to fit

Flipping through the extensions in a FITS viewer, you will see that the first image (Sky-subtracted image) looks reasonable: there are no major artifacts due to bad Sky subtraction compared to the input. The second extension also seems reasonable with a large detection map that covers the whole of NGC5195, but also extends towards the bottom of the image where we actually see faint and diffuse signal in the input image.

Now try flipping between the DETECTIONS and SKY extensions. In the SKY extension, you’ll notice that there is still significant signal beyond the detected pixels. You can tell that this signal belongs to the galaxy because the far-right side of the image (away from M51) is dark (has lower values) and the brighter parts in the Sky image (with larger values) are just under the detections and follow a similar pattern.

The fact that signal from the galaxy remains in the SKY HDU shows that NoiseChisel can be optimized for a much better result. The SKY extension must not contain any light around the galaxy. Generally, any time your target is much larger than the tile size and the signal is very diffuse and extended at low signal-to-noise values (like this case), this will happen. Therefore, when there are large objects in the dataset, the best place to check the accuracy of your detection is the estimated Sky image.

When dominated by the background, noise has a symmetric distribution. However, signal is not symmetric (we do not have negative signal). Therefore when non-constant47 signal is present in a noisy dataset, the distribution will be positively skewed. For a demonstration, see Figure 1 of Akhlaghi and Ichikawa 2015. This skewness is a good measure of how much faint signal we have in the distribution. The skewness can be accurately measured by the difference in the mean and median (assuming no strong outliers): the more distant they are, the more skewed the dataset is. This important concept will be discussed more extensively in the next section (Skewness caused by signal and its measurement).

However, skewness is only a proxy for signal when the signal has structure (varies per pixel). Therefore, when it is approximately constant over a whole tile, or sub-set of the image, the constant signal’s effect is just to shift the symmetric center of the noise distribution to the positive and there will not be any skewness (major difference between the mean and median). This positive48 shift that preserves the symmetric distribution is the Sky value. When there is a gradient over the dataset, different tiles will have different constant shifts/Sky-values, for example, see Figure 11 of Akhlaghi and Ichikawa 2015.

To make this very large diffuse/flat signal detectable, you will therefore need a larger tile to contain a larger change in the values within it (and improve number statistics, for less scatter when measuring the mean and median). So let’s play with the tessellation a little to see how it affects the result. In Gnuastro, you can see the option values (--tilesize in this case) by adding the -P option to your last command. Try running NoiseChisel with -P to see its default tile size.
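
For example, piping the output of -P into Grep should show only the relevant line (this is just a convenience; -P alone prints all the option values):

$ astnoisechisel r.fits -h0 -P | grep tilesize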

You can clearly see that the default tile size is indeed much smaller than this (huge) galaxy and its tidal features. As a result, NoiseChisel was unable to identify the skewness within the tiles under the outer parts of M51 and NGC 5195, and the threshold has been over-estimated on those tiles. To see which tiles were used for estimating the quantile threshold (where no skewness was measured), you can use NoiseChisel’s --checkqthresh option:

$ astnoisechisel r.fits -h0 --checkqthresh

Did you see how NoiseChisel aborted after finding and applying the quantile thresholds? When you call any of NoiseChisel’s --check* options, by default, it will abort as soon as all the check steps have been written in the check file (a multi-extension FITS file). This allows you to focus on the problem you wanted to check as soon as possible (you can disable this feature with the --continueaftercheck option).

To optimize the threshold-related settings for this image, let’s play with this quantile threshold check image a little. Do not forget that “Good statistical analysis is not a purely routine matter, and generally calls for more than one pass through the computer” (Anscombe 1973, see Gnuastro manifesto: Science and its tools). A good scientist must have a good understanding of her tools to make a meaningful analysis. So do not hesitate in playing with the default configuration and reviewing the manual when you have a new dataset (from a new instrument) in front of you. Robust data analysis is an art, therefore a good scientist must first be a good artist. So let’s open the check image as a multi-extension cube:

$ ds9 -mecube r_qthresh.fits -zscale -cmap sls -zoom to fit

The first extension (called CONVOLVED) of r_qthresh.fits is the convolved input image, on which the thresholds are defined (and later applied). For more on the effect of convolution and thresholding, see Sections 3.1.1 and 3.1.2 of Akhlaghi and Ichikawa 2015. The second extension (QTHRESH_ERODE) has a blank/white value for all the pixels of any tile that was identified as having significant signal. The other tiles have the measured threshold over them. The next two extensions (QTHRESH_NOERODE and QTHRESH_EXPAND) are the other two quantile thresholds that are necessary in NoiseChisel’s later steps. Every step in this file is repeated on the three thresholds.

Play a little with the color bar of the QTHRESH_ERODE extension; you will clearly see how the non-blank tiles around NGC 5195 have a gradient. As one line of attack against discarding too much signal below the threshold, NoiseChisel rejects outlier tiles. Go forward by three extensions to VALUE1_NO_OUTLIER and you will see that many of the tiles over the galaxy have been removed in this step. For more on the outlier rejection algorithm, see the latter half of Quantifying signal in a tile.

Even though much of the galaxy’s footprint has been rejected as outliers, there are still tiles with signal remaining: play with the DS9 color bar and you will still see a gradient near the outer tidal feature of the galaxy. Before trying to correct this, let’s look at the other extensions of this check image. We will use a * as a wild-card that can be 1, 2 or 3. In the THRESH*_INTERP extensions, you see that all the blank tiles have been interpolated using their nearest neighbors (the relevant option here is --interpnumngb). In the following THRESH*_SMOOTH extensions, you can see the tile values after smoothing (configured with the --smoothwidth option). Finally, in QTHRESH-APPLIED, you see the thresholded image: pixels with a value of 1 will be eroded later, but pixels with a value of 2 will pass the erosion step untouched.

Let’s get back to the problem of optimizing the result. You have two strategies for detecting the outskirts of the merging galaxies: 1) Increase the tile size to get more accurate measurements of skewness. 2) Strengthen the outlier rejection parameters to discard more of the tiles with signal (primarily by increasing --outliernumngb). Fortunately in this image we have a sufficiently large region on the right side of the image that the galaxy does not extend to. So we can use the more robust first solution. In situations where this does not happen (for example, if the field of view in this image was shifted to the left to have more of M51 and less sky) you are limited to a combination of the two solutions or just to the second solution.

Skipping convolution for faster tests: The slowest step of NoiseChisel is the convolution of the input dataset. Therefore when your dataset is large (unlike the one in this test), and you are not changing the input dataset or kernel in multiple runs (as in the tests of this tutorial), it is faster to do the convolution separately once (using Convolve) and use NoiseChisel’s --convolved option to directly feed the convolved image and avoid convolution. For more on --convolved, see NoiseChisel input.
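
As a rough sketch of that workflow on the image used here (assuming a kernel file called kernel.fits, for example one built with MakeProfiles; see Convolve and NoiseChisel input for the exact options and for selecting the HDU of the convolved file):

## Convolve the input once, in the spatial domain (like NoiseChisel).
$ astconvolve r.fits -h0 --kernel=kernel.fits --domain=spatial \
              --output=r_convolved.fits

## Re-use the convolved image in every following NoiseChisel run (you
## may also need to tell NoiseChisel which HDU of this file to use,
## see NoiseChisel input).
$ astnoisechisel r.fits -h0 --tilesize=100,100 \
                 --convolved=r_convolved.fits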

To better identify the skewness caused by the flat NGC 5195 and M51 tidal features on the tiles under it, we have to choose a larger tile size. Let’s try a tile size of 100 by 100 pixels and inspect the check image.

$ astnoisechisel r.fits -h0 --tilesize=100,100 --checkqthresh
$ ds9 -mecube r_qthresh.fits -zscale -cmap sls -zoom to fit

You can clearly see the effect of this increased tile size: the tiles are much larger and when you look into VALUE1_NO_OUTLIER, you see that all the tiles are nicely grouped on the right side of the image (the farthest from M51, where we do not see a gradient in QTHRESH_ERODE). Things look good now, so let’s remove --checkqthresh and let NoiseChisel proceed with its detection.

$ astnoisechisel r.fits -h0 --tilesize=100,100
$ ds9 -mecube r_detected.fits -zscale -cmap sls -zoom to fit

The detected pixels of the DETECTIONS extension have expanded a little, but not by much. Also, the gradient in the SKY image is almost fully removed (and does not fall over M51 anymore). However, on the bottom-right of the M51 detection, we see many holes gradually increasing in size. This hints that there is still signal out there. Let’s check the next series of detection steps by adding the --checkdetection option this time:

$ astnoisechisel r.fits -h0 --tilesize=100,100 --checkdetection
$ ds9 -mecube r_detcheck.fits -zscale -cmap sls -zoom to fit

The output now has 16 extensions, showing every step that is taken by NoiseChisel. The first and second (INPUT and CONVOLVED) are clear from their names. The third (THRESHOLDED) is the thresholded image after finding the quantile threshold (last extension of the output of --checkqthresh). The fourth HDU (ERODED) is new: it is the namesake of NoiseChisel, eroding pixels that are above the threshold. By erosion, we mean that all pixels with a value of 1 (above the threshold) that are touching a pixel with a value of 0 (below the threshold) will be flipped to zero (or “carved” out)49. You can see its effect directly by going back and forth between the THRESHOLDED and ERODED extensions.

In the fifth extension (OPENED-AND-LABELED) the image is “opened”, which is a name for eroding once, then dilating (dilation is the inverse of erosion). This is good for removing thin connections that are only due to noise. Each separate connected group of pixels is also given its unique label here. Do you see how just beyond the large M51 detection, there are many smaller detections that get smaller as you go more distant? This hints at the solution: the default number of erosions is too high. Let’s see how many erosions take place by default (by adding -P | grep erode to the previous command):

$ astnoisechisel r.fits -h0 --tilesize=100,100 -P | grep erode

We see that the value of erode is 2. The default NoiseChisel parameters are primarily targeted to processed images (where there is correlated noise due to all the processing that has gone into the warping and stacking of raw images, see NoiseChisel optimization for detection). In those scenarios 2 erosions are commonly necessary. But here, we have a single-exposure image where there is no correlated noise (the pixels are not mixed). So let’s see how things change with only one erosion:

$ astnoisechisel r.fits -h0 --tilesize=100,100 --erode=1 \
                 --checkdetection
$ ds9 -mecube r_detcheck.fits -zscale -cmap sls -zoom to fit

Looking at the OPENED-AND-LABELED extension again, we see that the main/large detection is now much larger than before. While the immediately-outer connected regions are still present, they have decreased dramatically, so we can pass this step.

After the OPENED-AND-LABELED extension, NoiseChisel goes on to find false detections using the undetected pixels. The process is fully described in Section 3.1.5 (Defining and Removing False Detections) of Akhlaghi and Ichikawa 2015. Please compare the extensions to what you read there and things will be very clear. In the last HDU (DETECTION-FINAL), we have the final detected pixels that will be used to estimate the Sky and its standard deviation. We see that the main detection has indeed been detected very far out, so let’s see how the full NoiseChisel run will estimate the Sky and its standard deviation (by removing --checkdetection):

$ astnoisechisel r.fits -h0 --tilesize=100,100 --erode=1
$ ds9 -mecube r_detected.fits -zscale -cmap sls -zoom to fit

The DETECTIONS extension of r_detected.fits closely follows the DETECTION-FINAL extension of the check image (looks good!). If you go ahead to the SKY extension, things still look good. But it can still be improved.

Look at the DETECTIONS again, you will see the right-ward edges of M51’s detected pixels have many “holes” that are fully surrounded by signal (value of 1) and the signal stretches out in the noise very thinly (the size of the holes increases as we go out). This suggests that there is still undetected signal and that we can still dig deeper into the noise.

With the --detgrowquant option, NoiseChisel will “grow” the detections into the noise. Its value is the ultimate limit of the growth in units of quantile (between 0 and 1). Therefore --detgrowquant=1 means no growth and --detgrowquant=0.5 means an ultimate limit of the Sky level (which is usually too much and will cover the whole image!). See Figure 2 of Akhlaghi 2019 for more on this option. Try running the previous command with various values (from 0.6 to higher values) to see this option’s effect on this dataset. For this particularly huge galaxy (with signal that extends very gradually into the noise), we will set it to 0.75:

$ astnoisechisel r.fits -h0 --tilesize=100,100 --erode=1 \
                 --detgrowquant=0.75
$ ds9 -mecube r_detected.fits -zscale -cmap sls -zoom to fit

Beyond this level (smaller --detgrowquant values), you see many of the smaller background galaxies (towards the right side of the image) starting to create thin spider-leg-like features, showing that we are following correlated noise too far. Please try it for yourself by changing the value to 0.6, for example.

When you look at the DETECTIONS extension of the command shown above, you see the wings of the galaxy being detected much farther out, but you also see many holes which are clearly just caused by noise. After growing the objects, NoiseChisel also allows you to fill such holes when they are smaller than a certain size through the --detgrowmaxholesize option. In this case, a maximum area/size of 10,000 pixels seems to be good:

$ astnoisechisel r.fits -h0 --tilesize=100,100 --erode=1 \
                 --detgrowquant=0.75 --detgrowmaxholesize=10000
$ ds9 -mecube r_detected.fits -zscale -cmap sls -zoom to fit

When looking at the raw input image (which is very “shallow”: less than a minute of exposure!), you do not see anything so far out of the galaxy. You might just think to yourself that “this is all noise, I have just dug too deep and I’m following systematics”! If you feel like this, have a look at the deep images of this system in Watkins 2015, or a 12 hour deep image of this system (with a 12-inch telescope): https://i.redd.it/jfqgpqg0hfk11.jpg50. In these deeper images you clearly see how the outer edges of the M51 group follow this exact structure. Below, in Achieved surface brightness level, we will measure the exact level.

As the gradient in the SKY extension shows, and the deep images cited above confirm, the galaxy’s signal extends even beyond this. But this is already far deeper than what most (if not all) other tools can detect. Therefore, we will stop configuring NoiseChisel at this point in the tutorial and let you play with the other options a little more, while reading more about it in the papers (Akhlaghi and Ichikawa 2015, and Akhlaghi 2019) and NoiseChisel. When you do find a better configuration, feel free to contact us for feedback. Do not forget that good data analysis is an art, so like a sculptor, master your chisel for a good result.

To avoid typing all these options every time you run NoiseChisel on this image, you can use Gnuastro’s configuration files, see Configuration files. For an applied example of setting/using them, see Option management and configuration files.
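
For example, a minimal sketch of such a configuration file for this tutorial could look like the following (assuming the current-directory configuration file location described in Configuration files; check that section for the exact directory and file names before relying on this):

$ mkdir .gnuastro
$ cat > .gnuastro/astnoisechisel.conf <<EOF
tilesize             100,100
erode                1
detgrowquant         0.75
detgrowmaxholesize   10000
EOF

## With this file in place, a plain call should now use these values.
$ astnoisechisel r.fits -h0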

This NoiseChisel configuration is NOT GENERIC: Do not use the configuration derived above, on another instrument’s image blindly. If you are unsure, just use the default values. As you saw above, the reason we chose this particular configuration for NoiseChisel to detect the wings of the M51 group was strongly influenced by the noise properties of this particular image. Remember NoiseChisel optimization for detection, where we looked into the very deep XDF image which had strong correlated noise?

As long as your other images have similar noise properties (from the same data-reduction step of the same instrument), you can use your configuration on any of them. But for images from other instruments, please follow a similar logic to what was presented in these tutorials to find the optimal configuration.

Smart NoiseChisel: As you saw during this section, there is a clear logic behind the optimal parameter value for each dataset. Therefore, we plan to add capabilities to (optionally) automate some of the choices made here based on the actual dataset; please join us in doing this if you are interested. However, given the many problems in existing “smart” solutions, such automatic changing of the configuration may cause more problems than it solves. So even when these capabilities are implemented, we would strongly recommend quality checks for a robust analysis.


2.2.3 Skewness caused by signal and its measurement

In the previous section (NoiseChisel optimization) we showed how to customize NoiseChisel for a single-exposure SDSS image of the M51 group. During the customization, we also discussed the skewness caused by signal. In the next section (Image surface brightness limit), we will use this to measure the surface brightness limit of the image. However, to better understand NoiseChisel and the image surface brightness limit, it is very important to understand the skewness caused by signal and how to measure it properly. Therefore, now that we have separated signal from noise, let’s pause for a moment and look into skewness, how signal creates it, and find the best way to measure it.

Let’s start by masking all the detected pixels found at the end of the previous section (NoiseChisel optimization) and having a look at the noise distribution with Gnuastro’s Arithmetic and Statistics programs as shown below (while visually inspecting the masked image with DS9 in the middle).

$ astarithmetic r_detected.fits -hINPUT-NO-SKY set-in \
                r_detected.fits -hDETECTIONS set-det \
                in det nan where -odet-masked.fits
$ ds9 det-masked.fits
$ aststatistics det-masked.fits

You will see that Gnuastro’s Statistics program prints an ASCII histogram when no option is given (it is shown below). This is done to give you a fast and easy view of the distribution of values in the dataset (pixels in an image, or rows in a table’s column).

-------
Input: det-masked.fits (hdu: 1)
-------
  Number of elements:                      903920
  Minimum:                                 -0.113543
  Maximum:                                 0.130339
  Median:                                  -0.00216306
  Mean:                                    -0.0001893073877
  Standard deviation:                      0.02569057188
-------
Histogram:
 |                              ** *
 |                            * ** *  *
 |                           ** ** *  *
 |                         * ** ** ** *
 |                        ** ** ** ** * **
 |                        ** ** ** ** * ** *
 |                      * ** ** ** ** * ** **
 |                     ** ** ** ** **** ** ** *
 |                  ** ** ** ** ** **** ** ** ** *
 |               ** ** ** ** ** ** ******* ** ** ** *
 |*********** ** ** ** ******************* ** ** ** ** ***** ** ***** **
 |----------------------------------------------------------------------

This histogram shows a roughly symmetric noise distribution, so let’s have a look at its skewness. The most commonly used definition of skewness is known as the “Pearson’s first skewness coefficient”. It measures the difference between the mean and median, in units of the standard deviation (STD):

$$\rm{Skewness}\equiv\frac{(\rm{mean}-\rm{median})}{\rm{STD}}$$

The logic behind this definition is simple: as more signal is added to pixels that originally only contain raw noise, the mean shifts to the positive faster than the median, so the distance between the mean and median (and thus the skewness) should increase. Let’s measure the skewness (as defined above) over the image without any signal. It is very easy with Gnuastro’s Statistics program (piping the output to AWK):

$ aststatistics det-masked.fits --mean --median --std \
                | awk '{print ($1-$2)/$3}'
0.0768279

We see that the mean and median are only \(0.08\sigma\) (rounded) away from each other (which is very close)! All pixels with significant signal are masked, so this is expected, and everything is fine. Now, let’s check the pixel distribution of the sky-subtracted input (where pixels with significant signal remain, and are not masked):

$ ds9 r_detected.fits
$ aststatistics r_detected.fits -hINPUT-NO-SKY
-------
Input: r_detected.fits (hdu: INPUT-NO-SKY)
Unit: nanomaggy
-------
  Number of elements:                      3049472
  Minimum:                                 -0.113543
  Maximum:                                 159.25
  Median:                                  0.0241158
  Mean:                                    0.1057885317
  Standard deviation:                      0.698167489
-------
Histogram:
 |*
 |*
 |*
 |*
 |*
 |*
 |*
 |*
 |*
 |*
 |******************************************* ***  ** ****  * *   *  * *
 |----------------------------------------------------------------------

Comparing the distributions above, you can see that the minimum value of the image has not changed because we have not masked the minimum values. However, as expected, the maximum value of the image has changed (from \(0.13\) to \(159.25\)). This is clearly evident from the ASCII histogram: the distribution is very elongated because the galaxy inside the image is extremely bright.

Now, let’s limit the displayed information with the --lessthan=0.13 option of Statistics as shown below (to only use values less than 0.13; the maximum of the image where all signal is masked).

$ aststatistics r_detected.fits -hINPUT-NO-SKY --lessthan=0.13
-------
Input: r_detected.fits (hdu: INPUT-NO-SKY)
Range: up to (exclusive) 0.13.
Unit: nanomaggy
-------
  Number of elements:                      2531949
  Minimum:                                 -0.113543
  Maximum:                                 0.126233
  Median:                                  0.0137138
  Mean:                                    0.01735551527
  Standard deviation:                      0.03590550597
-------
Histogram:
 |                                   *
 |                                * ** **
 |                             *  * ** ** **
 |                             *  * ** ** ** *
 |                           * ** * ** ** ** *
 |                          ** ** * ** ** ** *  *
 |                        * ** ** * ** ** ** *  *
 |                       ** ** ** * ** ** ** ** * ** *
 |                     * ** ** **** ** ** ** **** ** ** **
 |                  * ** ** ** **** ** ** ** ******* ** ** ** * ** ** **
 |***** ** ********** ** ** ********** ** ********** ** ************* **
 |----------------------------------------------------------------------

The improvement is obvious: the ASCII histogram better shows the pixel values near the noise level. We can now compare with the distribution of det-masked.fits that we found earlier. The ASCII histogram of det-masked.fits was approximately symmetric, while this one is asymmetric in this range, especially in the outer (to the right, or positive) direction. The heavier right-side tail is a clear visual demonstration of the skewness that is caused by the signal in the un-masked image.

Having visually confirmed the skewness, let’s quantify it with Pearson’s first skewness coefficient. Like before, we can simply use Gnuastro’s Statistics and AWK for the measurement and calculation:

$ aststatistics r_detected.fits --mean --median --std \
                | awk '{print ($1-$2)/$3}'
0.116982

The difference between the mean and median is now approximately \(0.12\sigma\). This is larger than the skewness of the masked image (which was approximately \(0.08\sigma\)). At a glance (only looking at the numbers), it seems that there is not much difference between the two distributions. However, visually looking at the non-masked image, or the ASCII histogram, you would expect the quantified skewness to be much larger than that of the masked image, but that has not happened! Why is that?

The reason is that the presence of signal does not only shift the mean and median, it also increases the standard deviation! To see this for yourself, compare the standard deviation of det-masked.fits (which was approximately \(0.025\)) to r_detected.fits (without --lessthan; which was approximately \(0.699\)). The latter is almost 28 times larger!
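
If you would like to re-derive just these two numbers without scrolling back through the previous outputs, Statistics can print them directly (the values should match those reported above):

$ aststatistics det-masked.fits --std
$ aststatistics r_detected.fits -hINPUT-NO-SKY --std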

This happens because the standard deviation is only well defined for a symmetric (Gaussian-like) distribution. In a non-Gaussian distribution, the standard deviation is poorly defined and is not a good measure of “width”. Since Pearson’s first skewness coefficient is expressed in units of the standard deviation, this very large increase in the standard deviation has hidden the much increased distance between the mean and median after adding signal.

We therefore need a better unit or scale to quantify the distance between the mean and median. A unit that is less affected by skewness or outliers. One solution that we have found to be very useful is the quantile units or quantile scale. The quantile scale is defined by first sorting the dataset (which has \(N\) elements). If we want the quantile of a value \(V\) in a distribution, we first find the nearest data element to \(V\) in the sorted dataset. Let’s assume the nearest element is the \(i\)-th element, counting from 0, after sorting. The quantile of V in that distribution is then defined as \(i/(N-1)\) (which will have a value between 0 and 1).

The quantile of the median is obvious from its definition: 0.5. This is because the median is defined to be the middle element of the distribution after sorting. We can therefore define skewness as the quantile of the mean (\(q_m\)). If \(q_m\sim0.5\) (the median), then the distribution (of signal blended in noise) is symmetric (possibly Gaussian, but the functional form is irrelevant here). A larger value of \(|q_m-0.5|\) quantifies a more skewed distribution. Furthermore, \(q_m>0.5\) signifies a positive skewness, while \(q_m<0.5\) signifies a negative skewness.
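
As a toy example of this definition, take the sorted dataset \(\{1, 2, 3, 4, 100\}\), where the last element plays the role of bright signal. The median is \(3\) (quantile \(2/4=0.5\)), while the mean is \(110/5=22\). The nearest element to the mean is \(4\), which is the \(i=3\) element (counting from 0), so with \(N=5\) the quantile of the mean is \(q_m=3/(5-1)=0.75\): well above \(0.5\), showing the positive skewness caused by that single large value.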

Let’s put this definition to a test on the same two images we have already created. Fortunately Gnuastro’s Statistics program has the --quantofmean option to easily calculate \(q_m\) for you. So testing is easy:

$ aststatistics det-masked.fits --quantofmean
0.51295636

$ aststatistics r_detected.fits -hINPUT-NO-SKY --quantofmean
0.8105163158

The two quantiles of the mean are now very distinctly different (\(0.51\) and \(0.81\)): they differ by about \(0.3\) (on a scale of 0 to 1)! Recall that when defining skewness with Pearson’s first skewness coefficient, their difference was negligible (\(0.04\sigma\))! You can now better appreciate why we discussed the quantile so extensively in NoiseChisel optimization. In case you would like to know more about the usage of the quantile of the mean in Gnuastro, please see Quantifying signal in a tile, or watch this video demonstration: https://peertube.stream/w/35b7c398-9fd7-4bcf-8911-1e01c5124585.


2.2.4 Image surface brightness limit

When your science is related to extended emission (like the example here) and you are presenting your results at a scientific conference, usually the first thing that someone will ask (if you do not explicitly say it!) is the dataset’s surface brightness limit (a standard measure of the noise level), and your target’s surface brightness (a measure of the signal, either in the center or outskirts, depending on context). For more on the basics of these important concepts, please see Quantifying measurement limits. So in this section of the tutorial, we will measure these values for this image and this target.

Before measuring the surface brightness limit, let’s see how reliable our detection was. In other words, let’s see how “clean” our noise is (after masking all detections, as described previously in Skewness caused by signal and its measurement):

$ aststatistics det-masked.fits --quantofmean
0.5111848629

This shows that the mean is indeed very close to the median: its quantile is only about 0.01 above the median’s 0.5. As we saw in NoiseChisel optimization, a very small residual signal still remains in the undetected regions, and this very small difference is a quantitative measure of that undetected signal. It was up to you as an exercise to improve it, so we will continue with this dataset.

The surface brightness limit of the image can be measured from the masked image and the equation in Quantifying measurement limits. Let’s do it for a \(3\sigma\) surface brightness limit over an area of \(25 \rm{arcsec}^2\):

$ nsigma=3
$ zeropoint=22.5
$ areaarcsec2=25
$ std=$(aststatistics det-masked.fits --sigclip-std)
$ pixarcsec2=$(astfits det-masked.fits --pixelscale --quiet \
                       | awk '{print $3*3600*3600}')
$ astarithmetic --quiet $nsigma $std x \
                $areaarcsec2 $pixarcsec2 x \
                sqrt / $zeropoint counts-to-mag
26.0241

The customizable steps above are good for any type of mask. For example, your field of view may contain a very deep part, so you need to mask all the shallow parts as well as the detections before these steps. But when your image is flat (like this one), there is a much simpler method to obtain the same value through MakeCatalog (when the standard deviation image is made by NoiseChisel). NoiseChisel has already calculated the minimum (MINSTD), maximum (MAXSTD) and median (MEDSTD) standard deviation within the tiles during its processing and has stored them as FITS keywords within the SKY_STD HDU. You can see them by piping all the keywords in this HDU into Grep. In Grep, each ‘.’ represents one character that can be anything, so M..STD will match all three keywords mentioned above.

$ astfits r_detected.fits --hdu=SKY_STD | grep 'M..STD'

The MEDSTD value is very similar to the standard deviation derived above, so we can safely use it instead of having to mask and run Statistics. In fact, MakeCatalog also uses this keyword and will report the dataset’s \(n\sigma\) surface brightness limit as keywords in the output (not as measurement columns, since it is related to the noise, not labeled signal):

$ astmkcatalog r_detected.fits -hDETECTIONS --output=sbl.fits \
               --forcereadstd --ids

Before looking into the measured surface brightness limits, let’s review some important points about this call to MakeCatalog first:

  • We are only concerned with the noise (not the signal), so we do not ask for any further measurements, because they can unnecessarily slow it down. However, MakeCatalog requires at least one column, so we will only ask for the --ids column (which does not need any measurement!). The output catalog will therefore have a single row and a single column, with 1 as its value51.
  • If we do not ask for any noise-related column (for example, the signal-to-noise ratio column with --sn, among other noise-related columns), MakeCatalog is not going to read the noise standard deviation image (again, to speed up its operation when it is redundant). We are thus using the --forcereadstd option (short for “force read standard deviation image”) here so it is ready for the surface brightness limit measurements that are written as keywords.

With the command below you can see all the keywords that were written into the output along with the table. Notice the group of keywords under the “Surface brightness limit (SBL)” title.

$ astfits sbl.fits -h1

Since all the keywords of interest here start with SBL, we can get a cleaner view with this command.

$ astfits sbl.fits -h1 | grep ^SBL

Notice how SBLSTD has the same value as NoiseChisel’s MEDSTD above. Using SBLSTD, MakeCatalog has determined the \(n\sigma\) surface brightness limiting magnitude in these header keywords. The multiple of \(\sigma\), or \(n\), is the value of the SBLNSIG keyword, which you can change with the --sfmagnsigma option. The surface brightness limiting magnitude within an area of SBLAREA arcsec\(^2\) is stored in SBLMAG.

You will notice that the two surface brightness limiting magnitudes above have values around 3 and 4 (which is not correct!). This is because we have not given a zero point magnitude to MakeCatalog, so it uses the default value of 0. SDSS image pixel values are calibrated in units of “nanomaggy”, which are defined to have a zero point magnitude of 22.552. So with the first command below we give the zero point value, and with the second we can see the surface brightness limiting magnitudes with the correct values (around 25 and 26).

$ astmkcatalog r_detected.fits -hDETECTIONS --zeropoint=22.5 \
               --output=sbl.fits --forcereadstd --ids
$ astfits sbl.fits -h1 | grep ^SBL

As you see from SBLNSIG and SBLAREA, the default multiple of sigma is 1 and the default area is 1 arcsec\(^2\). Usually higher values are used for these two parameters. Following the manual example we did above, you can ask for the multiple of sigma to be 3 and the area to be 25 arcsec\(^2\):

$ astmkcatalog r_detected.fits -hDETECTIONS --zeropoint=22.5 \
               --output=sbl.fits --sfmagarea=25 --sfmagnsigma=3 \
               --forcereadstd --ids
$ astfits sbl.fits -h1 | awk '/^SBLMAG /{print $3}'
26.02296

You see that the value is essentially identical to the custom surface brightness limiting magnitude we measured above (a difference of \(0.00114\) magnitudes is negligible: it is many times smaller than the typical errors in the zero point magnitude or in the magnitude measurements). But it is much easier to have MakeCatalog do this measurement, because these values will be appended (as keywords) into your final catalog of objects within that image.

Custom STD for MakeCatalog’s Surface brightness limit: You can manually change/set the value of the MEDSTD keyword in your input STD image with Fits:

$ std=$(aststatistics masked.fits --sigclip-std)
$ astfits noisechisel.fits -hSKY_STD --update=MEDSTD,$std

With this change, MakeCatalog will use your custom standard deviation for the surface brightness limit. This is necessary in scenarios where your image has multiple depths and during your masking, you also mask the shallow regions (as well as the detections of course).

We have successfully measured the image’s \(3\sigma\) surface brightness limiting magnitude over 25 arcsec\(^2\). However, as discussed in Quantifying measurement limits, this value is just an extrapolation of the per-pixel standard deviation. Issues like correlated noise will cause the real noise over a large area to be different. So for a more robust measurement, let’s use the upper-limit magnitude of a similarly sized region. For more on the upper-limit magnitude, see the respective item in Quantifying measurement limits.

In summary, the upper-limit measurements involve randomly placing the footprint of an object in undetected parts of the image many times. This results in a random distribution of brightness measurements; the standard deviation of that distribution is then converted into magnitudes. To be comparable with the results above, let’s make a circular aperture that has an area of 25 arcsec\(^2\) (thus with a radius of \(2.82095\) arcsec).
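That radius simply follows from the area of a circle: \(r=\sqrt{A/\pi}=\sqrt{25/\pi}\simeq2.82095\) arcsec.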

zeropoint=22.5
r_arcsec=2.82095

## Convert the radius (in arcseconds) to pixels.
r_pixel=$(astfits r_detected.fits --pixelscale -q \
                  | awk '{print '$r_arcsec'/($1*3600)}')

## Make a circular aperture at pixel (100,100); the exact position is
## irrelevant.
echo "1 100 100 5 $r_pixel 0 0 1 1 1" \
     | astmkprof --background=r_detected.fits \
                 --clearcanvas --mforflatpix --type=uint8 \
                 --output=lab.fits

## Do the upper-limit measurement, ignoring all NoiseChisel's
## detections as a mask for the upper-limit measurements.
astmkcatalog lab.fits -h1 --zeropoint=$zeropoint -osbl.fits \
             --sfmagarea=25 --sfmagnsigma=3 --forcereadstd \
             --valuesfile=r_detected.fits --valueshdu=INPUT-NO-SKY \
             --upmaskfile=r_detected.fits --upmaskhdu=DETECTIONS \
             --upnsigma=3 --checkuplim=1 --upnum=1000 \
             --ids --upperlimit-sb

The sbl.fits catalog now contains the upper-limit surface brightness for a circle with an area of 25 arcsec\(^2\). You can check the value with the command below; the great thing is that you now have both the surface brightness limiting magnitude in the headers discussed above, and the upper-limit surface brightness within the table. You can also add more profiles with different shapes and sizes if necessary. Of course, you can also use --upperlimit-sb on your actual science objects and clumps to get an object-specific or clump-specific value.

$ asttable sbl.fits -cUPPERLIMIT_SB
25.9119

You will get a slightly different value from the command above. In fact, if you run the MakeCatalog command again and look at the measured upper-limit surface brightness, it will be slightly different from your first trial! Please try exactly the same MakeCatalog command above a few times to see how it changes.

This is because of the random factor in the upper-limit measurements: every time you run it, different random points will be checked, resulting in a slightly different distribution. You can decrease the random scatter by increasing the number of random checks (for example, setting --upnum=100000, compared to 1000 in the command above). But this will be slower and the results will not be exactly reproducible. The only way to ensure you get an identical result later is to fix the random number generator function and seed like the command below53. This is a very important point regarding any statistical process involving random numbers, please see Generating random numbers.

export GSL_RNG_TYPE=ranlxs1
export GSL_RNG_SEED=1616493518
astmkcatalog lab.fits -h1 --zeropoint=$zeropoint -osbl.fits \
             --sfmagarea=25 --sfmagnsigma=3 --forcereadstd \
             --valuesfile=r_detected.fits --valueshdu=INPUT-NO-SKY \
             --upmaskfile=r_detected.fits --upmaskhdu=DETECTIONS \
             --upnsigma=3 --checkuplim=1 --upnum=1000 \
             --ids --upperlimit-sb --envseed

But where do all the random apertures of the upper-limit measurement fall on the image? It is good to actually inspect their locations to get a better understanding of the process and also to detect possible bugs/biases. When MakeCatalog is run with the --checkuplim option, it will write all the random locations and their measured brightness as a table in a file with the suffix _upcheck.fits. With the first command below you can use Gnuastro’s asttable and astscript-ds9-region to convert the successful aperture locations into a DS9 region file, and with the second you can load the region file onto the detections and sky-subtracted image to visually see where they are.

## Create a DS9 region file from the check table (activated
## with '--checkuplim')
asttable lab_upcheck.fits --noblank=RANDOM_SUM \
         | astscript-ds9-region -c1,2 --mode=img \
                                --radius=$r_pixel

## Have a look at the regions in relation with NoiseChisel's
## detections.
ds9 r_detected.fits[INPUT-NO-SKY] -regions load ds9.reg
ds9 r_detected.fits[DETECTIONS]   -regions load ds9.reg

In this example, we were looking at a single-exposure image that has no correlated noise. Because of this, the surface brightness limit and the upper-limit surface brightness are very close. They will have a bigger difference on deep datasets with stronger correlated noise (that are the result of stacking many individual exposures). As an exercise, please try measuring the upper-limit surface brightness level and surface brightness limit for the deep HST data that we used in the previous tutorial (General program usage tutorial).


2.2.5 Achieved surface brightness level

In NoiseChisel optimization we customized NoiseChisel for a single-exposure SDSS image of the M51 group and in Image surface brightness limit we measured the surface brightness limit and the upper-limit surface brightness level (which are both measures of the noise level). In this section, let’s do some measurements on the outer-most edges of the M51 group to see how they relate to the noise measurements found in the previous section.

For this measurement, we will need to estimate the average flux on the outer edges of the detection. Fortunately all this can be done with a few simple commands using Arithmetic and MakeCatalog. First, let’s separate each detected region, or give a unique label/counter to all the connected pixels of NoiseChisel’s detection map with the command below. Recall that with the set- operator, the popped operand will be given a name (det in this case) for easy usage later.

$ astarithmetic r_detected.fits -hDETECTIONS set-det \
                det 2 connected-components -olabeled.fits

You can find the label of the main galaxy visually (by opening the image and hovering your mouse over the M51 group’s label). But to have a little more fun, let’s do this automatically (which is necessary in a general scenario). The M51 group detection is by far the largest detection in this image; this allows us to find its ID/label easily. We will first run MakeCatalog to find the area of all the labels, then we will use Table to find the ID of the largest object and keep it in a shell variable (id):

# Run MakeCatalog to find the area of each label.
$ astmkcatalog labeled.fits --ids --geo-area -h1 -ocat.fits

## Sort the table by the area column.
$ asttable cat.fits --sort=AREA_FULL

## The largest object, is the last one, so we will use '--tail'.
$ asttable cat.fits --sort=AREA_FULL --tail=1

## We only want the ID, so let's only ask for that column:
$ asttable cat.fits --sort=AREA_FULL --tail=1 --column=OBJ_ID

## Now, let's put this result in a variable (instead of printing)
$ id=$(asttable cat.fits --sort=AREA_FULL --tail=1 --column=OBJ_ID)

## Just to confirm everything is fine.
$ echo $id

We can now use the id variable to reject all other detections:

$ astarithmetic labeled.fits $id eq -oonly-m51.fits

Open the image and have a look. To separate the outer edges of the detections, we will need to “erode” the M51 group detection. So in the same Arithmetic command as above, we will erode three times (to have more pixels and thus less scatter), using a maximum connectivity of 2 (8-connected neighbors). We will then save the output in eroded.fits.

$ astarithmetic labeled.fits $id eq 2 erode 2 erode 2 erode \
                -oeroded.fits

In labeled.fits, we can now set all the 1-valued pixels of eroded.fits to 0 using Arithmetic’s where operator added to the previous command. We will need the pixels of the M51 group in labeled.fits two times: once to do the erosion, another time to find the outer pixel layer. To do this (and be efficient and more readable) we will use the set-i operator (to give this image the name ‘i’). In this way we can use it any number of times afterwards, while only reading it from disk and finding M51’s pixels once.

$ astarithmetic labeled.fits $id eq set-i i \
                i 2 erode 2 erode 2 erode 0 where -oedge.fits

Open the image and have a look. You’ll see that the detected edge of the M51 group is now clearly visible. You can use edge.fits to mark (set to blank) this boundary on the input image and get a visual feeling of how far it extends:

$ astarithmetic r.fits -h0 edge.fits nan where -oedge-masked.fits

To quantify how deep we have detected the low-surface brightness regions (in units of signal-to-noise ratio), we will use the command below. In short, for every non-zero pixel of edge.fits, it just divides the value of the Sky-subtracted input (first extension of NoiseChisel’s output) by the standard deviation of the same pixel. This gives us a signal-to-noise ratio image. The mean value of this image shows the level of surface brightness that we have achieved. You can also break the command below into multiple calls to Arithmetic and create temporary files to understand it better. However, if you have a look at Reverse polish notation and Arithmetic operators, you should be able to easily understand what your computer does when you run this command54.

$ astarithmetic edge.fits -h1                  set-edge \
                r_detected.fits -hSKY_STD      set-skystd \
                r_detected.fits -hINPUT-NO-SKY set-skysub \
                skysub skystd / edge not nan where meanvalue --quiet

We have thus detected the wings of the M51 group down to roughly 1/3rd of the noise level in this image, which is a very good achievement! But the per-pixel S/N is a relative measurement. Let’s also measure the depth of our detection in absolute surface brightness units, or magnitudes per square arcsecond (see Brightness, Flux, Magnitude and Surface brightness). We will also ask for the S/N and magnitude of the full edge we have defined. Fortunately, doing this is very easy with Gnuastro’s MakeCatalog:

$ astmkcatalog edge.fits -h1 --valuesfile=r_detected.fits \
               --zeropoint=22.5 --ids --sb --sn --magnitude
$ asttable edge_cat.fits
1      25.6971       55.2406       15.8994

We have thus reached an outer surface brightness of \(25.70\) magnitudes/arcsec\(^2\) (second column in edge_cat.fits) on this single exposure SDSS image! This is very similar to the surface brightness limit measured in Image surface brightness limit (which is a big achievement!). But another point in the result above is very interesting: the total S/N of the edge is \(55.24\) with a total edge magnitude55 of 15.90!!! This is very large for such a faint signal (recall that the mean S/N per pixel was 0.32) and shows a very important point in the study of galaxies: While the per-pixel signal in their outer edges may be very faint (and invisible to the eye in noise), a lot of signal hides deeply buried in the noise.

In interpreting this value, you should just keep in mind that NoiseChisel works based on the contiguity of signal in the pixels. Therefore the larger the object, the deeper NoiseChisel can carve it out of the noise (for the same outer surface brightness). In other words, this reported depth is the depth we have reached for this object in this dataset, processed with this particular NoiseChisel configuration. If the M51 group in this image were larger/smaller than this (the field of view were smaller/larger), or if the image were from a different instrument, or if we had used a different configuration, we would go deeper/shallower.


2.2.6 Extract clumps and objects (Segmentation)

In NoiseChisel optimization we found a good detection map over the image, so pixels harboring signal have been differentiated from those that do not. For noise-related measurements like the surface brightness limit, this is fine. However, after finding the pixels with signal, you are most likely interested in knowing the sub-structure within them. For example, how many star forming regions (those bright dots along the spiral arms) of M51 are within this image? What are the colors of each of these star forming regions? In the outer most wings of M51, which pixels belong to background galaxies and foreground stars? And many more similar questions. To address these questions, you can use Segment to identify all the “clumps” and “objects” over the detection.

$ astsegment r_detected.fits --output=r_segmented.fits
$ ds9 -mecube r_segmented.fits -cmap sls -zoom to fit -scale limits 0 2

Open the output r_segmented.fits as a multi-extension data cube with the second command above and flip through the first and second extensions; zoom in to the spiral arms of M51 and see the detected clumps (all pixels with a value larger than 1 in the second extension). To optimize the parameters and make sure you have detected what you wanted, we recommend visually inspecting the detected clumps on the input image.

For visual inspection, you can make a simple shell script like below. It will first call MakeCatalog to estimate the positions of the clumps, then make an SAO DS9 region file and open ds9 with the image and region file. Recall that in a shell script, the numeric variables (like $1, $2, and $3 in the example below) represent the arguments given to the script. But when used in the AWK arguments, they refer to column numbers.

To create the shell script, using your favorite text editor, put the contents below into a file called check-clumps.sh. Recall that everything after a # is just comments to help you understand the command (so read them!). Also note that if you are copying from the PDF version of this book, fix the single quotes in the AWK command.

#! /bin/bash
set -e     # Stop execution when there is an error.
set -u     # Stop execution when a variable is not initialized.

# Run MakeCatalog to write the coordinates into a FITS table.
# Default output is `$1_cat.fits'.
astmkcatalog $1.fits --clumpscat --ids --ra --dec

# Use Gnuastro's Table and astscript-ds9-region to build the DS9
# region file (a circle of radius 1 arcsecond on each point).
asttable $1"_cat.fits" -hCLUMPS -cRA,DEC \
         | astscript-ds9-region -c1,2 --mode=wcs --radius=1 \
                                --output=$1.reg

# Show the image (with the requested color scale) and the region file.
ds9 -geometry 1800x3000 -mecube $1.fits -zoom to fit                   \
    -scale limits $2 $3 -regions load all $1.reg

# Clean up (delete intermediate files).
rm $1"_cat.fits" $1.reg

Finally, you just have to activate the script’s executable flag with the command below. This will enable you to directly/easily call the script as a command.

$ chmod +x check-clumps.sh

This script does not expect the .fits suffix of the input’s file name as the first argument. This is because the script produces intermediate files (a catalog and a DS9 region file, which are later deleted), and we do not want multiple instances of the script (running on different files in the same directory) to collide (read/write the same intermediate files). Therefore, we have added suffixes to the input’s name to identify the intermediate files. Note how all the $1 instances in the commands (not within the AWK command56) are followed by a suffix. If you want to keep the intermediate files, put a # at the start of the last line.

The few, but high-valued, bright pixels in the central parts of the galaxies can hinder easy visual inspection of the fainter parts of the image. With the second and third arguments to this script, you can set the numerical values of the color map (the first of the two is the minimum/black, the second is the maximum/white). You can call this script with any output of Segment (when --rawoutput is not used) with a command like this:

$ ./check-clumps.sh r_segmented -0.1 2

Go ahead and run this command. You will see the intermediate processing being done and finally it opens SAO DS9 for you with the regions superimposed on all the extensions of Segment’s output. The script will only finish (and give you control of the command-line) when you close DS9. If you need access to the command-line before closing DS9, add a & after the end of the command above.
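
For example:

$ ./check-clumps.sh r_segmented -0.1 2 &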

While DS9 is open, slide the dynamic range (values for black and white, or minimum/maximum values in different color schemes) and zoom into various regions of the M51 group to see if you are satisfied with the detected clumps. Do not forget that through the “Cube” window that is opened along with DS9, you can flip through the extensions and see the actual clumps also. The questions you should be asking yourself are these: 1) Which real clumps (as you visually feel) have been missed? In other words, is the completeness good? 2) Are there any clumps which you feel are false? In other words, is the purity good?

Note that completeness and purity are not independent of each other, they are anti-correlated: the higher your purity, the lower your completeness and vice-versa. You can see this by playing with the purity level using the --snquant option. Run Segment as shown above again with -P and see its default value. Then increase/decrease it for higher/lower purity and check the result as before. You will see that if you want the best purity, you have to sacrifice completeness and vice versa.
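
For example, you can check the current value and then re-run Segment with a stricter purity threshold like below (the value 0.99 is only for illustration; experiment with other values and inspect each result with the check-clumps.sh script as before):

$ astsegment r_detected.fits -P | grep snquant
$ astsegment r_detected.fits --snquant=0.99 \
             --output=r_segmented-pure.fits
$ ./check-clumps.sh r_segmented-pure -0.1 2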

One interesting region to inspect in this image is the many bright peaks around the central parts of M51. Zoom into that region and inspect how many of them have actually been detected as true clumps. Do you have a good balance between completeness and purity? Also look out far into the wings of the group and inspect the completeness and purity there.

An easier way to inspect completeness (and only completeness) is to mask all the pixels detected as clumps and visually inspect the rest of the pixels. You can do this using Arithmetic in a command like below. For easy reading of the command, we will define shell variables for the input image and the clumps extension, and save the output as clumps-masked.fits.

$ in="r_segmented.fits -hINPUT-NO-SKY"
$ clumps="r_segmented.fits -hCLUMPS"
$ astarithmetic $in $clumps 0 gt nan where -oclumps-masked.fits

Inspecting clumps-masked.fits, you can see some very diffuse peaks that have been missed, especially as you go farther away from the group center and into the diffuse wings. This is due to the fact that with this configuration, we have focused more on the sharper clumps. To put the focus more on diffuse clumps, you can use a wider convolution kernel. Using a larger kernel can also help in detecting the existing clumps to fainter levels (thus better separating them from the surrounding diffuse signal).
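
For example, a wider Gaussian kernel can be built with MakeProfiles and given to Segment like below (the FWHM of 3 pixels is only an illustrative value; experiment with it for your own data):

$ astmkprof --kernel=gaussian,3,5 --oversample=1 -okernel-wide.fits
$ astsegment r_detected.fits --kernel=kernel-wide.fits \
             --output=r_segmented-wide.fits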

Any kernel can easily be built using the --kernel option in MakeProfiles, as shown above. But note that a larger kernel is also going to wash out many of the sharp/small clumps close to the center of M51 and also some smaller peaks on the wings. Please continue playing with Segment’s configuration to obtain a more complete result (while keeping reasonable purity). We will finish the discussion on finding true clumps at this point.

The properties of the clumps within M51, or of the background objects, can then easily be measured using MakeCatalog. To measure the properties of the background objects (detected as clumps over the diffuse region), you should not mask the diffuse region. When measuring clump properties with MakeCatalog and the --clumpscat option, the ambient flux (from the diffuse region) is calculated and subtracted. If the diffuse region is masked, its effect on the clump brightness cannot be calculated and subtracted.
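
For example, a minimal catalog of all the clumps (including those over the diffuse region) can be made like below; the columns here are only suggestions, see the MakeCatalog documentation for the full list of measurements:

$ astmkcatalog r_segmented.fits --clumpscat --ids --ra --dec --area \
               --output=r_clumps-cat.fits
$ asttable r_clumps-cat.fits -hCLUMPS --head=5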

To keep this tutorial short, we will stop here. See Segmentation and making a catalog and Segment for more on using Segment, producing catalogs with MakeCatalog and using those catalogs.


2.3 Building the extended PSF

Deriving the extended PSF of an image is very important in many aspects of the analysis of the objects within it. Gnuastro has a set of installed scripts, designed to simplify the process following the recipe of Infante-Sainz et al. 2020; for more, see PSF construction and subtraction. An overview of the process is given in Overview of the PSF scripts.


2.3.1 Preparing input for extended PSF

We will use an image of the M51 galaxy group in the r (SDSS) band of the Javalambre Photometric Local Universe Survey (J-PLUS) to extract its extended PSF. For more information on J-PLUS, and its unique features visit: http://www.j-plus.es.

First, let’s download the image from the J-PLUS web page using wget. But to have a generalizable and easy-to-read command, we will define some base variables first. After the download is complete, open the image with SAO DS9 (or any other FITS viewer you prefer!) to get a feeling of the data (and of course, enjoy the beauty of M51 in such a wide field of view):

$ urlend="jplus-dr2/get_fits?id=67510"
$ urlbase="http://archive.cefca.es/catalogues/vo/siap/"
$ mkdir jplus-dr2
$ wget $urlbase$urlend -O jplus-dr2/67510.fits.fz
$ astscript-fits-view jplus-dr2/67510.fits.fz

After enjoying the large field of view, have a closer look at the edges of the image. Please zoom in to the corners. You will see that on the edges, the pixel values are either zero or significantly different from the main body of the image. This is due to the dithering pattern that was used to make this image and happens in all imaging surveys. To avoid the potential problems that these regions may cause, we will first crop out the main body of the image with the command below. To keep the top-level directory clean, let’s also put the crop in a directory called flat.

$ mkdir flat
$ astcrop jplus-dr2/67510.fits.fz --section=225:9275,150:9350 \
          --mode=img -oflat/67510.fits
$ astscript-fits-view flat/67510.fits

Please zoom into the edges again; you will see that they now have the same noise-level as the rest of the image (the problematic parts are now gone).


2.3.2 Saturated pixels and Segment’s clumps

A constant-depth (flat) image was created in the previous section (Preparing input for extended PSF). As explained in Overview of the PSF scripts, an important step when building the PSF is to mask other sources in the image. Therefore, before going onto selecting stars, let’s detect all significant signal, and identify the clumps of background objects over the wings of the extended PSF.

There is a problem however: the saturated pixels of the bright stars are going to cause problems in the segmentation phase. To see this problem, let’s make a \(1000\times1000\) crop around a bright star to speed up the test (and its solution). Afterwards we will apply the solution to the whole image.

$ astcrop flat/67510.fits --mode=wcs --widthinpix --width=1000 \
          --center=203.3916736,46.7968652 --output=saturated.fits
$ astnoisechisel saturated.fits --output=sat-nc.fits
$ astsegment sat-nc.fits --output=sat-seg.fits
$ astscript-fits-view sat-seg.fits

Have a look at the CLUMPS extension. You will see that instead of a single clump at the center of the bright star, we have many clumps! This has happened because of the saturated pixels! When saturation occurs, the sharp peak of the profile is lost (like cutting off the tip of a mountain to build a telescope!) and all saturated pixels get a noisy value close to the saturation level. To see this saturation noise run the last command again and in SAO DS9, set the “Scale” to “min max” and zoom into the center. You will see the noisy saturation pixels at the center of the star in red.

This noise-at-the-peak disrupts Segment’s assumption of expanding clumps from a local maximum: each noisy peak is treated as a separate local maximum and thus a separate clump. For more on how Segment defines clumps, see Section 3.2.1 and Figure 8 of Akhlaghi and Ichikawa 2015. To have the center identified as a single clump, we should mask these saturated pixels in a way that suits Segment’s non-parametric methodology.

First we need to find the saturation level! The saturation level is usually fixed for any survey or input data that you receive from a certain database, so you will usually have to do this only once (the first time you get data from that database). Let’s make a smaller crop of \(50\times50\) pixels around the star with the first command below. With the next command, please look at the crop with DS9 to visually understand the problem. You will see the saturated pixels as the noisy red pixels in the center of the image. A non-saturated star will have a single pixel as the maximum and will not have such a large area covered by a noisy constant value (find a few stars in the image and see for yourself). Visual and qualitative inspection of the process is very important for understanding the solution.

$ astcrop saturated.fits --mode=wcs --widthinpix --width=50 \
          --center=203.3916736,46.7968652 --output=sat-center.fits
$ astscript-fits-view sat-center.fits --ds9scale=minmax

To quantitatively identify the saturation level in this image, let’s have a look at the distribution of pixels with a value larger than 100 (above the noise level):

$ aststatistics sat-center.fits --greaterequal=100
Histogram:
 |*
 |*
 |*
 |*
 |*                                                              *
 |**                                                             *
 |***                                                           **
 |****                                                          **
 |******                                                        ****
 |********** *    *   *                                        ******
 |************************* ************ * ***  ******* *** ************
 |----------------------------------------------------------------------

The peak you see at the right end (larger values) of the histogram shows the saturated pixels (a constant level, with some scatter due to the large Poisson noise). If there were no saturation, the number of pixels should decrease with increasing value, until reaching the maximum of the profile in a single pixel. But that is not the case here. Please try this experiment on a non-saturated (fainter) star to see what we mean.

If you still have not experimented on a non-saturated star, please stop reading this tutorial! Please open flat/67510.fits in DS9, select a fainter/smaller star and repeat the last three commands (with a different center). After you have confirmed the point above (visually, and with the histogram), please continue with the rest of this tutorial.
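
For instance, after selecting a fainter star in DS9, the same three commands would look like this (the coordinates below are placeholders that you must replace with those of your selected star):

## Placeholders: set these from the star you selected in DS9.
$ ra=XXX.XXXXXXX
$ dec=YY.YYYYYYY
$ astcrop flat/67510.fits --mode=wcs --widthinpix --width=50 \
          --center=$ra,$dec --output=faint-center.fits
$ astscript-fits-view faint-center.fits --ds9scale=minmax
$ aststatistics faint-center.fits --greaterequal=100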

Finding the saturation level is easy with Statistics (by using the --lessthan option until the histogram becomes as expected: only decreasing). First, let’s try --lessthan=3000:

$ aststatistics sat-center.fits --greaterequal=100 --lessthan=3000
-------
Histogram:
 |*
 |*
 |*
 |*
 |*
 |**
 |***                                                                  *
 |****                                                                 *
 |*******                                                             **
 |*********** * *   *   *   *                            *  *       ****
 |************************* *  ***** *******  *****  ** ***** * ********
 |----------------------------------------------------------------------

We still see an increase in the histogram around 3000. Let’s try a threshold of 2500:

$ aststatistics sat-center.fits --greaterequal=100 --lessthan=2500
-------
Histogram:
 |*
 |*
 |**
 |**
 |**
 |**
 |****
 |*****
 |*********
 |*************  *   *  *   *
 |*********************************   ** ** ** *** **  * ****   ** *****
 |----------------------------------------------------------------------

The peak at the large end of the histogram has gone! But let’s have a closer look at the values (the resolution of an ASCII histogram is limited!). To do this, we will ask Statistics to save the histogram into a table with the --histogram option, then look at the last 20 rows:

$ aststatistics sat-center.fits --greaterequal=100 --lessthan=2500 \
                --histogram --output=sat-center-hist.fits
$ asttable sat-center-hist.fits --tail=20
2021.1849112701    1
2045.0495397186    0
2068.9141681671    1
2092.7787966156    1
2116.6434250641    0
2140.5080535126    0
2164.3726819611    0
2188.2373104095    0
2212.101938858     1
2235.9665673065    1
2259.831195755     2
2283.6958242035    0
2307.560452652     0
2331.4250811005    1
2355.289709549     1
2379.1543379974    1
2403.0189664459    2
2426.8835948944    1
2450.7482233429    2
2474.6128517914    2

Since the number of pixels at the extreme end is increasing (from 1 to 2), a value of 2500 is still above the saturation level. A more reasonable saturation level for this image would be 2200. As an exercise, you can try automating this selection with AWK.
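
As a minimal starting point for that exercise (only a rough sketch, not a complete solution), the command below lists the bins in the bright tail whose count is larger than that of the preceding bin, so you can see where the counts start rising again:

$ asttable sat-center-hist.fits --tail=20 \
           | awk 'NR>1 && $2>prev {print $1, $2} {prev=$2}'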

Therefore, we can set the saturation level in this image to be 2200. Let’s mask all such pixels with the command below:

$ astarithmetic saturated.fits set-i i i 2200 gt nan where \
                --output=sat-masked.fits
$ astscript-fits-view sat-masked.fits --ds9scale=minmax

Have a look at the peaks of several bright stars, not just the very bright central star. Zoom into each of the peaks you see. Besides the very bright central one that we were looking at closely until now, only one other star is saturated (its center is NaN, or Not-a-Number). Try to find it.

But we are not done yet! Please zoom-in to that central bright star and have another look at the edges of the vertical “bleeding” saturated pixels: there are strong positive/negative values touching them (almost like “waves”). These will also cause problems and have to be masked! So with a small addition to the previous command, let’s dilate the saturated regions (with 2-connectivity, or 8-connected neighbors) four times and have another look:

$ astarithmetic saturated.fits set-i i i 2200 gt \
                2 dilate 2 dilate 2 dilate 2 dilate \
                nan where --output=sat-masked.fits
$ astscript-fits-view sat-masked.fits --ds9scale=minmax

Now that saturated pixels (and their problematic neighbors) have been masked, we can convolve the image (recall that Segment will use the convolved image for identifying clumps) with the command below. However, we will use the Spatial Domain convolution which can account for blank pixels (for more on the pros and cons of spatial and frequency domain convolution, see Spatial vs. Frequency domain). We will also create a Gaussian kernel with \(\rm{FWHM}=2\) pixels, truncated at \(5\times\rm{FWHM}\).

$ astmkprof --kernel=gaussian,2,5 --oversample=1 -okernel.fits
$ astconvolve sat-masked.fits --kernel=kernel.fits --domain=spatial \
              --output=sat-masked-conv.fits
$ astscript-fits-view sat-masked-conv.fits --ds9scale=minmax

Please zoom-in to the star and look closely to see how after spatial-domain convolution, the problematic pixels are still NaN. But Segment requires the profile to start with a maximum value and decrease. So before feeding into Segment, let’s fill the blank values with the maximum value of the neighboring pixels in both the input and convolved images (see Interpolation operators):

$ astarithmetic sat-masked.fits 2 interpolate-maxofregion \
                --output=sat-fill.fits
$ astarithmetic sat-masked-conv.fits 2 interpolate-maxofregion \
                --output=sat-fill-conv.fits
$ astscript-fits-view sat-fill* --ds9scale=minmax

Have a closer look at the opened images. Please zoom-in (you will notice that they are already matched and locked, so they will both zoom-in together). Go to the centers of the saturated stars and confirm how they are filled with the largest non-blank pixel. We can now feed this image to NoiseChisel and Segment as the convolved image:

$ astnoisechisel sat-fill.fits --convolved=sat-fill-conv.fits \
                 --output=sat-nc.fits
$ astsegment sat-nc.fits --convolved=sat-fill-conv.fits \
             --output=sat-seg.fits --rawoutput
$ ds9 -mecube sat-seg.fits -zoom to fit -scale limits -1 1

See the CLUMPS extension. Do you see how the whole center of the star has indeed been identified as a single clump? We thus achieved our aim and did not let the saturated pixels harm the identification of the center!

If the issue was only clumps (like in a normal deep image processing), this was the end of Segment’s special considerations. However, in the scenario here, with the very extended wings of the bright stars, it usually happens that background objects become “clumps” in the outskirts and will rip the bright star outskirts into separate “objects”. In the next section (One object for the whole detection), we will describe how you can modify Segment to avoid this issue.


2.3.3 One object for the whole detection

In Saturated pixels and Segment’s clumps, we described how you can run Segment such that saturated pixels do not interfere with its clumps. However, due to the very extended wings of the PSF, the default definition of “objects” should also be modified for the scenario here. To better see the problem, let’s now inspect the OBJECTS extension, focusing on those objects with a label between 50 to 150 (which include the main star):

$ astscript-fits-view sat-seg.fits -hOBJECTS --ds9scale="limits 50 150"

We can see that the detection corresponding to the star has been broken into different objects. This is not a good object segmentation image for our scenario here, since those objects in the outer wings of the bright star’s detection harbor a lot of the extended PSF. We want to keep them with the same “object” label as the star (we only need to mask the “clumps” of the background sources). To do this, we will make the following changes to Segment’s options (see Segmentation options for more on these options):

  • Since we want the extended diffuse flux of the PSF to be taken as a single object, we want all the grown clumps to touch. Therefore, it is necessary to decrease --gthresh to very low values, like \(-10\). Recall that its value is in units of the input standard deviation, so --gthresh=-10 corresponds to \(-10\sigma\). The default value is not for such extended sources that dominate all background sources.
  • Since we want all connected grown clumps to be counted as a single object in any case, we will set --objbordersn=0 (its smallest possible value).

Let’s make these changes and check if the star has been kept as a single object in the OBJECTS extension or not:

$ astsegment sat-nc.fits --convolved=sat-fill-conv.fits \
             --gthresh=-10 --objbordersn=0 \
             --output=sat-seg.fits --rawoutput
$ astscript-fits-view sat-seg.fits -hOBJECTS --ds9scale="limits 50 150"

Now we can extend these same steps to the whole image. To detect signal, we can run NoiseChisel using the command below. We have modified the default values of two options; the reasons for these changes are described below. See Detecting large extended targets for more on optimizing NoiseChisel.

  • Since the image is so large, we have increased --outliernumngb to get better outlier statistics on the tiles. The default value is primarily for small images, so this is usually the first thing you should do when running NoiseChisel on a real/large image.
  • Since the image is not too deep (made from few exposures), it does not have strong correlated noise, so we will decrease --detgrowquant and increase --detgrowmaxholesize to better extract signal.

Furthermore, since both NoiseChisel and Segment need a convolved image, we will do the convolution before and feed it to both (to save running time). But in the first command below, let’s delete all the temporary files we made above.

Since the image is large (+300 MB), to avoid wasting storage, any temporary file that is no longer necessary for later processing is deleted after it is used. You can visually check each of them with DS9 before deleting them (or not delete them at all!). Generally, within a pipeline it is best to remove such large temporary files, because space runs out much faster than you think (for example, once you get good results and want to use more fields).

$ rm *.fits
$ mkdir label
$ astmkprof --kernel=gaussian,2,5 --oversample=1 \
            -olabel/kernel.fits
$ astarithmetic flat/67510.fits set-i i i 2200 gt \
                2 dilate 2 dilate 2 dilate 2 dilate nan where \
                --output=label/67510-masked-sat.fits
$ astconvolve label/67510-masked-sat.fits --kernel=label/kernel.fits \
              --domain=spatial --output=label/67510-masked-conv.fits
$ rm label/kernel.fits
$ astarithmetic label/67510-masked-sat.fits 2 interpolate-maxofregion \
                --output=label/67510-fill.fits
$ astarithmetic label/67510-masked-conv.fits 2 interpolate-maxofregion \
                --output=label/67510-fill-conv.fits
$ rm label/67510-masked-conv.fits
$ astnoisechisel label/67510-fill.fits --outliernumngb=100 \
                 --detgrowquant=0.8 --detgrowmaxholesize=100000 \
                 --convolved=label/67510-fill-conv.fits \
                 --output=label/67510-nc.fits
$ rm label/67510-fill.fits
$ astsegment label/67510-nc.fits --output=label/67510-seg-raw.fits \
             --convolved=label/67510-fill-conv.fits --rawoutput \
             --gthresh=-10 --objbordersn=0
$ rm label/67510-fill-conv.fits
$ astscript-fits-view label/67510-seg-raw.fits

We see that the saturated pixels have not caused any problem and the central clumps/objects of bright stars are now a single clump/object. We can now proceed to estimating the outer PSF. But before that, let’s make a “standard” segment output: one that can safely be fed into MakeCatalog for measurements and can contain all necessary outputs of this whole process in a single file (as multiple extensions).

The main problem is again the saturated pixels: we interpolated them to be the maximum of their nearby pixels. But this will cause problems in any measurement that is done over those regions. To let MakeCatalog know that those pixels should not be used, the first extension of the file given to MakeCatalog should have blank values on those pixels. We will do this with the commands below:

## First HDU of Segment (Sky-subtracted input)
$ astarithmetic label/67510-nc.fits -hINPUT-NO-SKY \
                label/67510-masked-sat.fits isblank nan where \
                --output=label/67510-seg.fits
$ astfits label/67510-seg.fits --update=EXTNAME,INPUT-NO-SKY

## Second and third HDUs: CLUMPS and OBJECTS
$ astfits label/67510-seg-raw.fits --copy=CLUMPS --copy=OBJECTS \
          --output=label/67510-seg.fits

## Fourth HDU: Sky standard deviation (from NoiseChisel):
$ astfits label/67510-nc.fits --copy=SKY_STD \
          --output=label/67510-seg.fits

## Clean up all the un-necessary files:
$ rm label/67510-masked-sat.fits label/67510-nc.fits \
     label/67510-seg-raw.fits

You can now simply run MakeCatalog on this image and be sure that saturated pixels will not affect the measurements. As one example, you can use MakeCatalog to find the clumps containing saturated pixels: recall that the --area column only calculates the area of non-blank pixels, while --geo-area calculates the area of the label (independent of their blank-ness in the values image):

$ astmkcatalog label/67510-seg.fits --ids --ra --dec --area \
               --geo-area --clumpscat --output=cat.fits

The information of the clumps that have been affected by saturation can easily be found by selecting those with a differing value in the AREA and AREA_FULL columns:

## With AWK (second command, counts the number of rows)
$ asttable cat.fits -hCLUMPS | awk '$5!=$6'
$ asttable cat.fits -hCLUMPS | awk '$5!=$6' | wc -l

## Using Table arithmetic (compared to AWK, you can use column
## names, save as FITS, and be faster):
$ asttable cat.fits -hCLUMPS -cRA,DEC --noblankend=3 \
         -c'arith AREA AREA AREA_FULL eq nan where'

## Remove the table (which was just for a demo)
$ rm cat.fits

We are now ready to start building the outer parts of the PSF in Building outer part of PSF.


2.3.4 Building outer part of PSF

In Saturated pixels and Segment’s clumps, and One object for the whole detection, we described how to create a Segment clump and object map, while accounting for saturated stars and not having over-fragmentation of objects in the outskirts of stars. We are now ready to start building the extended PSF.

First we will build the outer parts of the PSF, so we want the brightest stars. You will see that we have several bright stars in this very large field of view, but we do not yet have a feeling of how many there are, and at what magnitudes. So let’s use Gnuastro’s Query program to find the magnitudes of the brightest stars (those brighter than g-magnitude 10 in Gaia data release 3, or DR3). For more on Query, see Query.

$ astquery gaia --dataset=dr3 --overlapwith=flat/67510.fits \
           --range=phot_g_mean_mag,-inf,10 \
           --output=flat/67510-bright.fits

Now, we can easily visualize the magnitudes and positions of these stars using astscript-ds9-region and the command below (for more on this script, see SAO DS9 region files from table).

$ astscript-ds9-region flat/67510-bright.fits -cra,dec \
           --namecol=phot_g_mean_mag \
           --command="ds9 flat/67510.fits -zoom to fit -zscale"

You can see that we have several stars between magnitudes 6 to 10. Let’s use astscript-psf-select-stars in the command below to select the relevant stars in the image (the brightest; with a magnitude between 6 and 10). The advantage of using this script (instead of a simple --range in Table) is that it will also check distances to nearby stars and reject those that are too close (and therefore not good for constructing the PSF). Since we have very bright stars in this very wide-field image, we will also increase the minimum distance to nearby neighbors with brighter or similar magnitudes (the default value is 1 arcmin). To do this, we will set --mindistdeg=0.02, which corresponds to 1.2 arcmin. The details of the options for this script are discussed in Invoking astscript-psf-select-stars.

$ mkdir outer
$ astscript-psf-select-stars flat/67510.fits \
           --magnituderange=6,10 --mindistdeg=0.02 \
           --output=outer/67510-6-10.fits

Let’s have a look at the selected stars in the image (it is very important to visually check every step when you are first discovering a new dataset).

$ astscript-ds9-region outer/67510-6-10.fits -cra,dec \
           --namecol=phot_g_mean_mag \
           --command="ds9 flat/67510.fits -zoom to fit -zscale"

Now that the catalog of good stars is ready, it is time to construct the individual stamps from the catalog above. To create stamps, first, we need to crop a fixed-size box around each isolated star in the catalog. The contaminant objects in the crop should be masked and finally, the fluxes in these cropped images should be normalized. To do these, we will use astscript-psf-stamp (for more on this script see Invoking astscript-psf-stamp).

One of the most important parameters for this script is the normalization radii, --normradii. This parameter defines a ring for the flux normalization of each star stamp. The normalization of the flux is necessary because each star has a different brightness, so normalization is crucial for having all the stamps at the same flux level in the same region. Otherwise, the final stack of the different stamps would make no sense. Depending on the PSF shape, internal reflections, ghosts, saturated pixels, and other systematics, it is necessary to choose --normradii appropriately.

The selection of the normalization radii is something that requires a good understanding of the data. To do that, let’s use two useful parameters that will help us check the data: --tmpdir and --keeptmp:

  • With --tmpdir=checking-normradii all temporary files, including the radial profiles, will be saved in that directory (instead of an internally-created one).
  • With --keeptmp we will not remove the temporary files, so it is possible to have a look at them (by default the temporary directory gets deleted at the end). Note that it is necessary to specify --normradii even if we do not yet know the final values; otherwise the script will not generate the radial profile.

As a consequence, in this first check we set the normalization radii equal to the size of the stamps (--normradii=500,510), so the script will generate the radial profile of the entire stamp. We also use the --nocentering option to disable sub-pixel warping in this phase (it is only relevant for the central part of the PSF). Furthermore, since there are several stars, we iterate over each row of the catalog using a while loop.

$ counter=1
$ mkdir finding-normradii
$ asttable outer/67510-6-10.fits \
           | while read -r ra dec mag; do
               astscript-psf-stamp label/67510-seg.fits \
                    --mode=wcs \
                    --nocentering \
                    --center=$ra,$dec \
                    --normradii=500,510 \
                    --widthinpix=1000,1000 \
                    --segment=label/67510-seg.fits \
                    --output=finding-normradii/$counter.fits \
                    --tmpdir=finding-normradii --keeptmp; \
               counter=$((counter+1)); \
             done

First let’s have a look at all the masked postage stamps of the cropped stars. Once they all open, feel free to zoom in; they are all matched and locked. It is always good to check the different stamps to ensure their quality and to spot two-dimensional features that are difficult to detect from the radial profiles (such as ghosts and internal reflections).

$ astscript-fits-view finding-normradii/cropped-masked*.fits

If everything looks good in the image, let’s open all the radial profiles and visually check those with the command below. Note that astscript-fits-view calls the topcat graphic user interface (GUI) program to visually inspect (plot) tables. If you do not already have it, see TOPCAT.

$ astscript-fits-view finding-normradii/rprofile*.fits

After some study of this data, we could say that a good normalization ring is the set of pixels between R=20 and R=30 pixels. Such a ring ensures a high number of pixels, so the estimation of the flux normalization will be robust. Also, at such a distance from the center the signal-to-noise ratio is high and there are no obvious features that can affect the normalization. Note that the profiles are different because we are considering a wide range of magnitudes, so the fainter stars are much noisier. However, in this tutorial we will keep these stars in order to have a higher number of stars for the outer part. In a real scenario, we should look for stars with a much more similar brightness (a smaller range of magnitudes) to avoid losing signal-to-noise ratio through the inclusion of fainter stars.

$ rm -r finding-normradii
$ counter=1
$ mkdir outer/stamps
$ asttable outer/67510-6-10.fits \
           | while read -r ra dec mag; do
               astscript-psf-stamp label/67510-seg.fits \
                    --mode=wcs \
                    --nocentering \
                    --center=$ra,$dec \
                    --normradii=20,30 \
                    --widthinpix=1000,1000 \
                    --segment=label/67510-seg.fits \
                    --output=outer/stamps/67510-$counter.fits; \
               counter=$((counter+1)); \
             done

After the stamps are created, we need to stack them together with a simple Arithmetic command (see Stacking operators). The stack is done using the sigma-clipped mean operator, which preserves more of the signal while rejecting outliers (more than \(3\sigma\), with a tolerance of \(0.2\); for more on sigma-clipping see Sigma clipping). Just recall that we need to specify the number of inputs to the stacking operators, so we first read the list of images and count them in a shell variable before calling Arithmetic.

$ numimgs=$(echo outer/stamps/*.fits | wc -w)
$ astarithmetic outer/stamps/*.fits $numimgs 3 0.2 sigclip-mean \
                -g1 --output=outer/stack.fits --wcsfile=none

Did you notice the --wcsfile=none option above? With it, the stacked image no longer has any WCS information. This is natural, because the stacked image does not correspond to any specific region of the sky any more.
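
You can confirm this by looking for WCS keywords (for example, CRVAL1 and CRVAL2) in the stack’s header; the command below should print nothing:

$ astfits outer/stack.fits -h1 | grep CRVAL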

Let’s compare this stacked PSF with the images of the individual stars that were used to create it. You can clearly see that the number of masked pixels is significantly decreased and the PSF is much “cleaner”.

$ astscript-fits-view outer/stack.fits outer/stamps/*.fits

However, the saturation in the center still remains. Also, because we did not have too many images, some regions are still very noisy. If we had more bright stars in our selected magnitude range, we could have filled those remaining outer patches. In a large survey like J-PLUS (that we are using here), you can simply look into other fields that were observed soon before/after the image ID 67510 used here (to have a similar PSF) and get more stars from those images to add to these. In fact, the J-PLUS DR2 image ID of this field was intentionally kept in the file names to show how easy it is to use images from other fields and blend them all into the output PSF.


2.3.5 Inner part of the PSF

In Building outer part of PSF, we were able to create a stack of the outer-most behavior of the PSF in a J-PLUS survey image. But the central part that was affected by saturation and non-linearity still remains, and we therefore do not yet have a “complete” PSF! In this section, we will use the same steps as before to make stacks of more inner regions of the PSF, to ultimately unite them all into a single PSF in Uniting the different PSF components.

For the outer PSF, we selected stars in the magnitude range of 6 to 10. So let’s have a look and see how many stars we have in the magnitude range of 12 to 13 with a more relaxed condition on the minimum distance for neighbors, --mindistdeg=0.01 (36 arcsec, since these stars are fainter), and use the ds9 region script to visually inspect them:

$ mkdir inner
$ astscript-psf-select-stars flat/67510.fits \
           --magnituderange=12,13 --mindistdeg=0.01 \
           --output=inner/67510-12-13.fits

$ astscript-ds9-region inner/67510-12-13.fits -cra,dec \
           --namecol=phot_g_mean_mag \
           --command="ds9 flat/67510.fits -zoom to fit -zscale"

We have 41 stars, but if you zoom into their centers, you will see that they no longer have any major vertical bleeding saturation. Only the very central core of some of the stars is saturated. We can therefore use these stars to fill the strong bleeding footprints that were present in the outer stack of outer/stack.fits. Similar to before, let’s build ready-to-stack crops of these stars. To get a better feeling of the normalization radii, follow the same steps of Building outer part of PSF (setting --tmpdir and --keeptmp). In this case, since the stars are fainter, we can set a smaller size for the individual stamps, --widthinpix=500,500, to speed up the calculations:

$ counter=1
$ mkdir inner/stamps
$ asttable inner/67510-12-13.fits \
           | while read -r ra dec mag; do
               astscript-psf-stamp label/67510-seg.fits \
                    --mode=wcs \
                    --normradii=5,10 \
                    --center=$ra,$dec \
                    --widthinpix=500,500 \
                    --segment=label/67510-seg.fits \
                    --output=inner/stamps/67510-$counter.fits; \
               counter=$((counter+1)); \
             done

$ numimgs=$(echo inner/stamps/*.fits | wc -w)
$ astarithmetic inner/stamps/*.fits $numimgs 3 0.2 sigclip-mean \
                -g1 --output=inner/stack.fits --wcsfile=none
$ astscript-fits-view inner/stack.fits inner/stamps/*.fits

We are now ready to unite the two stacks we have constructed: the outer and the inner parts.


2.3.6 Uniting the different PSF components

In Building outer part of PSF we built the outer part of the extended PSF and the inner part was built in Inner part of the PSF. The outer part was built with very bright stars, and the inner part using fainter stars to not have saturation in the core of the PSF. The next step is to join these two parts in order to have a single PSF. First of all, let’s have a look at the two stacks and also to their radial profiles to have a good feeling of the task. Note that you will need to have TOPCAT to run the last command and plot the radial profile (see TOPCAT).

$ astscript-fits-view outer/stack.fits inner/stack.fits
$ astscript-radial-profile outer/stack.fits -o outer/profile.fits
$ astscript-radial-profile inner/stack.fits -o inner/profile.fits
$ astscript-fits-view outer/profile.fits inner/profile.fits

From the visual inspection of the images and the radial profiles, it is clear that we have saturation in the center for the outer part. Note that the absolute flux values of the PSFs are meaningless since they depend on the normalization radii we used to obtain them. The uniting step consists of scaling up (or down) the inner part of the PSF so it has the same flux at the junction radius, and then using that flux-scaled inner part to fill the center of the outer PSF. To get a feeling of the process, let’s first open the two radial profiles and find the factor manually:

  1. Run this command to open the two tables in TOPCAT:
    $ astscript-fits-view outer/profile.fits inner/profile.fits
    
  2. On the left side of the screen, under “Table List”, you will see the two imported tables. Click on the first one (profile of the outer part) so it is shown first.
  3. Under the “Graphics” menu item, click on “Plane plot”. A new window will open with the plot of the first two columns: RADIUS on the horizontal axis and MEAN on the vertical. The rest of the steps are done in this window.
  4. In the bottom settings, within the left panel, click on the “Axes” item. This will allow customization of the plot axes.
  5. In the bottom-right panel, click on the box in front of “Y Log” to make the vertical axis logarithmic-scaled.
  6. On the “Layers” menu, select “Add Position Control” to allow adding the profile of the inner region. After it, you will see that a new red-blue scatter plot icon opened on the bottom-left menu (with a title of <no table>).
  7. On the bottom-right panel, in the drop-down menu in front of Table:, select 2: profile.fits. Afterwards, you will see the radial profile of the inner stack as the newly added blue plot. Our goal here is to find the factor that is necessary to multiply with the inner profile so it matches the outer one.
  8. On the bottom-right panel, in front of Y:, you will see MEAN. Click in the white-space after it, and type this: *100. This will display the MEAN column of the inner profile, after multiplying it by 100. Afterwards, you will see that the inner profile (blue) matches more cleanly with the outer (red); especially in the smaller radii. At larger radii, it does not drop like the red plot. This is because of the extremely low signal-to-noise ratio at those regions in the fainter stars used to make this stack.
  9. Take your mouse cursor over the profile, in particular over the bump around a radius of 100 pixels. Scroll your mouse down-ward to zoom-in to the profile and up-ward to zoom-out. You can also click-and-hold any part of the profile and move your cursor (while still holding the mouse-button) to look at different parts of the profile. This is particularly helpful when you have zoomed-in to the profile.
  10. Zoom-in to the bump around a radius of 100 pixels until the horizontal axis range becomes around 50 to 130 pixels.
  11. You clearly see that the inner stack (blue) is much more noisy than the outer (red) stack. By “noisy”, we mean that the scatter of the points is much larger. If you further zoom-out, you will see that the shallow slope at the larger radii of the inner (blue) profile has also affected the height of this bump in the inner profile. This is a very important point: this clearly shows that the inner profile is too noisy at these radii.
  12. Click-and-hold your mouse to see the inner parts of the two profiles (in the range 0 to 80). You will see that for radii less than 40 pixels, the inner profile (blue) points lose their scatter (and thus have a good signal-to-noise ratio).
  13. Zoom-in to the plot and follow the profiles until smaller radii (for example, 10 pixels). You see that for each radius, the inner (blue) points are consistently above the outer (red) points. This shows that the \(\times100\) factor we selected above was too much.
  14. In the bottom-right panel, change the 100 to 80 and zoom-in to the same region. At each radius, the blue points are now below the red points, so the scale-factor 80 is not enough. So let’s increase it and try 90. After zooming-in, you will notice that in the inner-radii (less than 30 pixels), they are now very similar. The ultimate aim of the steps below is to find this factor automatically.
  15. But before continuing, let’s focus on another important point about the central regions: non-linearity and saturation. While you are zoomed-in (from the step above), follow (click-and-drag) the profile towards smaller radii. You will see that smaller than a radius of 10, they start to diverge. But this time, the outer (red) profile is getting a shallower slope and diverges significantly from about the radius of 8. We had masked all saturated pixels before, so this divergence for radii smaller than 10 shows the effect of the CCD’s non-linearity (where the number of electrons will not be linearly correlated with the number of incident photons). This is present in all CCDs and pixels beyond this level should not be used in measurements (or properly corrected).

The items above were only listed so you get a good mental/visual understanding of the logic behind the operation of the next script (and to learn how to tune its parameters where necessary): astscript-psf-scale-factor. This script is more general than this particular problem, but can be used for this special case also. Its job is to take a model of an object (PSF, or inner stack in this case) and the position of an instance of that model (a star, or the outer stack in this case) in a larger image.

Instead of dealing with radial profiles (that enforce a certain shape), this script will put the centers of the inner and outer PSFs over each other and divide the outer by the inner. Let’s have a look with the command below. Just note that we are running it with --keeptmp so the temporary directory with all the intermediate files remain for further clarification:

$ astscript-psf-scale-factor outer/stack.fits \
           --psf=inner/stack.fits --center=501,501 \
           --mode=img --normradii=10,15 --keeptmp
$ astscript-fits-view stack_psfmodelscalefactor/cropped-*.fits \
                      stack_psfmodelscalefactor/for-factor-*.fits

With the second command, you see the four steps of the process: the first two images show the cropped outer and inner stacks (cut to the same width). The third shows the radial position of each pixel (which is used to only keep the pixels within the desired radial range). The fourth shows the per-pixel division of the outer by the inner within the requested radii. The sigma-clipped median of these pixels is finally reported. Unlike the radial profile method (which averages over a circular/elliptical annulus for each radius), this method imposes no a-priori shape on the PSF. This makes it very useful for complex PSFs (like the case here).

To continue, let’s remove the temporary directory and re-run the script but with --quiet mode so we can put the output in a shell variable.

$ rm -r stack_psfmodelscalefactor
$ scale=$(astscript-psf-scale-factor outer/stack.fits \
                   --psf=inner/stack.fits --center=501,501 \
                   --mode=img --normradii=10,15 --quiet)
$ echo $scale

Now that we know the scaling factor, we are ready to unite the outer and the inner part of the PSF. To do that, we will use the script astscript-psf-unite with the command below (for more on this script, see Invoking astscript-psf-unite). The basic parameters are the inner part of the PSF (given to --inner), the inner part’s scale factor (--scale), and the junction radius (--radius). The inner part is first scaled, and all the pixels of the outer image within the given radius are replaced with the pixels of the inner image. Since the flux factor was computed for a ring of pixels between 10 and 15 pixels, let’s set the junction radius to be 12 pixels (roughly in between 10 and 15):

$ astscript-psf-unite outer/stack.fits \
           --inner=inner/stack.fits --radius=12 \
           --scale=$scale --output=psf.fits

Let’s have a look at the outer stack and the final PSF with the command below. Since we want several other DS9 settings to help you directly see the main point, we are using --ds9extra. After DS9 is opened, you can see that the center of the PSF has now been nicely filled. You can click on the “Edit” button and then the “Colorbar” and hold your cursor over the image and move it. You can see that besides filling the inner regions nicely, there is also no major discontinuity in the 2D image around the union radius of 12 pixels around the center.

$ astscript-fits-view outer/stack.fits psf.fits --ds9scale=minmax \
           --ds9extra="-scale limits 0 22000 -match scale" \
           --ds9extra="-lock scale yes -zoom 4 -scale log"

Nothing demonstrates the effect of a bad analysis better than actually seeing a bad result! So let’s choose a bad normalization radial range (50 to 60 pixels) and unite the inner and outer parts based on that. The last command will open the two PSFs together in DS9; you should be able to immediately see the discontinuity at the union radius.

$ scale=$(astscript-psf-scale-factor outer/stack.fits \
                   --psf=inner/stack.fits --center=501,501 \
                   --mode=img --normradii=50,60 --quiet)

$ astscript-psf-unite outer/stack.fits \
           --inner=inner/stack.fits --radius=55 \
           --scale=$scale --output=psf-bad.fits

$ astscript-fits-view psf-bad.fits psf.fits --ds9scale=minmax \
           --ds9extra="-scale limits 0 50 -match scale" \
           --ds9extra="-lock scale yes -zoom 4 -scale log"

As you see, the selection of the normalization radii and the unite radius are very important. The first time you are trying to build the PSF of a new dataset, it has to be explored with a visual inspection of the images and radial profiles. Once you have found a good normalization radius for a certain part of the PSF in a survey, you can generally use it comfortably without change. But for a new survey, or a different part of the PSF, be sure to repeat the visual checks above to choose the best radii. As a summary, a good junction radius is one that:

  • Is large enough to not let saturation and non-linearity (from the outer profile) into the inner region.
  • Is small enough to have a sufficiently high signal to noise ratio (from the inner profile) to avoid adding noise in the union radius.

Now that the complete PSF has been obtained, let’s remove that bad-looking PSF, and stick with the nice and clean PSF for the next step in Subtracting the PSF.

$ rm -rf psf-bad.fits

2.3.7 Subtracting the PSF

Previously (in Uniting the different PSF components) we constructed a full PSF, from the central pixel to a radius of 500 pixels. Now, let’s use the PSF to subtract the scattered light from each individual star in the image.

By construction, the pixel values of the PSF came from the normalization of the individual stamps (that were created for stars of different magnitudes). As a consequence, it is necessary to compute a scale factor to fit that PSF image to each star. This is done with the same astscript-psf-scale-factor command that we used previously in Uniting the different PSF components. The difference is that now we are not aiming to join two different PSF parts but looking for the necessary scale factor to match the star with the PSF. Afterwards, we will use astscript-psf-subtract for placing the PSF image at the desired coordinates within the same pixel grid as the image. Finally, once the stars have been modeled by the PSF, we will subtract it.

First, let’s start with a single star. Later, when the basic idea has been explained, we will generalize the method for any number of stars. With the following command we obtain the coordinates (RA and Dec) of the brightest star in the image (which is on the top edge of the image):

$ mkdir single-star
$ center=$(asttable flat/67510-bright.fits --sort phot_g_mean_mag \
                    --column=ra,dec --head 1 \
                    | awk '{printf "%s,%s", $1, $2}')
$ echo $center

With the center position of that star, let’s obtain the flux factor using the same normalization ring we used for the creation of the outer part of the PSF:

$ scale=$(astscript-psf-scale-factor label/67510-seg.fits \
                   --mode=wcs --quiet \
                   --psf=psf.fits \
                   --center=$center \
                   --normradii=10,15 \
                   --segment=label/67510-seg.fits)

Now we have all the information necessary to model the star using the PSF: the position on the sky and the flux factor. Let’s use this data with the script astscript-psf-subtract for modeling this star and have a look with DS9.

$ astscript-psf-subtract label/67510-seg.fits \
           --mode=wcs \
           --psf=psf.fits \
           --scale=$scale \
           --center=$center \
           --output=single-star/subtracted.fits

$ astscript-fits-view label/67510-seg.fits single-star/subtracted.fits \
           --ds9center=$center --ds9mode=wcs --ds9extra="-zoom 4"

You will notice that there is something wrong with this “subtraction”! The box of the extended PSF is clearly visible! The sky noise under the box is clearly larger than the rest of the noise in the image. Before reading on, please try to think about the cause of this yourself.

To understand the cause, let’s look at the scale factor, the number of stamps used to build the outer part (and its square root):

$ echo $scale
$ ls outer/stamps/*.fits | wc -l
$ ls outer/stamps/*.fits | wc -l | awk '{print sqrt($1)}'

You see that the scale is almost 19! As a result, the PSF has been multiplied by 19 before being subtracted. However, the outer part of the PSF was created with only a handful of star stamps. When you stack \(N\) images, the stack’s signal-to-noise ratio (S/N) improves by \(\sqrt{N}\). We had 8 images for the outer part, so the S/N has only improved by a factor of just under 3! When we multiply the final stacked PSF with 19, we are also scaling up the noise by that same factor (most importantly: in the outermost regions where there is almost no signal). So the stacked image’s noise-level is \(19/3=6.3\) times larger than the noise of the input image. This terrible noise-level is what you clearly see as the footprint of the PSF.

To confirm this, let’s use the commands below to subtract the faintest star of the bright-stars catalog (note the use of --tail when finding the central position). You will notice that the scale factor (\(\sim1.3\)) is now smaller than 3. So when we multiply the PSF with this factor, the PSF’s noise level is lower than that of our input image and we should not see any footprint like before. Note also that we are using a larger zoom factor, because this star is smaller in the image.

$ center=$(asttable flat/67510-bright.fits --sort phot_g_mean_mag \
                    --column=ra,dec --tail 1 \
                    | awk '{printf "%s,%s", $1, $2}')

$ scale=$(astscript-psf-scale-factor label/67510-seg.fits \
                   --mode=wcs --quiet \
                   --psf=psf.fits \
                   --center=$center \
                   --normradii=10,15 \
                   --segment=label/67510-seg.fits)
$ echo $scale

$ astscript-psf-subtract label/67510-seg.fits \
           --mode=wcs \
           --psf=psf.fits \
           --scale=$scale \
           --center=$center \
           --output=single-star/subtracted.fits

$ astscript-fits-view label/67510-seg.fits single-star/subtracted.fits \
           --ds9center=$center --ds9mode=wcs --ds9extra="-zoom 10"

In a large survey like J-PLUS, it is easy to use more and more bright stars from different pointings (ideally with a similar FWHM and similar telescope properties) to improve the S/N of the PSF. As explained before, we designed the output file names of this tutorial to include 67510 (this image’s pointing label in J-PLUS) where necessary, so you can see how easy it is to add more pointings when creating the PSF.

Let’s now consider more than a single star. We should keep two things in mind:

  • The brightest (subtract-able, see the point below) star should be the first star to be subtracted. This is because of its extended wings which may affect the scale factor of nearby stars. So we should sort the catalog by magnitude and come down from the brightest.
  • We should only subtract stars where the scale factor is less than the S/N of the PSF (in relation to the data).

Since it can get a little complex, it is easier to implement this step as a script (that is heavily commented for you to easily understand every step; especially if you put it in a good text editor with color-coding!). You will notice that the script also creates a .log file, which shows which star was subtracted and which one was not (this is important, and will be used below!).

#!/bin/bash

# Abort the script on first error.
set -e

# ID of image to subtract stars from.
imageid=67510

# Get S/N level of the final PSF in relation to the actual data:
snlevel=$(ls outer/stamps/*.fits | wc -l | awk '{print sqrt($1)}')

# Put a copy of the image we want to subtract the PSF from in the
# final file (this will be over-written after each subtraction).
subtracted=subtracted/$imageid.fits
cp label/$imageid-seg.fits $subtracted

# Name of log-file to keep status of the subtraction of each star.
logname=subtracted/$imageid.log
echo "# Column 1: RA   [deg, f64] Right ascension of star." >  $logname
echo "# Column 2: Dec  [deg, f64] Declination of star."     >> $logname
echo "# Column 3: Stat [deg, f64] Status (1: subtracted)"   >> $logname

# Go over each item in the bright star catalog:
asttable flat/67510-bright.fits -cra,dec --sort phot_g_mean_mag  \
    | while read -r ra dec; do

    # Put a comma between the RA/Dec to pass to options.
    center=$(echo $ra $dec | awk '{printf "%s,%s", $1, $2}')

    # Calculate the scale value
    scale=$(astscript-psf-scale-factor label/67510-seg.fits \
                   --mode=wcs --quiet \
                   --psf=psf.fits \
                   --center=$center \
                   --normradii=10,15 \
                   --segment=label/67510-seg.fits)

    # Subtract this star if the scale factor is less than the S/N
    # level calculated above.
    check=$(echo $snlevel $scale \
                | awk '{if($1>$2) c="good"; else c="bad"; print c}')
    if [ $check = good ]; then

        # A temporary file to subtract this star.
        subtmp=subtracted/$imageid-tmp.fits

        # Subtract this star from the image where all previous stars
        # were subtracted.
        astscript-psf-subtract $subtracted \
                 --mode=wcs \
                 --psf=psf.fits \
                 --scale=$scale \
                 --center=$center \
                 --output=$subtmp

        # Rename the temporary subtracted file to the final one:
        mv $subtmp $subtracted

        # Keep the status for this star.
        status=1
    else
        # Let the user know this star did not work, and keep the status
        # for this star.
        echo "$center: $scale is larger than $snlevel"
        status=0
    fi

    # Keep the status in a log file.
    echo "$ra $dec $status" >> $logname
done

Copy the contents above into a file called subtract-psf-from-cat.sh and run the following commands. Just note that in the script above we assumed the outputs will be written into the subtracted/ directory, so we will first make it.

$ mkdir subtracted
$ chmod +x subtract-psf-from-cat.sh
$ ./subtract-psf-from-cat.sh

$ astscript-fits-view label/67510-seg.fits subtracted/67510.fits

Can you visually find the stars that have been subtracted? It is a little hard, isn’t it? This shows that you have done a good job this time (the sky noise is not significantly affected)! So let’s subtract the PSF-subtracted image from the original image to see the scattered light field of the subtracted stars. With the second command below we will zoom into the brightest subtracted star, but of course feel free to zoom out and inspect the others also.

$ astarithmetic label/67510-seg.fits subtracted/67510.fits - \
                --output=scattered-light.fits -g1

$ center=$(asttable subtracted/67510.log --equal=Stat,1 --head=1 \
                    -cra,dec | awk '{printf "%s,%s", $1, $2}')

$ astscript-fits-view label/67510-seg.fits subtracted/67510.fits \
                      scattered-light.fits \
                      --ds9center=$center --ds9mode=wcs \
                      --ds9extra="-scale limits -0.5 1.5 -match scale" \
                      --ds9extra="-lock scale yes -zoom 10" \
                      --ds9extra="-tile mode column"

## We can always re-create it easily, so let's remove this file.
$ rm scattered-light.fits

You will probably have noticed that in the scattered light field there are some patches that correspond to the saturated centers of the stars. Since we obtained the scattered light field by subtracting the PSF-subtracted image from the original image, it is natural that such saturated regions remain. To avoid this inconvenience, the script also has an option to not subtract the PSF, but to give the modeled star as output: this is done by running the script with the --modelonly option. We encourage the reader to obtain such a scattered light field model. In some scenarios this way of correcting the PSF could be interesting; for example, if there are many faint stars that can be modeled at the same time because their fluxes do not affect each other. In such a situation, the task can easily be parallelized without having to wait for the brighter stars to be modeled before the fainter ones. At the end, once all stars have been modeled, a simple Arithmetic command can be used to sum the different modeled-PSF stamps and obtain the entire scattered light field.
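
For example, a minimal sketch of this alternative approach could look like the commands below (the model/ directory and the star-1/star-2 file names are hypothetical; in practice you would create one model per star inside a loop like the script above, using each star's own center and scale factor):

## Build the model of one star (without subtracting it), using the
## same center and scale factor that were computed in the loop above.
$ astscript-psf-subtract label/67510-seg.fits \
           --mode=wcs \
           --psf=psf.fits \
           --scale=$scale \
           --center=$center \
           --modelonly \
           --output=model/star-1.fits

## Once all stars are modeled, sum the stamps with Arithmetic to
## obtain the full scattered light field (here, for only two stars).
$ astarithmetic model/star-1.fits model/star-2.fits 2 sum -g1 \
                --output=scattered-light-model.fits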

In general you see that the subtraction has been done nicely and almost all the extended wings of the PSF have been subtracted. The central regions of the stars are not perfectly subtracted:

  • Some may get too dark at the center. This may be due to the non-linearity of the CCD counting (as discussed previously in Uniting the different PSF components).
  • Others may have a strong gradient: one side is too positive and one side is too negative (only in the very central few pixels). This is due to inaccurate positioning, most probably because of imperfect astrometry.

Note also that during this process we assumed that the PSF does not vary with the CCD position or any other parameter. In other words, we are obtaining an averaged PSF model from a few star stamps that are naturally different, and this also explains the residuals on each subtracted star.

We leave the modeling and subtraction of other stars (for example, the non-saturated stars of the image) as an interesting exercise. By doing this, you will notice that the residuals in the core region are different from those of the brighter stars obtained above.

In general, in this tutorial we have shown how to deal with the most important challenges of constructing an extended PSF. Each image or dataset will have its own particularities that you will have to take into account when constructing the PSF.


2.4 Sufi simulates a detection

It is the year 953 A.D. and Abd al-rahman Sufi (903 – 986 A.D.)61 is in Shiraz as a guest astronomer. He had come there to use the advanced 123 centimeter astrolabe for his studies on the ecliptic. However, something was bothering him for a long time. While mapping the constellations, there were several non-stellar objects that he had detected in the sky, one of them was in the Andromeda constellation. During a trip he had to Yemen, Sufi had seen another such object in the southern skies looking over the Indian ocean. He was not sure if such cloud-like non-stellar objects (which he was the first to call ‘Sahābi’ in Arabic or ‘nebulous’) were real astronomical objects or if they were only the result of some bias in his observations. Could such diffuse objects actually be detected at all with his detection technique?

He still had a few hours left until nightfall (when he would continue his studies on the ecliptic) so he decided to find an answer to this question. He had thoroughly studied Claudius Ptolemy’s (90 – 168 A.D) Almagest and had made lots of corrections to it, in particular in measuring the brightness. Using his same experience, he was able to measure a magnitude for the objects and wanted to simulate his observation to see if a simulated object with the same brightness and size could be detected in simulated noise with the same detection technique. The general outline of the steps he wants to take are:

  1. Make some mock profiles in an over-sampled image. The initial mock image has to be over-sampled prior to convolution or other forms of transformation in the image. Through his experiences, Sufi knew that this is because the image of heavenly bodies is actually transformed by the atmosphere or other sources outside the atmosphere (for example, gravitational lenses) prior to being sampled on an image. Since that transformation occurs on a continuous grid, to best approximate it, he should do all the work on a finer pixel grid. In the end he can resample the result to the initially desired grid size.
  2. Convolve the image with a point spread function (PSF, see Point spread function) that is over-sampled to the same resolution as the mock image. Since he wants to finish in a reasonable time and the PSF kernel will be very large due to oversampling, he has to use frequency domain convolution which has the side effect of dimming the edges of the image. So in the first step above he also has to build the image to be larger by at least half the width of the PSF convolution kernel on each edge.
  3. With all the transformations complete, the image should be resampled to the same size of the pixels in his detector.
  4. He should remove those extra pixels on all edges to remove frequency domain convolution artifacts in the final product.
  5. He should add noise to the (until now, noise-less) mock image. After all, all observations have noise associated with them.

Fortunately Sufi had heard of GNU Astronomy Utilities from a colleague in Isfahan (where he worked) and had installed it on his computer a year before. It had tools to do all the steps above. He had used MakeProfiles before, but was not sure which columns he had chosen in his user or system-wide configuration files for which parameters, see Configuration files. So to start his simulation, Sufi runs MakeProfiles with the -P option to make sure what columns in a catalog MakeProfiles currently recognizes, and confirm the output image parameters. In particular, Sufi is interested in the recognized columns (shown below).

$ astmkprof -P

[[[ ... Truncated lines ... ]]]

# Output:
 type         float32     # Type of output: e.g., int16, float32, etc.
 mergedsize   1000,1000   # Number of pixels along first FITS axis.
 oversample   5           # Scale of oversampling (>0 and odd).

[[[ ... Truncated lines ... ]]]

# Columns, by info (see `--searchin'), or number (starting from 1):
 ccol         2           # Coord. columns (one call for each dim.).
 ccol         3           # Coord. columns (one call for each dim.).
 fcol         4           # sersic (1), moffat (2), gaussian (3), point
                          # (4), flat (5), circumference (6), distance
                          # (7), custom-prof (8), azimuth (9),
                          # custom-img (10).
 rcol         5           # Effective radius or FWHM in pixels.
 ncol         6           # Sersic index or Moffat beta.
 pcol         7           # Position angle.
 qcol         8           # Axis ratio.
 mcol         9           # Magnitude.
 tcol         10          # Truncation in units of radius or pixels.

[[[ ... Truncated lines ... ]]]

In Gnuastro, column counting starts from 1, so the columns are ordered such that the first column (number 1) can be an ID he specifies for each object (and which MakeProfiles ignores); each subsequent column is used for another property of the profile. It is also possible to use column names for the values of these options and change these defaults, but Sufi preferred to stick to the defaults. Fortunately MakeProfiles has the capability to also make the PSF which is to be used on the mock image, and using the --prepforconv option, he can also make the mock image larger by the correct amount and have all the sources shifted by the correct amount.
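
As a side note on the column options above: if Sufi had preferred column names over the default column numbers, a call like the one below (only a hypothetical sketch; the names X, Y, PROF and so on are just examples of names a catalog might carry in its metadata) would tell MakeProfiles which column to use for each property:

$ astmkprof cat.txt --ccol=X --ccol=Y --fcol=PROF --rcol=R \
            --ncol=N --pcol=PA --qcol=Q --mcol=MAG --tcol=TRUNC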

For his initial check he decides to simulate the nebula in the Andromeda constellation. The night he was observing, the PSF had a FWHM of about 5 pixels, so as the first row (profile) in the table below, he defines the PSF parameters. Sufi sets the radius column (rcol above, fifth column) to 5.000, and he also chooses a Moffat function for its functional form. Remembering how diffuse the nebula in the Andromeda constellation was, he decides to simulate it with a mock Sérsic index 1.0 profile. He wants the output to be 499 pixels by 499 pixels, so he can put the center of the mock profile in the central pixel of the image, which is the 250th pixel along both dimensions (note that an even number does not have a “central” pixel).

Looking at his drawings of it, he decides a reasonable effective radius for it would be 40 pixels on this image pixel scale (second row, 5th column below). He also sets the axis ratio (0.4) and position angle (-25 degrees) to approximately correct values too, and finally he sets the total magnitude of the profile to 3.44 which he had measured. Sufi also decides to truncate both the mock profile and PSF at 5 times the respective radius parameters. In the end he decides to put four stars on the four corners of the image at very low magnitudes as a visual scale. While he was preparing the catalog, one of his students approached him and was also following the steps.

As described above, the catalog of profiles to build will be a table (multiple columns of numbers) like below:

0  0.000   0.000  2  5   4.7  0.0  1.0  30.0  5.0
1  250.0   250.0  1  40  1.0  -25  0.4  3.44  5.0
2  50.00   50.00  4  0   0.0  0.0  0.0  6.00  0.0
3  450.0   50.00  4  0   0.0  0.0  0.0  6.50  0.0
4  50.00   450.0  4  0   0.0  0.0  0.0  7.00  0.0
5  450.0   450.0  4  0   0.0  0.0  0.0  7.50  0.0

This contains all the “data” to build the profile, and you can easily pass it to Gnuastro’s MakeProfiles: since Sufi already knows the columns and expected values very well, he has placed the information in the proper columns. However, when the student sees this, he just sees a jumble of numbers! Sufi explains to the student that even if you know the column positions and values very well today, in a couple of months you will forget! It will then be very hard for you to interpret the numbers properly. So you should never use naked data (or data without any extra information).

Data (or information) that describe other data are called “metadata”! One common example is column names: the name of a column is itself a data element, but it describes the lower-level data within that column (how to interpret the numbers within it). Sufi explains to his student that Gnuastro has a convention for adding metadata within a plain-text file, and guides him to Gnuastro text table format. Because we do not want metadata to be confused with the actual data, in a plain-text file we start lines containing metadata with a ‘#’. For example, see the same data above, but this time with metadata for every column:

# Column 1:  ID      [counter, u8] Identifier
# Column 2:  X       [pix,    f32] Horizontal position
# Column 3:  Y       [pix,    f32] Vertical position
# Column 4:  PROFILE [name,    u8] Radial profile function
# Column 5:  R       [pix,    f32] Effective radius
# Column 6:  N       [n/a,    f32] Sersic index
# Column 7:  PA      [deg,    f32] Position angle
# Column 8:  Q       [n/a,    f32] Axis ratio
# Column 9:  MAG     [log,    f32] Magnitude
# Column 10: TRUNC   [n/a,    f32] Truncation (multiple of R)
0  0.000   0.000  2  5   4.7  0.0  1.0  30.0  5.0
1  250.0   250.0  1  40  1.0  -25  0.4  3.44  5.0
2  50.00   50.00  4  0   0.0  0.0  0.0  6.00  0.0
3  450.0   50.00  4  0   0.0  0.0  0.0  6.50  0.0
4  50.00   450.0  4  0   0.0  0.0  0.0  7.00  0.0
5  450.0   450.0  4  0   0.0  0.0  0.0  7.50  0.0

The numbers now make much more sense for the student! Before continuing, Sufi reminded the student that even though metadata may not be strictly/technically necessary (for the computer programs), metadata are critical for human readers! Therefore, a good scientist should never forget to keep metadata with any data that they create, use or archive.

To start simulating the nebula, Sufi creates a directory named simulationtest in his home directory. Note that the pwd command will print the “present working directory” (it is a good way to confirm/check your current location in the full file system: the printed path always starts from the root, or ‘/’).

$ mkdir ~/simulationtest
$ cd ~/simulationtest
$ pwd
/home/rahman/simulationtest

It is possible to use a plain-text editor to manually put the catalog contents above into a plain-text file. But to easily automate catalog production (in later trials), Sufi decides to fill the input catalog with the redirection features of the command-line (or shell). Sufi’s student was not familiar with this feature of the shell! So Sufi decided to do a fast demo; giving the following explanations while running the commands:

Shell redirection allows you to “re-direct” the “standard output” of a program (which is usually printed by the program on the command-line during its execution; like the output of pwd above) into a file. For example, let’s simply “echo” (or print to standard output) the line “This is a test.”:

$ echo "This is a test."
This is a test.

As you see, our statement was simply “echo”-ed to the standard output! To redirect this sentence into a file (instead of simply printing it on the standard output), we can simply use the > character, followed by the name of the file we want it to be dumped in.

$ echo "This is a test." > test.txt

This time, the echo command did not print anything in the terminal. Instead, the shell (command-line environment) took the output, and “re-directed” it into a file called test.txt. Let’s confirm this with the ls command (ls is short for “list” and will list all the files in the current directory):

$ ls
test.txt

Now that you confirm the existence of test.txt, you can see its contents with the cat command (short for “concatenation”; because it can also merge multiple files together):

$ cat test.txt
This is a test.

Now that we have written our first line in test.txt, let’s try adding a second line (do not forget that our final catalog of objects to simulate will have multiple lines):

$ echo "This is my second line." > test.txt
$ cat test.txt
This is my second line.

As you see, the first line that you put in the file is no longer present! This happens because ‘>’ always starts dumping content to a file from the start of the file. In effect, this means that any possibly pre-existing content is over-written by the new content! To append new lines (or dump new content at the end of existing content), you can use ‘>>’. For example, with the commands below, first we will write the first sentence (using ‘>’), then use ‘>>’ to add the second and third sentences. Finally, we will print the contents of test.txt to confirm that all three lines are preserved.

$ echo "My first sentence."   > test.txt
$ echo "My second sentence." >> test.txt
$ echo "My third sentence."  >> test.txt
$ cat test.txt
My first sentence.
My second sentence.
My third sentence.

The student thanked Sufi for this explanation and now feels more comfortable with redirection. Therefore Sufi continues with the main project. But before that, he deletes the temporary test file:

$ rm test.txt

To put the catalog of profile data and their metadata (that was described above) into a file, Sufi uses the commands below. While Sufi was writing these commands, the student complained: “I could have done this in a text editor!”. Sufi reminded the student that it is indeed possible, but it requires manual intervention. The advantage of a solution like the one below is that it can be automated (for example, adding more rows for more profiles in the final image).

$ echo "# Column 1:  ID    [counter, u8] Identifier" > cat.txt
$ echo "# Column 2:  X     [pix,    f32] Horizontal position" >> cat.txt
$ echo "# Column 3:  Y     [pix,    f32] Vertical position" >> cat.txt
$ echo "# Column 4:  PROF  [name,    u8] Radial profile function" \
       >> cat.txt
$ echo "# Column 5:  R     [pix,    f32] Effective radius" >> cat.txt
$ echo "# Column 6:  N     [n/a,    f32] Sersic index" >> cat.txt
$ echo "# Column 7:  PA    [deg,    f32] Position angle" >> cat.txt
$ echo "# Column 8:  Q     [n/a,    f32] Axis ratio" >> cat.txt
$ echo "# Column 9:  MAG   [log,    f32] Magnitude" >> cat.txt
$ echo "# Column 10: TRUNC [n/a,    f32] Truncation (multiple of R)" \
       >> cat.txt
$ echo "0  0.000   0.000  2  5   4.7  0.0  1.0  30.0  5.0" >> cat.txt
$ echo "1  250.0   250.0  1  40  1.0  -25  0.4  3.44  5.0" >> cat.txt
$ echo "2  50.00   50.00  4  0   0.0  0.0  0.0  6.00  0.0" >> cat.txt
$ echo "3  450.0   50.00  4  0   0.0  0.0  0.0  6.50  0.0" >> cat.txt
$ echo "4  50.00   450.0  4  0   0.0  0.0  0.0  7.00  0.0" >> cat.txt
$ echo "5  450.0   450.0  4  0   0.0  0.0  0.0  7.50  0.0" >> cat.txt

To make sure the catalog’s content is correct (and there is no typo, for example!), Sufi runs ‘cat cat.txt’ and confirms that it is.

Now that the catalog is created, Sufi is ready to call MakeProfiles to build the image containing these objects. He looks into his records and finds that the zero point magnitude for that night, and that particular detector, was 18 magnitudes. The student was a little confused on the concept of zero point, so Sufi pointed him to Brightness, Flux, Magnitude and Surface brightness, which the student can study in detail later. Sufi therefore runs MakeProfiles with the command below:

$ astmkprof --prepforconv --mergedsize=499,499 --zeropoint=18.0 cat.txt
MakeProfiles 0.22 started on Sat Oct  6 16:26:56 953
  - 6 profiles read from cat.txt
  - Random number generator (RNG) type: ranlxs1
  - Basic RNG seed: 1652884540
  - Using 12 threads.
  ---- row 3 complete, 5 left to go
  ---- row 4 complete, 4 left to go
  ---- row 6 complete, 3 left to go
  ---- row 5 complete, 2 left to go
  ---- ./0_cat_profiles.fits created.
  ---- row 1 complete, 1 left to go
  ---- row 2 complete, 0 left to go
  - ./cat_profiles.fits created.                       0.092573 seconds
  -- Output: ./cat_profiles.fits
MakeProfiles finished in 0.293644 seconds

Sufi encourages the student to read through the printed output. As the statements say, two FITS files should have been created in the running directory. So Sufi ran the command below to confirm:

$ ls
0_cat_profiles.fits  cat_profiles.fits  cat.txt

The file 0_cat_profiles.fits is the PSF Sufi had asked for, and cat_profiles.fits is the image containing the main objects in the catalog. Sufi opened the main image with the command below (using SAO DS9):

$ astscript-fits-view cat_profiles.fits --ds9scale=95

The student could clearly see the main elliptical structure in the center. However, the size of cat_profiles.fits was surprising for the student, instead of 499 by 499 (as we had requested), it was 2615 by 2615 pixels (from the command below):

$ astfits cat_profiles.fits
Fits (GNU Astronomy Utilities) 0.22
Run on Sat Oct  6 16:26:58 953
-----
HDU (extension) information: 'cat_profiles.fits'.
 Column 1: Index (counting from 0, usable with '--hdu').
 Column 2: Name ('EXTNAME' in FITS standard, usable with '--hdu').
 Column 3: Image data type or 'table' format (ASCII or binary).
 Column 4: Size of data in HDU.
-----
0      MKPROF-CONFIG   no-data         0
1      Mock profiles   float32         2615x2615

So Sufi explained why oversampling is important in modeling, especially for parts of the image where the flux change is significant over a pixel. Recall that when you oversample the model (for example, by 5 times), for every desired pixel, you get 25 pixels (\(5\times5\)). Sufi then explained that after convolving (next step below) we will down-sample the image to get our originally desired size/resolution.
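
As a quick sanity check of the reported size: with the 5-times oversampling, and the 12 extra pixels per edge that --prepforconv added (this number will be confirmed with the Warp and Crop commands below), we expect \(5\times(499+12+12)=5\times523=2615\) pixels along each side, exactly as printed above.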

After seeing the image, the student complained that only the large elliptical model for the Andromeda nebula can be seen in the center. He could not see the four stars that we had also requested in the catalog. So Sufi had to explain that the stars are there in the image, but the reason that they are not visible when looking at the whole image at once, is that they only cover a single pixel! To prove it, he centered the image around the coordinates 2308 and 2308, where one of the stars is located in the over-sampled image [you can do this in ds9 by selecting “Pan” in the “Edit” menu, then clicking around that position]. Sufi then zoomed in to that region and soon, the star’s non-zero pixel could be clearly seen.

Sufi explained that the stars will take the shape of the PSF (cover an area of more than one pixel) after convolution. If we did not have an atmosphere and we did not need an aperture, then stars would only cover a single pixel with normal CCD resolutions. So Sufi convolved the image with this command:

$ astconvolve --kernel=0_cat_profiles.fits cat_profiles.fits \
              --output=cat_convolved.fits
Convolve started on Sat Oct  6 16:35:32 953
  - Using 8 CPU threads.
  - Input: cat_profiles.fits (hdu: 1)
  - Kernel: 0_cat_profiles.fits (hdu: 1)
  - Input and Kernel images padded.                    0.075541 seconds
  - Images converted to frequency domain.              6.728407 seconds
  - Multiplied in the frequency domain.                0.040659 seconds
  - Converted back to the spatial domain.              3.465344 seconds
  - Padded parts removed.                              0.016767 seconds
  - Output: cat_convolved.fits
Convolve finished in:  10.422161 seconds

When convolution finished, Sufi opened cat_convolved.fits and the four stars could be easily seen now:

$ astscript-fits-view cat_convolved.fits --ds9scale=95

It was interesting for the student that all the flux in that single pixel is now distributed over so many pixels (the sum of all the pixels in each convolved star is actually equal to the value of the single pixel before convolution). Sufi explained how a PSF with a larger FWHM would make the points even wider than this (distributing their flux in a larger area). With the convolved image ready, they were prepared to resample it to the original pixel scale Sufi had planned [from the $ astmkprof -P command above, recall that MakeProfiles had over-sampled the image by 5 times]. Sufi explained the basic concepts of warping the image to his student and ran Warp with the following command:

$ astwarp --scale=1/5 --centeroncorner cat_convolved.fits
Warp started on Sat Oct  6 16:51:59 953
 Using 8 CPU threads.
 Input: cat_convolved.fits (hdu: 1)
 matrix:
        0.2000   0.0000   0.4000
        0.0000   0.2000   0.4000
        0.0000   0.0000   1.0000

$ astfits cat_convolved_scaled.fits --quiet
0      WARP-CONFIG     no-data         0
1      Warped          float32         523x523

cat_convolved_scaled.fits now has the correct pixel scale. However, the image is still larger than what we had wanted, it is \(523\times523\) pixels (not our desired \(499\times499\)). The student is slightly confused, so Sufi also resamples the PSF with the same scale by running

$ astwarp --scale=1/5 --centeroncorner 0_cat_profiles.fits
$ astfits 0_cat_profiles_scaled.fits --quiet
0      WARP-CONFIG     no-data         0
1      Warped          float32         25x25

Sufi notes that \(25=12+12+1\) and that \(523=499+12+12\). He goes on to explain that frequency space convolution will dim the edges and that is why he added the --prepforconv option to MakeProfiles above. Now that convolution is done, Sufi can remove those extra pixels using Crop with the command below. Crop’s --section option accepts coordinates inclusively and counting from 1 (according to the FITS standard), so the crop region’s first pixel has to be 13, not 12.

$ astcrop cat_convolved_scaled.fits --section=13:*-12,13:*-12    \
          --mode=img --zeroisnotblank
Crop started on Sat Oct  6 17:03:24 953
  - Read metadata of 1 image.                          0.001304 seconds
  ---- ...nvolved_scaled_cropped.fits created: 1 input.
Crop finished in:  0.027204 seconds

To fully convince the student, Sufi checks the size of the output of the crop command above:

$ astfits cat_convolved_scaled_cropped.fits --quiet
0      n/a             no-data         0
1      n/a             float32         499x499

Finally, cat_convolved_scaled_cropped.fits is \(499\times499\) pixels and the mock Andromeda galaxy is centered on the central pixel. This is the same dimensions as Sufi had desired in the beginning. All this trouble was certainly worth it because now there is no dimming on the edges of the image and the profile centers are more accurately sampled.

The final step to simulate a real observation would be to add noise to the image. Sufi set the zero point magnitude to the same value that he used when making the mock profiles. Looking again at his observation log, he saw that the background flux near the nebula had a per-pixel magnitude of 7 that night. For more on how the background value determines the noise, see Noise basics. So using these values he ran Arithmetic’s mknoise-sigma-from-mean operator, and with the second command, he visually inspected the image. The mknoise-sigma-from-mean operator takes the noise standard deviation in linear units, not magnitudes (which are logarithmic). Therefore, within the same Arithmetic command, he converted the sky background magnitude to counts using Arithmetic’s mag-to-counts operator.
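
For reference, this conversion follows the standard definition of magnitudes: a pixel of magnitude \(m\) with zero point \(Z\) corresponds to \(10^{(Z-m)/2.5}\) counts, so the magnitude 7 background above corresponds to roughly \(10^{(18-7)/2.5}\approx2.5\times10^{4}\) counts per pixel.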

$ astarithmetic cat_convolved_scaled_cropped.fits \
                7 18 mag-to-counts mknoise-sigma-from-mean \
                --output=out.fits

$ astscript-fits-view out.fits

The out.fits file now contains the noised image of the mock catalog Sufi had asked for. The student had not observed the nebula in the sky, so when he saw the mock image in SAO DS9 (with the second command above), he understood why Sufi was dubious: it was very diffuse!

Seeing how the --output option allows the user to specify the name of the output file, the student was confused and wanted to know why Sufi had not used it more regularly before. Sufi explained that for intermediate steps, you can rely on the automatic output of the programs (see Automatic output). Doing so will give all the intermediate files a similar basic name structure, so in the end you can simply remove them all with the shell’s capabilities, and it will be familiar for other users. So Sufi decided to show this to the student by making a shell script from the commands he had used before.

The command-line shell has the capability to read all the separate input commands from a file. This is useful when you want to do the same thing multiple times, with only the names of the files or minor parameters changing between the different instances. Using the shell’s history (by pressing the up keyboard key) Sufi reviewed all the commands and then he retrieved the last 5 commands with the $ history 5 command. He selected all those lines he had input and put them in a text file named mymock.sh. Then he defined the edge and base shell variables for easier customization later, and before every command, he added some comments (lines starting with #) for future readability. Finally, Sufi pointed the student to Gnuastro’s General program usage tutorial, which has a full section on Writing scripts to automate the steps.

#!/bin/bash

edge=12
base=cat

# Stop running next commands if one fails.
set -e

# Remove any (possibly) existing output (from previous runs)
# before starting.
rm -f out.fits

# Run MakeProfiles to create an oversampled FITS image.
astmkprof --prepforconv --mergedsize=499,499 --zeropoint=18.0 \
          "$base".txt

# Convolve the created image with the kernel.
astconvolve "$base"_profiles.fits \
            --kernel=0_"$base"_profiles.fits \
            --output="$base"_convolved.fits

# Scale the image back to the intended resolution.
astwarp --scale=1/5 --centeroncorner "$base"_convolved.fits

# Crop the edges out (dimmed during convolution). '--section'
# accepts inclusive coordinates counting from 1, so the section
# has to start one pixel after the dimmed edge.
st_edge=$(( edge + 1 ))
astcrop "$base"_convolved_scaled.fits --zeroisnotblank \
        --mode=img --section=$st_edge:*-$edge,$st_edge:*-$edge

# Add noise to the image.
$ astarithmetic "$base"_convolved_scaled_cropped.fits \
                7 18 mag-to-counts mknoise-sigma-from-mean \
                --output=out.fits

# Remove all the temporary files.
rm 0*.fits "$base"*.fits

He used this chance to remind the student of the importance of comments in code or shell scripts! Just like metadata in a dataset, when writing the code, you have a good mental picture of what you are doing, so writing comments might seem superfluous and excessive. However, in one month when you want to re-use the script, you have lost that mental picture and remembering it can be time-consuming and frustrating. The importance of comments is further amplified when you want to share the script with a friend/colleague. So it is good to accompany any step of a script, or code, with useful comments while you are writing it (create a good mental picture of why you are doing something: do not just describe the command, but its purpose).

Sufi then explained to the eager student that you define a variable by giving it a name, followed by an = sign and the value you want. You can then reference that variable from anywhere in the script by calling its name with a $ prefix. So whenever you see $base in the script, the value we defined for it above is used. If you use advanced editors like GNU Emacs, or even simpler ones like Gedit (part of the GNOME graphical user interface), the variables will be shown in a different color, which can really help in understanding the script. We have put all the $base variables in double quotation marks (") so the variable name and the following text do not get mixed; the shell removes the quotation marks after replacing the variable’s value.
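
To show how this expansion works in practice, Sufi typed a quick demonstration in the terminal (a minimal example; the printed file name does not need to exist):

$ base=cat
$ echo "$base"_profiles.fits
cat_profiles.fits

To make the script executable, Sufi ran the following command: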

$ chmod +x mymock.sh

Then finally, Sufi ran the script, simply by calling its file name:

$ ./mymock.sh

After the script finished, the only file remaining is the out.fits file that Sufi had wanted in the beginning. Sufi then explained to the student how he could run this script anywhere that he has a catalog if the script is in the same directory. The only thing the student had to modify in the script was the name of the catalog (the value of the base variable in the start of the script) and the value to the edge variable if he changed the PSF size. The student was also happy to hear that he will not need to make it executable again when he makes changes later, it will remain executable unless he explicitly changes the executable flag with chmod.

The student was really excited, since now, through simple shell scripting, he could greatly speed up his work and run any command in any fashion he likes, allowing him to be much more creative in his work. Until now he had been using the graphical user interface, which does not have such a facility, and doing repetitive things with it was really frustrating; sometimes he would even make mistakes. So he left to go and try scripting on his own computer. Sufi later reminded him that the second tutorial in the Gnuastro book has more complex commands in data analysis, and a more advanced introduction to scripting (see General program usage tutorial).

Sufi could now get back to his own work and see whether the simulated nebula, which resembled the one in the Andromeda constellation, could be detected or not, even though it was extremely faint62. He therefore ran Gnuastro’s detection software (NoiseChisel) to see if this object is detectable. NoiseChisel’s output (out_detected.fits) is a multi-extension FITS file, so he used Gnuastro’s astscript-fits-view program in the second command to see the output:

$ astnoisechisel out.fits

$ astscript-fits-view out_detected.fits

In the “Cube” window (that was opened with DS9), Sufi clicked on the “Next” button to see the pixels that were detected to contain significant signal. Fortunately the nebula’s shape was detectable and he could finally confirm that the nebula he had recorded in his notebook was actually observable. He wrote this result in the draft manuscript that would later become the “Book of fixed stars”63.

He still had to check the other nebula he saw from Yemen and several other such objects, but they could wait until tomorrow (thanks to the shell script, he only has to define a new catalog). It was nearly sunset and they had to begin preparing for the night’s measurements on the ecliptic.


2.5 Detecting lines and extracting spectra in 3D data

3D data cubes are an increasingly common format of data products in observational astronomy. As opposed to 2D images (where each 2D “picture element” or “pixel” covers an infinitesimal area on the surface of the sky), 3D data cubes contain “volume elements” or “voxels” that are also connected in a third dimension.

The most common case of 3D data in observational astrophysics is when the first two dimensions are spatial (RA and Dec on the sky), and the third dimension is wavelength. This type of data is generically (also outside of astronomy) known as Hyperspectral imaging64. For example, the high-level data products of Integral Field Units (IFUs) like MUSE65 in the optical or ACIS66 in the X-ray are 3D cubes, as are most data in the radio.

In this tutorial, we’ll use a small crop of a reduced deep MUSE cube centered on the Abell 370 galaxy cluster from the Pilot-WINGS survey; see Lagattuta et al. 2022. Abell 370 has a spiral galaxy in its background that is stretched due to the cluster’s gravitational potential to create a beautiful arch. If you haven’t seen it yet, have a look at some of its images in the Wikipedia link above before continuing.

The Pilot-WINGS survey data are available on its webpage67. The cube of the core region is 10.2 GB. This can be prohibitively large to download (and later process) on many networks and smaller computers. Therefore, in this demonstration we won’t be using the full cube. We have prepared a small crop68 of the full cube that you can download with the first command below. The randomly selected crop is centered on (RA,Dec) of (39.96769,-1.58930), with a width of about 27 arcseconds.

$ mkdir tutorial-3d
$ cd tutorial-3d
$ wget http://akhlaghi.org/data/a370-crop.fits    # Downloads 287 MB

In the sections below, we will first review how you can visually inspect a 3D datacube in DS9 and interactively see the spectra of any region. We will then subtract the continuum emission, detect the emission-lines within this cube and extract their spectra. We will finish with creating pseudo narrow-band images optimized for some of the emission lines.


2.5.1 Viewing spectra and redshifted lines

In Detecting lines and extracting spectra in 3D data we downloaded a small crop from the Pilot-WINGS survey of Abell 370 cluster; observed with MUSE. In this section, we will review how you can visualize/inspect a datacube using that example. With the first command below, we’ll open DS9 such that each 2D slice of the cube (at a fixed wavelength) is seen as a single image. If you move the slider in the “Cube” window (that also opens), you can view the same field at different wavelengths. We are ending the first command with a ‘&’ so you can continue viewing DS9 while using the command-line (press one extra ENTER to see the prompt). With the second command, you can see that the spacing between each slice is \(1.25\times10^{-10}\) meters (or 1.25 Angstroms).

$ astscript-fits-view a370-crop.fits -h1 --ds9scale="limits -5 20" &

$ astfits a370-crop.fits --pixelscale
Basic info. for --pixelscale (remove info with '--quiet' or '-q')
  Input: a370-crop.fits (hdu 1) has 3 dimensions.
  Pixel scale in each FITS dimension:
    1: 5.55556e-05 (deg/pixel) = 0.2 (arcsec/pixel)
    2: 5.55556e-05 (deg/pixel) = 0.2 (arcsec/pixel)
    3: 1.25e-10 (m/slice)
  Pixel area (on each 2D slice) :
    3.08642e-09 (deg^2) = 0.04 (arcsec^2)
  Voxel volume:
    3.85802e-19 (deg^2*m) = 5e-12 (arcsec^2*m) = 0.05 (arcsec^2*A)

In the DS9 “Cube” window, you will see two numbers on the two sides of the scroller. The left number is the wavelength in meters (WCS coordinate in the 3rd dimension) and the right number is the slice number (array coordinate in the 3rd dimension). You can manually edit any of these numbers and press ENTER to go to that slice in either coordinate system. If you want to go one-by-one, simply press the “Next” button. The first few slices are very noisy, but in the rest the noise level decreases and the galaxies are more obvious.

As you slide between the different wavelengths, you see that the noise level is not constant and in some slices, the sky noise is very strong (for example, go to slice 3201 and press the “Next” button). We will discuss these issues below (in Sky lines in optical IFUs). To view the spectrum of a region in DS9, take the following steps:

  1. Click somewhere on the image (to make sure DS9 receives your keyboard inputs), then press Ctrl+R to activate regions and click on the brightest galaxy of this cube (center-right, at RA, Dec of 39.9659175 and -1.5893075).
  2. A thin green circle will show up; this is called a “region” in DS9.
  3. Double-click on the region, and you will see a “Circle” window.
  4. Within the “Circle” window, click on the “Analysis” menu and select “Plot 3D”.
  5. A second “Circle” window will open that shows the spectra within your selected region. This is just the sum of values on each slice within the region.
  6. Don’t close the second “circle” window (that shows the spectrum). Click and hold the region in DS9, and move it to other objects within the cube. You will see that the spectrum changes as you move the region, and you can see that different objects have very different spectra. You can even see the spectra of only one part of a galaxy, not the whole galaxy.
  7. Take the region back to the first (brightest) galaxy that we originally started with.
  8. Slide over different wavelengths in the “Cube” window, you will see the light-blue line moving through the spectrum as you slide to different wavelengths. This line shows the wavelength of the displayed image in the main window over the spectra.
  9. The strongest emission line in this galaxy appears to be around 8500 Angstroms or \(8.5\times10^{-7}\) meters. From the position of the Balmer break (blue-ward of 5000 Angstroms for this galaxy), the strongest line seems to be H-alpha.
  10. To confirm that this is H-alpha, you can select the “Edit” menu in the spectrum window and select “Zoom”.
  11. Double-click and hold (for next step also) somewhere before the strongest line and slightly above the continuum (for example at 8E-07 in the horizontal and \(60\times10^{-20}\)erg/Angstrom/cm\(^2\)/s on the vertical). As you move your cursor (while holding), you will see a rectangular box getting created.
  12. Move the bottom-left corner of the box to somewhere after the strongest line and below the continuum. For example at 9E-07 and \(20\times10^{-20}\)erg/Angstrom/cm\(^2\)/s.
  13. Once you remove your finger from the mouse/touchpad, it will zoom-in to that part of the spectrum.
  14. To zoom out to the full spectrum, just press the right mouse button over the spectra (or tap with two fingers on a touchpad).
  15. Select that zoom-box again to see the brightest line much more clearly. You can also see the two lines of the Nitrogen II doublet that sandwich H-alpha. Besides its position relative to the Balmer break, this is further evidence that the strongest line is H-alpha.
  16. Let’s have a look at the galaxy in its best glory: right over the H-alpha line: Move the wavelength slider accurately (by pressing the “Previous” or “Next” buttons) such that the blue line falls in the middle of the H-alpha line. We see that the wavelength at this slice is 8.56593e-07 meters or 8565.93 Angstroms. Please compare the image of the galaxy at this wavelength with the wavelengths before (by pressing “Next” or “Previous”). You will also see that it is much more extended and brighter than other wavelengths! H-alpha shows the un-obscured star formation of the galaxy!

Automatically going to the next slice: When you want to get a general feeling of the cube, pressing the “Next” button many times is annoying and slow. To automatically shift between the slices, you can press the “Play” button in the DS9 “Cube” window. You can adjust the time it stays on each slice by clicking on the “Interval” menu and selecting lower values.

Knowing that this is H-alpha at 8565.93 Angstroms, you can get the redshift of the galaxy with the first command below, and the location of all other expected lines in Gnuastro’s spectral line database with the second command. Because there are many lines in the second command (more than 200!), with the third command we’ll limit it to the Balmer series (lines starting with H-) using grep. The second command also prints the metadata at the top (no longer shown in the third command due to the grep call). To be complete: the first column is the observed wavelength of the given line at the given redshift and the second column is the name of the line.

# Redshift where H-alpha falls on 8565.93.
$ astcosmiccal --obsline=H-alpha,8565.93 --usedredshift
0.305221

# Wavelength of all lines in Gnuastro's database at this redshift
$ astcosmiccal --obsline=H-alpha,8565.93 --listlinesatz

# Only the Balmer series (Lines starting with 'H-'; given to Grep).
$ astcosmiccal --obsline=H-alpha,8565.93 --listlinesatz | grep H-
4812.13             H-19
4818.29             H-18
4825.61             H-17
4834.36             H-16
4844.95             H-15
4857.96             H-14
4874.18             H-13
4894.79             H-12
4921.52             H-11
4957.1              H-10
5006.03             H-9
5076.09             H-8
5181.83             H-epsilon
5353.68             H-delta
5665.27             H-gamma
6345.11             H-beta
8565.93             H-alpha
4758.84             H-limit
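
For reference, the first command is simply applying the standard redshift relation: assuming a rest-frame H-alpha wavelength of roughly 6562.8 Angstroms, \(z=\lambda_{obs}/\lambda_{rest}-1=8565.93/6562.8-1\approx0.305\), in agreement with the value printed above.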

Zoom-out to the full spectrum and move the displayed slice to the location of the first emission line that is blue-ward (at shorter wavelengths) of H-alpha (at around 6300 Angstroms) and follow the previous steps to confirm that you are on its center. You will see that it falls exactly on \(6.34468\times10^{-7}\) m or 6344.68 Angstroms. Now, have a look at the Balmer lines above. You have found the H-beta line!

The rest of the Balmer series that you see in the list above (like H-gamma, H-delta and H-epsilon) are visible only as absorption lines. Please check their locations by moving the blue line to the wavelengths listed above and confirming the absorption features there. The Balmer break is caused by the fact that these high-order Balmer absorption lines become too close to each other.

Looking back at the full spectrum, you can also confirm that the only other relatively strong emission line in this galaxy on the blue side of the spectrum is the weakest OII line, located at approximately 4864 Angstroms in the observed spectrum of this galaxy. The numbers after the various OII emission lines show their rest-frame wavelengths (“OII” can correspond to many electron transitions, so we should be clear about which one we are talking about).

$ astcosmiccal --obsline=H-alpha,8565.93 --listlinesatz | grep O-II-
4863.3              O-II-3726
4866.93             O-II-3728
5634.82             O-II-4317
5762.42             O-II-4414
9554.21             O-II-7319
9568.22             O-II-7330

Please stop here and spend some time doing the exercise above on other galaxies in this cube to get a feeling for the types of galaxy spectral features (and later on, the full/large cube). You will notice that only star-forming galaxies have such strong emission lines! If you enjoy it, go get the full non-cropped cube and investigate the spectra, redshifts and emission/absorption lines of many more galaxies.

But going into those higher-level details of the physical meaning of the spectra (as intriguing as they are!) is beyond the scope of this tutorial. So we have to stop at this stage unfortunately. Now that you have a relatively good feeling of this small cube, let’s start doing some analysis to extract the spectra of the objects in this cube.


2.5.2 Sky lines in optical IFUs

As we were visually inspecting the cube in Viewing spectra and redshifted lines, we noticed some slices with very bad noise. They will later affect our detection within the cube, so in this section let’s have a fast look at them here. We’ll start by looking at the two cubes within the downloaded FITS file:

$ astscript-fits-view a370-crop.fits

The cube on the left is the same cube we studied before. The cube on the right (which is called STAT) shows the variance of each voxel. Go to slice 3195 and press “Next” to view the subsequent slices. Initially (for the first 5 or 6 slices), the noise looks reasonable. But as you pass slice 3206, you will see that the noise becomes very bad in both cubes. It stays like this until about slice 3238! As you go through the whole cube, you will notice that these slices are much more frequent in the reddest wavelengths.

These slices are affected by the emission lines from our own atmosphere! The atmosphere’s emission in these wavelengths significantly raises the background level in these slices. As a result, the Poisson noise also increases significantly (see Photon counting noise). During the data reduction, the excess background flux of each slice is removed as the Sky (or the mean of undetected pixels, see Sky value). However, the increased Poisson noise (scatter of pixel values) remains!
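
In other words, if a voxel’s background level is \(B\) counts, its Poisson scatter is roughly \(\sqrt{B}\): Sky subtraction removes the mean level \(B\), but the \(\sqrt{B}\) scatter that it caused stays in the data, and this is what we see as the extra noise in these slices.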

To see the spectrum of the sky emission lines, simply put a region somewhere in the STAT cube and generate its spectrum (as we did in Viewing spectra and redshifted lines). You will clearly see the comb-like shape of the atmospheric emission lines and can use this to know where to expect them.


2.5.3 Continuum subtraction

In Viewing spectra and redshifted lines, we visually inspected some of the most prominent emission lines of the brightest galaxy of the demo MUSE cube (see Detecting lines and extracting spectra in 3D data). Here, we will remove the “continuum” flux from under the emission lines to see them more distinctly.

Within a spectrum, the continuum is the local “background” flux in the third/wavelength dimension. In other words, it is the flux that would be present at that wavelength if the emission line didn’t exist. Therefore, to accurately measure the flux of the emission line, we first need to subtract the continuum. One crude way of estimating the continuum flux at every slice is to use the sigma-clipped median value of that same pixel in the \(\pm{N/2}\) slices around it (for more on sigma-clipping, see Sigma clipping).

In this case, \(N=100\) should be a good first approximation (since it is much larger than any of the absorption or emission lines). With the first command below, let’s use Arithmetic’s filtering operators to estimate the sigma-clipped median only along the third dimension for every pixel in every slice (see Filtering (smoothing) operators). With the second command, have a look at the filtered cube and spectra. Note that the first command is computationally expensive and may take a minute or so.

$ astarithmetic a370-crop.fits set-i --output=filtered.fits \
                3 0.2 1 1 100 i filter-sigclip-median

$ astscript-fits-view filtered.fits -h1 --ds9scale="limits -5 20"

Looking at the filtered cube above, and sliding through the different wavelengths, you will see the noise in each slice has been significantly reduced! This is expected because each pixel’s value is now calculated from 100 others (along the third dimension)! Using the same steps as Viewing spectra and redshifted lines, plot the spectra of the brightest galaxy. Then, have a look at its spectra. You see that the emission lines have been significantly smoothed out to become almost69 invisible.

You can now subtract this “continuum” cube from the input cube to create the emission-line cube. In fact, as you see below, we can do it in a single Arithmetic command (blending the filtering and subtraction in one command). Note how the only difference with the previous Arithmetic command is that we added an i before the 3 and a - after filter-sigclip-median. For more on Arithmetic’s powerful notation, see Reverse polish notation. With the second command below, let’s view the input and continuum-subtracted cubes together:

$ astarithmetic a370-crop.fits set-i --output=no-continuum.fits \
                i 3 0.2 1 1 100 i filter-sigclip-median -

$ astscript-fits-view a370-crop.fits no-continuum.fits -h1 \
                      --ds9scale="limits -5 20"

Once the cubes are open, slide through the different wavelengths. Comparing the left (input) and right (continuum-subtracted) slices, you will rarely see any galaxy in the continuum-subtracted one! As its name suggests, the continuum flux is continuously present in all the wavelengths (with gradual change)! But the continuum has been subtracted now; so in the right-side image, you don’t see anything on wavelengths that don’t contain a spectral emission line. Some dark regions also appear; these are absorption lines! Please spend a few minutes sliding through the wavelengths and seeing how the emission lines pop-up and disappear again. It is almost like scuba diving, with fish appearing out of nowhere and passing by you.

Let’s go to slice 3046 (corresponding to 8555.93 Angstroms; just before the H-alpha line for the brightest galaxy in Viewing spectra and redshifted lines). Now press the “Next” button to change slices one by one until there is no more emission in the brightest galaxy. As you go to redder slices, you will see that not only does the brightness increase, but the position of the emission also changes. This is the Doppler effect caused by the rotation of the galaxy: the side that is rotating towards us is blue-shifted to bluer slices and the side that is moving away from us is redshifted to redder slices. If you go to the emission lines of the other galaxies, you will see that they move at a different angle! We can use this to derive the galaxy’s rotational properties and kinematics (Gnuastro doesn’t have this feature yet).

To see the Doppler shift in the spectrum, plot the spectrum over the top-side of the galaxy (which is visible in slice 3047). Then Zoom-in to the H-alpha line (as we did in Viewing spectra and redshifted lines) and press “Next” until you reach the end of the H-alpha emission-line. You see that by the time H-alpha disappears in the spectrum, within the cube, the emission shifts in the vertical axis by about 15 pixels! Then, move the region across the same path that the emission passed. You will clearly see that the H-alpha and Nitrogen II lines also move with you, in the zoomed-in spectra. Again, try this for several other emission lines, and several other galaxies to get a good feeling of this important concept when using hyper-spectral 3D data.


2.5.4 3D detection with NoiseChisel

In Continuum subtraction we subtracted the continuum emission, leaving us with only noise and the absorption and emission lines. The absorption lines are negative and will be missed by detection methods that look for a positive skewness70 (like NoiseChisel). So we will focus on the detection and extraction of emission lines here.

The first step is to extract the voxels that contain emission signal. To do that, we will be using NoiseChisel. NoiseChisel and Segment operate on 2D images or 3D cubes. But by default, they are configured for 2D images (some parameters like tile size take a different number of values based on the dimensionality). Therefore, to do 3D detection, the first necessary step is to run NoiseChisel with the default 3D configuration file.

To see where Gnuastro’s programs are installed, you can run the following command (the printed output is the default location when you install Gnuastro from source, but if you used another installation method or manually set a different location, you will see a different output, just use that):

$ which astnoisechisel
/usr/local/bin/astnoisechisel

As you see, the compiled binary programs (like NoiseChisel) are installed in the bin/ sub-directory of the install path (/usr/local in the example above, may be different on your system). The configuration files are in the etc/ sub-directory of the install path (here only showing NoiseChisel’s configuration files):

$ ls /usr/local/etc/astnoisechisel*.conf
/usr/local/etc/astnoisechisel-3d.conf
/usr/local/etc/astnoisechisel.conf

We should therefore call NoiseChisel with the 3D configuration file like below (please change /usr/local to any directory that you find from the which command above):

$ astnoisechisel --config=/usr/local/etc/astnoisechisel-3d.conf \
                 no-continuum.fits --output=det.fits

But having to add this long --config option is annoying and makes the command hard to read! To simplify the calling of NoiseChisel in 3D, let’s first make a shell alias called astnoisechisel-3d using the alias command (first command below). Afterwards (in the second command), we call the alias, producing the same output as above. Finally (with the last command), let’s have a look at NoiseChisel’s output:

$ alias astnoisechisel-3d="astnoisechisel \
           --config=/usr/local/etc/astnoisechisel-3d.conf"

$ astnoisechisel-3d no-continuum.fits --output=det.fits

$ astscript-fits-view det.fits

Similar to its 2D outputs, NoiseChisel’s output contains four extensions/HDUs (see NoiseChisel output). For a multi-extension file with 3D data, astscript-fits-view shows each cube as a separate DS9 “Frame”. In this way, as you slide through the wavelengths, you see the same slice in all the cubes. The third and fourth extensions are the Sky and Sky standard deviation, which are not relevant here, so you can close them. To do that, press on the “Frame” button (in the top row of buttons), then press “delete” two times in the second row of buttons.

As a final preparation, manually set the scale of INPUT-NO-SKY cube to a fixed range so the changing flux/noise in each slice doesn’t interfere with visually comparing the data in the slices as you move around:

  1. Click on the INPUT-NO-SKY cube, so it is selected.
  2. Click on the “Scale” menu, then the “Scale Parameters”.
  3. For the “Low” value set -2 and for the “High” value set 5.
  4. In the “Cube” window, slide between the slices to confirm that the noise level is visually fixed.
  5. Go back to the first slice for the next steps. Note that the first and last couple of slices have much higher noise, don’t worry about those.

As you press the “Next” button in the first few slices, you will notice that the DETECTION cube is fully black: showing that nothing has been detected. The first detection pops up in the 55th slice for the galaxy at the top of this cube. As you press “Next” you will see that the detection fades away and other detections pop up. Spend a few minutes shifting between the different slices and comparing the detected voxels with the emission lines in the continuum-subtracted cube (the INPUT-NO-SKY extension).

Go ahead to slice 2815 and press “Next” a few times. You will notice that the detections suddenly start covering the whole slice, and this continues until slice 2859, where the detection map becomes normal again (no extra detections!). This is the effect of the sky lines we mentioned before in Sky lines in optical IFUs. The increased noise makes the reduction very hard and, as a result, a lot of artifacts appear. To reduce the effect of the sky lines, we can divide the cube by its standard deviation (the square root of the variance or STAT extension; see Sky lines in optical IFUs) and run NoiseChisel on the result.

$ astarithmetic no-continuum.fits -h1 a370-crop.fits -hSTAT sqrt / \
                --output=sn.fits

$ astnoisechisel-3d sn.fits --output=det.fits

$ astscript-fits-view det.fits

After the new detection map opens, have another look at the specific slices mentioned above (from slice 2851 to 2859). You will see that there are no more detection maps that cover the whole field of view. Scroll the slice counter across the whole cube; you will rarely see such sky-line effects any more. But this is just a crude solution and doesn’t remove all sky-line artifacts. For example, go to slice 650 and press “Next”. You will see that the artifacts caused by this sky line are so strong that the solution above wasn’t successful. For these very strong emission lines, we need to improve the reduction. But generally, since the number of sky-line-affected slices has significantly decreased, we can go ahead.


2.5.5 3D measurements and spectra

In the context of optical IFUs or radio IFUs in astronomy, a “Spectrum” is defined as separate measurements on each 2D slice of the 3D cube. Each 2D slice is defined by the first two FITS dimensions: the first FITS dimension is the horizontal axis and the second is the vertical axis. As in the tutorial on 2D image analysis (Segmentation and making a catalog), let’s run Segment to see how it works in 3D. Like NoiseChisel above (see 3D detection with NoiseChisel), to simplify the commands, let’s make an alias:

$ alias astsegment-3d="astsegment \
           --config=/usr/local/etc/astsegment-3d.conf"

$ astsegment-3d det.fits --output=seg.fits

$ astscript-fits-view seg.fits

You see that we now have 3D clumps and 3D objects, so we can go ahead with the measurements. MakeCatalog can also do single-valued measurements (as in 2D) on 3D datasets. For example, with the first command below, let’s get the flux-weighted center (in the three dimensions) and the sum of pixel values. There isn’t usually a standard name for the third WCS dimension (unlike RA/Dec), so in Gnuastro we just call it --w3. With the second command, we have a look at the first 5 rows. Note that we are not using -Y with asttable anymore, because with it the wavelength column would only be shown as zero (since it is in meters!).

$ astmkcatalog seg.fits --ids --ra --dec --w3 --sum --output=cat.fits

$ asttable cat.fits -h1 -O --txtf64p=5 --head=5
# Column 1: OBJ_ID [counter    ,i32,] Object identifier.
# Column 2: RA     [deg        ,f64,] Flux weighted center (WCS axis 1).
# Column 3: DEC    [deg        ,f64,] Flux weighted center (WCS axis 2).
# Column 4: AWAV   [m          ,f64,] Flux weighted center (WCS axis 3).
# Column 5: SUM    [input-units,f32,] Sum of sky subtracted values.
1  3.99677e+01   -1.58660e+00   4.82994e-07   7.311189e+02
2  3.99660e+01   -1.58927e+00   4.86411e-07   7.872681e+03
3  3.99682e+01   -1.59141e+00   4.90609e-07   1.314548e+03
4  3.99677e+01   -1.58666e+00   4.90816e-07   7.798024e+02
5  3.99659e+01   -1.58930e+00   4.93657e-07   3.255210e+03

Besides the single-valued measurements above (that are shared with 2D inputs), on 3D cubes, MakeCatalog can also do per-slice measurements. The options for these measurements are formatted as --*in-slice. With the command below, you can check their list:

$ astmkcatalog --help | grep in-slice
 --area-in-slice        [3D input] Number of labeled in each slice.
 --area-other-in-slice  [3D input] Area of other lab. in projected area.
 --area-proj-in-slice   [3D input] Num. voxels in '--sum-proj-in-slice'.
 --sum-err-in-slice     [3D input] Error in '--sum-in-slice'.
 --sum-in-slice         [3D input] Sum of values in each slice.
 --sum-other-err-in-slice   [3D input] Area in '--sum-other-in-slice'.
 --sum-other-in-slice   [3D input] Sum of other lab. in projected area.
 --sum-proj-err-in-slice   [3D input] Error of '--sum-proj-in-slice'.
 --sum-proj-in-slice    [3D input] Sum of projected area in each slice.

For every label and measurement, these options will give many values in a vector column (see Vector columns). Let’s have a look by asking for the sum of values and the area of each label in each slice with the command below. There is just one important point: in 3D detection with NoiseChisel, we ran NoiseChisel on the signal-to-noise image, not the continuum-subtracted image! So the values to use for the measurement of each label should come from the no-continuum.fits file (not seg.fits).

$ astmkcatalog seg.fits --ids --ra --dec --w3 --sum  \
               --area-in-slice --sum-in-slice --output=cat.fits \
               --valuesfile=no-continuum.fits

$ asttable -i cat.fits
--------
cat.fits (hdu: 1)
-------          -----       ----          -------
No.Name          Units       Type          Comment
-------          -----       ----          -------
1  OBJ_ID        counter     int32         Object identifier.
2  RA            deg         float64       Flux wht center (WCS 1).
3  DEC           deg         float64       Flux wht center (WCS 2).
4  AWAV          m           float64       Flux wht center (WCS 3).
5  SUM           input-units float32       Sum of sky-subed values.
6  AREA-IN-SLICE counter     int32(3681)   Number of pix. in each slice.
7  SUM-IN-SLICE  input-units float32(3681) Sum of values in each slice.
--------
Number of rows: 211
--------

You can see that the new AREA-IN-SLICE and SUM-IN-SLICE columns have a (3681) in their types. This shows that unlike the single-valued columns before them, in these columns, each row has 3681 values (a “vector” column). If you are not already familiar with vector columns, please take a few minutes to read Vector columns. Since a MUSE data cube has 3681 slices, this is effectively the spectrum of each object.

Let’s find the object that corresponds to the H-alpha emission of the brightest galaxy (that we found in Viewing spectra and redshifted lines). That emission line was around 8565.93 Angstroms, so let’s look for the objects within \(\pm5\) Angstroms of that value (between 8560 and 8570 Angstroms):

$ asttable cat.fits --range=AWAV,8.560e-7,8.570e-7 -cobj_id,ra,dec -Y
198    39.965897   -1.589279

From the command above, we see that at this wavelength, there was only one object. Let’s extract its spectrum by asking for the --area-in-slice and --sum-in-slice columns:

$ asttable cat.fits --range=AWAV,8.560e-7,8.570e-7 \
           -carea-in-slice,sum-in-slice

If you look at the output, you will see that it is a single line! It contains a long list of 0 and NaN values at the start and end. If you scroll slowly, in the middle of each vector you will see some non-zero and non-NaN numbers. To help interpret this more easily, let’s transpose these vector columns (so each value of the vector column becomes a row in the output). We will use the --transpose option of Table for this (just note that since transposition changes the number of rows, it can only be used when your table only has vector columns and they all have the same number of elements, as in this case):

$ asttable cat.fits --range=AWAV,8.560e-7,8.570e-7 \
           -carea-in-slice,sum-in-slice --transpose

We now see the measurements on each slice printed on a separate line (making them much easier to read visually). However, without a counter, it is very hard to interpret them. Let’s pipe the output to a new Table command and use column arithmetic’s counter operator to display the slice number (see Size and position operators). Note that since we are piping the output, we also added -O so the column metadata are passed on to the new instance of Table:

$ asttable cat.fits --range=AWAV,8.560e-7,8.570e-7 -O \
           -carea-in-slice,sum-in-slice --transpose \
           | asttable -c'arith $1 counter swap',2
...[[truncated]]...
3040   0       nan
3041   0       nan
3042   0       nan
3043   0       nan
3044   1       4.311140e-01
3045   18      3.936019e+00
3046   161    -5.800080e+00
3047   360     2.967184e+02
3048   625     1.912855e+03
3049   823     5.140487e+03
3050   945     7.174101e+03
3051   999     6.967604e+03
3052   1046    6.468591e+03
3053   1025    6.457354e+03
3054   996     6.599119e+03
3055   966     6.762280e+03
3056   873     5.014052e+03
3057   649     2.003334e+03
3058   335     3.167579e+02
3059   131     1.670975e+01
3060   25     -2.953789e+00
3061   0       nan
3062   0       nan
3063   0       nan
3064   0       nan
...[[truncated]]...

$ astscript-fits-view seg.fits

After DS9 opens with the last command above, go to slice 3044 (which is the first non-NaN slice in the spectrum above). In the OBJECTS extension of this slice, you see several non-zero pixels. The few non-zero pixels on the bottom have a label of 197 and the single non-zero pixel at a higher Y axis position has a label of 198 (which as we saw above, was the label of the H-alpha emission of this galaxy). The few 197 labeled pixels in this slice are the last voxels of the NII emission that is just blue-ward of H-alpha.

The single pixel you see in slice 3044 is why you see a value of 1 in the AREA-IN-SLICE column. As you go to the next slices, if you count the pixels, you will see they add up to the same numbers you see in that column. The values in SUM-IN-SLICE are the sum of the values in the continuum-subtracted cube for those same voxels. You should now be able to understand why the --sum-in-slice column has NaN values in all other slices: because this label doesn’t exist in any other slice! Also, within slices that contain label 198, this column only uses the voxels that have the label. So, as you see in the second column above, the area that is used changes from slice to slice.

Therefore, --sum-in-slice or --area-in-slice are the raw 3D spectrum of each 3D emission line. This is a different concept from the traditional “spectrum”, where the same area is used over all the slices. To get that, you should use the --sum-proj-in-slice column of MakeCatalog. All the --*in-slice options that contain proj in their name are measurements over the fixed “projection” of the 3D volume on the 2D surface of each slice. To see the effect, let’s also ask MakeCatalog to measure this projected sum column:

$ astmkcatalog seg.fits --ids --ra --dec --w3 --sum  \
               --area-in-slice --sum-in-slice --sum-proj-in-slice \
               --output=cat.fits --valuesfile=no-continuum.fits
$ asttable cat.fits --range=AWAV,8.560e-7,8.570e-7 -O \
           -carea-in-slice,sum-in-slice,sum-proj-in-slice \
           --transpose \
           | asttable -c'arith $1 counter swap',2,3
...[[truncated]]...
3040   0       nan            8.686357e+02
3041   0       nan            4.384907e+02
3042   0       nan            4.994813e+00
3043   0       nan           -1.595918e+02
3044   1       4.311140e-01  -2.793141e+02
3045   18      3.936019e+00  -3.251023e+02
3046   161    -5.800080e+00  -2.709914e+02
3047   360     2.967184e+02   1.049625e+02
3048   625     1.912855e+03   1.841315e+03
3049   823     5.140487e+03   5.108451e+03
3050   945     7.174101e+03   7.149740e+03
3051   999     6.967604e+03   6.913166e+03
3052   1046    6.468591e+03   6.442184e+03
3053   1025    6.457354e+03   6.393185e+03
3054   996     6.599119e+03   6.572642e+03
3055   966     6.762280e+03   6.716916e+03
3056   873     5.014052e+03   4.974084e+03
3057   649     2.003334e+03   1.870787e+03
3058   335     3.167579e+02   1.057906e+02
3059   131     1.670975e+01  -2.415764e+02
3060   25     -2.953789e+00  -3.534623e+02
3061   0       nan           -3.745465e+02
3062   0       nan           -2.532008e+02
3063   0       nan           -2.372232e+02
3064   0       nan           -2.153670e+02
...[[truncated]]...

As you see, in the new SUM-PROJ-IN-SLICE column, we have a measurement in each slice: including slices that do not have the label of 198 at all. Also, the area used to measure this sum is the same in all slices (similar to a classical spectrometer’s output).

However, there is a big problem: have a look at the projected sums in slices 3040 and 3041: the values are increasing (as you go to bluer slices)! This is because of the emission of the NII line that also falls over the projected area of H-alpha. This shows the power of IFUs as opposed to classical spectrometers: we can distinguish between individual lines based on their spatial position and do measurements in 3D!

Finally, in case you want the spectrum with the continuum, you just have to change the file given to --valuesfile:

$ astmkcatalog seg.fits --ids --ra --dec --w3 --sum  \
               --area-in-slice --sum-in-slice --sum-proj-in-slice \
               --valuesfile=a370-crop.fits \
               --output=cat-with-continuum.fits

2.5.6 Extracting a single spectrum and plotting it

In 3D measurements and spectra we measured the spectra of all the objects in the MUSE data cube of this demonstration tutorial. Let’s now write the resulting spectrum of our object 198 into a file, so we can view it in TOPCAT for a more visual inspection. But we don’t want slice numbers (which are specific to MUSE); we want the horizontal axis to be in Angstroms. To do that, we can use the WCS information in the following keywords:

CRPIX3

The “Coordinate Reference PIXel” in the 3rd dimension (the reference slice number). Let’s call this \(s_r\).

CRVAL3

The “Coordinate Reference VALue” in the 3rd dimension (the WCS coordinate of the slice given in CRPIX3). Let’s call this \(\lambda_r\).

CDELT3

The “Coordinate DELTa” in the 3rd dimension, or how much the WCS changes with every slice. Let’s call this \(\delta\).

To find the \(\lambda\) (wavelength) of any slice with number \(s\), we can simply use this equation:

$$\lambda=\lambda_r+\delta(s-s_r)$$

Let’s extract these three values from the FITS WCS keywords as shell variables to automatically do this within Table’s column arithmetic. Here we are using the technique that is described in Separate shell variables for multiple outputs.

$ eval $(astfits seg.fits --keyvalue=CRPIX3,CRVAL3,CDELT3 -q \
                 | xargs printf "sr=%s; lr=%s; d=%s;")

## Just for a check:
$ echo $sr
1.000000e+00
$ echo $lr
4.749679687500000e-07
$ echo $d
1.250000000000000e-10
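
Just as a sanity check, we can already evaluate the equation above for one slice directly on the command line. The small AWK sketch below is purely illustrative (it is not part of the spectrum extraction) and uses slice 3050 as an arbitrary example from the spectrum tables above:

## Illustrative check: wavelength of slice 3050 in Angstroms.
$ awk -v s=3050 -v sr=$sr -v lr=$lr -v d=$d \
      'BEGIN{ printf "%.2f\n", (lr + d*(s-sr))*1e10 }'
8560.93

This is within the wavelength range we used above to select the H-alpha label (8560 to 8570 Angstroms), confirming that the constants were extracted correctly.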

Now that we have the necessary constants, we can simply convert the equation above into reverse Polish notation and use column arithmetic to convert the slice counter into wavelength within the command from 3D measurements and spectra.

$ asttable cat.fits --range=AWAV,8.560e-7,8.570e-7 -O \
       -carea-in-slice,sum-in-slice,sum-proj-in-slice \
       --transpose \
       | asttable -c'arith $1 counter '$sr' - '$d' x '$lr' + f32 swap' \
                  -c2,3 --output=spectrum-obj-198.fits \
                  --colmetadata=1,WAVELENGTH,m,"Wavelength of slice." \
                  --colmetadata=2,"AREA-IN-SLICE",voxel,"No. of voxels."

$ astscript-fits-view spectrum-obj-198.fits

Once TOPCAT opens, take the following steps:

  1. In the “Graphics” menu, select “Plane plot”.
  2. Change AREA-IN-SLICE to SUM-PROJ-IN-SLICE.
  3. Select the “Form” tab.
  4. Click on the button with the large green “+” sign and select “Add line”.
  5. Un-select the “Mark” item that was originally selected.

Of course, the table in spectrum-obj-198.fits can be plotted using any other plotting tool you prefer to use in your scientific papers.


2.5.7 Pseudo narrow-band images

In Continuum subtraction we subtracted/separated the continuum from the emission/absorption lines of our galaxy in the MUSE cube. Let’s visualize the morphology of the galaxy at some of the spectral lines to see how it looks. To do this, we will create pseudo narrow-band 2D images by collapsing the cube along the third dimension over a wavelength range that is optimized for that line’s flux.

Let’s find the wavelength range that corresponds to the H-alpha emission we studied in Extracting a single spectrum and plotting it. Fortunately, MakeCatalog can calculate the minimum and maximum position of each label along each dimension with the command below. If you regularly need these values, you can include these columns in the same MakeCatalog call that produced --sum-proj-in-slice; here we are running it separately to help you follow the discussion.

$ astmkcatalog seg.fits --output=cat-ranges.fits \
               --ids --min-x --max-x --min-y --max-y --min-z --max-z

Let’s extract the minimum and maximum positions of this particular object with the first command below. With the second command, we write those six values into a single string in the format of Crop’s section syntax (see Crop section syntax). For more on the eval-based shell trick used here, see Separate shell variables for multiple outputs. Finally, we run Crop and view the cropped 3D cube.

$ asttable cat-ranges.fits --equal=OBJ_ID,198 \
                  -cMIN_X,MAX_X,MIN_Y,MAX_Y,MIN_Z,MAX_Z
56     101    11     61     3044   3060

$ eval $(asttable cat-ranges.fits --equal=OBJ_ID,198 \
                  -cMIN_X,MAX_X,MIN_Y,MAX_Y,MIN_Z,MAX_Z \
                  | xargs printf "section=%s:%s,%s:%s,%s:%s; ")

$ astcrop no-continuum.fits --mode=img --section=$section \
          --output=crop-no-continuum.fits

$ astscript-fits-view crop-no-continuum.fits

Go through the slices and you will only see this particular region of the full cube. We can now collapse the third dimension of this image into a 2D pseudo-narrow band image with Arithmetic’s Dimensionality changing operators:

$ astarithmetic crop-no-continuum.fits 3 collapse-sum \
                --output=collapsed-all.fits

$ astscript-fits-view collapsed-all.fits

During the collapse, all the pixels in each slice were used. This is not good for the faint outskirts of the emission line’s peak: the noise of the slices with less signal decreases the overall signal-to-noise ratio in the pseudo-narrow-band image. So let’s set all the pixels that aren’t labeled with this object to NaN, then collapse. To do that, we first need to crop the OBJECTS cube in seg.fits. With the second command, please have a look to confirm how the labels change as a function of wavelength.

$ astcrop seg.fits -hOBJECTS --mode=img --section=$section \
          --output=crop-obj.fits

$ astscript-fits-view crop-obj.fits

Let’s use Arithmetic to set all the pixels of crop-no-continuum.fits that are not equal to 198 in crop-obj.fits to NaN, and then collapse the result (producing collapsed-obj.fits). With the second command, we are opening the two collapsed images together:

$ astarithmetic crop-no-continuum.fits set-i \
                crop-obj.fits          set-o \
                i o 198 ne nan where 3 collapse-sum \
                --output=collapsed-obj.fits

$ astscript-fits-view collapsed-all.fits collapsed-obj.fits \
                      --ds9extra="-lock scalelimits yes -blink"

Let it blink a few times and focus on the outskirts: you will see that the diffuse flux in the outskirts has indeed been preserved better in the object-based collapsed narrow-band image. But this is a little hard to appreciate in the 2D image. To see it better in practice, let’s obtain the radial profiles of both images. We will approximately assume a position angle of -80 degrees and an axis ratio of 0.7. We are also undersampling the radial profiles to have a better signal-to-noise ratio at the outer radii:

$ astscript-radial-profile collapsed-all.fits \
           --position-angle=-80 --axis-ratio=0.7 \
           --undersample=2 --output=collapsed-all-rad.fits

$ astscript-radial-profile collapsed-obj.fits \
           --position-angle=-80 --axis-ratio=0.7 \
           --undersample=2 --output=collapsed-obj-rad.fits

To view the difference, let’s merge the two profiles (the MEAN column) into one table and simply print the two profiles beside each other. We will then pipe the resulting table containing both columns to a second call to Gnuastro’s Table and use column arithmetic to subtract the two mean values and divide the difference by the optimized one (to get the fractional difference):

$ asttable collapsed-all-rad.fits --catcolumns=MEAN -O \
           --catcolumnfile=collapsed-obj-rad.fits \
           | asttable -c1,2,3 -c'arith $3 $2 - $3 /' \
                      --colmetadata=2,MEAN-ALL \
                      --colmetadata=3,MEAN-OBJ \
                      --colmetadata=4,DIFF,frac,"Fractional diff." -YO
# Column 1: RADIUS   [pix        ,f32,] Radial distance
# Column 2: MEAN-ALL [input-units,f32,] Mean of sky subtracted values.
# Column 3: MEAN-OBJ [input-units,f32,] Mean of sky subtracted values.
# Column 4: DIFF     [frac       ,f32,] Fractional diff.
0.000          436.737        450.256        0.030
2.000          371.880        384.071        0.032
4.000          313.429        320.138        0.021
6.000          275.744        280.102        0.016
8.000          152.214        154.470        0.015
10.000         59.311         62.207         0.047
12.000         18.466         20.396         0.095
14.000         6.940          8.671          0.200
16.000         3.052          4.256          0.283
18.000         1.590          2.848          0.442
20.000         1.430          2.550          0.439
22.000         0.838          1.975          0.576

As you see, beyond a radius of 10 pixels, the fractional difference (last column) becomes very large, showing that a lot of signal is missing in the MEAN-ALL column. For a more visual comparison of the two profiles, you can use the command below to open both tables in TOPCAT:

$ astscript-fits-view collapsed-all-rad.fits \
                      collapsed-obj-rad.fits

Once TOPCAT has opened, take the following steps:

  1. Select collapsed-all-rad.fits
  2. In the “Graphics” menu, select “Plane Plot”.
  3. Click on the “Axes” side-bar (by default, at the bottom half of the window), and click on “Y Log” to view the vertical axis in logarithmic scale.
  4. In the “Layers” menu, select “Add Position Control”. You will see that at the bottom half, a new scatter plot information is displayed.
  5. Click on the drop-down menu in front of “Table” and select 2: collapsed-obj-rad.fits. Afterwards, you will see the radial profile of the optimized pseudo-narrow-band image as blue points.

2.6 Color images with full dynamic range

Color images are fundamental tools for visualizing astronomical datasets, allowing us to convey valuable physical information within them. A color image is a composite representation derived from different channels, each channel usually corresponding to a different filter (each covering a different wavelength interval of the object’s spectrum). In general, the most common color image formats (like JPEG, PNG or PDF) are defined from a combination of Red-Green-Blue (RGB) channels (to cover the optical range with normal cameras). These three filters are hard-wired in your monitor and in the pixels of most normal cameras (for example smartphones or DSLRs). For more on the concept and usage of colors, see Color and Colormaps for single-channel pixels.

However, normal images (that you take with your smartphone during the day, for example) have a very limited dynamic range (the difference between the brightest and faintest parts of an image). For example, in an image you take of a farm, the brightest pixel (the sky) cannot be more than 255 times the faintest/darkest shadow in the image (because normal cameras produce unsigned 8-bit integers, containing \(2^8=256\) levels; see Numeric data types).

However, astronomical sources span a much wider dynamic range: their central parts can be tens of millions of times brighter than their larger outer regions. Our astronomical images in the FITS format are therefore usually stored as 32-bit floating point values to preserve this information. A simple linear scaling of 32-bit astronomical data to the 8-bit range will therefore put most of the pixels on the darkest level and barely show anything! This presents a major challenge in visualizing our astronomical images on a monitor, in print, or on a projector when showing slides.
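
To get a feeling for the numbers involved, the tiny AWK sketch below (purely illustrative; the pixel values are hypothetical) linearly maps pixel values between a sky level of 1 and a galaxy-core value of \(10^7\) onto the 256 levels of an 8-bit image:

$ awk 'BEGIN{ max=1e7;
        for(i=0; i<=7; i++){
          v=10^i;
          printf "%-8g --> level %3d\n", v, int(v/max*255);
        }
      }'

Everything below roughly \(4\times10^4\) (which includes the sky and all the diffuse structure) lands on level 0, in other words pure black; only the very brightest cores occupy the remaining levels.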

In this tutorial, we review how to prepare your images and create informative RGB images for your PDF reports. We start with aligning the images to the same pixel grid (which is usually necessary!) and using the low-level engine (Gnuastro’s ConvertType program) directly to create an RGB image. Afterwards, we will use a higher-level installed script (Color images with gray faint regions). This is a high-level wrapper over ConvertType that does some pre-processing and stretches the pixel values to enhance their 8-bit representation before calling ConvertType.


2.6.1 Color channels in same pixel grid

In order to use different images as color channels, it is important that the images be properly aligned and on the same pixel grid. When your inputs are high-level products of the same survey, this is usually the case. However, in many other situations the images you plan to use as different color channels lie on different sky positions, even if they may have the same number of pixels. In this section we will show how to solve this problem.

For an example dataset, let’s use the same SDSS field that we used in Detecting large extended targets: the field covering the outer parts of the M51 group. With the commands below, we’ll make an in/ directory, then download and prepare the three g, r and i band images of SDSS over the same field there:

$ mkdir in
$ sdssurl=https://dr12.sdss.org/sas/dr12/boss/photoObj/frames
$ for f in g r i; do \
      wget $sdssurl/301/3716/6/frame-$f-003716-6-0117.fits.bz2 \
           -O$f.fits.bz2; \
      bunzip2 $f.fits.bz2; \
      astfits $f.fits --copy=0 -oin/$f-sdss.fits; \
      rm $f.fits; \
  done

Let’s have a look at the three images with the first command, and get their number of pixels with the second:

## Open the images locked by image coordinates
$ astscript-fits-view in/*-sdss.fits

## Check the number of pixels along each axis of all images.
$ astfits in/*-sdss.fits --keyvalue=NAXIS1,NAXIS2
in/g-sdss.fits    2048   1489
in/i-sdss.fits    2048   1489
in/r-sdss.fits    2048   1489

From the first command, the images look like they cover the same astronomical object (M51) in the same region of the sky, and with the second, we see that they have the same number of pixels. But this general visual inspection does not guarantee that the astronomical objects within the pixel grid cover exactly the same positions (within a pixel!) on the sky. Let’s open the images again, but this time asking DS9 to only show one at a time, and to “blink” between them:

$ astscript-fits-view in/*-sdss.fits \
           --ds9extra="-single -zoom to fit -blink"

If you pay attention, you will see that the objects within each image are at slightly different locations. If you don’t immediately see it, try zooming in to any star within the image and let DS9 continue blinking. You will see that the star jumps a few pixels between each blink.

In essence, the images are not aligned on the same pixel grid, therefore, the same source does not share identical image coordinates across these three images. As a consequence, it is necessary to align the images before making the color image, otherwise this misalignment will generate multiply-peaked point-sources (stars and centers of galaxies) and artificial color gradients in the more diffuse parts. To align the images to the same pixel grid, we will employ Gnuastro’s Warp program. In particular, its features to Align pixels with WCS considering distortions.

Let’s take the middle (r band) filter as the reference to define our grid. With the first command after building the aligned/ directory, let’s align the r filter to the celestial coordinates (so the M51 group’s position angle doesn’t depend on the orientation of the telescope when it took this image). For the other two filters, we will use Warp’s --gridfile option to ensure that their pixel grid and WCS exactly match the r band image, while the pixel values come from the g and i filters. Finally, in the last command, we’ll visualize the three aligned images.

## Put all three channels in the same pixel grid.
$ mkdir aligned
$ astwarp in/r-sdss.fits --output=aligned/r-sdss.fits
$ astwarp in/g-sdss.fits --output=aligned/g-sdss.fits \
          --gridfile=aligned/r-sdss.fits
$ astwarp in/i-sdss.fits --output=aligned/i-sdss.fits \
          --gridfile=aligned/r-sdss.fits

## Open the images locked by image coordinates
$ astscript-fits-view aligned/*-sdss.fits \
           --ds9extra="-single -zoom to fit -blink"

As the images blink between each other, zoom in to some of the smaller stars and you will see that they no longer jump from one blink to the next. These images are now precisely pixel-aligned. We are now equipped with the essential data to proceed with the color image generation in Color image using linear transformation.


2.6.2 Color image using linear transformation

Previously (in Color channels in same pixel grid), we downloaded three SDSS filters of M51 and described how you can put them all in the same pixel grid. In this section, we will explore the raw and low-level process of generating color images using the input images (without modifying the pixel value distributions). We will use Gnuastro’s ConvertType program (with executable name astconvertt).

Let’s create our first color image using the aligned SDSS images mentioned in the previous section. The order in which you provide the images matters, so ensure that you sort the filters from redder to bluer (iSDSS and gSDSS are respectively the reddest and bluest of the three filters used here).

$ astconvertt aligned/i-sdss.fits aligned/r-sdss.fits \
              aligned/g-sdss.fits -g1 --output=m51.pdf

Other color formats: In the example above, we are using PDF because this is usually the best format to later also insert marks that are commonly necessary in scientific publications (see Marking objects for publication). But you can also generate JPEG and TIFF outputs simply by using a different suffix for your output file (for example, --output=m51.jpg or --output=m51.tiff).

Open the image with your PDF viewer and have a look. Do you see something? Initially, it appears predominantly black. However, upon closer inspection, you will discern very tiny points where some color is visible. These points correspond to the brightest part of the brightest sources in this field! The reason you saw much more structure when looking at the image in DS9 previously in Color channels in same pixel grid was that astscript-fits-view used DS9’s -zscale option to scale the values in a non-linear way! Let’s have another look at the images with the linear minmax scaling of DS9:

$ astscript-fits-view aligned/*-sdss.fits \
           --ds9extra="-scale minmax -lock scalelimits"

You see that it looks very similar to the PDF we generated above: almost fully black! This phenomenon exemplifies the challenge discussed at the start of this tutorial (Color images with full dynamic range). Given the vast number of pixels close to the sky background level compared to the relatively few very bright pixels, visualizing the entire dynamic range simultaneously is tricky.

To address this challenge, the low-level ConvertType program allows you to selectively choose the pixel value ranges to be displayed in the color image. This can be accomplished using the --fluxlow and --fluxhigh options of ConvertType. Pixel values below --fluxlow are mapped to the minimum value (displayed as black in the default colormap), and pixel values above --fluxhigh are mapped to the maximum value (displayed as white). The choice of these values depends on the pixel value distribution of the images.

But before that, we have to account for an important difference between the filters: the brightness of the background also differs between filters (the sky has colors!). So before making more progress, you generally first have to subtract the sky from all three images you want to feed to the color channels. In a previous tutorial (Detecting large extended targets) we used these same images to show how you can do very good sky subtraction in the presence of large extended objects like M51. Here we are just doing a visualization and bringing pixels to 8-bit, so we don’t need the level of precision reached there (we won’t be doing photometry!). Therefore, let’s just run NoiseChisel with --tilesize=100,100 as in that tutorial.

$ mkdir no-sky
$ for f in i r g; do \
    astnoisechisel aligned/$f-sdss.fits --tilesize=100,100 \
                   --output=no-sky/$f-sdss.fits; \
  done

Accounting for zero points: An important step that we have not implemented in this section is to unify the zero points of the three filters. In the case of SDSS (and some other surveys), the images have already been brought to the same zero point, but that is not generally the case. So before subtracting sky (and estimating the standard deviation) you should also unify the zero points of your images (for example through Arithmetic’s counts-to-nanomaggy or counts-to-jy described in Unit conversion operators). If you don’t already have the zero point of your images, see the dedicated tutorial: Zero point of an image.

Now that we know the noise fluctuates around zero in all three images, we can start to define the values for --fluxlow and --fluxhigh. But the sky standard deviation depends on the sky brightness, which differs between filters! Let’s have a look by taking the median value of the SKY_STD extension of NoiseChisel’s output:

$ aststatistics no-sky/i-sdss.fits -hSKY_STD --median
2.748338e-02

$ aststatistics no-sky/r-sdss.fits -hSKY_STD --median
1.678463e-02

$ aststatistics no-sky/g-sdss.fits -hSKY_STD --median
9.687680e-03

You see that the sky standard deviation of the reddest filter (i) is almost three times that of the bluest filter (g)! This is usually the case (redder emission requires much less energy and gets absorbed less, so the background is usually brighter in the redder filters). As a result, we should define our limits based on the noise of the reddest filter. Let’s set the minimum flux to 0 and the maximum flux to roughly 50 times the noise of the i-band image (\(0.027\times50=1.35\)).
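
If you prefer to derive this threshold on the command line instead of by hand, you can pipe the same aststatistics measurement into AWK (a minimal sketch; the small difference with 1.35 is only because the text above rounded the median to 0.027):

$ aststatistics no-sky/i-sdss.fits -hSKY_STD --median \
                | awk '{printf "%g\n", $1*50}'
1.37417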

$ astconvertt no-sky/i-sdss.fits no-sky/r-sdss.fits no-sky/g-sdss.fits \
              -g1 --fluxlow=0.0 --fluxhigh=1.35 --output=m51.pdf

After opening the new color image, you will observe that a spiral arm of M51 and M51B (or NGC5195, which is interacting with M51) become visible. However, the majority of the image remains black. Feel free to experiment with different values of --fluxhigh to set the maximum value closer to the noise level and see the more diffuse structures. For instance, with --fluxhigh=0.27 the brightest pixels will have a signal-to-noise ratio of 10, and with --fluxhigh=0.135, a signal-to-noise ratio of 5. But you will notice that the brighter areas of the galaxy become "saturated": you no longer see the structure of the brighter parts of the galaxy. As you bring down the maximum threshold, the saturated areas also increase in size: losing some useful information on the bright side!

Let’s go to the extreme and decrease the threshold to near the noise level (for example, --fluxhigh=0.027 to have a signal-to-noise ratio of 1)! You will see that the noise now becomes colored! You generally don’t want this, because the differences between the filter values of one pixel are only physically meaningful when they have a high signal-to-noise ratio. For lower signal-to-noise ratios, we should avoid color.

Ideally, we want to see both the brighter parts of the central galaxy and the fainter diffuse parts together! But with the simple linear transformation used here, that is not possible. You need some pre-processing (before calling ConvertType) to scale the images. For example, you can experiment with taking the logarithm or the square root of the images (using Arithmetic) before creating the color image.
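
For example, the commands below are a rough sketch of such an experiment (not a polished recipe): they take the square root of the sky-subtracted images with Arithmetic’s sqrt operator and feed the results to ConvertType. Note that negative (noise) pixels become NaN under the square root, and that the upper threshold also has to be scaled accordingly (roughly \(\sqrt{1.35}\approx1.16\)); the file names ending in -sqrt are just hypothetical names for this sketch.

## Rough sketch: square-root stretch before ConvertType.
$ for f in i r g; do \
    astarithmetic no-sky/$f-sdss.fits -hINPUT-NO-SKY sqrt \
                  --output=no-sky/$f-sdss-sqrt.fits; \
  done

$ astconvertt no-sky/i-sdss-sqrt.fits no-sky/r-sdss-sqrt.fits \
              no-sky/g-sdss-sqrt.fits -g1 --fluxlow=0.0 \
              --fluxhigh=1.16 --output=m51-sqrt.pdf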

These non-linear functions transform pixel values, mapping them to a new range. After applying such a transformation, you can use the transformed images as inputs to astconvertt to generate color images (similar to how we subtracted the sky, which is a linear operation). In addition, it is possible to use a different color scheme for showing the different brightness ranges, as explained in the next section (Color for bright regions and grayscale for faint), where we’ll review a high-level installed script that simplifies all this pre-processing and helps you produce images with more information in them.


2.6.3 Color for bright regions and grayscale for faint

In the previous sections we aligned three SDSS images of the M51 group (Color channels in same pixel grid) and created a linearly-scaled color image (using only the astconvertt program) in Color image using linear transformation. But we saw that showing the brighter and fainter parts of the galaxy in a single image is impossible with a linear scale! In this section, we will use Gnuastro’s astscript-color-faint-gray installed script to address this problem and create images that visualize a major fraction of the contents of our astronomical data.

This script aims to solve the problems mentioned in the previous section. See Infante-Sainz et al. 2024, which first introduced this script, for examples of the final images we will be producing in this tutorial. The script uses a non-linear transformation to modify the bright input values before combining them to produce the color image. Furthermore, for the faint regions of the image, it uses grayscale and avoids color altogether (as we saw, colored noise is not nice to look at!). The faint regions are also inverted: the brightest pixel in the faint (black-and-white or grayscale) region is black and the faintest pixels are white. Black therefore creates a smooth transition from the colored bright pixels: the faintest colored pixel is also black. Since the background is white and the diffuse parts are black, the final product will also look nice in print or on a projector (the background is not black, but white!).

The SDSS images we used in the previous sections don’t show the full glory of the M51 group! Therefore, in this section, we will use the wider images from the J-PLUS survey. Fortunately, J-PLUS includes the SDSS filters, so we can use the same iSDSS, rSDSS, and gSDSS filters. As a consequence, similar to the previous section, the R, G, and B channels are respectively mapped to the iSDSS, rSDSS and gSDSS filters of J-PLUS.

The J-PLUS identification numbers of the images containing the M51 galaxy group in these three filters are respectively 92797, 92801 and 92803. The J-PLUS images are already sky subtracted and aligned to the same pixel grid (so we will not need the astwarp and astnoisechisel steps used before). However, the zero point magnitudes of the J-PLUS images are different: 23.43, 23.74 and 23.74. Also, the field of view of the J-PLUS camera is very large and we only need a small region to see the M51 galaxy group. Therefore, we will crop the region around the M51 group with a width of 0.35 degrees (or 21 arcmin) and put the crops in the same aligned/ directory we made before (which contains the inputs to the color images). With all the above information, let’s download, crop, and have a look at the images to check that everything is fine. Finally, let’s run astscript-color-faint-gray on the three cropped images.

## Download
$ url=https://archive.cefca.es/catalogues/vo/siap/jplus-dr3/get_fits?id=
$ wget "$url"92797 -Oin/i-jplus.fits.fz
$ wget "$url"92801 -Oin/r-jplus.fits.fz
$ wget "$url"92803 -Oin/g-jplus.fits.fz

## Crop
$ widthdeg=0.35
$ ra=202.4741207
$ dec=47.2171879
$ for f in i r g; do \
    astcrop in/$f-jplus.fits.fz --center=$ra,$dec \
            --width=$widthdeg --output=aligned/$f-jplus.fits; \
  done

## Visual inspection of the images used for the color image
$ astscript-fits-view aligned/*-jplus.fits

## Create colored image.
$ R=aligned/i-jplus.fits
$ G=aligned/r-jplus.fits
$ B=aligned/g-jplus.fits
$ astscript-color-faint-gray $R $G $B -g1 --output=m51.pdf

After opening the PDF, you will notice that it is a color image with a gray background, making the M51 group and background galaxies visible together. However, the image does not look nice yet and there is significant room for improvement! You will also notice that at the end of its operation, the script printed the automatically estimated values of four options in a small table. To enhance the output, let’s go through these options and explain them step by step.

The first important point to take into account is the photometric calibration. If the images are photometrically calibrated, then it is necessary to use the calibration to put the images in the same physical units and create “real” colors. The script can do this through the zero point magnitudes with the option --zeropoint (or -z). With this option, the images are internally transformed to have the same pixel units before creating the color image. Since the magnitude zero points are 23.43, 23.74 and 23.74 for the i, r, and g images respectively, let’s use them as shown below:

$ astscript-color-faint-gray $R $G $B -g1 --output=m51.pdf \
                           -z23.43 -z23.74 -z23.74

Open the image and have a look. This image does not differ too much from the one generated by default (not using the zero point magnitudes). This is because the zero point values used here are similar for the three images. But in other datasets the calibration could make a big difference!

Let’s consider another vital parameter: the minimum value to be displayed (--minimum or -m). Pixel values below this number will not be shown on the color image. In general, if the sky background has been subtracted (see Color image using linear transformation), you can use the same value (0) for all three. However, it is possible to give a different minimum value for each input (in this case, use as many -m options as input images). In this particular case, a minimum value of zero for all images is suitable. To keep the command simple, we’ll add the zero point, minimum and HDU of each image to the variable that also holds its file name.

$ R="aligned/i-jplus.fits -h1 --zeropoint=23.43 --minimum=0.0"
$ G="aligned/r-jplus.fits -h1 --zeropoint=23.74 --minimum=0.0"
$ B="aligned/g-jplus.fits -h1 --zeropoint=23.74 --minimum=0.0"
$ astscript-color-faint-gray $R $G $B --output=m51.pdf

In contrast to the previous image, the new PDF (with a minimum value of zero) exhibits a better background visualization, because negative pixels are no longer included in the scaling (they are shown as white).

Now let’s briefly review how the script modifies the pixel value distribution in order to show the entire dynamic range in an appropriate way. The script combines the three images into a single one using the mean operator; as a consequence, the combined image is the average of the three R, G, and B images. This averaged image is used for the asinh transformation of Lupton et al. 2004, which is controlled by two parameters: --qbright (\(q\)) and --stretch (\(s\)).

The asinh transformation consists of transforming the combined image (\(I\)) according to the expression \(f(I) = asinh(q\times{}s\times{}I)/q\). When \(q\rightarrow0\), the expression becomes linear with a slope given by the “stretch” (\(s\)) parameter: \(f(I) = s\times{}I\). In practice, we can use this behavior to first set a low value for --qbright and see the brighter parts in color, while changing the parameter --stretch to show the fainter regions (for example, the outskirts of the galaxies) linearly. The image obtained previously was computed with the default parameters (--qbright=1.0 and --stretch=1.0). Before changing them, let’s get a rough numerical feeling for this transformation with the small sketch below; afterwards, we will set a lower value for --qbright and check the result.
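
The AWK sketch below is just an illustration (it is not part of the script): it evaluates \(f(I) = asinh(q\times{}s\times{}I)/q\) for a few pixel values with \(q=0.01\) and \(s=100\). Since AWK has no built-in asinh, the equivalent \(\ln(x+\sqrt{x^2+1})\) is used:

$ awk 'BEGIN{ q=0.01; s=100;
        for(I=1; I<=1000; I*=10){
          x=q*s*I;
          printf "I=%-6g  f(I)=%.1f\n", I, log(x+sqrt(x*x+1))/q;
        }
      }'
I=1       f(I)=88.1
I=10      f(I)=299.8
I=100     f(I)=529.8
I=1000    f(I)=760.1

A thousand-fold increase in the input only increases the output by a factor of about nine: the bright end is strongly compressed, while faint values stay close to the linear \(s\times{}I\) behavior.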

$ astscript-color-faint-gray $R $G $B --output=m51-qlow.pdf \
                             --qbright=0.01

Comparing m51.pdf and m51-qlow.pdf, you will see that a large area of the previously colored pixels has become black. Only the very brightest pixels (the cores of the galaxies and stars) are shown in color. Now, let’s bring out the fainter regions around the brightest pixels linearly by increasing --stretch. This reveals fainter regions, such as the outer parts of galaxies, spiral arms, stellar streams, and similar structures. Please try different values to see the effect of changing this parameter. Here, we will use --stretch=100.

$ astscript-color-faint-gray $R $G $B --output=m51-qlow-shigh.pdf \
                             --qbright=0.01 --stretch=100

Do you see how the spiral arms and the outskirts of the galaxies become visible as --stretch is increased? After some trials, you will get a feeling for how it works. Please play with these two parameters until you obtain the desired result. Depending on the absolute pixel values of the input images and the photometric calibration, the best values of these two parameters will differ. So, when using this script on your own data, take your time to study and analyze which parameters are good for showing the entire dynamic range. For this tutorial, we will keep it simple and use the previous parameters. Let’s define a new variable to hold the parameters already discussed, so we have short command-line examples.

$ params="--qbright=0.01 --stretch=100"
$ astscript-color-faint-gray $R $G $B $params --output=m51.pdf
$ rm m51-qlow.pdf m51-qlow-shigh.pdf

Having a separate color-map for the fainter parts is generally a good thing, but for some reason you may not want it! To disable this feature, you can use the --coloronly option:

$ astscript-color-faint-gray $R $G $B $params --coloronly \
                             --output=m51-coloronly.pdf

Open the image and note that the coloring has now gone all the way into the noise (producing a black background). In contrast with the gray-background images before, the fainter/smaller stars/galaxies and the low surface brightness features are not visible anymore! Those features include the signatures of the interaction of the two galaxies, as well as all the other background galaxies and foreground stars; they are entirely hidden in the “only-color” image. Consequently, the gray-background color scheme is particularly useful for visualizing most of the features of your data and you will rarely need the --coloronly option. We will therefore not use this option anymore in this tutorial; let’s clean up the temporary file made before:

$ rm m51-coloronly.pdf

Now that the basic parameters are set, let’s consider other parameters that allow you to fine-tune the three ranges of values: color for the brightest pixel values, black for intermediate pixel values, and gray for the faintest pixel values:

  • --colorval defines the boundary between the color and black regions (the lowest pixel value that is colored).
  • --grayval defines the boundary between the black and gray regions (the highest gray value).

Looking at the last lines that the script prints, we see that the default values estimated for --colorval and --grayval are both roughly 1.4. What do they mean? To answer this question, it is necessary to have a look at the image that is used to separate those different regions. By default, this image is computed internally by the script and removed at the end. To have a look at it, you need to use the --keeptmp option to keep the temporary files. Let’s put the temporary files into the tmp directory with the options --tmpdir=tmp and --keeptmp: the first uses the name tmp for the temporary directory and with the second, we ask the script to not delete (i.e., keep) it after all operations are done.

$ astscript-color-faint-gray $R $G $B $params --output=m51.pdf \
                             --tmpdir=tmp --keeptmp

The image that defines the thresholds is ./tmp/colorgray_threshold.fits. By default, this image is the asinh-transformed image with pixel values between 0 (faint) and 100 (bright). If you obtain the statistics of this image, you will see that its median value is exactly the value that the script reports for --colorval.

$ aststatistics ./tmp/colorgray_threshold.fits

In other words, all pixels between 100 and this value (1.4) on the threshold image will be shown in color. To see its effect, let’s increase this parameter to --colorval=25. By doing this, we expect that only bright pixels (those between 100 and 25 in the threshold image) will be in color.

$ astscript-color-faint-gray $R $G $B $params --colorval=25 \
                             --output=m51-colorval.pdf

Open m51-colorval.pdf and check that it is true! Only the central part of the objects (very bright pixels, those between 100 and 25 on the threshold image) are shown in color. Fainter pixels (below 25 on the threshold image) are shown in black and gray. However, in many situations it is good to be able to show the outskirts of galaxies and low surface brightness features in pure black, while showing the background in gray. To do that, we can use another threshold that separates the black and gray pixels: --grayval.

Similar to --colorval, the --grayval option defines the separation between the pure black and the gray pixels, based on the threshold image. For example, by setting --grayval=5, pixels below 5 in the threshold image will be shown in gray, and brighter pixels will be shown in black up to the value 25. Pixels brighter than 25 are shown in color.

$ astscript-color-faint-gray $R $G $B $params --output=m51-check.pdf \
                             --colorval=25 --grayval=5

Open the image and check that the regions shown in color are smaller (as before), and that there is now a region of pure black around those colored pixels. Pixels fainter than the black region are shown in gray. As explained above, in the gray region, the brightest pixels are black and the faintest are white. It is recommended to experiment with different values around the estimated ones to get a feeling for how they change the image. To get an even better idea of those regions, please run the following example to keep the temporary files and check the labeled image it produces:

$ astscript-color-faint-gray $R $G $B $params --output=m51-check.pdf \
                           --colorval=25 --grayval=5 \
                           --tmpdir=tmp --keeptmp

$ astscript-fits-view tmp/total_mask-2color-1black-0gray.fits

In this segmentation image, pixels equal to 2 will be shown in color, pixels equal to 1 will be shown as pure black, and pixels equal to zero will be shown in gray. By default, the script sets the same value for both thresholds, which means that there are not many pure black pixels. By adjusting the --colorval and --grayval parameters, you can obtain an optimal result that shows the bright and faint parts of your data within one printable image. The values used here are somewhat extreme to illustrate the logic of the procedure, but we encourage you to experiment with values close to the ones estimated by default, in order to have a smooth transition between the three regions (color, black, and gray). The script can provide additional information about the pixel value distributions used to estimate the parameters through the --checkparams option.

To conclude this section of the tutorial, let’s clean up the temporary test files:

$ rm m51-check.pdf m51-colorval.pdf

2.6.4 Manually setting color-black-gray regions

In Color for bright regions and grayscale for faint, we created a non-linear colored image. We used the --colorval and --grayval options to specify which regions to show in gray (faintest values), black (intermediate values) and color (brightest values). We also saw that the script uses a labeled image with three possible values for each pixel to identify how that pixel should be colored.

A useful feature of this script is the possibility of directly providing this labeled image as an input. This expands the possibilities of generating color images in a more quantitative way. In this section, we’ll use this feature with a more physically motivated criterion for selecting these three regions: the surface brightness in the reddest band.

First, let’s generate a surface brightness image from the R channel. That is, the value of each pixel will be in units of surface brightness (mag/arcsec\(^2\)). To do that, we need to obtain the pixel area in arcsec\(^2\) and use the zero point value of the image. Then, the counts-to-sb operator of astarithmetic is used. For more on the conversion of NaN surface brightness values and the sb_sbl value used below (which is roughly the surface brightness limit of this image), see FITS images in a publication.

$ sb_sbl=26
$ sb_zp=23.43
$ sb_img=aligned/i-jplus.fits
$ pixarea=$(astfits $sb_img --pixelareaarcsec2 --quiet)

# Compute the SB image (set NaNs to SB of 26!)
$ astarithmetic $sb_img $sb_zp $pixarea counts-to-sb set-sb \
                sb sb isblank sb $sb_sbl gt or $sb_sbl where \
                --output=sb.fits

# Have a look at the image
$ astscript-fits-view sb.fits --ds9scale=minmax \
                      --ds9extra="-invert"

Remember that sb.fits is a surface brightness image, where lower values are brighter and higher values are fainter. Let’s build the labeled image that defines the regions (regions.fits) step by step, with the following criteria in surface brightness (SB):

\(\rm{SB}<23\)

These are the brightest pixels, we want these in color. In the regions labeled image, these should get a value of 2.

\(23<\rm{SB}<25\)

These are the intermediate pixel values, to see the fainter parts better, we want these in pure black (no change in color in this range). In the regions labeled image, these should get a value of 1.

\(\rm{SB}>25\)

These are the faintest pixel values, we want these in a gray color map (pixels with an SB of 25 will be black and as they become fainter, they will become lighter shades of gray). In the regions labeled image, these should get a value of 0.

# SB thresholds (low and high)
$ sb_faint=25
$ sb_bright=23

# Select the three ranges of pixels.
$ astarithmetic sb.fits set-sb \
                sb $sb_bright lt set-color \
                sb $sb_bright ge sb $sb_faint lt and set-black \
                color 2 u8 x black + \
                --output=regions.fits

# Check the images
$ astscript-fits-view regions.fits

We can now use this labeled image with the --regions option for obtaining the final image with the desired regions (the R, G, B and params shell variables were set previously in Color for bright regions and grayscale for faint):

$ astscript-color-faint-gray $R $G $B $params --output=m51-sb.pdf \
                             --regions=regions.fits

Open m51-sb.pdf and have a look. Do you see how the different regions (SB intervals) have been colored differently? They come from the SB levels we defined, and because we used absolute thresholds in physical units of surface brightness, the visualization is not only a nice-looking color image, but can also be used in scientific analysis.

This is really interesting, because now it is possible to use color images for detecting low surface brightness features while they also provide quantitative measurements. Of course, here we defined this region label image using just two surface brightness values, but it is possible to define any other labeled region image that you may need for your particular purpose.


2.6.5 Weights, contrast, markers and other customizations

Previously (in Manually setting color-black-gray regions) we used absolute thresholds (in units of surface brightness) to select which regions to show in color, black and gray. To keep the previous configuration and avoid long commands, let’s add the previous options to the params shell variable. For better readability, we will also repeat the other shell variables from the previous sections:

$ R="aligned/i-jplus.fits -h1 --zeropoint=23.43 --minimum=0.0"
$ G="aligned/r-jplus.fits -h1 --zeropoint=23.74 --minimum=0.0"
$ B="aligned/g-jplus.fits -h1 --zeropoint=23.74 --minimum=0.0"
$ params="--regions=regions.fits --qbright=0.01 --stretch=100"
$ astscript-color-faint-gray $R $G $B $params --output=m51.pdf

To modify the color balance of the output image, you can weigh the three channels differently with the --weight or -w option. For example, by using -w1 -w1 -w2, you give two times more weight to the blue channel than to the red and green channels:

$ astscript-color-faint-gray $R $G $B $params -w1 -w1 -w2 \
                             --output=m51-weighted.pdf

The colored pixels of the output are much bluer now and the distinction between the two merging galaxies is clearer. However, keep in mind that altering the weights of the different filters can lead to incorrect subsequent analyses by the readers/viewers of this work (for example, they may falsely conclude that the galaxy is blue, not red!). If the reduction and photometric calibration are correct, and the images represent what you consider as the red, green, and blue channels, then the output color image should be suitable without weights.

In certain situations, the combination of channels may not have a traditional color interpretation. For instance, combining an X-ray channel with an optical filter and a far-infrared image can complicate the interpretation in terms of the human understanding of color. But the physical interpretation remains valid, as the different channels (colors in the output) represent different physical phenomena of astronomical sources. Another, easier, example is the use of narrow-band filters such as the H-alpha filter of the J-PLUS survey. This is shown in the bottom-right panel of Figure 1 of Infante-Sainz et al. 2024: in that case, the G channel has been substituted by the image of the H-alpha filter to show the star formation regions. Therefore, please use the weights with caution, as they can significantly affect the output and misinform your readers/viewers.

If you do apply weights, be sure to report them in the caption of the image (besides the filters that were used for each channel). With great power there must also come great responsibility!

Two additional transformations are available to modify the appearance of the output color image. The linear transformation combines bias adjustment and contrast enhancement through the --bias and --contrast options. In most cases, only the contrast adjustment is necessary to improve the quality of the color image. To illustrate the impact of adjusting image contrast, we will generate an image with higher contrast and compare with the previous one.

$ astscript-color-faint-gray $R $G $B $params --contrast=2 \
                             --output=m51-contrast.pdf

When you compare this (m51-contrast.pdf) with the previous output (m51.pdf), you will see that the colored parts are now much clearer! Use this option with caution too, because the bright parts may become saturated.

Another option available for transforming the image appearance is the gamma correction, a non-linear transformation that can be useful in specific cases. You can experiment with different gamma values to observe the impact on the resulting image. Lower gamma values will enhance faint structures, while higher values will emphasize brighter regions. Let’s have a look by giving two very different values to it with the simple loop below:

$ for g in 0.4 2.0; do \
    astscript-color-faint-gray $R $G $B $params --contrast=2 \
             --gamma=$g --output=m51-gamma-$g.pdf; \
  done

Comparing the last three files (m51-contrast.pdf, m51-gamma-0.4.pdf and m51-gamma-2.0.pdf), you will clearly see the effect of the --gamma.

Instead of using a combination of the three input images for the gray background, you can introduce a fourth image that will be used for generating the gray background. This image is referred to as the “K” channel and may be useful when a particular filter is deeper, has unique characteristics, or is one you have built through some custom processing to show the diffuse features better. In this case, this image will be used for defining the --colorval and --grayval thresholds, but the rationale remains the same as explained earlier.

Two additional options are available to smooth different regions by convolving with a Gaussian kernel: --colorkernelfwhm for smoothing color regions and --graykernelfwhm for convolving gray regions. The value specified for these options represents the full width at half maximum of the Gaussian kernel.
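
For example, a minimal sketch that only smooths the gray regions (the value of 3 here is purely illustrative; check the script’s documentation for the units it expects):

$ astscript-color-faint-gray $R $G $B $params --graykernelfwhm=3 \
                             --output=m51-graysmooth.pdf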

Finally, another commonly useful feature is --markoptions: it allows you to mark and label the final output image with vector graphics over the color image. The arguments passed through this option are directly passed to ConvertType for the generation of the output image. This feature was already used in Marking objects for publication of the General program usage tutorial; see there for a more complete introduction.

Let’s create four marks/labels just to illustrate the procedure within astscript-color-faint-gray. First we need to create a table that contains the parameters for creating the marks (coordinates, shape, size, colors, etc.). In order to have an example that can easily be scaled to more marks with more elaborate options, let’s create it in parts: the header with the column names, and then the parameters. With the following commands, we’ll create the header that contains the column metadata.

echo "# Column 1: ra      [pix, f32] RA coordinate"   > markers.txt
echo "# Column 2: dec     [pix, f32] Dec coordinate" >> markers.txt
echo "# Column 3: shape   [none, u8] Marker shape"   >> markers.txt
echo "# Column 4: size    [pix, f32] Marker Size"    >> markers.txt
echo "# Column 5: aratio  [none, f32] Axis ratio"    >> markers.txt
echo "# Column 6: angle   [deg, f32] Position angle" >> markers.txt
echo "# Column 7: color   [none, u8] Marker color"   >> markers.txt

Next, let’s create the parameters that define the markers. With the lines below we create four markers (cross, ellipse, square, and line) at different positions, with different shapes, sizes, and colors. These lines are appended to the header file created previously.

echo "400.00  400.00  3  60.000  0.50  0.000  8"  >> markers.txt
echo "1800.0  400.00  4  120.00  0.30  45.00  58" >> markers.txt
echo "400.00  1800.0  6  180.00  1.00  0.000  85" >> markers.txt
echo "1800.0  1800.0  8  240.00  1.00  -45.0  25" >> markers.txt

Now that we have the table containing the definition of the markers, we can use the --markoptions option of this script. This option will pass whatever is given to it directly to ConvertType, so you can use all the options in Drawing with vector graphics. For this basic example, let’s give it the following options:

markoptions="--mode=img \
             --sizeinarcsec \
             --markshape=shape \
             --markrotate=angle \
             --markcolor=color \
             --marks=markers.txt \
             --markcoords=ra,dec \
             --marksize=size,aratio"

The last step consists of executing the script with the option that provides all the marker options.

$ astscript-color-faint-gray $R $G $B $params --contrast=2 \
                             --markoptions="$markoptions" \
                             --output=m51-marked.pdf

Open m51-marked.pdf and check that the four markers have been printed on the image. With this quick example we have just shown how easily markers can be drawn over images. This task can be automated, for example by plotting markers from a given catalog at specific positions, and so on; see the sketch below. Note that there are many other options for customizing your markers/drawings over an output of ConvertType; see Drawing with vector graphics and Marking objects for publication.
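
For example, a sketch of such automation: assuming a hypothetical catalog cat.fits with pixel-coordinate columns X and Y, and that the header lines created above are already in markers.txt, every source gets marked with a cross (shape 3):

$ asttable cat.fits -cX,Y \
           | awk '{print $1, $2, 3, 60.0, 1.0, 0.0, 8}' >> markers.txt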

Congratulations! By following the tutorial up to this point, we have been able to reproduce three images of Infante-Sainz et al. 2024. You can see the commands that were used to generate them within the reproducible source of that paper at https://codeberg.org/gnuastro/papers/src/branch/color-faint-gray. Remember that this paper is exactly reproducible with Maneage, so you can explore and build the entire paper by yourself. For more on Maneage, see Akhlaghi et al. 2021.

This tutorial provided a general overview of the various options to construct a color image from three different FITS images using the astscript-color-faint-gray script. Keep in mind that the optimal parameters for generating the best color image depend on your specific goals and the quality of your input images. We encourage you to follow this tutorial with the provided J-PLUS images and later with your own dataset. See Color images with gray faint regions for more information, and please consider citing Infante-Sainz et al. 2024 if you use this script in your work (the full BibTeX entry of this paper will be given to you with the --cite option).


2.7 Zero point of an image

The “zero point” of an image is astronomical jargon for the calibration factor of its pixel values; allowing us to convert the raw pixel values to physical units. It is therefore a critical step during data reduction. For more on the definition and importance of the zero point magnitude, see Brightness, Flux, Magnitude and Surface brightness and Zero point estimation.

In this tutorial, we will use Gnuastro’s astscript-zeropoint, to estimate the zero point of a single exposure image from the J-PLUS survey, while using an SDSS image as reference (recall that all SDSS images have been calibrated to have a fixed zero point of 22.5). In this case, both images that we are using were taken with the SDSS r filter. See Eskandarlou et al. 2023.

Same filters and SVO filter database: It is very important that both your images are taken with the same filter. When looking at filter names, don’t forget that different filter systems sometimes use the same name for a filter, such as “R”, which exists in both the Johnson and SDSS filter systems. Hence, if you are confronted with an image in the “R” or “r” filter, double-check exactly which filter system it corresponds to. If you know which observatory your data came from, you can use the SVO database to confirm the similarity of the transmission curves of the filters of your input and reference images. SVO contains the filter data for many observatories worldwide.


2.7.1 Zero point tutorial with reference image

First, let’s create a directory named tutorial-zeropoint to keep things clean, and work inside it. Then, with the commands below, you can download an image from J-PLUS. To speed up the analysis, the image is cropped to a smaller region around its center.

$ mkdir tutorial-zeropoint
$ cd tutorial-zeropoint
$ jplusdr2=http://archive.cefca.es/catalogues/vo/siap/jplus-dr2/reduced
$ wget $jplusdr2/get_fits?id=771463 -O jplus.fits.fz
$ astcrop jplus.fits.fz --center=107.7263,40.1754 \
          --width=0.6 --output=jplus-crop.fits

Although we cropped the J-PLUS image, it is still very large in comparison with the SDSS image (the J-PLUS field of view is almost \(1.5\times1.5\) deg\(^2\), while the field of view of SDSS in each filter is almost \(0.3\times0.5\) deg\(^2\)). Therefore, let’s download two SDSS images (and then decompress them) in the region of the cropped J-PLUS image to have a more accurate result compared to a single SDSS footprint: generally, your zero point estimation will have less scatter with more overlap between your reference image(s) and your input image.

$ sdssbase=https://dr12.sdss.org/sas/dr12/boss/photoObj/frames
$ wget $sdssbase/301/6509/5/frame-r-006509-5-0115.fits.bz2 \
       -O sdss1.fits.bz2
$ wget $sdssbase/301/6573/5/frame-r-006573-5-0174.fits.bz2 \
       -O sdss2.fits.bz2
$ bunzip2 sdss1.fits.bz2
$ bunzip2 sdss2.fits.bz2

To have a feeling of the data, let’s open the three images with astscript-fits-view using the command below. Wait a few seconds to see the three images “blinking” one after another. The largest one is the J-PLUS crop and the two smaller ones that partially cover it in different regions are from SDSS.

$ astscript-fits-view sdss1.fits sdss2.fits jplus-crop.fits \
           --ds9extra="-lock frame wcs -single -zoom to fit -blink yes"

The test above showed that the three images are already astrometrically calibrated (the coverage of the pixel positions on the sky is correct in all of them). To confirm, you can zoom in to a certain object and check it at the pixel level. It is always good to do this visual check when you are confronted with new images (and may not be confident about the accuracy of the astrometry). Do not forget that the goal here is to find the calibration of pixel values; we assume the pixel positions are already calibrated (the image already has a good astrometry).

The SDSS images are Sky-subtracted, while this single-exposure J-PLUS image still contains the counts related to the Sky emission within it. In the J-PLUS survey, the Sky level in each pixel is kept in a separate BACKGROUND_MODEL HDU of jplus.fits.fz; this allows you to use a different Sky if you like. The SDSS image FITS files also have multiple extensions. To understand our inputs, let’s have a fast look at the basic info of each:

$ astfits sdss1.fits
Fits (GNU Astronomy Utilities) 0.22
Run on Fri Apr 14 11:24:03 2023
-----
HDU (extension) information: 'sdss1.fits'.
 Column 1: Index (counting from 0, usable with '--hdu').
 Column 2: Name ('EXTNAME' in FITS standard, usable with '--hdu').
           ('n/a': no name in HDU metadata)
 Column 3: Image data type or 'table' format (ASCII or binary).
 Column 4: Size of data in HDU.
 Column 5: Units of data in HDU (only images).
           ('n/a': no unit in HDU metadata, or HDU is a table)
-----
0      n/a             float32         2048x1489 nanomaggy
1      n/a             float32         2048      n/a
2      n/a             table_binary    1x3       n/a
3      n/a             table_binary    1x31      n/a



$ astfits jplus.fits.fz
Fits (GNU Astronomy Utilities) 0.22
Run on Fri Apr 14 11:21:30 2023
-----
HDU (extension) information: 'jplus.fits.fz'.
 Column 1: Index (counting from 0, usable with '--hdu').
 Column 2: Name ('EXTNAME' in FITS standard, usable with '--hdu').
           ('n/a': no name in HDU metadata)
 Column 3: Image data type or 'table' format (ASCII or binary).
 Column 4: Size of data in HDU.
 Column 5: Units of data in HDU (only images).
           ('n/a': no unit in HDU metadata, or HDU is a table)
-----
0      n/a              no-data         0         n/a
1      IMAGE            float32         9216x9232 adu
2      MASKED_PIXELS    int16           9216x9232 n/a
3      BACKGROUND_MODEL float32         9216x9232 n/a
4      MASK_MODEL       uint8           9216x9232 n/a

Therefore, in order to be able to compare the SDSS and J-PLUS images, we should first subtract the Sky from the J-PLUS image. To do that, we can either subtract the BACKGROUND_MODEL HDU from the IMAGE HDU using Arithmetic, or we can use NoiseChisel to find a good Sky ourselves. As scientists we like to tweak and be creative, so let’s estimate it ourselves with NoiseChisel below. Generally, you may not have a pre-computed Sky model like this, so you should be prepared to subtract the Sky yourself.
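
For reference, the first option (directly subtracting the BACKGROUND_MODEL HDU from the IMAGE HDU) would be a one-command sketch like the one below; it operates on the full (uncropped) jplus.fits.fz, using the HDU names listed above:

## Sketch of the Arithmetic alternative: subtract J-PLUS's own
## background model from its image.
$ astarithmetic jplus.fits.fz jplus.fits.fz - \
                -hIMAGE -hBACKGROUND_MODEL \
                --output=jplus-skysub.fits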

$ astnoisechisel jplus-crop.fits --output=jplus-nc.fits
$ astscript-fits-view jplus-nc.fits

Notice that there is a relatively bright star in the center-bottom of the image. In the “Cube” window, click on the “Next” button to see the DETECTIONS HDU. The large footprint of the bright star is obvious. Press the “Next” button one more time to get to the SKY HDU. You see that in the center-bottom, the footprint of the large star is clearly visible in the measured Sky level. This is not good: the Sky values go above 54 ADU in the center of the star (the white pixels)! This over-subtracted Sky level in part of the image will affect your magnitude measurements and thus the zero point!

In General program usage tutorial, we have a section on NoiseChisel optimization for detection; there is also a full tutorial on this in Detecting large extended targets. Therefore, we will not go into the details of NoiseChisel optimization here. Given the large images of J-PLUS, we will increase the tile size to \(100\times100\) pixels and the number of neighbors used to identify outlying tiles to 50 (these are usually the first parameters you should start editing when you are confronted with a new image). After the second command, check the SKY extension to confirm that there is no footprint of any bright object there. You will still see a gradient, but note the minimum and maximum values of the Sky level: their difference is about 25 times smaller than the noise standard deviation (so statistically speaking, it is pretty flat!).

$ astnoisechisel jplus-crop.fits --output=jplus-nc.fits \
                 --tilesize=100,100 --outliernumngb=50
$ astscript-fits-view jplus-nc.fits


## Check that the gradient in the sky is statistically negligible.
$ aststatistics jplus-nc.fits -hSKY --minimum --maximum \
                | awk '{print $2-$1}'
0.32809
$ aststatistics jplus-nc.fits -hSKY_STD --median
8.377977e+00
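
If you would like to verify that factor yourself, a small sketch like the following (re-using the two aststatistics calls above) computes the ratio directly (it is roughly 25):

$ range=$(aststatistics jplus-nc.fits -hSKY --minimum --maximum \
                        | awk '{print $2-$1}')
$ std=$(aststatistics jplus-nc.fits -hSKY_STD --median)
$ echo $std $range | awk '{print $1/$2}'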

We are now ready to find the zero point! First, let’s run astscript-zeropoint with --help to see the option names (recall that you can see more details of each option in Invoking astscript-zeropoint). For this first run, let’s use the script in its simplest form, keeping only the essential options: the names of the input and reference images (and their HDUs), the name of the output, and a single aperture with a radius of 3 arcsec to start with:

$ astscript-zeropoint --help
$ astscript-zeropoint jplus-nc.fits --hdu=INPUT-NO-SKY \
                      --refimgs=sdss1.fits,sdss2.fits \
                      --output=jplus-zeropoint.fits \
                      --refimgszp=22.5,22.5 \
                      --refimgshdu=0,0 \
                      --aperarcsec=3

The output is a FITS table (because generally, you will give more apertures and choose the best one based on a higher-level analysis). Let’s check the output’s internal structure with Gnuastro’s astfits program.

$ astfits jplus-zeropoint.fits
-----
0      n/a             no-data         0     n/a
1      ZEROPOINTS      table_binary    1x3   n/a
2      APER-3          table_binary    321x2 n/a

You can see that there are two HDUs in this file. The HDU names give a hint, so let’s have a look at each extension with Gnuastro’s asttable program:

$ asttable jplus-zeropoint.fits --hdu=1 -i
--------
jplus-zeropoint.fits (hdu: 1)
-------       -----   ----     -------
No.Name       Units   Type     Comment
-------       -----   ----     -------
1  APERTURE   arcsec  float32  n/a
2  ZEROPOINT  mag     float32  n/a
3  ZPSTD      mag     float32  n/a
--------
Number of rows: 1
--------

As you can see, in the first extension, for each of the apertures you requested (APERTURE), there is a zero point (ZEROPOINT) and the standard deviation of the measurements on the apertures (ZPSTD). In this case, we only requested one aperture, so it only has one row. Now, let’s have a look at the next extension:

$ asttable jplus-zeropoint.fits --hdu=2 -i
--------
jplus-zeropoint.fits (hdu: 2)
-------      -----  ----     -------
No.Name      Units  Type     Comment
-------      -----  ----     -------
1  MAG-REF   f32    float32  Magnitude of reference.
2  MAG-DIFF  f32    float32  Magnitude diff with input.
--------
Number of rows: 321
--------

It contains a table of measurements for the aperture with the least scatter. In this case, we only gave one aperture, so it is the same. If you give multiple apertures, only the one with least scatter will be present by default. In the MAG-REF column you see the magnitudes within each aperture on the reference (SDSS) image(s). The MAG-DIFF column contains the difference of the input (J-PLUS) and reference (SDSS) magnitudes for each aperture (see Zero point estimation). The two catalogs, created by the aperture photometry from the SDSS images, are merged into one so that there are more stars to compare. Therefore, no matter how many reference images you provide, there will only be a single table here. If the two SDSS images overlapped, each object in the overlap region would have two rows (one row for the measurement from one SDSS image, and another from the measurement from the other).

Now that we have obtained the zero point of the J-PLUS image, let’s go a little deeper into lower-level details of how this script operates. This will help you better understand what happened and how to interpret and improve the outputs when you are confronted with a new image and strange outputs.

While it runs, the astscript-zeropoint script keeps its intermediate results in a temporary directory, which is deleted (with all the intermediate products) at the end. If you would like to check those intermediate files, you can use the --keeptmp option so that they are not removed.

Let’s take a closer look into the contents of each HDU. First, we’ll use Gnuastro’s asttable to see the measured zero point for this aperture. We are using -Y to have human-friendly (non-scientific!) numbers (which are sufficient here) and -O to also show the metadata of each column at the start.

$ asttable jplus-zeropoint.fits -Y -O
# Column 1: APERTURE  [arcsec,f32,] Aperture used.
# Column 2: ZEROPOINT [mag   ,f32,] Zero point (sig-clip median).
# Column 3: ZPSTD     [mag   ,f32,] Zero point Standard deviation.
3.000          26.435         0.057

Now, let’s have a look at the first 10 rows of the second (APER-3) extension. From the previous check we did above, we see that it contains 321 rows!

$ asttable jplus-zeropoint.fits -Y -O --hdu=APER-3 --head=10
# Column 1: MAG-REF  [f32,f32,] Magnitude of reference.
# Column 2: MAG-DIFF [f32,f32,] Magnitude diff with input.
16.461         30.035
16.243         28.209
15.427         26.427
20.064         26.459
17.334         26.425
20.518         26.504
17.100         26.400
16.919         26.428
17.654         26.373
15.392         26.429

But the table above is hard to interpret, so let’s plot it. To do this, we’ll use the same astscript-fits-view command that we used for images above. It detects whether the file has an image or table HDU and will call DS9 or TOPCAT respectively. You can also use any other plotter you like (TOPCAT is not part of Gnuastro); this script just calls it.

$ astscript-fits-view jplus-zeropoint.fits --hdu=APER-3

After TOPCAT opens, you can select the “Graphics” menu and then “Plain plot”. This will show a plot with the SDSS (reference image) magnitude on the horizontal axis and the difference of magnitudes between the input and reference (the zero point) on the vertical axis.

In an ideal world, the zero point should be independent of the magnitude of the different stars that were used. Therefore, this plot should be a horizontal line (with some scatter as we go to fainter stars). But as you can see in the plot, in the real world, this expected behavior is seen only for stars with magnitudes about 16 to 19 in the reference SDSS images. The stars that are brighter than 16 are saturated in one (or both) surveys72. Therefore, they do not have the correct magnitude or mag-diff. You can check some of these stars visually by using the blinking command above and zooming into some of the brighter stars in the SDSS images.

On the other hand, it is natural that we cannot measure accurate magnitudes for the fainter stars because the noise level (or “depth”) of each image is limited. As a result, the horizontal line becomes wider (scattered) as we go to the right (fainter magnitudes on the horizontal axis). So, let’s limit the range of used magnitudes from the SDSS catalog to calculate a more accurate zero point for the J-PLUS image. For this reason, we have the --magnituderange option in astscript-zeropoint.

Necessity of sky subtraction: To obtain this horizontal line, it is very important that both your images have been sky subtracted. Please repeat the last astscript-zeropoint command above, only changing the input file to jplus-crop.fits (see the sketch after this note). Then use Gnuastro’s astscript-fits-view again to draw the plot with TOPCAT (same as above). Instead of a horizontal line, you will see a sloped line in the magnitude range above! This happens because the sky level acts as a source of constant signal in all apertures, so the magnitude difference will not be independent of the star’s magnitude, but dependent on it (the measurement on a fainter star will be dominated by the sky level).

Remember: if you see a sloped line instead of a horizontal line, the input or reference image(s) are not sky subtracted.
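
For convenience, the test described in the note above could look like the sketch below; it assumes the cropped image’s data are in HDU number 1 of jplus-crop.fits (check with astfits if in doubt) and uses an arbitrary output name:

$ astscript-zeropoint jplus-crop.fits --hdu=1 \
                      --refimgs=sdss1.fits,sdss2.fits \
                      --output=jplus-zeropoint-nosky.fits \
                      --refimgszp=22.5,22.5 \
                      --refimgshdu=0,0 \
                      --aperarcsec=3
$ astscript-fits-view jplus-zeropoint-nosky.fits --hdu=APER-3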

Another key parameter of this script is the aperture size (--aperarcsec) for the aperture photometry on the images. On one hand, if the selected aperture is too small, you will be at the mercy of the differing PSFs of your input and reference image(s): part of the light of the star will be lost in the image with the worse PSF. On the other hand, with a large aperture size, the light of neighboring objects (stars/galaxies) can affect the photometry. We should select an aperture radius of the same order as the one used in the reference image, typically 2 to 3 times the PSF FWHM of the images. For now, let’s try the values 2, 3, 4, 5, and 6 arcsec for the aperture size. The script will compare the results for the different aperture sizes and choose the one with the least standard deviation (the ZPSTD column of the ZEROPOINTS HDU).

Let’s re-run the script with the following changes:

  • Using --magnituderange to limit the stars used for estimating the zero point.
  • Giving more values for aperture size to find the best for these two images as explained above.
  • Use the --keepzpap option to keep the result of matching the catalogs (done with each of the requested apertures) in separate extensions of the output file.

$ astscript-zeropoint jplus-nc.fits --hdu=INPUT-NO-SKY \
                      --refimgs=sdss1.fits,sdss2.fits \
                      --output=jplus-zeropoint.fits \
                      --refimgszp=22.5,22.5 \
                      --aperarcsec=2,3,4,5,6 \
                      --magnituderange=16,18 \
                      --refimgshdu=0,0 \
                      --keepzpap

Now, check the number of HDU extensions with astfits.

$ astfits jplus-zeropoint.fits
-----
0      n/a             no-data         0     n/a
1      ZEROPOINTS      table_binary    5x3   n/a
2      APER-2          table_binary    319x2 n/a
3      APER-3          table_binary    321x2 n/a
4      APER-4          table_binary    323x2 n/a
5      APER-5          table_binary    323x2 n/a
6      APER-6          table_binary    325x2 n/a

You can see that the output file now has a separate HDU for each aperture (thanks to --keepzpap). The ZEROPOINTS HDU contains the final zero point value for each aperture and its error. The best zero point value belongs to the aperture that has the least scatter (the lowest standard deviation). The rest of the extensions contain the per-star measurements (MAG-REF and MAG-DIFF, as discussed above) computed within each aperture.

Let’s check the different tables by plotting all magnitude tables at the same time with TOPCAT.

$ astscript-fits-view jplus-zeropoint.fits

After TOPCAT has opened take the following steps:

  1. From the “Graphics” menu, select “Plain plot”. You will see the last HDU’s scatter plot open in a new window (for APER-6, with red points). The bottom-left panel has a red-blue scatter plot icon with 6: jplus-zeropoint.fits written in front of it (showing that this is the 6th HDU of this file). In the bottom-right panel, you see the names of the columns that are being displayed.
  2. In the “Layers” menu, click on “Add Position Control”. On the bottom-left panel, you will notice that a new blue-red scatter plot has appeared, but it just says <no table>. In the bottom-right panel, in front of “Table:”, select any other extension. This will plot the same two columns of that extension as blue points. Zoom in to the region of the horizontal line to see/compare the different scatters.

    Change the HDU given to “Table:” and see the distribution of zero points for the different apertures.

The manual/visual operation above is critical if this is your first time with a new dataset: it shows all kinds of systematic biases (like the Sky issue above)! But once you know your data have no systematic biases, choosing between the different apertures is not easy visually! Let’s have a look at the table in the ZEROPOINTS HDU (we don’t need to explicitly call this HDU since it is the first one):

$ asttable jplus-zeropoint.fits -O -Y
# Column 1: APERTURE  [arcsec,f32,] Aperture used.
# Column 2: ZEROPOINT [mag   ,f32,] Zero point (sig-clip median).
# Column 3: ZPSTD     [mag   ,f32,] Zero point Standard deviation.
2.000          26.405         0.028
3.000          26.436         0.030
4.000          26.448         0.035
5.000          26.458         0.042
6.000          26.466         0.056

The most accurate zero point is the one with the smallest ZPSTD. In this case, the minimum ZPSTD is obtained with the radii of 2 and 3 arcsec. Run the astscript-fits-view command above again to open TOPCAT. Let’s focus on the magnitude plots in these two apertures and determine a more accurate magnitude range. The more reliable option is the range between 16.4 (where we have no saturated stars) and 18.5 mag (fainter than this, the scatter becomes too strong). Finally, let’s set some more apertures between 2 and 3 arcsec radius:

$ astscript-zeropoint jplus-nc.fits --hdu=INPUT-NO-SKY \
                      --refimgs=sdss1.fits,sdss2.fits \
                      --output=jplus-zeropoint.fits \
                      --magnituderange=16.4,18.5 \
                      --refimgszp=22.5,22.5 \
                      --aperarcsec=2,2.5,3,3.5,4 \
                      --refimgshdu=0,0 \
                      --keepzpap

$ asttable jplus-zeropoint.fits -Y
2.000          26.405         0.037
2.500          26.425         0.033
3.000          26.436         0.034
3.500          26.442         0.039
4.000          26.449         0.044

The aperture with the least scatter is therefore the 2.5 arcsec radius aperture, giving a zero point of 26.425 magnitudes for this image. However, you can see that the scatter for the 3 arcsec aperture is also acceptable. Actually, the ZPSTD of the 2.5 and 3 arcsec apertures only differ by about \(3\%\) (\(=(0.034-0.033)/0.033\times100\)). So simply choosing the minimum is just a first-order approximation (which is accurate to within \(26.436-26.425=0.011\) magnitudes).

Note that in aperture photometry, the PSF plays an important role (because the aperture is fixed but the two images can have very different PSFs). The aperture with the least scatter should also account for the differing PSFs. Overall, please always check the different and intermediate steps to make sure the parameters are good, so that the estimation of the zero point is correct.

If you are happy with the minimum, you don’t have to search for the minimum aperture or its corresponding zero point yourself. This script has written it in the ZPVALUE keyword of the table. With the first command below, the name of the file is also printed (useful when you run this on many files, for example). With the second command, only the number is printed, thanks to the -q (or --quiet) option (this is useful in a script where you want to write the value into a shell variable for later use).

$ astfits jplus-zeropoint.fits --keyvalue=ZPVALUE
jplus-zeropoint.fits 2.642512e+01

$ astfits jplus-zeropoint.fits --keyvalue=ZPVALUE -q
2.642512e+01
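
For example, a minimal sketch of that last scenario (storing the value in a shell variable for later use, for instance as the --zeropoint of another Gnuastro program in your script):

$ zp=$(astfits jplus-zeropoint.fits --keyvalue=ZPVALUE --quiet)
$ echo "Zero point of this image: $zp"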

Generally, this script will write the following FITS keywords (all starting with ZP) for your future reference in its output:

$ astfits jplus-zeropoint.fits -h1 | grep ^ZP
ZPAPER  =                  2.5 / Best aperture.
ZPVALUE =             26.42512 / Best zero point.
ZPSTD   =           0.03276644 / Best std. dev. of zeropoint.
ZPMAGMIN=                 16.4 / Min mag for obtaining zeropoint.
ZPMAGMAX=                 18.5 / Max mag for obtaining zeropoint.

Using the --keyvalue option of the Fits program, you can easily get multiple values in one run (where necessary):

$ astfits jplus-zeropoint.fits --hdu=1 --quiet \
          --keyvalue=ZPAPER,ZPVALUE,ZPSTD
2.500000e+00   2.642512e+01   3.276644e-02

2.7.2 Zero point tutorial with reference catalog

In Zero point tutorial with reference image, we explained how to use astscript-zeropoint to estimate the zero point of one image based on a reference image. Sometimes a reference image is not available and we need to use a reference catalog instead. Fortunately, astscript-zeropoint can also use a catalog instead of an image to find the zero point.

To show this, let’s download a catalog of SDSS in the area that overlaps with the cropped J-PLUS image (used in the previous section). For more on Gnuastro’s Query program, please see Query. The ID, RA, Dec and SDSS r-filter magnitude columns are requested by their names in the SDSS catalog.

$ astquery vizier \
           --dataset=sdss12 \
           --overlapwith=jplus-crop.fits \
           --column=objID,RA_ICRS,DE_ICRS,rmag \
           --output=sdss-catalog.fits

To visualize the position of the SDSS objects over the J-PLUS image, let’s use astscript-ds9-region (for more details please see SAO DS9 region files from table) with the command below (it will automatically open DS9 and load the regions it created):

$ astscript-ds9-region sdss-catalog.fits \
                       --column=RA_ICRS,DE_ICRS \
                       --color=red --width=3 --output=sdss.reg \
                       --command="ds9 jplus-nc.fits[INPUT-NO-SKY] \
                                      -scale zscale"

Now, we are ready to estimate the zero point of the J-PLUS image based on the SDSS catalog. To download the input image and understand how to use the astscript-zeropoint, please see Zero point tutorial with reference image.

Many of the options (like the aperture sizes) and the magnitude range are the same, so we will not discuss them further. You will notice that the only substantive difference between the command below and the last command of the previous section is that we are using --refcat instead of --refimgs. There are also some cosmetic differences (for example, a new output name, and not using --refimgszp since it is only necessary for images), as well as the --refcat* options, which are used to identify the names of the necessary columns of the reference catalog:

$ astscript-zeropoint jplus-nc.fits --hdu=INPUT-NO-SKY \
                      --refcat=sdss-catalog.fits \
                      --refcatmag=rmag \
                      --refcatra=RA_ICRS \
                      --refcatdec=DE_ICRS \
                      --output=jplus-zeropoint-cat.fits \
                      --magnituderange=16.4,18.5 \
                      --aperarcsec=2,2.5,3,3.5,4 \
                      --keepzpap

Let’s inspect the output with the command below.

$ asttable jplus-zeropoint-cat.fits -Y
2.000          26.337         0.034
2.500          26.386         0.036
3.000          26.417         0.041
3.500          26.439         0.043
4.000          26.455         0.050

As you see, the values and standard deviations are very similar to the results we got previously in Zero point tutorial with reference image. The standard deviations are generally a little higher here because we did not do the photometry of the reference ourselves, but they are statistically similar.
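
As a quick sketch of this comparison (assuming both outputs above exist), you can print the difference between the two best zero points from the ZPVALUE keywords that the script writes:

$ zp_img=$(astfits jplus-zeropoint.fits --keyvalue=ZPVALUE -q)
$ zp_cat=$(astfits jplus-zeropoint-cat.fits --keyvalue=ZPVALUE -q)
$ echo $zp_img $zp_cat | awk '{printf "Difference: %.3f mag\n", $1-$2}'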

Before we finish, let’s open the two outputs (from a reference image and reference catalog) with the command below. To confirm how they compare, we are showing the result for APER-3 extension in both (following the TOPCAT plotting recipe in Zero point tutorial with reference image).

$ astscript-fits-view jplus-zeropoint.fits jplus-zeropoint-cat.fits \
                      -hAPER-3

2.8 Pointing pattern design

A dataset that is ready for scientific analysis is usually composed of many separate exposures; the way they are taken is known as the “observing strategy”. This tutorial describes Gnuastro’s tools to simplify the process of deciding the pointing pattern of your observing strategy.

A “pointing” is the location on the sky that each exposure is aimed at. Each exposure’s pointing is usually moved (on the sky) compared to the previous exposure. This is done for reasons like improving calibration, increasing resolution, expanding the area of the observation, and so on. Therefore, deciding a suitable pointing pattern is one of the most important steps when planning your observation strategy.

There are commonly two types of pointings: “dither” and “offset”. These are sometimes used interchangeably with “pointing” (especially when the final stack is roughly the same area as the field of view). Alternatively, “dither” and “offset” are used to distinguish pointings with large or small (on the scale of the field of view) movements compared to the previous one. When a pointing has a large distance to the previous pointing, it is known as an “offset”, while pointings with a small displacement are known as “dithers”. This distinction originates from the mechanics and optics of most modern telescopes: the overhead (for example, the need to re-focus the camera) for small movements is usually less than for large movements.

In this tutorial, let’s simulate a hypothetical pointing pattern using Gnuastro’s astscript-pointing-simulate installed script (see Pointing pattern simulation). Since we will be testing very different displacements between pointings, we’ll ignore the difference between offset and dither here, and only use the term pointing.

Let’s assume you want to observe M94 in the H-alpha and rSDSS filters (to study the extended star formation in the outer rings of this beautiful galaxy!). Including the outer parts of the rings, the galaxy is half a degree in diameter! This is very large; you want to design a pointing pattern that covers as much area as possible, while not losing your ability to calibrate properly.

Do not start with this tutorial: If you are new to Gnuastro and have not already completed General program usage tutorial, we recommend going through that tutorial before starting this one. Basic features like access to this book on the command-line, the configuration files of Gnuastro’s programs, benefiting from the modular nature of the programs, viewing multi-extension FITS files, and many others are discussed in more detail there.


2.8.1 Preparing input and generating exposure map

As mentioned in Pointing pattern design, the assumed goal here is to plan an observing strategy for M94. Let’s assume that after some searching, you decide to write a proposal for the JAST80 telescope at the Observatorio Astrofísico de Javalambre, OAJ73, in Teruel (Spain). The field of view of this telescope’s camera is almost 1.4 degrees wide, nicely fitting M94! It also has the two filters that you need74.

Before we start, as described in Pointing pattern simulation, it is important to remember that the ideal pointing pattern depends primarily on your scientific objective, as well as the limitations of the instrument you are observing with. Therefore, there is no single pointing pattern for all purposes. However, the tools, methods, criteria and logic to check whether your pointing pattern satisfies your scientific requirement are similar. Therefore, you can use the same methods, tools and logic here to simulate or verify that your pointing pattern will produce the products you expect after the observation.

To start simulating a pointing pattern for a certain telescope, you just need a single-exposure image from that telescope with WCS information; in other words, an image after astrometry, but before warping into any other pixel grid (to combine into a deeper stack). The image will give us the number of the camera’s pixels, its pixel scale (width of a pixel in arcseconds) and the camera distortion. These are reference parameters that are independent of the position of the image on the sky.

Because the actual position of the reference image is irrelevant, let’s assume that in a previous project (presumably on NGC 4395) you already downloaded the following single-exposure image with the commands below. With the last command, please take a look at this image and explore it before continuing.

$ mkdir pointing-tutorial
$ cd pointing-tutorial
$ mkdir input
$ siapurl=https://archive.cefca.es/catalogues/vo/siap
$ wget $siapurl/jplus-dr3/reduced/get_fits?id=1050345 \
       -O input/jplus-1050345.fits.fz

$ astscript-fits-view input/jplus-1050345.fits.fz

This is the first time I am using an instrument: In case you have not already used images from your desired instrument (to use as a reference), you can find such images in its public archive, or by contacting the observatory. A single-exposure image is rarely of any scientific value (post-processing and stacking are necessary to make high-level and science-ready products). Therefore, such images become publicly available very soon after the observation date; furthermore, calibration images are usually public immediately.

As you see from the image above, the T80Cam images are large (9216 by 9232 pixels). Therefore, to speed up the pointing testing, let’s down-sample the image by a factor of 10. This step is optional and you can safely use the full resolution, which will give you a more precise stack. But it will be much slower (maybe good after you have an almost final solution on the down-sampled image). We will call the output ref.fits (since it is the “reference” for our test). We are putting these two “input” files (to the script) in a dedicated directory to keep the running directory clean (and be able to easily delete temporary/test files for a fresh start with a ‘rm *.fits’).

$ astwarp input/jplus-1050345.fits.fz --scale=1/10 -oinput/ref.fits

For a first trial, let’s create a cross-shaped pointing pattern with 5 points around M94, which is centered at an RA and Dec of 192.721250, 41.120556. We’ll center one exposure on the center of the galaxy, and include 4 more exposures that are each 1 arc-minute away along the RA and Dec axes. To simplify the actual command later75, let’s also include the column names in pointing.txt through two lines of metadata. Also note that the pointing.txt file can be made in any manner you like; for example, by writing the coordinates manually in your favorite text editor, or through another programming language or logic, and so on. Here, we are using AWK because it is sufficiently powerful for this job, and it is a very small program that is available on any Unix-based operating system (allowing you to easily run your commands on any computer).

$ step_arcmin=1
$ center_ra=192.721250
$ center_dec=41.120556

$ echo "# Column 1: RA  [deg, f64] Right Ascension"  > pointing.txt
$ echo "# Column 2: Dec [deg, f64] Declination"     >> pointing.txt

$ echo $center_ra $center_dec \
       | awk '{s='$step_arcmin'/60; fmt="%-10.6f %-10.6f\n"; \
               printf fmt, $1,   $2; \
               printf fmt, $1+s, $2; \
               printf fmt, $1,   $2+s; \
               printf fmt, $1-s, $2; \
               printf fmt, $1,   $2-s}' \
       >> pointing.txt

With the commands below, let’s have a look at the produced file, first as plain-text, then with TOPCAT (which needs conversion to FITS). After TOPCAT is opened, in the “Graphics” menu, select “Plane plot” to see the five points in a flat RA, Dec plot.

$ cat pointing.txt
# Column 1: RA  [deg, f64] Right Ascension
# Column 2: Dec [deg, f64] Declination
192.721250 41.120556
192.737917 41.120556
192.721250 41.137223
192.704583 41.120556
192.721250 41.103889

$ asttable pointing.txt -opointing.fits
$ astscript-fits-view pointing.fits
$ rm pointing.fits

We are now ready to generate the exposure map of the pointing pattern above, using the reference image that we downloaded before. Let’s put the center of our final stack on the center of the galaxy, and assume the stack has a width of 2 degrees. With the second command, you can see the exposure map of the final stack. Recall that in this image, each pixel shows the number of input images that went into it.

$ astscript-pointing-simulate pointing.txt --output=stack.fits \
           --img=input/ref.fits --center=$center_ra,$center_dec \
           --width=2

$ astscript-fits-view stack.fits

You will see that except for a thin boundary, we have a depth of 5 exposures over the area of the single exposure. Let’s see what the width of the deepest part of the image is. First, we’ll use Arithmetic to set all pixels that contain less than 5 exposures (the outer pixels) to NaN (Not a Number). In the same Arithmetic command, let’s trim all the blank rows and columns, so the output only contains the pixels that are exposed 5 times. With the next command, let’s view the deep region and with the last command below, let’s use the --skycoverage option of the Fits program to see the coverage of deep part on the sky.

$ deep_thresh=5
$ astarithmetic stack.fits set-s s s $deep_thresh lt nan where trim \
                --output=deep.fits

$ astscript-fits-view deep.fits

$ astfits deep.fits --skycoverage
Input file: deep.fits (hdu: 1)

Sky coverage by center and (full) width:
  Center: 192.72125      41.120556
  Width:  1.880835157    1.392461166

Sky coverage by range along dimensions:
  RA       191.7808324    193.6616676
  DEC      40.42058203    41.81304319

As we see, in declination, the width of this deep field is about 1.4 degrees. Recall that RA is only defined on the equator and actual coverage in RA depends on the declination due to the spherical nature of the sky. This area therefore nicely covers the expected outer parts of M94. On first thought, it may seem that we are now finished, but that is not the case unfortunately!

There is a problem: with a step size of 1 arc-minute, the brighter central parts of this large galaxy will always be on very similar pixels; making it hard to calibrate those pixels properly. If you are interested in the low surface brightness parts of this galaxy, it is even worse: the outer parts of the galaxy will always cover similar parts of the detector in all the exposures; and they cover a large area on your image. To be able to accurately calibrate the image (in particular to estimate the flat field pattern and subtract the sky), you do not want this to happen! You want each exposure to cover very different sources of astrophysical signal, so you can accurately calibrate the artifacts created by the instrument or environment (for example flat field) or of natural causes (for example the Sky).

For an example of how these calibration issues can ruin low surface brightness science, please see the image of M94 in the Legacy Survey interactive viewer. After it is loaded, at the bottom-left corner of the window, write “M94” in the box of “Jump to object” and press ENTER. At first, M94 looks good with a black background, but as you increase the “Brightness” (by scrolling it to the right and seeing what is under the originally black pixels), you will see the calibration artifacts clearly.


2.8.2 Area of non-blank pixels on sky

In Preparing input and generating exposure map we generated a pointing pattern with very small steps, showing how this can cause calibration problems. Using larger steps is discussed later (in Larger step sizes for better calibration). In this section, let’s see how we can get an accurate measure of the area that is covered to a certain depth.

A first thought would be to simply multiply the widths along RA and Dec reported before: \(1.8808\times1.3924=2.6189\) degrees squared. But there are several problems with this:

  • It ignores the fact that RA only has units of degrees on the equator: at different declinations, differences in RA should be converted to degrees. This is discussed further in this tutorial: Pointings that account for sky curvature.
  • It doesn’t take into account the thin rows/columns of blank pixels (NaN) that are on the four edges of the deep.fits image.
  • The differing area of the pixels on the spherical sky in relation to those blank values can result in wrong estimations of the area.

Let’s get a very accurate estimation of the area that will not be affected by the issues above. With the first command below, we’ll use the --pixelareaonwcs option of the Fits program, which returns an image where the value of each pixel is its own area on the sky (in units of degrees squared). After comparing with the --pixelscale output of the second command, have a look at the produced image with the last command.

$ astfits deep.fits --pixelareaonwcs --output=deep-pix-area.fits

$ astfits deep.fits --pixelscale
Basic info. for --pixelscale (remove extra info with '--quiet' or '-q')
  Input: deep.fits (hdu 1) has 2 dimensions.
  Pixel scale in each FITS dimension:
    1: 0.00154403 (deg/pixel) = 5.5585 (arcsec/pixel)
    2: 0.00154403 (deg/pixel) = 5.5585 (arcsec/pixel)
  Pixel area:
    2.38402e-06 (deg^2) = 30.8969 (arcsec^2)

$ astscript-fits-view deep-pix-area.fits

You see a donut-like shape in DS9. Move your mouse over the central (white) region and look at the values. You will see that the pixel area (in degrees squared) is exactly the same as we saw in the output of --pixelscale. As you move your mouse away towards other colors, you will notice that the area covered by each pixel (its value in this image) decreases very slightly (in the 5th decimal!). This is the effect of the Gnomonic projection, summarized as TAN (for “tangential”) in the FITS WCS standard; it is the most commonly used projection in optical astronomical surveys and the default in this script.

Having deep-pix-area.fits, we can now use Arithmetic to set the pixels that are NaN in deep.fits to NaN in the pixel-area image too, and sum the remaining values to get an accurate estimate of the area we get from this pointing pattern:

$ astarithmetic deep-pix-area.fits deep.fits isblank nan where -g1 \
                sumvalue --quiet
1.93836806631634e+00

Therefore, the actual area that is covered is less than the simple multiplication above. At these declinations, the dominant cause of this difference is the first point above (that RA needs correction); this will be discussed in more detail later in this tutorial (see Pointings that account for sky curvature). Generally, using this method to measure the area of the non-NaN pixels in an image is very easy and robust (it automatically takes into account the curvature, coordinate system, projection and blank pixels of the image).
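
As a rough check of that claim (a sketch using the widths reported by --skycoverage above), correcting the RA width by the cosine of the declination before the simple multiplication brings the naive estimate much closer to the accurate per-pixel sum:

## Prints roughly 1.97 deg^2 (compare with 1.94 from the pixel sum).
$ echo 1.880835157 1.392461166 41.120556 \
       | awk '{pi=atan2(0,-1); print $1*cos($3*pi/180)*$2}'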


2.8.3 Script with pointing simulation steps so far

In Preparing input and generating exposure map and Area of non-blank pixels on sky, the basic steps to simulate a pointing pattern’s exposure map and to measure the final output’s area on the sky were described in detail. From this point on in the tutorial, we will be experimenting with the shell variables that were set above, but the actual commands will not change much. If a change is necessary in a command, it is clearly mentioned in the text.

Therefore, it is better to write the steps above (after downloading the reference image) as a script. In this way, you can simply change those variables and quickly see the final result by running your script. For more on writing scripts, see Writing scripts to automate the steps.

Here is a summary of some points to remember when transferring the code in the sections before into a script:

  • Where the commands are edited/changed, please also update them in your script.
  • Keep all the variables at the top, even if they are used later. This allows you to easily view or change them without digging into the script.
  • You do not need to include visual check commands like the astscript-fits-view or cat commands above. Those can be run interactively after your script is finished; recall that a script is for batch (non-interactive) processing.
  • Put all your intermediate products inside a “build” directory.

Here is the script that summarizes the steps in Preparing input and generating exposure map (after download) and Area of non-blank pixels on sky:

#!/bin/bash
#
# Copyright (C) 2024-2024 Mohammad Akhlaghi <mohammad@akhlaghi.org>
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium under the GNU GPL v3+, without royalty
# provided the copyright notice and this notice are preserved.  This
# file is offered as-is, without any warranty.

# Parameters of the script
deep_thresh=5
step_arcmin=1
center_ra=192.721250
center_dec=41.120556

# Input and build directories (can be anywhere in your file system)
indir=input
bdir=build

# Abort the script in case of an error.
set -e

# Make the build directory if it doesn't already exist.
if ! [ -d $bdir ]; then mkdir $bdir; fi

# Build the 5-pointing pointing pattern (with the step size above).
pointingcat=$bdir/pointing.txt
echo "# Column 1: RA  [deg, f64] Right Ascension"  > $pointingcat
echo "# Column 2: Dec [deg, f64] Declination"     >> $pointingcat
echo $center_ra $center_dec \
            | awk '{s='$step_arcmin'/60; fmt="%-10.6f %-10.6f\n"; \
                    printf fmt, $1,   $2; \
                    printf fmt, $1+s, $2; \
                    printf fmt, $1,   $2+s; \
                    printf fmt, $1-s, $2; \
                    printf fmt, $1,   $2-s}' \
            >> $pointingcat

# Simulate the pointing pattern.
stack=$bdir/stack.fits
astscript-pointing-simulate $pointingcat --output=$stack \
           --img=$indir/ref.fits --center=$center_ra,$center_dec \
           --width=2

# Trim the regions shallower than the threshold.
deep=$bdir/deep.fits
astarithmetic $stack set-s s s $deep_thresh lt nan where trim \
               --output=$deep

# Calculate the area of each pixel on the curved celestial sphere:
pixarea=$bdir/deep-pix-area.fits
astfits $deep --pixelareaonwcs --output=$pixarea

# Report the final area (the empty 'echo's are for visual help in outputs)
echo; echo
echo "Area with step of $step_arcmin arcminutes, at $deep_thresh depth:"
astarithmetic $pixarea $deep isblank nan where -g1 \
              sumvalue --quiet

For a description of how to make it executable and how to run it, see Writing scripts to automate the steps. Note that as you start adding your own text to the script, be sure to add your name (and year that you modified) in the copyright notice at the start of the script (this is very important!).
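
If you have never done this before, the summary is just the following (a sketch, assuming you saved the script above as my-pointing.sh; the file name is arbitrary):

$ chmod +x my-pointing.sh
$ ./my-pointing.sh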


2.8.4 Larger step sizes for better calibration

In Preparing input and generating exposure map we saw that a small pointing pattern is not good for the reduction of data from a large object like M94! M94 is about half a degree in diameter, so let’s set step_arcmin=15. This is one quarter of a degree and will put the centers of the four outer exposures on the four corners of M94’s main ring. Furthermore, in Script with pointing simulation steps so far, the steps were summarized into a script to allow easy changing of variables without manually re-entering the individual/separate commands.

After you change step_arcmin=15 and re-run the script, you will get a total area (from counting of per-pixel areas) of approximately 0.96 degrees squared. This is just roughly half the previous area and will barely fit M94! To understand the cause, let’s have a look at the full stack (not just the deepest area):

$ astscript-fits-view build/stack.fits

Compared to the first run (with step_arcmin=1), we clearly see how there are indeed fewer pixels that get photons in all 5 exposures. As the area of the deepest part has decreased, the areas with fewer exposures have also grown. Let’s define our deep region to be the pixels with 3 or more exposures. Please set deep_thresh=3 in the script and re-run it. You will see that the “deep” area is now almost 2.02 degrees squared! This is (slightly) larger than the first run (with step_arcmin=1)!

The difference between 3 exposures and 5 exposures seems like a lot at first. But let’s calculate how much it actually affects the achieved signal-to-noise ratio and the surface brightness limit. The surface brightness limit (and the upper-limit surface brightness) are both calculated by applying the definition of magnitude to the standard deviation of the background. So we should first calculate how much this difference in depth affects the sky standard deviation. For a complete discussion on the definition of the surface brightness limit, see Quantifying measurement limits.

Deep images will usually be dominated by Photon counting noise (or Poisson noise). Therefore, if a single exposure image has a sky standard deviation of \(\sigma_s\), and we combine \(N\) such exposures by taking their mean, the final/stacked sky standard deviation (\(\sigma\)) will be \(\sigma=\sigma_s/\sqrt{N}\). As a result, the surface brightness limit between the regions with \(N\) exposures and \(M\) exposures differs by \(2.5\times\log_{10}(\sqrt{N/M})=1.25\times\log_{10}(N/M)\) magnitudes. If we set \(N=3\) and \(M=5\), we get a surface brightness magnitude difference of 0.28!
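
You can reproduce that 0.28 magnitude number on the command-line with a small sketch like this (AWK’s log is the natural logarithm, hence the division by log(10)):

$ echo 3 5 | awk '{printf "%.3f\n", 1.25*log($1/$2)/log(10)}'
-0.277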

This is a very small difference (given all the other sources of error that will be present); but consider how much it reduces the calibration artifacts. At the cost of decreasing our surface brightness limit by only 0.28 magnitudes, we are now able to calibrate the individual exposures much better, and even cover a larger area!

The argument above didn’t involve any image and was primarily theoretical. For the more visually-inclined readers, let’s add raw Gaussian noise (with a \(\sigma\) of 100 counts) over each simulated exposure. We will then instruct astscript-pointing-simulate to stack them as we would stack actual data (by taking the sigma-clipped mean). The command below is identical to the previous call to the pointing simulation script with the following differences. Note that this is just for demonstration, so you should not include this in your script (unless you want to see the noisy stack every time; at double the processing time).

--output

We are using a different output name, so we can compare the output of the new command with the previous one.

--stack-operator

This should be one of the Arithmetic program’s Stacking operators. By default its value is sum, because by default each pixel of each exposure is given a value of 1; stacking with the summation operator therefore produces the exposure map that you have already seen above.

But in this run, we are adding noise to each input exposure (through the hook that is described below) and stacking them (as we would stack actual science images). Since the purpose differs here, we are using this option to change the operator.

--hook-warp-after

This is the most visible difference between this command and the previous one. Through a “hook”, you can give any arbitrarily long (series of) command(s) that will be added to the processing of this script at a certain location. This particular hook gets applied “after” the “warp”ing phase of each exposure (when the pixels of each exposure are mapped to the final pixel grid; but not yet stacked).

Since the script runs in parallel (the actual work-horse is a Makefile!), you can’t assume any fixed file name for the input(s) and output. Therefore the inputs to, and output(s) of, hooks are some pre-defined shell variables that you should use in the command(s) that you hook into the processing. They are written in full-caps to be clear and separate from your own variables. In this case, they are the $WARPED (input file of the hook) and $TARGET (output name that next steps in the script will operate on). As you see from the command below, through this hook we are calling the Arithmetic program to add noise to all non-zero pixels in the warped image. For more on the noise-adding operators, see Random number generators.

$ center_ra=192.721250
$ center_dec=41.120556
$ astscript-pointing-simulate build/pointing.txt --img=input/ref.fits \
           --center=$center_ra,$center_dec \
           --width=2 --stack-operator="3 0.2 sigclip-mean" \
           --output=build/stack-noised.fits \
           --hook-warp-after='astarithmetic $WARPED set-i \
                                          i i 0 uint8 eq nan where \
                                          100 mknoise-sigma \
                                           --output=$TARGET'

$ astscript-fits-view build/stack.fits build/stack-noised.fits

When you visually compare the two images of the last command above, you will see that (at least by eye) it is almost impossible to distinguish the differing noise pattern in the regions with 3 exposures from the regions with 5 exposures. But the regions with a single exposure are clearly visible! This is because the surface brightness limit in the single-exposure regions is \(1.25\times\log_{10}(1/5)=-0.87\) magnitudes brighter. This almost one magnitude difference in surface brightness is significant and clearly visible in the stacked image (recall that magnitudes are measured in a logarithmic scale).

Thanks to the argument above, we can now have a sufficiently large area with a usable depth. However, the center of each pointing will still contain the central part of the galaxy. In other words, M94 will be present in all the exposures while doing the calibrations. Even in not-too-deep observations, we already see a large ring around this galaxy. When we do a low surface brightness optimized reduction, there is a good chance that the size of the galaxy is much larger than that ring. This very extended structure will make it hard to do the calibrations on very accurate scales. Accurate calibration is necessary if you do not want to lose the faint photons that have been recorded in your exposures.

Calibration is very important: Better calibration can result in a fainter surface brightness limit than more exposures with poor calibration; especially for very low surface brightness signal that covers a large area and is systematically affected by calibration issues.

Ideally, you want your target to be on the four edges/corners of each image. This will make sure that a large fraction of each exposure will not be covered by your final target in each exposure, allowing you to calibrate much more accurately.


2.8.5 Pointings that account for sky curvature

In Larger step sizes for better calibration, we saw how a small loss in surface brightness limit can allow better calibration and even a larger area. Let’s extend this by setting step_arcmin=40 (almost half the width of the detector) inside your script (see Script with pointing simulation steps so far). After running the script with this change, take a look at build/deep.fits:

$ astscript-fits-view build/deep.fits --ds9scale=minmax

You will see that the region with 5 exposure depth is a horizontally elongated rectangle now! Also, the vertical component of the cross with four exposures is much thicker than the horizontal component! Where does this asymmetry come from? All the steps in our pointing strategy had the same (fixed) size of 40 arc minutes.

This happens because the same numerical change in RA and Dec (coordinates defined on a sphere) does not correspond to the same angular change on the sky, except on the equator. To visually see this, let’s look at the pointing positions in TOPCAT:

$ cat build/pointing.txt
# Column 1: RA  [deg, f64] Right Ascension
# Column 2: Dec [deg, f64] Declination
192.721250 41.120556
193.387917 41.120556
192.721250 41.787223
192.054583 41.120556
192.721250 40.453889

$ asttable build/pointing.txt -obuild/pointing.fits
$ astscript-fits-view build/pointing.fits

After TOPCAT opens, in the “Graphics” menu, select “Plane Plot”. In the newly opened window, click on the “Axes” item on the bottom-left list of items. Then activate the “Aspect lock” box so the vertical and horizontal axes have the same scaling. You will see what you expect from the numbers: we have a beautifully symmetric set of 5 points shaped like a ‘+’ sign.

Keep the previous window, and let’s go back to the original TOPCAT window. In the first TOPCAT window, click on “Graphics” again, but this time, select “Sky plot”. You will notice that the vertical component of the cross is now longer than the horizontal component! If you zoom-out (by scrolling your mouse over the plot) a lot, you will see that this is actually on the spherical surface of the sky! In other words, as you see here, on the sky, the horizontal points are closer to each other than the vertical points; causing a larger overlap between them, and making the vertical overlap thicker in build/deep.fits.

On the celestial sphere, only declination differences correspond directly to angular distances: the angular separation of two points with the same right ascension is simply the difference of their declinations. However, except for points on the equator, the angular separation corresponding to a given difference in right ascension also depends on the declination. Therefore, the origin of this problem is that we did the additions and subtractions for defining the pointing points in a flat space: a step size in arc minutes was applied identically to RA and Dec (in Preparing input and generating exposure map).
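
For example, at the declination of M94 (about 41.12 degrees), a step of 40 arc minutes in RA corresponds to an angular distance of only \(40\cos(41.12^\circ)\approx30.1\) arc minutes on the sky; this is why the horizontally displaced exposures overlap more. You can check this quickly with AWK (just a sanity check; it is not part of the script):

$ awk 'BEGIN{d=41.12; pi=atan2(0,-1); print 40*cos(d*pi/180)}'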

To fix this problem, we need to convert our points from the flat RA/Dec into the spherical RA/Dec. In the FITS standard, we have the “World Coordinate System” (WCS) that defines this type of conversion, using pre-defined projections in the CTYPEi keyword (short for “Coordinate TYPE in dimension i”). Let’s have a look at the stack to see the default projection of our final stack:

$ astfits build/stack.fits -h1 | grep CTYPE
CTYPE1  = 'RA---TAN'           / Right ascension, gnomonic projection
CTYPE2  = 'DEC--TAN'           / Declination, gnomonic projection

We therefore see that the default projection of our final stack is the TAN (short for “tangential”) projection, which is more formally known as the Gnomonic projection. This is the most commonly used projection in optical astronomy. Now that we know the final projection, we can do this conversion using Table’s column arithmetic operator eq-j2000-from-flat like below:

$ pointingcat=build/pointing.txt
$ pointingonsky=build/pointing-on-sky.fits
$ asttable $pointingcat --output=$pointingonsky \
           -c'arith RA          set-r \
                    DEC         set-d \
                    r meanvalue set-ref-r \
                    d meanvalue set-ref-d \
                    r d ref-r ref-d TAN eq-j2000-from-flat' \
           --colmetadata=1,RA,deg,"Right ascension" \
           --colmetadata=2,Dec,deg,"Declination"

$ astscript-fits-view build/pointing-on-sky.fits

Here is a break-down of the first command above: to do the flat-to-sky conversion, we need a reference point (where the two systems are equal). We have used the mean RA and mean Dec (through the meanvalue operator in Arithmetic) as our reference point (which are placed in the ref-r and ref-d variables). After calling the eq-j2000-from-flat operator, we have just added metadata to the two columns.
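
If you are curious about that reference point, you can reproduce it with the Statistics program (the mean of each column of the flat pointing table); both values should match the central pointing (192.72125 and 41.120556):

$ aststatistics build/pointing.txt -c1 --mean
$ aststatistics build/pointing.txt -c2 --mean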

To confirm that this operator did its job correctly, after the second command above, repeat the same experiment as before with TOPCAT (where you viewed the pointing positions on a flat and spherical coordinate system). You will see that indeed, on the sphere you have a ‘+’ shape, but on the flat plot, it looks stretched.

Script update 1: you should now add the pointingonsky definition and the asttable command above into the script of Script with pointing simulation steps so far. They should be placed before the call to astscript-pointing-simulate. Also, in the call to astscript-pointing-simulate, $pointingcat should be replaced with $pointingonsky (so it doesn’t use the flat RA, Dec pointings).

After implementing this change in your script and running it, open deep.fits and you will see that the widths of both the horizontal and vertical regions are much more similar. The top of the vertical overlap is slightly wider than the bottom, but that is something you can’t fix by just pointing (your camera’s field of view is fixed on the sky!). It can be corrected by slightly rotating some of the exposures, but that will result in different PSFs from one exposure to another; and this can cause more important problems for your final science.

Plotting the spherical RA and Dec in your papers: The inverse of the eq-j2000-from-flat operator above is eq-j2000-to-flat. eq-j2000-to-flat can be used when you want to plot a set of points with spherical RA and Dec in a paper. When the minimum and maximum RA and Dec differ by more than half a degree, you’ll clearly see the difference. For more, see the description of these operators in Column arithmetic.

Try to slightly increase step_arcmin to make the cross-like region with 4 exposures as thin as possible. For example, set it to step_arcmin=42. When you open deep.fits, you will see that the depth across this image is almost contiguous (which is another positive factor!). Try increasing it to 43 arc minutes to see that the central cross will become almost fully NaN in deep.fits (which is bad!).

You will notice that the vertical region of 4 exposure depth is thinner at the bottom than at the top. This is due to the same RA/Dec effect described above, but now across the width of each image. We therefore can’t remove it by just changing the positions of the pointings; we would need to rotate some of the exposures, but rotation is not yet implemented in this script.

You can construct any complex pointing pattern (with more than 5 points and in any shape) based on the logic and reasoning above to help extract the most science from the valuable telescope time that you will be getting. Since the output is a FITS file, you can easily download another FITS image of your target and open it in DS9 together with the stack produced by this simulation (with the “WCS” “lock”ed) to make sure that the deep parts correspond to the area of interest for your science case.

Factors like the optimal exposure time are also critical for the final result76, but they were beyond the scope of this tutorial. One relevant factor however is the effect of vignetting: the pixels on the outer extremes of the field of view that are not exposed to light and should be removed from your final stack. They affect your pointing pattern: by decreasing your usable area, they act like a larger spacing between your points, causing shallow crosses similar to those you saw when you set step_arcmin to 43 arc minutes. In Accounting for non-exposed pixels, we will show how this can be done within the same test concept that we used here.


2.8.6 Accounting for non-exposed pixels

At the end of Pointings that account for sky curvature we were able to maximize the region of the same depth in our stack. But we noticed that issues like strong vignetting can create discontinuities in our final stacked data product. In this section, we’ll review the steps to account for such effects. Generally, the full area of a detector is not used in the final stack. Vignetting is one cause, but there are other possibilities too: for example, baffles in the optical path (to block stray light), or large regions of bad (unusable or “dead”) pixels that may be in any place on the detector77.

Without accounting for these pixels that do not receive any light, the deep area we measured in the sections above will be over-estimated. In this sub-section, let’s review the necessary additions to account for such artifacts. Therefore, before continuing, please make sure that you have already read and applied the steps of the previous sections (this sub-section builds upon that section).

Vignetting strongly depends on the optical design of the instrument you are using. It can be a constant number of pixels on all the edges of the detector, or it can have a more complex shape. For example, on cameras that have multiple detectors in the field of view, the regions to exclude on each detector can be very different and will not be symmetric!

Therefore, within Gnuastro’s astscript-pointing-simulate script there is no parameter for pre-defined vignetting shapes. Instead, you should define a mask that you can apply on each exposure through the provided hook (--hook-warp-before; recall that we previously used another hook in Larger step sizes for better calibration). Through the mask, you are free to set any vignetted or bad pixel to NaN (thus ignoring them in the stack), applying it in any way that best suits your instrument and detector.

The mask image should be the same size as the reference image, but only containing two values: 0 or 1. Pixels in each exposure that have a value of 1 in the mask will be set to NaN before the stacking process and will not contribute to the final stack. Ideally, you can use the master flat field image of the previous reductions to create this mask: any pixel that has a low sensitivity in the master flat (for any reason) can be set to 1, and the rest of the pixels to 0.
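
For example, a minimal sketch of that idea, assuming a hypothetical master-flat.fits that is normalized to 1 and an arbitrary sensitivity threshold of 0.8 (you may also need to specify the flat’s HDU with --hdu):

$ astarithmetic master-flat.fits 0.8 lt --output=build/flat-mask.fits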

Let’s build a simple mask by assuming that we only have strong vignetting that is affecting the outer 30 arc seconds of the individual exposures. To mask the outer edges of an image we can use Gnuastro’s Arithmetic program; and in particular, the indexonly operator. To learn more about this operator, see Size and position operators.

But before doing that, we need to convert this angular distance to pixels on the detector. In Pointing pattern design, we used an undersampled version of the input image, so we should do this conversion on that image:

$ margin_arcsec=30
$ margin_pix=$(astfits input/ref.fits --pixelscale --quiet \
                     | awk '{print int('$margin_arcsec'/($1*3600))}')
$ echo $margin_pix
5

To build the mask, we can now follow the recipe under “Image: masking margins” of the index operator in Arithmetic (for a full description of what this command is doing78, see Size and position operators). Finally, in the last command, let’s look at the mask image in the “red” color map of DS9 (which shows the thin 1-valued border pixels to be masked more clearly).

$ width=$(astfits  input/ref.fits --keyvalue=NAXIS1 -q)
$ height=$(astfits input/ref.fits --keyvalue=NAXIS2 -q)

$ astarithmetic input/ref.fits indexonly     set-i \
                $width         uint16        set-w \
                $height        uint16        set-h \
                $margin_pix    uint16        set-m \
                i w %          uint16        set-X \
                i w /          uint16        set-Y \
                X m lt         X w m - gt       or \
                Y m lt         Y h m - gt       or \
                or --output=build/mask.fits

$ astscript-fits-view build/mask.fits --ds9extra="-cmap red"

We are now ready to run the main pointing simulate script. With the command below, we will use the --hook-warp-before to apply this mask on the image of each exposure just before warping. The concept of this hook is very similar to that of --hook-warp-after in Pointing pattern design. As the name suggests, this hook is applied “before” the warping. The input to the command given to this hook should be called with $EXPOSURE and the output should be called with $TOWARP. With the second command, let’s compare the two outputs:

$ astscript-pointing-simulate build/pointing-on-sky.fits \
           --output=build/stack-with-trim.fits --img=input/ref.fits \
           --center=$center_ra,$center_dec --width=2 \
           --hook-warp-before='astarithmetic $EXPOSURE build/mask.fits \
                                             nan where -g1 -o$TOWARP'

$ astscript-fits-view build/stack.fits build/stack-with-trim.fits

As expected, due to the smaller area of the detector that is exposed to photons, the regions with 4 exposures have become much thinner, and at the bottom they have been removed altogether. To have contiguous depth in the deeper region, use this new call in your script and decrease step_arcmin to 41.

You can use the same command with a mask that is created in any way (and is as realistic as possible). More generically, you can use the before and after hooks for any other operation; for example, to insert objects from a catalog using MakeProfiles, as well as adding noise as we did in Pointing pattern design.

Therefore it is also good to add the mask and its application to your script. This should be pretty easy by now (following Script with pointing simulation steps so far and the “Script update 1” box of Pointings that account for sky curvature). So we will leave this as an exercise.


2.9 Moiré pattern in stacking and its correction

After warping some images with the default mode of Warp (see Align pixels with WCS considering distortions) you may notice that the background noise is no longer flat. Some regions will be smoother and some will be sharper; depending on the orientation and distortion of the input/output pixel grids. This is due to the Moiré pattern, which is especially noticeable/significant when two slightly different grids are super-imposed.

With the commands below, we’ll download a single exposure image from the J-PLUS survey and run Warp (on an \(8\times8\) arcmin\(^2\) region to speed up the demos here). Finally, we’ll open the image to visually see the artificial Moiré pattern on the warped image.

## Download the image (73.7 MB containing a 9216x9232 pixel image)
$ jplusdr2=http://archive.cefca.es/catalogues/vo/siap/jplus-dr2/reduced
$ wget $jplusdr2/get_fits?id=771463 -Ojplus-exp1.fits.fz

## Align a small part of it with the sky coordinates.
$ astwarp jplus-exp1.fits.fz --center=107.62920,39.72472 \
          --width=8/60 -ojplus-e1.fits

## Open the aligned region with DS9
$ astscript-fits-view jplus-e1.fits

In the opened DS9 window, you can see the Moiré pattern as wave-like patterns in the noise: some parts of the noise are more smooth and some parts are more sharp. Right in the center of the image is a blob of sharp noise. Warp has the --checkmaxfrac option for direct inspection of the Moiré pattern (described with the other options in Align pixels with WCS considering distortions). When run with this option, an extra HDU (called MAX-FRAC) will be added to the output. The image in this HDU has the same size as the output. However, each output pixel will contain the largest (maximum) fraction of area that it covered over the input pixel grid. So if an output pixel has a value of 0.9, this shows that it covered \(90\%\) of an input pixel. Let’s run Warp with --checkmaxfrac and see the output (after DS9 opens, in the “Cube” window, flip between the first and second HDUs):

$ astwarp jplus-exp1.fits.fz --center=107.62920,39.72472 \
          --width=8/60 -ojplus-e1.fits --checkmaxfrac

$ astscript-fits-view jplus-e1.fits

By comparing the first and second HDUs/extensions, you will clearly see that the regions with a sharp noise pattern fall exactly on parts of the MAX-FRAC extension with values larger than 0.5. In other words, these are output pixels where one input pixel contributed more than half of their value. As this fraction increases, the sharpness also increases because a single input pixel’s value dominates the value of the output pixel. On the other hand, when this value is small, we see that many input pixels contribute to that output pixel. Since many input pixels contribute to an output pixel, it acts like a convolution, hence that output pixel becomes smoother (see Spatial domain convolution). Let’s have a look at the distribution of the MAX-FRAC pixel values:

$ aststatistics jplus-e1.fits -hMAX-FRAC
Statistics (GNU Astronomy Utilities) 0.22
-------
Input: jplus-e1.fits (hdu: MAX-FRAC)
-------
  Number of elements:                      744769
  Minimum:                                 0.250213461
  Maximum:                                 0.9987495374
  Mode:                                    0.5034223567
  Mode quantile:                           0.3773819498
  Median:                                  0.5520805544
  Mean:                                    0.5693956458
  Standard deviation:                      0.1554693738
-------
Histogram:
 |                      ***
 |                   **********
 |                 *****************
 |              ************************
 |           *******************************
 |         **************************************
 |       *********************************************
 |     ****************************************************
 |   ***********************************************************
 | ******************************************************************
 |**********************************************************************
 |----------------------------------------------------------------------

The smallest value is 0.25 (=1/4), showing that 4 input pixels contributed to the output pixel’s value. The maximum is almost 1.0, showing that a single input pixel defined the output pixel’s value. You can also see that the most probable value (the mode) is 0.5, and that the distribution is positively skewed.
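
If you would like a single number summarizing how much of the output is in the sharp regime, you can count the pixels with a MAX-FRAC of 0.5 or more and divide by the total number of pixels. This is just an optional check (using the --greaterequal and --number options of the Statistics program):

$ ntot=$(aststatistics jplus-e1.fits -hMAX-FRAC --number --quiet)
$ nsharp=$(aststatistics jplus-e1.fits -hMAX-FRAC --number --quiet \
                         --greaterequal=0.5)
$ echo $nsharp $ntot | awk '{printf "%.2f\n", $1/$2}'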

This is a well-known problem in astronomical imaging and professional photography. If you only have a single image (that is already taken!), you can undersample the input: set the angular size of the output pixels to be larger than the input. This will decrease the resolution of your image, but will ensure that pixel-mixing always happens. In the example below we are setting the output pixel scale (which is known as CDELT in the FITS standard) to \(1/0.5=2\) times the input’s. In other words, each output pixel edge will cover double the input pixel’s edge on the sky, and the output’s number of pixels in each dimension will be half of the previous output.

$ cdelt=$(astfits jplus-exp1.fits.fz --pixelscale -q \
                  | awk '{print $1}')
$ astwarp jplus-exp1.fits.fz --center=107.62920,39.72472 \
          --width=8/60 -ojplus-e1.fits --cdelt=$cdelt/0.5 \
          --checkmaxfrac

In the first extension, you can hardly see any Moiré pattern in the noise. When you go to the next (MAX-FRAC) extension, you will see that almost all the pixels have a value of 1. Of course, decreasing the resolution by half is a little too drastic. Depending on your image, you may be able to reach a sufficiently good result without such a drastic degrading of the input image. For example, if you want an output pixel scale that is just 1.5 times larger than the input, you can divide the original coordinate-delta (or “cdelt”) by \(1/1.5=0.6666\) and try again. In the MAX-FRAC extension, you will see that the range of pixel values is now between 0.56 to 1.0 (recall that originally, this was between 0.25 and 1.0). This shows that the pixels are more similarly mixed and in fact, when you look at the actual warped image, you can hardly distinguish any Moiré pattern in the noise.
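
In case you would like to try that intermediate scale, the call is identical to the previous one; only the denominator of --cdelt changes:

$ astwarp jplus-exp1.fits.fz --center=107.62920,39.72472 \
          --width=8/60 -ojplus-e1.fits --cdelt=$cdelt/0.6666 \
          --checkmaxfrac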

However, deep astronomical data are usually built by several exposures (images), not a single one. Each image is also taken by (slightly) shifting the telescope compared to the previous exposure. This shift is known as “dithering” or a “pointing pattern”, see Pointing pattern design. We do this for many reasons (for example tracking errors in the telescope, high background values, removing the effect of bad pixels or those affected by cosmic rays, robust flat pattern measurement, etc.79). One of those “etc.” reasons is to correct the Moiré pattern in the final coadded deep image.

Since the Moiré pattern is fixed to the pixel grid of the image, slightly shifting the telescope will result in the pattern appearing in different parts of the sky. Therefore, when we later stack or coadd the separate exposures into a deep image, the Moiré pattern will be decreased there. However, dithering has possible drawbacks depending on the scientific goal; for example, when observing time-variable phenomena where cutting the exposures into several shorter ones is not feasible. If this is not the case for you (for example in galaxy evolution), continue with the rest of this section.

Because we have multiple exposures that are slightly (sub-pixel) shifted, we can also increase the spatial resolution of the output. For example, let’s set the output coordinate-delta (--cdelt, or pixel scale) to be 1/2 of the input. In other words, the number of pixels in each dimension of the output is double the first Warp command of this section:

$ astwarp jplus-exp1.fits.fz --center=107.62920,39.72472 \
          --width=8/60 -ojplus-e1.fits --cdelt=$cdelt/2 \
          --checkmaxfrac

$ aststatistics jplus-e1.fits -hMAX-FRAC --minimum --maximum
6.26360438764095e-02 2.50680270139128e-01

$ astscript-fits-view jplus-e1.fits

From the last command, you see that like the previous change in --cdelt, the range of MAX-FRAC has decreased. However, when you look at the warped image and the MAX-FRAC image with the last command, you still visually see the Moiré pattern in the noise (although it has significantly decreased compared to the original resolution). It is still present because 2 is an exact multiple of 1. Let’s try increasing the resolution (oversampling) by a factor of 1.25 (which isn’t an exact multiple of 1):

$ astwarp jplus-exp1.fits.fz --center=107.62920,39.72472 \
          --width=8/60 -ojplus-e1.fits --cdelt=$cdelt/1.25 \
          --checkmaxfrac
$ astscript-fits-view jplus-e1.fits

You don’t see any Moiré pattern in the noise any more, but when you look at the MAX-FRAC extension, you see it is very different from the ones you had seen before. In the previous MAX-FRAC image, you could see large blobs of similar values. But here, you see that the variation is almost on a pixel scale, and the difference between one pixel to the next is not significant. This is why you don’t see any Moiré pattern in the warped image.

In J-PLUS, each part of the sky was observed with a three-point pointing pattern (very small shifts in each pointing). Let’s download the other two exposures and warp the same region of the sky to the same pixel grid (using the --gridfile feature). Then, let’s open all three warped images in one DS9 instance:

$ wget $jplusdr2/get_fits?id=771465 -Ojplus-exp2.fits.fz
$ wget $jplusdr2/get_fits?id=771467 -Ojplus-exp3.fits.fz

$ astwarp jplus-exp2.fits.fz --gridfile jplus-e1.fits \
          -o jplus-e2.fits --checkmaxfrac
$ astwarp jplus-exp3.fits.fz --gridfile jplus-e1.fits \
          -o jplus-e3.fits --checkmaxfrac

$ astscript-fits-view jplus-e*.fits

In the three warped images, you don’t see any Moiré pattern, so far so good... now, take the following steps:

  1. In the small “Cube” window, click the “Next” button so you see the MAX-FRAC extension/HDU.
  2. Click on the “Frame” button (in the top row of buttons just on top of the image), and select the “Single” button in the bottom row.
  3. Open the “Zoom” menu (not button), and select “Zoom 16”.
  4. Press the TAB key to flip through each exposure.
  5. Focus your eyes on the pixels with the largest value (white colored pixels), while pressing TAB to flip between the exposures. You will see that in each exposure they cover different pixels (nicely getting averaged out after stacking).

The exercise above shows that the Moiré pattern (that had already decreased significantly) will be further decreased after we stack the images. So let’s stack these three images with the commands below. First, we need to remove the sky-level from each image using NoiseChisel, then we’ll stack the INPUT-NO-SKY extensions with a \(\sigma\)-clipped mean (to robustly reject outliers; see Clipping outliers).

$ for i in $(seq 3); do \
   astnoisechisel jplus-e$i.fits -ojplus-nc$i.fits; \
  done

$ astarithmetic jplus-nc*.fits 3 5 0.2 sigclip-mean \
                -gINPUT-NO-SKY -ojplus-stack.fits

$ astscript-fits-view jplus-nc*.fits jplus-stack.fits

After opening the individual exposures and the final stack with the last command, take the following steps to see the comparisons properly:

  1. Click on the stack image so it is selected.
  2. Go to the “Frame” menu, then the “Lock” item, then activate “Scale and Limits”.
  3. Scroll your mouse or touchpad to zoom into the image.

You clearly see that the stacked image is deeper and that there is no Moiré pattern, while you have slightly improved the spatial resolution of the output compared to the input. In case you want the stack to have the original pixel resolution, you just need one more warp:

$ astwarp jplus-stack.fits --cdelt=$cdelt -ojplus-stack-origres.fits

For optimal results, the oversampling should be determined by the dithering pattern of the observation: for example, if you only have two dither points, you want the pixels with maximum value in the MAX-FRAC image of one exposure to fall on those with a minimum value in the other exposure. Ideally, many more dither points should be chosen when you are planning your observation (not just for the Moiré pattern, but also for all the other reasons mentioned above). Based on the dithering pattern, you want to select the increased resolution such that the maximum MAX-FRAC values fall on different pixels of the output grid for each exposure. Note that this discussion is about small shifts between pointings (dithers), not large ones (offsets); see Pointing pattern design.


2.10 Clipping outliers

Outliers occur often in data sets. For example, cosmic rays in astronomical imaging: the image of your target galaxy can be affected by a cosmic ray in one of the five exposures you took in one night. As a result, when you compare the measured magnitude of your target galaxy in all the exposures, you will get measurements like this (all in magnitudes): 19.8, 20.1, 20.5, 17.0, 19.9 (all fluctuating around magnitude 20, except the much brighter measurement at magnitude 17).

Normally, you would simply take the mean of these measurements to estimate the magnitude of your target with more precision. However, the 17th magnitude measurement above is clearly wrong and will significantly affect the mean: without it, the mean magnitude is 20.07, but with it, the mean is 19.46:

$ echo " 19.8 20.1 20.5 17 19.9" \
       | tr ' ' '\n' \
       | aststatistics --mean
1.94600000000000e+01

$ echo " 19.8 20.1 20.5 19.9" \
       | tr ' ' '\n' \
       | aststatistics --mean
2.00750000000000e+01

This difference of 0.61 magnitudes (or roughly a factor of 1.75 in flux) is significant (for the definition of magnitudes in astronomy, see Brightness, Flux, Magnitude and Surface brightness). In the simple example above, you can visually identify the “outlier” and manually remove it. But in most common situations you will not be so lucky! For example, when you want to stack the five images of the five exposures above, each image has \(4000\times4000\) (or 16 million!) pixels; finding the outliers by hand is not possible in a reasonable time (an average human’s lifetime!).
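
In case you would like to verify that flux ratio (from the definition of magnitudes, a difference of \(\Delta m\) magnitudes corresponds to a flux ratio of \(10^{\Delta m/2.5}\)):

$ awk 'BEGIN{print 10^(0.61/2.5)}'
1.75388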

This tutorial reviews the effect of outliers and different available ways to remove them. In particular, we will be looking at stacking of multiple datasets and collapsing one dataset along one of its dimensions. But the concepts and methods are applicable to any analysis that is affected by outliers.


2.10.1 Building inputs and analysis without clipping

As described in Clipping outliers, the goal of this tutorial is to demonstrate the effects of outliers and show how to “clip” them from basic statistics measurements. This is best done on an actual dataset (rather than pure theory). In this section we will build nine noisy images with the script below, such that one of the images has a circle in the middle. We will then stack the 9 images into one final image based on different statistical measurements: the mean, median, standard deviation (STD), median absolute deviation (MAD) and number of inputs used in each pixel. We will then analyze the resulting stacks to demonstrate the problem with outliers.

Put the script below into a plain-text file (assuming it is called script.sh), and run it with bash ./script.sh. For more on writing and good practices in shell scripts, see Writing scripts to automate the steps. The commands after the script show how to visualize the generated inputs and the five output stacked images mentioned above.

# Constants
list=""
sigma=10
number=9
radius=30
width=201
bdir=build
profsum=3e5
background=10
random_seed=1699270427


# Clipping parameters (will be set later when we start clipping).
# clip_multiple:  3   for sigma; 4.48 for MAD
# clip_tolerance: 0.1 for sigma; 0.01 for MAD
clip_operator=""
clip_multiple=""
clip_tolerance=""


# Stop if there is any error.
set -e


# If the build directory does not exist, build it.
if ! [ -d $bdir ]; then mkdir $bdir; fi


# The final image (with largest number) will contain the outlier:
# we'll put a flat circle in the center of the image as the outlier
# structure.
outlier=$bdir/in-$number.fits
nn=$bdir/$number-no-noise.fits
export GSL_RNG_SEED=$random_seed
center=$(echo $width | awk '{print int($1/2)+1}')
echo "1 $center $center 5 $radius 0 0 1 $profsum 1" \
    | astmkprof --mode=img --mergedsize=$width,$width \
                --oversample=1 --output=$nn --mcolissum
astarithmetic $nn $background + $sigma mknoise-sigma \
	      --envseed -o$outlier


# Build pure noise and add elements to the list of images to stack.
list=$outlier
numnoise=$(echo $number | awk '{print $1-1}')
for i in $(seq 1 $numnoise); do
    img="$bdir/in-$i.fits"
    if ! [ -f $img ]; then
	export GSL_RNG_SEED=$(echo $random_seed | awk '{print $1+'$i'}')
	astarithmetic $width $width 2 makenew float32 $background + \
		      $sigma mknoise-sigma --envseed --output=$img
    fi
    list="$list $img"
done


# Stack the images,
for op in mean median std mad number; do
    if [ x"$clip_operator" = x ]; then
	out=$bdir/stack-$op.fits
	astarithmetic $list $number $op -g1 --output=$out
    else
	operator=$clip_operator-$op
	out=$bdir/stack-$operator.fits
	astarithmetic $list $number $clip_multiple $clip_tolerance \
		      $operator -g1 --output=$out
    fi
done


# Collapse the first and last image along the 2nd dimension.
for i in 1 $number; do
    if [ x"$clip_operator" = x ]; then
	out=$bdir/collapsed-$i.fits
	astarithmetic $bdir/in-$i.fits 2 collapse-median counter \
		      --writeall --output=$out
    else
	out=$bdir/collapsed-$clip_operator-$i.fits
	astarithmetic $bdir/in-$i.fits $clip_multiple $clip_tolerance \
		      2 collapse-$clip_operator-median counter \
		      --writeall --output=$out
    fi
done

After the script finishes, you can see the generated input images with the first command below. The second command shows the stacked images.

$ astscript-fits-view build/in-*.fits --ds9extra="-lock scalelimits yes"
$ astscript-fits-view build/stack-*.fits

Color-blind readers may not clearly see the issue in the opened images with the default color map. In this case, please open the “Color” menu at the top of the DS9 window and select “gray” or any other color map that makes the circle most visible.

The effect of an outlier on the different measurements above can be visually seen (and quantitatively measured) through the visibility of the circle (that was only present in one image, of nine). Let’s look at them one by one (from the one that is most affected to the least):

std.fits

The standard deviation (third image in DS9) is the statistic that is most strongly affected by an outlier. The effect is so strong that the edge of the circle is also clearly visible! The standard deviation is calculated by first finding the mean and estimating the difference of each element from the mean. Those differences are then taken to the power of two, their sum is divided by the number of elements, and finally the square root is taken. It is the power-of-two component that amplifies the effect of the single outlier as you see here.

mean.fits

The mean (first image in DS9) is also affected by the outlier in such a way that the circle footprint is clearly visible. This is because all nine images have the same weight in a simple mean. Therefore, the single outlying value pushes the result to higher values and the circle gets imprinted on the stack.

median.fits

The median (second image in DS9) is also affected by the outlier; although much less significantly than the standard deviation or mean. At first sight the circle may not be too visible! To see it more clearly, click on the “Analysis” menu in DS9 and then the “smooth” item. After smoothing, you will see how the single outlier has leaked into the median stack.

Intuitively, we would think that since the median is calculated from the middle element after sorting, the outlier goes to the end and won’t affect the result. However, this is not the case as we see here: with 9 elements, the “central” element is the 5th (counting from 1, after sorting). Since the pixels covered by the circle only have 8 pure noise elements, the “true” median should have been the average of the 4th and 5th elements (after sorting). By definition, the 5th element is always larger than the mean of the 4th and 5th (because the 4th element after sorting has a smaller value than the 5th element). Therefore, using the 5th element (after sorting), we are systematically choosing higher noise values in regions that are covered by the circle!
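
You can see this with nine made-up numbers on the command-line (eight values fluctuating around 10, plus one outlier at 30). The median reports the 5th sorted element (10.0), while without the outlier, the textbook median of the remaining eight values would have been \((9.9+10.0)/2=9.95\):

$ echo "10.2 9.8 10.1 9.9 10.0 9.7 10.3 9.6 30.0" \
       | tr ' ' '\n' \
       | aststatistics --median
1.00000000000000e+01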

With larger datasets, the difference between the central elements will be smaller. However, the precision you would achieve without the outlier also improves, so the outlier’s relative contribution does not simply disappear. A detailed analysis of the effect of a single outlier on the median, as a function of the number of inputs, can be done as an exercise; but in general, as this argument shows, the median is not immune to outliers; especially when you care about low signal-to-noise regimes (as we do in astronomy: low surface brightness science).

mad.fits

The median absolute deviation (fourth image in DS9) is affected by outliers in a similar fashion to the median.

number.fits

The number image (last image in DS9) shows the number of images that went into each pixel. Since we haven’t rejected any outliers (yet!), all the pixels in this image have a value of 9.

The example above included a single outlier. But we are not usually that lucky: there are usually more outliers! For example, the last loop in the script above collapsed build/in-1.fits (that was pure noise, without the circle) and build/in-9.fits (with the circle) along their second dimension (the vertical). Collapsing was done by taking the median along all the pixels in the vertical dimension. The output of collapsing has one less dimension; in this case, producing a 1D table (with the same number of rows as the image’s horizontal axis). To easily plot the output afterwards, we have also used the counter operator. With the command below, you can open both tables and compare them:

$ astscript-fits-view build/collapsed-*.fits

The last command opens TOPCAT. In the “Graphics” menu, select “Plane Plot” and you will see all the values fluctuating around 10 (with a maximum/minimum of around \(\pm2\)). Afterwards, click on the “Layers” menu and click on “Add position control”. In the opened tab at the bottom (where the scroll bar in front of “Table” is empty), select the other table. In the image columns where no pixel was covered by the circle, the two match nicely (the noise level is the same). However, you see that the columns that were partly covered by the outlying circle gradually get more affected as the number of circle-covered pixels in that column increases (the full diameter of the circle was in the middle of the image). This shows how the median gets more biased by outliers as their number increases.

To see the problem more prominently, use the collapse-mean operator instead of collapse-median (see the example command below). The reason that the mean is more strongly affected by the outlier is exactly the same as above for the stacking of the input images. In the subsections below, we will describe some of the basic ways to reject the effect of these outliers (and have better stacks or collapses). The methodology is not limited to these types of data and can be applied generically, unless explicitly stated otherwise.
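
For example, with a command like this (it simply mirrors the collapse step of the script, replacing collapse-median with collapse-mean; the output name is arbitrary):

$ astarithmetic build/in-9.fits 2 collapse-mean counter \
                --writeall --output=build/collapsed-mean-9.fits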


2.10.2 Sigma clipping

Let’s assume that you have pure noise (centered on zero) with a clear Gaussian distribution (see Photon counting noise). Now let’s assume you add very bright objects (signal) on the image which have a very sharp boundary. By a sharp boundary, we mean that there is a clear cutoff (from the noise) at the pixels where the objects finish. In other words, at their boundaries, the objects do not fade away into the noise.

In optical astronomical imaging, cosmic rays (when they collide at a near normal incidence angle) are a very good example of such outliers. The tracks they leave behind in the image are perfectly immune to the blurring caused by the atmosphere on images of stars or galaxies and they have a very high signal-to-noise ratio. They are also very energetic and so their borders are usually clearly separated from the surrounding noise. See Figure 15 in Akhlaghi and Ichikawa, 2015.

In such a case, when you plot the histogram (see Histogram and Cumulative Frequency Plot) of the distribution, the pixels relating to those objects will be clearly separate from the pixels that belong to parts of the image that did not have any signal (were just noise). In the cumulative frequency plot, after a steady rise (due to the noise), you would observe a long flat region where, for a certain range of data (horizontal axis), there is no increase in the index (vertical axis).

In the previous section (Building inputs and analysis without clipping) we created one such dataset (build/in-9.fits). With the command below, let’s have a look at its histogram and cumulative frequency plot (in simple ASCII format; we are decreasing the default number of bins with --numasciibins to show them easily within the width of the print version of this manual; feel free to change this).

$ aststatistics build/in-9.fits --asciihist --asciicfp \
                --numasciibins=65

ASCII Histogram:
Number: 40401
Y: (linear: 0 to 4191)
X: (linear: -31.9714 -- 150.323, in 65 bins)
 |              **
 |             ****
 |            ******
 |            ******
 |           ********
 |           ********
 |          **********
 |         ************
 |        **************
 |      ******************                          ******
 |*******************************         *************************
 |-----------------------------------------------------------------


ASCII Cumulative frequency plot:
Y: (linear: 0 to 40401)
X: (linear: -31.9714 -- 150.323, in 65 bins)
 |                                                  ***************
 |                   **********************************************
 |                  ***********************************************
 |                *************************************************
 |               **************************************************
 |              ***************************************************
 |             ****************************************************
 |            *****************************************************
 |           ******************************************************
 |         ********************************************************
 |*****************************************************************
 |-----------------------------------------------------------------

Outliers like the example above can significantly bias the measurement of the background noise statistics. For example, let’s compare the median, mean and standard deviation of the image above with those of build/in-1.fits:

$ aststatistics build/in-1.fits --median --mean --std
9.90529778313248e+00 9.96143102101206e+00 1.00137568561776e+01

$ aststatistics build/in-9.fits --median --mean --std
1.09305819367634e+01 1.74470443173776e+01 2.88895986970341e+01

The effect of the outliers is obvious in all three measures: the median has become 1.10 times larger, the mean 1.75 times and the standard deviation about 2.88 times! The differing effect of outliers in different statistics was already discussed in Building inputs and analysis without clipping; also see Quantifying signal in a tile.
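
If you would like to compute those ratios directly (instead of dividing by hand), you can feed both sets of measurements to AWK:

$ echo $(aststatistics build/in-1.fits --median --mean --std) \
       $(aststatistics build/in-9.fits --median --mean --std) \
       | awk '{printf "%.2f %.2f %.2f\n", $4/$1, $5/$2, $6/$3}'
1.10 1.75 2.88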

\(\sigma\)-clipping is one way to remove/clip the effect of such very strong outliers in measures like the above. \(\sigma\)-clipping is defined as the very simple iteration below. In each iteration, the range of input data might decrease. When the outliers are as strong as above, the outliers will be removed through this iteration.

  1. Calculate the standard deviation (\(\sigma\)) and median (\(m\)) of a distribution. The median is used because, as shown above, the mean is too significantly affected by the presence of outliers.
  2. Remove all points that are smaller or larger than \(m\pm\alpha\sigma\).
  3. Go back to step 1, unless the selected exit criteria is reached. There are commonly two types of exit criteria (to stop the \(\sigma\)-clipping iteration). Within Gnuastro’s programs that use sigma-clipping, the exit criteria is the second value to the --sclipparams option (the first value is the \(\alpha\) above):
    • When a certain number of iterations has taken place (exit criteria is an integer, larger than 1).
    • When the new measured standard deviation is within a certain tolerance level of the previous iteration (exit criteria is floating point and less than 1.0). The tolerance level is defined by:

      $$\frac{\sigma_{old}-\sigma_{new}}{\sigma_{new}}$$

      In each clipping, the dispersion in the distribution either decreases or stays the same, so \(\sigma_{old}\geq\sigma_{new}\).
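
For example, if you prefer a fixed number of five iterations of \(3\sigma\) clipping instead of the tolerance-based termination, you could give an integer as the second value of --sclipparams (just an illustration; the command in the next paragraph uses the defaults):

$ aststatistics build/in-9.fits --sigmaclip --sclipparams=3,5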

Let’s see the algorithm in practice with the --sigmaclip option of Gnuastro’s Statistics program (using the default configuration of \(3\sigma\) clipping and tolerance of 0.1):

$ aststatistics build/in-9.fits --sigmaclip
Statistics (GNU Astronomy Utilities) 0.22
-------
Input: build/in-9.fits (hdu: 1)
-------
3-sigma clipping steps until relative change in STD is less than 0.1:

round number     median       STD
1     40401      1.09306e+01  2.88896e+01
2     37660      1.00306e+01  1.07153e+01
3     37539      1.00080e+01  9.93741e+00
-------
Statistics (after clipping):
  Number of input elements:                40401
  Number of clips:                         2
  Final number of elements:                37539
  Median:                                  1.000803e+01
  Mean:                                    1.001822e+01
  Standard deviation:                      9.937410e+00
  Median Absolute Deviation:               6.772760e+00

After the basic information about the input and settings, the Statistics program has printed the information for each round (iteration) of clipping. Initially, there were 40401 elements (the image is \(201\times201\) pixels). After the first clip, only 37660 elements remained; because the relative change in standard deviation was still larger than the tolerance level, a second clip was done (leaving 37539 elements). The change in standard deviation after this second clip was smaller than the tolerance level, so the exit criteria was activated and the clipping finished with 37539 elements. In the end, we see that the final median, mean and standard deviation are very similar to those of the data without any outlier (build/in-1.fits in the example above).
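
You can confirm why the iterations stopped where they did by computing the relative change of the standard deviation between the rounds printed above: the first change is well above the 0.1 tolerance, the second is below it:

$ awk 'BEGIN{printf "%.3f %.3f\n", \
             (2.88896e1-1.07153e1)/1.07153e1, \
             (1.07153e1-9.93741)/9.93741}'
1.696 0.078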

The example above provided a single statistic from a single dataset. Other scenarios where sigma-clipping becomes necessary are stacking and collapsing (that was the main goal of the script in Building inputs and analysis without clipping). To generate \(\sigma\)-clipped stacks and collapsed tables, you just need to change the values of the three variables of the script (shown below). After making this change in your favorite text editor, have a look at the outputs:

$ grep ^clip_ script.sh
clip_operator=sigclip    # These variables will be used more
clip_multiple=3          # effectively with the clipping
clip_tolerance=0.1       # operators of the next sections.

$ bash ./script.sh

$ astscript-fits-view build/stack-std.fits \
                      build/stack-sigclip-std.fits \
                      build/stack-*mean.fits \
                      build/stack-*median.fits \
                      build/stack-*number.fits \
                      --ds9extra="-tile grid layout 2 4 -scale minmax"

It is clear that the \(\sigma\)-clipped results have significantly improved in all four measures (images in the right column in DS9). The reason is clear in the stack-sigclip-number.fits image (which shows how many images were used for each pixel): almost all of the outlying circle has been accounted for (the pixel values are 8, not 9, showing that only 8 images went into those pixels). It is the leaked holes in the stack-sigclip-number.fits image (with a value of 9) that keep the circle visible in the final stack of the other measures (at various levels). See Filled re-clipping for stacking operators that can account for this.

So far, \(\sigma\)-clipping has performed nicely. However, there are important caveats to \(\sigma\)-clipping that are listed in the box below and further elaborated (with examples) afterwards.

Caveats of \(\sigma\)-clipping: There are some important caveats to \(\sigma\)-clipping:

  • The standard deviation is itself heavily influenced by the presence of outliers. Therefore a sufficiently small number of outliers can expand the standard deviation such that they stay within the boundaries.
  • When the outliers do not constitute a clearly distinct distribution like the example here, sigma-clipping will not be able to separate them as well as it did here.

To demonstrate how weaker outliers will not be clipped in sigma clipping, let’s decrease the total sum of values in the outlying circle, then re-run the script:

$ grep ^profsum script.sh
profsum=1e5

$ bash ./script.sh

Let’s have a look at the new outlying circle with the first command below. With the second command, let’s view its pixel value histogram (recall that previously, the circle had a clearly separate distribution):

$ astscript-fits-view build/in-9.fits

$ aststatistics build/in-9.fits --asciihist --numasciibins=65
ASCII Histogram:
Number: 40401
Y: (linear: 0 to 2654)
X: (linear: -31.9714 -- 79.4266, in 65 bins)
 |                        **
 |                      *****
 |                    *********
 |                    **********
 |                  *************
 |                  **************
 |                *****************
 |               *******************
 |             ***********************
 |          ****************************************
 |*****************************************************************
 |-----------------------------------------------------------------

We see that even though the circle is still clearly visible in the noise, its histogram is no longer separate; it has blended into the noise, and just causes a skewness in the otherwise symmetric noise distribution. Let’s try running the --sigmaclip option as above:

$ aststatistics build/in-9.fits --sigmaclip
Statistics (GNU Astronomy Utilities) 0.22
-------
Input: build/in-9.fits (hdu: 1)
-------
3-sigma clipping steps until relative change in STD is less than 0.1:

round number     median       STD
1     40401      1.09295e+01  1.34784e+01
2     39618      1.06762e+01  1.19852e+01
3     39126      1.05265e+01  1.12983e+01
-------
Statistics (after clipping):
  Number of input elements:                40401
  Number of clips:                         2
  Final number of elements:                39126
  Median:                                  1.052652e+01
  Mean:                                    1.114819e+01
  Standard deviation:                      1.129831e+01
  Median Absolute Deviation:               7.106166e+00

We see that the median, mean and standard deviation are all over-estimated (each worse than the previous!). Let’s see how \(\sigma\)-clipped stacking performs on this fainter outlying circle:

$ astscript-fits-view build/stack-std.fits \
                      build/stack-sigclip-std.fits \
                      build/stack-*mean.fits \
                      build/stack-*median.fits \
                      build/stack-*number.fits \
                      --ds9extra="-tile grid layout 2 4 -scale minmax"

Compared to the previous run (where the outlying circle was brighter), we see that \(\sigma\)-clipping is now less successful in removing the outlying circle from the stacks, or in the single value measurements. This is particularly visible in the stack-sigclip-number.fits image: the circle is barely visible any more; there is only a very weak clustering of pixels with a value of 8 over the circle’s pixels. This has happened because the outliers have biased the standard deviation itself to a level that keeps them within this multiple of the standard deviation.

To gauge if \(\sigma\)-clipping will be useful for your dataset, look at the histogram (see Histogram and Cumulative Frequency Plot). The ASCII histogram that is printed on the command-line with --asciihist (like above) is good enough in most cases. But you can’t do this manually in every case (as in the stacking which involved more than forty thousand pixels)! Clipping outliers should be based on a measure of scatter that is less affected by outliers! Therefore, in Gnuastro we also have median absolute deviation (MAD) clipping which is described in the next section (MAD clipping).


2.10.3 MAD clipping

When clipping outliers, it is important that the used measure of dispersion is itself not strongly affected by the outliers. Previously (in Sigma clipping), we saw that the standard deviation is not a good measure of dispersion because of its strong dependency on outliers. In this section, we’ll introduce clipping operators that are based on the median absolute deviation (MAD).

The median absolute deviation is defined as the median of the differences of each element from the median of all elements. As mathematically derived in the Wikipedia page on the median absolute deviation, for a pure Gaussian distribution, the median absolute deviation will be roughly \(0.67449\sigma\). We can confirm this numerically from the images with pure noise that we created previously in Building inputs and analysis without clipping. With the first command below we can see the raw standard deviation and median absolute deviation values, and the second command shows their ratio:

$ aststatistics build/in-1.fits --std --mad
1.00137568561776e+01 6.74662296703343e+00

$ aststatistics build/in-1.fits --std --mad | awk '{print $2/$1}'
0.673735

The algorithm of MAD-clipping is identical to \(\sigma\)-clipping, except that instead of \(\sigma\), it uses the median absolute deviation. Since the median absolute deviation is roughly 0.67 times the standard deviation (for a Gaussian distribution), if you regularly use \(3\sigma\), you should use \((3/0.67)\rm{MAD}=(4.48)\rm{MAD}\) when doing MAD-clipping. The usual tolerance should also be changed, due to the differing nature of the median absolute deviation (based on sorted absolute differences) compared to the standard deviation (based on the sum of squared differences). A tolerance of 0.01 is better suited as the termination criterion of MAD-clipping.
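
If you want to compute the MAD multiple corresponding to your usual \(\sigma\) multiple on the command-line, a small AWK call like the sketch below is enough (using the rounded 0.67 factor mentioned above):

$ awk 'BEGIN{print 3/0.67}'
4.47761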

To demonstrate the steps in practice, let’s assume you have the original script in Building inputs and analysis without clipping with the changes shown in the first command below. With the second command we’ll execute the script, and with the third command we’ll do the iterations of MAD-clipping:

$ grep '^clip_\|^profsum' script.sh
profsum=1e5
clip_operator=madclip
clip_multiple=4.48
clip_tolerance=0.01

$ bash ./script.sh

$ aststatistics build/in-9.fits --madclip
Statistics (GNU Astronomy Utilities) 0.21.6-28a1
-------
Input: build/in-9.fits (hdu: 1)
-------
4.48-MAD clipping steps until relative change in MAD
(median absolute deviation) is less than 0.01:

round number     median       MAD
1     40401      1.09295e+01  7.38609e+00
2     38789      1.04261e+01  7.03508e+00
3     38549      1.03469e+01  6.97927e+00
-------
Statistics (after clipping):
  Number of input elements:                40401
  Number of clips:                         2
  Final number of elements:                38549
  Median:                                  1.034690e+01
  Mean:                                    1.068946e+01
  Standard deviation:                      1.062083e+01
  Median Absolute Deviation:               6.979274e+00

We see that the median, mean and standard deviation after MAD-clipping are much better than with the basic \(\sigma\)-clipping (see Sigma clipping): the median is now 10.3 (it was 10.5 in \(\sigma\)-clipping), the mean is 10.7 (it was 11.1) and the standard deviation is 10.6 (it was 11.3).

Let’s compare the MAD-clipped stacks with the results of the previous section. Since we want the images shown in a certain order, we’ll first construct the list of images (with a for loop that will fill the imgs variable). Note that this assumes you have run, and carefully read/understood, all the commands in the previous sections (Building inputs and analysis without clipping and Sigma clipping). Tip: the last three --ds9extra options ensure that the bottom row (showing the number of images used in each pixel) has the same scale and limits in all three columns.

$ imgs=""
$ p=build/stack   # 'p' is short for "prefix"
$ for m in std mean median mad number; do \
   imgs="$imgs $p-$m.fits $p-sigclip-$m.fits $p-madclip-$m.fits"; \
  done
$ astscript-fits-view $imgs --ds9extra="-tile grid layout 3 5" \
                      --ds9extra="-scale limits 1 9" \
                      --ds9extra="-frame move back -scale limits 1 9" \
                      --ds9extra="-frame move back -scale limits 1 9"

The third column shows the newly created MAD-clipped stacks. We see that the outlying circle is much weaker in the MAD-clipped stacks than in the \(\sigma\)-clipped stacks in all measures (except for the “number” measure, where a stronger circle is actually desirable: it shows that more of the circle’s pixels were clipped).

However, unfortunately even MAD-clipping is not perfect and we still see the circle in all four cases, even with the MAD-clipped median (more clearly: after smoothing/blocking). The reason is similar to what was described for \(\sigma\)-clipping (using the original profsum=3e5): the leaked holes in the numbers image. Because the circle is not too distant from the noise, some of its elements do not get clipped, and their stacked value becomes systematically higher than the rest of the image. In Gnuastro, we have a fix for this that is described fully in the next section (Filled re-clipping).


2.10.4 Filled re-clipping

When the source of the outliers covers more than one element, and its flux is close to the noise level, not all of its elements will be clipped: because of noise, some of them will remain un-clipped and will thus affect the output. Examples of this were created and thoroughly discussed in the previous sections on \(\sigma\)-clipping and MAD-clipping (see Sigma clipping and MAD clipping).

To avoid this problem, in Gnuastro we have an extra set of clipping operators that do two rounds of clipping, with some basic operations in between:

  1. The requested clipping is first applied. This will return the median and the dispersion measure (MAD or STD).
  2. A mask is created for each input image (in stacking) or 1D array (in collapsing). Any pixel that is outside the requested clip range is set to 1; the rest are set to 0.
  3. Isolated regions are found:
    • For 2D images (where each pixel has 8 neighbors), the mask pixels are first dilated (so the edges of the regions are closed off from the surrounding noise).
    • For 1D arrays (where each element only has two neighbors), the mask is first eroded. This is necessary because in the next step (where the holes are filled), two pixels that were clipped purely due to noise, but are separated by a large distance, could wrongly mask a very large range of the input data.
  4. Any 0-valued pixel in the masks that is fully surrounded by 1s (a “hole”) is filled (given a value of 1).
  5. All the pixels that have a value of 1 in the mask are set to NaN in the respective input data (that the mask corresponds to).
  6. The requested clipping is repeated on the newly masked inputs.

Through this process, the less significant outliers (which do not get clipped independently) are clipped based on their surrounding elements. The filled re-clipping operators have an extra -fill in their names. For example, the filled MAD-clipped mean is called madclip-fill-mean (while the simple MAD-clipped mean operator was called madclip-mean). Let’s run our script with the filled \(\sigma\)-clipping and MAD-clipping (before each run, make sure the values shown under the grep command are correct).

With the last command, we’ll view all the outputs generated so far (in this section and the previous ones: Building inputs and analysis without clipping, Sigma clipping and MAD clipping):

$ grep '^clip_\|^profsum' script.sh
profsum=1e5
clip_operator=madclip-fill
clip_multiple=4.48
clip_tolerance=0.01

$ bash ./script.sh

$ grep '^clip_\|^profsum' script.sh
profsum=1e5
clip_operator=sigclip-fill
clip_multiple=3
clip_tolerance=0.1

$ bash ./script.sh

$ imgs=""
$ for m in std mean median mad number; do \
   imgs="$imgs $p-$m.fits $p-sigclip-$m.fits $p-sigclip-fill-$m.fits" \
   imgs="$p-madclip-$m.fits $p-madclip-fill-$m.fits"; \
  done
$ astscript-fits-view $imgs --ds9extra="-tile grid layout 5 6" \
                            --ds9extra="-scale limits 1 9" \
                            --ds9extra="-frame move back -scale limits 1 9" \
                            --ds9extra="-frame move back -scale limits 1 9" \
                            --ds9extra="-frame move back -scale limits 1 9" \
                            --ds9extra="-frame move back -scale limits 1 9"

The last column (belonging to the madclip-fill-* operators) is finally free of the outlying circle (which was present in only one of the nine inputs). The filling operation did not affect the sigclip-fill-* operators (third column of the DS9 layout from the last command above)! The reason is clear from the bottom row (showing the number of images used in each pixel): the weak overdensity of clipped pixels over the circle is barely visible and was not strong enough to define “holes” (that would then be filled). On the contrary, when comparing madclip-number.fits and madclip-fill-number.fits, the filled holes within the circle are clearly visible.

But the script also generated collapsed columns of build/in-9.fits (into a 1D table). In this case, for each column, the number of outliers increases as we enter the circle, reaching a maximum in the middle of the image. Let’s have a look at those outputs:

$ astscript-fits-view build/collapsed-madclip-9.fits \
                      build/collapsed-madclip-fill-9.fits

When comparing the two in TOPCAT (following the same process described in Building inputs and analysis without clipping), you will notice that the difference is only in the edges of the circle. The 4.48 multiple used in MAD-clipping (corresponding to 3 sigma) was not successful in removing the many outlying pixels caused by the circle in the central columns of the image.

This is a relatively high threshold and was used because, for the images, each clipping (for every pixel) only had 9 elements. But for the collapsing, we have many more pixels along each vertical line of the image (201 pixels). Let’s decrease the threshold to 3 and calculate the collapsed mean after MAD-clipping, once with filled re-clipping and once without it:

$ for m in mean number; do \
   for clip in madclip madclip-fill; do \
    astarithmetic build/in-9.fits 3 0.01 2 collapse-$clip-$m \
                  counter --writeall -ocollapse-$clip-$m.fits; \
   done; \
  done

The two loops above created four tables. First, with the command below, let’s look at the two measured mean values (one with filling and the other without it):

$ astscript-fits-view collapse-*-mean.fits

In the table without filled re-clipping, you see a small shift in the center of the image (around 100 in the horizontal axis). Let’s have a look at the final number of pixels used in each clipping:

$ astscript-fits-view collapse-*-number.fits

The difference is now clearly visible when you plot both in one “Plane plot” window. In the filled re-clipping case, we see a clear dip in the number of pixels that corresponds very nicely to the number of pixels associated with the circle. But the dip is much noisier in the simple MAD-clipping.


3 Installation

The latest released version of Gnuastro source code is always available at the following URL:

http://ftpmirror.gnu.org/gnuastro/gnuastro-latest.tar.gz

Quick start describes the commands necessary to configure, build, and install Gnuastro on your system. This chapter will be useful in cases where that simple procedure is not sufficient; for example, your system lacks a mandatory or optional dependency (in other words, you cannot pass the $ ./configure step), you want greater customization, you want to build and install Gnuastro from other points in its history, or you want a higher level of control over the installation. Thus, if you were happy with downloading the tarball and following Quick start, you can safely ignore this chapter and come back to it in the future if you need more customization.

Dependencies describes the mandatory, optional and bootstrapping dependencies of Gnuastro. Only the first group is required/mandatory when you are building Gnuastro from a tarball (see Release tarball). They are very basic and low-level tools used in most astronomical software, so you might already have them installed; if not, they are very easy to install as described for each. Downloading the source discusses the two ways you can obtain the source code: as a tarball (a significant snapshot in Gnuastro’s history), or the full history80. The latter allows you to build Gnuastro at any point in its history (for example, to get bug fixes or new features that are not released as a tarball yet).

The building and installation of Gnuastro is heavily customizable; to learn more, see Build and install. That section is essentially a thorough explanation of the steps in Quick start and discusses the ways you can influence the building and installation. If you encounter any problems in the installation process, they are probably already explained in Known issues. Other useful software discusses the installation and usage of some other free software that is not directly required by Gnuastro but might be useful in conjunction with it.


3.1 Dependencies

A minimal set of dependencies is mandatory for building Gnuastro from the standard tarball release. If they are not present, you cannot pass Gnuastro’s configuration step. The mandatory dependencies are therefore very basic (low-level) tools which are easy to obtain, build and install; see Mandatory dependencies for a full discussion.

If you have the packages of Optional dependencies, Gnuastro will have additional functionality (for example, converting FITS images to JPEG or PDF). If you are installing from a tarball as explained in Quick start, you can stop reading after this section. If you are cloning the version controlled source (see Version controlled source), an additional bootstrapping step is required before configuration and its dependencies are explained in Bootstrapping dependencies.

Your operating system’s package manager is an easy and convenient way to download and install the dependencies that are already pre-built for your operating system. In Dependencies from package managers, we will list some common operating system package manager commands to install the optional and mandatory dependencies.


3.1.1 Mandatory dependencies

The mandatory Gnuastro dependencies are very basic and low-level tools. They all follow the same basic GNU based build system (like that shown in Quick start), so even if you do not have them, installing them should be pretty straightforward. In this section we explain each program and any specific note that might be necessary in the installation.


3.1.1.1 GNU Scientific Library

The GNU Scientific Library, or GSL, is a large collection of functions that are very useful in scientific applications, for example, integration, random number generation, and Fast Fourier Transform among many others. To download and install GSL from source, you can run the following commands.

$ wget https://ftp.gnu.org/gnu/gsl/gsl-latest.tar.gz
$ tar -xf gsl-latest.tar.gz
$ cd gsl-X.X                     # Replace X.X with version number.
$ ./configure CFLAGS="$CFLAGS -g0 -O3"
$ make -j8                       # Replace 8 with no. CPU threads.
$ make check
$ sudo make install

3.1.1.2 CFITSIO

CFITSIO is the closest you can get to the pixels in a FITS image while remaining faithful to the FITS standard. It is written by William Pence, the principal author of the FITS standard81, and is regularly updated, setting the definitions for all other software packages that use FITS images.

Some GNU/Linux distributions have CFITSIO in their package managers; if it is available and updated, you can use it. One problem that might occur is that the distribution’s CFITSIO might not have been configured with the --enable-reentrant option. This option allows CFITSIO to open a file in multiple threads and can thus provide great speed improvements. If CFITSIO was not configured with this option, any program which needs this capability will warn you and abort when you ask for multiple threads (see Multi-threaded operations).

To install CFITSIO from source, we strongly recommend that you have a look through Chapter 2 (Creating the CFITSIO library) of the CFITSIO manual and understand the options you can pass to $ ./configure (there are not too many). This is a very basic package for most astronomical software and it is best to configure it carefully for your system. Once you download the source and unpack it, the following configure script should be enough for most purposes. Do not forget to read chapter two of the manual though; for example, the second option is only for 64-bit systems. The manual also explains how to check if it has been installed correctly.

CFITSIO comes with two executable files called fpack and funpack. From their manual: they “are standalone programs for compressing and uncompressing images and tables that are stored in the FITS (Flexible Image Transport System) data format. They are analogous to the gzip and gunzip compression programs except that they are optimized for the types of astronomical images that are often stored in FITS format”. The commands below will compile and install them on your system along with CFITSIO. They are not essential for Gnuastro, since they are just wrappers for functions within CFITSIO, but they can come in handy. The make utils command is only available for versions above 3.39; it will build these executables along with several other test executables, which are deleted in the commands below before the installation (otherwise the test files would also be installed).
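
Once installed, their basic usage is simple. For example (a sketch with a hypothetical image.fits; see the fpack and funpack manuals for their many options):

$ fpack image.fits          # Creates the compressed 'image.fits.fz'.
$ funpack image.fits.fz     # Recovers the uncompressed 'image.fits'.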

The commands necessary to download the source, decompress, build and install CFITSIO from source are described below.

$ urlbase=http://heasarc.gsfc.nasa.gov/FTP/software/fitsio/c
$ wget $urlbase/cfitsio_latest.tar.gz
$ tar -xf cfitsio_latest.tar.gz
$ cd cfitsio-X.XX                   # Replace X.XX with version
$ ./configure --prefix=/usr/local --enable-sse2 --enable-reentrant \
              CFLAGS="$CFLAGS -g0 -O3"
$ make
$ make utils
$ ./testprog > testprog.lis         # See below if this has an error
$ diff testprog.lis testprog.out    # Should have no output
$ cmp testprog.fit testprog.std     # Should have no output
$ rm cookbook fitscopy imcopy smem speed testprog
$ sudo make install

In the ./testprog > testprog.lis step, you may encounter an error complaining that it cannot find libcfitsio.so.AAA (where AAA is an integer). This is the library that you just built but have not yet installed; unfortunately, some versions of CFITSIO do not account for this on some operating systems. To fix the problem, you need to tell your OS to also look in the current CFITSIO build directory with the first command below; afterwards, the problematic command (second below) should run properly.

$ export LD_LIBRARY_PATH="$(pwd):$LD_LIBRARY_PATH"
$ ./testprog > testprog.lis

Recall that the modification above is ONLY NECESSARY FOR THIS STEP. Do not put the LD_LIBRARY_PATH modification command in a permanent place (like your bash startup file). After installing CFITSIO, close your terminal and continue working on a new terminal (so LD_LIBRARY_PATH has its default value). For more on LD_LIBRARY_PATH, see Installation directory.
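
In the new terminal, you can confirm that the temporary modification is gone with a simple check like this:

$ echo $LD_LIBRARY_PATH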


3.1.1.3 WCSLIB

WCSLIB is written and maintained by one of the authors of the World Coordinate System (WCS) definition in the FITS standard82, Mark Calabretta. It might be already built and ready in your distribution’s package management system. However, here the installation from source is explained, for the advantages of installation from source please see Mandatory dependencies. To install WCSLIB you will need to have CFITSIO already installed, see CFITSIO.

WCSLIB also has plotting capabilities which use PGPLOT (a plotting library for C). If you want to use those capabilities in WCSLIB, see PGPLOT for its installation instructions. However, PGPLOT is old83, so its installation is not easy; there are also many great modern WCS plotting tools (mostly written in Python). Hence, if you will not be using those plotting functions in WCSLIB, you can configure it with the --without-pgplot option as shown below.

If you have the cURL library84 on your system and you installed CFITSIO version 3.42 or later, you will also need to link with the cURL library at configure time (through the -lcurl option as shown below). CFITSIO uses the cURL library for its HTTPS (or HTTP Secure85) support, and if it is present on your system, CFITSIO will depend on it. Therefore, if the ./configure command below fails (because you do not have the cURL library), remove this option and rerun it.

To download, configure, build, check and install WCSLIB from source, you can follow the steps below.

## Download and unpack the source tarball
$ wget ftp://ftp.atnf.csiro.au/pub/software/wcslib/wcslib.tar.bz2
$ tar -xf wcslib.tar.bz2

## In the `cd' command, replace `X.X' with version number.
$ cd wcslib-X.X

## If `./configure' fails, remove `-lcurl' and run again.
$ ./configure LIBS="-pthread -lcurl -lm" --without-pgplot     \
              --disable-fortran CFLAGS="$CFLAGS -g0 -O3"
$ make
$ make check
$ sudo make install

3.1.2 Optional dependencies

The libraries listed here are only used for very specific applications, therefore they are optional and Gnuastro can be built without them (with only those specific features disabled). Since these are pretty low-level tools, they are not too hard to install from source, but you can also use your operating system’s package manager to easily install all of them. For more, see Dependencies from package managers.

If the ./configure script cannot find any of these optional dependencies, it will notify you of the operation(s) you cannot do because of not having them. If you continue the build and request an operation that uses a missing library, Gnuastro’s programs will warn that the optional library was missing at build-time and abort. Since Gnuastro was built without that library, installing the library afterwards will not help; the only way is to rebuild Gnuastro from scratch (after the library has been installed). However, for program dependencies (like cURL or Ghostscript) things are easier: you can also install them after building Gnuastro. This is because libraries are used to build the internal structure of Gnuastro’s executables, while a program dependency is only called by Gnuastro’s programs at run-time and has no effect on their internal structure. So if a dependency program becomes available later, it will be used the next time it is requested.

GNU Libtool

Libtool is a program to simplify the management of the libraries needed to build an executable (a program). GNU Libtool has some added functionality compared to other implementations. If GNU Libtool is not present on your system at configuration time, a warning will be printed and BuildProgram will not be built or installed. The configure script will look in your search path (PATH) for GNU Libtool through the following executable names: libtool (acceptable only if it is the GNU implementation) or glibtool. See Installation directory for more on PATH.

GNU Libtool (the binary/executable file) is a low-level program that is probably already present on your system, and if not, is available in your operating system package manager86. If you want to install GNU Libtool’s latest version from source, please visit its web page.

Gnuastro’s tarball is shipped with an internal implementation of GNU Libtool. Even if you have GNU Libtool, Gnuastro’s internal implementation is used for the building and installation of Gnuastro. As a result, you can still build, install and use Gnuastro even if you do not have GNU Libtool installed on your system. However, this internal Libtool does not get installed. Therefore, after Gnuastro’s installation, if you want to use BuildProgram to compile and link your own C source code which uses the Gnuastro library, you need to have GNU Libtool available on your system (independent of Gnuastro). See Review of library fundamentals to learn more about libraries.

GNU Make extension headers

GNU Make is a workflow management system that can be used to run a series of commands in a specific order, and in parallel if you want. GNU Make offers special features to extend it with custom functions within a dynamic library. They are defined in the gnumake.h header. If gnumake.h can be found on your system at configuration time, Gnuastro will build a custom library that GNU Make can use for extended functionality in (astronomical) data analysis scenarios.
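
For example, on many GNU/Linux distributions you can check for the header with a command like the one below; the exact location is distribution-dependent, so this is only a sketch:

$ ls /usr/include/gnumake.h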

libgit2

Git is one of the most common version control systems (see Version controlled source). When libgit2 is present, and Gnuastro’s programs are run within a version controlled directory, outputs will contain the version number of the working directory’s repository for future reproducibility. See the COMMIT keyword header in Output FITS files for a discussion.

libjpeg

libjpeg is only used by ConvertType to read from and write to JPEG images, see Recognized file formats. libjpeg is a very basic library that provides tools to read and write JPEG images; most Unix-like graphic programs and libraries use it, so you most probably already have it installed. libjpeg-turbo is an alternative to libjpeg; it uses Single Instruction, Multiple Data (SIMD) instructions (for example, on ARM-based systems) to significantly decrease the processing time of the JPEG compression and decompression algorithms.

libtiff

libtiff is used by ConvertType and the libraries to read TIFF images, see Recognized file formats. libtiff is a very basic library that provides tools to read and write TIFF images, most Unix-like operating system graphic programs and libraries use it. Therefore even if you do not have it installed, it must be easily available in your package manager.

cURL

cURL’s executable (curl) is called by Query for submitting queries to remote datasets and retrieving the results. It is not necessary for the build of Gnuastro from source (only a warning will be printed if it cannot be found at configure time), so if you do not have it at build-time there is no problem. Just be sure to have it when you run astquery, otherwise you’ll get an error about not finding curl.
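
To check that curl is available at run-time, a simple version check (like those used for the bootstrapping dependencies further below) is enough:

$ curl --version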

GPL Ghostscript

GPL Ghostscript’s executable (gs) is called by ConvertType to compile a PDF file from a source PostScript file, see ConvertType. Since only the executable is called at run-time, its headers (and libraries) are not needed.

Python3 with Numpy

Python is a high-level programming language and Numpy is the most commonly used library within Python to add multi-dimensional arrays and matrices. If you configure Gnuastro with --with-python and version 3 of Python is available with a corresponding Numpy Library, Gnuastro’s library will be built with some Python-related helper functions. Python wrappers for Gnuastro’s library (for example, ‘pyGnuastro’) can use these functions when being built from source. For more on Gnuastro’s Python helper functions, see Python interface (python.h).

This Python interface is only relevant if you want to build the Python wrappers (like ‘pyGnuastro’) from source. If you install the Gnuastro Python wrapper from a pre-built repository like PyPI, this feature of your Gnuastro library won’t be used. Pre-built libraries contain the full Gnuastro library that they need within them (you don’t even need to have Gnuastro at all!).

Can’t find the Python3 and Numpy of a virtual environment: make sure to set the $PYTHON variable to point to the python3 command of the virtual environment before running ./configure. Note that you do not need to activate the virtual environment; just point PYTHON to its python3 executable, as in the example below:

$ python3 -m venv test-env    # Setting up the virtual env.
$ export PYTHON="$(pwd)/test-env/bin/python3"
$ ./configure                 # Gnuastro's configure script.
SAO DS9

SAO DS9 (ds9) is a visualization tool for FITS images. Gnuastro’s astscript-fits-view program calls DS9 to visualize FITS images. We have a full appendix on it and how to install it in SAO DS9. Since it is a run-time dependency, it can be installed at any later time (after building and installing Gnuastro).

TOPCAT

TOPCAT (topcat) is a visualization tool for astronomical tables (most commonly: plotting). Gnuastro’s astscript-fits-view program calls TOPCAT to visualize tables. We have a full appendix on it and how to install it in TOPCAT. Since it is a run-time dependency, it can be installed at any later time (after building and installing Gnuastro).


3.1.3 Bootstrapping dependencies

Bootstrapping is only necessary if you have decided to obtain the full version controlled history of Gnuastro, see Version controlled source and Bootstrapping. Using the version controlled source enables you to always be up to date with the most recent development work of Gnuastro (bug fixes, new functionalities, improved algorithms, etc.). If you have downloaded a tarball (see Downloading the source), then you can ignore this subsection.

To successfully run the bootstrapping process, there are some additional dependencies beyond those discussed in the previous subsections. These are low-level tools that are used by a large collection of Unix-like operating system programs; therefore they are most probably already available on your system. If they are not already installed, you should be able to easily find them in any GNU/Linux distribution’s package management system (apt-get, yum, pacman, etc.). The short names in parentheses in typewriter font after each package name can be used to search for them in your package manager. For the GNU Portability Library, GNU Autoconf Archive and TeX Live, it is recommended to use the instructions here, not your operating system’s package manager.

GNU Portability Library (Gnulib)

To ensure portability for a wider range of operating systems (those that do not include GNU C library, namely glibc), Gnuastro depends on the GNU portability library, or Gnulib. Gnulib keeps a copy of all the functions in glibc, implemented (as much as possible) to be portable to other operating systems. The bootstrap script can automatically clone Gnulib (as a gnulib/ directory inside Gnuastro), however, as described in Bootstrapping this is not recommended.

The recommended way to bootstrap Gnuastro is to first clone Gnulib and the Autoconf archives (see below) into a local directory outside of Gnuastro. Let’s call it DEVDIR87 (which you can set to any directory; preferentially where you keep your other development projects). Currently in Gnuastro, both Gnulib and Autoconf archives have to be cloned in the same top directory88 like the case here89:

$ DEVDIR=/home/yourname/Development  ## Select any location.
$ mkdir $DEVDIR                      ## If it doesn't exist!
$ cd $DEVDIR
$ git clone https://git.sv.gnu.org/git/gnulib.git
$ git clone https://git.sv.gnu.org/git/autoconf-archive.git

Gnulib is a source-based dependency of Gnuastro’s bootstrapping process, so simply having it on your computer is enough; there is no need to install it, and thus nothing to check.

You now have the full version controlled source of these two repositories in separate directories. Both these packages are regularly updated, so every once in a while, you can run $ git pull within them to get any possible updates.

GNU Automake (automake)

GNU Automake will build the Makefile.in files in each sub-directory using the (hand-written) Makefile.am files. The Makefile.ins are subsequently used to generate the Makefiles when the user runs ./configure before building.

To check that you have a working GNU Automake in your system, you can try this command:

$ automake --version
GNU Autoconf (autoconf)

GNU Autoconf will build the configure script using the configurations we have defined (hand-written) in configure.ac.

To check that you have a working GNU Autoconf in your system, you can try this command:

$ autoconf --version
GNU Autoconf Archive

These are a large collection of tests that can be called to run at ./configure time. See the explanation under GNU Portability Library (Gnulib) above for instructions on obtaining it and keeping it up to date.

GNU Autoconf Archive is a source-based dependency of Gnuastro’s bootstrapping process, so simply having it on your computer is enough; there is no need to install it, and thus nothing to check. Just do not forget that it has to be in the same directory as Gnulib (described above).

GNU Texinfo (texinfo)

GNU Texinfo is the tool that formats this manual into the various output formats. To bootstrap Gnuastro you need all of Texinfo’s command-line programs. However, some operating systems package them separately, for example, in Fedora, makeinfo is packaged in the texinfo-tex package.

To check that you have a working GNU Texinfo in your system, you can try this command:

$ makeinfo --version
GNU Libtool (libtool)

GNU Libtool is in charge of building all the libraries in Gnuastro. The libraries contain functions that are used by more than one program and are installed for use in other programs. They are thus put in a separate directory (lib/).

To check that you have a working GNU Libtool in your system, you can try this command (and from the output, make sure it is GNU’s libtool)

$ libtool --version
GNU help2man (help2man)

GNU help2man is used to convert the output of the --help option (see --help) to the traditional Man page (see Man pages).

To check that you have a working GNU Help2man in your system, you can try this command:

$ help2man --version
LaTeX and some TeX packages

Some of the figures in this book are built by LaTeX (using the PGF/TikZ package). The LaTeX source for those figures is version controlled for easy maintenance, not the actual figures, so the ./bootstrap script will run LaTeX to build the figures. The best way to install LaTeX and all the necessary packages is through TeX Live, which is a package manager for TeX-related tools that is independent of any operating system. It is thus preferred to the TeX Live versions distributed by your operating system.

To install TeX Live, go to the web page and download the appropriate installer by following the “download” link. Note that by default the full package repository will be downloaded and installed (around 4 gigabytes), which can take very long to download and to update later. However, most packages are not needed by everyone; it is easier, faster and better to install only the “Basic scheme” (consisting of only the most basic TeX and LaTeX packages, which is less than 200 megabytes)90.

After the installation, be sure to set the environment variables as suggested at the end of the installer’s output. Any time you need a package you do not have, simply install it with a command like the one below (similar to how you install software from your operating system’s package manager)91. To install all the necessary TeX packages for a successful Gnuastro bootstrap, run this command:

$ sudo su
# tlmgr install epsf jknapltx caption biblatex biber iftex \
                etoolbox logreq xstring xkeyval pgf ms     \
                xcolor pgfplots times rsfs ps2eps epspdf

To check that you have a working LaTeX executable in your system, you can try this command (this just checks if LaTeX exists, as described above, if you have a missing package, you can easily identify it from the output and install it with tlmgr):

$ latex --version
ImageMagick (imagemagick)

ImageMagick is a wonderful and robust program for image manipulation on the command-line. bootstrap uses it to convert the book images into the formats necessary for the various book formats.

Since ImageMagick version 7, it is necessary to edit the policy file (/etc/ImageMagick-7/policy.xml) to have the following line (it may be present but commented; in this case, un-comment it):

<policy domain="coder" rights="read|write" pattern="{PS,PDF,XPS}"/>

If the following line is present, it is also necessary to comment/remove it.

<policy domain="delegate" rights="none" pattern="gs" />

To learn more about the ImageMagick security policy please see: https://imagemagick.org/script/security-policy.php.
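
Before editing, you can quickly inspect the relevant lines of the policy file with a command like this (a sketch using the path given above):

$ grep -E 'PS,PDF|pattern="gs"' /etc/ImageMagick-7/policy.xml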

To check that you have a working ImageMagick in your system, you can try this command:

$ convert --version

3.1.4 Dependencies from package managers

The most basic way to install a package on your system is to build the packages from source yourself. Alternatively, you can use your operating system’s package manager to download pre-compiled files and install them. The latter choice is easier and faster. However, we recommend that you build the Mandatory dependencies yourself from source (all necessary commands and links are given in the respective section). Here are some basic reasons behind this recommendation.

  1. Your operating system’s pre-built software might not be the most recent release. For example, Gnuastro itself is also packaged in some package managers. For the list see: https://repology.org/project/gnuastro/versions. You will notice that Gnuastro’s version in some operating systems is more than 10 versions old! It is the same for all the dependencies of Gnuastro.
  2. For each package, Gnuastro might perform better with (or require) certain configuration options that your distribution’s package managers did not add for you. If present, these configuration options are explained during the installation of each package in the sections below (for example, in CFITSIO). When the proper configuration has not been set, the programs should complain and inform you.
  3. For the libraries, the package managers might separate the binary file from the header files, which can cause confusion; see Known issues.
  4. As with any other tool, the science you derive from Gnuastro’s programs highly depends on these lower-level dependencies, so generally it is much better to have a close connection with them. By reading their manuals, installing them and staying up to date with changes/bugs in them, your scientific results and understanding (of what is going on, and thus how you interpret your scientific results) will also correspondingly improve.

Based on your package manager, you can use any of the following commands to install the mandatory and optional dependencies. If your package manager is not included in the list below, please send us the respective command, so we can add it. For better archivability and compression ratios, Gnuastro’s recommended tarball compression format is Lzip, see Release tarball. Therefore, the package manager commands below also include Lzip.

apt-get (Debian-based OSs: Debian, Ubuntu, Linux Mint, etc.)

Debian is one of the oldest GNU/Linux distributions92. It thus has a very extended user community and a robust internal structure and standards. All of it is free software and based on the work of volunteers around the world. Many distributions are thus derived from it, for example, Ubuntu and Linux Mint. This arguably makes Debian-based OSs the largest, and most used, class of GNU/Linux distributions. All of them use Debian’s Advanced Packaging Tool (APT, for example, apt-get) for managing packages.

Development features (Ubuntu or derivatives)

By default, a newly installed Ubuntu does not contain the low-level tools that are necessary for building a software from source. Therefore, if you are using Ubuntu, please run the following command.

$ sudo apt-get install gcc make zlib1g-dev lzip
Mandatory dependencies

Without these, Gnuastro cannot be built, they are necessary for input/output and low-level mathematics (see Mandatory dependencies)!

$ sudo apt-get install libgsl-dev libcfitsio-dev \
                       wcslib-dev
Optional dependencies

If present, these libraries can be used in Gnuastro’s build for extra features, see Optional dependencies.

$ sudo apt-get install ghostscript libtool-bin \
                       libjpeg-dev libtiff-dev \
                       libgit2-dev curl
Programs to view FITS images or tables

These are not used in Gnuastro’s build. They can just help in viewing the inputs/outputs independent of Gnuastro!

$ sudo apt-get install saods9 topcat

Gnuastro is packaged in Debian (and thus some of its derivative operating systems). Just make sure it is the most recent version.

dnf
yum (Red Hat-based OSs: Red Hat, Fedora, CentOS, Scientific Linux, etc.)

Red Hat Enterprise Linux (RHEL) is released by Red Hat Inc. RHEL requires paid subscriptions for use of its binaries and support. But since it is free software, many other teams use its code to spin off their own distributions based on RHEL. Red Hat-based GNU/Linux distributions initially used the “Yellowdog Updater, Modified” (YUM) package manager, which has been replaced by “Dandified yum” (DNF). If the latter is not available on your system, you can use yum instead of dnf in the command below.

Mandatory dependencies

Without these, Gnuastro cannot be built, they are necessary for input/output and low-level mathematics (see Mandatory dependencies)!

$ sudo dnf install gsl-devel cfitsio-devel \
                   wcslib-devel
Optional dependencies

If present, these libraries can be used in Gnuastro’s build for extra features, see Optional dependencies.

$ sudo dnf install ghostscript libtool \
                   libjpeg-devel libtiff-devel \
                   libgit2-devel lzip curl
Programs to view FITS images or tables

These are not used in Gnuastro’s build. They can just help in viewing the inputs/outputs independent of Gnuastro!

$ sudo dnf install saods9 topcat
brew (macOS)

macOS is the operating system used on Apple devices. macOS does not come with a package manager pre-installed, but several widely used, third-party package managers exist, such as Homebrew or MacPorts. Both are free software. Currently we have only tested Gnuastro’s installation with Homebrew as described below. If not already installed, first obtain Homebrew by following the instructions at https://brew.sh.

Mandatory dependencies

Without these, Gnuastro cannot be built, they are necessary for input/output and low-level mathematics (see Mandatory dependencies)!

Homebrew manages packages in different ‘taps’. To install WCSLIB via Homebrew you will need to tap into brewsci/science first (the tap may change in the future, but can be found by calling brew search wcslib).

$ brew tap brewsci/science
$ brew install wcslib gsl cfitsio
Optional dependencies

If present, these libraries can be used in Gnuastro’s build for extra features, see Optional dependencies.

$ brew install ghostscript libtool libjpeg \
               libtiff libgit2 curl lzip
Programs to view FITS images or tables

These are not used in Gnuastro’s build. They can just help in viewing the inputs/outputs independent of Gnuastro!

$ brew install saoimageds9 topcat
pacman (Arch Linux)

Arch Linux is a smaller GNU/Linux distribution, which follows the KISS principle (“keep it simple, stupid”) as a general guideline. It “focuses on elegance, code correctness, minimalism and simplicity, and expects the user to be willing to make some effort to understand the system’s operation”. Arch GNU/Linux uses “Package manager” (Pacman) to manage its packages/components.

Mandatory dependencies

Without these, Gnuastro cannot be built, they are necessary for input/output and low-level mathematics (see Mandatory dependencies)!

$ sudo pacman -S gsl cfitsio wcslib
Optional dependencies

If present, these libraries can be used in Gnuastro’s build for extra features, see Optional dependencies.

$ sudo pacman -S ghostscript libtool libjpeg \
                 libtiff libgit2 curl lzip
Programs to view FITS images or tables

These are not used in Gnuastro’s build. They can just help in viewing the inputs/outputs independent of Gnuastro!

SAO DS9 and TOPCAT are not available in the standard Arch GNU/Linux repositories. However, installing and using both is very easy from their own web pages, as described in SAO DS9 and TOPCAT.

zypper (openSUSE and SUSE Linux Enterprise Server)

openSUSE is a community GNU/Linux distribution sponsored by SUSE; SUSE Linux Enterprise Server93 (SLES) is the commercial offering, which shares code and tools with it. Many additional packages are offered in the Build Service94. openSUSE and SLES use zypper (CLI) and YaST (GUI) for managing repositories and packages.

Configuration

When building Gnuastro, run the configure script with the following CPPFLAGS environment variable:

$ ./configure CPPFLAGS="-I/usr/include/cfitsio"
Mandatory dependencies

Without these, Gnuastro cannot be built, they are necessary for input/output and low-level mathematics (see Mandatory dependencies)!

$ sudo zypper install gsl-devel cfitsio-devel \
                      wcslib-devel
Optional dependencies

If present, these libraries can be used in Gnuastro’s build for extra features, see Optional dependencies.

$ sudo zypper install ghostscript_any libtool \
                      pkgconfig libcurl-devel \
                      libgit2-devel \
                      libjpeg62-devel \
                      libtiff-devel curl
Programs to view FITS images or tables

These are not used in Gnuastro’s build. They can just help in viewing the inputs/outputs independent of Gnuastro!

$ sudo zypper install ds9 topcat

Usually, when libraries are installed by operating system package managers, there should be no problems when configuring and building other programs from source (that depend on those libraries: Gnuastro in this case). However, in some special conditions, problems may pop up during the configuration, building, or checking/running of Gnuastro’s programs. The most common such problems and their solutions are discussed below.

Not finding library during configuration: If a library is installed, but during Gnuastro’s configure step the library is not found, then configure Gnuastro like the command below (correcting /path/to/lib). For more, see Known issues and Installation directory.

$ ./configure LDFLAGS="-L/path/to/lib"

Not finding header (.h) files while building: If a library is installed, but during Gnuastro’s make step, the library’s header (file with a .h suffix) is not found, then configure Gnuastro like the command below (correcting /path/to/include). For more, see Known issues and Installation directory.

$ ./configure CPPFLAGS="-I/path/to/include"

Gnuastro’s programs do not run during check or after install: If a library is installed, but the programs do not run due to linking problems, set the LD_LIBRARY_PATH variable like below (assuming Gnuastro is installed in /path/to/installed). For more, see Known issues and Installation directory.

$ export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/path/to/installed/lib"

3.2 Downloading the source

Gnuastro’s source code can be downloaded in two ways. As a tarball, ready to be configured and installed on your system (as described in Quick start), see Release tarball. If you want official releases of stable versions this is the best, easiest and most common option. Alternatively, you can clone the version controlled history of Gnuastro, run one extra bootstrapping step and then follow the same steps as the tarball. This will give you access to all the most recent work that will be included in the next release along with the full project history. The process is thoroughly introduced in Version controlled source.


3.2.1 Release tarball

A release tarball (commonly compressed) is the most common way of obtaining free and open source software. A tarball is a snapshot of one particular moment in the Gnuastro development history along with all the necessary files to configure, build, and install Gnuastro easily (see Quick start). It is very straightforward and needs the least set of dependencies (see Mandatory dependencies). Gnuastro has tarballs for official stable releases and pre-releases for testing. See Version numbering for more on the two types of releases and the formats of the version numbers. The URLs for each type of release are given below.

Official stable releases (http://ftp.gnu.org/gnu/gnuastro):

This URL hosts the official stable releases of Gnuastro. Always use the most recent version (see Version numbering). Clicking on the “Last modified” title of the second column will sort the files by date, which you can also use to find the latest version. It is recommended to use a mirror to download these tarballs; please visit http://ftpmirror.gnu.org/gnuastro/ and see below.

Pre-release tarballs (http://alpha.gnu.org/gnu/gnuastro):

This URL contains unofficial pre-release versions of Gnuastro. The pre-release versions of Gnuastro here are for enthusiasts to try out before an official release. If there are problems or bugs, the testers can inform the developers so they are fixed before the next official release. See Version numbering to understand how the version numbers here are formatted. If you want to remain even more up-to-date with development activities, please clone the version controlled source as described in Version controlled source.

Gnuastro’s official/stable tarballs are released in two formats: Gzip (with suffix .tar.gz) and Lzip (with suffix .tar.lz). The pre-release tarballs (after version 0.3) are only released as Lzip tarballs. Gzip is a very well-known and widely used compression program created by GNU and available in most systems. However, Lzip provides a better compression ratio and more robust archival capacity. For example, Gnuastro 0.3’s tarball was 2.9MB with Lzip and 4.3MB with Gzip; see the Lzip web page for more. Lzip might not be pre-installed in your operating system; if so, installing it from your operating system’s package manager or from source is very easy and fast (it is a very small program).

The GNU FTP server is mirrored (has backups) in various locations around the globe (http://www.gnu.org/order/ftp.html). You can use the mirror closest to your location for a faster download. Note that only some mirrors keep track of the pre-release (alpha) tarballs. Also note that if you want to download immediately after an announcement (see Announcements), the mirrors might need some time to synchronize with the main GNU FTP server.


3.2.2 Version controlled source

The publicly distributed Gnuastro tarball (for example, gnuastro-X.X.tar.gz) does not contain the revision history; it is only a snapshot of the source code at one significant instant of Gnuastro’s history (specified by the version number, see Version numbering), ready to be configured and built. To be able to develop successfully, the revision history of the code can be very useful for tracking when something was added or changed; also, some updates that are not yet officially released might be in it.

We use Git for the version control of Gnuastro. For those who are not familiar with it, we recommend the ProGit book. The whole book is publicly available for online reading and downloading and does a wonderful job at explaining the concepts and best practices.

Let’s assume you want to keep Gnuastro in the TOPGNUASTRO directory (can be any directory, change the value below). The full version controlled history of Gnuastro can be cloned in TOPGNUASTRO/gnuastro by running the following commands95:

$ TOPGNUASTRO=/home/yourname/Research/projects/
$ cd $TOPGNUASTRO
$ git clone git://git.sv.gnu.org/gnuastro.git

The $TOPGNUASTRO/gnuastro directory will contain hand-written (version controlled) source code for Gnuastro’s programs, libraries, this book and the tests. All are divided into sub-directories with standard and very descriptive names. The version controlled files in the top cloned directory are either mainly in capital letters (for example, THANKS and README) or mainly written in small-caps (for example, configure.ac and Makefile.am). The former are non-programming, standard writing for human readers containing high-level information about the whole package. The latter are instructions to customize the GNU build system for Gnuastro. For more on Gnuastro’s source code structure, please see Developing. We will not go any deeper here.

The cloned Gnuastro source cannot immediately be configured, compiled, or installed since it only contains hand-written files, not automatically generated or imported files which do all the hard work of the build process. See Bootstrapping for the process of generating and importing those files (it is not too hard!). Once you have bootstrapped Gnuastro, you can run the standard procedures (in Quick start). Very soon after you have cloned it, Gnuastro’s main master branch will be updated on the main repository (since the developers are actively working on Gnuastro), for the best practices in keeping your local history in sync with the main repository see Synchronizing.


3.2.2.1 Bootstrapping

The version controlled source code lacks the source files that we have not written or are automatically built. These automatically generated files are included in the distributed tarball for each distribution (for example, gnuastro-X.X.tar.gz, see Version numbering) and make it easy to immediately configure, build, and install Gnuastro. However from the perspective of version control, they are just bloatware and sources of confusion (since they are not changed by Gnuastro developers).

The process of automatically building and importing necessary files into the cloned directory is known as bootstrapping. After bootstrapping is done you are ready to follow the default GNU build steps that you normally run on the tarball (./configure && make for example, described more in Quick start). Some known issues with bootstrapping may occur during the process, to see how to fix them, please see Known issues.

All the instructions for an automatic bootstrapping are available in bootstrap and configured using bootstrap.conf. bootstrap and COPYING (which contains the software copyright notice) are the only files not written by Gnuastro developers but kept under version control, to enable simple bootstrapping and provide legal information on usage immediately after cloning. The bootstrap file is maintained by the GNU Portability Library (Gnulib) and Gnuastro’s copy is identical to the original, so do not make any changes to it since it will be replaced whenever Gnulib releases an update. Make all your changes in bootstrap.conf.

The bootstrapping process has its own separate set of dependencies, the full list is given in Bootstrapping dependencies. They are generally very low-level and used by a very large set of commonly used programs, so they are probably already installed on your system. The simplest way to bootstrap Gnuastro is to simply run the bootstrap script within your cloned Gnuastro directory as shown below. However, please read the next paragraph before doing so (see Version controlled source for TOPGNUASTRO).

$ cd TOPGNUASTRO/gnuastro
$ ./bootstrap                      # Requires internet connection

Without any options, bootstrap will clone Gnulib within your cloned Gnuastro directory (TOPGNUASTRO/gnuastro/gnulib) and download the necessary Autoconf archive macros. So if you run bootstrap like this, you will need an internet connection every time you decide to bootstrap. Also, Gnulib is a large package and cloning it can be slow. It will also keep the full Gnulib repository within your Gnuastro repository, so if another one of your projects also needs Gnulib, and you insist on running bootstrap like this, you will have two copies. If you regularly back up your important files, Gnulib will also slow down the backup process. Therefore, while the simple invocation above can be used with no problem, it is not recommended. To do better, see the next paragraph.

The recommended way to get these two packages is thoroughly discussed in Bootstrapping dependencies (in short: clone them in the separate DEVDIR/ directory). The following commands will take you into the cloned Gnuastro directory and run the bootstrap script, while telling it to copy some files (instead of making symbolic links, with the --copy option, this is not mandatory96) and where to look for Gnulib (with the --gnulib-srcdir option). Please note that the address given to --gnulib-srcdir has to be an absolute address (so do not use ~ or ../ for example).

$ cd $TOPGNUASTRO/gnuastro
$ ./bootstrap --copy --gnulib-srcdir=$DEVDIR/gnulib

Since Gnulib and Autoconf archives are now available in your local directories, you do not need an internet connection every time you decide to remove all un-tracked files and redo the bootstrap (see box below). You can also use the same command on any other project that uses Gnulib. All the necessary GNU C library functions, Autoconf macros and Automake inputs are now available along with the book figures. The standard GNU build system (Quick start) will do the rest of the job.

Undoing the bootstrap: During the development, it might happen that you want to remove all the automatically generated and imported files. In other words, you might want to reverse the bootstrap process. Fortunately Git has a good program for this job: git clean. Run the following command and every file that is not version controlled will be removed.

git clean -fxd

It is best to commit any recent changes before running this command. You might have created new files since the last commit, and if they have not been committed, they will all be gone forever (as if they had been removed with rm). To get a list of the non-version-controlled files instead of deleting them, add the n option to git clean, so it becomes -fxdn.
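
For example, to first review what would be deleted and only then actually delete:

$ git clean -fxdn    # Dry run: only list the files.
$ git clean -fxd     # Actually delete them.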

Besides bootstrap and bootstrap.conf, the bootstrapped/ directory and the README-hacking file are also related to the bootstrapping process. The former hosts all the imported (bootstrapped) directories. Thus, in the version controlled source, it only contains a README file, but in the distributed tarball it also contains sub-directories filled with all the bootstrapped files. README-hacking contains a summary of the bootstrapping process discussed in this section; it is a necessary reference when you have not built this book yet and is thus not distributed in the Gnuastro tarball.


3.2.2.2 Synchronizing

The bootstrapping script (see Bootstrapping) is not regularly needed: you mainly need it after you have cloned Gnuastro (once) and whenever you want to re-import the files from Gnulib, or Autoconf archives97 (not too common). However, Gnuastro developers are constantly working on Gnuastro and are pushing their changes to the official repository. Therefore, your local Gnuastro clone will soon be outdated. Gnuastro has two mailing lists dedicated to its development activities (see Developing mailing lists). Subscribing to them can help you decide when to synchronize with the official repository.

To pull all the most recent work in Gnuastro, run the following command from the top Gnuastro directory. If you do not already have a built system, ignore make distclean. The separate steps are described in detail afterwards.

$ make distclean && git pull && autoreconf -f

You can also run the commands separately:

$ make distclean
$ git pull
$ autoreconf -f

If Gnuastro was already built in this directory, you do not want some outputs from the previous version being mixed with outputs from the newly pulled work. Therefore, the first step is to clean/delete all the built files with make distclean. Fortunately the GNU build system allows the separation of source and built files (in separate directories). This is a great feature to keep your source directory clean and you can use it to avoid the cleaning step. Gnuastro comes with a script with some useful options for this job. It is useful if you regularly pull recent changes, see Separate build and source directories.

After the pull, we must re-configure Gnuastro with autoreconf -f (part of GNU Autoconf). It will update the ./configure script and all the Makefile.in98 files based on the hand-written configurations (in configure.ac and the Makefile.am files). After running autoreconf -f, a warning about TEXI2DVI might show up, you can ignore that.

The most important reason for rebuilding Gnuastro’s build system is to generate/update the version number for your updated Gnuastro snapshot. This generated version number will include the commit information (see Version numbering). The version number is included in nearly all outputs of Gnuastro’s programs, therefore it is vital for reproducing an old result.

As a summary, be sure to run ‘autoreconf -f’ after every change in the Git history. This includes synchronization with the main server or even a commit you have made yourself.

If you would like to see what has changed since you last synchronized your local clone, you can take the following steps instead of the simple command above (do not type anything after #):

$ git checkout master             # Confirm if you are on master.
$ git fetch origin                # Fetch all new commits from server.
$ git log master..origin/master   # See all the new commit messages.
$ git merge origin/master         # Update your master branch.
$ autoreconf -f                   # Update the build system.

By default git log prints the most recent commit first; add the --reverse option to see the changes chronologically. To see exactly what has been changed in the source code along with the commit message, add a -p option to the git log.
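
For example, after fetching (as above), the following command shows the new commits in chronological order, along with the exact changes each one introduces (combining the two options just described):

$ git log --reverse -p master..origin/master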

If you want to make changes in the code, have a look at Developing to get started easily. Be sure to commit your changes in a separate branch (keep your master branch to follow the official repository) and re-run autoreconf -f after the commit. If you intend to send your work to us, you can safely use your commit since it will be ultimately recorded in Gnuastro’s official history. If not, please upload your separate branch to a public hosting service, for example, Codeberg, and link to it in your report/paper. Alternatively, run make distcheck and upload the output gnuastro-X.X.X.XXXX.tar.gz to a publicly accessible web page so your results can be considered scientific (reproducible) later.


3.3 Build and install

This section is basically a longer explanation of the sequence of commands given in Quick start. If you did not have any problems during the Quick start steps (you want to have all the programs of Gnuastro installed in your system, you do not want to change the executable names during or after installation, you have root access to install the programs in the default system-wide directory, and the Letter paper size of the print book is fine for you), or, in summary, you do not feel like going into the details when everything is working, you can safely skip this section.

If you have any of the above problems or you want to understand the details for a better control over your build and install, read along. The dependencies which you will need prior to configuring, building and installing Gnuastro are explained in Dependencies. The first three steps in Quick start need no extra explanation, so we will skip them and start with an explanation of Gnuastro specific configuration options and a discussion on the installation directory in Configuring, followed by some smaller subsections: Tests, A4 print book, and Known issues, which explains known problems you might encounter in the installation steps and how to solve them.


3.3.1 Configuring

The $ ./configure step is the most important step in the build and install process. All the required packages, libraries, headers and environment variables are checked in this step. The behaviors of make and make install can also be set through command-line options to this command.

The configure script accepts various arguments and options which enable the final user to highly customize whatever she is building. The options to configure are generally very similar to normal program options explained in Arguments and options. Similar to all GNU programs, you can get a full list of the options along with a short explanation by running

$ ./configure --help

A complete explanation is also included in the INSTALL file. Note that this file was written by the authors of GNU Autoconf (which builds the configure script), so it is common to all programs which use the $ ./configure script for building and installing, not just Gnuastro. Here we only discuss the cases where you do not have superuser access to the system or want to change the executable names. But before that, the options to configure that are particular to Gnuastro are reviewed.


3.3.1.1 Gnuastro configure options

Most of the options to configure (which are to do with building) are similar for every program which uses this script. Here the options that are particular to Gnuastro are discussed. The next topics explain the usage of other configure options which can be applied to any program using the GNU build system (through the configure script).

--enable-debug

Compile/build Gnuastro with debugging information, no optimization and without shared libraries.

In order to allow more efficient programs when using Gnuastro (after the installation), by default Gnuastro is built with a 3rd level (a very high level) optimization and no debugging information. By default, libraries are also built for static and shared linking (see Linking). However, when there are crashes or unexpected behavior, these three features can hinder the process of localizing the problem. This configuration option is identical to manually calling the configuration script with CFLAGS="-g -O0" --disable-shared.

In the (rare) situations where you need to do your debugging on the shared libraries, do not use this option. Instead run the configure script by explicitly setting CFLAGS like this:

$ ./configure CFLAGS="-g -O0"
--enable-check-with-valgrind

Do the make check tests through Valgrind. Therefore, if any crashes or memory-related issues (segmentation faults in particular) occur in the tests, the output of Valgrind will also be put in the tests/test-suite.log file without having to manually modify the check scripts. This option will also activate Gnuastro’s debug mode (see the --enable-debug configure-time option described above).

Valgrind is free software. It is a program for easy checking of memory-related issues in programs. It runs a program within its own controlled environment and can thus identify the exact line-number in the program’s source where a memory-related issue occurs. However, it can significantly slow down the tests. So this option is only useful when a segmentation fault is found during make check.

--enable-progname

Only build and install progname along with any other program that is enabled in this fashion. progname is the name of the executable without the ast, for example, crop for Crop (with the executable name of astcrop).

Note that by default all the programs will be installed. This option (and the --disable-progname options) are only relevant when you do not want to install all the programs. Therefore, if this option is called for any of the programs in Gnuastro, any program which is not explicitly enabled will not be built or installed.
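
For example, if you only want Crop and Table installed (the program names here are only for illustration; any of Gnuastro’s programs can be selected in the same way), you could configure Gnuastro like this:

$ ./configure --enable-crop --enable-table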

--disable-progname
--enable-progname=no

Do not build or install the program named progname. This is very similar to the --enable-progname, but will build and install all the other programs except this one.

Note: If some programs are enabled and some are disabled, it is equivalent to simply enabling those that were enabled. Listing the disabled programs is redundant.

--enable-gnulibcheck

Enable checks on the GNU Portability Library (Gnulib). Gnulib is used by Gnuastro to enable users of non-GNU based operating systems (that do not use GNU C library or glibc) to compile and use the advanced features that this library provides. We make extensive use of such functions. If you give this option to $ ./configure, when you run $ make check, first the functions in Gnulib will be tested, then the Gnuastro executables. If your operating system does not support glibc or has an older version of it and you have problems in the build process ($ make), you can give this flag to configure to see if the problem is caused by Gnulib not supporting your operating system or Gnuastro, see Known issues.
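
For example, the following sequence will first check Gnulib’s functions on your system and only afterwards test Gnuastro’s executables (a minimal sketch of the usage described above):

$ ./configure --enable-gnulibcheck
$ make
$ make check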

--disable-guide-message
--enable-guide-message=no

Do not print a guiding message during the GNU Build process of Quick start. By default, after each step, a message is printed guiding the user on what the next command should be. Therefore, after ./configure, it will suggest running make. After make, it will suggest running make check and so on. If Gnuastro is configured with this option, for example

$ ./configure --disable-guide-message

Then these messages will not be printed after any step (like most programs). For people who are not yet fully accustomed to this build system, these guidelines can be very useful and encouraging. However, if you find those messages annoying, use this option.

--without-libgit2

Build Gnuastro without libgit2 (for including Git commit hashes in output files), see Optional dependencies. libgit2 is an optional dependency; with this option, Gnuastro will ignore any libgit2 that may already be installed on the system.

--without-libjpeg

Build Gnuastro without libjpeg (for reading/writing to JPEG files), see Optional dependencies. libjpeg is an optional dependency; with this option, Gnuastro will ignore any libjpeg that may already be installed on the system.

--without-libtiff

Build Gnuastro without libtiff (for reading/writing to TIFF files), see Optional dependencies. libtiff is an optional dependency; with this option, Gnuastro will ignore any libtiff that may already be installed on the system.

--with-python

Build the Python interface within Gnuastro’s dynamic library. This interface can be used for easy communication with Python wrappers (for example, the pyGnuastro package).

When you install the pyGnuastro package from PyPI, the correct configuration of the Gnuastro Library is already packaged with it (with the Python interface) and that is independent of your Gnuastro installation. The Python interface is only necessary if you want to build pyGnuastro from source (which is only necessary for developers). Therefore it has to be explicitly activated at configure time with this option. For more on the interface functions, see Python interface (python.h).
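
For example, to build Gnuastro’s library with the Python interface included (only necessary when building pyGnuastro from source, as described above):

$ ./configure --with-python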

The tests of some programs might depend on the outputs of the tests of other programs. For example, MakeProfiles is one of the first programs to be tested when you run $ make check. MakeProfiles’ test outputs (FITS images) are inputs to many other programs (which in turn provide inputs for other programs). Therefore, if you do not enable MakeProfiles for example, the tests for many of the other programs will be skipped. To avoid this, in one run you can enable all the programs and run the tests, but not install. If everything is working correctly, you can run configure again with only the programs you want; this time, skip the tests and install directly after building.


3.3.1.2 Installation directory

One of the most commonly used options to ./configure is --prefix; it is used to define the directory that will host all the installed files (or the “prefix” in their final absolute file name). A common example is when you are using a server and do not have administrator or root access: in this scenario, the default installation directory is not writable by your user, so without the --prefix option you will not be able to install the built files at all. However, once you prepare your startup file to look into the proper place (as discussed thoroughly below), you will be able to easily use this option and benefit from any software you want to install, without having to ask the system administrators or use a different version of a software that is already installed on the server.

The most basic way to run an executable is to explicitly write its full file name (including all the directory information) and run it. One example is running the configuration script with the $ ./configure command (see Quick start). By giving a specific directory (the current directory or ./), we are explicitly telling the shell to look in the current directory for an executable file named ‘configure’. Directly specifying the directory is thus useful for executables in the current (or nearby) directories. However, when the program (an executable file) is to be used a lot, specifying all those directories will become a significant burden. For example, the ls executable lists the contents in a given directory and it is (usually) installed in the /usr/bin/ directory by the operating system maintainers. Therefore, if using the full address was the only way to access an executable, each time you wanted a listing of a directory, you would have to run the following command (which is very inconvenient, both in writing and in remembering the various directories).

$ /usr/bin/ls

To address this problem, we have the PATH environment variable. To understand it better, we will start with a short introduction to shell variables. Shell variable values are basically treated as strings of characters. For example, it does not matter if the value is a name (string of alphabetic characters), or a number (string of numeric characters), or both. You can define a variable and a value for it by running

$ myvariable1=a_test_value
$ myvariable2="a test value"

As you see above, if the value contains white space characters, you have to put the whole value (including white space characters) in double quotes ("). You can see the value it represents by running

$ echo $myvariable1
$ echo $myvariable2

If a variable has no value or it was not defined, the last command will only print an empty line. A variable defined like this will be known as long as this shell or terminal is running. Other terminals will have no idea it existed. The main advantage of shell variables is that if they are exported99, subsequent programs that are run within that shell can access their value. So by changing their value, you can change the “environment” of a program which uses them. The shell variables which are accessed by programs are therefore known as “environment variables”100. You can see the full list of exported variables that your shell recognizes by running:

$ printenv
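
For example, to make one of the variables defined above visible to the programs you run from this shell, and to confirm that it is now part of the environment, you could run the two commands below (a minimal illustration of exporting):

$ export myvariable2
$ printenv | grep myvariable2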

HOME is one commonly used environment variable; it is the top (home) directory of the user that is logged in. Try finding it in the command above. It is used so often that the shell has a special expansion (alternative) for it: ‘~’. Whenever you see file names starting with the tilde sign, it actually represents the value of the HOME environment variable, so ~/doc is the same as $HOME/doc.

Another one of the most commonly used environment variables is PATH; it is a list of directories to search for executable names. Its value is a list of directories (separated by a colon, or ‘:’). When the address of the executable is not explicitly given (like ./configure above), the system will look for the executable in the directories specified by PATH. If you have a computer nearby, try running the following command to see which directories your system will look into when it is searching for executable (binary) files; one example output is printed here (notice how /usr/bin, in the ls example above, is one of the directories in PATH):

$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/bin

By default PATH usually contains system-wide directories, which are readable (but not writable) by all users, like the above example. Therefore if you do not have root (or administrator) access, you need to add another directory to PATH which you actually have write access to. The standard directory where you can keep installed files (not just executables) for your own user is the ~/.local/ directory. The names of hidden files (and directories) start with a ‘.’ (dot), so this directory will not show up in your common command-line listings, or on the graphical user interface. You can use any other directory, but this is the most recognized.

The top installation directory will be used to keep all the package’s components: programs (executables), libraries, include (header) files, shared data (like manuals), or configuration files (see Review of library fundamentals for a thorough introduction to headers and linking). So it commonly has some of the following sub-directories for each class of installed components respectively: bin/, lib/, include/, man/, share/, etc/. Since the PATH variable is only used for executables, you can add the ~/.local/bin directory (which keeps the executables/programs or more generally, “binary” files) to PATH with the following command. As defined below, first the existing value of PATH is used, then your given directory is added to its end and the combined value is put back in PATH (run ‘$ echo $PATH’ afterwards to check if it was added).

$ PATH=$PATH:~/.local/bin

Any executable that you installed in ~/.local/bin will now be usable without having to remember and write its full address. However, as soon as you leave/close your current terminal session, this modified PATH variable will be forgotten. Adding the directories which contain executables to the PATH environment variable each time you start a terminal is also very inconvenient and prone to errors. Fortunately, there are standard ‘startup files’ defined by your shell precisely for this (and other) purposes. There is a special startup file for every significant starting step:

/etc/profile and everything in /etc/profile.d/

These startup scripts are called when your whole system starts (for example, after you turn on your computer). Therefore you need administrator or root privileges to access or modify them.

~/.bash_profile

If you are using (GNU) Bash as your shell, the commands in this file are run, when you log in to your account through Bash. Most commonly when you log in through the virtual console (where there is no graphic user interface).

~/.bashrc

If you are using (GNU) Bash as your shell, the commands here will be run each time you start a terminal and are already logged in. For example, when you open your terminal emulator in the graphic user interface.

For security reasons, it is highly recommended to directly type in your HOME directory value by hand in startup files instead of using variables. So in the following, let’s assume your user name is ‘name’ (so ~ may be replaced with /home/name). To add ~/.local/bin to your PATH automatically on any startup file, you have to “export” the new value of PATH in the startup file that is most relevant to you by adding this line:

export PATH=$PATH:/home/name/.local/bin

Now that you know your system will look into ~/.local/bin for executables, you can tell Gnuastro’s configure script to install everything in the top ~/.local directory using the --prefix option. When you subsequently run $ make install, all the install-able files will be put in their respective directory under ~/.local/ (the executables in ~/.local/bin, the compiled library files in ~/.local/lib, the library header files in ~/.local/include and so on, to learn more about these different files, please see Review of library fundamentals). Note that tilde (‘~’) expansion will not happen if you put a ‘=’ between --prefix and ~/.local101, so we have avoided the = character here which is optional in GNU-style options, see Options.

$ ./configure --prefix ~/.local

You can install everything (including libraries like GSL, CFITSIO, or WCSLIB which are Gnuastro’s mandatory dependencies, see Mandatory dependencies) locally by configuring them as above. However, recall that PATH is only for executable files, not libraries and that libraries can also depend on other libraries. For example, WCSLIB depends on CFITSIO and Gnuastro needs both. Therefore, when you installed a library in a non-recognized directory, you have to guide the program that depends on them to look into the necessary library and header file directories. To do that, you have to define the LDFLAGS and CPPFLAGS environment variables respectively. This can be done while calling ./configure as shown below:

$ ./configure LDFLAGS=-L/home/name/.local/lib            \
              CPPFLAGS=-I/home/name/.local/include       \
              --prefix ~/.local

It can be annoying/error-prone to do this when configuring every piece of software that depends on such libraries. Hence, you can define these two variables in the most relevant startup file (discussed above). The convention on using these variables does not include a colon to separate values (as PATH-like variables do). They use white space characters and each value is prefixed with a compiler option102. Note the -L and -I above (see Options); for -I see Headers, and for -L, see Linking. Therefore we have to keep the value in double quotation signs to preserve the white space characters, and add the following two lines to the startup file of choice:

export LDFLAGS="$LDFLAGS -L/home/name/.local/lib"
export CPPFLAGS="$CPPFLAGS -I/home/name/.local/include"

Dynamic libraries are linked to the executable every time you run a program that depends on them (see Linking to fully understand this important concept). Hence dynamic libraries also require a special path variable called LD_LIBRARY_PATH (same formatting as PATH). To use programs that depend on these libraries, you need to add ~/.local/lib to your LD_LIBRARY_PATH environment variable by adding the following line to the relevant start-up file:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/name/.local/lib

If you also want to access the Info (see Info) and man pages (see Man pages) documentations add ~/.local/share/info and ~/.local/share/man to your INFOPATH103 and MANPATH environment variables respectively.

A final note is that order matters in the directories that are searched for all the variables discussed above. In the examples above, the new directory was added after the system specified directories. So if the program, library or manuals are found in the system wide directories, the user directory is no longer searched. If you want to search your local installation first, put the new directory before the already existing list, like the example below.

export LD_LIBRARY_PATH=/home/name/.local/lib:$LD_LIBRARY_PATH

This is good when a library, for example, CFITSIO, is already present on the system, but the system-wide install was not configured with the correct configuration flags (see CFITSIO), or you want to use a newer version and you do not have administrator or root access to update it on the whole system/server. If you update LD_LIBRARY_PATH by placing ~/.local/lib first (like above), the linker will first find the CFITSIO you installed for yourself and link with it. It thus will never reach the system-wide installation.

There are important security problems with using local installations first: all important system-wide executables and libraries (important executables like ls and cp, or libraries like the C library) can be replaced by non-secure versions with the same file names and put in the customized directory (~/.local in this example). So if you choose to search in your customized directory first, please be sure to keep it clean from executables or libraries with the same names as important system programs or libraries.

Summary: When you are using a server which does not give you administrator/root access AND you would like to give priority to your own built programs and libraries, not the version that is (possibly already) present on the server, add these lines to your startup file. See above for which startup file is best for your case and for a detailed explanation on each. Do not forget to replace ‘/YOUR-HOME-DIR’ with your home directory (for example, ‘/home/your-id’):

export PATH="/YOUR-HOME-DIR/.local/bin:$PATH"
export LDFLAGS="-L/YOUR-HOME-DIR/.local/lib $LDFLAGS"
export MANPATH="/YOUR-HOME-DIR/.local/share/man/:$MANPATH"
export CPPFLAGS="-I/YOUR-HOME-DIR/.local/include $CPPFLAGS"
export INFOPATH="/YOUR-HOME-DIR/.local/share/info/:$INFOPATH"
export LD_LIBRARY_PATH="/YOUR-HOME-DIR/.local/lib:$LD_LIBRARY_PATH"

Afterwards, you just need to add an extra --prefix=/YOUR-HOME-DIR/.local to the ./configure command of the software that you intend to install. Everything else will be the same as a standard build and install, see Quick start.


3.3.1.3 Executable names

At first sight, the names of the executables for each program might seem to be uncommonly long, for example, astnoisechisel or astcrop. We could have chosen terse (and cryptic) names like most programs do. We chose this complete naming convention (something like the commands in TeX) so you do not have to spend too much time remembering what the name of a specific program was. Such complete names also enable you to easily search for the programs.

To facilitate typing the names in, we suggest using the shell auto-complete. With this facility you can find the executable you want very easily. It is very similar to file name completion in the shell. For example, simply by typing the letters below (where [TAB] stands for the Tab key on your keyboard)

$ ast[TAB][TAB]

you will get the list of all the available executables that start with ast in your PATH environment variable directories. So, all the Gnuastro executables installed on your system will be listed. Typing the next letter for the specific program you want along with a Tab, will limit this list until you get to your desired program.

In case all of this does not convince you and you still want to type short names, some suggestions are given below. You should have in mind though, that if you are writing a shell script that you might want to pass on to others, it is best to use the standard name because other users might not have adopted the same customization. The long names also serve as a form of documentation in such scripts. A similar reasoning can be given for option names in scripts: it is good practice to always use the long formats of the options in shell scripts, see Options.

The simplest solution is making a symbolic link to the actual executable. For example, let’s assume you want to type ic to run Crop instead of astcrop. Assuming you installed Gnuastro executables in /usr/local/bin (default) you can do this simply by running the following command as root:

# ln -s /usr/local/bin/astcrop /usr/local/bin/ic

In case you update Gnuastro and a new version of Crop is installed, the default executable name is the same, so your custom symbolic link still works.

The installed executable names can also be set using options to $ ./configure, see Configuring. GNU Autoconf (which configures Gnuastro for your particular system) allows the builder to change the name of programs with the three options --program-prefix, --program-suffix and --program-transform-name. The first two are for adding a fixed prefix or suffix to all the programs that will be installed. This will actually make all the names longer! You can use them to add version numbers to the program names in order to simultaneously have two installed versions of a program.
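
For example, to have the version as part of every installed executable name (so two Gnuastro versions can co-exist; the exact suffix below is only an illustration), you could configure the build like this, making the installed Crop executable astcrop-0.22:

$ ./configure --program-suffix=-0.22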

The third configure option allows you to set the executable name at install time using the SED program. SED is a very useful ‘stream editor’. There are various resources on the internet to use it effectively. However, we should caution that using configure options will change the actual executable name of the installed program and on every re-install (an update for example), you have to also add this option to keep the old executable name updated. Also note that the documentation or configuration files do not change from their standard names either.

For example, let’s assume that typing ast on every invocation of every program is really annoying you! You can remove this prefix from all the executables at configure time by adding this option:

$ ./configure --program-transform-name='s/ast//'

3.3.1.4 Configure and build in RAM

Gnuastro’s configure and build process (the GNU build system) involves the creation, reading, and modification of a large number of files (input/output, or I/O). Therefore file I/O issues can directly affect the work of developers who need to configure and build Gnuastro numerous times. Some of these issues are listed below:

  • I/O will cause wear and tear on both the HDDs (mechanical failures) and SSDs (decreasing the lifetime).
  • Having the built files mixed with the source files can greatly affect backing up (synchronization) of the source files, since it involves the management of a large number of small files that are regularly changed. Backup software can of course be configured to ignore the built files and directories. However, since the built files are mixed with the source files and can have a large variety, this will require a high level of customization.

One solution to address both these problems is to use the tmpfs file system. Any file in tmpfs is actually stored in the RAM (and possibly SWAP), not on HDDs or SSDs. The RAM is built for extensive and fast I/O. Therefore the large number of file I/Os associated with configuring and building will not harm the HDDs or SSDs. Due to the volatile nature of RAM, files in the tmpfs file-system will be permanently lost after a power-off. Since all configured and built files are derivative files (not files that have been directly written by hand) there is no problem in this and this feature can be considered as an automatic cleanup.

The modern GNU C library (and thus the Linux kernel) defines the /dev/shm directory for this purpose in the RAM (POSIX shared memory). To build in it, you can use the GNU build system’s ability to build in a separate directory (not necessarily in the source directory) as shown below. Just set SRCDIR as the address of Gnuastro’s top source directory (for example, where there is the unpacked tarball).

$ SRCDIR=/home/username/gnuastro
$ mkdir /dev/shm/tmp-gnuastro-build
$ cd /dev/shm/tmp-gnuastro-build
$ $SRCDIR/configure --srcdir=$SRCDIR
$ make

Gnuastro comes with a script to simplify this process of configuring and building in a different directory (a “clean” build), for more see Separate build and source directories.


3.3.2 Separate build and source directories

The simple steps of Quick start will mix the source and built files. This can cause inconvenience for developers or enthusiasts following the most recent work (see Version controlled source). The current section is mainly focused on this latter group of Gnuastro users. If you just install Gnuastro on major releases (following Announcements), you can safely ignore this section.

When it is necessary to keep the source (which is under version control), but not the derivative (built) files (after checking or installing), the best solution is to keep the source and the built files in separate directories. One application of this is already discussed in Configure and build in RAM.

To facilitate this process of configuring and building in a separate directory, Gnuastro comes with the developer-build script. It is available in the top source directory and is not installed. It will make a directory under a given top-level directory (given to --top-build-dir) and build Gnuastro there. It thus keeps the source completely separated from the built files. For easy access to the built files, it also makes a symbolic link to the built directory, called build, in the top source directory.

When running the developer-build script without any options in the Gnuastro’s top source directory, default values will be used for its configuration. As with Gnuastro’s programs, you can inspect the default values with -P (or --printparams, the output just looks a little different here). The default top-level build directory is /dev/shm: the shared memory directory in RAM on GNU/Linux systems as described in Configure and build in RAM.

Besides these, it also has some features to facilitate the job of developers or bleeding edge users like the --debug option to do a fast build, with debug information, no optimization, and no shared libraries. Here is the full list of options you can feed to this script to configure its operations.

Not all of Gnuastro’s common program behavior is usable here: developer-build is just a non-installed script with a very limited scope as described above. It thus does not have all the common option behaviors or configuration files, for example.

White space between option and value: developer-build does not accept an = sign between the options and their values. It also needs at least one character between the option and its value. Therefore -j 4 or --jobs 4 are acceptable, while -j4, -j=4, or --jobs=4 are not. Finally, multiple short option names cannot be merged: for example, you can say -c -j 4, but unlike Gnuastro’s programs, -cj4 is not acceptable.

Reusable for other packages: This script can be used in any software which is configured and built using the GNU Build System. Just copy it in the top source directory of that software and run it from there.

Example usage: See Forking tutorial for an example usage of this script in some scenarios.
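
For instance, a typical invocation after pulling new commits could combine several of the options that are described below (a hypothetical example; note the white space between each short option and its value):

$ ./developer-build -a -c -d -j 8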

-b STR
--top-build-dir STR

The top build directory to make a directory for the build. If this option is not called, the top build directory is /dev/shm (only available in GNU/Linux operating systems, see Configure and build in RAM).

-V
--version

Print the version string of Gnuastro that will be used in the build. This string will be appended to the directory name containing the built files.

-a
--autoreconf

Run autoreconf -f before building the package. In Gnuastro, this is necessary when a new commit has been made to the project history. In Gnuastro’s build system, the Git description will be used as the version, see Version numbering and Synchronizing.

-c
--clean

Delete the contents of the build directory (clean it) before starting the configuration and building of this run.

This is useful when you have recently pulled changes from the main Git repository, or committed a change yourself and ran autoreconf -f, see Synchronizing. After running GNU Autoconf, the version will be updated and you need to do a clean build.

-d
--debug

Build with debugging flags (for example, to use in GNU Debugger, also known as GDB, or Valgrind), disable optimization and also the building of shared libraries. This is similar to running the configure script with the option below:

$ ./configure --enable-debug

Besides all the debugging advantages of building with this option, it will also significantly speed up the build (at the cost of slower built programs). So when you are testing something small or working on the build system itself, it will be much faster to test your work with this option.

-v
--valgrind

Build all make check tests within Valgrind. For more, see the description of --enable-check-with-valgrind in Gnuastro configure options.

-j INT
--jobs INT

The maximum number of threads/jobs for Make to build at any moment. As the name suggests (Make has an identical option), the number given to this option is directly passed on to any call of Make with its -j option.

-C
--check

After finishing the build, also run make check. By default, make check is not run because the developer usually has their own checks to work on (for example, defined in tests/during-dev.sh).

-i
--install

After finishing the build, also run make install.

-D
--dist

Run make dist-lzip pdf to build a distribution tarball (in .tar.lz format) and a PDF manual. This can be useful for archiving, or sending to colleagues who do not use Git for an easy build and manual.

-u STR
--upload STR

Activate the --dist (-D) option, then use secure copy (scp, part of the SSH tools) to copy the tarball and PDF to the src and pdf sub-directories of the specified server and its directory (value to this option). For example, --upload my-server:dir will copy the tarball into dir/src and the PDF manual into dir/pdf on the my-server server. It will then make a symbolic link in the top server directory to the tarball that is called gnuastro-latest.tar.lz.

-p STR
--publish STR

Clean, bootstrap, build, check and upload the checked tarball and PDF of the book to the URL given as STR. This option is just a wrapper for --autoreconf --clean --debug --check --upload STR. --debug is added because it will greatly speed up the build. --debug will have no effect on the produced tarball (people who later download will be building with the default optimized, and non-debug mode). This option is good when you have made a commit and are ready to publish it on your server (if nothing crashes). Recall that if any of the previous steps fail the script aborts.

-I
--install-archive

Short for --autoreconf --clean --check --install --dist. This is useful when you actually want to install the commit you just made (if the build and checks succeed). It will also produce a distribution tarball and PDF manual for easy access to the installed tarball on your system at a later time.

Ideally, Gnuastro’s Git version history makes it easy for a prepared system to revert back to a different point in history. But Gnuastro also needs to bootstrap files and also your collaborators might (usually do!) find it too much of a burden to do the bootstrapping themselves. So it is convenient to have a tarball and PDF manual of the version you have installed (and are using in your research) handily available.

-h
--help
-P
--printparams

Print a description of this script along with all the options and their current values.


3.3.3 Tests

After successfully building (compiling) the programs with the $ make command you can check the installation before installing. To run the tests, run

$ make check

For every program some tests are designed to check some possible operations. Running the command above will run those tests and give you a final report. If everything is OK and you have built all the programs, all the tests should pass. In case any of the tests fail, please have a look at Known issues and if that still does not fix your problem, look at the ./tests/test-suite.log file to see if the source of the error is something particular to your system or more general. If you feel it is general, please contact us because it might be a bug. Note that the tests of some programs depend on the outputs of other programs’ tests, so if you have not built them they might be skipped or fail. Prior to releasing every distribution all these tests are checked. If you have a reasonably modern terminal, the outputs of the successful tests will be colored green and the failed ones will be colored red.

These scripts can also act as a good set of examples for you to see how the programs are run. All the tests are in the tests/ directory. The tests for each program are shell scripts (ending with .sh) in a sub-directory of this directory with the same name as the program. See Test scripts for more detailed information about these scripts in case you want to inspect them.


3.3.4 A4 print book

The default print version of this book is provided in the letter paper size. If you would like to have the print version of this book on paper and you are living in a country which uses A4, then you can rebuild the book. The great thing about the GNU build system is that the book source code which is in Texinfo is also distributed with the program source code, enabling you to do such customization (hacking).

In order to change the paper size, you will need to have GNU Texinfo installed. Open doc/gnuastro.texi with any text editor. This is the source file that created this book. In the first few lines you will see this line:

@c@afourpaper

In Texinfo, a line is commented with @c. Therefore, un-comment this line by deleting the first two characters such that it changes to:

@afourpaper

Save the file and close it. You can now run the following command

$ make pdf

and the new PDF book will be available in SRCdir/doc/gnuastro.pdf. By changing the pdf in $ make pdf to ps or dvi you can have the book in those formats. Note that you can do this for any book that is in Texinfo format; it might not have the @afourpaper line, so you can add it close to the top of the Texinfo source file.
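
If you prefer not to open an editor, the same change can also be done from the command-line before building (a sketch using GNU Sed’s in-place editing; the file name is relative to the top source directory):

$ sed -i -e 's/^@c@afourpaper/@afourpaper/' doc/gnuastro.texi
$ make pdf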


3.3.5 Known issues

Depending on your operating system and the version of the compiler you are using, you might confront some known problems during the configuration ($ ./configure), compilation ($ make) and tests ($ make check). Here, their solutions are discussed.

  • $ ./configure: Configure complains about not finding a library even though you have installed it. The possible solution is based on how you installed the package:
    • From your distribution’s package manager. Most probably this is because your distribution has separated the header files of a library from the library parts. Please also install the ‘development’ packages for those libraries. Just add a -dev or -devel to the end of the package name and re-run the package manager. This will not happen if you install the libraries from source. When installed from source, the headers are also installed.
    • From source. Then your linker is not looking where you installed the library. If you followed the instructions in this chapter, all the libraries will be installed in /usr/local/lib. So you have to tell your linker to look in this directory. To do so, configure Gnuastro like this:
      $ ./configure LDFLAGS="-L/usr/local/lib"
      

      If you want to use the libraries for your other programming projects, then export this environment variable in a start-up script similar to the case for LD_LIBRARY_PATH explained below, also see Installation directory.

  • $ make: Complains about an unknown function on a non-GNU based operating system. In this case, please run $ ./configure with the --enable-gnulibcheck option to see if the problem is from the GNU Portability Library (Gnulib) not supporting your system or if there is a problem in Gnuastro, see Gnuastro configure options. If the problem is not in Gnulib and after all its tests you get the same complaint from make, then please contact us at bug-gnuastro@gnu.org. The cause is probably that a function that we have used is not supported by your operating system and we did not include it along with the source tarball. If the function is available in Gnulib, it can be fixed immediately.
  • $ make: Cannot find the headers (.h files) of installed libraries. Your C preprocessor (CPP) is not looking in the right place. To fix this, configure Gnuastro with an additional CPPFLAGS like below (assuming the library is installed in /usr/local/include):
    $ ./configure CPPFLAGS="-I/usr/local/include"
    

    If you want to use the libraries for your other programming projects, then export this environment variable in a start-up script similar to the case for LD_LIBRARY_PATH explained below, also see Installation directory.

  • $ make check: Only the first couple of tests pass, all the rest fail or get skipped. It is highly likely that when searching for shared libraries, your system does not look into the /usr/local/lib directory (or wherever you installed Gnuastro or its dependencies). To make sure it is added to the list of directories, add the following line to your ~/.bashrc file and restart your terminal. Do not forget to change /usr/local/lib if the libraries are installed in other (non-standard) directories.
    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"
    

    You can also add more directories by using a colon ‘:’ to separate them. See Installation directory and Linking to learn more on the PATH variables and dynamic linking respectively.

  • $ make check: The tests relying on external programs (for example, fitstopdf.sh) fail. This is probably because the version of the external programs is too old for the tests we have performed. Please update the program to a more recent version. For example, to create a PDF image, you will need GPL Ghostscript, but older versions do not work; we have successfully tested it on version 9.15. Older versions might cause a failure in the test result.
  • $ make pdf: The PDF book cannot be made. To make a PDF book, you need to have the GNU Texinfo program (like any program, the more recent the better). A working TeX program is also necessary, which you can get from TeX Live104.
  • After make check: do not copy the programs’ executables to another (for example, the installation) directory manually (using cp, or mv for example). In the default configuration105, the program binaries need to link with Gnuastro’s shared library which is also built and installed with the programs. Therefore, to run successfully before and after installation, linking modifications need to be made by GNU Libtool at installation time. make install does this internally, but a simple copy might give linking errors when you run it. If you need to copy the executables, you can do so after installation.
  • $ make (when bootstrapping): After you have bootstrapped Gnuastro from the version-controlled source, you may confront the following (or a similar) error when converting images (for more on bootstrapping, see Bootstrapping):
    convert: attempt to perform an operation not allowed by the
    security policy `gs'  error/delegate.c/ExternalDelegateCommand/378.
    

    This error is a known issue106 with ImageMagick security policies in some operating systems. In short, ImageMagick uses Ghostscript for PDF, EPS, PS and XPS parsing. However, because some security vulnerabilities have been found in Ghostscript107, by default, ImageMagick may be compiled without the Ghostscript library. In such cases, if allowed, ImageMagick will fall back to the external gs command instead of the library. But this may be disabled with the following (or similar) lines in /etc/ImageMagick-7/policy.xml (anything related to PDF, PS, or Ghostscript).

    <policy domain="delegate" rights="none" pattern="gs" />
    <policy domain="module" rights="none" pattern="{PS,PDF,XPS}" />
    

    To fix this problem, simply comment such lines (by placing a <!-- before each statement/line and --> at the end of that statement/line).
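
    For example, after commenting them, the two lines above would look like this:

    <!-- <policy domain="delegate" rights="none" pattern="gs" /> -->
    <!-- <policy domain="module" rights="none" pattern="{PS,PDF,XPS}" /> -->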

If your problem was not listed above, please file a bug report (Report a bug).


4 Common program behavior

All the programs in Gnuastro share a set of common behavior mainly to do with user interaction to facilitate their usage and development. This includes how to feed input datasets into the programs, how to configure them, specifying the outputs, numerical data types, treating columns of information in tables, etc. This chapter is devoted to describing this common behavior in all programs. Because the behaviors discussed here are common to several programs, they are not repeated in each program’s description.

In Command-line, a very general description of running the programs on the command-line is given, like the difference between arguments and options, as well as options that are common/shared between all programs. None of Gnuastro’s programs keep any internal configuration value (values for their different operational steps); they read their configuration primarily from the command-line, then from specific files for directory, user, or system-wide settings. Using these configuration files can greatly help reproducible and robust usage of Gnuastro, see Configuration files for more.

It is not possible to always have the different options and configurations of each program on the top of your head. It is very natural to forget the options of a program, their current default values, or how it should be run and what it did. Gnuastro’s programs have multiple ways to help you refresh your memory at multiple levels (just an option name, a short description, or fast access to the relevant section of the manual). See Getting help for more on benefiting from this very convenient feature.

Many of the programs use the multi-threaded character of modern CPUs, in Multi-threaded operations we will discuss how you can configure this behavior, along with some tips on making best use of them. In Numeric data types, we will review the various types to store numbers in your datasets: setting the proper type for the usage context108 can greatly improve the file size and also speed of reading, writing or processing them.

We will then look into the recognized table formats in Tables and how large datasets are broken into tiles, or mesh grid in Tessellation. Finally, we will take a look at the behavior regarding output files: Automatic output describes how the programs set a default name for their output when you do not give one explicitly (using --output). When the output is a FITS file, all the programs also store some very useful information in the header that is discussed in Output FITS files.


4.1 Command-line

Gnuastro’s programs are customized through the standard Unix-like command-line environment and GNU style command-line options. Both are very common in many Unix-like operating system programs. In Arguments and options we will start with the difference between arguments and options and elaborate on the GNU style of options. Afterwards, in Common options, we will go into the detailed list of all the options that are common to all the programs in Gnuastro.


4.1.1 Arguments and options

When you type a command on the command-line, it is passed onto the shell (a generic name for the program that manages the command-line) as a string of characters. As an example, see the “Invoking ProgramName” sections in this manual for some examples of commands with each program, like Invoking Table, Invoking Fits, or Invoking Statistics.

The shell then breaks up your string into separate tokens or words using any metacharacters (like white-space, tab, |, > or ;) that are in the string. On the command-line, the first thing you usually enter is the name of the program you want to run. After that, you can specify two types of tokens: arguments and options. In the GNU-style, arguments are those tokens that are not preceded by any hyphens (-, see Arguments). Here is one example:

$ astcrop --center=53.162551,-27.789676 -w10/3600 --mode=wcs udf.fits

In the example above, we are running Crop to crop a region of width 10 arc-seconds centered at the given RA and Dec from the input Hubble Ultra-Deep Field (UDF) FITS image. Here, the argument is udf.fits. Arguments are most commonly the input file names containing your data. Options start with one or two hyphens, followed by an identifier for the option (the option’s name, for example, --center, -w, --mode in the example above) and its value (anything after the option name, or the optional = character). Through options you can configure how the program runs (interprets the data you provided).

Arguments can be mandatory or optional and, unlike options, they do not have any identifiers. Hence, when there are multiple arguments, their order might also matter (for example, in cp which is used for copying one file to another location). The outputs of --usage and --help show which arguments are optional and which are mandatory, see --usage.

As their name suggests, options can be considered to be optional and most of the time, you do not have to worry about what order you specify them in. When the order does matter, or the option can be invoked multiple times, it is explicitly mentioned in the “Invoking ProgramName” section of each program (this is a very important aspect of an option).

Sometimes the value you want to give to an option or argument contains characters that the shell treats specially (metacharacters like those mentioned above). If there is only one such character, you can use a backslash (\) before it. If there are multiple, it might be easier to simply put your whole argument or option value inside of double quotes ("). In such cases, everything inside the double quotes will be seen as one token or word.

For example, let’s say you want to specify the header data unit (HDU) of your FITS file using a complex expression like ‘3; images(exposure > 100)’. If you simply add these after the --hdu (-h) option, the programs in Gnuastro will read the value to the HDU option as ‘3’ and run. Then, the shell will attempt to run a separate command ‘images(exposure > 100)’ and complain about a syntax error. This is because the semicolon (;) is an ‘end of command’ character in the shell. To solve this problem you can simply put double quotes around the whole string you want to pass to --hdu as seen below:

$ astcrop --hdu="3; images(exposure > 100)" image.fits

4.1.1.1 Arguments

In Gnuastro, arguments are almost exclusively used as the input data file names. Please consult the first few paragraph of the “Invoking ProgramName” section for each program for a description of what it expects as input, how many arguments, or input data, it accepts, or in what order. Everything particular about how a program treats arguments, is explained under the “Invoking ProgramName” section for that program.

Generally, if there is a standard file name suffix for a particular format, that filename extension is checked to identify its format. In astronomy (and thus Gnuastro), FITS is the preferred format for inputs and outputs, so the focus here and throughout this book is on FITS. However, other formats are also accepted in special cases, for example, ConvertType also accepts JPEG or TIFF inputs, and writes JPEG, EPS or PDF files. The recognized suffixes for these formats are listed there.

The list below shows the recognized suffixes for FITS data files in Gnuastro’s programs. However, in some scenarios FITS writers may not append a suffix to the file, or use a non-recognized suffix (not in the list below). Therefore if a FITS file is expected, but it does not have any of these suffixes, Gnuastro programs will look into the contents of the file and if it does conform with the FITS standard, the file will be used. Just note that checking about 5 characters at the end of a name string is much more efficient than opening and checking the contents of a file, so it is generally recommended to have a recognized FITS suffix.

  • .fits: The standard file name ending of a FITS image.
  • .fit: Alternative (3 character) FITS suffix.
  • .fits.Z: A FITS image compressed with compress.
  • .fits.gz: A FITS image compressed with GNU zip (gzip).
  • .fits.fz: A FITS image compressed with fpack.
  • .imh: IRAF format image file.

Throughout this book and in the command-line outputs, whenever we want to generalize all such astronomical data formats in a text place-holder, we will use ASTRdata and assume that the extension is also part of this name. Any file ending with these names is directly passed on to CFITSIO to read. Therefore you do not necessarily have to have these files on your computer; they can also be located on an FTP or HTTP server, see the CFITSIO manual for more information.

CFITSIO has its own error reporting techniques, if your input file(s) cannot be opened, or read, those errors will be printed prior to the final error by Gnuastro.


4.1.1.2 Options

Command-line options allow configuring the behavior of a program for each particular execution on a particular input dataset; this is common to all GNU/Linux programs. A single option can be called in two ways: long or short. All options in Gnuastro accept the long format, which has two hyphens and can have many characters (for example, --hdu). Short options only have one hyphen (-) followed by one character (for example, -h). You can see some examples in the list of options in Common options or in each program’s “Invoking ProgramName” section. Both formats are shown for those options which support both: first the short format is given, then the long.

Usually, the short options are handy when you are typing on the command-line and want to save keystrokes and time. The long options are good for shell scripts, where you are not usually in a rush, and they provide a level of documentation, since they are more descriptive and less cryptic. After a few months of not running a program, the short options are easily forgotten, and reading a script written with them will not be easy.

Some options need to be given a value if they are called and some do not. You can think of the latter type of options as on/off options. These two types of options can be distinguished using the output of the --help and --usage options, which are common to all GNU software, see Getting help. In Gnuastro we use the following strings to specify when the option needs a value and what format that value should be in. More specific tests will be done in the program and if the values are out of range (for example, negative when the program only wants a positive value), an error will be reported.

INT

The value is read as an integer.

FLT

The value is read as a floating point number. Depending on the context, two types are generally used: general values, and fractions (the latter must be less than or equal to unity).

STR

The value is read as a string of characters. For example, column names in a table, or HDU names in a multi-extension FITS file. Other examples include human-readable settings by some programs like the --domain option of the Convolve program that can be either spatial or frequency (to specify the type of convolution, see Convolve).

FITS or FITS/TXT

The value should be a file (most commonly FITS). In many cases, other formats may also be accepted (for example, input tables can be FITS or plain-text, see Recognized table formats).

To specify a value in the short format, simply put the value after the option. Note that since the short options are only one character long, you do not have to type anything between the option and its value. For the long option you either need white space or an = sign, for example, -h2, -h 2, --hdu 2 or --hdu=2 are all equivalent.
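
For instance, assuming a hypothetical input called image.fits, the four calls below are equivalent: each tells the Fits program to use the second HDU.

$ astfits -h2 image.fits
$ astfits -h 2 image.fits
$ astfits --hdu 2 image.fits
$ astfits --hdu=2 image.fits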

The short format of on/off options (those that do not need values) can be concatenated; for example, these two hypothetical sequences of options are equivalent: -a -b -c4 and -abc4. As an example, consider the following command to run Crop:

$ astcrop -Dr3 --wwidth 3 catalog.txt --deccol=4 ASTRdata

The $ is the shell prompt and astcrop is the program name. There are two arguments (catalog.txt and ASTRdata) and four options, two of them given in short format (-D, -r) and two in long format (--wwidth and --deccol). Three of them require a value and one (-D) is an on/off option.

If an abbreviation is unique among all the options of a program, the long option names can be abbreviated. For example, instead of typing --printparams, typing --print or maybe even --pri will be enough; if there are conflicts, the program will warn you and show you the alternatives. Finally, if you want the option parser to stop interpreting options beyond a certain point, you can use two dashes: --. Anything on the command-line after these two dashes will be treated as an argument, even if it starts with a hyphen.
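
For example, assuming a hypothetical file whose name starts with a hyphen (-myfile.fits), the two dashes let it be read as an argument instead of being mistaken for an option:

$ astfits -- -myfile.fits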

Gnuastro has two types of options with values; those that only take a single value are the most common type. If such an option is repeated or called more than once on the command-line, the value of the last call will be used. This is very useful when you are testing or experimenting. Let’s say you want to make a small modification to one option value: you can simply type the option with a new value at the end of the command and see how it changes the output. If you are satisfied with the change, you can remove the original option for human readability. If the change was not satisfactory, you can remove the one you just added and not worry about forgetting the original value. Without this capability, you would have to memorize or save the original value somewhere else, run the command and then change the value again, which is not at all convenient and can potentially cause lots of bugs.
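
For instance, in the hypothetical call below (astprogname stands for any Gnuastro program and the file names are made up), --output is a single-valued option, so only the last value (final.fits) will be used; the earlier first.fits is simply ignored:

$ astprogname input.fits --output=first.fits --output=final.fits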

On the other hand, some options can be called multiple times in one run of a program and can thus take multiple values (for example, see the --column option in Invoking Table). In these cases, the stored values keep the same order that you specified on the command-line.
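
For example, the call below (on a hypothetical table.fits that contains columns named RA and DEC) asks Table for two columns; they will be kept in the same order they were given on the command-line:

$ asttable table.fits --column=RA --column=DEC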

Gnuastro’s programs do not keep any internal default values, so some options are mandatory, and if they do not have a value, the program will complain and abort. Most programs have many such options, and typing them by hand on every call is impractical. To facilitate the user experience, after parsing the command-line, Gnuastro’s programs read special configuration files to get the necessary values for the options you have not given on the command-line. These configuration files are fully described in Configuration files.
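
A quick way to see which value each option has actually taken (whether from your command-line or from these configuration files) is the --printparams option mentioned above; here astprogname again stands for any Gnuastro program:

## Print the final value of every option for this program and exit.
$ astprogname --printparams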

CAUTION: When specifying a file name, if you want to use the shell’s tilde expansion (~) to refer to your home directory, leave at least one space between the option name and your value. For example, use -o ~/test, --output ~/test or --output= ~/test. Calling them with -o~/test or --output=~/test will disable shell expansion.

CAUTION: If you forget to specify a value for an option which requires one, and that option is the last one on the command-line, Gnuastro will warn you. But if it is in the middle of the command, the program will take the text of the next option or argument as its value, which can cause unexpected behavior.

NOTE: In some contexts Gnuastro’s counting starts from 0 and in others from 1. You can assume by default that counting starts from 1; if it starts from 0 for a particular option, that will be explicitly mentioned.


4.1.2 Common options

To facilitate the job of users and developers, all the programs in Gnuastro share a set of basic command-line options that are common to many of the programs. The full list is classified into Input/Output options, Processing options, and Operating mode options. In some programs, some of these options are irrelevant, but they are still recognized (you will not get an unrecognized option error; the value is simply not used). Unless otherwise mentioned, these options are identical between all programs.


4.1.2.1 Input/Output options

These options concern the inputs and outputs of the various programs.

--stdintimeout

Number of microseconds to wait for the first line of standard input to be written/typed on the command-line (see Standard input). This is only relevant for programs that also accept input from the standard input, when you want to manually write/type the contents on the terminal. When the standard input is already connected to a pipe (the output of another program), there will not be any waiting (hence no timeout, thus making this option redundant).

If the first line-break (for example, with the ENTER key) is not given before the timeout, the program will abort with an error saying that no input was given. Note that this time interval only applies to the first line that you type. Once the first line is given, the program will assume that more data will come and will accept the rest of your input without any time limit. You need to mark the end of the standard input yourself, for example, by pressing CTRL-D after a new line.
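
As a sketch of how this might be used (assuming a hypothetical scenario where you want to type a small table directly into the Table program, with a made-up output name), the call below waits up to 5 seconds (5000000 microseconds) for the first typed line:

## Type the rows of a plain-text table, then press CTRL-D on a new line.
$ asttable --stdintimeout=5000000 --output=typed.fits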

Note that any input you write/type into a program on the command-line with Standard input will be discarded (lost) once the program is finished. It is only recoverable manually from your terminal’s scroll-back (where you actually typed it), as long as the terminal is open. So only use this feature when you are sure that you do not need the dataset afterwards (or have a copy of it somewhere else).

-h STR/INT