GNU Astronomy Utilities
***********************

This book documents version 0.19 of the GNU Astronomy Utilities (Gnuastro). Gnuastro provides various programs and libraries for astronomical data manipulation and analysis.

Copyright © 2015-2022 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.

1 Introduction
**************

GNU Astronomy Utilities (Gnuastro) is an official GNU package consisting of separate programs and libraries for the manipulation and analysis of astronomical data.
All the programs share the same basic command-line user interface for the comfort of both the users and developers. Gnuastro is written to comply fully with the GNU coding standards, so it integrates smoothly with the GNU/Linux operating system. This also enables astronomers to expect a fully familiar experience in the source code, building, installing and command-line user interaction that they have seen in all the other GNU software that they use. The official and always up-to-date version of this book (or manual) is freely available under the *note GNU Free Doc. License:: in various formats (PDF, HTML, plain text, Info, and as its Texinfo source) on the Gnuastro webpage.

For users who are new to the GNU/Linux environment, unless otherwise specified, most of the topics in *note Installation:: and *note Common program behavior:: are common to all GNU software, for example, installation, managing command-line options or getting help (also see *note New to GNU/Linux?::). So if you are new to this empowering environment, we encourage you to go through these chapters carefully. They can be a starting point from which you can continue to learn more from each program’s own manual and fully benefit from and enjoy this wonderful environment. Gnuastro also comes with a large set of libraries, so you can write your own programs using Gnuastro’s building blocks; see *note Review of library fundamentals:: for an introduction.

In Gnuastro, no change to any program or library will be committed to its history before it has been fully documented here. As discussed in *note Science and its tools::, this is a founding principle of Gnuastro.

1.1 Quick start
===============

The latest official release tarball is always available as ‘gnuastro-latest.tar.lz’ (http://ftp.gnu.org/gnu/gnuastro/gnuastro-latest.tar.lz). The Lzip (http://www.nongnu.org/lzip/lzip.html) format is used for better compression (smaller output size, thus faster download) and robust archival features and standards. For historical reasons (those users that do not yet have Lzip), the Gzip’d tarball(1) is available at the same URL (just change the ‘.lz’ suffix above to ‘.gz’; however, the Lzip’d file is recommended). See *note Release tarball:: for more details on the tarball release.

Let’s assume the downloaded tarball is in the ‘TOPGNUASTRO’ directory. You can follow the commands below to download and un-compress the Gnuastro source. You need to have the ‘lzip’ program for the decompression (see *note Dependencies from package managers::). If your Tar implementation does not recognize Lzip (the third command fails), run the fourth command. Note that lines starting with ‘##’ do not need to be typed (they are only a description of the following command):

     ## Go into the download directory.
     $ cd TOPGNUASTRO

     ## If you do not already have the tarball, you can download it:
     $ wget http://ftp.gnu.org/gnu/gnuastro/gnuastro-latest.tar.lz

     ## If this fails, run the next command.
     $ tar -xf gnuastro-latest.tar.lz

     ## Only when the previous command fails.
     $ lzip -cd gnuastro-latest.tar.lz | tar -xf -

Gnuastro has three mandatory dependencies and some optional dependencies for extra functionality; see *note Dependencies:: for the full list. In *note Dependencies from package managers:: we have prepared the command to easily install Gnuastro’s dependencies using the package manager of some operating systems. When the mandatory dependencies are ready, you can configure, compile, check and install Gnuastro on your system with the following commands. See *note Known issues:: if you confront any complications.

     $ cd gnuastro-X.X                # Replace X.X with version number.
     $ ./configure
     $ make -j8                       # Replace 8 with no. CPU threads.
     $ make check -j8                 # Replace 8 with no. CPU threads.
     $ sudo make install
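If all of the above commands finished without an error, the Gnuastro programs are now installed and ready to use. As a quick sanity check (a minimal sketch; ‘astfits’ is just one example, any other installed Gnuastro program can be used in the same way), you can ask a program for its version with the standard GNU ‘--version’ option:

     ## Print the name and version of the installed program.
     $ astfits --version

If the command above is not found, the installation directory is probably not in your ‘PATH’; see *note Installation directory:: for how to fix this.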
For each program there is an “Invoking ProgramName” sub-section in this book which explains how the programs should be run on the command-line (for example, see *note Invoking asttable::). In *note Tutorials::, we have prepared some complete tutorials with common Gnuastro usage scenarios in astronomical research. They even contain links to download the necessary data, and thoroughly describe every step of the process (the science, statistics and optimal usage of the command-line). We therefore recommend reading (and running the commands in) the tutorials before starting to use Gnuastro.

   ---------- Footnotes ----------

   (1) The Gzip library and program are commonly available on most systems. However, Gnuastro recommends Lzip as described above, and the beta-releases are also only distributed in ‘tar.lz’.

1.2 Gnuastro manifesto: Science and its tools
=============================================

The history of science indicates that there are always inevitably unseen faults, hidden assumptions, simplifications and approximations in all our theoretical models, data acquisition and analysis techniques. It is precisely these that will ultimately allow future generations to advance the existing experimental and theoretical knowledge through their new solutions and corrections.

In the past, scientists would gather data and process them individually to achieve an analysis, thus having a much more intricate knowledge of the data and analysis. The theoretical models also required few (if any) simulations to compare with the data. Today both methods are becoming increasingly dependent on pre-written software. Scientists are dissociating themselves from the intricacies of reducing raw observational data in experimentation, or from bringing the theoretical models to life in simulations. These “intricacies” are precisely those unseen faults, hidden assumptions, simplifications and approximations that define scientific progress.

     Unfortunately, most persons who have recourse to a computer for statistical analysis of data are not much interested either in computer programming or in statistical method, being primarily concerned with their own proper business. Hence the common use of library programs and various statistical packages. ... It’s time that was changed.
     — _F.J. Anscombe. The American Statistician, Vol. 27, No. 1. 1973_

Anscombe’s quartet (http://en.wikipedia.org/wiki/Anscombe%27s_quartet) demonstrates how four data sets with widely different shapes (when plotted) give nearly identical output from standard regression techniques. Anscombe uses this (now famous) quartet, which was introduced in the paper quoted above, to argue that “_Good statistical analysis is not a purely routine matter, and generally calls for more than one pass through the computer_”. Echoing Anscombe’s concern after 44 years, some of the highly recognized statisticians of our time (Leek, McShane, Gelman, Colquhoun, Nuijten and Goodman) wrote in Nature that:

     We need to appreciate that data analysis is not purely computational and algorithmic – it is a human behaviour. ... Researchers who hunt hard enough will turn up a result that fits statistical criteria – but their discovery will probably be a false positive.
     — _Five ways to fix statistics, Nature, 551, Nov 2017._

Users of statistical (scientific) methods (software) are therefore not passive (objective) agents in their results. It is necessary to actually understand the method, not just use it as a black box. The subjective experience gained by frequently using a method/software is not sufficient to claim an understanding of how the tool/method works and how relevant it is to the data and analysis. This kind of subjective experience is prone to serious misunderstandings about the data, about what the software/statistical-method really does (especially as it gets more complicated), and thus about the scientific interpretation of the result. This attitude is further encouraged through non-free software(1), poorly written (or non-existent) scientific software manuals, and non-reproducible papers(2). This approach to scientific software and methods only helps in producing dogmas and an “_obscurantist faith in the expert’s special skill, and in his personal knowledge and authority_”(3).

     Program or be programmed. Choose the former, and you gain access to the control panel of civilization. Choose the latter, and it could be the last real choice you get to make.
     — _Douglas Rushkoff. Program or be programmed, O/R Books (2010)._

It is obviously impractical for any one human being to gain the intricate knowledge explained above for every step of an analysis. On the other hand, scientific data can be large and numerous, for example, images produced by telescopes in astronomy. This requires efficient algorithms. To make things worse, natural scientists have generally not been trained in the advanced software techniques, paradigms and architecture that are taught in computer science or engineering courses and thus used in most software. The GNU Astronomy Utilities are an effort to tackle this issue.

Gnuastro is not just software: this book is as important to the idea behind Gnuastro as the source code. This book has tried to learn from the success of the “Numerical Recipes” book in educating those who are not software engineers and computer scientists but still heavy users of computational algorithms, like astronomers. There are two major differences.

The first difference is that Gnuastro’s code and the background information are segregated: the code is kept within the actual Gnuastro source code, and the underlying explanations are given here in this book. In the source code, every non-trivial step is heavily commented and correlated with this book; it follows the same logic as this book, and all the programs follow a similar internal data, function and file structure, see *note Program source::. Complementing the code, this book focuses on thoroughly explaining the concepts behind those codes (history, mathematics, science, software and usage advice when necessary) along with detailed instructions on how to run the programs. At the expense of frustrating “professionals” or “experts”, this book and the comments in the code also intentionally avoid jargon and abbreviations. The source code and this book are thus intimately linked, and when considered as a single entity can be thought of as a real (an actual software accompanying the algorithms) “Numerical Recipes” for astronomy.

The second major, and arguably more important, difference is that “Numerical Recipes” does not allow you to distribute any code that you have learned from it.
In other words, it does not allow you to release your software’s source code if you have used their codes; you can only publicly release binaries (a black box) to the community. Therefore, while it empowers the privileged individual who has access to it, it exacerbates social ignorance. Exactly at the opposite end of the spectrum, Gnuastro’s source code is released under the GNU General Public License (GPL) and this book is released under the GNU Free Documentation License. You are therefore free to distribute any software you create using parts of Gnuastro’s source code or text, or figures from this book; see *note Your rights::.

With these principles in mind, Gnuastro’s developers aim to impose the minimum requirements on you (in computer science, engineering and even the mathematics behind the tools) to understand and modify any step of Gnuastro if you feel the need to do so; see *note Why C:: and *note Program design philosophy::.

Without prior familiarity and experience with optics, it is hard to imagine how Galileo could have come up with the idea of modifying the Dutch military telescope optics to use in astronomy. Astronomical objects could not be seen with the Dutch military design of the telescope. In other words, it is unlikely that Galileo could have asked a random optician to make modifications (not understood by Galileo) to the Dutch design, to do something no astronomer of the time took seriously. In the paradigm of the day, what could be the purpose of enlarging geometric spheres (planets) or points (stars)? In that paradigm only the position and movement of the heavenly bodies was important, and that had already been accurately studied (recently by Tycho Brahe).

At the beginning of his “The Sidereal Messenger” (published in 1610), he cautions the readers on this issue and, _before_ describing his results/observations, Galileo instructs us on how to build a suitable instrument. Without a detailed description of _how_ he made his tools and did his observations, no reasonable person would believe his results. Before he actually saw the moons of Jupiter, the mountains on the Moon or the crescent of Venus, Galileo was “evasive”(4) to Kepler. Science is defined by its tools/methods, _not_ its raw results(5).

The same is true today: science cannot progress with a black box, or poorly released code. The source code of a research project is the new (abstracted) communication language in science, understandable by humans _and_ computers. Source code (in any programming language) is a language/notation designed to express all the details that would be too tedious/long/frustrating to report in spoken languages like English, similar to mathematical notation.

     An article about computational science [almost all sciences today] ... is not the scholarship itself, it is merely advertising of the scholarship. The Actual Scholarship is the complete software development environment and the complete set of instructions which generated the figures.
     — _Buckheit & Donoho, Lecture Notes in Statistics, Vol 103, 1996_

Today, the quality of the source code that goes into a scientific result (and the distribution of that code) is as critical to scientific vitality and integrity as the quality of the written language/English used in publishing/distributing its paper. A scientific paper will not even be reviewed by any respectable journal if it is written in poor language/English.
A similar level of quality assessment is thus increasingly becoming necessary regarding the codes/methods used to derive the results of a scientific paper. For more on this, please see Akhlaghi et al. (2021) at arXiv:2006.03018 (https://arxiv.org/abs/2006.03018).

Bjarne Stroustrup (creator of the C++ language) says: “_Without understanding software, you are reduced to believing in magic_”. Ken Thompson (the designer of the Unix operating system) says “_I abhor a system designed for the ‘user’ if that word is a coded pejorative meaning ‘stupid and unsophisticated’_.” Certainly no scientist (user of scientific software) would want to be considered a believer in magic, or stupid and unsophisticated.

This can happen when scientists get too distant from the raw data and methods, and are mainly discussing results. In other words, when they feel they have tamed Nature into their own high-level (abstract) models (creations), and are mainly concerned with scaling up, or industrializing those results. Roughly five years before special relativity, and about two decades before quantum mechanics fundamentally changed Physics, Lord Kelvin is quoted as saying:

     There is nothing new to be discovered in physics now. All that remains is more and more precise measurement.
     — _William Thomson (Lord Kelvin), 1900_

A few years earlier, Albert A. Michelson made the following statement:

     The more important fundamental laws and facts of physical science have all been discovered, and these are now so firmly established that the possibility of their ever being supplanted in consequence of new discoveries is exceedingly remote.... Our future discoveries must be looked for in the sixth place of decimals.
     — _Albert A. Michelson, dedication of Ryerson Physics Lab, U. Chicago 1894_

If scientists are considered to be more than mere puzzle solvers(6) (simply adding to the decimals of existing values or observing a feature in 10, 100, or 100000 more galaxies or stars, as Kelvin and Michelson clearly believed), they cannot just passively sit back and uncritically repeat the previous (observational or theoretical) methods/tools on new data. Today there is a wealth of raw telescope images ready (mostly for free) at the fingertips of anyone who is interested and has a fast enough internet connection to download them. The only thing lacking is new ways to analyze this data and dig out the treasure that lies hidden in them, unseen by the existing methods and techniques.

     New data that we insist on analyzing in terms of old ideas (that is, old models which are not questioned) cannot lead us out of the old ideas. However many data we record and analyze, we may just keep repeating the same old errors, missing the same crucially important things that the experiment was competent to find.
     — _Jaynes, Probability theory, the logic of science. Cambridge U. Press (2003)._

   ---------- Footnotes ----------

   (1)

   (2) Where the authors omit many of the analysis/processing “details” from the paper by arguing that they would make the paper too long/unreadable. However, software engineers have been dealing with such issues for a long time. There are thus software management solutions that allow us to supplement papers with all the details necessary to exactly reproduce the result. For example, see Akhlaghi et al. (2021, arXiv:2006.03018 (https://arxiv.org/abs/2006.03018)).

   (3) Karl Popper. The logic of scientific discovery. 1959. The larger quote is given at the start of the PDF (for print) version of this book.

   (4) Galileo G. (Translated by Maurice A.
Finocchiaro). _The Essential Galileo_. Hackett Publishing Company, first edition, 2008.

   (5) For example, take the following two results on the age of the universe: roughly 14 billion years (suggested by the current consensus of the standard model of cosmology) and less than 10,000 years (suggested from some interpretations of the Bible). Both these numbers are _results_. What distinguishes these two results is the tools/methods that were used to derive them. Therefore, as the term “scientific method” also signifies, a scientific statement is defined by its _method_, not its result.

   (6) Thomas S. Kuhn. _The Structure of Scientific Revolutions_, University of Chicago Press, 1962.

1.3 Your rights
===============

The paragraphs below, in this section, belong to the GNU Texinfo(1) manual and are not written by us! The name “Texinfo” is just changed to “GNU Astronomy Utilities” or “Gnuastro” because they are released under the same licenses and it is beautifully written to inform you of your rights.

GNU Astronomy Utilities is “free software”; this means that everyone is free to use it and free to redistribute it on certain conditions. Gnuastro is not in the public domain; it is copyrighted and there are restrictions on its distribution, but these restrictions are designed to permit everything that a good cooperating citizen would want to do. What is not allowed is to try to prevent others from further sharing any version of Gnuastro that they might get from you. Specifically, we want to make sure that you have the right to give away copies of the programs that relate to Gnuastro, that you receive the source code or else can get it if you want it, that you can change these programs or use pieces of them in new free programs, and that you know you can do these things.

To make sure that everyone has such rights, we have to forbid you to deprive anyone else of these rights. For example, if you distribute copies of the Gnuastro related programs, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must tell them their rights. Also, for our own protection, we must make certain that everyone finds out that there is no warranty for the programs that relate to Gnuastro. If these programs are modified by someone else and passed on, we want their recipients to know that what they have is not what we distributed, so that any problems introduced by others will not reflect on our reputation.

The full text of the licenses for the Gnuastro software and book can be respectively found in *note GNU General Public License::(2) and *note GNU Free Doc. License::(3).

   ---------- Footnotes ----------

   (1) Texinfo is the GNU documentation system. It is used to create this book in all the various formats.

   (2) Also available in

   (3) Also available in

1.4 Logo of Gnuastro
====================

Gnuastro’s logo is an abstract image of a barred spiral galaxy (https://en.wikipedia.org/wiki/Barred_spiral_galaxy). The galaxy is vertically cut in half: on the left side, the beauty of a contiguous galaxy image is visible. But on the right, the image gets pixelated, and we only see the parts that are within the pixels. The pixels that are nearer to the center of the galaxy (which is brighter) are also larger. But as we follow the spiral arms (and get more distant from the center), the pixels get smaller (signifying less signal).
This sharp distinction between the contiguous and pixelated views of the galaxy signifies the main struggle in science: in the “real” world, objects are not pixelated or discrete and have no noise. However, when we observe nature, we are confined and constrained by the resolution of our data collection (a CCD imager in this case). On the other hand, we read English text from the left and progress towards the right. This defines the positioning of the “real” and observed halves of the galaxy: the noise-less and contiguous half (on the left) passes through our observing tools and becomes the pixelated and noisy half (on the right). It is the job of scientific software like Gnuastro to help interpret the underlying mechanisms of the “real” universe from the pixelated and noisy data.

Gnuastro’s logo was designed by Marjan Akbari. The concept behind it was created after several design iterations with Mohammad Akhlaghi.

1.5 Naming convention
=====================

Gnuastro is a package of independent programs and a collection of libraries; here we are mainly concerned with the programs. Each program has an official name which consists of one or two words, describing what it does. Two-word names are written with no space between the words, for example, NoiseChisel or Crop. On the command-line, you can run them with their executable names, which start with ‘ast’ and might be an abbreviation of the official name, for example, ‘astnoisechisel’ or ‘astcrop’; see *note Executable names::.

We will use “ProgramName” for a generic official program name and ‘astprogname’ for a generic executable name. In this book, the programs are classified based on what they do and thoroughly explained. An alphabetical list of the programs that are installed on your system with this installation is given in *note Gnuastro programs list::. That list also contains the executable names and version numbers along with a one-line description.

1.6 Version numbering
=====================

Gnuastro can have two formats of version numbers, for official and unofficial releases. Official Gnuastro releases are announced on the ‘info-gnuastro’ mailing list, they have a version-control tag in Gnuastro’s development history, and their version numbers are formatted like ‘A.B’. ‘A’ is a major version number, marking a significant planned achievement (for example, see *note GNU Astronomy Utilities 1.0::), while ‘B’ is a minor version number; see below for more on the distinction. Note that the numbers are not decimals, so version 2.34 is much more recent than version 2.5, which is not equal to 2.50.

Gnuastro also allows a unique version number for unofficial releases. Unofficial releases can mark any point in Gnuastro’s development history. This is done to allow astronomers to easily use any point in the version-controlled history for their data analysis and research publication. See *note Version controlled source:: for a complete introduction. This section is not just for developers; it is intended to be straightforward and easy to read, so please have a look if you are interested in the cutting edge.

The unofficial version number is a meaningful and easy-to-read string of characters, unique to that particular point of history. With this feature, users can easily stay up to date with the most recent bug fixes and additions that are committed between official releases.

The unofficial version number is formatted like ‘A.B.C-D’. ‘A’ and ‘B’ are the most recent official version number. ‘C’ is the number of commits that have been made after version ‘A.B’. ‘D’ is the first 4 or 5 characters of the commit hash number(1). Therefore, the unofficial version number ‘3.92.8-29c8’ corresponds to the 8th commit after the official version ‘3.92’, with a commit hash that begins with ‘29c8’. The unofficial version number is sort-able (unlike the raw hash) and, as shown above, is descriptive of the state of the unofficial release.
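This unofficial version number is generated automatically when you build Gnuastro from the version-controlled source (see *note Version controlled source::). Purely to illustrate where the two extra components come from (a minimal sketch using Git directly, not Gnuastro’s actual versioning script), the command below reports the same pieces of information from a version-controlled checkout:

     ## Print the most recent tag, the number of commits made after it
     ## (the 'C' component) and an abbreviated hash of the current
     ## commit (the 'D' component).
     $ git describe --tags --abbrev=5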
Of course an official release is preferred for publication (since its tarballs are easily available and it has gone through more tests, making it more stable), so if an official release is announced prior to your publication’s final review, please consider updating to the official release.

The major version number is set by a major goal which is defined by the developers and user community beforehand, for example, see *note GNU Astronomy Utilities 1.0::. The incremental work done in minor releases is commonly a series of small steps towards achieving the major goal. Therefore, there is no limit on the number of minor releases, and the difference between the (hypothetical) versions 2.927 and 3.0 can be a small (negligible to the user) improvement that finalizes the defined goals.

   ---------- Footnotes ----------

   (1) Each point in Gnuastro’s history is uniquely identified with a 40-character-long hash which is created from its contents and previous history, for example: ‘5b17501d8f29ba3cd610673261e6e2229c846d35’. So the string ‘D’ in the version for this commit could be ‘5b17’, or ‘5b175’.

1.6.1 GNU Astronomy Utilities 1.0
---------------------------------

Currently (prior to Gnuastro 1.0), the aim of Gnuastro is to have a complete system for data manipulation and analysis, at least similar to IRAF(1). So an astronomer can take all the standard data analysis steps (starting from raw data to the final reduced product and standard post-reduction tools) with the various programs in Gnuastro.

The maintainers of each camera or detector on a telescope can provide a completely transparent shell script or Makefile to the observer for data analysis. This script can set configuration files for all the required programs to work with that particular camera. The script can then run the proper programs in the proper sequence. The user/observer can then easily follow the standard shell script to understand (and modify) each step and the parameters used. Bash (or other modern GNU/Linux shells) is powerful and made for this gluing job. This will simultaneously improve performance and transparency. Shell scripts (or Makefiles) are also basic constructs that are easy to learn and readily available as part of the Unix-like operating systems. If there is no program to do a desired step, Gnuastro’s libraries can be used to build specific programs.

The main factor is that all observatories or projects can freely contribute to Gnuastro and all simultaneously benefit from it (since it does not belong to any particular one of them), much like how for-profit organizations (for example, RedHat, or Intel and many others) are major contributors to free and open source software for their shared benefit. Gnuastro’s copyright has been fully awarded to GNU, so it does not belong to any particular astronomer or astronomical facility or project.

   ---------- Footnotes ----------

   (1)

1.7 New to GNU/Linux?
=====================

Some astronomers initially install and use a GNU/Linux operating system because their necessary tools can only be installed in this environment. However, the transition is not necessarily easy.
To encourage you in investing the patience and time to make this transition, and actually enjoy it, we will first start with a basic introduction to GNU/Linux operating systems. Afterwards, in *note Command-line interface:: we will discuss the wonderful benefits of the command-line interface, how it beautifully complements the graphical user interface, and why it is worth the (apparently steep) learning curve. Finally, a complete chapter (*note Tutorials::) is devoted to real world scenarios of using Gnuastro (on the command-line). Therefore, if you do not yet feel comfortable with the command-line, we strongly recommend going through that chapter after finishing this section.

You might have already noticed that we are not using the name “Linux”, but “GNU/Linux”. Please take the time to have a look at the following essays and FAQs for a complete understanding of this very important distinction.

   •

   •

   • Recorded talk: (the first 20 minutes are about the history of Unix-like operating systems).

In short, the Linux kernel(1) is built using the GNU C library (glibc) and the GNU Compiler Collection (GCC). The Linux kernel software alone is just a means for other software to access the hardware resources; it is useless by itself! A normal astronomer (or scientist) will never interact with the kernel directly! For example, the command-line environment that you interact with is usually GNU Bash. It is GNU Bash that then talks to the kernel.

To better clarify, let’s use this analogy inspired by one of the links above(2): saying that you are “running Linux” is like saying you are “driving your engine”. The car’s engine is the main source of power in the car, no one doubts that. But you do not “drive” the engine, you drive the “car”. The engine alone is useless for transportation without the radiator, battery, transmission, wheels, chassis, seats, windshield, etc.

To have an operating system, you need lower-level tools (to build the kernel), and higher-level (to use it) software packages. For the Linux kernel, both the lower-level and higher-level tools are GNU. In other words, “the whole system is basically GNU with Linux loaded”. You can replace the Linux kernel and still have the GNU shell and higher-level utilities. For example, using the “Windows Subsystem for Linux”, you can use almost all GNU tools without the original Linux kernel, but using the host Windows operating system. Alternatively, you can build a fully functional GNU-based working environment on a macOS or BSD-based operating system (using the host’s kernel and C compiler), for example, through projects like Maneage (see Akhlaghi et al. 2021 (https://arxiv.org/abs/2006.03018), and its Appendix C with all the GNU software tools, which is exactly reproducible on macOS also). Therefore, to acknowledge GNU’s instrumental role in the creation and usage of the Linux kernel and the operating systems that use it, we should call these operating systems “GNU/Linux”.

   ---------- Footnotes ----------

   (1) In Unix-like operating systems, the kernel connects the software and hardware worlds.

   (2) https://www.gnu.org/gnu/gnu-users-never-heard-of-gnu.html

1.7.1 Command-line interface
----------------------------

One aspect of Gnuastro that might be a little troubling to new GNU/Linux users is that (at least for the time being) it only has a command-line user interface (CLI). This might be contrary to the mostly graphical user interface (GUI) experience with proprietary operating systems.
Since the various actions available are not always on the screen, the command-line interface can be complicated, intimidating, and frustrating for a first-time user. This is understandable and also experienced by anyone who started using the computer (from childhood) in a graphical user interface (this includes most of Gnuastro’s authors). Here we hope to convince you of the unique benefits of this interface, which can greatly enhance your productivity while complementing your GUI experience.

Through GNOME 3(1), most GNU/Linux based operating systems now have an advanced and useful GUI. Since the GUI was created long after the command-line, some wrongly consider the command-line to be obsolete. Both interfaces are useful for different tasks. For example, you cannot view an image, video, PDF document or web page on the command-line. On the other hand, you cannot reproduce your results easily in the GUI. Therefore they should not be regarded as rivals, but as complementary user interfaces; here we will outline how the CLI can be useful in scientific programs.

You can think of the GUI as a veneer over the CLI to facilitate a small subset of all the possible CLI operations. Each click you do on the GUI can be thought of as internally running a different CLI command. So asymptotically (if a good designer can design a GUI which is able to show you all the possibilities to click on) the GUI is only as powerful as the command-line. In practice, such graphical designers are very hard to find for every program, so the GUI operations are always a subset of the internal CLI commands. For programs that are only made for the GUI, this results in not including lots of potentially useful operations. It also makes “interface design” a crucially important part of any GUI program. Scientists do not usually have enough resources to hire a graphical designer; also, the complexity of GUI code is far greater than that of CLI code, which is harmful for scientific software, see *note Science and its tools::.

For programs that have a GUI, one action on the GUI (moving and clicking a mouse, or tapping a touchscreen) might be more efficient and easier than its CLI counterpart (typing the program name and your desired configuration). However, if you have to repeat that same action more than once, the GUI will soon become frustrating and prone to errors. Unless the designers of a particular program decided to design such a system for a particular GUI action, there is no general way to run any possible series of actions automatically on the GUI. On the command-line, you can run any series of actions (which can come from various CLI-capable programs you have chosen yourself), in any possible permutation, with one command(2). This allows for much more creativity and exact reproducibility that is not possible for a GUI user. For technical and scientific operations, where the same operation (using various programs) has to be done on a large set of data files, this is crucially important. It also allows exact reproducibility, which is a founding principle of scientific results.

The most common CLI (which is also known as a shell) in GNU/Linux is GNU Bash; we strongly encourage you to put aside several hours and go through this beautifully explained web page: . You do not need to read or even fully understand the whole thing; a general knowledge of the first few chapters is enough to get you going.
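As a small taste of the automation described above (a minimal sketch that only relies on the Fits program’s behavior of printing a file’s HDU information when it is called with no options), a single shell loop can apply the same command to every FITS file in a directory:

     ## Print the HDU information of every FITS file in this directory.
     $ for f in *.fits; do astfits "$f"; done

Once such a loop is placed in a small shell script, the exact same series of operations can be repeated (or reproduced by others) with one command; see the tutorials in *note Tutorials:: for complete examples.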
Since the operations in the GUI are limited and they are visible, reading a manual is not that important in the GUI (most programs do not even have any!). However, to give you the creative power explained above, with a CLI program it is best if you first read the manual of any program you are using. You do not need to memorize any details; only an understanding of the generalities is needed. Once you start working, there are easier ways to remember a particular option or operation detail, see *note Getting help::.

To experience the command-line in its full glory and not in the GUI terminal emulator, press <CTRL>, <ALT> and one of the function keys together(3) to access the virtual console. To return to your GUI, press the same <CTRL> and <ALT> keys with the function key that your GNU/Linux distribution uses for the graphical session. In the virtual console, the GUI, with all its distracting colors and information, is gone, enabling you to focus entirely on your actual work.

For operations that use a lot of your system’s resources (processing a large number of large astronomical images, for example), the virtual console is the place to run them. This is because the GUI is not competing with your research work for your system’s RAM and CPU. Since the virtual consoles are completely independent, you can even log out of your GUI environment to give even more of your hardware resources to the programs you are running and thus reduce the operating time.

Since it uses far fewer system resources, the CLI is also convenient for remote access to your computer. Using secure shell (SSH) you can log in securely to your system (similar to the virtual console) from anywhere, even if the connection speeds are low. There are apps for smartphones and tablets which allow you to do this.

   ---------- Footnotes ----------

   (1)

   (2) By writing a shell script and running it, for example, see the tutorials in *note Tutorials::.

   (3) You can use any of the function keys (each gives a different virtual console), depending on your GNU/Linux distribution; try them all out. You can also run a separate GUI from within this console if you want to.

1.8 Report a bug
================

According to Wikipedia, “a software bug is an error, flaw, failure, or fault in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways”. So when you see that a program is crashing, not reading your input correctly, giving the wrong results, or not writing your output correctly, you have found a bug. In such cases, it is best if you report the bug to the developers. The programs will also inform you if known impossible situations occur (which are caused by something unexpected) and will ask you to report the issue.

Prior to actually filing a bug report, it is best to search previous reports. The issue might have already been found and even solved. The best place to check if your bug has already been discussed is the bug tracker on the *note Gnuastro project webpage::. In the top search fields (under “Display Criteria”), set the “Open/Closed” drop-down menu to “Any”, choose the respective program or general category of the bug in “Category”, and click the “Apply” button. The results colored green have already been solved, and the status of those colored in red is shown in the table. Recently corrected bugs are probably not yet publicly released because they are scheduled for the next Gnuastro stable release.

If the bug is solved but not yet released and it is an urgent issue for you, you can get the version controlled source and compile that, see *note Version controlled source::.

To solve the issue as readily as possible, please follow the guidelines below in your bug report. The How to Report Bugs Effectively (http://www.chiark.greenend.org.uk/~sgtatham/bugs.html) and How To Ask Questions The Smart Way (http://catb.org/~esr/faqs/smart-questions.html) essays also provide some good generic advice for all software (do not contact their authors for Gnuastro’s problems). Mastering the art of giving good bug reports (like asking good questions) can greatly enhance your experience with any free and open source software. So investing the time to read through these essays will greatly reduce your frustration after you see something does not work the way you feel it is supposed to, for a large range of software, not just Gnuastro.

*Be descriptive*
     Please provide as many details as possible and be very descriptive. Explain what you expected and what the output was: it might be that your expectation was wrong. Also, please clearly state which sections of the Gnuastro book (this book), or other references, you have studied to understand the problem. This can be useful in correcting the book (adding links to likely places where users will check). But more importantly, it will be encouraging for the developers, since you are showing how serious you are about the problem and that you have actually put some thought into it. “To be able to ask a question clearly is two-thirds of the way to getting it answered.” – John Ruskin (1819-1900).

*Individual and independent bug reports*
     If you have found multiple bugs, please send them as separate (and independent) bugs (as much as possible). This will significantly help us in managing and resolving them sooner.

*Reproducible bug reports*
     If we cannot exactly reproduce your bug, then it is very hard to resolve it. So please send us a Minimal working example(1) along with the description. For example, in running a program, please send us the full command-line text and the output with the ‘-P’ option, see *note Operating mode options::. If it is caused only for a certain input, also send us that input file. In case the input FITS file is large, please use Crop to crop the problematic section and make it as small as possible, so it can easily be uploaded and downloaded and not waste the archive’s storage, see *note Crop::.
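For instance, the pair of commands below sketch what such a reproducible report could contain (the input file name and the ‘--tilesize’ value are only hypothetical illustrations; the common ‘-P’ option is the point here): the first is the problematic command itself, and the second is the same command with ‘-P’ added, so the full set of option values that the program reads (from the command-line and all configuration files) is saved in a file that can be attached to the report.

     ## The problematic command (file name and option value are hypothetical).
     $ astnoisechisel image.fits --tilesize=75,75

     ## The same command with '-P': print the final values of all options
     ## that the program would use and save them for the bug report.
     $ astnoisechisel image.fits --tilesize=75,75 -P > report-config.txt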
There are generally two ways to inform us of bugs:

   • Send a mail to ‘bug-gnuastro@gnu.org’. Any mail you send to this address will be distributed through the bug-gnuastro mailing list(2). This is the simplest way to send us bug reports. The developers will then register the bug into the project web page (next choice) for you.

   • Use the Gnuastro project web page: there are two ways to get to the submission page, as listed below. Fill in the form as described below and submit it (see *note Gnuastro project webpage:: for more on the project web page).

        • Using the top horizontal menu items, immediately under the top page title, hovering your mouse on “Support” will open a drop-down list. Select “Submit new”.

        • In the main body of the page, under the “Communication tools” section, click on “Submit new item”.

Once the items have been registered in the mailing list or web page, the developers will add it to either the “Bug Tracker” or “Task Manager” trackers of the Gnuastro project web page.
These two trackers can only be edited by the Gnuastro project developers, but they can be browsed by anyone, so you can follow the progress on your bug. You are most welcome to join us in developing Gnuastro, and fixing the bug you have found may be a good starting point. Gnuastro is designed to be easy for anyone to develop (see *note Science and its tools::) and there is a full chapter devoted to developing it: *note Developing::.

   ---------- Footnotes ----------

   (1)

   (2)

1.9 Suggest new feature
=======================

We would always be happy to hear of suggested new features. For every program, there are already lists of features that we are planning to add. You can see the current list of plans from the Gnuastro project web page by following “Tasks”→“Browse” on the horizontal menu at the top of the page, immediately under the title; see *note Gnuastro project webpage::. If you want to request a feature for an existing program, click on the “Display Criteria” above the list and, under “Category”, choose that particular program. Under “Category” you can also see the existing suggestions for new programs or other cases like installation, documentation or libraries. Also, be sure to set the “Open/Closed” value to “Any”.

If the feature you want to suggest is not already listed in the task manager, then follow the steps that are fully described in *note Report a bug::. Please bear in mind that the developers are all busy with their own astronomical research, and with implementing the existing “task”s and resolving bugs. Gnuastro is a volunteer effort and none of the developers are paid for their hard work. So, although we will try our best, please do not expect your suggested feature to be immediately included (in the next release of Gnuastro).

The best person to apply the exciting new feature you have in mind is you, since you have the motivation and need. In fact, Gnuastro is designed to make it as easy as possible for you to hack into it (add new features, change existing ones and so on); see *note Science and its tools::. Please have a look at the chapter devoted to developing (*note Developing::) and start applying your desired feature. Once you have added it, you can use it for your own work and, if you feel you want others to benefit from your work, you can request for it to become part of Gnuastro. You can then join the developers and start maintaining your own part of Gnuastro. If you choose to take this path of action, please contact us beforehand (*note Report a bug::) so we can avoid possible duplicate activities and get interested people in contact.

*Gnuastro is a collection of low-level programs:* As described in *note Program design philosophy::, a founding principle of Gnuastro is that each library or program should be basic and low-level. High-level jobs should be done by running the separate programs or separate library functions in succession, through a shell script or by calling the libraries from higher-level functions; see the examples in *note Tutorials::. So when making suggestions, please consider how your desired job can best be broken into separate steps and modularized.

1.10 Announcements
==================

Gnuastro has a dedicated mailing list for making announcements (‘info-gnuastro’). Anyone can subscribe to this mailing list. Anytime there is a new stable or test release, an email will be circulated there. The email contains a summary of the overall changes along with a detailed list (from the ‘NEWS’ file).
This mailing list is thus the best way to stay up to date with new releases, easily learn about the updated/new features, or dependencies (see *note Dependencies::). To subscribe to this list, please visit . Traffic (number of mails per unit time) in this list is designed to be low: only a handful of mails per year. Previous announcements are available on its archive (http://lists.gnu.org/archive/html/info-gnuastro/). 1.11 Conventions ================ In this book we have the following conventions: • All commands that are to be run on the shell (command-line) prompt as the user start with a ‘$’. In case they must be run as a super-user or system administrator, they will start with a single ‘#’. If the command is in a separate line and next line ‘is also in the code type face’, but does not have any of the ‘$’ or ‘#’ signs, then it is the output of the command after it is run. As a user, you do not need to type those lines. A line that starts with ‘##’ is just a comment for explaining the command to a human reader and must not be typed. • If the command becomes larger than the page width a <\> is inserted in the code. If you are typing the code by hand on the command-line, you do not need to use multiple lines or add the extra space characters, so you can omit them. If you want to copy and paste these examples (highly discouraged!) then the <\> should stay. The <\> character is a shell escape character which is used commonly to make characters which have special meaning for the shell, lose that special meaning (the shell will not treat them especially if there is a <\> behind them). When <\> is the last visible character in a line (the next character is a new-line character) the new-line character loses its meaning. Therefore, the shell sees it as a simple white-space character not the end of a command! This enables you to use multiple lines to write your commands. This is not a convention, but a bi-product of the PDF building process of the manual: In the PDF version of this manual, a single quote (or apostrophe) character in the commands or codes is shown like this: ‘'’. Single quotes are sometimes necessary in combination with commands like ‘awk’ or ‘sed’, or when using Column arithmetic in Gnuastro’s own Table (see *note Column arithmetic::). Therefore when typing (recommended) or copy-pasting (not recommended) the commands that have a ‘'’, please correct it to the single-quote (or apostrophe) character, otherwise the command will fail. 1.12 Acknowledgments ==================== Gnuastro would not have been possible without scholarships and grants from several funding institutions. We thus ask that if you used Gnuastro in any of your papers/reports, please add the proper citation and acknowledge the funding agencies/projects. For details of which papers to cite (may be different for different programs) and get the acknowledgment statement to include in your paper, please run the relevant programs with the common ‘--cite’ option like the example commands below (for more on ‘--cite’, please see *note Operating mode options::). $ astnoisechisel --cite $ astmkcatalog --cite Here, we will acknowledge all the institutions (and their grants) along with the people who helped make Gnuastro possible. The full list of Gnuastro authors is available at the start of this book and the ‘AUTHORS’ file in the source code (both are generated automatically from the version controlled history). 
The plain text file ‘THANKS’, which is also distributed along with the source code, contains the list of people and institutions who played an indirect role in Gnuastro (not committed any code in the Gnuastro version controlled history). The Japanese Ministry of Education, Culture, Sports, Science, and Technology (MEXT) scholarship for Mohammad Akhlaghi’s Masters and PhD degree in Tohoku University Astronomical Institute had an instrumental role in the long term learning and planning that made the idea of Gnuastro possible. The very critical view points of Professor Takashi Ichikawa (Mohammad’s adviser) were also instrumental in the initial ideas and creation of Gnuastro. Afterwards, the European Research Council (ERC) advanced grant 339659-MUSICOS (Principal investigator: Roland Bacon) was vital in the growth and expansion of Gnuastro. Working with Roland at the Centre de Recherche Astrophysique de Lyon (CRAL), enabled a thorough re-write of the core functionality of all libraries and programs, turning Gnuastro into the large collection of generic programs and libraries it is today. At the Instituto de Astrofisica de Canarias (IAC, and in particular in collaboration with Johan Knapen and Ignacio Trujillo), Gnuastro matured and its user base significantly grew. Work on improving Gnuastro is now continuing primarily in the Centro de Estudios de Física del Cosmos de Aragón (CEFCA), located in Teruel, Spain. In general, we would like to gratefully thank the following people for their useful and constructive comments and suggestions (in alphabetical order by family name): Valentina Abril-melgarejo, Marjan Akbari, Carlos Allende Prieto, Hamed Altafi, Roland Bacon, Roberto Baena Gallé, Zahra Bagheri, Karl Berry, Faezeh Bidjarchian, Leindert Boogaard, Nicolas Bouché, Stefan Brüns, Fernando Buitrago, Adrian Bunk, Rosa Calvi, Mark Calabretta Nushkia Chamba, Sergio Chueca Urzay, Tamara Civera Lorenzo, Benjamin Clement, Nima Dehdilani, Andrés Del Pino Molina, Antonio Diaz Diaz, Alexey Dokuchaev, Pierre-Alain Duc, Alessandro Ederoclite, Elham Eftekhari, Paul Eggert, Sepideh Eskandarlou, Sílvia Farras, Juan Antonio Fernández Ontiveros, Gaspar Galaz, Andrés García-Serra Romero, Zohre Ghaffari, Thérèse Godefroy, Giulia Golini, Craig Gordon, Martin Guerrero Roncel, Madusha Gunawardhana, Bruno Haible, Stephen Hamer, Zahra Hosseini, Leslie Hunt, Takashi Ichikawa, Raúl Infante Sainz, Brandon Invergo, Oryna Ivashtenko, Aurélien Jarno, Lee Kelvin, Brandon Kelly, Mohammad-Reza Khellat, Johan Knapen, Geoffry Krouchi, Martin Kuemmel, Teet Kuutma, Clotilde Laigle, Floriane Leclercq, Alan Lefor, Javier Licandro, Jeremy Lim, Alejandro Lumbreras Calle, Sebastián Luna Valero, Alberto Madrigal, Guillaume Mahler, Juan Miro, Alireza Molaeinezhad, Javier Moldon, Juan Molina Tobar, Francesco Montanari, Raphael Morales, Carlos Morales Socorro, Sylvain Mottet, Dmitrii Oparin, François Ochsenbein, Bertrand Pain, William Pence, Irene Pintos Castro, Mamta Pommier, Marcel Popescu, Bob Proulx, Joseph Putko, Samane Raji, Ignacio Ruiz Cejudo, Teymoor Saifollahi, Joanna Sakowska, Elham Saremi, Nafise Sedighi, Markus Schaney, Yahya Sefidbakht, Alejandro Serrano Borlaff, Zahra Sharbaf, David Shupe, Leigh Smith, Jenny Sorce, Manuel Sánchez-Benavente, Lee Spitler, Richard Stallman, Michael Stein, Ole Streicher, Alfred M. Szmidt, Michel Tallon, Juan C. Tello, Vincenzo Testa, Éric Thiébaut, Ignacio Trujillo, Peter Teuben, David Valls-Gabaud, Jesús Varela, Aaron Watkins, Richard Wilbur, Michael H.F. 
Wilkinson, Christopher Willmer, Xiuqin Wu, Sara Yousefi Taemeh, Johannes Zabl. The GNU French Translation Team is also managing the French version of the top Gnuastro web page which we highly appreciate. Finally, we should thank all the (sometimes anonymous) people in various online forums who patiently answered all our small (but important) technical questions. All work on Gnuastro has been voluntary, but the authors are most grateful to the following institutions (in chronological order) for hosting/supporting us in our research. Where necessary, these institutions have disclaimed any ownership of the parts of Gnuastro that were developed there, thus insuring the freedom of Gnuastro for the future (see *note Copyright assignment::). We highly appreciate their support for free software, and thus free science, and therefore a free society. Tohoku University Astronomical Institute, Sendai, Japan. University of Salento, Lecce, Italy. Centre de Recherche Astrophysique de Lyon (CRAL), Lyon, France. Instituto de Astrofisica de Canarias (IAC), Tenerife, Spain. Centro de Estudios de Física del Cosmos de Aragón (CEFCA), Teruel, Spain. Google Summer of Code 2020, 2021 and 2022 2 Tutorials *********** To help new users have a smooth and easy start with Gnuastro, in this chapter several thoroughly elaborated tutorials, or cookbooks, are provided. These tutorials demonstrate the capabilities of different Gnuastro programs and libraries, along with tips and guidelines for the best practices of using them in various realistic situations. We strongly recommend going through these tutorials to get a good feeling of how the programs are related (built in a modular design to be used together in a pipeline), very similar to the core Unix-based programs that they were modeled on. Therefore these tutorials will help in optimally using Gnuastro’s programs (and generally, the Unix-like command-line environment) effectively for your research. The first three tutorials (*note General program usage tutorial:: and *note Detecting large extended targets:: and *note Building the extended PSF::) use real input datasets from some of the deep Hubble Space Telescope (HST) images, the Sloan Digital Sky Survey (SDSS) and the Javalambre Photometric Local Universe Survey (J-PLUS) respectively. Their aim is to demonstrate some real-world problems that many astronomers often face and how they can be solved with Gnuastro’s programs. The fourth tutorial (*note Sufi simulates a detection::) focuses on simulating astronomical images, which is another critical aspect of any analysis! The ultimate aim of *note General program usage tutorial:: is to detect galaxies in a deep HST image, measure their positions and brightness and select those with the strongest colors. In the process, it takes many detours to introduce you to the useful capabilities of many of the programs. So please be patient in reading it. If you do not have much time and can only try one of the tutorials, we recommend this one. *note Detecting large extended targets:: deals with a major problem in astronomy: effectively detecting the faint outer wings of bright (and large) nearby galaxies to extremely low surface brightness levels (roughly one quarter of the local noise level in the example discussed). Besides the interesting scientific questions in these low-surface brightness features, failure to properly detect them will bias the measurements of the background objects and the survey’s noise estimates. This is an important issue, especially in wide surveys. 
This is because bright/large galaxies and stars(1) cover a significant fraction of the survey area.

*note Building the extended PSF:: tackles an important problem in astronomy: how to extract the PSF of an image, to the largest possible extent, without assuming any functional form. In Gnuastro we have multiple installed scripts for this job. Their usage, and the logic behind tuning them best for each particular step, are fully described in this tutorial, on a real dataset. The tutorial concludes with subtracting that extended PSF from the science image, thus giving you a cleaner image (with no scattered light of the brighter stars) for your higher-level analysis.

*note Sufi simulates a detection:: has a fictional(2) setting! It shows how Abd al-rahman Sufi (903 – 986 A.D.; the first recorded description of “nebulous” objects in the heavens is attributed to him) could have used some of Gnuastro’s programs for a realistic simulation of his observations, to see if his detection of nebulous objects was trustworthy. Because all conditions are under control in a simulated/mock environment/dataset, mock datasets can be a valuable tool to inspect the limitations of your data analysis and processing. But they need to be as realistic as possible, so this tutorial is dedicated to this important step of an analysis (simulations).

In these tutorials, we have intentionally avoided too many cross references to make them easier to read. For more information about a particular program, you can visit the section with the same name as the program in this book. Each program section in the subsequent chapters starts by explaining the general concepts behind what it does, for example, see *note Convolve::. If you only want practical information on running a program, for example, its options/configuration, input(s) and output(s), please consult the subsection titled “Invoking ProgramName”, for example, see *note Invoking astnoisechisel::. For an explanation of the conventions we use in the example codes throughout the book, please see *note Conventions::.

---------- Footnotes ----------

(1) Stars also have similarly large and extended wings due to the point spread function, see *note PSF::.

(2) This fictional tutorial is not intended to be a historical reference (the historical facts mentioned in it used Wikipedia as a reference). This form of presenting a tutorial was influenced by the PGF/TikZ and Beamer manuals. They are both packages in TeX and LaTeX; the first is a high-level vector graphic programming environment, while with the second you can make presentation slides. On a similar topic, there are also some nice words of wisdom for Unix-like systems called Rootless Root (http://catb.org/esr/writings/unix-koans). These also have a similar style, but they use a mythical figure named Master Foo. If you already have some experience in Unix-like systems, you will definitely find these Unix Koans entertaining/educative.

2.1 General program usage tutorial
==================================

Measuring colors of astronomical objects in broad-band or narrow-band images is one of the most basic and common steps in astronomical analysis. Here, we will use Gnuastro’s programs to get a physical scale (area at certain redshifts) of the field we are studying, detect objects in a Hubble Space Telescope (HST) image, measure their colors, identify the ones with the strongest colors, do a visual inspection of these objects and inspect their spatial position in the image.
After this tutorial, you can also try the *note Detecting large extended targets:: tutorial which goes into a little more detail on detecting very low surface brightness signal.

During the tutorial, we will take many detours to explain, and practically demonstrate, the many capabilities of Gnuastro’s programs. In the end you will see that the things you learned during this tutorial are much more generic than this particular problem and can be used in solving a wide variety of problems involving the analysis of data (images or tables). So please do not rush, and go through the steps patiently to optimally master Gnuastro.

In this tutorial, we will use the HST eXtreme Deep Field (https://archive.stsci.edu/prepds/xdf) dataset. Like almost all astronomical surveys, this dataset is free for download and usable by the public. You will need the following tools in this tutorial: Gnuastro, SAO DS9 (1), GNU Wget(2), and AWK (the most common implementation is GNU AWK(3)).

This tutorial was first prepared for the “Exploring the Ultra-Low Surface Brightness Universe” workshop (November 2017) at the ISSI in Bern, Switzerland. It was further extended in the “4th Indo-French Astronomy School” (July 2018) organized by LIO, CRAL CNRS UMR5574, UCBL, and IUCAA in Lyon, France. We are very grateful to the organizers of these workshops and the attendees for the very fruitful discussions and suggestions that made this tutorial possible.

*Write the example commands manually:* Try to type the example commands on your terminal manually and use the history feature of your command-line (by pressing the “up” button to retrieve previous commands). Do Not simply copy and paste the commands shown here. This will help simulate future situations when you are processing your own datasets.

---------- Footnotes ----------

(1) See *note SAO DS9::, available at
(2)
(3)

2.1.1 Calling Gnuastro’s programs
---------------------------------

A handy feature of Gnuastro is that all program names start with ‘ast’. This will allow your command-line processor to easily list and auto-complete Gnuastro’s programs for you. Try typing the following command and then pressing the <TAB> key (twice, if necessary) to see the list:

     $ ast

Any program that starts with ‘ast’ (including all Gnuastro programs) will be shown. By choosing the subsequent characters of your desired program and pressing <TAB> again, the list will narrow down and the program name will auto-complete once your input characters are unambiguous. In short, you often do not need to type the full name of the program you want to run.

2.1.2 Accessing documentation
-----------------------------

Gnuastro contains a large number of programs and it is natural to forget the details of each program’s options or inputs and outputs. Therefore, before starting the analysis steps of this tutorial, let’s review how you can access this book to refresh your memory any time you want, without having to take your hands off the keyboard.

When you install Gnuastro, this book is also installed on your system along with all the programs and libraries, so you do not need an internet connection to access/read it. Also, by accessing this book as described below, you can be sure that it corresponds to your installed version of Gnuastro.

GNU Info(1) is the program in charge of displaying the manual on the command-line (for more, see *note Info::). To see this whole book on your command-line, please run the following command and press the subsequent keys.
Info has its own mini-environment, therefore we will show the keys that must be pressed in the mini-environment after a ‘->’ sign. You can also ignore anything after the ‘#’ sign in the middle of the line; it is only there for your information.

     $ info gnuastro                # Open the top of the manual.
     -> <SPC>                       # All the book chapters.
     -> <SPC>                       # Continue down: show sections.
     -> <SPC> ...                   # Keep pressing space to go down.
     -> q                           # Quit Info, return to the command-line.

The thing that greatly simplifies navigation in Info is the links (regions with an underline). You can immediately go to the next link in the page with the <TAB> key and press <ENTER> on it to go into that part of the manual. Try the commands above again, but this time also use <TAB> to go to the links and press <ENTER> on them to go to the respective section of the book. Then follow a few more links and go deeper into the book. To return to the previous page, press <l> (small L). If you are searching for a specific phrase in the whole book (for example, an option name), press <s>, type your search phrase and end it with an <ENTER>.

You do not need to start from the top of the manual every time. For example, to get to *note Invoking astnoisechisel::, run the following command. In general, all programs have such an “Invoking ProgramName” section in this book. These sections are specifically for the description of inputs, outputs and configuration options of each program. You can access them directly for each program by giving its executable name to Info.

     $ info astnoisechisel

The other sections do not have such shortcuts. To directly access them from the command-line, you need to tell Info to look into Gnuastro’s manual, then look for the specific section (an unambiguous title is necessary). For example, if you only want to review/remember NoiseChisel’s *note Detection options::, just run the following command. Note how case is irrelevant for Info when calling a title in this manner.

     $ info gnuastro "Detection options"

In general, Info is a powerful and convenient way to access this whole book with detailed information about the programs you are running. If you are not already familiar with it, please run the following command and just read along and do what it says to learn it. Do not stop until you feel sufficiently fluent in it. Please invest the half an hour’s time necessary to start using Info comfortably. It will greatly improve your productivity and you will start reaping the rewards of this investment very soon.

     $ info info

As a good scientist you need to feel comfortable playing with the features/options and avoid (be critical of) using default values as much as possible. On the other hand, our human memory is limited, so it is important to be able to easily access any part of this book fast, and to remember the option names, what they do and their acceptable values.

If you just want the option names and a short description, calling the program with the ‘--help’ option might also be a good solution, like the first example below. If you know a few characters of the option name, you can feed the printed output to ‘grep’, like the second or third example commands.

     $ astnoisechisel --help
     $ astnoisechisel --help | grep quant
     $ astnoisechisel --help | grep check

---------- Footnotes ----------

(1) GNU Info is already available on almost all Unix-like operating systems.

2.1.3 Setup and data download
-----------------------------

The first step in the analysis of the tutorial is to download the necessary input datasets.
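Before downloading anything, you may also want to confirm that the tools mentioned at the start of this tutorial (Gnuastro, Wget, AWK and DS9) are actually available on your system. The small loop below is only a convenience sketch (it is not one of the original tutorial steps and assumes a POSIX-compatible shell); it uses the standard ‘command -v’ shell utility to report any program that is not found.

     ## Optional: report any of the tutorial's tools that are not in
     ## your PATH (no output means that everything was found).
     $ for prog in astfits astcrop astnoisechisel wget awk ds9; do \
         command -v $prog > /dev/null || echo "$prog was not found"; \
       done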
First, to keep things clean, let’s create a ‘gnuastro-tutorial’ directory and continue all future steps in it:

     $ mkdir gnuastro-tutorial
     $ cd gnuastro-tutorial

We will be using the near infra-red Wide Field Camera 3 (http://www.stsci.edu/hst/wfc3) dataset. If you already have the images in another directory (for example, ‘XDFDIR’, with the same FITS file names), you can set the ‘download’ directory to be a symbolic link to ‘XDFDIR’ with a command like this:

     $ ln -s XDFDIR download

Otherwise, when the following images are not already present on your system, you can make a ‘download’ directory and download them there.

     $ mkdir download
     $ cd download
     $ xdfurl=http://archive.stsci.edu/pub/hlsp/xdf
     $ wget $xdfurl/hlsp_xdf_hst_wfc3ir-60mas_hudf_f105w_v1_sci.fits
     $ wget $xdfurl/hlsp_xdf_hst_wfc3ir-60mas_hudf_f125w_v1_sci.fits
     $ wget $xdfurl/hlsp_xdf_hst_wfc3ir-60mas_hudf_f160w_v1_sci.fits
     $ cd ..

In this tutorial, we will just use these three filters. Later, you may need to download more filters. To do that, you can use the shell’s ‘for’ loop to download them all in series (one after the other(1)) with one command like the one below for the WFC3 filters. Put this command instead of the three ‘wget’ commands above. Recall that all the extra spaces, back-slashes (‘\’), and new lines can be ignored if you are typing the commands on the terminal.

     $ for f in f105w f125w f140w f160w; do \
         wget $xdfurl/hlsp_xdf_hst_wfc3ir-60mas_hudf_"$f"_v1_sci.fits; \
       done

---------- Footnotes ----------

(1) Note that you only have one port to the internet, so downloading in parallel will actually be slower than downloading in series.

2.1.4 Dataset inspection and cropping
-------------------------------------

First, let’s visually inspect the datasets we downloaded in *note Setup and data download::. Let’s take the F160W image as an example. One of the most common programs for viewing FITS images is SAO DS9, which is usually called through the ‘ds9’ command-line program, like the command below. If you do not already have DS9 on your computer and the command below fails, please see *note SAO DS9::.

     $ ds9 download/hlsp_xdf_hst_wfc3ir-60mas_hudf_f160w_v1_sci.fits

By default, DS9 opens a relatively small window (compared to modern monitors) and its default scale and color bar make it very hard to see any structure in the image: everything will look black. Also, by default, it zooms into the center of the image and you need to scroll to zoom-out and see the whole thing. To avoid these problems, Gnuastro has the ‘astscript-fits-view’ script:

     $ astscript-fits-view \
          download/hlsp_xdf_hst_wfc3ir-60mas_hudf_f160w_v1_sci.fits

After running this command, you will see that the DS9 window fully covers the height of your monitor, shows the whole image, uses a clearer color map, and has many more useful features. In fact, you can see the DS9 command that was used in your terminal(1). On GNU/Linux operating systems (like Ubuntu, and Fedora), you can also set your graphical user interface to use this script for opening FITS files when you click on them. For more, see the instructions in the checklist at the start of *note Invoking astscript-fits-view::.

As you hover your mouse over the image, notice how the “Value” and positional fields on the top of the DS9 window get updated. The first thing you might notice is that when you hover the mouse over the regions with no data, they have a value of zero. The next thing might be that the dataset has a shallower and a deeper component (see *note Quantifying measurement limits::).
Recall that this is a combined/reduced image of many exposures, and the parts that have more exposures are deeper. In particular, the exposure time of the deep inner region is more than 4 times the exposure time of the outer (shallower) parts.

To simplify the analysis in this tutorial, we will only be working on the deep field, so let’s crop it out of the full dataset. Fortunately the XDF survey web page (above) contains the vertices of the deep flat WFC3-IR field(2). With Gnuastro’s Crop program(3), you can use those vertices to cutout this deep region from the larger image. But before that, to keep things organized, let’s make a directory called ‘flat-ir’ and keep the flat (single-depth) regions in that directory (with an ‘xdf-’ prefix for a shorter and easier file name).

     $ mkdir flat-ir
     $ astcrop --mode=wcs -h0 --output=flat-ir/xdf-f105w.fits \
               --polygon="53.187414,-27.779152 : 53.159507,-27.759633 : \
                          53.134517,-27.787144 : 53.161906,-27.807208" \
               download/hlsp_xdf_hst_wfc3ir-60mas_hudf_f105w_v1_sci.fits
     $ astcrop --mode=wcs -h0 --output=flat-ir/xdf-f125w.fits \
               --polygon="53.187414,-27.779152 : 53.159507,-27.759633 : \
                          53.134517,-27.787144 : 53.161906,-27.807208" \
               download/hlsp_xdf_hst_wfc3ir-60mas_hudf_f125w_v1_sci.fits
     $ astcrop --mode=wcs -h0 --output=flat-ir/xdf-f160w.fits \
               --polygon="53.187414,-27.779152 : 53.159507,-27.759633 : \
                          53.134517,-27.787144 : 53.161906,-27.807208" \
               download/hlsp_xdf_hst_wfc3ir-60mas_hudf_f160w_v1_sci.fits

Run the command below to have a look at the cropped images:

     $ astscript-fits-view flat-ir/xdf-f160w.fits

You only see the deep region now; does the noise not look much cleaner? An important result of this crop is that regions with no data now have a NaN (Not-a-Number, or a blank value) value. Any self-respecting statistical program will ignore NaN values, so they will not affect your outputs. For example, notice how changing the DS9 color bar will not affect the NaN pixels (their colors will not change).

However, do you remember that in the downloaded files, such regions had a value of zero? That is a big problem, because zero is a number, and is thus meaningful, especially when you later want NoiseChisel to detect(4) all the signal from the deep universe in this image. Generally, when you want to ignore some pixels in a dataset, and avoid higher-level ambiguities or complications, it is always best to give them blank values (not zero, or some other absurdly large or small number). Gnuastro has the Arithmetic program for such cases, and we will introduce it later in this tutorial.

In the example above, the polygon vertices are in degrees, but you can also replace them with sexagesimal(5) coordinates (for example, using ‘03h32m44.9794’ or ‘03:32:44.9794’ instead of ‘53.187414’ for the first RA, and ‘-27d46m44.9472’ or ‘-27:46:44.9472’ instead of ‘-27.779152’ for the first Dec). To further simplify things, you can even define your polygon visually as a DS9 “region”, save it as a “region file” and give that file to Crop. But we need to continue, so if you are interested to learn more, see *note Crop::.

Before closing this section, let’s just take a look at the three cropping commands we ran above. The only thing varying in the three commands is the filter name! Note how everything else is the same! In such cases, you should generally avoid repeating a command manually: it is prone to _many_ bugs, and as you see, it is very hard to read (did you not write a ‘7’ instead of an ‘8’ somewhere by mistake?).
To simplify the command, and allow you to work on more filters, we can use the shell’s ‘for’ loop as shown below. Notice how the places where the filter names (‘f105w’, ‘f125w’ and ‘f160w’) were used above have been replaced with ‘$f’ (the shell variable that ‘for’ will update in every loop) below.

     $ rm flat-ir/*.fits
     $ for f in f105w f125w f160w; do \
         astcrop --mode=wcs -h0 --output=flat-ir/xdf-$f.fits \
                 --polygon="53.187414,-27.779152 : 53.159507,-27.759633 : \
                            53.134517,-27.787144 : 53.161906,-27.807208" \
                 download/hlsp_xdf_hst_wfc3ir-60mas_hudf_"$f"_v1_sci.fits; \
       done

---------- Footnotes ----------

(1) When comparing DS9’s command-line options to Gnuastro’s, you will notice how SAO DS9 does not follow the GNU style of options where “long” and “short” options are preceded by ‘--’ and ‘-’ respectively (for example, ‘--width’ and ‘-w’, see *note Options::).

(2)

(3) To learn more about the Crop program see *note Crop::.

(4) As you will see below, unlike most other detection algorithms, NoiseChisel detects the objects from their faintest parts; it does not start with their high signal-to-noise ratio peaks. Since the Sky is already subtracted in many images and noise fluctuates around zero, zero is commonly higher than the initial threshold applied. Therefore keeping zero-valued pixels in this image will cause them to be identified as part of the detections!

(5)

2.1.5 Angular coverage on the sky
---------------------------------

This is the deepest image we currently have of the sky. The first thing that comes to mind may be this: “How large is this field on the sky?”. You can get a fast and crude answer with Gnuastro’s Fits program, using this command:

     $ astfits flat-ir/xdf-f160w.fits --skycoverage

It will print the sky coverage in two formats (all numbers are in units of degrees for this image): 1) the image’s central RA and Dec and full width around that center, 2) the range of RA and Dec covered by this image. You can use these values in various online query systems. You can also use this option to automatically calculate the area covered by this image. With the ‘--quiet’ option, the printed output of ‘--skycoverage’ will not contain human-readable text, making it easier for automatic (computer) processing:

     $ astfits flat-ir/xdf-f160w.fits --skycoverage --quiet

The second row is the coverage range along RA and Dec (compare with the outputs before using ‘--quiet’). We can thus simply subtract the first column from the second, and multiply that with the difference of the fourth and third columns, to calculate the image area. We will also multiply each difference by 60 to have the area in arc-minutes squared.

     $ astfits flat-ir/xdf-f160w.fits --skycoverage --quiet \
            | awk 'NR==2{print ($2-$1)*60*($4-$3)*60}'

The returned value is $9.06711$ arcmin$^2$. *However, this method ignores the fact that many of the image pixels are blank!* In other words, the image does cover this area, but there is no data in more than half of the pixels. So let’s calculate the area coverage over which we actually have data.

The FITS world coordinate system (WCS) metadata standard contains the key to answering this question. Run the following command to see all the FITS keywords (metadata) for one of the images (almost identical with the other images because they are all scaled to the same region of sky):

     $ astfits flat-ir/xdf-f160w.fits -h1

Look into the keywords grouped under the ‘World Coordinate System (WCS)’ title. These keywords define how the image relates to the outside world.
In particular, the ‘CDELT*’ keywords (or ‘CDELT1’ and ‘CDELT2’ in this 2D image) contain the “Coordinate DELTa” (or change in coordinate value) corresponding to a change of one pixel. But what are the units of each “world” coordinate? The ‘CUNIT*’ keywords (for “Coordinate UNIT”) have the answer. In this case, both ‘CUNIT1’ and ‘CUNIT2’ have a value of ‘deg’, so both “world” coordinates are in units of degrees. We can thus conclude that the value of ‘CDELT*’ is in units of degrees-per-pixel(1).

With the commands below, we will use ‘CDELT’ (along with the number of non-blank pixels) to find the answer to our initial question: “how much of the sky does this image cover?”. The lines starting with ‘##’ are just comments for you to read and understand each command. Do Not type them on the terminal (no problem if you do, they will just not have any effect). The commands are intentionally repetitive in some places to better understand each step and also to demonstrate the beauty of command-line features like history, variables, pipes and loops (which you will commonly use as you become more proficient on the command-line).

*Use shell history:* Do Not forget to make effective use of your shell’s history: you do not have to re-type previous commands to add something to them (like the examples below). This is especially convenient when you just want to make a small change to your previous command. Press the “up” key on your keyboard (possibly multiple times) to see your previous command(s) and modify them accordingly.

*Your locale does not use ‘.’ as decimal separator:* on systems that do not use an English language environment, the dates, numbers, etc., can be printed in different formats (for example, ‘0.5’ can be written as ‘0,5’: with a comma). With the ‘LC_NUMERIC’ line at the start of the script below, we are ensuring a unified format in the output of ‘seq’. For more, please see *note Numeric locale::.

     ## Make sure that the decimal separator is a point in any environment.
     $ export LC_NUMERIC=C

     ## See the general statistics of non-blank pixel values.
     $ aststatistics flat-ir/xdf-f160w.fits

     ## We only want the number of non-blank pixels (add '--number').
     $ aststatistics flat-ir/xdf-f160w.fits --number

     ## Keep the result of the command above in the shell variable `n'.
     $ n=$(aststatistics flat-ir/xdf-f160w.fits --number)

     ## See what is stored in the shell variable `n'.
     $ echo $n

     ## Show all the FITS keywords of this image.
     $ astfits flat-ir/xdf-f160w.fits -h1

     ## The resolution (in degrees/pixel) is in the `CDELT' keywords.
     ## Only show lines that contain these characters, by feeding
     ## the output of the previous command to the `grep' program.
     $ astfits flat-ir/xdf-f160w.fits -h1 | grep CDELT

     ## Since the resolution of both dimensions is (approximately) equal,
     ## we will only read the value of one (CDELT1) with '--keyvalue'.
     $ astfits flat-ir/xdf-f160w.fits -h1 --keyvalue=CDELT1

     ## We do not need the file name in the output (add '--quiet').
     $ astfits flat-ir/xdf-f160w.fits -h1 --keyvalue=CDELT1 --quiet

     ## Save it as the shell variable `r'.
     $ r=$(astfits flat-ir/xdf-f160w.fits -h1 --keyvalue=CDELT1 --quiet)

     ## Print the values of `n' and `r'.
     $ echo $n $r

     ## Use the number of pixels (first number passed to AWK) and
     ## length of each pixel's edge (second number passed to AWK)
     ## to estimate the area of the field in arc-minutes squared.
     $ echo $n $r | awk '{print $1 * ($2*60)^2}'

The output of the last command (area of this field) is 4.03817 (or approximately 4.04) arc-minutes squared.
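If you later need this area in a script, the exploratory steps above can be condensed into just three commands. This is only a recap of the same commands that were already used above (no new options are introduced here):

     ## Area of the non-blank pixels in arc-minutes squared: the number
     ## of non-blank pixels, times the area of a single pixel.
     $ n=$(aststatistics flat-ir/xdf-f160w.fits --number)
     $ r=$(astfits flat-ir/xdf-f160w.fits -h1 --keyvalue=CDELT1 --quiet)
     $ echo $n $r | awk '{print $1 * ($2*60)^2}'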
Just for comparison, this is roughly 175 times smaller than the average moon’s angular area (with a diameter of 30 arc-minutes or half a degree). Some FITS writers do not use the ‘CDELT’ convention, making it hard to use the steps above. In such cases, you can extract the pixel scale with the ‘--pixelscale’ option of Gnuastro’s Fits program like the command below. Similar to the ‘--skycoverage’ option above, you can also use the ‘--quiet’ option to allow easy usage of the values in scripts. $ astfits flat-ir/xdf-f160w.fits --pixelscale *AWK for table/value processing:* As you saw above AWK is a powerful and simple tool for text processing. You will see it often in shell scripts. GNU AWK (the most common implementation) comes with a free and wonderful book (https://www.gnu.org/software/gawk/manual/) in the same format as this book which will allow you to master it nicely. Just like this manual, you can also access GNU AWK’s manual on the command-line whenever necessary without taking your hands off the keyboard. Just run ‘info awk’. ---------- Footnotes ---------- (1) With the FITS ‘CDELT’ convention, rotation (‘PC’ or ‘CD’ keywords) and scales (‘CDELT’) are separated. In the FITS standard the ‘CDELT’ keywords are optional. When ‘CDELT’ keywords are not present, the ‘PC’ matrix is assumed to contain _both_ the coordinate rotation and scales. Note that not all FITS writers use the ‘CDELT’ convention. So you might not find the ‘CDELT’ keywords in the WCS meta data of some FITS files. However, all Gnuastro programs (which use the default FITS keyword writing format of WCSLIB) write their output WCS with the ‘CDELT’ convention, even if the input does not have it. If your dataset does not use the ‘CDELT’ convention, you can feed it to any (simple) Gnuastro program (for example, Arithmetic) and the output will have the ‘CDELT’ keyword. See Section 8 of the FITS standard (https://fits.gsfc.nasa.gov/standard40/fits_standard40aa-le.pdf) for more 2.1.6 Cosmological coverage and visualizing tables -------------------------------------------------- Having found the angular coverage of the dataset in *note Angular coverage on the sky::, we can now use Gnuastro to answer a more physically motivated question: “How large is this area at different redshifts?”. To get a feeling of the tangential area that this field covers at redshift 2, you can use Gnuastro’s CosmicCalcular program (*note CosmicCalculator::). In particular, you need the tangential distance covered by 1 arc-second as raw output. Combined with the field’s area that was measured before, we can calculate the tangential distance in Mega Parsecs squared ($Mpc^2$). ## If your system language uses ',' (not '.') as decimal separator. $ export LC_NUMERIC=C ## Print general cosmological properties at redshift 2 (for example). $ astcosmiccal -z2 ## When given a "Specific calculation" option, CosmicCalculator ## will just print that particular calculation. To see all such ## calculations, add a `--help' token to the previous command ## (under the same title). Note that with `--help', no processing ## is done, so you can always simply append it to remember ## something without modifying the command you want to run. $ astcosmiccal -z2 --help ## Only print the "Tangential dist. covered by 1arcsec at z (kpc)". ## in units of kpc/arc-seconds. 
$ astcosmiccal -z2 --arcsectandist ## It is easier to use the short (single character) version of ## this option when typing (but this is hard to read, so use ## the long version in scripts or notes you plan to archive). $ astcosmiccal -z2 -s ## Short options can be merged (they are only a single character!) $ astcosmiccal -sz2 ## Convert this distance to kpc^2/arcmin^2 and save in `k'. $ k=$(astcosmiccal -sz2 | awk '{print ($1*60)^2}') ## Calculate the area of the dataset in arcmin^2. $ n=$(aststatistics flat-ir/xdf-f160w.fits --number) $ r=$(astfits flat-ir/xdf-f160w.fits -h1 --keyvalue=CDELT1 -q) $ a=$(echo $n $r | awk '{print $1 * ($2*60)^2 }') ## Multiply `k' and `a' and divide by 10^6 for value in Mpc^2. $ echo $k $a | awk '{print $1 * $2 / 1e6}' At redshift 2, this field therefore covers approximately 1.07 $Mpc^2$. If you would like to see how this tangential area changes with redshift, you can use a shell loop like below. $ for z in 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0; do \ k=$(astcosmiccal -sz$z); \ echo $z $k $a | awk '{print $1, ($2*60)^2 * $3 / 1e6}'; \ done Fortunately, the shell has a useful tool/program to print a sequence of numbers that is nicely called ‘seq’ (short for “sequence”). You can use it instead of typing all the different redshifts in the loop above. for example, the loop below will calculate and print the tangential coverage of this field across a larger range of redshifts (0.1 to 5) and with finer increments of 0.1. For more on the ‘LC_NUMERIC’ command, see *note Numeric locale::. ## If your system language uses ',' (not '.') as decimal separator. $ export LC_NUMERIC=C ## The loop over the redshifts $ for z in $(seq 0.1 0.1 5); do \ k=$(astcosmiccal -z$z --arcsectandist); \ echo $z $k $a | awk '{print $1, ($2*60)^2 * $3 / 1e6}'; \ done Have a look at the two printed columns. The first is the redshift, and the second is the area of this image at that redshift (in Mega Parsecs squared). Redshift (https://en.wikipedia.org/wiki/Redshift) ($z$) is a measure of distance in galaxy evolution and cosmology: a higher redshift corresponds to larger distance. Now, have a look at the first few values. At $z=0.1$ and $z=0.5$, this image covers $0.05 Mpc^2$ and $0.57 Mpc^2$ respectively. This increase of coverage with redshift is expected because a fixed angle will cover a larger tangential area at larger distances. However, as you come down the list (to higher redshifts) you will notice that this relation does not hold! The largest coverage is at $z=1.6$: at higher redshifts, the area decreases, and continues decreasing!!! In $\Lambda{}CDM$ cosmology, this happens because of the finite speed of light and the expansion of the universe, see the Wikipedia page (https://en.wikipedia.org/wiki/Angular_diameter_distance#Angular_diameter_turnover_point). In case you have TOPCAT, you can visualize this as a plot (if you do not have TOPCAT, see *note TOPCAT::). To do so, first you need to save the output of the loop above into a FITS table by piping the output to Gnuastro’s Table program and giving an output name: $ for z in $(seq 0.1 0.1 5); do \ k=$(astcosmiccal -z$z --arcsectandist); \ echo $z $k $a | awk '{print $1, ($2*60)^2 * $3 / 1e6}'; \ done | asttable --output=z-vs-tandist.fits You can now use Gnuastro’s ‘astscript-fits-view’ to open this table in TOPCAT with the command below. Do you remember this script from *note Dataset inspection and cropping::? There, we used it to view a FITS image with DS9! 
This script will see if the first dataset in the given file is a table or an image and will call TOPCAT or DS9 accordingly, making it a very convenient tool to inspect the contents of all types of FITS data.

     $ astscript-fits-view z-vs-tandist.fits

After TOPCAT opens, you will see the name of the table ‘z-vs-tandist.fits’ in the left panel. On the top menu bar, select the “Graphics” menu, then select “Plane plot” to visualize the two columns printed above as a plot and get a better impression of the turnover point of the image’s cosmological coverage.

2.1.7 Building custom programs with the library
-----------------------------------------------

In *note Cosmological coverage and visualizing tables::, we repeated a certain calculation/output of a program multiple times using the shell’s ‘for’ loop. This simple way of repeating a calculation is great when it is only necessary once. However, if you commonly need this calculation, possibly for a larger number of redshifts at higher precision, the command above can be slow. Please try it out by changing the sequence command in the previous section to ‘seq 0.1 0.01 10’. It will take about 11 seconds(1)! This can be improved by _hundreds_ of times! This section will show you how.

Generally, repeated calls to a generic program (like CosmicCalculator) are slow, because a generic program can have a lot of overhead on each call: To be generic and easy to operate, CosmicCalculator has to parse the command-line and all configuration files (see *note Option management and configuration files::) which contain human-readable characters and need a lot of pre-processing to be ready for processing by the computer. Afterwards, CosmicCalculator has to check the sanity of its inputs and check which of its many options you have asked for. All of this pre-processing takes as much time as the high-level calculation you are requesting, and it has to re-do all of this for every redshift in your loop.

To greatly speed up the processing, you can directly access the core work-horse of CosmicCalculator without all that overhead by designing your custom program for this job. Using Gnuastro’s library, you can write your own tiny program particularly designed for this exact calculation (and nothing else!). To do that, copy and paste the following C program in a file called ‘myprogram.c’.

     #include <math.h>
     #include <stdio.h>
     #include <stdlib.h>
     #include <gnuastro/cosmology.h>

     int
     main(void)
     {
       double area=4.03817;          /* Area of field (arcmin^2). */
       double z, adist, tandist;     /* Temporary variables.      */

       /* Constants from Planck 2018 (arXiv:1807.06209, Table 2) */
       double H0=67.66, olambda=0.6889, omatter=0.3111, oradiation=0;

       /* Do the same thing for all redshifts (z) between 0.1 and 10. */
       for(z=0.1; z<10; z+=0.01)
         {
           /* Calculate the angular diameter distance. */
           adist=gal_cosmology_angular_distance(z, H0, olambda,
                                                omatter, oradiation);

           /* Calculate the tangential distance of one arcsecond. */
           tandist = adist * 1000 * M_PI / 3600 / 180;

           /* Print the redshift and area. */
           printf("%-5.2f %g\n", z, pow(tandist * 60,2) * area / 1e6);
         }

       /* Tell the system that everything finished successfully. */
       return EXIT_SUCCESS;
     }

Then run the following command to compile your program and run it.

     $ astbuildprog myprogram.c

In the command above, you used Gnuastro’s BuildProgram program. Its job is to simplify the compilation, linking and running of simple C programs that use Gnuastro’s library (like this one). BuildProgram is designed to manage Gnuastro’s dependencies, compile and link your custom program and then run it.
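For context, BuildProgram is automating a compilation and linking command that you could also run manually with your C compiler. The exact flags depend on where Gnuastro and its dependencies were installed, so the command below is only a rough sketch (the include and library paths are hypothetical examples, and some systems may need extra libraries); it is not necessarily the exact command that BuildProgram runs.

     ## A rough manual equivalent of what BuildProgram automates
     ## (adjust the paths to your own installation):
     $ gcc myprogram.c -o myprogram \
           -I/usr/local/include -L/usr/local/lib -lgnuastro -lm
     $ ./myprogram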
Did you notice how your custom program created the table almost instantaneously? Technically, it only took about 0.03 seconds! Recall that the ‘for’ loop of *note Cosmological coverage and visualizing tables:: took more than 11 seconds (or $\sim367$ times slower!).

Please run the ‘ls’ command to see a listing of the files in the current directory. You will notice that a new file called ‘myprogram’ has been created. This is the compiled program that was created and run by the command above (it is in binary machine code format, not human-readable any more). You can run it again to get the same results by executing it:

     $ ./myprogram

The efficiency of your custom ‘myprogram’ compared to repeated calls to CosmicCalculator is because in the latter, the requested processing is comparable to the necessary overheads. For other programs that take large input datasets and do complicated processing on them, the overhead is usually negligible compared to the processing. In such cases, the libraries are only useful if you want a different/new processing compared to the functionalities in Gnuastro’s existing programs.

Gnuastro has a large library which is used extensively by all the programs. In other words, the library is like the skeleton of Gnuastro. For the full list of available functions classified by context, please see *note Gnuastro library::. Gnuastro’s library and BuildProgram were created to make it easy for you to use these powerful features as you like. This gives you a high level of creativity, while also providing efficiency and robustness. Several other complete working examples (involving images and tables) of Gnuastro’s libraries can be seen in *note Library demo programs::.

But for this tutorial, let’s stop discussing the libraries here and get back to Gnuastro’s already built programs (which do not need C programming). But before continuing, let’s clean up the files we do not need any more:

     $ rm myprogram* z-vs-tandist*

---------- Footnotes ----------

(1) To measure how much time the loop of *note Cosmological coverage and visualizing tables:: takes on your system, you can use the ‘time’ command. First put the whole loop (and pipe) into a plain-text file (to be loaded as a shell script) called ‘z-vs-tandist.sh’. Then run this command: ‘time -p bash z-vs-tandist.sh’. The relevant time (in seconds) is shown after ‘real’.

2.1.8 Option management and configuration files
------------------------------------------------

In the previous section (*note Cosmological coverage and visualizing tables::), when you ran CosmicCalculator, you only specified the redshift with the ‘-z2’ option. You did not specify the cosmological parameters that are necessary for the calculations! Parameters like the Hubble constant ($H_0$) and the matter density. In spite of this, CosmicCalculator did its processing and printed results. None of Gnuastro’s programs keep a default value internally within their code (they are all set by the user)! So where did the cosmological parameters that are necessary for its calculations come from? What were the values of those parameters?

In short, they come from a configuration file (see *note Configuration file precedence::), and the final used values can be checked/edited on the command-line. In this section we will review this important aspect of all the programs in Gnuastro. Configuration files are an important part of all Gnuastro’s programs, especially the ones with a large number of options, so it is important to understand this part well.
Once you get comfortable with configuration files, you can make good use of them in all Gnuastro programs (for example, NoiseChisel). For example, to do optimal detection on various datasets, you can have configuration files for different noise properties. The configuration of each program (besides its version) is vital for the reproducibility of your results, so it is important to manage them properly. As we saw above, the full list of the options in all Gnuastro programs can be seen with the ‘--help’ option. Try calling it with CosmicCalculator as shown below. Note how options are grouped by context to make it easier to find your desired option. However, in each group, options are ordered alphabetically. $ astcosmiccal --help After running the command above, please scroll to the line that you ran this command and read through the output (its the same format for all the programs). All options have a long format (starting with ‘--’ and a multi-character name) and some have a short format (starting with ‘-’ and a single character), for more see *note Options::. The options that expect a value, have an <=> sign after their long version. The format of their expected value is also shown as ‘FLT’, ‘INT’ or ‘STR’ for floating point numbers, integer numbers, and strings (filenames for example) respectively. You can see the values of all options that need one with the ‘--printparams’ option (or its short format: ‘-P’). ‘--printparams’ is common to all programs (see *note Common options::). You can see the default cosmological parameters (from the Plank 2018 results (https://arxiv.org/abs/1807.06209)) under the ‘# Input:’ title: $ astcosmiccal -P # Input: H0 67.66 # Current expansion rate (Hubble constant). olambda 0.6889 # Current cosmological cst. dens. per crit. dens. omatter 0.3111 # Current matter density per critical density. oradiation 0 # Current radiation density per critical density. Let’s say you want to do the calculation in the previous section using $H_0=70$ km/s/Mpc. To do this, just add ‘--H0=70’ after the command above (while keeping the ‘-P’). In the output, you can see that the used Hubble constant has also changed. $ astcosmiccal -P --H0=70 Afterwards, delete the ‘-P’ and add a ‘-z2’ to see the calculations with the new cosmology (or configuration). $ astcosmiccal --H0=70 -z2 From the output of the ‘--help’ option, note how the option for Hubble constant has both short (‘-H’) and long (‘--H0’) formats. One final note is that the equal (<=>) sign is not mandatory. In the short format, the value can stick to the actual option (the short option name is just one character after-all, thus easily identifiable) and in the long format, a white-space character is also enough. $ astcosmiccal -H70 -z2 $ astcosmiccal --H0 70 -z2 --arcsectandist When an option does not need a value, and has a short format (like ‘--arcsectandist’), you can easily append it _before_ other short options. So the last command above can also be written as: $ astcosmiccal --H0 70 -sz2 Let’s assume that in one project, you want to only use rounded cosmological parameters ($H_0$ of 70km/s/Mpc and matter density of 0.3). You should therefore run CosmicCalculator like this: $ astcosmiccal --H0=70 --olambda=0.7 --omatter=0.3 -z2 But having to type these extra options every time you run CosmicCalculator will be prone to errors (typos in particular), frustrating and slow. 
Therefore in Gnuastro, you can put all the options and their values in a “Configuration file” and tell the programs to read the option values from there. Let’s create a configuration file... With your favorite text editor, make a file named ‘my-cosmology.conf’ (or ‘my-cosmology.txt’, the suffix does not matter for Gnuastro, but a more descriptive suffix like ‘.conf’ is recommended for humans reading your code and seeing your files: this includes you, looking into your own project, in a couple of months that you have forgot the details!). Then put the following lines inside of the plain-text file. One space between the option value and name is enough, the values are just under each other to help in readability. Also note that you should only use _long option names_ in configuration files. H0 70 olambda 0.7 omatter 0.3 You can now tell CosmicCalculator to read this file for option values immediately using the ‘--config’ option as shown below. Do you see how the output of the following command corresponds to the option values in ‘my-cosmology.conf’, and is therefore identical to the previous command? $ astcosmiccal --config=my-cosmology.conf -z2 But still, having to type ‘--config=my-cosmology.conf’ every time is annoying, is not it? If you need this cosmology every time you are working in a specific directory, you can use Gnuastro’s default configuration file names and avoid having to type it manually. The default configuration files (that are checked if they exist) must be placed in the hidden ‘.gnuastro’ sub-directory (in the same directory you are running the program). Their file name (within ‘.gnuastro’) must also be the same as the program’s executable name. So in the case of CosmicCalculator, the default configuration file in a given directory is ‘.gnuastro/astcosmiccal.conf’. Let’s do this. We will first make a directory for our custom cosmology, then build a ‘.gnuastro’ within it. Finally, we will copy the custom configuration file there: $ mkdir my-cosmology $ mkdir my-cosmology/.gnuastro $ mv my-cosmology.conf my-cosmology/.gnuastro/astcosmiccal.conf Once you run CosmicCalculator within ‘my-cosmology’ (as shown below), you will see how your custom cosmology has been implemented without having to type anything extra on the command-line. $ cd my-cosmology $ astcosmiccal -P # Your custom cosmology is printed. $ cd .. $ astcosmiccal -P # The default cosmology is printed. To further simplify the process, you can use the ‘--setdirconf’ option. If you are already in your desired working directory, calling this option with the others will automatically write the final values (along with descriptions) in ‘.gnuastro/astcosmiccal.conf’. for example, try the commands below: $ mkdir my-cosmology2 $ cd my-cosmology2 $ astcosmiccal -P $ astcosmiccal --H0 70 --olambda=0.7 --omatter=0.3 --setdirconf $ astcosmiccal -P $ cd .. Gnuastro’s programs also have default configuration files for a specific user (when run in any directory). This allows you to set a special behavior every time a program is run by a specific user. Only the directory and filename differ from the above, the rest of the process is similar to before. Finally, there are also system-wide configuration files that can be used to define the option values for all users on a system. See *note Configuration file precedence:: for a more detailed discussion. We will stop the discussion on configuration files here, but you can always read about them in *note Configuration files::. 
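One more practical point: since command-line options take precedence over configuration files (see *note Configuration file precedence::), you can temporarily override a directory’s configuration without editing any file. For example, inside the ‘my-cosmology’ directory created above (the ‘grep’ is only used here to shorten the printed output):

     $ cd my-cosmology
     $ astcosmiccal -P --H0=72 | grep H0    # Command-line value is used.
     $ astcosmiccal -P | grep H0            # Directory's value (70) again.
     $ cd ..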
Before continuing the tutorial, let’s delete the two extra directories that we do not need any more:

     $ rm -rf my-cosmology*

2.1.9 Warping to a new pixel grid
---------------------------------

We are now ready to start processing the downloaded images. The XDF datasets we are using here are already aligned to the same pixel grid. However, warping to a different/matched pixel grid is commonly needed before higher-level analysis when you are using datasets from different instruments. So let’s have a look at Gnuastro’s warping features here. Gnuastro’s Warp program should be used for warping the pixel-grid (see *note Warp::). For example, try rotating one of the images by 20 degrees:

     $ astwarp flat-ir/xdf-f160w.fits --rotate=20

Open the output (‘xdf-f160w_rotated.fits’) and see how it is rotated. Warp can generally be used for many kinds of pixel grid manipulation (warping), not just rotations. For example, the outputs of the commands below will respectively have larger pixels (the new resolution being one quarter of the original resolution), be shifted by 2.8 pixels (a shift that is not an integer number of pixels), get a shear of 0.2, and be tilted (projected). Run each of them and open the output file to see the effect; they will become handy for you in the future.

     $ astwarp flat-ir/xdf-f160w.fits --scale=0.25
     $ astwarp flat-ir/xdf-f160w.fits --translate=2.8
     $ astwarp flat-ir/xdf-f160w.fits --shear=0.2
     $ astwarp flat-ir/xdf-f160w.fits --project=0.001,0.0005

If you need to do multiple warps, you can combine them in one call to Warp. For example, to first rotate the image, then scale it, run this command:

     $ astwarp flat-ir/xdf-f160w.fits --rotate=20 --scale=0.25

If you have multiple warps, do them all in one command. Do Not warp them in separate commands, because the correlated noise will become too strong. As you see in the matrix that is printed when you run Warp, it merges all the warps into a single warping matrix (see *note Merging multiple warpings::) and simply applies that (mixes the pixel values) just once. However, if you run Warp multiple times, the pixels will be mixed multiple times, creating a strong artificial blur/smoothing, or stronger correlated noise. Recall that the merging of multiple warps is done through matrix multiplication, therefore order matters in the separate operations. At a lower level, through Warp’s ‘--matrix’ option, you can directly request your desired final warp and do not have to break it up into different warps like above (see *note Invoking astwarp::).

Fortunately these datasets are already aligned to the same pixel grid, so you do not actually need the files that were just generated. You can safely delete them all with the following command. Here, you see why we put the processed outputs that we need later into a separate directory. In this way, the top directory can be used for temporary files for testing that you can simply delete with a generic command like below.

     $ rm *.fits

2.1.10 NoiseChisel and Multi-Extension FITS files
-------------------------------------------------

In the previous sections, we completed a review of the basics of Gnuastro’s programs. We are now ready to do some more serious analysis on the downloaded images: extract the pixels containing signal from the image, find sub-structure of the extracted signal, do measurements over the extracted objects and analyze them (finding certain objects of interest in the image). The first step is to separate the signal (galaxies or stars) from the background noise in the image.
We will be using the results of *note Dataset inspection and cropping::, so be sure you already have them. Gnuastro has NoiseChisel for this job. But NoiseChisel’s output is a multi-extension FITS file, therefore to better understand how to use NoiseChisel, let’s take a look at multi-extension FITS files and how you can interact with them. In the FITS format, each extension contains a separate dataset (image in this case). You can get basic information about the extensions in a FITS file with Gnuastro’s Fits program (see *note Fits::). To start with, let’s run NoiseChisel without any options, then use Gnuastro’s Fits program to inspect the number of extensions in this file.

$ astnoisechisel flat-ir/xdf-f160w.fits
$ astfits xdf-f160w_detected.fits

From the output list, we see that NoiseChisel’s output contains 5 extensions. The zero-th (counting from zero, with the name ‘NOISECHISEL-CONFIG’) is empty: it has a value of ‘0’ in the fourth column (which shows its size in pixels). Like NoiseChisel, in all of Gnuastro’s programs, the first (or zero-th) extension of the output only contains meta-data: data about/describing the datasets within (all) the output’s extensions. This is recommended by the FITS standard, see *note Fits:: for more. In the case of Gnuastro’s programs, this generic zero-th/meta-data extension (for the whole file) contains all the configuration options of the program that created the file.

Metadata regarding how the analysis was done (or a dataset was created) is very important for higher-level analysis and reproducibility. Therefore, let’s first take a closer look at the ‘NOISECHISEL-CONFIG’ extension. If you ask for a specific extension/HDU of the FITS file, Gnuastro’s Fits program will print the header keywords (metadata) of that extension. You can either specify the HDU/extension counter (starting from 0), or its name. Therefore, the two commands below are identical for this file. We are usually tempted to use the first (shorter format), but when putting your commands into a script, please use the second format which is more human-friendly and understandable for readers of your code who may not know what is in the 0-th extension (this includes yourself in a few months!):

$ astfits xdf-f160w_detected.fits -h0
$ astfits xdf-f160w_detected.fits -hNOISECHISEL-CONFIG

The first group of FITS header keywords you see (containing the ‘SIMPLE’ and ‘BITPIX’ keywords; before the first empty line) are standard keywords. They are required by the FITS standard and must be present in any FITS extension. The second group starts with the input file name (value to the ‘INPUT’ keyword). The rest of the keywords you see afterwards have the same name as NoiseChisel’s options, and the value used by NoiseChisel in this run is shown after the ‘=’ sign. Finally, the last group (starting with ‘DATE’) contains the date and version information of Gnuastro and its dependencies that were used to generate this file. Besides the option values, these are also critical for future reproducibility of the result (you may update Gnuastro or its dependencies, and they may behave differently afterwards). The “versions and date” group of keywords are present in all Gnuastro’s FITS extension outputs, for more see *note Output FITS files::.

Note that if a keyword name is longer than 8 characters, it is preceded by a ‘HIERARCH’ keyword and that all keyword names are in capital letters. These are all part of the FITS standard and originate from its history. But in short, both can be ignored!
For example, with the command below, let’s see what the default value of the ‘--detgrowquant’ option is (using the ‘-P’ option described in *note Option management and configuration files::).

$ astnoisechisel -P | grep detgrowquant

To confirm that NoiseChisel used this value when we ran it above, let’s use ‘grep’ to extract the keyword line with ‘detgrowquant’ from the metadata extension. However, as you saw above, keyword names in the header are in all caps. So we need to ask ‘grep’ to ignore case with the ‘-i’ option.

$ astfits xdf-f160w_detected.fits -h0 | grep -i detgrowquant

In the output of the above command, you see ‘HIERARCH’ at the start of the line. According to the FITS standard, ‘HIERARCH’ is placed at the start of all keywords that have a name that is more than 8 characters long. Both the all-caps and the ‘HIERARCH’ keyword can be annoying when you want to read/check the value. Therefore, the best solution is to use the ‘--keyvalue’ option of Gnuastro’s ‘astfits’ program as shown below. With it, you do not have to worry about ‘HIERARCH’ or the case of the name (FITS keyword names are not case-sensitive).

$ astfits xdf-f160w_detected.fits -h0 --keyvalue=detgrowquant -q

The metadata (that is stored in the output) can later be used to exactly reproduce/understand your result, even if you have lost/forgotten the command you used to create the file. This feature is present in all of Gnuastro’s programs, not just NoiseChisel.

The rest of the HDUs in NoiseChisel’s output contain data. So let’s open them in a DS9 window and then describe each:

$ astscript-fits-view xdf-f160w_detected.fits

A “cube” window opens along with DS9’s main window. The buttons and horizontal scroll bar in this small new window can be used to navigate between the extensions. In this mode, all DS9’s settings (for example, zoom or color-bar) will be identical between the extensions. Try zooming into one part and flipping through the extensions to see how the galaxies were detected along with the Sky and Sky standard deviation values for that region. Just have in mind that NoiseChisel’s job is _only_ detection (separating signal from noise); we will do segmentation on this result later to find the individual galaxies/peaks over the detected pixels.

The second extension of NoiseChisel’s output (numbered 1, named ‘INPUT-NO-SKY’) is the Sky-subtracted input that you provided. The third (‘DETECTIONS’) is NoiseChisel’s main output which is a binary image with only two possible values for all pixels: 0 for noise and 1 for signal. Since it only has two values, to avoid taking too much space on your computer, its numeric datatype is an unsigned 8-bit integer (or ‘uint8’)(1). The fourth and fifth (‘SKY’ and ‘SKY_STD’) extensions have the Sky and its standard deviation values for the input on a tile grid and were calculated over the undetected regions (for more on the importance of the Sky value, see *note Sky value::).

Each HDU/extension in a FITS file is an independent dataset (image or table) which you can delete from the FITS file, or copy/cut to another file. For example, with the command below, you can copy NoiseChisel’s ‘DETECTIONS’ HDU/extension to another file:

$ astfits xdf-f160w_detected.fits --copy=DETECTIONS -odetections.fits

There are similar options to conveniently cut (‘--cut’: copy, then remove from the input) or delete (‘--remove’) HDUs from a FITS file. See *note HDU information and manipulation:: for more.

---------- Footnotes ----------

(1) To learn more about numeric data types see *note Numeric data types::.
2.1.11 NoiseChisel optimization for detection
---------------------------------------------

In *note NoiseChisel and Multi-Extension FITS files::, we ran NoiseChisel and reviewed NoiseChisel’s output format. Now that you have a better feeling for multi-extension FITS files, let’s optimize NoiseChisel for this particular dataset.

One good way to see if you have missed any signal (small galaxies, or the wings of brighter galaxies) is to mask all the detected pixels and inspect the noise pixels. For this, you can use Gnuastro’s Arithmetic program (in particular its ‘where’ operator, see *note Arithmetic operators::). The command below will produce ‘mask-det.fits’. In it, all the pixels in the ‘INPUT-NO-SKY’ extension that are flagged 1 in the ‘DETECTIONS’ extension (dominated by signal, not noise) will be set to NaN.

Since the various extensions are in the same file, for each dataset we need the file and extension name. To make the command easier to read/write/understand, let’s use shell variables: ‘in’ will be used for the Sky-subtracted input image and ‘det’ will be used for the detection map. Recall that a shell variable’s value can be retrieved by adding a ‘$’ before its name; also note that the double quotations are necessary when we have white-space characters in a variable value (like this case).

$ in="xdf-f160w_detected.fits -hINPUT-NO-SKY"
$ det="xdf-f160w_detected.fits -hDETECTIONS"
$ astarithmetic $in $det nan where --output=mask-det.fits

To invert the result (only keep the detected pixels), you can flip the detection map (from 0 to 1 and vice-versa) by adding a ‘not’ after the second ‘$det’:

$ astarithmetic $in $det not nan where --output=mask-sky.fits

Look again at the ‘DETECTIONS’ extension, in particular the long worm-like structure around pixel 1650 (X) and 1470 (Y)(1). These types of long wiggly structures show that we have dug too deep into the noise, and are a signature of correlated noise. Correlated noise is created when we warp (for example, rotate) individual exposures (that are each slightly offset compared to each other) into the same pixel grid before adding them into one deeper image. During the warping, nearby pixels are mixed and the effect of this mixing on the noise (which is in every pixel) is called “correlated noise”. Correlated noise is a form of convolution and it slightly smooths the image.

In terms of the number of exposures (and thus correlated noise), the XDF dataset is by no means an ordinary dataset: it is the result of warping and adding roughly 80 separate exposures, which can create strong correlated noise/smoothing (in common surveys the number of exposures is usually 10 or less). Therefore the default parameters need to be slightly customized. See Figure 2 of Akhlaghi [2019] (https://arxiv.org/abs/1909.11230) and the discussion on ‘--detgrowquant’ there for more on how NoiseChisel “grow”s the detected objects and the patterns caused by correlated noise.

Let’s tweak NoiseChisel’s configuration a little to get a better result on this dataset. Do not forget that “_Good statistical analysis is not a purely routine matter, and generally calls for more than one pass through the computer_” (Anscombe 1973, see *note Science and its tools::). A good scientist must have a good understanding of her tools to make a meaningful analysis. So do not hesitate in playing with the default configuration and reviewing the manual when you have a new dataset (from a new instrument) in front of you.
Robust data analysis is an art, therefore a good scientist must first be a good artist. Once you have found the good configuration for that particular noise pattern (instrument) you can safely use it for all new data that have a similar noise pattern.

NoiseChisel can produce “Check images” to help you visualize and inspect how each step is done. You can see all the check images it can produce with this command.

$ astnoisechisel --help | grep check

Let’s check the overall detection process to get a better feeling of what NoiseChisel is doing with the following command. To learn the details of NoiseChisel, please see *note NoiseChisel::, Akhlaghi and Ichikawa [2015] (https://arxiv.org/abs/1505.01664) and Akhlaghi [2019] (https://arxiv.org/abs/1909.11230).

$ astnoisechisel flat-ir/xdf-f160w.fits --checkdetection

The check images/tables are also multi-extension FITS files. As you saw from the command above, when check datasets are requested, NoiseChisel will not go to the end. It will abort as soon as all the extensions of the check image are ready. Please list the extensions of the output with ‘astfits’ and then open it with ‘ds9’, as we did above. If you have read the paper, you will see why there are so many extensions in the check image.

$ astfits xdf-f160w_detcheck.fits
$ ds9 -mecube xdf-f160w_detcheck.fits -zscale -zoom to fit

In order to understand the parameters and their biases (especially as you are starting to use Gnuastro, or running it on a new dataset), it is _strongly_ encouraged to play with the different parameters and use the respective check images to see which step is affected by your changes and how; for example, see *note Detecting large extended targets::.

Let’s focus on one step: the ‘OPENED_AND_LABELED’ extension shows the initial detection step of NoiseChisel. We see the seeds of that correlated noise structure with many small detections (a relatively early stage in the processing). Such connections at the lowest surface brightness limits usually occur when the dataset is too smoothed, the threshold is too low, or the final “growth” is too much.

As you see from the 2nd (‘CONVOLVED’) extension, the first operation that NoiseChisel does on the data is to slightly smooth it. However, the natural correlated noise of this dataset is already one level of artificial smoothing, so further smoothing it with the default kernel may be the culprit. To see the effect, let’s use a sharper kernel as a first step to convolve/smooth the input.

By default NoiseChisel uses a Gaussian with full-width-half-maximum (FWHM) of 2 pixels. We can use Gnuastro’s MakeProfiles to build a kernel with FWHM of 1.5 pixels (truncated at 5 times the FWHM, like the default) using the following command. MakeProfiles is a powerful tool to build any number of mock profiles on one image or independently; to learn more of its features and capabilities, see *note MakeProfiles::.

$ astmkprof --kernel=gaussian,1.5,5 --oversample=1

Please open the output ‘kernel.fits’ and have a look (it is very small and sharp). We can now tell NoiseChisel to use this instead of the default kernel with the following command (we will keep the ‘--checkdetection’ to continue checking the detection steps).

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits \
                 --checkdetection

Open the output ‘xdf-f160w_detcheck.fits’ as a multi-extension FITS file and go to the last extension (‘DETECTIONS-FINAL’, it is the same pixels as the final NoiseChisel output without ‘--checkdetection’).
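For example, you can open it with Gnuastro’s FITS viewing script, just as we did for the previous multi-extension outputs:

$ astscript-fits-view xdf-f160w_detcheck.fits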
Look again at that position mentioned above (1650,1470); you see that the long wiggly structure is gone. This shows we are making progress :-). Looking at the new ‘OPENED_AND_LABELED’ extension, we see that the thin connections between smaller peaks have now significantly decreased. Going two extensions/steps ahead (in the first ‘HOLES-FILLED’), you can see that during the process of finding false pseudo-detections, too many holes have been filled: do you see how many of the brighter galaxies are connected? At this stage all holes are filled, irrespective of their size.

Looking two extensions ahead (in the first ‘PSEUDOS-FOR-SN’), you can see that there are not too many pseudo-detections because of all those extended filled holes. If you look closely, you can see the number of pseudo-detections in the printed outputs of NoiseChisel (around 6400). This is another side-effect of correlated noise. To address it, we should slightly increase the pseudo-detection threshold (before changing ‘--dthresh’, run with ‘-P’ to see the default value):

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits \
                 --dthresh=0.1 --checkdetection

Before visually inspecting the check image, you can already see the effect of this small change in NoiseChisel’s command-line output: notice how the number of pseudo-detections has increased to more than 7100! Open the check image now and have a look, you can see how the pseudo-detections are distributed much more evenly in the blank sky regions of the ‘PSEUDOS-FOR-SN’ extension.

*Maximize the number of pseudo-detections:* When using NoiseChisel on datasets with a new noise-pattern (for example, going to a Radio astronomy image, or a shallow ground-based image), play with ‘--dthresh’ until you get a maximal number of pseudo-detections: the total number of pseudo-detections is printed on the command-line when you run NoiseChisel, you do not even need to open a FITS viewer. In this particular case, try ‘--dthresh=0.2’ and you will see that the total printed number decreases to around 6700 (recall that with ‘--dthresh=0.1’, it was roughly 7100). So for this type of very deep HST images, we should set ‘--dthresh=0.1’.

As discussed in Section 3.1.5 of Akhlaghi and Ichikawa [2015] (https://arxiv.org/abs/1505.01664), the signal-to-noise ratio of pseudo-detections is critical to identifying/removing false detections. Getting it right is very important for an optimal detection (where you want to successfully detect the faintest and smallest objects in the image). Let’s have a look at their signal-to-noise distribution with ‘--checksn’.

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits \
                 --dthresh=0.1 --checkdetection --checksn

The output (‘xdf-f160w_detsn.fits’) contains two extensions with two-column tables of the pseudo-detections: those over the undetected regions (‘SKY_PSEUDODET_SN’) and those over the detections (‘DET_PSEUDODET_SN’). With the first command below you can see the HDUs of this file, and with the second you can see the information of the table in the first HDU (which is the default when you do not use ‘--hdu’):

$ astfits xdf-f160w_detsn.fits
$ asttable xdf-f160w_detsn.fits -i

You can see the table columns with the first command below and get a feeling of the signal-to-noise value distribution with the second command (the two Table and Statistics programs will be discussed later in the tutorial):

$ asttable xdf-f160w_detsn.fits -hSKY_PSEUDODET_SN
$ aststatistics xdf-f160w_detsn.fits -hSKY_PSEUDODET_SN -c2
... [output truncated] ...
Histogram:
 | *
 | ***
 | ******
 | *********
 | **********
 | *************
 | *****************
 | ********************
 | **************************
 | ********************************
 |******************************************************* * ** *
 |----------------------------------------------------------------------

The correlated noise is again visible in the signal-to-noise distribution of sky pseudo-detections! Do you see how skewed this distribution is? In an image with less correlated noise, this distribution would be much more symmetric. A small change in the quantile will translate into a big change in the S/N value. For example, see the difference between the 0.99, 0.95 and 0.90 quantiles with this command:

$ aststatistics xdf-f160w_detsn.fits -hSKY_PSEUDODET_SN -c2 \
                --quantile=0.99 --quantile=0.95 --quantile=0.90

We get a change of almost 2 units (which is very significant). If you run NoiseChisel with ‘-P’, you will see that the default signal-to-noise quantile ‘--snquant’ is 0.99. In effect, with this option you specify the purity level you want (the acceptable contamination by false detections). With the ‘aststatistics’ command above, you see that accepting a small number of extra false detections (impurity) in the final result brings a big change in completeness (you can detect more lower signal-to-noise true detections). So let’s loosen-up our desired purity level, remove the check-image options, and then mask the detected pixels like before to see if we have missed anything.

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits \
                 --dthresh=0.1 --snquant=0.95
$ in="xdf-f160w_detected.fits -hINPUT-NO-SKY"
$ det="xdf-f160w_detected.fits -hDETECTIONS"
$ astarithmetic $in $det nan where --output=mask-det.fits

Overall it seems good, but if you play a little with the color-bar and look closer in the noise, you will see a few very sharp, but faint, objects that have not been detected. For example, the object around pixel (456, 1662). Despite its high-valued pixels, this object was lost because erosion ignores the precise pixel values. Losing small/sharp objects like this only happens for under-sampled datasets like HST (where the pixel size is larger than the point spread function FWHM). So this will not happen on ground-based images.

To address this problem of sharp objects, we can use NoiseChisel’s ‘--noerodequant’ option. All pixels above this quantile will not be eroded, thus allowing us to preserve small/sharp objects (that cover a small area, but have a lot of signal in them). Check its default value, then run NoiseChisel like below and make the mask again.

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits \
                 --noerodequant=0.95 --dthresh=0.1 --snquant=0.95

This seems to be fine and the object above is now detected. We will stop editing the configuration of NoiseChisel here, but please feel free to keep looking into the data to see if you can improve it even more. Once you have found the proper configuration for the type of images you will be using, you do not need to change it any more. The same configuration can be used for any dataset that has been similarly produced (and has a similar noise pattern).

But entering all these options on every call to NoiseChisel is annoying and prone to bugs (mistakenly typing the wrong value for example). To simplify things, we will make a configuration file in a visible ‘config’ directory. Then we will define the hidden ‘.gnuastro’ directory (that all Gnuastro’s programs will look into for configuration files) as a symbolic link to the ‘config’ directory.
Finally, we will write the finalized values of the options into NoiseChisel’s standard configuration file within that directory. We will also put the kernel in a separate directory, so that the top directory stays clean of any files we will need later.

$ mkdir kernel config
$ ln -s config/ .gnuastro
$ mv kernel.fits kernel/noisechisel.fits
$ echo "kernel kernel/noisechisel.fits" > config/astnoisechisel.conf
$ echo "noerodequant 0.95" >> config/astnoisechisel.conf
$ echo "dthresh 0.1" >> config/astnoisechisel.conf
$ echo "snquant 0.95" >> config/astnoisechisel.conf

We are now ready to finally run NoiseChisel on the three filters and keep the output in a dedicated directory (which we will call ‘nc’ for simplicity).

$ rm *.fits
$ mkdir nc
$ for f in f105w f125w f160w; do \
    astnoisechisel flat-ir/xdf-$f.fits --output=nc/xdf-$f.fits; \
  done

---------- Footnotes ----------

(1) To find a particular coordinate easily in DS9, you can do this: Click on the “Edit” menu, and select “Region”. Then click on any random part of the image to see a circle show up in that location (this is the “region”). Double-click on the region and a “Circle” window will open. If you have celestial coordinates, keep the default “fk5” in the scroll-down menu after the “Center”. But if you have pixel/image coordinates, click on the “fk5” and select “Image”. Now you can set the “Center” coordinates of the region (‘1650’ and ‘1470’ in this case) by manually typing them in the two boxes in front of “Center”. Finally, when everything is ready, click on the “Apply” button and your region will go over your requested coordinates. You can zoom out (to see the whole image) and visually find it.

2.1.12 NoiseChisel optimization for storage
-------------------------------------------

As we showed before (in *note NoiseChisel and Multi-Extension FITS files::), NoiseChisel’s output is a multi-extension FITS file with several images the same size as the input. As the input datasets get larger, this output can become hard to manage and waste a lot of storage space. Fortunately there is a solution to this problem (which is also useful for Segment’s outputs). In this small section we will take a short detour to show this feature. Please note that the outputs generated here are not needed for the rest of the tutorial.

But first, let’s have a look at the contents/HDUs and volume of NoiseChisel’s output from *note NoiseChisel optimization for detection:: (fast answer: it is larger than 100 mega-bytes):

$ astfits nc/xdf-f160w.fits
$ ls -lh nc/xdf-f160w.fits

Two options can drastically decrease NoiseChisel’s output file size: 1) With the ‘--rawoutput’ option, NoiseChisel will not create a Sky-subtracted output. After all, it is redundant: you can always generate it by subtracting the ‘SKY’ extension from the input image (which you have in your database) using the Arithmetic program. 2) With ‘--oneelempertile’, you can tell NoiseChisel to store its Sky and Sky standard deviation results with one pixel per tile (instead of many pixels per tile).

So let’s run NoiseChisel with these options, then have another look at the HDUs and the over-all file size:

$ astnoisechisel flat-ir/xdf-f160w.fits --oneelempertile --rawoutput \
                 --output=nc-for-storage.fits
$ astfits nc-for-storage.fits
$ ls -lh nc-for-storage.fits

See how ‘nc-for-storage.fits’ has four HDUs, while ‘nc/xdf-f160w.fits’ had five HDUs? As explained above, the missing extension is ‘INPUT-NO-SKY’.
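In case you ever need the Sky-subtracted image again, that subtraction is a single Arithmetic call. For example, a rough sketch using the full-resolution ‘SKY’ extension of ‘nc/xdf-f160w.fits’ that we just built (the output name here is arbitrary) would be:

$ astarithmetic flat-ir/xdf-f160w.fits -h1 nc/xdf-f160w.fits -hSKY \
                - --output=check-no-sky.fits

You can compare the result with the ‘INPUT-NO-SKY’ extension that is already inside ‘nc/xdf-f160w.fits’.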
Also, look at the sizes of the ‘SKY’ and ‘SKY_STD’ HDUs: unlike before, they are not the same size as ‘DETECTIONS’, they only have one pixel for each tile (group of pixels in the raw input). Finally, you see that ‘nc-for-storage.fits’ is just under 8 mega-bytes (while ‘nc/xdf-f160w.fits’ was 100 mega-bytes)!

But we are not yet finished! You can be even more efficient in storing, archiving or transferring NoiseChisel’s output by compressing this file. Try the command below to see how NoiseChisel’s output has now shrunk to about 250 kilo-bytes while keeping all the necessary information of the original 100 mega-byte output.

$ gzip --best nc-for-storage.fits
$ ls -lh nc-for-storage.fits.gz

We can get this wonderful level of compression because NoiseChisel’s output is binary with only two values: 0 and 1. Compression algorithms are highly optimized in such scenarios. You can open ‘nc-for-storage.fits.gz’ directly in SAO DS9 or feed it to any of Gnuastro’s programs without having to decompress it. Higher-level programs that take NoiseChisel’s output (for example, Segment or MakeCatalog) can also deal with this compressed image where the Sky and its standard deviation are one pixel-per-tile. You just have to give the “values” image as a separate option; for more, see *note Segment:: and *note MakeCatalog::. Segment (the program we will introduce in the next section for identifying sub-structure) also has similar features to optimize its output for storage.

Since this file was only created for a fast detour demonstration, let’s keep our top directory clean and move to the next step:

$ rm nc-for-storage.fits.gz

2.1.13 Segmentation and making a catalog
----------------------------------------

The main output of NoiseChisel is the binary detection map (‘DETECTIONS’ extension, see *note NoiseChisel optimization for detection::). It only has two values: 1 or 0. This is useful when studying the noise or background properties, but hardly of any use when you actually want to study the targets/galaxies in the image, especially in such a deep field where almost everything is connected. To find the galaxies over the detections, we will use Gnuastro’s *note Segment:: program:

$ mkdir seg
$ astsegment nc/xdf-f160w.fits -oseg/xdf-f160w.fits
$ astsegment nc/xdf-f125w.fits -oseg/xdf-f125w.fits
$ astsegment nc/xdf-f105w.fits -oseg/xdf-f105w.fits

Segment’s operation is very much like NoiseChisel (in fact, prior to version 0.6, it was part of NoiseChisel). For example, the output is a multi-extension FITS file (previously discussed in *note NoiseChisel and Multi-Extension FITS files::), it has check images and it uses the undetected regions as a reference (previously discussed in *note NoiseChisel optimization for detection::). Please have a look at Segment’s multi-extension output to get a good feeling of what it has done. Do not forget to flip through the extensions in the “Cube” window.

$ astscript-fits-view seg/xdf-f160w.fits

Like NoiseChisel, the first extension is the input. The ‘CLUMPS’ extension shows the true “clumps” with values that are $\ge1$, and the diffuse regions labeled as $-1$. Please flip between the first extension and the clumps extension and zoom-in on some of the clumps to get a feeling of what they are. In the ‘OBJECTS’ extension, we see that the large detections of NoiseChisel (that may have contained many galaxies) are now broken up into separate labels. Play with the color-bar and hover your mouse over the various detections to see their different labels.
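Since the object labels are positive integers counting from 1, the largest value in the ‘OBJECTS’ extension is a quick way to count how many objects Segment found. For example, with Gnuastro’s Statistics program (discussed later in this tutorial):

$ aststatistics seg/xdf-f160w.fits -hOBJECTS --maximum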
The clumps are not affected by the hard-to-deblend and low signal-to-noise diffuse regions, so they are more robust for calculating the colors (compared to objects). From this step onward, we will continue with clumps.

Having localized the regions of interest in the dataset, we are ready to do measurements on them with *note MakeCatalog::. MakeCatalog is specialized and optimized for doing measurements over labeled regions of an image. In other words, through MakeCatalog, you can “reduce” an image to a table (a catalog of certain properties of objects in the image). Each requested measurement (over each label) will be given a column in the output table. To see the full set of available measurements, run it with ‘--help’ like below (and scroll up); note that the measurements are classified by context.

$ astmkcatalog --help

So let’s select the properties we want to measure in this tutorial. First of all, we need to know which measurement belongs to which object or clump, so we will start with the ‘--ids’ (read as: IDs(1)). We also want to measure (in this order) the Right Ascension (with ‘--ra’), Declination (‘--dec’), magnitude (‘--magnitude’), and signal-to-noise ratio (‘--sn’) of the objects and clumps. Furthermore, as mentioned above, we also want measurements on clumps, so we also need to call ‘--clumpscat’. The following command will make these measurements on Segment’s F160W output and write a catalog for the objects and another for the clumps into a FITS table. For more on the zero point, see *note Brightness flux magnitude::.

$ mkdir cat
$ astmkcatalog seg/xdf-f160w.fits --ids --ra --dec --magnitude --sn \
               --zeropoint=25.94 --clumpscat --output=cat/xdf-f160w.fits

From the printed statements on the command-line, you see that MakeCatalog read all the extensions in Segment’s output for the various measurements it needed. To calculate colors, we also need magnitude measurements on the other filters. So let’s repeat the command above on them, just changing the file names and zero point (which we got from the XDF survey web page):

$ astmkcatalog seg/xdf-f125w.fits --ids --ra --dec --magnitude --sn \
               --zeropoint=26.23 --clumpscat --output=cat/xdf-f125w.fits
$ astmkcatalog seg/xdf-f105w.fits --ids --ra --dec --magnitude --sn \
               --zeropoint=26.27 --clumpscat --output=cat/xdf-f105w.fits

However, the galaxy properties might differ between the filters (which is the whole purpose behind observing in different filters!). Also, the noise properties and depth of the datasets differ. You can see the effect of these factors in the resulting clump catalogs with Gnuastro’s Table program. We will go deep into working with tables in the next sections, but in summary: the ‘-i’ option will print information about the columns and number of rows. To see the column values, just remove the ‘-i’ option. In the output of each command below, look at the ‘Number of rows:’, and note that they are different.

$ asttable cat/xdf-f105w.fits -hCLUMPS -i
$ asttable cat/xdf-f125w.fits -hCLUMPS -i
$ asttable cat/xdf-f160w.fits -hCLUMPS -i

Matching the catalogs is possible (for example, with *note Match::). However, the measurements of each column are also done on different pixels: the clump labels can/will differ from one filter to another for one object. Please open them and focus on one object to see for yourself. This can bias the result if you match catalogs. An accurate color calculation can only be done when magnitudes are measured from the same pixels on all images, and this can be done easily with MakeCatalog.
In fact, this is one of the reasons that NoiseChisel or Segment do not generate a catalog like most other detection/segmentation software. This gives you the freedom of selecting the pixels for measurement in any way you like (from other filters, other software, manually, etc.). Fortunately in these images, the point spread function (PSF) is very similar, allowing us to use a single labeled image output for all filters(2). The F160W image is deeper, thus providing better detection/segmentation, and redder, thus observing smaller/older stars and representing more of the mass in the galaxies. We will thus use the F160W filter as a reference and use its segment labels to identify which pixels to use for which objects/clumps. But we will do the measurements on the sky-subtracted F105W and F125W images (using MakeCatalog’s ‘--valuesfile’ option) as shown below. Notice that the only difference between these calls and the call to generate the raw F160W catalog (excluding the zero point and the output name) is the ‘--valuesfile’.

$ astmkcatalog seg/xdf-f160w.fits --ids --ra --dec --magnitude --sn \
               --valuesfile=nc/xdf-f125w.fits --zeropoint=26.23 \
               --clumpscat --output=cat/xdf-f125w-on-f160w-lab.fits

$ astmkcatalog seg/xdf-f160w.fits --ids --ra --dec --magnitude --sn \
               --valuesfile=nc/xdf-f105w.fits --zeropoint=26.27 \
               --clumpscat --output=cat/xdf-f105w-on-f160w-lab.fits

After running the commands above, look into what MakeCatalog printed on the command-line. You can see that (as requested) the object and clump pixel labels in both were taken from the respective extensions in ‘seg/xdf-f160w.fits’. However, the pixel values and the pixel Sky standard deviation were taken from the respective NoiseChisel outputs (‘nc/xdf-f125w.fits’ and ‘nc/xdf-f105w.fits’). Since we used the same labeled image for all filters, the number of rows in both catalogs are now identical. Let’s have a look:

$ asttable cat/xdf-f105w-on-f160w-lab.fits -hCLUMPS -i
$ asttable cat/xdf-f125w-on-f160w-lab.fits -hCLUMPS -i
$ asttable cat/xdf-f160w.fits -hCLUMPS -i

Finally, MakeCatalog also does basic calculations on the full dataset (independent of each labeled region but related to the whole dataset), for example, pixel area or per-pixel surface brightness limit. They are stored as keywords in the FITS headers (or lines starting with ‘#’ in plain text). This (and other ways to measure the limits of your dataset) is discussed in the next section: *note Measuring the dataset limits::.

---------- Footnotes ----------

(1) This option is plural because we need two ID columns for identifying “clumps” in the clumps catalog/table: the first column will be the ID of the host “object”, and the second one will be the ID of the clump within that object. In the “objects” catalog/table, only a single column will be returned for this option.

(2) When the PSFs between two images differ largely, you would have to PSF-match the images before using the same pixels for measurements.

2.1.14 Measuring the dataset limits
-----------------------------------

In *note Segmentation and making a catalog::, we created a catalog of the different objects within the image. Before measuring colors, or doing any other kind of analysis on the catalogs (and detected objects), it is very important to understand the limitations of the dataset. Without understanding the limitations of your dataset, you cannot make any physical interpretation of your results. The theory behind the calculations discussed here is thoroughly introduced in *note Quantifying measurement limits::.
For example, with the command below, let’s sort all the detected clumps in the image by magnitude (with ‘--sort=magnitude’) and print the magnitude and signal-to-noise ratio (S/N; with ‘-cmagnitude,sn’):

$ asttable cat/xdf-f160w.fits -hclumps -cmagnitude,sn \
           --sort=magnitude --noblank=magnitude

As you see, we have clumps with a total magnitude of almost 32! This is _extremely faint_! Are these things trustable? Let’s have a look at all of those with a magnitude between 31 and 32 with the command below. We are first using Table to only keep the relevant columns and rows, then using Gnuastro’s DS9 region file creation script (‘astscript-ds9-region’) to generate DS9 region files and open DS9:

$ asttable cat/xdf-f160w.fits -hclumps -cra,dec \
           --range=magnitude,31:32 \
           | astscript-ds9-region -c1,2 --radius=0.5 \
                   --command="ds9 -mecube seg/xdf-f160w.fits -zscale"

Zoom-out a little and you will see some green circles (DS9 region files) in some regions of the image. There actually does seem to be a true peak under the selected regions, but as you see, they are very small, diffuse and noisy. How reliable are the measured magnitudes? Using the S/N column from the first command above, you can see that such objects only have a signal-to-noise ratio of about 2.6 (which is indeed too low for most analysis purposes).

$ asttable cat/xdf-f160w.fits -hclumps -csn \
           --range=magnitude,31:32 | aststatistics

This brings us to the first method of quantifying your dataset’s _magnitude limit_, which is also sometimes called the _detection limit_ (see *note Magnitude limit of image::). To estimate the $5\sigma$ detection limit of your dataset, you simply report the median magnitude of the objects that have a signal to noise of (approximately) five. This is very easy to calculate with the command below:

$ asttable cat/xdf-f160w.fits -hclumps --range=sn,4.8:5.2 -cmagnitude \
           | aststatistics --median
29.9949

Let’s have a look at these objects, to get a feeling of what these clumps look like:

$ asttable cat/xdf-f160w.fits -hclumps --range=sn,4.8:5.2 \
           -cra,dec,magnitude \
           | astscript-ds9-region -c1,2 --namecol=3 \
                   --width=2 --radius=0.5 \
                   --command="ds9 -mecube seg/xdf-f160w.fits -zscale"

The number you see on top of each region is the clump’s magnitude. Please go over the objects and have a close look at them! It is very important to have a feeling of what your dataset looks like, and how to interpret the numbers to associate an image with them. Generally, they look very small, with different levels of diffuse-ness! Those that are sharper make more visual sense (to be $5\sigma$ detections), but the more diffuse ones extend over a larger area. Furthermore, the noise is measured on individual pixel measurements. However, during the reduction many exposures are co-added and stacked, mixing the pixels like a small convolution (creating “correlated noise”). Therefore you clearly see two main issues with the detection limit as defined above: it depends on the morphology, and it does not take into account the correlated noise.

A more realistic way to estimate the significance of the detection is to take its footprint, randomly place it in thousands of undetected regions of the image and use that distribution as a reference. This is technically known as an upper-limit measurement. For a full discussion, see *note Upper limit magnitude of each detection::. Since it is defined for each object separately, the upper-limit measurements should be requested as extra columns in MakeCatalog’s output.
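In essence (see *note Quantifying measurement limits:: for the precise definitions), if $\sigma_u$ is the standard deviation of the brightnesses measured over those randomly placed footprints, and $N$ is the multiple of sigma you ask for (the ‘--upnsigma’ option in the command below), then the upper-limit magnitude is simply the magnitude corresponding to a brightness of $N\sigma_u$: roughly $-2.5\log_{10}(N\sigma_u)$ plus the zero point.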
For example, with the command below, let’s generate a new catalog of the F160W filter, but with two extra columns compared to the one in ‘cat/’: the upper-limit magnitude and the upper-limit multiple of sigma.

$ astmkcatalog seg/xdf-f160w.fits --ids --ra --dec --magnitude --sn \
               --zeropoint=25.94 --clumpscat --upnsigma=3 \
               --upperlimitmag --upperlimitsigma \
               --output=xdf-f160w.fits

Let’s compare the upper-limit magnitude with the measured magnitude of each clump:

$ asttable xdf-f160w.fits -hclumps -cmagnitude,upperlimit_mag

As you see, in almost all of the cases, the measured magnitude is sufficiently brighter than the upper-limit magnitude. Let’s subtract the measured magnitude from the upper-limit magnitude to better see this difference in a third column:

$ asttable xdf-f160w.fits -hclumps -cmagnitude,upperlimit_mag \
           -c'arith upperlimit_mag magnitude -'

For the rows with a positive difference (third column), we can say that the clump is sufficiently brighter than the noisy background to be usable. Let’s use Table’s *note Column arithmetic:: to find only those that have a negative difference:

$ asttable xdf-f160w.fits -hclumps -cra,dec --noblankend=3 \
           -c'arith upperlimit_mag magnitude - set-d d d 0 gt nan where'

From more than 3500 clumps, this command only gave $\sim150$ rows (this number may slightly change on different runs due to the random nature of the upper-limit sampling(1))! Let’s have a look at them:

$ asttable xdf-f160w.fits -hclumps -cra,dec --noblankend=3 \
           -c'arith upperlimit_mag magnitude - set-d d d 0 gt nan where' \
           | astscript-ds9-region -c1,2 --namecol=3 --width=2 \
                   --radius=0.5 \
                   --command="ds9 -mecube seg/xdf-f160w.fits -zscale"

You see that they are all extremely faint and diffuse/small peaks. Therefore, if an object’s magnitude is fainter than its upper-limit magnitude, you should not use the magnitude: it is not accurate! You should use the upper-limit magnitude instead (with an arrow in your plots to mark which ones are upper-limits).

But the main point (in relation to the magnitude limit) with the upper-limit is the ‘UPPERLIMIT_SIGMA’ column. You can think of this as a _realistic_ S/N for extremely faint/diffuse/small objects. The raw S/N column is simply calculated on a pixel-by-pixel basis; however, the upper-limit sigma is produced by actually taking the label’s footprint, randomly placing it thousands of times over un-detected parts of the image and measuring the brightness of the sky. The clump’s brightness is then divided by the standard deviation of the resulting distribution to give you exactly how significant it is (accounting for inter-pixel issues like correlated noise, which are strong in this dataset). You can actually compare the two values with the command below:

$ asttable xdf-f160w.fits -hclumps -csn,upperlimit_sigma

As you see, the second column (upper-limit sigma) is almost always less than the S/N. This clearly shows the effect of correlated noise! If you now use this column as the reference for deriving the magnitude limit, you will see that it will shift by almost 0.5 magnitudes brighter and is now more reasonable:

$ asttable xdf-f160w.fits -hclumps --range=upperlimit_sigma,4.8:5.2 \
           -cmagnitude | aststatistics --median
29.6257

We see that the $5\sigma$ detection limit is $\sim29.6$! This is extremely deep! For example, in the Legacy Survey(2), the $5\sigma$ detection limit for _point sources_ is approximately 24.5 (5 magnitudes, or 100 times, shallower than this image).
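The factor of 100 follows directly from the definition of magnitudes: a difference of $\Delta m$ magnitudes corresponds to a flux ratio of $10^{\Delta m/2.5}$, so $\Delta m=5$ gives exactly $10^{2}=100$.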
As mentioned above, an important caveat in this simple calculation is that we should only be looking at point-like objects, not simply everything. This is because the shape or radial slope of the profile has an important effect on this measurement: at the same total magnitude, a sharper object will have a higher S/N. To be more precise, we should first perform star-galaxy separation, then do this only for the objects that are classified as stars. A crude, first-order method is to use the ‘--axisratio’ option so MakeCatalog also measures the axis ratio, then call Table with ‘--range=upperlimit_sigma,4.8:5.2’ and ‘--range=axis_ratio,0.95:1’ (in one command). Please do this for yourself as an exercise to see the difference with the result above. Before continuing, let’s remove this temporarily produced catalog:

$ rm xdf-f160w.fits

Another measure of the dataset’s limit is the completeness limit (*note Completeness limit of each detection::). This is necessary when you are looking at populations of objects over the image. You want to know until what magnitude you can be sure that you have detected an object (if it was present). As described in *note Completeness limit of each detection::, the best way to do this is with mock images. But a crude, first order result can be obtained from the actual image: by simply plotting the histogram of the magnitudes:

$ aststatistics cat/xdf-f160w.fits -hclumps -cmagnitude
...
Histogram:
 | *
 | ** ****
 | ***********
 | *************
 | ****************
 | *******************
 | **********************
 | **************************
 | *********************************
 | *********************************************
 |* * ** ** **********************************************************
 |----------------------------------------------------------------------

This plot (the histogram of magnitudes, where fainter magnitudes are towards the right) is technically called the dataset’s _number count_ plot. You see that the number of objects increases as the magnitudes get fainter (to the right). However, beyond a certain magnitude, you see it becomes flat, and soon afterwards, the numbers suddenly drop. Once you have your catalog, you can easily find this point with the two commands below. First we generate a histogram with fewer bins (to have more numbers in each bin). We then use AWK to find the magnitude bin where the number of points decreases compared to the previous bin; but we only do this for bins that have more than 50 items (to avoid scatter in the bright end). Finally, in Statistics, we have manually set the magnitude range and number of bins so each bin is roughly 0.5 magnitudes thick (with ‘--greaterequal=20’, ‘--lessthan=32’ and ‘--numbins=24’).

$ aststatistics cat/xdf-f160w.fits -hclumps -cmagnitude --histogram \
                --greaterequal=20 --lessthan=32 --numbins=24 \
                --output=f160w-hist.txt
$ asttable f160w-hist.txt \
           | awk '$2>50 && $2<prev {print $1; exit} {prev=$2}'

---------- Footnotes ----------

(3) You can change these values with the ‘--sfmagarea’ and ‘--sfmagnsigma’ options of MakeCatalog.

2.1.15 Working with catalogs (estimating colors)
------------------------------------------------

In the previous step we generated catalogs of objects and clumps over our dataset (see *note Segmentation and making a catalog::). The catalogs are available in the two extensions of the single FITS file(1). Let’s see the extensions and their basic properties with the Fits program:

$ astfits cat/xdf-f160w.fits    # Extension information

Let’s inspect the table in each extension with Gnuastro’s Table program (see *note Table::).
We should have used ‘-hOBJECTS’ and ‘-hCLUMPS’ instead of ‘-h1’ and ‘-h2’ respectively. The numbers are just used here to convey that both names and numbers are possible; in the next commands, we will just use names.

$ asttable cat/xdf-f160w.fits -h1 --info   # Objects catalog info.
$ asttable cat/xdf-f160w.fits -h1          # Objects catalog columns.
$ asttable cat/xdf-f160w.fits -h2 -i       # Clumps catalog info.
$ asttable cat/xdf-f160w.fits -h2          # Clumps catalog columns.

As you see above, when given a specific table (file name and extension), Table will print the full contents of all the columns. To see the basic metadata about each column (for example, name, units and comments), simply append a ‘--info’ (or ‘-i’) to the command. To print the contents of specific column(s), just give the column number(s) (counting from ‘1’) or the column name(s) (if they have one) to the ‘--column’ (or ‘-c’) option. For example, if you just want the magnitude and signal-to-noise ratio of the clumps (in the clumps catalog), you can get it with any of the following commands:

$ asttable cat/xdf-f160w.fits -hCLUMPS --column=5,6
$ asttable cat/xdf-f160w.fits -hCLUMPS -c5,SN
$ asttable cat/xdf-f160w.fits -hCLUMPS -c5 -c6
$ asttable cat/xdf-f160w.fits -hCLUMPS -cMAGNITUDE -cSN

Similar to HDUs, when the columns have names, always use the name: it is so common to mis-write numbers or forget the order later! Using column names instead of numbers has many advantages:

  1. You do not have to worry about the order of columns in the table.

  2. It acts as a documentation in the script.

  3. Column meta-data (including a name) are not just limited to FITS tables and can also be used in plain text tables, see *note Gnuastro text table format::.

Table also has tools to limit the displayed rows. For example, with the first command below, only rows with a magnitude in the range of 28 to 29 will be shown. With the second command, you can further limit the displayed rows to rows with an S/N larger than 10 (a range from 10 to infinity). You can further sort the output rows, only show the top (or bottom) N rows, etc.; see *note Table:: for more.

$ asttable cat/xdf-f160w.fits -hCLUMPS --range=MAGNITUDE,28:29
$ asttable cat/xdf-f160w.fits -hCLUMPS \
           --range=MAGNITUDE,28:29 --range=SN,10:inf

Now that you are comfortable in viewing table columns and rows, let’s look into merging columns of multiple tables into one table (which is necessary for measuring the color of the clumps). Since ‘cat/xdf-f160w.fits’ and ‘cat/xdf-f105w-on-f160w-lab.fits’ have exactly the same number of rows and the rows correspond to the same clump, let’s merge them to have one table with magnitudes in both filters.

We can merge columns with the ‘--catcolumnfile’ option like below. You give this option a file name (which is assumed to be a table that has the same number of rows as the main input), and all the table’s columns will be concatenated/appended to the main table. Now, try it out with the commands below. We will first look at the metadata of the first table (only the ‘CLUMPS’ extension). With the second command, we will concatenate the two tables and write them into ‘two-in-one.fits’, and finally, we will check the new catalog’s metadata.

$ asttable cat/xdf-f160w.fits -i -hCLUMPS
$ asttable cat/xdf-f160w.fits -hCLUMPS --output=two-in-one.fits \
           --catcolumnfile=cat/xdf-f125w-on-f160w-lab.fits \
           --catcolumnhdu=CLUMPS
$ asttable two-in-one.fits -i

By comparing the two metadata, we see that both tables have the same number of rows.
But what might have attracted your attention more is that ‘two-in-one.fits’ has double the number of columns (as expected; after all, you merged both tables into one file, and did not ask for any specific column). In fact you can concatenate any number of other tables in one command, for example:

$ asttable cat/xdf-f160w.fits -hCLUMPS --output=three-in-one.fits \
           --catcolumnfile=cat/xdf-f125w-on-f160w-lab.fits \
           --catcolumnfile=cat/xdf-f105w-on-f160w-lab.fits \
           --catcolumnhdu=CLUMPS --catcolumnhdu=CLUMPS
$ asttable three-in-one.fits -i

As you see, to avoid confusion in column names, Table has intentionally appended a ‘-1’ to the column names of the first concatenated table if the column names are already present in the original table (for example, we have the original ‘RA’ column, and another one called ‘RA-1’). Similarly a ‘-2’ has been added for the columns of the second concatenated table.

However, this example clearly shows a problem with this full concatenation: some columns are identical (for example, ‘HOST_OBJ_ID’ and ‘HOST_OBJ_ID-1’), or not needed (for example, ‘RA-1’ and ‘DEC-1’ which are not necessary here). In such cases, you can use ‘--catcolumns’ to only concatenate certain columns, not the whole table. For example, this command:

$ asttable cat/xdf-f160w.fits -hCLUMPS --output=two-in-one-2.fits \
           --catcolumnfile=cat/xdf-f125w-on-f160w-lab.fits \
           --catcolumnhdu=CLUMPS --catcolumns=MAGNITUDE
$ asttable two-in-one-2.fits -i

You see that we have now only appended the ‘MAGNITUDE’ column of ‘cat/xdf-f125w-on-f160w-lab.fits’. This is what we needed to be able to later subtract the magnitudes. Let’s go ahead and add the F105W magnitudes also with the command below. Note how we need to call ‘--catcolumnhdu’ once for every table that should be appended, but we only call ‘--catcolumns’ once (assuming all the tables that should be appended have this column).

$ asttable cat/xdf-f160w.fits -hCLUMPS --output=three-in-one-2.fits \
           --catcolumnfile=cat/xdf-f125w-on-f160w-lab.fits \
           --catcolumnfile=cat/xdf-f105w-on-f160w-lab.fits \
           --catcolumnhdu=CLUMPS --catcolumnhdu=CLUMPS \
           --catcolumns=MAGNITUDE
$ asttable three-in-one-2.fits -i

But we are not finished yet! There is a very big problem: it is not immediately clear which one of the ‘MAGNITUDE’, ‘MAGNITUDE-1’ or ‘MAGNITUDE-2’ columns belongs to which filter! Right now, you know this because you just ran this command. But in one hour, you will start doubting yourself and will be forced to go through your command history, trying to figure out if you added F105W first, or F125W. You should never torture your future-self (or your colleagues) like this! So, let’s rename these confusing columns in the matched catalog.

Fortunately, with the ‘--colmetadata’ option, you can correct the column metadata of the final table (just before it is written). It takes four values: 1) the original column name or number, 2) the new column name, 3) the column unit and 4) the column comments. Since the comments are usually human-friendly sentences and contain space characters, you should put them in double quotations like below. For example, by adding three calls of this option to the previous command, we write the filter name in the magnitude column name and description.

$ asttable cat/xdf-f160w.fits -hCLUMPS --output=three-in-one-3.fits \
           --catcolumnfile=cat/xdf-f125w-on-f160w-lab.fits \
           --catcolumnfile=cat/xdf-f105w-on-f160w-lab.fits \
           --catcolumnhdu=CLUMPS --catcolumnhdu=CLUMPS \
           --catcolumns=MAGNITUDE \
           --colmetadata=MAGNITUDE,MAG-F160W,log,"Magnitude in F160W." \
           --colmetadata=MAGNITUDE-1,MAG-F125W,log,"Magnitude in F125W." \
           --colmetadata=MAGNITUDE-2,MAG-F105W,log,"Magnitude in F105W."
$ asttable three-in-one-3.fits -i

We now have all three magnitudes in one table and can start doing arithmetic on them (to estimate colors, which are just a subtraction of magnitudes). To use column arithmetic, simply call the column selection option (‘--column’ or ‘-c’), put the value in single quotations and start the value with ‘arith’ (followed by a space) like the examples below. Column arithmetic uses the same “reverse polish notation” as the Arithmetic program (see *note Reverse polish notation::), with almost all the same operators (see *note Arithmetic operators::), and some column-specific operators (that are not available for images). In column arithmetic, you can identify columns by number (prefixed with a ‘$’) or name; for more, see *note Column arithmetic::.

So let’s estimate one color from ‘three-in-one-3.fits’ using column arithmetic. All the commands below will produce the same output; try them each and focus on the differences. Note that column arithmetic can be mixed with other ways to choose output columns (the ‘-c’ option).

$ asttable three-in-one-3.fits -ocolor-cat.fits \
           -c1,2,3,4,'arith $7 $5 -'

$ asttable three-in-one-3.fits -ocolor-cat.fits \
           -c1,2,RA,DEC,'arith MAG-F125W MAG-F160W -'

$ asttable three-in-one-3.fits -ocolor-cat.fits -c1,2 \
           -cRA,DEC --column='arith MAG-F125W MAG-F160W -'

This example again highlights the important point on using column names: if you do not already know the commands, you have no way of making sense of the first command: what is in columns 5 and 7? Why not subtract columns 3 and 4 from each other? Do you see how cryptic the first one is? Then look at the last one: even if you have no idea how this table was created, you immediately understand the desired operation. *When you have column names, please use them.* If your table does not have column names, give them names with the ‘--colmetadata’ (described above) as you are creating them.

But how about the metadata for the column you just created with column arithmetic? Have a look at the column metadata of the table produced above:

$ asttable color-cat.fits -i

The name of the column produced by column arithmetic is ‘ARITH_1’! This is natural: Arithmetic has no idea what the new column represents! You could have multiplied two columns, or done much more complex transformations with many columns. _Metadata cannot be set automatically; your (the human) input is necessary._ To add metadata, you can use ‘--colmetadata’ like before:

$ asttable three-in-one-3.fits -ocolor-cat.fits -c1,2,RA,DEC \
           --column='arith MAG-F105W MAG-F160W -' \
           --colmetadata=ARITH_1,F105W-F160W,log,"Magnitude difference"
$ asttable color-cat.fits -i

We are now ready to make our final table. We want it to have the magnitudes in all three filters, as well as the three possible colors. Recall that by convention in astronomy, colors are defined by subtracting the redder filter’s magnitude from the bluer filter’s magnitude; in this way a larger color value corresponds to a redder object. So from the three magnitudes, we can produce three colors (as shown below). Also, because this is the final table we are creating here and want to use later, we will store it in ‘cat/’, give it a clear name, and use the ‘--range’ option to only keep the rows with a signal-to-noise ratio (‘SN’ column, from the F160W filter) above 5.
$ asttable three-in-one-3.fits --range=SN,5,inf -c1,2,RA,DEC,SN \
           -cMAG-F160W,MAG-F125W,MAG-F105W \
           -c'arith MAG-F125W MAG-F160W -' \
           -c'arith MAG-F105W MAG-F125W -' \
           -c'arith MAG-F105W MAG-F160W -' \
           --colmetadata=SN,SN-F160W,ratio,"F160W signal to noise ratio" \
           --colmetadata=ARITH_1,F125W-F160W,log,"Color F125W-F160W." \
           --colmetadata=ARITH_2,F105W-F125W,log,"Color F105W-F125W." \
           --colmetadata=ARITH_3,F105W-F160W,log,"Color F105W-F160W." \
           --output=cat/mags-with-color.fits
$ asttable cat/mags-with-color.fits -i

The table now has all the columns we need and it has the proper metadata to let us safely use it later (without frustrating over column orders!) or pass it to colleagues.

Let’s finish this section of the tutorial with a useful tip on modifying column metadata. Above, updating/changing column metadata was done with ‘--colmetadata’ in the same command that produced the newly created Table file. But in many situations, the table is already made and you just want to update the metadata of one column. In such cases, using ‘--colmetadata’ is over-kill (wasting CPU/RAM and time if the table is large) because it will load the full table data and metadata into memory, just to change the metadata and write it back into a file.

In scenarios when the table’s data does not need to be changed and you just want to set or update the metadata, it is much more efficient to use basic FITS keyword editing. For example, in the FITS standard, column names are stored in the ‘TTYPE’ header keywords, so let’s have a look:

$ asttable two-in-one.fits -i
$ astfits two-in-one.fits -h1 | grep TTYPE

Changing/updating the column names is as easy as updating the values of these keywords. You do not need to touch the actual data! With the command below, we will just update the ‘MAGNITUDE’ and ‘MAGNITUDE-1’ columns (which are respectively stored in the ‘TTYPE5’ and ‘TTYPE11’ keywords) by modifying the keyword values and checking the effect by listing the column metadata again:

$ astfits two-in-one.fits -h1 \
          --update=TTYPE5,MAG-F160W \
          --update=TTYPE11,MAG-F125W
$ asttable two-in-one.fits -i

You can see that the column names have indeed been changed without touching any of the data. You can do the same for the column units or comments by modifying the keywords starting with ‘TUNIT’ or ‘TCOMM’. Generally, Gnuastro’s Table is a very useful program in data analysis and what you have seen so far is just the tip of the iceberg. But to avoid making the tutorial even longer, we will stop reviewing the features here; for more, please see *note Table::.

Before continuing, let’s just delete all the temporary FITS tables we placed in the top project directory:

$ rm *.fits

---------- Footnotes ----------

(1) MakeCatalog can also output plain text tables. However, in the plain text format you can only have one table per file. Therefore, if you also request measurements on clumps, two plain text tables will be created (suffixed with ‘_o.txt’ and ‘_c.txt’).

2.1.16 Column statistics (color-magnitude diagram)
--------------------------------------------------

In *note Working with catalogs estimating colors:: we created a single catalog containing the magnitudes of our desired clumps in all three filters, and their colors. To start with, let’s inspect the distribution of the three colors with the Statistics program.
$ aststatistics cat/mags-with-color.fits -cF105W-F125W $ aststatistics cat/mags-with-color.fits -cF105W-F160W $ aststatistics cat/mags-with-color.fits -cF125W-F160W This tiny and cute ASCII histogram (and the general information printed above it) gives you a crude (but very useful and fast) feeling on the distribution. You can later use Gnuastro’s Statistics program with the ‘--histogram’ option to build a much more fine-grained histogram as a table to feed into your favorite plotting program for a much more accurate/appealing plot (for example, with PGFPlots in LaTeX). If you just want a specific measure, for example, the mean, median and standard deviation, you can ask for them specifically, like below: $ aststatistics cat/mags-with-color.fits -cF105W-F160W \ --mean --median --std The basic statistics we measured above were just on one column. In many scenarios this is fine, but things get much more exciting if you look at the correlation of two columns with each other. For example, let’s create the color-magnitude diagram for our measured targets. In many papers, the color-magnitude diagram is usually plotted as a scatter plot. However, scatter plots have a major limitation when there are a lot of points and they cluster together in one region of the plot: the possible correlation in that dense region is lost (because the points fall over each other). In such cases, it is much better to use a 2D histogram. In a 2D histogram, the full range in both columns is divided into discrete 2D bins (or pixels!) and we count how many objects fall in that 2D bin. Since a 2D histogram is a pixelated space, we can simply save it as a FITS image and view it in a FITS viewer. Let’s do this in the command below. As is common with color-magnitude plots, we will put the redder magnitude on the horizontal axis and the color on the vertical axis. We will set both dimensions to have 100 bins (with ‘--numbins’ for the horizontal and ‘--numbins2’ for the vertical). Also, to avoid strong outliers in any of the dimensions, we will manually set the range of each dimension with the ‘--greaterequal’, ‘--greaterequal2’, ‘--lessthan’ and ‘--lessthan2’ options. $ aststatistics cat/mags-with-color.fits -cMAG-F160W,F105W-F160W \ --histogram2d=image --manualbinrange \ --numbins=100 --greaterequal=22 --lessthan=30 \ --numbins2=100 --greaterequal2=-1 --lessthan2=3 \ --manualbinrange --output=cmd.fits You can now open this FITS file as a normal FITS image, for example, with the command below. Try hovering/zooming over the pixels: not only will you see the number of objects in catalog that fall in each bin/pixel, but you also see the ‘F160W’ magnitude and color of that pixel also (in the same place you usually see RA and Dec when hovering over an astronomical image). $ astscript-fits-view cmd.fits --ds9scale=minmax Having a 2D histogram as a FITS image with WCS has many great advantages. For example, just like FITS images of the night sky, you can “match” many 2D histograms that were created independently. You can add two histograms with each other, or you can use advanced features of FITS viewers to find structure in the correlation of your columns. With the first command below, you can activate the grid feature of DS9 to actually see the coordinate grid, as well as values on each line. With the second command, DS9 will even read the labels of the axes and use them to generate an almost publication-ready plot. 
$ astscript-fits-view cmd.fits --ds9scale=minmax --ds9extra="-grid yes"
$ astscript-fits-view cmd.fits --ds9scale=minmax \
           --ds9extra="-grid yes -grid type publication"

If you are happy with the grid and coloring and the rest, you can also use ds9 to save this as a JPEG image to directly use in your documents/slides with these extra DS9 options (DS9 will write the image to ‘cmd-2d.jpeg’ and quit immediately afterwards):

$ astscript-fits-view cmd.fits --ds9scale=minmax \
           --ds9extra="-grid yes -grid type publication" \
           --ds9extra="-saveimage cmd-2d.jpeg -quit"

This is good for a fast progress update. But for your paper or more official report, you want to show something with higher quality. For that, you can use the PGFPlots package in LaTeX to add axes in the same font as your text, sharp grids and many other elegant/powerful features (like over-plotting interesting points and lines). But to load the 2D histogram into PGFPlots first you need to convert the FITS image into a more standard format, for example, PDF. We will use Gnuastro’s *note ConvertType:: for this, and use the ‘sls-inverse’ color map (which will map the pixels with a value of zero to white):

$ astconvertt cmd.fits --colormap=sls-inverse --borderwidth=0 -ocmd.pdf

Open the resulting ‘cmd.pdf’ and have a look. Below you can see a minimally working example of how to add axis numbers, labels and a grid to the PDF generated above. First, let’s create a new ‘report-cmd’ directory to keep the LaTeX outputs, then put the minimal report’s source in a file called ‘report.tex’. Notice the ‘xmin’, ‘xmax’, ‘ymin’, ‘ymax’ values and how they are the same as the range specified above.

$ mkdir report-cmd
$ mv cmd.pdf report-cmd/
$ cat report-cmd/report.tex
\documentclass{article}
\usepackage{pgfplots}
\dimendef\prevdepth=0
\begin{document}

You can write all you want here...

\begin{tikzpicture}
  \begin{axis}[
      enlargelimits=false,
      grid,
      axis on top,
      width=\linewidth,
      height=\linewidth,
      xlabel={Magnitude (F160W)},
      ylabel={Color (F105W-F160W)}]

    \addplot graphics[xmin=22, xmax=30, ymin=-1, ymax=3] {cmd.pdf};
  \end{axis}
\end{tikzpicture}
\end{document}

Run these commands to build your PDF (assuming you have LaTeX and PGFPlots):

$ cd report-cmd
$ pdflatex report.tex

Open the newly created ‘report.pdf’ and enjoy the exquisite quality. The improved quality, blending in with the text, vector-graphics resolution and other features make this plot pleasing to the eye, and let your readers focus on the main point of your scientific argument. PGFPlots can also build the PDF of the plot separately from the rest of the paper/report, see *note 2D histogram as a table for plotting:: for the necessary changes in the preamble.

We will not go much deeper into the Statistics program here, but there is so much more you can do with it. After finishing the tutorial, see *note Statistics::.

2.1.17 Aperture photometry
--------------------------

The colors we calculated in *note Working with catalogs estimating colors:: used a different segmentation map for each object. This might not satisfy some science cases that need the flux within a fixed area/aperture. Fortunately Gnuastro’s modular programs make it very easy to do this type of measurement (photometry). To do this, we can ignore the labeled images of NoiseChisel or Segment and just build our own labeled image! That labeled image can then be given to MakeCatalog. To generate the apertures catalog we will use Gnuastro’s MakeProfiles (see *note MakeProfiles::).
But first we need a list of positions (aperture photometry needs a-priori knowledge of your target positions). So we will first read the clump positions from the F160W catalog, then use AWK to set the other parameters of each profile to be a fixed circle of radius 5 pixels (recall that we want all apertures to have an identical size/area in this scenario). $ rm *.fits *.txt $ asttable cat/xdf-f160w.fits -hCLUMPS -cRA,DEC \ | awk '!/^#/{print NR, $1, $2, 5, 5, 0, 0, 1, NR, 1}' \ > apertures.txt $ cat apertures.txt We can now feed this catalog into MakeProfiles using the command below to build the apertures over the image. The most important option for this particular job is ‘--mforflatpix’, it tells MakeProfiles that the values in the magnitude column should be used for each pixel of a flat profile. Without it, MakeProfiles would build the profiles such that the _sum_ of the pixels of each profile would have a _magnitude_ (in log-scale) of the value given in that column (what you would expect when simulating a galaxy for example). See *note Invoking astmkprof:: for details on the options. $ astmkprof apertures.txt --background=flat-ir/xdf-f160w.fits \ --clearcanvas --replace --type=int16 --mforflatpix \ --mode=wcs --output=apertures.fits Open ‘apertures.fits’ with a FITS image viewer (like SAO DS9) and look around at the circles placed over the targets. Also open the input image and Segment’s clumps image and compare them with the positions of these circles. Where the apertures overlap, you will notice that one label has replaced the other (because of the ‘--replace’ option). In the future, MakeCatalog will be able to work with overlapping labels, but currently it does not. If you are interested, please join us in completing Gnuastro with added improvements like this (see task 14750 (1)). We can now feed the ‘apertures.fits’ labeled image into MakeCatalog instead of Segment’s output as shown below. In comparison with the previous MakeCatalog call, you will notice that there is no more ‘--clumpscat’ option, since there is no more separate “clump” image now, each aperture is treated as a separate “object”. $ astmkcatalog apertures.fits -h1 --zeropoint=26.27 \ --valuesfile=nc/xdf-f105w.fits \ --ids --ra --dec --magnitude --sn \ --output=cat/xdf-f105w-aper.fits This catalog has the same number of rows as the catalog produced from clumps in *note Working with catalogs estimating colors::. Therefore similar to how we found colors, you can compare the aperture and clump magnitudes for example. You can also change the filter name and zero point magnitudes and run this command again to have the fixed aperture magnitude in the F160W filter and measure colors on apertures. ---------- Footnotes ---------- (1) 2.1.18 Matching catalogs ------------------------ In the example above, we had the luxury to generate the catalogs ourselves, and where thus able to generate them in a way that the rows match. But this is not generally the case. In many situations, you need to use catalogs from many different telescopes, or catalogs with high-level calculations that you cannot simply regenerate with the same pixels without spending a lot of time or using heavy computation. In such cases, when each catalog has the coordinates of its own objects, you can use the coordinates to match the rows with Gnuastro’s Match program (see *note Match::). As the name suggests, Gnuastro’s Match program will match rows based on distance (or aperture in 2D) in one, two, or three columns. 
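Since matching is done in the same units as the coordinate columns (here degrees, because we are matching on RA and Dec), it helps to keep angular unit conversions in mind. As a quick sanity check (this little one-liner is not part of the analysis, it is only an illustration), half an arcsecond in degrees is:

$ awk 'BEGIN{print 0.5/3600}'
0.000138889

In the commands below, the fraction (‘0.5/3600’) is given directly to Match, so you do not need to compute the decimal value yourself.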
For this tutorial, let’s try matching the two catalogs that were not created from the same labeled images; recall how each has a different number of rows:

$ asttable cat/xdf-f105w.fits -hCLUMPS -i
$ asttable cat/xdf-f160w.fits -hCLUMPS -i

You give Match two catalogs (from the two different filters we derived above) as arguments, and the HDUs containing them (if they are FITS files) with the ‘--hdu’ and ‘--hdu2’ options. The ‘--ccol1’ and ‘--ccol2’ options specify the coordinate-columns which should be matched with which in the two catalogs. With ‘--aperture’ you specify the acceptable error (radius in 2D), in the same units as the columns.

$ astmatch cat/xdf-f160w.fits cat/xdf-f105w.fits \
           --hdu=CLUMPS --hdu2=CLUMPS \
           --ccol1=RA,DEC --ccol2=RA,DEC \
           --aperture=0.5/3600 \
           --output=matched.fits
$ astfits matched.fits

From the second command, you see that the output has two extensions and that both have the same number of rows. The rows in each extension are the matched rows of the respective input table: those in the first HDU come from the first input and those in the second HDU come from the second. However, their order may be different from the input tables because the rows match: the first row in the first HDU matches with the first row in the second HDU, etc. You can also see which objects did not match with the ‘--notmatched’ option, like below. Note how each extension now has a different number of rows.

$ astmatch cat/xdf-f160w.fits cat/xdf-f105w.fits \
           --hdu=CLUMPS --hdu2=CLUMPS \
           --ccol1=RA,DEC --ccol2=RA,DEC \
           --aperture=0.5/3600 \
           --output=not-matched.fits --notmatched
$ astfits not-matched.fits

The ‘--outcols’ option of Match is a very convenient feature: you can use it to specify which columns from the two catalogs you want in the output (merge two input catalogs into one). If the first character is an ‘a’, the respective matched column (number or name, similar to Table above) in the first catalog will be written in the output table. When the first character is a ‘b’, the respective column from the second catalog will be written in the output. Also, if the first character is followed by ‘_all’, then all the columns from the respective catalog will be put in the output.

$ astmatch cat/xdf-f160w.fits cat/xdf-f105w.fits \
           --hdu=CLUMPS --hdu2=CLUMPS \
           --ccol1=RA,DEC --ccol2=RA,DEC \
           --aperture=0.35/3600 \
           --outcols=a_all,bMAGNITUDE,bSN \
           --output=matched.fits
$ astfits matched.fits

2.1.19 Reddest clumps, cutouts and parallelization
--------------------------------------------------

As a final step, let’s go back to the original clumps-based color measurement we generated in *note Working with catalogs estimating colors::. We will find the objects with the strongest color, make a cutout of each to inspect them visually and, finally, see how they are located on the image. With the command below, we will select the reddest objects (those with a color larger than 1.5):

$ asttable cat/mags-with-color.fits --range=F105W-F160W,1.5,inf

You can see how many there are by piping it to ‘wc -l’:

$ asttable cat/mags-with-color.fits --range=F105W-F160W,1.5,inf | wc -l

Let’s crop the F160W image around each of these objects, but we first need a unique identifier for them. We will define this identifier using the object and clump labels (with an underscore between them) and feed the output of the command above to AWK to generate a catalog. Note that since we are making a plain text table, we will define the necessary (for the string-type first column) metadata manually (see *note Gnuastro text table format::).
$ echo "# Column 1: ID [name, str10] Object ID" > cat/reddest.txt $ asttable cat/mags-with-color.fits --range=F105W-F160W,1.5,inf \ | awk '{printf("%d_%-10d %f %f\n", $1, $2, $3, $4)}' \ >> cat/reddest.txt Let’s see how these objects are positioned over the dataset. DS9 has the “Region”s concept for this purpose. And you build such regions easily from a table using Gnuastro’s ‘astscript-ds9-region’ installed script, using the command below: $ astscript-ds9-region cat/reddest.txt -c2,3 --mode=wcs \ --command="ds9 flat-ir/xdf-f160w.fits -zscale" We can now feed ‘cat/reddest.txt’ into Gnuastro’s Crop program to get separate postage stamps for each object. To keep things clean, we will make a directory called ‘crop-red’ and ask Crop to save the crops in this directory. We will also add a ‘-f160w.fits’ suffix to the crops (to remind us which filter they came from). The width of the crops will be 15 arc-seconds (or 15/3600 degrees, which is the units of the WCS). $ mkdir crop-red $ astcrop flat-ir/xdf-f160w.fits --mode=wcs --namecol=ID \ --catalog=cat/reddest.txt --width=15/3600,15/3600 \ --suffix=-f160w.fits --output=crop-red Like the MakeProfiles command in *note Aperture photometry::, if you look at the order of the crops, you will notice that the crops are not made in order! This is because each crop is independent of the rest, therefore crops are done in parallel, and parallel operations are asynchronous. So the order can differ in each run, but the final output is the same! In the command above, you can change ‘f160w’ to ‘f105w’ to make the crops in both filters. You can see all the cropped FITS files in the ‘crop-red’ directory with this command: $ astscript-fits-view crop-red/*.fits To view the crops more easily (not having to open ds9 for each image), you can convert the FITS crops into the JPEG format with a shell loop like below. $ cd crop-red $ for f in *.fits; do \ astconvertt $f --fluxlow=-0.001 --fluxhigh=0.005 --invert -ojpg; \ done $ cd .. $ ls crop-red/ You can now use your general graphic user interface image viewer to flip through the images more easily, or import them into your papers/reports. The ‘for’ loop above to convert the images will do the job in series: each file is converted only after the previous one is complete. But like the crops, each JPEG image is independent, so let’s parallelize it. In other words, we want to run more than one instance of the command at any moment. To do that, we will use Make (https://en.wikipedia.org/wiki/Make_(software)). Make is a very wonderful pipeline management system, and the most common and powerful implementation is GNU Make (https://www.gnu.org/software/make), which has a complete manual just like this one. We cannot go into the details of Make here, for a hands-on video tutorial, see this video introduction (https://peertube.stream/w/iJitjS3r232Z8UPMxKo6jq). To do the process above in Make, please copy the contents below into a plain-text file called ‘Makefile’. Just replace the ‘__[TAB]__’ part at the start of the line with a single ‘’ button on your keyboard. jpgs=$(subst .fits,.jpg,$(wildcard *.fits)) all: $(jpgs) $(jpgs): %.jpg: %.fits __[TAB]__astconvertt $< --fluxlow=-0.001 --fluxhigh=0.005 \ __[TAB]__ --invert -o$ Now that the ‘Makefile’ is ready, you can run Make on 12 threads using the commands below. Feel free to replace the 12 with any number of threads you have on your system (you can find out by running the ‘nproc’ command on GNU/Linux operating systems): $ make -j12 Did you notice how much faster this one was? 
When possible, it is always very helpful to do your analysis in parallel. You can build very complex workflows with Make, for example, see Akhlaghi et al. (2021) (https://arxiv.org/abs/2006.03018) so it is worth spending some time to master. 2.1.20 FITS images in a publication ----------------------------------- In the previous section (*note Reddest clumps cutouts and parallelization::), we visually inspected the positions of the reddest objects using DS9. That is very good for an interactive inspection of the objects: you can zoom-in and out, you can do measurements, etc. Once the experimentation phase of your project is complete, you want to show these objects over the whole image in a report, paper or slides. One solution is to use DS9 itself! For example, run the ‘astscript-fits-view’ command of the previous section to open DS9 with the regions over-plotted. Click on the “File” menu and select “Save Image”. In the side-menu that opens, you have multiple formats to select from. Usually for publications, we want to show the regions and text (in the colorbar) in vector graphics, so it is best to export to EPS. Once you have made the EPS, you can then convert it to PDF with the ‘epspdf’ command. Another solution is to use Gnuastro’s ConvertType program. The main difference is that DS9 is a Graphic User Interface (GUI) program, so it takes relatively long (about a second) to load, and it requires many dependencies. This will slow-down automatic conversion of many files, and will make your code hard to move to another operating system. DS9 does have a command-line interface that you can use to automate the creation of each file, however, it has a very peculiar command-line interface and formats (like the “region” files). However, in ConvertType, there is no graphic interface, so it has very few dependencies, it is fast, and finally, it takes normal tables (in plain-text or FITS) as input. So in this concluding step of the analysis, let’s build a nice publication-ready plot, showing the positions of the reddest objects in the image for our paper. In *note Reddest clumps cutouts and parallelization::, we already used ConvertType to make JPEG postage stamps. Here, we will use it to make a PDF image of the whole deep region. To start, let’s simply run ConvertType on the F160W image: $ astconvertt flat-ir/xdf-f160w.fits -oxdf.pdf Open the output in a PDF viewer. You see that it is almost fully black! Let’s see why this happens! First, with the two commands below, let’s calculate the maximum value, and the standard deviation of the sky in this image (using NoiseChisel’s output, which we found at the end of *note NoiseChisel optimization for detection::). Note that NoiseChisel writes the median sky standard deviation _before_ interpolation in the ‘MEDSTD’ keyword of the ‘SKY_STD’ HDU. This is more robust than the median of the Sky standard deviation image (which has gone through interpolation). $ max=$(aststatistics nc/xdf-f160w.fits -hINPUT-NO-SKY --maximum) $ skystd=$(astfits nc/xdf-f160w.fits -hSKY_STD --keyvalue=MEDSTD -q) $ echo $max $skystd 58.8292 0.000410282 $ echo $max $skystd | awk '{print $1/$2}' 143387 In the last command above, we divided the maximum by the sky standard deviation. You see that the maximum value is more than $140000$ times larger than the noise level! On the other hand common monitors or printers, usually have a maximum dynamic range of 8-bits, only allowing for $2^8=256$ layers. 
This is therefore the maximum number of “layers” you can have in a common display formats like JPEG, PDF or PNG! Dividing the result above by 256, we get a layer spacing of $ echo $max $skystd | awk '{print $1/$2/256}' 560.106 In other words, the first layer (which is black) will contain all the pixel values below $\sim560$! So all pixels with a signal-to-noise ratio lower than $\sim560$ will have a black color since they fall in the first layer of an 8-bit PDF (or JPEG) image. This happens because by default we are assuming a linear mapping from floating point to 8-bit integers. To fix this, we should move to a different mapping. A good, physically motivated, mapping is Surface Brightness (which is in log-scale, see *note Brightness flux magnitude::). Fortunately this is very easy to do with Gnuastro’s Arithmetic program, as shown in the commands below (using the known zero point(1), and after calculating the pixel area in units of arcsec$^2$): $ zeropoint=25.94 $ pixarcsec2=$(astfits nc/xdf-f160w.fits --pixelareaarcsec2) $ astarithmetic nc/xdf-f160w.fits $zeropoint $pixarcsec2 counts-to-sb \ --output=xdf-f160w-sb.fits With the two commands below, first, let’s look at the dynamic range of the image now (dividing the maximum by the minimum), and then let’s open the image and have a look at it: $ aststatistics xdf-f160w-sb.fits --minimum --maximum $ astscript-fits-view xdf-f160w-sb.fits The good news is that the dynamic range has now decreased to about 2! In other words, we can distribute the 256 layers of an 8-bit display over a much smaller range of values, and therefore better visualize the data. However, there are two important points to consider from the output of the first command and a visual inspection of the second. • The largest pixel value (faintest surface brightness level) in the image is $\sim43$! This is far too low to be realistic, and is just due to noise. As discussed in *note Measuring the dataset limits::, the $3\sigma$ surface brightness limit of this image, over 100 arcsec$^2$ is roughly 32.66 mag/arcsec$^2$. • You see many NaN pixels in between the galaxies! These are due to the fact that the magnitude is defined on a logarithmic scale and the logarithm of a negative number is not defined. In other words, we should replace all NaN pixels, and pixels with a surface brightness value fainter than the image surface brightness limit to this limit. With the first command below, we will first extract the surface brightness limit from the catalog headers that we calculated before, and then call Arithmetic to use this limit. $ sblimit=$(astfits cat/xdf-f160w.fits --keyvalue=SBLMAG -q) $ astarithmetic nc/xdf-f160w.fits $zeropoint $pixarcsec2 \ counts-to-sb set-sb \ sb sb $sblimit gt sb isblank or $sblimit where \ --output=xdf-f160w-sb.fits Let’s convert this image into a PDF with the command below: $ astconvertt xdf-f160w-sb.fits --output=xdf-f160w-sb.pdf It is much better now and we can visualize many features of the FITS file (from the central structures of the galaxies and stars, to a little into the noise and their low surface brightness features. However, the image generally looks a little too gray! This is because of that bright star in the bottom half of the image! Stars are very sharp! So let’s manually tell ConvertType to set any pixel with a value less than (brighter than) 20 to black (and not use the minimum). 
We do this with the ‘--fluxlow’ option:

$ astconvertt xdf-f160w-sb.fits --output=xdf-f160w-sb.pdf --fluxlow=20

We are still missing some of the diffuse flux in this PDF. This is because of those negative pixels that were set to NaN. To better show these structures, we should warp the image to larger pixels. So let’s warp it to a pixel grid where the new pixels are $4\times4$ larger than the original pixels. But be careful that warping should be done on the original image, not on the surface brightness image: we should re-calculate the surface brightness image after the warping is done. This is because $\log(a+b)\ne\log(a)+\log(b)$. Recall that surface brightness calculation involves a logarithm, and warping involves addition of pixel values.

$ astwarp nc/xdf-f160w.fits --scale=1/4 --centeroncorner \
          --output=xdf-f160w-warped.fits
$ pixarcsec2=$(astfits xdf-f160w-warped.fits --pixelareaarcsec2)
$ astarithmetic xdf-f160w-warped.fits $zeropoint $pixarcsec2 \
                counts-to-sb set-sb \
                sb sb $sblimit gt sb isblank or $sblimit where \
                --output=xdf-f160w-sb.fits
$ astconvertt xdf-f160w-sb.fits --output=xdf-f160w-sb.pdf --fluxlow=20

Above, we needed to re-calculate the pixel area of the warped image, but we did not need to re-calculate the surface brightness limit! The reason is that the surface brightness limit is independent of the pixel area (in its derivation, the pixel area has been accounted for). As a side-effect of the warping, the number of pixels in the image also dramatically decreased, therefore the volume of the output PDF (in bytes) is also smaller, making your paper/report easier to upload/download or send by email. This visual resolution is still more than enough for including on top of a column in your paper!

*I do not have the zero point of my image:* The absolute value of the zero point is irrelevant for the final PDF. We used it here because it was available and makes the numbers physically understandable. If you do not have the zero point, just set it to zero (which is also the default zero point used by MakeCatalog when it estimates the surface brightness limit). For the value of ‘--fluxlow’ above, you can simply subtract $\sim10$ from the surface brightness limit.

To summarize, and to keep the image for the next section in a separate directory, here are the necessary commands:

$ zeropoint=25.94
$ mkdir report-image
$ sblimit=$(astfits cat/xdf-f160w.fits --keyvalue=SBLMAG -q)
$ astwarp nc/xdf-f160w.fits --scale=1/4 --centeroncorner \
          --output=report-image/warped.fits
$ pixarcsec2=$(astfits report-image/warped.fits --pixelareaarcsec2)
$ astarithmetic report-image/warped.fits $zeropoint $pixarcsec2 \
                counts-to-sb set-sb \
                sb sb $sblimit gt sb isblank or $sblimit where \
                --output=report-image/sb.fits
$ astconvertt report-image/sb.fits --output=report-image/sb.pdf \
              --fluxlow=20

Finally, let’s remove all the temporary files we built in the top-level tutorial directory:

$ rm *.fits *.pdf

---------- Footnotes ----------

(1)

2.1.21 Marking objects for publication
--------------------------------------

In *note FITS images in a publication:: we created a ready-to-print visualization of the FITS image used in this tutorial. However, you rarely want to show a naked image like that! You usually want to highlight some objects (that are the target of your science) over the image and show different marks for the various types of objects you are studying.
In this tutorial, we will do just that: select a sub-set of the full catalog of clumps, and show them with different marks shapes and colors, while also adding some text under each mark. To add coordinates on the edges of the figure in your paper, see *note Annotations for figure in paper::. To start with, let’s put a red plus sign over the sub-sample of reddest clumps similar to *note Reddest clumps cutouts and parallelization::. First, we will need to make the table of marks. We will choose those with a color stronger than 1.5 magnitudes and a signal-to-noise ratio (in F160W) larger than 5. We also only need the RA, Dec, color and magnitude (in F160W) columns. $ asttable cat/mags-with-color.fits --range=F105W-F160W,1.5:inf \ --range=sn-f160w,5:inf -cRA,DEC,MAG-F160w,F105W-F160W \ -oreport-image/reddest-cat.fits To keep the rest of the code easier to read, let’s move to the ‘report-image’ directory: $ cd report-image Gnuastro’s ConvertType program also has features to add marks over the finally produced PDF. Below, we will start with the same ‘astconvertt’ command of the previous section. The positions of the marks should be given as a table to the ‘--marks’ option. Two other options are also mandatory: ‘--markcoords’ identifies the columns that contain the coordinates of each mark and ‘--mode’ specifies if the coordinates are in image or WCS coordinates. $ astconvertt sb.fits --output=reddest.pdf --fluxlow=20 \ --marks=reddest-cat.fits --mode=wcs \ --markcoords=RA,DEC Open the output ‘reddest.pdf’ and see the result. You will see relatively thick red circles placed over the given coordinates. In your PDF browser, zoom-in to one of the regions, you will see that while the pixels of the background image become larger, the lines of these regions do not degrade! This is the concept/power of Vector Graphics: ideal for publication! For more on raster (pixelated) and vector (infinite-resolution) graphics, see *note Raster and Vector graphics::. We had planned to put a plus-sign on each object. However, because we did not explicitly ask for a certain shape, ConvertType put a circle. Each mark can have its own separate shape. Shapes can be given by a name or a code. The full list of available shapes names and codes is given in the description of ‘--markshape’ option of *note Drawing with vector graphics::. To use a different shape, we need to add a new column to the base table, containing the identifier of the desired shape for each mark. For example, the code for the plus sign is ‘2’. With the commands below, we will add a new column with this fixed value. With the first AWK command we will make a single-column file, where all the rows have the same value. We pipe our base table into AWK, so it has the same number of rows. With the second command, we concatenate (or append) the new column with Table, and give this new column the name ‘SHAPE’ (to easily refer to it later and not have to count). With the third command, we clean-up behind our selves (deleting the extra ‘params.txt’ file). Finally, we use the ‘--markshape’ option to tell ConvertType which column to use for the shape identifier. $ asttable reddest-cat.fits | awk '{print 2}' > params.txt $ asttable reddest-cat.fits --catcolumnfile=params.txt \ --colmetadata=5,SHAPE,id,"Shape of mark" \ --output=reddest-marks.fits $ rm params.txt $ astconvertt sb.fits --output=reddest.pdf --fluxlow=20 \ --marks=reddest-marks.fits --mode=wcs \ --markcoords=RA,DEC --markshape=SHAPE Open the PDF and have a look! 
You do see red signs over the coordinates, but the thick plus-signs only become visible after you zoom in multiple times! To make them larger, you can give another column to specify the size of each mark. Let’s set the full width of the plus sign to extend 3 arcseconds. The commands are similar to those above; try to spot the differences (in particular, how we use ‘--sizeinarcsec’).

$ asttable reddest-cat.fits | awk '{print 2, 3}' > params.txt
$ asttable reddest-cat.fits --catcolumnfile=params.txt \
           --colmetadata=5,SHAPE,id,"Shape of mark" \
           --colmetadata=6,SIZE,arcsec,"Size in arcseconds" \
           --output=reddest-marks.fits
$ rm params.txt
$ astconvertt sb.fits --output=reddest.pdf --fluxlow=20 \
              --marks=reddest-marks.fits --mode=wcs \
              --markcoords=RA,DEC --markshape=SHAPE \
              --marksize=SIZE --sizeinarcsec

The power of this methodology is that each mark can be completely different! For example, let’s show the objects with a color less than 2 magnitudes with a circle, and those with a stronger color with a plus (the code for a circle is ‘1’ and that of a plus is ‘2’). You only need to replace the first command above with the one below. Afterwards, run the rest of the commands in the last code-block.

$ asttable reddest-cat.fits -cF105W-F160W \
           | awk '{if($1<2) shape=1; else shape=2; print shape, 3}' \
           > params.txt

Have a look at the resulting ‘reddest.pdf’. You see that the circles are much larger than the plus signs. This is because the “size” of a cross is defined to be its full width, but for a circle, the value in the size column is the radius. The way each shape interprets the value of the size column is fully described under ‘--markshape’ of *note Drawing with vector graphics::. To make them more comparable, let’s set the circle sizes to be half of the cross sizes.

$ asttable reddest-cat.fits -cF105W-F160W \
           | awk '{if($1<2) {shape=1; size=1.5} \
                   else     {shape=2; size=3} \
                   print shape, size}' \
           > params.txt

Let’s make things a little more complex (and show more information in the visualization) by using color. Gnuastro recognizes the full extended web colors (https://en.wikipedia.org/wiki/Web_colors#Extended_colors), for their full list (containing names and codes) see *note Vector graphics colors::. But like everything else, an even easier way to view and select the color for your figure is on the command-line! If your terminal supports 24-bit true-color, you can see all the colors by running this command (supported on modern GNU/Linux distributions):

$ astconvertt --listcolors

We will give a “sienna” color to the objects that are fainter than 29th magnitude and a “deeppink” color to the brighter ones (while keeping the same shape definitions as before). Since there are many colors, using their codes can make the table hard to read by a human! So let’s use the color names instead of the color codes in the example below (this is also useful for other columns that only accept strings, like the font name). The only intricacy is in the making of ‘params.txt’. Recall that string columns need column metadata (*note Gnuastro text table format::). In this particular case, since the string column is the last one, we can safely use AWK’s ‘print’ command. But if you have multiple string columns, to be safe it is better to use AWK’s ‘printf’ and explicitly specify the number of characters in the string columns.
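For example, if the parameters table had two string columns (say the COLOR below plus a hypothetical FONT column, which is not used in this tutorial), a sketch with ‘printf’ and fixed-width strings could look like the following (only for illustration, you do not need to run it here):

$ asttable reddest-cat.fits -cF105W-F160W \
           | awk 'BEGIN{print "# Column 3: COLOR [name, str8]"; \
                        print "# Column 4: FONT  [name, str15]"} \
                  {if($1<2) {shape=1; size=1.5} \
                   else     {shape=2; size=3}; \
                   printf("%d %g %-8s %-15s\n", \
                          shape, size, "sienna", "Courier")}' \
           > params.txt

In our case there is only one string column and it is the last one, so the plain ‘print’ used in the command below is sufficient.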
$ asttable reddest-cat.fits -cF105W-F160W,MAG-F160W \
           | awk 'BEGIN{print "# Column 3: COLOR [name, str8]"}\
                  {if($1<2) {shape=1; size=1.5} \
                   else     {shape=2; size=3} \
                   if($2>29) {color="sienna"} \
                   else      {color="deeppink"} \
                   print shape, size, color}' \
           > params.txt
$ asttable reddest-cat.fits --catcolumnfile=params.txt \
           --colmetadata=5,SHAPE,id,"Shape of mark" \
           --colmetadata=6,SIZE,arcsec,"Size in arcseconds" \
           --output=reddest-marks.fits
$ rm params.txt
$ astconvertt sb.fits --output=reddest.pdf --fluxlow=20 \
              --marks=reddest-marks.fits --mode=wcs \
              --markcoords=RA,DEC --markshape=SHAPE \
              --marksize=SIZE --sizeinarcsec --markcolor=COLOR

As one final example, let’s write the magnitude of each object under it. Since the magnitude is already in the ‘reddest-marks.fits’ that we produced above, it is very easy to add it (just add the ‘--marktext’ option to ConvertType):

$ astconvertt sb.fits --output=reddest.pdf --fluxlow=20 \
              --marks=reddest-marks.fits --mode=wcs \
              --markcoords=RA,DEC --markshape=SHAPE \
              --marksize=SIZE --sizeinarcsec \
              --markcolor=COLOR --marktext=MAG-F160W

Open the final PDF (‘reddest.pdf’) and you will see the magnitudes written under each mark in the same color. In the case of magnitudes (where the magnitude error is usually much larger than 0.01 magnitudes), four decimal digits are not too meaningful. By default, for printing floating point columns, we use the compiler’s default precision (which is about 4 digits for 32-bit floating point numbers). But you can over-write this (to only show two digits after the decimal point) with the ‘--marktextprecision=2’ option. You can customize the written text by specifying a different line-width (for the text, different from the main mark), or even specifying a different font for each mark! You can see the full list of available fonts for the text under a mark with the first command below; with the second, you can actually see them in a custom PDF (showing only the fonts).

$ astconvertt --listfonts
$ astconvertt --showfonts

As you see, there are many ways you can customize each mark! The above examples were just the tip of the iceberg! But this section has already become long, so we will stop here (see the box at the end of this section for yet another useful example). Like above, each feature of a mark can be controlled with a column in the table of mark information. Please see *note Drawing with vector graphics:: for the full list of columns/features that you can use.

*Drawing ellipses:* With the commands below, you can measure the elliptical properties of the objects and visualize them in a ready-to-publish PDF (we will only show the ellipses of the largest clumps):

$ astmkcatalog ../seg/xdf-f160w.fits --ra --dec --semimajor \
               --axisratio --positionangle --clumpscat \
               --output=ellipseinfo.fits
$ asttable ellipseinfo.fits -hCLUMPS | awk '{print 4}' > params.txt
$ asttable ellipseinfo.fits -hCLUMPS --catcolumnfile=params.txt \
           --range=SEMI_MAJOR,10,inf -oellipse-marks.fits \
           --colmetadata=6,SHAPE,id,"Shape of mark"
$ astconvertt sb.fits --output=ellipse.pdf --fluxlow=20 \
              --marks=ellipse-marks.fits --mode=wcs \
              --markcoords=RA,DEC --markshape=SHAPE \
              --marksize=SEMI_MAJOR,AXIS_RATIO --sizeinpix \
              --markrotate=POSITION_ANGLE

To conclude this section, let us highlight an important factor to consider in vector graphics. In ConvertType, things like line width or font size are defined in units of _points_. In vector graphics standards, 72 points correspond to one inch.
Therefore, one way you can change these factors for all the objects is to assign a larger or smaller print size to the image. The print size is just a meta-data entry, and will not affect the file’s volume in bytes! You can do this with the ‘--widthincm’ option. Try adding this option and giving it very different values like ‘5’ or ‘30’.

2.1.22 Writing scripts to automate the steps
--------------------------------------------

In the previous sub-sections, we went through a series of steps like downloading the necessary datasets (in *note Setup and data download::), detecting the objects in the image, and finally selecting a particular subset of them to inspect visually (in *note Reddest clumps cutouts and parallelization::). To benefit most effectively from this subsection, please go through the previous sub-sections, and if you have not actually done them, we recommend doing/running them before continuing here.

Each sub-section/step of the sub-sections above involved several commands on the command-line. Therefore, if you want to reproduce the previous results (for example, to only change one part, and see its effect), you’ll have to go through all the sections above and read through them again. If you have run the commands recently, you may also have them in the history of your shell (command-line environment). You can see many of your previous commands on the shell (even if you have closed the terminal) with the ‘history’ command, like this:

$ history

Try it in your terminal to see for yourself. By default in GNU Bash, it shows the last 500 commands. You can also save this “history” of previous commands to a file using shell redirection (to have it after your next 500 commands), with this command:

$ history > my-previous-commands.txt

This is a good way to temporarily keep track of every single command you ran. But in the middle of all the useful commands, you will have many extra commands, like tests that you did before/after the good output of a step (that you decided to continue working on), or an unrelated job you had to do in the middle of this project. Because of these impurities, after a few days (when you have forgotten the context: tests you did not end up using, or unrelated jobs), reading this full history will be very frustrating.

Keeping track of the final commands that were used in each step of an analysis is a common problem for anyone who is doing something serious with the computer. But simply keeping the most important commands in a text file is not enough: the small steps in the middle (like making a directory to keep the outputs of one step) are also important. In other words, the only way you can be sure that you are in control of your processing (and actually understand how you produced your final result) is to run the commands automatically.

Fortunately, typing commands interactively with your fingers is not the only way to operate the shell. The shell can also take its orders/commands from a plain-text file, which is called a _script_. When given a script, the shell will read it line-by-line as if you had actually typed it manually.

Let’s continue with an example: try typing the commands below in your shell. With these commands we are making a text file (‘a.txt’) containing a simple $3\times3$ matrix, converting it to a FITS image and computing its basic statistics. After the first three commands, open ‘a.txt’ with a text editor to actually see the values we wrote in it, and after the fourth, open the FITS file to see the matrix as an image.
‘a.txt’ is created through the shell’s redirection feature: ‘‘>’’ overwrites the existing contents of a file, and ‘‘>>’’ appends the new contents after the old contents. $ echo "1 1 1" > a.txt $ echo "1 2 1" >> a.txt $ echo "1 1 1" >> a.txt $ astconvertt a.txt --output=a.fits $ aststatistics a.fits To automate these series of commands, you should put them in a text file. But that text file must have two special features: 1) It should tell the shell what program should interpret the script. 2) The operating system should know that the file can be directly executed. For the first, Unix-like operating systems define the _shebang_ concept (also known as _sha-bang_ or _hashbang_). In the shebang convention, the first two characters of a file should be ‘‘#!’’. When confronted with these characters, the script will be interpreted with the program that follows them. In this case, we want to write a shell script and the most common shell program is GNU Bash which is installed in ‘/bin/bash’. So the first line of your script should be ‘‘#!/bin/bash’’(1). It may happen (rarely) that GNU Bash is in another location on your system. In other cases, you may prefer to use a non-standard version of Bash installed in another location (that has higher priority in your ‘PATH’, see *note Installation directory::). In such cases, you can use the ‘‘#!/usr/bin/env bash’’ shebang instead. Through the ‘env’ program, this shebang will look in your ‘PATH’ and use the first ‘bash’ it finds to run your script. But for simplicity in the rest of the tutorial, we will continue with the ‘‘#!/bin/bash’’ shebang. Using your favorite text editor, make a new empty file, let’s call it ‘my-first-script.sh’. Write the GNU Bash shebang (above) as its first line. After the shebang, copy the series of commands we ran above. Just note that the ‘‘$’’ sign at the start of every line above is the prompt of the interactive shell (you never actually typed it, remember?). Therefore, commands in a shell script should not start with a ‘‘$’’. Once you add the commands, close the text editor and run the ‘cat’ command to confirm its contents. It should look like the example below. Recall that you should only type the line that starts with a ‘‘$’’, the lines without a ‘‘$’’, are printed automatically on the command-line (they are the contents of your script). $ cat my-first-script.sh #!/bin/bash echo "1 1 1" > a.txt echo "1 2 1" >> a.txt echo "1 1 1" >> a.txt astconvertt a.txt --output=a.fits aststatistics a.fits The script contents are now ready, but to run it, you should activate the script file’s _executable flag_. In Unix-like operating systems, every file has three types of flags: _read_ (or ‘r’), _write_ (or ‘w’) and _execute_ (or ‘x’). To toggle a file’s flags, you should use the ‘chmod’ (for “change mode”) command. To activate a flag, you put a ‘‘+’’ before the flag character (for example, ‘+x’). To deactivate it, you put a ‘‘-’’ (for example, ‘-x’). In this case, you want to activate the script’s executable flag, so you should run $ chmod +x my-first-script.sh Your script is now ready to run/execute the series of commands. To run it, you should call it while specifying its location in the file system. Since you are currently in the same directory as the script, it is easiest to use relative addressing like below (where ‘‘./’’ means the current directory). But before running your script, first delete the two ‘a.txt’ and ‘a.fits’ files that were created when you interactively ran the commands. 
$ rm a.txt a.fits
$ ls
$ ./my-first-script.sh
$ ls

The script immediately prints the statistics while doing all the previous steps in the background. With the last ‘ls’, you see that it automatically re-built the ‘a.txt’ and ‘a.fits’ files; open them and have a look at their contents.

An extremely useful feature of shell scripts is that the shell will ignore anything after a ‘#’ character. You can thus add descriptions/comments to the commands and make them much more useful for the future. For example, after adding comments, your script might look like this:

$ cat my-first-script.sh
#!/bin/bash

# This script is my first attempt at learning to write shell scripts.
# As a simple series of commands, I am just building a small FITS
# image, and calculating its basic statistics.

# Write the matrix into a file.
echo "1 1 1" > a.txt
echo "1 2 1" >> a.txt
echo "1 1 1" >> a.txt

# Convert the matrix to a FITS image.
astconvertt a.txt --output=a.fits

# Calculate the statistics of the FITS image.
aststatistics a.fits

Is this not much easier to read now? Comments help to provide human-friendly context to the raw commands. At the time you make a script, comments may seem like an extra effort and slow you down. But in one year, you will forget almost everything about your script and you will appreciate the effort so much! Think of the comments as an email to your future self and always put a well-written description of the context/purpose (most importantly, things that are not directly clear by reading the commands) in your scripts.

The example above was a very basic and mostly redundant series of commands, just to show the basic concepts behind scripts. You can put any (arbitrarily long and complex) series of commands in a script by following the two rules: 1) add a shebang, and 2) enable the executable flag. In fact, as you continue your own research projects, you will find that any time you are dealing with more than two or three commands, keeping them in a script (and modifying that script, and running it) is much easier, and more future-proof, than typing the commands directly on the command-line and relying on things like ‘history’.

Before going over some general tips that will come in handy when you are writing your scripts, let’s have a look at a more realistic example: a script that will do the steps of *note Setup and data download:: and *note Dataset inspection and cropping::. In particular note how often we are using variables to avoid repeating fixed strings of characters (usually file/directory names). This greatly helps in scaling up your project, and avoiding hard-to-find bugs that are caused by typos in those fixed strings.

$ cat gnuastro-tutorial-1.sh
#!/bin/bash

# Download the input datasets
# ---------------------------
#
# The default file names have this format (where `FILTER' differs for
# each filter):
#   hlsp_xdf_hst_wfc3ir-60mas_hudf_FILTER_v1_sci.fits
# To make the script easier to read, a prefix and suffix variable are
# used to sandwich the filter name into one short line.
dldir=download
xdfsuffix=_v1_sci.fits
xdfprefix=hlsp_xdf_hst_wfc3ir-60mas_hudf_
xdfurl=http://archive.stsci.edu/pub/hlsp/xdf

# The file name and full URLs of the input data.
f105w_in=$xdfprefix"f105w"$xdfsuffix
f160w_in=$xdfprefix"f160w"$xdfsuffix
f105w_url=$xdfurl/$f105w_in
f160w_url=$xdfurl/$f160w_in

# Go into the download directory and download the images there,
# then come back up to the top running directory.
mkdir $dldir
cd $dldir
wget $f105w_url
wget $f160w_url
cd ..
# Only work on the deep region
# ----------------------------
#
# To help in readability, each vertex of the deep/flat field is stored
# as a separate variable. They are then merged into one variable to
# define the polygon.
flatdir=flat-ir
vertice1="53.187414,-27.779152"
vertice2="53.159507,-27.759633"
vertice3="53.134517,-27.787144"
vertice4="53.161906,-27.807208"
f105w_flat=$flatdir/xdf-f105w.fits
f160w_flat=$flatdir/xdf-f160w.fits
deep_polygon="$vertice1:$vertice2:$vertice3:$vertice4"
mkdir $flatdir
astcrop --mode=wcs -h0 --output=$f105w_flat \
        --polygon=$deep_polygon $dldir/$f105w_in
astcrop --mode=wcs -h0 --output=$f160w_flat \
        --polygon=$deep_polygon $dldir/$f160w_in

The first thing you may notice is that even if you already have the downloaded input images, this script will always try to re-download them. Also, if you re-run the script, you will notice that ‘mkdir’ prints an error message that the download directory already exists. Therefore, the script above is not too useful and some modifications are necessary to make it more generally useful. Here are some general tips that are often very useful when writing scripts:

*Stop script if a command crashes*
By default, if a command in a script crashes (aborts and fails to do what it was meant to do), the script will continue onto the next command. In GNU Bash, you can tell the shell to stop a script in the case of a crash by adding this line at the start of your script:

set -e

*Check if a file/directory exists to avoid re-creating it*
Conditionals are a very useful feature in scripts. One common conditional is to check if a file exists or not. Assuming the file’s name is ‘FILENAME’, you can check its existence (to avoid re-doing the commands that build it) like this:

if [ -f FILENAME ]; then
  echo "FILENAME exists"
else
  # Some commands to generate the file
  echo "done" > FILENAME
fi

To check the existence of a directory instead of a file, use ‘-d’ instead of ‘-f’. To negate a conditional, use ‘!’ and note that conditionals can also be written on one line (useful when they are short). One common scenario in which you will need to check the existence of a directory is when you are about to make it: the default ‘mkdir’ command will crash if the desired directory already exists. On some systems (including GNU/Linux distributions), ‘mkdir’ has options to deal with such cases. But if you want your script to be portable, it is best to check yourself like below:

if ! [ -d DIRNAME ]; then mkdir DIRNAME; fi

*Avoid changing directories (with ‘cd’) within the script*
You can directly read and write files within other directories. Therefore using ‘cd’ to enter a directory (like what we did above, around the ‘wget’ commands), running commands there and coming out is unnecessary, and not good practice. This is because the running directory is part of the environment of a command. You can simply give the directory name before the input and output file names to use them from anywhere on the file system. See the same ‘wget’ commands below for an example.

*Copyright notice:*
A very important thing to put _at the top_ of your script is a one-line description of what it does and its copyright information (see the example below). Here, we specify who the author(s) of this script are, in which years, and under what license others are allowed to use this file. Without it, your script has no credibility or identity, and others cannot trust, use or acknowledge your work on it.
Since Gnuastro is itself licensed under a copyleft (https://en.wikipedia.org/wiki/Copyleft) license (see *note Your rights:: and *note GNU General Public License:: or GNU GPL; the license finishes with a template on how to add it), any script that uses Gnuastro should also have a copyleft license: we recommend the same GNU GPL v3+ like below.

Taking the above points into consideration, we can write a better version of the script above. Please compare this script with the previous one carefully to spot the differences. These are very important points that you will definitely encounter during your own research, and knowing them can greatly help your productivity, so pay close attention (even to the comments).

#!/bin/bash
# Script to download and keep the deep region of the XDF survey.
#
# Copyright (C) 2021-2022 Original Author
# Copyright (C) 2022      Your Name
#
# This script is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This script is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Gnuastro. If not, see <https://www.gnu.org/licenses/>.


# Abort the script in case of an error.
set -e


# Download the input datasets
# ---------------------------
#
# The default file names have this format (where `FILTER' differs for
# each filter):
#   hlsp_xdf_hst_wfc3ir-60mas_hudf_FILTER_v1_sci.fits
# To make the script easier to read, a prefix and suffix variable are
# used to sandwich the filter name into one short line.
dldir=download
xdfsuffix=_v1_sci.fits
xdfprefix=hlsp_xdf_hst_wfc3ir-60mas_hudf_
xdfurl=http://archive.stsci.edu/pub/hlsp/xdf

# The file name and full URLs of the input data.
f105w_in=$xdfprefix"f105w"$xdfsuffix
f160w_in=$xdfprefix"f160w"$xdfsuffix
f105w_url=$xdfurl/$f105w_in
f160w_url=$xdfurl/$f160w_in

# Make sure the download directory exists, and download the images.
if ! [ -d $dldir ];           then mkdir $dldir; fi
if ! [ -f $dldir/$f105w_in ]; then wget $f105w_url -O $dldir/$f105w_in; fi
if ! [ -f $dldir/$f160w_in ]; then wget $f160w_url -O $dldir/$f160w_in; fi


# Crop out the deep region
# ------------------------
#
# To help in readability, each vertex of the deep/flat field is stored
# as a separate variable. They are then merged into one variable to
# define the polygon.
flatdir=flat-ir
vertice1="53.187414,-27.779152"
vertice2="53.159507,-27.759633"
vertice3="53.134517,-27.787144"
vertice4="53.161906,-27.807208"
f105w_flat=$flatdir/xdf-f105w.fits
f160w_flat=$flatdir/xdf-f160w.fits
deep_polygon="$vertice1:$vertice2:$vertice3:$vertice4"
if ! [ -d $flatdir ]; then mkdir $flatdir; fi
if ! [ -f $f105w_flat ]; then
    astcrop --mode=wcs -h0 --output=$f105w_flat \
            --polygon=$deep_polygon $dldir/$f105w_in
fi
if ! [ -f $f160w_flat ]; then
    astcrop --mode=wcs -h0 --output=$f160w_flat \
            --polygon=$deep_polygon $dldir/$f160w_in
fi

---------- Footnotes ----------

(1) When the script is to be run by the same shell that is calling it (like this script), the shebang is optional.
But it is still recommended, because it ensures that even if the user is not using GNU Bash, the script will be run in GNU Bash: given the differences between various shells, writing truly portable shell scripts, that can be run by many shell programs/implementations, is not easy (sometimes not possible!). 2.1.23 Citing and acknowledging Gnuastro ---------------------------------------- In conclusion, we hope this extended tutorial has been a good starting point to help in your exciting research. If this book or any of the programs in Gnuastro have been useful for your research, please cite the respective papers, and acknowledge the funding agencies that made all of this possible. Without citations, we will not be able to secure future funding to continue working on Gnuastro or improving it, so please take software citation seriously (for all the scientific software you use, not just Gnuastro). To help you in this, all Gnuastro programs have a ‘--cite’ option to facilitate the citation and acknowledgment. Just note that it may be necessary to cite additional papers for different programs, so please try it out on all the programs that you used, for example: $ astmkcatalog --cite $ astnoisechisel --cite 2.2 Detecting large extended targets ==================================== The outer wings of large and extended objects can sink into the noise very gradually and can have a large variety of shapes (for example, due to tidal interactions). Therefore separating the outer boundaries of the galaxies from the noise can be particularly tricky. Besides causing an under-estimation in the total estimated brightness of the target, failure to detect such faint wings will also cause a bias in the noise measurements, thereby hampering the accuracy of any measurement on the dataset. Therefore even if they do not constitute a significant fraction of the target’s light, or are not your primary target, these regions must not be ignored. In this tutorial, we will walk you through the strategy of detecting such targets using *note NoiseChisel::. *Do Not start with this tutorial:* If you have not already completed *note General program usage tutorial::, we strongly recommend going through that tutorial before starting this one. Basic features like access to this book on the command-line, the configuration files of Gnuastro’s programs, benefiting from the modular nature of the programs, viewing multi-extension FITS files, or using NoiseChisel’s outputs are discussed in more detail there. We will try to detect the faint tidal wings of the beautiful M51 group(1) in this tutorial. We will use a dataset/image from the public Sloan Digital Sky Survey (http://www.sdss.org/), or SDSS. Due to its more peculiar low surface brightness structure/features, we will focus on the dwarf companion galaxy of the group (or NGC 5195). ---------- Footnotes ---------- (1) 2.2.1 Downloading and validating input data ------------------------------------------- To get the image, you can use the simple field search (https://dr12.sdss.org/fields) tool of SDSS. As long as it is covered by the SDSS, you can find an image containing your desired target either by providing a standard name (if it has one), or its coordinates. To access the dataset we will use here, write ‘NGC5195’ in the “Object Name” field and press “Submit” button. *Type the example commands:* Try to type the example commands on your terminal and use the history feature of your command-line (by pressing the “up” button to retrieve previous commands). 
Do not simply copy and paste the commands shown here.  This will help simulate future situations when you are processing your own datasets.

   You can see the list of available filters under the color image.  For this demonstration, we will use the r-band filter image.  By clicking on the “r-band FITS” link, you can download the image.  Alternatively, you can just run the following command to download it with GNU Wget(1).  To keep things clean, let’s also put it in a directory called ‘ngc5195’.  With the ‘-O’ option, we are asking Wget to save the downloaded file with a more manageable name: ‘r.fits.bz2’ (this is an r-band image of NGC 5195, which was the directory name).

$ mkdir ngc5195
$ cd ngc5195
$ topurl=https://dr12.sdss.org/sas/dr12/boss/photoObj/frames
$ wget $topurl/301/3716/6/frame-r-003716-6-0117.fits.bz2 -Or.fits.bz2

   When you want to reproduce a previous result (a known analysis, on a known dataset, to get a known result: like the case here!) it is important to verify that the file is correct: that the input file has not changed (on the remote server, or in your own archive), or there was no downloading problem.  Otherwise, if the data have changed in your server/archive, and you use the same script, you will get a different result, causing a lot of confusion!

   One good way to verify the contents of a file is to store its _Checksum_ in your analysis script and check it before any other operation.  The _Checksum_ algorithms look into the contents of a file and calculate a fixed-length string from them.  If any change (even in a single bit or byte) is made within the file, the resulting string will change; for more, see Wikipedia (https://en.wikipedia.org/wiki/Checksum).  There are many common algorithms, but a simple one is the SHA-1 algorithm (https://en.wikipedia.org/wiki/SHA-1) (Secure Hash Algorithm 1) that you can calculate easily with the command below (the second line is the output, and the checksum is the first/long string: it is independent of the file name).

$ sha1sum r.fits.bz2
5fb06a572c6107c72cbc5eb8a9329f536c7e7f65  r.fits.bz2

   If the checksum on your computer is different from this, either the file has been incorrectly downloaded (most probable), or it has changed on SDSS servers (very unlikely(2)).  To get a better feeling of checksums open your favorite text editor and make a test file by writing something in it.  Save it and calculate the text file’s SHA-1 checksum with ‘sha1sum’.  Try renaming that file, and you’ll see the checksum has not changed (checksums only look into the contents, not the name/location of the file).  Then open the file with your text editor again, make a change and re-calculate its checksum; you’ll see the checksum string has changed.

   It’s always good to keep this short checksum string with your project’s scripts and validate your input data before using them.  You can do this with a shell conditional like this:

filename=r.fits.bz2
expected=5fb06a572c6107c72cbc5eb8a9329f536c7e7f65
sum=$(sha1sum $filename | awk '{print $1}')
if [ $sum = $expected ]; then
    echo "$filename: validated"
else
    echo "$filename: wrong checksum!"
    exit 1
fi

   Now that we know you have the same data that we wrote this tutorial with, let’s continue.  The SDSS server keeps the files in a Bzip2 compressed file format (that have a ‘.bz2’ suffix).  So we will first decompress it with the following command to use it as a normal FITS file.  By convention, compression programs delete the original file (compressed when uncompressing, or uncompressed when compressing).
To keep the original file, you can use the ‘--keep’ or ‘-k’ option which is available in most compression programs for this job.  Here, we do not need the compressed file any more, so we will just let ‘bunzip2’ delete it for us and keep the directory clean.

$ bunzip2 r.fits.bz2

   ---------- Footnotes ----------

   (1) To make the command easier to view on screen or in a page, we have defined the top URL of the image as the ‘topurl’ shell variable.  You can just replace the value of this variable with ‘$topurl’ in the ‘wget’ command.

   (2) If your checksum is different, try uncompressing the file with the ‘bunzip2’ command after this, and open the resulting FITS file.  If it opens and you see the image of M51 and NGC 5195, then there was no download problem, and the file has indeed changed on the SDSS servers!  In this case, please contact us at ‘bug-gnuastro@gnu.org’.

2.2.2 NoiseChisel optimization
------------------------------

In *note Detecting large extended targets:: we downloaded the single exposure SDSS image.  Let’s see how NoiseChisel operates on it with its default parameters:

$ astnoisechisel r.fits -h0

   As described in *note NoiseChisel and Multi-Extension FITS files::, NoiseChisel’s default output is a multi-extension FITS file.  Open the output ‘r_detected.fits’ file and have a look at the extensions: the 0-th extension is only meta-data and contains NoiseChisel’s configuration parameters.  The rest are the Sky-subtracted input, the detection map, Sky values and Sky standard deviation.

$ ds9 -mecube r_detected.fits -zscale -zoom to fit

   Flipping through the extensions in a FITS viewer, you will see that the first image (Sky-subtracted image) looks reasonable: there are no major artifacts due to bad Sky subtraction compared to the input.  The second extension also seems reasonable with a large detection map that covers the whole of NGC 5195, but also extends towards the bottom of the image where we actually see faint and diffuse signal in the input image.

   Now try flipping between the ‘DETECTIONS’ and ‘SKY’ extensions.  In the ‘SKY’ extension, you’ll notice that there is still significant signal beyond the detected pixels.  You can tell that this signal belongs to the galaxy because the far-right side of the image (away from M51) is dark (has lower values) and the brighter parts in the Sky image (with larger values) are just under the detections and follow a similar pattern.  The fact that signal from the galaxy remains in the ‘SKY’ HDU shows that NoiseChisel can be optimized for a much better result.  The ‘SKY’ extension must not contain any light around the galaxy.  Generally, any time your target is much larger than the tile size and the signal is very diffuse and extended at low signal-to-noise values (like this case), this _will_ happen.  Therefore, when there are large objects in the dataset, *the best place* to check the accuracy of your detection is the estimated Sky image.

   When dominated by the background, noise has a symmetric distribution.  However, signal is not symmetric (we do not have negative signal).  Therefore when non-constant(1) signal is present in a noisy dataset, the distribution will be positively skewed.  For a demonstration, see Figure 1 of Akhlaghi and Ichikawa [2015] (https://arxiv.org/abs/1505.01664).  This skewness is a good measure of how much faint signal we have in the distribution.  The skewness can be accurately measured by the difference in the mean and median (assuming no strong outliers): the more distant they are, the more skewed the dataset is.
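   If you would like a quick numerical check of this mean-median diagnostic on the Sky-subtracted image of the run above, Statistics can print both values in one call (a minimal sketch; the exact numbers will depend on your own run):

$ aststatistics r_detected.fits -hINPUT-NO-SKY --mean --median

   The farther apart the two printed values are, the more signal is still blended with the noise in that image.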
This important concept will be discussed more extensively in the next section (*note Skewness caused by signal and its measurement::).  However, skewness is only a proxy for signal when the signal has structure (varies per pixel).  Therefore, when it is approximately constant over a whole tile, or sub-set of the image, the constant signal’s effect is just to shift the symmetric center of the noise distribution to the positive and there will not be any skewness (major difference between the mean and median).  This positive(2) shift that preserves the symmetric distribution is the Sky value.  When there is a gradient over the dataset, different tiles will have different constant shifts/Sky-values, for example, see Figure 11 of Akhlaghi and Ichikawa [2015] (https://arxiv.org/abs/1505.01664).

   To make this very large diffuse/flat signal detectable, you will therefore need a larger tile to contain a larger change in the values within it (and improve number statistics, for less scatter when measuring the mean and median).  So let’s play with the tessellation a little to see how it affects the result.  In Gnuastro, you can see the option values (‘--tilesize’ in this case) by adding the ‘-P’ option to your last command.  Try running NoiseChisel with ‘-P’ to see its default tile size.

   You can clearly see that the default tile size is indeed much smaller than this (huge) galaxy and its tidal features.  As a result, NoiseChisel was unable to identify the skewness within the tiles under the outer parts of M51 and NGC 5195 and the threshold has been over-estimated on those tiles.  To see which tiles were used for estimating the quantile threshold (no skewness was measured), you can use NoiseChisel’s ‘--checkqthresh’ option:

$ astnoisechisel r.fits -h0 --checkqthresh

   Did you see how NoiseChisel aborted after finding and applying the quantile thresholds?  When you call any of NoiseChisel’s ‘--check*’ options, by default, it will abort as soon as all the check steps have been written in the check file (a multi-extension FITS file).  This allows you to focus on the problem you wanted to check as soon as possible (you can disable this feature with the ‘--continueaftercheck’ option).

   To optimize the threshold-related settings for this image, let’s play with this quantile threshold check image a little.  Do not forget that “_Good statistical analysis is not a purely routine matter, and generally calls for more than one pass through the computer_” (Anscombe 1973, see *note Science and its tools::).  A good scientist must have a good understanding of her tools to make a meaningful analysis.  So do not hesitate in playing with the default configuration and reviewing the manual when you have a new dataset (from a new instrument) in front of you.  Robust data analysis is an art, therefore a good scientist must first be a good artist.  So let’s open the check image as a multi-extension cube:

$ ds9 -mecube r_qthresh.fits -zscale -cmap sls -zoom to fit

   The first extension (called ‘CONVOLVED’) of ‘r_qthresh.fits’ is the convolved input image where the threshold(s) is(are) defined (and later applied to).  For more on the effect of convolution and thresholding, see Sections 3.1.1 and 3.1.2 of Akhlaghi and Ichikawa [2015] (https://arxiv.org/abs/1505.01664).  The second extension (‘QTHRESH_ERODE’) has a blank/white value for all the pixels of any tile that was identified as having significant signal.  The other tiles have the measured threshold over them.
The next two extensions (‘QTHRESH_NOERODE’ and ‘QTHRESH_EXPAND’) are the other two quantile thresholds that are necessary in NoiseChisel’s later steps.  Every step in this file is repeated on the three thresholds.

   Play a little with the color bar of the ‘QTHRESH_ERODE’ extension; you can clearly see how the non-blank tiles around NGC 5195 have a gradient.  As one line of attack against discarding too much signal below the threshold, NoiseChisel rejects outlier tiles.  Go forward by three extensions to ‘VALUE1_NO_OUTLIER’ and you will see that many of the tiles over the galaxy have been removed in this step.  For more on the outlier rejection algorithm, see the latter half of *note Quantifying signal in a tile::.

   Even though much of the galaxy’s footprint has been rejected as outliers, there are still tiles with signal remaining: play with the DS9 color-bar and you will still see a gradient near the outer tidal feature of the galaxy.  Before trying to correct this, let’s look at the other extensions of this check image.  We will use a ‘*’ as a wild-card that can be 1, 2 or 3.  In the ‘THRESH*_INTERP’ extensions, you see that all the blank tiles have been interpolated using their nearest neighbors (the relevant option here is ‘--interpnumngb’).  In the following ‘THRESH*_SMOOTH’ extensions, you can see the tile values after smoothing (configured with the ‘--smoothwidth’ option).  Finally, in ‘QTHRESH-APPLIED’, you see the thresholded image: pixels with a value of 1 will be eroded later, but pixels with a value of 2 will pass the erosion step un-touched.

   Let’s get back to the problem of optimizing the result.  You have two strategies for detecting the outskirts of the merging galaxies: 1) Increase the tile size to get more accurate measurements of skewness.  2) Strengthen the outlier rejection parameters to discard more of the tiles with signal.  Fortunately in this image we have a sufficiently large region on the right side of the image that the galaxy does not extend to.  So we can use the more robust first solution.  In situations where this does not happen (for example, if the field of view in this image was shifted to the left to have more of M51 and less sky) you are limited to a combination of the two solutions or just to the second solution.

*Skipping convolution for faster tests:* The slowest step of NoiseChisel is the convolution of the input dataset.  Therefore when your dataset is large (unlike the one in this test), and you are not changing the input dataset or kernel in multiple runs (as in the tests of this tutorial), it is faster to do the convolution separately once (using *note Convolve::) and use NoiseChisel’s ‘--convolved’ option to directly feed the convolved image and avoid convolution.  For more on ‘--convolved’, see *note NoiseChisel input::.

   To better identify the skewness caused by the flat NGC 5195 and M51 tidal features on the tiles under it, we have to choose a larger tile size.  Let’s try a tile size of 100 by 100 pixels and inspect the check image.

$ astnoisechisel r.fits -h0 --tilesize=100,100 --checkqthresh
$ ds9 -mecube r_qthresh.fits -zscale -cmap sls -zoom to fit

   You can clearly see the effect of this increased tile size: the tiles are much larger and when you look into ‘VALUE1_NO_OUTLIER’, you see that all the tiles are nicely grouped on the right side of the image (the farthest from M51, where we do not see a gradient in ‘QTHRESH_ERODE’).  Things look good now, so let’s remove ‘--checkqthresh’ and let NoiseChisel proceed with its detection.
$ astnoisechisel r.fits -h0 --tilesize=100,100
$ ds9 -mecube r_detected.fits -zscale -cmap sls -zoom to fit

   The detected pixels of the ‘DETECTIONS’ extension have expanded a little, but not by much.  Also, the gradient in the ‘SKY’ image is almost fully removed (and does not fall over M51 anymore).  However, on the bottom-right of the M51 detection, we see many holes gradually increasing in size.  This hints that there is still signal out there.  Let’s check the next series of detection steps by adding the ‘--checkdetection’ option this time:

$ astnoisechisel r.fits -h0 --tilesize=100,100 --checkdetection
$ ds9 -mecube r_detcheck.fits -zscale -cmap sls -zoom to fit

   The output now has 16 extensions, showing every step that is taken by NoiseChisel.  The first and second (‘INPUT’ and ‘CONVOLVED’) are clear from their names.  The third (‘THRESHOLDED’) is the thresholded image after finding the quantile threshold (last extension of the output of ‘--checkqthresh’).  The fourth HDU (‘ERODED’) is new: it is the namesake of NoiseChisel, or eroding pixels that are above the threshold.  By erosion, we mean that all pixels with a value of ‘1’ (above the threshold) that are touching a pixel with a value of ‘0’ (below the threshold) will be flipped to zero (or “carved” out)(3).  You can see its effect directly by going back and forth between the ‘THRESHOLDED’ and ‘ERODED’ extensions.

   In the fifth extension (‘OPENED-AND-LABELED’) the image is “opened”, which is a name for eroding once, then dilating (dilation is the inverse of erosion).  This is good to remove thin connections that are only due to noise.  Each separate connected group of pixels is also given its unique label here.  Do you see how just beyond the large M51 detection, there are many smaller detections that get smaller as you go more distant?  This hints at the solution: the default number of erosions is too much.  Let’s see how many erosions take place by default (by adding ‘-P | grep erode’ to the previous command):

$ astnoisechisel r.fits -h0 --tilesize=100,100 -P | grep erode

   We see that the value of ‘erode’ is ‘2’.  The default NoiseChisel parameters are primarily targeted to processed images (where there is correlated noise due to all the processing that has gone into the warping and stacking of raw images, see *note NoiseChisel optimization for detection::).  In those scenarios 2 erosions are commonly necessary.  But here, we have a single-exposure image where there is no correlated noise (the pixels are not mixed).  So let’s see how things change with only one erosion:

$ astnoisechisel r.fits -h0 --tilesize=100,100 --erode=1 \
                 --checkdetection
$ ds9 -mecube r_detcheck.fits -zscale -cmap sls -zoom to fit

   Looking at the ‘OPENED-AND-LABELED’ extension again, we see that the main/large detection is now much larger than before.  While the immediately-outer connected regions are still present, they have decreased dramatically, so we can pass this step.

   After the ‘OPENED-AND-LABELED’ extension, NoiseChisel goes onto finding false detections using the undetected pixels.  The process is fully described in Section 3.1.5 (Defining and Removing False Detections) of arXiv:1505.01664 (https://arxiv.org/pdf/1505.01664.pdf).  Please compare the extensions to what you read there and things will be very clear.  In the last HDU (‘DETECTION-FINAL’), we have the final detected pixels that will be used to estimate the Sky and its Standard deviation.
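   If you would rather inspect only that final extension (instead of flipping through the whole cube), DS9 can also open a single HDU by its name; a small usage sketch with the same syntax used later in this tutorial:

$ ds9 r_detcheck.fits[DETECTION-FINAL] -zscale -zoom to fit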
We see that the main detection has indeed been detected very far out, so let’s see how the full NoiseChisel will estimate the Sky and its standard deviation (by removing ‘--checkdetection’):

$ astnoisechisel r.fits -h0 --tilesize=100,100 --erode=1
$ ds9 -mecube r_detected.fits -zscale -cmap sls -zoom to fit

   The ‘DETECTIONS’ extension of ‘r_detected.fits’ closely follows the ‘DETECTION-FINAL’ extension of the check image (looks good!).  If you go ahead to the ‘SKY’ extension, things still look good.  But it can still be improved.

   Look at the ‘DETECTIONS’ again, you will see the right-ward edges of M51’s detected pixels have many “holes” that are fully surrounded by signal (value of ‘1’) and the signal stretches out in the noise very thinly (the size of the holes increases as we go out).  This suggests that there is still undetected signal and that we can still dig deeper into the noise.

   With the ‘--detgrowquant’ option, NoiseChisel will “grow” the detections into the noise.  Its value is the ultimate limit of the growth in units of quantile (between 0 and 1).  Therefore ‘--detgrowquant=1’ means no growth and ‘--detgrowquant=0.5’ means an ultimate limit of the Sky level (which is usually too much and will cover the whole image!).  See Figure 2 of arXiv:1909.11230 (https://arxiv.org/pdf/1909.11230.pdf) for more on this option.  Try running the previous command with various values (from 0.6 to higher values) to see this option’s effect on this dataset.  For this particularly huge galaxy (with signal that extends very gradually into the noise), we will set it to ‘0.75’:

$ astnoisechisel r.fits -h0 --tilesize=100,100 --erode=1 \
                 --detgrowquant=0.75
$ ds9 -mecube r_detected.fits -zscale -cmap sls -zoom to fit

   Beyond this level (smaller ‘--detgrowquant’ values), you see many of the smaller background galaxies (towards the right side of the image) starting to create thin spider-leg-like features, showing that we are following correlated noise too far.  Please try it for yourself by changing it to ‘0.6’ for example.

   When you look at the ‘DETECTIONS’ extension of the command shown above, you see the wings of the galaxy being detected much farther out, but you also see many holes which are clearly just caused by noise.  After growing the objects, NoiseChisel also allows you to fill such holes when they are smaller than a certain size through the ‘--detgrowmaxholesize’ option.  In this case, a maximum area/size of 10,000 pixels seems to be good:

$ astnoisechisel r.fits -h0 --tilesize=100,100 --erode=1 \
                 --detgrowquant=0.75 --detgrowmaxholesize=10000
$ ds9 -mecube r_detected.fits -zscale -cmap sls -zoom to fit

   When looking at the raw input image (which is very “shallow”: less than a minute exposure!), you do not see anything so far out of the galaxy.  You might just think to yourself that “this is all noise, I have just dug too deep and I’m following systematics”!  If you feel like this, have a look at the deep images of this system in Watkins et al. [2015] (https://arxiv.org/abs/1501.04599), or a 12 hour deep image of this system (with a 12-inch telescope): (4).  In these deeper images you clearly see how the outer edges of the M51 group follow this exact structure; below, in *note Achieved surface brightness level::, we will measure the exact level.

   As the gradient in the ‘SKY’ extension shows, and the deep images cited above confirm, the galaxy’s signal extends even beyond this.  But this is already far deeper than what most (if not all) other tools can detect.
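   If you would like to compare several ‘--detgrowquant’ values side by side (as suggested above), a small shell loop can keep each run’s output under a separate name; this is only a sketch, and the ‘det-$q.fits’ naming is just illustrative:

for q in 0.6 0.65 0.7 0.75; do
    astnoisechisel r.fits -h0 --tilesize=100,100 --erode=1 \
                   --detgrowquant=$q --output=det-$q.fits
done

   You can then open each output with DS9 (as above) and flip between them to see where the growth starts following correlated noise.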
Therefore, we will stop configuring NoiseChisel at this point in the tutorial and let you play with the other options a little more, while reading more about it in the papers (Akhlaghi and Ichikawa [2015] (https://arxiv.org/abs/1505.01664) and Akhlaghi [2019] (https://arxiv.org/abs/1909.11230)) and *note NoiseChisel::.  When you do find a better configuration feel free to contact us for feedback.  Do not forget that good data analysis is an art, so like a sculptor, master your chisel for a good result.

   To avoid typing all these options every time you run NoiseChisel on this image, you can use Gnuastro’s configuration files, see *note Configuration files::.  For an applied example of setting/using them, see *note Option management and configuration files::.

*This NoiseChisel configuration is NOT GENERIC:* Do not use the configuration derived above on another instrument’s image _blindly_.  If you are unsure, just use the default values.  As you saw above, the reason we chose this particular configuration for NoiseChisel to detect the wings of the M51 group was strongly influenced by the noise properties of this particular image.  Remember *note NoiseChisel optimization for detection::, where we looked into the very deep XDF image which had strong correlated noise?

   As long as your other images have similar noise properties (from the same data-reduction step of the same instrument), you can use your configuration on any of them.  But for images from other instruments, please follow a similar logic to what was presented in these tutorials to find the optimal configuration.

*Smart NoiseChisel:* As you saw during this section, there is a clear logic behind the optimal parameter value for each dataset.  Therefore, we plan to add capabilities to (optionally) automate some of the choices made here based on the actual dataset, please join us in doing this if you are interested.  However, given the many problems in existing “smart” solutions, such automatic changing of the configuration may cause more problems than it solves.  So even when it is implemented, we would strongly recommend quality checks for a robust analysis.

   ---------- Footnotes ----------

   (1) by constant, we mean that it has a single value in the region we are measuring.

   (2) In processed images, where the Sky value can be over-estimated, this constant shift can be negative.

   (3) Pixels with a value of ‘2’ are very high signal-to-noise pixels; they are not eroded, to preserve sharp and bright sources.

   (4) The image is taken from this Reddit discussion:

2.2.3 Skewness caused by signal and its measurement
---------------------------------------------------

In the previous section (*note NoiseChisel optimization::) we showed how to customize NoiseChisel for a single-exposure SDSS image of the M51 group.  During the customization, we also discussed the skewness caused by signal.  In the next section (*note Image surface brightness limit::), we will use this to measure the surface brightness limit of the image.  However, to better understand NoiseChisel and the image surface brightness limit, it is important to understand the skewness caused by signal and how to measure it properly.  Therefore now that we have separated signal from noise, let’s pause for a moment and look into skewness, how signal creates it, and find the best way to measure it.
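   To build an intuition for why the mean responds to added signal faster than the median, here is a tiny, purely illustrative sketch with hand-picked numbers (they are not taken from the image; any symmetric set of values plus a few positive outliers behaves the same way):

## Nine "noise" values, symmetric about zero, then the same values
## with three positive "signal" values added.
noise="-3 -2 -1 -1 0 1 1 2 3"
signal="$noise 5 8 13"
for d in "$noise" "$signal"; do
    echo $d | tr ' ' '\n' | sort -n \
        | awk '{v[NR]=$1; s+=$1}
               END{print "mean:", s/NR, " median:", v[int((NR+1)/2)]}'
done

   With these numbers, the first line reports a mean and median of 0, while the second reports a mean of about 2.2 but a median of only 1 (for an even count, this simple script just takes the lower middle element as the “median”): the mean has moved much more than the median, which is exactly the skewness signature described above.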
Let’s start by masking all the detected pixels found at the end of the previous section (*note NoiseChisel optimization::) and having a look at the noise distribution with Gnuastro’s Arithmetic and Statistics programs as shown below (while visually inspecting the masked image with DS9 in the middle).

$ astarithmetic r_detected.fits -hINPUT-NO-SKY set-in \
                r_detected.fits -hDETECTIONS set-det \
                in det nan where -odet-masked.fits
$ ds9 det-masked.fits
$ aststatistics det-masked.fits

   You will see that Gnuastro’s Statistics program prints an ASCII histogram when no option is given (it is shown below).  This is done to give you a fast and easy view of the distribution of values in the dataset (pixels in an image, or rows in a table’s column).

-------
Input: det-masked.fits (hdu: 1)
-------
  Number of elements:        903920
  Minimum:                   -0.113543
  Maximum:                   0.130339
  Median:                    -0.00216306
  Mean:                      -0.0001893073877
  Standard deviation:        0.02569057188
-------
Histogram:
  [ASCII histogram: a roughly symmetric, bell-shaped distribution around zero]

   This histogram shows a roughly symmetric noise distribution, so let’s have a look at its skewness.  The most commonly used definition of skewness is known as the “Pearson’s first skewness coefficient”.  It measures the difference between the mean and median, in units of the standard deviation (STD):

   $$\rm{Skewness}\equiv\frac{(\rm{mean}-\rm{median})}{\rm{STD}}$$

   The logic behind this definition is simple: as more signal is added to the same pixels that originally only have raw noise (increasing the skewness), the mean shifts to the positive faster than the median, so the distance between the mean and median should increase.  Let’s measure the skewness (as defined above) over the image without any signal.  It is very easy with Gnuastro’s Statistics program (and piping the output to AWK):

$ aststatistics det-masked.fits --mean --median --std \
      | awk '{print ($1-$2)/$3}'
0.0768279

   We see that the mean and median are only $0.08\sigma$ (rounded) away from each other (which is very close)!  All pixels with significant signal are masked, so this is expected, and everything is fine.  Now, let’s check the pixel distribution of the sky-subtracted input (where pixels with significant signal remain, and are not masked):

$ ds9 r_detected.fits
$ aststatistics r_detected.fits -hINPUT-NO-SKY
-------
Input: r_detected.fits (hdu: INPUT-NO-SKY)
Unit: nanomaggy
-------
  Number of elements:        3049472
  Minimum:                   -0.113543
  Maximum:                   159.25
  Median:                    0.0241158
  Mean:                      0.1057885317
  Standard deviation:        0.698167489
-------
Histogram:
  [ASCII histogram: a very elongated distribution, with almost all pixels in the lowest-value bins]

   Comparing the distributions above, you can see that the _minimum_ value of the image has not changed because we have not masked the minimum values.  However, as expected, the _maximum_ value of the image has changed (from $0.13$ to $159.25$).  This is clearly evident from the ASCII histogram: the distribution is very elongated because the galaxy inside the image is extremely bright.
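   The $0.13$ used in the next command is simply (a rounding of) the masked image’s maximum that was reported above.  If you ever need that value programmatically (for example, inside a script), Statistics can print it as a single number; a small sketch (the ‘max’ variable name is just illustrative):

$ max=$(aststatistics det-masked.fits --maximum)
$ echo $max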
Now, let’s limit the displayed information with the ‘--lessthan=0.13’ option of Statistics as shown below (to only use values less than 0.13; the maximum of the image where all signal is masked).

$ aststatistics r_detected.fits -hINPUT-NO-SKY --lessthan=0.13
-------
Input: r_detected.fits (hdu: INPUT-NO-SKY)
Range: up to (exclusive) 0.13.
Unit: nanomaggy
-------
  Number of elements:        2531949
  Minimum:                   -0.113543
  Maximum:                   0.126233
  Median:                    0.0137138
  Mean:                      0.01735551527
  Standard deviation:        0.03590550597
-------
Histogram:
  [ASCII histogram: an asymmetric distribution with a heavier tail towards positive values]

   The improvement is obvious: the ASCII histogram better shows the pixel values near the noise level.  We can now compare with the distribution of ‘det-masked.fits’ that we found earlier.  The ASCII histogram of ‘det-masked.fits’ was approximately symmetric, while this one is asymmetric in this range, especially in the outer (to the right, or positive) direction.  The heavier right-side tail is a clear visual demonstration of skewness that is caused by the signal in the un-masked image.

   Having visually confirmed the skewness, let’s quantify it with Pearson’s first skewness coefficient.  Like before, we can simply use Gnuastro’s Statistics and AWK for the measurement and calculation:

$ aststatistics r_detected.fits --mean --median --std \
      | awk '{print ($1-$2)/$3}'
0.116982

   The difference between the mean and median is now approximately $0.12\sigma$.  This is larger than the skewness of the masked image (which was approximately $0.08\sigma$).  At a glance (only looking at the numbers), it seems that there is not much difference between the two distributions.  However, visually looking at the non-masked image, or the ASCII histogram, you would expect the quantified skewness to be much larger than that of the masked image, but that has not happened!  Why is that?

   The reason is that the presence of signal does not only shift the mean and median, it _also_ increases the standard deviation!  To see this for yourself, compare the standard deviation of ‘det-masked.fits’ (which was approximately $0.025$) to ‘r_detected.fits’ (without ‘--lessthan’; which was approximately $0.699$).  The latter is almost 28 times larger!  This happens because the standard deviation is only properly defined in a symmetric (Gaussian) distribution.  In a non-Gaussian distribution, the standard deviation is poorly defined and is not a good measure of “width”.  Since Pearson’s first skewness coefficient is defined in units of the standard deviation, this very large increase in the standard deviation has hidden the much increased distance between the mean and median after adding signal.

   We therefore need a better unit or scale to quantify the distance between the mean and median: a unit that is less affected by skewness or outliers.  One solution that we have found to be very useful is the quantile unit or quantile scale.  The quantile scale is defined by first sorting the dataset (which has $N$ elements).  If we want the quantile of a value $V$ in a distribution, we first find the nearest data element to $V$ in the sorted dataset.  Let’s assume the nearest element is the $i$-th element, counting from 0, after sorting.
The quantile of $V$ in that distribution is then defined as $i/(N-1)$ (which will have a value between 0 and 1).  The quantile of the median is obvious from its definition: 0.5.  This is because the median is defined to be the middle element of the distribution after sorting.  We can therefore define skewness as the quantile of the mean ($q_m$).  If $q_m\sim0.5$ (the median), then the distribution (of signal blended in noise) is symmetric (possibly Gaussian, but the functional form is irrelevant here).  A larger value for $|q_m-0.5|$ quantifies a more skewed distribution.  Furthermore, $q_m>0.5$ signifies a positive skewness, while $q_m<0.5$ signifies a negative skewness.

   Let’s put this definition to a test on the same two images we have already created.  Fortunately Gnuastro’s Statistics program has the ‘--quantofmean’ option to easily calculate $q_m$ for you.  So testing is easy:

$ aststatistics det-masked.fits --quantofmean
0.51295636

$ aststatistics r_detected.fits -hINPUT-NO-SKY --quantofmean
0.8105163158

   The two quantiles of the mean are now very distinctly different ($0.51$ and $0.81$): differing by about $0.3$ (on a scale of 0 to 1)!  Recall that when defining skewness with Pearson’s first skewness coefficient, their difference was negligible ($0.04\sigma$)!  You can now better appreciate why we discussed the quantile so extensively in *note NoiseChisel optimization::.  In case you would like to know more about the usage of the quantile of the mean in Gnuastro, please see *note Quantifying signal in a tile::, or watch this video demonstration: .

2.2.4 Image surface brightness limit
------------------------------------

When your science is related to extended emission (like the example here) and you are presenting your results in a scientific conference, usually the first thing that someone will ask (if you do not explicitly say it!) is the dataset’s _surface brightness limit_ (a standard measure of the noise level), and your target’s surface brightness (a measure of the signal, either in the center or outskirts, depending on context).  For more on the basics of these important concepts please see *note Quantifying measurement limits::.  So in this section of the tutorial, we will measure these values for this image and this target.

   Before measuring the surface brightness limit, let’s see how reliable our detection was.  In other words, let’s see how “clean” our noise is (after masking all detections, as described previously in *note Skewness caused by signal and its measurement::):

$ aststatistics det-masked.fits --quantofmean
0.5111848629

   This shows that the mean is indeed very close to the median, only about $0.01$ above it (in quantile units).  As we saw in *note NoiseChisel optimization::, a very small residual signal still remains in the undetected regions and this very small difference is a quantitative measure of that undetected signal.  It was up to you as an exercise to improve it, so we will continue with this dataset.

   The surface brightness limit of the image can be measured from the masked image and the equation in *note Quantifying measurement limits::.  Let’s do it for a $3\sigma$ surface brightness limit over an area of $25 \rm{arcsec}^2$:

$ nsigma=3
$ zeropoint=22.5
$ areaarcsec2=25
$ std=$(aststatistics det-masked.fits --sigclip-std)
$ pixarcsec2=$(astfits det-masked.fits --pixelscale --quiet \
                       | awk '{print $3*3600*3600}')
$ astarithmetic --quiet $nsigma $std x \
                $areaarcsec2 $pixarcsec2 x \
                sqrt / $zeropoint counts-to-mag
26.0241

   The customizable steps above are good for any type of mask.
For example, your field of view may contain a very deep part, so you need to mask all the shallow parts _as well as_ the detections before these steps.  But when your image is flat (like this), there is a much simpler method to obtain the same value through MakeCatalog (when the standard deviation image is made by NoiseChisel).  NoiseChisel has already calculated the minimum (‘MINSTD’), maximum (‘MAXSTD’) and median (‘MEDSTD’) standard deviation within the tiles during its processing and has stored them as FITS keywords within the ‘SKY_STD’ HDU.  You can see them by piping all the keywords in this HDU into ‘grep’.  In Grep, each ‘.’ represents one character that can be anything, so ‘M..STD’ will match all three keywords mentioned above.

$ astfits r_detected.fits --hdu=SKY_STD | grep 'M..STD'

   The ‘MEDSTD’ value is very similar to the standard deviation derived above, so we can safely use it instead of having to mask and run Statistics.  In fact, MakeCatalog also uses this keyword and will report the dataset’s $n\sigma$ surface brightness limit as keywords in the output (not as measurement columns, since it is related to the noise, not labeled signal):

$ astmkcatalog r_detected.fits -hDETECTIONS --output=sbl.fits \
               --forcereadstd --ids

   Before looking into the measured surface brightness limits, let’s review some important points about this call to MakeCatalog first:

   • We are only concerned with the noise (not the signal), so we do not ask for any further measurements, because they can un-necessarily slow it down.  However, MakeCatalog requires at least one column, so we will only ask for the ‘--ids’ column (which does not need any measurement!).  The output catalog will therefore have a single row and a single column, with 1 as its value(1).

   • If we do not ask for any noise-related column (for example, the signal-to-noise ratio column with ‘--sn’, among other noise-related columns), MakeCatalog is not going to read the noise standard deviation image (again, to speed up its operation when it is redundant).  We are thus using the ‘--forcereadstd’ option (short for “force read standard deviation image”) here so it is ready for the surface brightness limit measurements that are written as keywords.

   With the command below you can see all the keywords that were measured with the table.  Notice the group of keywords that are under the “Surface brightness limit (SBL)” title.

$ astfits sbl.fits -h1

   Since all the keywords of interest here start with ‘SBL’, we can get a cleaner view with this command.

$ astfits sbl.fits -h1 | grep ^SBL

   Notice how the ‘SBLSTD’ keyword has the same value as NoiseChisel’s ‘MEDSTD’ above.  Using ‘SBLSTD’, MakeCatalog has determined the $n\sigma$ surface brightness limiting magnitude in these header keywords.  The multiple of $\sigma$, or $n$, is the value of the ‘SBLNSIG’ keyword which you can change with the ‘--sfmagnsigma’ option.  The surface brightness limiting magnitude within a pixel, and within a pixel-agnostic area of ‘SBLAREA’ arcsec$^2$, are also stored as keywords (the latter in ‘SBLMAG’).

   You will notice that the two surface brightness limiting magnitudes above have values around 3 and 4 (which is not correct!).  This is because we have not given a zero point magnitude to MakeCatalog, so it uses the default value of ‘0’.  SDSS image pixel values are calibrated in units of “nanomaggy” which are defined to have a zero point magnitude of 22.5(2).
So with the first command below we give the zero point value, and with the second we can see the surface brightness limiting magnitudes with the correct values (around 25 and 26).

$ astmkcatalog r_detected.fits -hDETECTIONS --zeropoint=22.5 \
               --output=sbl.fits --forcereadstd --ids
$ astfits sbl.fits -h1 | grep ^SBL

   As you see from ‘SBLNSIG’ and ‘SBLAREA’, the default multiple of sigma is 1 and the default area is 1 arcsec$^2$.  Usually higher values are used for these two parameters.  Following the manual example we did above, you can ask for the multiple of sigma to be 3 and the area to be 25 arcsec$^2$:

$ astmkcatalog r_detected.fits -hDETECTIONS --zeropoint=22.5 \
               --output=sbl.fits --sfmagarea=25 --sfmagnsigma=3 \
               --forcereadstd --ids
$ astfits sbl.fits -h1 | awk '/^SBLMAG /{print $3}'
26.02296

   You see that the value is identical to the custom surface brightness limiting magnitude we measured above (a difference of $0.00114$ magnitudes is negligible and hundreds of times smaller than the typical errors in the zero point magnitude or magnitude measurements).  But it is much easier to have MakeCatalog do this measurement, because these values will be appended (as keywords) into your final catalog of objects within that image.

*Custom STD for MakeCatalog’s Surface brightness limit:* You can manually change/set the value of the ‘MEDSTD’ keyword in your input STD image with *note Fits:::

$ std=$(aststatistics masked.fits --sigclip-std)
$ astfits noisechisel.fits -hSKY_STD --update=MEDSTD,$std

With this change, MakeCatalog will use your custom standard deviation for the surface brightness limit.  This is necessary in scenarios where your image has multiple depths and during your masking, you also mask the shallow regions (as well as the detections of course).

   We have successfully measured the image’s $3\sigma$ surface brightness limiting magnitude over 25 arcsec$^2$.  However, as discussed in *note Quantifying measurement limits:: this value is just an extrapolation of the per-pixel standard deviation.  Issues like correlated noise will cause the real noise over a large area to be different.  So for a more robust measurement, let’s use the upper-limit magnitude of a similarly sized region.  For more on the upper-limit magnitude, see the respective item in *note Quantifying measurement limits::.

   In summary, the upper-limit measurements involve randomly placing the footprint of an object in undetected parts of the image many times.  This results in a random distribution of brightness measurements; the standard deviation of that distribution is then converted into magnitudes.  To be comparable with the results above, let’s make a circular aperture that has an area of 25 arcsec$^2$ (thus with a radius of $2.82095$ arcsec).

zeropoint=22.5
r_arcsec=2.82095

## Convert the radius (in arcseconds) to pixels.
r_pixel=$(astfits r_detected.fits --pixelscale -q \
                  | awk '{print '$r_arcsec'/($1*3600)}')

## Make a circular aperture at pixel (100,100); the position is
## irrelevant.
echo "1 100 100 5 $r_pixel 0 0 1 1 1" \
     | astmkprof --background=r_detected.fits \
                 --clearcanvas --mforflatpix --type=uint8 \
                 --output=lab.fits

## Do the upper-limit measurement, ignoring all NoiseChisel's
## detections as a mask for the upper-limit measurements.
astmkcatalog lab.fits -h1 --zeropoint=$zeropoint -osbl.fits \
             --sfmagarea=25 --sfmagnsigma=3 --forcereadstd \
             --valuesfile=r_detected.fits --valueshdu=INPUT-NO-SKY \
             --upmaskfile=r_detected.fits --upmaskhdu=DETECTIONS \
             --upnsigma=3 --checkuplim=1 --upnum=1000 \
             --ids --upperlimitsb

   The ‘sbl.fits’ catalog now contains the upper-limit surface brightness for a circle with an area of 25 arcsec$^2$.  You can check the value with the command below, but the great thing is that now you have both the surface brightness limiting magnitude (in the headers, as discussed above) and the upper-limit surface brightness (within the table).  You can also add more profiles with different shapes and sizes if necessary.  Of course, you can also use ‘--upperlimitsb’ on your actual science objects and clumps to get an object-specific or clump-specific value.

$ asttable sbl.fits -cUPPERLIMIT_SB
25.9119

   You will get a slightly different value from the command above.  In fact, if you run the MakeCatalog command again and look at the measured upper-limit surface brightness, it will be slightly different from your first trial!  Please try exactly the same MakeCatalog command above a few times to see how it changes.

   This is because of the _random_ factor in the upper-limit measurements: every time you run it, different random points will be checked, resulting in a slightly different distribution.  You can decrease the random scatter by increasing the number of random checks (for example, setting ‘--upnum=100000’, compared to 1000 in the command above).  But this will be slower and the results will still not be exactly reproducible.  The only way to ensure you get an identical result later is to fix the random number generator function and seed like the command below(3).  This is a very important point regarding any statistical process involving random numbers, please see *note Generating random numbers::.

export GSL_RNG_TYPE=ranlxs1
export GSL_RNG_SEED=1616493518
astmkcatalog lab.fits -h1 --zeropoint=$zeropoint -osbl.fits \
             --sfmagarea=25 --sfmagnsigma=3 --forcereadstd \
             --valuesfile=r_detected.fits --valueshdu=INPUT-NO-SKY \
             --upmaskfile=r_detected.fits --upmaskhdu=DETECTIONS \
             --upnsigma=3 --checkuplim=1 --upnum=1000 \
             --ids --upperlimitsb --envseed

   But where do all the random apertures of the upper-limit measurement fall on the image?  It is good to actually inspect their location to get a better understanding of the process and also detect possible bugs/biases.  When MakeCatalog is run with the ‘--checkuplim’ option, it will print all the random locations and their measured brightness as a table in a file with the suffix ‘_upcheck.fits’.  With the first command below you can use Gnuastro’s ‘asttable’ and ‘astscript-ds9-region’ to convert the successful aperture locations into a DS9 region file, and with the second you can load the region file onto the detections and sky-subtracted image to visually see where they are.

## Create a DS9 region file from the check table (activated
## with '--checkuplim').
asttable lab_upcheck.fits --noblank=RANDOM_SUM \
         | astscript-ds9-region -c1,2 --mode=img \
                                --radius=$r_pixel

## Have a look at the regions in relation with NoiseChisel's
## detections.
ds9 r_detected.fits[INPUT-NO-SKY] -regions load ds9.reg
ds9 r_detected.fits[DETECTIONS]   -regions load ds9.reg

   In this example, we were looking at a single-exposure image that has no correlated noise.  Because of this, the surface brightness limit and the upper-limit surface brightness are very close.
They will have a bigger difference on deep datasets with stronger correlated noise (that are the result of stacking many individual exposures).  As an exercise, please try measuring the upper-limit surface brightness level and surface brightness limit for the deep HST data that we used in the previous tutorial (*note General program usage tutorial::).

   ---------- Footnotes ----------

   (1) Recall that NoiseChisel’s output is a binary image: 0-valued pixels are noise and 1-valued pixels are signal.  NoiseChisel does not identify sub-structure over the signal, this is the job of Segment, see *note Extract clumps and objects::.

   (2) From

   (3) You can use any integer for the seed.  One recommendation is to run MakeCatalog without ‘--envseed’ once and use the randomly generated seed that is printed on the terminal.

2.2.5 Achieved surface brightness level
---------------------------------------

In *note NoiseChisel optimization:: we customized NoiseChisel for a single-exposure SDSS image of the M51 group and in *note Image surface brightness limit:: we measured the surface brightness limit and the upper-limit surface brightness level (which are both measures of the noise level).  In this section, let’s do some measurements on the outer-most edges of the M51 group to see how they relate to the noise measurements found in the previous section.

   For this measurement, we will need to estimate the average flux on the outer edges of the detection.  Fortunately all this can be done with a few simple commands using *note Arithmetic:: and *note MakeCatalog::.

   First, let’s separate each detected region, or give a unique label/counter to all the connected pixels of NoiseChisel’s detection map with the command below.  Recall that with the ‘set-’ operator, the popped operand will be given a name (‘det’ in this case) for easy usage later.

$ astarithmetic r_detected.fits -hDETECTIONS set-det \
                det 2 connected-components -olabeled.fits

   You can find the label of the main galaxy visually (by opening the image and hovering your mouse over the M51 group’s label).  But to have a little more fun, let’s do this automatically (which is necessary in a general scenario).  The M51 group detection is by far the largest detection in this image, this allows us to find its ID/label easily.  We will first run MakeCatalog to find the area of all the labels, then we will use Table to find the ID of the largest object and keep it as a shell variable (‘id’):

# Run MakeCatalog to find the area of each label.
$ astmkcatalog labeled.fits --ids --geoarea -h1 -ocat.fits

## Sort the table by the area column.
$ asttable cat.fits --sort=AREA_FULL

## The largest object is the last one, so we will use '--tail'.
$ asttable cat.fits --sort=AREA_FULL --tail=1

## We only want the ID, so let's only ask for that column:
$ asttable cat.fits --sort=AREA_FULL --tail=1 --column=OBJ_ID

## Now, let's put this result in a variable (instead of printing)
$ id=$(asttable cat.fits --sort=AREA_FULL --tail=1 --column=OBJ_ID)

## Just to confirm everything is fine.
$ echo $id

   We can now use the ‘id’ variable to reject all other detections:

$ astarithmetic labeled.fits $id eq -oonly-m51.fits

   Open the image and have a look.  To separate the outer edges of the detections, we will need to “erode” the M51 group detection.  So in the same Arithmetic command as above, we will erode three times (to have more pixels and thus less scatter), using a maximum connectivity of 2 (8-connected neighbors).  We will then save the output in ‘eroded.fits’.
$ astarithmetic labeled.fits $id eq 2 erode 2 erode 2 erode \
                -oeroded.fits

   In ‘labeled.fits’, we can now set all the 1-valued pixels of ‘eroded.fits’ to 0 using Arithmetic’s ‘where’ operator added to the previous command.  We will need the pixels of the M51 group in ‘labeled.fits’ two times: once to do the erosion, another time to find the outer pixel layer.  To do this (and be efficient and more readable) we will use the ‘set-i’ operator (to give this image the name ‘i’).  In this way we can use it any number of times afterwards, while only reading it from disk and finding M51’s pixels once.

$ astarithmetic labeled.fits $id eq set-i i \
                i 2 erode 2 erode 2 erode 0 where -oedge.fits

   Open the image and have a look.  You’ll see that the detected edge of the M51 group is now clearly visible.  You can use ‘edge.fits’ to mark (set to blank) this boundary on the input image and get a visual feeling of how far it extends:

$ astarithmetic r.fits -h0 edge.fits nan where -oedge-masked.fits

   To quantify how deep we have detected the low-surface brightness regions (in units of signal-to-noise ratio), we will use the command below.  In short it just divides all the non-zero pixels of ‘edge.fits’ in the Sky-subtracted input (first extension of NoiseChisel’s output) by the pixel standard deviation of the same pixel.  This will give us a signal-to-noise ratio image.  The mean value of this image shows the level of surface brightness that we have achieved.

   You can also break the command below into multiple calls to Arithmetic and create temporary files to understand it better.  However, if you have a look at *note Reverse polish notation:: and *note Arithmetic operators::, you should be able to easily understand what your computer does when you run this command(1).

$ astarithmetic edge.fits -h1                  set-edge \
                r_detected.fits -hSKY_STD      set-skystd \
                r_detected.fits -hINPUT-NO-SKY set-skysub \
                skysub skystd / edge not nan where meanvalue --quiet

   We have thus detected the wings of the M51 group down to roughly 1/3rd of the noise level in this image which is a very good achievement!  But the per-pixel S/N is a relative measurement.  Let’s also measure the depth of our detection in absolute surface brightness units; or magnitudes per square arc-seconds (see *note Brightness flux magnitude::).  We will also ask for the S/N and magnitude of the full edge we have defined.  Fortunately doing this is very easy with Gnuastro’s MakeCatalog:

$ astmkcatalog edge.fits -h1 --valuesfile=r_detected.fits \
               --zeropoint=22.5 --ids --surfacebrightness --sn \
               --magnitude
$ asttable edge_cat.fits
1      25.6971      55.2406      15.8994

   We have thus reached an outer surface brightness of $25.70$ magnitudes/arcsec$^2$ (second column in ‘edge_cat.fits’) on this single exposure SDSS image!  This is very similar to the surface brightness limit measured in *note Image surface brightness limit:: (which is a big achievement!).  But another point in the result above is very interesting: the total S/N of the edge is $55.24$ with a total edge magnitude(2) of 15.90!  This is very large for such a faint signal (recall that the mean S/N per pixel was 0.32) and shows a very important point in the study of galaxies: while the per-pixel signal in their outer edges may be very faint (and invisible to the eye in noise), a lot of signal hides deeply buried in the noise.

   In interpreting this value, you should just keep in mind that NoiseChisel works based on the contiguity of signal in the pixels.
Therefore the larger the object, the deeper NoiseChisel can carve it out of the noise (for the same outer surface brightness).  In other words, this reported depth is the depth we have reached for this object in this dataset, processed with this particular NoiseChisel configuration.  If the M51 group in this image was larger/smaller than this (the field of view was smaller/larger), or if the image was from a different instrument, or if we had used a different configuration, we would go deeper/shallower.

   ---------- Footnotes ----------

   (1) ‘edge.fits’ (extension ‘1’) is a binary (0 or 1 valued) image.  Applying the ‘not’ operator on it, just flips all its pixels (from ‘0’ to ‘1’ and vice-versa).  Using the ‘where’ operator, we are then setting all the newly 1-valued pixels (pixels that are not on the edge) to NaN/blank in the sky-subtracted input image (‘r_detected.fits’, extension ‘INPUT-NO-SKY’, which we call ‘skysub’).  We are then dividing all the non-blank pixels (only those on the edge) by the sky standard deviation (‘r_detected.fits’, extension ‘SKY_STD’, which we called ‘skystd’).  This gives the signal-to-noise ratio (S/N) for each of the pixels on the boundary.  Finally, with the ‘meanvalue’ operator, we are taking the mean value of all the non-blank pixels and reporting that as a single number.

   (2) You can run MakeCatalog on ‘only-m51.fits’ instead of ‘edge.fits’ to see the full magnitude of the M51 group in this image.

2.2.6 Extract clumps and objects (Segmentation)
-----------------------------------------------

In *note NoiseChisel optimization:: we found a good detection map over the image, so pixels harboring signal have been differentiated from those that do not.  For noise-related measurements like the surface brightness limit, this is fine.  However, after finding the pixels with signal, you are most likely interested in knowing the sub-structure within them.  For example, how many star forming regions (those bright dots along the spiral arms) of M51 are within this image?  What are the colors of each of these star forming regions?  In the outermost wings of M51, which pixels belong to background galaxies and foreground stars?  And many more similar questions.  To address these questions, you can use *note Segment:: to identify all the “clumps” and “objects” over the detection.

$ astsegment r_detected.fits --output=r_segmented.fits
$ ds9 -mecube r_segmented.fits -cmap sls -zoom to fit -scale limits 0 2

   Open the output ‘r_segmented.fits’ as a multi-extension data cube with the second command above and flip through the first and second extensions, zoom-in to the spiral arms of M51 and see the detected clumps (all pixels with a value larger than 1 in the second extension).  To optimize the parameters and make sure you have detected what you wanted, we recommend visually inspecting the detected clumps on the input image.

   For visual inspection, you can make a simple shell script like below.  It will first call MakeCatalog to estimate the positions of the clumps, then make an SAO DS9 region file and open ds9 with the image and region file.  Recall that in a shell script, the numeric variables (like ‘$1’, ‘$2’, and ‘$3’ in the example below) represent the arguments given to the script.  But when used in the AWK arguments, they refer to column numbers.

   To create the shell script, using your favorite text editor, put the contents below into a file called ‘check-clumps.sh’.  Recall that everything after a ‘#’ is just comments to help you understand the command (so read them!).
Also note that if you are copying from the PDF version of this book, fix the single quotes in the AWK command.

#! /bin/bash
set -e    # Stop execution when there is an error.
set -u    # Stop execution when a variable is not initialized.

# Run MakeCatalog to write the coordinates into a FITS table.
# Default output is `$1_cat.fits'.
astmkcatalog $1.fits --clumpscat --ids --ra --dec

# Use Gnuastro's Table and astscript-ds9-region to build the DS9
# region file (a circle of radius 1 arcseconds on each point).
asttable $1"_cat.fits" -hCLUMPS -cRA,DEC \
         | astscript-ds9-region -c1,2 --mode=wcs --radius=1 \
                                --output=$1.reg

# Show the image (with the requested color scale) and the region file.
ds9 -geometry 1800x3000 -mecube $1.fits -zoom to fit \
    -scale limits $2 $3 -regions load all $1.reg

# Clean up (delete intermediate files).
rm $1"_cat.fits" $1.reg

   Finally, you just have to activate the script’s executable flag with the command below.  This will enable you to directly/easily call the script as a command.

$ chmod +x check-clumps.sh

   This script does not expect the ‘.fits’ suffix of the input’s filename as the first argument, because the script produces intermediate files (a catalog and DS9 region file, which are later deleted) whose names are built from it.  We do not want multiple instances of the script (on different files in the same directory) to collide (read/write to the same intermediate files).  Therefore, we have used suffixes added to the input’s name to identify the intermediate files.  Note how all the ‘$1’ instances in the commands (not within the AWK command(1)) are followed by a suffix.  If you want to keep the intermediate files, put a ‘#’ at the start of the last line.

   The few, but high-valued, bright pixels in the central parts of the galaxies can hinder easy visual inspection of the fainter parts of the image.  With the second and third arguments to this script, you can set the numerical values of the color map (first is minimum/black, second is maximum/white).  You can call this script with any(2) output of Segment (when ‘--rawoutput’ is _not_ used) with a command like this:

$ ./check-clumps.sh r_segmented -0.1 2

   Go ahead and run this command.  You will see the intermediate processing being done and finally it opens SAO DS9 for you with the regions superimposed on all the extensions of Segment’s output.  The script will only finish (and give you control of the command-line) when you close DS9.  If you need access to the command-line before closing DS9, add a ‘&’ after the end of the command above.

   While DS9 is open, slide the dynamic range (values for black and white, or minimum/maximum values in different color schemes) and zoom into various regions of the M51 group to see if you are satisfied with the detected clumps.  Do not forget that through the “Cube” window that is opened along with DS9, you can flip through the extensions and see the actual clumps also.  The questions you should be asking yourself are these: 1) Which real clumps (as you visually _feel_) have been missed?  In other words, is the _completeness_ good?  2) Are there any clumps which you _feel_ are false?  In other words, is the _purity_ good?

   Note that completeness and purity are not independent of each other, they are anti-correlated: the higher your purity, the lower your completeness and vice-versa.  You can see this by playing with the purity level using the ‘--snquant’ option.  Run Segment as shown above again with ‘-P’ and see its default value.  Then increase/decrease it for higher/lower purity and check the result as before.
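   For example, you could first check the default value and then re-run Segment with a stricter significance quantile for higher purity (a rough sketch; ‘0.99’ and the output name are only illustrative, judge the result visually as described above):

$ astsegment r_detected.fits -P | grep snquant
$ astsegment r_detected.fits --snquant=0.99 --output=r_seg-pure.fits
$ ./check-clumps.sh r_seg-pure -0.1 2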
You will see that if you want the best purity, you have to sacrifice completeness and vice versa. One interesting region to inspect in this image is the many bright peaks around the central parts of M51. Zoom into that region and inspect how many of them have actually been detected as true clumps. Do you have a good balance between completeness and purity? Also look out far into the wings of the group and inspect the completeness and purity there.

An easier way to inspect completeness (and only completeness) is to mask all the pixels detected as clumps and visually inspect the rest of the pixels. You can do this using Arithmetic in a command like below. For easy reading of the command, we will define the shell variables ‘in’ and ‘clumps’ for the two inputs and save the output in ‘clumps-masked.fits’.

     $ in="r_segmented.fits -hINPUT"
     $ clumps="r_segmented.fits -hCLUMPS"
     $ astarithmetic $in $clumps 0 gt nan where -oclumps-masked.fits

Inspecting ‘clumps-masked.fits’, you can see some very diffuse peaks that have been missed, especially as you go farther away from the group center and into the diffuse wings. This is due to the fact that with this configuration, we have focused more on the sharper clumps. To put the focus more on diffuse clumps, you can use a wider convolution kernel. Using a larger kernel can also help in detecting the existing clumps to fainter levels (thus better separating them from the surrounding diffuse signal). You can make any kernel easily using the ‘--kernel’ option in *note MakeProfiles::. But note that a larger kernel is also going to wash out many of the sharp/small clumps close to the center of M51 and also some smaller peaks on the wings. Please continue playing with Segment’s configuration to obtain a more complete result (while keeping reasonable purity). We will finish the discussion on finding true clumps at this point.

The properties of the clumps within M51, or the background objects, can then easily be measured using *note MakeCatalog::. To measure the properties of the background objects (detected as clumps over the diffuse region), you should not mask the diffuse region. When measuring clump properties with *note MakeCatalog:: and using the ‘--clumpscat’ option, the ambient flux (from the diffuse region) is calculated and subtracted. If the diffuse region is masked, its effect on the clump brightness cannot be calculated and subtracted.

To keep this tutorial short, we will stop here. See *note Segmentation and making a catalog:: and *note Segment:: for more on using Segment, producing catalogs with MakeCatalog and using those catalogs.

---------- Footnotes ----------

(1) In AWK, ‘$1’ refers to the first column, while in the shell script, it refers to the first argument.

(2) Some modifications are necessary based on the input dataset: depending on the dynamic range, you have to adjust the second and third arguments. But more importantly, depending on the dataset’s world coordinate system, you have to change the region radius (the ‘--radius’ option given to ‘astscript-ds9-region’ in the script). Otherwise the circle regions can be too small/large.

2.3 Building the extended PSF
=============================

Deriving the extended PSF of an image is very important in many aspects of the analysis of the objects within it. Gnuastro has a set of installed scripts, designed to simplify the process following the recipe of Infante-Sainz et al. (2020); for more, see *note PSF construction and subtraction::. An overview of the process is given in *note Overview of the PSF scripts::.
2.3.1 Preparing input for extended PSF -------------------------------------- We will use an image of the M51 galaxy group in the r (SDSS) band of the Javalambre Photometric Local Universe Survey (J-PLUS) to extract its extended PSF. For more information on J-PLUS, and its unique features visit: . First, let’s download the image from the J-PLUS web page using ‘wget’. But to have a generalize-able, and easy to read command, we will define some base variables (in all-caps) first. After the download is complete, open the image with SAO DS9 (or any other FITS viewer you prefer!) to have a feeling of the data (and of course, enjoy the beauty of M51 in such a wide field of view): $ urlend="jplus-dr2/get_fits?id=67510" $ urlbase="http://archive.cefca.es/catalogues/vo/siap/" $ mkdir jplus-dr2 $ wget $urlbase$urlend -O jplus-dr2/67510.fits.fz $ astscript-fits-view jplus-dr2/67510.fits.fz After enjoying the large field of view, have a closer look at the edges of the image. Please zoom in to the corners. You will see that on the edges, the pixel values are either zero or with significantly different values than the main body of the image. This is due to the dithering pattern that was used to make this image and happens in all imaging surveys(1). To avoid potential issues or problems that these regions may cause, we will first crop out the main body of the image with the command below. To keep the top-level directory clean, let’s also put the crop in a directory called ‘flat’. $ mkdir flat $ astcrop jplus-dr2/67510.fits.fz --section=225:9275,150:9350 \ --mode=img -oflat/67510.fits $ astscript-fits-view flat/67510.fits Please zoom into the edges again, you will see that they now have the same noise-level as the rest of the image (the problematic parts are now gone). ---------- Footnotes ---------- (1) Recall the cropping in a previous tutorial for a similar reason (varying “depth” across the image): *note Dataset inspection and cropping::. 2.3.2 Saturated pixels and Segment’s clumps ------------------------------------------- A constant-depth (flat) image was created in the previous section (*note Preparing input for extended PSF::). As explained in *note Overview of the PSF scripts::, an important step when building the PSF is to mask other sources in the image. Therefore, before going onto selecting stars, let’s detect all significant signal, and identify the clumps of background objects over the wings of the extended PSF. There is a problem however: the saturated pixels of the bright stars are going to cause problems in the segmentation phase. To see this problem, let’s make a $1000\times1000$ crop around a bright star to speed up the test (and its solution). Afterwards we will apply the solution to the whole image. $ astcrop flat/67510.fits --mode=wcs --widthinpix --width=1000 \ --center=203.3916736,46.7968652 --output=saturated.fits $ astnoisechisel saturated.fits --output=sat-nc.fits $ astsegment sat-nc.fits --output=sat-seg.fits $ astscript-fits-view sat-seg.fits Have a look at the ‘CLUMPS’ extension. You will see that instead of a single clump at the center of the bright star, we have many clumps! This has happened because of the saturated pixels! When saturation occurs, the sharp peak of the profile is lost (like cutting off the tip of a mountain to build a telescope!) and all saturated pixels get a noisy value close to the saturation level. To see this saturation noise run the last command again and in SAO DS9, set the “Scale” to “min max” and zoom into the center. 
You will see the noisy saturation pixels at the center of the star in red. This noise-at-the-peak disrupts Segment’s assumption to expand clumps from a local maxima: each noisy peak is being treated as a separate local maxima and thus a separate clump. For more on how Segment defines clumps, see Section 3.2.1 and Figure 8 of Akhlaghi & Ichikawa 2015 (https://arxiv.org/abs/1505.01664). To have the center identified as a single clump, we should mask these saturated pixels in a way that suites Segment’s non-parametric methodology. First we need to find the saturation level! The saturation level is usually fixed for any survey or input data that you receive from a certain database, so you will usually have to do this only once (the first time you get data from that database). Let’s make a smaller crop of $50\times50$ pixels around the star with the first command below. With the next command, please look at the crop with DS9 to visually understand the problem. You will see the saturated pixels as the noisy red pixels in the center of the image. A non-saturated star will have a single pixel as the maximum and will not have a such a large area covered by a noisy constant value (find a few stars in the image and see for yourself). Visual and qualitative inspection of the process is very important for understanding the solution. $ astcrop saturated.fits --mode=wcs --widthinpix --width=50 \ --center=203.3916736,46.7968652 --output=sat-center.fits $ astscript-fits-view sat-center.fits --ds9scale=minmax To quantitatively identify the saturation level in this image, let’s have a look at the distribution of pixels with a value larger than 100 (above the noise level): $ aststatistics sat-center.fits --greaterequal=100 Histogram: |* |* |* |* |* * |** * |*** ** |**** ** |****** **** |********** * * * ****** |************************* ************ * *** ******* *** ************ |---------------------------------------------------------------------- The peak you see in the right end (larger values) of the histogram shows the saturated pixels (a constant level, with some scatter due to the large Poisson noise). If there was no saturation, the number of pixels should have decreased at increasing values; until reaching the maximum value of the profile in one pixel. But that is not the case here. Please try this experiment on a non-saturated (fainter) star to see what we mean. If you still have not experimented on a non-saturated star, please stop reading this tutorial! Please open ‘flat/67510.fits’ in DS9, select a fainter/smaller star and repeat the last three commands (with a different center). After you have confirmed the point above (visually, and with the histogram), please continue with the rest of this tutorial. Finding the saturation level is easy with Statistics (by using the ‘--lessthan’ option until the histogram becomes as expected: only decreasing). First, let’s try ‘--lessthan=3000’: $ aststatistics sat-center.fits --greaterequal=100 --lessthan=3000 ------- Histogram: |* |* |* |* |* |** |*** * |**** * |******* ** |*********** * * * * * * * **** |************************* * ***** ******* ***** ** ***** * ******** |---------------------------------------------------------------------- We still see an increase in the histogram around 3000. 
Let’s try a threshold of 2500: $ aststatistics sat-center.fits --greaterequal=100 --lessthan=2500 ------- Histogram: |* |* |** |** |** |** |**** |***** |********* |************* * * * * |********************************* ** ** ** *** ** * **** ** ***** |---------------------------------------------------------------------- The peak at the large end of the histogram has gone! But let’s have a closer look at the values (the resolution of an ASCII histogram is limited!). To do this, we will ask Statistics to save the histogram into a table with the ‘--histogram’ option, then look at the last 20 rows: $ aststatistics sat-center.fits --greaterequal=100 --lessthan=2500 \ --histogram --output=sat-center-hist.fits $ asttable sat-center-hist.fits --tail=20 2021.1849112701 1 2045.0495397186 0 2068.9141681671 1 2092.7787966156 1 2116.6434250641 0 2140.5080535126 0 2164.3726819611 0 2188.2373104095 0 2212.101938858 1 2235.9665673065 1 2259.831195755 2 2283.6958242035 0 2307.560452652 0 2331.4250811005 1 2355.289709549 1 2379.1543379974 1 2403.0189664459 2 2426.8835948944 1 2450.7482233429 2 2474.6128517914 2 Since the number of points at the extreme end are increasing (from 1 to 2), We therefore see that a value 2500 is still above the saturation level (the number of pixels has started to increase)! A more reasonable saturation level for this image would be 2200! As an exercise, you can try automating this selection with AWK. Therefore, we can set the saturation level in this image(1) to be 2200. Let’s mask all such pixels with the command below: $ astarithmetic saturated.fits set-i i i 2200 gt nan where \ --output=sat-masked.fits $ astscript-fits-view sat-masked.fits --ds9scale=minmax You will see the peaks of several bright stars, not just the central very bright star. Zoom into each of the peaks you see. Besides the central very bright one that we were looking at closely until now, only one other star is saturated (its center is NaN, or Not-a-Number). Try to find it. But we are not done yet! Please zoom-in to that central bright star and have another look on the edges of the vertical “bleeding” saturated pixels, there are strong positive/negative values touching it (almost like “waves”). These will also cause problems and have to be masked! So with a small addition to the previous command, let’s dilate the saturated regions (with 2-connectivity, or 8-connected neighbors) four times and have another look: $ astarithmetic saturated.fits set-i i i 2200 gt \ 2 dilate 2 dilate 2 dilate 2 dilate \ nan where --output=sat-masked.fits $ astscript-fits-view sat-masked.fits --ds9scale=minmax Now that saturated pixels (and their problematic neighbors) have been masked, we can convolve the image (recall that Segment will use the convolved image for identifying clumps) with the command below. However, we will use the Spatial Domain convolution which can account for blank pixels (for more on the pros and cons of spatial and frequency domain convolution, see *note Spatial vs. Frequency domain::). We will also create a Gaussian kernel with $\rm{FWHM}=2$ pixels, truncated at $5\times\rm{FWHM}$. $ astmkprof --kernel=gaussian,2,5 --oversample=1 -okernel.fits $ astconvolve sat-masked.fits --kernel=kernel.fits --domain=spatial \ --output=sat-masked-conv.fits $ astscript-fits-view sat-masked-conv.fits --ds9scale=minmax Please zoom-in to the star and look closely to see how after spatial-domain convolution, the problematic pixels are still NaN. But Segment requires the profile to start with a maximum value and decrease. 
So before feeding into Segment, let’s fill the blank values with the maximum value of the neighboring pixels in both the input and convolved images (see *note Interpolation operators::):

     $ astarithmetic sat-masked.fits 2 interpolate-maxofregion \
                     --output=sat-fill.fits
     $ astarithmetic sat-masked-conv.fits 2 interpolate-maxofregion \
                     --output=sat-fill-conv.fits
     $ astscript-fits-view sat-fill* --ds9scale=minmax

Have a closer look at the opened images. Please zoom-in (you will notice that they are already matched and locked, so they will both zoom-in together). Go to the centers of the saturated stars and confirm how they are filled with the largest non-blank pixel. We can now feed this image to NoiseChisel and Segment as the convolved image:

     $ astnoisechisel sat-fill.fits --convolved=sat-fill-conv.fits \
                      --output=sat-nc.fits
     $ astsegment sat-nc.fits --convolved=sat-fill-conv.fits \
                  --output=sat-seg.fits --rawoutput
     $ ds9 -mecube sat-seg.fits -zoom to fit -scale limits -1 1

See the ‘CLUMPS’ extension. Do you see how the whole center of the star has indeed been identified as a single clump? We thus achieved our aim and did not let the saturated pixels harm the identification of the center!

If the issue was only clumps (as in normal deep-image processing), this would be the end of Segment’s special considerations. However, in the scenario here, with the very extended wings of the bright stars, it usually happens that background objects become “clumps” in the outskirts and will rip the bright star outskirts into separate “objects”. In the next section (*note One object for the whole detection::), we will describe how you can modify Segment to avoid this issue.

---------- Footnotes ----------

(1) In raw exposures, this value is usually around 65000 (close to $2^{16}$, since most CCDs have 16-bit pixels; see *note Numeric data types::). But that is not the case here, because this is a processed/stacked image that has been calibrated.

2.3.3 One object for the whole detection
----------------------------------------

In *note Saturated pixels and Segment's clumps::, we described how you can run Segment such that saturated pixels do not interfere with its clumps. However, due to the very extended wings of the PSF, the default definition of “objects” should also be modified for the scenario here. To better see the problem, let’s now inspect the ‘OBJECTS’ extension, focusing on those objects with a label between 50 and 150 (which include the main star):

     $ astscript-fits-view sat-seg.fits -hOBJECTS --ds9scale="limits 50 150"

We can see that the detection corresponding to the star has been broken into different objects. This is not a good object segmentation image for our scenario here, since those objects in the outer wings of the bright star’s detection harbor a lot of the extended PSF. We want to keep them with the same “object” label as the star (we only need to mask the “clumps” of the background sources). To do this, we will make the following changes to Segment’s options (see *note Segmentation options:: for more on these options):

   • Since we want the extended diffuse flux of the PSF to be taken as a single object, we want all the grown clumps to touch. Therefore, it is necessary to decrease ‘--gthresh’ to very low values, like $-10$. Recall that its value is in units of the input standard deviation, so ‘--gthresh=-10’ corresponds to $-10\sigma$. The default value is not intended for such extended sources that dominate all the background sources.
• Since we want all connected grown clumps to be counted as a single object in any case, we will set ‘--objbordersn=0’ (its smallest possible value). Let’s make these changes and check if the star has been kept as a single object in the ‘OBJECTS’ extension or not: $ astsegment sat-nc.fits --convolved=sat-fill-conv.fits \ --gthresh=-10 --objbordersn=0 \ --output=sat-seg.fits --rawoutput $ astscript-fits-view sat-seg.fits -hOBJECTS --ds9scale="limits 50 150" Now we can extend these same steps to the whole image. To detect signal, we can run NoiseChisel using the command below. We modified the default value to two of the options, below you can see the reason for these changes. See *note Detecting large extended targets:: for more on optimizing NoiseChisel. • Since the image is so large, we have increased ‘--interpnumngb’ to get better outlier statistics on the tiles. The default value is primarily for small images, so this is usually the first thing you should do when running NoiseChisel on a real/large image. • Since the image is not too deep (made from few exposures), it does not have strong correlated noise, so we will decrease ‘--detgrowquant’ and increase ‘--detgrowmaxholesize’ to better extract signal. Furthermore, since both NoiseChisel and Segment need a convolved image, we will do the convolution before and feed it to both (to save running time). But in the first command below, let’s delete all the temporary files we made above. Since the image is large (+300 MB), to avoid wasting storage, any temporary file that is no longer necessary for later processing is deleted after it is used. You can visually check each of them with DS9 before deleting them (or not delete them at all!). Generally, within a pipeline it is best to remove such large temporary files, because space runs out much faster than you think (for example, once you get good results and want to use more fields). $ rm *.fits $ mkdir label $ astmkprof --kernel=gaussian,2,5 --oversample=1 \ -olabel/kernel.fits $ astarithmetic flat/67510.fits set-i i i 2200 gt \ 2 dilate 2 dilate 2 dilate 2 dilate nan where \ --output=label/67510-masked-sat.fits $ astconvolve label/67510-masked-sat.fits --kernel=label/kernel.fits \ --domain=spatial --output=label/67510-masked-conv.fits $ rm label/kernel.fits $ astarithmetic label/67510-masked-sat.fits 2 interpolate-maxofregion \ --output=label/67510-fill.fits $ astarithmetic label/67510-masked-conv.fits 2 interpolate-maxofregion \ --output=label/67510-fill-conv.fits $ rm label/67510-masked-conv.fits $ astnoisechisel label/67510-fill.fits --interpnumngb=100 \ --detgrowquant=0.8 --detgrowmaxholesize=100000 \ --convolved=label/67510-fill-conv.fits \ --output=label/67510-nc.fits $ rm label/67510-fill.fits $ astsegment label/67510-nc.fits --output=label/67510-seg-raw.fits \ --convolved=label/67510-fill-conv.fits --rawoutput \ --gthresh=-10 --objbordersn=0 $ rm label/67510-fill-conv.fits $ astscript-fits-view label/67510-seg-raw.fits We see that the saturated pixels have not caused any problem and the central clumps/objects of bright stars are now a single clump/object. We can now proceed to estimating the outer PSF. But before that, let’s make a “standard” segment output: one that can safely be fed into MakeCatalog for measurements and can contain all necessary outputs of this whole process in a single file (as multiple extensions). The main problem is again the saturated pixels: we interpolated them to be the maximum of their nearby pixels. 
But this will cause problems in any measurement that is done over those regions. To let MakeCatalog know that those pixels should not be used, the first extension of the file given to MakeCatalog should have blank values on those pixels. We will do this with the commands below: ## First HDU of Segment (Sky-subtracted input) $ astarithmetic label/67510-nc.fits -hINPUT-NO-SKY \ label/67510-masked-sat.fits isblank nan where \ --output=label/67510-seg.fits $ astfits label/67510-seg.fits --update=EXTNAME,INPUT-NO-SKY ## Second and third HDUs: CLUMPS and OBJECTS $ astfits label/67510-seg-raw.fits --copy=CLUMPS --copy=OBJECTS \ --output=label/67510-seg.fits ## Fourth HDU: Sky standard deviation (from NoiseChisel): $ astfits label/67510-nc.fits --copy=SKY_STD \ --output=label/67510-seg.fits ## Clean up all the un-necessary files: $ rm label/67510-masked-sat.fits label/67510-nc.fits \ label/67510-seg-raw.fits You can now simply run MakeCatalog on this image and be sure that saturated pixels will not affect the measurements. As one example, you can use MakeCatalog to find the clumps containing saturated pixels: recall that the ‘--area’ column only calculates the area of non-blank pixels, while ‘--geoarea’ calculates the area of the label (independent of their blank-ness in the values image): $ astmkcatalog label/67510-seg.fits --ids --ra --dec --area --geoarea \ --clumpscat --output=cat.fits The information of the clumps that have been affected by saturation can easily be found by selecting those with a differing value in the ‘AREA’ and ‘AREA_FULL’ columns: ## With AWK (second command, counts the number of rows) $ asttable cat.fits -hCLUMPS | awk '$5!=$6' $ asttable cat.fits -hCLUMPS | awk '$5!=$6' | wc -l ## Using Table arithmetic (compared to AWK, you can use column ## names, save as FITS, and be faster): $ asttable cat.fits -hCLUMPS -cRA,DEC --noblankend=3 \ -c'arith AREA AREA AREA_FULL eq nan where' ## Remove the table (which was just for a demo) $ rm cat.fits We are now ready to start building the outer parts of the PSF in *note Building outer part of PSF::. 2.3.4 Building outer part of PSF -------------------------------- In *note Preparing input for extended PSF::, we described how to create a Segment clump and object map, while accounting for saturated stars and not having over-fragmentation of objects in the outskirts of stars. We are now ready to start building the extended PSF. First we will build the outer parts of the PSF, so we want the brightest stars. You will see we have several bright stars in this very large field of view, but we do not yet have a feeling how many they are, and at what magnitudes. So let’s use Gnuastro’s Query program to find the magnitudes of the brightest stars (those brighter than g-magnitude 10 in Gaia early data release 3, or eDR3). For more on Query, see *note Query::. $ astquery gaia --dataset=edr3 --overlapwith=flat/67510.fits \ --range=phot_g_mean_mag,-inf,10 \ --output=flat/67510-bright.fits Now, we can easily visualize the magnitude and positions of these stars using ‘astscript-ds9-region’ and the command below (for more on this script, see *note SAO DS9 region files from table::) $ astscript-ds9-region flat/67510-bright.fits -cra,dec \ --namecol=phot_g_mean_mag \ --command="ds9 flat/67510.fits -zoom to fit -zscale" You can see that we have several stars between magnitudes 6 to 10. Let’s use ‘astscript-psf-select-stars’ in the command below to select the relevant stars in the image (the brightest; with a magnitude between 6 to 10). 
The advantage of using this script (instead of a simple ‘--range’ in Table) is that it will also check distances to nearby stars and reject those that are too close (and not good for constructing the PSF). Since we have very bright stars in this very wide-field image, we will also increase the distance to nearby neighbors with brighter or similar magnitudes (the default value is 1 arcmin). To do this, we will set ‘--mindistdeg=0.02’, which corresponds to 1.2 arcmin.

     $ mkdir outer
     $ astscript-psf-select-stars flat/67510.fits \
                --magnituderange=6,10 --mindistdeg=0.02 \
                --output=outer/67510-6-10.fits

Let’s have a look at the selected stars in the image (it is very important to visually check every step when you are first discovering a new dataset).

     $ astscript-ds9-region outer/67510-6-10.fits -cra,dec \
                --namecol=phot_g_mean_mag \
                --command="ds9 flat/67510.fits -zoom to fit -zscale"

Now that the catalog of good stars is ready, it is time to construct the individual stamps from the catalog above. To do that, we will use ‘astscript-psf-stamp’. One of the most important parameters for this script is the normalization radii (‘--normradii’). This parameter defines a ring for the flux normalization of each star stamp. The normalization of the flux is necessary because each star has a different brightness, and consequently, it is crucial that all the stamps have the same flux level in the same region. Otherwise the final stack of the different stamps would make no sense. Depending on the PSF shape, internal reflections, ghosts, saturated pixels, and other systematics, it is necessary to choose ‘--normradii’ appropriately.

The selection of the normalization radii is something that requires a good understanding of the data. To do that, let’s use two useful options that will help us inspect the data: ‘--tmpdir’ and ‘--keeptmp’:

   • With ‘--tmpdir=finding-normradii’ all temporary files, including the radial profiles, will be saved in that directory (instead of an internally-created name).

   • With ‘--keeptmp’ we will not remove the temporary files, so it is possible to have a look at them (by default the temporary directory gets deleted at the end).

It is necessary to specify ‘--normradii’ even if we do not yet know the final values. Otherwise the script will not generate the radial profile. As a consequence, in this step we set the normalization radii equal to the size of the stamps. By doing this, the script will generate the radial profile of the entire stamp. In this particular step we set it to ‘--normradii=500,510’. We also use the ‘--nocentering’ option to disable sub-pixel warping in this phase (it is only relevant for the central part of the PSF). Furthermore, since there are several stars, we iterate over each row of the catalog using a while loop.

     $ counter=1
     $ mkdir finding-normradii
     $ asttable outer/67510-6-10.fits \
                | while read -r ra dec mag; do
                    astscript-psf-stamp label/67510-seg.fits \
                          --mode=wcs \
                          --nocentering \
                          --center=$ra,$dec \
                          --normradii=500,510 \
                          --widthinpix=1000,1000 \
                          --segment=label/67510-seg.fits \
                          --output=finding-normradii/$counter.fits \
                          --tmpdir=finding-normradii --keeptmp; \
                    counter=$((counter+1)); \
                  done

First let’s have a look at all the masked postage stamps of the cropped stars. Once they all open, feel free to zoom in; they are all matched and locked.
It is always good to check the different stamps to ensure the quality and to spot possible two-dimensional features that are difficult to detect from the radial profiles (such as ghosts and internal reflections).

     $ astscript-fits-view finding-normradii/cropped-masked*.fits

If everything looks good in the image, let’s open all the radial profiles and visually check those with the command below. Note that ‘astscript-fits-view’ calls the ‘topcat’ graphical user interface (GUI) program to visually inspect (plot) tables. If you do not already have it, see *note TOPCAT::.

     $ astscript-fits-view finding-normradii/rprofile*.fits

After some study of this data, we could say that a good normalization ring is the one made of pixels between R=20 and R=30. Such a ring contains a high number of pixels, so the estimation of the flux normalization will be robust. Also, at such a distance from the center the signal-to-noise ratio is high and there are no obvious features that can affect the normalization. Note that the profiles are different because we are considering a wide range of magnitudes, so the fainter stars are much noisier. However, in this tutorial we will keep these stars in order to have a higher number of stars for the outer part. In a real case scenario, we should look for stars with a much more similar brightness (a smaller range of magnitudes) so as not to lose signal-to-noise as a consequence of including fainter stars.

     $ rm -r finding-normradii
     $ counter=1
     $ mkdir outer/stamps
     $ asttable outer/67510-6-10.fits \
                | while read -r ra dec mag; do
                    astscript-psf-stamp label/67510-seg.fits \
                          --mode=wcs \
                          --nocentering \
                          --center=$ra,$dec \
                          --normradii=20,30 \
                          --widthinpix=1000,1000 \
                          --segment=label/67510-seg.fits \
                          --output=outer/stamps/67510-$counter.fits; \
                    counter=$((counter+1)); \
                  done

After the stamps are created, we need to stack them together with a simple Arithmetic command (see *note Stacking operators::). The stack is done using the sigma-clipped mean operator that will preserve more of the signal, while rejecting outliers (more than $3\sigma$ with a tolerance of $0.2$; for more on sigma-clipping see *note Sigma clipping::). Just recall that we need to specify the number of inputs into the stacking operators, so we are reading the list of images and counting them as separate variables before calling Arithmetic.

     $ imgs=outer/stamps/*.fits
     $ numimgs=$(echo $imgs | wc -w)
     $ astarithmetic $imgs $numimgs 3 0.2 sigclip-mean -g1 \
                     --output=outer/stack.fits --wcsfile=none

Did you notice the ‘--wcsfile=none’ option above? With it, the stacked image no longer has any WCS information. This is natural, because the stacked image does not correspond to any specific region of the sky any more.

Let’s compare this stacked PSF with the images of the individual stars that were used to create it. You can clearly see that the number of masked pixels is significantly decreased and the PSF is much “cleaner”.

     $ astscript-fits-view outer/stack.fits outer/stamps/*.fits

However, the saturation in the center still remains. Also, because we did not have too many images, some regions are still very noisy. If we had more bright stars in our selected magnitude range, we could have filled those outer remaining patches. In a large survey like J-PLUS (that we are using here), you can simply look into other fields that were observed soon before/after the image ID 67510 that we used here (to have a similar PSF) and get more stars in those images to add to these.
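For instance, if you later build stamps from a second pointing in the same way as above, adding them to the stack only requires listing them together before counting. The sketch below assumes a hypothetical second pointing with ID 67512 (this exact ID is only for illustration; any field with a similar PSF would do):

     ## Hypothetical sketch: the '67512' stamps are assumed to have been
     ## built exactly like the '67510' ones above.
     $ imgs="outer/stamps/67510-*.fits outer/stamps/67512-*.fits"
     $ numimgs=$(echo $imgs | wc -w)
     $ astarithmetic $imgs $numimgs 3 0.2 sigclip-mean -g1 \
                     --output=outer/stack.fits --wcsfile=none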
In fact, the J-PLUS DR2 image ID of the field above was intentionally preserved during the steps above to show how easy it is to use images from other fields and blend them all into the output PSF. 2.3.5 Inner part of the PSF --------------------------- In *note Building outer part of PSF::, we were able to create a stack of the outer-most behavior of the PSF in a J-PLUS survey image. But the central part that was affected by saturation and non-linearity is still remaining, and we still do not have a “complete” PSF! In this section, we will use the same steps before to make stacks of more inner regions of the PSF to ultimately unite them all into a single PSF in *note Uniting the different PSF components::. For the outer PSF, we selected stars in the magnitude range of 6 to 10. So let’s have a look and see how many stars we have in the magnitude range of 12 to 13 with a more relaxed condition on the minimum distance for neighbors, ‘--mindistdeg=0.01’ (36 arcsec, since these stars are fainter), and use the ds9 region script to visually inspect them: $ mkdir inner $ astscript-psf-select-stars flat/67510.fits \ --magnituderange=12,13 --mindistdeg=0.01 \ --output=inner/67510-12-13.fits $ astscript-ds9-region inner/67510-12-13.fits -cra,dec \ --namecol=phot_g_mean_mag \ --command="ds9 flat/67510.fits -zoom to fit -zscale" We have 41 stars, but if you zoom into their centers, you will see that they do not have any major bleeding-vertical saturation any more. Only the very central core of some of the stars is saturated. We can therefore use these stars to fill the strong bleeding footprints that were present in the outer stack of ‘outer/stack.fits’. Similar to before, let’s build ready-to-stack crops of these stars. To get a better feeling of the normalization radii, follow the same steps of *note Building outer part of PSF:: (setting ‘--tmpdir’ and ‘--keeptmp’). In this case, since the stars are fainter, we can set a smaller size for the individual stamps, ‘--widthinpix=500,500’, to speed up the calculations: $ counter=1 $ mkdir inner/stamps $ asttable inner/67510-12-13.fits \ | while read -r ra dec mag; do astscript-psf-stamp label/67510-seg.fits \ --mode=wcs \ --normradii=5,10 \ --center=$ra,$dec \ --widthinpix=500,500 \ --segment=label/67510-seg.fits \ --output=inner/stamps/67510-$counter.fits; \ counter=$((counter+1)); \ done $ imgs=inner/stamps/*.fits $ numimgs=$(echo $imgs | wc -w) $ astarithmetic $imgs $numimgs 3 0.2 sigclip-mean -g1 \ --output=inner/stack.fits --wcsfile=none $ astscript-fits-view inner/stack.fits inner/stamps/*.fits We are now ready to unite the two stacks we have constructed: the outer and the inner parts. 2.3.6 Uniting the different PSF components ------------------------------------------ In *note Building outer part of PSF:: we built the outer part of the extended PSF and the inner part was built in *note Inner part of the PSF::. The outer part was built with very bright stars, and the inner part using fainter stars to not have saturation in the core of the PSF. The next step is to join these two parts in order to have a single PSF. First of all, let’s have a look at the two stacks and also to their radial profiles to have a good feeling of the task. Note that you will need to have TOPCAT to run the last command and plot the radial profile (see *note TOPCAT::). 
     $ astscript-fits-view outer/stack.fits inner/stack.fits
     $ astscript-radial-profile outer/stack.fits -o outer/profile.fits
     $ astscript-radial-profile inner/stack.fits -o inner/profile.fits
     $ astscript-fits-view outer/profile.fits inner/profile.fits

From the visual inspection of the images and the radial profiles, it is clear that we have saturation in the center for the outer part. Note that the absolute flux values of the PSFs are meaningless since they depend on the normalization radii we used to obtain them. The uniting step consists of scaling up (or down) the inner part of the PSF to have the same flux at the junction radius, and then using that flux-scaled inner part to fill the center of the outer PSF. To get a feeling of the process, let’s first open the two radial profiles and find the factor manually:

  1. Run this command to open the two tables in *note TOPCAT:::

          $ astscript-fits-view outer/profile.fits inner/profile.fits

  2. On the left side of the screen, under “Table List”, you will see the two imported tables. Click on the first one (profile of the outer part) so it is shown first.

  3. Under the “Graphics” menu item, click on “Plane plot”. A new window will open with the plot of the first two columns: ‘RADIUS’ on the horizontal axis and ‘MEAN’ on the vertical. The rest of the steps are done in this window.

  4. In the bottom settings, within the left panel, click on the “Axes” item. This will allow customization of the plot axes.

  5. In the bottom-right panel, click on the box in front of “Y Log” to make the vertical axis logarithmic-scaled.

  6. On the “Layers” menu, select “Add Position Control” to allow adding the profile of the inner region. Afterwards, you will see that a new red-blue scatter plot icon has opened in the bottom-left menu.

  7. On the bottom-right panel, in the drop-down menu in front of ‘Table:’, select ‘2: profile.fits’. Afterwards, you will see the radial profile of the inner stack as the newly added blue plot. Our goal here is to find the factor that is necessary to multiply with the inner profile so it matches the outer one.

  8. On the bottom-right panel, in front of ‘Y:’, you will see ‘MEAN’. Click in the white-space after it, and type this: ‘*100’. This will display the ‘MEAN’ column of the inner profile, after multiplying it by 100. Afterwards, you will see that the inner profile (blue) matches more cleanly with the outer (red); especially in the smaller radii. At larger radii, it does not drop like the red plot. This is because of the extremely low signal-to-noise ratio at those regions in the fainter stars used to make this stack.

  9. Take your mouse cursor over the profile, in particular over the bump around a radius of 100 pixels. Scroll your mouse down-ward to zoom-in to the profile and up-ward to zoom-out. You can also click-and-hold any part of the profile, then move your cursor (while still holding the mouse-button) to look at different parts of the profile. This is particularly helpful when you have zoomed-in to the profile.

  10. Zoom-in to the bump around a radius of 100 pixels until the horizontal axis range becomes around 50 to 130 pixels.

  11. You clearly see that the inner stack (blue) is much noisier than the outer (red) stack. By “noisy”, we mean that the scatter of the points is much larger. If you further zoom-out, you will see that the shallow slope at the larger radii of the inner (blue) profile has also affected the height of this bump in the inner profile.
This is a _very important_ point: it clearly shows that the inner profile is too noisy at these radii.

  12. Click-and-hold your mouse to see the inner parts of the two profiles (in the range 0 to 80). You will see that for radii less than 40 pixels, the inner profile (blue) points lose their scatter (and thus have a good signal-to-noise ratio).

  13. Zoom-in to the plot and follow the profiles until smaller radii (for example, 10 pixels). You see that for each radius, the inner (blue) points are consistently above the outer (red) points. This shows that the $\times100$ factor we selected above was too much.

  14. In the bottom-right panel, change the ‘100’ to ‘80’ and zoom-in to the same region. At each radius, the blue points are now below the red points, so the scale-factor 80 is not enough. So let’s increase it and try ‘90’. After zooming-in, you will notice that in the inner radii (less than 30 pixels), they are now very similar. The ultimate aim of the steps below is to find this factor automatically.

  15. But before continuing, let’s focus on another important point about the central regions: non-linearity and saturation. While you are zoomed-in (from the step above), follow (click-and-drag) the profile towards smaller radii. You will see that at radii smaller than 10 pixels, they start to diverge. But this time, the outer (red) profile is getting a shallower slope and diverges significantly from about a radius of 8 pixels. We had masked all saturated pixels before, so this divergence for radii smaller than 10 shows the effect of the CCD’s non-linearity (where the number of electrons will not be linearly correlated with the number of incident photons). This is present in all CCDs and pixels beyond this level should not be used in measurements (or should be properly corrected).

The items above were only listed so you get a good mental/visual understanding of the logic behind the operation of the next script (and to learn how to tune its parameters where necessary): ‘astscript-psf-scale-factor’. This script is more general than this particular problem, but can be used for this special case also. Its job is to take a model of an object (the PSF, or inner stack in this case) and the position of an instance of that model (a star, or the outer stack in this case) in a larger image. Instead of dealing with radial profiles (that enforce a certain shape), this script will put the centers of the inner and outer PSFs over each other and divide the outer by the inner. Let’s have a look with the command below. Just note that we are running it with ‘--keeptmp’ so the temporary directory with all the intermediate files remains for further inspection:

     $ astscript-psf-scale-factor outer/stack.fits \
                --psf=inner/stack.fits --center=501,501 \
                --mode=img --normradii=10,15 --keeptmp

     $ astscript-fits-view stack_psfmodelscalefactor/cropped-*.fits \
                stack_psfmodelscalefactor/for-factor-*.fits

With the second command, you see the four steps of the process: the first two images show the cropped outer and inner stacks (cut to the same width). The third shows the radial position of each pixel (which is used to only keep the pixels within the desired radial range). The fourth shows the per-pixel division of the outer by the inner within the requested radii. The sigma-clipped median of these pixels is finally reported. Unlike the radial profile method (which averages over a circular/elliptical annulus for each radius), this method imposes no a-priori shape on the PSF. This makes it very useful for complex PSFs (like the case here).
To continue, let’s remove the temporary directory and re-run the script, but with ‘--quiet’ mode so we can put the output in a shell variable.

     $ rm -r stack_psfmodelscalefactor
     $ scale=$(astscript-psf-scale-factor outer/stack.fits \
                    --psf=inner/stack.fits --center=501,501 \
                    --mode=img --normradii=10,15 --quiet)
     $ echo $scale

Now that we know the scaling factor, we are ready to unite the outer and the inner part of the PSF. To do that, we will use the script ‘astscript-psf-unite’ with the command below (for more on this script, see *note Invoking astscript-psf-unite::). The basic parameters are the inner part of the PSF (given to ‘--inner’), the inner part’s scale factor (‘--scale’), and the junction radius (‘--radius’). The inner part is first scaled, and all the pixels of the outer image within the given radius are replaced with the pixels of the inner image. Since the flux factor was computed for a ring of pixels between 10 and 15 pixels, let’s set the junction radius to be 12 pixels (roughly in between 10 and 15):

     $ astscript-psf-unite outer/stack.fits \
                --inner=inner/stack.fits --radius=12 \
                --scale=$scale --output=psf.fits

Let’s have a look at the outer stack and the final PSF with the command below. Since we want several other DS9 settings to help you directly see the main point, we are using ‘--ds9extra’. After DS9 is opened, you can see that the center of the PSF has now been nicely filled. You can click on the “Edit” button and then the “Colorbar” and hold your cursor over the image and move it. You can see that besides filling the inner regions nicely, there is also no major discontinuity in the 2D image at the junction radius of 12 pixels around the center.

     $ astscript-fits-view outer/stack.fits psf.fits --ds9scale=minmax \
                --ds9extra="-scale limits 0 22000 -match scale" \
                --ds9extra="-lock scale yes -zoom 4 -scale log"

Nothing demonstrates the effect of a bad analysis better than actually seeing a bad result! So let’s choose a bad normalization radial range (50 to 60 pixels) and unite the inner and outer parts based on that. The last command will open the two PSFs together in DS9; you should be able to immediately see the discontinuity at the union radius.

     $ scale=$(astscript-psf-scale-factor outer/stack.fits \
                    --psf=inner/stack.fits --center=501,501 \
                    --mode=img --normradii=50,60 --quiet)

     $ astscript-psf-unite outer/stack.fits \
                --inner=inner/stack.fits --radius=55 \
                --scale=$scale --output=psf-bad.fits

     $ astscript-fits-view psf-bad.fits psf.fits --ds9scale=minmax \
                --ds9extra="-scale limits 0 50 -match scale" \
                --ds9extra="-lock scale yes -zoom 4 -scale log"

As you see, the selection of the normalization radii and the junction radius is very important. The first time you are trying to build the PSF of a new dataset, it has to be explored with a visual inspection of the images and radial profiles. Once you have found a good normalization radius for a certain part of the PSF in a survey, you can generally use it comfortably without change. But for a new survey, or a different part of the PSF, be sure to repeat the visual checks above to choose the best radii. As a summary, a good junction radius is one that:

   • Is large enough to not let saturation and non-linearity (from the outer profile) into the inner region.

   • Is small enough to have a sufficiently high signal-to-noise ratio (from the inner profile) to avoid adding noise at the union radius.
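As an optional sanity check (this is not a required step of the tutorial), you can also confirm the smoothness of the junction quantitatively: the radial profile of the good united PSF (‘psf.fits’) should show no jump around the junction radius of 12 pixels. As with the profiles above, no ‘--center’ is needed because the PSF is centered in the image (the output name below is arbitrary):

     $ astscript-radial-profile psf.fits --output=psf-profile.fits
     $ astscript-fits-view psf-profile.fits
     $ rm psf-profile.fits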
Now that the complete PSF has been obtained, let’s remove that bad-looking PSF, and stick with the nice and clean PSF for the next step in *note Subtracting the PSF::. $ rm -rf psf-bad.fits 2.3.7 Subtracting the PSF ------------------------- Previously (in *note Uniting the different PSF components::) we constructed a full PSF, from the central pixel to a radius of 500 pixels. Now, let’s use the PSF to subtract the scattered light from each individual star in the image. By construction, the pixel values of the PSF came from the normalization of the individual stamps (that were created for stars of different magnitudes). As a consequence, it is necessary to compute a scale factor to fit that PSF image to each star. This is done with the same ‘astscript-psf-scale-factor’ command that we used previously in *note Uniting the different PSF components::. The difference is that now we are not aiming to join two different PSF parts but looking for the necessary scale factor to match the star with the PSF. Afterwards, we will use ‘astscript-psf-subtract’ for placing the PSF image at the desired coordinates within the same pixel grid as the image. Finally, once the stars have been modeled by the PSF, we will subtract it. First, let’s start with a single star. Later, when the basic idea has been explained, we will generalize the method for any number of stars. With the following command we obtain the coordinates (RA and DEC) and magnitude of the brightest star in the image (which is on the top edge of the image): $ mkdir single-star $ center=$(asttable flat/67510-bright.fits --sort phot_g_mean_mag \ --column=ra,dec --head 1 \ | awk '{printf "%s,%s", $1, $2}') $ echo $center With the center position of that star, let’s obtain the flux factor using the same normalization ring we used for the creation of the outer part of the PSF: $ scale=$(astscript-psf-scale-factor label/67510-seg.fits \ --mode=wcs --quiet \ --psf=psf.fits \ --center=$center \ --normradii=10,15 \ --segment=label/67510-seg.fits) Now we have all the information necessary to model the star using the PSF: the position on the sky and the flux factor. Let’s use this data with the script ‘astscript-psf-subtract’ for modeling this star and have a look with DS9. $ astscript-psf-subtract label/67510-seg.fits \ --mode=wcs \ --psf=psf.fits \ --scale=$scale \ --center=$center \ --output=single-star/subtracted.fits $ astscript-fits-view label/67510-seg.fits single-star/subtracted.fits \ --ds9center=$center --ds9mode=wcs --ds9extra="-zoom 4" You will notice that there is something wrong with this “subtraction”! The box of the extended PSF is clearly visible! The sky noise under the box is clearly larger than the rest of the noise in the image. Before reading on, please try to think about the cause of this yourself. To understand the cause, let’s look at the scale factor, the number of stamps used to build the outer part (and its square root): $ echo $scale $ ls outer/stamps/*.fits | wc -l $ ls outer/stamps/*.fits | wc -l | awk '{print sqrt($1)}' You see that the scale is almost 19! As a result, the PSF has been multiplied by 19 before being subtracted. However, the outer part of the PSF was created with only a handful of star stamps. When you stack $N$ images, the stack’s signal-to-noise ratio (S/N) improves by $\sqrt{N}$. We had 8 images for the outer part, so the S/N has only improved by a factor of just under 3! 
When we multiply the final stacked PSF with 19, we are also scaling up the noise by that same factor (most importantly: in the outer most regions where there is almost no signal). So the stacked image’s noise-level is $19/3=6.3$ times larger than the noise of the input image. This terrible noise-level is what you clearly see as the footprint of the PSF. To confirm this, let’s use the commands below to subtract the faintest of the bright-stars catalog (note the use of ‘--tail’ when finding the central position). You will notice that the scale factor ($\sim1.3$) is now smaller than 3. So when we multiply the PSF with this factor, the PSF’s noise level is lower than our input image and we should not see any footprint like before. Note also that we are using a larger zoom factor, because this star is smaller in the image. $ center=$(asttable flat/67510-bright.fits --sort phot_g_mean_mag \ --column=ra,dec --tail 1 \ | awk '{printf "%s,%s", $1, $2}') $ scale=$(astscript-psf-scale-factor label/67510-seg.fits \ --mode=wcs --quiet \ --psf=psf.fits \ --center=$center \ --normradii=10,15 \ --segment=label/67510-seg.fits) $ echo $scale $ astscript-psf-subtract label/67510-seg.fits \ --mode=wcs \ --psf=psf.fits \ --scale=$scale \ --center=$center \ --output=single-star/subtracted.fits $ astscript-fits-view label/67510-seg.fits single-star/subtracted.fits \ --ds9center=$center --ds9mode=wcs --ds9extra="-zoom 10" In a large survey like J-PLUS, it is easy to use more and more bright stars from different pointings (ideally with similar FWHM and similar telescope properties(1)) to improve the S/N of the PSF. As explained before, we designed the output files of this tutorial with the ‘67510’ (which is this image’s pointing label in J-PLUS) where necessary so you see how easy it is to add more pointings to use in the creation of the PSF. Let’s consider now more than one single star. We should have two things in mind: • The brightest (subtract-able, see the point below) star should be the first star to be subtracted. This is because of its extended wings which may affect the scale factor of nearby stars. So we should sort the catalog by brightness and come down from the brightest. • We should only subtract stars where the scale factor is less than the S/N of the PSF (in relation to the data). Since it can get a little complex, it is easier to implement this step as a script (that is heavily commented for you to easily understand every step; especially if you put it in a good text editor with color-coding!). You will notice that script also creates a ‘.log’ file, which shows which star was subtracted and which one was not (this is important, and will be used below!). #!/bin/bash # Abort the script on first error. set -e # ID of image to subtract stars from. imageid=67510 # Get S/N level of the final PSF in relation to the actual data: snlevel=$(ls outer/stamps/*.fits | wc -l | awk '{print sqrt($1)}') # Put a copy of the image we want to subtract the PSF from in the # final file (this will be over-written after each subtraction). subtracted=subtracted/$imageid.fits cp label/$imageid-seg.fits $subtracted # Name of log-file to keep status of the subtraction of each star. logname=subtracted/$imageid.log echo "# Column 1: RA [deg, f64] Right ascension of star." > $logname echo "# Column 2: Dec [deg, f64] Declination of star." 
          >> $logname
     echo "# Column 3: Stat [counter, u8] Status (1: subtracted)" >> $logname

     # Go over each item in the bright star catalog:
     asttable flat/67510-bright.fits -cra,dec --sort phot_g_mean_mag \
         | while read -r ra dec; do

         # Put a comma between the RA/Dec to pass to options.
         center=$(echo $ra $dec | awk '{printf "%s,%s", $1, $2}')

         # Calculate the scale value.
         scale=$(astscript-psf-scale-factor label/67510-seg.fits \
                     --mode=wcs --quiet \
                     --psf=psf.fits \
                     --center=$center \
                     --normradii=10,15 \
                     --segment=label/67510-seg.fits)

         # Subtract this star if the scale factor is less than the S/N
         # level calculated above.
         check=$(echo $snlevel $scale \
                     | awk '{if($1>$2) c="good"; else c="bad"; print c}')
         if [ $check = good ]; then

             # A temporary file to subtract this star.
             subtmp=subtracted/$imageid-tmp.fits

             # Subtract this star from the image where all previous stars
             # were subtracted.
             astscript-psf-subtract $subtracted \
                 --mode=wcs \
                 --psf=psf.fits \
                 --scale=$scale \
                 --center=$center \
                 --output=$subtmp

             # Rename the temporary subtracted file to the final one:
             mv $subtmp $subtracted

             # Keep the status for this star.
             status=1
         else
             # Let the user know this star did not work, and keep the
             # status for this star.
             echo "$center: $scale is larger than $snlevel"
             status=0
         fi

         # Keep the status in a log file.
         echo "$ra $dec $status" >> $logname
     done

Copy the contents above into a file called ‘subtract-psf-from-cat.sh’ and run the following commands. Just note that in the script above, we assumed the output is written in the ‘subtracted/’ directory, so we will first make that.

     $ mkdir subtracted
     $ chmod +x subtract-psf-from-cat.sh
     $ ./subtract-psf-from-cat.sh
     $ astscript-fits-view label/67510-seg.fits subtracted/67510.fits

Can you visually find the stars that have been subtracted? It is a little hard, is it not? This shows that you have done a good job this time (the sky noise is not significantly affected)! So let’s subtract the PSF-subtracted image from the actual image to see the scattered light field of the subtracted stars. With the second command below we will zoom into the brightest subtracted star, but of course feel free to zoom-out and inspect the others also.

     $ astarithmetic label/67510-seg.fits subtracted/67510.fits - \
                     --output=scattered-light.fits -g1

     $ center=$(asttable subtracted/67510.log --equal=Stat,1 --head=1 \
                         -cra,dec | awk '{printf "%s,%s", $1, $2}')

     $ astscript-fits-view label/67510-seg.fits subtracted/67510.fits \
                scattered-light.fits \
                --ds9center=$center --ds9mode=wcs \
                --ds9extra="-scale limits -0.5 1.5 -match scale" \
                --ds9extra="-lock scale yes -zoom 10" \
                --ds9extra="-tile mode column"

     ## We can always make it again easily, so let's remove this.
     $ rm scattered-light.fits

You will probably have noticed that in the scattered light field there are some patches that correspond to the saturation of the stars. Since we obtained the scattered light field by subtracting the PSF-subtracted image from the original image, it is natural that we have such saturated regions. To avoid this inconvenience, the script also has an option to skip the subtraction and instead output the modeled star itself. To do that, run the script with the ‘--modelonly’ option. We encourage the reader to build such a scattered-light-field model. In some scenarios this way of correcting the PSF could be useful. For example, if there are many faint stars that can be modeled at the same time because their fluxes do not affect each other.
In such a situation, the task could be easily parallelized without having to wait to model the brighter stars before the fainter ones. At the end, once all stars have been modeled, a simple Arithmetic command could be used to sum the different modeled-PSF stamps to obtain the entire scattered light field.

In general you see that the subtraction has been done nicely and almost all the extended wings of the PSF have been subtracted. The central regions of the stars are not perfectly subtracted:

   • Some may get too dark at the center. This may be due to the non-linearity of the CCD counting (as discussed previously in *note Uniting the different PSF components::).

   • Others may have a strong gradient: one side is too positive and one side is too negative (only in the very central few pixels). This is due to inaccurate positioning: most probably this happens because of imperfect astrometry.

Note also that during this process we assumed that the PSF does not vary with the CCD position or any other parameter. In other words, we are obtaining an averaged PSF model from a few star stamps that are naturally different, and this also explains the residuals on each subtracted star. We leave the modeling and subtraction of other stars (for example, the non-saturated stars of the image) as an interesting exercise. By doing this, you will notice that in the core region the residuals are different compared to the residuals of the brighter stars that we have obtained.

In general, in this tutorial we have shown how to deal with the most important challenges for constructing an extended PSF. Each image or dataset will have its own particularities that you will have to take into account when constructing the PSF.

---------- Footnotes ----------

(1) For example, in J-PLUS, the baffle of the secondary mirror was adjusted in 2017 because it produced extra spikes in the PSF. So all images after that date have a PSF with 4 spikes (like this one), while those before it have many more spikes.

2.4 Sufi simulates a detection
==============================

It is the year 953 A.D. and Abd al-rahman Sufi (903 – 986 A.D.)(1) is in Shiraz as a guest astronomer. He had come there to use the advanced 123 centimeter astrolabe for his studies on the ecliptic. However, something had been bothering him for a long time. While mapping the constellations, there were several non-stellar objects that he had detected in the sky; one of them was in the Andromeda constellation. During a trip he had taken to Yemen, Sufi had seen another such object in the southern skies looking over the Indian ocean. He was not sure if such cloud-like non-stellar objects (which he was the first to call ‘Sahābi’ in Arabic or ‘nebulous’) were real astronomical objects or if they were only the result of some bias in his observations. Could such diffuse objects actually be detected at all with his detection technique?

He still had a few hours left until nightfall (when he would continue his studies on the ecliptic) so he decided to find an answer to this question. He had thoroughly studied Claudius Ptolemy’s (90 – 168 A.D.) Almagest and had made lots of corrections to it, in particular in measuring the brightness. Using his same experience, he was able to measure a magnitude for the objects and wanted to simulate his observation to see if a simulated object with the same brightness and size could be detected in simulated noise with the same detection technique. The general outline of the steps he wants to take is:

  1. Make some mock profiles in an over-sampled image.
The initial mock image has to be over-sampled prior to convolution or other forms of transformation in the image. Through his experiences, Sufi knew that this is because the image of heavenly bodies is actually transformed by the atmosphere or other sources outside the atmosphere (for example, gravitational lenses) prior to being sampled on an image. Since that transformation occurs on a continuous grid, to best approximate it, he should do all the work on a finer pixel grid. In the end he can re-sample the result to the initially desired grid size. 2. Convolve the image with a point spread function (PSF, see *note PSF::) that is over-sampled to the same resolution as the mock image. Since he wants to finish in a reasonable time and the PSF kernel will be very large due to oversampling, he has to use frequency domain convolution which has the side effect of dimming the edges of the image. So in the first step above he also has to build the image to be larger by at least half the width of the PSF convolution kernel on each edge. 3. With all the transformations complete, the image should be re-sampled to the same size of the pixels in his detector. 4. He should remove those extra pixels on all edges to remove frequency domain convolution artifacts in the final product. 5. He should add noise to the (until now, noise-less) mock image. After all, all observations have noise associated with them. Fortunately Sufi had heard of GNU Astronomy Utilities from a colleague in Isfahan (where he worked) and had installed it on his computer a year before. It had tools to do all the steps above. He had used MakeProfiles before, but was not sure which columns he had chosen in his user or system-wide configuration files for which parameters, see *note Configuration files::. So to start his simulation, Sufi runs MakeProfiles with the ‘-P’ option to make sure what columns in a catalog MakeProfiles currently recognizes, and confirm the output image parameters. In particular, Sufi is interested in the recognized columns (shown below). $ astmkprof -P [[[ ... Truncated lines ... ]]] # Output: type float32 # Type of output: e.g., int16, float32, etc. mergedsize 1000,1000 # Number of pixels along first FITS axis. oversample 5 # Scale of oversampling (>0 and odd). [[[ ... Truncated lines ... ]]] # Columns, by info (see `--searchin'), or number (starting from 1): ccol 2 # Coord. columns (one call for each dim.). ccol 3 # Coord. columns (one call for each dim.). fcol 4 # sersic (1), moffat (2), gaussian (3), point # (4), flat (5), circumference (6), distance # (7), custom-prof (8), azimuth (9), # custom-img (10). rcol 5 # Effective radius or FWHM in pixels. ncol 6 # Sersic index or Moffat beta. pcol 7 # Position angle. qcol 8 # Axis ratio. mcol 9 # Magnitude. tcol 10 # Truncation in units of radius or pixels. [[[ ... Truncated lines ... ]]] In Gnuastro, column counting starts from 1, so the columns are ordered such that the first column (number 1) can be an ID he specifies for each object (and MakeProfiles ignores), each subsequent column is used for another property of the profile. It is also possible to use column names for the values of these options and change these defaults, but Sufi preferred to stick to the defaults. Fortunately MakeProfiles has the capability to also make the PSF which is to be used on the mock image and using the ‘--prepforconv’ option, he can also make the mock image to be larger by the correct amount and all the sources to be shifted by the correct amount. 
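As an aside (Sufi was happy with the defaults above, so he did not need this), when a catalog’s columns have names, the same options can be given those names instead of numbers. The command below is only an illustrative sketch: the options are the ones shown in the ‘-P’ output above, but the catalog name and column names (‘X’, ‘Y’, ‘PROFILE’ and so on) are hypothetical and must match whatever is actually in your own table.

$ astmkprof catalog.fits --ccol=X --ccol=Y --fcol=PROFILE \
            --rcol=R_EFF --ncol=N --pcol=PA --qcol=Q \
            --mcol=MAG --tcol=TRUNC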
For his initial check he decides to simulate the nebula in the Andromeda constellation. The night he was observing, the PSF had a FWHM of about 5 pixels, so as the first row (profile) in the table below, he defines the PSF parameters. Sufi sets the radius column (‘rcol’ above, fifth column) to ‘5.000’; he also chooses a Moffat function for its functional form. Remembering how diffuse the nebula in the Andromeda constellation was, he decides to simulate it with a mock Sérsic index 1.0 profile. He wants the output to be 499 pixels by 499 pixels, so he can put the center of the mock profile in the central pixel of the image, which is the 250th pixel along both dimensions (note that an even number does not have a “central” pixel). Looking at his drawings of it, he decides a reasonable effective radius for it would be 40 pixels on this image pixel scale (second row, 5th column below). He also sets the axis ratio (0.4) and position angle (-25 degrees) to approximately correct values too, and finally he sets the total magnitude of the profile to 3.44, which he had measured. Sufi also decides to truncate both the mock profile and PSF at 5 times the respective radius parameters. In the end he decides to put four stars at the four corners of the image at very low magnitudes as a visual scale. While he was preparing the catalog, one of his students approached him and was also following the steps. As described above, the catalog of profiles to build will be a table (multiple columns of numbers) like below:

     0  0.000  0.000  2  5   4.7  0.0  1.0  30.0  5.0
     1  250.0  250.0  1  40  1.0  -25  0.4  3.44  5.0
     2  50.00  50.00  4  0   0.0  0.0  0.0  6.00  0.0
     3  450.0  50.00  4  0   0.0  0.0  0.0  6.50  0.0
     4  50.00  450.0  4  0   0.0  0.0  0.0  7.00  0.0
     5  450.0  450.0  4  0   0.0  0.0  0.0  7.50  0.0

This contains all the “data” to build the profiles, and you can easily pass it to Gnuastro’s MakeProfiles: since Sufi already knows the columns and expected values very well, he has placed the information in the proper columns. However, when the student sees this, he just sees a jumble of numbers! Sufi explains to the student that even if you know the positions and meanings of these numbers very well today, in a couple of months you will forget! It will then be very hard for you to interpret the numbers properly. So you should never use naked data (or data without any extra information). Data (or information) that describes other data is called “metadata”! One common example is column names (the name of a column is itself a data element, but data that describes the lower-level data within that column: how to interpret the numbers within it). Sufi explains to his student that Gnuastro has a convention for adding metadata within a plain-text file, and guides him to *note Gnuastro text table format::. Because we do not want metadata to be confused with the actual data, in a plain-text file, we start lines containing metadata with a ‘‘#’’.
For example, see the same data above, but this time with metadata for every column:

     # Column 1:  ID      [counter, u8]  Identifier
     # Column 2:  X       [pix,    f32]  Horizontal position
     # Column 3:  Y       [pix,    f32]  Vertical position
     # Column 4:  PROFILE [name,    u8]  Radial profile function
     # Column 5:  R       [pix,    f32]  Effective radius
     # Column 6:  N       [n/a,    f32]  Sersic index
     # Column 7:  PA      [deg,    f32]  Position angle
     # Column 8:  Q       [n/a,    f32]  Axis ratio
     # Column 9:  MAG     [log,    f32]  Magnitude
     # Column 10: TRUNC   [n/a,    f32]  Truncation (multiple of R)
     0  0.000  0.000  2  5   4.7  0.0  1.0  30.0  5.0
     1  250.0  250.0  1  40  1.0  -25  0.4  3.44  5.0
     2  50.00  50.00  4  0   0.0  0.0  0.0  6.00  0.0
     3  450.0  50.00  4  0   0.0  0.0  0.0  6.50  0.0
     4  50.00  450.0  4  0   0.0  0.0  0.0  7.00  0.0
     5  450.0  450.0  4  0   0.0  0.0  0.0  7.50  0.0

The numbers now make much more sense for the student! Before continuing, Sufi reminded the student that even though metadata may not be strictly/technically necessary (for the computer programs), metadata are critical for human readers! Therefore, a good scientist should never forget to keep metadata with any data that they create, use or archive.

To start simulating the nebula, Sufi creates a directory named ‘simulationtest’ in his home directory. Note that the ‘pwd’ command (short for “print working directory”) will print the address of the current directory (it is a good way to confirm/check your current location in the full file system: it always starts from the root, or ‘‘/’’).

$ mkdir ~/simulationtest
$ cd ~/simulationtest
$ pwd
/home/rahman/simulationtest

It is possible to use a plain-text editor to manually put the catalog contents above into a plain-text file. But to easily automate catalog production (in later trials), Sufi decides to fill the input catalog with the redirection features of the command-line (or shell). Sufi’s student was not familiar with this feature of the shell! So Sufi decided to do a fast demo, giving the following explanations while running the commands:

Shell redirection allows you to “re-direct” the “standard output” of a program (which is usually printed by the program on the command-line during its execution; like the output of ‘pwd’ above) into a file. For example, let’s simply “echo” (or print to standard output) the line “This is a test.”:

$ echo "This is a test."
This is a test.

As you see, our statement was simply “echo”-ed to the standard output! To redirect this sentence into a file (instead of simply printing it on the standard output), we can simply use the ‘>’ character, followed by the name of the file we want it to be dumped in.

$ echo "This is a test." > test.txt

This time, the ‘echo’ command did not print anything in the terminal. Instead, the shell (command-line environment) took the output, and “re-directed” it into a file called ‘test.txt’. Let’s confirm this with the ‘ls’ command (‘ls’ is short for “list” and will list all the files in the current directory):

$ ls
test.txt

Now that you have confirmed the existence of ‘test.txt’, you can see its contents with the ‘cat’ command (short for “concatenate”; because it can also merge multiple files together):

$ cat test.txt
This is a test.

Now that we have written our first line in ‘test.txt’, let’s try adding a second line (do not forget that our final catalog of objects to simulate will have multiple lines):

$ echo "This is my second line." > test.txt
$ cat test.txt
This is my second line.

As you see, the first line that you put in the file is no longer present! This happens because ‘‘>’’ always starts dumping content to a file from the start of the file.
In effect, this means that any possibly pre-existing content is over-written by the new content! To append new lines (or dump new content at the end of existing content), you can use ‘‘>>’’. For example, with the commands below, first we will write the first sentence (using ‘‘>’’), then use ‘‘>>’’ to add the second and third sentences. Finally, we will print the contents of ‘test.txt’ to confirm that all three lines are preserved.

$ echo "My first sentence."  > test.txt
$ echo "My second sentence." >> test.txt
$ echo "My third sentence."  >> test.txt
$ cat test.txt
My first sentence.
My second sentence.
My third sentence.

The student thanked Sufi for this explanation and now feels more comfortable with redirection. Therefore Sufi continues with the main project. But before that, he deletes the temporary test file:

$ rm test.txt

To put the catalog of profile data and their metadata (that was described above) into a file, Sufi uses the commands below. While Sufi was writing these commands, the student complained that “I could have done this in a text editor”. Sufi reminded the student that it is indeed possible, but it requires manual intervention. The advantage of a solution like below is that it can be automated (for example, adding more rows; for more profiles in the final image).

$ echo "# Column 1: ID [counter, u8] Identifier" > cat.txt
$ echo "# Column 2: X [pix, f32] Horizontal position" >> cat.txt
$ echo "# Column 3: Y [pix, f32] Vertical position" >> cat.txt
$ echo "# Column 4: PROF [name, u8] Radial profile function" \
       >> cat.txt
$ echo "# Column 5: R [pix, f32] Effective radius" >> cat.txt
$ echo "# Column 6: N [n/a, f32] Sersic index" >> cat.txt
$ echo "# Column 7: PA [deg, f32] Position angle" >> cat.txt
$ echo "# Column 8: Q [n/a, f32] Axis ratio" >> cat.txt
$ echo "# Column 9: MAG [log, f32] Magnitude" >> cat.txt
$ echo "# Column 10: TRUNC [n/a, f32] Truncation (multiple of R)" \
       >> cat.txt
$ echo "0 0.000 0.000 2 5 4.7 0.0 1.0 30.0 5.0" >> cat.txt
$ echo "1 250.0 250.0 1 40 1.0 -25 0.4 3.44 5.0" >> cat.txt
$ echo "2 50.00 50.00 4 0 0.0 0.0 0.0 6.00 0.0" >> cat.txt
$ echo "3 450.0 50.00 4 0 0.0 0.0 0.0 6.50 0.0" >> cat.txt
$ echo "4 50.00 450.0 4 0 0.0 0.0 0.0 7.00 0.0" >> cat.txt
$ echo "5 450.0 450.0 4 0 0.0 0.0 0.0 7.50 0.0" >> cat.txt

To make sure that the catalog’s content is correct (and that there is no typo, for example!), Sufi runs ‘‘cat cat.txt’’ and confirms that it is correct. Now that the catalog is created, Sufi is ready to call MakeProfiles to build the image containing these objects. He looks into his records and finds that the zero point magnitude for that night, and that particular detector, was 18. The student was a little confused about the concept of the zero point, so Sufi pointed him to *note Brightness flux magnitude::, which the student can study in detail later. Sufi therefore runs MakeProfiles with the command below:

$ astmkprof --prepforconv --mergedsize=499,499 --zeropoint=18.0 cat.txt
MakeProfiles 0.19 started on Sat Oct 6 16:26:56 953
  - 6 profiles read from cat.txt
  - Random number generator (RNG) type: ranlxs1
  - Basic RNG seed: 1652884540
  - Using 12 threads.
  ---- row 3 complete, 5 left to go
  ---- row 4 complete, 4 left to go
  ---- row 6 complete, 3 left to go
  ---- row 5 complete, 2 left to go
  ---- ./0_cat_profiles.fits created.
  ---- row 1 complete, 1 left to go
  ---- row 2 complete, 0 left to go
  - ./cat_profiles.fits created.
0.092573 seconds -- Output: ./cat_profiles.fits MakeProfiles finished in 0.293644 seconds Sufi encourages the student to read through the printed output. As the statements say, two FITS files should have been created in the running directory. So Sufi ran the command below to confirm: $ ls 0_cat_profiles.fits cat_profiles.fits cat.txt The file ‘0_cat_profiles.fits’ is the PSF Sufi had asked for, and ‘cat_profiles.fits’ is the image containing the main objects in the catalog. Sufi opened the main image with the command below (using SAO DS9): $ astscript-fits-view cat_profiles.fits --ds9scale=95 The student could clearly see the main elliptical structure in the center. However, the size of ‘cat_profiles.fits’ was surprising for the student, instead of 499 by 499 (as we had requested), it was 2615 by 2615 pixels (from the command below): $ astfits cat_profiles.fits Fits (GNU Astronomy Utilities) 0.19 Run on Sat Oct 6 16:26:58 953 ----- HDU (extension) information: 'cat_profiles.fits'. Column 1: Index (counting from 0, usable with '--hdu'). Column 2: Name ('EXTNAME' in FITS standard, usable with '--hdu'). Column 3: Image data type or 'table' format (ASCII or binary). Column 4: Size of data in HDU. ----- 0 MKPROF-CONFIG no-data 0 1 Mock profiles float32 2615x2615 So Sufi explained why oversampling is important in modeling, especially for parts of the image where the flux change is significant over a pixel. Recall that when you oversample the model (for example, by 5 times), for every desired pixel, you get 25 pixels ($5\times5$). Sufi then explained that after convolving (next step below) we will down-sample the image to get our originally desired size/resolution. After seeing the image, the student complained that only the large elliptical model for the Andromeda nebula can be seen in the center. He could not see the four stars that we had also requested in the catalog. So Sufi had to explain that the stars are there in the image, but the reason that they are not visible when looking at the whole image at once, is that they only cover a single pixel! To prove it, he centered the image around the coordinates 2308 and 2308, where one of the stars is located in the over-sampled image [you can do this in ‘ds9’ by selecting “Pan” in the “Edit” menu, then clicking around that position]. Sufi then zoomed in to that region and soon, the star’s non-zero pixel could be clearly seen. Sufi explained that the stars will take the shape of the PSF (cover an area of more than one pixel) after convolution. If we did not have an atmosphere and we did not need an aperture, then stars would only cover a single pixel with normal CCD resolutions. So Sufi convolved the image with this command: $ astconvolve --kernel=0_cat_profiles.fits cat_profiles.fits \ --output=cat_convolved.fits Convolve started on Sat Oct 6 16:35:32 953 - Using 8 CPU threads. - Input: cat_profiles.fits (hdu: 1) - Kernel: 0_cat_profiles.fits (hdu: 1) - Input and Kernel images padded. 0.075541 seconds - Images converted to frequency domain. 6.728407 seconds - Multiplied in the frequency domain. 0.040659 seconds - Converted back to the spatial domain. 3.465344 seconds - Padded parts removed. 
0.016767 seconds
  - Output: cat_convolved.fits
Convolve finished in:  10.422161 seconds

When convolution finished, Sufi opened ‘cat_convolved.fits’ and the four stars could be easily seen now:

$ astscript-fits-view cat_convolved.fits --ds9scale=95

It was interesting for the student that all the flux in that single pixel is now distributed over so many pixels (the sum of all the pixels in each convolved star is actually equal to the value of the single pixel before convolution). Sufi explained how a PSF with a larger FWHM would make the points even wider than this (distributing their flux in a larger area). With the convolved image ready, they were prepared to re-sample it to the original pixel scale Sufi had planned [from the ‘$ astmkprof -P’ command above, recall that MakeProfiles had over-sampled the image by 5 times]. Sufi explained the basic concepts of warping the image to his student and ran Warp with the following command:

$ astwarp --scale=1/5 --centeroncorner cat_convolved.fits
Warp started on Sat Oct 6 16:51:59 953
 Using 8 CPU threads.
 Input: cat_convolved.fits (hdu: 1)
 matrix:
        0.2000   0.0000   0.4000
        0.0000   0.2000   0.4000
        0.0000   0.0000   1.0000

$ astfits cat_convolved_scaled.fits --quiet
0      WARP-CONFIG     no-data         0
1      Warped          float32         523x523

‘cat_convolved_scaled.fits’ now has the correct pixel scale. However, the image is still larger than what we had wanted: it is $523\times523$ pixels (not our desired $499\times499$). The student is slightly confused, so Sufi also re-samples the PSF with the same scale by running:

$ astwarp --scale=1/5 --centeroncorner 0_cat_profiles.fits
$ astfits 0_cat_profiles_scaled.fits --quiet
0      WARP-CONFIG     no-data         0
1      Warped          float32         25x25

Sufi notes that $25=12+12+1$ and that $523=499+12+12$. He goes on to explain that frequency space convolution will dim the edges and that is why he added the ‘--prepforconv’ option to MakeProfiles above. Now that convolution is done, Sufi can remove those extra pixels using Crop with the command below. Crop’s ‘--section’ option accepts coordinates inclusively and counting from 1 (according to the FITS standard), so the crop region’s first pixel has to be 13, not 12.

$ astcrop cat_convolved_scaled.fits --section=13:*-12,13:*-12 \
          --mode=img --zeroisnotblank
Crop started on Sat Oct 6 17:03:24 953
  - Read metadata of 1 image.                       0.001304 seconds
  ---- ...nvolved_scaled_cropped.fits created: 1 input.
Crop finished in:  0.027204 seconds

To fully convince the student, Sufi checks the size of the output of the crop command above:

$ astfits cat_convolved_scaled_cropped.fits --quiet
0      n/a             no-data         0
1      n/a             float32         499x499

Finally, ‘cat_convolved_scaled_cropped.fits’ is $499\times499$ pixels and the mock Andromeda galaxy is centered on the central pixel. These are the same dimensions that Sufi had desired in the beginning. All this trouble was certainly worth it because now there is no dimming on the edges of the image and the profile centers are more accurately sampled.

The final step to simulate a real observation would be to add noise to the image. Sufi set the zero point magnitude to the same value that he had used when making the mock profiles. Looking again at his observation log, he saw that he had measured the background flux near the nebula to have a _per-pixel_ magnitude of 7 that night. For more on how the background value determines the noise, see *note Noise basics::. So using these values he ran MakeNoise, and with the second command, he visually inspected the image.
$ astmknoise --zeropoint=18 --background=7 --output=out.fits \
             cat_convolved_scaled_cropped.fits
MakeNoise 0.19 started on Sat Oct 6 17:05:06 953
  - Generator type: ranlxs1
  - Generator seed: 1428318100
MakeNoise finished in:  0.033491 (seconds)

$ astscript-fits-view out.fits

The ‘out.fits’ file now contains the noised image of the mock catalog Sufi had asked for. The student had not observed the nebula in the sky, so when he saw the mock image in SAO DS9 (with the second command above), he understood why Sufi was dubious: it was very diffuse!

Seeing how the ‘--output’ option allows the user to specify the name of the output file, the student was confused and wanted to know why Sufi had not used it more regularly before. Sufi explained that for intermediate steps, you can rely on the automatic output of the programs (see *note Automatic output::). Doing so will give all the intermediate files a similar basic name structure, so in the end you can simply remove them all with the shell’s capabilities, and it will be familiar for other users. So Sufi decided to show this to the student by making a shell script from the commands he had used before.

The command-line shell has the capability to read all the separate input commands from a file. This is useful when you want to do the same thing multiple times, with only the names of the files or minor parameters changing between the different instances. Using the shell’s history (by pressing the up keyboard key) Sufi reviewed all the commands and then he retrieved the last 5 commands with the ‘$ history 5’ command. He selected all those lines he had input and put them in a text file named ‘mymock.sh’. Then he defined the ‘edge’ and ‘base’ shell variables for easier customization later, and before every command, he added some comments (lines starting with ‘#’) for future readability. Finally, Sufi pointed the student to Gnuastro’s *note General program usage tutorial::, which has a full section on *note Writing scripts to automate the steps::.

#!/bin/bash
edge=12
base=cat

# Stop running next commands if one fails.
set -e

# Remove any (possibly) existing output (from previous runs)
# before starting.
rm -f out.fits

# Run MakeProfiles to create an oversampled FITS image.
astmkprof --prepforconv --mergedsize=499,499 --zeropoint=18.0 \
          "$base".txt

# Convolve the created image with the kernel.
astconvolve "$base"_profiles.fits \
            --kernel=0_"$base"_profiles.fits \
            --output="$base"_convolved.fits

# Scale the image back to the intended resolution.
astwarp --scale=1/5 --centeroncorner "$base"_convolved.fits

# Crop the edges out (dimmed during convolution). '--section'
# accepts inclusive coordinates, so the start of the section
# must be one pixel larger than 'edge'.
st_edge=$(( edge + 1 ))
astcrop "$base"_convolved_scaled.fits --zeroisnotblank \
        --mode=img --section=$st_edge:*-$edge,$st_edge:*-$edge

# Add noise to the image.
astmknoise --zeropoint=18 --background=7 --output=out.fits \
           "$base"_convolved_scaled_cropped.fits

# Remove all the temporary files.
rm 0*.fits "$base"*.fits

He used this chance to remind the student of the importance of comments in code or shell scripts! Just like metadata in a dataset, when writing the code, you have a good mental picture of what you are doing, so writing comments might seem superfluous and excessive. However, in one month when you want to re-use the script, you will have lost that mental picture and remembering it can be time-consuming and frustrating.
The importance of comments is further amplified when you want to share the script with a friend/colleague. So it is good to accompany any step of a script, or code, with useful comments while you are writing it (create a good mental picture of why you are doing something: do not just describe the command, but its purpose).

Sufi then explained to the eager student that you define a variable by giving it a name, followed by an ‘=’ sign and the value you want. Then you can reference that variable from anywhere in the script by calling its name with a ‘$’ prefix. So in the script whenever you see ‘$base’, the value we defined for it above is used. If you use advanced editors like GNU Emacs or even simpler ones like Gedit (part of the GNOME graphical user interface), the variables will be shown in a different color, which can really help in understanding the script. We have put all the ‘$base’ variables in double quotation marks (‘"’) so the variable name and the following text do not get mixed; the shell is going to ignore the ‘"’ after replacing the variable value. To make the script executable, Sufi ran the following command:

$ chmod +x mymock.sh

Then finally, Sufi ran the script, simply by calling its file name:

$ ./mymock.sh

After the script finished, the only file remaining is the ‘out.fits’ file that Sufi had wanted in the beginning. Sufi then explained to the student that he could run this script anywhere that he has a catalog, provided the script is in the same directory. The only thing the student had to modify in the script was the name of the catalog (the value of the ‘base’ variable at the start of the script) and the value of the ‘edge’ variable if he changed the PSF size. The student was also happy to hear that he will not need to make it executable again when he makes changes later; it will remain executable unless he explicitly changes the executable flag with ‘chmod’.

The student was really excited, since now, through simple shell scripting, he could really speed up his work and run any command in any fashion he likes, allowing him to be much more creative in his work. Until now he had been using the graphical user interface, which does not have such a facility; doing repetitive things on it was really frustrating and sometimes he would make mistakes. So he left to go and try scripting on his own computer. He later reminded Sufi that the second tutorial in the Gnuastro book has more complex commands in data analysis, and a more advanced introduction to scripting (see *note General program usage tutorial::).

Sufi could now get back to his own work and see whether the simulated nebula, which resembled the one in the Andromeda constellation, could be detected or not, even though it was extremely faint(2). Therefore, Sufi ran Gnuastro’s detection software (*note NoiseChisel::) to see if this object is detectable. NoiseChisel’s output (‘out_detected.fits’) is a multi-extension FITS file, so he used Gnuastro’s ‘astscript-fits-view’ program in the second command to see the output:

$ astnoisechisel out.fits
$ astscript-fits-view out_detected.fits

In the “Cube” window (that was opened with DS9), Sufi clicked on the “Next” button to see the pixels that were detected to contain significant signal. Fortunately the nebula’s shape was detectable and he could finally confirm that the nebula he kept in his notebook was actually observable. He wrote this result in the draft manuscript that would later become “Book of fixed stars”(3).
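As a side note for readers following along on a computer: this detection step can also be appended to the end of ‘mymock.sh’, so that a single script goes from the input catalog to the detection map. The lines below are only a sketch of that idea (NoiseChisel is called with its default options, exactly as above, and ‘out_detected.fits’ is its automatic output name for ‘out.fits’):

# Detect the signal in the noised image; with no '--output',
# the automatic output will be called 'out_detected.fits'.
astnoisechisel out.fits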
He still had to check the other nebula he saw from Yemen and several other such objects, but they could wait until tomorrow (thanks to the shell script, he only has to define a new catalog). It was nearly sunset and they had to begin preparing for the night’s measurements on the ecliptic. ---------- Footnotes ---------- (1) In Latin Sufi is known as Azophi. He was an Iranian astronomer. His manuscript “Book of fixed stars” contains the first recorded observations of the Andromeda galaxy, the Large Magellanic Cloud and seven other non-stellar or ‘nebulous’ objects. (2) The brightness of a diffuse object is added over all its pixels to give its final magnitude, see *note Brightness flux magnitude::. So although the magnitude 3.44 (of the mock nebula) is orders of magnitude brighter than 6 (of the stars), the central galaxy is much fainter. Put another way, the brightness is distributed over a large area in the case of a nebula. (3) 3 Installation ************** The latest released version of Gnuastro source code is always available at the following URL: *note Quick start:: describes the commands necessary to configure, build, and install Gnuastro on your system. This chapter will be useful in cases where the simple procedure above is not sufficient, for example, your system lacks a mandatory/optional dependency (in other words, you cannot pass the ‘$ ./configure’ step), or you want greater customization, or you want to build and install Gnuastro from other random points in its history, or you want a higher level of control on the installation. Thus if you were happy with downloading the tarball and following *note Quick start::, then you can safely ignore this chapter and come back to it in the future if you need more customization. *note Dependencies:: describes the mandatory, optional and bootstrapping dependencies of Gnuastro. Only the first group are required/mandatory when you are building Gnuastro using a tarball (see *note Release tarball::), they are very basic and low-level tools used in most astronomical software, so you might already have them installed, if not they are very easy to install as described for each. *note Downloading the source:: discusses the two methods you can obtain the source code: as a tarball (a significant snapshot in Gnuastro’s history), or the full history(1). The latter allows you to build Gnuastro at any random point in its history (for example, to get bug fixes or new features that are not released as a tarball yet). The building and installation of Gnuastro is heavily customizable, to learn more about them, see *note Build and install::. This section is essentially a thorough explanation of the steps in *note Quick start::. It discusses ways you can influence the building and installation. If you encounter any problems in the installation process, it is probably already explained in *note Known issues::. In *note Other useful software:: the installation and usage of some other free software that are not directly required by Gnuastro but might be useful in conjunction with it is discussed. ---------- Footnotes ---------- (1) *note Bootstrapping dependencies:: are required if you clone the full history. 3.1 Dependencies ================ A minimal set of dependencies are mandatory for building Gnuastro from the standard tarball release. If they are not present you cannot pass Gnuastro’s configuration step. 
The mandatory dependencies are therefore very basic (low-level) tools which are easy to obtain, build and install; see *note Mandatory dependencies:: for a full discussion. If you have the packages of *note Optional dependencies::, Gnuastro will have additional functionality (for example, converting FITS images to JPEG or PDF). If you are installing from a tarball as explained in *note Quick start::, you can stop reading after this section. If you are cloning the version controlled source (see *note Version controlled source::), an additional bootstrapping step is required before configuration; its dependencies are explained in *note Bootstrapping dependencies::. Your operating system’s package manager is an easy and convenient way to download and install the dependencies that are already pre-built for your operating system. In *note Dependencies from package managers::, we will list some common operating system package manager commands to install the optional and mandatory dependencies. 3.1.1 Mandatory dependencies ---------------------------- The mandatory Gnuastro dependencies are very basic and low-level tools. They all follow the same basic GNU based build system (like that shown in *note Quick start::), so even if you do not have them, installing them should be pretty straightforward. In this section we explain each program and any specific note that might be necessary in the installation. 3.1.1.1 GNU Scientific Library .............................. The GNU Scientific Library (http://www.gnu.org/software/gsl/), or GSL, is a large collection of functions that are very useful in scientific applications, for example, integration, random number generation, and Fast Fourier Transform among many others. To install GSL from source, you can run the following commands after you have downloaded ‘gsl-latest.tar.gz’ (http://ftpmirror.gnu.org/gsl/gsl-latest.tar.gz): $ tar xf gsl-latest.tar.gz $ cd gsl-X.X # Replace X.X with version number. $ ./configure CFLAGS="$CFLAGS -g0 -O3" $ make -j8 # Replace 8 with no. CPU threads. $ make check $ sudo make install 3.1.1.2 CFITSIO ............... CFITSIO (http://heasarc.gsfc.nasa.gov/fitsio/) is the closest you can get to the pixels in a FITS image while remaining faithful to the FITS standard (http://fits.gsfc.nasa.gov/fits_standard.html). It is written by William Pence, the principal author of the FITS standard(1), and is regularly updated. It sets the definitions for all other software packages using FITS images. Some GNU/Linux distributions have CFITSIO in their package managers; if it is available and updated, you can use it. One problem that might occur is that CFITSIO might not be configured with the ‘--enable-reentrant’ option by the distribution. This option allows CFITSIO to open a file in multiple threads; it can thus provide great speed improvements. If CFITSIO was not configured with this option, any program which needs this capability will warn you and abort when you ask for multiple threads (see *note Multi-threaded operations::). To install CFITSIO from source, we strongly recommend that you have a look through Chapter 2 (Creating the CFITSIO library) of the CFITSIO manual and understand the options you can pass to ‘$ ./configure’ (there are not too many). This is a very basic package for most astronomical software and it is best that you configure it nicely with your system. Once you download the source and unpack it, the following configure script should be enough for most purposes.
Do Not forget to read chapter two of the manual though, for example, the second option is only for 64bit systems. The manual also explains how to check if it has been installed correctly. CFITSIO comes with two executable files called ‘fpack’ and ‘funpack’. From their manual: they “are standalone programs for compressing and uncompressing images and tables that are stored in the FITS (Flexible Image Transport System) data format. They are analogous to the gzip and gunzip compression programs except that they are optimized for the types of astronomical images that are often stored in FITS format”. The commands below will compile and install them on your system along with CFITSIO. They are not essential for Gnuastro, since they are just wrappers for functions within CFITSIO, but they can come in handy. The ‘make utils’ command is only available for versions above 3.39, it will build these executable files along with several other executable test files which are deleted in the following commands before the installation (otherwise the test files will also be installed). The commands necessary to decompress, build and install CFITSIO from source are described below. Let’s assume you have downloaded ‘cfitsio_latest.tar.gz’ (http://heasarc.gsfc.nasa.gov/FTP/software/fitsio/c/cfitsio_latest.tar.gz) and are in the same directory: $ tar xf cfitsio_latest.tar.gz $ cd cfitsio-X.XX # Replace X.XX with version $ ./configure --prefix=/usr/local --enable-sse2 --enable-reentrant \ CFLAGS="$CFLAGS -g0 -O3" $ make $ make utils $ ./testprog > testprog.lis # See below if this has an error $ diff testprog.lis testprog.out # Should have no output $ cmp testprog.fit testprog.std # Should have no output $ rm cookbook fitscopy imcopy smem speed testprog $ sudo make install In the ‘./testprog > testprog.lis’ step, you may confront an error, complaining that it cannot find ‘libcfitsio.so.AAA’ (where ‘AAA’ is an integer). This is the library that you just built and have not yet installed. But unfortunately some versions of CFITSIO do not account for this on some OSs. To fix the problem, you need to tell your OS to also look into current CFITSIO build directory with the first command below, afterwards, the problematic command (second below) should run properly. $ export LD_LIBRARY_PATH="$(pwd):$LD_LIBRARY_PATH" $ ./testprog > testprog.lis Recall that the modification above is ONLY NECESSARY FOR THIS STEP. _Do Not_ put the ‘LD_LIBRARY_PATH’ modification command in a permanent place (like your bash startup file). After installing CFITSIO, close your terminal and continue working on a new terminal (so ‘LD_LIBRARY_PATH’ has its default value). For more on ‘LD_LIBRARY_PATH’, see *note Installation directory::. ---------- Footnotes ---------- (1) Pence, W.D. et al. Definition of the Flexible Image Transport System (FITS), version 3.0. (2010) Astronomy and Astrophysics, Volume 524, id.A42, 40 pp. 3.1.1.3 WCSLIB .............. WCSLIB (http://www.atnf.csiro.au/people/mcalabre/WCS/) is written and maintained by one of the authors of the World Coordinate System (WCS) definition in the FITS standard (http://fits.gsfc.nasa.gov/fits_standard.html)(1), Mark Calabretta. It might be already built and ready in your distribution’s package management system. However, here the installation from source is explained, for the advantages of installation from source please see *note Mandatory dependencies::. To install WCSLIB you will need to have CFITSIO already installed, see *note CFITSIO::. 
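Before running WCSLIB’s ‘./configure’ (shown further below), it may help to quickly confirm that CFITSIO is actually visible on your system. The commands below are only a sketch: they assume your CFITSIO installation provided a ‘cfitsio.pc’ file in a directory that ‘pkg-config’ searches (for the ‘/usr/local’ prefix used above, you may need to add ‘/usr/local/lib/pkgconfig’ to ‘PKG_CONFIG_PATH’ first):

$ pkg-config --modversion cfitsio    # Installed CFITSIO version.
$ pkg-config --libs cfitsio          # Linker flags for CFITSIO.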
WCSLIB also has plotting capabilities which use PGPLOT (a plotting library for C). If you want to use those capabilities in WCSLIB, *note PGPLOT:: provides the PGPLOT installation instructions. However, PGPLOT is old(2), so its installation is not easy; there are also many great modern WCS plotting tools (mostly written in Python). Hence, if you will not be using those plotting functions in WCSLIB, you can configure it with the ‘--without-pgplot’ option as shown below. If you have the cURL library (3) on your system and you installed CFITSIO version 3.42 or later, you will need to also link with the cURL library at configure time (through the ‘-lcurl’ option as shown below). CFITSIO uses the cURL library for its HTTPS (or HTTP Secure(4)) support and if it is present on your system, CFITSIO will depend on it. Therefore, if the ‘./configure’ command below fails (you do not have the cURL library), then remove this option and rerun it. Let’s assume you have downloaded ‘wcslib.tar.bz2’ (ftp://ftp.atnf.csiro.au/pub/software/wcslib/wcslib.tar.bz2) and are in the same directory; to configure, build, check and install WCSLIB, follow the steps below. $ tar xf wcslib.tar.bz2 ## In the `cd' command, replace `X.X' with version number. $ cd wcslib-X.X ## If `./configure' fails, remove `-lcurl' and run again. $ ./configure LIBS="-pthread -lcurl -lm" --without-pgplot \ --disable-fortran CFLAGS="$CFLAGS -g0 -O3" $ make $ make check $ sudo make install ---------- Footnotes ---------- (1) Greisen E.W., Calabretta M.R. (2002) Representation of world coordinates in FITS. Astronomy and Astrophysics, 395, 1061-1075. (2) As of early June 2016, its most recent version was uploaded in February 2001. (3) (4) 3.1.2 Optional dependencies --------------------------- The libraries listed here are only used for very specific applications; therefore, they are optional and Gnuastro can be built without them (with only those specific features disabled). Since these are pretty low-level tools, they are not too hard to install from source, but you can also use your operating system’s package manager to easily install all of them. For more, see *note Dependencies from package managers::. If the ‘./configure’ script cannot find any of these optional dependencies, it will notify you of the operation(s) you cannot do due to not having them. If you continue the build and request an operation that uses a missing library, Gnuastro’s programs will warn that the optional library was missing at build-time and abort. Since Gnuastro was built without that library, installing the library afterwards will not help. The only way is to re-build Gnuastro from scratch (after the library has been installed). However, for program dependencies (like cURL or Ghostscript) things are easier: you can install them after building Gnuastro also. This is because libraries are used to build the internal structure of Gnuastro’s executables. However, a program dependency is called by Gnuastro’s programs at run-time and has no effect on their internal structure. So if a dependency program becomes available later, it will be used the next time it is requested. GNU Libtool Libtool is a program that simplifies the management of the libraries needed to build an executable (a program). GNU Libtool has some added functionality compared to other implementations. If GNU Libtool is not present on your system at configuration time, a warning will be printed and *note BuildProgram:: will not be built or installed.
The configure script will look into your search path (‘PATH’) for GNU Libtool through the following executable names: ‘libtool’ (acceptable only if it is the GNU implementation) or ‘glibtool’. See *note Installation directory:: for more on ‘PATH’. GNU Libtool (the binary/executable file) is a low-level program that is probably already present on your system, and if not, is available in your operating system package manager(1). If you want to install GNU Libtool’s latest version from source, please visit its web page (https://www.gnu.org/software/libtool/). Gnuastro’s tarball is shipped with an internal implementation of GNU Libtool. Even if you have GNU Libtool, Gnuastro’s internal implementation is used for the building and installation of Gnuastro. As a result, you can still build, install and use Gnuastro even if you do not have GNU Libtool installed on your system. However, this internal Libtool does not get installed. Therefore, after Gnuastro’s installation, if you want to use *note BuildProgram:: to compile and link your own C source code which uses the *note Gnuastro library::, you need to have GNU Libtool available on your system (independent of Gnuastro). See *note Review of library fundamentals:: to learn more about libraries. GNU Make extension headers GNU Make is a workflow management system that can be used to run a series of commands in a specific order, and in parallel if you want. GNU Make offers special features to extend it with custom functions within a dynamic library. They are defined in the ‘gnumake.h’ header. If ‘gnumake.h’ can be found on your system at configuration time, Gnuastro will build a custom library that GNU Make can use for extended functionality in (astronomical) data analysis scenarios. libgit2 Git is one of the most common version control systems (see *note Version controlled source::). When ‘libgit2’ is present, and Gnuastro’s programs are run within a version controlled directory, outputs will contain the version number of the working directory’s repository for future reproducibility. See the ‘COMMIT’ keyword header in *note Output FITS files:: for a discussion. libjpeg libjpeg is only used by ConvertType to read from and write to JPEG images, see *note Recognized file formats::. libjpeg (http://www.ijg.org/) is a very basic library that provides tools to read and write JPEG images, most Unix-like graphic programs and libraries use it. Therefore you most probably already have it installed. libjpeg-turbo (http://libjpeg-turbo.virtualgl.org/) is an alternative to libjpeg. It uses Single instruction, multiple data (SIMD) instructions for ARM based systems that significantly decreases the processing time of JPEG compression and decompression algorithms. libtiff libtiff is used by ConvertType and the libraries to read TIFF images, see *note Recognized file formats::. libtiff (http://www.simplesystems.org/libtiff/) is a very basic library that provides tools to read and write TIFF images, most Unix-like operating system graphic programs and libraries use it. Therefore even if you do not have it installed, it must be easily available in your package manager. cURL cURL’s executable (‘curl’) is called by *note Query:: for submitting queries to remote datasets and retrieving the results. It is not necessary for the build of Gnuastro from source (only a warning will be printed if it cannot be found at configure time), so if you do not have it at build-time there is no problem. 
Just be sure to have it when you run ‘astquery’, otherwise you’ll get an error about not finding ‘curl’. GPL Ghostscript GPL Ghostscript’s executable (‘gs’) is called by ConvertType to compile a PDF file from a source PostScript file; see *note ConvertType::. Therefore its headers (and libraries) are not needed. Python3 with Numpy Python is a high-level programming language and Numpy is the most commonly used library within Python to add multi-dimensional arrays and matrices. If version 3 of Python is available with a corresponding Numpy library, Gnuastro’s library will be built with some Python-related helper functions. Python wrappers for Gnuastro’s library (for example, ‘pyGnuastro’) can use these functions when being built from source. For more on Gnuastro’s Python helper functions, see *note Python interface::. This Python interface is only relevant if you want to build the Python wrappers (like ‘pyGnuastro’) from source. If you install the Gnuastro Python wrapper from a pre-built repository like PyPI, this feature of your Gnuastro library won’t be used. Pre-built libraries contain the full Gnuastro library that they need within them (you don’t even need to have Gnuastro at all!). *Can’t find the Python3 and Numpy of a virtual environment:* make sure to set the ‘$PYTHON’ variable to point to the ‘python3’ command of the virtual environment before running ‘./configure’. Note that you don’t need to activate the virtual env, just point ‘PYTHON’ to its Python3 executable, like the example below: $ python3 -m venv test-env # Setting up the virtual env. $ export PYTHON="$(pwd)/test-env/bin/python3" $ ./configure # Gnuastro's configure script. SAO DS9 SAO DS9 (‘ds9’) is a visualization tool for FITS images. Gnuastro’s ‘astscript-fits-view’ program calls DS9 to visualize FITS images. We have a full appendix on it and how to install it in *note SAO DS9::. Since it is a run-time dependency, it can be installed at any later time (after building and installing Gnuastro). TOPCAT TOPCAT (‘topcat’) is a visualization tool for astronomical tables (most commonly: plotting). Gnuastro’s ‘astscript-fits-view’ program calls TOPCAT to visualize tables. We have a full appendix on it and how to install it in *note TOPCAT::. Since it is a run-time dependency, it can be installed at any later time (after building and installing Gnuastro). ---------- Footnotes ---------- (1) Note that we want the binary/executable Libtool program which can be run on the command-line. In Debian-based operating systems which separate various parts of a package, you want ‘libtool-bin’; the ‘libtool’ package will not contain the executable program. 3.1.3 Bootstrapping dependencies -------------------------------- Bootstrapping is only necessary if you have decided to obtain the full version controlled history of Gnuastro; see *note Version controlled source:: and *note Bootstrapping::. Using the version controlled source enables you to always be up to date with the most recent development work of Gnuastro (bug fixes, new functionalities, improved algorithms, etc.). If you have downloaded a tarball (see *note Downloading the source::), then you can ignore this subsection. To successfully run the bootstrapping process, there are some additional dependencies beyond those discussed in the previous subsections. These are low-level tools that are used by a large collection of Unix-like operating system programs; therefore, they are most probably already available on your system.
If they are not already installed, you should be able to easily find them in any GNU/Linux distribution package management system (‘apt-get’, ‘yum’, ‘pacman’, etc.). The short names in parenthesis in ‘typewriter’ font after the package name can be used to search for them in your package manager. For the GNU Portability Library, GNU Autoconf Archive and TeX Live, it is recommended to use the instructions here, not your operating system’s package manager. GNU Portability Library (Gnulib) To ensure portability for a wider range of operating systems (those that do not include GNU C library, namely glibc), Gnuastro depends on the GNU portability library, or Gnulib. Gnulib keeps a copy of all the functions in glibc, implemented (as much as possible) to be portable to other operating systems. The ‘bootstrap’ script can automatically clone Gnulib (as a ‘gnulib/’ directory inside Gnuastro), however, as described in *note Bootstrapping:: this is not recommended. The recommended way to bootstrap Gnuastro is to first clone Gnulib and the Autoconf archives (see below) into a local directory outside of Gnuastro. Let’s call it ‘DEVDIR’(1) (which you can set to any directory). Currently in Gnuastro, both Gnulib and Autoconf archives have to be cloned in the same top directory(2) like the case here(3): $ DEVDIR=/home/yourname/Development $ cd $DEVDIR $ git clone https://git.sv.gnu.org/git/gnulib.git $ git clone https://git.sv.gnu.org/git/autoconf-archive.git Gnulib is a source-based dependency of Gnuastro’s bootstrapping process, so simply having it is enough on your computer, there is no need to install, and thus check anything. You now have the full version controlled source of these two repositories in separate directories. Both these packages are regularly updated, so every once in a while, you can run ‘$ git pull’ within them to get any possible updates. GNU Automake (‘automake’) GNU Automake will build the ‘Makefile.in’ files in each sub-directory using the (hand-written) ‘Makefile.am’ files. The ‘Makefile.in’s are subsequently used to generate the ‘Makefile’s when the user runs ‘./configure’ before building. To check that you have a working GNU Automake in your system, you can try this command: $ automake --version GNU Autoconf (‘autoconf’) GNU Autoconf will build the ‘configure’ script using the configurations we have defined (hand-written) in ‘configure.ac’. To check that you have a working GNU Autoconf in your system, you can try this command: $ autoconf --version GNU Autoconf Archive These are a large collection of tests that can be called to run at ‘./configure’ time. See the explanation under GNU Portability Library (Gnulib) above for instructions on obtaining it and keeping it up to date. GNU Autoconf Archive is a source-based dependency of Gnuastro’s bootstrapping process, so simply having it is enough on your computer, there is no need to install, and thus check anything. Just do not forget that it has to be in the same directory as Gnulib (described above). GNU Texinfo (‘texinfo’) GNU Texinfo is the tool that formats this manual into the various output formats. To bootstrap Gnuastro you need all of Texinfo’s command-line programs. However, some operating systems package them separately, for example, in Fedora, ‘makeinfo’ is packaged in the ‘texinfo-tex’ package. To check that you have a working GNU Texinfo in your system, you can try this command: $ makeinfo --version GNU Libtool (‘libtool’) GNU Libtool is in charge of building all the libraries in Gnuastro. 
The libraries contain functions that are used by more than one program and are installed for use in other programs. They are thus put in a separate directory (‘lib/’). To check that you have a working GNU Libtool in your system, you can try this command (and from the output, make sure it is GNU’s Libtool): $ libtool --version GNU help2man (‘help2man’) GNU help2man is used to convert the output of the ‘--help’ option (*note --help::) to the traditional Man page (*note Man pages::). To check that you have a working GNU Help2man in your system, you can try this command: $ help2man --version LaTeX and some TeX packages Some of the figures in this book are built by LaTeX (using the PGF/TikZ package). It is the LaTeX source of those figures (not the actual figures) that is version controlled, for easy maintenance. So the ‘./bootstrap’ script will run LaTeX to build the figures. The best way to install LaTeX and all the necessary packages is through TeX Live (https://www.tug.org/texlive/), which is a package manager for TeX-related tools that is independent of any operating system. It is thus preferred to the TeX Live versions distributed by your operating system. To install TeX Live, go to the web page and download the appropriate installer by following the “download” link. Note that by default the full package repository will be downloaded and installed (around 4 gigabytes), which can take _very_ long to download and to update later. However, most packages are not needed by everyone; it is easier, faster and better to install only the “Basic scheme” (consisting of only the most basic TeX and LaTeX packages, which is less than 200 megabytes)(4). After the installation, be sure to set the environment variables as suggested at the end of the output. Any time you need a package you do not have, simply install it with a command like below (similar to how you install software from your operating system’s package manager)(5). To install all the necessary TeX packages for a successful Gnuastro bootstrap, run this command: $ su # tlmgr install epsf jknapltx caption biblatex biber iftex \ etoolbox logreq xstring xkeyval pgf ms \ xcolor pgfplots times rsfs ps2eps epspdf To check that you have a working LaTeX executable in your system, you can try this command (this just checks if LaTeX exists; as described above, if you have a missing package, you can easily identify it from the output and install it with ‘tlmgr’): $ latex --version ImageMagick (‘imagemagick’) ImageMagick is a wonderful and robust program for image manipulation on the command-line. ‘bootstrap’ uses it to convert the book images into the formats necessary for the various book formats. Since ImageMagick version 7, it is necessary to edit the policy file (‘/etc/ImageMagick-7/policy.xml’) to have the following line (it may be present, but commented; in this case, un-comment it): If the following line is present, it is also necessary to comment/remove it. To learn more about the ImageMagick security policy please see: . To check that you have a working ImageMagick in your system, you can try this command: $ convert --version ---------- Footnotes ---------- (1) If you are not a developer in Gnulib or the Autoconf archives, ‘DEVDIR’ can be a directory that you do not back up. In this way the large number of files in these projects will not slow down your backup process or take bandwidth (if you back up to a remote server).
(2) If you already have the Autoconf archives in a separate directory, or cannot clone it in the same directory as Gnulib, or you have it with another directory name (not ‘autoconf-archive/’), you can follow this short step. Set ‘AUTOCONFARCHIVES’ to your desired address. Then define a symbolic link in ‘DEVDIR’ with the following command so Gnuastro’s bootstrap script can find it: ‘$ ln -s $AUTOCONFARCHIVES $DEVDIR/autoconf-archive’. (3) If your internet connection is active, but Git complains about the network, it might be due to your network setup not recognizing the git protocol. In that case use the following URL for the HTTP protocol instead (for the Autoconf archives, replace the name): ‘http://git.sv.gnu.org/r/gnulib.git’ (4) You can also download the DVD iso file at a later time to keep as a backup for when you need a package but do not have an internet connection. (5) After running TeX, or LaTeX, you might get a warning complaining about a ‘missingfile’. Run ‘‘tlmgr info missingfile’’ to see the package(s) containing that file which you can install. 3.1.4 Dependencies from package managers ---------------------------------------- The most basic way to install a package on your system is to build the packages from source yourself. Alternatively, you can use your operating system’s package manager to download pre-compiled files and install them. The latter choice is easier and faster. However, we recommend that you build the *note Mandatory dependencies:: yourself from source (all necessary commands and links are given in the respective section). Here are some basic reasons behind this recommendation. 1. Your operating system’s pre-built software might not be the most recent release. For example, Gnuastro itself is also packaged in some package managers. For the list see: . You will notice that Gnuastro’s version in some operating systems is more than 10 versions old! It is the same for all the dependencies of Gnuastro. 2. For each package, Gnuastro might perform better with (or require) certain configuration options that your distribution’s package managers did not add for you. If present, these configuration options are explained during the installation of each in the sections below (for example, in *note CFITSIO::). When the proper configuration has not been set, the programs should complain and inform you. 3. For the libraries, they might separate the binary file from the header files, which can cause confusion; see *note Known issues::. 4. Like any other tool, the science you derive from Gnuastro’s tools highly depends on these lower-level dependencies, so generally it is much better to have a close connection with them. By reading their manuals, installing them and staying up to date with changes/bugs in them, your scientific results and understanding (of what is going on, and thus how you interpret your scientific results) will also correspondingly improve. Based on your package manager, you can use any of the following commands to install the mandatory and optional dependencies. If your package manager is not included in the list below, please send us the respective command, so we can add it. For better archivability and compression ratios, Gnuastro’s recommended tarball compression format is with the Lzip (http://lzip.nongnu.org/lzip.html) program; see *note Release tarball::. Therefore, the package manager commands below also contain Lzip. ‘apt-get’ (Debian-based OSs: Debian, Ubuntu, Linux Mint, etc.)
Debian (https://en.wikipedia.org/wiki/Debian) is one of the oldest GNU/Linux distributions. It thus has a very extended user community and a robust internal structure and standards. All of it is free software and based on the work of volunteers around the world. Many distributions are thus derived from it, for example, Ubuntu and Linux Mint. This arguably makes Debian-based OSs the largest, and most used, class of GNU/Linux distributions. All of them use Debian’s Advanced Packaging Tool (APT, for example, ‘apt-get’) for managing packages.

Mandatory dependencies
Without these, Gnuastro cannot be built; they are necessary for input/output and low-level mathematics (see *note Mandatory dependencies::)!

$ sudo apt-get install libgsl-dev libcfitsio-dev \
                       wcslib-dev

Optional dependencies
If present, these libraries can be used in Gnuastro’s build for extra features, see *note Optional dependencies::.

$ sudo apt-get install ghostscript libtool-bin \
                       libjpeg-dev libtiff-dev libgit2-dev curl lzip

Programs to view FITS images or tables
These are not used in Gnuastro’s build. They can just help in viewing the inputs/outputs independent of Gnuastro!

$ sudo apt-get install saods9 topcat

Gnuastro is packaged (https://tracker.debian.org/pkg/gnuastro) in Debian (and thus some of its derivative operating systems). Just make sure it is the most recent version.

‘dnf’
‘yum’ (Red Hat-based OSs: Red Hat, Fedora, CentOS, Scientific Linux, etc.)

Red Hat Enterprise Linux (https://en.wikipedia.org/wiki/Red_Hat) (RHEL) is released by Red Hat Inc. RHEL requires paid subscriptions for use of its binaries and support. But since it is free software, many other teams use its code to spin off their own distributions based on RHEL. Red Hat-based GNU/Linux distributions initially used the “Yellowdog Updater, Modified” (YUM) package manager, which has been replaced by “Dandified YUM” (DNF). If the latter is not available on your system, you can use ‘yum’ instead of ‘dnf’ in the command below.

Mandatory dependencies
Without these, Gnuastro cannot be built; they are necessary for input/output and low-level mathematics (see *note Mandatory dependencies::)!

$ sudo dnf install gsl-devel cfitsio-devel \
                   wcslib-devel

Optional dependencies
If present, these libraries can be used in Gnuastro’s build for extra features, see *note Optional dependencies::.

$ sudo dnf install ghostscript libtool \
                   libjpeg-devel libtiff-devel \
                   libgit2-devel lzip curl

Programs to view FITS images or tables
These are not used in Gnuastro’s build. They can just help in viewing the inputs/outputs independent of Gnuastro!

$ sudo dnf install saods9 topcat

‘brew’ (macOS)

macOS (https://en.wikipedia.org/wiki/MacOS) is the operating system used on Apple devices. macOS does not come with a package manager pre-installed, but several widely used, third-party package managers exist, such as Homebrew or MacPorts. Both are free software. Currently we have only tested Gnuastro’s installation with Homebrew as described below. If not already installed, first obtain Homebrew by following the instructions on its web page.

Mandatory dependencies
Without these, Gnuastro cannot be built; they are necessary for input/output and low-level mathematics (see *note Mandatory dependencies::)!

Homebrew manages packages in different ‘taps’. To install WCSLIB via Homebrew you will need to ‘tap’ into ‘brewsci/science’ first (the tap may change in the future, but can be found by calling ‘brew search wcslib’).
$ brew tap brewsci/science
$ brew install wcslib gsl cfitsio

Optional dependencies
If present, these libraries can be used in Gnuastro’s build for extra features, see *note Optional dependencies::.

$ brew install ghostscript libtool libjpeg \
               libtiff libgit2 curl lzip

Programs to view FITS images or tables
These are not used in Gnuastro’s build. They can just help in viewing the inputs/outputs independent of Gnuastro!

$ brew install saoimageds9 topcat

‘pacman’ (Arch Linux)

Arch Linux (https://en.wikipedia.org/wiki/Arch_Linux) is a smaller GNU/Linux distribution, which follows the KISS principle (“keep it simple, stupid”) as a general guideline. It “focuses on elegance, code correctness, minimalism and simplicity, and expects the user to be willing to make some effort to understand the system’s operation”. Arch GNU/Linux uses “Package manager” (Pacman) to manage its packages/components.

Mandatory dependencies
Without these, Gnuastro cannot be built; they are necessary for input/output and low-level mathematics (see *note Mandatory dependencies::)!

$ sudo pacman -S gsl cfitsio wcslib

Optional dependencies
If present, these libraries can be used in Gnuastro’s build for extra features, see *note Optional dependencies::.

$ sudo pacman -S ghostscript libtool libjpeg \
                 libtiff libgit2 curl lzip

Programs to view FITS images or tables
These are not used in Gnuastro’s build. They can just help in viewing the inputs/outputs independent of Gnuastro!

SAO DS9 and TOPCAT are not available in the standard Arch GNU/Linux repositories. However, installing and using both is very easy from their own web pages, as described in *note SAO DS9:: and *note TOPCAT::.

‘zypper’ (openSUSE and SUSE Linux Enterprise Server)

SUSE Linux Enterprise Server (SLES) is the commercial offering which shares code and tools with the community-developed openSUSE. Many additional packages are offered in the openSUSE Build Service. openSUSE and SLES use ‘zypper’ (command-line) and YaST (GUI) for managing repositories and packages.

Configuration
When building Gnuastro, run the configure script with the following ‘CPPFLAGS’ environment variable:

$ ./configure CPPFLAGS="-I/usr/include/cfitsio"

Mandatory dependencies
Without these, Gnuastro cannot be built; they are necessary for input/output and low-level mathematics (see *note Mandatory dependencies::)!

$ sudo zypper install gsl-devel cfitsio-devel \
                      wcslib-devel

Optional dependencies
If present, these libraries can be used in Gnuastro’s build for extra features, see *note Optional dependencies::.

$ sudo zypper install ghostscript_any libtool \
                      pkgconfig libcurl-devel \
                      libgit2-devel \
                      libjpeg62-devel \
                      libtiff-devel curl

Programs to view FITS images or tables
These are not used in Gnuastro’s build. They can just help in viewing the inputs/outputs independent of Gnuastro!

$ sudo zypper install ds9 topcat

Usually, when libraries are installed by operating system package managers, there should be no problems when configuring and building other programs from source (that depend on the libraries: Gnuastro in this case). However, in some special conditions, problems may pop up during the configuration, building, or checking/running any of Gnuastro’s programs. The most common of such problems and their solutions are discussed below.

*Not finding library during configuration:* If a library is installed, but during Gnuastro’s ‘configure’ step the library is not found, then configure Gnuastro like the command below (correcting ‘/path/to/lib’). For more, see *note Known issues:: and *note Installation directory::.
$ ./configure LDFLAGS="-L/path/to/lib"

*Not finding header (.h) files while building:* If a library is installed, but during Gnuastro’s ‘make’ step, the library’s header (file with a ‘.h’ suffix) is not found, then configure Gnuastro like the command below (correcting ‘/path/to/include’). For more, see *note Known issues:: and *note Installation directory::.

$ ./configure CPPFLAGS="-I/path/to/include"

*Gnuastro’s programs do not run during check or after install:* If a library is installed, but the programs do not run due to linking problems, set the ‘LD_LIBRARY_PATH’ variable like below (assuming Gnuastro is installed in ‘/path/to/installed’). For more, see *note Known issues:: and *note Installation directory::.

$ export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/path/to/installed/lib"

3.2 Downloading the source
==========================

Gnuastro’s source code can be downloaded in two ways. As a tarball, ready to be configured and installed on your system (as described in *note Quick start::), see *note Release tarball::. If you want official releases of stable versions, this is the best, easiest and most common option. Alternatively, you can clone the version controlled history of Gnuastro, run one extra bootstrapping step and then follow the same steps as the tarball. This will give you access to all the most recent work that will be included in the next release along with the full project history. The process is thoroughly introduced in *note Version controlled source::.

3.2.1 Release tarball
---------------------

A release tarball (commonly compressed) is the most common way of obtaining free and open source software. A tarball is a snapshot of one particular moment in the Gnuastro development history along with all the necessary files to configure, build, and install Gnuastro easily (see *note Quick start::). It is very straightforward and needs the smallest set of dependencies (see *note Mandatory dependencies::). Gnuastro has tarballs for official stable releases and pre-releases for testing. See *note Version numbering:: for more on the two types of releases and the formats of the version numbers. The URLs for each type of release are given below.

Official stable releases: The official stable releases of Gnuastro are hosted on the GNU FTP server. Always use the most recent version (see *note Version numbering::). By clicking on the “Last modified” title of the second column, the files will be sorted by their date, which you can also use to find the latest version. It is recommended to use a mirror to download these tarballs (see below).

Pre-release tarballs: These are unofficial pre-release versions of Gnuastro, for enthusiasts to try out before an official release. If there are problems or bugs, then the testers will inform the developers, so they can be fixed before the next official release. See *note Version numbering:: to understand how the version numbers here are formatted. If you want to remain even more up-to-date with the developing activities, please clone the version controlled source as described in *note Version controlled source::.

Gnuastro’s official/stable tarball is released in two formats: Gzip (with suffix ‘.tar.gz’) and Lzip (with suffix ‘.tar.lz’). The pre-release tarballs (after version 0.3) are released only as an Lzip tarball. Gzip is a very well-known and widely used compression program created by GNU and available in most systems.
However, Lzip provides a better compression ratio and more robust archival capacity. For example, Gnuastro 0.3’s tarball was 2.9MB and 4.3MB with Lzip and Gzip respectively; see the Lzip web page (http://www.nongnu.org/lzip/lzip.html) for more. Lzip might not be pre-installed in your operating system; if so, installing it from your operating system’s package manager or from source is very easy and fast (it is a very small program).

The GNU FTP server is mirrored (has backups) in various locations on the globe. You can use the closest mirror to your location for a faster download. Note that only some mirrors keep track of the pre-release (alpha) tarballs. Also note that if you want to download immediately after an announcement (see *note Announcements::), the mirrors might need some time to synchronize with the main GNU FTP server.

3.2.2 Version controlled source
-------------------------------

The publicly distributed Gnuastro tarball (for example, ‘gnuastro-X.X.tar.gz’) does not contain the revision history; it is only a snapshot of the source code at one significant instant of Gnuastro’s history (specified by the version number, see *note Version numbering::), ready to be configured and built. To be able to develop successfully, the revision history of the code can be very useful to track when something was added or changed; it may also contain some updates that have not yet been officially released. We use Git for the version control of Gnuastro. For those who are not familiar with it, we recommend the ProGit book (https://git-scm.com/book/en). The whole book is publicly available for online reading and downloading and does a wonderful job at explaining the concepts and best practices.

Let’s assume you want to keep Gnuastro in the ‘TOPGNUASTRO’ directory (can be any directory, change the value below). The full version controlled history of Gnuastro can be cloned in ‘TOPGNUASTRO/gnuastro’ by running the following commands(1):

$ TOPGNUASTRO=/home/yourname/Research/projects/
$ cd $TOPGNUASTRO
$ git clone git://git.sv.gnu.org/gnuastro.git

The ‘$TOPGNUASTRO/gnuastro’ directory will contain hand-written (version controlled) source code for Gnuastro’s programs, libraries, this book and the tests. All are divided into sub-directories with standard and very descriptive names. The version controlled files in the top cloned directory are either mainly in capital letters (for example, ‘THANKS’ and ‘README’) or mainly in lower-case letters (for example, ‘configure.ac’ and ‘Makefile.am’). The former are non-programming, standard writing for human readers containing high-level information about the whole package. The latter are instructions to customize the GNU build system for Gnuastro. For more on Gnuastro’s source code structure, please see *note Developing::. We will not go any deeper here.

The cloned Gnuastro source cannot immediately be configured, compiled, or installed since it only contains hand-written files, not the automatically generated or imported files which do all the hard work of the build process. See *note Bootstrapping:: for the process of generating and importing those files (it is not too hard!). Once you have bootstrapped Gnuastro, you can run the standard procedures (in *note Quick start::). Very soon after you have cloned it, Gnuastro’s main ‘master’ branch will be updated on the main repository (since the developers are actively working on Gnuastro); for the best practices in keeping your local history in sync with the main repository, see *note Synchronizing::.
---------- Footnotes ----------

(1) If your internet connection is active, but Git complains about the network, it might be due to your network setup not recognizing the Git protocol. In that case use the following URL which uses the HTTP protocol instead: ‘http://git.sv.gnu.org/r/gnuastro.git’

3.2.2.1 Bootstrapping
.....................

The version controlled source code lacks the files that we have not written by hand or that are automatically built. These automatically generated files are included in the distributed tarball for each distribution (for example, ‘gnuastro-X.X.tar.gz’, see *note Version numbering::) and make it easy to immediately configure, build, and install Gnuastro. However, from the perspective of version control, they are just bloatware and sources of confusion (since they are not changed by Gnuastro developers).

The process of automatically building and importing necessary files into the cloned directory is known as _bootstrapping_. After bootstrapping is done, you are ready to follow the default GNU build steps that you normally run on the tarball (‘./configure && make’ for example, described more in *note Quick start::). Some known issues may occur during the bootstrapping process; to see how to fix them, please see *note Known issues::.

All the instructions for an automatic bootstrapping are available in ‘bootstrap’ and configured using ‘bootstrap.conf’. ‘bootstrap’ and ‘COPYING’ (which contains the software copyright notice) are the only files not written by Gnuastro developers but under version control to enable simple bootstrapping and legal information on usage immediately after cloning. ‘bootstrap’ is maintained by the GNU Portability Library (Gnulib) and this file is an identical copy, so do not make any changes in it since it will be replaced when Gnulib releases an update. Make all your changes in ‘bootstrap.conf’.

The bootstrapping process has its own separate set of dependencies; the full list is given in *note Bootstrapping dependencies::. They are generally very low-level and used by a very large set of commonly used programs, so they are probably already installed on your system.

The simplest way to bootstrap Gnuastro is to run the bootstrap script within your cloned Gnuastro directory as shown below. However, please read the next paragraph before doing so (see *note Version controlled source:: for ‘TOPGNUASTRO’).

$ cd $TOPGNUASTRO/gnuastro
$ ./bootstrap                      # Requires internet connection

Without any options, ‘bootstrap’ will clone Gnulib within your cloned Gnuastro directory (‘TOPGNUASTRO/gnuastro/gnulib’) and download the necessary Autoconf archive macros. So if you run bootstrap like this, you will need an internet connection every time you decide to bootstrap. Also, Gnulib is a large package and cloning it can be slow. It will also keep the full Gnulib repository within your Gnuastro repository, so if another one of your projects also needs Gnulib, and you insist on running bootstrap like this, you will have two copies. In case you regularly back up your important files, Gnulib will also slow down the backup process. Therefore, while the simple invocation above can be used with no problem, it is not recommended. To do better, see below. The recommended way to get these two packages is thoroughly discussed in *note Bootstrapping dependencies:: (in short: clone them in the separate ‘DEVDIR/’ directory).
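For example, a minimal sketch of preparing that separate directory could look like the following. The value of ‘DEVDIR’ and the Git-protocol URLs here are only illustrative assumptions (see *note Bootstrapping dependencies:: for the authoritative instructions, and the footnotes earlier in this section for HTTP alternatives to these URLs):

$ DEVDIR=/home/yourname/Development
$ mkdir -p $DEVDIR && cd $DEVDIR
$ git clone git://git.sv.gnu.org/gnulib.git
$ git clone git://git.sv.gnu.org/autoconf-archive.git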
The following commands will take you into the cloned Gnuastro directory and run the ‘bootstrap’ script, while telling it to copy some files (instead of making symbolic links, with the ‘--copy’ option, this is not mandatory(1)) and where to look for Gnulib (with the ‘--gnulib-srcdir’ option). Please note that the address given to ‘--gnulib-srcdir’ has to be an absolute address (so do not use ‘~’ or ‘../’ for example). $ cd $TOPGNUASTRO/gnuastro $ ./bootstrap --copy --gnulib-srcdir=$DEVDIR/gnulib Since Gnulib and Autoconf archives are now available in your local directories, you do not need an internet connection every time you decide to remove all un-tracked files and redo the bootstrap (see box below). You can also use the same command on any other project that uses Gnulib. All the necessary GNU C library functions, Autoconf macros and Automake inputs are now available along with the book figures. The standard GNU build system (*note Quick start::) will do the rest of the job. *Undoing the bootstrap:* During the development, it might happen that you want to remove all the automatically generated and imported files. In other words, you might want to reverse the bootstrap process. Fortunately Git has a good program for this job: ‘git clean’. Run the following command and every file that is not version controlled will be removed. git clean -fxd It is best to commit any recent change before running this command. You might have created new files since the last commit and if they have not been committed, they will all be gone forever (using ‘rm’). To get a list of the non-version controlled files instead of deleting them, add the ‘n’ option to ‘git clean’, so it becomes ‘-fxdn’. Besides the ‘bootstrap’ and ‘bootstrap.conf’, the ‘bootstrapped/’ directory and ‘README-hacking’ file are also related to the bootstrapping process. The former hosts all the imported (bootstrapped) directories. Thus, in the version controlled source, it only contains a ‘README’ file, but in the distributed tar-ball it also contains sub-directories filled with all bootstrapped files. ‘README-hacking’ contains a summary of the bootstrapping process discussed in this section. It is a necessary reference when you have not built this book yet. It is thus not distributed in the Gnuastro tarball. ---------- Footnotes ---------- (1) The ‘--copy’ option is recommended because some backup systems might do strange things with symbolic links. 3.2.2.2 Synchronizing ..................... The bootstrapping script (see *note Bootstrapping::) is not regularly needed: you mainly need it after you have cloned Gnuastro (once) and whenever you want to re-import the files from Gnulib, or Autoconf archives(1) (not too common). However, Gnuastro developers are constantly working on Gnuastro and are pushing their changes to the official repository. Therefore, your local Gnuastro clone will soon be out-dated. Gnuastro has two mailing lists dedicated to its developing activities (see *note Developing mailing lists::). Subscribing to them can help you decide when to synchronize with the official repository. To pull all the most recent work in Gnuastro, run the following command from the top Gnuastro directory. If you do not already have a built system, ignore ‘make distclean’. The separate steps are described in detail afterwards. 
$ make distclean && git pull && autoreconf -f

You can also run the commands separately:

$ make distclean
$ git pull
$ autoreconf -f

If Gnuastro was already built in this directory, you do not want outputs from the previous version to be mixed with outputs from the newly pulled work. Therefore, the first step is to clean/delete all the built files with ‘make distclean’. Fortunately the GNU build system allows the separation of source and built files (in separate directories). This is a great feature to keep your source directory clean and you can use it to avoid the cleaning step. Gnuastro comes with a script with some useful options for this job. It is useful if you regularly pull recent changes; see *note Separate build and source directories::.

After the pull, we must re-configure Gnuastro with ‘autoreconf -f’ (part of GNU Autoconf). It will update the ‘./configure’ script and all the ‘Makefile.in’(2) files based on the hand-written configurations (in ‘configure.ac’ and the ‘Makefile.am’ files). After running ‘autoreconf -f’, a warning about ‘TEXI2DVI’ might show up; you can ignore it.

The most important reason for re-building Gnuastro’s build system is to generate/update the version number for your updated Gnuastro snapshot. This generated version number will include the commit information (see *note Version numbering::). The version number is included in nearly all outputs of Gnuastro’s programs, therefore it is vital for reproducing an old result.

As a summary, be sure to run ‘‘autoreconf -f’’ after every change in the Git history. This includes synchronization with the main server or even a commit you have made yourself.

If you would like to see what has changed since you last synchronized your local clone, you can take the following steps instead of the simple command above (do not type anything after ‘#’):

$ git checkout master             # Confirm if you are on master.
$ git fetch origin                # Fetch all new commits from server.
$ git log master..origin/master   # See all the new commit messages.
$ git merge origin/master         # Update your master branch.
$ autoreconf -f                   # Update the build system.

By default ‘git log’ prints the most recent commit first; add the ‘--reverse’ option to see the changes chronologically. To see exactly what has been changed in the source code along with the commit message, add a ‘-p’ option to ‘git log’.

If you want to make changes in the code, have a look at *note Developing:: to get started easily. Be sure to commit your changes in a separate branch (keep your ‘master’ branch to follow the official repository) and re-run ‘autoreconf -f’ after the commit. If you intend to send your work to us, you can safely use your commit since it will be ultimately recorded in Gnuastro’s official history. If not, please upload your separate branch to a public hosting service, for example, Codeberg (https://codeberg.org), and link to it in your report/paper. Alternatively, run ‘make distcheck’ and upload the output ‘gnuastro-X.X.X.XXXX.tar.gz’ to a publicly accessible web page so your results can be considered scientific (reproducible) later.

---------- Footnotes ----------

(1) A reference page is maintained for you to check whether significant (for Gnuastro) updates have been made in these repositories since the last time you pulled from them.

(2) In the GNU build system, ‘./configure’ will use the ‘Makefile.in’ files to create the necessary ‘Makefile’ files that are later read by ‘make’ to build the package.
3.3 Build and install ===================== This section is basically a longer explanation to the sequence of commands given in *note Quick start::. If you did not have any problems during the *note Quick start:: steps, you want to have all the programs of Gnuastro installed in your system, you do not want to change the executable names during or after installation, you have root access to install the programs in the default system wide directory, the Letter paper size of the print book is fine for you or as a summary you do not feel like going into the details when everything is working, you can safely skip this section. If you have any of the above problems or you want to understand the details for a better control over your build and install, read along. The dependencies which you will need prior to configuring, building and installing Gnuastro are explained in *note Dependencies::. The first three steps in *note Quick start:: need no extra explanation, so we will skip them and start with an explanation of Gnuastro specific configuration options and a discussion on the installation directory in *note Configuring::, followed by some smaller subsections: *note Tests::, *note A4 print book::, and *note Known issues:: which explains the solutions to known problems you might encounter in the installation steps and ways you can solve them. 3.3.1 Configuring ----------------- The ‘$ ./configure’ step is the most important step in the build and install process. All the required packages, libraries, headers and environment variables are checked in this step. The behaviors of make and make install can also be set through command line options to this command. The configure script accepts various arguments and options which enable the final user to highly customize whatever she is building. The options to configure are generally very similar to normal program options explained in *note Arguments and options::. Similar to all GNU programs, you can get a full list of the options along with a short explanation by running $ ./configure --help A complete explanation is also included in the ‘INSTALL’ file. Note that this file was written by the authors of GNU Autoconf (which builds the ‘configure’ script), therefore it is common for all programs which use the ‘$ ./configure’ script for building and installing, not just Gnuastro. Here we only discuss cases where you do not have super-user access to the system and if you want to change the executable names. But before that, a review of the options to configure that are particular to Gnuastro are discussed. 3.3.1.1 Gnuastro configure options .................................. Most of the options to configure (which are to do with building) are similar for every program which uses this script. Here the options that are particular to Gnuastro are discussed. The next topics explain the usage of other configure options which can be applied to any program using the GNU build system (through the configure script). ‘--enable-debug’ Compile/build Gnuastro with debugging information, no optimization and without shared libraries. In order to allow more efficient programs when using Gnuastro (after the installation), by default Gnuastro is built with a 3rd level (a very high level) optimization and no debugging information. By default, libraries are also built for static _and_ shared linking (see *note Linking::). However, when there are crashes or unexpected behavior, these three features can hinder the process of localizing the problem. 
This configuration option is identical to manually calling the configuration script with ‘CFLAGS="-g -O0" --disable-shared’. In the (rare) situations where you need to do your debugging on the shared libraries, do not use this option. Instead run the configure script by explicitly setting ‘CFLAGS’ like this: $ ./configure CFLAGS="-g -O0" ‘--enable-check-with-valgrind’ Do the ‘make check’ tests through Valgrind. Therefore, if any crashes or memory-related issues (segmentation faults in particular) occur in the tests, the output of Valgrind will also be put in the ‘tests/test-suite.log’ file without having to manually modify the check scripts. This option will also activate Gnuastro’s debug mode (see the ‘--enable-debug’ configure-time option described above). Valgrind is free software. It is a program for easy checking of memory-related issues in programs. It runs a program within its own controlled environment and can thus identify the exact line-number in the program’s source where a memory-related issue occurs. However, it can significantly slow-down the tests. So this option is only useful when a segmentation fault is found during ‘make check’. ‘--enable-progname’ Only build and install ‘progname’ along with any other program that is enabled in this fashion. ‘progname’ is the name of the executable without the ‘ast’, for example, ‘crop’ for Crop (with the executable name of ‘astcrop’). Note that by default all the programs will be installed. This option (and the ‘--disable-progname’ options) are only relevant when you do not want to install all the programs. Therefore, if this option is called for any of the programs in Gnuastro, any program which is not explicitly enabled will not be built or installed. ‘--disable-progname’ ‘--enable-progname=no’ Do not build or install the program named ‘progname’. This is very similar to the ‘--enable-progname’, but will build and install all the other programs except this one. *Note:* If some programs are enabled and some are disabled, it is equivalent to simply enabling those that were enabled. Listing the disabled programs is redundant. ‘--enable-gnulibcheck’ Enable checks on the GNU Portability Library (Gnulib). Gnulib is used by Gnuastro to enable users of non-GNU based operating systems (that do not use GNU C library or glibc) to compile and use the advanced features that this library provides. We make extensive use of such functions. If you give this option to ‘$ ./configure’, when you run ‘$ make check’, first the functions in Gnulib will be tested, then the Gnuastro executables. If your operating system does not support glibc or has an older version of it and you have problems in the build process (‘$ make’), you can give this flag to configure to see if the problem is caused by Gnulib not supporting your operating system or Gnuastro, see *note Known issues::. ‘--disable-guide-message’ ‘--enable-guide-message=no’ Do not print a guiding message during the GNU Build process of *note Quick start::. By default, after each step, a message is printed guiding the user what the next command should be. Therefore, after ‘./configure’, it will suggest running ‘make’. After ‘make’, it will suggest running ‘make check’ and so on. If Gnuastro is configured with this option, for example $ ./configure --disable-guide-message Then these messages will not be printed after any step (like most programs). For people who are not yet fully accustomed to this build system, these guidelines can be very useful and encouraging. 
However, if you find those messages annoying, use this option.

‘--without-libgit2’
Build Gnuastro without libgit2 (for including Git commit hashes in output files), see *note Optional dependencies::. libgit2 is an optional dependency; with this option, Gnuastro will ignore any libgit2 that may already be on the system.

‘--without-libjpeg’
Build Gnuastro without libjpeg (for reading/writing to JPEG files), see *note Optional dependencies::. libjpeg is an optional dependency; with this option, Gnuastro will ignore any libjpeg that may already be on the system.

‘--without-libtiff’
Build Gnuastro without libtiff (for reading/writing to TIFF files), see *note Optional dependencies::. libtiff is an optional dependency; with this option, Gnuastro will ignore any libtiff that may already be on the system.

‘--without-python’
Do not build the Python interface within Gnuastro’s dynamic library. This interface can be used for easy communication with Python wrappers (for example, the pyGnuastro package). However, when you install the pyGnuastro package from PyPI, a correctly configured Gnuastro library (with the Python interface) is already packaged with it. The Python interface is only necessary if you want to build pyGnuastro from source. For more on the interface functions, see *note Python interface::.

The tests of some programs might depend on the outputs of the tests of other programs. For example, MakeProfiles is one of the first programs to be tested when you run ‘$ make check’. MakeProfiles’ test outputs (FITS images) are inputs to many other programs (which in turn provide inputs for other programs). Therefore, if you do not enable MakeProfiles for example, the tests for many of the other programs will be skipped. To avoid this, in one run you can build all the programs and run the tests, but not install. If everything is working correctly, you can run configure again with only the programs you want, then build and install directly without running the tests.

3.3.1.2 Installation directory
..............................

One of the most commonly used options to ‘./configure’ is ‘--prefix’; it is used to define the directory that will host all the installed files (or the “prefix” in their final absolute file name). For example, this is necessary when you are using a server and you do not have administrator or root access. In such a scenario, if you do not use the ‘--prefix’ option, you will not be able to install the built files and thus access them from anywhere without having to worry about where they are installed. However, once you prepare your startup file to look into the proper place (as discussed thoroughly below), you will be able to easily use this option and benefit from any software you want to install, or install and use a different version of a software that is already installed on the server, without having to ask the system administrators.

The most basic way to run an executable is to explicitly write its full file name (including all the directory information) and run it. One example is running the configuration script with the ‘$ ./configure’ command (see *note Quick start::). By giving a specific directory (the current directory or ‘./’), we are explicitly telling the shell to look in the current directory for an executable file named ‘‘configure’’. Directly specifying the directory is thus useful for executables in the current (or nearby) directories.
However, when the program (an executable file) is to be used a lot, specifying all those directories will become a significant burden. For example, the ‘ls’ executable lists the contents in a given directory and it is (usually) installed in the ‘/usr/bin/’ directory by the operating system maintainers. Therefore, if using the full address was the only way to access an executable, each time you wanted a listing of a directory, you would have to run the following command (which is very inconvenient, both in writing and in remembering the various directories). $ /usr/bin/ls To address this problem, we have the ‘PATH’ environment variable. To understand it better, we will start with a short introduction to the shell variables. Shell variable values are basically treated as strings of characters. For example, it does not matter if the value is a name (string of _alphabetic_ characters), or a number (string of _numeric_ characters), or both. You can define a variable and a value for it by running $ myvariable1=a_test_value $ myvariable2="a test value" As you see above, if the value contains white space characters, you have to put the whole value (including white space characters) in double quotes (<">). You can see the value it represents by running $ echo $myvariable1 $ echo $myvariable2 If a variable has no value or it was not defined, the last command will only print an empty line. A variable defined like this will be known as long as this shell or terminal is running. Other terminals will have no idea it existed. The main advantage of shell variables is that if they are exported(1), subsequent programs that are run within that shell can access their value. So by changing their value, you can change the “environment” of a program which uses them. The shell variables which are accessed by programs are therefore known as “environment variables”(2). You can see the full list of exported variables that your shell recognizes by running: $ printenv ‘HOME’ is one commonly used environment variable, it is any user’s (the one that is logged in) top directory. Try finding it in the command above. It is used so often that the shell has a special expansion (alternative) for it: ‘‘~’’. Whenever you see file names starting with the tilde sign, it actually represents the value to the ‘HOME’ environment variable, so ‘~/doc’ is the same as ‘$HOME/doc’. Another one of the most commonly used environment variables is ‘PATH’, it is a list of directories to search for executable names. Its value is a list of directories (separated by a colon, or ‘<:>’). When the address of the executable is not explicitly given (like ‘./configure’ above), the system will look for the executable in the directories specified by ‘PATH’. If you have a computer nearby, try running the following command to see which directories your system will look into when it is searching for executable (binary) files, one example is printed here (notice how ‘/usr/bin’, in the ‘ls’ example above, is one of the directories in ‘PATH’): $ echo $PATH /usr/local/sbin:/usr/local/bin:/usr/bin By default ‘PATH’ usually contains system-wide directories, which are readable (but not writable) by all users, like the above example. Therefore if you do not have root (or administrator) access, you need to add another directory to ‘PATH’ which you actually have write access to. The standard directory where you can keep installed files (not just executables) for your own user is the ‘~/.local/’ directory. 
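Since the value of ‘PATH’ is simply a colon-separated string, you can quickly check whether this directory is already recognized on your system with a small pipeline like the one below (this is only an illustrative check: it prints each directory of ‘PATH’ on its own line and keeps those containing ‘.local’):

$ echo $PATH | tr ':' '\n' | grep '\.local'

If nothing is printed, ‘~/.local/bin’ is not yet part of ‘PATH’ and you can add it as described in the rest of this section.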
The names of hidden files start with a ‘<.>’ (dot), so it will not show up in your common command-line listings, or on the graphical user interface. You can use any other directory, but this is the most recognized. The top installation directory will be used to keep all the package’s components: programs (executables), libraries, include (header) files, shared data (like manuals), or configuration files (see *note Review of library fundamentals:: for a thorough introduction to headers and linking). So it commonly has some of the following sub-directories for each class of installed components respectively: ‘bin/’, ‘lib/’, ‘include/’ ‘man/’, ‘share/’, ‘etc/’. Since the ‘PATH’ variable is only used for executables, you can add the ‘~/.local/bin’ directory (which keeps the executables/programs or more generally, “binary” files) to ‘PATH’ with the following command. As defined below, first the existing value of ‘PATH’ is used, then your given directory is added to its end and the combined value is put back in ‘PATH’ (run ‘‘$ echo $PATH’’ afterwards to check if it was added). $ PATH=$PATH:~/.local/bin Any executable that you installed in ‘~/.local/bin’ will now be usable without having to remember and write its full address. However, as soon as you leave/close your current terminal session, this modified ‘PATH’ variable will be forgotten. Adding the directories which contain executables to the ‘PATH’ environment variable each time you start a terminal is also very inconvenient and prone to errors. Fortunately, there are standard ‘startup files’ defined by your shell precisely for this (and other) purposes. There is a special startup file for every significant starting step: ‘/etc/profile’ and everything in ‘/etc/profile.d/’ These startup scripts are called when your whole system starts (for example, after you turn on your computer). Therefore you need administrator or root privileges to access or modify them. ‘~/.bash_profile’ If you are using (GNU) Bash as your shell, the commands in this file are run, when you log in to your account _through Bash_. Most commonly when you login through the virtual console (where there is no graphic user interface). ‘~/.bashrc’ If you are using (GNU) Bash as your shell, the commands here will be run each time you start a terminal and are already logged in. For example, when you open your terminal emulator in the graphic user interface. For security reasons, it is highly recommended to directly type in your ‘HOME’ directory value by hand in startup files instead of using variables. So in the following, let’s assume your user name is ‘‘name’’ (so ‘~’ may be replaced with ‘/home/name’). To add ‘~/.local/bin’ to your ‘PATH’ automatically on any startup file, you have to “export” the new value of ‘PATH’ in the startup file that is most relevant to you by adding this line: export PATH=$PATH:/home/name/.local/bin Now that you know your system will look into ‘~/.local/bin’ for executables, you can tell Gnuastro’s configure script to install everything in the top ‘~/.local’ directory using the ‘--prefix’ option. When you subsequently run ‘$ make install’, all the install-able files will be put in their respective directory under ‘~/.local/’ (the executables in ‘~/.local/bin’, the compiled library files in ‘~/.local/lib’, the library header files in ‘~/.local/include’ and so on, to learn more about these different files, please see *note Review of library fundamentals::). 
Note that tilde (‘<~>’) expansion will not happen if you put a ‘<=>’ between ‘--prefix’ and ‘~/.local’(3), so we have avoided the ‘<=>’ character here (it is optional in GNU-style options, see *note Options::).

$ ./configure --prefix ~/.local

You can install everything (including libraries like GSL, CFITSIO, or WCSLIB which are Gnuastro’s mandatory dependencies, see *note Mandatory dependencies::) locally by configuring them as above. However, recall that ‘PATH’ is only for executable files, not libraries, and that libraries can also depend on other libraries. For example, WCSLIB depends on CFITSIO and Gnuastro needs both. Therefore, when you have installed a library in a non-recognized directory, you have to guide the program that depends on it to look into the necessary library and header file directories. To do that, you have to define the ‘LDFLAGS’ and ‘CPPFLAGS’ environment variables respectively. This can be done while calling ‘./configure’ as shown below:

$ ./configure LDFLAGS=-L/home/name/.local/lib \
              CPPFLAGS=-I/home/name/.local/include \
              --prefix ~/.local

It can be annoying (and error-prone) to do this when configuring every software package that depends on such libraries. Hence, you can define these two variables in the most relevant startup file (discussed above). The convention on using these variables does not include a colon to separate values (as ‘PATH’-like variables do). They use white space characters and each value is prefixed with a compiler option(4). Note the ‘-L’ and ‘-I’ above (see *note Options::); for ‘-I’ see *note Headers::, and for ‘-L’, see *note Linking::. Therefore, keep the value within double quotation marks to preserve the white space characters, and add the following two lines to the startup file of your choice:

export LDFLAGS="$LDFLAGS -L/home/name/.local/lib"
export CPPFLAGS="$CPPFLAGS -I/home/name/.local/include"

Dynamic libraries are linked to the executable every time you run a program that depends on them (see *note Linking:: to fully understand this important concept). Hence dynamic libraries also require a special path variable called ‘LD_LIBRARY_PATH’ (same formatting as ‘PATH’). To use programs that depend on these libraries, you need to add ‘~/.local/lib’ to your ‘LD_LIBRARY_PATH’ environment variable by adding the following line to the relevant start-up file:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/name/.local/lib

If you also want to access the Info (see *note Info::) and man pages (see *note Man pages::) documentation, add ‘~/.local/share/info’ and ‘~/.local/share/man’ to your ‘INFOPATH’(5) and ‘MANPATH’ environment variables respectively.

A final note is that order matters in the directories that are searched for all the variables discussed above. In the examples above, the new directory was added after the system specified directories. So if the program, library or manuals are found in the system wide directories, the user directory is no longer searched. If you want to search your local installation first, put the new directory before the already existing list, like the example below.

export LD_LIBRARY_PATH=/home/name/.local/lib:$LD_LIBRARY_PATH

This is good when a library, for example, CFITSIO, is already present on the system, but the system-wide install was not configured with the correct configuration flags (see *note CFITSIO::), or you want to use a newer version and you do not have administrator or root access to update it on the whole system/server.
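If you are not sure which copy of a shared library one of Gnuastro’s programs will actually load at run time, the ‘ldd’ tool (commonly available on GNU/Linux systems) can show the resolved path. For example, the sketch below assumes Crop is already installed and in your ‘PATH’; the printed path tells you whether the system-wide or your local CFITSIO will be used:

$ ldd $(which astcrop) | grep cfitsio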
If you update ‘LD_LIBRARY_PATH’ by placing ‘~/.local/lib’ first (like above), the linker will first find the CFITSIO you installed for yourself and link with it. It thus will never reach the system-wide installation.

There are important security problems with using local installations first: all important system-wide executables and libraries (important executables like ‘ls’ and ‘cp’, or libraries like the C library) can be replaced by non-secure versions with the same file names and put in the customized directory (‘~/.local’ in this example). So if you choose to search in your customized directory first, please _be sure_ to keep it clean from executables or libraries with the same names as important system programs or libraries.

*Summary:* When you are using a server which does not give you administrator/root access AND you would like to give priority to your own built programs and libraries, not the version that is (possibly already) present on the server, add these lines to your startup file. See above for which startup file is best for your case and for a detailed explanation on each. Do not forget to replace ‘‘/YOUR-HOME-DIR’’ with your home directory (for example, ‘‘/home/your-id’’):

export PATH="/YOUR-HOME-DIR/.local/bin:$PATH"
export LDFLAGS="-L/YOUR-HOME-DIR/.local/lib $LDFLAGS"
export MANPATH="/YOUR-HOME-DIR/.local/share/man/:$MANPATH"
export CPPFLAGS="-I/YOUR-HOME-DIR/.local/include $CPPFLAGS"
export INFOPATH="/YOUR-HOME-DIR/.local/share/info/:$INFOPATH"
export LD_LIBRARY_PATH="/YOUR-HOME-DIR/.local/lib:$LD_LIBRARY_PATH"

Afterwards, you just need to add an extra ‘--prefix=/YOUR-HOME-DIR/.local’ to the ‘./configure’ command of the software that you intend to install. Everything else will be the same as a standard build and install, see *note Quick start::.

---------- Footnotes ----------

(1) By running ‘$ export myvariable=a_test_value’ instead of the simpler case in the text.

(2) You can use shell variables for other actions too, for example, to temporarily keep some names or run loops on some files.

(3) If you insist on using ‘<=>’, you can use ‘--prefix=$HOME/.local’.

(4) These variables are ultimately used as options while building the programs. Therefore every value has to be an option name followed by a value, as discussed in *note Options::.

(5) Info has the following convention: “If the value of ‘INFOPATH’ ends with a colon [or it is not defined] ..., the initial list of directories is constructed by appending the build-time default to the value of ‘INFOPATH’.” So when installing in a non-standard directory and if ‘INFOPATH’ was not initially defined, add a colon to the end of ‘INFOPATH’ as shown below. Otherwise Info will not be able to find system-wide installed documentation:

‘echo 'export INFOPATH=$INFOPATH:/home/name/.local/share/info:' >> ~/.bashrc’

Note that this is only an internal convention of Info: do not use it for other ‘*PATH’ variables.

3.3.1.3 Executable names
........................

At first sight, the names of the executables for each program might seem to be uncommonly long, for example, ‘astnoisechisel’ or ‘astcrop’. We could have chosen terse (and cryptic) names like most programs do. We chose this complete naming convention (something like the commands in TeX) so you do not have to spend too much time remembering what the name of a specific program was. Such complete names also enable you to easily search for the programs. To facilitate typing the names, we suggest using the shell auto-complete.
With this facility you can find the executable you want very easily. It is very similar to file name completion in the shell. For example, simply by typing the letters below (where <[TAB]> stands for the Tab key on your keyboard) $ ast[TAB][TAB] you will get the list of all the available executables that start with ‘ast’ in your ‘PATH’ environment variable directories. So, all the Gnuastro executables installed on your system will be listed. Typing the next letter for the specific program you want along with a Tab, will limit this list until you get to your desired program. In case all of this does not convince you and you still want to type short names, some suggestions are given below. You should have in mind though, that if you are writing a shell script that you might want to pass on to others, it is best to use the standard name because other users might not have adopted the same customization. The long names also serve as a form of documentation in such scripts. A similar reasoning can be given for option names in scripts: it is good practice to always use the long formats of the options in shell scripts, see *note Options::. The simplest solution is making a symbolic link to the actual executable. for example, let’s assume you want to type ‘ic’ to run Crop instead of ‘astcrop’. Assuming you installed Gnuastro executables in ‘/usr/local/bin’ (default) you can do this simply by running the following command as root: # ln -s /usr/local/bin/astcrop /usr/local/bin/ic In case you update Gnuastro and a new version of Crop is installed, the default executable name is the same, so your custom symbolic link still works. The installed executable names can also be set using options to ‘$ ./configure’, see *note Configuring::. GNU Autoconf (which configures Gnuastro for your particular system), allows the builder to change the name of programs with the three options ‘--program-prefix’, ‘--program-suffix’ and ‘--program-transform-name’. The first two are for adding a fixed prefix or suffix to all the programs that will be installed. This will actually make all the names longer! You can use it to add versions of program names to the programs in order to simultaneously have two executable versions of a program. The third configure option allows you to set the executable name at install time using the SED program. SED is a very useful ‘stream editor’. There are various resources on the internet to use it effectively. However, we should caution that using configure options will change the actual executable name of the installed program and on every re-install (an update for example), you have to also add this option to keep the old executable name updated. Also note that the documentation or configuration files do not change from their standard names either. For example, let’s assume that typing ‘ast’ on every invocation of every program is really annoying you! You can remove this prefix from all the executables at configure time by adding this option: $ ./configure --program-transform-name='s/ast/ /' 3.3.1.4 Configure and build in RAM .................................. Gnuastro’s configure and build process (the GNU build system) involves the creation, reading, and modification of a large number of files (input/output, or I/O). Therefore file I/O issues can directly affect the work of developers who need to configure and build Gnuastro numerous times. Some of these issues are listed below: • I/O will cause wear and tear on both the HDDs (mechanical failures) and SSDs (decreasing the lifetime). 
• Having the built files mixed with the source files can greatly affect backing up (synchronization) of source files (since it involves the management of a large number of small files that are regularly changed). Backup software can of course be configured to ignore the built files and directories. However, since the built files are mixed with the source files and can have a large variety, this will require a high level of customization.

One solution to address both these problems is to use the tmpfs file system (https://en.wikipedia.org/wiki/Tmpfs). Any file in tmpfs is actually stored in the RAM (and possibly SWAP), not on HDDs or SSDs. The RAM is built for extensive and fast I/O. Therefore the large number of file I/Os associated with configuring and building will not harm the HDDs or SSDs. Due to the volatile nature of RAM, files in the tmpfs file-system will be permanently lost after a power-off. Since all configured and built files are derivative files (not files that have been directly written by hand), there is no problem in this, and this feature can be considered as an automatic cleanup.

The modern GNU C library (and thus the Linux kernel) defines the ‘/dev/shm’ directory for this purpose in the RAM (POSIX shared memory). To build in it, you can use the GNU build system’s ability to build in a separate directory (not necessarily in the source directory) as shown below. Just set ‘SRCDIR’ as the address of Gnuastro’s top source directory (for example, the unpacked tarball).

$ mkdir /dev/shm/tmp-gnuastro-build
$ cd /dev/shm/tmp-gnuastro-build
$ SRCDIR/configure --srcdir=SRCDIR
$ make

Gnuastro comes with a script to simplify this process of configuring and building in a different directory (a “clean” build); for more, see *note Separate build and source directories::.

3.3.2 Separate build and source directories
-------------------------------------------

The simple steps of *note Quick start:: will mix the source and built files. This can cause inconvenience for developers or enthusiasts following the most recent work (see *note Version controlled source::). The current section is mainly focused on this latter group of Gnuastro users. If you just install Gnuastro on major releases (following *note Announcements::), you can safely ignore this section.

When it is necessary to keep the source (which is under version control), but not the derivative (built) files (after checking or installing), the best solution is to keep the source and the built files in separate directories. One application of this is already discussed in *note Configure and build in RAM::.

To facilitate this process of configuring and building in a separate directory, Gnuastro comes with the ‘developer-build’ script. It is available in the top source directory and is _not_ installed. It will make a directory under a given top-level directory (given to ‘--top-build-dir’) and build Gnuastro in that directory. It thus keeps the source completely separated from the built files. For easy access to the built files, it also makes a symbolic link called ‘build’ to the built directory, in the top source directory.

When run without any options, default values will be used for its configuration. As with Gnuastro’s programs, you can inspect the default values with ‘-P’ (or ‘--printparams’, the output just looks a little different here). The default top-level build directory is ‘/dev/shm’: the shared memory directory in RAM on GNU/Linux systems as described in *note Configure and build in RAM::.
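Before relying on ‘/dev/shm’, it may be worth confirming that it is indeed a RAM-based (tmpfs) file system and that it has enough free space for a build; a quick, illustrative check on most GNU/Linux systems is:

$ df -h /dev/shm

The first column of the output should show ‘tmpfs’, and the ‘Avail’ column shows how much space is currently free there.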
Besides these, it also has some features to facilitate the job of developers or bleeding-edge users, like the ‘--debug’ option to do a fast build with debug information, no optimization, and no shared libraries. Here is the full list of options you can feed to this script to configure its operations.

*Not all of Gnuastro’s common program behavior is usable here:* ‘developer-build’ is just a non-installed script with a very limited scope as described above. It thus does not have all the common option behaviors or configuration files, for example.

*White space between option and value:* ‘developer-build’ does not accept an ‘<=>’ sign between the options and their values. It also needs at least one character between the option and its value. Therefore ‘-n 4’ or ‘--numthreads 4’ are acceptable, while ‘-n4’, ‘-n=4’, or ‘--numthreads=4’ are not. Finally, multiple short option names cannot be merged: for example, you can say ‘-c -n 4’, but unlike Gnuastro’s programs, ‘-cn4’ is not acceptable.

*Reusable for other packages:* This script can be used in any software which is configured and built using the GNU Build System. Just copy it to the top source directory of that software and run it from there.

‘-b STR’
‘--top-build-dir STR’
The top build directory to make a directory for the build. If this option is not called, the top build directory is ‘/dev/shm’ (only available in GNU/Linux operating systems, see *note Configure and build in RAM::).

‘-V’
‘--version’
Print the version string of Gnuastro that will be used in the build. This string will be appended to the directory name containing the built files.

‘-a’
‘--autoreconf’
Run ‘autoreconf -f’ before building the package. In Gnuastro, this is necessary when a new commit has been made to the project history. In Gnuastro’s build system, the Git description will be used as the version, see *note Version numbering:: and *note Synchronizing::.

‘-c’
‘--clean’
Delete the contents of the build directory (clean it) before starting the configuration and building of this run. This is useful when you have recently pulled changes from the main Git repository, or committed a change yourself and ran ‘autoreconf -f’, see *note Synchronizing::. After running GNU Autoconf, the version will be updated and you need to do a clean build.

‘-d’
‘--debug’
Build with debugging flags (for example, to use in GNU Debugger, also known as GDB, or Valgrind), disable optimization and also the building of shared libraries. Similar to running the configure script as below:

$ ./configure --enable-debug

Besides all the debugging advantages of building with this option, it will also significantly speed up the build (at the cost of slower built programs). So when you are testing something small or working on the build system itself, it will be much faster to test your work with this option.

‘-v’
‘--valgrind’
Build all ‘make check’ tests within Valgrind. For more, see the description of ‘--enable-check-with-valgrind’ in *note Gnuastro configure options::.

‘-j INT’
‘--jobs INT’
The maximum number of threads/jobs for Make to build at any moment. As the name suggests (Make has an identical option), the number given to this option is directly passed on to any call of Make with its ‘-j’ option.

‘-C’
‘--check’
After finishing the build, also run ‘make check’. By default, ‘make check’ is not run because the developer usually has their own checks to work on (for example, defined in ‘tests/during-dev.sh’).

‘-i’
‘--install’
After finishing the build, also run ‘make install’.
‘-D’ ‘--dist’
     Run ‘make dist-lzip pdf’ to build a distribution tarball (in ‘.tar.lz’ format) and a PDF manual. This can be useful for archiving, or sending to colleagues who do not use Git for an easy build and manual.

‘-u STR’ ‘--upload STR’
     Activate the ‘--dist’ (‘-D’) option, then use secure copy (‘scp’, part of the SSH tools) to copy the tarball and PDF to the ‘src’ and ‘pdf’ sub-directories of the specified server and its directory (value to this option). For example, ‘--upload my-server:dir’ will copy the tarball into ‘dir/src’, and the PDF manual into ‘dir/pdf’, on the ‘my-server’ server. It will then make a symbolic link in the top server directory to the tarball, called ‘gnuastro-latest.tar.lz’.

‘-p’ ‘--publish’
     Short for ‘--autoreconf --clean --debug --check --upload STR’. ‘--debug’ is added because it will greatly speed up the build. It will have no effect on the produced tarball. This is good when you have made a commit and are ready to publish it on your server (if nothing crashes). Recall that if any of the previous steps fail, the script aborts.

‘-I’ ‘--install-archive’
     Short for ‘--autoreconf --clean --check --install --dist’. This is useful when you actually want to install the commit you just made (if the build and checks succeed). It will also produce a distribution tarball and PDF manual for easy access to the installed tarball on your system at a later time. Ideally, Gnuastro’s Git version history makes it easy for a prepared system to revert back to a different point in history. But Gnuastro also needs to be bootstrapped, and your collaborators might (and usually do!) find it too much of a burden to do the bootstrapping themselves. So it is convenient to have a tarball and PDF manual of the version you have installed (and are using in your research) handily available.

‘-h’ ‘--help’
‘-P’ ‘--printparams’
     Print a description of this script along with all the options and their current values.

3.3.3 Tests
-----------

After successfully building (compiling) the programs with the ‘$ make’ command, you can check them before installing. To run the tests, run

     $ make check

For every program some tests are designed to check some possible operations. Running the command above will run those tests and give you a final report. If everything is OK and you have built all the programs, all the tests should pass. In case any of the tests fail, please have a look at *note Known issues:: and if that still does not fix your problem, look at the ‘./tests/test-suite.log’ file to see if the source of the error is something particular to your system or more general. If you feel it is general, please contact us because it might be a bug. Note that the tests of some programs depend on the outputs of other programs’ tests, so if you have not built them they might be skipped or fail. Prior to releasing every distribution all these tests are checked. If you have a reasonably modern terminal, the outputs of the successful tests will be colored green and the failed ones will be colored red.

These scripts can also act as a good set of examples for you to see how the programs are run. All the tests are in the ‘tests/’ directory. The tests for each program are shell scripts (ending with ‘.sh’) in a sub-directory of this directory with the same name as the program. See *note Test scripts:: for more detailed information about these scripts in case you want to inspect them.
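For example, the following commands run the test suite and then open the log to inspect any failure (a small sketch; the ‘-j8’ value is only an assumption for a machine with 8 threads):

     $ make -j8 check                # Run the tests in parallel.
     $ less tests/test-suite.log     # Inspect the report of failed tests.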
3.3.4 A4 print book
-------------------

The default print version of this book is provided in the letter paper size. If you would like to have the print version of this book on paper and you are living in a country which uses A4, then you can rebuild the book. The great thing about the GNU build system is that the book source code, which is in Texinfo, is also distributed with the program source code, enabling you to do such customization (hacking).

In order to change the paper size, you will need to have GNU Texinfo installed. Open ‘doc/gnuastro.texi’ with any text editor. This is the source file that created this book. In the first few lines you will see this line:

     @c@afourpaper

In Texinfo, a line is commented with ‘@c’. Therefore, un-comment this line by deleting the first two characters such that it changes to:

     @afourpaper

Save the file and close it. You can now run the following command

     $ make pdf

and the new PDF book will be available in ‘SRCdir/doc/gnuastro.pdf’. By changing the ‘pdf’ in ‘$ make pdf’ to ‘ps’ or ‘dvi’ you can have the book in those formats. Note that you can do this for any book that is in Texinfo format; it might not have the ‘@afourpaper’ line, so you can add it close to the top of the Texinfo source file.

3.3.5 Known issues
------------------

Depending on your operating system and the version of the compiler you are using, you might confront some known problems during the configuration (‘$ ./configure’), compilation (‘$ make’) and tests (‘$ make check’). Here, their solutions are discussed.

   • ‘$ ./configure’: _Configure complains about not finding a library even though you have installed it._ The possible solution is based on how you installed the package:

        • From your distribution’s package manager. Most probably this is because your distribution has separated the header files of a library from the library parts. Please also install the ‘development’ packages for those libraries. Just add a ‘-dev’ or ‘-devel’ to the end of the package name and re-run the package manager. This will not happen if you install the libraries from source: when installed from source, the headers are also installed.

        • From source. Then your linker is not looking where you installed the library. If you followed the instructions in this chapter, all the libraries will be installed in ‘/usr/local/lib’. So you have to tell your linker to look in this directory. To do so, configure Gnuastro like this:

               $ ./configure LDFLAGS="-L/usr/local/lib"

          If you want to use the libraries for your other programming projects, then export this environment variable in a start-up script similar to the case for ‘LD_LIBRARY_PATH’ explained below, also see *note Installation directory::.

   • ‘$ make’: _Complains about an unknown function on a non-GNU based operating system._ In this case, please run ‘$ ./configure’ with the ‘--enable-gnulibcheck’ option to see if the problem is from the GNU Portability Library (Gnulib) not supporting your system or if there is a problem in Gnuastro, see *note Gnuastro configure options::. If the problem is not in Gnulib and after all its tests you get the same complaint from ‘make’, then please contact us at ‘bug-gnuastro@gnu.org’. The cause is probably that a function we have used is not supported by your operating system and we did not include it along with the source tarball. If the function is available in Gnulib, it can be fixed immediately.
   • ‘$ make’: _Cannot find the headers (.h files) of installed libraries._ Your C pre-processor (CPP) is not looking in the right place. To fix this, configure Gnuastro with an additional ‘CPPFLAGS’ like below (assuming the library is installed in ‘/usr/local/include’):

          $ ./configure CPPFLAGS="-I/usr/local/include"

     If you want to use the libraries for your other programming projects, then export this environment variable in a start-up script similar to the case for ‘LD_LIBRARY_PATH’ explained below, also see *note Installation directory::.

   • ‘$ make check’: _Only the first couple of tests pass, all the rest fail or get skipped._ It is highly likely that when searching for shared libraries, your system does not look into the ‘/usr/local/lib’ directory (or wherever you installed Gnuastro or its dependencies). To make sure it is added to the list of directories, add the following line to your ‘~/.bashrc’ file and restart your terminal. Do not forget to change ‘/usr/local/lib’ if the libraries are installed in other (non-standard) directories.

          export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"

     You can also add more directories by using a colon ‘:’ to separate them. See *note Installation directory:: and *note Linking:: to learn more on the ‘PATH’ variables and dynamic linking respectively.

   • ‘$ make check’: _The tests relying on external programs (for example, ‘fitstopdf.sh’) fail._ This is probably because the version number of the external programs is too old for the tests we have performed. Please update the program to a more recent version. For example, to create a PDF image, you will need GPL Ghostscript; older versions do not work, we have successfully tested it on version 9.15. Older versions might cause a failure in the test result.

   • ‘$ make pdf’: _The PDF book cannot be made._ To make a PDF book, you need to have the GNU Texinfo program (like any program, the more recent the better). A working TeX program is also necessary, which you can get from TeX Live(1).

   • After ‘make check’: do not copy the programs’ executables to another (for example, the installation) directory manually (using ‘cp’ or ‘mv’ for example). In the default configuration(2), the program binaries need to link with Gnuastro’s shared library which is also built and installed with the programs. Therefore, to run successfully before and after installation, linking modifications need to be made by GNU Libtool at installation time. ‘make install’ does this internally, but a simple copy might give linking errors when you run it. If you need to copy the executables, you can do so after installation.

   • ‘$ make’ (when bootstrapping): After you have bootstrapped Gnuastro from the version-controlled source, you may confront the following (or a similar) error when converting images (for more on bootstrapping, see *note Bootstrapping::):

          convert: attempt to perform an operation not allowed by the security policy `gs' error/delegate.c/ExternalDelegateCommand/378.

     This error is a known issue(3) with ‘ImageMagick’ security policies in some operating systems. In short, ‘ImageMagick’ uses Ghostscript for PDF, EPS, PS and XPS parsing. However, because some security vulnerabilities have been found in Ghostscript(4), by default, ImageMagick may be compiled without the Ghostscript library. In such cases, if allowed, ImageMagick will fall back to the external ‘gs’ command instead of the library.
But this fall-back may be disabled with lines related to PDF, PS, or Ghostscript in ‘/etc/ImageMagick-7/policy.xml’. To fix this problem, simply comment out such lines (by placing a ‘<!--’ before and a ‘-->’ at the end of that statement/line). If your problem was not listed above, please file a bug report (*note Report a bug::).

---------- Footnotes ----------

(1)
(2) If you configure Gnuastro with the ‘--disable-shared’ option, then the libraries will be statically linked to the programs and this problem will not exist, see *note Linking::.
(3)
(4)

4 Common program behavior
*************************

All the programs in Gnuastro share a set of common behavior, mainly to do with user interaction, to facilitate their usage and development. This includes how to feed input datasets into the programs, how to configure them, specifying the outputs, numerical data types, treating columns of information in tables, etc. This chapter is devoted to describing this common behavior in all programs. Because the behaviors discussed here are common to several programs, they are not repeated in each program’s description.

In *note Command-line::, a very general description of running the programs on the command-line is discussed, like the difference between arguments and options, as well as options that are common/shared between all programs. None of Gnuastro’s programs keep any internal configuration value (values for their different operational steps); they read their configuration primarily from the command-line, then from specific files in directory, user, or system-wide settings. Using these configuration files can greatly help reproducible and robust usage of Gnuastro, see *note Configuration files:: for more.

It is not possible to always have the different options and configurations of each program on the top of your head. It is very natural to forget the options of a program, their current default values, or how it should be run and what it did. Gnuastro’s programs have multiple ways to help you refresh your memory at multiple levels (just an option name, a short description, or fast access to the relevant section of the manual). See *note Getting help:: for more on benefiting from this very convenient feature.

Many of the programs use the multi-threaded character of modern CPUs; in *note Multi-threaded operations:: we will discuss how you can configure this behavior, along with some tips on making best use of them. In *note Numeric data types::, we will review the various types to store numbers in your datasets: setting the proper type for the usage context(1) can greatly improve the file size and also the speed of reading, writing or processing them. We will then look into the recognized table formats in *note Tables:: and how large datasets are broken into tiles, or a mesh grid, in *note Tessellation::. Finally, we will take a look at the behavior regarding output files: *note Automatic output:: describes how the programs set a default name for their output when you do not give one explicitly (using ‘--output’). When the output is a FITS file, all the programs also store some very useful information in the header that is discussed in *note Output FITS files::.

---------- Footnotes ----------

(1) For example, if the values in your dataset can only be integers between 0 and 65000, store them in an unsigned 16-bit type, not a 64-bit floating point type (which is the default in most systems). It takes four times less space and is much faster to process.
4.1 Command-line
================

Gnuastro’s programs are customized through the standard Unix-like command-line environment and GNU style command-line options. Both are very common in many Unix-like operating system programs. In *note Arguments and options:: we will start with the difference between arguments and options and elaborate on the GNU style of options. Afterwards, in *note Common options::, we will go into the detailed list of all the options that are common to all the programs in Gnuastro.

4.1.1 Arguments and options
---------------------------

When you type a command on the command-line, it is passed onto the shell (a generic name for the program that manages the command-line) as a string of characters. As an example, see the “Invoking ProgramName” sections in this manual for some examples of commands with each program, like *note Invoking asttable::, *note Invoking astfits::, or *note Invoking aststatistics::. The shell then breaks up your string into separate _tokens_ or _words_ using any _metacharacters_ (like white-space, tab, ‘|’, ‘>’ or ‘;’) that are in the string.

On the command-line, the first thing you usually enter is the name of the program you want to run. After that, you can specify two types of tokens: _arguments_ and _options_. In the GNU-style, arguments are those tokens that are not preceded by any hyphens (‘-’, see *note Arguments::). Here is one example:

     $ astcrop --center=53.162551,-27.789676 -w10/3600 --mode=wcs udf.fits

In the example above, we are running *note Crop:: to crop a region of width 10 arc-seconds centered at the given RA and Dec from the input Hubble Ultra-Deep Field (UDF) FITS image. Here, the argument is ‘udf.fits’. Arguments are most commonly the input file names containing your data. Options start with one or two hyphens, followed by an identifier for the option (the option’s name, for example, ‘--center’, ‘-w’, ‘--mode’ in the example above) and its value (anything after the option name, or the optional <=> character). Through options you can configure how the program runs (interprets the data you provided).

Arguments can be mandatory or optional and, unlike options, they do not have any identifiers. Hence, when there are multiple arguments, their order might also matter (for example, in ‘cp’ which is used for copying one file to another location). The outputs of ‘--usage’ and ‘--help’ show which arguments are optional and which are mandatory, see *note --usage::. As their name suggests, _options_ can be considered to be optional and most of the time, you do not have to worry about what order you specify them in. When the order does matter, or the option can be invoked multiple times, it is explicitly mentioned in the “Invoking ProgramName” section of each program (this is a very important aspect of an option).

Sometimes an argument or option value contains one of the shell’s metacharacters mentioned above. If there is only one such character, you can use a backslash (‘\’) before it. If there are multiple, it might be easier to simply put your whole argument or option value inside of double quotes (‘"’). In such cases, everything inside the double quotes will be seen as one token or word. For example, let’s say you want to specify the header data unit (HDU) of your FITS file using a complex expression like ‘‘3; images(exposure > 100)’’. If you simply add these after the ‘--hdu’ (‘-h’) option, the programs in Gnuastro will read the value to the HDU option as ‘‘3’’ and run. Then, the shell will attempt to run a separate command ‘‘images(exposure > 100)’’ and complain about a syntax error.
This is because the semicolon (‘;’) is an ‘end of command’ character in the shell. To solve this problem you can simply put double quotes around the whole string you want to pass to ‘--hdu’ as seen below:

     $ astcrop --hdu="3; images(exposure > 100)" image.fits

4.1.1.1 Arguments
.................

In Gnuastro, arguments are almost exclusively used as the input data file names. Please consult the first few paragraphs of the “Invoking ProgramName” section for each program for a description of what it expects as input, how many arguments, or input data, it accepts, or in what order. Everything particular about how a program treats arguments is explained under the “Invoking ProgramName” section for that program. Generally, if there is a standard file name suffix for a particular format, that suffix is checked to identify the file’s format.

In astronomy (and thus Gnuastro), FITS is the preferred format for inputs and outputs, so the focus here and throughout this book is on FITS. However, other formats are also accepted in special cases, for example, *note ConvertType:: also accepts JPEG or TIFF inputs, and writes JPEG, EPS or PDF files. The recognized suffixes for these formats are listed there.

The list below shows the recognized suffixes for FITS data files in Gnuastro’s programs. However, in some scenarios FITS writers may not append a suffix to the file, or may use a non-recognized suffix (not in the list below). Therefore if a FITS file is expected, but it does not have any of these suffixes, Gnuastro programs will look into the contents of the file and if it does conform with the FITS standard, the file will be used. Just note that checking about 5 characters at the end of a name string is much more efficient than opening and checking the contents of a file, so it is generally recommended to have a recognized FITS suffix.

   • ‘.fits’: The standard file name ending of a FITS image.
   • ‘.fit’: Alternative (3 character) FITS suffix.
   • ‘.fits.Z’: A FITS image compressed with ‘compress’.
   • ‘.fits.gz’: A FITS image compressed with GNU zip (gzip).
   • ‘.fits.fz’: A FITS image compressed with ‘fpack’.
   • ‘.imh’: IRAF format image file.

Throughout this book and in the command-line outputs, whenever we want to generalize all such astronomical data formats in a text place-holder, we will use ‘ASTRdata’; we will assume that the extension is also part of this name.

Any file ending with these names is directly passed on to CFITSIO to read. Therefore you do not necessarily have to have these files on your computer, they can also be located on an FTP or HTTP server, see the CFITSIO manual for more information. CFITSIO has its own error reporting techniques; if your input file(s) cannot be opened, or read, those errors will be printed prior to the final error by Gnuastro.

4.1.1.2 Options
...............

Command-line options allow configuring the behavior of a program in all GNU/Linux applications for each particular execution on particular input data. A single option can be called in two ways: _long_ or _short_. All options in Gnuastro accept the long format which has two hyphens and can have many characters (for example, ‘--hdu’). Short options only have one hyphen (<->) followed by one character (for example, ‘-h’). You can see some examples in the list of options in *note Common options:: or those for each program’s “Invoking ProgramName” section. Both formats are shown for those which support both: first the short is shown, then the long.
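For example, the following two calls are identical (a small illustration, assuming a FITS file named ‘image.fits’ is present); the first uses the short format of the ‘--hdu’ option and the second the long format:

     $ astfits image.fits -h1
     $ astfits image.fits --hdu=1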
Usually, the short options are for when you are writing on the command-line and want to save keystrokes and time. The long options are good for shell scripts, where you are not usually rushing. Long options provide a level of documentation, since they are more descriptive and less cryptic. Usually after a few months of not running a program, the short options will be forgotten and reading your previously written script will not be easy.

Some options need to be given a value if they are called and some do not. You can think of the latter type of options as on/off options. These two types of options can be distinguished using the output of the ‘--help’ and ‘--usage’ options, which are common to all GNU software, see *note Getting help::. In Gnuastro we use the following strings to specify when the option needs a value and what format that value should be in. More specific tests will be done in the program and if the values are out of range (for example, negative when the program only wants a positive value), an error will be reported.

‘INT’
     The value is read as an integer.

‘FLT’
     The value is read as a float. There are generally two types, depending on the context. If they are for fractions, they will have to be less than or equal to unity.

‘STR’
     The value is read as a string of characters. For example, column names in a table, or HDU names in a multi-extension FITS file. Other examples include human-readable settings by some programs like the ‘--domain’ option of the Convolve program that can be either ‘spatial’ or ‘frequency’ (to specify the type of convolution, see *note Convolve::).

‘FITS or FITS/TXT’
     The value should be a file (most commonly FITS). In many cases, other formats may also be accepted (for example, input tables can be FITS or plain-text, see *note Recognized table formats::).

To specify a value in the short format, simply put the value after the option. Note that since the short options are only one character long, you do not have to type anything between the option and its value. For the long option you either need white space or an ‘=’ sign; for example, ‘-h2’, ‘-h 2’, ‘--hdu 2’ or ‘--hdu=2’ are all equivalent. The short format of on/off options (those that do not need values) can be concatenated; for example, these two hypothetical sequences of options are equivalent: ‘-a -b -c4’ and ‘-abc4’. As an example, consider the following command to run Crop:

     $ astcrop -Dr3 --width 3 catalog.txt --deccol=4 ASTRdata

The ‘$’ is the shell prompt, ‘astcrop’ is the program name. There are two arguments (‘catalog.txt’ and ‘ASTRdata’) and four options, two of them given in short format (‘-D’, ‘-r’) and two in long format (‘--width’ and ‘--deccol’). Three of them require a value and one (‘-D’) is an on/off option.

If an abbreviation is unique between all the options of a program, the long option names can be abbreviated. For example, instead of typing ‘--printparams’, typing ‘--print’ or maybe even ‘--pri’ will be enough; if there are conflicts, the program will warn you and show you the alternatives. Finally, if you want the argument parser to stop parsing arguments beyond a certain point, you can use two dashes: ‘--’. No text on the command-line beyond these two dashes will be parsed.

Gnuastro has two types of options with values: those that only take a single value are the most common type. If these options are repeated or called more than once on the command-line, the value of the last time it was called will be assigned to it. This is very useful when you are testing/experimenting.
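For example (a small sketch, assuming ‘image.fits’ exists), in the command below the ‘--hdu’ option is given twice, so only the last value (‘2’) is used and the second HDU is read:

     $ astfits image.fits --hdu=1 --hdu=2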
Let’s say you want to make a small modification to one option value. You can simply type the option with a new value at the end of the command and see how the script works. If you are satisfied with the change, you can remove the original option for human readability. If the change was not satisfactory, you can remove the one you just added and not worry about forgetting the original value. Without this capability, you would have to memorize or save the original value somewhere else, run the command and then change the value again, which is not at all convenient and can potentially cause lots of bugs.

On the other hand, some options can be called multiple times in one run of a program and can thus take multiple values (for example, see the ‘--column’ option in *note Invoking asttable::). In these cases, the order of stored values is the same order that you specified on the command-line.

Gnuastro’s programs do not keep any internal default values, so some options are mandatory and if they do not have a value, the program will complain and abort. Most programs have many such options and typing them by hand on every call is impractical. To facilitate the user experience, after parsing the command-line, Gnuastro’s programs read special configuration files to get the necessary values for the options you have not identified on the command-line. These configuration files are fully described in *note Configuration files::.

*CAUTION:* In specifying a file address, if you want to use the shell’s tilde expansion (‘~’) to specify your home directory, leave at least one space between the option name and your value. For example, use ‘-o ~/test’, ‘--output ~/test’ or ‘--output= ~/test’. Calling them with ‘-o~/test’ or ‘--output=~/test’ will disable shell expansion.

*CAUTION:* If you forget to specify a value for an option which requires one, and that option is the last one, Gnuastro will warn you. But if it is in the middle of the command, it will take the text of the next option or argument as the value which can cause undefined behavior.

*NOTE:* In some contexts Gnuastro’s counting starts from 0 and in others 1. You can assume by default that counting starts from 1, if it starts from 0 for a special option, it will be explicitly mentioned.

4.1.2 Common options
--------------------

To facilitate the job of the users and developers, all the programs in Gnuastro share some basic command-line options that are common to many of the programs. The full list is classified as *note Input output options::, *note Processing options::, and *note Operating mode options::. In some programs, some of the options are irrelevant, but still recognized (you will not get an unrecognized option error, but the value is not used). Unless otherwise mentioned, these options are identical between all programs.

4.1.2.1 Input/Output options
............................

These options are to do with the input and outputs of the various programs.

‘--stdintimeout’
     Number of micro-seconds to wait for writing/typing in the _first line_ of standard input from the command-line (see *note Standard input::). This is only relevant for programs that also accept input from the standard input, _and_ you want to manually write/type the contents on the terminal. When the standard input is already connected to a pipe (output of another program), there will not be any waiting (hence no timeout, thus making this option redundant).
     If the first line-break (for example, with the <ENTER> key) is not provided before the timeout, the program will abort with an error that no input was given. Note that this time interval is _only_ for the first line that you type. Once the first line is given, the program will assume that more data will come and accept the rest of your inputs without any time limit. You need to specify the ending of the standard input, for example, by pressing <CTRL-D> after a new line.

     Note that any input you write/type into a program on the command-line with Standard input will be discarded (lost) once the program is finished. It is only recoverable manually from your command-line (where you actually typed) as long as the terminal is open. So only use this feature when you are sure that you do not need the dataset (or have a copy of it somewhere else).

‘-h STR/INT’
‘--hdu=STR/INT’
     The name or number of the desired Header Data Unit, or HDU, in the FITS image. A FITS file can store multiple HDUs or extensions, each with either an image or a table or nothing at all (only a header). Note that counting of the extensions starts from 0 (zero), not 1 (one). Counting from 0 is forced on us by CFITSIO which directly reads the value you give with this option (see *note CFITSIO::). When specifying the name, case is not important so ‘IMAGE’, ‘image’ or ‘ImAgE’ are equivalent.

     CFITSIO has many capabilities to help you find the extension you want, far beyond the simple extension number and name. See CFITSIO manual’s “HDU Location Specification” section for a very complete explanation with several examples. A ‘#’ is appended to the string you specify for the HDU(1) and the result is put in square brackets and appended to the FITS file name before calling CFITSIO to read the contents of the HDU for all the programs in Gnuastro.

‘-s STR’
‘--searchin=STR’
     Where to match/search for columns when the column identifier was not a number, see *note Selecting table columns::. The acceptable values are ‘name’, ‘unit’, or ‘comment’. This option is only relevant for programs that take table columns as input.

‘-I’
‘--ignorecase’
     Ignore case while matching/searching column meta-data (in the field specified by ‘--searchin’). The FITS standard suggests treating the column names as case insensitive, which is strongly recommended here also but is not enforced. This option is only relevant for programs that take table columns as input.

     This option is not relevant to *note BuildProgram::, hence in that program the short option ‘-I’ is used for include directories, not to ignore case.

‘-o STR’
‘--output=STR’
     The name of the output file or directory. With this option the automatic output names explained in *note Automatic output:: are ignored.

‘-T STR’
‘--type=STR’
     The data type of the output depending on the program context. This option is not applicable to some programs like *note Fits:: and will be ignored by them. The different acceptable values to this option are fully described in *note Numeric data types::.

‘-D’
‘--dontdelete’
     By default, if the output file already exists, Gnuastro’s programs will silently delete it and put their own outputs in its place. When this option is activated, if the output file already exists, the programs will not delete it, will warn you, and will abort.

‘-K’
‘--keepinputdir’
     In automatic output names, do not remove the directory information of the input file names.
     As explained in *note Automatic output::, if no output name is specified (with ‘--output’), then the output name will be made in the existing directory based on your input’s file name (ignoring the directory of the input). If you call this option, the directory information of the input will be kept and the automatically generated output name will be in the same directory as the input (usually with a suffix added). Note that this is only relevant if you are running the program in a different directory than the input data.

‘-t STR’
‘--tableformat=STR’
     The output table’s type. This option is only relevant when the output is a table and its format cannot be deduced from its filename. For example, if a name ending in ‘.fits’ was given to ‘--output’, then the program knows you want a FITS table. But there are two types of FITS tables: FITS ASCII, and FITS binary. Thus, with this option, the program is able to identify which type you want. The currently recognized values to this option are:

     ‘txt’
          A plain text table with white-space characters between the columns (see *note Gnuastro text table format::).

     ‘fits-ascii’
          A FITS ASCII table (see *note Recognized table formats::).

     ‘fits-binary’
          A FITS binary table (see *note Recognized table formats::).

‘--wcslinearmatrix=STR’
     Select the linear transformation matrix of the output’s WCS. This option only takes two values: ‘pc’ (for the ‘PCi_j’ formalism) and ‘cd’ (for ‘CDi_j’). For more on the different formalisms, please see Section 8.1 of the FITS standard(2), version 4.0.

     In short, in the ‘PCi_j’ formalism, we only keep the linear rotation matrix in these keywords and put the scaling factor (or the pixel scale in astronomical imaging) in the ‘CDELTi’ keywords. In the ‘CDi_j’ formalism, we blend the scaling and the rotation into a single matrix and keep that matrix in these FITS keywords. By default, Gnuastro uses the ‘PCi_j’ formalism, because it greatly helps in human readability of the raw keywords and is also the default mode of WCSLIB. However, in some circumstances it may be necessary to have the keywords in the CD format; for example, when you need to feed the outputs into other software that do not follow the full FITS standard and only recognize the ‘CDi_j’ formalism.

---------- Footnotes ----------

(1) With the ‘#’ character, CFITSIO will only read the desired HDU into your memory, not all the existing HDUs in the fits file.
(2)

4.1.2.2 Processing options
..........................

Some processing steps are common to several programs, so they are defined as common options to all programs. Note that this class of common options is thus necessarily less common between all the programs than those described in *note Input output options::, or *note Operating mode options::. Also, if they are irrelevant for a program, these options will not be displayed in the ‘--help’ output of the program.

‘--minmapsize=INT’
     The minimum size (in bytes) to memory-map a processing/internal array as a file (on the non-volatile HDD/SSD), and not use the system’s RAM. Before using this option, please read *note Memory management::. By default processing arrays will only be memory-mapped to a file when the RAM is full. With this option, you can force the memory-mapping, even when there is enough RAM. To ensure this default behavior, the pre-defined value to this option is an extremely large value (larger than any existing RAM).
     Please note that using a non-volatile file (on the HDD/SSD) instead of RAM can significantly increase the program’s running time, especially on HDDs (where read/write is slower). Also, note that the number of memory-mapped files that your kernel can support is limited. So when this option is necessary, it is best to give it values larger than 1 megabyte (‘--minmapsize=1000000’). You can then decrease it for a specific program’s invocation on a large input after you see memory issues arise (for example, an error, or the program not aborting and fully consuming your memory). If you see randomly named files remaining in this directory when the program finishes normally, please send us a bug report so we address the problem, see *note Report a bug::.

     *Limited number of memory-mapped files:* The operating system kernels usually support a limited number of memory-mapped files. Therefore never set ‘--minmapsize’ to zero or a small number of bytes (so too many files are created). If the kernel capacity is exceeded, the program will crash.

‘--quietmmap’
     Do not print any message when an array is stored in non-volatile memory (HDD/SSD) and not RAM, see the description of ‘--minmapsize’ (above) for more.

‘-Z INT[,INT[,...]]’
‘--tilesize=INT[,INT[,...]]’
     The size of regular tiles for tessellation, see *note Tessellation::. For each dimension an integer length (in units of data-elements or pixels) is necessary. If the number of input dimensions is different from the number of values given to this option, the program will stop with an error. Values must be separated by commas (<,>) and can also be fractions (for example, ‘4/2’). If they are fractions, the result must be an integer, otherwise an error will be printed.

‘-M INT[,INT[,...]]’
‘--numchannels=INT[,INT[,...]]’
     The number of channels for the larger input tessellation, see *note Tessellation::. The number and types of acceptable values are similar to ‘--tilesize’. The only difference is that instead of length, the integer values given to this option represent the _number_ of channels, not their size.

‘-F FLT’
‘--remainderfrac=FLT’
     The fraction of remainder size along all dimensions to add to the first tile. See *note Tessellation:: for a complete description. This option is only relevant if ‘--tilesize’ is not exactly divisible by the input dataset’s size in a dimension. If the remainder size is larger than this fraction (compared to ‘--tilesize’), then the remainder size will be added with one regular tile size and divided between two tiles at the start and end of the given dimension.

‘--workoverch’
     Ignore the channel borders for the high-level job of the given application. As a result, while the channel borders are respected in defining the small tiles (such that no tile will cross a channel border), the higher-level program operation will ignore them, see *note Tessellation::.

‘--checktiles’
     Make a FITS file with the same dimensions as the input but each pixel is replaced with the ID of the tile that it is associated with. Note that the tile IDs start from 0. See *note Tessellation:: for more on tiling an image in Gnuastro.

‘--oneelempertile’
     When showing the tile values (for example, with ‘--checktiles’, or when the program’s output is tessellated) only use one element for each tile. This can be useful when only the relative values given to each tile compared to the rest are important or need to be checked. Since the tiles usually have a large number of pixels within them, the output will be much smaller, and so easier to read, write, store, or send.
     Note that when the full input size in any dimension is not exactly divisible by the given ‘--tilesize’ in that dimension, the edge tile(s) will have different sizes (in units of the input’s size), see ‘--remainderfrac’. But with this option, all displayed values are going to have the (same) size of one data-element. Hence, in such cases, the image proportions are going to be slightly different with this option.

     If your input image is not exactly divisible by the tile size and you want one value per tile for some higher-level processing, all is not lost though. You can see how many pixels were within each tile (for example, to weight the values or discard some for later processing) with Gnuastro’s Statistics (see *note Statistics::) as shown below. The output FITS file is going to have two extensions, one with the median calculated on each tile and one with the number of elements that each tile covers. You can then use the ‘where’ operator in *note Arithmetic:: to set the values of all tiles that do not have the regular area to a blank value.

          $ aststatistics --median --number --ontile input.fits \
                          --oneelempertile --output=o.fits
          $ REGULAR_AREA=1600    # Check second extension of `o.fits'.
          $ astarithmetic o.fits o.fits $REGULAR_AREA ne nan where \
                          -h1 -h2

     Note that if ‘input.fits’ also has blank values, then the median on tiles with blank values will also be ignored with the command above (which is desirable).

‘--interponlyblank’
     When values are to be interpolated, only change the values of the blank elements, keep the non-blank elements untouched.

‘--interpmetric=STR’
     The metric to use for finding nearest neighbors. Currently it only accepts the Manhattan (or taxicab) metric with ‘manhattan’, or the radial metric with ‘radial’. The Manhattan distance between two points is defined with $|\Delta{x}|+|\Delta{y}|$. Thus the Manhattan metric has the advantage of being fast, but at the expense of being less accurate. The radial distance is the standard definition of distance in a Euclidean space: $\sqrt{\Delta{x}^2+\Delta{y}^2}$. It is accurate, but the multiplication and square root can slow down the processing.

‘--interpnumngb=INT’
     The number of nearby non-blank neighbors to use for interpolation.

4.1.2.3 Operating mode options
..............................

Another group of options that are common to all the programs in Gnuastro are those to do with the general operation of the programs. The explanation for those that are not only limited to Gnuastro but are common to all GNU programs starts with (GNU option).

‘--’
     (GNU option) Stop parsing the command-line. This option can be useful in scripts or when using the shell history. Suppose you have a long list of options, and want to see if removing some of them (to read from configuration files, see *note Configuration files::) can give a better result. If the ones you want to remove are the last ones on the command-line, you do not have to delete them, you can just add ‘--’ before them and if you do not get what you want, you can remove the ‘--’ and get the same initial result.

‘--usage’
     (GNU option) Only print the options and arguments and abort. This is very useful for when you know what the options do, and have just forgotten their long/short identifiers, see *note --usage::.

‘-?’
‘--help’
     (GNU option) Print all options with an explanation and abort.
     Adding this option will print all the options in their short and long formats, also displaying which ones need a value if they are called (with an ‘=’ after the long format followed by a string specifying the format, see *note Options::). A short explanation is also given for what the option is for. The program will quit immediately after the message is printed and will not do any form of processing, see *note --help::.

‘-V’
‘--version’
     (GNU option) Print a short message, showing the full name, version, copyright information and program authors and abort. On the first line, it will print the official name (not executable name) and version number of the program. Following this is a blank line and the copyright information. The program will not run.

‘-q’
‘--quiet’
     Do not report steps. All the programs in Gnuastro that have multiple major steps will report their steps for you to follow while they are operating. If you do not want to see these reports, you can call this option and only error/warning messages will be printed. If the steps are done very fast (depending on the properties of your input) disabling these reports will also decrease running time.

‘--cite’
     Print all necessary information to cite and acknowledge Gnuastro in your published papers. With this option, the programs will print the BibTeX entry to include in your paper for Gnuastro in general, and the particular program’s paper (if that program comes with a separate paper). It will also print the necessary acknowledgment statement to add in the respective section of your paper and it will abort. For a more complete explanation, please see *note Acknowledgments::.

     Citations and acknowledgments are vital for the continued work on Gnuastro. Gnuastro started, and is continued, based on separate research projects. So if you find any of the tools offered in Gnuastro to be useful in your research, please use the output of this command to cite and acknowledge the program (and Gnuastro) in your research paper. Thank you.

     Gnuastro is still new; there is no separate paper devoted only to Gnuastro yet. Therefore currently the paper to cite for Gnuastro is the paper for NoiseChisel which is the first published paper introducing Gnuastro to the astronomical community. Upon reaching a certain point, a paper completely devoted to describing Gnuastro’s many functionalities will be published, see *note GNU Astronomy Utilities 1.0::.

‘-P’
‘--printparams’
     With this option, Gnuastro’s programs will read your command-line options and all the configuration files. If there is no problem (like a missing parameter or a value in the wrong format or range), then immediately before actually running, the programs will print the full list of option names, values and descriptions, sorted and grouped by context, and abort. They will also report the version number, the date they were configured on your system and the time they were reported.

     As an example, you can give your full command-line options and even the input and output file names and finally just add ‘-P’ to check if all the parameters are properly set. If everything is OK, you can just run the same command (easily retrieved from the shell history, with the up arrow key) and simply remove the last two characters that showed this option. No program will actually start its processing when this option is called. The otherwise mandatory arguments for each program (for example, input image or catalog files) are no longer required when you call this option.
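     As a small sketch of this workflow (using the hypothetical ‘udf.fits’ input of *note Arguments and options::), appending ‘-P’ to a full Crop command will only report the final parameter values without doing any cropping; removing the trailing ‘-P’ then runs it for real:

          $ astcrop --center=53.162551,-27.789676 --width=10/3600 \
                    --mode=wcs udf.fits -P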
‘--config=STR’
     Parse ‘STR’ as a configuration file name, immediately when this option is confronted (see *note Configuration files::). The ‘--config’ option can be called multiple times in one run of any Gnuastro program on the command-line or in the configuration files. In any case, it will be immediately read (before parsing the rest of the options on the command-line, or lines in a configuration file).

     If the given file does not exist or cannot be read for any reason, the program will print a warning and continue its processing. The warning can be suppressed with ‘--quiet’. Note that by definition, options on the command-line still take precedence over those in any configuration file, including the file(s) given to this option if they are called before it. Also see ‘--lastconfig’ and ‘--onlyversion’ on how this option can be used for reproducible results. You can use ‘--checkconfig’ (below) to check/confirm the parsing of configuration files.

‘--checkconfig’
     Print options and their values, within the command-line or configuration files, as they are parsed (see *note Configuration file precedence::). If an option has already been set, or is ignored by the program, this option will also inform you with special values like ‘--ALREADY-SET--’. Only options that are parsed after this option are printed, so to see the parsing of all input options, it is recommended to put this option immediately after the program name before any other options. This is a very good option to confirm where the value of each option has been defined in scenarios where there are multiple configuration files (for debugging).

‘-S’
‘--setdirconf’
     Update the current directory configuration file for the Gnuastro program and quit. The full set of command-line and configuration file options will be parsed and options with a value will be written in the current directory configuration file for this program (see *note Configuration files::). If the configuration file or its directory does not exist, it will be created. If a configuration file exists it will be replaced (after it, and all other configuration files, have been read). In any case, the program will not run.

     This is the recommended method(1) to edit/set the configuration file for all future calls to Gnuastro’s programs. It will internally check if your values are in the correct range and type and save them according to the configuration file format, see *note Configuration file format::. So if there are unreasonable values to some options, the program will notify you and abort before writing the final configuration file. When this option is called, the otherwise mandatory arguments, for example input image or catalog file(s), are no longer mandatory (since the program will not run).

‘-U’
‘--setusrconf’
     Update the user configuration file and quit (see *note Configuration files::). See explanation under ‘--setdirconf’ for more details.

‘--lastconfig’
     This is the last configuration file that must be read. When this option is confronted in any stage of reading the options (on the command-line or in a configuration file), no other configuration file will be parsed, see *note Configuration file precedence:: and *note Current directory and User wide::. Like all on/off options, on the command-line, this option does not take any values. But in a configuration file, it takes the values of ‘0’ or ‘1’, see *note Configuration file format::. If it is present in a configuration file with a value of ‘0’, then all later occurrences of this option will be ignored.
‘--onlyversion=STR’
     Only run the program if Gnuastro’s version is exactly equal to ‘STR’ (see *note Version numbering::). Note that it is not compared as a number, but as a string of characters, so ‘0’, ‘0.0’ and ‘0.00’ are different. If the running Gnuastro version is different, then this option will report an error and abort as soon as it is confronted on the command-line or in a configuration file. If the running Gnuastro version is the same as ‘STR’, then the program will run as if this option was not called.

     This is useful if you want your results to be exactly reproducible and not mistakenly run with an updated/newer or older version of the program. Besides internal algorithmic/behavior changes in programs, the existence of options or their names might change between versions (especially in these earlier versions of Gnuastro). Hence, when using this option (probably in a script or in a configuration file), be sure to call it before other options. The benefit is that, when the version differs, the other options will not be parsed and you, or your collaborators/users, will not get errors saying an option in your configuration does not exist in the running version of the program.

     Here is one example of how this option can be used in conjunction with the ‘--lastconfig’ option. Let’s assume that you were satisfied with the results of this command: ‘astnoisechisel image.fits --snquant=0.95’ (along with various options set in various configuration files). You can save the state of NoiseChisel and reproduce that exact result on ‘image.fits’ later by following these steps (the extra spaces, and <\>, are only for easy readability; if you want to try it out, only one space between each token is enough).

          $ echo "onlyversion X.XX" > reproducible.conf
          $ echo "lastconfig 1" >> reproducible.conf
          $ astnoisechisel image.fits --snquant=0.95 -P \
                           >> reproducible.conf

     ‘--onlyversion’ was available from Gnuastro 0.0, so putting it immediately at the start of a configuration file will ensure that later, you (or others using a different version) will not get a non-recognized option error in case an option was added/removed. ‘--lastconfig’ will inform the installed NoiseChisel to not parse any other configuration files. This is done because we do not want the user’s user-wide or system-wide option values affecting our results. Finally, with the third command, which has a ‘-P’ (short for ‘--printparams’), NoiseChisel will print all the option values visible to it (in all the configuration files) and the shell will append them to ‘reproducible.conf’. Hence, you do not have to worry about remembering the (possibly) different options in the different configuration files.

     Afterwards, if you run NoiseChisel as shown below (telling it to read this configuration file with the ‘--config’ option), you can be sure that there will either be an error (for version mismatch) or it will produce exactly the same result that you got before.

          $ astnoisechisel --config=reproducible.conf

‘--log’
     Some programs can generate extra information about their outputs in a log file. When this option is called in those programs, the log file will also be printed. If the program does not generate a log file, this option is ignored.

     *‘--log’ is not thread-safe*: The log file usually has a fixed name. Therefore if two simultaneous calls (with ‘--log’) of a program are made in the same directory, the program will try to write to the same file. This will cause problems like an unreasonable log file, undefined behavior, or a crash.
‘-N INT’
‘--numthreads=INT’
     Use ‘INT’ CPU threads when running a Gnuastro program (see *note Multi-threaded operations::). If the value is zero (‘0’), or this option is not given on the command-line or any configuration file, the value will be determined at run-time: the maximum number of threads available to the system when you run a Gnuastro program. Note that multi-threaded programming is only relevant to some programs. In others, this option will be ignored.

---------- Footnotes ----------

(1) Alternatively, you can use your favorite text editor.

4.1.3 Shell TAB completion (highly customized)
----------------------------------------------

*Under development:* Gnuastro’s TAB completion in Bash already greatly improves usage of Gnuastro on the command-line, but it is still under development and not yet complete. If you are interested to try it out, please go ahead and activate it (as described below); we encourage this. But please keep in mind that there are known issues(1) and you may find new issues. If you do, please get in touch with us as described in *note Report a bug::. TAB completion is currently only implemented in the following programs: Arithmetic, BuildProgram, ConvertType, Convolve, CosmicCalculator, Crop, Fits and Table. For progress on this task, please see Task 15799(2).

Bash provides a built-in feature called _programmable completion_(3) to help increase interactive workflow efficiency and minimize the number of key-strokes _and_ the need to memorize things. It is also known as TAB completion, bash completion, auto-completion, or word completion. Completion is activated by pressing <[TAB]> while you are typing a command. For file arguments this is the default behavior already and you have probably used it a lot with any command-line program.

Besides this simple/default mode, Bash also enables a high level of customization features for its completion. These features have been extensively used in Gnuastro to improve your work efficiency(4). For example, if you are running ‘asttable’ (which only accepts files containing a table), and you press <[TAB]>, it will only suggest files containing tables. As another example, if an option needs image HDUs within a FITS file, pressing <[TAB]> will only suggest the image HDUs (and not other possibly existing HDUs that contain tables, or just metadata). Just note that the file name has to be already given on the command-line before reaching such options (that look into the contents of a file).

But TAB completion is not limited to file types or contents. Arguments/Options that take certain fixed string values will directly suggest those strings with TAB, and completely ignore the file structure (for example, spectral line names in *note Invoking astcosmiccal::)! As another example, the ‘--numthreads’ option (to specify the number of threads to use by the program) will find the number of available threads on the system, and suggest the possible numbers with a TAB!

To activate Gnuastro’s custom TAB completion in Bash, you need to put the following line in one of your Bash startup files (for example, ‘~/.bashrc’). If you installed Gnuastro using the steps of *note Quick start::, you should have already done this (the command just after ‘sudo make install’). For a list of (and discussion on) Bash startup files and installation directories see *note Installation directory::.
Of course, if Gnuastro was installed in a custom location, replace the ‘/usr/local’ part of the line below with the value that was given to ‘--prefix’ during Gnuastro’s configuration(5).

     # Enable Gnuastro's TAB completion
     source /usr/local/share/gnuastro/completion.bash

After adding the line above in a Bash startup file, TAB completion will always be activated in any new terminal. To see if it has been activated, try it out with ‘asttable [TAB][TAB]’ and ‘astarithmetic [TAB][TAB]’ in a directory that contains tables and images. The first will only suggest the files with a table, and the second, only those with an image.

*TAB completion only works with long option names:* As described above, short options are much more complex to generalize, therefore TAB completion is only available for long options. But do not worry! TAB completion also involves option names, so if you just type ‘--a[TAB][TAB]’, you will get the list of options that start with an ‘--a’. Therefore as a side-effect of TAB completion, your commands will be far more human-readable with minimal key strokes.

---------- Footnotes ----------

(1)
(2)
(3)
(4) To learn how Gnuastro implements TAB completion in Bash, see *note Bash programmable completion::.
(5) In case you do not know the installation directory of Gnuastro on your system, you can find out with this command: ‘which astfits | sed -e"s|/bin/astfits||"’

4.1.4 Standard input
--------------------

The most common way to feed the primary/first input dataset into a program is to give its filename as an argument (discussed in *note Arguments::). When you want to run a series of programs in sequence, this means that you will have to keep the output of each program in a separate file and re-type that file’s name in the next command. This can be very slow and frustrating (mis-typing a file’s name). To solve the problem, the founders of Unix defined pipes to directly feed the output of one program (its “Standard output” stream) into the “standard input” of a next program. This removes the need to make temporary files between separate processes and became one of the best demonstrations of the Unix-way, or Unix philosophy.

Every program has three streams identifying where it reads/writes non-file inputs/outputs: _Standard input_, _Standard output_, and _Standard error_. When a program is called alone, all three are directed to the terminal that you are using. If it needs an input, it will prompt you for one and you can type it in. Or, it prints its results in the terminal for you to see. For example, say you have a FITS table/catalog containing the B and V band magnitudes (‘MAG_B’ and ‘MAG_V’ columns) of a selection of galaxies along with many other columns. If you want to see only these two columns in your terminal, you can use Gnuastro’s *note Table:: program like below:

     $ asttable cat.fits -cMAG_B,MAG_V

Through the Unix pipe mechanism, when the shell confronts the pipe character (<|>), it connects the standard output of the program before the pipe, to the standard input of the program after it. So it is literally a “pipe”: everything that you would see printed by the first program on the command (without any pipe), is now passed to the second program (and not seen by you). To continue the previous example, let’s say you want to see the B-V color.
To do this, you can pipe Table’s output to AWK (a wonderful tool for processing things like plain text tables):

$ asttable cat.fits -cMAG_B,MAG_V | awk '{print $1-$2}'

But understanding the distribution by visually seeing all the numbers under each other is not too useful! You can therefore feed this single column of information into *note Statistics:: to get a general feeling of the distribution with the same command:

$ asttable cat.fits -cMAG_B,MAG_V | awk '{print $1-$2}' | aststatistics

Gnuastro’s programs that accept input from standard input only look into the Standard input stream if there is no first argument. In other words, arguments take precedence over Standard input. When no argument is provided, the programs check if the standard input stream is already full or not (that is, whether the output of another program is waiting to be used). If data is present in the standard input stream, it is used.

When the standard input is empty, the program will wait ‘--stdintimeout’ micro-seconds for you to manually enter the first line (ending with a new-line character, i.e., the <ENTER> key, see *note Input output options::). If it detects the first line in this time, there is no more time limit, and you can manually write/type all the lines for as long as it takes. To inform the program that Standard input has finished, press <CTRL-D> after a new line. If the program does not catch the first line before the time-out finishes, it will abort with an error saying that no input was provided.

*Manual input in Standard input is discarded:* Be careful that when you manually fill the Standard input, the data will be discarded once the program finishes and reproducing the result will be impossible. Therefore this form of providing input is only good for temporary tests.

*Standard input currently only for plain text:* Currently Standard input only works for plain text inputs like the example above. We will later allow FITS files to be given to the programs through standard input as well.

4.2 Configuration files
=======================

Each program needs a certain number of parameters to run. Supplying all the necessary parameters each time you run the program is very frustrating and prone to errors. Therefore all the programs read the values for the necessary options you have not given on the command line from one of several plain text files (which you can view and edit with any text editor). These files are known as configuration files and are usually kept in a directory named ‘etc/’ according to the file system hierarchy standard(1).

The thing to have in mind is that none of the programs in Gnuastro keep any internal default value. All the values must either be stored in one of the configuration files or explicitly given on the command-line. In case the necessary parameters are not given through any of these methods, the program will print a missing option error and abort. The only exception to this is ‘--numthreads’, whose default value is determined at run-time using the number of threads available to your system, see *note Multi-threaded operations::. Of course, you can still provide a default value for the number of threads at any of the levels below, but if you do not, the program will not abort. Also note that through automatic output name generation, the value to the ‘--output’ option is also not mandatory on the command-line or in the configuration files for all programs which do not rely on that value as an input(2), see *note Automatic output::.
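As a small illustration of this behavior (the program name here is only an example; any Gnuastro program behaves the same way), you can inspect the final value that each option will take, after all the configuration files have been parsed, with the common ‘-P’ (or ‘--printparams’) operating mode option:

$ astcrop -P                     # print the values Crop will use
$ astcrop -P | grep numthreads   # check a single option, e.g., the threads

If a necessary option has no value in any configuration file and is not given on the command-line, the program will abort with the missing-option error described above.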
---------- Footnotes ----------

(1)

(2) One example of a program which uses the value given to ‘--output’ as an input is ConvertType: the value given to ‘--output’ specifies the type of the output, see *note Invoking astconvertt::.

4.2.1 Configuration file format
-------------------------------

The configuration files for each program have the standard program executable name with a ‘‘.conf’’ suffix. When you download the source code, you can find them in the same directory as the source code of each program, see *note Program source::.

Any line in the configuration file whose first non-white character is a <#> is considered to be a comment and is ignored. An empty line is also similarly ignored. The long name of the option should be used as an identifier. The parameter name and parameter value have to be separated by any number of ‘white-space’ characters: space, tab or vertical tab. By default several space characters are used. If the value of an option has space characters (most commonly for the ‘hdu’ option), then the full value can be enclosed in double quotation signs (<">, similar to the example in *note Arguments and options::). If it is an option without a value in the ‘--help’ output (on/off option, see *note Options::), then the value should be ‘1’ if it is to be ‘on’ and ‘0’ otherwise.

In each non-commented and non-blank line, any text after the first two words (option identifier and value) is ignored. If an option identifier is not recognized in the configuration file, the name of the file, the line number of the unrecognized option, and the unrecognized identifier name will be reported and the program will abort. If a parameter is repeated more than once in the configuration files, accepts only one value, and is not set on the command-line, then only the first value will be used and the rest will be ignored.

You can build or edit any of the directories and the configuration files yourself using any text editor. However, it is recommended to use the ‘--setdirconf’ and ‘--setusrconf’ options to set default values for the current directory or this user, see *note Operating mode options::. With these options, the values you give will be checked before being written to the configuration file. They will also print a set of commented lines guiding the reader, and will classify the options based on their context and write them in their logical order to be more understandable.

4.2.2 Configuration file precedence
-----------------------------------

The option values in all the programs of Gnuastro will be filled in the following order. If an option only takes one value which is given in an earlier step, any value for that option in a later step will be ignored. Note that if the ‘lastconfig’ option is specified in any step below, no other configuration files will be parsed (see *note Operating mode options::).

  1. Command-line options, for a particular run of ProgramName.

  2. ‘.gnuastro/astprogname.conf’ is parsed by ProgramName in the current directory.

  3. ‘.gnuastro/gnuastro.conf’ is parsed by all Gnuastro programs in the current directory.

  4. ‘$HOME/.local/etc/astprogname.conf’ is parsed by ProgramName in the user’s home directory (see *note Current directory and User wide::).

  5. ‘$HOME/.local/etc/gnuastro.conf’ is parsed by all Gnuastro programs in the user’s home directory (see *note Current directory and User wide::).

  6. ‘prefix/etc/astprogname.conf’ is parsed by ProgramName in the system-wide installation directory (see *note System wide:: for ‘prefix’).
  7. ‘prefix/etc/gnuastro.conf’ is parsed by all Gnuastro programs in the system-wide installation directory (see *note System wide:: for ‘prefix’).

The basic idea behind setting this progressive state of checking for parameter values is that separate users of a computer, or separate folders in a user’s file system, might need different values for some parameters.

*Checking the order:* You can confirm/check the order of parsing configuration files using the ‘--checkconfig’ option with any Gnuastro program, see *note Operating mode options::. Just be sure to place this option immediately after the program name, before any other option.

As you see above, there can also be a configuration file containing the common options in all the programs: ‘gnuastro.conf’ (see *note Common options::). If options specific to one program are specified in this file, there will be unrecognized option errors, or unexpected behavior if the option has different behavior in another program. On the other hand, there is no problem with ‘astprogname.conf’ containing common options(1).

*Manipulating the order:* You can manipulate this order or add new files with the following two options which are fully described in *note Operating mode options:::

‘--config’
     Allows you to define any file to be parsed as a configuration file on the command-line or within any other configuration file. Recall that the file given to ‘--config’ is parsed immediately when this option is confronted (on the command-line or in a configuration file).

‘--lastconfig’
     Allows you to stop the parsing of subsequent configuration files. Note that if this option is given in a configuration file, it will be fully read, so its position in the configuration file does not matter (unlike ‘--config’).

One example of benefiting from these configuration files can be this: raw telescope images usually have their main image extension in the second FITS extension, while processed FITS images usually only have one extension. If your system-wide default input extension is 0 (the first), then when you want to work with the former group of data you have to explicitly mention it to the programs every time. With this progressive state of default values to check, you can set different default values for the different directories that you would like to run Gnuastro in for your different purposes, so you will not have to worry about this issue any more. The same can be said about the ‘gnuastro.conf’ files: by specifying a behavior in this single file, all Gnuastro programs in the respective directory, user, or system-wide steps will behave similarly. For example, to keep the input’s directory when no specific output is given (see *note Automatic output::), or to not delete an existing file if it has the same name as a given output (see *note Input output options::).

---------- Footnotes ----------

(1) As an example, the ‘--setdirconf’ and ‘--setusrconf’ options will also write the common options they have read in their produced ‘astprogname.conf’.

4.2.3 Current directory and User wide
-------------------------------------

For the current (local) and user-wide directories, the configuration files are stored in the hidden sub-directories named ‘.gnuastro/’ and ‘$HOME/.local/etc/’ respectively. Unless you have changed it, the ‘$HOME’ environment variable should point to your home directory. You can check it by running ‘$ echo $HOME’. Each time you run any of the programs in Gnuastro, this environment variable is read and placed in the above address.
So if you suddenly see that your home configuration files are not being read, you (or some other program) have probably changed the value of this environment variable. Although it might cause confusion like this, this dependence on the ‘HOME’ environment variable enables you to temporarily use a different directory as your home directory. This can come in handy in complicated situations. To set the user or current directory configuration files based on your command-line input, you can use the ‘--setdirconf’ or ‘--setusrconf’ options, see *note Operating mode options::.

4.2.4 System wide
-----------------

When Gnuastro is installed, the configuration files that are shipped with the distribution are copied into the (possibly system wide) ‘prefix/etc/’ directory. For more details on ‘prefix’, see *note Installation directory:: (by default it is: ‘/usr/local’). This directory is the final place (with the lowest priority) that the programs in Gnuastro will check to retrieve parameter values.

If you remove an option and its value from the system wide configuration files, you either have to specify it in more immediate configuration files or set it each time on the command-line. Recall that none of the programs in Gnuastro keep any internal default values and will abort if they do not find a value for the necessary parameters (except the number of threads and output file name). So even if you do not expect to use an option frequently, it is safe to have it available in this system-wide configuration file.

Note that in case you install Gnuastro from your distribution’s repositories, ‘prefix’ will either be set to ‘/’ (the root directory) or ‘/usr’, so you can find the system wide configuration variables in ‘/etc/’ or ‘/usr/etc/’. The prefix of ‘/usr/local/’ is conventionally used for programs you install from source by yourself as in *note Quick start::.

4.3 Getting help
================

Probably the first time you read this book, it is either in the PDF or HTML format. These two formats are very convenient for when you are only reading, not actually working. Later on, when you start to use the programs and you are deep in the middle of your work, some of the details will inevitably be forgotten. Going to find the PDF file (printed or digital) or the HTML web page is a major distraction.

GNU software has a unique set of tools for aiding your memory on the command-line, where you are working, depending on how much of it you need to remember. In the past, such command-line help was known as “online” help, because it was literally provided to you ‘on’ the command ‘line’. However, nowadays the word “online” refers to something on the internet, so that term will not be used. With this type of help, you can resume your exciting research without taking your hands off the keyboard.

Another major advantage of such command-line based help routines is that they are installed with the software on your computer, therefore they are always in sync with the executable you are actually running. Three of them are actually part of the executable. You do not have to worry about the version of the book or program. If you rely on external help (a PDF in your personal print or digital archive or HTML from the official web page) you have to check to see if their versions fit with your installed program.

If you only need to remember the short or long names of the options, ‘--usage’ is advised. If it is what the options do, then ‘--help’ is a great tool.
Man pages are also provided for those who are used to this older system of documentation. This full book is also available to you on the command-line in Info format. If none of these seems to resolve your problem, there is a mailing list which enables you to get in touch with experienced Gnuastro users. In the subsections below, each of these methods is reviewed.

4.3.1 ‘--usage’
---------------

If you give this option, the program will not run. It will only print a very concise message showing the options and arguments. Everything within square brackets (‘[]’) is optional. For example, here are the first and last two lines of Crop’s ‘--usage’ output:

$ astcrop --usage
Usage: astcrop [-Do?IPqSVW] [-d INT] [-h INT] [-r INT] [-w INT]
       [-x INT] [-y INT] [-c INT] [-p STR] [-N INT] [--deccol=INT]
       ....
       [--setusrconf] [--usage] [--version] [--wcsmode]
       [ASCIIcatalog] FITSimage(s).fits

There are no explanations on the options, just their short and long names shown separately. After the program name, the short format of all the options that do not require a value (on/off options) is displayed. Those that do require a value then follow in separate brackets, each displaying the format of the input they want, see *note Options::. Since all options are optional, they are shown in square brackets, but arguments can also be optional. For example, in this case, a catalog name is optional and is only required in some modes. This is a standard method of displaying optional arguments for all GNU software.

4.3.2 ‘--help’
--------------

If the command-line includes this option, the program will not be run. It will print a complete list of all available options along with a short explanation. The options are also grouped by their context. Within each context, the options are sorted alphabetically. Since the options are shown in detail afterwards, the first line of the ‘--help’ output shows the arguments and whether they are optional or not, similar to *note --usage::.

In the ‘--help’ output of all programs in Gnuastro, the options for each program are classified based on context. The first two contexts are always options to do with the input and output respectively. For example, input image extensions or supplementary input files for the inputs. The last class of options is also fixed in all of Gnuastro; it shows the operating mode options. Most of these options are already explained in *note Operating mode options::.

The help message will sometimes be longer than the vertical size of your terminal. If you are using a graphical user interface terminal emulator, you can scroll the terminal with your mouse, but we promised no mice distractions! So here are some suggestions:

   • <SHIFT + PageUP> to scroll up and <SHIFT + PageDown> to scroll down. For most help output this should be enough. The problem is that it is limited by the number of lines that your terminal keeps in memory and that you cannot scroll by lines, only by whole screens.

   • Pipe to ‘less’. A pipe is a form of shell re-direction. The ‘less’ tool in Unix-like systems was made exactly for such outputs of any length. You can pipe (‘|’) the output of any program that is longer than the screen to it and then you can scroll through (up and down) with its many tools. For example:

     $ astnoisechisel --help | less

     Once you have gone through the text, you can quit ‘less’ by pressing the <q> key.

   • Redirect to a file. This is a less convenient way, because you will then have to open the file in a text editor!
You can do this with the shell redirection tool (‘>’):

$ astnoisechisel --help > filename.txt

In case you have a special keyword you are looking for in the help, you do not have to go through the full list. GNU Grep is made for this job. For example, if you only want the list of options whose ‘--help’ output contains the word “axis” in Crop, you can run the following command:

$ astcrop --help | grep axis

If the output of this option does not fit nicely within the confines of your terminal, GNU does enable you to customize its output: through the ‘ARGP_HELP_FMT’ environment variable you can set various parameters which specify the formatting of the help messages. For example, if your terminal is wider than 70 spaces (say 100) and you feel there is too much empty space between the long options and the short explanation, you can change these formats by giving values to this environment variable before running the program with the ‘--help’ option. You can define this environment variable in this manner:

$ export ARGP_HELP_FMT=rmargin=100,opt-doc-col=20

This will affect all GNU programs using GNU C library’s ‘argp.h’ facilities as long as the environment variable is in memory. You can see the full list of these formatting parameters in the “Argp User Customization” part of the GNU C library manual. If you are more comfortable reading the ‘--help’ outputs of all GNU software in your customized format, you can add your customization (similar to the line above, without the ‘$’ sign) to your ‘~/.bashrc’ file. This is a standard option for all GNU software.

4.3.3 Man pages
---------------

Man pages were the Unix method of providing command-line documentation to a program. With GNU Info (see *note Info::), the usage of this method of documentation is highly discouraged. This is because Info provides a much easier environment to navigate and read. However, some operating systems require a man page for packages that are installed and some people are still used to this method of command line help. So the programs in Gnuastro also have Man pages which are automatically generated from the outputs of ‘--version’ and ‘--help’ using the GNU help2man program. So if you run

$ man programname

you will be provided with a man page listing the options in the standard manner.

4.3.4 Info
----------

Info is the standard documentation format for all GNU software. It is a very useful command-line document viewing format, fully equipped with links between the various pages and menus and search capabilities. As explained before, the best thing about it is that it is available for you the moment you need to refresh your memory on any command-line tool in the middle of your work without having to take your hands off the keyboard. This complete book is available in Info format and can be accessed from anywhere on the command-line.

To open the Info format of any installed program or library on your system which has an Info format book, you can simply run the command below (change ‘executablename’ to the executable name of the program or library):

$ info executablename

In case you are not already familiar with it, run ‘$ info info’. It does a fantastic job in explaining all its capabilities itself. It is very short and you will become sufficiently fluent in about half an hour. Since all GNU software documentation is also provided in Info, your whole GNU/Linux life will significantly improve.
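As a small aside (this uses GNU Info’s standard menu-item arguments, nothing specific to Gnuastro), you can also jump straight to a named section of a manual by giving its menu entry as an extra argument. For example, to open this book directly at the section describing configuration files:

$ info gnuastro "Configuration files"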
Once you’ve become an efficient navigator in Info, you can go to any part of this book or any other GNU software or library manual, no matter how long it is, in a matter of seconds. It also blends nicely with GNU Emacs (a text editor), and you can search manuals while you are writing your document or programs without taking your hands off the keyboard; this is most useful for libraries like the GNU C library. To be able to access all the Info manuals installed in your GNU/Linux within Emacs, type <CTRL-H>, then <i>.

To see this whole book from the beginning in Info, you can run

$ info gnuastro

If you run Info with the particular program executable name, for example ‘astcrop’ or ‘astnoisechisel’:

$ info astprogramname

you will be taken to the section titled “Invoking ProgramName” which explains the inputs and outputs along with the command-line options for that program. Finally, if you run Info with the official program name, for example, Crop or NoiseChisel:

$ info ProgramName

you will be taken to the top section which introduces the program. Note that in all cases, Info is not case sensitive.

4.3.5 help-gnuastro mailing list
--------------------------------

Gnuastro maintains the help-gnuastro mailing list for users to ask any questions related to Gnuastro. The experienced Gnuastro users and some of its developers are subscribed to this mailing list and your email will be sent to them immediately. However, when contacting this mailing list please have in mind that they are possibly very busy and might not be able to answer immediately.

To ask a question on this mailing list, send a mail to ‘help-gnuastro@gnu.org’. Anyone can view the mailing list archives online. It is best that before sending a mail, you search the archives to see if anyone has asked a question similar to yours. If you want to make a suggestion or report a bug, please do not send a mail to this mailing list. We have other mailing lists and tools for those purposes, see *note Report a bug:: or *note Suggest new feature::.

4.4 Multi-threaded operations
=============================

Some of the programs benefit significantly when you use all the threads your computer’s CPU has to offer to your operating system. The number of threads available can be larger than the number of physical (hardware) cores in the CPU (also known as Simultaneous multithreading). For example, in Intel’s CPUs (those that implement its Hyper-threading technology) the number of threads is usually double the number of physical cores in your CPU. On a GNU/Linux system, the number of threads available can be found with the ‘$ nproc’ command (part of GNU Coreutils).

Gnuastro’s programs can find the number of threads available to your system internally at run-time (when you execute the program). However, if a value is given to the ‘--numthreads’ option, the given number will be used, see *note Operating mode options:: and *note Configuration files:: for ways to use this option. Thus ‘--numthreads’ is the only common option in Gnuastro’s programs with a value that does not have to be specified anywhere on the command-line or in the configuration files.

4.4.1 A note on threads
-----------------------

Spinning off threads is not necessarily the most efficient way to run an application. Creating a new thread is not a cheap operation for the operating system. It is most useful when the input data are fixed and you want the same operation to be done on parts of it. For example, one input image to Crop and multiple crops from various parts of it.
In this fashion, the image is loaded into memory once, all the crops are divided between the number of threads internally, and each thread cuts out those parts which are assigned to it from the same image. On the other hand, if you have multiple images and you want to crop the same region(s) out of all of them, it is much more efficient to set ‘--numthreads=1’ (so no threads spin off) and run Crop multiple times simultaneously, see *note How to run simultaneous operations::.

You can check the boost in speed by first running a program on one of the data sets with the maximum number of threads, and then again (with everything else the same) using only one thread. You will notice that the wall-clock time (reported by most programs at their end) in the former is longer than the latter divided by the number of physical CPU cores (not threads) available to your operating system. Asymptotically these two times can be equal (most of the time they are not). So limiting the programs to use only one thread and running them independently on the number of available threads will be more efficient.

Note that the operating system keeps a cache of recently processed data, so usually, the second time you process an identical data set (independent of the number of threads used), you will get faster results. In order to make an unbiased comparison, you have to first clean the system’s cache with the following command between the two runs.

$ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches

*SUMMARY: Should I use multiple threads?* Depends:

   • If you only have *one* data set (image in most cases!), then yes, the more threads you use (with a maximum of the number of threads available to your OS) the faster you will get your results.

   • If you want to run the same operation on *multiple* data sets, it is best to set the number of threads to 1 and use Make, or GNU Parallel, as explained in *note How to run simultaneous operations::.

4.4.2 How to run simultaneous operations
----------------------------------------

There are two(1) approaches to simultaneously execute a program: using GNU Parallel or Make (GNU Make is the most common implementation). The first is very useful when you only want to do one job multiple times and want to get back to your work without actually keeping the command you ran. The second is usually for more important operations, with lots of dependencies between the different products (for example, a full scientific research project).

GNU Parallel
     When you only want to run multiple instances of a command on different threads and get on with the rest of your work, the best method is to use GNU Parallel. Surprisingly GNU Parallel is one of the few GNU packages that has no Info documentation but only a Man page, see *note Info::. So to see the documentation after installing it, please run

     $ man parallel

     As an example, let’s assume we want to crop a region fixed on the pixels (500, 600) with the default width from all the FITS images in the ‘./data’ directory ending with ‘sci.fits’ to the current directory. To do this, you can run:

     $ parallel astcrop --numthreads=1 --xc=500 --yc=600 ::: \
                ./data/*sci.fits

     GNU Parallel can help in many more situations; this is one of the simplest, see the man page for lots of other examples. For absolute beginners: the backslash (‘\’) is only a line breaker to fit nicely on the page. If you type the whole command in one line, you should remove it.
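     As a small variation of the command above (with the same assumed file names), GNU Parallel’s ‘--jobs’ (or ‘-j’) option lets you limit how many of these simultaneous runs are started; by default it starts one job per available CPU core:

     $ parallel -j4 astcrop --numthreads=1 --xc=500 --yc=600 ::: \
                ./data/*sci.fits

     With ‘-j4’, at most 4 Crop instances will run at the same time, leaving the remaining cores free for other work.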
Make
     Make is a program for building “targets” (e.g., files) using “recipes” (a set of operations) when their known “prerequisites” (other files) have been updated. It elegantly allows you to define dependency structures for building your final output and updating it efficiently when the inputs change. It is the most common infrastructure for building software today.

     Scientific research methodology is very similar to software development: you start by testing a hypothesis on a small sample of objects/targets with a simple set of steps. As you are able to get promising results, you improve the method and use it on a larger, more general, sample. In the process, you will confront many issues that have to be corrected (bugs in software development jargon). Make is a wonderful tool to manage this style of development.

     Besides the raw data analysis pipeline, Make has been used for producing reproducible papers, for example, see the reproduction pipeline (https://gitlab.com/makhlaghi/NoiseChisel-paper) of the paper introducing *note NoiseChisel:: (one of Gnuastro’s programs). In fact the NoiseChisel paper’s Make-based workflow was the foundation of a parallel project called Maneage (http://maneage.org) (_Man_aging data lin_eage_), which is described more fully in Akhlaghi et al. 2021 (https://arxiv.org/abs/2006.03018). Therefore, it is a very useful tool for complex scientific workflows.

     GNU Make(2) is the most common implementation which (similar to nearly all GNU programs) comes with a wonderful manual(3). Make is very basic and simple, and thus the manual is short (the most important parts are in the first roughly 100 pages) and easy to read/understand.

     Make comes with a ‘--jobs’ (‘-j’) option which allows you to specify the maximum number of jobs that can be done simultaneously. For example, if you have 8 threads available to your operating system, you can run:

     $ make -j8

     With this command, Make will process your ‘Makefile’ and create all the targets (which can be thousands of FITS images, for example) simultaneously on 8 threads, while fully respecting their dependencies (only building a file/target when its prerequisites are successfully built). Make is thus strongly recommended for managing scientific research where robustness, archiving, reproducibility and speed(4) are important.

---------- Footnotes ----------

(1) A third way would be to open multiple terminal emulator windows in your GUI, type the commands separately on each and press <ENTER> once on each terminal, but this is far too frustrating, tedious and prone to errors. It’s therefore not a realistic solution when tens, hundreds or thousands of operations (your research targets, multiplied by the operations you do on each) are to be done.

(2) (3)

(4) Besides its multi-threaded capabilities, Make will only re-build those targets that depend on a change you have made, not the whole work. For example, if you have set the prerequisites properly, you can easily test the changing of a parameter on your paper’s results without having to re-do everything (which is much faster). This allows you to be much more productive in easily checking various ideas/assumptions of the different stages of your research and thus produce a more robust result for your exciting science.

4.5 Numeric data types
======================

At the lowest level, the computer stores everything in terms of ‘1’ or ‘0’. For example, each program in Gnuastro, or each astronomical image you take with the telescope, is actually a string of millions of these zeros and ones.
The space required to keep a zero or one is the smallest unit of storage, and is known as a _bit_. However, understanding and manipulating this string of bits is extremely hard for most people. Therefore, different standards are defined to package the bits into separate _type_s with a fixed interpretation of the bits in each package.

To store numbers, the most basic standard/type is for integers ($..., -2, -1, 0, 1, 2, ...$). The common integer types are 8, 16, 32, and 64 bits wide (more bits will give larger limits). Each bit corresponds to a power of 2 and they are summed to create the final number. In the integer types, for each width there are two standards for reading the bits: signed and unsigned. In the ‘signed’ convention, one bit is reserved for the sign (stating that the integer is positive or negative). The ‘unsigned’ integers use that bit in the actual number and thus contain only positive numbers (starting from zero). Therefore, at the same number of bits, both signed and unsigned integers can represent the same number of distinct values, but the positive limit of the ‘unsigned’ types is double that of their ‘signed’ counterparts with the same width (at the expense of not having negative numbers). When the context of your work does not involve negative numbers (for example, counting, where negative is not defined), it is best to use the ‘unsigned’ types. For the full numerical range of all integer types, see below.

Another standard for converting a given number of bits to numbers is the floating point standard; this standard can _approximately_ store any real number with a given precision. There are two common floating point types: 32-bit and 64-bit, for single and double precision floating point numbers respectively. The former is sufficient for data with less than 8 significant decimal digits (most astronomical data), while the latter is good for less than 16 significant decimal digits. The representation of real numbers as bits is much more complex than integers. If you are interested in learning more about it, you can start with the Wikipedia article (https://en.wikipedia.org/wiki/Floating_point).

Practically, you can use Gnuastro’s Arithmetic program to convert/change the type of an image/datacube (see *note Arithmetic::), or Gnuastro’s Table program to convert a table column’s data type (see *note Column arithmetic::). Conversion of a dataset’s type is necessary in some contexts. For example, the program/library that you intend to feed the data into may only accept floating point values, but you have an integer image/column. Another situation where conversion can be helpful is when you know that your data only has values that fit within ‘int8’ or ‘uint16’, but it is currently formatted in the ‘float64’ type.

The important thing to consider is that operations involving wider, floating point, or signed types can be significantly slower than smaller-width, integer, or unsigned types respectively. Note that besides speed, a wider type also requires much more storage space (by 4 or 8 times). Therefore, when you confront such situations that can be optimized and want to store/archive/transfer the data, it is best to use the most efficient type. For example, if your dataset (image or table column) only has positive integers less than 65535, store it as an unsigned 16-bit integer for faster processing, faster transfer, and less storage space. The short and long names for the recognized numeric data types in Gnuastro are listed below.
Both short and long names can be used when you want to specify a type. For example, as a value to the common option ‘--type’ (see *note Input output options::), or in the information comment lines of *note Gnuastro text table format::. The ranges listed below are inclusive.

‘u8’
‘uint8’
     8-bit unsigned integers, range: $[0\rm{\ to\ }2^8-1]$ or $[0\rm{\ to\ }255]$.

‘i8’
‘int8’
     8-bit signed integers, range: $[-2^7\rm{\ to\ }2^7-1]$ or $[-128\rm{\ to\ }127]$.

‘u16’
‘uint16’
     16-bit unsigned integers, range: $[0\rm{\ to\ }2^{16}-1]$ or $[0\rm{\ to\ }65535]$.

‘i16’
‘int16’
     16-bit signed integers, range: $[-2^{15}\rm{\ to\ }2^{15}-1]$ or $[-32768\rm{\ to\ }32767]$.

‘u32’
‘uint32’
     32-bit unsigned integers, range: $[0\rm{\ to\ }2^{32}-1]$ or $[0\rm{\ to\ }4294967295]$.

‘i32’
‘int32’
     32-bit signed integers, range: $[-2^{31}\rm{\ to\ }2^{31}-1]$ or $[-2147483648\rm{\ to\ }2147483647]$.

‘u64’
‘uint64’
     64-bit unsigned integers, range: $[0\rm{\ to\ }2^{64}-1]$ or $[0\rm{\ to\ }18446744073709551615]$.

‘i64’
‘int64’
     64-bit signed integers, range: $[-2^{63}\rm{\ to\ }2^{63}-1]$ or $[-9223372036854775808\rm{\ to\ }9223372036854775807]$.

‘f32’
‘float32’
     32-bit (single-precision) floating point types. The maximum (minimum is its negative) possible value is $3.402823\times10^{38}$. Single-precision floating points can accurately represent a floating point number up to $\sim7.2$ significant decimals. Given the heavy noise in astronomical data, this is usually more than sufficient for storing results.

‘f64’
‘float64’
     64-bit (double-precision) floating point types. The maximum (minimum is its negative) possible value is $\sim10^{308}$. Double-precision floating points can accurately represent a floating point number up to $\sim15.9$ significant decimals. This is usually good for processing (mixing) the data internally, for example, a sum of single precision data (and later storing the result as ‘float32’).

*Some file formats do not recognize all types.* For example, the FITS standard (see *note Fits::) does not define ‘uint64’ in binary tables or images. When a type is not acceptable for output into a given file format, the respective Gnuastro program or library will let you know and abort. On the command-line, you can convert the numerical type of an image or table column into another type with *note Arithmetic:: or *note Table:: respectively. If you are writing your own program, you can use the ‘gal_data_copy_to_new_type()’ function in Gnuastro’s library, see *note Copying datasets::.

4.6 Memory management
=====================

In this section we will review how Gnuastro manages your input data in your system’s memory. Knowing this can help you optimize your usage (in speed and memory consumption) when the data volume is large and approaches, or exceeds, your available RAM (usually in various calls to multiple programs simultaneously). But before diving into the details, let’s have a short basic introduction to memory in general and in particular the types of memory most relevant to this discussion.

Input datasets (that are later fed into programs for analysis) are commonly first stored in _non-volatile memory_. This is a type of memory that does not need a constant power supply to keep the data and is therefore primarily aimed at long-term storage, like HDDs or SSDs. So data in this type of storage is preserved when you turn off your computer. But by its nature, non-volatile memory is much slower, in reading or writing, than the speed at which CPUs can process data.
Thus relying on this type of memory alone would create a bad bottleneck in the input/output (I/O) phase of any processing. The first step to decrease this bottleneck is to have a faster storage space, even if it has a much more limited storage volume. For this type of storage, computers have a Random Access Memory (or RAM). RAM is classified as a _volatile memory_ because it needs a constant flow of electricity to keep the information. In other words, the moment power is cut off, all the stored information in your RAM is gone (hence the “volatile” name). But thanks to that constant supply of power, it can access any random address with equal (and very high!) speed.

Hence, the general/simplistic way that programs deal with memory is the following (this is general to almost all programs, not just Gnuastro’s): 1) Load/copy the input data from the non-volatile memory into RAM. 2) Use the copy of the data in RAM as input for all the internal processing as well as the intermediate data that is necessary during the processing. 3) Finally, when the analysis is complete, write the final output data back into non-volatile memory, and free/delete all the used space in the RAM (the initial copy and all the intermediate data). Usually the RAM is most important for the data of the intermediate steps (that you never see as a user of a program!).

When the input dataset(s) to a program are small (compared to the available space in your system’s RAM at the moment it is run), Gnuastro’s programs and libraries follow the standard series of steps above. The only exception is that deleting the intermediate data is not only done at the end of the program. As soon as an intermediate dataset is no longer necessary for the next internal steps, the space it occupied is deleted/freed. This allows Gnuastro programs to minimize their usage of your system’s RAM over the full running time.

The situation gets complicated when the datasets are large (compared to your available RAM when the program is run). For example, if a dataset is half the size of your system’s available RAM, and the program’s internal analysis needs three or more intermediately processed copies of it at one moment in its analysis, there will not be enough RAM to keep those higher-level intermediate data. In such cases, programs that do not do any memory management will crash. But fortunately Gnuastro’s programs do have a memory management plan for such situations.

When the necessary amount of space for an intermediate dataset cannot be allocated in the RAM, Gnuastro’s programs will not use the RAM at all. They will use the “memory-mapped file” concept in modern operating systems to create a randomly-named file in your non-volatile memory and use that instead of the RAM. That file will have the exact size (in bytes) of that intermediate dataset. Any time the program needs that intermediate dataset, the operating system will directly go to that file, and bypass your RAM. As soon as that file is no longer necessary for the analysis, it will be deleted. But as mentioned above, non-volatile memory has much slower I/O speed than the RAM. Hence in such situations, the programs will become noticeably slower (sometimes by a factor of 10, depending on your non-volatile memory speed).

Because of the drop in I/O speed (and thus the speed of your running program), the moment that any to-be-allocated dataset is memory-mapped, Gnuastro’s programs and libraries will notify you with a descriptive statement like the one below (it can happen in any phase of their analysis).
It shows the location of the memory-mapped file and its size, complemented with a small description of the cause, a pointer to this section of the book for more information on how to deal with it (if necessary), and what to do to suppress it.

astarithmetic: ./gnuastro_mmap/Fu7Dhs: temporary memory-mapped file
(XXXXXXXXXXX bytes) created for intermediate data that is not stored
in RAM (see the "Memory management" section of Gnuastro's manual for
optimizing your project's memory management, and thus speed). To
disable this warning, please use the option '--quiet-mmap'

Finally, when the intermediate dataset is no longer necessary, the program will automatically delete it and notify you with a statement like this:

astarithmetic: ./gnuastro_mmap/Fu7Dhs: deleted

To disable these messages, you can run the program with ‘--quietmmap’, or set the ‘quietmmap’ variable in the allocating library function to be non-zero.

An important component of these messages is the name of the memory-mapped file. Knowing whether the file has been deleted is important for the user if the program crashes for any reason: internally (for example, a parameter is given wrongly) or externally (for example, you mistakenly kill the running job). In the event of a crash, the memory-mapped files will not be deleted and you have to delete them manually, because they are usually large and may soon fill up your storage if they accumulate over successive crashes.

This brings us to managing the memory-mapped files in your non-volatile memory. In other words: knowing where they are saved, or intentionally placing them in different places of your file system, or deleting them when necessary. As the examples above show, memory-mapped files are stored in a sub-directory of the running directory called ‘gnuastro_mmap’. If this directory does not exist, Gnuastro will automatically create it when memory mapping becomes necessary. Alternatively, it may happen that the ‘gnuastro_mmap’ sub-directory exists and is not writable, or it cannot be created. In such cases, the memory-mapped file for each dataset will be created in the running directory with a ‘gnuastro_mmap_’ prefix. Therefore one easy way to delete all memory-mapped files in case of a crash is to delete everything within the sub-directory (first command below), or all files starting with this prefix:

rm -f gnuastro_mmap/*
rm -f gnuastro_mmap_*

A much more common issue when dealing with memory-mapped files is their location. For example, you may be running a program in a partition that is hosted by an HDD. But you also have another partition on an SSD (which has much faster I/O). So you want your memory-mapped files to be created in the SSD to speed up your processing. In this scenario, you want your project source directory to only contain your plain-text scripts and you want your project’s built products (even the temporary memory-mapped files) to be built in a different location because they are large; thus I/O speed becomes important.

To host the memory-mapped files in another location (with fast I/O), you can set ‘gnuastro_mmap’ to be a symbolic link to it. For example, let’s assume you want your memory-mapped files to be stored in ‘/path/to/dir/for/mmap’. All you have to do is to run the following command before your Gnuastro analysis command(s).

ln -s /path/to/dir/for/mmap gnuastro_mmap

The programs will delete a memory-mapped file when it is no longer needed, but they will not delete the ‘gnuastro_mmap’ directory that hosts them.
So if your project involves many Gnuastro programs (possibly called in parallel) and you want your memory-mapped files to be in a different location, you just have to make the symbolic link above once at the start, and all the programs will use it if necessary.

Another memory-management scenario that may happen is this: you do not want a Gnuastro program to allocate internal datasets in the RAM at all. For example, the speed of your Gnuastro-related project does not matter at that moment, and you have higher-priority jobs that are being run at the same time which need to have RAM available. In such cases, you can use the ‘--minmapsize’ option that is available in all Gnuastro programs (see *note Processing options::). Any intermediate dataset that has a size larger than the value of this option will be memory-mapped, even if there is space available in your RAM. For example, if you want any dataset larger than 100 megabytes to be memory-mapped, use ‘--minmapsize=100000000’ (8 zeros!).

You should not set the value of ‘--minmapsize’ to be too small, otherwise even small intermediate values (that are usually very numerous) in the program will be memory-mapped. However, the kernel can only host a limited number of memory-mapped files at every moment (by all running programs combined). For example, in the default(1) Linux kernel on GNU/Linux operating systems this limit is roughly 64000. If the total number of memory-mapped files exceeds this number, all the programs using them will crash. Gnuastro’s programs will warn you if your given value is too small and may cause a problem later.

Actually, the default behavior of Gnuastro’s programs (to only use memory-mapped files when there is not enough RAM) is a side-effect of ‘--minmapsize’. The pre-defined value for this option is an extremely large value in the lowest-level Gnuastro configuration file (the installed ‘gnuastro.conf’ described in *note Configuration file precedence::). This value is larger than the largest possible available RAM. You can check it by running any Gnuastro program with the ‘-P’ option. Because no dataset will be larger than this, by default the programs will first attempt to use the RAM for temporary storage. But if writing to the RAM fails (for any reason, mainly due to lack of available space), then a memory-mapped file will be created.

---------- Footnotes ----------

(1) If you need to host more memory-mapped files at one moment, you need to build your own customized Linux kernel.

4.7 Tables
==========

“A table is a collection of related data held in a structured format within a database. It consists of columns, and rows.” (from Wikipedia). Each column in the table contains the values of one property and each row is a collection of properties (columns) for one target object. For example, let’s assume you have just run MakeCatalog (see *note MakeCatalog::) on an image to measure some properties for the labeled regions (which might be detected galaxies, for example) in the image. For each labeled region (detected galaxy), there will be a _row_ which groups its measured properties as _columns_, one column for each property. One such property can be the object’s magnitude, which is the sum of pixels with that label, or its center can be defined as the light-weighted average value of those pixels. Many such properties can be derived from the raw pixel values and their position, see *note Invoking astmkcatalog:: for a long list.
As a summary, for each labeled region (or galaxy) we have one _row_ and for each measured property we have one _column_. This high-level structure is usually the first step for higher-level analysis, for example, finding the stellar mass or photometric redshift from magnitudes in multiple colors. Thus, tables are not just outputs of programs; in fact, it is much more common for tables to be inputs of programs. For example, to make a mock galaxy image, you need to feed the properties of each galaxy into *note MakeProfiles:: for it to do the inverse of the process above and make a simulated image from a catalog, see *note Sufi simulates a detection::. In other cases, you can feed a table into *note Crop:: and it will crop out regions centered on the positions within the table, see *note Reddest clumps cutouts and parallelization::. So to end this relatively long introduction, tables play a very important role in astronomy, and generally in all branches of data analysis.

In *note Recognized table formats:: the currently recognized table formats in Gnuastro are discussed. You can use any of these tables as input or ask for them to be built as output. The most common type of table format is a simple plain text file with each row on one line and columns separated by white space characters; this format is easy to read/write by eye/hand. To give it the full functionality of more specific table types like the FITS tables, Gnuastro has a special convention which you can use to give each column a name, type, unit, and comments, while still being readable by other plain text table readers. This convention is described in *note Gnuastro text table format::.

When tables are input to a program, the program reading it needs to know which column(s) it should use for its desired purposes. Gnuastro’s programs all follow a similar convention on the way you can select columns in a table. They are thoroughly discussed in *note Selecting table columns::.

4.7.1 Recognized table formats
------------------------------

The list of table formats that Gnuastro can currently read from and write to is described below. Each has its own advantages and disadvantages, so a short review of the format is also provided to help you make the best choice based on how you want to define your input tables or later use your output tables.

Plain text table
     This is the most basic and simplest way to create, view, or edit the table by hand on a text editor. The other formats described below are less eye-friendly and have a more formal structure (for easier computer readability). It is fully described in *note Gnuastro text table format::.

FITS ASCII tables
     The FITS ASCII table extension is fully in ASCII encoding and thus easily readable on any text editor (assuming it is the only extension in the FITS file). If the FITS file also contains binary extensions (for example, an image or binary table extensions), then there will be many hard-to-print characters. The FITS ASCII format does not have new line characters to separate rows. In the FITS ASCII table standard, each row is defined as a fixed number of characters (the value to the ‘NAXIS1’ keyword), so to visually inspect it properly, you would have to adjust your text editor’s width to this value. All columns start at given character positions and have a fixed width (number of characters). Numbers in a FITS ASCII table are printed into ASCII format; they are not in binary (which the CPU uses).
     Hence, they can take a larger space in memory, lose their precision, and take longer to read into memory. If you are dealing with integer type columns (see *note Numeric data types::), another issue with FITS ASCII tables is that the type information for the column will be lost (there is only one integer type in FITS ASCII tables). One problem with the binary format on the other hand is that it is not portable (different CPUs/compilers have different standards for translating the zeros and ones). But since ASCII characters are defined on a byte and are well recognized, they are better for portability on those various systems. Gnuastro’s plain text table format described below is much more portable and easier to read/write/interpret by humans manually.

     Generally, as the name implies, this format is useful for when your table mainly contains ASCII columns (for example, file names, or descriptions). They can be useful when you need to include columns with structured ASCII information along with other extensions in one FITS file. In such cases, you can also consider header keywords (see *note Fits::).

FITS binary tables
     The FITS binary table is the FITS standard’s solution to the issues with keeping numbers in ASCII format that were described under the FITS ASCII table title above. Only columns defined as a string type (a string of ASCII characters) are readable in a text editor. The portability problem with binary formats discussed above is mostly solved thanks to the portability of CFITSIO (see *note CFITSIO::) and the very long history of the FITS format, which has been widely used since the 1970s.

     In the case of most numbers, storing them in binary format is more memory efficient than ASCII format. For example, to store ‘-25.72034’ in ASCII format, you need 9 bytes/characters. But if you keep this same number (to the approximate precision possible) as a 4-byte (32-bit) floating point number, you can keep/transmit it with less than half the amount of memory. When catalogs contain thousands/millions of rows in tens/hundreds of columns, this can lead to significant improvements in memory/band-width usage. Moreover, since the CPU does its operations in the binary formats, reading the table in and writing it out is also much faster than an ASCII table.

     When you are dealing with integer numbers, the compression ratio can be even better. For example, if you know all of the values in a column are positive and less than ‘255’, you can use the ‘unsigned char’ type which only takes one byte! If they are between ‘-128’ and ‘127’, then you can use the (signed) ‘char’ type. So if you are thoughtful about the limits of your integer columns, you can greatly reduce the size of your file and also improve the speed at which it is read/written. This can be very useful when sharing your results with collaborators or publishing them. To decrease the file size even more, you can name your output with a ‘.fits.gz’ suffix so it is also compressed after creation. Just note that compression/decompression is CPU intensive and can slow down the writing/reading of the file.

     Fortunately the FITS binary table format also accepts ASCII strings as column types (along with the various numerical types). So your dataset can also contain non-numerical columns.

4.7.2 Gnuastro text table format
--------------------------------

Plain text files are the most generic, portable, and easiest way to (manually) create, (visually) inspect, or (manually) edit a table.
In this format, the ending of a row is defined by the new-line character (a line on a text editor). So when you view it on a text editor, every row will occupy one line. The delimiters (or characters separating the columns) are white space characters (space, horizontal tab, vertical tab) and a comma (<,>). The only further requirement is that all rows/lines must have the same number of columns. The columns do not have to be exactly under each other and the rows can be arbitrarily long with different lengths. For example, the following contents in a file would be interpreted as a table with 4 columns and 2 rows, with each element interpreted as a ‘double’ type (see *note Numeric data types::).

1 2.234948 128 39.8923e8
2 , 4.454 792 72.98348e7

However, the example above has no other information about the columns (it is just raw data, with no meta-data). To use this table, you have to remember what the numbers in each column represent. Also, when you want to select columns, you have to count their position within the table. This can become frustrating and prone to bad errors (getting the columns wrong) especially as the number of columns increases. It is also bad for sending to a colleague, because they will find it hard to remember/use the columns properly. To solve these problems in Gnuastro’s programs/libraries you are not limited to using the column’s number, see *note Selecting table columns::. If the columns have names, units, or comments you can also select your columns based on searches/matches in these fields, for example, see *note Table::.

Also, in this manner, you cannot guide the program reading the table on how to read the numbers. As an example, the first and third columns above can be read as integer types: the first column might be an ID and the third can be the number of pixels an object occupies in an image. So there is no need to read these two columns as a ‘double’ type (which takes more memory, and is slower). In the bare-minimum example above, you also cannot use strings of characters, for example, the names of filters, or some other identifier that includes non-numerical characters. In the absence of any information, only numbers can be read robustly. Assuming we read columns with non-numerical characters as strings, there would still be the problem that the strings might contain a space (or any delimiter) character in some rows. So, each ‘word’ in the string would be interpreted as a column and the program would abort with an error that the rows do not have the same number of columns.

To correct for these limitations, Gnuastro defines the following convention for storing the table meta-data along with the raw data in one plain text file. The format is primarily designed for ease of reading/writing by eye/fingers, but is also structured enough to be read by a program.

When the first non-white character in a line is <#>, or there are no non-white characters in it, then the line will not be considered as a row of data in the table (this is a pretty standard convention in many programs, and higher level languages). In the former case, the line is interpreted as a _comment_. If the comment line starts with ‘‘# Column N:’’, then it is assumed to contain information about column ‘N’ (a number, counting from 1). Comment lines that do not start with this pattern are ignored and you can use them to include any further information you want to store with the table in the text file.

A column information comment is assumed to have the following format:

     # Column N: NAME [UNIT, TYPE, BLANK] COMMENT

Any sequence of characters between ‘<:>’ and ‘<[>’ will be interpreted as the column name (so it can contain anything except the ‘<[>’ character). Anything between the ‘<]>’ and the end of the line is defined as a comment. Within the brackets, anything before the first ‘<,>’ is the units (physical units, for example, km/s, or erg/s), anything before the second ‘<,>’ is the short type identifier (see below, and *note Numeric data types::). If the type identifier is not recognized, the default 64-bit floating point type will be used.

Finally (still within the brackets), any non-white characters after the second ‘<,>’ are interpreted as the blank value for that column (see *note Blank pixels::). The blank value can either be in the same type as the column (for example, ‘-99’ for a signed integer column), or any string (for example, ‘NaN’ in that same column). In both cases, the values will be stored in memory as Gnuastro's fixed blank values for each type. For floating point types, Gnuastro's internal blank value is IEEE NaN (Not-a-Number). For signed integers, it is the smallest possible value and for unsigned integers it is the largest possible value.

When a formatting problem occurs, or when the column was already given meta-data in a previous comment, or when the column number is larger than the actual number of columns in the table (the non-commented or empty lines), then the comment information line will be ignored. When a comment information line can be used, the leading and trailing white space characters will be stripped from all of the elements. For example, in this line:

     # Column 5:  column name [km/s,    f32,-99] Redshift as speed

The ‘NAME’ field will be ‘‘column name’’ and the ‘TYPE’ field will be ‘‘f32’’. Note how the white space characters before and after the strings are not used, but those in the middle remain. Also, white space characters are not mandatory; hence, in the example above, the ‘BLANK’ field will be given the value of ‘‘-99’’.

Except for the column number (‘N’), the rest of the fields are optional. Also, the column information comments do not have to be in order. In other words, the information for column $N+m$ ($m>0$) can be given in a line before column $N$. Furthermore, you do not have to specify information for all columns. Those columns that do not have this information will be interpreted with the default settings (like the case above: values are double precision floating point, and the column has no name, unit, or comment). So these lines are all acceptable for any table (the first one, with nothing but the column number, is redundant):

     # Column 5:
     # Column 1: ID [,i8] The Clump ID.
     # Column 3: mag_f160w [AB mag, f32] Magnitude from the F160W filter

The data type of the column should be specified with one of the following values:

   • For a numeric column, you can use any of the numeric types (and their recognized identifiers) described in *note Numeric data types::.

   • ‘‘strN’’: for strings. The ‘N’ value identifies the length of the string (how many characters it has). The start of the string on each row is the first non-delimiter character of the column that has the string type. The next ‘N’ characters will be interpreted as a string and all leading and trailing white space will be removed.
     If the next column's characters are closer than ‘N’ characters to the start of the string column in that line/row, they will be considered part of the string column. If there is a new-line character before the ending of the space given to the string column (in other words, the string column is the last column), then reading of the string will stop, even if the ‘N’ characters are not complete yet. See ‘tests/table/table.txt’ for one example. Therefore, the only time you have to pay attention to the positioning and spaces given to the string column is when it is not the last column in the table.

The only limitation in this format is that trailing and leading white space characters will be removed from the columns that are read. In most cases, this is the desired behavior, but if trailing and leading white-spaces are critically important to your analysis, define your own starting and ending characters and remove them after the table has been read. For example, in the sample table below, the two ‘<|>’ characters (which are arbitrary) will remain in the value of the second column and you can remove them manually later. If only one of the leading or trailing white spaces is important for your work, you can use only one of the ‘<|>’s.

     # Column 1: ID [label, u8]
     # Column 2: Notes [no unit, str50]
     1    leading and trailing white space is ignored here    2.3442e10
     2   |         but they will be preserved here        |   8.2964e11

Note that the FITS binary table standard does not define the ‘unsigned int’ and ‘unsigned long’ types, so if you want to convert your tables to FITS binary tables, use other types. Also, note that in the FITS ASCII table, there is only one integer type (‘long’). So if you convert a Gnuastro plain text table to a FITS ASCII table with the *note Table:: program, the type information for integers will be lost. Conversely, if integer types are important for you, you have to manually set them when reading a FITS ASCII table (for example, with the Table program when reading/converting into a file, or with the ‘gnuastro/table.h’ library functions when reading into memory).

4.7.3 Selecting table columns
-----------------------------

At the lowest level, the only defining aspect of a column in a table is its number, or position. But selecting columns purely by number is not very convenient and, especially when the tables are large, it can be very frustrating and prone to errors. Hence, table file formats (for example, see *note Recognized table formats::) have ways to store additional information about the columns (meta-data). Some of the most common pieces of information about each column are its _name_, the _units_ of data in it, and a _comment_ for a longer/informal description of the column's data. To facilitate research with Gnuastro, you can select columns by matching, or searching in these three fields, besides the low-level column number. To view the full list of information on the columns in the table, you can use the Table program (see *note Table::) with the command below (replace ‘table-file’ with the filename of your table; if it is FITS, you might also need to specify the HDU/extension which contains the table):

     $ asttable --information table-file

Gnuastro's programs need the columns for different purposes, for example, in Crop, you specify the columns containing the central coordinates of the crop centers with the ‘--coordcol’ option (see *note Crop options::).
On the other hand, in MakeProfiles, to specify the column containing the profile position angles, you must use the ‘--pcol’ option (see *note MakeProfiles catalog::). Thus, there can be no unified common option name to select columns for all programs (different columns have different purposes). However, when a program expects a column for a specific context, the option names end in the ‘col’ suffix like the examples above. These options accept values in integer (column number), or string (metadata match/search) format.

If the value can be parsed as a positive integer, it will be seen as the low-level column number. Note that column counting starts from 1, so if you ask for column 0, the respective program will abort with an error. When the value cannot be interpreted as an integer number, it will be seen as a string of characters which will be used to match/search in the table's meta-data. The meta-data field which the value will be compared with can be selected through the ‘--searchin’ option, see *note Input output options::. ‘--searchin’ can take three values: ‘name’, ‘unit’, ‘comment’. The matching will be done following this convention:

   • If the value is enclosed in two slashes (for example, ‘-x/RA_/’, or ‘--coordcol=/RA_/’, see *note Crop options::), then it is assumed to be a regular expression with the same convention as GNU AWK. GNU AWK has a very well written chapter (https://www.gnu.org/software/gawk/manual/html_node/Regexp.html) describing regular expressions, so we will not continue discussing them here. Regular expressions are a very powerful tool in matching text and useful in many contexts. We thus strongly encourage reviewing this chapter for greatly improving the quality of your work in many cases, not just for searching column meta-data in Gnuastro.

   • When the value is not enclosed between ‘/’s, any column that exactly matches the given value in the given field will be selected.

Note that in both cases, you can ignore the case of alphabetic characters with the ‘--ignorecase’ option, see *note Input output options::. Also, in both cases, multiple columns may be selected with one call to this option. In this case, the order of the selected columns (with one call) will be the same order as they appear in the table.

4.8 Tessellation
================

It is sometimes necessary to classify the elements in a dataset (for example, pixels in an image) into a grid of individual, non-overlapping tiles. For example, when background sky gradients are present in an image, you can define a tile grid over the image. When the tile sizes are set properly, the background's variation over each tile will be negligible, allowing you to measure (and subtract) it. In other cases (for example, spatial domain convolution in Gnuastro, see *note Convolve::), it might simply be for speed of processing: each tile can be processed independently on a separate CPU thread. In the arts and mathematics, this process is formally known as tessellation (https://en.wikipedia.org/wiki/Tessellation).

The size of the regular tiles (in units of data-elements, or pixels in an image) can be defined with the ‘--tilesize’ option. It takes multiple numbers (separated by a comma) which will be the length along the respective dimension (in FORTRAN/FITS dimension order). Divisions are also acceptable, but must result in an integer. For example, ‘--tilesize=30,40’ can be used for an image (a 2D dataset).
The regular tile size along the first FITS axis (horizontal when viewed in SAO DS9) will be 30 pixels and along the second it will be 40 pixels. Ideally, ‘--tilesize’ should be selected such that all tiles in the image have exactly the same size. In other words, that the dataset length in each dimension is divisible by the tile size in that dimension. However, this is not always possible: the dataset can be any size and every pixel in it is valuable. In such cases, Gnuastro will look at the significance of the remainder length. If it is not significant (for example, one or two pixels), it will just increase the size of the first tile in the respective dimension and allow the rest of the tiles to have the required size. When the remainder is significant (for example, one pixel less than the size along that dimension), the remainder will be added to one regular tile's size and the large tile will be cut in half and put in the two ends of the grid/tessellation. In this way, all the tiles in the central regions of the dataset will have the regular tile sizes and the tiles on the edge will be slightly larger/smaller depending on the remainder significance. The fraction which defines the remainder significance along all dimensions can be set through ‘--remainderfrac’.

The best tile size is directly related to the spatial properties of the signal you want to study (for example, the gradient over the image). In practice we assume that the gradient is not present over each tile. So if there is a strong gradient (for example, in long wavelength ground based images) or the image is of a crowded area where there is not too much blank area, you have to choose a smaller tile size. A larger mesh will give more pixels, so the scatter in the results will be smaller (better statistics).

For raw image processing, a single tessellation/grid is not sufficient. Raw images are the unprocessed outputs of the camera detectors. Modern detectors usually have multiple readout channels, each with its own amplifier. For example, the Hubble Space Telescope Advanced Camera for Surveys (ACS) has four amplifiers over its full detector area, dividing the square field of view into four smaller squares. Ground based image detectors are not exempt; for example, each CCD of Subaru Telescope's Hyper Suprime-Cam camera (which has 104 CCDs) has four amplifiers, but they have the same height as the CCD and divide its width into four parts. The bias current on each amplifier is different, and initial bias subtraction is not perfect. So even after subtracting the measured bias current, you can usually still identify the boundaries of different amplifiers by eye. See Figure 11(a) in Akhlaghi and Ichikawa (2015) for an example. As a result, the final reduced data will have non-uniform amplifier-shaped regions with higher or lower background flux values. Such systematic biases will then propagate to all subsequent measurements we do on the data (for example, photometry and subsequent stellar mass and star formation rate measurements in the case of galaxies).

Therefore an accurate analysis requires a two-layer tessellation: the top layer contains larger tiles, each covering one amplifier channel. For clarity we will call these larger tiles “channels”. The number of channels along each dimension is defined through the ‘--numchannels’ option. Each channel is then covered by its own individual smaller tessellation (with tile sizes determined by the ‘--tilesize’ option).
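
As a rough sketch of how these options combine (the program choice, file name and values here are only illustrative assumptions, not recommendations), a tessellation over a detector that was read out through two amplifiers along the first FITS axis could be requested like this:

     ## Illustrative values only: 40x40 pixel tiles inside two channels
     ## that split the first FITS axis in half.
     $ aststatistics image.fits --tilesize=40,40 --numchannels=2,1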

Such a two-layer tessellation allows independent analysis of two adjacent pixels from different channels if necessary. If the image is processed or the detector only has one amplifier, you can set the number of channels in both dimensions to 1.

The final tessellation can be inspected on the image with the ‘--checktiles’ option that is available to all programs which use tessellation for localized operations. When this option is called, a FITS file with a ‘_tiled.fits’ suffix will be created along with the outputs, see *note Automatic output::. Each pixel in this image has the number of the tile that covers it. If the number of channels in any dimension is larger than unity, you will notice that the tile IDs are defined such that the first channel is covered first, then the second, and so on. For the full list of processing-related common options (including tessellation options), please see *note Processing options::.

4.9 Automatic output
====================

All the programs in Gnuastro are designed such that specifying an output file or directory (based on the program context) is optional. When no output name is explicitly given (with ‘--output’, see *note Input output options::), the programs will automatically set an output name based on the input name(s) and what the program does. For example, when you are using ConvertType to save a FITS image named ‘dataset.fits’ to a JPEG image and do not specify a name for it, the JPEG output file will be named ‘dataset.jpg’. When the input is from the standard input (for example, a pipe, see *note Standard input::), and ‘--output’ is not given, the output name will be the program's name (for example, ‘converttype.jpg’).

Another very important part of the automatic output generation is that all the directory information of the input file name is stripped off of it. This feature can be disabled with the ‘--keepinputdir’ option, see *note Input output options::. It is the default because astronomical data are usually very large and specially organized, with special file names. In some cases, the user might not have write permissions in those directories(1).

Let's assume that we are working on a report and want to process the FITS images from two projects (ABC and DEF), which are stored in the sub-directories named ‘ABCproject/’ and ‘DEFproject/’ of our top data directory (‘/mnt/data’). The following shell commands show how one image from the former is first converted to a JPEG image through ConvertType and then the objects from an image in the latter project are detected using NoiseChisel. The text after the ‘#’ sign is a comment (not typed!).

     $ pwd                                          # Current location
     /home/usrname/research/report
     $ ls                                           # List directory contents
     ABC01.jpg
     $ ls /mnt/data/ABCproject                      # Archive 1
     ABC01.fits ABC02.fits ABC03.fits
     $ ls /mnt/data/DEFproject                      # Archive 2
     DEF01.fits DEF02.fits DEF03.fits
     $ astconvertt /mnt/data/ABCproject/ABC02.fits --output=jpg    # Prog 1
     $ ls
     ABC01.jpg ABC02.jpg
     $ astnoisechisel /mnt/data/DEFproject/DEF01.fits              # Prog 2
     $ ls
     ABC01.jpg ABC02.jpg DEF01_detected.fits

   ---------- Footnotes ----------

   (1) In fact, even if the data is stored on your own computer, it is advised to only grant write permissions to the super user or root. This way, you will not accidentally delete or modify your valuable data!

4.10 Output FITS files
======================

The output of many of Gnuastro's programs are (or can be) FITS files. The FITS format has many useful features for storing scientific datasets (cubes, images and tables) along with robust features for archivability.
For more on this standard, please see *note Fits::.

As a community convention described in *note Fits::, the first extension of all FITS files produced by Gnuastro's programs only contains the meta-data that is intended for the file's extension(s). For a Gnuastro program, this generic meta-data (that is stored as FITS keyword records) is its configuration when it produced this dataset: file name(s) of input(s) and option names, values and comments. Note that when the configuration is too trivial (only the input file name, for example, in the *note Table:: program) no meta-data is written in this extension.

FITS keywords have the following limitations in regard to generic option names and values, which are described below:

   • If a keyword (option name) is longer than 8 characters, the first word in the record (80 character line) is ‘HIERARCH’, which is followed by the keyword name.

   • Values can be at most 75 characters, but for strings, this changes to 73 (because of the two extra <'> characters that are necessary). However, if the value is a file name containing slash (</>) characters to separate directories, Gnuastro will break the value into multiple keywords.

   • Keyword names are not case sensitive, therefore they are all stored in capital letters. So if you want to use Grep to inspect these keywords, use the ‘-i’ option, like the example below.

          $ astfits image_detected.fits -h0 | grep -i snquant

The keywords above are classified (separated by an empty line and title) as a group titled “ProgramName configuration”. This meta-data extension, as well as all the other extensions (which contain data), also contain a final group of keywords to keep the basic date and version information of Gnuastro, its dependencies and the pipeline that is using Gnuastro (if it is under version control).

‘DATE’
     The creation time of the FITS file. This date is written directly by CFITSIO and is in UT format.

‘COMMIT’
     Git's commit description from the running directory of Gnuastro's programs. If the running directory is not version controlled or ‘libgit2’ is not installed (see *note Optional dependencies::) then this keyword will not be present. The printed value is equivalent to the output of the following command:

          git describe --dirty --always

     If the running directory contains non-committed work, then the stored value will have a ‘‘-dirty’’ suffix. This can be very helpful to let you know that the data is not ready to be shared with collaborators or submitted to a journal. You should only share results that are produced after all your work is committed (safely stored in the version controlled history and thus reproducible).

     At first sight, version control appears to be mainly a tool for software developers. However, progress in scientific research is almost identical to progress in software development: first you have a rough idea that starts with a handful of easy steps. But as the first results appear to be promising, you will have to extend, or generalize, it to make it more robust and work in all the situations your research covers, not just your first test samples. Slowly you will find wrong assumptions or bad implementations that need to be fixed (‘bugs’ in software development parlance). Finally, when you submit the research to your collaborators or a journal, many comments and suggestions will come in, and you have to address them. Software developers have created version control systems precisely for this kind of activity.

     Each significant moment in the project's history is called a “commit”, see *note Version controlled source::. A snapshot of the project in each “commit” is safely stored away, so you can revert back to it at a later time, or check changes/progress. This way, you can be sure that your work is reproducible and track the progress and history. With version control, experimentation in the project's analysis is greatly facilitated, since you can easily revert back if a test procedure fails.

     One important feature of version control is that the research result (FITS image, table, report or paper) can be stamped with the unique commit information that produced it. This information will enable you to exactly reproduce that same result later, even if you have made changes/progress. For one example of a research paper's reproduction pipeline, please see the reproduction pipeline (https://gitlab.com/makhlaghi/NoiseChisel-paper) of the paper (https://arxiv.org/abs/1505.01664) describing *note NoiseChisel::.

‘CFITSIO’
     The version of CFITSIO used (see *note CFITSIO::).

‘WCSLIB’
     The version of WCSLIB used (see *note WCSLIB::). Note that older versions of WCSLIB do not report the version internally. So this is only available if you are using more recent WCSLIB versions.

‘GSL’
     The version of GNU Scientific Library that was used, see *note GNU Scientific Library::.

‘GNUASTRO’
     The version of Gnuastro used (see *note Version numbering::).

Here is an example of the last few lines of such an output:

                           / Versions and date
     DATE    = '...'                / file creation date
     COMMIT  = 'v0-8-g547f6eb'      / Commit description in running dir.
     CFITSIO = '3.45    '           / CFITSIO version.
     WCSLIB  = '5.19    '           / WCSLIB version.
     GSL     = '2.5     '           / GNU Scientific Library version.
     GNUASTRO= '0.7     '           / GNU Astronomy Utilities version.
     END

4.11 Numeric locale
===================

If your system locale (https://en.wikipedia.org/wiki/Locale_(computer_software)) is not English, it may happen that the ‘.’ is not used as the decimal separator of basic command-line tools for input or output. For example, in Spanish and some other languages the decimal separator (the symbol used to separate the integer and fractional parts of a number) is a comma. Therefore in such systems, some programs may print $0.5$ as ‘‘0,5’’ (instead of ‘‘0.5’’). This mainly happens because some core operating system tools, like ‘awk’ or ‘seq’, depend on the locale. This can cause problems for other programs (like those in Gnuastro that expect a ‘<.>’ as the decimal separator).

To see the effect, please try the commands below. The first one will print $0.5$ in your default locale's format. The second set will use the Spanish locale for printing numbers (which will put a comma between the 0 and the 5). The third will use the English (US) locale for printing numbers (which will put a point between the 0 and the 5).

     $ seq 0.5 1

     $ export LC_NUMERIC=es_ES.utf8
     $ seq 0.5 1

     $ export LC_NUMERIC=en_US.utf8
     $ seq 0.5 1

With the simple command below, you can check your current locale environment variables for specifying the formats of various things like date, time, monetary, telephone, numbers, etc. You can change any of these, by simply giving different values to the respective variable like above. For a more complete explanation on each variable, see .
$ locale To avoid these kinds of locale-specific problems (for example, another program not being able to read ‘‘0,5’’ as half of unity), you can change the locale by giving the value of ‘C’ to the ‘LC_NUMERIC’ environment variable (or the lower-level/generic ‘LC_ALL’). You will notice that ‘C’ is not a human-language and country identifier like ‘en_US’, it is the programming locale, which is well recognized by programmers in all countries and is available on all Unix-like operating systems (others may not be pre-defined and may need installation). You can set the ‘LC_NUMERIC’ only for a single command (the first one below: simply defining the variable in the same line), or all commands within the running session (the second command below, or “exporting” it to all subsequent commands): ## Change the numeric locale, only for this 'seq' command. $ LC_NUMERIC=C seq 0.5 1 ## Change the locale to the standard, for all commands after it. $ export LC_NUMERIC=C If you want to change it generally for all future sessions, you can put the second command in your shell’s startup file. For more on startup files, please see *note Installation directory::. 5 Data containers ***************** The most low-level and basic property of a dataset is how it is stored. To process, archive and transmit the data, you need a container to store it first. From the start of the computer age, different formats have been defined to store data, optimized for particular applications. One format/container can never be useful for all applications: the storage defines the application and vice-versa. In astronomy, the Flexible Image Transport System (FITS) standard has become the most common format of data storage and transmission. It has many useful features, for example, multiple sub-containers (also known as extensions or header data units, HDUs) within one file, or support for tables as well as images. Each HDU can store an independent dataset and its corresponding meta-data. Therefore, Gnuastro has one program (see *note Fits::) specifically designed to manipulate FITS HDUs and the meta-data (header keywords) in each HDU. Your astronomical research does not just involve data analysis (where the FITS format is very useful). for example, you want to demonstrate your raw and processed FITS images or spectra as figures within slides, reports, or papers. The FITS format is not defined for such applications. Thus, Gnuastro also comes with the ConvertType program (see *note ConvertType::) which can be used to convert a FITS image to and from (where possible) other formats like plain text and JPEG (which allow two way conversion), along with EPS and PDF (which can only be created from FITS, not the other way round). Finally, the FITS format is not just for images, it can also store tables. Binary tables in particular can be very efficient in storing catalogs that have more than a few tens of columns and rows. However, unlike images (where all elements/pixels have one data type), tables contain multiple columns and each column can have different properties: independent data types (see *note Numeric data types::) and meta-data. In practice, each column can be viewed as a separate container that is grouped with others in the table. The only shared property of the columns in a table is thus the number of elements they contain. To allow easy inspection/manipulation of table columns, Gnuastro has the Table program (see *note Table::). 
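
For a first taste of its command-line interface (the catalog name and column names below are hypothetical), the two calls below would respectively print two columns of a FITS table on the command-line, and save them into a plain-text table:

     ## Hypothetical file and column names; if the table is not in the
     ## first extension, also specify it with '--hdu'.
     $ asttable catalog.fits -cRA,DEC
     $ asttable catalog.fits -cRA,DEC --output=subset.txt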

The Table program can be used to select certain table columns in a FITS table and see them as human readable output on the command-line, or to save them into another plain text or FITS table.

5.1 Fits
========

The “Flexible Image Transport System”, or FITS, is by far the most common data container format in astronomy and has been in constant use since the 1970s. Archiving (future usage, simplicity) has been one of the primary design principles of this format. In the last few decades it has proved so useful and robust that the Vatican Library has also chosen FITS for its “long-term digital preservation” project(1).

Although the full name of the standard invokes the idea that it is only for images, it also contains complete and robust features for tables. It started off in the 1970s and was formally published as a standard in 1981. It was adopted by the International Astronomical Union (IAU) in 1982, and an IAU working group to maintain its future was defined in 1988. The FITS 2.0 and 3.0 standards were approved in 2000 and 2008 respectively, and the 4.0 draft has also been released recently; please see the FITS standard document web page (https://fits.gsfc.nasa.gov/fits_standard.html) for the full text of all versions. Also see the FITS 3.0 standard paper (https://doi.org/10.1051/0004-6361/201015362) for a nice introduction and history along with the full standard.

Many common image formats, for example, a JPEG, only have one image/dataset per file. However, one great advantage of the FITS standard is that it allows you to keep multiple datasets (images or tables along with their separate meta-data) in one file. In the FITS standard, each data + meta-data pair is known as an extension, or more formally a header data unit or HDU. The HDUs in a file can be completely independent: you can have multiple images of different dimensions/sizes or tables as separate extensions in one file. However, while the standard does not impose any constraints on the relation between the datasets, it is strongly encouraged to group data that are contextually related with each other in one file. For example, an image and the table/catalog of objects and their measured properties in that image. Other examples can be images of one patch of sky in different colors (filters), or one raw telescope image along with its calibration data (tables or images).

As discussed above, the extensions in a FITS file can be completely independent. To keep some information (meta-data) about the group of extensions in the FITS file, the community has adopted the following convention: put no data in the first extension, so it is just meta-data. This extension can thus be used to store meta-data regarding the whole file (grouping of extensions). Subsequent extensions may contain data along with their own separate meta-data. All of Gnuastro's programs also follow this convention: the main output dataset(s) are placed in the second (or later) extension(s). The first extension contains no data; the program's configuration (input file name, along with all its option values) is stored as its meta-data, see *note Output FITS files::.

The meta-data contain information about the data, for example, which region of the sky an image corresponds to, the units of the data, what telescope, camera, and filter the data were taken with, its observation date, or the software that produced it and its configuration. Without the meta-data, the raw dataset is practically just a collection of numbers and really hard to understand, or connect with the real world (other datasets).
It is thus strongly encouraged to supplement your data (at any level of processing) with as much meta-data about your processing/science as possible. The meta-data of a FITS file is in ASCII format, which can be easily viewed or edited with a text editor or on the command-line. Each meta-data element (known as a keyword generally) is composed of a name, value, units and comments (the last two are optional). for example, below you can see three FITS meta-data keywords for specifying the world coordinate system (WCS, or its location in the sky) of a dataset: LATPOLE = -27.805089 / [deg] Native latitude of celestial pole RADESYS = 'FK5' / Equatorial coordinate system EQUINOX = 2000.0 / [yr] Equinox of equatorial coordinates However, there are some limitations which discourage viewing/editing the keywords with text editors. for example, there is a fixed length of 80 characters for each keyword (its name, value, units and comments) and there are no new-line characters, so on a text editor all the keywords are seen in one line. Also, the meta-data keywords are immediately followed by the data which are commonly in binary format and will show up as strange looking characters on a text editor, and significantly slowing down the processor. Gnuastro’s Fits program was designed to allow easy manipulation of FITS extensions and meta-data keywords on the command-line while conforming fully with the FITS standard. for example, you can copy or cut (copy and remove) HDUs/extensions from one FITS file to another, or completely delete them. It also has features to delete, add, or edit meta-data keywords within one HDU. ---------- Footnotes ---------- (1) 5.1.1 Invoking Fits ------------------- Fits can print or manipulate the FITS file HDUs (extensions), meta-data keywords in a given HDU. The executable name is ‘astfits’ with the following general template $ astfits [OPTION...] ASTRdata One line examples: ## View general information about every extension: $ astfits image.fits ## Print the header keywords in the second HDU (counting from 0): $ astfits image.fits -h1 ## Only print header keywords that contain `NAXIS': $ astfits image.fits -h1 | grep NAXIS ## Only print the WCS standard PC matrix elements $ astfits image.fits -h1 | grep 'PC._.' ## Copy a HDU from input.fits to out.fits: $ astfits input.fits --copy=hdu-name --output=out.fits ## Update the OLDKEY keyword value to 153.034: $ astfits --update=OLDKEY,153.034,"Old keyword comment" ## Delete one COMMENT keyword and add a new one: $ astfits --delete=COMMENT --comment="Anything you like ;-)." ## Write two new keywords with different values and comments: $ astfits --write=MYKEY1,20.00,"An example keyword" --write=MYKEY2,fd ## Inspect individual pixel area taken based on its WCS (in degree^2). ## Then convert the area to arcsec^2 with the Arithmetic program. $ astfits input.fits --pixareaonwcs -o pixarea.fits $ astarithmetic pixarea.fits 3600 3600 x x -o pixarea_arcsec2.fits When no action is requested (and only a file name is given), Fits will print a list of information about the extension(s) in the file. This information includes the HDU number, HDU name (‘EXTNAME’ keyword), type of data (see *note Numeric data types::, and the number of data elements it contains (size along each dimension for images and table rows and columns). Optionally, a comment column is printed for special situations (like a 2D HEALPix grid that is usually stored as a 1D dataset/table). 
You can use this to get a general idea of the contents of the FITS file and what HDU to use for further processing, either with the Fits program or any other Gnuastro program.

Here is one example of information about a FITS file with four extensions: the first extension has no data, it is a purely meta-data HDU (commonly used to keep meta-data about the whole file, or grouping of extensions, see *note Fits::). The second extension is an image with name ‘IMAGE’ and single precision floating point type (‘float32’, see *note Numeric data types::); it has 4287 pixels along its first (horizontal) axis and 4286 pixels along its second (vertical) axis. The third extension is also an image with name ‘MASK’. It is in 2-byte integer format (‘int16’) which is commonly used to keep information about pixels (for example, to identify which ones were saturated, or which ones had cosmic rays and so on); note how it has the same size as the ‘IMAGE’ extension. The fourth extension is a binary table called ‘CATALOG’ which has 12371 rows and 5 columns (it probably contains information about the sources in the image).

     GNU Astronomy Utilities X.X
     Run on Day Month DD HH:MM:SS YYYY
     -----
     HDU (extension) information: `image.fits'.
      Column 1: Index (counting from 0).
      Column 2: Name (`EXTNAME' in FITS standard).
      Column 3: Image data type or `table' format (ASCII or binary).
      Column 4: Size of data in HDU.
     -----
     0      n/a             uint8           0
     1      IMAGE           float32         4287x4286
     2      MASK            int16           4287x4286
     3      CATALOG         table_binary    12371x5

If a specific HDU is identified on the command-line with the ‘--hdu’ (or ‘-h’) option and no operation is requested, then the full list of header keywords in that HDU will be printed (as if the ‘--printallkeys’ option was called, see below). It is important to remember that this only occurs when ‘--hdu’ is given on the command-line. The ‘--hdu’ value given in a configuration file will only be used when a specific operation on keywords is requested. Therefore, as described in the paragraphs above, when no explicit call to the ‘--hdu’ option is made on the command-line and no operation is requested (on the command-line or configuration files), the basic information of each HDU/extension is printed.

The operating mode and input/output options to Fits are similar to the other programs and fully described in *note Common options::. The options particular to Fits can be divided into three groups: 1) those related to modifying HDUs or extensions (see *note HDU information and manipulation::), 2) those related to viewing/modifying meta-data keywords (see *note Keyword inspection and manipulation::), and 3) those related to creating meta-images where each pixel shows values for a specific property of the image (see *note Pixel information images::). These three classes of options cannot be called together in one run: you can either work on the extensions, work on the meta-data keywords, or create meta-images where each pixel shows particular information about the image itself.

5.1.1.1 HDU information and manipulation
........................................

Each FITS file header data unit, or HDU (also known as an extension), is an independent dataset (data + meta-data). Multiple HDUs can be stored in one FITS file, see *note Fits::. The HDU-related options to the Fits program are listed below in two general classes: the first group focuses on HDU information while the latter focuses on manipulating (moving or deleting) the HDUs.

The options below print information about the given HDU on the command-line.
Thus they cannot be called together in one command (each has its own independent output). ‘-n’ ‘--numhdus’ Print the number of extensions/HDUs in the given file. Note that this option must be called alone and will only print a single number. It is thus useful in scripts, for example, when you need to do check the number of extensions in a FITS file. For a complete list of basic meta-data on the extensions in a FITS file, do not use any of the options in this section or in *note Keyword inspection and manipulation::. For more, see *note Invoking astfits::. ‘--hastablehdu’ Print ‘1’ (on standard output) if at least one table HDU (ASCII or binary) exists in the FITS file. Otherwise (when no table HDU exists in the file), print ‘0’. ‘--listtablehdus’ Print the names or numbers (when a name does not exist, counting from zero) of HDUs that contain a table (ASCII or Binary) on standard output, one per line. Otherwise (when no table HDU exists in the file) nothing will be printed. ‘--hasimagehdu’ Print ‘1’ (on standard output) if at least one image HDU exists in the FITS file. Otherwise (when no image HDU exists in the file), print ‘0’. In the FITS standard, any array with any dimensions is called an “image”, therefore this option includes 1, 3 and 4 dimensional arrays too. However, an image HDU with zero dimensions (which is usually the first extension and only contains metadata) is not counted here. ‘--listimagehdus’ Print the names or numbers (when a name does not exist, counting from zero) of HDUs that contain an image on standard output, one per line. Otherwise (when no image HDU exists in the file) nothing will be printed. In the FITS standard, any array with any dimensions is called an “image”, therefore this option includes 1, 3 and 4 dimensional arrays too. However, an image HDU with zero dimensions (which is usually the first extension and only contains metadata) is not counted here. ‘--listallhdus’ Print the names or numbers (when a name does not exist, counting from zero) of all HDUs within the input file on the standard output, one per line. ‘--pixelscale’ Print the HDU’s pixel-scale (change in world coordinate for one pixel along each dimension) and pixel area or voxel volume. Without the ‘--quiet’ option, the output of ‘--pixelscale’ has multiple lines and explanations, thus being more human-friendly. It prints the file/HDU name, number of dimensions, and the units along with the actual pixel scales. Also, when any of the units are in degrees, the pixel scales and area/volume are also printed in units of arc-seconds. For 3D datasets, the pixel area (on each 2D slice of the 3D cube) is printed as well as the voxel volume. If you only want the pixel area of a 2D image in units of arcsec$^2$ you can use ‘--pixelareaarcsec2’ described below. However, in scripts (that are to be run automatically), this human-friendly format is annoying, so when called with the ‘--quiet’ option, only the pixel-scale value(s) along each dimension is(are) printed in one line. These numbers are followed by the pixel area (in the raw WCS units). For 3D datasets, this will be area on each 2D slice. Finally, for 3D datasets, a final number (the voxel volume) is printed. As a summary, in ‘--quiet’ mode, for 2D datasets three numbers are printed and for 3D datasets, 5 numbers are printed. If the dataset has more than 3 dimensions, only the pixel-scale values are printed (no area or volume will be printed). ‘--pixelareaarcsec2’ Print the HDU’s pixel area in units of arcsec$^2$. 
This option only works on 2D images that have WCS coordinates in units of degrees. For lower-level information about the pixel scale in each dimension, see ‘--pixelscale’ (described above).

‘--skycoverage’
     Print the rectangular area (or 3D cube) covered by the given image/datacube HDU over the sky in the WCS units. The covered area is reported in two ways: 1) the center and full width in each dimension, 2) the minimum and maximum sky coordinates in each dimension. This option is thus useful when you want to get a general feeling of a new image/dataset, or prepare the inputs to query external databases in the region of the image (for example, with *note Query::).

     If run without the ‘--quiet’ option, the values are given with a human-friendly description. For example, here is the output of this option on an image taken near the star Castor:

          $ astfits castor.fits --skycoverage
          Input file: castor.fits (hdu: 1)

          Sky coverage by center and (full) width:
            Center: 113.9149075    31.93759664
            Width:  2.41762045     2.67945253

          Sky coverage by range along dimensions:
            RA      112.7235592    115.1411797
            DEC     30.59262123    33.27207376

     With the ‘--quiet’ option, the values are more machine-friendly (easy to parse). It has two lines, where the first line contains the center/width values and the second line shows the coordinate ranges in each dimension.

          $ astfits castor.fits --skycoverage --quiet
          113.9149075    31.93759664    2.41762045     2.67945253
          112.7235592    115.1411797    30.59262123    33.27207376

     Note that this is a simple rectangle (cube in 3D) definition, so if the image is rotated in relation to the celestial coordinates, a general polygon is necessary to exactly describe the coverage. Hence when there is rotation, the reported area will be larger than the actual area containing data; you can visually see the area with the ‘--pixelareaonwcs’ option of *note Fits::.

     Currently this option only supports images that are less than 180 degrees in width (which is usually the case!). This requirement has been necessary to account for images that cross the RA=0 hour circle on the sky. Please get in touch with us if you have an image that is wider than 180 degrees, so we can try to find a solution based on need.

‘--datasum’
     Calculate and print the given HDU's "datasum" to stdout. The given HDU is specified with the ‘--hdu’ (or ‘-h’) option. This number is calculated by parsing all the bytes of the given HDU's data records (excluding keywords). This option ignores any possibly existing ‘DATASUM’ keyword in the HDU. For more on ‘DATASUM’ in the FITS standard, see *note Keyword inspection and manipulation:: (under the ‘checksum’ component of ‘--write’). You can use this option to confirm that the data in two different HDUs (possibly with different keywords) is identical. Its advantage over ‘--write=datasum’ (which writes the ‘DATASUM’ keyword into the given HDU) is that it does not require write permissions.

The following options manipulate (move/delete) the HDUs within one FITS file or to another FITS file. These options may be called multiple times in one run. If so, the extensions will be copied from the input FITS file to the output FITS file in the given order (on the command-line and also in configuration files, see *note Configuration file precedence::). If the separate classes are called together in one run of Fits, then first ‘--copy’ is run (on all specified HDUs), followed by ‘--cut’ (again on all specified HDUs), and then ‘--remove’ (on all specified HDUs).
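
For example (the file and HDU names below are hypothetical), the call below copies the ‘SCI’ extension of ‘in.fits’ into ‘out.fits’ and removes the ‘MASK’ extension from ‘in.fits’ in the same run; following the ordering above, the copy is done before the removal:

     $ astfits in.fits --copy=SCI --remove=MASK --output=out.fits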

The ‘--copy’ and ‘--cut’ options need an output FITS file (specified with the ‘--output’ option). If the output file exists, then the specified HDU will be copied following the last extension of the output file (the existing HDUs in it will be untouched). Thus, after Fits finishes, the copied HDU will be the last HDU of the output file. If no output file name is given, then automatic output will be used to store the HDUs given to this option (see *note Automatic output::).

‘-C STR’
‘--copy=STR’
     Copy the specified extension into the output file, see explanations above.

‘-k STR’
‘--cut=STR’
     Cut (copy to output, remove from input) the specified extension into the output file, see explanations above.

‘-R STR’
‘--remove=STR’
     Remove the specified HDU from the input file. The first (zero-th) HDU cannot be removed with this option. Consider using ‘--copy’ or ‘--cut’ in combination with ‘--primaryimghdu’ to not have an empty zero-th HDU. From CFITSIO: “In the case of deleting the primary array (the first HDU in the file) then [it] will be replaced by a null primary array containing the minimum set of required keywords and no data.”. So in practice, any existing data (array) and meta-data in the first extension will be removed, but the number of extensions in the file will not change. This is because of the unique position the first FITS extension has in the FITS standard (for example, it cannot be used to store tables).

‘--primaryimghdu’
     Copy or cut an image HDU to the zero-th HDU/extension of a file that does not yet exist. This option is thus irrelevant if the output file already exists or the copied/cut extension is a FITS table. For example, with the commands below, first we make sure that ‘out.fits’ does not exist, then we copy the first extension of ‘in.fits’ to the zero-th extension of ‘out.fits’.

          $ rm -f out.fits
          $ astfits in.fits --copy=1 --primaryimghdu --output=out.fits

     If we had not used ‘--primaryimghdu’, then the zero-th extension of ‘out.fits’ would have no data, and its second extension would host the copied image (just like any other output of Gnuastro).

5.1.1.2 Keyword inspection and manipulation
...........................................

The meta-data in each header data unit, or HDU (also known as extension, see *note Fits::) is stored as “keyword”s. Each keyword consists of a name, value, unit, and comments. The Fits program (see *note Fits::) options related to viewing and manipulating keywords in a FITS HDU are described below.

First, let's review the ‘--keyvalue’ option which should be called separately from the rest of the options described in this section. Also, unlike the rest of the options in this section, with ‘--keyvalue’, you can give more than one input file.

‘-l STR[,STR[,...]]’
‘--keyvalue=STR[,STR[,...]]’
     Only print the value of the requested keyword(s): the ‘STR’s. ‘--keyvalue’ can be called multiple times, and each call can contain multiple comma-separated keywords. If more than one file is given, this option uses the same HDU/extension for all of them (value to ‘--hdu’). For example, you can get the number of dimensions of the three FITS files in the running directory, as well as the length along each dimension, with this command:

          $ astfits *.fits --keyvalue=NAXIS,NAXIS1 --keyvalue=NAXIS2
          image-a.fits 2      774    672
          image-b.fits 2      774    672
          image-c.fits 2      387    336

     If only one input is given, and the ‘--quiet’ option is activated, the file name is not printed on the first column, only the values of the requested keywords.
$ astfits image-a.fits --keyvalue=NAXIS,NAXIS1 \ --keyvalue=NAXIS2 --quiet 2 774 672 The output is internally stored (and finally printed) as a table (with one column per keyword). Therefore just like the Table program, you can use ‘--colinfoinstdout’ to print the metadata like the example below (also see *note Invoking asttable::). The keyword metadata (comments and units) are extracted from the comments and units of the keyword in the input files (first file that has a comment or unit). Hence if the keyword does not have units or comments in any of the input files, they will be empty. For more on Gnuastro’s plain-text metadata format, see *note Gnuastro text table format::. $ astfits *.fits --keyvalue=NAXIS,NAXIS1,NAXIS2 \ --colinfoinstdout # Column 1: FILENAME [name,str10,] Name of input file. # Column 2: NAXIS [ ,u8 ,] number of data axes # Column 3: NAXIS1 [ ,u16 ,] length of data axis 1 # Column 4: NAXIS2 [ ,u16 ,] length of data axis 2 image-a.fits 2 774 672 image-b.fits 2 774 672 image-c.fits 2 387 336 Another advantage of a table output is that you can directly write the table to a file. for example, if you add ‘--output=fileinfo.fits’, the information above will be printed into a FITS table. You can also pipe it into *note Table:: to select files based on certain properties, to sort them based on another property, or any other operation that can be done with Table (including *note Column arithmetic::). for example, with the command below, you can select all the files that have a size larger than 500 pixels in both dimensions. $ astfits *.fits --keyvalue=NAXIS,NAXIS1,NAXIS2 \ --colinfoinstdout \ | asttable --range=NAXIS1,500,inf \ --range=NAXIS2,500,inf -cFILENAME image-a.fits image-b.fits Note that ‘--colinfoinstdout’ is necessary to use column names when piping to other programs (like ‘asttable’ above). Also, with the ‘-cFILENAME’ option, we are asking Table to only print the final file names (we do not need the sizes any more). The commands with multiple files above used ‘*.fits’, which is only useful when all your FITS files are in the same directory. However, in many cases, your FITS files will be scattered in multiple sub-directories of a certain top-level directory, or you may only want those with more particular file name patterns. A more powerful way to list the input files to ‘--keyvalue’ is to use the ‘find’ program in Unix-like operating systems. For example, with the command below you can search all the FITS files in all the sub-directories of ‘/TOP/DIR’. astfits $(find /TOP/DIR/ -name "*.fits") --keyvalue=NAXIS2 ‘-O’ ‘--colinfoinstdout’ Print column information (or metadata) above the column values when writing keyword values to standard output with ‘--keyvalue’. You can read this option as column-information-in-standard-output. Below we will discuss the options that can be used to manipulate keywords. To see the full list of keywords in a FITS HDU, you can use the ‘--printallkeys’ option. If any of the keyword modification options below are requested (for example, ‘--update’), the headers of the input file/HDU will be changed first, then printed. Keyword modification is done within the input file. Therefore, if you want to keep the original FITS file or HDU intact, it is easiest to create a copy of the file/HDU first and then run Fits on that (for copying a HDU to another file, see *note HDU information and manipulation::. In the FITS standard, keywords are always uppercase. So case does not matter in the input or output keyword names you specify. 
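
As a sketch of that advice (the file, HDU and keyword names here are hypothetical), you can first copy the HDU you want to edit into a new file and then modify keywords only in the copy, keeping the original untouched:

     ## Copy the second HDU of 'orig.fits' into 'work.fits' (which will
     ## also get an empty zero-th HDU), then update a keyword only there.
     $ astfits orig.fits --copy=1 --output=work.fits
     $ astfits work.fits -h1 --update=EXPTIME,1200,"Exposure time in seconds"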

*‘CHECKSUM’ automatically updated, when present:* the keyword modification options will change the contents of the HDU. Therefore, if a ‘CHECKSUM’ is present in the HDU, after all the keyword modification options have been completed, Fits will also update ‘CHECKSUM’ before closing the file.

Most of the options can accept multiple instances in one command. For example, you can specify multiple keywords to delete by calling ‘--delete’ multiple times; since repeated keywords are allowed in the FITS standard, you can even delete the same keyword multiple times. The action of such options will start from the top-most keyword.

The precedence of operations is described below. Note that while the order within each class of actions is preserved, the order between the different classes is not: irrespective of the order in which you called ‘--delete’ and ‘--update’, first all the delete operations will take effect, then the update operations.

  1. ‘--delete’
  2. ‘--rename’
  3. ‘--update’
  4. ‘--write’
  5. ‘--asis’
  6. ‘--history’
  7. ‘--comment’
  8. ‘--date’
  9. ‘--printallkeys’
  10. ‘--verify’
  11. ‘--copykeys’

All possible syntax errors will be reported before the keywords are actually written. FITS errors during any of these actions will be reported, but Fits will not stop until all the operations are complete. If ‘--quitonerror’ is called, then Fits will immediately stop upon the first error.

If you want to inspect only a certain set of header keywords, it is easiest to pipe the output of the Fits program to GNU Grep. Grep is a very powerful and advanced tool to search strings which is precisely made for such situations. For example, if you only want to check the size of an image FITS HDU, you can run:

     $ astfits input.fits | grep NAXIS

*FITS STANDARD KEYWORDS:* Some header keywords are necessary for later operations on a FITS file, for example, BITPIX or NAXIS; see the FITS standard for their full list. If you modify (for example, remove or rename) such keywords, the FITS file extension might not be usable any more. Also be careful with the world coordinate system keywords: if you modify or change their values, any future world coordinate system (like RA and Dec) measurements on the image will also change.

The keyword-related options to the Fits program are fully described below.

‘-d STR’
‘--delete=STR’
     Delete one instance of the ‘STR’ keyword from the FITS header. Multiple instances of ‘--delete’ can be given (possibly even for the same keyword, when it is repeated in the meta-data). All keywords given will be removed from the headers in the same given order. If the keyword does not exist, Fits will give a warning and return with a non-zero value, but will not stop. To stop as soon as an error occurs, run with ‘--quitonerror’.

‘-r STR,STR’
‘--rename=STR,STR’
     Rename a keyword to a new value (for example, ‘--rename=OLDNAME,NEWNAME’). ‘STR’ contains both the existing and new names, which should be separated by either a comma (<,>) or a space character. Note that if you use a space character, you have to put the value to this option within double quotation marks (<">) so the space character is not interpreted as an option separator. Multiple instances of ‘--rename’ can be given in one command. The keywords will be renamed in the specified order. If the keyword does not exist, Fits will give a warning and return with a non-zero value, but will not stop. To stop as soon as an error occurs, run with ‘--quitonerror’.

‘-u STR’
‘--update=STR’
     Update a keyword, its value, its comments and its units in the format described below.

     If there are multiple instances of the keyword in the header, they will be changed from top to bottom (with multiple ‘--update’ options). The format of the values to this option can best be specified with an example:

          --update=KEYWORD,value,"comments for this keyword",unit

     If there is a writing error, Fits will give a warning and return with a non-zero value, but will not stop. To stop as soon as an error occurs, run with ‘--quitonerror’.

     The value can be any numerical or string value(1). Other than the ‘KEYWORD’, all the other values are optional. To leave a given token empty, follow the preceding comma (<,>) immediately with the next. If any space character is present around the commas, it will be considered part of the respective token. So if more than one token has space characters within it, the safest method to specify a value to this option is to put double quotation marks around each individual token that needs it. Note that without double quotation marks, space characters will be seen as option separators and can lead to undefined behavior.

‘-w STR’
‘--write=STR’
     Write a keyword to the header. For the possible value input formats, comments and units for the keyword, see the ‘--update’ option above. The special names (first string) below will cause a special behavior:

     ‘/’
          Write a “title” to the list of keywords. A title consists of one blank line and another which is blank for several spaces and starts with a slash (‘/’). The second string given to this option is the “title” or string printed after the slash. For example, with the command below you can add a “title” of ‘My keywords’ after the existing keywords and add the subsequent ‘K1’ and ‘K2’ keywords under it (note that keyword names are not case sensitive).

               $ astfits test.fits -h1 --write=/,"My keywords" \
                         --write=k1,1.23,"My first keyword"    \
                         --write=k2,4.56,"My second keyword"
               $ astfits test.fits -h1
               [[[ ... truncated ... ]]]

                                     / My keywords
               K1      =                 1.23 / My first keyword
               K2      =                 4.56 / My second keyword
               END

          Adding a “title” before each contextually separate group of header keywords greatly helps in readability and visual inspection of the keywords. So generally, when you want to add new FITS keywords, it is good practice to also add a title before them.

          The reason you need to use ‘/’ as the keyword name for setting a title is that ‘/’ is the first non-white character. The title(s) is (are) written into the FITS file in the same order that ‘--write’ is called. Therefore in one run of the Fits program, you can specify many different titles (with their own keywords under them). For example, the command below builds on the previous example and adds another group of keywords named ‘A1’ and ‘A2’.

               $ astfits test.fits -h1 --write=/,"My keywords"             \
                         --write=k1,1.23,"My first keyword"                \
                         --write=k2,4.56,"My second keyword"               \
                         --write=/,"My second group of keywords"           \
                         --write=a1,7.89,"First keyword"                   \
                         --write=a2,0.12,"Second keyword"

     ‘checksum’
          When nothing is given afterwards, the header integrity keywords ‘DATASUM’ and ‘CHECKSUM’ will be calculated and written/updated. The calculation and writing is done fully by CFITSIO, therefore they comply with the FITS standard 4.0(2) that defines these keywords (its Appendix J).

          If a value is given (e.g., ‘--write=checksum,MyOwnCheckSum’), then CFITSIO will not be called to calculate these two keywords and the value (as well as possible comment and unit) will be written just like any other keyword. This is generally not recommended since ‘CHECKSUM’ is a reserved FITS standard keyword.
If you want to calculate the checksum with another hashing standard manually and write it into the header, it is recommended to use another keyword name. In the FITS standard, ‘CHECKSUM’ depends on the HDU’s data _and_ header keywords; it will therefore not be valid if you make any further changes to the header after writing the ‘CHECKSUM’ keyword. This includes any further keyword modification options in the same call to the Fits program. However, ‘DATASUM’ only depends on the data section of the HDU/extension, so it is not changed when you add, remove or update the header keywords. Therefore, it is recommended to write these keywords as the last keywords that are written/modified in the extension. You can use the ‘--verify’ option (described below) to verify the values of these two keywords. ‘datasum’ Similar to ‘checksum’, but only write the ‘DATASUM’ keyword (that does not depend on the header keywords, only the data). ‘-a STR’ ‘--asis=STR’ Write the given ‘STR’ _exactly_ as it is, into the given FITS file header with no modifications. If the contents of ‘STR’ do not conform to the FITS standard for keywords, then it may (most probably: it will!) corrupt your file and you may not be able to open it any more. So please be *very careful* with this option (it is your responsibility to make sure that the string conforms with the FITS standard for keywords). If you want to define the keyword from scratch, it is best to use the ‘--write’ option (described above) and let CFITSIO worry about complying with the FITS standard. Also, if you want to copy keywords from one FITS file to another, you can use ‘--copykeys’ that is described below. Through these high-level options, you do not have to worry about low-level issues. One common usage of ‘--asis’ occurs when you are given the contents of a FITS header (many keywords) as a plain-text file (so the format of each keyword line conforms with the FITS standard, just the file is plain-text, and you have one keyword per line when you open it in a plain-text editor). In that case, Gnuastro’s Fits program won’t be able to parse it (such a plain-text file does not conform to the FITS standard, which has no new-line characters!). With the command below, you can insert those headers in ‘headers.txt’ into ‘img.fits’ (its HDU number 1, the default; you can change the HDU to modify with ‘--hdu’). $ cat headers.txt \ | while read line; do \ astfits img.fits --asis="$line"; \ done *Don’t forget a title:* Since the newly added headers in the example above weren’t originally in the file, they are probably some form of high-level metadata. The raw example above will just append the new keywords after the last one, making them hard to read for a human (it is not clear what this new group of keywords signifies, where it starts, and where it ends). To help the human readability of the header, add a title for this group of keywords before writing them. To do that, run the following command before the ‘cat ...’ command above (replace ‘Imported keys’ with any title that best describes this group of new keywords based on their context): $ astfits img.fits --write=/,"Imported keys" ‘-H STR’ ‘--history STR’ Add a ‘HISTORY’ keyword to the header with the given value. A new ‘HISTORY’ keyword will be created for every instance of this option. If the string given to this option is longer than 70 characters, it will be separated into multiple keyword cards. If there is an error, Fits will give a warning and return with a non-zero value, but will not stop.
To stop as soon as an error occurs, run with ‘--quitonerror’. ‘-c STR’ ‘--comment STR’ Add a ‘COMMENT’ keyword to the header with the given value. Similar to the explanation for ‘--history’ above. ‘-t’ ‘--date’ Put the current date and time in the header. If the ‘DATE’ keyword already exists in the header, it will be updated. If there is a writing error, Fits will give a warning and return with a non-zero value, but will not stop. To stop as soon as an error occurs, run with ‘--quitonerror’. ‘-p’ ‘--printallkeys’ Print the full meta data (keywords, values, units and comments) in the specified FITS extension (HDU). If this option is called along with any of the other keyword editing commands described above, all the other editing commands take precedence over it. Therefore, it will print the final keywords after all the editing has been done. ‘--printkeynames’ Print only the keyword names of the specified FITS extension (HDU), one line per name. This option must be called alone. ‘-v’ ‘--verify’ Verify the ‘DATASUM’ and ‘CHECKSUM’ data integrity keywords of the FITS standard. See the description under ‘checksum’ (under ‘--write’, above) for more on these keywords. This option will print ‘Verified’ for both keywords if they can be verified. Otherwise, if they do not exist in the given HDU/extension, it will print ‘NOT-PRESENT’, and if they cannot be verified it will print ‘INCORRECT’. In the latter case (when the keyword values exist but cannot be verified), the Fits program will also return with a failure. By default this function will also print a short description of the ‘DATASUM’ and ‘CHECKSUM’ keywords. You can suppress this extra information with the ‘--quiet’ option. ‘--copykeys=INT:INT/STR,STR[,STR]’ Copy the desired set of the input’s keyword records to the output (specified with the ‘--output’ and ‘--outhdu’ options for the filename and HDU/extension respectively). The keywords to copy can be given either as a range (in the format of ‘INT:INT’, inclusive) or as a list of keyword names given as comma-separated strings (‘STR,STR’); the list can have any number of keyword names. More details and examples of the two forms are given below: Range The given string to this option must be two integers separated by a colon (<:>). The first integer must be positive (counting of the keyword records starts from 1). The second integer may be negative (zero is not acceptable) or an integer larger than the first. A negative second integer means counting from the end. So ‘-1’ is the last copy-able keyword (not including the ‘END’ keyword). To see the header keywords of the input with a number before them, you can pipe the output of the Fits program (when it prints all the keywords in an extension) into the ‘cat’ program like below: $ astfits input.fits -h1 | cat -n List of names The given string to this option must be a comma separated list of keyword names. For example, see the command below: $ astfits input.fits -h1 --copykeys=KEY1,KEY2 \ --output=output.fits --outhdu=1 Please consider the notes below when copying keywords with names: • If the number of characters in the name is more than 8, CFITSIO will place a ‘HIERARCH’ before it. In this case simply give the name and do not give the ‘HIERARCH’ (which is a constant and not considered part of the keyword name). • If your keyword name is composed only of digits, do not give it as the first name given to ‘--copykeys’. Otherwise, it will be confused with the range format above.
You can safely give an only-digit keyword name as the second or third requested keyword. • If the keyword is repeated more than once in the header, currently only the first instance will be copied. In other words, even if you call ‘--copykeys’ multiple times with the same keyword name, only its first instance will be copied. If you need to copy multiple instances of the same keyword, please get in touch with us at ‘bug-gnuastro@gnu.org’. ‘--outhdu’ The HDU/extension to write the output keywords of ‘--copykeys’. ‘-Q’ ‘--quitonerror’ Quit if any of the operations above are not successful. By default, if an error occurs, Fits will warn the user of the faulty keyword and continue with the rest of the actions. ‘-s STR’ ‘--datetosec STR’ Interpret the value of the given keyword in the FITS date format (most generally: ‘YYYY-MM-DDThh:mm:ss.ddd...’) and return the corresponding Unix epoch time (number of seconds that have passed since 00:00:00 Thursday, January 1st, 1970). The ‘Thh:mm:ss.ddd...’ section (specifying the time of day), and also the ‘.ddd...’ (specifying the fraction of a second) are optional. The value to this option must be the FITS keyword name that contains the requested date, for example, ‘--datetosec=DATE-OBS’. This option can also interpret the older FITS date format (‘DD/MM/YYThh:mm:ss.ddd...’) where only two characters are given to the year. In this case (following the GNU C Library), this option will make the following assumption: values 69 to 99 correspond to the years 1969 to 1999, and values 0 to 68 to the years 2000 to 2068. This is a very useful option for operations on the FITS date values, for example, sorting FITS files by their dates (see the short sketch after the ‘--wcscoordsys’ description below), or finding the time difference between two FITS files. The advantage of working with the Unix epoch time is that you do not have to worry about calendar details (for example, the number of days in different months, or leap years). ‘--wcscoordsys=STR’ Convert the coordinate system of the image’s world coordinate system (WCS) to the given coordinate system (‘STR’) and write it into the file given to ‘--output’ (or an automatically named file if no ‘--output’ has been given). For example, with the command below, ‘img-eq.fits’ will have an identical dataset (pixel values) as ‘image.fits’. However, the WCS coordinate system of ‘img-eq.fits’ will be the equatorial coordinate system in the Julian calendar epoch 2000 (which is the most common epoch used today). Fits will automatically extract the current coordinate system of ‘image.fits’ and, as long as it is one of the recognized coordinate systems listed below, it will do the conversion. $ astfits image.fits --wcscoordsys=eq-j2000 \ --output=img-eq.fits The currently recognized coordinate systems are listed below (the most common one today is ‘eq-j2000’): ‘eq-j2000’ 2000.0 (Julian-year) equatorial coordinates. ‘eq-b1950’ 1950.0 (Besselian-year) equatorial coordinates. ‘ec-j2000’ 2000.0 (Julian-year) ecliptic coordinates. ‘ec-b1950’ 1950.0 (Besselian-year) ecliptic coordinates. ‘galactic’ Galactic coordinates. ‘supergalactic’ Supergalactic coordinates. The Equatorial and Ecliptic coordinate systems are defined by the mean equator and equinox epoch: either the Besselian year 1950.0, or the Julian year 2000. For more on their difference and links for further reading about epochs in astronomy, please see the description in Wikipedia (https://en.wikipedia.org/wiki/Epoch_(astronomy)).
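To make the ‘--datetosec’ description above more concrete, here is a minimal sketch of sorting FITS files by their observation date on the command-line. The file names are hypothetical, and it assumes every file has a ‘DATE-OBS’ keyword in its first HDU and that ‘--quiet’ leaves only the requested number on the standard output:

## Print each file's observation date in Unix seconds followed by
## its name, then sort the lines numerically (oldest first).
$ for f in *.fits; do \
    echo "$(astfits $f -h1 --datetosec=DATE-OBS --quiet) $f"; \
  done | sort -n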
‘--wcsdistortion=STR’ If the input has a WCS distortion, the output (file given with the ‘--output’ option) will have the distortion given to this option (for example, ‘SIP’, ‘TPV’). The output will be a new file (with a copy of the image, and the new WCS), so if it already exists, the file will be deleted (unless you use the ‘--dontdelete’ option, see *note Input output options::). With this option, the Fits program will read the minimal set of keywords from the input HDU and the HDU data; it will then write them into the file given to the ‘--output’ option, but with a newly created set of WCS-related keywords corresponding to the desired distortion standard. If no ‘--output’ file is specified, an automatically generated output name will be used which is composed of the input’s name but with the ‘-DDD.fits’ suffix, where ‘DDD’ is the value given to this option (the desired output distortion); see *note Automatic output::. Note that not all possible conversions between all standards are supported yet. If the requested conversion is not supported, an informative error message will be printed. If this happens, please let us know and we will try our best to add the respective conversions. For example, with the command below, you can be sure that if ‘in.fits’ has a distortion in its WCS, the distortion of ‘out.fits’ will be in the SIP standard. $ astfits in.fits --wcsdistortion=SIP --output=out.fits ---------- Footnotes ---------- (1) Some tricky situations arise with values like ‘‘87095e5’’: if this was intended to be a number, it will be kept in the header as ‘8709500000’ and there is no problem. But this can also be a shortened Git commit hash. In the latter case, it should be treated as a string and stored as it is written. Commit hashes are very important in keeping the history of a file during your research and such values might arise without you noticing them in your reproduction pipeline. One solution is to use ‘git describe’ instead of the short hash alone. A less recommended solution is to add a space after the commit hash and Fits will write the value as ‘‘87095e5 ’’ in the header. If you later compare the strings on the shell, the space character will be ignored by the shell in the latter solution and there will be no problem. (2) 5.1.1.3 Pixel information images ................................ In *note Keyword inspection and manipulation::, options like ‘--pixelscale’ were introduced for getting information about the pixels from the keywords. But that only provides a single value for all the pixels! This will not be sufficient in some scenarios; for example, due to distortion, different regions of the image will have different pixel areas when projected onto the sky. The options in this section provide such “meta” images: images where the pixel values are information about the pixel itself. Such images can be useful in understanding the underlying pixel grid with the same tools that you use to study the astronomical objects within the image (like *note SAO DS9::). After all, nothing beats visual inspection with tools you are familiar with. ‘--pixareaonwcs’ Create a meta-image where each pixel’s value shows its area in the WCS units (usually degrees squared). The output is therefore the same size as the input. This option uses the same “pixel mixing” or “area resampling” concept that is described in *note Resampling:: (as part of the Warp program). Similar to Warp, its sampling can be tuned with the ‘--edgesampling’ option that is described below.
One scenario where this option becomes handy is when you are debugging aligned images using the Warp program (see *note Warp::). You may observe gradients after warping and can check whether they are caused by the distortion of the instrument or not. Such gradients can happen due to distortions, because the detector's pixels are measuring photons from different areas on the sky (or because of the type of projection you're seeing). This effect is more pronounced in images covering larger portions of the sky, for instance, the TESS images(1). Here is an example usage of the ‘--pixareaonwcs’ option: # Check the area each 'input.fits' pixel takes on the sky $ astfits input.fits -h1 --pixareaonwcs -o pixarea.fits # Convert each pixel's area to arcsec^2 $ astarithmetic pixarea.fits 3600 3600 x x \ --output=pixarea_arcsec2.fits # Compare area relative to the actual reported pixel scale $ pixarea=$(astfits input.fits --pixelscale -q \ | awk '{print $3}') $ astarithmetic pixarea.fits $pixarea / -o pixarea_rel.fits ‘--edgesampling=INT’ Extra sampling along the pixel edges for ‘--pixareaonwcs’. The default value is 0, meaning that only the pixel vertices are used. Values greater than zero improve the accuracy at the expense of greater time and memory consumption. With that said, the default value of zero usually has a good precision unless the given image has extreme distortions that produce irregular pixel shapes. For more, see *note Align pixels with WCS considering distortions::. *Caution:* This option does not “oversample” the output image! Rather, it makes Warp use more points to calculate the _input_ pixel area. To oversample the output image, set a reasonable ‘--cdelt’ value. ---------- Footnotes ---------- (1) 5.2 ConvertType =============== The FITS format used in astronomy was defined mainly for archiving, transmission, and processing. In other situations, the data might be useful in other formats. For example, when you are writing a paper or report, or if you are making slides for a talk, you cannot use a FITS image. Other image formats should be used. In other cases you might want your pixel values in a table format as plain text for input to other programs that do not recognize FITS. ConvertType is created for such situations. The various recognized types will increase with future updates and based on need. The conversion is not only one way (from FITS to other formats), but two ways (except the EPS and PDF formats(1)). So you can also convert a JPEG image or text file into a FITS image. Basically, other than EPS/PDF, you can use any of the recognized formats as different color channel inputs to get any of the recognized outputs. Before explaining the options and arguments (in *note Invoking astconvertt::), we will start with a short discussion on the difference between raster and vector graphics in *note Raster and Vector graphics::. In ConvertType, vector graphics are used to add markers over your originally rasterized data, producing high quality images, ready to be used in your exciting papers. We will continue with a description of the recognized file types in *note Recognized file formats::, followed by a short introduction to digital color in *note Color::. A tutorial on how to add markers over an image is then given in *note Marking objects for publication:: and we conclude with a LaTeX based solution to add coordinates over an image.
---------- Footnotes ---------- (1) Because EPS and PDF are vector, not raster/pixelated formats 5.2.1 Raster and Vector graphics -------------------------------- Images that are produced by hardware (for example, the camera in your phone, or the camera connected to a telescope) provide pixelated data. Such data are therefore stored in a Raster graphics (https://en.wikipedia.org/wiki/Raster_graphics) format which has discrete, independent, equally spaced data elements. For example, this is the format used by FITS (see *note Fits::), JPEG, TIFF, PNG and other image formats. On the other hand, when something is generated by the computer (for example, a diagram, plot or even adding a cross over a camera image to highlight something there), there is no “observation” or connection with nature! Everything is abstract! For such things, it is much easier to draw a mathematical line (with infinite resolution). Therefore, no matter how much you zoom-in, it will never get pixelated. This is the realm of Vector graphics (https://en.wikipedia.org/wiki/Vector_graphics). If you open the Gnuastro manual in PDF format (https://www.gnu.org/software/gnuastro/manual/gnuastro.pdf), you can see such graphics, for example, in *note Circles and the complex plane:: or *note Distance on a 2D curved space::. The most common vector graphics formats are PDF for document sharing and SVG for web-based applications. The pixels of a raster image can be shown as vector-based squares with different shades, so vector graphics can generally also support raster graphics. This is very useful when you want to add some graphics over an image to help your discussion (for example a $+$ over your object of interest). However, vector graphics are not optimized for rasterized data (which are usually also noisy!), and can either not display nicely, or result in a much larger file volume (in bytes). Therefore, if it is not necessary to add any marks over a FITS image, for example, it may be better to store it in a rasterized format. The distinction between the vector and raster graphics is also the primary theme behind Gnuastro’s logo, see *note Logo of Gnuastro::. 5.2.2 Recognized file formats ----------------------------- The various standards and the file name extensions recognized by ConvertType are listed below. For a review on the difference between Raster and Vector graphics, see *note Raster and Vector graphics::. For a review on the concept of color and channels, see *note Color::. Currently, except for the FITS format, Gnuastro uses the file name’s suffix to identify the format, so if the file’s name does not end with one of the suffixes mentioned below, it will not be recognized. FITS or IMH Astronomical data are commonly stored in the FITS format (or the older IRAF ‘.imh’ data format); a list of file name suffixes which indicate that the file is in this format is given in *note Arguments::. FITS is a raster graphics format. Each image extension of a FITS file only has one value per pixel/element. Therefore, when used as input, each input FITS image contributes as one color channel. If you want multiple extensions in one FITS file for different color channels, you have to repeat the file name multiple times and use the ‘--hdu’, ‘--hdu2’, ‘--hdu3’ or ‘--hdu4’ options to specify the different extensions. JPEG The JPEG standard was created by the Joint Photographic Experts Group. It is currently one of the most commonly used image formats.
Its major advantage is the compression algorithm that is defined by the standard. Like the FITS standard, this is a raster graphics format, which means that it is pixelated. A JPEG file can have 1 (for gray-scale), 3 (for RGB) and 4 (for CMYK) color channels. If you only want to convert one JPEG image into other formats, there is no problem; however, if you want to use it in combination with other input files, make sure that the final number of color channels does not exceed four. If it does, then ConvertType will abort and notify you. The file name endings that are recognized as a JPEG file for input are: ‘.jpg’, ‘.JPG’, ‘.jpeg’, ‘.JPEG’, ‘.jpe’, ‘.jif’, ‘.jfif’ and ‘.jfi’. TIFF TIFF (or Tagged Image File Format) was originally designed as a common format for scanners in the early 90s and since then it has grown to become very general. In many aspects, the TIFF standard is similar to the FITS image standard: it can allow data of many types (see *note Numeric data types::), and also allows multiple images to be stored in a single file (like a FITS extension: each image in the file is called a ‘directory’ in the TIFF standard). However, unlike FITS, it can only store images; it has no constructs for tables. Also unlike FITS, each ‘directory’ of a TIFF file can have a multi-channel (e.g., RGB) image. Another (inconvenient) difference with the FITS standard is that keyword names are stored as numbers, not human-readable text. However, outside of astronomy, because of its support of different numeric data types, many fields use TIFF images for accurate (for example, 16-bit integer or floating point) imaging data. Currently ConvertType can only read TIFF images; if you are interested in writing TIFF images, please get in touch with us. EPS The Encapsulated PostScript (EPS) format is essentially a one-page PostScript file which has a specified size. PostScript is used to store a full document like this whole Gnuastro book. PostScript therefore also includes non-image data, for example, lines and texts. It is a fully functional programming language to describe a document. A PostScript file is a plain text file that can be edited like any program source with any plain-text editor. Therefore in ConvertType, EPS is only an output format and cannot be used as input. Contrary to the FITS or JPEG formats, PostScript is not a raster format, but is categorized as vector graphics. With these features in mind, you can see that when you are compiling a document with TeX or LaTeX, using an EPS file is much more low-level than a JPEG, and thus you have much greater control and therefore quality. Since it also includes vector graphic lines, we also use such lines to make a thin border around the image to make its appearance in the document much better. Furthermore, through EPS, you can add marks over the image in many shapes and colors. No matter the resolution of the display or printer, these lines will always be clear and not pixelated. However, this can be done better with tools within TeX or LaTeX such as PGF/Tikz(1). If the final input image (possibly after all operations on the flux explained below) is a binary image or only has two colors of black and white (in segmentation maps for example), then PostScript has another great advantage compared to other formats. It allows for 1-bit pixels (pixels with a value of 0 or 1), which can decrease the output file size by 8 times. So if a gray-scale image is binary, ConvertType will exploit this property in the EPS and PDF (see below) outputs.
The standard suffixes for an EPS file are ‘.eps’, ‘.EPS’, ‘.epsf’ and ‘.epsi’. The EPS outputs of ConvertType have the ‘.eps’ suffix. PDF The Portable Document Format (PDF) is currently the most common format for documents. It is a vector graphics format, allowing abstract constructs like marks or borders. The PDF format is based on PostScript, so it shares all the features mentioned above for EPS. To be able to display or print its programmed content, a PostScript file needs to pass through a processor or compiler. A PDF file can be thought of as the processed output of the PostScript compiler. PostScript, EPS and PDF were created and are registered by Adobe Systems. As explained under EPS above, a PDF document is a static document description format; viewing its result is therefore much faster and more efficient than PostScript. To create a PDF output, ConvertType will make an EPS file and convert that to PDF using GPL Ghostscript. The suffixes recognized for a PDF file are: ‘.pdf’, ‘.PDF’. If GPL Ghostscript cannot be run on the PostScript file, the EPS file will remain and a warning will be printed (see *note Optional dependencies::). ‘blank’ This is not actually a file type! But it can be used to fill one color channel with a blank value. If this argument is given for any color channel, that channel will not be used in the output. Plain text The value of each pixel in a 2D image can be written as a 2D matrix in a plain-text file. Therefore, for the purpose of ConvertType, plain-text files are a single-channel raster graphics file format. Plain text files have the advantage that they can be viewed with any text editor or on the command-line. Most programs also support input as plain text files. As input, each plain text file is considered to contain one color channel. In ConvertType, the recognized extensions for plain text files are ‘.txt’ and ‘.dat’. As described in *note Invoking astconvertt::, if you just give these extensions (and not a full filename) as output, then automatic output will be performed to determine the final output name (see *note Automatic output::). Besides these, when the format of a file cannot be recognized from its name, ConvertType will fall back to plain text mode. So you can use any name (even without an extension) for a plain text input or output. Just note that when the suffix is not recognized, automatic output will not be performed. The basic input/output on plain text images is very similar to how tables are read/written as described in *note Gnuastro text table format::. Simply put, the restrictions are very loose, and there is a convention to define a name, units, data type (see *note Numeric data types::), and comments for the data in a commented line. The only difference is that as a table, a text file can contain many datasets (columns), but as a 2D image, it can only contain one dataset. As a result, only one information comment line is necessary for a 2D image, and instead of the starting ‘‘# Column N’’ (‘N’ is the column number), the information line for a 2D image must start with ‘‘# Image 1’’. When ConvertType is asked to output to a plain text file, this information comment line is written before the image pixel values. When converting an image to plain text, consider the fact that if the image is large, the number of columns in each line will become very large, possibly making it very hard to open in some text editors.
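To make the plain-text image convention above more tangible, below is a rough sketch of a small 3x3 image file. The name, unit and pixel values are hypothetical, and the exact contents of the bracketed metadata follow the convention of *note Gnuastro text table format:::

# Image 1: DEMO [counts,f32] A hypothetical 3x3 demonstration image
1.0   2.1   3.2
4.3   5.4   6.5
7.6   8.7   9.8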
Standard output (command-line) This is very similar to the plain text output, but instead of creating a file to keep the printed values, they are printed on the command line. This can be very useful when you want to redirect the results directly to another program in one command with no intermediate file. The only difference is that only the pixel values are printed (with no information comment line). To print to the standard output, set the output name to ‘‘stdout’’. ---------- Footnotes ---------- (1) 5.2.3 Color ----------- Color is generally defined after mixing various data “channels”. The values for each channel usually come from a filter that is placed in the optical path. Filters only allow a certain window of the spectrum to pass (for example, the SDSS _r_ filter only allows light from about 5500 to 7000 Angstroms). In digital monitors or common digital cameras, a different set of filters is used: Red, Green and Blue (commonly known as RGB) that are more optimized to the eye’s perception. On the other hand, when printing on paper, standard printers use the cyan, magenta, yellow and key (CMYK, key=black) color space. 5.2.3.1 Pixel colors .................... As discussed in *note Color::, for each displayed/printed pixel of a color image, the dataset/image has three or four values. To store/show the three values for each pixel, cameras and monitors allocate a certain fraction of each pixel’s area to red, green and blue filters. These three filters are thus built into the hardware at the pixel level. However, because measurement accuracy is very important in scientific instruments, and we want to do measurements (take images) with various/custom filters (without having to order a new expensive detector!), scientific detectors use the full area of the pixel to store one value for it in a single/mono channel dataset. To make measurements in different filters, we just place a filter in the light path before the detector. Therefore, the FITS format that is used to store astronomical datasets is inherently a mono-channel format (see *note Recognized file formats:: or *note Fits::). When a subject has been imaged in multiple filters, you can feed each different filter into the red, green and blue channels of your monitor and obtain a false-colored visualization. The reason we say “false-color” (or pseudo color) is that generally, the three data channels you provide are not from the same Red, Green and Blue filters of your monitor! So the observed color on your monitor does not correspond to the physical “color” that you would have seen if you looked at the object by eye. Nevertheless, it is good (and sometimes necessary) for visualization (of special features). In ConvertType, you can do this by giving each separate single-channel dataset (for example, in the FITS image format) as an argument (in the proper order), then asking for the output in a format that supports multi-channel datasets (for example, see the command below, or *note ConvertType input and output::). $ astconvertt r.fits g.fits b.fits --output=color.jpg 5.2.3.2 Colormaps for single-channel pixels ........................................... As discussed in *note Pixel colors::, color is not defined when a dataset/image contains a single value for each pixel. However, we interact with scientific datasets through monitors or printers. They allow multiple channels (independent values) per pixel and produce color with them (on monitors, this is usually with three channels: Red, Green and Blue).
As a result, there is a lot of freedom in visualizing a single-channel dataset. The mapping of single-channel values to multi-channel colors is called a “color map”. Since more information can be put in multiple channels, this usually results in a better visualization of the dynamic range of your single-channel data. In ConvertType, you can use the ‘--colormap’ option to choose between different mappings of mono-channel inputs, see *note Invoking astconvertt::. Below, we will review two of the basic color maps; please see the description of ‘--colormap’ in *note Invoking astconvertt:: for the full list. • The most basic colormap is shades of black (because of its strong contrast with white). This scheme is called Grayscale (https://en.wikipedia.org/wiki/Grayscale). But ultimately, black is just one color, so with Grayscale, you are not using the full dynamic range of the three-channel monitor effectively. To help in visualization, more complex mappings can be defined. • A slightly more complex color map can be defined when you scale the values to a range of 0 to 360, and use it as the “Hue” term of the Hue-Saturation-Value (https://en.wikipedia.org/wiki/HSL_and_HSV) (HSV) color space (while fixing the “Saturation” and “Value” terms). The increased usage of the monitor’s 3-channel color space is indeed better, but the resulting images can be un-”natural” to the eye. Since grayscale is a commonly used mapping of single-valued datasets, we will continue with a closer look at how it is stored. One way to represent a gray-scale image in different color spaces is to use the same proportions of the primary colors in each pixel. This is the common way most FITS image viewers work: for each pixel, they fill all the channels with the single value. While this is necessary for displaying a dataset, there are downsides when storing/saving this type of grayscale visualization (for example, in a paper). • Three (for RGB) or four (for CMYK) values have to be stored for every pixel, which makes the output file very heavy (in terms of bytes). • If printing, the printing errors of each color channel can make the printed image slightly more blurred than it actually is. To solve both of these problems when storing grayscale visualization, the best way is to save a single-channel dataset into the black channel of the CMYK color space. The JPEG standard is the only common standard that accepts the CMYK color space. The JPEG and EPS standards set two sizes for the number of bits in each channel: 8-bit and 12-bit. The former is by far the most common and is what is used in ConvertType. Therefore, each channel should have values between 0 and 2^8-1=255. From this we see how each pixel in a gray-scale image is one byte (8 bits) long, in an RGB image it is 3 bytes long, and in CMYK it is 4 bytes long. But thanks to the JPEG compression algorithms, when all the pixels of one channel have the same value, that channel is compressed to one pixel. Therefore a Grayscale image and a CMYK image that has only the K-channel filled are approximately the same file size. 5.2.3.3 Vector graphics colors .............................. When creating vector graphics, ConvertType recognizes the extended web colors (https://en.wikipedia.org/wiki/Web_colors#Extended_colors) that are the result of merging the colors in the HTML 4.01, CSS 2.0, SVG 1.0 and CSS3 standards. They are all shown with their standard name in *note Figure 5.1: colornames.
The names are not case sensitive, so you can use them in any form (for example, ‘turquoise’ is the same as ‘Turquoise’ or ‘TURQUOISE’). On the command line, you can also get the list of colors with the ‘--listcolors’ option to ConvertType, like below. In particular, if your terminal is 24-bit or "true color", in the last column, you will see each color. This greatly helps in selecting the best color for your purpose easily on the command-line (without taking your hands off the keyboard and getting distracted). $ astconvertt --listcolors Figure 5.1: Recognized color names in Gnuastro, shown with their numerical identifiers. 5.2.4 Color channels in same pixel grid --------------------------------------- In order to use different images as color channels, it is important that the images be properly aligned and on the same pixel grid. When your inputs are high-level products of the same survey, this is usually the case. However, in many other situations the images you plan to use as different color channels lie on different sky positions, even if they may have the same number of pixels. In this section we will show how to solve this problem. For an example dataset, let’s use the same SDSS field that we used in *note Detecting large extended targets::: the field covering the outer parts of the M51 group. With the commands below, we’ll make an ‘inputs’ directory and download and prepare the three g, r and i band images of SDSS over the same field there: $ mkdir inputs $ sdssurl=https://dr12.sdss.org/sas/dr12/boss/photoObj/frames $ for f in g r i; do \ wget $sdssurl/301/3716/6/frame-$f-003716-6-0117.fits.bz2 \ -O$f.fits.bz2; \ bunzip2 $f.fits.bz2; \ astfits $f.fits --copy=0 -oinputs/$f.fits; \ rm $f.fits; \ done With the commands below, first we’ll check the size of all three images to confirm that they have exactly the same number of pixels. Then we’ll use them as three color channels to construct a PDF image: ## Check the number of pixels along each axis of all images. $ astfits inputs/*.fits --keyvalue=NAXIS1,NAXIS2 ## Create a color image from the not-yet-aligned inputs. $ astconvertt inputs/i.fits inputs/r.fits inputs/g.fits -g1 \ --fluxhigh=1 -om51-not-aligned.pdf Open ‘m51-not-aligned.pdf’ with your PDF viewer, and zoom-in to some part of the image with fewer sources. You will clearly see that for each object, there are three copies, one in red (from the reddest filter; i), one in green (from the middle filter; r), and one in blue (the bluest filter; g). Did you see the warning message that was printed after your latest command? We have implemented a check in ConvertType to inform you when the images are not aligned, which can produce bad (in most cases!) outputs like this. To solve this problem, you need to align the three color channels into the same pixel grid. To do that, we will use the *note Warp:: program and in particular, its *note Align pixels with WCS considering distortions::. Let’s take the middle (r band) filter as the reference to define our grid. With the first command below, let’s align the r band filter to the celestial coordinates (so the M51 group’s position angle doesn’t depend on the orientation of the telescope when it took this image). With the next two commands, let’s use the ‘--gridfile’ option to ensure that the pixel grid and WCS come from the r band image, but the pixel values come from the other two filters.
Finally, in the last command, we’ll produce the color PDF from the three aligned images (that aren’t in the ‘inputs/’ directory any more): ## Put all three channels in the same pixel grid. $ astwarp inputs/r.fits --output=r.fits $ astwarp inputs/g.fits --gridfile=r.fits --output=g.fits $ astwarp inputs/i.fits --gridfile=r.fits --output=i.fits ## Create a color image from the aligned inputs. $ astconvertt i.fits r.fits g.fits -g1 --fluxhigh=1 -om51.pdf Open the new ‘m51.pdf’ and compare it with the old ‘m51-not-aligned.pdf’. The difference is obvious! When you zoom-in, the stars are very clear and the different color channels of the same object in the sky don’t fall on different pixels. If you look closely at the two longer edges of the image, you will see that one edge has a thin green shadow and the other has a thin red shadow. This shows how the green and red channels have been slightly shifted to put your astronomical sources on the same grid. If you don’t want to have those, or if you want the outer parts of the final image (where there was no data) to be white, some more complex commands are necessary. We’ll leave those as an exercise for you to try yourself using *note Warp:: and/or *note Crop:: to pre-process the inputs before converting them to a color image. 5.2.5 Annotations for figure in paper ------------------------------------- To make a nice figure from your FITS images, it is important to show more than merely the raw image (converted to a printer friendly format like PDF or JPEG). Annotations (or visual metadata) over the raw image greatly help the readers clearly see your argument and put the image/result in a larger context. Examples include: • Coordinates (Right Ascension and Declination) on the edges of the image, so viewers of your paper or presentation slides can get a physical feeling of the field’s sky coverage. • A thick line that has a fixed tangential size (for example, in kiloparsecs) at the redshift/distance of interest. • Contours over the image to show radio/X-ray emission, over an optical image for example. • Text, arrows, etc., over certain parts of the image. Because of the modular philosophy of Gnuastro, ConvertType is only focused on converting your FITS images to printer friendly formats like JPEG or PDF. But to present your results in a slide or paper, you will often need to annotate the raw JPEG or PDF with some of the features above. The good news is that there are many powerful plotting programs that you can use to add such annotations. As a result, there is no point in making a new one specific to Gnuastro. In this section, we will demonstrate this using the very powerful PGFPlots(1) package of LaTeX. *Single script for easy running:* In this section we are reviewing the reason and details of every step, which is good for educational purposes. But when you know the steps already, these separate code blocks can be annoying. Therefore the full script (except for the data download step) is available in *note Full script of annotations on figure::. PGFPlots uses the same LaTeX graphic engine that typesets your paper/slide. Therefore when you build your plots and figures using PGFPlots (and its underlying package PGF/TiKZ(2)) your plots will blend beautifully within your text: same fonts, same colors, same line properties, etc. Since most papers (and presentation slides(3)) are made with LaTeX, PGFPlots is the best tool for those who use LaTeX to create documents.
PGFPlots also does not need any extra dependencies beyond a basic/minimal TeX-live installation, so it is much more reliable than tools like Matplotlib in Python that have hundreds of fast-evolving dependencies(4). To demonstrate this, we will create a surface brightness image of a galaxy in the F160W filter of the ABYSS survey(5). In the code-block below, let’s make a “build” directory to keep intermediate files and avoid populating the source directory. Afterwards, we will download the full image and crop out a 20 arcsec wide image around the galaxy with the commands below. You can run these commands in an empty directory. $ mkdir build $ wget http://cdsarc.u-strasbg.fr/ftp/J/A+A/621/A133/fits/ah_f160w.fits $ astcrop ah_f160w.fits --center=53.1616278,-27.7802446 --mode=wcs \ --width=20/3600 --output=build/crop.fits To better show the low surface brightness (LSB) outskirts, we will warp the image, then convert the pixel units to surface brightness with the commands below. It is very important that the warping is done _before_ the conversion to surface brightness (in units of mag/arcsec$^2$), because the definition of surface brightness is non-linear. For more, see the Surface brightness topic of *note Brightness flux magnitude::, and for a more complete tutorial, see *note FITS images in a publication::. $ zeropoint=25.94 $ astwarp build/crop.fits --centeroncorner --scale=1/3 \ --output=build/scaled.fits $ pixarea=$(astfits build/scaled.fits --pixelareaarcsec2) $ astarithmetic build/scaled.fits $zeropoint $pixarea counts-to-sb \ --output=build/sb.fits We are now ready to convert the surface brightness image into a PDF. To better show the LSB features, we will also limit the color range with the ‘--fluxlow’ and ‘--fluxhigh’ options: all pixels with a surface brightness brighter than 22 mag/arcsec$^2$ will be shown as black, and all pixels with a surface brightness fainter than 30 mag/arcsec$^2$ will be white. These thresholds are being defined as variables, because we will also need them below (to pass into PGFPlots). We will also set ‘--borderwidth=0’, because the coordinate system we will add over the image will effectively be a border for the image (separating it from the background). $ sblow=22 $ sbhigh=30 $ astconvertt build/sb.fits --colormap=gray --borderwidth=0 \ --fluxhigh=$sbhigh --fluxlow=$sblow --output=build/sb.pdf Please open ‘sb.pdf’ and have a look. Also, please open ‘sb.fits’ in DS9 (or any other FITS viewer) and play with the color range. Can the surface brightness limits be changed to better show the LSB structure? If so, you are free to change the limits above. We now have the printable PDF representation of the image, but as discussed above, it is not enough for a paper. We will add 1) a thick line showing the size of 20 kpc (kiloparsecs) at the redshift of the central galaxy, 2) coordinates, and 3) a color bar, showing the surface brightness level of each grayscale level. To get the first job done, we first need to know the redshift of the central galaxy. To do this, we can use Gnuastro’s Query program to look into all the objects in NED within this image (only asking for the RA, Dec and redshift columns). We will then use the Match program to find the NED entry that corresponds to our galaxy.
$ astquery ned --dataset=objdir --overlapwith=build/sb.fits \ --column=ra,dec,z --output=ned.fits $ astmatch ned.fits -h1 --coord=53.1616278,-27.7802446 \ --ccol1=RA,Dec --aperture=1/3600 $ redshift=$(asttable ned_matched.fits -cz) $ echo $redshift Now that we know the redshift of the central object, we can define the coordinates of the thick line that will show the length of 20 kpc at that redshift. It will be a horizontal line (fixed Declination) across a range of RA. The start of this thick line will be located near the top edge of the image (at 95 percent of the image’s width and height). With the commands below we will find the three necessary parameters (one declination and two RAs). Just note that in astronomical images, RA increases to the left/east, which is the reason we are using the minimum and ‘+’ to find the RA starting point. $ scalelineinkpc=20 $ coverage=$(astfits build/sb.fits --skycoverage --quiet | awk 'NR==2') $ scalelinedec=$(echo $coverage | awk '{print $4-($4-$3)*0.05}') $ scalelinerastart=$(echo $coverage | awk '{print $1+($2-$1)*0.05}') $ scalelineraend=$(astcosmiccal --redshift=$redshift --arcsectandist \ | awk '{start='$scalelinerastart'; \ width='$scalelineinkpc'/$1/3600; \ print start+width}') To draw coordinates over the image, we need to feed these values into PGFPlots. But manually entering numbers into the PGFPlots source will be very frustrating and prone to many errors! Fortunately there is an easy way to do this: LaTeX macros. New macros are defined by this LaTeX command: \newcommand{\macroname}{value} Anywhere that LaTeX confronts ‘\macroname’, it will replace it with ‘value’ when building the output. We will have one file called ‘macros.tex’ in the build directory and define macros based on those values. We will use the shell’s ‘printf’ command to write these macro definition lines into the macro file. We just have to use double backslashes in the ‘printf’ command, because backslash is a meaningful character for ‘printf’, but we want to keep one of them. Also, we put a ‘\n’ at the end of each line; otherwise, all the commands will go into a single line of the macro file. We will also place the arbitrary ‘‘ma’’ string at the start of all our LaTeX macros to help identify the macros for this plot. $ macros=build/macros.tex $ printf '\\newcommand{\\maScaleDec}'"{$scalelinedec}\n" > $macros $ printf '\\newcommand{\\maScaleRAa}'"{$scalelinerastart}\n" >> $macros $ printf '\\newcommand{\\maScaleRAb}'"{$scalelineraend}\n" >> $macros $ printf '\\newcommand{\\maScaleKpc}'"{$scalelineinkpc}\n" >> $macros $ printf '\\newcommand{\\maCenterZ}'"{$redshift}\n" >> $macros Please open the macros file after these commands and have a look to see if they conform to the expected format above. Another set of macros we will need to feed into PGFPlots is the coordinates of the image corners. Fortunately the ‘coverage’ variable found above is also useful here. We just need to extract each item before feeding it into the macros. To do this, we will use AWK and keep each value in the temporary shell variable ‘‘v’’.
$ v=$(echo $coverage | awk '{print $1}') $ printf '\\newcommand{\\maCropRAMin}'"{$v}\n" >> $macros $ v=$(echo $coverage | awk '{print $2}') $ printf '\\newcommand{\\maCropRAMax}'"{$v}\n" >> $macros $ v=$(echo $coverage | awk '{print $3}') $ printf '\\newcommand{\\maCropDecMin}'"{$v}\n" >> $macros $ v=$(echo $coverage | awk '{print $4}') $ printf '\\newcommand{\\maCropDecMax}'"{$v}\n" >> $macros Finally, we also need to pass some other numbers to PGFPlots: 1) the major tick distance (in the coordinate axes that will be printed on the edge of the image). We will assume 7 ticks for this image. 2) The minimum and maximum surface brightness values that we gave to ConvertType when making the PDF; PGFPlots will define its color-bar based on these two values. $ v=$(echo $coverage | awk '{print ($2-$1)/7}') $ printf '\\newcommand{\\maTickDist}'"{$v}\n" >> $macros $ printf '\\newcommand{\\maSBlow}'"{$sblow}\n" >> $macros $ printf '\\newcommand{\\maSBhigh}'"{$sbhigh}\n" >> $macros All the necessary numbers are now ready. Please copy the contents below into a file called ‘my-figure.tex’. This is the PGFPlots source for this particular plot. Besides the coordinates and scale-line, we will also add some text over the image and an orange arrow pointing to the central object with its redshift printed over it. The parameters are generally human-readable, so you should be able to get a good feeling of every line. There are also comments which will show up as a different color when you copy this into a plain-text editor. \begin{tikzpicture} %% Define the coordinates and colorbar \begin{axis}[ at={(0,0)}, axis on top, x dir=reverse, scale only axis, width=\linewidth, height=\linewidth, minor tick num=10, xmin=\maCropRAMin, xmax=\maCropRAMax, ymin=\maCropDecMin, ymax=\maCropDecMax, enlargelimits=false, every tick/.style={black}, xtick distance=\maTickDist, ytick distance=\maTickDist, yticklabel style={rotate=90}, ylabel={Declination (degrees)}, xlabel={Right Ascension (degrees)}, ticklabel style={font=\small, /pgf/number format/.cd, precision=4,/tikz/.cd}, x label style={at={(axis description cs:0.5,0.02)}, anchor=north,font=\small}, y label style={at={(axis description cs:0.07,0.5)}, anchor=south,font=\small}, colorbar, colormap name=gray, point meta min=\maSBlow, point meta max=\maSBhigh, colorbar style={ at={(1.01,1)}, ylabel={Surface brightness (mag/arcsec$^2$)}, yticklabel style={ /pgf/number format/.cd, precision=1, /tikz/.cd}, y label style={at={(axis description cs:5.3,0.5)}, anchor=south,font=\small}, }, ] %% Put the image in the proper positions of the plot. \addplot graphics[ xmin=\maCropRAMin, xmax=\maCropRAMax, ymin=\maCropDecMin, ymax=\maCropDecMax] {sb.pdf}; %% Draw the scale factor. \addplot[black, line width=5, name=scaleline] coordinates {(\maScaleRAa,\maScaleDec) (\maScaleRAb,\maScaleDec)} node [anchor=north west] {\large $\maScaleKpc$ kpc}; \end{axis} %% Add some text anywhere over the plot. The text is added two %% times: the first time with a white background (that with a %% certain opacity), the second time just the text with opacity. \node[anchor=south west, fill=white, opacity=0.5] at (0.01\linewidth,0.01\linewidth) {(a) Text can be added here}; \node[anchor=south west] at (0.01\linewidth,0.01\linewidth) {(a) Text can be added here}; %% Add an arrow to highlight certain structures. 
\draw [->, red!70!yellow, line width=5] (0.35\linewidth,0.35\linewidth) -- node [anchor=south, rotate=45]{$z=\maCenterZ$} (0.45\linewidth,0.45\linewidth); \end{tikzpicture} Finally, we need another simple LaTeX source for the main PDF “report” that will host this figure. This can actually be your paper or slides, for example. Here, a minimal working example will suffice. \documentclass{article} %% Import the TiKZ package and activate its "external" feature. \usepackage{tikz} \usetikzlibrary{external} \tikzexternalize %% PGFPlots (which uses TiKZ). \usepackage{pgfplots} \pgfplotsset{axis line style={thick}} \pgfplotsset{ /pgfplots/colormap={gray}{rgb255=(0,0,0) rgb255=(255,255,255)} } %% Import the macros. \input{macros.tex} %% Start document. \begin{document} You can write anything here. %% Add the figure and its caption. \begin{figure} \input{my-figure.tex} \caption{A demo image.} \end{figure} %% Finish the document. \end{document} You are now ready to create the PDF. But LaTeX creates many temporary files, so to avoid populating our top-level directory, we will copy the two ‘.tex’ files into the build directory, go there and run LaTeX. Before running it though, we will first delete all the files that have the name pattern ‘*-figure0*’; these are “external” files created by TiKZ+PGFPlots, including the actual PDF of the figure. $ cp report.tex my-figure.tex build $ cd build $ rm -f *-figure0* $ pdflatex -shell-escape -halt-on-error report.tex You now have the full “report” in ‘report.pdf’. Try adding some extra text on top of the figure, or in the caption, and re-running the last four commands. Also try changing the 20kpc scale line length to 50kpc, or try changing the redshift, to see how the length and text of the thick scale-line will automatically change. But the good news is that you also have the raw PDF of the figure that you can use in other places. You can see that file in ‘report-figure0.pdf’. In a larger paper, you can add multiple such figures (with different ‘.tex’ files that are placed in different ‘figure’ environments with different captions). Each figure will get a number in the build directory. TiKZ also allows setting a file name for each “external” figure (to avoid such numbers that can be annoying if the image orders are changed). PGFPlots is also highly customizable: you can make a lot of changes and customizations. Both TiKZ(6) and PGFPlots(7) have wonderful manuals, so have a look through them. ---------- Footnotes ---------- (1) (2) (3) To build slides, LaTeX has packages like Beamer, see (4) See Figure 1 of Alliez et al. 2020 at (5) (6) (7) 5.2.5.1 Full script of annotations on figure ............................................ In *note Annotations for figure in paper::, each one of the steps to add annotations over an image was described in detail. So if you have understood the steps, but just need to add annotations over an image, repeating those steps individually will be annoying. Therefore in this section, we will summarize all the steps in a single script that you can simply copy-paste into a text editor, configure, and run. *Necessary files:* To run this script, you will need an image to crop your object from (here assuming it is called ‘ah_f160w.fits’ with a certain zero point) and the two files ‘my-figure.tex’ and ‘report.tex’ that were fully included in *note Annotations for figure in paper::. Also, we have brought the redshift in as a parameter here.
But if the center of your image always points to your main object, you can also include the Query command to automatically find the object’s redshift from NED. Alternatively, your image may already be cropped; in this case, you can remove the cropping step and use your image directly in the later steps. # Parameters. sblow=22 # Minimum surface brightness. sbhigh=30 # Maximum surface brightness. bdir=build # Build directory location on filesystem. numticks=7 # Number of major ticks in each axis. redshift=0.619 # Redshift of object of interest. zeropoint=25.94 # Zero point of input image. scalelineinkpc=20 # Length of scale-line (in kilo parsecs). input=ah_f160w.fits # Name of input (to crop). # Stop the script in case of a crash. set -e # Build directory if ! [ -d $bdir ]; then mkdir $bdir; fi # Crop out the desired region. crop=$bdir/crop.fits astcrop $input --center=53.1616278,-27.7802446 --mode=wcs \ --width=20/3600 --output=$crop # Warp the image to larger pixels to show surface brightness better. scaled=$bdir/scaled.fits astwarp $crop --centeroncorner --scale=1/3 --output=$scaled # Calculate the pixel area and convert image to Surface brightness. sb=$bdir/sb.fits pixarea=$(astfits $scaled --pixelareaarcsec2) astarithmetic $scaled $zeropoint $pixarea counts-to-sb \ --output=$sb # Convert the Surface brightness image into PDF. sbpdf=$bdir/sb.pdf astconvertt $sb --colormap=gray --borderwidth=0 \ --fluxhigh=$sbhigh --fluxlow=$sblow --output=$sbpdf # Specify the coordinates of the scale line (specifying a certain # width in kpc). We will put it on the top-right side of the image (5% # of the full width of the image away from the edge). coverage=$(astfits $sb --skycoverage --quiet | awk 'NR==2') scalelinedec=$(echo $coverage | awk '{print $4-($4-$3)*0.05}') scalelinerastart=$(echo $coverage | awk '{print $1+($2-$1)*0.05}') scalelineraend=$(astcosmiccal --redshift=$redshift --arcsectandist \ | awk '{start='$scalelinerastart'; \ width='$scalelineinkpc'/$1/3600; \ print start+width}') # Write the LaTeX macros to use in plot. Start with the thick line # showing tangential distance. macros=$bdir/macros.tex printf '\\newcommand{\\maScaleDec}'"{$scalelinedec}\n" > $macros printf '\\newcommand{\\maScaleRAa}'"{$scalelinerastart}\n" >> $macros printf '\\newcommand{\\maScaleRAb}'"{$scalelineraend}\n" >> $macros printf '\\newcommand{\\maScaleKpc}'"{$scalelineinkpc}\n" >> $macros printf '\\newcommand{\\maCenterZ}'"{$redshift}\n" >> $macros # Add image extrema for the coordinates. v=$(echo $coverage | awk '{print $1}') printf '\\newcommand{\\maCropRAMin}'"{$v}\n" >> $macros v=$(echo $coverage | awk '{print $2}') printf '\\newcommand{\\maCropRAMax}'"{$v}\n" >> $macros v=$(echo $coverage | awk '{print $3}') printf '\\newcommand{\\maCropDecMin}'"{$v}\n" >> $macros v=$(echo $coverage | awk '{print $4}') printf '\\newcommand{\\maCropDecMax}'"{$v}\n" >> $macros # Distance between each tick value. v=$(echo $coverage | awk '{print ($2-$1)/'$numticks'}') printf '\\newcommand{\\maTickDist}'"{$v}\n" >> $macros printf '\\newcommand{\\maSBlow}'"{$sblow}\n" >> $macros printf '\\newcommand{\\maSBhigh}'"{$sbhigh}\n" >> $macros # Copy the LaTeX source into the build directory and go there to run # it and have all the temporary LaTeX files there. cp report.tex my-figure.tex $bdir cd $bdir rm -f *-figure0* pdflatex -shell-escape -halt-on-error report.tex 5.2.6 Invoking ConvertType -------------------------- ConvertType will convert any recognized input file type to any specified output type.
5.2.6 Invoking ConvertType
--------------------------

ConvertType will convert any recognized input file type to any specified output type. The executable name is ‘astconvertt’ with the following general template

     $ astconvertt [OPTION...] InputFile [InputFile2] ... [InputFile4]

One line examples:

     ## Convert an image in FITS to PDF:
     $ astconvertt image.fits --output=pdf

     ## Similar to before, but use the Viridis color map:
     $ astconvertt image.fits --colormap=viridis --output=pdf

     ## Add markers to highlight parts of the image
     ## ('marks.fits' is a table containing coordinates)
     $ astconvertt image.fits --marks=marks.fits --output=pdf

     ## Convert an image in JPEG to FITS (with multiple extensions
     ## if it has color):
     $ astconvertt image.jpg -oimage.fits

     ## Use three 2D arrays to create an RGB JPEG output (two are
     ## plain-text, the third is FITS, but all have the same size).
     $ astconvertt f1.txt f2.txt f3.fits -o.jpg

     ## Use two images and one blank for an RGB EPS output:
     $ astconvertt M31_r.fits M31_g.fits blank -oeps

     ## Directly pass input from output of another program through
     ## Standard input (not a file).
     $ cat 2darray.txt | astconvertt -oimg.fits

In the sub-sections below, the various options that are specific to ConvertType are grouped in different categories. Please see those sections for a detailed discussion on each group and its options. Besides those, ConvertType also shares the *note Common options:: with other Gnuastro programs. The common options are not repeated here.

5.2.6.1 ConvertType input and output
....................................

At most four input files (one for each color channel for formats that allow it) are allowed in ConvertType. The first input dataset can either be a file, or come from Standard input (see *note Standard input:: and *note Recognized file formats::). The order of multiple input files is important. After reading the input file(s), the number of color channels in all the inputs will be used to define which color space to use for the outputs and how each color channel is interpreted: 1 (for grayscale), 3 (for RGB) and 4 (for CMYK) input channels. For more on pixel color channels, see *note Pixel colors::.

Depending on the format of the input(s), the number of input files can differ. For example, if you plan to build an RGB PDF and your three channels are in the first HDU of ‘r.fits’, ‘g.fits’ and ‘b.fits’, then you can simply call ConvertType like this:

     $ astconvertt r.fits g.fits b.fits -g1 --output=rgb.pdf

However, if the three color channels are in three extensions (assuming the HDUs are respectively named ‘R’, ‘G’ and ‘B’) of a single file (assuming ‘channels.fits’), you should run it like this:

     $ astconvertt channels.fits -hR -hG -hB --output=rgb.pdf

On the other hand, if the channels are already in a multi-channel format (like JPEG), you can simply provide that file:

     $ astconvertt image.jpg --output=rgb.pdf

If multiple channels are given as input, and the output format does not support multiple color channels (for example, FITS), ConvertType will put the channels in different HDUs, like the example below. After running the ‘astfits’ command, if your JPEG file was not grayscale (single channel), you will see multiple HDUs in ‘channels.fits’.

     $ astconvertt image.jpg --output=channels.fits
     $ astfits channels.fits

As shown above, the output’s file format will be interpreted from the name given to the ‘--output’ option (as a common option to all Gnuastro programs, for the description of ‘--output’, see *note Input output options::). It can either be given on the command-line or in any of the configuration files (see *note Configuration files::).
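For example (with a hypothetical single-extension ‘image.fits’; only the output name changes between the two commands), the same input is written as a PDF or a JPEG purely based on the suffix given to ‘--output’:

     $ astconvertt image.fits --output=demo.pdf
     $ astconvertt image.fits --output=demo.jpg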
When the output suffix is not recognized, it will default to plain text format, see *note Recognized file formats::. If there is one input dataset (color channel) the output will be gray-scale. When three input datasets (color channels) are given, they are respectively considered to be the red, green and blue color channels. Finally, if there are four color channels they will be cyan, magenta, yellow and black (CMYK colors). The value to ‘--output’ (or ‘-o’) can be either a full file name or just the suffix of the desired output format. In the former case (full name), it will be directly used for the output’s file name. In the latter case, the name of the output file will be set based on the automatic output guidelines, see *note Automatic output::. Note that the suffix name can optionally start with a ‘.’ (dot), so for example, ‘--output=.jpg’ and ‘--output=jpg’ are equivalent. See *note Recognized file formats::. The relevant options for input/output formats are described below: ‘-h STR/INT’ ‘--hdu=STR/INT’ Input HDU name or counter (counting from 0) for each input FITS file. If the same HDU should be used from all the FITS files, you can use the ‘--globalhdu’ option described below. In ConvertType, it is possible to call the HDU option multiple times for the different input FITS or TIFF files in the same order that they are called on the command-line. Note that in the TIFF standard, one ‘directory’ (similar to a FITS HDU) may contain multiple color channels (for example, when the image is in RGB). Except for the fact that multiple calls are possible, this option is identical to the common ‘--hdu’ in *note Input output options::. The number of calls to this option cannot be less than the number of input FITS or TIFF files, but if there are more, the extra HDUs will be ignored, note that they will be read in the order described in *note Configuration file precedence::. Unlike CFITSIO, libtiff (which is used to read TIFF files) only recognizes numbers (counting from zero, similar to CFITSIO) for ‘directory’ identification. Hence the concept of names is not defined for the directories and the values to this option for TIFF files must be numbers. ‘-g STR/INT’ ‘--globalhdu=STR/INT’ Use the value given to this option (a HDU name or a counter, starting from 0) for the HDU identifier of all the input FITS files. This is useful when all the inputs are distributed in different files, but have the same HDU in those files. ‘-w FLT’ ‘--widthincm=FLT’ The width of the output in centimeters. This is only relevant for those formats that accept such a width as metadata (not FITS or plain-text for example), see *note Recognized file formats::. For most digital purposes, the number of pixels is far more important than the value to this parameter because you can adjust the absolute width (in inches or centimeters) in your document preparation program. ‘-x’ ‘--hex’ Use Hexadecimal encoding in creating EPS output. By default the ASCII85 encoding is used which provides a much better compression ratio. When converted to PDF (or included in TeX or LaTeX which is finally saved as a PDF file), an efficient binary encoding is used which is far more efficient than both of them. The choice of EPS encoding will thus have no effect on the final PDF. So if you want to transfer your EPS files (for example, if you want to submit your paper to arXiv or journals in PostScript), their storage might become important if you have large images or lots of small ones. 
By default ASCII85 encoding is used which offers a much better compression ratio (nearly 40 percent) compared to Hexadecimal encoding. ‘-u INT’ ‘--quality=INT’ The quality (compression) of the output JPEG file with values from 0 to 100 (inclusive). For other formats the value to this option is ignored. Note that only in gray-scale (when one input color channel is given) will this actually be the exact quality (each pixel will correspond to one input value). If it is in color mode, some degradation will occur. While the JPEG standard does support loss-less graphics, it is not commonly supported. 5.2.6.2 Pixel visualization ........................... The main goal of ConvertType is to visualize pixels to/from print or web friendly formats. Astronomical data usually have a very large dynamic range (difference between maximum and minimum value) and different subjects might be better demonstrated with a limited flux range. ‘--colormap=STR[,FLT,...]’ The color map to visualize a single channel. The first value given to this option is the name of the color map, which is shown below. Some color maps can be configured. In this case, the configuration parameters are optionally given as numbers following the name of the color map for example, see ‘hsv’. The table below contains the usable names of the color maps that are currently supported: ‘gray’ ‘grey’ Grayscale color map. This color map does not have any parameters. The full dataset range will be scaled to 0 and $2^8-1=255$ to be stored in the requested format. ‘hsv’ Hue, Saturation, Value(1) color map. If no values are given after the name (‘--colormap=hsv’), the dataset will be scaled to 0 and 360 for hue covering the full spectrum of colors. However, you can limit the range of hue (to show only a special color range) by explicitly requesting them after the name (for example, ‘--colormap=hsv,20,240’). The mapping of a single-channel dataset to HSV is done through the Hue and Value elements: Lower dataset elements have lower “value” _and_ lower “hue”. This creates darker colors for fainter parts, while also respecting the range of colors. ‘viridis’ Viridis is the default colormap of the popular Matplotlib module of Python and available in many other visualization tools like PGFPlots. ‘sls’ The SLS color range, taken from the commonly used SAO DS9 (http://ds9.si.edu). The advantage of this color range is that it starts with black, going into dark blue and finishes with the brighter colors of red and white. So unlike the HSV color range, it includes black and white and brighter colors (like yellow, red) show the larger values. ‘sls-inverse’ The inverse of the SLS color map (see above), where the lowest value corresponds to white and the highest value is black. While SLS is good for visualizing on the monitor, SLS-inverse is good for printing. ‘--rgbtohsv’ When there are three input channels and the output is in the FITS format, interpret the three input channels as red, green and blue channels (RGB) and convert them to the hue, saturation, value (HSV) color space. The currently supported output formats of ConvertType do not have native support for HSV. Therefore this option is only supported when the output is in FITS format and each of the hue, saturation and value arrays can be saved as one FITS extension in the output for further analysis (for example, to select a certain color). ‘-c STR’ ‘--change=STR’ (‘=STR’) Change pixel values with the following format ‘"from1:to1, from2:to2,..."’. 
This option is very useful in displaying labeled pixels (not actual data images which have noise) like segmentation maps. In labeled images, usually a group of pixels have a fixed integer value. With this option, you can manipulate the labels before the image is displayed to get a better output for print, or to emphasize a particular set of labels and ignore the rest. The labels in the images will be changed in the same order given. By default the pixel values will first be changed (with this option) and then truncated (see ‘--fluxlow’ and ‘--fluxhigh’). You can use any number for the values irrespective of your final output: your given values are stored and used in the double precision floating point format. So for example, if your input image has labels from 1 to 20000 and you only want to display those with labels 957 and 11342, then you can run ConvertType with these options:

     $ astconvertt --change=957:50000,11342:50001 --fluxlow=5e4 \
                   --fluxhigh=1e5 segmentationmap.fits --output=jpg

While the output JPEG format is only 8-bit, this operation is done in an intermediate step which is stored in double precision floating point. The pixel values are converted to 8-bit after all operations on the input fluxes have been completed. By placing the value in double quotes you can use as many spaces as you like for better readability.

‘-C’
‘--changeaftertrunc’
     Change pixel values (with ‘--change’) after truncation of the flux values; by default it is the opposite.

‘-L FLT’
‘--fluxlow=FLT’
     The minimum flux (pixel value) to display in the output image: any pixel value below this value will be set to this value in the output. If the value to this option is the same as ‘--fluxhigh’, then no flux truncation will be applied. Note that when multiple channels are given, this value is used for all the color channels.

‘-H FLT’
‘--fluxhigh=FLT’
     The maximum flux (pixel value) to display in the output image, see ‘--fluxlow’.

‘-m INT’
‘--maxbyte=INT’
     This is only used for the JPEG and EPS output formats which have an 8-bit space for each channel of each pixel. The maximum value in each pixel can therefore be $2^8-1=255$. With this option you can change (decrease) the maximum value. By doing so you will decrease the dynamic range. It can be useful if you plan to use those values for other purposes.

‘-A’
‘--forcemin’
     Enforce the value of ‘--fluxlow’ (when it is given), even if it is smaller than the minimum of the dataset and the output is a format supporting color. This is particularly useful when you are converting a number of images to a common image format like JPEG or PDF with a single command and want them all to have the same range of colors, independent of the contents of the dataset. Note that if the minimum value is smaller than ‘--fluxlow’, then this option is redundant.

     By default, when the dataset only has two values, _and_ the output format is PDF or EPS, ConvertType will use the PostScript optimization that allows setting the pixel values per bit, not byte (*note Recognized file formats::). This can greatly help reduce the file size. However, when ‘--fluxlow’ or ‘--fluxhigh’ are called, this optimization is disabled: even though there are only two values (it is binary), the difference between them does not correspond to the full contrast of black and white.

‘-B’
‘--forcemax’
     Similar to ‘--forcemin’, but for the maximum.
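For example (a rough sketch, with hypothetical file names and flux limits), the shell loop below converts a set of images to JPEG with exactly the same displayed range, so their gray levels can be compared directly:

     $ for f in img-*.fits; do \
           astconvertt $f --fluxlow=0 --fluxhigh=100 \
                       --forcemin --forcemax \
                       --output=$(basename $f .fits).jpg; \
       done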
‘-i’
‘--invert’
     For 8-bit output types (JPEG, EPS, and PDF for example) the final value that is stored is inverted so white becomes black and vice versa. The reason for this is that astronomical images usually have a very large area of blank sky in them. The result will be that a large area of the image will be black. Note that this behavior is ideal for gray-scale images; if you want a color image, the colors are going to be mixed up.

   ---------- Footnotes ----------

   (1)

5.2.6.3 Drawing with vector graphics
....................................

With the options described in this section, you can draw marks over your to-be-published images (for example, in PDF). Each mark can be highly customized, so they can have different shapes, colors, line widths, text, text size, etc. The properties of the marks should be stored in a table that is given to the ‘--marks’ option described below. A fully working demo on adding marks is provided in *note Marking objects for publication::.

An important factor to consider when drawing vector graphics is that vector graphics standards (the PostScript standard in this case) use a “point” as the primary unit of line thickness or font size, such that 72 points correspond to 1 inch (or 2.54 centimeters). In other words, there are roughly 3 PostScript points in every millimeter. On the other hand, the pixels of the images you plan to show as the background do not have any real size! Pixels are abstract and can be associated with any print-size.

In ConvertType, the print-size of your final image is set with the ‘--widthincm’ option (see *note ConvertType input and output::). The value to ‘--widthincm’ is the intended width of the printed image in centimeters. It therefore defines the thickness of lines or font sizes for your vector graphics features (like the image border or marks). Just recall that we are not talking about resolution! Vector graphics have infinite resolution! We are talking about the relative thickness of the lines (or font sizes) in relation to the pixels in your background image.

‘-b INT’
‘--borderwidth=INT’
     The width of the border to be put around the EPS and PDF outputs in units of PostScript points. If you are planning on adding a border, its thickness in relation to your image pixel sizes is highly correlated with the value you give to the ‘--widthincm’ parameter. See the description at the start of this section for more.

     Unfortunately, in the document structuring convention of the PostScript language, the “bounding box” has to be in units of PostScript points with no fractions allowed. So the border width can only be specified as an integer. To have a final border that is thinner than one PostScript point in your document, you can ask for a larger width in ConvertType and then scale down the output EPS or PDF file in your document preparation program. For example, by setting ‘width’ in your ‘includegraphics’ command in TeX or LaTeX to be larger than the value to ‘--widthincm’. Since it is vector graphics, the changes of size have no effect on the quality of your output (pixels do not get different values).
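     For example (with hypothetical numbers): if you give ‘--widthincm=5’ for an image that is 1000 pixels wide, the image will be printed over $5/2.54\times72\approx142$ PostScript points, so each pixel spans roughly $0.14$ points; a border of ‘--borderwidth=2’ is therefore about $2/0.14\approx14$ background pixels thick in the final document.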
‘--bordercolor=STR’
     The name of the color to use for the border that will be put around the EPS and PDF outputs. The list of available colors, along with their name and an example, can be seen with the following command (also see *note Vector graphics colors::):

          $ astconvertt --listcolors

     This option only accepts the name of the color, not the numeric identifier.

‘--marks=STR’
     Draw vector graphics (infinite resolution) marks over the image. The value to this option should be the file name of a table containing the mark information. The table given to this option can have various properties for each mark in each column. You can specify which column contains which property of the marks using the options below that start with ‘--mark’. Only two property columns are mandatory (‘--markcoords’); the rest are optional. The table can be in any of Gnuastro’s *note Recognized table formats::. For more on the difference between vector and raster graphics, see *note Raster and Vector graphics::.

     For example, if your table with mark information is called ‘marks.fits’, you can use the command below to draw marks over the coordinates given in its ‘RA’ and ‘DEC’ columns:

          $ astconvertt image.fits --output=image.pdf \
                        --marks=marks.fits --mode=wcs \
                        --markcoords=RA,DEC

     You can highly customize each mark with different columns in ‘marks.fits’ using the ‘--mark*’ options below (for example, using different colors, different shapes, different sizes, text, and the rest on each mark).

‘--markshdu=STR/INT’
     The HDU (or extension) name or number of the table containing mark properties (file given to ‘--marks’). This is only relevant if the table is in the FITS format and there is more than one HDU in the FITS file.

‘-r STR,STR’
‘--markcoords=STR,STR’
     The column names (or numbers) containing the coordinates of each mark (in the table given to ‘--marks’). Only two values should be given to this option (one for each coordinate). They can either be given to one call (‘--markcoords=RA,DEC’) or in separate calls (‘--markcoords=RA --markcoords=DEC’).

     When ‘--mode=image’ the columns will be associated to the horizontal/vertical coordinates of the image, and interpreted in units of pixels. In ‘--mode=wcs’, the columns will be associated to the WCS coordinates (typically Right Ascension and Declination, in units of degrees).

‘-O STR’
‘--mode=STR’
     The coordinate mode for interpreting the values in the columns given to the ‘--markcoords’ option. The acceptable values are either ‘img’ (for image or pixel coordinates) or ‘wcs’ (for the World Coordinate System, typically RA and Dec). For the WCS-mode, the input image should have the necessary WCS keywords, otherwise ConvertType will crash.

‘--markshape=STR/INT’
     The column name(s), or number(s), containing the shapes of each mark (in the table given to ‘--marks’). The shapes can either be identified by their name, or their numerical identifier. If identifying them by name in a plain-text table, you need to define a string column (see *note Gnuastro text table format::). The full list of names is shown below, with their numerical identifier in parenthesis afterwards. For each shape, you can also specify properties such as the size, line width, rotation, and color. See the description of the relevant ‘--mark*’ option below.

     ‘circle (1)’
          A circular circumference. Its _radius_ is defined by a single size element (the first column given to ‘--marksize’). Any value in the second size column (if given for other shapes in the same call) is ignored by this shape.

     ‘plus (2)’
          The plus sign ($+$). The _length of its lines_ is defined by a single size element (the first column given to ‘--marksize’), such that the intersection of its lines is on the central coordinate of the mark. Any value in the second size column (if given for other shapes in the same call) is ignored by this shape.

     ‘cross (3)’
          A multiplication sign ($\times$). The _length of its lines_ is defined by a single size element (the first column given to ‘--marksize’), such that the intersection of its lines is on the central coordinate of the mark.
          Any value in the second size column (if given for other shapes in the same call) is ignored by this shape.

     ‘ellipse (4)’
          An elliptical circumference. Its major axis radius is defined by the first size element (first column given to ‘--marksize’), and its axis ratio is defined through the second size element (second column given to ‘--marksize’).

     ‘point (5)’
          A point (or a filled circle). Its _radius_ is defined by a single size element (the first column given to ‘--marksize’). Any value in the second size column (if given for other shapes in the same call) is ignored by this shape.

          This filled circle mark is defined as a “point” because it is usually relevant as a small size (or point in the whole image). But there is no limit on its size, so it can be arbitrarily large.

     ‘square (6)’
          A square circumference. Its _edge length_ is defined by a single size element (the first column given to ‘--marksize’). Any value in the second size column (if given for other shapes in the same call) is ignored by this shape.

     ‘rectangle (7)’
          A rectangular circumference. Its length along the horizontal image axis is defined by the first size element (first column given to ‘--marksize’), and its length along the vertical image axis is defined through the second size element (second column given to ‘--marksize’).

     ‘line (8)’
          A line. The line’s _length_ is defined by a single size element (the first column given to ‘--marksize’). The line will be centered on the given coordinate. Like all shapes, you can rotate the line about its center using the ‘--markrotate’ column. Any value in the second size column (if given for other shapes in the same call) is ignored by this shape.

‘--markrotate=STR/INT’
     Column name or number that contains the mark’s rotation angle. The rotation angle should be in degrees and be relative to the horizontal axis of the image.

‘--marksize=STR[,STR]’
     The column name(s), or number(s), containing the size(s) of each mark (in the table given to ‘--marks’). All shapes need at least one “size” parameter and some need two. For the interpretation of the size column(s) for each shape, see the ‘--markshape’ option’s description. Since the size column(s) is (are) optional, when not specified, default values will be used (which may be too small in larger images, so you may need to change them).

     By default, the values in the size column are assumed to be in the same units as the coordinates (defined by the ‘--mode’ option, described above). However, when the coordinates are in WCS-mode, some special cases may occur for the size.
        • The native WCS units (usually degrees) can be too large, and it may be more convenient for the values in the size column(s) to be in arc-seconds. In this case, you can use the ‘--sizeinarcsec’ option.
        • Similar to above, but in units of arc-minutes. In this case, you can use the ‘--sizeinarcmin’ option.
        • Your sizes may be in units of pixels, not the WCS units. In this case, you can use the ‘--sizeinpix’ option.

‘--sizeinpix’
     In WCS-mode, assume that the sizes are in units of pixels. By default, when in WCS-mode, the sizes are assumed to be in the units of the WCS coordinates (usually degrees).

‘--sizeinarcsec’
     In WCS-mode, assume that the sizes are in units of arc-seconds. By default, when in WCS-mode, the sizes are assumed to be in the units of the WCS coordinates (usually degrees).

‘--sizeinarcmin’
     In WCS-mode, assume that the sizes are in units of arc-minutes. By default, when in WCS-mode, the sizes are assumed to be in the units of the WCS coordinates (usually degrees).
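For example (a minimal sketch, assuming a hypothetical catalog ‘cat.fits’ that already contains ‘RA’, ‘DEC’ and ‘RADIUS_ARCSEC’ columns), the command below draws marks at the given WCS coordinates, taking the size of each mark from the ‘RADIUS_ARCSEC’ column and interpreting it in arc-seconds:

     $ astconvertt image.fits --output=marked.pdf \
                   --marks=cat.fits --mode=wcs \
                   --markcoords=RA,DEC \
                   --marksize=RADIUS_ARCSEC --sizeinarcsec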
‘--marklinewidth=STR/INT’
     Column containing the width (thickness) of the line to draw each mark. The line width is measured in units of “points” (where 72 points is one inch), and it can be any positive floating point number. Therefore, the thickness (in relation to the pixels of your image) depends on the ‘--widthincm’ option. For more, see the description at the start of this section.

‘--markcolor=STR/INT’
     Column containing the color of the mark. This column can be either a string or an integer. As a string, the color name can be written directly in your table (this greatly helps in human readability). For more on string columns see *note Gnuastro text table format::. As an integer, you can simply use the numerical identifier of the color. You can see the list of colors with their names and numerical identifiers in Gnuastro by running ConvertType with ‘--listcolors’, or see *note Vector graphics colors::.

‘--listcolors’
     The list of acceptable color names, their codes and their representation can be seen with the ‘--listcolors’ option. By “representation” we mean that the color will be shown on the terminal as the background in that column. But this will only be properly visible with “true color” or 24-bit terminals, see the ANSI escape sequence standard (https://en.wikipedia.org/wiki/ANSI_escape_code). Most modern GNU/Linux terminals support 24-bit colors natively, and no modification is necessary. For macOS, see the box below.

     The printed text in standard output is in the *note Gnuastro text table format::, so if you want to store this table, you can simply pipe the output to Gnuastro’s Table program and store it as a FITS table:

          $ astconvertt --listcolors | asttable -ocolors.fits

     *macOS terminal colors*: as of August 2022, the default macOS terminal (iTerm) does not support 24-bit colors! The output of ‘--listcolors’ therefore does not display the actual colors (you can only use the color names). One tested solution is to install and use iTerm2 (https://iterm2.com), which is free software and available in Homebrew (https://formulae.brew.sh/cask/iterm2). iTerm2 is described as a successor for iTerm and works on macOS 10.14 (released in September 2018) or newer.

‘--marktext=STR/INT’
     Column name or number that contains the text that should be printed under the mark. If the column is numeric, the number will be printed under the mark (for example, if you want to write the magnitude or redshift of the object under the mark showing it). For the precision of writing floating point columns, see ‘--marktextprecision’. But if the column has a string format (for example, the name of the object, like NGC1234), you need to define the column as a string column (see *note Gnuastro text table format::).

     For text with different lengths, set the length in the definition of the column to the maximum length of the strings to be printed. If there are some rows or marks that do not require text, set the string in this column to ‘n/a’ (not applicable; the blank value for strings in Gnuastro). When having strings with different lengths, make sure to have enough white spaces (for the shorter strings) so the adjacent columns are not taken as part of the string (see *note Gnuastro text table format::).

‘--marktextprecision=INT’
     The number of decimal digits to print after the floating point. This is only relevant when ‘--marktext’ is given, and the selected column has a floating point format.
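For example (a minimal sketch, assuming a hypothetical ‘cat.fits’ with ‘RA’ and ‘DEC’ coordinate columns and a string ‘NAME’ column), the command below prints each object’s name under its mark:

     $ astconvertt image.fits --output=named.pdf \
                   --marks=cat.fits --mode=wcs \
                   --markcoords=RA,DEC --marktext=NAME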
‘--markfont=STR/INT’
     Column name or number that contains the font for the displayed text under the mark. This is only relevant if ‘--marktext’ is called. The font should be accessible by Ghostscript.

     If you are not familiar with the available fonts on your system’s Ghostscript, you can use the ‘--showfonts’ option to see all the fonts in a custom PDF file (one page per font). If you are already familiar with the font you want, but just want to make sure about its presence (or spelling!), you can get a list (on standard output) of all the available fonts with the ‘--listfonts’ option. Both are described below.

     It is possible to add custom fonts to Ghostscript as described in the Fonts section (https://ghostscript.com/doc/current/Fonts.htm) of the Ghostscript manual.

‘--markfontsize=STR/INT’
     Column name or number that contains the font size to use. This is only relevant if a text column has been defined (with ‘--marktext’, described above). The font size is in units of “points”; see the description at the start of this section for more.

‘--showfonts’
     Create a special PDF file that shows the name and shape of all available fonts in your system’s Ghostscript. You can use this for selecting the best font to put in the ‘--markfont’ column. The available fonts can differ from one system to another (depending on how Ghostscript was configured in that system). The PDF file’s name is constructed by appending a ‘-fonts.pdf’ to the file name given to the ‘--output’ option.

     The PDF file will have one page for each font, and the sizes of the pages are customized for showing the fonts (each page is horizontally elongated). This helps to better check the fonts by disabling “continuous” mode in your PDF viewer, and setting the zoom such that the width of the page corresponds to the width of your PDF viewer. Simply pressing the left/right keys will then nicely show each font separately.

‘--listfonts’
     Print (to standard output) the names of all available fonts in Ghostscript that you can use for the ‘--markfont’ column. The available fonts can differ from one system to another (depending on how Ghostscript was configured in that system). If you are not already familiar with the shape of each font, please use ‘--showfonts’ (described above).

5.3 Table
=========

Tables are the high-level products of processing on lower-level data like images or spectra. For example, in Gnuastro, MakeCatalog will process the pixels over an object and produce a catalog (or table) with the properties of each object such as magnitudes and positions (see *note MakeCatalog::). Each one of these properties is a column in its output catalog (or table) and for each input object, we have a row.

When there are only a small number of objects (rows) and not too many properties (columns), then a simple plain text file is usually enough to store, transfer, or even use the produced data. However, to be more efficient, astronomers have defined the FITS binary table standard to store data in a binary format (which cannot be read in a plain-text editor). This can offer major advantages: the file size will be greatly reduced and the reading and writing will also be faster (because the RAM and CPU also work in binary). The acceptable table formats are fully described in *note Tables::.

Binary tables are not easily readable with basic plain-text editors. There is no fixed/unified standard on how the zeros and ones should be interpreted. Unix-like operating systems have flourished because of a simple fact: communication between the various tools is based on human readable characters(1).
So while the FITS table standards are very beneficial for the tools that recognize them, they are hard to use in the vast majority of available software. This creates limitations for their generic use. Table is Gnuastro’s solution to this problem. Table has a large set of operations that you can directly do on any recognized table (such as selecting certain rows and doing arithmetic on the columns). For operations that Table does not do internally, FITS tables (ASCII or binary) are directly accessible to the users of Unix-like operating systems (in particular those working the command-line or shell, see *note Command-line interface::). With Table, a FITS table (in binary or ASCII formats) is only one command away from AWK (or any other tool you want to use). Just like a plain text file that you read with the ‘cat’ command. You can pipe the output of Table into any other tool for higher-level processing, see the examples in *note Invoking asttable:: for some simple examples. In the sections below we describe how to effectively use the Table program. We start with *note Column arithmetic::, where the basic concept and methods of applying arithmetic operations on one or more columns are discussed. Afterwards, in *note Operation precedence in Table::, we review the various types of operations available and their precedence in an instance of calling Table. This is a good place to get a general feeling of all the things you can do with Table. Finally, in *note Invoking asttable::, we give some examples and describe each option in Table. ---------- Footnotes ---------- (1) In “The art of Unix programming”, Eric Raymond makes this suggestion to programmers: “When you feel the urge to design a complex binary file format, or a complex binary application protocol, it is generally wise to lie down until the feeling passes.”. This is a great book and strongly recommended, give it a look if you want to truly enjoy your work/life in this environment. 5.3.1 Column arithmetic ----------------------- In many scenarios, you want to apply some kind of operation on the columns and save them in another table or feed them into another program. With Table you can do a rich set of operations on the contents of one or more columns in a table, and save the resulting values as new column(s) in the output table. For seeing the precedence of Column arithmetic in relation to other Table operators, see *note Operation precedence in Table::. To enable column arithmetic, the first 6 characters of the value to ‘--column’ (‘-c’) should be the activation word ‘‘arith ’’ (note the space character in the end, after ‘‘arith’’). After the activation word, you can use reverse polish notation to identify the operators and their operands, see *note Reverse polish notation::. Just note that white-space characters are used between the tokens of the arithmetic expression and that they are meaningful to the command-line environment. Therefore the whole expression (including the activation word) has to be quoted on the command-line or in a shell script (see the examples below). To identify a column you can directly use its name, or specify its number (counting from one, see *note Selecting table columns::). When you are giving a column number, it is necessary to prefix the number with a ‘$’, similar to AWK. Otherwise the number is not distinguishable from a constant number to use in the arithmetic operation. 
for example, with the command below, the first two columns of ‘table.fits’ will be printed along with a third column that is the result of multiplying the first column with $10^{10}$ (for example, to convert wavelength from Meters to Angstroms). Note that without the ‘<$>’, it is not possible to distinguish between “1” as a column-counter, or “1” as a constant number to use in the arithmetic operation. Also note that because of the significance of <$> for the command-line environment, the single-quotes are the recommended quoting method (as in an AWK expression), not double-quotes (for the significance of using single quotes see the box below). $ asttable table.fits -c1,2 -c'arith $1 1e10 x' *Single quotes when string contains <$>*: On the command-line, or in shell-scripts, <$> is used to expand variables, for example, ‘echo $PATH’ prints the value (a string of characters) in the variable ‘PATH’, it will not simply print ‘$PATH’. This operation is also permitted within double quotes, so ‘echo "$PATH"’ will produce the same output. This is good when printing values, for example, in the command below, ‘$PATH’ will expand to the value within it. $ echo "My path is: $PATH" If you actually want to return the literal string ‘$PATH’, not the value in the ‘PATH’ variable (like the scenario here in column arithmetic), you should put it in single quotes like below. The printed value here will include the ‘$’, please try it to see for yourself and compare to above. $ echo 'My path is: $PATH' Therefore, when your column arithmetic involves the <$> sign (to specify columns by number), quote your ‘arith ’ string with a single quotation mark. Otherwise you can use both single or double quotes. Alternatively, if the columns have meta-data and the first two are respectively called ‘AWAV’ and ‘SPECTRUM’, the command above is equivalent to the command below. Note that the character ‘<$>’ is no longer necessary in this scenario (because names will not be confused with numbers): $ asttable table.fits -cAWAV,SPECTRUM -c'arith AWAV 1e10 x' Comparison of the two commands above clearly shows why it is recommended to use column names instead of numbers. When the columns have descriptive names, the command/script actually becomes much more readable, describing the intent of the operation. It is also independent of the low-level table structure: for the second command, the column numbers of the ‘AWAV’ and ‘SPECTRUM’ columns in ‘table.fits’ is irrelevant. Column arithmetic changes the values of the data within the column. So the old column meta data cannot be used any more. By default the output column of the arithmetic operation will be given a generic metadata (for example, its name will be ‘ARITH_1’, which is hardly useful!). But meta data are critically important and it is good practice to always have short, but descriptive, names for each columns, units and also some comments for more explanation. To add metadata to a column, you can use the ‘--colmetadata’ option that is described in *note Invoking asttable:: and *note Operation precedence in Table::. Since the arithmetic expressions are a value to ‘--column’, it does not necessarily have to be a separate option, so the commands above are also identical to the command below (note that this only has one ‘-c’ option). Just be very careful with the quoting! With the ‘--colmetadata’ option, we are also giving a name, units and a comment to the third column. 
$ asttable table.fits -cAWAV,SPECTRUM,'arith AWAV 1e10 x' \ --colmetadata=3,AWAV_A,angstrom,"Wavelength (in Angstroms)" In case you need to append columns from other tables (with ‘--catcolumnfile’), you can use those extra columns in column arithmetic also. The easiest, and most robust, way is that your columns of interest (in all files whose columns are to be merged) have different names. In this scenario, you can simply use the names of the columns you plan to append. If there are similar names, note that by default Table appends a ‘-N’ to similar names (where ‘N’ is the file counter given to ‘--catcolumnfile’, see the description of ‘--catcolumnfile’ for more). Using column numbers can get complicated: if the number is smaller than the main input’s number of columns, the main input’s column will be used. Otherwise (when the requested column number is larger than the main input’s number of columns), the final output (after appending all the columns from all the possible files) column number will be used. Almost all the arithmetic operators of *note Arithmetic operators:: are also supported for column arithmetic in Table. In particular, the few that are not present in the Gnuastro library(1) are not yet supported for column arithmetic. Besides the operators in *note Arithmetic operators::, several operators are only available in Table to use on table columns. ‘wcs-to-img’ Convert the given WCS positions to image/dataset coordinates based on the number of dimensions in the WCS structure of ‘--wcshdu’ extension/HDU in ‘--wcsfile’. It will output the same number of columns. The first popped operand is the last FITS dimension. for example, the two commands below (which have the same output) will produce 5 columns. The first three columns are the input table’s ID, RA and Dec columns. The fourth and fifth columns will be the pixel positions in ‘image.fits’ that correspond to each RA and Dec. $ asttable table.fits -cID,RA,DEC,'arith RA DEC wcs-to-img' \ --wcsfile=image.fits $ asttable table.fits -cID,RA -cDEC \ -c'arith RA DEC wcs-to-img' --wcsfile=image.fits ‘img-to-wcs’ Similar to ‘wcs-to-img’, except that image/dataset coordinates are converted to WCS coordinates. ‘distance-flat’ Return the distance between two points assuming they are on a flat surface. Note that each point needs two coordinates, so this operator needs four operands (currently it only works for 2D spaces). The first and second popped operands are considered to belong to one point and the third and fourth popped operands to the second point. Each of the input points can be a single coordinate or a full table column (containing many points). In other words, the following commands are all valid: $ asttable table.fits \ -c'arith X1 Y1 X2 Y2 distance-flat' $ asttable table.fits \ -c'arith X Y 12.345 6.789 distance-flat' $ asttable table.fits \ -c'arith 12.345 6.789 X Y distance-flat' In the first case we are assuming that ‘table.fits’ has the following four columns ‘X1’, ‘Y1’, ‘X2’, ‘Y2’. The returned column by this operator will be the difference between two points in each row with coordinates like the following (‘X1’, ‘Y1’) and (‘X2’, ‘Y2’). In other words, for each row, the distance between different points is calculated. In the second and third cases (which are identical), it is assumed that ‘table.fits’ has the two columns ‘X’ and ‘Y’. The returned column by this operator will be the difference of each row with the fixed point at (12.345, 6.789). 
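For instance, as a small sketch (assuming a hypothetical ‘cat.fits’ with pixel-coordinate columns ‘X’ and ‘Y’), the command below adds a column with each row’s distance from the point (500, 500), keeps the column metadata through the pipe, and then only keeps the rows that are closer than 100 pixels:

     $ asttable cat.fits -cX,Y -c'arith X Y 500 500 distance-flat' \
                --colmetadata=3,DIST,pixel,"Distance to reference point" \
                --colinfoinstdout \
                | asttable --range=DIST,0:100 --output=near.fits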
‘distance-on-sphere’ Return the spherical angular distance (along a great circle, in degrees) between the given two points. Note that each point needs two coordinates (in degrees), so this operator needs four operands. The first and second popped operands are considered to belong to one point and the third and fourth popped operands to the second point. Each of the input points can be a single coordinate or a full table column (containing many points). In other words, the following commands are all valid: $ asttable table.fits \ -c'arith RA1 DEC1 RA2 DEC2 distance-on-sphere' $ asttable table.fits \ -c'arith RA DEC 9.876 5.432 distance-on-sphere' $ asttable table.fits \ -c'arith 9.876 5.432 RA DEC distance-on-sphere' In the first case we are assuming that ‘table.fits’ has the following four columns ‘RA1’, ‘DEC1’, ‘RA2’, ‘DEC2’. The returned column by this operator will be the difference between two points in each row with coordinates like the following (‘RA1’, ‘DEC1’) and (‘RA2’, ‘DEC2’). In other words, for each row, the angular distance between different points is calculated. In the second and third cases (which are identical), it is assumed that ‘table.fits’ has the two columns ‘RA’ and ‘DEC’. The returned column by this operator will be the difference of each row with the fixed point at (9.876, 5.432). The distance (along a great circle) on a sphere between two points is calculated with the equation below, where $r_1$, $r_2$, $d_1$ and $d_2$ are the right ascensions and declinations of points 1 and 2. $$\cos(d)=\sin(d_1)\sin(d_2)+\cos(d_1)\cos(d_2)\cos(r_1-r_2)$$ ‘ra-to-degree’ Convert the hour-wise Right Ascension (RA) string, in the sexagesimal format of ‘_h_m_s’ or ‘_:_:_’, to degrees. Note that the input column has to have a string format. In FITS tables, string columns are well-defined. For plain-text tables, please follow the standards defined in *note Gnuastro text table format::, otherwise the string column will not be read. $ asttable catalog.fits -c'arith RA ra-to-degree' $ asttable catalog.fits -c'arith $5 ra-to-degree' ‘dec-to-degree’ Convert the sexagesimal Declination (Dec) string, in the format of ‘_d_m_s’ or ‘_:_:_’, to degrees (a single floating point number). For more details please see the ‘ra-to-degree’ operator. ‘degree-to-ra’ Convert degrees (a column with a single floating point number) to the Right Ascension, RA, string (in the sexagesimal format hours, minutes and seconds, written as ‘_h_m_s’). The output will be a string column so no further mathematical operations can be done on it. The output file can be in any format (for example, FITS or plain-text). If it is plain-text, the string column will be written following the standards described in *note Gnuastro text table format::. ‘degree-to-dec’ Convert degrees (a column with a single floating point number) to the Declination, Dec, string (in the format of ‘_d_m_s’). See the ‘degree-to-ra’ for more on the format of the output. ‘date-to-sec’ Return the number of seconds from the Unix epoch time (00:00:00 Thursday, January 1st, 1970). The input (popped) operand should be a string column in the FITS date format (most generally: ‘YYYY-MM-DDThh:mm:ss.ddd...’). The returned operand will be named ‘UNIXSEC’ (short for Unix-seconds) and will be a 64-bit, signed integer, see *note Numeric data types::. If the input string has sub-second precision, it will be ignored because floating point numbers cannot accurately store numbers with many significant digits. To preserve sub-second precision, please use ‘date-to-millisec’. 
For example, the command below uses this operator, in combination with the ‘--keyvalue’ option of the Fits program, to sort your desired FITS files by observation date (the value in the ‘DATE-OBS’ keyword):

     $ astfits *.fits --keyvalue=DATE-OBS --colinfoinstdout \
           | asttable -cFILENAME,'arith DATE-OBS date-to-sec' \
                      --colinfoinstdout \
           | asttable --sort=UNIXSEC

If you do not need to see the Unix-seconds any more, you can add a ‘-cFILENAME’ (short for ‘--column=FILENAME’) at the end. For more on ‘--keyvalue’, see *note Keyword inspection and manipulation::.

‘date-to-millisec’
     Return the number of milli-seconds from the Unix epoch time (00:00:00 Thursday, January 1st, 1970). The input (popped) operand should be a string column in the FITS date format (most generally: ‘YYYY-MM-DDThh:mm:ss.ddd...’, where ‘.ddd’ is the optional sub-second component). The returned operand will be named ‘UNIXMILLISEC’ (short for Unix milli-seconds) and will be a 64-bit, signed integer, see *note Numeric data types::. The returned value is not a floating point type because for large numbers, floating point data types lose single-digit precision (which is important here).

     Other than the units of the output, this operator behaves similarly to ‘date-to-sec’. See the description of that operator for an example.

   ---------- Footnotes ----------

   (1) For a list of the Gnuastro library arithmetic operators, please see the macros starting with ‘GAL_ARITHMETIC_OP’ and ending with the operator name in *note Arithmetic on datasets::.

5.3.2 Operation precedence in Table
-----------------------------------

The Table program can do many operations on the rows and columns of the input tables, and they are not always applied in the order you call the operation on the command-line. In this section we will describe which operation is done before/after which operation. Knowing this precedence table is important to avoid confusion when you ask for more than one operation. For a description of each option, please see *note Invoking asttable::.

Column information (‘--information’ or ‘-i’)
     When given this option, the column data are not read at all. Table simply reads the column metadata (name, units, numeric data type and comments) and the number of rows, prints them, and then terminates; no other operation is done. It can therefore be added to the end of an arbitrarily long Table command just to check the column metadata; you can then retrieve the command from the shell’s history (with the up-arrow key), delete the ‘-i’, and continue writing the rest of the command.

Column selection (‘--column’)
     When this option is given, only the columns given to this option (from the main input) will be used for all future steps. When ‘--column’ (or ‘-c’) is not given, then all the main input’s columns will be used in the next steps.

Column(s) from other file(s) (‘--catcolumnfile’, ‘--catcolumnhdu’ and ‘--catcolumns’)
     When column concatenation (addition) is requested, columns from other tables (in other files, or other HDUs of the same FITS file) will be added after the existing columns read from the main input. In one command, you can call these options multiple times to allow addition of columns from many files.

     The rest of the operations below are done on the rows, therefore you can merge the columns of various tables into one table, then start adding/limiting the rows of the output. If any of the row-based operations below are requested in the same ‘asttable’ command, they will also be applied to the rows of the added columns.
     However, the conditions to keep/reject rows can only be applied to the rows of the columns in the main input table (not the columns that are added with these options).

Rows from other file(s) (‘--catrowfile’ and ‘--catrowhdu’)
     With this feature, you can import rows from other tables (in other files, or other HDUs of the same FITS file). The same column selection of ‘--column’ is applied to the tables given here. The column metadata (name, units and comments) will be taken from the main input. Two conditions are mandatory for adding rows:
        • The number of columns used from the new tables must be equal to the number of columns in memory, by the time control reaches here.
        • The data type of each column (see *note Numeric data types::) should be the same as the respective column in memory by the time control reaches here. If the data types are different, you can use the type conversion operators of Table’s column arithmetic on the inputs in a separate command first (see *note Numerical type conversion operators:: and *note Column arithmetic::).

Row selection by value in a column
        • ‘--range’: only keep rows within a certain interval in the given column.
        • ‘--inpolygon’: only keep rows within the polygon of ‘--polygon’.
        • ‘--outpolygon’: only keep rows outside the polygon of ‘--polygon’.
        • ‘--equal’: only keep rows with the specified value in the given column.
        • ‘--notequal’: only keep rows without the specified value in the given column.
        • ‘--noblank’: only keep rows that are not blank in the given column(s).

     These options take certain column(s) as input and remove some rows from the full table (all columns), based on the given limitations. They can be called any number of times (to limit the final rows based on values in different columns, for example). Since these are row-rejection operations, their internal order is irrelevant. In other words, it makes no difference if ‘--equal’ is called before or after ‘--range’, for example.

     As a side-effect, because NaN/blank values are defined to fail on any condition, these operations will also remove rows with NaN/blank values in the specified column they are checking. Also, the columns that are used for these operations do not necessarily have to be in the final output table (you may not need the column after doing the selection based on it).

     Even though these options are applied after merging columns from other tables, currently their condition-columns can only come from the main input table. In other words, even though the rows of the added columns (from another file) will also be selected with these options, the condition to keep/reject rows cannot be taken from the newly added columns.

     These options are applied before the operations below because the speed of later operations can be greatly affected by the number of rows. For example, if you also call the ‘--sort’ option, and your row selection will result in 50 rows (from an input of 1000 rows), limiting the number of rows first can greatly speed up the sorting in your final output.

Sorting (‘--sort’)
     Sort the rows based on the values in a certain column. The column to sort by can only come from the main input table columns (not columns that may have been added with ‘--catcolumnfile’).

Row selection (by position)
        • ‘--head’: keep only the requested number of top rows.
        • ‘--tail’: keep only the requested number of bottom rows.
        • ‘--rowrandom’: keep only a random number of rows.
        • ‘--rowrange’: keep only rows within a certain positional interval.

     These options limit/select rows based on their position within the table (not their value in any certain column).
Column arithmetic Once the final rows are selected in the requested order, column arithmetic is done (if requested). For more on column arithmetic, see *note Column arithmetic::. Column metadata (‘--colmetadata’) Changing column metadata is necessary after column arithmetic or adding new columns from other tables (that were done above). Output row selection (‘--noblankend’) Only keep the output rows that do not have a blank value in the given column(s). For example, you may need to apply arithmetic operations on the columns (through *note Column arithmetic::) before rejecting the undesired rows. After the arithmetic operation is done, you can use the ‘where’ operator to set the non-desired columns to NaN/blank and use ‘--noblankend’ option to remove them just before writing the output. In other scenarios, you may want to remove blank values based on columns in another table. You can also use the modified metadata of the previous steps to use updated names! See the example below for applying any generic value-based row selection based on ‘--noblankend’. As an example, let’s review how Table interprets the command below. We are assuming that ‘table.fits’ contains at least three columns: ‘RA’, ‘DEC’ and ‘PARAM’ and you only want the RA and Dec of the rows where $p\times 2<5$ ($p$ is the value of each row in the ‘PARAM’ column). asttable table.fits -cRA,DEC --noblankend=MULTIP \ -c'arith PARAM 2 x set-i i i 5 gt nan where' \ --colmetadata=3,MULTIP,unit,"Description of column" Due to the precedence described in this section, Table does these operations (which are independent of the order of the operations written on the command-line): 1. At the start (with ‘-cRA,DEC’), Table reads the ‘RA’ and ‘DEC’ columns. 2. In between all the operations in the command above, Column arithmetic (with ‘-c'arith ...'’) has the highest precedence. So the arithmetic operation is done and stored as a new (third) column. In this arithmetic operation, we multiply all the values of the ‘PARAM’ column by 2, then set all those with a value larger than 5 to NaN (for more on understanding this operation, see the ‘‘set-’’ and ‘‘where’’ operators in *note Arithmetic operators::). 3. Updating column metadata (with ‘--colmetadata’) is then done to give a name (‘MULTIP’) to the newly calculated (third) column. During the process, besides a name, we also set a unit and description for the new column. These metadata entries are _very important_, so always be sure to add metadata after doing column arithmetic. 4. The lowest precedence operation is ‘--noblankend=MULTIP’. So only rows that are not blank/NaN in the ‘MULTIP’ column are kept. 5. Finally, the output table (with three columns) is written to the command-line. If you also want to print the column metadata, you can use the ‘--colinfoinstdout’ option. Alternatively, if you want the output in a file, you can use the ‘--output’ option to save the table in FITS or plain-text format. *Out of precedence:* It may happen that your desired operation needs a separate precedence. In this case you can pipe the output of Table into another call of Table and use the ‘--colinfoinstdout’ option to preserve the metadata between the two calls. For example, let’s assume that you want to sort the output table from the example command above based on the new ‘MULTIP’ column. 
Since sorting is done prior to column arithmetic, you cannot do it in one command, but you can circumvent this limitation by simply piping the output (including metadata) to another call to Table:

     asttable table.fits -cRA,DEC --noblankend=MULTIP --colinfoinstdout \
              -c'arith PARAM 2 x set-i i i 5 gt nan where' \
              --colmetadata=3,MULTIP,unit,"Description of column" \
              | asttable --sort=MULTIP --output=selected.fits

5.3.3 Invoking Table
--------------------

Table will read/write, select, modify, or show the information of the rows and columns in recognized Table formats (including FITS binary, FITS ASCII, and plain text table files, see *note Tables::). Output columns can also be determined by number or regular expression matching of column names, units, or comments. The executable name is ‘asttable’ with the following general template

     $ asttable [OPTION...] InputFile

One line examples:

     ## Get the table column information (name, data type, or units):
     $ asttable bintab.fits --information

     ## Print columns named RA and DEC, followed by all the columns
     ## where the name starts with "MAG_":
     $ asttable bintab.fits --column=RA --column=DEC --column=/^MAG_/

     ## Similar to the above, but with one call to `--column' (or
     ## `-c'), also sort the rows by the input's photometric redshift
     ## (`Z_PHOT') column. To confirm the sort, you can add `Z_PHOT'
     ## to the columns to print.
     $ asttable bintab.fits -cRA,DEC,/^MAG_/ --sort=Z_PHOT

     ## Similar to the above, but only print rows that have a
     ## photometric redshift between 2 and 3.
     $ asttable bintab.fits -cRA,DEC,/^MAG_/ --range=Z_PHOT,2:3

     ## Only print rows with a value in the 10th column above 100000:
     $ asttable bintab.fits --range=10,10e5,inf

     ## Only print the 2nd column, and the third column multiplied by
     ## 5; save the resulting two columns in `table.txt'.
     $ asttable bintab.fits -c2,'arith $2 5 x' -otable.txt

     ## Sort the output rows by the third column, save output:
     $ asttable bintab.fits --sort=3 -ooutput.txt

     ## Subtract the first column from the second in `cat.txt' (can
     ## also be a FITS table) and keep the third and fourth columns.
     $ asttable cat.txt -c'arith $2 $1 -',3,4 -ocat.fits

     ## Convert sexagesimal coordinates to degrees (same can be done
     ## in a large table given as argument).
     $ echo "7h34m35.5498 31d53m14.352s" | asttable

     ## Convert RA and Dec in degrees to sexagesimal (same can be done
     ## in a large table given as argument).
     $ echo "113.64812416667 31.88732" \
           | asttable -c'arith $1 degree-to-ra $2 degree-to-dec'

Table’s input dataset can be given either as a file or from Standard input (piped from another program, see *note Standard input::). In the absence of selected columns, all the input’s columns and rows will be written to the output. The full set of operations Table can do is described in detail below, but for a more high-level introduction to the various operations and their precedence, see *note Operation precedence in Table::.

If any output file is explicitly requested (with ‘--output’), the output table will be written to it. When no output file is explicitly requested, the output table will be written to the standard output. If the specified output is a FITS file, the type of FITS table (binary or ASCII) will be determined from the ‘--tabletype’ option. If the output is not a FITS file, it will be printed as a plain text table (with space characters between the columns).
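As a small sketch (with a hypothetical ‘cat.fits’ that contains a ‘MAGNITUDE’ column), the first command below writes the selected column into a FITS table, while the second prints it to the standard output and pipes it into AWK to compute its mean:

     $ asttable cat.fits -cMAGNITUDE --output=mag.fits
     $ asttable cat.fits -cMAGNITUDE | awk '{s+=$1} END {print s/NR}'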
When the output is not binary (for example standard output or a plain-text), the ‘--txtf32*’ or ‘--txtf64*’ options can be used for the formatting of floating point columns. When the columns are accompanied by meta-data (like column name, units, or comments), this information will also printed in the plain text file before the table, as described in *note Gnuastro text table format::. For the full list of options common to all Gnuastro programs please see *note Common options::. Options can also be stored in directory, user or system-wide configuration files to avoid repeating on the command-line, see *note Configuration files::. Table does not follow Automatic output that is common in most Gnuastro programs, see *note Automatic output::. Thus, in the absence of an output file, the selected columns will be printed on the command-line with no column information, ready for redirecting to other tools like ‘awk’. *Sexagesimal coordinates as floats in plain-text tables:* When a column is determined to be a floating point type (32-bit or 64-bit) in a plain-text table, it can contain sexagesimal values in the format of ‘‘_h_m_s’’ (for RA) and ‘‘_d_m_s’’ (for Dec), where the ‘‘_’’s are place-holders for numbers. In this case, the string will be immediately converted to a single floating point number (in units of degrees) and stored in memory with the rest of the column or table. Besides being useful in large tables, with this feature, conversion to sexagesimal coordinates to degrees becomes very easy, for example: echo "7h34m35.5498 31d53m14.352s" | asttable The inverse can also be done with the more general column arithmetic operators: echo "113.64812416667 31.88732" \ | asttable -c'arith $1 degree-to-ra $2 degree-to-dec' If you want to preserve the sexagesimal contents of a column, you should store that column as a string, see *note Gnuastro text table format::. ‘-i’ ‘--information’ Only print the column information in the specified table on the command-line and exit. Each column’s information (number, name, units, data type, and comments) will be printed as a row on the command-line. Note that the FITS standard only requires the data type (see *note Numeric data types::), and in plain text tables, no meta-data/information is mandatory. Gnuastro has its own convention in the comments of a plain text table to store and transfer this information as described in *note Gnuastro text table format::. This option will take precedence over all other operations in Table, so when it is called along with other operations, they will be ignored, see *note Operation precedence in Table::. This can be useful if you forget the identifier of a column after you have already typed some on the command-line. You can simply add a ‘-i’ to your already-written command (without changing anything) and run Table, to see the whole list of column names and information. Then you can use the shell history (with the up arrow key on the keyboard), and retrieve the last command with all the previously typed columns present, delete ‘-i’ and add the identifier you had forgot. ‘-c STR/INT’ ‘--column=STR/INT’ Set the output columns either by specifying the column number, or name. For more on selecting columns, see *note Selecting table columns::. If a value of this option starts with ‘‘arith ’’, column arithmetic will be activated, allowing you to edit/manipulate column contents. For more on column arithmetic see *note Column arithmetic::. 
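For example, the call below is a rough sketch of using column arithmetic within this option (the file and column names are hypothetical): besides ‘RA’ and ‘DEC’, it also prints a new column containing the difference of two magnitude columns (a color).
     $ asttable cat.fits -cRA,DEC -c'arith MAG_F105W MAG_F160W -'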
To ask for multiple columns this option can be used in two ways: 1) multiple calls to this option, 2) using a comma between each column specifier in one call to this option. These different solutions may be mixed in one call to Table: for example, ‘‘-cRA,DEC,MAG’’, or ‘‘-cRA,DEC -cMAG’’ are both equivalent to ‘‘-cRA -cDEC -cMAG’’. The order of the output columns will be the same order given to the option or in the configuration files (see *note Configuration file precedence::). This option is not mandatory, if no specific columns are requested, all the input table columns are output. When this option is called multiple times, it is possible to output one column more than once. ‘-w FITS’ ‘--wcsfile=FITS’ FITS file that contains the WCS to be used in the ‘wcs-to-img’ and ‘img-to-wcs’ operators of *note Column arithmetic::. The extension name/number within the FITS file can be specified with ‘--wcshdu’. If the value to this option is ‘‘none’’, no WCS will be written in the output. ‘-W STR’ ‘--wcshdu=STR’ FITS extension/HDU in the FITS file given to ‘--wcsfile’ (see the description of ‘--wcsfile’ for more). ‘-L FITS/TXT’ ‘--catcolumnfile=FITS/TXT’ Concatenate (or add, or append) the columns of this option’s value (a filename) to the output columns. This option may be called multiple times (to add columns from more than one file into the final output), the columns from each file will be added in the same order that this option is called. The number of rows in the file(s) given to this option has to be the same as the input table (before any type of row-selection), see *note Operation precedence in Table::. By default all the columns of the given file will be appended, if you only want certain columns to be appended, use the ‘--catcolumns’ option to specify their name or number (see *note Selecting table columns::). Note that the columns given to ‘--catcolumns’ must be present in all the given files (if this option is called more than once with more than one file). If the file given to this option is a FITS file, it is necessary to also define the corresponding HDU/extension with ‘--catcolumnhdu’. Also note that no operation (such as row selection and arithmetic) is applied to the table given to this option. If the appended columns have a name, and their name is already present in the table before adding those columns, the column names of each file will be appended with a ‘-N’, where ‘N’ is a counter starting from 1 for each appended table. Just note that in the FITS standard (and thus in Gnuastro), column names are not case-sensitive. This is done because when concatenating columns from multiple tables (more than two) into one, they may have the same name, and it is not good practice to have multiple columns with the same name. You can disable this feature with ‘--catcolumnrawname’. Generally, you can use the ‘--colmetadata’ option to update column metadata in the same command, after all the columns have been concatenated. For example, let’s assume you have two catalogs of the same objects (same number of rows) in different filters. Such that ‘f160w-cat.fits’ has a ‘MAGNITUDE’ column that has the magnitude of each object in the ‘F160W’ filter and similarly ‘f105w-cat.fits’, also has a ‘MAGNITUDE’ column, but for the ‘F105W’ filter. 
You can use column concatenation like below to import the ‘MAGNITUDE’ column from the ‘F105W’ catalog into the ‘F160W’ catalog, while giving each magnitude column a different name:
     $ asttable f160w-cat.fits --output=both.fits \
                --catcolumnfile=f105w-cat.fits --catcolumns=MAGNITUDE \
                --colmetadata=MAGNITUDE,MAG-F160W,log,"Magnitude in F160W" \
                --colmetadata=MAGNITUDE-1,MAG-F105W,log,"Magnitude in F105W"
For a more complete example, see *note Working with catalogs estimating colors::. *Loading external columns with Arithmetic:* an alternative way to load external columns into your output is to use column arithmetic (*note Column arithmetic::), in particular the ‘load-col-’ operator described in *note Loading external columns::. However, that operator will load only one column per file/HDU every time it is called. So if you have many columns to insert, it is much faster to use ‘--catcolumnfile’: it will load all the columns in one opening of the file, and possibly even read them all into memory in parallel! ‘-u STR/INT’ ‘--catcolumnhdu=STR/INT’ The HDU/extension of the FITS file(s) that should be concatenated, or appended, by column with ‘--catcolumnfile’. If ‘--catcolumnfile’ is called more than once with more than one FITS file, it is necessary to call this option more than once. The HDUs will be loaded in the same order as the FITS files given to ‘--catcolumnfile’. ‘-C STR/INT’ ‘--catcolumns=STR/INT’ The column(s) in the file(s) given to ‘--catcolumnfile’ to append. When this option is not given, all the columns will be concatenated. See ‘--catcolumnfile’ for more. ‘--catcolumnrawname’ Do not modify the names of the concatenated (appended) columns, see description in ‘--catcolumnfile’. ‘-R FITS/TXT’ ‘--catrowfile=FITS/TXT’ Add the rows of the given file to the output table. The selected columns in the tables given to this option should have the same number and datatype as the output table’s columns at the point that control reaches this phase (after column selection and column concatenation), for more see *note Operation precedence in Table::. For example, if ‘a.fits’, ‘b.fits’ and ‘c.fits’ have the columns ‘RA’, ‘DEC’ and ‘MAGNITUDE’ (possibly in different column-numbers in their respective table, along with many more columns), the command below will add their rows into the final output that will only have these three columns:
     $ asttable a.fits --catrowfile=b.fits --catrowhdu=1 \
                --catrowfile=c.fits --catrowhdu=1 \
                -cRA,DEC,MAGNITUDE --output=allrows.fits
*How to avoid repetition when adding rows:* this option will simply add the rows of multiple tables into one; it does not check their contents! Therefore if you use this option on multiple catalogs that may have some shared physical objects in some of their rows, those rows/objects will be repeated in the final table. In such scenarios, to avoid potential repetition, it is better to use *note Match:: (with ‘--notmatched’ and ‘--outcols=AAA,BBB’) instead of Table. For more on using Match for this scenario, see the description of ‘--outcols’ in *note Invoking astmatch::. ‘-X STR’ ‘--catrowhdu=STR’ The HDU/extension of the FITS file(s) that should be concatenated, or appended, by rows with ‘--catrowfile’. If ‘--catrowfile’ is called more than once with more than one FITS file, it is necessary to call this option more than once also (once for every FITS table given to ‘--catrowfile’). The HDUs will be loaded in the same order as the FITS files given to ‘--catrowfile’.
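As a rough sketch of the column-concatenation options above (the extension name is hypothetical), if the tables of the two-filter example earlier are stored in an extension called ‘CAT’ of each file, the HDUs can be given explicitly (one ‘--catcolumnhdu’ for every ‘--catcolumnfile’, in the same order):
     $ asttable f160w-cat.fits --hdu=CAT --output=both.fits \
                --catcolumnfile=f105w-cat.fits --catcolumnhdu=CAT \
                --catcolumns=MAGNITUDE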
‘-O’ ‘--colinfoinstdout’ Add column metadata when the output is printed in the standard output. Usually the standard output is used for a fast visual check, or to pipe into other metadata-agnostic programs (like AWK) for further processing. So by default meta-data are not included. But when piping to other Gnuastro programs (where metadata can be interpreted and used) it is recommended to use this option and use column names in the next program. ‘-r STR,FLT:FLT’ ‘--range=STR,FLT:FLT’ Only output rows that have a value within the given range in the ‘STR’ column (can be a name or counter). Note that the range is only inclusive in the lower-limit. for example, with ‘--range=sn,5:20’ the output’s columns will only contain rows that have a value in the ‘sn’ column (not case-sensitive) that is greater or equal to 5, and less than 20. Also you can use the comma for separating the values such as this ‘--range=sn,5,20’. For the precedence of this operation in relation to others, see *note Operation precedence in Table::. This option can be called multiple times (different ranges for different columns) in one run of the Table program. This is very useful for selecting the final rows from multiple criteria/columns. The chosen column does not have to be in the output columns. This is good when you just want to select using one column’s values, but do not need that column anymore afterwards. For one example of using this option, see the example under ‘--sigclip-median’ in *note Invoking aststatistics::. ‘--inpolygon=STR1,STR2’ Only return rows where the given coordinates are inside the polygon specified by the ‘--polygon’ option. The coordinate columns are the given ‘STR1’ and ‘STR2’ columns, they can be a column name or counter (see *note Selecting table columns::). For the precedence of this operation in relation to others, see *note Operation precedence in Table::. Note that the chosen columns does not have to be in the output columns (which are specified by the ‘--column’ option). for example, if we want to select rows in the polygon specified in *note Dataset inspection and cropping::, this option can be used like this (you can remove the double quotations and write them all in one line if you remove the white-spaces around the colon separating the column vertices): asttable table.fits --inpolygon=RA,DEC \ --polygon="53.187414,-27.779152 \ : 53.159507,-27.759633 \ : 53.134517,-27.787144 \ : 53.161906,-27.807208" \ *Flat/Euclidean space: * The ‘--inpolygon’ option assumes a flat/Euclidean space so it is only correct for RA and Dec when the polygon size is very small like the example above. If your polygon is a degree or larger, it may not return correct results. Please get in touch if you need such a feature (see *note Suggest new feature::). ‘--outpolygon=STR1,STR2’ Only return rows where the given coordinates are outside the polygon specified by the ‘--polygon’ option. This option is very similar to the ‘--inpolygon’ option, so see the description there for more. ‘--polygon=STR’ ‘--polygon=FLT,FLT:FLT,FLT:...’ The polygon to use for the ‘--inpolygon’ and ‘--outpolygon’ options. This option is parsed in an identical way to the same option in the Crop program, so for more information on how to use it, see *note Crop options::. ‘-e STR,INT/FLT,...’ ‘--equal=STR,INT/FLT,...’ Only output rows that are equal to the given number(s) in the given column. The first argument is the column identifier (name or number, see *note Selecting table columns::), after that you can specify any number of values. 
For the precedence of this operation in relation to others, see *note Operation precedence in Table::. For example, ‘--equal=ID,5,6,8’ will only print the rows that have a value of 5, 6, or 8 in the ‘ID’ column. This option can also be called multiple times, so ‘--equal=ID,4,5 --equal=ID,6,7’ has the same effect as ‘--equal=ID,4,5,6,7’. *Equality and floating point numbers:* Floating point numbers are only approximate values (see *note Numeric data types::). In this context, their equality depends on how the input table was originally stored (as a plain text table or as an ASCII/binary FITS table). If you want to select floating point numbers, it is strongly recommended to use the ‘--range’ option and set a very small interval around your desired number; do not use ‘--equal’ or ‘--notequal’. The ‘--equal’ and ‘--notequal’ options also work when the given column has a string type. In this case the given value to the option will also be parsed as a string, not as a number. When dealing with string columns, be careful with trailing white space characters (the actual value may be adjusted to the right, left, or center of the column’s width). If you need to account for such white spaces, you can use shell quoting. For example, ‘--equal=NAME," myname "’. *Strings with a comma (,):* When your desired column values contain a comma, you need to put a ‘‘\’’ before the internal comma (within the value). Otherwise, the comma will be interpreted as a delimiter between multiple values, and anything after it will be interpreted as a separate string. For example, assume column ‘AB’ of your ‘table.fits’ contains the value ‘‘cd,ef’’ in your desired rows. To extract those rows, you should use the command below:
     $ asttable table.fits --equal=AB,cd\,ef
‘-n STR,INT/FLT,...’ ‘--notequal=STR,INT/FLT,...’ Only output rows that are _not_ equal to the given number(s) in the given column. The first argument is the column identifier (name or number, see *note Selecting table columns::), after that you can specify any number of values. For example, ‘--notequal=ID,5,6,8’ will only print the rows where the ‘ID’ column does not have a value of 5, 6, or 8. This option can also be called multiple times, so ‘--notequal=ID,4,5 --notequal=ID,6,7’ has the same effect as ‘--notequal=ID,4,5,6,7’. Be very careful if you want to use the non-equality with floating point numbers, see the special note under ‘--equal’ for more. This option also works when the given column has a string type, see the description under ‘--equal’ (above) for more. ‘-b STR[,STR[,STR]]’ ‘--noblank=STR[,STR[,STR]]’ Only output rows that are _not_ blank in the given column(s) of the _input_ table. Like above, the columns can be specified by their name or number (counting from 1). This option can be called multiple times, so ‘--noblank=MAG --noblank=PHOTOZ’ is equivalent to ‘--noblank=MAG,PHOTOZ’. For the precedence of this operation in relation to others, see *note Operation precedence in Table::. For example, if ‘table.fits’ has blank values (NaN in floating point types) in the ‘magnitude’ and ‘sn’ columns, with ‘--noblank=magnitude,sn’, the output will not contain any rows with blank values in these two columns. If you want _all_ columns to be checked, simply set the value to ‘_all’ (in other words: ‘--noblank=_all’). This mode is useful when there are many columns in the table and you want a “clean” output table (with no blank values in any column): entering their name or number one-by-one can be buggy and frustrating.
In this mode, no other column name should be given. For example, if you give ‘--noblank=_all,magnitude’, then Table will assume that your table actually has columns named ‘_all’ and ‘magnitude’, and if they do not exist, it will abort with an error. If you want to change column values using *note Column arithmetic:: (and set some to blank, to later remove), or you want to select rows based on columns that you have imported from other tables, you should use the ‘--noblankend’ option described below. Also, see *note Operation precedence in Table::. ‘-s STR’ ‘--sort=STR’ Sort the output rows based on the values in the ‘STR’ column (can be a column name or number). By default the sort is done in ascending/increasing order; to sort in descending order, use ‘--descending’. For the precedence of this operation in relation to others, see *note Operation precedence in Table::. The chosen column does not have to be in the output columns. This is good when you just want to sort using one column’s values, but do not need that column anymore afterwards. ‘-d’ ‘--descending’ When called with ‘--sort’, rows will be sorted in descending order. ‘-H INT’ ‘--head=INT’ Only print the given number of rows from the _top_ of the final table. Note that this option only affects the _output_ table. For example, if you use ‘--sort’, or ‘--range’, the printed rows are the first _after_ applying the sorting, or the range selection, to the full input. This option cannot be called with ‘--tail’, ‘--rowrange’ or ‘--rowrandom’. For the precedence of this operation in relation to others, see *note Operation precedence in Table::. If the given value to ‘--head’ is 0, the output columns will not have any rows, and if it is larger than the number of rows in the input table, all the rows are printed (this option is effectively ignored). This behavior is taken from the ‘head’ program in GNU Coreutils. ‘-t INT’ ‘--tail=INT’ Only print the given number of rows from the _bottom_ of the final table. See ‘--head’ for more. This option cannot be called with ‘--head’, ‘--rowrange’ or ‘--rowrandom’. ‘--rowrange=INT,INT’ Only return the rows within the requested positional range (inclusive on both sides). Therefore, ‘--rowrange=5,7’ will return 3 of the input rows: rows 5, 6 and 7. This option will abort if any of the given values is larger than the total number of rows in the table. For the precedence of this operation in relation to others, see *note Operation precedence in Table::. With the ‘--head’ or ‘--tail’ options you can only see the top or bottom few rows. However, with this option, you can limit the returned rows to a contiguous set of rows in the middle of the table. Therefore this option cannot be called with ‘--head’, ‘--tail’, or ‘--rowrandom’. ‘--rowrandom=INT’ Select ‘INT’ rows from the input table at random (assuming a uniform distribution). This option is applied _after_ the value-based selection options (such as ‘--sort’, ‘--range’, and ‘--polygon’). On the other hand, only the row counters are randomly selected; this option does not change the order. Therefore, if ‘--rowrandom’ is called together with ‘--sort’, the returned rows are still sorted. This option cannot be called with ‘--head’, ‘--tail’, or ‘--rowrange’. For the precedence of this operation in relation to others, see *note Operation precedence in Table::. This option will only have an effect if ‘INT’ is smaller than the number of rows when it is activated (after the value-based selection options have been applied).
When there are fewer rows, a warning is printed, saying that this option has no effect. The warning can be disabled with the ‘--quiet’ option. Due to its nature (to be random), the output of this option differs in each run. Therefore 5 calls to Table with ‘--rowrandom’ on the same input table will generate 5 different outputs. If you want a reproducible random selection, set the ‘GSL_RNG_SEED’ environment variable and also use the ‘--envseed’ option, for more see *note Generating random numbers::. ‘--envseed’ Read the random number generator seed from the ‘GSL_RNG_SEED’ environment variable for ‘--rowrandom’ (instead of generating a different seed internally on every run). This is useful if you want a reproducible random selection of the input rows. For more, see *note Generating random numbers::. ‘-E STR[,STR[,STR]]’ ‘--noblankend=STR[,STR[,STR]]’ Remove all rows that have a blank value in the requested _output_ column(s). Like above, the columns can be specified by their name or number (counting from 1). This option can be called multiple times, so ‘--noblankend=MAG --noblankend=PHOTOZ’ is equivalent to ‘--noblankend=MAG,PHOTOZ’. For the precedence of this operation in relation to others, see *note Operation precedence in Table::. For example, if your final output table (possibly after column arithmetic, or adding new columns) has blank values (NaN in floating point types) in the ‘magnitude’ and ‘sn’ columns, with ‘--noblankend=magnitude,sn’, the output will not contain any rows with blank values in these two columns. If you want blank values to be removed from the main input table _before_ any further processing (like adding columns, sorting or column arithmetic), you should use the ‘--noblank’ option. With the ‘--noblank’ option, the given column(s) do not have to be in the output (they are only used temporarily for reading the input and selecting rows). However, the column(s) given to this option (‘--noblankend’) must exist in the output. If you want _all_ columns to be checked, simply set the value to ‘_all’ (in other words: ‘--noblankend=_all’). This mode is useful when there are many columns in the table and you want a “clean” output table (with no blank values in any column): entering their name or number one-by-one can be buggy and frustrating. In this mode, no other column name should be given. For example, if you give ‘--noblankend=_all,magnitude’, then Table will assume that your table actually has columns named ‘_all’ and ‘magnitude’, and if they do not exist, it will abort with an error. This option is applied just before writing the final table (after ‘--colmetadata’ has finished). So in case you changed the column metadata, or added new columns, you can use the new names, or the newly defined column numbers. For the precedence of this operation in relation to others, see *note Operation precedence in Table::. ‘-m STR/INT,STR[,STR[,STR]]’ ‘--colmetadata=STR/INT,STR[,STR[,STR]]’ Update the specified column metadata in the output table. This option is applied after all other column-related operations are complete, for example, column arithmetic, or column concatenation. For the precedence of this operation in relation to others, see *note Operation precedence in Table::. The first value (before the first comma) given to this option is the column’s identifier. It can either be a counter (positive integer, counting from 1), or a name (the column’s name in the output if this option was not called).
After the to-be-updated column is identified, at least one other string should be given, with a maximum of three strings. The first string after the original name will the selected column’s new name. The next (optional) string will be the selected column’s unit and the third (optional) will be its comments. If the two optional strings are not given, the original column’s units or comments will remain unchanged. If any of the values contains a comma, you should place a ‘‘\’’ before the comma to avoid it getting confused with a delimiter. for example, see the command below for a column description that contains a comma: $ asttable table.fits \ --colmetadata=NAME,UNIT,"Comments\, with a comma" Generally, since the comma is commonly used as a delimiter in many scenarios, to avoid complicating your future analysis with the table, it is best to avoid using a comma in the column name and units. Some examples of this option are available in the tutorials, in particular *note Working with catalogs estimating colors::. Here are some more specific examples: ‘--colmetadata=MAGNITUDE,MAG_F160W’ This will convert name of the original ‘MAGNITUDE’ column to ‘MAG_F160W’, leaving the unit and comments unchanged. ‘--colmetadata=3,MAG_F160W,mag’ This will convert name of the third column of the final output to ‘MAG_F160W’ and the units to ‘mag’, while leaving the comments untouched. ‘--colmetadata=MAGNITUDE,MAG_F160W,mag,"Magnitude in F160W filter"’ This will convert name of the original ‘MAGNITUDE’ column to ‘MAG_F160W’, and the units to ‘mag’ and the comments to ‘Magnitude in F160W filter’. Note the double quotations around the comment string, they are necessary to preserve the white-space characters within the column comment from the command-line, into the program (otherwise, upon reaching a white-space character, the shell will consider this option to be finished and cause un-expected behavior). If your table is large and generated by a script, you can first do all your operations on your table’s data and write it into a temporary file (maybe called ‘temp.fits’). Then, look into that file’s metadata (with ‘asttable temp.fits -i’) to see the exact column positions and possible names, then add the necessary calls to this option to your previous call to ‘asttable’, so it writes proper metadata in the same run (for example, in a script or Makefile). Recall that when a name is given, this option will update the metadata of the first column that matches, so if you have multiple columns with the same name, you can call this options multiple times with the same first argument to change them all to different names. Finally, if you already have a FITS table by other means (for example, by downloading) and you merely want to update the column metadata and leave the data intact, it is much more efficient to directly modify the respective FITS header keywords with ‘astfits’, using the keyword manipulation features described in *note Keyword inspection and manipulation::. ‘--colmetadata’ is mainly intended for scenarios where you want to edit the data so it will always load the full/partial dataset into memory, then write out the resulting datasets with updated/corrected metadata. ‘-f STR’ ‘--txtf32format=STR’ The plain-text format of 32-bit floating point columns when output is not binary (this option is ignored for binary outputs like FITS tables). The acceptable values are listed below. This is just the format of the plain-text outputs; see ‘--txtf32precision’ for customizing their precision. 
‘fixed’ Fixed-point notation (for example ‘1234567.89012’) ‘exp’ Exponential notation (for example ‘1.2345689012e+06’) ‘-p STR’ ‘--txtf32precision=INT’ Number of digits after the decimal point (precision) for columns with a 32-bit floating point datatype. ‘-d STR’ ‘--txtf64format=STR’ The plain-text format of 64-bit floating point columns when output is not binary (this option is ignored for binary outputs like FITS tables). The acceptable values are listed below. This is just the format of the plain-text outputs; see ‘--txtf64precision’ for customizing their precision. ‘fixed’ Fixed-point notation (for example ‘1234567.89012’) ‘exp’ Exponential notation (for example ‘1.2345689012e+06’) ‘-B STR’ ‘--txtf64precision=INT’ Number of digits after the decimal point (precision) for columns with a 64-bit floating point datatype. 5.4 Query ========= There are many astronomical databases available for downloading astronomical data. Most follow the International Virtual Observatory Alliance (IVOA, ) standards (and in particular the Table Access Protocol, or TAP(1)). With TAP, it is possible to submit your queries via a command-line downloader (for example, ‘curl’) to only get specific tables, targets (rows in a table) or measurements (columns in a table): you do not have to download the full table (which can be very large in some cases)! These customizations are done through the Astronomical Data Query Language (ADQL(2)). Therefore, if you are sufficiently familiar with TAP and ADQL, you can easily custom-download any part of an online dataset. However, you also need to keep a record of the URLs of each database and in many cases, the commands will become long and hard/buggy to type on the command-line. On the other hand, most astronomers do not know TAP or ADQL at all, and are forced to go to the database’s web page which is slow (it needs to download so many images, and has too much annoying information), requires manual interaction (further making it slow and buggy), and cannot be automated. Gnuastro’s Query program is designed to be the middle-man in this process: it provides a simple high-level interface to let you specify your constraints on what you want to download. It then internally constructs the command to download the data based on your inputs and runs it to download your desired data. Query also prints the full command before it executes it (if not called with ‘--quiet’). Also, if you ask for a FITS output table, the full command is written into its 0-th extension along with other input parameters to query (all Gnuastro programs generally keep their input configuration parameters as FITS keywords in the zero-th output). You can see it with Gnuastro’s Fits program, like below: $ astfits query-output.fits -h0 With the full command used to download the dataset, you only need a minimal knowledge of ADQL to do lower-level customizations on your downloaded dataset. You can simply copy that command and change the parts of the query string you want: ADQL is very powerful! For example, you can ask the server to do mathematical operations on the columns and apply selections after those operations, or combine/match multiple datasets. We will try to add high-level interfaces for such capabilities, but generally, do not limit yourself to the high-level operations (that cannot cover everything!). ---------- Footnotes ---------- (1) (2) 5.4.1 Available databases ------------------------- The current list of databases supported by Query are listed at the end of this section. 
To get the list of available datasets within each database, you can use the ‘--information’ option. for example, with the command below you can get a list of the roughly 100 datasets that are available within the ESA Gaia server with their description: $ astquery gaia --information However, other databases like VizieR host many more datasets (tens of thousands!). Therefore it is very inconvenient to get the _full_ information every time you want to find your dataset of interest (the full metadata file VizieR is more than 20Mb). In such cases, you can limit the downloaded and displayed information with the ‘--limitinfo’ option. for example, with the first command below, you can get all datasets relating to the MUSE (an instrument on the Very Large Telescope), and those that include Roland Bacon (Principle Investigator of MUSE) as an author (‘Bacon, R.’). Recall that ‘-i’ is the short format of ‘--information’. $ astquery vizier -i --limitinfo=MUSE $ astquery vizier -i --limitinfo="Bacon R." Once you find the recognized name of your desired dataset, you can see the column information of that dataset with adding the dataset name. For example, with the command below you can see the column metadata in the ‘J/A+A/608/A2/udf10’ dataset (one of the datasets in the search above) using this command: $ astquery vizier --dataset=J/A+A/608/A2/udf10 -i For very popular datasets of a database, Query provides an easier-to-remember short name that you can feed to ‘--dataset’. This short name will map to the officially recognized name of the dataset on the server. In this mode, Query will also set positional columns accordingly. for example, most VizieR datasets have an ‘RAJ2000’ column (the RA and the epoch of 2000) so it is the default RA column name for coordinate search (using ‘--center’ or ‘--overlapwith’). However, some datasets do not have this column (for example, SDSS DR12). So when you use the short name and Query knows about this dataset, it will internally set the coordinate columns that SDSS DR12 has: ‘RA_ICRS’ and ‘DEC_ICRS’. Recall that you can always change the coordinate columns with ‘--ccol’. for example, in the VizieR and Gaia databases, the recognized name for the early data release 3 data is respectively ‘I/350/gaiaedr3’ and ‘gaiaedr3.gaia_source’. These technical names can be hard to remember. Therefore Query provides ‘gaiaedr3’ (for VizieR) and ‘edr3’ (for ESA’s Gaia) shortcuts which you can give to ‘--dataset’ instead. They will be directly mapped to the fully recognized name by Query. In the list below that describes the available databases, the available short names are also listed. *Not all datasets support TAP:* Large databases like VizieR have TAP access for all their datasets. However, smaller databases have not implemented TAP for all their tables. Therefore some datasets that are searchable in their web interface may not be available for a TAP search. To see the full list of TAP-ed datasets in a database, use the ‘--information’ (or ‘-i’) option with the dataset name like the command below. $ astquery astron -i If your desired dataset is not in this list, but has web-access, contact the database maintainers and ask them to add TAP access for it. After they do it, you should see the name added to the output list of the command above. The list of databases recognized by Query (and their names in Query) is described below. Since Query is a new member of the Gnuastro family (first available in Gnuastro 0.14), this list will hopefully grow significantly in the next releases. 
If you have any particular datasets in mind, please let us know by sending an email to ‘bug-gnuastro@gnu.org’. If the dataset supports IVOA’s TAP (Table Access Protocol), it should be very easy to add. ‘astron’ The ASTRON Virtual Observatory service is a database focused on radio astronomy data and images, primarily those collected by ASTRON itself. A query to ‘astron’ is submitted to ‘https://vo.astron.nl/__system__/tap/run/tap/sync’. Here is the list of short names for dataset(s) in ASTRON’s VO service: • ‘tgssadr --> tgssadr.main’ ‘gaia’ The Gaia project database is a large collection of star positions on the celestial sphere, as well as peculiar velocities, parallaxes and magnitudes in some bands, among many others. Besides scientific studies (like studying resolved stellar populations in the Galaxy and its halo), Gaia is also invaluable for raw data calibrations, like astrometry. A query to ‘gaia’ is submitted to ‘https://gea.esac.esa.int/tap-server/tap/sync’. Here is the list of short names for popular datasets within Gaia: • ‘dr3 --> gaiadr3.gaia_source’ • ‘edr3 --> gaiaedr3.gaia_source’ • ‘dr2 --> gaiadr2.gaia_source’ • ‘dr1 --> gaiadr1.gaia_source’ • ‘tycho2 --> public.tycho2’ • ‘hipparcos --> public.hipparcos’ ‘ned’ The NASA/IPAC Extragalactic Database (NED) is a fusion database, integrating the information about extra-galactic sources from many large sky surveys into a single catalog. It covers the full spectrum, from Gamma rays to radio frequencies, and is updated when new data arrives. A TAP query to ‘ned’ is submitted to ‘https://ned.ipac.caltech.edu/tap/sync’. • ‘objdir --> NEDTAP.objdir’: default TAP-based dataset in NED. • ‘extinction’: A command-line interface to the NED Extinction Calculator (https://ned.ipac.caltech.edu/extinction_calculator). It only takes a central coordinate and returns a VOTable of the calculated extinction in many commonly used filters at that point. As a result, options like ‘--width’ or ‘--radius’ are not supported. However, Gnuastro does not yet support the VOTable format. Therefore, if you specify an ‘--output’ file, it should have an ‘.xml’ suffix and the downloaded file will not be checked. Until VOTable support is added to Gnuastro, you can use GREP, AWK and SED to convert the VOTable data into a FITS table with a command like below (assuming the queried VOTable is called ‘ned-extinction.xml’; the ‘sed’ expressions strip the VOTable’s ‘<TR>’ and ‘<TD>’ tags):
     grep '^<TR><TD>' ned-extinction.xml \
      | sed -e's|<TR><TD>||' \
            -e's|</TD></TR>||' \
            -e's|</TD><TD>|@|g' \
      | awk 'BEGIN{FS="@"; \
         print "# Column 1: FILTER [name,str15] Filter name"; \
         print "# Column 2: CENTRAL [um,f32] Central Wavelength"; \
         print "# Column 3: EXTINCTION [mag,f32] Galactic Ext."; \
         print "# Column 4: ADS_REF [ref,str50] ADS reference"} \
        {printf "%-15s %g %g %s\n", $1, $2, $3, $4}' \
      | asttable -oned-extinction.fits
Once the table is in FITS, you can easily get the extinction for a certain filter (for example, the ‘SDSS r’ filter) like the command below:
     asttable ned-extinction.fits --equal=FILTER,"SDSS r" \
              -cEXTINCTION
‘vizier’ VizieR is arguably the largest catalog database in astronomy: containing more than 20500 catalogs as of mid January 2021. Almost all published catalogs in major projects, and even the tables in many papers are archived and accessible here. For example, VizieR also has a full copy of the Gaia database mentioned above, with some additional standardized columns (like RA and Dec in J2000).
The current implementation of ‘--limitinfo’ only looks into the description of the datasets, but since VizieR is so large, there is still a lot of room for improvement. Until then, if ‘--limitinfo’ is not sufficient, you can use VizieR’s own web-based search for your desired dataset: Because VizieR curates such a diverse set of data from tens of thousands of projects and aims for interoperability between them, the column names in VizieR may not be identical to the column names in the surveys’ own databases (Gaia in the example above). A query to ‘vizier’ is submitted to ‘http://tapvizier.u-strasbg.fr/TAPVizieR/tap/sync’. Here is the list of short names for popular datasets within VizieR (sorted alphabetically by their short name). Please feel free to suggest other major catalogs (covering a wide area or commonly used in your field).. For details on each dataset with necessary citations, and links to web pages, look into their details with their ViziR names in . • ‘2mass --> II/246/out’ (2MASS All-Sky Catalog) • ‘akarifis --> II/298/fis’ (AKARI/FIS All-Sky Survey) • ‘allwise --> II/328/allwise’ (AllWISE Data Release) • ‘apass9 --> II/336/apass9’ (AAVSO Photometric All Sky Survey, DR9) • ‘catwise --> II/365/catwise’ (CatWISE 2020 catalog) • ‘des1 --> II/357/des_dr1’ (Dark Energy Survey data release 1) • ‘gaiadr3 --> I/355/gaiadr3’ (GAIA Data Release 3) • ‘gaiaedr3 --> I/350/gaiaedr3’ (GAIA early Data Release 3) • ‘gaiadr2 --> I/345/gaia2’ (GAIA Data Release 2) • ‘galex5 --> II/312/ais’ (All-sky Survey of GALEX DR5) • ‘nomad --> I/297/out’ (Naval Observatory Merged Astrometric Dataset) • ‘panstarrs1 --> II/349/ps1’ (Pan-STARRS Data Release 1). • ‘ppmxl --> I/317/sample’ (Positions and proper motions on the ICRS) • ‘sdss12 --> V/147/sdss12’ (SDSS Photometric Catalogue, Release 12) • ‘usnob1 --> I/284/out’ (Whole-Sky USNO-B1.0 Catalog) • ‘ucac5 --> I/340/ucac5’ (5th U.S. Naval Obs. CCD Astrograph Catalog) • ‘unwise --> II/363/unwise’ (Band-merged unWISE Catalog) • ‘wise --> II/311/wise’ (WISE All-Sky data Release) 5.4.2 Invoking Query -------------------- Query provides a high-level interface to downloading subsets of data from databases. The executable name is ‘astquery’ with the following general template $ astquery DATABASE-NAME [OPTION...] ... One line examples: ## Information about all datasets in ESA's GAIA database: $ astquery gaia --information ## Only show catalogs in VizieR that have 'MUSE' in their ## description. The '-i' is short for '--information'. $ astquery vizier -i --limitinfo=MUSE ## List of columns in 'J/A+A/608/A2/udf10' (one of the above). $ astquery vizier --dataset=J/A+A/608/A2/udf10 -i ## ID, RA and Dec of all Gaia sources within an image. $ astquery gaia --dataset=edr3 --overlapwith=image.fits \ -csource_id,ra,dec ## RA, Dec and Spectroscopic redshifts of objects in SDSS DR12 ## spectroscopic redshift that overlap with 'image.fits'. $ astquery vizier --dataset=sdss12 --overlapwith=image.fits \ -cRA_ICRS,DE_ICRS,zsp --range=zsp,1e-10,inf ## All columns of all entries in the Gaia eDR3 catalog (hosted at ## VizieR) within 1 arc-minute of the given coordinate. $ astquery vizier --dataset=I/350/gaiaedr3 --output=my-gaia.fits \ --center=113.8729761,31.9027152 --radius=1/60 \ ## Similar to above, but only ID, RA and Dec columns for objects with ## magnitude range 10 to 15. In VizieR, this column is called 'Gmag'. ## Also, using sexagesimal coordinates instead of degrees for center. 
$ astquery vizier --dataset=I/350/gaiaedr3 --output=my-gaia.fits \ --center=07h35m29.51,31d54m9.77 --radius=1/60 \ --range=Gmag,10:15 -cEDR3Name,RAJ2000,DEJ2000 Query takes a single argument which is the name of the database. For the full list of available databases and accessing them, see *note Available databases::. There are two methods to query the databases, each is more fully discussed in its option’s description below. • *Low-level:* With ‘--query’ you can directly give a raw query statement that is recognized by the database. This is very low level and will require a good knowledge of the database’s query language, but of course, it is much more powerful. If this option is given, the raw string is directly passed to the server and all other constraints/options (for Query’s high-level interface) are ignored. • *High-level:* With the high-level options (like ‘--column’, ‘--center’, ‘--radius’, ‘--range’ and other constraining options below), the low-level query will be constructed automatically for the particular database. This method is only limited to the generic capabilities that Query provides for all servers. So ‘--query’ is more powerful, however, in this mode, you do not need any knowledge of the database’s query language. You can see the internally generated query on the terminal (if ‘--quiet’ is not used) or in the 0-th extension of the output (if it is a FITS file). This full command contains the internally generated query. The name of the downloaded output file can be set with ‘--output’. The requested output format can have any of the *note Recognized table formats:: (currently ‘.txt’ or ‘.fits’). Like all Gnuastro programs, if the output is a FITS file, the zero-th/first HDU of the output will contain all the command-line options given to Query as well as the full command used to access the server. When ‘--output’ is not set, the output name will be in the format of ‘NAME-STRING.fits’, where ‘NAME’ is the name of the database and ‘STRING’ is a randomly selected 6-character set of numbers and alphabetic characters. With this feature, a second run of ‘astquery’ that is not called with ‘--output’ will not over-write an already downloaded one. Generally, when calling Query more than once, it is recommended to set an output name for each call based on your project’s context. The outputs of Query will have a common output format, irrespective of the used database. To achieve this, Query will ask the databases to provide a FITS table output (for larger tables, FITS can consume much less download volume). After downloading is complete, the raw downloaded file will be read into memory once by Query, and written into the file given to ‘--output’. The raw downloaded file will be deleted by default, but can be preserved with the ‘--keeprawdownload’ option. This strategy avoids unnecessary surprises depending on database. for example, some databases can download a compressed FITS table, even though we ask for FITS. But with the strategy above, the final output will be an uncompressed FITS file. The metadata that is added by Query (including the full download command) is also very useful for future usage of the downloaded data. Unfortunately many databases do not write the input queries into their generated tables. ‘--dry-run’ Only print the final download command to contact the server, do not actually run it. This option is good when you want to check the finally constructed query or download options given to the download program. 
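For example, the command below is a minimal sketch of this option (the dataset, coordinates and columns are taken from the examples earlier in this section): it only prints the download command that would be used, without contacting the server.
     $ astquery gaia --dataset=dr3 --center=113.8729761,31.9027152 \
                --radius=1/60 -csource_id,ra,dec --dry-run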
You may also want to use the constructed command as a base to do further customizations on it and run it yourself. ‘-k’ ‘--keeprawdownload’ Do not delete the raw downloaded file from the database. The name of the raw download will have an ‘OUTPUT-raw-download.fits’ format, where ‘OUTPUT’ is the base-name of the final output file (without a suffix). ‘-i’ ‘--information’ Print the information of all datasets (tables) within a database, or of all columns within a dataset. When ‘--dataset’ is specified, the latter mode (all column information) is downloaded and printed, and when it is not defined, all dataset information (within the database) is printed. Some databases (like VizieR) contain tens of thousands of datasets, so you can limit the downloaded and printed information with the ‘--limitinfo’ option (described below). Dataset descriptions are often large and contain a lot of text (unlike column descriptions). Therefore when printing the information of all datasets within a database, the information (e.g., the dataset name) will be printed on separate lines before the description. However, when printing column information, the output has the same format as a similar option in Table (see *note Invoking asttable::). Important note to consider: the printed order of the datasets or columns is just for displaying in the printed output. You cannot ask for datasets or columns based on the printed order; you need to use dataset or column names. ‘-L STR’ ‘--limitinfo=STR’ Limit the information that is downloaded and displayed (with ‘--information’) to the datasets that have the string given to this option in their description. Note that _this is case-sensitive_. This option is only relevant when ‘--information’ is also called. Databases may have thousands (or tens of thousands) of datasets. Therefore just the metadata (information) to show with ‘--information’ can be tens of megabytes (for example, the full VizieR metadata file is about 23Mb as of January 2021). Once downloaded, it can also be hard to parse manually. With ‘--limitinfo’, only the metadata of datasets that contain this string _in their description_ will be downloaded and displayed, greatly improving the speed of finding your desired dataset. ‘-Q "STR"’ ‘--query="STR"’ Directly specify the query to be passed onto the database. The queries will generally contain spaces and other meta-characters, so we recommend placing the query within quotations. ‘-s STR’ ‘--dataset=STR’ The dataset to query within the database (not compatible with ‘--query’). This option is mandatory when ‘--query’ or ‘--information’ are not provided. You can see the list of available datasets within a database using ‘--information’ (possibly supplemented by ‘--limitinfo’). The output of ‘--information’ will contain the recognized name of the datasets within that database. You can pass the recognized name directly to this option. For more on finding and using your desired database, see *note Available databases::. ‘-c STR’ ‘--column=STR[,STR[,...]]’ The column name(s) to retrieve from the dataset in the given order (not compatible with ‘--query’). If not given, all the dataset’s columns for the selected rows will be queried (which can be large!). This option can take multiple values in one instance (for example, ‘--column=ra,dec,mag’), or in multiple instances (for example, ‘-cra -cdec -cmag’), or mixed (for example, ‘-cra,dec -cmag’).
In case you do not know the full list of the dataset’s column names a-priori, and you do not want to download all the columns (which can greatly decrease your download speed), you can use the ‘--information’ option combined with the ‘--dataset’ option, see *note Available databases::. ‘-H INT’ ‘--head=INT’ Only ask for the first ‘INT’ rows of the finally selected columns, not all the rows. This can be good when your search can result in a large dataset, but before downloading the full volume, you want to see the top rows and get a feeling of what the whole dataset looks like. ‘-v FITS’ ‘--overlapwith=FITS’ File name of a FITS file containing an image (in the HDU given by ‘--hdu’) to use for identifying the region to query in the given database and dataset. Based on the image’s WCS and pixel size, the sky coverage of the image is estimated, and the values for ‘--center’ and ‘--width’ will be calculated internally. Hence this option cannot be used with ‘--center’, ‘--width’ or ‘--radius’. Also, since it internally generates the query, it cannot be used with ‘--query’. Note that if the image has WCS distortions and the reference point for the WCS is not within the image, the WCS will not be well-defined. Therefore the resulting catalog may not overlap, or may correspond to a larger/smaller area in the sky. ‘-C FLT,FLT’ ‘--center=FLT,FLT’ The spatial center position (mostly RA and Dec) to use for the automatically generated query (not compatible with ‘--query’). The comma-separated values can either be in degrees (a single number), or sexagesimal (‘_h_m_’ for RA, ‘_d_m_’ for Dec, or ‘_:_:_’ for both). The given values will be compared to two columns in the database; rows within a certain region around this center position will be requested and downloaded. Pre-defined RA and Dec column names are defined in Query for every database; however, you can use ‘--ccol’ to select other columns to use instead. The region can either be a circle around the point (configured with ‘--radius’) or a box/rectangle around the point (configured with ‘--width’). ‘--ccol=STR,STR’ The name of the coordinate-columns in the dataset to compare with the values given to ‘--center’. Query will use its internal defaults for each dataset (for example, ‘RAJ2000’ and ‘DEJ2000’ for VizieR data). But each dataset is treated separately and it is not guaranteed that these columns exist in all datasets. Also, more than one coordinate system/epoch may be present in a dataset and you can use this option to construct your spatial constraint based on the other coordinate systems/epochs. ‘-r FLT’ ‘--radius=FLT’ The radius about the requested center to use for the automatically generated query (not compatible with ‘--query’). The radius is in units of degrees, but you can use simple division with this option directly on the command-line. For example, if you want a radius of 20 arc-minutes or 20 arc-seconds, you can use ‘--radius=20/60’ or ‘--radius=20/3600’ respectively (which is much more human-friendly than ‘0.3333’ or ‘0.005556’). ‘-w FLT[,FLT]’ ‘--width=FLT[,FLT]’ The square (or rectangle) side length (width) about the requested center to use for the automatically generated query (not compatible with ‘--query’). If only one value is given to ‘--width’, the region will be a square, but if two values are given, the widths of the query box along each dimension will be different. The value(s) is (are) in the same units as the coordinate column (see ‘--ccol’, usually RA and Dec which are degrees).
You can use simple division for each value directly on the command-line if you want relatively small (and more human-friendly) sizes. for example, if you want your box to be 1 arc-minutes along the RA and 2 arc-minutes along Dec, you can use ‘--width=1/60,2/60’. ‘-g STR,FLT,FLT’ ‘--range=STR,FLT,FLT’ The column name and numerical range (inclusive) of acceptable values in that column (not compatible with ‘--query’). This option can be called multiple times for applying range limits on many columns in one call (thus greatly reducing the download size). for example, when used on the ESA gaia database, you can use ‘--range=phot_g_mean_mag,10:15’ to only get rows that have a value between 10 and 15 (inclusive on both sides) in the ‘phot_g_mean_mag’ column. If you want all rows larger, or smaller, than a certain number, you can use ‘inf’, or ‘-inf’ as the first or second values respectively. For example, if you want objects with SDSS spectroscopic redshifts larger than 2 (from the VizieR ‘sdss12’ database), you can use ‘--range=zsp,2,inf’ If you want the interval to not be inclusive on both sides, you can run ‘astquery’ once and get the command that it executes. Then you can edit it to be non-inclusive on your desired side. ‘-b STR[,STR]’ ‘--noblank=STR[,STR]’ Only ask for rows that do not have a blank value in the ‘STR’ column. This option can be called many times, and each call can have multiple column names (separated by a comma or <,>). for example, if you want the retrieved rows to not have a blank value in columns ‘A’, ‘B’, ‘C’ and ‘D’, you can use ‘--noblank=A -bB,C,D’. ‘--sort=STR[,STR]’ Ask for the server to sort the downloaded data based on the given columns. for example, let’s assume your desired catalog has column ‘Z’ for redshift and column ‘MAG_R’ for magnitude in the R band. When you call ‘--sort=Z,MAG_R’, it will primarily sort the columns based on the redshift, but if two objects have the same redshift, they will be sorted by magnitude. You can add as many columns as you like for higher-level sorting. 6 Data manipulation ******************* Images are one of the major formats of data that is used in astronomy. The functions in this chapter explain the GNU Astronomy Utilities which are provided for their manipulation. for example, cropping out a part of a larger image or convolving the image with a given kernel or applying a transformation to it. 6.1 Crop ======== Astronomical images are often very large, filled with thousands of galaxies. It often happens that you only want a section of the image, or you have a catalog of sources and you want to visually analyze them in small postage stamps. Crop is made to do all these things. When more than one crop is required, Crop will divide the crops between multiple threads to significantly reduce the run time. Astronomical surveys are usually extremely large. So large in fact, that the whole survey will not fit into a reasonably sized file. Because of this, surveys usually cut the final image into separate tiles and store each tile in a file. for example, the COSMOS survey’s Hubble space telescope, ACS F814W image consists of 81 separate FITS images, with each one having a volume of 1.7 Giga bytes. Even though the tile sizes are chosen to be large enough that too many galaxies/targets do not fall on the edges of the tiles, inevitably some do. So when you simply crop the image of such targets from one tile, you will miss a large area of the surrounding sky (which is essential in estimating the noise). 
Therefore in its WCS mode, Crop will stitch parts of the tiles that are relevant for a target (with the given width) from all the input images that cover that region into the output. Of course, the tiles have to be present in the list of input files. Besides cropping postage stamps around certain coordinates, Crop can also crop arbitrary polygons from an image (or a set of tiles by stitching the relevant parts of different tiles within the polygon), see ‘--polygon’ in *note Invoking astcrop::. Alternatively, it can crop out rectangular regions through the ‘--section’ option from one image, see *note Crop section syntax::. 6.1.1 Crop modes ---------------- In order to be comprehensive, intuitive, and easy to use, there are two ways to define the crop: 1. From its center and side length. For example, if you already know the coordinates of an object and want to inspect it in an image or to generate postage stamps of a catalog containing many such coordinates. 2. The vertices of the crop region; this can be useful for larger crops over many targets, for example, to crop out a uniformly deep, or contiguous, region of a large survey. Irrespective of how the crop region is defined, the coordinates to define the crop can be in Image (pixel) or World Coordinate System (WCS) standards. All coordinates are read as floating point numbers (not integers, except for the ‘--section’ option, see below). By setting the _mode_ in Crop, you define the standard in which the given coordinates must be interpreted. Here, the different ways to specify the crop region are discussed within each standard. For the full list of options, please see *note Invoking astcrop::. When the crop is defined by its center, the respective (integer) central pixel position will be found internally according to the FITS standard. To have this pixel positioned in the center of the cropped region, the final cropped region will have an odd number of pixels (even if you give an even number to ‘--width’ in image mode). Furthermore, when the crop is defined by its center, Crop allows you to only keep crops that do not have any blank pixels in the vicinity of their center (your primary target). This can be very convenient when your input catalog/coordinates originated from another survey/filter which is not fully covered by your input image. To learn more about this feature, please see the description of the ‘--checkcenter’ option in *note Invoking astcrop::. Image coordinates In image mode (‘--mode=img’), Crop interprets the pixel coordinates and widths in units of the input data-elements (for example, pixels in an image, not world coordinates). In image mode, only one image may be input. The output crop(s) can be defined in multiple ways as listed below. Center of multiple crops (in a catalog) The centers of (possibly multiple) crops are read from a text file. In this mode, the columns identified with the ‘--coordcol’ option are interpreted as the center of a crop with a width of ‘--width’ pixels along each dimension. The columns can contain any floating point value. The value given to the ‘--output’ option is seen as a directory which will host (the possibly multiple) separate crop files, see *note Crop output:: for more. For a tutorial using this feature, please see *note Reddest clumps cutouts and parallelization::. Center of a single crop (on the command-line) The center of the crop is given on the command-line with the ‘--center’ option. The crop width is specified by the ‘--width’ option along each dimension.
The given coordinates and width can be any floating point number. Vertices of a single crop In Image mode there are two options to define the vertices of a region to crop: ‘--section’ and ‘--polygon’. The former is lower-level (does not accept floating point vertices, and only a rectangular region can be defined), it is also only available in Image mode. Please see *note Crop section syntax:: for a full description of this method. The latter option (‘--polygon’) is a higher-level method to define any polygon (with any number of vertices) with floating point values. Please see the description of this option in *note Invoking astcrop:: for its syntax. WCS coordinates In WCS mode (‘--mode=wcs’), the coordinates and width are interpreted using the World Coordinate System (WCS, that must accompany the dataset), not pixel coordinates. You can optionally use ‘--widthinpix’ for the width to be interpreted in pixels (even though the coordinates are in WCS). In WCS mode, Crop accepts multiple datasets as input. When the cropped region (defined by its center or vertices) overlaps with multiple of the input images/tiles, the overlapping regions will be taken from the respective input (they will be stitched when necessary for each output crop). In this mode, the input images do not necessarily have to be the same size, they just need to have the same orientation and pixel resolution. Currently only orientation along the celestial coordinates is accepted, if your input has a different orientation or resolution you can use Warp’s ‘--gridfile’ option to align the image before cropping it (see *note Warp::). Each individual input image/tile can even be smaller than the final crop. In any case, any part of any of the input images which overlaps with the desired region will be used in the crop. Note that if there is an overlap in the input images/tiles, the pixels from the last input image read are going to be used for the overlap. Crop will not change pixel values, so it assumes your overlapping tiles were cutout from the same original image. There are multiple ways to define your cropped region as listed below. Center of multiple crops (in a catalog) Similar to catalog inputs in Image mode (above), except that the values along each dimension are assumed to have the same units as the dataset’s WCS information. For example, the central RA and Dec value for each crop will be read from the first and second calls to the ‘--coordcol’ option. The width of the cropped box (in units of the WCS, or degrees in RA and Dec mode) must be specified with the ‘--width’ option. You can optionally use ‘--widthinpix’ for the value of ‘--width’ to be interpreted in pixels. Center of a single crop (on the command-line) You can specify the center of only one crop box with the ‘--center’ option. If it exists in the input images, it will be cropped similar to the catalog mode, see above also for ‘--width’. Vertices of a single crop The ‘--polygon’ option is a high-level method to define any convex polygon (with any number of vertices). Please see the description of this option in *note Invoking astcrop:: for its syntax. *CAUTION:* In WCS mode, the image has to be aligned with the celestial coordinates, such that the first FITS axis is parallel (opposite direction) to the Right Ascension (RA) and the second FITS axis is parallel to the declination. If these conditions are not met for an image, Crop will warn you and abort. You can use Warp to align the input image to standard celestial coordinates, see *note Warp::. 
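As a rough illustration of the two modes described above (the file names and coordinates below are only hypothetical place-holders; see *note Invoking astcrop:: for the details of each option), a single crop defined by its center might be requested like this in each mode:

     ## Image mode: a 151x151 pixel box centered on pixel (345.2, 250.8).
     $ astcrop --mode=img --center=345.2,250.8 --width=151 image.fits

     ## WCS mode: a 1x1 arc-minute box centered on the given RA and Dec,
     ## stitched from whichever of the input tiles overlap with it.
     $ astcrop --mode=wcs --center=53.16,-27.78 --width=1/60,1/60 tile-*.fits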
As a summary, if you do not specify a catalog, you have to define the cropped region manually on the command-line. In any case the mode is mandatory for Crop to be able to interpret the values given as coordinates or widths.

6.1.2 Crop section syntax
-------------------------

When in image mode, one of the methods to crop only one rectangular section from the input image is to use the ‘--section’ option. Crop has a powerful syntax to read the box parameters from a string of characters. If you leave certain parts of the string empty, Crop can fill them for you based on the input image sizes.

To define a box, you need the coordinates of two points: the first (‘X1’, ‘Y1’) and last (‘X2’, ‘Y2’) pixel positions in the image, or four integer numbers in total. The four coordinates can be specified with one string in this format: ‘X1:X2,Y1:Y2’. This string is given to the ‘--section’ option. Therefore, the pixels along the first axis that are $\geq$‘X1’ and $\leq$‘X2’ will be included in the cropped image. The same goes for the second axis. Note that each different term will be read as an integer, not a float. The reason it only accepts integers is that ‘--section’ is a low-level option (which is also very fast!). For a higher-level way to specify a region (any polygon, not just a box), please see the ‘--polygon’ option in *note Crop options::. Also note that in the FITS standard, pixel indexes along each axis start from unity(1), not zero(0).

You can omit any of the values and they will be filled automatically. The left hand side of the colon (‘:’) will be filled with ‘1’, and the right side with the image size. So, ‘2:,:’ will include the full range of pixels along the second axis and only those with a first axis index greater than or equal to ‘2’ along the first axis. If the colon is omitted for a dimension, then the full range is automatically used. So the same string is also equal to ‘2:,’ or ‘2:’ or even ‘2’. If you want such a case for the second axis, you should set it to: ‘,2’.

If you specify a negative value, it will be counted from before the first pixel of the image, in other words from the region outside the image along the bottom or left sides when viewed in SAO DS9. In case you want to count from the top or right sides of the image, you can use an asterisk (‘*’). When confronted with a ‘*’, Crop will replace it with the maximum length of the image in that dimension. So ‘*-10:*+10,*-20:*+20’ will mean that the crop box will be $20\times40$ pixels in size and only include the top corner of the input image, with 3/4 of the image being covered by blank pixels, see *note Blank pixels::.

If you feel more comfortable with space characters between the values, you can use as many space characters as you wish, just be careful to put your value in double quotes, for example, ‘--section="5:200, 123:854"’. If you forget the quotes, anything after the first space will not be seen by ‘--section’ and you will most probably get an error because the rest of your string will be read as a filename (which most probably does not exist). See *note Command-line:: for a description of how the command-line works.

6.1.3 Blank pixels
------------------

The cropped box can potentially include pixels that are beyond the image range. For example, when a target in the input catalog was very near the edge of the input image. The parts of the cropped image that were not in the input image will be filled with one of the following two values depending on the data type of the image. In both cases, SAO DS9 will not color code those pixels.
  • If the data type of the image is a floating point type (float or double), IEEE NaN (Not a number) will be used.

  • For integer types, pixels out of the image will be filled with the value of the ‘BLANK’ keyword in the cropped image header. The value assigned to it is the lowest value possible for that type, so you will probably never need it anyway. Only for the unsigned character type (‘BITPIX=8’ in the FITS header) is the maximum value used instead: because the type is unsigned, its smallest value is zero, which is often meaningful.

You can ask for such blank regions to not be included in the output crop image using the ‘--noblank’ option. In such cases, there is no guarantee that the image size of your outputs is what you asked for.

Unfortunately, some survey images do not use the ‘BLANK’ FITS keyword. Instead they just give all pixels outside of the survey area a value of zero. So by default, when dealing with float or double image types, any values that are 0.0 are also regarded as blank regions. This can be turned off with the ‘--zeroisnotblank’ option.

6.1.4 Invoking Crop
-------------------

Crop will crop a region from an image. If in WCS mode, it will also stitch parts from separate images in the input files. The executable name is ‘astcrop’ with the following general template:

     $ astcrop [OPTION...] [ASCIIcatalog] ASTRdata ...

One line examples:

     ## Crop all objects in cat.txt from image.fits:
     $ astcrop --catalog=cat.txt image.fits

     ## Crop all objects in catalog (with RA,DEC) from all the files
     ## ending in `_drz.fits' in `/mnt/data/COSMOS/':
     $ astcrop --mode=wcs --catalog=cat.txt /mnt/data/COSMOS/*_drz.fits

     ## Crop the outer 10 border pixels of the input image:
     $ astcrop --section=10:*-10,10:*-10 --hdu=2 image.fits

     ## Crop region around RA and Dec of (189.16704, 62.218203):
     $ astcrop --mode=wcs --center=189.16704,62.218203 goodsnorth.fits

     ## Same crop above, but coordinates given in sexagesimal (you can
     ## also use ':' between the sexagesimal components).
     $ astcrop --mode=wcs --center=12h36m40.08,62d13m5.53 goodsnorth.fits

     ## Crop region around pixel coordinate (568.342, 2091.719):
     $ astcrop --mode=img --center=568.342,2091.719 --width=201 image.fits

Crop has one mandatory argument which is the input image name(s), shown above with ‘ASTRdata ...’. You can use shell expansions, for example, ‘*’ for this if you have lots of images in WCS mode. If the crop box centers are in a catalog, you can use the ‘--catalog’ option. In other cases, the parameters of the single cropped output must be given with command-line options. See *note Crop output:: for how the output file name(s) can be specified. For the full list of general options to all Gnuastro programs (including Crop), please see *note Common options::.

Floating point numbers can be used to specify the crop region (except with the ‘--section’ option, see *note Crop section syntax::). In such cases, the floating point values will be used to find the desired integer pixel indices based on the FITS standard. Hence, Crop ultimately does not do any sub-pixel cropping (in other words, it does not change pixel values). If you need such crops, you can use *note Warp:: to first warp the image to a new pixel grid, then crop from that. For example, let's assume you want a crop from pixels 12.982 to 80.982 along the first dimension. You should first translate the image by $-0.482$ (note that the edge of a pixel is at integer multiples of $0.5$).
So you should run Warp with ‘--translate=-0.482,0’ and then crop the warped image with ‘--section=13:81’. There are two ways to define the cropped region: with its center or its vertices. See *note Crop modes:: for a full description. In the former case, Crop can check if the central region of the cropped image is indeed filled with data or is blank (see *note Blank pixels::), and not produce any output when the center is blank, see the description under ‘--checkcenter’ for more. When in catalog mode, Crop will run in parallel unless you set ‘--numthreads=1’, see *note Multi-threaded operations::. Note that when multiple outputs are created with threads, the outputs will not be created in the same order. This is because the threads are asynchronous and thus not started in order. This has no effect on each output, see *note Reddest clumps cutouts and parallelization:: for a tutorial on effectively using this feature. 6.1.4.1 Crop options .................... The options can be classified into the following contexts: Input, Output and operating mode options. Options that are common to all Gnuastro program are listed in *note Common options:: and will not be repeated here. When you are specifying the crop vertices yourself (through ‘--section’, or ‘--polygon’) on relatively small regions (depending on the resolution of your images) the outputs from image and WCS mode can be approximately equivalent. However, as the crop sizes get large, the curved nature of the WCS coordinates have to be considered. For example, when using ‘--section’, the right ascension of the bottom left and top left corners will not be equal. If you only want regions within a given right ascension, use ‘--polygon’ in WCS mode. Input image parameters: ‘--hstartwcs=INT’ Specify the first keyword card (line number) to start finding the input image world coordinate system information. This is useful when certain header keywords of the input may cause bad conflicts with your crop (see an example described below). To get line numbers of the header keywords, you can pipe the fully printed header into ‘cat -n’ like below: $ astfits image.fits -h1 | cat -n For example, distortions have only been present in WCSLIB from version 5.15 (released in mid 2016). Therefore some pipelines still apply their own specific set of WCS keywords for distortions and put them into the image header along with those that WCSLIB does recognize. So now that WCSLIB recognizes most of the standard distortion parameters, they will get confused with the old ones and give wrong results. for example, in the CANDELS-GOODS South images that were created before WCSLIB 5.15(1). The two ‘--hstartwcs’ and ‘--hendwcs’ are thus provided so when using older datasets, you can specify what region in the FITS headers you want to use to read the WCS keywords. Note that this is only relevant for reading the WCS information, basic data information like the image size are read separately. These two options will only be considered when the value to ‘--hendwcs’ is larger than that of ‘--hstartwcs’. So if they are equal or ‘--hstartwcs’ is larger than ‘--hendwcs’, then all the input keywords will be parsed to get the WCS information of the image. ‘--hendwcs=INT’ Specify the last keyword card to read for specifying the image world coordinate system on the input images. See ‘--hstartwcs’ Crop box parameters: ‘-c FLT[,FLT[,...]]’ ‘--center=FLT[,FLT[,...]]’ The central position of the crop in the input image. 
The positions along each dimension must be separated by a comma (<,>) and fractions are also acceptable. The comma-separated values can either be in degrees (a single number), or sexagesimal (‘_h_m_’ for RA, ‘_d_m_’ for Dec, or ‘_:_:_’ for both). The number of values given to this option must be the same as the dimensions of the input dataset. The width of the crop should be set with ‘--width’. The units of the coordinates are read based on the value to the ‘--mode’ option, see below.

‘-w FLT[,FLT[,...]]’
‘--width=FLT[,FLT[,...]]’
     Width of the cropped region about the coordinate given to ‘--center’. If in WCS mode, the value(s) given to this option will be read in the same units as the dataset's WCS information along this dimension (unless ‘--widthinpix’ is given). This option may take either a single value (to be used for all dimensions: ‘--width=10’ in image-mode will crop a $10\times10$ pixel image) or multiple values (a specific value for each dimension: ‘--width=10,20’ in image-mode will crop a $10\times20$ pixel image). The ‘--width’ option also accepts fractions. For example, if you want the width of your crop to be 3 by 5 arcseconds along RA and Dec respectively and you are in wcs-mode, you can use: ‘--width=3/3600,5/3600’.

     The final output will have an odd number of pixels to allow easy identification of the pixel which keeps your requested coordinate (from ‘--center’ or ‘--catalog’). If you want an even-sided crop, you can run Crop afterwards with ‘--section=":*-1,:*-1"’ or ‘--section=2:,2:’ (depending on which side you do not need), see *note Crop section syntax::.

     The basic reason for making an odd-sided crop is that your given central coordinate will ultimately fall within a discrete pixel in the image (defined by the FITS standard). When the crop has an odd number of pixels in each dimension, that pixel can be well defined as the “central” pixel of the crop, making it unambiguously easy to identify. However, for an even-sided crop, it will be very hard to identify the central pixel (it can be on any of the four pixels adjacent to the central point of the image!).

‘-X’
‘--widthinpix’
     In WCS mode, interpret the value to ‘--width’ as a number of pixels, not in the WCS units like degrees. This is useful when you want a fixed crop size in pixels, even though your center coordinates are in WCS (for example, RA and Dec).

‘-l STR’
‘-l FLT:FLT,...’
‘--polygon=STR’
‘--polygon=FLT,FLT:FLT,FLT:...’
     Polygon vertex coordinates (when the value is in ‘FLT,FLT:FLT,FLT:...’ format) or the filename of a SAO DS9 region file (when the value has no ‘,’ or ‘:’ characters). Each vertex can either be in degrees (a single floating point number) or sexagesimal (in formats of ‘_h_m_’ for RA and ‘_d_m_’ for Dec, or simply ‘_:_:_’ for either of them). The vertices are used to define the polygon in the same order given to this option. When the vertices are not ordered properly (for example, one vertex in a square comes after its diagonal opposite), you can add the ‘--polygonsort’ option which will attempt to sort the vertices before cropping. Note that for concave polygons, sorting is not recommended because there is no unique solution; for more, see the description under ‘--polygonsort’.

     This option can be used both in the image and WCS modes, see *note Crop modes::. If a SAO DS9 region file is used, the coordinate mode of Crop will be determined by the contents of the file and any value given to ‘--mode’ is ignored.
The cropped image will be the size of the rectangular region that completely encompasses the polygon. By default all the pixels that are outside of the polygon will be set as blank values (see *note Blank pixels::). However, if ‘--polygonout’ is called, all pixels internal to the vertices will be set to blank instead. In WCS-mode, you may provide many FITS images/tiles: Crop will stitch them to produce this cropped region, then apply the polygon.

The syntax for the polygon vertices is similar to, and simpler than, that for ‘--section’. In short, the dimensions of each coordinate are separated by a comma (<,>) and each vertex is separated by a colon (<:>). You can define as many vertices as you like. If you would like to use space characters between the dimensions and vertices to make them more human-readable, then you have to put the value to this option in double quotation marks.

For example, let's assume you want to work on the deepest part of the WFC3/IR images of the Hubble Space Telescope eXtreme Deep Field (HST-XDF). According to the web page (https://archive.stsci.edu/prepds/xdf/)(2) the deepest part is contained within the coordinates:

     [ (53.187414,-27.779152), (53.159507,-27.759633),
       (53.134517,-27.787144), (53.161906,-27.807208) ]

They have provided mask images with only these pixels in the WFC3/IR images, but what if you also need to work on the same region in the full resolution ACS images? Also what if you want to use the CANDELS data for the shallow region? Running Crop with ‘--polygon’ will easily pull out this region of the image for you, irrespective of the resolution. If you have set the operating mode to WCS mode in your nearest configuration file (see *note Configuration files::), there is no need to call ‘--mode=wcs’ on the command line.

     $ astcrop --mode=wcs desired-filter-image(s).fits \
               --polygon="53.187414,-27.779152 : 53.159507,-27.759633 : \
                          53.134517,-27.787144 : 53.161906,-27.807208"

More generally, you may have an image and want to define the polygon yourself (it is not already published like the example above). As the number of vertices increases, checking the vertex coordinates on a FITS viewer (for example, SAO DS9) and typing them in, one by one, can be very tedious and prone to typos. In such cases, you can make a polygon “region” in DS9 and, using your mouse, easily define (and visually see) it. Given that SAO DS9 has a graphical user interface (GUI), if you do not have the polygon vertices beforehand, it is much easier to build your polygon there and pass it onto Crop through the region file.

You can take the following steps to make an SAO DS9 region file containing your polygon. Open your desired FITS image with SAO DS9 and activate its “region” mode with Edit→Region. Then define the region as a polygon with Region→Shape→Polygon. Click on the approximate center of the region you want and a small square will appear. By clicking on the vertices of the square you can shrink or expand it; clicking and dragging anywhere on the edges will enable you to define a new vertex. After the region has been nicely defined, save it as a file with Region→“Save Regions”. You can then select the name and address of the output file, keep the format as ‘REG (*.reg)’ and press the “OK” button. In the next window, keep the format as “ds9” and the “Coordinate System” as “fk5” for RA and Dec (or “Image” for pixel coordinates). A plain text file is now created (let's call it ‘ds9.reg’) which you can pass onto Crop with ‘--polygon=ds9.reg’.
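For example, assuming you saved the region file above as ‘ds9.reg’ and want the polygon cropped from a hypothetical ‘image.fits’ (the file names here are just place-holders), the call would simply be:

     $ astcrop image.fits --polygon=ds9.reg --output=deep-region.fits

Since the coordinate mode is read from the region file itself (as described above), there is no need to specify ‘--mode’ in this case.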
For the expected format of the region file, see the description of ‘gal_ds9_reg_read_polygon’ in *note SAO DS9 library::. However, since SAO DS9 makes this file for you, you do not usually need to worry about its internal format unless something unexpected happens and you find a bug.

‘--polygonout’
     Keep all the regions outside the polygon and mask the inner ones with blank pixels (see *note Blank pixels::). This is practically the inverse of the default mode of treating polygons. Note that this option only works when you have only provided one input image. If multiple images are given (in WCS mode), then the full area covered by all the images has to be shown and the polygon excluded. This can lead to a very large area if large surveys like COSMOS are used, so Crop will abort and notify you. In such cases, it is best to crop out the larger region you want, then mask the smaller region with this option.

‘--polygonsort’
     Sort the given set of vertices to the ‘--polygon’ option. For a convex polygon the vertices will be sorted correctly; however, for a concave polygon there is no unique sorting, so be careful because the crop may not be what you expected.

     Polygons come in two classes: convex and concave (or generally, non-convex!), see below for a demonstration. Convex polygons are those where all inner angles are less than 180 degrees. By contrast, a concave polygon is one where an inner angle may be more than 180 degrees.

            Concave Polygon        Convex Polygon

             D --------C          D------------- C
              \        |        E /             |
               \E      |          \             |
               /       |           \            |
              A--------B            A ----------B

‘-s STR’
‘--section=STR’
     Section of the input image which you want to be cropped. See *note Crop section syntax:: for a complete explanation on the syntax required for this input.

‘-C FITS/TXT’
‘--catalog=FITS/TXT’
     File name of the catalog for making multiple crops from the input images/cubes. The catalog can be in any of Gnuastro's recognized table formats, see *note Recognized table formats::. The columns containing the coordinates for the crop centers can be specified with the ‘--coordcol’ option (using column names or numbers, see *note Selecting table columns::). The catalog can also contain the name of each crop; you can specify the column containing the name with the ‘--namecol’ option.

‘--cathdu=STR/INT’
     The HDU (extension) containing the catalog (if the file given to ‘--catalog’ is a FITS file). This can either be the HDU name (if it has one) or number (counting from 0). By default (if this option is not given), the second HDU will be used (equivalent to ‘--cathdu=1’). For more on how to specify the HDU, see the explanation of the ‘--hdu’ option in *note Input output options::.

‘-x STR/INT’
‘--coordcol=STR/INT’
     The column in a catalog to read as a coordinate. The value can be either the column number (starting from 1), or a match/search in the table meta-data, see *note Selecting table columns::. This option must be called multiple times, depending on the number of dimensions in the input dataset. If it is called more than necessary, the extra columns (later calls to this option on the command-line or configuration files) will be ignored, see *note Configuration file precedence::.

‘-n STR/INT’
‘--namecol=STR/INT’
     Column selection for the crop file name. The value can be either the column number (starting from 1), or a match/search in the table meta-data, see *note Selecting table columns::. This option can be used both in Image and WCS modes, and is not mandatory. When a column is given to this option, the final crop base file name will be taken from the contents of this column.
The directory will be determined by the ‘--output’ option (current directory if not given) and the value to ‘--suffix’ will be appended. When this column is not given, the row number will be used instead. Output options: ‘-c FLT/INT’ ‘--checkcenter=FLT/INT’ Square box width of region in the center of the image to check for blank values. If any of the pixels in this central region of a crop (defined by its center) are blank, then it will not be stored in an output file. If the value to this option is zero, no checking is done. This check is only applied when the cropped region(s) are defined by their center (not by the vertices, see *note Crop modes::). The units of the value are interpreted based on the ‘--mode’ value (in WCS or pixel units). The ultimate checked region size (in pixels) will be an odd integer around the center (converted from WCS, or when an even number of pixels are given to this option). In WCS mode, the value can be given as fractions, for example, if the WCS units are in degrees, ‘0.1/3600’ will correspond to a check size of 0.1 arcseconds. Because survey regions do not often have a clean square or rectangle shape, some of the pixels on the sides of the survey FITS image do not commonly have any data and are blank (see *note Blank pixels::). So when the catalog was not generated from the input image, it often happens that the image does not have data over some of the points. When the given center of a crop falls in such regions or outside the dataset, and this option has a non-zero value, no crop will be created. Therefore with this option, you can specify a width of a small box (3 pixels is often good enough) around the central pixel of the cropped image. You can check which crops were created and which were not from the command-line (if ‘--quiet’ was not called, see *note Operating mode options::), or in Crop’s log file (see *note Crop output::). ‘-p STR’ ‘--suffix=STR’ The suffix (or post-fix) of the output files for when you want all the cropped images to have a special ending. One case where this might be helpful is when besides the science images, you want the weight images (or exposure maps, which are also distributed with survey images) of the cropped regions too. So in one run, you can set the input images to the science images and ‘--suffix=_s.fits’. In the next run you can set the weight images as input and ‘--suffix=_w.fits’. ‘--primaryimghdu’ Write the output into the primary (0-th) HDU/extension of the output. By default, like all Gnuastro’s default outputs, no data is written in the primary extension because the FITS standard suggests keeping that extension free of data and only for meta data. ‘-b’ ‘--noblank’ Pixels outside of the input image that are in the crop box will not be used. By default they are filled with blank values (depending on type), see *note Blank pixels::. This option only applies only in Image mode, see *note Crop modes::. ‘-z’ ‘--zeroisnotblank’ In float or double images, it is common to give the value of zero to blank pixels. If the input image type is one of these two types, such pixels will also be considered as blank. You can disable this behavior with this option, see *note Blank pixels::. Operating mode options: ‘-O STR’ ‘--mode=STR’ Operate in Image mode or WCS mode when the input coordinates can be both image or WCS. The value must either be ‘img’ or ‘wcs’, see *note Crop modes:: for a full description. ---------- Footnotes ---------- (1) (2) 6.1.4.2 Crop output ................... 
The string given to the ‘--output’ option will be interpreted depending on how many crops were requested, see *note Crop modes:::

  • When a catalog is given, the value of ‘--output’ (see *note Common options::) will be read as the directory to store the output cropped images. Hence if it does not already exist, Crop will abort with a “No such file or directory” error. The crop file names will consist of two parts: a variable part (the row number of each target starting from 1) along with a fixed string which you can set with the ‘--suffix’ option. Optionally, you may also use the ‘--namecol’ option to define a column in the input catalog to use as the file name instead of numbers.

  • When only one crop is desired, the value to ‘--output’ will be read as a file name. If no output is specified or if it is a directory, the output file name will follow the automatic output names of Gnuastro, see *note Automatic output::: the string given to ‘--suffix’ will be replaced with the ‘.fits’ suffix of the input.

By default, as suggested by the FITS standard and implemented in all Gnuastro programs, the first/primary extension of the output files will only contain meta data. The cropped images/cubes will be written into the 2nd HDU of their respective FITS file (which is actually counted as ‘1’ because HDU counting starts from ‘0’). However, if you want the cropped data to be written into the primary (0-th) HDU, run Crop with the ‘--primaryimghdu’ option.

The header of each output cropped image will contain the names of the input image(s) it was cut from. If a name is longer than the 70 character space that the FITS standard allows for header keyword values, the name will be cut into several keywords from the nearest slash (‘/’). The keywords have the following format: ‘ICFn_m’ (for Input Crop File), where ‘n’ is the number of the image used in this crop and ‘m’ is the part of the name (it can be broken into multiple keywords). Following the name is another keyword named ‘ICFnPIX’ which shows the pixel range from that input image in the same syntax as *note Crop section syntax::. So this string can be directly given to the ‘--section’ option later.

Once done, a log file can be created in the current directory with the ‘--log’ option. This file will have three columns and the same number of rows as the number of cropped images. There are also comments on the top of the log file explaining basic information about the run and descriptions for the columns. A short description of the columns is also given below:

  1. The cropped image file name for that row.

  2. The number of input images that were used to create that image.

  3. A ‘0’ if the central few pixels (value to the ‘--checkcenter’ option) are blank and ‘1’ if they are not. When the crop was not defined by its center (see *note Crop modes::), or ‘--checkcenter’ was given a value of 0 (see *note Invoking astcrop::), the center will not be checked and this column will be given a value of ‘-1’.

If the output crop(s) have a single element (pixel in an image) and ‘--oneelemstdout’ has been called, no output file will be produced! Instead, the single element's value is printed on the standard output. See the description of ‘--oneelemstdout’ below for more:

‘-t’
‘--oneelemstdout’
     When a crop only has a single element (a single pixel), print it to the standard output instead of making a file. By default (without this option), a single-pixel crop will be saved to a file, just like a crop of any other size.
     When a single crop is requested (either through ‘--center’, or a catalog of one row is given), the single value alone is printed with nothing else. This makes it easy to immediately write the value into a shell variable, for example:

          value=$(astcrop img.fits --mode=wcs --center=1.234,5.678 \
                          --width=1 --widthinpix --oneelemstdout \
                          --quiet)

     If a catalog of coordinates is given (that would produce multiple crops, or multiple values in this scenario), the solution for a single value will not work! Recall that Crop will do the crops in parallel, therefore each time you run it, the order of the rows will be different and not correspond to the order of the inputs. To allow identification of each value (which row of the input catalog it corresponds to), Crop will first print the name of the would-be created file, and print the value after it (separated by an empty SPACE character). In other words, the file in the first column will not actually be created, but the value of the pixel it would have contained (if this option was not called) is printed after it.

6.1.4.3 Crop known issues
.........................

When running Crop, you may encounter strange errors and bugs. In these cases, please report a bug and we will try to fix it as soon as possible, see *note Report a bug::. However, some things are beyond our control, or may take too long to fix directly. In this section we list such known issues that may occur in known cases and suggest the hack (or work-around) to fix the problem:

Crash with ‘Killed’ when cropping catalog from ‘.fits.gz’
     This happens because CFITSIO (which reads and writes FITS files) will internally decompress the file in a temporary place (possibly in the RAM), then start reading from it. On the other hand, by default when given a catalog (with many crops) and not specifying ‘--numthreads’, Crop will use the maximum number of threads available on your system to do each crop faster. On a normal (not compressed) file, parallel access will not cause a problem; however, when attempting parallel access with the maximum number of threads on a compressed file, CFITSIO crashes with ‘Killed’. Therefore either of the following solutions can be used to fix this crash:

     • Decrease the number of threads (at the minimum, set ‘--numthreads=1’). Since this solution does not change any of your previous Crop command components or your local file structure, it is the preferred way.

     • Decompress the file (with the command below) and feed the ‘.fits’ file into Crop without changing the number of threads.

          $ gunzip -k image.fits.gz

6.2 Arithmetic
==============

It is commonly necessary to do operations on some or all of the elements of a dataset independently (pixels in an image). For example, in the reduction of raw data it is necessary to subtract the Sky value (*note Sky value::) from each image. Later (once the images are warped into a single grid using Warp for example, see *note Warp::), the images are co-added (the output pixel grid is the average of the pixels of the individual input images). Arithmetic is Gnuastro's program for such operations on your datasets directly from the command-line. It currently uses the reverse polish or post-fix notation, see *note Reverse polish notation::, and will work on the native data types of the input images/data to reduce CPU and RAM resources, see *note Numeric data types::. For more information on how to run Arithmetic, please see *note Invoking astarithmetic::.
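For example, the kind of element-wise operation described above could be subtracting a Sky image from a science image (a hypothetical sketch with place-holder file names, both with their data in HDU 1; the notation and options are described in the sections below):

     $ astarithmetic image.fits sky.fits - -g1 --output=sky-subtracted.fits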
6.2.1 Reverse polish notation
-----------------------------

The most common notation for arithmetic operations is the infix notation (https://en.wikipedia.org/wiki/Infix_notation) where the operator goes between the two operands, for example, $4+5$. The infix notation is the preferred way in most programming languages which come with scripting features for large programs. This is because the infix notation requires a way to define precedence when more than one operator is involved. For example, consider the statement ‘5 + 6 / 2’. Should 6 first be divided by 2, then added to 5? Or should 5 first be added to 6, then divided by 2? Therefore we need parentheses to show precedence: ‘5+(6/2)’ or ‘(5+6)/2’. Furthermore, if you need to leave a value for later processing, you will need to define a variable for it; for example, ‘a=(5+6)/2’.

Gnuastro provides libraries where you can also use infix notation in C or C++ programs. However, Gnuastro's programs are primarily designed to be run on the command-line and the level of complexity that infix notation requires can be annoying/confusing to write on the command-line (where it can get confused with the shell's parentheses or variable definitions). Therefore Gnuastro's Arithmetic and Table (when doing column arithmetic) programs use the post-fix notation, also known as reverse polish notation (https://en.wikipedia.org/wiki/Reverse_Polish_notation). For example, instead of writing ‘5+6’, we write ‘5 6 +’.

The Wikipedia article on the reverse polish notation provides some excellent explanation of this notation, but we will give a short summary here for self-sufficiency. In short, in the reverse polish notation, the operator is placed after the operands. As we will see below, this removes the need for parentheses and lets you use previous values without needing to define a variable. In the future(1) we do plan to also optionally allow infix notation when arithmetic operations on datasets are desired, but due to time constraints on the developers we cannot do it immediately.

To easily understand how the reverse polish notation works, you can think of each operand (‘5’ and ‘6’ in the example above) as a node in a “last-in-first-out” stack. One such stack in daily life is a stack of dishes in the kitchen: you put a clean dish on top of a stack of dishes when it is ready for later usage. Later, when you need a dish, you pick the top one (hence the “last” dish placed “in” the stack is the “first” dish that comes “out” when necessary).

Each operator will need a certain number of operands (in the example above, the ‘+’ operator needs two operands: ‘5’ and ‘6’). In the kitchen metaphor, an operator can be an oven. Every time an operator is confronted, the operator takes (or “pops”) the number of operands it needs from the top of the stack (so they do not exist in the stack any more), does its operation, and places (or “pushes”) the result back on top of the stack. So if you want the average of 5 and 6, you would write: ‘5 6 + 2 /’. The operations that are done are:

  1. ‘5’ is an operand, so Arithmetic pushes it to the top of the stack (which is initially empty). In the kitchen metaphor, you can visualize this as taking a new dish from the cabinet, putting the number 5 inside of the dish, and putting the dish on top of the (empty) cooking table in front of you. You now have a stack of one dish on the table in front of you.

  2. ‘6’ is also an operand, so it is pushed to the top of the stack.
     Like before, you can visualize this as taking a new dish from the cabinet, putting the number 6 in it and placing it on top of the previous dish. You now have a stack of two dishes on the table in front of you.

  3. ‘+’ is a _binary_ operator, so it will pop the top two elements of the stack out of it, and perform addition on them (the order is $5+6$ in the example above). The result is ‘11’ which is pushed to the top of the stack. To visualize this, you can think of the ‘+’ operator as an oven with a place for two dishes. You pick up the top-most dish (that has the number 6 in it) and put it in the oven. The top dish is now the one that has the number 5. You also pick it up and put it in the oven, and close the oven door. When the oven has finished its cooking, it produces a single output (in one dish, with the number 11 inside of it). You take that output dish and put it back on the table. You now have a stack of one dish on the table in front of you.

  4. ‘2’ is an operand, so it is pushed onto the top of the stack. In the kitchen metaphor, you again go to the cabinet, pick up a dish, put the number 2 inside of it and put the dish over the previous dish (that has the number 11). You now have a stack of two dishes on the table in front of you.

  5. ‘/’ (division) is a binary operator, so pull out the top two elements of the stack (top-most is ‘2’, then ‘11’) and divide the second one by the first. In the kitchen metaphor, the ‘/’ operator can be visualized as a microwave that takes two dishes. But unlike the oven (‘+’ operator) before, the order of inputs matters (they are on top of each other: with the top dish holder being the numerator and the bottom one being the denominator). Again, you look to your stack of dishes on the table. You pick up the top one (with value 2 inside of it) and put it in the microwave's bottom (denominator) dish holder. Then you go back to your stack of dishes on the table and pick up the top dish (with value 11 inside of it) and put that in the top (numerator) dish holder. The microwave will do its work and when it is finished, returns a new dish with the single value 5.5 inside of it. You pick up the dish from the microwave and place it back on the table.

  6. There are no more operands or operators, so simply return the remaining operand in the output. In the kitchen metaphor, you see that your recipe has no more steps, so you just pick up the remaining dish and take it to the dining room to enjoy a good dinner.

In the Arithmetic program, the operands can be FITS images of any dimensionality, or numbers (see *note Invoking astarithmetic::). In Table's column arithmetic, they can be any column in the table (a series of numbers in an array) or a single number (see *note Column arithmetic::).

With this notation, very complicated procedures can be created without the need for parentheses or worrying about precedence. Even functions which take an arbitrary number of arguments can be defined in this notation. This is a very powerful notation and is used in languages like Postscript(2) (on which the PDF format is based).

   ---------- Footnotes ----------

   (1)

   (2) See the EPS and PDF part of *note Recognized file formats:: for a little more on the Postscript language.

6.2.2 Integer benefits and pitfalls
-----------------------------------

Integers are the simplest numerical data types (*note Numeric data types::). Because of this, their storage space is much less, and their processing is much faster, than floating point types.
You can confirm this on your computer with the series of commands below. You will make four 5000 by 5000 pixel images filled with random values. Two of them will be saved as signed 8-bit integers, and two with 64-bit floating point types. The last command prints the size of the created images.

     $ astarithmetic 5000 5000 2 makenew 5 mknoise-sigma int8    -oint-1.fits
     $ astarithmetic 5000 5000 2 makenew 5 mknoise-sigma int8    -oint-2.fits
     $ astarithmetic 5000 5000 2 makenew 5 mknoise-sigma float64 -oflt-1.fits
     $ astarithmetic 5000 5000 2 makenew 5 mknoise-sigma float64 -oflt-2.fits
     $ ls -lh int-*.fits flt-*.fits

The 8-bit integer images are only 24MB, while the 64-bit floating point images are 191 MB! Besides helping in storage (on your disk, or in RAM, while the program is running), the small size of these files also helps in faster reading of the inputs. Furthermore, CPUs can process integer operations much faster than floating points. Among the integer types, those with a smaller width (number of bits) can be processed much faster. You can see this with the two commands below, where you will add the integer images with each other and the floats with each other:

     $ astarithmetic flt-1.fits flt-2.fits + -oflt-sum.fits -g1
     $ astarithmetic int-1.fits int-2.fits + -oint-sum.fits -g1

Have a look at the running time of the two commands above (that is printed on their last line). On the system that this paragraph was written on, the floating point and integer image sums were respectively done in 0.481 and 0.089 seconds (the integer operation was almost 5 times faster!).

*If your data does not have decimal points, use integer types:* integer types are much faster and can take much less space in your storage or RAM (while the program is running).

*Select the smallest width that can host the range/precision of values:* For example, if the largest possible value in your dataset is 1000 and all numbers are integers, store it as a 16-bit integer. Also, if you know the values can never become negative, store it as an unsigned 16-bit integer. For floating point types, if you know you will not need a precision of more than 6 significant digits, use the 32-bit floating point type. For more on the range (for integers) and precision (for floats), see *note Numeric data types::.

There is a price to be paid for this improved efficiency in integers: your wisdom! If you have not selected your types wisely, strange situations may happen. For example, try the command below:

     $ astarithmetic 125 10 +

You expect the output to be $135$, but it will be $-121$! The reason is that when Arithmetic (or column-arithmetic in Table) confronts a number on the command-line, it uses the principles above to select the most efficient type for each number. Both $125$ and $10$ can safely fit within a signed, 8-bit integer type, so Arithmetic will store both as an 8-bit integer. However, the sum ($135$) is larger than the maximum possible value of an 8-bit signed integer ($127$). Therefore an integer overflow will occur, and the bits will be over-written. As a result, the value will be $135-128=7$ more than the minimum value of this type ($-128$), which is $-128+7=-121$.

When you know situations like this may occur, you can simply use *note Numerical type conversion operators:: to set just one of the inputs to a wider data type (the smallest wider type, to avoid wasting resources).
In the example above, this would be ‘uint16’:

     $ astarithmetic 125 uint16 10 +

The reason this worked is that $125$ is now converted into an unsigned 16-bit integer before the ‘+’ operator. Since this is larger than an 8-bit integer, the C programming language's automatic type conversion will treat both as the wider type and store the result of the binary operation (‘+’) in that type.

For such a basic operation like the command above, a faster hack would be any of the two commands below (which are equivalent). This is because ‘125.0’ or ‘125.’ are interpreted as floating-point types and they do not suffer from such issues (converting only one input is enough):

     $ astarithmetic 125.  10 +
     $ astarithmetic 125.0 10 +

For this particular command, the fix above will be as fast as the ‘uint16’ solution. This is because there are only two numbers, and the overhead of Arithmetic (reading configuration files, etc.) dominates the running time. However, for large datasets, the ‘uint16’ solution will be faster (as you saw above), Arithmetic will consume less RAM while running, and the output will consume less storage in your system (all major benefits)!

It is possible to do internal checks in Gnuastro to catch integer overflows and correct them internally. However, we have not opted for this solution because all those checks will consume significant resources and slow down the program (especially with large datasets where RAM, storage and running time become important). To be optimal, we therefore trust that you (the wise Gnuastro user!) make the appropriate type conversion in your commands where necessary (recall that the operators are available in *note Numerical type conversion operators::).

6.2.3 Arithmetic operators
--------------------------

In this section, the recognized operators in Arithmetic (and the Table program's *note Column arithmetic::) are listed and discussed in detail with examples. As mentioned before, to be able to easily do complex operations on the command-line, the Reverse Polish Notation is used (where you write ‘$4\quad5\quad+$’ instead of ‘$4 + 5$’); if you are not already familiar with it, before continuing, please see *note Reverse polish notation::.

The operands to all operators can be a data array (for example, a FITS image or data cube) or a number; the output will be an array or number according to the inputs. For example, a number multiplied by an array will produce an array. The numerical data type of the output of each operator is described within it. Here are some generic tips and tricks (relevant to all operators):

Multiple operators in one command
     When you need to use arithmetic commands in several consecutive operations, you can use one command instead of multiple commands and perform all calculations in the same command. For example, assume you want to apply a threshold of 10 on your image, and label the connected groups of pixels above this threshold. You need two operators for this: ‘gt’ (for “greater than”, see *note Conditional operators::) and ‘connected-components’ (see *note Mathematical morphology operators::).
     The bad (non-optimized and slow) way of doing this is to call Arithmetic two times:

          $ astarithmetic image.fits 10 gt --output=thresh.fits
          $ astarithmetic thresh.fits 2 connected-components \
                          --output=labeled.fits
          $ rm thresh.fits

     The good (optimal) way is to call them after each other (remember *note Reverse polish notation::):

          $ astarithmetic image.fits 10 gt 2 connected-components \
                          --output=labeled.fits

     You can similarly add any number of operations that must be done sequentially in a single command and benefit from the speed and lack of intermediate files. When your commands become long, you can use the ‘set-AAA’ operator to make it more readable, see *note Operand storage in memory or a file::.

Blank pixels in Arithmetic
     Blank pixels in the image (see *note Blank pixels::) will be stored based on the data type. When the input is a floating point type, blank values are NaN. One aspect of NaN values is that by definition they will fail on _any_ comparison. Also, any operator that includes a NaN as an operand will produce a NaN (irrespective of its other operands). Hence both equal and not-equal operators will fail when both their operands are NaN! Therefore, the only way to guarantee selection of blank pixels is through the ‘isblank’ operator (see *note Conditional operators::).

     One way you can exploit this property of the NaN value to your advantage is when you want a fully zero-valued image (even over the blank pixels) based on an already existing image (with the same size and world coordinate system settings). The following command will produce this for you:

          $ astarithmetic input.fits nan eq --output=all-zeros.fits

     Note that on the command-line you can write NaN in any case (for example, ‘NaN’, or ‘NAN’ are also acceptable). Reading NaN as a floating point number in Gnuastro is not case-sensitive.

6.2.3.1 Basic mathematical operators
....................................

These are some of the most common operations you will be doing on your data, so no further explanation is necessary. If you are new to Gnuastro, just read the description of each carefully.

‘+’
     Addition, so “‘4 5 +’” is equivalent to $4+5$. For example, in the command below, the value 20000 is added to each pixel's value in ‘image.fits’:

          $ astarithmetic 20000 image.fits +

     You can also use this operator to sum the values of one pixel in two images (which have to be the same size). For example, in the commands below (which are identical, see the paragraph after the commands), each pixel of ‘sum.fits’ is the sum of the same pixel's values in ‘a.fits’ and ‘b.fits’.

          $ astarithmetic a.fits b.fits + -h1 -h1 --output=sum.fits
          $ astarithmetic a.fits b.fits + -g1     --output=sum.fits

     The HDU/extension has to be specified for each image with ‘-h’. However, if the HDUs are the same in all inputs, you can use ‘-g’ to only specify the HDU once.

     If you need to add more than one dataset, one way is to use this operator multiple times, for example, see the two commands below that are identical in the Reverse Polish Notation (*note Reverse polish notation::):

          $ astarithmetic a.fits b.fits + c.fits + -osum.fits
          $ astarithmetic a.fits b.fits c.fits + + -osum.fits

     However, this can get annoying/buggy if you have more than three or four images; in that case, a better way to sum data is to use the ‘sum’ operator (which also ignores blank pixels), which is discussed below.

‘-’
     Subtraction, so “‘4 5 -’” is equivalent to $4-5$.
     Usage of this operator is similar to the ‘+’ operator, for example:

          $ astarithmetic 20000 image.fits -
          $ astarithmetic a.fits b.fits - -g1 --output=sub.fits

‘x’
     Multiplication, so “‘4 5 x’” is equivalent to $4\times5$. For example, in the command below, the value of each output pixel is 5 times its value in ‘image.fits’:

          $ astarithmetic image.fits 5 x

     And you can multiply the value of each pixel in two images, like this:

          $ astarithmetic a.fits a.fits x -g1 --output=multip.fits

‘/’
     Division, so “‘4 5 /’” is equivalent to $4/5$. Like multiplication, for example:

          $ astarithmetic image.fits 5 -h1 /
          $ astarithmetic a.fits b.fits / -g1 --output=div.fits

‘%’
     Modulo (remainder), so “‘3 2 %’” will return $1$. Note that the modulo operator only works on integer types (see *note Numeric data types::). This operator is therefore not defined for most processed astronomical images that have floating-point values. However it is useful in labeled images (for example, *note Segment output::). In such cases, each pixel is the integer label of the object it is associated with, hence with the example command below, we can change the labels to only be between 1 and 4 and decrease all objects on the image to 4/5th (all objects with a label that is a multiple of 5 will be set to 0).

          $ astarithmetic label.fits 5 1 %

‘abs’
     Absolute value of the first operand, so “‘4 abs’” is equivalent to $|4|$. For example, the output of the command below will not have any negative pixels (all negative pixels will be multiplied by $-1$ to become positive):

          $ astarithmetic image.fits abs

‘pow’
     First operand to the power of the second, so “‘4.3 5 pow’” is equivalent to $4.3^{5}$. For example, with the command below all pixels will be squared:

          $ astarithmetic image.fits 2 pow

‘sqrt’
     The square root of the first operand, so “‘5 sqrt’” is equivalent to $\sqrt{5}$. Since the square root is only defined for positive values, any negative-valued pixel will become NaN (blank). The output will have a floating point type, but its precision is determined from the input: if the input is a 64-bit floating point, the output will also be 64-bit. Otherwise, the output will be 32-bit floating point (see *note Numeric data types:: for the respective precision). Therefore if you require 64-bit precision in estimating the square root, convert the input to 64-bit floating point first, for example, with ‘5 float64 sqrt’. For example, each pixel of the output of the command below will be the square root of that pixel in the input.

          $ astarithmetic image.fits sqrt

     If you just want to scale an image with negative values using this operator (for better visual inspection, and the actual values do not matter for you), you can subtract the image from its minimum value, then take its square root:

          $ astarithmetic image.fits image.fits minvalue - sqrt -g1

     Alternatively, to avoid reading the image into memory two times, you can use the ‘set-’ operator to read it into the variable ‘i’ and use ‘i’ two times to speed up the operation (described below):

          $ astarithmetic image.fits set-i i i minvalue - sqrt

‘log’
     Natural logarithm of the first operand, so “‘4 log’” is equivalent to $\ln(4)$. Negative pixels will become NaN, and the output type is determined from the input; see the explanation under ‘sqrt’ for more on these features. For example, the command below will take the natural logarithm of every pixel in the input.

          $ astarithmetic image.fits log --output=log.fits

‘log10’
     Base-10 logarithm of the first popped operand, so “‘4 log10’” is equivalent to $\log_{10}(4)$.
     Negative pixels will become NaN, and the output type is determined from the input; see the explanation under ‘sqrt’ for more on these features. For example, the command below will take the base-10 logarithm of every pixel in the input.

          $ astarithmetic image.fits log10

6.2.3.2 Trigonometric and hyperbolic operators
..............................................

All the trigonometric and hyperbolic functions are described here. One good thing about these operators is that they take their inputs and give their outputs in degrees (which we usually need as input or output), not radians (like most other programs/libraries).

‘sin’
‘cos’
‘tan’
     Basic trigonometric functions. They take one operand, in units of degrees.

‘asin’
‘acos’
‘atan’
     Inverse trigonometric functions. They take one operand and the returned values are in units of degrees.

‘atan2’
     Inverse tangent (output in units of degrees) that uses the signs of the input coordinates to distinguish between the quadrants. This operator therefore needs two operands: the first popped operand is assumed to be the X axis position of the point, and the second popped operand is its Y axis coordinate.

     For example, see the commands below. To be more clear, we are using Table's *note Column arithmetic:: which uses exactly the same internal library function as the Arithmetic program for images. We are showing the results for four points in the four quadrants of the 2D space (if you want to try running them, you do not need to type/copy the parts after <#>). The first point (2,2) is in the first quadrant, therefore the returned angle is 45 degrees. But the second, third and fourth points are in the quadrants of the same order, and the returned angles reflect the quadrant.

          $ echo " 2 2" | asttable -c'arith $2 $1 atan2'    # --> 45
          $ echo " 2 -2" | asttable -c'arith $2 $1 atan2'   # --> -45
          $ echo "-2 -2" | asttable -c'arith $2 $1 atan2'   # --> -135
          $ echo "-2 2" | asttable -c'arith $2 $1 atan2'    # --> 135

     However, if you simply use the classic arc-tangent operator (‘atan’) for the same points, the result will only be in two quadrants as you see below:

          $ echo " 2 2" | asttable -c'arith $2 $1 / atan'   # --> 45
          $ echo " 2 -2" | asttable -c'arith $2 $1 / atan'  # --> -45
          $ echo "-2 -2" | asttable -c'arith $2 $1 / atan'  # --> 45
          $ echo "-2 2" | asttable -c'arith $2 $1 / atan'   # --> -45

‘sinh’
‘cosh’
‘tanh’
     Hyperbolic sine, cosine, and tangent. These operators take a single operand.

‘asinh’
‘acosh’
‘atanh’
     Inverse hyperbolic sine, cosine, and tangent. These operators take a single operand.

6.2.3.3 Constants
.................

During your analysis it is often necessary to have certain constants, like the number $\pi$. The “operators” in this section do not actually take any operand; they just push the desired constant onto the stack. So in effect, these are actually operands. But since their value is not inserted by the user, we have placed them in the list of operators.

‘e’
     Euler's number, or the base of the natural logarithm (no units). See Wikipedia (https://en.wikipedia.org/wiki/E_(mathematical_constant)).

‘pi’
     Ratio of a circle's circumference to its diameter (no units). See Wikipedia (https://en.wikipedia.org/wiki/Pi).

‘c’
     The speed of light in vacuum, in units of $m/s$. See Wikipedia (https://en.wikipedia.org/wiki/Speed_of_light).

‘G’
     The gravitational constant, in units of $m^3/kg/s^2$. See Wikipedia (https://en.wikipedia.org/wiki/Gravitational_constant).

‘h’
     Planck's constant, in units of $J/Hz$ or $kg\times m^2/s$. See Wikipedia (https://en.wikipedia.org/wiki/Planck_constant).
6.2.3.4 Unit conversion operators
.................................
It often happens that you have data in one unit (for example, magnitudes to measure the brightness of a galaxy), but would like to convert it into another (for example, electron counts on your CCD). While the equations for the unit conversions can be easily found on the internet, the operators in this section are designed to simplify the process and let you do it easily.

‘counts-to-mag’ Convert counts (usually CCD outputs) to magnitudes using the given zero point. The zero point is the first popped operand and the count image or value is the second popped operand. For example, assume you have measured the standard deviation of the noise in an image to be ‘0.1’ counts, the image’s zero point is ‘22.5’, and you want to measure the _per-pixel_ surface brightness limit of the dataset(1):
$ astarithmetic 0.1 22.5 counts-to-mag --quiet
Of course, you can also convert every pixel in an image (or a table column in Table’s *note Column arithmetic::) with this operator: simply replace the second popped operand (‘0.1’ above) with an image/column name. For an example of applying this operator on an image, see the description of surface brightness in *note Brightness flux magnitude::, where we will convert an image’s pixel values to surface brightness.

‘mag-to-counts’ Convert magnitudes to counts (usually CCD outputs) using the given zero point. The zero point is the first popped operand and the magnitude value is the second. For example, if an object has a magnitude of 20, you can estimate the counts corresponding to it (when the image has a zero point of 24.8) with the command below. Note that because the output is a single number, we are using ‘--quiet’ to avoid printing extra information.
$ astarithmetic 20 24.8 mag-to-counts --quiet

‘counts-to-sb’ Convert counts to surface brightness using the zero point and area (in units of arcsec$^2$). The first popped operand is the area (in arcsec$^2$), the second popped operand is the zero point and the third contains the count values. Estimating the surface brightness involves taking the logarithm. Therefore, this operator will produce NaN for counts with a negative value. For example, with the commands below, we read the zero point from the image headers (assuming it is in the ‘ZPOINT’ keyword), we calculate the pixel area from the image itself, and we call this operator to convert the image pixels (in counts) to surface brightness (mag/arcsec$^2$).
$ zeropoint=$(astfits image.fits --keyvalue=ZPOINT -q)
$ pixarea=$(astfits image.fits --pixelareaarcsec2)
$ astarithmetic image.fits $zeropoint $pixarea counts-to-sb \
  --output=image-sb.fits
For more on the definition of surface brightness see *note Brightness flux magnitude::, and for a full tutorial on its optimal usage, see *note FITS images in a publication::.

‘sb-to-counts’ Convert surface brightness using the zero point and area (in units of arcsec$^2$) to counts.
The first popped operand is the area (in arcsec$^2$), the second popped operand is the zero point and the third contains the surface brightness values. See the description of ‘counts-to-sb’ for more.

‘mag-to-sb’ Convert magnitudes to surface brightness over a certain area (in units of arcsec$^2$). The first popped operand is the area and the second is the magnitude. For example, let’s assume you have a table with two columns: magnitude (called ‘MAG’) and area (called ‘AREAARCSEC2’). In the command below, we will use *note Column arithmetic:: to return the surface brightness.
$ asttable table.fits -c'arith MAG AREAARCSEC2 mag-to-sb'

‘sb-to-mag’ Convert surface brightness to magnitudes over a certain area (in units of arcsec$^2$). The first popped operand is the area and the second is the surface brightness. See the description of ‘mag-to-sb’ for more.

‘counts-to-jy’ Convert counts (usually CCD outputs) to Janskys through an AB-magnitude based zero point. The top-popped operand is assumed to be the AB-magnitude zero point and the second-popped operand is assumed to be a dataset in units of counts (an image in Arithmetic, and a column in Table’s *note Column arithmetic::). For the full equation and basic definitions, see *note Brightness flux magnitude::. For example, SDSS images are calibrated in units of nanomaggies, with a fixed zero point magnitude of 22.5. Therefore, you can convert the units of SDSS image pixels to Janskys with the command below:
$ astarithmetic sdss-image.fits 22.5 counts-to-jy

‘jy-to-counts’ Convert Janskys to counts (usually CCD outputs) through an AB-magnitude based zero point. This is the inverse operation of ‘counts-to-jy’; see there for a usage example.

‘mag-to-jy’ Convert AB magnitudes to Janskys, see *note Brightness flux magnitude::.

‘jy-to-mag’ Convert Janskys to AB magnitudes, see *note Brightness flux magnitude::.

‘au-to-pc’ Convert Astronomical Units (AUs) to Parsecs (PCs). This operator takes a single argument which is interpreted to be the input AUs. The conversion is based on the definition of Parsecs: $1\,\rm{PC} = 1/\tan(1^{\prime\prime})\,\rm{AU}$, where $1^{\prime\prime}$ is one arcsecond. In other words, $1\,(\rm{PC}) = 648000/\pi\,(\rm{AU})$. For example, if we take Pluto’s average distance to the Sun to be 40 AUs, we can obtain its distance in Parsecs using this command:
$ echo 40 | asttable -c'arith $1 au-to-pc'

‘pc-to-au’ Convert Parsecs (PCs) to Astronomical Units (AUs). This operator takes a single argument which is interpreted to be the input PCs. For more on the conversion equation, see the description of ‘au-to-pc’. For example, Proxima Centauri (the nearest star to the Solar system) is 1.3020 Parsecs from the Sun, so we can calculate this distance in units of AUs with the command below:
$ echo 1.3020 | asttable -c'arith $1 pc-to-au'

‘ly-to-pc’ Convert Light-years (LY) to Parsecs (PCs). This operator takes a single argument which is interpreted to be the input LYs. The conversion is done from IAU’s definition of the light-year (9460730472580800 m $\approx$ 63241.077 AU $\approx$ 0.306601 PC; for the conversion of AU to PC, see the description of ‘au-to-pc’). For example, the distance of the Andromeda galaxy from our galaxy is 2.5 million light-years, so its distance in kilo-Parsecs can be calculated with the command below (note that we want the output in kilo-parsecs, so we are dividing the output of this operator by 1000):
$ echo 2.5e6 | asttable -c'arith $1 ly-to-pc 1000 /'

‘pc-to-ly’ Convert Parsecs (PCs) to Light-years (LY).
This operator takes a single argument which is interpreted to be the input PCs. For the conversion and an example of the inverse of this operator, see the description of ‘ly-to-pc’.

‘ly-to-au’ Convert Light-years (LY) to Astronomical Units (AUs). This operator takes a single argument which is interpreted to be the input LYs. For the conversion and a similar example, see the description of ‘ly-to-pc’.

‘au-to-ly’ Convert Astronomical Units (AUs) to Light-years (LY). This operator takes a single argument which is interpreted to be the input AUs. For the conversion and a similar example, see the description of ‘ly-to-pc’.

---------- Footnotes ----------
(1) The _per-pixel_ surface brightness limit is the magnitude of the noise standard deviation. For more on surface brightness see *note Brightness flux magnitude::. In the example command, because the output is a single number, we are using ‘--quiet’ to avoid printing extra information.

6.2.3.5 Statistical operators
.............................
The operators in this section take a single dataset as input, and will return the desired statistic as a single value.

‘minvalue’ Minimum value in the first popped operand, so “‘a.fits minvalue’” will push the minimum pixel value in this image onto the stack. When this operator acts on a single image, the output (operand that is put back on the stack) will no longer be an image, but a number. The output of this operator has the same type as the input. This operator is mainly intended for multi-element datasets (for example, images or data cubes); if the popped operand is a number, it will just be returned without any change. Note that when the final remaining/output operand is a single number, it is printed onto the standard output. For example, with the command below, the minimum pixel value in ‘image.fits’ will be printed in the terminal:
$ astarithmetic image.fits minvalue
However, the output above also includes a lot of extra information that is not relevant in this context. If you just want the final number, run Arithmetic in quiet mode:
$ astarithmetic image.fits minvalue -q
Also see the description of ‘sqrt’ for other example usages of this operator.

‘maxvalue’ Maximum value of first operand in the same type, similar to ‘minvalue’; see the description there for more. For example:
$ astarithmetic image.fits maxvalue -q

‘numbervalue’ Number of non-blank elements in first operand in the ‘uint64’ type (since it is always a positive integer, see *note Numeric data types::). Its usage is similar to ‘minvalue’, for example:
$ astarithmetic image.fits numbervalue -q

‘sumvalue’ Sum of non-blank elements in first operand in the ‘float32’ type. Its usage is similar to ‘minvalue’, for example:
$ astarithmetic image.fits sumvalue -q

‘meanvalue’ Mean value of non-blank elements in first operand in the ‘float32’ type. Its usage is similar to ‘minvalue’, for example:
$ astarithmetic image.fits meanvalue -q

‘stdvalue’ Standard deviation of non-blank elements in first operand in the ‘float32’ type. Its usage is similar to ‘minvalue’, for example:
$ astarithmetic image.fits stdvalue -q

‘medianvalue’ Median of non-blank elements in first operand with the same type. Its usage is similar to ‘minvalue’, for example:
$ astarithmetic image.fits medianvalue -q

‘unique’ Remove all duplicate (and blank) elements from the first popped operand. The unique elements of the dataset will be stored in a single-dimensional dataset. Recall that by default, single-dimensional datasets are stored as a table column in the output.
But you can use ‘--onedasimage’ or ‘--onedonstdout’ to respectively store them as a single-dimensional FITS array/image, or to print them on the standard output. Although you can use this operator on floating point datasets, due to floating-point errors it may return unreasonable values: even the tenth decimal digit is considered, although it may be statistically meaningless (see *note Numeric data types::). It is therefore recommended to use it on integer datasets, like the labeled images of *note Segment output::, where each pixel has the integer label of the object/clump it is associated with. For example, let’s assume you have cropped a region of a larger labeled image and want to find the labels/objects that are within the crop. With this operator, this job is trivial:
$ astarithmetic seg-crop.fits unique

‘noblank’ Remove all blank elements from the first popped operand. Since the blank pixels are being removed, the output dataset will always be single-dimensional, independent of the dimensionality of the input. Recall that by default, single-dimensional datasets are stored as a table column in the output. But you can use ‘--onedasimage’ or ‘--onedonstdout’ to respectively store them as a single-dimensional FITS array/image, or to print them on the standard output. For example, with the command below, the non-blank pixel values of ‘cropped.fits’ are printed on the command-line (the ‘--quiet’ option is used to remove the extra information that Arithmetic prints as it reads the inputs, its version and its running time).
$ astarithmetic cropped.fits noblank --onedonstdout --quiet

‘size’ Size of the dataset along a given FITS (or FORTRAN) dimension (counting from 1). The desired dimension should be the first popped operand and the dataset must be the second popped operand. The output will be a single unsigned integer (dimensions cannot be negative). For example, the following command will produce the size of the first extension/HDU (the default HDU) of ‘a.fits’ along the second FITS axis.
$ astarithmetic a.fits 2 size

6.2.3.6 Stacking operators
..........................
The operators in this section are used when you have multiple datasets that you would like to merge into one, commonly known as “stacking” or “coaddition”. For example, you may have taken ten exposures of your scientific target, and you would like to combine them all into one stacked image that is deeper. When calling these operators, you need to specify how many operands they should take (unlike the rest of the operators, which have a fixed number of input operands). As described under ‘min’ below, you do this through their first popped operand (which should be a single integer number that is larger than one).

‘min’ For each pixel, find the minimum value in all given datasets. The output will have the same type as the input. The first popped operand to this operator must be a positive integer number which specifies how many further operands should be popped from the stack. All the subsequently popped operands must have the same type and size. This operator (and all the variable-operand operators similar to it that are discussed below) will work in multi-threaded mode unless Arithmetic is called with the ‘--numthreads=1’ option (see *note Multi-threaded operations::). Each pixel of the output of the ‘min’ operator will be given the minimum value of the same pixel from all the popped operands/images.
For example, the following command will produce an image with the same size and type as the three inputs, but each output pixel value will be the minimum of the same pixel’s values in all three input images.
$ astarithmetic a.fits b.fits c.fits 3 min --output=min.fits
Important notes:
• NaN/blank pixels will be ignored; see *note Blank pixels::.
• The output will have the same type as the inputs. This is natural for the ‘min’ and ‘max’ operators, but for other similar operators (for example, ‘sum’ or ‘mean’) the per-pixel operations will be done in double precision floating point and then stored back in the input type. Therefore, if the input was an integer, C’s internal type conversion will be used.
• The operation will be multi-threaded, greatly speeding up the process if you have large and numerous data to stack. You can disable multi-threaded operations with the ‘--numthreads=1’ option (see *note Multi-threaded operations::).

‘max’ For each pixel, find the maximum value in all given datasets. The output will have the same type as the input. This operator is called similarly to the ‘min’ operator; please see there for more. For example:
$ astarithmetic a.fits b.fits c.fits 3 max -omax.fits

‘number’ For each pixel, count the number of non-blank pixels in all given datasets. The output will be an unsigned 32-bit integer datatype (see *note Numeric data types::). This operator is called similarly to the ‘min’ operator; please see there for more. For example:
$ astarithmetic a.fits b.fits c.fits 3 number -onum.fits
Some datasets may have blank values (which are also ignored in all similar operators like ‘min’, ‘sum’, ‘mean’ or ‘median’). Hence, the final pixel values of this operator will not, in general, be equal to the number of inputs. This operator is therefore mostly called in parallel with those operators to know the “weight” of each pixel (in case you only want to keep pixels that had the full exposure, for example).

‘sum’ For each pixel, calculate the sum in all given datasets. The output will have a single-precision (32-bit) floating point type. This operator is called similarly to the ‘min’ operator; please see there for more. For example:
$ astarithmetic a.fits b.fits c.fits 3 sum -ostack-sum.fits

‘mean’ For each pixel, calculate the mean in all given datasets. The output will have a single-precision (32-bit) floating point type. This operator is called similarly to the ‘min’ operator; please see there for more. For example:
$ astarithmetic a.fits b.fits c.fits 3 mean -ocoadd-mean.fits

‘std’ For each pixel, find the standard deviation in all given datasets. The output will have a single-precision (32-bit) floating point type. This operator is called similarly to the ‘min’ operator; please see there for more. For example:
$ astarithmetic a.fits b.fits c.fits 3 std -ostd.fits

‘median’ For each pixel, find the median in all given datasets. The output will have a single-precision (32-bit) floating point type. This operator is called similarly to the ‘min’ operator; please see there for more. For example:
$ astarithmetic a.fits b.fits c.fits 3 median \
  --output=stack-median.fits

‘quantile’ For each pixel, find the quantile from all given datasets. The output will have the same numeric data type and size as the input datasets. Besides the input datasets, the quantile operator also needs a single parameter (the requested quantile). The parameter should be the first popped operand, with a value between (and including) 0 and 1.
The second popped operand must be the number of datasets to use. In the example below, the first-popped operand (‘0.7’) is the quantile, the second-popped operand (‘3’) is the number of datasets to pop.
$ astarithmetic a.fits b.fits c.fits 3 0.7 quantile

‘sigclip-number’ For each pixel, find the sigma-clipped number (after removing outliers) in all given datasets. The output will have an unsigned 32-bit integer type (see *note Numeric data types::). This operator will combine the specified number of inputs into a single output that contains the number of remaining elements after $\sigma$-clipping on each element/pixel (for more on $\sigma$-clipping, see *note Sigma clipping::). This operator is very similar to ‘min’, with the exception that it expects two operands (parameters for sigma-clipping) before the total number of inputs. The first popped operand is the termination criterion and the second is the multiple of $\sigma$. For example, in the command below, the first popped operand (‘0.2’) is the sigma clipping termination criterion. If the termination criterion is larger than, or equal to, 1 it is interpreted as the number of clips to do. But if it is between 0 and 1, then it is the tolerance level on the standard deviation (see *note Sigma clipping::). The second popped operand (‘5’) is the multiple of sigma to use in sigma-clipping. The third popped operand (‘3’) is the number of datasets that will be used (similar to the first popped operand to ‘min’).
$ astarithmetic a.fits b.fits c.fits 3 5 0.2 sigclip-number

‘sigclip-median’ For each pixel, find the sigma-clipped median in all given datasets. The output will have a single-precision (32-bit) floating point type. This operator is called similarly to the ‘sigclip-number’ operator; please see there for more. For example:
$ astarithmetic a.fits b.fits c.fits 3 5 0.2 sigclip-median

‘sigclip-mean’ For each pixel, find the sigma-clipped mean in all given datasets. The output will have a single-precision (32-bit) floating point type. This operator is called similarly to the ‘sigclip-number’ operator; please see there for more. For example:
$ astarithmetic a.fits b.fits c.fits 3 5 0.2 sigclip-mean

‘sigclip-std’ For each pixel, find the sigma-clipped standard deviation in all given datasets. The output will have a single-precision (32-bit) floating point type. This operator is called similarly to the ‘sigclip-number’ operator; please see there for more. For example:
$ astarithmetic a.fits b.fits c.fits 3 5 0.2 sigclip-std

6.2.3.7 Filtering (smoothing) operators
.......................................
Image filtering is commonly used for smoothing: every pixel value in the output image is created by applying a certain statistic to the pixels in its vicinity.

‘filter-mean’ Apply mean filtering (or moving average (https://en.wikipedia.org/wiki/Moving_average)) on the input dataset. During mean filtering, each pixel (data element) is replaced by the mean value of all its surrounding pixels (excluding blank values). The number of surrounding pixels in each dimension (to calculate the mean) is determined through the earlier operands that have been pushed onto the stack prior to the input dataset. The number of necessary operands is determined by the dimensions of the input dataset (first popped operand). The order of the dimensions on the command-line is the order in FITS format. Here is one example:
$ astarithmetic 5 4 image.fits filter-mean
In this example, each pixel is replaced by the mean of a 5 by 4 box around it.
The box is 5 pixels along the first FITS dimension (horizontal when viewed in ds9) and 4 pixels along the second FITS dimension (vertical). Each pixel will be placed in the center of the box that the mean is calculated on. If the given width along a dimension is even, then the center is assumed to be between the pixels (not in the center of a pixel). When the pixel is close to the edge, the pixels of the box that fall outside the image are ignored. Therefore, on the edge, fewer points will be used in calculating the mean. The final effect of mean filtering is to smooth the input image; it is essentially a convolution with a kernel that has identical values for all its pixels (a flat kernel), see *note Convolution process::. Note that blank pixels will also be affected by this operator: if there are any non-blank elements in the box surrounding a blank pixel, in the filtered image, it will have the mean of the non-blank elements, therefore it will not be blank any more. If blank elements are important for your analysis, you can use the ‘isblank’ operator with the ‘where’ operator to set them back to blank after filtering.

‘filter-median’ Apply median filtering (https://en.wikipedia.org/wiki/Median_filter) on the input dataset. This is very similar to ‘filter-mean’, except that instead of the mean value of the box pixels, the median value is used to replace a pixel value. For more on how to use this operator, please see ‘filter-mean’. The median is less susceptible to outliers compared to the mean. As a result, after median filtering, the pixel values will be more discontinuous than after mean filtering.

‘filter-sigclip-mean’ Apply a $\sigma$-clipped mean filtering onto the input dataset. This is very similar to ‘filter-mean’, except that all outliers (identified by the $\sigma$-clipping algorithm) have been removed; see *note Sigma clipping:: for more on the basics of this algorithm. As described there, two extra input parameters are necessary for $\sigma$-clipping: the multiple of $\sigma$ and the termination criterion. ‘filter-sigclip-mean’ therefore needs to pop two other operands from the stack after the dimensions of the box. For example, the line below uses the same box size as the example of ‘filter-mean’. However, all elements in the box that are iteratively beyond $3\sigma$ of the distribution’s median are removed from the final calculation of the mean until the change in $\sigma$ is less than $0.2$.
$ astarithmetic 3 0.2 5 4 image.fits filter-sigclip-mean
The median (which needs a sorted dataset) is necessary for $\sigma$-clipping; therefore, ‘filter-sigclip-mean’ can be significantly slower than ‘filter-mean’. However, if there are strong outliers in the dataset that you want to ignore (for example, emission lines on a spectrum when finding the continuum), this is a much better solution.

‘filter-sigclip-median’ Apply a $\sigma$-clipped median filtering onto the input dataset. This operator and its necessary operands are almost identical to ‘filter-sigclip-mean’, except that after $\sigma$-clipping, the median value (which is less affected by outliers than the mean) is added back to the stack.
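As a sketch of its usage (simply swapping the operator name in the ‘filter-sigclip-mean’ example above, with the same box size and clipping parameters), a call would look like this:
$ astarithmetic 3 0.2 5 4 image.fits filter-sigclip-median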
6.2.3.8 Interpolation operators
...............................
Interpolation is the process of removing blank pixels from a dataset (by giving them a value based on the non-blank neighbors).

‘interpolate-medianngb’ Interpolate the blank elements of the second popped operand with the median of the nearest non-blank neighbors to each. The number of the nearest non-blank neighbors used to calculate the median is given by the first popped operand. The distance of the nearest non-blank neighbors is irrelevant in this interpolation. The neighbors of each blank pixel will be parsed in expanding circular rings (for 2D images) or spherical surfaces (for 3D cubes) and each non-blank element over them is stored in memory. When the requested number of non-blank neighbors has been found, their median is used to replace that blank element. For example, the line below replaces each blank element with the median of the nearest 5 pixels.
$ astarithmetic image.fits 5 interpolate-medianngb
When you want to interpolate blank regions and want each blank region to have a fixed value (for example, the centers of saturated stars), this operator is not suitable, because the pixels used to interpolate various parts of the region differ. For such scenarios, you may use ‘interpolate-maxofregion’ or ‘interpolate-minofregion’ (described below).

‘interpolate-minngb’ Similar to ‘interpolate-medianngb’, but will fill the blank values of the dataset with the minimum value of the nearest neighbors.

‘interpolate-maxngb’ Similar to ‘interpolate-medianngb’, but will fill the blank values of the dataset with the maximum value of the nearest neighbors. One useful application of this operator is to fill the saturated pixels of stars in images.

‘interpolate-minofregion’ Interpolate all blank regions (consisting of many blank pixels that are touching) in the second popped operand with the minimum value of the pixels that are immediately bordering that region (a single value). The first popped operand is the connectivity (see description in ‘connected-components’). For example, with the command below all the connected blank regions of ‘image.fits’ will be filled. It is an image (2D dataset), so a connectivity of 2 means that the independent blank regions are defined by 8-connected neighbors. If connectivity was 1, the regions would be defined by 4-connectivity: blank regions that may only be touching on the corner of one pixel would be identified as separate regions.
$ astarithmetic image.fits 2 interpolate-minofregion

‘interpolate-maxofregion’ Similar to ‘interpolate-minofregion’, but the maximum is used to fill the blank regions. This operator can be useful in filling saturated pixels of stars, for example. Recall that the ‘interpolate-maxngb’ operator looks for the maximum value with a given number of neighboring pixels and is more useful in small noisy regions. Therefore, as the blank regions become larger, ‘interpolate-maxngb’ can cause fragmentation in the connected blank region, because the nearest neighbors to one part of the blank region may not fall within the pixels searched for the other parts. With this operator, the size of the blank region is irrelevant: all the pixels bordering the blank region are parsed and their maximum value is used for the whole region.

6.2.3.9 Dimensionality changing operators
.........................................
Through these operators you can change the dimensions of the output through certain statistics on the dimensions that should be removed. For example, let’s assume you have a 3D data cube that has 300 by 300 pixels in the RA and Dec dimensions (first two dimensions), and 3600 slices along the wavelength (third dimension), so the whole cube is $300\times300\times3600$ voxels (volume elements).
To create a narrow-band image that only contains 100 slices around a certain wavelength, you can crop that section (using *note Crop::), giving you a $300\times300\times100$ cube. You can now use the ‘collapse-sum’ operator below to “collapse” all the 100 slices into one 2D image that has $300\times300$ pixels. Every pixel in this 2D image will contain the sum of the flux of the 100 slices.

‘stitch’ Stitch (connect) any number of given images together along the given dimension. The output has the same number of dimensions as the input, but the number of pixels along the requested dimension will be different from the inputs. The ‘stitch’ operator takes at least three operands:
• The first popped operand (placed just before ‘stitch’) is the direction (dimension) that the images should be stitched along. The first FITS dimension is along the horizontal, therefore a value of ‘1’ will stitch them horizontally. Similarly, giving a value of ‘2’ will result in a vertical stitch.
• The second popped operand is the number of images that should be stitched.
• Depending on the value given to the second popped operand, ‘stitch’ will pop the given number of datasets from the stack and stitch them along the given dimension. The popped images have to have the same number of pixels along the other dimension. The order of the stitching is defined by how they are placed in the command-line, not how they are popped (after being popped, they are placed in a list in the same order).
For example, in the commands below, we will first crop out fixed-size regions of $100\times300$ pixels from a larger image (‘large.fits’). In the first call of Arithmetic below, we will stitch the bottom set of crops together along the first (horizontal) axis. In the second Arithmetic call, we will stitch all 6 along both dimensions.
## Crop the fixed-size regions of a larger image ('-O' is the
## short form of the '--mode' option).
$ astcrop large.fits -Oimg --section=1:100,1:300 -oa.fits
$ astcrop large.fits -Oimg --section=101:200,1:300 -ob.fits
$ astcrop large.fits -Oimg --section=201:300,1:300 -oc.fits
$ astcrop large.fits -Oimg --section=1:100,301:600 -od.fits
$ astcrop large.fits -Oimg --section=101:200,301:600 -oe.fits
$ astcrop large.fits -Oimg --section=201:300,301:600 -of.fits
## Stitch the bottom three crops into one image.
$ astarithmetic a.fits b.fits c.fits 3 1 stitch -obottom.fits
## Stitch all the 6 crops along both dimensions.
$ astarithmetic a.fits b.fits c.fits 3 1 stitch \
                d.fits e.fits f.fits 3 1 stitch \
                2 2 stitch -g1 -oall.fits
The start of the last command is like the one before it (stitching the bottom three crops along the first FITS dimension, producing a $300\times300$ image). Later in the same command, we then stitch the top three crops horizontally (again, into a $300\times300$ image). This leaves the two $300\times300$ images on the stack (see *note Reverse polish notation::). We finally stitch those two along the second (vertical) dimension. This operator is therefore useful in scenarios like placing the CCD amplifiers into one image.

‘collapse-sum’ Collapse the given dataset (second popped operand), by summing all elements along the first popped operand (a dimension in the FITS standard: counting from one, from the fastest dimension). The returned dataset has one dimension fewer than the input. The output will have a double-precision floating point type irrespective of the input dataset’s type.
Doing the operation in double-precision (64-bit) floating point will help reduce the effect of floating point errors on the collapse (summation). But afterwards, single-precision floating point is usually enough for real (noisy) datasets. So depending on the type of the input and its nature, it is recommended to use one of the type conversion operators on the returned dataset. If any WCS is present, the returned dataset will also lack the respective dimension in its WCS matrix. Therefore, when the WCS is important for later processing, be sure that the input is aligned with the respective axes: all non-diagonal elements in the WCS matrix are zero. One common application of this operator is the creation of pseudo broad-band or narrow-band 2D images from 3D data cubes. For example, integral field unit (IFU) data products have two spatial dimensions (the first two FITS dimensions) and one spectral dimension (the third FITS dimension). The command below will collapse the whole third dimension into a 2D array the size of the first two dimensions, and then convert the output to single-precision floating point (as discussed above).
$ astarithmetic cube.fits 3 collapse-sum float32

‘collapse-mean’ Similar to ‘collapse-sum’, but the returned dataset will be the mean value along the collapsed dimension, not the sum.

‘collapse-number’ Similar to ‘collapse-sum’, but the returned dataset will be the number of non-blank values along the collapsed dimension. The output will have a 32-bit signed integer type. If the input dataset does not have blank values, all the elements in the returned dataset will have a single value (the length of the collapsed dimension). Therefore, this is mostly relevant when there are blank values in the dataset.

‘collapse-min’ Similar to ‘collapse-sum’, but the returned dataset will have the same numeric type as the input and will contain the minimum value for each pixel along the collapsed dimension.

‘collapse-max’ Similar to ‘collapse-sum’, but the returned dataset will have the same numeric type as the input and will contain the maximum value for each pixel along the collapsed dimension.

‘collapse-median’ Similar to ‘collapse-sum’, but the returned dataset will have the same numeric type as the input and will contain the median value for each pixel along the collapsed dimension. The median involves sorting; therefore, ‘collapse-median’ will do the calculations on different CPU threads to speed up the operation. By default, Arithmetic will detect and use all available threads, but you can override this with the ‘--numthreads’ (or ‘-N’) option.

‘collapse-sigclip-mean’ Collapse the input dataset (fourth popped operand) along the FITS dimension given as the first popped operand by calculating the sigma-clipped mean. The sigma-clipping parameters (namely, the multiple of sigma and termination criterion) are read as the third and second popped operands respectively. For more on sigma-clipping, see *note Sigma clipping::. For example, with the command below, the pixels of the 2-dimensional input ‘image.fits’ will be collapsed into a single-dimensional output. The first popped operand is ‘2’, so it will collapse all the pixels that are vertically on top of each other, such that the output will have the same number of pixels as the horizontal axis of the input. During the collapsing, all pixels that are more than $3\sigma$ (third popped operand) are rejected, and the clipping will continue until the standard deviation changes by less than $0.2$ between clips.
$ astarithmetic image.fits 3 0.2 2 collapse-sigclip-mean \
                --output=collapsed-vertical.fits
*Printing output of collapse in plain-text:* the default datatype of ‘collapse-sigclip-mean’ is 32-bit floating point. This is sufficient for any observed astronomical data. However, if you request a plain-text output, or decide to print/view the output as plain-text on the standard output, the full set of decimals may not be printed in some situations. This can lead to apparently discrete values in the output of this operator when viewed in plain-text! The FITS format is always superior (since it stores the value in binary, therefore not having the problem above). But if you are forced to save the output in plain-text, use the ‘float64’ operator after this to change the type to 64-bit floating point (which will print more decimals).

‘collapse-sigclip-std’ Collapse the input dataset along the given FITS dimension by calculating the sigma-clipped standard deviation. Except for returning the standard deviation after clipping, this function is similar to ‘collapse-sigclip-mean’; see the description of that operator for more.

‘collapse-sigclip-median’ Collapse the input dataset along the given FITS dimension by calculating the sigma-clipped median. Except for returning the median after clipping, this function is similar to ‘collapse-sigclip-mean’; see the description of that operator for more.

‘collapse-sigclip-number’ Collapse the input dataset along the given FITS dimension by calculating the number of elements that remain after sigma-clipping. Except for returning the number after clipping, this function is similar to ‘collapse-sigclip-mean’; see the description of that operator for more.

‘add-dimension-slow’ Build a higher-dimensional dataset from all the input datasets stacked after one another (along the slowest dimension). The first popped operand has to be a single number. It is used by the operator to know how many operands it should pop from the stack (and the size of the output in the new dimension). The rest of the operands must have the same size and numerical data type. This operator currently only works for 2D input operands; please contact us if you want inputs to have different dimensions. The output’s WCS (which should have a different dimensionality compared to the inputs) can be read from another file with the ‘--wcsfile’ option. If no file is specified for the WCS, the first dataset’s WCS will be used; you can later add/change the necessary WCS keywords with the FITS keyword modification features of the Fits program (see *note Fits::). If your datasets do not have the same type, you can use the type transformation operators of Arithmetic that are discussed below. Just beware of overflow if you are transforming to a smaller type (see *note Numeric data types::). For example, let’s assume you have 3 two-dimensional images ‘a.fits’, ‘b.fits’ and ‘c.fits’ (each with $200\times100$ pixels). You can construct a 3D data cube with $200\times100\times3$ voxels (volume-pixels) using the command below:
$ astarithmetic a.fits b.fits c.fits 3 add-dimension-slow

‘add-dimension-fast’ Similar to ‘add-dimension-slow’ but along the fastest dimension. This operator currently only works for 1D input operands; please contact us if you want inputs to have different dimensions. For example, let’s assume you have 3 one-dimensional datasets, each with 100 elements. With this operator, you can construct a $3\times100$ pixel FITS image that has 3 pixels along the horizontal and 100 pixels along the vertical.
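As a sketch of its usage (the file names here are hypothetical one-dimensional datasets, each with the same length and numeric type), the call is analogous to ‘add-dimension-slow’:
$ astarithmetic spec1.fits spec2.fits spec3.fits 3 \
                add-dimension-fast --output=img.fits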
6.2.3.10 Conditional operators
..............................
Conditional operators take two inputs and return a binary output that can only have two values: 0 (for pixels where the condition was false) or 1 (for pixels where the condition was true). Because of the binary (2-valued) nature of their outputs, the output is stored in an ‘unsigned char’ data type (see *note Numeric data types::) to speed up processing and use less storage. There are two exceptions to the general features above: ‘isblank’ only takes one input, and ‘where’ takes three, while not returning a binary output; see their descriptions for more.

‘lt’ Less than: creates a binary output (values either 0 or 1) where each pixel will be 1 if the second popped operand is smaller than the first popped operand and 0 otherwise. If both operands are images, then all the pixels will be compared with their counterparts in the other image. For example, the pixels in the output of the command below will have a value of 1 (true) if their value in ‘image1.fits’ is less than their value in ‘image2.fits’. Otherwise, their value will be 0 (false).
$ astarithmetic image1.fits image2.fits lt
If only one operand is an image, then all the pixels will be compared with the single value (number) of the other operand. For example:
$ astarithmetic image1.fits 1000 lt
Finally, if both are numbers, then the output is also just one number (0 or 1).
$ astarithmetic 4 5 lt

‘le’ Less or equal: similar to ‘lt’ (‘less than’ operator), but returning 1 when the second popped operand is smaller than or equal to the first. For example:
$ astarithmetic image1.fits 1000 le

‘gt’ Greater than: similar to ‘lt’ (‘less than’ operator), but returning 1 when the second popped operand is greater than the first. For example:
$ astarithmetic image1.fits 1000 gt

‘ge’ Greater or equal: similar to ‘lt’ (‘less than’ operator), but returning 1 when the second popped operand is larger than or equal to the first. For example:
$ astarithmetic image1.fits 1000 ge

‘eq’ Equality: similar to ‘lt’ (‘less than’ operator), but returning 1 when the two popped operands are equal (to double precision floating point accuracy).
$ astarithmetic image1.fits 1000 eq

‘ne’ Non-Equality: similar to ‘lt’ (‘less than’ operator), but returning 1 when the two popped operands are _not_ equal (to double precision floating point accuracy).
$ astarithmetic image1.fits 1000 ne

‘and’ Logical AND: returns 1 if both operands have a non-zero value and 0 otherwise. Both operands have to be the same kind: either both images or both numbers. It mostly produces meaningful results when the inputs are binary (with pixel values of 0 or 1).
$ astarithmetic image1.fits image2.fits -g1 and
For example, if you only want to see which pixels in an image have a value _between_ 50 (greater equal, or inclusive) and 200 (less than, or exclusive), you can use this command:
$ astarithmetic image.fits set-i i 50 ge i 200 lt and

‘or’ Logical OR: returns 1 if either one of the operands is non-zero and 0 only when both operands are zero. Both operands have to be the same kind: either both images or both numbers. The usage is similar to ‘and’. For example, if you only want to see which pixels in an image have a value _outside of_ -100 (greater equal, or inclusive) and 200 (less than, or exclusive), you can use this command:
$ astarithmetic image.fits set-i i -100 lt i 200 ge or

‘not’ Logical NOT: returns 1 when the operand is 0 and 0 when the operand is non-zero.
The operand can be an image or a number; for an image, it is applied to each pixel separately. For example, if you want to know which pixels are not blank, you can use ‘not’ on the output of the ‘isblank’ operator described below:
$ astarithmetic image.fits isblank not

‘isblank’ Test for a blank value (see *note Blank pixels::). In essence, this is very similar to the conditional operators: the output is either 1 or 0 (see the ‘less than’ operator above). The difference is that it only needs one operand. For example:
$ astarithmetic image.fits isblank
Because of the definition of a blank pixel, a blank value is not even equal to itself, so you cannot use the equal operator above to select blank pixels. See the “Blank pixels” box below for more on blank pixels in Arithmetic.

‘where’ Change the input (pixel) value _where_/if a certain condition holds. The conditional operators above can be used to define the condition. Three operands are required for ‘where’. The input format is demonstrated in this simplified example:
$ astarithmetic modify.fits binary.fits if-true.fits where
The value of any pixel in ‘modify.fits’ that corresponds to a non-zero _and_ non-blank pixel of ‘binary.fits’ will be changed to the value of the same pixel in ‘if-true.fits’ (this may also be a number). The 3rd and 2nd popped operands (‘modify.fits’ and ‘binary.fits’ respectively, see *note Reverse polish notation::) have to have the same dimensions/size. ‘if-true.fits’ can be either a number, or have the same dimension/size as the other two. The 2nd popped operand (‘binary.fits’) has to have ‘uint8’ (or ‘unsigned char’ in standard C) type (see *note Numeric data types::). It is treated as a binary dataset (with only two values: zero and non-zero, hence the name ‘binary.fits’ in this example). However, commonly you will not be dealing with an actual FITS file of a condition/binary image. You will probably define the condition in the same run based on some other reference image and use the conditional and logical operators above to make a true/false (or one/zero) image for you internally. For example, see the case below:
$ astarithmetic in.fits reference.fits 100 gt new.fits where
In the example above, any of the ‘in.fits’ pixels that has a value in ‘reference.fits’ greater than ‘100’ will be replaced with the corresponding pixel in ‘new.fits’. Effectively, the ‘reference.fits 100 gt’ part created the condition/binary image which was added to the stack (in memory) and later used by ‘where’. The command above is thus equivalent to these two commands:
$ astarithmetic reference.fits 100 gt --output=binary.fits
$ astarithmetic in.fits binary.fits new.fits where
Finally, the input operands are read and used independently, so you can use the same file more than once as any of the operands. When the 1st popped operand to ‘where’ (‘if-true.fits’) is a single number, it may be a NaN value (or any blank value, depending on its type) like the example below (see *note Blank pixels::). When the number is blank, it will be converted to the blank value of the type of the 3rd popped operand (‘in.fits’). Hence, in the example below, all the pixels of ‘in.fits’ that have a value greater than 100 in ‘reference.fits’ will become blank in the natural data type of ‘in.fits’ (even though NaN values are only defined for floating point types).
$ astarithmetic in.fits reference.fits 100 gt nan where
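As another common sketch combining ‘where’ with the ‘isblank’ operator above (the file name ‘image.fits’ is a hypothetical input), the command below replaces all blank pixels of an image with a fixed value (zero here), using ‘set-’ so the image is only read once:
$ astarithmetic image.fits set-i i i isblank 0 where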
6.2.3.11 Mathematical morphology operators
..........................................
From Wikipedia: “Mathematical morphology (MM) is a theory and technique for the analysis and processing of geometrical structures, based on set theory, lattice theory, topology, and random functions. MM is most commonly applied to digital images”. In theory it extends a very large body of research and methods in image processing, but currently in Gnuastro it mainly applies to images that are binary (only have a value of 0 or 1). For example, assume you have applied the greater-than operator (‘gt’, see *note Conditional operators::) to select all pixels in your image that are larger than a value of 100. But they will all have a value of 1, and you want to separate the various groups of pixels that are connected (for example, peaks of stars in your image). With the ‘connected-components’ operator, you can give each connected region of the output of ‘gt’ a separate integer label.

‘erode’ Erode the foreground pixels (with value ‘1’) of the input dataset (second popped operand). The first popped operand is the connectivity (see description in ‘connected-components’). Erosion is simply a flipping of all foreground pixels (with value ‘1’) to background (with value ‘0’) that are “touching” background pixels. “Touching” is defined by the connectivity. In effect, this operator “carves off” the outer borders of the foreground, making them thinner. This operator assumes a binary dataset (all pixels are ‘0’ or ‘1’). For example, imagine that you have an astronomical image with a mean/sky value of 0 units and a standard deviation ($\sigma$) of 100 units and many galaxies in it. With the first command below, you can apply a threshold of $2\sigma$ on the image (by only keeping pixels that are greater than 200 using the ‘gt’ operator). The output of thresholding the image is a binary image (each pixel is either smaller or equal to the threshold, or larger than it). You can then erode the binary image with the second command below to remove very small false positives (one or two pixel peaks).
$ astarithmetic image.fits 200 gt -obinary.fits
$ astarithmetic binary.fits 2 erode -oout.fits
In fact, you can merge these operations into one command thanks to the reverse polish notation (see *note Reverse polish notation::):
$ astarithmetic image.fits 200 gt 2 erode -oout.fits
To see the effect of connectivity, try this:
$ astarithmetic image.fits 200 gt 1 erode -oout-con-1.fits

‘dilate’ Dilate the foreground pixels (with value ‘1’) of the binary input dataset (second popped operand). The first popped operand is the connectivity (see description in ‘connected-components’). Dilation is simply a flipping of all background pixels (with value ‘0’) to foreground (with value ‘1’) that are “touching” foreground pixels. “Touching” is defined by the connectivity. In effect, this expands the outer borders of the foreground. This operator assumes a binary dataset (all pixels are ‘0’ or ‘1’). The usage is similar to ‘erode’, for example:
$ astarithmetic binary.fits 2 dilate -oout.fits

‘connected-components’ Find the connected components in the input dataset (second popped operand). The first popped operand is the connectivity used in the connected components algorithm. The second popped operand is the dataset where connected components are to be found. It is assumed to be a binary image (with values of 0 or 1). It must have an 8-bit unsigned integer type, which is the format produced by the conditional operators. This operator will return a labeled dataset where the non-zero pixels in the input will be labeled with a counter (starting from 1).
The connectivity is a number between 1 and the number of dimensions in the dataset (inclusive). A value of 1 corresponds to the weakest (symmetric) connectivity between elements, and the number of dimensions corresponds to the strongest. For example, on a 2D image, a connectivity of 1 corresponds to 4-connected neighbors and 2 corresponds to 8-connected neighbors. One example usage of this operator can be the identification of regions above a certain threshold, as in the command below. With this command, Arithmetic will first separate all pixels greater than 100 into a binary image (where pixels with a value of 1 are above that value). Afterwards, it will label all those that are connected.
$ astarithmetic in.fits 100 gt 2 connected-components
If your input dataset does not have a binary type, but you know all its values are 0 or 1, you can use the ‘uint8’ operator (below) to convert it to binary.

‘fill-holes’ Flip background (0) pixels surrounded by foreground (1) in a binary dataset. This operator takes two operands (similar to ‘connected-components’): the second is the binary (0 or 1 valued) dataset to fill holes in, and the first popped operand is the connectivity (to define a hole). Imagine that in your dataset there are some holes with zero value inside the objects with one value (for example, the output of the thresholding example of ‘erode’) and you want to fill the holes:
$ astarithmetic binary.fits 2 fill-holes

‘invert’ Invert an unsigned integer dataset (will not work on other data types, see *note Numeric data types::). This is the only operator that ignores blank values (which are set to be the maximum values in the unsigned integer types). This is useful in cases where the target(s) has(have) been imaged in absorption in raw formats (which are unsigned integer types). With this operator, the maximum value for the given type will be subtracted from each pixel value, thus “inverting” the image, so the target(s) can be treated as emission. This can be useful when the higher-level analysis methods/tools only work on emission (positive skew in the noise, not negative).
$ astarithmetic image.fits invert

6.2.3.12 Bitwise operators
..........................
Astronomical images are usually stored as an array of multi-byte pixels with different sizes for different precision levels (see *note Numeric data types::). For example, images from CCDs are usually in the unsigned 16-bit integer type (each pixel takes 16 bits, or 2 bytes, of memory) and fully reduced deep images have a 32-bit floating point type (each pixel takes 32 bits or 4 bytes). On the other hand, during the data reduction, we need to preserve a lot of meta-data about some pixels. For example, whether a cosmic ray hit the pixel during the exposure, whether the pixel was saturated, whether it is known to have a problem, or whether the optical vignetting is too strong on it. A crude solution is to make a new binary image for each of these checks, where we flag (set to 1) the pixels that satisfy the condition and set the rest to zero. However, processing pipelines sometimes need more than 20 flags to store important per-pixel meta-data, and recall that the smallest numeric data type is one byte (or 8 bits, which can store up to 256 different values), while we only need two values for each flag! This is a major waste of storage space! A much better solution is to use the bits within each pixel to store different flags!
In other words, if you have an 8-bit pixel, use each bit as a flag to mark if a certain condition has happened on a certain pixel or not. For example, let’s set the following standard based on the four cases mentioned above: the first bit will show that a cosmic ray has hit that pixel. So if a pixel is only affected by cosmic rays, it will have this sequence of bits (note that the bit-counting starts from the right): ‘00000001’. The second bit shows that the pixel was saturated (‘00000010’), the third bit shows that it has known problems (‘00000100’) and the fourth bit shows that it was affected by vignetting (‘00001000’). Since each bit is independent, we can thus mark multiple metadata about that pixel in the actual image, within a single “flag” or “mask” pixel of a flag or mask image that has the same number of pixels. For example, a flag-pixel with the following bits ‘00001001’ shows that it has been affected by cosmic rays _and_ it has been affected by vignetting at the same time. The common data types to store these flagging pixels are unsigned integer types (see *note Numeric data types::). Therefore, when you open an unsigned 8-bit flag image in a viewer like DS9, you will see a single integer in each pixel that actually has 8 layers of metadata in it! For example, the integer you will see for the bit sequences given above will respectively be: $2^0=1$ (for a pixel that only has a cosmic ray), $2^1=2$ (for a pixel that was only saturated), $2^2=4$ (for a pixel that only has known problems), $2^3=8$ (for a pixel that is only affected by vignetting) and $2^0 + 2^3 = 9$ (for a pixel that has a cosmic ray _and_ was affected by vignetting). You can later use this bit information to mark objects in your final analysis or to mask certain pixels. For example, you may want to set all pixels affected by vignetting to NaN, but interpolate over pixels affected by cosmic rays. You therefore need a way to separate the pixels with the desired flag(s) from the rest. It is possible to treat a flag pixel as a single integer (and try to define certain ranges in value to select certain flags). But a much easier and more robust way is to actually look at each pixel as a sequence of bits (not as a single integer!) and use the bitwise operators below for this job. For more on the theory behind bitwise operators, see Wikipedia (https://en.wikipedia.org/wiki/Bitwise_operation).

‘bitand’ Bitwise AND operator: only bits with values of 1 in both popped operands will get the value of 1; the rest will be set to 0. For example (assuming numbers can be written as bit strings on the command-line): ‘00101000 00100010 bitand’ will give ‘00100000’. Note that the bitwise operators only work on integer type datasets.

‘bitor’ Bitwise inclusive OR operator: the bits where at least one of the two popped operands has a 1 value get a value of 1, the others 0. For example (assuming numbers can be written as bit strings on the command-line): ‘00101000 00100010 bitor’ will give ‘00101010’. Note that the bitwise operators only work on integer type datasets.

‘bitxor’ Bitwise exclusive OR operator: a bit will be 1 if it differs between the two popped operands. For example (assuming numbers can be written as bit strings on the command-line): ‘00101000 00100010 bitxor’ will give ‘00001010’. Note that the bitwise operators only work on integer type datasets.

‘lshift’ Bitwise left shift operator: shift all the bits of the first operand to the left by a number of times given by the second operand. For example (assuming numbers can be written as bit strings on the command-line): ‘00101000 2 lshift’ will give ‘10100000’. This is equivalent to multiplication by 4. Note that the bitwise operators only work on integer type datasets.

‘rshift’ Bitwise right shift operator: shift all the bits of the first operand to the right by a number of times given by the second operand. For example (assuming numbers can be written as bit strings on the command-line): ‘00101000 2 rshift’ will give ‘00001010’. Note that the bitwise operators only work on integer type datasets.

‘bitnot’ Bitwise not (more formally known as one’s complement) operator: flip all the bits of the popped operand (note that this is the only unary, or single operand, bitwise operator). In other words, any bit with a value of ‘0’ is changed to ‘1’ and vice-versa. For example (assuming numbers can be written as bit strings on the command-line): ‘00101000 bitnot’ will give ‘11010111’. Note that the bitwise operators only work on integer type datasets/numbers.
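As a practical sketch of using these operators on a flag image (the file name ‘flag.fits’ and the bit convention are the hypothetical ones introduced above, where the second bit marks saturation), the command below would produce a binary image that is 1 only on saturated pixels: the ‘bitand’ with 2 ($2^1$, the saturation bit) keeps only that bit, and the ‘ne’ operator above converts any non-zero result to 1:
$ astarithmetic flag.fits 2 bitand 0 ne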
6.2.3.13 Numerical type conversion operators
............................................
With the operators below you can convert the numerical data type of your input; see *note Numeric data types::. Type conversion is particularly useful when dealing with integers; see *note Integer benefits and pitfalls::. As an example, let’s assume that your colleague gives you many single exposure images for processing, but they have a double-precision floating point type! You know that the statistical precision of a single-exposure image never requires more than 6 or 7 significant digits, so you would prefer to archive them as single-precision floating point and save space on your computer (a double-precision floating point image is also double the file size!). You can do this with the ‘float32’ operator described below.

‘uint8’ Convert the type of the popped operand to 8-bit unsigned integer type (see *note Numeric data types::). The internal conversion of C will be used.

‘int8’ Convert the type of the popped operand to 8-bit signed integer type (see *note Numeric data types::). The internal conversion of C will be used.

‘uint16’ Convert the type of the popped operand to 16-bit unsigned integer type (see *note Numeric data types::). The internal conversion of C will be used.

‘int16’ Convert the type of the popped operand to 16-bit signed integer (see *note Numeric data types::). The internal conversion of C will be used.

‘uint32’ Convert the type of the popped operand to 32-bit unsigned integer type (see *note Numeric data types::). The internal conversion of C will be used.

‘int32’ Convert the type of the popped operand to 32-bit signed integer type (see *note Numeric data types::). The internal conversion of C will be used.

‘uint64’ Convert the type of the popped operand to 64-bit unsigned integer (see *note Numeric data types::). The internal conversion of C will be used.

‘float32’ Convert the type of the popped operand to 32-bit (single precision) floating point (see *note Numeric data types::). The internal conversion of C will be used. For example, if ‘f64.fits’ is a 64-bit floating point image, and you want to store it as a 32-bit floating point image, you can use the command below (the second command is to show that the output file consumes half the storage):
$ astarithmetic f64.fits float32 --output=f32.fits
$ ls -lh f64.fits f32.fits

‘float64’ Convert the type of the popped operand to 64-bit (double precision) floating point (see *note Numeric data types::).
6.2.3.14 Random number generators
.................................

When you simulate data (for example, see *note Sufi simulates a detection::), everything is ideal and there is no noise! The final step of the process is therefore to add simulated noise to the data. The operators in this section are designed for that purpose.

‘mknoise-sigma’
Add a fixed noise (Gaussian standard deviation) to each element of the input dataset. This operator takes two arguments: the top/first popped operand is the noise standard deviation, the next popped operand is the dataset that the noise should be added to.

When ‘--quiet’ is not given, a statement will be printed on each invocation of this operator (if there are multiple calls to the ‘mknoise-*’ operators, the statement will be printed multiple times). It will show the random number generator function and seed that was used in that invocation, see *note Generating random numbers::. Reproducibility of the outputs can be ensured with the ‘--envseed’ option, see below for more.

For example, with the command below, ‘image.fits’ will be degraded by a noise of standard deviation 3 units.

     $ astarithmetic image.fits 3 mknoise-sigma

Alternatively, you can use this operator within column arithmetic in the Table program to generate a random number (centered on 0, with $\sigma=3$) like the first command below. With the second command, you can put it into a shell variable for later usage.

     $ echo 0 | asttable -c'arith $1 3 mknoise-sigma'
     $ value=$(echo 0 | asttable -c'arith $1 3 mknoise-sigma')
     $ echo $value

You can also use this operator in combination with AWK to easily generate an arbitrarily large table with random columns. In the example below, we will create a two-column table with 20 rows. The first column will be centered on 5 with $\sigma_1=2$, the second will be centered on 10 with $\sigma_2=3$:

     $ echo 5 10 \
           | awk '{for(i=0;i<20;++i) print $1, $2}' \
           | asttable -c'arith $1 2 mknoise-sigma' \
                      -c'arith $2 3 mknoise-sigma'

By adding an extra ‘--output=random.fits’, the table will be saved into a file called ‘random.fits’, and you can change the ‘i<20’ to ‘i<5000’ to have 5000 rows instead. Of course, if your input table has different values in the desired column, the noisy distribution will be centered on each input element, but all will have the same scatter/sigma.

You can use the ‘--envseed’ option to fix the random number generator seed (and thus get a reproducible result). For more on ‘--envseed’, see *note Generating random numbers::. When using column arithmetic in Table, it may happen that multiple columns need random numbers (with any of the ‘mknoise-*’ operators) in one call of ‘asttable’. In such cases, the value given to ‘GSL_RNG_SEED’ is incremented by one on every call to the ‘mknoise-*’ operators. Without this increment, when the column values are the same (which happens often for datasets with no noise), the returned values for all columns would be identical. But this feature has a side-effect: if the order of calling the ‘mknoise-*’ operators changes, the seeds used for each operator will change(1).

In case each data element should have an independent sigma, the first popped operand can be a dataset of the same size as the second. In this case, for each element, a different noise measure (for example, sigma in ‘mknoise-sigma’) will be used; see the sketch below.
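For instance, the command below is a minimal sketch of that per-element usage (the file names are hypothetical: ‘sigma.fits’ has the same size as ‘image.fits’ and gives the desired standard deviation for every pixel, for example derived from an exposure-time map). Every pixel of the output will then have its own noise realization:

     $ astarithmetic image.fits sigma.fits mknoise-sigma \
                     --output=noised.fits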
‘mknoise-poisson’
Add Poisson noise to each element of the input dataset (see *note Photon counting noise::). This operator takes two arguments:
  1. The first popped operand (just before the operator) is the _per-pixel_ background value (in units of electron counts).
  2. The second popped operand is the dataset that the noise should be added to.

Recall that the background values reported by observatories (for example, to define dark or gray nights), or in papers, are usually reported in units of magnitudes per arcsecond squared. You need to do the conversion to counts per pixel manually. The conversion of magnitudes to counts is described below. For converting arcseconds squared to a number of pixels, you can use the ‘--pixelscale’ option of *note Fits::, for example ‘astfits image.fits --pixelscale’.

Except for the noise model, this operator is very similar to ‘mknoise-sigma’ and the examples there apply here too. The main difference with ‘mknoise-sigma’ is that in a Poisson distribution the scatter/sigma will depend on each element's value.

For example, let's assume you have made a mock image called ‘mock.fits’ with *note MakeProfiles:: and its assumed zero point is 22.5 (for more on the zero point, see *note Brightness flux magnitude::). Let's assume the background level for the Poisson noise has a value of 19 magnitudes. You can first use the ‘mag-to-counts’ operator to convert this background magnitude into counts, then feed the background value in counts to the ‘mknoise-poisson’ operator:

     $ astarithmetic mock.fits 19 22.5 mag-to-counts \
                     mknoise-poisson

Try changing the background value from 19 to 10 to see the effect! Recall that the tutorial *note Sufi simulates a detection:: shows how you can use MakeProfiles to build mock images.

‘mknoise-uniform’
Add uniform noise to each element of the input dataset. This operator takes two arguments: the top/first popped operand is the width of the interval, the second popped operand is the dataset that the noise should be added to (each element will be the center of the interval). The returned random values may happen to be the minimum interval value, but will never be the maximum. Except for the noise model, this operator behaves very similarly to ‘mknoise-sigma’, see the explanation there for more.

For example, with the command below, a random value will be selected between 10 and 14 (centered on 12, which is the only input data element, with a total interval width of 4).

     $ echo 12 | asttable -c'arith $1 4 mknoise-uniform'

Similar to the example in ‘mknoise-sigma’, you can pipe the output of ‘echo’ to ‘awk’ before passing it to ‘asttable’ to generate a full column of uniformly selected values within the same interval; see the sketch below.
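A minimal sketch of that pipeline (the numbers are arbitrary): the command below builds a 20-row column of values drawn uniformly from the interval of width 4 centered on 12. As with ‘mknoise-sigma’, you can add ‘--envseed’ (after setting the ‘GSL_RNG_SEED’ environment variable) for a reproducible column.

     $ echo 12 \
           | awk '{for(i=0;i<20;++i) print $1}' \
           | asttable -c'arith $1 4 mknoise-uniform'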
‘random-from-hist-raw’
Generate random values from a custom distribution (defined by a histogram). The output will have a double-precision floating point type (see *note Numeric data types::). This operator takes three operands:
  • The first popped operand (nearest to the operator) is the histogram values. The histogram is a 1-dimensional dataset (a table column) and contains the probability of obtaining a certain interval of values. The histogram does not have to be normalized: the GNU Scientific Library (or GSL, which is used by Gnuastro for this operator) will normalize it internally. The value of each bin (whose probability is given in the histogram) is given in the second popped operand. Therefore these two operands have to have the same number of rows.
  • The second popped operand is the bin value (mostly the bin center, but it can be anything). The probability of each bin is defined in the histogram operand (first popped operand). The bins can have any width (they do not have to be evenly spaced) and any order. Just make sure that the same row in the bins column corresponds to the same row in the histogram: the number of rows in the bins and histogram must be equal.
  • The third popped operand is the dataset that the random values should be written over. Effectively only its size will be used by this operator (all values will be over-written as double-precision floating point numbers).

The first two operands have to be single-dimensional (a table column) and have the same number of rows, but the last popped operand can have any number of dimensions. You can use the ‘load-col-’ operator to load the two bins and histogram columns from an external file (see *note Loading external columns::).

For example, in the commands below, we first construct a fake histogram to represent a $y=x^2$ distribution with AWK. We aim to distribute random values from this distribution in a $100\times100$ image. Therefore, we use the ‘makenew’ operator to construct an empty image of that size, use the ‘load-col-’ operator to load the histogram columns into Arithmetic, and put the output in ‘random.fits’. Finally, we visually inspect ‘random.fits’ with DS9 and also have a look at its pixel distribution with ‘aststatistics’.

     $ echo "" | awk '{for(i=1;i<5;++i) print i, i*i}' \
                 > histogram.txt

     $ cat histogram.txt
     1 1
     2 4
     3 9
     4 16

     $ astarithmetic 100 100 2 makenew \
                     load-col-1-from-histogram.txt \
                     load-col-2-from-histogram.txt \
                     random-from-hist-raw \
                     --output=random.fits

     $ astscript-fits-view random.fits

     $ aststatistics random.fits --asciihist --numasciibins=50
     |                                                 *
     |                                                 *
     |                                                 *
     |                                                 *
     |                                 *               *
     |                                 *               *
     |                                 *               *
     |                *                *               *
     |                *                *               *
     |*               *                *               *
     |*               *                *               *
     |--------------------------------------------------

As you see, the 10,000 pixels in the image only have values 1, 2, 3 or 4 (which were the values in the bins column of ‘histogram.txt’), and the number of times each of these values occurs follows the $y=x^2$ distribution.

Generally, any value given in the bins column will be used for the final output values. For example, in the command below (for generating a histogram from an analytical function), we are shifting the bins up by 20 (while keeping the same probability distribution of $y=x^2$). If you re-run the Arithmetic command above after this, you will notice that the pixel values are now one of the following: 21, 22, 23 or 24 (instead of 1, 2, 3, or 4). But the shape of the histogram of the resulting random distribution will be unchanged.

     $ echo "" | awk '{for(i=1;i<5;++i) print 20+i, i*i}' \
                 > histogram.txt

If you do not want the outputs to have exactly the value of the bin identifier, but instead a randomly selected value from a uniform distribution within the bin, you should use ‘random-from-hist’ (see below).

As mentioned above, the output will have a double-precision floating point type (see *note Numeric data types::). Therefore, by default each element of the output will consume 8 bytes (64 bits) of storage. This is usually far more than the statistical error/precision of your data (and just results in wasted storage in your file system, wasted RAM when a program that uses the data is being run, and a slower running time of the program). It is therefore recommended to use a type-conversion operator after this operator to put the output in the smallest type that can safely store your data without wasting storage, RAM or time. For the list of type conversion operators, see *note Numerical type conversion operators::.
Recall that you already know the values returned by this operator (they are one of the values in the bins column). For example, in the example above, the whole image only has values 1, 2, 3 or 4. Since they are always positive and are below 255, we can safely place them in an unsigned 8-bit integer (see *note Numeric data types::) with the command below (note the ‘uint8’ after the operator name, and that we are using a different name for the output). After building the new image, let's have a look at the sizes of the two images with ‘ls -l’:

     $ astarithmetic 100 100 2 makenew \
                     load-col-1-from-histogram.txt \
                     load-col-2-from-histogram.txt \
                     random-from-hist-raw uint8 \
                     --output=random-u8.fits

     $ ls -lh random.fits random-u8.fits
     -rw-r--r-- 1 name name 85K Jan 01 13:40 random.fits
     -rw-r--r-- 1 name name 17K Jan 01 13:45 random-u8.fits

As you see, when using a suitable data type, we can shrink the size of the file significantly without losing any information (from 85 kilobytes to 17 kilobytes). This difference is felt much more strongly for larger (real-world) datasets, so be sure to always set the output data type after calling this operator.

‘random-from-hist’
Similar to ‘random-from-hist-raw’, but do not return the exact bin value; instead return a random value from a uniform distribution within each bin. Therefore the following limitations have to be taken into account (compared to ‘random-from-hist-raw’):
  • The number associated with each bin (in the bin column) should be its center.
  • The bins have to be in ascending order (so the second row in the bin column is larger than the first).
  • The bin widths (the distance from one bin to the next) have to be fixed.

For a demonstration, let's replace ‘random-from-hist-raw’ with ‘random-from-hist’ in the example of the description of ‘random-from-hist-raw’. Note how we are manually converting the output of this operator into single-precision floating point (32-bit, since the default 64-bit precision is statistically meaningless in this scenario and we do not want to waste storage, memory and running time):

     $ echo "" | awk '{for(i=1;i<5;++i) print i, i*i}' \
                 > histogram.txt

     $ astarithmetic 100 100 2 makenew \
                     load-col-1