# GNU Astronomy Utilities


This book documents version 0.13 of the GNU Astronomy Utilities (Gnuastro). Gnuastro provides various programs and libraries for astronomical data manipulation and analysis.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.

To navigate easily in this web page, you can use the Next, Previous, Up and Contents links at the top and bottom of each page. Next and Previous will take you to the next or previous topic at the same level, for example from chapter 1 to chapter 2 or vice versa. To go to the sections or subsections, click on the menu entries that appear whenever a title has sub-components.


## 1 Introduction

GNU Astronomy Utilities (Gnuastro) is an official GNU package consisting of separate programs and libraries for the manipulation and analysis of astronomical data. All the programs share the same basic command-line user interface for the comfort of both the users and developers. Gnuastro is written to comply fully with the GNU coding standards, so it integrates well with the GNU/Linux operating system. This also lets astronomers expect a fully familiar experience in the source code, building, installing and command-line user interaction that they have seen in all the other GNU software that they use. The official and always up-to-date version of this book (or manual) is freely available under GNU Free Doc. License in various formats (PDF, HTML, plain text, info, and as its Texinfo source) at http://www.gnu.org/software/gnuastro/manual/.

For users who are new to the GNU/Linux environment, unless otherwise specified most of the topics in Installation and Common program behavior are common to all GNU software, for example installation, managing command-line options or getting help (also see New to GNU/Linux?). So if you are new to this empowering environment, we encourage you to go through these chapters carefully. They can be a starting point from which you can continue to learn more from each program’s own manual and fully benefit from and enjoy this wonderful environment. Gnuastro also comes with a large set of libraries, so you can write your own programs using Gnuastro’s building blocks, see Review of library fundamentals for an introduction.

In Gnuastro, no change to any program or library will be committed to its history before it has been fully documented here. As discussed in Science and its tools, this is a founding principle of Gnuastro.


### 1.1 Quick start

The latest official release tarball is always available as gnuastro-latest.tar.gz. For better compression (faster download) and robust archival features, an Lzip compressed tarball is also available at gnuastro-latest.tar.lz; see Release tarball for more details on the tarball release.

Let’s assume the downloaded tarball is in the TOPGNUASTRO directory. The first two commands below can be used to decompress the source. If you download tar.lz and your Tar implementation doesn’t recognize Lzip (the second command fails), run the third and fourth commands. Note that lines starting with ## don’t need to be typed.

    ## Go into the download directory.
    $ cd TOPGNUASTRO

    ## Also works on 'tar.gz'. GNU Tar recognizes both formats.
    $ tar xf gnuastro-latest.tar.lz

    ## Only when previous command fails.
    $ lzip -d gnuastro-latest.tar.lz
    $ tar xf gnuastro-latest.tar
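The fallback described above can also be written as a small shell function. This is only a sketch: `unpack_tarball` is a hypothetical helper name (it is not part of Gnuastro), and it assumes that `tar` and, for the fallback, `lzip` are installed.

```shell
# Unpack a release tarball. If Tar cannot read a '.tar.lz' file directly,
# decompress it with Lzip first and then unpack the plain '.tar' file.
# 'unpack_tarball' is a hypothetical helper name, not a Gnuastro command.
unpack_tarball () {
    tarball=$1
    if tar xf "$tarball"; then
        return 0
    fi
    case $tarball in
        *.tar.lz)
            # 'lzip -d' replaces NAME.tar.lz with NAME.tar.
            lzip -d "$tarball" && tar xf "${tarball%.lz}"
            ;;
        *)
            return 1
            ;;
    esac
}

# Usage (inside the download directory):
#   unpack_tarball gnuastro-latest.tar.lz
```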


Gnuastro has three mandatory dependencies and some optional dependencies for extra functionality, see Dependencies for the full list. In Dependencies from package managers we have prepared the command to easily install Gnuastro’s dependencies using the package manager of some operating systems. When the mandatory dependencies are ready, you can configure, compile, check and install Gnuastro on your system with the following commands.

    $ cd gnuastro-X.X                  # Replace X.X with version number.
    $ ./configure
    $ make -j8                         # Replace 8 with no. CPU threads.
    $ make check
    $ sudo make install

See Known issues if you confront any complications. For each program there is an ‘Invoke ProgramName’ sub-section in this book which explains how the programs should be run on the command-line (for example Invoking Table). You can read the same section on the command-line by running $ info astprogname (for example info asttable). The ‘Invoke ProgramName’ sub-section starts with a few examples of each program and goes on to explain the invocation details. See Getting help for all the options you have to get help. In Tutorials some real life examples of how these programs might be used are given.
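If you are not sure how many CPU threads to give to `make -j` in the build commands above, the number can usually be queried from the system. The sketch below assumes either `nproc` (GNU Coreutils) or `getconf` is available; `cpu_threads` is a hypothetical helper name.

```shell
# Print the number of CPU threads available, for use with 'make -jN'.
# 'cpu_threads' is a hypothetical helper name, not a Gnuastro command.
cpu_threads () {
    if command -v nproc >/dev/null 2>&1; then
        nproc                              # GNU Coreutils.
    else
        getconf _NPROCESSORS_ONLN          # Fallback on other systems.
    fi
}

# Usage (inside the unpacked source directory):
#   ./configure && make -j"$(cpu_threads)" && make check
```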


### 1.2 Science and its tools

The history of science indicates that there are always, inevitably, unseen faults, hidden assumptions, simplifications and approximations in all our theoretical models, data acquisition and analysis techniques. It is precisely these that will ultimately allow future generations to advance the existing experimental and theoretical knowledge through their new solutions and corrections.

In the past, scientists would gather data and process them individually to achieve an analysis thus having a much more intricate knowledge of the data and analysis. The theoretical models also required little (if any) simulations to compare with the data. Today both methods are becoming increasingly more dependent on pre-written software. Scientists are dissociating themselves from the intricacies of reducing raw observational data in experimentation or from bringing the theoretical models to life in simulations. These ‘intricacies’ are precisely those unseen faults, hidden assumptions, simplifications and approximations that define scientific progress.

Unfortunately, most persons who have recourse to a computer for statistical analysis of data are not much interested either in computer programming or in statistical method, being primarily concerned with their own proper business. Hence the common use of library programs and various statistical packages. ... It’s time that was changed.

F.J. Anscombe. The American Statistician, Vol. 27, No. 1. 1973

Anscombe’s quartet demonstrates how four data sets with widely different shapes (when plotted) give nearly identical output from standard regression techniques. Anscombe uses this (now famous) quartet, which was introduced in the paper quoted above, to argue that “Good statistical analysis is not a purely routine matter, and generally calls for more than one pass through the computer”. Echoing Anscombe’s concern after 44 years, some of the highly recognized statisticians of our time (Leek, McShane, Gelman, Colquhoun, Nuijten and Goodman), wrote in Nature that:

We need to appreciate that data analysis is not purely computational and algorithmic – it is a human behaviour....Researchers who hunt hard enough will turn up a result that fits statistical criteria – but their discovery will probably be a false positive.

Five ways to fix statistics, Nature, 551, Nov 2017.

Users of statistical (scientific) methods (software) are therefore not passive (objective) agents in their result. It is thus necessary to actually understand the method, not just use it as a black box. The subjective experience gained by frequently using a method/software is not sufficient to claim an understanding of how the tool/method works and how relevant it is to the data and analysis. This kind of subjective experience is prone to serious misunderstandings about the data, what the software/statistical-method really does (especially as it gets more complicated), and thus the scientific interpretation of the result. This attitude is further encouraged through non-free software, poorly written (or non-existent) scientific software manuals, and non-reproducible papers. This approach to scientific software and methods only helps in producing dogmas and an “obscurantist faith in the expert’s special skill, and in his personal knowledge and authority”.

Program or be programmed. Choose the former, and you gain access to the control panel of civilization. Choose the latter, and it could be the last real choice you get to make.

Douglas Rushkoff. Program or be programmed, O/R Books (2010).

It is obviously impractical for any one human being to gain the intricate knowledge explained above for every step of an analysis. On the other hand, scientific data can be large and numerous, for example images produced by telescopes in astronomy. This requires efficient algorithms. To make things worse, natural scientists have generally not been trained in the advanced software techniques, paradigms and architecture that are taught in computer science or engineering courses and thus used in most software. The GNU Astronomy Utilities are an effort to tackle this issue.

Gnuastro is not just software: this book is as important to the idea behind Gnuastro as the source code (software). This book has tried to learn from the success of the “Numerical Recipes” book in educating those who are not software engineers and computer scientists but still heavy users of computational algorithms, like astronomers. There are two major differences.

The first difference is that Gnuastro’s code and the background information are segregated: the code is kept within the actual Gnuastro software source code and the underlying explanations are given here in this book. In the source code, every non-trivial step is heavily commented and correlated with this book. The code follows the same logic as this book, and all the programs follow a similar internal data, function and file structure, see Program source. Complementing the code, this book focuses on thoroughly explaining the concepts behind the code (history, mathematics, science, software and usage advice when necessary) along with detailed instructions on how to run the programs. At the expense of frustrating “professionals” or “experts”, this book and the comments in the code also intentionally avoid jargon and abbreviations. The source code and this book are thus intimately linked, and when considered as a single entity can be thought of as a real (an actual software accompanying the algorithms) “Numerical Recipes” for astronomy.

The second major, and arguably more important, difference is that “Numerical Recipes” does not allow you to distribute any code that you have learned from it. In other words, it does not allow you to release your software’s source code if you have used their codes; you can only publicly release binaries (a black box) to the community. Therefore, while it empowers the privileged individual who has access to it, it exacerbates social ignorance. Exactly at the opposite end of the spectrum, Gnuastro’s source code is released under the GNU General Public License (GPL) and this book is released under the GNU Free Documentation License. You are therefore free to distribute any software you create using parts of Gnuastro’s source code or text, or figures from this book, see Your rights.

With these principles in mind, Gnuastro’s developers aim to impose the minimum requirements on you (in computer science, engineering and even the mathematics behind the tools) to understand and modify any step of Gnuastro if you feel the need to do so, see Why C programming language? and Program design philosophy.

Without prior familiarity and experience with optics, it is hard to imagine how Galileo could have come up with the idea of modifying the Dutch military telescope optics to use in astronomy. Astronomical objects could not be seen with the Dutch military design of the telescope. In other words, it is unlikely that Galileo could have asked a random optician to make modifications (not understood by Galileo) to the Dutch design, to do something no astronomer of the time took seriously. In the paradigm of the day, what could be the purpose of enlarging geometric spheres (planets) or points (stars)? In that paradigm only the position and movement of the heavenly bodies was important, and that had already been accurately studied (recently by Tycho Brahe).

At the beginning of his “The Sidereal Messenger” (published in 1610), Galileo cautions the readers on this issue, and before describing his results/observations he instructs us on how to build a suitable instrument. Without a detailed description of how he made his tools and made his observations, no reasonable person would believe his results. Before he actually saw the moons of Jupiter, the mountains on the Moon or the crescent of Venus, Galileo was “evasive” to Kepler. Science is defined by its tools/methods, not its raw results.

The same is true today: science cannot progress with a black box, or poorly released code. The source code of a research project is the new (abstractified) communication language in science, understandable by humans and computers. Source code (in any programming language) is a language/notation designed to express all the details that would be too tedious/long/frustrating to report in spoken languages like English, similar to mathematical notation.

Today, the quality of the source code that goes into a scientific result (and the distribution of that code) is as critical to scientific vitality and integrity as the quality of the written language/English used in publishing/distributing its paper. A scientific paper will not even be reviewed by any respectable journal if it is written in poor language/English. A similar level of quality assessment is thus increasingly becoming necessary regarding the codes/methods used to derive the results of a scientific paper.

Bjarne Stroustrup (creator of the C++ language) says: “Without understanding software, you are reduced to believing in magic”. Ken Thompson (the designer of the Unix operating system) says: “I abhor a system designed for the ‘user’ if that word is a coded pejorative meaning ‘stupid and unsophisticated’.” Certainly no scientist (user of a scientific software) would want to be considered a believer in magic, or stupid and unsophisticated.

This can happen when scientists get too distant from the raw data and methods, and are mainly discussing results. In other words, when they feel they have tamed Nature into their own high-level (abstract) models (creations), and are mainly concerned with scaling up, or industrializing those results. Roughly five years before special relativity, and about two decades before quantum mechanics fundamentally changed Physics, Lord Kelvin is quoted as saying:

There is nothing new to be discovered in physics now. All that remains is more and more precise measurement.

William Thomson (Lord Kelvin), 1900

A few years earlier, Albert A. Michelson made the following statement:

The more important fundamental laws and facts of physical science have all been discovered, and these are now so firmly established that the possibility of their ever being supplanted in consequence of new discoveries is exceedingly remote.... Our future discoveries must be looked for in the sixth place of decimals.

Albert A. Michelson, dedication of Ryerson Physics Lab, U. Chicago 1894

If scientists are considered to be more than mere “puzzle” solvers (simply adding to the decimals of existing values or observing a feature in 10, 100, or 100000 more galaxies or stars, as Kelvin and Michelson clearly believed), they cannot just passively sit back and uncritically repeat the previous (observational or theoretical) methods/tools on new data. Today there is a wealth of raw telescope images ready (mostly for free) at the fingertips of anyone who is interested and has a fast enough internet connection to download them. The only thing lacking is new ways to analyze this data and dig out the treasure that is lying hidden in them to existing methods and techniques.

New data that we insist on analyzing in terms of old ideas (that is, old models which are not questioned) cannot lead us out of the old ideas. However many data we record and analyze, we may just keep repeating the same old errors, missing the same crucially important things that the experiment was competent to find.

Jaynes, Probability theory, the logic of science. Cambridge U. Press (2003).

### 1.3 Your rights

The paragraphs below, in this section, belong to the GNU Texinfo manual and are not written by us! The name “Texinfo” is just changed to “GNU Astronomy Utilities” or “Gnuastro” because they are released under the same licenses and it is beautifully written to inform you of your rights.

GNU Astronomy Utilities is “free software”; this means that everyone is free to use it and free to redistribute it on certain conditions. Gnuastro is not in the public domain; it is copyrighted and there are restrictions on its distribution, but these restrictions are designed to permit everything that a good cooperating citizen would want to do. What is not allowed is to try to prevent others from further sharing any version of Gnuastro that they might get from you.

Specifically, we want to make sure that you have the right to give away copies of the programs that relate to Gnuastro, that you receive the source code or else can get it if you want it, that you can change these programs or use pieces of them in new free programs, and that you know you can do these things.

To make sure that everyone has such rights, we have to forbid you to deprive anyone else of these rights. For example, if you distribute copies of the Gnuastro related programs, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must tell them their rights.

Also, for our own protection, we must make certain that everyone finds out that there is no warranty for the programs that relate to Gnuastro. If these programs are modified by someone else and passed on, we want their recipients to know that what they have is not what we distributed, so that any problems introduced by others will not reflect on our reputation.

The full text of the licenses for the Gnuastro software and book can be respectively found in GNU Gen. Pub. License v3 and GNU Free Doc. License.


### 1.4 Naming convention

Gnuastro is a package of independent programs and a collection of libraries; here we are mainly concerned with the programs. Each program has an official name consisting of one or two words that describe what it does. The words are printed with no space between them, for example NoiseChisel or Crop. On the command-line, you can run the programs with their executable names, which start with ast and might be an abbreviation of the official name, for example astnoisechisel or astcrop, see Executable names.

We will use “ProgramName” for a generic official program name and astprogname for a generic executable name. In this book, the programs are classified based on what they do and thoroughly explained. An alphabetical list of the programs that are installed on your system with this installation is given in Gnuastro programs list. That list also contains the executable names and version numbers along with a one line description.


### 1.5 Version numbering

Gnuastro can have two formats of version numbers, for official and unofficial releases. Official Gnuastro releases are announced on the info-gnuastro mailing list and have a version control tag in Gnuastro’s development history; their version numbers are formatted like “A.B”. A is a major version number, marking a significant planned achievement (for example see GNU Astronomy Utilities 1.0), while B is a minor version number; see below for more on the distinction. Note that the numbers are not decimals: A and B are separate integers, so version 2.34 is much more recent than version 2.5, and 2.5 is not equal to 2.50.
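Because A and B are separate integers, official version numbers have to be compared part by part and not as decimal fractions. The function below is a minimal sketch of such a comparison in plain shell; `version_cmp` is a hypothetical helper name, not a Gnuastro command.

```shell
# Compare two official "A.B" version strings as pairs of integers.
# Prints whether the first is "newer", "older" or the "same" as the second.
# 'version_cmp' is a hypothetical helper name, not a Gnuastro command.
version_cmp () {
    a_major=${1%%.*}; a_minor=${1#*.}
    b_major=${2%%.*}; b_minor=${2#*.}
    if [ "$a_major" -ne "$b_major" ]; then
        if [ "$a_major" -gt "$b_major" ]; then echo newer; else echo older; fi
    elif [ "$a_minor" -ne "$b_minor" ]; then
        if [ "$a_minor" -gt "$b_minor" ]; then echo newer; else echo older; fi
    else
        echo same
    fi
}

version_cmp 2.34 2.5     # prints "newer"
version_cmp 2.5 2.50     # prints "older"
```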

Gnuastro also allows a unique version number for unofficial releases. Unofficial releases can mark any point in Gnuastro’s development history. This is done to allow astronomers to easily use any point in the version controlled history for their data-analysis and research publication. See Version controlled source for a complete introduction. That section is not just for developers and is intended to be straightforward and easy to read, so please have a look if you are interested in the cutting-edge. This unofficial version number is a meaningful and easy to read string of characters, unique to that particular point of history. With this feature, users can easily stay up to date with the most recent bug fixes and additions that are committed between official releases.

The unofficial version number is formatted like: A.B.C-D. A and B are the most recent official version number. C is the number of commits that have been made after version A.B. D is the first 4 or 5 characters of the commit hash number. Therefore, the unofficial version number ‘3.92.8-29c8’ corresponds to the 8th commit after the official version 3.92, and its commit hash begins with 29c8. The unofficial version number is sortable (unlike the raw hash) and, as shown above, is descriptive of the state of the unofficial release. Of course an official release is preferred for publication (since its tarballs are easily available and it has gone through more tests, making it more stable), so if an official release is announced prior to your publication’s final review, please consider updating to the official release.
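The A.B.C-D format described above can be taken apart with ordinary shell parameter expansion. The following is only an illustrative sketch using the example string from the text; `split_version` is a hypothetical helper name, not a Gnuastro command.

```shell
# Split an unofficial version string of the form A.B.C-D into its parts.
# 'split_version' is a hypothetical helper name, not a Gnuastro command.
split_version () {
    version=$1
    hash=${version##*-}        # D: text after the last '-'.
    rest=${version%-*}         # A.B.C
    commits=${rest##*.}        # C: number after the last '.'.
    official=${rest%.*}        # A.B
    echo "official=$official commits=$commits hash=$hash"
}

split_version 3.92.8-29c8      # prints "official=3.92 commits=8 hash=29c8"
```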

The major version number is set by a major goal which is defined by the developers and user community beforehand, for example see GNU Astronomy Utilities 1.0. The incremental work done in minor releases is commonly a series of small steps towards achieving the major goal. Therefore, there is no limit on the number of minor releases and the difference between the (hypothetical) versions 2.927 and 3.0 can be a small (negligible to the user) improvement that finalizes the defined goals.


#### 1.5.1 GNU Astronomy Utilities 1.0

Currently (prior to Gnuastro 1.0), the aim of Gnuastro is to have a complete system for data manipulation and analysis at least similar to IRAF. So an astronomer can take all the standard data analysis steps (starting from raw data to the final reduced product and standard post-reduction tools) with the various programs in Gnuastro.

The maintainers of each camera or detector on a telescope can provide a completely transparent shell script or Makefile to the observer for data analysis. This script can set configuration files for all the required programs to work with that particular camera. The script can then run the proper programs in the proper sequence. The user/observer can easily follow the standard shell script to understand (and modify) each step and the parameters used. Bash (or other modern GNU/Linux shell scripts) is powerful and made for this gluing job. This will simultaneously improve performance and transparency. Shell scripts and Makefiles are also basic constructs that are easy to learn and readily available as part of Unix-like operating systems. If there is no program to do a desired step, Gnuastro’s libraries can be used to build specific programs.
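As a rough illustration of such a script, the sketch below wraps one step in a function and prints every command before running it. All names here are assumptions made for illustration: the `run` and `reduce_image` helpers, the `DRY_RUN` variable and the file names are hypothetical, and a real reduction would contain many more steps. With `DRY_RUN=1` (the default in this sketch) the commands are only printed, so nothing needs to be installed to read through the sequence.

```shell
# Sketch of a per-camera reduction script. The helper names, the DRY_RUN
# variable and the file names are hypothetical placeholders.
# With DRY_RUN=1 (the default here) commands are printed but not executed.
DRY_RUN=${DRY_RUN:-1}

# Print a command and, unless this is a dry run, execute it.
run () {
    echo "+ $*"
    if [ "$DRY_RUN" != 1 ]; then "$@"; fi
}

# One step of the sequence: detection with NoiseChisel.
reduce_image () {
    input=$1
    output=${input%.fits}_detected.fits
    run astnoisechisel "$input" --output="$output"
}

reduce_image exposure-001.fits
```

Keeping each step in its own function lets the observer read, reorder or modify the sequence without touching the programs themselves.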

The main factor is that all observatories or projects can freely contribute to Gnuastro and all simultaneously benefit from it (since it doesn’t belong to any particular one of them), much like how for-profit organizations (for example RedHat, or Intel and many others) are major contributors to free and open source software for their shared benefit. Gnuastro’s copyright has been fully awarded to GNU, so it doesn’t belong to any particular astronomer or astronomical facility or project.


### 1.6 New to GNU/Linux?

Some astronomers initially install and use a GNU/Linux operating system because their necessary tools can only be installed in this environment. However, the transition is not necessarily easy. To encourage you in investing the patience and time to make this transition, and actually enjoy it, we will first start with a basic introduction to GNU/Linux operating systems. Afterwards, in Command-line interface we’ll discuss the wonderful benefits of the command-line interface, how it beautifully complements the graphic user interface, and why it is worth the (apparently steep) learning curve. Finally a complete chapter (Tutorials) is devoted to real world scenarios of using Gnuastro (on the command-line). Therefore if you don’t yet feel comfortable with the command-line we strongly recommend going through that chapter after finishing this section.

You might have already noticed that we are not using the name “Linux”, but “GNU/Linux”. Please take the time to have a look at the following essays and FAQs for a complete understanding of this very important distinction.

In short, the Linux kernel is built using the GNU C library (glibc) and GNU compiler collection (gcc). The Linux kernel software alone is just a means for other software to access the hardware resources; it is useless by itself. To say “running Linux” is like saying “driving your carburetor”.

To have an operating system, you need lower-level (to build the kernel), and higher-level (to use it) software packages. The majority of such software in most Unix-like operating systems is GNU software: “the whole system is basically GNU with Linux loaded”. Therefore to acknowledge GNU’s instrumental role in the creation and usage of the Linux kernel and the operating systems that use it, we should call these operating systems “GNU/Linux”.


#### 1.6.1 Command-line interface

One aspect of Gnuastro that might be a little troubling to new GNU/Linux users is that (at least for the time being) it only has a command-line user interface (CLI). This might be contrary to the mostly graphical user interface (GUI) experience with proprietary operating systems. Since the various actions available aren’t always on the screen, the command-line interface can be complicated, intimidating, and frustrating for a first-time user. This is understandable and also experienced by anyone who started using the computer (from childhood) in a graphical user interface (this includes most of Gnuastro’s authors). Here we hope to convince you of the unique benefits of this interface which can greatly enhance your productivity while complementing your GUI experience.

Through GNOME 3, most GNU/Linux based operating systems now have an advanced and useful GUI. Since the GUI was created long after the command-line, some wrongly consider the command line to be obsolete. Both interfaces are useful for different tasks. For example you can’t view an image, video, pdf document or web page on the command-line. On the other hand you can’t reproduce your results easily in the GUI. Therefore they should not be regarded as rivals but as complementary user interfaces. Here we will outline how the CLI can be useful in scientific programs.

You can think of the GUI as a veneer over the CLI to facilitate a small subset of all the possible CLI operations. Each click you do on the GUI can be thought of as internally running a different CLI command. So asymptotically (if a good designer can design a GUI which is able to show you all the possibilities to click on) the GUI is only as powerful as the command-line. In practice, such graphical designers are very hard to find for every program, so the GUI operations are always a subset of the internal CLI commands. For programs that are only made for the GUI, this results in not including lots of potentially useful operations. It also makes ‘interface design’ a crucially important part of any GUI program. Scientists don’t usually have enough resources to hire a graphical designer. The complexity of GUI code is also far greater than that of CLI code, which is harmful for scientific software, see Science and its tools.

For programs that have a GUI, one action on the GUI (moving and clicking a mouse, or tapping a touchscreen) might be more efficient and easier than its CLI counterpart (typing the program name and your desired configuration). However, if you have to repeat that same action more than once, the GUI will soon become frustrating and prone to errors. Unless the designers of a particular program decided to design such a system for a particular GUI action, there is no general way to run any possible series of actions automatically on the GUI.

On the command-line, you can run any series of actions, coming from the various CLI capable programs that you have chosen yourself, in any possible permutation with one command. This allows for much more creativity than is possible for a GUI user. For technical and scientific operations, where the same operation (using various programs) has to be done on a large set of data files, this is crucially important. It also allows exact reproducibility, which is a foundational principle for scientific results. The most common CLI (which is also known as a shell) in GNU/Linux is GNU Bash. We strongly encourage you to put aside several hours and go through this beautifully explained web page: https://flossmanuals.net/command-line/. You don’t need to read or even fully understand the whole thing; a general knowledge of the first few chapters is enough to get you going.
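As a small, generic illustration of combining several programs in one command, the sketch below loops over the text files in a directory, counts their lines with `wc`, and passes the result through `sort` and `head` to report the three longest files. Only standard tools are used; `longest_files` is a hypothetical helper name and the directory is whatever you give it.

```shell
# Report the three '.txt' files with the most lines in a directory:
# a loop, 'wc', 'sort' and 'head' combined into a single pipeline.
# 'longest_files' is a hypothetical helper name.
longest_files () {
    for f in "$1"/*.txt; do
        # The arithmetic expansion strips any padding that 'wc' may print.
        printf '%s %s\n' "$(( $(wc -l < "$f") ))" "$f"
    done | sort -rn | head -n 3
}

# Usage:
#   longest_files /path/to/directory
```

Because the whole sequence is plain text, it can be saved in a script and repeated exactly on a new set of files.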

Since the operations in the GUI are limited and they are visible, reading a manual is not that important in the GUI (most programs don’t even have any!). However, to give you the creative power explained above, with a CLI program, it is best if you first read the manual of any program you are using. You don’t need to memorize any details, only an understanding of the generalities is needed. Once you start working, there are easier ways to remember a particular option or operation detail, see Getting help.

To experience the command-line in its full glory and not in the GUI terminal emulator, press the following keys together: CTRL+ALT+F4 to access the virtual console. To return to your GUI, press the same keys above replacing F4 with F7 (or F1, or F2, depending on your GNU/Linux distribution). In the virtual console, the GUI, with all its distracting colors and information, is gone, enabling you to focus entirely on your actual work.

For operations that use a lot of your system’s resources (processing a large number of large astronomical images for example), the virtual console is the place to run them. This is because the GUI is not competing with your research work for your system’s RAM and CPU. Since the virtual consoles are completely independent, you can even log out of your GUI environment to give even more of your hardware resources to the programs you are running and thus reduce the operating time.

Since it uses far less system resources, the CLI is also convenient for remote access to your computer. Using secure shell (SSH) you can log in securely to your system (similar to the virtual console) from anywhere even if the connection speeds are low. There are apps for smart phones and tablets which allow you to do this.


### 1.7 Report a bug

According to Wikipedia “a software bug is an error, flaw, failure, or fault in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways”. So when you see that a program is crashing, not reading your input correctly, giving the wrong results, or not writing your output correctly, you have found a bug. In such cases, it is best if you report the bug to the developers. The programs will also inform you if known impossible situations occur (which are caused by something unexpected) and will ask the users to report the bug issue.

Prior to actually filing a bug report, it is best to search previous reports. The issue might have already been found and even solved. The best place to check if your bug has already been discussed is the bug tracker on the Gnuastro project webpage at https://savannah.gnu.org/bugs/?group=gnuastro. In the top search fields (under “Display Criteria”) set the “Open/Closed” drop-down menu to “Any”, choose the respective program or general category of the bug in “Category” and click the “Apply” button. The results colored green have already been solved and the status of those colored in red is shown in the table.

Recently corrected bugs are probably not yet publicly released because they are scheduled for the next Gnuastro stable release. If the bug is solved but not yet released and it is an urgent issue for you, you can get the version controlled source and compile that, see Version controlled source.

To solve the issue as readily as possible, please follow the guidelines below in your bug report. The How to Report Bugs Effectively and How To Ask Questions The Smart Way essays also provide some good generic advice for all software (don’t contact their authors for Gnuastro’s problems). Mastering the art of giving good bug reports (like asking good questions) can greatly enhance your experience with any free and open source software. So investing the time to read through these essays will greatly reduce your frustration when something doesn’t work the way you feel it is supposed to, for a large range of software, not just Gnuastro.

Be descriptive

Please provide as many details as possible and be very descriptive. Explain what you expected and what the output was: it might be that your expectation was wrong. Also please clearly state which sections of the Gnuastro book (this book), or other references you have studied to understand the problem. This can be useful in correcting the book (adding links to likely places where users will check). But more importantly, it will be encouraging for the developers, since you are showing how serious you are about the problem and that you have actually put some thought into it. “To be able to ask a question clearly is two-thirds of the way to getting it answered.” – John Ruskin (1819-1900).

Individual and independent bug reports

If you have found multiple bugs, please send them as separate (and independent) bugs (as much as possible). This will significantly help us in managing and resolving them sooner.

Reproducible bug reports

If we cannot exactly reproduce your bug, then it is very hard to resolve it. So please send us a Minimal working example along with the description. For example in running a program, please send us the full command-line text and the output with the -P option, see Operating mode options. If the bug only occurs for a certain input, also send us that input file. In case the input FITS is large, please use Crop to only crop the problematic section and make it as small as possible so it can easily be uploaded and downloaded and not waste the archive’s storage, see Crop.
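One way to collect the full command-line text and its output in a single file is sketched below. The save_report function and the bug-report.txt file name are hypothetical (they are not part of Gnuastro) and only standard shell features are assumed. The demonstration uses a harmless echo command; for a real report you would put the failing command in its place.

```shell
# Hypothetical helper (not part of Gnuastro): run a command and save the
# exact command-line, its output and its exit status in one text file.
save_report () {
    report=$1; shift
    {
        printf 'Command: %s\n' "$*"
        "$@" 2>&1
        printf 'Exit status: %s\n' "$?"
    } > "$report"
}

# Demonstration with a harmless command. For a real report, replace the
# echo with the failing program and its options.
save_report bug-report.txt echo "hello from the shell"
cat bug-report.txt
```

The resulting text file can then be attached to the mail or web form together with the (cropped) input file.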

There are generally two ways to inform us of bugs:

• Send a mail to bug-gnuastro@gnu.org. Any mail you send to this address will be distributed through the bug-gnuastro mailing list. This is the simplest way to send us bug reports. The developers will then register the bug into the project webpage (next choice) for you.
• Use the Gnuastro project webpage at https://savannah.gnu.org/projects/gnuastro/: There are two ways to get to the submission page as listed below. Fill in the form as described below and submit it (see Gnuastro project webpage for more on the project webpage).
    • Using the top horizontal menu items, immediately under the top page title. Hovering your mouse on “Support” will open a drop-down list. Select “Submit new”.
    • In the main body of the page, under the “Communication tools” section, click on “Submit new item”.

Once the items have been registered in the mailing list or webpage, the developers will add them to either the “Bug Tracker” or “Task Manager” trackers of the Gnuastro project webpage. These two trackers can only be edited by the Gnuastro project developers, but they can be browsed by anyone, so you can follow the progress on your bug. You are most welcome to join us in developing Gnuastro, and fixing the bug you have found may be a good starting point. Gnuastro is designed to be easy for anyone to develop (see Science and its tools) and there is a full chapter devoted to developing it: Developing.

### 1.8 Suggest new feature

We would always be happy to hear of suggested new features. For every program there are already lists of features that we are planning to add. You can see the current list of plans from the Gnuastro project webpage at https://savannah.gnu.org/projects/gnuastro/ by following “Tasks”→“Browse” on the horizontal menu at the top of the page immediately under the title, see Gnuastro project webpage. If you want to request a feature for an existing program, click on the “Display Criteria” above the list and under “Category”, choose that particular program. Under “Category” you can also see the existing suggestions for new programs or other cases like installation, documentation or libraries. Also be sure to set the “Open/Closed” value to “Any”.

If the feature you want to suggest is not already listed in the task manager, then follow the steps that are fully described in Report a bug. Please keep in mind that the developers are all busy with their own astronomical research, with implementing the existing tasks, and with resolving bugs. Gnuastro is a volunteer effort and none of the developers are paid for their hard work. So, although we will try our best, please don’t expect your suggested feature to be included immediately (with the next release of Gnuastro).

 Gnuastro is a collection of low level programs: As described in Program design philosophy, a founding principle of Gnuastro is that each library or program should be basic and low-level. High level jobs should be done by running the separate programs or using separate functions in succession through a shell script or calling the libraries by higher level functions, see the examples in Tutorials. So when making the suggestions please consider how your desired job can best be broken into separate steps and modularized.

### 1.9 Announcements

Gnuastro has a dedicated mailing list for making announcements (info-gnuastro). Anyone can subscribe to this mailing list. Anytime there is a new stable or test release, an email will be circulated there. The email contains a summary of the overall changes along with a detailed list (from the NEWS file). This mailing list is thus the best way to stay up to date with new releases, easily learn about the updated/new features, or dependencies (see Dependencies).

To subscribe to this list, please visit https://lists.gnu.org/mailman/listinfo/info-gnuastro. Traffic (number of mails per unit time) in this list is designed to be low: only a handful of mails per year. Previous announcements are available on its archive.

### 1.10 Conventions

In this book we have the following conventions:

• All commands that are to be run on the shell (command-line) prompt as the user start with a $. In case they must be run as a super-user or system administrator, they will start with a single #. If the command is on a separate line and the next line is also in the code type face, but doesn’t have any of the $ or # signs, then it is the output of the command after it is run. As a user, you don’t need to type those lines. A line that starts with ## is just a comment for explaining the command to a human reader and must not be typed.
• If the command becomes larger than the page width a \ is inserted in the code. If you are typing the code by hand on the command-line, you don’t need to use multiple lines or add the extra space characters, so you can omit them. If you want to copy and paste these examples (highly discouraged!) then the \ should stay.

The \ character is a shell escape character which is commonly used to make characters which have special meaning for the shell lose that special place (the shell will not treat them specially if there is a \ before them). When it is the last character in a line (the next character is a new-line character), the new-line character loses its meaning and the shell continues reading the command on the next line, enabling you to use multiple lines to write your commands.
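As a small illustration of this convention, the sketch below writes the same command once on a single line and once split with a trailing \. It only uses the standard echo command to print the arguments (nothing is actually run by Gnuastro here), and the variable names one_line and two_lines are chosen for illustration.

```shell
# The same command, written on a single line...
one_line=$(echo astmkprof --prepforconv cat.txt)

# ...and split over two lines. The trailing \ removes the new-line, and
# the indentation that follows only separates the arguments.
two_lines=$(echo astmkprof --prepforconv \
                 cat.txt)

echo "$one_line"
echo "$two_lines"
```

Both echo commands print the same text, so the shell received the same arguments in both forms.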

### 1.11 Acknowledgments

Gnuastro would not have been possible without scholarships and grants from several funding institutions. We thus ask that if you used Gnuastro in any of your papers/reports, please add the proper citation and acknowledge the funding agencies/projects. For details of which papers to cite (may be different for different programs) and get the acknowledgment statement to include in your paper, please run the relevant programs with the common --cite option like the example commands below (for more on --cite, please see Operating mode options).

$ astnoisechisel --cite
$ astmkcatalog --cite


Here, we’ll acknowledge all the institutions (and their grants) along with the people who helped make Gnuastro possible. The full list of Gnuastro authors is available at the start of this book and the AUTHORS file in the source code (both are generated automatically from the version controlled history). The plain text file THANKS, which is also distributed along with the source code, contains the list of people and institutions who played an indirect role in Gnuastro (they have not committed any code in the Gnuastro version controlled history).

The Japanese Ministry of Education, Culture, Sports, Science, and Technology (MEXT) scholarship for Mohammad Akhlaghi’s Masters and PhD degree in Tohoku University Astronomical Institute had an instrumental role in the long term learning and planning that made the idea of Gnuastro possible. The very critical view points of Professor Takashi Ichikawa (Mohammad’s adviser) were also instrumental in the initial ideas and creation of Gnuastro. Afterwards, the European Research Council (ERC) advanced grant 339659-MUSICOS (Principal investigator: Roland Bacon) was vital in the growth and expansion of Gnuastro. Working with Roland at the Centre de Recherche Astrophysique de Lyon (CRAL), enabled a thorough re-write of the core functionality of all libraries and programs, turning Gnuastro into the large collection of generic programs and libraries it is today. Work on improving Gnuastro and making it mature is now continuing primarily in the Instituto de Astrofisica de Canarias (IAC) and in particular in collaboration with Johan Knapen and Ignacio Trujillo.

In general, we would like to gratefully thank the following people for their useful and constructive comments and suggestions (in alphabetical order by family name): Valentina Abril-melgarejo, Marjan Akbari, Carlos Allende Prieto, Hamed Altafi, Roland Bacon, Roberto Baena Gallé, Zahra Bagheri, Karl Berry, Leindert Boogaard, Nicolas Bouché, Stefan Brüns, Fernando Buitrago, Adrian Bunk, Rosa Calvi, Mark Calabretta, Nushkia Chamba, Benjamin Clement, Nima Dehdilani, Antonio Diaz Diaz, Alexey Dokuchaev, Pierre-Alain Duc, Elham Eftekhari, Gaspar Galaz, Thérèse Godefroy, Madusha Gunawardhana, Bruno Haible, Stephen Hamer, Takashi Ichikawa, Raúl Infante Sainz, Brandon Invergo, Oryna Ivashtenko, Aurélien Jarno, Lee Kelvin, Brandon Kelly, Mohammad-Reza Khellat, Johan Knapen, Geoffry Krouchi, Floriane Leclercq, Alan Lefor, Sebastián Luna Valero, Guillaume Mahler, Raphael Morales, Juan Molina Tobar, Francesco Montanari, Dmitrii Oparin, Bertrand Pain, William Pence, Mamta Pommier, Marcel Popescu, Bob Proulx, Joseph Putko, Samane Raji, Teymoor Saifollahi, Joanna Sakowska, Elham Saremi, Yahya Sefidbakht, Alejandro Serrano Borlaff, Zahra Sharbaf, David Shupe, Jenny Sorce, Lee Spitler, Richard Stallman, Michael Stein, Ole Streicher, Alfred M. Szmidt, Michel Tallon, Juan C. Tello, Éric Thiébaut, Ignacio Trujillo, David Valls-Gabaud, Aaron Watkins, Michael H.F. Wilkinson, Christopher Willmer, Sara Yousefi Taemeh, Johannes Zabl. The GNU French Translation Team is also managing the French version of the top Gnuastro webpage which we highly appreciate. Finally we should thank all the (sometimes anonymous) people in various online forums who patiently answered all our small (but important) technical questions.

All work on Gnuastro has been voluntary, but the authors are most grateful to the following institutions (in chronological order) for hosting/supporting us in our research. Where necessary, these institutions have disclaimed any ownership of the parts of Gnuastro that were developed there, thus ensuring the freedom of Gnuastro for the future (see Copyright assignment). We highly appreciate their support for free software, and thus free science, and therefore a free society.

Tohoku University Astronomical Institute, Sendai, Japan.
University of Salento, Lecce, Italy.
Centre de Recherche Astrophysique de Lyon (CRAL), Lyon, France.
Instituto de Astrofisica de Canarias (IAC), Tenerife, Spain.

## 2 Tutorials

To help new users have a smooth and easy start with Gnuastro, in this chapter several thoroughly elaborated tutorials, or cookbooks, are provided. These tutorials demonstrate the capabilities of different Gnuastro programs and libraries, along with tips and guidelines for the best practices of using them in various realistic situations.

We strongly recommend going through these tutorials to get a good feeling of how the programs are related (built in a modular design to be used together in a pipeline), very similar to the core Unix-based programs that they were modeled on. These tutorials will therefore greatly help you use Gnuastro’s programs (and generally, the Unix-like command-line environment) effectively for your research.

In Sufi simulates a detection, we’ll start with a fictional tutorial explaining how Abd al-rahman Sufi (903 – 986 A.D., the first recorded description of “nebulous” objects in the heavens is attributed to him) could have used some of Gnuastro’s programs for a realistic simulation of his observations to see if his detection of nebulous objects was trustworthy. Because all conditions are under control in a simulated/mock environment/dataset, mock datasets can be a valuable tool to inspect the limitations of your data analysis and processing. But they need to be as realistic as possible, so the first tutorial is dedicated to this important step of an analysis.

The next two tutorials (General program usage tutorial and Detecting large extended targets) use real input datasets from some of the deep Hubble Space Telescope (HST) images and the Sloan Digital Sky Survey (SDSS) respectively. Their aim is to demonstrate some real-world problems that many astronomers often face and how they can be solved with Gnuastro’s programs.

The ultimate aim of General program usage tutorial is to detect galaxies in a deep HST image, measure their positions and brightness and select those with the strongest colors. In the process, it takes many detours to introduce you to the useful capabilities of many of the programs. So please be patient in reading it. If you don’t have much time and can only try one of the tutorials, we recommend this one.

Detecting large extended targets deals with a major problem in astronomy: effectively detecting the faint outer wings of bright (and large) nearby galaxies to extremely low surface brightness levels (roughly one quarter of the local noise level in the example discussed). Besides the interesting scientific questions in these low-surface brightness features, failure to properly detect them will bias the measurements of the background objects and the survey’s noise estimates. This is an important issue, especially in wide surveys, because bright/large galaxies and stars cover a significant fraction of the survey area.

In these tutorials, we have intentionally avoided too many cross references to make them easier to read. For more information about a particular program, you can visit the section with the same name as the program in this book. Each program section in the subsequent chapters starts by explaining the general concepts behind what it does, for example see Convolve. If you only want practical information on running a program, for example its options/configuration, input(s) and output(s), please consult the subsection titled “Invoking ProgramName”, for example see Invoking NoiseChisel. For an explanation of the conventions we use in the example codes through the book, please see Conventions.

### 2.1 Sufi simulates a detection

It is the year 953 A.D. and Abd al-rahman Sufi (903 – 986 A.D.) is in Shiraz as a guest astronomer. He had come there to use the advanced 123 centimeter astrolabe for his studies on the Ecliptic. However, something had been bothering him for a long time. While mapping the constellations, there were several non-stellar objects that he had detected in the sky, one of them was in the Andromeda constellation. During a trip to Yemen, Sufi had seen another such object in the southern skies looking over the Indian ocean. He wasn’t sure if such cloud-like non-stellar objects (which he was the first to call ‘Sahābi’ in Arabic or ‘nebulous’) were real astronomical objects or if they were only the result of some bias in his observations. Could such diffuse objects actually be detected at all with his detection technique?

He still had a few hours left until nightfall (when he would continue his studies on the ecliptic) so he decided to find an answer to this question. He had thoroughly studied Claudius Ptolemy’s (90 – 168 A.D.) Almagest and had made lots of corrections to it, in particular in measuring the brightness. Using this same experience, he was able to measure a magnitude for the objects and wanted to simulate his observation to see if a simulated object with the same brightness and size could be detected in simulated noise with the same detection technique. The general outline of the steps he wants to take is:

1. Make some mock profiles in an over-sampled image. The initial mock image has to be over-sampled prior to convolution or other forms of transformation in the image. Through his experiences, Sufi knew that this is because the image of heavenly bodies is actually transformed by the atmosphere or other sources outside the atmosphere (for example gravitational lenses) prior to being sampled on an image. Since that transformation occurs on a continuous grid, to best approximate it, he should do all the work on a finer pixel grid. In the end he can re-sample the result to the initially desired grid size.
2. Convolve the image with a point spread function (PSF, see Point spread function) that is over-sampled to the same resolution as the mock image. Since he wants to finish in a reasonable time and the PSF kernel will be very large due to oversampling, he has to use frequency domain convolution which has the side effect of dimming the edges of the image. So in the first step above he also has to build the image to be larger by at least half the width of the PSF convolution kernel on each edge.
3. With all the transformations complete, the image should be re-sampled to the same size of the pixels in his detector.
4. He should remove those extra pixels on all edges to remove frequency domain convolution artifacts in the final product.
5. He should add noise to the (until now, noise-less) mock image. After all, all observations have noise associated with them.

Fortunately Sufi had heard of GNU Astronomy Utilities from a colleague in Isfahan (where he worked) and had installed it on his computer a year before. It had tools to do all the steps above. He had used MakeProfiles before, but wasn’t sure which columns he had chosen in his user or system wide configuration files for which parameters, see Configuration files. So to start his simulation, Sufi runs MakeProfiles with the -P option to check which columns in a catalog MakeProfiles currently recognizes, along with the output image parameters. In particular, Sufi is interested in the recognized columns (shown below).

$ astmkprof -P

[[[ ... Truncated lines ... ]]]

# Output:
 type         float32     # Type of output: e.g., int16, float32, etc...
 mergedsize   1000,1000   # Number of pixels along first FITS axis.
 oversample   5           # Scale of oversampling (>0 and odd).

[[[ ... Truncated lines ... ]]]

# Columns, by info (see `--searchin'), or number (starting from 1):
 ccol         2           # Center along first FITS axis (horizontal).
 ccol         3           # Center along second FITS axis (vertical).
 fcol         4           # sersic (1), moffat (2), gaussian (3),
                          # point (4), flat (5), circumference (6).
 rcol         5           # Effective radius or FWHM in pixels.
 ncol         6           # Sersic index or Moffat beta.
 pcol         7           # Position angle.
 qcol         8           # Axis ratio.
 mcol         9           # Magnitude.
 tcol         10          # Truncation in units of radius or pixels.

[[[ ... Truncated lines ... ]]]

In Gnuastro, column counting starts from 1, so the columns are ordered such that the first column (number 1) can be an ID he specifies for each object (and MakeProfiles ignores); each subsequent column is used for another property of the profile. It is also possible to use column names for the values of these options and change these defaults, but Sufi preferred to stick to the defaults. Fortunately MakeProfiles has the capability to also make the PSF which is to be used on the mock image, and using the --prepforconv option, he can also make the mock image larger by the correct amount, with all the sources shifted by the correct amount.

For his initial check he decides to simulate the nebula in the Andromeda constellation. The night he was observing, the PSF had a FWHM of about 5 pixels, so as the first row (profile), he defines the PSF parameters and sets the radius column (rcol above, fifth column) to 5.000, and he chooses a Moffat function for its functional form. Remembering how diffuse the nebula in the Andromeda constellation was, he decides to simulate it with a mock Sérsic index 1.0 profile. He wants the output to be 499 pixels by 499 pixels, so he can put the center of the mock profile in the central pixel of the image (note that an even number doesn’t have a central element). Looking at his drawings of it, he decides a reasonable effective radius for it would be 40 pixels on this image pixel scale. He sets the axis ratio and position angle to approximately correct values too, and finally he sets the total magnitude of the profile to 3.44, which he had accurately measured. Sufi also decides to truncate both the mock profile and PSF at 5 times the respective radius parameters. In the end he decides to put four stars on the four corners of the image at very low magnitudes as a visual scale. While he was preparing the catalog, one of his students approached him and was also following the steps.

Using all the information above, he creates the catalog of mock profiles he wants in a file named cat.txt (short for catalog) using his favorite text editor and stores it in a directory named simulationtest in his home directory. [The cat command (short for “concatenation”) prints the contents of a file. So please copy-paste the lines after “cat cat.txt” into cat.txt when the editor opens in the steps above it; note that there are 7 lines, the first one starting with #. Also be careful when copying from the PDF format; the Info, web, or text formats shouldn’t have any problem]:

$ mkdir ~/simulationtest
$ cd ~/simulationtest
$ pwd
/home/rahman/simulationtest
$ emacs cat.txt
$ ls
cat.txt
$ cat cat.txt
# Column 4: PROFILE_NAME [,str6] Radial profile's functional name
 1  0.0000   0.0000  moffat  5.000  4.765  0.0000  1.000  30.000  5.000
 2  250.00   250.00  sersic  40.00  1.000  -25.00  0.400  3.4400  5.000
 3  50.000   50.000  point   0.000  0.000  0.0000  0.000  6.0000  0.000
 4  450.00   50.000  point   0.000  0.000  0.0000  0.000  6.5000  0.000
 5  50.000   450.00  point   0.000  0.000  0.0000  0.000  7.0000  0.000
 6  450.00   450.00  point   0.000  0.000  0.0000  0.000  7.5000  0.000

The zero-point magnitude for his observation was 18. Now he has all the necessary parameters and runs MakeProfiles with the following command:

$ astmkprof --prepforconv --mergedsize=499,499 --zeropoint=18.0 cat.txt
MakeProfiles started on Sat Oct  6 16:26:56 953
- 6 profiles read from cat.txt
- Random number generator (RNG) type: mt19937
---- row 2 complete, 5 left to go
---- row 3 complete, 4 left to go
---- row 4 complete, 3 left to go
---- row 5 complete, 2 left to go
---- ./0_cat.fits created.
---- row 0 complete, 1 left to go
---- row 1 complete, 0 left to go
- ./cat.fits created.                                0.041651 seconds
MakeProfiles finished in 0.267234 seconds

$ ls
0_cat.fits  cat.fits  cat.txt

The file 0_cat.fits is the PSF Sufi had asked for, and cat.fits is the image containing the main objects in the catalog. The size of cat.fits was surprising for the student: instead of 499 by 499 (as we had requested), it was 2615 by 2615 pixels (from the command below):

$ astfits cat.fits -h1 | grep NAXIS


So Sufi explained why oversampling is important in modeling, especially for parts of the image where the flux change is significant over a pixel. Recall that when you oversample the model (for example by 5 times), for every desired pixel, you get 25 pixels ($$5\times5$$). Sufi then explained that after convolving (next step below) we will down-sample the image to get our originally desired size/resolution.
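The 2615 pixel width can be checked with the shell’s integer arithmetic. The sketch below assumes the padding of 12 pixels on each edge that --prepforconv adds for this PSF (this value is derived later in this tutorial); the variable names are chosen here for illustration.

```shell
requested=499   # Width given to --mergedsize.
edge=12         # Padding on each edge from --prepforconv (for this PSF).
oversample=5    # Value of 'oversample' printed by astmkprof -P.

# Pixels along one axis of the padded image, then after over-sampling.
padded=$(( requested + 2 * edge ))
oversampled=$(( padded * oversample ))

# Over-sampled pixels that cover one pixel of the final image.
per_pixel=$(( oversample * oversample ))

echo "$padded $oversampled $per_pixel"
```

This prints 523 2615 25: the padded width, the over-sampled width of cat.fits, and the number of over-sampled pixels behind each final pixel.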

Sufi then opened cat.fits [you can use any FITS viewer, for example, ds9]. After seeing the image, the student complained that only the large elliptical model for the Andromeda nebula can be seen in the center. He couldn’t see the four stars that we had also requested in the catalog. So Sufi had to explain that the stars are there in the image, but the reason that they aren’t visible when looking at the whole image at once, is that they only cover a single pixel! To prove it, he centered the image around the coordinates 2308 and 2308, where one of the stars is located in the over-sampled image [you can do this in ds9 by selecting “Pan” in the “Edit” menu, then clicking around that position]. Sufi then zoomed in to that region and soon, the star’s non-zero pixel could be clearly seen.
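The coordinate 2308 follows from the catalog position of the star at 450. The sketch below assumes that each original pixel maps to a block of 5 over-sampled pixels along each axis and that the profiles were shifted by the 12 pixel padding; the variable names are illustrative.

```shell
star=450        # Catalog position of the star along one axis.
edge=12         # Shift from the padding on each edge.
oversample=5    # Over-sampling factor.

# Position of the star in the padded image (before over-sampling).
shifted=$(( star + edge ))

# First and central pixel of the over-sampled block covering that
# pixel, counting from 1 as in the FITS standard.
first=$(( (shifted - 1) * oversample + 1 ))
center=$(( first + oversample / 2 ))

echo "$shifted $first $center"
```

This prints 462 2306 2308, so the star’s single non-zero pixel sits at the center of its 5 by 5 block.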

Sufi explained that the stars will take the shape of the PSF (cover an area of more than one pixel) after convolution. If we didn’t have an atmosphere and we didn’t need an aperture, then stars would only cover a single pixel with normal CCD resolutions. So Sufi convolved the image with this command:

$ astconvolve --kernel=0_cat.fits cat.fits
Convolve started on Mon Apr  6 16:35:32 953
- Using 8 CPU threads.
- Input: cat.fits (hdu: 1)
- Kernel: 0_cat.fits (hdu: 1)
- Input and Kernel images padded.                    0.075541 seconds
- Images converted to frequency domain.              6.728407 seconds
- Multiplied in the frequency domain.                0.040659 seconds
- Converted back to the spatial domain.              3.465344 seconds
- Padded parts removed.                              0.016767 seconds
- Output: cat_convolved.fits
Convolve finished in:  10.422161 seconds

$ ls
0_cat.fits  cat_convolved.fits  cat.fits  cat.txt


When convolution finished, Sufi opened cat_convolved.fits and the four stars could be easily seen now. It was interesting for the student that all the flux in that single pixel is now distributed over so many pixels (the sum of all the pixels in each convolved star is actually equal to the value of the single pixel before convolution). Sufi explained how a PSF with a larger FWHM would make the points even wider than this (distributing their flux in a larger area). With the convolved image ready, they were prepared to re-sample it to the original pixel scale Sufi had planned [from the $ astmkprof -P command above, recall that MakeProfiles had over-sampled the image by 5 times]. Sufi explained the basic concepts of warping the image to his student and ran Warp with the following command:

$ astwarp --scale=1/5 --centeroncorner cat_convolved.fits
Warp started on Mon Apr  6 16:51:59 953
Input: cat_convolved.fits (hdu: 1)
matrix:
0.2000   0.0000   0.4000
0.0000   0.2000   0.4000
0.0000   0.0000   1.0000

$ ls
0_cat.fits          cat_convolved_scaled.fits     cat.txt
cat_convolved.fits  cat.fits

$ astfits -p cat_convolved_scaled.fits | grep NAXIS
NAXIS   =                    2 / number of data axes
NAXIS1  =                  523 / length of data axis 1
NAXIS2  =                  523 / length of data axis 2


cat_convolved_scaled.fits now has the correct pixel scale. However, the image is still larger than what we had wanted: it is 523 ($$499+12+12$$) by 523 pixels. The student is slightly confused, so Sufi also re-samples the PSF with the same scale by running:

$ astwarp --scale=1/5 --centeroncorner 0_cat.fits
$ astfits -p 0_cat_scaled.fits | grep NAXIS
NAXIS   =                    2 / number of data axes
NAXIS1  =                   25 / length of data axis 1
NAXIS2  =                   25 / length of data axis 2


Sufi notes that $$25=(2\times12)+1$$ and goes on to explain how frequency space convolution will dim the edges and that is why he added the --prepforconv option to MakeProfiles, see If convolving afterwards. Now that convolution is done, Sufi can remove those extra pixels using Crop with the command below. Crop’s --section option accepts coordinates inclusively and counting from 1 (according to the FITS standard), so the crop region’s first pixel has to be 13, not 12.

$ astcrop cat_convolved_scaled.fits --section=13:*-12,13:*-12    \
          --mode=img --zeroisnotblank
Crop started on Sat Oct  6 17:03:24 953
- Read metadata of 1 image.                          0.001304 seconds
---- ...nvolved_scaled_cropped.fits created: 1 input.
Crop finished in:  0.027204 seconds

$ ls
0_cat.fits          cat_convolved_scaled_cropped.fits  cat.fits
cat_convolved.fits  cat_convolved_scaled.fits          cat.txt


Finally, cat_convolved_scaled_cropped.fits is $$499\times499$$ pixels and the mock Andromeda galaxy is centered on the central pixel (open the image in a FITS viewer and confirm this by zooming into the center, note that an even-width image wouldn’t have a central pixel). These are the same dimensions that Sufi had desired in the beginning. All this trouble was certainly worth it because now there is no dimming on the edges of the image and the profile centers are more accurately sampled.
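The numbers used in the --section value can be checked with a few lines of shell arithmetic. This is only a sketch of the inclusive, 1-based counting described above; the variable names are illustrative and the last variable stands for the pixel that *-12 refers to on a 523 pixel axis.

```shell
width=523   # NAXIS1 of cat_convolved_scaled.fits.
edge=12     # Pixels to remove from each edge.

# Width of the re-sampled PSF kernel: 12 pixels on each side of a
# central pixel.
psf_width=$(( 2 * edge + 1 ))

# --section counts from 1 and includes both ends of the range.
first=$(( edge + 1 ))
last=$(( width - edge ))
cropped=$(( last - first + 1 ))

echo "PSF width: $psf_width"
echo "--section=$first:*-$edge,$first:*-$edge keeps $cropped pixels per axis"
```

Starting the section at 12 instead of 13 would keep 500 pixels per axis, one more than the 499 that were requested.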

The final step to simulate a real observation would be to add noise to the image. Sufi set the zero-point magnitude to the same value that he had used when making the mock profiles. Looking again at his observation log, he saw that the background flux near the nebula had a magnitude of 7 that night. So using these values he ran MakeNoise:

$ astmknoise --zeropoint=18 --background=7 --output=out.fits    \
             cat_convolved_scaled_cropped.fits
MakeNoise started on Mon Apr  6 17:05:06 953
- Generator type: ranlxs1
- Generator seed: 1428318100
MakeNoise finished in:  0.033491 (seconds)

$ ls
0_cat.fits         cat_convolved_scaled_cropped.fits cat.fits  out.fits
cat_convolved.fits cat_convolved_scaled.fits         cat.txt
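
Both the --zeropoint and --background values given to MakeNoise above are magnitudes. Assuming the usual definition of a magnitude, $$m=-2.5\log_{10}(F)+z$$ (with $$z$$ the zero-point), the background magnitude of 7 can be converted to a flux in image units as sketched below. This only illustrates the conversion with the standard awk program; it is not a description of how MakeNoise is implemented.

```shell
zeropoint=18
background=7

# Invert m = -2.5*log10(F) + zeropoint to get the flux F.
flux=$(awk -v m="$background" -v z="$zeropoint" \
           'BEGIN { printf "%.2f", 10^(-0.4 * (m - z)) }')

echo "$flux"
```

A background that is 11 magnitudes brighter than the zero-point therefore corresponds to a flux of about 25119 in the units of the image.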


The out.fits file now contains the noised image of the mock catalog Sufi had asked for. Seeing how the --output option allows the user to specify the name of the output file, the student was confused and wanted to know why Sufi hadn’t used it before. Sufi then explained to him that for intermediate steps it is best to rely on the automatic output, see Automatic output. Doing so will give all the intermediate files the same basic name structure, so in the end you can simply remove them all with the shell’s capabilities. So Sufi decided to show this to the student by making a shell script from the commands he had used before.

The command-line shell has the capability to read all the separate input commands from a file. This is useful when you want to do the same thing multiple times, with only the names of the files or minor parameters changing between the different instances. Using the shell’s history (by pressing the up keyboard key) Sufi reviewed all the commands and then he retrieved the last 5 commands with the $ history 5 command. He selected all those lines he had input and put them in a text file named mymock.sh. Then he defined the edge and base shell variables for easier customization later. Finally, before every command, he added some comments (lines starting with #) for future readability.

edge=12
base=cat

# Stop running next commands if one fails.
set -e

# Remove any (possibly) existing output (from previous runs)
# before starting.
rm -f out.fits

# Run MakeProfiles to create an oversampled FITS image.
astmkprof --prepforconv --mergedsize=499,499 --zeropoint=18.0 \
          "$base".txt

# Convolve the created image with the kernel.
astconvolve --kernel=0_"$base".fits "$base".fits

# Scale the image back to the intended resolution.
astwarp --scale=1/5 --centeroncorner "$base"_convolved.fits

# Crop the edges out (dimmed during convolution). ‘--section’ accepts
# inclusive coordinates, so the start of the section must be
# one pixel larger than its end.
st_edge=$(( edge + 1 ))
astcrop "$base"_convolved_scaled.fits --zeroisnotblank \
        --mode=img --section=$st_edge:*-$edge,$st_edge:*-$edge

# Add noise to the image.
astmknoise --zeropoint=18 --background=7 --output=out.fits \
           "$base"_convolved_scaled_cropped.fits

# Remove all the temporary files.
rm 0*.fits "$base"*.fits

He used this chance to remind the student of the importance of comments in code or shell scripts: when writing the code, you have a good mental picture of what you are doing, so writing comments might seem superfluous and excessive. However, in one month when you want to re-use the script, you have lost that mental picture and remembering it can be time-consuming and frustrating. The importance of comments is further amplified when you want to share the script with a friend/colleague. So it is good to accompany any script/code with useful comments while you are writing it (while you still have a good mental picture of what you are doing and why).

Sufi then explained to the eager student that you define a variable by giving it a name, followed by an = sign and the value you want. Then you can reference that variable from anywhere in the script by calling its name with a $ prefix. So in the script whenever you see $base, the value we defined for it above is used. If you use advanced editors like GNU Emacs or even simpler ones like Gedit (part of the GNOME graphical user interface) the variables will become a different color which can really help in understanding the script. We have put all the $base variables in double quotation marks (") so the variable name and the following text do not get mixed; the shell is going to ignore the " after replacing the variable value.

To make the script executable, Sufi ran the following command:

$ chmod +x mymock.sh

Then finally, Sufi ran the script, simply by calling its file name:

$ ./mymock.sh


After the script finished, the only file remaining is the out.fits file that Sufi had wanted in the beginning. Sufi then explained to the student how he could run this script anywhere that he has a catalog if the script is in the same directory. The only thing the student had to modify in the script was the name of the catalog (the value of the base variable in the start of the script) and the value to the edge variable if he changed the PSF size. The student was also happy to hear that he won’t need to make it executable again when he makes changes later, it will remain executable unless he explicitly changes the executable flag with chmod.
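The role of the double quotation marks around $base in the script can be seen with a few assignments. This is only a small sketch: the names `quoted`, `unquoted` and `braced` are illustrative, and it assumes the shell is not running with an option that treats unset variables as errors.

```shell
base=cat

# With the quotes, the shell sees the variable 'base' followed by text.
quoted="$base"_convolved.fits

# Without them, the shell looks for a variable named 'base_convolved',
# which is not set here, so it expands to an empty string.
unquoted=$base_convolved.fits

# Curly braces are another common way to mark where the name ends.
braced=${base}_convolved.fits

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
echo "braced:   $braced"
```

Only the first and third forms give cat_convolved.fits; the unquoted form silently loses the catalog name, which is exactly the kind of mistake the quotes in mymock.sh guard against.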

The student was really excited, since now, through simple shell scripting, he could really speed up his work and run any command in any fashion he likes, allowing him to be much more creative in his work. Until now he was using the graphical user interface which doesn’t have such a facility and doing repetitive things on it was really frustrating and sometimes he would make mistakes. So he left to go and try scripting on his own computer.

Sufi could now get back to his own work and see if the simulated nebula which resembled the one in the Andromeda constellation could be detected or not. Although it was extremely faint23, fortunately it passed his detection tests and he wrote it in the draft manuscript that would later become “Book of fixed stars”. He still had to check the other nebula he saw from Yemen and several other such objects, but they could wait until tomorrow (thanks to the shell script, he only has to define a new catalog). It was nearly sunset and they had to begin preparing for the night’s measurements on the ecliptic.


### 2.2 General program usage tutorial

Measuring colors of astronomical objects in broad-band or narrow-band images is one of the most basic and common steps in astronomical analysis. Here, we will use Gnuastro’s programs to get a physical scale (area at certain redshifts) of the field we are studying, detect objects in a Hubble Space Telescope (HST) image, measure their colors and identify the ones with the strongest colors, do a visual inspection of these objects and inspect spatial position in the image. After this tutorial, you can also try the Detecting large extended targets tutorial which goes into a little more detail on detecting very low surface brightness signal.

During the tutorial, we will take many detours to explain, and practically demonstrate, the many capabilities of Gnuastro’s programs. In the end you will see that the things you learned during this tutorial are much more generic than this particular problem and can be used in solving a wide variety of problems involving the analysis of data (images or tables). So please don’t rush, and go through the steps patiently to optimally master Gnuastro.

In this tutorial, we’ll use the HST eXtreme Deep Field dataset. Like almost all astronomical surveys, this dataset is free for download and usable by the public. You will need the following tools in this tutorial: Gnuastro, SAO DS9 24, GNU Wget25, and AWK (most common implementation is GNU AWK26).

This tutorial was first prepared for the “Exploring the Ultra-Low Surface Brightness Universe” workshop (November 2017) at the ISSI in Bern, Switzerland. It was further extended in the “4th Indo-French Astronomy School” (July 2018) organized by LIO, CRAL CNRS UMR5574, UCBL, and IUCAA in Lyon, France. We are very grateful to the organizers of these workshops and the attendees for the very fruitful discussions and suggestions that made this tutorial possible.

 Write the example commands manually: Try to type the example commands on your terminal manually and use the history feature of your command-line (by pressing the “up” button to retrieve previous commands). Don’t simply copy and paste the commands shown here. This will help simulate future situations when you are processing your own datasets.


#### 2.2.1 Calling Gnuastro’s programs

A handy feature of Gnuastro is that all program names start with ast. This will allow your command-line processor to easily list and auto-complete Gnuastro’s programs for you. Try typing the following command (press TAB key when you see <TAB>) to see the list:

$ ast<TAB><TAB>

Any program that starts with ast (including all Gnuastro programs) will be shown. By choosing the subsequent characters of your desired program and pressing <TAB><TAB> again, the list will narrow down and the program name will auto-complete once your input characters are unambiguous. In short, you often don’t need to type the full name of the program you want to run.

#### 2.2.2 Accessing documentation

Gnuastro contains a large number of programs and it is natural to forget the details of each program’s options or inputs and outputs. Therefore, before starting the analysis steps of this tutorial, let’s review how you can access this book to refresh your memory any time you want, without having to take your hands off the keyboard.

When you install Gnuastro, this book is also installed on your system along with all the programs and libraries, so you don’t need an internet connection to access/read it. Also, by accessing this book as described below, you can be sure that it corresponds to your installed version of Gnuastro.

GNU Info27 is the program in charge of displaying the manual on the command-line (for more, see Info). To see this whole book on your command-line, please run the following command and press subsequent keys. Info has its own mini-environment, therefore we’ll show the keys that must be pressed in the mini-environment after a -> sign. You can also ignore anything after the # sign in the middle of the line; it is only there for your information.

$ info gnuastro                # Open the top of the manual.
-> <SPACE>                     # All the book chapters.
-> <SPACE>                     # Continue down: show sections.
-> <SPACE> ...                 # Keep pressing space to go down.


The thing that greatly simplifies navigation in Info is the links (regions with an underline). You can immediately go to the next link in the page with the <TAB> key and press <ENTER> on it to go into that part of the manual. Try the commands above again, but this time also use <TAB> to go to the links and press <ENTER> on them to go to the respective section of the book. Then follow a few more links and go deeper into the book. To return to the previous page, press l (small L). If you are searching for a specific phrase in the whole book (for example an option name), press s and type your search phrase and end it with an <ENTER>.

You don’t need to start from the top of the manual every time. For example, to get to Invoking NoiseChisel, run the following command. In general, all programs have such an “Invoking ProgramName” section in this book. These sections are specifically for the description of inputs, outputs and configuration options of each program. You can access them directly for each program by giving its executable name to Info.

$ info astnoisechisel

The other sections don’t have such shortcuts. To directly access them from the command-line, you need to tell Info to look into Gnuastro’s manual, then look for the specific section (an unambiguous title is necessary). For example, if you only want to review/remember NoiseChisel’s Detection options, just run the following command. Note how case is irrelevant for Info when calling a title in this manner.

$ info gnuastro "Detection options"


In general, Info is a powerful and convenient way to access this whole book with detailed information about the programs you are running. If you are not already familiar with it, please run the following command and just read along and do what it says to learn it. Don’t stop until you feel sufficiently fluent in it. Please invest the half an hour’s time necessary to start using Info comfortably. It will greatly improve your productivity and you will start reaping the rewards of this investment very soon.

$ info info

As a good scientist you need to feel comfortable to play with the features/options and avoid (be critical to) using default values as much as possible. On the other hand, our human memory is limited, so it is important to be able to easily access any part of this book fast and remember the option names, what they do and their acceptable values.

If you just want the option names and a short description, calling the program with the --help option might also be a good solution like the first example below. If you know a few characters of the option name, you can feed the output to grep like the second or third example commands.

$ astnoisechisel --help
$ astnoisechisel --help | grep quant
$ astnoisechisel --help | grep check


#### 2.2.3 Setup and data download

The first step in the analysis of the tutorial is to download the necessary input datasets. First, to keep things clean, let’s create a gnuastro-tutorial directory and continue all future steps in it:

$ mkdir gnuastro-tutorial
$ cd gnuastro-tutorial


We will be using the near infra-red Wide Field Camera dataset. If you already have them in another directory (for example XDFDIR, with the same FITS file names), you can set the download directory to be a symbolic link to XDFDIR with a command like this:

$ ln -s XDFDIR download

Otherwise, when the following images aren’t already present on your system, you can make a download directory and download them there.

$ mkdir download
$ cd download
$ xdfurl=http://archive.stsci.edu/pub/hlsp/xdf
$ wget $xdfurl/hlsp_xdf_hst_wfc3ir-60mas_hudf_f105w_v1_sci.fits
$ wget $xdfurl/hlsp_xdf_hst_wfc3ir-60mas_hudf_f125w_v1_sci.fits
$ wget $xdfurl/hlsp_xdf_hst_wfc3ir-60mas_hudf_f160w_v1_sci.fits
$ cd ..

In this tutorial, we’ll just use these three filters. Later, you may need to download more filters. To do that, you can use the shell’s for loop to download them all in series (one after the other28) with one command like the one below for the WFC3 filters. Put this command instead of the three wget commands above. Recall that all the extra spaces, back-slashes (\), and new lines can be ignored if you are typing the command on a single line in the terminal.

$ for f in f105w f125w f140w f160w; do                              \
    wget $xdfurl/hlsp_xdf_hst_wfc3ir-60mas_hudf_"$f"_v1_sci.fits;   \
  done
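Before starting a long download, it can help to see what such a loop will actually request. The following is a dry-run sketch of the same loop that prints each URL instead of passing it to wget; the `urls` variable is only an illustrative name.

```shell
xdfurl=http://archive.stsci.edu/pub/hlsp/xdf

# Dry run: print each URL instead of passing it to wget.
urls=$(for f in f105w f125w f140w f160w; do
         echo "$xdfurl/hlsp_xdf_hst_wfc3ir-60mas_hudf_${f}_v1_sci.fits"
       done)

echo "$urls"
```

Once the printed list looks right, replacing echo with wget gives the real download loop.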



#### 2.2.4 Dataset inspection and cropping

First, let’s visually inspect the datasets we downloaded in Setup and data download. Let’s take F160W image as an example. Do the steps below with the other image(s) too (and later with any dataset that you want to work on). It is very important to get a good visual feeling of the dataset you intend to use. Also, note how SAO DS9 (used here for visual inspection of FITS images) doesn’t follow the GNU style of options where “long” and “short” options are preceded by -- and - respectively (for example --width and -w, see Options).

Run the command below to see the F160W image with DS9. DS9’s -zscale scaling is good to visually highlight the low surface brightness regions, and as the name suggests, -zoom to fit will fit the whole dataset in the window. If the window is too small, expand it with your mouse, then press the “zoom” button on the top row of buttons above the image. Afterwards, in the bottom row of buttons, press “zoom fit”. You can also zoom in and out by scrolling your mouse or the respective operation on your touch-pad when your cursor/pointer is over the image.

$ ds9 download/hlsp_xdf_hst_wfc3ir-60mas_hudf_f160w_v1_sci.fits     \
      -zscale -zoom to fit

As you hover your mouse over the image, notice how the “Value” and positional fields on the top of the ds9 window get updated. The first thing you might notice is that when you hover the mouse over the regions with no data, they have a value of zero. The next thing might be that the dataset actually has two “depth”s (see Quantifying measurement limits). Recall that this is a combined/reduced image of many exposures, and the parts that have more exposures are deeper. In particular, the exposure time of the deep inner region is more than 4 times that of the outer (shallower) parts.

To simplify the analysis in this tutorial, we’ll only be working on the deep field, so let’s crop it out of the full dataset. Fortunately the XDF survey webpage (above) contains the vertices of the deep flat WFC3-IR field. With Gnuastro’s Crop program29, you can use those vertices to cut out this deep region from the larger image. But before that, to keep things organized, let’s make a directory called flat-ir and keep the flat (single-depth) regions in that directory (with a ‘xdf-’ prefix for a shorter and easier filename).

$ mkdir flat-ir
$ astcrop --mode=wcs -h0 --output=flat-ir/xdf-f105w.fits              \
          --polygon="53.187414,-27.779152 : 53.159507,-27.759633 :    \
                     53.134517,-27.787144 : 53.161906,-27.807208"     \
          download/hlsp_xdf_hst_wfc3ir-60mas_hudf_f105w_v1_sci.fits
$ astcrop --mode=wcs -h0 --output=flat-ir/xdf-f125w.fits              \
          --polygon="53.187414,-27.779152 : 53.159507,-27.759633 :    \
                     53.134517,-27.787144 : 53.161906,-27.807208"     \
          download/hlsp_xdf_hst_wfc3ir-60mas_hudf_f125w_v1_sci.fits
$ astcrop --mode=wcs -h0 --output=flat-ir/xdf-f160w.fits              \
          --polygon="53.187414,-27.779152 : 53.159507,-27.759633 :    \
                     53.134517,-27.787144 : 53.161906,-27.807208"     \
          download/hlsp_xdf_hst_wfc3ir-60mas_hudf_f160w_v1_sci.fits

The only thing varying in the three calls to Gnuastro’s Crop program is the filter name! Note how everything else is the same. In such cases, you should generally avoid repeating a command manually: it is prone to many bugs, and as you see, it is very hard to read (didn’t you suddenly write a 7 as an 8?). To simplify the command, and later allow work on more filters, we can use the shell’s for loop as shown below. Notice how the places where the filter names (f105w, f125w and f160w) are used above have been replaced with $f (the shell variable that for will update in every loop) below.

$ rm flat-ir/*.fits
$ for f in f105w f125w f160w; do                                          \
    astcrop --mode=wcs -h0 --output=flat-ir/xdf-$f.fits                   \
            --polygon="53.187414,-27.779152 : 53.159507,-27.759633 :      \
                       53.134517,-27.787144 : 53.161906,-27.807208"       \
            download/hlsp_xdf_hst_wfc3ir-60mas_hudf_"$f"_v1_sci.fits;     \
  done


Please open these images and inspect them with the same ds9 command you used above. You will see how each image is nicely flat now and doesn’t have varying depths. Another important result of this crop is that regions with no data now have a NaN (Not-a-Number, or a blank value) value. In the downloaded files, such regions had a value of zero. However, zero is a number, and is thus meaningful, especially when you later want to run NoiseChisel30. Generally, when you want to ignore some pixels in a dataset, and avoid higher-level ambiguities or complications, it is always best to give them blank values (not zero, or some other absurdly large or small number). Gnuastro has the Arithmetic program for such cases, and we’ll introduce it later in this tutorial.
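To see why a zero is not a harmless stand-in for “no data”, consider a toy example with AWK. The four pixel values below are hypothetical, and the string nan is used here only as a plain-text marker for a blank pixel; this does not involve any Gnuastro program.

```shell
# Four hypothetical pixel values; the last two had no data.
with_zeros=$(printf '2\n4\n0\n0\n' | awk '{ s += $1; n++ } END { print s/n }')

# Same pixels, but the no-data ones are marked blank and skipped.
with_blanks=$(printf '2\n4\nnan\nnan\n' \
              | awk '$1 != "nan" { s += $1; n++ } END { print s/n }')

echo "mean with zeros:  $with_zeros"
echo "mean with blanks: $with_blanks"
```

The zeros pull the mean down to 1.5, while skipping the blank pixels gives 3, the mean of the pixels that were actually observed.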


#### 2.2.5 Angular coverage on the sky

This is the deepest image we currently have of the sky. The first thing that comes to mind may be this: “How large is this field on the sky?”. The FITS world coordinate system (WCS) metadata standard contains the key to answering this question. Run the following command to see all the FITS keywords (metadata) for one of the images (mostly the same with the other filters because they are scaled to the same region of sky):

$ astfits flat-ir/xdf-f160w.fits -h1


Look into the keywords grouped under the ‘World Coordinate System (WCS)’ title. These keywords define how the image relates to the outside world. In particular, the CDELT* keywords (or CDELT1 and CDELT2 in this 2D image) contain the “Coordinate DELTa” (or change in coordinate units) with a change in one pixel. But what are the units of each “world” coordinate? The CUNIT* keywords (for “Coordinate UNIT”) have the answer. In this case, both CUNIT1 and CUNIT2 have a value of deg, so both “world” coordinates are in units of degrees. We can thus conclude that the value of CDELT* is in units of degrees-per-pixel31.
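As a quick sanity check on those units, a value in degrees-per-pixel can be converted to arc-seconds-per-pixel by multiplying by 3600. The header line in this sketch is hypothetical: it assumes the 60 milli-arcsecond pixel scale suggested by the ‘60mas’ in the file names, and the keyword value and comment in your own header may be written differently.

```shell
# Hypothetical header line, assuming the 60 milli-arcsecond pixel
# scale suggested by '60mas' in the file names.
line='CDELT1  =  1.66666666666667E-05 / [deg] increment at reference point'

# Third token is the value (in degrees/pixel); convert to arcsec/pixel.
deg=$(echo "$line" | awk '{print $3}')
arcsec=$(echo "$deg" | awk '{printf "%.3f\n", $1 * 3600}')

echo "$deg deg/pixel = $arcsec arcsec/pixel"
```

With this assumed value, the result is 0.060 arc-seconds per pixel, in agreement with the file names.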

With the commands below, we’ll use CDELT (along with the image size) to find the answer of our initial question: “how much of the sky does this image cover?”. The lines starting with ## are just comments for you to read and understand each command. Don’t type them on the terminal. The commands are intentionally repetitive in some places to better understand each step and also to demonstrate the beauty of command-line features like history, variables, pipes and loops (which you will commonly use as you master the command-line).

 Use shell history: Don’t forget to make effective use of your shell’s history: you don’t have to re-type previous commands to add something to them. This is especially convenient when you just want to make a small change to your previous command. Press the “up” key on your keyboard (possibly multiple times) to see your previous command(s) and modify them accordingly.

## See the general statistics of non-blank pixel values.
$ aststatistics flat-ir/xdf-f160w.fits

## We only want the number of non-blank pixels.
$ aststatistics flat-ir/xdf-f160w.fits --number

## Keep the result of the command above in the shell variable 'n'.
$ n=$(aststatistics flat-ir/xdf-f160w.fits --number)

## See what is stored in the shell variable 'n'.
$ echo $n

## Show all the FITS keywords of this image.
$ astfits flat-ir/xdf-f160w.fits -h1

## The resolution (in degrees/pixel) is in the 'CDELT' keywords.
## Only show lines that contain these characters, by feeding
## the output of the previous command to the 'grep' program.
$ astfits flat-ir/xdf-f160w.fits -h1 | grep CDELT

## Since the resolution of both dimensions is (approximately) equal,
## we'll only use one of them (CDELT1).
$ astfits flat-ir/xdf-f160w.fits -h1 | grep CDELT1

## To extract the value (third token in the line above), we'll
## feed the output to AWK. Note that the first two tokens are
## 'CDELT1' and '='.
$ astfits flat-ir/xdf-f160w.fits -h1 | grep CDELT1 | awk '{print $3}'

## Save it as the shell variable 'r'.
$ r=$(astfits flat-ir/xdf-f160w.fits -h1 | grep CDELT1   \
              | awk '{print $3}')

## Print the values of 'n' and 'r'.
$ echo $n $r

## Use the number of pixels (first number passed to AWK) and
## length of each pixel's edge (second number passed to AWK)
## to estimate the area of the field in arc-minutes squared.
$ echo $n $r | awk '{print $1 * ($2^2) * 3600}'


The output of the last command (area of this field) is 4.03817 (or approximately 4.04) arc-minutes squared. Just for comparison, this is roughly 175 times smaller than the Moon’s average angular area (with a diameter of 30 arc-minutes or half a degree).
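The comparison with the Moon can be reproduced with the same AWK approach. This sketch treats the Moon as a circular disk with a 15 arc-minute radius (half of the 30 arc-minute diameter quoted above); `field`, `radius` and `ratio` are illustrative names.

```shell
field=4.03817     # Area of the field in arcmin^2 (from the text).
radius=15         # Moon radius in arcmin (half of 30 arcmin).

# Area of the disk (pi * radius^2) divided by the area of the field.
ratio=$(echo "$field $radius" \
        | awk '{ pi = atan2(0, -1); printf "%.0f\n", pi * $2^2 / $1 }')

echo "The Moon's disk is about $ratio times larger than the field."
```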

 AWK for table/value processing: As you saw above AWK is a powerful and simple tool for text processing. You will see it often in shell scripts. GNU AWK (the most common implementation) comes with a free and wonderful book in the same format as this book which will allow you to master it nicely. Just like this manual, you can also access GNU AWK’s manual on the command-line whenever necessary without taking your hands off the keyboard. Just run info awk.

#### 2.2.6 Cosmological coverage

Having found the angular coverage of the dataset in Angular coverage on the sky, we can now use Gnuastro to answer a more physically motivated question: “How large is this area at different redshifts?”. To get a feeling of the tangential area that this field covers at redshift 2, you can use Gnuastro’s CosmicCalculator program (see CosmicCalculator). In particular, you need the tangential distance covered by 1 arc-second as raw output. Combined with the field’s area that was measured before, we can calculate the tangential area in Mega Parsecs squared ($$Mpc^2$$).

## Print general cosmological properties at redshift 2 (for example).
$ astcosmiccal -z2

## When given a "Specific calculation" option, CosmicCalculator
## will just print that particular calculation. To see all such
## calculations, add a '--help' token to the previous command
## (under the same title). Note that with '--help', no processing
## is done, so you can always simply append it to remember
## something without modifying the command you want to run.
$ astcosmiccal -z2 --help

## Only print the "Tangential dist. covered by 1arcsec at z (kpc)",
## in units of kpc/arc-seconds.
$ astcosmiccal -z2 --arcsectandist

## But it's easier to use the short version of this option (which
## can be appended to other short options).
$ astcosmiccal -sz2

## Convert this distance to kpc^2/arcmin^2 and save in 'k'.
$ k=$(astcosmiccal -sz2 | awk '{print ($1*60)^2}')

## Re-calculate the area of the dataset in arcmin^2.
$ n=$(aststatistics flat-ir/xdf-f160w.fits --number)
$ r=$(astfits flat-ir/xdf-f160w.fits -h1 | grep CDELT1   \
              | awk '{print $3}')
$ a=$(echo $n $r | awk '{print $1 * ($2^2) * 3600}')

## Multiply 'k' and 'a' and divide by 10^6 for value in Mpc^2.
$ echo $k $a | awk '{print $1 * $2 / 1e6}'

At redshift 2, this field therefore covers approximately 1.07 $$Mpc^2$$. If you would like to see how this tangential area changes with redshift, you can use a shell loop like below.

$ for z in 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0; do        \
    k=$(astcosmiccal -sz$z);                                  \
    echo $z $k $a | awk '{print $1, ($2*60)^2 * $3 / 1e6}';   \
  done
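The unit conversion inside the AWK call can be followed with fixed numbers, without running CosmicCalculator. In this sketch the value 8.58 kpc per arc-second is an assumption, back-derived from the 1.07 $$Mpc^2$$ result quoted for redshift 2; it is not an output copied from the program.

```shell
k_arcsec=8.58     # Assumed kpc per arcsec at z=2 (back-derived, see text).
a=4.03817         # Field area in arcmin^2 (from the previous section).

# kpc/arcsec -> kpc/arcmin (x60), squared for area, then kpc^2 -> Mpc^2.
mpc2=$(echo "$k_arcsec $a" | awk '{printf "%.2f\n", ($1*60)^2 * $2 / 1e6}')

echo "Area at z=2: $mpc2 Mpc^2"
```

The factor of 60 converts arc-seconds to arc-minutes before squaring, and the division by 1e6 converts square kilo-parsecs to square Mega-parsecs.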


Fortunately, the shell has a useful tool/program to print a sequence of numbers that is nicely called seq. You can use it instead of typing all the different redshifts in this example. For example the loop below will calculate and print the tangential coverage of this field across a larger range of redshifts (0.1 to 5) and with finer increments of 0.1.

$ for z in $(seq 0.1 0.1 5); do                                  \
    k=$(astcosmiccal -z$z --arcsectandist);                      \
    echo $z $k $a | awk '{print $1, ($2*60)^2 * $3 / 1e6}';      \
  done
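If you are unsure how your own seq handles fractional increments (seq is not part of POSIX, so implementations may format their output differently), one possible alternative is to count in integers and let AWK do the scaling. This is only a sketch, and `redshifts` is an illustrative variable name.

```shell
# Integer steps 1..50, scaled to redshifts 0.1..5.0 by AWK.
redshifts=$(seq 1 50 | awk '{printf "%.1f\n", $1 / 10}')

# Show the first few values and the last one.
echo "$redshifts" | head -n 3
echo "..."
echo "$redshifts" | tail -n 1
```

The resulting list can be used in place of the seq call in the for loop, since each value is printed with exactly one decimal digit.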


#### 2.2.7 Building custom programs with the library

In Cosmological coverage, we repeated a certain calculation/output of a program multiple times using the shell’s for loop. This simple way of repeating a calculation is great when it is only necessary once. However, if you commonly need this calculation and possibly for a larger number of redshifts at higher precision, the command above can be slow (try it out to see).

The slowness of the repeated calls to a generic program (like CosmicCalculator) is because it can have a lot of overhead on each call. To be generic and easy to operate, it has to parse the command-line and all configuration files (see Option management and configuration files) which contain human-readable characters and need a lot of pre-processing to be ready for processing by the computer. Afterwards, CosmicCalculator has to check the sanity of its inputs and check which of its many options you have asked for. All of this pre-processing takes as much time as the high-level calculation you are requesting, and it has to re-do all of these for every redshift in your loop.

To greatly speed up the processing, you can directly access the core work-horse of CosmicCalculator without all that overhead by designing your custom program for this job. Using Gnuastro’s library, you can write your own tiny program particularly designed for this exact calculation (and nothing else!). To do that, copy and paste the following C program in a file called myprogram.c.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <gnuastro/cosmology.h>

int
main(void)
{
  double area=4.03817;          /* Area of field (arcmin^2). */
  double z, adist, tandist;     /* Temporary variables.      */

  /* Constants from Planck 2018 (arXiv:1807.06209, Table 2) */
  double H0=67.66, olambda=0.6889, omatter=0.3111, oradiation=0;

  /* Do the same thing for all redshifts (z) between 0.1 and 5. */
  for(z=0.1; z<5; z+=0.1)
    {
      /* Calculate the angular diameter distance. */
      adist=gal_cosmology_angular_distance(z, H0, olambda,
                                           omatter, oradiation);

      /* Calculate the tangential distance of one arcsecond. */
      tandist = adist * 1000 * M_PI / 3600 / 180;

      /* Print the redshift and area. */
      printf("%-5.2f %g\n", z, pow(tandist * 60,2) * area / 1e6);
    }

  /* Tell the system that everything finished successfully. */
  return EXIT_SUCCESS;
}


Then run the following command to compile your program and run it.

$ astbuildprog myprogram.c

In the command above, you used Gnuastro’s BuildProgram program. Its job is to greatly simplify the compilation, linking and running of simple C programs that use Gnuastro’s library (like this one). BuildProgram is designed to manage Gnuastro’s dependencies, compile and link your custom program and then run it.

Did you notice how your custom program was much faster than the repeated calls to CosmicCalculator in the previous section? You might have noticed that a new file called myprogram is also created in the directory. This is the compiled program that was created and run by the command above (it’s in binary machine code format, not human-readable any more). You can run it again to get the same results with a command like this:

$ ./myprogram


The efficiency of your custom myprogram compared to repeated calls to CosmicCalculator is because in the latter, the requested processing is comparable to the necessary overheads. For other programs that take large input datasets and do complicated processing on them, the overhead is usually negligible compared to the processing. In such cases, the libraries are only useful if you want a different/new processing compared to the functionalities in Gnuastro’s existing programs.

Gnuastro has a large library which is used extensively by all the programs. In other words, the library is like the skeleton of Gnuastro. For the full list of available functions classified by context, please see Gnuastro library. Gnuastro’s library and BuildProgram are created to make it easy for you to use these powerful features as you like. This gives you a high level of creativity, while also providing efficiency and robustness. Several other complete working examples (involving images and tables) of Gnuastro’s libraries can be seen in Library demo programs.

But for this tutorial, let’s stop discussing the libraries at this point and get back to Gnuastro’s already built programs which don’t need any programming. But before continuing, let’s clean up the files we don’t need any more:

$ rm myprogram*

#### 2.2.8 Option management and configuration files

None of Gnuastro’s programs keep a default value internally within their code. However, when you ran CosmicCalculator only with the -z2 option (not specifying the cosmological parameters) in Cosmological coverage, it completed its processing and printed results. Where did the necessary cosmological parameters (like the matter density, etc) that are necessary for its calculations come from? Fast reply: the values come from a configuration file (see Configuration file precedence).

CosmicCalculator is a small program with a limited set of parameters/options. Therefore, let’s use it to discuss configuration files in Gnuastro (for more, you can always see Configuration files). Configuration files are an important part of all Gnuastro’s programs, especially the ones with a large number of options, so it’s important to understand this part well.

Once you get comfortable with configuration files here, you can make good use of them in all Gnuastro programs (for example, NoiseChisel). For example, to do optimal detection on various datasets, you can have configuration files for different noise properties. The configuration of each program (besides its version) is vital for the reproducibility of your results, so it is important to manage them properly.

As we saw above, the full list of the options in all Gnuastro programs can be seen with the --help option. Try calling it with CosmicCalculator as shown below. Note how options are grouped by context to make it easier to find your desired option. However, in each group, options are ordered alphabetically.

$ astcosmiccal --help


The options that need a value have an = sign after their long version and FLT, INT or STR for floating point numbers, integer numbers, and strings (filenames for example) respectively. All options have a long format and some have a short format (a single character), for more see Options.

When you are using a program, it is often necessary to check the value the option has just before the program starts its processing. In other words, after it has parsed the command-line options and all configuration files. You can see the values of all options that need one with the --printparams or -P option. --printparams is common to all programs (see Common options). In the command below, try replacing -P with --printparams to see how both do the same operation.

$ astcosmiccal -P

Let’s say you want a different Hubble constant. Try running the following command (just adding --H0=70 after the command above) to see how the Hubble constant in the output of the command above has changed.

$ astcosmiccal -P --H0=70


Afterwards, delete the -P and add a -z2 to see the calculations with the new cosmology (or configuration).

$ astcosmiccal --H0=70 -z2

From the output of the --help option, note how the option for Hubble constant has both short (-H) and long (--H0) formats. One final note is that the equal (=) sign is not mandatory. In the short format, the value can stick to the actual option (the short option name is just one character after all, thus easily identifiable) and in the long format, a white-space character is also enough.

$ astcosmiccal -H70    -z2
$ astcosmiccal --H0 70 -z2 --arcsectandist

When an option doesn’t need a value, and has a short format (like --arcsectandist, whose short format is -s), you can easily append it before other short options. So the last command above can also be written as:

$ astcosmiccal --H0 70 -sz2


Let’s assume that in one project, you want to only use rounded cosmological parameters (H0 of 70km/s/Mpc and matter density of 0.3). You should therefore run CosmicCalculator like this:

$ astcosmiccal --H0=70 --olambda=0.7 --omatter=0.3 -z2

But having to type these extra options every time you run CosmicCalculator will be prone to errors (typos in particular), frustrating and slow. Therefore in Gnuastro, you can put all the options and their values in a “Configuration file” and tell the programs to read the option values from there.

Let’s create a configuration file. With your favorite text editor, make a file named my-cosmology.conf (or my-cosmology.txt, the suffix doesn’t matter, but a more descriptive suffix like .conf is recommended). Then put the following lines inside of it. One space between the option value and name is enough, the values are just under each other to help in readability. Also note that you can only use long option names in configuration files.

H0       70
olambda  0.7
omatter  0.3

You can now tell CosmicCalculator to read this file for option values immediately using the --config option as shown below. Do you see how the output of the following command corresponds to the option values in my-cosmology.conf, and is therefore identical to the previous command?

$ astcosmiccal --config=my-cosmology.conf -z2
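
The same configuration file can also be written without opening an editor. The lines below are a minimal sketch under a few assumptions: a POSIX-style shell, a throw-away directory from mktemp, and illustrative variable names (`workdir`, `conf`, `npairs`) that are not part of Gnuastro. CosmicCalculator is only called when it is found on the PATH.

```shell
# Work in a throw-away directory so the tutorial directory is untouched.
workdir=$(mktemp -d)
conf="$workdir/my-cosmology.conf"

# One "name value" pair per line: long option names, no leading dashes.
# printf re-uses the format for every pair of arguments.
printf '%-8s %s\n' H0 70 olambda 0.7 omatter 0.3 > "$conf"
cat "$conf"

# Count the lines that hold exactly two white-space separated fields.
npairs=$(awk 'NF==2 {n++} END {print n+0}' "$conf")
echo "option lines with a name and a value: $npairs"

# Run CosmicCalculator only if Gnuastro is installed on this system.
if command -v astcosmiccal >/dev/null 2>&1; then
    astcosmiccal --config="$conf" -z2
fi
```

The field count is only a sanity check on the layout of the file; it does not confirm that the option names are ones CosmicCalculator accepts.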


But still, having to type --config=my-cosmology.conf every time is annoying, isn’t it? If you need this cosmology every time you are working in a specific directory, you can use Gnuastro’s default configuration file names and avoid having to type it manually.

The default configuration files (that are checked if they exist) must be placed in the hidden .gnuastro sub-directory (in the same directory you are running the program). Their file name (within .gnuastro) must also be the same as the program’s executable name. So in the case of CosmicCalculator, the default configuration file in a given directory is .gnuastro/astcosmiccal.conf.

Let’s do this. We’ll first make a directory for our custom cosmology, then build a .gnuastro within it. Finally, we’ll copy the custom configuration file there:

$ mkdir my-cosmology
$ mkdir my-cosmology/.gnuastro
$ mv my-cosmology.conf my-cosmology/.gnuastro/astcosmiccal.conf

Once you run CosmicCalculator within my-cosmology (as shown below), you will see how your custom cosmology has been implemented without having to type anything extra on the command-line.

$ cd my-cosmology
$ astcosmiccal -P
$ cd ..


To further simplify the process, you can use the --setdirconf option. If you are already in your desired working directory, calling this option with the others will automatically write the final values (along with descriptions) in .gnuastro/astcosmiccal.conf. For example try the commands below:

$ mkdir my-cosmology2
$ cd my-cosmology2
$ astcosmiccal -P
$ astcosmiccal --H0 70 --olambda=0.7 --omatter=0.3 --setdirconf
$ astcosmiccal -P
$ cd ..


Gnuastro’s programs also have default configuration files for a specific user (when run in any directory). This allows you to set a special behavior every time a program is run by a specific user. Only the directory and filename differ from the above, the rest of the process is similar to before. Finally, there are also system-wide configuration files that can be used to define the option values for all users on a system. See Configuration file precedence for a more detailed discussion.

We’ll stop the discussion on configuration files here, but you can always read about them in Configuration files. Before continuing the tutorial, let’s delete the two extra directories that we don’t need any more:

$ rm -rf my-cosmology*

#### 2.2.9 Warping to a new pixel grid

We are now ready to start processing the downloaded images. The XDF datasets we are using here are already aligned to the same pixel grid. However, warping to a different/matched pixel grid is commonly needed before higher-level analysis when you are using datasets from different instruments. So let’s have a look at Gnuastro’s warping features here.

Gnuastro’s Warp program should be used for warping the pixel-grid (see Warp). For example, try rotating one of the images by 20 degrees:

$ astwarp flat-ir/xdf-f160w.fits --rotate=20


Open the output (xdf-f160w_rotated.fits) and see how it is rotated. If your final image is already aligned with RA and Dec, you can simply use the --align option and let Warp calculate the necessary rotation and apply it. For example, try aligning the rotated image back to the standard orientation (just note that because of the two rotations, the NaN parts of the image are larger now):

$ astwarp xdf-f160w_rotated.fits --align

Warp can generally be used for many kinds of pixel grid manipulation (warping), not just rotations. For example the outputs of the commands below will respectively have larger pixels (new resolution being one quarter the original resolution), get shifted by 2.8 pixels (a sub-pixel shift), get a shear of 0.2, and be tilted (projected). Run each of them and open the output file to see the effect, they will become handy for you in the future.

$ astwarp flat-ir/xdf-f160w.fits --scale=0.25
$ astwarp flat-ir/xdf-f160w.fits --translate=2.8
$ astwarp flat-ir/xdf-f160w.fits --shear=0.2
$ astwarp flat-ir/xdf-f160w.fits --project=0.001,0.0005

If you need to do multiple warps, you can combine them in one call to Warp. For example to first rotate the image, then scale it, run this command:

$ astwarp flat-ir/xdf-f160w.fits --rotate=20 --scale=0.25


If you have multiple warps, do them all in one command. Don’t warp them in separate commands because the correlated noise will become too strong. As you see in the matrix that is printed when you run Warp, it merges all the warps into a single warping matrix (see Merging multiple warpings) and simply applies that (mixes the pixel values) just once. However, if you run Warp multiple times, the pixels will be mixed multiple times, creating a strong artificial blur/smoothing, or stronger correlated noise.

Recall that the merging of multiple warps is done through matrix multiplication, therefore order matters in the separate operations. At a lower level, through Warp’s --matrix option, you can directly request your desired final warp and don’t have to break it up into different warps like above (see Invoking Warp).
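
To see why the order matters, the sketch below multiplies two small 2x2 matrices in both orders with awk. The matrices (a shear of 2 along the first axis and a rotation by 90 degrees) were chosen only because their entries are integers; they are not taken from Warp’s printed output, and `matmul` is a hypothetical helper name.

```shell
# Multiply two 2x2 matrices, each given as a row-major "a b c d" string.
matmul() {
    echo "$1 $2" | awk '{
        printf "%d %d %d %d\n",
            $1*$5 + $2*$7, $1*$6 + $2*$8,
            $3*$5 + $4*$7, $3*$6 + $4*$8
    }'
}

shear="1 2 0 1"      # [[1, 2], [0, 1]]
rotate="0 -1 1 0"    # [[0,-1], [1, 0]]: rotation by 90 degrees

shear_rotate=$(matmul "$shear" "$rotate")
rotate_shear=$(matmul "$rotate" "$shear")

echo "shear  x rotate: $shear_rotate"
echo "rotate x shear : $rotate_shear"
```

The two products differ, so swapping the order of two warps generally gives a different final matrix and therefore a different output image.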

Fortunately these datasets are already aligned to the same pixel grid, so you don’t actually need the files that were just generated. You can safely delete them all with the following command. Here, you see why we put the processed outputs that we need later into a separate directory. In this way, the top directory can be used for temporary files for testing that you can simply delete with a generic command like below.

$ rm *.fits

#### 2.2.10 NoiseChisel and Multiextension FITS files

Having completed a review of the basics in the previous sections, we are now ready to separate the signal (galaxies or stars) from the background noise in the image. We will be using the results of Dataset inspection and cropping, so be sure you already have them. Gnuastro has NoiseChisel for this job. But NoiseChisel’s output is a multi-extension FITS file, therefore to better understand how to use NoiseChisel, let’s take a look at multi-extension FITS files and how you can interact with them.

In the FITS format, each extension contains a separate dataset (image in this case). You can get basic information about the extensions in a FITS file with Gnuastro’s Fits program (see Fits). To start with, let’s run NoiseChisel without any options, then use Gnuastro’s Fits program to inspect the number of extensions in this file.

$ astnoisechisel flat-ir/xdf-f160w.fits
$ astfits xdf-f160w_detected.fits

From the output list, we see that NoiseChisel’s output contains 5 extensions and the first (counting from zero, with name NOISECHISEL-CONFIG) is empty: it has a value of 0 in the last column (which shows its size). The first extension in all the outputs of Gnuastro’s programs only contains meta-data: data about/describing the datasets within (all) the output’s extensions. This is recommended by the FITS standard, see Fits for more. In the case of Gnuastro’s programs, this generic zero-th/meta-data extension (for the whole file) contains all the configuration options of the program that created the file.

The second extension of NoiseChisel’s output (numbered 1, named INPUT-NO-SKY) is the Sky-subtracted input that you provided. The third (DETECTIONS) is NoiseChisel’s main output which is a binary image with only two possible values for all pixels: 0 for noise and 1 for signal. Since it only has two values, to avoid taking too much space on your computer, its numeric datatype is an unsigned 8-bit integer (or uint8, see footnote 32). The fourth and fifth (SKY and SKY_STD) extensions have the Sky and its standard deviation values for the input on a tile grid and were calculated over the undetected regions (for more on the importance of the Sky value, see Sky value).

Metadata regarding how the analysis was done (or a dataset was created) is very important for higher-level analysis and reproducibility. Therefore, let’s first take a closer look at the NOISECHISEL-CONFIG extension. If you specify a particular extension (HDU) of the FITS file, Gnuastro’s Fits program will print the header keywords (metadata) of that extension. You can either specify the HDU/extension counter (starting from 0), or name. Therefore, the two commands below are identical for this file:

$ astfits xdf-f160w_detected.fits -h0
$ astfits xdf-f160w_detected.fits -hNOISECHISEL-CONFIG

The first group of FITS header keywords are standard keywords (containing the SIMPLE and BITPIX keywords, before the first empty line). They are required by the FITS standard and must be present in any FITS extension. The second group contains the input file and all the options with their values in that run of NoiseChisel. Finally, the last group contains the date and version information of Gnuastro and its dependencies. The “versions and date” group of keywords are present in all Gnuastro’s FITS extension outputs, for more see Output FITS files.

Note that if a keyword name is longer than 8 characters, it is preceded by a HIERARCH keyword and that all keyword names are in capital letters. Therefore, if you want to see only one keyword’s value by feeding the output to Grep, you should ask Grep to ignore case with its -i option (short name for --ignore-case). For example, below we’ll check the value of the --snminarea option, note how we don’t need Grep’s -i option when it is fed with astnoisechisel -P since it is already in lower case there. The extra white spaces in the first command are only to help in readability, you can ignore them when typing.

$ astnoisechisel -P                   | grep    snminarea
$ astfits xdf-f160w_detected.fits -h0 | grep -i snminarea

The metadata (that is stored in the output) can later be used to exactly reproduce/understand your result, even if you have lost/forgotten the command you used to create the file. This feature is present in all of Gnuastro’s programs, not just NoiseChisel.

Let’s continue with the extensions in NoiseChisel’s output that contain a dataset by visually inspecting them (here, we’ll use SAO DS9). Since the file contains multiple related extensions, the easiest way to view all of them in DS9 is to open the file as a “Multi-extension data cube” with the -mecube option as shown below (see footnote 33).

$ ds9 -mecube xdf-f160w_detected.fits -zscale -zoom to fit


A “cube” window opens along with DS9’s main window. The buttons and horizontal scroll bar in this small new window can be used to navigate between the extensions. In this mode, all DS9’s settings (for example zoom or color-bar) will be identical between the extensions. Try zooming in to one part and flipping through the extensions to see how the galaxies were detected along with the Sky and Sky standard deviation values for that region. Just keep in mind that NoiseChisel’s job is only detection (separating signal from noise). We’ll do segmentation on this result later to find the individual galaxies/peaks over the detected pixels.

Each HDU/extension in a FITS file is an independent dataset (image or table) which you can delete from the FITS file, or copy/cut to another file. For example, with the command below, you can copy NoiseChisel’s DETECTIONS HDU/extension to another file:

$ astfits xdf-f160w_detected.fits --copy=DETECTIONS -odetections.fits

There are similar options to conveniently cut (--cut, copy, then remove from the input) or delete (--remove) HDUs from a FITS file also. See HDU manipulation for more.

#### 2.2.11 NoiseChisel optimization for detection

In NoiseChisel and Multiextension FITS files, we ran NoiseChisel and reviewed NoiseChisel’s output format. Now that you have a better feeling for multi-extension FITS files, let’s optimize NoiseChisel for this particular dataset.

One good way to see if you have missed any signal (small galaxies, or the wings of brighter galaxies) is to mask all the detected pixels and inspect the noise pixels. For this, you can use Gnuastro’s Arithmetic program (in particular its where operator, see Arithmetic operators). The command below will produce mask-det.fits. In it, all the pixels in the INPUT-NO-SKY extension that are flagged 1 in the DETECTIONS extension (dominated by signal, not noise) will be set to NaN.

Since the various extensions are in the same file, for each dataset we need the file and extension name. To make the command easier to read/write/understand, let’s use shell variables: ‘in’ will be used for the Sky-subtracted input image and ‘det’ will be used for the detection map. Recall that a shell variable’s value can be retrieved by adding a $ before its name, also note that the double quotations are necessary when we have white-space characters in a variable’s value (like this case).

$ in="xdf-f160w_detected.fits -hINPUT-NO-SKY"
$ det="xdf-f160w_detected.fits -hDETECTIONS"
$ astarithmetic $in $det nan where --output=mask-det.fits

To invert the result (only keep the detected pixels), you can flip the detection map (from 0 to 1 and vice-versa) by adding a ‘not’ after $det:

$ astarithmetic $in $det not nan where --output=mask-sky.fits

Looking again at the detected pixels, we see that there are thin connections between many of the smaller objects or extending from larger objects. This shows that we have dug in too deep, and that we are following correlated noise.

Correlated noise is created when we warp datasets from individual exposures (that are each slightly offset compared to each other) into the same pixel grid, then add them to form the final result. Because it mixes nearby pixel values, correlated noise is a form of convolution and it smooths the image. In terms of the number of exposures (and thus correlated noise), the XDF dataset is by no means an ordinary dataset. It is the result of warping and adding roughly 80 separate exposures which can create strong correlated noise/smoothing. In common surveys the number of exposures is usually 10 or less.

Let’s tweak NoiseChisel’s configuration a little to get a better result on this dataset. Don’t forget that “Good statistical analysis is not a purely routine matter, and generally calls for more than one pass through the computer” (Anscombe 1973, see Science and its tools). A good scientist must have a good understanding of her tools to make a meaningful analysis. So don’t hesitate in playing with the default configuration and reviewing the manual when you have a new dataset in front of you. Robust data analysis is an art, therefore a good scientist must first be a good artist.

NoiseChisel can produce “Check images” to help you visualize and inspect how each step is done. You can see all the check images it can produce with this command.

$ astnoisechisel --help | grep check


Let’s check the overall detection process to get a better feeling of what NoiseChisel is doing with the following command. To learn about NoiseChisel in more detail, please see NoiseChisel, Akhlaghi and Ichikawa [2015] and Akhlaghi [2019].

$ astnoisechisel flat-ir/xdf-f160w.fits --checkdetection

The check images/tables are also multi-extension FITS files. As you saw from the command above, when check datasets are requested, NoiseChisel won’t go to the end. It will abort as soon as all the extensions of the check image are ready. Please list the extensions of the output with astfits and then open it with ds9 as we did above. If you have read the paper, you will see why there are so many extensions in the check image.

$ astfits xdf-f160w_detcheck.fits
$ ds9 -mecube xdf-f160w_detcheck.fits -zscale -zoom to fit

In order to understand the parameters and their biases (especially as you are starting to use Gnuastro, or running it on a new dataset), it is strongly encouraged to play with the different parameters and use the respective check images to see which step is affected by your changes and how, for example see Detecting large extended targets.

The OPENED_AND_LABELED extension shows the initial detection step of NoiseChisel. We see that these thin connections between smaller points are already present here (a relatively early stage in the processing). Such connections at the lowest surface brightness limits usually occur when the dataset is too smoothed. Because of correlated noise, the dataset is already artificially smoothed, therefore further smoothing it with the default kernel may be the problem. One solution is thus to use a sharper kernel (NoiseChisel’s first step in its processing).

By default NoiseChisel uses a Gaussian with full-width-half-maximum (FWHM) of 2 pixels. We can use Gnuastro’s MakeProfiles to build a kernel with FWHM of 1.5 pixels (truncated at 5 times the FWHM, like the default) using the following command. MakeProfiles is a powerful tool to build any number of mock profiles on one image or independently, to learn more of its features and capabilities, see MakeProfiles.

$ astmkprof --kernel=gaussian,1.5,5 --oversample=1


Please open the output kernel.fits and have a look (it is very small and sharp). We can now tell NoiseChisel to use this instead of the default kernel with the following command (we’ll keep checking the detection steps):

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits \
                 --checkdetection

Looking at the OPENED_AND_LABELED extension, we see that the thin connections between smaller peaks have now significantly decreased. Going two extensions/steps ahead (in the first HOLES-FILLED), you can see that during the process of finding false pseudo-detections, too many holes have been filled: do you see how many of the brighter galaxies are connected? At this stage all holes are filled, irrespective of their size.

Try looking two extensions ahead (in the first PSEUDOS-FOR-SN), you can see that there aren’t too many pseudo-detections because of all those extended filled holes. If you look closely, you can see the number of pseudo-detections in the result NoiseChisel prints (around 5000). This is another side-effect of correlated noise. To address it, we should slightly increase the pseudo-detection threshold (before changing --dthresh, run with -P to see the default value):

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits \
                 --dthresh=0.1 --checkdetection


Before visually inspecting the check image, you can already see the effect of this change in NoiseChisel’s command-line output: notice how the number of pseudos has increased to more than 6000. Open the check image now and have a look, you can see how the pseudo-detections are distributed much more evenly in the image.

 Maximize the number of pseudo-detections: For a new noise-pattern (different instrument), play with --dthresh until you get a maximal number of pseudo-detections (the total number of pseudo-detections is printed on the command-line when you run NoiseChisel).

The signal-to-noise ratio of pseudo-detections defines NoiseChisel’s reference for removing false detections, so it is very important to get right. Let’s have a look at their signal-to-noise distribution with --checksn.

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits \
                 --dthresh=0.1 --checkdetection --checksn

The output (xdf-f160w_detsn.fits) contains two extensions for the pseudo-detections over the undetected (sky) regions and those over detections. The first column is the pseudo-detection label which you can see in the respective PSEUDOS-FOR-SN extension of xdf-f160w_detcheck.fits (see footnote 34). You can see the table columns with the first command below and get a feeling for its distribution with the second command (the two Table and Statistics programs will be discussed later in the tutorial).

$ asttable xdf-f160w_detsn.fits -hSKY_PSEUDODET_SN
$ aststatistics xdf-f160w_detsn.fits -hSKY_PSEUDODET_SN -c2

The correlated noise is again visible in this pseudo-detection signal-to-noise distribution: it is highly skewed. A small change in the quantile will translate into a big change in the S/N value. For example see the difference between the three 0.99, 0.95 and 0.90 quantiles with this command:

$ aststatistics xdf-f160w_detsn.fits -hSKY_PSEUDODET_SN -c2      \
                --quantile=0.99 --quantile=0.95 --quantile=0.90


If you run NoiseChisel with -P, you’ll see the default signal-to-noise quantile --snquant is 0.99. In effect with this option you specify the purity level you want (contamination by false detections). With the aststatistics command above, you see that a small number of extra false detections (impurity) in the final result causes a big change in completeness (you can detect more lower signal-to-noise true detections). So let’s loosen-up our desired purity level, remove the check-image options, and then mask the detected pixels like before to see if we have missed anything.

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits \
                 --dthresh=0.1 --snquant=0.95
$ in="xdf-f160w_detected.fits -hINPUT-NO-SKY"
$ det="xdf-f160w_detected.fits -hDETECTIONS"
$ astarithmetic $in $det nan where --output=mask-det.fits
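
The astarithmetic call above depends on the shell splitting the unquoted $in and $det into two words each (the file name and the -h option). The sketch below shows that behaviour without needing Gnuastro. It assumes a POSIX-style shell such as bash (some shells, zsh for example, do not split unquoted variables by default), and `count_args` is a hypothetical helper.

```shell
# Print how many separate arguments a command would receive.
count_args() { echo "$#"; }

in="xdf-f160w_detected.fits -hINPUT-NO-SKY"

# Unquoted: the value is split at the space into two arguments, which is
# what the astarithmetic call needs (file name, then the -h option).
unquoted=$(count_args $in)

# Quoted: the whole value is passed as one argument, so a program would
# look for a single file whose name contains a space.
quoted=$(count_args "$in")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

This is why the variables are quoted where they are defined but left unquoted where they are used.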


Overall it seems good, but if you play a little with the color-bar and look closer in the noise, you’ll see a few very sharp, but faint, objects that have not been detected. This only happens for under-sampled datasets like HST (where the pixel size is larger than the point spread function FWHM). So this won’t happen on ground-based images. Because of this, sharp and faint objects will be very small and eroded too easily during NoiseChisel’s erosion step.

To address this problem of sharp objects, we can use NoiseChisel’s --noerodequant option. All pixels above this quantile will not be eroded, thus allowing us to preserve faint and sharp objects. Check its default value, then run NoiseChisel like below and make the mask again. You will see many of those sharp objects are now detected.

$ astnoisechisel flat-ir/xdf-f160w.fits --kernel=kernel.fits \
                 --noerodequant=0.95 --dthresh=0.1 --snquant=0.95

This seems to be fine and we can continue with our analysis. To avoid having to write these options on every call to NoiseChisel, we’ll just make a configuration file in a visible config directory. Then we’ll define the hidden .gnuastro directory (that all Gnuastro’s programs will look into for configuration files) as a symbolic link to the config directory. Finally, we’ll write the finalized values of the options into NoiseChisel’s standard configuration file within that directory. We’ll also put the kernel in a separate directory to keep the top directory clean of any files we later need.

$ mkdir kernel config
$ ln -s config/ .gnuastro
$ mv kernel.fits kernel/noisechisel.fits
$ echo "kernel kernel/noisechisel.fits" > config/astnoisechisel.conf
$ echo "noerodequant 0.95"             >> config/astnoisechisel.conf
$ echo "dthresh      0.1"              >> config/astnoisechisel.conf
$ echo "snquant      0.95"             >> config/astnoisechisel.conf
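
If the symbolic link is unfamiliar, the sketch below reproduces the same layout in a scratch directory and confirms that a file written through the visible config directory is the one found under .gnuastro. It needs no Gnuastro installation; the scratch directory and the `workdir` and `nlines` names are assumptions for illustration, and only two of the four option lines are written to keep it short.

```shell
# Reproduce the directory layout in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir config
ln -s config/ .gnuastro

# Write through the visible directory ...
echo "dthresh 0.1"  >  config/astnoisechisel.conf
echo "snquant 0.95" >> config/astnoisechisel.conf

# ... and read back through the hidden link: both paths reach one file.
cat .gnuastro/astnoisechisel.conf
nlines=$(wc -l < .gnuastro/astnoisechisel.conf | tr -d ' ')
echo "lines seen through the link: $nlines"
```

Keeping the real files in a visible directory makes them easy to list, edit and put under version control, while the programs still find them under the name they expect.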


We are now ready to finally run NoiseChisel on the three filters and keep the output in a dedicated directory (nc).

$ rm *.fits
$ mkdir nc
$ astnoisechisel flat-ir/xdf-f160w.fits --output=nc/xdf-f160w.fits
$ astnoisechisel flat-ir/xdf-f125w.fits --output=nc/xdf-f125w.fits
$ astnoisechisel flat-ir/xdf-f105w.fits --output=nc/xdf-f105w.fits

#### 2.2.12 NoiseChisel optimization for storage

As we showed before (in NoiseChisel and Multiextension FITS files), NoiseChisel’s output is a multi-extension FITS file with several images the same size as the input. As the input datasets get larger this output can become hard to manage and waste a lot of storage space. Fortunately there is a solution to this problem (which is also useful for Segment’s outputs). In this small section we’ll take a short detour to show this feature. Please note that the outputs generated here are not needed for the rest of the tutorial.

But first, let’s have a look at the contents/HDUs and volume of NoiseChisel’s output from NoiseChisel optimization for detection (fast answer, it’s larger than 100 mega-bytes):

$ astfits nc/xdf-f160w.fits
$ ls -lh nc/xdf-f160w.fits

Two options can drastically decrease NoiseChisel’s output file size: 1) With the --rawoutput option, NoiseChisel won’t create a Sky-subtracted input. After all, it is redundant: you can always generate it by subtracting the SKY extension from the input image (which you have in your database) using the Arithmetic program. 2) With the --oneelempertile option, you can tell NoiseChisel to store its Sky and Sky standard deviation results with one pixel per tile (instead of many pixels per tile). So let’s run NoiseChisel with these options, then have another look at the HDUs and the over-all file size:

$ astnoisechisel flat-ir/xdf-f160w.fits --oneelempertile --rawoutput \
                 --output=nc-for-storage.fits
$ astfits nc-for-storage.fits
$ ls -lh nc-for-storage.fits


See how nc-for-storage.fits has four HDUs, while nc/xdf-f160w.fits had five HDUs? As explained above, the missing extension is INPUT-NO-SKY. Also, look at the sizes of the SKY and SKY_STD HDUs: unlike before, they aren’t the same size as DETECTIONS, they only have one pixel for each tile (group of pixels in raw input). Finally, you see that nc-for-storage.fits is just under 8 mega-bytes (while nc/xdf-f160w.fits was 100 mega-bytes)!

But we are not finished! You can be even more efficient in storing, archiving or transferring NoiseChisel’s output by compressing this file. Try the command below to see how NoiseChisel’s output has now shrunk to about 250 kilo-bytes while keeping all the necessary information of the original 100 mega-byte output.

$ gzip --best nc-for-storage.fits
$ ls -lh nc-for-storage.fits.gz


We can get this wonderful level of compression because NoiseChisel’s output is binary with only two values: 0 and 1. Compression algorithms are highly optimized in such scenarios.
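
The effect is easy to reproduce on synthetic data. The sketch below compresses one million zero-valued bytes, which is the limiting case of a binary map with no detections at all. The file is not a FITS file, its size is arbitrary, and `head -c` is assumed to be available (it is on GNU and BSD systems); a real detection map will compress less than this.

```shell
# A synthetic stand-in for an empty detection map: one million zero bytes.
workdir=$(mktemp -d)
head -c 1000000 /dev/zero > "$workdir/zeros.bin"

# Size before compression, and size of the gzip stream written to a pipe
# (-c leaves the original file in place).
raw=$(wc -c < "$workdir/zeros.bin" | tr -d ' ')
packed=$(gzip --best -c "$workdir/zeros.bin" | wc -c | tr -d ' ')

echo "raw: $raw bytes, gzip --best: $packed bytes"
```

Long runs of identical values are what the compressor exploits, which is why a two-valued image shrinks so much more than the noisy input image would.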

You can open nc-for-storage.fits.gz directly in SAO DS9 or feed it to any of Gnuastro’s programs without having to decompress it. Higher-level programs that take NoiseChisel’s output (for example Segment or MakeCatalog) can also deal with this compressed image where the Sky and its Standard deviation are one pixel-per-tile. You just have to give the “values” image as a separate option, for more, see Segment and MakeCatalog.

Segment (the program we will introduce in the next section for identifying sub-structure), also has similar features to optimize its output for storage. Since this file was only created for a fast detour demonstration, let’s keep our top directory clean and move to the next step:

$ rm nc-for-storage.fits.gz


#### 2.2.13 Segmentation and making a catalog

The main output of NoiseChisel is the binary detection map (DETECTIONS extension, see NoiseChisel optimization for detection), which only has two values: 1 or 0. This is useful when studying the noise, but hardly of any use when you actually want to study the targets/galaxies in the image, especially in such a deep field where the detection map of almost everything is connected. To find the galaxies over the detections, we’ll use Gnuastro’s Segment program:

$ mkdir seg
$ astsegment nc/xdf-f160w.fits -oseg/xdf-f160w.fits
$ astsegment nc/xdf-f125w.fits -oseg/xdf-f125w.fits
$ astsegment nc/xdf-f105w.fits -oseg/xdf-f105w.fits
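
Since the three Segment calls differ only in the filter name, they can also be generated by a loop. The sketch below is a dry run under stated assumptions: `segment_all` is a hypothetical function name, and the commands are printed with echo rather than executed, so it can be tried before the nc directory exists.

```shell
# Print (not run) one Segment call per filter.
# Remove the "echo" to execute the calls once nc/ holds the inputs.
segment_all() {
    for f in f160w f125w f105w; do
        echo astsegment "nc/xdf-$f.fits" "-oseg/xdf-$f.fits"
    done
}

segment_all
```

The same pattern applies to the NoiseChisel calls in the previous section, where only the file name changes between filters.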


Segment’s operation is very much like NoiseChisel (in fact, prior to version 0.6, it was part of NoiseChisel). For example the output is a multi-extension FITS file, it has check images and uses the undetected regions as a reference. Please have a look at Segment’s multi-extension output with ds9 to get a good feeling of what it has done.

$ ds9 -mecube seg/xdf-f160w.fits -zscale -zoom to fit

Like NoiseChisel, the first extension is the input. The CLUMPS extension shows the true “clumps” with values that are $$\ge1$$, and the diffuse regions labeled as $$-1$$. In the OBJECTS extension, we see that the large detections of NoiseChisel (that may have contained many galaxies) are now broken up into separate labels. See Segment for more. Because the clumps are not affected by the hard-to-deblend and low signal-to-noise diffuse regions, they are more robust for calculating the colors (compared to objects). Therefore from this step onward, we’ll continue with clumps.

Having localized the regions of interest in the dataset, we are ready to do measurements on them with MakeCatalog. Besides the IDs, we want to measure (in this order) the Right Ascension (with --ra), Declination (--dec), magnitude (--magnitude), and signal-to-noise ratio (--sn) of the objects and clumps. Furthermore, as mentioned above, we also want measurements on clumps, so we also need to call --clumpscat. The following command will make these measurements on Segment’s F160W output and write them in a catalog for each object and clump in a FITS table.

$ mkdir cat
$ astmkcatalog seg/xdf-f160w.fits --ids --ra --dec --magnitude --sn \
               --zeropoint=25.94 --clumpscat --output=cat/xdf-f160w.fits

From the printed statements on the command-line, you see that MakeCatalog read all the extensions in Segment’s output for the various measurements it needed.

To calculate colors, we also need magnitude measurements on the other filters. So let’s repeat the command above on them, just changing the file names and zeropoint (which we got from the XDF survey webpage):

$ astmkcatalog seg/xdf-f125w.fits --ids --ra --dec --magnitude --sn \
               --zeropoint=26.23 --clumpscat --output=cat/xdf-f125w.fits

$astmkcatalog seg/xdf-f105w.fits --ids --ra --dec --magnitude --sn \ --zeropoint=26.27 --clumpscat --output=cat/xdf-f105w.fits  However, the galaxy properties might differ between the filters (which is the whole purpose behind observing in different filters!). Also, the noise properties and depth of the datasets differ. You can see the effect of these factors in the resulting clump catalogs, with Gnuastro’s Table program. We’ll go deep into working with tables in the next section, but in summary: the -i option will print information about the columns and number of rows. To see the column values, just remove the -i option. In the output of each command below, look at the Number of rows:, and note that they are different. asttable cat/xdf-f105w.fits -hCLUMPS -i asttable cat/xdf-f125w.fits -hCLUMPS -i asttable cat/xdf-f160w.fits -hCLUMPS -i  Matching the catalogs is possible (for example with Match). However, the measurements of each column are also done on different pixels: the clump labels can/will differ from one filter to another for one object. Please open them and focus on one object to see for your self. This can bias the result, if you match catalogs. An accurate color calculation can only be done when magnitudes are measured from the same pixels on both images. Fortunately in these images, the Point spread function (PSF) are very similar, allowing us to do this directly35. You can do this with MakeCatalog and is one of the reasons that NoiseChisel or Segment don’t generate a catalog at all (to give you the freedom of selecting the pixels to do catalog measurements on). The F160W image is deeper, thus providing better detection/segmentation, and redder, thus observing smaller/older stars and representing more of the mass in the galaxies. We will thus use the F160W filter as a reference and use its segment labels to identify which pixels to use for which objects/clumps. 
But we will do the measurements on the sky-subtracted F105W and F125W images (using MakeCatalog’s --valuesfile option) as shown below. Notice how the major difference between this call to MakeCatalog and the call to generate the F160W catalog (excluding the zeropoint and the output name) is the --valuesfile option.

$ astmkcatalog seg/xdf-f160w.fits --ids --ra --dec --magnitude --sn \
    --valuesfile=nc/xdf-f125w.fits --zeropoint=26.23 \
    --clumpscat --output=cat/xdf-f125w-on-f160w-lab.fits

$ astmkcatalog seg/xdf-f160w.fits --ids --ra --dec --magnitude --sn \
    --valuesfile=nc/xdf-f105w.fits --zeropoint=26.27 \
    --clumpscat --output=cat/xdf-f105w-on-f160w-lab.fits

Look into what MakeCatalog printed on the command-line after running the commands above. You can see that (as requested) the object and clump labels were taken from the respective extensions in seg/xdf-f160w.fits, while the values and Sky standard deviation were taken from nc/xdf-f105w.fits (for the last command). Since we used the same labeled image on both filters, the number of rows in both catalogs is now identical, and equal to that of the F160W catalog:

asttable cat/xdf-f105w-on-f160w-lab.fits -hCLUMPS -i
asttable cat/xdf-f125w-on-f160w-lab.fits -hCLUMPS -i
asttable cat/xdf-f160w.fits -hCLUMPS -i

Finally, the comments in MakeCatalog’s output (COMMENT keywords in the FITS headers, or lines starting with # in plain text) contain some important information about the input datasets and other useful info (for example pixel area or per-pixel surface brightness limit). You can see them with this command:

$ astfits cat/xdf-f160w.fits -h1 | grep COMMENT

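As a rough illustration of what the --zeropoint values above do, the sketch below applies the standard magnitude definition (magnitude = -2.5 × log10(sum of pixel values) + zeropoint) to a made-up sum of 100. The sum and the function name mag_from_sum are hypothetical, and AWK is only used for the arithmetic: this is not how MakeCatalog is implemented, just the relation between a measured sum, a zeropoint and a magnitude.

```shell
#!/bin/sh
# Convert a sum of pixel values ($1) and a zeropoint ($2) to a magnitude.
# AWK's log() is the natural logarithm, so we divide by log(10).
mag_from_sum() {
    awk -v s="$1" -v z="$2" \
        'BEGIN { printf "%.2f\n", -2.5*log(s)/log(10) + z }'
}

# Hypothetical sum of 100, with the three zeropoints used above.
for zp in 25.94 26.23 26.27; do
    echo "zeropoint $zp -> magnitude $(mag_from_sum 100 "$zp")"
done
```

The same sum therefore gives a different magnitude in each filter, which is why the zeropoint has to be changed together with the file name.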


#### 2.2.14 Working with catalogs (estimating colors)

The output of the MakeCatalog command above is a FITS table (see Segmentation and making a catalog). The clump and object catalogs are stored in the two extensions of a single FITS file [36]. Let’s see the extensions and their basic properties with the Fits program:

$ astfits cat/xdf-f160w.fits              # Extension information

Now, let’s inspect the table in each extension with Gnuastro’s Table program (see Table). Note that we could have used -hOBJECTS and -hCLUMPS instead of -h1 and -h2 respectively.

$ asttable cat/xdf-f160w.fits -h1 --info   # Objects catalog info.
$ asttable cat/xdf-f160w.fits -h1          # Objects catalog columns.
$ asttable cat/xdf-f160w.fits -h2 -i       # Clumps catalog info.
$ asttable cat/xdf-f160w.fits -h2          # Clumps catalog columns.

As you see above, when given a specific table (file name and extension), Table will print the full contents of all the columns. To see the basic metadata about each column (for example name, units and comments), simply append a --info (or -i) to the command.

To print the contents of specific column(s), just specify the column number(s) (counting from 1) or the column name(s) (if they have one). For example, if you just want the magnitude and signal-to-noise ratio of the clumps (in -h2), you can get them with any of the following commands:

$ asttable cat/xdf-f160w.fits -h2 -c5,6
$ asttable cat/xdf-f160w.fits -h2 -c5,SN
$ asttable cat/xdf-f160w.fits -h2 -c5         -c6
$ asttable cat/xdf-f160w.fits -h2 -cMAGNITUDE -cSN

Using column names instead of numbers has many advantages: 1) you don’t have to worry about the order of columns in the table. 2) It acts as documentation in the script. Column metadata (including a name) aren’t limited to FITS tables and can also be used in plain text tables, see Gnuastro text table format.

Since cat/xdf-f160w.fits and cat/xdf-f125w-on-f160w-lab.fits have exactly the same number of rows, we can use Table to merge the columns of these two tables, to have one table with magnitudes in both filters. We do this with the --catcolumnfile option like below. You give this option a file name (which is assumed to be a table that has the same number of rows), and all the table’s columns will be concatenated/appended to the main table. So please try it out with the commands below. We’ll first look at the metadata of the first table (only the CLUMPS extension). With the second command, we’ll concatenate the two tables and write them into two-in-one.fits. Finally, we’ll check the new catalog’s metadata.

$ asttable cat/xdf-f160w.fits -i -hCLUMPS
$ asttable cat/xdf-f160w.fits -hCLUMPS --output=two-in-one.fits \
    --catcolumnfile=cat/xdf-f125w-on-f160w-lab.fits \
    --catcolumnhdu=CLUMPS
$ asttable two-in-one.fits -i


Looking at the two metadata outputs (called with -i), you may have noticed that both tables have the same number of rows. But what might have attracted your attention more is that two-in-one.fits has double the number of columns (as expected: after all, you merged both tables into one file). In fact you can concatenate any number of other tables in one command, for example:

$ asttable cat/xdf-f160w.fits -hCLUMPS --output=three-in-one.fits \
    --catcolumnfile=cat/xdf-f125w-on-f160w-lab.fits \
    --catcolumnfile=cat/xdf-f105w-on-f160w-lab.fits \
    --catcolumnhdu=CLUMPS --catcolumnhdu=CLUMPS
$ asttable three-in-one.fits -i


As you see, to avoid confusion in column names, Table has intentionally appended a -1 to the column names of the first concatenated table (so for example we have the original RA column, and another one called RA-1). Similarly a -2 has been added for the columns of the second concatenated table.

However, this example clearly shows a problem with this full concatenation: some columns are identical (for example HOST_OBJ_ID and HOST_OBJ_ID-1), or not needed (for example RA-1 and DEC-1 which are not necessary here). In such cases, you can use --catcolumns to only concatenate certain columns, not the whole table, for example this command:

$ asttable cat/xdf-f160w.fits -hCLUMPS --output=two-in-one-2.fits \
    --catcolumnfile=cat/xdf-f125w-on-f160w-lab.fits \
    --catcolumnhdu=CLUMPS --catcolumns=MAGNITUDE
$ asttable two-in-one-2.fits -i


You see that we have now only appended the MAGNITUDE column of cat/xdf-f125w-on-f160w-lab.fits. This is what we needed to be able to later subtract the magnitudes. Let’s go ahead and add the F105W magnitudes also with the command below. Note how we need to call --catcolumnhdu once for every table that should be appended, but we only call --catcolumns once (assuming all the tables that should be appended have this column).

$ asttable cat/xdf-f160w.fits -hCLUMPS --output=three-in-one-2.fits \
    --catcolumnfile=cat/xdf-f125w-on-f160w-lab.fits \
    --catcolumnfile=cat/xdf-f105w-on-f160w-lab.fits \
    --catcolumnhdu=CLUMPS --catcolumnhdu=CLUMPS \
    --catcolumns=MAGNITUDE
$ asttable three-in-one-2.fits -i


But we aren’t finished yet! There is a very big problem: it’s not clear which one of the MAGNITUDE, MAGNITUDE-1 or MAGNITUDE-2 columns belongs to which filter! Right now, you know this because you just ran this command. But in one hour, you’ll start doubting yourself and will be forced to go through your command history, trying to answer this question. You should never torture your future-self (or your colleagues) like this! So, let’s rename these confusing columns in the matched catalog.

Fortunately, with the --colmetadata option, you can correct the column metadata of the final table (just before it is written). It takes four values: 1) the current column name or number, 2) the new column name, 3) the column unit and 4) the column comments. Since the comments are usually human-friendly sentences and contain space characters, you should put them in double quotations like below. For example by adding three calls of this option to the previous command, we write the filter name in the magnitude column name and description.

$ asttable cat/xdf-f160w.fits -hCLUMPS --output=three-in-one-3.fits \
    --catcolumnfile=cat/xdf-f125w-on-f160w-lab.fits \
    --catcolumnfile=cat/xdf-f105w-on-f160w-lab.fits \
    --catcolumnhdu=CLUMPS --catcolumnhdu=CLUMPS \
    --catcolumns=MAGNITUDE \
    --colmetadata=MAGNITUDE,MAG-F160W,log,"Magnitude in F160W." \
    --colmetadata=MAGNITUDE-1,MAG-F125W,log,"Magnitude in F125W." \
    --colmetadata=MAGNITUDE-2,MAG-F105W,log,"Magnitude in F105W."
$ asttable three-in-one-3.fits -i


We now have all the magnitudes in one table and can start doing arithmetic on them (to estimate colors, which are just a subtraction of magnitudes). To use column arithmetic, simply call the column selection option (--column or -c), put the value in single quotations and start the value with arith (followed by a space) like the example below. Column arithmetic uses the same notation as the Arithmetic program (see Reverse polish notation), with almost all the same operators (see Arithmetic operators), and some column-specific operators (that aren’t available for images). In column arithmetic, you can identify columns by number (prefixed with a $) or name; for more, see Column arithmetic.

So let’s estimate one color from three-in-one-3.fits using column arithmetic. All the commands below will produce the same output (in this table, column 5 is MAG-F160W and column 7 is MAG-F125W); try each of them and focus on the differences. Note that column arithmetic can be mixed with other ways to choose output columns (the -c option).

$ asttable three-in-one-3.fits -ocolor-cat.fits \
    -c1,2,RA,DEC,'arith $7 $5 -'

$ asttable three-in-one-3.fits -ocolor-cat.fits \
    -c1,2,RA,DEC,'arith MAG-F125W MAG-F160W -'

$ asttable three-in-one-3.fits -ocolor-cat.fits -c1,2 \
    -cRA,DEC --column='arith MAG-F125W MAG-F160W -'

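For readers new to reverse Polish notation: 'arith MAG-F125W MAG-F160W -' means "F125W magnitude minus F160W magnitude". The sketch below does the same subtraction with plain AWK on a tiny plain-text table. The three rows, their values and the function name color_f125w_f160w are made up for illustration, and AWK only stands in for Table's column arithmetic here (it says nothing about how Table works internally).

```shell
#!/bin/sh
# Print "ID color" for a made-up table with the columns:
# ID, F160W magnitude, F125W magnitude.
color_f125w_f160w() {
    printf '%s\n' \
        '# ID MAG-F160W MAG-F125W' \
        '1 24.10 24.60' \
        '2 25.30 25.20' \
        '3 23.00 24.80' |
    # Skip the comment line, then print the ID and (F125W - F160W).
    awk '!/^#/ { printf "%d %.2f\n", $1, $3 - $2 }'
}

color_f125w_f160w
```

With these made-up numbers the third row has the largest difference, so it would be the reddest of the three.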

This example again highlights the important point on column metadata: do you see how clearly understandable the last two commands are? On the contrary, do you feel how cryptic the first one is? When you have column names, please use them. If your table doesn’t have column names, give them names with the --colmetadata option (described above) as you are creating them. But how about the metadata for the column you just created with column arithmetic? Have a look at the column metadata of the table produced above:

$ asttable color-cat.fits -i

The name of the column produced by column arithmetic is ARITH_1! This is natural: Arithmetic has no idea what the modified column is! You could have multiplied two columns, or done much more complex transformations with many columns. Metadata can’t be set automatically. To add metadata, you can use --colmetadata like before:

$ asttable three-in-one-3.fits -ocolor-cat.fits -c1,2,RA,DEC \
    --column='arith MAG-F105W MAG-F160W -' \
    --colmetadata=ARITH_1,F105W-F160W,log,"Color F105W and F160W"


We are now ready to make our final table. We want it to have the magnitudes in all three filters, as well as colors. Recall that by convention in astronomy, colors are defined by subtracting the redder magnitude from the bluer magnitude (for example F125W minus F160W). In this way a larger color value corresponds to a redder object. So from the three magnitudes, we can produce three colors (as shown below). Also, because this is the final table we are creating here and want to use it later, we’ll store it in cat/, give it a clear name, and use the --range option to only keep rows with a signal-to-noise ratio (SN column, from the F160W filter) above 5.

$ asttable three-in-one-3.fits --range=SN,5,inf -c1,2,RA,DEC,SN \
    -cMAG-F160W,MAG-F125W,MAG-F105W \
    -c'arith MAG-F125W MAG-F160W -' \
    -c'arith MAG-F105W MAG-F125W -' \
    -c'arith MAG-F105W MAG-F160W -' \
    --colmetadata=SN,SN-F160W,ratio,"F160W signal to noise ratio" \
    --colmetadata=ARITH_1,F125W-F160W,log,"Color F125W and F160W" \
    --colmetadata=ARITH_2,F105W-F125W,log,"Color F105W and F125W" \
    --colmetadata=ARITH_3,F105W-F160W,log,"Color F105W and F160W" \
    --output=cat/mags-with-color.fits
$ asttable cat/mags-with-color.fits -i


The table now has all the columns we need and it has the proper metadata to let us safely use it later (without frustration over column order!) or pass it to colleagues. You can now inspect the distribution of colors with the Statistics program.

$ aststatistics cat/mags-with-color.fits -cF105W-F125W
$ aststatistics cat/mags-with-color.fits -cF105W-F160W
$ aststatistics cat/mags-with-color.fits -cF125W-F160W

This tiny and cute ASCII histogram (and the general information printed above it) gives you a crude (but very useful and fast) feeling for the distribution. You can later use Gnuastro’s Statistics program with the --histogram option to build a much more fine-grained histogram as a table to feed into your favorite plotting program for a much more accurate/appealing plot (for example with PGFPlots in LaTeX). If you just want specific measures, for example the mean, median and standard deviation, you can ask for them specifically, like below:

$ aststatistics cat/mags-with-color.fits -cF105W-F160W \
    --mean --median --std


We won’t go much deeper into the Statistics program here, but there is so much more you can do with it, please see Statistics later.

Let’s finish this section of the tutorial with a useful tip on modifying column metadata. Above, updating/changing column metadata was done with the --colmetadata in the same command that produced the newly created Table file. But in many situations, the table is already made and you just want to update the metadata of one column. In such cases using --colmetadata is over-kill (wasting CPU/RAM energy or time if the table is large) because it will load the full table data and metadata into memory, just change the metadata and write it back into a file.

In scenarios when the table’s data doesn’t need to be changed and you just want to set or update the metadata, it is much more efficient to use basic FITS keyword editing. For example, in the FITS standard, column names are stored in the TTYPE header keywords, so let’s have a look:

$ asttable two-in-one.fits -i
$ astfits two-in-one.fits -h1 | grep TTYPE


Changing/updating the column names is as easy as updating the values to these keywords. You don’t need to touch the actual data! With the command below, we’ll just update the MAGNITUDE and MAGNITUDE-1 columns (which are respectively stored in the TTYPE5 and TTYPE11 keywords) by modifying the keyword values and checking the effect by listing the column metadata again:

$ astfits two-in-one.fits -h1 \
    --update=TTYPE5,MAG-F160W \
    --update=TTYPE11,MAG-F125W
$ asttable two-in-one.fits -i


You can see that the column names have indeed been changed without touching any of the data. You can do the same for the column units or comments by modifying the keywords starting with TUNIT or TCOMM.

Generally, Gnuastro’s Table is a very useful program in data analysis and what you have seen so far is just the tip of the iceberg. But to keep the tutorial short, we’ll stop reviewing its features here; for more, please see Table. Finally, let’s delete all the temporary FITS tables we placed in the top project directory:

rm *.fits



#### 2.2.15 Aperture photometry

The colors we calculated in Working with catalogs (estimating colors) used a different segmentation map for each object. This might not satisfy some science cases that need the flux within a fixed area/aperture. Fortunately Gnuastro’s modular programs make it very easy to do this type of measurement (photometry). To do this, we can ignore the labeled images of NoiseChisel or Segment and just build our own labeled image! That labeled image can then be given to MakeCatalog.

To generate the apertures catalog we’ll use Gnuastro’s MakeProfiles (see MakeProfiles). But first we need a list of positions (aperture photometry needs a-priori knowledge of your target positions). So we’ll first read the clump positions from the F160W catalog, then use AWK to set the other parameters of each profile to be a fixed circle of radius 5 pixels (recall that we want all apertures to have an identical size/area in this scenario).

$ rm *.fits *.txt
$ asttable cat/xdf-f160w.fits -hCLUMPS -cRA,DEC                    \
    | awk '!/^#/{print NR, $1,$2, 5, 5, 0, 0, 1, NR, 1}'          \
    > apertures.txt
$ cat apertures.txt

We can now feed this catalog into MakeProfiles using the command below to build the apertures over the image. The most important option for this particular job is --mforflatpix: it tells MakeProfiles that the values in the magnitude column should be used for each pixel of a flat profile. Without it, MakeProfiles would build the profiles such that the sum of the pixels of each profile would have a magnitude (in log-scale) of the value given in that column (what you would expect when simulating a galaxy for example). See Invoking MakeProfiles for details on the options.

$ astmkprof apertures.txt --background=flat-ir/xdf-f160w.fits     \
    --clearcanvas --replace --type=int16 --mforflatpix            \
    --mode=wcs

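To see what the AWK step writes into apertures.txt without needing the real catalog, the sketch below runs the same AWK program on two made-up RA/Dec pairs (the positions and the function name make_apertures are hypothetical). My reading of the ten output columns (ID, RA, Dec, profile code, radius, Sérsic index, position angle, axis ratio, magnitude, truncation) follows MakeProfiles' default column order as I understand it; please check Invoking MakeProfiles before relying on it.

```shell
#!/bin/sh
# Same AWK program as in the tutorial, fed with two made-up positions
# instead of the output of Table.
make_apertures() {
    printf '%s\n' '53.1616 -27.7802' '53.1588 -27.7764' |
    awk '!/^#/{print NR, $1,$2, 5, 5, 0, 0, 1, NR, 1}'
}

make_apertures
```

Note how the row number (NR) appears twice in each line: once as the profile ID and once in the magnitude column, which is the value that --mforflatpix writes into every pixel of that aperture. This is what turns the MakeProfiles output into a labeled image.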

The first thing you might notice in the printed information is that the profiles are not built in order. This is because MakeProfiles works in parallel, and parallel CPU operations are asynchronous. You can try running MakeProfiles with one thread (using --numthreads=1) to see how order is respected in that case, although it will be slower (note that the multi-threaded run will be much faster when more mathematically-complicated profiles are built, like Sérsic profiles).

Open apertures.fits with a FITS viewer and look around at the circles placed over the targets. Also open the input image and Segment’s clumps image and compare them with the positions of these circles. Where the apertures overlap, you will notice that one label has replaced the other (because of the --replace option). In the future, MakeCatalog will be able to work with overlapping labels, but currently it doesn’t. If you are interested, please join us in completing Gnuastro with added improvements like this (see task 14750 [37]).

We can now feed the apertures.fits labeled image into MakeCatalog instead of Segment’s output as shown below. In comparison with the previous MakeCatalog call, you will notice that there is no more --clumpscat option, since there is no more separate “clump” image now, each aperture is treated as a separate “object”.

$ astmkcatalog apertures.fits -h1 --zeropoint=26.27 \
    --valuesfile=nc/xdf-f105w.fits \
    --ids --ra --dec --magnitude --sn \
    --output=cat/xdf-f105w-aper.fits

This catalog has the same number of rows as the catalog produced from clumps in Working with catalogs (estimating colors). Therefore, similar to how we found colors, you can compare the aperture and clump magnitudes, for example. You can also change the filter name and zeropoint magnitudes and run this command again to have the fixed aperture magnitude in the F160W filter and measure colors on apertures.

#### 2.2.16 Matching catalogs

In the example above, we had the luxury to generate the catalogs ourselves, and were thus able to generate them in a way that the rows match. But this isn’t generally the case. In many situations, you need to use catalogs from many different telescopes, or catalogs with high-level calculations that you can’t simply regenerate with the same pixels without spending a lot of time or using heavy computation. In such cases, when each catalog has the coordinates of its own objects, you can use the coordinates to match the rows with Gnuastro’s Match program (see Match).

As the name suggests, Gnuastro’s Match program will match rows based on distance (or aperture in 2D) in one, two, or three columns. For this tutorial, let’s try matching the two catalogs that weren’t created from the same labeled images; recall how each has a different number of rows:

$ asttable cat/xdf-f105w.fits -hCLUMPS -i
$ asttable cat/xdf-f160w.fits -hCLUMPS -i

You give Match two catalogs (from the two different filters we derived above) as arguments, and the HDUs containing them (if they are FITS files) with the --hdu and --hdu2 options. The --ccol1 and --ccol2 options specify the coordinate-columns which should be matched with which in the two catalogs. With --aperture you specify the acceptable error (radius in 2D), in the same units as the columns.

$ astmatch cat/xdf-f160w.fits           cat/xdf-f105w.fits         \
    --hdu=CLUMPS                 --hdu2=CLUMPS              \
    --ccol1=RA,DEC               --ccol2=RA,DEC             \
    --aperture=0.5/3600 --log                               \
    --output=matched.fits
$ astfits matched.fits

From the second command, you see that the output has two extensions and that both have the same number of rows. The rows in each extension correspond with the rows in the other. You can also see which objects didn’t match with the --notmatched option, like below. Note how each extension now has a different number of rows.

$ astmatch cat/xdf-f160w.fits           cat/xdf-f105w.fits         \
    --hdu=CLUMPS                 --hdu2=CLUMPS              \
    --ccol1=RA,DEC               --ccol2=RA,DEC             \
    --aperture=0.5/3600 --log                               \
    --output=matched.fits        --notmatched
$ astfits matched.fits

The --outcols option of Match is a very convenient feature: you can use it to specify which columns from the two catalogs you want in the output (merge two input catalogs into one). If the first character is an ‘a’, the respective matched column (number or name, similar to Table above) in the first catalog will be written in the output table. When the first character is a ‘b’, the respective column from the second catalog will be written in the output. Also, if the first character is followed by _all, then all the columns from the respective catalog will be put in the output.

$ astmatch cat/xdf-f160w.fits           cat/xdf-f105w.fits         \
    --hdu=CLUMPS                 --hdu2=CLUMPS              \
    --ccol1=RA,DEC               --ccol2=RA,DEC             \
    --aperture=0.35/3600 --log                              \
    --outcols=a_all,bMAGNITUDE,bSN                          \
    --output=matched.fits
$ astfits matched.fits

#### 2.2.17 Finding reddest clumps and visual inspection

As a final step, let’s go back to the original clumps-based color measurement we generated in Working with catalogs (estimating colors). We’ll find the objects with the strongest color and make a cutout to inspect them visually and finally, we’ll see how they are located on the image. With the command below, we’ll select the reddest objects (those with a color larger than 1.5):

$ asttable cat/mags-with-color.fits --range=F105W-F160W,1.5,inf


You can see how many they are by piping it to wc -l:

$ asttable cat/mags-with-color.fits --range=F105W-F160W,1.5,inf | wc -l

Let’s crop the F160W image around each of these objects, but we first need a unique identifier for them. We’ll define this identifier using the object and clump labels (with an underscore between them) and feed the output of the command above to AWK to generate a catalog. Note that since we are making a plain text table, we’ll define the necessary (for the string-type first column) metadata manually (see Gnuastro text table format).

$ echo "# Column 1: ID [name, str10] Object ID" > reddest.txt
$ asttable cat/mags-with-color.fits --range=F105W-F160W,1.5,inf \
    | awk '{printf("%d_%-10d %f %f\n", $1, $2, $3, $4)}' \
    >> reddest.txt

We can now feed reddest.txt into Gnuastro’s Crop program to see what these objects look like. To keep things clean, we’ll make a directory called crop-red and ask Crop to save the crops in this directory. We’ll also add a -f160w.fits suffix to the crops (to remind us which filter they came from). The width of the crops will be 15 arc-seconds (or 15/3600 degrees, which is the units of the WCS).

$ mkdir crop-red
$ astcrop flat-ir/xdf-f160w.fits --mode=wcs --namecol=ID \
    --catalog=reddest.txt --width=15/3600,15/3600 \
    --suffix=-f160w.fits --output=crop-red

You can see all the cropped FITS files in the crop-red directory. Like the MakeProfiles command in Aperture photometry, you might notice that the crops aren’t made in order. This is because each crop is independent of the rest, therefore crops are done in parallel, and parallel operations are asynchronous. In the command above, you can change f160w to f105w to make the crops in both filters.

To view the crops more easily (not having to open ds9 for each image), you can convert the FITS crops into the JPEG format with a shell loop like below.

$ cd crop-red
$ for f in *.fits; do                                                  \
    astconvertt $f --fluxlow=-0.001 --fluxhigh=0.005 --invert -ojpg;  \
  done
$ cd ..
$ ls crop-red/


You can now use your general graphic user interface image viewer to flip through the images more easily, or import them into your papers/reports.

The for loop above to convert the images will do the job in series: each file is converted only after the previous one is complete. If you have GNU Parallel, you can greatly speed up this conversion. GNU Parallel will run the separate commands simultaneously on different CPU threads in parallel. For more information on efficiently using your threads, see Multi-threaded operations. Here is a replacement for the shell for loop above using GNU Parallel.

$ cd crop-red
$ parallel astconvertt --fluxlow=-0.001 --fluxhigh=0.005 --invert   \
    -ojpg ::: *.fits
$ cd ..

Did you notice how much faster this one was? When possible, it’s always very helpful to do your analysis in parallel. But the problem is that many operations are not as simple as this. For such cases, you can use Make, which will greatly help in designing workflows. But that is beyond the topic here.

As the final action, let’s see how these objects are positioned over the dataset. DS9 has the “Region”s concept for this purpose. You just have to convert your catalog into a “region file” to feed into DS9. To do that, you can use AWK again as shown below.

$ awk 'BEGIN{print "# Region file format: DS9 version 4.1";      \
             print "global color=green width=2";                 \
             print "fk5";}                                       \
       !/^#/{printf "circle(%s,%s,1\") # text={%s}\n",$2,$3,$1;}'\
      reddest.txt > reddest.reg

This region file can be loaded into DS9 with its -regions option to display over any image (that has a world coordinate system). In the example below, we’ll open Segment’s output and load the regions over all the extensions (to see the image and the respective clump):

$ ds9 -mecube seg/xdf-f160w.fits -zscale -zoom to fit    \
      -regions load all reddest.reg

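If you would like to check the catalog-to-region conversion before running it on real data, the sketch below builds a two-row stand-in for reddest.txt and converts it with the same AWK logic. The two objects and their coordinates are made up, and everything is written to a scratch directory created with mktemp (an assumption of this sketch, so that no file of the tutorial is overwritten).

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the project is overwritten.
tmpdir=$(mktemp -d)

# A made-up reddest.txt: the metadata line plus two rows (ID, RA, Dec),
# using the same printf format as the tutorial.
{
    echo "# Column 1: ID [name, str10] Object ID"
    printf '%d_%-10d %f %f\n' 12 3 53.16 -27.78
    printf '%d_%-10d %f %f\n' 40 1 53.15 -27.79
} > "$tmpdir/reddest.txt"

# Same conversion as in the tutorial, written over several lines:
# three header lines, then one 1-arcsecond circle per catalog row.
awk 'BEGIN { print "# Region file format: DS9 version 4.1"
             print "global color=green width=2"
             print "fk5" }
     !/^#/ { printf "circle(%s,%s,1\") # text={%s}\n", $2, $3, $1 }' \
    "$tmpdir/reddest.txt" > "$tmpdir/reddest.reg"

cat "$tmpdir/reddest.reg"
```

The metadata line of the catalog starts with a ‘#’, so the !/^#/ condition keeps it out of the region file, and the ID column ends up as the text label of each circle.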

#### 2.2.18 Writing scripts to automate the steps

In the previous sub-sections, we went through a series of steps like downloading the necessary datasets (in Setup and data download), detecting the objects in the image, and finally selecting a particular subset of them to inspect visually (in Finding reddest clumps and visual inspection). To benefit most effectively from this subsection, please go through the previous sub-sections, and if you haven’t actually run their commands, we recommend running them before continuing here.

Each of the sub-sections above involved several commands on the command-line. Therefore, if you want to reproduce the previous results (for example to only change one part, and see its effect), you’ll have to go through all the sections above and read through them again. If you ran the commands recently, you may also have them in the history of your shell (command-line environment). You can see many of your previous commands on the shell (even if you have closed the terminal) with the history command, like this:

$ history

Try it in your terminal to see for yourself. By default in GNU Bash, it shows the last 500 commands. You can also save this “history” of previous commands to a file using shell redirection (to have it after your next 500 commands), with this command:

$ history > my-previous-commands.txt


This is a good way to temporarily keep track of every single command you ran. But in the middle of all the useful commands, you will have many extra commands, like tests that you did before/after the good output of a step (that you decided to continue working on), or an unrelated job you had to do in the middle of this project. Because of these impurities, after a few days (when you have forgotten the context: tests you didn’t end up using, or unrelated jobs) reading this full history will be very frustrating.

Keeping the final commands that were used in each step of an analysis is a common problem for anyone who is doing something serious with the computer. But simply keeping the most important commands in a text file is not enough: the small steps in the middle (like making a directory to keep the outputs of one step) are also important. In other words, the only way you can be sure that you are in control of your processing (and actually understand how you produced your final result) is to run the commands automatically.

Fortunately, typing commands interactively with your fingers isn’t the only way to operate the shell. The shell can also take its orders/commands from a plain-text file, which is called a script. When given a script, the shell will read it line-by-line as if you have actually typed it manually.

Let’s continue with an example: try typing the commands below in your shell. With these commands we are making a text file (a.txt) containing a simple 3×3 matrix, converting it to a FITS image and computing its basic statistics. After the first three commands, open a.txt with a text editor to actually see the values we wrote in it, and after the fourth, open the FITS file to see the matrix as an image. a.txt is created through the shell’s redirection feature: ‘>’ overwrites the existing contents of a file, and ‘>>’ appends the new contents after the old contents.

$ echo "1 1 1" > a.txt
$ echo "1 2 1" >> a.txt
$ echo "1 1 1" >> a.txt
$ astconvertt a.txt --output=a.fits
$ aststatistics a.fits

To automate this series of commands, you should put them in a text file. But that text file must have two special features: 1) It should tell the shell what program should interpret the script. 2) The operating system should know that the file can be directly executed.

For the first, Unix-like operating systems define the shebang concept (also known as sha-bang or hashbang). In the shebang convention, the first two characters of a file should be ‘#!’. When confronted with these characters, the script will be interpreted with the program that follows them. In this case, we want to write a shell script and the most common shell program is GNU Bash, which is installed in /bin/bash. So the first line of your script should be ‘#!/bin/bash’ [38].

Using your favorite text editor, make a new empty file, let’s call it my-first-script.sh. Write the GNU Bash shebang (above) as its first line. After the shebang, copy the series of commands we ran above. Just note that the ‘$’ sign at the start of every line above is the prompt of the interactive shell (you never actually typed it, remember?). Therefore, commands in a shell script should not start with a ‘$’. Once you add the commands, close the text editor and run the cat command to confirm its contents. It should look like the example below. Recall that you should only type the line that starts with a ‘$’; the lines without a ‘$’ are printed automatically on the command-line (they are the contents of your script).

$ cat my-first-script.sh
#!/bin/bash
echo "1 1 1" > a.txt
echo "1 2 1" >> a.txt
echo "1 1 1" >> a.txt
astconvertt a.txt --output=a.fits
aststatistics a.fits

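The ‘>’ and ‘>>’ redirections used in the script above can be checked in isolation with the sketch below, which needs nothing but a POSIX shell. The scratch directory (from mktemp) and the variable names are assumptions of this sketch; the a.txt it writes is a separate file from the one in the tutorial.

```shell
#!/bin/sh
# Keep the test file in a scratch directory.
tmpdir=$(mktemp -d)
f="$tmpdir/a.txt"

echo "1 1 1" >  "$f"    # '>' creates the file (or empties an existing one).
echo "1 2 1" >> "$f"    # '>>' appends a second line.
echo "1 1 1" >> "$f"    # ... and a third.
lines_after_append=$(($(wc -l < "$f")))

echo "9 9 9" >  "$f"    # '>' again: the three old lines are replaced.
lines_after_overwrite=$(($(wc -l < "$f")))

echo "after appending:   $lines_after_append line(s)"
echo "after overwriting: $lines_after_overwrite line(s)"
```

This is also why the first echo of the script uses ‘>’: every run of the script starts a.txt from scratch instead of growing it by three lines each time.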

The script contents are now ready, but to run it, you should activate the script file’s executable flag. In Unix-like operating systems, every file has three types of flags: read (or r), write (or w) and execute (or x). To toggle a file’s flags, you should use the chmod (for “change mode”) command. To activate a flag, you put a ‘+’ before the flag character (for example +x). To deactivate it, you put a ‘-’ (for example -x). In this case, you want to activate the script’s executable flag, so you should run

$ chmod +x my-first-script.sh

Your script is now ready to run/execute the series of commands. To run it, you should call it while specifying its location in the file system. Since you are currently in the same directory as the script, it’s easiest to use relative addressing like below (where ‘./’ means the current directory). But before running your script, first delete the two a.txt and a.fits files that were created when you interactively ran the commands.

$ rm a.txt a.fits
$ ls
$ ./my-first-script.sh
$ ls

The script immediately prints the statistics while doing all the previous steps in the background. With the last ls, you see that it automatically re-built the a.txt and a.fits files; open them and have a look at their contents.

An extremely useful feature of shell scripts is that the shell will ignore anything after a ‘#’ character. You can thus add descriptions/comments to the commands and make them much more useful for the future. For example, after adding comments, your script might look like this:

$ cat my-first-script.sh
#!/bin/bash

# This script is my first attempt at learning to write shell scripts.
# As a simple series of commands, I am just building a small FITS
# image, and calculating its basic statistics.

# Write the matrix into a file.
echo "1 1 1" > a.txt
echo "1 2 1" >> a.txt
echo "1 1 1" >> a.txt

# Convert the matrix to a FITS image.
astconvertt a.txt --output=a.fits

# Calculate the statistics of the FITS image.
aststatistics a.fits


Isn’t this much easier to read now? Comments help to provide human-friendly context to the raw commands. At the time you make a script, comments may seem like an extra effort and slow you down. But in one year, you will forget almost everything about your script and you will appreciate the effort so much! Think of the comments as an email to your future-self and always put a well-written description of the context/purpose (most importantly, things that aren’t directly clear from reading the commands) in your scripts.

The example above was a very basic and mostly redundant series of commands, to show the basic concepts behind scripts. You can put any (arbitrarily long and complex) series of commands in a script by following the two rules: 1) add a shebang, and 2) enable the executable flag. In fact, as you continue your own research projects, you will find that any time you are dealing with more than two or three commands, keeping them in a script (and modifying that script, and running it) is much easier, and more future-proof, than typing the commands directly on the command-line and relying on things like history.
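As a compact recap of the two rules, the sketch below builds a throw-away script in a temporary directory, enables its executable flag and runs it. The directory and the name hello.sh are made up for this demonstration only, and the sketch assumes the temporary directory allows executing files.

```shell
#!/bin/bash
# Throw-away location and script name, only for this demonstration.
tmpdir=$(mktemp -d)
script=$tmpdir/hello.sh

# Rule 1: the first line is the shebang. Lines starting with '#' after
# it are comments that the shell ignores.
printf '#!/bin/bash\n# This comment is ignored.\necho "hello from a script"\n' \
       > "$script"

# Rule 2: enable the executable flag, then call the script by its path.
chmod +x "$script"
output=$("$script")
echo "$output"

# Clean up.
rm -r "$tmpdir"
```

Only the echo line produces output: the shebang and the comment are not printed.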

As a more realistic example, let’s have a look at a script that will do the steps of Setup and data download and Dataset inspection and cropping. In particular note how often we are using variables to avoid repeating fixed strings of characters (usually file/directory names). This greatly helps in scaling up your project, and avoiding hard-to-find bugs that are caused by typos in those fixed strings.

$ cat gnuastro-tutorial-1.sh
#!/bin/bash

# Download the input datasets
# ---------------------------
#
# The default file names have this format (where 'FILTER' differs for
# each filter):
#   hlsp_xdf_hst_wfc3ir-60mas_hudf_FILTER_v1_sci.fits
# To make the script easier to read, a prefix and suffix variable are
# used to sandwich the filter name into one short line.
downloaddir=download
xdfsuffix=_v1_sci.fits
xdfprefix=hlsp_xdf_hst_wfc3ir-60mas_hudf_
xdfurl=http://archive.stsci.edu/pub/hlsp/xdf

# The file name and full URLs of the input data.
f105w_in=$xdfprefix"f105w"$xdfsuffix
f160w_in=$xdfprefix"f160w"$xdfsuffix
f105w_full=$xdfurl/$f105w_in
f160w_full=$xdfurl/$f160w_in

# Go into the download directory and download the images there,
# then come back up to the top running directory.
mkdir $downloaddir
cd $downloaddir
wget $f105w_full
wget $f160w_full
cd ..

# Only work on the deep region
# ----------------------------
#
# To help in readability, each vertice of the deep/flat field is stored
# as a separate variable. They are then merged into one variable to
# define the polygon.
flatdir=flat-ir
vertice1="53.187414,-27.779152"
vertice2="53.159507,-27.759633"
vertice3="53.134517,-27.787144"
vertice4="53.161906,-27.807208"
f105w_flat=$flatdir/xdf-f105w.fits
f160w_flat=$flatdir/xdf-f160w.fits
deep_polygon="$vertice1:$vertice2:$vertice3:$vertice4"

mkdir $flatdir
astcrop --mode=wcs -h0 --output=$f105w_flat \
        --polygon=$deep_polygon $downloaddir/$f105w_in
astcrop --mode=wcs -h0 --output=$f160w_flat \
        --polygon=$deep_polygon $downloaddir/$f160w_in


The first thing you may notice is that even if you already have the downloaded input images, this script will always try to re-download them. Also, if you re-run the script, you will notice that mkdir prints an error message that the download directory already exists. Therefore, the script above isn’t too useful and some modifications are necessary to make it more generally useful. Here are some general tips that are often very useful when writing scripts:

Stop script if a command crashes

By default, if a command in a script crashes (aborts and fails to do what it was meant to do), the script will continue onto the next command. In GNU Bash, you can tell the shell to stop a script in the case of a crash by adding this line at the start of your script:

set -e
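As a rough illustration of what this line changes, the sketch below writes a throw-away two-command script, once without and once with set -e, and compares what each version prints. The name demo.sh and the temporary directory are only for this example, and the standard false command (which always fails) stands in for a crashed program.

```shell
#!/bin/bash
# Work in a temporary directory so nothing in your project is touched.
tmpdir=$(mktemp -d)

# Without 'set -e': the failing command is ignored and 'echo' still runs.
printf '#!/bin/bash\nfalse\necho "reached the end"\n' > "$tmpdir/demo.sh"
without=$(bash "$tmpdir/demo.sh")

# With 'set -e': the script stops at the failing command, so nothing
# is printed ('|| true' only keeps this demonstration itself going).
printf '#!/bin/bash\nset -e\nfalse\necho "reached the end"\n' > "$tmpdir/demo.sh"
with=$(bash "$tmpdir/demo.sh" || true)

echo "without set -e: '$without'"
echo "with set -e:    '$with'"
rm -r "$tmpdir"
```

The second line of output shows an empty string: with set -e the script never reached its echo command.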

Check if a file/directory exists to avoid re-creating it

Conditionals are a very useful feature in scripts. One common conditional is to check if a file exists or not. Assuming the file’s name is FILENAME, you can check its existence (to avoid re-doing the commands that build it) like this:

if [ -f FILENAME ]; then
    echo "FILENAME exists"
else
    # Some commands to generate the file
    echo "done" > FILENAME
fi


To check the existence of a directory instead of a file, use -d instead of -f. To negate a conditional, use ‘!’ and note that conditionals can also be written in one line (useful when they are short).

One common scenario where you’ll need to check the existence of directories is when you are making them: the default mkdir command will crash if the desired directory already exists. On some systems (including GNU/Linux distributions), mkdir has options to deal with such cases. But if you want your script to be portable, it’s best to check yourself like below:

if ! [ -d DIRNAME ]; then mkdir DIRNAME; fi
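To see these checks at work, here is a minimal sketch that runs the same "build" steps twice in a temporary directory. All the names (topdir, dirname, filename, created) are hypothetical and only serve this demonstration; the second pass should find everything in place and do nothing.

```shell
#!/bin/bash
# Hypothetical names, only for this demonstration.
topdir=$(mktemp -d)
dirname=$topdir/build
filename=$dirname/result.txt
created=0

# Run the same steps twice: the second pass should create nothing.
for pass in 1 2; do
    if ! [ -d "$dirname" ]; then
        mkdir "$dirname"
        created=$((created+1))
        echo "pass $pass: created directory"
    fi
    if ! [ -f "$filename" ]; then
        echo "done" > "$filename"
        created=$((created+1))
        echo "pass $pass: created file"
    fi
done

echo "things created in total: $created"
rm -r "$topdir"
```

Putting the variables in double quotes (as in "$dirname") is a defensive habit: it keeps the test working even when a name contains spaces.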


Taking these tips into consideration, we can write a better version of the script above that includes checks on every step to avoid repeating steps/commands. Please compare this script with the previous one carefully to spot the differences. These are very important points that you will definitely encounter during your own research, and knowing them can greatly help your productivity, so pay close attention (even in the comments).

$ cat gnuastro-tutorial-2.sh
#!/bin/bash
set -e

# Download the input datasets
# ---------------------------
#
# The default file names have this format (where 'FILTER' differs for
# each filter):
#   hlsp_xdf_hst_wfc3ir-60mas_hudf_FILTER_v1_sci.fits
# To make the script easier to read, a prefix and suffix variable are
# used to sandwich the filter name into one short line.
downloaddir=download
xdfsuffix=_v1_sci.fits
xdfprefix=hlsp_xdf_hst_wfc3ir-60mas_hudf_
xdfurl=http://archive.stsci.edu/pub/hlsp/xdf

# The file name and full URLs of the input data.
f105w_in=$xdfprefix"f105w"$xdfsuffix
f160w_in=$xdfprefix"f160w"$xdfsuffix
f105w_full=$xdfurl/$f105w_in
f160w_full=$xdfurl/$f160w_in

# Go into the download directory and download the images there,
# then come back up to the top running directory.
if ! [ -d $downloaddir ]; then mkdir $downloaddir; fi
cd $downloaddir
if ! [ -f $f105w_in ]; then wget $f105w_full; fi
if ! [ -f $f160w_in ]; then wget $f160w_full; fi
cd ..

# Only work on the deep region
# ----------------------------
#
# To help in readability, each vertice of the deep/flat field is stored
# as a separate variable. They are then merged into one variable to
# define the polygon.
flatdir=flat-ir
vertice1="53.187414,-27.779152"
vertice2="53.159507,-27.759633"
vertice3="53.134517,-27.787144"
vertice4="53.161906,-27.807208"
f105w_flat=$flatdir/xdf-f105w.fits
f160w_flat=$flatdir/xdf-f160w.fits
deep_polygon="$vertice1:$vertice2:$vertice3:$vertice4"

if ! [ -d $flatdir ]; then mkdir $flatdir; fi
if ! [ -f $f105w_flat ]; then
    astcrop --mode=wcs -h0 --output=$f105w_flat \
            --polygon=$deep_polygon $downloaddir/$f105w_in
fi
if ! [ -f $f160w_flat ]; then
    astcrop --mode=wcs -h0 --output=$f160w_flat \
            --polygon=$deep_polygon $downloaddir/$f160w_in
fi
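The script above repeats every step once per filter. One possible next step (a sketch, not part of the tutorial's scripts) is to let a loop build the file names, so adding a third filter only means adding one word. The helper name xdf_url and the loop variable filter are made up here, and nothing is downloaded: the commands that would run are only printed.

```shell
#!/bin/bash
# Same naming variables as the script above.
downloaddir=download
xdfsuffix=_v1_sci.fits
xdfprefix=hlsp_xdf_hst_wfc3ir-60mas_hudf_
xdfurl=http://archive.stsci.edu/pub/hlsp/xdf

# Hypothetical helper: print the full URL for the filter given as the
# first argument.
xdf_url() {
    echo "$xdfurl/$xdfprefix$1$xdfsuffix"
}

# Loop over the filters. The download command is only printed here;
# remove the 'echo' to actually run it.
for filter in f105w f160w; do
    infile=$xdfprefix$filter$xdfsuffix
    if ! [ -f "$downloaddir/$infile" ]; then
        echo wget "$(xdf_url $filter)"
    fi
done
```

Printing a command with echo before running it for real is a cheap way to check that the variables expand to what you expect.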


#### 2.2.19 Citing and acknowledging Gnuastro

In conclusion, we hope this extended tutorial has been a good starting point to help in your exciting research. If this book or any of the programs in Gnuastro have been useful for your research, please cite the respective papers, and acknowledge the funding agencies that made all of this possible. Without citations, we won’t be able to secure future funding to continue working on Gnuastro or improving it, so please take software citation seriously (for all the scientific software you use, not just Gnuastro).

To help you in this aspect as well, all Gnuastro programs have a --cite option to facilitate the citation and acknowledgment. Just note that it may be necessary to cite additional papers for different programs, so please try it out on all the programs that you used, for example:

$ astmkcatalog --cite
$ astnoisechisel --cite



### 2.3 Detecting large extended targets

The outer wings of large and extended objects can sink into the noise very gradually and can have a large variety of shapes (for example due to tidal interactions). Therefore separating the outer boundaries of the galaxies from the noise can be particularly tricky. Besides causing an under-estimation in the total estimated brightness of the target, failure to detect such faint wings will also cause a bias in the noise measurements, thereby hampering the accuracy of any measurement on the dataset. Therefore even if they don’t constitute a significant fraction of the target’s light, or aren’t your primary target, these regions must not be ignored. In this tutorial, we’ll walk you through the strategy of detecting such targets using NoiseChisel.

 Don’t start with this tutorial: If you haven’t already completed General program usage tutorial, we strongly recommend going through that tutorial before starting this one. Basic features like access to this book on the command-line, the configuration files of Gnuastro’s programs, benefiting from the modular nature of the programs, viewing multi-extension FITS files, or using NoiseChisel’s outputs are discussed in more detail there.

We’ll try to detect the faint tidal wings of the beautiful M51 group39 in this tutorial. We’ll use a dataset/image from the public Sloan Digital Sky Survey, or SDSS. Due to its more peculiar low surface brightness structure/features, we’ll focus on the dwarf companion galaxy of the group (or NGC 5195). To get the image, you can use SDSS’s Simple field search tool. As long as it is covered by the SDSS, you can find an image containing your desired target either by providing a standard name (if it has one), or its coordinates. To access the dataset we will use here, write NGC5195 in the “Object Name” field and press the “Submit” button.

 Type the example commands: Try to type the example commands on your terminal and use the history feature of your command-line (by pressing the “up” button to retrieve previous commands). Don’t simply copy and paste the commands shown here. This will help simulate future situations when you are processing your own datasets.

You can see the list of available filters under the color image. For this demonstration, we’ll use the r-band filter image. By clicking on the “r-band FITS” link, you can download the image. Alternatively, you can just run the following command to download it with GNU Wget40. To keep things clean, let’s also put it in a directory called ngc5195. With the -O option, we are asking Wget to save the downloaded file with a more manageable name: r.fits.bz2 (this is an r-band image of NGC 5195, which was the directory name).

$ mkdir ngc5195
$ cd ngc5195
$ topurl=https://dr12.sdss.org/sas/dr12/boss/photoObj/frames
$ wget $topurl/301/3716/6/frame-r-003716-6-0117.fits.bz2 -Or.fits.bz2

This server keeps the files in a Bzip2 compressed file format. So we’ll first decompress it with the following command. By convention, compression programs delete the original file (compressed when uncompressing, or uncompressed when compressing). To keep the original file, you can use the --keep or -k option which is available in most compression programs for this job. Here, we don’t need the compressed file any more, so we’ll just let bunzip2 delete it for us and keep the directory clean.

$ bunzip2 r.fits.bz2


#### 2.3.1 NoiseChisel optimization

In Detecting large extended targets we downloaded the single exposure SDSS image. Let’s see how NoiseChisel operates on it with its default parameters:

$ astnoisechisel r.fits -h0

As described in NoiseChisel and Multiextension FITS files, NoiseChisel’s default output is a multi-extension FITS file. Open the output r_detected.fits file and have a look at the extensions. The first extension is only meta-data and contains NoiseChisel’s configuration parameters. The rest are the Sky-subtracted input, the detection map, Sky values and Sky standard deviation.

$ ds9 -mecube r_detected.fits -zscale -zoom to fit


Flipping through the extensions in a FITS viewer, you will see that the first image (Sky-subtracted image) looks reasonable: there are no major artifacts due to bad Sky subtraction compared to the input. The second extension also seems reasonable with a large detection map that covers the whole of NGC5195, but also extends beyond towards the bottom of the image.

Now try flipping between the DETECTIONS and SKY extensions. In the SKY extension, you’ll notice that there is still significant signal beyond the detected pixels. You can tell that this signal belongs to the galaxy because the far-right side of the image is dark and the brighter tiles are surrounding the detected pixels.

The fact that signal from the galaxy remains in the Sky dataset shows that you haven’t done a good detection. The SKY extension must not contain any light around the galaxy. Generally, any time your target is much larger than the tile size and the signal is almost flat (like this case), this will happen. Therefore, when there are large objects in the dataset, the best place to check the accuracy of your detection is the estimated Sky image.

When dominated by the background, noise has a symmetric distribution. However, signal is not symmetric (we don’t have negative signal). Therefore when non-constant signal is present in a noisy dataset, the distribution will be positively skewed. This skewness is a good measure of how much signal we have in the distribution. The skewness can be accurately measured by the difference in the mean and median: assuming no strong outliers, the more distant they are, the more skewed the dataset is. For more see Quantifying signal in a tile.

However, skewness is only a proxy for signal when the signal has structure (varies per pixel). Therefore, when it is approximately constant over a whole tile, or sub-set of the image, the signal’s effect is just to shift the symmetric center of the noise distribution to the positive and there won’t be any skewness (major difference between the mean and median). This positive41 shift that preserves the symmetric distribution is the Sky value. When there is a gradient over the dataset, different tiles will have different constant shifts/Sky-values, for example see Figure 11 of Akhlaghi and Ichikawa [2015].
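The difference between these two cases can be tried on a handful of numbers. The sketch below is only a toy: mean_median is a helper written for this example (it is not a Gnuastro program) and the three lists of "pixel values" are invented to show the behavior, not taken from the image.

```shell
#!/bin/bash
# Print "mean median" of the numbers given as arguments (toy helper,
# not part of Gnuastro).
mean_median() {
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR]=$1; sum+=$1 }
        END { med = (NR%2) ? v[(NR+1)/2] : (v[NR/2]+v[NR/2+1])/2
              printf "%.2f %.2f\n", sum/NR, med }'
}

# Symmetric "noise" around zero: mean and median agree.
mean_median -2 -1 0 1 2

# The same noise on top of a constant signal of 5: both shift by 5,
# so there is still no difference between them (no skewness).
mean_median 3 4 5 6 7

# One pixel with strong signal: the mean is pulled up, the median
# stays put, and the difference reveals the skewness.
mean_median -2 -1 0 1 12
```

In the second call the two values are equal although signal is present everywhere, which is the situation that the constant Sky value describes.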

To get less scatter in measuring the mean and median (and thus better estimate the skewness), you will need a larger tile. So let’s play with the tessellation a little to see how it affects the result. In Gnuastro, you can see the option values (--tilesize in this case) by adding the -P option to your last command. Try running NoiseChisel with -P to see its default tile size.

You can clearly see that the default tile size is indeed much smaller than this (huge) galaxy and its tidal features. As a result, NoiseChisel was unable to identify the skewness within the tiles under the outer parts of M51 and NGC 5195 and the threshold has been over-estimated on those tiles. To see which tiles were used for estimating the quantile threshold (no skewness was measured), you can use NoiseChisel’s --checkqthresh option:

$ astnoisechisel r.fits -h0 --checkqthresh

Notice how this option doesn’t allow NoiseChisel to finish. NoiseChisel aborted after finding and applying the quantile thresholds. When you call any of NoiseChisel’s --check* options, by default, it will abort as soon as all the check steps have been written in the check file (a multi-extension FITS file). This allows you to focus on the problem you wanted to check as soon as possible (you can disable this feature with the --continueaftercheck option).

To optimize the threshold-related settings for this image, let’s play with this quantile threshold check image a little. Don’t forget that “Good statistical analysis is not a purely routine matter, and generally calls for more than one pass through the computer” (Anscombe 1973, see Science and its tools). A good scientist must have a good understanding of her tools to make a meaningful analysis. So don’t hesitate in playing with the default configuration and reviewing the manual when you have a new dataset in front of you. Robust data analysis is an art, therefore a good scientist must first be a good artist.

The first extension of r_qthresh.fits (CONVOLVED) is the convolved input image where the threshold(s) is(are) defined and applied. For more on the effect of convolution and thresholding, see Sections 3.1.1 and 3.1.2 of Akhlaghi and Ichikawa [2015]. The second extension (QTHRESH_ERODE) has a blank value for all the pixels of any tile that was identified as having significant signal. The next two extensions (QTHRESH_NOERODE and QTHRESH_EXPAND) are the other two quantile thresholds that are necessary in NoiseChisel’s later steps. Every step in this file is repeated on the three thresholds.

Play a little with the color bar of the QTHRESH_ERODE extension: you clearly see how the non-blank tiles around NGC 5195 have a gradient. As one line of attack against discarding too much signal below the threshold, NoiseChisel rejects outlier tiles. Go forward by three extensions to VALUE1_NO_OUTLIER and you will see that many of the tiles over the galaxy have been removed in this step. For more on the outlier rejection algorithm, see the latter half of Quantifying signal in a tile.

However, the default outlier rejection parameters weren’t enough, and when you play with the color-bar, you still see a strong gradient around the outer tidal feature of the galaxy. You have two strategies for fixing this problem: 1) Increase the tile size to get more accurate measurements of skewness. 2) Strengthen the outlier rejection parameters to discard more of the tiles with signal. Fortunately in this image we have a sufficiently large region on the right of the image that the galaxy doesn’t extend to. So we can use the more robust first solution. In situations where this doesn’t happen (for example if the field of view in this image was shifted to have more of M51 and less sky) you are limited to a combination of the two solutions or just to the second solution.

 Skipping convolution for faster tests: The slowest step of NoiseChisel is the convolution of the input dataset. Therefore when your dataset is large (unlike the one in this test), and you are not changing the input dataset or kernel in multiple runs (as in the tests of this tutorial), it is faster to do the convolution separately once (using Convolve) and use NoiseChisel’s --convolved option to directly feed the convolved image and avoid convolution. For more on --convolved, see NoiseChisel input.

To identify the skewness caused by the flat NGC 5195 and M51 tidal features on the tiles under it, we thus have to choose a tile size that is larger than the gradient of the signal. Let’s try a tile size of 75 by 75 pixels:

$ astnoisechisel r.fits -h0 --tilesize=75,75 --checkqthresh


You can clearly see the effect of this increased tile size: the tiles are much larger and when you look into VALUE1_NO_OUTLIER, you see that almost all the previous tiles under the galaxy have been discarded and we only have a few tiles on the edge with a gradient. So let’s define a more strict condition to keep tiles:

$ astnoisechisel r.fits -h0 --tilesize=75,75 --meanmedqdiff=0.001 \
                 --checkqthresh

After constraining --meanmedqdiff, NoiseChisel stopped with a different error. Please read it: at the start, it says that only 6 tiles passed the constraint while you have asked for 9. The r_qthresh.fits image also only has 8 extensions (not the original 15). Take a look at the initially selected tiles and those after outlier rejection. You can see the place of the tiles that passed. They seem to be in a good place (very far away from the M51 group and its tidal feature). Using the 6 nearest neighbors is also not too bad. So let’s decrease the number of neighboring tiles for interpolation so NoiseChisel can continue:

$ astnoisechisel r.fits -h0 --tilesize=75,75 --meanmedqdiff=0.001 \
                 --interpnumngb=6 --checkqthresh


The next group of extensions (those ending with _INTERP), give a value to all blank tiles based on the nearest tiles with a measurement. The following group of extensions (ending with _SMOOTH) have smoothed the interpolated image to avoid sharp cuts on tile edges. Inspecting THRESH1_SMOOTH, you can see that there is no longer any significant gradient and no major signature of NGC 5195 exists.

We can now remove --checkqthresh and let NoiseChisel proceed with its detection. Also, similar to the argument in NoiseChisel optimization for detection, in the command below, we set the pseudo-detection signal-to-noise ratio quantile (--snquant) to 0.95.

$ rm r_qthresh.fits
$ astnoisechisel r.fits -h0 --tilesize=75,75 --meanmedqdiff=0.001 \
                 --interpnumngb=6 --snquant=0.95


Looking at the DETECTIONS extension of NoiseChisel’s output, we see the right-ward edges in particular have many holes that are fully surrounded by signal and the signal stretches out in the noise very thinly (the size of the holes increases as we go out). This suggests that there is still signal that can be detected. You can confirm this guess by looking at the SKY extension to see that indeed, there is a clear footprint of the M51 group in the Sky image (which is not good!). Therefore, we should dig deeper into the noise.

With the --detgrowquant option, NoiseChisel will use the detections as seeds and grow them into the noise. Its value is the ultimate limit of the growth in units of quantile (between 0 and 1). Therefore --detgrowquant=1 means no growth and --detgrowquant=0.5 means an ultimate limit of the Sky level (which is usually too much!). Try running the previous command with various values (from 0.6 to higher values) to see this option’s effect. For this particularly huge galaxy (with signal that extends very gradually into the noise), we’ll set it to 0.65:

$ astnoisechisel r.fits -h0 --tilesize=75,75 --meanmedqdiff=0.001 \
                 --interpnumngb=6 --snquant=0.95 --detgrowquant=0.65

Beyond this level (smaller --detgrowquant values), you see the smaller background galaxies starting to create thin spider-leg-like features, showing that we are following correlated noise too far. Now, when you look at the DETECTIONS extension, you see the wings of the galaxy being detected much farther out, but you also see many holes which are clearly just caused by noise. After growing the objects, NoiseChisel also allows you to fill such holes when they are smaller than a certain size through the --detgrowmaxholesize option. In this case, a maximum area/size of 10,000 pixels seems to be good:

$ astnoisechisel r.fits -h0 --tilesize=75,75 --meanmedqdiff=0.001 \
                 --interpnumngb=6 --snquant=0.95 --detgrowquant=0.65 \
                 --detgrowmaxholesize=10000


The detection looks good now, but when you look into the SKY extension, you still clearly see a footprint of the galaxy. We’ll leave it as an exercise for you to play with NoiseChisel further and improve the detected pixels.
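One possible way to organize such experiments (for example the earlier suggestion to try several --detgrowquant values) is a small loop that writes each trial into its own file through the common --output option. This is only a sketch: the output names (grow-0.6.fits and so on) and the list of trial values are arbitrary choices, and when Gnuastro or r.fits are not available the loop only prints the commands instead of running them.

```shell
#!/bin/bash
# Options found so far in this tutorial (everything except the one
# being explored).
base="astnoisechisel r.fits -h0 --tilesize=75,75 --meanmedqdiff=0.001"
base="$base --interpnumngb=6 --snquant=0.95 --detgrowmaxholesize=10000"

# One output file per trial value (the file names are only a suggestion).
for q in 0.6 0.65 0.7 0.8 0.9; do
    cmd="$base --detgrowquant=$q --output=grow-$q.fits"
    if [ -f r.fits ] && command -v astnoisechisel > /dev/null; then
        $cmd              # Gnuastro and the image are available: run it.
    else
        echo "$cmd"       # Otherwise only show what would be run.
    fi
done
```

You can then open the outputs side by side in your FITS viewer and compare their DETECTIONS and SKY extensions.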

So, we’ll just stop with one last tool NoiseChisel gives you to get a slightly better estimation of the Sky: --minskyfrac. On each tile, NoiseChisel will only measure the Sky-level if the fraction of undetected pixels is larger than the value given to this option. To avoid the edges of the galaxy, we’ll set it to 0.9. Therefore, tiles that are covered by detected pixels for more than 10% of their area are ignored.

$ astnoisechisel r.fits -h0 --tilesize=75,75 --meanmedqdiff=0.001 \
                 --interpnumngb=6 --snquant=0.95 --detgrowquant=0.65 \
                 --detgrowmaxholesize=10000 --minskyfrac=0.9

The footprint of the galaxy still exists in the SKY extension, but it has decreased in significance now. Let’s calculate the significance of the undetected gradient, in units of noise. Since the gradient is roughly along the horizontal axis, we’ll collapse the image along the second (vertical) FITS dimension to have a 1D array (a table column, see its values with the second command).

$ astarithmetic r_detected.fits 2 collapse-mean -hSKY -ocollapsed.fits
$ asttable collapsed.fits

We can now calculate the minimum and maximum values of this array and define their difference (in units of noise) as the gradient:

$ grad=$(astarithmetic r_detected.fits 2 collapse-mean set-i \
                       i maxvalue i minvalue - -hSKY -q)
$ echo $grad
$ std=$(aststatistics r_detected.fits -hSKY_STD --mean)
$ echo $std
$ astarithmetic -q $grad $std /


The undetected gradient (grad above) is thus roughly a quarter of the noise. But don’t forget that this is per-pixel: individually it’s small, but it extends over millions of pixels, so the total flux may still be relevant.

When looking at the raw input shallow image, you don’t see anything so far out of the galaxy. You might just think that “this is all noise, I have just dug too deep and I’m following systematics”! If you feel like this, have a look at the deep images of this system in Watkins et al. [2015], or a 12 hour deep image of this system (with a 12-inch telescope)42: https://i.redd.it/jfqgpqg0hfk11.jpg. In these deeper images you see that the outer edges of the M51 group clearly follow this exact structure. Below, in Achieved surface brightness level, we’ll measure the exact level.

As the gradient in the SKY extension shows, and the deep images cited above confirm, the galaxy’s signal extends even beyond this. But this is already far deeper than what most (if not all) other tools can detect. Therefore, we’ll stop configuring NoiseChisel at this point in the tutorial and let you play with it a little more while reading more about it in NoiseChisel.

After finishing this tutorial please go through the NoiseChisel paper and its options and play with them to further decrease the gradient. This will greatly help you get a good feeling of the options. When you do find a better configuration, please send it to us and we’ll mention your name here with your suggested configuration. Don’t forget that good data analysis is an art, so like a sculptor, master your chisel for a good result.

 This NoiseChisel configuration is NOT GENERIC: Don’t use this configuration blindly on another image. As you saw above, the reason we chose this particular configuration for NoiseChisel to detect the wings of the M51 group was strongly influenced by the noise properties of this particular image. So as long as your image noise has similar properties (from the same data-reduction step of the same database), you can use this configuration on any image. For images from other instruments, or higher-level/reduced SDSS products, please follow a similar logic to what was presented here and find the best configuration yourself.
 Smart NoiseChisel: As you saw during this section, there is a clear logic behind the optimal parameter value for each dataset. Therefore, we plan to add capabilities to (optionally) automate some of the choices made here based on the actual dataset, please join us in doing this if you are interested. However, given the many problems in existing “smart” solutions, such automatic changing of the configuration may cause more problems than it solves. So even when they are implemented, we would strongly recommend quality checks for a robust analysis.


#### 2.3.2 Achieved surface brightness level

In NoiseChisel optimization we showed how to customize NoiseChisel for a single-exposure SDSS image of the M51 group. Let’s measure how deep we carved the signal out of noise. For this measurement, we’ll need to estimate the average flux on the outer edges of the detection. Fortunately all this can be done with a few simple commands (and no higher-level language mini-environments like Python or IRAF) using Arithmetic and MakeCatalog.

First, let’s separate each detected region, or give a unique label/counter to all the connected pixels of NoiseChisel’s detection map:

$ det="r_detected.fits -hDETECTIONS"
$ astarithmetic $det 2 connected-components -olabeled.fits

You can find the label of the main galaxy visually (by opening the image and hovering your mouse over the M51 group’s label). But to have a little more fun, let’s do this automatically. The M51 group detection is by far the largest detection in this image; this allows us to find the ID/label that corresponds to it. We’ll first run MakeCatalog to find the area of all the detections, then we’ll use AWK to find the ID of the largest object and keep it as a shell variable (id):

$ astmkcatalog labeled.fits --ids --geoarea -h1 -ocat.txt
$ id=$(awk '!/^#/{if($2>max) {id=$1; max=$2}} END{print id}' cat.txt)
$ echo $id

To separate the outer edges of the detections, we’ll need to “erode” the M51 group detection. We’ll erode three times (to have more pixels and thus less scatter), using a maximum connectivity of 2 (8-connected neighbors). We’ll then save the output in eroded.fits.

$ astarithmetic labeled.fits $id eq 2 erode 2 erode 2 erode \
                -oeroded.fits

In labeled.fits, we can now set all the 1-valued pixels of eroded.fits to 0 using Arithmetic’s where operator added to the previous command. We’ll need the pixels of the M51 group in labeled.fits two times: once to do the erosion, another time to find the outer pixel layer. To do this (and be efficient and more readable) we’ll use the set-i operator. In the command below, it will save/set/name the pixels of the M51 group as ‘i’. In this way we can use it any number of times afterwards, while only reading it from disk and finding M51’s pixels once.

$ astarithmetic labeled.fits $id eq set-i i \
                i 2 erode 2 erode 2 erode 0 where -oedge.fits

Open the image and have a look. You’ll see that the detected edge of the M51 group is now clearly visible. You can use edge.fits to mark (set to blank) this boundary on the input image and get a visual feeling of how far it extends:

$ astarithmetic r.fits edge.fits nan where -ob-masked.fits -h0
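The AWK one-liner used earlier in this section to pick the largest detection can be tried on its own, without any FITS file. In the sketch below, the contents of the catalog variable are invented (the column descriptions and the three rows only imitate the layout of cat.txt: commented header lines followed by an ID and an area on each row).

```shell
#!/bin/bash
# A made-up catalog in the same layout as cat.txt: commented header
# lines, then one row per label with its ID and area.
catalog='# Column 1: ID
# Column 2: AREA
1   35
2   120843
3   412'

# Skip the lines starting with '#'. For the rest, remember the ID
# (first column) of the row with the largest area (second column) and
# print it after the last row has been read.
id=$(echo "$catalog" \
     | awk '!/^#/{if($2>max) {id=$1; max=$2}} END{print id}')
echo $id
```

Here the printed ID is 2, the row with the largest second column. Note that max is never initialized: AWK treats an unset variable as zero in a numeric comparison, which is safe here because areas are positive.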


To quantify how deep we have detected the low-surface brightness regions, we’ll use the command below. In short it just divides all the non-zero pixels of edge.fits in the Sky subtracted input (first extension of NoiseChisel’s output) by the pixel standard deviation of the same pixel. This will give us a signal-to-noise ratio image. The mean value of this image shows the level of surface brightness that we have achieved.

You can also break the command below into multiple calls to Arithmetic and create temporary files to understand it better. However, if you have a look at Reverse polish notation and Arithmetic operators, you should be able to easily understand what your computer does when you run this command43.

$ edge="edge.fits -h1"
$ skystd="r_detected.fits -hSKY_STD"
$ skysub="r_detected.fits -hINPUT-NO-SKY"
$ astarithmetic $skysub $skystd / $edge not nan where \
                meanvalue --quiet

We have thus detected the wings of the M51 group down to roughly 1/4th of the noise level in this image! But the signal-to-noise ratio is a relative measurement. Let’s also measure the depth of our detection in absolute surface brightness units, that is, magnitudes per square arcsecond. To find out, we’ll first need to calculate how many pixels of this image are in one arcsecond-squared. Fortunately the world coordinate system (or WCS) meta data of Gnuastro’s output FITS files (in particular the CDELT keywords) give us this information.

$ pixscale=$(astfits r_detected.fits -h1                           \
                     | awk '/CDELT1/ {p=1/($3*3600); print p*p}')
$ echo $pixscale
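To see what the AWK part of this command does, it can be fed a single header line by hand. The line below is a made-up stand-in for one line of a printed FITS header (the CDELT1 value is an assumption chosen for easy arithmetic, not the value in r_detected.fits).

```shell
# Hypothetical header line in the 'KEYWORD = value / comment' layout.
# The CDELT1 value (degrees per pixel) is invented for illustration.
header_line="CDELT1  =               0.0001 / [deg] Pixel width"

# Same AWK logic as above: p is the number of pixels in one arcsecond,
# so p*p is the number of pixels in one square arcsecond.
pixscale=$(echo "$header_line" \
               | awk '/CDELT1/ {p=1/($3*3600); print p*p}')
echo $pixscale
```

Here 0.0001 degrees is 0.36 arcseconds per pixel, so there are about 2.78 pixels along one arcsecond and roughly 7.7 pixels in one square arcsecond.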


Note that we multiplied the value by 3600 so we work in units of arc-seconds not degrees. Now, let’s calculate the average sky-subtracted flux in the border region per pixel.

$ f=$(astarithmetic r_detected.fits edge.fits not nan where set-i \
                    i sumvalue i numbervalue / -q -hINPUT-NO-SKY)
$ echo $f


We can just multiply the two to get the average flux on this border in one arcsecond squared. We also have the r-band SDSS zeropoint magnitude44 to be 24.80. Therefore we can get the surface brightness of the outer edge (in magnitudes per arcsecond squared) using the following command. Just note that log in AWK is the natural logarithm (base e, not 10), and that AWK doesn’t have a log10 operator. So we’ll do an extra division by log(10) to correct for this.

$ z=24.80
$ echo "$pixscale $f $z" | awk '{print -2.5*log($1*$2)/log(10)+$3}'
--> 28.2989


On a single-exposure SDSS image, we have reached a surface brightness limit fainter than 28 magnitudes per arcsecond squared!
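The conversion from flux to magnitudes used above can be checked with numbers that are easy to verify by hand. The sketch below wraps the same AWK expression in a small shell function; the function name and the input values are hypothetical, only the 24.80 zeropoint comes from the text.

```shell
# Surface brightness from pixels per square arcsecond, mean flux per
# pixel, and zeropoint. AWK's log() is the natural logarithm, so
# dividing by log(10) converts it to a base-10 logarithm.
surface_brightness () {
    echo "$1 $2 $3" \
        | awk '{printf "%.4f\n", -2.5*log($1*$2)/log(10)+$3}'
}

# Made-up inputs chosen so the answer is easy to check by hand:
# 4 pixels/arcsec^2 times 0.0025 counts/pixel is 0.01 counts/arcsec^2,
# log10(0.01) = -2, so the result is 5 + 24.80 = 29.80.
surface_brightness 4 0.0025 24.80
```

Forgetting the division by log(10) would use the natural logarithm of 0.01 (about -4.6) instead of -2, which shifts the result by several magnitudes.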

When interpreting this value, keep in mind that NoiseChisel works based on the contiguity of signal in the pixels. Therefore the larger the object, the deeper NoiseChisel can carve it out of the noise. In other words, this reported depth is only for this particular object and dataset, processed with this particular NoiseChisel configuration: if the M51 group in this image were larger/smaller than this, or if the image were larger/smaller, or if we had used a different configuration, we would go deeper/shallower.

To avoid typing all these options every time you run NoiseChisel on this image, you can use Gnuastro’s configuration files, see Configuration files. For an applied example of setting/using them, see Option management and configuration files.

To continue your analysis of such datasets with extended emission, you can use Segment to identify all the “clumps” over the diffuse regions: background galaxies and foreground stars.

$ astsegment r_detected.fits

Open the output r_detected_segmented.fits as a multi-extension data cube like before and flip through the first and second extensions to see the detected clumps (all pixels with a value larger than 1). To optimize the parameters and make sure you have detected what you wanted, we recommend visually inspecting the detected clumps on the input image.

For visual inspection, you can make a simple shell script like below. It will first call MakeCatalog to estimate the positions of the clumps, then make an SAO ds9 region file and open ds9 with the image and region file. Recall that in a shell script, the numeric variables (like $1, $2, and $3 in the example below) represent the arguments given to the script. But when used in the AWK arguments, they refer to column numbers.

To create the shell script, using your favorite text editor, put the contents below into a file called check-clumps.sh. Recall that everything after a # is just comments to help you understand the command (so read them!). Also note that if you are copying from the PDF version of this book, fix the single quotes in the AWK command.

#! /bin/bash
set -e	   # Stop execution when there is an error.
set -u	   # Stop execution when a variable is not initialized.

# Run MakeCatalog to write the coordinates into a FITS table.
# Default output is '$1_cat.fits'.
astmkcatalog $1.fits --clumpscat --ids --ra --dec

# Use Gnuastro's Table program to read the RA and Dec columns of the
# clumps catalog (in the 'CLUMPS' extension). Then pipe the columns
# to AWK for saving as a DS9 region file.
asttable $1"_cat.fits" -hCLUMPS -cRA,DEC                               \
         | awk 'BEGIN { print "# Region file format: DS9 version 4.1"; \
                        print "global color=green width=1";            \
                        print "fk5" }                                  \
                { printf "circle(%s,%s,1\")\n", $1, $2 }' > $1.reg

# Show the image (with the requested color scale) and the region file.
ds9 -geometry 1800x3000 -mecube $1.fits -zoom to fit                   \
    -scale limits $2 $3 -regions load all $1.reg

# Clean up (delete intermediate files).
rm $1"_cat.fits" $1.reg


Finally, you just have to activate the script’s executable flag with the command below. This will enable you to directly/easily call the script as a command.

$ chmod +x check-clumps.sh

This script doesn’t expect the .fits suffix of the input’s filename as the first argument, because the script produces intermediate files (a catalog and DS9 region file, which are later deleted). We don’t want multiple instances of the script (on different files in the same directory) to collide (read/write to the same intermediate files). Therefore, we have used suffixes added to the input’s name to identify the intermediate files. Note how all the $1 instances in the commands (not within the AWK command45) are followed by a suffix. If you want to keep the intermediate files, put a # at the start of the last line.
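The two meanings of $1 in the script (shell argument outside the single quotes, AWK column inside them) can be seen in isolation with the sketch below. It does not call any Gnuastro program: the function name make_regions and the two coordinate pairs are hypothetical stand-ins for the script and for Table's output.

```shell
# Hypothetical helper: read "RA Dec" pairs on standard input and write
# a DS9 region file named after the first argument.
make_regions () {
    # Outside the single quotes, "$1" is the function's first argument.
    # Inside them, $1 and $2 are AWK's first and second columns.
    awk 'BEGIN { print "# Region file format: DS9 version 4.1";
                 print "global color=green width=1";
                 print "fk5" }
         { printf "circle(%s,%s,1\")\n", $1, $2 }' > "$1.reg"
}

# Two made-up positions (in degrees) stand in for the output of Table.
printf '202.4696 47.1952\n202.4982 47.2661\n' | make_regions demo
cat demo.reg
rm demo.reg
```

The resulting file has the three header lines followed by one circle line per input row, and its name is built from the argument plus the .reg suffix, in the same way as the intermediate files of check-clumps.sh.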

The few, but high-valued, bright pixels in the central parts of the galaxies can hinder easy visual inspection of the fainter parts of the image. With the second and third arguments to this script, you can set the numerical values of the color map (first is minimum/black, second is maximum/white). You can call this script with any46 output of Segment (when --rawoutput is not used) with a command like this:

$ ./check-clumps.sh r_detected_segmented -0.1 2

Go ahead and run this command. You will see the intermediate processing being done and finally it opens SAO DS9 for you with the regions superimposed on all the extensions of Segment’s output. The script will only finish (and give you control of the command-line) when you close DS9. If you need access to the command-line before closing DS9, add a & after the end of the command above.

While DS9 is open, slide the dynamic range (values for black and white, or minimum/maximum values in different color schemes) and zoom into various regions of the M51 group to see if you are satisfied with the detected clumps. Don’t forget that through the “Cube” window that is opened along with DS9, you can flip through the extensions and see the actual clumps also. The questions you should be asking yourself are these: 1) Which real clumps (as you visually feel) have been missed? In other words, is the completeness good? 2) Are there any clumps which you feel are false? In other words, is the purity good?

Note that completeness and purity are not independent of each other, they are anti-correlated: the higher your purity, the lower your completeness and vice-versa. You can see this by playing with the purity level using the --snquant option. Run Segment as shown above again with -P and see its default value. Then increase/decrease it for higher/lower purity and check the result as before. You will see that if you want the best purity, you have to sacrifice completeness and vice versa.

One interesting region to inspect in this image is the many bright peaks around the central parts of M51. Zoom into that region and inspect how many of them have actually been detected as true clumps. Do you have a good balance between completeness and purity? Also look out far into the wings of the group and inspect the completeness and purity there.

An easier way to inspect completeness (and only completeness) is to mask all the pixels detected as clumps and visually inspect the rest of the pixels. You can do this using Arithmetic in a command like below. For easy reading of the command, we’ll define the shell variables in (for the input image) and clumps (for the clumps image), and save the output in clumps-masked.fits.

$ in="r_detected_segmented.fits -hINPUT"
$ clumps="r_detected_segmented.fits -hCLUMPS"
$ astarithmetic $in $clumps 0 gt nan where -oclumps-masked.fits


Inspecting clumps-masked.fits, you can see some very diffuse peaks that have been missed, especially as you go farther away from the group center and into the diffuse wings. This is due to the fact that with this configuration, we have focused more on the sharper clumps. To put the focus more on diffuse clumps, you can use a wider convolution kernel. Using a larger kernel can also help in detecting the existing clumps to fainter levels (thus better separating them from the surrounding diffuse signal).

You can make any kernel easily using the --kernel option in MakeProfiles. But note that a larger kernel is also going to wash-out many of the sharp/small clumps close to the center of M51 and also some smaller peaks on the wings. Please continue playing with Segment’s configuration to obtain a more complete result (while keeping reasonable purity). We’ll finish the discussion on finding true clumps at this point.

The properties of the clumps within M51, or the background objects, can then easily be measured using MakeCatalog. To measure the properties of the background objects (detected as clumps over the diffuse region), you shouldn’t mask the diffuse region. When measuring clump properties with MakeCatalog and using the --clumpscat option, the ambient flux (from the diffuse region) is calculated and subtracted. If the diffuse region is masked, its effect on the clump brightness cannot be calculated and subtracted.

To keep this tutorial short, we’ll stop here. See Segmentation and making a catalog and Segment for more on using Segment, producing catalogs with MakeCatalog and using those catalogs.


## 3 Installation

The latest released version of Gnuastro source code is always available at the following URL:

Quick start describes the commands necessary to configure, build, and install Gnuastro on your system. This chapter will be useful in cases where the simple procedure above is not sufficient, for example your system lacks a mandatory/optional dependency (in other words, you can’t pass the $ ./configure step), or you want greater customization, or you want to build and install Gnuastro from other random points in its history, or you want a higher level of control on the installation. Thus if you were happy with downloading the tarball and following Quick start, then you can safely ignore this chapter and come back to it in the future if you need more customization.

Dependencies describes the mandatory, optional and bootstrapping dependencies of Gnuastro. Only the first group are required/mandatory when you are building Gnuastro using a tarball (see Release tarball). They are very basic and low-level tools used in most astronomical software, so you might already have them installed; if not, they are very easy to install as described for each.

Downloading the source discusses the two methods by which you can obtain the source code: as a tarball (a significant snapshot in Gnuastro’s history), or the full history47. The latter allows you to build Gnuastro at any random point in its history (for example to get bug fixes or new features that are not released as a tarball yet).

The building and installation of Gnuastro is heavily customizable; to learn more about them, see Build and install. This section is essentially a thorough explanation of the steps in Quick start. It discusses ways you can influence the building and installation. If you encounter any problems in the installation process, it is probably already explained in Known issues. In Other useful software the installation and usage of some other free software that are not directly required by Gnuastro but might be useful in conjunction with it is discussed.

### 3.1 Dependencies

A minimal set of dependencies are mandatory for building Gnuastro from the standard tarball release. If they are not present you cannot pass Gnuastro’s configuration step. The mandatory dependencies are therefore very basic (low-level) tools which are easy to obtain, build and install, see Mandatory dependencies for a full discussion.

If you have the packages of Optional dependencies, Gnuastro will have additional functionality (for example converting FITS images to JPEG or PDF). If you are installing from a tarball as explained in Quick start, you can stop reading after this section. If you are cloning the version controlled source (see Version controlled source), an additional bootstrapping step is required before configuration and its dependencies are explained in Bootstrapping dependencies.

Your operating system’s package manager is an easy and convenient way to download and install the dependencies that are already pre-built for your operating system. In Dependencies from package managers, we’ll list some common operating system package manager commands to install the optional and mandatory dependencies.

#### 3.1.1 Mandatory dependencies

The mandatory Gnuastro dependencies are very basic and low-level tools. They all follow the same basic GNU based build system (like that shown in Quick start), so even if you don’t have them, installing them should be pretty straightforward. In this section we explain each program and any specific note that might be necessary in the installation.

#### 3.1.1.1 GNU Scientific library

The GNU Scientific Library, or GSL, is a large collection of functions that are very useful in scientific applications, for example integration, random number generation, and Fast Fourier Transform among many others.
To install GSL from source, you can run the following commands after you have downloaded gsl-latest.tar.gz:

$ tar xf gsl-latest.tar.gz
$ cd gsl-X.X                     # Replace X.X with version number.
$ ./configure
$ make -j8                       # Replace 8 with no. CPU threads.
$ make check
$ sudo make install

#### 3.1.1.2 CFITSIO

CFITSIO is the closest you can get to the pixels in a FITS image while remaining faithful to the FITS standard. It is written by William Pence, the principal author of the FITS standard48, and is regularly updated, setting the definitions for all other software packages using FITS images.

Some GNU/Linux distributions have CFITSIO in their package managers; if it is available and updated, you can use it. One problem that might occur is that CFITSIO might not be configured with the --enable-reentrant option by the distribution. This option allows CFITSIO to open a file in multiple threads, and it can thus provide great speed improvements. If CFITSIO was not configured with this option, any program which needs this capability will warn you and abort when you ask for multiple threads (see Multi-threaded operations).

To install CFITSIO from source, we strongly recommend that you have a look through Chapter 2 (Creating the CFITSIO library) of the CFITSIO manual and understand the options you can pass to $ ./configure (there aren’t too many). This is a very basic package for most astronomical software and it is best that you configure it nicely with your system. Once you download the source and unpack it, the configure options in the commands below should be enough for most purposes. Don’t forget to read chapter two of the manual though, for example the second option is only for 64bit systems. The manual also explains how to check if it has been installed correctly.

CFITSIO comes with two executable files called fpack and funpack. From their manual: they “are standalone programs for compressing and uncompressing images and tables that are stored in the FITS (Flexible Image Transport System) data format. They are analogous to the gzip and gunzip compression programs except that they are optimized for the types of astronomical images that are often stored in FITS format”. The commands below will compile and install them on your system along with CFITSIO. They are not essential for Gnuastro, since they are just wrappers for functions within CFITSIO, but they can come in handy. The make utils command is only available for versions above 3.39. It will build these executable files along with several other executable test files, which are deleted in the following commands before the installation (otherwise the test files will also be installed).

The commands necessary to decompress, build and install CFITSIO from source are described below. Let’s assume you have downloaded cfitsio_latest.tar.gz and are in the same directory:

$ tar xf cfitsio_latest.tar.gz
$ cd cfitsio-X.XX                   # Replace X.XX with version
$ ./configure --prefix=/usr/local --enable-sse2 --enable-reentrant
$ make
$ make utils
$ ./testprog > testprog.lis
$ diff testprog.lis testprog.out    # Should have no output
$ cmp testprog.fit testprog.std     # Should have no output
$ rm cookbook fitscopy imcopy smem speed testprog
$ sudo make install



#### 3.1.1.3 WCSLIB

WCSLIB is written and maintained by one of the authors of the World Coordinate System (WCS) definition in the FITS standard49, Mark Calabretta. It might be already built and ready in your distribution’s package management system. However, here the installation from source is explained, for the advantages of installation from source please see Mandatory dependencies. To install WCSLIB you will need to have CFITSIO already installed, see CFITSIO.

WCSLIB also has plotting capabilities which use PGPLOT (a plotting library for C). If you want to use those capabilities in WCSLIB, PGPLOT provides the PGPLOT installation instructions. However PGPLOT is old50, so its installation is not easy, and there are also many great modern WCS plotting tools (mostly written in Python). Hence, if you will not be using those plotting functions in WCSLIB, you can configure it with the --without-pgplot option as shown below.

If you have the cURL library 51 on your system and you installed CFITSIO version 3.42 or later, you will need to also link with the cURL library at configure time (through the -lcurl option as shown below). CFITSIO uses the cURL library for its HTTPS (or HTTP Secure52) support and if it is present on your system, CFITSIO will depend on it. Therefore, if the ./configure command below fails (because you don’t have the cURL library), then remove this option and rerun it.

Let’s assume you have downloaded wcslib.tar.bz2 and are in the same directory, to configure, build, check and install WCSLIB follow the steps below.

$ tar xf wcslib.tar.bz2

## In the 'cd' command, replace 'X.X' with version number.
$ cd wcslib-X.X

## If './configure' fails, remove '-lcurl' and run again.
$ ./configure LIBS="-pthread -lcurl -lm" --without-pgplot     \
              --disable-fortran
$ make
$ make check
$ sudo make install
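The choice between keeping and removing -lcurl can also be made before running ./configure. The sketch below is only one possible heuristic and not part of WCSLIB's or Gnuastro's instructions: it assumes that finding the curl-config program (installed with libcurl's development files) in PATH means the cURL library is available. The variable name wcslib_libs is hypothetical.

```shell
# Assumption: 'curl-config' being in PATH is taken as a sign that the
# cURL library and its development files are installed.
if command -v curl-config >/dev/null 2>&1; then
    wcslib_libs="-pthread -lcurl -lm"
else
    wcslib_libs="-pthread -lm"
fi

# Print the command instead of running it, so it can be reviewed first.
echo "./configure LIBS=\"$wcslib_libs\" --without-pgplot --disable-fortran"
```

If the printed command still fails at the configure step, the advice in the text applies unchanged: drop -lcurl and run it again.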



#### 3.1.2 Optional dependencies

The libraries listed here are only used for very specific applications, therefore if you don’t want these operations, Gnuastro will be built and installed without them and you don’t have to have the dependencies.

If the ./configure script can’t find these requirements, it will warn you in the end that they are not present and notify you of the operation(s) you can’t do due to not having them. If the output you request from a program requires a missing library, that program is going to warn you and abort. In the case of program dependencies (like GPL GhostScript), if you install them at a later time, the program will run. This is because if required libraries are not present at build time, the executables cannot be built, but an executable is called by the built program at run time so if it becomes available, it will be used. If you do install an optional library later, you will have to rebuild Gnuastro and reinstall it for it to take effect.

GNU Libtool

Libtool is a program to simplify managing of the libraries to build an executable (a program). GNU Libtool has some added functionality compared to other implementations. If GNU Libtool isn’t present on your system at configuration time, a warning will be printed and BuildProgram won’t be built or installed. The configure script will look into your search path (PATH) for GNU Libtool through the following executable names: libtool (acceptable only if it is the GNU implementation) or glibtool. See Installation directory for more on PATH.

GNU Libtool (the binary/executable file) is a low-level program that is probably already present on your system, and if not, is available in your operating system package manager53. If you want to install GNU Libtool’s latest version from source, please visit its webpage.

Gnuastro’s tarball is shipped with an internal implementation of GNU Libtool. Even if you have GNU Libtool, Gnuastro’s internal implementation is used for the building and installation of Gnuastro. As a result, you can still build, install and use Gnuastro even if you don’t have GNU Libtool installed on your system. However, this internal Libtool does not get installed. Therefore, after Gnuastro’s installation, if you want to use BuildProgram to compile and link your own C source code which uses the Gnuastro library, you need to have GNU Libtool available on your system (independent of Gnuastro). See Review of library fundamentals to learn more about libraries.

libgit2

Git is one of the most common version control systems (see Version controlled source). When libgit2 is present, and Gnuastro’s programs are run within a version controlled directory, outputs will contain the version number of the working directory’s repository for future reproducibility. See the COMMIT keyword header in Output FITS files for a discussion.

libjpeg

libjpeg is only used by ConvertType to read from and write to JPEG images, see Recognized file formats. libjpeg is a very basic library that provides tools to read and write JPEG images, and most Unix-like graphic programs and libraries use it. Therefore you most probably already have it installed. libjpeg-turbo is an alternative to libjpeg. It uses Single instruction, multiple data (SIMD) instructions (for example on x86 and ARM based systems) that significantly decrease the processing time of JPEG compression and decompression algorithms.

libtiff

libtiff is used by ConvertType and the libraries to read TIFF images, see Recognized file formats. libtiff is a very basic library that provides tools to read and write TIFF images, and most Unix-like operating system graphic programs and libraries use it. Therefore even if you don’t have it installed, it should be easily available in your package manager.

GPL Ghostscript

GPL Ghostscript’s executable (gs) is called by ConvertType to compile a PDF file from a source PostScript file, see ConvertType. Therefore its headers (and libraries) are not needed. With a very high probability you already have it in your GNU/Linux distribution. Unfortunately it does not follow the standard GNU build style so installing it is very hard. It is best to rely on your distribution’s package managers for this.
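Before configuring Gnuastro, a quick look at which of these optional dependencies are visible on the system can save a rebuild later. The sketch below is independent of Gnuastro's own configure checks: for libraries it asks pkg-config (if installed), and the module names libgit2, libjpeg and libtiff-4 are assumptions about how a given system registers them; for programs it only looks in PATH. The function name dep_status is hypothetical.

```shell
# Report whether an optional dependency looks installed.
#   dep_status lib NAME : ask pkg-config for a module called NAME.
#   dep_status bin NAME : look for an executable called NAME in PATH.
dep_status () {
    found=no
    if [ "$1" = lib ]; then
        if command -v pkg-config >/dev/null 2>&1 \
               && pkg-config --exists "$2"; then
            found=yes
        fi
    elif command -v "$2" >/dev/null 2>&1; then
        found=yes
    fi
    echo "$2: $found"
}

dep_status lib libgit2
dep_status lib libjpeg
dep_status lib libtiff-4
dep_status bin gs
dep_status bin libtool
```

A "no" for a library here does not prove it is absent (it may simply not provide a pkg-config file), so the warnings printed at the end of ./configure remain the authoritative report.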


#### 3.1.3 Bootstrapping dependencies

Bootstrapping is only necessary if you have decided to obtain the full version controlled history of Gnuastro, see Version controlled source and Bootstrapping. Using the version controlled source enables you to always be up to date with the most recent development work of Gnuastro (bug fixes, new functionalities, improved algorithms, etc). If you have downloaded a tarball (see Downloading the source), then you can ignore this subsection.

To successfully run the bootstrapping process, there are some additional dependencies to those discussed in the previous subsections. These are low level tools that are used by a large collection of Unix-like operating system programs, therefore they are most probably already available in your system. If they are not already installed, you should be able to easily find them in any GNU/Linux distribution package management system (apt-get, yum, pacman, etc). The short names in parentheses in typewriter font after the package name can be used to search for them in your package manager. For the GNU Portability Library, GNU Autoconf Archive and TeX Live, it is recommended to use the instructions here, not your operating system’s package manager.

GNU Portability Library (Gnulib)

To ensure portability for a wider range of operating systems (those that don’t include GNU C library, namely glibc), Gnuastro depends on the GNU portability library, or Gnulib. Gnulib keeps a copy of all the functions in glibc, implemented (as much as possible) to be portable to other operating systems. The bootstrap script can automatically clone Gnulib (as a gnulib/ directory inside Gnuastro), however, as described in Bootstrapping this is not recommended.

The recommended way to bootstrap Gnuastro is to first clone Gnulib and the Autoconf archives (see below) into a local directory outside of Gnuastro. Let’s call it DEVDIR54 (which you can set to any directory). Currently in Gnuastro, both Gnulib and Autoconf archives have to be cloned in the same top directory55 like the case here56:

$ DEVDIR=/home/yourname/Development
$ cd $DEVDIR
$ git clone git://git.sv.gnu.org/gnulib.git
$ git clone git://git.sv.gnu.org/autoconf-archive.git

You now have the full version controlled source of these two repositories in separate directories. Both these packages are regularly updated, so every once in a while, you can run $ git pull within them to get any possible updates.
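The occasional update of both clones can be done in one step. The sketch below is a convenience only; the function name update_dev_repos is hypothetical and the fallback directory is an assumption for the case where DEVDIR is not set in the current shell.

```shell
# Hypothetical helper: update the two clones under the directory given
# as the first argument, and say so when one of them is missing.
update_dev_repos () {
    for repo in gnulib autoconf-archive; do
        if [ -d "$1/$repo/.git" ]; then
            git -C "$1/$repo" pull
        else
            echo "$repo: no clone found in $1"
        fi
    done
}

# Example call, using DEVDIR when it is set as in the commands above.
update_dev_repos "${DEVDIR:-$HOME/Development}"
```

Checking for the .git directory first avoids running git pull in an unrelated directory when one of the two clones was never made.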

GNU Automake (automake)

GNU Automake will build the Makefile.in files in each sub-directory using the (hand-written) Makefile.am files. The Makefile.ins are subsequently used to generate the Makefiles when the user runs ./configure before building.

GNU Autoconf (autoconf)

GNU Autoconf will build the configure script using the configurations we have defined (hand-written) in configure.ac.

GNU Autoconf Archive

These are a large collection of tests that can be called to run at ./configure time. See the explanation under GNU Portability Library above for instructions on obtaining it and keeping it up to date.

GNU Libtool (libtool)

GNU Libtool is in charge of building all the libraries in Gnuastro. The libraries contain functions that are used by more than one program and are installed for use in other programs. They are thus put in a separate directory (lib/).

GNU help2man (help2man)

GNU help2man is used to convert the output of the --help option (see --help) to the traditional Man page (see Man pages).

LaTeX and some TeX packages

Some of the figures in this book are built by LaTeX (using the PGF/TikZ package). The LaTeX source for those figures is version controlled for easy maintenance, not the actual figures. So the ./bootstrap script will run LaTeX to build the figures. The best way to install LaTeX and all the necessary packages is through TeX Live, which is a package manager for TeX related tools that is independent of any operating system. It is thus preferred to the TeX Live versions distributed by your operating system.

To install TeX Live, go to the webpage and download the appropriate installer by following the “download” link. Note that by default the full package repository will be downloaded and installed (around 4 gigabytes), which can take very long to download and to update later. However, most packages are not needed by everyone, so it is easier, faster and better to install only the “Basic scheme” (consisting of only the most basic TeX and LaTeX packages, which is less than 200 megabytes)57.

After the installation, be sure to set the environment variables as suggested in the end of the outputs. Any time you confront (need) a package you don’t have, simply install it with a command like below (similar to how you install software from your operating system’s package manager)58. To install all the necessary TeX packages for a successful Gnuastro bootstrap, run this command:

$ su
# tlmgr install epsf jknapltx caption biblatex biber iftex \
                etoolbox logreq xstring xkeyval pgf ms     \
                xcolor pgfplots times rsfs ps2eps epspdf

ImageMagick (imagemagick)

ImageMagick is a wonderful and robust program for image manipulation on the command-line. bootstrap uses it to convert the book images into the formats necessary for the various book formats.

#### 3.1.4 Dependencies from package managers

The most basic way to install a package on your system is to build the packages from source yourself. Alternatively, you can use your operating system’s package manager to download pre-compiled files and install them. The latter choice is easier and faster. However, we recommend that you build the Mandatory dependencies yourself from source (all necessary commands and links are given in the respective section). Here are some basic reasons behind this recommendation.

1. Your distribution’s pre-built package might not be the most recent release.
2. For each package, Gnuastro might perform better with (or require) certain configuration options that your distribution’s package managers didn’t add for you. If present, these configuration options are explained during the installation of each in the sections above (for example in CFITSIO). When the proper configuration has not been set, the programs should complain and inform you.
3. For the libraries, they might separate the binary file from the header files which can cause confusion, see Known issues.
4. Like any other tool, the science you derive from Gnuastro’s tools highly depends on these lower level dependencies, so generally it is much better to have a close connection with them. By reading their manuals, installing them and staying up to date with changes/bugs in them, your scientific results and understanding (of what is going on, and thus how you interpret your scientific results) will also correspondingly improve.
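Each package manager command in this section applies to one tool only, so it can help to first confirm which of them is present on the system. The sketch below simply searches PATH for the executables named in this section; the function name detect_pkg_manager and the search order are arbitrary choices for illustration.

```shell
# Print the first package manager from this section found in PATH,
# or "none" when none of them is installed.
detect_pkg_manager () {
    for pm in apt-get dnf yum brew pacman zypper; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    echo "none"
}

detect_pkg_manager
```

On systems with more than one of these tools installed, only the first match in the list is printed, so the order may need adjusting.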
Based on your package manager, you can use any of the following commands to install the mandatory and optional dependencies. If your package manager isn’t included in the list below, please send us the respective command, so we add it. Gnuastro itself is also already packaged in some package managers (for example Debian or Homebrew).

As discussed above, we recommend installing the mandatory dependencies manually from source (see Mandatory dependencies). Therefore, in each command below, first the optional dependencies are given. The mandatory dependencies are included after an empty line. If you would also like to install the mandatory dependencies with your package manager, just ignore the empty line.

For better archivability and compression ratios, Gnuastro’s recommended tarball compression format is with the Lzip program, see Release tarball. Therefore, the package manager commands below also contain Lzip.

apt-get (Debian-based OSs: Debian, Ubuntu, Linux Mint, etc)

Debian is one of the oldest GNU/Linux distributions59. It thus has a very extended user community and a robust internal structure and standards. All of it is free software and based on the work of volunteers around the world. Many distributions are thus derived from it, for example Ubuntu and Linux Mint. This arguably makes Debian-based OSs the largest, and most used, class of GNU/Linux distributions. All of them use Debian’s Advanced Packaging Tool (APT, for example apt-get) for managing packages.

$ sudo apt-get install ghostscript libtool-bin libjpeg-dev  \
                       libtiff-dev libgit2-dev lzip         \
                                                            \
                       libgsl0-dev libcfitsio-dev wcslib-dev


Gnuastro is packaged in Debian (and thus some of its derivative operating systems). Just make sure it is the most recent version.

dnf
yum (Red Hat-based OSs: Red Hat, Fedora, CentOS, Scientific Linux, etc)

Red Hat Enterprise Linux (RHEL) is released by Red Hat Inc. RHEL requires paid subscriptions for use of its binaries and support. But since it is free software, many other teams use its code to spin-off their own distributions based on RHEL. Red Hat-based GNU/Linux distributions initially used the “Yellowdog Updater, Modified” (YUM) package manager, which has been replaced by “Dandified yum” (DNF). If the latter isn’t available on your system, you can use yum instead of dnf in the command below.

$sudo dnf install ghostscript libtool libjpeg-devel \ libtiff-devel libgit2-devel lzip \ \ gsl-devel cfitsio-devel wcslib-devel  brew (macOS) macOS is the operating system used on Apple devices. macOS does not come with a package manager pre-installed, but several widely used, third-party package managers exist, such as Homebrew or MacPorts. Both are free software. Currently we have only tested Gnuastro’s installation with Homebrew as described below. If not already installed, first obtain Homebrew by following the instructions at https://brew.sh. Homebrew manages packages in different ‘taps’. To install WCSLIB (discussed in Mandatory dependencies) via Homebrew you will need to tap into brewsci/science first (the tap may change in the future, but can be found by calling brew search wcslib). $ brew install ghostscript libtool libjpeg libtiff          \
libgit2 lzip                                 \
\
gsl cfitsio
$ brew tap brewsci/science
$ brew install wcslib

pacman (Arch Linux)

Arch Linux is a smaller GNU/Linux distribution, which follows the KISS principle (“keep it simple, stupid”) as a general guideline. It “focuses on elegance, code correctness, minimalism and simplicity, and expects the user to be willing to make some effort to understand the system’s operation”. Arch Linux uses “Package manager” (Pacman) to manage its packages/components.

$ sudo pacman -S ghostscript libtool libjpeg libtiff        \
libgit2 lzip                                 \
\
gsl cfitsio wcslib

zypper (openSUSE and SUSE Linux Enterprise Server)

SUSE Linux Enterprise Server60 (SLES) is the commercial offering which shares code and tools. Many additional packages are offered in the Build Service61. openSUSE and SLES use zypper (cli) and YaST (GUI) for managing repositories and packages.

$ sudo zypper install ghostscript_any libtool pkgconfig    \
cfitsio-devel gsl-devel libcurl-devel        \
libgit2-devel libjpeg62-devel libtiff-devel  \
wcslib-devel


When building Gnuastro, run the configure script with the following CPPFLAGS environment variable:

$ ./configure CPPFLAGS="-I/usr/include/cfitsio"

Usually, when libraries are installed by operating system package managers, there should be no problems when configuring and building other programs from source (that depend on the libraries: Gnuastro in this case). However, in some special conditions, problems may pop up during the configuration, building, or checking/running any of Gnuastro’s programs. The most common of such problems and their solutions are discussed below.

Not finding library during configuration: If a library is installed, but during Gnuastro’s configure step the library isn’t found, then configure Gnuastro like the command below (correcting /path/to/lib). For more, see Known issues and Installation directory.

$ ./configure LDFLAGS="-L/path/to/lib"

Not finding header (.h) files while building: If a library is installed, but during Gnuastro’s make step, the library’s header (file with a .h suffix) isn’t found, then configure Gnuastro like the command below (correcting /path/to/include). For more, see Known issues and Installation directory.

$ ./configure CPPFLAGS="-I/path/to/include"

Gnuastro’s programs don’t run during check or after install: If a library is installed, but the programs don’t run due to linking problems, set the LD_LIBRARY_PATH variable like below (assuming Gnuastro is installed in /path/to/installed). For more, see Known issues and Installation directory.

$ export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/path/to/installed/lib"

### 3.2 Downloading the source

Gnuastro’s source code can be downloaded in two ways. The first is as a tarball, ready to be configured and installed on your system (as described in Quick start), see Release tarball. If you want official releases of stable versions, this is the best, easiest and most common option. Alternatively, you can clone the version controlled history of Gnuastro, run one extra bootstrapping step and then follow the same steps as the tarball. This will give you access to all the most recent work that will be included in the next release, along with the full project history. The process is thoroughly introduced in Version controlled source.

#### 3.2.1 Release tarball

A release tarball (commonly compressed) is the most common way of obtaining free and open source software. A tarball is a snapshot of one particular moment in the Gnuastro development history along with all the necessary files to configure, build, and install Gnuastro easily (see Quick start). It is very straightforward and needs the smallest set of dependencies (see Mandatory dependencies). Gnuastro has tarballs for official stable releases and pre-releases for testing.
See Version numbering for more on the two types of releases and the formats of the version numbers. The URLs for each type of release are given below.

Official stable releases (http://ftp.gnu.org/gnu/gnuastro):

This URL hosts the official stable releases of Gnuastro. Always use the most recent version (see Version numbering). By clicking on the “Last modified” title of the second column, the files will be sorted by their date, which you can also use to find the latest version. It is recommended to use a mirror to download these tarballs, please visit http://ftpmirror.gnu.org/gnuastro/ and see below.

Pre-release tar-balls (http://alpha.gnu.org/gnu/gnuastro):

This URL contains unofficial pre-release versions of Gnuastro. The pre-release versions of Gnuastro here are for enthusiasts to try out before an official release. If there are problems or bugs, the testers will inform the developers so they can be fixed before the next official release. See Version numbering to understand how the version numbers here are formatted. If you want to remain even more up-to-date with the developing activities, please clone the version controlled source as described in Version controlled source.

Gnuastro’s official/stable tarball is released in two formats: Gzip (with suffix .tar.gz) and Lzip (with suffix .tar.lz). The pre-release tarballs (after version 0.3) are released only as an Lzip tarball. Gzip is a very well-known and widely used compression program created by GNU and available in most systems. However, Lzip provides a better compression ratio and more robust archival capacity. For example Gnuastro 0.3’s tarball was 2.9MB and 4.3MB with Lzip and Gzip respectively, see the Lzip webpage for more. Lzip might not be pre-installed in your operating system. If so, installing it from your operating system’s package manager or from source is very easy and fast (it is a very small program).

The GNU FTP server is mirrored (has backups) in various locations on the globe (http://www.gnu.org/order/ftp.html). You can use the closest mirror to your location for a faster download. Note that only some mirrors keep track of the pre-release (alpha) tarballs. Also note that if you want to download immediately after an announcement (see Announcements), the mirrors might need some time to synchronize with the main GNU FTP server.

#### 3.2.2 Version controlled source

The publicly distributed Gnuastro tar-ball (for example gnuastro-X.X.tar.gz) does not contain the revision history. It is only a snapshot of the source code at one significant instant of Gnuastro’s history (specified by the version number, see Version numbering), ready to be configured and built. To be able to develop successfully, the revision history of the code can be very useful to track when something was added or changed. Some updates that are not yet officially released might also be in it.

We use Git for the version control of Gnuastro. For those who are not familiar with it, we recommend the ProGit book. The whole book is publicly available for online reading and downloading and does a wonderful job at explaining the concepts and best practices.

Let’s assume you want to keep Gnuastro in the TOPGNUASTRO directory (can be any directory, change the value below). The full version controlled history of Gnuastro can be cloned in TOPGNUASTRO/gnuastro by running the following commands62:

$ TOPGNUASTRO=/home/yourname/Research/projects/
$ cd $TOPGNUASTRO
$ git clone git://git.sv.gnu.org/gnuastro.git

The $TOPGNUASTRO/gnuastro directory will contain hand-written (version controlled) source code for Gnuastro’s programs, libraries, this book and the tests. All are divided into sub-directories with standard and very descriptive names. The version controlled files in the top cloned directory are either mainly in capital letters (for example THANKS and README) or mainly written in lower-case letters (for example configure.ac and Makefile.am). The former are non-programming, standard writing for human readers containing high-level information about the whole package. The latter are instructions to customize the GNU build system for Gnuastro. For more on Gnuastro’s source code structure, please see Developing. We won’t go any deeper here.

The cloned Gnuastro source cannot immediately be configured, compiled, or installed since it only contains hand-written files, not the automatically generated or imported files which do all the hard work of the build process. See Bootstrapping for the process of generating and importing those files (it’s not too hard!). Once you have bootstrapped Gnuastro, you can run the standard procedures (in Quick start). Very soon after you have cloned it, Gnuastro’s main master branch will be updated on the main repository (since the developers are actively working on Gnuastro). For the best practices in keeping your local history in sync with the main repository, see Synchronizing.


#### 3.2.2.1 Bootstrapping

The version controlled source code lacks the source files that we have not written or are automatically built. These automatically generated files are included in the distributed tar ball for each distribution (for example gnuastro-X.X.tar.gz, see Version numbering) and make it easy to immediately configure, build, and install Gnuastro. However from the perspective of version control, they are just bloatware and sources of confusion (since they are not changed by Gnuastro developers).

The process of automatically building and importing necessary files into the cloned directory is known as bootstrapping. All the instructions for an automatic bootstrapping are available in bootstrap and configured using bootstrap.conf. bootstrap and COPYING (which contains the software copyright notice) are the only files not written by Gnuastro developers but under version control to enable simple bootstrapping and legal information on usage immediately after cloning. bootstrap is maintained by the GNU Portability Library (Gnulib) and this file is an identical copy, so do not make any changes in this file since it will be replaced when Gnulib releases an update. Make all your changes in bootstrap.conf.

The bootstrapping process has its own separate set of dependencies, the full list is given in Bootstrapping dependencies. They are generally very low-level and used by a very large set of commonly used programs, so they are probably already installed on your system. The simplest way to bootstrap Gnuastro is to simply run the bootstrap script within your cloned Gnuastro directory as shown below. However, please read the next paragraph before doing so (see Version controlled source for TOPGNUASTRO).

$ cd $TOPGNUASTRO/gnuastro
$ ./bootstrap                      # Requires internet connection


Without any options, bootstrap will clone Gnulib within your cloned Gnuastro directory (TOPGNUASTRO/gnuastro/gnulib) and download the necessary Autoconf archives macros. So if you run bootstrap like this, you will need an internet connection every time you decide to bootstrap. Also, Gnulib is a large package and cloning it can be slow. It will also keep the full Gnulib repository within your Gnuastro repository, so if another one of your projects also needs Gnulib, and you insist on running bootstrap like this, you will have two copies. In case you regularly backup your important files, Gnulib will also slow down the backup process. Therefore while the simple invocation above can be used with no problem, it is not recommended. To do better, see the next paragraph.

The recommended way to get these two packages is thoroughly discussed in Bootstrapping dependencies (in short: clone them in the separate DEVDIR/ directory). The following commands will take you into the cloned Gnuastro directory and run the bootstrap script, while telling it to copy some files (instead of making symbolic links, with the --copy option, this is not mandatory63) and where to look for Gnulib (with the --gnulib-srcdir option). Please note that the address given to --gnulib-srcdir has to be an absolute address (so don’t use ~ or ../ for example).

$ cd $TOPGNUASTRO/gnuastro
$ ./bootstrap --copy --gnulib-srcdir=$DEVDIR/gnulib
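
Because --gnulib-srcdir needs an absolute address, it can help to normalize the directory before calling bootstrap. The following is only a sketch under stated assumptions: to_absolute_dir is a hypothetical helper (it is not part of Gnuastro or Gnulib), and DEVDIR is assumed to be the directory that holds your Gnulib clone. The final line only prints the command, it does not run bootstrap.

```shell
# Hypothetical helper, not part of Gnuastro or Gnulib: print an absolute
# path unchanged; resolve a relative directory against the current one
# (this fails when the relative directory does not exist).
to_absolute_dir() {
    case $1 in
        /*) printf '%s\n' "$1" ;;               # Already absolute.
        *)  (cd -- "$1" 2>/dev/null && pwd) ;;  # Resolve relative to $PWD.
    esac
}

# Assumption: DEVDIR holds the Gnulib clone. It defaults to the current
# directory here only so that the sketch can run anywhere.
DEVDIR=$(to_absolute_dir "${DEVDIR:-.}") || echo "DEVDIR is not usable" >&2
echo "Would run: ./bootstrap --copy --gnulib-srcdir=$DEVDIR/gnulib"
```

Note that a quoted ‘~’ is treated as a relative directory name by this helper, so it is rejected unless a directory with that literal name exists.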


Since Gnulib and Autoconf archives are now available in your local directories, you don’t need an internet connection every time you decide to remove all untracked files and redo the bootstrap (see box below). You can also use the same command on any other project that uses Gnulib. All the necessary GNU C library functions, Autoconf macros and Automake inputs are now available along with the book figures. The standard GNU build system (Quick start) will do the rest of the job.

Undoing the bootstrap: During the development, it might happen that you want to remove all the automatically generated and imported files. In other words, you might want to reverse the bootstrap process. Fortunately Git has a good program for this job: git clean. Run the following command and every file that is not version controlled will be removed.

$ git clean -fxd

It is best to commit any recent change before running this command. You might have created new files since the last commit and if they haven’t been committed, they will all be gone forever (as if deleted with rm). To get a list of the non-version controlled files instead of deleting them, add the n option to git clean, so it becomes -fxdn.
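
To see the effect of git clean without risking any real work, the sketch below builds a throw-away repository in a temporary directory. The file names and the commit identity are made up for illustration, and generated.tmp merely stands in for the files that bootstrapping would create.

```shell
# Throw-away repository so that nothing of yours is touched.
demo_repo=$(mktemp -d)
cd "$demo_repo" || exit 1
git init -q

# One version controlled file ...
echo "hand-written" > tracked.txt
git add tracked.txt
git -c user.name=demo -c user.email=demo@example.org \
    commit -q -m "Initial commit"

# ... and one untracked file, standing in for bootstrapped output.
echo "generated" > generated.tmp

git clean -fxdn    # Dry run: only lists what would be removed.
git clean -fxd     # Actually removes the untracked file.
ls                 # Only the version controlled file should remain.
```

Running the dry run first is a cheap way to check that nothing you still need is about to be deleted.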

Besides bootstrap and bootstrap.conf, the bootstrapped/ directory and the README-hacking file are also related to the bootstrapping process. The former hosts all the imported (bootstrapped) directories. Thus, in the version controlled source, it only contains a README file, but in the distributed tar-ball it also contains sub-directories filled with all bootstrapped files. README-hacking contains a summary of the bootstrapping process discussed in this section. It is a necessary reference when you haven’t built this book yet. It is thus not distributed in the Gnuastro tarball.


#### 3.2.2.2 Synchronizing

The bootstrapping script (see Bootstrapping) is not regularly needed: you mainly need it after you have cloned Gnuastro (once) and whenever you want to re-import the files from Gnulib, or Autoconf archives64 (not too common). However, Gnuastro developers are constantly working on Gnuastro and are pushing their changes to the official repository. Therefore, your local Gnuastro clone will soon be out-dated. Gnuastro has two mailing lists dedicated to its developing activities (see Developing mailing lists). Subscribing to them can help you decide when to synchronize with the official repository.

To pull all the most recent work in Gnuastro, run the following command from the top Gnuastro directory. If you don’t already have a built system, ignore make distclean. The separate steps are described in detail afterwards.

$ make distclean && git pull && autoreconf -f

You can also run the commands separately:

$ make distclean
$ git pull
$ autoreconf -f


If Gnuastro was already built in this directory, you don’t want some outputs from the previous version being mixed with outputs from the newly pulled work. Therefore, the first step is to clean/delete all the built files with make distclean. Fortunately the GNU build system allows the separation of source and built files (in separate directories). This is a great feature to keep your source directory clean and you can use it to avoid the cleaning step. Gnuastro comes with a script with some useful options for this job. It is useful if you regularly pull recent changes, see Separate build and source directories.

After the pull, we must re-configure Gnuastro with autoreconf -f (part of GNU Autoconf). It will update the ./configure script and all the Makefile.in65 files based on the hand-written configurations (in configure.ac and the Makefile.am files). After running autoreconf -f, a warning about TEXI2DVI might show up, you can ignore that.

The most important reason for re-building Gnuastro’s build system is to generate/update the version number for your updated Gnuastro snapshot. This generated version number will include the commit information (see Version numbering). The version number is included in nearly all outputs of Gnuastro’s programs, therefore it is vital for reproducing an old result.

As a summary, be sure to run ‘autoreconf -f’ after every change in the Git history. This includes synchronization with the main server or even a commit you have made yourself.

If you would like to see what has changed since you last synchronized your local clone, you can take the following steps instead of the simple command above (don’t type anything after #):

$ git checkout master            # Confirm if you are on master.
$ git fetch origin               # Fetch all new commits from server.
$ git log master..origin/master  # See all the new commit messages.
$ git merge origin/master        # Update your master branch.
$ autoreconf -f                  # Update the build system.

By default git log prints the most recent commit first, add the --reverse option to see the changes chronologically. To see exactly what has been changed in the source code along with the commit message, add a -p option to the git log.

If you want to make changes in the code, have a look at Developing to get started easily. Be sure to commit your changes in a separate branch (keep your master branch to follow the official repository) and re-run autoreconf -f after the commit. If you intend to send your work to us, you can safely use your commit since it will be ultimately recorded in Gnuastro’s official history. If not, please upload your separate branch to a public hosting service, for example GitLab, and link to it in your report/paper. Alternatively, run make distcheck and upload the output gnuastro-X.X.X.XXXX.tar.gz to a publicly accessible webpage so your results can be considered scientific (reproducible) later.

### 3.3 Build and install

This section is basically a longer explanation of the sequence of commands given in Quick start. If you didn’t have any problems during the Quick start steps, you want to have all the programs of Gnuastro installed in your system, you don’t want to change the executable names during or after installation, you have root access to install the programs in the default system wide directory, the Letter paper size of the print book is fine for you, or as a summary you don’t feel like going into the details when everything is working, you can safely skip this section.

If you have any of the above problems or you want to understand the details for a better control over your build and install, read along. The dependencies which you will need prior to configuring, building and installing Gnuastro are explained in Dependencies.
The first three steps in Quick start need no extra explanation, so we will skip them and start with an explanation of Gnuastro specific configuration options and a discussion on the installation directory in Configuring. This is followed by some smaller subsections: Tests, A4 print book, and Known issues, which explains the solutions to known problems you might encounter in the installation steps.

#### 3.3.1 Configuring

The $ ./configure step is the most important step in the build and install process. All the required packages, libraries, headers and environment variables are checked in this step. The behaviors of make and make install can also be set through command line options to this command.

The configure script accepts various arguments and options which enable the final user to highly customize whatever she is building. The options to configure are generally very similar to normal program options explained in Arguments and options. Similar to all GNU programs, you can get a full list of the options along with a short explanation by running

$ ./configure --help

A complete explanation is also included in the INSTALL file. Note that this file was written by the authors of GNU Autoconf (which builds the configure script), therefore it is common for all programs which use the $ ./configure script for building and installing, not just Gnuastro. Here we only discuss the cases where you don’t have super-user access to the system and where you want to change the executable names. But before that, the options to configure that are particular to Gnuastro are reviewed.


#### 3.3.1.1 Gnuastro configure options

Most of the options to configure (which are to do with building) are similar for every program which uses this script. Here the options that are particular to Gnuastro are discussed. The next topics explain the usage of other configure options which can be applied to any program using the GNU build system (through the configure script).

--enable-debug

Compile/build Gnuastro with debugging information, no optimization and without shared libraries.

In order to allow more efficient programs when using Gnuastro (after the installation), by default Gnuastro is built with a 3rd level (a very high level) optimization and no debugging information. By default, libraries are also built for static and shared linking (see Linking). However, when there are crashes or unexpected behavior, these three features can hinder the process of localizing the problem. This configuration option is identical to manually calling the configuration script with CFLAGS="-g -O0" --disable-shared.

In the (rare) situations where you need to do your debugging on the shared libraries, don’t use this option. Instead run the configure script by explicitly setting CFLAGS like this:

$ ./configure CFLAGS="-g -O0"

--enable-check-with-valgrind

Do the make check tests through Valgrind. Therefore, if any crashes or memory-related issues (segmentation faults in particular) occur in the tests, the output of Valgrind will also be put in the tests/test-suite.log file without having to manually modify the check scripts. This option will also activate Gnuastro’s debug mode (see the --enable-debug configure-time option described above).

Valgrind is free software. It is a program for easy checking of memory-related issues in programs. It runs a program within its own controlled environment and can thus identify the exact line-number in the program’s source where a memory-related issue occurs. However, it can significantly slow down the tests. So this option is only useful when a segmentation fault is found during make check.

--enable-progname

Only build and install progname along with any other program that is enabled in this fashion. progname is the name of the executable without the ast, for example crop for Crop (with the executable name of astcrop).

Note that by default all the programs will be installed. This option (and the --disable-progname options) are only relevant when you don’t want to install all the programs. Therefore, if this option is called for any of the programs in Gnuastro, any program which is not explicitly enabled will not be built or installed.

--disable-progname
--enable-progname=no

Do not build or install the program named progname. This is very similar to the --enable-progname, but will build and install all the other programs except this one.

Note: If some programs are enabled and some are disabled, it is equivalent to simply enabling those that were enabled. Listing the disabled programs is redundant.

--enable-gnulibcheck

Enable checks on the GNU Portability Library (Gnulib). Gnulib is used by Gnuastro to enable users of non-GNU based operating systems (that don’t use GNU C library or glibc) to compile and use the advanced features that this library provides. We make extensive use of such functions. If you give this option to $ ./configure, when you run $ make check, first the functions in Gnulib will be tested, then the Gnuastro executables. If your operating system does not support glibc or has an older version of it and you have problems in the build process ($ make), you can give this flag to configure to see if the problem is caused by Gnulib not supporting your operating system or Gnuastro, see Known issues.

--disable-guide-message
--enable-guide-message=no

Do not print a guiding message during the GNU Build process of Quick start. By default, after each step, a message is printed guiding the user what the next command should be. Therefore, after ./configure, it will suggest running make. After make, it will suggest running make check and so on. If Gnuastro is configured with this option, for example

$ ./configure --disable-guide-message

Then these messages will not be printed after any step (like most programs). For people who are not yet fully accustomed to this build system, these guidelines can be very useful and encouraging. However, if you find those messages annoying, use this option.

--without-libgit2

Build Gnuastro without libgit2 (for including Git commit hashes in output files), see Optional dependencies. libgit2 is an optional dependency. With this option, Gnuastro will ignore any possibly existing libgit2 that may already be on the system.

--without-libjpeg

Build Gnuastro without libjpeg (for reading/writing to JPEG files), see Optional dependencies. libjpeg is an optional dependency. With this option, Gnuastro will ignore any possibly existing libjpeg that may already be on the system.

--without-libtiff

Build Gnuastro without libtiff (for reading/writing to TIFF files), see Optional dependencies. libtiff is an optional dependency. With this option, Gnuastro will ignore any possibly existing libtiff that may already be on the system.

The tests of some programs might depend on the outputs of the tests of other programs. For example MakeProfiles is one of the first programs to be tested when you run $ make check. MakeProfiles’ test outputs (FITS images) are inputs to many other programs (which in turn provide inputs for other programs). Therefore, if you don’t install MakeProfiles for example, the tests for many of the other programs will be skipped. To avoid this, in one run, you can build all the programs and run the tests, but not install. If everything is working correctly, you can run configure again with only the programs you want. This time, however, don’t run the tests; directly install after building.


#### 3.3.1.2 Installation directory

One of the most commonly used options to ./configure is --prefix. It is used to define the directory that will host all the installed files (or the “prefix” in their final absolute file name). It is useful, for example, when you are using a server and you don’t have administrator or root access. In this scenario, if you don’t use the --prefix option, you won’t be able to install the built files, and thus won’t be able to access them from anywhere without having to worry about where they are installed. However, once you prepare your startup file to look into the proper place (as discussed thoroughly below), you will be able to easily use this option and benefit from any software you want to install without having to ask the system administrators. It also lets you install and use a different version of software that is already installed on the server.

The most basic way to run an executable is to explicitly write its full file name (including all the directory information) and run it. One example is running the configuration script with the $ ./configure command (see Quick start). By giving a specific directory (the current directory or ./), we are explicitly telling the shell to look in the current directory for an executable file named ‘configure’. Directly specifying the directory is thus useful for executables in the current (or nearby) directories. However, when the program (an executable file) is to be used a lot, specifying all those directories will become a significant burden. For example, the ls executable lists the contents in a given directory and it is (usually) installed in the /usr/bin/ directory by the operating system maintainers. Therefore, if using the full address was the only way to access an executable, each time you wanted a listing of a directory, you would have to run the following command (which is very inconvenient, both in writing and in remembering the various directories).

$ /usr/bin/ls


To address this problem, we have the PATH environment variable. To understand it better, we will start with a short introduction to the shell variables. Shell variable values are basically treated as strings of characters. For example, it doesn’t matter if the value is a name (string of alphabetic characters), or a number (string of numeric characters), or both. You can define a variable and a value for it by running

$ myvariable1=a_test_value
$ myvariable2="a test value"


As you see above, if the value contains white space characters, you have to put the whole value (including white space characters) in double quotes ("). You can see the value it represents by running

$ echo $myvariable1
$ echo $myvariable2


If a variable has no value or it wasn’t defined, the last command will only print an empty line. A variable defined like this will be known as long as this shell or terminal is running. Other terminals will have no idea it existed. The main advantage of shell variables is that if they are exported66, subsequent programs that are run within that shell can access their value. So by changing their value, you can change the “environment” of a program which uses them. The shell variables which are accessed by programs are therefore known as “environment variables”67. You can see the full list of exported variables that your shell recognizes by running:

$ printenv

HOME is one commonly used environment variable, it is any user’s (the one that is logged in) top directory. Try finding it in the command above. It is used so often that the shell has a special expansion (alternative) for it: ‘~’. Whenever you see file names starting with the tilde sign, it actually represents the value of the HOME environment variable, so ~/doc is the same as $HOME/doc.
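
The difference between a plain shell variable and an exported (environment) variable can be checked directly. The sketch below reuses the two variable names from the text; seen_plain and seen_exported are names introduced here only for illustration. It starts a child shell with sh -c and records what that child can see.

```shell
unset myvariable1 myvariable2        # Start from a clean state.

myvariable1=a_test_value             # Plain shell variable.
export myvariable2="a test value"    # Exported: an environment variable.

# A child process only sees what was exported.
seen_plain=$(sh -c 'printf "%s" "$myvariable1"')
seen_exported=$(sh -c 'printf "%s" "$myvariable2"')

echo "child sees myvariable1: '$seen_plain'"
echo "child sees myvariable2: '$seen_exported'"

# The tilde and HOME name the same directory.
echo ~/doc
echo "$HOME/doc"
```

The first echo should show an empty value between the quotes, since myvariable1 was never exported, while the second shows the exported value.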

Another one of the most commonly used environment variables is PATH, it is a list of directories to search for executable names. Its value is a list of directories (separated by a colon, or ‘:’). When the address of the executable is not explicitly given (like ./configure above), the system will look for the executable in the directories specified by PATH. If you have a computer nearby, try running the following command to see which directories your system will look into when it is searching for executable (binary) files, one example is printed here (notice how /usr/bin, in the ls example above, is one of the directories in PATH):

$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/bin
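
The search that the shell performs over PATH can be imitated in a few lines. This is only a sketch of the idea, not how any particular shell implements it: find_in_path is a hypothetical helper, its variables are not local (plain POSIX sh has no local variables), and it assumes that the directory names in PATH contain no wildcard characters.

```shell
# Hypothetical helper: mimic how the shell walks PATH to find a program.
find_in_path() {
    name=$1
    old_ifs=$IFS
    IFS=:                            # Split PATH on colons.
    for dir in $PATH; do
        IFS=$old_ifs
        candidate=${dir:-.}/$name    # An empty entry means the current directory.
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"   # First match wins, as in the shell.
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

find_in_path ls || echo "ls is not in PATH"
```

Because the first matching directory wins, the order of the directories in PATH decides which program runs when two directories contain an executable with the same name.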


By default PATH usually contains system-wide directories, which are readable (but not writable) by all users, like the above example. Therefore if you don’t have root (or administrator) access, you need to add another directory to PATH which you actually have write access to. The standard directory where you can keep installed files (not just executables) for your own user is the ~/.local/ directory. The names of hidden files start with a ‘.’ (dot), so it will not show up in your common command-line listings, or on the graphical user interface. You can use any other directory, but this is the most recognized.

The top installation directory will be used to keep all the package’s components: programs (executables), libraries, include (header) files, shared data (like manuals), or configuration files (see Review of library fundamentals for a thorough introduction to headers and linking). So it commonly has some of the following sub-directories for each class of installed components respectively: bin/, lib/, include/, man/, share/, etc/. Since the PATH variable is only used for executables, you can add the ~/.local/bin directory (which keeps the executables/programs or more generally, “binary” files) to PATH with the following command. As defined below, first the existing value of PATH is used, then your given directory is added to its end and the combined value is put back in PATH (run ‘$ echo $PATH’ afterwards to check if it was added).

$ PATH=$PATH:~/.local/bin
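
To see the effect of such an addition without touching your real ~/.local, the following sketch uses a throw-away directory in its place. The directory and the program name gnuastro-path-demo are hypothetical and only exist for this demonstration:

```shell
# A temporary directory standing in for ~/.local/bin.
demo_dir=$(mktemp -d)

# A tiny (hypothetical) executable inside it.
printf '#!/bin/sh\necho "hello from demo"\n' > "$demo_dir/gnuastro-path-demo"
chmod +x "$demo_dir/gnuastro-path-demo"

# Append the directory to PATH, exactly like the command above.
PATH=$PATH:$demo_dir

# The program can now be called by name alone, without its full address.
demo_output=$(gnuastro-path-demo)
echo "$demo_output"

# Clean up the temporary directory.
rm -r "$demo_dir"
```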


Any executable that you installed in ~/.local/bin will now be usable without having to remember and write its full address. However, as soon as you leave/close your current terminal session, this modified PATH variable will be forgotten. Adding the directories which contain executables to the PATH environment variable each time you start a terminal is also very inconvenient and prone to errors. Fortunately, there are standard ‘startup files’ defined by your shell precisely for this (and other) purposes. There is a special startup file for every significant starting step:

/etc/profile and everything in /etc/profile.d/

These startup scripts are called when your whole system starts (for example after you turn on your computer). Therefore you need administrator or root privileges to access or modify them.

~/.bash_profile

If you are using (GNU) Bash as your shell, the commands in this file are run when you log in to your account through Bash. This most commonly happens when you log in through the virtual console (where there is no graphical user interface).

~/.bashrc

If you are using (GNU) Bash as your shell, the commands here will be run each time you start a terminal and are already logged in. For example, when you open your terminal emulator in the graphic user interface.

For security reasons, it is highly recommended to directly type in your HOME directory value by hand in startup files instead of using variables. So in the following, let’s assume your user name is ‘name’ (so ~ may be replaced with /home/name). To add ~/.local/bin to your PATH automatically on any startup file, you have to “export” the new value of PATH in the startup file that is most relevant to you by adding this line:

export PATH=$PATH:/home/name/.local/bin

Now that you know your system will look into ~/.local/bin for executables, you can tell Gnuastro’s configure script to install everything in the top ~/.local directory using the --prefix option. When you subsequently run $ make install, all the install-able files will be put in their respective directory under ~/.local/ (the executables in ~/.local/bin, the compiled library files in ~/.local/lib, the library header files in ~/.local/include and so on, to learn more about these different files, please see Review of library fundamentals). Note that tilde (‘~’) expansion will not happen if you put a ‘=’ between --prefix and ~/.local, so we have avoided the = character here which is optional in GNU-style options, see Options.

$ ./configure --prefix ~/.local

You can install everything (including libraries like GSL, CFITSIO, or WCSLIB which are Gnuastro’s mandatory dependencies, see Mandatory dependencies) locally by configuring them as above. However, recall that PATH is only for executable files, not libraries, and that libraries can also depend on other libraries. For example WCSLIB depends on CFITSIO and Gnuastro needs both. Therefore, when you install a library in a non-recognized directory, you have to guide the programs that depend on it to look into the necessary library and header file directories. To do that, you have to define the LDFLAGS and CPPFLAGS environment variables respectively. This can be done while calling ./configure as shown below:

$ ./configure LDFLAGS=-L/home/name/.local/lib            \
              CPPFLAGS=-I/home/name/.local/include       \
              --prefix ~/.local


It can be annoying/buggy to do this when configuring every piece of software that depends on such libraries. Hence, you can define these two variables in the most relevant startup file (discussed above). The convention on using these variables doesn’t include a colon to separate values (as PATH-like variables do): they use white space characters and each value is prefixed with a compiler option. Note the -L and -I above (see Options), for -I see Headers, and for -L, see Linking. Therefore the value has to be kept within double quotation marks to preserve the white space characters. Add the following two lines to the startup file of your choice:

export LDFLAGS="$LDFLAGS -L/home/name/.local/lib"
export CPPFLAGS="$CPPFLAGS -I/home/name/.local/include"
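
If LDFLAGS or CPPFLAGS had no previous value, the two lines above leave a (harmless) leading space in the result. As an optional variation, not something Gnuastro requires, the sketch below uses the shell’s ${VAR:+...} expansion to add the separating space only when there is an existing value. The /home/name prefix is the same placeholder as in the text and the variable name prefix is only illustrative:

```shell
prefix=/home/name/.local      # placeholder home directory, as in the text

# "${VAR:+text}" expands to "text" only when VAR is set and non-empty,
# so the separating space is added only when it is needed.
export LDFLAGS="${LDFLAGS:+$LDFLAGS }-L$prefix/lib"
export CPPFLAGS="${CPPFLAGS:+$CPPFLAGS }-I$prefix/include"

echo "LDFLAGS=$LDFLAGS"
echo "CPPFLAGS=$CPPFLAGS"
```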


Dynamic libraries are linked to the executable every time you run a program that depends on them (see Linking to fully understand this important concept). Hence dynamic libraries also require a special path variable called LD_LIBRARY_PATH (same formatting as PATH). To use programs that depend on these libraries, you need to add ~/.local/lib to your LD_LIBRARY_PATH environment variable by adding the following line to the relevant start-up file:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/name/.local/lib

If you also want to access the Info (see Info) and man pages (see Man pages) documentation, add ~/.local/share/info and ~/.local/share/man to your INFOPATH and MANPATH environment variables respectively.

A final note is that order matters in the directories that are searched for all the variables discussed above. In the examples above, the new directory was added after the system specified directories. So if the program, library or manuals are found in the system wide directories, the user directory is no longer searched. If you want to search your local installation first, put the new directory before the already existing list, like the example below.

export LD_LIBRARY_PATH=/home/name/.local/lib:$LD_LIBRARY_PATH


This is good when a library, for example CFITSIO, is already present on the system, but the system-wide install wasn’t configured with the correct configuration flags (see CFITSIO), or you want to use a newer version and you don’t have administrator or root access to update it on the whole system/server. If you update LD_LIBRARY_PATH by placing ~/.local/lib first (like above), the linker will first find the CFITSIO you installed for yourself and link with it. It thus will never reach the system-wide installation.

There are important security problems with using local installations first: all important system-wide executables and libraries (important executables like ls and cp, or libraries like the C library) can be replaced by non-secure versions with the same file names and put in the customized directory (~/.local in this example). So if you choose to search in your customized directory first, please be sure to keep it clean from executables or libraries with the same names as important system programs or libraries.
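
The effect of the search order (and the reason for the caution above) can be tried safely with PATH and two throw-away directories. This is only a sketch: the directories and the program name order-demo are hypothetical stand-ins for a system-wide directory, a user directory and a program that exists in both:

```shell
# Two temporary directories standing in for a system-wide directory
# and a user directory.
system_dir=$(mktemp -d)
user_dir=$(mktemp -d)

# The same (hypothetical) program name exists in both of them.
printf '#!/bin/sh\necho "system version"\n' > "$system_dir/order-demo"
printf '#!/bin/sh\necho "user version"\n'   > "$user_dir/order-demo"
chmod +x "$system_dir/order-demo" "$user_dir/order-demo"

# User directory AFTER the system one: the system version is found first.
# (The PATH change stays inside the command substitution's subshell.)
appended=$(PATH="$system_dir:$user_dir:$PATH"; order-demo)

# User directory BEFORE the system one: the user version is found first.
prepended=$(PATH="$user_dir:$system_dir:$PATH"; order-demo)

echo "user directory last:  $appended"
echo "user directory first: $prepended"

# Clean up.
rm -r "$system_dir" "$user_dir"
```

Whichever directory comes first in the list decides which file is run, so a same-named file in a directory you search first silently shadows the system one.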

Summary: When you are using a server which doesn’t give you administrator/root access AND you would like to give priority to your own built programs and libraries, not the version that is (possibly already) present on the server, add these lines to your startup file. See above for which startup file is best for your case and for a detailed explanation on each. Don’t forget to replace ‘/YOUR-HOME-DIR’ with your home directory (for example ‘/home/your-id’):

export PATH="/YOUR-HOME-DIR/.local/bin:$PATH"
export LDFLAGS="-L/YOUR-HOME-DIR/.local/lib $LDFLAGS"
export MANPATH="/YOUR-HOME-DIR/.local/share/man/:$MANPATH"
export CPPFLAGS="-I/YOUR-HOME-DIR/.local/include $CPPFLAGS"
export INFOPATH="/YOUR-HOME-DIR/.local/share/info/:$INFOPATH"
export LD_LIBRARY_PATH="/YOUR-HOME-DIR/.local/lib:$LD_LIBRARY_PATH"

Afterwards, you just need to add an extra --prefix=/YOUR-HOME-DIR/.local to the ./configure command of the software that you intend to install. Everything else will be the same as a standard build and install, see Quick start.
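
When such lines are added to a startup file by a script rather than by hand, it helps to make sure each line is only added once. The following is a minimal sketch of one way to do this: the function name add_once is hypothetical, and a temporary file stands in for the real startup file (for example ~/.bashrc) so nothing on your system is modified:

```shell
# A temporary file standing in for the real startup file.
startup_file=$(mktemp)

# Append a line only if exactly that line is not already in the file.
add_once () {
    grep -qxF -e "$1" "$startup_file" || printf '%s\n' "$1" >> "$startup_file"
}

# Single quotes keep $PATH literal, so it is only expanded later, when
# the startup file is read by the shell.
add_once 'export PATH="/YOUR-HOME-DIR/.local/bin:$PATH"'
add_once 'export PATH="/YOUR-HOME-DIR/.local/bin:$PATH"'   # second call adds nothing

# Count the lines that ended up in the file, then show and remove it.
line_count=$(grep -c '' "$startup_file")
cat "$startup_file"
rm "$startup_file"
```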


#### 3.3.1.3 Executable names

At first sight, the names of the executables for each program might seem to be uncommonly long, for example astnoisechisel or astcrop. We could have chosen terse (and cryptic) names like most programs do. We chose this complete naming convention (something like the commands in TeX) so you don’t have to spend too much time remembering what the name of a specific program was. Such complete names also enable you to easily search for the programs.

To facilitate typing the names in, we suggest using the shell auto-complete. With this facility you can find the executable you want very easily. It is very similar to file name completion in the shell. For example, simply by typing the letters below (where [TAB] stands for the Tab key on your keyboard)

$ ast[TAB][TAB]

you will get the list of all the available executables that start with ast in your PATH environment variable directories. So, all the Gnuastro executables installed on your system will be listed. Typing the next letter for the specific program you want along with a Tab, will limit this list until you get to your desired program.

In case all of this does not convince you and you still want to type short names, some suggestions are given below. You should have in mind though, that if you are writing a shell script that you might want to pass on to others, it is best to use the standard name because other users might not have adopted the same customization. The long names also serve as a form of documentation in such scripts. A similar reasoning can be given for option names in scripts: it is good practice to always use the long formats of the options in shell scripts, see Options.

The simplest solution is making a symbolic link to the actual executable. For example let’s assume you want to type ic to run Crop instead of astcrop. Assuming you installed Gnuastro executables in /usr/local/bin (default) you can do this simply by running the following command as root:

# ln -s /usr/local/bin/astcrop /usr/local/bin/ic

In case you update Gnuastro and a new version of Crop is installed, the default executable name is the same, so your custom symbolic link still works.

The installed executable names can also be set using options to $ ./configure, see Configuring. GNU Autoconf (which configures Gnuastro for your particular system), allows the builder to change the name of programs with the three options --program-prefix, --program-suffix and --program-transform-name. The first two are for adding a fixed prefix or suffix to all the programs that will be installed. This will actually make all the names longer! You can use it to add versions of program names to the programs in order to simultaneously have two executable versions of a program.

The third configure option allows you to set the executable name at install time using the SED program. SED is a very useful ‘stream editor’. There are various resources on the internet to use it effectively. However, we should caution that using configure options will change the actual executable name of the installed program and on every re-install (an update for example), you have to also add this option to keep the old executable name updated. Also note that the documentation or configuration files do not change from their standard names either.
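
Since the value of this option is an ordinary SED expression, its effect can be previewed on a few program names before it is given to configure. The sketch below does this for two names taken from this section. The expression 's/^ast//' (remove a leading ast) is only an illustrative assumption chosen for this preview, not a recommendation from Gnuastro:

```shell
# Illustrative SED expression: strip a leading "ast" from a name.
sed_expr='s/^ast//'

renamed=""
for name in astcrop astnoisechisel; do
    # Apply the expression to one program name.
    new=$(printf '%s\n' "$name" | sed "$sed_expr")
    echo "$name -> $new"

    # Collect the new names in a space-separated list.
    renamed="${renamed:+$renamed }$new"
done
```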

For example, let’s assume that typing ast on every invocation of every program is really annoying you! You can remove this prefix from all the executables at configure time by adding this option:

$ ./configure --program-transform-name='s/ast/ /'

#### 3.3.1.4 Configure and build in RAM

Gnuastro’s configure and build process (the GNU build system) involves the creation, reading, and modification of a large number of files (input/output, or I/O). Therefore file I/O issues can directly affect the work of developers who need to configure and build Gnuastro numerous times. Some of these issues are listed below:

• I/O will cause wear and tear on both the HDDs (mechanical failures) and SSDs (decreasing the lifetime).
• Having the built files mixed with the source files can greatly affect backing up (synchronization) of source files (since it involves the management of a large number of small files that are regularly changed). Backup software can of course be configured to ignore the built files and directories. However, since the built files are mixed with the source files and can have a large variety, this will require a high level of customization.

One solution to address both these problems is to use the tmpfs file system. Any file in tmpfs is actually stored in the RAM (and possibly swap), not on HDDs or SSDs. The RAM is built for extensive and fast I/O. Therefore the large number of file I/Os associated with configuring and building will not harm the HDDs or SSDs. Due to the volatile nature of RAM, files in the tmpfs file-system will be permanently lost after a power-off. Since all configured and built files are derivative files (not files that have been directly written by hand) there is no problem in this and this feature can be considered as an automatic cleanup.

The modern GNU C library (and thus the Linux kernel) defines the /dev/shm directory for this purpose in the RAM (POSIX shared memory). To build in it, you can use the GNU build system’s ability to build in a separate directory (not necessarily in the source directory) as shown below.
Just set SRCDIR as the address of Gnuastro’s top source directory (for example, the unpacked tarball).

$ mkdir /dev/shm/tmp-gnuastro-build
$ cd /dev/shm/tmp-gnuastro-build
$ SRCDIR/configure --srcdir=SRCDIR
$ make

Gnuastro comes with a script to simplify this process of configuring and building in a different directory (a “clean” build), for more see Separate build and source directories.

#### 3.3.2 Separate build and source directories

The simple steps of Quick start will mix the source and built files. This can cause inconvenience for developers or enthusiasts following the most recent work (see Version controlled source). The current section is mainly focused on this latter group of Gnuastro users. If you just install Gnuastro on major releases (following Announcements), you can safely ignore this section.

When it is necessary to keep the source (which is under version control), but not the derivative (built) files (after checking or installing), the best solution is to keep the source and the built files in separate directories. One application of this is already discussed in Configure and build in RAM.

To facilitate this process of configuring and building in a separate directory, Gnuastro comes with the developer-build script. It is available in the top source directory and is not installed. It will make a directory under a given top-level directory (given to --top-build-dir) and build Gnuastro in that directory. It thus keeps the source completely separated from the built files. For easy access to the built files, it also makes a symbolic link to the built directory in the top source files called build.

When run without any options, default values will be used for its configuration. As with Gnuastro’s programs, you can inspect the default values with -P (or --printparams, the output just looks a little different here). The default top-level build directory is /dev/shm: the shared memory directory in RAM on GNU/Linux systems as described in Configure and build in RAM.
Besides these, it also has some features to facilitate the job of developers or bleeding edge users like the --debug option to do a fast build, with debug information, no optimization, and no shared libraries. Here is the full list of options you can feed to this script to configure its operations.

Not all Gnuastro’s common program behavior is usable here: developer-build is just a non-installed script with a very limited scope as described above. It thus doesn’t have all the common option behaviors or configuration files for example.

White space between option and value: developer-build doesn’t accept an = sign between the options and their values. It also needs at least one character between the option and its value. Therefore -n 4 or --numthreads 4 are acceptable, while -n4, -n=4, or --numthreads=4 aren’t. Finally multiple short option names cannot be merged: for example you can say -c -n 4, but unlike Gnuastro’s programs, -cn4 is not acceptable.

Reusable for other packages: This script can be used in any software which is configured and built using the GNU Build System. Just copy it in the top source directory of that software and run it from there.

-b STR
--top-build-dir STR

The top build directory, under which a directory will be made for the build. If this option isn’t called, the top build directory is /dev/shm (only available in GNU/Linux operating systems, see Configure and build in RAM).

-V
--version

Print the version string of Gnuastro that will be used in the build. This string will be appended to the directory name containing the built files.

-a
--autoreconf

Run autoreconf -f before building the package. In Gnuastro, this is necessary when a new commit has been made to the project history. In Gnuastro’s build system, the Git description will be used as the version, see Version numbering and Synchronizing.

-c
--clean

Delete the contents of the build directory (clean it) before starting the configuration and building of this run.

This is useful when you have recently pulled changes from the main Git repository, or committed a change yourself and ran autoreconf -f, see Synchronizing. After running GNU Autoconf, the version will be updated and you need to do a clean build.

-d
--debug

Build with debugging flags (for example to use in GNU Debugger, also known as GDB, or Valgrind), disable optimization and also the building of shared libraries. This is similar to running the configure script as below:

$ ./configure --enable-debug


Besides all the debugging advantages of building with this option, it will also significantly speed up the build (at the cost of slower built programs). So when you are testing something small or working on the build system itself, it will be much faster to test your work with this option.

-v
--valgrind

Build all make check tests within Valgrind. For more, see the description of --enable-check-with-valgrind in Gnuastro configure options.

-j INT
--jobs INT

The maximum number of threads/jobs for Make to build at any moment. As the name suggests (Make has an identical option), the number given to this option is directly passed on to any call of Make with its -j option.

-C
--check

After finishing the build, also run make check. By default, make check isn’t run because the developer usually has their own checks to work on (for example defined in tests/during-dev.sh).

-i
--install

After finishing the build, also run make install.

-D
--dist

Run make dist-lzip pdf to build a distribution tarball (in .tar.lz format) and a PDF manual. This can be useful for archiving, or sending to colleagues who don’t use Git for an easy build and manual.

-u STR
--upload STR

Activate the --dist (-D) option, then use secure copy (scp, part of the SSH tools) to copy the tarball and PDF to the src and pdf sub-directories of the specified server and its directory (value to this option). For example --upload my-server:dir will copy the tarball into dir/src, and the PDF manual into dir/pdf, on the my-server server. It will then make a symbolic link in the top server directory to the tarball that is called gnuastro-latest.tar.lz.

-p
--publish

Short for --autoreconf --clean --debug --check --upload STR. --debug is added because it will greatly speed up the build. It will have no effect on the produced tarball. This is good when you have made a commit and are ready to publish it on your server (if nothing crashes). Recall that if any of the previous steps fail the script aborts.

-I
--install-archive

Short for --autoreconf --clean --check --install --dist. This is useful when you actually want to install the commit you just made (if the build and checks succeed). It will also produce a distribution tarball and PDF manual for easy access to the installed tarball on your system at a later time.

Ideally, Gnuastro’s Git version history makes it easy for a prepared system to revert back to a different point in history. But Gnuastro also needs to bootstrap files and also your collaborators might (usually do!) find it too much of a burden to do the bootstrapping themselves. So it is convenient to have a tarball and PDF manual of the version you have installed (and are using in your research) handily available.

-h
--help
-P
--printparams

Print a description of this script along with all the options and their current values.
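
As a rough sketch of the manual route described in Configure and build in RAM, the lines below choose a top build directory and make a fresh build directory under it. The fallback to TMPDIR or /tmp is an assumption added here for systems without /dev/shm, and the directory name pattern and variable names are only illustrative. The configure and make steps are left as comments because they need Gnuastro’s source directory (SRCDIR in the text):

```shell
# Prefer the RAM-backed /dev/shm; otherwise fall back to a normal
# temporary directory (this fallback is an assumption of this sketch).
if [ -d /dev/shm ] && [ -w /dev/shm ]; then
    top_build_dir=/dev/shm
else
    top_build_dir=${TMPDIR:-/tmp}
fi

# Make a uniquely named build directory under the chosen top directory.
build_dir=$(mktemp -d "$top_build_dir/tmp-gnuastro-build.XXXXXX")
echo "Build directory: $build_dir"

# In a real build, the next steps would be run from inside it:
#   cd "$build_dir"
#   SRCDIR/configure --srcdir=SRCDIR
#   make

# Nothing was built in this sketch, so remove the empty directory.
rmdir "$build_dir"
```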


#### 3.3.3 Tests

After successfully building (compiling) the programs with the $ make command you can check the installation before installing. To run the tests, run

$ make check


For every program some tests are designed to check some possible operations. Running the command above will run those tests and give you a final report. If everything is OK and you have built all the programs, all the tests should pass. In case any of the tests fail, please have a look at Known issues and if that still doesn’t fix your problem, look at the ./tests/test-suite.log file to see if the source of the error is something particular to your system or more general. If you feel it is general, please contact us because it might be a bug. Note that the tests of some programs depend on the outputs of other programs’ tests, so if you have not installed them they might be skipped or fail. Prior to releasing every distribution all these tests are checked. If you have a reasonably modern terminal, the outputs of the successful tests will be colored green and the failed ones will be colored red.
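
When the log is long, it can help to first list only the tests that did not pass. The sketch below assumes that failed tests are reported in the log on lines that start with ‘FAIL:’ or ‘ERROR:’; please check this against your own log, since it is an assumption about the log’s layout. The variable names are only illustrative:

```shell
# Path given in the text, relative to the build directory.
log=./tests/test-suite.log

if [ -f "$log" ]; then
    # Assumption: failed tests are reported on lines that start with
    # "FAIL:" or "ERROR:".
    grep -E '^(FAIL|ERROR):' "$log" || echo "No failed tests recorded in $log"
    log_status=found
else
    echo "No $log here; run 'make check' in the build directory first."
    log_status=missing
fi
```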

These scripts can also act as a good set of examples for you to see how the programs are run. All the tests are in the tests/ directory. The tests for each program are shell scripts (ending with .sh) in a sub-directory of this directory with the same name as the program. See Test scripts for more detailed information about these scripts in case you want to inspect them.


#### 3.3.4 A4 print book

The default print version of this book is provided in the letter paper size. If you would like to have the print version of this book on paper and you are living in a country which uses A4, then you can rebuild the book. The great thing about the GNU build system is that the book source code which is in Texinfo is also distributed with the program source code, enabling you to do such customization (hacking).

In order to change the paper size, you will need to have GNU Texinfo installed. Open doc/gnuastro.texi with any text editor. This is the source file that created this book. In the first few lines you will see this line:

@c@afourpaper


In Texinfo, a line is commented with @c. Therefore, un-comment this line by deleting the first two characters such that it changes to:

@afourpaper


Save the file and close it. You can now run the following command

$ make pdf

and the new PDF book will be available in SRCdir/doc/gnuastro.pdf. By changing the pdf in $ make pdf to ps or dvi you can have the book in those formats. Note that you can do this for any book that is in Texinfo format. Such books might not have an @afourpaper line, in which case you can add it close to the top of the Texinfo source file.
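
The manual edit described above can also be done with SED. The following is a small sketch that works on a temporary two-line sample instead of the real doc/gnuastro.texi, so it can be tried safely. The sample’s first line is made up for the demonstration and the file names are hypothetical:

```shell
# Temporary files standing in for the Texinfo source and the edited copy.
texi=$(mktemp)
edited=$(mktemp)

# Two sample lines: an ordinary comment and the commented paper-size line.
printf '%s\n' '@c An ordinary comment' '@c@afourpaper' > "$texi"

# Remove "@c" (and any blanks after it) only in front of @afourpaper,
# leaving all other comment lines untouched.
sed 's/^@c[[:space:]]*@afourpaper/@afourpaper/' "$texi" > "$edited"

# Keep the result for inspection, then remove the temporary files.
result=$(cat "$edited")
echo "$result"
rm "$texi" "$edited"
```

Writing to a separate file (instead of editing in place) keeps the original untouched until you have checked the result.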


#### 3.3.5 Known issues

Depending on your operating system and the version of the compiler you are using, you might confront some known problems during the configuration ($ ./configure), compilation ($ make) and tests ($ make check). Here, their solutions are discussed.

• $ ./configure: Configure complains about not finding a library even though you have installed it. The possible solution is based on how you installed the package:
  • From your distribution’s package manager. Most probably this is because your distribution has separated the header files of a library from the library parts. Please also install the ‘development’ packages for those libraries too. Just add a -dev or -devel to the end of the package name and re-run the package manager. This will not happen if you install the libraries from source. When installed from source, the headers are also installed.
  • From source. Then your linker is not looking where you installed the library. If you followed the instructions in this chapter, all the libraries will be installed in /usr/local/lib. So you have to tell your linker to look in this directory. To do so, configure Gnuastro like this:

    $ ./configure LDFLAGS="-L/usr/local/lib"

    If you want to use the libraries for your other programming projects, then export this environment variable in a start-up script similar to the case for LD_LIBRARY_PATH explained below, also see Installation directory.
• $ make: Complains about an unknown function on a non-GNU based operating system. In this case, please run $ ./configure with the --enable-gnulibcheck option to see if the problem is from the GNU Portability Library (Gnulib) not supporting your system or if there is a problem in Gnuastro, see Gnuastro configure options. If the problem is not in Gnulib and after all its tests you get the same complaint from make, then please contact us at bug-gnuastro@gnu.org. The cause is probably that a function that we have used is not supported by your operating system and we didn’t include it along with the source tarball. If the function is available in Gnulib, it can be fixed immediately.
• $ make: Can’t find the headers (.h files) of installed libraries. Your C pre-processor (CPP) isn’t looking in the right place. To fix this, configure Gnuastro with an additional CPPFLAGS like below (assuming the library is installed in /usr/local/include):

  $ ./configure CPPFLAGS="-I/usr/local/include"

  If you want to use the libraries for your other programming projects, then export this environment variable in a start-up script similar to the case for LD_LIBRARY_PATH explained below, also see Installation directory.
• $ make check: Only the first couple of tests pass, all the rest fail or get skipped. It is highly likely that when searching for shared libraries, your system doesn’t look into the /usr/local/lib directory (or wherever you installed Gnuastro or its dependencies). To make sure it is added to the list of directories, add the following line to your ~/.bashrc file and restart your terminal. Don’t forget to change /usr/local/lib if the libraries are installed in other (non-standard) directories.

  export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"

  You can also add more directories by using a colon ‘:’ to separate them. See Installation directory and Linking to learn more on the PATH variables and dynamic linking respectively.
• $ make check: The tests relying on external programs (for example fitstopdf.sh) fail. This is probably due to the fact that the version of the external program is too old for the tests we have performed. Please update the program to a more recent version. For example to create a PDF image, you will need GPL Ghostscript, but older versions do not work; we have successfully tested it on version 9.15. Older versions might cause a failure in the test result.
• $ make pdf: The PDF book cannot be made. To make a PDF book, you need to have the GNU Texinfo program (like any program, the more recent the better). A working TeX program is also necessary, which you can get from TeX Live.
• After make check: do not copy the programs’ executables to another (for example, the installation) directory manually (using cp, or mv for example). In the default configuration, the program binaries need to link with Gnuastro’s shared library which is also built and installed with the programs. Therefore, to run successfully before and after installation, linking modifications need to be made by GNU Libtool at installation time. make install does this internally, but a simple copy might give linking errors when you run it. If you need to copy the executables, you can do so after installation.

If your problem was not listed above, please file a bug report (Report a bug).

## 4 Common program behavior

All the programs in Gnuastro share a set of common behavior mainly to do with user interaction to facilitate their usage and development. This includes how to feed input datasets into the programs, how to configure them, specifying the outputs, numerical data types, treating columns of information in tables, etc. This chapter is devoted to describing this common behavior in all programs. Because the behaviors discussed here are common to several programs, they are not repeated in each program’s description.

In Command-line, a very general description of running the programs on the command-line is discussed, like the difference between arguments and options, as well as options that are common/shared between all programs. None of Gnuastro’s programs keep any internal configuration value (values for their different operational steps); they read their configuration primarily from the command-line, then from specific files in directory, user, or system-wide settings.
Using these configuration files can greatly help reproducible and robust usage of Gnuastro, see Configuration files for more.

It is not possible to always have the different options and configurations of each program on the top of your head. It is very natural to forget the options of a program, their current default values, or how it should be run and what it did. Gnuastro’s programs have multiple ways to help you refresh your memory in multiple levels (just an option name, a short description, or fast access to the relevant section of the manual). See Getting help for more on benefiting from this very convenient feature.

Many of the programs use the multi-threaded character of modern CPUs. In Multi-threaded operations we’ll discuss how you can configure this behavior, along with some tips on making best use of them. In Numeric data types, we’ll review the various types to store numbers in your datasets: setting the proper type for the usage context can greatly improve the file size and also speed of reading, writing or processing them. We’ll then look into the recognized table formats in Tables and how large datasets are broken into tiles, or mesh grid in Tessellation.

Finally, we’ll take a look at the behavior regarding output files: Automatic output describes how the programs set a default name for their output when you don’t give one explicitly (using --output). When the output is a FITS file, all the programs also store some very useful information in the header that is discussed in Output FITS files.

### 4.1 Command-line

Gnuastro’s programs are customized through the standard Unix-like command-line environment and GNU style command-line options. Both are very common in many Unix-like operating system programs. In Arguments and options we’ll start with the difference between arguments and options and elaborate on the GNU style of options.
Afterwards, in Common options, we’ll go into the detailed list of all the options that are common to all the programs in Gnuastro.

#### 4.1.1 Arguments and options

When you type a command on the command-line, it is passed onto the shell (a generic name for the program that manages the command-line) as a string of characters. As an example, see the “Invoking ProgramName” sections in this manual for some examples of commands with each program, like Invoking Table, Invoking Fits, or Invoking Statistics. The shell then breaks up your string into separate tokens or words using any metacharacters (like white-space, tab, |, > or ;) that are in the string. On the command-line, the first thing you usually enter is the name of the program you want to run. After that, you can specify two types of tokens: arguments and options. In the GNU-style, arguments are those tokens that are not preceded by any hyphens (-, see Arguments). Here is one example:

$ astcrop --center=53.162551,-27.789676 -w10/3600 --mode=wcs udf.fits


In the example above, we are running Crop to crop a region of width 10 arc-seconds centered at the given RA and Dec from the input Hubble Ultra-Deep Field (UDF) FITS image. Here, the argument is udf.fits. Arguments are most commonly the input file names containing your data. Options start with one or two hyphens, followed by an identifier for the option (the option’s name, for example, --center, -w, --mode in the example above) and its value (anything after the option name, or the optional = character). Through options you can configure how the program runs (interprets the data you provided).
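The distinction between the two token types can be sketched in the shell itself. The snippet below is only an illustration: classify_tokens is a hypothetical helper (not part of Gnuastro), and it assumes every option value is attached to its option name, as in the example above.

```shell
# classify_tokens: hypothetical helper, not part of Gnuastro.
# Prints one line per token it receives from the shell: "option" when the
# token starts with a hyphen, "argument" otherwise.
# Simplification: assumes option values are attached to the option name
# (as in -w10/3600 or --mode=wcs), so no value is mistaken for an argument.
classify_tokens() {
    for token in "$@"; do
        case "$token" in
            -*) printf 'option    %s\n' "$token" ;;
            *)  printf 'argument  %s\n' "$token" ;;
        esac
    done
}

# Same tokens as the Crop example, without the program name.
classify_tokens --center=53.162551,-27.789676 -w10/3600 --mode=wcs udf.fits
```

With these tokens, the sketch reports three options and one argument (udf.fits).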

Arguments can be mandatory or optional and, unlike options, they don’t have any identifiers. Hence, when there are multiple arguments, their order might also matter (for example in cp, which is used for copying one file to another location). The outputs of --usage and --help show which arguments are optional and which are mandatory, see --usage.

As their name suggests, options can be considered to be optional and most of the time, you don’t have to worry about what order you specify them in. When the order does matter, or the option can be invoked multiple times, it is explicitly mentioned in the “Invoking ProgramName” section of each program (this is a very important aspect of an option).

When an argument or option value contains characters that are special to the shell (like white-space, ; or >), you have to quote them. If there is only one such character, you can use a backslash (\) before it. If there are multiple, it might be easier to simply put your whole argument or option value inside of double quotes ("). In such cases, everything inside the double quotes will be seen as one token or word.

For example, let’s say you want to specify the header data unit (HDU) of your FITS file using a complex expression like ‘3; images(exposure > 100)’. If you simply add these after the --hdu (-h) option, the programs in Gnuastro will read the value to the HDU option as ‘3’ and run. Then, the shell will attempt to run a separate command ‘images(exposure > 100)’ and complain about a syntax error. This is because the semicolon (;) is an ‘end of command’ character in the shell. To solve this problem you can simply put double quotes around the whole string you want to pass to --hdu as seen below:
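The effect of quoting can be checked without running any Gnuastro program. In this minimal sketch, count_tokens is a hypothetical helper that only reports how many tokens the shell handed to it. The unquoted form is deliberately not run, since the shell would split it at the semicolon as described above.

```shell
# count_tokens: hypothetical helper that reports how many tokens it received.
count_tokens() {
    echo "$#"
}

# Double quotes: the whole value, including the semicolon, parentheses and
# white-space, stays part of a single token.
count_tokens --hdu="3; images(exposure > 100)"

# A single special character can instead be escaped with a backslash.
count_tokens --hdu=3\;images
```

Both calls report one token, so a program in the same position would receive the full string as the value of --hdu.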



The $ is the shell prompt, astcrop is the program name. There are two arguments (catalog.txt and ASTRdata) and four options, two of them given in short format (-D, -r) and two in long format (--width and --deccol). Three of them require a value and one (-D) is an on/off option.

If an abbreviation is unique between all the options of a program, the long option names can be abbreviated. For example, instead of typing --printparams, typing --print or maybe even --pri will be enough. If there are conflicts, the program will warn you and show you the alternatives. Finally, if you want the argument parser to stop parsing arguments beyond a certain point, you can use two dashes: --. No text on the command-line beyond these two dashes will be parsed.

Gnuastro has two types of options with values. Those that only take a single value are the most common type. If these options are repeated or called more than once on the command-line, the value of the last time it was called will be assigned to it. This is very useful when you are testing/experimenting. Let’s say you want to make a small modification to one option value. You can simply type the option with a new value at the end of the command and see how the script works. If you are satisfied with the change, you can remove the original option for human readability. If the change wasn’t satisfactory, you can remove the one you just added and not worry about forgetting the original value. Without this capability, you would have to memorize or save the original value somewhere else, run the command and then change the value again, which is not at all convenient and can potentially cause lots of bugs.

On the other hand, some options can be called multiple times in one run of a program and can thus take multiple values (for example see the --column option in Invoking Table). In these cases, the order of stored values is the same order that you specified on the command-line.
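The two behaviors described above (last value wins for single-value options, ordered accumulation for multi-value options) can be sketched as follows. parse_demo is a hypothetical toy parser, not Gnuastro's actual implementation, and the column names RA and DEC are placeholders.

```shell
# parse_demo: hypothetical toy parser, not Gnuastro's implementation.
# --width is treated as a single-value option: a later call overwrites the
# earlier one. --column is treated as a multi-value option: values are kept
# in the order they were given.
parse_demo() {
    width=""
    columns=""
    for token in "$@"; do
        case "$token" in
            --width=*)  width=${token#--width=} ;;
            --column=*) columns="$columns${columns:+,}${token#--column=}" ;;
        esac
    done
    echo "width=$width columns=$columns"
}

# --width is given twice, only the second value (20) is kept.
parse_demo --width=10 --column=RA --column=DEC --width=20
```

Appending a new --width at the end of an existing command is therefore enough to test a different value, without editing the earlier one.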
Gnuastro’s programs don’t keep any internal default values, so some options are mandatory and if they don’t have a value, the program will complain and abort. Most programs have many such options and typing them by hand on every call is impractical. To facilitate the user experience, after parsing the command-line, Gnuastro’s programs read special configuration files to get the necessary values for the options you haven’t specified on the command-line. These configuration files are fully described in Configuration files.

CAUTION: In specifying a file address, if you want to use the shell’s tilde expansion (~) to specify your home directory, leave at least one space between the option name and your value. For example use -o ~/test, --output ~/test or --output= ~/test. Calling them with -o~/test or --output=~/test will disable shell expansion.

CAUTION: If you forget to specify a value for an option which requires one, and that option is the last one, Gnuastro will warn you. But if it is in the middle of the command, it will take the text of the next option or argument as the value which can cause undefined behavior.

NOTE: In some contexts Gnuastro’s counting starts from 0 and in others 1. You can assume by default that counting starts from 1. If it starts from 0 for a special option, it will be explicitly mentioned.

#### 4.1.2 Common options

To facilitate the job of the users and developers, all the programs in Gnuastro share a basic set of command-line options for functionality that is common to many of them. The full list is classified as Input/Output options, Processing options, and Operating mode options. In some programs, some of the options are irrelevant, but still recognized (you won’t get an unrecognized option error, but the value isn’t used). Unless otherwise mentioned, these options are identical between all programs.
#### 4.1.2.1 Input/Output options

These options are to do with the input and outputs of the various programs.

--stdintimeout

Number of micro-seconds to wait for writing/typing in the first line of standard input from the command-line (see Standard input). This is only relevant for programs that also accept input from the standard input, and you want to manually write/type the contents on the terminal. When the standard input is already connected to a pipe (output of another program), there won’t be any waiting (hence no timeout, thus making this option redundant).

If the first line-break (for example with the ENTER key) is not provided before the timeout, the program will abort with an error that no input was given. Note that this time interval is only for the first line that you type. Once the first line is given, the program will assume that more data will come and accept the rest of your inputs without any time limit. You need to specify the ending of the standard input, for example by pressing CTRL-D after a new line.

Note that any input you write/type into a program on the command-line with Standard input will be discarded (lost) once the program is finished. It is only recoverable manually from your command-line (where you actually typed) as long as the terminal is open. So only use this feature when you are sure that you don’t need the dataset (or have a copy of it somewhere else).

-h STR/INT
--hdu=STR/INT

The name or number of the desired Header Data Unit, or HDU, in the FITS image. A FITS file can store multiple HDUs or extensions, each with either an image or a table or nothing at all (only a header). Note that counting of the extensions starts from 0 (zero), not 1 (one). Counting from 0 is forced on us by CFITSIO which directly reads the value you give with this option (see CFITSIO). When specifying the name, case is not important so IMAGE, image or ImAgE are equivalent.
CFITSIO has many capabilities to help you find the extension you want, far beyond the simple extension number and name. See CFITSIO manual’s “HDU Location Specification” section for a very complete explanation with several examples. A # is appended to the string you specify for the HDU and the result is put in square brackets and appended to the FITS file name before calling CFITSIO to read the contents of the HDU for all the programs in Gnuastro.

-s STR
--searchin=STR

Where to match/search for columns when the column identifier wasn’t a number, see Selecting table columns. The acceptable values are name, unit, or comment. This option is only relevant for programs that take table columns as input.

-I
--ignorecase

Ignore case while matching/searching column meta-data (in the field specified by the --searchin). The FITS standard suggests treating the column names as case insensitive, which is strongly recommended here also but is not enforced. This option is only relevant for programs that take table columns as input.

This option is not relevant to BuildProgram, hence in that program the short option -I is used for include directories, not to ignore case.

-o STR
--output=STR

The name of the output file or directory. With this option the automatic output names explained in Automatic output are ignored.

-T STR
--type=STR

The data type of the output depending on the program context. This option isn’t applicable to some programs like Fits and will be ignored by them. The different acceptable values to this option are fully described in Numeric data types.

-D
--dontdelete

By default, if the output file already exists, Gnuastro’s programs will silently delete it and put their own outputs in its place. When this option is activated, if the output file already exists, the programs will not delete it, will warn you, and will abort.

-K
--keepinputdir

In automatic output names, don’t remove the directory information of the input file names.
As explained in Automatic output, if no output name is specified (with --output), then the output name will be made in the current directory based on your input’s file name (ignoring the directory of the input). If you call this option, the directory information of the input will be kept and the automatically generated output name will be in the same directory as the input (usually with a suffix added). Note that this is only relevant if you are running the program in a different directory than the input data.

-t STR
--tableformat=STR

The output table’s type. This option is only relevant when the output is a table and its format cannot be deduced from its filename. For example, if a name ending in .fits was given to --output, then the program knows you want a FITS table. But there are two types of FITS tables: FITS ASCII, and FITS binary. Thus, with this option, the program is able to identify which type you want. The currently recognized values to this option are:

txt
A plain text table with white-space characters between the columns (see Gnuastro text table format).

fits-ascii
A FITS ASCII table (see Recognized table formats).

fits-binary
A FITS binary table (see Recognized table formats).

#### 4.1.2.2 Processing options

Some processing steps are common to several programs, so they are defined as common options to all programs. Note that this class of common options is thus necessarily less common between all the programs than those described in Input/Output options, or Operating mode options. Also, if they are irrelevant for a program, these options will not display in the --help output of the program.

--minmapsize=INT

The minimum size (in bytes) to store the contents of each main processing array of a program as a file (on the non-volatile HDD/SSD), not in RAM. This can be very useful when you have limited RAM, but need to process large datasets which can be very memory intensive.
In such scenarios, without this option, the program will crash. A random filename is assigned to the array. This file will keep the contents of the array as long as it is necessary and the program will delete it as soon as it is not necessary any more.

If the .gnuastro_mmap directory exists and is writable, then the random file will be placed in there. Otherwise, the randomly named file will be directly written in the current directory with the .gnuastro_mmap_ prefix.

By default, the name of the created file, and its size (in bytes) is printed by the program when it is created and later, when it is deleted/freed. These messages are useful to the user who has enough RAM, but has forgotten to increase the value to --minmapsize (this is often the case). To suppress/disable such messages, use the --quietmmap option.

When this option has a value of 0 (zero, strongly discouraged, see box below), all arrays that use this feature in a program will actually be placed in a file (not in RAM). When this option is larger than all the input datasets, all arrays will definitely be allocated in RAM and the program will run MUCH faster.

Please note that using a non-volatile file (in the HDD/SSD) instead of RAM can significantly increase the program’s running time, especially on HDDs (where read/write is slower). So it is best to give this option large values by default. You can then decrease it for a specific program’s invocation on a large input after you see memory issues arise (for example an error, or the program not aborting and fully consuming your memory).

The random file will be deleted once it is no longer needed by the program. The .gnuastro directory will also be deleted if it has no other contents (you may also have configuration files in this directory, see Configuration files). If you see randomly named files remaining in this directory when the program finishes normally, please send us a bug report so we can address the problem, see Report a bug.
Limited number of memory-mapped files: The operating system kernels usually support a limited number of memory-mapped files. Therefore never set --minmapsize to zero or a small number of bytes (otherwise too many files will be created). If the kernel capacity is exceeded, the program will crash.

--quietmmap

Don’t print any message when an array is stored in non-volatile memory (HDD/SSD) and not RAM, see the description of --minmapsize (above) for more.

-Z INT[,INT[,...]]
--tilesize=INT[,INT[,...]]

The size of regular tiles for tessellation, see Tessellation. For each dimension an integer length (in units of data-elements or pixels) is necessary. If the number of input dimensions is different from the number of values given to this option, the program will stop with an error. Values must be separated by commas (,) and can also be fractions (for example 4/2). If they are fractions, the result must be an integer, otherwise an error will be printed.

-M INT[,INT[,...]]
--numchannels=INT[,INT[,...]]

The number of channels for larger input tessellation, see Tessellation. The number and types of acceptable values are similar to --tilesize. The only difference is that instead of length, the integer values given to this option represent the number of channels, not their size.

-F FLT
--remainderfrac=FLT

The fraction of remainder size along all dimensions to add to the first tile. See Tessellation for a complete description. This option is only relevant if the input dataset’s size in a dimension is not exactly divisible by --tilesize. If the remainder size is larger than this fraction (compared to --tilesize), then the remainder size will be added to one regular tile size and divided between two tiles at the start and end of the given dimension.

--workoverch

Ignore the channel borders for the high-level job of the given application.
As a result, while the channel borders are respected in defining the small tiles (such that no tile will cross a channel border), the higher-level program operation will ignore them, see Tessellation.

--checktiles

Make a FITS file with the same dimensions as the input but each pixel is replaced with the ID of the tile that it is associated with. Note that the tile IDs start from 0. See Tessellation for more on tiling an image in Gnuastro.

--oneelempertile

When showing the tile values (for example with --checktiles, or when the program’s output is tessellated) only use one element for each tile. This can be useful when only the relative values given to each tile compared to the rest are important or need to be checked. Since the tiles usually have a large number of pixels within them, the output will be much smaller, and so easier to read, write, store, or send.

Note that when the full input size in any dimension is not exactly divisible by the given --tilesize in that dimension, the edge tile(s) will have different sizes (in units of the input’s size), see --remainderfrac. But with this option, all displayed values are going to have the (same) size of one data-element. Hence, in such cases, the image proportions are going to be slightly different with this option.

If your input image is not exactly divisible by the tile size and you want one value per tile for some higher-level processing, all is not lost though. You can see how many pixels were within each tile (for example to weight the values or discard some for later processing) with Gnuastro’s Statistics (see Statistics) as shown below. The output FITS file is going to have two extensions, one with the median calculated on each tile and one with the number of elements that each tile covers. You can then use the where operator in Arithmetic to set the values of all tiles that don’t have the regular area to a blank value.

$ aststatistics --median --number --ontile input.fits    \
                --oneelempertile --output=o.fits
$ REGULAR_AREA=1600    # Check second extension of 'o.fits'.
$ astarithmetic o.fits o.fits $REGULAR_AREA ne nan where \
                -h1 -h2

Note that if input.fits also has blank values, then the median on tiles with blank values will also be ignored with the command above (which is desirable).

--interponlyblank

When values are to be interpolated, only change the values of the blank elements, keep the non-blank elements untouched.

--interpmetric=STR

The metric to use for finding nearest neighbors. Currently it only accepts the Manhattan (or taxicab) metric with manhattan, or the radial metric with radial.

The Manhattan distance between two points is defined with $$|\Delta{x}|+|\Delta{y}|$$. Thus the Manhattan metric has the advantage of being fast, but at the expense of being less accurate. The radial distance is the standard definition of distance in a Euclidean space: $$\sqrt{\Delta{x}^2+\Delta{y}^2}$$. It is accurate, but the multiplication and square root can slow down the processing.

--interpnumngb=INT

The number of nearby non-blank neighbors to use for interpolation.

#### 4.1.2.3 Operating mode options

Another group of options that are common to all the programs in Gnuastro are those to do with the general operation of the programs. The explanation for those that are not only limited to Gnuastro but are common to all GNU programs starts with (GNU option).

--

(GNU option) Stop parsing the command-line. This option can be useful in scripts or when using the shell history. Suppose you have a long list of options, and want to see if removing some of them (to read from configuration files, see Configuration files) can give a better result. If the ones you want to remove are the last ones on the command-line, you don’t have to delete them, you can just add -- before them and if you don’t get what you want, you can remove the -- and get the same initial result.

--usage

(GNU option) Only print the options and arguments and abort.
This is very useful for when you know what the options do, and have just forgotten their long/short identifiers, see --usage.

-?
--help

(GNU option) Print all options with an explanation and abort. Adding this option will print all the options in their short and long formats, also displaying which ones need a value if they are called (with an = after the long format followed by a string specifying the format, see Options). A short explanation is also given for what the option is for. The program will quit immediately after the message is printed and will not do any form of processing, see --help.

-V
--version

(GNU option) Print a short message, showing the full name, version, copyright information and program authors and abort. On the first line, it will print the official name (not executable name) and version number of the program. Following this is a blank line and the copyright information. The program will not run.

-q
--quiet

Don’t report steps. All the programs in Gnuastro that have multiple major steps will report their steps for you to follow while they are operating. If you do not want to see these reports, you can call this option and only error/warning messages will be printed. If the steps are done very fast (depending on the properties of your input) disabling these reports will also decrease running time.

--cite

Print all necessary information to cite and acknowledge Gnuastro in your published papers. With this option, the programs will print the BibTeX entry to include in your paper for Gnuastro in general, and the particular program’s paper (if that program comes with a separate paper). It will also print the necessary acknowledgment statement to add in the respective section of your paper and it will abort. For a more complete explanation, please see Acknowledgments.

Citations and acknowledgments are vital for the continued work on Gnuastro. Gnuastro started, and is continued, based on separate research projects.
So if you find any of the tools offered in Gnuastro to be useful in your research, please use the output of this command to cite and acknowledge the program (and Gnuastro) in your research paper. Thank you.

Gnuastro is still new, there is no separate paper only devoted to Gnuastro yet. Therefore currently the paper to cite for Gnuastro is the paper for NoiseChisel which is the first published paper introducing Gnuastro to the astronomical community. Upon reaching a certain point, a paper completely devoted to describing Gnuastro’s many functionalities will be published, see GNU Astronomy Utilities 1.0.

-P
--printparams

With this option, Gnuastro’s programs will read your command-line options and all the configuration files. If there is no problem (like a missing parameter or a value in the wrong format or range), then immediately before actually running, the programs will print the full list of option names, values and descriptions, sorted and grouped by context and abort. They will also report the version number, the date they were configured on your system and the time they were reported.

As an example, you can give your full command-line options and even the input and output file names and finally just add -P to check if all the parameters are correctly set. If everything is OK, you can just run the same command (easily retrieved from the shell history, with the up arrow key) and simply remove the last two characters that showed this option.

No program will actually start its processing when this option is called. The otherwise mandatory arguments for each program (for example input image or catalog files) are no longer required when you call this option.

--config=STR

Parse STR as a configuration file immediately when this option is encountered (see Configuration files). The --config option can be called multiple times in one run of any Gnuastro program on the command-line or in the configuration files.
In any case, it will be immediately read (before parsing the rest of the options on the command-line, or lines in a configuration file).

Note that by definition, options on the command-line still take precedence over those in any configuration file, including the file(s) given to this option if they are called before it. Also see --lastconfig and --onlyversion on how this option can be used for reproducible results. You can use --checkconfig (below) to check/confirm the parsing of configuration files.

--checkconfig

Print options and their values, within the command-line or configuration files, as they are parsed (see Configuration file precedence). If an option has already been set, or is ignored by the program, this option will also inform you with special values like --ALREADY-SET--. Only options that are parsed after this option are printed, so to see the parsing of all input options, it is recommended to put this option immediately after the program name before any other options.

This is a very good option to confirm where the value of each option has been defined in scenarios where there are multiple configuration files (for debugging).

-S
--setdirconf

Update the current directory configuration file for the Gnuastro program and quit. The full set of command-line and configuration file options will be parsed and options with a value will be written in the current directory configuration file for this program (see Configuration files). If the configuration file or its directory doesn’t exist, it will be created. If a configuration file exists it will be replaced (after it, and all other configuration files have been read). In any case, the program will not run.

This is the recommended method to edit/set the configuration file for all future calls to Gnuastro’s programs. It will internally check if your values are in the correct range and type and save them according to the configuration file format, see Configuration file format.
So if there are unreasonable values to some options, the program will notify you and abort before writing the final configuration file.

When this option is called, the otherwise mandatory arguments, for example input image or catalog file(s), are no longer mandatory (since the program will not run).

-U
--setusrconf

Update the user configuration file and quit (see Configuration files). See explanation under --setdirconf for more details.

--lastconfig

This is the last configuration file that must be read. When this option is encountered in any stage of reading the options (on the command-line or in a configuration file), no other configuration file will be parsed, see Configuration file precedence and Current directory and User wide. Like all on/off options, on the command-line, this option doesn’t take any values. But in a configuration file, it takes the values of 0 or 1, see Configuration file format. If it is present in a configuration file with a value of 0, then all later occurrences of this option will be ignored.

--onlyversion=STR

Only run the program if Gnuastro’s version is exactly equal to STR (see Version numbering). Note that it is not compared as a number, but as a string of characters, so 0, or 0.0 and 0.00 are different. If the running Gnuastro version is different, then this option will report an error and abort as soon as it is encountered on the command-line or in a configuration file. If the running Gnuastro version is the same as STR, then the program will run as if this option was not called.

This is useful if you want your results to be exactly reproducible and not mistakenly run with an updated/newer or older version of the program. Besides internal algorithmic/behavior changes in programs, the existence of options or their names might change between versions (especially in these earlier versions of Gnuastro).

Hence, when using this option (probably in a script or in a configuration file), be sure to call it before other options.
The benefit is that, when the version differs, the other options won’t be parsed and you, or your collaborators/users, won’t get errors saying an option in your configuration doesn’t exist in the running version of the program.

Here is one example of how this option can be used in conjunction with the --lastconfig option. Let’s assume that you were satisfied with the results of this command: astnoisechisel image.fits --snquant=0.95 (along with various options set in various configuration files). You can save the state of NoiseChisel and reproduce that exact result on image.fits later by following these steps (the extra spaces, and \, are only for easy readability, if you want to try it out, only one space between each token is enough).

$ echo "onlyversion X.XX"             > reproducible.conf
$ echo "lastconfig 1"                 >> reproducible.conf
$ astnoisechisel image.fits --snquant=0.95 -P            \
                                      >> reproducible.conf


--onlyversion was available from Gnuastro 0.0, so putting it immediately at the start of a configuration file will ensure that later, you (or others using a different version) won’t get a non-recognized option error in case an option was added/removed. --lastconfig will inform the installed NoiseChisel to not parse any other configuration files. This is done because we don’t want the user’s user-wide or system-wide option values affecting our results. Finally, with the third command, which has a -P (short for --printparams), NoiseChisel will print all the option values visible to it (in all the configuration files) and the shell will append them to reproducible.conf. Hence, you don’t have to worry about remembering the (possibly) different options in the different configuration files.
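The manual states that --onlyversion compares versions as strings of characters, not as numbers. The minimal sketch below only illustrates what such a comparison implies. version_matches is a hypothetical helper and says nothing about how Gnuastro implements the check internally.

```shell
# version_matches: hypothetical helper illustrating a string comparison.
# Two versions match only when they are identical character by character.
version_matches() {
    if [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

version_matches 0.13 0.13    # identical strings
version_matches 0.0 0.00     # numerically equal, but different strings
```

Under this kind of comparison, the value written after onlyversion in a configuration file has to reproduce the version string exactly.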

Afterwards, if you run NoiseChisel as shown below (telling it to read this configuration file with the --config option), you can be sure that there will either be an error (for version mismatch) or it will produce exactly the same result that you got before.

$ astnoisechisel --config=reproducible.conf

--log

Some programs can generate extra information about their outputs in a log file. When this option is called in those programs, the log file will also be printed. If the program doesn’t generate a log file, this option is ignored.

--log isn’t thread-safe: The log file usually has a fixed name. Therefore if two simultaneous calls (with --log) of a program are made in the same directory, the program will try to write to the same file. This will cause problems like an unreasonable log file, undefined behavior, or a crash.

-N INT
--numthreads=INT

Use INT CPU threads when running a Gnuastro program (see Multi-threaded operations). If the value is zero (0), or this option is not given on the command-line or any configuration file, the value will be determined at run-time: the maximum number of threads available to the system when you run a Gnuastro program.

Note that multi-threaded programming is only relevant to some programs. In others, this option will be ignored.

#### 4.1.3 Standard input

The most common way to feed the primary/first input dataset into a program is to give its filename as an argument (discussed in Arguments). When you want to run a series of programs in sequence, this means that you will have to keep the output of each program in a separate file and re-type that file’s name in the next command. This can be very slow and frustrating (for example when you mis-type a file’s name).

To solve the problem, the founders of Unix defined pipes to directly feed the output of one program (its “Standard output” stream) into the “standard input” of a next program. This removes the need to make temporary files between separate processes and became one of the best demonstrations of the Unix-way, or Unix philosophy.

Every program has three streams identifying where it reads/writes non-file inputs/outputs: Standard input, Standard output, and Standard error.
When a program is called alone, all three are directed to the terminal that you are using. If it needs an input, it will prompt you for one and you can type it in. Or, it prints its results in the terminal for you to see.

For example, say you have a FITS table/catalog containing the B and V band magnitudes (MAG_B and MAG_V columns) of a selection of galaxies along with many other columns. If you want to see only these two columns in your terminal, you can use Gnuastro’s Table program like below:

$ asttable cat.fits -cMAG_B,MAG_V


Through the Unix pipe mechanism, when the shell confronts the pipe character (|), it connects the standard output of the program before the pipe, to the standard input of the program after it. So it is literally a “pipe”: everything that you would see printed by the first program on the command-line (without any pipe), is now passed to the second program (and not seen by you).

To continue the previous example, let’s say you want to see the B-V color. To do this, you can pipe Table’s output to AWK (a wonderful tool for processing things like plain text tables):

$ asttable cat.fits -cMAG_B,MAG_V | awk '{print $1-$2}'

But understanding the distribution by visually seeing all the numbers under each other is not too useful! You can therefore feed this single column information into Statistics to give you a general feeling of the distribution with the same command:

$ asttable cat.fits -cMAG_B,MAG_V | awk '{print $1-$2}' | aststatistics
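To experiment with the pipe above without a FITS catalog at hand, the following minimal sketch replaces Table’s output with a few plain-text rows and lets AWK compute the B-V color in the same way. The magnitudes and the function name bv_colors are invented purely for illustration.

```shell
# Print the B-V color for three hypothetical galaxies.
# Column 1 stands for MAG_B and column 2 for MAG_V; the values are made up.
bv_colors() {
    printf '18.5 17.9\n20.1 19.0\n21.0 20.75\n' |
        awk '{print $1-$2}'
}

bv_colors
```

If Gnuastro is installed, piping the function’s output into Statistics (bv_colors | aststatistics) should behave like the last command above, since the data arrives as plain text on Standard input.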


Gnuastro’s programs that accept input from standard input, only look into the Standard input stream if there is no first argument. In other words, arguments take precedence over Standard input. When no argument is provided, the programs check if the standard input stream is already full or not (output from another program is waiting to be used). If data is present in the standard input stream, it is used.

When the standard input is empty, the program will wait --stdintimeout micro-seconds for you to manually enter the first line (ending with a new-line character, or the ENTER key, see Input/Output options). If it detects the first line in this time, there is no more time limit, and you can manually write/type all the lines for as long as it takes. To inform the program that Standard input has finished, press CTRL-D after a new line. If the program doesn’t catch the first line before the time-out finishes, it will abort with an error saying that no input was provided.

 Manual input in Standard input is discarded: Be careful that when you manually fill the Standard input, the data will be discarded once the program finishes and reproducing the result will be impossible. Therefore this form of providing input is only good for temporary tests.
 Standard input currently only for plain text: Currently Standard input only works for plain text inputs like the example above. We will later allow FITS files into the programs through standard input also.
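The precedence described in this section (arguments first, Standard input only when there is no argument) can be sketched with a small shell function. This is only an illustration of the rule, not Gnuastro’s implementation: the function name input_source and the file name cat.txt are hypothetical, and the --stdintimeout waiting period is not modeled.

```shell
# input_source [ARG...]: report which input would be used under the
# "arguments take precedence over Standard input" rule.
input_source() {
    if [ "$#" -gt 0 ]; then
        # An argument was given: Standard input is not even checked.
        echo "argument: $1"
    elif [ ! -t 0 ] && IFS= read -r first_line; then
        # No argument, but data is waiting on Standard input.
        echo "stdin: $first_line"
    else
        # Nothing on either side: abort with an error.
        echo "error: no input provided" >&2
        return 1
    fi
}

echo "18.5 17.9" | input_source            # uses Standard input
echo "18.5 17.9" | input_source cat.txt    # the argument wins
```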


### 4.2 Configuration files

Each program needs a certain number of parameters to run. Supplying all the necessary parameters each time you run the program is very frustrating and prone to errors. Therefore all the programs read the values for the necessary options you have not given in the command line from one of several plain text files (which you can view and edit with any text editor). These files are known as configuration files and are usually kept in a directory named etc/ according to the file system hierarchy standard76.

The thing to have in mind is that none of the programs in Gnuastro keep any internal default value. All the values must either be stored in one of the configuration files or explicitly called in the command-line. In case the necessary parameters are not given through any of these methods, the program will print a missing option error and abort. The only exception to this is --numthreads, whose default value is determined at run-time using the number of threads available to your system, see Multi-threaded operations. Of course, you can still provide a default value for the number of threads at any of the levels below, but if you don’t, the program will not abort. Also note that through automatic output name generation, the value to the --output option is also not mandatory on the command-line or in the configuration files for all programs which don’t rely on that value as an input77, see Automatic output.


#### 4.2.1 Configuration file format

The configuration files for each program have the standard program executable name with a ‘.conf’ suffix. When you download the source code, you can find them in the same directory as the source code of each program, see Program source.

Any line in the configuration file whose first non-white character is a # is considered to be a comment and is ignored. An empty line is also similarly ignored. The long name of the option should be used as an identifier. The parameter name and parameter value have to be separated by any number of ‘white-space’ characters: space, tab or vertical tab. By default several space characters are used. If the value of an option has space characters (most commonly for the hdu option), then the full value can be enclosed in double quotation signs (", similar to the example in Arguments and options). If it is an option without a value in the --help output (on/off option, see Options), then the value should be 1 if it is to be ‘on’ and 0 otherwise.

In each non-commented and non-blank line, any text after the first two words (option identifier and value) is ignored. If an option identifier is not recognized in the configuration file, the name of the file, the line number of the unrecognized option, and the unrecognized identifier name will be reported and the program will abort. If a parameter is repeated more than once in the configuration files, accepts only one value, and is not set on the command-line, then only the first value will be used and the rest will be ignored.
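As a rough illustration of the format rules above, the sketch below writes a small configuration file and reads it back with AWK: comment and empty lines are skipped, only the first two words of each line are used, and the first value of a repeated option wins. The option values are invented, the function name parse_conf is hypothetical, and this is not Gnuastro’s own parser (in particular, quoted values containing spaces are not handled).

```shell
# Write a small example configuration file (option values are invented).
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Lines starting with '#' and empty lines are ignored.

 hdu          1      text after the value is ignored
 numthreads   4
 hdu          2
EOF

# Print "name=value" for the first occurrence of every option.
# Simplification: quoted values containing spaces are not handled.
parse_conf() {
    awk 'NF >= 2 && $1 !~ /^#/ && !($1 in seen) {
             seen[$1] = 1
             print $1 "=" $2
         }' "$1"
}

parse_conf "$conf"
```

Here the second hdu line is ignored because a value for hdu was already read, mirroring the "only the first value will be used" rule.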

You can build or edit any of the directories and the configuration files yourself using any text editor. However, it is recommended to use the --setdirconf and --setusrconf options to set default values for the current directory or this user, see Operating mode options. With these options, the values you give will be checked before writing in the configuration file. They will also print a set of commented lines guiding the reader and will also classify the options based on their context and write them in their logical order to be more understandable.


#### 4.2.2 Configuration file precedence

The option values in all the programs of Gnuastro will be filled in the following order. If an option only takes one value which is given in an earlier step, any value for that option in a later step will be ignored. Note that if the lastconfig option is specified in any step below, no other configuration files will be parsed (see Operating mode options).

1. Command-line options, for a particular run of ProgramName.
2. .gnuastro/astprogname.conf is parsed by ProgramName in the current directory.
3. .gnuastro/gnuastro.conf is parsed by all Gnuastro programs in the current directory.
4. $HOME/.local/etc/astprogname.conf is parsed by ProgramName in the user’s home directory (see Current directory and User wide).
5. $HOME/.local/etc/gnuastro.conf is parsed by all Gnuastro programs in the user’s home directory (see Current directory and User wide).
6. prefix/etc/astprogname.conf is parsed by ProgramName in the system-wide installation directory (see System wide for prefix).
7. prefix/etc/gnuastro.conf is parsed by all Gnuastro programs in the system-wide installation directory (see System wide for prefix).
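To see which of the files in the list above actually exist on a system, a loop like the following can be used. It is only a sketch: the function name list_conf_files is hypothetical, astcrop is used as an example program name, and the default prefix of /usr/local is an assumption about the installation. It does not parse anything, it just walks the six file locations in order.

```shell
# list_conf_files PROGNAME [PREFIX]: print the configuration files in the
# order listed above and whether each one exists. PREFIX defaults to
# /usr/local here, which is an assumption about your installation.
list_conf_files() {
    prog=$1
    prefix=${2:-/usr/local}
    for f in ".gnuastro/$prog.conf" \
             ".gnuastro/gnuastro.conf" \
             "$HOME/.local/etc/$prog.conf" \
             "$HOME/.local/etc/gnuastro.conf" \
             "$prefix/etc/$prog.conf" \
             "$prefix/etc/gnuastro.conf"; do
        if [ -f "$f" ]; then
            echo "found    $f"
        else
            echo "missing  $f"
        fi
    done
}

list_conf_files astcrop
```

For the order that a Gnuastro program itself follows on a given run, the --checkconfig option remains the authoritative check.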

The basic idea behind setting this progressive state of checking for parameter values is that separate users of a computer or separate folders in a user’s file system might need different values for some parameters.

 Checking the order: You can confirm/check the order of parsing configuration files using the --checkconfig option with any Gnuastro program, see Operating mode options. Just be sure to place this option immediately after the program name, before any other option.

As you see above, there can also be a configuration file containing the common options in all the programs: gnuastro.conf (see Common options). If options specific to one program are specified in this file, there will be unrecognized option errors, or unexpected behavior if the option has different behavior in another program. On the other hand, there is no problem with astprogname.conf containing common options78.

 Manipulating the order: You can manipulate this order or add new files with the following two options which are fully described in Operating mode options:

 --config: Allows you to define any file to be parsed as a configuration file on the command-line or within any other configuration file. Recall that the file given to --config is parsed immediately when this option is confronted (on the command-line or in a configuration file).

 --lastconfig: Allows you to stop the parsing of subsequent configuration files. Note that if this option is given in a configuration file, that file will be fully read, so its position in the configuration file doesn’t matter (unlike --config).

One example of benefiting from these configuration files can be this: raw telescope images usually have their main image extension in the second FITS extension, while processed FITS images usually only have one extension. If your system-wide default input extension is 0 (the first), then when you want to work with the former group of data you have to explicitly mention it to the programs every time. With this progressive state of default values to check, you can set different default values for the different directories that you would like to run Gnuastro in for your different purposes, so you won’t have to worry about this issue any more.
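As a concrete sketch of this scenario, the commands below create a directory-specific configuration file that makes the second extension (counted from 0) the default input extension. The directory is a throw-away stand-in for a raw-data project, and writing the file by hand is only for illustration: as noted earlier, --setdirconf is the recommended way because it checks the values before writing them.

```shell
# A throw-away directory standing in for a raw-data project.
project=$(mktemp -d)

# Directory-specific default: read the second extension (counting from 0).
mkdir -p "$project/.gnuastro"
printf 'hdu  1\n' > "$project/.gnuastro/gnuastro.conf"

cat "$project/.gnuastro/gnuastro.conf"
```

Gnuastro programs run from within that directory would then pick up this value before the user-wide or system-wide ones.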

The same can be said about the gnuastro.conf files: by specifying a behavior in this single file, all Gnuastro programs in the respective directory, user, or system-wide steps will behave similarly. For example to keep the input’s directory when no specific output is given (see Automatic output), or to not delete an existing file if it has the same name as a given output (see Input/Output options).


#### 4.2.3 Current directory and User wide

For the current (local) and user-wide directories, the configuration files are stored in the hidden sub-directories named .gnuastro/ and $HOME/.local/etc/ respectively. Unless you have changed it, the $HOME environment variable should point to your home directory. You can check it by running $ echo $HOME. Each time you run any of the programs in Gnuastro, this environment variable is read and placed in the above address. So if you suddenly see that your home configuration files are not being read, probably you (or some other program) have changed the value of this environment variable.

Although it might cause confusion like above, this dependence on the HOME environment variable enables you to temporarily use a different directory as your home directory. This can come in handy in complicated situations. To set the user or current directory configuration files based on your command-line input, you can use the --setdirconf or --setusrconf options, see Operating mode options.
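The temporary use of a different home directory can be sketched as follows. The directory is created only for this demonstration, and sh -c stands in for any program that reads HOME; nothing here is specific to Gnuastro.

```shell
# Create a throw-away directory to act as a temporary home directory.
alt_home=$(mktemp -d)
mkdir -p "$alt_home/.local/etc"

# The assignment before the command only affects that single command;
# your real HOME is untouched afterwards.
HOME="$alt_home" sh -c 'echo "user-wide configuration directory: $HOME/.local/etc"'
echo "HOME is still: $HOME"
```

Replacing the sh -c call with a Gnuastro program should, following the text above, make that single run look for its user-wide configuration files under the temporary directory.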


#### 4.2.4 System wide

When Gnuastro is installed, the configuration files that are shipped with the distribution are copied into the (possibly system wide) prefix/etc/ directory. For more details on prefix, see Installation directory (by default it is: /usr/local). This directory is the final place (with the lowest priority) that the programs in Gnuastro will check to retrieve parameter values.

If you remove an option and its value from the system wide configuration files, you either have to specify it in more immediate configuration files or set it each time in the command-line. Recall that none of the programs in Gnuastro keep any internal default values and will abort if they don’t find a value for the necessary parameters (except the number of threads and output file name). So even though you might never expect to use an optional option, it is safe to keep it available in this system-wide configuration file.

Note that in case you install Gnuastro from your distribution’s repositories, prefix will either be set to / (the root directory) or /usr, so you can find the system wide configuration files in /etc/ or /usr/etc/. The prefix of /usr/local/ is conventionally used for programs you install from source yourself as in Quick start.


### 4.3 Getting help

Probably the first time you read this book, it is either in the PDF or HTML formats. These two formats are very convenient for when you are not actually working, but when you are only reading. Later on, when you start to use the programs and you are deep in the middle of your work, some of the details will inevitably be forgotten. Going to find the PDF file (printed or digital) or the HTML webpage is a major distraction.

GNU software has a unique set of tools for aiding your memory on the command-line, where you are working, depending on how much of it you need to remember. In the past, such command-line help was known as “online” help, because it was literally provided to you ‘on’ the command ‘line’. However, nowadays the word “online” refers to something on the internet, so that term will not be used. With this type of help, you can resume your exciting research without taking your hands off the keyboard.

Another major advantage of such command-line based help routines is that they are installed with the software in your computer, therefore they are always in sync with the executable you are actually running. Three of them are actually part of the executable. You don’t have to worry about the version of the book or program. If you rely on external help (a PDF in your personal print or digital archive or HTML from the official webpage) you have to check to see if their versions fit with your installed program.

If you only need to remember the short or long names of the options, --usage is advised. If it is what the options do, then --help is a great tool. Man pages are also provided for those who are used to this older system of documentation. This full book is also available to you on the command-line in Info format. If none of these seems to resolve the problems, there is a mailing list which enables you to get in touch with experienced Gnuastro users. In the subsections below each of these methods is reviewed.


#### 4.3.1 --usage

If you give this option, the program will not run. It will only print a very concise message showing the options and arguments. Everything within square brackets ([]) is optional. For example, here are the first and last two lines of Crop’s --usage output:

$ astcrop --usage
Usage: astcrop [-Do?IPqSVW] [-d INT] [-h INT] [-r INT] [-w INT]
            [-x INT] [-y INT] [-c INT] [-p STR] [-N INT] [--deccol=INT]
            ....
            [--setusrconf] [--usage] [--version] [--wcsmode]
            [ASCIIcatalog] FITSimage(s).fits

There are no explanations on the options, just their short and long names shown separately. After the program name, the short format of all the options that don’t require a value (on/off options) is displayed. Those that do require a value then follow in separate brackets, each displaying the format of the input they want, see Options. Since all options are optional, they are shown in square brackets, but arguments can also be optional. For instance, in this example a catalog name is optional and is only required in some modes. This is a standard method of displaying optional arguments for all GNU software.

#### 4.3.2 --help

If the command-line includes this option, the program will not be run. It will print a complete list of all available options along with a short explanation. The options are also grouped by their context. Within each context, the options are sorted alphabetically. Since the options are shown in detail afterwards, the first line of the --help output shows the arguments and if they are optional or not, similar to --usage.

In the --help output of all programs in Gnuastro, the options for each program are classified based on context. The first two contexts are always options to do with the input and output respectively. For example input image extensions or supplementary input files for the inputs. The last class of options is also fixed in all of Gnuastro, it shows operating mode options. Most of these options are already explained in Operating mode options.

The help message will sometimes be longer than the vertical size of your terminal. If you are using a graphical user interface terminal emulator, you can scroll the terminal with your mouse, but we promised no mice distractions! So here are some suggestions:

• Shift + PageUP to scroll up and Shift + PageDown to scroll down. For most help output this should be enough. The problem is that it is limited by the number of lines that your terminal keeps in memory and that you can’t scroll by lines, only by whole screens.

• Pipe to less. A pipe is a form of shell re-direction. The less tool in Unix-like systems was made exactly for such outputs of any length. You can pipe (|) the output of any program that is longer than the screen to it and then you can scroll through (up and down) with its many tools. For example:

$ astnoisechisel --help | less


Once you have gone through the text, you can quit less by pressing the q key.

• Redirect to a file. This is a less convenient way, because you will then have to open the file in a text editor! You can do this with the shell redirection tool (>):
$ astnoisechisel --help > filename.txt

In case you have a special keyword you are looking for in the help, you don’t have to go through the full list. GNU Grep is made for this job. For example if you only want the list of options whose --help output contains the word “axis” in Crop, you can run the following command:

$ astcrop --help | grep axis


If the output of this option does not fit nicely within the confines of your terminal, GNU does enable you to customize its output through the environment variable ARGP_HELP_FMT. With it, you can set various parameters which specify the formatting of the help messages. For example if your terminals are wider than 70 spaces (say 100) and you feel there is too much empty space between the long options and the short explanation, you can change these formats by giving values to this environment variable before running the program with the --help option. You can define this environment variable in this manner:

$ export ARGP_HELP_FMT=rmargin=100,opt-doc-col=20

This will affect all GNU programs using GNU C library’s argp.h facilities as long as the environment variable is in memory. You can see the full list of these formatting parameters in the “Argp User Customization” part of the GNU C library manual. If you are more comfortable reading the --help outputs of all GNU software in your customized format, you can add your customization (similar to the line above, without the $ sign) to your ~/.bashrc file. This is a standard option for all GNU software.


#### 4.3.3 Man pages

Man pages were the Unix method of providing command-line documentation to a program. With GNU Info (see Info), the usage of this method of documentation is highly discouraged. This is because Info provides an environment that is much easier to navigate and read.

However, some operating systems require a man page for packages that are installed and some people are still used to this method of command line help. So the programs in Gnuastro also have Man pages which are automatically generated from the outputs of --version and --help using the GNU help2man program. So if you run

$ man programname

you will be provided with a man page listing the options in the standard manner.

#### 4.3.4 Info

Info is the standard documentation format for all GNU software. It is a very useful command-line document viewing format, fully equipped with links between the various pages and menus and search capabilities. As explained before, the best thing about it is that it is available for you the moment you need to refresh your memory on any command-line tool in the middle of your work without having to take your hands off the keyboard. This complete book is available in Info format and can be accessed from anywhere on the command-line.

To open the Info format of any installed programs or library on your system which has an Info format book, you can simply run the command below (change executablename to the executable name of the program or library):

$ info executablename


In case you are not already familiar with it, run $ info info. It does a fantastic job in explaining all its capabilities itself. It is very short and you will become sufficiently fluent in about half an hour. Since all GNU software documentation is also provided in Info, your whole GNU/Linux life will significantly improve.

Once you’ve become an efficient navigator in Info, you can go to any part of this book or any other GNU software or library manual, no matter how long it is, in a matter of seconds. It also blends nicely with GNU Emacs (a text editor) and you can search manuals while you are writing your document or programs without taking your hands off the keyboard. This is most useful for libraries like the GNU C library. To be able to access all the Info manuals installed in your GNU/Linux within Emacs, type Ctrl-H + i.

To see this whole book from the beginning in Info, you can run

$ info gnuastro


If you run Info with the particular program executable name, for example astcrop or astnoisechisel:

$ info astprogramname

you will be taken to the section titled “Invoking ProgramName” which explains the inputs and outputs along with the command-line options for that program. Finally, if you run Info with the official program name, for example Crop or NoiseChisel:

$ info ProgramName


you will be taken to the top section which introduces the program. Note that in all cases, Info is not case sensitive.


#### 4.3.5 help-gnuastro mailing list

Gnuastro maintains the help-gnuastro mailing list for users to ask any questions related to Gnuastro. The experienced Gnuastro users and some of its developers are subscribed to this mailing list and your email will be sent to them immediately. However, when contacting this mailing list please have in mind that they are possibly very busy and might not be able to answer immediately.

To ask a question from this mailing list, send a mail to help-gnuastro@gnu.org. Anyone can view the mailing list archives at http://lists.gnu.org/archive/html/help-gnuastro/. It is best that before sending a mail, you search the archives to see if anyone has asked a question similar to yours. If you want to make a suggestion or report a bug, please don’t send a mail to this mailing list. We have other mailing lists and tools for those purposes, see Report a bug or Suggest new feature.


### 4.4 Installed scripts

Gnuastro’s programs (introduced in previous chapters) are designed to be highly modular and thus mainly contain lower-level operations on the data. However, in many contexts, higher-level operations (for example a sequence of calls to multiple Gnuastro programs, or a special way of running a program and using the outputs) are also very similar between various projects.

To facilitate data analysis on these higher-level steps, Gnuastro also installs some scripts on your system with the (astscript-) prefix (in contrast to the other programs that only have the ast prefix).

Like all of Gnuastro’s source code, these scripts are also heavily commented. They are written in GNU Bash, which doesn’t need compilation. Therefore, if you open the installed scripts in a text editor, you can actually read them79. Bash is the same language that is mainly used when typing on the command-line. Because of these factors, Bash is much more widely known and used than C (the language of other Gnuastro programs). Gnuastro’s installed scripts also do higher-level operations, so customizing these scripts for a special project will be more common than the programs. You can always inspect them (to customize, check, or educate yourself) with this command (just replace emacs with your favorite text editor):

$ emacs $(which astscript-NAME)


These scripts also accept options and are in many ways similar to the programs (see Common options) with some minor differences:

• Currently they don’t accept configuration files themselves. However, the configuration files of the Gnuastro programs they call are indeed parsed and used by those programs.

As a result, they don’t have the following options: --checkconfig, --config, --lastconfig, --onlyversion, --printparams, --setdirconf and --setusrconf.

• They don’t directly allocate any memory, so there is no --minmapsize.
• They don’t have an independent --usage option: when called with --usage, they just recommend running --help.
• The output of --help is not configurable like the programs (see --help).
• The scripts will commonly use your installed Bash and other basic command-line tools (for example AWK or SED). Different systems have different versions and implementations of these basic tools (for example GNU/Linux systems use GNU AWK and GNU SED which are far more advanced and up to date than the minimalist AWK and SED of most other systems). Therefore, unexpected errors in these tools might come up when you run these scripts. We will try our best to write these scripts in a portable way. However, if you do confront such strange errors, please submit a bug report so we can fix it (see Report a bug).


### 4.5 Multi-threaded operations

Some of the programs benefit significantly when you use all the threads your computer’s CPU has to offer to your operating system. The number of threads available can be larger than the number of physical (hardware) cores in the CPU (also known as Simultaneous multithreading). For example, in Intel’s CPUs (those that implement its Hyper-threading technology) the number of threads is usually double the number of physical cores in your CPU. On a GNU/Linux system, the number of threads available can be found with the $ nproc command (part of GNU Coreutils).

Gnuastro’s programs can find the number of threads available to your system internally at run-time (when you execute the program). However, if a value is given to the --numthreads option, the given number will be used, see Operating mode options and Configuration files for ways to use this option. Thus --numthreads is the only common option in Gnuastro’s programs with a value that doesn’t have to be specified anywhere on the command-line or in the configuration files.

#### 4.5.1 A note on threads

Spinning off threads is not necessarily the most efficient way to run an application. Creating a new thread isn’t a cheap operation for the operating system. It is most useful when the input data are fixed and you want the same operation to be done on parts of it. For example one input image to Crop and multiple crops from various parts of it. In this fashion, the image is loaded into memory once, all the crops are divided between the number of threads internally and each thread cuts out those parts which are assigned to it from the same image. On the other hand, if you have multiple images and you want to crop the same region(s) out of all of them, it is much more efficient to set --numthreads=1 (so no threads spin off) and run Crop multiple times simultaneously, see How to run simultaneous operations.

You can check the boost in speed by first running a program on one of the data sets with the maximum number of threads and another time (with everything else the same) using only one thread. You will notice that the wall-clock time (reported by most programs at their end) in the former is longer than the latter divided by the number of physical CPU cores (not threads) available to your operating system. Asymptotically these two times can be equal (most of the time they aren’t). So limiting the programs to use only one thread and running them independently on the number of available threads will be more efficient.

Note that the operating system keeps a cache of recently processed data, so usually, the second time you process an identical data set (independent of the number of threads used), you will get faster results. In order to make an unbiased comparison, you have to first clean the system’s cache with the following command between the two runs.

$ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
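For programs that don’t report their own run time, the comparison above can be done with a small helper like the one sketched below. The function name wall_seconds is hypothetical, date +%s is assumed to be available (it is on GNU and BSD systems), and sleep is only a stand-in so that the sketch runs without Gnuastro.

```shell
# wall_seconds CMD [ARGS...]: run CMD and print the elapsed wall-clock
# time in whole seconds (the command's own output is discarded).
wall_seconds() {
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# Stand-in command so the sketch runs anywhere.
wall_seconds sleep 1
```

With Gnuastro installed, one would call it twice on the same (hypothetical) input, for example wall_seconds astnoisechisel image.fits --numthreads=1 and then the same command without --numthreads, clearing the cache between the two runs as described above.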

 SUMMARY: Should I use multiple threads? Depends: If you only have one data set (image in most cases!), then yes, the more threads you use (with a maximum of the number of threads available to your OS) the faster you will get your results. If you want to run the same operation on multiple data sets, it is best to set the number of threads to 1 and use Make, or GNU Parallel, as explained in How to run simultaneous operations.


#### 4.5.2 How to run simultaneous operations

There are two80 approaches to simultaneously execute a program: using GNU Parallel or Make (GNU Make is the most common implementation). The first is very useful when you only want to do one job multiple times and want to get back to your work without actually keeping the command you ran. The second is usually for more important operations, with lots of dependencies between the different products (for example a full scientific research).

GNU Parallel

When you only want to run multiple instances of a command on different threads and get on with the rest of your work, the best method is to use GNU Parallel. Surprisingly GNU Parallel is one of the few GNU packages that has no Info documentation but only a Man page, see Info. So to see the documentation after installing it please run

$ man parallel

As an example, let’s assume we want to crop a region fixed on the pixels (500, 600) with the default width from all the FITS images in the ./data directory ending with sci.fits to the current directory. To do this, you can run:

$ parallel astcrop --numthreads=1 --xc=500 --yc=600 ::: \
./data/*sci.fits


GNU Parallel can help in many more situations; this is one of the simplest. See the man page for lots of other examples. For absolute beginners: the backslash (\) is only a line breaker to fit nicely in the page. If you type the whole command in one line, you should remove it.
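When GNU Parallel is not installed, a comparable effect can often be obtained with the -P option of xargs. Note that this is a different tool from the one described above: -P is supported by the GNU and BSD implementations of xargs but is not required by POSIX, and the function name run_each is hypothetical. The sketch uses echo in place of a real program so it can be tried safely.

```shell
# run_each JOBS CMD [ARGS...]: read one input name per line from
# Standard input and run "CMD ARGS NAME" for each, JOBS at a time.
# Note: names containing spaces or quotes would need extra care.
run_each() {
    jobs=$1
    shift
    xargs -P "$jobs" -n 1 "$@"
}

# Harmless demonstration with echo in place of a real program.
printf '%s\n' a_sci.fits b_sci.fits c_sci.fits | run_each 2 echo cropping
```

The order of the printed lines can change between runs because the jobs finish independently. For the Crop example above, the input list would come from printf '%s\n' ./data/*sci.fits and the command would be astcrop --numthreads=1 --xc=500 --yc=600.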

Make

Make is a program for building “targets” (e.g., files) using “recipes” (a set of operations) when their known “prerequisites” (other files) have been updated. It elegantly allows you to define dependency structures for building your final output and updating it efficiently when the inputs change. It is the most common infra-structure to build software today.

Scientific research methodology is very similar to software development: you start by testing a hypothesis on a small sample of objects/targets with a simple set of steps. As you are able to get promising results, you improve the method and use it on a larger, more general, sample. In the process, you will confront many issues that have to be corrected (bugs in software development jargon). Make is a wonderful tool to manage this style of development. It has been used to make reproducible papers, for example see the reproduction pipeline of the paper introducing NoiseChisel (one of Gnuastro’s programs).

GNU Make81 is the most common implementation which (similar to nearly all GNU programs) comes with a wonderful manual82. Make is very basic and simple, and thus the manual is short (the most important parts are in the first roughly 100 pages) and easy to read/understand.

Make comes with a --jobs (-j) option which allows you to specify the maximum number of jobs that can be done simultaneously. For example, if you have 8 threads available to your operating system, you can run:

$ make -j8

With this command, Make will process your Makefile and create all the targets (can be thousands of FITS images for example) simultaneously on 8 threads, while fully respecting their dependencies (only building a file/target when its prerequisites are successfully built). Make is thus strongly recommended for managing scientific research where robustness, archiving, reproducibility and speed are important.

### 4.6 Numeric data types

At the lowest level, the computer stores everything in terms of 1 or 0. For example, each program in Gnuastro, or each astronomical image you take with the telescope is actually a string of millions of these zeros and ones. The space required to keep a zero or one is the smallest unit of storage, and is known as a bit. However, understanding and manipulating this string of bits is extremely hard for most people. Therefore, different standards are defined to package the bits into separate types with a fixed interpretation of the bits in each package.

To store numbers, the most basic standard/type is for integers ($$..., -2, -1, 0, 1, 2, ...$$). The common integer types are 8, 16, 32, and 64 bits wide (more bits will give larger limits). Each bit corresponds to a power of 2 and they are summed to create the final number. In the integer types, for each width there are two standards for reading the bits: signed and unsigned. In the ‘signed’ convention, one bit is reserved for the sign (stating that the integer is positive or negative). The ‘unsigned’ integers use that bit in the actual number and thus contain only positive numbers (starting from zero).

Therefore, at the same number of bits, both signed and unsigned integers can allow the same number of integers, but the positive limit of the unsigned types is double their signed counterparts with the same width (at the expense of not having negative numbers).
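
The relation between the width of an integer type and its limits can be checked with a few lines of Python. This is only an illustrative sketch and not part of Gnuastro: the function name integer_range is made up for this example, and the last lines assume the two's complement convention for signed integers (which is what Python's int.from_bytes uses).

```python
def integer_range(bits, signed):
    """Return the inclusive (minimum, maximum) of an integer type."""
    if signed:
        # One bit is used for the sign, so the positive limit is halved.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # All bits contribute to the value and the smallest number is zero.
    return 0, 2 ** bits - 1


for bits in (8, 16, 32, 64):
    print(bits, integer_range(bits, signed=False), integer_range(bits, signed=True))

# The same 8 bits, read under the two conventions.
raw = bytes([0b11111111])
print(int.from_bytes(raw, "little", signed=False))  # 255
print(int.from_bytes(raw, "little", signed=True))   # -1
```

Both conventions cover the same number of distinct integers at a given width (256 for 8 bits); only the position of the range on the number line changes.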
When the context of your work doesn’t involve negative numbers (for example counting, where negative is not defined), it is best to use the unsigned types. For the full numerical range of all integer types, see below.

Another standard for converting a given number of bits to numbers is the floating point standard, which can approximately store any real number with a given precision. There are two common floating point types: 32-bit and 64-bit, for single and double precision floating point numbers respectively. The former is sufficient for data with less than 8 significant decimal digits (most astronomical data), while the latter is good for less than 16 significant decimal digits. The representation of real numbers as bits is much more complex than integers. If you are interested to learn more about it, you can start with the Wikipedia article.

Practically, you can use Gnuastro’s Arithmetic program to convert/change the type of an image/datacube (see Arithmetic), or Gnuastro’s Table program to convert a table column’s data type (see Column arithmetic). Conversion of a dataset’s type is necessary in some contexts. For example, the program/library that you intend to feed the data into only accepts floating point values, but you have an integer image/column. Another situation where conversion can be helpful is when you know that your data only has values that fit within int8 or uint16, but it is currently formatted in the float64 type.

The important thing to consider is that operations involving wider, floating point, or signed types can be significantly slower than smaller-width, integer, or unsigned types respectively. Note that besides speed, a wider type also requires much more storage space (by 4 or 8 times). Therefore, when you confront such situations that can be optimized and want to store/archive/transfer the data, it is best to use the most efficient type.
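
As a minimal sketch of how such a choice could be automated, the Python function below returns the narrowest integer type that holds a given range of values. The function name smallest_integer_type is hypothetical (it is not a Gnuastro function), and the sketch assumes that an unsigned type is preferred whenever the data has no negative values.

```python
WIDTHS = (8, 16, 32, 64)  # Common integer widths, in bits.


def smallest_integer_type(minimum, maximum):
    """Name of the narrowest integer type that holds [minimum, maximum]."""
    signed = minimum < 0
    for bits in WIDTHS:
        if signed:
            low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        else:
            low, high = 0, 2 ** bits - 1
        if low <= minimum and maximum <= high:
            return ("int" if signed else "uint") + str(bits)
    raise ValueError("range does not fit in any 64-bit integer type")


print(smallest_integer_type(0, 65535))    # uint16
print(smallest_integer_type(-128, 127))   # int8
print(smallest_integer_type(-1, 40000))   # int32
```

Note that the result still has to be supported by the file format you intend to write to.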
For example, if your dataset (image or table column) only has positive integers no larger than 65535, store it as an unsigned 16-bit integer for faster processing, faster transfer, and less storage space.

The short and long names for the recognized numeric data types in Gnuastro are listed below. Both short and long names can be used when you want to specify a type. For example, as a value to the common option --type (see Input/Output options), or in the information comment lines of Gnuastro text table format. The ranges listed below are inclusive.

• u8, uint8: 8-bit unsigned integers, range: $$[0\rm{\ to\ }2^8-1]$$ or $$[0\rm{\ to\ }255]$$.
• i8, int8: 8-bit signed integers, range: $$[-2^7\rm{\ to\ }2^7-1]$$ or $$[-128\rm{\ to\ }127]$$.
• u16, uint16: 16-bit unsigned integers, range: $$[0\rm{\ to\ }2^{16}-1]$$ or $$[0\rm{\ to\ }65535]$$.
• i16, int16: 16-bit signed integers, range: $$[-2^{15}\rm{\ to\ }2^{15}-1]$$ or $$[-32768\rm{\ to\ }32767]$$.
• u32, uint32: 32-bit unsigned integers, range: $$[0\rm{\ to\ }2^{32}-1]$$ or $$[0\rm{\ to\ }4294967295]$$.
• i32, int32: 32-bit signed integers, range: $$[-2^{31}\rm{\ to\ }2^{31}-1]$$ or $$[-2147483648\rm{\ to\ }2147483647]$$.
• u64, uint64: 64-bit unsigned integers, range: $$[0\rm{\ to\ }2^{64}-1]$$ or $$[0\rm{\ to\ }18446744073709551615]$$.
• i64, int64: 64-bit signed integers, range: $$[-2^{63}\rm{\ to\ }2^{63}-1]$$ or $$[-9223372036854775808\rm{\ to\ }9223372036854775807]$$.
• f32, float32: 32-bit (single-precision) floating point types. The maximum (minimum is its negative) possible value is $$3.402823\times10^{38}$$. Single-precision floating points can accurately represent a floating point number up to $$\sim7.2$$ significant decimals. Given the heavy noise in astronomical data, this is usually more than sufficient for storing results.
• f64, float64: 64-bit (double-precision) floating point types. The maximum (minimum is its negative) possible value is $$\sim10^{308}$$.
Double-precision floating points can accurately represent a floating point number up to $$\sim15.9$$ significant decimals. This is usually good for processing (mixing) the data internally, for example a sum of single precision data (and later storing the result as float32).

Some file formats don’t recognize all types. For example the FITS standard (see Fits) does not define uint64 in binary tables or images. When a type is not acceptable for output into a given file format, the respective Gnuastro program or library will let you know and abort. On the command-line, you can convert the numerical type of an image, or table column into another type with Arithmetic or Table respectively. If you are writing your own program, you can use the gal_data_copy_to_new_type() function in Gnuastro’s library, see Copying datasets.

### 4.7 Tables

“A table is a collection of related data held in a structured format within a database. It consists of columns, and rows.” (from Wikipedia). Each column in the table contains the values of one property and each row is a collection of properties (columns) for one target object. For example, let’s assume you have just run MakeCatalog (see MakeCatalog) on an image to measure some properties for the labeled regions (which might be detected galaxies for example) in the image. For each labeled region (detected galaxy), there will be a row which groups its measured properties as columns, one column for each property. One such property can be the object’s magnitude, which is derived from the sum of pixels with that label, or its center, which can be defined as the light-weighted average position of those pixels. Many such properties can be derived from the raw pixel values and their position, see Invoking MakeCatalog for a long list. As a summary, for each labeled region (or, galaxy) we have one row and for each measured property we have one column.
This high-level structure is usually the first step for higher-level analysis, for example finding the stellar mass or photometric redshift from magnitudes in multiple colors. Thus, tables are not just outputs of programs, in fact it is much more common for tables to be inputs of programs. For example, to make a mock galaxy image, you need to feed in the properties of each galaxy into MakeProfiles for it to do the inverse of the process above and make a simulated image from a catalog, see Sufi simulates a detection. In other cases, you can feed a table into Crop and it will crop out regions centered on the positions within the table, see Finding reddest clumps and visual inspection. So to end this relatively long introduction, tables play a very important role in astronomy, or generally all branches of data analysis.

In Recognized table formats the currently recognized table formats in Gnuastro are discussed. You can use any of these tables as input or ask for them to be built as output. The most common type of table format is a simple plain text file with each row on one line and columns separated by white space characters. This format is easy to read/write by eye/hand. To give it the full functionality of more specific table types like the FITS tables, Gnuastro has a special convention which you can use to give each column a name, type, unit, and comments, while still being readable by other plain text table readers. This convention is described in Gnuastro text table format.

When tables are input to a program, the program reading it needs to know which column(s) it should use for its desired purposes. Gnuastro’s programs all follow a similar convention for selecting columns in a table. It is thoroughly discussed in Selecting table columns.

#### 4.7.1 Recognized table formats

The list of table formats that Gnuastro can currently read from and write to is described below.
Each has its own advantages and disadvantages, so a short review of the format is also provided to help you make the best choice based on how you want to define your input tables or later use your output tables.

Plain text table

This is the most basic and simplest way to create, view, or edit the table by hand on a text editor. The other formats described below are less eye-friendly and have a more formal structure (for easier computer readability). It is fully described in Gnuastro text table format.

FITS ASCII tables

The FITS ASCII table extension is fully in ASCII encoding and thus easily readable on any text editor (assuming it is the only extension in the FITS file). If the FITS file also contains binary extensions (for example an image or binary table extensions), then there will be many hard to print characters. The FITS ASCII format doesn’t have new line characters to separate rows. In the FITS ASCII table standard, each row is defined as a fixed number of characters (value to the NAXIS1 keyword), so to visually inspect it properly, you would have to adjust your text editor’s width to this value. All columns start at given character positions and have a fixed width (number of characters).

Numbers in a FITS ASCII table are printed into ASCII format, they are not in binary (that the CPU uses). Hence, they can take a larger space in memory, lose their precision, and take longer to read into memory. If you are dealing with integer type columns (see Numeric data types), another issue with FITS ASCII tables is that the type information for the column will be lost (there is only one integer type in FITS ASCII tables). One problem with the binary format on the other hand is that it isn’t portable (different CPUs/compilers have different standards for translating the zeros and ones). But since ASCII characters are defined on a byte and are well recognized, they are better for portability on those various systems.
Gnuastro’s plain text table format described below is much more portable and easier to read/write/interpret by humans manually.

Generally, as the name implies, this format is useful for when your table mainly contains ASCII columns (for example file names, or descriptions). They can be useful when you need to include columns with structured ASCII information along with other extensions in one FITS file. In such cases, you can also consider header keywords (see Fits).

FITS binary tables

The FITS binary table is the FITS standard’s solution to the issues discussed with keeping numbers in ASCII format as described under the FITS ASCII table title above. Only columns defined as a string type (a string of ASCII characters) are readable in a text editor. The portability problem with binary formats discussed above is mostly solved thanks to the portability of CFITSIO (see CFITSIO) and the very long history of the FITS format which has been widely used since the 1970s.

In the case of most numbers, storing them in binary format is more memory efficient than ASCII format. For example, to store -25.72034 in ASCII format, you need 9 bytes/characters. But if you keep this same number (to the approximate precision possible) as a 4-byte (32-bit) floating point number, you can keep/transmit it with less than half the amount of memory. When catalogs contain thousands/millions of rows in tens/hundreds of columns, this can lead to significant improvements in memory/band-width usage. Moreover, since the CPU does its operations in the binary formats, reading the table in and writing it out is also much faster than an ASCII table.

When you are dealing with integer numbers, the compression ratio can be even better, for example if you know all of the values in a column are positive and less than 255, you can use the unsigned char type which only takes one byte! If they are between -128 and 127, then you can use the (signed) char type.
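
The byte counts quoted above are easy to verify. The short Python sketch below is only an illustration with the standard struct module (it does not read or write FITS files); the big-endian byte order is chosen arbitrarily, since byte order does not affect the size.

```python
import struct

text = "-25.72034"
ascii_bytes = text.encode("ascii")             # One byte per character.
binary_bytes = struct.pack(">f", float(text))  # 32-bit floating point.

print(len(ascii_bytes))    # 9
print(len(binary_bytes))   # 4

# Reading the binary value back gives the nearest 32-bit float,
# not the exact decimal string.
restored = struct.unpack(">f", binary_bytes)[0]
print(abs(restored - float(text)) < 1e-5)      # True

# Small integers fit in a single byte (unsigned or signed char).
print(len(struct.pack(">B", 200)), len(struct.pack(">b", -100)))  # 1 1
```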
So if you are thoughtful about the limits of your integer columns, you can greatly reduce the size of your file and also increase the speed at which it is read/written. This can be very useful when sharing your results with collaborators or publishing them. To decrease the file size even more you can name your output as ending in .fits.gz so it is also compressed after creation. Just note that compression/decompression is CPU intensive and can slow down the writing/reading of the file.

Fortunately the FITS Binary table format also accepts ASCII strings as column types (along with the various numerical types). So your dataset can also contain non-numerical columns.

#### 4.7.2 Gnuastro text table format

Plain text files are the most generic, portable, and easiest way to (manually) create, (visually) inspect, or (manually) edit a table. In this format, the ending of a row is defined by the new-line character (a line on a text editor). So when you view it on a text editor, every row will occupy one line. The delimiters (or characters separating the columns) are white space characters (space, horizontal tab, vertical tab) and a comma (,). The only further requirement is that all rows/lines must have the same number of columns.

The columns don’t have to be exactly under each other and the rows can be arbitrarily long with different lengths. For example the following contents in a file would be interpreted as a table with 4 columns and 2 rows, with each element interpreted as a double type (see Numeric data types).

1     2.234948   128   39.8923e8
2 ,   4.454        792     72.98348e7

However, the example above has no other information about the columns (it is just raw data, with no meta-data). To use this table, you have to remember what the numbers in each column represent. Also, when you want to select columns, you have to count their position within the table.
This can become frustrating and prone to bad errors (getting the columns wrong), especially as the number of columns increases. It is also bad for sending to a colleague, because they will find it hard to remember/use the columns properly.

To solve these problems in Gnuastro’s programs/libraries you aren’t limited to using the column’s number, see Selecting table columns. If the columns have names, units, or comments you can also select your columns based on searches/matches in these fields, for example see Table. Also, with such a bare table, you can’t guide the program reading the table on how to read the numbers. As an example, the first and third columns above can be read as integer types: the first column might be an ID and the third can be the number of pixels an object occupies in an image. So there is no need to read these two columns as a double type (which takes more memory, and is slower).

In the bare-minimum example above, you also can’t use strings of characters, for example the names of filters, or some other identifier that includes non-numerical characters. In the absence of any information, only numbers can be read robustly. Assuming we read columns with non-numerical characters as strings, there would still be the problem that the strings might contain a space (or any delimiter) character for some rows. So, each ‘word’ in the string will be interpreted as a column and the program will abort with an error that the rows don’t have the same number of columns.

To correct for these limitations, Gnuastro defines the following convention for storing the table meta-data along with the raw data in one plain text file. The format is primarily designed for ease of reading/writing by eye/fingers, but is also structured enough to be read by a program.
When the first non-white character in a line is #, or there are no non-white characters in it, then the line will not be considered as a row of data in the table (this is a pretty standard convention in many programs, and higher level languages). In the former case, the line is interpreted as a comment. If the comment line starts with ‘# Column N:’, then it is assumed to contain information about column N (a number, counting from 1). Comment lines that don’t start with this pattern are ignored and you can use them to include any further information you want to store with the table in the text file. A column information comment is assumed to have the following format:

# Column N: NAME [UNIT, TYPE, BLANK] COMMENT

Any sequence of characters between ‘:’ and ‘[’ will be interpreted as the column name (so it can contain anything except the ‘[’ character). Anything between the ‘]’ and the end of the line is defined as a comment. Within the brackets, anything before the first ‘,’ is the units (physical units, for example km/s, or erg/s), anything before the second ‘,’ is the short type identifier (see below, and Numeric data types). Finally (still within the brackets), any non-white characters after the second ‘,’ are interpreted as the blank value for that column (see Blank pixels). The blank value can either be in the same type as the column (for example -99 for a signed integer column), or any string (for example NaN in that same column). In both cases, the values will be stored in memory as Gnuastro’s fixed blank values for each type. For floating point types, Gnuastro’s internal blank value is IEEE NaN (Not-a-Number). For signed integers, it is the smallest possible value and for unsigned integers it is the largest possible value.
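
To make the splitting rules of this format concrete, here is a minimal Python sketch that breaks one column information comment into its fields. It is not Gnuastro's parser: the function name parse_column_info and the returned dictionary are hypothetical, the sketch strips white space around every field, and it does no validation of the type or blank value.

```python
import re

# The leading "# Column N:" part of a column information comment.
COLUMN_INFO = re.compile(r"^# Column (\d+):(.*)$")


def parse_column_info(line):
    """Split a '# Column N: NAME [UNIT, TYPE, BLANK] COMMENT' line.

    Returns None for lines that do not start with '# Column N:'.
    """
    match = COLUMN_INFO.match(line.strip())
    if match is None:
        return None
    rest = match.group(2)
    name, unit, dtype, blank, comment = rest, "", "", "", ""
    if "[" in rest:
        # NAME is everything between ':' and '['.
        name, _, bracketed = rest.partition("[")
        # COMMENT is everything after ']'.
        inside, _, comment = bracketed.partition("]")
        # Inside the brackets: UNIT, TYPE, BLANK (all optional).
        fields = [field.strip() for field in inside.split(",", 2)]
        fields += [""] * (3 - len(fields))
        unit, dtype, blank = fields
    return {"number": int(match.group(1)), "name": name.strip(),
            "unit": unit, "type": dtype, "blank": blank,
            "comment": comment.strip()}


info = parse_column_info("# Column 5:  column name   [km/s,    f32,-99] Redshift as speed")
print(info["name"], info["unit"], info["type"], info["blank"])
print(parse_column_info("# Column 1: ID [,i8] The Clump ID."))
print(parse_column_info("# Any other comment"))
```

In this sketch, a line with only the column number (for example ‘# Column 5:’) is accepted and simply yields empty strings for all the other fields.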
When a formatting problem occurs (for example you have specified the wrong type code, see below), or the column was already given meta-data in a previous comment, or the column number is larger than the actual number of columns in the table (counted in the lines that are neither comments nor empty), then the comment information line will be ignored.

When a comment information line can be used, the leading and trailing white space characters will be stripped from all of the elements. For example in this line:

# Column 5:  column name   [km/s,    f32,-99] Redshift as speed

The NAME field will be ‘column name’ and the TYPE field will be ‘f32’. Note how all the white space characters before and after strings are not used, but those in the middle remained. Also, white space characters aren’t mandatory. Hence, in the example above, the BLANK field will be given the value of ‘-99’.

Except for the column number (N), the rest of the fields are optional. Also, the column information comments don’t have to be in order. In other words, the information for column $$N+m$$ ($$m>0$$) can be given in a line before column $$N$$. Also, you don’t have to specify information for all columns. Those columns that don’t have this information will be interpreted with the default settings (like the case above: values are double precision floating point, and the column has no name, unit, or comment). So these lines are all acceptable for any table (the first one, with nothing but the column number is redundant):

# Column 5:
# Column 1: ID [,i8] The Clump ID.
# Column 3: mag_f160w [AB mag, f32] Magnitude from the F160W filter

The data type of the column should be specified with one of the following values:

• For a numeric column, you can use any of the numeric types (and their recognized identifiers) described in Numeric data types.
• ‘strN’: for strings. The N value identifies the length of the string (how many characters it has).
The start of the string on each row is the first non-delimiter character of the column that has the string type. The next N characters will be interpreted as a string and all leading and trailing white space will be removed. If the next column’s characters are closer than N characters to the start of the string column in that line/row, they will be considered part of the string column. If there is a new-line character before the ending of the space given to the string column (in other words, the string column is the last column), then reading of the string will stop, even if the N characters are not complete yet. See tests/table/table.txt for one example. Therefore, the only time you have to pay attention to the positioning and spaces given to the string column is when it is not the last column in the table.

The only limitation in this format is that trailing and leading white space characters will be removed from the columns that are read. In most cases, this is the desired behavior, but if trailing and leading white-spaces are critically important to your analysis, define your own starting and ending characters and remove them after the table has been read. For example in the sample table below, the two ‘|’ characters (which are arbitrary) will remain in the value of the second column and you can remove them manually later. If only one of the leading or trailing white spaces is important for your work, you can use only one of the ‘|’s.

# Column 1: ID [label, u8]
# Column 2: Notes [no unit, str50]
1    leading and trailing white space is ignored here    2.3442e10
2   |         but they will be preserved here        |   8.2964e11

Note that the FITS binary table standard does not define the unsigned int and unsigned long types, so if you want to convert your tables to FITS binary tables, use other types. Also, note that in the FITS ASCII table, there is only one integer type (long).
So if you convert a Gnuastro plain text table to a FITS ASCII table with the Table program, the type information for integers will be lost. Conversely if integer types are important for you, you have to manually set them when reading a FITS ASCII table (for example with the Table program when reading/converting into a file, or with the gnuastro/table.h library functions when reading into memory).

#### 4.7.3 Selecting table columns

At the lowest level, the only defining aspect of a column in a table is its number, or position. But selecting columns purely by number is not very convenient and, especially when the tables are large, it can be very frustrating and prone to errors. Hence, table file formats (for example see Recognized table formats) have ways to store additional information about the columns (meta-data). Some of the most common pieces of information about each column are its name, the units of data in it, and a comment for longer/informal description of the column’s data.

To facilitate research with Gnuastro, you can select columns by matching, or searching in these three fields, besides the low-level column number. To view the full list of information on the columns in the table, you can use the Table program (see Table) with the command below (replace table-file with the filename of your table, if it is FITS, you might also need to specify the HDU/extension which contains the table):

$ asttable --information table-file


Gnuastro’s programs need the columns for different purposes, for example in Crop, you specify the columns containing the central coordinates of the crop centers with the --coordcol option (see Crop options). On the other hand, in MakeProfiles, to specify the column containing the profile position angles, you must use the --pcol option (see MakeProfiles catalog). Thus, there can be no unified common option name to select columns for all programs (different columns have different purposes). However, when the program expects a column for a specific context, the option names end in the col suffix like the examples above. These options accept values in integer (column number), or string (metadata match/search) format.

If the value can be parsed as a positive integer, it will be seen as the low-level column number. Note that column counting starts from 1, so if you ask for column 0, the respective program will abort with an error. When the value can’t be interpreted as an integer, it will be seen as a string of characters which will be used to match/search in the table’s meta-data. The meta-data field which the value will be compared with can be selected through the --searchin option, see Input/Output options. --searchin can take three values: name, unit, comment. The matching will be done following this convention:

• If the value is enclosed in two slashes (for example -x/RA_/, or --coordcol=/RA_/, see Crop options), then it is assumed to be a regular expression with the same convention as GNU AWK. GNU AWK has a very well written chapter describing regular expressions, so we will not continue discussing them here. Regular expressions are a very powerful tool in matching text and useful in many contexts. We thus strongly encourage reviewing this chapter for greatly improving the quality of your work in many cases, not just for searching column meta-data in Gnuastro.
• When the string isn’t enclosed between ‘/’s, any column that exactly matches the given value in the given field will be selected.

Note that in both cases, you can ignore the case of alphabetic characters with the --ignorecase option, see Input/Output options. Also, in both cases, multiple columns may be selected with one call to this function. In this case, the order of the selected columns (with one call) will be the same order as they appear in the table.
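
The selection rules above can be summarized in a short Python sketch. It only mimics the described behavior and is not how Gnuastro implements it: the function name select_columns and the list-of-dictionaries table description are hypothetical, and Python's re module is used in place of GNU AWK regular expressions (the two syntaxes differ in some details).

```python
import re


def select_columns(value, columns, searchin, ignorecase=False):
    """Return the numbers (counting from 1) of the selected columns.

    `columns` is a list of dictionaries with 'name', 'unit' and
    'comment' keys, in the order the columns appear in the table.
    """
    # A value that parses as an integer is a column number.
    if value.isascii() and value.isdigit():
        number = int(value)
        if not 1 <= number <= len(columns):
            raise ValueError(f"column {number} is not in the table")
        return [number]

    flags = re.IGNORECASE if ignorecase else 0
    if len(value) >= 2 and value[0] == "/" and value[-1] == "/":
        # Enclosed in slashes: search with a regular expression.
        pattern = re.compile(value[1:-1], flags)
        matches = lambda text: pattern.search(text) is not None
    elif ignorecase:
        matches = lambda text: text.lower() == value.lower()
    else:
        # Otherwise the whole field must match exactly.
        matches = lambda text: text == value

    # Selected columns keep the order they have in the table.
    return [i for i, col in enumerate(columns, start=1)
            if matches(col[searchin])]


columns = [{"name": "ID", "unit": "", "comment": ""},
           {"name": "RA_J2000", "unit": "deg", "comment": ""},
           {"name": "DEC_J2000", "unit": "deg", "comment": ""},
           {"name": "ra_err", "unit": "deg", "comment": ""}]

print(select_columns("3", columns, "name"))                        # [3]
print(select_columns("ID", columns, "name"))                       # [1]
print(select_columns("/RA_/", columns, "name"))                    # [2]
print(select_columns("/RA_/", columns, "name", ignorecase=True))   # [2, 4]
print(select_columns("deg", columns, "unit"))                      # [2, 3, 4]
```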


### 4.8 Tessellation

It is sometimes necessary to classify the elements in a dataset (for example pixels in an image) into a grid of individual, non-overlapping tiles. For example when background sky gradients are present in an image, you can define a tile grid over the image. When the tile sizes are set properly, the background’s variation over each tile will be negligible, allowing you to measure (and subtract) it. In other cases (for example spatial domain convolution in Gnuastro, see Convolve), it might simply be for speed of processing: each tile can be processed independently on a separate CPU thread. In the arts and mathematics, this process is formally known as tessellation.

The size of the regular tiles (in units of data-elements, or pixels in an image) can be defined with the --tilesize option. It takes multiple numbers (separated by a comma) which will be the length along the respective dimension (in FORTRAN/FITS dimension order). Divisions are also acceptable, but must result in an integer. For example --tilesize=30,40 can be used for an image (a 2D dataset). The regular tile size along the first FITS axis (horizontal when viewed in SAO ds9) will be 30 pixels and along the second it will be 40 pixels. Ideally, --tilesize should be selected such that all tiles in the image have exactly the same size. In other words, that the dataset length in each dimension is divisible by the tile size in that dimension.

However, this is not always possible: the dataset can be any size and every pixel in it is valuable. In such cases, Gnuastro will look at the significance of the remainder length. If it is not significant (for example one or two pixels), then it will just increase the size of the first tile in the respective dimension and allow the rest of the tiles to have the required size. When the remainder is significant (for example one pixel less than the size along that dimension), the remainder will be added to one regular tile’s size and the large tile will be cut in half and put in the two ends of the grid/tessellation. In this way, all the tiles in the central regions of the dataset will have the regular tile sizes and the tiles on the edge will be slightly larger/smaller depending on the remainder significance. The fraction which defines the remainder significance along all dimensions can be set through --remainderfrac.
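
The following Python sketch shows one way to read the remainder rule described above along a single dimension. It is an illustration, not Gnuastro's implementation: the function name tile_lengths is hypothetical, and it assumes that the remainder is "significant" when its ratio to the regular tile size is larger than the given fraction, and that the extra pixel of an odd split goes to the last tile.

```python
def tile_lengths(length, tilesize, remainderfrac):
    """Tile lengths along one dimension of `length` pixels."""
    if tilesize >= length:
        return [length]                       # A single tile covers it all.
    count, remainder = divmod(length, tilesize)
    if remainder == 0:
        return [tilesize] * count             # Perfect fit.
    if remainder / tilesize <= remainderfrac:
        # Insignificant remainder: enlarge the first tile.
        return [tilesize + remainder] + [tilesize] * (count - 1)
    # Significant remainder: add it to one regular tile, cut that large
    # tile in half and put the halves on the two ends.
    large = tilesize + remainder
    first = large // 2
    return [first] + [tilesize] * (count - 1) + [large - first]


print(tile_lengths(90, 30, 0.1))    # [30, 30, 30]
print(tile_lengths(92, 30, 0.1))    # [32, 30, 30]
print(tile_lengths(100, 30, 0.1))   # [20, 30, 30, 20]
print(tile_lengths(119, 30, 0.1))   # [29, 30, 30, 30]
```

In every case the tile lengths add up to the dataset length, so no pixel is left out of the tessellation.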

The best tile size is directly related to the spatial properties of the feature you want to study (for example, a gradient on the image). In practice we assume that the gradient is not present over each tile. So if there is a strong gradient (for example in long wavelength ground based images) or the image is of a crowded area where there isn’t too much blank area, you have to choose a smaller tile size. A larger tile will contain more pixels and so the scatter in the results will be less (better statistics).

For raw image processing, a single tessellation/grid is not sufficient. Raw images are the unprocessed outputs of the camera detectors. Modern detectors usually have multiple readout channels each with its own amplifier. For example the Hubble Space Telescope Advanced Camera for Surveys (ACS) has four amplifiers over its full detector area dividing the square field of view into four smaller squares. Ground based image detectors are not exempt, for example each CCD of Subaru Telescope’s Hyper Suprime-Cam camera (which has 104 CCDs) has four amplifiers, but they have the same height as the CCD and divide its width into four parts.

The bias current on each amplifier is different, and initial bias subtraction is not perfect. So even after subtracting the measured bias current, you can usually still identify the boundaries of different amplifiers by eye. See Figure 11(a) in Akhlaghi and Ichikawa (2015) for an example. This results in the final reduced data having non-uniform amplifier-shaped regions with higher or lower background flux values. Such systematic biases will then propagate to all subsequent measurements we do on the data (for example photometry and subsequent stellar mass and star formation rate measurements in the case of galaxies).

Therefore an accurate analysis requires a two layer tessellation: the top layer contains larger tiles, each covering one amplifier channel. For clarity we’ll call these larger tiles “channels”. The number of channels along each dimension is defined through the --numchannels option. Each channel is then covered by its own individual smaller tessellation (with tile sizes determined by the --tilesize option). This will allow independent analysis of two adjacent pixels from different channels if necessary. If the image is processed or the detector only has one amplifier, you can set the number of channels in both dimensions to 1.
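
As a rough illustration of the top layer, the Python sketch below finds which channel covers a pixel along one dimension. It is not taken from Gnuastro: the function name channel_of is hypothetical, the 2048 pixel width is an invented example value, and the sketch assumes that the dataset length is exactly divisible by the number of channels.

```python
def channel_of(position, length, numchannels):
    """Index (from 0) of the channel covering a 0-based pixel position."""
    if length % numchannels != 0:
        raise ValueError("length is not divisible by the number of channels")
    if not 0 <= position < length:
        raise ValueError("position is outside the dataset")
    return position // (length // numchannels)


width = 2048       # Hypothetical detector width (in pixels).
numchannels = 4    # Four amplifiers dividing the width.

# Two adjacent pixels that fall in different channels.
print(channel_of(511, width, numchannels))   # 0
print(channel_of(512, width, numchannels))   # 1
```

Because the two pixels belong to different channels, they would be covered by tiles of different lower-level tessellations and could be analyzed independently.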

The final tessellation can be inspected on the image with the --checktiles option that is available to all programs which use tessellation for localized operations. When this option is called, a FITS file with a _tiled.fits suffix will be created along with the outputs, see Automatic output. Each pixel in this image has the number of the tile that covers it. If the number of channels in any dimension is larger than unity, you will notice that the tile IDs are defined such that the first channel is covered first, then the second and so on. For the full list of processing-related common options (including tessellation options), please see Processing options.


### 4.9 Automatic output

All the programs in Gnuastro are designed such that specifying an output file or directory (based on the program context) is optional. When no output name is explicitly given (with --output, see Input/Output options), the programs will automatically set an output name based on the input name(s) and what the program does. For example when you are using ConvertType to save a FITS image named dataset.fits to a JPEG image and don’t specify a name for it, the JPEG output file will be named dataset.jpg. When the input is from the standard input (for example a pipe, see Standard input), and --output isn’t given, the output name will be the program’s name (for example converttype.jpg).

Another very important part of the automatic output generation is that all the directory information of the input file name is stripped off of it. This feature can be disabled with the --keepinputdir option, see Input/Output options. It is the default because astronomical data are usually very large and organized specially with special file names. In some cases, the user might not have write permissions in those directories.

Let’s assume that we are working on a report and want to process the FITS images from two projects (ABC and DEF), which are stored in the sub-directories named ABCproject/ and DEFproject/ of our top data directory (/mnt/data). The following shell commands show how one image from the former is first converted to a JPEG image through ConvertType and then the objects from an image in the latter project are detected using NoiseChisel. The text after each # sign is a comment (not to be typed!).

$ pwd                                                # Current location
/home/usrname/research/report
$ ls                                                 # List directory contents
ABC01.jpg
$ ls /mnt/data/ABCproject                            # Archive 1
ABC01.fits ABC02.fits ABC03.fits
$ ls /mnt/data/DEFproject                            # Archive 2
DEF01.fits DEF02.fits DEF03.fits
$ astconvertt /mnt/data/ABCproject/ABC02.fits --output=jpg    # Prog 1
$ ls
ABC01.jpg ABC02.jpg
$ astnoisechisel /mnt/data/DEFproject/DEF01.fits              # Prog 2
$ ls
ABC01.jpg ABC02.jpg DEF01_detected.fits
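
The naming rule described in this section (strip the directory, then replace the suffix) can be imitated with standard shell tools. The sketch below only illustrates that rule and is not how Gnuastro implements it; the auto_name function is a hypothetical helper introduced for this example.

```shell
# Hypothetical helper: imitate the default automatic output name.
# $1: input file name (possibly with directories), $2: new suffix.
auto_name() {
    # basename removes the directory part and the trailing .fits.
    printf '%s%s\n' "$(basename "$1" .fits)" "$2"
}

auto_name /mnt/data/ABCproject/ABC02.fits .jpg
auto_name /mnt/data/DEFproject/DEF01.fits _detected.fits
```

The two calls print ABC02.jpg and DEF01_detected.fits, the same names that appear in the directory listing above.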



### 4.10 Output FITS files

The outputs of many of Gnuastro’s programs are (or can be) FITS files. The FITS format has many useful features for storing scientific datasets (cubes, images and tables) along with robust features for archivability. For more on this standard, please see Fits.

As a community convention described in Fits, the first extension of all FITS files produced by Gnuastro’s programs only contains the meta-data that is intended for the file’s extension(s). For a Gnuastro program, this generic meta-data (that is stored as FITS keyword records) is its configuration when it produced this dataset: file name(s) of input(s) and option names, values and comments. Note that when the configuration is too trivial (only input filename, for example the program Table) no meta-data is written in this extension.

FITS keywords have the following limitations with regard to generic option names and values:

• If a keyword (option name) is longer than 8 characters, the first word in the record (80 character line) is HIERARCH which is followed by the keyword name.
• Values can be at most 75 characters, but for strings, this changes to 73 (because of the two extra ' characters that are necessary). However, if the value is a file name, containing slash (/) characters to separate directories, Gnuastro will break the value into multiple keywords.
• Keyword names ignore case, so they are all written in capital letters. Therefore, if you want to use Grep to inspect these keywords, use the -i option, like the example below.
$ astfits image_detected.fits -h0 | grep -i snquant

The keywords above are classified (separated by an empty line and title) as a group titled “ProgramName configuration”. This meta-data extension, as well as all the other extensions (which contain data), also has a final group of keywords to keep the basic date and version information of Gnuastro, its dependencies and the pipeline that is using Gnuastro (if it is under version control).

DATE

The creation time of the FITS file. This date is written directly by CFITSIO and is in UT format.

COMMIT

Git’s commit description from the running directory of Gnuastro’s programs. If the running directory is not version controlled or libgit2 isn’t installed (see Optional dependencies) then this keyword will not be present. The printed value is equivalent to the output of the following command:

git describe --dirty --always

If the running directory contains non-committed work, then the stored value will have a ‘-dirty’ suffix. This can be very helpful to let you know that the data is not ready to be shared with collaborators or submitted to a journal. You should only share results that are produced after all your work is committed (safely stored in the version controlled history and thus reproducible).

At first sight, version control appears to be mainly a tool for software developers. However, progress in scientific research is almost identical to progress in software development: first you have a rough idea that starts with a handful of easy steps. But as the first results appear to be promising, you will have to extend, or generalize, it to make it more robust and work in all the situations your research covers, not just your first test samples. Slowly you will find wrong assumptions or bad implementations that need to be fixed (‘bugs’ in software development parlance). Finally, when you submit the research to your collaborators or a journal, many comments and suggestions will come in, and you have to address them.
Software developers have created version control systems precisely for this kind of activity. Each significant moment in the project’s history is called a “commit”, see Version controlled source. A snapshot of the project in each “commit” is safely stored away, so you can revert back to it at a later time, or check changes/progress. This way, you can be sure that your work is reproducible and track the progress and history. With version control, experimentation in the project’s analysis is greatly facilitated, since you can easily revert back if a brainstorm test procedure fails.

One important feature of version control is that the research result (FITS image, table, report or paper) can be stamped with the unique commit information that produced it. This information will enable you to exactly reproduce that same result later, even if you have made changes/progress. For one example of a research paper’s reproduction pipeline, please see the reproduction pipeline of the paper describing NoiseChisel.

CFITSIO

The version of CFITSIO used (see CFITSIO).

WCSLIB

The version of WCSLIB used (see WCSLIB). Note that older versions of WCSLIB do not report the version internally. So this is only available if you are using more recent WCSLIB versions.

GSL

The version of GNU Scientific Library that was used, see GNU Scientific library.

GNUASTRO

The version of Gnuastro used (see Version numbering).

Here is one example of the last few lines of an example output.

/ Versions and date
DATE    = '...' / file creation date
COMMIT  = 'v0-8-g547f6eb' / Commit description in running dir.
CFITSIO = '3.45 ' / CFITSIO version.
WCSLIB  = '5.19 ' / WCSLIB version.
GSL     = '2.5 ' / GNU Scientific Library version.
GNUASTRO= '0.7 ' / GNU Astronomy Utilities version.
END

## 5 Data containers

The most low-level and basic property of a dataset is how it is stored. To process, archive and transmit the data, you need a container to store it first.
From the start of the computer age, different formats have been defined to store data, optimized for particular applications. One format/container can never be useful for all applications: the storage defines the application and vice-versa. In astronomy, the Flexible Image Transport System (FITS) standard has become the most common format of data storage and transmission. It has many useful features, for example multiple sub-containers (also known as extensions or header data units, HDUs) within one file, or support for tables as well as images. Each HDU can store an independent dataset and its corresponding meta-data. Therefore, Gnuastro has one program (see Fits) specifically designed to manipulate FITS HDUs and the meta-data (header keywords) in each HDU.

Your astronomical research does not just involve data analysis (where the FITS format is very useful). For example, you may want to present your raw and processed FITS images or spectra as figures within slides, reports, or papers. The FITS format is not defined for such applications. Thus, Gnuastro also comes with the ConvertType program (see ConvertType) which can be used to convert a FITS image to and from (where possible) other formats like plain text and JPEG (which allow two way conversion), along with EPS and PDF (which can only be created from FITS, not the other way round).

Finally, the FITS format is not just for images, it can also store tables. Binary tables in particular can be very efficient in storing catalogs that have more than a few tens of columns and rows. However, unlike images (where all elements/pixels have one data type), tables contain multiple columns and each column can have different properties: independent data types (see Numeric data types) and meta-data. In practice, each column can be viewed as a separate container that is grouped with others in the table. The only shared property of the columns in a table is thus the number of elements they contain.
To allow easy inspection/manipulation of table columns, Gnuastro has the Table program (see Table). It can be used to select certain table columns in a FITS table and see them as a human readable output on the command-line, or to save them into another plain text or FITS table.

### 5.1 Fits

The “Flexible Image Transport System”, or FITS, is by far the most common data container format in astronomy and in constant use since the 1970s. Archiving (future usage, simplicity) has been one of the primary design principles of this format. In the last few decades it has proved so useful and robust that the Vatican Library has also chosen FITS for its “long-term digital preservation” project.

Although the full name of the standard evokes the idea that it is only for images, it also contains complete and robust features for tables. It started off in the 1970s and was formally published as a standard in 1981. It was adopted by the International Astronomical Union (IAU) in 1982, and an IAU working group to maintain its future was defined in 1988. The FITS 2.0 and 3.0 standards were approved in 2000 and 2008 respectively, and the 4.0 draft has also been released recently; please see the FITS standard document webpage for the full text of all versions. Also see the FITS 3.0 standard paper for a nice introduction and history along with the full standard.

Many common image formats, for example JPEG, only have one image/dataset per file. However, one great advantage of the FITS standard is that it allows you to keep multiple datasets (images or tables along with their separate meta-data) in one file. In the FITS standard, each data + metadata is known as an extension, or more formally a header data unit or HDU. The HDUs in a file can be completely independent: you can have multiple images of different dimensions/sizes or tables as separate extensions in one file.
However, while the standard doesn’t impose any constraints on the relation between the datasets, it is strongly encouraged to group data that are contextually related with each other in one file. For example an image and the table/catalog of objects and their measured properties in that image. Other examples can be images of one patch of sky in different colors (filters), or one raw telescope image along with its calibration data (tables or images).

As discussed above, the extensions in a FITS file can be completely independent. To keep some information (meta-data) about the group of extensions in the FITS file, the community has adopted the following convention: put no data in the first extension, so it is just meta-data. This extension can thus be used to store meta-data regarding the whole file (grouping of extensions). Subsequent extensions may contain data along with their own separate meta-data. All of Gnuastro’s programs also follow this convention: the main output dataset(s) are placed in the second (or later) extension(s). The first extension contains no data; the program’s configuration (input file name, along with all its option values) is stored as its meta-data, see Output FITS files.

The meta-data contain information about the data, for example which region of the sky an image corresponds to, the units of the data, what telescope, camera, and filter the data were taken with, its observation date, or the software that produced it and its configuration. Without the meta-data, the raw dataset is practically just a collection of numbers and really hard to understand, or connect with the real world (other datasets). It is thus strongly encouraged to supplement your data (at any level of processing) with as much meta-data about your processing/science as possible.

The meta-data of a FITS file is in ASCII format, which can be easily viewed or edited with a text editor or on the command-line.
Each meta-data element (known as a keyword generally) is composed of a name, value, units and comments (the last two are optional). For example below you can see three FITS meta-data keywords for specifying the world coordinate system (WCS, or its location in the sky) of a dataset:

LATPOLE = -27.805089 / [deg] Native latitude of celestial pole
RADESYS = 'FK5' / Equatorial coordinate system
EQUINOX = 2000.0 / [yr] Equinox of equatorial coordinates

However, there are some limitations which discourage viewing/editing the keywords with text editors. For example there is a fixed length of 80 characters for each keyword (its name, value, units and comments) and there are no new-line characters, so in a text editor all the keywords are seen in one line. Also, the meta-data keywords are immediately followed by the data, which are commonly in binary format; they will show up as strange looking characters in a text editor and significantly slow down the processor.

Gnuastro’s Fits program was designed to allow easy manipulation of FITS extensions and meta-data keywords on the command-line while conforming fully with the FITS standard. For example you can copy or cut (copy and remove) HDUs/extensions from one FITS file to another, or completely delete them. It also has features to delete, add, or edit meta-data keywords within one HDU.

#### 5.1.1 Invoking Fits

Fits can print or manipulate the FITS file HDUs (extensions) and the meta-data keywords in a given HDU. The executable name is astfits with the following general template:

$ astfits [OPTION...] ASTRdata


One line examples:

## View general information about every extension:
$ astfits image.fits

## Print the header keywords in the second HDU (counting from 0):
$ astfits image.fits -h1

## Only print header keywords that contain NAXIS:
$ astfits image.fits -h1 | grep NAXIS

## Only print the WCS standard PC matrix elements:
$ astfits image.fits -h1 | grep 'PC._.'

## Copy a HDU from input.fits to out.fits:
$ astfits input.fits --copy=hdu-name --output=out.fits

## Update the OLDKEY keyword value to 153.034:
$ astfits --update=OLDKEY,153.034,"Old keyword comment"

## Delete one COMMENT keyword and add a new one:
$ astfits --delete=COMMENT --comment="Anything you like ;-)."

## Write two new keywords with different values and comments:
$ astfits --write=MYKEY1,20.00,"An example keyword" --write=MYKEY2,fd


When no action is requested (and only a file name is given), Fits will print a list of information about the extension(s) in the file. This information includes the HDU number, HDU name (EXTNAME keyword), type of data (see Numeric data types), and the number of data elements it contains (size along each dimension for images and table rows and columns). You can use this to get a general idea of the contents of the FITS file and what HDU to use for further processing, either with the Fits program or any other Gnuastro program.

Here is one example of information about a FITS file with four extensions: the first extension has no data, it is a purely meta-data HDU (commonly used to keep meta-data about the whole file, or grouping of extensions, see Fits). The second extension is an image with name IMAGE and single precision floating point type (float32, see Numeric data types), it has 4287 pixels along its first (horizontal) axis and 4286 pixels along its second (vertical) axis. The third extension is also an image with name MASK. It is in 2-byte integer format (int16) which is commonly used to keep information about pixels (for example to identify which ones were saturated, or which ones had cosmic rays and so on), note how it has the same size as the IMAGE extension. The fourth extension is a binary table called CATALOG which has 12371 rows and 5 columns (it probably contains information about the sources in the image).

GNU Astronomy Utilities X.X
Run on Day Month DD HH:MM:SS YYYY
-----
HDU (extension) information: `image.fits'.
Column 1: Index (counting from 0).
Column 2: Name (`EXTNAME' in FITS standard).
Column 3: Image data type or `table' format (ASCII or binary).
Column 4: Size of data in HDU.
-----
0      n/a             uint8           0
1      IMAGE           float32         4287x4286
2      MASK            int16           4287x4286
3      CATALOG         table_binary    12371x5
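
In a script, the rows of this listing can be filtered with standard tools. The following is a minimal sketch, assuming the four-column layout shown above. The listing is fed through a here-document so the example runs without a FITS file, and the image_hdus function is a hypothetical helper, not part of Gnuastro.

```shell
# Hypothetical helper: read a Fits listing on standard input and print
# the index of every HDU that holds image data (not a table, not empty).
image_hdus() {
    # Only rows that start with a number describe an HDU; the rest of the
    # listing (title, column descriptions, separators) is skipped.
    awk '$1 ~ /^[0-9]+$/ && $3 !~ /^table/ && $4 != "0" {print $1}'
}

# Sample listing; in practice this would come from: astfits image.fits
image_hdus <<'EOF'
-----
0      n/a             uint8           0
1      IMAGE           float32         4287x4286
2      MASK            int16           4287x4286
3      CATALOG         table_binary    12371x5
EOF
```

With the sample rows, the function prints 1 and 2 (the IMAGE and MASK extensions), each on its own line.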


If a specific HDU is identified on the command-line with the --hdu (or -h) option and no operation is requested, then the full list of header keywords in that HDU will be printed (as if --printallkeys was called, see below). It is important to remember that this only occurs when --hdu is given on the command-line. The --hdu value given in a configuration file will only be used when a specific operation on keywords is requested. Therefore, as described in the paragraphs above, when no explicit call to the --hdu option is made on the command-line and no operation is requested (on the command-line or configuration files), the basic information of each HDU/extension is printed.

The operating mode and input/output options to Fits are similar to the other programs and fully described in Common options. The options particular to Fits can be divided into two groups: 1) those related to modifying HDUs or extensions (see HDU manipulation), and 2) those related to viewing/modifying meta-data keywords (see Keyword manipulation). These two classes of options cannot be called together in one run: you can either work on the extensions or meta-data keywords in any instance of Fits.


#### 5.1.1.1 HDU manipulation

Each header data unit, or HDU (also known as an extension), in a FITS file is an independent dataset (data + meta-data). Multiple HDUs can be stored in one FITS file, see Fits. The HDU modifying options to the Fits program are listed below.

These options may be called multiple times in one run. If so, the extensions will be copied from the input FITS file to the output FITS file in the given order (on the command-line and also in configuration files, see Configuration file precedence). If the separate classes are called together in one run of Fits, then first --copy is run (on all specified HDUs), followed by --cut (again on all specified HDUs), and then --remove (on all specified HDUs).

The --copy and --cut options need an output FITS file (specified with the --output option). If the output file exists, then the specified HDU will be copied following the last extension of the output file (the existing HDUs in it will be untouched). Thus, after Fits finishes, the copied HDU will be the last HDU of the output file. If no output file name is given, then automatic output will be used to store the HDUs given to this option (see Automatic output).

-n
--numhdus

Print the number of extensions/HDUs in the given file. Note that this option must be called alone and will only print a single number. It is thus useful in scripts, for example when you need to check the number of extensions in a FITS file.

For a complete list of basic meta-data on the extensions in a FITS file, don’t use any of the options in this section or in Keyword manipulation. For more, see Invoking Fits.
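
As a sketch of the scripting use mentioned for --numhdus, the function below loops over every HDU of a file and prints its keywords. It only relies on the --numhdus and -h options described in this section; the print_all_headers name and the "## HDU" separator line are hypothetical choices for this example, and error handling is kept minimal.

```shell
# Hypothetical helper: print the header keywords of every HDU in file $1.
print_all_headers() {
    # --numhdus prints a single number, so it can be captured directly.
    n=$(astfits "$1" --numhdus) || return 1
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "## HDU $i"
        # HDUs are counted from 0, so the last index is n-1.
        astfits "$1" -h"$i" || return 1
        i=$((i + 1))
    done
}
```

For example, print_all_headers image.fits | grep -i naxis would show the size keywords of all extensions in one go.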

--datasum

Calculate and print the given HDU’s "datasum" to stdout. The given HDU is specified with the --hdu (or -h) option. This number is calculated by parsing all the bytes of the given HDU’s data records (excluding keywords). This option ignores any possibly existing DATASUM keyword in the HDU. For more on the datasum feature of the FITS standard, see Keyword manipulation (under the checksum component of --write).

You can use this option to confirm that the data in two different HDUs (possibly with different keywords) is identical. Its advantage over --write=datasum (which writes the DATASUM keyword into the given HDU) is that it doesn’t require write permissions.
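
The comparison described above can be wrapped in a small shell function. This is a minimal sketch under the assumption that --datasum prints only the datasum on standard output, as described for this option; the same_data name is hypothetical.

```shell
# Hypothetical helper: succeed when two HDUs hold identical data.
# Arguments: file1 hdu1 file2 hdu2
same_data() {
    # --datasum only reads the file, so no write permission is needed.
    sum1=$(astfits "$1" -h"$2" --datasum) || return 2
    sum2=$(astfits "$3" -h"$4" --datasum) || return 2
    [ "$sum1" = "$sum2" ]
}
```

A call like same_data a.fits 1 b.fits 1 && echo "identical data" then reports whether the two extensions match, irrespective of their keywords.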

--pixelscale

Print the HDU’s pixel-scale (change in world coordinate for one pixel along each dimension). Without the --quiet option, the output of --pixelscale is more human-friendly by printing the file/HDU name, number of dimensions, and the units of each number along with the actual pixel scales. However, in scripts (that are to be run automatically), this human-friendly format is annoying, so when called with the --quiet option, only the pixel-scale value(s) along each dimension is(are) printed in one line.

-C STR
--copy=STR

Copy the specified extension into the output file, see explanations above.

-k STR
--cut=STR

Cut (copy to output, remove from input) the specified extension into the output file, see explanations above.

-R STR
--remove=STR

Remove the specified HDU from the input file.

The first (zero-th) HDU cannot be removed with this option. Consider using --copy or --cut in combination with --primaryimghdu to not have an empty zero-th HDU. From CFITSIO: “In the case of deleting the primary array (the first HDU in the file) then [it] will be replaced by a null primary array containing the minimum set of required keywords and no data.”. So in practice, any existing data (array) and meta-data in the first extension will be removed, but the number of extensions in the file won’t change. This is because of the unique position the first FITS extension has in the FITS standard (for example it cannot be used to store tables).

--primaryimghdu

Copy or cut an image HDU to the zero-th HDU/extension of a file that doesn’t yet exist. This option is thus irrelevant if the output file already exists or the copied/cut extension is a FITS table. For example with the commands below, first we make sure that out.fits doesn’t exist, then we copy the first extension of in.fits to the zero-th extension of out.fits.

$ rm -f out.fits
$ astfits in.fits --copy=1 --primaryimghdu --output=out.fits


If we hadn’t used --primaryimghdu, then the zero-th extension of out.fits would have no data, and its second extension would host the copied image (just like any other output of Gnuastro).


#### 5.1.1.2 Keyword manipulation

The meta-data in each header data unit, or HDU (also known as extension, see Fits) is stored as “keyword”s. Each keyword consists of a name, value, unit, and comments. The Fits program (see Fits) options related to viewing and manipulating keywords in a FITS HDU are described below.

To see the full list of keywords in a FITS HDU, you can use the --printallkeys option. If any of the keywords are to be modified, the headers of the input file will be changed. If you want to keep the original FITS file or HDU, it is easiest to create a copy first and then run Fits on that. In the FITS standard, keywords are always uppercase. So case does not matter in the input or output keyword names you specify.

Most of the options can accept multiple instances in one command. For example you can delete multiple keywords by calling --delete multiple times. Since repeated keywords are allowed, you can even delete the same keyword multiple times. The action of such options will start from the top most keyword.

The precedence of operations is described below. Note that while the order within each class of actions is preserved, the order of individual actions is not. So irrespective of the order in which you called --delete and --update, all the delete operations take effect first, then the update operations.

1. --delete
2. --rename
3. --update
4. --write
5. --asis
6. --history
7. --comment
8. --date
9. --printallkeys
10. --verify
11. --copykeys

All possible syntax errors will be reported before the keywords are actually written. FITS errors during any of these actions will be reported, but Fits won’t stop until all the operations are complete. If --quitonerror is called, then Fits will immediately stop upon the first error.

If you want to inspect only a certain set of header keywords, it is easiest to pipe the output of the Fits program to GNU Grep. Grep is a very powerful and advanced tool to search strings which is precisely made for such situations. For example if you only want to check the size of an image FITS HDU, you can run:

$ astfits input.fits | grep NAXIS

FITS STANDARD KEYWORDS: Some header keywords are necessary for later operations on a FITS file, for example BITPIX or NAXIS, see the FITS standard for their full list. If you modify (for example remove or rename) such keywords, the FITS file extension might not be usable any more. Also be careful with the world coordinate system keywords: if you modify or change their values, any future world coordinate system (like RA and Dec) measurements on the image will also change.

The keyword related options to the Fits program are fully described below.

-a STR
--asis=STR

Write STR exactly into the FITS file header with no modifications. If it does not conform to the FITS standards, then it might cause trouble, so please be very careful with this option. If you want to define the keyword from scratch, it is best to use the --write option (see below) and let CFITSIO worry about the standards. The best way to use this option is when you want to add a keyword from one FITS file to another unchanged and untouched. Below is an example of such a case that can be very useful sometimes (on the command-line or in scripts):

$ key=$(astfits firstimage.fits | grep KEYWORD)
$ astfits --asis="$key" secondimage.fits

In particular note the double quotation signs (") around the reference to the key shell variable ($key). FITS keywords usually have lots of space characters, if this variable is not quoted, the shell will only give the first word in the full keyword to this option, which will definitely be a non-standard FITS keyword and will make it hard to work on the file afterwards. See the “Quoting” section of the GNU Bash manual for more information if your keyword has the special characters $, `, or \.

-d STR
--delete=STR

Delete one instance of the STR keyword from the FITS header. Multiple instances of --delete can be given (possibly even for the same keyword, when it is repeated in the meta-data).
All keywords given will be removed from the headers in the same given order. If the keyword doesn’t exist, Fits will give a warning and return with a non-zero value, but will not stop. To stop as soon as an error occurs, run with --quitonerror.

-r STR
--rename=STR

Rename a keyword to a new name. STR contains both the existing and new names, which should be separated by either a comma (,) or a space character. Note that if you use a space character, you have to put the value to this option within double quotation marks (") so the space character is not interpreted as an option separator. Multiple instances of --rename can be given in one command. The keywords will be renamed in the specified order. If the keyword doesn’t exist, Fits will give a warning and return with a non-zero value, but will not stop. To stop as soon as an error occurs, run with --quitonerror.

-u STR
--update=STR

Update a keyword, its value, its comments and its units in the format described below. If there are multiple instances of the keyword in the header, they will be changed from top to bottom (with multiple --update options).

The format of the values to this option can best be specified with an example:

--update=KEYWORD,value,"comments for this keyword",unit

If there is a writing error, Fits will give a warning and return with a non-zero value, but will not stop. To stop as soon as an error occurs, run with --quitonerror.

The value can be any numerical or string value. Other than the KEYWORD, all the other values are optional. To leave a given token empty, follow the preceding comma (,) immediately with the next. If any space character is present around the commas, it will be considered part of the respective token. So if more than one token has space characters within it, the safest method to specify a value to this option is to put double quotation marks around each individual token that needs it.
Note that without double quotation marks, space characters will be seen as option separators and can lead to undefined behavior.

-w STR
--write=STR

Write a keyword to the header. For the possible value input formats, comments and units for the keyword, see the --update option above. The special names (first string) below will cause a special behavior:

/

Write a “title” to the list of keywords. A title consists of one blank line and another which is blank for several spaces and starts with a slash (/). The second string given to this option is the “title” or string printed after the slash. For example with the command below you can add a “title” of ‘My keywords’ after the existing keywords and add the subsequent K1 and K2 keywords under it (note that keyword names are not case sensitive).

$ astfits test.fits -h1 --write=/,"My keywords" \
          --write=k1,1.23,"My first keyword"    \
          --write=k2,4.56,"My second keyword"
$ astfits test.fits -h1
[[[ ... truncated ... ]]]

/ My keywords
K1 = 1.23 / My first keyword
K2 = 4.56 / My second keyword
END

Adding a “title” before each contextually separate group of header keywords greatly helps in readability and visual inspection of the keywords. So generally, when you want to add new FITS keywords, it is good practice to also add a title before them.

The reason you need to use / as the keyword name for setting a title is that / is the first non-white character.

The title(s) is(are) written into the FITS with the same order that --write is called. Therefore in one run of the Fits program, you can specify many different titles (with their own keywords under them). For example, the command below builds on the previous example and adds another group of keywords named A1 and A2.

$ astfits test.fits -h1 --write=/,"My keywords"   \
          --write=k1,1.23,"My first keyword"      \
          --write=k2,4.56,"My second keyword"     \
          --write=/,"My second group of keywords" \
          --write=a1,7.89,"First keyword"         \
          --write=a2,0.12,"Second keyword"

checksum

When nothing is given afterwards, the header integrity keywords DATASUM and CHECKSUM will be calculated and written/updated. This calculation and writing is done fully by CFITSIO. They thus comply with the FITS standard 4.0 that defines these keywords (its Appendix J).

If a value is given (e.g., --write=checksum,MyOwnCheckSum), then CFITSIO won’t be called to calculate these two keywords and the value (as well as possible comment and unit) will be written just like any other keyword. This is generally not recommended, but necessary in special circumstances (for example when the checksum needs to be manually updated).

DATASUM only depends on the data section of the HDU/extension, so it is not changed when you update the keywords. But CHECKSUM also depends on the header and will not be valid if you make any further changes to the header. This includes any further keyword modification options in the same call to the Fits program. Therefore it is recommended to write these keywords as the last keywords that are written/modified in the extension. You can use the --verify option (described below) to verify the values of these two keywords.

datasum

Similar to checksum, but only write the DATASUM keyword (that doesn’t depend on the header keywords, only the data).

-H STR
--history STR

Add a HISTORY keyword to the header with the given value. A new HISTORY keyword will be created for every instance of this option. If the string given to this option is longer than 70 characters, it will be separated into multiple keyword cards. If there is an error, Fits will give a warning and return with a non-zero value, but will not stop. To stop as soon as an error occurs, run with --quitonerror.

-c STR
--comment STR

Add a COMMENT keyword to the header with the given value. Similar to the explanation for --history above.

-t
--date

Put the current date and time in the header. If the DATE keyword already exists in the header, it will be updated. If there is a writing error, Fits will give a warning and return with a non-zero value, but will not stop. To stop as soon as an error occurs, run with --quitonerror.

-p
--printallkeys

Print all the keywords in the specified FITS extension (HDU) on the command-line. If this option is called along with any of the other keyword editing options described above, all the editing options take precedence over it. Therefore, it will print the final keywords after all the editing has been done.

-v
--verify

Verify the DATASUM and CHECKSUM data integrity keywords of the FITS standard. See the description under the checksum (under --write, above) for more on these keywords.

This option will print Verified for both keywords if they can be verified. Otherwise, if they don’t exist in the given HDU/extension, it will print NOT-PRESENT, and if they cannot be verified it will print INCORRECT. In the latter case (when the keyword values exist but can’t be verified), the Fits program will also return with a failure.

By default this option will also print a short description of the DATASUM and CHECKSUM keywords. You can suppress this extra information with the --quiet option.

--copykeys=INT:INT

Copy the input’s keyword records in the given range (inclusive) to the output HDU (specified with the --output and --outhdu options, for the filename and HDU/extension respectively).

The string given to this option must be two integers separated by a colon (:). The first integer must be positive (counting of the keyword records starts from 1). The second integer may be negative (zero is not acceptable) or an integer larger than the first.

A negative second integer means counting from the end. So -1 is the last copy-able keyword (not including the END keyword).
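The range rules above can be sketched in a few lines of shell. This is only an illustration of the counting convention, not the Fits program's implementation: the function name `resolve_copykeys_range` is hypothetical, and the number of copy-able records (excluding END) is assumed to be known in advance.

```shell
# Hypothetical helper (not part of Gnuastro): resolve a --copykeys style
# "FIRST:SECOND" string into two 1-based record numbers.
#   $1: the range string, for example 5:-1
#   $2: number of copy-able keyword records (END is not counted)
resolve_copykeys_range() {
    first=${1%%:*}
    second=${1##*:}
    nkeys=$2

    # A negative second value counts from the end: -1 is the last record.
    if [ "$second" -lt 0 ]; then
        second=$((nkeys + 1 + second))
    fi

    # Reject what the text rules out: a non-positive start, a zero end, or
    # an end before the start. Whether FIRST equal to SECOND is accepted is
    # not stated in the text; this sketch allows it.
    if [ "$first" -lt 1 ] || [ "$second" -lt 1 ] \
           || [ "$second" -lt "$first" ]; then
        echo "invalid range" >&2
        return 1
    fi
    echo "$first $second"
}

resolve_copykeys_range 5:-1 40     # prints: 5 40
resolve_copykeys_range 3:10 40     # prints: 3 10
```

With 40 copy-able records, 5:-1 therefore selects records 5 to 40, while 3:10 is used as given.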

To see the header keywords of the input with a number before them, you can pipe the output of the FITS program (when it prints all the keywords in an extension) into the cat program like below:

$ astfits input.fits -h1 | cat -n

--outhdu

The HDU/extension to write the output keywords of --copykeys.

-Q
--quitonerror

Quit if any of the operations above are not successful. By default if an error occurs, Fits will warn the user of the faulty keyword and continue with the rest of actions.

-s STR
--datetosec STR

Interpret the value of the given keyword in the FITS date format (most generally: YYYY-MM-DDThh:mm:ss.ddd...) and return the corresponding Unix epoch time (number of seconds that have passed since 00:00:00 Thursday, January 1st, 1970). The Thh:mm:ss.ddd... section (specifying the time of day), and also the .ddd... (specifying the fraction of a second) are optional. The value to this option must be the FITS keyword name that contains the requested date, for example --datetosec=DATE-OBS.

This option can also interpret the older FITS date format (DD/MM/YYThh:mm:ss.ddd...) where only two characters are given to the year. In this case (following the GNU C Library), this option will make the following assumption: values 69 to 99 correspond to the years 1969 to 1999, and values 0 to 68 to the years 2000 to 2068.

This is a very useful option for operations on the FITS date values, for example sorting FITS files by their dates, or finding the time difference between two FITS files. The advantage of working with the Unix epoch time is that you don’t have to worry about calendar details (for example the number of days in different months, or leap years, etc).

--wcsdistortion STR

If the argument has a WCS distortion, the output (file given with the --output option) will have the distortion given to this option (for example SIP, TPV). With this option, the Fits program will read the minimal set of keywords from the input HDU and the HDU data. It will then write them into the file given to the --output option, but with a newly created set of WCS-related keywords corresponding to the desired distortion standard.

If no --output file is specified, an automatically generated output name will be used which is composed of the input’s name but with the -DDD.fits suffix (see Automatic output), where DDD is the value given to this option (desired output distortion).

Note that all possible conversions between all standards are not yet supported. If the requested conversion is not supported, an informative error message will be printed. If this happens, please let us know and we’ll try our best to add the respective conversions.

For example with the command below, you can be sure that if in.fits has a distortion in its WCS, the distortion of out.fits will be in the SIP standard.

$ astfits in.fits --wcsdistortion=SIP --output=out.fits
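Returning to the two-digit years of the older FITS date format (see --datetosec above): the GNU C Library convention maps 69 to 99 onto 1969 to 1999, and 0 to 68 onto 2000 to 2068. The sketch below only illustrates that mapping; the function name `expand_two_digit_year` is hypothetical and is not how the Fits program itself is written.

```shell
# Hypothetical helper: expand a two-digit year with the convention
# 69-99 -> 1969-1999 and 0-68 -> 2000-2068.
expand_two_digit_year() {
    # Strip one leading zero so values like 08 are not read as octal, and
    # fall back to 0 when nothing is left (input "0").
    yy=${1#0}
    yy=${yy:-0}
    if [ "$yy" -ge 69 ]; then
        echo $((1900 + yy))
    else
        echo $((2000 + yy))
    fi
}

expand_two_digit_year 99    # prints: 1999
expand_two_digit_year 68    # prints: 2068
```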



### 5.2 Sort FITS files by night

FITS images usually contain (several) keywords for preserving important dates. In particular, for lower-level data, this is usually the observation date and time (for example, stored in the DATE-OBS keyword value). When analyzing observed datasets, many calibration steps (like the dark, bias or flat-field) are commonly calculated on a per-observing-night basis.

However, the FITS standard’s date format (YYYY-MM-DDThh:mm:ss.ddd) is based on the western (Gregorian) calendar. Dates that are stored in this format are complicated for automatic processing: a night starts in the final hours of one calendar day, and extends to the early hours of the next calendar day. As a result, to identify datasets from one night, we commonly need to search for two dates. However, calendar peculiarities can make this identification very difficult, for example when an observation is done on the night separating two months (like the night starting on March 31st and going into April 1st), or two years (like the night starting on December 31st, 2018 and going into January 1st, 2019). To account for such situations, it is necessary to keep track of how many days are in a month, leap years, etc.

Gnuastro’s astscript-sort-by-night script is created to help in such important scenarios. It uses Fits to convert the FITS date format into the Unix epoch time (number of seconds since 00:00:00 of January 1st, 1970), using the --datetosec option. The Unix epoch time is a single number (integer, if not given in sub-second precision), enabling easy comparison and sorting of dates after January 1st, 1970.
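To see why a single number makes this easier, here is a minimal sketch of the idea. It is not the script's actual code. It assumes GNU date (for the -d option) as a stand-in for --datetosec, dates given in UTC, and a night that starts at 9:00. The function names `fits_date_to_sec` and `night_index` are hypothetical, and only whole hours are supported here.

```shell
# Convert a FITS-style date (YYYY-MM-DDThh:mm:ss, assumed to be in UTC)
# to Unix epoch seconds. GNU date is assumed.
fits_date_to_sec() {
    date -u -d "$(echo "$1" | tr 'T' ' ')" +%s
}

# Night index: shift the epoch time back by the hour that starts a new
# "night" ($2, 9 when not given), then count whole days. Two exposures
# with the same index belong to the same night.
night_index() {
    sec=$(fits_date_to_sec "$1")
    hour=${2:-9}
    echo $(( (sec - hour * 3600) / 86400 ))
}

# The night from March 31st into April 1st, and the night after it.
a=$(night_index 2019-03-31T22:00:00)
b=$(night_index 2019-04-01T02:00:00)
c=$(night_index 2019-04-01T22:00:00)
echo "$a $b $c"
```

The first two exposures fall on different calendar days (and months) but get the same index, while the third gets the next one. No knowledge of month lengths or leap years is needed.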

You can use this script as a basis for making a much more highly customized sorting script. Here are some examples:

• If you need to copy the files, but only need a single extension (not the whole file), you can add a step just before the making of the symbolic links, or copies, and change it to only copy a certain extension of the FITS file using the Fits program’s --copy option, see HDU manipulation.
• If you need to classify the files with finer detail (for example the purpose of the dataset), you can add a step just before the making of the symbolic links, or copies, to specify a file-name prefix based on certain other keyword values in the files. For example, when the FITS files have a keyword to specify if the dataset is a science, bias, or flat-field image, you can read it and automatically add a sci-, bias-, or flat- to the created file name (after the --prefix).

For example, let’s assume the observing mode is stored in the hypothetical MODE keyword, which can have three values of BIAS-IMAGE, SCIENCE-IMAGE and FLAT-EXP. With the step below, you can generate a mode-prefix, and add it to the generated link/copy names (just correct the filename and extension of the first line to the script’s variables):

modepref=$(astfits infile.fits -h1 \
                   | sed -e"s/'/ /g" \
                   | awk '$1=="MODE"{ \
                       if($3=="BIAS-IMAGE") print "bias-"; \
                       else if($3=="SCIENCE-IMAGE") print "sci-"; \
                       else if($3=="FLAT-EXP") print "flat-"; \
                       else print $3, "NOT recognized"; exit 1}')


Here is a description of it. We first use astfits to print all the keywords in extension 1 of infile.fits. In the FITS standard, string values (that we are assuming here) are placed in single quotes (') which are annoying in this context/use-case. Therefore, we pipe the output of astfits into sed to remove all such quotes (substituting them with a blank space). The result is then piped to AWK for giving us the final mode-prefix: with $1=="MODE", we ask AWK to only consider the line where the first column is MODE. There is an equal sign between the key name and value, so the value is the third column ($3 in AWK). We thus use a simple if-else structure to look into this value and print our custom prefix based on it. The output of AWK is then stored in the modepref shell variable which you can add to the link/copy name.
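To try the sed and AWK steps without a FITS file, the sketch below wraps the same logic in a function that reads header records from standard input. The function name `mode_prefix` and the sample records are hypothetical, and the records only imitate the layout described above (keyword name, equal sign, quoted string value).

```shell
# Same sed/AWK logic as above, reading header text from standard input
# instead of calling astfits.
mode_prefix() {
    sed -e "s/'/ /g" \
        | awk '$1=="MODE"{
                   if($3=="BIAS-IMAGE") print "bias-";
                   else if($3=="SCIENCE-IMAGE") print "sci-";
                   else if($3=="FLAT-EXP") print "flat-";
                   else print $3, "NOT recognized";
                   exit}'      # stop after the first MODE record
}

# Hypothetical header records for testing.
printf "%s\n" \
       "SIMPLE  =                    T / file conforms to FITS standard" \
       "MODE    = 'FLAT-EXP'           / observing mode" \
       "END" \
    | mode_prefix
```

For these records the function prints flat-, which is the value that would be stored in modepref.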

With the solution above, the increment of the file counter for each night will be independent of the mode. If you want the counter to be mode-dependent, you can add a different counter for each mode and use that counter instead of the generic counter for each night (based on the value of modepref). But we’ll leave the implementation of this step to you as an exercise.


#### 5.2.1 Invoking astscript-sort-by-night

This installed script will read a FITS date formatted value from the given keyword, and classify the input FITS files into individual nights. For more on installed scripts, see Installed scripts. This script can be used with the following general template:

$ astscript-sort-by-night [OPTION...] FITS-files

One line examples:

## Use the DATE-OBS keyword
$ astscript-sort-by-night --key=DATE-OBS /path/to/data/*.fits

## Make links to the input files with the 'img-' prefix
$ astscript-sort-by-night --link --prefix=img- /path/to/data/*.fits

This script will look into a HDU/extension (--hdu) for a keyword (--key) in the given FITS files and interpret the value as a date. The inputs will be separated by “night”s (9:00a.m to next day’s 8:59:59a.m, spanning two calendar days; the exact hour can be set with --hour).

The default output is a list of all the input files along with the following two columns: night number and file number in that night (sorted by time). With --link a symbolic link will be made (one for each input) that contains the night number, and number of file in that night (sorted by time), see the description of --link for more. When --copy is used, a copy of each input will be made instead of a symbolic link.

Below you can see one example where all the target-*.fits files in the data directory should be separated by observing night according to the DATE-OBS keyword value in their second extension (number 1, recall that HDU counting starts from 0). You can see the output after the ls command.

$ astscript-sort-by-night -pimg- -h1 -kDATE-OBS data/target-*.fits
$ ls
img-n1-1.fits img-n1-2.fits img-n2-1.fits ...

The outputs can be placed in a different (already existing) directory by including that directory’s name in the --prefix value, for example --prefix=sorted/img- will put them all under the sorted directory.

This script can be configured like all Gnuastro’s programs (through command-line options, see Common options), with some minor differences that are described in Installed scripts. The particular options to this script are listed below:

-h STR
--hdu=STR

The HDU/extension to use in all the given FITS files. All of the given FITS files must have this extension.

-k STR
--key=STR

The keyword name that contains the FITS date format to classify/sort by.

-H FLT
--hour=FLT

The hour that defines the next “night”. By default, all times before 9:00a.m are considered to belong to the previous calendar night. If a sub-hour value is necessary, it should be given in units of hours, for example --hour=9.5 corresponds to 9:30a.m.

Dealing with time zones: The time that is recorded in --key may be in UTC (Universal Time Coordinate). However, the organization of the images taken during the night depends on the local time. It is possible to take this into account by setting the --hour option to the local time in UTC.

For example, consider a set of images taken in Auckland (New Zealand, UTC+12) during different nights. If you want to classify these images by night, you have to know at which time (in UTC time) the Sun rises (or any other separator/definition of a different night). In this particular example, you can use --hour=21, because in Auckland a night finishes (roughly) at the local time of 9:00, which corresponds to 21:00 UTC.

-l
--link

Create a symbolic link for each input FITS file. This option cannot be used with --copy. The link will have a standard name in the following format (variable parts are written in CAPITAL letters and described after it):

PnN-I.fits

P

This is the value given to --prefix. By default, its value is ./ (to store the links in the directory this script was run in). See the description of --prefix for more.

N

This is the night-counter: starting from 1. N is just incremented by 1 for the next night, no matter how many nights (without any dataset) there are between two subsequent observing nights (it's just an identifier for each night which you can easily map to different calendar nights).

I

File counter in that night, sorted by time.

-c
--copy

Make a copy of each input FITS file with the standard naming convention described in --link. With this option, instead of making a link, a copy is made. This option cannot be used with --link.

-p STR
--prefix=STR

Prefix to put before the night-identifier of each newly created link or copy. This option is thus only relevant with the --copy or --link options. See the description of --link for how it's used. For example, with --prefix=img-, all the created file names in the current directory will start with img-, making outputs like img-n1-1.fits or img-n3-42.fits.

--prefix can also be used to store the links/copies in another directory relative to the directory this script is being run (it must already exist). For example --prefix=/path/to/processing/img- will put all the links/copies in the /path/to/processing directory, and the files (in that directory) will all start with img-.

### 5.3 ConvertType

The FITS format used in astronomy was defined mainly for archiving, transmission, and processing. In other situations, the data might be useful in other formats. For example, when you are writing a paper or report, or if you are making slides for a talk, you can’t use a FITS image. Other image formats should be used. In other cases you might want your pixel values in a table format as plain text for input to other programs that don’t recognize FITS. ConvertType is created for such situations.
The various types will increase with future updates and based on need.

The conversion is not only one way (from FITS to other formats), but two ways (except the EPS and PDF formats). So you can also convert a JPEG image or text file into a FITS image. Basically, other than EPS/PDF, you can use any of the recognized formats as different color channel inputs to get any of the recognized outputs. So before explaining the options and arguments (in Invoking ConvertType), we’ll start with a short description of the recognized file types in Recognized file formats, followed by a short introduction to digital color in Color.

#### 5.3.1 Recognized file formats

The various standards and the file name extensions recognized by ConvertType are listed below. Currently Gnuastro uses the file name’s suffix to identify the format.

FITS or IMH

Astronomical data are commonly stored in the FITS format (or the older IRAF .imh data format). A list of file name suffixes which indicate that the file is in this format is given in Arguments.

Each image extension of a FITS file only has one value per pixel/element. Therefore, when used as input, each input FITS image contributes as one color channel. If you want multiple extensions in one FITS file for different color channels, you have to repeat the file name multiple times and use the --hdu, --hdu2, --hdu3 or --hdu4 options to specify the different extensions.

JPEG

The JPEG standard was created by the Joint Photographic Experts Group. It is currently one of the most commonly used image formats. Its major advantage is the compression algorithm that is defined by the standard. Like the FITS standard, this is a raster graphics format, which means that it is pixelated.

A JPEG file can have 1 (for gray-scale), 3 (for RGB) and 4 (for CMYK) color channels.
If you only want to convert one JPEG image into other formats, there is no problem. However, if you want to use it in combination with other input files, make sure that the final number of color channels does not exceed four. If it does, then ConvertType will abort and notify you.

The file name endings that are recognized as a JPEG file for input are: .jpg, .JPG, .jpeg, .JPEG, .jpe, .jif, .jfif and .jfi.

TIFF

TIFF (or Tagged Image File Format) was originally designed as a common format for scanners in the early 90s and since then it has grown to become very general. In many aspects, the TIFF standard is similar to the FITS image standard: it can allow data of many types (see Numeric data types), and also allows multiple images to be stored in a single file (each image in the file is called a ‘directory’ in the TIFF standard). However, unlike FITS, it can only store images, it has no constructs for tables. Another (inconvenient) difference with the FITS standard is that keyword names are stored as numbers, not human-readable text.

However, outside of astronomy, because of its support of different numeric data types, many fields use TIFF images for accurate (for example 16-bit integer or floating point) imaging data.

Currently ConvertType can only read TIFF images; if you are interested in writing TIFF images, please get in touch with us.

EPS

The Encapsulated PostScript (EPS) format is essentially a one page PostScript file which has a specified size. PostScript also includes non-image data, for example lines and texts. It is a fully functional programming language to describe a document. Therefore in ConvertType, EPS is only an output format and cannot be used as input. Contrary to the FITS or JPEG formats, PostScript is not a raster format, but is categorized as vector graphics.

The Portable Document Format (PDF) is currently the most common format for documents. Some believe that PDF has replaced PostScript and that PostScript is now obsolete.
This view is wrong: a PostScript file is an actual plain text file that can be edited like any program source with any text editor. To be able to display its programmed content or print, it needs to pass through a processor or compiler. A PDF file can be thought of as the processed output of the compiler on an input PostScript file. PostScript, EPS and PDF were created and are registered by Adobe Systems.

With these features in mind, you can see that when you are compiling a document with TeX or LaTeX, using an EPS file is much more low level than a JPEG and thus you have much greater control and therefore quality. Since it also includes vector graphic lines, we also use such lines to make a thin border around the image to make its appearance in the document much better. No matter the resolution of the display or printer, these lines will always be clear and not pixelated. In the future, addition of text might be included (for example labels or object IDs) on the EPS output. However, this can be done better with tools within TeX or LaTeX such as PGF/TikZ.

If the final input image (possibly after all operations on the flux explained below) is a binary image or only has two colors of black and white (in segmentation maps for example), then PostScript has another great advantage compared to other formats. It allows for 1 bit pixels (pixels with a value of 0 or 1), which can decrease the output file size by a factor of 8. So if a gray-scale image is binary, ConvertType will exploit this property in the EPS and PDF (see below) outputs.

The standard suffixes for an EPS file are .eps, .EPS, .epsf and .epsi. The EPS outputs of ConvertType have the .eps suffix.

PDF

As explained above, a PDF document is a static document description format; viewing its result is therefore much faster and more efficient than PostScript. To create a PDF output, ConvertType will make a PostScript page description and convert that to PDF using GPL Ghostscript.
The suffixes recognized for a PDF file are: .pdf, .PDF. If GPL Ghostscript cannot be run on the PostScript file, the PostScript file will remain and a warning will be printed.

blank

This is not actually a file type! But it can be used to fill one color channel with a blank value. If this argument is given for any color channel, that channel will not be used in the output.

Plain text

Plain text files have the advantage that they can be viewed with any text editor or on the command-line. Most programs also support input as plain text files. As input, each plain text file is considered to contain one color channel.

In ConvertType, the recognized extensions for plain text files are .txt and .dat. As described in Invoking ConvertType, if you just give these extensions (and not a full filename) as output, then automatic output will be performed to determine the final output name (see Automatic output). Besides these, when the format of a file cannot be recognized from its name, ConvertType will fall back to plain text mode. So you can use any name (even without an extension) for a plain text input or output. Just note that when the suffix is not recognized, automatic output will not be performed.

The basic input/output on plain text images is very similar to how tables are read/written as described in Gnuastro text table format. Simply put, the restrictions are very loose, and there is a convention to define a name, units, data type (see Numeric data types), and comments for the data in a commented line. The only difference is that as a table, a text file can contain many datasets (columns), but as a 2D image, it can only contain one dataset. As a result, only one information comment line is necessary for a 2D image, and instead of the starting ‘# Column N’ (N is the column number), the information line for a 2D image must start with ‘# Image 1’. When ConvertType is asked to output to plain text file, this information comment line is written before the image pixel values.
When converting an image to plain text, consider the fact that if the image is large, the number of columns in each line will become very large, possibly making it very hard to open in some text editors.

Standard output (command-line)

This is very similar to the plain text output, but instead of creating a file to keep the printed values, they are printed on the command line. This can be very useful when you want to redirect the results directly to another program in one command with no intermediate file. The only difference is that only the pixel values are printed (with no information comment line). To print to the standard output, set the output name to ‘stdout’.

#### 5.3.2 Color

Color is defined by mixing various measurements/filters. In digital monitors or common digital cameras, colors are displayed/stored by mixing the three basic colors of red, green and blue (RGB) with various proportions. When printing on paper, standard printers use the cyan, magenta, yellow and key (CMYK, key=black) color space. In other words, for each displayed/printed pixel of a color image, the dataset/image has three or four values.

To store/show the three values for each pixel, cameras and monitors allocate a certain fraction of each pixel’s area to red, green and blue filters. These three filters are thus built into the hardware at the pixel level. However, because measurement accuracy is very important in scientific instruments, and we want to do measurements (take images) with various/custom filters (without having to order a new expensive detector!), scientific detectors use the full area of the pixel to store one value for it in a single/mono channel dataset. To make measurements in different filters, we just place a filter in the light path before the detector. Therefore, the FITS format that is used to store astronomical datasets is inherently a mono-channel format (see Recognized file formats or Fits).
When a subject has been imaged in multiple filters, you can feed each different filter into the red, green and blue channels and obtain a colored visualization. In ConvertType, you can do this by giving each separate single-channel dataset (for example in the FITS image format) as an argument (in the proper order), then asking for the output in a format that supports multi-channel datasets (for example JPEG or PDF, see the examples in Invoking ConvertType).

As discussed above, color is not defined when a dataset/image contains a single value for each pixel. However, we interact with scientific datasets through monitors or printers (which allow multiple values per pixel and produce color with them). As a result, there is a lot of freedom in visualizing a single-channel dataset. The most basic is to use shades of black (because of its strong contrast with white). This scheme is called grayscale. To help in visualization, more complex mappings can be defined. For example, the values can be scaled to a range of 0 to 360 and used as the “Hue” term of the Hue-Saturation-Value (HSV) color space (while fixing the “Saturation” and “Value” terms). In ConvertType, you can use the --colormap option to choose between different mappings of mono-channel inputs, see Invoking ConvertType.

Since grayscale is a commonly used mapping of single-valued datasets, we’ll continue with a closer look at how it is stored. One way to represent a gray-scale image in different color spaces is to use the same proportions of the primary colors in each pixel. This is the common way most FITS image viewers work: for each pixel, they fill all the channels with the single value. While this is necessary for displaying a dataset, there are downsides when storing/saving this type of grayscale visualization (for example in a paper).

• Three (for RGB) or four (for CMYK) values have to be stored for every pixel, this makes the output file very heavy (in terms of bytes).
• If printing, the printing errors of each color channel can make the printed image slightly more blurred than it actually is.

To solve both these problems when storing grayscale visualization, the best way is to save a single-channel dataset into the black channel of the CMYK color space. The JPEG standard is the only common standard that accepts CMYK color space.

The JPEG and EPS standards set two sizes for the number of bits in each channel: 8-bit and 12-bit. The former is by far the most common and is what is used in ConvertType. Therefore, each channel should have values between 0 and 2^8-1=255. From this we see how each pixel in a gray-scale image is one byte (8 bits) long, in an RGB image it is 3 bytes long, and in CMYK it is 4 bytes long. But thanks to the JPEG compression algorithms, when all the pixels of one channel have the same value, that channel is compressed to one pixel. Therefore a grayscale image and a CMYK image that has only the K-channel filled are approximately the same file size.

#### 5.3.3 Invoking ConvertType

ConvertType will convert any recognized input file type to any specified output type. The executable name is astconvertt with the following general template

$ astconvertt [OPTION...] InputFile [InputFile2] ... [InputFile4]


One line examples:

## Convert an image in FITS to PDF:
$ astconvertt image.fits --output=pdf

## Similar to before, but use the Viridis color map:
$ astconvertt image.fits --colormap=viridis --output=pdf

## Convert an image in JPEG to FITS (with multiple extensions
## if it's color):
$ astconvertt image.jpg -oimage.fits

## Use three plain text 2D arrays to create an RGB JPEG output:
$ astconvertt f1.txt f2.txt f3.fits -o.jpg

## Use two images and one blank for an RGB EPS output:
$ astconvertt M31_r.fits M31_g.fits blank -oeps

## Directly pass input from output of another program through Standard
## input (not a file).
$ cat 2darray.txt | astconvertt -oimg.fits


The output’s file format will be interpreted from the value given to the --output option. It can either be given on the command-line or in any of the configuration files (see Configuration files). Note that if the output suffix is not recognized, it will default to plain text format, see Recognized file formats.

At most four input files (one for each color channel for formats that allow it) are allowed in ConvertType. The first input dataset can either be a file or come from Standard input (see Standard input). The order of multiple input files is important. After reading the input file(s) the number of color channels in all the inputs will be used to define which color space to use for the outputs and how each color channel is interpreted.

Some formats can allow more than one color channel (for example in the JPEG format, see Recognized file formats). If there is one input dataset (color channel) the output will be gray-scale. If three input datasets (color channels) are given, they are respectively considered to be the red, green and blue color channels. Finally, if there are four color channels they will be cyan, magenta, yellow and black (CMYK colors).
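The mapping from the number of color channels to the output color space can be summarized as a small lookup. This is only a restatement of the paragraph above in shell form, with a hypothetical function name (`color_space`). Channel counts that the paragraph does not mention are reported as such instead of being guessed.

```shell
# Hypothetical helper: color space implied by the total number of input
# color channels, as described in the text.
color_space() {
    case "$1" in
        1) echo "grayscale" ;;
        3) echo "RGB" ;;
        4) echo "CMYK" ;;
        *) echo "not described in the text" ;;
    esac
}

color_space 1    # prints: grayscale
color_space 3    # prints: RGB
color_space 4    # prints: CMYK
```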

The value to --output (or -o) can be either a full file name or just the suffix of the desired output format. In the former case, it will be used for the output. In the latter case, the name of the output file will be set based on the automatic output guidelines, see Automatic output. Note that the suffix name can optionally start with a . (dot), so for example --output=.jpg and --output=jpg are equivalent. See Recognized file formats.
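The sketch below illustrates the two cases. It is not ConvertType's code. The function name `output_name` is hypothetical, the suffix list is only a subset taken from Recognized file formats, and it assumes that automatic output replaces the input's suffix with the requested one and drops the directory part (see Automatic output for the actual rules).

```shell
# Hypothetical helper: decide the output name from the input name ($1)
# and the value given to --output ($2).
output_name() {
    input=$1
    out=$2
    suffix=${out#.}        # ".jpg" and "jpg" become the same string
    case "$suffix" in
        fits|jpg|jpeg|eps|pdf|txt|dat)
            # Only a suffix was given: build the name from the input.
            base=$(basename "$input")
            echo "${base%.*}.$suffix" ;;
        *)
            # Anything else is taken as a full file name.
            echo "$out" ;;
    esac
}

output_name image.fits .jpg          # prints: image.jpg
output_name image.fits jpg           # prints: image.jpg
output_name image.fits result.pdf    # prints: result.pdf
```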

Besides the common set of options explained in Common options, the options to ConvertType can be classified into input, output and flux related options. The majority of the options are to do with the flux range. Astronomical data usually have a very large dynamic range (difference between maximum and minimum value) and different subjects might be better demonstrated with a limited flux range.

Input:

-h STR/INT
--hdu=STR/INT

In ConvertType, it is possible to call the HDU option multiple times for the different input FITS or TIFF files in the same order that they are called on the command-line. Note that in the TIFF standard, one ‘directory’ (similar to a FITS HDU) may contain multiple color channels (for example when the image is in RGB).

Except for the fact that multiple calls are possible, this option is identical to the common --hdu in Input/Output options. The number of calls to this option cannot be less than the number of input FITS or TIFF files, but if there are more, the extra HDUs will be ignored. Note that they will be read in the order described in Configuration file precedence.

Unlike CFITSIO, libtiff (which is used to read TIFF files) only recognizes numbers (counting from zero, similar to CFITSIO) for ‘directory’ identification. Hence the concept of names is not defined for the directories and the values to this option for TIFF files must be numbers.

Output:

-w FLT
--widthincm=FLT

The width of the output in centimeters. This is only relevant for those formats that accept such a width (not plain text for example). For most digital purposes, the number of pixels is far more important than the value to this parameter because you can adjust the absolute width (in inches or centimeters) in your document preparation program.

-b INT
--borderwidth=INT

The width of the border to be put around the EPS and PDF outputs in units of PostScript points. There are 72 or 28.35 PostScript points in an inch or centimeter respectively. In other words, there are roughly 3 PostScript points in every millimeter. If you are planning on adding a border, its significance is highly correlated with the value you give to the --widthincm parameter.

Unfortunately in the document structuring convention of the PostScript language, the “bounding box” has to be in units of PostScript points with no fractions allowed. So the border values can only be specified as integers. To have a final border that is thinner than one PostScript point in your document, you can ask for a larger width in ConvertType and then scale down the output EPS or PDF file in your document preparation program, for example by setting width in your includegraphics command in TeX or LaTeX. Since it is vector graphics, the changes of size have no effect on the quality of your output (pixels don’t get different values).
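As a rough illustration of why the visual weight of a border depends on --widthincm, the unit arithmetic can be sketched in Python as below. The function names are hypothetical (they are not part of Gnuastro); only the figure of 72 points per inch comes from the text above.

```python
POINTS_PER_INCH = 72.0   # PostScript points in one inch
CM_PER_INCH = 2.54       # centimeters in one inch


def cm_to_points(width_cm):
    """Convert a width in centimeters to PostScript points."""
    return width_cm * POINTS_PER_INCH / CM_PER_INCH


def border_fraction(border_pt, width_cm):
    """Width of one border edge relative to the image width."""
    return border_pt / cm_to_points(width_cm)


# One centimeter is about 28.35 points.
print(round(cm_to_points(1.0), 2))

# The same 1-point border is relatively four times thinner on a
# 20 cm wide image than on a 5 cm wide one.
print(border_fraction(1, 5.0) / border_fraction(1, 20.0))
```

So an integer border of one point can be made to look thinner simply by asking for a wider output and scaling it down later.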

-x
--hex

Use Hexadecimal encoding in creating EPS output. By default the ASCII85 encoding is used which provides a much better compression ratio. When converted to PDF (or included in TeX or LaTeX which is finally saved as a PDF file), an efficient binary encoding is used which is far more efficient than both of them. The choice of EPS encoding will thus have no effect on the final PDF.

So if you want to transfer your EPS files (for example if you want to submit your paper to arXiv or journals in PostScript), their storage might become important if you have large images or lots of small ones. By default ASCII85 encoding is used which offers a much better compression ratio (nearly 40 percent) compared to Hexadecimal encoding.
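The size difference between the two text encodings can be checked with Python's standard library, as in the sketch below. It only compares the encoded lengths of some arbitrary sample bytes; it does not reproduce the exact EPS stream that ConvertType writes.

```python
import base64
import binascii

# 1024 bytes of sample "pixel" data (no all-zero 4-byte groups, which
# ASCII85 would abbreviate further).
data = bytes(range(256)) * 4

hex_size = len(binascii.hexlify(data))    # 2 characters per byte
a85_size = len(base64.a85encode(data))    # 5 characters per 4 bytes

saving = 1.0 - a85_size / hex_size
print(hex_size, a85_size, saving)         # prints: 2048 1280 0.375
```

Hexadecimal needs 8 characters for every 4 bytes while ASCII85 needs 5, which is where the saving of nearly 40 percent comes from.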

-u INT
--quality=INT

The quality (compression) of the output JPEG file with values from 0 to 100 (inclusive). For other formats the value to this option is ignored. Note that only in gray-scale (when one input color channel is given) will this actually be the exact quality (each pixel will correspond to one input value). If it is in color mode, some degradation will occur. While the JPEG standard does support loss-less graphics, it is not commonly supported.

--colormap=STR[,FLT,...]

The color map to visualize a single channel. The first value given to this option is the name of the color map, which is shown below. Some color maps can be configured. In this case, the configuration parameters are optionally given as numbers following the name of the color map (for an example, see hsv). The table below contains the usable names of the color maps that are currently supported:

gray
grey

Grayscale color map. This color map doesn’t have any parameters. The full dataset range will be scaled to the range 0 to $$2^8-1=255$$ to be stored in the requested format.
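A minimal sketch of such a scaling is shown below. It assumes a simple linear mapping from the dataset's minimum and maximum to 0 and 255, with the fractional part dropped; the text above does not specify these details and the function name is hypothetical.

```python
def scale_to_bytes(values, maxbyte=255):
    """Linearly map values from [min, max] to integers in [0, maxbyte]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a flat dataset has no contrast, so map it all to 0.
        return [0 for _ in values]
    return [int((v - lo) / (hi - lo) * maxbyte) for v in values]


print(scale_to_bytes([-2.0, 0.0, 2.0]))   # prints: [0, 127, 255]
```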

hsv

Hue, Saturation, Value color map. If no values are given after the name (--colormap=hsv), the dataset will be scaled to the range 0 to 360 for hue, covering the full spectrum of colors. However, you can limit the range of hue (to show only a special color range) by explicitly requesting them after the name (for example --colormap=hsv,20,240).

The mapping of a single-channel dataset to HSV is done through the Hue and Value elements: Lower dataset elements have lower “value” and lower “hue”. This creates darker colors for fainter parts, while also respecting the range of colors.
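The idea can be sketched with Python's standard colorsys module as below. This is only an illustration under stated assumptions: the saturation is fixed at 1 and both hue and value follow a linear scaling of the data, neither of which is specified above. The function name is hypothetical.

```python
import colorsys


def hsv_colormap(values, hue_min=0.0, hue_max=360.0):
    """Map each value to an 8-bit RGB triplet through hue and value."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0              # avoid dividing by zero
    pixels = []
    for v in values:
        frac = (v - lo) / span           # 0 for the faintest, 1 for the brightest
        hue = hue_min + frac * (hue_max - hue_min)       # in degrees
        # Assumption: saturation is always 1; "value" grows with the data.
        r, g, b = colorsys.hsv_to_rgb(hue / 360.0, 1.0, frac)
        pixels.append((round(r * 255), round(g * 255), round(b * 255)))
    return pixels


# Similar in spirit to --colormap=hsv,20,240 on a three-pixel dataset.
print(hsv_colormap([0.0, 1.0, 2.0], 20, 240))
```

The faintest pixel comes out black (its “value” is zero) and the brightest pixel takes the hue at the upper limit, here blue at 240 degrees.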

viridis

Viridis is the default colormap of the popular Matplotlib module of Python and available in many other visualization tools like PGFPlots.

sls

The SLS color range, taken from the commonly used SAO DS9. The advantage of this color range is that it starts with black, going into dark blue and finishes with the brighter colors of red and white. So unlike the HSV color range, it includes black and white and brighter colors (like yellow, red) show the larger values.

sls-inverse

The inverse of the SLS color map (see above), where the lowest value corresponds to white and the highest value is black. While SLS is good for visualizing on the monitor, SLS-inverse is good for printing.

--rgbtohsv

When there are three input channels and the output is in the FITS format, interpret the three input channels as red, green and blue channels (RGB) and convert them to the hue, saturation, value (HSV) color space.

The currently supported output formats of ConvertType don’t have native support for HSV. Therefore this option is only supported when the output is in FITS format and each of the hue, saturation and value arrays can be saved as one FITS extension in the output for further analysis (for example to select a certain color).
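To illustrate the kind of further analysis this enables, the sketch below converts three small channels with Python's colorsys module and then selects pixels by hue. The 8-bit input range, the hue in degrees and the function name are assumptions made for this example, not a description of ConvertType's internals.

```python
import colorsys


def rgb_to_hsv_channels(red, green, blue, max_value=255.0):
    """Convert three equal-length channel lists to hue, saturation, value lists."""
    hue, sat, val = [], [], []
    for r, g, b in zip(red, green, blue):
        h, s, v = colorsys.rgb_to_hsv(r / max_value, g / max_value, b / max_value)
        hue.append(h * 360.0)    # assumption: hue expressed in degrees
        sat.append(s)
        val.append(v)
    return hue, sat, val


# Three pixels: pure red, pure blue and black.
hue, sat, val = rgb_to_hsv_channels([255, 0, 0], [0, 0, 0], [0, 255, 0])

# Select the bluish pixels from the hue channel alone.
bluish = [200.0 <= h <= 280.0 for h in hue]
print(bluish)                    # prints: [False, True, False]
```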

Flux range:

-c STR
--change=STR

Change pixel values with the following format "from1:to1, from2:to2,...". This option is very useful in displaying labeled pixels (not actual data images which have noise) like segmentation maps. In labeled images, usually a group of pixels have a fixed integer value. With this option, you can manipulate the labels before the image is displayed to get a better output for print or to emphasize a particular set of labels and ignore the rest. The labels in the images will be changed in the same order given. By default, the pixel values will first be converted and then truncated (see --fluxlow and --fluxhigh).

You can use any number for the values irrespective of your final output: your given values are stored and used in the double precision floating point format. So for example, if your input image has labels from 1 to 20000 and you only want to display those with labels 957 and 11342, then you can run ConvertType with these options:

$ astconvertt --change=957:50000,11342:50001 --fluxlow=5e4 \
              --fluxhigh=1e5 segmentationmap.fits --output=jpg

While the output JPEG format is only 8 bit, this operation is done in an intermediate step which is stored in double precision floating point. The pixel values are converted to 8-bit after all operations on the input fluxes have been completed. By placing the value in double quotes you can use as many spaces as you like for better readability.

-C
--changeaftertrunc

Change pixel values (with --change) after truncation of the flux values. By default it is the opposite.

-L FLT
--fluxlow=FLT

The minimum flux (pixel value) to display in the output image. Any pixel value below this value will be set to this value in the output. If the value to this option is the same as --fluxhigh, then no flux truncation will be applied. Note that when multiple channels are given, this value is used for all the color channels.

-H FLT
--fluxhigh=FLT

The maximum flux (pixel value) to display in the output image, see --fluxlow.

-m INT
--maxbyte=INT

This is only used for the JPEG and EPS output formats which have an 8-bit space for each channel of each pixel. The maximum value in each pixel can therefore be $$2^8-1=255$$. With this option you can change (decrease) the maximum value. By doing so you will decrease the dynamic range. It can be useful if you plan to use those values for other purposes.

-A INT
--forcemin=INT

Enforce the value of --fluxlow (when it is given), even if it is smaller than the minimum of the dataset and the output is a format supporting color. This is particularly useful when you are converting a number of images to a common image format like JPEG or PDF with a single command and want them all to have the same range of colors, independent of the contents of the dataset. Note that if the minimum value is smaller than --fluxlow, then this option is redundant.

By default, when the dataset only has two values, and the output format is PDF or EPS, ConvertType will use the PostScript optimization that allows setting the pixel values per bit, not byte (see Recognized file formats). This can greatly help reduce the file size. However, when --fluxlow or --fluxhigh are called, this optimization is disabled: even though there are only two values (the dataset is binary), the difference between them does not correspond to the full contrast of black and white.

-B INT
--forcemax=INT

Similar to --forcemin, but for the maximum.

-i
--invert

For 8-bit output types (JPEG, EPS, and PDF for example) the final value that is stored is inverted so white becomes black and vice versa. The reason for this is that astronomical images usually have a very large area of blank sky in them. The result will be that a large area of the image will be black. Note that this behavior is ideal for gray-scale images; if you want a color image, the colors are going to be mixed up.

### 5.4 Table

Tables are the products of processing astronomical images and spectra. For example in Gnuastro, MakeCatalog will process the defined pixels over an object and produce a catalog (see MakeCatalog). For each identified object, MakeCatalog can print its position on the image or sky, its total brightness and much other information that is deducible from the given image. Each one of these properties is a column in its output catalog (or table) and for each input object, we have a row.

When there are only a small number of objects (rows) and not too many properties (columns), then a simple plain text file is mainly enough to store, transfer, or even use the produced data. However, to be more efficient in all these aspects, astronomers have defined the FITS binary table standard to store data in a binary (0 and 1) format, not plain text. This can offer major advantages in all those aspects: the file size will be greatly reduced and the reading and writing will be faster (because the RAM and CPU also work in binary). The FITS standard also defines a standard for ASCII tables, where the data are stored in the human readable ASCII format, but within the FITS file structure. These are mainly useful for keeping ASCII data along with images and possibly binary data as multiple (conceptually related) extensions within a FITS file. The acceptable table formats are fully described in Tables.

Binary tables are not easily readable by human eyes. There is no fixed/unified standard on how the zeros and ones should be interpreted. The Unix-like operating systems have flourished because of a simple fact: communication between the various tools is based on human readable characters. So while the FITS table standards are very beneficial for the tools that recognize them, they are hard to use in the vast majority of available software. This creates limitations for their generic use.

‘Table’ is Gnuastro’s solution to this problem. With Table, FITS tables (ASCII or binary) are directly accessible to the power-users of Unix-like operating systems (those working on the command-line or shell, see Command-line interface). With Table, a FITS table (in binary or ASCII formats) is only one command away from AWK (or any other tool you want to use), just like a plain text file that you read with the cat command. You can pipe the output of Table into any other tool for higher-level processing, see Invoking Table for some simple examples.

#### 5.4.1 Column arithmetic

After reading the requested columns from the input table, you can also do operations/arithmetic on the columns and save the resulting values as new column(s) in the output table (possibly in between other requested columns). To enable column arithmetic, the first 6 characters of the value to --column (-c) should be the arithmetic activation word ‘arith ’ (note the space character in the end, after ‘arith’). After the activation word, you can use the reverse polish notation to identify the operators and their operands, see Reverse polish notation. Just note that white-space characters are used between the tokens of the arithmetic expression and that they are meaningful to the command-line environment. Therefore the whole expression (including the activation word) has to be quoted on the command-line or in a shell script (see the examples below).

To identify a column you can directly use its name, or specify its number (counting from one, see Selecting table columns). When you are giving a column number, it is necessary to prefix the number with a $, similar to AWK. Otherwise the number is not distinguishable from a constant number to use in the arithmetic operation.

For example with the command below, the first two columns of table.fits will be printed along with a third column that is the result of multiplying the first column with $$10^{10}$$ (for example to convert wavelength from meters to Angstroms). Note that without the ‘$’, it is not possible to distinguish between “1” as a column-counter, or as a constant number to use in the arithmetic operation. Also note that because of the significance of $ for the command-line environment, the single-quotes are used here (as in an AWK expression), not double-quotes.

$ asttable table.fits -c1,2 -c'arith $1 1e10 x'

Single quotes when string contains $: On the command-line, or in shell-scripts, $ is used to expand variables. For example, echo $PATH prints the value (a string of characters) in the variable PATH; it will not simply print $PATH. This operation is also permitted within double quotes, so echo "$PATH" will produce the same output. This is good when printing values, for example in the command below, $PATH will expand to the value within it.

$ echo "My path is: $PATH"

If you actually want to return the literal string $PATH, not the value in the PATH variable (like the scenario here in column arithmetic), you should put it in single quotes like below. The printed value here will include the $, please try it to see for yourself and compare to above.

$ echo 'My path is: $PATH'

Therefore, when your column arithmetic involves the $ sign (to specify columns by number), quote your ‘arith ’ string with single quotation marks. Otherwise you can use either single or double quotes.

Alternatively, if the columns have meta-data and the first two are respectively called AWAV and SPECTRUM, the asttable command above is equivalent to the command below. Note that the character ‘$’ is no longer necessary in this scenario (because names will not be confused with numbers):

$ asttable table.fits -cAWAV,SPECTRUM -c'arith AWAV 1e10 x'

Comparison of the two commands above clearly shows why it is recommended to use column names instead of numbers. When the columns have descriptive names, the command/script actually becomes much more readable, describing the intent of the operation. It is also independent of the low-level table structure: for the second command, the position of the AWAV and SPECTRUM columns in table.fits is irrelevant.

By nature, column arithmetic changes the values of the data within the column. So the old column meta-data can’t be used any more. By default the new column created for the arithmetic operation will be given generic metadata (for example its name will be ARITH_1, which is hardly useful!). But meta-data are critically important and it is good practice to always have short, but descriptive, names for each column, units and also some comments for more explanation. To add metadata to a column, you can use the --colmetadata option that is described in Invoking Table.

Finally, since the arithmetic expressions are a value to --column, they don’t necessarily have to be in a separate option, so the commands above are also identical to the command below (note that this only has one -c option). Just be very careful with the quoting!

$ asttable table.fits -cAWAV,SPECTRUM,'arith AWAV 1e10 x'
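To make the evaluation order of such expressions concrete, the sketch below evaluates a reverse polish expression over the columns of a small in-memory table. It is a toy illustration, not Table's implementation: the function name is hypothetical, only four basic operators are handled, and the ‘arith ’ activation word is assumed to have been removed already.

```python
def column_arith(expr, table, names):
    """Evaluate a reverse polish column expression on a list of rows."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "x": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack = []
    for tok in expr.split():
        if tok in ops:
            b = stack.pop()              # first popped: right-hand operand
            a = stack.pop()              # second popped: left-hand operand
            stack.append([ops[tok](x, y) for x, y in zip(a, b)])
        elif tok.startswith("$"):
            col = int(tok[1:]) - 1       # column numbers count from one
            stack.append([row[col] for row in table])
        elif tok in names:
            col = names.index(tok)       # column given by name
            stack.append([row[col] for row in table])
        else:
            # A bare number is a constant, repeated for every row.
            stack.append([float(tok)] * len(table))
    return stack.pop()


names = ["AWAV", "SPECTRUM"]
table = [[5e-7, 1.0], [6e-7, 2.0]]

# Referring to the first column by number or by name gives the same result.
by_number = column_arith("$1 1e10 x", table, names)
by_name = column_arith("AWAV 1e10 x", table, names)
print(by_number == by_name)              # prints: True
```

Note how a bare 1 would be pushed as a constant while $1 selects the first column, which is the ambiguity that the $ prefix resolves.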


Almost all the arithmetic operators of Arithmetic operators are also supported for column arithmetic in Table. In particular, the few that are not present in the Gnuastro library aren’t yet supported. For a list of the Gnuastro library arithmetic operators, please see the macros starting with GAL_ARITHMETIC_OP and ending with the operator name in Arithmetic on datasets (arithmetic.h). Besides the operators in Arithmetic operators, several operators are only available in Table to use on table columns.

wcstoimg

Convert the given WCS positions to image/dataset coordinates based on the number of dimensions in the WCS structure of --wcshdu extension/HDU in --wcsfile. It will output the same number of columns. The first popped operand is the last FITS dimension.

For example the two commands below (which have the same output) will produce 5 columns. The first three columns are the input table’s ID, RA and Dec columns. The fourth and fifth columns will be the pixel positions in image.fits that correspond to each RA and Dec.

$ asttable table.fits -cID,RA,DEC,'arith RA DEC wcstoimg' \
           --wcsfile=image.fits

$ asttable table.fits -cID,RA -cDEC \
           -c'arith RA DEC wcstoimg' --wcsfile=image.fits

imgtowcs

Similar to wcstoimg, except that image/dataset coordinates are converted to WCS coordinates.

distance-flat

Return the distance between two points assuming they are on a flat surface. Note that each point needs two coordinates, so this operator needs four operands (currently it only works for 2D spaces). The first and second popped operands are considered to belong to one point and the third and fourth popped operands to the second point.

Each of the input points can be a single coordinate or a full table column (containing many points). In other words, the following commands are all valid:

$ asttable table.fits \
           -c'arith X1 Y1 X2 Y2 distance-flat'

$ asttable table.fits \
           -c'arith X Y 12.345 6.789 distance-flat'

$ asttable table.fits \
           -c'arith 12.345 6.789 X Y distance-flat'

In the first case we are assuming that table.fits has the following four columns: X1, Y1, X2, Y2. The column returned by this operator will be the distance between the two points in each row, with coordinates (X1, Y1) and (X2, Y2). In other words, for each row, the distance between different points is calculated. In the second and third cases (which are identical), it is assumed that table.fits has the two columns X and Y. The column returned by this operator will be the distance of each row from the fixed point at (12.345, 6.789).

distance-on-sphere

Return the spherical angular distance (along a great circle, in degrees) between the given two points. Note that each point needs two coordinates (in degrees), so this operator needs four operands. The first and second popped operands are considered to belong to one point and the third and fourth popped operands to the second point.

Each of the input points can be a single coordinate or a full table column (containing many points). In other words, the following commands are all valid:

$ asttable table.fits \
           -c'arith RA1 DEC1 RA2 DEC2 distance-on-sphere'

$ asttable table.fits \
           -c'arith RA DEC 9.876 5.432 distance-on-sphere'

$ asttable table.fits \
           -c'arith 9.876 5.432 RA DEC distance-on-sphere'

In the first case we are assuming that table.fits has the following four columns: RA1, DEC1, RA2, DEC2. The column returned by this operator will be the angular distance between the two points in each row, with coordinates (RA1, DEC1) and (RA2, DEC2). In other words, for each row, the angular distance between different points is calculated. In the second and third cases (which are identical), it is assumed that table.fits has the two columns RA and DEC. The column returned by this operator will be the angular distance of each row from the fixed point at (9.876, 5.432).

The distance $$d$$ (along a great circle) on a sphere between two points is calculated with the equation below, where $$r_1$$ and $$r_2$$ are the right ascensions of points 1 and 2, and $$d_1$$ and $$d_2$$ are their declinations.

$$\cos(d)=\sin(d_1)\sin(d_2)+\cos(d_1)\cos(d_2)\cos(r_1-r_2)$$
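The equation above translates directly into a few lines of Python, as in the sketch below. The function name is hypothetical, and the clamping of the cosine is an extra precaution added for this example (rounding can push it marginally outside the valid range of the arc-cosine).

```python
import math


def distance_on_sphere(r1, d1, r2, d2):
    """Great-circle distance in degrees; all four inputs are in degrees."""
    r1, d1, r2, d2 = (math.radians(a) for a in (r1, d1, r2, d2))
    cos_d = (math.sin(d1) * math.sin(d2)
             + math.cos(d1) * math.cos(d2) * math.cos(r1 - r2))
    cos_d = max(-1.0, min(1.0, cos_d))   # guard against rounding
    return math.degrees(math.acos(cos_d))


# Two points on the equator, a quarter of a circle apart.
print(distance_on_sphere(0.0, 0.0, 90.0, 0.0))

# The two poles are half a circle apart, whatever their right ascensions.
print(distance_on_sphere(10.0, -90.0, 200.0, 90.0))
```

Keep in mind that the arc-cosine form loses precision for very small separations, so results for nearly coincident points should be read with that in mind.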

ra-to-degree

Convert the hour-wise Right Ascension (RA) string, in the format of HH:MM:SS, to degrees. Note that the input column has to be in a string format. In FITS tables, string columns are well-defined. For plain-text tables, please follow the standards defined in Gnuastro text table format, otherwise the string column won’t be read.
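The conversion itself is simple arithmetic: 24 hours cover 360 degrees, so one hour of right ascension is 15 degrees. A minimal sketch (with a hypothetical function name, and assuming a well-formed HH:MM:SS string) is:

```python
def ra_to_degree(ra):
    """Convert an 'HH:MM:SS' right ascension string to degrees."""
    hours, minutes, seconds = ra.strip().split(":")
    total_hours = float(hours) + float(minutes) / 60.0 + float(seconds) / 3600.0
    return total_hours * 15.0            # 360 degrees / 24 hours


print(ra_to_degree("12:00:00"))          # prints: 180.0
print(ra_to_degree("06:30:36"))
```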

$ asttable catalog.fits -c'arith RA ra-to-degree'
$ asttable catalog.fits -c'arith $5 ra-to-degree'

dec-to-degree

Convert the Declination (Dec) string, in the format of DD:MM:SS, to degrees (a single floating point number). For more details please see the ra-to-degree operator.

degree-to-ra

Convert degrees (a column with a single floating point number) to the Right Ascension, RA, string (in the format of HH:MM:SS). The output will be a string column so no further mathematical operations can be done on it. The output can be in any format (for example FITS or plain-text). If it is plain-text, the string column will be written following the standards described in Gnuastro text table format.

degree-to-dec

Convert degrees (a column with a single floating point number) to the Declination, Dec, string (in the format of DD:MM:SS). See the degree-to-ra operator for more on the format of the output.

#### 5.4.2 Invoking Table

Table will read/write, select, convert, or show the information of the columns in FITS ASCII table, FITS binary table and plain text table files, see Tables. Output columns can also be determined by number or regular expression matching of column names, units, or comments. The executable name is asttable with the following general template

$ asttable [OPTION...] InputFile


One line examples:

## Get the table column information (name, data type, or units):
$ asttable bintab.fits --information

## Print columns named RA and DEC, followed by all the columns where
## the name starts with "MAG_":
$ asttable bintab.fits --column=RA --column=DEC --column=/^MAG_/

## Similar to the above, but with one call to '--column' (or '-c'),
## also sort the rows by the input's photometric redshift ('Z_PHOT')
## column. To confirm the sort, you can add 'Z_PHOT' to the columns
## to print.
$ asttable bintab.fits -cRA,DEC,/^MAG_/ --sort=Z_PHOT

## Similar to the above, but only print rows that have a photometric
## redshift between 2 and 3.
$ asttable bintab.fits -cRA,DEC,/^MAG_/ --range=Z_PHOT,2:3

## Only print rows with a value in the 10th column above 100000:
$ asttable bintab.fits --range=10,1e5:inf

## Only print the 2nd column, and the third column multiplied by 5,
## Save the resulting two columns in 'table.txt'
$ asttable bintab.fits -c2,'arith $3 5 x' -otable.txt

## Sort the output rows by the third column, save output:
$ asttable bintab.fits --sort=3 -ooutput.txt

## Subtract the first column from the second in 'cat.txt' (can also
## be a FITS table) and keep the third and fourth columns.
$ asttable cat.txt -c'arith $2 $1 -',3,4 -ocat.fits

Table’s input dataset can be given either as a file or from Standard input (see Standard input). In the absence of selected columns, all the input’s columns and rows will be written to the output. If any output file is explicitly requested (with --output) the output table will be written in it. When no output file is explicitly requested the output table will be written to the standard output. If the specified output is a FITS file, the type of FITS table (binary or ASCII) will be determined from the --tabletype option. If the output is not a FITS file, it will be printed as a plain text table (with space characters between the columns). When the columns are accompanied by meta-data (like column name, units, or comments), this information will also be printed in the plain text file before the table, as described in Gnuastro text table format.

For the full list of options common to all Gnuastro programs please see Common options. Options can also be stored in directory, user or system-wide configuration files to avoid repeating on the command-line, see Configuration files. Table does not follow Automatic output that is common in most Gnuastro programs, see Automatic output. Thus, in the absence of an output file, the selected columns will be printed on the command-line with no column information, ready for redirecting to other tools like AWK or sort, similar to the examples above.

-i
--information

Only print the column information in the specified table on the command-line and exit. Each column’s information (number, name, units, data type, and comments) will be printed as a row on the command-line. Note that the FITS standard only requires the data type (see Numeric data types), and in plain text tables, no meta-data/information is mandatory. Gnuastro has its own convention in the comments of a plain text table to store and transfer this information as described in Gnuastro text table format.

This option will take precedence over the --column option, so when it is called along with requested columns, the latter will be ignored. This can be useful if you forget the identifier of a column after you have already typed some on the command-line. You can simply add a -i and run Table to see the whole list and remember. Then you can use the shell history (with the up arrow key on the keyboard), retrieve the last command with all the previously typed columns present, delete -i and add the identifier you had forgotten.

-c STR/INT
--column=STR/INT

Set the output columns either by specifying the column number, or name. For more on selecting columns, see Selecting table columns. If a value of this option starts with ‘arith ’, this option will do the requested operations/arithmetic on the specified columns and output the result in that place (among other requested columns). For more on column arithmetic see Column arithmetic.

To ask for multiple columns this option can be used in two ways: 1) multiple calls to this option, 2) using a comma between each column specifier in one call to this option. These different solutions may be mixed in one call to Table: for example, -cRA,DEC -cMAG and -cRA -cDEC -cMAG are both equivalent to -cRA,DEC,MAG. The order of the output columns will be the same order given to the option or in the configuration files (see Configuration file precedence).

This option is not mandatory: if no specific columns are requested, all the input table columns are output. When this option is called multiple times, it is possible to output one column more than once.

-w STR
--wcsfile=STR

FITS file that contains the WCS to be used in the wcstoimg and imgtowcs operators of --column (see above). The extension name/number within the FITS file can be specified with --wcshdu.

If the value to this option is none, no WCS will be written in the output.

-W STR
--wcshdu=STR

FITS extension/HDU that contains the WCS to be used in the wcstoimg and imgtowcs operators of --column (see above). The FITS file name can be specified with --wcsfile.

-L STR
--catcolumnfile=STR

Concatenate (or add, or append) the columns of this option’s value (a filename) to the output columns. This option may be called multiple times (to add columns from more than one file into the final output); the columns from each file will be added in the same order that this option is called.

By default all the columns of the given file will be appended. If you only want certain columns to be appended, use the --catcolumns option to specify their name or number (see Selecting table columns). Note that the columns given to --catcolumns must be present in all the given files (if this option is called more than once).

The concatenation is done after any column selection (for example with --column) or row selection (for example with --range) is applied to the main input table given to Table. The number of rows in the file(s) given to this option has to be the same as the final output table if this option wasn’t given. If the file given to this option is a FITS file, it is necessary to also define the corresponding HDU/extension with --catcolumnhdu. Also note that no operation (for example row selection or arithmetic) is applied to the table given to this option.

If the appended columns have a name, the column names of each file will be appended with a -N, where N is a counter starting from 1 for each appended file. This is done because when concatenating columns from multiple tables (more than two) into one, they may have the same name, and it is not good practice to have multiple columns with the same name. You can disable this feature with --catcolumnrawname. To have full control over the concatenated column names, you can use the --colmetadata option described below.

For example, let’s assume you have two catalogs of the same objects (same number of rows) in different filters, such that f160w-cat.fits has a MAGNITUDE column that has the magnitude of each object in the F160W filter and similarly f105w-cat.fits also has a MAGNITUDE column, but for the F105W filter. You can use column concatenation like below to import the MAGNITUDE column from the F105W catalog into the F160W catalog, while giving each magnitude column a different name:

asttable f160w-cat.fits --output=both.fits \
         --catcolumnfile=f105w-cat.fits --catcolumns=MAGNITUDE \
         --colmetadata=MAGNITUDE,MAG-F160W,log,"Magnitude in F160W" \
         --colmetadata=MAGNITUDE-1,MAG-F105W,log,"Magnitude in F105W"

-u STR/INT
--catcolumnhdu=STR/INT

The HDU/extension of the FITS file(s) that should be concatenated, or appended, with --catcolumnfile. If --catcolumnfile is called more than once with more than one FITS file, it is necessary to call this option more than once. The HDUs will be loaded in the same order as the FITS files given to --catcolumnfile.

-C STR/INT
--catcolumns=STR/INT

The column(s) in the file(s) given to --catcolumnfile to append. When this option is not given, all the columns will be concatenated. See --catcolumnfile for more.

--catcolumnrawname

Don’t modify the names of the concatenated (appended) columns, see description in --catcolumnfile.

-O
--colinfoinstdout

Add column metadata when the output is printed in the standard output. Usually the standard output is used for a fast visual check or to pipe into another program for further processing. So by default meta-data aren’t included.

-r STR,FLT:FLT
--range=STR,FLT:FLT

Only output rows that have a value within the given range in the STR column (can be a name or counter). Note that the range is only inclusive in the lower-limit. For example with --range=sn,5:20 the output will only contain rows that have a value in the sn column (not case-sensitive) that is greater than or equal to 5, and less than 20.
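The half-open nature of the interval is easy to get wrong, so here is a minimal sketch of the selection rule described above. The function and the sample rows are hypothetical and only meant to show which boundary is included.

```python
def select_range(rows, column, low, high):
    """Keep rows whose value satisfies low <= value < high."""
    return [row for row in rows if low <= row[column] < high]


rows = [{"id": 1, "sn": 4.9}, {"id": 2, "sn": 5.0},
        {"id": 3, "sn": 19.99}, {"id": 4, "sn": 20.0}]

# Like --range=sn,5:20: the row at exactly 5 is kept, the one at 20 is not.
kept = select_range(rows, "sn", 5, 20)
print([row["id"] for row in kept])       # prints: [2, 3]
```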

This option can be called multiple times (different ranges for different columns) in one run of the Table program. This is very useful for selecting the final rows from multiple criteria/columns.

The chosen column doesn’t have to be in the output columns. This is good when you just want to select using one column’s values, but don’t need that column anymore afterwards.

For one example of using this option, see the example under --sigclip-median in Invoking Statistics.

--inpolygon=STR1,STR2

Only return rows where the given coordinates are inside the polygon specified by the --polygon option. The coordinate columns are the given STR1 and STR2 columns; they can be a column name or counter (see Selecting table columns). Note that the chosen columns don’t have to be in the output columns (which are specified by the --column option).

For example if we want to select rows in the polygon specified in Dataset inspection and cropping, this option can be used like this (you can remove the double quotations and write them all in one line if you remove the white-spaces around the colon separating the vertices):

asttable table.fits --inpolygon=RA,DEC \
         --polygon="53.187414,-27.779152 \
                    : 53.159507,-27.759633 \
                    : 53.134517,-27.787144 \
                    : 53.161906,-27.807208"

Flat/Euclidean space: The --inpolygon option assumes a flat/Euclidean space so it is only correct for RA and Dec when the polygon size is very small like the example above. If your polygon is a degree or larger, it may not return correct results. We are working on other options for this.

--outpolygon=STR1,STR2

Only return rows where the given coordinates are outside the polygon specified by the --polygon option. This option is very similar to the --inpolygon option, so see the description there for more.

--polygon=FLT:FLT,...

The polygon to use for the --inpolygon and --outpolygon options. The values to this option are parsed in the same way as in the Crop program, see its description there for more: Crop options.
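For readers curious about what a flat-space polygon test involves, the sketch below uses the classic ray-casting (even-odd) rule on the vertices of the example above. It is a generic textbook technique chosen for illustration; it is not claimed to be the algorithm that Table uses, and the function name is hypothetical.

```python
def in_polygon(x, y, vertices):
    """Even-odd test: count polygon edges crossed by a ray going towards +x."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]   # wrap around to close the polygon
        if (y1 > y) != (y2 > y):         # edge straddles the horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


polygon = [(53.187414, -27.779152), (53.159507, -27.759633),
           (53.134517, -27.787144), (53.161906, -27.807208)]

print(in_polygon(53.16, -27.783, polygon))   # prints: True
print(in_polygon(53.00, -27.783, polygon))   # prints: False
```

The test treats RA and Dec as plain Cartesian coordinates, which is exactly the flat-space approximation that the note above warns about for large polygons.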
-e STR,INT/FLT,...
--equal=STR,INT/FLT,...

Only output rows that are equal to the given number(s) in the given column. The first argument is the column identifier (name or number, see Selecting table columns), after that you can specify any number of values. For example --equal=ID,5,6,8 will only print the rows that have a value of 5, 6, or 8 in the ID column. This option can also be called multiple times, so --equal=ID,4,5 --equal=ID,6,7 has the same effect as --equal=ID,4,5,6,7.

The --equal and --notequal options also work when the given column has a string type. In this case the given value to the option will also be parsed as a string, not as a number. When dealing with string columns, be careful with trailing white space characters (the actual value may be adjusted to the right, left, or center of the column’s width). If you need to account for such white spaces, you can use shell quoting. For example --equal=NAME," myname ".

Equality and floating point numbers: Floating point numbers are only approximate values (see Numeric data types). In this context, their equality depends on how the input table was originally stored (as a plain text table or as an ASCII/binary FITS table). If you want to select floating point numbers, it is strongly recommended to use the --range option and set a very small interval around your desired number; don’t use --equal or --notequal.

-n STR,INT/FLT,...
--notequal=STR,INT/FLT,...

Only output rows that are not equal to the given number(s) in the given column. The first argument is the column identifier (name or number, see Selecting table columns), after that you can specify any number of values. For example --notequal=ID,5,6,8 will only print the rows where the ID column doesn’t have a value of 5, 6, or 8. This option can also be called multiple times, so --notequal=ID,4,5 --notequal=ID,6,7 has the same effect as --notequal=ID,4,5,6,7.
Be very careful if you want to use the non-equality with floating point numbers, see the special note under --equal for more. This option also works when the given column has a string type, see the description under --equal (above) for more.

-s STR
--sort=STR

Sort the output rows based on the values in the STR column (can be a column name or number). By default the sort is done in ascending/increasing order; to sort in descending order, use --descending.

The chosen column doesn’t have to be in the output columns. This is good when you just want to sort using one column’s values, but don’t need that column anymore afterwards.

-d
--descending

When called with --sort, rows will be sorted in descending order.

-H INT
--head=INT

Only print the given number of rows from the top of the final table. Note that this option only affects the output table. For example if you use --sort, or --range, the printed rows are the first rows after applying the sort, or after selecting a range of the full input.

If the given value to --head is 0, the output columns won’t have any rows, and if it is larger than the number of rows in the input table, all the rows are printed (this option is effectively ignored). This behavior is taken from the head program in GNU Coreutils.

-t INT
--tail=INT

Only print the given number of rows from the bottom of the final table. See --head for more.

-m STR/INT,STR[,STR[,STR]]
--colmetadata=STR/INT,STR[,STR[,STR]]

Update a column’s metadata just before writing the final table (after all other operations are done, for example column arithmetic, or column concatenation). The first value (before the first comma) given to this option can either be a counter (positive integer, counting from 1), or a name (the column’s name in the output if this option wasn’t called).

This option can be very useful in conjunction with column arithmetic (see Column arithmetic), or column concatenation (appending multiple columns from different tables, for more see --catcolumnfile).
After the to-be-updated column is identified, at least one other string should be given, with a maximum of three strings. The first string after the original name will be the selected column’s new name. The next (optional) string will be the selected column’s unit and the third (optional) will be its comments. If the two optional strings aren’t given, the original column’s units or comments will remain unchanged. Here are three examples:

--colmetadata=MAGNITUDE,MAG_F160W

This will convert the name of the original MAGNITUDE column to MAG_F160W, leaving the unit and comments unchanged.

--colmetadata=3,MAG_F160W,mag

This will convert the name of the third column of the final output to MAG_F160W and the units to mag, while leaving the comments untouched.

--colmetadata=MAGNITUDE,MAG_F160W,mag,"Magnitude in F160W filter"

This will convert the name of the original MAGNITUDE column to MAG_F160W, the units to mag and the comments to Magnitude in F160W filter. Note the double quotations around the comment string: they are necessary to preserve the white-space characters within the column comment from the command-line into the program (otherwise, upon reaching a white-space character, the shell will consider this option to be finished and cause unexpected behavior).

The recommended way to use this option is to first do all your operations on your table’s data and write it into a temporary file (maybe called temp.fits). Look into that file’s metadata (with asttable temp.fits -i) to see the exact column positions and possible names, then add the necessary calls to this option to your previous call to asttable, so it writes proper metadata in the same run (for example in a script or Makefile). Recall that when a name is given, this option will update the metadata of the first column that matches, so if you have multiple columns with the same name, you can call this option multiple times with the same first argument to change them all.
Finally, if you already have a FITS table by other means (for example by downloading) and you merely want to update the column metadata and leave the data intact, it is much more efficient to directly modify the respective FITS header keywords with astfits, using the keyword manipulation features described in Keyword manipulation. --colmetadata is mainly intended for scenarios where you want to edit the data so it will always load the full/partial dataset into memory, then write out the resulting datasets with updated/corrected metadata.

## 6 Data manipulation

Images are one of the major formats of data that is used in astronomy. The functions in this chapter explain the GNU Astronomy Utilities which are provided for their manipulation. For example cropping out a part of a larger image or convolving the image with a given kernel or applying a transformation to it.

### 6.1 Crop

Astronomical images are often very large, filled with thousands of galaxies. It often happens that you only want a section of the image, or you have a catalog of sources and you want to visually analyze them in small postage stamps. Crop is made to do all these things. When more than one crop is required, Crop will divide the crops between multiple threads to significantly reduce the run time.

Astronomical surveys are usually extremely large. So large in fact, that the whole survey will not fit into a reasonably sized file. Because of this, surveys usually cut the final image into separate tiles and store each tile in a file. For example the COSMOS survey’s Hubble space telescope, ACS F814W image consists of 81 separate FITS images, with each one having a volume of 1.7 Giga bytes.

Even though the tile sizes are chosen to be large enough that too many galaxies/targets don’t fall on the edges of the tiles, inevitably some do.
So when you simply crop the image of such targets from one tile, you will miss a large area of the surrounding sky (which is essential in estimating the noise). Therefore in its WCS mode, Crop will stitch parts of the tiles that are relevant for a target (with the given width) from all the input images that cover that region into the output. Of course, the tiles have to be present in the list of input files.

Besides cropping postage stamps around certain coordinates, Crop can also crop arbitrary polygons from an image (or a set of tiles by stitching the relevant parts of different tiles within the polygon), see --polygon in Invoking Crop. Alternatively, it can crop out rectangular regions through the --section option from one image, see Crop section syntax.

#### 6.1.1 Crop modes

In order to be comprehensive, intuitive, and easy to use, there are two ways to define the crop:

1. From its center and side length. For example if you already know the coordinates of an object and want to inspect it in an image or to generate postage stamps of a catalog containing many such coordinates.
2. The vertices of the crop region, this can be useful for larger crops over many targets, for example to crop out a uniformly deep, or contiguous, region of a large survey.

Irrespective of how the crop region is defined, the coordinates to define the crop can be in Image (pixel) or World Coordinate System (WCS) standards. All coordinates are read as floating point numbers (not integers, except for the --section option, see below). By setting the mode in Crop, you define the standard in which the given coordinates must be interpreted. Here, the different ways to specify the crop region are discussed within each standard. For the full list of options, please see Invoking Crop.

When the crop is defined by its center, the respective (integer) central pixel position will be found internally according to the FITS standard.
To have this pixel positioned in the center of the cropped region, the final cropped region will have an odd number of pixels (even if you give an even number to --width in image mode).

Furthermore, when the crop is defined by its center, Crop allows you to only keep crops that don’t have any blank pixels in the vicinity of their center (your primary target). This can be very convenient when your input catalog/coordinates originated from another survey/filter which is not fully covered by your input image. To learn more about this feature, please see the description of the --checkcenter option in Invoking Crop.

Image coordinates

In image mode (--mode=img), Crop interprets the pixel coordinates and widths in units of the input data-elements (for example pixels in an image, not world coordinates). In image mode, only one image may be input. The output crop(s) can be defined in multiple ways as listed below.

Center of multiple crops (in a catalog)

The centers of (possibly multiple) crops are read from a text file. In this mode, the columns identified with the --coordcol option are interpreted as the center of a crop with a width of --width pixels along each dimension. The columns can contain any floating point value. The value to the --output option is seen as a directory which will host (the possibly multiple) separate crop files, see Crop output for more. For a tutorial using this feature, please see Finding reddest clumps and visual inspection.

Center of a single crop (on the command-line)

The center of the crop is given on the command-line with the --center option. The crop width is specified by the --width option along each dimension. The given coordinates and width can be any floating point number.

Vertices of a single crop

In Image mode there are two options to define the vertices of a region to crop: --section and --polygon.
The former is lower-level (doesn’t accept floating point vertices, and only a rectangular region can be defined), it is also only available in Image mode. Please see Crop section syntax for a full description of this method.

The latter option (--polygon) is a higher-level method to define any polygon (with any number of vertices) with floating point values. Please see the description of this option in Invoking Crop for its syntax.

WCS coordinates

In WCS mode (--mode=wcs), the coordinates and widths are interpreted using the World Coordinate System (WCS, that must accompany the dataset), not pixel coordinates. In WCS mode, Crop accepts multiple datasets as input. When the cropped region (defined by its center or vertices) overlaps with multiple of the input images/tiles, the overlapping regions will be taken from the respective input (they will be stitched when necessary for each output crop).

In this mode, the input images do not necessarily have to be the same size, they just need to have the same orientation and pixel resolution. Currently only orientation along the celestial coordinates is accepted, if your input has a different orientation you can use Warp’s --align option to align the image before cropping it (see Warp).

Each individual input image/tile can even be smaller than the final crop. In any case, any part of any of the input images which overlaps with the desired region will be used in the crop. Note that if there is an overlap in the input images/tiles, the pixels from the last input image read are going to be used for the overlap. Crop will not change pixel values, so it assumes your overlapping tiles were cutout from the same original image. There are multiple ways to define your cropped region as listed below.

Center of multiple crops (in a catalog)

Similar to catalog inputs in Image mode (above), except that the values along each dimension are assumed to have the same units as the dataset’s WCS information.
For example, the central RA and Dec value for each crop will be read from the first and second calls to the --coordcol option. The width of the cropped box (in units of the WCS, or degrees in RA and Dec mode) must be specified with the --width option.

Center of a single crop (on the command-line)

You can specify the center of only one crop box with the --center option. If it exists in the input images, it will be cropped similar to the catalog mode, see above also for --width.

Vertices of a single crop

The --polygon option is a high-level method to define any convex polygon (with any number of vertices). Please see the description of this option in Invoking Crop for its syntax.

CAUTION: In WCS mode, the image has to be aligned with the celestial coordinates, such that the first FITS axis is parallel (opposite direction) to the Right Ascension (RA) and the second FITS axis is parallel to the declination. If these conditions aren’t met for an image, Crop will warn you and abort. You can use Warp’s --align option to align the input image with these coordinates, see Warp.

As a summary, if you don’t specify a catalog, you have to define the cropped region manually on the command-line. In any case the mode is mandatory for Crop to be able to interpret the values given as coordinates or widths.

#### 6.1.2 Crop section syntax

When in image mode, one of the methods to crop only one rectangular section from the input image is to use the --section option. Crop has a powerful syntax to read the box parameters from a string of characters. If you leave certain parts of the string empty, Crop can fill them for you based on the input image sizes.

To define a box, you need the coordinates of two points: the first (X1, Y1) and the last (X2, Y2) pixel positions in the image, or four integer numbers in total. The four coordinates can be specified with one string in this format: ‘X1:X2,Y1:Y2’.
This string is given to the --section option. Therefore, the pixels along the first axis that are $$\geq$$ X1 and $$\leq$$ X2 will be included in the cropped image. The same goes for the second axis. Note that each different term will be read as an integer, not a float.

The reason it only accepts integers is that --section is a low-level option (which is also very fast!). For a higher-level way to specify a region (any polygon, not just a box), please see the --polygon option in Crop options. Also note that in the FITS standard, pixel indexes along each axis start from unity (1), not zero (0).

You can omit any of the values and they will be filled automatically. The left hand side of the colon (:) will be filled with 1, and the right side with the image size. So, 2:,: will include the full range of pixels along the second axis and only those with an index of 2 or larger along the first axis. If the colon is omitted for a dimension, then the full range is automatically used. So the same string is also equal to 2:, or 2: or even 2. If you want such a case for the second axis, you should set it to: ,2.

If you specify a negative value, it will be interpreted as a position before the first index of the image, in other words outside the image along the bottom or left sides when viewed in SAO ds9. In case you want to count from the top or right sides of the image, you can use an asterisk (*). When confronted with a *, Crop will replace it with the maximum length of the image in that dimension. So *-10:*+10,*-20:*+20 will mean that the crop box will be $$21\times41$$ pixels in size (both end points are included) and only include the top corner of the input image with 3/4 of the image being covered by blank pixels, see Blank pixels.

If you feel more comfortable with space characters between the values, you can use as many space characters as you wish, just be careful to put your value in double quotes, for example --section="5:200, 123:854".
If you forget the quotes, anything after the first space will not be seen by --section and you will most probably get an error because the rest of your string will be read as a filename (which most probably doesn’t exist). See Command-line for a description of how the command-line works.

#### 6.1.3 Blank pixels

The cropped box can potentially include pixels that are beyond the image range, for example when a target in the input catalog was very near the edge of the input image. The parts of the cropped image that were not in the input image will be filled with the following two values depending on the data type of the image. In both cases, SAO ds9 will not color code those pixels.

• If the data type of the image is a floating point type (float or double), IEEE NaN (Not a number) will be used.
• For integer types, pixels out of the image will be filled with the value of the BLANK keyword in the cropped image header. The value assigned to it is the lowest value possible for that type, so you will probably never need it anyway. Only for the unsigned character type (BITPIX=8 in the FITS header), the maximum value is used: because it is unsigned, the smallest value is zero, which is often meaningful.

You can ask for such blank regions to not be included in the output crop image using the --noblank option. In such cases, there is no guarantee that the image size of your outputs is what you asked for.

Unfortunately, some survey images do not use the BLANK FITS keyword. Instead they just give all pixels outside of the survey area a value of zero. So by default, when dealing with float or double image types, any values that are 0.0 are also regarded as blank regions. This can be turned off with the --zeroisnotblank option.

#### 6.1.4 Invoking Crop

Crop will crop a region from an image. If in WCS mode, it will also stitch parts from separate images in the input files.
The executable name is astcrop with the following general template

astcrop [OPTION...] [ASCIIcatalog] ASTRdata ...


One line examples:

## Crop all objects in cat.txt from image.fits:
$ astcrop --catalog=cat.txt image.fits

## Crop all objects in catalog (with RA,DEC) from all the files
## ending in '_drz.fits' in '/mnt/data/COSMOS/':
$ astcrop --mode=wcs --catalog=cat.txt /mnt/data/COSMOS/*_drz.fits

## Crop the outer 10 border pixels of the input image:
$ astcrop --section=10:*-10,10:*-10 --hdu=2 image.fits

## Crop region around RA and Dec of (189.16704, 62.218203):
$ astcrop --mode=wcs --center=189.16704,62.218203 goodsnorth.fits

## Crop region around pixel coordinate (568.342, 2091.719):
$ astcrop --mode=img --center=568.342,2091.719 --width=201 image.fits

Crop has one mandatory argument which is the input image name(s), shown above with ASTRdata .... You can use shell expansions, for example * for this if you have lots of images in WCS mode. If the crop box centers are in a catalog, you can use the --catalog option. In other cases, the parameters of the single crop must be given with command-line options. See Crop output for how the output file name(s) can be specified. For the full list of general options to all Gnuastro programs (including Crop), please see Common options.

Floating point numbers can be used to specify the crop region (except the --section option, see Crop section syntax). In such cases, the floating point values will be used to find the desired integer pixel indices based on the FITS standard. Hence, Crop ultimately doesn’t do any sub-pixel cropping (in other words, it doesn’t change pixel values). If you need such crops, you can use Warp to first warp the image to a new pixel grid, then crop from that. For example, let’s assume you want a crop from pixels 12.982 to 80.982 along the first dimension. You should first translate the image by $$-0.482$$ (note that the edge of a pixel is at integer multiples of $$0.5$$). So you should run Warp with --translate=-0.482,0 and then crop the warped image with --section=13:81.

There are two ways to define the cropped region: with its center or its vertices. See Crop modes for a full description. In the former case, Crop can check if the central region of the cropped image is indeed filled with data or is blank (see Blank pixels), and not produce any output when the center is blank, see the description under --checkcenter for more.

When in catalog mode, Crop will run in parallel unless you set --numthreads=1, see Multi-threaded operations. Note that when multiple outputs are created with threads, the outputs will not be created in the same order.
This is because the threads are asynchronous and thus not started in order. This has no effect on each output, see Finding reddest clumps and visual inspection for a tutorial on effectively using this feature.

#### 6.1.4.1 Crop options

The options can be classified into the following contexts: Input, Output and operating mode options. Options that are common to all Gnuastro programs are listed in Common options and will not be repeated here.

When you are specifying the crop vertices yourself (through --section, or --polygon) on relatively small regions (depending on the resolution of your images) the outputs from image and WCS mode can be approximately equivalent. However, as the crop sizes get large, the curved nature of the WCS coordinates has to be considered. For example, when using --section, the right ascension of the bottom left and top left corners will not be equal. If you only want regions within a given right ascension, use --polygon in WCS mode.

Input image parameters:

--hstartwcs=INT

Specify the first keyword card (line number) to start finding the input image world coordinate system information. Distortions were only recently included in WCSLIB (from version 5). Therefore until now, different telescopes would apply their own specific set of WCS keywords and put them into the image header along with those that WCSLIB does recognize. So now that WCSLIB recognizes most of the standard distortion parameters, they will get confused with the old ones and give completely wrong results, for example in the CANDELS-GOODS South images (see footnote 92).

The two --hstartwcs and --hendwcs options are thus provided so when using older datasets, you can specify what region in the FITS headers you want to use to read the WCS keywords. Note that this is only relevant for reading the WCS information; basic data information like the image size is read separately.
These two options will only be considered when the value to --hendwcs is larger than that of --hstartwcs. So if they are equal or --hstartwcs is larger than --hendwcs, then all the input keywords will be parsed to get the WCS information of the image.

--hendwcs=INT

Specify the last keyword card to read for specifying the image world coordinate system on the input images. See --hstartwcs.

Crop box parameters:

-c FLT[,FLT[,...]]
--center=FLT[,FLT[,...]]

The central position of the crop in the input image. The positions along each dimension must be separated by a comma (,) and fractions are also acceptable. The number of values given to this option must be the same as the dimensions of the input dataset. The width of the crop should be set with --width. The units of the coordinates are read based on the value to the --mode option, see below.

-w FLT[,FLT[,...]]
--width=FLT[,FLT[,...]]

Width of the cropped region about its center. --width may take either a single value (to be used for all dimensions) or multiple values (a specific value for each dimension). If in WCS mode, value(s) given to this option will be read in the same units as the dataset’s WCS information along this dimension. The final output will have an odd number of pixels to allow easy identification of the pixel which keeps your requested coordinate (from --center or --catalog).

The --width option also accepts fractions. For example if you want the width of your crop to be 3 by 5 arcseconds along RA and Dec respectively, you can call it with: --width=3/3600,5/3600.

If you want an even sided crop, you can run Crop afterwards with --section=":*-1,:*-1" or --section=2:,2: (depending on which side you don’t need), see Crop section syntax.

-l STR
--polygon=STR

String of vertices to define a polygon to crop. The vertices are used to define the polygon in the same order given to this option.
When the vertices are not necessarily in the proper order (for example one vertex in a square comes after its diagonal opposite), you can add the --polygonsort option which will attempt to sort the vertices before cropping. Note that for concave polygons, sorting is not recommended because there is no unique solution, for more, see the description under --polygonsort.

This option can be used both in the image and WCS modes, see Crop modes. The cropped image will be the size of the rectangular region that completely encompasses the polygon. By default all the pixels that are outside of the polygon will be set as blank values (see Blank pixels). However, if --polygonout is called all pixels internal to the vertices will be set to blank. In WCS-mode, you may provide many FITS images/tiles: Crop will stitch them to produce this cropped region, then apply the polygon.

The syntax for the polygon vertices is similar to, and simpler than, that for --section. In short, the dimensions of each coordinate are separated by a comma (,) and each vertex is separated by a colon (:). You can define as many vertices as you like. If you would like to use space characters between the dimensions and vertices to make them more human-readable, then you have to put the value to this option in double quotation marks.

For example, let’s assume you want to work on the deepest part of the WFC3/IR images of Hubble Space Telescope eXtreme Deep Field (HST-XDF). According to the webpage (see footnote 93) the deepest part is contained within the coordinates:

[ (53.187414,-27.779152), (53.159507,-27.759633),
  (53.134517,-27.787144), (53.161906,-27.807208) ]

They have provided mask images with only these pixels in the WFC3/IR images, but what if you also need to work on the same region in the full resolution ACS images? Also what if you want to use the CANDELS data for the shallow region? Running Crop with --polygon will easily pull out this region of the image for you, irrespective of the resolution.
If you have set the operating mode to WCS mode in your nearest configuration file (see Configuration files), there is no need to call --mode=wcs on the command line.

$ astcrop --mode=wcs desired-filter-image(s).fits           \
          --polygon="53.187414,-27.779152 : 53.159507,-27.759633 : \
                     53.134517,-27.787144 : 53.161906,-27.807208"


In other cases, you have an image and want to define the polygon yourself (it isn’t already published like the example above). As the number of vertices increases, checking the vertex coordinates on a FITS viewer (for example SAO ds9) and typing them in one by one can be very tedious and prone to typo errors.

You can take the following steps to avoid the frustration and possible typos: Open the image with ds9 and activate its “region” mode with Edit→Region. Then define the region as a polygon with Region→Shape→Polygon. Click on the approximate center of the region you want and a small square will appear. By clicking on the vertices of the square you can shrink or expand it, clicking and dragging anywhere on the edges will enable you to define a new vertex. After the region has been nicely defined, save it as a file with Region→Save Regions. You can then select the name and address of the output file, keep the format as REG and press “OK”. In the next window, keep format as “ds9” and “Coordinate System” as “fk5”. A plain text file (let’s call it ds9.reg) is now created.

You can now convert this plain text file to Crop’s polygon format with this command (when typing on the command-line, ignore the “\” at the end of the first and second lines along with the extra spaces, these are only for nice printing):

$ v=$(awk 'NR==4' ds9.reg | sed -e's/polygon(//'        \
           -e's/\([^,]*,[^,]*\),/\1:/g' -e's/)//' )
$ astcrop --mode=wcs image.fits --polygon=$v
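To see what the awk and sed pipeline described above produces without needing ds9 or an image, the following sketch runs it on a small region file written by the script itself. The file contents are made up for illustration (only the layout matters), and the sketch assumes, like the command above, that the polygon is on the fourth line of the region file.

```shell
#!/bin/sh
# Write a small, made-up ds9 region file. Only its layout matters
# here: the polygon is assumed to be on the fourth line.
cat > ds9.reg <<'EOF'
# Region file format: DS9 version 4.1
global color=green width=1 select=1 edit=1 move=1 delete=1
fk5
polygon(53.187414,-27.779152,53.159507,-27.759633,53.134517,-27.787144,53.161906,-27.807208)
EOF

# Strip "polygon(" and ")", then turn every second comma into a colon
# so that each "x,y" pair becomes one vertex of the --polygon value.
v=$(awk 'NR==4' ds9.reg | sed -e 's/polygon(//' \
        -e 's/\([^,]*,[^,]*\),/\1:/g' -e 's/)//')
echo "$v"
```

The printed string has the comma-within-vertex, colon-between-vertices form that --polygon expects. If your region file has extra header lines, the `NR==4` selection has to be adjusted accordingly.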

--polygonout

Keep all the regions outside the polygon and mask the inner ones with blank pixels (see Blank pixels). This is practically the inverse of the default mode of treating polygons. Note that this option only works when you have only provided one input image. If multiple images are given (in WCS mode), then the full area covered by all the images has to be shown and the polygon excluded. This can lead to a very large area if large surveys like COSMOS are used. So Crop will abort and notify you. In such cases, it is best to crop out the larger region you want, then mask the smaller region with this option.

--polygonsort

Sort the given set of vertices to the --polygon option. For a convex polygon it will sort the vertices correctly, however for a concave polygon there is no unique sorting, so be careful because the crop may not be what you expected.

Polygons come in two classes: convex and concave (or generally, non-convex!), see below for a demonstration. Convex polygons are those where all inner angles are less than 180 degrees. By contrast, a concave polygon is one where an inner angle may be more than 180 degrees.

            Concave Polygon        Convex Polygon

             D --------C          D------------- C
              \        |        E /              |
               \E      |          \              |
               /       |           \             |
              A--------B             A ----------B
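The difference between the two classes can be checked numerically. The sketch below is not how Crop itself works internally; it is a hypothetical `polygon_class` helper that assumes the vertices are already given in order around a polygon that does not intersect itself. It walks around the polygon and looks at the sign of the cross product of each pair of consecutive edges: if the turning direction changes anywhere, one inner angle is larger than 180 degrees and the polygon is concave.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gnuastro): classify an ordered,
# non-self-intersecting polygon as convex or concave.
#   $1: polygon as "x1,y1:x2,y2:..." (same separators as --polygon)
polygon_class() {
    awk -v poly="$1" 'BEGIN {
        n = split(poly, v, ":")
        for (i = 1; i <= n; i++) {
            split(v[i], c, ",")
            x[i] = c[1] + 0; y[i] = c[2] + 0
        }
        pos = 0; neg = 0
        for (i = 1; i <= n; i++) {
            j = i % n + 1          # next vertex
            k = j % n + 1          # vertex after that
            cross = (x[j]-x[i]) * (y[k]-y[j]) - (y[j]-y[i]) * (x[k]-x[j])
            if (cross > 0) pos++
            if (cross < 0) neg++
        }
        # Turns in both directions mean at least one reflex angle.
        if (pos > 0 && neg > 0) print "concave"; else print "convex"
    }'
}

polygon_class "0,0:4,0:4,4:0,4"
polygon_class "0,0:4,0:4,4:0,4:1,2"
```

The first call is a plain square (A, B, C, D) and the second adds a vertex E at (1,2) that is pushed inside the square, like the left-hand figure above.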

-s STR
--section=STR

Section of the input image which you want to be cropped. See Crop section syntax for a complete explanation on the syntax required for this input.
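As an illustration of the filling rules in Crop section syntax, the following sketch resolves a section string into explicit pixel ranges for a given image size. It is a hypothetical `resolve_section` helper based only on the rules stated in this manual (an empty start becomes 1, an empty end becomes the axis length, and * stands for the axis length), not a copy of Crop's parser, and it only handles two dimensions.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gnuastro): fill the omitted parts
# of a --section string for a 2D image.
#   $1: section string, for example "2:,:" or "10:*-10,10:*-10"
#   $2: length of the first axis,  $3: length of the second axis
resolve_section() {
    printf '%s\n' "$1" | awk -v n1="$2" -v n2="$3" '
        # Turn one end point into a number; "*" is the axis length.
        function endpoint(s, n, dflt) {
            if (s == "") return dflt
            if (substr(s, 1, 1) == "*") return n + substr(s, 2)
            return s + 0
        }
        # Resolve "A:B" (either side may be empty) for one axis.
        function axis(s, n,    p, k) {
            k = split(s, p, ":")
            if (k < 1) p[1] = ""
            if (k < 2) p[2] = ""
            return endpoint(p[1], n, 1) ":" endpoint(p[2], n, n)
        }
        {
            gsub(/[ \t]/, "")              # spaces are only cosmetic
            k = split($0, d, ",")
            if (k < 2) d[2] = ""
            print axis(d[1], n1) "," axis(d[2], n2)
        }'
}

resolve_section "2:,:" 100 200
resolve_section "10:*-10,10:*-10" 100 200
resolve_section "*-10:*+10,*-20:*+20" 100 200
```

For a 100 by 200 pixel image, the last call gives ranges that extend 10 and 20 pixels beyond the image on the top and right sides. Those are the parts that Crop fills with blank pixels.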

-x STR/INT
--coordcol=STR/INT

The column in a catalog to read as a coordinate. The value can be either the column number (starting from 1), or a match/search in the table meta-data, see Selecting table columns. This option must be called multiple times, depending on the number of dimensions in the input dataset. If it is called more than necessary, the extra columns (later calls to this option on the command-line or configuration files) will be ignored, see Configuration file precedence.

-n STR/INT
--namecol=STR/INT

Column selection of crop file name. The value can be either the column number (starting from 1), or a match/search in the table meta-data, see Selecting table columns. This option can be used both in Image and WCS modes, and is not mandatory. When a column is given to this option, the final crop base file name will be taken from the contents of this column. The directory will be determined by the --output option (current directory if not given) and the value to --suffix will be appended. When this column isn’t given, the row number will be used instead.
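The naming rule described above (directory, then base name, then suffix) can be sketched as a small shell function. The `crop_name` helper and the example values are hypothetical and only restate the rule in this paragraph. The exact strings that Crop writes may differ, so check Crop output before relying on them in a script.

```shell
#!/bin/sh
# Hypothetical helper: compose the expected crop file name.
#   $1: output directory (value of --output; "." when not given)
#   $2: base name (the --namecol value, or the row number)
#   $3: suffix (value of --suffix)
crop_name() {
    # "${1%/}" drops one trailing slash so "crops/" and "crops" agree.
    printf '%s/%s%s\n' "${1%/}" "$2" "$3"
}

crop_name ./crops NGC1234 _s.fits
crop_name . 7 _w.fits
```

Predicting the names this way can be convenient when a later step of a script or Makefile needs to know which files to expect for each catalog row.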

Output options:

-c FLT/INT
--checkcenter=FLT/INT

Square box width of region in the center of the image to check for blank values. If any of the pixels in this central region of a crop (defined by its center) are blank, then it will not be stored in an output file. If the value to this option is zero, no checking is done. This check is only applied when the cropped region(s) are defined by their center (not by the vertices, see Crop modes).

The units of the value are interpreted based on the --mode value (in WCS or pixel units). The ultimate checked region size (in pixels) will be an odd integer around the center (converted from WCS, or when an even number of pixels are given to this option). In WCS mode, the value can be given as fractions, for example if the WCS units are in degrees, 0.1/3600 will correspond to a check size of 0.1 arcseconds.

Because survey regions don’t often have a clean square or rectangle shape, some of the pixels on the sides of the survey FITS image don’t commonly have any data and are blank (see Blank pixels). So when the catalog was not generated from the input image, it often happens that the image does not have data over some of the points.

When the given center of a crop falls in such regions or outside the dataset, and this option has a non-zero value, no crop will be created. Therefore with this option, you can specify a width of a small box (3 pixels is often good enough) around the central pixel of the cropped image. You can check which crops were created and which weren’t from the command-line (if --quiet was not called, see Operating mode options), or in Crop’s log file (see Crop output).

-p STR
--suffix=STR

The suffix (or post-fix) of the output files for when you want all the cropped images to have a special ending. One case where this might be helpful is when besides the science images, you want the weight images (or exposure maps, which are also distributed with survey images) of the cropped regions too. So in one run, you can set the input images to the science images and --suffix=_s.fits. In the next run you can set the weight images as input and --suffix=_w.fits.

-b
--noblank

Pixels outside of the input image that are in the crop box will not be used. By default they are filled with blank values (depending on type), see Blank pixels. This option only applies in Image mode, see Crop modes.

-z
--zeroisnotblank

In float or double images, it is common to give the value of zero to blank pixels. If the input image type is one of these two types, such pixels will also be considered as blank. You can disable this behavior with this option, see Blank pixels.

Operating mode options:

-O STR
--mode=STR

Operate in Image mode or WCS mode when the input coordinates can be either image or WCS coordinates. The value must be either img or wcs, see Crop modes for a full description.


#### 6.1.4.2 Crop output

The string given to --output option will be interpreted depending on how many crops were requested, see Crop modes:

• When a catalog is given, the value of the --output option (see Common options) will be read as the directory to store the output cropped images. Hence if it doesn’t already exist, Crop will abort with a “No such file or directory” error.

The crop file names will consist of two parts: a variable part (the row number of each target starting from 1) along with a fixed string which you can set with the --suffix option. Optionally, you may also use the --namecol option to define a column in the input catalog to use as the file name instead of numbers.

• When only one crop is desired, the value to --output will be read as a file name. If no output is specified or if it is a directory, the output file name will follow the automatic output names of Gnuastro (see Automatic output): the .fits suffix of the input will be replaced with the string given to --suffix.

The header of each output cropped image will contain the names of the input image(s) it was cut from. If a name is longer than the 70 character space that the FITS standard allows for header keyword values, the name will be cut into several keywords from the nearest slash (/). The keywords have the following format: ICFn_m (for Crop File). Where n is the number of the image used in this crop and m is the part of the name (it can be broken into multiple keywords). Following the name is another keyword named ICFnPIX which shows the pixel range from that input image in the same syntax as Crop section syntax. So this string can be directly given to the --section option later.

Once done, a log file can be created in the current directory with the --log option. This file will have three columns and the same number of rows as the number of cropped images. There are also comments on the top of the log file explaining basic information about the run and descriptions for the columns. A short description of the columns is also given below:

1. The cropped image file name for that row.
2. The number of input images that were used to create that image.
3. A 0 if the central few pixels (value to the --checkcenter option) are blank and 1 if they aren’t. When the crop was not defined by its center (see Crop modes), or --checkcenter was given a value of 0 (see Invoking Crop), the center will not be checked and this column will be given a value of -1.


### 6.2 Arithmetic

It is commonly necessary to do operations on some or all of the elements of a dataset independently (pixels in an image). For example, in the reduction of raw data it is necessary to subtract the Sky value (see Sky value) from each image. Later (once the images are warped into a single grid using Warp for example, see Warp), the images are co-added (the output pixel grid is the average of the pixels of the individual input images). Arithmetic is Gnuastro’s program for such operations on your datasets directly from the command-line. It currently uses the reverse polish or post-fix notation (see Reverse polish notation) and will work on the native data types of the input images/data to reduce CPU and RAM resources, see Numeric data types. For more information on how to run Arithmetic, please see Invoking Arithmetic.


#### 6.2.1 Reverse polish notation

The most common notation for arithmetic operations is the infix notation where the operator goes between the two operands, for example $$4+5$$. While the infix notation is the preferred way in most programming languages, Gnuastro’s programs (in particular Arithmetic and Table, when doing column arithmetic) currently do not use it. This is because it requires parentheses, which can complicate the implementation of the code. In the near future we do plan to also allow this notation, but for the time being (due to time constraints on the developers), arithmetic operations can only be done in the post-fix notation (also known as reverse polish notation). The Wikipedia article provides an excellent explanation of this notation, but we will give a short summary here for self-sufficiency.

In the post-fix notation, the operator is placed after the operands. As we will see below, this removes the need for parentheses for most ordinary operators. For example, instead of writing 5+6, we write 5 6 +. To easily understand how this notation works, you can think of each operand as a node in a “last-in-first-out” stack. Every time an operator is encountered, it pops the number of operands it needs from the top of the stack (so they don’t exist in the stack any more), does its operation and pushes the result back on top of the stack. So if you want the average of 5 and 6, you would write: 5 6 + 2 /. The operations that are done are:

1. 5 is an operand, so it is pushed to the top of the stack (which is initially empty).
2. 6 is an operand, so it is pushed to the top of the stack.
3. + is a binary operator, so it will pop the top two elements of the stack out of it, and perform addition on them (the order is $$5+6$$ in the example above). The result is 11 which is pushed to the top of the stack.
4. 2 is an operand so push it onto the top of the stack.
5. / is a binary operator, so pull out the top two elements of the stack (top-most is 2, then 11) and divide the second one by the first. The result (5.5) is pushed back onto the stack.
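
The steps above can be mimicked with a few lines of Python. This is only a minimal sketch of a post-fix evaluator for plain numbers, not the code used by Arithmetic or Table: the function name evaluate_rpn and the operator table are hypothetical, and only the four basic binary operators are included (with x for multiplication, as on the Arithmetic command-line).

```python
import operator

# Binary operators of this sketch; "x" stands for multiplication.
BINARY_OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "x": operator.mul,
    "/": operator.truediv,
}


def evaluate_rpn(tokens):
    """Evaluate a list of post-fix tokens with a last-in-first-out stack."""
    stack = []
    for token in tokens:
        if token in BINARY_OPERATORS:
            # The first popped operand is the right-hand side in infix notation.
            first_popped = stack.pop()
            second_popped = stack.pop()
            stack.append(BINARY_OPERATORS[token](second_popped, first_popped))
        else:
            # Anything that is not an operator is pushed as a number.
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("%d operands were left on the stack" % len(stack))
    return stack[0]


# The average of 5 and 6, written as in the text.
average = evaluate_rpn("5 6 + 2 /".split())
print(average)
```

Keeping the popping order explicit (first popped, second popped) matters for the non-commutative operators: with it, 4 5 - gives $$4-5$$ and not $$5-4$$.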

In the Arithmetic program, the operands can be FITS images or numbers (see Invoking Arithmetic). In Table’s column arithmetic, they can be any column or a number (see Column arithmetic).

With this notation, very complicated procedures can be created without the need for parentheses or worrying about precedence. Even functions which take an arbitrary number of arguments can be defined in this notation. This is a very powerful notation and is used in languages like PostScript (which produces PDF files when compiled).


#### 6.2.2 Arithmetic operators

The recognized operators in Arithmetic are listed below. See Reverse polish notation for more on how the operators and operands should be ordered on the command-line. The operands to all operators can be a data array (for example a FITS image) or a number; the output will be an array or number according to the inputs. For example a number multiplied by an array will produce an array. The conditional operators will return pixel or numerical values of 0 (false) or 1 (true), stored in an unsigned char data type (see Numeric data types).

+

Addition, so “4 5 +” is equivalent to $$4+5$$.

-

Subtraction, so “4 5 -” is equivalent to $$4-5$$.

x

Multiplication, so “4 5 x” is equivalent to $$4\times5$$.

/

Division, so “4 5 /” is equivalent to $$4/5$$.

%

Modulo (remainder), so “3 2 %” is equivalent to $$1$$. Note that the modulo operator only works on integer types.

abs

Absolute value of first operand, so “4 abs” is equivalent to $$|4|$$.

pow

First operand to the power of the second, so “4.3 5 pow” is equivalent to $$4.3^{5}$$.

sqrt

The square root of the first operand, so “5 sqrt” is equivalent to $$\sqrt{5}$$. The output will have a floating point type, but its precision is determined from the input: if the input is a 64-bit floating point, the output will also be 64-bit. Otherwise, the output will be 32-bit floating point (see Numeric data types for the respective precision). Therefore if you require 64-bit precision in estimating the square root, convert the input to 64-bit floating point first, for example with 5 float64 sqrt.

log

Natural logarithm of first operand, so “4 log” is equivalent to $$\ln(4)$$. The output type is determined from the input, see the explanation under sqrt for more.

log10

Base-10 logarithm of first operand, so “4 log10” is equivalent to $$\log_{10}(4)$$. The output type is determined from the input, see the explanation under sqrt for more.

minvalue

Minimum (non-blank) value in the top operand on the stack, so “a.fits minvalue” will push the minimum pixel value in this image onto the stack. This operator is mainly intended for data (for example images): if the top operand is a number, this operator just returns it without any change. So note that when this operator acts on a single image, the output will no longer be an image, but a number. The output of this operator has the same type as the input.

maxvalue

Maximum (non-blank) value of first operand in the same type, similar to minvalue.

numbervalue

Number of non-blank elements in first operand in the uint64 type, similar to minvalue.

sumvalue

Sum of non-blank elements in first operand in the float32 type, similar to minvalue.

meanvalue

Mean value of non-blank elements in first operand in the float32 type, similar to minvalue.

stdvalue

Standard deviation of non-blank elements in first operand in the float32 type, similar to minvalue.

medianvalue

Median of non-blank elements in first operand with the same type, similar to minvalue.

min

For each pixel, find the minimum value in all given datasets. The output will have the same type as the input.

The first popped operand to this operator must be a positive integer number which specifies how many further operands should be popped from the stack. All the subsequently popped operands must have the same type and size. This operator (and all the variable-operand operators similar to it that are discussed below) will work in multi-threaded mode unless Arithmetic is called with the --numthreads=1 option, see Multi-threaded operations.

Each pixel of the output of the min operator will be given the minimum value of the same pixel from all the popped operands/images. For example the following command will produce an image with the same size and type as the three inputs, but each output pixel value will be the minimum of the same pixel’s values in all three input images.

$ astarithmetic a.fits b.fits c.fits 3 min

Important notes:

• NaN/blank pixels will be ignored, see Blank pixels.
• The output will have the same type as the inputs. This is natural for the min and max operators, but for other similar operators (for example sum, or average) the per-pixel operations will be done in double precision floating point and then stored back in the input type. Therefore, if the input was an integer, C’s internal type conversion will be used.

max

For each pixel, find the maximum value in all given datasets. The output will have the same type as the input. This operator is called similarly to the min operator, please see there for more.

number

For each pixel, count the number of non-blank pixels in all given datasets. The output will be an unsigned 32-bit integer datatype (see Numeric data types). This operator is called similarly to the min operator, please see there for more.

sum

For each pixel, calculate the sum in all given datasets. The output will have a single-precision (32-bit) floating point type. This operator is called similarly to the min operator, please see there for more.

mean

For each pixel, calculate the mean in all given datasets. The output will have a single-precision (32-bit) floating point type. This operator is called similarly to the min operator, please see there for more.

std

For each pixel, find the standard deviation in all given datasets. The output will have a single-precision (32-bit) floating point type. This operator is called similarly to the min operator, please see there for more.

median

For each pixel, find the median in all given datasets. The output will have a single-precision (32-bit) floating point type. This operator is called similarly to the min operator, please see there for more.

quantile

For each pixel, find the quantile from all given datasets. The output will have the same numeric data type and size as the input datasets. Besides the input datasets, the quantile operator also needs a single parameter (the requested quantile). The parameter should be the first popped operand, with a value between (and including) 0 and 1. The second popped operand must be the number of datasets to use.

In the example below, the first-popped operand (0.7) is the quantile, the second-popped operand (3) is the number of datasets to pop.

$ astarithmetic a.fits b.fits c.fits 3 0.7 quantile

sigclip-number

For each pixel, find the sigma-clipped number (after removing outliers) in all given datasets. The output will have an unsigned 32-bit integer type (see Numeric data types).

This operator will combine the specified number of inputs into a single output that contains the number of remaining elements after $$\sigma$$-clipping on each element/pixel (for more on $$\sigma$$-clipping, see Sigma clipping). This operator is very similar to min, with the exception that it expects two operands (parameters for sigma-clipping) before the total number of inputs. The first popped operand is the termination criterion and the second is the multiple of $$\sigma$$.

For example in the command below, the first popped operand (0.2) is the sigma clipping termination criterion. If the termination criterion is larger than, or equal to, 1 it is interpreted as the number of clips to do. But if it is between 0 and 1, then it is the tolerance level on the standard deviation (see Sigma clipping). The second popped operand (5) is the multiple of sigma to use in sigma-clipping. The third popped operand (3) is the number of datasets that will be used (similar to the first popped operand to min).

$ astarithmetic a.fits b.fits c.fits 3 5 0.2 sigclip-number

sigclip-median

For each pixel, find the sigma-clipped median in all given datasets. The output will have a single-precision (32-bit) floating point type. This operator is called similarly to the sigclip-number operator, please see there for more.

sigclip-mean

For each pixel, find the sigma-clipped mean in all given datasets. The output will have a single-precision (32-bit) floating point type. This operator is called similarly to the sigclip-number operator, please see there for more.

sigclip-std

For each pixel, find the sigma-clipped standard deviation in all given datasets. The output will have a single-precision (32-bit) floating point type. This operator is called similarly to the sigclip-number operator, please see there for more.

filter-mean

Apply mean filtering (or moving average) on the input dataset. During mean filtering, each pixel (data element) is replaced by the mean value of all its surrounding pixels (excluding blank values). The number of surrounding pixels in each dimension (to calculate the mean) is determined through the earlier operands that have been pushed onto the stack prior to the input dataset. The number of necessary operands is determined by the dimensions of the input dataset (first popped operand). The order of the dimensions on the command-line is the order in FITS format. Here is one example:

$ astarithmetic 5 4 image.fits filter-mean


In this example, each pixel is replaced by the mean of a 5 by 4 box around it. The box is 5 pixels along the first FITS dimension (horizontal when viewed in ds9) and 4 pixels along the second FITS dimension (vertical).

Each pixel will be placed in the center of the box that the mean is calculated on. If the given width along a dimension is even, then the center is assumed to be between the pixels (not in the center of a pixel). When the pixel is close to the edge, the pixels of the box that fall outside the image are ignored. Therefore, on the edge, fewer points will be used in calculating the mean.

The final effect of mean filtering is to smooth the input image. It is essentially a convolution with a kernel that has identical values for all its pixels (is flat), see Convolution process.

Note that blank pixels will also be affected by this operator: if there are any non-blank elements in the box surrounding a blank pixel, in the filtered image, it will have the mean of the non-blank elements, therefore it won’t be blank any more. If blank elements are important for your analysis, you can use the isblank with the where operator to set them back to blank after filtering.
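
The behavior described above (a box centered on each pixel, box pixels outside the image ignored, blank pixels excluded from the mean) can be sketched as follows. This is an illustrative Python sketch under stated assumptions and not Arithmetic's code: the image is a list of rows, NaN stands for a blank pixel, the hypothetical function filter_mean only accepts odd box widths (so the box center always falls on a pixel), and width_x is taken along the first FITS dimension (horizontal).

```python
import math


def filter_mean(image, width_x, width_y):
    """Mean-filter a 2D list of rows; NaN marks a blank pixel."""
    if width_x % 2 == 0 or width_y % 2 == 0:
        raise ValueError("this sketch only handles odd box widths")
    half_x, half_y = width_x // 2, width_y // 2
    n_rows, n_cols = len(image), len(image[0])
    output = []
    for row in range(n_rows):
        out_row = []
        for col in range(n_cols):
            # Box pixels outside the image are skipped by clamping the ranges;
            # blank (NaN) pixels inside the box are excluded explicitly.
            values = [
                image[r][c]
                for r in range(max(0, row - half_y), min(n_rows, row + half_y + 1))
                for c in range(max(0, col - half_x), min(n_cols, col + half_x + 1))
                if not math.isnan(image[r][c])
            ]
            out_row.append(sum(values) / len(values) if values else math.nan)
        output.append(out_row)
    return output


image = [
    [1.0, 2.0, 3.0],
    [4.0, math.nan, 6.0],
    [7.0, 8.0, 9.0],
]
smoothed = filter_mean(image, 3, 3)
print(smoothed[1][1])   # the blank central pixel now has the mean of its neighbors
```

In this small example the blank central pixel is surrounded by eight non-blank pixels, so after filtering it is no longer blank, while the corner pixels are averaged over only three non-blank values.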

filter-median

Apply median filtering on the input dataset. This is very similar to filter-mean, except that instead of the mean value of the box pixels, the median value is used to replace a pixel value. For more on how to use this operator, please see filter-mean.

The median is less susceptible to outliers compared to the mean. As a result, after median filtering, the pixel values will be more discontinuous than mean filtering.

filter-sigclip-mean

Apply a $$\sigma$$-clipped mean filtering onto the input dataset. This is very similar to filter-mean, except that all outliers (identified by the $$\sigma$$-clipping algorithm) have been removed, see Sigma clipping for more on the basics of this algorithm. As described there, two extra input parameters are necessary for $$\sigma$$-clipping: the multiple of $$\sigma$$ and the termination criteria. filter-sigclip-mean therefore needs to pop two other operands from the stack after the dimensions of the box.

For example the line below uses the same box size as the example of filter-mean. However, all elements in the box that are iteratively beyond $$3\sigma$$ of the distribution’s median are removed from the final calculation of the mean until the change in $$\sigma$$ is less than $$0.2$$.

$ astarithmetic 3 0.2 5 4 image.fits filter-sigclip-mean

The median (which needs a sorted dataset) is necessary for $$\sigma$$-clipping, therefore filter-sigclip-mean can be significantly slower than filter-mean. However, if there are strong outliers in the dataset that you want to ignore (for example emission lines on a spectrum when finding the continuum), this is a much better solution.

filter-sigclip-median

Apply a $$\sigma$$-clipped median filtering onto the input dataset. This operator and its necessary operands are almost identical to filter-sigclip-mean, except that after $$\sigma$$-clipping, the median value (which is less affected by outliers than the mean) is added back to the stack.

interpolate-medianngb

Interpolate all the blank elements of the second popped operand with the median of its nearest non-blank neighbors. The number of the nearest non-blank neighbors used to calculate the median is given by the first popped operand. Note that the distance of the nearest non-blank neighbors is irrelevant in this interpolation.

interpolate-minngb

Similar to interpolate-medianngb, but will fill the blank values of the dataset with the minimum value of the nearest neighbors.

interpolate-maxngb

Similar to interpolate-medianngb, but will fill the blank values of the dataset with the maximum value of the nearest neighbors. One useful application of this operator is to fill the saturated pixels of stars in images.

collapse-sum

Collapse the given dataset (second popped operand), by summing all elements along the first popped operand (a dimension in FITS standard: counting from one, from fastest dimension). The returned dataset has one dimension less compared to the input.

The output will have a double-precision floating point type irrespective of the input dataset’s type. Doing the operation in double-precision (64-bit) floating point will help the collapse (summation) be affected less by floating point errors. But afterwards, single-precision floating points are usually enough in real (noisy) datasets. So depending on the type of the input and its nature, it is recommended to use one of the type conversion operators on the returned dataset.

If any WCS is present, the returned dataset will also lack the respective dimension in its WCS matrix. Therefore, when the WCS is important for later processing, be sure that the input is aligned with the respective axes: all non-diagonal elements in the WCS matrix are zero.

One common application of this operator is the creation of pseudo broad-band or narrow-band 2D images from 3D data cubes. For example integral field unit (IFU) data products that have two spatial dimensions (first two FITS dimensions) and one spectral dimension (third FITS dimension). The command below will collapse the whole third dimension into a 2D array the size of the first two dimensions, and then convert the output to single-precision floating point (as discussed above).

$ astarithmetic cube.fits 3 collapse-sum float32
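
As a rough illustration of collapsing a cube along its third FITS dimension, here is a minimal Python sketch (not Arithmetic's implementation). It assumes the cube is stored as nested lists indexed as cube[z][y][x], so that the third (slowest) FITS dimension is the outermost index, and that there are no blank values. The function name collapse_sum_dim3 and the pixel values are hypothetical. Python floats are double precision, which mirrors the double-precision output described above.

```python
def collapse_sum_dim3(cube):
    """Sum cube[z][y][x] along z, returning a 2D image[y][x] of floats."""
    n_y, n_x = len(cube[0]), len(cube[0][0])
    return [
        [float(sum(plane[y][x] for plane in cube)) for x in range(n_x)]
        for y in range(n_y)
    ]


# Three 2x2 planes: for example three spectral slices of a tiny data cube.
cube = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]
image = collapse_sum_dim3(cube)
print(image)
```

The output has one dimension less than the input: each of its pixels is the sum of the pixels at the same position in all the planes.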

collapse-mean

Similar to collapse-sum, but the returned dataset will be the mean value along the collapsed dimension, not the sum.

collapse-number

Similar to collapse-sum, but the returned dataset will be the number of non-blank values along the collapsed dimension. The output will have a 32-bit signed integer type. If the input dataset doesn’t have blank values, all the elements in the returned dataset will have a single value (the length of the collapsed dimension). Therefore this is mostly relevant when there are blank values in the dataset.

collapse-min

Similar to collapse-sum, but the returned dataset will have the same numeric type as the input and will contain the minimum value for each pixel along the collapsed dimension.

collapse-max

Similar to collapse-sum, but the returned dataset will have the same numeric type as the input and will contain the maximum value for each pixel along the collapsed dimension.

add-dimension

Build a higher-dimensional dataset from all the input datasets stacked after one another (along the slowest dimension). The first popped operand has to be a single number. It is used by the operator to know how many operands it should pop from the stack (and the size of the output in the new dimension). The rest of the operands must have the same size and numerical data type. This operator currently only works for 2D input operands, please contact us if you want inputs to have different dimensions.

The output’s WCS (which should have a different dimensionality compared to the inputs) can be read from another file with the --wcsfile option. If no file is specified for the WCS, the first dataset’s WCS will be used, you can later add/change the necessary WCS keywords with the FITS keyword modification features of the Fits program (see Fits).

If your datasets don’t have the same type, you can use the type transformation operators of Arithmetic that are discussed below. Just beware of overflow if you are transforming to a smaller type, see Numeric data types.

For example if you want to put the three img1.fits, img2.fits and img3.fits images (each a 2D dataset) into one 3D datacube, you can use this command:

$ astarithmetic img1.fits img2.fits img3.fits 3 add-dimension

unique

Remove all duplicate (and blank) elements from the first popped operand. The unique elements of the dataset will be stored in a single-dimensional dataset.

Recall that by default, single-dimensional datasets are stored as a table column in the output. But you can use --onedasimage or --onedonstdout to respectively store them as a single-dimensional FITS array/image, or to print them on the standard output.

erode

Erode the foreground pixels (with value 1) of the input dataset (second popped operand). The first popped operand is the connectivity (see description in connected-components). Erosion is simply a flipping of all foreground pixels (with value 1) to background (with value 0) that are “touching” background pixels. “Touching” is defined by the connectivity. In effect, this carves off the outer borders of the foreground, making them thinner. This operator assumes a binary dataset (all pixels are 0 and 1).

dilate

Dilate the foreground pixels (with value 1) of the input dataset (second popped operand). The first popped operand is the connectivity (see description in connected-components). Dilation is simply a flipping of all background pixels (with value 0) to foreground (with value 1) that are “touching” foreground pixels. “Touching” is defined by the connectivity. In effect, this expands the outer borders of the foreground. This operator assumes a binary dataset (all pixels are 0 and 1).

connected-components

Find the connected components in the input dataset (second popped operand). The first popped operand is the connectivity used in the connected components algorithm. The second popped operand is the dataset where connected components are to be found. It is assumed to be a binary image (with values of 0 or 1). It must have an 8-bit unsigned integer type which is the format produced by conditional operators. This operator will return a labeled dataset where the non-zero pixels in the input will be labeled with a counter (starting from 1).

The connectivity is a number between 1 and the number of dimensions in the dataset (inclusive). 1 corresponds to the weakest (symmetric) connectivity between elements and the number of dimensions the strongest. For example on a 2D image, a connectivity of 1 corresponds to 4-connected neighbors and 2 corresponds to 8-connected neighbors.

One example usage of this operator can be the identification of regions above a certain threshold, as in the command below. With this command, Arithmetic will first separate all pixels greater than 100 into a binary image (where pixels with a value of 1 are above that value). Afterwards, it will label all those that are connected.

$ astarithmetic in.fits 100 gt 2 connected-components


If your input dataset doesn’t have a binary type, but you know all its values are 0 or 1, you can use the uint8 operator (below) to convert it to binary.
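
To make the role of the connectivity more concrete, the Python sketch below labels the foreground of a small 2D binary image with a breadth-first search. It is only an illustration, not the algorithm used in Gnuastro: the function name, the tiny example image and the order in which labels are handed out (scanning row by row) are all assumptions of this sketch.

```python
from collections import deque


def connected_components(binary, connectivity):
    """Label the non-zero pixels of a 2D list of rows (labels start from 1)."""
    if connectivity == 1:      # 4-connected: neighbors share an edge
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 2:    # 8-connected: neighbors share an edge or a corner
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity of a 2D image must be 1 or 2")
    n_rows, n_cols = len(binary), len(binary[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    next_label = 0
    for row in range(n_rows):
        for col in range(n_cols):
            if binary[row][col] == 0 or labels[row][col] != 0:
                continue
            # A new, so far unlabeled, foreground pixel starts a new component.
            next_label += 1
            labels[row][col] = next_label
            queue = deque([(row, col)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in offsets:
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and binary[nr][nc] != 0 and labels[nr][nc] == 0):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels


# Threshold at 100 (like "100 gt"), then label with both connectivities.
pixels = [[150, 20, 10], [30, 120, 10], [10, 10, 300]]
binary = [[1 if value > 100 else 0 for value in row] for row in pixels]
labels_4 = connected_components(binary, 1)
labels_8 = connected_components(binary, 2)
print(labels_4)
print(labels_8)
```

The three bright pixels only touch at their corners, so they are three separate components with a connectivity of 1, but a single component with a connectivity of 2.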

fill-holes

Flip background (0) pixels surrounded by foreground (1) in a binary dataset. This operator takes two operands (similar to connected-components): the first popped operand is the connectivity (to define a hole) and the second is the binary (0 or 1 valued) dataset to fill holes in.

invert

Invert an unsigned integer dataset. This is the only operator that ignores blank values (which are set to be the maximum values in the unsigned integer types).

This is useful in cases where the target(s) has(have) been imaged in absorption as raw formats (which are unsigned integer types). With this operator, each pixel value will be subtracted from the maximum value of the given type, thus “inverting” the image, so the target(s) can be treated as emission. This can be useful when the higher-level analysis methods/tools only work on emission (positive skew in the noise, not negative).
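
As a small numerical illustration, the Python sketch below inverts 8-bit unsigned values. It assumes that inversion replaces each value v with the maximum of the type minus v (so the result stays within the unsigned range); the function name invert_uint8 is hypothetical and this is not Arithmetic's code. Note that the maximum value itself (which is the blank value of this type) is treated like any other value, in line with blank values being ignored by this operator.

```python
UINT8_MAX = 255   # largest value of an 8-bit unsigned integer


def invert_uint8(pixels):
    """Invert a list of 8-bit unsigned values: v becomes UINT8_MAX - v."""
    for value in pixels:
        if not 0 <= value <= UINT8_MAX:
            raise ValueError("%r is not an 8-bit unsigned value" % (value,))
    return [UINT8_MAX - value for value in pixels]


# Dark (absorption) features become bright and vice versa.
inverted = invert_uint8([0, 10, 250, 255])
print(inverted)
```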

lt

Less than: If the second popped (or left operand in infix notation, see Reverse polish notation) value is smaller than the first popped operand, then this function will return a value of 1, otherwise it will return a value of 0. If both operands are images, then all the pixels will be compared with their counterparts in the other image. If only one operand is an image, then all the pixels will be compared with the single value (number) of the other operand. Finally if both are numbers, then the output is also just one number (0 or 1). When the output is not a single number, it will be stored as an unsigned char type.

le

Less or equal: similar to lt (‘less than’ operator), but returning 1 when the second popped operand is smaller or equal to the first.

gt

Greater than: similar to lt (‘less than’ operator), but returning 1 when the second popped operand is greater than the first.

ge

Greater or equal: similar to lt (‘less than’ operator), but returning 1 when the second popped operand is larger or equal to the first.

eq

Equality: similar to lt (‘less than’ operator), but returning 1 when the two popped operands are equal (to double precision floating point accuracy).

ne

Non-Equality: similar to lt (‘less than’ operator), but returning 1 when the two popped operands are not equal (to double precision floating point accuracy).

and

Logical AND: returns 1 if both operands have a non-zero value and 0 otherwise (when at least one of them is zero). Both operands have to be the same kind: either both images or both numbers.

or

Logical OR: returns 1 if either one of the operands is non-zero and 0 only when both operands are zero. Both operands have to be the same kind: either both images or both numbers.

not

Logical NOT: returns 1 when the operand is zero and 0 when the operand is non-zero. The operand can be an image or a number; for an image, it is applied to each pixel separately.

isblank

Test for a blank value (see Blank pixels). In essence, this is very similar to the conditional operators: the output is either 1 or 0 (see the ‘less than’ operator above). The difference is that it only needs one operand. Because of the definition of a blank pixel, a blank value is not even equal to itself, so you cannot use the equal operator above to select blank pixels. See the “Blank pixels” box below for more on Blank pixels in Arithmetic.

where

Change the input (pixel) value where/if a certain condition holds. The conditional operators above can be used to define the condition. Three operands are required for where. The input format is demonstrated in this simplified example:

$ astarithmetic modify.fits binary.fits if-true.fits where

The value of any pixel in modify.fits that corresponds to a non-zero and non-blank pixel of binary.fits will be changed to the value of the same pixel in if-true.fits (this may also be a number). The 3rd and 2nd popped operands (modify.fits and binary.fits respectively, see Reverse polish notation) have to have the same dimensions/size. if-true.fits can be either a number, or have the same dimension/size as the other two.

The 2nd popped operand (binary.fits) has to have uint8 (or unsigned char in standard C) type (see Numeric data types). It is treated as a binary dataset (with only two values: zero and non-zero, hence the name binary.fits in this example). However, commonly you won’t be dealing with an actual FITS file of a condition/binary image. You will probably define the condition in the same run based on some other reference image and use the conditional and logical operators above to make a true/false (or one/zero) image for you internally. For example the case below:

$ astarithmetic in.fits reference.fits 100 gt new.fits where


In the example above, any pixel of in.fits whose counterpart in reference.fits has a value greater than 100 will be replaced with the corresponding pixel in new.fits. Effectively, the reference.fits 100 gt part created the condition/binary image, which was added to the stack (in memory) and later used by where. The command above is thus equivalent to these two commands:

$ astarithmetic reference.fits 100 gt --output=binary.fits
$ astarithmetic in.fits binary.fits new.fits where


Finally, the input operands are read and used independently, so you can use the same file more than once as any of the operands.

When the 1st popped operand to where (if-true.fits) is a single number, it may be a NaN value (or any blank value, depending on its type) like the example below (see Blank pixels). When the number is blank, it will be converted to the blank value of the type of the 3rd popped operand (in.fits). Hence, in the example below, all the pixels of in.fits whose counterparts in reference.fits have a value greater than 100 will become blank in the natural data type of in.fits (even though NaN values are only defined for floating point types).

$ astarithmetic in.fits reference.fits 100 gt nan where

bitand

Bitwise AND operator: only bits with values of 1 in both popped operands will get the value of 1, the rest will be set to 0. For example (assuming numbers can be written as bit strings on the command-line): 00101000 00100010 bitand will give 00100000. Note that the bitwise operators only work on integer type datasets.

bitor

Bitwise inclusive OR operator: The bits where at least one of the two popped operands has a 1 value get a value of 1, the others 0. For example (assuming numbers can be written as bit strings on the command-line): 00101000 00100010 bitor will give 00101010. Note that the bitwise operators only work on integer type datasets.

bitxor

Bitwise exclusive OR operator: A bit will be 1 if it differs between the two popped operands. For example (assuming numbers can be written as bit strings on the command-line): 00101000 00100010 bitxor will give 00001010. Note that the bitwise operators only work on integer type datasets.

lshift

Bitwise left shift operator: shift all the bits of the first operand to the left by a number of times given by the second operand. For example (assuming numbers can be written as bit strings on the command-line): 00101000 2 lshift will give 10100000. This is equivalent to multiplication by 4. Note that the bitwise operators only work on integer type datasets.

rshift

Bitwise right shift operator: shift all the bits of the first operand to the right by a number of times given by the second operand. For example (assuming numbers can be written as bit strings on the command-line): 00101000 2 rshift will give 00001010. Note that the bitwise operators only work on integer type datasets.

bitnot

Bitwise not (more formally known as one’s complement) operator: flip all the bits of the popped operand (note that this is the only unary, or single operand, bitwise operator). In other words, any bit with a value of 0 is changed to 1 and vice-versa. For example (assuming numbers can be written as bit strings on the command-line): 00101000 bitnot will give 11010111. Note that the bitwise operators only work on integer type datasets/numbers.

uint8

Convert the type of the popped operand to 8-bit unsigned integer type (see Numeric data types). The internal conversion of C will be used.

int8

Convert the type of the popped operand to 8-bit signed integer type (see Numeric data types). The internal conversion of C will be used.

uint16

Convert the type of the popped operand to 16-bit unsigned integer type (see Numeric data types). The internal conversion of C will be used.

int16

Convert the type of the popped operand to 16-bit signed integer (see Numeric data types). The internal conversion of C will be used.

uint32

Convert the type of the popped operand to 32-bit unsigned integer type (see Numeric data types). The internal conversion of C will be used.

int32

Convert the type of the popped operand to 32-bit signed integer type (see Numeric data types). The internal conversion of C will be used.

uint64

Convert the type of the popped operand to 64-bit unsigned integer (see Numeric data types). The internal conversion of C will be used.

float32

Convert the type of the popped operand to 32-bit (single precision) floating point (see Numeric data types). The internal conversion of C will be used.

float64

Convert the type of the popped operand to 64-bit (double precision) floating point (see Numeric data types). The internal conversion of C will be used.

size

Size of the dataset along a given FITS/Fortran dimension (counting from 1). The desired dimension should be the first popped operand and the dataset must be the second popped operand. The output will be a single unsigned integer (dimensions cannot be negative). For example, the following command will produce the size of the first extension/HDU (the default HDU) of a.fits along the second FITS axis.

$ astarithmetic a.fits 2 size

set-AAA

Set the characters after the dash (AAA in the case shown here) as a name for the first popped operand on the stack. The named dataset will be freed from memory as soon as it is no longer needed, or if the name is reset to refer to another dataset later in the command. This operator thus enables re-usability of a dataset without having to re-read it from a file every time it is necessary during a process. When a dataset is necessary more than once, this operator can thus help simplify reading/writing on the command-line (thus avoiding potential bugs), while also speeding up the processing.

Like all operators, this operator pops the top operand off of the main processing stack, but unlike other operators, it won’t add anything back to the stack immediately. It will keep the popped dataset in memory through a separate list of named datasets (not on the main stack). That list will be used to add/copy any requested dataset to the main processing stack when the name is called.

The name to give the popped dataset is part of the operator’s name. For example the set-a operator of the command below, gives the name “a” to the contents of image.fits. This name is then used instead of the actual filename to multiply the dataset by two.

$ astarithmetic image.fits set-a a 2 x


The name can be any string, but avoid strings ending with standard filename suffixes (for example .fits).

One example of the usefulness of this operator is in the where operator. For example, let’s assume you want to mask all pixels larger than 5 in image.fits (extension number 1) with a NaN value. Without setting a name for the dataset, you have to read the file twice in a command like this:

$ astarithmetic image.fits image.fits 5 gt nan where -g1

But with this operator you can simply give image.fits the name i and simplify the command above to the more readable one below (which greatly helps when the filename is long):

$ astarithmetic image.fits set-i   i i 5 gt nan where
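To make the interplay of the main stack and the separate list of named datasets more concrete, here is a minimal Python sketch of a reverse polish evaluator. It is only an illustration under simplifying assumptions: images are flat lists of floats, `files` is a hypothetical mapping of file names to pixel values, and only gt, where (with a single number as the value to write) and set- are supported. It is not how Arithmetic is implemented.

```python
import math

def arithmetic(tokens, files):
    """Evaluate a tiny subset of Arithmetic-like reverse polish notation.

    'files' maps hypothetical file names to flat lists of pixel values.
    """
    stack, named = [], {}
    for tok in tokens:
        if tok.startswith("set-"):
            # Pop the top operand and keep it under the given name;
            # nothing is pushed back onto the main stack.
            named[tok[len("set-"):]] = stack.pop()
        elif tok in named:
            stack.append(named[tok])
        elif tok in files:
            stack.append(files[tok])
        elif tok == "gt":
            # The first popped operand is the right one, the second the left.
            right, left = stack.pop(), stack.pop()
            stack.append([1 if p > right else 0 for p in left])
        elif tok == "where":
            if_true, cond, modify = stack.pop(), stack.pop(), stack.pop()
            stack.append([if_true if c else m for m, c in zip(modify, cond)])
        else:
            stack.append(float(tok))   # a plain number (also reads 'nan')
    return stack.pop()

files = {"image.fits": [1.0, 7.0, 3.0, 9.0]}
masked = arithmetic("image.fits set-i i i 5 gt nan where".split(), files)
print(masked)   # [1.0, nan, 3.0, nan]
```

The file is looked up only once; both later uses of `i` come from the list of named datasets, which mirrors the behavior described for set-AAA above.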

tofile-AAA

Write the top operand on the operands stack into a file called AAA (can be any FITS file name) without changing the operands stack. If you don’t need the dataset any more and would like to free it, see the tofilefree operator below.

By default, any file that is given to this operator is deleted before Arithmetic actually starts working on the input datasets. The deletion can be deactivated with the --dontdelete option (as in all Gnuastro programs, see Input/Output options). If the same FITS file is given to this operator multiple times, it will contain multiple extensions (in the same order that it was called).

For example the operator tofile-check.fits will write the top operand to check.fits. Since it doesn’t modify the operands stack, this operator is very convenient when you want to debug, or understand, a string of operators and operands given to Arithmetic: simply put tofile-AAA anywhere in the process to see what is happening behind the scenes without modifying the overall process.

tofilefree-AAA

Similar to the tofile operator, with the only difference that the dataset that is written to a file is popped from the operand stack and freed from memory (cannot be used any more).

Blank pixels in Arithmetic: Blank pixels in the image (see Blank pixels) will be stored based on the data type. When the input is floating point type, blank values are NaN. One aspect of NaN values is that by definition they will fail on any comparison. Hence both equal and not-equal operators will fail when both their operands are NaN! Therefore, the only way to guarantee selection of blank pixels is through the isblank operator explained above.

One way you can exploit this property of the NaN value to your advantage is when you want a fully zero-valued image (even over the blank pixels) based on an already existing image (with same size and world coordinate system settings). The following command will produce this for you:

$ astarithmetic input.fits nan eq --output=all-zeros.fits

Note that on the command-line you can write NaN in any case (for example NaN, or NAN are also acceptable). Reading NaN as a floating point number in Gnuastro isn’t case-sensitive.

#### 6.2.3 Invoking Arithmetic

Arithmetic will do pixel to pixel arithmetic operations on the individual pixels of input data and/or numbers. For the full list of operators with explanations, please see Arithmetic operators. Any operand that only has a single element (number, or single pixel FITS image) will be read as a number, the rest of the inputs must have the same dimensions. The general template is:

$ astarithmetic [OPTION...] ASTRdata1 [ASTRdata2] OPERATOR ...


One line examples:

## Calculate (10.32-3.84)^2.7 quietly (will just print 155.329):
$ astarithmetic -q 10.32 3.84 - 2.7 pow

## Inverse the input image (1/pixel):
$ astarithmetic 1 image.fits / --out=inverse.fits

## Multiply each pixel in image by -1:
$ astarithmetic image.fits -1 x --out=negative.fits

## Subtract extension 4 from extension 1 (counting from zero):
$ astarithmetic image.fits image.fits - --out=skysub.fits           \
                --hdu=1 --hdu=4

## Add two images, then divide them by 2 (2 is read as floating point):
## Note that without the '.0', the '2' will be read/used as an integer.
$ astarithmetic image1.fits image2.fits + 2.0 / --out=average.fits

## Use Arithmetic's average operator:
$ astarithmetic image1.fits image2.fits average --out=average.fits

## Calculate the median of three images in three separate extensions:
$ astarithmetic img1.fits img2.fits img3.fits median \
                -h0 -h1 -h2 --out=median.fits

Arithmetic’s notation for giving operands to operators is fully described in Reverse polish notation. The output dataset is the last remaining operand on the stack. When the output dataset is a single number, it will be printed on the command-line. When the output is an array, it will be stored as a file. The name of the final file can be specified with the --output option, but if it is not given, Arithmetic will use “automatic output” on the name of the first FITS image encountered to generate an output file name, see Automatic output. By default, if the output file already exists, it will be deleted before Arithmetic starts operation. However, this can be disabled with the --dontdelete option (see below). At any point during Arithmetic’s operation, you can also write the top operand on the stack to a file, using the tofile or tofilefree operators, see Arithmetic operators.

By default, the world coordinate system (WCS) information of the output dataset will be taken from the first input image (that contains a WCS) on the command-line. This can be modified with the --wcsfile and --wcshdu options described below. When the --quiet option isn’t given, the name and extension of the dataset used for the output’s WCS is printed on the command-line.

Through operators like those starting with collapse-, the dimensionality of the inputs may not be the same as the outputs. By default, when the output is 1D, Arithmetic will write it as a table, not an image/array. The format of the output table (plain text or FITS ASCII or binary) can be set with the --tableformat option (see Input/Output options). You can disable this feature (write 1D arrays as FITS images/arrays, or to the standard output) with the --onedasimage or --onedonstdout options.

See Common options for a review of the options in all Gnuastro programs. Arithmetic just redefines the --hdu and --dontdelete options as explained below.
-h INT/STR
--hdu INT/STR

The header data unit of the input FITS images, see Input/Output options. Unlike most options in Gnuastro (which will ultimately only have one value for this option), Arithmetic allows --hdu to be called multiple times and the value of each invocation will be stored separately (for the unlimited number of input images you would like to use). Recall that for other programs this (common) option only takes a single value. So in other programs, if you specify it multiple times on the command-line, only the last value will be used and in the configuration files, it will be ignored if it already has a value.

The order of the values to --hdu has to be in the same order as input FITS images. Options are first read from the command-line (from left to right), then top-down in each configuration file, see Configuration file precedence.

If the number of HDUs is less than the number of input images, Arithmetic will abort and notify you. However, if there are more HDUs than FITS images, there is no problem: they will be used in the given order (every time a FITS image comes up on the stack) and the extra HDUs will be ignored in the end. So there is no problem with having extra HDUs in the configuration files and by default several HDUs with a value of 0 are kept in the system-wide configuration file when you install Gnuastro.

-g INT/STR
--globalhdu INT/STR

Use the value to this option as the HDU of all input FITS files. This option is very convenient when you have many input files and the dataset of interest is in the same HDU of all the files. When this option is called, any values given to the --hdu option (explained above) are ignored and will not be used.

-w STR
--wcsfile STR

FITS filename containing the WCS structure that must be written to the output. The HDU/extension should be specified with --wcshdu.

When this option is used, the respective WCS will be read before any processing is done on the command-line and directly used in the final output. If the given file doesn’t have any WCS, then the default WCS (first file on the command-line with WCS) will be used in the output.

This option will mostly be used when the default file (first of the set of inputs) is not the one containing your desired WCS. But with this option, you can also use Arithmetic to rewrite/change the WCS of an existing FITS dataset from another file:

$ astarithmetic data.fits --wcsfile=other.fits -ofinal.fits

-W STR
--wcshdu STR

HDU/extension to read the WCS within the file given to --wcsfile. For more, see the description of --wcsfile.

-O
--onedasimage

When the final dataset to write as output only has one dimension, write it as a FITS image/array. By default, if the output is 1D, it will be written as a table, see above.

-s
--onedonstdout

When the final dataset to write as output only has one dimension, print it on the standard output, not in a file. By default, if the output is 1D, it will be written as a table, see above.

-D
--dontdelete

Don’t delete the output file, or files given to the tofile or tofilefree operators, if they already exist. Instead append the desired datasets to the extensions that already exist in the respective file. Note it doesn’t matter if the final output file name is given with the --output option, or determined automatically.

Arithmetic treats this option differently from its default operation in other Gnuastro programs (see Input/Output options). If the output file exists, when other Gnuastro programs are called with --dontdelete, they simply complain and abort. But when Arithmetic is called with --dontdelete, it will append the dataset(s) to the existing extension(s) in the file.
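The pairing of input images with the values given to --hdu and --globalhdu (described in the option list above) can be summarized with a short Python sketch. The function name `assign_hdus` and the rule that any argument ending in ".fits" is an image are assumptions made only for this illustration; they are not part of Gnuastro.

```python
def assign_hdus(arguments, hdus, globalhdu=None):
    """Pair each FITS file name with an HDU, in command-line order."""
    # Assumption of this sketch: anything ending in '.fits' is an image.
    images = [a for a in arguments if a.endswith(".fits")]
    if globalhdu is not None:
        # --globalhdu overrides every value given to --hdu.
        return [(img, globalhdu) for img in images]
    if len(hdus) < len(images):
        raise ValueError("fewer HDUs than input images")
    # zip() stops at the shorter list, so extra HDUs are simply ignored.
    return list(zip(images, hdus))

# Two images, three HDU values: the extra '0' is ignored.
pairs = assign_hdus(["image.fits", "image.fits", "-"], ["1", "4", "0"])
print(pairs)   # [('image.fits', '1'), ('image.fits', '4')]
```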

Arithmetic accepts two kinds of input: images and numbers. Images are considered to be any of the inputs that is a file name of a recognized type (see Arguments) and has more than one element/pixel. Numbers on the command-line will be read into the smallest type (see Numeric data types) that can store them, so -2 will be read as a char type (which is signed on most systems and can thus keep negative values), 2500 will be read as an unsigned short (all positive numbers will be read as unsigned), while 3.1415926535897 will be read as a double and 3.14 will be read as a float. To force a number to be read as float, put a . after it (possibly followed by a zero for easier readability), or add an f after it. Hence while 5 will be read as an integer, 5., 5.0 or 5f will be added to the stack as float (see Reverse polish notation).
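The rule for reading numbers can be sketched in a few lines of Python. This is only an illustration of the rule as stated above: the function name `smallest_type` is hypothetical, the type names follow the conversion operators (int8, uint16 and so on), and the choice between single and double precision for floating point numbers is deliberately not modelled.

```python
def smallest_type(token):
    """Name the narrowest type that can hold a command-line number."""
    if token.endswith("f") or "." in token:
        # float32 versus float64 is not modelled in this sketch.
        return "floating point"
    value = int(token)
    if value >= 0:
        # Positive numbers are read as unsigned.
        for bits in (8, 16, 32, 64):
            if value < 2 ** bits:
                return "uint%d" % bits
    else:
        for bits in (8, 16, 32, 64):
            if value >= -(2 ** (bits - 1)):
                return "int%d" % bits
    raise ValueError("integer does not fit in 64 bits")

for token in ("-2", "2500", "5", "5.", "5f"):
    print(token, smallest_type(token))
```

With these inputs, -2 maps to int8 (a signed char), 2500 to uint16 (an unsigned short), 5 to uint8, and both 5. and 5f to floating point.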

Unless otherwise stated (in Arithmetic operators), the operators can deal with multiple numeric data types (see Numeric data types). For example in “a.fits b.fits +”, the image types can be long and float. In such cases, C’s internal type conversion will be used. The output type will be set to the higher-ranking type of the two inputs. Unsigned integer types have smaller ranking than their signed counterparts and floating point types have higher ranking than the integer types. So the internal C type conversions done in the example above are equivalent to this piece of C:

size_t i;
long a[100];
float b[100], out[100];
for(i=0;i<100;++i) out[i]=a[i]+b[i];


Relying on the default C type conversion significantly speeds up the processing and also requires less RAM (when using very large images).

Some operators can only work on integer types (of any length, for example the bitwise operators) while others only work on floating point types (currently only the pow operator). In such cases, if the operand type(s) are different, an error will be printed. Arithmetic also comes with internal type conversion operators which you can use to convert the data into the appropriate type, see Arithmetic operators.

The hyphen (-) can be used both to specify options (see Options) and also to specify a negative number which might be necessary in your arithmetic. In order to enable you to do this, Arithmetic will first parse all the input strings and if the first character after a hyphen is a digit, then that hyphen is temporarily replaced by the vertical tab character, which is not commonly used. The options are then parsed, so these strings are not mistaken for options. Afterwards, the remaining arguments are parsed and any vertical tabs are replaced back with a hyphen so they can be read as negative numbers. Therefore, as long as the names of the files you want to work on don’t start with a vertical tab followed by a digit, there is no problem. An important consequence of this implementation is that you should not write negative fractions like this: -.3; instead write them as -0.3.
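The two passes described above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that only a hyphen at the very start of an argument is considered; it is meant to show why -.3 is not protected, not to reproduce Arithmetic’s actual parser.

```python
VERTICAL_TAB = "\v"

def protect_negative_numbers(arguments):
    """Hide a leading hyphen that is directly followed by a digit."""
    return [VERTICAL_TAB + a[1:]
            if len(a) > 1 and a[0] == "-" and a[1].isdigit() else a
            for a in arguments]

def restore_negative_numbers(arguments):
    """Put the hyphen back once option parsing is finished."""
    return [("-" + a[1:]) if a.startswith(VERTICAL_TAB) else a
            for a in arguments]

args = ["image.fits", "-1", "x", "-.3", "-q"]
hidden = protect_negative_numbers(args)
# '-1' is hidden from the option parser, but '-.3' still starts with a
# hyphen (the next character is not a digit), so it looks like an option.
print(hidden)
print(restore_negative_numbers(hidden) == args)
```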

Without any images, Arithmetic will act like a simple calculator and print the resulting output number on the standard output like the first example above. If you really want such calculator operations on the command-line, AWK (GNU AWK is the most common implementation) is much faster, easier and much more powerful. For example, the numerical one-line example above can be done with the following command. In general AWK is a fantastic tool and GNU AWK has a wonderful manual (https://www.gnu.org/software/gawk/manual/). So if you often confront situations like this, or have to work with large text tables/catalogs, be sure to check out AWK and simplify your life.

$ echo "" | awk '{print (10.32-3.84)^2.7}'
155.329

### 6.3 Convolve

On an image, convolution can be thought of as a process to blur or remove the contrast in an image. If you are already familiar with the concept and just want to run Convolve, you can jump to Convolution kernel and Invoking Convolve and skip the lengthy introduction on the basic definitions and concepts of convolution.

There are generally two methods to convolve an image. The first and more intuitive one is in the “spatial domain” or using the actual image pixel values, see Spatial domain convolution. The second method is when we manipulate the “frequency domain”, or work on the magnitudes of the different frequencies that constitute the image, see Frequency domain and Fourier operations. Understanding convolution in the spatial domain is more intuitive and thus recommended if you are just starting to learn about convolution. Getting a good grasp of the frequency domain is a little more involved and needs some concentration and some mathematical proofs. Its reward, however, is a faster operation and more importantly a very fundamental understanding of this very important operation.

Convolution of an image will generally result in blurring the image because it mixes pixel values. In other words, if the image has sharp differences in neighboring pixel values, those sharp differences will become smoother. This has very good consequences in detection of signal in noise for example. In an actual observed image, the variation in neighboring pixel values due to noise can be very high. But after convolution, those variations will decrease and we have a better hope in detecting the possible underlying signal. Another case where convolution is extensively used is in mock images and modeling in general: convolution can be used to simulate the effect of the atmosphere or the optical system on the mock profiles that we create, see Point spread function. Convolution is a very interesting and important topic in any form of signal analysis (including astronomical observations). So we have thoroughly explained the concepts behind it in the following sub-sections.

#### 6.3.1 Spatial domain convolution

The pixels in an input image represent different “spatial” positions, therefore when convolution is done only using the actual input pixel values, we name the process as being done in the “Spatial domain”. In particular this is in contrast to the “frequency domain” that we will discuss later in Frequency domain and Fourier operations. In the spatial domain (and in realistic situations where the image and the convolution kernel don’t extend to infinity), convolution is the process of changing the value of one pixel to the weighted average of all the pixels in its neighborhood.

The ‘neighborhood’ of each pixel (how many pixels in which direction) and the ‘weight’ function (how much each neighboring pixel should contribute depending on its position) are given through a second image which is known as a “kernel”.

#### 6.3.1.1 Convolution process

In convolution, the kernel specifies the weight and positions of the neighbors of each pixel. To find the convolved value of a pixel, the central pixel of the kernel is placed on that pixel. The values of each overlapping pixel in the kernel and image are multiplied by each other and summed for all the kernel pixels. To have one pixel in the center, the sides of the convolution kernel have to be an odd number. This process effectively mixes the pixel values of each pixel with its neighbors, resulting in a blurred image compared to the sharper input image.

Formally, convolution is one kind of linear ‘spatial filtering’ in image processing texts. If we assume that the kernel has $$2a+1$$ and $$2b+1$$ pixels on each side, the convolved value of a pixel placed at $$x$$ and $$y$$ ($$C_{x,y}$$) can be calculated from the neighboring pixel values in the input image ($$I$$) and the kernel ($$K$$) from

$$C_{x,y}=\sum_{s=-a}^{a}\sum_{t=-b}^{b}K_{s,t}\times{}I_{x+s,y+t}.$$

Any pixel coordinate that is outside of the image in the equation above will be considered to be zero. When the kernel is symmetric about its center the blurred image has the same orientation as the original image. However, if the kernel is not symmetric, the image will be affected in the opposite manner; this is a natural consequence of the definition of spatial filtering. In order to avoid this we can rotate the kernel about its center by 180 degrees so the convolved output can have the same original orientation. Technically speaking, the process is known as convolution only if the kernel is flipped. If it isn’t, it is known as correlation.

To be a weighted average, the sum of the weights (the pixels in the kernel) has to be unity. This will have the consequence that the convolved image of an object and the unconvolved object will have the same brightness (see Flux Brightness and magnitude), which is natural, because convolution should not eat up the object photons, it only disperses them.

#### 6.3.1.2 Edges in the spatial domain

In purely ‘linear’ spatial filtering (convolution), there are problems on the edges of the input image. Here we will explain the problem in the spatial domain. For a discussion of this problem from the frequency domain perspective, see Edges in the frequency domain. The problem originates from the fact that on the edges, in practice, the sum of the weights we use on the actual image pixels is not unity. For example, as discussed above, a profile in the center of an image will have the same brightness before and after convolution. However, for a partially imaged profile on the edge of the image, the brightness (sum of its pixel fluxes within the image, see Flux Brightness and magnitude) will not be equal: some of the flux is going to be ‘eaten’ by the edges.

If you ran $ make check on the source files of Gnuastro, you can see this effect by comparing the convolve_frequency.fits with convolve_spatial.fits in the ./tests/ directory. In the spatial domain, by default, no assumption will be made about pixels outside of the image or any blank pixels in the image. The problem explained above will also occur on the sides of blank regions (see Blank pixels). The solution to this edge effect problem is only possible in the spatial domain. For pixels near the edge, we have to abandon the assumption that the sum of the kernel pixels is unity during the convolution process. So taking $$W$$ as the sum of the kernel pixels that overlapped with non-blank and in-image pixels, the equation in Convolution process will become:

$$C_{x,y}= { \sum_{s=-a}^{a}\sum_{t=-b}^{b}K_{s,t}\times{}I_{x+s,y+t} \over W}.$$

In this manner, objects which are near the edges of the image or blank pixels will also have the same brightness (within the image) before and after convolution. This correction is applied by default in Convolve when convolving in the spatial domain. To disable it, you can use the --noedgecorrection option. In the frequency domain, there is no way to avoid this loss of flux near the edges of the image, see Edges in the frequency domain for an interpretation from the frequency domain perspective.
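The two equations of this section can be compared with a small pure-Python sketch. It is an illustration under stated assumptions rather than Convolve’s implementation: images and kernels are lists of lists, blank pixels are represented by NaN and are simply skipped, the kernel is applied exactly as written in the equations (it is not flipped), and the function name `convolve` and its `edge_correction` argument are hypothetical.

```python
import math

def convolve(image, kernel, edge_correction=True):
    """Spatial-domain filtering of a 2D image with an odd-sized kernel."""
    a, b = len(kernel) // 2, len(kernel[0]) // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            total = weight = 0.0
            for s in range(-a, a + 1):
                for t in range(-b, b + 1):
                    i, j = x + s, y + t
                    inside = 0 <= i < rows and 0 <= j < cols
                    if inside and not math.isnan(image[i][j]):
                        total += kernel[s + a][t + b] * image[i][j]
                        weight += kernel[s + a][t + b]   # builds up W
            if edge_correction:
                # Divide by W, the sum of the kernel pixels actually used.
                out[x][y] = total / weight if weight else math.nan
            else:
                out[x][y] = total
    return out

flat = [[1.0] * 3 for _ in range(3)]        # a flat image of ones
box = [[1.0 / 9] * 3 for _ in range(3)]     # kernel whose sum is unity

corrected = convolve(flat, box)
uncorrected = convolve(flat, box, edge_correction=False)
print(round(corrected[0][0], 3), round(uncorrected[0][0], 3))
```

For this flat image, the corner pixel only overlaps with four of the nine kernel pixels, so without the correction it drops to 4/9 of its original value, while dividing by $$W$$ keeps it at 1.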

Note that the edge effect discussed here is different from the one in If convolving afterwards. In making mock images we want to simulate a real observation. In a real observation the images of the galaxies on the sides of the CCD are first blurred by the atmosphere and instrument, then imaged. So light from the parts of a galaxy which are immediately outside the CCD will affect the parts of the galaxy which are covered by the CCD. Therefore in modeling the observation, we have to convolve an image that is larger than the input image by exactly half of the convolution kernel. We can hence conclude that this correction for the edges is only useful when working on actual observed images (where we don’t have any more data on the edges) and not in modeling.


#### 6.3.2 Frequency domain and Fourier operations

Getting a good grip on the frequency domain is usually not an easy job! So we have decided to give the issue a complete review here. Convolution in the frequency domain (see Convolution theorem) relies heavily on the concepts of the Fourier transform (see Fourier transform) and the Fourier series (see Fourier series), so we will be investigating these important operations first. It has become something of a cliché for people to say that the Fourier series “is a way to represent a (wave-like) function as the sum of simple sine waves” (from Wikipedia). However, sines themselves are abstract functions, so this statement really adds no extra layer of physical insight.

Before jumping head-first into the equations and proofs, we will begin with a historical background to see how the importance of frequencies is actually rooted in our ancient desire to see everything in terms of circles. A short review of how the complex plane should be interpreted is then given. Having paved the way with these two basics, we define the Fourier series and subsequently the Fourier transform. The final aim is to explain the discrete Fourier transform; however, some very important concepts need to be solidified first: the Dirac comb, the convolution theorem and the sampling theorem. So each of these topics is explained in its own separate sub-sub-section before going on to the discrete Fourier transform. Finally we revisit (after Edges in the spatial domain) the problem of convolution on the edges, but this time in the frequency domain. Understanding the sampling theorem and the discrete Fourier transform is very important in order to be able to pull out valuable science from the discrete image pixels. Therefore we have included the mathematical proofs and figures so you can have a clear understanding of these very important concepts.

#### 6.3.2.1 Fourier series historical background

Ever since the ancient times, the circle has been (and still is) the simplest shape for abstract comprehension. All you need is a center point and a radius and you are done. All the points on a circle are at a fixed distance from the center. However, the moment you try to connect this elegantly simple and beautiful abstract construct (the circle) with the real world (for example compute its area or its circumference), things become really hard (ideally, impossible) because the irrational number $$\pi$$ gets involved.

The key to understanding the Fourier series (thus the Fourier transform and finally the Discrete Fourier Transform) is our ancient desire to express everything in terms of circles or the most exceptionally simple and elegant abstract human construct. Most people prefer to say the same thing in a more ahistorical manner: to break a function into sines and cosines. As the term “ancient” in the previous sentence implies, Jean-Baptiste Joseph Fourier (1768 – 1830 A.D.) was not the first person to do this. The main reason we know this process by his name today is that he came up with an ingenious method to find the necessary coefficients (radius of) and frequencies (“speed” of rotation on) the circles for any generic (integrable) function.

Figure 6.1: Epicycles and the Fourier series. Left: A demonstration of Mercury’s epicycles relative to the “center of the world” by Qutb al-Din al-Shirazi (1236 – 1311 A.D.) retrieved from Wikipedia. Middle and Right: How adding more epicycles (or terms in the Fourier series) will approximate functions. The right animation is also available.

Like most aspects of mathematics, this process of interpreting everything in terms of circles began for astronomical purposes: astronomers noticed that the orbits of Mars and the other outer planets did not appear to be simple circles (as everything should have been in the heavens). At some point during their orbit, the revolution of these planets would become slower, stop, go back a little (in what is known as the retrograde motion) and then continue going forward again.

The correction proposed by Ptolemy (90 – 168 A.D.) was the most agreed upon. He put the planets on Epicycles or circles whose center itself rotates on a circle whose center is the earth. Eventually, as observations became more and more precise, it was necessary to add more and more epicycles in order to explain the complex motions of the planets102. Figure 6.1(Left) shows an example depiction of the epicycles of Mercury in the late 13th century.

Of course we now know that if they had abdicated the Earth from its throne in the center of the heavens and allowed the Sun to take its place, everything would become much simpler and true. But there wasn’t enough observational evidence for changing the “professional consensus” of the time to this radical view suggested by a small minority103. So the pre-Galilean astronomers chose to keep Earth in the center and find a correction to the models (while keeping the heavens a purely “circular” order).

The main reason we are giving this historical background, which might appear off topic, is to give historical evidence that while such “approximations” do work and are very useful for pragmatic reasons (like measuring the calendar from the movement of astronomical bodies), they offer no physical insight. The astronomers who were involved with the Ptolemaic world view had to add a huge number of epicycles during the centuries after Ptolemy in order to explain more accurate observations. Finally the death knell of this world-view was Galileo’s observations with his new instrument (the telescope). So the physical insight, which is what Astronomers and Physicists are interested in (as opposed to Mathematicians and Engineers who just like proving and optimizing or calculating!) comes from being creative and not limiting ourselves to such approximations, even when they work.

#### 6.3.2.2 Circles and the complex plane

Before going onto the derivation, it is also useful to review how the complex numbers and their plane relate to the circles we talked about above. The two schematics in the middle and right of Figure 6.1 show how a 1D function of time can be made using the 2D real and imaginary surface. Seeing the animation in Wikipedia will really help in understanding this important concept. At each point in time, we take the vertical coordinate of the point and use it to find the value of the function at that point in time. Figure 6.2 shows this relation with the axes marked.

Leonhard Euler104 (1707 – 1783 A.D.) showed that the complex exponential ($$e^{iv}$$ where $$v$$ is real) is periodic and can be written as: $$e^{iv}=\cos{v}+i\sin{v}$$. Therefore $$e^{i(v+2\pi)}=e^{iv}$$. Later, Caspar Wessel (mathematician and cartographer 1745 – 1818 A.D.) showed how complex numbers can be displayed as vectors on a plane. Euler’s identity might seem counter-intuitive at first, so we will try to explain it geometrically (for deeper physical insight). On the real-imaginary 2D plane (like the left hand plot in each box of Figure 6.2), multiplying a number by $$i$$ can be interpreted as rotating the point by $$90$$ degrees (for example the value $$3$$ on the real axis becomes $$3i$$ on the imaginary axis). On the other hand, $$e\equiv\lim_{n\rightarrow\infty}(1+{1\over n})^n$$, therefore, defining $$m\equiv nu$$, we get:

$$e^{u}=\lim_{n\rightarrow\infty}\left(1+{1\over n}\right)^{nu} =\lim_{n\rightarrow\infty}\left(1+{u\over nu}\right)^{nu} =\lim_{m\rightarrow\infty}\left(1+{u\over m}\right)^{m}$$

Taking $$u\equiv iv$$ the result can be written as a generic complex number (a function of $$v$$):

$$e^{iv}=\lim_{m\rightarrow\infty}\left(1+i{v\over m}\right)^{m}=a(v)+ib(v)$$

For $$v=\pi$$, a nice geometric animation of going to the limit can be seen on Wikipedia. We see that in the limit $$a(\pi)=-1$$, while $$b(\pi)=0$$, which gives the famous $$e^{i\pi}=-1$$ equation. The final value is the real number $$-1$$, however the distance of the polygon points traversed as $$m\rightarrow\infty$$ is half the circumference of a unit circle or $$\pi$$, showing how $$v$$ in the equation above can be interpreted as an angle in units of radians and therefore how $$a(v)=\cos(v)$$ and $$b(v)=\sin(v)$$.
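
The convergence of this limit can also be checked numerically. The following is a minimal Python sketch (the function name `euler_limit` is only an illustrative choice, it is not part of Gnuastro) that evaluates the finite product $$(1+iv/m)^m$$ for increasing $$m$$ and prints its distance from $$\cos(v)+i\sin(v)$$:

```python
import math

def euler_limit(v, m):
    """Finite-m approximation (1 + i*v/m)**m of the complex exponential e^{iv}."""
    return (1 + 1j * v / m) ** m

v = math.pi
exact = complex(math.cos(v), math.sin(v))   # Euler's formula: cos(v) + i*sin(v)
for m in (10, 1000, 100000):
    approx = euler_limit(v, m)
    # The distance to the exact value shrinks as m grows.
    print(m, approx, abs(approx - exact))
```

For $$v=\pi$$ the printed values should approach $$-1$$ on the real axis, with an imaginary part that tends to zero.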

Since $$e^{iv}$$ is periodic (let’s assume with a period of $$T$$), it is more clear to write it as $$v\equiv{2{\pi}n\over T}t$$ (where $$n$$ is an integer), so $$e^{iv}=e^{i{2{\pi}n\over T}t}$$. The advantage of this notation is that the period ($$T$$) is clearly visible and the frequency ($$2{\pi}n \over T$$, in units of 1/cycle) is defined through the integer $$n$$. In this notation, $$t$$ is in units of “cycle”s.

As we see from the examples in Figure 6.1 and Figure 6.2, for each constituting frequency, we need a respective ‘magnitude’ or the radius of the circle in order to accurately approximate the desired 1D function. The concepts of “period” and “frequency” are relatively easy to grasp when using temporal units like time because this is how we define them in every-day life. However, in an image (astronomical data), we are dealing with spatial units like distance. Therefore, by one “period” we mean the distance after which the signal repeats itself and frequency is defined as the inverse of that spatial “period”. The complex circle of Figure 6.2 can be thought of as the Moon rotating about Earth which is rotating around the Sun; so the “Real (signal)” axis shows the Moon’s position as seen by a distant observer on the Sun as time goes by. Because of the scalar (not having any direction or vector) nature of time, Figure 6.2 is easier to understand in units of time. When thinking about spatial units, mentally replace the “Time (sec)” axis with “Distance (meters)”. Because length has direction and is a vector, visualizing the rotation of the imaginary circle and the advance along the “Distance (meters)” axis is not as simple as temporal units like time.

Figure 6.2: Relation between the real (signal), imaginary ($$i\equiv\sqrt{-1}$$) and time axes at two snapshots of time.

#### 6.3.2.3 Fourier series

In astronomical images, our variable (brightness, or number of photo-electrons, or signal to be more generic) is recorded over the 2D spatial surface of a camera pixel. However to make things easier to understand, here we will assume that the signal is recorded in 1D (assume one row of the 2D image pixels). Also for this section and the next (Fourier transform) we will be talking about the signal before it is digitized or pixelated. Let’s assume that we have the continuous function $$f(l)$$ which is integrable in the interval $$[l_0, l_0+L]$$ (always true in practical cases like images). Take $$l_0$$ as the position of the first pixel in the assumed row of the image and $$L$$ as the width of the image along that row. The units of $$l_0$$ and $$L$$ can be in any spatial units (for example meters) or an angular unit (like radians) multiplied by a fixed distance which is more common.

To approximate $$f(l)$$ over this interval, we need to find a set of frequencies and their corresponding ‘magnitude’s (see Circles and the complex plane). Therefore our aim is to show $$f(l)$$ as the following sum of periodic functions:

$$f(l)=\displaystyle\sum_{n=-\infty}^{\infty}c_ne^{i{2{\pi}n\over L}l}$$

Note that the different frequencies ($$2{\pi}n/L$$, in units of cycles per meter for example) are not arbitrary. They are all integer multiples of the fundamental frequency of $$\omega_0=2\pi/L$$. Recall that $$L$$ was the length of the signal we want to model. Therefore, we see that the smallest possible frequency (or the frequency resolution) in the end depends on the length over which we observed the signal, or $$L$$. In the case of each dimension on an image, this is the size of the image in the respective dimension. The frequencies have been defined in this “harmonic” fashion to ensure that the final sum is periodic outside of the $$[l_0, l_0+L]$$ interval too. At this point, you might be thinking that the sky is not periodic with the same period as my camera’s view angle. You are absolutely right! The important thing is that since your camera’s observed region is the only region we are “observing” and will be using, the rest of the sky is irrelevant; so we can safely assume the sky is periodic outside of it. However, this working assumption will haunt us later in Edges in the frequency domain.

The frequencies are thus determined by definition. So all we need to do is to find the coefficients ($$c_n$$), or magnitudes, or radii of the circles for each frequency which is identified with the integer $$n$$. Fourier’s approach was to multiply both sides with a fixed term:

$$f(l)e^{-i{2{\pi}m\over L}l}=\displaystyle\sum_{n=-\infty}^{\infty}c_ne^{i{2{\pi}(n-m)\over L}l}$$

where $$m>0$$105. We can then integrate both sides over the observation period:

$$\int_{l_0}^{l_0+L}f(l)e^{-i{2{\pi}m\over L}l}dl =\int_{l_0}^{l_0+L}\displaystyle\sum_{n=-\infty}^{\infty}c_ne^{i{2{\pi}(n-m)\over L}l}dl=\displaystyle\sum_{n=-\infty}^{\infty}c_n\int_{l_0}^{l_0+L}e^{i{2{\pi}(n-m)\over L}l}dl$$

Here $$n$$ and $$m$$ are both integers, so $$k{\equiv}n-m$$ is also an integer. Also, we know that a complex exponential is periodic so after one period ($$L$$) it comes back to its starting point. Therefore $$\int_{l_0}^{l_0+L}e^{i{2{\pi}k\over L}l}dl=0$$ for any integer $$k\neq0$$. However, when $$k=0$$, this integral becomes: $$\int_{l_0}^{l_0+L}e^0dl=\int_{l_0}^{l_0+L}dl=L$$. Hence since the integral will be zero for all $$n{\neq}m$$, we get:

$$\displaystyle\sum_{n=-\infty}^{\infty}c_n\int_{l_0}^{l_0+L}e^{i{2{\pi}(n-m)\over L}l}dl=Lc_m$$

The origin of the axis is fundamentally an arbitrary position. So let’s set it to the start of the image such that $$l_0=0$$. So we can find the “magnitude” of the frequency $$2{\pi}m/L$$ within $$f(l)$$ through the relation:

$$c_m={1\over L}\int_{0}^{L}f(l)e^{-i{2{\pi}m\over L}l}dl$$
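
To make this relation more tangible, here is a minimal Python sketch that approximates the integral for $$c_m$$ with a simple Riemann sum. The test signal, the width `L = 5.0` and the names `fourier_coefficient` and `partial_sum` are all hypothetical choices for illustration. Since the test signal is $$3+2\cos(2{\pi}l/L)$$, and $$2\cos\theta=e^{i\theta}+e^{-i\theta}$$, we expect $$c_0=3$$, $$c_1=c_{-1}=1$$ and zero for the other coefficients:

```python
import cmath
import math

def fourier_coefficient(f, m, L, samples=1024):
    """Approximate c_m = (1/L) * integral over [0, L] of f(l)*exp(-i*2*pi*m*l/L)
    with a Riemann sum on `samples` equally spaced points."""
    dl = L / samples
    total = 0j
    for k in range(samples):
        l = k * dl
        total += f(l) * cmath.exp(-2j * math.pi * m * l / L) * dl
    return total / L

L = 5.0  # hypothetical width of the image row (arbitrary spatial units)

def f(l):
    # Test signal: a constant plus one cosine at the fundamental frequency.
    return 3.0 + 2.0 * math.cos(2 * math.pi * l / L)

# Coefficients for n = -2 ... 2.
coeffs = {n: fourier_coefficient(f, n, L) for n in range(-2, 3)}

def partial_sum(l):
    """Synthesize f(l) from the computed coefficients (sum over n = -2 ... 2)."""
    return sum(c * cmath.exp(2j * math.pi * n * l / L) for n, c in coeffs.items()).real

for n in sorted(coeffs):
    print(n, round(coeffs[n].real, 6), round(coeffs[n].imag, 6))
```

Because this particular signal only contains the frequencies $$n=0,\pm1$$, the five-term `partial_sum` already reproduces it; a generic signal would need many more terms.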

#### 6.3.2.4 Fourier transform

In Fourier series, we had to assume that the function is periodic outside of the desired interval with a period of $$L$$. Therefore, assuming that $$L\rightarrow\infty$$ will allow us to work with any function. However, with this approximation, the fundamental frequency ($$\omega_0$$) or the frequency resolution that we discussed in Fourier series will tend to zero: $$\omega_0\rightarrow0$$. In the equation to find $$c_m$$, every $$m$$ represented a frequency (multiple of $$\omega_0$$) and the integration on $$l$$ removes the dependence of the right side of the equation on $$l$$, making it only a function of $$m$$ or frequency. Let’s define the following two variables:

$$\omega{\equiv}m\omega_0={2{\pi}m\over L}$$

$$F(\omega){\equiv}Lc_m$$

The equation to find the coefficients of each frequency in Fourier series thus becomes:

$$F(\omega)=\int_{-\infty}^{\infty}f(l)e^{-i{\omega}l}dl.$$

The function $$F(\omega)$$ is thus the Fourier transform of $$f(l)$$ in the frequency domain. So through this transformation, we can find (analyze) the magnitudes of the constituting frequencies or the value in the frequency space106 of our spatial input function. The great thing is that we can also do the reverse and later synthesize the input function from its Fourier transform. Let’s do it: with the approximations above, multiply the right side of the definition of the Fourier Series (Fourier series) with $$1=L/L=({\omega_0}L)/(2\pi)$$:

$$f(l)={1\over 2\pi}\displaystyle\sum_{n=-\infty}^{\infty}Lc_ne^{{2{\pi}in\over L}l}\omega_0={1\over 2\pi}\displaystyle\sum_{n=-\infty}^{\infty}F(\omega)e^{i{\omega}l}\Delta\omega$$

To find the right most side of this equation, we renamed $$\omega_0$$ as $$\Delta\omega$$ because it was our resolution, $$2{\pi}n/L$$ was written as $$\omega$$ and finally, $$Lc_n$$ was written as $$F(\omega)$$ as we defined above. Now, as $$L\rightarrow\infty$$, $$\Delta\omega\rightarrow0$$ so we can write:

$$f(l)={1\over 2\pi}\int_{-\infty}^{\infty}F(\omega)e^{i{\omega}l}d\omega$$

Together, these two equations provide us with a very powerful set of tools that we can use to process (analyze) and recreate (synthesize) the input signal. Through the first equation, we can break up our input function into its constituent frequencies and analyze it, hence it is also known as analysis. Using the second equation, we can synthesize or make the input function from the known frequencies and their magnitudes. Thus it is known as synthesis. Here, we symbolize the Fourier transform (analysis) and its inverse (synthesis) of a function $$f(l)$$ and its Fourier Transform $$F(\omega)$$ as $${\cal F}[f]$$ and $${\cal F}^{-1}[F]$$.
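
As a rough numerical illustration of the analysis equation, the Python sketch below integrates $$f(l)e^{-i{\omega}l}$$ for a Gaussian input with the midpoint rule and compares the result with the known analytic transform $$\sqrt{2\pi}e^{-\omega^2/2}$$ of $$e^{-l^2/2}$$ in this convention. The integration limits, the number of steps and the function names are assumptions of this sketch, chosen so that the Gaussian is negligible outside the integrated range:

```python
import cmath
import math

def fourier_transform(f, omega, half_width=20.0, steps=4000):
    """Approximate F(omega) = integral of f(l)*exp(-i*omega*l) dl with the
    midpoint rule on [-half_width, half_width] (f must be negligible outside)."""
    dl = 2.0 * half_width / steps
    total = 0j
    for k in range(steps):
        l = -half_width + (k + 0.5) * dl
        total += f(l) * cmath.exp(-1j * omega * l) * dl
    return total

def gaussian(l):
    return math.exp(-0.5 * l * l)

def gaussian_ft(omega):
    # Known analytic transform of exp(-l^2/2) in this convention.
    return math.sqrt(2.0 * math.pi) * math.exp(-0.5 * omega * omega)

for omega in (0.0, 1.0, 2.0):
    print(omega, fourier_transform(gaussian, omega).real, gaussian_ft(omega))
```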

#### 6.3.2.5 Dirac delta and comb

The Dirac $$\delta$$ (delta) function (also known as an impulse) is the way that we convert a continuous function into a discrete one. It is defined to satisfy the following integral:

$$\int_{-\infty}^{\infty}\delta(l)dl=1$$

When integrated with another function, it gives that function’s value at $$l=0$$:

$$\int_{-\infty}^{\infty}f(l)\delta(l)dl=f(0)$$

An impulse positioned at another point (say $$l_0$$) is written as $$\delta(l-l_0)$$:

$$\int_{-\infty}^{\infty}f(l)\delta(l-l_0)dl=f(l_0)$$

The Dirac $$\delta$$ function also operates similarly if we use summations instead of integrals. The Fourier transform of the delta function is:

$${\cal F}[\delta(l)]=\int_{-\infty}^{\infty}\delta(l)e^{-i{\omega}l}dl=e^{-i{\omega}0}=1$$

$${\cal F}[\delta(l-l_0)]=\int_{-\infty}^{\infty}\delta(l-l_0)e^{-i{\omega}l}dl=e^{-i{\omega}l_0}$$

From the definition of the Dirac $$\delta$$ we can also define a Dirac comb ($${\rm III}_P$$) or an impulse train with infinite impulses separated by $$P$$:

$${\rm III}_P(l)\equiv\displaystyle\sum_{k=-\infty}^{\infty}\delta(l-kP)$$

$$P$$ is chosen to represent “pixel width” later in Sampling theorem. Therefore the Dirac comb is periodic with a period of $$P$$. We have intentionally used a different name for the period of the Dirac comb compared to the input signal’s length of observation that we showed with $$L$$ in Fourier series. This difference is highlighted here to avoid confusion later when these two periods are needed together in Discrete Fourier transform. The Fourier transform of the Dirac comb will be necessary in Sampling theorem, so let’s derive it. By its definition, it is periodic, with a period of $$P$$, so the Fourier coefficients of its Fourier Series (Fourier series) can be calculated within one period:

$${\rm III}_P=\displaystyle\sum_{n=-\infty}^{\infty}c_ne^{i{2{\pi}n\over P}l}$$

We can now find the $$c_n$$ from Fourier series:

$$c_n={1\over P}\int_{-P/2}^{P/2}\delta(l)e^{-i{2{\pi}n\over P}l}dl ={1\over P}\quad\quad \rightarrow \quad\quad {\rm III}_P={1\over P}\displaystyle\sum_{n=-\infty}^{\infty}e^{i{2{\pi}n\over P}l}$$

So we can write the Fourier transform of the Dirac comb as:

$${\cal F}[{\rm III}_P]=\int_{-\infty}^{\infty}{\rm III}_Pe^{-i{\omega}l}dl ={1\over P}\displaystyle\sum_{n=-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-i(\omega-{2{\pi}n\over P})l}dl={1\over P}\displaystyle\sum_{n=-\infty}^{\infty}\delta\left(\omega-{2{\pi}n\over P}\right)$$

In the last step, we used the fact that the complex exponential is a periodic function, that $$n$$ is an integer and that as we defined in Fourier transform, $$\omega{\equiv}m\omega_0$$, where $$m$$ was an integer. The integral will be zero for any $$\omega$$ that is not equal to $$2{\pi}n/P$$; a more complete explanation can be seen in Fourier series. Therefore, while in the spatial domain the impulses had a spacing of $$P$$ (meters for example), in the frequency space, the spacing between the different impulses is $$2\pi/P$$ cycles per meter.
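
The Fourier series of the Dirac comb can be explored numerically by truncating the infinite sum. The Python sketch below (with a hypothetical spacing `P = 2.0` and an illustrative function name) evaluates $${1\over P}\sum_{n=-N}^{N}e^{i{2{\pi}n\over P}l}$$: on the position of an impulse every term is 1 and the sum grows as $$(2N+1)/P$$, while between the impulses the terms cancel each other and the sum stays small.

```python
import cmath
import math

def comb_partial_sum(l, P, N):
    """(1/P) * sum over n = -N ... N of exp(i*2*pi*n*l/P): the Fourier series
    of the Dirac comb truncated to 2N+1 terms."""
    total = sum(cmath.exp(2j * math.pi * n * l / P) for n in range(-N, N + 1))
    return total.real / P   # imaginary parts of +n and -n cancel

P = 2.0   # hypothetical impulse spacing
for N in (10, 100):
    # On an impulse (l = 0 or l = 2P) the sum grows with N; halfway between
    # two impulses (l = P/2) it does not.
    print(N, comb_partial_sum(0.0, P, N), comb_partial_sum(2 * P, P, N),
          comb_partial_sum(P / 2, P, N))
```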

#### 6.3.2.6 Convolution theorem

The convolution (shown with the $$\ast$$ operator) of the two functions $$f(l)$$ and $$h(l)$$ is defined as:

$$c(l)\equiv[f{\ast}h](l)=\int_{-\infty}^{\infty}f(\tau)h(l-\tau)d\tau$$

See Convolution process for a more detailed physical (pixel based) interpretation of this definition. The Fourier transform of convolution ($$C(\omega)$$) can be written as:

$$C(\omega)=\int_{-\infty}^{\infty}[f{\ast}h](l)e^{-i{\omega}l}dl= \int_{-\infty}^{\infty}f(\tau)\left[\int_{-\infty}^{\infty}h(l-\tau)e^{-i{\omega}l}dl\right]d\tau$$

To solve the inner integral, let’s define $$s{\equiv}l-\tau$$, so that $$ds=dl$$ and $$l=s+\tau$$ then the inner integral becomes:

$$\int_{-\infty}^{\infty}h(l-\tau)e^{-i{\omega}l}dl= \int_{-\infty}^{\infty}h(s)e^{-i{\omega}(s+\tau)}ds=e^{-i{\omega}\tau}\int_{-\infty}^{\infty}h(s)e^{-i{\omega}s}ds=H(\omega)e^{-i{\omega}\tau}$$

where $$H(\omega)$$ is the Fourier transform of $$h(l)$$. Substituting this result for the inner integral above, we get:

$$C(\omega)=H(\omega)\int_{-\infty}^{\infty}f(\tau)e^{-i{\omega}\tau}d\tau=H(\omega)F(\omega)=F(\omega)H(\omega)$$

where $$F(\omega)$$ is the Fourier transform of $$f(l)$$. So multiplying the Fourier transform of two functions individually, we get the Fourier transform of their convolution. The convolution theorem also proves a relation between the convolutions in the frequency space. Let’s define:

$$D(\omega){\equiv}F(\omega){\ast}H(\omega)$$

Applying the inverse Fourier Transform or synthesis equation (Fourier transform) to both sides and following the same steps above, we get:

$$d(l)=f(l)h(l)$$

Where $$d(l)$$ is the inverse Fourier transform of $$D(\omega)$$. We can therefore re-write the two equations above formally as the convolution theorem:

$${\cal F}[f{\ast}h]={\cal F}[f]{\cal F}[h]$$

$${\cal F}[fh]={\cal F}[f]\ast{\cal F}[h]$$
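
The first relation can be checked numerically on a small discrete example. The Python sketch below is only an illustration under two assumptions that are not part of the continuous derivation above: it uses a direct (slow) discrete Fourier transform with the standard $$2{\pi}ux/X$$ exponent, and a circular convolution in which the indices wrap around. The names `dft`, `idft` and `circular_convolve` and the input values are hypothetical.

```python
import cmath
import math

def dft(values):
    """Direct discrete Fourier transform: F_u = sum_x f_x*exp(-i*2*pi*u*x/X)."""
    X = len(values)
    return [sum(values[x] * cmath.exp(-2j * math.pi * u * x / X) for x in range(X))
            for u in range(X)]

def idft(coeffs):
    """Inverse of dft(): f_x = (1/X) * sum_u F_u*exp(+i*2*pi*u*x/X)."""
    X = len(coeffs)
    return [sum(coeffs[u] * cmath.exp(2j * math.pi * u * x / X) for u in range(X)) / X
            for x in range(X)]

def circular_convolve(f, h):
    """c_x = sum_t f_t * h_(x-t), with the index (x-t) wrapped around."""
    X = len(f)
    return [sum(f[t] * h[(x - t) % X] for t in range(X)) for x in range(X)]

f = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]
h = [0.5, 0.25, 0.0, 0.0, 0.0, 0.25]

direct = circular_convolve(f, h)
# Convolution theorem: multiply the transforms, then transform back.
via_dft = [z.real for z in idft([a * b for a, b in zip(dft(f), dft(h))])]

for a, b in zip(direct, via_dft):
    print(round(a, 6), round(b, 6))
```

The two printed columns should agree to within floating point rounding.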

Besides its usefulness in blurring an image by convolving it with a given kernel, the convolution theorem also enables us to do another very useful operation in data analysis: to match the blur (or PSF) between two images taken with different telescopes/cameras or under different atmospheric conditions. This process is also known as de-convolution. Let’s take $$f(l)$$ as the image with a narrower PSF (less blurry) and $$c(l)$$ as the image with a wider PSF which appears more blurred. Also let’s take $$h(l)$$ to represent the kernel that should be convolved with the sharper image to create the more blurry image. Above, we proved the relation between these three images through the convolution theorem. But there, we assumed that $$f(l)$$ and $$h(l)$$ are known (given) and the convolved image is desired.

In de-convolution, we have $$f(l)$$ –the sharper image– and $$[f{\ast}h](l)$$ –the more blurry image– and we want to find the kernel $$h(l)$$. The solution is a direct result of the convolution theorem:

$${\cal F}[h]={{\cal F}[f{\ast}h]\over {\cal F}[f]} \quad\quad {\rm or} \quad\quad h(l)={\cal F}^{-1}\left[{{\cal F}[f{\ast}h]\over {\cal F}[f]}\right]$$

While this works really nicely, it has two problems:

• If $${\cal F}[f]$$ has any zero values, then the inverse Fourier transform will not be a number!
• If there is significant noise in the image, then the high frequencies of the noise are going to significantly reduce the quality of the final result.

A standard solution to both these problems is the Wiener de-convolution algorithm107.
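
The division above can be sketched on a small, noise-free example. The following Python code is not the Wiener algorithm and not how any Gnuastro program is implemented; it is only a naive illustration, using a direct discrete Fourier transform and circular convolution (both assumptions of the sketch, with hypothetical names and input values). It also shows the first problem: when the transform of the sharper image has a zero, the division is undefined and the function refuses to continue.

```python
import cmath
import math

def dft(values):
    X = len(values)
    return [sum(values[x] * cmath.exp(-2j * math.pi * u * x / X) for x in range(X))
            for u in range(X)]

def idft(coeffs):
    X = len(coeffs)
    return [sum(coeffs[u] * cmath.exp(2j * math.pi * u * x / X) for u in range(X)) / X
            for x in range(X)]

def circular_convolve(f, h):
    X = len(f)
    return [sum(f[t] * h[(x - t) % X] for t in range(X)) for x in range(X)]

def naive_deconvolve(blurry, sharp, floor=1e-12):
    """Estimate the kernel h from blurry = sharp (*) h by dividing the DFTs.
    Raises ValueError when the DFT of `sharp` has a (near-)zero element."""
    B, S = dft(blurry), dft(sharp)
    if any(abs(s) < floor for s in S):
        raise ValueError("DFT of the sharp image has a zero: division undefined")
    return [z.real for z in idft([b / s for b, s in zip(B, S)])]

sharp = [4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]    # DFT is 4+2cos(2*pi*u/8), never zero
kernel = [0.6, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]   # the kernel we pretend not to know
blurry = circular_convolve(sharp, kernel)

recovered = naive_deconvolve(blurry, sharp)
print([round(v, 6) for v in recovered])
```

With noisy inputs the small high-frequency elements of the denominator would amplify the noise, which is the second problem listed above.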

#### 6.3.2.7 Sampling theorem

Our mathematical functions are continuous, however, our data collecting and measuring tools are discrete. Here we want to give a mathematical formulation for digitizing the continuous mathematical functions so that later, we can retrieve the continuous function from the digitized recorded input. Assuming that we have a continuous function $$f(l)$$, then we can define $$f_s(l)$$ as the ‘sampled’ $$f(l)$$ through the Dirac comb (see Dirac delta and comb):

$$f_s(l)=f(l){\rm III}_P=\displaystyle\sum_{n=-\infty}^{\infty}f(l)\delta(l-nP)$$

The discrete data-element $$f_k$$ (for example, a pixel in an image), where $$k$$ is an integer, can thus be represented as:

$$f_k=\int_{-\infty}^{\infty}f(l)\delta(l-kP)dl=f(kP)$$

Note that in practice, our discrete data points are not found in this fashion. Each detector pixel (in an image for example) has an area and averages the signal it receives over that area, not a mathematical point as the Dirac $$\delta$$ function defines. However, as long as the variation in the signal over one detector pixel is not significant, this can be a good approximation. Having put this issue to the side, we can now try to find the relation between the Fourier transforms of the un-sampled $$f(l)$$ and the sampled $$f_s(l)$$. For a more clear notation, let’s define:

$$F_s(\omega)\equiv{\cal F}[f_s]$$

$$D(\omega)\equiv{\cal F}[{\rm III}_P]$$

Then using the Convolution theorem (see Convolution theorem), $$F_s(\omega)$$ can be written as:

$$F_s(\omega)={\cal F}[f(l){\rm III}_P]=F(\omega){\ast}D(\omega)$$

Finally, from the definition of convolution and the Fourier transform of the Dirac comb (see Dirac delta and comb), we get:

$$\eqalign{ F_s(\omega) &= \int_{-\infty}^{\infty}F(\mu)D(\omega-\mu)d\mu \cr &= {1\over P}\displaystyle\sum_{n=-\infty}^{\infty}\int_{-\infty}^{\infty}F(\mu)\delta\left(\omega-\mu-{2{\pi}n\over P}\right)d\mu \cr &= {1\over P}\displaystyle\sum_{n=-\infty}^{\infty}F\left( \omega-{2{\pi}n\over P}\right).\cr }$$

$$F(\omega)$$ was only a simple function, see Figure 6.3(left). However, from the sampled Fourier transform function we see that $$F_s(\omega)$$ is the superposition of infinite copies of $$F(\omega)$$ that have been shifted, see Figure 6.3(right). From the equation, it is clear that the shift in each copy is $$2\pi/P$$.

Figure 6.3: Sampling causes infinite repetition in the frequency domain. FT is an abbreviation for ‘Fourier transform’. $$\omega_m$$ represents the maximum frequency present in the input. $$F(\omega)$$ is only symmetric on both sides of 0 when the input is real (not complex). In general $$F(\omega)$$ is complex and thus cannot be simply plotted like this. Here we have assumed a real Gaussian $$f(t)$$ which has produced a Gaussian $$F(\omega)$$.

The input $$f(l)$$ can have any distribution of frequencies in it. In the example of Figure 6.3(left), the input consisted of a range of frequencies equal to $$\Delta\omega=2\omega_m$$. Fortunately as Figure 6.3(right) shows, the assumed pixel size ($$P$$) we used to sample this hypothetical function was such that $$2\pi/P>\Delta\omega$$. The consequence is that each copy of $$F(\omega)$$ has become completely separate from the surrounding copies. Such a digitized (sampled) data set is thus called over-sampled. When $$2\pi/P=\Delta\omega$$, $$P$$ is just small enough to finely separate even the largest frequencies in the input signal and thus it is known as critically-sampled. Finally if $$2\pi/P<\Delta\omega$$ we are dealing with an under-sampled data set. In an under-sampled data set, the separate copies of $$F(\omega)$$ are going to overlap and this will deprive us of recovering high constituent frequencies of $$f(l)$$. The effects of under-sampling in an image with high rates of change (for example a brick wall imaged from a distance) can clearly be visually seen and is known as aliasing.
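
The three sampling regimes and the effect of aliasing can be sketched in a few lines of Python. The function name `sampling_regime`, the pixel width `P = 1.0` and the two frequencies are hypothetical choices. The second part shows aliasing directly: a cosine whose frequency is shifted by exactly $$2\pi/P$$ (the spacing between the copies of $$F(\omega)$$) gives the same samples as the original cosine, so after sampling the two cannot be told apart.

```python
import math

def sampling_regime(P, omega_m):
    """Classify a pixel spacing P for a signal with maximum frequency omega_m
    by comparing 2*pi/P with the bandwidth 2*omega_m."""
    spacing_of_copies = 2.0 * math.pi / P
    bandwidth = 2.0 * omega_m
    if math.isclose(spacing_of_copies, bandwidth):
        return "critically-sampled"
    return "over-sampled" if spacing_of_copies > bandwidth else "under-sampled"

P = 1.0                                     # hypothetical pixel width
omega_low = 0.4 * math.pi                   # below pi/P
omega_high = omega_low + 2 * math.pi / P    # shifted by one copy spacing

print(sampling_regime(P, omega_low), sampling_regime(P, omega_high))

# Samples of the two cosines on the pixel positions l = k*P.
low = [math.cos(omega_low * k * P) for k in range(20)]
high = [math.cos(omega_high * k * P) for k in range(20)]
print(max(abs(a - b) for a, b in zip(low, high)))   # largest difference between the samples
```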

When the input $$f(l)$$ is composed of a finite range of frequencies, $$f(l)$$ is known as a band-limited function. The example in Figure 6.3(left) was a nice demonstration of such a case: for all $$\omega<-\omega_m$$ or $$\omega>\omega_m$$, we have $$F(\omega)=0$$. Therefore, when the input function is band-limited and our detector’s pixels are placed such that we have critically (or over-) sampled it, then we can exactly reproduce the continuous $$f(l)$$ from the discrete or digitized samples. To do that, we just have to isolate one copy of $$F(\omega)$$ from the infinite copies and take its inverse Fourier transform.
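
Isolating one copy of $$F(\omega)$$ with an ideal rectangular filter and taking the inverse Fourier transform is equivalent to the Whittaker-Shannon (sinc) interpolation formula, $$f(l)=\sum_{k}f(kP)\,{\rm sinc}\left((l-kP)/P\right)$$. This formula is not derived in this section, it is only brought here as a known result to sketch the reconstruction in Python. The test signal $$(\sin l/l)^2$$ is a hypothetical choice: it is band-limited with $$\omega_m=2$$, so a pixel width of `P = 1.0` over-samples it. The infinite sum is truncated, so the reconstruction is only approximate.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x)/(pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def f(l):
    # Band-limited test signal (sin(l)/l)^2: its maximum frequency is omega_m = 2.
    return 1.0 if l == 0 else (math.sin(l) / l) ** 2

def reconstruct(l, P, K=2000):
    """Rebuild f(l) from the samples f(k*P), k = -K ... K, by sinc interpolation
    (the infinite sum is truncated, so the result is approximate)."""
    return sum(f(k * P) * sinc((l - k * P) / P) for k in range(-K, K + 1))

P = 1.0   # 2*pi/P is larger than 2*omega_m = 4, so this is over-sampled
for l in (0.37, 1.5, 2.25):
    # Positions between the samples: true value, then reconstructed value.
    print(l, f(l), reconstruct(l, P))
```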

This ability to exactly reproduce the continuous input from the sampled or digitized data leads us to the sampling theorem which connects the inherent property of the continuous signal (its maximum frequency) to that of the detector (the spacing between its pixels). The sampling theorem states that the full (continuous) signal can be recovered when the pixel size ($$P$$) and the maximum constituent frequency in the signal ($$\omega_m$$) have the following relation108:

$${2\pi\over P}>2\omega_m$$

This relation was first formulated by Harry Nyquist (1889 – 1976 A.D.) in 1928 and formally proved in 1949 by Claude E. Shannon (1916 – 2001 A.D.) in what is now known as the Nyquist-Shannon sampling theorem. In signal processing, the signal is produced (synthesized) by a transmitter and is received and de-coded (analyzed) by a receiver. Therefore producing a band-limited signal is necessary.

In astronomy, we do not produce the shapes of our targets, we are only observers. Galaxies can have any shape and size, therefore ideally, our signal is not band-limited. However, since we are always confined to observing through an aperture, the aperture will cause a point source (for which $$\omega_m=\infty$$) to be spread over several pixels. This spread is quantitatively known as the point spread function or PSF. This spread does blur the image which is undesirable; however, for this analysis it produces the positive outcome that there will be a finite $$\omega_m$$. Though we should caution that any detector will have noise which will add lots of very high frequency (ideally infinite) changes between the pixels. However, the coefficients of those noise frequencies are usually exceedingly small.

#### 6.3.2.8 Discrete Fourier transform

As we have stated several times so far, the input image is a digitized, pixelated or discrete array of values ($$f_s(l)$$, see Sampling theorem). The input is not a continuous function. Also, all our numerical calculations can only be done on a sampled, or discrete Fourier transform. Note that $$F_s(\omega)$$ is not discrete, it is continuous. One way would be to find the analytic $$F_s(\omega)$$, then sample it at any desired “freq-pixel”109 spacing. However, this process would involve two steps of operations and computers in particular are not too good at analytic operations for the first step. So here, we will derive a method to directly find the ‘freq-pixel’ated $$F_s(\omega)$$ from the pixelated $$f_s(l)$$. Let’s start with the definition of the Fourier transform (see Fourier transform):

$$F_s(\omega)=\int_{-\infty}^{\infty}f_s(l)e^{-i{\omega}l}dl$$

From the definition of $$f_s(l)$$ (using $$x$$ instead of $$n$$) we get:

$$\eqalign{ F_s(\omega) &= \displaystyle\sum_{x=-\infty}^{\infty} \int_{-\infty}^{\infty}f(l)\delta(l-xP)e^{-i{\omega}l}dl \cr &= \displaystyle\sum_{x=-\infty}^{\infty} f_xe^{-i{\omega}xP} \cr }$$

Where $$f_x$$ is the value of $$f(l)$$ on the point $$x$$ or the value of the $$x$$th pixel. As shown in Sampling theorem this function is infinitely periodic with a period of $$2\pi/P$$. So all we need is the values within one period: $$0\le\omega<2\pi/P$$, see Figure 6.3. We want $$X$$ samples within this interval, so the frequency difference between each frequency sample or freq-pixel is $$2\pi/(XP)$$. Hence we will evaluate the equation above on the points at:

$$\omega={2{\pi}u\over XP} \quad\quad u = 0, 1, 2, ..., X-1$$

Therefore the value of the freq-pixel $$u$$ in the frequency domain is:

$$F_u=\displaystyle\sum_{x=0}^{X-1} f_xe^{-i{2{\pi}ux\over X}}$$

Therefore, we see that for each freq-pixel in the frequency domain, we are going to need all the pixels in the spatial domain110. If the input (spatial) pixel row is also $$X$$ pixels wide, then we can exactly recover the $$x$$th pixel with the following summation:

$$f_x={1\over X}\displaystyle\sum_{u=0}^{X-1} F_ue^{i{2{\pi}ux\over X}}$$

When the input pixel row (we are still only working on 1D data) has $$X$$ pixels, then it is $$L=XP$$ spatial units wide. $$L$$, or the length of the input data was defined in Fourier series and $$P$$ or the space between the pixels in the input was defined in Dirac delta and comb. As we saw in Sampling theorem, the input (spatial) pixel spacing ($$P$$) specifies the range of frequencies that can be studied and in Fourier series we saw that the length of the (spatial) input, ($$L$$) determines the resolution (or size of the freq-pixels) in our discrete Fourier transformed image. Both result from the fact that the frequency domain is the inverse of the spatial domain.
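
The forward and inverse sums can be written almost literally in Python. The sketch below is a direct (slow, $$X^2$$ operations) evaluation for illustration only, it is not a fast Fourier transform and not Gnuastro's implementation. It assumes the standard convention in which the $$X$$ frequency samples cover one period $$2\pi/P$$, so the exponent is $$2{\pi}ux/X$$. The pixel values and the pixel width `P` are hypothetical.

```python
import cmath
import math

def dft(pixels):
    """F_u = sum_x f_x * exp(-i*2*pi*u*x/X) for u = 0 ... X-1."""
    X = len(pixels)
    return [sum(pixels[x] * cmath.exp(-2j * math.pi * u * x / X) for x in range(X))
            for u in range(X)]

def idft(freq_pixels):
    """f_x = (1/X) * sum_u F_u * exp(+i*2*pi*u*x/X) for x = 0 ... X-1."""
    X = len(freq_pixels)
    return [sum(freq_pixels[u] * cmath.exp(2j * math.pi * u * x / X) for u in range(X)) / X
            for x in range(X)]

row = [2.0, 5.0, 1.0, 0.0, 3.0, 7.0]   # hypothetical row of pixel values
P = 0.5                                # hypothetical pixel width
X = len(row)

F = dft(row)
omegas = [2 * math.pi * u / (X * P) for u in range(X)]  # frequency of each freq-pixel
recovered = [z.real for z in idft(F)]

print(F[0].real)                       # freq-pixel 0 is the sum of all the pixels
print([round(v, 6) for v in recovered])
```

Note how the spacing of `omegas` is set by the full length $$L=XP$$, while their total range is set by the pixel width $$P$$.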

#### 6.3.2.9 Fourier operations in two dimensions

Once all the relations in the previous sections have been clearly understood in one dimension, it is very easy to generalize them to two or even more dimensions since each dimension is by definition independent. Previously we defined $$l$$ as the continuous variable in 1D and the inverse of the period in its direction to be $$\omega$$. Let’s show the second spatial direction with $$m$$ and the inverse of the period in the second dimension with $$\nu$$. The Fourier transform in 2D (see Fourier transform) and its inverse can be written as:

$$F(\omega, \nu)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(l, m)e^{-i({\omega}l+{\nu}m)}dl\,dm$$

$$f(l, m)={1\over 4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(\omega, \nu)e^{i({\omega}l+{\nu}m)}d\omega\,d\nu$$

The 2D Dirac $$\delta(l,m)$$ is non-zero only when $$l=m=0$$. The 2D Dirac comb (or Dirac brush! See Dirac delta and comb) can be written in units of the 2D Dirac $$\delta$$. For most image detectors, the sides of a pixel are equal in both dimensions, so $$P$$ remains unchanged. If a specific device is used which has non-square pixels, then a different value should be used for each dimension.

$${\rm III}_P(l, m)\equiv\displaystyle\sum_{j=-\infty}^{\infty} \displaystyle\sum_{k=-\infty}^{\infty} \delta(l-jP, m-kP)$$

The Two dimensional Sampling theorem (see Sampling theorem) is thus very easily derived as before since the frequencies in each dimension are independent. Let’s take $$\nu_m$$ as the maximum frequency along the second dimension. Therefore the two dimensional sampling theorem says that a 2D band-limited function can be recovered when the following conditions hold111:

$${2\pi\over P} > 2\omega_m \quad\quad\quad {\rm and} \quad\quad\quad {2\pi\over P} > 2\nu_m$$

Finally, let’s represent the pixel counter on the second dimension in the spatial and frequency domains with $$y$$ and $$v$$ respectively. Also let’s assume that the input image has $$Y$$ pixels on the second dimension. Then the two dimensional discrete Fourier transform and its inverse (see Discrete Fourier transform) can be written as:

$$F_{u,v}=\displaystyle\sum_{x=0}^{X-1}\displaystyle\sum_{y=0}^{Y-1} f_{x,y}e^{-i2\pi({ux\over X}+{vy\over Y})}$$

$$f_{x,y}={1\over XY}\displaystyle\sum_{u=0}^{X-1}\displaystyle\sum_{v=0}^{Y-1} F_{u,v}e^{i2\pi({ux\over X}+{vy\over Y})}$$
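
The two dimensional sums can be sketched in the same direct way. The Python code below evaluates them literally on a tiny hypothetical image (indexed as `image[x][y]`), assuming the standard $$2\pi$$ factor in the exponent. It is meant only to show the structure of the sums: for real images a fast Fourier transform would be used instead of this slow evaluation.

```python
import cmath
import math

def dft2(image):
    """F[u][v] = sum over x, y of f[x][y] * exp(-i*2*pi*(u*x/X + v*y/Y))."""
    X, Y = len(image), len(image[0])
    return [[sum(image[x][y] * cmath.exp(-2j * math.pi * (u * x / X + v * y / Y))
                 for x in range(X) for y in range(Y))
             for v in range(Y)] for u in range(X)]

def idft2(freq):
    """f[x][y] = 1/(X*Y) * sum over u, v of F[u][v] * exp(+i*2*pi*(u*x/X + v*y/Y))."""
    X, Y = len(freq), len(freq[0])
    return [[sum(freq[u][v] * cmath.exp(2j * math.pi * (u * x / X + v * y / Y))
                 for u in range(X) for v in range(Y)) / (X * Y)
             for y in range(Y)] for x in range(X)]

image = [[1.0, 2.0, 0.0, 4.0],     # hypothetical 3 by 4 pixel image
         [0.0, 3.0, 5.0, 1.0],
         [2.0, 0.0, 1.0, 6.0]]

F = dft2(image)
recovered = [[z.real for z in row] for row in idft2(F)]

print(F[0][0].real)                # the (0,0) freq-pixel is the sum of all pixels
for row in recovered:
    print([round(v, 6) for v in row])
```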

#### 6.3.2.10 Edges in the frequency domain

With a good grasp of the frequency domain, we can revisit the problem of convolution on the image edges, see Edges in the spatial domain. When we apply the convolution theorem (see Convolution theorem) to convolve an image, we first take the discrete Fourier transforms (DFT, Discrete Fourier transform) of both the input image and the kernel, then we multiply them with each other and then take the inverse DFT to construct the convolved image. Of course, in order to multiply them with each other in the frequency domain, the two images have to be the same size, so let’s assume that we pad the kernel (it is usually smaller than the input image) with zero valued pixels in both dimensions so it becomes the same size as the input image before the DFT.

Having multiplied the two DFTs, we now apply the inverse DFT, which is where the problem is usually created. If the DFT of the kernel only had values of 1 (unrealistic condition!) then there would be no problem and the inverse DFT of the multiplication would be identical to the input. However in real situations, the kernel’s DFT has a maximum of 1 (because the sum of the kernel has to be one, see Convolution process) and decreases something like the hypothetical profile of Figure 6.3. So when multiplied with the input image’s DFT, the coefficient or magnitude (see Circles and the complex plane) of the lowest frequency (or the sum of the input image pixels) remains unchanged, while the magnitudes of the higher frequencies are significantly reduced.

As we saw in Sampling theorem, the Fourier transform of a discrete input will be infinitely repeated. In the final inverse DFT step, the input is in the frequency domain (the multiplied DFT of the input image and the kernel DFT). So the result (our output convolved image) will be infinitely repeated in the spatial domain. In order to accurately reconstruct the input image, we need all the frequencies with the correct magnitudes. However, when the magnitudes of higher frequencies are decreased, longer periods (lower frequencies) will dominate in the reconstructed pixel values. Therefore, when constructing a pixel on the edge of the image, the newly empowered longer periods will look beyond the input image edges and will find the repeated input image there. So if you convolve an image in this fashion using the convolution theorem, when a bright object exists on one edge of the image, its blurred wings will be present on the other side of the convolved image. This is often termed circular convolution or cyclic convolution.

So, as long as we are dealing with convolution in the frequency domain, there is nothing we can do about the image edges. The least we can do is to eliminate the ghosts of the other side of the image. So, we add zero valued pixels to both the input image and the kernel in both dimensions so the image that will be convolved has a size equal to the sum of both images in each dimension. Of course, the effect of this zero-padding is that the sides of the output convolved image will become dark. To put it another way, the edges are going to drain the flux from nearby objects. But at least it is consistent across all the edges of the image and is predictable. In Convolve, you can see the padded images when inspecting the frequency domain convolution steps with the --checkfreqsteps option.
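
The wrap-around and its removal by zero-padding can be reproduced on a small 1D example. The following is only a sketch in plain Python, independent of Gnuastro: the function names `dft` and `convolve_frequency` are hypothetical, the transform is the conventional DFT with an explicit $$2\pi$$ in the exponent, and the kernel is not re-centered (so the blurred profile is shifted by one pixel, which does not change where the wrap-around appears).

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for u in range(n):
        total = sum(values[x] * cmath.exp(sign * 2j * cmath.pi * u * x / n)
                    for x in range(n))
        out.append(total / n if inverse else total)
    return out


def convolve_frequency(signal, kernel, pad):
    """Convolve two 1D sequences by multiplying their DFTs.

    Without padding, the kernel is only extended to the signal's length,
    so the result is a circular convolution.  With padding, both inputs
    are extended with zeros to the sum of their lengths.
    """
    size = len(signal) + len(kernel) if pad else len(signal)
    s = list(signal) + [0.0] * (size - len(signal))
    k = list(kernel) + [0.0] * (size - len(kernel))
    product = [a * b for a, b in zip(dft(s), dft(k))]
    return [round(v.real, 10) for v in dft(product, inverse=True)]


# A single bright pixel on the right edge, blurred by a 3-pixel kernel.
signal = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8.0]
kernel = [0.25, 0.5, 0.25]

circular = convolve_frequency(signal, kernel, pad=False)
padded = convolve_frequency(signal, kernel, pad=True)

print(circular)  # flux of the last pixel reappears in the first pixels
print(padded)    # the first pixels stay at zero; the output is longer
```

In the unpadded case the flux of the bright pixel leaks into the first two elements of the output. In the padded case it falls into the added zero-valued region instead, which would be cropped away afterwards.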


#### 6.3.3 Spatial vs. Frequency domain

With the discussions above it might not be clear when to choose the spatial domain and when to choose the frequency domain. Here we will try to list the benefits of each.

The spatial domain,

• Can correct for the edge effects of convolution, see Edges in the spatial domain.
• Can operate on blank pixels.
• Can be faster than frequency domain when the kernel is small (in terms of the number of pixels on the sides).

The frequency domain,

• Will be much faster when the image and kernel are both large.

As a general rule of thumb, when working on an image of modeled profiles use the frequency domain and when working on an image of real (observed) objects use the spatial domain (corrected for the edges). The reason is that if you apply a frequency domain convolution to a real image, you are going to lose information on the edges and generally you don’t want large kernels. But when you have made the profiles in the image yourself, you can just make a larger input image and crop the central parts to completely remove the edge effect, see If convolving afterwards. Also due to oversampling, both the kernels and the images can become very large and the speed boost of frequency domain convolution will significantly improve the processing time, see Oversampling.


#### 6.3.4 Convolution kernel

All the programs that need convolution will need to be given a convolution kernel file and extension. In most cases (other than Convolve, see Convolve) the kernel file name is optional. However, the extension is necessary and must be specified either on the command-line or in at least one of the configuration files (see Configuration files). Within Gnuastro, there are two ways to create a kernel image:

• MakeProfiles: You can use MakeProfiles to create a parametric (based on a radial function) kernel, see MakeProfiles. By default MakeProfiles will make the Gaussian and Moffat profiles in a separate file so you can feed it into any of the programs.
• ConvertType: You can write your own desired kernel into a text file table and convert it to a FITS file with ConvertType, see ConvertType. Just be careful that the kernel has to have an odd number of pixels along its two axes, see Convolution process. All the programs that do convolution will normalize the kernel internally, so if you choose this option, you don’t have to worry about normalizing the kernel. Only within Convolve, there is an option to disable normalization, see Invoking Convolve.

The two options to specify a kernel file name and its extension are shown below. These are common between all the programs that will do convolution.

-k STR
--kernel=STR

The convolution kernel file name. The BITPIX (data type) value of this file can be any standard type and it does not necessarily have to be normalized. Several operations will be done on the kernel image prior to the program’s processing:

• It will be converted to floating point type.
• All blank pixels (see Blank pixels) will be set to zero.
• It will be normalized so the sum of its pixels equals unity.
• It will be flipped so the convolved image has the same orientation. This is only relevant if the kernel is not circular. See Convolution process.

-U STR
--khdu=STR

The convolution kernel HDU. Although the kernel file name is optional, before running any of the programs, they need to have a value for --khdu even if the default kernel is to be used. So be sure to keep its value in at least one of the configuration files (see Configuration files). By default, the system configuration file has a value.
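
The four preparation steps listed under --kernel can be sketched in a few lines. This is not Gnuastro’s code: `prepare_kernel` is a hypothetical helper working on nested Python lists, blank pixels are assumed to be stored as NaN, and the flip is assumed to act along both axes (a 180-degree rotation).

```python
import math


def prepare_kernel(kernel):
    """Apply the four preparation steps to a 2D kernel (nested lists)."""
    # 1. Convert every value to floating point.
    k = [[float(v) for v in row] for row in kernel]
    # 2. Set blank pixels (assumed here to be NaN) to zero.
    k = [[0.0 if math.isnan(v) else v for v in row] for row in k]
    # 3. Normalize so the sum of all pixels is one.
    total = sum(sum(row) for row in k)
    k = [[v / total for v in row] for row in k]
    # 4. Flip along both axes (assumed to be a 180-degree rotation).
    return [row[::-1] for row in k[::-1]]


raw = [[1, 2, 0],
       [0, 4, float("nan")],
       [0, 0, 1]]
prepared = prepare_kernel(raw)
for row in prepared:
    print(row)
```

Because the example kernel is not symmetric, the effect of the flip is visible in the output: the value that started in the top-left corner ends up in the bottom-right corner.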


#### 6.3.5 Invoking Convolve

Convolve an input dataset (2D image or 1D spectrum for example) with a known kernel, or make the kernel necessary to match two PSFs. The general template for Convolve is:

    $ astconvolve [OPTION...] ASTRdata

One line examples:

    ## Convolve mockimg.fits with psf.fits:
    $ astconvolve --kernel=psf.fits mockimg.fits

    ## Convolve in the spatial domain:
    $ astconvolve observedimg.fits --kernel=psf.fits --domain=spatial

    ## Find the kernel to match sharper and blurry PSF images:
    $ astconvolve --kernel=sharperimage.fits --makekernel=10           \
                  blurryimage.fits

    ## Convolve a Spectrum (column 14 in the FITS table below) with a
    ## custom kernel (the kernel will be normalized internally, so only
    ## the ratios are important). Sed is used to replace the spaces with
    ## new line characters so Convolve sees them as values in one column.
    $ echo "1 3 10 3 1" | sed 's/ /\n/g' | astconvolve spectra.fits -c14

The only argument accepted by Convolve is an input image file. Some of the options are the same between Convolve and some other Gnuastro programs. Therefore, to avoid repetition, they will not be repeated here. For the full list of options shared by all Gnuastro programs, please see Common options. In particular, in the spatial domain, on multi-dimensional datasets, Convolve uses Gnuastro’s tessellation to speed up the run, see Tessellation. Common options related to tessellation are described in Processing options.

1-dimensional datasets (for example spectra) are only read as columns within a table (see Tables for more on how Gnuastro programs read tables). Note that currently 1D convolution is only implemented in the spatial domain and thus kernel-matching is also not supported.

Here we will only explain the options particular to Convolve. Run Convolve with --help in order to see the full list of options Convolve accepts, irrespective of where they are explained in this book.

--kernelcolumn

Column containing the 1D kernel. When the input dataset is a 1-dimensional column, and the host table has more than one column, use this option to specify which column should be used.

--nokernelflip

Do not flip the kernel after reading it for the spatial domain convolution. This can be useful if the flipping has already been applied to the kernel.

--nokernelnorm

Do not normalize the kernel after reading it (normalization makes the sum of its pixels unity).

-d STR
--domain=STR

The domain to use for the convolution. The acceptable values are ‘spatial’ and ‘frequency’, corresponding to the respective domain.

For large images, the frequency domain process will be more efficient than convolving in the spatial domain. However, the edges of the image will lose some flux (see Edges in the spatial domain) and the image must not contain any blank pixels, see Spatial vs. Frequency domain.

--checkfreqsteps

With this option a file with the initial name of the output file will be created that is suffixed with _freqsteps.fits. All the steps done to arrive at the final convolved image are saved as extensions in this file. The extensions in order are:

1. The padded input image. In frequency domain convolution the two images (input and kernel) have to be the same size and both should be padded by zeros.
2. The padded kernel, similar to the above.
3. The Fourier spectrum of the forward Fourier transform of the input image. Note that the Fourier transform is a complex operation (and not viewable in one image!) So we either have to show the ‘Fourier spectrum’ or the ‘Phase angle’. For the complex number $$a+ib$$, the Fourier spectrum is defined as $$\sqrt{a^2+b^2}$$ while the phase angle is defined as $$\arctan(b/a)$$.
4. The Fourier spectrum of the forward Fourier transform of the kernel image.
5. The Fourier spectrum of the multiplied (through complex arithmetic) transformed images.
6. The inverse Fourier transform of the multiplied image. If you open it, you will see that the convolved image is now in the center, not on one side of the image as it started with (in the padded image of the first extension). If you are working on a mock image which originally had pixels of precisely 0.0, you will notice that in those parts that your convolved profile(s) did not cover, the values are now $$\sim10^{-18}$$, this is due to floating-point round off errors. Therefore in the final step (when cropping the central parts of the image), we also remove any pixel with a value less than $$10^{-17}$$.

--noedgecorrection

Do not correct the edge effect in spatial domain convolution. For a full discussion, please see Edges in the spatial domain.

-m INT
--makekernel=INT

(=INT) If this option is called, Convolve will do de-convolution (see Convolution theorem).
The image specified by the --kernel option is assumed to be the sharper (less blurry) image and the input image is assumed to be the more blurry image. The value given to this option will be used as the maximum radius of the kernel. Any pixel in the final kernel that is farther than this distance from the center will be set to zero. The two images must have the same size.

Noise has high frequencies which can make the result less reliable for the higher frequencies of the final result. So all the frequencies which have a spectrum smaller than the value given to the --minsharpspec option in the sharper input image are set to zero and not divided. This will cause the wings of the final kernel to be flatter than they would ideally be which will make the convolved image result unreliable if it is too high. Some notes to take into account for a good result:

• Choose a bright (unsaturated) star and use a region box (with Crop for example, see Crop) that is sufficiently above the noise.
• Use Warp (see Warp) to warp the pixel grid so the star’s center is exactly on the center of the central pixel in the cropped image. This will certainly slightly degrade the result, however, it is necessary. If there are multiple good stars, you can shift all of them, then normalize them (so the sum of each star’s pixels is one) and then take their average to decrease this effect.
• The shifting might move the center of the star by one pixel in any direction, so crop the central pixel of the warped image to have a clean image for the de-convolution.

Note that this feature is not yet supported in 1-dimensional datasets.

-c
--minsharpspec

(=FLT) The minimum frequency spectrum (or coefficient, or pixel value in the frequency domain image) to use in deconvolution, see the explanations under the --makekernel option for more information.

### 6.4 Warp

Image warping is the process of mapping the pixels of one image onto a new pixel grid.
This process is sometimes known as transformation, however following the discussion of Heckbert 1989 [112] we will not be using that term because it can be confused with only pixel value or flux transformations. Here we specifically mean the pixel grid transformation which is better conveyed with ‘warp’.

Image warping is a very important step in astronomy, both in observational data analysis and in simulating modeled images. In modeling, warping an image is necessary when we want to apply grid transformations to the initial models, for example in simulating gravitational lensing (Radial warpings are not yet included in Warp). Observational reasons for warping an image are listed below:

• Noise: Most scientifically interesting targets are inherently faint (have a very low Signal to noise ratio). Therefore one short exposure is not enough to detect such objects that are drowned deeply in the noise. We need multiple exposures so we can add them together and increase the objects’ signal to noise ratio. Keeping the telescope fixed on one field of the sky is practically impossible. Therefore very deep observations have to be put onto the same grid before adding them.
• Resolution: If we have multiple images of one patch of the sky (hopefully at multiple orientations) we can warp them to the same grid. The multiple orientations will allow us to ‘guess’ the values of pixels on an output pixel grid that has smaller pixel sizes and thus increase the resolution of the output. This process of merging multiple observations is known as Mosaicing.
• Cosmic rays: Cosmic rays can randomly fall on any part of an image. If they collide vertically with the camera, they are going to create a very sharp and bright spot that in most cases can be separated easily [113]. However, depending on the depth of the camera pixels, and the angle that a cosmic ray collides with it, it can cover a line-like larger area on the CCD which makes the detection using their sharp edges very hard and error prone.
One of the best methods to remove cosmic rays is to compare multiple images of the same field. To do that, we need all the images to be on the same pixel grid.
• Optical distortion: (Not yet included in Warp) In wide field images, the optical distortion that occurs on the outer parts of the focal plane will make accurate comparison of the objects at various locations impossible. It is therefore necessary to warp the image and correct for those distortions prior to the analysis.
• Detector not on focal plane: In some cases (like the Hubble Space Telescope ACS and WFC3 cameras), the CCD might be tilted compared to the focal plane, therefore the recorded CCD pixels have to be projected onto the focal plane before further analysis.

#### 6.4.1 Warping basics

Let’s take $$\left[\matrix{u&v}\right]$$ as the coordinates of a point in the input image and $$\left[\matrix{x&y}\right]$$ as the coordinates of that same point in the output image [114]. The simplest form of coordinate transformation (or warping) is the scaling of the coordinates, let’s assume we want to scale the first axis by $$M$$ and the second by $$N$$, the output coordinates of that point can be calculated by

$$\left[\matrix{x\cr y}\right]= \left[\matrix{Mu\cr Nv}\right]= \left[\matrix{M&0\cr0&N}\right]\left[\matrix{u\cr v}\right]$$

Note that these are matrix multiplications. We thus see that we can represent any such grid warping as a matrix. Another thing we can do with this $$2\times2$$ matrix is to rotate the output coordinate around the common center of both coordinates.
If the output is rotated anticlockwise by $$\theta$$ degrees from the positive (to the right) horizontal axis, then the warping matrix should become:

$$\left[\matrix{x\cr y}\right]= \left[\matrix{u\cos\theta-v\sin\theta\cr u\sin\theta+v\cos\theta}\right]= \left[\matrix{\cos\theta&-\sin\theta\cr \sin\theta&\cos\theta}\right] \left[\matrix{u\cr v}\right]$$

We can also flip the coordinates around the first axis, the second axis and the coordinate center with the following three matrices respectively:

$$\left[\matrix{1&0\cr0&-1}\right]\quad\quad \left[\matrix{-1&0\cr0&1}\right]\quad\quad \left[\matrix{-1&0\cr0&-1}\right]$$

The final thing we can do with this definition of a $$2\times2$$ warping matrix is shear. If we want the output to be sheared along the first axis with $$A$$ and along the second with $$B$$, then we can use the matrix:

$$\left[\matrix{1&A\cr B&1}\right]$$

To have one matrix representing any combination of these steps, you use matrix multiplication, see Merging multiple warpings. So any combinations of these transformations can be displayed with one $$2\times2$$ matrix:

$$\left[\matrix{a&b\cr c&d}\right]$$

The transformations above can cover a lot of the needs of most coordinate transformations. However they are limited to mapping the point $$[\matrix{0&0}]$$ to $$[\matrix{0&0}]$$. Therefore they are useless if you want one coordinate to be shifted compared to the other one. They are also space invariant, meaning that all the coordinates in the image will receive the same transformation. In other words, all the pixels in the output image will have the same area if placed over the input image. So transformations which require varying output pixel sizes like projections cannot be applied through this $$2\times2$$ matrix either (for example for the tilted ACS and WFC3 camera detectors on board the Hubble space telescope).

To add these further capabilities, namely translation and projection, we use the homogeneous coordinates.
They were defined about 200 years ago by August Ferdinand Möbius (1790 – 1868). For simplicity, we will only discuss points on a 2D plane and avoid the complexities of higher dimensions. We cannot provide a deep mathematical introduction here, interested readers can get a more detailed explanation from Wikipedia [115] and the references therein.

By adding an extra coordinate to a point we can add the flexibility we need. The point $$[\matrix{x&y}]$$ can be represented as $$[\matrix{xZ&yZ&Z}]$$ in homogeneous coordinates. Therefore multiplying all the coordinates of a point in the homogeneous coordinates with a constant will give the same point. Put another way, the point $$[\matrix{x&y&Z}]$$ corresponds to the point $$[\matrix{x/Z&y/Z}]$$ on the constant $$Z$$ plane. Setting $$Z=1$$, we get the input image plane, so $$[\matrix{u&v&1}]$$ corresponds to $$[\matrix{u&v}]$$. With this definition, the transformations above can be generally written as:

$$\left[\matrix{x\cr y\cr 1}\right]= \left[\matrix{a&b&0\cr c&d&0\cr 0&0&1}\right] \left[\matrix{u\cr v\cr 1}\right]$$

We thus acquired 4 extra degrees of freedom. By giving non-zero values to the zero valued elements of the last column we can have translation (try the matrix multiplication!). In general, any coordinate transformation that is represented by the matrix below is known as an affine transformation [116]:

$$\left[\matrix{a&b&c\cr d&e&f\cr 0&0&1}\right]$$

We can now consider translation, but the affine transform is still spatially invariant.
Giving non-zero values to the other two elements in the matrix above gives us the projective transformation or Homography [117] which is the most general type of transformation with the $$3\times3$$ matrix:

$$\left[\matrix{x'\cr y'\cr w}\right]= \left[\matrix{a&b&c\cr d&e&f\cr g&h&1}\right] \left[\matrix{u\cr v\cr 1}\right]$$

So the output coordinates can be calculated from:

$$x={x' \over w}={au+bv+c \over gu+hv+1}\quad\quad\quad\quad y={y' \over w}={du+ev+f \over gu+hv+1}$$

Thus with Homography we can change the sizes of the output pixels on the input plane, giving a ‘perspective’-like visual impression. This can be quantitatively seen in the two equations above. When $$g=h=0$$, the denominator is independent of $$u$$ or $$v$$ and thus we have spatial invariance. Homography preserves lines at all orientations. A very useful fact about Homography is that its inverse is also a Homography. These two properties play a very important role in the implementation of this transformation. A short but instructive and illustrated review of affine, projective and also bi-linear mappings is provided in Heckbert 1989 [118].

#### 6.4.2 Merging multiple warpings

In Warping basics we saw how a basic warp/transformation can be represented with a matrix. To make more complex warpings (for example to define a translation, rotation and scale as one warp) the individual matrices have to be multiplied through matrix multiplication. However matrix multiplication is not commutative, so the order of the set of matrices you use for the multiplication is going to be very important.

The first warping should be placed as the left-most matrix. The second warping to the right of that and so on. The second transformation is going to occur on the warped coordinates of the first.

As an example for merging a few transforms into one matrix, the multiplication below represents the rotation of an image about a point $$[\matrix{U&V}]$$ anticlockwise from the horizontal axis by an angle of $$\theta$$. To do this, first we take the origin to $$[\matrix{U&V}]$$ through translation. Then we rotate the image, then we translate it back to where it was initially. These three operations can be merged in one operation by calculating the matrix multiplication below:

$$\left[\matrix{1&0&U\cr0&1&V\cr{}0&0&1}\right] \left[\matrix{\cos\theta&-\sin\theta&0\cr \sin\theta&\cos\theta&0\cr 0&0&1}\right] \left[\matrix{1&0&-U\cr0&1&-V\cr{}0&0&1}\right]$$

#### 6.4.3 Resampling

A digital image is composed of discrete ‘picture elements’ or ‘pixels’. When a real image is created from a camera or detector, each pixel’s area is used to store the number of photo-electrons that were created when incident photons collided with that pixel’s surface area. This process is called the ‘sampling’ of a continuous or analog data into digital data. When we change the pixel grid of an image or warp it as we defined in Warping basics, we have to ‘guess’ the flux value of each pixel on the new grid based on the old grid, or re-sample it. Because of the ‘guessing’, any form of warping on the data is going to degrade the image and mix the original pixel values with each other. So if an analysis can be done on an unwarped data image, it is best to leave the image untouched and pursue the analysis. However as discussed in Warp this is not possible most of the time, so we have to accept the problem and re-sample the image.

In most applications of image processing, it is sufficient to consider each pixel to be a point and not an area. This assumption can significantly speed up the processing of an image and also the simplicity of the code. It is a fine assumption when the signal to noise ratio of the objects is very large.
The question will then be one of interpolation because you have multiple points distributed over the output image and you want to find the values at the pixel centers. To increase the accuracy, you might also sample more than one point from within a pixel giving you more points for a more accurate interpolation in the output grid.

However, interpolation has several problems. The first one is that it will depend on the type of function you want to assume for the interpolation. For example you can choose a bi-linear or bi-cubic (the ‘bi’s are for the 2 dimensional nature of the data) interpolation method. For the latter there are various ways to set the constants [119]. Such functional interpolations can fail seriously on the edges of an image. They will also need normalization so that the flux of the objects before and after the warpings are comparable. The most basic problem with such techniques is that they are based on a point while a detector pixel is an area. They add a level of subjectivity to the data (make more assumptions through the functions than the data can handle). For most applications this is fine, but in scientific applications where detection of the faintest possible galaxies or fainter parts of bright galaxies is our aim, we cannot afford this loss. Because of these reasons Warp will not use such interpolation techniques.

Warp will do interpolation based on “pixel mixing” [120] or “area resampling”. This is also what the Hubble Space Telescope pipeline calls “Drizzling” [121]. This technique requires no functions, it is thus non-parametric. It is also the closest we can get (make least assumptions) to what actually happens on the detector pixels. The basic idea is that you reverse-transform each output pixel to find which pixels of the input image it covers and what fraction of the area of the input pixels are covered.
To find the output pixel value, you simply sum the value of each input pixel weighted by the overlap fraction (between 0 and 1) of the output pixel and that input pixel. Through this process, pixels are treated as an area not as a point (which is how detectors create the image), also the brightness (see Flux Brightness and magnitude) of an object will be left completely unchanged.

If there are very high spatial-frequency signals in the image (for example fringes) which vary on a scale smaller than your output image pixel size, pixel mixing can cause aliasing [122]. So if the input image has fringes, they have to be calculated and removed separately (which would naturally be done in any astronomical application). Because of the PSF no astronomical target has a sharp change in the signal so this issue is less important for astronomical applications, see Point spread function.

#### 6.4.4 Invoking Warp

Warp an input dataset into a new grid. Any homographic warp (for example scaling, rotation, translation, projection) is acceptable, see Warping basics for the definitions. The general template for invoking Warp is:

    $ astwarp [OPTIONS...] InputImage


One line examples:

    ## Rotate and then scale input image:
    $ astwarp --rotate=37.92 --scale=0.8 image.fits

    ## Scale, then translate the input image:
    $ astwarp --scale 8/3 --translate 2.1 image.fits

    ## Align raw image with celestial coordinates:
    $ astwarp --align rawimage.fits --output=aligned.fits

    ## Directly input a custom warping matrix (using fraction):
    $ astwarp --matrix=1/5,0,4/10,0,1/5,4/10,0,0,1 image.fits

    ## Directly input a custom warping matrix, with final numbers:
    $ astwarp --matrix="0.7071,-0.7071,  0.7071,0.7071" image.fits


If any processing is to be done, Warp can accept one file as input. As in all Gnuastro programs, when an output is not explicitly set with the --output option, the output filename will be set automatically based on the operation, see Automatic output. For the full list of general options to all Gnuastro programs (including Warp), please see Common options.

To be the most accurate, the input image will be read as a 64-bit double precision floating point dataset and all internal processing is done in this format (including the raw output type). You can use the common --type option to write the output in any type you want, see Numeric data types.

Warps must be specified as command-line options, either as (possibly multiple) modular warpings (for example --rotate, or --scale), or directly as a single raw matrix (with --matrix). If specified together, the latter (direct matrix) will take precedence and all the modular warpings will be ignored. Any number of modular warpings can be specified on the command-line and configuration files. If more than one modular warping is given, all will be merged to create one warping matrix. As described in Merging multiple warpings, matrix multiplication is not commutative, so the order of specifying the modular warpings on the command-line, and/or configuration files makes a difference (see Configuration file precedence). The full list of modular warpings and the other options particular to Warp are described below.
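
The effect of the multiplication order can be checked numerically with the three-matrix product written in Merging multiple warpings (translation to $$[\matrix{U&V}]$$, rotation, translation back). The sketch below is plain Python and independent of Gnuastro: the helper names (`matmul`, `translate`, `rotate`, `apply_warp`) are hypothetical, and the merged matrix is assumed to act on the column vector $$[\matrix{u&v&1}]$$. It only shows that swapping the two translations gives a different warp. It does not reproduce how Warp orders its options internally.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translate(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def rotate(theta_deg):
    t = math.radians(theta_deg)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]


def apply_warp(m, u, v):
    """Apply a 3x3 matrix to the column vector [u, v, 1]; return (x, y)."""
    x, y, w = (row[0] * u + row[1] * v + row[2] for row in m)
    return x / w, y / w


U, V, THETA = 10.0, 20.0, 90.0

# Product in the order written in "Merging multiple warpings".
about_point = matmul(matmul(translate(U, V), rotate(THETA)),
                     translate(-U, -V))

# Same three matrices with the two translations swapped.
reversed_order = matmul(matmul(translate(-U, -V), rotate(THETA)),
                        translate(U, V))

print(apply_warp(about_point, U, V))     # [U, V] stays where it is
print(apply_warp(reversed_order, U, V))  # [U, V] is moved elsewhere
```

With the first product the point $$[\matrix{U&V}]$$ is the fixed point of the rotation, as intended. With the swapped product it is not, even though the same three matrices were used.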

The values to the warping options (modular warpings as well as --matrix), are a sequence of at least one number. Each number in this sequence is separated from the next by a comma (,). Each number can also be written as a single fraction (with a forward-slash / between the numerator and denominator). Space and Tab characters are permitted between any two numbers, just don’t forget to quote the whole value. Otherwise, the value will not be fully passed onto the option. See the examples above as a demonstration.

Based on the FITS standard, integer values are assigned to the center of a pixel and the coordinate [1.0, 1.0] is the center of the first pixel (bottom left of the image when viewed in SAO ds9). So the coordinate center [0.0, 0.0] is half a pixel away (in each axis) from the bottom left vertex of the first pixel. The resampling that is done in Warp (see Resampling) is done on the coordinate axes and thus directly depends on the coordinate center. In some situations this is fine, for example when rotating/aligning a real image, all the edge pixels will be similarly affected. But in other situations (for example when scaling an over-sampled mock image to its intended resolution), this is not desired: you want the center of the coordinates to be on the corner of the pixel. In such cases, you can use the --centeroncorner option which will shift the center by $$0.5$$ before the main warp, then shift it back by $$-0.5$$ after the main warp, see below.
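
Why the position of the coordinate center matters for scaling can be seen in a 1D version of the pixel mixing described in Resampling. The following is a rough sketch in plain Python, not Warp’s implementation: `pixel_mix_1d` and its `origin` argument (the coordinate that stays fixed under the scaling) are hypothetical, the scale is assumed to be positive, and pixels follow the convention above (pixel $$i$$ covers $$i-0.5$$ to $$i+0.5$$).

```python
import math


def overlap(a0, a1, b0, b1):
    """Length of the overlap between the intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def pixel_mix_1d(values, scale, origin):
    """Scale a row of pixels by `scale` (> 0) about the coordinate `origin`.

    Input pixel i (counting from 1) covers [i-0.5, i+0.5].  Every output
    pixel is mapped back onto the input grid and collects each input
    value weighted by the fraction of that input pixel it covers.
    """
    n = len(values)

    def to_output(u):
        return scale * (u - origin) + origin

    def to_input(x):
        return (x - origin) / scale + origin

    # Range of output pixels touched by the warped input row.
    first = math.floor(to_output(0.5) + 0.5)
    last = math.ceil(to_output(n + 0.5) - 0.5)

    output = []
    for j in range(first, last + 1):
        u0, u1 = to_input(j - 0.5), to_input(j + 0.5)
        output.append(sum(values[i - 1] * overlap(u0, u1, i - 0.5, i + 0.5)
                          for i in range(1, n + 1)))
    return output


row = [1, 2, 3, 4]
on_center = pixel_mix_1d(row, 0.5, origin=0.0)  # origin at coordinate 0
on_corner = pixel_mix_1d(row, 0.5, origin=0.5)  # origin on the first corner

print(on_center)
print(on_corner)
```

With the origin on the corner of the first pixel, every output pixel is exactly the sum of two whole input pixels. With the origin at coordinate 0, the output pixels straddle the input pixel borders and an extra, partially filled pixel appears on the edge. In both cases the total flux is the same.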

-a
--align

Align the image and celestial (WCS) axes given in the input. After it, the vertical image direction (when viewed in SAO ds9) corresponds to the declination and the horizontal axis is the inverse of the Right Ascension (RA). The inverse of the RA is chosen so the image can correspond to what you would actually see on the sky and is common in most survey images.

Align is internally treated just like a rotation (--rotate), but uses the input image’s WCS to find the rotation angle. Thus, if you have rotated the image before calling --align, you might get unexpected results (because the rotation is defined on the original WCS).

-r FLT
--rotate=FLT

Rotate the input image by the given angle in degrees: $$\theta$$ in Warping basics. Note that commonly, the WCS structure of the image is set such that the RA is the inverse of the image horizontal axis which increases towards the right in the FITS standard and as viewed by SAO ds9. So the default center for rotation is on the right of the image. If you want to rotate about other points, you have to translate the warping center first (with --translate), then apply your rotation and then return the center back to the original position (with another call to --translate, see Merging multiple warpings).

-s FLT[,FLT]
--scale=FLT[,FLT]

Scale the input image by the given factor(s): $$M$$ and $$N$$ in Warping basics. If only one value is given, then both image axes will be scaled with the given value. When two values are given (separated by a comma), the first will be used to scale the first axis and the second will be used for the second axis. If you only need to scale one axis, use 1 for the axis you don’t need to scale. The value(s) can also be written (on the command-line or in configuration files) as a fraction.

-f FLT[,FLT]
--flip=FLT[,FLT]

Flip the input image around the given axis or axes. If only one value is given, then both image axes are flipped. When two values are given (separated by a comma), you can choose which axis to flip over. --flip only takes values 0 (for no flip), or 1 (for a flip). Hence, if you want to flip by the second axis only, use --flip=0,1.

-e FLT[,FLT]
--shear=FLT[,FLT]

Shear the input image by the given value(s): $$A$$ and $$B$$ in Warping basics. If only one value is given, then both image axes will be sheared with the given value. When two values are given (separated by a comma), the first will be used to shear the first axis and the second will be used for the second axis. If you only need to shear along one axis, use 0 for the axis that must be untouched. The value(s) can also be written (on the command-line or in configuration files) as a fraction.

-t FLT[,FLT]
--translate=FLT[,FLT]

Translate (move the center of coordinates) the input image by the given value(s): $$c$$ and $$f$$ in Warping basics. If only one value is given, then both image axes will be translated by the given value. When two values are given (separated by a comma), the first will be used to translate the first axis and the second will be used for the second axis. If you only need to translate along one axis, use 0 for the axis that must be untouched. The value(s) can also be written (on the command-line or in configuration files) as a fraction.

-p FLT[,FLT]
--project=FLT[,FLT]

Apply a projection to the input image by the given value(s): $$g$$ and $$h$$ in Warping basics. If only one value is given, then projection will apply to both axes with the given value. When two values are given (separated by a comma), the first will be used to project the first axis and the second will be used for the second axis. If you only need to project along one axis, use 0 for the axis that must be untouched. The value(s) can also be written (on the command-line or in configuration files) as a fraction.

-m STR
--matrix=STR

The warp/transformation matrix. All the elements in this matrix must be separated by comma (,) characters and, as described above, you can also use fractions (a forward-slash between two numbers). The transformation matrix can be either a 2 by 2 (4 numbers), or a 3 by 3 (9 numbers) array. In the former case (if a 2 by 2 matrix is given), it is put into a 3 by 3 matrix (see Warping basics).

The determinant of the matrix has to be non-zero and it must not contain any non-number values (for example infinities or NaNs). The elements of the matrix have to be written row by row. So for the general Homography matrix of Warping basics, it should be called with --matrix=a,b,c,d,e,f,g,h,1.

The raw matrix takes precedence over all the modular warping options listed above, so if it is called with any number of modular warps, the latter are ignored.
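The rules above for a raw matrix can be sketched as a small parser. This is an illustration, not Warp's own parser: parse_matrix and determinant are hypothetical names, and placing the 2 by 2 matrix in the top left of a 3 by 3 identity is an assumption about the embedding described in Warping basics.

```python
from fractions import Fraction

def determinant(m):
    """Determinant of a 3x3 matrix stored as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def parse_matrix(text):
    """Parse 'a,b,c,d' or 'a,b,...,i' (row by row) into a 3x3 matrix."""
    # Fraction accepts '1/5' and '0.2', and raises ValueError for
    # non-number tokens such as 'nan' or 'inf'.
    values = [float(Fraction(tok.strip())) for tok in text.split(",")]
    if len(values) == 4:
        a, b, c, d = values
        # Assumed embedding: 2x2 block in the top left of an identity.
        values = [a, b, 0.0, c, d, 0.0, 0.0, 0.0, 1.0]
    elif len(values) != 9:
        raise ValueError("expected 4 or 9 comma-separated numbers")
    m = [values[0:3], values[3:6], values[6:9]]
    if determinant(m) == 0.0:
        raise ValueError("matrix is singular (zero determinant)")
    return m

print(parse_matrix("1/5,0,0,1/5"))
```

Rejecting a zero determinant guarantees that the matrix can be inverted, so output coordinates can be mapped back to input coordinates.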

-c
--centeroncorner

Put the center of coordinates on the corner of the first (bottom-left when viewed in SAO ds9) pixel. This option is applied after the final warping matrix has been finalized: either through modular warpings or the raw matrix. See the explanation above for coordinates in the FITS standard to better understand this option and when it should be used.

--hstartwcs=INT

Specify the first header keyword number (line) that should be used to read the WCS information, see the full explanation in Invoking Crop.

--hendwcs=INT

Specify the last header keyword number (line) that should be used to read the WCS information, see the full explanation in Invoking Crop.

-k
--keepwcs

Do not correct the WCS information of the input image and save it untouched to the output image. By default the WCS (World Coordinate System) information of the input image is going to be corrected in the output image so the objects in the image are at the same WCS coordinates. But in some cases it might be useful to keep it unchanged (for example to correct alignments).

-C FLT
--coveredfrac=FLT

Depending on the warp, the output pixels that cover pixels on the edge of the input image, or blank pixels in the input image, are not going to be fully covered by input data. With this option, you can specify the acceptable covered fraction of such pixels (any value between 0 and 1). If you only want output pixels that are fully covered by the input image area (and are not blank), then you can set --coveredfrac=1. Alternatively, a value of 0 will keep output pixels that are even infinitesimally covered by the input (so the sum of the pixels in the input and output images will be the same).
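A one-dimensional toy model can make the covered fraction concrete. It is a deliberate simplification and not Warp's algorithm, which works on two-dimensional pixel overlaps. The function shift_1d, the use of None for blank pixels and the comparison used for the threshold are all assumptions of this sketch.

```python
def shift_1d(values, shift, coveredfrac):
    """Shift a 1D 'image' by 0 <= shift < 1 pixel with pixel mixing.

    Output pixel j spans [j, j+1); input pixel i lands on
    [i+shift, i+1+shift). Pixels covered less than `coveredfrac`
    become None (blank).
    """
    out = []
    for j in range(len(values) + 1):
        total = 0.0      # input flux falling in this output pixel
        covered = 0.0    # fraction of this output pixel that has input
        for i, v in enumerate(values):
            lo = max(j, i + shift)
            hi = min(j + 1, i + 1 + shift)
            overlap = max(0.0, hi - lo)
            total += v * overlap
            covered += overlap
        out.append(total if covered >= coveredfrac else None)
    return out

image = [1.0, 2.0, 3.0]
strict = shift_1d(image, 0.25, 1.0)    # only fully covered pixels survive
lenient = shift_1d(image, 0.25, 0.0)   # every touched pixel is kept
print(strict)
print(lenient, sum(lenient), sum(image))
```

In the lenient case no input flux is discarded, so the sums of the input and output agree. In the strict case the two partially covered edge pixels are blanked.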


## 7 Data analysis

Astronomical datasets (images or tables) contain very valuable information. The tools in this section can help in analyzing, extracting, and quantifying that information: for example, getting general or specific statistics of the dataset (with Statistics), detecting signal within a noisy dataset (with NoiseChisel), or creating a catalog from an input dataset (with MakeCatalog).


### 7.1 Statistics

The distribution of values in a dataset can provide valuable information about it. For example, in an image, if it is a positively skewed distribution, we can see that there is significant data in the image. If the distribution is roughly symmetric, we can tell that there is no significant data in the image. In a table, when we need to select a sample of objects, it is important to first get a general view of the whole sample.

On the other hand, you might need to know certain statistical parameters of the dataset. For example, if we have run a detection algorithm on an image, and we want to see how accurate it was, one method is to calculate the average of the undetected pixels and see how reasonable it is (if detection is done correctly, the average of undetected pixels should be approximately equal to the background value, see Sky value). In a table, you might have calculated the magnitudes of a certain class of objects and want to get some general characteristics of the distribution immediately on the command-line (very fast!), to possibly change some parameters. The Statistics program is designed for such situations.


#### 7.1.1 Histogram and Cumulative Frequency Plot

Histograms and cumulative frequency plots are both used to visually study the distribution of a dataset. A histogram shows the number of data points which lie within pre-defined intervals (bins). So on the horizontal axis we have the bin centers and on the vertical, the number of points that are in that bin. You can use it to get a general view of the distribution: Which values have been repeated the most? How close/far are the most significant bins? Are there more values in the larger part of the range of the dataset, or in the lower part? Similarly, many very important properties about the dataset can be deduced from a visual inspection of the histogram. In the Statistics program, the histogram can be either output to a table to plot with your favorite plotting program, or it can be shown with ASCII characters on the command-line, which is very crude, but good enough for a fast and on-the-go analysis, see the example in Invoking Statistics.

The width of the bins is the only necessary parameter for a histogram. In the limiting case that the bin-widths tend to zero (while assuming the number of points in the dataset tends to infinity), the histogram will tend to the probability density function of the distribution. When the absolute number of points in each bin is not relevant to the study (only the shape of the histogram is important), you can normalize a histogram so that, like the probability density function, the sum of all its bins will be one.
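As a minimal sketch of these two ideas, the snippet below builds a histogram from nothing but a bin width and then normalizes it. The functions histogram and normalize are hypothetical helpers written for this illustration. They do not mirror the options of the Statistics program.

```python
import math

def histogram(data, width):
    """Count values in bins of the given width, starting at min(data)."""
    lo = min(data)
    nbins = int(math.floor((max(data) - lo) / width)) + 1
    counts = [0] * nbins
    for x in data:
        counts[int((x - lo) // width)] += 1
    centers = [lo + (k + 0.5) * width for k in range(nbins)]
    return centers, counts

def normalize(counts):
    """Divide by the total so the bins sum to one."""
    total = sum(counts)
    return [c / total for c in counts]

data = [0, 1, 1, 2, 2, 2, 3, 5]
centers, counts = histogram(data, 2)
fractions = normalize(counts)
print(centers, counts, fractions)
```

Normalizing changes only the vertical scale: the relative heights of the bins, and therefore the shape of the histogram, stay the same.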

In the cumulative frequency plot of a distribution, the horizontal axis is the sorted data values and the vertical axis is the index of each data point in the sorted distribution. Unlike a histogram, a cumulative frequency plot does not involve intervals or bins. This makes it less prone to any sort of bias or error that a given bin-width would have on the analysis. When a larger number of the data points have roughly the same value, then the cumulative frequency plot will become steep in that vicinity. This occurs because on the horizontal axis, there is little change while on the vertical axis, the indexes constantly increase. Normalizing a cumulative frequency plot means to divide each index (vertical axis) by the total number of data points (or the last value).

Unlike the histogram which has a limited number of bins, ideally the cumulative frequency plot should have one point for every data element. Even in small datasets (for example a $$200\times200$$ image) this will result in an unreasonably large number of points to plot (40000)! As a result, for practical reasons, it is common to only store its value on a certain number of points (intervals) in the input range rather than the whole dataset, so you should determine the number of bins you want when asking for a cumulative frequency plot. In Gnuastro (and thus the Statistics program), the number reported for each bin is the total number of data points until the larger interval value for that bin. You can see an example histogram and cumulative frequency plot of a single dataset under the --asciihist and --asciicfp options of Invoking Statistics.
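The binned cumulative frequency described above can be sketched as follows. This is an illustration under stated assumptions rather than the code of Statistics: cumulative_frequency is a hypothetical name, the range is taken from the data itself, and the dataset is assumed to contain at least two distinct values.

```python
def cumulative_frequency(data, numbins):
    """Number of points below each bin's upper edge, over equal bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / numbins
    ordered = sorted(data)
    uppers, counts = [], []
    seen = 0
    for k in range(numbins):
        upper = lo + (k + 1) * width
        is_last = (k == numbins - 1)
        # Advance over every point below this bin's upper edge; the
        # last bin also takes the largest value(s) of the dataset.
        while seen < len(ordered) and (is_last or ordered[seen] < upper):
            seen += 1
        uppers.append(upper)
        counts.append(seen)
    return uppers, counts

data = [0, 1, 1, 2, 2, 2, 3, 4]
uppers, counts = cumulative_frequency(data, 4)
normalized = [c / len(data) for c in counts]
print(uppers, counts, normalized)
```

The counts never decrease, and the largest jump is at the most common value. This is the steepening described above.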

So as a summary, both the histogram and cumulative frequency plot in Statistics will work with bins. Within each bin/interval, the lower value is considered to be within the bin (it is inclusive), but its larger value is not (it is exclusive). Formally, an interval/bin between a and b is represented by [a, b). When the over-all range of the dataset is specified (with the --greaterequal, --lessthan, or --qrange options), the acceptable values of the dataset are also defined in a similar inclusive-exclusive manner. But when the range is determined from the actual dataset (none of these options is called), the last element in the dataset is included in the last bin’s count.
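The inclusive-exclusive convention can be expressed as a small bin-lookup function. The name bin_index and its range_from_data flag are hypothetical. The flag stands for the case where none of the range options is given and the range comes from the dataset itself.

```python
def bin_index(x, lo, hi, numbins, range_from_data):
    """Bin of x for `numbins` equal bins over [lo, hi), or None."""
    if range_from_data and x == hi:
        # The largest value of the dataset joins the last bin.
        return numbins - 1
    if not (lo <= x < hi):
        # Outside the inclusive-exclusive range.
        return None
    # min() guards against floating point rounding at the top edge.
    return min(int((x - lo) / (hi - lo) * numbins), numbins - 1)

# Four bins over [0, 4): [0,1), [1,2), [2,3), [3,4).
print(bin_index(1.0, 0.0, 4.0, 4, False))   # lower edge belongs to the bin
print(bin_index(4.0, 0.0, 4.0, 4, False))   # a user-given upper limit is excluded
print(bin_index(4.0, 0.0, 4.0, 4, True))    # a data-derived maximum is kept
```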


#### 7.1.2 2D Histograms

In Histogram and Cumulative Frequency Plot the concept of a histogram was introduced on a single dataset. However, especially when doing high-level science on tables, the distribution in a 2D space may be of interest (for example a color-magnitude diagram). But the number of points may be too large for a simple scatter plot to show the concentration of the points: they will all fall over each other and just make a large connected region that will hide potentially interesting behaviors. This is where 2D histograms can become very useful. The desired 2D region is broken up into 2D bins (boxes) and the number of points falling within each box is returned. Added with a color-bar, you can now clearly see the distribution.

Gnuastro’s Statistics program has the --histogram2d option for this task. Its output will be three columns that have the centers of every box in both dimensions. The first column is the central box coordinates in the first dimension, the second has values along the second dimension and the third has the number of input points that fall within each box. You can specify the number of bins along each dimension through the --numbins (for first input column) and --numbins2 (for second input column) options. The output file from this command can then be given to any plotting tool to visualize the distribution.
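To make the three-column layout concrete, here is a sketch of a 2D histogram that produces rows of the same form. It is not the code of Statistics: histogram2d is a hypothetical function, the range is taken from the data, and the row order (first dimension varying fastest) is an assumption.

```python
def histogram2d(xs, ys, numbins, numbins2):
    """Return rows of (x center, y center, count) for a 2D histogram."""
    xlo, xhi = min(xs), max(xs)
    ylo, yhi = min(ys), max(ys)
    xw = (xhi - xlo) / numbins
    yw = (yhi - ylo) / numbins2
    counts = [[0] * numbins for _ in range(numbins2)]
    for x, y in zip(xs, ys):
        # min() puts the largest value of each axis in the last box.
        i = min(int((x - xlo) / xw), numbins - 1)
        j = min(int((y - ylo) / yw), numbins2 - 1)
        counts[j][i] += 1
    rows = []
    for j in range(numbins2):
        for i in range(numbins):     # first dimension varies fastest
            rows.append((xlo + (i + 0.5) * xw,
                         ylo + (j + 0.5) * yw,
                         counts[j][i]))
    return rows

xs = [0, 0, 1, 2, 2, 2]
ys = [0, 1, 1, 0, 2, 2]
rows = histogram2d(xs, ys, 2, 2)
for row in rows:
    print(*row)
```

Each printed row is one box, so the output has numbins times numbins2 rows in total and the third column sums to the number of input points.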

For example, you can make high-quality plots within your paper (using the same LaTeX engine, thus blending very nicely with your text) using PGFPlots. Below you can see one such minimal example: using your favorite text editor, save it into a file, make the two small corrections in it, then run the commands shown at the top. This assumes that you have LaTeX installed. If not, for the steps to install a minimally sufficient LaTeX package on your system, see the respective section in Bootstrapping dependencies.

The two parts that need to be corrected are marked with ’%% <--’: the first one (XXXXXXXXX) should be replaced by the value to the --numbins option which is the number of bins along the first dimension. The second one (FILE.txt) should be replaced with the name of the file generated by Statistics.

%% Replace 'XXXXXXXXX' with your selected number of bins in the first
%% dimension.
%%
%% Then run these commands to build the plot in a LaTeX command.
%%    mkdir tikz
%%    pdflatex -shell-escape -halt-on-error plot.tex
\documentclass{article}

%% Load PGFPlots and set it to build the figure separately in a 'tikz'
%% directory (which has to exist before LaTeX is run). This
%% "externalization" is very useful to include the commands of multiple
%% plots in the middle of your paper/report, but also have the plots
%% separately to use in slides or other scenarios.
\usepackage{pgfplots}
\usetikzlibrary{external}
\tikzexternalize
\tikzsetexternalprefix{tikz/}

%% Start the document
\begin{document}

You can actually write a full paper here and include many figures!
Feel free to change this text.

%% Define the colormap.
\pgfplotsset{
/pgfplots/colormap={coldredux}{
[1cm]
rgb255(0cm)=(255,255,255)
rgb255(2cm)=(0,192,255)
rgb255(4cm)=(0,0,255)
rgb255(6cm)=(0,0,0)
}
}

%% Draw the plot.
\begin{tikzpicture}
\small
\begin{axis}[
width=\linewidth,
view={0}{90},
colorbar horizontal,
xlabel=X axis,
ylabel=Y axis,
ylabel shift=-0.1cm,
colorbar style={at={(0,1.01)}, anchor=south west,
xticklabel pos=upper},
]
\addplot3[
surf,
mesh/ordering=rowwise,
mesh/cols=XXXXXXXXX,     %% <-- Number of bins in 1st column.
] file {FILE.txt};         %% <-- Name of aststatistics output.

\end{axis}
\end{tikzpicture}

\end{document}


#### 7.1.3 Sigma clipping

Let’s assume that you have pure noise (centered on zero) with a clear Gaussian distribution, or see Photon counting noise. Now let’s assume you add very bright objects (signal) on the image which have a very sharp boundary. By a sharp boundary