GNU Astronomy Utilities


This book documents version 0.4 of the GNU Astronomy Utilities (Gnuastro). Gnuastro provides various programs and libraries for astronomical data manipulation and analysis.

Copyright © 2015-2017 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.

To navigate easily in this web page, you can use the Next, Previous, Up and Contents links at the top and bottom of each page. Next and Previous take you to the next or previous topic at the same level, for example from chapter 1 to chapter 2 or vice versa. To go to the sections or subsections, click on the menu entries that are shown whenever a title has sub-components.



1 Introduction

GNU Astronomy Utilities (Gnuastro) is an official GNU package consisting of separate programs and libraries for the manipulation and analysis of astronomical data. All the programs share the same basic command-line user interface for the comfort of both users and developers. Gnuastro is written to comply fully with the GNU coding standards, so it integrates smoothly with the GNU/Linux operating system. This also lets astronomers expect the fully familiar experience in the source code, building, installing and command-line interaction that they have seen in all the other GNU software that they use. The official and always up to date version of this book (or manual) is freely available under the GNU Free Documentation License in various formats (PDF, HTML, plain text, Info, and its Texinfo source) at http://www.gnu.org/software/gnuastro/manual/.

For users who are new to the GNU/Linux environment, unless otherwise specified, most of the topics in Installation and Common program behavior are common to all GNU software, for example installation, managing command-line options or getting help (also see New to GNU/Linux?). So if you are new to this empowering environment, we encourage you to go through these chapters carefully. They can be a starting point from which you can continue to learn more from each program’s own manual and fully benefit from and enjoy this wonderful environment. Gnuastro also comes with a large set of libraries, so you can write your own programs using Gnuastro’s building blocks, see Review of library fundamentals for an introduction.

Finally, it must be mentioned that no change to any Gnuastro program will be released before it has been fully documented in this book first. As discussed in Science and its tools, this is the founding basis of Gnuastro.



1.1 Quick start

Gnuastro has three mandatory dependencies and three optional dependencies for extra functionality, see Dependencies. The latest official release tarball is always available as gnuastro-latest.tar.gz. For better compression (faster download) and robust archival features, an Lzip compressed tarball is also available at gnuastro-latest.tar.lz, see Release tarball for more details on the tarball release. If you have downloaded the tarball into the TOPGNUASTRO directory and the dependencies are installed, you can unpack, compile, check and install Gnuastro with the following commands. If you use GNU Tar, the same command ($ tar xf) can also be used to unpack .tar.lz tarballs (Lzip must already be installed).

$ cd TOPGNUASTRO
$ tar xf gnuastro-latest.tar.gz    # This works on `.tar.lz' too.
$ cd gnuastro-X.X                  # Replace X.X with version number.
$ ./configure
$ make -j8                         # Replace 8 with no. CPU threads.
$ make check
$ sudo make install

See Known issues if you confront any complications. For each program there is an ‘Invoke ProgramName’ sub-section in this book which explains how the program should be run on the command-line. You can read it on the command-line by running $ info astprogname, see Naming convention and Getting help. The ‘Invoke ProgramName’ sub-section starts with a few examples of each program and goes on to explain the invocation details. Tutorials gives some real life examples of how these programs might be used.
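For example, after the installation above, the full documentation of the Crop program (whose executable is named astcrop, see Naming convention) can be read with:

$ info astcrop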



1.2 Science and its tools

History of science indicates that there are always inevitably unseen faults, hidden assumptions, simplifications and approximations in all our theoretical models, data acquisition and analysis techniques. It is precisely these that will ultimately allow future generations to advance the existing experimental and theoretical knowledge through their new solutions and corrections.

In the past, scientists would gather data and process them individually to achieve an analysis, and thus had a much more intimate knowledge of the data and the analysis. The theoretical models also required few (if any) simulations to compare with the data. Today both methods are becoming increasingly dependent on pre-written software. Scientists are dissociating themselves from the intricacies of reducing raw observational data in experimentation, or from bringing the theoretical models to life in simulations. These ‘intricacies’ are precisely those unseen faults, hidden assumptions, simplifications and approximations that define scientific progress.

Unfortunately, most persons who have recourse to a computer for statistical analysis of data are not much interested either in computer programming or in statistical method, being primarily concerned with their own proper business. Hence the common use of library programs and various statistical packages. ... It’s time that was changed.

F. J. Anscombe. The American Statistician, Vol. 27, No. 1. 1973

Anscombe’s quartet demonstrates how four data sets with widely different shapes (when plotted) give nearly identical output from standard regression techniques. Anscombe argues that “Good statistical analysis is not a purely routine matter, and generally calls for more than one pass through the computer”. Anscombe’s quartet can be generalized to say that users of software cannot claim to understand how it works based only on the experience they have gained by frequently running it. This kind of subjective experience is prone to very serious misunderstandings about what the software really does behind the scenes, and can be misleading. This attitude is further encouraged by non-free software. Such an approach to scientific software only helps in producing dogmas and an “obscurantist faith in the expert’s special skill, and in his personal knowledge and authority”.

Program or be programmed. Choose the former, and you gain access to the control panel of civilization. Choose the latter, and it could be the last real choice you get to make.

Douglas Rushkoff. Program or be programmed, O/R Books (2010).

It is obviously impractical for any one human being to gain the intricate knowledge explained above for every step of an analysis. On the other hand, scientific data can be very large and numerous, for example images produced by telescopes in astronomy. This requires very efficient algorithms. To make things worse, natural scientists have generally not been trained in the advanced software techniques, paradigms and architecture that are taught in computer science or engineering courses and thus used in most software. The GNU Astronomy Utilities are an effort to tackle this issue.

Gnuastro is not just software; this book is as important to the idea behind Gnuastro as the source code. This book has tried to learn from the success of the “Numerical Recipes” book in educating those who are not software engineers or computer scientists, but are still heavy users of computational algorithms, like astronomers. There are two major differences. The first is that the code and the explanations are segregated: the code lives in the actual Gnuastro source code, and the underlying explanations are given here. In the source code, every non-trivial step is heavily commented and correlated with this book, it follows the same logic as this book, and all the programs follow a similar internal data, function and file structure, see Program source. Complementing the code, this book focuses on thoroughly explaining the concepts behind the code (history, mathematics, science, software and usage advice when necessary), along with detailed instructions on how to run the programs. At the expense of frustrating “professionals” or “experts”, this book and the comments in the code also intentionally avoid jargon and abbreviations. The source code and this book are thus intimately linked; considered as a single entity, they can be thought of as a real (an actual software implementation accompanying the algorithms) “Numerical Recipes” for astronomy.

The other major, and arguably more important, difference is that “Numerical Recipes” does not allow you to distribute any code that you have learned from it. So while it empowers the privileged individual who has access to it, it exacerbates social ignorance. For example, it does not allow you to release your software’s source code if you have used their code; you can only release binaries (a black box) to the community. At exactly the opposite end of the spectrum, Gnuastro’s source code is released under the GNU General Public License (GPL) and this book is released under the GNU Free Documentation License. You are therefore free to distribute any software you create using parts of Gnuastro’s source code or text, or figures from this book, see Your rights. While developing the source code and this book together, the developers of Gnuastro aim to impose the minimum requirements on you (in computer science, engineering and even the mathematics behind the tools) to understand and modify any step of Gnuastro if you feel the need to do so, see Why C programming language? and Program design philosophy.

Imagine if Galileo did not have the technical knowledge to build a telescope: astronomical objects could not be seen with the Dutch military design of the telescope. At the beginning of his “The Sidereal Messenger” (1610) he cautions the readers on this issue and instructs them on how to build a suitable instrument; without a detailed description of “how” he made his observations, no one would believe him. The same is true today: science cannot progress with a black box. Before he actually saw the moons of Jupiter, the mountains on the Moon or the crescent of Venus, he was “evasive” to Kepler. Science is not independent of its tools.

Bjarne Stroustrup (creator of the C++ language) says: “Without understanding software, you are reduced to believing in magic”. Ken Thompson (a designer of the Unix operating system) says: “I abhor a system designed for the ‘user’, if that word is a coded pejorative meaning ‘stupid and unsophisticated’.” Certainly no scientist (user of scientific software) would want to be considered a believer in magic, or ‘stupid and unsophisticated’. However, this can happen when scientists get too distant from the raw data and mainly indulge themselves in their own high-level (abstract) models (creations). For example, roughly 5 years before special relativity, and about two decades before quantum mechanics fundamentally changed physics, Kelvin is quoted as saying:

There is nothing new to be discovered in physics now. All that remains is more and more precise measurement.

William Thomson (Lord Kelvin), 1900

A few years earlier, in a speech Albert. A. Michelson said:

The more important fundamental laws and facts of physical science have all been discovered, and these are now so firmly established that the possibility of their ever being supplanted in consequence of new discoveries is exceedingly remote.... Our future discoveries must be looked for in the sixth place of decimals.

Albert. A. Michelson, dedication of Ryerson Physics Lab, U. Chicago 1894

If scientists are to be more than mere “puzzle solvers” (simply adding to the decimals of known values, or observing a feature in 10, 100, or 100000 more galaxies or stars, as Kelvin and Michelson clearly believed), they cannot just passively sit back and uncritically repeat the previous (observational or theoretical) methods/tools on new data. Today there is a wealth of raw telescope images ready (mostly for free) at the fingertips of anyone with a fast enough internet connection to download them. The only thing lacking is new ways to analyze this data and dig out the treasure that remains hidden in it from existing methods and techniques.

New data that we insist on analyzing in terms of old ideas (that is, old models which are not questioned) cannot lead us out of the old ideas. However many data we record and analyze, we may just keep repeating the same old errors, missing the same crucially important things that the experiment was competent to find.

Jaynes, Probability theory, the logic of science. Cambridge U. Press (2003).


1.3 Your rights

The paragraphs below, in this section, belong to the GNU Texinfo manual and were not written by us! The name “Texinfo” is just changed to “GNU Astronomy Utilities” or “Gnuastro”, because they are released under the same licenses and it is beautifully written to inform you of your rights.

GNU Astronomy Utilities is “free software”; this means that everyone is free to use it and free to redistribute it on certain conditions. Gnuastro is not in the public domain; it is copyrighted and there are restrictions on its distribution, but these restrictions are designed to permit everything that a good cooperating citizen would want to do. What is not allowed is to try to prevent others from further sharing any version of Gnuastro that they might get from you.

Specifically, we want to make sure that you have the right to give away copies of the programs that relate to Gnuastro, that you receive the source code or else can get it if you want it, that you can change these programs or use pieces of them in new free programs, and that you know you can do these things.

To make sure that everyone has such rights, we have to forbid you to deprive anyone else of these rights. For example, if you distribute copies of the Gnuastro related programs, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must tell them their rights.

Also, for our own protection, we must make certain that everyone finds out that there is no warranty for the programs that relate to Gnuastro. If these programs are modified by someone else and passed on, we want their recipients to know that what they have is not what we distributed, so that any problems introduced by others will not reflect on our reputation.

The precise conditions of the licenses for the programs currently being distributed that relate to Gnuastro are found in the GNU General Public License that accompanies them. This book is covered by the GNU Free Documentation License.



1.4 Naming convention

Gnuastro is a package of independent programs and a collection of libraries; here we are mainly concerned with the programs. Each program has an official name consisting of one or two words describing what it does; two-word names are printed with no space, for example NoiseChisel or Crop. On the command-line, you can run them with their executable names, which start with ast and might be an abbreviation of the official name, for example astnoisechisel or astcrop, see Executable names.
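For example, if Gnuastro was installed with the default configuration, its executables are placed under /usr/local/bin (your installation prefix may differ), so all the installed programs can be listed with:

$ ls /usr/local/bin/ast*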

We will use “ProgramName” for a generic official program name and astprogname for a generic executable name. In this book, the programs are classified based on what they do and thoroughly explained. An alphabetical list of the programs that are installed on your system with this installation is given in Gnuastro programs list. That list also contains the executable names and version numbers, along with a one line description.



1.5 Version numbering

Gnuastro can have two formats of version numbers, for official and unofficial releases. Official Gnuastro releases are announced on the info-gnuastro mailing list, they have a version control tag in Gnuastro’s development history, and their version numbers are formatted like “A.B”. A is a major version number, marking a significant planned achievement (for example see GNU Astronomy Utilities 1.0), while B is a minor version number, see below for more on the distinction. Note that the numbers are not decimals, so version 2.34 is much more recent than version 2.5, which is not equal to 2.50.
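This ordering can be demonstrated with the --version-sort option of GNU Coreutils’ sort program, which compares such strings component by component instead of as decimals; for the versions mentioned above:

$ printf "2.50\n2.34\n2.5\n" | sort --version-sort
2.5
2.34
2.50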

Gnuastro also allows a unique version number for unofficial releases. Unofficial releases can mark any point in Gnuastro’s development history. This is done to allow astronomers to easily use any point in the version controlled history for their data analysis and research publication. See Version controlled source for a complete introduction; it is not just for developers and is very straightforward, so please have a look if you are interested in the cutting edge. The unofficial version number is a meaningful and easy to read string of characters, unique to that particular point of history. With this feature, users can easily stay up to date with the most recent bug fixes and additions that are committed between official releases.

The unofficial version number is formatted like A.B.C-D. A and B are the most recent official version number. C is the number of commits that have been made after version A.B. D is the first 4 or 5 characters of the commit hash. Therefore, the unofficial version number ‘3.92.8-29c8’ corresponds to the 8th commit after the official version 3.92, and its commit hash begins with 29c8. The unofficial version number is sortable (unlike the raw hash) and, as shown above, is very descriptive of the state of the unofficial release. Of course an official release is preferred for publication (since its tarballs are easily available and it has gone through more tests, making it more stable), so if an official release is announced prior to your publication’s final review, please consider updating to the official release.
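Every Gnuastro program reports the exact (official or unofficial) version it was built from through the standard GNU --version option, so you can always check which point of history you are running; for example, with Crop:

$ astcrop --version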

The major version number is set by a major goal which is defined by the developers and user community beforehand, for example see GNU Astronomy Utilities 1.0. The incremental work done in minor releases is commonly a small step in achieving the major goal. Therefore, there is no limit on the number of minor releases, and the difference between the (hypothetical) versions 2.927 and 3.0 can be a very small (negligible to the user) improvement that finalizes the defined goals.



1.5.1 GNU Astronomy Utilities 1.0

Currently (prior to Gnuastro 1.0), the aim of Gnuastro is to have a complete system for data manipulation and analysis, at least similar to IRAF. So an astronomer can take all the standard data analysis steps (starting from raw data to the final reduced product and standard post-reduction tools) with the various programs in Gnuastro.

The maintainers of each camera or detector on a telescope can provide a completely transparent shell script or Makefile to the observer for data analysis. This script can set the configuration files for all the required programs to work with that particular camera, and can then run the proper programs in the proper sequence. The user/observer can easily follow the script to understand (and modify) each step and the parameters used. Bash (or any other modern GNU/Linux shell) is very powerful and made for this gluing job; this will simultaneously improve performance and transparency. Shell scripts (or Makefiles) are also very basic constructs that are easy to learn and readily available as part of Unix-like operating systems. If there is no program to do a desired step, Gnuastro’s libraries can be used to build specific programs.
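As a rough sketch (the file names images/*.fits and targets.txt and the choice of steps are hypothetical; the individual commands are taken from the tutorials in this book), such a script might look like this:

#!/bin/sh
# Hypothetical reduction script for one particular camera.
set -e                                      # Stop if any step fails.

# Cut each catalog target out of the camera's images (see Crop).
astcrop --racol=2 --deccol=3 images/*.fits targets.txt

# Make JPEG copies of all the crops for quick visual inspection,
# using GNU Parallel and ConvertType as in the first tutorial.
parallel astconvertt -ojpg ::: *_crop.fits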

The main factor is that all observatories or projects can freely contribute to Gnuastro and all simultaneously benefit from it (since it doesn’t belong to any particular one of them), much like how for-profit organizations (for example RedHat, or Intel and many others) are major contributors to free and open source software for their shared benefit. Gnuastro’s copyright has been fully awarded to GNU, so it doesn’t belong to any particular astronomer or astronomical facility or project.



1.6 New to GNU/Linux?

Some astronomers initially install and use a GNU/Linux operating system because the software their research community uses can only be run in this environment, but the transition is not necessarily easy. To encourage you to invest the patience and time to make this transition, here we define the GNU/Linux system, argue for the command-line interface of scientific software, and explain how it is worth the (apparently steep) learning curve. Command-line interface contains a short overview of the very powerful command-line user interface. Tutorials is a complete chapter with some real world example applications of Gnuastro, making good use of GNU/Linux capabilities, written for newcomers to this environment. It is fully explained, easy and (hopefully) entertaining.

You might have already noticed that we are not using the name “Linux”, but “GNU/Linux”. Please take the time to have a look at the following essays and FAQs for a complete understanding of this very important distinction. In short, the Linux kernel is built using the GNU C library (glibc) and the GNU Compiler Collection (GCC). The Linux kernel software alone is useless: in order to have an operating system you need many more packages, and the majority of such low-level packages in most distributions are developed as part of the GNU project: “the whole system is basically GNU with Linux loaded”. In the form of an analogy: to say “running Linux” is like saying “driving your carburetor”.



1.6.1 Command-line interface

One aspect of Gnuastro that might be a little troubling to new GNU/Linux users is that (at least for the time being) it only has a command-line user interface (CLI). This might be contrary to the mostly graphical user interface (GUI) experience with proprietary operating systems. To a first-time user, the command-line does appear much more complicated, and adapting to it might not be easy, even a little frustrating at first. This is understandable, and also experienced by anyone who started using computers (from childhood) through a graphical user interface. Here we hope to convince you of the unique benefits of this interface, which can greatly enhance your productivity while complementing your GUI experience.

Through GNOME 3, most GNU/Linux based operating systems now have a very advanced and useful GUI. Since the GUI was created long after the command-line, some wrongly consider the command-line obsolete. Both interfaces are very useful for different tasks (for example you can’t view an image, video, PDF document or web page on the command-line), while on the other hand you can’t easily reproduce your results in a GUI. Therefore they should not be regarded as rivals, but as complementary user interfaces; here we will outline how the CLI can be useful in scientific programs.

You can think of the GUI as a veneer over the CLI that facilitates a small subset of all the possible CLI operations. Each click you make on the GUI can be thought of as internally running a different CLI command. So asymptotically (if a designer could show you every possibility to click on), the GUI is only as powerful as the command-line. In practice such graphical designers are very hard to find for every program, so GUI operations are always a subset of the internal CLI commands. For programs that are only made for the GUI, this results in lots of potentially useful operations being left out. It also makes ‘interface design’ a crucially important part of any GUI program. Scientists don’t usually have the resources to hire a graphical designer; also, the complexity of GUI code is far greater than that of CLI code, which is harmful for scientific software, see Science and its tools.

For programs that have a GUI, one action on the GUI (moving and clicking a mouse, or tapping a touchscreen) might be more efficient and easier than its CLI counterpart (typing the program name and your desired configuration). However, if you have to repeat that same action more than once, the GUI soon becomes very frustrating and prone to errors. Unless the designers of a particular program explicitly automated a given GUI action, there is no general way to run any possible series of actions automatically in the GUI.

On the command-line, you can run any series of actions, from any CLI-capable programs you have decided on yourself, in any possible permutation, with one command. This allows for much more creativity and exact reproducibility, which is not possible for a GUI user. For technical and scientific operations, where the same operation (using various programs) has to be done on a large set of data files, this is crucially important. Exact reproducibility is also a founding principle of scientific results. The most common CLI (also known as a shell) in GNU/Linux is GNU Bash; we strongly encourage you to put aside several hours and go through this beautifully explained web page: https://flossmanuals.net/command-line/. You don’t need to read or even fully understand the whole thing; a general knowledge of the first few chapters is enough to get you going.
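For example, the single (and easily re-runnable) line below converts every FITS file in the current directory to JPEG with Gnuastro’s ConvertType, a job that would need one manual export per file in a typical GUI viewer:

$ for f in *.fits; do astconvertt -ojpg $f; done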

Since the operations in the GUI are very limited and they are visible, reading a manual is not that important in the GUI (most programs don’t even have one!). However, to give you the creative power explained above, with a CLI program it is best if you first read the manual of any program you are using. You don’t need to memorize any details; only an understanding of the generalities is needed. Once you start working, there are easier ways to remember a particular option or operation detail, see Getting help.

To experience the command-line in its full glory, and not in a GUI terminal emulator, press CTRL+ALT+F4 to access the virtual console. To return to your GUI, press the same keys, replacing F4 with F7 (or F1, or F2, depending on your GNU/Linux distribution). In the virtual console, the GUI, with all its distracting colors and information, is gone, enabling you to focus entirely on your actual work.

For operations that use a lot of your system’s resources (processing a large number of large astronomical images for example), the virtual console is the place to run them. This is because the GUI is not competing with your research work for your system’s RAM and CPU. Since the virtual consoles are completely independent, you can even log out of your GUI environment to give even more of your hardware resources to the programs you are running and thus reduce the operating time.

Since it uses far fewer system resources, the CLI is also very convenient for remote access to your computer. Using a secure shell (SSH), you can log in securely to your system (similar to the virtual console) from anywhere, even if the connection speed is low. There are apps for smartphones and tablets which allow you to do this too.
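For example, with a command like the one below (the user and host names are hypothetical), you get the same shell on the remote machine and can run any of the commands in this book there:

$ ssh edwin@mountwilson.example.org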



1.7 Report a bug

According to Wikipedia “a software bug is an error, flaw, failure, or fault in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways”. So when you see that a program is crashing, not reading your input correctly, giving the wrong results, or not writing your output correctly, you have found a bug. In such cases, it is best if you report the bug to the developers. The programs will also report bugs in known impossible situations (which are caused by something unexpected) and will ask the users to report the bug.

Prior to actually filing a bug report, it is best to search previous reports: the issue might have already been found and even solved. The best place to check if your bug has already been discussed is the bugs tracker on the Gnuastro project webpage at https://savannah.gnu.org/bugs/?group=gnuastro. In the top search fields (under “Display Criteria”), set the “Open/Closed” drop-down menu to “Any”, choose the respective program or general category of the bug in “Category”, and click the “Apply” button. The results colored green have already been solved; the status of those colored in red is shown in the table.

Recently corrected bugs are probably not yet publicly released because they are scheduled for the next Gnuastro stable release. If the bug is solved but not yet released and it is an urgent issue for you, you can get the version controlled source and compile that, see Version controlled source.

To help us solve the issue as readily as possible, please follow the guidelines below in your bug report. The “How to Report Bugs Effectively” and “How To Ask Questions The Smart Way” essays also provide some very good generic advice for all software (don’t contact their authors for Gnuastro’s problems). Mastering the art of giving good bug reports (like asking good questions) can greatly enhance your experience with any free and open source software, so investing the time to read through these essays will greatly reduce your frustration when something doesn’t work the way you feel it is supposed to, for a large range of software, not just Gnuastro.

Be descriptive

Please provide as many details as possible and be very descriptive. Explain what you expected and what the output was: it might be that your expectation was wrong. Also please clearly state which sections of the Gnuastro book (this book), or other references you have studied to understand the problem. This can be useful in correcting the book (adding links to likely places where users will check). But more importantly, it will be very encouraging for the developers, since you are showing how serious you are about the problem and that you have actually put some thought into it. “To be able to ask a question clearly is two-thirds of the way to getting it answered.” – John Ruskin (1819-1900).

Individual and independent bug reports

If you have found multiple bugs, please send them as separate (and independent) bugs (as much as possible). This will significantly help us in managing and resolving them sooner.

Reproducible bug reports

If we cannot exactly reproduce your bug, it is very hard to resolve. So please send us a minimal working example along with the description. For example, when running a program, please send us the full command-line text and the output with the -P option, see Operating mode options. If the problem only occurs for a certain input, also send us that input file. In case the input FITS file is large, please use Crop to cut out only the problematic section and make it as small as possible, so it can easily be uploaded and downloaded and not waste the archive’s storage, see Crop.
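For example, a minimal report for a hypothetical problem might include the outputs of commands like these (image.fits stands for your problematic input, and the cropped region is arbitrary):

$ astnoisechisel -P > options.txt            # Record all option values.
$ astcrop image.fits --section=1:256,1:256   # Make a small cutout.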

There are generally two ways to inform us of bugs:

  1. Send a mail to the Gnuastro bug reporting mailing list: bug-gnuastro@gnu.org.
  2. Submit the bug directly to the bug tracker on the Gnuastro project webpage (see the link above).

Once the item has been registered in the mailing list or webpage, the developers will add it to either the “Bug Tracker” or “Task Manager” trackers of the Gnuastro project webpage. These two trackers can only be edited by the Gnuastro project developers, but they can be browsed by anyone, so you can follow the progress of your bug. You are most welcome to join us in developing Gnuastro, and fixing the bug you have found may be a good starting point. Gnuastro is designed to be easy for anyone to develop (see Science and its tools) and there is a full chapter devoted to developing it: Developing.



1.8 Suggest new feature

We would always be very happy to hear your suggestions for new features. For every program there is already a list of features that we are planning to add. You can see the current list of plans from the Gnuastro project webpage at https://savannah.gnu.org/projects/gnuastro/ by following “Tasks”→“Browse” on the horizontal menu at the top of the page, immediately under the title, see Gnuastro project webpage. If you want to request a feature for an existing program, click on “Display Criteria” above the list and under “Category” choose that particular program. Under “Category” you can also see the existing suggestions for new programs, or other cases like installation, documentation or libraries. Also be sure to set the “Open/Closed” value to “Any”.

If the feature you want to suggest is not already listed in the task manager, then follow the steps that are fully described in Report a bug. Please bear in mind that the developers are all very busy with their own astronomical research and with implementing the existing tasks or resolving bugs. Gnuastro is a volunteer effort and none of the developers are paid for their hard work. So, although we will try our best, please don’t expect your suggested feature to be immediately included (with the next release of Gnuastro).

The best person to implement the exciting new feature you have in mind is you, since you have the motivation and need. In fact, Gnuastro is designed to make it as easy as possible for you to hack into it (add new features, change existing ones and so on), see Science and its tools. Please have a look at the chapter devoted to developing (Developing) and start implementing your desired feature. Once you have added it, you can use it for your own work and, if you feel others may benefit from it, you can request for it to become part of Gnuastro. You can then join the developers and start maintaining your own part of Gnuastro. If you choose to take this path, please contact us beforehand (Report a bug) so we can avoid possible duplicate activities and put interested people in contact.

Gnuastro is a collection of low-level programs: as described in Program design philosophy, a founding principle of Gnuastro is that each library or program should be very basic and low-level. High-level jobs should be done by running the separate programs (or separate library functions) in succession through a shell script, or by calling the libraries from higher-level functions, see the examples in Tutorials. So when making suggestions, please consider how your desired job can best be broken into separate steps and modularized.
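For example, instead of suggesting a single program that both crops a region out of an image and exports it to JPEG, the same (hypothetical) job is better requested, and done, as two existing modular steps:

$ astcrop image.fits --section=14:*-13,14:*-13   # Step 1: crop the region.
$ astconvertt -ojpg image_cropped.fits           # Step 2: convert to JPEG.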



1.9 Announcements

Gnuastro has a dedicated mailing list for making announcements. Anyone that is interested can subscribe to this mailing list to stay up to date with new releases or when the dependencies (see Dependencies) have been updated. To subscribe to this list, please visit https://lists.gnu.org/mailman/listinfo/info-gnuastro.



1.10 Conventions

In this book we have the following conventions:

  1. Commands that are to be run by an ordinary user start with a “$”, while commands that must be run as root start with a “#”. The prefix character itself should not be typed.
  2. When a command is too long to fit on one line, it is broken into multiple lines with a “\” at the end of each broken line, exactly as you would do on the shell.



1.11 Acknowledgments

The list of Gnuastro authors is available at the start of this book and the AUTHORS file in the source code. Here the authors wish to gratefully acknowledge the help and support they received from other people and institutions who had an indirect (not committed in the version controlled history) role in Gnuastro. The plain text file THANKS which is distributed along with the source code also contains this list.

The Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT) scholarship for Mohammad Akhlaghi’s Masters and PhD period in Tohoku University Astronomical Institute had an instrumental role in the long term learning and planning that made the idea of Gnuastro possible. The very critical viewpoints of Professor Takashi Ichikawa (from Tohoku University) were also instrumental in the initial ideas and creation of Gnuastro. Brandon Invergo, Karl Berry and Richard Stallman also provided very useful suggestions during the GNU evaluation process. Bob Proulx from Savannah has kindly supported Gnuastro’s project webpage on Savannah and the management of its version controlled source server there.

We would also like to gratefully thank (in alphabetical order by family name) Marjan Akbari, Roland Bacon, Nicolas Bouché, Fernando Buitrago, Adrian Bunk, Rosa Calvi, Antonio Diaz Diaz, Stephen Hamer, Raúl Infante Sainz, Aurélien Jarno, Lee Kelvin, Mohammad-Reza Khellat, Alan Lefor, Guillaume Mahler, Francesco Montanari, William Pence, Yahya Sefidbakht, Ole Streicher, Ignacio Trujillo, David Valls-Gabaud and Christopher Willmer for their useful and constructive comments and suggestions. Finally we should thank all the (sometimes anonymous) people in various online forums which patiently answered all our small (but important) technical questions.

All work on Gnuastro has been voluntary, but the authors are most grateful to the following institutions (in chronological order) for hosting us in our research:

Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan.
Tohoku University Astronomical Institute, Sendai, Japan.
University of Salento, Lecce, Italy.
Centre national de la recherche scientifique (CNRS), France.
Centre de Recherche Astrophysique de Lyon, University of Lyon 1, France.



2 Tutorials

In this chapter we give several tutorials or cookbooks on how to use the various tools in Gnuastro for your scientific purposes. In these tutorials, we have intentionally avoided too many cross references to make it more easily readable. To get more information about a particular program, you can visit the section with the same name as the program in this book. Each program section starts by explaining the general concepts behind what it does. If you only want to see an explanation of the options and arguments of any program, see the subsection titled ‘Invoking ProgramName’. See Conventions, for an explanation of the conventions we use in the example codes through the book.

The tutorials in this section use a fictional setting of some historical figures in the history of astronomy. We have tried to show how Gnuastro would have been helpful for them in making their discoveries, if there had been GNU/Linux computers in their times! Please excuse us for any historical inaccuracy; this is not intended to be a historical reference. This form of presentation can make the tutorials more pleasant and entertaining to read, while also being more practical (explaining from a user’s point of view). The main reference for the historical facts mentioned in these fictional settings was Wikipedia.



2.1 Hubble visually checks and classifies his catalog

In 1924 Hubble announced his discovery that some of the known nebulous objects are too distant to be within the Milky Way (or Galaxy) and that they are probably distant galaxies in their own right. He had also used them to show that the redshift of the nebulae increases with their distance. So now he wants to study them more accurately to see what they actually are. Since they are nebulous or amorphous, they can’t be modeled easily (unlike stars, which are always points). So there is no better way to distinguish them than to visually inspect them and see if it is possible to classify these nebulae or not.

Hubble has stored all the FITS images of the objects he wants to visually inspect in his /mnt/data/images directory. He has also stored his catalog of extra-galactic nebulae in /mnt/data/catalogs/extragalactic.txt. Any normal user on his GNU/Linux system (including himself) only has read access to the contents of the /mnt/data directory. He has done this by running this command as root:

# chmod -R 755 /mnt/data

Hubble has done this intentionally to avoid mistakenly deleting or modifying the valuable images he has taken at Mount Wilson while working as an ordinary user. Retaking all those images and data is simply not an option. In fact, they are also on a separate hard disk (/dev/sdb1), so if the hard disk which stores his GNU/Linux distribution suddenly malfunctions due to workload, his data is not in harm’s way. That hard disk is only mounted to this directory when he wants to use it, with the command:

# mount /dev/sdb1 /mnt/data

In short, Hubble wants to keep his data safe, and fortunately by default Gnuastro allows for this. Hubble creates a temporary visualcheck directory in his home directory for this check. He runs the following commands to make the directory and change to it:

$ mkdir ~/visualcheck
$ cd ~/visualcheck
$ pwd
/home/edwin/visualcheck
$ ls

Hubble has multiple images in /mnt/data/images; some of his targets might be on the edges of an image, so several images need to be stitched together to give a good view of them. Also, his extra-galactic targets belong to various pointings in the sky, so they are not in one large image. Gnuastro’s Crop is just the program he wants. The catalog in extragalactic.txt is a plain text file which stores the basic information on all his 200 known extra-galactic nebulae. Its second column contains each object’s right ascension (the first column is a label he has given to each object), and the third the object’s declination.

$ astcrop --racol=2 --deccol=3 /mnt/data/images/*.fits     \
             /mnt/data/catalogs/extragalactic.txt
Crop started on Tue Jun  14 10:18:11 1932
  ---- ./4_crop.fits                  1 1
  ---- ./2_crop.fits                  1 1
  ---- ./1_crop.fits                  1 1
[[[ Truncated middle of list ]]]
  ---- ./198_crop.fits                1 1
  ---- ./195_crop.fits                1 1
  - 200 images created.
  - 200 were filled in the center.
  - 0 used more than one input.
Crop finished in:  2.429401 (seconds)

Hubble already knows that thread allocation to the CPU cores is asynchronous, hence each time Crop is run, the order in which the jobs finish differs. When using Crop, the order of the outputs is irrelevant, since each crop is independent of the rest; this is why the crops are not necessarily created in the input order. He is satisfied with the default width of the outputs (which he inspected by running $ astcrop -P). If he wanted a different width for the cropped images, he could use the --wwidth option, which accepts a value in arc-seconds. When he lists the contents of the directory again, he finds his 200 objects as separate FITS images.

$ ls
1_crop.fits 2_crop.fits ... 200_crop.fits

The FITS image format was not designed for efficient/fast viewing, but mainly for accurate storage of the data. So he chooses to convert the cropped images to a more common image format to view them more quickly and easily through standard image viewers (which load much faster than FITS image viewers). JPEG is one of the most recognized image formats and is supported by most image viewers. Fortunately Gnuastro has just such a tool to convert various file types to and from each other: ConvertType. Hubble has already heard of GNU Parallel from one of his colleagues at Mount Wilson Observatory. It allows multiple instances of a command to be run simultaneously on the system, so he uses it in conjunction with ConvertType to convert all the images to JPEG.

$ parallel astconvertt -ojpg ::: *_crop.fits

For his graphical user interface Hubble is using GNOME, which is the default in most GNU/Linux distributions. The basic image viewer in GNOME is the Eye of GNOME, which has the executable file name eog. Since he has used it before, he knows that once it opens an image, he can use the ENTER or SPACE keys on the keyboard to go to the next image in the directory, or the Backspace key to go to the previous image. So he opens the image of the first object with the command below and, with his cup of coffee in his other hand, he flips through his targets very fast to get a good initial impression of the morphologies of these extra-galactic nebulae.

$ eog 1_crop.jpg

Hubble’s cup of coffee is now finished and he also got a nice general impression of the shapes of the nebulae. He tentatively/mentally classified the objects into three classes during the visual inspection. One group of nebulae have a very simple elliptical shape and seem to have no special internal structure, so he gives them code 1. Another clearly different class are those with spiral arms, which he associates with code 2. Finally, there seems to be a class of nebulae in between, which appear to have a disk but no spiral arms; he gives them code 3.

Now he wants to know how many of the nebulae in his extra-galactic sample are within each class. Repeating the same process as above and writing the results on paper is very time consuming and prone to errors. Fortunately Hubble knows the basics of GNU Bash shell programming, so he writes the following short script with a loop to help him with the job. After all, computers are made for us to operate, and knowing basic shell programming gives Hubble the ability to creatively operate the computer as he wants. So using GNU Emacs (his favorite text editor) he puts the following text in a file named classify.sh.

for name in *.jpg                          # Loop over all JPEG images.
do
    eog "$name" &                          # Open the viewer in the background.
    processid=$!                           # Remember the viewer's process ID.
    echo -n "$name belongs to class: "     # Prompt for this image's class.
    read class
    echo "$name $class" >> classified.txt  # Append the answer to the list.
    kill $processid                        # Close the viewer.
done

Fortunately GNU Emacs, or even simpler editors like Gedit (part of the GNOME graphical user interface), will display the variables and shell constructs in different colors, which can really help in understanding the script. Put simply, the for loop takes the name of each JPEG file in the directory the script is run in, and puts it in name. In the shell, the value of a variable is used by putting a $ sign before the variable name. Eye of GNOME is then run on that image in the background to show him the image, and its process ID is saved internally (this is necessary to close Eye of GNOME later). The shell then prompts the user to specify a class and, after saving it in class, appends the file name and the given class as a new line of a file named classified.txt. To make the script executable (so he can run it later any time he wants) he runs:

$ chmod +x classify.sh

Now he is ready to do the classification, so he runs the script:

$ ./classify.sh

In the end he can delete all the JPEG and FITS files along with Crop’s log file with the following short command. The only files remaining are the script and the result of the classification.

$ rm *.jpg *.fits astcrop.txt
$ ls
classified.txt   classify.sh

He can now use classified.txt as input to a plotting program to plot the histogram of the classes and start making interpretations about what these nebulous objects that are outside of the Galaxy are.



2.2 Sufi simulates a detection

It is the year 953 A.D. and Sufi is in Shiraz as a guest astronomer. He had come there to use the advanced 123 centimeter astrolabe for his studies of the ecliptic. However, something had been bothering him for a long time. While mapping the constellations, there were several non-stellar objects that he had detected in the sky, one of them in the Andromeda constellation. During a trip to Yemen, Sufi had seen another such object in the southern skies, looking over the Indian ocean. He wasn’t sure if such cloud-like non-stellar objects (which he was the first to call ‘Sahābi’ in Arabic, or ‘nebulous’) were real astronomical objects, or only the result of some bias in his observations. Could such diffuse objects actually be detected at all with his detection technique?

He still had a few hours left until nightfall (when he would continue his studies of the ecliptic), so he decided to find an answer to this question. He had thoroughly studied Claudius Ptolemy’s (90 – 168 A.D.) Almagest and had made lots of corrections to it, in particular in measuring the brightness. Using that same experience, he was able to measure a magnitude for the objects, and wanted to simulate his observation to see if a simulated object with the same brightness and size could be detected in simulated noise with the same detection technique. The general outline of the steps he wants to take are:

  1. Make some mock profiles in an oversampled image. The initial mock image has to be oversampled prior to convolution or other forms of transformation in the image. Through his experiences, Sufi knew that this is because the image of heavenly bodies is actually transformed by the atmosphere or other sources outside the atmosphere (for example gravitational lenses) prior to being sampled on an image. Since that transformation occurs on a continuous grid, to best approximate it, he should do all the work on a finer pixel grid. In the end he can re-sample the result to the initially desired grid size.
  2. Convolve the image with a PSF image that is oversampled to the same value as the mock image. Since he wants to finish in a reasonable time and the PSF kernel will be very large due to oversampling, he has to use frequency domain convolution which has the side effect of dimming the edges of the image. So in the first step above he also has to build the image to be larger by at least half the width of the PSF convolution kernel on each edge.
  3. With all the transformations complete, the image should be re-sampled to the same size of the pixels in his detector.
  4. He should remove those extra pixels on all edges to remove frequency domain convolution artifacts in the final product.
  5. He should add noise to the (until now, noise-less) mock image. After all, all observations have noise associated with them.

Fortunately Sufi had heard of GNU Astronomy Utilities from a colleague in Isfahan (where he worked) and had installed it on his computer a year before. It had tools to do all the steps above. He had used MakeProfiles before, but wasn’t sure which columns he had chosen in his user or system-wide configuration files for which parameters, see Configuration files. So to start his simulation, Sufi runs MakeProfiles with the -P option to check which catalog columns MakeProfiles currently recognizes, along with the output image parameters. In particular, Sufi is interested in the recognized columns (shown below).

$ astmkprof -P

[[[ ... Truncated lines ... ]]]

# Output:
 type         float32     # Type of output: e.g., int16, float32, etc...
 naxis        1000,1000   # Number of pixels along first FITS axis.
 oversample   5           # Scale of oversampling (>0 and odd).

[[[ ... Truncated lines ... ]]]

# Columns, by info (see `--searchin'), or number (starting from 1):
 ccol         2           # Center along first FITS axis (horizontal).
 ccol         3           # Center along second FITS axis (vertical).
 fcol         4           # sersic (1), moffat (2), gaussian (3),
                          # point (4), flat (5), circumference (6).
 rcol         5           # Effective radius or FWHM in pixels.
 ncol         6           # Sersic index or Moffat beta.
 pcol         7           # Position angle.
 qcol         8           # Axis ratio.
 mcol         9           # Magnitude.
 tcol         10          # Truncation in units of radius or pixels.

[[[ ... Truncated lines ... ]]]

In Gnuastro, column counting starts from 1, so the columns are ordered such that the first column (number 1) can be an ID he specifies for each object (which MakeProfiles ignores), and each subsequent column is used for another property of the profile. It is also possible to use column names for the values of these options and change these defaults, but Sufi preferred to stick to the defaults. Fortunately MakeProfiles also has the capability to make the PSF which is to be used on the mock image and, using the --prepforconv option, it can make the mock image larger by the correct amount, with all the sources shifted by the correct amount.

For his initial check he decides to simulate the nebula in the Andromeda constellation. The night he was observing, the PSF had a FWHM of roughly 5 pixels, so as the first row (profile) he defines the PSF parameters, sets the radius column (rcol above, the fifth column) to 5.000, and chooses a Moffat function for its functional form. Remembering how diffuse the nebula in the Andromeda constellation was, he decides to simulate it with a mock Sérsic index 1.0 profile. He wants the output to be 500 pixels by 500 pixels, so he puts the mock profile in the center. Looking at his drawings of it, he decides a reasonable effective radius for it would be 40 pixels on this image pixel scale. He sets the axis ratio and position angle to approximately correct values too, and finally sets the total magnitude of the profile to 3.44, which he had accurately measured. Sufi also decides to truncate both the mock profile and the PSF at 5 times their respective radius parameters. In the end he decides to put four stars at fainter magnitudes on the four corners of the image as a visual scale.

Using all the information above, he creates the catalog of mock profiles he wants in a file named cat.txt (short for catalog) using his favorite text editor, and stores it in a directory named simulationtest in his home directory. [The cat command, short for concatenation, prints the contents of a file. So when the editor opens in the steps below, copy into cat.txt the 7 lines shown after “$ cat cat.txt”, the first one starting with #.]

$ mkdir ~/simulationtest
$ cd ~/simulationtest
$ pwd
/home/rahman/simulationtest
$ emacs cat.txt
$ ls
cat.txt
$ cat cat.txt
# Column 4: PROFILE_NAME [,str7] Radial profile's functional name
 1  0.0000   0.0000  moffat  5.000  4.765  0.0000  1.000  30.000  5.000
 2  250.00   250.00  sersic  40.00  1.000  -25.00  0.400  3.4400  5.000
 3  50.000   50.000  point   0.000  0.000  0.0000  0.000  9.0000  0.000
 4  450.00   50.000  point   0.000  0.000  0.0000  0.000  9.2500  0.000
 5  50.000   450.00  point   0.000  0.000  0.0000  0.000  9.5000  0.000
 6  450.00   450.00  point   0.000  0.000  0.0000  0.000  9.7500  0.000

The zero-point magnitude for his observation was 18. Now he has all the necessary parameters and runs MakeProfiles with the following command:


$ astmkprof --prepforconv --naxis=500,500 --zeropoint=18.0 cat.txt
MakeProfiles started on Mon Apr  6 16:26:56 953
  - 6 profiles read from cat.txt
  - Random number generator (RNG) type: mt19937
  - Using 8 threads.
  ---- row 2 complete, 5 left to go
  ---- row 3 complete, 4 left to go
  ---- row 4 complete, 3 left to go
  ---- row 5 complete, 2 left to go
  ---- ./0_cat.fits created.
  ---- row 0 complete, 1 left to go
  ---- row 1 complete, 0 left to go
  - ./cat.fits created.                                0.041651 seconds
MakeProfiles finished in 0.267234 seconds

$ ls
0_cat.fits  cat.fits  cat.txt

The file 0_cat.fits is the PSF Sufi had asked for, and cat.fits is the image containing the other 5 objects. The PSF is now available to him as a separate file for the convolution step. While he was preparing the catalog, one of his students approached him and was also following the steps. When he opened the image, the student was surprised to see that all the stars are only one pixel, and not in the shape of the PSF as we see when we image the sky at night. So Sufi explained to him that the stars will take the shape of the PSF after convolution, and that this is how they would look if we didn’t have an atmosphere or an aperture when we took the image. The size of the image was also surprising for the student: instead of 500 by 500, it was 2630 by 2630 pixels. So Sufi had to explain why oversampling is very important for parts of the image where the flux changes significantly over a pixel. Sufi then explained to him that after convolving we will re-sample the image to get our originally desired size. To convolve the image, Sufi ran the following command:

$ astconvolve --kernel=0_cat.fits cat.fits
Convolve started on Mon Apr  6 16:35:32 953
  - Using 8 CPU threads.
  - Input: cat.fits (hdu: 1)
  - Kernel: 0_cat.fits (hdu: 1)
  - Input and Kernel images padded.                    0.075541 seconds
  - Images converted to frequency domain.              6.728407 seconds
  - Multiplied in the frequency domain.                0.040659 seconds
  - Converted back to the spatial domain.              3.465344 seconds
  - Padded parts removed.                              0.016767 seconds
Convolve finished in:  10.422161 seconds

$ ls
0_cat.fits  cat_convolved.fits  cat.fits  cat.txt

When convolution finished, Sufi opened cat_convolved.fits, showed the effect of convolution to his student and explained how a PSF with a larger FWHM would make the points even wider. With the convolved image ready, they were prepared to re-sample it to the original pixel scale Sufi had planned [from the $ astmkprof -P command above, recall that MakeProfiles had oversampled the image by 5 times]. Sufi explained the basic concepts of warping the image to his student and ran Warp with the following command:

$ astwarp --scale=1/5 --centeroncorner cat_convolved.fits
Warp started on Mon Apr  6 16:51:59 953
 Using 8 CPU threads.
 Input: cat_convolved.fits (hdu: 1)
 matrix:
	0.2000   0.0000   0.4000
	0.0000   0.2000   0.4000
	0.0000   0.0000   1.0000

$ ls
0_cat.fits          cat_convolved_scaled.fits     cat.txt
cat_convolved.fits  cat.fits

$ astfits -p cat_convolved_scaled.fits | grep NAXIS
NAXIS   =                    2 / number of data axes
NAXIS1  =                  526 / length of data axis 1
NAXIS2  =                  526 / length of data axis 2

cat_convolved_scaled.fits now has the correct pixel scale. However, the image is still larger than what we had wanted: it is 526 (\(500+13+13\)) by 526 pixels. The student is slightly confused, so Sufi also resamples the PSF with the same scale and shows him that it is 27 (\(2\times13+1\)) by 27 pixels. Sufi goes on to explain how frequency space convolution will dim the edges and that is why he added the --prepforconv option to MakeProfiles, see If convolving afterwards. Now that convolution is done, Sufi can remove those extra pixels using Crop with the command below. Crop’s --section option accepts coordinates inclusively and counting from 1 (according to the FITS standard), so the crop’s first pixel has to be 14, not 13.

$ astcrop cat_convolved_scaled.fits --section=14:*-13,14:*-13    \
          --zeroisnotblank
Crop started on Mon Apr  6 17:03:24 953
  - Read metadata of 1 image.                          0.001304 seconds
  ---- ...nvolved_scaled_cropped.fits created: 1 input.
Crop finished in:  0.027204 seconds

$ ls
0_cat.fits          cat_convolved_scaled_cropped.fits  cat.fits
cat_convolved.fits  cat_convolved_scaled.fits          cat.txt

Finally, cat_convolved_scaled_cropped.fits has the same dimensions as Sufi had desired in the beginning. All this trouble was certainly worth it because now there is no dimming on the edges of the image and the profile centers are more accurately sampled. The final step to simulate a real observation would be to add noise to the image. Sufi set the zero-point magnitude to the same value he had used when making the mock profiles. Looking again at his observation log, he saw that the background flux near the nebula had a magnitude of 7 that night. So using these values he ran MakeNoise:

$ astmknoise --zeropoint=18 --background=7 --output=out.fits    \
             cat_convolved_scaled_cropped.fits
MakeNoise started on Mon Apr  6 17:05:06 953
  - Generator type: mt19937
  - Generator seed: 1428318100
MakeNoise finished in:  0.033491 (seconds)

$ ls
0_cat.fits         cat_convolved_scaled_cropped.fits cat.fits  out.fits
cat_convolved.fits cat_convolved_scaled.fits         cat.txt

The out.fits file now contains the noised image of the mock catalog Sufi had asked for. Seeing how the --output option allows the user to specify the name of the output file, the student was confused and wanted to know why Sufi hadn’t used it before. Sufi explained that for intermediate steps it is best to rely on the automatic output, see Automatic output. Doing so gives all the intermediate files the same basic name structure, so in the end you can simply remove them all with the shell’s wildcards. So Sufi decided to show this to the student by making a shell script from the commands he had used before.

The command-line shell can read all the separate input commands from a file. This is very useful when you want to do the same thing multiple times, with only the names of the files or minor parameters changing between the different instances. Using the shell’s history (by pressing the up arrow key), Sufi reviewed all the commands and then retrieved the last 5 with the $ history 5 command. He put the lines he had entered into a text file named mymock.sh. Then he defined the edge and base shell variables for easier customization later. Finally, before every command, he added some comments (lines starting with #) for future readability.

# Basic settings:
edge=13
base=cat

# Remove any existing output image to avoid confusion
# (-f: don't complain if the file doesn't exist).
rm -f out.fits

# Run MakeProfiles to create an oversampled FITS image.
astmkprof --prepforconv --naxis=500,500 --zeropoint=18.0 "$base".txt

# Convolve the created image with the kernel.
astconvolve --kernel=0_"$base".fits "$base".fits

# Scale the image back to the intended resolution.
astwarp --scale=1/5 --centeroncorner "$base"_convolved.fits

# Crop out the edges (dimmed during convolution). `--section' accepts
# inclusive coordinates, so the section must start one pixel after the
# dimmed edge.
st_edge=$(( edge + 1 ))
astcrop "$base"_convolved_scaled.fits --zeroisnotblank          \
        --section=$st_edge:*-$edge,$st_edge:*-$edge

# Add noise to the image.
astmknoise --zeropoint=18 --background=7 --output=out.fits      \
           "$base"_convolved_scaled_cropped.fits

# Remove all the temporary files.
rm 0*.fits "$base"*.fits

He used this chance to remind the student of the importance of comments in code or shell scripts: when writing the code, you have a very good mental picture of what you are doing, so writing comments might seem superfluous. However, in one month, when you want to re-use the script, you will have lost that mental picture, and rebuilding it can be very time-consuming and frustrating. The importance of comments is further amplified when you want to share the script with a friend or colleague. So it is very good to accompany any script/code with useful comments while you are writing it, while you still have a good mental picture of what you are doing and why.

Sufi then explained to the eager student that you define a shell variable by giving it a name, followed by an = sign (with no spaces around it) and the value you want. You can then reference that variable anywhere in the script by prefixing its name with a $. So wherever you see $base in the script, the value we defined for it at the top is used. If you use advanced editors like GNU Emacs, or even simpler ones like Gedit (part of the GNOME graphical user interface), the variables will be highlighted in a different color, which can really help in understanding the script. We have put all the $base references in double quotation marks (") so the variable name does not get mixed with the text that follows it; the shell removes the quotes after replacing the variable’s value.
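For example, this short demonstration (typed directly on the command-line with hypothetical values; it is not part of the script) shows how the quotes keep the variable name separate from the text that follows it:

$ base=cat
$ echo "$base"_convolved.fits
cat_convolved.fits

To make the script executable, Sufi ran the following command: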

$ chmod +x mymock.sh

Then finally, Sufi ran the script, simply by calling its file name:

$ ./mymock.sh

After the script finished, the only remaining file was the out.fits file that Sufi had wanted in the beginning. Sufi then explained to the student that he could run this script in any directory that contains a catalog. The only things the student had to modify were the name of the catalog (the value of the base variable at the start of the script) and the value of the edge variable if the PSF size changed. The student was also very happy to hear that he wouldn’t need to make the script executable again after editing it later: it remains executable unless the executable flag is explicitly changed with chmod.

The student was really excited, since now, through simple shell scripting, he could greatly speed up his work and run any command in any fashion he likes, allowing him to be much more creative in his work. Until now he had been using the graphical user interface, which doesn’t have such a facility; doing repetitive things on it was really frustrating and sometimes he would make mistakes. So he left to go and try scripting on his own computer.

Sufi could now get back to his own work and see if the simulated nebula which resembled the one in the Andromeda constellation could be detected or not. Although it was extremely faint21, fortunately it passed his detection tests and he wrote it into the draft manuscript that would later become the “Book of Fixed Stars”. He still had to check the other nebula he saw from Yemen and several other such objects, but they could wait until tomorrow (thanks to the shell script, he only has to define a new catalog). It was nearly sunset and they had to begin preparing for the night’s measurements on the ecliptic.


Next: , Previous: , Up: Top   [Contents][Index]

3 Installation

The latest released version of Gnuastro source code is always available at the following URL:

http://ftpmirror.gnu.org/gnuastro/gnuastro-latest.tar.gz

Quick start describes the commands necessary to configure, build, and install Gnuastro on your system. This chapter will be useful in cases where that simple procedure is not sufficient, for example if your system lacks a mandatory/optional dependency (in other words, you can’t pass the $ ./configure step), if you want greater customization, if you want to build and install Gnuastro from arbitrary points in its history, or if you want a higher level of control over the installation. Thus, if you were happy with downloading the tarball and following Quick start, you can safely ignore this chapter and come back to it in the future if you need more customization.

Dependencies describes the mandatory, optional, and bootstrapping dependencies of Gnuastro. Only the first group is required/mandatory when you are building Gnuastro using a tarball (see Release tarball); they are very basic, low-level tools used in most astronomical software, so you might already have them installed. If not, they are very easy to install, as described for each. Downloading the source discusses the two ways you can obtain the source code: as a tarball (a significant snapshot in Gnuastro’s history), or the full history22. The latter allows you to build Gnuastro at any point in its history (for example to get bug fixes or new features that are not yet released as a tarball).

The building and installation of Gnuastro is heavily customizable; to learn more, see Build and install. That section is essentially a thorough explanation of the steps in Quick start and discusses the ways you can influence the building and installation. If you encounter any problems in the installation process, the solution is probably already explained in Known issues. Finally, Other useful software discusses the installation and usage of some other free software that is not directly required by Gnuastro, but can be useful in conjunction with it.


Next: , Previous: , Up: Installation   [Contents][Index]

3.1 Dependencies

The dependencies needed to build and install Gnuastro are defined by the features you want and how you would like to obtain the source code (see Downloading the source). A minimal set of dependencies is mandatory: if they are not present, you cannot get past the configuration step. Such mandatory dependencies are therefore very basic (low-level) tools which are easy to obtain, build, and install; see Mandatory dependencies for a full discussion.

If you have the packages of Optional dependencies, Gnuastro will have additional functionality (for example converting FITS images to JPEG or PDF). If you are installing from a tarball as explained in Quick start, you can stop reading after this section. However, if you decided to use the version controlled source instead of the tarball (see Version controlled source), an additional bootstrapping step is required before configuration and its dependencies are explained in Bootstrapping dependencies.


Next: , Previous: , Up: Dependencies   [Contents][Index]

3.1.1 Mandatory dependencies

The mandatory Gnuastro dependencies are very basic and low-level tools. They all follow the same basic GNU based build system (like that shown in Quick start), so even if you don’t have them, installing them should be pretty straightforward. In this section we explain each program and any specific note that might be necessary in the installation.

You can either build these packages from source yourself, or rely on your distribution’s package management system. While the latter is certainly possible, we recommend that you build these dependencies from source, for the reasons listed below. We will send out notifications on the info-gnuastro mailing list (see Announcements) when we find out that these requirements have been updated.

  1. For each package, Gnuastro might perform better with (or require) certain configuration options that your distribution’s package managers didn’t enable for you. If present, these configuration options are explained during the installation of each package in the sections below. When the proper configuration has not been set, the programs should complain and inform you.
  2. Your distribution’s pre-built package might not be the most recent release.
  3. For the libraries, your distribution might package the binary files separately from the header files, see Known issues.
  4. As with any other tool, the science you derive from Gnuastro’s tools highly depends on these lower-level dependencies, so generally it is much better to have a close connection with them. By reading their manuals, installing them, and staying up to date with their changes/bugs, your scientific results and understanding will correspondingly improve.

Next: , Previous: , Up: Mandatory dependencies   [Contents][Index]

3.1.1.1 GNU Scientific library

The GNU Scientific Library, or GSL, is a large collection of functions that are very useful in scientific applications, for example numerical integration, random number generation, and the Fast Fourier Transform, among many others. To install GSL from source, you can run the following commands after you have downloaded gsl-latest.tar.gz:

$ tar xf gsl-latest.tar.gz
$ cd gsl-X.X                     # Replace X.X with version number.
$ ./configure
$ make -j8                       # Replace 8 with no. CPU threads.
$ make check
$ sudo make install

Next: , Previous: , Up: Mandatory dependencies   [Contents][Index]

3.1.1.2 CFITSIO

CFITSIO is the closest you can get to the pixels in a FITS image while remaining faithful to the FITS standard. It is written by William Pence, the principal author of the FITS standard23, and is regularly updated. It thus sets the definitions that all other software packages using FITS images rely on.

Some GNU/Linux distributions have CFITSIO in their package managers; if it is available and updated, you can use it. One problem that might occur is that the distribution didn’t configure CFITSIO with the --enable-reentrant option. This option allows CFITSIO to open a file in multiple threads and can thus provide great speed improvements. If CFITSIO was not configured with this option, any program which needs this capability will warn you and abort when you ask for multiple threads (see Multi-threaded operations).

To install CFITSIO from source, we strongly recommend that you look through Chapter 2 (Creating the CFITSIO library) of the CFITSIO manual and understand the options you can pass to $ ./configure (there aren’t too many). This is a very basic package for most astronomical software and it is best that you configure it properly for your system. Once you download the source and unpack it, the configure command below should be enough for most purposes. Don’t forget to read chapter two of the manual though; for example, the second option is only for 64-bit systems. The manual also explains how to check if CFITSIO has been installed correctly.

CFITSIO comes with two executables called fpack and funpack. From their manual: they “are standalone programs for compressing and uncompressing images and tables that are stored in the FITS (Flexible Image Transport System) data format. They are analogous to the gzip and gunzip compression programs except that they are optimized for the types of astronomical images that are often stored in FITS format”. The commands below will compile and install them on your system along with CFITSIO. They are not essential for Gnuastro, since they are just wrappers for functions within CFITSIO, but they can come in handy. The make utils command is only available for versions above 3.39; it will build these executables along with several other test executables which are deleted before the installation (otherwise they would also be installed).
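As a quick hypothetical illustration of their usage (image.fits is just an assumed file name here), fpack compresses a FITS file into a new .fz file and funpack reverses the process:

$ fpack image.fits          # Produces the compressed image.fits.fz.
$ funpack image.fits.fz     # Recovers the uncompressed image.fits.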

The process of installing CFITSIO from source is given below. Let’s assume you have downloaded cfitsio_latest.tar.gz and are in the same directory:

$ tar xf cfitsio_latest.tar.gz
$ cd cfitsio
$ ./configure --prefix=/usr/local --enable-sse2 --enable-reentrant
$ make
$ make utils
$ ./testprog > testprog.lis
$ diff testprog.lis testprog.out    # Should have no output
$ cmp testprog.fit testprog.std     # Should have no output
$ rm cookbook fitscopy imcopy smem speed testprog
$ sudo make install

Previous: , Up: Mandatory dependencies   [Contents][Index]

3.1.1.3 WCSLIB

WCSLIB is written and maintained by one of the authors of the World Coordinate System (WCS) definition in the FITS standard24, Mark Calabretta. It might be already built and ready in your distribution’s package management system. However, here the installation from source is explained, for the advantages of installation from source please see Mandatory dependencies. To install WCSLIB you will need to have CFITSIO already installed, see CFITSIO.

WCSLIB also has plotting capabilities which use PGPLOT (a plotting library for C). If you want to use those capabilities in WCSLIB, PGPLOT provides the PGPLOT installation instructions. However, PGPLOT is old25, so its installation is not easy; there are also many great modern WCS plotting tools (mostly written in Python). Hence, if you will not be using those plotting functions in WCSLIB, you can configure it with the --without-pgplot option as shown below. Let’s assume you have downloaded wcslib.tar.bz2 and are in the same directory:

$ tar xf wcslib.tar.bz2
$ cd wcslib-X.X                    # Replace X.X with version number
$ ./configure --without-pgplot LIBS="-pthread -lm" --disable-fortran
$ make
$ make check
$ sudo make install

Next: , Previous: , Up: Dependencies   [Contents][Index]

3.1.2 Optional dependencies

The libraries listed here are only used for very specific applications; if you don’t need those operations, Gnuastro will be built and installed without them, and you won’t need to install these dependencies.

If the ./configure script can’t find these requirements, it will warn you at the end that they are not present and notify you of the operation(s) you won’t be able to do because of that. If the output you request from a program requires a missing library, that program will warn you and abort. The case of run-time executables like GPL Ghostscript is different: they are not needed at build time, only called by the built program at run time, so if you install them at a later time, the relevant feature will simply start working. For a missing optional library, however, you will have to rebuild and reinstall Gnuastro after installing it for the new feature to take effect.

libgit2

Git is one of the most common version control systems (see Version controlled source). When libgit2 is present, and Gnuastro’s programs are run within a version controlled directory, outputs will contain the version number of the working directory’s repository for future reproducibility. See the COMMIT keyword header in Output headers for a discussion.

libjpeg

libjpeg is only used by ConvertType to read from and write to JPEG images. libjpeg is a very basic library that provides tools to read and write JPEG images; most GNU/Linux graphics programs and libraries use it, so you most probably already have it installed. libjpeg-turbo is an alternative to libjpeg; it uses SIMD instructions (for example on x86 and ARM systems) to significantly decrease the processing time of the JPEG compression and decompression algorithms.

GPL Ghostscript

GPL Ghostscript’s executable (gs) is used by ConvertType to compile a PDF file from a source PostScript file, see ConvertType. It is only called at run time, so its headers (and libraries) are not needed. With a very high probability you already have it in your GNU/Linux distribution. Unfortunately, it does not follow the standard GNU build style, so installing it from source is very hard; it is best to rely on your distribution’s package manager for this.


Previous: , Up: Dependencies   [Contents][Index]

3.1.3 Bootstrapping dependencies

Bootstrapping is only necessary if you have decided to obtain the full version controlled history of Gnuastro, see Version controlled source and Bootstrapping. Using the version controlled source enables you to always be up to date with the most recent development work of Gnuastro (bug fixes, new functionalities, improved algorithms, and so on). If you have downloaded a tarball (see Downloading the source), then you can ignore this subsection.

To successfully run the bootstrapping process, there are some additional dependencies beyond those discussed in the previous subsections. These are low-level tools that are used by a large collection of Unix-like operating system programs, so they are most probably already available on your system. If they are not already installed, you should be able to easily find them in any GNU/Linux distribution’s package management system (apt-get, yum, pacman, and so on). The short names in parentheses after each package name can be used to search for them in your package manager. For the GNU Portability Library, GNU Autoconf Archive, and TeX Live, it is recommended to use the instructions here, not your operating system’s package manager.

GNU Portability Library (Gnulib)

To ensure portability to a wider range of operating systems (those that don’t include the GNU C library, glibc), Gnuastro depends on the GNU Portability Library, or Gnulib. Gnulib keeps a copy of all the functions in glibc, implemented (as much as possible) to be portable to other operating systems. The bootstrap script can automatically clone Gnulib (as a gnulib/ directory inside Gnuastro); however, as described in Bootstrapping, this is not recommended.

The recommended way to bootstrap Gnuastro is to first clone Gnulib and the Autoconf archives (see below) into a local directory outside of Gnuastro. Let’s call it DEVDIR26 (which you can set to any directory). Currently in Gnuastro, both Gnulib and Autoconf archives have to be cloned in the same top directory27 like the case here28:

$ DEVDIR=/home/yourname/Development
$ cd $DEVDIR
$ git clone git://git.sv.gnu.org/gnulib.git
$ git clone git://git.sv.gnu.org/autoconf-archive.git

You now have the full version controlled source of these two repositories in separate directories. Both these packages are regularly updated, so every once in a while, you can run $ git pull within them to get any possible updates.
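For example (assuming the DEVDIR variable defined above is still set in your current shell), the update commands might look like this:

$ cd $DEVDIR/gnulib           && git pull
$ cd $DEVDIR/autoconf-archive && git pull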

GNU Automake (automake)

GNU Automake will build the Makefile.in files in each sub-directory using the (hand-written) Makefile.am files. The Makefile.ins are subsequently used to generate the Makefiles when the user runs ./configure before building.

GNU Autoconf (autoconf)

GNU Autoconf will build the configure script using the configurations we have defined (hand-written) in configure.ac.

GNU Autoconf Archive

These are a large collection of tests that can be called to run at ./configure time. See the explanation under GNU Portability Library above for instructions on obtaining it and keeping it up to date.

GNU Libtool (libtool)

GNU Libtool is in charge of building all the libraries in Gnuastro. The libraries contain functions that are used by more than one program and are installed for use in other programs. They are thus put in a separate directory (lib/).

GNU help2man (help2man)

GNU help2man is used to convert the output of the --help option (see --help) to the traditional Man page (see Man pages).

LaTeX and some TeX packages

Some of the figures in this book are built by LaTeX (using the PGF/TikZ package). For easy maintenance, the LaTeX source of those figures is version controlled, not the actual figure files, so the ./bootstrap script will run LaTeX to build the figures. The best way to install LaTeX and all the necessary packages is through TeX Live, a package manager for TeX-related tools that is independent of any operating system. It is thus preferred over the TeX Live versions distributed by your operating system.

To install TeX Live, go to its webpage and download the appropriate installer by following the “download” link. Note that by default the full package repository will be downloaded and installed (around 4 gigabytes), which can take very long to download and to update later. However, most packages are not needed by everyone; it is easier, faster, and better to install only the “Basic scheme” (consisting of only the most basic TeX and LaTeX packages, which is less than 200 megabytes)29.

After the installation, be sure to set the environment variables as suggested at the end of the installer’s output. Any time you encounter a package you don’t have, simply install it with a command like the one below (similar to how you install software from your operating system’s package manager)30. To install all the necessary TeX packages for a successful Gnuastro bootstrap, run this command:

$ su
# tlmgr install epsf jknapltx caption biblatex biber iftex \
                etoolbox logreq xstring xkeyval pgf ms     \
                xcolor pgfplots times rsfs pstools epspdf

ImageMagick (imagemagick)

ImageMagick is a wonderful and robust program for image manipulation on the command-line. bootstrap uses it to convert the book’s images into the formats necessary for the book’s various output formats.


Next: , Previous: , Up: Installation   [Contents][Index]

3.2 Downloading the source

Gnuastro’s source code can be downloaded in two ways: as a tarball, ready to be configured and installed on your system (as described in Quick start), see Release tarball. If you want official releases of stable versions, this is the best, easiest, and most common option. Alternatively, you can clone the version controlled history of Gnuastro, run one extra bootstrapping step, and then follow the same steps as for the tarball. This will give you access to all the most recent work that will be included in the next release, along with the full project history. The process is thoroughly introduced in Version controlled source.


Next: , Previous: , Up: Downloading the source   [Contents][Index]

3.2.1 Release tarball

A release tarball (commonly compressed) is the most common way of obtaining free and open source software. A tarball is a snapshot of one particular moment in the Gnuastro development history along with all the necessary files to configure, build, and install Gnuastro easily (see Quick start). It is very straightforward and requires the smallest set of dependencies (see Mandatory dependencies). Gnuastro has tarballs for official stable releases and pre-releases for testing. See Version numbering for more on the two types of releases and the formats of the version numbers. The URLs for each type of release are given below.

Official stable releases (http://ftp.gnu.org/gnu/gnuastro):

This URL hosts the official stable releases of Gnuastro. Always use the most recent version (see Version numbering). If you click on the “Last modified” title of the second column, the files will be sorted by date, which is another way to find the latest version. It is recommended to use a mirror to download these tarballs; please visit http://ftpmirror.gnu.org/gnuastro/ and see below.

Pre-release tarballs (http://alpha.gnu.org/gnu/gnuastro):

This URL contains unofficial pre-release versions of Gnuastro, for enthusiasts to try out before an official release. If there are problems or bugs, the testers can inform the developers so they are fixed before the next official release. See Version numbering to understand how the version numbers here are formatted. If you want to remain even more up-to-date with the development activities, please clone the version controlled source as described in Version controlled source.

Gnuastro’s official/stable tarball is released with two formats: Gzip (with suffix .tar.gz) and Lzip (with suffix .tar.lz). The pre-release tarballs (after version 0.3) are released only as an Lzip tarball. Gzip is a very well-known and widely used compression program created by GNU and available in most systems. However, Lzip provides a better compression ratio and more robust archival capacity. For example Gnuastro 0.3’s tarball was 2.9MB and 4.3MB with Lzip and Gzip respectively, see the Lzip webpage for more. Lzip might not be pre-installed in your operating system, if so, installing it from your operating system’s package manager or from source is very easy and fast (it is a very small program).

The GNU FTP server is mirrored (has backups) in various locations around the globe (http://www.gnu.org/order/ftp.html). You can use the mirror closest to your location for a faster download. Note that only some mirrors keep track of the pre-release (alpha) tarballs. Also note that if you want to download immediately after an announcement (see Announcements), the mirrors might need some time to synchronize with the main GNU FTP server.


Previous: , Up: Downloading the source   [Contents][Index]

3.2.2 Version controlled source

The publicly distributed Gnuastro tarball (for example gnuastro-X.X.tar.gz) does not contain the revision history; it is only a snapshot of the source code at one significant instant of Gnuastro’s history (specified by the version number, see Version numbering), ready to be configured and built. To develop successfully, the revision history of the code can be very useful for tracking when something was added or changed; it may also contain updates that are not yet officially released.

We use Git for the version control of Gnuastro. For those who are not familiar with it, we recommend the Pro Git31 book. The whole book is publicly available for online reading and downloading and does a wonderful job at explaining the concepts and best practices.

Let’s assume you want to keep Gnuastro in the TOPGNUASTRO directory (can be any directory, change the value below). The full version controlled history of Gnuastro can be cloned in TOPGNUASTRO/gnuastro by running the following commands32:

$ TOPGNUASTRO=/home/yourname/Research/projects/
$ cd $TOPGNUASTRO
$ git clone git://git.sv.gnu.org/gnuastro.git

The $TOPGNUASTRO/gnuastro directory will contain the hand-written (version controlled) source code for Gnuastro’s programs, libraries, this book, and the tests. All are divided into sub-directories with standard and very descriptive names. The version controlled files in the top cloned directory are either mainly in capital letters (for example THANKS and README) or mainly in lower-case letters (for example configure.ac and Makefile.am). The former are non-programming, standard writing for human readers containing high-level information about the whole package. The latter are instructions to customize the GNU build system for Gnuastro.

The cloned Gnuastro source cannot immediately be configured, compiled, or installed since it only contains hand-written files, not the automatically generated or imported files which do all the hard work of the build process. See Bootstrapping for the process of generating and importing those files (it is very easy!). Once you have bootstrapped Gnuastro, you can run the standard procedures (in Quick start). Very soon after you have cloned it, Gnuastro’s main master branch will be updated on the main repository (since the developers are actively working on Gnuastro); for the best practices in keeping your local history in sync with the main repository, see Synchronizing.


Next: , Previous: , Up: Version controlled source   [Contents][Index]

3.2.2.1 Bootstrapping

The version controlled source code lacks the source files that we have not written or that are automatically built. These automatically generated files are included in the distributed tarball for each release (for example gnuastro-X.X.tar.gz, see Version numbering) and make it easy to immediately configure, build, and install Gnuastro. However, from the perspective of version control, they are just bloatware and sources of confusion (since they are not changed by Gnuastro developers).

The process of automatically building and importing necessary files into the cloned directory is known as bootstrapping. All the instructions for an automatic bootstrapping are available in bootstrap and configured using bootstrap.conf. bootstrap is the only file not written by Gnuastro developers but is under version control to enable simple bootstrapping immediately after cloning. It is maintained by the GNU Portability Library (Gnulib) and this file is an identical copy, so do not make any changes in this file since it will be replaced when Gnulib releases an update. Make all your changes in bootstrap.conf.

The bootstrapping process has its own separate set of dependencies, the full list is given in Bootstrapping dependencies. They are generally very low-level and used by a very large set of commonly used programs, so they are probably already installed on your system. The simplest way to bootstrap Gnuastro is to simply run the bootstrap script within your cloned Gnuastro directory as shown below. However, please read the next paragraph before doing so (see Version controlled source for TOPGNUASTRO).

$ cd TOPGNUASTRO/gnuastro
$ ./bootstrap                      # Requires internet connection

Without any options, bootstrap will clone Gnulib within your cloned Gnuastro directory (TOPGNUASTRO/gnuastro/gnulib) and download the necessary Autoconf archive macros. So if you run bootstrap like this, you will need an internet connection every time you decide to bootstrap. Also, Gnulib is a large package and cloning it can be slow. It will also keep the full Gnulib repository within your Gnuastro repository, so if another one of your projects also needs Gnulib and you insist on running bootstrap like this, you will have two copies. If you regularly back up your important files, Gnulib will also slow down the backup process. Therefore, while the simple invocation above can be used with no problem, it is not recommended. To do better, see the next paragraph.

The recommended way to get these two packages is thoroughly discussed in Bootstrapping dependencies (in short: clone them in the separate DEVDIR/ directory). The following commands will take you into the cloned Gnuastro directory and run the bootstrap script while telling it to copy some files instead of making symbolic links (with the --copy option; this is not mandatory33) and where to look for Gnulib (with the --gnulib-srcdir option).

$ cd $TOPGNUASTRO/gnuastro
$ ./bootstrap --copy --gnulib-srcdir=$DEVDIR/gnulib

Since Gnulib and Autoconf archives are now available in your local directories, you don’t need an internet connection every time you decide to remove all untracked files and redo the bootstrap (see box below). You can also use the same command on any other project that uses Gnulib. All the necessary GNU C library functions, Autoconf macros and Automake inputs are now available along with the book figures. The standard GNU build system (Quick start) will do the rest of the job.

Undoing the bootstrap: During the development, it might happen that you want to remove all the automatically generated and imported files. In other words, you might want to reverse the bootstrap process. Fortunately Git has a good program for this job: git clean. Run the following command and every file that is not version controlled will be removed.

$ git clean -fxd

It is best to commit any recent changes before running this command. You might have created new files since the last commit; if they haven’t been committed, they will all be gone forever (removed with rm). To get a list of the non-version-controlled files instead of deleting them, add the -n option to git clean, so it becomes -fxdn.
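For example, a cautious two-step invocation would be:

$ git clean -fxdn    # Only list the files that would be removed.
$ git clean -fxd     # Actually remove them.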

Besides bootstrap and bootstrap.conf, the bootstrapped/ directory and the README-hacking file are also related to the bootstrapping process. The former hosts all the imported (bootstrapped) directories. Thus, in the version controlled source it only contains a README file, but in the distributed tarball it also contains sub-directories filled with all the bootstrapped files. README-hacking contains a summary of the bootstrapping process discussed in this section. It is a necessary reference when you haven’t built this book yet, and it is thus not distributed in the Gnuastro tarball.


Previous: , Up: Version controlled source   [Contents][Index]

3.2.2.2 Synchronizing

The bootstrapping script (see Bootstrapping) is not regularly needed: you mainly need it after you have cloned Gnuastro (once) and whenever you want to re-import the files from Gnulib or the Autoconf archives34 (not too common). However, Gnuastro developers are constantly working on Gnuastro and pushing their changes to the official repository, so your local Gnuastro clone will soon become outdated. Gnuastro has two mailing lists dedicated to its development activities (see Developing mailing lists). Subscribing to them can help you decide when to synchronize with the official repository.

To pull all the most recent work in Gnuastro, run the following command from the top Gnuastro directory:

$ git pull && autoconf -f

GNU Autoconf is part of the GNU build system and will update the ./configure script based on the hand-written configurations (in configure.ac, which is version controlled in Gnuastro). The pulled changes might contain changes in the build system configurations. However, the most important reason for running this command is to generate a version number for your Gnuastro snapshot. This generated version number will include the commit information when you are building Gnuastro from any point in Gnuastro’s history (see Version numbering). Since the version number is included in nearly all outputs of the programs, it can later help you exactly reproduce an old result by checking out the exact point in Gnuastro’s history that produced it. Therefore, be sure to run ‘autoconf -f’ after every synchronization. You can also run the two commands separately:

$ git pull
$ autoconf -f

If you would like to see what has changed since you last synchronized your local clone, you can take the following steps instead of the simple command above (don’t type anything after #):

$ git checkout master             # Confirm if you are on master.
$ git fetch origin                # Fetch all new commits from server.
$ git log master..origin/master   # See all the new commit messages.
$ git merge origin/master         # Update your master branch.
$ autoconf -f                     # Update ./configure.

By default git log prints the most recent commit first; add the --reverse option to see the changes chronologically. To see exactly what has been changed in the source code along with each commit message, add the -p option to git log.
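For example, combining the two with the range used in the step-by-step list above shows the new commits chronologically, each with its changes:

$ git log --reverse -p master..origin/master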

If you intend to make changes in the code, have a look at Developing to get started easily. Be sure to commit your changes in a separate branch (keep your master branch following the official repository) and re-run autoconf -f after each commit. You are welcome to send your changes to us (see Contributing to Gnuastro) for the benefit of the whole community. If you send your work to us, you can safely use your commit since it will ultimately be recorded in Gnuastro’s official history. If not, please upload your separate branch to a public hosting service (for example GitLab, see Forking tutorial) and link to it in your report, or run make distcheck and upload the output gnuastro-X.X.X.XXXX.tar.gz to a publicly accessible webpage so your results can be considered scientific (reproducible).


Previous: , Up: Installation   [Contents][Index]

3.3 Build and install

This section is basically a longer explanation of the sequence of commands given in Quick start. If you didn’t have any problems during the Quick start steps, you want all of Gnuastro’s programs installed on your system, you don’t want to change the executable names during or after installation, you have root access to install the programs in the default system-wide directory, and the Letter paper size of the printed book is fine for you (in summary: everything is working and you don’t feel like going into the details), you can safely skip this section.

If you have any of the above problems, or you want to understand the details for better control over your build and installation, read on. The dependencies you will need prior to configuring, building, and installing Gnuastro are explained in Dependencies. The first three steps in Quick start need no extra explanation, so we will skip them and start with an explanation of Gnuastro-specific configuration options and a discussion of the installation directory in Configuring, followed by some smaller subsections: Tests, A4 print book, and Known issues, the last of which explains known problems you might encounter in the installation steps and ways to solve them.


Next: , Previous: , Up: Build and install   [Contents][Index]

3.3.1 Configuring

The $ ./configure step is the most important step in the build and install process. All the required packages, libraries, headers and environment variables are checked in this step. The behaviors of make and make install can also be set through command line options to this command.

The configure script accepts various arguments and options which enable the final user to highly customize whatever she is building. The options to configure are generally very similar to normal program options explained in Arguments and options. Similar to all GNU programs, you can get a full list of the options along with a short explanation by running:

$ ./configure --help

A complete explanation is also included in the INSTALL file. Note that this file was written by the authors of GNU Autoconf (which builds the configure script), so it is common to all programs which use the $ ./configure script for building and installing, not just Gnuastro. Here we only discuss the cases where you don’t have super-user access to the system or want to change the executable names. But before that, the options to configure that are particular to Gnuastro are reviewed.


Next: , Previous: , Up: Configuring   [Contents][Index]

3.3.1.1 Gnuastro configure options

Most of the options to configure (which are to do with building) are similar for every program which uses this script. Here the options that are particular to Gnuastro are discussed. The next topics explain the usage of other configure options which can be applied to any program using the GNU build system (through the configure script).

--enable-progname

Only build and install progname along with any other program that is enabled in this fashion. progname is the name of the executable without the ast, for example crop for Crop (with the executable name of astcrop). If this option is called for any of the programs in Gnuastro, any program which is not explicitly enabled will not be built or installed.

--disable-progname
--enable-progname=no

Do not build or install the program named progname. This is the complement of --enable-progname above: all the other programs will be built and installed except this one.
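As an illustration of the two styles (Crop and Arithmetic are used here purely as examples):

$ ./configure --enable-crop            # Build/install only Crop (astcrop).
$ ./configure --disable-arithmetic     # Build/install all but Arithmetic.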

--enable-bin-op-uint8
--enable-bin-op-int8
--enable-bin-op-uint16
--enable-bin-op-int16
--enable-bin-op-uint32
--enable-bin-op-int32
--enable-bin-op-uint64
--enable-bin-op-int64
--enable-bin-op-float32
--enable-bin-op-float64

Enable the binary data-structure operators to work natively on the respective type of data (u stands for unsigned types, see Numeric data types). Some are compiled by default; to disable them (or any other type), either run --enable-bin-op-TYPE=no, or run --disable-bin-op-TYPE. The final list of enabled/disabled types can be inspected in the output of ./configure (close to the end).

Binary operators, for example + or > (greater than), are some of the most common operators in the Arithmetic program or the data_arithmetic function in the Gnuastro library. To operate most efficiently (as fast as possible without using extra memory or CPU resources), it is best to rely on the native types of the input data. For example, if you want to add an integer array to a floating point array, using the native types means relying on the system’s internal type conversion for each array element, see Invoking Arithmetic. If we can’t use the native operators, then the integer array has to be copied into a new array of the same type as the floating point array before the operation. This will consume memory (to copy the integer array into a new float array) and CPU (integer types need much less processing) resources and ultimately slow down the running.

There are many binary operators, and in order to have them operate natively on each of the above types, the compiler has to prepare for all the different combinations of these types. This can greatly slow down the compilation35 (when you run make). For example, with only one type, make will finish in less than a minute, but if you enable all types, it can take roughly half an hour. However, the profits of this one-time investment at compilation time will be directly felt (more significantly on large images/datasets) each time you run Gnuastro’s programs or libraries, because no internal type conversion will be necessary.

If build time is important for you (mainly developers), disabling shared libraries and optimizations (as in Building and debugging) is the first step to take. If you commonly work with very specific data-types, you can enable them (and disable the default types that you don’t need) with these configuration options. Since the outputs of comparison operators are unsigned char (or uint8_t) type and most astronomical datasets are in single precision (32-bit) floating point (float), the recommended minimum enabled types are uint8 and float32.
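For example, a configure call that explicitly enables only the two recommended minimum types discussed above might be:

$ ./configure --enable-bin-op-uint8 --enable-bin-op-float32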

Package managers of GNU/Linux distributions, who compile once for a large audience of users who just download the compiled executables, are recommended to enable all types to help their users.

--enable-bin-op-alltypes

Enable native binary arithmetic operation on all types, see the description above for the various types for a full discussion. As discussed there, enabling all types can greatly speed up arithmetic operations on any arbitrary dataset, but will also slow down the building time of Gnuastro. Recall that in practice this only affects the Arithmetic program and the gal_arithmetic library binary operators, nothing else. This option is strongly recommended when you are building Gnuastro to be included in a package manager of a GNU/Linux distribution (or other operating systems).

--enable-gnulibcheck

Enable checks on the GNU Portability Library (Gnulib). Gnulib is used by Gnuastro to enable users of non-GNU-based operating systems (that don’t use the GNU C library, or glibc) to compile and use the advanced features that this library provides; we make extensive use of such functions. If you give this option to $ ./configure, then when you run $ make check, the functions in Gnulib will be tested first, then the Gnuastro executables. If your operating system does not support glibc or has an older version of it, and you have problems in the build process ($ make), you can give this flag to configure to see if the problem is caused by Gnulib not supporting your operating system, or by Gnuastro itself; see Known issues.
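For example, a sketch of the order of commands (the Gnulib tests will run before Gnuastro’s own, as described above):

$ ./configure --enable-gnulibcheck
$ make
$ make check        # Gnulib's tests run first, then Gnuastro's.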

--disable-guide-message
--enable-guide-message=no

Do not print a guiding message during the GNU build process of Quick start. By default, after each step, a message is printed telling the user what the next command should be: after ./configure, it suggests running make; after make, it suggests running make check, and so on. If Gnuastro is configured with this option, for example

$ ./configure --disable-guide-message

then these messages will not be printed after any step (as in most programs). For people who are not yet fully accustomed to this build system, these guidelines can be very useful and encouraging; however, if you find the messages annoying, use this option.

Note: If some programs are enabled and some are disabled, the result is the same as simply enabling those that were enabled: listing the disabled programs is redundant.

Note that the tests of some programs might require other programs to have been installed and tested first. For example, MakeProfiles is the first program to be tested when you run $ make check; it provides the inputs to all the other tests. So if you don’t enable MakeProfiles, the tests for all the other programs will be skipped or fail. To avoid this, in one run you can enable all the programs and run the tests, but not install. If everything works correctly, you can run configure again with only the programs you want, then build and install directly without running the tests.
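For example, such a two-stage build might look like the sketch below (Crop is used here only for illustration):

$ ./configure                     # All programs enabled.
$ make && make check              # Build and test everything.
$ ./configure --enable-crop       # Re-configure with only what you need.
$ make && sudo make install       # Build and install, skipping the tests.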


Next: , Previous: , Up: Configuring   [Contents][Index]

3.3.1.2 Installation directory

One of the most commonly used options to ./configure is --prefix; it is used to define the directory that will host all the installed files (or the “prefix” in their final absolute file name). One common example is when you are using a server and don’t have administrator or root access: without the --prefix option, you won’t be able to install the built files at all (the default directories require root access). However, once you prepare your startup file to look into the proper place (as discussed thoroughly below), you will be able to easily use this option and benefit from any software you want to install, without having to ask the system administrators or settle for a different version of a software that is already installed on the server.

The most basic way to run an executable is to explicitly write its full file name (including all the directory information) and run it. One example is running the configuration script with the $ ./configure command (see Quick start). By giving a specific directory (the current directory or ./), we are explicitly telling the shell to look in the current directory for an executable file named ‘configure’. Directly specifying the directory is thus useful for executables in the current (or nearby) directories. However, when the program (an executable file) is to be used a lot, specifying all those directories will become a significant burden. For example, the ls executable lists the contents in a given directory and it is (usually) installed in the /usr/bin/ directory by the operating system maintainers. So each time you want to use it you would have to run the following command (which is very inconvenient, both in writing and in remembering the various directories).

$ /usr/bin/ls

To address this problem, we have the PATH environment variable. To understand it better, we will start with a short introduction to shell variables. Shell variable values are basically treated as strings of characters; for example, it doesn’t matter if the value is a name (string of alphabetic characters) or a number (string of numeric characters). You can define a variable and a value for it by running:

$ myvariable1=a_test_value
$ myvariable2="a test value"

As you see above, if the value contains white-space characters, you have to put the whole value (including the white space) in double quotes ("). You can see the value a variable represents by running:

$ echo $myvariable1
$ echo $myvariable2

If a variable has no value or it wasn’t defined, the last command will only print an empty line. A variable defined like this will be known as long as this shell or terminal is running. Other terminals will have no idea it existed. The main advantage of shell variables is that if they are exported36, subsequent programs that are run within that shell can access their value. So by changing their value, you can change the “environment” of a program which uses them. The shell variables which are accessed by programs are therefore known as “environment variables”37. You can see the full list of exported variables that your shell recognizes by running:

$ printenv

HOME is one commonly used environment variable; it is the top directory of the user that is logged in. Try finding it in the output of the command above. It is used so often that the shell has a special expansion (alternative) for it: ‘~’. Whenever you see file names starting with the tilde sign, it actually represents the value of the HOME environment variable, so ~/doc is the same as $HOME/doc.
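For example, you can check this equivalence yourself on the command-line:

$ echo ~            # Prints the value of HOME.
$ echo $HOME/doc    # Same directory as ~/doc.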

Another of the most commonly used environment variables is PATH: a list of directories (separated by a colon, ‘:’) to search for executable names. When the address of the executable is not explicitly given (like ./configure above), the system will look for it in the directories specified by PATH. If you have a computer nearby, try running the following command to see which directories your system looks into when searching for executable (binary) files; one example output is printed here (notice how /usr/bin, from the ls example above, is one of the directories in PATH):

$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/bin

By default, PATH usually contains system-wide directories which are readable (but not writable) by all users, like in the example above. Therefore, if you don’t have root (or administrator) access, you need to add another directory to PATH which you actually have write access to. The standard directory where you can keep installed files (not just executables) for your own user is ~/.local/. The names of hidden files start with a ‘.’ (dot), so this directory will not show up in your common command-line listings or in the graphical user interface. You can use any other directory, but this is the most recognized.

The top installation directory will be used to keep all the package’s components: programs (executables), libraries, include (header) files, shared data (like manuals), or configuration files (see Review of library fundamentals for a thorough introduction to headers and linking). So it commonly has some of the following sub-directories for each class of installed components respectively: bin/, lib/, include/, man/, share/, etc/. Since the PATH variable is only used for executables, you can add the ~/.local/bin directory (which keeps the executables/programs, or more generally “binary” files) to PATH with the following command. As defined below, first the existing value of PATH is used, then your given directory is appended to its end, and the combined value is put back in PATH (run ‘$ echo $PATH’ afterwards to check if it was added).

$ PATH=$PATH:~/.local/bin

Any executable that you installed in ~/.local/bin will now be usable without having to remember and write its full address. However, as soon as you leave/close your current terminal session, this modified PATH variable will be forgotten. Adding the directories which contain executables to the PATH environment variable each time you start a terminal is also very inconvenient and prone to errors. Fortunately, there are standard ‘startup files’ defined by your shell precisely for this purpose (among others). There is a special startup file for every significant starting step:

/etc/profile and everything in /etc/profile.d/

These startup scripts are called when your whole system starts (for example after you turn on your computer). Therefore you need administrator or root privileges to access or modify them.

~/.bash_profile

If you are using (GNU) Bash as your shell, the commands in this file are run once every time you log in to your account.

~/.bashrc

If you are using (GNU) Bash as your shell, the commands here will be run each time you start a terminal (for example, when you open your terminal emulator in the graphic user interface).

For security reasons, it is highly recommended to type the value of your home directory directly into startup files instead of using variables. So in the following, let’s assume your user name is ‘name’ (so ~ may be replaced with /home/name). To add ~/.local/bin to your PATH automatically on startup, you have to “export” the new value of PATH in the startup file that is most relevant to you by adding this line:

export PATH=$PATH:/home/name/.local/bin
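
If you use Bash and choose ~/.bashrc, for example, the following one-liner appends that line for you (the single quotes are important: they stop the shell from expanding $PATH now, instead of at startup; /home/name is the assumed home directory as above):

$ echo 'export PATH=$PATH:/home/name/.local/bin' >> ~/.bashrc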

Now that you know your system will look into ~/.local/bin for executables, you can tell Gnuastro’s configure script to install everything in the top ~/.local directory using the --prefix option. When you subsequently run $ make install, all the install-able files will be put in their respective directory under ~/.local/ (the executables in ~/.local/bin, the compiled library files in ~/.local/lib, the library header files in ~/.local/include and so on, to learn more about these different files, please see Review of library fundamentals). Note that tilde (‘~’) expansion will not happen if you put a ‘=’ between --prefix and ~/.local38, so we have avoided the = character here which is optional in GNU-style options, see Options.

$ ./configure --prefix ~/.local

You can install everything (including libraries like GSL, CFITSIO, or WCSLIB which are Gnuastro’s mandatory dependencies, see Mandatory dependencies) locally by configuring them as above. However, recall that PATH is only for executable files, not libraries, and that libraries can also depend on other libraries. For example WCSLIB depends on CFITSIO and Gnuastro needs both. Therefore, when you install a library in a non-recognized directory, you have to guide the programs that depend on it to the necessary library and header file directories. To do that, you define the LDFLAGS and CPPFLAGS environment variables respectively. This can be done while calling ./configure as shown below:

$ ./configure LDFLAGS=-L/home/name/.local/lib            \
              CPPFLAGS=-I/home/name/.local/include       \
              --prefix ~/.local

It can be annoying (and error-prone) to do this when configuring every software package that depends on such libraries. Hence, you can define these two variables in the most relevant startup file (discussed above). The convention for these variables doesn’t use a colon to separate values (as PATH-like variables do); they use white-space characters, and each value is prefixed with a compiler option39: note the -L and -I above (see Options; for -I see Headers, and for -L, see Linking). Therefore we have to keep the value in double quotation signs to preserve the white-space characters, and add the following two lines to the startup file of choice:

export LDFLAGS="$LDFLAGS -L/home/name/.local/lib"
export CPPFLAGS="$CPPFLAGS -I/home/name/.local/include"

Dynamic libraries are linked to the executable every time you run a program that depends on them (see Linking to fully understand this important concept). Hence dynamic libraries also require a special path variable called LD_LIBRARY_PATH (same formatting as PATH). To use programs that depend on these libraries, you need to add ~/.local/lib to your LD_LIBRARY_PATH environment variable by adding the following line to the relevant start-up file:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/name/.local/lib

If you also want to access the Info (see Info) and man page (see Man pages) documentation, add ~/.local/share/info and ~/.local/share/man to your INFOPATH40 and MANPATH environment variables respectively.
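
For example (again assuming the home directory /home/name), lines like the following in your startup file should be enough; the exact paths are an assumption based on the --prefix value used in this section:

export INFOPATH=$INFOPATH:/home/name/.local/share/info
export MANPATH=$MANPATH:/home/name/.local/share/man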

A final note is that order matters in the directories that are searched for all the variables discussed above. In the examples above, the new directory was added after the system-specified directories. So if a program, library, or manual is found in the system-wide directories, the user directory is never searched. If you want your local installation to be searched first, put the new directory before the existing list, like the example below.

export LD_LIBRARY_PATH=/home/name/.local/lib:$LD_LIBRARY_PATH

This is good when a library, for example CFITSIO, is already present on the system, but the system-wide install wasn’t configured with the correct configuration flags (see CFITSIO), or you want to use a newer version and you don’t have administrator or root access to update it on the whole system/server. If you update LD_LIBRARY_PATH by placing ~/.local/lib first (like above), the linker will first find the CFITSIO you installed for yourself and link with it. It thus will never reach the system-wide installation.

There are important security problems with using local installations first: all important system-wide executables and libraries (important executables like ls and cp, or libraries like the C library) can be replaced by non-secure versions with the same file names and put in the customized directory (~/.local in this example). So if you choose to search in your customized directory first, please be sure to keep it clean from executables or libraries with the same names as important system programs or libraries.

Summary: When you are using a server which doesn’t give you administrator/root access AND you would like to give priority to your own built programs and libraries, not the version that is (possibly already) present on the server, add these lines to your startup file. See above for which startup file is best for your case and for a detailed explanation on each. Don’t forget to replace ‘/YOUR-HOME-DIR’ with your home directory (for example ‘/home/your-id’):

export PATH="/YOUR-HOME-DIR/.local/bin:$PATH"
export LDFLAGS="-L/YOUR-HOME-DIR/.local/lib $LDFLAGS"
export MANPATH="/YOUR-HOME-DIR/.local/share/man/:$MANPATH"
export CPPFLAGS="-I/YOUR-HOME-DIR/.local/include $CPPFLAGS"
export INFOPATH="/YOUR-HOME-DIR/.local/share/info/:$INFOPATH"
export LD_LIBRARY_PATH="/YOUR-HOME-DIR/.local/lib:$LD_LIBRARY_PATH"

Afterwards, you just need to add an extra --prefix=/YOUR-HOME-DIR/.local to the ./configure command of the software that you intend to install. Everything else will be the same as a standard build and install, see Quick start.
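
For example, the configure step of such a build would simply look like this:

$ ./configure --prefix=/YOUR-HOME-DIR/.local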


Next: , Previous: , Up: Configuring   [Contents][Index]

3.3.1.3 Executable names

At first sight, the names of the executables for each program might seem to be uncommonly long, for example astnoisechisel or astcrop. We could have chosen terse (and cryptic) names like most programs do. We chose this complete naming convention (something like the commands in TeX) so you don’t have to spend too much time remembering what the name of a specific program was. Such complete names also enable you to easily search for the programs.

To facilitate typing the names, we suggest using shell auto-completion. With this facility, you can find the executable you want very easily. It works very similarly to file name completion in the shell. For example, simply by typing the letters below (where [TAB] stands for the Tab key on your keyboard)

$ ast[TAB][TAB]

you will get the list of all the available executables that start with ast in your PATH environment variable directories. So, all the Gnuastro executables installed on your system will be listed. Typing the next letter of the specific program you want, along with a Tab, will limit this list until you get to your desired program.

In case all of this does not convince you and you still want to type short names, some suggestions are given below. Keep in mind, though, that if you are writing a shell script that you might want to pass on to others, it is best to use the standard name because other users might not have adopted the same customizations. The long names also serve as a form of documentation in such scripts. A similar reasoning can be given for option names in scripts: it is good practice to always use the long formats of the options in shell scripts, see Options.

The simplest solution is making a symbolic link to the actual executable. For example, let’s assume you want to type ic to run Crop instead of astcrop. Assuming you installed Gnuastro executables in /usr/local/bin (the default), you can do this simply by running the following command as root:

# ln -s /usr/local/bin/astcrop /usr/local/bin/ic

In case you update Gnuastro and a new version of Crop is installed, the default executable name is the same, so your custom symbolic link still works.
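
If you instead installed Gnuastro under your own home directory with --prefix=$HOME/.local (as discussed earlier in Installation directory), the same trick works inside your own bin/ directory and needs no root access:

$ ln -s ~/.local/bin/astcrop ~/.local/bin/ic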

The installed executable names can also be set using options to $ ./configure, see Configuring. GNU Autoconf (which configures Gnuastro for your particular system) allows the builder to change the names of programs with three options: --program-prefix, --program-suffix and --program-transform-name. The first two are for adding a fixed prefix or suffix to all the programs that will be installed. This will actually make all the names longer! You can use them, for example, to add version numbers to the program names in order to have two executable versions of a program installed simultaneously.

The third configure option allows you to set the executable name at install time using the SED program. SED is a very useful ‘stream editor’. There are various resources on the internet on using it effectively. However, we should caution that using configure options will change the actual executable name of the installed program, so on every re-install (an update for example), you have to also add this option to keep the custom executable name. Also note that the documentation and configuration files keep their standard names either way.

For example, let’s assume that typing ast on every invocation of every program is really annoying you! You can remove this prefix from all the executables at configure time by adding this option (so astcrop, for example, will be installed simply as crop):

$ ./configure --program-transform-name='s/ast//'

Previous: , Up: Configuring   [Contents][Index]

3.3.1.4 Configure and build in RAM

The configure and build process involves the creation, reading, and modification of a large number of files (input/output, or I/O). Therefore file I/O issues can directly affect the work of developers who need to configure and build Gnuastro numerous times. In particular, such a large number of I/O operations is slow, and the repeated reading and writing wears out HDDs and SSDs, shortening their lifetime.

One solution to address both these problems is to use the tmpfs file system. Any file in tmpfs is actually stored in the RAM (and possibly swap), not on HDDs or SSDs. The RAM is built for extensive and fast I/O. Therefore the large number of file I/Os associated with configuring and building will not harm the HDDs or SSDs. Due to the volatile nature of RAM, files in the tmpfs file system will be permanently lost after a power-off. Since all configured and built files are derivative files (not files that have been directly written by hand), this is not a problem and this feature can be considered an automatic cleanup.

The modern GNU C library (and thus the Linux kernel) defines the /dev/shm directory for this purpose (POSIX shared memory). So using the GNU Build System’s ability to build in a separate directory (not necessarily in the source directory), we can configure and build the programs in /dev/shm to benefit from the RAM. To simplify the process, Gnuastro comes with a tmpfs-config-make script. This script will create a directory in the shared memory, and put a symbolic link to it (called build) in the top source directory (backup/sync software therefore only needs to ignore this single link/file). The script will then internally change to that directory and configure and build (make -kjN, where N is the number of threads for a parallel build) Gnuastro in there. To benefit from this script, simply run the following command instead of ./configure and make:

$ ./tmpfs-config-make

After this script is finished, you can ‘cd build’ and run other Make commands (for example, ‘make check’, ‘make install’, or ‘make pdf’) from there. In Emacs, the command to be run with the M-x compile command (by default: make -k) can be changed to ‘cd build; make -kjN’, or ‘make -C build -kjN’ (N is the number of threads; an integer \(\geq1\)). For subsequent builds (during development) the M-x recompile command will also do all the building in the RAM while you modify the clean and backed-up source files, making minimal/efficient use of your non-volatile HDD or SSD.

This script can be used in any software which is configured and built using the GNU Build System. Just copy it in the top source directory of that software and run it from there. The default number of threads and location of the shared memory (/dev/shm) are currently hard-coded within the script. If you need to change them, please open the script with a text editor and set their values manually.
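
For reference, the following is a rough manual sketch of what the script automates (the build directory name and the thread count of 4 are arbitrary assumptions here; gnuastro-X.X stands for your unpacked source directory under TOPGNUASTRO):

$ mkdir /dev/shm/gnuastro-build
$ cd /dev/shm/gnuastro-build
$ TOPGNUASTRO/gnuastro-X.X/configure
$ make -kj4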


Next: , Previous: , Up: Build and install   [Contents][Index]

3.3.2 Tests

After successfully building (compiling) the programs with the $ make command, you can check them before installing. To run the tests, run

$ make check

For every program, some tests are designed to check some possible operations. Running the command above will run those tests and give you a final report. If everything is OK and you have built all the programs, all the tests should pass. In case any of the tests fail, please have a look at Known issues and if that still doesn’t fix your problem, look at the ./tests/test-suite.log file to see if the source of the error is something particular to your system or more general. If you feel it is general, please contact us because it might be a bug. Note that the tests of some programs depend on the outputs of other programs’ tests, so if you have not built them, those tests might be skipped or fail. Prior to releasing every distribution, all these tests are checked. If you have a reasonably modern terminal, the outputs of the successful tests will be colored green and the failed ones will be colored red.
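
For example, you can inspect the test log mentioned above with any pager from the top build directory:

$ less ./tests/test-suite.log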

These scripts can also act as a good set of examples for you to see how the programs are run. All the tests are in the tests/ directory. The tests for each program are shell scripts (ending with .sh) in a sub-directory of this directory with the same name as the program. See Test scripts for more detailed information about these scripts in case you want to inspect them.


Next: , Previous: , Up: Build and install   [Contents][Index]

3.3.3 A4 print book

The default print version of this book is provided in the letter paper size. If you would like to have the print version of this book on paper and you are living in a country which uses A4, then you can rebuild the book. The great thing about the GNU build system is that the book source code which is in Texinfo is also distributed with the program source code, enabling you to do such customizations (hacking).

In order to change the paper size, you will need to have GNU Texinfo installed. Open doc/gnuastro.texi with any text editor. This is the source file that created this book. In the first few lines you will see this line:

@c@afourpaper

In Texinfo, a line is commented with @c. Therefore, un-comment this line by deleting the first two characters such that it changes to:

@afourpaper

Save the file and close it. You can now run

$ make pdf

and the new PDF book will be available in SRCdir/doc/gnuastro.pdf. By changing the pdf in $ make pdf to ps or dvi you can have the book in those formats. Note that you can do this for any book that is in Texinfo format; it might not have the @afourpaper line, so you can add it close to the top of the Texinfo source file yourself.
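
Incidentally, if you prefer not to open an editor, the un-commenting described above can also be done non-interactively, for example with GNU Sed’s in-place (-i) option (this one-liner is only a convenience sketch):

$ sed -i -e 's/@c@afourpaper/@afourpaper/' doc/gnuastro.texi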


Previous: , Up: Build and install   [Contents][Index]

3.3.4 Known issues

Depending on your operating system and the version of the compiler you are using, you might confront some known problems during the configuration ($ ./configure), compilation ($ make) and tests ($ make check). Here, their solutions are discussed.

If your problem was not listed above, please file a bug report (Report a bug).


Next: , Previous: , Up: Top   [Contents][Index]

4 Common program behavior

All the programs in Gnuastro share a set of common behavior mainly to do with user interaction to facilitate their usage. The most basic is how you can configure each program to do what you want: define the input, change parameter/option values, or identify the output. All Gnuastro programs can also read your desired configuration from pre-defined or user-specified files so you don’t have to specify all the (sometimes numerous) parameters on the command-line each time you run a program. These files define the “default” program behavior in each directory, for each user, or on each system. In other cases, some programs can greatly benefit from the many threads available in modern CPUs, so here we’ll also discuss how you can get the most out of your hardware. Among some other issues, we will also discuss how you can get immediate and distraction-free (without taking your hands off the keyboard!) help, or access to this whole book, on the command-line.


Next: , Previous: , Up: Common program behavior   [Contents][Index]

4.1 Command-line

All the programs in Gnuastro are customized through the standard GNU style command-line options. Thus, we’ll start by defining this general style, which is very common in many command-line tools on Unix-like operating systems. Then the options that are common to all the programs in Gnuastro are discussed.

The command-line text that you type is passed onto the shell (or program managing the command-line) as a string of characters. See the “Invoking ProgramName” sections in this manual for some examples of commands with each program, for example Invoking Table. That string is then broken up into separate tokens or words by any metacharacters (like space, tab, |, > or ;) that might exist in the text. To learn more, please see the GNU Bash manual, for the complete list of meta-characters and other GNU Bash definitions (GNU Bash is the most common shell program). Its “Shell Operation” section has a short summary of the steps the shell takes before passing the commands to the program you called.


Next: , Previous: , Up: Command-line   [Contents][Index]

4.1.1 Arguments and options

On the command-line, the first thing you usually enter is the name of the program you want to run. After that, you can specify two types of input: arguments and options. In the GNU-style, arguments are those tokens that are not preceded by any hyphens (-, see Arguments). Here is one example:

$ astcrop --center=53.162551,-27.789676 -w10 --mode=wcs hubble-udf.fits

In this example, the argument is hubble-udf.fits. Arguments are most commonly the input file names containing your data. Options start with one or two hyphens, followed by an identifier for the option (the option’s name) and its value (see Options). Through options you tell the program how to interpret the data. In this example, we are running Crop to crop a region of width 10 arc-seconds centered at the given RA and Dec from the input Hubble Ultra-Deep Field (UDF) FITS image. So options come with an identifier (the option name which is separate from their value).

Arguments can be either mandatory or optional and, unlike options, they don’t have any identifiers. Hence, their order might also matter (for example in cp, which is used for copying one file to another location). The outputs of --usage and --help show which arguments are optional and which are mandatory, see --usage. As their name suggests, options on the command-line can be considered to be optional and most of the time, you don’t have to worry about what order you specify them in. When the order does matter, or the option can be invoked multiple times, it is explicitly mentioned in the “Invoking ProgramName” section of each program.

In case your arguments or option values contain any of the shell’s meta-characters, you have to quote them. If there is only one such character, you can use a backslash (\) before it. If there are multiple, it might be easier to simply put your whole argument or option value inside of double quotes ("). In such cases, everything inside the double quotes will be seen as one token or word.

For example, let’s say you want to specify the header data unit (HDU) of your FITS file using a complex expression like ‘3; images(exposure > 100)’. If you simply add these after the --hdu (-h) option, the programs in Gnuastro will read the value to the HDU option as ‘3’ and run. Then, Bash will attempt to run a separate command ‘images(exposure > 100)’ and complain about a syntax error. This is because the semicolon (;) is an ‘end of command’ character in the shell. To solve this problem you can simply put double quotes around the whole string you want to pass to --hdu as seen below:

$ astcrop --hdu="3; images(exposure > 100)" FITSimage.fits

Alternatively, you can put a ‘\’ before every meta-character in this string, but try doing that and you will probably agree that the double quotes are much easier, more elegant, and more readable.
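
For comparison, the backslash version of the same command would look something like this (every space, semicolon, parenthesis, and greater-than sign needs its own ‘\’):

$ astcrop --hdu=3\;\ images\(exposure\ \>\ 100\) FITSimage.fits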


Next: , Previous: , Up: Arguments and options   [Contents][Index]

4.1.1.1 Arguments

In Gnuastro, arguments are almost exclusively used as the input data file names. Please consult the first few paragraphs of the “Invoking ProgramName” section for each program for a description of what it expects as input, how many arguments, or input data, it accepts, and in what order. Everything particular about how a program treats arguments is explained under the “Invoking ProgramName” section for that program.

Generally, if there is a standard file name extension for a particular format, that extension is used to separate the kinds of arguments: file names ending in .fits (the standard FITS suffix), .fit, .fits.Z, .fits.gz, .fits.fz or .imh (the IRAF image format) are recognized by Gnuastro’s programs as astronomical data. Any argument that doesn’t end with one of these extensions is considered to be a text file (usually catalogs, see Tables). In some cases, a program can accept specific formats, for example ConvertType also accepts .jpg images.

Throughout this book and in the command-line outputs, whenever we want to generalize all such astronomical data formats in a text placeholder, we will use ASTRdata, and we will assume that the extension is also part of this name. Any file ending with these names is directly passed on to CFITSIO to read. Therefore you don’t necessarily have to have these files on your computer; they can also be located on an FTP or HTTP server, see the CFITSIO manual for more information.

CFITSIO has its own error reporting techniques; if your input file(s) cannot be opened or read, those errors will be printed prior to the final error by Gnuastro.


Previous: , Up: Arguments and options   [Contents][Index]

4.1.1.2 Options

Command-line options, as in all GNU/Linux applications, allow configuring the behavior of a program for each particular execution on particular input data. A single option can be called in two ways: long or short. All options in Gnuastro accept the long format, which has two hyphens and can have many characters (for example --hdu). Short options only have one hyphen (-) followed by one character (for example -h). You can see some examples in the list of options in Common options or those for each program’s “Invoking ProgramName” section. Both formats are shown for those which support both; first the short is shown, then the long.

Usually, the short options are for when you are writing on the command-line and want to save keystrokes and time. The long options are good for shell scripts, where you aren’t usually rushing. Long options provide a level of documentation, since they are more descriptive and less cryptic. Usually after a few months of not running a program, the short options will be forgotten and reading your previously written script will not be easy.

Some options need to be given a value if they are called and some don’t. You can think of the latter type of options as on/off options. These two types of options can be distinguished using the output of the --help and --usage options, which are common to all GNU software, see Getting help. In Gnuastro we use the following strings to specify when the option needs a value and what format that value should be in. More specific tests will be done in the program and if the values are out of range (for example negative when the program only wants a positive value), an error will be reported.

INT

The value is read as an integer.

FLT

The value is read as a float. There are generally two types, depending on the context. If they are for fractions, they will have to be less than or equal to unity.

STR

The value is read as a string of characters (for example a file name) or other particular settings like a HDU name, see below.

To specify a value in the short format, simply put the value after the option. Note that since the short options are only one character long, you don’t have to type anything between the option and its value. For the long option you either need white space or an = sign, for example -h2, -h 2, --hdu 2 or --hdu=2 are all equivalent.

The short format of on/off options (those that don’t need values) can be concatenated. For example, these two hypothetical sequences of options are equivalent: -a -b -c4 and -abc4. As an example, consider the following command to run Crop:

$ astcrop -Dr3 --wwidth 3 catalog.txt --deccol=4 ASTRdata

The $ is the shell prompt and astcrop is the program name. There are two arguments (catalog.txt and ASTRdata) and four options, two of them given in short format (-D, -r) and two in long format (--wwidth and --deccol). Three of them require a value and one (-D) is an on/off option.

If an abbreviation is unique among all the options of a program, the long option names can be abbreviated. For example, instead of typing --printparams, typing --print or maybe even --pri will be enough; if there are conflicts, the program will warn you and show you the alternatives. Finally, if you want the argument parser to stop parsing arguments beyond a certain point, you can use two dashes: --. No text on the command-line beyond these two dashes will be parsed.
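
For example, with a file that has the (hypothetical) name -1.fits, the two dashes stop -1.fits from being mistaken for an option; any options must then come before the two dashes:

$ astcrop -- -1.fits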

Gnuastro has two types of options with values; those that only take a single value are the most common type. If these options are repeated or called more than once on the command-line, the value of the last call will be assigned to the option. This is very useful when you are testing/experimenting. Let’s say you want to make a small modification to one option value. You can simply type the option with a new value at the end of the command and see how it affects the output. If you are satisfied with the change, you can remove the original option for human readability. If the change wasn’t satisfactory, you can remove the one you just added and not worry about forgetting the original value. Without this capability, you would have to memorize or save the original value somewhere else, run the command and then change the value again, which is not at all convenient and can potentially cause lots of bugs.
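
For example, in the following (schematic) command, --hdu is given twice, so the HDU that is finally used is the last one (here, 5):

$ astcrop image.fits --hdu=2 --hdu=5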

On the other hand, some options can be called multiple times in one run of a program and can thus take multiple values (for example, see the --column option in Invoking Table). In these cases, the order of stored values is the same order that you specified on the command-line.

Gnuastro’s programs don’t keep any internal default values, so some options are mandatory and if they don’t have a value, the program will complain and abort. Most programs have many such options and typing them by hand on every call is impractical. To facilitate the user experience, after parsing the command-line, Gnuastro’s programs read special configuration files to get the necessary values for the options you haven’t identified on the command-line. These configuration files are fully described in Configuration files.

CAUTION: In specifying a file address, if you want to use the shell’s tilde expansion (~) to specify your home directory, leave at least one space between the option name and your value. For example use -o ~/test, --output ~/test or --output= ~/test. Calling them with -o~/test or --output=~/test will disable shell expansion.

CAUTION: If you forget to specify a value for an option which requires one, and that option is the last one, Gnuastro will warn you. But if it is in the middle of the command, it will take the text of the next option or argument as the value which can cause undefined behavior.

NOTE: In some contexts Gnuastro’s counting starts from 0 and in others 1. You can assume by default that counting starts from 1, if it starts from 0 for a special option, it will be explicitly mentioned.


Previous: , Up: Command-line   [Contents][Index]

4.1.2 Common options

To facilitate the job of the users and developers, all the programs in Gnuastro share a set of basic command-line options for behavior that is common to many of the programs. The full list is classified as Input/Output options, Processing options, and Operating mode options. In some programs, some of these options are irrelevant, but still recognized (you won’t get an unrecognized option error, but the value isn’t used). Unless otherwise mentioned, these options are identical between all programs.


Next: , Previous: , Up: Common options   [Contents][Index]

4.1.2.1 Input/Output options

These options are to do with the input and outputs of the various programs.

-h STR/INT
--hdu=STR/INT

The name or number of the desired Header Data Unit, or HDU, in the FITS image. A FITS file can store multiple HDUs or extensions, each with either an image or a table or nothing at all (only a header). Note that counting of the extensions starts from 0 (zero), not 1 (one). Counting from 0 is forced on us by CFITSIO, which directly reads the value you give with this option (see CFITSIO). When specifying the name, case is not important, so IMAGE, image or ImAgE are equivalent.

CFITSIO has many capabilities to help you find the extension you want, far beyond the simple extension number and name. See CFITSIO manual’s “HDU Location Specification” section for a very complete explanation with several examples. A # is appended to the string you specify for the HDU43 and the result is put in square brackets and appended to the FITS file name before calling CFITSIO to read the contents of the HDU for all the programs in Gnuastro.
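
For example, both of the following (schematic) calls select an HDU; SCI is a hypothetical extension name and the other options each program needs are omitted for brevity:

$ astcrop image.fits --hdu=1
$ astcrop image.fits --hdu=SCI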

-s STR
--searchin=STR

Where to match/search for columns when the column identifier wasn’t a number, see Selecting table columns. The acceptable values are name, unit, or comment. This option is only relevant for programs that take table columns as input.

-I
--ignorecase

Ignore case while matching/searching column meta-data (in the field specified by --searchin). The FITS standard suggests treating the column names as case insensitive, which is strongly recommended here also, but is not enforced. This option is only relevant for programs that take table columns as input.

This option is not relevant to BuildProgram, hence in that program the short option -I is used for include directories, not to ignore case.

-o STR
--output=STR

The name of the output file or directory. With this option the automatic output names explained in Automatic output are ignored.

-T STR
--type=STR

The data type of the output depending on the program context. This option isn’t applicable to some programs like Fits and will be ignored by them. The different acceptable values to this option are fully described in Numeric data types.

-D
--dontdelete

By default, if the output file already exists, Gnuastro’s programs will silently delete it and put their own outputs in its place. When this option is activated, if the output file already exists, the programs will not delete it, will warn you, and will abort.

-K
--keepinputdir

In automatic output names, don’t remove the directory information of the input file names. As explained in Automatic output, if no output name is specified (with --output), then the output name will be made in the existing directory based on your input’s file name (ignoring the directory of the input). If you call this option, the directory information of the input will be kept and the automatically generated output name will be in the same directory as the input (usually with a suffix added). Note that this is only relevant if you are running the program in a different directory than the input data.

-t STR
--tableformat=STR

The output table’s type. This option is only relevant when the output is a table and its format cannot be deduced from its filename. For example, if a name ending in .fits was given to --output, then the program knows you want a FITS table. But there are two types of FITS tables: FITS ASCII, and FITS binary. Thus, with this option, the program is able to identify which type you want. The currently recognized values to this option are:

txt

A plain text table with white-space characters between the columns (see Gnuastro text table format).

fits-ascii

A FITS ASCII table (see Recognized table formats).

fits-binary

A FITS binary table (see Recognized table formats).


Next: , Previous: , Up: Common options   [Contents][Index]

4.1.2.2 Processing options

Some processing steps are common to several programs, so they are defined as common options to all programs. Note that this class of common options is thus necessarily less common between all the programs than those described in Input/Output options or Operating mode options. Also, if they are irrelevant for a program, these options will not be displayed in the --help output of that program.

-Z INT[,INT[,...]]
--tilesize=INT[,INT[,...]]

The size of regular tiles for tessellation, see Tessellation. For each dimension an integer length (in units of data-elements or pixels) is necessary. If the input dimensionality is different from the number of values given to this option, the program will stop with an error. Values must be separated by commas (,) and can also be fractions (for example 4/2). If they are fractions, the result must be an integer, otherwise an error will be printed.
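
For example, the following (schematic) call asks for 30x30 pixel tiles over a 2D image; other options the program needs are omitted here:

$ astnoisechisel image.fits --tilesize=30,30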

-M INT[,INT[,...]]
--numchannels=INT[,INT[,...]]

The number of channels for the larger input tessellation, see Tessellation. The number and types of acceptable values are similar to --tilesize. The only difference is that instead of lengths, the integer values given to this option represent the number of channels, not their size.

-F FLT
--remainderfrac=FLT

The fraction of remainder size along all dimensions to add to the first tile. See Tessellation for a complete description. This option is only relevant if --tilesize is not exactly divisible by the input dataset’s size in a dimension. If the remainder size is larger than this fraction (compared to --tilesize), then the remainder size will be added with one regular tile size and divided between two tiles at the start and end of the given dimension.

--workoverch

Ignore the channel borders for the high-level job of the given application. As a result, while the channel borders are respected in defining the small tiles (such that no tile will cross a channel border), the higher-level program operation will ignore them, see Tessellation.

--checktiles

Make a FITS file with the same dimensions as the input but each pixel is replaced with the ID of the tile that it is associated with. Note that the tile IDs start from 0. See Tessellation for more on Tiling an image in Gnuastro.

--oneelempertile

When showing the tile values (for example with --checktiles, or when the program’s output is tessellated) only use one element for each tile. This can be useful when only the relative values given to each tile compared to the rest are important or need to be checked. Since the tiles usually have a large number of pixels within them the output will be much smaller, and so easier to read, write, store, or send.

Note that when the full input size in any dimension is not exactly divisible by the given --tilesize in that dimension, the edge tile(s) will have different sizes (in units of the input’s size), see --remainderfrac. But with this option, all displayed values are going to have the (same) size of one data-element. Hence, in such cases, the image proportions are going to be slightly different with this option.

If your input image is not exactly divisible by the tile size and you want one value per tile for some higher-level processing, all is not lost though. You can see how many pixels were within each tile (for example to weight the values or discard some for later processing) with Gnuastro’s Statistics (see Statistics) as shown below. The output FITS file is going to have two extensions, one with the median calculated on each tile and one with the number of elements that each tile covers. You can then use the where operator in Arithmetic to set the values of all tiles that don’t have the regular area to a blank value.

$ aststatistics --median --number --ontile input.fits    \
                --oneelempertile --output=o.fits
$ REGULAR_AREA=1600    # Check second extension of `o.fits'.
$ astarithmetic o.fits o.fits $REGULAR_AREA ne nan where \
                -h1 -h2

Note that if input.fits also has blank values, then the median on tiles with blank values will also be ignored with the command above (which is desirable).

--interponlyblank

When values are to be interpolated, only change the values of the blank elements, keep the non-blank elements untouched.

--interpnumngb=INT

The number of nearby non-blank neighbors to use for interpolation.


Previous: , Up: Common options   [Contents][Index]

4.1.2.3 Operating mode options

Another group of options that are common to all the programs in Gnuastro are those to do with the general operation of the programs. The explanation for those that are not only limited to Gnuastro, but are common to all GNU programs, starts with (GNU option).

--

(GNU option) Stop parsing the command-line. This option can be useful in scripts or when using the shell history. Suppose you have a long list of options, and want to see if removing some of them (to read from configuration files, see Configuration files) can give a better result. If the ones you want to remove are the last ones on the command-line, you don’t have to delete them, you can just add -- before them and if you don’t get what you want, you can remove the -- and get the same initial result.

--usage

(GNU option) Only print the options and arguments and abort. This is very useful when you know what the options do and have just forgotten their long/short identifiers, see --usage.

-?
--help

(GNU option) Print all options with an explanation and abort. Adding this option will print all the options in their short and long formats, also displaying which ones need a value if they are called (with an = after the long format followed by a string specifying the format, see Options). A short explanation is also given for what the option is for. The program will quit immediately after the message is printed and will not do any form of processing, see --help.

-V
--version

(GNU option) Print a short message showing the full name, version, copyright information and program authors, and abort. On the first line, it will print the official name (not the executable name) and version number of the program. Following this is a blank line and the copyright information. The program will not run.

-q
--quiet

Don’t report steps. All the programs in Gnuastro that have multiple major steps will report their steps for you to follow while they are operating. If you do not want to see these reports, you can call this option and only error/warning messages will be printed. If the steps are done very fast (depending on the properties of your input) disabling these reports will also decrease running time.

--cite

Print the BibTeX entry for Gnuastro and the particular program (if that program comes with a separate paper) and abort. Citations are vital for the continued work on Gnuastro. Gnuastro started and is continued based on separate research projects. So if you find any of the tools offered in Gnuastro to be useful in your research, please use the output of this command to cite the program and Gnuastro in your research paper. Thank you.

Gnuastro is still new, there is no separate paper only devoted to Gnuastro yet. Therefore currently the paper to cite for Gnuastro is the paper for NoiseChisel which is the first published paper introducing Gnuastro to the astronomical community. Upon reaching a certain point, a paper completely devoted to Gnuastro will be published, see GNU Astronomy Utilities 1.0.

-P
--printparams

With this option, Gnuastro’s programs will read your command-line options and all the configuration files. If there is no problem (like a missing parameter or a value in the wrong format or range), then immediately before actually running, the programs will print the full list of option names, values and descriptions, sorted and grouped by context, and abort. They will also report the version number, the date they were configured on your system and the time they were reported.

As an example, you can give your full command-line options and even the input and output file names and finally just add -P to check if all the parameters are finely set. If everything is ok, you can just run the same command (easily retrieved from the shell history, with the top arrow key) and simply remove the last two characters that showed this option.

Since no program will actually start its processing when this option is called, the otherwise mandatory arguments for each program (for example input image or catalog files) are no longer required when you call this option.

--config=STR

Parse STR as a configuration file immediately when this option is confronted (see Configuration files). The --config option can be called multiple times in one run of any Gnuastro program on the command-line or in the configuration files. In any case, it will be immediately read (before parsing the rest of the options on the command-line, or lines in a configuration file).

Note that by definition, later options on the command-line still take precedence over those in any configuration file, including the file(s) given to this option. Also see --lastconfig and --onlyversion on how this option can be used for reproducible results.

-S
--setdirconf

Update the current directory configuration file for the Gnuastro program and quit. The full set of command-line and configuration file options will be parsed and options with a value will be written in the current directory configuration file for this program (see Configuration files). If the configuration file or its directory doesn’t exist, it will be created. If a configuration file exists it will be replaced (after it, and all other configuration files have been read). In any case, the program will not run.

This is the recommended method44 to edit/set the configuration file for all future calls to Gnuastro’s programs. It will internally check if your values are in the correct range and type and save them according to the configuration file format, see Configuration file format. So if there are unreasonable values to some options, the program will notify you and abort before writing the final configuration file.

When this option is called, the otherwise mandatory arguments, for example input image or catalog file(s), are no longer mandatory (since the program will not run).

-U
--setusrconf

Update the user configuration file and quit (see Configuration files). See explanation under --setdirconf for more details.

--lastconfig

This is the last configuration file that must be read. When this option is confronted in any stage of reading the options (on the command-line or in a configuration file), no other configuration file will be parsed, see Configuration file precedence and Current directory and User wide. Like all on/off options, on the command-line, this option doesn’t take any values. But in a configuration file, it takes the values of 0 or 1, see Configuration file format. If it is present in a configuration file with a value of 0, then all later occurrences of this option will be ignored.

--onlyversion=STR

Only run the program if Gnuastro’s version is exactly equal to STR (see Version numbering). Note that it is not compared as a number, but as a string of characters, so 0, or 0.0 and 0.00 are different. If the running Gnuastro version is different, then this option will report an error and abort as soon as it is confronted on the command-line or in a configuration file. If the running Gnuastro version is the same as STR, then the program will run as if this option was not called.

This is useful if you want your results to be exactly reproducible and not mistakenly run with an updated/newer or older version of the program. Besides internal algorithmic/behavior changes in programs, the existence of options or their names might change between versions (especially in these earlier versions of Gnuastro).

Hence, when using this option (probably in a script or in a configuration file), be sure to call it before other options. The benefit is that, when the version differs, the other options won’t be parsed and you, or your collaborators/users, won’t get errors saying an option in your configuration doesn’t exist in the running version of the program.

Here is one example of how this option can be used in conjunction with the --lastconfig option. Let’s assume that you were satisfied with the results of this command: astnoisechisel image.fits --detquant=0.95 (along with various options set in various configuration files). You can save the state of NoiseChisel and reproduce that exact result on image.fits later by following these steps (the extra spaces, and \, are only for easy readability; if you want to try it out, only one space between each token is enough).

$ echo "onlyversion X.XX"             > reproducible.conf
$ echo "lastconfig 1"                >> reproducible.conf
$ astnoisechisel image.fits --detquant=0.95 -P       \
                                     >> reproducible.conf

--onlyversion was available from Gnuastro 0.0, so putting it immediately at the start of a configuration file will ensure that later, you (or others using a different version) won’t get a non-recognized option error in case an option was added/removed. --lastconfig will inform the installed NoiseChisel to not parse any other configuration files. This is done because we don’t want the user’s user-wide or system-wide option values affecting our results. Finally, with the third command, which has a -P (short for --printparams), NoiseChisel will print all the option values visible to it (in all the configuration files) and the shell will append them to reproducible.conf. Hence, you don’t have to worry about remembering the (possibly) different options in the different configuration files.

Afterwards, if you run NoiseChisel as shown below (telling it to read this configuration file with the --config option), you can be sure that there will either be an error (for a version mis-match) or it will produce exactly the same result that you got before.

$ astnoisechisel --config=reproducible.conf

--log

Some programs can generate extra information about their outputs in a log file. When this option is called in those programs, the log file will also be printed. If the program doesn’t generate a log file, this option is ignored.

-N INT
--numthreads=INT

Use INT CPU threads when running a Gnuastro program (see Multi-threaded operations). Note that multi-threaded programming is only relevant to some programs. In others, this option will be ignored. If this option is not specified on the command-line or any configuration file, the number of threads will be determined by the programs at configuration time.


Next: , Previous: , Up: Common program behavior   [Contents][Index]

4.2 Configuration files

Each program needs a certain number of parameters to run. Supplying all the necessary parameters each time you run the program is very frustrating and prone to errors. Therefore all the programs read the values for the necessary options you have not given in the command line from one of several plain text files (which you can view and edit with any text editor). These files are known as configuration files and are usually kept in a directory named etc/ according to the file system hierarchy standard45.

The thing to have in mind is that none of the programs in Gnuastro keep any internal default value. All the values must either be stored in one of the configuration files or explicitly called in the command-line. In case the necessary parameters are not given through any of these methods, the program will print a missing option error and abort. The only exception to this is --numthreads, whose default value is determined at run-time using the number of threads available to your system, see Multi-threaded operations. Of course, you can still provide a default value for the number of threads at any of the levels below, but if you don’t, the program will not abort. Also note that through automatic output name generation, the value to the --output option is also not mandatory on the command-line or in the configuration files for all programs which don’t rely on that value as an input46, see Automatic output.


Next: , Previous: , Up: Configuration files   [Contents][Index]

4.2.1 Configuration file format

The configuration files for each program have the standard program executable name with a ‘.conf’ suffix. When you download the source code, you can find them in the same directory as the source code of each program, see Program source.

Any line in the configuration file whose first non-white character is a # is considered to be a comment and is ignored. An empty line is also similarly ignored. The long name of the option should be used as an identifier. The parameter name and parameter value have to be separated by any number of ‘white-space’ characters: space, tab or vertical tab. By default several space characters are used. If the value of an option has space characters (most commonly for the hdu option), then the full value can be enclosed in double quotation signs (", similar to the example in Arguments and options). If it is an option without a value in the --help output (on/off option, see Options), then the value should be 1 if it is to be ‘on’ and 0 otherwise.

In each non-commented and non-blank line, any text after the first two words (option identifier and value) is ignored. If an option identifier is not recognized in the configuration file, the name of the file, the line number of the unrecognized option, and the unrecognized identifier name will be reported and the program will abort. If a parameter is repeated more than once in the configuration files, accepts only one value, and is not set on the command-line, then only the first value will be used; the rest will be ignored.
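
As an illustration, a hypothetical configuration file following these rules could look like the following (the option values are only examples; recall that any text after the first two words of a line is ignored, so the trailing notes below are harmless):

# Example configuration file for a Gnuastro program.
hdu      1             Option with a value.
output   result.fits   Value for the --output option.
quiet    1             On/off option: 1 means on.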

You can build or edit any of the directories and the configuration files yourself using any text editor. However, it is recommended to use the --setdirconf and --setusrconf options to set default values for the current directory or this user, see Operating mode options. With these options, the values you give will be checked before writing in the configuration file. They will also print a set of commented lines guiding the reader and will also classify the options based on their context and write them in their logical order to be more understandable.


Next: , Previous: , Up: Configuration files   [Contents][Index]

4.2.2 Configuration file precedence

The option values in all the programs of Gnuastro will be filled in the following order. If an option only takes one value which is given in an earlier step, any value for that option in a later step will be ignored. Note that if the lastconfig option is specified in any step below, all later files will be ignored (see Operating mode options). The basic idea behind setting this progressive state of checking for parameter values is that separate users of a computer or separate folders in a user’s file system might need different values for some parameters.

In each step, there can also be a configuration file containing the common options in all the programs: gnuastro.conf (see Common options). If options specific to one program are specified in this file, there will be un-recognized option errors, or unexpected behavior if the option has different behavior in another program. On the other hand, there is no problem with astprogname.conf containing common options47.

  1. Command-line options, for a particular run of ProgramName.
  2. .gnuastro/astprogname.conf is parsed by ProgramName in the current directory.
  3. .gnuastro/gnuastro.conf is parsed by all Gnuastro programs in the current directory.
  4. $HOME/.local/etc/astprogname.conf is parsed by ProgramName in the user’s home directory (see Current directory and User wide).
  5. $HOME/.local/etc/gnuastro.conf is parsed by all Gnuastro programs in the user’s home directory (see Current directory and User wide).
  6. prefix/etc/astprogname.conf is parsed by ProgramName in the system-wide installation directory (see System wide for prefix).
  7. prefix/etc/gnuastro.conf is parsed by all Gnuastro programs in the system-wide installation directory (see System wide for prefix).

Manipulating the order: You can manipulate this order or add new files with the following two options which are fully described in Operating mode options:

--config

Allows you to define any file to be parsed as a configuration file on the command-line or within any other configuration file. Recall that the file given to --config is parsed immediately when this option is confronted (on the command-line or in a configuration file).

--lastconfig

Allows you to stop the parsing of subsequent configuration files. Note that if this option is given in a configuration file, it will be fully read, so its position in the configuration doesn’t matter (unlike --config).

One example of benefiting from these configuration files can be this: raw telescope images usually have their main image extension in the second FITS extension, while processed FITS images usually only have one extension. If your system-wide default input extension is 0 (the first), then when you want to work with the former group of data you have to explicitly mention it to the programs every time. With this progressive state of default values to check, you can set different default values for the different directories that you would like to run Gnuastro in for your different purposes, so you won’t have to worry about this issue any more.
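
For example, inside a directory of such raw images, a single (schematic) command like the one below would save --hdu=1 (the second extension, counting from 0) as the directory default for Crop, so you never have to repeat it there:

$ astcrop --hdu=1 --setdirconf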

The same can be said about the gnuastro.conf files: by specifying a behavior in this single file, all Gnuastro programs in the respective directory, user, or system-wide steps will behave similarly. For example to keep the input’s directory when no specific output is given (see Automatic output), or to not delete an existing file if it has the same name as a given output (see Input/Output options).


Next: , Previous: , Up: Configuration files   [Contents][Index]

4.2.3 Current directory and User wide

For the current (local) and user-wide directories, the configuration files are stored in the hidden sub-directories named .gnuastro/ and $HOME/.local/etc/ respectively. Unless you have changed it, the $HOME environment variable should point to your home directory. You can check it by running $ echo $HOME. Each time you run any of the programs in Gnuastro, this environment variable is read and used in the address above. So if you suddenly see that your home configuration files are not being read, probably you (or some other program) have changed the value of this environment variable.

Although it might cause confusion like the above, this dependence on the HOME environment variable enables you to temporarily use a different directory as your home directory. This can come in handy in complicated situations. To set the user or current-directory configuration files based on your command-line input, you can use the --setdirconf or --setusrconf options, see Operating mode options.
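
For example, prefixing a command with a new HOME value (the directory below is hypothetical) makes the program read its user-wide configuration files from that directory for this run only:

$ HOME=/path/to/alternative/home astnoisechisel image.fits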


Previous: , Up: Configuration files   [Contents][Index]

4.2.4 System wide

When Gnuastro is installed, the configuration files that are shipped with the distribution are copied into the (possibly system wide) prefix/etc/ directory. For more details on prefix, see Installation directory (by default it is: /usr/local). This directory is the final place (with the lowest priority) that the programs in Gnuastro will check to retrieve parameter values.

If you remove an option and its value from the system-wide configuration files, you either have to specify it in a more immediate configuration file or set it each time on the command-line. Recall that none of the programs in Gnuastro keep any internal default values and will abort if they don’t find a value for a necessary parameter (except the number of threads and the output file name). So even if you never expect to use an option, it is safe to keep it in this system-wide configuration file.

Note that in case you install Gnuastro from your distribution’s repositories, prefix will either be set to / (the root directory) or /usr, so you can find the system-wide configuration files in /etc/ or /usr/etc/. The prefix of /usr/local/ is conventionally used for programs you install from source yourself, as in Quick start.


Next: , Previous: , Up: Common program behavior   [Contents][Index]

4.3 Multi-threaded operations

Some of the programs benefit significantly when you use all the threads your computer’s CPU has to offer to your operating system. The number of threads available can be larger than the number of physical (hardware) cores in the CPU (this is known as simultaneous multithreading). For example, in Intel’s CPUs that implement its Hyper-threading technology, the number of threads is usually double the number of physical cores. On a GNU/Linux system, the number of threads available can be found with the $ nproc command (part of GNU Coreutils).

Gnuastro’s programs can find the number of threads available to your system internally at run-time (when you execute the program). However, if a value is given to the --numthreads option, that number will be used instead; see Operating mode options and Configuration files for ways to use this option. Thus --numthreads is the only common option in Gnuastro’s programs whose value doesn’t have to be specified anywhere on the command-line or in the configuration files.
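
For example (a sketch; image.fits, the choice of program, and the nproc output are arbitrary), you can inspect the available threads and then limit a run to half of them:

$ nproc
8
$ astnoisechisel image.fits --numthreads=4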


Next: , Previous: , Up: Multi-threaded operations   [Contents][Index]

4.3.1 A note on threads

Spinning off threads is not necessarily the most efficient way to run an application: creating a new thread is not a cheap operation for the operating system. Threads are most useful when the input data are fixed and you want the same operation done on different parts of it. For example, one input image to Crop and multiple crops from various parts of it: the image is loaded into memory once, all the crops are divided between the threads internally, and each thread cuts out the parts assigned to it from the same image. On the other hand, if you have multiple images and want to crop the same region(s) out of all of them, it is much more efficient to set --numthreads=1 (so no threads spin off) and run multiple instances of Crop simultaneously, see How to run simultaneous operations.

You can check the boost in speed by running a program on one of the data sets twice: once with the maximum number of threads and once (with everything else the same) using only one thread. You will notice that the wall-clock time (reported by most programs at the end of their run) of the multi-threaded run is longer than the single-threaded time divided by the number of physical CPU cores (not threads) available to your operating system. Only asymptotically can these two times become equal (most of the time they aren’t). So limiting each program to one thread and running independent instances on the available threads will be more efficient.

Note that the operating system keeps a cache of recently processed data, so the second time you process an identical data set (independent of the number of threads used) will usually be faster. To make an unbiased comparison, you have to clean the system’s cache with the following command between the two runs.

$ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
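
As a sketch of such an unbiased comparison (image.fits and the choice of program are arbitrary; time is your shell’s timing keyword):

$ time astnoisechisel image.fits                   # All threads
$ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
$ time astnoisechisel image.fits --numthreads=1    # One thread

Now compare the first wall-clock time with the second divided by your number of physical cores.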

SUMMARY: Should I use multiple threads? Depends:

  • If you only have one data set (an image in most cases), then yes: the more threads you use (up to the number of threads available to your operating system), the faster you will get your results.
  • If you want to run the same operation on multiple data sets, it is best to set the number of threads to 1 and use Make, or GNU Parallel, as explained in How to run simultaneous operations.

Previous: , Up: Multi-threaded operations   [Contents][Index]

4.3.2 How to run simultaneous operations

There are two48 approaches to simultaneously executing a program: using GNU Parallel or Make (GNU Make is the most common implementation). The first is very useful when you only want to do one job multiple times and want to get back to your work without actually keeping the command you ran. The second is usually for more substantial operations, with many dependencies between the different products (for example a full scientific research project).

GNU Parallel

When you only want to run multiple instances of a command on different threads and get on with the rest of your work, the best method is to use GNU Parallel. Surprisingly, GNU Parallel is one of the few GNU packages that has no Info documentation, only a Man page, see Info. So to see the documentation after installing it, please run

$ man parallel

As an example, let’s assume we want to crop a region fixed on the pixel coordinates (500, 600) with the default width from all the FITS images in the ./data directory ending with sci.fits, storing the outputs in the current directory. To do this, you can run:

$ parallel astcrop --numthreads=1 --xc=500 --yc=600 ::: \
  ./data/*sci.fits

GNU Parallel can help in many more situations; this is one of the simplest, see the man page for many other examples. For absolute beginners: the backslash (\) is only a line breaker to fit nicely on the page. If you type the whole command in one line, you should remove it.

Make

Make is a program for building “targets” (e.g., files) using “recipes” (a set of operations) when their known “prerequisites” (other files) have been updated. It elegantly allows you to define dependency structures for building your final output and updating it efficiently when the inputs change. It is the most common infrastructure for building software today.

Scientific research methodology is very similar to software development: you start by testing a hypothesis on a small sample of objects/targets with a simple set of steps. As you get promising results, you improve the method and use it on a larger, more general sample. In the process, you will confront many issues that have to be corrected (“bugs” in software development jargon). Make is a wonderful tool to manage this style of development. It has been used to make reproducible papers, for example see the reproduction pipeline of the paper introducing NoiseChisel (one of Gnuastro’s programs).

GNU Make49 is the most common implementation and (like nearly all GNU programs) comes with a wonderful manual50. Make is very basic and simple, and thus the manual is short (the most important parts are in roughly the first 100 pages) and easy to read/understand.

Make comes with a --jobs (-j) option which allows you to specify the maximum number of jobs that can be done simultaneously. For example, if you have 8 threads available to your operating system, you can run:

$ make -j8

With this command, Make will process your Makefile and create all the targets (can be thousands of FITS images for example) simultaneously on 8 threads, while fully respecting their dependencies (only building a file/target when its prerequisites are successfully built). Make is thus strongly recommended for managing scientific research where robustness, archivability, reproducibility and speed51 are very important.
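
As a minimal sketch of such a Makefile (survey.fits and the crop centers are hypothetical; recipe lines must start with a TAB character), the two crops below are targets whose prerequisite is the input image, so $ make -j2 will build them simultaneously:

all: crop1.fits crop2.fits

crop1.fits: survey.fits
	astcrop --numthreads=1 --xc=500 --yc=600 --output=$@ $<

crop2.fits: survey.fits
	astcrop --numthreads=1 --xc=900 --yc=300 --output=$@ $<

Here $@ and $< are Make’s automatic variables for the target and its first prerequisite respectively.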


Next: , Previous: , Up: Common program behavior   [Contents][Index]

4.4 Numeric data types

At the lowest level, the computer stores everything in terms of 1 or 0. For example, each program in Gnuastro, or each astronomical image you take with the telescope, is actually a string of millions of these zeros and ones. The space required to keep a zero or one is the smallest unit of storage, and is known as a bit. However, understanding and manipulating this string of bits is extremely hard for most people. Therefore, we define packages of these bits along with a standard on how to interpret the bits in each package; such a package is known as a type.

The most basic standard for reading the bits is integer numbers (\(..., -2, -1, 0, 1, 2, ...\); more bits give larger limits). The common integer types are 8, 16, 32, and 64 bits wide. For each width, there are two standards for reading the bits: signed and unsigned integers. In the former, negative numbers are allowed and in the latter, they aren’t. The unsigned types thus have larger positive limits (one extra bit), but no negative values. When the context of your work doesn’t involve negative numbers (for example counting, where negative values are not defined), it is best to use the unsigned types. For the full numerical range of each integer type, see below.

Another standard for converting a given number of bits to numbers is the floating point standard, which can approximately store any real number with a given precision. There are two common floating point types: 32-bit and 64-bit, for single and double precision floating point numbers respectively. The former is sufficient for data with less than 8 significant decimal digits (most astronomical data), while the latter is good for less than 16 significant decimal digits. The representation of real numbers as bits is much more complex than that of integers; if you are interested, you can start with the Wikipedia article.

With the conversion operators in Gnuastro’s Arithmetic, you can change the type of a dataset, which is necessary in some contexts. For example, the program/library that you intend to feed the data into may only accept floating point values while you have an integer image. Conversion is also helpful when you know that your data only has values that fit within int8 or uint16, but is currently stored in the float64 type. Operations involving floating point or wider integer types are significantly slower than those on integer or narrower types, and the wider types also require much more storage space (by 8 or 4 times in the examples above). So when you confront such situations and want to store/archive/transfer the data, it is best to convert it to the most efficient type.
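
For example, a sketch on the command-line (assuming the Arithmetic conversion operator shares the type’s short name, and that image.fits is a float64 image whose values all fit within 16-bit integers):

$ astarithmetic image.fits int16 --output=small.fits

In a program using Gnuastro’s libraries, the same conversion can be done with gal_data_copy_to_new_type.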

The short and long names for the recognized numeric data types in Gnuastro are listed below. Both short and long names can be used when you want to specify a type. For example, as a value to the common option --type (see Input/Output options), or in the information comment lines of Gnuastro text table format. The ranges listed below are inclusive.

u8
uint8

8-bit unsigned integers, range:
\([0\rm{\ to\ }2^8-1]\) or \([0\rm{\ to\ }255]\).

i8
int8

8-bit signed integers, range:
\([-2^7\rm{\ to\ }2^7-1]\) or \([-128\rm{\ to\ }127]\).

u16
uint16

16-bit unsigned integers, range:
\([0\rm{\ to\ }2^{16}-1]\) or \([0\rm{\ to\ }65535]\).

i16
int16

16-bit signed integers, range:
\([-2^{15}\rm{\ to\ }2^{15}-1]\) or \([-32768\rm{\ to\ }32767]\).

u32
uint32

32-bit unsigned integers, range:
\([0\rm{\ to\ }2^{32}-1]\) or \([0\rm{\ to\ }4294967295]\).

i32
int32

32-bit signed integers, range:
\([-2^{31}\rm{\ to\ }2^{31}-1]\) or \([-2147483648\rm{\ to\ }2147483647]\).

u64
uint64

64-bit unsigned integers, range:
\([0\rm{\ to\ }2^{64}-1]\) or \([0\rm{\ to\ }18446744073709551615]\).

i64
int64

64-bit signed integers, range:
\([-2^{63}\rm{\ to\ }2^{63}-1]\) or \([-9223372036854775808\rm{\ to\ }9223372036854775807]\).

f32
float32

32-bit (single-precision) floating point types. The maximum (minimum is its negative) possible value is \(3.402823\times10^{38}\). Single-precision floating points can accurately represent a floating point number up to \(\sim7.2\) significant decimals. Given the heavy noise in astronomical data, this is usually more than sufficient for storing results.

f64
float64

64-bit (double-precision) floating point types. The maximum (minimum is its negative) possible value is \(\sim10^{308}\). Double-precision floating points can accurately represent a floating point number up to \(\sim15.9\) significant decimals. This is usually good for processing (mixing) the data internally, for example a sum of single precision data (and later storing the result as float32).

Some file formats don’t recognize all types, for example the FITS standard (see Fits) does not define uint64 in binary tables or images. When a type is not acceptable for output into a given file format, the respective Gnuastro program or library will let you know and abort. On the command-line, you can use the Arithmetic program to convert the numerical type of a dataset; in the libraries, you can call gal_data_copy_to_new_type.


Next: , Previous: , Up: Common program behavior   [Contents][Index]

4.5 Tables

“A table is a collection of related data held in a structured format within a database. It consists of columns, and rows.” (from Wikipedia). Each column in the table contains the values of one property, and each row is a collection of properties (columns) for one target object. For example, let’s assume you have just run MakeCatalog (see MakeCatalog) on an image to measure some properties of the labeled regions (which might be detected galaxies, for example) in the image. For each labeled region (detected galaxy), there will be a row which groups its measured properties as columns, one column for each property. One such property can be the object’s magnitude (derived from the sum of the pixels with that label); another can be its center, defined as the light-weighted average position of those pixels. Many such properties can be derived from the raw pixel values and their positions, see Invoking MakeCatalog for a long list.

As a summary: for each labeled region (or galaxy) we have one row, and for each measured property we have one column. This high-level structure is usually the first step for higher-level analysis, for example finding the stellar mass or photometric redshift from magnitudes in multiple colors. Thus, tables are not just outputs of programs; in fact it is much more common for tables to be inputs of programs. For example, to make a mock galaxy image, you feed the properties of each galaxy into MakeProfiles for it to do the inverse of the process above and make a simulated image from a catalog, see Sufi simulates a detection. In other cases, you can feed a table into Crop and it will crop out regions centered on the positions within the table, see Hubble visually checks and classifies his catalog. So to end this relatively long introduction: tables play a very important role in astronomy, and generally in all branches of data analysis.

In Recognized table formats, the currently recognized table formats in Gnuastro are discussed. You can use any of these tables as input or ask for them to be built as output. The most common table format is a simple plain text file with each row on one line and columns separated by white space characters; this format is easy to read/write by eye/hand. To give it the full functionality of more specific table types like FITS tables, Gnuastro has a special convention with which you can give each column a name, type, unit, and comments, while still being readable by other plain text table readers. This convention is described in Gnuastro text table format.

When a table is input to a program, the program needs to know which column(s) it should use for its desired purposes. Gnuastro’s programs all follow a similar convention for the way you can select columns in a table; it is thoroughly discussed in Selecting table columns.


Next: , Previous: , Up: Tables   [Contents][Index]

4.5.1 Recognized table formats

The table formats that Gnuastro can currently read from and write to are described below. Each has its own advantages and disadvantages, so a short review of each format is also provided to help you make the best choice for defining your input tables or later using your output tables.

Plain text table

This is the most basic and simplest format: you can create, view, or edit the table by hand in a text editor. The other formats described below are less eye-friendly and have a more formal structure (for easier computer readability). It is fully described in Gnuastro text table format.

FITS ASCII tables

The FITS ASCII table extension is fully in ASCII encoding and thus easily readable in a text editor (assuming it is the only extension in the FITS file; if the file also contains binary extensions, for example an image or binary table, there will be many hard-to-print characters). The FITS ASCII format doesn’t have new-line characters to separate rows. In the FITS ASCII table standard, each row is defined as a fixed number of characters (the value of the NAXIS1 keyword), so to visually inspect it properly, you would have to adjust your text editor’s width to this value. All columns start at given character positions and have a fixed width (number of characters).

Numbers in a FITS ASCII table are printed in ASCII format, not in the binary that the CPU uses. Hence, they can take more space in memory, lose their precision, and take longer to read into memory. If you are dealing with integer type columns (see Numeric data types), another issue with FITS ASCII tables is that the type information for the column will be lost (there is only one integer type in FITS ASCII tables). One problem with the binary format, on the other hand, is that it isn’t always portable: different CPUs/compilers have different standards for translating the zeros and ones. But since ASCII characters are defined on a byte and are well recognized, they are better for portability between such systems. Gnuastro’s plain text table format described below is much more portable and easier to read/write/interpret by humans.

Generally, as the name implies, this format is useful when your table mainly contains ASCII columns (for example file names or descriptions). It can be useful when you need to include columns with structured ASCII information along with other extensions in one FITS file. In such cases, you can also consider header keywords (see Fits).

FITS binary tables

The FITS binary table is the FITS standard’s solution to the issues with keeping numbers in ASCII format, as discussed under the FITS ASCII table title above. In a binary table, only columns defined as a string type (a string of ASCII characters) are readable in a text editor. The portability problem of binary formats discussed above is mostly solved thanks to the portability of CFITSIO (see CFITSIO) and the very long history of the FITS format, which has been widely used since the 1970s.

For most numbers, the binary format is more memory-efficient than ASCII. For example, to store -25.72034 in ASCII format, you need 9 bytes/characters. But if you keep this same number (to the closest representable precision) as a 4-byte (32-bit) floating point number, you can keep/transmit it with less than half the amount of memory. When catalogs contain thousands/millions of rows in tens/hundreds of columns, this can lead to significant improvements in memory/bandwidth usage. Moreover, since the CPU does its operations in binary, reading the table in and writing it out is also much faster than with an ASCII table.

When you are dealing with integer numbers, the gain can be even better: for example if you know all of the values in a column are positive and less than 255, you can use the unsigned char type, which only takes one byte! If they are between -128 and 127, you can use the (signed) char type. So if you are thoughtful about the limits of your integer columns, you can greatly reduce the size of your file and increase the speed at which it is read/written. This can be very useful when sharing your results with collaborators or publishing them. To decrease the file size even more, you can give your output a name ending in .fits.gz so it is also compressed after creation. Just note that compression/decompression is CPU intensive and can slow down the writing/reading of the file.

Fortunately the FITS Binary table format also accepts ASCII strings as column types (along with the various numerical types). So your dataset can also contain non-numerical columns.


Next: , Previous: , Up: Tables   [Contents][Index]

4.5.2 Gnuastro text table format

Plain text files are the most generic, portable, and easiest way to (manually) create, (visually) inspect, or (manually) edit a table. In this format, the ending of a row is defined by the new-line character (a line on a text editor). So when you view it on a text editor, every row will occupy one line. The delimiters (or characters separating the columns) are white space characters (space, horizontal tab, vertical tab) and a comma (,). The only further requirement is that all rows/lines must have the same number of columns.

The columns don’t have to be exactly under each other, and the rows can have arbitrarily different lengths. For example, the following contents in a file would be interpreted as a table with 4 columns and 2 rows, with each element read as a double type (see Numeric data types).

1     2.234948   128   39.8923e8
2 , 4.454        792     72.98348e7

However, the example above carries no other information about the columns (it is just raw data with no meta-data). To use this table, you have to remember what the numbers in each column represent. Also, when you want to select columns, you have to count their position within the table. This can become frustrating and error-prone (getting the columns wrong), especially as the number of columns increases. It is also bad for sending to a colleague, who will find it hard to remember/use the columns properly.

To solve these problems, in Gnuastro’s programs/libraries you aren’t limited to using the column’s number, see Selecting table columns: if the columns have names, units, or comments, you can also select them based on searches/matches in these fields, for example see Table. Also, in the bare-minimum format above, you can’t guide the program reading the table on how to interpret the numbers. As an example, the first and third columns above could be read as integer types: the first column might be an ID and the third the number of pixels an object occupies in an image. So there is no need to read these two columns as a double type (which takes more memory and is slower).

In the bare-minimum example above, you also can’t use strings of characters, for example the names of filters or some other identifier that includes non-numerical characters. In the absence of any information, only numbers can be read robustly. Even if we read columns with non-numerical characters as strings, there would still be the problem that the strings might contain a space (or any delimiter) character in some rows. Each ‘word’ in such a string would then be interpreted as a separate column, and the program would abort with an error that the rows don’t have the same number of columns.

To correct for these limitations, Gnuastro defines the following convention for storing the table meta-data along with the raw data in one plain text file. The format is primarily designed for ease of reading/writing by eye/fingers, but is also structured enough to be read by a program.

When the first non-white character in a line is #, or there are no non-white characters in it, then the line will not be considered as a row of data in the table (this is a pretty standard convention in many programs, and higher level languages). In the former case, the line is interpreted as a comment. If the comment line starts with ‘# Column N:’, then it is assumed to contain information about column N (a number, counting from 1). Comment lines that don’t start with this pattern are ignored and you can use them to include any further information you want to store with the table in the text file. A column information comment is assumed to have the following format:

# Column N: NAME [UNIT, TYPE, BLANK] COMMENT

Any sequence of characters between ‘:’ and ‘[’ will be interpreted as the column name (so it can contain anything except the ‘[’ character). Anything between the ‘]’ and the end of the line is defined as a comment. Within the brackets, anything before the first ‘,’ is the units (physical units, for example km/s, or erg/s), anything before the second ‘,’ is the short type identifier (see below, and Numeric data types). Finally (still within the brackets), any non-white characters after the second ‘,’ are interpreted as the blank value for that column (see Blank pixels). Note that blank values will be stored in the same type as the column, not as a string52.

When a formatting problem occurs (for example you have specified the wrong type code, see below), or the column was already given meta-data in a previous comment, or the column number is larger than the actual number of columns in the table (the non-commented or empty lines), then the comment information line will be ignored.

When a comment information line can be used, the leading and trailing white space characters will be stripped from all of the elements. For example in this line:

# Column 5:  column name   [km/s,    f32,-99] Redshift as speed

The NAME field will be ‘column name’ and the TYPE field will be ‘f32’. Note how the white space characters before and after each string are stripped, but those in the middle remain. Also, white space characters after the commas aren’t mandatory: in the example above, the BLANK field will be given the value of ‘-99’.

Except for the column number (N), the rest of the fields are optional. Also, the column information comments don’t have to be in order. In other words, the information for column \(N+m\) (\(m>0\)) can be given in a line before column \(N\). Also, you don’t have to specify information for all columns. Those columns that don’t have this information will be interpreted with the default settings (like the case above: values are double precision floating point, and the column has no name, unit, or comment). So these lines are all acceptable for any table (the first one, with nothing but the column number is redundant):

# Column 5:
# Column 1: ID [,i] The Clump ID.
# Column 3: mag_f160w [AB mag, f] Magnitude from the F160W filter

The data type of the column should be specified with one of the short type identifiers introduced in Numeric data types above (for example u8, i32, or f32).

Note that the FITS binary table standard does not define the unsigned int and unsigned long types, so if you want to convert your tables to FITS binary tables, use other types. Also, note that in the FITS ASCII table, there is only one integer type (long). So if you convert a Gnuastro plain text table to a FITS ASCII table with the Table program, the type information for integers will be lost. Conversely, if integer types are important for you, you have to manually set them when reading a FITS ASCII table (for example with the Table program when reading/converting into a file, or with the gnuastro/table.h library functions when reading into memory).
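
Putting it all together, a complete hypothetical table file (the file contents below are purely illustrative) could look like this:

# An example catalog following Gnuastro's plain text table format.
# Column 1: ID        [, u8]             Object identifier.
# Column 2: MAG_F160W [AB mag, f32, -99] Magnitude in the F160W filter.
1    22.345
2    -99

Here, the first column will be read as uint8 and the second as float32, with the -99 in the second row’s magnitude interpreted as a blank value.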


Previous: , Up: Tables   [Contents][Index]

4.5.3 Selecting table columns

At the lowest level, the only defining aspect of a column in a table is its number, or position. But selecting columns purely by number is not very convenient and, especially when the tables are large, can be very frustrating and error-prone. Hence, table file formats (for example see Recognized table formats) have ways to store additional information about the columns (meta-data). Some of the most common pieces of information about each column are its name, the units of the data in it, and a comment for a longer/informal description of the column’s data.

To facilitate research with Gnuastro, you can select columns by matching or searching in these three fields, besides the low-level column number. To view the full list of information on the columns in a table, you can use the Table program (see Table) with the command below (replace table-file with the filename of your table; if it’s a FITS file, you might also need to specify the HDU/extension which contains the table):

$ asttable --information table-file

Gnuastro’s programs need columns for different purposes: for example in Crop, you specify the column containing the Right Ascension of the crop centers with the --racol option and the column containing the Declination with --deccol. Thus, there is no unified common option name to select columns in all programs. However, when a program expects a column for a specific context (like the RA and Dec example above), the option name ends in the col suffix (for example --racol and --deccol). These options accept values in integer (column number) or string (meta-data match/search) format.

If the value can be parsed as a positive integer, it will be seen as the low-level column number. Note that column counting starts from 1, so if you ask for column 0, the respective program will abort with an error. When the value can’t be interpreted as an integer, it will be seen as a string of characters to match/search in the table’s meta-data. The meta-data field that the value is compared with can be selected through the --searchin option (see Input/Output options), which takes one of three values: name, unit, comment.

Note that in both cases, you can ignore the case of alphabetic characters with the --ignorecase option, see Input/Output options. Also, in both cases, multiple columns may be selected with one value; in that case, the selected columns will be in the same order as they appear in the table.
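
For example, a hypothetical Crop call (other options needed for a real run are omitted; we assume the catalog’s columns include one named RA):

$ astcrop --racol=RA --deccol=2 catalog.txt image.fits

Here the Right Ascension column is selected by matching the name meta-data, while the Declination is simply taken from the second column.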


Next: , Previous: , Up: Common program behavior   [Contents][Index]

4.6 Tessellation

It is sometimes necessary to classify the elements in a dataset (for example pixels in an image) into a grid of individual, non-overlapping tiles. For example when background sky gradients are present in an image, you can define a tile grid over the image. When the tile sizes are set properly, the background’s variation over each tile will be negligible, allowing you to measure (and subtract) it. In other cases (for example spatial domain convolution in Gnuastro, see Convolve), it might simply be for speed of processing: each tile can be processed independently on a separate CPU thread. In the arts and mathematics, this process is formally known as tessellation.

The size of the regular tiles (in units of data-elements, or pixels in an image) can be defined with the --tilesize option. It takes multiple numbers (separated by a comma) which will be the length along the respective dimension (in FORTRAN/FITS dimension order). Divisions are also acceptable, but must result in an integer. For example --tilesize=30,40 can be used for an image (a 2D dataset). The regular tile size along the first FITS axis (horizontal when viewed in SAO ds9) will be 30 pixels and along the second it will be 40 pixels. Ideally, --tilesize should be selected such that all tiles in the image have exactly the same size. In other words, that the dataset length in each dimension is divisible by the tile size in that dimension.

However, this is not always possible: the dataset can be any size and every pixel in it is valuable. In such cases, Gnuastro will look at the significance of the remainder length, if it is not significant (for example one or two pixels), then it will just increase the size of the first tile in the respective dimension and allow the rest of the tiles to have the required size. When the remainder is significant (for example one pixel less than the size along that dimension), the remainder will be added to one regular tile’s size and the large tile will be cut in half and put in the two ends of the grid/tessellation. In this way, all the tiles in the central regions of the dataset will have the regular tile sizes and the tiles on the edge will be slightly larger/smaller depending on the remainder significance. The fraction which defines the remainder significance along all dimensions can be set through --remainderfrac.

The best tile size is directly related to the spatial properties of the quantity you want to study (for example, the gradient over the image). In practice we assume that the gradient is not present over each tile. So if there is a strong gradient (for example in long-wavelength ground-based images), or the image is of a crowded area without much blank sky, you have to choose a smaller tile size. A larger tile will contain more pixels, so the scatter in the results will be smaller (better statistics).

For raw image processing, a single tessellation/grid is not sufficient. Raw images are the unprocessed outputs of the camera detectors. Modern detectors usually have multiple readout channels, each with its own amplifier. For example, the Hubble Space Telescope Advanced Camera for Surveys (ACS) has four amplifiers over its full detector area, dividing the square field of view into four smaller squares. Ground-based detectors are not exempt: for example, each CCD of the Subaru Telescope’s Hyper Suprime-Cam camera (which has 104 CCDs) has four amplifiers, which span the full height of the CCD and divide its width into four parts.

The bias current on each amplifier is different, and the initial bias subtraction is not perfect. So even after subtracting the measured bias current, you can usually still identify the boundaries of different amplifiers by eye (see Figure 11(a) in Akhlaghi and Ichikawa (2015) for an example). The final reduced data thus have non-uniform, amplifier-shaped regions with higher or lower background flux values. Such systematic biases will then propagate to all subsequent measurements on the data (for example photometry, and subsequent stellar mass and star formation rate measurements in the case of galaxies).

Therefore an accurate analysis requires a two-layer tessellation: the top layer contains larger tiles, each covering one amplifier channel. For clarity we’ll call these larger tiles “channels”. The number of channels along each dimension is defined through the --numchannels option. Each channel is then covered by its own individual smaller tessellation (with tile sizes determined by the --tilesize option). This allows two adjacent pixels from different channels to be analyzed independently if necessary. If the image is processed or the detector only has one amplifier, you can set the number of channels in both dimensions to 1.

The final tessellation can be inspected on the image with the --checktiles option, which is available to all programs that use tessellation for localized operations. When this option is called, a FITS file with a _tiled.fits suffix will be created along with the outputs, see Automatic output. Each pixel in this image has the number of the tile that covers it. If the number of channels in any dimension is larger than unity, you will notice that the tile IDs are defined such that the first channel is covered first, then the second and so on. For the full list of processing-related common options (including tessellation options), please see Processing options.
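
For example, a sketch (image.fits and the choice of program are arbitrary; any program using tessellation should accept these common options):

$ astnoisechisel image.fits --tilesize=50,50 --numchannels=1,1 \
                 --checktiles

The _tiled.fits output can then be opened in a FITS viewer to inspect the tile IDs over your image.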


Next: , Previous: , Up: Common program behavior   [Contents][Index]

4.7 Getting help

Probably the first time you read this book, it is either in the PDF or HTML formats. These two formats are very convenient for when you are not actually working, but when you are only reading. Later on, when you start to use the programs and you are deep in the middle of your work, some of the details will inevitably be forgotten. Going to find the PDF file (printed or digital) or the HTML webpage is a major distraction.

GNU software has a unique set of tools for aiding your memory on the command-line, where you are working, depending on how much of it you need to remember. In the past, such command-line help was known as “online” help, because it was literally provided to you ‘on’ the command ‘line’. However, nowadays the word “online” refers to something on the internet, so that term will not be used here. With this type of help, you can resume your exciting research without taking your hands off the keyboard.

Another major advantage of such command-line help routines is that they are installed with the software on your computer, so they are always in sync with the executable you are actually running (three of them are actually part of the executable). You don’t have to worry about the version of the book or program. If you rely on external help (a PDF in your personal print or digital archive, or HTML from the official webpage), you have to check whether its version fits your installed program.

If you only need to remember the short or long names of the options, --usage is advised. If it is what the options do, then --help is a great tool. Man pages are also provided for those who are used to this older system of documentation. This full book is also available to you on the command-line in Info format. If none of these resolves your problem, there is a mailing list which enables you to get in touch with experienced Gnuastro users. Each of these methods is reviewed in the subsections below.


Next: , Previous: , Up: Getting help   [Contents][Index]

4.7.1 --usage

If you give this option, the program will not run. It will only print a very concise message showing the options and arguments. Everything within square brackets ([]) is optional. For example, here are the first and last two lines of Crop’s --usage output:

$ astcrop --usage
Usage: astcrop [-Do?IPqSVW] [-d INT] [-h INT] [-r INT] [-w INT]
            [-x INT] [-y INT] [-c INT] [-p STR] [-N INT] [--deccol=INT]
            ....
            [--setusrconf] [--usage] [--version] [--wcsmode]
            [ASCIIcatalog] FITSimage(s).fits

There are no explanations of the options, just their short and long names shown separately. After the program name, the short format of all the options that don’t require a value (on/off options) is displayed. Those that do require a value then follow in separate brackets, each displaying the format of the input they want, see Options. Since all options are optional, they are shown in square brackets, but arguments can also be optional: in this example, a catalog name is optional and is only required in some modes. This is a standard method of displaying optional arguments for all GNU software.


Next: , Previous: , Up: Getting help   [Contents][Index]

4.7.2 --help

If the command-line includes this option, the program will not run. It will print a complete list of all available options along with a short explanation. The options are grouped by context, and within each context they are sorted alphabetically. Since the options are explained in detail afterwards, the first line of the --help output shows the arguments and whether they are optional or not, similar to --usage.

In the --help output of all programs in Gnuastro, the options for each program are classified based on context. The first two contexts are always options to do with the input and output respectively, for example input image extensions or supplementary input files. The last class of options is also fixed in all of Gnuastro: it shows the operating mode options, most of which are already explained in Operating mode options.

The help message will sometimes be longer than the vertical size of your terminal. If you are using a graphical user interface terminal emulator, you can scroll the terminal with your mouse, but we promised no mice distractions! So here are some suggestions:
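
For example, you can pipe the output into a pager like less (available on practically all GNU/Linux systems) and scroll through it with the keyboard, pressing q when done:

$ astcrop --help | less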

If you are looking for a particular keyword in the help output, you don’t have to go through the full list: GNU Grep is made for this job. For example, to see only the lines of Crop’s --help output that contain the word “axis”, you can run the following command:

$ astcrop --help | grep axis

If the output of this option does not fit nicely within the confines of your terminal, GNU enables you to customize it through the environment variable ARGP_HELP_FMT, in which you can set various parameters specifying the format of the help messages. For example, if your terminal is wider than 70 characters (say 100) and you feel there is too much empty space between the long options and the short explanations, you can change these formats by giving values to this environment variable before running the program with --help. You can define this environment variable in this manner:

$ export ARGP_HELP_FMT=rmargin=100,opt-doc-col=20

This will affect all GNU programs using the GNU C library’s argp.h facilities as long as the environment variable is in memory. You can see the full list of these formatting parameters in the “Argp User Customization” part of the GNU C library manual. If you prefer to read the --help outputs of all GNU software in your customized format, you can add your customizations (similar to the line above, without the $ sign) to your ~/.bashrc file. This is a standard feature of all GNU software.


Next: , Previous: , Up: Getting help   [Contents][Index]

4.7.3 Man pages

Man pages were the Unix method of providing command-line documentation for a program. With GNU Info (see Info), this method of documentation is highly discouraged, because Info provides a much easier to navigate and read environment.

However, some operating systems require a man page for installed packages, and some people are still used to this method of command-line help. So the programs in Gnuastro also have man pages, which are automatically generated from the outputs of --version and --help using the GNU help2man program. So if you run

$ man programname

you will be provided with a man page listing the options in the standard manner.


Next: , Previous: , Up: Getting help   [Contents][Index]

4.7.4 Info

Info is the standard documentation format for all GNU software. It is a very useful command-line document viewing format, fully equipped with links between the various pages and menus and search capabilities. As explained before, the best thing about it is that it is available for you the moment you need to refresh your memory on any command-line tool in the middle of your work without having to take your hands off the keyboard. This complete book is available in Info format and can be accessed from anywhere on the command-line.

To open the Info format documentation of any installed program or library on your system that has an Info book, you can simply run the command below (change executablename to the executable name of the program or library):

$ info executablename

In case you are not already familiar with it, run $ info info. It does a fantastic job of explaining all its capabilities itself. It is very short and you will become sufficiently fluent in about half an hour. Since all GNU software documentation is also provided in Info, your whole GNU/Linux life will significantly improve.

Once you’ve become an efficient navigator in Info, you can go to any part of this book, or any other GNU software or library manual, no matter how long it is, in a matter of seconds. It also blends nicely with GNU Emacs (a text editor), so you can search manuals while you are writing your document or programs without taking your hands off the keyboard; this is most useful for libraries like the GNU C library. To access all the Info manuals installed on your GNU/Linux system from within Emacs, type Ctrl-H + i.

To see this whole book from the beginning in Info, you can run

$ info gnuastro

If you run Info with a particular program executable name, for example astcrop or astnoisechisel:

$ info astprogramname

you will be taken to the section titled “Invoking ProgramName” which explains the inputs and outputs along with the command-line options for that program. Finally, if you run Info with the official program name, for example Crop or NoiseChisel:

$ info ProgramName

you will be taken to the top section which introduces the program. Note that in all cases, Info is not case sensitive.


Previous: , Up: Getting help   [Contents][Index]

4.7.5 help-gnuastro mailing list

Gnuastro maintains the help-gnuastro mailing list for users to ask any questions related to Gnuastro. Experienced Gnuastro users and some of its developers are subscribed to this mailing list and your email will be sent to them immediately. However, when contacting this mailing list, please keep in mind that they may be very busy and might not be able to answer immediately.

To ask a question on this mailing list, send an email to help-gnuastro@gnu.org. Anyone can view the mailing list archives at http://lists.gnu.org/archive/html/help-gnuastro/. It is best to search the archives before sending a mail, to see if anyone has asked a question similar to yours. If you want to make a suggestion or report a bug, please don’t send a mail to this mailing list; we have other mailing lists and tools for those purposes, see Report a bug or Suggest new feature.


Next: , Previous: , Up: Common program behavior   [Contents][Index]

4.8 Automatic output

All the programs in Gnuastro are designed such that specifying an output file or directory (based on the program context) is optional. The outputs of most programs are automatically named based on the input and what the program does. For example, when you are converting a FITS image named FITSimage.fits to a JPEG image, the JPEG image will be saved as FITSimage.jpg.

Another very important part of the automatic output generation is that all the directory information of the input file name is stripped off. This feature can be disabled with the --keepinputdir option, see Common options. Stripping is the default because astronomical data are usually very large and organized in special directory structures, and you might not have write permission in those directories. In fact, even if the data are stored on your own computer, it is advisable to only grant write permission to the super-user or root; this way, you won’t accidentally delete or modify your valuable data!

Let’s assume that we are working on a report and want to process the FITS images from two projects (ABC and DEF), which are stored in the sub-directories named ABCproject/ and DEFproject/ of our top data directory (/mnt/data). The following shell commands show how one image from the former is first converted to a JPEG image through ConvertType, and then the objects in an image from the latter project are detected using NoiseChisel. The text after each # sign is a comment (not typed!).

$ pwd                                               # Current location
/home/usrname/research/report
$ ls                                         # List directory contents
ABC01.jpg
$ ls /mnt/data/ABCproject                                  # Archive 1
ABC01.fits ABC02.fits ABC03.fits
$ ls /mnt/data/DEFproject                                  # Archive 2
DEF01.fits DEF02.fits DEF03.fits
$ astconvertt /mnt/data/ABCproject/ABC02.fits --output=jpg    # Prog 1
$ ls
ABC01.jpg ABC02.jpg
$ astnoisechisel /mnt/data/DEFproject/DEF01.fits              # Prog 2
$ ls
ABC01.jpg ABC02.jpg DEF01_labeled.fits
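
As a sketch of the --keepinputdir behavior in the same hypothetical session (assuming you have write permission in /mnt/data/ABCproject):

$ astconvertt /mnt/data/ABCproject/ABC03.fits --output=jpg \
              --keepinputdir
$ ls /mnt/data/ABCproject
ABC01.fits ABC02.fits ABC03.fits ABC03.jpg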

Previous: , Up: Common program behavior   [Contents][Index]

4.9 Output headers

The output FITS files created by Gnuastro’s programs have some or all of the following standard keywords to keep the basic date and version information of Gnuastro, its dependencies and the pipeline that is using Gnuastro.

DATE

The creation time of the FITS file. This date is written directly by CFITSIO and is in UT format.

COMMIT

Git’s commit description from the running directory of Gnuastro’s programs. If the running directory is not version controlled or libgit2 isn’t installed (see Optional dependencies) then this keyword will not be present. The printed value is equivalent to the output of the following command:

git describe --dirty --always

If the running directory contains non-committed work, then the stored value will have a ‘-dirty’ suffix. This can be very helpful to let you know that the data is not ready to be shared with collaborators or submitted to a journal. You should only share results that are produced after all your work is committed (safely stored in the version controlled history and thus reproducible).

At first sight, version control appears to be mainly a tool for software developers. However, progress in scientific research is almost identical to progress in software development: first you have a rough idea that starts with a handful of easy steps. But as the first results appear promising, you will have to extend or generalize it, to make it more robust and work in all the situations your research covers, not just your first test samples. Slowly you will find wrong assumptions or bad implementations that need to be fixed (‘bugs’ in software development parlance). Finally, when you submit the research to your collaborators or a journal, many comments and suggestions will come in, and you have to address them.

Software developers have created version control systems precisely for this kind of activity. Each significant moment in the project’s history is called a “commit”, see Version controlled source. A snapshot of the project in each “commit” is safely stored away, so you can revert back to it at a later time, or check changes/progress. This way, you can be sure that your work is reproducible and track the progress and history. With version control, experimentation in the project’s analysis is greatly facilitated, since you can easily revert back if a brainstorm test procedure fails.

One important feature of version control is that the research result (FITS image, table, report or paper) can be stamped with the unique commit information that produced it. This information will enable you to exactly reproduce that same result later, even if you have made changes/progress. For one example of a research paper’s reproduction pipeline, please see the reproduction pipeline of the paper describing NoiseChisel.

CFITSIO

The version of CFITSIO used (see CFITSIO).

WCSLIB

The version of WCSLIB used (see WCSLIB). Note that older versions of WCSLIB do not report the version internally. So this is only available if you are using more recent WCSLIB versions.

GSL

The version of GNU Scientific Library that was used, see GNU Scientific library.

GNUASTRO

The version of Gnuastro used (see Version numbering).

Here are the last few lines of an example output:

              / Versions and date
DATE    = '...'                / file creation date
COMMIT  = 'v0-8-g547f6eb'      / Commit description in running dir.
CFITSIO = '3.41    '           / CFITSIO version.
WCSLIB  = '5.16    '           / WCSLIB version.
GSL     = '2.3     '           / GNU Scientific Library version.
GNUASTRO= '0.3'                / GNU Astronomy Utilities version.
END

Next: , Previous: , Up: Top   [Contents][Index]

5 Data containers

The most low-level and basic property of a dataset is how it is stored. To process, archive and transmit the data, you need a container to store it first. From the start of the computer age, different formats have been defined to store data, optimized for particular applications. One format/container can never be useful for all applications: the storage defines the application and vice-versa. In astronomy, the Flexible Image Transport System (FITS) standard has become the most common format of data storage and transmission. It has many useful features, for example multiple sub-containers (also known as extensions or header data units, HDUs) within one file, or support for tables as well as images. Each HDU can store an independent dataset and its corresponding meta-data. Therefore, Gnuastro has one program (see Fits) specifically designed to manipulate FITS HDUs and the meta-data (header keywords) in each HDU.

Your astronomical research does not only involve data analysis (where the FITS format is very useful). For example, you may want to present your raw and processed FITS images or spectra as figures within slides, reports, or papers. The FITS format is not defined for such applications. Thus, Gnuastro also comes with the ConvertType program (see ConvertType), which can be used to convert a FITS image to and from (where possible) other formats like plain text and JPEG (which allow two-way conversion), along with EPS and PDF (which can only be created from FITS, not the other way round).

Finally, the FITS format is not just for images: it can also store tables. Binary tables in particular can be very efficient for storing catalogs that have more than a few tens of columns and rows. However, unlike images (where all elements/pixels have one data type), tables contain multiple columns and each column can have different properties: independent data types (see Numeric data types) and meta-data. In practice, each column can be viewed as a separate container that is grouped with others in the table; the only shared property of the columns in a table is thus the number of elements they contain. To allow easy inspection/manipulation of table columns, Gnuastro has the Table program (see Table). It can be used to select certain columns of a FITS table and see them as human readable output on the command-line, or to save them into another plain text or FITS table.


Next: , Previous: , Up: Data containers   [Contents][Index]

5.1 Fits

The “Flexible Image Transport System”, or FITS, is by far the most common data container format in astronomy. Although the full name of the standard invokes the idea that it is only for images, it also contains very complete and robust features for tables. It started off in the 1970s and was formally published as a standard in 1981; it was adopted by the International Astronomical Union (IAU) in 1982, and an IAU working group to maintain its future was defined in 1988. The FITS 2.0 and 3.0 standards were approved in 2000 and 2008 respectively, and the 4.0 draft has also been released recently; please see the FITS standard document webpage for the full text of all versions. Also see the FITS 3.0 standard paper for a nice introduction and history along with the full standard.

Other formats, for example a JPEG image, only have one image/dataset per file; one great advantage of the FITS standard is that it allows you to keep multiple datasets (images or tables along with their meta-data) in one file. Each dataset + meta-data is known as an extension, or more formally a header data unit (HDU), in the FITS standard. In theory the HDUs can be completely independent: you can have multiple images of different dimensions/sizes, or tables, as separate extensions in one file. However, while the standard doesn’t impose any constraints on the relation between the datasets, it is strongly encouraged to group contextually related data in one file. For example an image and the table/catalog of objects with their measured properties in that image; another example can be multiple images of one patch of sky in different colors (filters).

As discussed above, the extensions in a FITS file can be completely independent. To keep some information (meta-data) about the group of extensions in the FITS file, the community has adopted the following convention: put no data in the first extension, so it is just meta-data. This extension can thus be used to store meta-data regarding the whole file (grouping of extensions). Subsequent extensions may contain data along with their own separate meta-data. All of Gnuastro’s programs also follow this convention: the main dataset (image or table) is in the second extension. See the example list of extension properties in Invoking Fits.

The meta-data contain information about the data, for example which region of the sky an image corresponds to, the units of the data, what telescope, camera, and filter the data were taken with, the software that produced it, or its observation or processing date. Without the meta-data, the raw dataset is practically just a collection of numbers: really hard to understand or connect with the real world (other datasets). It is thus strongly encouraged to supplement your data (at any level of processing) with as much meta-data about your processing/science as possible.

The meta-data of a FITS file is in ASCII format, which can be easily viewed or edited with a text editor or on the command-line. Each meta-data element (known as a keyword generally) is composed of a name, value, units and comments (the last two are optional). For example below you can see three FITS meta-data keywords for specifying the world coordinate system (WCS, or its location in the sky) of a dataset:

LATPOLE =           -27.805089 / [deg] Native latitude of celestial pole
RADESYS = 'FK5'                / Equatorial coordinate system
EQUINOX =               2000.0 / [yr] Equinox of equatorial coordinates

However, there are some limitations which discourage viewing/editing the keywords with text editors. For example, each keyword (its name, value, units and comments) has a fixed length of 80 characters, and there are no new-line characters, so in a text editor all the keywords appear on one line. Also, the meta-data keywords are immediately followed by the data, which are commonly in binary format and will show up as strange-looking characters in a text editor, significantly slowing it down.

Gnuastro’s Fits program was designed to allow easy manipulation of FITS extensions and meta-data keywords on the command-line while conforming fully with the FITS standard. For example you can copy or cut (copy and remove) HDUs/extensions from one FITS file to another, or completely delete them. It also has features to delete, add, or edit meta-data keywords within one HDU.


Previous: , Up: Fits   [Contents][Index]

5.1.1 Invoking Fits

Fits can print or manipulate the HDUs (extensions) of a FITS file, or the meta-data keywords in a given HDU. The executable name is astfits with the following general template

$ astfits [OPTION...] ASTRdata

One line examples:

## View general information about every extension:
$ astfits image.fits

## Print the header keywords in the second HDU (counting from 0):
$ astfits image.fits -h1 --printallkeys

## Only print header keywords that contain `NAXIS':
$ astfits image.fits -h1 --printallkeys | grep NAXIS

## Copy a HDU from input.fits to out.fits:
$ astfits input.fits --copy=hdu-name --output=out.fits

## Update the OLDKEY keyword value to 153.034:
$ astfits input.fits --update=OLDKEY,153.034,"Old keyword comment"

## Delete one COMMENT keyword and add a new one:
$ astfits input.fits --delete=COMMENT --comment="Anything you like ;-)."

## Write two new keywords with different values and comments:
$ astfits input.fits --write=MYKEY1,20.00,"An example keyword" \
          --write=MYKEY2,fd

When no action is requested (and only a file name is given), Fits will print a list of information about the extension(s) in the file. This information includes the HDU number, HDU name (EXTNAME keyword), type of data (see Numeric data types), and the number of data elements it contains (size along each dimension for images, and rows and columns for tables). You can use this to get a general idea of the contents of the FITS file and what HDU to use for further processing, either with the Fits program or any other Gnuastro program.

Here is one example of information about a FITS file with four extensions: the first extension has no data, it is a purely meta-data HDU (commonly used to keep meta-data about the whole file, or grouping of extensions, see Fits). The second extension is an image with name IMAGE and single precision floating point type (float32, see Numeric data types), it has 4287 pixels along its first (horizontal) axis and 4286 pixels along its second (vertical) axis. The third extension is also an image with name MASK. It is in 2-byte integer format (int16) which is commonly used to keep information about pixels (for example to identify which ones were saturated, or which ones had cosmic rays and so on), note how it has the same size as the IMAGE extension. The fourth extension is a binary table called CATALOG which has 12371 rows and 5 columns (it probably contains information about the sources in the image).

GNU Astronomy Utilities X.X
Run on Day Month DD HH:MM:SS YYYY
-----
HDU (extension) information: `image.fits'.
 Column 1: Index (counting from 0).
 Column 2: Name (`EXTNAME' in FITS standard).
 Column 3: Image data type or `table' format (ASCII or binary).
 Column 4: Size of data in HDU.
-----
0      n/a             uint8           0
1      IMAGE           float32         4287x4286
2      MASK            int16           4287x4286
3      CATALOG         table_binary    12371x5

The operating mode and input/output options to Fits are similar to the other programs and fully described in Common options. The options particular to Fits can be divided into two groups: 1) those related to modifying HDUs or extensions (see HDU manipulation), and 2) those related to viewing/modifying meta-data keywords (see Keyword manipulation). These two classes of options cannot be called together in one run: in any single run of Fits, you can either work on the extensions or on the meta-data keywords.


Next: , Previous: , Up: Invoking astfits   [Contents][Index]

5.1.1.1 HDU manipulation

Each header data unit, or HDU (also known as an extension), in a FITS file is an independent dataset (data + meta-data). Multiple HDUs can be stored in one FITS file, see Fits. The HDU modifying options to the Fits program are listed below.

These options may be called multiple times in one run. If so, the extensions will be copied from the input FITS file to the output FITS file in the given order (on the command-line and also in configuration files, see Configuration file precedence). If the separate classes are called together in one run of Fits, then first --copy is run (on all specified HDUs), followed by --cut (again on all specified HDUs), and finally --remove (on all specified HDUs).

The --copy and --cut options need an output FITS file (specified with the --output option). If the output file exists, then the specified HDU will be copied following the last extension of the output file (the existing HDUs in it will be untouched). Thus, after Fits finishes, the copied HDU will be the last HDU of the output file. If no output file name is given, then automatic output will be used to store the HDUs given to this option (see Automatic output).
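For example, the following sketch (with hypothetical file and extension names) copies two extensions into one output; after Fits finishes, they will be the last two HDUs of out.fits:

$ astfits input.fits --copy=IMAGE --copy=MASK --output=out.fits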

-C STR
--copy=STR

Copy the specified extension into the output file, see explanations above.

-k STR
--cut=STR

Cut (copy to output, remove from input) the specified extension into the output file, see explanations above.

-R STR
--remove=STR

Remove the specified HDU from the input file. From CFITSIO: “In the case of deleting the primary array (the first HDU in the file) then [it] will be replaced by a null primary array containing the minimum set of required keywords and no data.” So in practice, any existing data (array) and meta-data in the first extension will be removed, but the number of extensions in the file won’t change. This is because of the unique position the first FITS extension has in the FITS standard (for example, it cannot be used to store tables).
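For example, assuming input.fits has an extension with the (hypothetical) name MASK that is no longer needed, it can be removed with:

$ astfits input.fits --remove=MASK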


Previous: , Up: Invoking astfits   [Contents][Index]

5.1.1.2 Keyword manipulation

The meta-data in each header data unit, or HDU (also known as an extension, see Fits), is stored as “keywords”. Each keyword consists of a name, value, unit, and comments. The options of the Fits program (see Fits) related to viewing and manipulating keywords in a FITS HDU are described below.

To see the full list of keywords in a FITS HDU, you can use the --printallkeys option. If any of the keywords are to be modified, the headers of the input file will be changed (the file is edited in place). If you want to keep the original FITS file or HDU, it is easiest to create a copy first and then run Fits on that. In the FITS standard, keywords are always uppercase, so case does not matter in the input or output keyword names you specify.
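For example, a minimal sketch of this copy-then-edit approach (the file and keyword names are hypothetical):

$ cp input.fits edited.fits
$ astfits edited.fits -h1 --update=EXPTIME,1200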

Most of the options can accept multiple instances in one command. For example you can delete multiple keywords by calling --delete multiple times; since repeated keywords are allowed in the FITS standard, you can even delete the same keyword multiple times. The action of such options will start from the top-most keyword.

The precedence of operations is described below. Note that while the order within each class of actions is preserved, the order between classes is fixed, irrespective of the order in which you called them on the command-line. For example, no matter how you order --delete and --update, first all the delete operations will take effect, then the updates.

  1. --delete
  2. --rename
  3. --update
  4. --write
  5. --asis
  6. --history
  7. --comment
  8. --date
  9. --printallkeys

All possible syntax errors will be reported before the keywords are actually written. FITS errors during any of these actions will be reported, but Fits won’t stop until all the operations are complete. If --quitonerror is called, then Fits will immediately stop upon the first error.

If you want to inspect only a certain set of header keywords, it is easiest to pipe the output of the Fits program to GNU Grep. Grep is a very powerful and advanced tool to search strings which is precisely made for such situations. For example if you only want to check the size of an image FITS HDU, you can run:

$ astfits input.fits | grep NAXIS

FITS STANDARD KEYWORDS: Some header keywords are necessary for later operations on a FITS file, for example BITPIX or NAXIS; see the FITS standard for their full list. If you modify (for example remove or rename) such keywords, the FITS file extension might not be usable any more. Also be careful with the world coordinate system keywords: if you modify or change their values, any future world coordinate system (like RA and Dec) measurements on the image will also change.

The keyword related options to the Fits program are fully described below.

-a STR
--asis=STR

Write STR exactly into the FITS file header with no modifications. If it does not conform to the FITS standards, then it might cause trouble, so please be very careful with this option. If you want to define the keyword from scratch, it is best to use the --write option (see below) and let CFITSIO worry about the standards. The best way to use this option is when you want to add a keyword from one FITS file to another unchanged and untouched. Below is an example of such a case that can be very useful sometimes (on the command-line or in scripts):

$ key=$(astfits firstimage.fits -h1 --printallkeys | grep KEYWORD)
$ astfits --asis="$key" secondimage.fits

In particular, note the double quotation signs (") around the reference to the key shell variable ($key): since FITS keywords usually have many space characters, if this variable is not quoted, the shell will only pass the first word of the full keyword to this option, which will definitely be a non-standard FITS keyword and will make it hard to work with the file afterwards. See the “Quoting” section of the GNU Bash manual for more information if your keyword has the special characters $, `, or \.

-d STR
--delete=STR

Delete one instance of the STR keyword from the FITS header. Multiple instances of --delete can be given (possibly even for the same keyword, when it is repeated in the meta-data). All given keywords will be removed from the headers in the given order. If the keyword doesn’t exist, Fits will give a warning and return with a non-zero value, but will not stop. To stop as soon as an error occurs, run with --quitonerror.

-r STR
--rename=STR

Rename a keyword to a new name. STR contains both the existing and new names, which should be separated by either a comma (,) or a space character. Note that if you use a space character, you have to put the value to this option within double quotation marks (") so the space character is not interpreted as an option separator. Multiple instances of --rename can be given in one command. The keywords will be renamed in the specified order. If the keyword doesn’t exist, Fits will give a warning and return with a non-zero value, but will not stop. To stop as soon as an error occurs, run with --quitonerror.
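For example, the two sketches below are equivalent ways to rename a hypothetical OLDNAME keyword to NEWNAME:

$ astfits input.fits -h1 --rename=OLDNAME,NEWNAME
$ astfits input.fits -h1 --rename="OLDNAME NEWNAME"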

-u STR
--update=STR

Update a keyword, its value, its comments and its units in the format described below. If there are multiple instances of the keyword in the header, they will be changed from top to bottom (with multiple --update options).

The format of the values to this option can best be specified with an example:

--update=KEYWORD,value,"comments for this keyword",unit

If there is a writing error, Fits will give a warning and return with a non-zero value, but will not stop. To stop as soon as an error occurs, run with --quitonerror.

The value can be any numerical or string value. Other than the KEYWORD, all the other tokens are optional. To leave a given token empty, follow the preceding comma (,) immediately with the next. If any space character is present around the commas, it will be considered part of the respective token. So if more than one token has space characters within it, the safest method is to put double quotation marks around each individual token that needs them. Note that without double quotation marks, space characters will be seen as option separators and can lead to undefined behavior.
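For example, the first sketch below (with a hypothetical file name) sets the value, comment, and unit of EQUINOX; the second updates MYKEY with a value and unit while leaving the comment token empty (note the two consecutive commas):

$ astfits input.fits -h1 --update=EQUINOX,2000.0,"Equinox of coordinates",yr
$ astfits input.fits -h1 --update=MYKEY,5,,km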

-w STR
--write=STR

Write a keyword to the header. For the possible value input formats, comments and units for the keyword, see the --update option above.

-H STR
--history STR

Add a HISTORY keyword to the header. The string given to this keyword will be separated into multiple keyword cards if it is longer than 70 characters. With each run only one value for the --history option will be read; if it is given multiple times, only the last one is used. If there is an error, Fits will give a warning and return with a non-zero value, but will not stop. To stop as soon as an error occurs, run with --quitonerror.

-c STR
--comment STR

Add a COMMENT keyword to the header. Similar to the explanation for --history above. If there is a writing error, Fits will give a warning and return with a non-zero value, but will not stop. To stop as soon as an error occurs, run with --quitonerror.

-t
--date

Put the current date and time in the header. If the DATE keyword already exists in the header, it will be updated. If there is a writing error, Fits will give a warning and return with a non-zero value, but will not stop. To stop as soon as an error occurs, run with --quitonerror.
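For example, the sketch below (with hypothetical strings) adds a HISTORY entry, a COMMENT, and the current date to the second HDU in a single run:

$ astfits input.fits -h1 --history="Sky subtracted" \
          --comment="Reduced for the XYZ survey" --date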

-p
--printallkeys

Print all the keywords in the specified FITS extension (HDU) on the command-line.

-Q
--quitonerror

Quit if any of the operations above are not successful. By default, if an error occurs, Fits will warn the user of the faulty keyword and continue with the rest of the actions.


Next: , Previous: , Up: Data containers   [Contents][Index]

5.2 ConvertType

The formats of astronomical data were defined mainly for archiving and processing. In other situations, however, the data might be more useful in other formats. For example, when you are writing a paper or report, or if you are making slides for a talk, you can’t use a FITS image; other image formats should be used. In other cases you might want your pixel values in a table format as plain text for input to other programs that don’t recognize FITS, or to include as a table in your report. ConvertType was created for such situations. The number of recognized formats will increase with future updates, based on need.

The conversion is not only one way (from FITS to other formats), but two ways (except for the EPS and PDF formats, which can only be outputs). So you can convert a JPEG image or text file into a FITS image. Basically, other than EPS and PDF, you can use any of the recognized formats as different color channel inputs to get any of the recognized outputs. So before explaining the options and arguments, first a short description of the recognized file types will be given, followed by a short introduction to digital color.


Next: , Previous: , Up: ConvertType   [Contents][Index]

5.2.1 Recognized file formats

The various standards and the file name extensions recognized by ConvertType are listed below.

FITS or IMH

Astronomical data are commonly stored in the FITS format (and, in older data sets, the IRAF .imh format). A list of file name suffixes which indicate that the file is in one of these formats is given in Arguments.

Each extension of a FITS image only has one value per pixel, so when used as input, each input FITS image contributes as one color channel. If you want to use multiple extensions of one FITS file for different color channels, you have to repeat the file name multiple times and call the --hdu option once for each extension (see Invoking ConvertType).

JPEG

The JPEG standard was created by the Joint Photographic Experts Group. It is currently one of the most commonly used image formats. Its major advantage is the compression algorithm that is defined by the standard. Like FITS, JPEG is a raster graphics format, which means that it is pixelated.

A JPEG file can have 1 (for gray-scale), 3 (for RGB) and 4 (for CMYK) color channels. If you only want to convert one JPEG image into other formats, there is no problem, however, if you want to use it in combination with other input files, make sure that the final number of color channels does not exceed four. If it does, then ConvertType will abort and notify you.

The file name endings that are recognized as a JPEG file for input are: .jpg, .JPG, .jpeg, .JPEG, .jpe, .jif, .jfif and .jfi.

EPS

The Encapsulated PostScript (EPS) format is essentially a one page PostScript file which has a specified size. PostScript also includes non-image data, for example lines and texts. It is a fully functional programming language to describe a document. Therefore in ConvertType, EPS is only an output format and cannot be used as input. Contrary to the FITS or JPEG formats, PostScript is not a raster format, but is categorized as vector graphics.

The Portable Document Format (PDF) is currently the most common format for documents. Some believe that PDF has replaced PostScript and that PostScript is now obsolete. This view is wrong: a PostScript file is an actual plain text file that can be edited like any program source with any text editor; to display its programmed content or print it, it needs to pass through a processor or compiler. A PDF file can be thought of as the processed output of the compiler on an input PostScript file. PostScript, EPS and PDF were created and are registered by Adobe Systems.

With these features in mind, you can see that when you are compiling a document with TeX or LaTeX, an EPS file is much lower level than a JPEG, so you have much greater control and therefore quality. Since it also supports vector graphic lines, we use such lines to make a thin border around the image, which greatly improves its appearance in the document. No matter the resolution of the display or printer, these lines will always be clear and not pixelated. In the future, the addition of text (for example labels or object IDs) to the EPS output might be included. However, this can be done better with tools within TeX or LaTeX such as PGF/TikZ.

If the final input image (possibly after all the flux operations explained below) is a binary image, or only has the two colors of black and white (in segmentation maps for example), then PostScript has another great advantage compared to other formats: it allows for 1-bit pixels (pixels with a value of 0 or 1), which can decrease the output file size by a factor of 8. So if a gray-scale image is binary, ConvertType will exploit this property in the EPS and PDF (see below) outputs.

The standard suffixes for an EPS file are .eps, .EPS, .epsf and .epsi. The EPS outputs of ConvertType have the .eps suffix.

PDF

As explained above, a PDF document is a static document description format; viewing its result is therefore much faster and more efficient than PostScript. To create a PDF output, ConvertType will make a PostScript page description and convert that to PDF using GPL Ghostscript. The suffixes recognized for a PDF file are .pdf and .PDF. If GPL Ghostscript cannot be run on the PostScript file, the PostScript file will be left in place and a warning will be printed.

blank

This is not actually a file type, but it can be used to fill one color channel with a blank value. If this argument is given for any color channel, that channel will not be used in the output.

Plain text

Plain text files have the advantage that they can be viewed with any text editor or on the command-line. Most programs also support input as plain text files. As input, each plain text file is considered to contain one color channel.

In ConvertType, the recognized suffixes for plain text files are .txt and .dat. As described in Invoking ConvertType, if you just give one of these suffixes (and not a full filename) as output, then automatic output will be performed to determine the final output name (see Automatic output). Besides these, when the format of a file cannot be recognized from its name, ConvertType will fall back to plain text mode. So you can use any name (even without a suffix) for a plain text input or output. Just note that when the suffix is not recognized, automatic output will not be performed.

The basic input/output on plain text images is very similar to how tables are read/written, as described in Gnuastro text table format. Simply put, the restrictions are very loose, and there is a convention to define a name, units, data type (see Numeric data types), and comments for the data in a commented line. The only difference is that as a table, a text file can contain many datasets (columns), but as a 2D image, it can only contain one dataset. As a result, only one information comment line is necessary for a 2D image, and instead of the starting ‘# Column N’ (N is the column number), the information line for a 2D image must start with ‘# Image 1’. When ConvertType is asked to output to a plain text file, this information comment line is written before the image pixel values.
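For example, a minimal sketch of a plain text file containing a 2x3 image could look like the following (the name, units, and comment are hypothetical; see Gnuastro text table format for the exact rules):

# Image 1: MYIMAGE [counts, float32] A tiny example image
1.0   2.0   3.0
4.0   5.0   6.0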

When converting an image to plain text, consider the fact that if the image is large, the number of columns in each line will become very large, possibly making it very hard to open in some text editors.

Standard output (command-line)

This is very similar to the plain text output, but instead of creating a file to keep the printed values, they are printed on the command line. This can be very useful when you want to redirect the results directly to another program in one command with no intermediate file. The only difference is that only the pixel values are printed (with no information comment line). To print to the standard output, set the output name to ‘stdout’.
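For example, this sketch pipes the pixel values of a (hypothetical) FITS image directly into AWK to sum them:

$ astconvertt image.fits --output=stdout \
              | awk '{for(i=1;i<=NF;++i) s+=$i} END {print s}'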


Next: , Previous: , Up: ConvertType   [Contents][Index]

5.2.2 Color

An image is a two dimensional array of two dimensional elements called pixels. If each pixel only has one value, the image is known as a gray-scale image and no color is defined. The range of values in the image can be interpreted as shades of any color; it is customary to use shades of black, or gray-scale. However, to produce the color spectrum in the digital world, several primary colors must be mixed. Therefore in a color image, each pixel has several values, depending on how many primary colors were chosen. For example digital monitors and color digital cameras build all colors by mixing the three colors of Red-Green-Blue (RGB) in various proportions. However, for printing on paper, standard printers use the Cyan-Magenta-Yellow-Key (CMYK, Key=black) color space. Therefore when printing an RGB image, a transformation between color spaces is usually necessary.

In a color digital camera, a color image is produced by dividing the pixel’s area between three colors (filters). However, in astronomy, due to the intrinsic faintness of most targets, the collecting area of the pixel is very important to us. Hence the full area of the pixel is used, one value is stored for that pixel, and one color filter is used for the whole image. Thus a FITS image is inherently a gray-scale image and no color can be defined for it.

One way to represent a gray-scale image in different color spaces is to use the same proportions of the primary colors in each pixel. This is the common way most FITS image converters work: they fill all the channels with the same values. The downside is two fold:

  1. Every channel has to carry a copy of the same values, so the file becomes larger for no benefit.
  2. In print, a neutral gray then has to be produced by mixing the colored inks, which wastes ink and is rarely perfectly neutral.

To solve both these problems, the best way is to save the FITS image into the black channel of the CMYK color space (in the RGB color space, all three channels have to be used). The JPEG standard is the only common standard that accepts the CMYK color space; that is why currently only the JPEG standard is included, and not the PNG standard for example.

The JPEG and EPS standards allow two sizes for the number of bits in each channel: 8-bit and 12-bit. The former is by far the more common and is what is used in ConvertType. Therefore, each channel should have values from 0 to 2^8-1=255. From this we see how each pixel in a gray-scale image is one byte (8 bits) long, in an RGB image it is 3 bytes long, and in CMYK it is 4 bytes long. But thanks to the JPEG compression algorithms, when all the pixels of one channel have the same value, that channel is compressed to one pixel. Therefore a gray-scale image and a CMYK image that only has the K-channel filled are approximately the same file size.


Previous: , Up: ConvertType   [Contents][Index]

5.2.3 Invoking ConvertType

ConvertType will convert any recognized input file type to any specified output type. The executable name is astconvertt with the following general template

$ astconvertt [OPTION...] InputFile [InputFile2] ... [InputFile4]

One line examples:

## Convert an image in FITS to PDF:
$ astconvertt image.fits --output=pdf

## Convert an image in JPEG to FITS:
$ astconvertt image.jpg -ogalaxy.fits

## Use three plain text 2D arrays to create an RGB JPEG output:
$ astconvertt f1.txt f2.txt f3.txt -o.jpg

## Use two images and one blank for an RGB EPS output:
$ astconvertt M31_r.fits M31_g.fits blank -oeps

The file type of the output will be specified with the (possibly complete) file name given to the --output option, which can either be given on the command-line or in any of the configuration files (see Configuration files). Note that if the output suffix is not recognized, it will default to plain text format, see Recognized file formats.

The order of multiple input files is important. After reading the input file(s), the total number of color channels in all the inputs will be used to define which color space is used for the outputs and how each color channel is interpreted. Note that one file might have more than one color channel (for example in the JPEG format). If there is one color channel, the output is gray-scale; if three input color channels are given, they are respectively considered to be the red, green and blue color channels; and if there are four color channels, they are respectively considered to be cyan, magenta, yellow and black.

The value to --output (or -o) can be either a full file name or just the suffix of the desired output format. In the former case, that same name will be used for the output. In the latter case, the name of the output file will be set based on the automatic output guidelines, see Automatic output. Note that the suffix can optionally start with a . (dot), so for example --output=.jpg and --output=jpg are equivalent. See Recognized file formats.

Besides the common set of options explained in Common options, the options to ConvertType can be classified into input, output and flux related options. The majority of the options are to do with the flux range. Astronomical data usually have a very large dynamic range (difference between maximum and minimum value) and different subjects might be better demonstrated with a limited flux range.

Input:

-h STR/INT
--hdu=STR/INT

In ConvertType, it is possible to call the HDU option multiple times for the different input FITS files (corresponding to different color channels) in the same order that they are called on the command-line. Except for the fact that multiple calls are possible, this option is identical to the common --hdu in Input/Output options. The number of calls to this option cannot be less than the number of input FITS files; if there are more, the extra HDUs will be ignored. Note that the calls will be read in the order described in Configuration file precedence.
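For example, the following sketch (with hypothetical file names) reads the color channel of each input from its second extension, assuming the data follow the convention of keeping the main dataset there (see Fits):

$ astconvertt r.fits g.fits b.fits --hdu=1 --hdu=1 --hdu=1 -o.jpg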

Output:

-w FLT
--widthincm=FLT

The width of the output in centimeters. This is only relevant for those formats that accept such a width (not plain text for example). For most digital purposes, the number of pixels is far more important than the value to this parameter because you can adjust the absolute width (in inches or centimeters) in your document preparation program.

-b INT
--borderwidth=INT

The width of the border to be put around the EPS and PDF outputs in units of PostScript points. There are 72 or 28.35 PostScript points in an inch or centimeter respectively. In other words, there are roughly 3 PostScript points in every millimeter. If you are planning on adding a border, its significance is highly correlated with the value you give to the --widthincm parameter.

Unfortunately, in the document structuring convention of the PostScript language, the “bounding box” has to be in units of PostScript points with no fractions allowed. So the border value can only be specified as an integer. To have a final border that is thinner than one PostScript point in your document, you can ask for a larger width in ConvertType and then scale down the output EPS or PDF file in your document preparation program, for example by setting the width in your \includegraphics command in TeX or LaTeX. Since the output is vector graphics, changes of size have no effect on its quality (pixels don’t get different values).
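For example, to effectively get a half-point border, you might create the EPS at twice its final width (the numbers here are only for illustration):

$ astconvertt image.fits --widthincm=10 --borderwidth=1 -oeps

Including the resulting file in the document with a width of 5cm will then display the border with an effective thickness of half a PostScript point.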

-x
--hex

Use Hexadecimal encoding in creating the EPS output. By default, the ASCII85 encoding is used, which provides a much better compression ratio (the output is nearly 40 percent smaller than with Hexadecimal encoding). When converted to PDF (or included in TeX or LaTeX which is finally saved as a PDF file), an efficient binary encoding is used, which is far more efficient than both of them; the choice of EPS encoding will thus have no effect on the final PDF.

The EPS encoding therefore only matters if you want to store or transfer the EPS files themselves (for example if you want to submit your paper to arXiv or journals in PostScript), where the storage can become important if you have large images or lots of small ones.

-u INT
--quality=INT

The quality (compression) of the output JPEG file with values from 0 to 100 (inclusive). For other formats the value to this option is ignored. Note that only in gray-scale (when one input color channel is given) will this actually be the exact quality (each pixel will correspond to one input value). If it is in color mode, some degradation will occur. While the JPEG standard does support lossless graphics, it is not commonly supported.

Flux range:

-c STR
--change=STR

Change pixel values with the following format "from1:to1, from2:to2,...". This option is very useful in displaying labeled pixels (not actual data images which have noise), like segmentation maps. In labeled images, a group of pixels usually has a fixed integer value. With this option, you can manipulate the labels before the image is displayed, to get a better output for print, or to emphasize a particular set of labels and ignore the rest. The labels in the images will be changed in the same order given. By default, first the pixel values will be converted, then the pixel values will be truncated (see --fluxlow and --fluxhigh).

You can use any number for the values irrespective of your final output, your given values are stored and used in the double precision floating point format. So for example if your input image has labels from 1 to 20000 and you only want to display those with labels 957 and 11342 then you can run ConvertType with these options:

$ astconvertt --change=957:50000,11342:50001 --fluxlow=5e4 \
   --fluxhigh=1e5 segmentationmap.fits --output=jpg

While the output JPEG format is only 8-bit, this operation is done in an intermediate step which is stored in double precision floating point. The pixel values are converted to 8-bit only after all operations on the input fluxes have been completed. By placing the value in double quotes you can use as many spaces as you like for better readability.

-C
--changeaftertrunc

Change pixel values (with --change) after truncation of the flux values; by default it is the opposite (the values are changed before truncation).

-L FLT
--fluxlow=FLT

The minimum flux (pixel value) to display in the output image; any pixel value below this will be set to this value in the output. If the value to this option is the same as --fluxhigh, then no flux truncation will be applied. Note that when multiple channels are given, this value is used for all the color channels.

-H FLT
--fluxhigh=FLT

The maximum flux (pixel value) to display in the output image, see --fluxlow.

-m INT
--maxbyte=INT

This is only used for the JPEG and EPS output formats which have an 8-bit space for each channel of each pixel. The maximum value in each pixel can therefore be 2^8-1=255. With this option you can change (decrease) the maximum value. By doing so you will decrease the dynamic range. It can be useful if you plan to use those values for other purposes.

-A INT
--flminbyte=INT

If the lowest pixel value in the input channels is larger than the value to --fluxlow, then that input value will be redundant. In some situations it might be necessary to set the minimum byte value (0) to correspond to that flux even if the data do not reach that value. With this option you can do this. Note that if the minimum pixel value is smaller than --fluxlow, then this option is redundant.

-B INT
--fhmaxbyte=INT

See --flminbyte.

-i
--invert

For 8-bit output types (JPEG, EPS, and PDF for example) the final value that is stored is inverted, so white becomes black and vice versa. The reason for this is that astronomical images usually have a very large area of blank sky in them; without inversion, a large area of the image would be black. Note that this behavior is ideal for gray-scale images; if you want a color image, the colors are going to be mixed up.


Previous: , Up: Data containers   [Contents][Index]

5.3 Table

Tables are the products of processing astronomical images and spectra. For example, in Gnuastro, MakeCatalog will process the defined pixels over an object and produce a catalog (see MakeCatalog). For each identified object, MakeCatalog can print its position on the image or sky, its total brightness and much other information that is deducible from the given image. Each one of these properties is a column in its output catalog (or table), and for each input object there is a row.

When there are only a small number of objects (rows) and not too many properties (columns), then a simple plain text file is usually enough to store, transfer, or even use the produced data. However, to be more efficient in all these aspects, astronomers have defined the FITS binary table standard to store data in a binary format (0s and 1s), not plain text. This can offer major advantages in all those aspects: the file size will be greatly reduced, and reading and writing will be faster (because the RAM and CPU also work in binary).

The FITS standard also defines a standard for ASCII tables, where the data are stored in the human readable ASCII format, but within the FITS file structure. These are mainly useful for keeping ASCII data along with images and possibly binary data as multiple (conceptually related) extensions within a FITS file. The acceptable table formats are fully described in Tables.

Binary tables are not easily readable by human eyes: there is no fixed/unified standard on how the zeros and ones should be interpreted. Unix-like operating systems have flourished because of a simple fact: communication between the various tools is based on human-readable characters. So while the FITS table standards are very beneficial for the tools that recognize them, they are hard to use in the vast majority of available software. This creates limitations for their generic use.

‘Table’ is Gnuastro’s solution to this problem. With Table, FITS tables (ASCII or binary) are directly accessible to power-users of Unix-like operating systems (those working on the command-line or shell, see Command-line interface). With Table, a FITS table (in binary or ASCII format) is only one command away from AWK (or any other tool you want to use), just like a plain text file that you read with the cat command. You can pipe the output of Table into any other tool for higher-level processing; see Invoking Table for some simple examples.


Previous: , Up: Table   [Contents][Index]

5.3.1 Invoking Table

Table will read/write, select, convert, or show the information of the columns in FITS ASCII tables, FITS binary tables and plain text table files, see Tables. Output columns can also be selected by number, or by regular expression matching of the column names, units, or comments. The executable name is asttable with the following general template

$ asttable [OPTION...] InputFile

One line examples:

## Get the table column information (name, data type, or units):
$ asttable bintab.fits --information

## Only print those columns which have a name starting with "MAG_":
$ asttable bintab.fits --column=/^MAG_/

## Only print the 2nd column, and the third column multiplied by 5:
$ asttable bintab.fits | awk '{print $2, 5*$3}'

## Only print rows with a value in the 10th column above 100000:
$ asttable bintab.fits | awk '$10>1e5 {print}'

## Sort the output columns by the third column, save output:
$ asttable bintab.fits | sort -k3 > output.txt

## Convert a plain text table to a binary FITS table:
$ asttable plaintext.txt --output=table.fits --tabletype=fits-binary

In the absence of selected columns, all the input file’s columns will be output. If the specified output is a FITS file, the type of FITS table (binary or ASCII) will be determined from the --tabletype option. If the output is not a FITS file, it will be printed as a plain text table (with space characters between the columns). When the columns are accompanied by meta-data (like column name, units, or comments), this information will also be printed in the plain text file before the table, as described in Gnuastro text table format.

For the full list of options common to all Gnuastro programs please see Common options. Options can also be stored in directory, user or system-wide configuration files to avoid repeating them on the command-line, see Configuration files. Table does not follow the Automatic output convention that is common in most Gnuastro programs (see Automatic output): in the absence of an output file, the selected columns will be printed on the command-line with no column information, ready for redirecting to other tools like AWK or sort, similar to the examples above.

-i
--information

Only print the column information in the specified table on the command-line and exit. Each column’s information (number, name, units, data type, and comments) will be printed as a row on the command-line. Note that the FITS standard only requires the data type (see Numeric data types), and in plain text tables, no meta-data/information is mandatory. Gnuastro has its own convention in the comments of a plain text table to store and transfer this information as described in Gnuastro text table format.

This option will take precedence over the --column option, so when it is called along with requested columns, the latter will be ignored. This can be useful if you forget the identifier of a column after you have already typed some on the command-line: you can simply add a -i and run Table to see the whole list and remember. Then you can use the shell history (with the up arrow key on the keyboard) to retrieve the last command with all the previously typed columns present, delete the -i and add the identifier you had forgotten.
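For example, in the hypothetical run below, -i takes precedence, so only the column information is printed and the two requested columns are ignored:

$ asttable bintab.fits -cRA -cDEC -i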

-c STR/INT
--column=STR/INT

Specify the columns to output; see Selecting table columns for a thorough explanation of how the value to this option is interpreted. To select several output columns, this option can be called any number of times in one run of Table. The order of the output columns will be the same as the order of the calls on the command-line.

This option is not mandatory; if no specific columns are requested, all the input table columns are output. When this option is called multiple times, it is possible to output one column more than once.
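For example, the two sketches below (with hypothetical column names) are equivalent ways to output three columns in the given order:

$ asttable bintab.fits -cRA -cDEC -cMAG_F160W
$ asttable bintab.fits --column=RA --column=DEC --column=MAG_F160W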


Next: , Previous: , Up: Top   [Contents][Index]

6 Data manipulation

Images are one of the major formats of data used in astronomy. This chapter explains the Gnuastro programs that are provided for manipulating them, for example cropping out a part of a larger image, convolving an image with a given kernel, or applying a transformation to it.


Next: , Previous: , Up: Data manipulation   [Contents][Index]

6.1 Crop

Astronomical images are often very large, filled with thousands of galaxies. It often happens that you only want a section of the image, or you have a catalog of sources and you want to visually analyze them in small postage stamps. Crop is made to do all these things. When more than one crop is required, Crop will divide the crops between multiple threads to significantly reduce the run time.

Astronomical surveys are usually extremely large; so large, in fact, that the whole survey will not fit into a reasonably sized file. Because of this, surveys usually cut the final image into separate tiles and store each tile in a file. For example the Hubble Space Telescope ACS F814W image of the COSMOS survey consists of 81 separate FITS images, each with a volume of 1.7 Gigabytes.

Even though the tile sizes are chosen to be large enough that too many galaxies/targets don’t fall on the edges of the tiles, inevitably some do. So when you simply crop the image of such targets from one tile, you will miss a large area of the surrounding sky (which is essential in estimating the noise). Therefore in its WCS mode, Crop will stitch parts of the tiles that are relevant for a target (with the given width) from all the input images that cover that region into the output. Of course, the tiles have to be present in the list of input files.

Besides cropping postage stamps around certain coordinates, Crop can also crop arbitrary polygons from an image (or a set of tiles by stitching the relevant parts of different tiles within the polygon), see --polygon in Invoking Crop. Alternatively, it can crop out rectangular regions through the --section option from one image, see Crop section syntax.


Next: , Previous: , Up: Crop   [Contents][Index]

6.1.1 Crop modes

In order to be comprehensive, intuitive, and easy to use, there are two ways to define the crop:

  1. From its center and side length. For example if you already know the coordinates of an object and want to inspect it in an image or to generate postage stamps of a catalog containing many such coordinates.
  2. From the vertices of the crop region; this can be useful for larger crops over many targets, for example to crop out a uniformly deep, or contiguous, region of a large survey.

Irrespective of how the crop region is defined, the coordinates defining the crop can be in Image (pixel) or World Coordinate System (WCS) standards. All coordinates are read as floating point numbers (not integers), except for the --section option, see below. By setting the mode in Crop, you define the standard in which the given coordinates must be interpreted. Here, the different ways to specify the crop region are discussed within each standard. For the full list of options, please see Invoking Crop.

When the crop is defined by its center, the respective (integer) central pixel position will be found internally according to the FITS standard. To have this pixel positioned in the center of the cropped region, the final cropped region will have an odd number of pixels (even if you give an even number to --width in image mode).

Furthermore, when the crop is defined by its center, Crop allows you to only keep crops that don’t have any blank pixels in the vicinity of their center (your primary target). This can be very convenient when your input catalog/coordinates originated from another survey/filter which is not fully covered by your input image. To learn more about this feature, please see the description of the --checkcenter option in Invoking Crop.

Image coordinates

In image mode (--mode=img), Crop interprets the pixel coordinates and widths in units of the input data-elements (for example pixels in an image, not world coordinates). In image mode, only one image may be input. The output crop(s) can be defined in multiple ways as listed below.

Center of multiple crops (in a catalog)

The centers of (possibly multiple) crops are read from a text file. In this mode, the columns identified with the --coordcol option are interpreted as the center of a crop with a width of --width pixels along each dimension. The columns can contain any floating point value. The value to the --output option is seen as a directory which will host the (possibly multiple) separate crop files, see Crop output for more. For a tutorial using this feature, please see Hubble visually checks and classifies his catalog.

Center of a single crop (on the command-line)

The center of the crop is given on the command-line with the --center option. The crop width is specified by the --width option along each dimension. The given coordinates and width can be any floating point number.

Vertices of a single crop

In Image mode there are two options to define the vertices of a region to crop: --section and --polygon. The former is lower-level (it doesn’t accept floating point vertices, and only a rectangular region can be defined) and, unlike --polygon, it is only available in Image mode. Please see Crop section syntax for a full description of this method.

The latter option (--polygon) is a higher-level method to define any convex polygon (with any number of vertices) with floating point values. Please see the description of this option in Invoking Crop for its syntax.

WCS coordinates

In WCS mode (--mode=wcs), the coordinates and widths are interpreted using the World Coordinate System (WCS, which must accompany the dataset), not pixel coordinates. In WCS mode, Crop accepts multiple datasets as input. When the cropped region (defined by its center or vertices) overlaps with multiple of the input images/tiles, the overlapping regions will be taken from the respective input (they will be stitched when necessary for each output crop).

In this mode, the input images do not necessarily have to be the same size; they just need to have the same orientation and pixel resolution. Currently, only orientation along the celestial coordinates is accepted; if your input has a different orientation, you can use Warp’s --align option to align the image before cropping it (see Warp).

Each individual input image/tile can even be smaller than the final crop. In any case, any part of any of the input images which overlaps with the desired region will be used in the crop. Note that if there is an overlap in the input images/tiles, the pixels from the last input image read are going to be used for the overlap. Crop will not change pixel values, so it assumes your overlapping tiles were cut out from the same original image. There are multiple ways to define your cropped region as listed below.

Center of multiple crops (in a catalog)

Similar to catalog inputs in Image mode (above), except that the values along each dimension are assumed to have the same units as the dataset’s WCS information. For example, the central RA and Dec value for each crop will be read from the first and second calls to the --coordcol option. The width of the cropped box (in units of the WCS, or degrees in RA and Dec mode) must be specified with the --width option.
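For example, the sketch below (the file names and column numbers are hypothetical) crops one-arcminute wide boxes around the coordinates in the second and third columns of cat.txt, stitching from whichever tiles cover each position:

$ astcrop --mode=wcs --catalog=cat.txt --coordcol=2 --coordcol=3 \
          --width=1/60 /mnt/data/tile*.fits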

Center of a single crop (on the command-line)

You can specify the center of only one crop box with the --center option. If it exists in the input images, it will be cropped similar to the catalog mode, see above also for --width.

Vertices of a single crop

The --polygon option is a high-level method to define any convex polygon (with any number of vertices). Please see the description of this option in Invoking Crop for its syntax.

CAUTION: In WCS mode, the image has to be aligned with the celestial coordinates, such that the first FITS axis is parallel to (but in the opposite direction of) the Right Ascension (RA), and the second FITS axis is parallel to the Declination. If these conditions aren’t met for an image, Crop will warn you and abort. You can use Warp’s --align option to align the input image with these coordinates, see Warp.

As a summary, if you don’t specify a catalog, you have to define the cropped region manually on the command-line. In any case the mode is mandatory for Crop to be able to interpret the values given as coordinates or widths.


Next: , Previous: , Up: Crop   [Contents][Index]

6.1.2 Crop section syntax

When in image mode, one of the methods to crop only one rectangular section from the input image is to use the --section option. Crop has a powerful syntax to read the box parameters from a string of characters. If you leave certain parts of the string to be empty, Crop can fill them for you based on the input image sizes.

To define a box, you need the coordinates of two points: the first (X1, Y1) and the last (X2, Y2) pixel positions in the image, or four integer numbers in total. The four coordinates can be specified with one string in this format: ‘X1:X2,Y1:Y2’. This string is given to the --section option. Therefore, the pixels along the first axis that are greater than or equal to X1 and less than or equal to X2 will be included in the cropped image. The same goes for the second axis. Note that each different term will be read as an integer, not a float. This is a low-level option; for a higher-level way to specify a region (any polygon, not just a box), please see the --polygon option in Crop options. Also note that in the FITS standard, pixel indexes along each axis start from unity(1) not zero(0).

You can omit any of the values and they will be filled automatically: the left hand side of the colon (:) will be filled with 1, and the right side with the image size. So 2:,: will include the full range of pixels along the second axis, and only those with an index of 2 or larger along the first axis. If the colon is omitted for a dimension, then the full range is automatically used. So the same string is also equal to 2:, or 2: or even 2. If you want such a case for the second axis, you should set it to: ,2.

If you specify a negative value, it will be interpreted as a position before the start of the image indexes, i.e., outside the image along the bottom or left sides when viewed in SAO ds9. In case you want to count from the top or right sides of the image, you can use an asterisk (*). When confronted with a *, Crop will replace it with the maximum length of the image in that dimension. So *-10:*+10,*-20:*+20 will mean that the crop box will be 20x40 pixels in size and will only include the top right corner of the input image, with 3/4 of the crop being covered by blank pixels, see Blank pixels.
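For example, such a corner crop could be requested with the sketch below (the quotes protect the asterisks from being expanded by the shell):

$ astcrop --mode=img --section="*-10:*+10,*-20:*+20" image.fits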

If you feel more comfortable with space characters between the values, you can use as many space characters as you wish, just be careful to put your value in double quotes, for example --section="5:200, 123:854". If you forget the quotes, anything after the first space will not be seen by --section and you will most probably get an error because the rest of your string will be read as a filename (which most probably doesn’t exist). See Command-line for a description of how the command-line works.


Next: , Previous: , Up: Crop   [Contents][Index]

6.1.3 Blank pixels

The cropped box can potentially include pixels that are beyond the image range, for example when a target in the input catalog was very near the edge of the input image. The parts of the cropped image that were not in the input image will be filled with the following two values depending on the data type of the image. In both cases, SAO ds9 will not color code those pixels.

  1. If the data type of the image is a floating point type (float or double), the IEEE NaN (Not a Number) value is used.
  2. For integer types, the pixels are filled with the blank value for that type, which is also written into the BLANK keyword of the cropped image’s header.

You can ask for such blank regions to not be included in the output crop image using the --noblank option. In such cases, there is no guarantee that the image size of your output is what you asked for.

Unfortunately, some survey images do not use the BLANK FITS keyword; instead they just give all pixels outside of the survey area a value of zero. So by default, when dealing with float or double image types, any values that are 0.0 are also regarded as blank regions. This can be turned off with the --zeroisnotblank option.


Previous: , Up: Crop   [Contents][Index]

6.1.4 Invoking Crop

Crop will crop a region from an image. If in WCS mode, it will also stitch parts from separate images in the input files. The executable name is astcrop with the following general template

$ astcrop [OPTION...] [ASCIIcatalog] ASTRdata ...

One line examples:

## Crop all objects in cat.txt from image.fits:
$ astcrop --catalog=cat.txt image.fits

## Crop all objects in catalog (with RA,DEC) from all the files
## ending in `_drz.fits' in `/mnt/data/COSMOS/':
$ astcrop --mode=wcs --catalog=cat.txt /mnt/data/COSMOS/*_drz.fits

## Crop the outer 10 border pixels of the input image:
$ astcrop --section=10:*-10,10:*-10 --hdu=2 image.fits

## Crop region around RA and Dec of (189.16704, 62.218203):
$ astcrop --mode=wcs --center=189.16704,62.218203 goodsnorth.fits

## Crop region around pixel coordinate (568.342, 2091.719):
$ astcrop --mode=img --center=568.342,2091.719 --width=201 image.fits

Crop has one mandatory argument which is the input image name(s), shown above with ASTRdata .... You can use shell expansions (for example *) if you have lots of images in WCS mode. If the crop box centers are in a catalog, you can use the --catalog option. In other cases, the parameters of the single cropped output must be given with command-line options. See Crop output for how the output file name(s) can be specified. For the full list of general options to all Gnuastro programs (including Crop), please see Common options.

Floating point numbers can be used to specify the crop region (except with the --section option, see Crop section syntax). In such cases, the floating point values will be used to find the desired integer pixel indices based on the FITS standard. Hence, Crop ultimately doesn’t do any sub-pixel cropping (in other words, it doesn’t change pixel values). If you need such crops, you can use Warp to first warp the image to a new pixel grid, then crop from that. For example, let’s assume you want a crop from pixels 12.982 to 80.982 along the first dimension. You should first translate the image by -0.482 (note that the edge of a pixel is at integer multiples of 0.5). So you should run Warp with --translate=-0.482,0 and then crop the warped image with --section=13:81.
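A sketch of those two steps (the warped file name here is hypothetical; Warp will set it through automatic output, see Automatic output):

$ astwarp --translate=-0.482,0 image.fits
$ astcrop --mode=img --section=13:81 image_warped.fits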

There are two ways to define the cropped region: with its center or its vertices. See Crop modes for a full description. In the former case, Crop can check if the central region of the cropped image is indeed filled with data or is blank (see Blank pixels), and not produce any output when the center is blank, see the description under --checkcenter for more.

When in catalog mode, Crop will run in parallel unless you set --numthreads=1, see Multi-threaded operations. Note that when multiple outputs are created with threads, the outputs will not be created in the same order. This is because the threads are asynchronous and thus not started in order. This has no effect on each output, see Hubble visually checks and classifies his catalog for a tutorial on effectively using this feature.


Next: , Previous: , Up: Invoking astcrop   [Contents][Index]

6.1.4.1 Crop options

The options can be classified into the following contexts: input, output and operating mode options. Options that are common to all Gnuastro programs are listed in Common options and will not be repeated here.

When you are specifying the crop vertices yourself (through --section or --polygon) on relatively small regions (depending on the resolution of your images), the outputs from image and WCS mode can be approximately equivalent. However, as the crop sizes get large, the curved nature of the WCS coordinates has to be considered. For example, when using --section, the right ascension of the bottom left and top left corners will not be equal. If you only want regions within a given range of right ascension, use --polygon in WCS mode.

Input image parameters:

--hstartwcs=INT

Specify the first keyword card (line number) to start finding the input image’s world coordinate system information. Distortions were only recently included in WCSLIB (from version 5). Until then, different telescopes would apply their own specific sets of WCS keywords and put them into the image header along with those that WCSLIB did recognize. So now that WCSLIB recognizes most of the standard distortion parameters, the old non-standard keywords can confuse it and give completely wrong results, for example in the CANDELS-GOODS South images56.

The --hstartwcs and --hendwcs options are thus provided so that, when using older datasets, you can specify which region of the FITS header should be used to read the WCS keywords. Note that this is only relevant for reading the WCS information; basic data information like the image size is read separately. These two options will only be considered when the value to --hendwcs is larger than that of --hstartwcs. So if they are equal, or --hstartwcs is larger than --hendwcs, then all the input keywords will be parsed to get the WCS information of the image.

--hendwcs=INT

Specify the last keyword card to read for the input image’s world coordinate system. See --hstartwcs.

Crop box parameters:

-c FLT[,FLT[,...]]
--center=FLT[,FLT[,...]]

The central position of the crop in the input image. The positions along each dimension must be separated by a comma (,) and fractions are also acceptable. The number of values given to this option must be the same as the dimensions of the input dataset. The width of the crop should be set with --width. The units of the coordinates are read based on the value to the --mode option, see below.

-w FLT[,FLT[,...]]
--width=FLT[,FLT[,...]]

Width of the cropped region about its center. --width may take either a single value (to be used for all dimensions) or multiple values (a specific value for each dimension). If in WCS mode, value(s) given to this option will be read in the same units as the dataset’s WCS information along this dimension. The final output will have an odd number of pixels to allow easy identification of the pixel which keeps your requested coordinate (from --center or --catalog).

The --width option also accepts fractions. For example if you want the width of your crop to be 3 by 5 arcseconds along RA and Dec respectively, you can call it with: --width=3/3600,5/3600.

If you want an even sided crop, you can run Crop afterwards with --section=":*-1,:*-1" or --section=2:,2: (depending on which side you don’t need), see Crop section syntax.

-l STR
--polygon=STR

String of crop polygon vertices. Note that currently only convex polygons should be used. In the future we will make it work for all kinds of polygons. Convex polygons are polygons that do not have an internal angle more than 180 degrees. This option can be used both in the image and WCS modes, see Crop modes. The cropped image will be the size of the rectangular region that completely encompasses the polygon. By default all the pixels that are outside of the polygon will be set as blank values (see Blank pixels). However, if --outpolygon is called all pixels internal to the vertices will be set to blank.

The syntax for the polygon vertices is similar to, and simpler than, that for --section. In short, the dimensions of each coordinate are separated by a comma (,) and each vertex is separated by a colon (:). You can define as many vertices as you like. If you would like to use space characters between the dimensions and vertices to make them more human-readable, then you have to put the value to this option in double quotation marks.

For example, let’s assume you want to work on the deepest part of the WFC3/IR images of Hubble Space Telescope eXtreme Deep Field (HST-XDF). According to the webpage57 the deepest part is contained within the coordinates:

[ (53.187414,-27.779152), (53.159507,-27.759633),
  (53.134517,-27.787144), (53.161906,-27.807208) ]

They have provided mask images with only these pixels in the WFC3/IR images, but what if you also need to work on the same region in the full resolution ACS images? Also what if you want to use the CANDELS data for the shallow region? Running Crop with --polygon will easily pull out this region of the image for you irrespective of the resolution. If you have set the operating mode to WCS mode in your nearest configuration file (see Configuration files), there is no need to call --mode=wcs on the command line. You may also provide many FITS images/tiles and Crop will stitch them to produce this cropped region:

$ astcrop --mode=wcs desired-filter-image(s).fits           \
   --polygon="53.187414,-27.779152 : 53.159507,-27.759633 : \
              53.134517,-27.787144 : 53.161906,-27.807208"

In other cases, you have an image and want to define the polygon yourself (it isn’t already published like the example above). As the number of vertices increases, checking the vertex coordinates on a FITS viewer (for example SAO ds9) and typing them in one by one can be very tedious and prone to typos.

You can take the following steps to avoid the frustration and possible typos: Open the image with ds9 and activate its “region” mode with Edit→Region. Then define the region as a polygon with Region→Shape→Polygon. Click on the approximate center of the region you want and a small square will appear. By clicking on the vertices of the square you can shrink or expand it, clicking and dragging anywhere on the edges will enable you to define a new vertex. After the region has been nicely defined, save it as a file with Region→Save Regions. You can then select the name and address of the output file, keep the format as REG and press “OK”. In the next window, keep format as “ds9” and “Coordinate System” as “fk5”. A plain text file (let’s call it ds9.reg) is now created.

You can now convert this plain text file to Crop’s polygon format with this command (when typing on the command-line, ignore the “\” at the end of the first and second lines along with the extra spaces, these are only for nice printing):

$ v=$(awk 'NR==4' ds9.reg | sed -e's/polygon(//'        \
           -e's/\([^,]*,[^,]*\),/\1:/g' -e's/)//' )
$ astcrop --mode=wcs image.fits --polygon=$v

--outpolygon

Keep all the regions outside the polygon and mask the inner ones with blank pixels (see Blank pixels). This is practically the inverse of the default mode of treating polygons. Note that this option only works when you have only provided one input image. If multiple images are given (in WCS mode), then the full area covered by all the images has to be shown and the polygon excluded. This can lead to a very large area if large surveys like COSMOS are used. So Crop will abort and notify you. In such cases, it is best to crop out the larger region you want, then mask the smaller region with this option.

-s STR
--section=STR

Section of the input image which you want to be cropped. See Crop section syntax for a complete explanation on the syntax required for this input.

-x STR/INT
--coordcol=STR/INT

The column in a catalog to read as a coordinate. The value can be either the column number (starting from 1), or a match/search in the table meta-data, see Selecting table columns. This option must be called as many times as there are dimensions in the input dataset. If it is called more than necessary, the extra columns (from later calls to this option on the command-line or in configuration files) will be ignored, see Configuration file precedence.

-n STR/INT
--namecol=STR/INT

Column selection of the crop file name. The value can be either the column number (starting from 1), or a match/search in the table meta-data, see Selecting table columns. This option can be used both in Image and WCS modes, and is not mandatory. When a column is given to this option, the final crop base file name will be taken from the contents of this column. The directory will be determined by the --output option (current directory if not given) and the value to --suffix will be appended. When this column isn’t given, the row number will be used instead.
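
For example, assuming the first two columns of cat.txt hold the coordinates and the third holds a name (the column numbers here are only illustrative assumptions), a sketch like the following will name each crop from the third column:

$ astcrop --mode=wcs --catalog=cat.txt --coordcol=1 --coordcol=2    \
          --namecol=3 image.fits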

Output options:

--checkcenter=INT

Box width (odd number of pixels) of region in the center of the image to check for blank values. If the value to this option is zero, no checking is done. This option is only relevant when the cropped region(s) are defined by their center (not by the vertices, see Crop modes). If any of the pixels in this central region of a crop (defined by its center) are blank, then it will not be created.

Because survey regions don’t often have a clean square or rectangular shape, some of the pixels on the sides of a survey’s FITS image commonly don’t have any data and are blank (see Blank pixels). So when the catalog was not generated from the input image itself, it often happens that the image doesn’t have any data over some of the requested points.

When the given center of a crop falls in such regions and this option has a non-zero, odd value, no crop will be created. With this option you can thus specify the width of a small box (3 pixels is often good enough) around the central pixel of the cropped image. You can check which crops were created and which weren’t from the command-line (if --quiet was not called, see Operating mode options), or in Crop’s log file (see Crop output).
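
For example, in the following sketch (the file names are assumptions), only the crops whose central 3 by 3 pixel box contains usable data will be created:

$ astcrop --mode=wcs --catalog=cat.txt --checkcenter=3 image.fits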

-p STR
--suffix=STR

The suffix (or post-fix) of the output files for when you want all the cropped images to have a special ending. One case where this might be helpful is when besides the science images, you want the weight images (or exposure maps, which are also distributed with survey images) of the cropped regions too. So in one run, you can set the input images to the science images and --suffix=_s.fits. In the next run you can set the weight images as input and --suffix=_w.fits.
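
For example, the two runs for that scenario might look like this sketch (the file names are assumptions):

$ astcrop --mode=wcs --catalog=cat.txt --suffix=_s.fits science.fits
$ astcrop --mode=wcs --catalog=cat.txt --suffix=_w.fits weight.fits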

-b
--noblank

Pixels outside of the input image that are in the crop box will not be used. By default they are filled with blank values (depending on the type), see Blank pixels. This option only applies in Image mode, see Crop modes.

-z
--zeroisnotblank

In float or double images, it is common to give the value of zero to blank pixels. If the input image type is one of these two types, such pixels will also be considered as blank. You can disable this behavior with this option, see Blank pixels.

Operating mode options:

-O STR
--mode=STR

Operate in Image mode or WCS mode when the input coordinates can be both image or WCS. The value must either be img or wcs, see Crop modes for a full description.


Previous: , Up: Invoking astcrop   [Contents][Index]

6.1.4.2 Crop output

The string given to the --output option will be interpreted depending on how many crops were requested, see Crop modes.

The header of each output cropped image will contain the names of the input image(s) it was cut from. If a name is longer than the 70 character space that the FITS standard allows for header keyword values, the name will be cut into several keywords from the nearest slash (/). The keywords have the following format: ICFn_m (short for Crop File), where n is the number of the input image used in this crop and m is the part of the name (it can be broken into multiple keywords). Following the name is another keyword named ICFnPIX which shows the pixel range from that input image in the same syntax as Crop section syntax. So this string can be directly given to the --section option later.
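
As an illustration of the format described above (all the names and numbers below are hypothetical, not the output of a real run), the header of a crop cut from one input might contain cards like these:

ICF1_1  = '/mnt/data/COSMOS/'        / Part 1 of input no.1's name.
ICF1_2  = 'image_drz.fits'           / Part 2 of input no.1's name.
ICF1PIX = '101:301,551:751'          / Pixel range used from input no.1.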

Once done, a log file can be created in the current directory with the --log option. This file will have three columns and the same number of rows as the number of cropped images. There are also comments on the top of the log file explaining basic information about the run and descriptions for the columns. A short description of the columns is also given below:

  1. The cropped image file name for that row.
  2. The number of input images that were used to create that image.
  3. A 0 if the central few pixels (value to the --checkcenter option) are blank and 1 if they aren’t. When the crop was not defined by its center (see Crop modes), or --checkcenter was given a value of 0 (see Invoking Crop), the center will not be checked and this column will be given a value of -1.

Next: , Previous: , Up: Data manipulation   [Contents][Index]

6.2 Arithmetic

It is commonly necessary to do operations on some or all of the elements of a dataset independently (pixels in an image). For example, in the reduction of raw data it is necessary to subtract the Sky value (see Sky value) from each image. Later (once the images are warped into a single grid using Warp for example, see Warp), the images are co-added (the output pixel grid is the average of the pixels of the individual input images). Arithmetic is Gnuastro’s program for such operations on your datasets directly from the command-line. It currently uses the reverse polish or post-fix notation, see Reverse polish notation, and will work on the native data types of the input images/data to reduce CPU and RAM resources, see Numeric data types. For more information on how to run Arithmetic, please see Invoking Arithmetic.


Next: , Previous: , Up: Arithmetic   [Contents][Index]

6.2.1 Reverse polish notation

The most common notation for arithmetic operations is the infix notation, where the operator goes between the two operands, for example \(4+5\). While the infix notation is the preferred way in most programming languages, currently Arithmetic does not use it since it would require parentheses, which can complicate the implementation of the code. In the near future we do plan to adopt this notation58, but for the time being (due to time constraints on the developers), Arithmetic uses the post-fix or reverse polish notation. The Wikipedia article provides some excellent explanation of this notation, but we will give a short summary here for self-sufficiency.

In the post-fix notation, the operator is placed after the operands; as we will see below, this removes the need for parentheses for most ordinary operators. For example, instead of writing 5+6, we write 5 6 +. To easily understand how this notation works, you can think of each operand as a node in a last-in-first-out stack. Every time an operator is confronted, it pops the number of operands it needs from the top of the stack (so they don’t exist in the stack any more), does its operation and pushes the result back on top of the stack. So if you want the average of 5 and 6, you would write: 5 6 + 2 /. The operations that are done are:

  1. 5 is an operand, so it is pushed to the top of the stack.
  2. 6 is an operand, so it is pushed to the top of the stack.
  3. + is a binary operator, so pull the top two elements of the stack and perform addition on them (the order is \(5+6\) in the example above). The result is 11, push it on top of the stack.
  4. 2 is an operand so push it onto the top of the stack.
  5. / is a binary operator, so pull out the top two elements of the stack (top-most is 2, then 11) and divide the second one by the first.

In the Arithmetic program, the operands can be FITS images or numbers. As you can see, very complicated procedures can be created without the need for parentheses or worrying about precedence. Even functions which take an arbitrary number of arguments can be defined in this notation. This is a very powerful notation and is used in languages like PostScript59 (the programming language in PostScript files, which is also compiled into PDF files).
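
To try the steps above directly, you can use Arithmetic as a simple calculator (see Invoking Arithmetic). A minimal sketch (the f suffix ensures the 2 is read as a floating point number, so the division isn’t truncated to an integer):

## The average of 5 and 6 (should print 5.5):
$ astarithmetic -q 5 6 + 2f /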


Next: , Previous: , Up: Arithmetic   [Contents][Index]

6.2.2 Arithmetic operators

The recognized operators in Arithmetic are listed below. See Reverse polish notation for more on how the operators and operands should be ordered on the command-line. The operands to all operators can be a data array (for example a FITS image) or a number; the output will be an array or number according to the inputs. For example a number multiplied by an array will produce an array. The conditional operators will return pixel or numerical values of 0 (false) or 1 (true), stored in an unsigned char data type (see Numeric data types).

+

Addition, so “4 5 +” is equivalent to \(4+5\).

-

Subtraction, so “4 5 -” is equivalent to \(4-5\).

x

Multiplication, so “4 5 x” is equivalent to \(4\times5\).

/

Division, so “4 5 /” is equivalent to \(4/5\).

%

Modulo (remainder), so “3 2 %” is equivalent to \(1\). Note that the modulo operator only works on integer types.

abs

Absolute value of first operand, so “4 abs” is equivalent to \(|4|\).

pow

First operand to the power of the second, so “4.3 5f pow” is equivalent to \(4.3^{5}\). Currently pow will only work on single or double precision floating point numbers or images. To be sure that a number is read as a floating point (even if it doesn’t have any non-zero decimals) put an f after it.

sqrt

The square root of the first operand, so “5 sqrt” is equivalent to \(\sqrt{5}\). The output type is determined from the input, so the output of this example will be 2 (since 5 doesn’t have any non-zero decimal digits). If you want 2.23607, run 5f sqrt instead, the f will ensure that a number will be read as a floating point number, even if it doesn’t have decimal digits. If the input image has an integer type, you should explicitly convert the image to floating point, for example a.fits float sqrt, see the type conversion operators below.

log

Natural logarithm of first operand, so “4 log” is equivalent to \(\ln(4)\). The output type is determined from the input, see the explanation under sqrt for more.

log10

Base-10 logarithm of first operand, so “4 log10” is equivalent to \(\log(4)\). The output type is determined from the input, see the explanation under sqrt for more.

minvalue

Minimum (non-blank) value in the top operand on the stack, so “a.fits minvalue” will push the minimum pixel value in this image onto the stack. This operator is therefore mainly intended for data (for example images); if the top operand is a number, this operator just returns it without any change. Note that when this operator acts on a single image, the output will no longer be an image, but a number. The output of this operator has the same type as the input.

maxvalue

Maximum (non-blank) value of first operand in the same type, similar to minvalue.

numvalue

Number of non-blank elements in first operand in the uint64 type, similar to minvalue.

sumvalue

Sum of non-blank elements in first operand in the float32 type, similar to minvalue.

meanvalue

Mean value of non-blank elements in first operand in the float32 type, similar to minvalue.

stdvalue

Standard deviation of non-blank elements in first operand in the float32 type, similar to minvalue.

medianvalue

Median of non-blank elements in first operand with the same type, similar to minvalue.

min

The first popped operand to this operator must be a positive integer number which specifies how many further operands should be popped from the stack. The given number of operands must have the same type and size. Each pixel of the output of this operator will be set to the minimum value of the given number of operands (images) in that pixel.

For example the following command will produce an image with the same size and type as the inputs but each output pixel is set to the minimum respective pixel value of the three input images.

$ astarithmetic a.fits b.fits c.fits 3 min

max

Similar to min, but the pixels of the output will contain the maximum of the respective pixels in all operands in the stack.

num

Similar to min, but the pixels of the output will contain the number of the respective non-blank pixels in all input operands.

sum

Similar to min, but the pixels of the output will contain the sum of the respective pixels in all input operands.

mean

Similar to min, but the pixels of the output will contain the mean (average) of the respective pixels in all operands in the stack.

std

Similar to min, but the pixels of the output will contain the standard deviation of the respective pixels in all operands in the stack.

median

Similar to min, but the pixels of the output will contain the median of the respective pixels in all operands in the stack.

lt

Less than: If the second popped (or left operand in infix notation, see Reverse polish notation) value is smaller than the first popped operand, then this function will return a value of 1, otherwise it will return a value of 0. If both operands are images, then all the pixels will be compared with their counterparts in the other image. If only one operand is an image, then all the pixels will be compared with the single value (number) of the other operand. Finally if both are numbers, then the output is also just one number (0 or 1). When the output is not a single number, it will be stored as an unsigned char type.
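
For example, the following sketch (the file names are assumptions) will produce an unsigned char image with 1 for every pixel of image.fits that is less than 100, and 0 elsewhere:

$ astarithmetic image.fits 100 lt --output=mask.fits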

le

Less or equal: similar to lt (‘less than’ operator), but returning 1 when the second popped operand is smaller or equal to the first.

gt

Greater than: similar to lt (‘less than’ operator), but returning 1 when the second popped operand is greater than the first.

ge

Greater or equal: similar to lt (‘less than’ operator), but returning 1 when the second popped operand is larger or equal to the first.

eq

Equality: similar to lt (‘less than’ operator), but returning 1 when the two popped operands are equal (to double precision floating point accuracy).

ne

Non-Equality: similar to lt (‘less than’ operator), but returning 1 when the two popped operands are not equal (to double precision floating point accuracy).

and

Logical AND: returns 1 if both operands have a non-zero value, and 0 otherwise. Both operands have to be the same kind: either both images or both numbers.

or

Logical OR: returns 1 if either one of the operands is non-zero, and 0 only when both operands are zero. Both operands have to be the same kind: either both images or both numbers.

not

Logical NOT: returns 1 when the operand is zero and 0 when the operand is non-zero. The operand can be an image or number, for an image, it is applied to each pixel separately.

isblank

Test for a blank value (see Blank pixels). In essence, this is very similar to the conditional operators: the output is either 1 or 0 (see the ‘less than’ operator above). The difference is that it only needs one operand. Because of the definition of a blank pixel, a blank value is not even equal to itself, so you cannot use the equal operator above to select blank pixels. See the “Blank pixels” box below for more on Blank pixels in Arithmetic.

where

Change the input (pixel) value where/if a certain condition holds. The conditional operators above can be used to define the condition. Three operands are required for where. The input format is demonstrated in this simplified example:

$ astarithmetic modify.fits binary.fits if-true.fits where

The value of any pixel in modify.fits that corresponds to a non-zero pixel of binary.fits will be changed to the value of the same pixel in if-true.fits (this may also be a number). The 3rd and 2nd popped operands (modify.fits and binary.fits respectively, see Reverse polish notation) have to have the same dimensions/size. if-true.fits can be either a number, or have the same dimension/size as the other two.

The 2nd popped operand (binary.fits) has to have uint8 (or unsigned char in standard C) type (see Numeric data types). It is treated as a binary dataset (with only two values: zero and non-zero, hence the name binary.fits in this example). However, commonly you won’t be dealing with an actual FITS file of a condition/binary image. You will probably define the condition in the same run based on some other reference image and use the conditional and logical operators above to make a true/false (or one/zero) image for you internally. For example the case below:

$ astarithmetic in.fits reference.fits 100 gt new.fits where

In the example above, any pixel of in.fits whose corresponding pixel in reference.fits has a value greater than 100 will be replaced with the corresponding pixel in new.fits. Effectively the reference.fits 100 gt part created the condition/binary image which was added to the stack (in memory) and later used by where. The command above is thus equivalent to these two commands:

$ astarithmetic reference.fits 100 gt --output=binary.fits
$ astarithmetic in.fits binary.fits new.fits where

Finally, the input operands are read and used independently, so you can use the same file more than once as any of the operands.

When the 1st popped operand to where (if-true.fits) is a single number, it may be a NaN value (or any blank value, depending on its type) like the example below (see Blank pixels). When the number is blank, it will be converted to the blank value of the type of the 3rd popped operand (in.fits). Hence, in the example below, all the pixels in reference.fits that have a value greater than 100, will become blank in the natural data type of in.fits (even though NaN values are only defined for floating point types).

$ astarithmetic in.fits reference.fits 100 gt nan where

bitand

Bitwise AND operator: only bits with values of 1 in both popped operands will get the value of 1, the rest will be set to 0. For example (assuming numbers can be written as bit strings on the command-line): 00101000 00100010 bitand will give 00100000. Note that the bitwise operators only work on integer type datasets.

bitor

Bitwise inclusive OR operator: The bits where at least one of the two popped operands has a 1 value get a value of 1, the others 0. For example (assuming numbers can be written as bit strings on the command-line): 00101000 00100010 bitor will give 00101010. Note that the bitwise operators only work on integer type datasets.

bitxor

Bitwise exclusive OR operator: A bit will be 1 if it differs between the two popped operands. For example (assuming numbers can be written as bit strings on the command-line): 00101000 00100010 bitxor will give 00001010. Note that the bitwise operators only work on integer type datasets.

lshift

Bitwise left shift operator: shift all the bits of the first operand to the left by a number of times given by the second operand. For example (assuming numbers can be written as bit strings on the command-line): 00101000 2 lshift will give 10100000. This is equivalent to multiplication by 4. Note that the bitwise operators only work on integer type datasets.

rshift

Bitwise right shift operator: shift all the bits of the first operand to the right by a number of times given by the second operand. For example (assuming numbers can be written as bit strings on the command-line): 00101000 2 rshift will give 00001010. Note that the bitwise operators only work on integer type datasets.

bitnot

Bitwise not (more formally known as one’s complement) operator: flip all the bits of the popped operand (note that this is the only unary, or single operand, bitwise operator). In other words, any bit with a value of 0 is changed to 1 and vice-versa. For example (assuming numbers can be written as bit strings on the command-line): 00101000 bitnot will give 11010111. Note that the bitwise operators only work on integer type datasets/numbers.

uint8

Convert the type of the popped operand to 8-bit un-signed integer type (see Numeric data types). The internal conversion of C will be used.

int8

Convert the type of the popped operand to 8-bit signed integer type (see Numeric data types). The internal conversion of C will be used.

uint16

Convert the type of the popped operand to 16-bit un-signed integer type (see Numeric data types). The internal conversion of C will be used.

int16

Convert the type of the popped operand to 16-bit signed integer (see Numeric data types). The internal conversion of C will be used.

uint32

Convert the type of the popped operand to 32-bit un-signed integer type (see Numeric data types). The internal conversion of C will be used.

int32

Convert the type of the popped operand to 32-bit signed integer type (see Numeric data types). The internal conversion of C will be used.

uint64

Convert the type of the popped operand to 64-bit un-signed integer (see Numeric data types). The internal conversion of C will be used.

float32

Convert the type of the popped operand to 32-bit (single precision) floating point (see Numeric data types). The internal conversion of C will be used.

float64

Convert the type of the popped operand to 64-bit (double precision) floating point (see Numeric data types). The internal conversion of C will be used.

Blank pixels in Arithmetic: Blank pixels in the image (see Blank pixels) will be stored based on the data type. When the input is floating point type, blank values are NaN. One aspect of NaN values is that by definition they will fail on any comparison. Hence both equal and not-equal operators will fail when both their operands are NaN! Therefore, the only way to guarantee selection of blank pixels is through the isblank operator explained above.

One way you can exploit this property of the NaN value to your advantage is when you want a fully zero-valued image (even over the blank pixels) based on an already existing image (with same size and world coordinate system settings). The following command will produce this for you:

$ astarithmetic input.fits nan eq --output=all-zeros.fits

Note that on the command-line you can write NaN in any case (for example nan, NaN, or NAN are all acceptable): reading NaN as a floating point number in Gnuastro isn’t case-sensitive.


Previous: , Up: Arithmetic   [Contents][Index]

6.2.3 Invoking Arithmetic

Arithmetic will do pixel to pixel arithmetic operations on the individual pixels of input data and/or numbers. For the full list of operators with explanations, please see Arithmetic operators. Any operand that only has a single element (number, or single pixel FITS image) will be read as a number, the rest of the inputs must have the same dimensions. The general template is:

$ astarithmetic [OPTION...] ASTRdata1 [ASTRdata2] OPERATOR ...

One line examples:

## Calculate (10.32-3.84)^2.7 quietly (will just print 155.329):
$ astarithmetic -q 10.32 3.84 - 2.7 pow

## Inverse the input image (1/pixel):
$ astarithmetic 1 image.fits / --out=inverse.fits

## Multiply each pixel in image by -1:
$ astarithmetic image.fits -1 x --out=negative.fits

## Subtract extension 4 from extension 1 (counting from zero):
$ astarithmetic image.fits image.fits - --out=skysub.fits           \
                --hdu=1 --hdu=4

## Add two images, then divide them by 2 (2 is read as floating point):
$ astarithmetic image1.fits image2.fits + 2f / --out=average.fits

## Use Arithmetic's average operator:
$ astarithmetic image1.fits image2.fits average --out=average.fits

## Calculate the median of three images in three separate extensions:
$ astarithmetic img1.fits img2.fits img3.fits median                \
                -h0 -h1 -h2 --out=median.fits

If the output is an image, and the --output option is not given, automatic output will use the name of the first FITS image encountered to generate an output file name, see Automatic output. Also, output WCS information will be taken from the first input image encountered. When the output is a single number, that number will be printed in the standard output and no output file will be created. Arithmetic’s notation for giving operands to operators is described in Reverse polish notation. To ignore certain pixels, set them as blank, see Blank pixels, for example with the where operator (see Arithmetic operators). See Common options for a review of the options in all Gnuastro programs. Arithmetic just redefines the --hdu option as explained below:

-h INT/STR
--hdu INT/STR

The header data unit of the input FITS images, see Input/Output options. Unlike most options in Gnuastro (which will ultimately only have one value for this option), Arithmetic allows --hdu to be called multiple times and the value of each invocation will be stored separately (for the unlimited number of input images you would like to use). Recall that for other programs this (common) option only takes a single value. So in other programs, if you specify it multiple times on the command-line, only the last value will be used and in the configuration files, it will be ignored if it already has a value.

The order of the values to --hdu has to be in the same order as input FITS images. Options are first read from the command-line (from left to right), then top-down in each configuration file, see Configuration file precedence.

If the number of HDUs is less than the number of input images, Arithmetic will abort and notify you. However, if there are more HDUs than FITS images, there is no problem: they will be used in the given order (every time a FITS image comes up on the stack) and the extra HDUs will be ignored in the end. So there is no problem with having extra HDUs in the configuration files and by default several HDUs with a value of 0 are kept in the system-wide configuration file when you install Gnuastro.

-g INT/STR
--globalhdu INT/STR

Use the value to this option as the HDU of all input FITS files. This option is very convenient when you have many input files and the dataset of interest is in the same HDU of all the files. When this option is called, any values given to the --hdu option (explained above) are ignored and will not be used.
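
For example, when the dataset of interest is in HDU 1 of all three inputs, a sketch like the following saves you from calling --hdu three times:

$ astarithmetic a.fits b.fits c.fits 3 median --globalhdu=1         \
                --out=median.fits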

Arithmetic accepts two kinds of input: images and numbers. Images are considered to be any of the inputs that is a file name of a recognized type (see Arguments) and has more than one element/pixel. Numbers on the command-line will be read into the smallest type (see Numeric data types) that can store them, so -2 will be read as a char type (which is signed on most systems and can thus keep negative values), 2500 will be read as an unsigned short (all positive numbers will be read as unsigned), while 3.1415926535897 will be read as a double and 3.14 will be read as a float. To force a number to be read as float, add an f after it, so 5f will be added to the stack as float (see Reverse polish notation).

Unless otherwise stated (in Arithmetic operators), the operators can deal with multiple numeric data types (see Numeric data types). For example in “a.fits b.fits +”, the image types can be long and float. In such cases, C’s internal type conversion will be used. The output type will be set to the higher-ranking type of the two inputs. Unsigned integer types have a smaller ranking than their signed counterparts and floating point types have a higher ranking than the integer types. So the internal C type conversions done in the example above are equivalent to this piece of C:

size_t i;
long a[100];
float b[100], out[100];

/* Each `long' element of `a' is automatically converted to `float'
   by C before being added to the `float' element of `b'. */
for(i=0;i<100;++i) out[i]=a[i]+b[i];

Relying on the default C type conversion significantly speeds up the processing and also requires less RAM (when using very large images). However this great advantage comes at the cost of preparing for all the combinations of types while building/compiling Gnuastro. With the full list of CFITSIO types, compilation can take roughly half an hour. However, some types are not too common, therefore Gnuastro comes with a set of configure time options letting you enable certain types for native compilation. You can see the full list of --enable-bin-op-XXXX options in Gnuastro configure options.

When a type isn’t enabled for native binary operations, the input data will be internally converted to the smallest larger type that was enabled. This can slow down your processing (which is faster for smaller/integer types) and consume more RAM (to copy the new type), so if you often deal with data of a specific type, it is much better to make the one-time investment at compilation time and reap the benefits each time you run Gnuastro/Arithmetic. Note that all arithmetic operations are done by the gal_data_arithmetic function in the Gnuastro library, so the choice of native binary operator types will affect any program (within Gnuastro or outside of it) that uses this function (including Arithmetic).

Some operators can only work on integer types (of any length, for example the bitwise operators) while others only work on floating point types (currently only the pow operator). In such cases, if the operand type(s) are different, an error will be printed and internal conversion won’t occur. Arithmetic also comes with internal type conversion operators which you can use to convert the data into the appropriate type, see Arithmetic operators.

The hyphen (-) can be used both to specify options (see Options) and also to specify a negative number, which might be necessary in your arithmetic. To enable you to do this, Arithmetic will first parse all the input strings: if the first character after a hyphen is a digit, then that hyphen is temporarily replaced by the vertical tab character (which is not commonly used), so the string will not be mistaken for an option. After the arguments are parsed, any vertical tab is replaced back with a hyphen so it can be read as a negative number. Therefore, as long as the names of the files you want to work on don’t start with a vertical tab followed by a digit, there is no problem. An important consequence of this implementation is that you should not write negative fractions like this: -.3, instead write them as -0.3.
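
For example, a negative operand can be given like the following sketch (the output name is an assumption):

$ astarithmetic image.fits -0.3 x --out=scaled.fits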

Without any images, Arithmetic will act like a simple calculator and print the resulting output number on the standard output like the first example above. If you really want such calculator operations on the command-line, AWK (GNU AWK is the most common implementation) is much faster, easier and much more powerful. For example, the numerical one-line example above can be done with the following command. In general AWK is a fantastic tool and GNU AWK has a wonderful manual (https://www.gnu.org/software/gawk/manual/). So if you often confront situations like this, or have to work with large text tables/catalogs, be sure to check out AWK and simplify your life.

$ echo "" | awk '{print (10.32-3.84)^2.7}'
155.329

Next: , Previous: , Up: Data manipulation   [Contents][Index]

6.3 Convolve

On an image, convolution can be thought of as a process to blur or remove the contrast in an image. If you are already familiar with the concept and just want to run Convolve, you can jump to Convolution kernel and Invoking Convolve and skip the lengthy introduction on the basic definitions and concepts of convolution.

There are generally two methods to convolve an image. The first and more intuitive one is in the “spatial domain”, or using the actual image pixel values, see Spatial domain convolution. The second method is to manipulate the “frequency domain”, or work on the magnitudes of the different frequencies that constitute the image, see Frequency domain and Fourier operations. Understanding convolution in the spatial domain is more intuitive and thus recommended if you are just starting to learn about convolution. Getting a good grasp of the frequency domain is a little more involved and needs some concentration and some mathematical proofs. However, its reward is a faster operation and, more importantly, a very fundamental understanding of this very important operation.

Convolution of an image will generally result in blurring the image because it mixes pixel values. In other words, if the image has sharp differences in neighboring pixel values60, those sharp differences will become smoother. This has very good consequences in detection of signal in noise for example. In an actual observed image, the variation in neighboring pixel values due to noise can be very high. But after convolution, those variations will decrease and we have a better hope in detecting the possible underlying signal. Another case where convolution is extensively used is in mock images and modeling in general, convolution can be used to simulate the effect of the atmosphere or the optical system on the mock profiles that we create, see Point Spread Function. Convolution is a very interesting and important topic in any form of signal analysis (including astronomical observations). So we have thoroughly61 explained the concepts behind it in the following sub-sections.


Next: , Previous: , Up: Convolve   [Contents][Index]

6.3.1 Spatial domain convolution

The pixels in an input image represent different “spatial” positions, therefore when convolution is done only using the actual input pixel values, we name the process as being done in the “Spatial domain”. In particular this is in contrast to the “frequency domain” that we will discuss later in Frequency domain and Fourier operations. In the spatial domain (and in realistic situations where the image and the convolution kernel don’t extend to infinity), convolution is the process of changing the value of one pixel to the weighted average of all the pixels in its neighborhood.

The ‘neighborhood’ of each pixel (how many pixels in which direction) and the ‘weight’ function (how much each neighboring pixel should contribute depending on its position) are given through a second image which is known as a “kernel”62.


Next: , Previous: , Up: Spatial domain convolution   [Contents][Index]

6.3.1.1 Convolution process

In convolution, the kernel specifies the weight and positions of the neighbors of each pixel. To find the convolved value of a pixel, the central pixel of the kernel is placed on that pixel. The values of each overlapping pixel in the kernel and image are multiplied by each other and summed for all the kernel pixels. To have one pixel in the center, the sides of the convolution kernel have to be an odd number. This process effectively mixes the pixel values of each pixel with its neighbors, resulting in a blurred image compared to the sharper input image.

Formally, convolution is one kind of linear ‘spatial filtering’ in image processing texts. If we assume that the kernel has \(2a+1\) and \(2b+1\) pixels on each side, the convolved value of a pixel placed at \(x\) and \(y\) (\(C_{x,y}\)) can be calculated from the neighboring pixel values in the input image (\(I\)) and the kernel (\(K\)) from

$$C_{x,y}=\sum_{s=-a}^{a}\sum_{t=-b}^{b}K_{s,t}\times{}I_{x+s,y+t}.$$

Any pixel coordinate that is outside of the image in the equation above will be considered to be zero. When the kernel is symmetric about its center, the blurred image has the same orientation as the original image. However, if the kernel is not symmetric, the image will be affected in the opposite manner; this is a natural consequence of the definition of spatial filtering. In order to avoid this we can rotate the kernel about its center by 180 degrees, so the convolved output keeps the original orientation. Technically speaking, the process is only known as Convolution if the kernel is flipped; if it isn’t, it is known as Correlation.

To be a weighted average, the sum of the weights (the pixels in the kernel) has to be unity. This has the consequence that an object’s image will have the same brightness before and after convolution (see Flux Brightness and magnitude), which is natural: convolution should not eat up the object’s photons, it only disperses them.
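
For example, if a kernel image doesn’t already sum to unity, you can normalize it with Arithmetic’s sumvalue operator (see Arithmetic operators). In the following sketch (the file names are assumptions), the kernel is given twice: the second copy is reduced to its sum, which then divides the first:

$ astarithmetic kernel.fits kernel.fits sumvalue /                  \
                --out=kernel-normalized.fits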


Previous: , Up: Spatial domain convolution   [Contents][Index]

6.3.1.2 Edges in the spatial domain

In purely ‘linear’ spatial filtering (convolution), there are problems on the edges of the input image. Here we will explain the problem in the spatial domain; for a discussion of this problem from the frequency domain perspective, see Edges in the frequency domain. The problem originates from the fact that on the edges, in practice63, the sum of the weights we use on the actual image pixels is not unity. For example, as discussed above, a profile in the center of an image will have the same brightness before and after convolution. However, for a partially imaged profile on the edge of the image, the brightness (sum of its pixel fluxes within the image, see Flux Brightness and magnitude) will not be equal; some of the flux is going to be ‘eaten’ by the edges.

If you ran $ make check on the source files of Gnuastro, you can see this effect by comparing convolve_frequency.fits with convolve_spatial.fits in the ./tests/ directory. In the spatial domain, by default, no assumption will be made about pixels outside of the image or any blank pixels in the image. The problem explained above will also occur on the sides of blank regions (see Blank pixels). The solution to this edge effect problem is only possible in the spatial domain: for pixels near the edge, we have to abandon the assumption that the sum of the kernel pixels is unity during the convolution process64. So taking \(W\) as the sum of the kernel pixels that overlapped with non-blank and in-image pixels, the equation in Convolution process will become:

$$C_{x,y}= { \sum_{s=-a}^{a}\sum_{t=-b}^{b}K_{s,t}\times{}I_{x+s,y+t} \over W}.$$

In this manner, objects which are near the edges of the image or blank pixels will also have the same brightness (within the image) before and after convolution. This correction is applied by default in Convolve when convolving in the spatial domain. To disable it, you can use the --noedgecorrection option. In the frequency domain, there is no way to avoid this loss of flux near the edges of the image, see Edges in the frequency domain for an interpretation from the frequency domain perspective.

Note that the edge effect discussed here is different from the one in If convolving afterwards. In making mock images we want to simulate a real observation. In a real observation the images of the galaxies on the sides of the CCD are first blurred by the atmosphere and instrument, then imaged. So light from the parts of a galaxy which are immediately outside the CCD will affect the parts of the galaxy which are covered by the CCD. Therefore in modeling the observation, we have to convolve an image that is larger than the input image by exactly half of the convolution kernel. We can hence conclude that this correction for the edges is only useful when working on actual observed images (where we don’t have any more data on the edges) and not in modeling.


Next: , Previous: , Up: Convolve   [Contents][Index]

6.3.2 Frequency domain and Fourier operations

Getting a good grip on the frequency domain is usually not an easy job! So we have decided to give the issue a complete review here. Convolution in the frequency domain (see Convolution theorem) heavily relies on the concepts of Fourier transform (Fourier transform) and Fourier series (Fourier series) so we will be investigating these important operations first. It has become something of a cliché for people to say that the Fourier series “is a way to represent a (wave-like) function as the sum of simple sine waves” (from Wikipedia). However, sines themselves are abstract functions, so this statement really adds no extra layer of physical insight.

Before jumping head-first into the equations and proofs, we will begin with a historical background to see how the importance of frequencies actually roots in our ancient desire to see everything in terms of circles. A short review of how the complex plane should be interpreted is then given. Having paved the way with these two basics, we define the Fourier series and subsequently the Fourier transform. The final aim is to explain discrete Fourier transform, however some very important concepts need to be solidified first: The Dirac comb, convolution theorem and sampling theorem. So each of these topics are explained in their own separate sub-sub-section before going on to the discrete Fourier transform. Finally we revisit (after Edges in the spatial domain) the problem of convolution on the edges, but this time in the frequency domain. Understanding the sampling theorem and the discrete Fourier transform is very important in order to be able to pull out valuable science from the discrete image pixels. Therefore we have included the mathematical proofs and figures so you can have a clear understanding of these very important concepts.


Next: , Previous: , Up: Frequency domain and Fourier operations   [Contents][Index]

6.3.2.1 Fourier series historical background

Ever since the ancient times, the circle has been (and still is) the simplest shape for abstract comprehension. All you need is a center point and a radius and you are done. All the points on a circle are at a fixed distance from the center. However, the moment you try to connect this elegantly simple and beautiful abstract construct (the circle) with the real world (for example compute its area or its circumference), things become really hard (ideally, impossible) because the irrational number \(\pi\) gets involved.

The key to understanding the Fourier series (thus the Fourier transform and finally the Discrete Fourier Transform) is our ancient desire to express everything in terms of circles, the most exceptionally simple and elegant abstract human construct. Most people prefer to say the same thing in a more ahistorical manner: to break a function into sines and cosines. As the term “ancient” in the previous sentence implies, Jean-Baptiste Joseph Fourier (1768 – 1830 A.D.) was not the first person to do this. The main reason we know this process by his name today is that he came up with an ingenious method to find the necessary coefficients (radii of the circles) and frequencies (“speeds” of rotation on the circles) for any generic (integrable) function.

Figure 6.1: Epicycles and the Fourier series. Left: A demonstration of Mercury’s epicycles relative to the “center of the world” by Qutb al-Din al-Shirazi (1236 – 1311 A.D.) retrieved from Wikipedia. Middle and Right: How adding more epicycles (or terms in the Fourier series) will approximate functions. The right animation is also available.

Like most aspects of mathematics, this process of interpreting everything in terms of circles began for astronomical purposes: astronomers noticed that the orbits of Mars and the other outer planets did not appear to be simple circles (as everything should have been in the heavens). At some point during their orbit, the revolution of these planets would become slower, stop, go back a little (in what is known as retrograde motion) and then continue going forward again.

The correction proposed by Ptolemy (90 – 168 A.D.) was the most agreed upon. He put the planets on Epicycles or circles whose center itself rotates on a circle whose center is the earth. Eventually, as observations became more and more precise, it was necessary to add more and more epicycles in order to explain the complex motions of the planets65. Figure 6.1(Left) shows an example depiction of the epicycles of Mercury in the late 13th century.

Of course we now know that if they had abdicated the Earth from its throne in the center of the heavens and allowed the Sun to take its place, everything would become much simpler and true. But there wasn’t enough observational evidence for changing the “professional consensus” of the time to this radical view suggested by a small minority66. So the pre-Galilean astronomers chose to keep Earth in the center and find a correction to the models (while keeping the heavens a purely “circular” order).

The main reason we are giving this historical background, which might appear off topic, is to give historical evidence that while such “approximations” do work and are very useful for pragmatic reasons (like measuring the calendar from the movement of astronomical bodies), they offer no physical insight. The astronomers who were involved with the Ptolemaic world view had to add a huge number of epicycles during the centuries after Ptolemy in order to explain more accurate observations. Finally, the death knell of this world-view was Galileo’s observations with his new instrument (the telescope). So the physical insight, which is what astronomers and physicists are interested in (as opposed to mathematicians and engineers who just like proving and optimizing or calculating!), comes from being creative and not limiting ourselves to such approximations, even when they work.


Next: , Previous: , Up: Frequency domain and Fourier operations   [Contents][Index]

6.3.2.2 Circles and the complex plane

Before going onto the derivation, it is also useful to review how the complex numbers and their plane relate to the circles we talked about above. The two schematics in the middle and right of Figure 6.1 show how a 1D function of time can be made using the 2D real and imaginary surface. Seeing the animation in Wikipedia will really help in understanding this important concept. At each point in time, we take the vertical coordinate of the point and use it to find the value of the function at that point in time. Figure 6.2 shows this relation with the axes marked.

Leonhard Euler67 (1707 – 1783 A.D.) showed that the complex exponential (\(e^{iv}\) where \(v\) is real) is periodic and can be written as: \(e^{iv}=\cos{v}+i\sin{v}\). Therefore \(e^{i(v+2\pi)}=e^{iv}\). Later, Caspar Wessel (mathematician and cartographer, 1745 – 1818 A.D.) showed how complex numbers can be displayed as vectors on a plane. Euler’s identity might seem counter-intuitive at first, so we will try to explain it geometrically (for deeper physical insight). On the real-imaginary 2D plane (like the left hand plot in each box of Figure 6.2), multiplying a number by \(i\) can be interpreted as rotating the point by \(90\) degrees (for example the value \(3\) on the real axis becomes \(3i\) on the imaginary axis). On the other hand, \(e\equiv\lim_{n\rightarrow\infty}(1+{1\over n})^n\), therefore, defining \(m\equiv nu\), we get:

$$e^{u}=\lim_{n\rightarrow\infty}\left(1+{1\over n}\right)^{nu} =\lim_{n\rightarrow\infty}\left(1+{u\over nu}\right)^{nu} =\lim_{m\rightarrow\infty}\left(1+{u\over m}\right)^{m}$$

Taking \(u\equiv iv\) the result can be written as a generic complex number (a function of \(v\)):

$$e^{iv}=\lim_{m\rightarrow\infty}\left(1+i{v\over m}\right)^{m}=a(v)+ib(v)$$

For \(v=\pi\), a nice geometric animation of going to the limit can be seen on Wikipedia. We see that \(\lim_{m\rightarrow\infty}a(\pi)=-1\), while \(\lim_{m\rightarrow\infty}b(\pi)=0\), which gives the famous \(e^{i\pi}=-1\) equation. The final value is the real number \(-1\); however, the distance of the polygon points traversed as \(m\rightarrow\infty\) is half the circumference of a circle or \(\pi\), showing how \(v\) in the equation above can be interpreted as an angle in units of radians and therefore how \(a(v)=\cos(v)\) and \(b(v)=\sin(v)\).

Since \(e^{iv}\) is periodic (let’s assume with a period of \(T\)), it is more clear to write it as \(v\equiv{2{\pi}n\over T}t\) (where \(n\) is an integer), so \(e^{iv}=e^{i{2{\pi}n\over T}t}\). The advantage of this notation is that the period (\(T\)) is clearly visible and the frequency (\(2{\pi}n \over T\), in units of 1/cycle) is defined through the integer \(n\). In this notation, \(t\) is in units of “cycle”s.

As we see from the examples in Figure 6.1 and Figure 6.2, for each constituting frequency, we need a respective ‘magnitude’ or the radius of the circle in order to accurately approximate the desired 1D function. The concepts of “period” and “frequency” are relatively easy to grasp when using temporal units like time because this is how we define them in every-day life. However, in an image (astronomical data), we are dealing with spatial units like distance. Therefore, by one “period” we mean the distance at which the signal is identical and frequency is defined as the inverse of that spatial “period”. The complex circle of Figure 6.2 can be thought of the Moon rotating about Earth which is rotating around the Sun; so the “Real (signal)” axis shows the Moon’s position as seen by a distant observer on the Sun as time goes by. Because of the scalar (not having any direction or vector) nature of time, Figure 6.2 is easier to understand in units of time. When thinking about spatial units, mentally replace the “Time (sec)” axis with “Distance (meters)”. Because length has direction and is a vector, visualizing the rotation of the imaginary circle and the advance along the “Distance (meters)” axis is not as simple as temporal units like time.

Figure 6.2: Relation between the real (signal), imaginary (\(i\equiv\sqrt{-1}\)) and time axes at two snapshots of time.


Next: , Previous: , Up: Frequency domain and Fourier operations   [Contents][Index]

6.3.2.3 Fourier series

In astronomical images, our variable (brightness, or number of photo-electrons, or signal to be more generic) is recorded over the 2D spatial surface of a camera pixel. However to make things easier to understand, here we will assume that the signal is recorded in 1D (assume one row of the 2D image pixels). Also for this section and the next (Fourier transform) we will be talking about the signal before it is digitized or pixelated. Let’s assume that we have the continuous function \(f(l)\) which is integrable in the interval \([l_0, l_0+L]\) (always true in practical cases like images). Take \(l_0\) as the position of the first pixel in the assumed row of the image and \(L\) as the width of the image along that row. The units of \(l_0\) and \(L\) can be in any spatial units (for example meters) or an angular unit (like radians) multiplied by a fixed distance which is more common.

To approximate \(f(l)\) over this interval, we need to find a set of frequencies and their corresponding ‘magnitude’s (see Circles and the complex plane). Therefore our aim is to show \(f(l)\) as the following sum of periodic functions:

$$f(l)=\displaystyle\sum_{n=-\infty}^{\infty}c_ne^{i{2{\pi}n\over L}l} $$

Note that the different frequencies (\(2{\pi}n/L\), in units of cycles per meter for example) are not arbitrary. They are all integer multiples of the fundamental frequency \(\omega_0=2\pi/L\). Recall that \(L\) was the length of the signal we want to model. Therefore, we see that the smallest possible frequency (or the frequency resolution) in the end depends on the length over which we observed the signal, \(L\). In the case of an image, for each dimension this is the size of the image in the respective dimension. The frequencies have been defined in this “harmonic” fashion to ensure that the final sum is also periodic outside of the \([l_0, l_0+L]\) interval. At this point, you might be thinking that the sky is not periodic with the same period as my camera’s view angle. You are absolutely right! The important thing is that since your camera’s observed region is the only region we are “observing” and will be using, the rest of the sky is irrelevant, so we can safely assume the sky is periodic outside of it. However, this working assumption will haunt us later in Edges in the frequency domain.

The frequencies are thus determined by definition. So all we need to do is to find the coefficients (\(c_n\)), or magnitudes, or radii of the circles for each frequency which is identified with the integer \(n\). Fourier’s approach was to multiply both sides with a fixed term:

$$f(l)e^{-i{2{\pi}m\over L}l}=\displaystyle\sum_{n=-\infty}^{\infty}c_ne^{i{2{\pi}(n-m)\over L}l} $$

where \(m>0\)68. We can then integrate both sides over the observation period:

$$\int_{l_0}^{l_0+L}f(l)e^{-i{2{\pi}m\over L}l}dl =\int_{l_0}^{l_0+L}\displaystyle\sum_{n=-\infty}^{\infty}c_ne^{i{2{\pi}(n-m)\over L}l}dl=\displaystyle\sum_{n=-\infty}^{\infty}c_n\int_{l_0}^{l_0+L}e^{i{2{\pi}(n-m)\over L}l}dl $$

Both \(n\) and \(m\) are integers, so \(n-m\) is also an integer. We also know that a complex exponential is periodic, so after one period (\(L\)) it comes back to its starting point. Therefore \(\int_{l_0}^{l_0+L}e^{i{2{\pi}k\over L}l}dl=0\) for any integer \(k\neq0\). However, when \(k=0\), this integral becomes: \(\int_{l_0}^{l_0+L}e^0dl=\int_{l_0}^{l_0+L}dl=L\). Hence, since the integral will be zero for all \(n{\neq}m\), we get:

$$\displaystyle\sum_{n=-\infty}^{\infty}c_n\int_{l_0}^{l_0+L}e^{i{2{\pi}(n-m)\over L}l}dl=Lc_m $$

The origin of the axis is fundamentally an arbitrary position, so let’s set it to the start of the image such that \(l_0=0\). We can then find the “magnitude” of the frequency \(2{\pi}m/L\) within \(f(l)\) through the relation:

$$c_m={1\over L}\int_{0}^{L}f(l)e^{-i{2{\pi}m\over L}l}dl $$
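
As a quick numeric sanity check of this relation (a Python sketch with assumed values, not part of Gnuastro), take \(f(l)=3+2\cos(2\pi\cdot2l/L)\); since \(2\cos\theta=e^{i\theta}+e^{-i\theta}\), we expect \(c_0=3\), \(c_{\pm2}=1\) and all other coefficients to vanish:

# Riemann-sum approximation of c_m = (1/L) * integral of f(l)e^{-i2pi m l/L}.
import numpy as np

L = 5.0                                   # assumed interval length
l = np.linspace(0, L, 10000, endpoint=False)
f = 3 + 2*np.cos(2*np.pi*2*l/L)

def c(m):
    return np.sum(f * np.exp(-1j*2*np.pi*m*l/L)) / l.size

for m in (0, 1, 2, 3):
    print(f"c_{m} = {c(m).real:+.4f} {c(m).imag:+.4f}j")   # c_0=3, c_2=1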


Next: , Previous: , Up: Frequency domain and Fourier operations   [Contents][Index]

6.3.2.4 Fourier transform

In Fourier series, we had to assume that the function is periodic outside of the desired interval with a period of \(L\). Therefore, assuming that \(L\rightarrow\infty\) will allow us to work with any function. However, with this approximation, the fundamental frequency (\(\omega_0\)) or the frequency resolution that we discussed in Fourier series will tend to zero: \(\omega_0\rightarrow0\). In the equation to find \(c_m\), every \(m\) represented a frequency (multiple of \(\omega_0\)) and the integration on \(l\) removes the dependence of the right side of the equation on \(l\), making it only a function of \(m\) or frequency. Let’s define the following two variables:

$$\omega{\equiv}m\omega_0={2{\pi}m\over L}$$

$$F(\omega){\equiv}Lc_m$$

The equation to find the coefficients of each frequency in Fourier series thus becomes:

$$F(\omega)=\int_{-\infty}^{\infty}f(l)e^{-i{\omega}l}dl. $$

The function \(F(\omega)\) is thus the Fourier transform of \(f(l)\) in the frequency domain. So through this transformation, we can find (analyze) the magnitudes of the constituting frequencies or the value in the frequency space69 of our spatial input function. The great thing is that we can also do the reverse and later synthesize the input function from its Fourier transform. Let’s do it: with the approximations above, multiply the right side of the definition of the Fourier Series (Fourier series) with \(1=L/L=({\omega_0}L)/(2\pi)\):

$$f(l)={1\over 2\pi}\displaystyle\sum_{n=-\infty}^{\infty}Lc_ne^{{2{\pi}in\over L}l}\omega_0={1\over 2\pi}\displaystyle\sum_{n=-\infty}^{\infty}F(\omega)e^{i{\omega}l}\Delta\omega $$

To find the right-most side of this equation, we renamed \(\omega_0\) as \(\Delta\omega\) because it was our resolution, wrote \(2{\pi}n/L\) as \(\omega\) and finally, wrote \(Lc_n\) as \(F(\omega)\) as we defined above. Now, as \(L\rightarrow\infty\), \(\Delta\omega\rightarrow0\), so we can write:

$$f(l)={1\over 2\pi}\int_{-\infty}^{\infty}F(\omega)e^{i{\omega}l}d\omega $$

Together, these two equations provide us with a very powerful set of tools that we can use to process (analyze) and recreate (synthesize) the input signal. Through the first equation, we can break up our input function into its constituent frequencies and analyze it, hence it is also known as analysis. Using the second equation, we can synthesize or make the input function from the known frequencies and their magnitudes. Thus it is known as synthesis. Here, we symbolize the Fourier transform (analysis) and its inverse (synthesis) of a function \(f(l)\) and its Fourier Transform \(F(\omega)\) as \({\cal F}[f]\) and \({\cal F}^{-1}[F]\).


Next: , Previous: , Up: Frequency domain and Fourier operations   [Contents][Index]

6.3.2.5 Dirac delta and comb

The Dirac \(\delta\) (delta) function (also known as an impulse) is the way that we convert a continuous function into a discrete one. It is defined to satisfy the following integral:

$$\int_{-\infty}^{\infty}\delta(l)dl=1$$

When integrated with another function, it gives that function’s value at \(l=0\):

$$\int_{-\infty}^{\infty}f(l)\delta(l)dl=f(0)$$

An impulse positioned at another point (say \(l_0\)) is written as \(\delta(l-l_0)\):

$$\int_{-\infty}^{\infty}f(l)\delta(l-l_0)dl=f(l_0)$$

The Dirac \(\delta\) function also operates similarly if we use summations instead of integrals. The Fourier transform of the delta function is:

$${\cal F}[\delta(l)]=\int_{-\infty}^{\infty}\delta(l)e^{-i{\omega}l}dl=e^{-i{\omega}0}=1$$

$${\cal F}[\delta(l-l_0)]=\int_{-\infty}^{\infty}\delta(l-l_0)e^{-i{\omega}l}dl=e^{-i{\omega}l_0}$$

From the definition of the Dirac \(\delta\) we can also define a Dirac comb (\({\rm III}_P\)) or an impulse train with infinite impulses separated by \(P\):

$${\rm III}_P(l)\equiv\displaystyle\sum_{k=-\infty}^{\infty}\delta(l-kP) $$

\(P\) is chosen to represent “pixel width” later in Sampling theorem. Therefore the Dirac comb is periodic with a period of \(P\). We have intentionally used a different name for the period of the Dirac comb compared to the input signal’s length of observation that we showed with \(L\) in Fourier series. This difference is highlighted here to avoid confusion later when these two periods are needed together in Discrete Fourier transform. The Fourier transform of the Dirac comb will be necessary in Sampling theorem, so let’s derive it. By its definition, it is periodic, with a period of \(P\), so the Fourier coefficients of its Fourier Series (Fourier series) can be calculated within one period:

$${\rm III}_P=\displaystyle\sum_{n=-\infty}^{\infty}c_ne^{i{2{\pi}n\over P}l}$$

We can now find the \(c_n\) from Fourier series:

$$c_n={1\over P}\int_{-P/2}^{P/2}\delta(l)e^{-i{2{\pi}n\over P}l}dl ={1\over P}\quad\quad \rightarrow \quad\quad {\rm III}_P={1\over P}\displaystyle\sum_{n=-\infty}^{\infty}e^{i{2{\pi}n\over P}l} $$

So we can write the Fourier transform of the Dirac comb as:

$${\cal F}[{\rm III}_P]=\int_{-\infty}^{\infty}{\rm III}_Pe^{-i{\omega}l}dl ={1\over P}\displaystyle\sum_{n=-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-i(\omega-{2{\pi}n\over P})l}dl={1\over P}\displaystyle\sum_{n=-\infty}^{\infty}\delta\left(\omega-{2{\pi}n\over P}\right) $$

In the last step, we used the fact that the complex exponential is a periodic function, that \(n\) is an integer and that, as we defined in Fourier transform, \(\omega{\equiv}m\omega_0\), where \(m\) was an integer. The integral will be zero for any \(\omega\) that is not equal to \(2{\pi}n/P\); a more complete explanation can be seen in Fourier series. Therefore, while in the spatial domain the impulses had a spacing of \(P\) (meters for example), in the frequency space the spacing between the different impulses is \(2\pi/P\) cycles per meter.


Next: , Previous: , Up: Frequency domain and Fourier operations   [Contents][Index]

6.3.2.6 Convolution theorem

The convolution (shown with the \(\ast\) operator) of the two functions \(f(l)\) and \(h(l)\) is defined as:

$$c(l)\equiv[f{\ast}h](l)=\int_{-\infty}^{\infty}f(\tau)h(l-\tau)d\tau $$

See Convolution process for a more detailed physical (pixel based) interpretation of this definition. The Fourier transform of convolution (\(C(\omega)\)) can be written as:

$$ C(\omega)=\int_{-\infty}^{\infty}[f{\ast}h](l)e^{-i{\omega}l}dl= \int_{-\infty}^{\infty}f(\tau)\left[\int_{-\infty}^{\infty}h(l-\tau)e^{-i{\omega}l}dl\right]d\tau $$

To solve the inner integral, let’s define \(s{\equiv}l-\tau\), so that \(ds=dl\) and \(l=s+\tau\); the inner integral then becomes:

$$\int_{-\infty}^{\infty}h(l-\tau)e^{-i{\omega}l}dl= \int_{-\infty}^{\infty}h(s)e^{-i{\omega}(s+\tau)}ds=e^{-i{\omega}\tau}\int_{-\infty}^{\infty}h(s)e^{-i{\omega}s}ds=H(\omega)e^{-i{\omega}\tau} $$

where \(H(\omega)\) is the Fourier transform of \(h(l)\). Substituting this result for the inner integral above, we get:

$$C(\omega)=H(\omega)\int_{-\infty}^{\infty}f(\tau)e^{-i{\omega}\tau}d\tau=H(\omega)F(\omega)=F(\omega)H(\omega) $$

where \(F(\omega)\) is the Fourier transform of \(f(l)\). So by multiplying the Fourier transforms of two functions, we get the Fourier transform of their convolution. The convolution theorem also proves the converse relation for convolution in the frequency space. Let’s define:

$$D(\omega){\equiv}F(\omega){\ast}H(\omega)$$

Applying the inverse Fourier Transform or synthesis equation (Fourier transform) to both sides and following the same steps above, we get:

$$d(l)=f(l)h(l)$$

where \(d(l)\) is the inverse Fourier transform of \(D(\omega)\). We can therefore re-write the two equations above formally as the convolution theorem:

$$ {\cal F}[f{\ast}h]={\cal F}[f]{\cal F}[h] $$

$$ {\cal F}[fh]={\cal F}[f]\ast{\cal F}[h] $$
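
The theorem is easy to verify numerically. Below is a short Python sketch (assumed random arrays, not Gnuastro code) that compares a directly computed convolution with one done through NumPy’s FFT; note that the discrete transform implies periodic (circular) convolution, a point we will return to in Edges in the frequency domain:

# Verify F[f*h] = F[f]F[h] on a periodic (circular) convolution.
import numpy as np

N = 8
rng = np.random.default_rng(1)
f, h = rng.random(N), rng.random(N)

# Circular convolution, directly from the definition:
c = np.zeros(N)
for k in range(N):
    for n in range(N):
        c[k] += f[n] * h[(k - n) % N]

# The same through the frequency domain:
c_freq = np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)).real
print(np.allclose(c, c_freq))             # True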

Besides its usefulness in blurring an image by convolving it with a given kernel, the convolution theorem also enables us to do another very useful operation in data analysis: to match the blur (or PSF) between two images taken with different telescopes/cameras or under different atmospheric conditions. This process is also known as de-convolution. Let’s take \(f(l)\) as the image with a narrower PSF (less blurry) and \(c(l)\) as the image with a wider PSF which appears more blurred. Also let’s take \(h(l)\) to represent the kernel that should be convolved with the sharper image to create the more blurry image. Above, we proved the relation between these three images through the convolution theorem. But there, we assumed that \(f(l)\) and \(h(l)\) are known (given) and the convolved image is desired.

In de-convolution, we have \(f(l)\) (the sharper image) and \([f{\ast}h](l)\) (the more blurry image) and we want to find the kernel \(h(l)\). The solution is a direct result of the convolution theorem:

$$ {\cal F}[h]={{\cal F}[f{\ast}h]\over {\cal F}[f]} \quad\quad {\rm or} \quad\quad h(l)={\cal F}^{-1}\left[{{\cal F}[f{\ast}h]\over {\cal F}[f]}\right] $$

While this works really nicely, it has two problems:

  1. If \({\cal F}[f]\) has any zero (or extremely small) values, the division will be undefined (or numerically unstable).
  2. Noise adds power to the highest frequencies of both images, so the division greatly amplifies the noise in the high frequencies of the result (see the description of --makekernel in Invoking Convolve).

A standard solution to both these problems is the Wiener de-convolution algorithm70.
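
The naive division itself can be sketched in a few lines of Python (a toy, noise-free setup with assumed arrays; this is not how Convolve implements --makekernel, and with real noisy data the caveats above apply):

# Recover a kernel through F[h] = F[f*h]/F[f] (circular convolution, no noise).
import numpy as np

rng = np.random.default_rng(2)
sharp = rng.random((64, 64))              # hypothetical sharper image

# Build a blurry image with a known, centered Gaussian kernel:
x = np.arange(64); x[x > 32] -= 64        # wrapped, zero-centered coordinates
kernel = np.exp(-(x[:, None]**2 + x[None, :]**2) / (2*3.0**2))
kernel /= kernel.sum()
blurry = np.fft.ifft2(np.fft.fft2(sharp) * np.fft.fft2(kernel)).real

# De-convolution by direct division of the transforms:
h = np.fft.ifft2(np.fft.fft2(blurry) / np.fft.fft2(sharp)).real
print(np.allclose(h, kernel))             # True in this noise-free toy case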


Next: , Previous: , Up: Frequency domain and Fourier operations   [Contents][Index]

6.3.2.7 Sampling theorem

Our mathematical functions are continuous; however, our data collecting and measuring tools are discrete. Here we want to give a mathematical formulation for digitizing the continuous mathematical functions, so that later we can retrieve the continuous function from the digitized recorded input. Assuming that we have a continuous function \(f(l)\), we can define \(f_s(l)\) as the ‘sampled’ \(f(l)\) through the Dirac comb (see Dirac delta and comb):

$$f_s(l)=f(l){\rm III}_P=\displaystyle\sum_{n=-\infty}^{\infty}f(l)\delta(l-nP) $$

The discrete data-element \(f_k\) (for example, a pixel in an image), where \(k\) is an integer, can thus be represented as:

$$f_k=\int_{-\infty}^{\infty}f(l)\delta(l-kP)dl=f(kP)$$

Note that in practice, our discrete data points are not found in this fashion. Each detector pixel (in an image for example) has an area and averages the signal it receives over that area; it is not the mathematical point that the Dirac \(\delta\) function defines. However, as long as the variation in the signal over one detector pixel is not significant, this can be a good approximation. Having put this issue to the side, we can now try to find the relation between the Fourier transforms of the un-sampled \(f(l)\) and the sampled \(f_s(l)\). For clearer notation, let’s define:

$$F_s(\omega)\equiv{\cal F}[f_s]$$

$$D(\omega)\equiv{\cal F}[{\rm III}_P]$$

Then using the Convolution theorem (see Convolution theorem), \(F_s(\omega)\) can be written as:

$$F_s(\omega)={\cal F}[f(l){\rm III}_P]=F(\omega){\ast}D(\omega)$$

Finally, from the definition of convolution and the Fourier transform of the Dirac comb (see Dirac delta and comb), we get:

$$\eqalign{ F_s(\omega) &= \int_{-\infty}^{\infty}F(\mu)D(\omega-\mu)d\mu \cr &= {1\over P}\displaystyle\sum_{n=-\infty}^{\infty}\int_{-\infty}^{\infty}F(\mu)\delta\left(\omega-\mu-{2{\pi}n\over P}\right)d\mu \cr &= {1\over P}\displaystyle\sum_{n=-\infty}^{\infty}F\left( \omega-{2{\pi}n\over P}\right).\cr } $$

\(F(\omega)\) was a simple function, see Figure 6.3(left). However, from the equation above we see that \(F_s(\omega)\) is the superposition of infinite copies of \(F(\omega)\) that have been shifted, see Figure 6.3(right). From the equation, it is clear that the shift in each copy is \(2\pi/P\).

gnuastro-figures/samplingfreq

Figure 6.3: Sampling causes infinite repetition in the frequency domain. FT is an abbreviation for ‘Fourier transform’. \(\omega_m\) represents the maximum frequency present in the input. \(F(\omega)\) is only symmetric on both sides of 0 when the input is real (not complex). In general \(F(\omega)\) is complex and thus cannot be simply plotted like this. Here we have assumed a real Gaussian \(f(t)\) which has produced a Gaussian \(F(\omega)\).

The input \(f(l)\) can have any distribution of frequencies in it. In the example of Figure 6.3(left), the input consisted of a range of frequencies equal to \(\Delta\omega=2\omega_m\). Fortunately, as Figure 6.3(right) shows, the assumed pixel size (\(P\)) we used to sample this hypothetical function was such that \(2\pi/P>\Delta\omega\). The consequence is that each copy of \(F(\omega)\) has become completely separate from the surrounding copies. Such a digitized (sampled) data set is thus called over-sampled. When \(2\pi/P=\Delta\omega\), \(P\) is just small enough to finely separate even the largest frequencies in the input signal, and the data set is known as critically-sampled. Finally, if \(2\pi/P<\Delta\omega\), we are dealing with an under-sampled data set. In an under-sampled data set, the separate copies of \(F(\omega)\) are going to overlap and this will deprive us of recovering the high constituent frequencies of \(f(l)\). The effects of under-sampling in an image with high rates of change (for example a brick wall imaged from a distance) can clearly be seen visually and are known as aliasing.

When the input \(f(l)\) is composed of a finite range of frequencies, \(f(l)\) is known as a band-limited function. The example in Figure 6.3(left) was a nice demonstration of such a case: for all \(\omega<-\omega_m\) or \(\omega>\omega_m\), we have \(F(\omega)=0\). Therefore, when the input function is band-limited and our detector’s pixels are placed such that we have critically (or over-) sampled it, then we can exactly reproduce the continuous \(f(l)\) from the discrete or digitized samples. To do that, we just have to isolate one copy of \(F(\omega)\) from the infinite copies and take its inverse Fourier transform.

This ability to exactly reproduce the continuous input from the sampled or digitized data leads us to the sampling theorem which connects the inherent property of the continuous signal (its maximum frequency) to that of the detector (the spacing between its pixels). The sampling theorem states that the full (continuous) signal can be recovered when the pixel size (\(P\)) and the maximum constituent frequency in the signal (\(\omega_m\)) have the following relation71:

$${2\pi\over P}>2\omega_m$$

This relation was first formulated by Harry Nyquist (1889 – 1976 A.D.) in 1928 and formally proved in 1949 by Claude E. Shannon (1916 – 2001 A.D.) in what is now known as the Nyquist-Shannon sampling theorem. In signal processing, the signal is produced (synthesized) by a transmitter and is received and de-coded (analyzed) by a receiver. Therefore producing a band-limited signal is necessary.
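
The aliasing that occurs when this relation is violated can be demonstrated in a few lines of Python (a sketch with assumed values, independent of Gnuastro): sample a sinusoid with pixel size \(P\), once below and once above the limit \(\omega_m=\pi/P\), and compare the measured frequency with the true one:

# Sampling a sinusoid below and above the Nyquist limit (pi/P here).
import numpy as np

P, X = 1.0, 16                            # assumed pixel size and pixel count
l = np.arange(X) * P
for w in (0.5*np.pi/P, 1.5*np.pi/P):      # below and above the limit
    f = np.cos(w * l)
    u = np.argmax(np.abs(np.fft.rfft(f))) # strongest measured freq-pixel
    print(f"true w={w:.3f}  measured w={2*np.pi*u/(X*P):.3f}")
# Both print 'measured w=1.571': the second signal is aliased onto the first.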

In astronomy, we do not produce the shapes of our targets; we are only observers. Galaxies can have any shape and size, so ideally our signal is not band-limited. However, since we are always confined to observing through an aperture, the aperture will cause a point source (for which \(\omega_m=\infty\)) to be spread over several pixels. This spread is quantitatively known as the point spread function or PSF. This spread does blur the image, which is undesirable; however, for this analysis it produces the positive outcome that there will be a finite \(\omega_m\). We should caution, though, that any detector has noise, which adds lots of very high frequency (ideally infinite) changes between the pixels. However, the coefficients of those noise frequencies are usually exceedingly small.


Next: , Previous: , Up: Frequency domain and Fourier operations   [Contents][Index]

6.3.2.8 Discrete Fourier transform

As we have stated several times so far, the input image is a digitized, pixelated or discrete array of values (\(f_s(l)\), see Sampling theorem). The input is not a continuous function. Also, all our numerical calculations can only be done on a sampled, or discrete, Fourier transform. Note that \(F_s(\omega)\) is not discrete; it is continuous. One way would be to find the analytic \(F_s(\omega)\), then sample it at any desired “freq-pixel”72 spacing. However, this process would involve two steps of operations, and computers in particular are not very good at the first (analytic) step. So here, we will derive a method to directly find the ‘freq-pixel’ated \(F_s(\omega)\) from the pixelated \(f_s(l)\). Let’s start with the definition of the Fourier transform (see Fourier transform):

$$F_s(\omega)=\int_{-\infty}^{\infty}f_s(l)e^{-i{\omega}l}dl $$

From the definition of \(f_s(l)\) (using \(x\) instead of \(n\)) we get:

$$\eqalign{ F_s(\omega) &= \displaystyle\sum_{x=-\infty}^{\infty} \int_{-\infty}^{\infty}f(l)\delta(l-xP)e^{-i{\omega}l}dl \cr &= \displaystyle\sum_{x=-\infty}^{\infty} f_xe^{-i{\omega}xP} } $$

where \(f_x\) is the value of \(f(l)\) on the point \(x\), or the value of the \(x\)th pixel. As shown in Sampling theorem, this function is infinitely periodic with a period of \(2\pi/P\). So all we need are the values within one period: \(0<\omega<2\pi/P\), see Figure 6.3. We want \(X\) samples within this interval, so the frequency difference between each frequency sample or freq-pixel is \(2\pi/XP\). Hence we will evaluate the equation above on the points at:

$$\omega={2{\pi}u\over XP} \quad\quad u = 0, 1, 2, ..., X-1$$

Therefore the value of the freq-pixel \(u\) in the frequency domain is:

$$F_u=\displaystyle\sum_{x=0}^{X-1} f_xe^{-i{2{\pi}ux\over X}} $$

Therefore, we see that for each freq-pixel in the frequency domain, we are going to need all the pixels in the spatial domain73. If the input (spatial) pixel row is also \(X\) pixels wide, then we can exactly recover the \(x\)th pixel with the following summation:

$$f_x={1\over X}\displaystyle\sum_{u=0}^{X-1} F_ue^{i{2{\pi}ux\over X}} $$

When the input pixel row (we are still only working on 1D data) has \(X\) pixels, then it is \(L=XP\) spatial units wide. \(L\), or the length of the input data was defined in Fourier series and \(P\) or the space between the pixels in the input was defined in Dirac delta and comb. As we saw in Sampling theorem, the input (spatial) pixel spacing (\(P\)) specifies the range of frequencies that can be studied and in Fourier series we saw that the length of the (spatial) input, (\(L\)) determines the resolution (or size of the freq-pixels) in our discrete Fourier transformed image. Both result from the fact that the frequency domain is the inverse of the spatial domain.
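
These two sums are short enough to implement directly. The Python sketch below (a didactic illustration with assumed values; serious implementations use the much faster Fast Fourier Transform algorithm instead of the raw sums) checks them against NumPy’s FFT:

# The analysis and synthesis sums above, checked against NumPy's FFT.
import numpy as np

X = 16
f = np.random.default_rng(3).random(X)    # one hypothetical pixel row
x = np.arange(X)

F = np.array([np.sum(f * np.exp(-1j*2*np.pi*u*x/X)) for u in range(X)])
print(np.allclose(F, np.fft.fft(f)))      # True: same as the library FFT

f_back = np.array([np.sum(F * np.exp(1j*2*np.pi*np.arange(X)*xx/X))
                   for xx in range(X)]) / X
print(np.allclose(f_back, f))             # True: pixels fully recovered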


Next: , Previous: , Up: Frequency domain and Fourier operations   [Contents][Index]

6.3.2.9 Fourier operations in two dimensions

Once all the relations in the previous sections have been clearly understood in one dimension, it is very easy to generalize them to two or even more dimensions since each dimension is by definition independent. Previously we defined \(l\) as the continuous variable in 1D and the inverse of the period in its direction to be \(\omega\). Let’s show the second spatial direction with \(m\) and the inverse of the period in the second dimension with \(\nu\). The Fourier transform in 2D (see Fourier transform) can be written as:

$$F(\omega, \nu)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(l, m)e^{-i({\omega}l+{\nu}m)}dl\,dm$$

$$f(l, m)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(\omega, \nu)e^{i({\omega}l+{\nu}m)}d\omega\,d\nu$$

The 2D Dirac \(\delta(l,m)\) is non-zero only when \(l=m=0\). The 2D Dirac comb (or Dirac brush! See Dirac delta and comb) can be written in units of the 2D Dirac \(\delta\). For most image detectors, the sides of a pixel are equal in both dimensions, so \(P\) remains unchanged; if a specific device with non-square pixels is used, then a different value should be used for each dimension.

$${\rm III}_P(l, m)\equiv\displaystyle\sum_{j=-\infty}^{\infty} \displaystyle\sum_{k=-\infty}^{\infty} \delta(l-jP, m-kP) $$

The two dimensional sampling theorem (see Sampling theorem) is thus very easily derived as before, since the frequencies in each dimension are independent. Let’s take \(\nu_m\) as the maximum frequency along the second dimension. Therefore the two dimensional sampling theorem says that a 2D band-limited function can be recovered when the following conditions hold74:

$${2\pi\over P} > 2\omega_m \quad\quad\quad {\rm and} \quad\quad\quad {2\pi\over P} > 2\nu_m$$

Finally, let’s represent the pixel counter on the second dimension in the spatial and frequency domains with \(y\) and \(v\) respectively. Also let’s assume that the input image has \(Y\) pixels on the second dimension. Then the two dimensional discrete Fourier transform and its inverse (see Discrete Fourier transform) can be written as:

$$F_{u,v}=\displaystyle\sum_{x=0}^{X-1}\displaystyle\sum_{y=0}^{Y-1} f_{x,y}e^{-i2\pi({ux\over X}+{vy\over Y})} $$

$$f_{x,y}={1\over XY}\displaystyle\sum_{u=0}^{X-1}\displaystyle\sum_{v=0}^{Y-1} F_{u,v}e^{i2\pi({ux\over X}+{vy\over Y})} $$


Previous: , Up: Frequency domain and Fourier operations   [Contents][Index]

6.3.2.10 Edges in the frequency domain

With a good grasp of the frequency domain, we can revisit the problem of convolution on the image edges, see Edges in the spatial domain. When we apply the convolution theorem (see Convolution theorem) to convolve an image, we first take the discrete Fourier transforms (DFT, Discrete Fourier transform) of both the input image and the kernel, then we multiply them with each other and then take the inverse DFT to construct the convolved image. Of course, in order to multiply them with each other in the frequency domain, the two images have to be the same size, so let’s assume that we pad the kernel (it is usually smaller than the input image) with zero valued pixels in both dimensions so it becomes the same size as the input image before the DFT.

Having multiplied the two DFTs, we now apply the inverse DFT, which is where the problem is usually created. If the DFT of the kernel only had values of 1 (an unrealistic condition!) then there would be no problem and the inverse DFT of the multiplication would be identical with the input. However, in real situations the kernel’s DFT has a maximum of 1 (because the sum of the kernel has to be one, see Convolution process) and decreases something like the hypothetical profile of Figure 6.3. So when multiplied with the input image’s DFT, the coefficient or magnitude (see Circles and the complex plane) of the smallest frequency (which is the sum of the input image’s pixels) remains unchanged, while the magnitudes of the higher frequencies are significantly reduced.

As we saw in Sampling theorem, the Fourier transform of a discrete input will be infinitely repeated. In the final inverse DFT step, the input is in the frequency domain (the multiplied DFT of the input image and the kernel DFT). So the result (our output convolved image) will be infinitely repeated in the spatial domain. In order to accurately reconstruct the input image, we need all the frequencies with the correct magnitudes. However, when the magnitudes of the higher frequencies are decreased, the longer periods (lower frequencies) will dominate in the reconstructed pixel values. Therefore, when constructing a pixel on the edge of the image, the newly empowered longer periods will look beyond the input image edges and will find the repeated input image there. So if you convolve an image in this fashion using the convolution theorem, when a bright object exists on one edge of the image, its blurred wings will be present on the other side of the convolved image. This is often termed circular or cyclic convolution.

So, as long as we are dealing with convolution in the frequency domain, there is nothing we can do about the image edges. The least we can do is to eliminate the ghosts from the other side of the image. So, we add zero valued pixels to both the input image and the kernel in both dimensions, so the image that will be convolved has a size equal to the sum of both images in each dimension. Of course, the effect of this zero-padding is that the sides of the output convolved image will become dark. To put it another way, the edges are going to drain the flux from nearby objects. But at least it is consistent across all the edges of the image and is predictable. In Convolve, you can see the padded images when inspecting the frequency domain convolution steps with the --checkfreqsteps option.
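
A 1D toy version of both the ghost and the zero-padding remedy can be sketched in Python (assumed arrays, not Convolve’s actual implementation):

# Circular-convolution ghost on the far edge, and the zero-padding fix.
import numpy as np

img = np.zeros(16); img[0] = 1.0          # bright "object" on the left edge
ker = np.array([0.25, 0.5, 0.25])

# Same-size frequency-domain convolution: the kernel's center sits on
# pixel 0 and its left wing wraps around to the last pixel.
kwrap = np.zeros(16)
kwrap[0], kwrap[1], kwrap[-1] = 0.5, 0.25, 0.25
out = np.fft.ifft(np.fft.fft(img) * np.fft.fft(kwrap)).real
print(out[-1])                            # 0.25: the "ghost" on the far edge

# Zero-pad both before the DFTs (to the linear-convolution size): the
# result is a linear convolution, shifted by the kernel half-width.
n = img.size + ker.size - 1
out2 = np.fft.ifft(np.fft.fft(img, n) * np.fft.fft(ker, n)).real
print(out2[15])                           # ~0 (round-off): nothing wrapped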


Next: , Previous: , Up: Convolve   [Contents][Index]

6.3.3 Spatial vs. Frequency domain

With the discussions above it might not be clear when to choose the spatial domain and when to choose the frequency domain. Here we will try to list the benefits of each.

The spatial domain,

The frequency domain,

As a general rule of thumb, when working on an image of modeled profiles use the frequency domain, and when working on an image of real (observed) objects use the spatial domain (corrected for the edges). The reason is that if you apply a frequency domain convolution to a real image, you are going to lose information on the edges, and generally you don’t want large kernels. But when you have made the profiles in the image yourself, you can just make a larger input image and crop the central parts to completely remove the edge effect, see If convolving afterwards. Also, due to oversampling, both the kernels and the images can become very large, so the speed boost of frequency domain convolution will significantly improve the processing time, see Oversampling.


Next: , Previous: , Up: Convolve   [Contents][Index]

6.3.4 Convolution kernel

All the programs that need convolution will need to be given a convolution kernel file and extension. In most cases (other than Convolve, see Convolve) the kernel file name is optional. However, the extension is necessary and must be specified either on the command-line or in at least one of the configuration files (see Configuration files). Within Gnuastro, there are two ways to create a kernel image:

The two options to specify a kernel file name and its extension are shown below. These are common between all the programs that will do convolution.

-k STR
--kernel=STR

The convolution kernel file name. The BITPIX (data type) value of this file can be any standard type and it does not necessarily have to be normalized. Several operations will be done on the kernel image prior to the program’s processing:

-U STR
--khdu=STR

The convolution kernel HDU. Although the kernel file name is optional, before running any of the programs, they need to have a value for --khdu even if the default kernel is to be used. So be sure to keep its value in at least one of the configuration files (see Configuration files). By default, the system configuration file has a value.


Previous: , Up: Convolve   [Contents][Index]

6.3.5 Invoking Convolve

Convolve an input image with a known kernel or make the kernel necessary to match two PSFs. The general template for Convolve is:

$ astconvolve [OPTION...] ASTRdata

One line examples:

## Convolve mockimg.fits with psf.fits:
$ astconvolve --kernel=psf.fits mockimg.fits

## Convolve in the spatial domain:
$ astconvolve observedimg.fits --kernel=psf.fits --domain=spatial

## Find the kernel to match sharper and blurry PSF images:
$ astconvolve --kernel=sharperimage.fits --makekernel=10           \
              blurryimage.fits

The only argument accepted by Convolve is an input image file. Some of the options are the same between Convolve and some other Gnuastro programs. Therefore, to avoid repetition, they will not be repeated here. For the full list of options shared by all Gnuastro programs, please see Common options. In particular, in the spatial domain, Convolve uses Gnuastro’s tessellation, see Tessellation and the common options related to that in Processing options.

Here we will only explain the options particular to Convolve. Run Convolve with --help in order to see the full list of options Convolve accepts, irrespective of where they are explained in this book.

--nokernelflip

Do not flip the kernel after reading it for spatial domain convolution. This can be useful if the flipping has already been applied to the kernel.

--nokernelnorm

Do not normalize the kernel after reading it. By default the kernel is normalized so that the sum of its pixels is unity.

-d STR
--domain=STR

The domain to use for the convolution. The acceptable values are ‘spatial’ and ‘frequency’, corresponding to the respective domain.

For large images, the frequency domain process will be more efficient than convolving in the spatial domain. However, the edges of the image will lose some flux (see Edges in the frequency domain) and the image must not contain any blank pixels, see Spatial vs. Frequency domain.

--checkfreqsteps

With this option, a file with the initial name of the output file, suffixed with _freqsteps.fits, will be created; all the steps done to arrive at the final convolved image are saved as extensions in this file. The extensions in order are:

  1. The padded input image. In frequency domain convolution the two images (input and convolved) have to be the same size and both should be padded by zeros.
  2. The padded kernel, similar to the above.
  3. The Fourier spectrum of the forward Fourier transform of the input image. Note that the Fourier transform is a complex operation (and not viewable in one image!), so we either have to show the ‘Fourier spectrum’ or the ‘Phase angle’. For the complex number \(a+ib\), the Fourier spectrum is defined as \(\sqrt{a^2+b^2}\) while the phase angle is defined as \(\arctan(b/a)\).
  4. The Fourier spectrum of the forward Fourier transform of the kernel image.
  5. The Fourier spectrum of the multiplied (through complex arithmetic) transformed images.
  6. The inverse Fourier transform of the multiplied image. If you open it, you will see that the convolved image is now in the center, not on one side of the image as it started with (in the padded image of the first extension). If you are working on a mock image which originally had pixels of precisely 0.0, you will notice that in those parts that your convolved profile(s) did not cover, the values are now \(\sim10^{-18}\); this is due to floating-point round-off errors. Therefore in the final step (when cropping the central parts of the image), we also remove any pixel with a value less than \(10^{-17}\).
--noedgecorrection

Do not correct the edge effect in spatial domain convolution. For a full discussion, please see Edges in the spatial domain.

-m INT
--makekernel=INT

(=INT) If this option is called, Convolve will do de-convolution (see Convolution theorem). The image specified by the --kernel option is assumed to be the sharper (less blurry) image and the input image is assumed to be the more blurry image. The value given to this option will be used as the maximum radius of the kernel. Any pixel in the final kernel that is larger than this distance from the center will be set to zero. The two images must have the same size.

Noise contains high frequencies, which can make the higher frequencies of the final result less reliable. So all the frequencies in the sharper input image which have a spectrum smaller than the value given to the --minsharpspec option are set to zero and not divided. This will cause the wings of the final kernel to be flatter than they would ideally be, which will make the convolved image unreliable if the value is set too high. Some notes to take into account for a good result:

-c
--minsharpspec

(=FLT) The minimum frequency spectrum (or coefficient, or pixel value in the frequency domain image) to use in deconvolution, see the explanations under the --makekernel option for more information.


Previous: , Up: Data manipulation   [Contents][Index]

6.4 Warp

Image warping is the process of mapping the pixels of one image onto a new pixel grid. This process is sometimes known as transformation; however, following the discussion of Heckbert 198975, we will not be using that term because it can be confused with transformations of only the pixel values or flux. Here we specifically mean the pixel grid transformation, which is better conveyed with ‘warp’.

Image warping is a very important step in astronomy, both in observational data analysis and in simulating modeled images. In modeling, warping an image is necessary when we want to apply grid transformations to the initial models, for example in simulating gravitational lensing (radial warpings are not yet included in Warp). Observational reasons for warping an image are listed below:


Next: , Previous: , Up: Warp   [Contents][Index]

6.4.1 Warping basics

Let’s take \(\left[\matrix{u&v}\right]\) as the coordinates of a point in the input image and \(\left[\matrix{x&y}\right]\) as the coordinates of that same point in the output image77. The simplest form of coordinate transformation (or warping) is the scaling of the coordinates. Let’s assume we want to scale the first axis by \(M\) and the second by \(N\); the output coordinates of that point can then be calculated by

$$\left[\matrix{x\cr y}\right]= \left[\matrix{Mu\cr Nv}\right]= \left[\matrix{M&0\cr0&N}\right]\left[\matrix{u\cr v}\right]$$

Note that these are matrix multiplications. We thus see that we can represent any such grid warping as a matrix. Another thing we can do with this \(2\times2\) matrix is to rotate the output coordinate around the common center of both coordinates. If the output is rotated anticlockwise by \(\theta\) degrees from the positive (to the right) horizontal axis, then the warping matrix should become:

$$\left[\matrix{x\cr y}\right]= \left[\matrix{u\cos\theta-v\sin\theta\cr u\sin\theta+v\cos\theta}\right]= \left[\matrix{\cos\theta&-\sin\theta\cr \sin\theta&\cos\theta}\right] \left[\matrix{u\cr v}\right] $$

We can also flip the coordinates around the first axis, the second axis and the coordinate center with the following three matrices respectively:

$$\left[\matrix{1&0\cr0&-1}\right]\quad\quad \left[\matrix{-1&0\cr0&1}\right]\quad\quad \left[\matrix{-1&0\cr0&-1}\right]$$

The final thing we can do with this definition of a \(2\times2\) warping matrix is shear. If we want the output to be sheared along the first axis with \(A\) and along the second with \(B\), then we can use the matrix:

$$\left[\matrix{1&A\cr B&1}\right]$$

To have one matrix representing any combination of these steps, you use matrix multiplication, see Merging multiple warpings. So any combination of these transformations can be displayed with one \(2\times2\) matrix:

$$\left[\matrix{a&b\cr c&d}\right]$$

The transformations above can cover a lot of the needs of most coordinate transformations. However they are limited to mapping the point \([\matrix{0&0}]\) to \([\matrix{0&0}]\). Therefore they are useless if you want one coordinate to be shifted compared to the other one. They are also space invariant, meaning that all the coordinates in the image will receive the same transformation. In other words, all the pixels in the output image will have the same area if placed over the input image. So transformations which require varying output pixel sizes like projections cannot be applied through this \(2\times2\) matrix either (for example for the tilted ACS and WFC3 camera detectors on board the Hubble space telescope).

To add these further capabilities, namely translation and projection, we use the homogeneous coordinates. They were defined about 200 years ago by August Ferdinand Möbius (1790 – 1868). For simplicity, we will only discuss points on a 2D plane and avoid the complexities of higher dimensions. We cannot provide a deep mathematical introduction here, interested readers can get a more detailed explanation from Wikipedia78 and the references therein.

By adding an extra coordinate to a point we can add the flexibility we need. The point \([\matrix{x&y}]\) can be represented as \([\matrix{xZ&yZ&Z}]\) in homogeneous coordinates. Therefore multiplying all the coordinates of a point in the homogeneous coordinates with a constant will give the same point. Put another way, the point \([\matrix{x&y&Z}]\) corresponds to the point \([\matrix{x/Z&y/Z}]\) on the constant \(Z\) plane. Setting \(Z=1\), we get the input image plane, so \([\matrix{u&v&1}]\) corresponds to \([\matrix{u&v}]\). With this definition, the transformations above can be generally written as:

$$\left[\matrix{x\cr y\cr 1}\right]= \left[\matrix{a&b&0\cr c&d&0\cr 0&0&1}\right] \left[\matrix{u\cr v\cr 1}\right]$$

We thus acquired 4 extra degrees of freedom. By giving non-zero values to the zero valued elements of the last column we can have translation (try the matrix multiplication!). In general, any coordinate transformation that is represented by the matrix below is known as an affine transformation79:

$$\left[\matrix{a&b&c\cr d&e&f\cr 0&0&1}\right]$$

We can now consider translation, but the affine transform is still spatially invariant. Giving non-zero values to the other two elements in the matrix above gives us the projective transformation or Homography80 which is the most general type of transformation with the \(3\times3\) matrix:

$$\left[\matrix{x'\cr y'\cr w}\right]= \left[\matrix{a&b&c\cr d&e&f\cr g&h&1}\right] \left[\matrix{u\cr v\cr 1}\right]$$

So the output coordinates can be calculated from:

$$x={x' \over w}={au+bv+c \over gu+hv+1}\quad\quad\quad\quad y={y' \over w}={du+ev+f \over gu+hv+1}$$

Thus with Homography we can change the sizes of the output pixels on the input plane, giving a ‘perspective’-like visual impression. This can be quantitatively seen in the two equations above. When \(g=h=0\), the denominator is independent of \(u\) or \(v\) and thus we have spatial invariance. Homography preserves lines at all orientations. A very useful fact about Homography is that its inverse is also a Homography. These two properties play a very important role in the implementation of this transformation. A short but instructive and illustrated review of affine, projective and also bi-linear mappings is provided in Heckbert 198981.
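
As a small illustration of these equations (a Python sketch with hypothetical matrix values), the point \([\matrix{u&v}]\) is lifted to homogeneous coordinates, multiplied by the \(3\times3\) matrix and divided by \(w\) to come back to the \(Z=1\) plane:

# Map one input coordinate through a homography (hypothetical values).
import numpy as np

H = np.array([[1.2,   0.1,  5.0],         # a, b, c
              [0.0,   1.2, -3.0],         # d, e, f
              [0.001, 0.0,  1.0]])        # g, h, 1

u, v = 10.0, 20.0
xp, yp, w = H @ np.array([u, v, 1.0])     # [x', y', w]
print(xp / w, yp / w)                     # the output (x, y)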


Next: , Previous: , Up: Warp   [Contents][Index]

6.4.2 Merging multiple warpings

In Warping basics we saw how a basic warp/transformation can be represented with a matrix. To make more complex warpings (for example to define a translation, rotation and scale as one warp) the individual matrices have to be multiplied through matrix multiplication. However matrix multiplication is not commutative, so the order of the set of matrices you use for the multiplication is going to be very important.

Since the matrices act on column vectors of coordinates, the right-most matrix in a product is applied to the input coordinates first; the matrix to its left then operates on those warped coordinates, and so on. As an example for merging a few transforms into one matrix, the multiplication below represents the rotation of an image about a point \([\matrix{U&V}]\) anticlockwise from the horizontal axis by an angle of \(\theta\). To do this, first we translate the coordinates so that \([\matrix{U&V}]\) falls on the origin (the right-most matrix). Then we rotate the image and finally translate it back to where it was initially (the left-most matrix). These three operations can be merged into one by calculating the matrix multiplication below:

$$\left[\matrix{1&0&U\cr0&1&V\cr{}0&0&1}\right] \left[\matrix{\cos\theta&-\sin\theta&0\cr \sin\theta&\cos\theta&0\cr 0&0&1}\right] \left[\matrix{1&0&-U\cr0&1&-V\cr{}0&0&1}\right]$$
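
Written out with NumPy (hypothetical values for \(U\), \(V\) and \(\theta\)), the same three warps merge into one matrix as below; note how the rotation center maps back to itself:

# Rotation about the point (U,V): translate, rotate, translate back.
import numpy as np

U, V, t = 50.0, 50.0, np.radians(37.92)
to_origin = np.array([[1, 0, -U], [0, 1, -V], [0, 0, 1]])
rotate    = np.array([[np.cos(t), -np.sin(t), 0],
                      [np.sin(t),  np.cos(t), 0],
                      [0,          0,         1]])
back      = np.array([[1, 0, U], [0, 1, V], [0, 0, 1]])

M = back @ rotate @ to_origin             # right-most matrix acts first
print(M @ np.array([U, V, 1.0]))          # [50. 50.  1.]: center is fixed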


Next: , Previous: , Up: Warp   [Contents][Index]

6.4.3 Resampling

A digital image is composed of discrete ‘picture elements’ or ‘pixels’. When a real image is created from a camera or detector, each pixel’s area is used to store the number of photo-electrons that were created when incident photons collided with that pixel’s surface area. This process is called the ‘sampling’ of a continuous or analog data into digital data. When we change the pixel grid of an image, or warp it as we defined in Warping basics, we have to ‘guess’ the flux value of each pixel on the new grid based on the old grid, or re-sample it. Because of the ‘guessing’, any form of warping on the data is going to degrade the image and mix the original pixel values with each other. So if an analysis can be done on an un-warped image, it is best to leave the image untouched and pursue the analysis. However, as discussed in Warp, this is not possible most of the time, so we have to accept the problem and re-sample the image.

In most applications of image processing, it is sufficient to consider each pixel to be a point and not an area. This assumption can significantly speed up the processing of an image and also simplify the code. It is a fine assumption when the signal to noise ratio of the objects is very large. The question will then be one of interpolation, because you have multiple points distributed over the output image and you want to find the values at the pixel centers. To increase the accuracy, you might also sample more than one point from within a pixel, giving you more points for a more accurate interpolation in the output grid.

However, interpolation has several problems. The first one is that it will depend on the type of function you want to assume for the interpolation. For example you can choose a bi-linear or bi-cubic (the ‘bi’s are for the 2 dimensional nature of the data) interpolation method. For the latter there are various ways to set the constants82. Such function-based interpolation can fail seriously on the edges of an image. It will also need normalization so that the flux of the objects before and after the warping is comparable. The most basic problem with such techniques is that they are based on a point, while a detector pixel is an area. They add a level of subjectivity to the data (they make more assumptions through the functions than the data can handle). For most applications this is fine, but in scientific applications where detection of the faintest possible galaxies or fainter parts of bright galaxies is our aim, we cannot afford this loss. Because of these reasons Warp will not use such interpolation techniques.

Warp will do interpolation based on “pixel mixing”83 or “area resampling”. This is also what the Hubble Space Telescope pipeline calls “Drizzling”84. This technique requires no functions and is thus non-parametric. It is also the closest we can get (making the least assumptions) to what actually happens on the detector pixels. The basic idea is that you reverse-transform each output pixel to find which pixels of the input image it covers, and what fraction of the area of each input pixel is covered. To find the output pixel value, you simply sum the value of each input pixel weighted by the overlap fraction (between 0 and 1) of the output pixel and that input pixel. Through this process, pixels are treated as an area and not as a point (which is how detectors create the image), and the brightness (see Flux Brightness and magnitude) of an object will be left completely unchanged.
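
The idea is easiest to see in 1D. The Python sketch below (with assumed array sizes; Warp’s actual implementation does this with 2D polygon overlaps) resamples a row onto coarser pixels while conserving the total flux:

# 1D pixel mixing: weight each input pixel by its overlap fraction.
import numpy as np

def pixmix_1d(fin, scale):
    """Resample onto a grid whose pixels are `scale` input pixels wide."""
    nout = int(len(fin) / scale)
    out = np.zeros(nout)
    for j in range(nout):
        lo, hi = j*scale, (j + 1)*scale    # edges of output pixel j
        for i in range(int(np.floor(lo)), int(np.ceil(hi))):
            out[j] += fin[i] * (min(hi, i + 1) - max(lo, i))
    return out

fin = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
out = pixmix_1d(fin, 1.5)
print(out, out.sum() == fin.sum())        # flux is conserved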

If there are very high spatial-frequency signals in the image (for example fringes) which vary on a scale smaller than your output image pixel size, pixel mixing can cause aliasing85. So if the input image has fringes, they have to be calculated and removed separately (which would naturally be done in any astronomical application). Because of the PSF, no astronomical target has a sharp change in the signal, so this issue is less important for astronomical applications, see Point Spread Function.


Previous: , Up: Warp   [Contents][Index]

6.4.4 Invoking Warp

Warp an input dataset into a new grid. Any homographic warp (for example scaling, rotation, translation, projection) is acceptable, see Warping basics for the definitions. The general template for invoking Warp is:

$ astwarp [OPTIONS...] InputImage

One line examples:

## Rotate and then scale input image:
$ astwarp --rotate=37.92 --scale=0.8 image.fits

## Scale, then translate the input image:
$ astwarp --scale 8/3 --translate 2.1 image.fits

## Align raw image with celestial coordinates:
$ astwarp --align rawimage.fits --output=aligned.fits

## Directly input a custom warping matrix (using fraction):
$ astwarp --matrix=1/5,0,4/10,0,1/5,4/10,0,0,1 image.fits

## Directly input a custom warping matrix, with final numbers:
$ astwarp --matrix="0.7071,-0.7071,  0.7071,0.7071" image.fits

If any processing is to be done, Warp can accept one file as input. As in all Gnuastro programs, when an output is not explicitly set with the --output option, the output filename will be set automatically based on the operation, see Automatic output. For the full list of general options to all Gnuastro programs (including Warp), please see Common options.

To be the most accurate, the input image will be read as a 64-bit double precision floating point dataset and all internal processing is done in this format (including the raw output type). You can use the common --type option to write the output in any type you want, see Numeric data types.

Warps must be specified as command-line options, either as (possibly multiple) modular warpings (for example --rotate, or --scale), or directly as a single raw matrix (with --matrix). If specified together, the latter (direct matrix) will take precedence and all the modular warpings will be ignored. Any number of modular warpings can be specified on the command-line and configuration files. If more than one modular warping is given, all will be merged to create one warping matrix. As described in Merging multiple warpings, matrix multiplication is not commutative, so the order of specifying the modular warpings on the command-line, and/or configuration files makes a difference (see Configuration file precedence). The full list of modular warpings and the other options particular to Warp are described below.

The values to the warping options (modular warpings as well as --matrix), are a sequence of at least one number. Each number in this sequence is separated from the next by a comma (,). Each number can also be written as a single fraction (with a forward-slash / between the numerator and denominator). Space and Tab characters are permitted between any two numbers, just don’t forget to quote the whole value. Otherwise, the value will not be fully passed onto the option. See the examples above as a demonstration.

Based on the FITS standard, integer values are assigned to the center of a pixel and the coordinate [1.0, 1.0] is the center of the first pixel (bottom left of the image when viewed in SAO ds9). So the coordinate center [0.0, 0.0] is half a pixel away (in each axis) from the bottom left vertex of the first pixel. The resampling that is done in Warp (see Resampling) is done on the coordinate axes and thus directly depends on the coordinate center. In some situations this is fine, for example when rotating/aligning a real image, all the edge pixels will be similarly affected. But in other situations (for example when scaling an over-sampled mock image to its intended resolution), this is not desired: you want the center of the coordinates to be on the corner of the pixel. In such cases, you can use the --centeroncorner option, which will shift the center by \(0.5\) before the main warp, then shift it back by \(-0.5\) after the main warp, see below.

-a
--align

Align the image and celestial (WCS) axes given in the input. Afterwards, the vertical image direction (when viewed in SAO ds9) corresponds to the declination and the horizontal axis is the inverse of the Right Ascension (RA). The inverse of the RA is chosen so the image corresponds to what you would actually see on the sky, and is common in most survey images.

Align is internally treated just like a rotation (--rotate), but uses the input image’s WCS to find the rotation angle. Thus, if you have rotated the image before calling --align, you might get unexpected results (because the rotation is defined on the original WCS).

-r FLT
--rotate=FLT

Rotate the input image by the given angle in degrees: \(\theta\) in Warping basics. Note that commonly, the WCS structure of the image is set such that the RA is the inverse of the image horizontal axis, which increases towards the right in the FITS standard and as viewed by SAO ds9. So the default center for rotation is on the right of the image. If you want to rotate about other points, you have to translate the warping center first (with --translate), then apply your rotation, and then return the center back to the original position (with another call to --translate), see Merging multiple warpings.

-s FLT[,FLT]
--scale=FLT[,FLT]

Scale the input image by the given factor(s): \(M\) and \(N\) in Warping basics. If only one value is given, then both image axes will be scaled with the given value. When two values are given (separated by a comma), the first will be used to scale the first axis and the second will be used for the second axis. If you only need to scale one axis, use 1 for the axis you don’t need to scale. The value(s) can also be written (on the command-line or in configuration files) as a fraction.

-f FLT[,FLT]
--flip=FLT[,FLT]

Flip the input image around the given axis(es). If only one value is given, then both image axes are flipped. When two values are given (separated by a comma), you can choose which axis to flip over. --flip only takes values 0 (for no flip) or 1 (for a flip). Hence, if you want to flip the second axis only, use --flip=0,1.

-e FLT[,FLT]
--shear=FLT[,FLT]

Shear the input image by the given value(s): \(A\) and \(B\) in Warping basics. If only one value is given, then both image axes will be sheared with the given value. When two values are given (separated by a comma), the first will be used to shear the first axis and the second will be used for the second axis. If you only need to shear along one axis, use 0 for the axis that must be untouched. The value(s) can also be written (on the command-line or in configuration files) as a fraction.

-t FLT[,FLT]
--translate=FLT[,FLT]

Translate (move the center of coordinates) the input image by the given value(s): \(c\) and \(f\) in Warping basics. If only one value is given, then both image axes will be translated by the given value. When two values are given (separated by a comma), the first will be used to translate the first axis and the second will be used for the second axis. If you only need to translate along one axis, use 0 for the axis that must be untouched. The value(s) can also be written (on the command-line or in configuration files) as a fraction.

-p FLT[,FLT]
--project=FLT[,FLT]

Apply a projection to the input image by the given value(s): \(g\) and \(h\) in Warping basics. If only one value is given, then the projection will apply to both axes with the given value. When two values are given (separated by a comma), the first will be used to project the first axis and the second will be used for the second axis. If you only need to project along one axis, use 0 for the axis that must be untouched. The value(s) can also be written (on the command-line or in configuration files) as a fraction.

-m STR
--matrix=STR

The warp/transformation matrix. All the elements in this matrix must be separated by commas (,) and, as described above, you can also use fractions (a forward-slash between two numbers). The transformation matrix can be either a 2 by 2 (4 numbers) or a 3 by 3 (9 numbers) array. In the former case (if a 2 by 2 matrix is given), it is put into a 3 by 3 matrix (see Warping basics).

The determinant of the matrix has to be non-zero and it must not contain any non-number values (for example infinities or NaNs). The elements of the matrix have to be written row by row. So for the general Homography matrix of Warping basics, it should be called with --matrix=a,b,c,d,e,f,g,h,1.

The raw matrix takes precedence over all the modular warping options listed above, so if it is called with any number of modular warps, the latter are ignored.

-c
--centeroncorner

Put the center of coordinates on the corner of the first (bottom-left when viewed in SAO ds9) pixel. This option is applied after the final warping matrix has been finalized: either through modular warpings or the raw matrix. See the explanation above for coordinates in the FITS standard to better understand this option and when it should be used.

--hstartwcs=INT

Specify the first header keyword number (line) that should be used to read the WCS information, see the full explanation in Invoking Crop.

--hendwcs=INT

Specify the last header keyword number (line) that should be used to read the WCS information, see the full explanation in Invoking Crop.

-k
--keepwcs

Do not correct the WCS information of the input image and save it untouched to the output image. By default the WCS (World Coordinate System) information of the input image is going to be corrected in the output image so the objects in the image are at the same WCS coordinates. But in some cases it might be useful to keep it unchanged (for example to correct alignments).

-C FLT
--coveredfrac=FLT

Depending on the warp, the output pixels that cover pixels on the edge of the input image, or blank pixels in the input image, are not going to be fully covered by input data. With this option, you can specify the acceptable covered fraction of such pixels (any value between 0 and 1). If you only want output pixels that are fully covered by the input image area (and are not blank), then you can set --coveredfrac=1. Alternatively, a value of 0 will keep output pixels that are even infinitesimally covered by the input (so the sum of the pixels in the input and output images will be the same).


Next: , Previous: , Up: Top   [Contents][Index]

7 Data analysis

Astronomical datasets (images or tables) contain very valuable information; the tools in this section can help in analyzing, extracting, and quantifying that information. For example getting general or specific statistics of the dataset (with Statistics), detecting signal within a noisy dataset (with NoiseChisel), or creating a catalog from an input dataset (with MakeCatalog).


Next: , Previous: , Up: Data analysis   [Contents][Index]

7.1 Statistics

The distribution of values in a dataset can provide valuable information about it. For example, in an image, a positively skewed distribution of pixel values shows that the image contains significant data (signal), while a roughly symmetric distribution shows that there is no significant signal in it. In a table, when we need to select a sample of objects, it is important to first get a general view of the whole sample.

On the other hand, you might need to know certain statistical parameters of the dataset. For example, if we have run a detection algorithm on an image, and we want to see how accurate it was, one method is to calculate the average of the undetected pixels and see how reasonable it is (if detection is done correctly, the average of undetected pixels should be approximately equal to the background value, see Sky value). In a table, you might have calculated the magnitudes of a certain class of objects and want to get some general characteristics of the distribution immediately on the command-line (very fast!), to possibly change some parameters. The Statistics program is designed for such situations.


Next: , Previous: , Up: Statistics   [Contents][Index]

7.1.1 Histogram and Cumulative Frequency Plot

Histograms and cumulative frequency plots are both used to visually study the distribution of a dataset. A histogram shows the number of data points which lie within pre-defined intervals (bins). So on the horizontal axis we have the bin centers and on the vertical axis, the number of points that are in each bin. You can use it to get a general view of the distribution: Which values have been repeated the most? How close/far apart are the most significant bins? Are there more values in the larger part of the range of the dataset, or in the lower part? Many other important properties of the dataset can similarly be deduced from a visual inspection of the histogram. In the Statistics program, the histogram can either be output to a table to plot with your favorite plotting program, or it can be shown with ASCII characters on the command-line, which is very crude, but good enough for a fast and on-the-go analysis, see the example in Invoking Statistics.

The width of the bins is the only necessary parameter for a histogram. In the limiting case that the bin-widths tend to zero (while the number of points in the dataset tends to infinity), the histogram will tend to the probability density function of the distribution. When the absolute number of points in each bin is not relevant to the study (only the shape of the histogram is important), you can normalize the histogram so that, like the probability density function, the sum of all its bins equals one.

In the cumulative frequency plot of a distribution, the horizontal axis is the sorted data values and the vertical axis is the index of each data point in the sorted distribution. Unlike a histogram, a cumulative frequency plot does not involve intervals or bins. This makes it less prone to any sort of bias or error that a given bin-width would introduce in the analysis. When a large number of the data points have roughly the same value, the cumulative frequency plot becomes steep in that vicinity: there is little change on the horizontal axis while the indexes on the vertical axis constantly increase. Normalizing a cumulative frequency plot means dividing each index (vertical axis) by the total number of data points (or the last value).
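
Formally, if the sorted dataset has \(N\) used elements, bin \(i\) of the histogram contains \(n_i\) points, and \(x_k\) is the \(k\)th element of the sorted data, then (with the definitions above) the normalized histogram and the normalized cumulative frequency plot can be written as:

$$h_i={n_i\over N}, \quad\quad C(x_k)={k\over N},$$

so the sum of all \(h_i\) is one and \(C\) rises from \(1/N\) to one.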

Unlike the histogram which has a limited number of bins, ideally the cumulative frequency plot should have one point for every data element. Even in small datasets (for example a \(200\times200\) image) this will result in an unreasonably large number of points to plot (40000)! As a result, for practical reasons, it is common to only store its value at a certain number of points (intervals) in the input range rather than for the whole dataset, so you should determine the number of bins you want when asking for a cumulative frequency plot. In Gnuastro (and thus the Statistics program), the number reported for each bin is the total number of data points up to that bin's larger interval value. You can see an example histogram and cumulative frequency plot of a single dataset under the --asciihist and --asciicfp options of Invoking Statistics.

In summary, both the histogram and cumulative frequency plot in Statistics will work with bins. Within each bin/interval, the lower value is considered to be within the bin (it is inclusive), but its larger value is not (it is exclusive). Formally, an interval/bin between a and b is represented by [a, b). When the overall range of the dataset is specified (with the --greaterequal, --lessthan, or --qrange options), the acceptable values of the dataset are also defined in a similar inclusive-exclusive manner. But when the range is determined from the actual dataset (none of these options is called), the last element in the dataset is included in the last bin’s count.


Next: , Previous: , Up: Statistics   [Contents][Index]

7.1.2 Sigma clipping

Let’s assume that you have pure noise (centered on zero) with a clear Gaussian distribution (see Photon counting noise). Now let’s assume you add very bright objects (signal) to the image which have a very sharp boundary. By a sharp boundary, we mean that there is a clear cutoff (from the noise) at the pixels where the objects end. In other words, at their boundaries, the objects do not fade away into the noise. In such a case, when you plot the histogram (see Histogram and Cumulative Frequency Plot) of the distribution, the pixels relating to those objects will be clearly separate from the pixels that belong to parts of the image that did not have any signal (were just noise). In the cumulative frequency plot, after a steady rise (due to the noise), you would observe a long flat region where, for a certain range of data (horizontal axis), there is no increase in the index (vertical axis).

Outliers like the example above can significantly bias the measurement of noise statistics. \(\sigma\)-clipping is a way to avoid the effect of such outliers. In astronomical applications, cosmic rays (when they collide at a near-normal incidence angle) are a very good example of such outliers. The tracks they leave behind in the image are perfectly immune to the blurring caused by the atmosphere and the aperture. They are also very energetic, so their borders are usually clearly separated from the surrounding noise. \(\sigma\)-clipping is therefore very useful in removing their effect on the data. See Figure 15 in Akhlaghi and Ichikawa, 2015.

\(\sigma\)-clipping is defined as the very simple iteration below. In each iteration, the range of input data might decrease, so when the outliers satisfy the conditions above, they will be removed through this iteration. The exit criteria will be discussed below.

  1. Calculate the standard deviation (\(\sigma\)) and median (\(m\)) of a distribution.
  2. Remove all points that are smaller than \(m-\alpha\sigma\) or larger than \(m+\alpha\sigma\).
  3. Go back to step 1, unless the selected exit criteria is reached.

The reason the median is used as a reference and not the mean is that the mean is too significantly affected by the presence of outliers, while the median is less affected, see Quantifying signal in a tile. As you can tell from this algorithm, besides the condition above (that the signal have clear high signal to noise boundaries) \(\sigma\)-clipping is only useful when the signal does not cover more than half of the full data set. If they do, then the median will lie over the outliers and \(\sigma\)-clipping might remove the pixels with no signal.

There are commonly two exit criteria to stop the \(\sigma\)-clipping iteration; they correspond to the two possible values of the second parameter of the --sclipparams option (see Invoking Statistics):

  1. A fixed number of iterations has been completed (when the parameter is an integer larger than one).
  2. The relative change in the standard deviation between two consecutive iterations is smaller than a given tolerance (when the parameter is less than one).

When working on astronomical images, objects like galaxies and stars are blurred by the atmosphere and the telescope aperture, therefore their signal sinks into the noise very gradually. Galaxies in particular do not appear to have a clear high signal to noise cutoff at all. Therefore \(\sigma\)-clipping will not be useful in removing their effect on the data.

To gauge if \(\sigma\)-clipping will be useful for your dataset, look at the histogram (see Histogram and Cumulative Frequency Plot). The ASCII histogram that is printed on the command-line with --asciihist is good enough in most cases.
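
To run \(\sigma\)-clipping from the command-line, a minimal sketch (with a hypothetical image.fits, using the Statistics options documented in Invoking Statistics) would be:

## Hypothetical example: 3 sigma-clipping, iterating until the
## relative change in the standard deviation is below 0.2.
$ aststatistics image.fits --sigmaclip --sclipparams=3,0.2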


Next: , Previous: , Up: Statistics   [Contents][Index]

7.1.3 Sky value

One of the most important aspects of a dataset is its reference value: the value of the dataset where there is no signal. Without knowing, and thus removing the effect of, this value, it is impossible to compare the derived results of many high-level analyses over the dataset with other datasets (in the attempt to associate our results with the “real” world). In astronomy, this reference value is known as the “Sky” value: the value where there is no signal from objects (for example galaxies, stars, planets or comets). Depending on the dataset, the Sky value may be a fixed value over the whole dataset, or it may vary based on location. For an example of the latter case, see Figure 11 in Akhlaghi and Ichikawa (2015).

Because of the significance of the Sky value in astronomical data analysis, we have devoted this subsection to a thorough review of it. We start with a discussion of its definition (Sky value definition). In the astronomical literature, researchers use a variety of methods to estimate the Sky value, so in Sky value misconceptions we review those methods and discuss their biases. From the definition of the Sky value, the most accurate way to estimate it is to run a detection algorithm (for example NoiseChisel) over the dataset and use the undetected pixels. However, there is also a cruder (but simpler) method that may be useful when good direct detection is not initially possible (for example due to too many cosmic rays in a shallow image); it is discussed in Quantifying signal in a tile.


Next: , Previous: , Up: Sky value   [Contents][Index]

7.1.3.1 Sky value definition

This analysis is taken from Akhlaghi and Ichikawa (2015). Let’s assume that all instrument defects – bias, dark and flat – have been corrected and the brightness (see Flux Brightness and magnitude) of a detected object, \(O\), is desired. The sources of flux on pixel \(i\) of the image can be written as follows:

  • Contribution from the background (\(B_i\)).
  • Contribution from all other detected objects (\(D_i\)).
  • Contribution from undetected objects (\(U_i\)).
  • Contribution from cosmic rays (\(C_i\)).
  • The flux of the target object itself (\(O_i\)).

The total flux in this pixel (\(T_i\)) can thus be written as:

$$T_i=B_i+D_i+U_i+C_i+O_i.$$

By definition, \(D_i\) is detected and it can be assumed that it is correctly estimated (deblended) and subtracted, thus \(D_i=0\). There are also methods to detect and remove cosmic rays, for example the method described in van Dokkum (2001), or by comparing multiple exposures. This allows us to set \(C_i=0\). Note that in practice, \(D_i\) and \(U_i\) are correlated, because they both directly depend on the detection algorithm and its input parameters. Also note that no detection or cosmic ray removal algorithm is perfect. With these limitations in mind, the observed Sky value for this pixel (\(S_i\)) can be defined as

$$S_i=B_i+U_i.$$

Therefore, as the detection process (algorithm and input parameters) becomes more accurate, or \(U_i\to0\), the sky value will tend to the background value or \(S_i\to B_i\). Therefore, while \(B_i\) is an inherent property of the data (pixel in an image), \(S_i\) depends on the detection process. Over a group of pixels, for example in an image or part of an image, this equation translates to the average of undetected pixels. With this definition of sky, the object flux in the data can be calculated with

$$T_{i}=S_{i}+O_{i} \quad\rightarrow\quad O_{i}=T_{i}-S_{i}.$$

Hence, the more accurately \(S_i\) is measured, the more accurately the brightness (sum of pixel values) of the target object can be measured (photometry). Any under-(over-)estimation of the sky will directly translate into an over-(under-)estimation of the measured object’s brightness. In the fainter outskirts of an object, only a very small fraction of the photo-electrons in a pixel actually belongs to the object (see Figure 1b in Akhlaghi and Ichikawa (2015)). Therefore, even a small over-estimation of the sky value will result in the loss of a very large portion of most galaxies. Besides the lost area/brightness, this will also cause an over-estimation of the Sky value and thus even more under-estimation of the object’s brightness. It is thus very important to detect the diffuse flux of objects, even when they are not your primary target.

The Sky value is only correctly found when all the detected objects (\(D_i\) and \(C_i\)) have been removed from the data.


Next: , Previous: , Up: Sky value   [Contents][Index]

7.1.3.2 Sky value misconceptions

As defined in Sky value, the sky value is only accurately defined when the detection algorithm is not significantly reliant on the sky value, in particular on its detection threshold. However, most signal-based detection tools use the sky value as a reference to define the detection threshold. So these older techniques had to rely on approximations based on other assumptions about the data. A review of those techniques can be seen in Appendix A of Akhlaghi and Ichikawa (2015). Since they were extensively used in astronomical data analysis for several decades, such approximations have given rise to a lot of misconceptions, ambiguities and disagreements about the sky value and how to measure it. In summary, the major methods used until now were an approximation of the mode of the image pixel distribution and \(\sigma\)-clipping.

As discussed in Sky value, the sky value can only be correctly defined as the average of undetected pixels. Therefore all such approaches that try to approximate the sky value prior to detection are ultimately poor approximations.


Previous: , Up: Sky value   [Contents][Index]

7.1.3.3 Quantifying signal in a tile

Put simply, noise can be characterized by a certain spread about a characteristic value. In the Gaussian distribution (most commonly used to model noise), the spread is defined by the standard deviation about the characteristic mean. Before continuing, let’s clarify some definitions: data is defined as the combination of signal and noise (so a noisy image is one dataset). Signal is defined as the mean of the noise on each element (after sky subtraction, see Sky value definition).

Let’s assume that the background (see Sky value definition) is subtracted and is zero. When a data set doesn’t have any signal (only noise), the mean, median and mode of the distribution are equal within statistical errors and approximately equal to the background value. Signal always has a positive value and will never become negative, see Figure 1 in Akhlaghi and Ichikawa (2015). Therefore, as more signal is added to the raw noise, the mean, median and mode of the dataset (which has both signal and noise) shift to the positive. The mean’s shift is the largest. The median shifts less, since it is defined based on an ordered distribution and so is not affected by a small number of outliers. The distribution’s mode shifts the least to the positive.

Inverting the argument above gives us a robust method to quantify the significance of signal in a dataset: when the mode and median of a distribution are approximately equal, we can argue that there is no significant signal. To allow for gradients (which are commonly present in ground-based images), we can consider the image to be made of a grid of tiles (see Tessellation). Hence, from the difference of the mode and median on each tile, we can ‘detect’ the significance of signal in it. The median of a distribution is defined to be the value of the distribution’s middle point after sorting (the 0.5 quantile). Thus, to estimate the presence of signal, we compare the quantile of the mode with 0.5; if the difference is larger than the value given to the --modmedqdiff option, the tile will be ignored. You can read this option name as “mode-median-quantile-diff”.

This use of the input’s skewness is made possible by a new algorithm to find the mode of a distribution that was defined in Appendix C of Akhlaghi and Ichikawa (2015). However, the raw dataset’s distribution is noisy (noise also affects the sorting), so using the argument above on the raw input will give a noisy result. To decrease the noise/error in estimating the mode, we use convolution (see Convolution process). Convolution decreases the range of the dataset and enhances its skewness, see Section 3.1.1 and Figure 4 in Akhlaghi and Ichikawa (2015). This enhanced skewness can be interpreted as an increase in the Signal to noise ratio of the objects buried in the noise. Therefore, to obtain an even better measure of the presence of signal in a tile, the image can first be convolved with a given kernel.

Note that through the difference of the mode and median we have actually ‘detected’ data in the distribution. However this “detection” was only based on the total distribution of the data in each tile (a much lower resolution). This is the main limitation of this technique. The best approach is thus to do detection over the dataset, mask all the detected pixels and use the undetected regions to estimate the sky and its standard deviation.

The mean value of the tiles that have an approximately equal mode and median will be the Sky value. However, there is one final hurdle: astronomical datasets are commonly plagued by cosmic rays. Images of cosmic rays aren’t smoothed by the atmosphere or telescope aperture, so they have sharp boundaries. Also, since they don’t occupy too many pixels, they don’t affect the mode and median calculation. But their very high values can greatly bias the calculation of the mean (recall how the mean shifts the fastest in the presence of outliers), see Figure 15 in Akhlaghi and Ichikawa (2015) for one example.

The effect of outliers like cosmic rays on the mean and standard deviation can be removed through \(\sigma\)-clipping, see Sigma clipping for a complete explanation. Therefore, after asserting that the mode and median are approximately equal in a tile (see Tessellation), the final Sky value and its standard deviation are determined after \(\sigma\)-clipping with the --sigmaclip option.
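
As a sketch of how this is driven from the command-line (see Invoking Statistics for the details of each option; image.fits and kernel.fits are hypothetical file names, and the option values are only for illustration):

## Hypothetical example: estimate the Sky and its standard deviation
## on each tile, storing all the intermediate steps in a check file.
$ aststatistics image.fits --sky --kernel=kernel.fits       \
                --modmedqdiff=0.01 --sclipparams=3,0.2 --checksky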


Previous: , Up: Statistics   [Contents][Index]

7.1.4 Invoking Statistics

Statistics will print statistical measures of an input dataset (table column or image). The executable name is aststatistics with the following general template

$ aststatistics [OPTION ...] InputImage.fits

One line examples:

## Print some general statistics of input image:
$ aststatistics image.fits

## Print some general statistics of column named MAG_F160W:
$ aststatistics catalog.fits -h1 --column=MAG_F160W

## Make the histogram of the column named MAG_F160W:
$ aststatistics table.fits -cMAG_F160W --histogram

## Find the Sky value on image with a given kernel:
$ aststatistics image.fits --sky --kernel=kernel.fits

## Print Sigma-clipped results of records with a MAG_F160W
## column value between 26 and 27:
$ aststatistics cat.fits -cMAG_F160W -g26 -l27 --sigmaclip --sclipparams=3,0.2

## Print the median value of all records in column MAG_F160W that
## have a value larger than 3 in column PHOTO_Z:
$ aststatistics tab.txt -rPHOTO_Z -g3 -cMAG_F160W --median

An input image or table is necessary when processing is to be done. If any output file is to be created, the value given to the --output option is used as the base name for the generated files. Without --output, the input name will be used to generate an output name, see Automatic output. The options described below are particular to Statistics, but for general operations, it shares a large collection of options with the other Gnuastro programs, see Common options for the full list. Options can also be given in configuration files; for more, please see Configuration files.

The input dataset may have blank values (see Blank pixels), in this case, all blank pixels are ignored during the calculation. Initially, the full dataset will be read, but it is possible to select a specific range of data elements to use in the analysis of each run. You can either directly specify a minimum and maximum value for the range of data elements to use (with --greaterequal or --lessthan), or specify the range using quantiles (with --qrange). If a range is specified, all pixels outside of it are ignored before any processing.

The following set of options are for specifying the input/outputs of Statistics. There are many other input/output options that are common to all Gnuastro programs including Statistics, see Input/Output options for those.

-c STR/INT
--column=STR/INT

The input column selector when the input file is a table. See Selecting table columns for a full description of how to use this option. For more on how tables are read in Gnuastro, please see Tables.

-r STR/INT
--refcol=STR/INT

The reference column selector when the input file is a table. When a reference column is given, the range options below will be applied to this column and only elements in the input column that have a reference value in the correct range will be used. In practice this option allows you to select a subset of the input column based on values in another (the reference) column. All the statistical calculations will be done on the selected input column, not the reference column.

-g FLT
--greaterequal=FLT

Limit the range of inputs into those with values greater and equal to what is given to this option. None of the values below this value will be used in any of the processing steps below.

-l FLT
--lessthan=FLT

Limit the range of inputs into those with values less-than what is given to this option. None of the values greater or equal to this value will be used in any of the processing steps below.

-Q FLT[,FLT]
--qrange=FLT[,FLT]

Specify the range of usable inputs using quantiles. This option can take one or two quantiles to specify the range. When only one number is given (let’s call it \(Q\)), the range will be those values in the quantile range \(Q\) to \(1-Q\), so when only one value is given, it must be less than 0.5. When two values are given, the first is used as the lower quantile and the second as the upper quantile.

The quantile of a given element in a dataset is defined by the fraction of its index to the total number of values in the sorted input array. So the smallest and largest values in the dataset have a quantile of 0.0 and 1.0 respectively. The quantile is a very useful non-parametric (making no assumptions about the input) relative measure to specify a range. It can best be understood in terms of the cumulative frequency plot, see Histogram and Cumulative Frequency Plot. The quantile of each horizontal axis value in the cumulative frequency plot is the vertical axis value associated with it.
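
For example, the following hypothetical command would ignore the lowest and highest 5 percent of the sorted input values before doing any further processing:

## Hypothetical example: only use values within the 0.05 to 0.95
## quantile range of the input.
$ aststatistics image.fits --qrange=0.05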

When no operation is requested, Statistics will print some general basic properties of the input dataset on the command-line like the example below (run on one of the output images of make check). This default behavior is designed to help give you a general feeling of how the data are distributed and to help in narrowing down your analysis.

$ aststatistics convolve_spatial_scaled_noised.fits     \
                --greaterequal=9500 --lessthan=11000
Statistics (GNU Astronomy Utilities) X.X
-------
Input: convolve_spatial_scaled_noised.fits (hdu: 0)
Range: from (inclusive) 9500, upto (exclusive) 11000.
Unit: Brightness
-------
  Number of elements:                      9074
  Minimum:                                 9622.35
  Maximum:                                 10999.7
  Mode:                                    10055.45996
  Mode quantile:                           0.4001983908
  Median:                                  10093.7
  Mean:                                    10143.98257
  Standard deviation:                      221.80834
-------
Histogram:
 |                   **
 |                 ******
 |                 *******
 |                *********
 |              *************
 |              **************
 |            ******************
 |            ********************
 |          *************************** *
 |        ***************************************** ***
 |*  **************************************************************
 |-----------------------------------------------------------------

Gnuastro’s Statistics is a very general purpose program, so to be able to easily understand this diversity in its operations (and how to possibly run them together), we’ll divide the operations into two types: those that don’t respect the position of the elements and those that do (by tessellating the input on a tile grid, see Tessellation). The former treat the whole dataset as one and can re-arrange all the elements (for example sort them), while the latter do their processing on each tile independently. First, we’ll review the operations that work on the whole dataset.

The group of options below can be used to get a single value measurement of the whole dataset. They will print only the requested value as one field in a line/row, like the --mean and --median options. These options can be called any number of times and in any order. The outputs of all such options will be printed on one line following each other (with a space character between them). This feature makes these options very useful in scripts, or for redirecting into programs like GNU AWK for higher-level processing. These are some of the most basic measures; Gnuastro is still under heavy development and this list will grow. If you want another statistical parameter, please contact us and we will do our best to add it to this list, see Suggest new feature.
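
For example, a minimal sketch of such a script-friendly call (with a hypothetical image.fits): the mean and standard deviation are printed on one line, so GNU AWK can easily combine them.

## Hypothetical example: print the sum of the mean and standard
## deviation of the input (one possible upper threshold).
$ aststatistics image.fits --mean --std | awk '{print $1+$2}'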

-n
--number

Print the number of all used (non-blank and in range) elements.

--minimum

Print the minimum value of all used elements.

--maximum

Print the maximum value of all used elements.

--sum

Print the sum of all used elements.

-m
--mean

Print the mean (average) of all used elements.

-t
--std

Print the standard deviation of all used elements.

-E
--median

Print the median of all used elements.

-u FLT[,FLT[,...]]
--quantile=FLT[,FLT[,...]]

Print the values at the given quantiles of the input dataset. Any number of quantiles may be given and one number will be printed for each. Values can either be written as a single number or as fractions, but must be between zero and one (inclusive). Hence, in effect --quantile=0.25 --quantile=0.75 is equivalent to --quantile=0.25,3/4, or -u1/4,3/4.

The returned value is one of the elements from the dataset. Taking \(q\) to be your desired quantile, and \(N\) to be the total number of used (non-blank and within the given range) elements, the returned value is at the following position in the sorted array: \(\mathrm{round}(q\times N)\).

--quantfunc=FLT[,FLT[,...]]

Print the quantiles of the given values within the dataset. This option is the inverse of --quantile and operates similarly, except that the acceptable values must be within the range of the dataset, not between 0 and 1. Formally, it is known as the “quantile function”.

Since the dataset is not continuous this function will find the nearest element of the dataset and use its position to estimate the quantile function.

-O
--mode

Print the mode of all used elements. The mode is found through the mirror distribution, which is fully described in Appendix C of Akhlaghi and Ichikawa 2015; see that appendix for a full description.

This mode calculation algorithm is non-parametric, so when the dataset is not large enough (usually fewer than about 1000 elements), or doesn’t have a clear mode, it can fail. In such cases, this option will return a value of nan (the floating point NaN value).

As described in that paper, the easiest way to assess the quality of this mode calculation method is to use its symmetricity (see --modesym below). A better way is to use the --mirror option to generate the histogram and cumulative frequency tables for any given mirror value (the mode in this case) as a table. If you generate plots like those shown in Figure 21 of that paper, you can visually confirm that your mode is accurate.

--modequant

Print the quantile of the mode. You can get the actual mode value from the --mode option described above. In many cases, the absolute value of the mode is irrelevant, but its position within the distribution is important. In such cases, this option comes in handy.

--modesym

Print the symmetricity of the calculated mode. See the description of --mode for more. This mode algorithm finds the mode based on how symmetric it is, so if the symmetricity returned by this option is too low, the mode is not too accurate. See Appendix C of Akhlaghi and Ichikawa 2015 for a full description. In practice, symmetricity values larger than 0.2 are mostly good.

--modesymvalue

Print the value in the distribution where the mirror and input distributions are no longer symmetric, see --mode and Appendix C of Akhlaghi and Ichikawa 2015 for more.

The list of options below are for those statistical operations that output more than one value. So while they can be called together in one run, their outputs will be distinct (each one’s output will usually be printed in more than one line).

-A
--asciihist

Print an ASCII histogram of the usable values within the input dataset along with some basic information like the example below (from the UVUDF catalog). The width and height of the histogram (in units of character widths and heights on your command-line terminal) can be set with the --numasciibins (for the width) and --asciiheight options.

For a full description of the histogram, please see Histogram and Cumulative Frequency Plot. An ASCII plot is certainly very crude and cannot be used in any publication, but it is very useful for getting a general feeling of the input dataset very fast and easily on the command-line without having to take your hands off the keyboard (which is a major distraction!). If you want to try it out, you can write it all in one line and ignore the \ and extra spaces.

$ aststatistics uvudf_rafelski_2015.fits.gz --hdu=1         \
                --column=MAG_F160W --lessthan=40            \
                --asciihist --numasciibins=55
ASCII Histogram:
Number: 8593
Y: (linear: 0 to 660)
X: (linear: 17.7735 -- 31.4679, in 55 bins)
 |                                         ****
 |                                        *****
 |                                       ******
 |                                      ********
 |                                      *********
 |                                    ***********
 |                                  **************
 |                                *****************
 |                           ***********************
 |                    ********************************
 |*** ***************************************************
 |-------------------------------------------------------

--asciicfp

Print the cumulative frequency plot of the usable elements in the input dataset. Please see descriptions under --asciihist for more, the example below is from the same input table as that example. To better understand the cumulative frequency plot, please see Histogram and Cumulative Frequency Plot.

$ aststatistics uvudf_rafelski_2015.fits.gz --hdu=1         \
                --column=MAG_F160W --lessthan=40            \
                --asciicfp --numasciibins=55
ASCII Cumulative frequency plot:
Y: (linear: 0 to 8593)
X: (linear: 17.7735 -- 31.4679, in 55 bins)
 |                                                *******
 |                                             **********
 |                                            ***********
 |                                          *************
 |                                         **************
 |                                        ***************
 |                                      *****************
 |                                    *******************
 |                                ***********************
 |                         ******************************
 |*******************************************************
 |-------------------------------------------------------

-H
--histogram

Save the histogram of the usable values in the input dataset into a table. The first column is the value at the center of the bin and the second is the number of points in that bin. If the --cumulative option is also called with this option in a run, then the table will have three columns (the third is the cumulative frequency plot). Through the --numbins and --onebinstart options you can modify the first column’s values, and with --normalize and --maxbinone you can modify the second column’s. See below for the description of each.

By default (when no --output is specified) a plain text table will be created, see Gnuastro text table format. If a FITS name is specified, you can use the common option --tableformat to have it as a FITS ASCII or FITS binary format, see Common options. This table can then be fed into your favorite plotting tool and get a much more clean and nice histogram than what the raw command-line can offer you (with the --asciihist option).
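
For example, a minimal sketch (with a hypothetical image.fits; hist.txt will be a plain text table with the bin centers and normalized counts):

## Hypothetical example: save a normalized 100-bin histogram of the
## input into a plain text table.
$ aststatistics image.fits --histogram --numbins=100 --normalize  \
                --output=hist.txt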

-C
--cumulative

Save the cumulative frequency plot of the usable values in the input dataset into a table, similar to --histogram.

-s
--sigmaclip

Do \(\sigma\)-clipping on the usable pixels of the input dataset. See Sigma clipping for a full description on \(\sigma\)-clipping and also to better understand this option. The \(\sigma\)-clipping parameters can be set through the --sclipparams option (see below).

--mirror=FLT

Make a histogram and cumulative frequency plot of the mirror distribution for the given dataset when the mirror is located at the value to this option. The mirror distribution is fully described in Appendix C of Akhlaghi and Ichikawa 2015 and currently it is only used to calculate the mode (see --mode).

Just note that the mirror distribution is a discrete distribution like the input, so while you may give any number as the value to this option, the actual mirror value is the closest number in the input dataset to this value. If the two numbers are different, Statistics will warn you of the actual mirror value used.

This option will make a table as output. Depending on your selected name for the output, it will be either a FITS table or a plain text table (which is the default). It contains three columns: the first is the center of the bins, the second is the histogram (with the largest value set to 1) and the third is the normalized cumulative frequency plot of the mirror distribution. The bins will be positioned such that the mode is on the starting interval of one of the bins, to make it symmetric around the mirror. With this output file and the input’s histogram (that you can generate in another run of Statistics, using --onebinstart), it is possible to make plots like Figure 21 of Akhlaghi and Ichikawa 2015.

The list of options below allow customization of the histogram and cumulative frequency plots (for the --histogram, --cumulative, --asciihist, and --asciicfp options).

--numbins

The number of bins (rows) to use in the histogram and the cumulative frequency plot tables (outputs of --histogram and --cumulative).

--numasciibins

The number of bins (characters) to use in the ASCII plots when printing the histogram and the cumulative frequency plot (outputs of --asciihist and --asciicfp).

--asciiheight

The number of lines to use when printing the ASCII histogram and cumulative frequency plot on the command-line (outputs of --asciihist and --asciicfp).

-n
--normalize

Normalize the histogram or cumulative frequency plot tables (outputs of --histogram and --cumulative). For a histogram, the sum of all bins will become one and for a cumulative frequency plot the last bin value will be one.

--maxbinone

Divide all the histogram values by the maximum bin value so it becomes one and the rest are similarly scaled. In some situations (for example if you want to plot the histogram and cumulative frequency plot in one plot) this can be very useful.

--onebinstart=FLT

Make sure that one bin starts at the value given to this option. In practice, this will shift the bins used to find the histogram and cumulative frequency plot such that one bin’s lower interval becomes this value. For example, when the histogram range includes negative and positive values and zero has a special significance in your analysis, then zero will fall somewhere inside a bin, which will mix the counts of positive and negative values. By setting --onebinstart=0, you can make sure that the viewers of the histogram will not be confused, without doing the math of setting a range and number of bins yourself.

Note that by default, the first column of the histogram and cumulative frequency plot tables shows the central value of each bin. So in the example above you will not see 0.000 in the first column; you will see two symmetric values. If the given value is not within the usable input range, this option will be ignored.
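
For example (with a hypothetical image.fits containing both negative and positive values):

## Hypothetical example: shift the bins so that one bin starts
## exactly at zero.
$ aststatistics image.fits --histogram --onebinstart=0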

All the options described until now were from the first class of operations discussed above: those that treat the whole dataset as one. However, it often happens that the relative position of the dataset elements over the dataset is significant. For example, you don’t want one median value for the whole input image; you want to know how the median changes over the image. For such operations, the input has to be tessellated (see Tessellation). Thus this class of options can’t currently be called along with the options above in one run of Statistics.

-t
--ontile

Do the respective single-valued calculation over one tile of the input dataset, not the whole dataset. This option must be called with at least one of the single valued options discussed above (for example --mean or --quantile). The output will be a file in the same format as the input. If the --oneelempertile option is called, then one element/pixel will be used for each tile (see Processing options). Otherwise, the output will have the same size as the input, but each element will have the value corresponding to that tile’s value. If multiple single valued operations are called, then for each operation there will be one extension in the output FITS file.
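
For example, a minimal sketch (with a hypothetical image.fits) that writes the median of each tile into the output, with one pixel per tile:

## Hypothetical example: one median value per tile, one pixel per
## tile in the output (see Processing options).
$ aststatistics image.fits --median --ontile --oneelempertile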

-y
--sky

Estimate the Sky value on each tile as fully described in Quantifying signal in a tile. As described in that section, several options are necessary to configure the Sky estimation which are listed below. The output file will have two extensions: the first is the Sky value and the second is the Sky standard deviation on each tile. Similar to --ontile, if the --oneelempertile option is called, then one element/pixel will be used for each tile (see Processing options).

The parameters for estimating the Sky value can be set with the following options. Except for the --sclipparams option (which is also used by --sigmaclip), the rest are only used for the Sky value estimation.

-k STR
--kernel=STR

File name of kernel to help in estimating the significance of signal in a tile, see Quantifying signal in a tile.

--khdu=STR

Kernel HDU to help in estimating the significance of signal in a tile, see Quantifying signal in a tile.

--mirrordist=FLT

Maximum distance (as a multiple of error) to estimate the difference between the input and mirror distributions in finding the mode, see Appendix C of Akhlaghi and Ichikawa 2015, also see Quantifying signal in a tile.

--modmedqdiff=FLT

The maximum acceptable distance between the mode and median, see Quantifying signal in a tile.

--sclipparams=FLT,FLT

The \(\sigma\)-clipping parameters, see Sigma clipping. This option takes two values which are separated by a comma (,). Each value can either be written as a single number or as a fraction of two numbers (for example 3,1/10). The first value to this option is the multiple of \(\sigma\) that will be clipped (\(\alpha\) in that section). The second value is the exit criterion: if it is less than 1, it is interpreted as a tolerance, and if it is larger than one, it is the specific number of iterations (hence, in the latter case, the value must be an integer).

--smoothwidth=INT

Width of a flat kernel to convolve the interpolated tile values. Tile interpolation is done using the median of the --interpnumngb neighbors of each tile (see Processing options). If this option is given a value of zero or one, no smoothing will be done. Without smoothing, strong boundaries will probably be created between the values estimated for each tile. It is thus good to smooth the interpolated image so strong discontinuities do not show up in the final Sky values. The smoothing is done through convolution (see Convolution process) with a flat kernel, so the value to this option must be an odd number.

--checksky

Create a multi-extension FITS file showing the steps that were used to estimate the Sky value over the input, see Quantifying signal in a tile. The file will have two extensions for each step (one for the Sky and one for the Sky standard deviation).


Next: , Previous: , Up: Data analysis   [Contents][Index]

7.2 NoiseChisel

Once instrumental signatures are removed from the raw data in the initial reduction process (see Data manipulation), we are ready to derive scientific results from them. But we can’t do anything special with a raw dataset; for example, an image is just an array of values: every pixel just has one value and a position within the image. Therefore, the first step of your high-level analysis will be to classify/label the dataset elements/pixels into two classes: signal and noise. This process is formally known as detection. Afterwards, you want to separate the detections into multiple components (for example, when two detected regions aren’t touching, they should be treated independently, as two separate galaxies for example). This higher-level classification of the detections is known as segmentation. NoiseChisel is Gnuastro’s program for detection and segmentation.

NoiseChisel works based on a new noise-based approach to signal detection and was introduced to the astronomical community in Akhlaghi and Ichikawa [2015]. NoiseChisel’s primary output is an array (image) with the same size as the input but containing labels: pixels with a label of 0 are noise/sky while pixels with labels larger than 0 are detections (separate segments will be given positive integers, starting from 1). For more on NoiseChisel’s particular output format and its benefits (especially in conjunction with MakeCatalog), please see Akhlaghi [2016]. The published paper cannot undergo any updates, but the NoiseChisel software has evolved; you can see the major changes in NoiseChisel changes after publication.

Data is inherently mixed with noise: only mock/simulated datasets are free of noise. So this process of separating signal from noise is not trivial. In particular, most scientifically interesting astronomical targets are faint and can have a large variety of morphologies, along with a large distribution in brightness and size, all drowned in an ocean of noise. So detection is a uniquely vital aspect of any scientific work, and even more so for astronomical research. This step is so fundamental that the design of NoiseChisel was the primary motivation behind creating Gnuastro: the first generation of Gnuastro’s programs were all originally part of what later became NoiseChisel; afterwards they were spun off into separate programs.

The name of NoiseChisel is derived from the first thing it does after thresholding the dataset: eroding it. In mathematical morphology, erosion of pixels can be pictured as carving off boundary pixels. Hence, what NoiseChisel does is similar to what a wood or stone chisel does; it is just implemented in software instead of hardware. In fact, looking at it as a chisel and your dataset as a solid block of rock will greatly help in using it well: with NoiseChisel you literally carve the galaxies/stars/comets out of the noise. Try running it with the --checkdetection option to see each step of the carving process on your input dataset. You can then change a specific option to carve your signal out of the noise more successfully.


Next: , Previous: , Up: NoiseChisel   [Contents][Index]

7.2.1 NoiseChisel changes after publication

Before using NoiseChisel it is strongly recommended to read Akhlaghi and Ichikawa [2015] to gain a good understanding of what it does and how each parameter influences the output. Thanks to that paper, there is no need to continue this introduction any further and we can dive into the details of running NoiseChisel in Invoking NoiseChisel. However, the paper cannot undergo any further updates, while NoiseChisel will evolve: better algorithms or steps will be found, and options will be added or removed. So this book is the definitive guide. To make the transition from the paper to this book easier (and to encourage reading the paper), below you can see the major changes since that paper was published.

For a more detailed list of updates in each release, please follow the NEWS file. The NEWS file is in the released Gnuastro tarball (see Release tarball). You can also see it online at http://git.savannah.gnu.org/cgit/gnuastro.git/plain/NEWS.


Previous: , Up: NoiseChisel   [Contents][Index]

7.2.2 Invoking NoiseChisel

NoiseChisel will detect and segment signal in noise producing a multi-extension labeled image, ready for input into MakeCatalog to generate a catalog or other processing. The executable name is astnoisechisel with the following general template

$ astnoisechisel [OPTION ...] InputImage.fits

One line examples:

## Detect signal in input.fits:
$ astnoisechisel input.fits

## Detect signal assuming input has 4 channels along first dimension
## and 1 along the second. Also set the regular tile size to 100 along
## both dimensions:
$ astnoisechisel --numchannels=4,1 --tilesize=100,100 input.fits

If NoiseChisel is to do processing (for example you don’t want to get help, or just see the values of each input parameter), an input image should be provided with the recognized extensions (see Arguments). NoiseChisel shares a large set of common operations with other Gnuastro programs, mainly regarding input/output, general processing steps, and general operating modes. To help in a unified experience between all of Gnuastro’s programs, these operations have the same command-line options; see Common options for a full list. Since the common options are thoroughly discussed there, they will not be reviewed here again. You can see all the options with a short description on the command-line with the --help option, see Getting help.

NoiseChisel’s input image may contain blank elements (see Blank pixels). Blank elements will be ignored in all steps of NoiseChisel. Hence if your dataset has bad pixels which should be masked with a mask image, please use Gnuastro’s Arithmetic program (in particular its where operator) to convert those pixels to blank pixels before running NoiseChisel. Gnuastro’s Arithmetic program has bitwise operators helping you select specific kinds of bad-pixels when necessary.

A convolution kernel can also be optionally given. If a value (file name) is given to --kernel on the command-line or in a configuration file (see Configuration files), then that file will be used to convolve the image prior to thresholding. Otherwise a default kernel will be used. The default kernel is a 2D Gaussian with a FWHM of 2 pixels truncated at 5 times the FWHM. This choice of the default kernel is discussed in Section 3.1.1 of Akhlaghi and Ichikawa [2015]. See Convolution kernel for kernel related options.

NoiseChisel defines two tessellations over the input (see Tessellation). This enables it to deal with possible gradients in the input dataset and also to significantly improve speed by processing each tile on a different thread. The tessellation related options are discussed in Processing options. In particular, NoiseChisel uses two tessellations that are identical in everything except their tile sizes: a fine-grained one with smaller tiles (mainly used in detection) and a coarser one with larger tiles (used for multi-threaded processing). The common tessellation options described in Processing options define all parameters of both tessellations; only the large tile size of the latter is set through the --largetilesize option. To inspect the tessellations on your input dataset, run NoiseChisel with --checktiles.

Usage TIP: Frequently use the options starting with --check. Depending on what you want to detect in the data, you can often play with the parameters/options for a better result than the default parameters. You can start with --checkdetection and --checksegmentation for the main steps. For their full list please run:

$ astnoisechisel --help | grep check

In the sections below, NoiseChisel’s options are classified into three general classes to help in easy navigation. General NoiseChisel options mainly discusses the options relating to the input and those that are shared in both detection and segmentation. Options to configure the detection are described in Detection options, and in Segmentation options we discuss how you can fine-tune the segmentation of the detections. Finally, in NoiseChisel output the format of NoiseChisel’s output is discussed. The order of options here follows the logical order in which the respective action takes place within NoiseChisel (note that the output of --help is sorted alphabetically).


Next: , Previous: , Up: Invoking astnoisechisel   [Contents][Index]

7.2.2.1 General NoiseChisel options

The options discussed in this section are mainly regarding the input(s), output, and some general processing options that are shared between both detection and segmentation. Recall that you can always see the full list of Gnuastro’s options with the --help option.

-k STR
--kernel=STR

File name of kernel to smooth the image before applying the threshold, see Convolution kernel. The first step of NoiseChisel is to convolve/smooth the image and use the convolved image in multiple steps during the processing. It will be used to define (and later apply) the quantile threshold (see --qthresh). The convolved image is also used to define the clumps (see Section 3.2.1 and Figure 8 of Akhlaghi and Ichikawa [2015]).

The --kernel option is not mandatory. If no kernel is provided, a 2D Gaussian profile with a FWHM of 2 pixels truncated at 5 times the FWHM is used. This choice of the default kernel is discussed in Section 3.1.1 of Akhlaghi and Ichikawa [2015].

--khdu=STR

HDU containing the kernel in the file given to the --kernel option.

-E
--skysubtracted

If this option is called, it is assumed that the image has already been sky subtracted once. Knowing whether the sky has already been subtracted once or not is very important in estimating the Signal to noise ratio of the detections and clumps. In short, an extra \(\sigma_{sky}^2\) must be added in the error (noise, or denominator in the Signal to noise ratio) for every flux value that is present in the calculation. This can be interpreted as the error in measuring that sky value when it was subtracted by any other program. See Section 3.3 in Akhlaghi and Ichikawa [2015] for a complete explanation.

-B FLT
--minskyfrac=FLT

Minimum fraction (value between 0 and 1) of sky (undetected) areas in a tile for it to be considered in measuring the following detection and segmentation properties.

Because of the PSF and their intrinsic amorphous properties, astronomical objects (except cosmic rays) never have a clear cutoff and commonly sink into the noise very slowly, even below the very low thresholds used by NoiseChisel. So when a large fraction of the area of one tile is covered by detections, it is very plausible that their faint wings are present in the undetected regions (hence causing a bias in any measurement). To get an accurate measurement of the above parameters over the tessellation, tiles that harbor too many detected regions should be excluded. The tiles that are used can be inspected with the respective --check option of the given step.

--minnumfalse=INT

The minimum number of ‘pseudo-detections’ (to identify false initial detections) or clumps (to identify false clumps) that must be found over the un-detected regions to identify a Signal-to-Noise ratio threshold.

The Signal to noise ratio (S/N) of false pseudo-detections and clumps in each tile is found using the quantile of the S/N distribution of the pseudo-detections and clumps over the undetected pixels in each tile. If the number of S/N measurements is not large enough, the quantile will not be accurate (it can have large scatter). For example, if you set --detquant=0.99 (the top 1 percent), then it is best to have at least 100 S/N measurements.

-L INT[,INT]
--largetilesize=INT[,INT]

The size of each tile for the tessellation with the larger tile sizes. Except for the tile size, all the other parameters for this tessellation are taken from the common options described in Processing options. The format is identical to that of the --tilesize option that is discussed in that section.

--onlydetection

If this option is called, no segmentation will be done and the output will only have four extensions (no clumps extension, see NoiseChisel output). The second extension of the output is not going to be objects but raw detections (a large region will be given one label): labeling is only done based on connectivity. The last two extensions of the output will be the Sky and its Standard deviation.

This option can result in faster processing, for example when only the noise properties of the image are desired for a catalog that uses another image’s labels. A common case is when you want to measure colors or SEDs in several images. Let’s say you have images in two colors: A and B. For simplicity, also assume that they cover exactly the same position in the sky with the same pixel scale.

You choose to set A as a reference, so you run NoiseChisel fully on A. Then you run NoiseChisel on B with --onlydetection, since you only need the noise properties of B (for the signal to noise column in its catalog). You can then run MakeCatalog on A normally, see MakeCatalog. To run MakeCatalog on B, you simply set the object and clump label images to those that NoiseChisel produced for A, see Invoking MakeCatalog.
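
A minimal sketch of this scenario (A.fits and B.fits are hypothetical file names) would start with the two NoiseChisel runs below; MakeCatalog is then run as described above.

## Hypothetical example: full detection and segmentation on the
## reference image A, but only detection (noise properties) on B.
$ astnoisechisel A.fits
$ astnoisechisel B.fits --onlydetection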

--grownclumps

In the output (see NoiseChisel output) store the grown clumps (or full detected region if only one clump was present in that detection). By default the original clumps are stored as the third extension of the output, if this option is called, it is replaced with the grown clump labels.

--continueaftercheck

Continue NoiseChisel after any of the options starting with --check. NoiseChisel involves many steps and, as a result, there are many checks allowing you to inspect the status of the processing. The results of each step affect the next steps, so when you only want to check the status of the processing at one step, the time spent completing the rest of NoiseChisel is just wasted/distracting time.

To encourage easier experimentation with the option values, when you use any of the NoiseChisel options that start with --check, NoiseChisel will abort once all the desired check file(s) have been completed. With the --continueaftercheck option, you can disable this behavior and ask NoiseChisel to continue with the rest of the processing after completing the check file(s).


Next: , Previous: , Up: Invoking astnoisechisel   [Contents][Index]

7.2.2.2 Detection options

Detection is the process of separating the pixels in the image into two groups: 1) Signal and 2) Noise. Through the parameters below, you can customize the detection process in NoiseChisel. Recall that you can always see the full list of Gnuastro’s options with the --help option.

-r FLT
--mirrordist=FLT

Maximum distance (as a multiple of error) to estimate the difference between the input and mirror distributions in finding the mode, see Appendix C of Akhlaghi and Ichikawa 2015, also see Quantifying signal in a tile.

-Q FLT
--modmedqdiff=FLT

The maximum acceptable distance between the mode and median, see Quantifying signal in a tile. The quantile threshold will be found on tiles that satisfy this mode and median difference.

-t FLT
--qthresh=FLT

The quantile threshold to apply to the convolved image. The detection process begins with applying a quantile threshold to each of the tiles in the small tessellation. The quantile is only calculated for tiles that don’t have any significant signal within them, see Quantifying signal in a tile. Interpolation is then used to give a value to the unsuccessful tiles, and the result is finally smoothed.

The quantile value is a floating point value between 0 and 1. Assume that we have sorted the \(N\) data elements of a distribution (the pixels in each mesh on the convolved image). The quantile (\(q\)) of this distribution is the value of the element with an index of (the nearest integer to) \(q\times{N}\) in the sorted data set. After thresholding is complete, we will have a binary (two valued) image. The pixels above the threshold are known as foreground pixels (have a value of 1) while those which lie below the threshold are known as background (have a value of 0).
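As a toy illustration of this definition (only for demonstration, this is not how NoiseChisel implements it), the \(q=0.4\) quantile of five hypothetical values can be found on the command-line by sorting them and picking the element with index \(0.4\times5=2\):

## The 0.4 quantile of these 5 values is element 2 of the sorted
## list (1, 3, 5, 7, 9), so this prints `3':
$ printf "5\n3\n9\n1\n7\n" | sort -g | awk 'NR==2'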

--smoothwidth=INT

Width of flat kernel used to smooth the interpolated quantile thresholds, see --qthresh for more.

--checkqthresh

Check the quantile threshold values on the mesh grid. A file suffixed with _qthresh.fits will be created showing each step. With this option, NoiseChisel will abort as soon as quantile estimation has been completed, allowing you to inspect the steps leading to the final quantile threshold; this behavior can be disabled with --continueaftercheck. By default the output will have the same pixel size as the input, but with the --oneelempertile option, only one pixel will be used for each tile (see Processing options).

-e INT
--erode=INT

The number of erosions to apply to the binary thresholded image. Erosion is simply the process of flipping (from 1 to 0) any of the foreground pixels that neighbor a background pixel. In a 2D image, there are two kinds of neighbors, 4-connected and 8-connected neighbors. You can specify which type of neighbors should be used for erosion with the --erodengb option, see below.

Erosion has the effect of shrinking the foreground pixels. To put it another way, it expands the holes. This is a founding principle in NoiseChisel: it exploits the fact that with very low thresholds, the holes in the very low surface brightness regions of an image will be smaller than the holes in regions that have no signal. Therefore by expanding those holes, we are able to separate the regions harboring signal.

--erodengb=INT

The type of neighborhood (structuring element) used in erosion, see --erode for an explanation on erosion. Only two integer values are acceptable: 4 or 8. In 4-connectivity, the neighbors of a pixel are defined as the four pixels on the top, bottom, right and left of a pixel that share an edge with it. The 8-connected neighbors on the other hand include the 4-connected neighbors along with the other 4 pixels that share a corner with this pixel. See Figure 6 (a) and (b) in Akhlaghi and Ichikawa (2015) for a demonstration.

--noerodequant

Pure erosion is going to carve off sharp and small objects completely out of the detected regions. This option can be used to avoid missing such sharp and small objects (which have significant pixels, but not over a large area). All pixels with a value larger than the significance level specified by this option will not be eroded during the erosion step above. However, they will undergo the erosion and dilation of the opening step below.

Like the --qthresh option, the significance level is determined using the quantile (a value between 0 and 1). Just as a reminder, in the normal distribution, \(1\sigma\), \(1.5\sigma\), and \(2\sigma\) are approximately on the 0.84, 0.93, and 0.98 quantiles.

-p INT
--opening=INT

Depth of opening to be applied to the eroded binary image. Opening is a composite operation: when opening a binary image with a depth of \(n\), \(n\) erosions (explained in --erode) are followed by \(n\) dilations. Simply put, dilation is the inverse of erosion: when dilating an image, any background pixel that neighbors a foreground pixel is flipped (from 0 to 1) to become a foreground pixel. Dilation therefore has the effect of fattening the foreground. Note that in NoiseChisel, the erosion which is part of opening is independent of the initial erosion that is done on the thresholded image (explained in --erode). The structuring element for the opening can be specified with the --openingngb option. Opening has the effect of removing the thin foreground connections (mostly noise) between separate foreground ‘islands’ (detections), thereby completely isolating them. Once opening is complete, we have the initial detections.

--openingngb=INT

The structuring element used for opening, see --erodengb for more information about a structuring element.

-s FLT,FLT
--sigmaclip=FLT,FLT

The \(\sigma\)-clipping parameters, see Sigma clipping. This option takes two values which are separated by a comma (,). Each value can either be written as a single number or as a fraction of two numbers (for example 3,1/10). The first value to this option is the multiple of \(\sigma\) that will be clipped (\(\alpha\) in that section). The second value is the exit criteria. If it is less than 1, then it is interpreted as tolerance and if it is larger than one it is assumed to be the fixed number of iterations. Hence, in the latter case the value must be an integer.
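For example, a sketch of the two termination modes (the values are only for illustration):

## Clip at 3 sigma, stop at a tolerance of 0.1:
$ astnoisechisel input.fits --sigmaclip=3,0.1

## Clip at 3 sigma for exactly 5 iterations:
$ astnoisechisel input.fits --sigmaclip=3,5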

--checkdetsky

Check the initial approximation of the sky value and its standard deviation in a FITS file ending with _detsky.fits. With this option, NoiseChisel will abort as soon as the sky value used for defining pseudo-detections is complete. This allows you to inspect the steps leading to this initial Sky estimation; this behavior can be disabled with --continueaftercheck. By default the output will have the same pixel size as the input, but with the --oneelempertile option, only one pixel will be used for each tile (see Processing options).

-R FLT
--dthresh=FLT

The detection threshold: a multiple of the initial sky standard deviation added with the initial sky approximation (which you can inspect with --checkdetsky). This flux threshold is applied to the initially undetected regions on the un-convolved image. The background pixels that are completely engulfed in a 4-connected foreground region are converted to foreground (holes are filled) and one opening (depth of 1) is applied over both the initially detected and undetected regions. The Signal to noise ratio of the resulting ‘pseudo-detections’ is used to identify true vs. false detections. See Section 3.1.5 and Figure 7 in Akhlaghi and Ichikawa (2015) for a very complete explanation.

-i INT
--detsnminarea=INT

The minimum area to calculate the Signal to noise ratio on the pseudo-detections of both the initially detected and undetected regions. When the area of a pseudo-detection is too small, the Signal to noise ratio measurements will not be accurate and their distribution will be heavily skewed to the positive. So it is best to ignore any pseudo-detection that is smaller than this area. Use --detsnhistnbins to check if this value is reasonable or not.

--checkdetsn

Save the S/N values of the pseudo-detections and dilated detections into three files ending with _detsn_sky.XXX, _detsn_det.XXX, and _detsn_dilated.XXX. The .XXX is determined from the --tableformat option (see Input/Output options, for example .txt or .fits). You can use these to inspect the S/N values and their distribution (in combination with the --checkdetection option to see where the pseudo-detections are). You can use Gnuastro’s Statistics to make a histogram of the distribution, or do any other analysis you would like, to better understand it.

With this option, NoiseChisel will abort as soon as the tables are created. This allows you to inspect the steps leading to the final S/N threshold; this behavior (to abort NoiseChisel) can be disabled with --continueaftercheck.

-c FLT
--detquant=FLT

The quantile of the Signal to noise ratio distribution of the pseudo-detections in each mesh to use for filling the large mesh grid. Note that this is only calculated for the large mesh grids that satisfy the minimum fraction of undetected pixels (value of --minbfrac) and minimum number of pseudo-detections (value of --minnumfalse).

-d INT
--dilate=INT

Number of times to dilate the final true detections. See the explanations in --opening for more information on dilation. The structuring element for this final dilation is fixed to an 8-connected neighborhood. This is because astronomical objects, except cosmic rays, never have a clear cutoff, so all 8 pixels connected to the border pixels of a detection might harbor data.

--dilatengb=INT

The connectivity used for the final dilation, see --erodengb for more information about connectivity or a structuring element.

--checkdetection

Every step of the detection process will be added as an extension to a file with the suffix _det.fits. Going through each would just be a repeat of the explanations above and also of those in Akhlaghi and Ichikawa (2015). The extension label should be sufficient to recognize which step you are observing. Viewing all the steps can be the best guide in choosing the best set of parameters. With this option, NoiseChisel will abort as soon as a snapshot of the whole detection process is saved. This behavior can be disabled with --continueaftercheck.

--checksky

Check the derivation of the final sky and its standard deviation values on the mesh grid. With this option, NoiseChisel will abort as soon as the sky value is estimated over the image (on each tile). This behavior can be disabled with --continueaftercheck. By default the output will have the same pixel size as the input, but with the --oneelempertile option, only one pixel will be used for each tile (see Processing options).


Next: , Previous: , Up: Invoking astnoisechisel   [Contents][Index]

7.2.2.3 Segmentation options

Segmentation is the process of (possibly) breaking up a detection into multiple segments (technically called objects and clumps in NoiseChisel). In deep surveys segmentation becomes particularly important because we will be detecting more diffuse flux, so galaxy images are going to overlap more. It is thus very important to be able to separate the pixels within a detection.

In NoiseChisel, segmentation is done by first finding the ‘true’ clumps over a detection and then expanding those clumps to a certain flux limit. True clumps are found in a process very similar to the true detections explained in Detection options, see Akhlaghi and Ichikawa [2015] for more information. If the connections between the grown clumps are weaker than a given threshold, the grown clumps are considered to be separate objects.

-m INT
--segsnminarea=INT

The minimum area which a clump in the undetected regions should have in order to be considered in the clump Signal to noise ratio measurement. If this size is set to a small value, the Signal to noise ratio of false clumps will not be accurately found. It is recommended that this value be larger than the value to --detsnminarea, because the clumps are found on the convolved (smoothed) image while the pseudo-detections are found on the input image. You can use --checkclumpsn and --checksegmentation to see if your chosen value is reasonable or not.

--checkclumpsn

Save the S/N values of the clumps into two files ending with _clumpsn_sky.XXX and _clumpsn_det.XXX. The .XXX is determined from the --tableformat option (see Input/Output options, for example .txt or .fits). You can use these to inspect the S/N values and their distribution (in combination with the --checksegmentation option to see where the clumps are). You can use Gnuastro’s Statistics to make a histogram of the distribution (ready for plotting in a text file, or a crude ASCII-art demonstration on the command-line).

With this option, NoiseChisel will abort as soon as the two tables are created. This allows you to inspect the steps leading to the final S/N quantile threshold; this behavior can be disabled with --continueaftercheck.

-g FLT
--segquant=FLT

The quantile of the noise clump Signal to noise ratio distribution. This value is used to identify true clumps over the detected regions. You can get the full distribution of clumps S/Ns over the undetected areas with the --checkclumpsn option and see them with --checksegmentation.

-v
--keepmaxnearriver

Keep a clump whose maximum flux is 8-connected to a river pixel. By default such clumps over detections are considered to be noise and are removed irrespective of their brightness (see Flux Brightness and magnitude). Over large profiles that sink into the noise very slowly, noise can cause part of the profile (which was flat without noise) to become a very large clump with a very high Signal to noise ratio. In such cases, the pixel with the maximum flux in the clump will be immediately touching a river pixel.

-G FLT
--gthresh=FLT

Threshold (multiple of the sky standard deviation added with the sky) to stop growing true clumps. Once true clumps are found, they are set as the basis to segment the detected region. They are grown until the threshold specified by this option.

-y INT
--minriverlength=INT

The minimum length of a river between two grown clumps for it to be considered in Signal to noise ratio estimations. Similar to --segsnminarea and --detsnminarea, if the length of the river is too short, the Signal to noise ratio can be noisy and unreliable. Any existing rivers shorter than this length will be considered as non-existent, independent of their Signal to noise ratio. Since the clumps are grown on the input image, this value should best be similar to the value of --detsnminarea. Recall that the clumps were defined on the convolved image so --segsnminarea was larger than --detsnminarea.

-O FLT
--objbordersn=FLT

The maximum Signal to noise ratio of the rivers between two grown clumps in order to consider them as separate ‘objects’. If the Signal to noise ratio of the river between two grown clumps is larger than this value, they are defined to be part of one ‘object’. Note that the physical reality of these ‘objects’ can never be established with one image, or even multiple images from one broad-band filter. Any method we devise to define ‘object’s over a detected region is ultimately subjective.

Two very distant galaxies or satellites in one halo might lie in the same line of sight and be detected as clumps on one detection. On the other hand, the connection (through a spiral arm or tidal tail for example) between two parts of one galaxy might have such a low surface brightness that they are broken up into multiple detections or objects. In fact, you may have noticed that exactly for this reason, this is the only Signal to noise ratio that the user directly gives to NoiseChisel: the ‘true’ detections and clumps can be objectively identified from the noise characteristics of the image, so you don’t have to provide any Signal to noise ratio by hand.

--checksegmentation

A file with the suffix _seg.fits will be created. This file keeps all the relevant steps in finding true clumps and segmenting the detections into multiple objects in various extensions. Having read the paper or the steps above, examining this file can be an excellent guide in choosing the best set of parameters. Note that calling this option will significantly slow NoiseChisel. In verbose mode (without the --quiet option, see Operating mode options) the important steps (along with their extension names) will also be reported.

With this option, NoiseChisel will abort as soon as this file is created. This behavior can be disabled with --continueaftercheck.


Previous: , Up: Invoking astnoisechisel   [Contents][Index]

7.2.2.4 NoiseChisel output

The default name and directory of the outputs are explained in Automatic output. NoiseChisel’s default output (when none of the options starting with --check or the --output option are called) is one file ending with _labeled.fits. This file has the extensions listed below:

  1. A copy of the input image.
  2. The object/detection labels. Each pixel in the input image is given a label in this extension, the labels start from one. If the --onlydetection option is given, each large connected part of the image has one label. Without that option, this extension is going to show the labels of the objects that are found after segmentation. The total number of labels is stored as the value to the NOBJS/NDETS keyword in the header of this extension. This number is also printed in verbose mode.
  3. The clump labels when --onlydetection is not called. All the pixels in the input image that belong to a true clump are given a positive label in this extension. The detected regions that were not a clump are given a negative value to clearly identify the sky noise from the diffuse detections. The total number of clumps in this image is stored in the NCLUMPS keyword of this extension and printed in verbose output.

    If the --grownclumps option is called, or a value of 1 is given to it in any of the configuration files, then instead of the original clump regions, the grown clumps will be stored in this extension. Note that if there is only one clump (or no clumps) over a detected region, then the whole detected region is given a label of 1.

  4. The final sky value on each pixel. See Sky value for a complete explanation.
  5. Similar to the previous extension, but for the Sky standard deviation on each pixel.

To inspect NoiseChisel’s output, you can configure SAO DS9 in your Graphical User Interface (GUI) to open NoiseChisel’s output as a multi-extension data cube. This will allow you to flip through the different extensions and visually inspect the results. This process has been described for the GNOME GUI (the most common GUI in GNU/Linux operating systems) in Viewing multiextension FITS images.
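For a quick look directly from the command-line, assuming DS9 is installed, something like the following should also work (-mecube is DS9’s option to load a multi-extension file as a data cube; the file name is hypothetical):

$ ds9 -mecube input_labeled.fits -zscale -zoom to fit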


Previous: , Up: Data analysis   [Contents][Index]

7.3 MakeCatalog

At the lowest level, a dataset (for example an image) is just a collection of values, placed after each other in any number of dimensions (for example an image is a 2D dataset). Each data-element (pixel) just has two properties: its position (relative to the rest) and its value. The entire input dataset (a large image for example) is rarely treated as a singular entity for higher-level analysis94. You want to know the properties of the scientifically interesting targets that are embedded in it, for example the magnitudes, positions and elliptical properties of the galaxies that are in the image. MakeCatalog is Gnuastro’s program to derive higher-level information for pre-defined regions of a dataset. The role of MakeCatalog in a scientific analysis and the benefits of this model of data-analysis (where detection/identification is separated from measurement) are discussed in Akhlaghi [2016]. We strongly recommend reading this short paper to better understand this methodology and thus use MakeCatalog most effectively. However, that paper cannot undergo any more change, so this manual is the definitive guide.

As discussed above, you have to define the regions of a dataset that you are interested in before running MakeCatalog. MakeCatalog currently uses labeled dataset(s) for this job. A labeled dataset for a given input dataset has the same size/dimensions as the input, but its pixels have an integer type (see Numeric data types)95: all pixels with the same label (integers larger than or equal to one) are used to generate the requested output columns of MakeCatalog for the row of their labeled value. For example, the flux weighted average position of all the pixels with a label of 42 will be considered as the central position96 of the 42nd row of the output catalog. Pixels with labels equal to or smaller than zero will be ignored by MakeCatalog. In other words, the number of rows of the output catalog will be determined from the labeled image.

The labeled image may be created with any tool97. Within Gnuastro you can use these two solutions depending on your a-priori/parametric knowledge of the targets you want to study: NoiseChisel (see NoiseChisel), when the targets must be detected automatically from the noise, or MakeProfiles (see MakeProfiles), when you have a-priori knowledge of the target positions and footprints (for example in aperture photometry).


Next: , Previous: , Up: MakeCatalog   [Contents][Index]

7.3.1 Detection and catalog production

As discussed above (MakeCatalog), NoiseChisel (Gnuastro’s signal detection tool, see NoiseChisel) does not produce any catalog of the detected objects. However, most other common tools in astronomical data-analysis (for example SExtractor98) merge the two processes into one. Gnuastro’s modularized methodology is therefore new to many experienced astronomers and deserves a short review here. Further discussion on the benefits of this methodology can be seen in Akhlaghi [2016].

To simplify catalog production from a raw input image in Gnuastro, NoiseChisel’s output (see NoiseChisel output) can be directly fed into MakeCatalog. This is good when no further customization is necessary and you want a fast/simple result. But the modular approach taken by Gnuastro has many benefits that will become more apparent as you get more experienced in astronomical data analysis and want to be more creative in using your valuable data for the exciting scientific project you are working on. In short, the reasons for this modularity can be classified as below:


Next: , Previous: , Up: MakeCatalog   [Contents][Index]

7.3.2 Quantifying measurement limits

No measurement on a real dataset can be perfect: you can only reach a certain level/limit of accuracy. Therefore, a meaningful (scientific) analysis requires an understanding of these limits for the dataset and your analysis tools: different datasets (images in the case of MakeCatalog) have different noise properties, and different detection methods (one method/algorithm/software that is run with a different set of parameters is considered as a different detection method) will have different abilities to detect or measure certain kinds of signal (astronomical objects) and their properties in an image. Hence, quantifying the detection and measurement limitations with a particular dataset and analysis tool is the most crucial/critical aspect of any high-level analysis.

In this section we discuss some of the most general limits that are very important in any astronomical data analysis and how MakeCatalog makes it easy to find them. Depending on the higher-level analysis, there are more tests that must be done, but these are usually necessary in any case. In astronomy, it is common to use the magnitude (a unit-less scale) and physical units, see Flux Brightness and magnitude. Therefore all the measurements discussed here are defined in units of magnitudes.

Surface brightness limit (of whole dataset)

As we make more observations on one region of the sky, and add the observations into one dataset, we are able to decrease the standard deviation of the noise in each pixel99. Qualitatively, this decrease manifests itself by making fainter (per pixel) parts of the objects in the image more visible. Technically, brightness per pixel (or per unit area) is known as surface brightness. Quantitatively, more data increase the Signal to noise ratio, since the signal increases faster than the noise. It is very important to have in mind that here, noise is defined per pixel (or in the units of our data measurement), not per object.

You can think of the noise as muddy water that is completely covering a flat ground100 with some regions higher than the others101 in it. In this analogy, height (from the ground) is surface brightness. Let’s assume that in your first observation the muddy water has just been stirred and you can’t see anything through it. As you wait and make more observations, the mud settles down and the depth of the transparent water increases, making the summits of the hills visible. As the depth of clear water increases, the parts of the hills with lower heights (lower surface brightness) can be seen more clearly.

The outputs of NoiseChisel include the Sky standard deviation (\(\sigma\)) on every group of pixels (a mesh) that were calculated from the undetected pixels in that mesh, see Tessellation and NoiseChisel output. Let’s take \(\sigma_m\) as the median \(\sigma\) over the successful meshes in the image (prior to interpolation or smoothing).

On different instruments pixels have different physical sizes (for example in micro-meters, or spatial angle over the sky), nevertheless, a pixel is our unit of data collection. In other words, while quantifying the noise, the physical or projected size of the pixels is irrelevant. We thus define the Surface brightness limit or depth, in units of magnitude/pixel, of a data-set, with zeropoint magnitude \(z\), with the \(n\)th multiple of \(\sigma_m\) as (see Flux Brightness and magnitude):

$$SB_{\rm Pixel}=-2.5\times\log_{10}{(n\sigma_m)}+z$$
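As a quick sketch of this calculation (the values \(n=1\), \(\sigma_m=0.0002\) in image units and \(z=25.68\) are purely hypothetical; AWK’s log() is the natural logarithm, so \(\log_{10}(x)\) is log(x)/log(10)):

## Prints the surface brightness limit in magnitudes/pixel:
$ echo "1 0.0002 25.68" | awk '{print -2.5*log($1*$2)/log(10)+$3}'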

As an example, the XDF survey covers part of the sky that the Hubble space telescope has observed the most (for 85 orbits) and is consequently very small (\(\sim4\) arcmin\(^2\)). On the other hand, the CANDELS survey, is one of the widest multi-color surveys covering several fields (about 720 arcmin\(^2\)) but its deepest fields have only 9 orbits observation. The depth of the XDF and CANDELS-deep surveys in the near infrared WFC3/F160W filter are respectively 34.40 and 32.45 magnitudes/pixel. In a single orbit image, this same field has a depth of 31.32. Recall that a larger magnitude corresponds to less brightness.

The low-level magnitude/pixel measurement above is only useful when all the datasets you want to use belong to one instrument (telescope and camera). However, you will often find yourself using datasets from various instruments with different pixel scales (projected pixel sizes). If we know the pixel scale, we can obtain a more easily comparable surface brightness limit in units of: magnitude/arcsec\(^2\). Let’s assume that the dataset has a zeropoint value of \(z\), and every pixel is \(p\) arcsec\(^2\) (so \(A/p\) is the number of pixels that cover an area of \(A\) arcsec\(^2\)). If the \(n\)th multiple of \(\sigma_m\) is desired, then the surface brightness (in units of magnitudes per A arcsec\(^2\)) is102:

$$SB_{\rm Projected}=-2.5\times\log_{10}{\left(n\sigma_m\sqrt{A\over p}\right)}+z$$

Note that this is an extrapolation of the actually measured value of \(\sigma_m\) (which was per pixel). So it should be used with extreme care (for example the dataset must have an approximately flat depth). For each detection over the dataset, you can estimate an upper-limit magnitude which actually uses the detection’s area/footprint. It doesn’t extrapolate and even accounts for correlated noise features. Therefore, the upper-limit magnitude is a much better measure of your dataset’s surface brightness limit for each particular object.

MakeCatalog will calculate the input dataset’s \(SB_{\rm Pixel}\) and \(SB_{\rm Projected}\) and write them as comments/meta-data in the output catalog(s). Just note that \(SB_{\rm Projected}\) is only calculated if the input has World Coordinate System (WCS).
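Continuing the hypothetical sketch above, the projected limit over \(A=1\) arcsec\(^2\) for a pixel area of \(p=0.01\) arcsec\(^2\) would be:

## Prints the surface brightness limit in magnitudes/arcsec^2:
$ echo "1 0.0002 25.68 1 0.01" \
      | awk '{print -2.5*log($1*$2*sqrt($4/$5))/log(10)+$3}'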

Completeness limit (of each detection)

As the surface brightness of the objects decreases, the ability to detect them will also decrease. An important statistic is thus the fraction of objects of similar morphology and brightness that will be identified with our detection algorithm/parameters in the given image. This fraction is known as completeness. For brighter objects, completeness is 1: all bright objects that might exist over the image will be detected. However, as we go to objects of lower surface brightness, we fail to detect some, and gradually we are not able to detect anything any more. For a given profile, the magnitude where the completeness drops below a certain level (usually above \(90\%\)) is known as the completeness limit.

Another important parameter in measuring completeness is purity: the fraction of true detections to all detections. In effect purity is the measure of contamination by false detections: the higher the purity, the lower the contamination. Completeness and purity are anti-correlated: if we can allow a large number of false detections (that we might be able to remove by other means), we can significantly increase the completeness limit.

One traditional way to measure the completeness and purity of a given sample is by embedding mock profiles in regions of the image with no detection. However, in such a study we must be very careful to choose model profiles that are as similar to the target of interest as possible.

Magnitude measurement error (of each detection)

Any measurement has an error and this includes the derived magnitude for an object. Note that this value is only meaningful when the object’s magnitude is brighter than the upper-limit magnitude (see the next items in this list). As discussed in Flux Brightness and magnitude, the magnitude (\(M\)) of an object with brightness \(B\) and Zeropoint magnitude \(z\) can be written as:

$$M=-2.5\log_{10}(B)+z$$

Calculating the derivative with respect to \(B\), we get:

$${dM\over dB} = {-2.5\over {B\times\ln(10)}}$$

From the Taylor series (\(\Delta{M}=dM/dB\times\Delta{B}\)), we can write:

$$\Delta{M} = \left|{-2.5\over \ln(10)}\right|\times{\Delta{B}\over{B}}$$

But, \(\Delta{B}/B\) is just the inverse of the Signal-to-noise ratio (\(S/N\)), so we can write the error in magnitude in terms of the signal-to-noise ratio:

$$\Delta{M} = {2.5\over{S/N\times\ln(10)}}$$

MakeCatalog uses this relation to estimate the magnitude errors. The signal-to-noise ratio is calculated in different ways for clumps and objects (see Akhlaghi and Ichikawa [2015]), but this single equation can be used to estimate the measured magnitude error afterwards for any type of target.
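For example, a detection with \(S/N=5\) has a magnitude error of \(2.5/(5\ln{10})\approx0.217\) magnitudes, as this small sketch shows:

## Prints the magnitude error for S/N=5 (roughly 0.217):
$ echo 5 | awk '{print 2.5/($1*log(10))}'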

Upper limit magnitude (of each detection)

Due to the noisy nature of data, it is possible to get arbitrarily low values for a faint object’s brightness (or arbitrarily high magnitudes). Given the scatter caused by the noise, such small values are meaningless: another similar depth observation will give a radically different value. This problem is most common when you use one image/filter to generate target labels (which specify which pixels belong to which object, see NoiseChisel output and MakeCatalog) and another image/filter to generate a catalog for measuring colors.

The object might not be visible in the filter used for the latter image, or the image depth (see above) might be much shallower. So you will get unreasonably faint magnitudes. For example when the depth of the image is 32 magnitudes, a measurement that gives a magnitude of 36 for a \(\sim100\) pixel object is clearly unreliable. In another similar depth image, we might measure a magnitude of 30 for it, and yet another might give 33. Furthermore, due to the noise scatter so close to the depth of the data-set, the total brightness might actually get measured as a negative value, so no magnitude can be defined (recall that a magnitude is a base-10 logarithm).

Using such unreliable measurements will directly affect our analysis, so we must not use them. However, all is not lost! Given our limited depth, there is one thing we can deduce about the object’s magnitude: we can say that if something actually exists here (possibly buried deep under the noise), it must have a magnitude that is fainter than an upper limit magnitude. To find this upper limit magnitude, we place the object’s footprint (segmentation map) over random parts of the image where there are no detections, so we only have pure (possibly correlated) noise and undetected objects. Doing this a large number of times will give us a distribution of brightness values. The standard deviation (\(\sigma\)) of that distribution can be used to quantify the upper limit magnitude.

Traditionally, faint/small object photometry was done using fixed circular apertures (for example with a diameter of \(N\) arc-seconds). In this way, the upper limit was like the depth discussed above: one value for the whole image. But with the much more advanced hardware and software of today, we can make customized segmentation maps for each object. The number of pixels (the area of the object) used directly affects the final distribution and thus the magnitude. Also, the image’s correlated noise might actually create certain patterns, so the shape of the object can also affect the result. So in MakeCatalog, the upper limit magnitude is found for each object in the image separately, not as one value for the whole image.


Next: , Previous: , Up: MakeCatalog   [Contents][Index]

7.3.3 Measuring elliptical parameters

The shape or morphology of a target is one of its most commonly desired parameters. Here, we will review how the most basic/simple morphological parameters are estimated: the elliptical parameters for a set of labeled pixels. The elliptical parameters are: the (semi-)major axis, the (semi-)minor axis and the position angle, along with the central position of the profile. The derivations below follow the SExtractor manual derivations with some added explanations for easier reading.

Let’s begin with one dimension for simplicity: Assume we have a set of \(N\) values \(B_i\) (keeping the spatial distribution of brightness for example), each at position \(x_i\). The simplest parameter we can define is the geometric center of the object (\(x_g\)) (ignoring the brightness values): \(x_g=(\sum_ix_i)/N\). Moments are defined to incorporate both the value (brightness) and position of the data. The first moment can be written as:

$$\overline{x}={\sum_iB_ix_i \over \sum_iB_i}$$

This is essentially the weighted (by \(B_i\)) mean position. The geometric center (\(x_g\), defined above) is a special case of this with all \(B_i=1\). The second moment is essentially the variance of the distribution:

$$\overline{x^2}\equiv{\sum_iB_i(x_i-\overline{x})^2 \over \sum_iB_i} = {\sum_iB_ix_i^2 \over \sum_iB_i} - 2\overline{x}{\sum_iB_ix_i\over\sum_iB_i} + \overline{x}^2 ={\sum_iB_ix_i^2 \over \sum_iB_i} - \overline{x}^2$$

The last step was done from the definition of \(\overline{x}\). Hence, the square root of \(\overline{x^2}\) is the spatial standard deviation (along this one dimension) of this particular brightness distribution (\(B_i\)). Crudely (or qualitatively), you can think of its square root as the distance (from \(\overline{x}\)) which contains a specific amount of the flux (depending on the \(B_i\) distribution). Similar to the first moment, the geometric second moment can be found by setting all \(B_i=1\). So while the first moment quantified the position of the brightness distribution, the second moment quantifies how that brightness is dispersed about the first moment. In other words, it quantifies how “sharp” the object’s image is.

Before continuing to two dimensions and the derivation of the elliptical parameters, let’s pause for an important implementation technicality. You can ignore this paragraph if you don’t want to implement these concepts. The basic definition (first fraction for \(\overline{x^2}\)) can be used without any major problem. However, using this fraction requires two runs over the data: one run to find \(\overline{x}\) and one run to find \(\overline{x^2}\); this can be slow. Using the last fraction above, we can estimate both the first and second moments in one run (since the \(-\overline{x}^2\) term can easily be added later). The logarithmic nature of floating point number digitization creates a complication in this approach: suppose the object is located between pixels 10000 and 10020. Hence the target’s pixels are only distributed over 20 pixels (with a standard deviation \(<20\)), while the mean has a value of \(\sim10000\). The \(\sum_iB_ix_i^2\) will go to very large values while the individual pixel differences will be much smaller; this will lower the accuracy of our calculation due to the limited accuracy of floating point operations. The variance only depends on the distance of each point from the mean, so we can shift all positions by a constant/arbitrary \(K\) which is much closer to the mean: \(\overline{x-K}=\overline{x}-K\). Hence we can calculate the second order moment using:

$$\overline{x^2}={\sum_iB_i(x_i-K)^2 \over \sum_iB_i} - (\overline{x}-K)^2 $$

The closer \(K\) is to \(\overline{x}\), the better (the sums of squares will involve smaller numbers); as long as \(K\) is within the object limits (in the example above: \(10000\leq{K}\leq10020\)), the floating point error induced in our calculation will be negligible. For the simplest implementation, MakeCatalog takes \(K\) to be the smallest position of the object in each dimension. Since \(K\) is arbitrary and an implementation/technical detail, we will ignore it for the remainder of this discussion.

In two dimensions, the mean and variances can be written as:

$$\overline{x}={\sum_iB_ix_i\over \sum_iB_i}, \quad \overline{x^2}={\sum_iB_ix_i^2 \over \sum_iB_i} - \overline{x}^2$$ $$\overline{y}={\sum_iB_iy_i\over \sum_iB_i}, \quad \overline{y^2}={\sum_iB_iy_i^2 \over \sum_iB_i} - \overline{y}^2$$ $$\overline{xy}={\sum_iB_ix_iy_i \over \sum_iB_i} - \overline{x}\times\overline{y}$$

If an elliptical profile’s major axis exactly lies along the \(x\) axis, then \(\overline{x^2}\) will be directly proportional to the profile’s major axis, \(\overline{y^2}\) to its minor axis, and \(\overline{xy}=0\). However, in reality we are not that lucky and (assuming galaxies can be parameterized as an ellipse) the major axis of galaxies can be in any direction on the image (in fact this is one of the core principles behind weak-lensing shear estimation). So the purpose of the remainder of this section is to define a strategy to measure the position angle and axis ratio of some randomly positioned ellipses in an image, using the raw second moments that we have calculated above in our image coordinates.

Let’s assume we have rotated the coordinate axes by \(\theta\); the new second order moments are:

$$\overline{x_\theta^2} = \overline{x^2}\cos^2\theta + \overline{y^2}\sin^2\theta + 2\overline{xy}\cos\theta\sin\theta $$ $$\overline{y_\theta^2} = \overline{x^2}\sin^2\theta + \overline{y^2}\cos^2\theta - 2\overline{xy}\cos\theta\sin\theta$$ $$\overline{xy_\theta} = (\overline{y^2}-\overline{x^2})\cos\theta\sin\theta + \overline{xy}(\cos^2\theta-\sin^2\theta)$$

The best \(\theta\) (\(\theta_0\), where major axis lies along the \(x_\theta\) axis) can be found by:

$$\left.{\partial \overline{x_\theta^2} \over \partial \theta}\right|_{\theta_0}=0$$ Taking the derivative, we get: $$2\cos\theta_0\sin\theta_0(\overline{y^2}-\overline{x^2}) + 2(\cos^2\theta_0-\sin^2\theta_0)\overline{xy}=0$$ When \(\overline{x^2}\neq\overline{y^2}\), we can write: $$\tan2\theta_0 = 2{\overline{xy} \over \overline{x^2}-\overline{y^2}}.$$

MakeCatalog uses the standard C math library’s atan2 function to estimate \(\theta_0\), which we define as the position angle of the ellipse. To recall, this is the angle of the major axis of the ellipse with the \(x\) axis. By definition, when the elliptical profile is rotated by \(\theta_0\), then \(\overline{xy_{\theta_0}}=0\), \(\overline{x_{\theta_0}^2}\) will be the extent of the maximum variance and \(\overline{y_{\theta_0}^2}\) the extent of the minimum variance (which are perpendicular for an ellipse). Replacing \(\theta_0\) in the equations above for \(\overline{x_\theta^2}\) and \(\overline{y_\theta^2}\), we can get the semi-major (\(A\)) and semi-minor (\(B\)) lengths:

$$A^2\equiv\overline{x_{\theta_0}^2}= {\overline{x^2} + \overline{y^2} \over 2} + \sqrt{\left({\overline{x^2}-\overline{y^2} \over 2}\right)^2 + \overline{xy}^2}$$

$$B^2\equiv\overline{y_{\theta_0}^2}= {\overline{x^2} + \overline{y^2} \over 2} - \sqrt{\left({\overline{x^2}-\overline{y^2} \over 2}\right)^2 + \overline{xy}^2}$$

As a summary, it is important to remember that the units of \(A\) and \(B\) are in pixels (the standard deviation of a positional distribution) and that they represent the spatial light distribution of the object in both image dimensions (rotated by \(\theta_0\)). When the object cannot be represented as an ellipse, this interpretation breaks down: \(\overline{xy_{\theta_0}}\neq0\) and \(\overline{y_{\theta_0}^2}\) will not be the direction of minimum variance.
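As a rough numerical check of the relations above (the raw second moment values below are purely hypothetical; AWK’s atan2 is used just as described for \(\theta_0\), and atan2(0,-1) gives \(\pi\)):

## Reads xx, yy and xy moments; prints theta0 (degrees), A and B:
$ echo "2.5 1.2 0.8" | awk '{x2=$1; y2=$2; xy=$3;
      t=0.5*atan2(2*xy, x2-y2);
      m=(x2+y2)/2; r=sqrt(((x2-y2)/2)^2+xy^2);
      printf "theta0=%.2f deg, A=%.3f, B=%.3f\n",
             t*180/atan2(0,-1), sqrt(m+r), sqrt(m-r)}'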


Next: , Previous: , Up: MakeCatalog   [Contents][Index]

7.3.4 Adding new columns to MakeCatalog

MakeCatalog is designed to allow easy addition of different measurements over a labeled image (see Akhlaghi [2016]). A check-list style description of the necessary steps to do that is given in this section. The common development characteristics of MakeCatalog and other Gnuastro programs are explained in Developing. We strongly encourage you to have a look at that chapter to greatly simplify your navigation in the code. After adding and testing your column, you are most welcome (and encouraged) to share it with us so we can add it to the next release of Gnuastro for everyone else to also benefit from your efforts.

MakeCatalog will first pass over each label’s pixels two times and do the necessary raw/internal calculations. Once the passes are done, it will use the raw information to fill the final catalog’s columns. In the first pass it will gather mainly object information and in the second run, it will mainly focus on the clumps, or any other measurement that needs an output from the first pass. These two passes are designed to be raw summations: no extra processing. This allows parallel processing and simplicity/clarity. So if your new calculation needs new raw information from the pixels, you will also need to modify the respective mkcatalog_first_pass and mkcatalog_second_pass functions (both in bin/mkcatalog/mkcatalog.c) and define new raw table columns in main.h (hopefully the comments in the code are clear enough).

In all these different places, the final columns are sorted in the same order (the same order as Invoking MakeCatalog). This allows a particular column/option to be easily found in all steps. Therefore, when adding your new option, be sure to keep it in the same relative place in the list in all the separate places (it doesn’t necessarily have to be at the end), and near conceptually similar options.

main.h

The objectcols and clumpcols enumerated variables (enum) define the raw/internal calculation columns. If your new column requires new raw calculations, add a row to the respective list. If your calculation requires any other settings/parameters, you should add a variable to the mkcatalogparams structure.

ui.h

The option_keys_enum associates a unique value with each option to MakeCatalog. For options that have a short version, the single-character short option is used for the value. Those that don’t have a short option version get a large integer automatically. You should add a variable here to identify your desired column.

args.h

This file specifies all the parameters for the GNU C library’s Argp structure that is in charge of reading the user’s options. To define your new column, just copy an existing set of parameters and change the first, second and fifth values (the only ones that differ between all the columns); you should use the macro you defined in ui.h here.

ui.c

If your column includes any particular settings (you added a variable to the mkcatalogparams structure in main.h), you should do the sanity checks and preparations for it here. Otherwise, you can ignore this file.

columns.c

This file will contain the main definition and high-level calculation of your new column through the columns_define_alloc and columns_fill functions. In the first, you specify the basic information about the column: its name, units, comments, type (see Numeric data types) and how it should be printed if the output is a text file. You should also specify the raw/internal columns that are necessary for this column here, as the many existing examples show. Through the object and clump types, you can specify if this column is only for clumps, only for objects, or for both.

The second main function (columns_fill) writes the final value into the appropriate column for each object and clump. As you can see in the many existing examples, you can define your processing on the raw/internal calculations here and save them in the output.

mkcatalog.c

As described before, this file contains the two main MakeCatalog work-horses: mkcatalog_first_pass and mkcatalog_second_pass, their names are descriptive enough and their internals are also clear and heavily commented.


Previous: , Up: MakeCatalog   [Contents][Index]

7.3.5 Invoking MakeCatalog

MakeCatalog will make a catalog from an input image and at least one labeled image. The executable name is astmkcatalog with the following general template:

$ astmkcatalog [OPTION ...] InputImage.fits

One line examples:

## Create catalog with RA, Dec, Magnitude and Magnitude error,
## `input.fits' is NoiseChisel's output:
$ astmkcatalog --ra --dec --magnitude --magnitudeerr input.fits

## Same catalog as above (using short options):
$ astmkcatalog -rdmG input.fits

## Write the catalog to a FITS table:
$ astmkcatalog -mpQ --output=cat.fits input_labeled.fits

## Read the columns to create from `columns.conf':
$ astmkcatalog --config=columns.conf input_labeled.fits

## Use different images for the objects and clumps inputs:
$ astmkcatalog --objectsfile=K_labeled.fits --objectshdu=1    \
               --clumpsfile=K_labeled.fits --clumpshdu=2 i_band.fits

If MakeCatalog is to do processing, an input image should be provided with the recognized extensions as input data, see Arguments. The options described in this section are those that are only particular to MakeCatalog. For operations that MakeCatalog shares with other programs (mainly involving input/output or general processing steps), see Common options. Also see Common program behavior for some general characteristics of all Gnuastro programs including MakeCatalog.

MakeCatalog needs 4 (or 5) images as input. These images can be separate extensions in one file (NoiseChisel’s default output), or each can have its own file and its own extension. See NoiseChisel output for the list. The clump labels image is not mandatory (when no clump catalog is required, for example in aperture photometry). When inspecting the object labels image, MakeCatalog will look for a WCLUMPS (short for with-clumps) header keyword. If that keyword is present and has a value of yes, 1, or y (case insensitive) then a clump image must also be provided and a clump catalog will be made. When WCLUMPS isn’t present or has any other value, only an object catalog will be created and all clump related options/columns will be ignored.

For example, if you only need an object catalog from NoiseChisel’s output, you can use Gnuastro’s Fits program (see Fits) to modify or remove the WCLUMPS keyword in the objects HDU, then run MakeCatalog on it. Another example can be aperture photometry: let’s assume you have made your labeled image (defining the apertures) with MakeProfiles. Clumps are not defined in this context, so besides the input and labeled image, you only need NoiseChisel’s Sky and Sky standard deviation images (run NoiseChisel with the --onlydetection option). Since MakeProfiles’ output doesn’t contain the WCLUMPS keyword, you just have to specify your labeled image with the --objectsfile option and also set its HDU. Note that labeled images have to be an integer type. Therefore, if you are using MakeProfiles to define the apertures/labels, you can use its --type=int32 option for example, see Input/Output options and Numeric data types.
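A sketch of this aperture photometry scenario is given below. All the file names and HDU numbers are hypothetical (adjust them to your own files; you can inspect a file’s extensions with Gnuastro’s Fits program).

## `apertures.fits' is an integer-typed image from MakeProfiles and
## `img_labeled.fits' is NoiseChisel's --onlydetection output on
## `img.fits' (its last two extensions are the Sky and Sky STD):
$ astmkcatalog --objectsfile=apertures.fits --objectshdu=0      \
               --skyfile=img_labeled.fits --skyhdu=2            \
               --stdfile=img_labeled.fits --stdhdu=3            \
               --x --y --magnitude img.fits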

When a clump catalog is also desired, two catalogs will be made: one for the objects (suffixed with _o.txt or _o.fits) and another for the clumps (suffixed with _c.txt or _c.fits). Therefore, if any value is given to the --output option, MakeCatalog will replace any existing suffix in the given value with these two suffixes. If no output value is given, MakeCatalog will use the input name, see Automatic output. The format of the output table is specified with the --tableformat option, see Input/Output options.

When MakeCatalog is run on multiple threads, the clumps catalog rows will not be sorted by object, since each object is processed independently by one thread and threaded applications are asynchronous. The clumps in each object will be sorted based on their labels, but you may find lower-index objects coming after higher-index ones (especially if they have more clumps and thus take more time). If the order is very important for you, you can run the following command to sort the rows by object ID (and clump ID within each object):

$ awk '!/^#/' out_c.txt | sort -g -k1,1 -k2,2

Next: , Previous: , Up: Invoking astmkcatalog   [Contents][Index]

7.3.5.1 MakeCatalog input files

MakeCatalog needs multiple images as input: a values image, one (or two) labeled images and Sky and Sky standard deviation images. The options described in this section allow you to identify them. If you use the default output of NoiseChisel (see NoiseChisel output) you don’t have to worry about any of these options and just give NoiseChisel’s output file to MakeCatalog as described in Invoking MakeCatalog.

-O STR
--objectsfile=STR

The file name of the object labels image. If the labels image is in another extension of the input file, calling this option is not mandatory: just specify the extension/HDU with the --objectshdu option.

--objectshdu=STR

The HDU/extension of the object labels image. The object labels image has to be an integer data type (see Numeric data types) and only pixels with a value larger than zero will be used. If this extension contains the WCLUMPS keyword with a value of yes, 1, or y (not case sensitive), then MakeCatalog will also build a clumps catalog, see Invoking MakeCatalog.

-C STR
--clumpsfile=STR

Similar to --objectsfile but for the labels of the clumps. This is only necessary if the image containing the clump labels is not in the input file and the objects image has a WCLUMPS keyword, see --objectshdu.

--clumpshdu=STR

The HDU/extension of the clump labels image. The clump labels image has to be an integer data type (see Numeric data types) and only pixels with a value larger than zero will be used.

-s STR
--skyfile=STR

File name of an image keeping the Sky value for each pixel.

--skyhdu=STR

The HDU of the Sky value image.

-t STR
--stdfile=STR

File name of the image keeping the Sky value standard deviation for each pixel.

--stdhdu=STR

The HDU of the Sky value standard deviation image.


Next: , Previous: , Up: Invoking astmkcatalog   [Contents][Index]

7.3.5.2 MakeCatalog general settings

Some of the columns require particular settings (for example the zero point magnitude for measuring magnitudes); the options in this section can be used for such configurations.

-z FLT
--zeropoint=FLT

The zero point magnitude for the input image, see Flux Brightness and magnitude.

-E
--skysubtracted

If the image has already been sky subtracted by another program, then you need to notify MakeCatalog through this option. Note that this is only relevant when the Signal to noise ratio is to be calculated.

-T FLT
--threshold=FLT

For all the columns, only consider pixels that are above a given relative threshold. Symbolizing the value of this option as \(T\), the Sky value of the pixel at \((i,j)\) as \(\mu_{ij}\) and its standard deviation as \(\sigma_{ij}\), that pixel will only be used if its value (\(B_{ij}\)) satisfies this condition: \(B_{ij}>\mu_{ij}+{T}\sigma_{ij}\). The only calculation that will not be affected is the average river value (--riverave), since the rivers are used as a reference. A commented row will be added in the header of the output catalog that will print the given value; since this is a very important issue, it starts with **IMPORTANT**.

NoiseChisel will detect very diffuse signal which is useful in most cases where the aggregate properties of the detections are desired, since there is signal there (with the desired certainty). However, in some cases, only the properties of the peaks of the objects/clumps are desired, for example in attempting to separate stars from galaxies, the peaks are the major target and the diffuse regions only act to complicate the separation. With this option, MakeCatalog will simply ignore any pixel below the relative threshold.

This option is not mandatory, so if it isn’t given (after reading the command-line and all configuration files, see Configuration files), MakeCatalog will still operate. However, if it has a value in any lower-level configuration file and you want to ignore that value for this particular run or in a higher-level configuration file, then set it to NaN, for example --threshold=nan. Gnuastro uses the C library’s strtod function to read floats, which is not case-sensitive in reading NaN values. But to be consistent, it is good practice to only use nan.
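For example, assuming a hypothetical lower-level configuration file has set a threshold that you don’t want in this particular run:

$ astmkcatalog --threshold=nan --magnitude input_labeled.fits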

--nsigmag=FLT

The median standard deviation (from the standard deviation image) will be multiplied by the value given to this option and the corresponding magnitude will be reported in the comments of the output catalog. This is a per-pixel value, not per object/clump: it is not measured over an area or aperture like the \(5\sigma\) values that are commonly reported as a measure of depth, or like the upper-limit measurements (see Quantifying measurement limits).


Next: , Previous: , Up: Invoking astmkcatalog   [Contents][Index]

7.3.5.3 Upper-limit magnitude settings

The upper limit magnitude was discussed in Quantifying measurement limits. Unlike other measured values/columns in MakeCatalog, the upper limit magnitude needs several defined parameters which are discussed here. All the upper-limit-magnitude specific options start with up (for “upper-limit”), except for --envseed, which is also present in other programs and is general for any job requiring random number generation (see Generating random numbers).

One very important consideration in Gnuastro is reproducibility. Therefore, the values to all of these parameters along with others (like the random number generator type and seed) are also reported in the comments of the final catalog when the upper limit magnitude column is desired. The random seed that is used to define the random positionings for each object or clump is unique and set based on the given seed, the total number of objects and clumps and also the labels of the clumps and objects. So with identical inputs, an identical upper-limit magnitude will be found. But even if the ordering of the object/clump labels differs (and the seed is the same) the result will not be the same.

MakeCatalog will randomly place the object/clump footprint over the image; when the footprint doesn’t fall on any object or masked region (see --upmaskfile) it will be used, until the desired number (--upnum) of samples is found to estimate the distribution’s standard deviation (see Quantifying measurement limits). Otherwise, the position will be ignored and another random position generated. But when the profile is very large or the image is significantly covered by detections, it might not be possible to find the desired number of samplings: MakeCatalog will stop searching after 50 times the value given to --upnum. If --upnum good samples cannot be found by this limit, the upper-limit magnitude for that object will be set to NaN (blank).

--upmaskfile=STR

File name of mask image to use for upper-limit calculation. In some cases (especially when doing matched photometry), the object labels specified in the main input and mask image might not be adequate. In other words they do not necessarily have to cover all detected objects: the user might have selected only a few of the objects in their labeled image. This option can be used to ignore regions in the image in these situations when estimating the upper-limit magnitude. All the non-zero pixels of the image specified by this option (in the --upmaskhdu extension) will be ignored in the upper-limit magnitude measurements.

For example, when you are using labels from another image, you can give NoiseChisel’s objects image output for this image as the value to this option. In this way, you can be sure that regions with data do not harm your distribution. See Quantifying measurement limits for more on the upper limit magnitude.

--upmaskhdu=STR

The extension of the file specified by --upmaskfile.

--upnum=INT

The number of random samples to take for all the objects. A larger value to this option will give a more accurate result (asymptotically), but it will also slow down the process. When a randomly positioned sample overlaps with a detected/masked pixel it is not counted, and another random position is found until the footprint lies completely over an undetected region. So you can be sure that for each object, this many samples over undetected regions are used. See the upper limit magnitude discussion in Quantifying measurement limits for more.

--envseed

Read the random number generator type and seed value from the environment (see Generating random numbers). Random numbers are used in calculating the random positions of different samples of each object.

--upsigmaclip=FLT,FLT

The raw distribution of random values will not be used to find the upper-limit magnitude: it will first be \(\sigma\)-clipped (see Sigma clipping) to avoid outliers in the distribution (mainly the faint, undetected wings of bright/large objects in the image). This option takes two values: the first is the multiple of \(\sigma\), and the second is the termination criterion. If the latter is larger than 1, it is read as an integer number and will be the number of times to clip. If it is smaller than 1, it is interpreted as the tolerance level to stop clipping. See Sigma clipping for a complete explanation.

--upnsigma=FLT

The multiple of the final (\(\sigma\)-clipped) standard deviation (or \(\sigma\)) used to measure the upper-limit brightness or magnitude.
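
As a summary of the options above, a minimal sketch of requesting an upper-limit measurement is given below. The file names are hypothetical place-holders, but the options are those documented in this section and the output column options of the next section.

## Upper-limit magnitudes with 1000 random samples per object,
## 3-sigma clipping (tolerance 0.2) and a 5-sigma upper limit;
## ‘seg.fits’ and ‘mask.fits’ are hypothetical file names:
$ astmkcatalog --upperlimitmag --upnum=1000 --upsigmaclip=3,0.2 \
               --upnsigma=5 --upmaskfile=mask.fits seg.fits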


Previous: , Up: Invoking astmkcatalog   [Contents][Index]

7.3.5.4 MakeCatalog output columns

The final group of options particular to MakeCatalog are those that specify which columns should be written into the final output table. For each column there is an option; if it is called on the command line or in any of the configuration files, it will be included as a column in the output catalog in the same order (see Configuration file precedence). Some of the columns apply to both objects and clumps and some are particular to only one of them. The latter cases are explicitly marked with [Objects] or [Clumps] to specify the catalog they will be placed in.

-i
--ids

This is a unique option: it can add multiple columns to the final catalog(s). Calling this option will put the object IDs (--objid) in the objects catalog, and the host-object-ID (--hostobjid) and ID-in-host-object (--idinhostobj) in the clumps catalog. Hence if only an object catalog is required, it has the same effect as --objid.

--objid

[Objects] ID of this object.

-j
--hostobjid

[Clumps] The ID of the object which hosts this clump.

--idinhostobj

[Clumps] The ID of this clump in its host object.

-C
--numclumps

[Objects] The number of clumps in this object.

-a
--area

The raw area (number of pixels) of any clump or object, independent of the underlying pixel values (for example, NaN/blank or otherwise unused pixels are also counted).

--clumpsarea

[Objects] The total area of all the clumps in this object.

--weightarea

The area (number of pixels) used in the flux weighted position calculations.

-x
--x

The flux weighted center of all objects and clumps along the first FITS axis (horizontal when viewed in SAO ds9), see \(\overline{x}\) in Measuring elliptical parameters. The weight has to have a positive value (pixel value larger than the Sky value) to be meaningful! Especially when doing matched photometry, this might not happen: no pixel value might be above the Sky value. For such detections, the geometric center will be reported in this column (see --geox). You can use --weightarea to see which was used.

-y
--y

The flux weighted center of all objects and clumps along the second FITS axis (vertical when viewed in SAO ds9). See --x.

--geox

The geometric center of all objects and clumps along the first FITS axis. The geometric center is the average of the pixel positions, irrespective of their pixel values.

--geoy

The geometric center of all objects and clumps along the second FITS axis, see --geox.

--clumpsx

[Objects] The flux weighted center of all the clumps in this object along the first FITS axis. See --x.

--clumpsy

[Objects] The flux weighted center of all the clumps in this object along the second FITS axis. See --x.

--clumpsgeox

[Objects] The geometric center of all the clumps in this object along the first FITS axis. See --geox.

--clumpsgeoy

[Objects] The geometric center of all the clumps in this object along the second FITS axis. See --geox.

-r
--ra

Flux weighted right ascension of all objects or clumps, see --x.

-d
--dec

Flux weighted declination of all objects or clumps, see --x.

--geora

Geometric center right ascension of all objects or clumps, see --geox.

--geodec

Geometric center declination of all objects or clumps, see --geox.

--clumpsra

[Objects] Flux weighted right ascension of all clumps in this object, see --x.

--clumpsdec

[Objects] Flux weighted declination of all clumps in this object, see --x.

--clumpsgeora

[Objects] Geometric center right ascension of all clumps in this object, see --geox.

--clumpsgeodec

[Objects] Geometric center declination of all clumps in this object, see --geox.

-b
--brightness

The brightness (sum of all pixel values), see Flux Brightness and magnitude. For clumps, the ambient brightness (average flux of the river pixels around the clump multiplied by the area of the clump) is removed, see --riverave. So the sum of the clump brightnesses in the clump catalog will be smaller than the total clump brightness in the --clumpbrightness column of the objects catalog.

If no usable pixels (blank or below the threshold) are present over the clump or object, the stored value will be NaN (note that zero is meaningful).

--clumpbrightness

[Objects] The total brightness of the clumps within an object. This is simply the sum of the pixels associated with clumps in the object. If no usable pixels (blank or below the threshold) are present over the clumps, the stored value will be NaN (note that zero is meaningful).

--noriverbrightness

[Clumps] The Sky (not river) subtracted clump brightness. By definition, the average brightness of the rivers surrounding a clump is subtracted from it as a first-order accounting for contamination by neighbors. In cases where you will be calculating a brightness difference later (one example below), that contamination will be (mostly) removed at that stage, which is why this column was added.

One example might be this: you want to know the change in the clump flux as a function of threshold (see --threshold). So you will make two catalogs (each having this column but with different thresholds) and then subtract the lower-threshold catalog (higher brightness) from the higher-threshold catalog (lower brightness). The effect is most visible when the rivers have a high average signal-to-noise ratio: the contribution removed by the pixels below the threshold will be less than that of the river pixels. Therefore the river-subtracted brightness (--brightness) of such clumps in the thresholded catalog will be larger than their brightness with no threshold!

If no usable pixels (blank or below the possibly given threshold) are present over the clump or object, the stored value will be NaN (note that zero is meaningful).

-m
--magnitude

The magnitude of clumps or objects, see --brightness.

-e
--magnitudeerr

The magnitude error of clumps or objects. The magnitude error is calculated from the signal-to-noise ratio (see --sn and Quantifying measurement limits). Note that this error currently assumes un-correlated pixel values and also does not include the error in estimating the aperture (or the error in generating the labeled image).

For now these factors have to be found by other means. Task 14124 has been defined for work on adding these sources of error too.

--clumpsmagnitude

[Objects] The magnitude of all clumps in this object, see --clumpbrightness.

--upperlimit

The upper limit value (in units of the input image) for this object or clump. See Quantifying measurement limits and Upper-limit magnitude settings for a complete explanation. This is very important for the fainter and smaller objects in the image where the measured magnitudes are not reliable.

--upperlimitmag

The upper limit magnitude for this object or clump. See Quantifying measurement limits and Upper-limit magnitude settings for a complete explanation. This is very important for the fainter and smaller objects in the image where the measured magnitudes are not reliable.

--riverave

[Clumps] The average brightness of the river pixels around this clump. River pixels were defined in Akhlaghi and Ichikawa 2015; in short, they are the pixels immediately outside of the clumps. This value is used internally to find the brightness (or magnitude) and signal-to-noise ratio of the clumps. It can generally also be used as a scale to gauge the base (ambient) flux surrounding the clump. If there are no river pixels, then this column will have the value of the Sky under the clump. So note that this value is not Sky subtracted.

--rivernum

[Clumps] The number of river pixels around this clump, see --riverave.

-n
--sn

The signal-to-noise ratio (S/N) of all clumps or objects. See Akhlaghi and Ichikawa (2015) for the exact equations used.

--sky

The sky flux (per pixel) value under this object or clump. This is actually the mean value of all the pixels in the sky image that lie on the same position as the object or clump.

--std

The sky value standard deviation (per pixel) for this clump or object. Like --sky, this is the average of the values in the input sky standard deviation image pixels that lie over this object.

-A
--semimajor

The pixel-value weighted semi-major axis of the profile (assuming it is an ellipse) in units of pixels. See Measuring elliptical parameters.

-B
--semiminor

The pixel-value weighted semi-minor axis of the profile (assuming it is an ellipse) in units of pixels. See Measuring elliptical parameters.

-p
--positionangle

The pixel-value weighted angle of the semi-major axis with the first FITS axis in degrees. See Measuring elliptical parameters.

--geosemimajor

The geometric (ignoring pixel values) semi-major axis of the profile, assuming it is an ellipse.

--geosemiminor

The geometric (ignoring pixel values) semi-minor axis of the profile, assuming it is an ellipse.

--geopositionangle

The geometric (ignoring pixel values) angle of the semi-major axis with the first FITS axis in degrees.
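
As a sketch of how these column options combine, the hypothetical command below (the file name is a place-holder for your labeled input) requests a catalog with IDs, flux weighted WCS centers, magnitudes and S/N:

## IDs, RA/Dec, magnitude and S/N columns for all objects and clumps;
## ‘labeled.fits’ is a hypothetical file name:
$ astmkcatalog --ids --ra --dec --magnitude --sn labeled.fits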


Next: , Previous: , Up: Top   [Contents][Index]

8 Modeling and fitting

In order to fully understand observations, after the initial analysis of the image it is very important to compare them with existing models: this deepens our understanding of both the models and the data. The tools in this chapter create model galaxies and provide 2D fitting to help understand the detections.


Next: , Previous: , Up: Modeling and fitting   [Contents][Index]

8.1 MakeProfiles

MakeProfiles will create mock astronomical profiles from a catalog, either individually or together in one output image. In data analysis, making a mock image can act as a calibration tool, through which you can test how successfully your detection technique is able to detect a known set of objects. There are commonly two aspects to detection: the detection of the fainter parts of bright objects (which in the case of galaxies fade into the noise very slowly) and the complete detection of an over-all faint object. Making mock galaxies is the most accurate (and idealistic) way these two aspects of a detection algorithm can be tested. You also need mock profiles when fitting known functional profiles to observations.

MakeProfiles was initially built for extragalactic studies, so currently the only astronomical objects it can produce are stars and galaxies. We welcome requests for the simulation of any other astronomical object. The general outline of the steps that MakeProfiles takes is the following:

  1. Build the full profile out to its truncation radius in a possibly over-sampled array.
  2. Multiply all the elements by a fixed constant so its total magnitude equals the desired total magnitude.
  3. If --individual is called, save the array for each profile to a FITS file.
  4. If --nomerged is not called, add the overlapping pixels of all the created profiles to the output image and finish.

Using input values, MakeProfiles adds the World Coordinate System (WCS) headers of the FITS standard to all its outputs (except PSF images!). For a simple test on a set of mock galaxies in one image, there is no need for the third step or the WCS information.

However in complicated simulations like weak lensing simulations, where each galaxy undergoes various types of individual transformations based on their position, those transformations can be applied to the different individual images with other programs. After all the transformations are applied, using the WCS information in each individual profile image, they can be merged into one output image for convolution and adding noise.


Next: , Previous: , Up: MakeProfiles   [Contents][Index]

8.1.1 Modeling basics

In the subsections below, first a review of some very basic information and concepts behind modeling a real astronomical image is given. You can skip this subsection if you are already sufficiently familiar with these concepts.


Next: , Previous: , Up: Modeling basics   [Contents][Index]

8.1.1.1 Defining an ellipse

The PSF, see Point Spread Function, and galaxy radial profiles are generally defined on an ellipse so in this section first defining an ellipse on a pixelated 2D surface is discussed. Labeling the major axis of an ellipse \(a\), and its minor axis with \(b\), the axis ratio is defined as: \(q\equiv b/a\). The major axis of an ellipse can be aligned in any direction, therefore the angle of the major axis with respect to the horizontal axis of the image is defined to be the position angle of the ellipse and in this book, we show it with \(\theta\).

Our aim is to put a radial profile of any functional form \(f(r)\) over an ellipse. Hence we need to associate a radius/distance to every point in space. Let’s define the radial distance \(r_{el}\) as the distance on the major axis to the center of an ellipse which is located at \(i_c\) and \(j_c\) (in other words \(r_{el}\equiv{a}\)). We want to find \(r_{el}\) of a point located at \((i,j)\) (in the image coordinate system) from the center of the ellipse with axis ratio \(q\) and position angle \(\theta\). First the coordinate system is rotated103 by \(\theta\) to get the new rotated coordinates of that point \((i_r,j_r)\):

$$i_r(i,j)=+(i_c-i)\cos\theta+(j_c-j)\sin\theta$$ $$j_r(i,j)=-(i_c-i)\sin\theta+(j_c-j)\cos\theta$$

Recall that an ellipse is defined by \((i_r/a)^2+(j_r/b)^2=1\) and that we defined \(r_{el}\equiv{a}\). Hence, multiplying all elements of the ellipse definition with \(r_{el}^2\), we get the elliptical distance of this point: \(r_{el}=\sqrt{i_r^2+(j_r/q)^2}\). To place the radial profiles explained below over an ellipse, \(f(r_{el})\) is calculated based on the desired functional radial profile.
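
As a quick worked example (with hypothetical numbers): for an ellipse centered on \((i_c,j_c)=(0,0)\) with \(\theta=0\) and \(q=0.5\), the point \((i,j)=(3,4)\) has rotated coordinates \((i_r,j_r)=(-3,-4)\) from the equations above, so:

$$r_{el}=\sqrt{(-3)^2+\left({-4\over 0.5}\right)^2}=\sqrt{9+64}=\sqrt{73}\approx8.54$$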

MakeProfiles builds the profile starting from the nearest element (pixel in an image) in the dataset to the profile center. The profile value is calculated for that central pixel using Monte Carlo integration, see Sampling from a function. The next pixel is the next nearest neighbor to the central pixel as defined by \(r_{el}\). This process goes on until the profile is fully built up to the truncation radius. This is done fairly efficiently using a breadth-first parsing strategy104 which is implemented through an ordered linked list.

Using this approach, we build the profile by expanding its circumference, so not one extra pixel has to be checked (the calculation of \(r_{el}\) from above is not cheap in CPU terms). Another consequence of this strategy is that extending MakeProfiles to three dimensions becomes very simple: only the neighbors of each pixel have to be changed. Everything else after that (once the pixel index and its radial profile have entered the linked list) is the same, no matter the number of dimensions we are dealing with.


Next: , Previous: , Up: Modeling basics   [Contents][Index]

8.1.1.2 Point Spread Function

Assume we have a ‘point’ source, or a source that is far smaller than the maximum resolution (a pixel). When we take an image of it, it will ‘spread’ over an area. To quantify that spread, we can define a ‘function’. This is how the point spread function or the PSF of an image is defined. This ‘spread’ can have various causes, for example in ground based astronomy, due to the atmosphere. In practice we can never surpass the ‘spread’ due to the diffraction of the lens aperture. Various other effects can also be quantified through a PSF. For example, the simple fact that we are sampling in a discrete space, namely the pixels, also produces a very small ‘spread’ in the image.

Convolution is the mathematical process by which we can apply a ‘spread’ to an image, or in other words blur the image, see Convolution process. The Brightness of an object should remain unchanged after convolution, see Flux Brightness and magnitude. Therefore, it is important that the sum of all the pixels of the PSF be unity. The PSF image also has to have an odd number of pixels on its sides so one pixel can be defined as the center. In MakeProfiles, the PSF can be set by the two methods explained below.

Parametric functions

A known mathematical function is used to make the PSF. In this case, only the parameters to define the functions are necessary and MakeProfiles will make a PSF based on the given parameters for each function. In both cases (the Gaussian and Moffat functions explained below), the center of the profile has to be exactly in the middle of the central pixel of the PSF (which is automatically done by MakeProfiles). When talking about the PSF, usually, the full width at half maximum or FWHM is used as a scale of the width of the PSF.

Gaussian

In older papers, and to a lesser extent even today, some researchers use the 2D Gaussian function to approximate the PSF of ground-based images. In its most general form, a Gaussian function can be written as:

$$f(r)=a \exp \left( -{(r-\mu)^2 \over 2\sigma^2} \right)+d$$

Since the center of the profile is pre-defined, \(\mu\) and \(d\) are constrained; \(a\) can also be found because the function has to be normalized. So the only important parameter for MakeProfiles is \(\sigma\). In the Gaussian function we have this relation between the FWHM and \(\sigma\):

$$\rm{FWHM}_g=2\sqrt{2\ln{2}}\sigma \approx 2.35482\sigma$$

Moffat

The Gaussian profile is much sharper than the images taken from stars on photographic plates or CCDs. Therefore in 1969, Moffat proposed this functional form for the image of stars:

$$f(r)=a \left[ 1+\left( r\over \alpha \right)^2 \right]^{-\beta}$$

Again, \(a\) is constrained by the normalization, therefore two parameters define the shape of the Moffat function: \(\alpha\) and \(\beta\). The radial parameter is \(\alpha\) which is related to the FWHM by

$$\rm{FWHM}_m=2\alpha\sqrt{2^{1/\beta}-1}$$

Comparing with the PSF predicted from atmospheric turbulence theory with a Moffat function, Trujillo et al.105 claim that \(\beta\) should be 4.765. They also show how the Moffat PSF contains the Gaussian PSF as a limiting case when \(\beta\to\infty\).

An input FITS image

An input image file can also be specified to be used as a PSF. If the sum of its pixels is not equal to 1, the pixels will be multiplied by a constant so the sum becomes 1.

While the Gaussian only depends on the FWHM, the Moffat function also depends on \(\beta\). With a fixed FWHM, the Moffat function has more extended wings, approaching the Gaussian shape as \(\beta\) increases (as noted above for Trujillo et al.); a sketch for building both kernels for comparison is given below.
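
This is a minimal sketch (output names are hypothetical) using the --kernel and --output options documented in Invoking MakeProfiles:

## Build a Gaussian and a Moffat kernel with the same FWHM (2 pixels),
## both truncated at 5 times the FWHM, for comparison:
$ astmkprof --kernel=gaussian,2,5 --oversample=1 --output=gauss.fits
$ astmkprof --kernel=moffat,2,4.765,5 --oversample=1 --output=moffat.fits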


Next: , Previous: , Up: Modeling basics   [Contents][Index]

8.1.1.3 Stars

In MakeProfiles, stars are generally considered to be point sources. This is usually the case for extragalactic studies, where nearby stars are also in the field. Since a star is only a point source, we assume that it only fills one pixel prior to convolution. In fact, exactly for this reason, in astronomical images the light profiles of stars are one of the best methods to understand the shape of the PSF, and a very large fraction of scientific research is performed by assuming the shapes of stars to be the PSF of the image.


Next: , Previous: , Up: Modeling basics   [Contents][Index]

8.1.1.4 Galaxies

Today, most practitioners agree that the flux of galaxies can be modeled with one or a few generalized de Vaucouleurs (or Sérsic) profiles:

$$I(r) = I_e \exp \left ( -b_n \left[ \left( r \over r_e \right)^{1/n} -1 \right] \right )$$

Gérard de Vaucouleurs (1918-1995) was the first to show, in 1948, that this function fits galaxy light profiles well, with the only difference that he held \(n\) fixed to a value of 4. Twenty years later, in 1968, J. L. Sérsic showed that \(n\) can have a variety of values and does not necessarily need to be 4. This profile depends on the effective radius (\(r_e\)), which is defined as the radius containing half of the profile brightness (see Profile magnitude). \(I_e\) is the flux at the effective radius. The Sérsic index \(n\) is used to define the concentration of the profile within \(r_e\), and \(b_n\) is a constant dependent on \(n\). MacArthur et al.106 show that for \(n>0.35\), \(b_n\) can be accurately approximated using this equation:

$$b_n=2n - {1\over 3} + {4\over 405n} + {46\over 25515n^2} + {131\over 1148175n^3}-{2194697\over 30690717750n^4}$$
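
As a quick sanity check of this approximation (a worked example, not part of the original derivation), evaluating it for the classic de Vaucouleurs case (\(n=4\)) recovers the commonly quoted value \(b_4\approx7.669\):

$$b_4=8-{1\over 3}+{4\over 1620}+{46\over 408240}+{131\over 73483200}-{2194697\over 7856823744000}\approx 7.6692$$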


Next: , Previous: , Up: Modeling basics   [Contents][Index]

8.1.1.5 Sampling from a function

A pixel is the ultimate level of accuracy with which we can gather data in one image; this is known as sampling in signal processing. However, the mathematical profiles which describe our models have infinite accuracy. Over a large fraction of the area of astrophysically interesting profiles (for example galaxies or PSFs), the variation of the profile over the area of one pixel is not too significant. In such cases, the profile value at the elliptical radius (\(r_{el}\)) of the pixel center can be assigned to the whole pixel (see Defining an ellipse).

As you approach their centers, some galaxies become very sharp (their value significantly changes over one pixel’s area). This sharpness increases with smaller effective radius and larger Sérsic index, rendering the central value extremely inaccurate. The first method that comes to mind for solving this problem is integration: the functional form of the profile can be integrated over the pixel area in a 2D integration process. Unfortunately, numerical integration techniques also have their limitations, and when such sharp profiles are needed they can become extremely inaccurate.

The most accurate method of sampling a continuous profile on a discrete space is by choosing a large number of random points within the boundaries of the pixel and taking their average value (or Monte Carlo integration). This is also, generally speaking, what happens in practice with the photons on the pixel. The number of random points can be set with --numrandom.

Unfortunately, repeating this Monte Carlo process for every pixel would be extremely time- and CPU-consuming. In order not to lose too much accuracy, in MakeProfiles the profile is built using both methods explained above. The building of the profile begins from its central pixel and continues (radially) outwards. Monte Carlo integration is first applied (which yields \(F_r\)), then the central pixel value (\(F_c\)) is calculated for the same pixel. If the fractional difference (\(|F_r-F_c|/F_r\)) is lower than a given tolerance level (specified with --tolerance), MakeProfiles will stop using Monte Carlo integration and only use the central pixel value.
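
For example, a hypothetical command using the --numrandom and --tolerance options described in Invoking MakeProfiles: sharp profiles can be built with more random points and a tighter tolerance (at the cost of speed):

## More accurate sampling for sharp profiles; ‘catalog.txt’ is a
## hypothetical input catalog:
$ astmkprof --numrandom=10000 --tolerance=0.001 catalog.txt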

The ordering of the pixels in this inside-out construction is based on \(r=\sqrt{(i_c-i)^2+(j_c-j)^2}\), not \(r_{el}\), see Defining an ellipse. When the axis ratio is large (near one) this is fine. But when it is small and the object is highly elliptical, it might seem more reasonable to follow \(r_{el}\), not \(r\). The problem is that the gradient is stronger in pixels with smaller \(r\) (and larger \(r_{el}\)) than in those with smaller \(r_{el}\): in other words, the gradient is strongest along the minor axis. So if the next pixel were chosen based on \(r_{el}\), the tolerance level would be reached sooner and lots of pixels with large fractional differences would be missed.

Monte Carlo integration is based on random points. Thus, every time you run it, by default you will get a different distribution of points to sample within the pixel. In the case of large profiles, this will result in a slight difference in the pixels which use Monte Carlo integration each time MakeProfiles is run. To have a deterministic result, you have to fix the properties of the random number generator that is used to build the random distribution. This can be done by setting the GSL_RNG_TYPE and GSL_RNG_SEED environment variables and calling MakeProfiles with the --envseed option. To learn more about the process of generating random numbers, see Generating random numbers.
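
A minimal sketch (the generator name below is one of GSL’s standard generators, chosen here only as an example; the catalog name is hypothetical):

## Fix the random number generator type and seed for reproducible runs:
$ export GSL_RNG_TYPE=ranlxs2
$ export GSL_RNG_SEED=1
$ astmkprof --envseed catalog.txt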

The seed values are fixed for every profile: with --envseed, all the profiles have the same seed, and without it, each will get a different seed using the system clock (which is accurate to within one microsecond). The same seed will be used to generate a random number for all the sub-pixel positions of all the profiles. So in the former case, the sub-pixel points checked for all the pixels undergoing Monte Carlo integration in all profiles will be identical. In other words, the sub-pixel points in the first (closest to the center) pixel of all the profiles will be identical with each other. All the second pixels studied for all the profiles will also receive an identical (different from the first pixel) set of sub-pixel points, and so on. As long as the number of random points used is large enough, or the profiles are not identical, this should not cause any systematic bias.


Previous: , Up: Modeling basics   [Contents][Index]

8.1.1.6 Oversampling

The steps explained in Sampling from a function do give an accurate representation of a profile prior to convolution. However, in an actual observation, the image is first convolved with or blurred by the atmospheric and instrument PSF in a continuous space and then it is sampled on the discrete pixels of the camera.

In order to more accurately simulate this process, the un-convolved image and the PSF are created on a finer pixel grid. In other words, the output image is a certain odd-integer multiple of the desired size; we can call this ‘oversampling’. The user can specify this multiple as a command-line option. The reason this has to be an odd number is that the PSF has to be centered on the center of its image, and an image with an even number of pixels on each side does not have a central pixel.

The image can then be convolved with the PSF (which should also be oversampled on the same scale). Finally, the image can be sub-sampled to get to the initially desired pixel size of the output image. After this, mock noise can be added as explained in the next section: unlike the PSF, the noise occurs in each output pixel, not on a continuous space like all the prior steps.


Next: , Previous: , Up: MakeProfiles   [Contents][Index]

8.1.2 If convolving afterwards

In case you want to convolve the image later with a given point spread function, make sure to use a larger image size. After convolution, the profiles become larger and a profile that is normally completely outside of the image might fall within it.

On one axis, if you want your final (convolved) image to be \(m\) pixels and your PSF is \(2n+1\) pixels wide, then when calling MakeProfiles, set the axis size to \(m+2n\), not \(m\). You also have to shift all the pixel positions of the profile centers on that axis by \(n\) pixels in the positive direction.

After convolution, you can crop the outer \(n\) pixels with the section crop box specification of Crop: --section=n:*-n,n:*-n (assuming your PSF is a square, see Crop section syntax). This will also remove all discrete Fourier transform artifacts (blurred sides) from the final image. To facilitate this shift, MakeProfiles has the options --xshift, --yshift and --prepforconv, see Invoking MakeProfiles.
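
As a worked sketch with hypothetical numbers: for a final image of 1000 pixels per side and a PSF 11 pixels wide (\(n=5\)), the commands might look like this (the convolution step and file names are place-holders):

## Build on a 1010x1010 canvas, shifting all centers by 5 pixels:
$ astmkprof --naxis=1010,1010 --xshift=5 --yshift=5 catalog.txt
## ... convolve with the PSF (for example with Gnuastro’s Convolve) ...
## Crop the outer 5 pixels from the convolved image:
$ astcrop --section=5:*-5,5:*-5 convolved.fits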


Next: , Previous: , Up: MakeProfiles   [Contents][Index]

8.1.3 Flux Brightness and magnitude

Astronomical data pixels are usually in units of counts107 or electrons, or either of these divided by seconds. To convert from counts to electrons, you will need to know the instrument gain. In any case, they can be directly converted to energy or energy/time using the basic hardware (telescope, camera and filter) information. We will continue the discussion assuming the pixels are in units of energy/time.

The brightness of an object is defined as its total detected energy per time. This is simply the sum of the pixels that are associated with that detection by our detection tool (for example NoiseChisel108). The flux of an object is in units of energy/time/area, and for a detected object it is defined as its brightness divided by the area used to collect the light from the source, i.e. the telescope aperture (for example in \(cm^2\))109. Knowing the flux (\(f\)) and distance to the object (\(r\)), we can calculate its luminosity: \(L=4{\pi}r^2f\). Therefore, flux and luminosity are intrinsic properties of the object, while brightness depends on our detecting tools (hardware and software). Here we will not be discussing luminosity, only brightness; however, since luminosity is the astrophysically interesting quantity, we also defined it here to avoid possible confusion between these two terms, because they both have the same units.

Images of astronomical objects span a very large range of brightness, with the Sun (as the brightest object) being roughly \(2.5^{60}=10^{24}\) times brighter than the faintest galaxies we can currently detect. Directly discussing brightness values is therefore very hard, so astronomers have chosen to use a logarithmic scale for the brightness of astronomical objects. But the logarithm can only be used with a unit-less and always positive value. Fortunately brightness is always positive, and to remove the units we divide the brightness of the object (\(B\)) by a reference brightness (\(B_r\)). We then define the resulting logarithmic scale as \(magnitude\) through the following relation110:

$$m-m_r=-2.5\log_{10} \left( B \over B_r \right)$$

\(m\) is defined as the magnitude of the object and \(m_r\) is the pre-defined magnitude of the reference brightness. One particularly easy condition is when \(B_r=1\). This allows us to summarize all the hardware-specific parameters discussed above into one number as the reference magnitude, which is commonly known as the Zero-point111 magnitude.
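
As a worked example (with hypothetical numbers): taking \(B_r=1\) and calling the zero-point magnitude \(Z\), an object whose pixel sum is \(B=100\) (in the image’s units) in an image with \(Z=25\) has a magnitude of:

$$m=Z-2.5\log_{10}(B)=25-2.5\log_{10}(100)=25-5=20$$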


Next: , Previous: , Up: MakeProfiles   [Contents][Index]

8.1.4 Profile magnitude

To find the profile brightness or its magnitude (see Flux Brightness and magnitude), it is customary to use the 2D integration of the flux to infinity. However, in MakeProfiles we do not follow this idealistic approach and apply a more realistic method to find the total brightness or magnitude: the sum of all the pixels belonging to a profile within its predefined truncation radius. Note that if the truncation radius is not large enough, this can be significantly different from the total integrated light to infinity.

An integration to infinity is not a realistic condition because no galaxy extends indefinitely (this is especially important for high Sérsic index profiles). Pixelation can also cause a significant difference between the actual total pixel sum value of the profile and that of integration to infinity, especially in small and high Sérsic index profiles. To be safe, you can specify a large enough truncation radius for such compact, high Sérsic index profiles.

If oversampling is used, then the brightness is calculated using the over-sampled image (see Oversampling), which is much more accurate. The profile is first built in an array that completely bounds it, with a normalization constant of unity (see Galaxies). Taking \(B\) to be the desired brightness and \(S\) to be the sum of the pixels in the created profile, every pixel is then multiplied by \(B/S\) so the sum is exactly \(B\).

If the --individual option is called, this same array is written to a FITS file. If not, only the overlapping pixels of this array and the output image are kept and added to the output array.


Previous: , Up: MakeProfiles   [Contents][Index]

8.1.5 Invoking MakeProfiles

MakeProfiles will make any number of profiles specified in a catalog either individually or in one image. The executable name is astmkprof with the following general template

$ astmkprof [OPTION ...] [Catalog]

One line examples:

## Make an image with profiles in catalog.txt (with default size):
$ astmkprof catalog.txt

## Make the profiles in catalog.txt over image.fits:
$ astmkprof --background=image.fits catalog.txt

## Make a Moffat PSF with FWHM 3pix, beta=2.8, truncation=5:
$ astmkprof --kernel=moffat,3,2.8,5 --oversample=1

## Make profiles in catalog, using RA and Dec in the given column:
$ astmkprof --ccol=RA_CENTER --ccol=DEC_CENTER --mode=wcs catalog.txt

## Make a 1500x1500 merged image (oversampled 500x500) image along
## with an individual image for all the profiles in catalog:
$ astmkprof --individual --oversample 3 --naxis=500,500 catalog.txt

The parameters of the mock profiles can either be given through a catalog (which stores the parameters of many mock profiles, see MakeProfiles catalog), or the --kernel option (see MakeProfiles output dataset). The catalog can be in the FITS ASCII, FITS binary, or plain text formats (see Tables). The columns related to each parameter can be determined by number, or by match/search criteria using the column names, units, or comments, with the options ending in col (see below).

Without any file given to the --background option, MakeProfiles will make a zero-valued image and build the profiles on that (its size and main WCS parameters can also be defined through the options described in MakeProfiles output dataset). Besides the main/merged image containing all the profiles in the catalog, it is also possible to build individual images for each profile (only enclosing one full profile to its truncation radius) with the --individual option.

If an image is given to the --background option, the pixels of that image are used as the background value for every pixel: the flux value of each profile pixel will be added to the corresponding pixel of the background image. In this case, the values given (on the command-line or in the configuration files) to all options relating to the output size and WCS (for example --oversample, --naxis, and --prepforconv) will be ignored.

The sections below discuss the options specific to MakeProfiles based on context: the input catalog settings (which can have many rows for different profiles) are discussed in MakeProfiles catalog; in MakeProfiles profile settings, we discuss how you can set general profile settings (that are the same for all the profiles in the catalog). Finally, MakeProfiles output dataset and MakeProfiles log file discuss the outputs of MakeProfiles and how you can configure them. Besides these, MakeProfiles also supports all the common Gnuastro program options that are discussed in Common options, so please flip through them as well for a more comfortable usage.

Please see Sufi simulates a detection for a very complete tutorial explaining how one could use MakeProfiles in conjunction with Gnuastro’s other programs to make a complete simulated image of a mock galaxy.


Next: , Previous: , Up: Invoking astmkprof   [Contents][Index]

8.1.5.1 MakeProfiles catalog

The catalog containing information about each profile can be in the FITS ASCII, FITS binary, or plain text formats (see Tables). Its columns can be ordered in any desired manner. You can specify which columns belong to which parameters using the set of options discussed below. For example through the --rcol and --tcol options, you can specify the column that contains the radial parameter for each profile and its truncation respectively. See Selecting table columns for a thorough discussion on the values to these options.

The value for the profile center in the catalog (the --ccol option) can be a floating point number so the profile center can be on any sub-pixel position. Note that pixel positions in the FITS standard start from 1 and an integer is the pixel center. So a 2D image actually starts from the position (0.5, 0.5), which is the bottom-left corner of the first pixel. When a --background image with WCS information is provided or you specify the WCS parameters with the respective options, you may also use RA and Dec to identify the center of each profile (see the --mode option below).

In MakeProfiles, profile centers do not have to be in (overlap with) the final image. Even if only one pixel of the profile within the truncation radius overlaps with the final image size, the profile is built and included in the final image. Profiles that are completely out of the image will not be created (unless you explicitly ask for them with the --individual option). You can use the output log file (created with --log) to see which profiles were within the image (see Common options).

If PSF profiles (Moffat or Gaussian, see Point Spread Function) are in the catalog and the profiles are to be built in one image (when --individual is not used), it is assumed they are the PSF(s) you want to convolve your created image with. So by default, they will not be built in the output image but as separate files. The sum of pixels of these separate files will also be set to unity (1) so you are ready to convolve, see Convolution process. As a summary, the position and magnitude of a PSF profile will be ignored. This behavior can be disabled with the --psfinimg option. If you want to create all the profiles separately (with --individual) and you want the sum of the PSF profile pixels to be unity, you have to set their magnitudes in the catalog to the zero-point magnitude and be sure that the central positions of the profiles don’t have any fractional part (the PSF center has to be in the center of the pixel).

The list of options directly related to the input catalog columns is shown below.

--ccol=STR/INT

Center coordinate column for each dimension. This option must be called two times to define the center coordinates in an image. For example --ccol=RA and --ccol=DEC (along with --mode=wcs) will inform MakeProfiles to look into the catalog columns named RA and DEC for the Right Ascension and Declination of the profile centers.

--fcol=INT/STR

The functional form of the profile: one of the recognized radial functions (‘sersic’, ‘moffat’, ‘gaussian’, ‘point’, ‘flat’, or ‘circumference’). The column can contain either the numeric codes (for example ‘1’) or string characters (for example ‘sersic’). The numeric codes are easier to use in scripts which generate catalogs with hundreds or thousands of profiles.

The string format can be easier when the catalog is to be written/checked by hand/eye before running MakeProfiles. It is much more readable and provides a level of documentation. All Gnuastro’s recognized table formats (see Recognized table formats) accept string type columns. To have string columns in a plain text table/catalog, see Gnuastro text table format.

--rcol=STR/INT

The radius parameter of the profiles. Effective radius (\(r_e\)) if Sérsic, FWHM if Moffat or Gaussian.

--ncol=STR/INT

The Sérsic index (\(n\)) or Moffat \(\beta\).

--pcol=STR/INT

The position angle (in degrees) of the profiles relative to the first FITS axis (horizontal when viewed in SAO ds9).

--qcol=STR/INT

The axis ratio of the profiles (minor axis divided by the major axis in a 2D ellipse).

--mcol=STR/INT

The total pixelated magnitude of the profile within the truncation radius, see Profile magnitude.

--tcol=STR/INT

The truncation radius of this profile. By default it is in units of the radial parameter of the profile (the value in the --rcol of the catalog). If --tunitinp is given, this value is interpreted in units of pixels (prior to oversampling) irrespective of the profile.
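
As a sketch of how these column options work together, the hypothetical plain-text catalog and command below (the file name, values and column order are all invented for illustration; column counting starts from 1) build one Sérsic profile. The numeric code ‘1’ is used in the function column (see --fcol above) since plain-text string columns need extra metadata (see Gnuastro text table format).

## mock.txt (one profile per row):
## X    Y    FUNC R    N   PA   Q   MAG  TRUNC
## 50.0 50.0 1    10.0 2.5 45.0 0.6 22.0 5.0
$ astmkprof --mode=img --ccol=1 --ccol=2 --fcol=3 --rcol=4 \
            --ncol=5 --pcol=6 --qcol=7 --mcol=8 --tcol=9 mock.txt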


Next: , Previous: , Up: Invoking astmkprof   [Contents][Index]

8.1.5.2 MakeProfiles profile settings

The profile parameters that differ between each created profile are specified through the columns in the input catalog and described in MakeProfiles catalog. Besides those, there are general settings for some profiles that don’t differ between one profile and another: they are a property of the general process. For example, the number of random points used in the Monte Carlo integration is fixed for all the profiles. The options described in this section are for configuring such properties.

--mode=STR

Interpret the center position columns (--ccol in MakeProfiles catalog) in image or WCS coordinates. This option thus accepts only two values: img and wcs. It is mandatory when a catalog is being used as input.

-r
--numrandom

The number of random points used in the central regions of the profile, see Sampling from a function.

-e
--envseed

Use the value to the GSL_RNG_SEED environment variable to generate the random Monte Carlo sampling distribution, see Sampling from a function and Generating random numbers.

-t FLT
--tolerance=FLT

The tolerance to switch from Monte Carlo integration to the central pixel value, see Sampling from a function.

-p
--tunitinp

The truncation column of the catalog is in units of pixels. By default, the truncation column is considered to be in units of the radial parameters of the profile (--rcol). Read it as ‘t-unit-in-p’ for ‘truncation unit in pixels’.

-f
--mforflatpix

When making fixed-value profiles (flat and circumference, see --fcol), don’t use the value in the column specified by --mcol as the magnitude. Instead, use it as the exact value that all the pixels of these profiles should have. This option is irrelevant for other types of profiles. This option is very useful for creating masks or labeled regions in an image. Any integer or floating point value can be used in this column with this option, including NaN (or ‘nan’, or ‘NAN’, case is irrelevant) and infinities (inf, -inf, or +inf).

For example, with this option if you set the value in the magnitude column (--mcol) to NaN, you can create an elliptical or circular mask over an image (which can be given as the argument), see Blank pixels. Another useful application of this option is to create labeled elliptical or circular apertures in an image. To do this, set the value in the magnitude column to the label you want for this profile. This labeled image can then be used in combination with NoiseChisel’s output (see NoiseChisel output) to do aperture photometry with MakeCatalog (see MakeCatalog).

Alternatively, if you want to mark regions of the image (for example with an elliptical circumference) and you don’t want to use NaN values (as explained above) for some technical reason, you can get the minimum or maximum value in the image112 using Arithmetic (see Arithmetic), then use that value in the magnitude column along with this option for all the profiles.

Please note that when using MakeProfiles on an already existing image, you have to set ‘--oversample=1’. Otherwise all the profiles will be scaled up based on the oversampling scale in your configuration files (see Configuration files) unless you have accounted for oversampling in your catalog.
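
A minimal sketch of the masking use-case above (file names are hypothetical; the catalog rows are flat profiles with ‘nan’ in the magnitude column):

## Mask elliptical regions of image.fits with NaN-valued flat profiles:
$ astmkprof --background=image.fits --mforflatpix --oversample=1 \
            mask-catalog.txt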

--mcolisbrightness

The value given in the “magnitude column” (specified by --mcol, see MakeProfiles catalog) must be interpreted as brightness, not magnitude. The zero-point magnitude (value to the --zeropoint option) is ignored and the given value must have the same units as the input dataset’s pixels.

Recall that the total profile magnitude or brightness that is specified in the --mcol column of the input catalog is not an integration to infinity, but the actual sum of pixels in the profile (up to the desired truncation radius). See Profile magnitude for more on this point.

--magatpeak

The magnitude column in the catalog (see MakeProfiles catalog) will be used to find the brightness only for the peak profile pixel, not the full profile. Note that this is the flux of the profile’s peak pixel in the final output of MakeProfiles. So beware of the oversampling, see Oversampling.

This option can be useful if you want to check a mock profile’s total magnitude at various truncation radii. Without this option, no matter what the truncation radius is, the total magnitude will be the same as that given in the catalog. But with this option, the total magnitude will become brighter as you increase the truncation radius.

In sharper profiles, the peak profile flux can sometimes be measured more accurately than the overall object brightness. In such cases, with this option, the final profile will be built such that its peak has the given magnitude, not the total profile.

CAUTION: If you want to use this option for comparing with observations, please note that MakeProfiles does not do convolution. Unless you have de-convolved your data, your images are convolved with the instrument and atmospheric PSF, see Point Spread Function. Particularly in sharper profiles, the flux in the peak pixel is strongly decreased after convolution. Also note that in such cases, besides de-convolution, you will have to set --oversample=1 otherwise after resampling your profile with Warp (see Warp), the peak flux will be different.

-X INT,INT
--shift=INT,INT

Shift all the profiles and enlarge the image along each dimension. To better understand this option, please see \(n\) in If convolving afterwards. This is useful when you want to convolve the image afterwards. If you are using an external PSF, be sure to oversample it to the same scale used for creating the mock images. If a background image is specified, any possible value to this option is ignored.

-c
--prepforconv

Shift all the profiles and enlarge the image based on half the width of the first Moffat or Gaussian profile in the catalog, considering any possible oversampling (see If convolving afterwards). --prepforconv is only checked and possibly activated if --xshift and --yshift are both zero (after reading the command-line and configuration files). If a background image is specified, any possible value to this option is ignored.

-z FLT
--zeropoint=FLT

The zero-point magnitude of the image.

-w FLT
--circumwidth=FLT

The width of the circumference if the profile is to be an elliptical circumference or annulus. See the explanations for this type of profile in --fcol.

-R
--replace

Do not add the pixels of each profile over the background (possibly crowded by other profiles), replace them. By default, when two profiles overlap, the final pixel value is the sum of all the profiles that overlap on that pixel. When this option is given, the pixels are not added but replaced by the newer profile’s pixel and any value under it is lost.

When order matters, make sure to use this option with --numthreads=1. When multiple threads are used, the separate profiles are built asynchronously and not in order. Since order does not matter in an addition, this causes no problems by default, but it has to be considered when this option is given. Using multiple threads is no problem if the profiles are to be used as a mask with a blank or fixed value (see --mforflatpix), since all their pixel values are the same.

Note that only non-zero pixels are replaced. With radial profiles (for example Sérsic or Moffat) only values above zero will be part of the profile. However, when using flat profiles with the ‘--mforflatpix’ option, you should be careful not to give a 0.0 value as the flat profile’s pixel value.


Next: , Previous: , Up: Invoking astmkprof   [Contents][Index]

8.1.5.3 MakeProfiles output dataset

MakeProfiles takes an input catalog and uses the basic properties that are defined there to build a dataset, for example a 2D image containing the profiles in the catalog. In MakeProfiles catalog and MakeProfiles profile settings, the catalog and profile settings were discussed. The options of this section allow you to configure the output dataset (or the canvas that will host the built profiles).

-k STR
--background=STR

A background image FITS file to build the profiles on. The extension that contains the image should be specified with the --backhdu option, see below. When a background image is specified, it will be used to derive all the information about the output image. Hence, the following options will be ignored: --naxis, --oversample, --crpix, --crval (generally, all other WCS related parameters) and the output’s data type (see --type in Input/Output options).

The image will act like a canvas to build the profiles on: profile pixel values will be summed with the background image pixel values. With the --replace option you can disable this behavior, so the profile pixels replace the background pixels instead. If you want to use all the image information above, except for the pixel values (you want to have a blank canvas to build the profiles on, based on an input image), you can call --clearcanvas to set all the input image’s pixels to zero before starting to build the profiles over it (this is done in memory after reading the input, so nothing will happen to your input file).

-B STR/INT
--backhdu=STR/INT

The header data unit (HDU) of the file given to --background.

-C
--clearcanvas

When an input image is specified (with the --background option), set all its pixels to 0.0 immediately after reading it into memory. Effectively, this will allow you to use all its properties (described under the --background option), without having to worry about the pixel values.

--clearcanvas can come in handy in many situations, for example if you want to create a labeled image (segmentation map) for creating a catalog (see MakeCatalog). In other cases, you might have modeled the objects in an image and want to create them on the same frame, but without the original pixel values.

-E STR/INT,FLT[,FLT,[...]]
--kernel=STR/INT,FLT[,FLT,[...]]

Only build one kernel profile with the parameters given as the values to this option. The different values must be separated by a comma (,). The first value identifies the radial function of the profile, either through a string or through a number (see description of --fcol in MakeProfiles catalog). Each radial profile needs a different total number of parameters: Sérsic and Moffat functions need 3 parameters: radial, Sérsic index or Moffat \(\beta\), and truncation radius. The Gaussian function needs two parameters: radial and truncation radius. The point function doesn’t need any parameters and flat and circumference profiles just need one parameter (truncation radius).

The PSF or kernel is a unique (and highly constrained) type of profile: the sum of its pixels must be one, its center must be the center of the central pixel (in an image with an odd number of pixels on each side), and commonly it is circular, so its axis ratio and position angle are one and zero respectively. Kernels are commonly necessary for various data analysis and data manipulation steps (for example see Convolve and NoiseChisel). Because of this, it is inconvenient to define a catalog with one row and many zero-valued columns (for all the unnecessary parameters). Hence, with this option, it is possible to create a kernel with MakeProfiles without the need to create a catalog. Here are some examples:

--kernel=moffat,3,2.8,5

A Moffat kernel with FWHM of 3 pixels, \(\beta=2.8\) which is truncated at 5 times the FWHM.

--kernel=gaussian,2,3

A Gaussian kernel with FWHM of 2 pixels and truncated at 3 times the FWHM.

-x INT,INT
--naxis=INT,INT

The number of pixels along each dimension axis of the output in FITS order. This is before over-sampling. For example, if you call MakeProfiles with --naxis=100,150 --oversample=5 (assuming no shift due to later convolution), then the final image size will be 500 by 750 pixels (500 along the first axis and 750 along the second). Fractions are acceptable as values for each dimension; however, they must reduce to an integer, so --naxis=150/3,300/3 is acceptable but --naxis=150/4,300/4 is not.

When viewing a FITS image in DS9, the first FITS dimension is in the horizontal direction and the second is vertical. As an example, the image created with the example above will have 500 pixels horizontally and 750 pixels vertically.

If a background image is specified, this option is ignored.

-s INT
--oversample=INT

The scale to over-sample the profiles and final image. If not an odd number, it will be incremented by one, see Oversampling. Note that --oversample will remain active even if an input image is specified. If your input catalog is based on the background image, be sure to set --oversample=1.

--psfinimg

Build the possibly existing PSF profiles (Moffat or Gaussian) in the catalog into the final image. By default they are built separately so you can convolve your images with them, thus their magnitude and positions are ignored. With this option, they will be built in the final image like every other galaxy profile. To have a final PSF in your image, make a point profile where you want the PSF and after convolution it will be the PSF.

-i
--individual

If this option is called, each profile is created in a separate FITS file within the same directory as the output, with the profile’s row number (starting from zero) in its name. The file for each row’s profile will be in the same directory as the final combined image of all the profiles and will have the final image’s name as a suffix. So for example, if the final combined image is named ./out/fromcatalog.fits, then the first profile created with this option will be named ./out/0_fromcatalog.fits.

Since each image only has one full profile out to the truncation radius, the profile is centered, and so only the sub-pixel position of the profile center is important for the outputs of this option. The output will have an odd number of pixels. If there is no oversampling, the central pixel will contain the profile center. If the value to --oversample is larger than unity, then the profile center is on any of the central --oversample’d pixels, depending on the fractional value of the profile center.

If the fractional value is larger than half, it is on the bottom half of the central region. This is due to the FITS definition of a real number position: The center of a pixel has fractional value \(0.00\) so each pixel contains these fractions: .5 – .75 – .00 (pixel center) – .25 – .5.

-m
--nomerged

Don’t make a merged image. By default, after making the profiles they are added to a final image with side lengths specified by --naxis, if they overlap with it.

The options below can be used to define the world coordinate system (WCS) properties of the MakeProfiles outputs. The option names are deliberately chosen to be the same as the FITS standard WCS keywords. See Section 8 of Pence et al. [2010] for a short introduction to WCS in the FITS standard113.

If you look into the headers of a FITS image with WCS, for example, you will see all these names, but in uppercase and with numbers to represent the dimensions, for example CRPIX1 and PC2_1. You can see the FITS headers with Gnuastro’s Fits program using a command like this: $ astfits -p image.fits.

If the values given to all of these options do not correspond to the dimensionality of the output dataset, then no WCS information will be added.

--crpix=FLT,FLT

The pixel coordinates of the WCS reference point. Fractions are acceptable for the values of this option.

--crval=FLT,FLT

The WCS coordinates of the Reference point. Fractions are acceptable for the values of this option.

--cdelt=FLT,FLT

The resolution (size of one data-unit or pixel in WCS units) of the non-oversampled dataset. Fractions are acceptable for the values of this option.

--pc=FLT,FLT,FLT,FLT

The PC matrix of the WCS rotation, see the FITS standard (link above) to better understand the PC matrix.

--cunit=STR,STR

The units of each WCS axis, for example deg. Note that these values are part of the FITS standard (link above). MakeProfiles won’t complain if you use non-standard values, but later usage of them might cause trouble.

--ctype=STR,STR

The type of each WCS axis, for example RA---TAN and DEC--TAN. Note that these values are part of the FITS standard (link above). MakeProfiles won’t complain if you use non-standard values, but later usage of them might cause trouble.
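
Putting the WCS options together, here is a hypothetical sketch (all values are invented for illustration; 0.2 arcseconds/pixel is \(0.2/3600\approx5.56\times10^{-5}\) degrees):

## A 100x100 image with its WCS reference near the center, pointing
## to (RA,Dec)=(180,30) with 0.2 arcsec/pixel:
$ astmkprof --naxis=100,100 --crpix=50,50 --crval=180,30 \
            --cdelt=0.0000556,0.0000556 --cunit=deg,deg \
            --ctype=RA---TAN,DEC--TAN catalog.txt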


Previous: , Up: Invoking astmkprof   [Contents][Index]

8.1.5.4 MakeProfiles log file

Besides the final merged dataset of all the profiles, or the individual datasets (see MakeProfiles output dataset), if the --log option is called, MakeProfiles will also create a log file in the current directory (where you run MakeProfiles). See Common options for a full description of --log and other options that are shared between all Gnuastro programs. The values for each column are explained in the first few commented lines of the log file (starting with the # character).


Previous: , Up: Modeling and fittings   [Contents][Index]

8.2 MakeNoise

Real data are always buried in noise; therefore, to finalize a simulation of real data (for example to test our observational algorithms), it is essential to add noise to the mock profiles created with MakeProfiles, see MakeProfiles. Below, the general principles and concepts needed to understand how noise is quantified are discussed. MakeNoise’s options and arguments are then discussed in Invoking MakeNoise.


Next: , Previous: , Up: MakeNoise   [Contents][Index]

8.2.1 Noise basics

Deep astronomical images, like those used in extragalactic studies, seriously suffer from noise in the data. Generally speaking, the sources of noise in an astronomical image are photon counting noise and instrumental noise, which are discussed in detail below. We finish with a short introduction on how random numbers are generated and how you can determine the random number generator and seed value.


Next: , Previous: , Up: Noise basics   [Contents][Index]

8.2.1.1 Photon counting noise

Thanks to the very accurate electronics used in today’s detectors, this type of noise is the main cause of concern for extragalactic studies. It can generally be associated with the counting error, which is known to have a Poisson distribution. The Poisson distribution is about counting. But counting is a discrete operation with only non-negative values: we can’t count \(3.2\) or \(-2\) of anything, only \(0\), \(1\), \(2\), \(3\) and so on. Therefore the Poisson distribution is also a discrete distribution, only applying to whole non-negative integers.

Let’s assume the mean value of counting something is known. In this case, we are counting the number of electrons that are produced by photons in a detector (for example CCD) pixel. Let’s call this mean \(\lambda\). Furthermore, let’s take \(k\) to represent the result of one particular counting attempt. The probability density function of \(k\) can be written as:

$$f(k)={\lambda^k \over k!} e^{-\lambda},\quad k\in \{0, 1, 2, 3, \dots\}$$

Because the Poisson distribution is only applicable to non-negative values, it is naturally very skewed when \(\lambda\) is near zero. One qualitative way to explain this is that there are simply fewer integers smaller than \(\lambda\) than there are integers larger than it. Therefore, to accommodate all possibilities, the distribution has to be skewed when \(\lambda\) is small.

But as \(\lambda\) becomes larger and larger, the distribution becomes more and more symmetric. One very useful property of the Poisson distribution is that the mean value is also its variance. When \(\lambda\) is very large, say \(\lambda>1000\), then the normal (Gaussian) distribution, see Point Spread Function, is an excellent approximation of the Poisson distribution with mean \(\mu=\lambda\) and standard deviation \(\sigma=\sqrt{\lambda}\).

We see that the variance or dispersion of the distribution depends on the mean value; when the mean is large, the distribution can be approximated with a Gaussian that effectively has only one free parameter (\(\mu=\lambda\) and \(\sigma=\sqrt{\lambda}\)) instead of the two parameters a general Gaussian has.
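
To get a numerical feeling for this approximation, the short C program below (a minimal sketch using the distribution functions of GSL, the same library Gnuastro uses for random numbers; it is not any actual Gnuastro source) draws many Poisson samples with a large \(\lambda\) and prints their sample mean and standard deviation, which should come out close to \(\lambda\) and \(\sqrt{\lambda}\):

/* poissoncheck.c: draw Poisson samples with a large mean and
   compare the sample statistics with the Gaussian approximation
   (mu=lambda, sigma=sqrt(lambda)). Compile and run with:
      $ gcc poissoncheck.c -lgsl -lgslcblas -lm -o poissoncheck
      $ ./poissoncheck                                           */
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

int
main(void)
{
  size_t i, n=100000;
  double k, lambda=1000.0, sum=0.0, sum2=0.0;
  gsl_rng *rng;

  gsl_rng_env_setup();               /* Honor the GSL_RNG_* variables. */
  rng=gsl_rng_alloc(gsl_rng_default);

  for(i=0;i<n;++i)
    {
      k=gsl_ran_poisson(rng, lambda);
      sum+=k;
      sum2+=k*k;
    }

  printf("sample mean: %.2f  (lambda:       %.2f)\n", sum/n, lambda);
  printf("sample std:  %.2f  (sqrt(lambda): %.2f)\n",
         sqrt(sum2/n - (sum/n)*(sum/n)), sqrt(lambda));

  gsl_rng_free(rng);
  return 0;
}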

The astronomical objects, after convolution with the PSF of the instrument, lie above a certain background flux. This background flux is defined to be the average flux of a region in the image that has absolutely no objects. The physical origin of this background value is the brightness of the atmosphere or possible stray light within the imaging instrument. It is thus an ideal definition, because in practice, what lies deep in the noise, far below the detection limit, is never known114. However, in a real image, a relatively large number of very faint objects can be fully buried in the noise. These undetected objects will bias the background measurement to slightly larger values. The sky value is therefore defined to be the average of the undetected regions in the image, so in an ideal case where all the objects have been detected, the sky value and background value are the same.

As longer wavelengths are used, the background value becomes more significant and also varies over a wide image field. Such variations are not currently implemented in MakeProfiles, but will be in the future. In a mock image, we have the luxury of setting the background value.

In each pixel of the canvas of pixels, the flux is the sum of contributions from the various sources after convolution. Let’s name this flux of the convolved sum of possibly overlapping objects \(I_{nn}\), with \(nn\) standing for ‘no noise’. For now, let’s assume the background is constant and represented by \(B\). In practice, background values are larger than \(\sim1,000\) counts. The flux after adding noise is then a random value taken from a Gaussian distribution with the following mean (\(\mu\)) and standard deviation (\(\sigma\)):

$$\mu=B+I_{nn}, \quad \sigma=\sqrt{B+I_{nn}}$$

Since this type of noise is inherent in the objects we study, it is usually measured on the same scale as the astronomical objects, namely the magnitude system, see Flux Brightness and magnitude. It is then internally converted to the flux scale for further processing.


Next: , Previous: , Up: Noise basics   [Contents][Index]

8.2.1.2 Instrumental noise

While taking images with a camera, a dark current is fed to the pixels; the variation of this dark current over the pixels also adds to the final image noise. Another source of noise is the readout noise, produced by the CCD electronics when digitizing the voltage of the photo-electrons in the analog-to-digital converter. In deep extragalactic studies these sources of noise are not as significant as the noise of the background sky. Let \(C\) represent the combined standard deviation of all these instrumental sources of noise. If only these instrumental sources (together with the photon noise of the objects themselves) are considered, the noised pixel value would be a random value chosen from a Gaussian distribution with

$$\mu=I_{nn}, \quad \sigma=\sqrt{C^2+I_{nn}}$$

This type of noise is completely independent of the objects being studied; it is determined by the instrument alone. So the flux scale (and not the magnitude scale) is most commonly used for this type of noise. In practice, this value is usually reported in ADUs, not flux or electron counts. The gain value of the device can be used to convert between the two, see Flux Brightness and magnitude.


Next: , Previous: , Up: Noise basics   [Contents][Index]

8.2.1.3 Final noised pixel value

Depending on the values you specify for \(B\) and \(C\) from the above, the final noised value for each pixel is a random value chosen from a Gaussian distribution with

$$\mu=B+I_{nn}, \quad \sigma=\sqrt{C^2+B+I_{nn}}$$
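
As a minimal sketch of how such a value might be generated with GSL (the values of \(B\), \(C\) and \(I_{nn}\) below are hypothetical, and this is not MakeNoise’s actual source code):

/* noisedpix.c: one noised pixel value following the equation
   above. Compile with:
      $ gcc noisedpix.c -lgsl -lgslcblas -lm -o noisedpix       */
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

int
main(void)
{
  double B=2000.0, C=10.0, Inn=150.0;      /* Hypothetical values.  */
  double sigma=sqrt(C*C+B+Inn);
  gsl_rng *rng;

  gsl_rng_env_setup();                     /* Honor GSL_RNG_* vars. */
  rng=gsl_rng_alloc(gsl_rng_default);

  /* gsl_ran_gaussian returns a zero-mean sample, so the mean
     (B+Inn) is added separately.                                   */
  printf("Noised pixel value: %g\n", B+Inn+gsl_ran_gaussian(rng, sigma));

  gsl_rng_free(rng);
  return 0;
}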


Previous: , Up: Noise basics   [Contents][Index]

8.2.1.4 Generating random numbers

As discussed above, to generate noise we need to make random samples of a particular distribution. So it is important to understand some general concepts regarding the generation of random numbers. For a very complete and nice introduction we strongly advise reading Donald Knuth’s “The art of computer programming”, volume 2, chapter 3115. Quoting from the GNU Scientific Library manual, “If you don’t own it, you should stop reading right now, run to the nearest bookstore, and buy it”116!

Using only software, we can only produce a pseudo-random sequence of numbers. A true random number generator is a hardware device (let’s assume we have made sure it has no systematic biases), for example throwing dice or flipping coins (methods that have been used since ancient times). More modern hardware methods use atmospheric noise, thermal noise or other types of external electromagnetic or quantum phenomena. All pseudo-random number generators (software) require a seed as the basis of the generation. The advantage of having a seed is that if you specify the same seed for multiple runs, you will get an identical sequence of random numbers, which allows you to reproduce the same final noised image.

The programs in GNU Astronomy Utilities (for example MakeNoise or MakeProfiles) use the GNU Scientific Library (GSL) to generate random numbers. GSL allows the user to set the random number generator through environment variables, see Installation directory for an introduction to environment variables. In the chapter titled “Random Number Generation” they have fully explained the various random number generators that are available (there are a lot of them!). Through the two environment variables GSL_RNG_TYPE and GSL_RNG_SEED you can specify the generator and its seed respectively.

If you don’t specify a value for GSL_RNG_TYPE, GSL will use its default random number generator type. The default type is sufficient for most general applications. If no value is given for the GSL_RNG_SEED environment variable and you have asked Gnuastro to read the seed from the environment (through the --envseed option), then GSL will use the default value of each generator to give identical outputs. If you don’t explicitly tell Gnuastro programs to read the seed value from the environment variable, then they will use the system time (accurate to within a microsecond) to generate (apparently random) seeds. In this manner, every time you run the program, you will get a different random number distribution.
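
In your own C programs, the same behavior can be reproduced through GSL’s gsl_rng_env_setup function. A minimal sketch (not Gnuastro’s actual source) would be:

/* rngenv.c: honor GSL_RNG_TYPE and GSL_RNG_SEED like the Gnuastro
   programs described above. Compile with:
      $ gcc rngenv.c -lgsl -lgslcblas -lm -o rngenv                */
#include <stdio.h>
#include <gsl/gsl_rng.h>

int
main(void)
{
  gsl_rng *rng;

  /* Read GSL_RNG_TYPE and GSL_RNG_SEED (when defined), setting
     gsl_rng_default and gsl_rng_default_seed accordingly.         */
  gsl_rng_env_setup();

  /* gsl_rng_alloc seeds the generator with gsl_rng_default_seed.  */
  rng=gsl_rng_alloc(gsl_rng_default);

  printf("generator: %s, seed: %lu, first sample: %g\n",
         gsl_rng_name(rng), gsl_rng_default_seed,
         gsl_rng_uniform(rng));

  gsl_rng_free(rng);
  return 0;
}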

There are two ways you can specify values for these environment variables. You can call them on the same command-line for example:

$ GSL_RNG_TYPE="taus" GSL_RNG_SEED=345 astmknoise input.fits

In this manner the values will only be used for this particular execution of MakeNoise. Alternatively, you can define them for the rest of your terminal session or the length of your script, using the shell’s export command with the two separate commands below (for a script, remove the $ signs):

$ export GSL_RNG_TYPE="taus"
$ export GSL_RNG_SEED=345

The programs that use GSL’s random number generators will henceforth use these values in this terminal session or while the script is being executed. If you want to set fixed values for these parameters every time you use the GSL random number generator, you can add these two lines to your .bashrc startup script117, see Installation directory.

NOTE: If the two environment variables GSL_RNG_TYPE and GSL_RNG_SEED are defined, GSL will report them by default, even if you don’t use the --envseed option. For example you can see the top few lines of the output of MakeProfiles:

$ export GSL_RNG_TYPE="taus"
$ export GSL_RNG_SEED=345
$ astmkprof catalog.txt --envseed
GSL_RNG_TYPE=taus
GSL_RNG_SEED=345
MakeProfiles started on AAA BBB DD EE:FF:GG HHH
  - 6 profiles read from catalog.txt 0.000236 seconds
  - Random number generator (RNG) type: taus
  - RNG seed for all profiles: 345

The first two output lines (showing the names of the environment variables) are printed by GSL before MakeProfiles actually starts generating random numbers. The Gnuastro programs will report the values they use independently; you should check them for the final values used. For example, if --envseed is not given, GSL_RNG_SEED will not be used and the last line shown above will not be printed. In the case of MakeProfiles, each profile will get its own seed value.


Previous: , Up: MakeNoise   [Contents][Index]

8.2.2 Invoking MakeNoise

MakeNoise will add noise to an existing image. The executable name is astmknoise with the following general template

$ astmknoise [OPTION ...] InputImage.fits

One line examples:

## Add noise with a standard deviation of 100 to image:
$ astmknoise --sigma=100 image.fits

## Add noise to input image assuming a background magnitude (with zeropoint
## magnitude of 0) and a certain instrumental noise:
$ astmknoise --background=-10 -z0 --instrumental=20 mockimage.fits

If actual processing is to be done, the input image is a mandatory argument. The full list of options common to all the programs in Gnuastro can be seen in Common options. The type (see Numeric data types) of the output can be specified with the --type option, see Input/Output options. The header of the output FITS file keeps all the parameters that were influential in making it. This is done for future reproducibility.

-s FLT
--sigma=FLT

The total noise sigma in the same units as the pixel values. With this option, the --background, --zeropoint and --instrumental options will be ignored. The noise will then be independent of the pixel values (which is not realistic, see Photon counting noise). Hence it is only useful if you are working on low surface brightness regions where the change in pixel value (and thus in the real noise) is insignificant.

-b FLT
--background=FLT

The background pixel value for the image in units of magnitudes, see Photon counting noise and Flux Brightness and magnitude.

-z FLT
--zeropoint=FLT

The zeropoint magnitude used to convert the value of --background (in units of magnitude) to flux, see Flux Brightness and magnitude.
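
For example, with --background=-10 and --zeropoint=0 (as in the one-line example above), the standard magnitude definition gives a background flux of \(10^{-(-10-0)/2.5}=10^{4}\) counts per pixel, comfortably in the regime where the Gaussian approximation to the Poisson distribution holds (see Photon counting noise).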

-i FLT
--instrumental=FLT

The instrumental noise which is in units of flux, see Instrumental noise.

-e
--envseed

Use the GSL_RNG_SEED environment variable for the seed used in the random number generator, see Generating random numbers. With this option, the output image noise is always going to be identical (or reproducible).

-d
--doubletype

Save the output in the double precision floating point format that was used internally. This option will be most useful if the input images were of integer types.


Next: , Previous: , Up: Top   [Contents][Index]

9 High-level calculations

After the reduction of raw data (for example with the programs in Data manipulation) you will have reduced images/data ready for processing/analysis (for example with the programs in Data analysis). But the processed/analyzed data (or catalogs) are still not enough to derive any scientific result: even higher-level analysis is needed to convert the observed magnitudes, sizes or volumes into the physical quantities that we associate with each catalog entry or detected object. That conversion is the purpose of the tools in this chapter.


Previous: , Up: High-level calculations   [Contents][Index]

9.1 CosmicCalculator

To derive higher-level information regarding our sources in extra-galactic astronomy, cosmological calculations are necessary. In Gnuastro, CosmicCalculator is in charge of such calculations. Before discussing how CosmicCalculator is called and operates (in Invoking CosmicCalculator), it is important to provide a rough but mostly self-sufficient review of the basics and the equations used in the analysis. In Distance on a 2D curved space the basic idea of understanding distances in a curved and expanding 2D universe (which we can visualize) is reviewed. Having solidified the concepts there, in Extending distance concepts to 3D, the formalism is extended to the 3D universe we are trying to study in our research.

The focus here is obtaining a physical insight into these equations (mainly for use in real observational studies). There are many books thoroughly deriving and proving all the equations with all possible initial conditions and assumptions for any abstract universe; interested readers can study those books.


Next: , Previous: , Up: CosmicCalculator   [Contents][Index]

9.1.1 Distance on a 2D curved space

The observations to date (for example the Planck 2013 results) have not measured a significant curvature in the universe. However, to be generic (and to allow measuring the curvature if it does in fact exist), it is very important to create a framework that allows it. As 3D beings, it is impossible for us to mentally create (visualize) a picture of the curvature of a 3D volume in a 4D space. Hence, here we will assume a 2D surface and discuss distances on that 2D surface when it is flat and when it is curved (in a 3D space). Once the concepts have been created/visualized here, in Extending distance concepts to 3D, we will extend them to the real 3D universe we live in and hope to study.

To be more understandable (to actively discuss from an observer’s point of view), let’s assume we have an imaginary 2D friend living on the 2D space (which might be curved in 3D), and follow it in its efforts to analyze distances in its 2D universe. The start of the analysis might seem too mundane, but since it is impossible to imagine a 3D curved space, it is important to review all the very basic concepts thoroughly for an easy transition to a universe we cannot visualize any more (a curved 3D space in 4D).

To start, let’s assume a static (not expanding or shrinking), flat 2D surface similar to Figure 9.1 and that our 2D friend is observing its universe from point \(A\). One of the most basic ways to parametrize this space is through the Cartesian coordinates (\(x\), \(y\)). In Figure 9.1, the basic axes of these two coordinates are plotted. An infinitesimal change in the direction of each axis is written as \(dx\) and \(dy\). For each point, the infinitesimal changes are parallel with the respective axes and are not shown for clarity. Another very useful way of parameterizing this space is through polar coordinates. For each point, we define a radius (\(r\)) and angle (\(\phi\)) from a fixed (but arbitrary) reference axis. In Figure 9.1 the infinitesimal changes for each polar coordinate are plotted for a random point and a dashed circle is shown for all points with the same radius.

Figure 9.1: Two dimensional Cartesian and polar coordinates on a flat plane.

Assuming a certain position, which can be parameterized as \((x,y)\) or \((r,\phi)\), a general infinitesimal change in its position will place it at the coordinates \((x+dx,y+dy)\) and \((r+dr,\phi+d\phi)\). The distance (on the flat 2D surface) covered by this infinitesimal change in the static universe (\(ds_s\), the subscript signifying the static nature of this universe) can be written as:

$$ds_s^2=dx^2+dy^2=dr^2+r^2d\phi^2$$

The main question is this: how can our 2D friend incorporate the (possible) curvature of its universe when calculating distances? The universe it lives in might equally be a locally flat but globally curved surface like Figure 9.2. The answer to this question, but for a 3D being (us), is the whole purpose of this discussion. So here we want to give our 2D friend (and later, ourselves) the tools to measure distances when the space (that hosts the objects) is curved.

Figure 9.2 assumes a spherical shell with radius \(R\) as the curved 2D plane for simplicity. The spherical shell is tangent to the 2D plane and only touches it at \(A\). The result will be generalized afterwards. The first step in measuring the distance in a curved space is to imagine a third dimension along the \(z\) axis as shown in Figure 9.2. For simplicity, the \(z\) axis is assumed to pass through the center of the spherical shell. Our imaginary 2D friend cannot visualize the third dimension or a curved 2D surface within it, so the remainder of this discussion is purely abstract for it (similar to us being unable to visualize a 3D curved space in 4D). But since we are 3D creatures, we have the advantage of visualizing the following steps. Fortunately our 2D friend knows our mathematics, so it can follow along with us.

Figure 9.2: 2D spherical plane (centered on \(O\)) and flat plane (gray) tangent to it at point \(A\).

With the third axis added, a generic infinitesimal change over the full 3D space corresponds to the distance:

$$ds_s^2=dx^2+dy^2+dz^2=dr^2+r^2d\phi^2+dz^2.$$

It is very important to recognize that this change of distance is for any point in the 3D space, not just those changes that occur on the 2D spherical shell of Figure 9.2. Recall that our 2D friend can only make measurements on the 2D spherical shell, not in the full 3D space. So we have to constrain this general change to changes on the 2D spherical shell. To do that, let’s look at the arbitrary point \(P\) on the 2D spherical shell. Its image (\(P'\)) on the flat plane is also displayed. From the dark triangle, we see that

$$\sin\theta={r\over R},\quad\cos\theta={R-z\over R}.$$

These relations allow our 2D friend to find the value of \(z\) (an abstract dimension for it) as a function of \(r\) (distance on the flat 2D plane, which it can visualize) and thus eliminate \(z\). From \(\sin^2\theta+\cos^2\theta=1\), we get \(z^2-2Rz+r^2=0\), and solving for \(z\), we find:

$$z=R\left(1\pm\sqrt{1-{r^2\over R^2}}\right).$$

The \(\pm\) can be understood from Figure 9.2: for each \(r\), there are two points on the sphere, one in the upper hemisphere and one in the lower hemisphere. An infinitesimal change in \(r\) will create the following infinitesimal change in \(z\):

$$dz={\mp r\over R}\left({1\over \sqrt{1-{r^2/R^2}}}\right)dr.$$

Substituting the positive-signed expression for \(dz\) in the \(ds_s^2\) equation above, we get:

$$ds_s^2={dr^2\over 1-r^2/R^2}+r^2d\phi^2.$$

The derivation above was done for a spherical shell of radius \(R\) as a curved 2D surface. To generalize it to any surface, we can define \(K=1/R^2\) as the curvature parameter. The general infinitesimal change in a static universe can then be written as:

$$ds_s^2={dr^2\over 1-Kr^2}+r^2d\phi^2.$$

Therefore, we see that a positive \(K\) represents a real \(R\), which signifies a closed 2D spherical shell like Figure 9.2. When \(K=0\), we have a flat plane (Figure 9.1), and a negative \(K\) corresponds to an imaginary \(R\). The latter two cases are open universes (where \(r\) can extend to infinity). However, when \(K>0\), we have a closed universe, where \(r\) cannot become larger than \(R\), as in Figure 9.2.

A very important issue that can be discussed now (while we are still in 2D and can actually visualize things) is that \(\overrightarrow{r}\) is tangent to the curved space at the observer’s position. In other words, it is on the gray flat surface of Figure 9.2, even when the universe is curved: \(\overrightarrow{r}=P'-A\). Therefore, for the point \(P\) on a curved space, the raw coordinate \(r\) is the distance to \(P'\), not \(P\). The distance to the point \(P\) (at a specific coordinate \(r\) on the flat plane) on the curved surface (thick line in Figure 9.2) is called the proper distance and is displayed with \(l\). For the specific example of Figure 9.2, the proper distance can be calculated with \(l=R\theta\) (\(\theta\) in radians). Using the \(\sin\theta\) relation found above, we can find \(l\) as a function of \(r\):

$$\theta=\sin^{-1}\left({r\over R}\right)\quad\rightarrow\quad l(r)=R\sin^{-1}\left({r\over R}\right)$$

\(R\) is just an arbitrary constant and can be directly found from \(K\), so for cleaner equations, it is common practice to set \(R=1\), which gives \(l(r)=\sin^{-1}r\). Also note that if \(R=1\), then \(l=\theta\). Generally, depending on the curvature, the proper distance in a static universe can be written as a function of the coordinate \(r\) as (from now on we assume \(R=1\)):

$$l(r)=\sin^{-1}(r)\quad(K>0),\quad\quad l(r)=r\quad(K=0),\quad\quad l(r)=\sinh^{-1}(r)\quad(K<0).$$

With \(l\), the infinitesimal change of distance can be written in the simpler and more abstract form of

$$ds_s^2=dl^2+r^2d\phi^2.$$
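
As a concrete check (taking \(R=1\)): a point at coordinate \(r=0.5\) on a closed (\(K>0\)) surface has proper distance \(l=\sin^{-1}(0.5)=\pi/6\approx0.524\), slightly larger than the flat-space value of \(0.5\), because the path to it curves out of the tangent plane.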

Until now, we had assumed a static universe (not changing with time). But our observations so far appear to indicate that the universe is expanding (it isn’t static). Since there is no reason to expect the observed expansion is unique to our particular position in the universe, we expect the universe to be expanding at all points with the same rate at the same time. Therefore, to add a time dependence to our distance measurements, we can simply add a multiplicative scaling factor, which is a function of time: \(a(t)\). The functional form of \(a(t)\) comes from the cosmology and the physics we assume for it: general relativity.

With this scaling factor, the proper distance will also depend on time. As the universe expands, the distance to a given object will also grow. We thus define a distance measure, or coordinate, that is independent of time and thus doesn’t ‘move’, which we call the comoving distance and display with \(\chi\), such that \(l(r,t)=\chi(r)a(t)\). We have thus shifted the \(r\) dependence of the proper distance we derived above for a static universe into the comoving distance:

$$\chi(r)=\sin^{-1}(r)\quad(K>0),\quad\quad \chi(r)=r\quad(K=0),\quad\quad \chi(r)=\sinh^{-1}(r)\quad(K<0).$$

Therefore, \(\chi(r)\) is the proper distance of an object at a specific reference time \(t=t_r\) (the \(r\) subscript signifies “reference”) when \(a(t_r)=1\). At any arbitrary moment (\(t\neq{t_r}\)) before or after \(t_r\), the proper distance to the object can simply be scaled with \(a(t)\). Measuring the change of distance in a time-dependent (expanding) universe will also involve the speed of the object changing position. Hence, let’s assume that we are only thinking about the change in distance caused by something (light) moving at the speed of light. This speed is postulated as the only constant and frame-of-reference-independent speed in the universe, making our calculations easier. Light is also the major source of information we receive from the universe, so this is a reasonable assumption for most extragalactic studies. We can thus parametrize the change in distance as

$$ds^2=c^2dt^2-a^2(t)ds_s^2 = c^2dt^2-a^2(t)(d\chi^2+r^2d\phi^2).$$


Next: , Previous: , Up: CosmicCalculator   [Contents][Index]

9.1.2 Extending distance concepts to 3D

The concepts of Distance on a 2D curved space are here extended to a 3D space that might be curved in a 4D space. We can start with the generic infinitesimal distance in a static 3D universe, but this time in spherical coordinates instead of polar coordinates. \(\theta\) is shown in Figure 9.2, but here we are 3D beings, positioned on \(O\) (the center of the sphere), and the point \(O\) is tangent to a 4D sphere. In our 3D space, a generic infinitesimal displacement will have the distance:

$$ds_s^2=dx^2+dy^2+dz^2=dr^2+r^2(d\theta^2+\sin^2{\theta}d\phi^2).$$

Like our 2D friend before, we now have to assume an abstract dimension which we cannot visualize. Let’s call the fourth dimension \(w\); then the general change in coordinates in the full four dimensional space will be:

$$ds_s^2=dr^2+r^2(d\theta^2+\sin^2{\theta}d\phi^2)+dw^2.$$

But we can only work on a 3D curved space, so following exactly the same steps and conventions as our 2D friend, we arrive at:

$$ds_s^2={dr^2\over 1-Kr^2}+r^2(d\theta^2+\sin^2{\theta}d\phi^2).$$

In a non-static universe (with a scale factor \(a(t)\)), the distance can be written as:

$$ds^2=c^2dt^2-a^2(t)[d\chi^2+r^2(d\theta^2+\sin^2{\theta}d\phi^2)].$$
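
For light, \(ds=0\), and this metric leads to the standard textbook relation \(\chi(z)=(c/H_0)\int_0^z dz'/E(z')\), where \(E(z)=\sqrt{\Omega_{m,0}(1+z)^3+\Omega_{\Lambda,0}}\) in a flat universe with only matter and a cosmological constant. As a rough illustration (the parameter values are hypothetical and this is not CosmicCalculator’s actual source), the C sketch below evaluates this integral numerically with GSL:

/* comoving.c: comoving distance in a flat universe with matter
   and a cosmological constant (radiation ignored). Compile with:
      $ gcc comoving.c -lgsl -lgslcblas -lm -o comoving          */
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_integration.h>

static double
inv_E(double z, void *params)
{
  double om = *(double *)params;           /* Matter density today. */
  return 1.0/sqrt( om*pow(1.0+z, 3) + (1.0-om) );
}

int
main(void)
{
  double om=0.3, z=2.1;                    /* Hypothetical values.  */
  double c=299792.458, H0=70.0;            /* km/s and km/s/Mpc.    */
  double result, error;
  gsl_function F={ &inv_E, &om };
  gsl_integration_workspace *w=gsl_integration_workspace_alloc(1000);

  gsl_integration_qags(&F, 0.0, z, 0.0, 1e-8, 1000, w, &result, &error);
  printf("Comoving distance to z=%.2f: %.2f Mpc\n", z, (c/H0)*result);

  gsl_integration_workspace_free(w);
  return 0;
}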


Previous: , Up: CosmicCalculator   [Contents][Index]

9.1.3 Invoking CosmicCalculator

CosmicCalculator will calculate cosmological variables based on the input parameters. The executable name is astcosmiccal with the following general template

$ astcosmiccal [OPTION...] ...

One line examples:

## Print basic cosmological properties at redshift 2.5:
$ astcosmiccal -z2.5

## Only print the comoving volume (in Mpc^3) over 4pi steradian to z:
$ astcosmiccal --onlyvolume --redshift=0.8

## Assume Lambda and matter density of 0.7 and 0.3 and print
## basic cosmological parameters for redshift 2.1:
$ astcosmiccal -l0.7 -m0.3 -z2.1

The input parameters can be given as command-line options or in the configuration files, see Configuration files. For a definition of the different parameters, please see the sections prior to this. By default, all the cosmological calculations will be printed in the standard output (the command-line mainly) along with a short description and units.

The options starting with --only will only do that single desired calculation and only print the final number (in the same units as reported by default). These options are very useful when you want to call CosmicCalculator from a script. The resulting number can simply be put into a shell variable (for example vol) with the following line, which will allow you to use the value for any other subsequent operation.

z=3.12
vol=$(astcosmiccal --redshift=$z --onlyvolume)

In a script, this operation might be necessary for a very large number of objects (thousands of galaxies in a catalog, for example), so the fact that all the other default calculations are skipped will also help you get your results faster. If you just want to inspect one of the values by eye, the default output with its descriptions (which come with units) might be more useful. In that case, the following command might be better: the other parameters will also be calculated, but they are so fast that you will not notice on modern computers.

$ astcosmiccal --redshift=0.832 | grep volume
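
For example, one possible way (among many) to tabulate the comoving volume at several redshifts from the shell would be:

$ for z in 0.5 1.0 1.5 2.0; do echo $z $(astcosmiccal -z$z --onlyvolume); done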

The full list of options is shown and described below:

-z FLT
--redshift=FLT

The redshift of interest.

-H FLT
--H0=FLT

Current expansion rate (in km sec\(^{-1}\) Mpc\(^{-1}\)).

-l FLT
--olambda=FLT

Cosmological constant density divided by the critical density in the current Universe (\(\Omega_{\Lambda,0}\)).

-m FLT
--omatter=FLT

Matter (including massive neutrinos) density divided by the critical density in the current Universe (\(\Omega_{m,0}\)).

-r FLT
--oradiation=FLT

Radiation density divided by the critical density in the current Universe (\(\Omega_{r,0}\)).

-v
--onlyvolume

Only print the comoving volume (in units of Mpc\(^3\)) until the desired redshift based on the input parameters. See explanations above for more on these types of options and how to effectively use them.

-d
--onlyabsmagconv

Only print the conversion factor for apparent magnitude to absolute magnitude. Note that this is practically the distance modulus added with \(-2.5\log{(1+z)}\) for the desired redshift based on the input parameters. See the explanations above for more on these types of options and how to effectively use them.


Next: , Previous: , Up: Top   [Contents][Index]

10 Library

Each program in Gnuastro that was discussed in the prior chapters (or any program in general) is a collection of functions that is compiled into one executable file which can communicate directly with the outside world. The outside world in this context is the operating system. By communication, we mean that control is directly passed to a program from the operating system with a (possible) set of inputs and after it is finished, the program will pass control back to the operating system. For programs written in C and C++, the unique main function is in charge of this communication.

Similar to a program, a library is also a collection of functions that is compiled into one executable file. However, unlike programs, libraries don’t have a main function. Therefore they can’t communicate directly with the outside world. This gives you the chance to write your own main function and call library functions from within it. After compiling your program into a binary executable, you just have to link it to the library and you are ready to run (execute) your program. In this way, you can use Gnuastro at a much lower-level, and in combination with other libraries on your system, you can significantly boost your creativity.

This chapter starts with a basic introduction to libraries and how you can use them in Review of library fundamentals. The separate functions in the Gnuastro library are then introduced (classified by context) in Gnuastro library. If you end up routinely using a fixed set of library functions, with a well-defined input and output, it will be much more beneficial if you define a program for the job. Therefore, in its Version controlled source, Gnuastro comes with The TEMPLATE program to easily define your own program(s).


Next: , Previous: , Up: Library   [Contents][Index]

10.1 Review of library fundamentals

Gnuastro’s libraries are written in the C programming language. In Why C programming language?, we have thoroughly discussed the reasons behind this choice. C was actually created to write Unix, thus understanding the way C works can greatly help in effectively using programs and libraries in all Unix-like operating systems. Therefore, in the following subsections some important aspects of C, as it relates to libraries (and thus programs that depend on them) on Unix are reviewed. First we will discuss header files in Headers and then go onto Linking. This section finishes with Summary and example on libraries. If you are already familiar with these concepts, please skip this section and go directly to Gnuastro library.

In theory, a full operating system (or any software) can be written as one function. Such software would not need any headers or linking (which are discussed in the subsections below). However, writing that single function and maintaining it (adding new features, fixing bugs, documentation, etc.) would be a programmer or scientist’s worst nightmare! Furthermore, all the hard work that went into creating it could not be reused in other software: every other programmer or scientist would have to re-invent the wheel. The ultimate purpose behind libraries (which come with headers and have to be linked) is to address this problem and increase modularity: “the degree to which a system’s components may be separated and recombined” (from Wikipedia). The more modular the source code of a program or library, the easier maintaining it will be, and all the hard work that went into creating it can be reused for a wider range of problems.


Next: , Previous: , Up: Review of library fundamentals   [Contents][Index]

10.1.1 Headers

C source code is read from top to bottom in the source file, therefore program components (for example variables, data structures and functions) should all be defined or declared closer to the top of the source file: before they are used. Defining something in C or C++ is jargon for providing its full details. Declaring it, on the other hand, is jargon for only providing the minimum information the compiler needs to pass over it temporarily, with the detailed definition filled in later.

For a function, the declaration only contains the inputs and their data-types along with the output’s type118. The definition adds to the declaration by including the exact details of what operations are done to the inputs to generate the output. As an example, take this simple summation function:

double
sum(double a, double b)
{
  return a + b;
}

What you see above is the definition of this function: it shows you (and the compiler) exactly what it does to the two double type inputs and that the output also has a double type. Note that a function’s internal operations are rarely so simple and short; they can be arbitrarily long and complicated. This unreasonably short and simple function was chosen here for ease of reading. The declaration for this function is:

double
sum(double a, double b);

You can think of a function’s declaration as a building’s address in the city, and the definition as the building’s complete blueprints. When the compiler confronts a call to a function during its processing, it doesn’t need to know anything about how the inputs are processed to generate the output. Just as the postman doesn’t need to know the inner structure of a building when delivering the mail. The declaration (address) is enough. Therefore by declaring the functions once at the start of the source files, we don’t have to worry about defining them after they are used.

Even for a simple real-world operation (not a simple summation like above!), you will soon need many functions (for example, some for reading/preparing the inputs, some for the processing, and some for preparing the output). Although it is technically possible, managing all the necessary functions in one file is not easy and is contrary to the modularity principle (see Review of library fundamentals), for example the functions for preparing the input can be usable in your other projects with a different processing. Therefore, as we will see later (in Linking), the functions don’t necessarily need to be defined in the source file where they are used. As long as their definitions are ultimately linked to the final executable, everything will be fine. For now, it is just important to remember that the functions that are called within one source file must be declared within the source file (declarations are mandatory), but not necessarily defined there.

In the spirit of modularity, it is common to define contextually similar functions in one source file. For example, in Gnuastro, functions that calculate the median, mean and other statistical functions are defined in lib/statistics.c, while functions that deal directly with FITS files are defined in lib/fits.c.

Keeping the definition of similar functions in a separate file greatly helps their management and modularity, but this fact alone doesn’t make things much easier for the caller’s source code: recall that while definitions are optional, declarations are mandatory. So if this was all, the caller would have to manually copy and paste (include) all the declarations from the various source files into the file they are working on now. To address this problem, programmers have adopted the header file convention: the header file of a source code contains all the declarations that a caller would need to be able to use any of its functions. For example, in Gnuastro, lib/statistics.c (file containing function definitions) comes with lib/gnuastro/statistics.h (only containing function declarations).
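
As a minimal sketch (with hypothetical file names), the header corresponding to a source file defining the sum function above might simply be:

/* sum.h: declarations for the hypothetical sum.c. The include
   guard ensures the declarations are merged only once, even if
   several files include this header indirectly.                 */
#ifndef SUM_H
#define SUM_H

double
sum(double a, double b);

#endif  /* SUM_H */

Any caller can then make the declaration available with a single #include line instead of copying it manually.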

The discussion above was mainly focused on functions, however, there are many more programming constructs such as pre-processor macros and data structures. Like functions, they also need to be known to the compiler when it confronts a call to them. So the header file also contains their definitions or declarations when they are necessary for the functions.

Pre-processor macros (or macros for short) are replaced with their defined value by the pre-processor before compilation. Conventionally they are written only in capital letters to be easily recognized. It is just important to understand that the compiler doesn’t see the macros: it sees their fixed values. So when a header specifies macros, you can do your programming without worrying about the actual values. The standard C types (for example int, or float) are very low-level and basic. We can collect multiple C types into a structure for a higher-level way to keep and pass along data. See Generic data container (gal_data_t) for some examples of macros and data structures.

The contents in the header need to be included into the caller’s source code with a special pre-processor command: #include <path/to/header.h>. As the name suggests, the pre-processor goes through the source code prior to the processor (or compiler). One of its jobs is to include, or merge, the contents of files that are mentioned with this directive in the source code. Therefore the compiler sees a single entity containing the contents of the main file and all the included files. This allows you to include many (sometimes thousands of) declarations into your code with only one line. Since the headers are also installed with the library into your system, you don’t even need to keep a copy of them for each separate program, making things even more convenient.

Try opening some of the .c files in Gnuastro’s lib/ directory with a text editor to check out the include directives at the start of the file (after the copyright notice). Let’s take lib/fits.c as an example. You will notice that Gnuastro’s own header files (like gnuastro/fits.h) are indeed within this directory (the fits.h file is in the lib/gnuastro/ directory), while files like stdio.h or string.h are not in this directory (or anywhere within Gnuastro).

On most systems the basic C header files (like stdio.h and string.h mentioned above) are located in /usr/include/119. Your compiler is configured to automatically search that directory (and possibly others), so you don’t have to explicitly mention these directories. When the necessary header files are not in those standard directories, the pre-processor can also search in places other than the current directory. You can specify those directories with this pre-processor option120:

-I DIR

“Add the directory DIR to the list of directories to be searched for header files. Directories named by ’-I’ are searched before the standard system include directories. If the directory DIR is a standard system include directory, the option is ignored to ensure that the default search order for system directories and the special treatment of system headers are not defeated...” (quoted from the GNU Compiler Collection manual). Note that the space between I and the directory is optional and commonly not used.

If the pre-processor can’t find the included files, it will abort with an error. In fact, a common error when building programs that depend on a library is that the compiler doesn’t know where the library’s header is (see Known issues). So you have to manually tell the compiler where to look for the library’s headers with the -I option. For small software with one or two source files, this can be done manually (see Summary and example on libraries). However, to enhance modularity, Gnuastro (and most other programs/libraries) contains many source files, so the compiler is invoked many times121. This makes manual addition or modification of this option practically impossible.

To solve this problem, in the GNU build system, there are conventional environment variables for the various kinds of compiler options (or flags). These environment variables are used in every call to the compiler (they can be empty). The environment variable used for the C Pre-Processor (or CPP) is CPPFLAGS. By giving CPPFLAGS a value once, you can be sure that each call to the compiler will be affected. See Known issues for an example of how to set this variable at configure time.

As described in Installation directory, you can select the top installation directory of a software using the GNU build system, when you ./configure it. All the separate components will be put in their separate sub-directory under that, for example the programs, compiled libraries and library headers will go into $prefix/bin (replace $prefix with a directory), $prefix/lib, and $prefix/include respectively. For enhanced modularity, libraries that contain diverse collections of functions (like GSL, WCSLIB, and Gnuastro), put their header files in a sub-directory unique to themselves. For example all Gnuastro’s header files are installed in $prefix/include/gnuastro. In your source code, you need to keep the library’s sub-directory when including the headers from such libraries, for example #include <gnuastro/fits.h>122. Not all libraries need to follow this convention, for example CFITSIO only has one header (fitsio.h) which is directly installed in $prefix/include.


Next: , Previous: , Up: Review of library fundamentals   [Contents][Index]

10.1.2 Linking

To enhance modularity, similar functions are defined in one source file (with a .c suffix, see Headers for more). After running make, each human-readable, .c file is translated (or compiled) into a computer-readable “object” file (ending with .o). Note that object files are also created when building programs, they aren’t particular to libraries. Try opening Gnuastro’s lib/ and bin/progname/ directories after running make to see these object files123. Afterwards, the object files are linked together to create an executable program or a library.
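
To see these two steps separately for yourself (the file names below are hypothetical, and the flags are the same as in Summary and example on libraries), you can first compile a source file into an object file with GCC’s -c option, then link it in a second call:

## Compile only (produce the object file, don't link):
$ gcc -c myprogram.c -o myprogram.o

## Link the object file with the necessary libraries:
$ gcc myprogram.o -lgnuastro -lm -o myprogram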

The object files contain the full definition of the functions in the respective .c file along with a list of any other function (or generally “symbol”) that is referenced there. To get a list of those functions you can use the nm program which is part of GNU Binutils. For example from the top Gnuastro directory, run:

$ nm bin/arithmetic/arithmetic.o

This will print a list of all the functions (more generally, ‘symbols’) that were used within bin/arithmetic/arithmetic.c along with some further information (for example, a T in the second column shows that the function is defined here, while a U says that it is undefined here). Try opening the .c file to check some of these functions for yourself. Run info nm for more information.

To recap, the compiler created the separate object files mentioned above for each .c file. The linker will then combine all the symbols of the various object files (and libraries) into one program or library. In the case of Arithmetic (a program) the contents of the object files in bin/arithmetic/ are copied (and re-ordered) into one final executable file which we can run from the operating system. When the symbols (computer-readable function definitions in most cases) are copied into the output like this, we call the process static linking. Let’s have a closer look at static linking: we’ll assume you have installed Gnuastro into the default /usr/local/ directory (see Installation directory). If you tried the nm command on one of Arithmetic’s object files above, then with the command below you can confirm that all the functions that were defined in the object files (had a T in the second column) are also defined in the astarithmetic executable:

$ nm /usr/local/bin/astarithmetic

But you will notice that there are still many undefined symbols (have a U in the second column). One class of such functions are Gnuastro’s own library functions that start with ‘gal_’:

$ nm /usr/local/bin/astarithmetic | grep gal_

These undefined symbols (functions) will be linked to the executable every time you run Arithmetic. Therefore they are known as dynamically linked libraries124. When the functions of a library need to be dynamically linked, the library is known as a shared library. As we saw above, static linking is done when the executable is being built. However, when a library is linked dynamically, its symbols are only checked against the available libraries at build time: they are not actually copied into the executable. Every time you run the program, the linker will be activated and will try to link the program to the installed library before it starts. If you want all the libraries to be statically linked to the executables, you have to tell Libtool (which Gnuastro uses for the linking) to disable shared libraries at configure time125:

$ ./configure --disable-shared

Try configuring, statically building and installing Gnuastro with the command above. Then check the gal_ symbols in the installed Arithmetic executable like before. You will see that they are actually copied this time (have a T in the second column). If the second column doesn’t convince you, look at the executable file size with the following command:

$ ls -lh /usr/local/bin/astarithmetic

It should be around 4.2 Megabytes with this static linking. If you configure and build Gnuastro again with shared libraries enabled (which is the default), you will notice that it is roughly 100 Kilobytes! This huge difference would have been very significant in the old days, but with the roughly Terabyte storage drives commonly in use today, it is negligible. Fortunately, output file size is not the only benefit of dynamic linking: since it links to the libraries at run-time (rather than build-time), you don’t have to re-build a higher-level program or library when an update comes for one of the lower-level libraries it depends on. You just install the new low-level library and it will automatically be used the next time in your higher-level tools. To be fair, this also creates a few complications126.

To see a list of all the shared libraries that are needed for a program or a shared library to run, you can use the GNU C library’s ldd127 program, for example:

$ ldd /usr/local/bin/astarithmetic

Library file names start with lib and end with a suffix that depends on their type: in between is the name of the library, for example libgnuastro.a (Gnuastro’s static library) and libgsl.so.0.0.0 (GSL’s shared library).

For those libraries that use GNU Libtool (including Gnuastro and its dependencies), both static and dynamic libraries are built and installed in the $prefix/lib/ directory (see Installation directory). In this way, other programs can make whichever kind of link they want.

To link with a library, the linker needs to know where to find the library. You do that with two separate options to the linker (see Summary and example on libraries for an example):

-L DIR

Will tell the linker to look into DIR for the libraries. For example -L/usr/local/lib, or -L/home/yourname/.local/lib. You can make multiple calls to this option, so the linker looks into several directories. Note that the space between L and the directory is optional and commonly not used.

-lLIBRARY

Specify the unique name of a library to be linked. As discussed above, library file names have fixed parts which must not be given to this option. So -lgsl will guide the linker to either look for libgsl.a or libgsl.so (depending on the type of linking it is supposed to do). You can link many libraries by repeated calls to this option.

Very important: The place of this option on the command line matters. This is often a source of confusion for beginners, so let’s assume you have asked the linker to link with library A using this option. As soon as the linker confronts this option, it looks into the list of the undefined symbols it has found until that point and does a search in library A for any of those symbols. If any pending undefined symbol is found in library A, it is used. After the search in undefined symbols is complete, the contents of library A are completely discarded from the linker’s memory. Therefore, if a later object file or library uses an unlinked symbol in library A, the linker will abort after it has finished its search in all the input libraries or object files.

As an example, Gnuastro’s gal_array_dlog10_array function depends on the log10 function of the C Math library (specified with -lm). So the proper way to link something that uses this function is -lgnuastro -lm. If instead, you give: -lm -lgnuastro the linker will complain and abort.


Previous: , Up: Review of library fundamentals   [Contents][Index]

10.1.3 Summary and example on libraries

After the mostly abstract discussions of Headers and Linking, we’ll give a small tutorial here. But before that, let’s recall the general steps of how your source code is prepared, compiled and linked to the libraries it depends on so you can run it:

  1. The pre-processor includes the header (.h) files into the function definition (.c) files, expands pre-processor macros and generally prepares the human-readable source for compilation (reviewed in Headers).
  2. The compiler will translate (compile) the human-readable contents of each source (merged .c and the .h files, or generally the output of the pre-processor) into the computer-readable code of .o files.
  3. The linker will link the called function definitions from various compiled files to create one unified object. When the unified product has a main function, this function is the product’s only entry point, enabling the operating system or user to directly interact with it, so the product is a program. When the product doesn’t have a main function, the linker’s product is a library and its exported functions can be linked to other executables (it has many entry points).

The GNU Compiler Collection (or GCC for short) will do all three steps. So as a first example, from Gnuastro’s source, go to tests/lib/. This directory contains the library tests; you can use these as some simple tutorials. For this demonstration, we will compile and run arraymanip.c. This small program calls Gnuastro’s library for some simple operations on an array (open it and have a look). To compile this program, run this command inside the directory containing it.

$ gcc arraymanip.c -lgnuastro -lm -o arraymanip

The two -lgnuastro and -lm options (in this order) tell GCC to first link with the Gnuastro library and then with C’s math library. The -o option is used to specify the name of the output executable, without it the output file name will be a.out (on most OSs), independent of your input file name(s).

If your top Gnuastro installation directory (let’s call it $prefix, see Installation directory) is not recognized by GCC, you will get pre-processor errors for unknown header files. Once you fix it, you will get linker errors for undefined functions. To fix both, you should run GCC as follows: additionally telling it which directories it can find Gnuastro’s headers and compiled library (see Headers and Linking):

$ gcc -I$prefix/include -L$prefix/lib arraymanip.c -lgnuastro -lm     \
      -o arraymanip

This single command has done all the pre-processor, compilation and linker operations. Therefore no intermediate files (object files in particular) were created, only a single output executable was created. You are now ready to run the program with:

$ ./arraymanip

The Gnuastro functions called by this program only needed to be linked with the C math library. But if your program needs WCS coordinate transformations, needs to read a FITS file, needs special math operations (which include its linear algebra operations), or you want it to run on multiple CPU threads, you also need to add these libraries in the call to GCC: -lgnuastro -lwcs -lcfitsio -lgsl -lgslcblas -pthread -lm. In Gnuastro library, where each function is documented, it is mentioned which libraries (if any) must also be linked when you call a function. If you feel all these linkings can be confusing, please consider Gnuastro’s BuildProgram program.


Next: , Previous: , Up: Library   [Contents][Index]

10.2 BuildProgram

The number and order of libraries that are necessary for linking a program with Gnuastro’s library might be too confusing when you need to compile a small program for one particular job (with one source file). BuildProgram will use the information gathered when configuring Gnuastro and link with all the appropriate libraries on your system. This allows you to easily compile, link and run programs that use Gnuastro’s library with one simple command, without worrying about which libraries to link to or the linking order.

BuildProgram uses GNU Libtool to find the necessary libraries to link against (GNU Libtool is the same program that builds all of Gnuastro’s libraries and programs when you run make). So in the future, if Gnuastro’s prerequisite libraries change or other libraries are added, you don’t have to worry, you can just run BuildProgram and internal linking will be done correctly.


Previous: , Up: BuildProgram   [Contents][Index]

10.2.1 Invoking BuildProgram

BuildProgram will compile and link a C source program with Gnuastro’s library and all its dependencies, greatly facilitating the compilation and running of small programs that use Gnuastro’s library. The executable name is astbuildprog with the following general template:

$ astbuildprog [OPTION...] C_SOURCE_FILE

One line examples:

## Compile, link and run `myprogram.c':
$ astbuildprog myprogram.c

## Similar to previous, but with optimization and compiler warnings:
$ astbuildprog -Wall -O2 myprogram.c

## Compile and link `myprogram.c', then run it with `image.fits'
## as its argument:
$ astbuildprog myprogram.c image.fits

## Also look in other directories for headers and linking:
$ astbuildprog -Lother -Iother/dir myprogram.c

## Just build (compile and link) `myprogram.c', don't run it:
$ astbuildprog --onlybuild myprogram.c

If BuildProgram is to run, it needs a C programming language source file as input. By default it will compile and link the program to build a final executable file and run it. The built executable’s name can be set with the optional --output option. When no output name is set, BuildProgram will use Gnuastro’s Automatic output and remove the suffix of the input to use as the output name. For the full list of options that BuildProgram shares with other Gnuastro programs, see Common options. You may also use Gnuastro’s Configuration files to specify other libraries/headers to use for special directories, so you don’t have to type them in every time.

The first argument is considered to be the C source file that must be compiled and linked. Any other arguments (non-option tokens on the command-line) will be passed onto the program when BuildProgram wants to run it. Recall that by default BuildProgram will run the program after building it. This behavior can be disabled with the --onlybuild option.

When the --quiet option (see Operating mode options) is not called, BuildProgram will print the compilation and running commands. Once your program grows and you break it up into multiple files (which are much more easily managed with Make), you can use the linking flags of the non-quiet output in your Makefile.

-I STR
--includedir=STR

Directory to search for files that you #include in your C program. Note that headers relating to Gnuastro and its dependencies don’t need this option. This is only necessary if you want to use other headers. It may be called multiple times and order matters. This directory will be searched before those of Gnuastro’s build and also the system search directories. See Headers for a thorough introduction.

From the GNU C Pre-Processor manual: “Add the directory STR to the list of directories to be searched for header files. Directories named by -I are searched before the standard system include directories. If the directory STR is a standard system include directory, the option is ignored to ensure that the default search order for system directories and the special treatment of system headers are not defeated”.

-L STR
--linkdir=STR

Directory to search for compiled libraries to link the program with. Note that all the directories that Gnuastro was built with will already be used by BuildProgram (GNU Libtool). This option is only necessary if your libraries are in other directories. Multiple calls to this option are possible and order matters. This directory will be searched before those of Gnuastro’s build and also the system search directories. See Linking for a thorough introduction.

-l STR
--linklib=STR

Library to link with your program. Note that all the libraries that Gnuastro was built with will already be linked by BuildProgram (GNU Libtool). This option is only necessary if you want to link with other libraries. Multiple calls to this option are possible and order matters. This library will be linked before Gnuastro’s library or its dependencies. See Linking for a thorough introduction.

-O INT/STR
--optimize=INT/STR

Compiler optimization level: 0 (for no optimization, good debugging), 1, 2, 3 (for the highest level of optimizations). From the GNU Compiler Collection (GCC) manual: “Without any optimization option, the compiler’s goal is to reduce the cost of compilation and to make debugging produce the expected results. Statements are independent: if you stop the program with a break point between statements, you can then assign a new value to any variable or change the program counter to any other statement in the function and get exactly the results you expect from the source code. Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program.” Please see your compiler’s manual for the full list of acceptable values to this option.

-g
--debug

Emit extra information in the compiled binary for use by a debugger. When calling this option, it is best to explicitly disable optimization with -O0. To combine both options you can run -gO0 (see Options for how short options can be merged into one).

-W STR
--warning=STR

Print compiler warnings on command-line during compilation. “Warnings are diagnostic messages that report constructions that are not inherently erroneous but that are risky or suggest there may have been an error.” (from the GCC manual). It is always recommended to compile your programs with warnings enabled.

All compiler warning options that start with W are also usable through this option in BuildProgram, see your compiler’s manual for the full list. Some of the most common values to this option are: pedantic (warnings related to standard C) and all (the most commonly useful warnings).

-b
--onlybuild

Only build the program, don’t run it. By default, the built program is immediately run afterwards.


Next: , Previous: , Up: Library   [Contents][Index]

10.3 Gnuastro library

Gnuastro library’s programming constructs (function declarations, macros, data structures, or global variables) are classified by context into multiple header files (see Headers). In this section, the functions in each header will be discussed under a separate sub-section, which includes the name of the header. Assuming a function declaration is in headername.h, you can include its declaration in your source code with:

# include <gnuastro/headername.h>

The names of all constructs in headername.h are prefixed with gal_headername_ (or GAL_HEADERNAME_ for macros). The gal_ prefix stands for GNU Astronomy Library.

Gnuastro library functions are compiled into a single file which can be linked on the command-line with the -lgnuastro option (see Linking and Summary and example on libraries for an introduction to linking and an example). Gnuastro library is a high-level library which depends on lower level libraries for some operations (see Dependencies). Therefore if at least one of the Gnuastro functions in your program uses functions from the dependencies, you will also need to link those dependencies after linking with Gnuastro. The outside libraries that need to be linked for such functions are mentioned following the function name. See BuildProgram for a small Gnuastro program that will take care of the libraries to link against and lets you focus on your exciting science.

Libraries are still under heavy development: Gnuastro was initially created to be a collection of command-line programs. However, as the programs and their shared functions grew, internal (not installed) libraries were added. The libraries became installable with the 0.2 release, so they are currently under heavy development and will significantly evolve between releases, becoming more mature and stable in due time. They will stabilize with the removal of this notice. Check the NEWS file for interface changes. If you use the Info version of this manual (see Info), you don’t have to worry: the documentation will correspond to your installed version.


Next: , Previous: , Up: Gnuastro library   [Contents][Index]

10.3.1 Configuration information (config.h)

The gnuastro/config.h header contains information about the full Gnuastro installation on your system. Gnuastro developers should note that this is the only header that is not available within Gnuastro itself; it is only available to a Gnuastro library user after installation. Within Gnuastro, config.h (which is included in every Gnuastro .c file, see Coding conventions) has more than enough information about the overall Gnuastro installation.

Macro: GAL_CONFIG_VERSION

This macro can be used as a string literal containing the version of Gnuastro that is being used. See Version numbering for the version formats. For example:

printf("Gnuastro version: %s\n", GAL_CONFIG_VERSION);

or

char *gnuastro_version=GAL_CONFIG_VERSION;

Macro: GAL_CONFIG_HAVE_LIBGIT2

Libgit2 is an optional dependency of Gnuastro (see Optional dependencies). When it is installed and detected at configure time, this macro will have a value of 1 (one). Otherwise, it will have a value of 0 (zero). Gnuastro also comes with some wrappers to make it easier to use libgit2 (see Git wrappers (git.h)).
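Such configuration macros are commonly checked with the C pre-processor, so code depending on the missing feature isn’t compiled at all. The following is a minimal sketch (the printed messages are only illustrative):

#include <stdio.h>
#include <gnuastro/config.h>

int
main(void)
{
#if GAL_CONFIG_HAVE_LIBGIT2 == 1
  /* Safe to use Gnuastro's Git wrappers (see git.h) here. */
  printf("Gnuastro was built with libgit2 support.\n");
#else
  /* Avoid any libgit2-dependent features. */
  printf("Gnuastro was built without libgit2 support.\n");
#endif
  return 0;
}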

Macro: GAL_CONFIG_HAVE_WCSLIB_VERSION

WCSLIB is the reference library for world coordinate system transformations (see WCSLIB and World Coordinate System (wcs.h)). However, only relatively recent versions of WCSLIB also provide their version number. If the WCSLIB that is installed on the system provides its version (through the wcslib_version function, which may not exist in older versions), this macro will have a value of one, otherwise it will have a value of zero.

Macro: GAL_CONFIG_HAVE_PTHREAD_BARRIER

The POSIX threads standard defines barriers as an optional feature, so some operating systems choose not to include them. As one of the checks during the ./configure step, Gnuastro checks if your system has POSIX thread barriers. If so, this macro will have a value of 1, otherwise it will have a value of 0. See Implementation of pthread_barrier for more.

Macro: GAL_CONFIG_BIN_OP_UINT8
Macro: GAL_CONFIG_BIN_OP_INT8
Macro: GAL_CONFIG_BIN_OP_UINT16
Macro: GAL_CONFIG_BIN_OP_INT16
Macro: GAL_CONFIG_BIN_OP_UINT32
Macro: GAL_CONFIG_BIN_OP_INT32
Macro: GAL_CONFIG_BIN_OP_UINT64
Macro: GAL_CONFIG_BIN_OP_INT64
Macro: GAL_CONFIG_BIN_OP_FLOAT32
Macro: GAL_CONFIG_BIN_OP_FLOAT64

If binary arithmetic operators were configured for a given type, the respective macro will have a value of 1 (one), otherwise its value will be 0 (zero). Please see the similar configure-time options in Gnuastro configure options for a thorough explanation. These are only relevant for you if you intend to use the binary operators of Arithmetic on datasets (arithmetic.h).

Macro: GAL_CONFIG_SIZEOF_LONG
Macro: GAL_CONFIG_SIZEOF_SIZE_T

The size of (number of bytes in) the system’s long and size_t types. Their values are commonly either 4 or 8 for 32-bit and 64-bit systems respectively. You can also get these values with an expression like ‘sizeof(size_t)’ without having to include this header.
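As a minimal sketch, the two macros might be used for a quick report of the configured sizes (the printed format is only illustrative):

#include <stdio.h>
#include <gnuastro/config.h>

int
main(void)
{
  /* On most modern 64-bit systems, both values will be 8. */
  printf("long: %d bytes, size_t: %d bytes\n",
         GAL_CONFIG_SIZEOF_LONG, GAL_CONFIG_SIZEOF_SIZE_T);
  return 0;
}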


Next: , Previous: , Up: Gnuastro library   [Contents][Index]

10.3.2 Multithreaded programming (threads.h)

In recent years, CPU frequencies have stopped increasing significantly. Instead, CPUs are being manufactured with more cores, enabling more than one operation (thread) at each instant. This can be very useful to speed up many aspects of processing, in particular image processing.

Most of the programs in Gnuastro utilize multi-threaded programming for the CPU intensive processing steps. This can potentially lead to a significant decrease in the running time of a program, see A note on threads. In terms of reading the code, you don’t need to know anything about multi-threaded programming. You can simply follow the case where only one thread is to be used. In these cases, threads are not used and can be completely ignored.

When the C language was defined (when K&R’s book was written), using threads was not common, so C’s threading capabilities aren’t introduced there. Gnuastro uses POSIX threads for multi-threaded programming, defined in the pthread.h system wide header. There are various resources for learning to use POSIX threads. An excellent tutorial, with abundant figures to better understand the concepts, is provided by the Lawrence Livermore National Laboratory; it is a very good place to start. The book ‘Advanced programming in the Unix environment’, by Richard Stevens and Stephen Rago, Addison-Wesley, 2013 (third edition) also has two chapters explaining the POSIX thread constructs, which can be very helpful.

An alternative to POSIX threads is OpenMP, but POSIX threads are low level, allowing much more control while still being easy to understand, see Why C programming language?. All the situations where threads are currently used in Gnuastro are completely independent, with no need for coordination between the threads. Such problems are known as “embarrassingly parallel” problems. They are some of the simplest problems to solve with threads and are also the ones that benefit most from them, see the LLNL introduction.

One very useful POSIX thread concept is pthread_barrier. Unfortunately, it is only an optional feature in the POSIX standard, so some operating systems don’t include it. Therefore, in Implementation of pthread_barrier, we introduce our own implementation. That is a rather technical section, only necessary for more technical readers; you can safely skip it. Following that, we describe the helper functions in this header that can greatly simplify writing a multi-threaded program, see Gnuastro’s thread related functions for more.


Next: , Previous: , Up: Multithreaded programming   [Contents][Index]

10.3.2.1 Implementation of pthread_barrier

One optional feature of the POSIX Threads standard is the pthread_barrier concept. It is a very useful high-level construct that allows independent threads to “wait” behind a “barrier” until all the rest have finished. Barriers can thus greatly simplify the code in a multi-threaded program, so they are heavily used in Gnuastro. However, since it is an optional feature in the POSIX standard, some operating systems don’t include it. So to make Gnuastro portable, we have written our own implementation of those pthread_barrier functions.

At ./configure time, Gnuastro will check if pthread_barrier constructs are available on your system or not. If pthread_barrier is not available, our internal implementation will be compiled into the Gnuastro library and the definitions and declarations below will be usable in your code with #include <gnuastro/threads.h>.

Type: pthread_barrierattr_t

Type to specify the attributes of a POSIX threads barrier.

Type: pthread_barrier_t

Structure defining the POSIX threads barrier.

Function:
int
pthread_barrier_init (pthread_barrier_t *b, pthread_barrierattr_t *attr, unsigned int limit)

Initialize the barrier b, with the attributes attr and a total of limit threads that must wait behind it. This function must be called before spinning off threads.

Function:
int
pthread_barrier_wait (pthread_barrier_t *b)

This function is called within each thread, just before it is ready to return. Once a thread’s function hits this, it will “wait” until all the other threads are also finished.

Function:
int
pthread_barrier_destroy (pthread_barrier_t *b)

Destroy all the information in the barrier structure. This should be called by the function that spun off the threads after all the threads have finished.

Destroy a barrier before re-using it: It is very important to destroy the barrier before (possibly) reusing it. This destroy function not only destroys the internal structures, it also waits (in 1 microsecond intervals, so you will not notice!) until no thread needs the barrier structure any more. If you immediately start spinning off new threads without destroying the barrier, the internal structures of the remaining threads will get mixed with the new ones and you will get very strange and apparently random errors that are extremely hard to debug.
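Putting the functions above together, a minimal sketch of a barrier’s full life-cycle may look like the following (the worker body and the number of threads are hypothetical). Note that limit counts every caller of pthread_barrier_wait: all the workers, plus the spinner-off itself.

#include <stdio.h>
#include <pthread.h>
#include <gnuastro/threads.h>  /* Fallback barrier when needed.    */

#define NUMTHREADS 4

static void *
worker(void *in)               /* Hypothetical worker function.    */
{
  pthread_barrier_t *b=(pthread_barrier_t *)in;
  printf("thread done\n");     /* The actual job would be here.    */
  pthread_barrier_wait(b);     /* Wait for all the other threads.  */
  return NULL;
}

int
main(void)
{
  size_t i;
  pthread_t t[NUMTHREADS];
  pthread_barrier_t b;

  /* `limit' is all the workers, plus this spinner-off thread.     */
  pthread_barrier_init(&b, NULL, NUMTHREADS+1);

  for(i=0;i<NUMTHREADS;++i)
    pthread_create(&t[i], NULL, worker, &b);

  pthread_barrier_wait(&b);    /* Blocks until all workers arrive. */
  pthread_barrier_destroy(&b); /* Always destroy before re-use.    */
  return 0;
}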


Previous: , Up: Multithreaded programming   [Contents][Index]

10.3.2.2 Gnuastro’s thread related functions

The POSIX Threads functions offered in the C library are very low-level and offer a great range of control over the properties of the threads. So if you are interested in customizing your tools for complicated thread applications, you are strongly encouraged to become fully familiar with them. Some resources were introduced in Multithreaded programming (threads.h).

However, in many cases in astronomical data analysis, you don’t need communication between threads and each target operation can be done independently. Since such operations are very common, Gnuastro provides the tools below to facilitate the creation and management of jobs for such operations without any particular knowledge of POSIX Threads. The most interesting high-level functions of this section are gal_threads_number and gal_threads_spin_off, which identify the number of threads on the system and spin off threads respectively. You can see a demonstration of using these functions in Library demo - multi-threaded operation.

C struct: gal_threads_params

Structure keeping the parameters of each thread. When each thread is created, a pointer to this structure is passed to it. The params element can be the pointer to a structure defined by the user which contains all the necessary parameters to pass onto the worker function. The rest of the elements within this structure are set internally by gal_threads_spin_off and are relevant to the worker function.

struct gal_threads_params
{
  size_t            id; /* Id of this thread.                  */
  void         *params; /* User-identified pointer.            */
  size_t       *indexs; /* Target indexs given to this thread. */
  pthread_barrier_t *b; /* Barrier for all threads.            */
};
Function:
size_t
gal_threads_number ()

Return the number of threads that the operating system has available for your program. This number is usually fixed for a single machine and doesn’t change. So this function is useful when you want to run your program on different machines (with different CPUs).

Function:
void
gal_threads_spin_off (void *(*worker)(void *), void *caller_params, size_t numactions, size_t numthreads)

Distribute numactions jobs between numthreads threads and spin-off each thread by calling the worker function. The caller_params pointer will also be passed to worker as part of the gal_threads_params structure. For a fully working example of this function, please see Library demo - multi-threaded operation.
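For a sense of how a worker function fits this structure, the sketch below (the myparams structure and the doubling operation are hypothetical) parses the thread’s row of indexes until GAL_BLANK_SIZE_T and waits behind the barrier before returning; it could be passed to gal_threads_spin_off as its worker argument:

#include <pthread.h>
#include <gnuastro/blank.h>    /* For GAL_BLANK_SIZE_T.            */
#include <gnuastro/threads.h>

struct myparams { double *data; };  /* Hypothetical parameters.    */

static void *
worker_on_thread(void *in_prm)
{
  struct gal_threads_params *tprm=(struct gal_threads_params *)in_prm;
  struct myparams *p=(struct myparams *)tprm->params;
  size_t i, index;

  /* Each row of indexs finishes with GAL_BLANK_SIZE_T. */
  for(i=0; tprm->indexs[i]!=GAL_BLANK_SIZE_T; ++i)
    {
      index=tprm->indexs[i];
      p->data[index] *= 2.0;   /* The actual job on this action.   */
    }

  /* Wait for all the other threads to finish, then return. */
  if(tprm->b) pthread_barrier_wait(tprm->b);
  return NULL;
}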

Function:
void
gal_threads_attr_barrier_init (pthread_attr_t *attr, pthread_barrier_t *b, size_t limit)

This is a low-level function in case you don’t want to use gal_threads_spin_off. It will initialize the general thread attribute attr and the barrier b with limit threads to wait behind the barrier. For maximum efficiency, the threads initialized with this function will be detached. Therefore no communication is possible between these threads and in particular pthread_join won’t work on these threads. You have to use the barrier constructs to wait for all threads to finish.

Function:
void
gal_threads_dist_in_threads (size_t numactions, size_t numthreads, size_t **outthrds, size_t *outthrdcols)

This is a low-level function in case you don’t want to use gal_threads_spin_off. Identify the indexes (starting from 0) of the actions to be done on each thread and write them into the outthrds array. outthrds is treated as a 2D array with numthreads rows and outthrdcols columns. The indexes in each row identify the actions that should be done by one thread. Please see the explanation below to understand the purpose of this operation.

Let’s assume you have \(A\) actions (where there is only one function and the input values differ for each action) and \(T\) threads available to the system with \(A>T\) (common values for these two would be \(A>1000\) and \(T<10\)). Spinning off a thread is not a cheap job and requires a significant number of CPU cycles. Therefore, creating \(A\) threads is not the best way to address such a problem. The most efficient way to manage the actions is such that only \(T\) threads are created, and each thread works on a list of actions identified for it in series (one after the other). This way your CPU will get all the actions done with minimal overhead.

The purpose of this function is to do what we explained above: each row in the outthrds array contains the indexes of actions which must be done by one thread. outthrds contains outthrdcols columns. In using outthrds, you don’t have to know the number of columns. The GAL_BLANK_SIZE_T macro has a role very similar to a string’s \0: every row finishes with this macro, so you can easily stop parsing the indexes in the row when you confront it. Please see the example program in tests/lib/multithread.c for a demonstration.


Next: , Previous: , Up: Gnuastro library   [Contents][Index]

10.3.3 Library data types (type.h)

Data in astronomy can have many types, numeric (numbers) and strings (names, identifiers). The former can also be divided into integers and floats, see Numeric data types for a thorough discussion of the different numeric data types and which one is useful for different contexts.

To deal with the very large diversity of types that are available (and used in different contexts), in Gnuastro each type is identified with a global integer variable with a fixed name. This variable is then passed on to functions that can work on any type, or is stored in Gnuastro’s Generic data container (gal_data_t) as one piece of meta-data.

The actual values within these integer constants are irrelevant and you should never rely on them. When you need to check a type, explicitly use the named variable in the table below. If you want to check against more than one type, you can use C’s switch statement, as in the sketch below.
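A minimal sketch (the type variable is assumed to hold one of the identifiers below, for example taken from a gal_data_t):

switch(type)
  {
  case GAL_TYPE_UINT8:            /* Fall through: both are 8-bit. */
  case GAL_TYPE_INT8:
    printf("8-bit integer\n");
    break;
  case GAL_TYPE_INT32:
    printf("32-bit signed integer\n");
    break;
  default:
    printf("some other type\n");
  }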

Since Gnuastro heavily deals with file input-output, the types it defines are fixed-width types: these are portable to all systems and are defined in the standard C header stdint.h. You don’t need to include this header; it is included by any Gnuastro header that deals with the different types. However, the most commonly used types in a C (or C++) program (for example int or long) are not defined by their exact width (storage), but by their minimum storage. So for example on some systems, int may be 2 bytes (16-bits, the minimum required by the standard) and on others it may be 4 bytes (32-bits, common in modern systems).

With every type, a unique “blank” value (or placeholder showing the absence of data) can be defined. Please see Library blank values (blank.h) for constants that Gnuastro recognizes as a blank value for each type. See Numeric data types for more explanation on the limits and particular aspects of each type.

Global integer: GAL_TYPE_INVALID

This is just a placeholder to specifically mark that no type has been set.

Global integer: GAL_TYPE_BIT

Identifier for a bit-stream. Currently no program in Gnuastro works directly on bits, but features will be added in the future.

Global integer: GAL_TYPE_UINT8

Identifier for an unsigned, 8-bit integer type: uint8_t (from stdint.h), or an unsigned char in most modern systems.

Global integer: GAL_TYPE_INT8

Identifier for a signed, 8-bit integer type: int8_t (from stdint.h), or a signed char in most modern systems.

Global integer: GAL_TYPE_UINT16

Identifier for an unsigned, 16-bit integer type: uint16_t (from stdint.h), or an unsigned short in most modern systems.

Global integer: GAL_TYPE_INT16

Identifier for a signed, 16-bit integer type: int16_t (from stdint.h), or a short in most modern systems.

Global integer: GAL_TYPE_UINT32

Identifier for an unsigned, 32-bit integer type: uint32_t (from stdint.h), or an unsigned int in most modern systems.

Global integer: GAL_TYPE_INT32

Identifier for a signed, 32-bit integer type: int32_t (from stdint.h), or an int in most modern systems.
