How to run simultaneous operations (GNU Astronomy Utilities)

Previous: A note on threads, Up: Multi-threaded operations [Contents][Index]

4.4.2 How to run simultaneous operations ¶

There are two¹²² approaches to simultaneously execute a program: using GNU Parallel or Make (GNU Make is the most common implementation). The first is very useful when you only want to do one job multiple times and want to get back to your work without actually keeping the command you ran. The second is usually for more important operations, with lots of dependencies between the different products (for example, a full scientific research).

GNU Parallel ¶

When you only want to run multiple instances of a command on different threads and get on with the rest of your work, the best method is to use GNU parallel. Surprisingly GNU Parallel is one of the few GNU packages that has no Info documentation but only a Man page, see Info. So to see the documentation after installing it please run

$ man parallel

As an example, let’s assume we want to crop a region fixed on the pixels (500, 600) with the default width from all the FITS images in the ./data directory ending with sci.fits to the current directory. To do this, you can run:

$ parallel astcrop --numthreads=1 --xc=500 --yc=600 ::: \
  ./data/*sci.fits

GNU Parallel can help in many more conditions, this is one of the simplest, see the man page for lots of other examples. For absolute beginners: the backslash (\) is only a line breaker to fit nicely in the page. If you type the whole command in one line, you should remove it.

Make ¶

Make is a program for building “targets” (e.g., files) using “recipes” (a set of operations) when their known “prerequisites” (other files) have been updated. It elegantly allows you to define dependency structures for building your final output and updating it efficiently when the inputs change. It is the most common infra-structure to build software today.

Scientific research methodology is very similar to software development: you start by testing a hypothesis on a small sample of objects/targets with a simple set of steps. As you are able to get promising results, you improve the method and use it on a larger, more general, sample. In the process, you will confront many issues that have to be corrected (bugs in software development jargon). Make is a wonderful tool to manage this style of development.

Besides the raw data analysis pipeline, Make has been used to for producing reproducible papers, for example, see the reproduction pipeline of the paper introducing NoiseChisel (one of Gnuastro’s programs). In fact the NoiseChisel paper’s Make-based workflow was the foundation of a parallel project called Maneage (Managing data lineage): http://maneage.org that is described more fully in Akhlaghi et al. 2021. Therefore, it is a very useful tool for complex scientific workflows.

GNU Make¹²³ is the most common implementation which (similar to nearly all GNU programs, comes with a wonderful manual¹²⁴). Make is very basic and simple, and thus the manual is short (the most important parts are in the first roughly 100 pages) and easy to read/understand.

Make comes with a --jobs (-j) option which allows you to specify the maximum number of jobs that can be done simultaneously. For example, if you have 8 threads available to your operating system. You can run:

$ make -j8

With this command, Make will process your Makefile and create all the targets (can be thousands of FITS images for example) simultaneously on 8 threads, while fully respecting their dependencies (only building a file/target when its prerequisites are successfully built). Make is thus strongly recommended for managing scientific research where robustness, archiving, reproducibility and speed¹²⁵ are important.

Footnotes

(122)

A third way would be to open multiple terminal emulator windows in your GUI, type the commands separately on each and press Enter once on each terminal, but this is far too frustrating, tedious and prone to errors. It’s therefore not a realistic solution when tens, hundreds or thousands of operations (your research targets, multiplied by the operations you do on each) are to be done.

(123)

https://www.gnu.org/software/make/

(124)

https://www.gnu.org/software/make/manual/

(125)

Besides its multi-threaded capabilities, Make will only rebuild those targets that depend on a change you have made, not the whole work. For example, if you have set the prerequisites properly, you can easily test the changing of a parameter on your paper’s results without having to re-do everything (which is much faster). This allows you to be much more productive in easily checking various ideas/assumptions of the different stages of your research and thus produce a more robust result for your exciting science.