GNU Astronomy Utilities



11.2.1 Text functions for Makefiles

The functions described below operate on simple strings (plain text). They are therefore generic (not limited to astronomy/FITS), but because they are commonly necessary in astronomical data analysis pipelines and are not available anywhere else, we have included them in Gnuastro. The names of these functions start with ast-text-* and each has a fully working example to demonstrate its usage.

$(ast-text-to-upper STRING)

Returns the input string but with all characters in UPPER-CASE. For example, the following minimal Makefile will print FOOO BAAR UGGH.

load /usr/local/lib/libgnuastro_make.so

list   = fOOo bAar UggH
ulist := $(ast-text-to-upper $(list))
all:; echo $(ulist)

$(ast-text-to-lower STRING)

Returns the input string but with all characters in lower-case. For example, the following minimal Makefile will print fooo baar uggh.

load /usr/local/lib/libgnuastro_make.so

list  = fOOo bAar UggH
llist := $(ast-text-to-lower $(list))
all:; echo $(llist)

$(ast-text-contains STRING, TEXT)

Returns all white-space-separated words in TEXT that contain the STRING, removing any words that do not match. For example, the following minimal Makefile will only print bAaz Aah (the words of the list that contain Aa).

load /usr/local/lib/libgnuastro_make.so

list = fooo baar bAaz uggh Aah
all:
     echo $(ast-text-contains Aa, $(list))

This can be thought of as Make’s own filter function, if filter accepted a pattern with two % wildcards, like $(filter %Aa%,$(list)) for the example above. In fact, the first sentence describing this function is taken from the first sentence of the Make manual’s description of the filter function! Unfortunately, however, Make’s filter function only accepts a single % wildcard, not two!
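To make these semantics concrete, here is a minimal Python model of what ast-text-contains returns, written only from the description above (it is not Gnuastro’s own code, and the function names are hypothetical). Keeping the words that contain STRING anywhere is the same as matching the two-wildcard glob *STRING*, which is exactly the %Aa% pattern that Make’s filter cannot express; the sketch checks that both formulations agree on the example list.

```python
from fnmatch import fnmatchcase


def text_contains(string, text):
    """Model of ast-text-contains: keep the words of TEXT that contain STRING.

    The match is case-sensitive, as the Makefile example implies
    ('baar' is dropped even though it contains 'aa').
    """
    return " ".join(word for word in text.split() if string in word)


def text_contains_glob(string, text):
    """The same selection, written as the two-wildcard pattern '*STRING*'."""
    return " ".join(w for w in text.split() if fnmatchcase(w, f"*{string}*"))


words = "fooo baar bAaz uggh Aah"
print(text_contains("Aa", words))                                      # bAaz Aah
print(text_contains("Aa", words) == text_contains_glob("Aa", words))  # True
```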

$(ast-text-not-contains STRING, TEXT)

Returns all white-space-separated words in TEXT that do not contain the STRING, removing any words that do contain it. This is the inverse of the ast-text-contains function. For example, the following minimal Makefile will print fooo baar uggh.

load /usr/local/lib/libgnuastro_make.so

list = fooo baar bAaz uggh Aah
all:
     echo $(ast-text-not-contains Aa, $(list))
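In the same kind of hypothetical Python model (a sketch of the documented behaviour, not Gnuastro’s code), ast-text-not-contains keeps exactly the words that ast-text-contains drops, so the two results together partition the input list:

```python
def text_contains(string, text):
    """Model of ast-text-contains (case-sensitive substring match)."""
    return " ".join(word for word in text.split() if string in word)


def text_not_contains(string, text):
    """Model of ast-text-not-contains: the words that do NOT contain STRING."""
    return " ".join(word for word in text.split() if string not in word)


words = "fooo baar bAaz uggh Aah"
kept = text_not_contains("Aa", words)
print(kept)  # fooo baar uggh

# Every word of the list ends up in exactly one of the two results.
both = text_contains("Aa", words).split() + kept.split()
print(sorted(both) == sorted(words.split()))  # True
```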

$(ast-text-prev TARGET, LIST)

Returns the word in LIST that comes immediately before TARGET. If TARGET is the first word of LIST, or is not in LIST at all, this function will return an empty string (nothing). If any of the arguments is an empty string (or only contains white-space characters like ‘SPACE’, ‘TAB’ or new-line), this function will return an empty string (having no effect in Make).
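The rules above can be summarized in a small Python model. This is a sketch of the documented behaviour only, not Gnuastro’s implementation; the function name is hypothetical, and using the first occurrence of a repeated TARGET is an assumption of the sketch.

```python
def text_prev(target, text):
    """Model of the documented behaviour of ast-text-prev.

    Returns the word just before TARGET in TEXT, or "" when TARGET is the
    first word, is absent, or when any argument is blank. If TARGET occurs
    more than once, this sketch uses its first occurrence (an assumption).
    """
    target = target.strip()
    words = text.split()
    if not target or not words or target not in words:
        return ""
    position = words.index(target)
    return words[position - 1] if position > 0 else ""


subs = "a.fits b.fits c.fits d.fits"
print(repr(text_prev("c.fits", subs)))  # 'b.fits'
print(repr(text_prev("a.fits", subs)))  # '' (first word)
print(repr(text_prev("x.fits", subs)))  # '' (not in the list)
print(repr(text_prev("   ", subs)))     # '' (blank argument)
```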

One scenario where this function can be useful is when you want a list of higher-level targets to always be executed in sequence (even when Make is run in parallel), while their lower-level prerequisites are executed in parallel.

The fully working example below shows this in practice: the “final” target depends on the sub-components a.fits, b.fits, c.fits and d.fits, and each of these depends on seven sub-sub-components (for example a.fits depends on a-1.fits, a-2.fits, a-3.fits, ...). Without this function, Make will first build all the sub-sub-components, then the sub-components and ultimately the final target.

When the files are small and there aren’t too many of them, this is not a problem. But when you have hundreds or thousands of sub-sub-components, your computer may not have the capacity to hold them all in storage or RAM (during processing). In such cases, you want the sub-components to be built in series, but the sub-sub-components of each sub-component to be built in parallel. This function makes that easy, as in the example below: the sub-sub-components of each sub-component depend on the previous sub-component.

To see the effect of this function, put the example below in a Makefile and run make -j12 (to simultaneously execute 12 jobs); then comment out or remove this function (so the targets in $(subsubs) have no prerequisites) and re-run make -j12.

# Basic settings
all: final
.SECONDEXPANSION:
load /usr/local/lib/libgnuastro_make.so

# 4 sub-components (alphabetic), each with 7
# sub-sub-components (numeric).
subids = a b c d
subsubids = 1 2 3 4 5 6 7
subs := $(foreach s, $(subids), $(s).fits)
subsubs := $(foreach s, $(subids), \
             $(foreach ss, $(subsubids), \
               $(s)-$(ss).fits))

# Build the sub-sub-components (each depends on the previous sub-component):
$(subsubs): %.fits: $$(ast-text-prev \
                       $$(word 1, $$(subst -, ,%)).fits, \
                       $(subs))
        @echo "$@: $^"

# Build the sub-components (each depends on its own sub-sub-components)
$(subs): %.fits: $$(foreach s, $(subsubids), %-$$(s).fits)
        @echo "$@: $^"

# Final
final: $(subs)
        @echo "$@: $^"

As you see, when this function is present, the sub-sub-components of each sub-component are executed in parallel, while at each moment, only a single sub-component’s prerequisites are being made. Without this function, Make first builds all the sub-sub-components, then goes to the sub-components. There can be any number of component levels between these, allowing this operation to be as complex as necessary in your data analysis pipeline. Unfortunately the .NOTPARALLEL target of GNU Make doesn’t allow this level of customization.
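To see why the example serializes the sub-components, the Python sketch below rebuilds the example’s dependency graph (with a simplified model of ast-text-prev) and computes the earliest “wave” in which each target could start. It assumes unlimited parallel jobs and equal run times for every recipe; this is a deliberate simplification, since Make’s real scheduling also depends on -j and on how long each recipe takes. All names are hypothetical.

```python
SUBIDS = ["a", "b", "c", "d"]
SUBSUBIDS = [1, 2, 3, 4, 5, 6, 7]
SUBS = [f"{s}.fits" for s in SUBIDS]


def text_prev(target, words):
    """Documented behaviour of ast-text-prev: previous word, or '' if first/absent."""
    if target not in words:
        return ""
    i = words.index(target)
    return words[i - 1] if i > 0 else ""


def build_graph(use_prev):
    """Prerequisites of every target in the example Makefile."""
    graph = {"final": list(SUBS)}
    for s in SUBIDS:
        subsubs = [f"{s}-{ss}.fits" for ss in SUBSUBIDS]
        graph[f"{s}.fits"] = subsubs
        for t in subsubs:
            # Same as $(word 1, $(subst -, ,%)).fits: 'b-3.fits' -> 'b.fits'
            parent = t.split("-")[0] + ".fits"
            prev = text_prev(parent, SUBS) if use_prev else ""
            graph[t] = [prev] if prev else []
    return graph


def waves(graph):
    """Earliest wave (0, 1, 2, ...) in which each target can start."""
    memo = {}

    def level(t):
        if t not in memo:
            # Targets without prerequisites start in wave 0.
            memo[t] = 1 + max((level(p) for p in graph[t]), default=-1)
        return memo[t]

    return {t: level(t) for t in graph}


for use_prev in (True, False):
    w = waves(build_graph(use_prev))
    print(f"use_prev={use_prev}: b-1.fits={w['b-1.fits']}, "
          f"d-7.fits={w['d-7.fits']}, final={w['final']}")
# use_prev=True: b-1.fits=2, d-7.fits=6, final=8
# use_prev=False: b-1.fits=0, d-7.fits=0, final=2
```

With the function, b-1.fits cannot start before a.fits is done, so at most seven sub-sub-components are in flight at once; without it, all 28 are ready in the first wave.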

$(ast-text-prev-batch TARGET, NUM, LIST)

Returns the previous batch of NUM words in LIST (in relation to the batch containing TARGET). NUM will be interpreted as an unsigned integer and cannot be zero. If any of the arguments is an empty string (or only contains white-space characters like ‘SPACE’, ‘TAB’ or new-line), this function will return an empty string (having no effect in Make). In the special case that NUM=1, this is equivalent to the ast-text-prev function that is described above.
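A minimal Python model of this behaviour is sketched below (not Gnuastro’s code; the name is hypothetical). LIST is cut into consecutive batches of NUM words, and the whole batch before the one holding TARGET is returned. Returning an empty string for an absent TARGET and raising an exception for NUM below 1 are assumptions of the sketch, and NUM is taken as a Python integer rather than as text.

```python
def text_prev_batch(target, num, text):
    """Model of the documented behaviour of ast-text-prev-batch.

    Returns the whole batch of NUM words that precedes the batch holding
    TARGET, or "" for the first batch. Treating an absent TARGET as "" and
    rejecting NUM < 1 with an exception are assumptions of this sketch.
    """
    if num < 1:
        raise ValueError("NUM must be a positive integer")
    target = target.strip()
    words = text.split()
    if not target or not words or target not in words:
        return ""
    batch = words.index(target) // num  # 0-based index of TARGET's batch
    if batch == 0:
        return ""
    return " ".join(words[(batch - 1) * num : batch * num])


targets = " ".join(f"a-{i}.fits" for i in range(1, 16))
print(text_prev_batch("a-3.fits", 4, targets))   # (empty line: first batch)
print(text_prev_batch("a-7.fits", 4, targets))   # a-1.fits a-2.fits a-3.fits a-4.fits
print(text_prev_batch("a-13.fits", 4, targets))  # a-9.fits a-10.fits a-11.fits a-12.fits
print(text_prev_batch("a-7.fits", 1, targets))   # a-6.fits (NUM=1 acts like ast-text-prev)
```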

Here is one scenario where this function is useful: in astronomy, datasets can easily be very large. Therefore, some Make recipes in your pipeline may require a lot of memory, such that executing them on all the available threads (for example 12 threads with -j12) will immediately occupy all your RAM and crash your pipeline. However, let’s assume that you have sufficient RAM to execute 4 targets of those recipes in parallel. Therefore, while you want all the other steps of your pipeline to use all 12 threads, you want one rule to only build 4 targets at any time. Before starting to use this function, also see ast-text-prev-batch-by-ram.

The example below demonstrates the usage of this function in a minimal working example of the scenario above: we want to build 15 targets, but in batches of 4 targets at a time, irrespective of how many threads Make was executed with.

load /usr/local/lib/libgnuastro_make.so

.SECONDEXPANSION:

targets := $(foreach i,$(shell seq 15),a-$(i).fits)

all: $(targets)

$(targets): $$(ast-text-prev-batch $$@,4,$(targets))
        @echo "$@: $^"

If you place the example above in a plain-text file called Makefile (correcting for the TAB at the start of the recipe), and run Make on 12 threads like below, you will see the following output. The targets in each batch are not ordered (and the order may change in different runs) because they have been run in parallel.

$ make -j12
a-1.fits:
a-3.fits:
a-2.fits:
a-4.fits:
a-5.fits: a-1.fits a-2.fits a-3.fits a-4.fits
a-6.fits: a-1.fits a-2.fits a-3.fits a-4.fits
a-8.fits: a-1.fits a-2.fits a-3.fits a-4.fits
a-7.fits: a-1.fits a-2.fits a-3.fits a-4.fits
a-9.fits: a-5.fits a-6.fits a-7.fits a-8.fits
a-11.fits: a-5.fits a-6.fits a-7.fits a-8.fits
a-12.fits: a-5.fits a-6.fits a-7.fits a-8.fits
a-10.fits: a-5.fits a-6.fits a-7.fits a-8.fits
a-13.fits: a-9.fits a-10.fits a-11.fits a-12.fits
a-15.fits: a-9.fits a-10.fits a-11.fits a-12.fits
a-14.fits: a-9.fits a-10.fits a-11.fits a-12.fits

Any other rule that is later added to this Makefile (whether it depends on these targets, or they depend on it) will still be run on 12 threads.

$(ast-text-prev-batch-by-ram TARGET, NEEDED_RAM_GB, LIST)

Similar to ast-text-prev-batch, but instead of taking the number of words/files in each batch, this function takes the maximum amount of RAM that is needed by one instance of the recipe: through the NEEDED_RAM_GB argument, you should specify the amount of RAM (in gigabytes) that a single instance of the recipe in this rule needs. If any of the arguments is an empty string (or only contains white-space characters like ‘SPACE’, ‘TAB’ or new-line), this function will return an empty string (having no effect in Make). When the needed RAM is larger than the available RAM, only one job will be done at a time (similar to ast-text-prev).

The number of files in each batch is calculated internally by reading the available RAM on the system at the moment Make calls this function. Therefore this function is more generalizable to different computers (with very different RAM and/or CPU threads). But to avoid overlapping with other rules that may consume a lot of RAM, it is better to design your Makefile such that other rules are only executed once all instances of this rule have been completed.

For example, assume every instance of one rule in your Makefile requires a maximum of 5.2 GB of RAM during its execution, and your computer has 32 GB of RAM and 2 threads. In this case, you do not need to manage the targets at all: at the worst moment your pipeline will consume 10.4 GB of RAM (much smaller than the 32 GB of RAM that you have). Now suppose you later run the same pipeline on another machine with identical RAM, but 12 threads! In this case, you will need \(5.2\times12=62.4\) GB of RAM; but the new system doesn’t have that much RAM, so your pipeline will crash. If you used the ast-text-prev-batch function (described above) to manage these hardware limitations, you would have to manually change the number on every new system; this is inconvenient, can cause many bugs, and requires manual intervention (so your pipeline is no longer automatic).

The ast-text-prev-batch-by-ram function was designed as a solution to the problem above: it reads the amount of available RAM at the time that Make starts (before the recipes in your pipeline are actually executed). From the value of NEEDED_RAM_GB, it then estimates how many instances of that recipe can be executed in parallel without breaching the available RAM of the system. Therefore it is important not to run another heavy RAM consumer on the system while your pipeline is being executed. Note that this function reads the available RAM, not the total RAM; it therefore accounts for the background operations of the operating system or graphical user environment that are running in parallel to your pipeline, and assumes they will remain at the same level.
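The estimate can be sketched in Python as below. This assumes the batch size is the floor of the available RAM divided by NEEDED_RAM_GB, with a minimum of one; the text does not state the exact rounding, but this rule does reproduce the batches of 6 in the 27 GB example that follows. The helper names are hypothetical, and reading the available RAM from the operating system is left out.

```python
import math


def batch_size(available_gb, needed_gb):
    """Assumed rule: how many recipe instances fit in the available RAM.

    At least one instance is always allowed, matching the text: when the
    needed RAM exceeds the available RAM, jobs run one at a time.
    """
    if needed_gb <= 0:
        raise ValueError("NEEDED_RAM_GB must be positive")
    return max(1, math.floor(available_gb / needed_gb))


def peak_ram(needed_gb, jobs, batch):
    """Worst-case RAM when at most min(jobs, batch) instances run together."""
    return needed_gb * min(jobs, batch)


n = batch_size(27, 4.2)  # about 27 GB available, 4.2 GB per instance
print(n)                                # 6
print(round(peak_ram(4.2, 12, n), 1))   # 25.2 (fits in 27 GB)
print(round(peak_ram(4.2, 12, 12), 1))  # 50.4 (12 unbatched jobs)
print(batch_size(3, 4.2))               # 1 (one job at a time)
```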

The fully working example below shows the usage of this function in a scenario where we assume the recipe requires 4.2GB of RAM for each target.

load /usr/local/lib/libgnuastro_make.so

.SECONDEXPANSION:

targets := $(foreach i,$(shell seq 13),$(i).fits)

all: $(targets)

$(targets): $$(ast-text-prev-batch-by-ram $$@,4.2,$(targets))
        @echo "$@: $^"

Once the contents above are placed in a Makefile and you execute the command below on a system with about 27 GB of available RAM (total RAM is 32 GB; the 5 GB difference is used by the operating system and other background programs), you will get an output like the one below.

$ make -j12
1.fits:
2.fits:
3.fits:
4.fits:
5.fits:
6.fits:
7.fits: 1.fits 2.fits 3.fits 4.fits 5.fits 6.fits
8.fits: 1.fits 2.fits 3.fits 4.fits 5.fits 6.fits
11.fits: 1.fits 2.fits 3.fits 4.fits 5.fits 6.fits
10.fits: 1.fits 2.fits 3.fits 4.fits 5.fits 6.fits
9.fits: 1.fits 2.fits 3.fits 4.fits 5.fits 6.fits
12.fits: 1.fits 2.fits 3.fits 4.fits 5.fits 6.fits
13.fits: 7.fits 8.fits 9.fits 10.fits 11.fits 12.fits

Depending on the amount of available RAM on your system, you will get a different output. To see the effect, you can decrease or increase the amount of required RAM (4.2 in the example above).

What is the maximum RAM required by my command? Put a ‘/usr/bin/time --format=%M’ prefix before your full command (including any options and arguments). For example, for a call to Gnuastro’s Warp program:

/usr/bin/time --format=%M astwarp image.fits

After the regular outputs of the program, you will see a number on the last line. This number is the maximum RAM used (in kilobytes) during the execution of the program. You can then convert it to gigabytes (to feed into this function) by dividing it by \(10^6\).
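The conversion is simple enough to script. In the Python sketch below, the helper names are hypothetical, the sample text is made up, and rounding up to one decimal place (so the value passed as NEEDED_RAM_GB errs on the safe side) is a suggestion of this sketch rather than something the function requires. It follows the text’s convention of dividing the kilobyte value by \(10^6\); GNU time writes its report to standard error, which is why the number appears after the program’s own output.

```python
import math


def kb_to_gb(kb):
    """Convert the %M value (kilobytes) to gigabytes, dividing by 10**6."""
    return kb / 10**6


def needed_ram_gb(time_report):
    """Hypothetical helper: take the last line of the captured text (the %M
    value) and round it up to one decimal place for NEEDED_RAM_GB."""
    kb = int(time_report.strip().splitlines()[-1])
    return math.ceil(kb_to_gb(kb) * 10) / 10


# Made-up captured text whose last line is a %M value of 4187532 kilobytes.
sample = "...program output...\n4187532\n"
print(kb_to_gb(4187532))      # 4.187532
print(needed_ram_gb(sample))  # 4.2
```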