GNU Astronomy Utilities

2.1.12 NoiseChisel optimization for storage

As we showed before (in NoiseChisel and Multi-Extension FITS files), NoiseChisel’s output is a multi-extension FITS file with several images the same size as the input. As the input datasets get larger this output can become hard to manage and waste a lot of storage space. Fortunately there is a solution to this problem (which is also useful for Segment’s outputs).

In this small section we will take a short detour to show this feature. Please note that the outputs generated here are not needed for the rest of the tutorial. But first, let’s have a look at the contents/HDUs and volume of NoiseChisel’s output from NoiseChisel optimization for detection (fast answer, it is larger than 100 mega-bytes):

$ astfits nc/xdf-f160w.fits
$ ls -lh nc/xdf-f160w.fits

Two options can drastically decrease NoiseChisel’s output file size: 1) With the --rawoutput option, NoiseChisel will not create a Sky-subtracted output. After all, it is redundant: you can always generate it by subtracting the SKY extension from the input image (which you have in your database) using the Arithmetic program. 2) With the --oneelempertile, you can tell NoiseChisel to store its Sky and Sky standard deviation results with one pixel per tile (instead of many pixels per tile). So let’s run NoiseChisel with these options, then have another look at the HDUs and the over-all file size:

$ astnoisechisel flat-ir/xdf-f160w.fits --oneelempertile --rawoutput \
$ astfits nc-for-storage.fits
$ ls -lh nc-for-storage.fits

See how nc-for-storage.fits has four HDUs, while nc/xdf-f160w.fits had five HDUs? As explained above, the missing extension is INPUT-NO-SKY. Also, look at the sizes of the SKY and SKY_STD HDUs, unlike before, they are not the same size as DETECTIONS, they only have one pixel for each tile (group of pixels in raw input). Finally, you see that nc-for-storage.fits is just under 8 mega bytes (while nc/xdf-f160w.fits was 100 mega bytes)!

But we are not yet finished! You can even be more efficient in storage, archival or transferring NoiseChisel’s output by compressing this file. Try the command below to see how NoiseChisel’s output has now shrunk to about 250 kilo-byes while keeping all the necessary information as the original 100 mega-byte output.

$ gzip --best nc-for-storage.fits
$ ls -lh nc-for-storage.fits.gz

We can get this wonderful level of compression because NoiseChisel’s output is binary with only two values: 0 and 1. Compression algorithms are highly optimized in such scenarios.

You can open nc-for-storage.fits.gz directly in SAO DS9 or feed it to any of Gnuastro’s programs without having to decompress it. Higher-level programs that take NoiseChisel’s output (for example, Segment or MakeCatalog) can also deal with this compressed image where the Sky and its Standard deviation are one pixel-per-tile. You just have to give the “values” image as a separate option, for more, see Segment and MakeCatalog.

Segment (the program we will introduce in the next section for identifying sub-structure), also has similar features to optimize its output for storage. Since this file was only created for a fast detour demonstration, let’s keep our top directory clean and move to the next step:

rm nc-for-storage.fits.gz