GNU Astronomy Utilities



2.2.1 Downloading and validating input data

To get the image, you can use the simple field search tool of SDSS. As long as it is covered by the SDSS, you can find an image containing your desired target either by providing a standard name (if it has one), or its coordinates. To access the dataset we will use here, write NGC5195 in the “Object Name” field and press “Submit” button.

Type the example commands: Try to type the example commands on your terminal and use the history feature of your command-line (by pressing the “up” button to retrieve previous commands). Do not simply copy and paste the commands shown here. This will help simulate future situations when you are processing your own datasets.

You can see the list of available filters under the color image. For this demonstration, we will use the r-band filter image. By clicking on the “r-band FITS” link, you can download the image. Alternatively, you can just run the following command to download it with GNU Wget45. To keep things clean, let’s also put it in a directory called ngc5195. With the -O option, we are asking Wget to save the downloaded file with a more manageable name: r.fits.bz2 (this is an r-band image of NGC 5195, which was the directory name).

$ mkdir ngc5195
$ cd ngc5195
$ topurl=https://dr12.sdss.org/sas/dr12/boss/photoObj/frames
$ wget $topurl/301/3716/6/frame-r-003716-6-0117.fits.bz2 -Or.fits.bz2

When you want to reproduce a previous result (a known analysis, on a known dataset, to get a known result: like the case here!) it is important to verify that the file is correct: that the input file has not changed (on the remote server, or in your own archive), or there was no downloading problem. Otherwise, if the data have changed in your server/archive, and you use the same script, you will get a different result, causing a lot of confusion!

One good way to verify the contents of a file is to store its Checksum in your analysis script and check it before any other operation. The Checksum algorithms look into the contents of a file and calculate a fixed-length string from them. If any change (even in a bit or byte) is made within the file, the resulting string will change, for more see Wikipedia. There are many common algorithms, but a simple one is the SHA-1 algorithm (Secure Hash Algorithm 1) that you can calculate easily with the command below (the second line is the output, and the checksum is the first/long string: it is independent of the file name)

$ sha1sum r.fits.bz2
5fb06a572c6107c72cbc5eb8a9329f536c7e7f65  r.fits.bz2

If the checksum on your computer is different from this, either the file has been incorrectly downloaded (most probable), or it has changed on SDSS servers (very unlikely46). To get a better feeling of checksums open your favorite text editor and make a test file by writing something in it. Save it and calculate the text file’s SHA-1 checksum with sha1sum. Try renaming that file, and you’ll see the checksum has not changed (checksums only look into the contents, not the name/location of the file). Then open the file with your text editor again, make a change and re-calculate its checksum, you’ll see the checksum string has changed.

Its always good to keep this short checksum string with your project’s scripts and validate your input data before using them. You can do this with a shell conditional like this:

filename=r.fits.bz2
expected=5fb06a572c6107c72cbc5eb8a9329f536c7e7f65
sum=$(sha1sum $filename | awk '{print $1}')
if [ $sum = $expected ]; then
  echo "$filename: validated"
else
  echo "$filename: wrong checksum!"
  exit 1
fi

Now that we know you have the same data that we wrote this tutorial with, let’s continue. The SDSS server keeps the files in a Bzip2 compressed file format (that have a .bz2 suffix). So we will first decompress it with the following command to use it as a normal FITS file. By convention, compression programs delete the original file (compressed when uncompressing, or uncompressed when compressing). To keep the original file, you can use the --keep or -k option which is available in most compression programs for this job. Here, we do not need the compressed file any more, so we will just let bunzip delete it for us and keep the directory clean.

$ bunzip2 r.fits.bz2

Footnotes

(45)

To make the command easier to view on screen or in a page, we have defined the top URL of the image as the topurl shell variable. You can just replace the value of this variable with $topurl in the wget command.

(46)

If your checksum is different, try uncompressing the file with the bunzip2 command after this, and open the resulting FITS file. If it opens and you see the image of M51 and NGC5195, then there was no download problem, and the file has indeed changed on the SDSS servers! In this case, please contact us at bug-gnuastro@gnu.org.