GNU Astronomy Utilities

Next: , Previous: , Up: Arithmetic operators   [Contents][Index]

6.2.3.14 Random number generators

When you simulate data (for example, see Sufi simulates a detection), everything is ideal and there is no noise! The final step of the process is to add simulated noise to the data. The operators in this section are designed for that purpose.

mknoise-sigma

Add a fixed noise (Gaussian standard deviation) to each element of the input dataset. This operator takes two arguments: the top/first popped operand is the noise standard deviation, the next popped operand is the dataset that the noise should be added to.

When --quiet is not given, a statement will be printed on each invocation of this operator (if there are multiple calls to the mknoise-*, the statement will be printed multiple times). It will show the random number generator function and seed that was used in that invocation, see Generating random numbers. Reproducibility of the outputs can be ensured with the --envseed option, see below for more.

for example, with the first command below, image.fits will be degraded by a noise of standard deviation 3 units.

$astarithmetic image.fits 3 mknoise-sigma  Alternatively, you can use this operator within column arithmetic in the Table program, to generate a random number like below (centered on 0, with $$\sigma=3$$) like the first command below. With the second command, you can put it into a shell variable for later usage. $ echo 0 | asttable -c'arith $1 3 mknoise-sigma'$ value=$(echo 0 | asttable -c'arith$1 3 mknoise-sigma')
$echo$value


You can also use this operator in combination with AWK to easily generate an arbitrarily large table with random columns. In the example below, we will create a two column table with 20 rows. The first column will be centered on 5 and $$\sigma_1=2$$, the second will be centered on 10 and $$\sigma_2=3$$:

$echo 5 10 \ | awk '{for(i=0;i<20;++i) print$1, $2}' \ | asttable -c'arith$1 2 mknoise-sigma' \
-c'arith $2 3 mknoise-sigma'  By adding an extra --output=random.fits, the table will be saved into a file called random.fits, and you can change the i<20 to i<5000 to have 5000 rows instead. Of course, if your input table has different values in the desired column the noisy distribution will be centered on each input element, but all will have the same scatter/sigma. You can use the --envseed option to fix the random number generator seed (and thus get a reproducible result). For more on --envseed, see Generating random numbers. When using column arithmetic in Table, it may happen that multiple columns need random numbers (with any of the mknoise-* operators) in one call of asttable. In such cases, the value given to GSL_RNG_SEED is incremented by one on every call to the mknoise-* operators. Without this increment, when the column values are the same (happens a lot, for no-noised datasets), the returned values for all columns will be identical. But this feature has a side-effect: that if the order of calling the mknoise-* operators changes, the seeds used for each operator will change134. In case each data element should have an independent sigma, the first popped operand can be a dataset of the same size as the second. In this case, for each element, a different noise measure (for example, sigma in mknoise-sigma) will be used. mknoise-poisson Add Poisson noise to each element of the input dataset (see Photon counting noise). This operator takes two arguments: 1. the first popped operand (just before the operator) is the per-pixel background value (in units of electron counts). 2. The second popped operand is the dataset that the noise should be added to. Recall that the background values reported by observatories (for example, to define dark or gray nights), or in papers, is usually reported in units of magnitudes per arcseconds square. You need to do the conversion to counts per pixel manually. The conversion of magnitudes to counts is described below. For converting arcseconds squared to number of pixels, you can use the --pixelscale option of Fits. for example, astfits image.fits --pixelscale. Except for the noise-model, this operator is very similar to mknoise-sigma and the examples there apply here too. The main difference with mknoise-sigma is that in a Poisson distribution the scatter/sigma will depend on each element’s value. For example, let’s assume you have made a mock image called mock.fits with MakeProfiles and it is assumed zero point is 22.5 (for more on the zero point, see Brightness, Flux, Magnitude and Surface brightness). Let’s assume the background level for the Poisson noise has a value of 19 magnitudes. You can first use the mag-to-counts operator to convert this background magnitude into counts, then feed the background value in counts to mknoise-poisson operator: $ astarithmetic mock.fits 19 22.5 mag-to-counts \
mknoise-poisson


Try changing the background value from 19 to 10 to see the effect! Recall that the tutorial Sufi simulates a detection shows how you can use MakeProfiles to build mock images.

mknoise-uniform

Add uniform noise to each element of the input dataset. This operator takes two arguments: the top/first popped operand is the width of the interval, the second popped operand is the dataset that the noise should be added to (each element will be the center of the interval). The returned random values may happen to be the minimum interval value, but will never be the maximum. Except for the noise-model, this operator behaves very similar to mknoise-sigma, see the explanation there for more.

for example, with the command below, a random value will be selected between 10 to 14 (centered on 12, which is the only input data element, with a total width of 4).

echo 12 | asttable -c'arith $1 4 mknoise-uniform'  Similar to the example in mknoise-sigma, you can pipe the output of echo to awk before passing it to asttable to generate a full column of uniformly selected values within the same interval. random-from-hist-raw Generate random values from a custom distribution (defined by a histogram). The output will have a double-precision floating point type (see Numeric data types). This operator takes three operands: • The first popped operand (nearest to the operator) is the histogram values. The histogram is a 1-dimensional dataset (a table column) and contains the probability of obtaining a certain interval of values. The histogram does not have to be normalized: the GNU Scientific Library (or GSL, which is used by Gnuastro for this operator), will normalize it internally. The value of each bin (whose probability is given in the histogram) is given in the second popped operand. Therefore these two operands have to have the same number of rows. • The second popped operand is the bin value (mostly the bin center, but it can be anything). The probability of each bin is defined in the histogram operand (first popped operand). The bins can have any width (do not have to be evenly spaced), and any order. Just make sure that the same row in the bins column corresponds to the same row in the histogram: the number of rows in the bins and histogram must be equal. • The third popped operand is the dataset that the random values should be written over. Effectively only its size will be used by this operator (all values will be over-written as a double-precision floating point number). The first two operands have to be single-dimensional (a table column) and have the same number of rows, but the last popped operand can have any number of dimensions. You can use the load-col- operator to load the two bins and histogram columns from an external file (see Loading external columns). For example, in the command below, we first construct a fake histogram to represent a $$y=x^2$$ distribution with AWK. We aim to distribute random values from this distribution in a $$100\times100$$ image. Therefore, we use the makenew operator to construct an empty image of that size, use the load-col- operator to load the histogram columns into Arithmetic and put the output in random.fits. Finally we visually inspect random.fits with DS9 and also have a look at its pixel distribution with aststatistics. $ echo "" | awk '{for(i=1;i<5;++i) print i, i*i}' \
> histogram.txt

$cat histogram.txt 1 1 2 4 3 9 4 16$ astarithmetic 100 100 2 makenew \
random-from-hist-raw \
--output=random.fits

$astscript-fits-view random.fits$ aststatistics random.fits --asciihist --numasciibins=50
|                                                 *
|                                                 *
|                                                 *
|                                                 *
|                                 *               *
|                                 *               *
|                                 *               *
|                *                *               *
|                *                *               *
|*               *                *               *
|*               *                *               *
|--------------------------------------------------


As you see, the 10000 pixels in the image only have values 1, 2, 3 or 4 (which were the values in the bins column of histogram.txt), and the number of times each of these values occurs follows the $$y=x^2$$ distribution.

Generally, any value given in the bins column will be used for the final output values. for example, in the command below (for generating a histogram from an analytical function), we are adding the bins by 20 (while keeping the same probability distribution of $$y=x^2$$). If you re-run the Arithmetic command above after this, you will notice that the pixels values are now one of the following 21, 22, 23 or 24 (instead of 1, 2, 3, or 4). But the shape of the histogram of the resulting random distribution will be unchanged.

$echo "" | awk '{for(i=1;i<5;++i) print 20+i, i*i}' \ > histogram.txt  If you do not want the outputs to have exactly the value of the bin identifier, but be a randomly selected value from a uniform distribution within the bin, you should use random-from-hist (see below). As mentioned above, the output will have a double-precision floating point type (see Numeric data types). Therefore, by default each element of the output will consume 8 bytes (64-bits) of storage. This is usually far more than the statistical error/precision of your data (and just results in wasted storage in your file system, or wasted RAM when a program that uses the data is being run, and a slower running time of the program). It is therefore recommended to use a type-conversion operator after this operator to put the output in the smallest type that can be used to safely store your data without wasting storage, RAM or time. For the list of type conversion operators, see Numerical type conversion operators. Recall that you already know the values returned by this operator (they are one of the values in the bins column). For example, in the example above, the whole image only has values 1, 2, 3 or 4. Since they are always positive and are below 255, we can safely place them in an unsigned 8-bit integer (see Numeric data types) with the command below (note the uint8 after the operator name, and that we are using a different name for the output). After building the new image, let’s have a look at the sizes of the two images with ls -l: $ astarithmetic 100 100 2 makenew \
random-from-hist-raw uint8 \
--output=random-u8.fits

$ls -lh random.fits random-u8.fits -rw-r--r-- 1 name name 85K Jan 01 13:40 random.fits -rw-r--r-- 1 name name 17K Jan 01 13:45 random-u8.fits  As you see, when using a suitable data type, we can shrink the size of the file significantly without loosing any information (from 85 kilobytes to 17 kilobytes). This difference can be felt much better for larger (real-world) datasets, so be sure to always set the output data type after calling this operator. random-from-hist Similar to random-from-hist-raw, but do not return the exact bin value, instead return a random value from a uniform distribution within each bin. Therefore the following limitations have to be taken into account (compared to random-from-hist-raw): • The number associated with each bin (in the bin column) should be its center. • The bins have to be in descending order (so the second row in the bin column is larger than the first). • The bin widths (distance from one bin to another) have to be fixed. For a demonstration, let’s replace random-from-hist-raw with random-from-hist in the example of the description of random-from-hist-raw. Note how we are manually converting the output of this operator into single-precision floating point (32-bit, since the default 64-bit precision is statistically meaningless in this scenario and we do not want to waste storage, memory and running time): $ echo "" | awk '{for(i=1;i<5;++i) print i, i*i}' \
> histogram.txt

$astarithmetic 100 100 2 makenew \ load-col-1-from-histogram.txt \ load-col-2-from-histogram.txt \ random-from-hist float32 \ --output=random.fits$ aststatistics random.fits --asciihist --numasciibins=50
|                                          *
|                                      *** ********
|                                      ************
|                                     *************
|                             *     * *************
|                         * ***********************
|                         *************************
|                         *************************
|             *************************************
|********* * **************************************
|**************************************************
|--------------------------------------------------


You can see that the pixels of histogram.fits are no longer just 1, 2, 3 or 4. Instead, the values within each bin are selected from a uniform distribution covering that bin. This creates the step-like feature in the histogram of the output.

Of course, this extra uniform random number generation can make your program slower so be sure to check if it is worth it. In particular, one way to avoid this (and use random-from-hist-raw with a more contiguous-looking output distribution) is to simply use a higher-resolution histogram (assuming it is possible: you have a sufficient number of data points, or you have an analytical expression that you can sample at smaller bin sizes).

To better demonstrate this operator and its practical usage in everyday research, let’s look at another example: Assume you want to get 100 random star magnitudes that follow the real-world Gaia Data release 3 magnitude distribution within a radius of 2 degrees around the (RA,Dec) coordinate of (1.23,4.56). Let’s further assume that you want to distribute them uniformly over an image of size 1000 by 1000 pixels. So your desired output table should have three columns, the first two are pixel positions of each star, and the third is the magnitude.

First, we need to query the Gaia database and ask for all the magnitudes in this region of the sky. We know that Gaia is not complete for stars fainter than the 20th magnitude, so we will use the --range option and only ask for those stars that are brighter than magnitude 20.

$astquery gaia --dataset=dr3 --center=1.23,3.45 --radius=2 \ --column=phot_g_mean_mag --output=gaia.fits \ --range=phot_g_mean_mag,-inf,20  We now have more than 25000 magnitudes in gaia.fits! To get a more accurate random sampling of our stars, let’s construct a histogram with 500 bins, and generate our three desired randomly selected columns: $ aststatistics gaia.fits --histogram --numbins=500 \
--output=gaia-hist.fits

$asttable gaia-hist.fits -i$ echo 1000 \
| awk '{for(i=0;i<100;++i) print $1/2}' \ | asttable -c'arith$1 500 mknoise-uniform' \
-c'arith $1 500 mknoise-uniform' \ -c'arith$1 \
random-from-hist float32'


These columns can easily be placed in the format for MakeProfiles to be inserted into an image automatically.

Footnotes

(134)

We have defined Task 15971 in Gnuastro’s project management system to address this. If you need this feature please send us an email at bug-gnuastro@gnu.org (to motivate us in its implementation).

Next: Elliptical shape operators, Previous: Numerical type conversion operators, Up: Arithmetic operators   [Contents][Index]