Pooling operators (GNU Astronomy Utilities)

6.2.4.9 Pooling operators

Pooling is one way of reducing the complexity of the input image by grouping multiple input pixels into one output pixel (using any statistical measure). As a result, the output image has fewer pixels (less complexity). In Computer Vision, Pooling is commonly used in Convolutional Neural Networks (CNNs).

In pooling, the inputs are an image (e.g., a FITS file) and the size (in pixels) of a square window that is known as the pooling window. The window has to be smaller than the input in both dimensions and its width is called the “pool size”. The pooling window starts at the top-left corner pixel of the input and calculates the requested statistic on the pixels that overlap with it. It then slides forward by “stride” pixels, moving over all pixels in the input from the top-left corner to the bottom-right corner, and repeats the same calculation for the overlapping pixels at each position.

Usually, the stride (or spacing between the windows as they slide over the input) is equal to the pool size. In other words, in pooling, the separate “windows” usually do not overlap with each other on the input. However, you can choose any size for the stride. Keep in mind that the stride should not be larger than the pool size; otherwise, some pixels will be missed during the pooling process. There are two major differences with Spatial domain convolution or Filtering (smoothing) operators, while pooling has some similarities to Warp:

In convolution or filtering, the input and output sizes are the same. However, when the stride is larger than 1, the output of pooling will have fewer pixels.
In convolution or filtering, the kernel slides over the input pixel by pixel. As a result, the same pixel’s value will be used in many of the output pixels. However, in pooling each input pixel is used in only a single output pixel (if the stride and the pool size are the same).
Special cases of Warping an image are similar to pooling. For example, calling pool-sum with a pool size of 2 will give the same pixel values (except at the outer edges) as giving the same image to astwarp with --scale=1/2 --centeroncorner. However, warping will only provide the sum of the input pixels; there is no easy way to generically define something like pool-max in Warp (which is far more general than pooling). Also, due to its generic features (for example for non-linear warps), Warp is slower than the pool-max that is introduced here.
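
To see why a stride larger than the pool size skips pixels, here is a minimal Python sketch that lists which pixels of a one-dimensional row fall inside at least one window. The function name covered_pixels is hypothetical (it is not part of Gnuastro), and the sketch assumes that windows start at index 0 and advance by the stride.

```python
def covered_pixels(length, pool_size, stride):
    """Return the sorted indices of a 1D row that fall in at least one window."""
    covered = set()
    for start in range(0, length, stride):
        # A window spans [start, start + pool_size), clipped to the row.
        covered.update(range(start, min(start + pool_size, length)))
    return sorted(covered)


# Stride equal to the pool size: every pixel is read.
print(covered_pixels(6, 2, 2))  # [0, 1, 2, 3, 4, 5]
# Stride larger than the pool size: pixels 2 and 5 are never read.
print(covered_pixels(6, 2, 3))  # [0, 1, 3, 4]
```

With a stride smaller than the pool size no pixel is skipped either, but neighboring windows then overlap and share input pixels.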

No WCS in output: As of Gnuastro 0.22, the output of pooling will not contain WCS information (primarily due to a lack of time by developers). Please inform us of your interest in having it, by contacting us at bug-gnuastro@gnu.org. If you need pool-sum, you can use Warp (which also modifies the WCS, see note above).

If the width or height of the input is not divisible by the stride, the pooling window will go beyond the input pixel grid. In this case, the window pixels that do not overlap with the input are given a blank value (and are thus ignored in the calculation of the desired statistic).

The simple ASCII figure below shows the pooling operation where the input is a \(3\times3\) pixel image, with a pool size of 2 pixels and a stride of 2 pixels. In the center of the second row, you see the intermediate input matrix that highlights how the input and output pixels relate to each other. Since the input is \(3\times3\) and the stride is 2, blank pseudo-pixels (shown as B, for blank) are added, as mentioned above.

        Pool window:                             Input:
        +-----------+                           +-------------+
        |     |     |                           | 10   12   9 |
        | _ _ | _ _ |___________________________| 31   4    1 |
        |     |     |       ||          ||      | 16   5    8 |
        |     |     |       ||          ||      +-------------+
        +-----------+       ||          ||
    The pooling window 2*2  ||          ||
           stride 2         \/          \/
                        +---------------------+
                        |/ 10   12\|/ 9    B \|
                        |          |          |
  +-------+  pool-min   |\ 31   4 /|\ 1    B /|   pool-max  +-------+
  | 4   1 |   /------   |---------------------|   ------\   |31   9 |
  | 5   8 |   \------   |/ 16   5 \|/ 8    B \|   ------/   |16   8 |
  +-------+             |          |          |             +-------+
                        |\ B    B /.\ B    B /|
                        +---------------------+
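
The figure above can be reproduced with a short Python sketch. This is only an illustration of the idea, not Gnuastro's implementation: the function name pool is hypothetical, blank pixels are represented with None, and the output size is assumed to be the input size divided by the stride and rounded up (which is consistent with the \(3\times3\) example).

```python
import math


def pool(image, pool_size, stride, stat):
    """Pool a 2D list of numbers with 'stat'; None marks a blank pixel."""
    height, width = len(image), len(image[0])
    # Assumption: one window per stride step, so sizes round up.
    out_h = math.ceil(height / stride)
    out_w = math.ceil(width / stride)
    output = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            values = []
            for y in range(oy * stride, oy * stride + pool_size):
                for x in range(ox * stride, ox * stride + pool_size):
                    # Positions outside the input are blank and skipped.
                    if y < height and x < width and image[y][x] is not None:
                        values.append(image[y][x])
            row.append(stat(values) if values else None)
        output.append(row)
    return output


image = [[10, 12, 9],
         [31, 4, 1],
         [16, 5, 8]]
print(pool(image, 2, 2, min))  # [[4, 1], [5, 8]]
print(pool(image, 2, 2, max))  # [[31, 9], [16, 8]]
```

Passing another callable (for example sum) as the last argument gives the corresponding statistic over the same windows.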

The choice of the statistic to use depends on the specific use case, the characteristics of the input data, and the desired output. Each statistic has its advantages and disadvantages and the choice of which to use should be informed by the specific needs of the problem at hand. Below, the various pool operators of arithmetic are listed:

pool-max

Apply max-pooling on the input dataset. This operator takes three operands: the first popped operand is the stride, the second is the width of the square pooling window (which should be a single integer) and the third is the input image. Within the pooling window, this operator will place the largest value in the output pixel (any blank pixels will be ignored).

See the ASCII diagram above for a demonstration of how max-pooling works. Here is an example of using this operator:

$ astarithmetic image.fits 2 2 pool-max
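
Since both numbers in the command above are 2, it does not show which one is the stride. The Python sketch below mimics the popping order described above: operands are popped from a stack, so the first popped operand is the last one written before the operator. The function parse_pool_call and the values 3 and 2 are hypothetical and only serve to tell the two operands apart.

```python
def parse_pool_call(tokens):
    """Split 'IMAGE POOL_SIZE STRIDE OPERATOR' into named operands."""
    stack = list(tokens[:-1])      # operands, in command-line order
    operator = tokens[-1]
    stride = int(stack.pop())      # first popped: last operand written
    pool_size = int(stack.pop())   # second popped
    image = stack.pop()            # third popped: the input image
    return {"operator": operator, "image": image,
            "pool_size": pool_size, "stride": stride}


# A pool size of 3 with a stride of 2 (values chosen to tell them apart).
print(parse_pool_call(["image.fits", "3", "2", "pool-max"]))
```

In other words, reading from left to right, the operands are the image, the pool size and then the stride.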

Max-pooling retains the largest value of the input window in the output, so the returned image is sharper where you have a strong signal-to-noise ratio and noisier in regions with no significant signal (only noise). It is therefore useful when the background of the image is dark and we are interested in only the highest signal-to-noise ratio regions of the image.

pool-min

Apply min-pooling on the input dataset. This operator takes three operands: the first popped operand is the stride, the second is the width of the square pooling window (which should be a single integer) and the third is the input image. Except for the statistic that is used, this operator is similar to pool-max; see the description there for more.

Min-pooling is mostly used when the image has a high signal-to-noise ratio and a light background: min-pooling will select darker (lower-valued) pixels. For low signal-to-noise regions, this operator will increase the noise level (similar to the maximum, the scatter in the minimum is very strong).

pool-sum

Apply sum-pooling on the input dataset. This operator takes three operands: the first popped operand is the stride, the second is the width of the square pooling window (which should be a single integer) and the third is the input image. Except for the statistic that is used, this operator is similar to pool-max; see the description there for more.

Sum-pooling will increase the signal-to-noise ratio at the cost of having a smoother output (less resolution).
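
As a rough worked example of this gain, assume that each of the \(n\) pixels in the window has the same signal \(s\) and independent noise with the same standard deviation \(\sigma\) (these symbols and assumptions are introduced here only for illustration). The summed signal is then \(ns\), while the independent noise adds in quadrature to \(\sqrt{n}\,\sigma\):

\[ \left(\frac{S}{N}\right)_{\rm sum} = \frac{ns}{\sqrt{n}\,\sigma} = \sqrt{n}\,\frac{s}{\sigma} \]

For a pool size of 2 (so \(n=4\)), the signal-to-noise ratio of each output pixel would therefore be twice that of a single input pixel under these assumptions.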

pool-mean

Apply mean pooling on the input dataset. This operator takes three operands: the first popped operand is the stride, the second is the width of the square pooling window (which should be a single integer) and the third is the input image. Except for the statistic that is used, this operator is similar to pool-max; see the description there for more.

The mean pooling method smooths out the image, so sharp features may not be identifiable when it is used. It preserves more information than max-pooling, but may also reduce the effect of the most prominent pixels. Mean is often used where a more accurate representation of the input is required.

pool-median

Apply median pooling on the input dataset. This operator takes three operands: the first popped operand is the stride, the second is the width of the square pooling window (which should be a single integer) and the third is the input image. Except for the statistic that is used, this operator is similar to pool-max; see the description there for more.

In general, the mean is mathematically easier to interpret but more susceptible to outliers, while the median is less affected by outliers and therefore gives a smoother image. The median is thus better for low signal-to-noise ratio (noisy) features and extended features (where you don’t want a single high or low valued pixel to affect the output).
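
The different sensitivity of the two statistics to outliers can be seen with a small Python sketch that uses the standard statistics module. The pixel values are made up for illustration: three faint pixels and one outlier (for example a cosmic ray hit) in a single \(2\times2\) window.

```python
import statistics

# A 2x2 window, flattened: three faint pixels and one outlier
# (illustrative values only).
window = [10.0, 11.0, 9.0, 250.0]

print(statistics.mean(window))    # 70.0
print(statistics.median(window))  # 10.5
```

The single outlier pulls the mean far away from the typical pixel value, while the median stays close to it.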