***************** Digital Filtering ***************** Introduction ============ The filters discussed in this chapter are based on the following moving data window which is centered on :math:`i`-th sample: .. only:: not texinfo .. math:: W_i^H = \left\{ x_{i-H}, \dots, x_i, \dots, x_{i+H} \right\} .. only:: texinfo :: W_i^H = { x_{i-H}, ..., x_i, ..., x_{i+H} } Here, :math:`H` is a non-negative integer called the *window half-length*, which represents the number of samples before and after sample :math:`i`. The total window length is :math:`K = 2 H + 1`. Handling Endpoints ================== When processing samples near the ends of the input signal, there will not be enough samples to fill the window :math:`W_i^H` defined above. Therefore the user must specify how to construct the windows near the end points. This is done by passing an input argument of type :type:`gsl_filter_end_t`: .. type:: gsl_filter_end_t This data type specifies how to construct windows near end points and can be selected from the following choices: .. macro:: GSL_FILTER_END_PADZERO With this option, a full window of length :math:`K` will be constructed by inserting zeros into the window near the signal end points. Effectively, the input signal is modified to .. only:: not texinfo .. math:: \tilde{x} = \{ \underbrace{0, \dots, 0}_{H \textrm{ zeros}}, x_1, x_2, \dots, x_{n-1}, x_n, \underbrace{0, \dots, 0}_{H \textrm{ zeros} } \} .. only:: texinfo :: x~ = { 0, ..., 0, x_1, x_2, ..., x_{n-1}, x_n, 0, ..., 0 } to ensure a well-defined window for all :math:`x_i`. .. macro:: GSL_FILTER_END_PADVALUE With this option, a full window of length :math:`K` will be constructed by padding the window with the first and last sample in the input signal. Effectively, the input signal is modified to .. only:: not texinfo .. math:: \tilde{x} = \{ \underbrace{x_1, \dots, x_1}_{H}, x_1, x_2, \dots, x_{n-1}, x_n, \underbrace{x_n, \dots, x_n}_{H} \} .. only:: texinfo :: x~ = { x_1, ..., x_1, x_1, x_2, ..., x_{n-1}, x_n, x_n, ..., x_n } .. macro:: GSL_FILTER_END_TRUNCATE With this option, no padding is performed, and the windows are simply truncated as the end points are approached. Linear Digital Filters ====================== Gaussian Filter --------------- The Gaussian filter convolves the input signal with a Gaussian kernel or window. This filter is often used as a smoothing or noise reduction filter. The Gaussian kernel is defined by .. only:: not texinfo .. math:: G(k) = e^{-\frac{1}{2} \left( \alpha \frac{k}{(K-1)/2} \right)^2} = e^{-k^2/2\sigma^2} .. only:: texinfo :: G(k) = e^{-1/2 ( \alpha k/((K-1)/2) )^2} = e^{-k^2/2\sigma^2} for :math:`-(K-1)/2 \le k \le (K-1)/2`, and :math:`K` is the size of the kernel. The parameter :math:`\alpha` specifies the number of standard deviations :math:`\sigma` desired in the kernel. So for example setting :math:`\alpha = 3` would define a Gaussian window of length :math:`K` which spans :math:`\pm 3 \sigma`. It is often more convenient to specify the parameter :math:`\alpha` rather than the standard deviation :math:`\sigma` when constructing the kernel, since a fixed value of :math:`\alpha` would correspond to the same shape of Gaussian regardless of the size :math:`K`. The appropriate value of the standard deviation depends on :math:`K` and is related to :math:`\alpha` as .. only:: not texinfo .. math:: \sigma = \frac{K - 1}{2\alpha} .. only:: texinfo :: \sigma = (K - 1)/(2 \alpha) The routines below accept :math:`\alpha` as an input argument instead of :math:`\sigma`. The Gaussian filter offers a convenient way of differentiating and smoothing an input signal in a single pass. Using the derivative property of a convolution, .. only:: not texinfo .. math:: \frac{d}{dt} \left( G * x \right) = \frac{dG}{dt} * x .. only:: texinfo :: d/dt ( G * x ) = dG/dt * x the input signal :math:`x(t)` can be smoothed and differentiated at the same time by convolution with a derivative Gaussian kernel, which can be readily computed from the analytic expression above. The same principle applies to higher order derivatives. .. function:: gsl_filter_gaussian_workspace * gsl_filter_gaussian_alloc(const size_t K) This function initializes a workspace for Gaussian filtering using a kernel of size :data:`K`. Here, :math:`H = K / 2`. If :math:`K` is even, it is rounded up to the next odd integer to ensure a symmetric window. The size of the workspace is :math:`O(K)`. .. function:: void gsl_filter_gaussian_free(gsl_filter_gaussian_workspace * w) This function frees the memory associated with :data:`w`. .. function:: int gsl_filter_gaussian(const gsl_filter_end_t endtype, const double alpha, const size_t order, const gsl_vector * x, gsl_vector * y, gsl_filter_gaussian_workspace * w) This function applies a Gaussian filter parameterized by :data:`alpha` to the input vector :data:`x`, storing the output in :data:`y`. The derivative order is specified by :data:`order`, with :code:`0` corresponding to a Gaussian, :code:`1` corresponding to a first derivative Gaussian, and so on. The parameter :data:`endtype` specifies how the signal end points are handled. It is allowed for :data:`x` = :data:`y` for an in-place filter. .. function:: int gsl_filter_gaussian_kernel(const double alpha, const size_t order, const int normalize, gsl_vector * kernel) This function constructs a Gaussian kernel parameterized by :data:`alpha` and stores the output in :data:`kernel`. The parameter :data:`order` specifies the derivative order, with :code:`0` corresponding to a Gaussian, :code:`1` corresponding to a first derivative Gaussian, and so on. If :data:`normalize` is set to :code:`1`, then the kernel will be normalized to sum to one on output. If :data:`normalize` is set to :code:`0`, no normalization is performed. Nonlinear Digital Filters ========================= The nonlinear digital filters described below are based on the window median, which is given by .. only:: not texinfo .. math:: m_i = \textrm{median} \left\{ W_i^H \right\} = \textrm{median} \left\{ x_{i-H}, \dots, x_i, \dots, x_{i+H} \right\} .. only:: texinfo :: m_i = median { W_i^H } = median { x_{i-H}, ..., x_i, ..., x_{i+H} } The median is considered robust to local outliers, unlike the mean. Median filters can preserve sharp edges while at the same removing signal noise, and are used in a wide range of applications. Standard Median Filter ---------------------- The *standard median filter* (SMF) simply replaces the sample :math:`x_i` by the median :math:`m_i` of the window :math:`W_i^H`: This filter has one tuning parameter given by :math:`H`. The standard median filter is considered highly resistant to local outliers and local noise in the data sequence :math:`\{x_i\}`. .. function:: gsl_filter_median_workspace * gsl_filter_median_alloc(const size_t K) This function initializes a workspace for standard median filtering using a symmetric centered moving window of size :data:`K`. Here, :math:`H = K / 2`. If :math:`K` is even, it is rounded up to the next odd integer to ensure a symmetric window. The size of the workspace is :math:`O(7K)`. .. function:: void gsl_filter_median_free(gsl_filter_median_workspace * w) This function frees the memory associated with :data:`w`. .. function:: int gsl_filter_median(const gsl_filter_end_t endtype, const gsl_vector * x, gsl_vector * y, gsl_filter_median_workspace * w) This function applies a standard median filter to the input :data:`x`, storing the output in :data:`y`. The parameter :data:`endtype` specifies how the signal end points are handled. It is allowed to have :data:`x` = :data:`y` for an in-place filter. Recursive Median Filter ----------------------- The *recursive median filter* (RMF) is a modification of the SMF to include previous filter outputs in the window before computing the median. The filter's response is .. only:: not texinfo .. math:: y_i = \textrm{median} \left( y_{i-H}, \dots, y_{i-1}, x_i, x_{i+1}, \dots, x_{i+H} \right) .. only:: texinfo :: y_i = median ( y_{i-H}, ..., y_{i-1}, x_i, x_{i+1}, ..., x_{i+H} ) Sometimes, the SMF must be applied several times in a row to achieve adequate smoothing (i.e. a cascade filter). The RMF, on the other hand, converges to a *root sequence* in one pass, and can sometimes provide a smoother result than several passes of the SMF. A root sequence is an input which is left unchanged by the filter. So there is no need to apply a recursive median filter twice to an input vector. .. function:: gsl_filter_rmedian_workspace * gsl_filter_rmedian_alloc(const size_t K) This function initializes a workspace for recursive median filtering using a symmetric centered moving window of size :data:`K`. Here, :math:`H = K / 2`. If :math:`K` is even, it is rounded up to the next odd integer to ensure a symmetric window. The size of the workspace is :math:`O(K)`. .. function:: void gsl_filter_rmedian_free(gsl_filter_rmedian_workspace * w) This function frees the memory associated with :data:`w`. .. function:: int gsl_filter_rmedian(const gsl_filter_end_t endtype, const gsl_vector * x, gsl_vector * y, gsl_filter_rmedian_workspace * w) This function applies a recursive median filter to the input :data:`x`, storing the output in :data:`y`. The parameter :data:`endtype` specifies how the signal end points are handled. It is allowed to have :data:`x` = :data:`y` for an in-place filter. Impulse Detection Filter ------------------------ Impulsive noise is characterized by short sequences of data points distinct from those in the surrounding neighborhood. This section describes a powerful class of filters, also known as *impulse rejection filters* and *decision-based filters*, designed to detect and remove such outliers from data. The filter's response is given by .. only:: not texinfo .. math:: y_i = \left\{ \begin{array}{cc} x_i, & |x_i - m_i| \le t S_i \\ m_i, & |x_i - m_i| > t S_i \end{array} \right. .. only:: texinfo :: y_i = { x_i, |x_i - m_i| <= t * S_i { m_i, |x_i - m_i| > t * S_i where :math:`m_i` is the median value of the window :math:`W_i^H`, :math:`S_i` is a robust estimate of the scatter or dispersion for the window :math:`W_i^H`, and :math:`t` is a tuning parameter specifying the number of scale factors needed to determine that a point is an outlier. The main idea is that the median :math:`m_i` will be unaffected by a small number of outliers in the window, and so a given sample :math:`x_i` is tested to determine how far away it is from the median in terms of the local scale estimate :math:`S_i`. Samples which are more than :math:`t` scale estimates away from the median are labeled as outliers and replaced by the window median :math:`m_i`. Samples which are less than :math:`t` scale estimates from the median are left unchanged by the filter. Note that when :math:`t = 0`, the impulse detection filter is equivalent to the standard median filter. When :math:`t \rightarrow \infty`, it becomes the identity filter. This means the impulse detection filter can be viewed as a "less aggressive" version of the standard median filter, becoming less aggressive as :math:`t` is increased. Note that this filter modifies only samples identified as outliers, while the standard median filter changes all samples to the local median, regardless of whether they are outliers. This fact, plus the additional flexibility offered by the additional tuning parameter :math:`t` can make the impulse detection filter a better choice for some applications. It is important to have a robust and accurate scale estimate :math:`S_i` in order to detect impulse outliers even in the presence of noise. The window standard deviation is not typically a good choice, as it can be significantly perturbed by the presence of even one outlier. GSL offers the following choices (specified by a parameter of type :type:`gsl_filter_scale_t`) for computing the scale estimate :math:`S_i`, all of which are robust to the presence of impulse outliers. .. type:: gsl_filter_scale_t This type specifies how the scale estimate :math:`S_i` of the window :math:`W_i^H` is calculated. .. macro:: GSL_FILTER_SCALE_MAD This option specifies the median absolute deviation (MAD) scale estimate, defined by .. only:: not texinfo .. math:: S_i = 1.4826 \times \textrm{median} \left\{ | W_i^H - m_i | \right\} .. only:: texinfo :: S_i = 1.4826 median { | W_i^H - m_i | } This choice of scale estimate is also known as the *Hampel filter* in the statistical literature. See :ref:`here ` for more information. .. macro:: GSL_FILTER_SCALE_IQR This option specifies the interquartile range (IQR) scale estimate, defined as the difference between the 75th and 25th percentiles of the window :math:`W_i^H`, .. only:: not texinfo .. math:: S_i = 0.7413 \left( Q_{0.75} - Q_{0.25} \right) .. only:: texinfo :: S_i = 0.7413 ( Q_{0.75} - Q_{0.25} ) where :math:`Q_p` is the p-quantile of the window :math:`W_i^H`. The idea is to throw away the largest and smallest 25% of the window samples (where the outliers would be), and estimate a scale from the middle 50%. The factor :math:`0.7413` provides an unbiased estimate of the standard deviation for Gaussian data. .. macro:: GSL_FILTER_SCALE_SN This option specifies the so-called :math:`S_n` statistic proposed by Croux and Rousseeuw. See :ref:`here ` for more information. .. macro:: GSL_FILTER_SCALE_QN This option specifies the so-called :math:`Q_n` statistic proposed by Croux and Rousseeuw. See :ref:`here ` for more information. .. warning:: While the scale estimates defined above are much less sensitive to outliers than the standard deviation, they can suffer from an effect called *implosion*. The standard deviation of a window :math:`W_i^H` will be zero if and only if all samples in the window are equal. However, it is possible for the MAD of a window to be zero even if all the samples in the window are not equal. For example, if :math:`K/2 + 1` or more of the :math:`K` samples in the window are equal to some value :math:`x^{*}`, then the window median will be equal to :math:`x^{*}`. Consequently, at least :math:`K/2 + 1` of the absolute deviations :math:`|x_j - x^{*}|` will be zero, and so the MAD will be zero. In such a case, the Hampel filter will act like the standard median filter regardless of the value of :math:`t`. Caution should also be exercised if dividing by :math:`S_i`. .. function:: gsl_filter_impulse_workspace * gsl_filter_impulse_alloc(const size_t K) This function initializes a workspace for impulse detection filtering using a symmetric moving window of size :data:`K`. Here, :math:`H = K / 2`. If :math:`K` is even, it is rounded up to the next odd integer to ensure a symmetric window. The size of the workspace is :math:`O(6K)`. .. function:: void gsl_filter_impulse_free(gsl_filter_impulse_workspace * w) This function frees the memory associated with :data:`w`. .. function:: int gsl_filter_impulse(const gsl_filter_end_t endtype, const gsl_filter_scale_t scale_type, const double t, const gsl_vector * x, gsl_vector * y, gsl_vector * xmedian, gsl_vector * xsigma, size_t * noutlier, gsl_vector_int * ioutlier, gsl_filter_impulse_workspace * w) These functions apply an impulse detection filter to the input vector :data:`x`, storing the filtered output in :data:`y`. The tuning parameter :math:`t` is provided in :data:`t`. The window medians :math:`m_i` are stored in :data:`xmedian` and the :math:`S_i` are stored in :data:`xsigma` on output. The number of outliers detected is stored in :data:`noutlier` on output, while the locations of flagged outliers are stored in the boolean array :data:`ioutlier`. The input :data:`ioutlier` may be :code:`NULL` if not desired. It is allowed to have :data:`x` = :data:`y` for an in-place filter. Examples ======== Gaussian Example 1 ------------------ This example program illustrates the Gaussian filter applied to smoothing a time series of length :math:`N = 500` with a kernel size of :math:`K = 51`. Three filters are applied with parameters :math:`\alpha = 0.5, 3, 10`. The results are shown in :numref:`fig_filt-gaussian`. .. _fig_filt-gaussian: .. figure:: /images/gaussfilt.png :scale: 60% Top panel: Gaussian kernels (unnormalized) for :math:`\alpha = 0.5, 3, 10`. Bottom panel: Time series (gray) with Gaussian filter output for same :math:`\alpha` values. We see that the filter corresponding to :math:`\alpha = 0.5` applies the most smoothing, while :math:`\alpha = 10` corresponds to the least amount of smoothing. The program is given below. .. include:: examples/gaussfilt.c :code: Gaussian Example 2 ------------------ A common application of the Gaussian filter is to detect edges, or sudden jumps, in a noisy input signal. It is used both for 1D edge detection in time series, as well as 2D edge detection in images. Here we will examine a noisy time series of length :math:`N = 1000` with a single edge. The input signal is defined as .. only:: not texinfo .. math:: x(n) = e(n) + \left\{ \begin{array}{cc} 0, & n \le N/2 \\ 0.5, & n > N/2 \end{array} \right. .. only:: texinfo :: x(n) = e(n) + { 0, n <= N/2 { 0.5, n > N/2 where :math:`e(n)` is Gaussian random noise. The program smooths the input signal with order :math:`0,1,` and :math:`2` Gaussian filters of length :math:`K = 61` with :math:`\alpha = 3`. For comparison, the program also computes finite differences of the input signal without smoothing. The results are shown in :numref:`fig_filt-gaussian2`. .. _fig_filt-gaussian2: .. figure:: /images/gaussfilt2.png :scale: 60% Top row: original input signal :math:`x(n)` (black) with Gaussian smoothed signal in red. Second row: First finite differences of input signal. Third row: Input signal smoothed with a first order Gaussian filter. Fourth row: Input signal smoothed with a second order Gaussian filter. The finite difference approximation of the first derivative (second row) shows the common problem with differentiating a noisy signal. The noise is amplified and makes it extremely difficult to detect the sharp gradient at sample :math:`500`. The third row shows the first order Gaussian smoothed signal with a clear peak at the location of the edge. Alternatively, one could examine the second order Gaussian smoothed signal (fourth row) and look for zero crossings to determine the edge location. The program is given below. .. include:: examples/gaussfilt2.c :code: Square Wave Signal Example -------------------------- The following example program illustrates the median filters on a noisy square wave signal. Median filters are well known for preserving sharp edges in the input signal while reducing noise. The program constructs a 5 Hz square wave signal with Gaussian noise added. Then the signal is filtered with a standard median filter and recursive median filter using a symmetric window of length :math:`K = 7`. The results are shown in :numref:`fig_filt-edge`. .. _fig_filt-edge: .. figure:: /images/filt_edge.png :scale: 60% Original time series is in gray. The standard median filter output is in green and the recursive median filter output is in red. Both filters preserve the sharp signal edges while reducing the noise. The recursive median filter achieves a smoother result than the standard median filter. The "blocky" nature of the output is characteristic of all median filters. The program is given below. .. include:: examples/filt_edge.c :code: Impulse Detection Example ------------------------- The following example program illustrates the impulse detection filter. First, it constructs a sinusoid signal of length :math:`N = 1000` with Gaussian noise added. Then, about 1% of the data are perturbed to represent large outliers. An impulse detecting filter is applied with a window size :math:`K = 25` and tuning parameter :math:`t = 4`, using the :math:`Q_n` statistic as the robust measure of scale. The results are plotted in :numref:`fig_impulse`. .. _fig_impulse: .. figure:: /images/impulse.png :scale: 60% Original time series is in blue, filter output is in green, upper and lower intervals for detecting outliers are in red and yellow respectively. Detected outliers are marked with squares. The program is given below. .. include:: examples/impulse.c :code: References and Further Reading ============================== The following publications are relevant to the algorithms described in this chapter, * F. J. Harris, *On the use of windows for harmonic analysis with the discrete Fourier transform*, Proceedings of the IEEE, 66 (1), 1978. * S-J. Ko, Y-H. Lee, and A. T. Fam. *Efficient implementation of one-dimensional recursive median filters*, IEEE transactions on circuits and systems 37.11 (1990): 1447-1450. * R. K. Pearson and M. Gabbouj, *Nonlinear Digital Filtering with Python: An Introduction*. CRC Press, 2015.