
Robust fuzzy scheme for Gaussian denoising of 3D color video


We propose a three-dimensional Gaussian denoising scheme for color video frames, with time taken as the third dimension. The algorithm is developed using fuzzy rules and directional techniques. A fuzzy parameter characterizes the difference among pixels, based on gradients and angle deviations, and is also used for motion detection and noise estimation. Using only two frames of a video sequence, it is possible to decrease Gaussian noise efficiently. The filter uses a noise estimator that is locally adapted in a spatio-temporal manner, together with a fuzzy methodology that enhances noise suppression capabilities compared with other methods. We provide simulation results that show the effectiveness of the novel color video denoising algorithm.

1. Introduction

All pixels in digital color video frames are commonly affected by Gaussian-type noise due to the behavior of the image acquisition sensor; in accordance with this, we make the following assumptions as to the noise:

$$G(x_\beta) = N(0, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x_\beta^{2}}{2\sigma^{2}}\right), \tag{1}$$

where $x_\beta$ represents the original pixel component value, β = {Red, Green, Blue} denotes each pixel color component (or channel), and σ is the standard deviation of the noise. In our case, the Gaussian function is applied independently to the pixel component of each channel of the frame in order to obtain the corrupted video sequence.
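As an illustration of the noise model, the following minimal sketch corrupts a single RGB pixel channel by channel with zero-mean Gaussian noise. The function name `add_gaussian_noise`, the rounding, and the clipping to the 8-bit range are assumptions made for this example; the text does not specify them.

```python
import random

def add_gaussian_noise(pixel, sigma, rng=random):
    """Corrupt one RGB pixel with zero-mean Gaussian noise, channel by channel."""
    noisy = []
    for value in pixel:  # Red, Green, Blue are handled independently
        corrupted = value + rng.gauss(0.0, sigma)
        # Rounding and clipping to [0, 255] are assumptions of this sketch.
        noisy.append(min(255, max(0, round(corrupted))))
    return tuple(noisy)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(add_gaussian_noise((120, 64, 200), sigma=10.0, rng=rng))
```

Passing a seeded `random.Random` instance keeps experiments repeatable when the same sequence is contaminated at several noise levels.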

Pre-processing to reduce the effect of noise is a key stage of most computer vision applications. It should reduce the impact of noise in a video without degrading the quality, edges, fine detail, and color properties.

The current proposal is an attempt to enhance quality when processing color video sequences corrupted by Gaussian noise; this methodology is an extension of the method proposed for impulsive noise removal [1]. Numerous algorithms process 3D signals using only the spatial information [2]. Others use only the temporal information [3, 4]; one example uses wavelet procedures to reduce the delay in video coding [5]. There are also some interesting applications that use spatio-temporal information [6–13]. The disadvantage of these 3D solutions is that they often require large memory and may introduce a significant time delay when more than one frame must be processed. This is undesirable in interactive applications such as infrared camera-assisted driving or videoconferencing. Moreover, full 3D techniques tend to require more computation than separable ones, and their optimal performance can be very difficult to determine. For example, integrating video coding and denoising is a novel processing paradigm that brings mutual benefits to both video processing tools. In Jovanov et al. [14], the main idea is the reuse of motion estimation resources from the video coding module for the purpose of video denoising. The work of Dai et al. [15] has some disadvantages: it uses a number of reference frames, which increases the computational load, and its MHMCF algorithm was originally applied to grayscale video signals; in [14], it was adapted to color video denoising by transforming the RGB video into a luminance color-difference space proposed by the authors.

Other state-of-the-art algorithms found in the literature work in the same manner. For example, Liu and Freeman [16] integrate robust optical flow into a non-local means framework with noise level estimation, and take the temporal coherence into account in removing structured noise. Dabov et al. [17] propose a method based on highly sparse signal representation in a local 3D transform domain; a noisy video is processed in a blockwise manner, and for each processed block, a data array is formed by stacking together blocks found to be similar to the currently processed one. In [18], Mairal et al. presented a framework for learning multiscale sparse representations of color images and video with overcomplete dictionaries. They propose a multiscale learned representation obtained by using an efficient quadtree decomposition of the learned dictionary and overlapping image patches. This provides an alternative to predefined dictionaries such as wavelets.

The effectiveness of the algorithm designed is justified by comparing it with four other state-of-the-art approaches: ‘Fuzzy Logic Recursive Spatio-Temporal Filter’ (FLRSTF), where a fuzzy logic recursive scheme is proposed for motion detection and spatio-temporal filtering capable of dealing with Gaussian noise and unsteady illumination conditions in both the temporal and the spatial directions [19]. Another algorithm used for comparison is the ‘Fuzzy Logic Recursive Spatio-Temporal Filter using Angles’ (FLRSTF_ANGLE). This algorithm uses the angle deviations instead of gradients as a difference between pixels in the FLRSTF algorithm. The ‘Video Generalized Vector Directional Filtering in Gaussian Denoising’ (VGVDF_G) [20] is a directional technique that computes the angle deviations between pixels as a difference criterion among them. As a consequence, the vector directional filters (VDF) do not take into account the image brightness when processing the image vectors. Finally, the ‘Video Median M-type K-Nearest Neighbor in Gaussian Denoising’ filter (VMMKNN_G) [21, 22] uses order statistics techniques to characterize the pixel differences.

The proposed algorithm employs only two frames in order to reduce the computational processing charge and memory requirements, permitting one to produce an efficient denoising framework. Additionally, it applies the relationship that the neighboring pixels have to the central one in magnitude and angle deviation, connecting them by fuzzy logic rules designed to estimate the motion and noise parameters. The effectiveness of the present approach is justified by comparing it with four state-of-the-art algorithms found in literature as explained before.

The digital video database is formed by the Miss America, Flowers, and Chair color video sequences; this database is well known in scientific literature [23]. Frames were manipulated to be 24 bits in depth to form true-color images with 176 × 144 pixels, in order to work with the Quarter Common Intermediate Format (QCIF). These video sequences were selected because of their different natures and textures. The database was contaminated by Gaussian noise at different levels of intensity for each channel in an independent manner. This was used to characterize the performance, permitting the justification of the robustness of the novel framework.

2. Proposed fuzzy design

The first frame of the color video sequence is processed as follows. First, the histogram and the mean value $\bar{x}_\beta$ for each pixel component are calculated, using a 3 × 3 processing window. Then, an angle deviation between two vectors $\bar{x}$ and $x_c$ containing components in the Red, Green, and Blue channels is computed as $\theta_c = A(\bar{x}, x_c)$, where $\theta_c = \cos^{-1}\left(\frac{\bar{x} \cdot x_c}{\|\bar{x}\|\,\|x_c\|}\right)$ is the angle deviation of the mean value vector $\bar{x}$ with respect to the central pixel vector $x_c$ in a 3 × 3 processing window. Color-image processing has traditionally been approached in a component-wise manner, that is, by processing the image channels separately. These approaches fail to consider the inherent correlation that exists between the different channels, and they may result in pixel output values that are different from the input values, with possible shifts in chromaticity [24]. Thus, it is desirable to employ vector approaches in color image processing to obtain the angle deviations.
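The angle deviation between the window mean and the central pixel can be sketched as follows. The helper names `angle_deviation` and `window_mean` are hypothetical, and returning 0 for a zero-length (black) vector is an assumption of this example, since the text does not cover that case.

```python
import math

def angle_deviation(u, v):
    """Angle in radians between two RGB vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 0.0  # assumption: a zero vector is treated as having no deviation
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def window_mean(window):
    """Component-wise mean of the RGB pixels in a processing window."""
    n = len(window)
    return tuple(sum(p[k] for p in window) / n for k in range(3))

if __name__ == "__main__":
    # A 3 x 3 window flattened row by row; index 4 is the central pixel.
    window = [(100, 90, 80)] * 4 + [(140, 60, 70)] + [(100, 90, 80)] * 4
    theta_c = angle_deviation(window_mean(window), window[4])
    print(theta_c)
```

Because only the direction of the vectors matters, two pixels with the same chromaticity but different brightness give an angle deviation of zero.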

The angle interval [0, 1] is used to determine the histogram. The pixel intensity takes values from 0 to 255 in each channel; the angle deviation $\theta_c$ for any given pixel with respect to another one falls within the interval $[0, \pi/2]$. The angle deviations outside the proposed interval ([0, 1]) are not taken into account in forming the histogram. Therefore, the noise estimator is obtained using only values inside this interval; this avoids smoothing some details and improves the criteria results.

It is common practice to normalize a histogram by dividing each of its components by the total number of pixels in the image; this is an estimate of the probability of occurrence of intensity levels in the image. Using this same principle, we propose the use of a normalized histogram based on angle deviations, which is an estimate of the probability of occurrence of the angle deviations between pixels. The histogram is obtained from the vectorial values as follows: if $(F-1)/255 \le \theta_c \le F/255$, the histogram is increased by ‘1’ in the $F$ position; the parameter $F$ increases from 1 to 255; if the aforementioned condition does not hold for the range of $F$, the histogram remains unchanged for $F$, where $\theta_c$ is the angle deviation of the central pixel with respect to one of its neighboring pixels. The parameter $F$ is proposed only to determine to which value of pixel intensity in a histogram the angle deviation belongs. After obtaining the histogram, the probability of occurrence of each of its elements must be calculated. After the mean value $\mu = \sum_{j=0}^{255} j\,p_j$ is computed (where $p_j$ is the probability of occurrence of each element in the histogram), the variance $\sigma_\beta^{2} = \sum_{j=0}^{255} (j-\mu)^{2}\,p_j$ (where $j$ represents each element inside the histogram) and the general standard deviation (SD) $\sigma'_\beta = \sqrt{\sigma_\beta^{2}}$ are determined. The SD parameter is used as the noise estimator for the purpose of decreasing Gaussian noise only for the first frame of the video sequence. In this step of the algorithm, $\sigma'_\beta$ is the same for all three channels of a color image for the general process of the Gaussian denoising algorithm, as Figure 1 indicates. The SD measures how far the data in the distribution deviate from the arithmetic mean, which allows them to be described and interpreted more realistically for decision-making purposes.
We estimate the SD parameter of the Gaussian noise from the input video sequence only for the first frame (t = 0) and subsequently try to adapt the SD to the input video and noise changes by spatio-temporal adaptation of the noise estimator SD.
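The histogram-based noise estimator described above can be sketched as follows. The function name `noise_sd_from_angles` is hypothetical. Two details are assumptions of this example: an angle lying exactly on a bin boundary is assigned to the lower bin (the inclusive inequalities in the text allow either), and an empty histogram yields an SD of 0.

```python
import math

def noise_sd_from_angles(angles, bins=255):
    """Estimate the SD from a normalized histogram of angle deviations."""
    hist = [0] * (bins + 1)  # positions 0..255; position 0 stays empty
    for theta in angles:
        if 0.0 <= theta <= 1.0:  # deviations outside [0, 1] are ignored
            # Position F satisfies (F - 1)/255 <= theta <= F/255.
            f = max(1, math.ceil(theta * bins))
            hist[f] += 1
    total = sum(hist)
    if total == 0:
        return 0.0  # assumption: no usable samples means no estimate
    probs = [h / total for h in hist]  # probability of occurrence p_j
    mean = sum(j * p for j, p in enumerate(probs))
    variance = sum((j - mean) ** 2 * p for j, p in enumerate(probs))
    return math.sqrt(variance)

if __name__ == "__main__":
    sample = [1.5 / 255, 3.5 / 255, 1.2]  # the last value is discarded
    print(noise_sd_from_angles(sample))
```

In this sample the two accepted angles fall in positions 2 and 4, so the mean is 3 and the SD is 1.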

Figure 1. General scheme of the algorithm for Gaussian denoising.

To summarize, we use the SD parameter as an estimate of the noise to be applied in the spatial algorithm; this estimate is then updated by a temporal adaptive filter, ultimately yielding an adaptive spatio-temporal noise estimator.

2.1. Spatial algorithm

The spatial algorithm allows one to estimate the angle deviation of the neighboring pixels with respect to the central one. The results are adjusted according to the processing windows used (see Figure 2b). This methodology was developed to effectively identify uniform areas within the image to be processed with a fast processing algorithm, like the ‘Mean Weighted Filtering Algorithm’ described below.

Figure 2. Processing windows used in the Gaussian denoising algorithm.

The proposed fast filtering procedure is carried out using the following methodology under an ‘IF-OR-THEN-ELSE’ condition: IF (θ1 AND θ3 AND θ4 AND θ6 τ1) OR (θ0 AND θ2 AND θ5 AND θ7 τ1) THEN the Mean Weighted Filtering Algorithm, ELSE the Spatial Filtering Algorithm (where τ1 is a threshold defined as 0.1). All parameter values were proposed according to the best results of the peak signal-to-noise ratio (PSNR) and mean absolute error (MAE) criteria obtained after numerous experiments. If the Mean Weighted Filtering Algorithm [25] is selected, the pixel's intensity components of the 3 × 3 window sample are processed using Equation 2. This procedure is used because the angle deviation between the pixels is very small, which could indicate a uniform region where it is likely that there are no edges or details that may be softened. Thus, the use of the Mean Weighted Filtering Algorithm is proposed. Taking into account the fact that the relationship between a distance measure (angle deviation) is generally exponential, a sigmoidal linear membership function is suggested, and a fuzzy weight $2/(1+e^{\theta_i})$ associated with the vector $x_{\beta i}$ can be used in the following equation:

$$y_{\beta\,\mathrm{out}} = \frac{\sum_{i=0,\; i \neq c}^{N-1} x_{\beta i}\,\dfrac{2}{1+e^{\theta_i}} + x_{\beta c}}{\sum_{i=0}^{N-1} \dfrac{2}{1+e^{\theta_i}} + 1}, \tag{2}$$

where N = 8 represents the number of data samples to be taken into account and it is in agreement with Figure 2; the fuzzy weight computed will produce an output in the interval [0,1], and it corresponds to each angle deviation value computed excluding the central angle deviation.
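A minimal sketch of the Mean Weighted Filtering step for one color component is given below. The function name `mean_weighted_filter` and the flat list layout of the eight neighbors are assumptions of this example; the central pixel enters with weight 1, as in Equation 2.

```python
import math

def mean_weighted_filter(neighbors, thetas, center):
    """Weighted mean of the 8 neighbors plus the central component.

    neighbors: component values of the 8 neighboring pixels
    thetas:    angle deviation of each neighbor with respect to the center
    center:    component value of the central pixel (weight 1)
    """
    weights = [2.0 / (1.0 + math.exp(t)) for t in thetas]  # each in (0, 1]
    numerator = sum(w * x for w, x in zip(weights, neighbors)) + center
    denominator = sum(weights) + 1.0
    return numerator / denominator

if __name__ == "__main__":
    neighbors = [100, 102, 98, 101, 99, 100, 103, 97]
    thetas = [0.01, 0.02, 0.00, 0.03, 0.01, 0.02, 0.05, 0.04]
    print(mean_weighted_filter(neighbors, thetas, center=110))
```

When every angle deviation is zero, all weights equal 1 and the result reduces to the plain mean of the nine samples; large deviations push the weights toward 0, so the output stays close to the central value.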

If the Spatial Filtering Algorithm is selected, the sample probably contains edges and/or fine details. To implement this filter, the following methodology is proposed. The procedure consists of computing a new locally adapted SD ($\sigma_\beta$) for each plane of the color image, using a 5 × 5 processing window (see Figure 2a). In addition, the SD is updated locally according to the following condition: if $\sigma_\beta < \sigma'_\beta$, then $\sigma_\beta = \sigma'_\beta$; otherwise $\sigma'_\beta = \sigma_\beta$, where $\sigma'_\beta$ was previously defined. In other words, the larger of the two SD values is kept: a sample with edges and details presents a large dispersion among its pixels, so the largest SD value describes it best.

To provide a parameter indicating the similarity between the central and neighboring pixels, a gradient ($\nabla$) was defined. This parameter describes the magnitude differences between pixels, and the gradient is applied to each β component of the noisy color frame independently, as follows: $\nabla_{(k,l)}\beta(i,j) = |\beta(i+k, j+l) - \beta(i,j)| = \nabla_{\gamma\beta}$, where the pair $(k,l)$ (with $k, l \in \{-1, 0, 1\}$) represents each of the eight cardinal directions, and $(i,j)$ is the center of the gradient. This leads to the main values (see Figure 3) [19]. The eight gradient values according to the eight different directions (or neighbors) are called main gradient values. To provide more robustness to the algorithm and avoid image blurring, the use of two related gradient values in the same direction is proposed. We assume that when an edge-like image structure extends in a certain direction γ = {N, S, E, W, SE, SW, NE, NW}, it leads to large derivative values perpendicular to the direction γ at the current pixel position $(i,j)$ and at the neighboring pixels as well; in other words, these values are determined at a right angle to the direction of the main gradient. For example (see Figure 3a), in the SE direction (for $(k,l) = (1,1)$ and $(i,j) = (0,0)$), we calculate the main gradient value $\nabla_{(1,1)}\beta(i,j) = |\beta(i+1, j+1) - \beta(i,j)| = \nabla_{\gamma\beta M}$ and two related gradient values: $\nabla_{(2,0)}\beta(i+1, j-1) = |\beta(i+2, j) - \beta(i+1, j-1)| = \nabla_{\gamma\beta D_1}$ and $\nabla_{(0,2)}\beta(i-1, j+1) = |\beta(i, j+2) - \beta(i-1, j+1)| = \nabla_{\gamma\beta D_2}$. In this manner, by taking into account those three derivative values for each direction (and combining them in a fuzzy logic manner), we distinguish local variations due to noise from those due to edge-like image structures. The two derived gradient values are used to distinguish noisy pixels from edge pixels; when all of these gradients are larger than a predefined threshold $T_\beta$, $(i,j)$ is considered to be a noisy pixel and must be filtered out.
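The SE example can be sketched for a single color plane as follows. The function name `se_gradients` is hypothetical, and reading β(i, j) as `plane[i][j]` (first index, then second index) is an assumption of this example, as is the explicit bounds check.

```python
def se_gradients(plane, i, j):
    """Main and two derived gradient values for the SE direction at (i, j).

    plane is a 2D list holding one color component; plane[i][j] plays
    the role of beta(i, j) in the text (an indexing assumption).
    """
    rows, cols = len(plane), len(plane[0])
    if not (1 <= i <= rows - 3 and 1 <= j <= cols - 3):
        raise ValueError("(i, j) needs a 5 x 5 neighborhood reach for SE")
    main = abs(plane[i + 1][j + 1] - plane[i][j])
    derived_1 = abs(plane[i + 2][j] - plane[i + 1][j - 1])
    derived_2 = abs(plane[i][j + 2] - plane[i - 1][j + 1])
    return main, derived_1, derived_2

if __name__ == "__main__":
    # Flat 5 x 5 plane with a single deviating value at the center.
    plane = [[50] * 5 for _ in range(5)]
    plane[2][2] = 200
    print(se_gradients(plane, 2, 2))  # main is 150, both derived values are 0
```

The other seven directions follow the same pattern with different offsets, as described by Zlokolica et al. [19].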

Figure 3. Main and derived values (a) involved in the 5 × 5 processing window; angle deviation (b) for only one plane, β = R.

Subsequently, the following condition should be verified: IF $\nabla_{\gamma\beta(M,D_1,D_2)} < T_\beta$ (where $T_\beta = 2\sigma_\beta$), THEN a membership value using $\nabla_{\gamma\beta(M,D_1,D_2)}$ is computed; otherwise, the membership value is 0. The threshold value $T_\beta$ is obtained experimentally according to the PSNR and MAE criteria. Here, γ represents each of the eight cardinal points, and $\nabla_{\gamma\beta(M,D_1,D_2)}$ represents each of the values computed for each of the neighboring pixels with respect to the central one, within a sliding window. These gradients are called ‘main gradient values’. Two ‘derived gradient values’ are also employed, to avoid blurring the image when an edge, rather than a noisy pixel, is present.

The detailed methodological procedures used to compute the main and derived gradient values for the eight cardinal directions are described by Zlokolica et al. [19]. If $\nabla_{\gamma\beta(M,D_1,D_2)} < T_\beta$ for each of the three gradient values (the main and derived values according to Figure 3a), then the angle deviation is calculated in the corresponding direction (γ). That is, if the main and derived gradient values are lower than the threshold ($T_\beta$), the angle deviations are obtained from the values for the three gradients; however, if any of these values does not satisfy the condition, the angle deviation is set to 0 for the values that do not comply.

Another way to characterize the difference between pixels is by calculating the angle deviation from the central pixel and its neighbors; this is called the main and derived vectorial values. Calculated angle deviations in the cardinal directions are taken as the weight values for each color plane of an image (Equation 3). These weights provide a relationship between pixels in a single plane at a given angle deviation. Equation 3 illustrates the calculation of the angle deviation to obtain the weight values, where

$$\theta_\beta = \cos^{-1}\left(\frac{2(255)^{2} + x_{\gamma\beta(M,D_1,D_2)}\; x'_{\gamma\beta(M,D_1,D_2)}}{\sqrt{2(255)^{2} + x_{\gamma\beta(M,D_1,D_2)}^{2}}\;\sqrt{2(255)^{2} + x'^{\,2}_{\gamma\beta(M,D_1,D_2)}}}\right);$$

these values range from 0 to 1 according to Figure 3b.

$$\alpha_{\gamma\beta} = \frac{2}{1 + \exp(\theta_\beta)}, \tag{3}$$

where $x_{\gamma\beta(M,D_1,D_2)}$ is the pixel component in the associated direction. For example, for the $x_{\gamma\beta M}$ component of the pixel, the coordinate is (0, 0), as shown in Figure 3a. Therefore, for component $x'_{\gamma\beta M}$, the coordinate should be (1, 1) for the ‘SE’ cardinal direction, and so on. This parameter indicates that the smaller the difference in angle between the pixels involved, the greater the weight value of the pixel in the associated direction.
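The single-plane angle deviation and its weight can be sketched as below. The expression is read here as the angle between the vectors (255, 255, x) and (255, 255, x′), which requires square roots in the denominator; that reading, and the names `plane_angle` and `angle_weight`, are assumptions of this example.

```python
import math

C = 2 * 255 ** 2  # constant term shared by numerator and denominator

def plane_angle(x, x_prime):
    """Angle deviation between two component values of one color plane."""
    cos_val = (C + x * x_prime) / (
        math.sqrt(C + x * x) * math.sqrt(C + x_prime * x_prime)
    )
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_val)))

def angle_weight(theta):
    """Weight from Equation 3: smaller angle deviation, larger weight."""
    return 2.0 / (1.0 + math.exp(theta))

if __name__ == "__main__":
    theta = plane_angle(0, 255)  # the two extreme 8-bit values
    print(theta, angle_weight(theta))
```

Under this reading, the largest possible deviation for 8-bit data (0 against 255) is about 0.6155 rad, which is consistent with the statement that the values stay within [0, 1].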

Finally, the main and derived vectorial gradient values are used to find a degree of membership using membership functions, which are functions that return a value between 0 and 1, indicating the degree of membership of an element with respect to a set (in our case, we define a BIG fuzzy set). Then, we can characterize the level of proximity of the components of the central pixel with respect to its neighbors, and see if it is a noisy or in motion component, or free of motion and/or low noise.

As mentioned above, we have defined a BIG fuzzy set; it will feature the presence of noise in the sample to be processed. The values that belong to this fuzzy set, in whole or in part, will represent the level of noise present in the pixel.

The membership function used to characterize the ‘main and derived vectorial gradient values’ is defined by:

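$$\mu_{\mathrm{BIG}} = \begin{cases} \max\left(1 - \nabla_{\gamma\beta(M,D_1,D_2)} / T_\beta,\; \alpha_{\gamma\beta(M,D_1,D_2)}\right), & \text{if } \nabla_{\gamma\beta} < T_\beta \\ 0, & \text{otherwise}, \end{cases} \tag{4}$$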

A fuzzy rule is created from this membership function, which is simply the application of the membership function by fuzzy operators. In this case, fuzzy operator OR is defined as OR(f1, f2) = max(f1, f2).
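The BIG membership function of the spatial algorithm can be sketched as a small function. The name `mu_big_spatial` is hypothetical, and a strictly positive threshold is assumed; the `max` call is the fuzzy OR defined above.

```python
def mu_big_spatial(gradient, alpha, threshold):
    """Membership in the BIG fuzzy set for one gradient/weight pair.

    gradient:  main or derived gradient value
    alpha:     angle-deviation weight for the same direction
    threshold: T_beta, assumed to be strictly positive
    """
    if gradient < threshold:
        # Fuzzy OR (max) of the gradient-based and angle-based terms.
        return max(1.0 - gradient / threshold, alpha)
    return 0.0

if __name__ == "__main__":
    t_beta = 2 * 10.0  # T_beta = 2 * sigma_beta with sigma_beta = 10
    print(mu_big_spatial(5.0, 0.9, t_beta))   # below the threshold
    print(mu_big_spatial(25.0, 0.9, t_beta))  # at or above it: membership 0
```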

Each pixel has one returned value defined by the level of corruption present in the pixel. That is, one says ‘the pixel is corrupted’ if its BIG membership value is 1, and ‘the pixel is low-noise corrupted’ when its BIG membership value is 0. The linguistics ‘the pixel is corrupted’ and ‘the pixel is low-noise corrupted’ indicate the degree of belonging to each of the possible states in which the pixel can be found.

From the fuzzy rules, we obtain outputs, which are used to make decisions. The function defined by Equation 4 returns values between 0 and 1. It indicates how the parameter behaved with respect to the proposed fuzzy set. Finally, the following fuzzy rule is designed to connect gradient values with angle deviations, thus forming the ‘fuzzy vectorial-gradient values’.

Fuzzy rule 1 helps to detect the edges and fine details using the membership values of the BIG fuzzy set obtained by Equation 4. The fuzzy values obtained by this rule are taken as fuzzy weights and used in a fast processing algorithm to improve the computational load. This fast processing algorithm is defined by means of Equation 5.

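Fuzzy rule 1: the fuzzy vectorial-gradient value is defined as $\nabla_{\gamma\beta}\alpha_{\gamma\beta}$, so: IF (($\nabla_{\gamma\beta M}$, $\alpha_{\gamma\beta}$) is BIG AND ($\nabla_{\gamma\beta D_1}$, $\alpha_{\gamma\beta D_1}$) is BIG) OR (($\nabla_{\gamma\beta M}$, $\alpha_{\gamma\beta}$) is BIG AND ($\nabla_{\gamma\beta D_2}$, $\alpha_{\gamma\beta D_2}$) is BIG), THEN $\nabla_{\gamma\beta}\alpha_{\gamma\beta}$ is BIG. In this fuzzy rule, the ‘AND’ and ‘OR’ operators are defined as algebraic operations: AND(A, B) = A · B, and OR(A, B) = A + B − A · B.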
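Fuzzy rule 1 with the algebraic operators can be written in a few lines. The function names are hypothetical; the three inputs are the BIG membership values of the main and the two derived pairs for one direction.

```python
def fuzzy_and(a, b):
    """Algebraic product used as the fuzzy AND."""
    return a * b

def fuzzy_or(a, b):
    """Algebraic (probabilistic) sum used as the fuzzy OR."""
    return a + b - a * b

def fuzzy_rule_1(mu_main, mu_d1, mu_d2):
    """(main AND derived 1) OR (main AND derived 2), for one direction."""
    return fuzzy_or(fuzzy_and(mu_main, mu_d1), fuzzy_and(mu_main, mu_d2))

if __name__ == "__main__":
    print(fuzzy_rule_1(0.5, 0.5, 0.5))  # 0.25 OR 0.25 gives 0.4375
```

With these operators the rule output stays in [0, 1] whenever its inputs do, and it is 0 whenever the main membership value is 0.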

The fuzzy weights are used in the fast algorithm as a final step in the noise suppression of the spatial algorithm; the fast algorithm is defined as an averaging procedure with weights as follows:

$$y_{\beta\,\mathrm{out}} = \frac{\sum_{\gamma} \nabla_{\gamma\beta}\alpha_{\gamma\beta}\; x_{\gamma\beta}}{\sum_{\gamma} \nabla_{\gamma\beta}\alpha_{\gamma\beta}}, \tag{5}$$

where $x_{\gamma\beta}$ represents each component magnitude of the neighboring pixels around the central pixel within the pre-processing window (Figure 2b) in the respective cardinal direction, and $y_{\beta\,\mathrm{out}}$ is the output of the spatial algorithm applied to the first frame of the video sequence. From this, we obtain the first spatially filtered frame t, which is then passed to the temporal algorithm together with the t + 1 frame, according to the scheme described in Figure 1.
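The weighted averaging of Equation 5 can be sketched as follows. The function name `fuzzy_weighted_average` is hypothetical, and the fallback used when all fuzzy weights are zero is an assumption of this example; the text does not say what happens in that case.

```python
def fuzzy_weighted_average(values, weights, fallback):
    """Average of neighbor components weighted by their fuzzy values.

    values:   component magnitudes of the neighbors, one per direction
    weights:  fuzzy vectorial-gradient values for the same directions
    fallback: returned when every weight is zero (an assumption here)
    """
    total = sum(weights)
    if total == 0.0:
        return fallback
    return sum(w * x for w, x in zip(weights, values)) / total

if __name__ == "__main__":
    values = [100, 104, 98, 150, 101, 99, 102, 97]
    weights = [0.9, 0.8, 0.9, 0.0, 0.85, 0.9, 0.8, 0.9]
    # The neighbor with value 150 has weight 0 and does not contribute.
    print(fuzzy_weighted_average(values, weights, fallback=100))
```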

2.2. Temporal algorithm

The spatial algorithm outlined above smooths Gaussian noise efficiently but still loses some of the image's fine details and edges. To avoid these undesirable outputs, a temporal algorithm is proposed. It uses only two frames of the video sequence: the spatially filtered t frame and the corrupted t + 1 frame. The spatially filtered t frame, obtained with the methodology developed in Section 2, is used only once; it gives the temporal algorithm a filtered reference frame, so that its capabilities are enhanced from the first frame of the video stream without a significant loss in results.

The temporal algorithm, like the spatial algorithm, is governed by fuzzy rules that help detect the noise and motion present between pixels of two frames (t and t + 1), thus avoiding the loss of important features of video frames. The proposed fuzzy rules are used for each color plane of the two frames (t and t + 1) independently. In the same way as in the spatial algorithm, the gradient and the angle deviation values are calculated in order to characterize the difference between pixels in the two frames of the video sequence. These values are related to the central pixel $x_{\beta c}^{t+1}$ with respect to its neighbors in frames t and t + 1 and are computed as follows:

$$\theta^{1}_{\beta ic} = A\left(x_{\beta i}^{t},\, x_{\beta c}^{t+1}\right), \quad \nabla^{1}_{\beta ic} = \left|x_{\beta i}^{t} - x_{\beta c}^{t+1}\right|, \quad i, j = 0, \ldots, N; \text{ where } N = 8, \tag{6}$$

$$\theta^{2}_{\beta ij} = A\left(x_{\beta i}^{t},\, x_{\beta j}^{t+1}\right), \quad \nabla^{2}_{\beta ij} = \left|x_{\beta i}^{t} - x_{\beta j}^{t+1}\right|, \quad \theta^{3}_{\beta jc} = A\left(x_{\beta j}^{t+1},\, x_{\beta c}^{t+1}\right), \quad \nabla^{3}_{\beta j} = \left|x_{\beta j}^{t+1} - x_{\beta c}^{t+1}\right|. \tag{7}$$

This is better understood with an example, as illustrated in Figure 4, for the case where β = Red (R), and i = j = 2.

Figure 4. Application example for Equations 6 and 7, where β = R, and i = j = 2.

The SMALL fuzzy set is defined analogously to the BIG fuzzy set. The same meanings for the expressions ‘the pixel is corrupted’ and ‘the pixel is low-noise corrupted’ apply, but in the opposite direction. Assuming that a fuzzy set is totally characterized by a membership function, the membership function $\mu_{\mathrm{SMALL}}$ (in the SMALL fuzzy set) is introduced to characterize the values associated with no movement and low-noise presence. In this way, one obtains a value in [0, 1] that measures the membership with respect to the SMALL fuzzy set, where the value of 1 implies that the sample has no movement and low noise presence, and the value of 0 implies the opposite.

Thus, two fuzzy sets, separately defined as BIG and SMALL, are used to characterize the level of noise and/or movement in the sample being processed. The membership functions $\mu_{\mathrm{BIG}}$ and $\mu_{\mathrm{SMALL}}$, for gradients and angle deviations used by the temporal algorithm, are defined by the following expressions [25]:

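$$\mu_{\mathrm{SMALL}}(\chi) = \begin{cases} 1, & \text{if } \chi < \mu_1 \\ \exp\left(-\dfrac{(\chi - \mu_1)^{2}}{2\sigma^{2}}\right), & \text{otherwise}, \end{cases} \tag{8}$$

$$\mu_{\mathrm{BIG}}(\chi) = \begin{cases} 1, & \text{if } \chi > \mu_2 \\ \exp\left(-\dfrac{(\chi - \mu_2)^{2}}{2\sigma^{2}}\right), & \text{otherwise}, \end{cases} \tag{9}$$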

When $\chi = \theta_{\beta\gamma}$ (angle deviations), the parameters are: standard deviation σ = 0.3163, mean $\mu_1$ = 0.2, and mean $\mu_2$ = 0.615; when $\chi = \nabla_{\beta\gamma}$ (gradient values), the parameters are: standard deviation σ = 31.63, mean $\mu_1$ = 60, and mean $\mu_2$ = 140. The parameter values were obtained through extensive simulations carried out on the color video sequences used in this study, with the aim of finding the optimal values according to the PSNR and MAE criteria. For $\chi = \theta_{\beta\gamma}$, the procedure was as follows. The standard deviation was varied, starting from the value 0.1 and with $\mu_1$ = 0.1 and $\mu_2$ = 0.1 held fixed, until the PSNR and MAE criteria reached their optimal values. The standard deviation was then fixed, and $\mu_1$ was increased until the PSNR and MAE criteria again reached their optimal values. Finally, with the standard deviation and $\mu_1$ fixed, $\mu_2$ was varied until the criteria reached their optimal values once more. The same approach was used to obtain the parameter values for $\chi = \nabla_{\beta\gamma}$, based on the PSNR and MAE criteria. These experimental results were obtained using the well-known Miss America and Flowers color video sequences.
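The two temporal membership functions, with the parameter values reported above, can be sketched as follows. The function names and the two parameter dictionaries are hypothetical conveniences for this example.

```python
import math

def mu_small(chi, mu1, sigma):
    """Membership in SMALL: 1 below mu1, Gaussian decay above it."""
    if chi < mu1:
        return 1.0
    return math.exp(-((chi - mu1) ** 2) / (2.0 * sigma ** 2))

def mu_big(chi, mu2, sigma):
    """Membership in BIG: 1 above mu2, Gaussian decay below it."""
    if chi > mu2:
        return 1.0
    return math.exp(-((chi - mu2) ** 2) / (2.0 * sigma ** 2))

# Parameter values quoted in the text for each kind of input.
ANGLE_PARAMS = {"sigma": 0.3163, "mu1": 0.2, "mu2": 0.615}
GRADIENT_PARAMS = {"sigma": 31.63, "mu1": 60.0, "mu2": 140.0}

if __name__ == "__main__":
    p = GRADIENT_PARAMS
    for grad in (20.0, 100.0, 180.0):
        print(grad, mu_small(grad, p["mu1"], p["sigma"]),
              mu_big(grad, p["mu2"], p["sigma"]))
```

A gradient of 100, midway between the two means, receives the same partial membership in both sets, while values below 60 or above 140 belong fully to SMALL or BIG, respectively.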

The fuzzy rules illustrated in Figure 5 are designed to detect, pixel by pixel, the presence of motion. First, the motion relative to the central pixel in the t + 1 frame is detected, using the pixels in the t frame; then, motion detection is performed on a pixel basis in both frames; and finally, this procedure applies only to the pixels of the t + 1 frame. Following this, the procedure for the proposed fuzzy rules is described in Figure 5; these fuzzy rules allow the analyst to characterize the presence of motion and/or noise in the sample in order to determine which procedure to utilize during the image processing.

Figure 5. Fuzzy rules 2, 3, 4, and 5 used to determine the motion level for t and t + 1 frames. (a) Fuzzy rule 2, $\mathrm{SBB}_{\beta ic}$. (b) Fuzzy rule 3, $\mathrm{SSS}_{\beta ic}$. (c) Fuzzy rule 4, $\mathrm{BBB}_{\beta ic}$. (d) Fuzzy rule 5, $\mathrm{BBS}_{\beta ic}$.

The fuzzy rules of Figure 5 were designed to characterize, in a fuzzy directional manner, the relationship between pixels in a sliding window using two frames. Hence, the movement and the noise level present in the central pixel of the sample are found. To understand the meaning of these fuzzy rules, consider the following: if the fuzzy directional values obtained by the membership function for the SMALL fuzzy set are close to one, then there is no motion and only low noise in the central pixel component. Conversely, if the values of the membership function are close to one for the BIG fuzzy set, the central pixel component is noisy and/or presents motion. Thus, for fuzzy rule 2, the values SMALL, BIG, and BIG (SBB) characterize a pixel in motion: the SMALL value indicates that the component of the central pixel in the t + 1 frame is close to the component of a neighboring pixel in the t frame; the first BIG value indicates that the component of the pixel in the t frame and the component of the pixel in the t + 1 frame are unrelated; and the second BIG value conveys that the component of the pixel in the t + 1 frame differs from the component of the central pixel of the t + 1 frame, so this pixel is highly likely to belong to an edge and/or to be in motion. These findings reinforce the correctness of the parameters obtained for other neighboring component pixels. In this way, the relationship of proximity between the central pixel of the t + 1 frame and the neighboring pixels of the t and t + 1 frames is obtained.

This study also aims to reduce the computational cost of the algorithm by distinguishing among different areas, in particular by finding areas of an image that can be processed by a magnitude filter without affecting the fine image details and other image characteristics. The procedure is as follows: the sample standard deviation over the 3 × 3 × 2 pre-processing window is calculated for each color channel in the t and t + 1 frames, thereby obtaining the parameter $\sigma''_\beta$. This is described as the temporal SD because it is calculated over two frames (t and t + 1) of the video sequence. The procedure to calculate $\sigma''_\beta$ is similar to that used in Section 2 but applied to a 3 × 3 × 2 sample consisting of both frames. Then, it is compared with the SD $\sigma'_\beta$ obtained for the spatial algorithm in Subsection 2.1, as follows: IF $\sigma''_{red} \ge 0.4\,\sigma'_{red}$ AND $\sigma''_{green} \ge 0.4\,\sigma'_{green}$ AND $\sigma''_{blue} \ge 0.4\,\sigma'_{blue}$, THEN fuzzy rules 2, 3, 4, and 5 are employed; otherwise, the Mean Filter is utilized. The AND operator therein is the ‘logical AND’. Here, the value 0.4 in the condition is selected to distinguish areas containing fine details from those showing a uniform pattern. This value was found experimentally, according to the optimal PSNR and MAE values. Therefore, the application of the Mean Filter Algorithm implies that a uniform area is being processed:

$$\bar{y}_{\beta\,\mathrm{out}} = \sum_{i=0}^{N} x_{\beta i} \,/\, N, \quad N = 17, \tag{10}$$

where x βi represents each one of the pixels in a 3 × 3 × 2 pre-processing window, N = 17 is selected to take into account all pixel components in the two frames to be processed.

The general standard deviation used in this stage of the algorithm was adapted locally according to the pixels that agree with Figure 5 in the current sample. To acquire a new locally adapted SD, which will be used in the next frame of video sequence, a sensitive parameter α must be introduced describing the current distribution of the pixels and featuring a measure of temporal relationship between the t and t + 1 frames. The main idea of the sensitive parameter is to control the amount of filtering; this parameter modifies its value on its own to agree with the locally adapted SD. The same parameter allows the upgrading of the SD that helps to describe the relationship in the frames t and t + 1, producing a temporal parameter. When the Mean Filter is applied, the sensitivity parameter value is α = 0.125.

In case there is a drastic change in the fine details, edges, and movements in the current samples, these will be reflected in their parameter values - such as the membership functions, the SD, and the sensitivity parameters, as well as in their fuzzy vectorial-gradient values. The consequences, which are applied for each fuzzy rule, are based on the different conditions present in the sample.

The updating of the general standard deviation that should be used in the processing of the next frame is performed according to the expression:

$$\sigma'_\beta = \alpha\,\frac{\sigma_{total}}{5} + (1 - \alpha)\,\sigma'_\beta. \tag{11}$$

The aim of this equation is to control the locally adapted spatial SD and, in the same manner, the temporal SD, which in turn controls the amount of filtering by modifying the $T_\beta$ threshold, as will be shown later.

Parameters σ β ' , σ β ' ' , and σ total describe how the pixels in the t and t + 1 frames are related to each other in a spatial σ β ' and temporal σ β " , and σ total way. The SD updating of σtotal is achieved through: σ total = σ Red ' ' + σ Green ' ' + σ Blue ' ' 3 ; this is the average value of the temporal SD using the three color planes of the images. This relationship is designed to have the other color components of the image contribute to the sensitivity parameter.

The structure of Equation 11 can be illustrated using an example: if the Mean Filter Algorithm is selected for application instead of fuzzy rules 2, 3, 4, and 5, the sensitivity parameter α = 0.125 used for the algorithm indicates that the t and t + 1 frames are closely related. This means that the pixels in the t frame carry little noise, because the spatial algorithm has already been applied to this frame (see Subsection 2.1), and that the pixels in the t + 1 frame are probably low-noise too. However, because at this point the t frame has only been filtered by the spatial algorithm, it is preferable to give more weight to the spatial SD $\sigma'_{\beta}$ obtained from the t frame than to the temporal SD $\sigma''_{\beta}$ obtained from the t + 1 frame. That is why $\sigma_{\text{total}}$ is weighted by α = 0.125 and $\sigma'_{\beta}$ is weighted by (1 - α) = 0.875.
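The SD update can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes that the update expression divides the averaged temporal SD by 5 before weighting it by α (our reading of the printed formula), and the function names and the sample SD values are hypothetical.

```python
def sigma_total(sigma_temporal):
    """Average of the temporal SDs over the three color planes."""
    return (sigma_temporal["Red"] + sigma_temporal["Green"] + sigma_temporal["Blue"]) / 3.0


def update_spatial_sd(sigma_spatial, sigma_temporal, alpha):
    """Return the per-channel SD to be used for the next frame.

    Assumed form: sigma' <- alpha * sigma_total / 5 + (1 - alpha) * sigma'.
    """
    total = sigma_total(sigma_temporal)
    return {
        channel: alpha * total / 5.0 + (1.0 - alpha) * sd
        for channel, sd in sigma_spatial.items()
    }


# Hypothetical SD values, chosen only to make the arithmetic easy to follow.
spatial = {"Red": 10.0, "Green": 12.0, "Blue": 8.0}
temporal = {"Red": 6.0, "Green": 9.0, "Blue": 15.0}

# Mean Filter case: alpha = 0.125, so the spatial SD keeps a weight of 0.875.
updated = update_spatial_sd(spatial, temporal, alpha=0.125)
```

With α = 0.125 the previous spatial SD dominates the result, which matches the reasoning that the t frame has so far been cleaned only by the spatial stage.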

The application of fuzzy rules to pixels allows a better preservation of the inherent characteristics of the color images. The following methodology is based on these concepts, using the pixels indicated by each fuzzy rule in the process of noise suppression. The numbers of pixels that satisfy each rule are compared through four conditions: (1) IF {($\#\text{pixels}_{SBB} > \#\text{pixels}_{SSS}$) AND ($\#\text{pixels}_{SBB} > \#\text{pixels}_{BBB}$) AND ($\#\text{pixels}_{SBB} > \#\text{pixels}_{BBS}$)}; (2) IF {($\#\text{pixels}_{SSS} > \#\text{pixels}_{SBB}$) AND ($\#\text{pixels}_{SSS} > \#\text{pixels}_{BBB}$) AND ($\#\text{pixels}_{SSS} > \#\text{pixels}_{BBS}$)}; (3) IF {($\#\text{pixels}_{BBS} > \#\text{pixels}_{SBB}$) AND ($\#\text{pixels}_{BBS} > \#\text{pixels}_{SSS}$) AND ($\#\text{pixels}_{BBS} > \#\text{pixels}_{BBB}$)}; and (4) IF {($\#\text{pixels}_{BBB} > \#\text{pixels}_{SBB}$) AND ($\#\text{pixels}_{BBB} > \#\text{pixels}_{SSS}$) AND ($\#\text{pixels}_{BBB} > \#\text{pixels}_{BBS}$)}. When the first condition holds, that is, when the number of SBB pixels is the biggest as compared to the others, the following methodology is applied to only those pixels that fulfill the condition:

$$y_{\beta\,\text{out}} = \frac{\sum_{i=1}^{\#\text{pixels}} x_{\beta i}^{t-1}\, SBB_{\beta i}}{\sum_{i=1}^{\#\text{pixels}} SBB_{\beta i}},$$

where $x_{\beta i}^{t}$ and $x_{\beta i}^{t+1}$ represent each pixel in the t and t + 1 frames, respectively, that fulfills the assumed fuzzy rule conditions, with α = 0.875. For a better understanding of the use of fuzzy rules, see Figure 6. When a different condition gathers the largest number of pixels, the corresponding equation is used instead. For the second condition ($\#\text{pixels}_{SSS}$ is the biggest, for example $\#\text{pixels}_{SSS} > \#\text{pixels}_{SBB} > \#\text{pixels}_{BBB} > \#\text{pixels}_{BBS}$), we perform:

$$y_{\beta\,\text{out}} = \frac{\sum_{i=1}^{\#\text{pixels}} \left(0.5\, x_{\beta i}^{t-1} + 0.5\, x_{\beta i}^{t}\right) SSS_{\beta i}}{\sum_{i=1}^{\#\text{pixels}} SSS_{\beta i}},$$

where α = 0.125; or, for the third condition ($\#\text{pixels}_{BBS}$ is the biggest):

$$y_{\beta\,\text{out}} = \frac{\sum_{i=1}^{\#\text{pixels}} x_{\beta i}^{t} \left(1 - BBS_{\beta i}\right)}{\sum_{i=1}^{\#\text{pixels}} \left(1 - BBS_{\beta i}\right)},$$

where α = 0.875. Finally, for the fourth condition, when the number of pixels (# pixels) with a $BBB_{\beta i}$ value is the biggest, the following procedure is performed:

Figure 6 The denoising scheme applied in the temporal algorithm for the cases of movement, uniform region, noise, edge, and fine detail, in agreement with Figure 5.

Procedure 1: consider the nine fuzzy vectorial-gradient values obtained from the $BBB_{\beta i}$ values. The central value is selected along with three neighboring fuzzy values in order to detect motion. The conjunction of these four subfacts is performed by combining them with a triangular norm [19]. The intersections of all possible combinations of the central $BBB_{\beta i}$ value with three different neighboring membership degrees yield 56 values: $C_{N-1}^{K} = \binom{N-1}{K} = 56$, where N = 9 and K = 3 elements are included in each intersection. These values are aggregated over all instances using the algebraic sum (sum = A + B - A · B) [19] in order to obtain the motion-noise confidence parameter.
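Procedure 1 can be sketched as follows. This is a minimal illustration, not the authors' code: the text only states that "a triangular norm" is used, so the product t-norm below is an assumption (it can be swapped through the hypothetical `t_norm` argument), and the function name and the returned term count are introduced here only for illustration.

```python
from itertools import combinations


def motion_noise_confidence(center, neighbors, t_norm=lambda a, b: a * b):
    """Combine the central BBB degree with every 3-subset of its 8 neighbors.

    Each conjunction (center AND three neighbors) is evaluated with `t_norm`
    (product by default, an assumption), and the conjunctions are aggregated
    with the algebraic sum A + B - A*B.
    Returns (confidence, number_of_conjunctions).
    """
    if len(neighbors) != 8:
        raise ValueError("expected the 8 neighbors of a 3 x 3 window")
    confidence = 0.0
    n_terms = 0
    for trio in combinations(neighbors, 3):      # C(8, 3) = 56 subsets
        term = center
        for degree in trio:
            term = t_norm(term, degree)          # conjunction of four subfacts
        confidence = confidence + term - confidence * term  # algebraic sum
        n_terms += 1
    return confidence, n_terms


# Hypothetical membership degrees for a 3 x 3 window.
value, count = motion_noise_confidence(
    0.5, [0.1, 0.9, 0.3, 0.4, 0.8, 0.2, 0.6, 0.7]
)
```

Passing `t_norm=min` gives the minimum t-norm instead; in both cases the number of conjunctions is 56, in agreement with N = 9 and K = 3.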

The motion-noise confidence parameter is used to update the SD and to obtain the output pixel by means of the following expression: $y_{\beta\,\text{out}} = (1 - \alpha)\, x_{\beta c}^{t+1} + \alpha\, x_{\beta c}^{t}$, where α = 0.875 if the motion-noise confidence is 1 and α = 0.125 if it is 0. If no fuzzy rule holds a majority in the number of pixels, the output pixel is computed as $y_{\beta\,\text{out}} = 0.5\, x_{\beta c}^{t+1} + 0.5\, x_{\beta c}^{t}$, where α = 0.5.
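The rule selection and the resulting output values can be summarized in a short sketch. It is an illustration under assumptions rather than the authors' implementation: the helper names are hypothetical, a rule is taken as dominant only when its pixel count is strictly larger than every other count, and the rule outputs are treated as plain weighted means of the pixels selected by the rule.

```python
def dominant_rule(counts):
    """Return the rule whose pixel count strictly exceeds all others, else None."""
    best = max(counts, key=counts.get)
    if all(counts[best] > n for rule, n in counts.items() if rule != best):
        return best
    return None


def weighted_mean(values, weights):
    """sum(value * weight) / sum(weight), the form shared by the rule outputs."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def blend_center(x_t, x_t1, alpha):
    """Temporal blend of the central pixel: (1 - alpha) * x^{t+1} + alpha * x^t."""
    return (1.0 - alpha) * x_t1 + alpha * x_t


# Hypothetical pixel counts per fuzzy rule in one sliding window.
counts = {"SBB": 2, "SSS": 5, "BBS": 1, "BBB": 1}
rule = dominant_rule(counts)

# SSS case: average the two frames pixel by pixel, then weight by the SSS degree.
frame_a = [100.0, 104.0, 98.0]   # hypothetical samples from one frame
frame_b = [102.0, 100.0, 96.0]   # matching samples from the other frame
sss = [0.9, 0.8, 0.7]            # hypothetical SSS membership degrees
averaged = [0.5 * a + 0.5 * b for a, b in zip(frame_a, frame_b)]
y_sss = weighted_mean(averaged, sss)

# No majority: fall back to the equal-weight blend (alpha = 0.5).
y_tie = blend_center(100.0, 200.0, alpha=0.5)
```

For the BBS rule the same `weighted_mean` helper would be called with the weights `1 - BBS`, and for the BBB rule `blend_center` would be called with α = 0.875 or α = 0.125 depending on the motion-noise confidence.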

Finally, the algorithm applies the spatial algorithm outlined above to smooth the non-stationary noise remaining after the temporal filter, the only modification being its threshold value, $T_{\beta} = 0.25\,\sigma'_{\beta}$, in agreement with Figure 1.

In summary, all parameters used in the development of this algorithm and their optimal values are given in Table 1. The optimum parameters were found through numerous simulations using different color video sequences with different levels of Gaussian noise and with different criteria to characterize noise suppression (PSNR), detail and edge preservation (MAE), and chromaticity preservation (normalized chromaticity deviation (NCD)).

Table 1 All parameters used in the algorithm and their optimal values

All the other parameters used in the algorithm are locally updated in agreement with the adaptive method; this means that these parameters change locally in all the sequences of the video frames.

3. Simulation results

The results presented show the effectiveness of the proposed algorithm against others used for comparison. To accomplish this, video sequences containing different features and textures were used: the Miss America, Flowers, and Chair sequences, all of them contaminated by Gaussian noise with a variance (VAR) from 0.0 to 0.05. The color video sequences processed for this work were 24-bit true color with 176 × 144 pixels (QCIF format).
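The contamination step can be reproduced along the following lines. This sketch rests on an assumption that the text does not state: VAR is interpreted as a variance on intensities normalized to [0, 1], so that the noise SD on 8-bit samples is 255 · √VAR. The function name, the frame layout (rows of [R, G, B] pixels), and the flat gray test frame are hypothetical.

```python
import random


def add_gaussian_noise(frame, var, seed=None):
    """Add zero-mean Gaussian noise independently to every color component.

    `frame` is a list of rows, each row a list of [R, G, B] pixels (0..255).
    `var` is assumed to be expressed on a 0..1 intensity scale.
    """
    rng = random.Random(seed)
    sigma = (var ** 0.5) * 255.0
    noisy = []
    for row in frame:
        noisy.append([
            # Round to the nearest integer and clip to the 8-bit range.
            [min(255, max(0, round(c + rng.gauss(0.0, sigma)))) for c in pixel]
            for pixel in row
        ])
    return noisy


# Flat gray frame in QCIF format: 144 rows of 176 pixels.
frame = [[[128, 128, 128] for _ in range(176)] for _ in range(144)]
noisy = add_gaussian_noise(frame, var=0.01, seed=0)
```

Fixing the seed makes the corrupted sequence repeatable, which is convenient when several filters are compared on the same noisy input.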

Figure 7 shows the frames of the original video sequences used to subjectively assess noise suppression, detail and edge preservation, and chromaticity. The filtered frames and complete video sequences were quantitatively evaluated according to the following criteria: PSNR was used to characterize the noise suppression capabilities (a larger PSNR reflects a better preservation of the characteristics of video frames); the MAE was used to numerically measure the level of preservation of edges and fine details; and the NCD was used to characterize the perceptual error between two color vectors, according to the human perception of color [22, 26]. These criteria were applied to the proposed framework and compared with several algorithms that have demonstrated beneficial properties in video denoising.
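Two of the three criteria can be computed as sketched below. The sketch uses the standard textbook definitions of MAE and PSNR for 8-bit data (peak value 255), which may differ in normalization from the exact expressions used in the experiments; the NCD is left out because it requires a conversion to a perceptually uniform color space. The function names and the tiny sample frames are hypothetical.

```python
import math


def _flatten(frame):
    """Flatten rows of [R, G, B] pixels into one list of components."""
    return [c for row in frame for pixel in row for c in pixel]


def mae(reference, filtered):
    """Mean absolute error over all pixels and color components."""
    a, b = _flatten(reference), _flatten(filtered)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def psnr(reference, filtered, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    a, b = _flatten(reference), _flatten(filtered)
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


# Two-pixel example frames (one row each).
reference = [[[0, 0, 0], [255, 255, 255]]]
filtered = [[[0, 0, 0], [250, 250, 250]]]
error = mae(reference, filtered)
quality = psnr(reference, filtered)
```

A lower MAE indicates better preservation of edges and fine details, while a higher PSNR indicates stronger noise suppression.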

Figure 7 Original and corrupted images used to subjectively evaluate the proposed and comparative algorithms. (a) 10th Flowers video sequence frame, (b) 10th Miss America video sequence frame, and (c) 10th Chair video sequence frame. Frames are corrupted with VAR = 0.01.

The proposed ‘Fuzzy Directional Adaptive Recursive Temporal Filter for Gaussian Denoising’ algorithm, referred to as FDARTF_G, was compared with the FLRSTF algorithm, which uses similar fuzzy techniques [19]; the FLRSTF_ANGLE; the VGVDF_G; and the VMMKNN_G [21, 22] algorithm, which uses order statistics techniques for the removal of Gaussian noise.

Figure 8 illustrates the denoising capability and preservation ability of all mentioned filters for the 10th frame of the Miss America and Flowers video sequences, using the PSNR and MAE criteria. The figure shows that the designed framework produces the best results. Its performance is the best for the Miss America frame; for the Flowers frame, it gives the best PSNR for the majority of the noise levels, while it gives the best MAE only at low noise levels.

Figure 8 MAE and PSNR criteria for the 10th Miss America and Flowers frames.

The processing results for the 20th and 30th frames of the three video sequences, with corruption levels of VAR = 0.005 and VAR = 0.01, show that the best performance in the MAE, PSNR, and NCD criteria is achieved in most cases by the proposed algorithm, as shown in Table 2.

Table 2 Comparative restoration results according to the MAE, PSNR, and NCD criteria

A more sophisticated filter used for comparison is CBM3D [17]. This filter works in a different domain and consists of two steps: blocks are grouped by spatio-temporal predictive block matching, and each 3D group is filtered by 3D transform-domain shrinkage. This filter and the complex 3D wavelet transform method 3DWF show better results in terms of the PSNR and MAE criteria than our proposed filter. For the Flowers sequence, the results obtained by our algorithm are worse because the additional time-recursive filtering, applied to pixels where no motion is detected, is less effective for a moving camera. One advantage of our filtering method is the avoidance of spatio-temporal blur: when motion is detected, only neighboring pixels from the current frame should be considered. Another advantage is the preservation of details in the frame content: the filtering should not be as strong when large spatial activity (e.g., a large variance) is detected in the current filtering window. As a consequence, more noise is left, but large spatial activity corresponds to high spatial frequencies, where the eye is not sensitive enough to detect it. In homogeneous areas, strong filtering should be performed to remove as much noise as possible. The performance of our methodology is similar to that achieved by Mélange et al. [27], which was also outperformed by the CBM3D method.

Table 3 presents the results for all of the criteria, averaged over all of the frames of the video sequences used. Based on these results, one can state that the proposed filtering algorithm (FDARTF_G) gives the best performance at all Gaussian noise levels for the Miss America video sequence. For the Flowers video sequence, it gives the best PSNR for the majority of the noise levels. Additionally, the MAE and NCD criteria show very good results in the preservation of details and chromatic properties.

Table 3 Averaged results for the PSNR, MAE, and NCD criteria

In Figure 9, we see that for the Chair video sequence, the best performance is given by our proposed method for every frame of the video at medium and high noise levels. The best results were obtained for all of the criteria used (PSNR, MAE, and NCD). Evidently, the CVBM3D filtering method [17], the version designed to process color video images, delivers better results than our proposal, by 5 dB to 8 dB in the PSNR criterion, because it is more sophisticated, works in a different domain, and uses the complex 3D wavelet transform method 3DWF. To date, other algorithms, such as the methods proposed by Yu et al. [28], by Chatterjee and Milanfar [29], by Zuo et al. [30], and by Li et al. [31], have not proven more powerful than the one suggested in [17].

Figure 9 PSNR, MAE, and NCD criteria for the Chair video sequence. Column (a) VAR = 0.005 and column (b) VAR = 0.01.

Finally, Figure 10 shows the frames filtered by the different algorithms, which allow a subjective assessment of visual quality. From the results of the proposed algorithm, it is easy to corroborate that this filter has the best performance in detail preservation and noise suppression. In the FDARTF_G filtered image, one can observe cleaner regions with better preservation of fine details and edges, as compared to other algorithms.

Figure 10 The filtered 10th frames of the Miss America video sequence. (a) Frame corrupted by Gaussian noise, VAR = 0.005, (b) FDARTF_G, (c) FLRSTF, (d) FLRSTF_ANGLE, (e) VGVDF_G, and (f) VMMKNN_G.

In Figure 11 below, the proposed framework produces the best results in the areas of detail preservation and noise suppression. One can perceive (in the vicinity of the tree) that in the case of the FDARTF_G filtering, the resulting image is less influenced by noise compared to the image produced by other filters. In addition, the new filter preserves more details of the features displayed in the background environment.

Figure 11 The filtered 10th frames of the Flowers video sequence. (a) Image corrupted by Gaussian noise, VAR = 0.01, (b) FDARTF_G, (c) FLRSTF, (d) FLRSTF_ANGLE, (e) VGVDF_G, and (f) VMMKNN_G.

From Figure 12, one can see that the proposed framework achieves the best results in the preservation of details, edges, and chromaticity. The uniform regions are less affected by noise with the FDARTF_G filtering than with the other filters implemented. Also, the new filter preserves more details in the features seen in the background environment.

Figure 12 The filtered 10th frames of the Chair video sequence. (a) Image corrupted by Gaussian noise, VAR = 0.01, (b) FDARTF_G, (c) FLRSTF, (d) FLRSTF_ANGLE, (e) VGVDF_G, and (f) VMMKNN_G.

Since the proposed algorithm is adaptive, it is difficult to state how many additions, multiplications, divisions, and other operations, such as trigonometric ones, are carried out. We therefore report the processing performance measured on a DM642 DSP [32] from Texas Instruments, Dallas, TX, USA: the proposed FDARTF_G took an average time of 17.78 s per frame, whereas a completely directional processing algorithm (VGVDF_G) took an average time of 25.6 s per frame, both in QCIF format.

4. Conclusions

The fuzzy and directional techniques working together have proven to be a powerful framework for image filtering applied to color video denoising in QCIF sequences. This robust algorithm performs motion detection and local estimation of the noise standard deviation. These video-sequence characteristics are obtained and converted into parameters that are used as thresholds in different stages of the proposed filter. The algorithm processes only the t and t + 1 video frames, producing appreciable savings in the time and computational resources spent on filtering.

Using the advantages of both techniques (directional and fuzzy), it was possible to design an algorithm that preserves the edges and fine details of video frames while maintaining their inherent color, improving the preservation of color texture relative to the results obtained by the comparative algorithms. Another important conclusion is that, for sequences obtained by a still camera, our method performs better in terms of PSNR than other multiresolution filters of similar complexity, but it is outperformed by some more sophisticated methods (CBM3D).

The simulation results under the proposed criteria PSNR, MAE, and NCD were used to characterize an algorithm's efficiency in noise suppression, fine details, edges, and chromatic properties preservation. The perceptual errors have demonstrated the advantages of the proposed filtering approach.


  1. Rosales-Silva AJ, Gallegos-Funes FJ, Ponomaryov V: Fuzzy Directional (FD) Filter for impulsive noise reduction in colour video sequences. J. Vis. Commun. Image Represent. 2012, 23(1):143-149. doi:10.1016/j.jvcir.2011.09.007

  2. Amer A, Schrerder H: A new video noise reduction algorithm using spatial subbands. Int. Conf. on Electronic Circuits and Systems 13-16 October 1996, 1:45-48.

  3. De Haan G: IC for motion-compensated deinterlacing, noise reduction, and picture rate conversion. IEEE Trans. on Consumer Electronics 1999, 45(3):617-624. doi:10.1109/30.793549

  4. Rajagopalan R, Orchard M: Synthesizing processed video by filtering temporal relationships. IEEE Trans. Image Process. 2002, 11(1):26-36. doi:10.1109/83.977880

  5. Seran V, Kondi LP: New temporal filtering scheme to reduce delay in wavelet-based video coding. IEEE Trans. Image Process. 2007, 16(12):2927-2935.

  6. Zlokolica V, De Geyter M, Schulte S, Pizurica A, Philips W, Kerre E: Fuzzy logic recursive change detection for tracking and denoising of video sequences. Paper presented at the IS&T/SPIE Symposium on Electronic Imaging, San Jose, California, USA, 14 March 2005. doi:10.1117/12.585854

  7. Pizurica A, Zlokolica V, Philips W: Noise reduction in video sequences using wavelet-domain and temporal filtering. Paper presented at the SPIE Conference on Wavelet Applications in Industrial Processing, USA, 27 February 2004. doi:10.1117/12.516069

  8. Selesnick W, Li K: Video denoising using 2d and 3d dual-tree complex wavelet transforms. Paper presented at the Proc. SPIE on Wavelet Applications in Signal and Image Processing, USA, volume 5207, pp. 607-618; 14 November 2003. doi:10.1117/12.504896

  9. Rajpoot N, Yao Z, Wilson R: Adaptive wavelet restoration of noisy video sequences. Paper presented at the IEEE International Conference on Image Processing, pp. 957-960, October 2004. doi:10.1109/ICIP.2004.1419459

  10. Ercole C, Foi A, Katkovnik V, Egiazarian K: Spatio-temporal pointwise adaptive denoising of video: 3d nonparametric regression approach. Paper presented at the First Workshop on Video Processing and Quality Metrics for Consumer Electronics, January 2005.

  11. Rusanovskyy D, Egiazarian K: Video denoising algorithm in sliding 3D DCT domain. In Advanced Concepts for Intelligent Vision Systems. Lecture Notes in Computer Science 3708. Springer Verlag; 2005:618-625.

  12. Ponomaryov V, Rosales-Silva A, Gallegos-Funes F: Paper presented at the Proc. of SPIE-IS&T, Published in SPIE Proceedings Vol. 6811: Real-Time Image Processing 2008. 4 March 2008. doi:10.1117/12.758659

  13. Varghese G, Wang Z: Video denoising based on a spatio-temporal Gaussian scale mixture model. IEEE Trans. Circ. Syst. Video. Tech. 2010, 20(7):1032-1040.

  14. Jovanov L, Pizurica A, Schulte S, Schelkens P, Munteanu A, Kerre E, Philips W: Combined wavelet-domain and motion-compensated video denoising based on video codec motion estimation methods. IEEE Trans. Circ. Syst. Video. Tech. 2009, 19(3):417-421.

  15. Dai J, Oscar C, Yang W, Pang C, Zou F, Wen X: Color video denoising based on adaptive color space conversion. Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), June 2010, pp. 2992-2995. doi:10.1109/ISCAS.2010.5538013

  16. Liu C, Freeman WT: A high-quality video denoising algorithm based on reliable motion estimation. In Proceedings of the 11th European Conference on Computer Vision: Part III. Heraklion, Crete, Greece: Springer-Verlag; 2010:706-719.

  17. Dabov K, Foi A, Egiazarian K: Video denoising by sparse 3D transform-domain collaborative filtering. In Proc. 15th European Signal Processing Conference, EUSIPCO 2007. Poznan, Poland; September 2007.

  18. Mairal J, Sapiro G, Elad M: Learning multiscale sparse representations for image and video restoration. SIAM Multiscale Modeling and Simulation 2008, 7(1):214-241. doi:10.1137/070697653

  19. Zlokolica V, Schulte S, Pizurica A, Philips W, Kerre E: Fuzzy logic recursive motion detection and denoising of video sequences. J. Electron. Imag. 2006, 15(2):1-13. doi:10.1117/1.2201548

  20. Trahanias PE, Karakos D, Venetsanopoulos AN: Directional processing of color images: theory and experimental results. IEEE Trans. Image Process. 1996, 5(6):868-880. doi:10.1109/83.503905

  21. Ponomaryov VI: Real-time 2D-3D filtering using order statistics based algorithms. J. Real-Time Image Proc. 2007, 1(3):173-194. doi:10.1007/s11554-007-0021-5

  22. Ponomaryov V, Rosales-Silva A, Golikov V: Adaptive and vector directional processing applied to video color images. Electron. Lett. 2006, 42(11):1-2.

  23. Arizona State University, October 2010

  24. Zheng J, Valavanis KP, Gauch JM: Noise removal from color images. J. Intell. Robot. Syst. 1993, 7:3.

  25. Plataniotis KN, Venetsanopoulos AN: Color Image Processing and Applications. Springer-Verlag; 26 May 2000.

  26. Pearson A: Fuzzy Logic Fundamentals. Chapter 3, 2001, pp. 61–103. August 2008

  27. Mélange T, Nachtegael M, Kerre EE, Zlokolica V, Schulte S, Witte VD, Pizurica A, Philips W: Video denoising by fuzzy motion and detail adaptive averaging. J. Electron. Imag. 2008, 17(4):043005-1-043005-19.

  28. Yu S, Ahmad O, Swamy MNS: Video denoising using motion compensated 3-D wavelet transform with integrated recursive temporal filtering. IEEE Trans. Circ. Syst. Video. Tech. 2010, 20(6):780-791.

  29. Chatterjee P, Milanfar P: Clustering-based denoising with locally learned dictionaries. IEEE Trans. Image Process. 2009, 18(7):1438-1451.

  30. Zuo C, Liu Y, Tan X, Wang W, Zhang M: Video denoising based on a spatiotemporal Kalman-bilateral mixture model. Scientific World Journal (Hindawi) 2013.

  31. Li S, Yin H, Fang L: Group-sparse representation with dictionary learning for medical image denoising and fusion. IEEE Transactions on Biomedical Engineering 2012, 59(12).

  32. Texas Instruments, January 2008


The authors thank the Instituto Politécnico Nacional de México (National Polytechnic Institute of Mexico) and CONACYT for their financial support.

Author information

Corresponding author

Correspondence to Alberto Jorge Rosales-Silva.

Additional information

Competing interests

The authors declare that they have no competing interest.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Rosales-Silva, A.J., Gallegos-Funes, F.J., Trujillo, I.B. et al. Robust fuzzy scheme for Gaussian denoising of 3D color video. J Image Video Proc 2014, 20 (2014).
