A robust adaptive algorithm of moving object detection for video surveillance
EURASIP Journal on Image and Video Processing volume 2014, Article number: 27 (2014)
Abstract
In visual surveillance of both humans and vehicles, a video stream is processed to characterize the events of interest through the detection of moving objects in each frame. Most errors in higher-level tasks such as tracking are due to false detections. In this paper, a novel method is introduced for the detection of moving objects in surveillance applications which combines an adaptive filtering technique with the Bayesian change detection algorithm. In the proposed method, an adaptive structure first detects the edges of moving objects. Then, the Bayesian algorithm corrects the shape of the detected objects. The proposed method exhibits considerable robustness against noise, shadows, illumination changes, and repeated motions in the background compared to earlier works. No prior information about foreground and background is required, and the motion detection is performed in an adaptive scheme. In addition, it is shown that the proposed algorithm is computationally efficient, so that it can be easily implemented for online surveillance systems as well as similar applications.
1 Introduction
Today, stationary cameras are extensively used for video surveillance systems [1]. Visual surveillance is employed in many applications, such as car and pedestrian traffic monitoring, human activity surveillance for unusual activity detection, people counting, etc. A typical surveillance system consists of three building blocks: moving object detection, object tracking, and higher-level motion analysis [2]. The detection of regions corresponding to moving objects (people and vehicles) in video is the first processing step of almost every vision system, because the remaining processing stages, including tracking and activity analysis, are applied locally to the regions of moving objects [3]. Thus, the identification of moving objects from a video sequence plays an important role in the performance of vision systems [2].
Numerous motion detection algorithms have been presented to date. The simplest ones mostly apply a thresholding operation to the intensity difference (e.g., between consecutive video frames or between the current and background frames). These basic algorithms often perform poorly [1]. To improve the performance, other methods employ probabilistic models [4–7] and statistical tests [8, 9] to model and extract the background. The performance of these detection algorithms is largely influenced by the choice of threshold. Higher performance can theoretically be obtained by adaptively modifying the threshold value, and several threshold adaptation methods have been proposed [1]. The most successful detection algorithms are those which exploit frame differencing and model the change labels using a Markov random field (MRF) in a Bayesian framework [10].
In parallel, change detection methods have been developed based on the maximum a posteriori (MAP) probability criterion, using MRFs as a priori models [11–13]. MAP-inspired change detection algorithms result in better performance. However, they are computationally complex, because MAP estimation is an optimization problem requiring special algorithms such as simulated annealing or graph cuts [14].
To reduce the complexity of MAP estimation, it can be formulated as a likelihood test called local MAP estimation. Local MAP estimation coupled with an MRF as a priori probability has been widely used for moving object detection. These algorithms generally use one of the current background subtraction methods in a MAP-MRF framework [1, 10, 15–18].
In this work, a new detection structure is proposed in which the adaptive noise cancellation (ANC) algorithm is used along with local MAP estimation. Adaptive noise cancellation is a technique for estimating original signals corrupted by additive noise or interference. In the context of signal and image processing, ANC has already been used in works which mostly estimate an image from a version of itself contaminated with additive noise [19–22]. In other words, it only removes the effect of noise. In this paper, ANC is exploited for moving object detection in video surveillance applications so that it eliminates noise, repeated motions of the background, illumination changes, and shadows. Then, MAP estimation renders the regions corresponding to moving objects more compact and smooth. The proposed ANC-MAP method no longer suffers from the heavy computational complexity required in global MAP estimation. It is also adequately robust and efficient.
The organization of this paper is as follows. Section 2 provides a review of the basic Bayesian change detection algorithm. Section 3 describes the principles of the ANC algorithm. The proposed combined method is presented in Section 4. Simulation results are discussed in Section 5. Finally, Section 6 summarizes the results and concludes the paper.
2 Bayesian change detection algorithm
The goal of a motion detection system is to divide each image frame into moving and still segments. This is realized by generating a mask Q consisting of binary labels q(m) for each pixel m on the image grid. Each label is either ‘u’ (unchanged) or ‘c’ (changed). To determine the label q(i) of pixel i, one may start from the gray-level difference D = {d(m)} between two successive frames and then look for the change mask which maximizes P(Q|D) (the MAP estimate). Assuming that the d(m) values are conditionally independent and the labels q(m) are known for all picture elements except i, the estimation of Q reduces to the determination of q(i) [u or c]. Depending on the choice of q(i), there are two possible change masks, ${\mathrm{Q}}_{\mathsf{u}}^{\mathsf{i}}$ and ${\mathrm{Q}}_{\mathsf{c}}^{\mathsf{i}}$. According to Bayes' theorem, it may be deduced that [10]:
where t represents a threshold value for decision.
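The MAP comparison behind inequality (1) can be sketched in two steps. The form below is an assumption reconstructed from the definitions above (MAP criterion, conditional independence of the d(m) values, decision threshold t) and is not a verbatim copy of the expression in [10]. Bayes' theorem gives $P(Q \mid D) \propto p(D \mid Q)\,P(Q)$, and because the two candidate masks differ only at pixel i, the likelihood ratio reduces to the term for d(i):

$$
\frac{p\big(d(i) \mid q(i) = u\big)}{p\big(d(i) \mid q(i) = c\big)}
\;\underset{c}{\overset{u}{\gtrless}}\;
t\,\frac{P\big(Q_c^i\big)}{P\big(Q_u^i\big)}
$$

Under this form, t = 1 corresponds to the plain MAP decision, while other values bias the test towards one of the two labels.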
To make the detection more reliable, the decision should be based on the gray-level difference at pixel i and its neighboring pixels. Assuming a zero-mean Gaussian distribution for the difference values and applying inequality (1) to the pixels around pixel i, a decision rule may be obtained as follows [10]:
$\sigma_u$ represents the noise standard deviation of the gray-level differences in the stationary areas, assumed to be constant over space. $\overline{{\mathrm{\Delta}}_{\mathit{i}}^{2}}$ is the sum of squared differences within a small sliding window $w_i$ centered at pixel i. T is an adaptive threshold derived from modelling a priori knowledge by an MRF. This adaptive threshold varies with the label values in the pixel's neighborhood, i.e., it decreases inside changed areas and increases outside [9]. T is defined as follows:
where $T_0$ stands for a constant threshold and B is a positive-valued potential. $n_i$ is the number of changed pixels in the 3 × 3 neighborhood of pixel i. The higher the number $n_i$ of changed pixels found in this neighborhood, the lower the threshold [10].
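One form of the decision rule and of the adaptive threshold that is consistent with the description above is sketched here. The normalization by $\sigma_u^2$ inside $\overline{\Delta_i^2}$ and the linear offset $(4 - n_i)$ are assumptions in the spirit of [10]; they reproduce the stated behaviour (T falls as $n_i$ grows) but may differ in detail from the original expressions:

$$
\overline{\Delta_i^2} = \frac{1}{\sigma_u^2} \sum_{m \in w_i} d^2(m)
\;\underset{u}{\overset{c}{\gtrless}}\; T,
\qquad
T = T_0 + (4 - n_i)\,B
$$

With eight neighbors counted in $n_i$, this T ranges from $T_0 + 4B$ (no changed neighbor) down to $T_0 - 4B$ (all neighbors changed).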
Figure 1 shows the general flowchart of basic Bayesian change detection algorithm.
Though this method performs well, the interior parts of foreground objects are not detected when the objects are large, uniform, or slow. This is a consequence of differencing two successive frames. Moreover, the method has significant difficulties with changing illumination conditions. In practice, every change causing $\overline{{\mathrm{\Delta}}_{\mathit{i}}^{2}}$ to become larger than T is considered a motion event.
3 Adaptive noise cancellation
Adaptive noise cancellation is a method for estimating signals corrupted by additive noise or interference. Although ANC is built around a single adaptive filter, its structure proves useful in the proposed algorithm. As shown in Figure 2, it has two inputs: a primary input d[n] and a reference input $N_1[n]$. The primary input is the main signal s[n] corrupted by noise $N_0[n]$. The reference input $N_1[n]$ provides a filtered form of the main noise $N_0[n]$. In ANC, the reference input is adaptively filtered to produce y[n], which is subtracted from the primary input to recover the original signal (removing the noise). The output is an error signal e[n] (the difference between d[n] and y[n]), which is fed back to adjust the adaptive filter. The adaptive filter continuously readjusts its coefficients to minimize the energy of the error signal [23].
The adaptive filter can work effectively in unknown environments and can track an input signal with time-varying characteristics [24]. Several algorithms have been proposed to optimally adjust the filter coefficients, such as the least mean square (LMS) algorithm and the recursive least squares (RLS) algorithm [25]. Here, the LMS algorithm has been used because of its simplicity and fast convergence. Basically, the LMS algorithm tries to minimize the energy (or mean square) of the error signal, i.e., $E[e^2]$ [25]. The LMS algorithm leads to a recursive update relation for the filter coefficients W(n) as follows [26]:
where the parameters are as follows: n, iteration number; W, the vector of adaptive filter coefficients; X, the input vector entering adaptive filter; μ, a positive scalar called the step size.
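The ANC loop with an LMS update can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `lms_anc`, the zero initialization of the coefficients, the zero padding at the start of the reference signal, and the factor 2 in the update (the classical form W(n+1) = W(n) + 2μ e(n) X(n); some texts absorb the 2 into μ) are all assumptions.

```python
def lms_anc(primary, reference, mu, filter_len=8):
    """Adaptive noise cancellation with an LMS-updated FIR filter.

    primary   -- d[n] = s[n] + N0[n], the corrupted signal
    reference -- N1[n], a signal correlated with the noise N0[n]
    mu        -- positive step size
    Returns (errors, w): e[n] = d[n] - y[n] for every n, and the final coefficients.
    """
    w = [0.0] * filter_len                      # W(0): assumed all-zero start
    errors = []
    for n in range(len(primary)):
        # X(n): the latest filter_len reference samples, zero-padded before n = 0
        x = [reference[n - k] if n >= k else 0.0 for k in range(filter_len)]
        y = sum(wk * xk for wk, xk in zip(w, x))            # filter output y[n]
        e = primary[n] - y                                  # error signal e[n]
        # Assumed update form: W(n+1) = W(n) + 2 * mu * e(n) * X(n)
        w = [wk + 2.0 * mu * e * xk for wk, xk in zip(w, x)]
        errors.append(e)
    return errors, w
```

When the primary input contains only a scaled copy of the reference (no signal s[n]), the error returned by this sketch decays towards zero as the coefficients converge, which is the behaviour the background pixels are expected to show in the detection scheme.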
4 Proposed algorithm for motion detection
In this section, a new algorithm is proposed that uses the ANC technique in a Bayesian framework to detect the moving parts of each frame in a video sequence. To make the concept easier to follow, the basic idea of detection using ANC is described first. It is then combined with local MAP estimation to obtain an integrated ANC-MAP algorithm for detecting moving objects.
4.1 Basic idea
In video surveillance applications, the camera is often located at a fixed position. This enables us to assume a rather stationary background. Since the areas related to moving objects are relatively small, the background information of two frames, whether successive or not, is highly correlated. This correlation will be used to separate background from foreground in an adaptive scheme.
As previously mentioned, the ANC algorithm requires two input signals: a primary corrupted signal and a reference input containing noise. To apply this algorithm to motion detection, the input signals may be chosen in two ways:
1. One background frame (processed frame) and one original frame
2. Two successive original frames without any processing
Both options have been implemented and examined. In practice, the second is preferred because of its simplicity (no background extraction is needed). The procedure is as follows. First, two original frames are considered (containing unknown moving objects). The normalized gray levels of these two frames are put into column vectors X and Y and used as the inputs of the ANC algorithm (Figure 3). The vectors X and Y play the roles of the reference $N_1$ and primary $N_0$ + s[n] signals, respectively (refer to Section 3). Here, s[n] is the change caused by motion in the second frame.
Since the ANC algorithm suppresses any correlated content (mostly background information), the motion part is expected to remain at the output. By simple thresholding of the absolute error and reshaping of the vector, the output signal can be rendered as an image containing only moving objects. The proposed algorithm can detect a moving object that is present in both frames or in only one frame (e.g., a person entering the scene). Figure 4 demonstrates the algorithm performance in the two cases. Figure 4b shows the output (error signal e[n]) when only one frame contains moving objects. Figure 4c is the result of applying the ANC algorithm to two successive frames that both include moving objects.
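The procedure described above (vectorize two frames, run ANC, threshold the absolute error, reshape) can be sketched as follows. Several choices here are assumptions made for illustration only: the function name `anc_motion_mask`, row-by-row vectorization of the frames, initializing the first filter tap to 1 so that the filter starts as a pass-through of the previous frame, the factor 2 in the LMS update, and the numerical values of the step size and threshold.

```python
def anc_motion_mask(prev_frame, curr_frame, mu=0.005, filter_len=8, threshold=0.2):
    """Binary motion mask from two frames given as 2-D lists of gray levels in [0, 1]."""
    rows, cols = len(curr_frame), len(curr_frame[0])
    x = [p for row in prev_frame for p in row]    # reference vector X (previous frame)
    y = [p for row in curr_frame for p in row]    # primary vector Y (current frame)

    # Assumed initialization: pass-through filter, since successive frames are
    # highly correlated; the remaining taps start at zero.
    w = [1.0] + [0.0] * (filter_len - 1)

    flat_mask = []
    for n in range(len(y)):
        taps = [x[n - k] if n >= k else 0.0 for k in range(filter_len)]
        e = y[n] - sum(wk * tk for wk, tk in zip(w, taps))      # ANC error e[n]
        w = [wk + 2.0 * mu * e * tk for wk, tk in zip(w, taps)]  # LMS update
        flat_mask.append(1 if abs(e) > threshold else 0)         # threshold |e[n]|

    # Reshape the label vector back into an image.
    return [flat_mask[r * cols:(r + 1) * cols] for r in range(rows)]
```

On two identical background frames where only a few pixels of the second frame differ, this sketch marks those pixels and leaves the correlated background unmarked, provided the step size is small enough that the coefficient perturbation caused by a detected pixel stays below the threshold.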
4.2 Proposed ANC-MAP detection algorithm
If some initial frames contain no moving object, a background model may be available. In this case, the proposed ANC method performs almost perfectly (as shown in Figure 4b). However, requiring a scene with no moving object may at times be impossible or very restrictive, as in traffic surveillance systems. Moreover, a robust method of moving object detection should discriminate non-stationary background objects such as moving leaves and rain. It should also adapt quickly to background changes (for example, vehicles starting and stopping). To cope with these problems, successive frames are applied to the proposed ANC-based algorithm, despite the better performance obtained when a background model is used.
If two successive frames are applied to the ANC-based algorithm, the inner parts of moving foreground objects are not detected and are classified as background. This is because the inner regions of an object are highly correlated between the two frames, so the ANC algorithm treats them as background and removes them. To overcome this problem, the proposed ANC system is followed by a Bayesian stage to detect changes (refer to Section 2).
To integrate the Bayesian motion detection framework with the ANC detection described above, the error signal, reshaped back into an image, is applied to the Bayesian algorithm as its input image (Figure 5). The a priori probability function is selected so that smooth regions appear more probable than irregular ones. This makes the output of the detection algorithm more realistic (i.e., detected moving areas become uniformly connected). The a priori probability is modeled here by an MRF. MRF estimation increases the probability of a pixel being foreground in the proximity of a pixel detected by the ANC method and provides a context-dependent variable threshold. As a result, detected objects become more accurate and compact. The procedure finally results in the following equation:
where $t_s$ is a constant threshold. $\overline{{\mathrm{err}}_{\mathit{i}}^{2}}$ stands for the sum of squared errors in a window around pixel i. The parameter B is a positive-valued potential and $n_c$ is the number of changed pixels in the 3 × 3 neighborhood of pixel i. Since the output error is an estimate of the original image with the background eliminated, the original image should be normalized in order to obtain a normalized error. The threshold $t_s$ is then not very sensitive to the error magnitude and can keep a nearly fixed value across different sequences.
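The Bayesian stage applied to the ANC error image can be sketched as a relabeling pass with a neighborhood-dependent threshold. The sketch below is illustrative only and rests on stated assumptions: the threshold is taken as $t_s + (4 - n_c)B$ (a form in the spirit of [10] that lowers the threshold as $n_c$ grows), the initial labels come from the constant threshold $t_s$ alone, the labels are refined over a small fixed number of passes, and the name `refine_with_mrf_threshold` and its parameters are hypothetical.

```python
def refine_with_mrf_threshold(err, t_s, b, half_win=1, n_iter=2):
    """Label pixels of a normalized ANC error image (2-D list) as changed (1) or not (0)."""
    rows, cols = len(err), len(err[0])

    def block(i, j, radius):
        """Yield the in-bounds coordinates of the square block centred on (i, j)."""
        for r in range(max(0, i - radius), min(rows, i + radius + 1)):
            for c in range(max(0, j - radius), min(cols, j + radius + 1)):
                yield r, c

    # Sum of squared errors in the sliding window around each pixel.
    energy = [[sum(err[r][c] ** 2 for r, c in block(i, j, half_win))
               for j in range(cols)] for i in range(rows)]

    # Assumed initial labels: constant threshold only (equivalent to n_c = 4).
    mask = [[1 if energy[i][j] > t_s else 0 for j in range(cols)] for i in range(rows)]

    for _ in range(n_iter):
        updated = []
        for i in range(rows):
            row = []
            for j in range(cols):
                # n_c: changed pixels among the 8 neighbours (centre excluded).
                n_c = sum(mask[r][c] for r, c in block(i, j, 1)) - mask[i][j]
                threshold = t_s + (4 - n_c) * b     # assumed adaptive threshold
                row.append(1 if energy[i][j] > threshold else 0)
            updated.append(row)
        mask = updated
    return mask
```

With B = 0.5 $t_s$, as used in the experiments, a pixel whose eight neighbors are all labeled as changed gets a negative threshold under this assumed form and is therefore always labeled as changed, which illustrates how the stage can fill the interior of detected objects.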
5 Experimental results
To date, many motion detection algorithms have been developed that perform well on some types of videos but not on others. Video surveillance applications pose a number of challenging problems, including illumination changes, repeated motions of the background, bootstrapping, and shadows [17]. To show the ability of the proposed method to handle the key challenges of real-world videos, it has been implemented and applied to several indoor and outdoor sequences with different frame rates and detection challenges. For each motion detection challenge, two or more videos have been selected. The selected sequences come from the Visor dataset, the CAVIAR dataset, and the videos referenced in [7]. All videos are accessible at [27–29].
To evaluate the performance, earlier works [5, 7, 17] have been simulated and compared with the proposed algorithm. In all experiments, the parameters are set as follows: the step size is $10^{-6}$, the filter length is 8, and the parameter B is set to $0.5\,t_s$. $t_s$ is a fixed threshold which is selected experimentally and has a value between 0 and 1.
5.1 Simulation results
The proposed ANC-MAP detection algorithm has been tested in a variety of environments. The results, shown in Figure 6, have been compared with other methods. The compared algorithms are as follows. The first comparison is made with the mixture of Gaussians (MOG) algorithm [5], a widely used adaptive background subtraction method. It performs well for both stationary and non-stationary backgrounds [7]. Another compared algorithm is the robust method of [7], which incorporates spectral, spatial, and temporal features to characterize the background appearance in a Bayesian framework. The proposed method has also been compared with the algorithm presented in [17], a combination of MOG and local MAP estimation.
Detection results have been compared for the case where the rival methods exhibit their best possible performance, since the results of the other methods have been taken from [7, 17] (Figures 6, 7, and 8 of [7] and Figure 2 of [17]). The results shown for the algorithms of [5] and [7] were obtained after a level of post-processing [7].
5.1.1 Indoor environments and shadow
Figure 6a,b shows the results for two indoor test sequences, ‘Shopping Center’ and ‘Buffet Restaurant’. The proposed algorithm detects and separates the moving objects and eliminates the shadows of walking persons almost perfectly.
5.1.2 Bootstrapping
Figure 6a,b also provides two examples of bootstrapping, in which no training period (i.e., no frame without foreground objects) is available. Unlike background subtraction algorithms, the proposed method needs no initial training frame without foreground.
5.1.3 Illumination variations
Figure 6c shows the results for a sequence (Lobby) with sudden illumination changes caused by switching lights on or off.
5.1.4 Repeated changes in the background
Campus is a sequence with changes in background. Results show that the proposed algorithm can easily omit the repeated motions in background (waving trees) (Figure 6d).
Figure 7 shows the performance of the proposed method on five other videos with the same challenges discussed above.
5.2 Quantitative evaluation
To quantitatively evaluate the proposed algorithm against earlier works, the similarity measure introduced in [7] is used. The similarity measure is defined as follows:
where A is the foreground region detected by the proposed algorithm; B, ideal foreground (ground truth); N(x), number of pixels in the region x.
S(A,B) reaches its maximum value of 1 when A and B are identical. Five video sequences (Shopping Center, Buffet Restaurant, Lobby, Fountain, and Campus) were selected to evaluate the algorithms against manually produced ground truth. Each of these videos presents one or two leading challenges in motion detection. Twenty frames of each sequence were randomly selected and used to compare the proposed method with the other algorithms. This frame sampling follows the procedure used in [7].
The average values of the similarity measure for these video sequences are shown in Table 1. Columns 1, 2, and 3 contain the similarity values reported in [7, 17]. The last row shows the average over the five sequences. The quantitative evaluation and comparison with existing methods show that the proposed method provides better performance.
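As an illustration, the similarity measure can be computed from two binary masks as follows. The sketch assumes that the measure of [7] is the ratio of the number of pixels in the intersection of A and B to the number in their union, which agrees with the definitions of A, B, and N(x) given above; returning 1 when both masks are empty is a convention chosen here, and the function name `similarity` is hypothetical.

```python
def similarity(detected, truth):
    """S(A, B) = N(A ∩ B) / N(A ∪ B) for two binary masks given as 2-D lists."""
    inter = 0
    union = 0
    for row_a, row_b in zip(detected, truth):
        for a, b in zip(row_a, row_b):
            if a and b:
                inter += 1          # pixel belongs to both A and B
            if a or b:
                union += 1          # pixel belongs to A or B
    # Assumed convention: two empty masks are treated as identical.
    return inter / union if union else 1.0
```

For example, a detection that overlaps the ground truth in one pixel while the two regions together cover three pixels yields S = 1/3 under this definition.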
Figure 8 is a plot of similarity measure for the sampled frames of sequence ‘Campus.’
5.3 Limitations of the method
Since background and foreground are not modeled a priori in the proposed method, the results show some limitations. A problem occurs when a moving foreground object stops suddenly or remains still for a period of time. The algorithm cannot recognize a motionless foreground object until it starts moving again. Another problem arises when the foreground and background have similar colors. In this case, many foreground pixels are misclassified, although the moving object is still detected.
5.4 Complexity and computational cost analysis
To evaluate the computational load of the proposed algorithm, let L be the length of the adaptive filter in the ANC algorithm and W the size of the sliding window in the Bayesian method. The proposed algorithm then requires W + L + 2 additions and W + 2L + 5 multiplications per pixel. Table 2 compares the complexity of the algorithms in terms of the additions and multiplications required per pixel. The number of operations needed by each method depends on its parameters. The letters used in Table 2 refer to the following parameters:

k, number of Gaussian distributions in each pixel

m, number of matched distributions in each pixel (m ≤ k)

L, length of adaptive filter of ANC algorithm

W, size of sliding window in Bayesian method

N(v), number of principal features of the background at 1 pixel
Assuming k = 3, W = 9, L = 8, and N(v) = 15, the operations required per pixel for each algorithm would be as follows:

MOG, 27 additions and 39 multiplications

[7], 74 additions and 43 multiplications

[17], 40 additions and 52 multiplications

Proposed method, 19 additions and 30 multiplications
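The per-pixel counts for the proposed method follow directly from the expressions W + L + 2 and W + 2L + 5 given above; a short check, with a hypothetical helper name, is:

```python
def ops_per_pixel(window_size, filter_len):
    """Per-pixel operation counts of the proposed method, from the stated formulas."""
    additions = window_size + filter_len + 2
    multiplications = window_size + 2 * filter_len + 5
    return additions, multiplications


# W = 9 (3 x 3 sliding window) and L = 8 (adaptive filter length)
print(ops_per_pixel(9, 8))  # (19, 30)
```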
According to Table 2, the proposed ANC-MAP detection algorithm has lower computational complexity than the real-time algorithm of [7], which also requires a large amount of memory (1.78 KB) for the calculations of each pixel [7]. Although the computational cost of MOG or [17] is comparable with that of the proposed method, the proposed method offers superior performance, as shown in Table 1.
6 Conclusions
A new algorithm was proposed in this paper for the detection of moving objects using the structure of adaptive noise cancellation. The proposed detection algorithm is integrated with the Bayesian-MRF algorithm to improve the performance in terms of the shape continuity of detected objects. The algorithm exploits the correlation of background pixels across successive frames to remove the background. What is left at the output is an approximation of the moving areas. The shape of the moving objects is then improved using the Bayesian algorithm. The algorithm appears to be very efficient in eliminating noise, shadows, illumination variations, and repeated motions in the background. Experiments in different environments have shown the effectiveness of the proposed method. Unlike earlier adaptive detection algorithms, the proposed method tries to detect moving objects directly using adaptive filtering. The promising detection results and the simplicity of the algorithm make the proposed method a suitable candidate for real-time practical implementations.
Authors' information
EK was born in Iran in 1986. She received the B.Sc. and M.Sc. degrees in electronic engineering from K.N. Toosi University of Technology, Tehran, Iran, in 2008 and 2011, respectively. She currently teaches the digital logic laboratory at K.N. Toosi University of Technology. Her main interests are the design of digital electronic circuits and signal and image processing. DA received the B.Sc. and M.Sc. degrees in electronics and bioelectric engineering from Sharif University of Technology, Tehran, Iran, in 1996 and 1998, respectively, and the Ph.D. degree in electronics engineering from SUPELEC, Gif-sur-Yvette, France, in 2007. He is now with the Department of Electrical Engineering, K.N. Toosi University of Technology, Tehran, Iran. His research interests include blind signal processing, particularly as applied to electronic systems, and the design of analog and digital electronic circuits.
Abbreviations
ANC: adaptive noise cancellation
MAP: maximum a posteriori
MRF: Markov random field
References
1. McHugh JM, Konrad J, Saligrama V, Jodoin P: Foreground-adaptive background subtraction. IEEE Signal Process. Lett. 2009, 16: 390–393.
2. Ovsenik L, Kolesarova AK, Turan J: Video surveillance systems. Acta Electrotechnica et Informatica 2010, 10(4):46–53.
3. Shayegh HR, Moghadam N: A new background subtraction method in video sequences based on temporal motion windows. In International Conference on IT. Thailand; March 2009.
4. Wren CR, Azarbayejani A, Darrell T, Pentland A: Pfinder: real-time tracking of the human body. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 18(7):780–785.
5. Stauffer C, Grimson W: Adaptive background mixture models for real-time tracking. IEEE Conf. Comp. Vision Pattern Recog. 1999, 2: 246–252.
6. Elgammal A, Duraiswami R, Harwood D, Davis L: Background and foreground modelling using nonparametric kernel density for visual surveillance. Proc. IEEE 2002, 90(7):1151–1163. 10.1109/JPROC.2002.801448
7. Li L, Huang W, Gu IY, Tian Q: Statistical modelling of complex backgrounds for foreground object detection. IEEE Trans. Image Process. 2004, 13(11):1459–1472. 10.1109/TIP.2004.836169
8. Hsu YZ, Nagel HH, Refers G: New likelihood test method for change detection in image sequences. Comput. Vision Graph. Image Process. 1984, 26: 73–106. 10.1016/0734-189X(84)90131-2
9. Aach T, Kaup A, Mester R: Statistical model-based change detection in moving video. Signal Process. 1993, 31: 165–180. 10.1016/0165-1684(93)90063-G
10. Aach T, Kaup A: Bayesian algorithms for adaptive change detection in image sequences using Markov random fields. Signal Process. Image Comm. 1995, 7(2):147–160. 10.1016/0923-5965(95)00003-F
11. Migdal J, Grimson EL: Background subtraction using Markov thresholds. IEEE Workshop Motion Video Computing 2005, 2: 58–65.
12. Sheikh Y, Shah M: Bayesian modelling of dynamic scenes for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27(11):1778–1792.
13. Yu SY, Wang F, Xue YF, Yang J: Bayesian moving object detection in dynamic scenes using an adaptive foreground model. J. Zhejiang Univ. (Sci.) 2009, 10(12):1750–1758. 10.1631/jzus.A0820743
14. Kato Z: Multiresolution Markovian models in computer vision. Application on segmentation of SPOT images. PhD thesis. INRIA, Sophia Antipolis, France; 1994.
15. Aach T, Dumbgen L, Mester R, Toth D: Bayesian illumination-invariant motion detection. IEEE International Conference on Image Processing, Thessaloniki, 7–10 October 2001, 640–643.
16. Liu Q, Sun M, Sclabassi RJ: Illumination-invariant change detection model for patient monitoring video. In 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2004. San Francisco: IEMBS 04; 2004:1782–1785.
17. Tsai TH, Lin CY: Markov random field based background subtraction method for foreground detection under moving background scene. In Conference on Genetic and Evolutionary Computing. Shenzhen; 2010:691–694.
18. Kermani E, Asemani D: A new illumination-invariant method of moving object detection for video surveillance systems. In Conference on Machine Vision and Image Processing. Tehran; 2011:1–5.
19. Das M: An improved adaptive Wiener filter for denoising and signal detection. In Proceedings of IASTED International Conference on Signal and Image Processing. Hawaii; 2005:226–230.
20. Chen CY, Hsia CW: Image noise cancellation by adaptive filter with weight-training mechanism (AFWTM). In Information, Decision, and Control Conference. Adelaide; 2007:332–335.
21. Sudha S, Suresh GR, Sukanesh R: Speckle noise reduction in ultrasound images by wavelet thresholding based on weighted variance. IJCTE 2009, 1(1):1793–8201.
22. Naveen VJ, Prabakar T, Suman JV, Pradeep PD: Noise suppression in speech signals using adaptive algorithms. Int. J. Signal Process. Image Process. Pattern Recog. 2010, 3(3):87–96.
23. Singh A: Adaptive noise cancellation, undergraduate B.E. project report. Netaji Subhas Institute of Technology; 2001.
24. He Y, He H, Li L, Wu Y, Pan H: The applications and simulation of adaptive filter in noise canceling. In Conference on Computer Science and Software Engineering. Hubei; 2008:1–4.
25. Singh A: Adaptive noise cancellation (2001). Available at http://www.cs.cmu.edu/~aarti/pubs/ANC.pdf (accessed 21 January 2012)
26. Ramadan Z: Error vector normalized adaptive algorithm applied to adaptive noise canceller and system identification. J. Eng. Appl. Sci. 2010, 3(4):710–717.
27. Visor dataset. http://imagelab.ing.unimore.it/visor/ (accessed 15 August 2011)
28. CAVIAR test case scenarios (2003–2004). http://homepages.inf.ed.ac.uk/rbf/CAVIARDATA1/ (accessed 10 October 2011)
29. Background modeling dataset. http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html (accessed 19 January 2012)
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords
 Moving object detection
 Adaptive noise cancellation
 Bayesian
 Maximum a posteriori, Video stream
 Background subtraction
 Surveillance