Low-complexity background subtraction based on spatial similarity
© Lee and Lee; licensee Springer. 2014
Received: 3 July 2013
Accepted: 2 June 2014
Published: 19 June 2014
Robust detection of moving objects from video sequences is an important task in machine vision systems and applications. To detect moving objects, accurate background subtraction is essential. In real environments, due to complex and various background types, background subtraction is a challenging task. In this paper, we propose a pixel-based background subtraction method based on spatial similarity. The main difficulties of background subtraction include various background changes, shadows, and objects similar in color to background areas. In order to address these problems, we first computed the spatial similarity using the structural similarity method (SSIM). Spatial similarity is an effective way of eliminating shadows and detecting objects similar to the background areas. With spatial similarity, we roughly eliminated most background pixels such as shadows and moving background areas, while preserving objects that are similar to the background regions. Finally, the remaining pixels were classified as background pixels and foreground pixels using density estimation. Previous methods based on density estimation required high computational complexity. However, by selecting the minimum number of features and deleting most background pixels, we were able to significantly reduce the level of computational complexity. We compared our method with some existing background modeling methods. The experimental results show that the proposed method produced more accurate and stable results.
Keywords: Background subtraction; Background modeling; Structural similarity; Kernel density estimation
As security monitoring emerges as an important issue, there has been an increasing demand for intelligent surveillance systems. Key operations in intelligent surveillance include object tracking, abnormal behavior detection, and behavior understanding, and accurate background subtraction plays an important role in each of them. The goal of background subtraction is to eliminate background components and detect meaningful moving objects. In real environments, background subtraction is difficult because of various and complex background types such as moving escalators, waving tree branches, water fountains, and flickering monitors. Researchers have addressed these problems by using background modeling. Simple background models assume static background images. Background components can generally be eliminated by computing the difference between an input image and a background image modeled using averaging, low-pass filtering, or median filtering [1–4]. For instance, the median background image has been used to subtract the background components. Since temporal median filtering is time-consuming, a fast algorithm utilizing the characteristics of adjacent frames was proposed. Cheng et al. applied a recursive mean procedure to compute background images. Low-pass filtering has also been utilized to estimate a static background image. However, these approaches cannot handle dynamic backgrounds and are sensitive to threshold values.
In order to handle various background types, statistical approaches were introduced. Among these approaches, Gaussian modeling methods have been widely used. Initially, a uni-modal distribution was used to model pixel values. A background subtraction method using the HSV color space was presented based on single Gaussian modeling. A fast and stable linear discriminant approach based on a uni-modal distribution and a Markov random field was proposed. Rambabu and Woo proposed a background subtraction method based on single Gaussian modeling that is robust against noise and changing illumination. Although these models have low complexity and perform satisfactorily with controlled backgrounds, they are difficult to use for dynamic scenes.
The Gaussian mixture model (GMM) is usually used to model various background types. Stauffer and Grimson used the GMM for background subtraction, and it is still a popular method for background subtraction [10–20]. A spatio-temporal GMM (STGMM) was proposed to handle complex backgrounds. Using a GMM, a statistical framework was investigated to localize a foreground object, and a dynamic background was modeled for highly dynamic conditions such as active cameras and high motion activity in background regions. Also, the subtraction of two Gaussian kernels (difference of Gaussians) was used to eliminate background regions on embedded platforms. A general framework of regularized online classification EM for GMM was proposed. Wang et al. proposed an adaptive local-patch GMM to detect moving objects in dynamic background regions. A new update algorithm was proposed for learning adaptive mixture models, and Bin et al. proposed a self-adaptive moving object detection algorithm that improved the original GMM in order to adapt to sudden or gradual illumination changes. In order to improve GMM performance, a new rate control method based on high-level feedback was developed. An improved adaptive-K GMM method was presented for updating background regions, and GMM was used for modeling background regions in a Bayer-pattern domain. A disadvantage of these multimodal Gaussian modeling methods is that they require pre-defined parameters such as the number of Gaussian distributions and the standard deviations of those distributions. Also, dynamic backgrounds cannot be accurately modeled by a few Gaussian distributions.
In order to overcome the limitations of parametric background modeling methods, nonparametric background modeling techniques have been developed for estimating background probabilities. These methods estimate the background distribution based on pixel values observed in the past. For example, the Gaussian kernel was used for pixel-based background modeling. This nonparametric approach can handle multiple modes of dynamic backgrounds without pre-defined parameters. However, these nonparametric methods use kernel density estimation (KDE), which has high computational complexity and requires a large amount of memory. Various efforts have been made to address these problems. Using Parzen density estimation and foreground object detection, a fast estimation method was presented, and an automatic background modeling method based on multivariate non-parametric KDE was proposed. A non-parametric method was proposed for foreground and background modeling that did not require any initialization. Han et al. proposed an efficient algorithm for recursive density approximation based on density mode propagation. Also, depth information, on-line auto-regressive modeling, and Gaussian family distributions were used to eliminate background regions [26–28]. A new object segmentation method was proposed based on a recursive KDE, which used the mean-shift method to approximate the local maximum value of the density function. The background was also modeled using real-time KDE based on online histogram learning.
Also, alternative approaches were proposed based on neural network techniques or the support vector machine (SVM) method [31–35]. A method was proposed based on self-organization through artificial neural networks. Furthermore, a self-organization method was combined with a fuzzy approach to update the background. In [33–35], automatic algorithms were proposed to perform background modeling using SVM.
To develop a robust model with low complexity, we used a pixel-based background subtraction method based on spatial similarity computed using the structural similarity method (SSIM). Using spatial similarity, we measured the pixel similarity and eliminated background pixels. The remaining pixels were classified as either background or foreground pixels using KDE. Since we eliminated most background pixels and used only two features for KDE, the complexity of the proposed method was significantly reduced. The proposed method was evaluated using two datasets (Wallflower's and Li's datasets) and showed favorable performance over some existing methods.
The overall algorithm for efficient background subtraction
The structural similarity for eliminating background components
Since we calculated the means and the variances, the computational complexity was low. However, some background pixels were still retained. In order to eliminate the background pixels, we used nonparametric kernel density estimation.
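The similarity equations themselves did not survive in this copy of the text, so the following is only a minimal sketch of a per-pixel SSIM map built from local means and variances, in the standard form given by Wang et al. The uniform w × w window, the grayscale input, the replicated borders, and the constants `c1` and `c2` (the usual SSIM defaults for 8-bit images) are assumptions, not values confirmed by the paper; the function names are hypothetical.

```python
import numpy as np

def box_mean(img, w=3):
    """Mean over each w x w neighborhood (borders are replicated)."""
    pad = w // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    h, wd = img.shape
    out = np.zeros((h, wd), dtype=np.float64)
    for dy in range(w):
        for dx in range(w):
            out += padded[dy:dy + h, dx:dx + wd]
    return out / (w * w)

def ssim_map(frame, background, w=3,
             c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Per-pixel SSIM between an input frame and a background image.

    Only local means, variances, and the covariance are needed,
    which is why this step is cheap compared with KDE.
    """
    x = frame.astype(np.float64)
    y = background.astype(np.float64)
    mu_x, mu_y = box_mean(x, w), box_mean(y, w)
    var_x = box_mean(x * x, w) - mu_x ** 2
    var_y = box_mean(y * y, w) - mu_y ** 2
    cov_xy = box_mean(x * y, w) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den
```

Pixels whose similarity to the background stays high could then be discarded before the density estimation step; the threshold used for that decision is left to the caller here.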
Determining foreground and background areas using KDE
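As a hedged illustration of this step, the sketch below evaluates a Gaussian-kernel density estimate only at the candidate pixels that survive the similarity test and labels a pixel as foreground when its estimated background density falls below a threshold. The feature layout (two features per pixel), the fixed per-feature bandwidth `sigma`, the `threshold` value, and all function names are assumptions made for illustration; the paper's own kernel and bandwidth equations are not present in this copy.

```python
import numpy as np

def kde_background_density(value, samples, sigma):
    """Gaussian-kernel density of one feature vector given N past samples.

    value:   shape (d,)    current feature vector of the pixel
    samples: shape (N, d)  the same pixel in N training frames
    sigma:   shape (d,)    assumed per-feature kernel bandwidth
    """
    value = np.asarray(value, dtype=np.float64)
    samples = np.asarray(samples, dtype=np.float64)
    sigma = np.asarray(sigma, dtype=np.float64)
    z = (samples - value) / sigma
    norm = np.prod(np.sqrt(2.0 * np.pi) * sigma)
    kernels = np.exp(-0.5 * np.sum(z * z, axis=1)) / norm
    return kernels.mean()

def classify_candidates(frame_feats, history, candidates, sigma, threshold):
    """Run KDE only where `candidates` is True; all other pixels stay background.

    frame_feats: (H, W, d) features of the current frame
    history:     (N, H, W, d) features of the N training frames
    candidates:  (H, W) boolean mask of foreground candidate pixels
    """
    foreground = np.zeros(candidates.shape, dtype=bool)
    for i, j in zip(*np.nonzero(candidates)):
        density = kde_background_density(frame_feats[i, j], history[:, i, j], sigma)
        foreground[i, j] = density < threshold
    return foreground
```

Because the loop visits only candidate pixels, the cost of the density estimation scales with the fraction of pixels that remain after the similarity step rather than with the full image size.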
The proposed method
Determine the background type
The RBBI successfully detected moving background components such as moving escalators, waving tree branches, and water fountains.
Determine the foreground candidate pixels
Subtract the background pixels using KDE
where dmax represents the maximum difference. Let Ωmax be the channel with the maximum difference.
where I(i, j) represents the input image. If either the estimated probability density function of the pixel using the original RGB channels or the estimated probability density function of the pixel value of the normalized RGB channels was classified as a foreground component, the pixel was determined to be a foreground component. After this procedure, there were several small holes inside the foreground regions and some noise elements in the background regions. Most pixel-based methods suffer from this kind of problem. In order to address this, we applied a morphological operation to remove the small holes and noise elements. In particular, we used erosion followed by dilation and then a region filling technique was applied to the results.
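The post-processing described above can be sketched as follows: the two foreground decisions are combined with a logical OR, the result is eroded and then dilated, and enclosed holes are filled. This is a minimal NumPy sketch under stated assumptions: the 3 × 3 structuring element, the border handling, and the flood-fill-based hole filling are illustrative choices, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np

def _neighborhood_stack(mask, pad_value):
    """Stack the nine 3 x 3 neighborhood shifts of a boolean mask."""
    padded = np.pad(mask, 1, mode="constant", constant_values=pad_value)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def erode(mask):
    # Pixels outside the image count as foreground, so borders are not eroded.
    return _neighborhood_stack(mask, True).all(axis=0)

def dilate(mask):
    return _neighborhood_stack(mask, False).any(axis=0)

def _grow4(mask):
    """Grow a mask by one pixel in the four axis-aligned directions."""
    grown = mask.copy()
    grown[1:, :] |= mask[:-1, :]
    grown[:-1, :] |= mask[1:, :]
    grown[:, 1:] |= mask[:, :-1]
    grown[:, :-1] |= mask[:, 1:]
    return grown

def fill_holes(mask):
    """Fill background regions that are not connected to the image border."""
    outside = np.zeros_like(mask)
    outside[0, :] = ~mask[0, :]
    outside[-1, :] = ~mask[-1, :]
    outside[:, 0] = ~mask[:, 0]
    outside[:, -1] = ~mask[:, -1]
    while True:
        grown = _grow4(outside) & ~mask
        if np.array_equal(grown, outside):
            return ~outside
        outside = grown

def clean_foreground(fg_rgb, fg_norm_rgb):
    """OR the two decisions, then erode, dilate, and fill holes."""
    combined = np.asarray(fg_rgb, dtype=bool) | np.asarray(fg_norm_rgb, dtype=bool)
    return fill_holes(dilate(erode(combined)))
```

Erosion followed by dilation removes isolated noise pixels while roughly preserving the extent of larger regions, and the final filling step closes holes that the opening leaves inside objects.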
In some test sequences, the background gradually became brighter over a period (Figure 8). The RBI did not reflect this gradual background change with a small value of α (Figure 8c). Thus, we set α = 0.01, and the learning rate was able to handle background changes adequately (Figure 8d).
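The update equation for the background image is not present in this copy of the text. Assuming the common exponential running-average update with learning rate α (an assumption, not the confirmed formulation), the effect of α on a gradually brightening scene can be sketched as follows; the function names and the synthetic brightening ramp are hypothetical.

```python
import numpy as np

def update_background(background, frame, alpha=0.01):
    """Assumed running-average update: B <- (1 - alpha) * B + alpha * I."""
    return (1.0 - alpha) * background + alpha * frame

def simulate_brightening(alpha, n_frames=500, start=100.0, step=0.1):
    """Track a scene that brightens by `step` gray levels per frame.

    Returns how far the modeled background lags behind the true
    scene brightness after `n_frames` updates.
    """
    background = np.full((4, 4), start)
    level = start
    for _ in range(n_frames):
        level += step
        frame = np.full((4, 4), level)
        background = update_background(background, frame, alpha)
    return float(level - background.mean())
```

Under this assumed model, a very small α leaves the background far behind the brightening scene, whereas a larger value such as α = 0.01 keeps the lag bounded, which is consistent with the behavior reported for Figure 8.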
Experiments were performed using two datasets (Li's dataset and the Wallflower's dataset). Li's dataset contained several dynamic background video sequences (water surface (WS), campus (CAM), fountain (FT), and meeting room (MR)) and static background video sequences (shopping center (SC), subway station (SS), airport (AP), lobby (LB), bootstrap (B)). The Wallflower's dataset contained various background types (bootstrap (B), camouflage (C), foreground aperture (FA), light switch (LS), moved object (MO), time of day (TD), and waving tree (WT)).
In the proposed method, the window size was 3 (w = 3), and on average about 5% to 6% of the image pixels remained after the similarity step (τ ≅ 0.05). In other words, the number of KDE operations was reduced by approximately 95%. Although we needed to compute the additional spatial similarity, it had a minor effect on the overall complexity. With 100 training images, the computational complexity of KDE and the proposed method was O(100 N) and O((9 + 0.05 × 100)N) = O(14 N), respectively. In this case, the complexity of the proposed method was about 14% of that of KDE.
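The arithmetic behind this comparison can be reproduced with a few lines. The sketch assumes that the constant 9 corresponds to the w × w = 3 × 3 similarity window, which the text implies but does not state outright; the function name is hypothetical.

```python
def relative_cost(window=3, remaining_fraction=0.05, n_train=100):
    """Per-pixel operation counts for full KDE and for the proposed scheme.

    Full KDE evaluates n_train kernels at every pixel. The proposed scheme
    spends window * window operations on the similarity test (assumed) and
    runs KDE only on the remaining fraction of pixels.
    """
    full_kde = n_train
    proposed = window * window + remaining_fraction * n_train
    return proposed, full_kde, proposed / full_kde

proposed, full_kde, ratio = relative_cost()
print(f"proposed: {proposed:.0f} N, KDE: {full_kde} N, ratio: {ratio:.0%}")
```

With the default arguments this gives 14 N against 100 N, that is, a ratio of 14%. Raising the remaining fraction or the number of training images changes the ratio accordingly.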
The Jaccard similarity index is computed as J = TP / (TP + FP + FN), where TP represents the number of true positive pixels, FP represents the number of false positive pixels, and FN represents the number of false negative pixels. Generally, a higher Jaccard similarity index indicates better performance.
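For reference, the Jaccard similarity index in its standard form, TP / (TP + FP + FN), can be computed from a detected mask and a ground-truth mask as in the sketch below. The function name and the convention of returning 1.0 when both masks are empty are illustrative choices, not part of the paper.

```python
import numpy as np

def jaccard_index(detected, ground_truth):
    """Jaccard similarity TP / (TP + FP + FN) between two binary masks."""
    detected = np.asarray(detected, dtype=bool)
    ground_truth = np.asarray(ground_truth, dtype=bool)
    tp = np.count_nonzero(detected & ground_truth)    # foreground in both
    fp = np.count_nonzero(detected & ~ground_truth)   # detected only
    fn = np.count_nonzero(~detected & ground_truth)   # missed foreground
    denom = tp + fp + fn
    # Assumed convention: two empty masks are treated as a perfect match.
    return tp / denom if denom else 1.0
```

True negative pixels do not enter the index, so large background areas that are classified correctly do not inflate the score.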
Results using Li's dataset
Performance comparison with Jaccard similarity (Li's dataset)
Results using Wallflower's dataset
The effects of thresholds
We selected the optimal values for T1 and T2. When we set T1 and T2 to 0.55 and 30, respectively, the foreground candidate pixels were about 5% of the entire number of pixels. With these values, the Jaccard similarity of the proposed method was about 0.78 with Li's dataset, and the total number of FP and FN pixels was about 6,888 with the Wallflower's dataset. Experiments with various values of T1 and T2 show that the proposed method produced stable performance when the value of T1 was from 0.5 to 0.65 and the value of T2 was from 25 to 35.
In this paper, we proposed a background subtraction method that utilized structural similarity, which was robust against various background areas. The proposed method also significantly reduced the level of computational complexity since most pixels were eliminated using the similarity image. We tested the proposed method with two datasets and then compared the proposed method with some existing methods. The experimental results demonstrated that the proposed method was effective for various background scenes and compared favorably with some existing algorithms.
Sangwook Lee received the BS and MS degrees in electrical and electronic engineering from Yonsei University, Seoul, Republic of Korea, in 2004 and 2006, respectively. He is currently working toward the PhD degree at Yonsei University and is a senior engineer at Samsung Electronics Co. Ltd., Republic of Korea. His research interests include machine vision, image/signal processing, and video quality measurement.
Chulhee Lee received the BS and MS degrees in electronic engineering from Seoul National University in 1984 and 1986, respectively, and a PhD degree in electrical engineering from Purdue University, West Lafayette, Indiana, in 1992. In 1996, he joined the faculty of the Department of Electrical and Computer Engineering, Yonsei University, Seoul, Republic of Korea. His research interests include image/signal processing, pattern recognition, and neural networks.
This work was supported by grant no. R01-2006-000-11223-0 from the Basic Research Program of the Korea Science & Engineering Foundation.
- McFarlane NJB, Schofield CP: Segmentation and tracking of piglets in images. Mach. Vision App. 1995, 8(1):187-193.
- Hung MH, Hsieh CH: Speed up temporal median filter for background subtraction. In Proceedings of the PCSPA, vol. 1. Harbin; 2004:297-300.
- Cheng F, Huang S, Ruan S: Advanced motion detection for intelligent video surveillance systems. In Proceedings of the ACM SAC, vol. 1. Sierra; 2010:983-984.
- Cohen S: Background estimation as a labeling problem. In Proceedings of ICCV, vol. 2. Beijing; 2005:1034-1041.
- Wren C, Azarbayejani A, Darrell T, Pentland A: Pfinder: Real-time tracking of the human body. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19(7):780-785. 10.1109/34.598236
- Zhao M, Bu J, Chen C: Robust background subtraction in HSV color space. In Proceedings of SPIE MSAV, vol. 1. Boston; 2002:325-332.
- Pan X, Wu Y: GSM-MRF based classification approach for real-time moving object detection. J. Zhejiang Univ. Sci. A 2008, 9(2):250-255. 10.1631/jzus.A071267
- Rambabu C, Woo W: Robust and accurate segmentation of moving objects in real-time video. In Proceedings of International Symposium on Ubiquitous VR, vol. 191. Yanji City; 2006:65-69.
- Stauffer C, Grimson E: Adaptive background mixture models for real-time tracking. In Proceedings of IEEE Conf. Computer Vision Patt. Recog, vol. 2. Fort Collins; 1999:246-252.
- Zhang W, Fang X, Yang X, Wu Q: Spatiotemporal Gaussian mixture model to detect moving objects in dynamic scenes. J. Electron. Imaging 2007, 16(2):023013-1–023013-6.
- Su T, Hu J: Background removal in vision servo system using Gaussian mixture model framework. In Proceedings of ICNSC, vol. 1. Singapore; 2004:70-75.
- Doulamis A: Dynamic background modeling for a safe road design. In Proceedings of PETRA, vol. 1. Samos; 2010:1-9.
- Khan MH, Kypraios I, Khan U: A robust background subtraction algorithm for motion based video scene segmentation in embedded platforms. In Proceedings of FIT, vol. 1. Abbottabad; 2009:1-8.
- Wang H, Miller P: Regularized online mixture of Gaussians for background with shadow removal. In Proceedings of AVSS, vol. 1. Klagenfurt; 2011:249-254.
- Wang SC, Su TF, Lai SH: Detection of moving objects from dynamic background with shadow removal. In Proceedings of ICASSP, vol. 1. Prague; 2011:925.
- Zhao L, He X: Adaptive Gaussian mixture learning for moving object detection. In Proceedings of IC-BNMT, vol. 1. Beijing; 2010:1176-1180.
- Bin Z, Liu Y: Robust moving object detection and shadow removing based on improved Gaussian model and gradient information. In Proceedings of ICMT2010, vol. 1. Ningbo; 2010:1-5.
- Lim HH, Chuang JH, Liu TL: Regularized background adaptation: a novel learning rate control scheme for Gaussian mixture modeling. IEEE Trans. Image Process. 2011, 20(3):822-836.
- Zhou H, Zhang X, Gao Y, Yu P: Video background subtraction using improved adaptive-K Gaussian mixture model. In Proceedings of ICACTE, vol. 5. Chengdu; 2010:363-366.
- Suhr J, Jung H, Li G, Kim J: Mixture of Gaussians-based background subtraction for Bayer-pattern image sequences. IEEE Trans. Circuits Syst. Video Technol. 2011, 21(3):365-370.
- Elgammal A, Harwood D, Davis L: Non-parametric model for background subtraction. In Proceedings of ECCV, vol. 1. Dublin; 2000:751-767.
- Tanaka T, Shimada A, Arita D, Taniguchi R: A fast algorithm for adaptive background model construction using Parzen density estimation. In Proceedings of IEEE Conf. AVSS, vol. 1. London; 2007:528-553.
- Tavakkoli A, Nicolescu M, Bebis G: Automatic robust background modeling using multivariate non-parametric kernel density estimation for visual surveillance. In Proceedings of the International Symposium of Advances in Visual Computing LNCS, vol. 1. Nevada; 2005:363-370.
- Martel-Brisson N, Zaccarin A: Unsupervised approach for building non-parametric background and foreground models of scenes with significant foreground activity. In Proceedings of VNBA, vol. 1. Vancouver; 2008:93-100.
- Han B, Zhu DCY, Davis L: Sequential kernel density approximation through mode propagation: applications to background modeling. In Proceedings of ACCV, vol. 1. Jeju; 2004:1-6.
- Gordon G, Darrell T, Harville M, Woodfill J: Background estimation and removal based on range and color. In Proceedings of CVPR, vol. 1. Fort Collins; 1999:2459-2464.
- Monnet A, Mittal A, Paragios N, Ramesh V: Background modeling and subtraction of dynamic scenes. In Proceedings of ICCV, vol. 2. Beijing; 2003:1-8.
- Kim H, Sakamoto R, Kitahara I, Toriyama T, Kogure K: Robust foreground extraction technique using Gaussian family model and multiple thresholds. In Proceedings of ACCV, vol. 1. Tokyo; 2007:758-768.
- Zhu Q, Liu G, Wang Z, Chen H, Xie Y: A novel video object segmentation based on recursive kernel density estimation. In Proceedings of ICINFA, vol. 1. Shenzhen; 2011:843-846.
- Kolawole A, Tavakkoli A: Robust foreground detection in videos using adaptive color histogram thresholding and shadow removal. In Proceedings of ISVC, vol. 2. Las Vegas; 2011:496-505.
- Maddalena L, Petrosino A: A self-organizing approach to background subtraction for visual surveillance applications. IEEE Trans. Image Process. 2008, 13(4):1168-1177.
- Maddalena L, Petrosino A: Self organizing and fuzzy modelling for parked vehicles detection. In Proceedings of ACVIS, vol. 1. Bordeaux; 2009:422-433.
- Lin H, Liu T, Chuang J: A probabilistic SVM approach for background scene initialization. In Proceedings of ICIP, vol. 3. Rochester; 2002:893-896.
- Cheng L, Gong M, Schuurmans D, Caelli T: Real-time discriminative background subtraction. IEEE Trans. Image Process. 2011, 20(5):1401-1414.
- Junejo I, Bhutta A, Foroosh H: Dynamic scene modeling for object detection using single-class SVM. In Proceedings of International Conference on Image Processing, vol. 1. Hong Kong; 2010:1541-1544.
- Wang Z, Bovik AC, Sheikh HR, Simoncelli EP: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13(4):1-14.
- Gonzalez R, Woods R: Digital Image Processing. 2nd edition. Prentice Hall, Englewood Cliffs; 2002.
- Park JG, Lee C: Bayesian rule-based complex background modeling and foreground detection. Opt. Eng. 2010, 49(2):027006-1–027006-11.
- Li L, Huang W, Gu IYH, Tian Q: Statistical modeling of complex backgrounds for foreground object detection. IEEE Trans. Image Process. 2004, 13(1):1459-1472.
- Toyama K, Krumm L, Brumitt B, Meyers B: Wallflower: principles and practice of background maintenance. In Proceedings of IEEE ICCV, vol. 1. Kerkyra; 1999:255-261.
- Jaccard P: The distribution of flora in the alpine zone. New Phytol. 1912, 11(2):37-50. 10.1111/j.1469-8137.1912.tb05611.x
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.