Visual contour tracking based on inner-contour model particle filter under complex background
EURASIP Journal on Image and Video Processing volume 2019, Article number: 85 (2019)
Abstract
In this paper, a novel particle filter–based visual contour tracking method is proposed, which uses an inner-contour model to track contour objects under complex background. The aim is effective and robust tracking against complex background. To that end, the proposed method first uses the Sobel edge detector to detect edge information along the normal lines of the contour. Then, it samples the inner part of each normal line to obtain local color information, which is combined with the edge information to construct a new normal-line likelihood. After that, all the inner color information is used to construct a global color likelihood. Finally, the edge information, local color information, and global color information are fused into a new observation likelihood. Experimental results show that the proposed method is robust for contour tracking under complex background, and that it is computationally efficient and can run in real time.
Introduction
Nowadays, visual object tracking is becoming increasingly important in many fields, such as surveillance, robotics, and human-computer interfaces [1]. However, reliable tracking remains challenging because of cluttered backgrounds, occlusion, and varying illumination. To overcome these challenges and achieve robust tracking, many tracking methods have been published over the last two decades. Most of them use a rectangle or another rigid shape to represent the target, which loses detailed shape and edge information. Furthermore, a rectangle or other rigid shape contains some background pixels outside the real target region, which reduces the robustness of tracking. To overcome this problem, some researchers use contours to represent deformable targets [2,3,4,5].
Related work
Most contour tracking methods can be grouped into two categories, parametric active contours [6,7,8] and geometric active contours [5, 9], which differ in how the contour curve is represented. In the former, the contour is approximated by an explicit parametric model, typically a set of control points and B-splines. In the latter, the contour is typically represented by an implicit function, as in the level set method. In general, parametric contour methods are more efficient and thus more suitable for real-time contour tracking.
In order to track contours with non-Gaussian and nonlinear state densities in cluttered video sequences, Isard and Blake [6] introduced the CONDENSATION algorithm. They used B-splines to represent object contours, and particle filters to track the curve parameters given noisy observations. However, this work used a very simple measurement term, so the method had difficulty dealing with complex background clutter. To improve the performance under complex background, several methods were proposed. Li and Zhang [3] proposed the Unscented Kalman Particle Filter (UKPF) to construct a new observation model, in which the Kalman filter and the unscented particle filter were used to adopt suboptimal proposal distributions. Chen [10] proposed a Multicue Hidden Markov Model Unscented Particle Filter (MHMMUPF) for contour detection and tracking based on multiple visual cues in the spatial domain, and improved the performance by joint probability matching to reduce background clutter. Although these two methods could improve the tracking performance in some simple environments, they still could not deal with objects that are similar to the target. Another contour tracking strategy is tracking by segmentation, in which the contours are represented with segmentation masks. For instance, Godec [11] proposed the Hough Tracking (HT) algorithm, one of the state-of-the-art contour tracking algorithms of recent years. In this algorithm, the object location was determined with the Hough voting technique, and a mask was obtained with the GrabCut algorithm. However, due to its imposing shape constraints, the HT algorithm is very time-consuming and is not suitable for real-time object tracking.
Our approach
In contrast to the above methods, this paper proposes a novel multi-feature fusion approach called the inner-contour model. The method fuses the color feature with the gradient feature to construct a new observation model in the particle filter framework and thereby achieves robust contour tracking in cluttered backgrounds. The proposed method first uses the Sobel edge detector to detect edge information along the normal lines of the contour. Then, it samples the inner part of each normal line to obtain local color information, which is combined with the edge information to construct a new normal-line likelihood. After that, all the inner color information is used to construct a global color likelihood. Finally, the edge information, local color information, and global color information are fused into a new observation likelihood. Experimental results show that the proposed method is robust for contour tracking under complex background, and that it is computationally efficient and can run in real time. The pipeline of the proposed method is shown in Fig. 1.
Paper organization
Section 2 summarizes the relevant methods for visual contour tracking; Section 3 presents the proposed inner-contour model for visual contour tracking; Section 4 reports the experimental results and compares them with UKPF [3], MHMMUPF [10], and HT [11]; and Section 5 outlines the conclusions and suggestions for future research.
Visual contour tracking based on particle filter
This section briefly overviews the main concepts of the related methods discussed in this paper, including the basic formulae of particle filter for visual tracking and the visual contour observation model for contour appearance representation.
Particle filter
Particle filter is a Monte Carlo approximation to the optimal Bayesian filter. It provides robust tracking of moving objects in cluttered environments, especially for nonlinear and non-Gaussian problems. It is a probabilistic framework for sequentially estimating the state of the target, recursively calculating the posterior density p(s_{t} | z_{1 : t}) of the current object state s_{t} conditioned on all observations z_{1 : t} = (z_{1}, z_{2}, …, z_{t}) up to time t. The posterior density p(s_{t} | z_{1 : t}) can be obtained recursively in two stages: prediction and update, which are, respectively, written as follows:
According to formulas (1) and (2), we obtain the following formula:
where k_{p} is a normalizing constant that is independent of s_{t}, p(z_{t} | s_{t}) is the likelihood function, p(s_{t} | s_{t − 1}) is the dynamic model, and p(s_{t} | z_{1 : t − 1}) is the temporal prior over s_{t} given the prior observations.
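For reference, the recursion described above can be written out in standard Bayesian-filter notation. The block below is a sketch that assumes the usual Chapman-Kolmogorov prediction and Bayes update, with \( k_p = 1/p(z_t \mid z_{1:t-1}) \); the equation numbers follow the references in the text.

```latex
% (1) prediction: propagate the previous posterior through the dynamic model
p(s_t \mid z_{1:t-1}) = \int p(s_t \mid s_{t-1})\, p(s_{t-1} \mid z_{1:t-1})\, ds_{t-1}

% (2) update: correct the temporal prior with the new observation
p(s_t \mid z_{1:t}) = \frac{p(z_t \mid s_t)\, p(s_t \mid z_{1:t-1})}{p(z_t \mid z_{1:t-1})}

% (3) substituting (1) into (2)
p(s_t \mid z_{1:t}) = k_p\, p(z_t \mid s_t) \int p(s_t \mid s_{t-1})\, p(s_{t-1} \mid z_{1:t-1})\, ds_{t-1}
```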
The integral in formula (3) has no closed-form solution except in the most basic cases, so the particle filter approximates formula (3) with a set of weighted particles \( \left\{{s}_t^{(i)},{w}_t^{(i)}\right\}_{i=1,\dots,n} \), where each particle represents a hypothetical state of the object. Under this representation, formula (3) can be approximated as follows:
where \( {w}_t^{(i)} \) is the weight for particle \( {s}_t^{(i)} \).
To implement a standard PF, a state representation s_{t} should first be identified; in object tracking, it might include the location, scale, and rotation of the object. Moreover, it is necessary to design three distributions: the process dynamical distribution p(s_{t} | s_{t − 1}), which describes how the object moves between frames; the proposal distribution q(s_{t} | s_{1 : t − 1}, z_{1 : t}), which is sampled each time the particle distribution updates; and the observation likelihood distribution p(z_{t} | s_{t}), which describes how the object appears in the video frame. This paper focuses on this likelihood, which is discussed in detail in the later sections.
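How the three distributions interact in one iteration can be sketched as follows. This is a minimal bootstrap-style filter in which the proposal is assumed to equal the dynamic model (a simplification; the compared methods use other proposals). The scalar state, the function names, and the Gaussian toy likelihood in `run_demo` are all hypothetical and chosen only for illustration.

```python
import math
import random


def particle_filter_step(particles, weights, propagate, likelihood, rng):
    """One bootstrap particle-filter iteration: resample, predict, reweight."""
    n = len(particles)
    # Resample: draw n particles with probability proportional to their weights.
    resampled = rng.choices(particles, weights=weights, k=n)
    # Predict: sample each new state from the dynamic model p(s_t | s_{t-1}).
    predicted = [propagate(s, rng) for s in resampled]
    # Update: weight each hypothesis by the observation likelihood p(z_t | s_t).
    raw = [likelihood(s) for s in predicted]
    total = sum(raw)
    if total > 0.0:
        new_weights = [w / total for w in raw]
    else:
        # Degenerate case: every likelihood underflowed, fall back to uniform.
        new_weights = [1.0 / n] * n
    return predicted, new_weights


def estimate(particles, weights):
    """Posterior mean of a scalar state under the weighted particle set."""
    return sum(s * w for s, w in zip(particles, weights))


def run_demo(seed=0, n=500, steps=5, target=5.0):
    """Toy 1-D run with a hypothetical Gaussian likelihood centred on target."""
    rng = random.Random(seed)
    particles = [rng.uniform(0.0, 10.0) for _ in range(n)]
    weights = [1.0 / n] * n
    for _ in range(steps):
        particles, weights = particle_filter_step(
            particles,
            weights,
            propagate=lambda s, r: s + r.gauss(0.0, 0.2),
            likelihood=lambda s: math.exp(-((s - target) ** 2) / (2 * 0.5 ** 2)),
            rng=rng,
        )
    return estimate(particles, weights), sum(weights)
```

In a contour tracker the scalar state would be replaced by the shape-space vector, and `likelihood` by the contour observation model developed in the following sections.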
The dynamical distribution p(s_{t} | s_{t − 1}) can usually be represented as a linear stochastic difference equation:
where A defines the deterministic component of the dynamic model, s_{t} is the state vector at time t, ω_{t − 1} ∈ (0, 1) is the system noise, which is usually a uniformly distributed random variable or a multivariate Gaussian random variable, and B is the propagation distance, indicating how far the particles can propagate in the next frame.
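A small sketch of this propagation step, assuming the first-order form s_t = A s_{t-1} + B ω_{t-1} implied by the definitions above. Here `B` is taken to be a per-component scale (a diagonal matrix) and the noise is drawn from a standard Gaussian; both choices are assumptions for illustration, not the parameters of Table 2.

```python
import random


def propagate(state, A, B, rng):
    """Draw s_t = A s_{t-1} + B w, with w ~ N(0, 1) per state component.

    state: list of floats (s_{t-1})
    A:     square matrix as a list of rows (deterministic dynamics)
    B:     list of per-component noise scales (propagation distance)
    """
    n = len(state)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [
        sum(A[r][c] * state[c] for c in range(n)) + B[r] * noise[r]
        for r in range(n)
    ]
```

With `B` set to zero the step reduces to the deterministic part alone, which is a convenient way to check `A` before noise is added.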
Visual contour observation model
In this paper, the visual contour object was modeled as a B-spline curve and was restricted to a shape space proposed by A. Blake and M. Isard [6]. The observation model of the tracking process was based on the model introduced by J. MacCormick [12], and the assumptions and propositions in this reference were adopted as the fundamentals for further derivation. This model can be described briefly as follows.
Given a candidate contour represented by a B-spline curve, a finite number of points are sampled on it, and the normals l_{i} (i = 1, 2, ⋯, m) to the curve at these points (hereafter called measurement lines) are searched for edge features; all measurement lines have the same length L. A Sobel edge detector is applied to each measurement line, and edge features are identified as local maxima of its response. The model makes the following hypotheses.
Each feature could correspond to the real edge of the target or to a clutter feature. The model assumes that all the clutter features are uniformly distributed on the measurement line, and that only one edge feature can be detected on each measurement line. The number n of clutter features observed on a measurement line of length L obeys a Poisson law with density λ: b_{L}(n) = e^{−λL}(λL)^{n}/n!; there is a fixed probability q_{01} that the edge feature is not detected; and the distance between the edge feature and the contour location of the real object follows a Gaussian distribution with zero mean and variance σ^{2}.
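The Poisson clutter term can be evaluated directly. The sketch below computes b_L(n) for given values of λ and L; the function name and the example values are illustrative only.

```python
import math


def clutter_count_probability(n, lam, length):
    """b_L(n) = exp(-lam * L) * (lam * L)**n / n!, the probability of
    observing n clutter features on a measurement line of length L."""
    mean = lam * length
    return math.exp(-mean) * mean ** n / math.factorial(n)
```

For example, with λ = 0.1 features per pixel and L = 20 pixels, the expected clutter count is λL = 2 and the probability of a clutter-free line is e^{−2}.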
Based on these hypotheses, the likelihood of the measurement line l_{i} can be expressed as (see reference [13] for details):
where q_{11} = 1 − q_{01}. Given that the observations on the m measurement lines l_{i} (i = 1, 2, ⋯, m) are statistically independent, the likelihood of the entire contour becomes:
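Under this independence assumption, the contour likelihood is the product of the m per-line likelihoods. A practical sketch evaluates the product in the log domain to avoid numerical underflow when m is large; the `floor` guard against zero likelihoods is an implementation assumption, not part of the model.

```python
import math


def contour_log_likelihood(line_likelihoods, floor=1e-300):
    """Log of the product of per-line likelihoods (independent lines).

    Values below `floor` are clamped so that log() is always defined.
    """
    return sum(math.log(max(p, floor)) for p in line_likelihoods)
```

Particle weights can then be obtained by exponentiating after subtracting the maximum log-likelihood over all particles, which leaves the normalized weights unchanged.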
The proposed method
Inner-contour model
When only the gradient feature of the input image is used, contour tracking algorithms based on the general model may perform well against a simple background with few edge features. However, in a highly cluttered environment, the tracker easily drifts to noisy edge features, which causes the tracking process to fail. In order to improve the robustness of contour tracking, it is necessary to introduce other useful features into this model and fuse all the features naturally to construct a new observation likelihood. Inspired by this idea, this paper proposes a new observation likelihood that naturally combines the gradient feature and the color feature.
The color distribution is a widely used target representation because it is robust against non-rigidity, rotation, and partial occlusion. In this paper, the color distribution in HSV color space is used to express the color features.
With the contour as the boundary, each of the m measurement lines l_{i} (i = 1, 2, ⋯, m) can be separated into two parts, the inner part and the outer part, as shown in Fig. 2. The inner part reflects some characteristics of the target, while the outer part is often cluttered background, so using the inner part of the measurement lines can enhance the observation model during the tracking process.
For a single measurement line l_{i}, the histogram of the inner part is h_{i} = {h_{i, u}}, u = 1, …, q, while the corresponding reference histogram is r_{i} = {r_{i, u}}, u = 1, …, q. The size of both histograms is 1 × L/2, and the similarity of the two histograms can be measured by the Bhattacharyya distance:
where \( \rho \left[{h}_i,{r}_i\right]=\sum \limits_{u=1}^q\sqrt{h_{i,u}{r}_{i,u}} \) is the Bhattacharyya coefficient. Then, the local color likelihood of l_{i} is:
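As a sketch of this computation, the code below evaluates the Bhattacharyya coefficient and the commonly used distance d = sqrt(1 − ρ) between two histograms, and maps the distance to a likelihood. The Gaussian form exp(−d²/(2σ²)) and the value of `sigma` are assumptions made for illustration and should not be read as the exact expression or constants used in the experiments.

```python
import math


def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = float(sum(hist))
    if total <= 0.0:
        raise ValueError("histogram must have positive mass")
    return [v / total for v in hist]


def bhattacharyya_coefficient(h, r):
    """rho[h, r] = sum_u sqrt(h_u * r_u) for normalized histograms."""
    return sum(math.sqrt(hu * ru) for hu, ru in zip(h, r))


def bhattacharyya_distance(h, r):
    """d = sqrt(1 - rho); 0 for identical and 1 for disjoint histograms."""
    rho = bhattacharyya_coefficient(normalize(h), normalize(r))
    return math.sqrt(max(0.0, 1.0 - rho))


def color_likelihood(h, r, sigma=0.2):
    """Assumed Gaussian mapping from distance to likelihood (illustrative)."""
    d = bhattacharyya_distance(h, r)
    return math.exp(-(d * d) / (2.0 * sigma * sigma))
```

The same functions apply unchanged to a per-line histogram pair (h_i, r_i) and to any other pair of histograms with matching bins.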
The new likelihood for the measurement line l_{i}, which combines the gradient feature and the local color feature, can be written as:
The likelihood of the whole contour over all measurement lines then becomes:
However, only using the local color information of each measurement line cannot provide the overall color distribution information of the contour target. In order to address this problem, all the inner part measurement lines are combined to construct a global color histogram, as shown in Fig. 3.
The histogram of all the m measurement lines l_{i} (i = 1, 2, ⋯, m) is H = {H_{u}}, u = 1, …, q, while the corresponding reference histogram is Q = {Q_{u}}, u = 1, …, q. The size of both histograms is 1 × L/2, and the similarity of the two histograms can be measured by the Bhattacharyya distance:
where \( \rho \left[H,Q\right]=\sum \limits_{u=1}^q\sqrt{H_u{Q}_u} \) is the Bhattacharyya coefficient. Then, the global color likelihood of the contour is:
The final likelihood of the contour, which combines the gradient information, local color information, and global color information, can be expressed as follows:
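The construction of the global histogram and the fusion step can be sketched as follows. The code pools the inner-part samples of all measurement lines into one histogram and, purely as an illustrative assumption, fuses the cues by multiplying the per-line likelihoods (edge combined with local color) with the global color likelihood. The scalar color samples in [0, 1] (for instance a normalized hue channel), the function names, and the plain product are all assumptions, not necessarily the weighting used in the experiments.

```python
import math


def histogram(samples, bins, lo=0.0, hi=1.0):
    """Normalized histogram of scalar samples in [lo, hi]."""
    counts = [0] * bins
    for v in samples:
        idx = min(bins - 1, max(0, int((v - lo) / (hi - lo) * bins)))
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [1.0 / bins] * bins


def global_histogram(inner_samples_per_line, bins):
    """Pool the inner-part samples of all measurement lines into one histogram."""
    pooled = [v for line in inner_samples_per_line for v in line]
    return histogram(pooled, bins)


def fused_log_likelihood(line_likelihoods, global_color_likelihood, floor=1e-300):
    """Log of (product of per-line likelihoods) * (global color likelihood).

    Assumes the cues are independent so that their likelihoods multiply.
    """
    terms = list(line_likelihoods) + [global_color_likelihood]
    return sum(math.log(max(p, floor)) for p in terms)
```

Working in the log domain keeps the fused value stable when many measurement lines contribute small likelihoods.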
Results and discussion
Experiment setting
To demonstrate the effectiveness and robustness of the proposed tracking scheme, seven different color videos were used in our experiments. Six of them were acquired indoors and outdoors with a SONY CCD camera EXFCB48, and the remaining one was taken from the public tracking dataset by Babenko [14]. These videos contained several challenging conditions, such as partial or total occlusion and similar objects in the background. The ground-truth object contours of all test videos were marked manually frame by frame. For all videos, the target object was manually selected in the first frame.
The control points of the three particle filter–based algorithms (UKPF, MHMMUPF, and the proposed algorithm) were generated randomly according to formula (5), and the dynamic parameters used in the formula are listed in Table 2.
All the algorithms were implemented in C++ using the OpenCV library and run on a 1.8 GHz Pentium Dual-Core CPU with 2 GB of DDR memory.
The proposed tracking method was compared with UKPF, MHMMUPF, and HT algorithms, and the tracking results of each algorithm were marked with different colors to demonstrate the differences. The parameters used in the experiments are shown in Tables 1 and 2.
In the following section, we first present the tracking results of all the algorithms on the same test video with curves of different colors, and then give detailed evaluations and comparisons to demonstrate the effectiveness of the algorithms.
Performance and results overview
For comparison, we first implemented all the algorithms separately, and recorded the related data. Then, we redrew all the curves in the same video with different colors in order to observe the differences between the algorithms, as shown in Figs. 4, 5, 6, 7, 8, 9, and 10.
The test videos “Hand1” and “Hand3” were used to test the tracking performance against cluttered backgrounds with abundant edge features and obvious affine transformation as the hand moved. In “Hand1,” the collars, cuffs, pockets, and wrinkles in the clothes produced very noticeable interfering edge features in the background. In “Hand3,” the background was more chaotic and contained many dense edge features, as shown in Fig. 5. Therefore, if only the edge feature were used for contour tracking, stable tracking would be difficult to achieve. As shown in the 285th, 361st, and 373rd frames in Fig. 4 and the 291st, 488th, and 514th frames in Fig. 5, the tracking results of UKPF and MHMMUPF drifted to the background. In the early frames of the HT result, the contours were not close to the real target, because the HT algorithm needed an initial process to segment the target, as shown in the 43rd frame in Fig. 4. With the proposed inner-contour model, the tracking process was more reliable and robust thanks to the local and global color information, as shown by the green curves in Figs. 4 and 5. At the same time, the algorithm responded promptly and accurately to the affine deformation of the target during tracking. In Fig. 4, the target clearly moved toward the camera and then away from it, so that it appeared to grow and then shrink, and it also rotated through a large angle. Throughout all these affine transformations, the proposed method maintained good tracking performance, while UKPF and MHMMUPF did not.
The test video “Body” was used to test the robustness and effectiveness of the algorithm against a complex dynamic background with interference and occlusion from a similar target. In the “Body” video, a cartoon movie was playing continuously on the projection screen in the background, so the edge and color features of the background changed simultaneously, and another person walked in front of the target, as shown in Fig. 6. When only the edge feature was used to track the human contour, tracking was susceptible to the dynamic background and to occlusion by the similar target, resulting in tracking failures, as shown in the 174th, 177th, and 183rd frames of Fig. 6. When the inner-contour information was added, the significant differences between the clothes worn by the two people made it easier to distinguish the two targets, even under partial occlusion, so that the target could be tracked accurately. Although the contours tracked by HT were not as accurate as those of the proposed method, its target center positions were the most accurate among all the tested algorithms.
The test videos “Leaf1” and “Leaf2” were used to test the tracking performance when there were a large number of similar targets in the background. These two test videos were the most challenging, because the target to be tracked was among a bunch of very similar leaves fluttering in the wind and the system dynamic model was more complicated, so stable tracking was difficult to achieve. As shown in the 118th, 121st, and 130th frames of Fig. 7 and the 377th, 411th, and 466th frames in Fig. 8, if only the edge information was used, the tracker easily locked onto other leaves during the intense movement and deformation caused by the wind. When the inner-contour model was adopted, tracking was more stable under the same conditions: it not only tracked the position of the target accurately, but also followed the swing and deformation of the leaves with the correct affine changes. The HT tracker did not perform well in these two test videos because it was hard to segment the real target from the many similar leaves around it.
The test video “Taxi” was used to test the tracking performance when the target moved quickly in a simple background, as shown in Fig. 9. In this test, all the algorithms tracked the target well, and the HT tracker performed better than the three particle filter–based algorithms, because in this simple background it was relatively easy to segment the real target precisely.
The test video “David” is a well-known video sequence, which was used to test the robustness of the tracking algorithms when the environment luminance changed significantly. As shown in Fig. 10, the proposed method and HT tracked the person well from the first frame to the end, while UKPF and MHMMUPF lost the target during the luminance change.
In order to evaluate the performance differences between the algorithms more objectively and accurately, three measures were used in this paper: the Euclidean distance of the center of gravity, the average Euclidean distance of the control points, and the algorithm time. The center-of-gravity Euclidean distance is the Euclidean distance between the center of gravity of the target contour calculated by the algorithm and that of the ground truth; it characterizes the accuracy of the overall position of the contour. The average Euclidean distance of the control points is the average of the Euclidean distances between each control point of the target contour calculated by the algorithm and the corresponding ground-truth control point; it characterizes the accuracy of the tracked contour location. The algorithm time is the time spent by each algorithm on tracking the different targets; it characterizes the execution efficiency and real-time performance of the algorithm.
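The two accuracy measures can be computed as in the sketch below. It assumes that the center of gravity is taken as the mean of the control points and that tracked and ground-truth contours have the same number of control points in the same order; the function names are illustrative.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y) control points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def centroid_distance(tracked, truth):
    """Euclidean distance between the two centers of gravity."""
    (cx, cy), (gx, gy) = centroid(tracked), centroid(truth)
    return math.hypot(cx - gx, cy - gy)


def mean_control_point_distance(tracked, truth):
    """Average Euclidean distance between corresponding control points."""
    if len(tracked) != len(truth):
        raise ValueError("contours must have the same number of control points")
    distances = [
        math.hypot(tx - gx, ty - gy)
        for (tx, ty), (gx, gy) in zip(tracked, truth)
    ]
    return sum(distances) / len(distances)
```

A pure translation of the contour gives the same value for both measures, whereas a rotation or deformation about a fixed center leaves the centroid distance near zero and raises only the control-point distance, which is why the two are reported separately.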
As shown in Fig. 11, the left side represents the difference between the horizontal position and the ground truth of the center of gravity of the tracking target contour calculated by each algorithm of the “Hand1” video, and the right side represents the difference between the vertical position and the ground truth. Figure 12 shows the Euclidean distance between the center of gravity of the control points and the ground truth of each frame. From these figures, it can be seen that the proposed algorithm performed best among all the tested algorithms.
For performance evaluation and comparison, the three PF-based algorithms (UKPF, MHMMUPF, and the proposed algorithm) were tested with different numbers of particles, including 100, 150, 200, 250, and 300, and the tracking results are shown in Table 3. It can be seen that the tracking performance improves as the number of particles increases, and reaches its best level when the number of particles is larger than 200.
The average Euclidean distance and time consumption for all the test videos are given in Table 3. In terms of average Euclidean distance, the proposed algorithm and the HT tracker performed better than UKPF and MHMMUPF. The proposed algorithm had the smallest average Euclidean distance in the test videos “Hand1,” “Hand3,” “Leaf1,” and “Leaf2,” while the HT tracker performed better than the proposed one in “Body,” “Taxi,” and “David.”
The difference in Euclidean distance between the four algorithms was most obvious in the test video “Leaf2,” followed by “Hand1” and “Hand3,” and was smallest in “Taxi.” The reason is that the leaf contour tracking video contained a lot of edge information and some similar targets. Meanwhile, the Pan-Tilt-Zoom (PTZ) camera moved intensely in three degrees of freedom, so when only edge information was used, the calculated particle weights could be nearly equal at any position; resampling then had little effect, and the target position finally showed a relatively large deviation. When the color information was introduced, the small difference between the target and the surrounding leaves increased the weights of particles on the target and reduced the weights of those on background edges. After resampling, more particles moved closer to the target leaf, so the tracking position was more accurate. In the “Body” video, there were few distinct edge features around the body, and the edge features could distinguish the target from the background well, so the difference between the algorithms was very small.
From the perspective of algorithm time consumption (Table 4), the HT tracker was the most time-consuming: it took much more time than the other three algorithms due to its complex segmentation algorithm and imposing shape constraints. Among the other three algorithms, the proposed algorithm was more time-consuming than UKPF but less than MHMMUPF; however, the difference was very small, about 1 to 2 ms. At the same time, the time consumption of the algorithm increased with the number of control points. If only the coordinate transformation of the control points is considered, the algorithm time is only slightly affected by the number of control points because of the shape-space representation; the main influencing factor is the time to draw the B-spline curve. With 50 equal divisions between every two control points, the time to draw the B-spline curve increased approximately linearly with the number of control points, as shown in Fig. 13. With 18 control points (using the test video “Hand3”), the average algorithm time was 40.26 ms, which could basically meet the real-time requirements. The real-time performance of the algorithm could be further improved by using some fast spatial-temporal mechanisms and algorithms, as mentioned in [15,16,17].
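The linear growth of the drawing cost follows from the sampling scheme: each span between control points contributes a fixed number of curve points. The sketch below samples a closed uniform cubic B-spline with 50 divisions per span; the cubic order and the closed, uniform parameterization are assumptions for illustration, since the order of the spline is not restated here.

```python
def sample_closed_bspline(control_points, divisions=50):
    """Sample a closed uniform cubic B-spline.

    Returns divisions * len(control_points) points, so the work grows
    linearly with the number of control points.
    """
    n = len(control_points)
    samples = []
    for k in range(n):
        # Four consecutive control points (wrapping around) define span k.
        p0, p1, p2, p3 = (control_points[(k + j) % n] for j in range(4))
        for step in range(divisions):
            t = step / divisions
            # Uniform cubic B-spline basis functions; they sum to 1 for any t.
            b0 = (1 - t) ** 3 / 6
            b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6
            b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6
            b3 = t ** 3 / 6
            x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
            y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
            samples.append((x, y))
    return samples
```

Under these assumptions, 18 control points yield 18 × 50 = 900 curve samples per particle, which is consistent with curve drawing dominating the per-frame cost.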
Conclusions
This paper has presented a visual contour tracking method based on a particle filter with an inner-contour model for complex backgrounds. This novel method naturally fuses the gradient feature, local color feature, and global color feature to achieve robust contour tracking in cluttered environments. Specifically, the proposed algorithm first uses the Sobel edge detector to detect edge information along the normal lines of the contour, and then samples the inner part of the normal lines to obtain local color information, which is combined with the edge information to construct a new normal-line likelihood. After that, all the inner color information is used to construct a global color likelihood. Finally, the edge information, local color information, and global color information are fused into a new observation likelihood. The experimental results demonstrated that, compared with gradient-only methods, the proposed algorithm was effective and robust in dealing with cluttered background, and that it was computationally efficient and could run in real time.
The proposed algorithm was inspired by the gradient-only contour tracking method (UKPF) and achieved better results, and it should be helpful for other tracking methods that need multi-cue fusion in cluttered backgrounds.
Availability of data and materials
None
Abbreviations
HT: Hough Tracking
MHMMUPF: Multicue Hidden Markov Model Unscented Particle Filter
UKPF: Unscented Kalman Particle Filter
References
1. A. Yilmaz, O. Javed, M. Shah, Object tracking: a survey. ACM Comput. Surv. 38(4), 13 (2006)
2. M. Isard, A. Blake, CONDENSATION—conditional density propagation for visual tracking. Int. J. Comput. Vis. 29(1), 5–28 (1998)
3. P. Li, T. Zhang, A.E.C. Pece, Visual contour tracking based on particle filters. Image Vis. Comput. 21(1), 111–123 (2003)
4. A.W.M. Smeulders et al., Visual tracking: an experimental survey. IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1442–1468 (2014)
5. P. Lv et al., Multiple cues-based active contours for target contour tracking under sophisticated background. Vis. Comput. 33(9), 1103–1119 (2016)
6. M. Isard, A. Blake, Contour tracking by stochastic propagation of conditional density, in European Conference on Computer Vision (1996)
7. N. Peterfreund, Robust tracking of position and velocity with Kalman snakes. IEEE Trans. Pattern Anal. Mach. Intell. 22(6), 564–569 (1999)
8. F.Y. Shih, Z. Kai, Locating object contours in complex background using improved snakes. Comput. Vis. Image Underst. 105(2), 93–98 (2007)
9. P. Chockalingam, N. Pradeep, S. Birchfield, Adaptive fragments-based tracking of non-rigid objects using level sets, in IEEE International Conference on Computer Vision (2009)
10. C. Yunqiang, R. Yong, T.S. Huang, Multicue HMM-UKF for real-time contour tracking. IEEE Trans. Pattern Anal. Mach. Intell. 28(9), 1525–1529 (2006)
11. M. Godec, P.M. Roth, H. Bischof, Hough-based tracking of non-rigid objects. Comput. Vis. Image Underst. 117(10), 1245–1256 (2013)
12. J. MacCormick, A. Blake, A probabilistic exclusion principle for tracking multiple objects, in Proceedings of the Seventh IEEE International Conference on Computer Vision (1999)
13. J. MacCormick, Stochastic algorithms for visual tracking (2002)
14. B. Babenko, M.H. Yang, S. Belongie, Visual tracking with online multiple instance learning, in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009 (2009)
15. C. Yan et al., Cross-modality bridging and knowledge transferring for image understanding. IEEE Transactions on Multimedia, p. 11 (2019)
16. C. Yan et al., STAT: spatial-temporal attention mechanism for video captioning. IEEE Transactions on Multimedia, p. 11 (2019)
17. C. Yan et al., A fast Uyghur text detector for complex background images. IEEE Transactions on Multimedia 20(12), 3389–3398 (2018)
Acknowledgements
The authors express their heartfelt gratitude to Ke Xiang for his helpful discussion and feedback on the algorithm, and to Fangge Lu for proofreading the English writing.
Funding
None
Author information
Contributions
SC implemented the core algorithm, designed all the experiments, addressed the resulting data, and drafted the manuscript. XW participated in the design and construction of the inner-contour model and helped draft the manuscript. All authors have read and approved the final manuscript.
Corresponding author
Correspondence to Songxiao Cao.
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Cao, S., Wang, X. Visual contour tracking based on inner-contour model particle filter under complex background. J Image Video Proc. 2019, 85 (2019) doi:10.1186/s13640-019-0487-7
Keywords
 Contour tracking
 Inner-contour model
 B-spline
 Particle filter