# Robust multiple-vehicle tracking via adaptive integration of multiple visual features

Hao Sheng^{1, 2}, Qi Wei^{3}, Chao Li^{2, 3} and Zhang Xiong^{2, 3}

**2012**:2

https://doi.org/10.1186/1687-5281-2012-2

© Sheng et al; licensee Springer. 2012

**Received:** 19 August 2011 **Accepted:** 24 February 2012 **Published:** 24 February 2012

## Abstract

This article presents a robust approach to tracking multiple vehicles with integration of multiple visual features. The observation is modeled by democratic integration strategies according to the reliability of the information in the current multi-visual features to adjust their weights. The appearance model is also embedded in a particle filter (PF) tracking framework. Furthermore, we propose a new model updating algorithm based on the PF. In order to avoid incorrect results caused by "model drift" introduced into the observation model, model updating should only be controlled in a reliable manner, and the rate of updating is based on reliability. This article also presents the experiments using a real video sequence to verify the proposed method.

## Keywords

- multiple vehicle tracking
- multiple visual features
- adaptive integration
- model updating

## 1. Introduction

With the rapid process of urbanization, the concept of developing a "smart city" has gained prominence. As an important part of this trend, Intelligent Transportation Systems (ITS) will be critical for effective management of urban traffic. Vehicle tracking under different traffic scenarios is one of the key issues in ITS. Vehicle motion parameters, such as location, velocity, orientation, and acceleration, can be obtained to further recognize and understand vehicular behavior. However, the challenges of robust tracking come from uncertain and dynamic conditions of speed, occlusion, deformation, illumination variation, background clutter, real-time restriction, etc. In order to handle these problems, great effort has been made to devise robust tracking algorithms. In general, the following three key problems should be solved in tracking: (1) an effective framework to locate vehicles in motion; (2) modeling observation of vehicles; and (3) reliable updating of vehicle models.

An ideal locating framework should be able to predict and update the motion state and observation model of an object, and even track multiple objects under various conditions. Probabilistic tracking, which is a process utilizing posterior probability density of target states in a Bayesian framework, is a highly effective approach. Kalman Filter (KF) [1], Hidden Markov Model [2], and Particle Filter (PF) [3–7] techniques have been used in different tracking applications. PF recursively constructs the posterior probability distribution function of the state space using a Monte Carlo integral. A PF-based tracking algorithm has the added advantage that any visual feature can be used for the observation model. Meanwhile, it has the ability to integrate multiple visual features.

The observation model measures the similarity between a template region and a candidate region of a vehicle, and plays a critical role in PF-based visual tracking. Many visual features can be selected for vehicle observation modeling, including color [8], edge [3, 9], feature descriptors [10], color-spatial features [11], wavelets [12], etc. Each has limitations. For instance, a color-based method is sensitive to illumination variation. An edge-based method can avoid disturbances caused by illumination variation, but it is either time-consuming or limited to a single shape model, which makes accurate real-time tracking difficult. The algorithms based on these methods have achieved good tracking performance, but relying on a single visual feature is often inadequate and unstable in complex tracking scenarios. Various complementary features can be combined to derive more robust tracking results. It is our interest to employ multiple visual features under a robust tracking framework. The advantage of this kind of method is that the visual features complement one another: when one visual feature fails, others can be used to maintain tracking. However, the difficulty is how to design a good strategy to integrate visual features reliably. Methods proposed in [13–15] are based on fixed-weight integration. If one visual feature with a fixed weight changes markedly, the observation model after integration will be unreliable. This leads to tracking drift away from the true location, and even tracking failure. Spengler and Schiele [16] proposed an algorithm with an adaptive integration strategy using an EM algorithm to adjust the weight of each visual feature online. However, this algorithm is based on a matching algorithm under a local search. When partial or complete occlusion occurs, tracking performance will seriously decline.

How to update a vehicle model to deal with appearance changes during tracking is very important for the robustness of an algorithm. Many algorithms assume the appearance of an object as being invariable during tracking. The appearance model of an object is usually extracted in the first frame image, and then the most probable location of the object is found in the following frame. This assumption is reasonable for short-term tracking. However, for long-term tracking, appearance changes in an object are inevitable. Jepson et al. [17] proposed an adaptive texture-based model named *WSL*. This model consists of three components to describe object appearance changes, where *W* describes the rapid changes in object appearance, *S* is used to characterize the stability in an object whose change in appearance is slow, and *L* is defined to depict abnormal variations in object appearance. A Gaussian Mixture Model (GMM) is constructed by these components, and the parameters of GMM are updated through the EM algorithm online. The proposed model has strong robustness to changes in both illumination and shape. However, it fails to track when an object is occluded by one with the same visual features for even a moment. The reason is that the same information presented by the occluding object is added into the model of the occluded object during updating. After occlusion, the appearance model cannot correctly reflect the object. This phenomenon is called "model drift". A fixed number of pre-learned exemplars are used as templates by Toyama and Blake [18]. The problem with this method is that only a fixed number of examples can be used as templates to model the appearance of an object. Yang and Wu [19] introduced a closed-form solution by "discriminative training" of a generative model to alleviate model drift. They optimize a convex combination of the generative and discriminative log likelihood functions to obtain the model. 
Avidan [20] treated tracking as a classification problem. The ensemble of weak classifiers is combined into a strong classifier using AdaBoost. The strong classifier is then used to label pixels in the next frame as either belonging to the object or the background, creating a confidence map. The new position of the object is found in the peak of the map by using a mean shift. However, only the color of each pixel is used to classify, and the classifier needs to update background information around the object. When two objects of a similar color are very near each other, tracking will fail.

This article proposes a robust tracking approach with an adaptive integration of multiple visual features for vehicles. A color histogram and an edge orientation histogram (EOH) are selected as visual features to model the observation of the vehicle and integrated by a democratic integration strategy proposed by Triesch and Malsburg [21]. It is suitable for dynamic scenes due to the adaptive adjustment of the weight of each visual feature with its reliability in the current frame. However, deterministic integration is vulnerable to occlusion for a few frames because the present iteration is initialized according to the previous one. Thus, the observation model is embedded in the PF tracking framework. In order to improve the robustness of object representation, spatial information is incorporated into the observation model by dividing the object to be tracked into a number of fragments. We then analyze the reason for model drift during the model update process, and propose a new model updating method under a PF. In order to avoid errors caused by model drift, the updating process should only be implemented in a reliable manner, and the rate of updating can be controlled according to this reliability. The posterior probability density function of distribution of state vector and similarity between the candidate and reference observation of an object are used to define the valid measurement of the reliability to model updating during tracking. Experimental results in real traffic surveillance video sequences show that our approach outperforms others in vehicle tracking under complex conditions.

The remainder of this article is organized as follows. The preprocessing before tracking is described in Section 2. The state model for multiple vehicles is built in Section 3. The adaptive and robust observation model is presented in Section 4. The reliable model updating strategy is introduced in Section 5. The PF-based tracking algorithm is completely summarized in Section 6. The experimental results are given in Section 7, and finally, the conclusion is given in Section 8.

## 2. Preprocessing before tracking

### 2.1. Background modeling

In surveillance video, the background changes with illumination, weather, and other conditions, so the scene must be processed first. Our previous research provided a self-adaptive method for real-time background modeling with lower computational complexity and higher accuracy [22].

In the (*n*+1)th frame, the gray value of point *p* can be described as follows:

$G\left(n+1,p\right)=G\left(n,p\right)+L\left(n,p\right)+{\text{noise}}^{1}\left(n,p\right)$ (1)

where *G*(*n*, *p*) is the gray value of pixel *p* in the *n*th frame, *L*(*n*, *p*) is the model describing the change of illumination over time, and noise^{1}(*n*, *p*) is zero-centered Gaussian noise. The gray value of pixel *p* in the input image can be described as:

$I\left(n,p\right)=G\left(n,p\right)+{\text{noise}}^{2}\left(n,p\right)$ (2)

where noise^{2}(*n*, *p*) is zero-centered Gaussian noise. A comparison between (1) and (2) indicates that

$I\left(n+1,p\right)=G\left(n,p\right)+\omega \left(n+1,p\right)$ (3)

where *ω*(*n*+1, *p*) = *L*(*n*, *p*)+noise^{1}(*n*, *p*)+noise^{2}(*n*+1, *p*). *ω*(*n*, *p*) is a Gaussian distribution. We use a mean value to represent *m*(*n*, *p*) and *s*(*n*, *p*), respectively, and use a variable to represent *ω*(*n*, *p*). In traffic surveillance video, illumination and noise distribution change little in a triangular region. Therefore, *m*(*n*, *p*) and *s*(*n*, *p*) are independent of the position of pixel *p*. Then, a histogram can be derived from the difference between {*I*(*n*+1, *p*)} and {*G*(*n*, *p*)} in a triangular region. From this histogram, the mean value of *m*(*n*) and *s*(*n*) can be estimated by a self-adaptive filter based on a recursive least square method.
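As a rough illustration of the last step, the statistics of the difference between the input frame and the background over one region can be computed directly. The sketch below is a simplification: it uses a plain sample mean and standard deviation rather than the recursive least square filter mentioned above, it assumes that *s* denotes a standard deviation, and the function name `noise_stats` and the list-of-rows image layout are hypothetical.

```python
import math


def noise_stats(background, frame, region):
    """Sample mean and standard deviation of (frame - background) over a region.

    background, frame: 2D lists of gray values, indexed as [row][col].
    region: iterable of (row, col) pixel coordinates, e.g. one triangle.
    Assumption: a plain sample estimate stands in for the adaptive filter.
    """
    diffs = [frame[r][c] - background[r][c] for r, c in region]
    if not diffs:
        raise ValueError("region must contain at least one pixel")
    mean = sum(diffs) / len(diffs)
    var = sum((d - mean) ** 2 for d in diffs) / len(diffs)
    return mean, math.sqrt(var)
```

A uniform brightening of the region shifts the estimated mean and leaves the spread at zero, which is the behavior the per-region model relies on.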

### 2.2. Detecting the ROI of a vehicle

- (1) Extract contour information with a Canny filter
- (2) Extract lines from the image contours with a Hough transform
- (3) Obtain a set of corners from both ends of the lines
- (4) Initialize the constrained Delaunay triangulation (CDT) based on all constrained edges
- (5) Insert all corner points in turn to reconstruct the CDT
- (6) Extract the corner density, horizontal straight line density, vertical straight line density, triangle density, and average intensity of a vehicle region to construct the feature vector
- (7) Feed the feature vector into an SVM to determine the ROI of the vehicle target

## 3. State modeling for multiple vehicles

The state **S** of a vehicle consists of **C** = (*x*, *y*) and *s*, the centroid and area of its bounding box, respectively.

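The state **S**_{t} is predicted by three parts: the previous state **S**_{t-1}, the last state displacement **S**_{t-1} - **S**_{t-2}, and a zero-mean Gaussian stochastic component *ω*_{t} with covariance matrix ∑:

${\mathbf{S}}_{t}={\mathbf{S}}_{t-1}+\left({\mathbf{S}}_{t-1}-{\mathbf{S}}_{t-2}\right)+{\omega}_{t}$ (5)

Suppose there are *M* vehicles in a video scene. So, the model is regarded as

${\mathbf{S}}_{t}={\left\{{\mathbf{S}}_{t}\left(m\right)\right\}}_{m=1,\dots ,M}$ (6)

where **S**_{t}(*m*) is the state vector of the *m*th vehicle in the *t*th frame.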
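The prediction step can be sketched as follows. This is a minimal illustration, assuming the state is the tuple (*x*, *y*, *s*) and that the covariance matrix is diagonal, so that each component receives independent Gaussian noise; the function name `predict_state` and the `sigmas` parameter are hypothetical.

```python
import random


def predict_state(prev, prev2, sigmas, rng=random):
    """Predict S_t = S_{t-1} + (S_{t-1} - S_{t-2}) + w_t, component by component.

    prev, prev2: states at t-1 and t-2, e.g. (x, y, s).
    sigmas: per-component standard deviations of the zero-mean noise w_t
            (assumption: diagonal covariance).
    rng: any object with a gauss(mu, sigma) method.
    """
    return tuple(
        2 * a - b + rng.gauss(0.0, sigma)
        for a, b, sigma in zip(prev, prev2, sigmas)
    )
```

With all standard deviations set to zero the function returns the expected state 2**S**_{t-1} - **S**_{t-2}, which is the deterministic part used for each particle in Algorithm 1.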

## 4. Adaptive integration-based observation model

The observation models encode the visual information of a vehicle's appearance. Since a single visual feature does not work in all cases, we utilize the Hue-Saturation-Value (HSV) color histogram to capture the color information of a vehicle, and an EOH to encode shape information. The vehicle's observation model is denoted as **O** = {**O**_{t}; *t* ∈ *N*}.

### 4.1. Color features

We obtain the color information of a vehicle by a two-part color histogram based on the HSV color space. We use the HSV color histogram because it decouples the intensity from Hue and Saturation, and thus it is less sensitive to illumination effects than a histogram from the RGB color space. The exploitation of the spatial layout of the color is also crucial due to the fact that different vehicles usually have different colors.

In the non-Gaussian state space, state model **S** is assumed to be a hidden Markov process, with an initial distribution *p*(**S**_{0}) and a transfer distribution *p*(**S**_{t}|**S**_{t-1}). A color histogram-based observation model ${\mathbf{O}}_{t}^{c}$ is obtained through the marginal distribution $p\left({\mathbf{O}}_{t}^{c}|{\mathbf{S}}_{t}\right)$. Our color observation model is composed of a 2D histogram based on Hue and Saturation and a 1D histogram based on Value. Both histograms are normalized such that all bins sum to one. We assign the same number of bins for each color component, i.e., *N*_{h} = *N*_{s} = *N*_{v} = 10, resulting in an *N* = *N*_{h} × *N*_{s} + *N*_{v} = 110-dimensional HSV histogram.
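The 110-bin layout can be illustrated with a short sketch. It assumes 8-bit RGB input pixels, that every pixel votes in both the Hue-Saturation part and the Value part, and that each part is normalized separately; these choices, as well as the name `hsv_histogram`, are illustrative assumptions rather than details given in the text.

```python
import colorsys

N_H = N_S = N_V = 10  # bins per color component, as stated in the text


def _bin(value, n_bins):
    """Map a value in [0, 1] to a bin index, putting 1.0 in the last bin."""
    return min(int(value * n_bins), n_bins - 1)


def hsv_histogram(rgb_pixels):
    """Return a 110-dimensional histogram: 100 H-S bins followed by 10 V bins.

    rgb_pixels: non-empty list of (r, g, b) tuples with 8-bit components.
    Assumption: the 2D H-S part and the 1D V part each sum to one.
    """
    if not rgb_pixels:
        raise ValueError("at least one pixel is required")
    hs = [0.0] * (N_H * N_S)
    v = [0.0] * N_V
    for r, g, b in rgb_pixels:
        h, s, val = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hs[_bin(h, N_H) * N_S + _bin(s, N_S)] += 1.0
        v[_bin(val, N_V)] += 1.0
    n = float(len(rgb_pixels))
    return [x / n for x in hs] + [x / n for x in v]
```

A pure red pixel, for example, falls into the first Hue bin, the last Saturation bin, and the last Value bin.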

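If *R*(**S**_{t}) is the candidate region of a vehicle at time *t*, the kernel density estimation of the color distribution is

$k\left(n;{\mathbf{S}}_{t}\right)=\kappa \sum _{\mathbf{d}\in R\left({\mathbf{S}}_{t}\right)}\delta \left[{b}_{t}\left(\mathbf{d}\right)-n\right]$

where *b*_{t}(**d**) ∈ {1, ..., *N*} is the index of the color bin of the pixel at position **d**; *δ*[·] is the delta function; *κ* is a normalization factor ensuring ${\sum}_{n=1}^{N}k\left(n;{\mathbf{S}}_{t}\right)=1$; and **d** is a pixel position in the candidate region *R*(**S**_{t}). Suppose that ${\mathbf{K}}^{*}\triangleq {\left\{{k}^{*}\left(n;{\mathbf{S}}_{0}\right)\right\}}_{n=1,...,N}$ is the reference template and $\mathbf{K}\left({\mathbf{S}}_{t}\right)\triangleq {\left\{k\left(n;{\mathbf{S}}_{t}\right)\right\}}_{n=1,...,N}$ is the candidate model. The similarity measurement is defined based on the Bhattacharyya coefficient:

$\rho \left[{\mathbf{K}}^{*},\mathbf{K}\left({\mathbf{S}}_{t}\right)\right]=\sum _{n=1}^{N}\sqrt{{k}^{*}\left(n;{\mathbf{S}}_{0}\right)k\left(n;{\mathbf{S}}_{t}\right)}$

The color observation likelihood $p\left({\mathbf{O}}_{t}^{c}|{\mathbf{S}}_{t}\right)$ follows from this similarity, with *λ*_{c} a factor determined by the variation of the color Gaussian distribution. Figure 4 shows the HSV color histograms of two vehicles.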
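A minimal sketch of a Bhattacharyya-based likelihood is given below. The coefficient is the standard definition; the exponential form exp(-*λ* *D*²) with *D*² = 1 - *ρ* is a common choice in color-based particle filters and is assumed here, since the exact expression is not reproduced in the text. The default `lam=20.0` and the function names are hypothetical.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Bhattacharyya coefficient of two normalized histograms of equal length."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def color_likelihood(reference, candidate, lam=20.0):
    """Unnormalized likelihood exp(-lam * D^2), with D^2 = 1 - rho.

    Assumption: this exponential form and the value of lam are illustrative.
    """
    rho = bhattacharyya_coefficient(reference, candidate)
    d2 = max(0.0, 1.0 - rho)  # guard against rounding slightly above 1
    return math.exp(-lam * d2)
```

Identical histograms give a coefficient of one and the maximum likelihood, while histograms with no overlapping bins give a coefficient of zero.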

### 4.2. Shape features

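The gradient at point (*x*, *y*) in the image *I* can be computed by the Sobel operator mask:

${G}_{h}\left(x,y\right)={\text{Sobel}}_{h}*I\left(x,y\right),\phantom{\rule{1em}{0ex}}{G}_{v}\left(x,y\right)={\text{Sobel}}_{v}*I\left(x,y\right)$

where Sobel_{h} and Sobel_{v} are the horizontal and vertical masks of the Sobel operator. The strength of an edge is computed as follows:

$G\left(x,y\right)=\sqrt{{G}_{h}{\left(x,y\right)}^{2}+{G}_{v}{\left(x,y\right)}^{2}}$

To suppress noise, a threshold is applied to *G*(*x*, *y*) such that

${G}^{\prime}\left(x,y\right)=\left\{\begin{array}{ll}G\left(x,y\right),& G\left(x,y\right)\ge T\\ 0,& \text{otherwise}\end{array}\right.$

where *T* was suggested to be set between 80 and 110 in [24]. The orientation of the edge is

$\theta \left(x,y\right)=\arctan \left(\frac{{G}_{v}\left(x,y\right)}{{G}_{h}\left(x,y\right)}\right)$

The orientations are divided into *K* bins. The value of the *k*th bin is denoted as

${\psi}_{k}\left(x,y\right)=\left\{\begin{array}{ll}{G}^{\prime}\left(x,y\right),& \theta \left(x,y\right)\in {\text{bin}}_{k}\\ 0,& \text{otherwise}\end{array}\right.$

and the edge strength of orientation *k* accumulated over a region *R* is ${E}_{k}\left(R\right)={\sum}_{\left(x,y\right)\in R}{\psi}_{k}\left(x,y\right)$.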

- (1) Edge strength features in any two orientations *ϕ*: ${\varphi}_{i,j}\left({q}^{l}\right)=\arctan\left(\frac{{E}_{i}\left({q}^{l}\right)+\epsilon}{{E}_{j}\left({q}^{l}\right)+\epsilon}\right)$ (17)
- (2) Dominant orientation features *φ*: ${\phi}_{i}\left({q}^{l}\right)=\arctan\left(\frac{{E}_{i}\left({q}^{l}\right)+\epsilon}{\sum _{j\in K}{E}_{j}\left({q}^{l}\right)+\epsilon}\right)$ (18)
- (3) Symmetry features *ζ*: ${\zeta}_{1}\left({R}_{1},{R}_{2}\right)=\arctan\left(\frac{{E}_{i}\left({R}_{1}\right)-{E}_{\pi \left(i\right)}\left({R}_{2}\right)+\epsilon}{\sum _{j\in K}\left({E}_{j}\left({R}_{1}\right)+{E}_{j}\left({R}_{2}\right)\right)+\epsilon}\right)$ (19) and ${\zeta}_{2}\left({R}_{1},{R}_{2}\right)=\arctan\left(\frac{{E}_{i}\left({R}_{1}\right)+{E}_{\pi \left(i\right)}\left({R}_{2}\right)+\epsilon}{\sum _{j\in K}\left({E}_{j}\left({R}_{1}\right)+{E}_{j}\left({R}_{2}\right)\right)+\epsilon}\right)$ (20)

where *R*_{1} and *R*_{2} are regions of the same size and are positioned at opposite sides of the symmetry axes. *π*(*i*) = (*M*_{ζ} - *i*)%*M*_{ζ}; *M*_{ζ} is the number of intervals of [0, *π*], and *M*_{ζ} = 6 in the experiment.

where **O**^{e} denotes the observation model based on an EOH, and *λ*_{e} is a factor determined by the variation of the EOH distribution.
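The construction of an EOH can be sketched in plain Python for a small grayscale image. The sketch follows the usual recipe (Sobel gradients, strength thresholding, orientation binning weighted by strength); the bin count `k_bins=6`, the folding of orientations into [0, *π*), the default threshold of 100, and the function name are illustrative assumptions.

```python
import math

# 3x3 Sobel masks: SOBEL_H responds to intensity changes along x,
# SOBEL_V to intensity changes along y.
SOBEL_H = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_V = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def edge_orientation_histogram(img, k_bins=6, threshold=100.0):
    """Accumulate edge strength per orientation bin over interior pixels.

    img: 2D list of gray values indexed as [row][col], at least 3x3.
    Pixels whose edge strength is below `threshold` are ignored.
    """
    rows, cols = len(img), len(img[0])
    hist = [0.0] * k_bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gh = gv = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    pix = img[y + dy][x + dx]
                    gh += SOBEL_H[dy + 1][dx + 1] * pix
                    gv += SOBEL_V[dy + 1][dx + 1] * pix
            strength = math.hypot(gh, gv)
            if strength < threshold:
                continue
            theta = math.atan2(gv, gh) % math.pi  # fold into [0, pi)
            k = min(int(theta / math.pi * k_bins), k_bins - 1)
            hist[k] += strength
    return hist
```

For an image containing a single vertical step edge, all accumulated strength lands in the first bin, and a flat image yields an all-zero histogram.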

### 4.3. Improving robustness

Both visual features introduced above are based on histograms, while all spatial information is discarded. This may lead to false objects and local minima, and even tracking failure under occlusion. On the other hand, methods incorporating the spatial information are computationally intensive. Motivated by the approaches proposed in [26, 27], spatial information is incorporated into the observation model by dividing the vehicle to be tracked into a number of fragments.

The reference vehicle is represented by multiple histograms {*q*^{l}}_{l = 1, ..., L} instead of one global histogram, where *L* is the number of fragments. Let the target candidate centered at position **C** be represented by {*p*^{l}(**C**)}_{l = 1, ..., L}, where *p*^{l}(**C**) is built in the same manner as the observation model. With this definition, we propose the similarity function as follows:

$\rho \left(\mathbf{C}\right)=\sum _{l=1}^{L}{\lambda}^{\left(l\right)}\rho \left({p}^{l}\left(\mathbf{C}\right),{q}^{l}\right)$

where *λ*^{(l)} describes the importance weight of each fragment and is subject to ${\sum}_{l=1}^{L}{\lambda}^{\left(l\right)}=1$. The similarity function of each fragment is calculated by similarity measurements of different features between *p*^{l}(**C**) and *q*^{l}. During tracking, each fragment should contribute at a different level because of occlusions or other kinds of appearance changes. A higher value of *λ*^{(l)} means that the tracking algorithm will rely more on the *l*th fragment. Conversely, a fragment with little weight will count less toward the final tracking result. Here, we regard a fragment as being more important if it is more similar to the reference fragment and, at the same time, less similar to the background. The weight therefore combines a foreground term ${\lambda}_{\mathrm{fg}}^{\left(l\right)}$ and a background term ${\lambda}_{\mathrm{bg}}^{\left(l\right)}$, where *γ* tunes the proportion between them; we set *γ* to 0.8 in the following experiments. The background region for each fragment is selected as the surrounding neighborhood region of double size, excluding the fragment. Accordingly, the feature histogram of the background region is extracted. To measure the similarity more properly, we use the metric proposed by Nummiaro et al. [28], in which *d*^{(l)} is the distance between two feature histograms.

### 4.4. Adaptive integration

We employ an adaptive integration of the multiple visual features mentioned above, i.e., democratic integration. This integration strategy changes each feature's weight adaptively, according to its reliability in the previous frame, and improves the performance robustness of the visual features.

The color and EOH observation likelihoods are integrated into a single observation likelihood, where *α*^{c} and *α*^{e} are the weights of the color histogram and EOH features, respectively, and *α*^{c} + *α*^{e} = 1. The final state vector can be obtained by the maximum likelihood estimation. In the following, *f* is a sign to indicate the type of feature, i.e., color or EOH. In general, the change between two adjacent frames is small, so the weight of a feature can be predicted by

${\alpha}_{t}^{f}={\alpha}_{t-1}^{f}+\frac{\Delta t}{\tau}\left({\bar{\gamma}}_{t-1}^{f}-{\alpha}_{t-1}^{f}\right)$ (32)

where *τ* is a constant that determines the adaptive rate of change of the weight, and Δ*t* is the time interval between two frames. From Equation (32), the weight of a feature whose current weight is less than ${\bar{\gamma}}_{t-1}^{f}$ is increased, and the weight of a feature whose current weight is greater than ${\bar{\gamma}}_{t-1}^{f}$ is decreased. That is to say, this strategy always increases the weight of a feature with a high reliability and reduces the weight of a feature with a low reliability.
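The weight adaptation can be sketched as a discrete relaxation of each weight toward the normalized quality of its feature. The sketch assumes that the barred quality is the raw quality divided by the sum over features, so that the targets sum to one; the default values of `tau` and `dt` and the function name `update_weights` are hypothetical.

```python
def update_weights(weights, qualities, tau=0.5, dt=0.04):
    """One democratic-integration step for the feature weights.

    weights: dict feature -> current weight alpha_{t-1} (summing to one).
    qualities: dict feature -> raw quality gamma_{t-1} (non-negative).
    Assumption: qualities are normalized to sum to one before use, and the
    update is alpha_t = alpha_{t-1} + (dt / tau) * (gamma_bar - alpha_{t-1}).
    """
    total_q = sum(qualities.values())
    if total_q <= 0.0:
        return dict(weights)  # no reliable feature: keep the old weights
    rate = dt / tau
    new = {
        f: a + rate * (qualities[f] / total_q - a)
        for f, a in weights.items()
    }
    total = sum(new.values())
    return {f: a / total for f, a in new.items()}
```

Starting from equal weights, a feature whose normalized quality exceeds 0.5 gains weight and the other loses the same amount, so the weights continue to sum to one.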

For each feature *f*, a probabilistic distribution map is built in which *p*_{f}(**x**_{i}, *t*) ∝ *p*_{f}(*Z*_{i}|*M*_{f, F}). Here, *Z*_{i} is the observation at pixel *i*; *M*_{f, F} is the foreground model of feature *f*; and *p*_{f}(*Z*_{i}|*M*_{f, F}) represents the observation likelihood of pixel *i* given the foreground model *M*_{f, F} of feature *f*. The higher the value of a pixel in *p*_{f}(**x**_{i}, *t*), the higher the likelihood that pixel *i* belongs to the foreground. Hence, ${\gamma}_{t}^{f}$ is defined as the ratio between the numbers of probabilistic pixels of foreground and background in the probabilistic distribution map.

*W*:

## 5. Model updating

During tracking, we use the observations *O*_{1}, ..., *O*_{t} to estimate the observation model **O**_{t+1} in the next frame. The observation model in the coming frame is first written under the assumption that the appearance of a vehicle remains the same during tracking, where ${\bar{\mathbf{S}}}_{t}$ is the state of the vehicle at time *t*, and $p\left({\bar{\mathbf{S}}}_{t}\right)$ is the observation estimation covered by ${\bar{\mathbf{S}}}_{t}$.

In the model update of Equation (37), *β*_{t} is a forgetting factor used to minimize the impact of specific frames on the observation model and to control the speed of model updating. It is inevitable that some kinds of errors will be made during tracking. There exist two kinds of errors: errors caused by accumulation, and errors caused by object distortion. The former is caused by the accumulation of small errors from frequent updating; the latter is usually a fatal error which is induced by maintaining the same observation model during tracking. Therefore, the key problems are when to update the model and the rate of updating.
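The role of the forgetting factor can be illustrated with a simple convex blend of histograms. Equation (37) is not reproduced in the text, so the blend below, in which *β* = 0 keeps the current model and *β* = 1 replaces it by the new estimate, is an assumed form; the function name `update_model` is hypothetical.

```python
def update_model(reference, estimate, beta):
    """Blend the current model with a new estimate using forgetting factor beta.

    reference, estimate: normalized histograms of equal length.
    beta: value in [0, 1]; larger values adapt faster.
    Assumption: new_model = (1 - beta) * reference + beta * estimate,
    renormalized to sum to one.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    blended = [(1.0 - beta) * r + beta * e for r, e in zip(reference, estimate)]
    total = sum(blended)
    return [b / total for b in blended] if total > 0.0 else blended
```

Setting *β* to zero whenever the tracking result is judged unreliable freezes the model, which is how an update rule of this kind can avoid absorbing an occluding object.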

We define two factors, *η*_{1}(*t*) and *η*_{2}(*t*), to determine the forgetting factor *β*_{t} at time *t* (Equation (38)), where *k* is a constant.

*η*_{1}(*t*) is defined using the similarity measurement between the candidate and the reference observation (Equation (39)), where *ρ*(·,·) is the similarity measurement between **O**_{t} and ${\bar{\mathbf{S}}}_{t}$. *Th*_{1} is empirical and is set to 0.8 in the experiments. The bounding box scale changes due to the vehicle's motion trajectory. We employ the bounding box scale recursion introduced by McKenna et al. [32], in which *μ*_{t+1} and ${\sigma}_{t+1}^{2}$ represent the new mean and the new variance of the recursive bounding box scale, respectively, and ${s}_{t+1}^{\prime}$ represents the newly detected bounding box scale.

The constant *c* is used to control the forgetting rate of the recursive bounding box scale of the vehicle. If *c* is large, the history of the bounding box scale will fade out slowly. This suits a vehicle, which is a rigid object with a fixed shape, because the history of the bounding box scale is retained through the large *c*. In the experiments, *c* is set to 0.9. Here, *η*_{2}(*t*) is defined according to the new mean and variance (Equation (42)), where *CV*(*t*) = *σ*_{t}/*μ*_{t} is the dispersion coefficient. *Th*_{2} is empirical and is set to 0.2 in the experiments.
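The scale recursion and the dispersion coefficient can be sketched as follows. The recursion itself is not reproduced in the text, so the exponential-forgetting update below is an assumed form; only the role of *c*, its value 0.9, and the definition *CV* = *σ*/*μ* come from the article. The function names are hypothetical.

```python
import math


def update_scale_stats(mu, var, new_scale, c=0.9):
    """Recursive mean and variance of the bounding box scale.

    Assumed form (exponential forgetting with rate c):
      mu'  = c * mu + (1 - c) * s
      var' = c * (var + (mu' - mu)^2) + (1 - c) * (s - mu')^2
    A large c keeps the history of the scale for longer.
    """
    mu_next = c * mu + (1.0 - c) * new_scale
    var_next = (
        c * (var + (mu_next - mu) ** 2)
        + (1.0 - c) * (new_scale - mu_next) ** 2
    )
    return mu_next, var_next


def dispersion(mu, var):
    """Dispersion coefficient CV = sigma / mu."""
    return math.sqrt(var) / mu
```

A sudden change in the detected scale raises the variance and hence *CV*, which can then be compared against a threshold to decide whether the current result is reliable enough for model updating.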

## 6. Robust tracking under PF

According to the state and observation models, multi-vehicle tracking is performed by running multiple independent PFs, one for every vehicle in the scene. Algorithm 1 summarizes the fully automatic multi-vehicle tracking algorithm.

### Algorithm 1. Robust Tracking under PF

**Input**: {*I*_{t}}_{t = 1, ..., T};

**Output**: ${\left\{{\widehat{\mathbf{S}}}_{t}\left(m\right)\right\}}_{t=1,\dots ,T;m=1,\dots ,M}$;

1. Detect the ROI of the vehicle;
2. Divide (0, 1] into *N* independent intervals, i.e., $\left(0,1\right]=\left(0,\frac{1}{N}\right]\cup \cdots \cup \left(\frac{N-1}{N},1\right]$, where *N* is the number of initial particles;
3. For the initial particle set {**S**^{i}}_{i = 1,2,...,N}, whose elements are independent and identically distributed, **S**^{i} is denoted as ${\mathbf{S}}^{i}=U\left(\left(\frac{i-1}{N},\frac{i}{N}\right]\right)$, where *U*((*u*, *v*]) is the uniform distribution in (*u*, *v*];
4. Fragment the vehicle according to the set of features generated by the EOH;
5. Compute the initial HSV color histogram of each fragment of the vehicle;
6. Compute the initial EOH of each fragment of the vehicle;
7. Initialize the integration weights of the color and EOH features: *α*^{c} = *α*^{e} = 0.5;
8. *For t* = 1, 2, ...

    *For i* = 1, ..., *N*

    - Predict the state of the vehicle by Equation (5): ${\bar{\mathbf{S}}}_{t}^{i}=E\left({\mathbf{S}}_{t}^{i}\right)=2{\mathbf{S}}_{t-1}^{i}-{\mathbf{S}}_{t-2}^{i}$;
    - Compute the observation likelihood of color $p\left({\mathbf{O}}_{t}^{c}|{\mathbf{S}}_{t}^{i}\right)$ by Equation (27);
    - Compute the observation likelihood of EOH $p\left({\mathbf{O}}_{t}^{e}|{\mathbf{S}}_{t}^{i}\right)$ by Equation (28);
    - Generate the observation likelihood integrating both color and EOH $p\left({\mathbf{O}}_{t}|{\mathbf{S}}_{t}^{i}\right)$ by Equation (29);
    - Update the importance weights: ${\omega}_{t}^{i}={\omega}_{t-1}^{i}p\left({\mathbf{O}}_{t}|{\mathbf{S}}_{t}^{i}\right)$;

    *End For*
9. *If* it is necessary to do re-sampling

    Obtain a new set of particles: $\left\{{\mathbf{S}}_{t}^{i},1/N\right\}\sim \left\{{\mathbf{S}}_{t}^{i},{\omega}_{t}^{i}\right\}$;

    *End If*
10. Generate the final state vector by Equation (30);
11. Compute the quality function ${\gamma}_{t}^{f}$ of color and EOH by Equation (33), respectively;
12. Compute the integrated weight ${\alpha}_{t}^{f}$ of color and EOH by Equation (32), respectively;
13. According to the probability density distribution of the posterior of a vehicle's state, compute the two factors *η*_{1}(*t*) and *η*_{2}(*t*) by Equations (39) and (42), respectively;
14. Obtain the forgetting factor *β*_{t} by Equation (38) to update the vehicle's observation model by Equation (37);

    *End For*
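The re-sampling in step 9 can be sketched with stratified sampling, which draws one uniform number from each of the *N* equal intervals in the same spirit as steps 2 and 3. This is one common scheme and is not claimed to be the exact procedure used here; the effective-sample-size criterion is likewise a widely used test for when re-sampling is needed, offered as an assumption. The intervals are taken as half-open on the right, [*i*/*N*, (*i*+1)/*N*), because that is what `random.random()` provides.

```python
import random


def effective_sample_size(weights):
    """1 / sum(w_i^2) for normalized weights; small values suggest degeneracy."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)


def stratified_resample(particles, weights, rng=random):
    """Draw N particles in proportion to their weights, one draw per stratum.

    Returns the new particle list and uniform weights 1/N.
    rng: any object with a random() method returning a float in [0, 1).
    """
    n = len(particles)
    total = sum(weights)
    cumulative = []
    acc = 0.0
    for w in weights:
        acc += w / total
        cumulative.append(acc)
    cumulative[-1] = 1.0  # guard against rounding in the running sum
    resampled = []
    j = 0
    for i in range(n):
        u = (i + rng.random()) / n  # one draw inside the i-th interval
        while cumulative[j] <= u:
            j += 1
        resampled.append(particles[j])
    return resampled, [1.0 / n] * n
```

A possible usage is to re-sample only when `effective_sample_size(weights)` falls below a fraction of *N*, such as *N*/2; that fraction is a conventional choice and not a value from the article.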

## 7. Experimental results

In this section, the proposed approach is used to track vehicles on the road. In our experiments, the dataset is composed of video sequences obtained from a real surveillance camera. The camera is fixed on a pole beside a highway and has a high-angle view of one side of the roadway. All the experiments were carried out on 640 × 480 pixel sequences with an Intel^{®} Core™ Duo CPU T7500 2.93 GHz PC. A real-life scenario, including partial occlusion, short-term large-area occlusion, and scale variation, is considered. We verify the performance of our approach via single and multiple vehicle tracking. In the experiments, the length of each video sequence is 100 frames, and the number of particles is set to 50.

### 7.1. Quantitative evaluation

where $\left({x}_{t}^{\prime},{y}_{t}^{\prime}\right)$ and ${s}_{t}^{\prime}$ are the ground-truth centroid and scale of a vehicle at time *t*, respectively. *M* is the measurement time.

### 7.2. Tracking results and discussion

For comparison, we conducted our experiments with four different types of trackers: a color-based PF tracker (Tracker 1), an EOH-based PF tracker (Tracker 2), a PF tracker based on fixed-weight multiple visual features (Tracker 3), and our approach. The first three trackers had no adaptive updating during tracking, and in Tracker 3 the weights of the color histogram and EOH features were both fixed at 0.5.

## 8. Conclusions

This article presents a robust tracking approach for multiple vehicles using adaptive integration of multiple visual features. Color histograms and EOHs are selected as visual features to model the observation of vehicles and integrated by a democratic integration strategy, and the observation model is embedded in a PF tracking framework. The spatial information is incorporated into the observation model to improve the robustness of object representation by dividing the object to be tracked into a number of fragments. Further, in order to avoid errors caused by model drift, the updating process should only be implemented in a reliable manner, and the rate of updating can be controlled according to this reliability. The posterior probability density function of distribution of state vector and similarity between the candidate and reference observation of an object are used to define the valid measurement of reliability to model updating during tracking. Experimental results in real traffic surveillance video sequences show that our approach outperforms others in vehicle tracking under complex conditions.

## Declarations

### Acknowledgements

This study was supported by the optional research topic from the National Natural Science Foundation of China (No. 61103094). Furthermore, it was also supported by the National High Technology Research and Development Program (863) with the research topic ID 2011AA010502.

## References

1. Ruiter H, Benhabib B: Tracking of rigid-bodies for autonomous surveillance. In *Proceedings of IEEE International Conference on Mechatronics and Automation*. *Volume 2*. Niagara Falls, Canada; 2005:928-933.
2. Chen YQ, Rui Y, Huang TS: JPDAF based HMM for real-time contour tracking. In *Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition*. *Volume 1*. Hawaii, USA; 2001:543-550.
3. Li PH, Zhang TW, Arthur ECP: Visual contour tracking based on particle filters. *Image Vis Comput* 2003, 21: 111-123. doi:10.1016/S0262-8856(02)00133-6
4. Czyz J, Ristic B, Macq B: A particle filter for joint detection and tracking of color objects. *J Image Vis Comput* 2007, 25: 1271-1281. doi:10.1016/j.imavis.2006.07.027
5. Zhai Y, Yeary MB, Cheng S, Kehtarnavaz N: An object-tracking algorithm based on multiple-model particle filtering with state partitioning. *IEEE Trans Instrum Meas* 2009, 58: 1797-1809.
6. Kazuhiro H: Adaptive weighting of local classifiers by particle filters for robust tracking. *Pattern Recogn* 2009, 42: 619-628. doi:10.1016/j.patcog.2008.09.026
7. Cui P, Sun L, Yang S: Adaptive mixture observation models for multiple object tracking. *Sci China Ser F: Inf Sci* 2009, 52: 226-235. doi:10.1007/s11432-009-0054-4
8. Comaniciu D, Ramesh V, Meer P: Kernel-based object tracking. *IEEE Trans Pattern Anal Mach Intell* 2003, 25: 564-577. doi:10.1109/TPAMI.2003.1195991
9. Jonathan D, Ian R: Articulated body motion capture by stochastic search. *Int J Comput Vis* 2005, 61: 185-205.
10. Tu Q, Xu YP, Zhou ML: Robust vehicle tracking based on scale invariant feature transform. In *Proceedings of IEEE International Conference on Information and Automation*. Changsha, China; 2008:86-90.
11. Wei Q, Xiong Z, Li C: Color spatial feature based approach for multiple-vehicle tracking. *Appl Opt* 2010, 49(31):6034-6047.
12. Jahangheer SS, Khan MI: Detection and tracking of rotated and scaled targets by use of Hilbert-wavelet transform. *Appl Opt* 2003, 42(23):4718-4735. doi:10.1364/AO.42.004718
13. Birchfield S: Elliptical head tracking using intensity gradients and color histograms. In *Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition*. Santa Barbara, CA, USA; 1998:232-237.
14. Tao X, Christian D: Monte Carlo visual tracking using color histograms and a spatially weighted oriented Hausdorff measure. In *Proceedings of the Conference on Analysis of Images and Patterns*. *Volume 2756*. Groningen, Netherlands; 2003:190-197. doi:10.1007/978-3-540-45179-2_24
15. Kwolek B: Stereovision-based head tracking using color and ellipse fitting in a particle filter. In *Proceedings of the 8th European Conference on Computer Vision*. *Volume 3023*. Prague, Czech Republic; 2004:192-204.
16. Spengler M, Schiele B: Towards robust multi-cue integration for visual tracking. *Mach Vis Appl* 2003, 14: 50-58. doi:10.1007/s00138-002-0095-9
17. Jepson AD, Fleet DJ, El-Maraghi TF: Robust online appearance models for visual tracking. *IEEE Trans Pattern Anal Mach Intell* 2003, 25(10):415-522.
18. Toyama K, Blake A: Probabilistic tracking with exemplars in a metric space. *Int J Comput Vis* 2002, 48(1):9-19. doi:10.1023/A:1014899027014
19. Yang M, Wu Y: Tracking non-stationary appearances and dynamic feature selection. In *Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition*. *Volume 2*. San Diego, CA, USA; 2005:1059-1066.
20. Avidan S: Ensemble tracking. *IEEE Trans Pattern Anal Mach Intell* 2007, 29(2):261-271.
21. Triesch J, Malsburg C: Self-organized integration of adaptive visual cues for face tracking. In *Proceedings of IEEE International Conference on Automatic Face Gesture Recognition*. Grenoble, France; 2000:102-107.
22. Sheng H, Xiong Z, Weng JN, Wei Q: An approach to detecting abnormal vehicle events in complex factors over highway surveillance video. *Sci China Ser E: Technol Sci* 2008, 51: 199-208. doi:10.1007/s11431-008-6011-4
23. Sheng H, Li C, Wei Q, Xiong Z: Real-time detection of abnormal vehicle events with multi-feature over Highway Surveillance Video. In *Proceedings of IEEE International Conference on Intelligent Transportation System*. Beijing, China; 2008:550-556.
24. Duan Z, Cai Z, Yu J: Adaptive particle filter for unknown fault detection of wheeled mobile robots. *Proceedings of IEEE International Conference on Intelligent Robots and Systems* 2006, 1312-1315.
25. Levi K, Weiss Y: Learning object detection from a small number of examples: the importance of good features. *Comput Vis Pattern Recogn* 2004, 2: 53-60.
26. Adam A, Rivlin E, Shimshoni I: Robust fragments-based tracking using the integral histogram. In *Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition*. *Volume 1*. New York, USA; 2006:798-805.
27. Maggio E, Cavallaro A: Multi-part target representation for color tracking. In *Proceedings of IEEE International Conference on Image Processing*. *Volume 1*. Genoa, Italy; 2005:729-732.
28. Nummiaro K, Koller-Meier E, Gool LJV: An adaptive color-based particle filter. *Image Vis Comput* 2003, 21: 99-110. doi:10.1016/S0262-8856(02)00129-4
29. Choeychuen K, Kumhoma P, Chamnongthaia K: Robust ambiguous target handling for visual object tracking. *AEU Int J Electron Commun* 2010, 64(10):960-970. doi:10.1016/j.aeue.2009.10.005
30. Moreno-Noguer F, Sanfeliu A: A framework to integrate particle filters for robust tracking in non-stationary environments. *Pattern Recogn Image Anal* 2005, 3522: 93-101. doi:10.1007/11492429_12
31. Liu H, Yu Z, Zha HB, Zou YX, Zhang L: Robust human tracking based on multi-cue integration and mean-shift. *Pattern Recogn Lett* 2009, 30(9):827-837. doi:10.1016/j.patrec.2008.10.008
32. McKenna S, Jabri S, Doric Z, Wechsler H, Rosenfeld A: Tracking groups of people. *Comput Vis Image Understand* 2000, 80: 42-56. doi:10.1006/cviu.2000.0870

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.