Robust multiple-vehicle tracking via adaptive integration of multiple visual features
EURASIP Journal on Image and Video Processing volume 2012, Article number: 2 (2012)
Abstract
This article presents a robust approach to tracking multiple vehicles by integrating multiple visual features. The observation is modeled with a democratic integration strategy that adjusts the weight of each visual feature according to its reliability in the current frame. The appearance model is embedded in a particle filter (PF) tracking framework. Furthermore, we propose a new model updating algorithm based on the PF. To avoid incorrect results caused by "model drift" being introduced into the observation model, model updating is performed only when it is reliable, and the rate of updating is based on that reliability. Experiments using a real video sequence verify the proposed method.
1. Introduction
With the rapid process of urbanization, the concept of developing a "smart city" has gained prominence. As an important part of this trend, Intelligent Transportation Systems (ITS) will be critical for effective management of urban traffic. Vehicle tracking under different traffic scenarios is one of the key issues in ITS. Vehicle motion parameters, such as location, velocity, orientation, and acceleration, can be obtained to further recognize and understand vehicular behavior. However, robust tracking is challenged by uncertain and dynamic conditions such as speed, occlusion, deformation, illumination variation, background clutter, and real-time constraints. To handle these problems, great effort has been made to devise robust tracking algorithms. In general, three key problems must be solved in tracking: (1) an effective framework to locate vehicles in motion; (2) modeling the observation of vehicles; and (3) reliable updating of vehicle models.
An ideal locating framework should be able to predict and update the motion state and observation model of an object, and even track multiple objects under various conditions. Probabilistic tracking, which estimates the posterior probability density of target states in a Bayesian framework, is a highly effective approach. Kalman Filter (KF) [1], Hidden Markov Model [2], and Particle Filter (PF) [3–7] techniques have been used in different tracking applications. PF recursively constructs the posterior probability distribution function of the state space using Monte Carlo integration. A PF-based tracking algorithm has the added advantage that any visual feature can be used for the observation model, and it can also integrate multiple visual features.
The observation model describes the similarity between a template region and a candidate region of a vehicle, and plays a critical role in PF-based visual tracking. Many visual features can be selected for vehicle observation modeling, including color [8], edge [3, 9], feature descriptors [10], color-spatial features [11], wavelets [12], etc. For instance, a color-based method is sensitive to illumination variation. An edge-based method can avoid disturbances caused by illumination variation, but it is either time-consuming or limited to a single shape model, which makes accurate real-time tracking difficult. Algorithms based on these methods have achieved good tracking performance, but relying on a single visual feature is often inadequate and unstable in complex tracking scenarios. Complementary features can be combined to derive more robust tracking results, so we employ multiple visual features within a robust tracking framework. The advantage of this kind of method is that the visual features complement one another: when one visual feature fails, the others can be used to maintain tracking. However, the difficulty is how to design a good strategy to integrate visual features reliably. The methods proposed in [13–15] are based on fixed-weight integration. If one visual feature with a fixed weight changes markedly, the observation model after integration will be unreliable. This causes the tracker to drift away from the true location, and can even cause tracking failure. Spengler and Schiele [16] proposed an adaptive integration strategy that uses an EM algorithm to adjust the weight of each visual feature online. However, this algorithm is based on matching under a local search, so tracking performance declines seriously when partial or complete occlusion occurs.
How to update a vehicle model to deal with appearance changes during tracking is very important for the robustness of an algorithm. Many algorithms assume that the appearance of an object is invariable during tracking. The appearance model of an object is usually extracted from the first frame, and then the most probable location of the object is found in the following frames. This assumption is reasonable for short-term tracking. However, for long-term tracking, appearance changes in an object are inevitable. Jepson et al. [17] proposed an adaptive texture-based model named WSL. This model consists of three components that describe object appearance changes, where W describes rapid changes in object appearance, S characterizes the stable part of an object whose appearance changes slowly, and L depicts abnormal variations in object appearance. A Gaussian Mixture Model (GMM) is constructed from these components, and the parameters of the GMM are updated online through the EM algorithm. This model is robust to changes in both illumination and shape. However, it fails when an object is occluded, even for a moment, by another object with the same visual features. The reason is that information from the occluding object is added into the model of the occluded object during updating. After occlusion, the appearance model no longer correctly reflects the object. This phenomenon is called "model drift". Toyama and Blake [18] used a fixed number of pre-learned exemplars as templates. The problem with this method is that only a fixed number of exemplars can be used to model the appearance of an object. Yang and Wu [19] introduced a closed-form solution by "discriminative training" of a generative model to alleviate model drift. They optimize a convex combination of the generative and discriminative log likelihood functions to obtain the model. Avidan [20] treated tracking as a classification problem. An ensemble of weak classifiers is combined into a strong classifier using AdaBoost. The strong classifier is then used to label pixels in the next frame as belonging either to the object or to the background, creating a confidence map. The new position of the object is found at the peak of the map by using mean shift. However, only the color of each pixel is used for classification, and the classifier needs to update background information around the object. When two objects of a similar color are very near each other, tracking will fail.
This article proposes a robust tracking approach with an adaptive integration of multiple visual features for vehicles. A color histogram and an edge orientation histogram (EOH) are selected as visual features to model the observation of the vehicle, and they are integrated by the democratic integration strategy proposed by Triesch and Malsburg [21]. This strategy suits dynamic scenes because it adaptively adjusts the weight of each visual feature according to its reliability in the current frame. However, deterministic integration is vulnerable to occlusion lasting a few frames because each iteration is initialized from the previous one. Thus, the observation model is embedded in the PF tracking framework. To improve the robustness of object representation, spatial information is incorporated into the observation model by dividing the object to be tracked into a number of fragments. We then analyze the reason for model drift during the model update process, and propose a new model updating method under a PF. To avoid errors caused by model drift, the updating process should only be carried out when it is reliable, and the rate of updating can be controlled according to this reliability. The posterior probability density function of the state vector and the similarity between the candidate and reference observations of an object are used to measure the reliability of model updating during tracking. Experimental results on real traffic surveillance video sequences show that our approach outperforms others in vehicle tracking under complex conditions.
The remainder of this article is organized as follows. The preprocessing before tracking is described in Section 2. The state model for multiple vehicles is built in Section 3. The adaptive and robust observation model is presented in Section 4. The reliable model updating strategy is introduced in Section 5. The PF-based tracking algorithm is summarized in Section 6. The experimental results are given in Section 7, and finally, the conclusion is given in Section 8.
2. Preprocessing before tracking
2.1. Background modeling
In surveillance video, the background changes with illumination, weather, and other conditions, so the surveillance video scene must be processed first. Our previous research presented a self-adaptive method for real-time background modeling with lower computational complexity and higher accuracy [22].
In the (n + 1)th frame, the gray value of point p can be described as follows:
G(n+1, p) = G(n, p) + L(n, p) + noise^{1}(n, p) \quad (1)
where G(n, p) is the gray value of pixel p in the n th frame, L(n, p) is the model describing the change of illumination over time, and noise^{1}(n, p) is zero-mean Gaussian noise. The gray value of pixel p in the input image can be described as:
I(n, p) = G(n, p) + noise^{2}(n, p) \quad (2)
where noise^{2}(n, p) is zero-mean Gaussian noise. Comparing (1) and (2) gives
I(n+1, p) = G(n, p) + ω(n+1, p) \quad (3)
where ω(n+1, p) = L(n, p) + noise^{1}(n, p) + noise^{2}(n+1, p). ω(n, p) follows a Gaussian distribution. We use a mean value to represent m(n, p) and s(n, p), respectively, and use a variable to represent ω(n, p). In traffic surveillance video, illumination and noise distribution change little in a triangular region. Therefore, m(n, p) and s(n, p) are independent of the position of pixel p. Then, a histogram can be derived from the difference between {I(n+1, p)} and {G(n, p)} in a triangular region. From this histogram, the mean value of m(n) and s(n) can be estimated by a self-adaptive filter based on a recursive least squares method.
Figure 1 gives the background in four video surveillance scenes, where the regions masked in red are not our monitoring driveways.
2.2. Detecting the ROI of a vehicle
The aim is to track vehicle targets in real time and as robustly as possible. The first step is to detect the vehicle targets automatically. The initial detection and the tracking regions are set in the field of vision, respectively, as shown in Figure 2.
Then, because of its success in detecting vehicles in real surveillance scenes, a fast constrained Delaunay triangulation (FCDT) algorithm [23] is used as follows:
(1) Extract contour information with a Canny filter.
(2) Extract lines from the image contours with a Hough transform.
(3) Obtain a set of corners at both ends of the lines.
(4) Initialize the CDT based on all constrained edges.
(5) Insert all corner points in turn to reconstruct the CDT.
(6) Extract the corner density, horizontal straight line density, vertical straight line density, triangle density, and average intensity of a vehicle region to construct the feature vector.
(7) Put the feature vector into an SVM to determine the ROI of the vehicle target.
The detection results are shown in Figure 3. The blue lines construct the Delaunay triangulation net. The red rectangle bounding box is the ROI of the vehicle.
3. State modeling for multiple vehicles
According to the characteristics of vehicle motion, we build the prediction equation of the motion state using second-order linear regression. The state model is built from the centroid and the area of a rectangular bounding box:
where C = (x, y) and s are the centroid and the area of the bounding box, respectively.
The current state S_{t} is predicted from three parts: the previous state S_{t-1}, the last state displacement S_{t-1} - S_{t-2}, and a zero-mean Gaussian stochastic component ω_{t} with covariance matrix Σ:
Hence, the model can be denoted as a Gaussian distribution as follows:
For multiple vehicles, we suppose that the vehicles are independent of each other and that there are M vehicles in a video scene. So, the model is regarded as
where S_{t}(m) is the state vector of the m th vehicle in the t th frame.
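The prediction step described above can be sketched as follows. This is a minimal illustration, assuming the state is the tuple (x, y, s) and the covariance matrix is diagonal; the function names and the per-component standard deviations `sigma` are hypothetical, not taken from the article.

```python
import random


def predict_state(prev, prev2, sigma, rng=random):
    """Second-order prediction: S_t = 2 * S_{t-1} - S_{t-2} + w_t.

    prev, prev2: (x, y, s) tuples for S_{t-1} and S_{t-2}.
    sigma: per-component standard deviations (assumes a diagonal covariance).
    """
    return tuple(
        2.0 * a - b + rng.gauss(0.0, sd)
        for a, b, sd in zip(prev, prev2, sigma)
    )


def predict_all(states_prev, states_prev2, sigma, rng=random):
    """Vehicles are treated as independent, so each one is predicted separately."""
    return [
        predict_state(a, b, sigma, rng)
        for a, b in zip(states_prev, states_prev2)
    ]
```

With all standard deviations set to zero, the prediction reduces to the expected value 2 S_{t-1} - S_{t-2} used in Algorithm 1.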
4. Adaptive integrationbased observation model
The observation models encode the visual information of a vehicle's appearance. Since a single visual feature does not work in all cases, we use a Hue-Saturation-Value (HSV) color histogram to capture the color information of a vehicle and an EOH to encode shape information. The vehicle's observation model is denoted as O = {O_{t}; t ∈ N}.
4.1. Color features
We obtain the color information of a vehicle by a two-part color histogram based on the HSV color space. We use the HSV color histogram because it decouples the intensity from Hue and Saturation, and thus it is less sensitive to illumination effects than a histogram from the RGB color space. The exploitation of the spatial layout of the color is also crucial due to the fact that different vehicles usually have different colors.
In the non-Gaussian state space, the state model S is assumed to be a hidden Markov process, with an initial distribution p(S_{0}) and a transition distribution p(S_{t} | S_{t-1}). A color histogram-based observation model \mathbf{O}_{t}^{c} is obtained through the marginal distribution p(\mathbf{O}_{t}^{c} | \mathbf{S}_{t}). Our color observation model is composed of a 2D histogram based on Hue and Saturation and a 1D histogram based on Value. Both histograms are normalized such that all bins sum to one. We assign the same number of bins to each color component, i.e., N_{h} = N_{s} = N_{v} = 10, resulting in an N = N_{h} × N_{s} + N_{v} = 110-dimensional HSV histogram.
Let R(S_{t}) be the candidate region of a vehicle at time t. The kernel density estimate of the color distribution is
where b_{t}(d) ∈ {1, ..., N} is the index of the color bin of the pixel at position d; δ[·] is the delta function; κ is a normalization factor ensuring that \sum_{n=1}^{N} k(n; \mathbf{S}_{t}) = 1; and d ranges over the pixels in the candidate region R(S_{t}). Let \mathbf{K}^{*} \triangleq \{k^{*}(n; \mathbf{S}_{0})\}_{n=1,...,N} be the reference template and \mathbf{K}(\mathbf{S}_{t}) \triangleq \{k(n; \mathbf{S}_{t})\}_{n=1,...,N} be the candidate model. The similarity measurement is defined based on the Bhattacharyya coefficient:
Therefore, the color-based observation model is denoted as follows:
where λ_{c} is a factor determined by the variance of the color Gaussian distribution. Figure 4 shows the HSV color histograms of two vehicles.
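A minimal sketch of the color observation described above, using only the Python standard library. Two points are assumptions rather than details confirmed by the text: every pixel votes once in the Hue-Saturation block and once in the Value block, and the likelihood takes the common form exp(-λ_c (1 - ρ)), where ρ is the Bhattacharyya coefficient. The function names and the default `lam` are hypothetical.

```python
import colorsys
import math

N_H = N_S = N_V = 10          # bins per channel, as in the text
N_BINS = N_H * N_S + N_V      # 110 bins in total


def _bin(value, n_bins):
    """Map a value in [0, 1] to a bin index in [0, n_bins - 1]."""
    return min(int(value * n_bins), n_bins - 1)


def hsv_histogram(rgb_pixels):
    """Return a normalized 110-bin histogram from (r, g, b) pixels in [0, 1].

    Assumption: each pixel votes once in the H-S block and once in the V block.
    """
    hist = [0.0] * N_BINS
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        hist[_bin(h, N_H) * N_S + _bin(s, N_S)] += 1.0
        hist[N_H * N_S + _bin(v, N_V)] += 1.0
    total = sum(hist)
    return [c / total for c in hist] if total > 0 else hist


def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def color_likelihood(reference, candidate, lam=20.0):
    """Assumed likelihood form exp(-lam * (1 - rho)); lam stands in for lambda_c."""
    rho = bhattacharyya(reference, candidate)
    return math.exp(-lam * (1.0 - rho))
```

A candidate region whose histogram matches the reference gives ρ = 1 and therefore the maximum likelihood of 1; the likelihood decays as the histograms diverge.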
4.2. Shape features
We apply an EOH to describe the shape information of a vehicle. To detect edges, the color image is first converted to grayscale. The gradient at pixel (x, y) in the image I can be computed with the Sobel operator masks:
where Sobel_{h} and Sobel_{v} are horizontal and vertical masks of the Sobel operator. The strength of an edge is computed as follows:
In order to suppress noise we threshold G(x, y) such that
where the value of T was suggested to be set between 80 and 110 in [24]. The orientation of the edge is
Then, the edges are divided into K bins. The value of the k th bin is denoted as
Figure 5 shows the EOHs of the vehicles in Figure 4.
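A minimal sketch of the EOH computation described above, in plain Python. It assumes that orientations are folded into [0, π) and that each edge pixel above the threshold adds its gradient strength to its orientation bin; the function name, the 6-bin default, and the threshold default of 100 (inside the 80 to 110 range quoted from [24]) are illustrative choices.

```python
import math

# 3x3 Sobel masks: SOBEL_H responds to horizontal intensity changes,
# SOBEL_V to vertical ones.
SOBEL_H = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_V = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def edge_orientation_histogram(gray, k_bins=6, threshold=100.0):
    """Accumulate thresholded gradient strength into k_bins orientation bins.

    gray: 2D list of gray values; border pixels are skipped.
    """
    rows, cols = len(gray), len(gray[0])
    hist = [0.0] * k_bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_H[dy][dx] * pixel
                    gy += SOBEL_V[dy][dx] * pixel
            strength = math.hypot(gx, gy)
            if strength < threshold:
                continue  # suppress weak responses as noise
            theta = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
            k = min(int(theta / math.pi * k_bins), k_bins - 1)
            hist[k] += strength
    return hist
```

For an image containing a single vertical step edge, all of the accumulated strength falls into the first bin, because the gradient points along the horizontal axis.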
Levi and Weiss [25] introduced three extended features based on EOH. However, direct use of these features for vehicles has some limitations. First, the values of both the ratio of edge strength of any two orientations and the dominant orientation feature have a large range, but the values of the discriminative features distribute into a relatively small scope. It cannot reflect characteristics of the majority of edges. Second, the orientations of symmetrical edges should be complementary instead of equal, because of the symmetry of two regions. Hence, we provide an enhanced feature set. These features are used to improve robustness in Section 4.3.

(1) Edge strength features in any two orientations ϕ:
\varphi_{i,j}(q^{l}) = \arctan\left(\frac{E_{i}(q^{l}) + \epsilon}{E_{j}(q^{l}) + \epsilon}\right) \quad (17)
(2) Dominant orientation features φ:
\phi_{i}(q^{l}) = \arctan\left(\frac{E_{i}(q^{l}) + \epsilon}{\sum_{j \in K} E_{j}(q^{l}) + \epsilon}\right) \quad (18)
(3) Symmetry features ζ:
\zeta_{1}(R_{1}, R_{2}) = \arctan\left(\frac{E_{i}(R_{1}) - E_{\pi(i)}(R_{2}) + \epsilon}{\sum_{j \in K}\left(E_{j}(R_{1}) + E_{j}(R_{2})\right) + \epsilon}\right) \quad (19)
\zeta_{2}(R_{1}, R_{2}) = \arctan\left(\frac{E_{i}(R_{1}) + E_{\pi(i)}(R_{2}) + \epsilon}{\sum_{j \in K}\left(E_{j}(R_{1}) + E_{j}(R_{2})\right) + \epsilon}\right) \quad (20)
where R_{1} and R_{2} are regions of the same size positioned on opposite sides of the symmetry axis, π(i) = (M_{ζ} - i) % M_{ζ}, and M_{ζ} is the number of intervals of [0, π], with M_{ζ} = 6 in the experiment.
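Equations (17) and (18) and the bin mapping π(i) can be evaluated directly from an EOH stored as a list of bin values. The sketch below is illustrative only; the function names and the value chosen for ε are assumptions.

```python
import math

EPS = 1e-6  # small constant epsilon that keeps the ratios finite (assumed value)


def edge_strength_ratio(eoh, i, j, eps=EPS):
    """Equation (17): arctan((E_i + eps) / (E_j + eps))."""
    return math.atan((eoh[i] + eps) / (eoh[j] + eps))


def dominant_orientation(eoh, i, eps=EPS):
    """Equation (18): arctan((E_i + eps) / (sum_j E_j + eps))."""
    return math.atan((eoh[i] + eps) / (sum(eoh) + eps))


def mirrored_bin(i, m=6):
    """pi(i) = (M - i) % M, the orientation bin paired with bin i."""
    return (m - i) % m
```

Because arctan maps the unbounded ratio of edge strengths into [0, π/2), the feature values stay in a fixed range however strongly one orientation dominates.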
Let \mathbf{E}^{*} \triangleq \{e^{*}(n; \mathbf{S}_{0})\}_{n=1,...,K} be the reference template and \mathbf{E}(\mathbf{S}_{t}) \triangleq \{e(n; \mathbf{S}_{t})\}_{n=1,...,K} be the candidate model of the EOH. The similarity measurement is defined as follows:
Therefore, the EOH-based observation model is denoted as follows:
where O^{e} is the observation model based on an EOH, and λ_{e} is a factor determined by the variance of the EOH distribution.
4.3. Improving robustness
Both visual features introduced above are based on histograms, while all spatial information is discarded. This may lead to false objects and local minima, and even tracking failure under occlusion. On the other hand, methods incorporating the spatial information are computationally intensive. Motivated by the approaches proposed in [26, 27], spatial information is incorporated into the observation model by dividing the vehicle to be tracked into a number of fragments.
The reference observation of a vehicle is represented by multiple fragments using multiple feature histograms {q^{l}}_{l = 1, ..., L} instead of one global histogram, where L is the number of fragments. Let the target candidate centered at position C be represented by {p^{l}(C)}_{l = 1, ..., L}, where p^{l}(C) is built in the same manner as the observation model. With this definition, we propose the similarity function as follows:
where λ^{(l)} is the importance weight of each fragment, subject to \sum_{l=1}^{L} \lambda^{(l)} = 1. The similarity function of each fragment is calculated from the similarity measurements of the different features between p^{l}(C) and q^{l}. During tracking, the fragments should contribute to different degrees because of occlusions or other kinds of appearance changes. A higher value of λ^{(l)} means that the tracking algorithm relies more on the l th fragment. Conversely, a fragment with little weight counts less toward the final tracking result. Here, we regard a fragment as more important if it is more similar to the reference fragment and, at the same time, less similar to the background:
where γ tunes the proportion of \lambda_{\mathrm{fg}}^{(l)} and \lambda_{\mathrm{bg}}^{(l)}; we set it to 0.8 in the following experiments. The background region for each fragment is the surrounding neighborhood of double the fragment's size, excluding the fragment itself. The feature histogram of the background region is extracted accordingly. To measure the similarity more properly, we use the metric proposed by Nummiaro et al. [28]:
where d^{(l)} is the distance between two feature histograms.
Many suggested methods [29–31] divide an object into multiple non-overlapping fragments. Note that both the number of fragments and their delineation affect tracking efficiency and accuracy. Although robustness increases with the number of fragments, too many fragments increase the processing time for each frame. The computation required for each frame also depends greatly on the size of each fragment, which therefore needs to be restricted as well. Further, selecting very small fragments will result in tracking drift, or in some information about the vehicle being discarded. So, a trade-off is required. We propose the use of overlapping fragments: a set of non-overlapping horizontal fragments is overlaid on a set of non-overlapping vertical fragments. Horizontal and vertical fragments are obtained by the dominant orientation features and the symmetry features introduced in Section 4.2, respectively. A fragment smaller than 8 × 8 is discarded. Satisfactory fragmentation results are shown in Figure 6.
Hence, the color- and EOH-based observations with fragmentation are denoted as follows:
4.4. Adaptive integration
We employ an adaptive integration of the multiple visual features mentioned above, i.e., democratic integration. This integration strategy changes each feature's weight adaptively, according to its reliability in the previous frame, and improves the performance robustness of the visual features.
The complete observation model is defined as
where α^{c} and α^{e} are the weights of color histogram and EOH features, respectively, and α^{c} +α^{e} = 1. The final state vector can be obtained by the maximum likelihood estimation:
In order to verify the consistency between results by integration of multiple visual features and by a single feature, a quality function {\gamma}_{t}^{f} is introduced and normalized as follows:
where f is a sign to indicate the type of feature, i.e., color or EOH. In general, the change between two adjacent frames is small, so the weight of a feature can be predicted by
where τ is a constant that determines the rate at which the weight adapts, and Δt is the time interval between two frames. From Equation (32), a feature whose current weight is less than \bar{\gamma}_{t-1}^{f} has its weight increased. That is to say, this strategy always increases the weight of a feature with high reliability and reduces the weight of a feature with low reliability.
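The weight adaptation can be sketched as a single Euler step of the democratic integration dynamics, α ← α + (Δt / τ)(γ̄ - α). This is an illustration under two assumptions: the qualities are normalized by dividing by their sum over the features, and the values of `tau` and `dt` are placeholders. The function names are hypothetical.

```python
def normalize_qualities(qualities):
    """Scale the feature qualities so they sum to one (assumed normalization)."""
    total = sum(qualities.values())
    if total <= 0:
        return {f: 1.0 / len(qualities) for f in qualities}
    return {f: q / total for f, q in qualities.items()}


def update_weights(weights, qualities, tau=5.0, dt=1.0):
    """One Euler step: alpha <- alpha + (dt / tau) * (gamma_bar - alpha).

    weights: current feature weights, summing to one.
    qualities: raw quality values measured in the previous frame.
    """
    gamma_bar = normalize_qualities(qualities)
    rate = dt / tau
    return {f: a + rate * (gamma_bar[f] - a) for f, a in weights.items()}
```

Since both the weights and the normalized qualities sum to one, the updated weights also sum to one, and a feature whose weight lies below its normalized quality gains weight.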
In fact, \gamma_{t}^{f} can be treated as feedback from the tracking result \widehat{\mathbf{S}}_{t}. The weight of each visual feature is adaptively calculated from the normalized quality function in the previous frame. To define \gamma_{t}^{f}, we employ the probabilistic distribution map in [31]: p_{f}(x_{i}, t) ∝ p_{f}(Z_{i} | M_{f, F}). Here, Z_{i} is the observation at pixel i, M_{f, F} is the foreground model of feature f, and p_{f}(Z_{i} | M_{f, F}) is the observation likelihood of pixel i given the foreground model M_{f, F} of feature f. The higher the value of a pixel in p_{f}(x_{i}, t), the more likely it is that pixel i belongs to the foreground. Hence, \gamma_{t}^{f} is defined as the ratio between the numbers of probabilistic pixels of the foreground and the background in the probabilistic distribution map:
where the background is defined as the area between the tracking box and a larger window \widehat{\mathbf{S}}'_{t} that shares the centroid of the bounding box. Sum(·,·) is the sum of the probabilistic pixels in the window W:
5. Model updating
Tracking is usually performed by searching for a location in the image that is similar to a given reference model. The observation model is updated by using the new appearance and the previous observation models O_{1}, ..., O_{t} to estimate the observation model O_{t+1} in the next frame. If the appearance of a vehicle is assumed to remain the same during tracking, the observation model in the coming frame is
This is reasonable for short-term tracking under some conditions. In reality, however, vehicles change appearance due to a variety of factors, such as turning, scale, camera angle, etc. Therefore, this assumption will eventually lead to errors, where the observation model cannot correctly represent the actual appearance of the vehicle. To obtain the latest observation model of a vehicle, a simple model updating strategy estimates the observation model in the next frame from the state vector of the tracking result in the previous frame:
where \bar{\mathbf{S}}_{t} is the state of the vehicle at time t, and p(\bar{\mathbf{S}}_{t}) is the observation estimate covered by \bar{\mathbf{S}}_{t}.
This updating strategy lets the observation model of a vehicle respond to appearance changes, but it easily leads to model drift when the vehicle is occluded by other vehicles, when tracking errors occur, or when rapid deviations from the ground truth of the vehicle observation arise during the updating process. Thus, we use an adaptive update method to maintain stability over observation changes:
where β_{t} is called the forgetting factor; it is used to minimize the impact of specific frames on the observation model and to control the speed of model updating. Some errors are inevitable during tracking. There are two kinds: errors caused by accumulation, and errors caused by object distortion. The former arise from the accumulation of small errors under frequent updating; the latter are usually fatal errors induced by maintaining the same observation model during tracking. Therefore, the key problems are when to update the model and how fast to update it.
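A minimal sketch of an adaptive update with a forgetting factor, assuming the common convex-blend form O_{t+1} = (1 - β_t) O_t + β_t p(S̄_t) applied to histogram models. The convention that β_t weights the new observation is an assumption here, and the function name is hypothetical.

```python
def update_model(reference, observed, beta):
    """Blend the reference histogram with the newly observed one.

    Assumed form: O_{t+1} = (1 - beta) * O_t + beta * observed, renormalized.
    beta = 0 keeps the old model; beta = 1 replaces it entirely.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    blended = [(1.0 - beta) * r + beta * o for r, o in zip(reference, observed)]
    total = sum(blended)
    return [b / total for b in blended] if total > 0 else blended
```

Under this convention, a small β_t protects the model against drift during occlusion, while a larger β_t lets it follow genuine appearance changes.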
In a traffic scene, the changes of tracked vehicles usually fall into two categories: change of a vehicle's scale and changes in appearance. Therefore, we define two factors, η_{1}(t) and η_{2}(t), to determine the forgetting factor β_{ t } at time t:
where k is a constant.
First, when the appearance of a vehicle is obviously changed by occlusion, illumination, etc., a significant difference appears between the reference observation and the candidate one. At this time, updating should be avoided. Thus, η_{1}(t) is defined using the similarity measurement between the candidate and the reference observation:
where ρ(·,·) is the similarity measurement between O_{t} and \bar{\mathbf{S}}_{t}. Th_{1} is empirical and is set to 0.8 in the experiments. The bounding box scale changes along the vehicle's motion trajectory. We employ the bounding box scale recursion introduced by McKenna et al. [32]:
where μ_{t+1} and \sigma_{t+1}^{2} represent the new mean and the new variance of the recursive bounding box scale, respectively, and s'_{t+1} represents the newly detected bounding box scale. The constant c controls the forgetting rate of the recursive bounding box scale of the vehicle. If c is large, the history of the bounding box scale fades out slowly. This suits a vehicle, which is a rigid object with a fixed shape, so a large c is used to keep the history of the bounding box scale. In the experiments, c is set to 0.9. Here, η_{2}(t) is defined according to the new mean and variance:
where CV(t) = σ_{t}/μ_{t} is the dispersion coefficient. Th_{2} is empirical and is set to 0.2 in the experiments.
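A minimal sketch of the recursive scale statistics and the dispersion coefficient. The recursion is assumed to be the usual exponential-forgetting form, μ_{t+1} = c μ_t + (1 - c) s'_{t+1} and σ²_{t+1} = c σ²_t + (1 - c)(s'_{t+1} - μ_{t+1})², which matches the statement that a large c makes the history fade slowly; the function names are hypothetical.

```python
import math


def update_scale_stats(mean, var, new_scale, c=0.9):
    """Assumed recursion with forgetting constant c (0.9 in the experiments)."""
    new_mean = c * mean + (1.0 - c) * new_scale
    new_var = c * var + (1.0 - c) * (new_scale - new_mean) ** 2
    return new_mean, new_var


def dispersion(mean, var):
    """Dispersion coefficient CV(t) = sigma_t / mu_t."""
    return math.sqrt(var) / mean
```

A slowly changing bounding box scale keeps CV(t) small, whereas a sudden jump in the detected scale raises it.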
6. Robust tracking under PF
According to the state and observation models, multi-vehicle tracking is performed by running multiple independent PFs, one for every vehicle in the scene. Algorithm 1 summarizes the fully automatic multi-vehicle tracking algorithm.
Algorithm 1. Robust Tracking under PF
Input: {I_{t}}_{t = 1, ..., T};
Output: \{\widehat{\mathbf{S}}_{t}(m)\}_{t=1,\dots,T; m=1,\dots,M};
1. Detect the ROI of the vehicle;
2. Divide (0, 1] into N independent intervals, where N is the number of initial particles, i.e., (0, 1] = \left(0, \frac{1}{N}\right] \cup \cdots \cup \left(\frac{N-1}{N}, 1\right];
3. For the initial particle set {S^{i}}_{i = 1, 2, ..., N}, which is independent and identically distributed, S^{i} is denoted as \mathbf{S}^{i} = U\left(\left(\frac{i-1}{N}, \frac{i}{N}\right]\right), where U((u, v]) is the uniform distribution on (u, v];
4. Fragment the vehicle according to the set of features generated by the EOH;
5. Compute the initial HSV color histogram of each fragment of the vehicle;
6. Compute the initial EOH of each fragment of the vehicle;
7. Initialize the integration weights of the color and EOH features: α^{c} = α^{e} = 0.5;
8. For t = 1, 2, ...
    For i = 1, ..., N
        Predict the state of the vehicle by Equation (5): \bar{\mathbf{S}}_{t}^{i} = E(\mathbf{S}_{t}^{i}) = 2\mathbf{S}_{t-1}^{i} - \mathbf{S}_{t-2}^{i};
        Compute the observation likelihood of color p(\mathbf{O}_{t}^{c} | \mathbf{S}_{t}^{i}) by Equation (27);
        Compute the observation likelihood of EOH p(\mathbf{O}_{t}^{e} | \mathbf{S}_{t}^{i}) by Equation (28);
        Generate the observation likelihood integrating both color and EOH p(\mathbf{O}_{t} | \mathbf{S}_{t}^{i}) by Equation (29);
        Update the importance weights: \omega_{t}^{i} = \omega_{t-1}^{i} p(\mathbf{O}_{t} | \mathbf{S}_{t}^{i});
    End For
    9. If resampling is necessary
        Obtain a new set of particles: \{\mathbf{S}_{t}^{i}, 1/N\} ~ \{\mathbf{S}_{t}^{i}, \omega_{t}^{i}\};
    End If
    10. Generate the final state vector by Equation (30);
    11. Compute the quality function \gamma_{t}^{f} of color and EOH by Equation (33), respectively;
    12. Compute the integrated weight \alpha_{t}^{f} of color and EOH by Equation (32), respectively;
    13. According to the posterior probability density distribution of the vehicle's state, compute the two factors η_{1}(t) and η_{2}(t) by Equations (39) and (42), respectively;
    14. Obtain the forgetting factor β_{t} by Equation (38) and update the vehicle's observation model by Equation (37);
End For
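The weight update and resampling steps of Algorithm 1 can be sketched as follows for a single vehicle. This is a simplified illustration, not the full algorithm: the resampling trigger (effective sample size below N/2) is an assumption, since Algorithm 1 does not state when resampling is necessary, and `likelihood` stands in for the integrated observation likelihood of Equation (29). The resampler draws one uniform sample per interval of width 1/N, in the spirit of steps 2 and 3.

```python
import random


def normalize(weights):
    """Scale weights to sum to one; fall back to uniform if all are zero."""
    total = sum(weights)
    n = len(weights)
    return [w / total for w in weights] if total > 0 else [1.0 / n] * n


def effective_sample_size(weights):
    """1 / sum(w_i^2) for normalized weights; small values signal degeneracy."""
    return 1.0 / sum(w * w for w in weights)


def stratified_resample(particles, weights, rng=random):
    """Draw one uniform sample in each interval of width 1/N and pick
    the particle whose cumulative-weight range contains it."""
    n = len(particles)
    cumulative, running = [], 0.0
    for w in weights:
        running += w
        cumulative.append(running)
    resampled, j = [], 0
    for i in range(n):
        u = (i + rng.random()) / n
        while j < n - 1 and cumulative[j] <= u:
            j += 1
        resampled.append(particles[j])
    return resampled


def pf_step(particles, weights, likelihood, rng=random):
    """Reweight particles by the observation likelihood, then resample
    if the weights have degenerated (assumed criterion: ESS < N / 2)."""
    weights = normalize([w * likelihood(p) for p, w in zip(particles, weights)])
    if effective_sample_size(weights) < len(particles) / 2.0:
        particles = stratified_resample(particles, weights, rng)
        weights = [1.0 / len(particles)] * len(particles)
    return particles, weights
```

For multiple vehicles, one such filter would be run per vehicle, since the vehicles are assumed independent.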
7. Experimental results
In this section, the proposed approach is used to track vehicles on the road. In our experiments, the dataset is composed of video sequences obtained from a real surveillance camera. The camera is fixed on a pole beside a highway and has a high-angle view of one side of a driveway. All the experiments were carried out on 640 × 480 pixel sequences with an Intel® Core™ Duo CPU T7500 2.93 GHz PC. A real-life scenario, including partial occlusion, short-term large-area occlusion, and scale variation, is considered. We verify the performance of our approach on both single and multiple vehicle tracking. In the experiments, the length of each video sequence is 100 frames, and the number of particles is set to 50.
7.1. Quantitative evaluation
We evaluate our algorithm quantitatively in order to show its robustness in tracking. The evaluation compares the position and scale estimates of our approach with the ground truth. Root mean squared error (RMSE) is used as the performance metric. The RMSEs of a vehicle's centroid and bounding box scale are defined as follows:
where (x'_{t}, y'_{t}) and s'_{t} are the ground-truth centroid and scale of a vehicle at time t, respectively, and M is the number of measured frames.
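The two error measures can be computed as below. This sketch assumes the standard RMSE definition, with the centroid error taken as the Euclidean distance between the estimated and ground-truth centroids; the function names are hypothetical.

```python
import math


def rmse_centroid(estimates, ground_truth):
    """RMSE of the centroid over M frames; inputs are lists of (x, y)."""
    m = len(estimates)
    total = sum(
        (x - gx) ** 2 + (y - gy) ** 2
        for (x, y), (gx, gy) in zip(estimates, ground_truth)
    )
    return math.sqrt(total / m)


def rmse_scale(estimates, ground_truth):
    """RMSE of the bounding box scale over M frames; inputs are lists of s."""
    m = len(estimates)
    total = sum((s - gs) ** 2 for s, gs in zip(estimates, ground_truth))
    return math.sqrt(total / m)
```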
7.2. Tracking results and discussion
For comparison, we conducted our experiments with four different types of trackers: a color-based PF tracker (Tracker 1), an EOH-based PF tracker (Tracker 2), a PF tracker based on fixed-weight multiple visual features (Tracker 3), and our approach. The first three trackers performed no adaptive updating during tracking, and in Tracker 3 the weights of the color histogram and EOH features are both fixed at 0.5.
In Figure 7, the vehicle travels along a straight roadway. The tracking results of the four trackers are shown in Figure 7a-d. From Figure 7e, f, we can see that the RMSEs of the vehicle's centroid and bounding box scale remain low for all four trackers, which provide similar tracking results. Figure 7g gives the weight curves of the different visual features. The features remain relatively stable, with no dramatic change throughout the video sequence, because there is no obvious change in illumination, translation, rotation, etc. As a result, the tracking results of Tracker 3 and our approach are very similar to each other.
Figure 8 shows the vehicle turning. Figure 8a-d presents the tracking results of the four trackers, and we can see from Figure 8e, f that the RMSE of our approach is lower than that of the other three trackers, most noticeably when the vehicle turns between frames 30 and 40. As Figure 8g shows, the EOH of the vehicle changes considerably because of translation and rotation. As the EOH weight declines, the tracker relies more on the color histogram, which is the more reliable feature here. Comparing the RMSEs of Tracker 3 and our approach in Figure 8e, f, the two trackers give more similar results for the vehicle's centroid than for the bounding box scale, which reflects the advantage of adaptively integrating multiple visual features.
The third sequence is captured at night, as shown in Figure 9. The colors of the vehicle and the background are very similar, but the edge feature is distinct because of the streetlight and the headlights of the vehicle behind. Figure 9a-d again shows the tracking results of the four trackers. From Figure 9e, f, we can see that the RMSEs of all trackers increase significantly, but our approach is still more accurate than the other methods. Furthermore, the accuracy of Tracker 3 decreases much faster than that of our approach because of its fixed feature weights. In Figure 9g, the weight curves show that the color histogram is unreliable and receives a low weight, because the color of the vehicle discriminates only weakly from the background. Instead, the EOH feature plays the dominant role.
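The weight behavior in these sequences can be illustrated with a small Python sketch of a relaxation-style update in the spirit of democratic integration, in which each feature weight moves toward that feature's normalized quality. The exact forms of Equations (32) and (33) are not reproduced here, so the update rule, the time constant tau, and the quality values below are assumptions for illustration only. The name update_feature_weights is hypothetical.

```python
from typing import Dict


def update_feature_weights(
    weights: Dict[str, float], qualities: Dict[str, float], tau: float
) -> Dict[str, float]:
    """Move each weight toward its feature's normalized quality.

    weights   -- current feature weights, assumed to sum to 1
    qualities -- non-negative quality score per feature for the current frame
    tau       -- assumed time constant (>= 1); larger values adapt more slowly
    """
    if tau < 1.0:
        raise ValueError("tau must be at least 1")
    total_quality = sum(qualities.values())
    if total_quality <= 0.0:
        # No feature is informative in this frame: keep the weights as they are.
        return dict(weights)
    normalized = {f: q / total_quality for f, q in qualities.items()}
    updated = {f: w + (normalized[f] - w) / tau for f, w in weights.items()}
    total = sum(updated.values())  # renormalize to guard against rounding drift
    return {f: w / total for f, w in updated.items()}
```

Starting from equal weights, a frame in which EOH scores a much higher quality than color (as in the night sequence) shifts weight from color toward EOH, and repeated frames of the same kind continue the shift.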
When vehicles with very similar colors appear close to each other, tracking algorithms using color, EOH, or fixed-weight multiple visual features fail. As shown in Figure 10, tracking with fixed-weight multiple visual features is slightly more precise than color-based tracking, but the deviation caused by color similarity is only delayed and cannot be prevented completely. In contrast, the adaptive integration of multiple features helps to distinguish the vehicles and yields a robust tracking result.
Partial occlusion between vehicles, and even large-area occlusion, is a key factor affecting robustness and tracking accuracy. Figures 11, 12, 13, and 14 show cases in which occlusion occurs. As illustrated in Figures 11 and 12, partial and large-area occlusion appear and last for about 50 frames, respectively. Because the proposed approach incorporates spatial information through fragmentation, it can track vehicles using their non-occluded fragments. Since some fragments are occluded, the observation model of these regions is unreliable and the forgetting factor is set to 0, so the observation model stops updating. When the occlusion ends, the proposed tracker continues tracking without interruption.
Figure 14 shows vehicles traveling under occlusion at night. As Figure 14a-c demonstrates, the first three trackers may produce inaccurate results because of the ambiguities inherent in relying on a single modality. Some objects in the background have an appearance similar to the vehicle. Therefore, soon after initialization, the color-based tracker, which starts on the vehicle, gradually drifts away from the ground truth.
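The effect of the forgetting factor during occlusion can be sketched in Python as follows. The sketch assumes that the reference model is a normalized histogram and that the update is a convex blend of the reference and the current candidate, so that a forgetting factor of 0 freezes the model. The exact forms of Equations (37) and (38) are not reproduced here, and the name update_reference_model is hypothetical.

```python
from typing import List, Sequence


def update_reference_model(
    reference: Sequence[float], candidate: Sequence[float], beta: float
) -> List[float]:
    """Blend the candidate histogram into the reference model.

    beta -- forgetting factor in [0, 1]; 0 leaves the reference unchanged,
            1 replaces it with the candidate (assumed blending form).
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if len(reference) != len(candidate):
        raise ValueError("histograms must have the same number of bins")
    blended = [(1.0 - beta) * r + beta * c for r, c in zip(reference, candidate)]
    total = sum(blended)
    if total <= 0.0:
        return list(reference)
    return [b / total for b in blended]  # keep the histogram normalized
```

With beta equal to 0, as for occluded fragments, the reference model is returned unchanged, so an occluding vehicle cannot contaminate it. A small positive beta lets the model follow gradual appearance changes when the observation is judged reliable.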
8. Conclusions
This article presents a robust tracking approach for multiple vehicles using adaptive integration of multiple visual features. Color histograms and EOHs are selected as the visual features for modeling the observation of vehicles. They are integrated by a democratic integration strategy, and the observation model is embedded in a PF tracking framework. Spatial information is incorporated into the observation model by dividing the tracked object into a number of fragments, which improves the robustness of the object representation. Further, to avoid errors caused by model drift, the model is updated only when the observation is reliable, and the rate of updating is controlled according to this reliability. The posterior probability density of the state vector and the similarity between the candidate and reference observations of an object are used to define the reliability measure that governs model updating during tracking. Experimental results on real traffic surveillance video sequences show that our approach outperforms the compared trackers in vehicle tracking under complex conditions.
Acknowledgements
This study was supported by the optional research topic from the National Natural Science Foundation of China (No. 61103094). Furthermore, it was also supported by the National High Technology Research and Development Program (863) with the research topic ID 2011AA010502.
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Sheng, H., Wei, Q., Li, C. et al. Robust multiplevehicle tracking via adaptive integration of multiple visual features. J Image Video Proc 2012, 2 (2012). https://doi.org/10.1186/1687528120122
DOI: https://doi.org/10.1186/1687528120122
Keywords
 multiple vehicle tracking
 multiple visual features
 adaptive integration
 model updating