 Research
 Open access
 Published:
Hybridizing extended ant lion optimizer with sine cosine algorithm approach for abrupt motion tracking
EURASIP Journal on Image and Video Processing volume 2020, Article number: 4 (2020)
Abstract
In view of the problem that the conventional tracker does not adapt to abrupt motion, we propose a tracking algorithm based on the hybrid extended ant lion optimizer with sine cosine algorithm (EALOSCA) in this paper. Firstly, the multiple elites is used to replace the single elite in the standard ant lion optimizer (ALO). The extended ALO (EALO) can enhance the global exploration ability, which can handle abrupt motion. Secondly, considering that sine cosine algorithm (SCA) has strong local exploitation operator, a hybrid EALOSCA tracker is proposed using the advantages of both EALO and SCA. The proposed approach can improve tracking accuracy and efficiency. Finally, extensive experimental results in both quantitative and qualitative measures prove that the proposed algorithm is very competitive compared to 7 stateoftheart trackers, especially for abrupt motion tracking.
1 Introduction
Visual tracking is a fundamental problem pertinent to many realworld applications including video surveillance, autonomous vehicle navigation, humancomputer interaction, medical imaging, and many more. Although visual tracking has been studied for decades, it remains a challenging problem due to various factors such as partial occlusion, fast or abrupt motion, illumination changes, motion blur, etc. To further improve tracking performance, many tracking methods have been proposed [1, 2]. These methods can be roughly divided into generative model [3, 4] and discriminative model [5, 6].
In recent visual tracking, significant attention has been paid to discriminative correlation filters (DCF) [7, 8] and deep learning [9, 10] based methods. Subsequently, many improved DCFbased methods have been proposed because of the high efficiency, such as scale adaptive kernel correlation filter tracker (SAMF) [11] and accurate scale estimation for robust visual tracking (DSST) [12] for scale variation, spatially regularized correlation filter (SRDCF) [13] and contextaware correlation filter tracking (CACF) [14] to reduce boundary effects, etc. However, most of these methods use manual features, which reduces their accuracy and robustness. Therefore, methods based on deep learning have attracted attention of researchers. Considering the effectiveness of DCF, many methods have been proposed to improve the robustness of the tracker by replacing the manual feature with the depth feature, such as hierarchical convolutional features (HCF) [15], deep features for SRDCF (DeepSRDCF) [16], etc. Of course, another trend of deep trackers is to design a network and pretrain it to improve tracking performance, such as convolutional residual learning for visual tracking (CREST) [17], cascaded region proposal networks (CRPN) [18], etc.
Although achieving stateoftheart performance, most of these methods assume that the tracked target has smooth motion conditions. However, the abrupt or fast motion of targets often occur in realworld scenarios, which will make the traditional methods difficult to track targets well. In order to solve these problems, many methods have been proposed, including detection based tracking methods [19, 20] and motion model based tracking methods [4, 21].
Generally speaking, the first consideration method to handle the problem of abrupt motion is that it has the strong global search ability. Therefore, I would say that it can be regarded as an optimization problem, which has attracted more and more attention of researchers [22]. For example, Gao et al. [23] introduced firefly algorithm into visual tracking, which had a strong superiority toward other trackers in accuracy and speed. Zhang et al. [24] proposed a sequential particle swarm optimization (PSO)based visual tracking, which was integrated sequential information into PSO method to form a robust tracking framework. This method was more robust and effective in arbitrary motion or large appearance changes. Nguyen et al. [25] presented a modified bacterial foraging optimization algorithm, which was introduced into realtime tracking with changes. The method had a good merit compared with PSO and particle filter in speed and accuracy. These proposed methods have some advantages in visual tracking. However, each optimization algorithm has its limitations, there is no single optimization algorithm that can solve all the problems perfectly according to no free lunch (NFL) theorem [26].
As we all know, exploration and exploitation are two important components of any metaheuristic algorithm. The purpose of global exploration is to find the best desired region in a wider search area. Local exploitation is to find a better solution by further searching some areas of the group, based on the prior knowledge or the new information found during the search process. Therefore, an algorithm can improve convergence speed and accuracy by properly balancing exploration and exploitation performance. Hybridizing two algorithm is the latest research trend for solving global optimization to overcome the poor exploration ability of one algorithm and poor exploitation ability of the other algorithm. There are many hybrid metaheuristic algorithms such as hybrid gray wolf optimizer and genetic algorithm [27], krill herddifferential evolution [28], hybrid bat algorithm with harmony search [29], hybrid gravitational search algorithm with PSO [30], etc. Of course, many researchers have introduced hybrid optimization algorithms to visual tracking. Chen et al. [31] proposed a euclid distance based hybrid quantum PSO, which overcame the problem that population diversity reduced during the latter period of evolution in PSO. It improved tracking efficiency and decreased detection time cost. Hao et al. [32] proposed a particle filtering algorithm based on ant colony optimization, which enhanced the performance of particle filter with a small sample set. The method improved the efficiency of visual tracking. Ljouad et al. [33] presented the hybrid Kalman cuckoo search tracker using a modified cuckoo search algorithm combined with the Kalman filter. The proposed algorithm had better performance than PSObased tracker. These hybrid optimization algorithms have improved tracking performance of the original optimization algorithm. In addition, in recent years, we have focused a lot of attention on the problem of abrupt motion tracking. Zhang et al. [34, 35] proposed extended kernel correlation filter (KCF) tracker based on swarm intelligence and extended cuckoo searchbased KCF tracker. A unified framework was designed to capture abrupt motion. The performance of the hybrid approach is better than that of the standard algorithm in these literatures.
In recent years, Mirjalili [36, 37] presented two novel natureinspired algorithms, ALO and SCA, respectively. These two optimization algorithms had been applied in lots of fields successfully [38, 39]. Subsequently, some improved methods were proposed [40–43]. Especially, Mirjalili presented that the SCA could be hybridized with other algorithms in the field of stochastic optimization to improve its performance in[37], and lots of hybrid methods had been applied [44–46]. Thus, a hybrid EALOSCA is proposed in order to solve abrupt motion tracking in the paper (see Fig. 1). The approach leverages the complementary properties between exploratory stage of EALO and the exploitation operator of SCA to improve tracking performance.
The main contribution of our work includes three folds:
EALO, an optimization algorithm called EALO is proposed to enhance the global search ability, thus increasing the diversity of candidate samples to handle abrupt motion.
EALOSCA, a hybrid EALOSCA is proposed using global exploration of EALO and local exploitation of SCA, which can properly balance exploration and exploitation
A unified tracking framework is designed based on EALOSCA, which can improve tracking performance.
2 The basic sine cosine algorithm and ant lion optimizer
2.1 Sine cosine algorithm (SCA)
SCA is a populationbased optimization algorithm that depends on sine and cosine operators, differing from other optimization algorithms [47], for updating the movement of the search agents toward the best solution [37]. Each search agent in the initial population updates its position with respect to the best solution using the Eq. (1).
where \(X_{i}^{t}\) indicates the position of the ith individual at the tth iteration, \({X^ * }_{i}^{t}\) is the position of the best individual obtained so far in ith individual, r_{4} is a random number in [0,1]. r_{1}, r_{2}, and r_{3} are random numbers. r_{4} is a random number in [0,1], and  indicates the absolute value. Basic principle of SCA is represented in Fig. 2.
Figure 2 shows that the current solution will be away from the destination and explore the search space when the individual moves in [−2,−1) or (1,2]. However, the current solution will be toward the destination and exploit the search space when the individual moves in [−1,1].
Exploration means that particles need to be searched in a wider area. Exploitation is that particles are searched locally in small areas. Therefore, the exploration and exploitation phases seek conflicting objects. For SCA, the range of sine and cosine is decreased during optimization using Eq. (2) to balance exploration and exploitation.
where a is a constant, t is the current iteration, and T is the maximum number of iterations. Figure 3 shows how this equation decreases the range of sine and cosine functions with the increase of iterations.
It can be seen from Fig. 3 that the range of the search space gradually decreases and the exploitation becomes more and more important with the increase of iterations. In addition, the range of Fig. 3b is smaller than Fig. 3a. This phenomenon shows that a=2 is easier to enter the exploitation phase than a=3, but the ability to explore is weakened. That is to say, the parameter a leads to balanced exploration and exploitation of the search space. In this study, we set the parameter model to be a=2. In a word, the basic SCA utilizes Eq. (1) as a special path to converge to global optimum. Therefore, the basic SCA performs well in exploitation operator.
2.2 The ant lion optimizer (ALO)
The ALO algorithm mimics the hunting mechanism of antlions in nature [36]. This behavior can be summarized into five main steps: the random walk of ants, building traps, entrapment of ants in traps, catching prey, and rebuilding traps. The following subsections discuss the mathematical model in details.
2.2.1 The random walk of ants
This random walk is chosen for modeling ants’ movement as follows:
where cumsum calculates the cumulative sum, T shows the maximum number of iteration, m is the step of random walk (iteration in this study), and r(m) is a random number 0 or 1. In order to ensure that the ants are walking within the search space, they are normalized using the following equation:
where a_{i} is the minimum of random walk of ith variable, b_{i} shows the maximum of random walk in ith variable, \(c_{i}^{t}\) indicates the minimum of ith variable at tth iteration, and \(d_{i}^{t}\) is the maximum of ith variable at tth iteration.
2.2.2 Building traps
Random walks of ants are affected by antlions’ traps using mathematic model:
where c^{t} is the minimum of all variables at tth iteration, d^{t} is the vector including the maximum of all variables at tth iteration, \(\text {Antlion}_{j}^{t}\) shows the position of the selected jth antlion at tth iteration. In this study, the best antlion obtained so far in each iteration is saved and considered as an elite. Since the elite is the fittest antlion, it should be able to affect the movements of all the ants during iterations. Therefore, the ants’ position update can be expressed as:
where \(R_{A}^{t}\) is the random walk around the antlion selected by the roulette wheel at tth iteration, \(R_{E}^{t}\) is the random walk around the elite at tth iteration, and \(\text {Ant}_{i}^{t}\) indicates the position of ith ant at tth iteration.
2.2.3 Entrapment of ants in traps
When an ant fell into a trap, antlions shoot sands outwards the center of the pit, so ants gradually moved to the antlion. Finally, antlions preyed on ants. To simulate this behavior, c^{t}=c^{t}/I and d^{t}=d^{t}/I are proposed. Parameter I controls the tradeoff between exploration and exploitation in the ALO. I is a ratio (\(I={{10}^{w}}\frac {t}{T}\)). w is described as follows:
where t is the current iteration, T is the maximum number of iterations. w gradually increases, then I gradually increases, but c^{t} and d^{t} gradually decrease.
2.2.4 Catching preys and rebuilding traps
The final stage of hunt is when an ant reaches the bottom of the pit and is caught in the antlion’s jaw. An antlion is then required to update its position to the latest position of the hunted ant to enhance its chance of catching new prey. The following equation is proposed in this regard:
where \(\text {Antlion}_{j}^{t}\) shows the position of selected jth antlion at tth iteration, and \(\text {Ant}_{i}^{t}\) indicates the position of ith ant at tth iteration. The antlion will update its position where the ant is last caught and rebuild the trap to hunt the next ant.
3 The extended ant lion optimizer method
In the section, the motivation and implementation of EALO algorithm are explained in detail.
3.1 Motivation of EALO
In the original ALO algorithm, the updated position of ant for the next generation is given by Eq. (6). However, it is observed that the roulette wheel selection method is more likely to be selected \(R_{E}^{t}\). That is, ants tend to move around single elite. In addition, the information of single elite is extremely limited, which will lead to local trapping problem. Therefore, it is necessary to establish elite library for storing better individual (antlion).
3.2 Implementation of EALO
Based on the above problems, we introduce multiple elites, and they are designed as an elite library (as shown in Eq. (10)). During the early search, multiple elites parallel competition can not only enhance exploration ability of the original ALO algorithm, but also ensure the convergence speed of the algorithm. However, n becomes smaller during the later search, and it can reduce the unnecessary movement of ants. In conclusion, EALO enhances the optimization capability, meanwhile, the efficiency of the original ALO has improved. Elite library is introduced into Eq. (6)in order to achieve these effects and the resultant modified equation is written as in Eq. (9).
where n is the number of elites, t is the current iteration, T is the maximum number of iteration, \(R_{En}^{t}\) is the random walk around the nth elite at tth iteration. Elite library is described as follows:
where f is fitness function, N is the number of antlions, → is the former n antlions as the elite library of the next iteration.
Equations (9), (10), and (11) are used instead of Eq. (6) in the original ALO to form a new optimization algorithm, which we call EALO. This algorithm enhances the global optimization ability through parallel cooperation of multiple elites and avoids trapping in local optima. The pseudocode of the EALO is as shown in Algorithm 1.
The main difference between ALO and EALO is how the ants’ location is updated. In addition, the EALO contains the concept of elite library compared with ALO.
3.3 Performance evaluation of EALO
In order to verify the feasibility of our proposed method, we selected the #0043 frame of the DEER video sequence, as shown in Fig. 4, which had two similar targets. Further, to make a fair comparison, the same target model (histogram of oriented gradient, HOG), motion model (random walk model), and parameters were used. As aforementioned, there are three main parameters, namely Np (population size), T (the number of iterations), and I (the shrinkage factor, the value of I heavily depends on w), respectively. In this study, n=75; T=200; t>0.5T,w=1;t>0.7T,w=2;t>0.9T, and w=2.7.
Figure 4 shows the searching process using adaptive boundary shrinking mechanism. Firstly, at the beginning of the first layer, we guarantee that original ALO and EALO have the same initial position of search agent. Secondly, in the second layer, all search agents in the original ALO algorithm always randomly walk around the false DEER. In contrast, EALO has several different search areas since it has elite library. Thirdly, in the third and fourth levels, the original ALO algorithm always searches around the false DEER and it is difficult to jump out of the local optimum because of single elite. However, the EALO can track the target successfully. And as shown by Eq. (10), the number of elites decreases with the increase of iterations. Therefore, in the later stages of the search, EALO can reduce unnecessary search space and ensure convergence speed. Moreover, we repeat this experiment 60 times. The original ALO is successful 47 times, and the success rate is 78.3%. The EALO is successful 60 times, and the success rate is 100%. Considering the above, the proposed EALO has some advantages in this paper.
4 The EALOSCA tracking method
In this paper, in order to enhance the exploitation performance of EALO, a hybrid EALOSCA is proposed, which combines the better exploratory performance of EALO with the better exploitation performance of SCA. Visual tracking is considered to be a process of searching for target by various ants and antlions in sequential images. To this end, an EALOSCAbased tracking framework is presented as shown in Fig. 5.
As shown in Fig. 5, the target is first chosen by the user in the first frame, and the state vector X=[x,y,s] is initialized, where [x,y] denotes the location of the target pixel coordinates, and s is the scale parameter. Secondly, according to a dynamic model, the predicted position of the target is given in the next frame. Then, the features of candidate sample generated by EALOSCA and template are extracted. Finally, the target candidate with the largest similarity value is found by similarity function, which is the target. In this work, our main contribution is how to switch thresholds between EALO and SCA to generate candidate sample images.
4.1 Basic idea
Although EALO enhances significantly in the exploratory phase and avoids falling into local optimum, the method suffers from a lack of an efficient exploitation operator due to random walk rather than specific paths. Conversely, the SCA has an efficient exploitation operator and suffers from a lack of an exploration phase, and SCA adopts a specific path to the target solution (the best solution so far). Therefore, a novel hybrid EALOSCA is proposed, which can balance exploration and exploitation. In hybrid EALOSCA, EALO algorithm is utilized for global search, which makes most solution move toward a more promising area. After exploration phase, SCA is utilized to search locally with a small step to get the best solution.
The first advantage of the hybrid algorithm is to introduce the updated formula of SCA into EALO, which will force all solutions to update their positions faster toward the target solution. This helps the EALO to enhance exploitation operator. The second advantage of the hybrid algorithm is that it can enhance the performance of SCA in the exploratory phase. Because the solution in SCA is limited to [−2,2] and cannot move further away from the destination.
Considering the above, EALOSCA has obvious advantages over the EALO and SCA.
4.2 The proposed tracking method
Suppose there is target in the image being searched. And a group of target candidates are randomly generated in the image. The aim of EALOSCA tracker is to find the “best” candidate using the proposed hybrid algorithm. In the EALOSCA tracker, ants and antlions can be considered as search candidates and storage candidates, respectively. They are both target candidates. Search candidates and storage candidates are the locations generated by ants and antlions using the EALOSCA. In addition, ants and antlions are mixed, and the individuals with high fitness value are selected as antlions (storage candidates).
4.2.1 The fitness function
The fitness function measures the values of correlation between the target and the target candidate. In addition, the HOG is the structural feature of the extracted edge with well geometric and optical invariance. Therefore, their similarity is computed as:
where D(·) denotes the variance and Cov(·) denotes covariance. X and Y are the HOG feature of the target and candidate sample respectively. The fitness function is defined as follows:
where ρ(X,Y) indicates the value of the correlation coefficient, and the function E(s) to be minimized is analogous to the internal energy of the system. Finally, the optimal value points to the best target candidate as the target image.
4.2.2 The global motion model
Recently, tracking algorithms, using motion models to capture uncertain motion, have attracted much attention. In traditional visual tracking, a random walk model, a nearly constant velocity model, and a Gaussian distribution model are commonly selected. Obviously, the three models cannot solve abrupt motion tracking effectively. Therefore, we design a global motion model to estimate the position of candidate image samples.
where x=[x,y] is vector and it denotes a pixel location, K is the model parameter and is obtained as follows:
In the next iteration, the model selection depends on the parameter K. R is a random number in [0,1]. P shows the threshold and its analysis is detailed in section 4.3. Equations (14) and (15) are used to design a global motion model, which combines the effective exploratory stage of EALO with the effective exploitation operator of SCA. The purpose of global motion model is to solve abrupt motion tracking.
4.2.3 The tracking implement
In the EALOSCA tracker, target candidates are firstly generated by motion model of EALOSCA. Then, the features of template and target candidate are extracted, and the best target candidate is found by similarity measurement. Finally, the best target candidate is used as the output of the current frame and the template of the next frame. In addition, EALOSCA tracker makes use of EALO’s better global search ability and SCA’s better local search ability to improve tracking accuracy and efficiency. The pseudocode of proposed hybrid EALOSCA tracker is shown in Algorithm 2.
4.3 Parameters’ sensitivity and adjustment
Parameter tuning is often a key aspect in optimization algorithms. The speed and accuracy should be considered simultaneously during the parameter tuning. In hybrid EALOSCA tracker, there are four main parameters, namely Np (the numbers of ants and antlions), I (the shrinkage factor), T (the numbers of iteration), and P (threshold), respectively. In this study, we discuss two parameters Np and P; then, the parameter I and T are designed to be t>0.5T,w=1;t>0.7T,w=2;t>0.9T,w=2.7 and 500. We choose BOY video sequence to test tracking performance because the target undergoes fast motion and motion blur caused by camera shaking. In this video sequence, their maximum displacement between frames reaches 21 pixels. The BOY video sequence is available on the website http://www.visualtracking.net.
We first analyze the population size Np. Other parameters are fixed. We test the performance of the EALOSCA tracker by calculating the number of lost frames, the number of frames where the distance precision between its upper left corner and the groundtruth is more than 17, and costing time. The result is shown in Fig. 6.
Figure 6 shows that when the population size Np<150, the number of lost frames relatively decreases along with the increase of Np. In other words, the accuracy is not achieved. However, when the population size Np>150, a large amount of time is wasted. That is to say, the tracking efficiency is reduced. Therefore, comprehensive considering the tracking accuracy and speed, we set the population size of the parameter model to be Np=150.
Another parameter is the threshold P. The proper threshold P makes an appropriate tradeoff between exploration and exploitation, which is key factor for a good tracking result. The Euclidean distance between the groundtruth and estimated by the different thresholds is calculated in each frame visually, and the comparisons results are shown in Fig. 7. It can be seen from Fig. 7 that the EALOSCA performs well in the BOY video sequence at threshold P=0.5. It can track the target successfully. However, other parameters make the method fail to track the whole BOY video sequence successfully.
In addition, we show partial sequences of BOY video to visualize the results as shown in Fig. 8. All the selected parameters P can track by the tracker except for P=0.3 at frame #0085. Only the selected parameters P=0.5 and P=0.1 can track the target at frame #0115, and P=0.5 completes the whole video sequence successfully. Thus, we set the threshold of the parameter model to be P=0.5.
5 Experimental results and discussion
We implemented the proposed tracker in MATLAB R2014a. The experiments were conducted on a PC with Intel Core i57500 3.40GHz and 8GB RAM. In order to verify the effectiveness of EALOSCA tracker, the experiment is divided into three parts:
Comparison of tracking performance among EALOSCA, SCA, ALO, and EALO
Comparison of the stateoftheart trackers
Discussion
In these experiment, the parameters of EALOSCA are set as follows: the threshold (P=0.5); EALO, the shrinkage factor (t>0.5T,w=1;t>0.7T,w=2;t>0.9T,w=2.7; t is the current number of iteration); and SCA, the constant (a=2), r_{2}∈[0,2π], r_{3}∈[0,2] and r_{4}∈[0,1]. The parameters of SCA, ALO, and EALO are consistent with EALOSCA.
5.1 Comparison of tracking performance among EALOSCA, SCA, ALO, and EALO
Figure 9 shows the relation between the convergence accuracy and the parameters setup, including the population size and the iteration number. The Xaxis represents the population size, the Yaxis shows the iteration number, and the Zaxis denotes the similarity value.
It can be clearly seen from Fig. 9 that SCAbased tracker has the best result when Np=50 and T=60, and its fitness value is 0.9961. It still shows poor performance because SCA does not have good global exploratory performance. The result of EALObased tracker is better than that ALObased tracker because EALO has the ability to jump out of local optimum. In addition, the tracker based on EALOSCA shows the best performance and requires less population size and iteration number to achieve stable tracking results than the other three optimization algorithms. EALOSCA tracker combines EALO’s global exploration with SCA’s local exploitation, which makes an appropriate tradeoff between exploration and exploitation for improving tracking performance.
5.2 Comparison of the stateoftheart trackers
To prove the feasibility of the proposed tracker, we selected 12 video sequences in the experiment. The source of the FACE1 is the dataset AVSS2007. ZT, FHC, and ZXJ are our own. Other sequences are available on the website http://www.visualtracking.net (listed in Table 1). Note that we extracted the frames 306310 in BLURFACE video sequence, which could represent the problem of frame dropping. Thus, it can provide scenes for abrupt motion. The population size and iteration number are set to 150 and 500, respectively, in EALOSCA tracker.
We compared our tracker EALOSCA with 7 stateoftheart trackers(including DSST [12], fast compressive tracking (FCT) [48], kernelized correlation filters (KCF) [49], exploiting the circulant structure of krackingbydetection with kernels (CSK) [50], fast tracking via spatiotemporal context learning (STC) [51], least softthreshold squares tracking(LSST) [52], and CACF [14]).In order to ensure the consistency of the experiment, the parameters of our tracker are consistent in the experiment. The experimental data of other trackers come from the experimental results executed by source code. In this experiment, we divided the 12 video sequences into 3 groups based on the target’s displacement between image frames, including the slight motion group, the middle motion group, and the large motion group. The displacement of slight motion group is less than 35 pixels, including the MHYANG, FISH, BOY, and HUMAN7 video sequences. The middle motion group contains the JUMPING, DEER, FACE1, and ZXJ video sequences, whose motion displacement is more than 35 pixels and less than 75 pixels. The displacement of large motion group is more than 75 pixels, the video sequences are BLURBODY, FHC, BLURFACE, and ZT, respectively.
Additionally, to evaluate the accuracy of the trackers, the videos were manually labeled by identifying the upper left corner of the tracked target in each frame visually. The tracking results were evaluated by using distance precision (DP), center location error (CLE), and overlap precision (OP) in [53]. DP is the relative number of frames in the sequence where the center location error is smaller than a certain threshold. CLE is computed as the average Euclidean distance between the groundtruth and tracking result. OP is defined as the percentage of frames where the bounding box overlap exceeds a threshold t∈(0,1).
5.2.1 Qualitative analysis
5.2.1.1 The slight motion group
The tracking results with the slight motion are shown in Fig. 10. MHYANG video sequence has smallest motion at 7 pixels, it has the larger illumination changing at frame #0378. FCT has slightly worse tracking results, while others have similar performance. In FISH video sequence, it is obvious that there is camera shake at frame #0058 and #0312, and the brightness is dimmed at frame #0178. Nearly all trackers can catch up with the target successfully except for CSK. Our tracker and KCF perform best. BOY video sequence has severely blurred at frame #0330 because of camera shaking. At first, LSST is failure at frame #0330, #0508, and #0595, respectively. Then, STC is failure at frame #0508 and #0595, respectively, and CSK deviates the target because of serious blur. At the end, CSK is failure at frame #0595, and FCT slightly drifted away from the target. Other trackers complete the whole video sequence. For HUMAN7 video sequence, it is covered by tree shadows several times at frame #0090, #0168, and #0208. All trackers are failure except for CACF and our tracker. Although the target is shaded under the tree, our tracker still can track the target and get the better performance. All in all, our tracker can keep the better performance with the slight motion group.
5.2.1.2 The middle motion group
In order to verify the excellent performance of our tracker, we continue to enhance the motion displacement. The tracking results with the middle motion are shown in Fig. 11. It is obvious that JUMPING video sequence has been seriously blurred because of the camera shaking. In this case, CSK, DSST, FCT, KCF, and STC lose the target unfortunately before the frame #0096. Although our method drifts the object slightly, it recovers quickly. And our tracker has a better performance. In DEER video sequence, the target undergoes fast motion and multiple similar targets. DSST, KCF, STC, and CACF have been lost target at frame #0032. Although KCF and CACF recover on several frames, they still cannot complete the whole video sequence. However, FCT, CSK, and our tracker complete the whole video sequence, and CSK and our tracker obtain the better tracking results. FACE1 video sequence displays that the scale changes occur in video frames. Obviously, CSK and LSST have lost target at frame #0210. Although other methods work well, DSST shows the best performance. In ZXJ video sequence, the target experiences abrupt motion. All tracks can track the target before at frame #0042, but all tracks deviate the target besides CACF and our tracker at frame #0069. In addition, LSST deviates the target because of abrupt motion, but it can recover tracking. Our tracker obtains the best performance. On the whole, for the middle motion group, our tracker achieves the better track performance compared with other track methods.
5.2.1.3 The large motion group
We continue to enhance the motion displacement. For the large motion group, we choose BLURBODY, FHC, BLURFACE, and ZT video sequences, and their maximum displacements are 76, 188, 202, and 256 pixels, respectively. The tracking results with the large motion are shown in Fig. 12. In BLURBODY video sequence, the target goes through a large displacement, and there was a very severe blur because of the camera shaking. Firstly, LSST is failure at frame #0100. Then, CSK and STC are also failure at frame #0212. Finally, all tracks lose the target at frames #0271 except for CACF and our tracker. Our tracker obtains the best performance. For FHC video sequence, the maximum displacements are 188 pixels. Our tracker performs much better than other trackers, while other trackers all fail at frame #0073 except for CACF and our tracker. For BLURFACE video sequence, we design the problem of the frame dropping. And there has a severe motion blur because of fast motion at the frames #0153, #0241, and #0310. Other trackers are failure, but our tracker still can track the target successfully. ZT video sequence has the largest motion at 256 pixels, all tracks deviate or lose the target except for CACF and our tracker at frame #0046. Although some tracking methods recover at frame #0105, they still cannot complete the whole video sequence. Therefore, our tracker and CACF obtain the better performance. In a word, our tracker has a strong superiority toward other trackers for the larger motion group.
5.2.2 Quantitative analysis
Figures 13 and 14 show the OP and DP of 12 different video sequences respectively. For Fig. 13, the Xaxis represents the threshold, and the Yaxis represents the ratio between the number of frames, overlap is greater than the threshold, and the total number of frames. The larger the area under the curve, the better the tracker. For Fig. 14, the Yaxis also shows the ratio between the number of frames, distance of predicted and ground truth bounding box is below the threshold, and the total number of frames. The higher the slope, the better the tracking performance, this is because the center distance of more sequences is lower than the threshold. Figures 13 and 14 can vividly show the performance of different tracking methods, including DSST, FCT, KCF, CSK, STC, LSST, CACF and our tracker.
In addition, in order to clearly observe the tracking results of different trackers, we utilize data to display the tracking results. Tables 2 and 3 list a persequence comparison of our tracker to DSST, FCT, KCF, CSK, STC, LSST, and CACF, while Table 2 is concerned with average overlap rate and Table 3 refers to average center error rate. For the average overlap rate, the larger the value, the higher the tracking accuracy; however, for the center error rate, the smaller the value, the higher the tracking accuracy. In the tables, the best and second best results are shown in bold and italic. As can be seen from Tables 2 and 3, our tracker has a strong advantage in large displacement video sequences. At the same time, our tracker performs best on average in all videos.
As a result, it is obviously seen in Figs. 13, 14, Tables 2 and 3 that our tracker performs much better than 7 other trackers especially for large displacement video sequences. In other video sequences, our tracker also shows a better performance. As a whole, the proposed tracker has a good merit for abrupt motion tracking compared with other trackers.
5.3 Discussion
All the aforementioned experiments have validated the proposed tracker in smooth and abrupt motions. For these 12 video sequences, ZXJ, FHC, and ZT video sequences mainly go through abrupt motion. FACE1 video sequence faces scale variation, motion blur, and abrupt motion. The challenges of other video sequences can be seen in [53, 54].
FCT employs nonadaptive random projections, and through training compressive samples the classifier is update in time. It can handle pose variation well (e.g. MHYANG video sequence). However, it adopts sampling Haarlike features to represent the object appearance; when target experiences motion blur and the occlusion, the method loses or drifts the target frequently (e.g., HUMAN7, JUMPING, BLURBODY, and BLURFACE video sequences). In addition, FCT obtains samples nearby the target position in the previous frame. Therefore, when the target appears large displacement motion, the sampling radius could not cover the uncertainty motion so that it is difficult to maintain tracking (e.g., ZXJ, FHC, and ZT video sequences).
CSK uses a density sampling strategy to process multiple subwindows in a frame and classify them into a cyclic matrix. STC and CACF use context information to design a tracking framework so that they can handle background clutters very well (e.g., MHYANG video sequence). The main idea of the former is to model the spatiotemporal relationships between the object of interest and its locally dense contexts in a Bayesian framework. The latter presents a framework that allows the explicit incorporation of global context within correlation filter trackers. Therefore, CACF performs well in the above video sequences.
DSST and LSST show the relationship between all the candidates and the targets in different forms. They perform well in the smooth motion (e.g., MHYANG and FISH video sequences). However, when the tracking object encounters motion blur (e.g., HUMAN7 video sequence), or abrupt motion (e.g., BLURBODY, FHC, BLURFACE, and ZT video sequences), they perform the bad tracking results. The main difference between KCF and DSST is that DSST has scale variation. Therefore, the tracking result of DSST is generally better than that of KCF. However, they are hard to cover the uncertain motion states, which performs poor (e.g., ZXJ, BLURBODY, FHC, BLURFACE, and ZT video sequences).
Our proposed method shows the better performance in the smooth motion and abrupt motion since the method benefits the mechanism by using EALO and SCA strategies to produce samples. As shown in Tables 2 and 3, our tracker can maintain good performance in smooth motion (e.g., MHYANG, FISH, BOY, and HUMAN7 video sequences). Further, it exhibits the best tracking results in abrupt motion (e.g., ZXJ, BLURBODY, FHC, and BLURFACE video sequences). Meanwhile, the method also can cope with illumination variation and motion blur well.
6 Conclusion
A hybrid optimization called EALOSCA is proposed in this paper, for solving abrupt motion tracking. Firstly, elite library is introduced into standard ALO, which enhances the global optimization capability to adapt to abrupt motion. Secondly, EALO enhances exploitation operator by hybridizing with SCA to improve tracking accuracy. Finally, a model switching method is designed and a unified tracking framework is constructed, which makes an appropriate tradeoff between exploration and exploitation for improving tracking performance. The experimental results show that the algorithm has better tracking performance, especially for abrupt motion.
The contribution of this paper attempts to utilize swarm optimization method to handle abrupt motion. However, the method obtains a large number of iterations to ensure tracking accuracy, this will lead to a large amount of time consumption.
In the future, we will design a unified tracking framework based on stateoftheart trackers and swarm optimization methods, so that the realtime tracking can be achieved by using swarm optimization when only the target undergoes large displacement between image frames.
Availability of data and materials
Please contact author for data requests.
Abbreviations
 ALO:

Ant lion optimizer
 CRPN:

Cascaded region proposal networks
 CACF:

Contextaware correlation filter tracking
 CLE:

Center location error
 CREST:

Convolutional residual learning for visual tracking
 CSK:

Exploiting the circulant structure of trackingbydetection with kernels
 DeepSRDCF:

Deep features for SRDCF
 DCF:

Discriminative correlation filters
 DP:

Distance precision
 DSST:

Accurate scale estimation for robust visual tracking
 EALOSCA:

Extended ant lion optimizer with sine cosine algorithm
 EALO:

Extended ant lion optimizer
 FCT:

Fast compressive tracking
 HCF:

Hierarchical convolutional features
 HOG:

Histogram of oriented gradient
 KCF:

Kernel correlation filter
 LSST:

Least softthreshold squares tracking
 NFL:

No free lunch
 OP:

Overlap precision
 PSO:

Particle swarm optimization
 SAMF:

Scale adaptive kernel correlation filter tracker
 SCA:

Sine cosine algorithm
 SRDCF:

Spatially regularized correlation filter
 STC:

Spatiotemporal context learning
References
T. Zhou, H. Bhaskar, F. Liu, et al., Graph regularized and localityconstrained coding for robust visual tracking. IEEE Trans. Circ. Syst. Video Technol.27(10), 2153–2164 (2017).
T. Zhou, F. Liu, H. Bhaskar, et al., Robust visual tracking via online discriminative and lowrank dictionary learning. IEEE Trans. Cybern.48(9), 2643–2655 (2018).
T. Zhang, B. Ghanem, S. Liu, et al., Robust visual tracking via exclusive context modeling. IEEE Trans. Cybern.46(1), 51–63 (2016).
H. Zhang, S. Hu, X. Zhang, Sift flow for largedisplacement object tracking. Appl. Opt.53(27), 6194–6205 (2014).
H. Zhang, Y. Wang, L. Luo, et al., Sift flow for abrupt motion tracking via adaptive samples selection with sparse representation. Neurocomputing. 249(2), 253–265 (2017).
S. Hare, S. Golodetz, A. Saffari, et al., Struck: Structured output tracking with kernels. IEEE Trans. Pattern. Anal. Mach. Intell.38(10), 2096–2109 (2016).
M. Danelljan, A. Robinson, F. S. Khan, et al., in Computer Vision – ECCV 2016. Beyond correlation filters: Learning continuous convolution operators for visual tracking (Springer International Publishing, 2016), pp. 472–488. https://doi.org/10.1007/9783319464541_29.
M. Danelljan, G. Bhat, F. S. Khan, et al., in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Eco: Efficient convolution operators for tracking (IEEE, 2017). https://doi.org/10.1109/cvpr.2017.733.
D. Held, S. Thrun, S. Savarese, in Computer Vision – ECCV 2016. Learning to track at 100 fps with deep regression networks (Springer International Publishing, 2016), pp. 749–765. https://doi.org/10.1007/9783319464480_45.
J. Valmadre, L. Bertinetto, J. F. Henriques, et al., in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Endtoend representation learning for correlation filter based tracking (IEEE, 2017). https://doi.org/10.1109/cvpr.2017.531.
Y. Li, J. Zhu, in European Conference on Computer Vision (ECCV). A scale adaptive kernel correlation filter tracker with feature integration, (2014), pp. 254–265. https://doi.org/10.1007/9783319161815_18.
M. Danelljan, G. Häger, F. S. Khan, et al., in Proceedings of the British Machine Vision Conference 2014. Accurate scale estimation for robust visual tracking (British Machine Vision Association, 2014). https://doi.org/10.5244/c.28.65.
M. Danelljan, G. Hager, F. Shahbaz Khan, et al., in 2015 IEEE International Conference on Computer Vision (ICCV). Learning spatially regularized correlation filters for visual tracking (IEEE, 2015). https://doi.org/10.1109/iccv.2015.490.
M. Mueller, N. Smith, B. Ghanem, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Contextaware correlation filter tracking (IEEE, 2017). https://doi.org/10.1109/cvpr.2017.152.
C. Ma, J. B. Huang, X. Yang, et al., in 2015 IEEE International Conference on Computer Vision (ICCV). Hierarchical convolutional features for visual tracking (IEEE, 2015). https://doi.org/10.1109/iccv.2015.352.
M. Danelljan, G. Hager, F. Shahbaz Khan, et al., in 2015 IEEE International Conference on Computer Vision Workshop (ICCVW). Convolutional features for correlation filter based visual tracking (IEEE, 2015). https://doi.org/10.1109/iccvw.2015.84.
Y. Song, C. Ma, L. Gong, et al., in 2017 IEEE International Conference on Computer Vision (ICCV). Crest: Convolutional Residual Learning For Visual Tracking (IEEE, 2017). https://doi.org/10.1109/iccv.2017.279.
H. Fan, H. Ling, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Siamese cascaded region proposal networks for realtime visual tracking, (2019), pp. 7952–7961. https://arxiv.org/pdf/1812.06148.pdf.
D. A. Migliore, M. Matteucci, M. Naccari, in Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks – VSSN’06. A revaluation of frame difference in fast and robust motion detection (ACM Press, 2006). https://doi.org/10.1145/1178782.1178815.
A. Elgammal, D. Harwood, L. Davis, in European Conference on Computer Vision (ECCV). Nonparametric model for background subtraction, (2000), pp. 751–767. https://doi.org/10.1007/354045053X_48.
M. Khan, M. Valstar, T. Pridmore, in Proc. Brit. Mach. Vis. Conf. (BMVC). A multiple motion model tracker handling occlusion and rapid motion variation, (2013), pp. 1–12. http://www.bmva.org/bmvc/2013/Workshop/0006.pdf.
H. Zhang, X. Zhang, Y. Wang, et al., An experimental comparison of swarm optimization based abrupt motion tracking methods. IEEE Access. 6:, 75383–75394 (2018).
M. Gao, X. He, D. Luo, et al., Object tracking using firefly algorithm. IET Comput. Vis.7(4), 227–237 (2013).
X. Zhang, W. Hu, S. Maybank, in 2008 IEEE Conference on Computer Vision and Pattern Recognition. Sequential particle swarm optimization for visual tracking (IEEE, 2008). https://doi.org/10.1109/cvpr.2008.4587512.
H. Nguyen, B. Bhanu, in 2012 IEEE Ninth International Conference on Advanced Video and SignalBased Surveillance. Realtime pedestrian tracking with bacterial foraging optimization (IEEE, 2012). https://doi.org/10.1109/avss.2012.60.
D. H. Wolpert, W. G. Macready, No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1:, 67–82 (1997).
M. A. Tawhid, A. F. Ali, A hybrid grey wolf optimizer and genetic algorithm for minimizing potential energy function. Memetic Comput.9(4), 347–359 (2017).
G. G. Wang, A. H. Gandomi, A. H. Alavi, et al., Hybrid krill herd algorithm with differential evolution for global numerical optimization. Neural Comput. Appl.25(2), 297–308 (2014).
G. Wang, L. Guo, A novel hybrid bat algorithm with harmony search for global numerical optimization. J. Appl. Math.2013:, 1–21 (2013). https://doi.org/10.1155/2013/696491.
S. Mirjalili, S. Hashim, International Conference on Computer and Information Application (ICCIA). A new hybrid PSOGSA algorithm for function optimization (IEEE, 2010), (2010). https://doi.org/10.1109/ICCIA.2010.6141614.
J. Chen, Y. Zhen, D. Yang, et al., Fast moving object tracking algorithm based on hybrid quantum PSO. WSEAS Trans. Comput. 12(10), 375–383 (2013).
Z. Hao, X. Zhang, P. Yu, et al., in 2010 2nd International Conference On Advanced Computer Control (ICACC), vol. 3. Video object tracing based on particle filter with ant colony optimization (IEEE, 2010), pp. 232–236. https://doi.org/10.1109/icacc.2010.5486857.
T. Ljouad, A. Amine, M. Rziza, A hybrid mobile object tracker based on the modified Cuckoo search algorithm and the Kalman filter. Pattern Recog.47(11), 3597–3613 (2014).
H. Zhang, J. Zhang, Q. Wu, et al., Extended kernel correlation filter for abrupt motion tracking. KSII Trans. Internet Inf. Syst.11(9), 4438–4460 (2017).
H. Zhang, X. Zhang, Y. Wang, et al., Extended Cuckoo searchbased kernel correlation filter for abrupt motion tracking. IET Comput. Vis.12(6), 763–769 (2018).
S. Mirjalili, The ant lion optimizer. Adv. Eng. Softw.83:, 80–98 (2015).
S. Mirjalili, Sca: A sine cosine algorithm for solving optimization problems. Knowl.Based Syst.96:, 120–133 (2016).
H. M. Zawbaa, E. Emary, B. Parv, in 2015 Third World Conference on Complex Systems (WCCS). Feature selection based on antlion optimization algorithm (IEEE, 2015), pp. 1–7. https://doi.org/10.1109/icocs.2015.7483317.
A. F. Attia, R. A. E. Sehiemy, H. M. Hasanien, Optimal power flow solution in power systems using a novel sinecosine algorithm. Int. J. Electr. Power Energy Syst.99:, 331–341 (2018).
E. Emary, H. M. Zawbaa, A. E. Hassanien, Binary ant lion approaches for feature selection. Neurocomputing. 213(12), 54–65 (2016).
Y. Peng, H. Wang, Dynamic adaptive ant lion optimizer applied to route planning for unmanned aerial vehicle. Soft Comput.21(18), 5475–5488 (2017).
R. Sindhu, R. Ngadiran, Y. M. Yacob, et al., Sinecosine algorithm for feature selection with elitism strategy and new updating mechanism. Neural Comput. Applic.28(10), 2847–2958 (2017).
K. S. Reddy, L. K. Panwar, B. K. Panigrahi, et al., A new binary variant of sinecosine algorithm: Development and application to solve profitbased unit commitment problem. Arab. J. Sci. Eng.43(8), 4041–4056 (2018).
H. Nenavath, D. R. K. Jatoth, D. S. Das, A synergy of the sinecosine algorithm and particle swarm optimizer for improved global optimization and object tracking. Swarm Evol. Comput.43:, 1–30 (2018).
H. Nenavath, R. K. Jatoth, Hybridizing sine cosine algorithm with differential evolution for global optimization and object tracking. Appl. Soft Comput.62:, 1019–1043 (2018).
S. Khalilpourazari, S. Khalilpourazary, SCWOA: An efficient hybrid algorithm for parameter optimization of multipass milling process. J. Ind. Prod. Eng.35(3), 135–147 (2018).
V. Dabra, K. K. Paliwal, P. Sharma, et al., Optimization of photovoltaic power system: a comparative study. Protect. Control Mod. Power Syst.2(2), 29–39 (2017).
K. Zhang, L. Zhang, M. H. Yang, Fast compressive tracking. IEEE Trans. Pattern. Anal. Mach. Intell.36(10), 2002–2015 (2014).
J. F. Henriques, R. Caseiro, P. Martins, et al., Highspeed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence. 37(3), 583–596 (2015).
J. F. Henrigues, R. Caseiro, P. Martins, et al., in Computer Vision – ECCV 2012. Exploiting the circulant structure of trackingbydetection with kernels (Springer Berlin Heidelberg, 2012), pp. 702–715. https://doi.org/10.1007/9783642337659_50.
K. Zhang, L. Zhang, Q. Liu, et al., in European conference on computer vision (ECCV). Fast visual tracking via dense spatiotemporal context learning, (2014), pp. 127–141. https://doi.org/10.1007/9783319106021_9.
D. Wang, H. Lu, M. Yang, Robust visual tracking via least softthreshold squares. IEEE Trans. Circ. Syst. Video Technol.26(9), 1709–1721 (2016).
Y. Wu, J. Lim, M. H. Yang, Online object tracking: A benchmark (IEEE, 2013). https://doi.org/10.1109/cvpr.2013.312.
Y. Wu, J. Lim, M. H. Yang, Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1834–1848 (2015).
Acknowledgements
The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.
Funding
This work was supported in part by the National Natural Science Foundation of China (61873246, 61672471), the Graduate Innovation Foundation of Zhengzhou University of Light Industry (2018024).
Author information
Authors and Affiliations
Contributions
HLZ designed the algorithm. ZG carried out the experiments and drafted the manuscript. JZ gave suggestions on the structure of manuscript and participated in modifying the manuscript. JFL and ZCN built the dataset and gave suggestions on the experimental analysis. JWZ participated in the survey of large motion and gave suggestions on the experimental analysis. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Zhang, H., Gao, Z., Zhang, J. et al. Hybridizing extended ant lion optimizer with sine cosine algorithm approach for abrupt motion tracking. J Image Video Proc. 2020, 4 (2020). https://doi.org/10.1186/s136400200491y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s136400200491y