
On the optical flow model selection through metaheuristics

Abstract

Optical flow methods are accurate algorithms for estimating the displacement and velocity fields of objects in a wide variety of applications, and their performance depends on the configuration of a set of parameters. Since little research has aimed at tuning such parameters automatically, in this work we propose an optimization-based framework for this task based on social-spider optimization, harmony search, particle swarm optimization, and the Nelder-Mead algorithm. The proposed framework employs the well-known large displacement optical flow (LDOF) approach as the base algorithm over the Middlebury and Sintel public datasets, with promising results compared with the baseline parameters proposed by the authors of LDOF.

1 Introduction

Optical flow estimation is one of the most important research areas in computer vision. It aims at identifying the patterns of motion of objects and surfaces in a visual scene, i.e., at approximating the motion field from a time-varying image intensity. The literature is wide; some very recent works address optical flow estimation using Laplacian mesh structures [1], total generalized variation [2], and probabilistic motion detection [3], or treat it as an optimization problem in a high-dimensional motion field [4], just to name a few. The importance of optical flow estimation is evidenced by applications such as image segmentation [5], rigid object reconstruction [6], cell tracking [7], and video stabilization [8], among others. Some parallel implementations can be found in [9-11] as well.

Recently, Sun et al. [12] stressed that the theoretical foundations of a broad range of optical flow methods have changed little since the seminal work of Horn and Schunck [13]. They argued that, although results have improved over the past years, the vast majority of optical flow methods still rely on the same basis as the work of Horn and Schunck. Another shortcoming of optical flow-based techniques lies in the estimation of their parameters, which poses a big challenge to the field. Since most techniques are parameter-dependent, a grid search for a near-optimal (or optimal) set of parameters over video content may not be viable [14]. Therefore, many works set the parameters by hand, which may limit our understanding of how well the considered optical flow method generalizes to unseen data. In fact, estimating the parameters of optical flow techniques may be seen as a large-scale learning problem: although usually only a few parameters need to be estimated, the huge amount of video data to be processed for such estimation demands a high computational effort.

Although several works cope with the problem of estimating or calibrating camera parameters, only a few of them deal with parameter estimation in optical flow techniques. Heas et al. [15] and Krajsek and Mester [16], for instance, employed a Bayesian optimization framework for this purpose, and Li and Huttenlocher [17] presented an interesting stochastic optimization approach based on Markov random fields for optical flow parameter estimation. The latter authors give several arguments for the advantages of optimizing an error criterion instead of using a maximum likelihood approach, as done by Roth and Black [18]. A few other works model optical flow parameter estimation as an optimization task solved by metaheuristic techniques. Delpiano et al. [19], for instance, proposed a multi-objective approach for parameter estimation that optimizes both the training loss and the computational load. Later on, Pereira et al. [20] applied metaheuristic optimization algorithms to the same task considering the large displacement optical flow (LDOF) technique [21], comparing social-spider optimization (SSO) [22], harmony search (HS) [23], and particle swarm optimization (PSO) [24] against each other on the well-known Middlebury dataset. Although several other optical flow implementations are available [25-27], we opted for LDOF due to its simplicity, reliability, and good ranking on the Sintel website. The LDOF implementation is very accurate but computationally expensive, which makes it an interesting choice for applications that require high accuracy but do not require very short execution times.

To help fill this gap regarding model selection for optical flow, we extend the work of Pereira et al. [20] in three ways. First, we add two more points of comparison: the Nelder-Mead (NM) method [28], which we treat as an exact (deterministic) technique, and a ‘baseline’ that uses the parameters proposed by the authors of the LDOF technique [21]. Second, we add one more dataset to the experimental section. Third, whereas Pereira et al. [20] estimated the parameters of LDOF locally (in short, they optimized each sequence separately and then employed the best set of parameters on the remaining sequences), we optimize the techniques globally, i.e., we consider all sequences of a given dataset at once during parameter optimization, which yields more accurate results than those reported by Pereira et al. [20]. The remainder of this paper is organized as follows: Section 2 presents a brief theoretical background on optical flow, and Section 3 revisits the techniques employed in this paper for comparison purposes. The proposed methodology and experimental results are discussed in Sections 4 and 5, respectively. Section 6 states conclusions and future work.

2 Optical flow

Optical flow (OF) is a vector field representing ‘the distribution of apparent velocities of movement of brightness patterns in an image’ [13]. The concept relies on two basic assumptions: ‘grey value constancy’ and a ‘smooth flow of the intensity values’ between two successive images. Some articles still maintain the grey value constancy assumption (see, for example, [29]), while other works report the need to loosen it [30].

The OF constraint, given in Equation 1, is derived from the grey value constancy assumption. It relates the spatial and temporal derivatives of a 2D image g at time step t to the OF vector ϕ, and it has a strong analogy with mass conservation in fluid mechanics (Equation 2), where ϕ is the fluid velocity and ρ is the fluid density. Like fluid mass, image intensity is often assumed to remain constant under deformation and motion. However, since \(\nabla\cdot(\rho\boldsymbol{\phi}) = \boldsymbol{\phi}\cdot\nabla\rho + \rho\,\nabla\cdot\boldsymbol{\phi}\), Equations 1 and 2 are equivalent only when \(\nabla\cdot\boldsymbol{\phi} = 0\), i.e., for a divergence-free flow. This condition matches the smooth flow assumption that is considered when regularizing the flow field:

$$ \frac{\partial g}{\partial t} + \boldsymbol{\phi}\cdot\nabla g = 0, \tag{1} $$
$$ \frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\,\boldsymbol{\phi}) = 0. \tag{2} $$
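To make Equation 1 concrete, the following Python sketch (using NumPy) evaluates its left-hand side for two synthetic frames of a smoothly translating pattern. The derivatives are approximated with simple finite differences, which is our own choice for illustration rather than the discretization of any particular OF method, and the function name `of_constraint_residual` is hypothetical. With the true displacement the residual stays close to zero, whereas with a zero flow it does not.

```python
import numpy as np


def of_constraint_residual(g0, g1, u, v):
    """Residual dg/dt + phi . grad(g) of Equation 1 for two frames one time step apart.

    u and v are the flow components along columns (x) and rows (y); they may be
    scalars or arrays with the same shape as the frames.
    """
    g_t = g1 - g0                              # temporal derivative, dt = 1
    g_y, g_x = np.gradient(0.5 * (g0 + g1))    # spatial derivatives at t + 1/2
    return g_t + u * g_x + v * g_y


# Synthetic smooth pattern translated by (u, v) = (0.5, 0.3) pixels per frame
rows, cols = np.mgrid[0:64, 0:64].astype(float)


def frame(t, u=0.5, v=0.3):
    return np.sin(0.3 * (cols - u * t)) + np.cos(0.2 * (rows - v * t))


g0, g1 = frame(0.0), frame(1.0)
r_true = np.abs(of_constraint_residual(g0, g1, 0.5, 0.3)).mean()
r_zero = np.abs(of_constraint_residual(g0, g1, 0.0, 0.0)).mean()
print(f"mean |residual|: true flow {r_true:.4f}, zero flow {r_zero:.4f}")
```

The residual is small but not exactly zero with the true flow, because both the finite differences and the linearization behind Equation 1 are approximations.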

The early work presented in [13] stated the need for an extra constraint to compute the optical flow field from an image sequence and proposed an ad hoc constraint based on the assumption of flow smoothness. Another work [31] proposed to consider the OF equation for several neighboring pixels in order to avoid the need for an extra constraint. More than 10 years later, Barron et al. [32] compared several OF methods, mainly with respect to their average angular error (AAE) on a set of image sequences. The experiments showed that the method in [31] was one of the most reliable at that time.

Recently, several image datasets have been compiled for a more precise evaluation and comparison of OF methods [33,34]. Many shortcomings of the original methods have been overcome, and the accuracy of the top-ranked OF methods has grown continuously. Additionally, several researchers have tried to preserve the discontinuities of natural motion fields [35], overcoming the original assumption of OF smoothness in [13]. After the work of Barron et al. [32], there have been further attempts to compare different methods. Liu et al. [36], for instance, showed the trade-off between computational time and angular error by using operation curves to compare different OF techniques. The time comparison among OF algorithms given in [37] is also of interest, since it provides a picture of the computational load of several OF algorithms. More recently, a group of researchers presented a series of real image sequences with ground truth obtained by tracking hidden fluorescent textures [33], and they also suggested a methodology to evaluate OF-based algorithms.

2.1 Large displacement optical flow

Given a sequence of m frames \(\{I_1, I_2, \ldots, I_m\}\), let \(\boldsymbol{\phi} = (a, b)^T\) be the optical flow for a pair of consecutive frames \(I_i, I_{i+1}\), \(i = 1, 2, \ldots, m-1\), each frame being pre-smoothed by a Gaussian filter with parameter σ. The large displacement optical flow method proposed by Brox and Malik [21] minimizes the energy functional given by:

$$\begin{array}{@{}rcl@{}} E(\boldsymbol{\phi}) &=& E_{\text{color}}(\boldsymbol{\phi}) + \gamma E_{\text{gradient}}(\boldsymbol{\phi}) + \alpha E_{\text{smooth}}(\boldsymbol{\phi})\\ && + \beta E_{\text{match}}(\boldsymbol{\phi}, \boldsymbol{\phi}_{1}) + E_{\text{desc}}(\boldsymbol{\phi}_{1}), \end{array} \tag{3} $$

where the term \(E_{\text{color}}\) represents the common assumption of grey value or color constancy; \(E_{\text{gradient}}\) represents gradient constancy, which is invariant to uniform illumination changes; \(E_{\text{smooth}}\) enforces regularity of the resulting optical flow; \(E_{\text{match}}\) is an energy related to point correspondences; and the minimization of \(E_{\text{desc}}\) ensures descriptor matching. The quantity \(\boldsymbol{\phi}_1\) is an auxiliary variable that allows descriptor matching to be integrated into a continuous approach. The available LDOF implementation [38] has a reduced number of parameters, which means we can consider all of them for optimization purposes. This implementation allows the user to fine-tune four parameters: (i) σ, related to the Gaussian pre-smoothing of the images (a pre-processing parameter); (ii) α, which controls the importance attributed to the smoothness of the resulting optical flow; (iii) β, which enforces the matching of points in both images; and (iv) γ, which regulates the penalization of violations of the gradient constancy assumption. It is important to highlight that this set of parameters significantly influences both the accuracy (and consequently the error metrics) and the computational load.
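The roles of σ, α, and γ can be illustrated with a deliberately simplified version of Equation 3. The sketch below is not LDOF: it keeps only \(E_{\text{color}}\), \(E_{\text{gradient}}\), and \(E_{\text{smooth}}\), linearizes the constancy assumptions as in Equation 1 instead of warping the images, omits the matching and descriptor terms (and hence β), and assumes the robust penalizer \(\Psi(s^2) = \sqrt{s^2 + \epsilon^2}\). All function names are hypothetical, and the parameter values in the usage lines are arbitrary.

```python
import numpy as np


def psi(s2, eps=1e-3):
    """Robust penalizer Psi(s^2) = sqrt(s^2 + eps^2) (assumed here)."""
    return np.sqrt(s2 + eps ** 2)


def presmooth(img, sigma):
    """Separable Gaussian pre-smoothing with standard deviation sigma (reflected borders)."""
    img = np.asarray(img, dtype=float)
    if sigma <= 0:
        return img
    r = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    p = np.pad(img, r, mode="reflect")
    p = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")     # along rows
    return np.apply_along_axis(np.convolve, 0, p, k, mode="valid")  # along columns


def linearized_residual(a0, a1, u, v):
    """a_t + u a_x + v a_y for one channel, spatial derivatives taken at t + 1/2."""
    a_y, a_x = np.gradient(0.5 * (a0 + a1))
    return (a1 - a0) + u * a_x + v * a_y


def simplified_energy(I1, I2, u, v, sigma, alpha, gamma):
    """E_color + gamma * E_gradient + alpha * E_smooth (no matching/descriptor terms)."""
    g1, g2 = presmooth(I1, sigma), presmooth(I2, sigma)
    e_color = psi(linearized_residual(g1, g2, u, v) ** 2).sum()
    (g1_y, g1_x), (g2_y, g2_x) = np.gradient(g1), np.gradient(g2)
    e_gradient = (psi(linearized_residual(g1_x, g2_x, u, v) ** 2).sum()
                  + psi(linearized_residual(g1_y, g2_y, u, v) ** 2).sum())
    (u_y, u_x), (v_y, v_x) = np.gradient(u), np.gradient(v)
    e_smooth = psi(u_x ** 2 + u_y ** 2 + v_x ** 2 + v_y ** 2).sum()
    return e_color + gamma * e_gradient + alpha * e_smooth


rows, cols = np.mgrid[0:64, 0:64].astype(float)
I1 = np.sin(0.3 * cols) + np.cos(0.2 * rows)
I2 = np.sin(0.3 * (cols - 0.5)) + np.cos(0.2 * (rows - 0.3))   # shifted by (0.5, 0.3)
u_true, v_true = np.full(I1.shape, 0.5), np.full(I1.shape, 0.3)
zero = np.zeros(I1.shape)
e_true = simplified_energy(I1, I2, u_true, v_true, sigma=0.8, alpha=1.0, gamma=5.0)
e_zero = simplified_energy(I1, I2, zero, zero, sigma=0.8, alpha=1.0, gamma=5.0)
```

Changing σ alters the images that enter both data terms, while α and γ only reweight the terms; this is one reason why the four parameters interact and are worth optimizing jointly.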

3 Optimization background

In this section, we describe the techniques employed in this paper for comparison purposes. The methods can be divided into two classes: (i) metaheuristic algorithms and (ii) exact methods. As metaheuristics, we used social-spider optimization, particle swarm optimization, and harmony search; as an exact method, we employed Nelder-Mead, a deterministic direct-search algorithm that operates on a simplex.

3.1 Social-spider optimization

Social-spider optimization is based on the cooperative behavior of social spiders [22], and it considers two genders of search agents: males and females. Depending on its gender, each agent is driven by a different set of operators emulating the cooperative behavior of a colony. The search space is treated as a communal web, and each spider’s position represents a candidate solution.

An interesting characteristic of social spiders is their female-biased population: male spiders hardly reach 30% of the total colony members. The number of females \(N_f\) is randomly selected within the range of 65% to 90% of the entire population N, as follows:

$$ N_{f} = \left\lfloor (0.9 - 0.25\,\xi)\, N \right\rfloor, \tag{4} $$

where ξ ∈ (0,1) is a uniformly distributed random number. The number of male spiders \(N_m\) is given by:

$$ N_{m} = N - N_{f}. \tag{5} $$
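Equations 4 and 5 translate directly into code. This minimal sketch assumes that the brackets in Equation 4 denote the floor operation and that ξ is drawn with NumPy's default random generator; `split_colony` is a hypothetical name.

```python
import numpy as np


def split_colony(N, rng):
    """Number of female (Equation 4) and male (Equation 5) spiders in a colony of size N."""
    xi = rng.uniform(0.0, 1.0)                    # xi ~ U(0, 1)
    n_f = int(np.floor((0.9 - 0.25 * xi) * N))    # between 65% and 90% of N
    n_m = N - n_f
    return n_f, n_m


rng = np.random.default_rng(0)
print(split_colony(20, rng))
```

With the 20 agents used in Section 4, the colony always holds between 13 and 18 females.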

Each spider i receives a weight \(\phi_i\) according to the fitness value of its solution:

$$ \phi_{i} = \frac{\text{fitness}_{i} - \text{worst}}{\text{best} - \text{worst}}, \tag{6} $$

where \(\text{fitness}_i\) is the fitness value obtained by evaluating the position of the ith spider, i = 1, 2, …, N, and worst and best denote the worst and best fitness values of the entire population, respectively.

The communal web is used as a mechanism to transmit information among colony members. The information is encoded as small vibrations that depend on the weight and distance of the spider that generated them:

$$ V_{i,j} = \phi_{j}\, e^{-d^{2}_{i,j}}, \tag{7} $$

where \(d_{i,j}\) is the Euclidean distance between spiders i and j. Three special relationships can be considered:

  • The vibrations \(V_{i,c}\) perceived by spider i as a result of the information transmitted by member c, the nearest member to i that possesses a higher weight (\(\phi_c > \phi_i\));

  • The vibrations \(V_{i,b}\) perceived by spider i as a result of the information transmitted by spider b, which holds the best weight of the entire population;

  • The vibrations \(V_{i,f}\) perceived by spider i as a result of the information transmitted by the nearest female f.
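The weights of Equation 6 and the vibrations of Equation 7, together with the three special relationships listed above, can be sketched as follows. We assume a minimization problem (as in this paper, where EPE is minimized), Euclidean distances, and that the females occupy the first \(N_f\) rows of the population array; all helper names are hypothetical.

```python
import numpy as np


def spider_weights(fitness, minimize=True):
    """Equation 6: weights in [0, 1]; the best spider gets 1 and the worst gets 0."""
    fitness = np.asarray(fitness, dtype=float)
    best, worst = (fitness.min(), fitness.max()) if minimize else (fitness.max(), fitness.min())
    if best == worst:                      # degenerate colony: every spider is equally good
        return np.ones_like(fitness)
    return (fitness - worst) / (best - worst)


def vibration(w_j, d_ij):
    """Equation 7: vibration produced by a spider of weight w_j at distance d_ij."""
    return w_j * np.exp(-d_ij ** 2)


def special_vibrations(positions, weights, i, n_females):
    """V_{i,c}, V_{i,b}, V_{i,f} and the indices (c, b, f) for spider i.

    Females are assumed to occupy the first n_females rows of `positions`.
    """
    d = np.linalg.norm(positions - positions[i], axis=1)
    others = np.arange(len(weights)) != i
    better = np.flatnonzero(others & (weights > weights[i]))
    c = int(better[np.argmin(d[better])]) if better.size else None   # nearest heavier spider
    b = int(np.argmax(weights))                                       # best spider overall
    females = np.flatnonzero(others[:n_females])
    f = int(females[np.argmin(d[females])])                           # nearest female
    v_c = vibration(weights[c], d[c]) if c is not None else 0.0
    return v_c, vibration(weights[b], d[b]), vibration(weights[f], d[f]), (c, b, f)


positions = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [0.5, 0.0]])
w = spider_weights([4.0, 2.0, 1.0, 3.0])           # minimization: spider 2 is the best
v_c, v_b, v_f, idx = special_vibrations(positions, w, i=0, n_females=2)
print(idx)  # (3, 2, 1)
```

Because the vibration decays with the squared distance, a far-away best spider (here spider 2) may influence spider 0 much less than a close, moderately good neighbor.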

Social spiders interact cooperatively with other colony members depending on their gender. To emulate the cooperative behavior of female spiders, the operator in Equation 8 is defined: the position \(\varphi_i\) of a female spider i at time step t+1 results from attraction or repulsion towards other spiders according to the vibrations they emit over the communal web:

$$ \varphi_{i}(t+1) = \left\{ \begin{array}{ll} \varphi_{i}(t) + \alpha V_{i,c}\,(s_{c} - \varphi_{i}(t)) + \beta V_{i,b}\,(s_{b} - \varphi_{i}(t)) + \gamma \left(\text{rand} - \frac{1}{2}\right), & \text{if}\ \theta < \text{PF};\\ \varphi_{i}(t) - \alpha V_{i,c}\,(s_{c} - \varphi_{i}(t)) - \beta V_{i,b}\,(s_{b} - \varphi_{i}(t)) + \gamma \left(\text{rand} - \frac{1}{2}\right), & \text{if}\ \theta \geq \text{PF}, \end{array} \right. \tag{8} $$

where θ, α, β, γ, and rand are uniform random numbers in [0,1] (not to be confused with the LDOF parameters of Section 2.1), PF is an input parameter, and \(s_c\) and \(s_b\) represent the nearest member to i that holds a higher weight and the best spider of the entire population, respectively.
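A sketch of the female movement operator in Equation 8 follows. It assumes positions are NumPy vectors and draws θ, α, β, and γ once per move; drawing `rand` independently for each dimension is our own implementation choice. Calling it twice with the same seed, once forcing attraction and once forcing repulsion, shows that the two cases differ only in the sign of the pull towards \(s_c\) and \(s_b\).

```python
import numpy as np


def move_female(x_i, s_c, s_b, v_ic, v_ib, pf, rng):
    """Equation 8: attraction (theta < PF) or repulsion (theta >= PF) of female spider x_i.

    theta, alpha, beta and gamma are drawn once per move; `rand` is drawn per dimension
    here, which is an implementation choice rather than something fixed by Equation 8.
    """
    theta, alpha, beta, gamma = rng.uniform(0.0, 1.0, size=4)
    rand = rng.uniform(0.0, 1.0, size=x_i.shape)
    sign = 1.0 if theta < pf else -1.0
    pull = alpha * v_ic * (s_c - x_i) + beta * v_ib * (s_b - x_i)
    return x_i + sign * pull + gamma * (rand - 0.5)


x = np.array([0.0, 0.0])
target = np.array([2.0, 1.0])       # here s_c = s_b for simplicity
attract = move_female(x, target, target, 1.0, 1.0, pf=2.0, rng=np.random.default_rng(1))
repel = move_female(x, target, target, 1.0, 1.0, pf=-1.0, rng=np.random.default_rng(1))
```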

The male spider population is divided into two classes: dominant and non-dominant. Dominant males have a better fitness (a weight above the median of the male population) than non-dominant ones, and they are attracted to the closest female spider in the communal web. On the other hand, non-dominant males tend to concentrate in the center of the male population as a strategy to take advantage of the resources wasted by dominant males. The movement of male spiders is given by:

$$ \delta_{i}(t+1) = \left\{ \begin{array}{ll} \delta_{i}(t) + \alpha V_{i,f}\,(s_{f} - \delta_{i}(t)) + \gamma \left(\text{rand} - \frac{1}{2}\right), & \text{if}\ \phi_{N_{f}+i} > \tilde{\phi};\\ \delta_{i}(t) + \alpha \left(\dfrac{\sum_{h=1}^{N_{m}} \delta_{h}(t)\, \phi_{N_{f}+h}}{\sum_{h=1}^{N_{m}} \phi_{N_{f}+h}} - \delta_{i}(t)\right), & \text{if}\ \phi_{N_{f}+i} \leq \tilde{\phi}, \end{array} \right. \tag{9} $$

where \(s_f\) represents the nearest female spider to male spider i and \(\tilde{\phi}\) is the median weight of the male population. Thus, male and female spiders follow distinct movement equations. Notice that we use \(\phi_{N_f+i}\) to denote the weight of the ith male spider, since we consider ϕ as a vector containing the weights of every spider in the web, with the first \(N_f\) entries corresponding to females.
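The male movement of Equation 9 can be sketched in the same way. The function below is hypothetical; it assumes that the male weights, the nearest females \(s_f\), and the vibrations \(V_{i,f}\) have already been computed, and it also returns the weighted centre of the male population so that one can check that non-dominant males move towards it.

```python
import numpy as np


def move_males(males, male_weights, nearest_female, v_if, rng):
    """Equation 9 for every male spider.

    males: (N_m, n) positions; male_weights: (N_m,) weights phi_{N_f+h};
    nearest_female: (N_m, n) positions s_f; v_if: (N_m,) vibrations V_{i,f}.
    """
    median_w = np.median(male_weights)
    total = male_weights.sum()
    # weighted centre of the male population, targeted by non-dominant males
    centre = (male_weights[:, None] * males).sum(axis=0) / total if total > 0 else males.mean(axis=0)
    new = np.empty_like(males)
    for i, x in enumerate(males):
        alpha, gamma = rng.uniform(0.0, 1.0, size=2)
        if male_weights[i] > median_w:            # dominant: move towards nearest female
            rand = rng.uniform(0.0, 1.0, size=x.shape)
            new[i] = x + alpha * v_if[i] * (nearest_female[i] - x) + gamma * (rand - 0.5)
        else:                                     # non-dominant: move towards the centre
            new[i] = x + alpha * (centre - x)
    return new, centre


males = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
weights = np.array([0.9, 0.1, 0.2])               # male 0 is the only dominant one
females = np.array([[1.0, 1.0]] * 3)
new, centre = move_males(males, weights, females, np.ones(3), np.random.default_rng(0))
```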

Mating is performed by dominant males and female members of a social-spider colony. Let r (Equation 10) be the mating radius: when a dominant male spider locates female members inside r, it mates with them, forming a new brood:

$$ r = \frac{\sum_{j=1}^{n}\left(l_{j}^{\text{high}} - l_{j}^{\text{low}}\right)}{2n}, \tag{10} $$

where n is the dimension of the problem, and \(l_{j}^{\text{high}}\) and \(l_{j}^{\text{low}}\) are the upper and lower bounds of the jth dimension, respectively. Once the new spider is formed, it is compared to the worst spider of the colony: if the new spider is better, it replaces the worst one.
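The mating radius of Equation 10 is straightforward to compute. How the brood itself is formed is not detailed above; the sketch below assumes a weight-proportional roulette, applied per dimension, over the dominant male and the females inside r, which is our reading of the original SSO proposal [22] rather than a detail confirmed here. The subsequent comparison with the worst spider is left to the caller, and both function names are hypothetical.

```python
import numpy as np


def mating_radius(lower, upper):
    """Equation 10: half of the mean side length of the search box."""
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    return np.sum(upper - lower) / (2 * lower.size)


def mate(male, male_weight, females, female_weights, radius, rng):
    """Brood of a dominant male and the females within `radius` (None if none is close).

    Each brood component is copied from one parent picked by weight-proportional
    roulette; this rule is an assumption based on the original SSO proposal.
    """
    d = np.linalg.norm(females - male, axis=1)
    close = np.flatnonzero(d <= radius)
    if close.size == 0:
        return None
    parents = np.vstack([male, females[close]])
    w = np.concatenate([[male_weight], female_weights[close]]) + 1e-12  # avoid all-zero weights
    choice = rng.choice(len(parents), size=male.size, p=w / w.sum())
    return parents[choice, np.arange(male.size)]


r = mating_radius([0.0, 0.0], [4.0, 2.0])            # (4 + 2) / (2 * 2) = 1.5
females = np.array([[1.0, 1.0], [5.0, 5.0]])
brood = mate(np.array([0.0, 0.0]), 0.8, females, np.array([0.5, 0.9]), r, np.random.default_rng(0))
```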

3.2 Harmony search

Harmony search is a metaheuristic technique inspired by the improvisation process of musicians searching for a good harmony [39]. The main idea is to generate a new harmony \(h_{\text{new}} = (h^{1}_{\text{new}}, h^{2}_{\text{new}}, \ldots, h^{N}_{\text{new}})\) at each iteration, based on memory consideration and pitch adjustment, where N stands for the number of decision variables to be optimized.

The memorization step models the process of creating songs, in which musicians can use their memories of good musical notes to create a new song. This process is controlled by the harmony memory considering rate (HMCR), as follows:

$$ h^{j}_{\text{new}} \leftarrow \left\{ \begin{array}{ll} h^{j}_{\text{new}} \in \left\{h^{j}_{1}, \ldots, h^{j}_{M}\right\} & \text{with probability HMCR},\\ h^{j}_{\text{new}} \in \psi_{j} & \text{with probability } (1-\text{HMCR}), \end{array} \right. \tag{11} $$

where M is the number of harmonies and \(\psi_j\) is the range of feasible values of decision variable j. Therefore, HMCR ∈ [0,1] is the probability of choosing one value from the historic values stored in the harmony memory, and (1 − HMCR) is the probability of randomly choosing one feasible value. Further, if the new harmony has been created with probability HMCR, every component j of the new harmony vector \(h_{\text{new}}\) is examined to determine whether it should be pitch-adjusted or not, which is controlled by the pitch adjusting rate (PAR):

$$ h^{j}_{\text{new}} \leftarrow \left\{ \begin{array}{ll} \text{Yes} & \text{with probability PAR},\\ \text{No} & \text{with probability } (1-\text{PAR}). \end{array}\right. \tag{12} $$

Pitch adjustment is often used to improve solutions and to escape local optima. This mechanism shifts a decision variable of the harmony to a neighbouring value. As such, if the pitch adjustment decision for \(h^{j}_{\text{new}}\) is Yes, then \(h^{j}_{\text{new}}\) is replaced as follows:

$$ h^{j}_{\text{new}} \leftarrow h^{j}_{\text{new}} + \delta_{j}\,\tau, \tag{13} $$

where τ is an arbitrary distance (bandwidth) for continuous design variables, and \(\delta_{j} \sim \mathcal{U}(0,1)\) is a random number.
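The improvisation of Equations 11 to 13 can be sketched as below. One deliberate deviation: δ is drawn from U(−1, 1) so that the pitch can be shifted in either direction, a common HS choice; replacing it with `rng.uniform()` reproduces the U(0, 1) stated for Equation 13. Handling the bounds by clipping is also our assumption, and `improvise` is a hypothetical name.

```python
import numpy as np


def improvise(memory, lower, upper, hmcr, par, bandwidth, rng):
    """One new harmony following Equations 11 to 13 (vanilla HS).

    memory: (M, N) harmonies; lower, upper: (N,) bounds of each decision variable.
    """
    M, N = memory.shape
    new = np.empty(N)
    for j in range(N):
        if rng.uniform() < hmcr:                      # memory consideration (Eq. 11)
            new[j] = memory[rng.integers(M), j]
            if rng.uniform() < par:                   # pitch adjustment (Eqs. 12 and 13)
                delta = rng.uniform(-1.0, 1.0)        # symmetric shift: an assumption
                new[j] += delta * bandwidth
        else:                                         # random feasible value from psi_j
            new[j] = rng.uniform(lower[j], upper[j])
    return np.clip(new, lower, upper)


memory = np.array([[1.0, 2.0], [3.0, 4.0]])
lower, upper = np.zeros(2), np.full(2, 10.0)
h = improvise(memory, lower, upper, hmcr=1.0, par=0.0, bandwidth=0.1, rng=np.random.default_rng(0))
h2 = improvise(memory, lower, upper, hmcr=0.0, par=0.0, bandwidth=0.1, rng=np.random.default_rng(0))
```

With HMCR = 1 and PAR = 0 every component is copied from memory, whereas with HMCR = 0 the harmony is sampled uniformly from the bounds.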

Recently, several studies have focused on developing variants of the traditional HS. In our implementation, we employed the novel global harmony search (NGHS) [40], which demonstrated better results than vanilla HS in our experiments. NGHS does not employ the PAR and HMCR parameters; instead, it introduces a new parameter P, the probability that a given improvisation scheme occurs during the creation of a new harmony, which modifies the improvisation process. Another difference between NGHS and HS is that a new harmony always replaces the worst one, even when it does not improve on it.
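For comparison, one NGHS iteration might look like the sketch below. It follows our understanding of NGHS [40] (a position update from the worst harmony towards a point reflected through the best one, bounded by the search space, plus a genetic mutation), and it assumes that the parameter P mentioned above corresponds to the mutation probability `p_m`; both points should be checked against [40]. As stated above, the new harmony replaces the worst one unconditionally.

```python
import numpy as np


def nghs_step(memory, fitness, objective, lower, upper, p_m, rng):
    """One NGHS iteration for minimization, following our reading of [40].

    memory: (M, N) harmonies, fitness: (M,) objective values; both are updated in place.
    """
    best, worst = int(np.argmin(fitness)), int(np.argmax(fitness))
    x_best, x_worst = memory[best], memory[worst]
    x_r = np.clip(2.0 * x_best - x_worst, lower, upper)    # reflected point, kept in bounds
    new = x_worst + rng.uniform(0.0, 1.0, size=x_best.shape) * (x_r - x_worst)
    mutate = rng.uniform(0.0, 1.0, size=new.shape) < p_m   # genetic mutation with prob. p_m
    new[mutate] = rng.uniform(lower, upper)[mutate]
    memory[worst], fitness[worst] = new, objective(new)      # always replaces the worst
    return memory, fitness


def sphere(x):
    return float(np.sum((x - 0.3) ** 2))


rng = np.random.default_rng(0)
lower, upper = np.zeros(2), np.ones(2)
memory = rng.uniform(lower, upper, size=(5, 2))
fitness = np.array([sphere(h) for h in memory])
initial_best = fitness.min()
for _ in range(300):
    memory, fitness = nghs_step(memory, fitness, sphere, lower, upper, p_m=0.05, rng=rng)
```

Only one new harmony is evaluated per iteration, which is consistent with the lower number of objective calls reported for NGHS in Section 5.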

3.3 Particle swarm optimization

Particle swarm optimization can be seen as a search algorithm based on stochastic processes [24], in which the learning of social behavior allows each candidate solution (particle) to ‘fly’ through the search space as part of a swarm, looking for the particles with the best features and thus minimizing or maximizing the objective function.

Each particle has a memory that stores its best local solution (local maximum or minimum) and the best global solution (global maximum or minimum). Besides, each particle is able to imitate the others that reach the best positions in the swarm. This mechanism can be summarized in three principles: (i) evaluation, (ii) comparison, and (iii) imitation. Each particle evaluates the others within its neighborhood through some objective function, compares them with its own value, and finally decides whether it is a good choice to imitate them or not.

The swarm is modeled in a multidimensional space \(\mathbb{R}^{N}\), where each particle \(l_{i} = (\lambda_{i}, \kappa_{i})\) has two main features: (i) position \(\lambda_i \in \mathbb{R}^{N}\) and (ii) velocity \(\kappa_i \in \mathbb{R}^{N}\). The best local solution \(\widehat{\lambda_{i}}\) of each particle and the best global solution \(\widehat{G}\) (positions in the swarm) are also stored. After setting the size of the swarm (the number of particles), each particle is initialized with random values for both velocity and position. Each particle is then evaluated with respect to some objective function, and its best local solution is updated. The best global solution is updated with the position of the particle that reached the best value in the swarm. This process is repeated until some convergence criterion is met. The position and velocity of particle \(l_i\) at time step t+1 are updated by Equations 14 and 15, respectively:

$$ \lambda_{i}^{t+1} = \lambda_{i}^{t} + \kappa_{i}^{t}, \tag{14} $$

and

$$ \kappa_{i}^{t+1} = \Psi \kappa_{i}^{t} + c_{1} r_{1} (\widehat{\lambda}_{i} - \lambda_{i}^{t}) + c_{2} r_{2} (\widehat{G} - \lambda_{i}^{t}), \tag{15} $$

where Ψ is the inertia weight, which controls how much of the previous velocity is retained, and \(r_1, r_2 \in [0,1]\) are random variables that give PSO its stochastic behavior. The constants \(c_1\) and \(c_2\) (input parameters of the algorithm) weight the attraction towards the particle's own best position and towards the global best, respectively, guiding the particles towards good solutions.
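Equations 14 and 15 can be sketched as a single PSO iteration. Following common practice, the velocity (Equation 15) is updated first and the new velocity is then used to move the particle, and positions are clipped to the search bounds; both are assumptions of this sketch. The usage lines minimize a toy quadratic rather than an LDOF-based fitness, and the parameter values are arbitrary.

```python
import numpy as np


def pso_step(pos, vel, pbest, pbest_f, gbest, objective, inertia, c1, c2, lower, upper, rng):
    """One PSO iteration (minimization): Equation 15 (velocity), then Equation 14 (position)."""
    r1 = rng.uniform(0.0, 1.0, size=pos.shape)
    r2 = rng.uniform(0.0, 1.0, size=pos.shape)
    vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lower, upper)          # keep particles inside the search box
    f = np.array([objective(p) for p in pos])
    improved = f < pbest_f                          # personal (local) best update
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[np.argmin(pbest_f)].copy()        # global best update
    return pos, vel, pbest, pbest_f, gbest


def sphere(x):
    return float(np.sum((x - 0.3) ** 2))


rng = np.random.default_rng(0)
lower, upper = np.zeros(2), np.ones(2)
pos = rng.uniform(lower, upper, size=(20, 2))
vel = np.zeros_like(pos)
pbest, pbest_f = pos.copy(), np.array([sphere(p) for p in pos])
gbest = pbest[np.argmin(pbest_f)].copy()
for _ in range(100):
    pos, vel, pbest, pbest_f, gbest = pso_step(pos, vel, pbest, pbest_f, gbest, sphere,
                                               0.7, 1.5, 1.5, lower, upper, rng)
```

Every particle is evaluated at each iteration, so a swarm of 20 particles run for 200 iterations costs on the order of 4000 objective evaluations.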

3.4 Nelder-Mead method

Nelder-Mead is an iterative direct-search heuristic (it does not compute derivatives) used to find stationary points (minima or maxima) of multidimensional unconstrained functions [28]. This approach is commonly used in problems where the derivative is not known or is prohibitively expensive to compute.

Given a function \(f: \mathbb{R}^{n} \to \mathbb{R}\) and an initial guess \(\mathbf{x}^{0}\), the Nelder-Mead method creates a simplex \(\mathcal{S}^{0} = \{\mathbf{p}_{0}, \mathbf{p}_{1}, \ldots, \mathbf{p}_{n}\} \subset \mathbb{R}^{n}\) with n+1 sample points around \(\mathbf{x}^{0}\). There are different approaches to generate an initial simplex \(\mathcal{S}^{0}\), and its size can influence the solution obtained. In our implementation, we generate \(\mathcal{S}^{0}\) using the classical approach described by Equation 16:

$$\begin{array}{@{}rcl@{}} \mathbf{p}_{0} & = & \mathbf{x}^{0} \quad \text{and}\\ \mathbf{p}_{j} & = & \mathbf{p}_{0} + s\,\mathbf{e}_{j}, \quad j \in \{1, 2, \ldots, n\}, \end{array} \tag{16} $$

where s is the step size that determines the simplex size and \(\mathbf{e}_{j}\) is the jth canonical basis vector of \(\mathbb{R}^{n}\). Thus, every vertex \(\mathbf{p}_j\) of the initial simplex \(\mathcal{S}^{0}\) lies at distance s from \(\mathbf{p}_0\).

After constructing the simplex \(\mathcal{S}^{i}\), Nelder-Mead starts the iterative process to find a stationary point \(\mathbf{x}^{*}\). The first step is to compute all sample values \(f_j = f(\mathbf{p}_j)\), \(0 \leq j \leq n\). Next, we determine the indices w, v, and b of the worst, second worst, and best samples, respectively. Then, we compute the centroid \(\mathbf{c} = \frac{1}{n} \sum_{j \neq w} \mathbf{p}_{j}\) of all sample points except the worst one.

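Further, we compute the reflection point \(\mathbf{p}_r = \mathbf{c} + \vartheta(\mathbf{c} - \mathbf{p}_w)\) and its sample value \(f_r = f(\mathbf{p}_r)\). If \(f_b \leq f_r < f_v\), we replace \(\mathbf{p}_w\) by \(\mathbf{p}_r\) and the iteration ends. Otherwise, if \(f_r < f_b\), we compute the expansion point \(\mathbf{p}_e = \mathbf{c} + \varsigma(\mathbf{p}_r - \mathbf{c})\) and its sample value \(f_e = f(\mathbf{p}_e)\). If \(f_e < f_r\), we accept \(\mathbf{p}_e\) and discard \(\mathbf{p}_w\); otherwise, we accept \(\mathbf{p}_r\) and discard \(\mathbf{p}_w\). Finally, if \(f_r \geq f_v\), we compute the contraction point \(\mathbf{p}_c\) (Equation 17) using the better sample between \(\mathbf{p}_r\) and \(\mathbf{p}_w\):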

$$\begin{array}{@{}rcl@{}} \mathbf{p}_{c} & = & \mathbf{c} + \varphi(\mathbf{p}_{r} - \mathbf{c}) \qquad \text{if } f_{v} \leq f_{r} < f_{w},\\ \mathbf{p}_{c} & = & \mathbf{c} + \varphi(\mathbf{p}_{w} - \mathbf{c}) \qquad \text{if } f_{r} \geq f_{w}. \end{array} \tag{17} $$

We denote by \(\mathbf{p}_{brw}\) the point with the lowest sample value between \(\mathbf{p}_r\) and \(\mathbf{p}_w\), i.e., \(\mathbf{p}_{brw} = \mathbf{p}_r\) if \(f_r < f_w\), and \(\mathbf{p}_{brw} = \mathbf{p}_w\) otherwise. If \(f_c \leq f_{brw}\), we accept \(\mathbf{p}_c\); otherwise, it is necessary to create a new, shrunk simplex by updating the vertices as follows:

$$ \mathbf{p}_{j} \leftarrow \mathbf{p}_{b} + \rho\,(\mathbf{p}_{j} - \mathbf{p}_{b}), \quad \forall j \neq b, \tag{18} $$

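where \(j \in \{0, 1, \ldots, n\}\). The iterative process is repeated until the maximum number of iterations is reached or some convergence criterion is met. Notice that the Nelder-Mead algorithm has the following parameters: the reflection coefficient ϑ, the contraction coefficient φ, the shrink coefficient ρ, and the expansion coefficient ς.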
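The iteration described above can be condensed into a short routine. This is a minimal sketch under our own assumptions: the initial simplex uses the canonical basis vectors of Equation 16, the convergence test compares the best and worst sample values, and the default coefficients are the standard choices (reflection 1, expansion 2, contraction 1/2, shrink 1/2), which we believe correspond to those of Lagarias et al. [45]. It is not the implementation used in the experiments.

```python
import numpy as np


def nelder_mead(f, x0, step=0.1, refl=1.0, expa=2.0, contr=0.5, shrink=0.5,
                max_iter=100, tol=1e-10):
    """Minimize f with the Nelder-Mead steps of Section 3.4.

    refl, expa, contr and shrink stand for the reflection, expansion, contraction
    and shrink coefficients; the defaults are the standard values (1, 2, 1/2, 1/2).
    """
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    simplex = np.vstack([x0, x0 + step * np.eye(n)])        # Equation 16
    fs = np.array([f(p) for p in simplex])
    for _ in range(max_iter):
        order = np.argsort(fs)                               # after sorting: b = 0, v = n - 1, w = n
        simplex, fs = simplex[order], fs[order]
        if fs[-1] - fs[0] < tol:
            break
        c = simplex[:-1].mean(axis=0)                        # centroid without the worst point
        p_r = c + refl * (c - simplex[-1])
        f_r = f(p_r)
        if fs[0] <= f_r < fs[-2]:                            # accept the reflection
            simplex[-1], fs[-1] = p_r, f_r
        elif f_r < fs[0]:                                    # try an expansion
            p_e = c + expa * (p_r - c)
            f_e = f(p_e)
            simplex[-1], fs[-1] = (p_e, f_e) if f_e < f_r else (p_r, f_r)
        else:                                                # contraction, Equation 17
            p_brw, f_brw = (p_r, f_r) if f_r < fs[-1] else (simplex[-1], fs[-1])
            p_c = c + contr * (p_brw - c)
            f_c = f(p_c)
            if f_c <= f_brw:
                simplex[-1], fs[-1] = p_c, f_c
            else:                                            # shrink towards the best, Equation 18
                simplex[1:] = simplex[0] + shrink * (simplex[1:] - simplex[0])
                fs[1:] = [f(p) for p in simplex[1:]]
    best = int(np.argmin(fs))
    return simplex[best], fs[best]


x_min, f_min = nelder_mead(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2, [0.0, 0.0],
                           max_iter=500)
```

Since NM is deterministic, the only source of variation across runs is the initial guess, which is why random initial guesses are used in Section 4.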

4 Methodology

This section describes the experimental setup employed in this paper to validate the optimization algorithms for setting the parameters of LDOF. We used two well-known public datasets composed of image sequences and their respective ground truths: Middlebury [33,41] and Sintel [42,43], which have been frequently used to evaluate different OF methods [21,33]. The Middlebury dataset contains eight synthetic and laboratory sequences with dense ground truth (Figure 1), and the Sintel dataset contains synthetic, naturalistic video sequences (Figure 2).

Figure 1

Images from the Middlebury dataset used in the experiments. From left to right: Dimetrodon, Grove2, Grove3, Hydrangea, Urban2, Urban3, RubberWhale, and Venus.

Figure 2

Images from the Sintel dataset used in the experiments. From left to right: alley_1, ambush_2, bamboo_1, bandage_1, cave_2, market_2, mountain_1, shaman_1, sleeping_1, and temple_2.

We employed the LDOF technique (Section 2.1) together with our implementations of SSO, NGHS, PSO, and NM. We chose these techniques because they are easy to implement and have a low computational complexity, which helps to alleviate the high computational burden of optimizing LDOF parameters. For the sake of comparison, we computed the average ‘end point error’ (EPE) [44] over five runs of each optimization technique; the EPE is the Euclidean distance between the estimated and ground-truth optical flow vectors.

Let \(\mathbf{u}_e = (u_e, v_e)\) be the estimated optical flow and \(\mathbf{u}_{gt} = (u_{gt}, v_{gt})\) the ground-truth optical flow at a given pixel. The EPE can then be calculated as follows:

$$ \text{EPE} = \sqrt{(u_{e} - u_{gt})^{2} + (v_{e} - v_{gt})^{2}}. \tag{19} $$
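Equation 19 gives the EPE at one pixel; the fitness used in this paper averages it over pixels (and, in the global scheme below, over sequences). A minimal NumPy sketch follows, where the optional `valid` mask is our own addition, for instance to skip pixels without ground truth, and `average_epe` is a hypothetical name.

```python
import numpy as np


def average_epe(u_e, v_e, u_gt, v_gt, valid=None):
    """Mean of the per-pixel end point error of Equation 19.

    `valid` is an optional boolean mask, e.g. to skip pixels without ground truth.
    """
    epe = np.hypot(u_e - u_gt, v_e - v_gt)       # per-pixel Euclidean distance
    return float(epe[valid].mean()) if valid is not None else float(epe.mean())


u_e, v_e = np.full((2, 2), 3.0), np.full((2, 2), 4.0)
u_gt = v_gt = np.zeros((2, 2))
print(average_epe(u_e, v_e, u_gt, v_gt))  # 5.0
```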

Table 1 presents the parameters used for each technique. NM parameters were set according to the work of Lagarias et al. [45], and SSO, NGHS, and PSO parameters were fine-tuned according to the work of Pereira et al. [20]. Additionally, we employed LDOF with the parameters recommended by Brox and Malik [21], which we refer to here as the ‘baseline.’ We used 20 agents and 200 iterations for SSO, NGHS, and PSO, and 100 iterations for NM (see Endnote a). Since the solution of the NM algorithm is strongly influenced by the initial guess, we used random initial guesses for it.

Table 1 Parameters used for each optimization technique

Roughly speaking, the main idea is to find the set of LDOF parameters that minimizes the EPE measure. Therefore, instead of employing a random or empirical approach, we use an optimization framework to perform a faster and more reliable search for such parameters, with the EPE measure as the fitness function to be minimized. The experiments were divided into two rounds, as depicted in Figure 3. In the first round, we estimated the best set of parameters (the one with minimum EPE) by applying the aforementioned optimization algorithms to the eight Middlebury sequences. In the second round, we applied the same algorithms to ten image sequences from the Sintel dataset. To compare the optimization algorithms, we also employed the ‘baseline’ set of parameters proposed by Brox and Malik [21].

Figure 3

Methodology employed to validate the optimization algorithms.

The methodology employed in this paper differs from the one used by Pereira et al. [20], who optimized each dataset image individually, i.e., they fine-tuned LDOF for each image, and the final result was the average over all images considering the AAE metric. In this work, we conducted the optimization process over the whole dataset, i.e., we fine-tuned LDOF considering all images of the dataset at the same time. Therefore, the fitness function adopted in this work is the average of the EPE values over all dataset images.
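The global fitness described above can be sketched as a function of the four LDOF parameters. `run_flow` and `epe` are hypothetical callables supplied by the user (for instance, a wrapper around an LDOF executable and the EPE of Equation 19); the `fake_flow` stub in the usage lines only stands in for LDOF. By contrast, the local scheme of Pereira et al. [20] would evaluate such a function on a single sequence at a time.

```python
import numpy as np


def global_fitness(params, sequences, run_flow, epe):
    """Fitness of one parameter vector (sigma, alpha, beta, gamma): mean EPE over all sequences.

    run_flow(frame1, frame2, sigma, alpha, beta, gamma) -> (u, v) and
    epe(u, v, u_gt, v_gt) -> float are hypothetical callables supplied by the user.
    """
    sigma, alpha, beta, gamma = params
    errors = [epe(*run_flow(frame1, frame2, sigma, alpha, beta, gamma), u_gt, v_gt)
              for frame1, frame2, u_gt, v_gt in sequences]
    return float(np.mean(errors))


def fake_flow(frame1, frame2, sigma, alpha, beta, gamma):
    # hypothetical stand-in for LDOF: returns a constant flow (alpha, beta)
    return np.full(frame1.shape, alpha), np.full(frame1.shape, beta)


def mean_epe(u, v, u_gt, v_gt):
    return float(np.mean(np.hypot(u - u_gt, v - v_gt)))


sequences = [(np.zeros((2, 2)), np.zeros((2, 2)), np.zeros((2, 2)), np.zeros((2, 2))),
             (np.zeros((3, 3)), np.zeros((3, 3)), np.ones((3, 3)), np.zeros((3, 3)))]
value = global_fitness((1.0, 3.0, 4.0, 0.5), sequences, fake_flow, mean_epe)
```

Any of the optimizers sketched in Section 3 can then minimize `lambda p: global_fitness(p, sequences, run_flow, epe)` over a box of admissible parameter values.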

5 Experimental results

This section presents the results obtained by SSO, NGHS, PSO, and NM for optical flow parameter optimization. We stress that we did not consider the runtime (computational load) in the fitness function, since our goal is to minimize the EPE metric only, even though the parameters to be optimized strongly influence both EPE and runtime.

In regard to the first round of experiments, Table 2 shows the EPE values obtained by SSO, NGHS, PSO, NM, and the LDOF baseline on the eight ground-truth image sequences of the Middlebury dataset. PSO obtained the best average result, with an EPE of 0.325, followed by SSO (EPE of 0.330); both methods outperformed the baseline. Additionally, PSO obtained the best results in three out of eight Middlebury sequences (RubberWhale, Urban3, and Urban2), SSO in three out of eight (Dimetrodon, Grove3, and Venus), and the baseline in two (Grove2 and Hydrangea). NGHS and NM did not achieve the best result in any image sequence. Although Brox and Malik [21] did not present the methodology used to find the baseline parameters, this experiment highlights the need for fine-tuning the parameters with optimization algorithms.

Table 2 Results obtained by SSO, NGHS, PSO, NM, and LDOF baseline [21] over Middlebury dataset

In regard to the second round of experiments, Table 3 shows the EPE values obtained by SSO, NGHS, PSO, NM, and the baseline on ten image sequences of the Sintel dataset. In this experiment, we used the first two frames of the following sequences: alley_1, ambush_2, bamboo_1, bandage_1, cave_2, market_2, mountain_1, shaman_1, sleeping_1, and temple_2. The optimization techniques presented similar results, all of them more accurate than the baseline (except for cave_2, where the baseline approach achieved results similar to PSO). PSO obtained the best results in four out of ten Sintel sequences (alley_1, bamboo_1, cave_2, and temple_2), SSO in three out of ten (bandage_1, market_2, and shaman_1), and NGHS in one out of ten (ambush_2).

Table 3 Results obtained by SSO, NGHS, PSO, NM, and LDOF baseline [21] over Sintel dataset

Figure 4 depicts the average EPE values over all sequences of the Middlebury and Sintel datasets, as well as the average between the two. Considering the average results of both experiments, all optimization techniques obtained better results than the baseline, which reinforces the main point of this work: the importance of using optimization algorithms to fine-tune the parameters of OF-based techniques. Furthermore, the experiments suggest that the parameters should be selected specifically for each dataset or application.

Figure 4

The results obtained on Middlebury and Sintel datasets.

An additional experiment assessed the computational load of each technique, measured here as the number of calls to the LDOF algorithm (Figure 5). If a fast model selection is required, the best approach might be NGHS, since it obtained reasonable results with less computational effort than the swarm-based approaches. However, for an off-line fine-tuning, both SSO and PSO seem to be interesting approaches, with the former being slightly more accurate.
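Since each fitness evaluation runs LDOF, the computational load shown in Figure 5 can be measured by counting evaluations. One simple way to do this in Python is a small decorator; the `fitness` function below is a hypothetical stand-in for an LDOF-based evaluation, and the decorator name is our own.

```python
import functools


def count_calls(func):
    """Decorator that counts how many times an (expensive) objective is evaluated."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def fitness(params):
    # hypothetical stand-in for one LDOF-based fitness evaluation
    return sum(p * p for p in params)


for candidate in ([0.1, 0.2], [0.3, 0.4], [0.5, 0.6]):
    fitness(candidate)
print(fitness.calls)  # 3
```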

Figure 5 The number of calls to the LDOF algorithm made by each optimization algorithm.

6 Conclusions

In this paper, we have evaluated optimization algorithms for model selection in optical flow-based applications, which play an important role in computer vision systems. The experimental section compared the baseline parameters proposed by Brox and Malik [21] against those found by four optimization techniques: SSO, NGHS, PSO, and NM. Two rounds of experiments were conducted over the well-known Middlebury and Sintel datasets: (i) the first round aimed at learning the best set of parameters (i.e., the one that minimizes the end-point error criterion) over the Middlebury dataset, and (ii) the second round did the same over the Sintel dataset. In the first round, two optimization algorithms (SSO and PSO) achieved better results than the baseline parameters, and in the second round, all optimization algorithms did. Therefore, this paper highlights the need for automatic fine-tuning of the parameters of optical flow techniques. In addition, the computational load of the compared techniques was assessed in terms of the number of calls to the LDOF technique, evidencing the lower computational burden of the NGHS and NM techniques.
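The model selection loop summarized above can be sketched with a minimal global-best PSO [24] that searches a box of admissible parameter values and keeps the setting with the lowest objective value. In the experiments the objective would run LDOF with the candidate parameters and return the EPE; here it is replaced by a hypothetical quadratic stand-in, and the inertia and acceleration coefficients (`w`, `c1`, `c2`) are common textbook-style defaults rather than the values used by the authors.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 10,
    n_iterations: int = 30,
    w: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimal global-best PSO over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Keep each parameter inside its admissible range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])  # one (expensive) evaluation per particle
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    # Hypothetical stand-in for "run LDOF and return EPE": minimum at (0.5, 2.0).
    target = (0.5, 2.0)
    surrogate = lambda p: sum((x - t) ** 2 for x, t in zip(p, target))
    best, value = pso_minimize(surrogate, bounds=[(0.0, 1.0), (0.0, 5.0)])
    print(best, value)
```

Replacing the stand-in with a function that runs the flow solver and scores its output against ground truth turns this into the kind of parameter search compared in this paper; each call then costs one flow estimation, which is why the number of calls is a natural cost measure.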

7 Endnote

a The number of agents and iterations was chosen based on previous experiments [20].

References

  1. W Li, D Cosker, M Brown, T R, in IEEE Conference on Computer Vision and Pattern Recognition. Optical flow estimation using Laplacian mesh energy (IEEE Press, DC, USA, 2013), pp. 2435–2442.

  2. R Ranftl, K Bredies, T Pock, in European Conference on Computer Vision. Lecture Notes in Computer Science, 8689, ed. by D Fleet, T Pajdla, B Schiele, and T Tuytelaars. Non-local total generalized variation for optical flow estimation (Springer, New York, 2014), pp. 439–454.

  3. J An, SJ Ha, NI Cho, Probabilistic motion pixel detection for the reduction of ghost artifacts in high dynamic range images from multiple exposures. EURASIP J. Image Video Process. 2014(1), 1–15 (2014).

  4. M Hornáček, F Besse, J Kautz, A Fitzgibbon, C Rother, in European Conference on Computer Vision. Lecture Notes in Computer Science, 8691, ed. by D Fleet, T Pajdla, B Schiele, and T Tuytelaars. Highly overparameterized optical flow using patchmatch belief propagation (Springer, New York, 2014), pp. 220–234.

  5. M Narayana, A Hanson, E Learned-Miller, in IEEE International Conference on Computer Vision. Coherent motion segmentation in moving camera videos using optical flow orientations (IEEE Press, DC, USA, 2013), pp. 1577–1584.

  6. E Ilg, R Kümmerle, W Burgard, T Brox, in IEEE International Conference on Robotics and Automation. Reconstruction of rigid body models from motion distorted laser range data using optical flow (IEEE Press, DC, USA, 2014), pp. 1–6.

  7. G Dongmin, AL van de Ven, X Zhou, Red blood cell tracking using optical flow methods. IEEE J. Biomed. Health Informatics. 18(3), 991–998 (2014).

  8. S Liu, L Yuan, P Tan, J Sun, in IEEE Conference on Computer Vision and Pattern Recognition. SteadyFlow: spatially smooth optical flow for video stabilization (IEEE Press, DC, USA, 2014), pp. 4209–4216.

  9. F Valentinotti, G Di Caro, B Crespi, Real-time parallel computation of disparity and optical flow using phase difference. Machine Vision Appl. 9(3), 87–96 (1996).

  10. M Fleury, AF Clark, AC Downton, Evaluating optical-flow algorithms on a parallel machine. Image Vision Comput. 19(3), 131–143 (2001).

  11. A Garcia-Dopico, JL Pedraza, M Nieto, A Pérez, S Rodríguez, J Navas, Parallelization of the optical flow computation in sequences from moving cameras. EURASIP J. Image Video Process. 2014(1), 1–19 (2014).

  12. D Sun, S Roth, MJ Black, A quantitative analysis of current practices in optical flow estimation and the principles behind them. Int. J. Comput. Vision. 106(2), 115–137 (2014).

  13. BKP Horn, BG Schunck, Determining optical flow. Artif. Intell. 17(1–3), 185–203 (1981).

  14. N Onkarappa, AD Sappa, Speed and texture: an empirical study on optical-flow accuracy in ADAS scenarios. IEEE Trans. Intell. Transportation Syst. 15, 136–147 (2014).

  15. P Heas, C Herzet, E Memin, Bayesian inference of models and hyperparameters for robust optical-flow estimation. IEEE Trans. Image Process. 21(4), 1437–1451 (2012).

  16. K Krajsek, R Mester, in Pattern Recognition. Lecture Notes in Computer Science, 4713, ed. by FA Hamprecht, C Schnörr, and B Jähne. Bayesian model selection for optical flow estimation (Springer, New York, 2007), pp. 142–151.

  17. Y Li, DP Huttenlocher, in European Conference on Computer Vision. Lecture Notes in Computer Science, 5303, ed. by D Forsyth, P Torr, and A Zisserman. Learning for optical flow using stochastic optimization (Springer, New York, 2008), pp. 379–391.

  18. S Roth, MJ Black, On the spatial statistics of optical flow. Int. J. Comput. Vis. 74(1), 33–50 (2007).

  19. J Delpiano, L Pizarro, R Verschae, J Ruiz-del-Solar, in 9th International Conference on Computer Vision Theory and Applications, 2. Multi-objective optimization for characterization of optical flow methods (IEEE Press, Odense, DK, 2014), pp. 556–573.

  20. DR Pereira, J Delpiano, JP Papa, in 27th SIBGRAPI Conference on Graphics, Patterns and Images. Evolutionary optimization applied for fine-tuning parameter estimation in optical flow-based environments (SciTePress, DC, USA, 2014), pp. 125–132.

  21. T Brox, J Malik, Large displacement optical flow: descriptor matching in variational motion estimation. IEEE Trans. Pattern Anal. Mach. Intell. 33, 500–513 (2011).

  22. E Cuevas, M Cienfuegos, D Zaldívar, M Pérez-Cisneros, A swarm optimization algorithm inspired in the behavior of the social-spider. Expert Syst. Appl. 40(16), 6374–6384 (2013).

  23. ZW Geem, Music-Inspired Harmony Search Algorithm: Theory and Applications (Springer, New York, 2009).

  24. J Kennedy, R Eberhart, in Proceedings of the IEEE International Conference on Neural Networks. Particle swarm optimization (IEEE Press, DC, USA, 1995), pp. 1942–1948.

  25. L Xu, J Jia, Y Matsushita, Motion detail preserving optical flow estimation. IEEE Trans. Pattern Anal. Mach. Intell. 34(9), 1744–1757 (2012).

  26. D Sun, S Roth, MJ Black, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2010. Secrets of optical flow estimation and their principles (IEEE, DC, USA, 2010), pp. 2432–2439.

  27. M Werlberger, W Trobin, T Pock, A Wedel, D Cremers, H Bischof, in Proceedings of the British Machine Vision Conference (BMVC), London, UK. Anisotropic Huber-L1 optical flow (BMVA Press, Durham, 2009).

  28. JA Nelder, R Mead, A simplex method for function minimization. Comput. J. 7, 308–313 (1965).

  29. A Bruhn, J Weickert, C Schnörr, Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods. Int. J. Comput. Vis. 61(3), 211–231 (2005).

  30. N Cornelius, T Kanade, in Proc. of the ACM SIGGRAPH/SIGART Interdisciplinary Workshop on Motion: Representation and Perception. Adapting optical-flow to measure object motion in reflectance and x-ray image sequences (Elsevier North-Holland, NY, USA, 1986).

  31. B Lucas, T Kanade, in Proceedings of the 7th International Joint Conference on Artificial Intelligence. An iterative image registration technique with an application to stereo vision (Morgan Kaufmann Publishers Inc., CA, USA, 1981).

  32. JL Barron, DJ Fleet, SS Beauchemin, Performance of optical flow techniques. Int. J. Comput. Vis. 12, 43–77 (1994).

  33. S Baker, D Scharstein, JP Lewis, S Roth, M Black, R Szeliski, A database and evaluation methodology for optical flow. Int. J. Comput. Vis. 92(1), 1–31 (2011). doi:10.1007/s11263-010-0390-2.

  34. A Geiger, P Lenz, R Urtasun, in IEEE Conference on Computer Vision and Pattern Recognition. Are we ready for autonomous driving? The KITTI vision benchmark suite (IEEE Press, DC, USA, 2012).

  35. A Bruhn, J Weickert, A multigrid platform for real-time motion computation with discontinuity-preserving variational methods. Int. J. Comput. Vis. 70, 257–277 (2006).

  36. H Liu, T Hong, M Herman, R Chellappa, Accuracy vs. efficiency trade-offs in optical flow algorithms. Comp. Vision Image Underst. 72(3), 271–286 (1996).

  37. D Gibson, M Spann, Robust optical flow estimation based on a sparse motion trajectory set. IEEE Trans. Image Process. 12(4), 431–445 (2003).

  38. LDOF implementation. http://lmb.informatik.uni-freiburg.de/resources/software.php. Accessed 4 Nov 2014.

  39. ZW Geem, Music-Inspired Harmony Search Algorithm: Theory and Applications (Springer, New York, 2009).

  40. D Zou, L Gao, J Wu, S Li, Novel global harmony search algorithm for unconstrained problems. Neurocomputing. 73, 3308–3318 (2010).

  41. Middlebury dataset. http://vision.middlebury.edu/flow/data/. Accessed 4 Nov 2014.

  42. Sintel dataset. sintel.is.tue.mpg.de/. Accessed 4 Nov 2014.

  43. DJ Butler, J Wulff, GB Stanley, MJ Black, in European Conference on Computer Vision. Part IV, LNCS 7577, ed. by A Fitzgibbon, et al. A naturalistic open source movie for optical flow evaluation (Springer, New York, 2012), pp. 611–625.

  44. M Otte, H-H Nagel, in European Conference on Computer Vision. Lecture Notes in Computer Science, 800, ed. by J-O Eklundh. Optical flow estimation: advances and comparisons (Springer, New York, 1994), pp. 49–60.

  45. JC Lagarias, JA Reeds, MH Wright, PE Wright, Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM J. Optim. 9, 112–147 (1998).

Acknowledgements

The authors are grateful to FAPESP grants #2013/20387-7 and #2014/16250-9, CNPq grants #303182/2011-3, #470571/2013-6, and #306166/2014-3, and Universidad de los Andes FAI grant #05/2013.

Author information

Corresponding author

Correspondence to Danillo R Pereira.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Pereira, D.R., Delpiano, J. & Papa, J.P. On the optical flow model selection through metaheuristics. J Image Video Proc. 2015, 11 (2015). https://doi.org/10.1186/s13640-015-0066-5

  • DOI: https://doi.org/10.1186/s13640-015-0066-5

Keywords