
Online motion smoothing for video stabilization via constrained multiple-model estimation


Video stabilization smooths camera motion estimates in a way that should adapt to different types of intentional motion. Corrective motion (the difference between smoothed and original motions) should be constrained so that black borders do not intrude into the (cropped) stabilized frames. Although offline smoothing can use all of the frames, online (real-time) smoothing can only use a small number of previous frames. In this paper, we propose an online motion smoothing method based on linear estimation applied to a constant-velocity model. We use estimate projection to ensure that the smoothed motion satisfies black-border constraints, which are modeled exactly by linear inequalities for general 2D motion models. We then combine the estimate projection with multiple-model estimation, which can adaptively smooth the camera motion in a probabilistic way. Experimental results show how the proposed algorithm can better smooth the camera motion and stabilize videos in real time.

1 Introduction

Video data has increased dramatically in recent years due to the prevalence of hand-held cameras. Such videos, however, are usually shakier compared to videos shot by tripod-mounted cameras or cameras with mechanical stabilizers. Digital video stabilization seeks to remove the unwanted frame-to-frame jitter and generate visually stable and pleasant videos. In general, digital video stabilization consists of three major steps, namely motion estimation, motion smoothing, and frame synthesis. This paper focuses on the second step.

Given the estimated camera motion for each frame, motion smoothing aims at designing a new smooth camera motion path. Most existing works treat motion smoothing as an offline process performed after the entire video sequence has been recorded. However, real-time video stabilization is necessary for applications such as video conferencing and broadcasting. Besides, for consumers who want to record videos, real-time stabilization can greatly improve the user experience because the stabilized video is displayed in real time on the viewfinder. Real-time video stabilization can also reduce memory requirements, since frames are stabilized before compression. In real-time video stabilization, camera motion must be smoothed in a causal way. This is more difficult than offline motion smoothing because no information is available about how the camera motion will change in later frames.

Because motion smoothing changes the camera motion, some areas in the synthesized frame will be undefined. This is known as the black-border problem. In practice, we have to crop the resulting video frames and enlarge them if necessary. Still, in motion smoothing, we have to constrain the change of camera motion in order to guarantee that no black borders intrude into the stabilized video frames. How to take such a constraint into account optimally is a challenging problem, especially for online motion smoothing.

While taking videos, people sometimes move cameras on purpose to get the best viewpoint of the scene that is being recorded. This is known as intentional motion and should be preserved by motion smoothing. The rate of change of the intentional camera motion may vary, and a fixed motion smoothing strategy may not work well. For instance, aggressive motion smoothing can effectively reduce the jitter if the camera is supposed to be still, but may lose track of the intentional motion if it is changing fast. Moreover, aggressive motion smoothing for fast-changing intentional motion leaves larger areas undefined in the stabilized video. As a result, the motion smoothing algorithm should adapt to the intentional motion.

In this paper, we propose an online motion smoothing method. Our method is motivated by Kalman-filtering-based motion smoothing with a constant-velocity model. We use Bayesian multiple-model estimation to achieve adaptive smoothing. The black-border constraints are exactly modeled as linear inequalities for almost every 2D motion model. We change the multiple-model estimation algorithm by taking the constraints into account. The state vector estimates are projected onto the constraint set in a probabilistic way.

This paper is organized as follows: Section 2 reviews previous motion smoothing algorithms and related estimation background. Section 3 shows how online motion smoothing can be formulated as a linear estimation problem with constant-velocity model that can be solved by Kalman filtering. Section 4.1 formulates the black-border constraints as linear inequalities for most of the 2D camera motion models, and shows how estimate projection can be used to solve the constrained estimation problem. Section 4.2 presents the proposed adaptive motion smoothing using multiple-model estimation and how to modify it with estimate projection. Section 5 shows how motion smoothing is improved using the proposed algorithm of multiple motion models. Section 6 concludes the paper.

2 Background and related work

This paper focuses on motion smoothing. Motion estimation, as the other essential step in video stabilization, can be implemented by sparse feature tracking [1] or block matching [2, 3].

Most existing motion smoothing algorithms work offline. Gaussian window filtering was used to smooth the entire camera motion path in [4, 5] under 2D translational and affine models, respectively. Another class of algorithms smooths the camera motion by minimizing an objective function that represents the smoothness of the camera motion trajectory. An advantage of such objective-minimizing methods is that the black-border constraints can be naturally added to the problem and solved by constrained optimization. In [6], the authors defined the objective function as the \(L_{2}\) norm of the second-order difference of camera motion under a 2D Euclidean model. The black-border constraint was approximately modeled by an interval constraint on the motion parameters. Similar modeling was also used in [7], but the variables were assumed to be integer-valued and the problem was solved via dynamic programming. In [8], the objective function was a mixture of the first-, second-, and third-order differences of camera motion measured by the \(L_{1}\) norm. The motion model was 2D similarity motion, and the black-border constraint was modeled precisely as linear inequalities. As a result, the constrained motion smoothing could be solved efficiently by linear programming. The black-border constraint was also taken into consideration in window-filtering-based methods. In [9], the authors proposed a dual-pass motion smoothing method that could find an optimal cropping size that is as large as possible.

In [10], IIR filtering was proposed for online motion smoothing based on a 2D translational motion model. Kalman filtering was first used for online smoothing in [11]. The intentional motion parameters (under a 2D translational motion model) were modeled by a constant-velocity linear system so that Kalman filtering could be used to optimally estimate them. The same Kalman-filtering motion smoothing framework was extended to a 2D affine motion model in [12], leading to better performance. The same algorithm was widely used in later video stabilization works, such as [1, 13]. These algorithms used fixed parameters in stabilizing the entire video sequence, which is not ideal since the magnitude of unintentional camera motion (jitter) may vary. Adaptive algorithms were proposed for online motion smoothing to resolve this problem. In [14], a fuzzy system was used to tune the parameters in an IIR motion filter. A similar fuzzy system was also proposed in [15] to improve the Kalman-filtering-based method. Another adaptive Kalman-filtering method was proposed in [16] by detecting the number of zero crossings of the motion parameters. In our paper, we adaptively estimate the intentional camera motion using dynamic multiple-model estimation, which is a generalization of single-mode Kalman filtering [17]. The interacting multiple-model (IMM) algorithm [18] has been widely used to solve such problems due to its excellent performance and relatively low computational requirements [19]. Compared to previous adaptive Kalman-filtering-based motion smoothing algorithms, the proposed dynamic multiple-model estimation is able to choose the proper parameters optimally from a probabilistic viewpoint. We modify the existing unconstrained IMM algorithm with estimate projection so that we can smooth the camera motion adaptively while guaranteeing no black borders.

The black-border constraints were rarely considered in online motion smoothing. In [20], the authors proposed to use constrained Kalman filtering for 2D translational motion model. Because of the simplicity of the motion model, interval constraints could be used and the constrained estimate could be obtained in one step. For more complicated motion models, interval constraints are not able to model the black-border constraints accurately. While [21] still used interval constraints to approximately solve the black-border problem for more complicated motion models, in this paper, we use an exact linear inequality modeling of the constraints without any approximation for complicated motion models like similarity motion and affine motion. We solve the constrained estimation by estimate projection as proposed in [22]. For a more comprehensive survey of constrained Kalman-filtering algorithms, please see [23].

3 Kalman filter-based motion smoothing

A Kalman filter is an optimal maximum a posteriori (MAP) estimator for linear dynamic systems with Gaussian process and measurement noise. Without loss of generality, we assume that there is no control input to the system. The system can be represented as

$$ \left\{\begin{array}{l} \mathbf{x}_{k} = \mathbf{F}_{k}\mathbf{x}_{k-1} + \mathbf{w}_{k} \\ \mathbf{z}_{k} = \mathbf{H}_{k}\mathbf{x}_{k} + \mathbf{v}_{k} \end{array},\right. $$

where \(\mathbf{x}_{k}\) is the hidden state vector at time k and \(\mathbf{z}_{k}\) is the observation (or measurement) at time k. \(\mathbf{F}_{k}\) is the state transition model, which is applied to the previous state \(\mathbf{x}_{k-1}\). \(\mathbf{H}_{k}\) is the observation model, which maps the true state space into the observed space. \(\mathbf{w}_{k} \sim \mathcal{N}(\mathbf{0}; \mathbf{Q}_{k})\) and \(\mathbf{v}_{k} \sim \mathcal{N}(\mathbf{0}; \mathbf{R}_{k})\) model the normally distributed process noise and observation noise. A Kalman filter recursively estimates the Gaussian posterior probability \(p(\mathbf{x}_{k} | \mathbf{z}_{1:k})\) by tracking its mean \(\hat{\mathbf{x}}_{k}\) and covariance \(\mathbf{P}_{k}\). The Kalman-filtering algorithm can be summarized as Algorithm 1.

Kalman filtering with a constant-velocity (CV) system model has been widely used in tracking maneuvering targets. Assuming one dimension of the target location to be tracked is \(x_{k}\), the CV model uses a state vector \(\mathbf{x}_{k} = [x_{k}, v_{k}]^{\mathrm{T}}\) consisting of both \(x_{k}\) and the velocity in this dimension \(v_{k}\). The dynamic model is specified as

$$ \mathbf{F}_{k} = \left[\begin{array}{ll} 1 & T\\ 0 & 1 \end{array}\right], \quad \mathbf{w}_{k} \sim \mathcal{N}\left(\mathbf{0}; \sigma_{p}^{2} \left[\begin{array}{ll} \frac{T^{4}}{4} & \frac{T^{3}}{2} \\ \frac{T^{3}}{2} & T^{2} \end{array}\right]\right), $$

where T is the sampling interval. In this model, the velocity is assumed to be almost constant except for a possible acceleration (maneuvering) with distribution \( \mathcal {N}(0;\sigma _{p}^{2})\). Usually, we have a noisy measurement of the location of the target, so the measurement model can be specified as

$$ \mathbf{H}_{k} = \left[\begin{array}{ll} 1 & 0 \end{array}\right], \mathbf{v}_{k}\sim \mathcal{N}\left(0; \sigma_{m}^{2}\right). $$

The aforementioned single-dimensional CV model can be easily generalized to a multi-dimensional CV model. The Kalman filter estimate of the target locations from the CV model usually appears much smoother compared to the original noisy location measurements due to the constant-velocity assumption in the dynamic model. Therefore, this model has been successfully used in causal smoothing of time series such as the camera motion.
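As an illustration of the single-dimensional CV model in (2) and (3), the following is a minimal NumPy sketch of causal smoothing with a standard Kalman predict/update cycle. The function names, the initial state and covariance, and the default values of `sigma_p` and `sigma_m` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def cv_model(T, sigma_p, sigma_m):
    """Matrices of the one-dimensional constant-velocity model in (2) and (3)."""
    F = np.array([[1.0, T], [0.0, 1.0]])
    Q = sigma_p**2 * np.array([[T**4 / 4, T**3 / 2],
                               [T**3 / 2, T**2]])
    H = np.array([[1.0, 0.0]])
    R = np.array([[sigma_m**2]])
    return F, Q, H, R

def kalman_step(x, P, z, F, Q, H, R):
    """One standard predict/update cycle; returns the new mean and covariance."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    y = z - H @ x_pred                    # innovation
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

def smooth_causally(z_seq, T=1.0, sigma_p=0.01, sigma_m=1.0):
    """Filter a scalar motion parameter frame by frame (T in frame units)."""
    F, Q, H, R = cv_model(T, sigma_p, sigma_m)
    x = np.array([z_seq[0], 0.0])   # assumed initial state: first sample, zero velocity
    P = 10.0 * np.eye(2)            # assumed (loose) initial covariance
    out = []
    for z in z_seq:
        x, P = kalman_step(x, P, np.array([z]), F, Q, H, R)
        out.append(x[0])
    return np.array(out)
```

Each output sample depends only on the current and earlier measurements, which is what makes the smoothing causal.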

Given a reference frame, the camera motion of the entire video sequence can be represented as a sequence of motion parameters dependent on the choice of motion model. For instance, a 2D affine model depicts the relative transformation between two frames as

$$ \left[\begin{array}{ll} x^{\prime} \\ y^{\prime} \end{array}\right] = \mathbf{A} \left[\begin{array}{ll} x \\ y \end{array}\right] + \mathbf{b}, $$

where \([x, y]^{\mathrm{T}}\) and \([x^{\prime}, y^{\prime}]^{\mathrm{T}}\) are the locations of any pair of matched pixels in the two frames. Therefore, the camera motion of each frame k can be represented by a 2×2 matrix \(\mathbf{A}_{k}\) and a 2×1 vector \(\mathbf{b}_{k}\). In general, the camera motion of the video can be parameterized as a sequence of motion vectors \(\{\boldsymbol{\theta}_{k}\}\). This sequence can then be smoothed via the CV-model-based Kalman filtering by setting the state vector as \([\boldsymbol{\theta}_{k}, \dot{\boldsymbol{\theta}}_{k}]^{\mathrm{T}}\), where \(\dot{\boldsymbol{\theta}}_{k}\) is the discrete changing rate (velocity) of the camera motion.

4 The proposed methods

The aforementioned constant-velocity Kalman-filtering algorithm effectively smooths the camera motion sequences for online video stabilization. However, it is not constrained to avoid black borders in the stabilized frames. In addition, when the intentional camera motion changes at different rates, a single constant-velocity model is not able to accurately track it. We propose an online motion smoothing algorithm that resolves both problems. Black-border constraints are modeled as linear inequalities and enforced by estimate projection. Interacting multiple-model estimation is used to smooth the camera motion adaptively, which single-mode Kalman filtering cannot achieve.

4.1 Constraints on motion smoothing and constrained Kalman filtering

The smoothed camera motion generates a correction motion for each frame. In the last step of video stabilization, the new frames are synthesized by image warping using the correction motion. The synthesized frames may contain black borders since not every pixel in the synthesized frame is visible in the original frame due to the change of camera motion. As discussed in Section 2, a safe way to handle this problem is to crop the synthesized frames to a smaller size so that there are no black borders in the stabilized video. Therefore, in smoothing the camera motion sequence, we need to guarantee that every pixel in the cropped stabilized frames is visible in the original frames. This is a hard constraint that has to be considered in the camera motion smoothing algorithm.

For almost all kinds of 2D motion models, the constraints on the camera motion parameters for each frame can be expressed as a set of linear inequality constraints. Thus, the system we are facing becomes

$$ \left\{\begin{array}{ll} \mathbf{x}_{k} = \mathbf{F}_{k}\mathbf{x}_{k-1} + \mathbf{w}_{k} \\ \mathbf{z}_{k} = \mathbf{H}_{k}\mathbf{x}_{k} + \mathbf{v}_{k} \end{array}\right. \mathrm{s.t.} \boldsymbol{\Psi}_{k}\mathbf{x}_{k} \leq \boldsymbol{\beta}_{k}. $$

The state constraints have to be taken into account in Kalman filtering. We tackle this problem with an efficient method known as estimate projection. The idea is to project the unconstrained state estimate \(\hat {\mathbf {x}}_{k}\) of the Kalman filter onto the constraint set. The constrained estimate can be written as

$$ \tilde{\mathbf{x}}_{k} = \text{argmin}_{\mathbf{x}}(\mathbf{x} - \hat{\mathbf{x}}_{k})^{\mathrm{T}}\mathbf{W}(\mathbf{x} - \hat{\mathbf{x}}_{k}), \mathrm{s.t.} \boldsymbol{\Psi}_{k}\mathbf{x} \leq \boldsymbol{\beta}_{k}, $$

where W is a positive-definite weighting matrix. Usually, W is chosen as the inverse of the unconstrained covariance matrix estimate, \(\mathbf{P}_{k}^{-1}\). In this way, the solution \(\tilde{\mathbf{x}}_{k}\) maximizes the probability density function (pdf) of the original unconstrained estimate subject to the state constraints. Note that (6) is a convex quadratic programming (QP) problem with linear inequality constraints. We solve it using the active set method, which searches for the constraints that are active at the optimal solution. For each trial set of active constraints, the problem reduces to a QP with linear equality constraints, which can be solved analytically in one step using the method of Lagrange multipliers. Details of the active set method for convex QP are given in [24].
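The following sketch illustrates the projection in (6) with \(\mathbf{W} = \mathbf{P}_{k}^{-1}\). For each candidate active set it uses the closed-form Lagrange-multiplier solution \(\tilde{\mathbf{x}} = \hat{\mathbf{x}} - \mathbf{P}\mathbf{D}^{\mathrm{T}}(\mathbf{D}\mathbf{P}\mathbf{D}^{\mathrm{T}})^{-1}(\mathbf{D}\hat{\mathbf{x}} - \mathbf{d})\). Note that it is not the active set method of [24]: as a simplification for illustration, it enumerates every subset of constraints and keeps the cheapest feasible candidate, which is only practical for a handful of constraints. All function names are hypothetical.

```python
import itertools
import numpy as np

def project_equality(x_hat, P, D, d):
    """Minimise (x - x_hat)^T P^{-1} (x - x_hat) subject to D x = d."""
    S = D @ P @ D.T
    # pinv tolerates linearly dependent rows in the candidate active set
    return x_hat - P @ D.T @ np.linalg.pinv(S) @ (D @ x_hat - d)

def project_estimate(x_hat, P, Psi, beta, tol=1e-8):
    """Solve (6) with W = P^{-1} by brute force over candidate active sets."""
    if np.all(Psi @ x_hat <= beta + tol):
        return x_hat.copy()               # unconstrained estimate is feasible
    W = np.linalg.inv(P)
    best, best_cost = None, np.inf
    m = len(beta)
    for r in range(1, m + 1):
        for idx in itertools.combinations(range(m), r):
            rows = list(idx)
            x = project_equality(x_hat, P, Psi[rows], beta[rows])
            if np.all(Psi @ x <= beta + tol):   # keep feasible candidates only
                e = x - x_hat
                cost = e @ W @ e
                if cost < best_cost:
                    best, best_cost = x, cost
    if best is None:
        raise ValueError("no feasible candidate found")
    return best
```

For a state \([x, v]^{\mathrm{T}}\) with estimate \([12, 3]^{\mathrm{T}}\), covariance rows \([4, 1]\) and \([1, 2]\), and the constraint \(0 \leq x \leq 10\), the sketch returns \([10, 2.5]^{\mathrm{T}}\): the velocity is adjusted as well because it is correlated with the position.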

In the next subsections, we show the modeling of the visibility constraint as a set of linear inequality constraints for three different camera motion models:

4.1.1 Affine motion

Under affine motion, a pixel [x,y]T in frame k can be transformed to location

$$ \left[\begin{array}{ll} x^{\prime} \\ y^{\prime} \end{array}\right] = \left[\begin{array}{ll} a_{k}^{0} & a_{k}^{1} \\ a_{k}^{2} & a_{k}^{3} \end{array}\right] \left[\begin{array}{ll} x \\ y \end{array}\right] + \left[\begin{array}{ll} b_{k}^{0}\\ b_{k}^{1} \end{array}\right] $$

in the reference frame using six parameters. We assume that the smoothed camera motion for frame k is \(\hat {a}_{k}^{i}, i = 0\cdots 3\) and \(\hat {b}_{k}^{i}, i = 0,1\). Then, given the four corners of the cropping rectangle \([c_{x}^{i}, c_{y}^{i}]^{\mathrm {T}}, i = 1\cdots 4\), the constraints on the smoothed camera motion can be represented as

$$\begin{array}{*{20}l} \left[\begin{array}{ll} 0\\ 0 \end{array}\right] &\leq \left[\begin{array}{ll} a_{k}^{0} & a_{k}^{1} \\ a_{k}^{2} & a_{k}^{3} \end{array}\right]^{-1}\left(\left[\begin{array}{ll} \hat{a}_{k}^{0} & \hat{a}_{k}^{1} \\ \hat{a}_{k}^{2} & \hat{a}_{k}^{3} \end{array}\right] \left[\begin{array}{ll} c_{x}^{i} \\ c_{y}^{i} \end{array}\right] + \left[\begin{array}{ll} \hat{b}_{k}^{0}\\ \hat{b}_{k}^{1} \end{array}\right] \right)\\ &\quad- \left[\begin{array}{ll} a_{k}^{0} & a_{k}^{1}\\ a_{k}^{2} & a_{k}^{3} \end{array}\right]^{-1} \left[\begin{array}{ll} b_{k}^{0}\\ b_{k}^{1} \end{array}\right] \leq \left[\begin{array}{ll} w\\ h \end{array}\right], \end{array} $$

where w and h are the width and height of the original frame. This is a set of linear inequality constraints on the smoothed motion parameters.
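To make the linearity of (8) explicit, the sketch below assembles the inequalities in the form \(\boldsymbol{\Psi}\hat{\boldsymbol{\theta}} \leq \boldsymbol{\beta}\) for the smoothed parameter vector \(\hat{\boldsymbol{\theta}} = [\hat{a}^{0}, \hat{a}^{1}, \hat{a}^{2}, \hat{a}^{3}, \hat{b}^{0}, \hat{b}^{1}]^{\mathrm{T}}\). The parameter ordering and the function name are assumptions made for this example; the velocity components of the full state vector are omitted.

```python
import numpy as np

def affine_black_border_constraints(A, b, corners, w, h):
    """Return (Psi, beta) such that (8) reads Psi @ theta_hat <= beta.

    A, b    : original (unsmoothed) affine motion of the frame
    corners : four (cx, cy) corners of the cropping rectangle
    w, h    : width and height of the original frame
    """
    A_inv = np.linalg.inv(A)
    offset = A_inv @ b
    upper = np.array([w, h], dtype=float)
    Psi_rows, beta_rows = [], []
    for cx, cy in corners:
        # M @ theta_hat equals A_hat @ [cx, cy] + b_hat
        M = np.array([[cx, cy, 0, 0, 1, 0],
                      [0, 0, cx, cy, 0, 1]], dtype=float)
        G = A_inv @ M
        # G theta_hat - offset <= [w, h]  and  -(G theta_hat - offset) <= 0
        Psi_rows += [G, -G]
        beta_rows += [upper + offset, -offset]
    return np.vstack(Psi_rows), np.concatenate(beta_rows)
```

With four corners this yields 16 scalar inequalities per frame. When the smoothed motion equals the original motion, each corner maps back to itself, so the constraints hold whenever the cropping rectangle lies inside the frame.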

4.1.2 Similarity motion

The similarity motion model is similar to the affine motion model except that \(a_{k}^{2}\) is forced to be equal to \(-a_{k}^{1}\), and \(a_{k}^{3}\) is forced to be equal to \(a_{k}^{0}\). So, there are four motion parameters for each frame instead of the six in the affine motion model.

4.1.3 Translation motion

The translation model only describes the 2D translational motion of pixels on the image plane, so it forces the matrix \(\left[\begin{array}{ll} a_{k}^{0} & a_{k}^{1} \\ a_{k}^{2} & a_{k}^{3} \end{array}\right]\) to be the identity and leaves only the translational parameters \(b_{k}^{0}\) and \(b_{k}^{1}\).

The constraints on the smoothed camera motion can be represented as

$$ \left\{\begin{array}{ll} 0 \leq c_{x}^{i} + \hat{b}_{k}^{0} - b_{k}^{0} \leq w \\ 0 \leq c_{y}^{i} + \hat{b}_{k}^{1} - b_{k}^{1} \leq h \end{array},\right. $$

which can be further simplified to an interval constraint.
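As a worked version of this simplification (a sketch added for illustration), requiring both inequalities to hold for all four corners \(i\) gives

$$ b_{k}^{0} - \min_{i} c_{x}^{i} \leq \hat{b}_{k}^{0} \leq b_{k}^{0} + w - \max_{i} c_{x}^{i}, \qquad b_{k}^{1} - \min_{i} c_{y}^{i} \leq \hat{b}_{k}^{1} \leq b_{k}^{1} + h - \max_{i} c_{y}^{i}. $$

For example, assuming a cropping rectangle of width \(w_{c}\) and height \(h_{c}\) centered in the frame (the centering and the symbols \(w_{c}\), \(h_{c}\) are assumptions introduced here), the intervals become \(|\hat{b}_{k}^{0} - b_{k}^{0}| \leq (w - w_{c})/2\) and \(|\hat{b}_{k}^{1} - b_{k}^{1}| \leq (h - h_{c})/2\).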

4.2 Adaptive smoothing with multiple-model estimate

4.2.1 Adaptive motion smoothing

Motion smoothing using the CV model depends strongly on the assumed acceleration variance (\(\sigma_{p}^{2}\) in (2)). A small value of \(\sigma_{p}\) allows little change in velocity, which results in a smoother trajectory. Conversely, a large value of \(\sigma_{p}\) gives more flexibility in velocity change and leads to a trajectory closer to the original one (given as the noisy measurement).

In video stabilization, a small \(\sigma_{p}\) does not necessarily lead to a good result. If there is a significant change of intentional camera motion in the video, a small \(\sigma_{p}\) may cause a long delay or even a failure in tracking the intentional camera motion. Moreover, the smoothed camera motion generated by Kalman filtering with a smaller \(\sigma_{p}\) tends to deviate farther from the original camera motion, and thus triggers the estimate projection of Section 4.1 more frequently. As shown in Section 4.1, the constraints on the motion parameters \(\hat{\boldsymbol{\theta}}_{k}\) are determined by the original (unsmoothed) motion parameters \(\boldsymbol{\theta}_{k}\), and therefore differ from frame to frame. Frequent estimate projection may add the unwanted camera shake back and reduce the smoothness of the Kalman-filtering output.

Therefore, it is desirable to adaptively change the value of \(\sigma_{p}\) according to the original camera motion. For frames in which the intentional camera motion is still, a small value of \(\sigma_{p}\) is preferable because it effectively eliminates camera shake (the measurement noise in (1)). For frames in which the intentional camera motion changes fast, a larger value of \(\sigma_{p}\) provides the flexibility to track the camera motion change and avoids triggering estimate projection to satisfy the black-border constraint.

We solve this problem via dynamic multiple-model state estimation. We use M different CV system models \(\{j, j = 1, \ldots, M\}\) that differ only in the value of \(\sigma_{p}\). The active model \(m_{k}\) is assumed to jump between these models as a Markov chain:

$$ p(m_{k+1} = j | m_{k} = i) = p_{ij}. $$

If the model is static, we can implement M Kalman filters in parallel, each corresponding to one model. At each stage, the probability of each model \(p(m_{k} | \mathbf{z}_{1:k})\) is computed first, and the state estimate is computed as a Bayes-optimal combination of the individual estimates. If the model is dynamic, as in our case, the optimal multiple-model filter has to keep track of the entire model history, and the number of possible histories grows exponentially with the number of stages (frames). In practice, only the model history in the last stage is kept and the model histories in older stages are combined. This idea leads to the IMM (interacting multiple-model) algorithm, which has good performance and relatively low computational complexity.

4.2.2 IMM algorithm

An unconstrained IMM estimator consists of three main steps: (1) mixing/interacting of the mode-conditioned estimates in the previous stage, (2) mode-conditioned state estimation, and (3) mode probability computation. Figure 1 illustrates how the IMM algorithm is implemented. At each stage, we keep the Gaussian approximations of each mode-conditioned estimate \(p(\mathbf{x}_{k} | m_{k} = j, \mathbf{z}_{1:k})\) with mean \(\hat{\mathbf{x}}_{k}^{j}\) and covariance \(\mathbf{P}_{k}^{j}\). The mode probabilities \(\mu_{k}^{j} = p(m_{k} = j | \mathbf{z}_{1:k})\) are also kept.

Fig. 1 Unconstrained IMM algorithm

In the mixing step, we obtain \(p(\mathbf{x}_{k-1} | m_{k} = j, \mathbf{z}_{1:k-1})\) according to

$$ \sum_{i=1}^{M} p(\mathbf{x}_{k-1} | m_{k-1} = i, \mathbf{z}_{1:k-1})\lambda_{k-1}^{ij}, $$

where \(\lambda_{k-1}^{ij} = p(m_{k-1} = i | m_{k} = j, \mathbf{z}_{1:k-1})\). \(\lambda_{k-1}^{ij}\) can be computed by Bayes' rule using the mode transition probability and the mode distribution in the previous stage \(\mu_{k-1}\) as

$$\begin{array}{*{20}l} \lambda_{k-1}^{ij} & = \frac{p(m_{k-1} = i, m_{k} = j | \mathbf{z}_{1:k-1})}{\sum_{i=1}^{M} p(m_{k-1} = i, m_{k} = j | \mathbf{z}_{1:k-1})} \\ & = \frac{\mu_{k-1}^{i} p_{ij}}{\sum_{i=1}^{M} \mu_{k-1}^{i} p_{ij}}. \end{array} $$

Note that (11) is a mixture of Gaussian distributions. The IMM algorithm approximates it by a single Gaussian distribution with mean \(\bar{\mathbf{x}}_{k-1}^{j}\) and covariance \(\bar{\mathbf{P}}_{k-1}^{j}\).

Each pair of \(\bar{\mathbf{x}}_{k-1}^{j}, \bar{\mathbf{P}}_{k-1}^{j}\) is then fed into a Kalman filter to get \(p(\mathbf{x}_{k} | m_{k} = j, \mathbf{z}_{1:k})\) (represented by mean \(\hat{\mathbf{x}}_{k}^{j}\) and covariance \(\mathbf{P}_{k}^{j}\)).

The mode probabilities are updated according to

$$\begin{array}{*{20}l} \mu_{k}^{j} & \propto p(m_{k} = j, \mathbf{z}_{k}|\mathbf{z}_{1:k-1})\\ & = p(m_{k} = j | \mathbf{z}_{1:k-1})p(\mathbf{z}_{k} | m_{k} =j, \mathbf{z}_{1:k-1}) \\ & = \left(\sum\limits_{i=1}^{M} \mu_{k-1}^{i} p_{ij}\right)p(\mathbf{z}_{k} | m_{k} =j, \mathbf{z}_{1:k-1}), \end{array} $$

where \(p(\mathbf{z}_{k} | m_{k} = j, \mathbf{z}_{1:k-1})\) is the likelihood of the innovation vector \(\mathbf{y}_{k}^{j}\) under the Gaussian distribution \(\mathcal{N}(\mathbf{0}; \mathbf{S}_{k}^{j})\) (see lines 8 and 9 in Algorithm 1).

The final estimate at each stage is a linear combination of all Kalman filter outputs using the mode probabilities.
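The three steps above can be sketched as a single function for modes that differ only in the process noise covariance. This is a minimal sketch of the standard unconstrained IMM recursion, not the authors' implementation. The mixing step moment-matches the mixture (11), which is the usual IMM choice, and all names are assumptions made for this example.

```python
import numpy as np

def imm_step(xs, Ps, mu, z, F, Qs, H, R, Pi):
    """One unconstrained IMM stage for M modes that differ only in Q.

    xs, Ps : lists of mode-conditioned means and covariances at stage k-1
    mu     : mode probabilities at stage k-1
    Pi     : transition matrix with Pi[i, j] = p_ij
    """
    M = len(mu)
    # (1) Mixing: lam[i, j] = mu_i p_ij / sum_i mu_i p_ij, as in (12)
    joint = mu[:, None] * Pi
    c = joint.sum(axis=0)                 # predicted mode probabilities
    lam = joint / c
    new_xs, new_Ps, lik = [], [], np.zeros(M)
    for j in range(M):
        # Approximate the Gaussian mixture (11) by one Gaussian (moment matching)
        x0 = sum(lam[i, j] * xs[i] for i in range(M))
        P0 = sum(lam[i, j] * (Ps[i] + np.outer(xs[i] - x0, xs[i] - x0))
                 for i in range(M))
        # (2) Mode-conditioned Kalman filter
        x_pred = F @ x0
        P_pred = F @ P0 @ F.T + Qs[j]
        y = z - H @ x_pred
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        new_xs.append(x_pred + K @ y)
        new_Ps.append((np.eye(len(x0)) - K @ H) @ P_pred)
        # Likelihood of the innovation under N(0; S)
        lik[j] = (np.exp(-0.5 * y @ np.linalg.solve(S, y))
                  / np.sqrt(np.linalg.det(2.0 * np.pi * S)))
    # (3) Mode probability update, as in (13)
    mu_new = c * lik
    mu_new = mu_new / mu_new.sum()
    # Final estimate: combination of the filter outputs by mode probability
    x_out = sum(mu_new[j] * new_xs[j] for j in range(M))
    return new_xs, new_Ps, mu_new, x_out
```

The function is called once per frame, feeding the returned means, covariances, and mode probabilities back in at the next frame.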

4.2.3 Constrained IMM algorithm

In this subsection, the black-border constraints in Section 4.1 are applied to the multiple-model estimation. We have shown that the constraints can be modeled as a set of linear inequality constraints. In single-model Kalman filtering, estimate projection can be applied to the unconstrained Kalman filter estimate to meet the constraints. The output of the IMM algorithm consists of the outputs of several Kalman filters, as well as their combination using the mode probabilities. Therefore, we can also apply estimate projection (6) to the unconstrained estimate of each Kalman filter. Because the mode probabilities are nonnegative and sum to one, the combined output is a convex combination of the projected estimates, and it automatically satisfies the constraints because a set defined by linear inequalities is convex.

Such a modification guarantees that the constraints are satisfied. However, the influence of the constraints on the computation of mode probabilities is not taken into account. Estimate projection was proposed to be applied after both the predict and update steps have been carried out. Therefore, the innovation vectors are not modified and the mode probability computation remains unchanged. To make the mode probabilities better reflect the influence of the black-border constraints, we propose to insert an additional estimate projection step between the predict and update steps of each Kalman filter in the IMM algorithm. The innovation vectors passed to the mode probability computation are then computed from the projected prediction. Note, however, that the update step in each Kalman filter still uses the unprojected predicted state vector, because there will be another estimate projection step after the update.

The modified Kalman filter for constrained IMM algorithm is illustrated by Fig. 2 and summarized in Algorithm 2.
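Since the listing of Algorithm 2 is not reproduced here, the following is a hedged sketch of the modified filter step as described in the text, for a one-dimensional CV state with an interval constraint on the position (the translational case). Two points are assumptions of this sketch rather than statements of the paper: the innovation covariance \(\mathbf{S}\) is left unchanged when the projected prediction is used for the likelihood, and both the projected and unprojected updated estimates are returned so that the caller decides which one to carry forward.

```python
import numpy as np

def project_interval(x, P, lo, hi):
    """Estimate projection (6) with W = P^{-1} for the constraint lo <= x[0] <= hi."""
    pos = min(max(x[0], lo), hi)
    if pos == x[0]:
        return x.copy()
    # One active bound: closed-form Lagrange-multiplier solution
    return x - P[:, 0] * (x[0] - pos) / P[0, 0]

def gaussian_likelihood(y, S):
    """Density of the innovation y under N(0; S)."""
    return float(np.exp(-0.5 * y @ np.linalg.solve(S, y))
                 / np.sqrt(np.linalg.det(2.0 * np.pi * S)))

def constrained_mode_step(x, P, z, F, Q, H, R, lo, hi):
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R
    # Extra projection between predict and update; it only affects the
    # innovation that feeds the mode probability computation
    x_pred_con = project_interval(x_pred, P_pred, lo, hi)
    lik = gaussian_likelihood(z - H @ x_pred_con, S)
    # Update with the unprojected prediction
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_upd = x_pred + K @ (z - H @ x_pred)
    P_upd = (np.eye(len(x)) - K @ H) @ P_pred
    # Projection after the update gives the constrained output
    x_con = project_interval(x_upd, P_upd, lo, hi)
    return x_con, x_upd, P_upd, lik
```

In the translational case, the bounds for each frame are centered on the original motion parameter, so `lo` and `hi` would be recomputed from the measurement at every frame.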

Fig. 2 Modified Kalman filter in constrained IMM algorithm

5 Experimental results and discussion

5.1 2D translational motion

We first test the proposed algorithm under a 2D translational motion model. As we see in Section 4.1.3, the black-border constraints can be modeled as independent interval constraints on the two parameters of camera motion (displacements in x and y axes). As a result, the two motion parameters can be smoothed separately, which makes visual and numerical comparison of different algorithms easier.

5.1.1 Synthetic motion

Figure 3 shows a synthetic path of image displacement for a video with 600 frames and the smoothed result using the proposed algorithm. The intentional motion has constant velocity except for the abrupt changes at frames 200 and 400. The unsteady (original) motion is synthesized by adding Gaussian random noise to the intentional motion. We constrain the motion smoothing so that the correction translation in each direction is less than 60 pixels. In the multiple-model estimation, we use two modes with \(\sigma_{p}^{2}T^{2} = 0.0001\) and \(\sigma_{p}^{2}T^{2} = 0.1\) (pixels\(^{2}\)). The sampling interval T is 33.3 ms, which corresponds to 30 fps. The mode transition probabilities are set as \(p_{11} = 0.99\), \(p_{12} = 0.01\), \(p_{21} = 0.25\), and \(p_{22} = 0.75\). This setting is biased towards constant-velocity motion rather than maneuvering, since changes of velocity are usually transient.
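To make the bias of these transition probabilities concrete, a short calculation using standard two-state Markov chain facts (added here for illustration, and assuming that mode 1 denotes the \(\sigma_{p}^{2}T^{2} = 0.0001\) mode, as the ordering in the text suggests) gives the stationary probability of mode 1 and the expected number of consecutive frames spent in each mode:

$$ \pi_{1} = \frac{p_{21}}{p_{12} + p_{21}} = \frac{0.25}{0.26} \approx 0.96, \qquad \frac{1}{1 - p_{11}} = 100, \qquad \frac{1}{1 - p_{22}} = 4. $$

Under these values, the chain spends about 96% of the time in mode 1, and a visit to mode 2 lasts four frames on average.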

Fig. 3 Synthetic simulation: original and smoothed motion using the proposed approach

In Fig. 4, we compare the proposed constrained IMM with single-model constrained Kalman filters. The constraint boundaries are shown as cyan curves. The result of the constrained IMM is closer to the result of the Kalman filter with large \(\sigma_{p}\) but clearly smoother (better observed after zooming in). The result of the Kalman filter with small \(\sigma_{p}\) appears smoother when the velocity of the intentional motion does not change, as expected. However, when there is an abrupt change in the velocity, it takes longer to adapt to the correct velocity estimate. This leads to more jitter after frames 200 and 400 because the Kalman filter estimates before estimate projection hit the constraint boundaries more often.

Fig. 4 Synthetic simulation: comparison between single-model constrained Kalman filtering and constrained IMM. Cyan curves are constraint boundaries

Figure 5 shows how the mode probabilities change in the multiple-model estimation. Sudden changes of pixel displacement velocity clearly correspond to an increase in the probability of mode \(\sigma_{p}^{2}T^{2} = 0.1\) and a decrease in the probability of mode \(\sigma_{p}^{2}T^{2} = 0.0001\).

Fig. 5 Changing of mode probabilities

In the numerical comparison, we use two performance metrics. The first is the mean square of the jitter in the result. The jitter is obtained by passing the result through a high-pass filter with a cutoff frequency of 1 Hz (the sampling frequency is 30 Hz). This metric was proposed in [25]. In [25], a second metric was proposed alongside the mean square of jitter to measure the low-frequency divergence between the smoothed motion and the intentional motion. In this paper, the black-border constraints naturally restrict such divergence to a very small value, so we only use the mean square of jitter because it reflects the smoothness of the camera motion.

The other smoothness metric we measure is the mean square of the motion acceleration. Motion acceleration is the second-order difference of the motion parameter sequence. This metric is widely used as the objective function to minimize in many offline video stabilization algorithms [6, 8].
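Both metrics can be sketched in a few lines of NumPy. The text does not specify the high-pass filter design, so the sketch below uses an ideal FFT-based high-pass filter as a stand-in. That choice and the function names are assumptions made for this example. Because the FFT treats the sequence as periodic, a strong trend in the input would leak into the high-frequency band, so a practical implementation would likely detrend the sequence or use a different filter.

```python
import numpy as np

def mean_square_jitter(x, fs=30.0, cutoff=1.0):
    """Mean square of the component of x above `cutoff` Hz (ideal FFT high-pass)."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[freqs < cutoff] = 0.0          # remove everything below the cutoff
    jitter = np.fft.irfft(spectrum, n=len(x))
    return float(np.mean(jitter ** 2))

def mean_square_acceleration(x):
    """Mean square of the second-order difference of the sequence."""
    acc = np.diff(np.asarray(x, dtype=float), n=2)
    return float(np.mean(acc ** 2))
```

Lower values of either metric indicate a smoother motion sequence.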

Table 1 shows the numerical comparison between single-model constrained Kalman filtering and constrained IMM. From Table 1, we can see that for both smoothness metrics, the constrained IMM outperforms the single-mode constrained Kalman filters.

Table 1 Numerical comparison between different motion smoothing algorithms for the synthetic camera motion

5.1.2 Real videos

We also tested the proposed algorithm on two real videos. Both videos were captured by a person walking on urban streets. Figure 6 shows two example frames extracted from the videos. The original frame size is 720×480. In our experiments, we use a 540×360 cropping size for the stabilized video. The choices of \(\sigma_{p}\) and the mode transition matrix are the same as in the synthetic simulation.

Fig. 6 Examples of frames extracted from the test sequences

Figures 7 and 8 show the smoothed horizontal motion of video 1 and video 2 using single-model constrained Kalman filters and the proposed constrained IMM filter. Similar to the synthetic simulation, the proposed IMM filter performs well whether the velocity of the intentional motion stays almost constant or changes abruptly.

Fig. 7 Video 1 horizontal motion: comparison between single-model constrained Kalman filtering and constrained IMM. Cyan curves are constraint boundaries

Fig. 8 Video 2 horizontal motion: comparison between single-model constrained Kalman filtering and constrained IMM. Cyan curves are constraint boundaries

The smoothed vertical motions of the two test videos are shown in Figs. 9 and 10, respectively. Vertical translations of the videos are more unstable because the photographer is walking. Also, the intentional vertical translation does not have very large changes in its velocity because the urban street is even. Therefore, the constrained Kalman filter with smaller \(\sigma_{p}\) seems to perform better, especially for video 2.

Fig. 9 Video 1 vertical motion: comparison between single-model constrained Kalman filtering and constrained IMM. Cyan curves are constraint boundaries

Fig. 10 Video 2 vertical motion: comparison between single-model constrained Kalman filtering and constrained IMM. Cyan curves are constraint boundaries

The numerical comparisons in Tables 2 and 3 show that the proposed algorithm smooths the entire video sequences better, except for the vertical motion of video 2.

Table 2 Numerical comparison between different motion smoothing algorithms for video 1
Table 3 Numerical comparison between different motion smoothing algorithms for video 2

5.2 2D affine motion

2D affine motion can model pixel displacements more accurately than 2D translational motion. Therefore, motion smoothing under the 2D affine motion model can generate more stable videos than smoothing under the 2D translational model. For the 2D affine motion model, the black-border constraints can be modeled exactly by linear inequalities as in (8). Such constraints can be handled efficiently by the proposed estimate projection steps in the IMM estimation framework. To limit the number of modes, multiple-model filtering is used only to smooth the motion parameters \(b_{0}\) and \(b_{1}\). The parameters \(a_{0}\) to \(a_{3}\) are still smoothed by single-mode Kalman filtering. As in 2D translational motion smoothing, we use two modes (\(\sigma _{p}^{2}T^{2} = 0.0001\) and \(\sigma _{p}^{2}T^{2} = 0.1\)) for each of \(b_{0}\) and \(b_{1}\). Since the motion parameters of the 2D affine model cannot be smoothed independently, the two modes of \(b_{0}\) and the two modes of \(b_{1}\) combine into four modes in total in the constrained IMM filtering.
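As an illustration of why the black-border constraints are linear in the affine parameters, the sketch below checks them for one candidate corrective transform. It is not a reproduction of (8): the parameter layout \(x' = a_{0}x + a_{1}y + b_{0}\), \(y' = a_{2}x + a_{3}y + b_{1}\), the coordinate origin at the frame center, and the function names are all assumptions made for this example. Because an affine map sends the crop rectangle to a parallelogram and the frame is convex, checking the four crop corners is sufficient, and each corner yields inequalities that are linear in the six parameters.

```python
def crop_corners(crop_size):
    """Corners of a centered crop window, with the origin at the frame center."""
    half_w, half_h = crop_size[0] / 2, crop_size[1] / 2
    return [(-half_w, -half_h), (half_w, -half_h),
            (-half_w, half_h), (half_w, half_h)]


def satisfies_black_border(params, frame_size, crop_size):
    """Return True if every crop corner maps to a point inside the frame.

    params = (a0, a1, a2, a3, b0, b1) under the assumed layout
    x' = a0*x + a1*y + b0, y' = a2*x + a3*y + b1.
    """
    a0, a1, a2, a3, b0, b1 = params
    max_x, max_y = frame_size[0] / 2, frame_size[1] / 2
    for x, y in crop_corners(crop_size):
        # For a fixed corner (x, y), both expressions are linear in the parameters.
        mapped_x = a0 * x + a1 * y + b0
        mapped_y = a2 * x + a3 * y + b1
        if abs(mapped_x) > max_x or abs(mapped_y) > max_y:
            return False
    return True


identity = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
shifted_too_far = (1.0, 0.0, 0.0, 1.0, 91.0, 0.0)
print(satisfies_black_border(identity, (720, 480), (540, 360)))
print(satisfies_black_border(shifted_too_far, (720, 480), (540, 360)))
```

With the 720×480 frame and 540×360 crop used in the experiments, the identity transform passes the check, while a pure horizontal shift of 91 pixels fails it because the margin on each side is only 90 pixels.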

We compare the motion smoothing results visually by showing the feature trajectories in the stabilized videos. Specifically, we detect Harris corner points in a chosen frame and track them for 20 frames. The feature trajectories are plotted as black curves on top of the starting frame (the frames themselves are drawn with an alpha value of 0.5, that is, semi-transparent, to make the curves clearer). For a stabilized video, the trajectories should look smooth. Figure 11 compares the stabilization results of the proposed 2D translational motion smoothing and the proposed 2D affine motion smoothing. Note that we detect and track the feature points independently in the three videos, so the locations and number of the feature points can differ. It is clear that affine motion smoothing stabilizes the original video better under the same black-border constraints. Figure 12 shows a similar comparison for video 2. These results indicate that the affine motion model should be used when more stable results are desired.

Fig. 11 Stabilization comparison for video 1. Features are tracked from frame 256 to frame 275. The feature tracks are plotted as black curves on frame 256. a Original video. b Proposed translational smoothing. c Proposed affine smoothing

Fig. 12 Stabilization comparison for video 2. Features are tracked from frame 16 to frame 35. The feature tracks are plotted as black curves on frame 16. a Original video. b Proposed translational smoothing. c Proposed affine smoothing

We next compare the results stabilized by the constrained IMM filter against those of single-mode constrained Kalman filters, all using the 2D affine motion model. As shown in Figs. 13 and 14, when the velocity of the intentional camera motion changes slowly, the constrained Kalman filter with small \(\sigma_{p}\) tends to generate the most stable results. However, when the velocity of the intentional motion changes abruptly, the constrained Kalman filter with small \(\sigma_{p}\) can produce annoying back-and-forth pixel movements because the motion estimate easily hits the constraints. The proposed constrained multiple-model filter generates more balanced results, which is consistent with our observations and analysis in Section 5.1.
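The balancing behavior described above comes from the way a multiple-model filter reweights its sub-filters each frame. The following minimal sketch shows the standard IMM mode probability update and estimate combination for scalar measurements. It omits the estimate projection steps of the proposed method, and the transition matrix, innovations, and variances in the usage lines are hypothetical values chosen only for illustration.

```python
import math


def gaussian_likelihood(innovation, innovation_var):
    """Likelihood of a scalar innovation under a zero-mean Gaussian."""
    norm = math.sqrt(2.0 * math.pi * innovation_var)
    return math.exp(-0.5 * innovation ** 2 / innovation_var) / norm


def update_mode_probabilities(prior_probs, transition, likelihoods):
    """Standard IMM update: predict with the Markov chain, then apply Bayes' rule."""
    n = len(prior_probs)
    predicted = [sum(transition[i][j] * prior_probs[i] for i in range(n))
                 for j in range(n)]
    unnormalized = [likelihoods[j] * predicted[j] for j in range(n)]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def combine_estimates(mode_probs, estimates):
    """Probability-weighted average of the sub-filter estimates."""
    return sum(p * x for p, x in zip(mode_probs, estimates))


# Hypothetical frame in which the intentional velocity has just changed:
# the small-noise mode sees a large innovation, the large-noise mode a small one.
transition = [[0.95, 0.05], [0.05, 0.95]]
likelihoods = [gaussian_likelihood(5.0, 1.0), gaussian_likelihood(1.0, 4.0)]
probs = update_mode_probabilities([0.5, 0.5], transition, likelihoods)
print(probs)
print(combine_estimates(probs, [10.0, 14.0]))
```

In this example nearly all of the probability moves to the large-noise mode, so the combined estimate follows the abrupt change instead of lagging until it hits a constraint. When the innovations of the small-noise mode are small, the weights shift back and the smoother estimate dominates.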

Fig. 13 Stabilization comparison for video 1. Features are tracked from frame 420 to frame 439. The feature tracks are plotted as black curves on frame 420. a Constrained KF with small \(\sigma_{p}\). b Constrained KF with large \(\sigma_{p}\). c Constrained IMM filter

Fig. 14 Stabilization comparison for video 1. Features are tracked from frame 700 to frame 719. The feature tracks are plotted as black curves on frame 700. a Constrained KF with small \(\sigma_{p}\). b Constrained KF with large \(\sigma_{p}\). c Constrained IMM filter

6 Conclusions

In this paper, we propose an online motion smoothing method for video stabilization that builds on the existing constant-velocity Kalman-filtering method. The black-border constraints are modeled as linear inequalities for different 2D motion models and are combined with the Kalman-filtering framework in a probabilistic way. Estimate projection is used to project the estimates onto the constraint set after the update step in Kalman filtering. To adaptively smooth camera motion with different kinds of intentional motion, we propose to use multiple-model estimation with different process noise variances instead of single-mode Kalman filtering. To make the mode probability computation more accurate under the effect of black-border constraints, the multiple-model estimation is modified by adding another estimate projection step after the propagation step of each sub-filter. Experimental results show that the proposed constrained multiple-model estimation adaptively smooths camera motion and guarantees that all of the pixels in the stabilized frames are defined in the original frames.
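The single-mode building block summarized above can be sketched for one translational parameter as follows. This is a simplified illustration, not the authors' implementation: the process noise is assumed to enter only through the velocity with a hypothetical variance `q`, the covariance is left unchanged by the projection, and the function name and default values are made up for this example. The projection uses the standard covariance-weighted form \(\tilde{x} = \hat{x} - PD^{T}(DPD^{T})^{-1}(D\hat{x} - d)\) with \(D = [1\ 0]\), applied only when the position estimate violates a bound.

```python
def constrained_cv_filter(measurements, bound, q=0.01, r=1.0, dt=1.0):
    """Smooth 1D camera positions with a constant-velocity Kalman filter.

    After each update, the estimate is projected so that the corrective
    motion |smoothed - measured| never exceeds `bound`.
    """
    pos, vel = measurements[0], 0.0
    p00, p01, p11 = r, 0.0, 1.0  # symmetric covariance [[p00, p01], [p01, p11]]
    smoothed = [pos]
    for z in measurements[1:]:
        # Propagation: x <- F x, P <- F P F^T + Q with F = [[1, dt], [0, 1]].
        pos += dt * vel
        p00 += 2.0 * dt * p01 + dt * dt * p11
        p01 += dt * p11
        p11 += q
        # Update with the measured position z (H = [1, 0]).
        s = p00 + r
        k0, k1 = p00 / s, p01 / s
        innovation = z - pos
        pos += k0 * innovation
        vel += k1 * innovation
        p00, p01, p11 = (1.0 - k0) * p00, (1.0 - k0) * p01, p11 - k1 * p01
        # Estimate projection onto the interval [z - bound, z + bound].
        target = min(max(pos, z - bound), z + bound)
        if target != pos:
            vel -= (p01 / p00) * (pos - target)  # correlated velocity correction
            pos = target
        smoothed.append(pos)
    return smoothed


# Hypothetical motion: the camera is still, then pans at 10 pixels per frame.
path = [0.0] * 20 + [10.0 * i for i in range(1, 21)]
result = constrained_cv_filter(path, bound=15.0, q=0.0001)
print(max(abs(s - z) for s, z in zip(result, path)))
```

The printed maximum corrective motion never exceeds the bound of 15 pixels, because any estimate that leaves the allowed interval is moved back to the nearest boundary. A smaller `q` gives a smoother path while the camera is still, but the estimate reaches the constraint sooner once the pan starts.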


  1. J Yang, D Schonfeld, M Mohamed, Robust video stabilization based on particle filter tracking of projected camera motion. IEEE Trans. Circ. Syst. Video Technol. 19(7), 945–54 (2009).

    Article  Google Scholar 

  2. C Yan, Y Zhang, J Xu, F Dai, J Zhang, Q Dai, F Wu, Efficient parallel framework for HEVC motion estimation on many-core processors. IEEE Trans. Circ. Syst. Video Technol. 24(12), 2077–89 (2014).

    Article  Google Scholar 

  3. C Yan, Y Zhang, J Xu, F Dai, L Li, Q Dai, F Wu, A highly parallel framework for HEVC coding unit partitioning tree decision on many-core processors. IEEE Signal Process. Lett. 21(5), 573–6 (2014).

    Article  Google Scholar 

  4. S Ertürk, TJ Dennis, Image sequence stabilization based on DFT filtering. IEE Proc. Vision Image Signal Process. 147(2), 95–102 (2000).

    Article  Google Scholar 

  5. Y Matsushita, E Ofek, W Ge, X Tang, H-Y Shum, Full-frame video stabilization with motion inpainting. IEEE Trans. Pattern Anal. Mach. Intell. 28:, 1150–1163 (2006).

    Article  Google Scholar 

  6. C Song, H Zhao, W Jing, Y Bi, in Proc. Intl. Conf. Pattern Recognition. Robust video stabilization based on bounded path planning, (2012).

  7. M Pilu, in Proc. IEEE Intl. Conf. Computer Vision and Pattern Recognition. Video stabilization as a variational problem and numerical solution with the viterbi method, (2004).

  8. M Grundmann, V Kwatra, I Essa, in Proc. IEEE Conf. Computer Vision and Pattern Recognition. Auto-directed video stabilization with robust l1 optimal camera paths, (2011).

  9. P Pan, A Minagawa, J Sun, Y Hotta, S Naoi, in Proc. Intl. Conf. Pattern Recognition. A dual pass video stabilization system using iterative motion estimation and adaptive motion smoothing, (2010).

  10. S Ertürk, in Proc. Intl. Symp. Image and Signal Processing and Analysis. Image sequence stabilization: motion vector integration (MVI) versus frame position smoothing (FPS), (2001).

  11. S Ertürk, Real-time digital image stabilization using Kalman filters. Real-Time Imaging. 8:, 317–28 (2002).

    Article  MATH  Google Scholar 

  12. A Litvin, J Konrad, W Karl, Probabilistic video stabilization using Kalman filtering and mosaicking. Proc. IS&T/SPIE Symp. Electronic Imaging, Image and Video Comm. and Proc.5022:, 663–74 (2003).

    Google Scholar 

  13. C Jia, Z Sinno, B Evans, in Proc. Asilomar Conf. Signals, Sytems, and Computers. Real-time 3D rotation smoothing for video stabilization, (2014).

  14. MJ Tanakian, M Rezaei, F Mohanna, Digital video stabilizer by adaptive fuzzy filtering. EURASIP J. Image Video Process. 21: (2012).

  15. Güllu, MK̈, E Yaman, S Ertürk, Image sequence stabilization using fuzzy adaptive Kalman filtering. Electron. Lett. 39:, 429–31 (2003).

    Article  Google Scholar 

  16. C Wang, J-H Kim, K-Y Byun, J Ni, S-J Ko, Robust digital image stabilization using the Kalman filter. IEEE Trans. Consum. Electron. 55(1), 6–14 (2009).

    Article  Google Scholar 

  17. Y Bar-Shalom, XR Li, T Kirubarajan, Estimation with applications to tracking and navigation: theory algorithms and software (J. Wiley and Sons, 2001).

  18. HAP Blom, Y Bar-Shalom, The interacting multiple model algorithm for systems with Markovian switching coefficients. IEEE Trans. Autom. Control. 33(8), 780–3 (1988).

    Article  MATH  Google Scholar 

  19. E Mazor, A Averbuch, Y Bar-Shalom, J Dayan, Interacting multiple model methods in target tracking: a survey. IEEE Trans. Aerosp. Electron. Syst. 34:, 103–23 (1998).

    Article  Google Scholar 

  20. M Tico, M Vehvilainen, in Proc. IEEE Intl. Conf. Image Processing. Constraint motion filtering for video stabilization, (2005).

  21. M Tico, M Vehvilainen, in Proc. European Signal Processing Conference. Constraint translational and rotational motion filtering for video stabilization, (2005).

  22. D Simon, DL Simon, Aircraft turbofan engine health estimation using constrained Kalman filtering. ASME J. Eng. Gas Turbines Power. 127:, 323–8 (2005).

    Article  Google Scholar 

  23. D Simon, Kalman filtering with state constraints: a survey of linear and nonlinear algorithms. IET Control Theory Appl. 4:, 1303–18 (2010).

    Article  MathSciNet  Google Scholar 

  24. J Nocedal, SJ Wright, Numerical optimization (Springer, 1999).

  25. M Niskanen, O Silven, M Tico, in Proc. IEEE Intl. Conf. Multimedia and Expo. Video stabilization performance assessment, (2006).

Download references


This research was supported by gift funding from Texas Instruments, Dallas, TX, USA.

Authors’ contributions

CJ designed the proposed algorithm, carried out the experiments, and drafted the manuscript. BE participated in the discussion of algorithm design and modified the content of the manuscript. Both authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Author information

Corresponding author

Correspondence to Chao Jia.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Jia, C., Evans, B.L. Online motion smoothing for video stabilization via constrained multiple-model estimation. J Image Video Proc. 2017, 25 (2017).
