Online motion smoothing for video stabilization via constrained multiple-model estimation
- Chao Jia^{1} and
- Brian L. Evans^{2}
https://doi.org/10.1186/s13640-017-0171-8
© The Author(s) 2017
- Received: 27 September 2016
- Accepted: 16 February 2017
- Published: 27 March 2017
Abstract
Video stabilization smooths camera motion estimates in a way that should adapt to different types of intentional motion. Corrective motion (the difference between smoothed and original motions) should be constrained so that black borders do not intrude into the (cropped) stabilized frames. Although offline smoothing can use all of the frames, online (real-time) smoothing can only use a small number of previous frames. In this paper, we propose an online motion smoothing method based on linear estimation applied to a constant-velocity model. We use estimate projection to ensure that the smoothed motion satisfies black-border constraints, which are modeled exactly by linear inequalities for general 2D motion models. We then combine the estimate projection with multiple-model estimation, which can adaptively smooth the camera motion in a probabilistic way. Experimental results show how the proposed algorithm can better smooth the camera motion and stabilize videos in real time.
Keywords
- Video stabilization
- Kalman filter
- Multi-model estimation
- Active set method
1 Introduction
Video data has increased dramatically in recent years due to the prevalence of hand-held cameras. Such videos, however, are usually shakier than videos shot with tripod-mounted cameras or cameras with mechanical stabilizers. Digital video stabilization seeks to remove unwanted frame-to-frame jitter and generate visually stable and pleasant videos. In general, digital video stabilization consists of three major steps: motion estimation, motion smoothing, and frame synthesis. This paper focuses on the second step.
Given the estimated camera motion for each frame, motion smoothing aims to design a new, smooth camera motion path. Most existing works treat motion smoothing as an offline process performed after the entire video sequence has been recorded. However, real-time video stabilization is necessary for applications such as video conferencing and broadcasting. For consumers recording videos, real-time stabilization can also greatly improve the user experience, since the stabilized video is displayed in real time on the viewfinder. Real-time video stabilization can also reduce memory requirements, since frames are stabilized before compression. In real-time video stabilization, camera motion must be smoothed causally. This is more difficult than offline motion smoothing because no information is available about how the camera motion will change in future frames.
Because motion smoothing changes the camera motion, some areas in the synthesized frame will be undefined. This is known as the black-border problem. In practice, the resulting video frames have to be cropped and, if necessary, enlarged. Even so, motion smoothing must constrain the change in camera motion to guarantee that no black borders intrude into the stabilized video frames. How to take such constraints into account optimally is a challenging problem, especially for online motion smoothing.
While recording videos, people sometimes move the camera on purpose to get the best viewpoint of the scene. This is known as intentional motion, and motion smoothing should preserve it. The rate at which intentional camera motion changes may vary, so a fixed motion smoothing strategy may not work well. For instance, aggressive motion smoothing can effectively reduce jitter when the camera is meant to be still, but may lose track of intentional motion that changes quickly. Moreover, aggressive smoothing of fast-changing intentional motion leaves larger areas undefined in the stabilized video. The motion smoothing algorithm should therefore adapt accordingly.
In this paper, we propose an online motion smoothing method motivated by Kalman-filtering-based motion smoothing with a constant-velocity model. We use Bayesian multiple-model estimation to achieve adaptive smoothing. The black-border constraints are modeled exactly as linear inequalities for almost every 2D motion model. We modify the multiple-model estimation algorithm to take these constraints into account: the state vector estimates are projected onto the constraint set in a probabilistic way.
This paper is organized as follows. Section 2 reviews previous motion smoothing algorithms and related estimation background. Section 3 shows how online motion smoothing can be formulated as a linear estimation problem with a constant-velocity model that can be solved by Kalman filtering. Section 4.1 formulates the black-border constraints as linear inequalities for most 2D camera motion models and shows how estimate projection can be used to solve the constrained estimation problem. Section 4.2 presents the proposed adaptive motion smoothing using multiple-model estimation and how to modify it with estimate projection. Section 5 presents experimental results showing how the proposed multiple-model algorithm improves motion smoothing. Section 6 concludes the paper.
2 Background and related work
3 Kalman filter-based motion smoothing
A linear dynamic system is modeled as

\[ \mathbf{x}_k = \mathbf{F}_k \mathbf{x}_{k-1} + \mathbf{w}_k, \qquad \mathbf{z}_k = \mathbf{H}_k \mathbf{x}_k + \mathbf{v}_k, \qquad (1) \]

where \(\mathbf{x}_k\) is the hidden state vector at time k and \(\mathbf{z}_k\) is the observation (or measurement) at time k. \(\mathbf{F}_k\) is the state transition model, which is applied to the previous state \(\mathbf{x}_{k-1}\), and \(\mathbf{H}_k\) is the observation model, which maps the true state space into the observed space. \(\mathbf{w}_k \sim \mathcal{N}(\mathbf{0}; \mathbf{Q}_k)\) and \(\mathbf{v}_k \sim \mathcal{N}(\mathbf{0}; \mathbf{R}_k)\) model the normally distributed process noise and observation noise. A Kalman filter recursively estimates the Gaussian posterior \(p(\mathbf{x}_k \mid \mathbf{z}_{1:k})\) by tracking its mean \(\hat{\mathbf{x}}_k\) and covariance \(\mathbf{P}_k\). The Kalman-filtering algorithm is summarized in Algorithm 1.
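Algorithm 1 itself is not reproduced in this text. As a hedged illustration only, the textbook predict/update recursion for the linear state-space model described above can be sketched in Python with NumPy; the function name `kalman_step` is hypothetical, and the exact form and line ordering of Algorithm 1 may differ (for example, it may use a different covariance update).

```python
import numpy as np


def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a standard linear Kalman filter."""
    # Predict: propagate the mean and covariance through the transition model.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the new measurement z.
    y = z - H @ x_pred                    # innovation
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(x.shape[0]) - K @ H) @ P_pred
    return x_new, P_new, y, S


# Usage: a constant scalar state observed three times in unit-variance noise,
# starting from a nearly uninformative prior.
F = np.eye(1)
H = np.eye(1)
Q = np.zeros((1, 1))
R = np.eye(1)
x, P = np.zeros(1), 1e6 * np.eye(1)
for z in [5.0, 5.0, 5.0]:
    x, P, y, S = kalman_step(x, P, np.array([z]), F, H, Q, R)
```

With a diffuse prior the estimate approaches the average of the measurements and the variance shrinks to about one third of the measurement variance.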
The aforementioned single-dimensional CV model can be easily generalized to a multi-dimensional CV model. The Kalman filter estimate of the target locations from the CV model usually appears much smoother compared to the original noisy location measurements due to the constant-velocity assumption in the dynamic model. Therefore, this model has been successfully used in causal smoothing of time series such as the camera motion.
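The CV model equations, including (2), are not reproduced here. A minimal NumPy sketch, assuming the common discrete white-noise-acceleration form with sampling period T and acceleration standard deviation \(\sigma_p\), builds the transition, process-noise, and observation matrices for several motion parameters, with the state ordered as \([\boldsymbol{\theta}_k, \dot{\boldsymbol{\theta}}_k]^{\mathrm{T}}\). The function name `cv_model` and this particular Q are assumptions, not necessarily the paper's exact model.

```python
import numpy as np


def cv_model(sigma_p, T=1.0, dims=1):
    """Constant-velocity model matrices for the state [theta; theta_dot].

    ASSUMPTION: Q uses the discrete white-noise-acceleration form; the
    paper's Eq. (2) may parameterize the process noise differently.
    """
    F1 = np.array([[1.0, T], [0.0, 1.0]])      # position += T * velocity
    Q1 = sigma_p**2 * np.array([[T**4 / 4, T**3 / 2],
                                [T**3 / 2, T**2]])
    H1 = np.array([[1.0, 0.0]])                # only the position is observed
    I = np.eye(dims)
    # kron(block, I) applies the 1D model to every motion parameter and
    # orders the state as [all positions, all velocities].
    return np.kron(F1, I), np.kron(Q1, I), np.kron(H1, I)


F, Q, H = cv_model(sigma_p=0.1, T=1.0, dims=2)
x_next = F @ np.array([1.0, 2.0, 0.5, -1.0])   # positions advance by velocities
```

Under this assumed form the velocity entry of Q is \(\sigma_p^2 T^2\), and the multi-dimensional model is simply one independent 1D CV model per motion parameter.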
Under a 2D affine motion model, matched pixel locations in the two frames satisfy

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \mathbf{A}_k \begin{bmatrix} x \\ y \end{bmatrix} + \mathbf{b}_k, \]

where \([x, y]^{\mathrm{T}}\) and \([x', y']^{\mathrm{T}}\) are the locations of any pair of matched pixels in the two frames. The camera motion of each frame k can therefore be represented by a 2×2 matrix \(\mathbf{A}_k\) and a 2×1 vector \(\mathbf{b}_k\). In general, the camera motion of the video can be parameterized as a sequence of motion vectors \(\{\boldsymbol{\theta}_k\}\). This sequence can then be smoothed by CV-model-based Kalman filtering with the state vector \([\boldsymbol{\theta}_k, \dot{\boldsymbol{\theta}}_k]^{\mathrm{T}}\), where \(\dot{\boldsymbol{\theta}}_k\) is the discrete rate of change (velocity) of the camera motion.
4 The proposed methods
The constant-velocity Kalman-filtering algorithm described above effectively smooths camera motion sequences for online video stabilization. However, it is not constrained to avoid black borders in the stabilized frames. In addition, when the intentional camera motion changes at different rates, a single constant-velocity model cannot track it accurately. We propose an online motion smoothing algorithm that resolves both problems. Black-border constraints are modeled as linear inequalities and enforced by estimate projection. Interacting multiple-model (IMM) estimation is used to smooth the camera motion adaptively, which single-mode Kalman filtering cannot achieve.
4.1 Constraints on motion smoothing and constrained Kalman filtering
The smoothed camera motion generates a corrective motion for each frame. In the last step of video stabilization, the new frames are synthesized by warping each image with its corrective motion. The synthesized frames may contain black borders because, after the change in camera motion, not every pixel in a synthesized frame is visible in the original frame. As discussed in Section 2, a safe way to address this problem is to crop the synthesized frames to a smaller size so that the stabilized video contains no black borders. Therefore, when smoothing the camera motion sequence, we must guarantee that every pixel in the cropped stabilized frames is visible in the original frames. This is a hard constraint that the camera motion smoothing algorithm has to enforce.
where W is a positive-definite weighting matrix. Usually, W is chosen as the inverse of the unconstrained covariance estimate, \(\mathbf{P}_k^{-1}\). In this way, the solution \(\tilde{\mathbf{x}}_k\) maximizes the probability density function (pdf) of the original unconstrained estimate subject to the state constraints. Note that (6) is a convex quadratic programming (QP) problem with linear inequality constraints. We solve it using the active set method, which searches for the set of constraints that are active at the optimal solution. For each trial set of active constraints, the problem reduces to a QP with linear equality constraints, which can be solved analytically in one step using Lagrange multipliers. Details of the active set method for convex QP can be found in [24].
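Equation (6) is not reproduced in this text. The sketch below assumes it has the form \(\min_{\mathbf{x}} (\mathbf{x} - \hat{\mathbf{x}}_k)^{\mathrm{T}} \mathbf{W} (\mathbf{x} - \hat{\mathbf{x}}_k)\) subject to \(\mathbf{A}\mathbf{x} \le \mathbf{b}\) with \(\mathbf{W} = \mathbf{P}_k^{-1}\); the names `project_estimate`, `A`, and `b` are hypothetical. Rather than the efficient active set method of [24], it simply enumerates candidate active sets, solves each equality-constrained subproblem in closed form with Lagrange multipliers, and returns the first point that satisfies the KKT conditions. This is practical only for a handful of constraints, but it shows the structure of the subproblem.

```python
import itertools

import numpy as np


def project_estimate(x_hat, P, A, b, tol=1e-9):
    """Solve min (x - x_hat)^T P^{-1} (x - x_hat) subject to A @ x <= b.

    Brute-force active-set enumeration (small problems only): each candidate
    active set (rows D, bounds d) gives an equality-constrained QP with
    solution x = x_hat - P D^T mu, where mu = (D P D^T)^{-1} (D x_hat - d).
    A candidate that is feasible with mu >= 0 satisfies the KKT conditions,
    so it is the unique optimum of this strictly convex QP.
    """
    x_hat = np.asarray(x_hat, dtype=float)
    P = np.asarray(P, dtype=float)
    A = np.atleast_2d(np.asarray(A, dtype=float))
    b = np.asarray(b, dtype=float)
    m = A.shape[0]
    for size in range(m + 1):
        for active in itertools.combinations(range(m), size):
            if size == 0:
                x, mu = x_hat, np.zeros(0)
            else:
                D, d = A[list(active)], b[list(active)]
                S = D @ P @ D.T
                if np.linalg.matrix_rank(S) < size:
                    continue  # linearly dependent constraints: skip
                mu = np.linalg.solve(S, D @ x_hat - d)
                x = x_hat - P @ D.T @ mu
            if np.all(A @ x <= b + tol) and np.all(mu >= -tol):
                return x
    raise ValueError("constraint set appears to be infeasible")


# Usage: CV state [theta, theta_dot] with correlated covariance and a
# hypothetical constraint -3 <= theta <= 3.
x_tilde = project_estimate([5.0, 1.0], [[2.0, 1.0], [1.0, 1.0]],
                           [[1.0, 0.0], [-1.0, 0.0]], [3.0, 3.0])
```

Because the weighting uses \(\mathbf{P}_k^{-1}\), the correlated velocity is corrected as well, not only the constrained position.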
In the following subsections, we model the visibility constraint as a set of linear inequality constraints for three camera motion models: affine, similarity, and translation.
4.1.1 Affine motion
where w and h are the width and height of the original frame. This is a set of linear inequality constraints on the smoothed motion parameters.
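Inequality (8) is not reproduced in this text. The NumPy sketch below illustrates why such constraints are linear, under an assumed parameterization that may differ from the paper's: a hypothetical warp with parameters p = [a0, a1, a2, a3, b0, b1] maps each corner (x, y) of the crop window in the stabilized frame to (a0·x + a1·y + b0, a2·x + a3·y + b1) in the original w × h frame. An affine map sends the convex crop rectangle to the convex hull of its mapped corners, so requiring the four mapped corners to lie in [0, w] × [0, h] keeps the whole cropped frame visible, and each requirement is linear in p.

```python
import numpy as np


def affine_visibility_constraints(w, h, crop):
    """Return (G, g) such that G @ p <= g keeps all crop corners visible.

    ASSUMPTION (hypothetical parameterization): p = [a0, a1, a2, a3, b0, b1]
    maps a stabilized-frame point (x, y) to (a0*x + a1*y + b0,
    a2*x + a3*y + b1) in the original w-by-h frame.
    crop = (x0, y0, x1, y1) is the crop window in the stabilized frame.
    """
    x0, y0, x1, y1 = crop
    rows, rhs = [], []
    for x, y in [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]:
        u = [x, y, 0, 0, 1, 0]  # coefficients of the mapped x-coordinate
        v = [0, 0, x, y, 0, 1]  # coefficients of the mapped y-coordinate
        # 0 <= u.p <= w and 0 <= v.p <= h, written as "<=" rows
        rows += [u, [-c for c in u], v, [-c for c in v]]
        rhs += [w, 0, h, 0]
    return np.array(rows, dtype=float), np.array(rhs, dtype=float)


G, g = affine_visibility_constraints(640, 480, crop=(32, 24, 608, 456))
identity = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])
shifted = np.array([1.0, 0.0, 0.0, 1.0, 40.0, 0.0])  # pushes corners past x = w
visible_identity = bool(np.all(G @ identity <= g))
visible_shifted = bool(np.all(G @ shifted <= g))
```

Four corners with four inequalities each give 16 linear constraints on the six parameters.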
4.1.2 Similarity motion
The similarity motion model is the same as the affine motion model except that \(a_k^2\) is forced to equal \(-a_k^1\) and \(a_k^3\) is forced to equal \(a_k^0\). There are therefore four motion parameters for each frame instead of the six in the affine motion model.
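Since the similarity model is the affine model with two parameters tied to the others, one hedged way to obtain its constraints is substitution: write the six affine parameters as p = M q for the four similarity parameters q = [a0, a1, b0, b1], so any affine constraint G p ≤ g becomes (G M) q ≤ g and remains linear. The parameter ordering and the names `M_SIM` and `reduce_constraints` are illustrative assumptions.

```python
import numpy as np

# p = M_SIM @ q maps q = [a0, a1, b0, b1] to p = [a0, a1, a2, a3, b0, b1]
# with a2 = -a1 and a3 = a0 (similarity model).
M_SIM = np.array([
    [1.0, 0.0, 0.0, 0.0],   # a0
    [0.0, 1.0, 0.0, 0.0],   # a1
    [0.0, -1.0, 0.0, 0.0],  # a2 = -a1
    [1.0, 0.0, 0.0, 0.0],   # a3 = a0
    [0.0, 0.0, 1.0, 0.0],   # b0
    [0.0, 0.0, 0.0, 1.0],   # b1
])


def reduce_constraints(G, g, M=M_SIM):
    """Rewrite G @ p <= g as (G @ M) @ q <= g for the reduced parameters q."""
    return np.asarray(G, dtype=float) @ M, np.asarray(g, dtype=float)


q = np.array([0.98, 0.05, 3.0, -2.0])  # hypothetical similarity parameters
p = M_SIM @ q                          # equivalent affine parameters
```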
4.1.3 Translation motion
The translation model describes only the 2D translation of pixels on the image plane, so it forces the matrix \(\left[\begin{array}{ll} a_k^0 & a_k^1 \\ a_k^2 & a_k^3 \end{array}\right]\) to be the identity and leaves only the translational parameters \(b_k^0\) and \(b_k^1\).
which can be further simplified to an interval constraint.
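With an interval constraint lo ≤ θ ≤ hi on a single parameter, the projection (6) has a closed form for a CV state [θ, θ̇]: at most one bound can be active, and with \(\mathbf{W} = \mathbf{P}^{-1}\) the correction moves the state along the first column of P. In this sketch, `lo` and `hi` stand for whatever interval the constraint yields for the current frame (for example, an interval around the original parameter \(\theta_k\)), and the function name is hypothetical.

```python
import numpy as np


def project_interval(x_hat, P, lo, hi):
    """Project a CV state [theta, theta_dot] onto lo <= theta <= hi.

    Minimizes (x - x_hat)^T P^{-1} (x - x_hat). With one scalar interval,
    at most one bound is active, so the solution is closed form.
    """
    x_hat = np.asarray(x_hat, dtype=float)
    P = np.asarray(P, dtype=float)
    theta = x_hat[0]
    if lo <= theta <= hi:
        return x_hat.copy()          # already feasible: no correction
    bound = hi if theta > hi else lo
    # Move along P's first column so theta lands exactly on the bound;
    # the correlated velocity component is corrected too.
    return x_hat - P[:, 0] * (theta - bound) / P[0, 0]


x_tilde = project_interval([5.0, 1.0], [[2.0, 1.0], [1.0, 1.0]], -3.0, 3.0)
```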
4.2 Adaptive smoothing with multiple-model estimation
4.2.1 Adaptive motion smoothing
Motion smoothing with the CV model depends strongly on the assumed acceleration variance (governed by \(\sigma_p\) in (2)). A small value of \(\sigma_p\) allows little change in velocity, which results in a smoother trajectory. Conversely, a large value of \(\sigma_p\) allows more flexibility in velocity change and yields a trajectory closer to the original one (given as the noisy measurement).
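This trade-off can be checked numerically. The sketch below assumes the same white-noise-acceleration form of Q as a common CV choice (not necessarily the paper's (2)), iterates the Kalman covariance recursion for a scalar CV model, and returns the gain applied to the position. A smaller \(\sigma_p\) yields a smaller gain, so each new measurement moves the estimate less: the trajectory is smoother but reacts more slowly. The function name is hypothetical.

```python
import numpy as np


def cv_position_gain(sigma_p, r=1.0, T=1.0, iters=2000):
    """Position Kalman gain after `iters` steps of the covariance recursion.

    ASSUMPTION: white-noise-acceleration Q; r is the measurement-noise variance.
    """
    F = np.array([[1.0, T], [0.0, 1.0]])
    Q = sigma_p**2 * np.array([[T**4 / 4, T**3 / 2], [T**3 / 2, T**2]])
    H = np.array([[1.0, 0.0]])
    P = np.eye(2)
    K = np.zeros((2, 1))
    for _ in range(iters):
        P_pred = F @ P @ F.T + Q
        S = H @ P_pred @ H.T + r          # 1x1 innovation covariance
        K = P_pred @ H.T / S              # 2x1 gain
        P = (np.eye(2) - K @ H) @ P_pred
    return float(K[0, 0])


small = cv_position_gain(0.01)
large = cv_position_gain(1.0)
```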
In video stabilization, a small \(\sigma_p\) does not necessarily lead to a good result. If the video contains a significant change in intentional camera motion, a small \(\sigma_p\) may introduce a long delay or even fail to track the intentional camera motion. Moreover, the smoothed camera motion produced by Kalman filtering with a smaller \(\sigma_p\) tends to deviate farther from the original camera motion and thus triggers the estimate projection of Section 4.1 more frequently. As shown in Section 4.1, the constraints on the motion parameters \(\hat{\boldsymbol{\theta}}_k\) are determined by the original (unsmoothed) motion parameters \(\boldsymbol{\theta}_k\) and therefore vary from frame to frame. Frequent estimate projection may add unwanted camera shake back and reduce the smoothness of the Kalman-filtering output.
It is therefore desirable to adapt the value of \(\sigma_p\) to the original camera motion. For frames in which the intentional camera motion is still, a small \(\sigma_p\) should be used to effectively eliminate camera shake (the measurement noise in (1)). For frames in which the intentional camera motion changes quickly, a larger \(\sigma_p\) provides the flexibility to track the change and avoids triggering estimate projection to satisfy the black-border constraints.
If the model is static, M Kalman filters can be run in parallel, one for each model. At each stage, the probability of each model \(p(m_k \mid \mathbf{z}_{1:k})\) is computed first, and the state estimate is computed as a Bayes-optimal combination of the individual estimates. If the model is dynamic, as in our case, the optimal multiple-model filter has to keep track of every model history, and the number of histories grows exponentially with the number of stages (frames). In practice, only the model in the last stage is kept distinct and the model histories of older stages are merged. This idea leads to the interacting multiple-model (IMM) algorithm, which performs well at relatively low computational complexity.
4.2.2 IMM algorithm
Note that (11) is a Gaussian mixture. The IMM algorithm approximates it by a single Gaussian distribution with mean \(\bar{\mathbf{x}}_{k-1}^j\) and covariance \(\bar{\mathbf{P}}_{k-1}^j\).
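Equation (11) is not reproduced here. In the standard IMM interaction step [18, 19], the mixing weights are \(\mu_{i|j} = \pi_{ij}\mu_{k-1}^i / \bar{c}_j\) with \(\bar{c}_j = \sum_i \pi_{ij}\mu_{k-1}^i\), where \(\pi_{ij}\) is the probability of switching from mode i to mode j, and the Gaussian approximation matches the first two moments of the mixture. A NumPy sketch; the notation and the function name `imm_mix` are assumptions.

```python
import numpy as np


def imm_mix(mu, Pi, xs, Ps):
    """IMM interaction: moment-matched Gaussian for each mode j.

    mu[i]    : mode probability of mode i at stage k-1
    Pi[i, j] : probability of switching from mode i to mode j
    xs, Ps   : per-mode means and covariances at stage k-1
    """
    mu = np.asarray(mu, dtype=float)
    Pi = np.asarray(Pi, dtype=float)
    c = mu @ Pi                          # c[j] = sum_i Pi[i, j] * mu[i]
    w = Pi * mu[:, None] / c[None, :]    # w[i, j] = mixing weight mu_{i|j}
    M = len(mu)
    x_bar, P_bar = [], []
    for j in range(M):
        xj = sum(w[i, j] * xs[i] for i in range(M))
        # Mixture covariance = weighted covariances + spread of the means.
        Pj = sum(w[i, j] * (Ps[i] + np.outer(xs[i] - xj, xs[i] - xj))
                 for i in range(M))
        x_bar.append(xj)
        P_bar.append(Pj)
    return x_bar, P_bar, c


xs = [np.array([0.0, 0.0]), np.array([10.0, 0.0])]
Ps = [np.eye(2), np.eye(2)]
x_bar, P_bar, c = imm_mix([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], xs, Ps)
```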
Each pair \(\bar{\mathbf{x}}_{k-1}^j, \bar{\mathbf{P}}_{k-1}^j\) is then fed into a Kalman filter to obtain \(p(\mathbf{x}_k \mid m_k = j, \mathbf{z}_{1:k})\), represented by its mean \(\hat{\mathbf{x}}_k^j\) and covariance \(\hat{\mathbf{P}}_k^j\).
where \(p(\mathbf{z}_k \mid m_k = j, \mathbf{z}_{1:k-1})\) equals the density of the innovation vector \(\mathbf{y}_k^j\) under the Gaussian distribution \(\mathcal{N}(\mathbf{0}; \mathbf{S}_k^j)\) (see lines 8 and 9 of Algorithm 1).
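A minimal sketch of this step, assuming the standard IMM form in which each mode's innovation likelihood multiplies its predicted mode probability \(\bar{c}_j\) from the mixing step and the result is normalized. The function names are hypothetical.

```python
import numpy as np


def gaussian_likelihood(y, S):
    """Density of the innovation y under N(0, S)."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    S = np.atleast_2d(np.asarray(S, dtype=float))
    n = y.size
    quad = y @ np.linalg.solve(S, y)
    return float(np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** n * np.linalg.det(S)))


def update_mode_probs(c, likelihoods):
    """mu_k^j proportional to likelihood_j * c_j, normalized to sum to one."""
    unnorm = np.asarray(c, dtype=float) * np.asarray(likelihoods, dtype=float)
    return unnorm / unnorm.sum()


# A mode whose innovation is three times as likely gains probability mass.
mu_k = update_mode_probs([0.5, 0.5], [1.0, 3.0])
```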
The final estimate at each stage is a linear combination of all Kalman filter outputs using the mode probabilities.
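The text specifies the combined mean; the covariance line below is the standard IMM moment-matching output and is an assumption about what the full algorithm also computes. A short NumPy sketch with a hypothetical function name:

```python
import numpy as np


def imm_combine(mu, xs, Ps):
    """Combine per-mode estimates using the mode probabilities mu."""
    mu = np.asarray(mu, dtype=float)
    x = sum(m * xj for m, xj in zip(mu, xs))
    # ASSUMPTION: standard IMM output covariance (spread-of-means term).
    P = sum(m * (Pj + np.outer(xj - x, xj - x)) for m, xj, Pj in zip(mu, xs, Ps))
    return x, P


x, P = imm_combine([0.25, 0.75],
                   [np.array([0.0]), np.array([4.0])],
                   [np.zeros((1, 1)), np.zeros((1, 1))])
```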
4.2.3 Constrained IMM algorithm
In this subsection, the black-border constraints of Section 4.1 are applied to multiple-model estimation. We have shown that the constraints can be modeled as a set of linear inequalities. In single-model Kalman filtering, estimate projection can be applied to the unconstrained Kalman filter estimate to meet the constraints. The output of the IMM algorithm consists of the outputs of several Kalman filters and their combination using the mode probabilities. Therefore, we can also apply estimate projection (6) to the unconstrained estimate of each Kalman filter. Because the mode probabilities are nonnegative and sum to one, the combined estimate is a convex combination of constrained estimates, so it automatically satisfies the linear inequality constraints, whose feasible set is convex.
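The argument depends on the weights being mode probabilities: a convex combination of points satisfying \(\mathbf{A}\mathbf{x} \le \mathbf{b}\) satisfies it too, whereas an arbitrary linear combination need not. A small NumPy check with a hypothetical interval constraint and hypothetical per-mode estimates:

```python
import numpy as np


def combine(weights, xs):
    """Weighted sum of per-mode estimates."""
    return sum(w * x for w, x in zip(weights, xs))


def feasible(A, b, x, tol=1e-12):
    """True if A @ x <= b holds componentwise."""
    return bool(np.all(A @ x <= b + tol))


# Hypothetical interval constraint -3 <= theta <= 3 on state [theta, theta_dot].
A = np.array([[1.0, 0.0], [-1.0, 0.0]])
b = np.array([3.0, 3.0])
xs = [np.array([3.0, 0.0]), np.array([-2.0, 5.0])]  # projected, hence feasible

ok = feasible(A, b, combine([0.3, 0.7], xs))    # probabilities: convex weights
bad = feasible(A, b, combine([2.0, -1.0], xs))  # sums to one but not convex
```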
This modification guarantees that the constraints are satisfied. However, it ignores the influence of the constraints on the computation of the mode probabilities: estimate projection is applied only after both the predict and update steps, so the innovation vectors are not modified and the mode-probability computation is unchanged. To make the mode probabilities better reflect the influence of the black-border constraints, we propose to insert an additional estimate projection step between the predict and update steps of each Kalman filter in the IMM algorithm. The innovation vectors fed to the mode-probability computation are then computed from the projected prediction. Note, however, that the update step in each Kalman filter still uses the unprojected predicted state vector, because another estimate projection step follows the update.
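One sub-filter step of the modified procedure can be sketched as follows for a single translational parameter with CV state [θ, θ̇] and a hypothetical interval constraint lo ≤ θ ≤ hi. Two details are assumptions not stated in the text: projection leaves the covariance unchanged, and the innovation covariance S passed to the mode-probability computation is the usual unconstrained one. Function names are hypothetical.

```python
import numpy as np


def project_interval(x, P, lo, hi):
    """Closed-form projection of [theta, theta_dot] onto lo <= theta <= hi."""
    theta = x[0]
    if lo <= theta <= hi:
        return x.copy()
    bound = hi if theta > hi else lo
    return x - P[:, 0] * (theta - bound) / P[0, 0]


def constrained_subfilter_step(x, P, z, F, H, Q, R, lo, hi):
    # Predict.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R
    # Extra projection between predict and update; its innovation is used
    # only for the mode-probability computation.
    x_pred_proj = project_interval(x_pred, P_pred, lo, hi)
    y_for_mode = z - H @ x_pred_proj
    # The update still uses the unprojected prediction.
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_upd = x_pred + K @ (z - H @ x_pred)
    P_upd = (np.eye(x.shape[0]) - K @ H) @ P_pred
    # Final projection after the update.
    return project_interval(x_upd, P_upd, lo, hi), P_upd, y_for_mode, S


F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
x_out, P_out, y_mode, S = constrained_subfilter_step(
    np.array([2.9, 0.5]), np.eye(2), np.array([3.5]),
    F, H, np.zeros((2, 2)), np.eye(1), -3.0, 3.0)
```

In this example the prediction (θ = 3.4) violates the upper bound, so the innovation used for the mode probability is computed from the projected prediction rather than the raw one.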
5 Experimental results and discussion
5.1 2D translational motion
We first test the proposed algorithm under a 2D translational motion model. As shown in Section 4.1.3, the black-border constraints can then be modeled as independent interval constraints on the two camera motion parameters (the displacements along the x and y axes). As a result, the two motion parameters can be smoothed separately, which makes visual and numerical comparison of different algorithms easier.
5.1.1 Synthetic motion
For numerical comparison, we use two performance metrics. The first is the mean square of the jitter in the result. The jitter is obtained by passing the result through a high-pass filter with a cutoff frequency of 1 Hz (the sampling frequency is 30 Hz). This metric was proposed in [25]. The same work also proposed a second metric, used together with the mean square jitter, that measures the low-frequency divergence between the smoothed motion and the intentional motion. In this paper, the black-border constraints naturally restrict this divergence to a very small value, so we use only the mean square jitter, which reflects the smoothness of the camera motion.
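The exact high-pass filter used in [25] is not specified here. As an assumption, the sketch below uses an ideal FFT-domain high-pass filter that zeroes every frequency component below the 1 Hz cutoff of a 30 Hz sequence; an IIR or FIR design would give somewhat different numbers. The function name is hypothetical.

```python
import numpy as np


def mean_square_jitter(theta, fs=30.0, cutoff=1.0):
    """Mean square of the high-frequency part of a motion sequence.

    ASSUMPTION: ideal (brick-wall) FFT high-pass filter.
    """
    theta = np.asarray(theta, dtype=float)
    spec = np.fft.rfft(theta)
    freqs = np.fft.rfftfreq(theta.size, d=1.0 / fs)
    spec[freqs < cutoff] = 0.0            # remove the low-frequency content
    jitter = np.fft.irfft(spec, n=theta.size)
    return float(np.mean(jitter**2))


t = np.arange(300) / 30.0                 # 10 s sampled at 30 Hz
shake = np.sin(2 * np.pi * 5.0 * t)       # 5 Hz oscillation: counted as jitter
pan = 2.0 + np.sin(2 * np.pi * 0.2 * t)   # slow motion: filtered out
```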
The other smoothness metric is the mean square of the motion acceleration, defined as the second-order difference of the motion parameter sequence. Acceleration is widely used in the objective functions minimized by many offline video stabilization algorithms [6, 8].
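This metric follows directly from the definition above; a short NumPy sketch with a hypothetical function name, computed per frame on one motion parameter:

```python
import numpy as np


def mean_square_acceleration(theta):
    """Mean square of the second-order difference of a motion sequence."""
    acc = np.diff(np.asarray(theta, dtype=float), n=2)
    return float(np.mean(acc**2))


t = np.arange(10.0)
ms_quadratic = mean_square_acceleration(t**2)     # constant acceleration of 2
ms_linear = mean_square_acceleration(3 * t + 1)   # constant velocity
```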
Numerical comparison between different motion smoothing algorithms for the synthetic camera motion
| Method | Mean square jitter | Mean square acceleration |
|---|---|---|
| Unsmoothed | 314.00 | 2217.43 |
| Small \(\sigma_p\) | 11.58 | 21.79 |
| Large \(\sigma_p\) | 19.24 | 28.09 |
| Proposed | 3.93 | 10.80 |
5.1.2 Real videos
Numerical comparison between different motion smoothing algorithms for video 1
| Motion | Method | Mean square jitter | Mean square acceleration |
|---|---|---|---|
| Horizontal | Unsmoothed | 189.83 | 93.79 |
| Horizontal | Small \(\sigma_p\) | 28.52 | 31.42 |
| Horizontal | Large \(\sigma_p\) | 33.58 | 16.36 |
| Horizontal | Proposed | 15.56 | 10.9 |
| Vertical | Unsmoothed | 301.74 | 52.05 |
| Vertical | Small \(\sigma_p\) | 50.07 | 5.89 |
| Vertical | Large \(\sigma_p\) | 64.64 | 1.97 |
| Vertical | Proposed | 36.92 | 1.74 |
Numerical comparison between different motion smoothing algorithms for video 2
| Motion | Method | Mean square jitter | Mean square acceleration |
|---|---|---|---|
| Horizontal | Unsmoothed | 256.46 | 76.82 |
| Horizontal | Small \(\sigma_p\) | 122.86 | 32.44 |
| Horizontal | Large \(\sigma_p\) | 113.54 | 15.77 |
| Horizontal | Proposed | 93.59 | 11.47 |
| Vertical | Unsmoothed | 155.67 | 35.66 |
| Vertical | Small \(\sigma_p\) | 1.43 | 0.61 |
| Vertical | Large \(\sigma_p\) | 44.17 | 1.35 |
| Vertical | Proposed | 9.40 | 0.74 |
5.2 2D affine motion
A 2D affine motion model describes pixel displacements more accurately than a 2D translational model, so motion smoothing under the affine model can produce more stable videos. For the 2D affine motion model, the black-border constraints can be modeled exactly by the linear inequalities in (8). Such constraints can be handled efficiently by the proposed estimate projection steps in the IMM estimation framework. To reduce the required number of modes, multiple-model filtering is used only to smooth the motion parameters \(b_0\) and \(b_1\); the parameters \(a_0, \ldots, a_3\) are still smoothed by single-mode Kalman filtering. As in 2D translational motion smoothing, we use two modes (\(\sigma_p^2 T^2 = 0.0001\) and \(\sigma_p^2 T^2 = 0.1\)) for each of \(b_0\) and \(b_1\). Because the motion parameters of the 2D affine model cannot be smoothed independently (the black-border constraints in (8) couple them), the constrained IMM filter has four modes in total.
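How the four joint modes are constructed is not detailed here. One hedged possibility, assuming each of \(b_0\) and \(b_1\) has its own two-mode Markov chain and that the two chains switch independently, is to take the Cartesian product of the per-parameter modes and the Kronecker product of their transition matrices. The 0.95/0.05 switching probabilities and the function name are hypothetical placeholders; only the two \(\sigma_p^2 T^2\) values come from the text.

```python
import itertools

import numpy as np


def joint_modes(modes_b0, modes_b1, Pi_b0, Pi_b1):
    """Joint modes and transition matrix for two independently switching chains.

    Joint index len(modes_b1) * i + j pairs mode i of b0 with mode j of b1,
    matching the row/column ordering of np.kron(Pi_b0, Pi_b1).
    """
    modes = list(itertools.product(modes_b0, modes_b1))
    return modes, np.kron(Pi_b0, Pi_b1)


levels = [1e-4, 1e-1]                          # sigma_p^2 T^2 values from the text
Pi = np.array([[0.95, 0.05], [0.05, 0.95]])    # hypothetical switching probabilities
modes, Pi_joint = joint_modes(levels, levels, Pi, Pi)
```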
6 Conclusions
In this paper, we propose an online motion smoothing method for video stabilization based on the existing constant-velocity Kalman-filtering method. The black-border constraints are modeled as linear inequalities for different 2D motion models and are combined with the Kalman-filtering framework in a probabilistic way. Estimate projection is used to project the estimates onto the constraint set after the update step of Kalman filtering. To smooth camera motion adaptively under different kinds of intentional motion, we propose to use multiple-model estimation with different process noise variances instead of single-mode Kalman filtering. To make the mode-probability computation more accurate under the effect of the black-border constraints, the multiple-model estimation is modified by adding another estimate projection step after the propagation step of each sub-filter. Experimental results show that the proposed constrained multiple-model estimation adaptively smooths camera motion and guarantees that all pixels in the stabilized frames are defined in the original frames.
Declarations
Funding
This research was supported by a gift from Texas Instruments, Dallas, TX, USA.
Authors’ contributions
CJ designed the proposed algorithm, carried out the experiments, and drafted the manuscript. BE participated in the discussion of algorithm design and modified the content of the manuscript. Both authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
- J Yang, D Schonfeld, M Mohamed, Robust video stabilization based on particle filter tracking of projected camera motion. IEEE Trans. Circ. Syst. Video Technol. 19(7), 945–54 (2009).
- C Yan, Y Zhang, J Xu, F Dai, J Zhang, Q Dai, F Wu, Efficient parallel framework for HEVC motion estimation on many-core processors. IEEE Trans. Circ. Syst. Video Technol. 24(12), 2077–89 (2014).
- C Yan, Y Zhang, J Xu, F Dai, L Li, Q Dai, F Wu, A highly parallel framework for HEVC coding unit partitioning tree decision on many-core processors. IEEE Signal Process. Lett. 21(5), 573–6 (2014).
- S Ertürk, TJ Dennis, Image sequence stabilization based on DFT filtering. IEE Proc. Vision Image Signal Process. 147(2), 95–102 (2000).
- Y Matsushita, E Ofek, W Ge, X Tang, H-Y Shum, Full-frame video stabilization with motion inpainting. IEEE Trans. Pattern Anal. Mach. Intell. 28, 1150–1163 (2006).
- C Song, H Zhao, W Jing, Y Bi, in Proc. Intl. Conf. Pattern Recognition. Robust video stabilization based on bounded path planning, (2012).
- M Pilu, in Proc. IEEE Intl. Conf. Computer Vision and Pattern Recognition. Video stabilization as a variational problem and numerical solution with the Viterbi method, (2004).
- M Grundmann, V Kwatra, I Essa, in Proc. IEEE Conf. Computer Vision and Pattern Recognition. Auto-directed video stabilization with robust L1 optimal camera paths, (2011).
- P Pan, A Minagawa, J Sun, Y Hotta, S Naoi, in Proc. Intl. Conf. Pattern Recognition. A dual pass video stabilization system using iterative motion estimation and adaptive motion smoothing, (2010).
- S Ertürk, in Proc. Intl. Symp. Image and Signal Processing and Analysis. Image sequence stabilization: motion vector integration (MVI) versus frame position smoothing (FPS), (2001).
- S Ertürk, Real-time digital image stabilization using Kalman filters. Real-Time Imaging. 8, 317–28 (2002).
- A Litvin, J Konrad, W Karl, Probabilistic video stabilization using Kalman filtering and mosaicking. Proc. IS&T/SPIE Symp. Electronic Imaging, Image and Video Comm. and Proc. 5022, 663–74 (2003).
- C Jia, Z Sinno, B Evans, in Proc. Asilomar Conf. Signals, Systems, and Computers. Real-time 3D rotation smoothing for video stabilization, (2014).
- MJ Tanakian, M Rezaei, F Mohanna, Digital video stabilizer by adaptive fuzzy filtering. EURASIP J. Image Video Process. 21 (2012).
- MK Güllü, E Yaman, S Ertürk, Image sequence stabilization using fuzzy adaptive Kalman filtering. Electron. Lett. 39, 429–31 (2003).
- C Wang, J-H Kim, K-Y Byun, J Ni, S-J Ko, Robust digital image stabilization using the Kalman filter. IEEE Trans. Consum. Electron. 55(1), 6–14 (2009).
- Y Bar-Shalom, XR Li, T Kirubarajan, Estimation with applications to tracking and navigation: theory algorithms and software (J. Wiley and Sons, 2001).
- HAP Blom, Y Bar-Shalom, The interacting multiple model algorithm for systems with Markovian switching coefficients. IEEE Trans. Autom. Control. 33(8), 780–3 (1988).
- E Mazor, A Averbuch, Y Bar-Shalom, J Dayan, Interacting multiple model methods in target tracking: a survey. IEEE Trans. Aerosp. Electron. Syst. 34, 103–23 (1998).
- M Tico, M Vehvilainen, in Proc. IEEE Intl. Conf. Image Processing. Constraint motion filtering for video stabilization, (2005).
- M Tico, M Vehvilainen, in Proc. European Signal Processing Conference. Constraint translational and rotational motion filtering for video stabilization, (2005).
- D Simon, DL Simon, Aircraft turbofan engine health estimation using constrained Kalman filtering. ASME J. Eng. Gas Turbines Power. 127, 323–8 (2005).
- D Simon, Kalman filtering with state constraints: a survey of linear and nonlinear algorithms. IET Control Theory Appl. 4, 1303–18 (2010).
- J Nocedal, SJ Wright, Numerical optimization (Springer, 1999).
- M Niskanen, O Silven, M Tico, in Proc. IEEE Intl. Conf. Multimedia and Expo. Video stabilization performance assessment, (2006).