 Research
 Open access
 Published:
A novel camera path planning algorithm for realtime video stabilization
EURASIP Journal on Image and Video Processing volume 2019, Article number: 64 (2019)
Abstract
In video stabilization, a steady camera path plan is as important as accurate camera motion prediction. While several camera path planning algorithms have been studied, most algorithms are used for postprocessing video stabilization. Since all of the frames are accessible in postprocessing video stabilization, a camera path planning method can generate a good camera path, considering the camera movements at all of the frames. Meanwhile, the number of accessible frames in realtime video stabilization is limited. Thus, smooth camera path planning is a challenging issue in realtime video stabilization. For example, a camera path planner does not know in advance the locations of sudden camera motions, so it is not easy to compensate a sudden camera movement. Therefore, this paper proposes a novel camera path planning algorithm for realtime video stabilization. A camera path planning method should fully utilize the given image margin to provide steady camera paths. In contrast, it should retain the certain amount of an unused image margin for use in frames with dynamic or sudden camera movements. To resolve this problem, the proposed algorithm uses two terms related to the steady camera path and the amount of image margin, and it crossoptimizes the two terms to provide a new camera path onthefly. Experimental results show that the proposed algorithm provides excellent performance in realtime video stabilization while requiring negligible computation.
1 Introduction
With the recent advancement in camera technology, the shooting and sharing of videos with consumer cameras has significantly increased. However, the quality of videos made by amateur users differs from that by professional users. One of the most obvious differences between videos shot by amateur and professional users is the quality gap in terms of video stability. In order to stabilize videos, professional users adopt various professional stabilization hardware tools such as gimbals, steadicams, tripods, and camera dollies. In contrast, the amateur user shoots videos using consumer cameras equipped with only an optical image stabilizer [1] without professional stabilization hardware tools. Hence, video stabilization is crucial to reduce the quality gap between videos shot by amateurs and those by professional users.
1.1 Video stabilization
Video stabilization mainly consists of three steps: camera motion estimation, new camera path planning, and new image synthesis. In camera motion estimation, the geometric relationship between consecutive frames is analyzed based on the adopted motion model, and the original camera path is predicted. The new camera path is planned by smoothing the original camera path. Finally, the stabilized views on the new camera path are synthesized.
1.1.1 Camera motion estimation
The geometric relationship between consecutive frames should be accurately described to completely remove unwanted camera motion. Hence, the performance of video stabilization is directly related to the accuracy of the adopted motion model, and the main concern of conventional video stabilization techniques is to develop new motion models to accurately describe the geometric relationship between consecutive frames.
Video stabilization is categorized into 2D and 3D approaches according to the motion model. The first approach is 2D video stabilization [2] that describes a relationship between two successive frames using a 2D motion model such as a simple translational model, 2D rigid model, and 2D affine model [3–12]. The 2D methods are fast and robust, compared to the 3D approach. If camera rotations are small along the x and y axes in the 3D space, 2D video stabilization provides good performance [13]. However, as the camera motion becomes dynamic, the 2D motion model cannot describe the geometric relationship between successive frames, and the performance is very limited. Lee at el. proposed a method to directly stabilize a video without explicitly estimating camera motion [14]. This method finds a set of transformations to smooth out feature trajectories [15, 16] and stabilize the video.
The second approach is 3D video stabilization introduced by Buehler et al. [17]. Early 3D approaches reconstruct 3D models of the scene and camera motion using structurefrommotion (SFM) techniques [18]. The stabilized views are then rendered at the new 3D camera path [17, 19]. The 3D video stabilization provides better performance than the 2D approach. Since these 3D methods often fail with videos that lack parallax, they are not practical in many cases [20]. Liu proposed subspace video stabilization that combines the advantages of 2D and 3D video stabilization [21]. However, this method often fails for videos that include dynamic motion due to an insufficient number of long feature trajectories [20]. Hence, Wang [20] proposed video stabilization for a video where 3D reconstruction is difficult or where long feature trajectories are not available. Liu et al. [22] proposed a motion model SteadyFlow that is a specific optical flow by enforcing strong spatial coherence. Grundmann et al. proposed calibrationfree rolling shutter removal, based on a mixture model of homographies [23]. Dong proposed video stabilization for realtime applications using a motion model based on interframe homography estimation [24]. Ringaby et al. proposed a method for rectifying and stabilizing video by parameterizing camera rotation as a continuous curve [25]. Karpendo et al. proposed video stabilization and rolling shutter correction using gyroscopes [26]. Lee proposed video stabilization based on human visual system [27].
1.1.2 Camera path planning
The second step of video stabilization is new camera path planning that removes the annoying irregular perturbations. The simplest method for predicting a new camera path is the application of a lowpass filter (or IIR filter [28]) or Gaussian filter, which suppresses highfrequency jitter in the original camera path. Since the lowpass filter efficiently smooths the original camera path, it is widely used in video stabilization [3, 5, 10–12, 12, 27, 29, 30]. Zhu et al. proposed an inertial motion model [8] for the motion filtering. Litvin et al. [31] proposed a Kalman filter to smooth the camera path. Because of camera shake, the actual camera path includes noise in the intended camera path. These methods [4, 31] estimate the intended camera path using the Kalman filter [32]. Another camera path planning is to model the camera trajectory such as linear and parabolic [29].
Video stabilization methods crop the original view by a cropping window positioned on the new camera path, which is one of the simplest and most robust approaches [21]. If input videos include dynamic camera motion, the cropping windows sometimes will not fit the original frame. In this case, outofbound areas will not be visible or complex motioninpainting will be required [10, 33]. However, motioninpainting does not provide satisfactory performance in some cases and has a high computational cost. Another approach is to forcibly modify the new camera path to fit the original frame, but compulsory modification of the new camera path will degrade the visual quality of the stabilized videos. In other approaches, the size of output videos can be set so that the cropping window always fits the original frame. While this method significantly reduces the field of view (or the size of output views) for videos having dynamic camera motions, it can maximize dramatic cinematographic effect [34]. Video 1, which can be downloadable on website [35], shows examples of a video having dynamic camera motion. Videos 1(a), 1(b), 1(c), and 1(d) depict an original video, a stabilized video including outofbound, a video stabilized by forcibly modifying the new camera path, and a video stabilized by reducing the size of the cropping window, respectively. While Video 1(d) provides the best performance in terms of video stability, the field of view is considerably reduced. Hence, Grudmann et al. proposed an algorithm for automatically applying constrainable, L1optimal camera paths for postprocess video stabilization [36]. Under boundary constraints, the method computes camera paths that are composed of constant, linear, and parabolic segments, in order to mimic camera motions employed by professional cinematographers.
1.1.3 Image synthesis
Based on the adopted motion model, the 2D methods synthesize output videos using an interpolation technique. The 3D methods need to employ the view interpolation for rendering output videos. However, the output videos from the view interpolation sometimes include image artifacts like the ghost effect [29]. In order to reduce the image artifacts, new image rendering by contentpreserving warp [29, 30] was proposed. The contentpreserving warp relaxes the constraint for a physically correct reconstruction while preserving the perceptual quality. Some studies employ a purely rotational motion model without considering translational camera motion, in order to avoid image artifacts from view interpolation [26, 27]. Since they consider only the camera rotation, there is no occlusion issue.
1.2 Realtime processing
Most conventional video stabilization methods have been developed for postprocessing used in professional video editing tools. After shooting videos, experts can edit their videos using the professional video editing tools. These methods are necessary in generating highquality videos for advertisement, film, commercial use, etc. Meanwhile, most users using consumer digital cameras are not good at the professional video editing tools. They usually display, upload, and share videos without modification. Therefore, the performance of realtime video stabilization is very crucial in consumer digital cameras. Fast and efficient video stabilization methods for realtime applications have been studied [5, 12, 24, 27, 37, 38]. However, these methods did not consider camera path planning in depth.
On the other hand, camera path planning for realtime video stabilization has several challenging issues as follows. First, the size of output videos should be determined prior to applying realtime video stabilization, regardless of whether an input video includes static or dynamic camera motions. Thus, the realtime video stabilization cannot adaptively determine the size of the cropping window to fit the original frame according to the given video. Second, a camera path planning algorithm should predict the new camera path onthefly. Unlike postprocessing video stabilization, the new camera path cannot be updated or modified to gradually improve the quality. Third, it cannot consider camera motions at all of the frames to plan the new camera path. For example, since Grundmann’s method obtains the L1optimal camera paths via linear programming using all of the frames, it cannot be used for realtime video stabilization [36]. If the quality of a planned camera path is not acceptable, although accurate camera motions are predicted, the visual quality of stabilized videos will be significantly degraded. Consequently, in realtime video stabilization, camera path planning is as important as accurate camera motion prediction. As Dong pointed out, most conventional video stabilization methods have been developed for postprocessing [24]. Moreover, camera path planning algorithms for realtime video stabilization have not yet been studied in detail. Therefore, in this paper, we propose a new camera path planning algorithm for realtime video stabilization by crossoptimizing two terms related to steady camera path and the amount of image margin. One term renders the new camera trajectory smooth, and the other term retains the certain amount of an unused image margin for use in dynamic or sudden camera motions.
2 Background
For convenience of explanation, this paper describes the proposed algorithm of the camera path planning under simple video stabilization with a 2D translational motion model as depicted in Fig. 1. However, the proposed method can be modified to plan the camera path for other stabilization using a different motion model such as homography.
The original camera path of videos shot by a handheld camera is usually shaky, as depicted in Fig. 2. To provide steady output videos, it is necessary to predict a smooth camera path from the original camera path. The simplest method for estimating a smooth camera path is to apply a lowpass filter to the original camera path as follows.
Here, P_{O}(t) and P_{N}(t) denote the original and new camera positions, respectively. w(k) is the kth coefficient of the lowpass filter. Since the above ideal filter requires an infinite number of signals, it is impossible to realize. In practice, the lowpass filter should consider a finite number of signals. Specifically, a casual filter that depends on camera positions of past and present frames is considered in realtime applications as follows.
Figure 2 shows the result of applying the casual lowpass filter to a camera path along the x direction. The lowpass filter significantly smoothens the camera path. Here, the difference between the original and new camera paths in Fig. 2 can be interpreted as the amount of the cropping window shift shown in Fig. 1. If the amount of the shift is more than the width or height of the margin, the cropping window will not fit within the original frame. The width and height margins (M_{W} and M_{H}) are defined as follows.
where W_{I}, H_{I}, and W_{O}, H_{O} are the width and height of the original and output frames, respectively. Outofbound areas will be invisible or require motioninpainting [10, 33]. The boundaries are depicted in Fig. 3. In the figure, a gray region illustrates an example of the outofbound area. For realtime applications, the motioninpainting is not a feasible solution due to complexity. Hence, the new camera path should be determined so that the cropping window is located inside the original frame. However, the amount of cropping window shift in the method based on the lowpass filter is uncontrollable, and the method does not guarantee that the cropping window will always be located within the original frame. Therefore, this simple method cannot be applied to realtime video stabilization where the size of the output video is fixed.
In postprocess video stabilization, the entire original camera path is given before planning the new camera path. Hence, the smooth camera path can be optimally planned so that an outofbound area does not occur. However, since camera positions at the subsequent frames are unknown in the realtime video stabilization, it is sometimes difficult to correctly determine the camera position at the current frame. Figure 4 illustrates an example of the difficulty in planning a camera path for realtime video stabilization. In Fig. 4a, it could not be predicted whether the camera path goes up or down at the subsequent frames. If the camera path travels downward at the subsequent frames, the new camera path can be straight, as shown in Fig. 4b. If the camera path travels upward at the subsequent frames, the direction of the new camera path should change, as seen in Fig. 4c. This sudden movement will degrade the visual quality of the stabilized video. On the other hand, since the entire original camera path is given in postprocess video stabilization, the new camera path can be planned as shown in Fig. 4d. As shown in this example, the realtime video stabilization is difficult without the camera positions at the subsequent frames. The performance of the camera path planning in realtime video stabilization will inevitably be limited. Therefore, camera path planning for realtime video stabilization is a challenging problem.
3 Method—camera path planning
3.1 Observation
Simple tests were performed to observe how camera paths affect visual perception. People who did not have related knowledge were selected as participants. Ten participants evaluated the visual perception of synthetic videos with different types of camera paths. Each video has one type of camera path among various types including static view, horizontally moving views with zero acceleration, and randomly moving views. If the synthetic video includes local motion, it will disturb the evaluation of the visual perception caused only by the camera path. In order to exclude the local motion, the test videos are synthesized by shifting a single image along the camera path. The participants were asked to sort the videos in a highquality order according to their visual perception. Table 1 shows the results. Interestingly, the scores made by the participants were in the same order. Obviously, the static camera path obtained a high score. Hence, the constant path is the first property to consider when planning a camera path. The next property is the linearity of a camera path. Consistency in camera movements is an important feature. Even a small random motion gives a poor visual impression. A camera path with a 4 pixel horizontal movement is better than a camera path with a ± 1 random camera movement. The amount of slope of the linear camera path also affects the visual quality. A linear camera path with a gentle slope gives better visual quality than that with a steep slope.
More complex experiments are needed to more profoundly understand the relationship between the camera path and visual perception in terms of stability. However, it may require deep psychovisual understanding and extensive experiments with a greater number of participants. This paper considers only the two properties obtained from the above simple experiments in planning the new camera path.
3.2 Assumption
It is challenging to plan the steady camera path within the upper and lower boundaries. As discussed in Section 2, the difficulty is that the realtime video stabilization determines the new camera position in the current frame without the original camera positions in the subsequent frames. Even if the original camera positions in several subsequent frames are available before determining the new camera position in the current frame, the quality of the new camera path will be significantly increased. Fortunately, recent camera systems have extraordinarily large memory to temporally store multiple frames. Some frames can be buffered to improve the quality of the new camera path. Figure 5 shows the assumptions for solving a problem in the proposed algorithm. In the figure, the camera captures up to the (n+K)th frame and stores them to the buffer. The original camera positions for the buffered frames are predicted. The proposed algorithm then determines the new camera position in the (n)th frame (or the current frame). Here, K represents the number of additional buffered frames.
The quality of the new camera path will depend on the amount of K. This paper proposes a new method to estimate a steady camera path for the given K with the lower and upper boundary constraints.
3.3 Framework of the proposed algorithm
The first property to consider in planning camera path is providing constant camera path. The constant camera path represents a line with a zero slope. The next property is the linear camera path where the camera has a constant velocity. While the slope of the linear camera path is not zero, the linear path is also a line. Although the physical meanings of the two properties differ in terms of camera movement, the shape of the constant and linear camera paths is basically a line. Hence, the proposed algorithm considers a line as a basic feature to pursue in the camera path.
The proposed algorithm first finds all possible candidate lines that pass through the last camera position on the new camera path, as shown in Fig. 6. Since the new camera path should be continuous, the candidate lines should start from the camera position at the (n−1)th frame, where it is the last camera position on the new camera path. They then end between low and upper boundaries at the (n+K)th frame. All possible candidate lines are generated in a way that the interval between adjacent candidate lines is 1 at the (n+K)th frame. In the figure, the camera position at the current frame (or the nth frame) is not yet determined. The buffered frames range from the (n+1)th frame to the (n+K)th frame. Camera motions for the buffered frames can be predicted in advance, so that the original camera positions and boundaries at the buffer frames can be determined. Hence, the end points of the candidate lines are simply located between the two boundaries at the (n+K)th frame. Then, the proposed algorithm examines all the possible candidate lines within the search range using the proposed cost function that will be described in the next subsection. The candidate line having the minimum cost is chosen as the best line. Finally, the camera position at the current frame is calculated from the best line. Note that the camera positions at the buffered frame do not belong to the new camera path.
If a candidate line passes through an outofbound area, it should not be chosen as the best line. Figure 7 depicts examples of the candidate lines that should not be chosen. In Fig. 7a, candidate lines in region R pass through an outofbound area. Hence, no candidate line in region R should be chosen as the best line. Sometimes, all the candidate lines within the search range pass through the outofbound area as shown in Fig. 7b. In this case, when no best line can be found, the value of K is decreased by 1 until the best line is found.
3.4 Cost function
The proposed algorithm chooses a candidate line having the minimum cost as the best line. The cost function of the ith candidate line at the nth frame is defined as follows.
Here, w_{1}, w_{2}, and w_{3} are weighting factors. C_{S}(n,i) is defined as follows.
S(n,i) is a slope of the ith candidate line at the nth frame. The zero slope of a candidate line represents no camera movement. Therefore, C_{S}(n,i) can be interpreted as a term to pursue the static camera path, which is the first property. If the slope of a candidate line is steep, this term becomes large.
C_{L}(n,i) is defined as follows.
Here, \(i^{\text {min}}_{n1}\) is the index of the best line at the (n−1)th frame. Therefore, \(S(n1,i^{\text {min}}_{n1})\) denotes the slope of the best line at the (n−1)th frame. C_{L}(n,i) will be zero if the slope of a candidate line at the current frame is the same as the slope of the best line at the previous frame. It can be interpreted that this term seeks to keep the linearity of the camera path, which is the second property.
If the camera path plan method utilizes the upper and lower boundaries as simple constraints, although the constraints will prevent the new camera path exceeding the available image margin, the new camera position can sometimes be very close to the lower boundary such as the new camera position for the current frame in Fig. 4a. In this case, if the original camera path travels upwards (case a), since there is no image margin in the down direction, the new camera path should suddenly change the direction to upward. This sudden movement on the new camera path will degrade the visual quality of stabilized videos. To avoid this situation, this paper considers a new term of C_{M}(n,i) to retain the available image margin in the upper and lower directions.
where \(P^{i}_{N}(n)\) is a camera position at the nth frame on the ith candidate line. M_{LO} and M_{UP} are the sizes of the lower and upper margins, respectively. These margins are usually set at the same value. \(P_{O}(n) + M_{\text {UP}}  P^{i}_{N}(n)\) (or \(P^{i}_{N}(n) + M_{\text {LO}}  P_{O}(n)\)) is the Euclidean distance between \(P^{i}_{N}(n)\) and the upper boundary (or lower boundary), as given in Fig. 8. Here, 0.01 is considered to prevent division by zero. If \(P^{i}_{N}(n)\) is close to the upper boundary (or lower boundary), the value of F_{UP}(n,i) (or F_{LO}(n,i)) will become significantly large. If \(P^{i}_{N}(n)\) is remote from the upper boundary (or lower boundary), the value of F_{UP}(n,i) (or F_{LO}(n,i)) will become significantly small. When \(P^{i}_{N}(n)\) is located at a position equally far from both of the two boundaries, (F_{UP}(n,i)+F_{LO}(n,i)) will obtain the minimum value. Accordingly, F_{UP}(n,i) and F_{LO}(n,i) can be interpreted as forces to push a camera position from boundaries to retain the image margin. C_{M}(n,i) accumulates all of F_{UP}(n,i) and F_{LO}(n,i) at camera positions on the ith candidate line. The camera position cannot be located outside the boundaries. If the K value decreases, C_{M}(n,i) becomes an important term to achieve a new camera path with high quality in realtime video stabilization. The details are given in Section 4 with the experimental results.
Let us briefly consider the computational process of the proposed method. The first step of the proposed algorithm is to find candidate lines. This can be simply achieved by obtaining the lines connecting from the (n−1)th camera position to the points within the search range, as shown in Fig. 6. The next step involves excluding candidate lines that include at an outofbound area, as shown in Fig. 7. If any point on the ith candidate line (or \(P^{i}_{N}(n)\)) is not located from P_{O}(n)−M_{LO} to P_{O}(n)+M_{UP}, the corresponding candidate line should be excluded. In the final step, the method evaluates each candidate line according to Eq. (4) and finds the candidate line having the minimum cost. As shown above, the proposed algorithm requires several arithmetic operations and comparisons to plan the new camera position of the new camera path at the current frame. Compared to the computational burden of video stabilization, its computational process is extremely low, as will be demonstrated in the Section 4.
4 Experimental results and discussion
We show simulation results of four representative videos among the extensive test videos to evaluate the proposed algorithms. Two videos were made by a man running, so that the videos include very dynamic camera motion. The other two videos were shot while walking, and the camera motion is normal. In the videos, the region of interest is set on the human face. For each video, the original camera path is extracted by tracking the human face. In this paper, not all simulation results can be illustrated due to the limited space. Only some of the simulation results are given in this section. The original and new camera paths for all the frames in four representative test videos are downloadable on the Web [35]. Output videos are also downloadable on the Web. Highquality Figs. 9, 10, 11, 12, 13, and 14 were uploaded on the Web.
Figure 9 depicts some part of the simulation results, in order to show the effect of C_{S}(n,i) in Eq. (4). Here, (w_{1}, w_{2}, w_{3}) for Fig. 9 a and b are set at (0, 0.01, 10) and (0.01, 0.01, 10), respectively. Without C_{S}(n,i), the first property mentioned in Section 3.1 is not satisfied. Therefore, the new camera path is not constant as shown in Fig. 9 a. C_{S}(n,i) renders the new camera path constant.
Figure 10 shows the effect of C_{L}(n,i) in Eq. (4). Weights for Fig. 10 a and b are set at (0.01, 0, 10) and (0.01, 0.01, 10), respectively. The method without C_{L}(n,i) considers only a constant camera path (C_{S}(n,i)) and a force pushing from boundaries (C_{M}(n,i)). When the current position is located remote from the boundaries, C_{S}(n,i) is dominant and the method renders the new camera path constant. If the original camera path globally descends as in Fig. 10, the camera position along the constant camera path eventually approaches close to the boundary. Then, the force pushing from the boundary becomes large so that the method will change the direction of the new camera path downward. The camera position again becomes remote from the boundary. Then, C_{S}(n,i) becomes dominant and the method renders the new camera path constant again. Accordingly, the slope of the new camera path frequently changes as shown in Fig. 10 a. C_{L}(n,i) improves the performance as shown Fig. 10 b.
If the value of K is large, the proposed method without a term of C_{M}(n,i) can easily plan a steady camera path. Figure 11 a shows an example where K is set at 60. Since the method can consider 60 frames in the future, it can provide a steady camera path by coping with sudden camera movements in advance. However, when the value of K decreases, it is difficult to cope with the sudden camera movements, as shown in Fig. 11 b. Here, K is set at 2. The proposed method without C_{M}(n,i) utilizes the upper and lower boundaries as only boundary constraints. Hence, even if the amount of available image margin is very small, the method attempts to make the new camera path constant. Only if there is no image margin, the method changes the new camera path upward or downward. Accordingly, even for small fluctuations of the original camera path, the direction of the new camera path frequently changes, which is undesirable. On the other hand, C_{M}(n,i) pushes the new camera path from the boundaries to maintain the image margin. In the details, if the weights for C_{L}(n,i) and C_{S}(n,i) are set at the same value, C_{L}(n,i) and C_{S}(n,i) show almost the same dominance. Then, C_{M}(n,i) becomes a dominant term. Therefore, the new camera path travels to the center or the middle of the upper and lower boundaries where the pushing forces from the two boundaries are the same. After the path reaches the center, C_{M}(n,i) becomes a weak term and the method can set the new direction of the new camera path as shown in Fig. 11c.
Figure 12 depicts the simulation results for a video, including usual camera motion, in order to show the performance of the proposed algorithm. The image margins for Fig. 12 a, b, c, and d are set at 10%. The values of K are set at 2, 5, 10, and 60, respectively. The proposed algorithm significantly reduces the highfrequency jitter of camera motion, and the new camera path does not cross the upper and lower boundaries. Although the number of buffer frames in the proposed method of P_{2,10%} is only 2, it provides outstanding performance. For reference, the new camera paths based on the following simple lowpass filters, \(P^{\text {LPF}}_{N}(t)\), are illustrated in Fig. 12 e and f.
The values of K for LPF_{10} and LPF_{60} in Fig. 12 e and f are set at 10 and 60, respectively. While LPF_{10} significantly reduces highfrequency jitter of the camera motion, the new camera path is still unstable. Although LPF_{60} provides good performance, the value of K is very large. For fullHD video, the memory size is 178 M bytes (=1920×1080×1.5×60) for a YCbCr format. Moreover, LPF_{10} and LPF_{60} do not guarantee that the cropping window will always be located within the original frame (see Fig. 3.) The performance of the proposed method of P_{2,10%} provides better performance than LPF_{10}. Note that while the number of buffer frames in LPF_{10} is 10, that in P_{2,10%} is only 2.
Figure 13 depicts the simulation results for a video, including dynamic camera motion depending on the value of K. The image margins for Fig. 13 a, b, c, and d are set at 20%. The values of K are set at 2, 5, 10, and 60, respectively. Figure 13 e and f show simulation results of LPF_{10} and LPF_{60}. Although the new camera path predicted from the proposed algorithm of P_{2,20%} is much better than the original camera path, it does not provide outstanding performance, compared to that shown in Fig. 12 a. Two frame buffers are insufficient to cope with sudden camera movement in the video, including dynamic camera motion. As the value of K increases, the proposed algorithm provides more stable camera path. While LPF_{60} provides good performance, since this method does not consider boundary constraints, the new camera path crosses the boundary.
The threshold values of w_{1} and w_{2} can be set as the different values. For example, the value of w_{1} can be set larger than that of w_{2} in order to give priority to C_{S}(n,i). However, according to our extensive experiments, it is recommended to set the two thresholds to the same value. The performance of the proposed algorithm is not sensitive to the threshold value of w_{3}.
Figure 14 shows the performance of the proposed algorithm according to the image margin. The image margins for Fig. 14 a, b, c, and d are set at 10%, 10%, 20%, and 20%, respectively. The values of K are set at 10, 20, 10, and 20, respectively. As the size of the image margin decreases, the algorithm performance reduces. The performance gap between P_{10,10%} and P_{20,10%} is more than the performance gap between P_{10,20%} and P_{20,20%}. When the image margin decreases, the value of K needs to be large.
Comparison of performance between the proposed algorithm and Grundmann’s work [36] is as illustrated in Fig. 15. The image margins for Fig. 15a, b, and c were set at 20%. The values of K were set at 2, 10, and 60, respectively. Figure 15 d and e represent LPF_{10} and LPF_{60}, respectively. Figure 15 f, g, and h illustrate Grundmann’s results. Parameters w_{1}, w_{2}, and w_{3} for Fig. 15f, g, and h were set to (0, 1, 0), (0, 0, 1), and (10, 1, 100), respectively. P_{2,%20} and P_{10,%20} did not outperform Grundmann’s work [36]. However, while his work considers all the fames in order to get the optimal paths, the proposed P_{2,%20} and P_{10,%20} consider only 2 and 10 buffered frames. As the proposed algorithm uses more buffered frames, its performance becomes comparable to Grundmann’s work as shown in Fig. 15c. Since LPF_{60} strongly smoothens the original camera path without considering boundary constraints, it provides performance similar to that of the proposed method and Grundmann’s work.
The original camera path was extracted by tracking the human face, and the camera path was generated by using the proposed method, LPF_{10}, and LPF_{60}. Then, simple video stabilization with 2D translational motion model in Fig. 1 was applied to generate the output videos that were uploaded on the web [35]. To evaluate the proposed algorithm, the stability of the human face position along time needs to be examined. Since LPF_{60} sometimes exceeds the available image margin, outofbound regions (black regions) are frequently observed in the stabilized videos. Moreover, it requires huge frame delay. Thus, LPF_{60} is not adequate for realtime video stabilization. However, since LPF_{60} provides a very stable camera path, a video rendered by LPF_{60} is used for comparison purpose. As shown in the comparison videos, the proposed algorithm provides performance close to LPF_{60}. A video from LPF_{10} reduces highfrequency jitter of the original video compared to the original video. However, the human face is slightly unstable.
Table 2 shows mean absolute slopes of camera paths used to evaluate the performance of algorithms. The smaller the value of the mean absolute slope, the more stable the camera path. The mean absolute slopes of LPF_{10}, LPF_{60}, P_{10,10%}, P_{10,20%}, P_{60,10%}, and P_{60,20%} are significantly lower than those of the original camera path. It means that they provide more stable camera path than the original camera path. P_{10,10%} always provides better performance than LPF_{10}. Since 10 frame buffers are not enough to provide very stable camera path, P_{10,10%} does not outperform LPF_{60}. If the size of the buffer becomes the same as 60 frames and the margin is enough, P_{60,20%} outperforms LPF_{60}. It is noted here that LPF_{10} and LPF_{60} do not consider the upper and lower boundary constraints.
The proposed algorithm is designed for realtime video stabilization. Hence, its computational burden needs to be dealt with. Table 3 shows the time for processing a single frame in the proposed algorithm under Intel Core i77700K CPU (4.2 GHz) according to the value of K. The proposed algorithm was implemented with C language without optimization. As mentioned in Section 1, video stabilization generally requires huge computational complexity; however, the realtime video stabilization is quite challenging. In that sense, tens of microseconds for processing a single frame is extremely small, hence considered negligible, compared to the total processing time of the video stabilization. As the value of K increases, the computational burden also increases. However, the increase of complexity is not proportional to the value of K. Moreover, the processing time for K=60 is only 62.2µs.
5 Conclusion
This paper presents a novel camera path planning algorithm for realtime video stabilization by crossoptimizing two terms related to steady camera path and the amount of image margin. Hence, the proposed algorithm attempts to provide a steady camera path while maintaining a sufficient image margin to compensate for dynamic or sudden camera motions.
While all the frames can be used for to plan the new camera path in postprocessing video stabilization, only a limited number of frames can be used in realtime video stabilization. Moreover, the realtime camera path planning algorithm should predict the new camera path onthefly. Once the new camera path is planned, it cannot be updated or modified to improve the quality, unlike postprocessing video stabilization. Hence, camera path planning is a challenging issue in realtime video stabilization. If the quality of a planned camera path is not acceptable, although accurate camera motions are predicted, the quality of stabilized videos will be degraded. Hence, camera path planning is an essential feature in realtime video stabilization, and our work is thus meaningful.
Abbreviations
 CPU:

Central processing unit
 HD:

High definition
 IIR:

Infinite impulse response
 LPF:

Lowpass filter
 SFM:

Structurefrommotion
References
F. La Rosa, M. Celvisia Virzì, F. Bonaccorso, M. Branciforte, Optical image stabilization (ois). STMicroelectronics white paper 2015 (2015). https://www.st.com/resource/en/white_paper/ois_white_paper.pdf.
C. Morimoto, R. Chellappa, in Proceedings of the DARPA Image Understanding Workshop. Evaluation of image stabilization algorithms (IEEENew Jersey, 1998), pp. 295–302.
W. H. Cho, K. S. Hong, Affine motion based CMOS distortion analysis and CMOS digital image stabilization. IEEE Trans. Consum. Electron.53:, 833–841 (2007).
C. Wang, J. H. Kim, K. Y. Byung, J. Ni, S. J. Ko, Robust digital image stabilization using the Kalman filter. IEEE Trans. Consum. Electron.55:, 6–14 (2009).
W. H. Cho, D. W. Kim, K. S. Hong, Mos digital image stabilization. IEEE Trans. Consum. Electron.42:, 979–986 (2007).
H. C. Chang, S. H. Lai, K. R. Lu, in IEEE International Conference on Multimedia and Expo. A robust and efficient video stabilization algorithm (IEEENew Jersey, 2004), pp. 16–27.
K. Ratakonda, in IEEE International Symposium on Circuits and Systems. Realtime digital video stabilization for multimedia applications (IEEENew Jersey, 1998), pp. 69–72.
Z. Zhu, G. Xu, Y. Yang, J. S. Jin, in IEEE International Conference on Intelligent Vehicles. Camera stabilization based on 2.5D motion estimation and inertial motion filtering (IEEENew Jersey, 1998), pp. 329–334.
Y. Matsushita, E. Ofek, X. Tang, H. Y. Shum, in IEEE CVPR 2005. Fullframe video stabilization (IEEENew Jersey, 2005), pp. 50–57.
Y. Matsushita, E. Ofek, W. Ge, X. Tang, H. Y. Shum, Fullframe video stabilization with motion inpainting. IEEE Trans. Pattern. Anal. Mach. Intell.28:, 1150–1163 (2006).
S. Kumar, H. Azartash, M. Biswas, T. Nguyen, Realtime affine global motion estimation using phase correlation and its application for digital image stabilization. IEEE Trans. Image Process.20:, 3406–3419 (2011).
Y. G. Lee, G. Kai, Fast rolling shutter compensation based on piecewise quadratic approximation of a camera trajectory. Opt. Eng.53:, 093101 (2014).
J. S. Jin, Z. Zhu, G. Xu, A stable vision system for moving vehicles. IEEE Trans. Intell. Transp. Syst.1:, 32–39 (2000).
K. Y. Lee, Y. Y. Chuang, B. Y. Chen, M. Ouhyoung, in IEEE International Conference on Computer Vision. Video stabilization using robust feature trajectories (IEEENew Jersey, 2009), pp. 1397–1404.
C. Liu, J. Yuen, A. Torralba, J. Sivic, W. T. Freeman, in In Proceedings of European Conference on Computer Vision 2008. Sift flow: dense correspondence across difference scenes (SpringerNew York, 2008), pp. 28–42.
P. Sand, S. Teller, Particle video: longrange motion estimation using point trajectories. Int. J. Comput. Vis.80:, 72–91 (2008).
C. Buehler, M. Bosse, L. McMillan, in IEEE Conference on Computer Vision and Pattern Recognition. Nonmetric image based rendering for video stabilization (IEEENew Jersey, 2001), pp. 609–614.
R. I. Hartley, A. Zisserman, Multiple View Geometric in Computer Vision (Cambridge University Press, Cambridge, 2004).
G. Zhang, W. Hua, X. Qin, Y. Shao, H. Bao, Video stabilization based on a 3D perspective camera model. Vis. Comput.25:, 997–1008 (2009).
Y. S. Wang, F. Liu, P. S. Hsu, T. Y. Lee, Spatially and temporally optimized video stabilization. IEEE Trans. Vis. Comput. Graph.19:, 1354–1361 (2013).
F. Liu, M. Gleicher, J. Wang, H. Jin, A. Agarwala, Subspace video stabilization. ACM Trans. Graph.30:, 4 (2011).
S. Liu, L. Yuan, P. Tan, J. Sun, in IEEE CVPR. Steadyflow: spatially smooth optical flow for video stabilization (IEEENew Jersey, 2014), pp. 4209–4216.
M. Grundmann, V. Kwatra, D. Castro, I. Essa, in International Conference on Computational Photography. Calibrationfree rolling shutter removal (IEEENew Jersey, 2012).
J. Dong, H. Liu, Video stabilization for strict realtime applications. IEEE Trans. Circ. Syst. Video Technol.27:, 716–724 (2017).
E. Ringaby, P. E. Forssen, Efficient video rectification and stabilization of cellphones. Inernational J. Comput. Vis.96:, 335–352 (2012).
A. Karpenko, D. Jacobs, J. Baek, M. Levoy, in Stanford Tech Report CTSR. Digital video stabilization and rolling shutter correction using gyroscopes (StanfordCalifornia, 2011), pp. 1–7.
Y. G. Lee, Video stabilization based on human visual system. J. Electron. Imaging. 23:, 053009 (2014).
A. J. Crawford, H. Denman, F. Kelly, F. Pitie, A. C. Kokaram, in IEEE International Conference on Image Processing. Gradient based dominant motion estimation with integral projections for real time video stabilization (IEEENew Jersey, 2004), pp. 3371–3374.
F. Liu, M. Gleicher, H. Jin, A. Agarwala, Contentpreserving warps for 3D video stabilization. ACM Trans. Graph.28:, 44 (2009).
Z. Zhou, H. Jin, Y. Ma, in IEEE CVPR. Planebased content preserving warps for video stabilization (IEEENew Jersey, 2013), pp. 2200–2306.
A. Litvin, J. Konrad, W. Karl, in IS&TSPIE Symp. Electronic Imaging, Image, and Video Comm. Probabilistic video stabilization using Kalman filtering and mosaicking (IS&T/SPIESpringfield, 2003), pp. 663–674.
D. Simon, Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches (Wiley, New Jersey, 2006).
B. Y. Chen, K. Y. Lee, W. T. Huang, J. S. Lin, Capturing intentionbased fullframe video stabilization. Pac. Graph.27:, 1805–1814 (2008).
M. L. Gleicher, F. Liu, Recinematography: improving the camerawork of casual video. ACM Trans. Multimed. Comput. Commun. Appl.5:, 2 (2008).
Data. http://sites.google.com/site/camerapathplanning. Accessed 6 Mar 2019.
M. Grundmann, V. Kwatra, I. Essa, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Autodirected video stabilization with robust l1 optimal camera paths (IEEENew Jersey, 2011).
H. C. Chang, S. H. Lai, K. R. Lu, A robust realtime video stabilization algorithm. J. Vis. Commun. Image Represent.17:, 659–673 (2006).
J. Song, X. Ma, in IEEE 7th International Conference on Awareness Science and Technology. A novel realtime digital video stabilization algorithm based on the improved diamond search and modified Kalman filter (IEEENew Jersey, 2015).
Funding
This present research has been conducted by the Research Grant of Kwangwoon University in 2018. This work was supported by an Institute for Information and communications Technology Promotion (IITP) grant funded by the Korea Government (MSIT) [2016000144], Moving Freeviewpoint 360VR Immersive Media System Design and Component Technologies.
Availability of data and materials
The data and materials are downloadable on the Web (http://sites.google.com/site/camerapathplanning).
Author information
Authors and Affiliations
Contributions
Lee works all things about this paper. The author read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The author declares he has no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional information
Authors’ information
Yun Gu Lee received his B.S., M.S., and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea, in 2000, 2002, and 2006, respectively. From 2006 to 2013, he was in the Media Processing Laboratory, DMC Research and Development Center, Samsung Electronics, Republic of Korea, where he was a principal engineer. In 2013, he joined the school of software, Kwangwoon University, Seoul, Republic of Korea, where he is currently a professor. His current research interests include the general areas of image and video coding, image and video processing, image stabilization, computer vision, and threedimensional display systems.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Lee, Y.G. A novel camera path planning algorithm for realtime video stabilization. J Image Video Proc. 2019, 64 (2019). https://doi.org/10.1186/s1364001904632
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1364001904632