Early search termination for fast motion estimation
EURASIP Journal on Image and Video Processing volume 2015, Article number: 29 (2015)
Abstract
This paper proposes a novel method for early termination of a motion search. It first introduces an estimator of the sum of absolute differences (SAD) between the current block and a search point. After analyzing the SAD estimator, it proposes a SAD condition to decide whether the current search point is around a minimum point or not. The counter condition is used to evaluate the current search point as an early search termination cue. Once the current search point is considered a minimum point, the search is stopped immediately. The proposed algorithm can easily be combined with most existing fast motion estimation algorithms to further reduce computational cost. While previous thresholding techniques have focused on the correlation of SAD values between a current block and its neighboring blocks, the proposed algorithm studies the characteristics of the current block itself to predict the threshold. Hence, the proposed termination scheme can complement the previous early search termination techniques: merging the termination methods further reduces computational cost. Experimental results demonstrate that the proposed algorithm successfully reduces the computational cost when combined with the previous early search termination techniques.
1 Introduction
Recent video coding standards, such as H.264 [1] and HEVC [2], significantly improve coding efficiency compared to previous video coding standards. Complexity has also increased along with the improvement in coding efficiency. The video coding standards have adopted the block matching algorithm (BMA) to reduce temporal redundancy between frames. In the BMA, the best matching block of each current block is found in the reference frame. Then, only the difference block between the matching block and the current block is coded with a motion vector (or a displacement). The straightforward method to find the best matching block is a full search block matching algorithm that checks all possible candidates in the search area, which imposes a heavy computational burden. Moreover, since recent video coding standards support variable block sizes and multiple reference frames, the number of candidates increases dramatically. Hence, motion estimation is the most time-consuming part among the video coding tools used in a video encoder. Motion estimation in video coding first finds the integer motion vector within the search area. Then, the subpixel motion vector is found around the integer motion vector. The computational burden of estimating the integer motion vector is much greater than that of estimating the subpixel motion vector. Therefore, this paper considers only integer motion estimation.
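As an illustration of the full search block matching described above, the following minimal sketch checks every integer displacement in a square search window and keeps the one with the smallest SAD. The function names, the frame layout (a list of rows of integer pixel values), and the square window are assumptions made for this example and are not taken from any reference encoder.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Check every candidate displacement inside the search window and
    return (best_motion_vector, best_sad)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_sad = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if x < 0 or y < 0 or x + size > width or y + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad
```

With a search range of R, this loop evaluates up to (2R+1)^2 candidates per block, which is the cost that fast search patterns and early termination aim to avoid.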
Fast motion estimation algorithms, such as diamond search [3], MVFAST [4], PMVFAST [5], hexagon-based search [6], and hybrid unsymmetrical cross multihexagon-grid search [7], have been developed to reduce the heavy computational cost of the full search block matching algorithm. In order to find the motion vector, these fast motion estimation algorithms check only a limited number of search points using different shapes and sizes of search patterns, rather than examining all possible search points within a search area. Complex or large search patterns are robust to random motions, but they are expensive. Simple search patterns sometimes fail to find the global minimum point by falling into a local minimum. Hence, advanced algorithms adaptively select the shapes and sizes of search patterns. For example, a large and sparse search pattern is first adopted to avoid falling into a local minimum point. Then, a small and dense search pattern is used to refine the search.
Early search termination techniques have been developed to further reduce the computational cost of fast motion estimation. Usually, the condition for early search termination compares the sum of absolute differences (SAD) value of a current search point with a threshold value. If the SAD value is less than the threshold value, the current search point is considered as the minimum point, and the search is immediately stopped without examining the remaining search points. Ismail mentioned that early termination of a search process could reduce the number of computations [8]. In [4], a constant threshold was used for early search termination. Tourapis proposed an adaptive method in which the threshold value of the current block is determined from the minimum SAD values of the three adjacent blocks [5]. Ismail [9] proposed a dynamic early stop search termination (DESST) by considering the SAD value of the initial search center (ISC) of a current block and the average SAD value of the ISCs in all the previously coded blocks. These methods determine the threshold value from SAD values of blocks spatially and temporally adjacent to the current block. Since motion fields of consecutive frames are smooth and gentle [9], they can efficiently judge whether a block is stationary or not using the determined threshold. For stationary blocks, they can terminate the motion search early. However, for non-stationary blocks, the threshold value predicted from SAD values of blocks spatially and temporally neighboring the current block is inadequate. Hence, while these methods significantly increase the performance, the determined threshold value may not be the best for every block.
In our previous work, a method to predict the distance between the optimal point (the position of the best matching block) and a current search point was proposed [10]. If the predicted distance between them was less than 0.5 pixel, the current search point was considered as the global minimum point, and the motion search was terminated early. However, since the method considered a one-dimensional image, the predicted distance may not be accurate. Also, there was no analysis of the distance measure in terms of performance and accuracy, so it was very difficult to combine the distance measure with the existing motion estimation algorithms in order to terminate a search early. Therefore, this paper proposes a new early search termination method for fast motion estimation by extending our previous work to a two-dimensional space. It proposes a SAD estimator on the two-dimensional space and studies it to obtain a condition for early termination of a motion search. This paper also analyzes the proposed algorithm in terms of performance and accuracy.
While conventional early termination methods consider the SAD values of blocks spatially and temporally neighboring a current block to determine termination conditions, the proposed algorithm takes into account a self-characteristic of the current block for a termination condition. Since the proposed algorithm utilizes different characteristics compared to the existing methods, the proposed termination method and the existing termination methods can complement each other to further improve the speed. This paper shows how the proposed method complements previous thresholding techniques.
This paper is organized as follows. Section 2 describes the proposed algorithm. Experimental results are given in Section 3. Finally, conclusions are presented in Section 4.
2 Proposed method
2.1 SAD estimator
Let us assume that there are only translational motions between the previous and current frames, and all pixels in each block have the same translational motion. By assuming that pixels in the previous frame (or p(i,j)) can be reconstructed by linear interpolation of pixels in the current frame (or c(i,j)), a pixel in the previous frame is described as
Here, (a, b) and (m, n) are integer and subpel components of the optimal motion vector, respectively (Fig. 1). From Eq. (1), a SAD value at the integer search point (a, b) is as follows:
Here, G_H(i,j) and G_V(i,j) are (c(i,j)−c(i+1,j)) and (c(i,j)−c(i,j+1)), respectively. Equation (2) can be interpreted as a SAD estimator according to (m, n). If either m or n is zero, Eq. (2) becomes identical to the equation in the previous work [10], which considers only a one-dimensional case.
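Equations (1) and (2) are not reproduced in this text, so the following is only a sketch of one reading of the SAD estimator. It assumes that the previous-frame pixel is modeled by the bilinear interpolation mentioned in Section 2.4, and it evaluates the resulting SAD directly rather than in the closed form of Eq. (2). The function name and the patch layout are hypothetical.

```python
def estimated_sad(patch, m, n):
    """Estimate the SAD at an integer search point whose offset from the
    optimal motion vector is (m, n), with 0 <= m, n <= 1.

    patch is indexed as patch[j][i] (row j, column i) and must hold one
    extra row and one extra column beyond the block, because the
    interpolation reads c(i+1, j), c(i, j+1), and c(i+1, j+1).
    """
    rows, cols = len(patch) - 1, len(patch[0]) - 1
    total = 0.0
    for j in range(rows):
        for i in range(cols):
            c00 = patch[j][i]          # c(i, j)
            c10 = patch[j][i + 1]      # c(i+1, j)
            c01 = patch[j + 1][i]      # c(i, j+1)
            c11 = patch[j + 1][i + 1]  # c(i+1, j+1)
            # Assumed bilinear model of the displaced previous-frame pixel.
            p = ((1 - m) * (1 - n) * c00 + m * (1 - n) * c10
                 + (1 - m) * n * c01 + m * n * c11)
            total += abs(c00 - p)
    return total
```

Under this assumption, the estimate reduces to m times the sum of |G_H(i,j)| when n is zero and to n times the sum of |G_V(i,j)| when m is zero, which matches the one-dimensional behavior described above.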
2.2 SAD contours
Let (d_x, d_y) be the distance along the x and y axes between the current search point and the optimal motion vector. In the following, this paper assumes that both d_x and d_y are always positive for simplicity. When both d_x and d_y are less than 1 pixel for a given integer search point, (d_x, d_y) is a subpel component of the optimal motion vector. Then, (d_x, d_y) is the same as (m, n), and the SAD value for a given (m, n) can be described by Eq. (2).
Figure 2a depicts contours where SAD values of search points on the same contour are the same. For example, since points A and B are on the same contour, their SAD values are the same. On the other hand, the contours are not always symmetric along y=x as in Fig. 2a. If the gradient of a block is not the same along the x and y directions, the contours are not symmetric. Figure 2b shows a case in which the vertical gradient is much smaller than the horizontal gradient. In this case, the value of G_V(i,j) is usually less than G_H(i,j) in a block, and m dominates the value of Eq. (2).
2.3 Termination condition
Subpixel motion estimation finds the optimal motion vector with subpixel accuracy around the integer motion vector. Usually, the search range for the subpixel motion estimation is [−1, 1] along the x and y directions. So if (d_x, d_y) for a current integer search point is less than 1 pixel along the x and y directions, that is, if the current integer search point is within the rectangle, the current search point should be the integer motion vector. Hence, the integer motion estimation can be stopped immediately, which is the early termination of a motion search. This subsection studies the condition under which (d_x, d_y) is less than 1 pixel.
Let the SAD values of points along contour A in Fig. 2a be 1000. If the SAD value of a current integer search point is less than 1000, the current search point should be inside contour A. Otherwise, the current integer search point is outside contour A. In this way, the relative position of the current integer search point can be roughly predicted by comparing the SAD values of the current integer search point and the contours. If a contour is inside the rectangle and the SAD value of a search point is less than that of the contour, (d_x, d_y) of the search point will be less than 1 pixel. However, even when a contour is inside the rectangle, if the SAD value is more than that of the contour, it is not certain that (d_x, d_y) is less than 1 pixel. For example, let point C be a current integer search point. In this case, the SAD value of the search point is more than that of contour A, and the search point is not inside the rectangle. On the other hand, if point D is a current integer search point, the search point is inside the rectangle although its SAD value is more than that of contour A. Therefore, (d_x, d_y) for a search point is guaranteed to lie inside the rectangle only when the SAD value of the search point is less than that of a contour that is itself inside the rectangle.
Figure 3 depicts three types of the largest contours that satisfy the above condition. In the figure, SAD(0,1) and SAD(1,0) represent SAD_(a,b)(0,1) and SAD_(a,b)(1,0), respectively. From Eq. (2), they are
Each largest contour should cross one of (0, 1) and (1, 0) or both of them as in the figure. Let this condition be Cond_MinSad. The termination condition of Cond_MinSad (or the SAD value for each contour) is simply the minimum of SAD_(a,b)(0,1) and SAD_(a,b)(1,0) as follows:
Consequently, if the SAD value of the current search point is less than the threshold, integer motion estimation is terminated. It is noted here that SAD_(a,b)(0,1) and SAD_(a,b)(1,0) are calculated from only pixels in a current block.
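The display equations for SAD(0,1), SAD(1,0), and the threshold are not reproduced in this text. The sketch below therefore assumes that SAD(1,0) is the sum of |G_H(i,j)| and SAD(0,1) is the sum of |G_V(i,j)| over the current block, which is what an interpolation model yields at a full-pixel offset, and that the Cond_MinSad threshold is the smaller of the two, as stated above. The function names are hypothetical.

```python
def gradient_sads(patch):
    """Return (SAD(1,0), SAD(0,1)) for a block of the current frame.

    patch[j][i] holds c(i, j) and carries one extra row and column so
    that c(i+1, j) and c(i, j+1) exist for every block pixel.
    """
    rows, cols = len(patch) - 1, len(patch[0]) - 1
    sad_10 = 0  # sum of |G_H(i, j)| = |c(i, j) - c(i+1, j)|
    sad_01 = 0  # sum of |G_V(i, j)| = |c(i, j) - c(i, j+1)|
    for j in range(rows):
        for i in range(cols):
            sad_10 += abs(patch[j][i] - patch[j][i + 1])
            sad_01 += abs(patch[j][i] - patch[j + 1][i])
    return sad_10, sad_01


def cond_min_sad_threshold(patch):
    """Threshold of Cond_MinSad: the smaller of SAD(1,0) and SAD(0,1)."""
    return min(gradient_sads(patch))


def should_terminate(current_sad, threshold):
    """The search stops only when the SAD is strictly below the threshold."""
    return current_sad < threshold
```

Both sums use only pixels of the current frame, so the threshold can be computed once per block before any search point in the reference frame is visited.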
Figure 4 illustrates two types of contours for the condition Cond_MaxSad, defined as follows:
Compared to Fig. 3, the sizes of the gray regions for the cases with SAD(0,1)<SAD(1,0) and SAD(0,1)>SAD(1,0) are increased. Hence, the probability of early termination of a motion search will be increased. However, since parts of the gray regions are not included in the rectangular boxes, the search will sometimes be terminated at an improper position, and the quality performance may be degraded. Section 3 will present the performance degradation from wrong termination of a motion search by comparing the performances of Cond_MinSad and Cond_MaxSad.
The proposed method can be combined with the existing thresholding technique to further improve the search speed. Cond_{MinSad,Sim} is a termination condition that combines Cond_MinSad with a simple threshold technique, which was introduced in MVFAST [4], in order to further improve the speed. The termination condition is defined as follows:
Here, B_W and B_H are the width and height of a block to be considered. Cond_{MinSad,Sim} sets a constant threshold value to 2B_W B_H to support the variable block size mode. Note here that the threshold value for MVFAST [4] was set to 512. The performance of Cond_{MinSad,Sim} will be given in the next section.
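The three termination conditions can be summarized in a short sketch. The display equations are not reproduced in this text, so two points are assumptions here: Cond_MaxSad is taken as the larger of SAD(1,0) and SAD(0,1), and Cond_{MinSad,Sim} is taken as the larger of the Cond_MinSad threshold and the constant 2 B_W B_H. The function name and the string labels are hypothetical.

```python
def termination_threshold(sad_10, sad_01, block_w, block_h, condition):
    """Return the early-termination threshold T for one block.

    sad_10 and sad_01 are SAD(1,0) and SAD(0,1) of the current block.
    """
    t_min = min(sad_10, sad_01)
    if condition == "MinSad":
        return t_min
    if condition == "MaxSad":
        return max(sad_10, sad_01)
    if condition == "MinSad,Sim":
        # Assumption: the self-derived threshold and the constant
        # threshold 2 * B_W * B_H are merged by taking the larger one.
        return max(t_min, 2 * block_w * block_h)
    raise ValueError("unknown condition: " + condition)
```

For a 16×16 block, 2 B_W B_H equals 512, the constant used by MVFAST [4], so the constant part scales that value to the other block sizes.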
2.4 Implementation
The SAD condition for early termination of a motion search was proposed in the previous subsections. This subsection describes how to implement the proposed method. The proposed algorithm cannot run alone; it should be merged with an existing fast motion estimation algorithm to further improve its search speed. Combining the proposed method with fast motion estimation is simple, as shown in Fig. 5c. After the SAD evaluation of each search point, its SAD value is compared with the threshold T given by one of the termination conditions in the previous subsection. Only if the SAD value is less than the threshold is the current search point considered as the minimum point, and the search is then stopped immediately.
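The integration described above can be sketched as a thin wrapper around the candidate loop of a host search algorithm. The host algorithm is represented here only by the order in which it would visit its search points, and the names `candidates` and `sad_at` are placeholders chosen for this example rather than parts of any real encoder.

```python
def search_with_early_termination(candidates, sad_at, threshold):
    """Visit search points in the host algorithm's order and stop as soon
    as one of them has a SAD below the threshold.

    candidates: iterable of motion vectors in visiting order.
    sad_at:     callable that returns the SAD of one motion vector.
    Returns (best_mv, best_sad, number_of_points_checked).
    """
    best_mv, best_sad = None, None
    checked = 0
    for mv in candidates:
        cost = sad_at(mv)
        checked += 1
        if best_sad is None or cost < best_sad:
            best_mv, best_sad = mv, cost
        # Termination test performed right after each SAD evaluation.
        if cost < threshold:
            break
    return best_mv, best_sad, checked
```

Because the threshold is fixed for the block, every point visited before the terminating one had a SAD at or above it, so the point that triggers termination is also the best point seen so far.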
To support variable block motion estimation, the threshold value for the termination condition does not need to be calculated at every block size. The computational cost is reduced by sharing the calculated data. First, the method calculates SAD(1,0) and SAD(0,1) of the 4×4 blocks in a current block, as in Fig. 5a. Then, SAD(1,0) and SAD(0,1) for the given block size are calculated from the above 4×4 blocks (see Fig. 5b). Finally, the threshold is calculated according to the termination condition, such as Cond_MinSad, Cond_MaxSad, and Cond_{MinSad,Sim}.
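The data sharing described above can be sketched as follows: SAD(1,0) and SAD(0,1) are computed once per 4×4 sub-block and then summed for any larger partition. The sketch assumes that SAD(1,0) and SAD(0,1) are sums of absolute horizontal and vertical pixel differences inside the current block; the function names and the grid layout are hypothetical.

```python
def gradient_sads_4x4(patch, block_w, block_h):
    """Return a grid (list of rows) of (SAD(1,0), SAD(0,1)) pairs, one
    pair per 4x4 sub-block of a block_w x block_h block.

    patch[j][i] holds c(i, j) with one extra row and column.
    """
    grid = []
    for by in range(0, block_h, 4):
        row = []
        for bx in range(0, block_w, 4):
            s10 = s01 = 0
            for j in range(by, by + 4):
                for i in range(bx, bx + 4):
                    s10 += abs(patch[j][i] - patch[j][i + 1])
                    s01 += abs(patch[j][i] - patch[j + 1][i])
            row.append((s10, s01))
        grid.append(row)
    return grid


def partition_sads(grid, gx, gy, gw, gh):
    """Sum the cached 4x4 values over a partition that starts at sub-block
    (gx, gy) and spans gw x gh sub-blocks."""
    cells = [grid[y][x] for y in range(gy, gy + gh) for x in range(gx, gx + gw)]
    return sum(c[0] for c in cells), sum(c[1] for c in cells)
```

For example, an 8×4 partition reuses two cached cells (gw = 2, gh = 1) and an 8×8 partition reuses four, so no pixel difference is computed more than once per block.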
The calculation of SAD(0,1) and SAD(1,0) requires additional computational cost. The computational load to calculate SAD(0,1) or SAD(1,0) is the same as the computation of one point search (or the SAD evaluation for one point). Hence, the overhead for the proposed algorithm corresponds to a two-point search in total. This computational cost can be further reduced using a subsampling technique [10] as follows:
Then, the overhead becomes a 0.5 point search.
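The subsampling equation is not reproduced in this text. As a rough sketch, a reduction from a two-point search to a 0.5-point search corresponds to evaluating one quarter of the pixel differences. The version below assumes that every second row and column is sampled and that the partial sums are scaled back by 4; both the sampling pattern and the scaling are assumptions of this example, not the method of [10].

```python
def subsampled_gradient_sads(patch, block_w, block_h, step=2):
    """Approximate (SAD(1,0), SAD(0,1)) from a subsampled pixel grid.

    patch[j][i] holds c(i, j) with one extra row and column.
    Only every step-th row and column is visited.
    """
    s10 = s01 = 0
    for j in range(0, block_h, step):
        for i in range(0, block_w, step):
            s10 += abs(patch[j][i] - patch[j][i + 1])
            s01 += abs(patch[j][i] - patch[j + 1][i])
    # Assumed rescaling so the result is comparable to a full-block SAD.
    scale = step * step
    return s10 * scale, s01 * scale
```

On a block with a uniform gradient, the subsampled values equal the full sums; on textured blocks they are only approximations, which is part of the speed and accuracy trade-off.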
Some assumptions, such as pure translational motion, the reconstruction of a pixel using a simple bilinear interpolation, and convex contours of SAD, are not valid in general. Hence, the proposed algorithm may improve speed at the cost of rate (or peak signal-to-noise ratio (PSNR)) performance. Therefore, experimental results in the next section will show the performance of the proposed algorithm in terms of speed and rate.
3 Experimental results
In order to verify the proposed algorithm, two types of experiments are performed. If a search point satisfies the proposed condition while the search point is not very close to the optimal point (or the integer motion vector), the proposed method terminates a motion search at the non-optimal point. Then, there is a chance of degrading the coding performance. Hence, the first experiment examines the accuracy and safety of the proposed termination condition. The second experiment verifies the performance of the proposed algorithm on a video encoder.
This paper includes experimental results for eight test video sequences including BQTerrace, BasketballDrill, Kimono1, ChinaSpeed, ParkScene, FourPeople, KristenAndSara, and PeopleOnStreet, which were used in the HEVC standard. The details are given in Table 1.
3.1 Accuracy test
In order to examine the accuracy and safety of the proposed termination condition Cond_MinSad, the SAD values of all possible search points for each block are examined and compared with the condition Cond_MinSad. When, among all search points, only search points within an area close to the optimal point satisfy the termination condition, the block is marked as a safe block. Here, the distance between the optimal point (or the integer motion vector) and a search point within the close area should be equal to or less than 1 pixel. If any search point outside of the close area satisfies the termination condition, the block is marked as an unsafe block. If the current search point is out of the close area while the point satisfies the proposed termination condition, the search is stopped at the current search point, and the coding performance may be degraded. Hence, the number of safe blocks should be high, while the number of unsafe blocks should be low. It should be noted here that a block marked as an unsafe block does not always degrade the coding efficiency. For example, even in an unsafe block, if the search point at which the search stops is within the close area, the coding performance will not be degraded. The proposed termination condition Cond_MinSad for the accuracy test is given in Eq. (5).
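The safe and unsafe labels described above can be sketched as a small classifier over the exhaustive SAD map of one block. The sketch assumes that the close area is measured per axis (at most 1 pixel along x and along y from the optimal point), in line with the rectangle of Section 2.3; the function name and the "none" label for blocks where no point satisfies the condition are hypothetical.

```python
def classify_block(sad_map, threshold):
    """Label a block as "safe", "unsafe", or "none".

    sad_map maps every integer search point (dx, dy) to its SAD value.
    """
    # The optimal point is the search point with the smallest SAD.
    optimal = min(sad_map, key=sad_map.get)
    satisfied = [mv for mv, cost in sad_map.items() if cost < threshold]
    if not satisfied:
        return "none"
    for dx, dy in satisfied:
        # Assumption: "close" means within 1 pixel along each axis.
        if abs(dx - optimal[0]) > 1 or abs(dy - optimal[1]) > 1:
            return "unsafe"
    return "safe"
```

Under this labeling, the share of blocks in which at least one search point satisfies the condition is the sum of the safe and unsafe shares.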
Table 2 compares the early termination conditions of the proposed condition Cond_MinSad and Ismail’s method [9], which is one of the state-of-the-art methods. The threshold for Ismail’s method [9] is as follows:
where SAD_ISCavg is the average SAD_ISC value for all the previously coded blocks in the current frame whose best match motion vectors are the initial search centers (ISCs) themselves. SAD_ISCcurrent is the value of SAD at the initial search center for the current block. Here, ε_1 and ε_2 are set to 0.75 and 128, respectively, which are the recommended values [9].
In the table, the percentages of the safe block of the proposed algorithm are similar to or higher than those of Ismail [9]. Meanwhile, the percentages of the unsafe block of the proposed algorithm are less. This means that the proposed algorithm may be slightly more accurate than Ismail’s method.
The proposed algorithm can be combined with most existing fast motion estimation algorithms to reduce computational cost. Since the fast motion estimation algorithms adopt different shapes and sizes of search patterns, the overall speed depends entirely on the algorithm. Hence, “Total” in Table 2 relates only to the speed gain obtained by adding the termination method. It represents the percentage of blocks in which at least one search point satisfies the termination condition. There is a chance to terminate a motion search early at those blocks. The higher the percentage of “Total,” the higher the speed gain is. Overall, the total percentage of Ismail’s method is 9.41 % more than that of the proposed method, and Ismail’s method provides a higher speed gain than the proposed algorithm. However, since the proposed algorithm utilizes a different feature to predict the termination condition, the proposed algorithm can be combined with Ismail’s method to further improve the search speed. (The experimental results will be presented in the next subsection.) Hence, the proposed method is still meaningful.
3.2 Performance on a video encoder
3.2.1 Simulation environment
In order to evaluate the performance of the proposed termination method on a video encoder, it was implemented on the H.264 reference software JM 18.5 [11] and combined with ‘Enhanced Predictive Zonal Search’ (SearchMode = 3), which is one of the fast motion estimation algorithms. For simulation, the number of reference frames is set to 1 without B frame coding. Rate control and rate-distortion optimized mode decision are turned off. The number of frames for simulation is set to 100. The variable block size mode is turned on. The quantity ΔPSNR is the PSNR change compared to the conventional method. When the PSNR performance is degraded from the conventional method, ΔPSNR becomes a negative value and vice versa. The quantity ΔBR is the bitrate change from the conventional method in terms of percentage. When the bitrate is increased from the conventional one, it has a positive value. RR represents a reduction rate as follows:
Here, METime_CONV and METime_NEW represent motion estimation times for the conventional and newly implemented methods, respectively.
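The equation for RR is not reproduced in this text. A form that is consistent with the surrounding description (a ratio that is positive when the new method is faster, so that a value of 0.22 corresponds to a 22 % reduction in motion estimation time) would be the following; it is given here as an assumption rather than as the published definition.

```latex
\mathrm{RR} = \frac{\mathrm{METime}_{\mathrm{CONV}} - \mathrm{METime}_{\mathrm{NEW}}}{\mathrm{METime}_{\mathrm{CONV}}}
```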
3.2.2 Results
Table 3 shows the performances of the conventional method and the proposed algorithms Cond_MinSad and Cond_MaxSad. In the table, the conventional method represents the original Enhanced Predictive Zonal Search implemented on JM 18.5. In Cond_MinSad, the termination condition of Cond_MinSad in Eq. (5) is combined with the Enhanced Predictive Zonal Search to further reduce motion estimation time. As shown in the table, Cond_MinSad successfully reduces the motion estimation time of the original Enhanced Predictive Zonal Search, and the average reduction ratio is 0.22. This simulation shows that the proposed termination condition can further reduce the computational cost of fast motion estimation. As mentioned in Section 2.4, some assumptions in this paper are not valid in general. Hence, while the search speed is improved, the PSNR performance is degraded.
In Cond_MaxSad, the termination condition of Cond_MaxSad in Eq. (6) is combined with the original Enhanced Predictive Zonal Search. Compared to Fig. 3, the sizes of the gray regions in Fig. 4 are increased. Hence, the RR of Cond_MaxSad is more than that of Cond_MinSad. However, since parts of the gray regions are outside the rectangular boxes, the search with Cond_MaxSad will sometimes be terminated at an improper position, and ΔBR is increased compared to Cond_MinSad or the conventional method. Table 3 shows the above trade-off between speed and accuracy. Storage devices have become inexpensive. Accordingly, in some applications, an increase of 0.88 % in bitrate is not critical while the reduction of computational cost is important. Cond_MaxSad will be beneficial for those applications.
Table 4 shows the performance of the proposed method when it is combined with a simple threshold technique, which was first introduced in MVFAST [4]. The termination condition of Cond_{MinSad,Sim} is described in Eq. (7). The proposed algorithm sets a threshold value to 2B_W B_H to support the variable block size mode. Note here that the threshold value for MVFAST [4] was set to 512. In the table, while the coding efficiency is degraded by 0.16 %, the reduction rate is 14 % (or 0.14). This simulation shows that the proposed termination method can be merged with the existing early search termination to further reduce the computational cost. For the sequence KS, the speed of Cond_{MinSad,Sim} is much faster than that of Cond_MinSad. Meanwhile, the RR of Cond_{MinSad,Sim} for the sequence PS is almost the same as that of Cond_MinSad. The main reason is that the proposed method utilizes the self-characteristic of a current block, which is different from the constant thresholding technique. In other words, the gains of the two termination algorithms are not correlated.
Table 5 compares the early termination conditions of the proposed method and Ismail’s method [9], which is one of the state-of-the-art methods. The early termination of Ismail [9], whose threshold is described in Section 3.1, is also implemented on the JM reference software 18.5 by combining it with Enhanced Predictive Zonal Search. In order to support the variable block size mode, that threshold is modified as follows:
Here, B_W and B_H are the width and height of a block to be considered. As shown in the table, Cond_{MinSad,Sim} successfully reduces the motion estimation time, and the average reduction ratio is 0.36. While the average reduction ratio of Cond_{MinSad,Sim} is better than that of Ismail, the PSNR performance of Cond_{MinSad,Sim} is worse than that of Ismail. However, although the proposed method does not outperform Ismail’s method, the proposed algorithm can be used to complement Ismail’s method in order to further improve its search speed. While Ismail’s method considers the SAD values of temporally and spatially neighboring blocks to determine the termination condition, the proposed algorithm takes into account the self-characteristic of a current block for calculating the termination condition. Since their termination conditions are exclusive, the merged algorithm further accelerates the search speed. The two termination conditions of Ismail’s and the proposed methods are simultaneously considered in Ismail + Cond_{MinSad,Sim}, in order to show how the proposed algorithm complements Ismail’s method. The method chooses the maximal threshold value between Ismail’s and Cond_{MinSad,Sim} to combine the two algorithms. The reduction ratio of Ismail + Cond_{MinSad,Sim} is more than those of Cond_{MinSad,Sim} and Ismail’s method. These simulation results show that the proposed algorithm also complements Ismail’s method. Hence, although the proposed algorithm does not outperform the existing methods, it can be used to accelerate the search speed, which makes the proposed algorithm meaningful.
Table 6 shows the overall performance comparisons of Cond_MaxSad, Cond_MinSad, Cond_{MinSad,Sim}, Ismail [9], and Ismail + Cond_{MinSad,Sim}.
4 Conclusions
This paper presents a method to terminate a motion search early in a video encoder. Unlike the previous work, this paper introduces a SAD estimator by considering a two-dimensional space and studies the estimator to obtain a condition for terminating a motion search early. The proposed algorithm can easily be combined with most fast motion estimation methods to reduce computational cost. While previous thresholding techniques consider SAD values of blocks neighboring a current block, the proposed algorithm takes into account the characteristic of the current block to predict the threshold. Hence, the proposed termination scheme can complement the previous early search termination techniques.
References
[1] T Wiegand, GJ Sullivan, G Bjøntegaard, A Luthra, Overview of the H.264/AVC video coding standard. IEEE Trans. Circ. Syst. Video Technol. 13(7), 560–576 (2003).
[2] GJ Sullivan, J Ohm, WJ Han, T Wiegand, Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circ. Syst. Video Technol. 22(12), 1649–1668 (2012).
[3] S Zhu, K-K Ma, A new diamond search algorithm for fast block matching motion estimation. IEEE Trans. Image Process. 9(2), 287–290 (2000).
[4] PI Hosur, KK Ma, in Proc. 2nd Int. Conf. Information Communications and Signal Processing (ICICS ’99). Motion vector field adaptive fast motion estimation (Singapore, Dec. 1999), pp. 7–10.
[5] AM Tourapis, OC Au, ML Liou, Highly efficient predictive zonal algorithms for fast block-matching motion estimation. IEEE Trans. Circ. Syst. Video Technol. 12(10), 934–947 (2002).
[6] C Zhu, X Lin, LP Chau, KP Lim, HA Ang, CY Ong, in Proc. ICASSP. A novel hexagon-based search algorithm for fast block motion estimation (Utah, USA, 2001), pp. 1593–1596. 7-11 May 2001.
[7] ZB Chen, P Zhou, Y He, in JVT-F017, 6th Meeting, Awaji. Fast integer pel and fractional pel motion estimation for JVT (Awaji, Japan, Dec. 2002).
[8] Y Ismail, M Shaaban, M Bayoumi, in Proc. IEEE ISCAS. An adaptive block size phase correlation motion estimation using adaptive early search termination technique (New Orleans, USA, 2007), pp. 3423–3426. 27-30 May 2007.
[9] Y Ismail, JB McNeely, M Shaaban, H Mahmoud, MA Bayoumi, Fast motion estimation system using dynamic models for H.264/AVC video coding. IEEE Trans. Circ. Syst. Video Technol. 22(1), 28–42 (2012).
[10] Y Lee, JB Ra, Fast motion estimation robust to random motions based on a distance prediction. IEEE Trans. Circ. Syst. Video Technol. 16(7), 869–875 (2006).
[11] JM reference software (Aug 1 2014). http://iphome.hhi.de/suehring/tml/.
Acknowledgements
The present research has been conducted through the Research Grant of Kwangwoon University in 2014. This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2014R1A1A2054105).
Competing interests
The author declares no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Lee, YG. Early search termination for fast motion estimation. J Image Video Proc. 2015, 29 (2015). https://doi.org/10.1186/s13640-015-0083-4
DOI: https://doi.org/10.1186/s13640-015-0083-4