Fast intermode decision algorithm based on general and local residual complexity in H.264/AVC
EURASIP Journal on Image and Video Processing volume 2013, Article number: 30 (2013)
Abstract
The state-of-the-art video coding standard H.264/AVC achieves significant coding performance by adopting variable block sizes for motion estimation (ME) and mode decision. However, this technique incurs high computational complexity, since the optimal mode is determined by exhaustively performing rate-distortion optimization (RDO) on each coding mode with different block sizes. In this paper, a fast intermode decision algorithm is proposed to reduce the computational complexity. Based on the high correlation between the residual error of ME and the optimal block size, general residual complexity (GRC) and local residual complexity (LRC) are defined. According to the macroblock (MB) activity evaluated from GRC and LRC, candidate intermodes are determined and RDO is performed only on the selected intermodes. The experimental results demonstrate that the proposed algorithm reduces the encoding time by 63% on average with negligible degradation of coding efficiency.
1 Introduction
The state-of-the-art video coding standard H.264/AVC [1] was developed by the Joint Video Team of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. Compared to previous video coding standards, H.264/AVC achieves higher coding performance through several advanced coding tools such as variable block sizes for motion estimation (ME) and mode decision, multiple reference frames, motion vectors (MVs) with quarter-pixel accuracy, a deblocking filter, and context-based adaptive binary arithmetic coding (CABAC) [2]. However, computational complexity increases with these tools, especially because of the multiple block sizes used for ME and mode decision [3].
In ME and mode decision, a macroblock (MB) can be partitioned in four ways: 16 × 16, 16 × 8, 8 × 16, and 8 × 8. Each 8 × 8 sub-block, denoted P8 × 8, can be further divided into 8 × 8, 8 × 4, 4 × 8, and 4 × 4. Figure 1 shows all block sizes and their relationships for ME and mode decision.
To obtain the optimal mode from candidate coding modes with various block sizes, H.264/AVC employs the rate-distortion optimization (RDO) technique [4]. In RDO, the rate-distortion cost (RDcost) of each coding mode is calculated, and the mode with the minimum RDcost is selected as the optimal mode. Since RDO requires encoding and decoding processes to obtain the RDcost, it is computationally very expensive. Therefore, reducing the encoder complexity without noticeable coding loss is highly desirable for the wide application of H.264/AVC, including video services (video conferencing, digital multimedia broadcasting, and IPTV) and consumer products (mobile devices and DVD or Blu-ray players).
A number of approaches [5–16] have been reported to reduce the encoder complexity of H.264/AVC. Their main idea is to eliminate unnecessary modes and to perform the RDO process only for the block sizes that remain plausible. Some approaches [5–8] focus on the SKIP mode. The early SKIP mode decision algorithm [5, 6] is based on four conditions. If these conditions are satisfied, the SKIP mode is selected as the optimal mode, and the remaining modes are excluded from the RDO process. This algorithm has been adopted as a fast mode decision option in the JM reference software. Additional conditions [7] for the SKIP mode, based on spatial and temporal neighborhood information, have been proposed to improve the performance of the early SKIP mode decision algorithm [6]. Saha et al. [8] employed the sum of absolute transformed differences to predict the SKIP mode. Because these algorithms [5–8] treat all coding modes other than SKIP as a single group, the RDO process is executed for all coding modes whenever SKIP is not selected as the optimal mode. Accordingly, the performance improvement is limited for sequences with fast motion or detailed regions.
Therefore, additional approaches [9–16] that address the remaining coding modes have been proposed to further reduce the encoding time. Wu et al. [9] measured spatial homogeneity using the Sobel operator and temporal stationarity using the difference between the current MB and its co-located MB in the reference frame. Based on homogeneity and stationarity, only a small number of intermodes are selected for the RDO process. In [10], a bottom-up merge method is introduced for fast intermode selection: a 16 × 16 block is split into 4 × 4 blocks, and 4 × 4 blocks of the same class are then merged based on MVs and edge information. Ren et al. [11] proposed a fast adaptive early termination (FAT) mode selection algorithm. FAT consists of three steps: initial prediction, early termination, and refinement. In the initial prediction, the candidate modes are determined from the mode histogram of neighboring MBs. Early termination and refinement of the candidate modes are then used to trade off computational efficiency against accuracy. The MB tracking scheme [12] is based on the temporal correlation between successive frames: the candidate modes of the current MB are determined according to the optimal mode of the co-located MB in the previous frame. Zeng et al. [13] selected the candidate modes according to motion activity, derived from the MVs of spatially and temporally nearby MBs. Liu et al. [14] proposed an efficient intermode decision algorithm based on motion homogeneity evaluated on a normalized MV field, which is obtained from the MVs of 4 × 4 block ME. Three-directional motion homogeneities are exploited to select candidate intermodes. The classified region algorithm [15] is based on the spatial and temporal homogeneity of the block, obtained using 16 × 16 and 8 × 8 block patterns; the candidate intermodes for the RDO process are reduced accordingly. Martínez-Enríquez et al. [16] proposed an adaptive algorithm based on RDcost statistics to decrease the encoding time. The differences between the RDcosts of the modes are used to derive successive adaptive thresholds that reduce the number of evaluated intermodes. Although these works [9–16] are well designed, a more efficient algorithm is still needed, especially for video sequences with high motion or detailed regions.
In this paper, a fast intermode decision algorithm is proposed based on general residual complexity (GRC) and local residual complexity (LRC). MB activity is determined from GRC and LRC, and RDO is performed only for the candidate intermodes chosen according to MB activity.
The features of the proposed algorithm are as follows. First, three subsets of candidate intermodes are defined to represent different MB activities, and only the candidate intermodes of the selected subset are considered for ME and mode decision. Second, to determine MB activity, GRC and LRC are proposed based on the observation that smaller block sizes are likely to be selected as the optimal mode in MBs with high ME residual error. An adaptive threshold for LRC, driven by GRC, is designed for the available quantization parameter (QP) range of H.264/AVC. Third, additional computation is needed only to obtain GRC, which is inexpensive to calculate and is computed only once per frame. Since LRC is obtained during 16 × 16 ME, it requires no additional computation. These features contribute to the performance improvement for all sequences, including those with high motion or detailed regions.
The rest of this paper is organized as follows. In Section 2, the proposed algorithm is introduced in detail. The simulation results are provided in Section 3. Finally, the conclusion is presented in Section 4.
2 The proposed intermode decision algorithm
2.1 Motivation
In ME and mode decision, variable block sizes can reduce the prediction (residual) error efficiently. For example, if an MB covers homogeneous regions with little or no motion, it is appropriate to select large block sizes such as 16 × 16, since large block sizes already yield a sufficiently small residual error. In contrast, large block sizes lead to large prediction error for an MB with fast motion or detailed regions. In this case, smaller block sizes such as 8 × 8, 8 × 4, 4 × 8, and 4 × 4 are more likely to be optimal. Accordingly, smaller block sizes tend to be selected when an MB contains detailed regions with fast motion.
There is a high correlation between the optimal block size and the residual error. Figure 2 shows one example, using the 77th frame of the CIF sequence Silent. Figure 2a shows the frame itself, and Figure 2b shows the absolute difference between each pixel of the current MB and the reference block of 16 × 16 ME. The optimal intermodes are drawn as different-sized boxes overlaid on the corresponding MBs, and larger absolute differences are rendered in darker shades. As seen in Figure 2a, larger block sizes are selected for homogeneous regions, such as the white wall in the background. The painting in the background is also coded with large blocks because it remains still, even though this region contains non-homogeneous patterns. On the other hand, regions containing motion boundaries, such as the hair and the right hand of the woman, are coded with smaller blocks: because the residual error of large-block ME is large in these regions, smaller blocks are chosen to reduce it. Figure 2b shows that when a small block size is selected as the optimal mode for an MB, the residual error of 16 × 16 ME is large (gray or dark gray regions), whereas MBs with small residual error (white regions) are coded with large blocks.
According to this observation, it is concluded that the optimal block size can be predicted based on the MB residual error. If the residual error is small when ME for large block is performed, a large block size can be used as the optimal mode. In contrast, if the residual error is not small enough to select a large block, smaller block sizes are considered to determine the optimal block size mode.
2.2 Intermode decision based on LRC and GRC
Based on the above analysis, LRC and MB activity are proposed to determine candidate intermodes. After ME is performed with the 16 × 16 block size, LRC is calculated as the sum of absolute differences (SAD) between the current MB and the reference block of 16 × 16 ME, as follows:
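Written out from this description, and assuming the 16 × 16 samples of the MB are indexed from 0 to 15 in each direction (the index range is an assumption), (1) is the familiar SAD:

```latex
\mathrm{LRC} = \sum_{x=0}^{15} \sum_{y=0}^{15} \left| s(x, y) - r(x, y) \right| \qquad (1)
```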
where s denotes the current MB, r indicates the reference block of 16 × 16 ME, and x and y are the horizontal and vertical positions of the MB. The MB activity is determined according to LRC as follows:
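Equation (2) is a three-way threshold test on LRC. The medium case is fixed by the example that follows; the labels low and high for the two outer cases are assumed here by analogy:

```latex
\text{MB activity} =
\begin{cases}
\text{low},    & \mathrm{LRC} \le L_0 \\
\text{medium}, & L_0 < \mathrm{LRC} \le L_1 \\
\text{high},   & L_1 < \mathrm{LRC}
\end{cases} \qquad (2)
```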
Based on the selected MB activity, the candidate intermodes summarized in Table 1 are considered to determine the optimal mode. For example, if LRC is greater than L_0 and less than or equal to L_1, the MB activity is medium, and RDO is then performed only for 16 × 16, 16 × 8, and 8 × 16. To design the adaptive thresholds L_0 and L_1 in (2), GRC is defined as in (3):
where ⌊x⌋ is the largest integer less than or equal to x, and N and M are the frame height and width, respectively. S is the current frame and R is the previous frame in display order, that is, the candidate reference frame nearest to the current frame. The thresholds L_i are obtained for each GRC and QP as follows:
where i ∈ {0, 1}, α_i provides a trade-off between visual quality and time saving, and μ_i is the average LRC of the MBs in class i of Table 1. For example, μ_0 is the average of the LRCs of MBs whose optimal mode is 16 × 16, whereas μ_1 is the average over MBs whose optimal mode is 16 × 16, 16 × 8, or 8 × 16.
Exhaustive experiments were performed to obtain μ_i for each GRC and QP, and these values were then modeled as a function of GRC and QP. For this, the training sequences in Table 2 were used with ten QPs: 12, 16, 20, 24, 28, 32, 36, 40, 44, and 48. The training sequences were selected in consideration of resolution and sequence characteristics. Empirically, we selected α_0 = 1.1 and α_1 = 1.5.
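The class averages described above can be sketched in a few lines. This is a minimal illustration, not the authors' training code: the function names and the sample data are hypothetical, and the final scaling step assumes that (4) has the product form L_i = α_i × μ_i, which the text suggests but does not state explicitly.

```python
from statistics import mean

# Optimal modes counted toward each class average, as described for mu_0 and mu_1.
CLASS_MODES = {
    0: {"16x16"},
    1: {"16x16", "16x8", "8x16"},
}

# Empirical scale factors reported in the text.
ALPHA = {0: 1.1, 1: 1.5}


def class_means(samples):
    """Average LRC over the MBs whose optimal mode belongs to each class.

    `samples` is an iterable of (lrc, optimal_mode) pairs collected from an
    exhaustive-RDO encode at one (GRC, QP) operating point.
    """
    samples = list(samples)
    return {
        i: mean(lrc for lrc, mode in samples if mode in modes)
        for i, modes in CLASS_MODES.items()
    }


def thresholds(samples):
    """ASSUMPTION: L_i = alpha_i * mu_i; the exact form of (4) is not confirmed."""
    mu = class_means(samples)
    return {i: ALPHA[i] * mu[i] for i in mu}


# Made-up (LRC, optimal mode) pairs, for illustration only.
SAMPLES = [
    (1000, "16x16"),
    (2000, "16x16"),
    (3000, "16x8"),
    (6000, "8x16"),
    (9000, "P8x8"),
]
MU = class_means(SAMPLES)
L = thresholds(SAMPLES)
```

Because the class 1 average includes every MB counted in class 0 plus MBs coded as 16 × 8 or 8 × 16, μ_1 is pulled upward whenever those MBs have larger residuals, as in this toy data.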
Figure 3 shows the thresholds L_i, i ∈ {0, 1}, as a function of GRC, obtained from the exhaustive experiments. Figure 3a,b shows the results for QP = 24 and QP = 36, respectively. The thresholds fall into two regions according to the trend of their slopes. For example, in Figure 3a, L_i can be divided into the two regions GRC ≤ 4 and GRC > 4. For GRC ≤ 4, L_i can be approximated as a constant because the slope is close to zero. For GRC > 4, L_i is approximated as a linear function. A similar trend can be observed in Figure 3b. Consequently, L_i is modeled as follows:
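A piecewise form consistent with this description is sketched below. Here G is the breakpoint in GRC, a_i the constant level, b_i the slope, and c_i the intercept; mapping the three fitted parameters to these roles is an assumption:

```latex
L_i =
\begin{cases}
a_i, & \mathrm{GRC} \le G \\
b_i \times \mathrm{GRC} + c_i, & \mathrm{GRC} > G
\end{cases}
\qquad i \in \{0, 1\}
```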
where the parameters a_i, b_i, and c_i are given in Table 3, and G is defined according to the trend of the slopes of L_i as follows:
Plotting the parameter values for the ten QPs in Table 3, as in Figure 4, shows that they are well approximated by an exponential function, f(QP) = p × e^{q × QP}. Accordingly, the parameters a_i, b_i, and c_i, i ∈ {0, 1}, in Table 3 are determined as follows:
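The fitting step behind a model of the form f(QP) = p × e^{q × QP} can be sketched as an ordinary least-squares fit in the log domain. This is one common way to fit an exponential and is not necessarily the procedure the authors used; the function names are hypothetical and the data below are synthetic, not the values of Table 3.

```python
import math


def fit_exponential(qps, values):
    """Least-squares fit of values ~= p * exp(q * qp) in the log domain.

    Taking logs gives ln(value) = ln(p) + q * qp, an ordinary linear
    regression. All values must be strictly positive.
    """
    if len(qps) != len(values) or len(qps) < 2:
        raise ValueError("need at least two (QP, value) pairs")
    logs = [math.log(v) for v in values]
    n = len(qps)
    mean_x = sum(qps) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in qps)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(qps, logs))
    q = sxy / sxx
    p = math.exp(mean_y - q * mean_x)
    return p, q


def model(qp, p, q):
    """Evaluate f(QP) = p * exp(q * QP)."""
    return p * math.exp(q * qp)


# Synthetic data generated from p = 50, q = 0.1 purely to exercise the fit.
QPS = [12, 16, 20, 24, 28, 32, 36, 40, 44, 48]
VALUES = [50.0 * math.exp(0.1 * qp) for qp in QPS]
P_HAT, Q_HAT = fit_exponential(QPS, VALUES)
```

Fitting in the log domain weights relative rather than absolute errors, so large parameter values at high QP do not dominate the fit.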
2.3 Overall algorithm
Figure 5 shows the flowchart of the proposed intermode decision algorithm. For each frame, GRC is calculated, and the thresholds L_0 and L_1 are then determined. For each MB, LRC is calculated and the MB activity is determined. Finally, according to the MB activity, the candidate intermodes for the RDO process are determined for each MB. The overall procedure of the proposed algorithm is as follows:
Step 1: Calculate GRC according to (3) for the current frame.
Step 2: Determine the two thresholds, L_0 and L_1, based on GRC according to (5), (6), and (7).
Step 3: Perform ME with the 16 × 16 block size, and then calculate LRC according to (1) for the current MB.
Step 4: Determine the MB activity of the current MB based on LRC and the two thresholds L_0 and L_1, according to (2).
Step 5: Determine the class of the current MB from Table 1 based on its MB activity.
Step 6: Perform the RDO process and obtain RDcosts only for the candidate intermodes that belong to the selected class.
Step 7: Select the mode with the minimum RDcost as the optimal mode.
Step 8: If the current MB is the last MB in the current frame, go to Step 9. Otherwise, go to Step 3 for the next MB in the current frame.
Step 9: If the current frame is the last frame, finish the encoding process. Otherwise, go to Step 1 for the next frame.
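The per-MB part of the procedure (Steps 3 to 8) can be sketched as follows. This is a minimal illustration under stated assumptions, not the JM implementation: the thresholds from Steps 1 and 2 are taken as inputs, `rd_cost` is a hypothetical hook standing in for the encoder's RDO, and only the medium candidate set is stated in the text (the low and high sets are assumed). SKIP and intra modes are outside the scope of the sketch.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Block = Sequence[Sequence[int]]  # 16 x 16 samples, row-major

# Candidate intermodes per MB activity. "medium" is stated in the text;
# "low" and "high" are ASSUMED, since Table 1 is not reproduced here.
CANDIDATES: Dict[str, List[str]] = {
    "low": ["16x16"],
    "medium": ["16x16", "16x8", "8x16"],
    "high": ["16x16", "16x8", "8x16", "P8x8"],
}


def lrc(cur: Block, ref: Block) -> int:
    """Step 3: SAD between the current MB and its 16 x 16 ME reference block."""
    return sum(abs(c - r) for cr, rr in zip(cur, ref) for c, r in zip(cr, rr))


def mb_activity(value: int, l0: float, l1: float) -> str:
    """Step 4: three-way threshold test on LRC."""
    if value <= l0:
        return "low"
    if value <= l1:
        return "medium"
    return "high"


def decide_frame(
    mbs: Sequence[Tuple[Block, Block]],
    l0: float,
    l1: float,
    rd_cost: Callable[[int, str], float],
) -> List[Tuple[str, str]]:
    """Steps 3-8 for one frame, given the thresholds from Steps 1-2.

    `mbs` holds (current MB, 16 x 16 ME reference block) pairs and
    `rd_cost(mb_index, mode)` is a hypothetical hook into the encoder's RDO.
    Returns one (activity, chosen mode) pair per MB.
    """
    decisions = []
    for index, (cur, ref) in enumerate(mbs):
        activity = mb_activity(lrc(cur, ref), l0, l1)
        modes = CANDIDATES[activity]                          # Step 5
        best = min(modes, key=lambda m: rd_cost(index, m))    # Steps 6-7
        decisions.append((activity, best))
    return decisions


def flat(value: int) -> List[List[int]]:
    """Toy 16 x 16 block with every sample equal to `value`."""
    return [[value] * 16 for _ in range(16)]
```

The saving comes from Step 6: `rd_cost` is called once per candidate mode, so an MB classified as low activity triggers a single RDO evaluation instead of one per partition.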
3 Simulation results
The proposed intermode decision algorithm is implemented in the H.264/AVC reference software, JM 13.2 [17]. The experiments are performed on a PC with a 3.2 GHz CPU and 6 GB of RAM, using both the training sequences used for the threshold modeling and the non-training sequences listed in Table 2. The test conditions are as follows: 100 frames are encoded; the ME search range is 32; five reference frames are used; the motion vector resolution is 1/4 pel; RDO and CABAC are enabled; fast ME [18] is used; the GOP structure is IPPP; and four QPs (28, 32, 36, 40) are used. Bjontegaard delta peak signal-to-noise ratio (BDPSNR), Bjontegaard delta bit rate (BDBR) [19], and time saving (TS) are used to evaluate the performance. TS is defined as follows:
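Consistent with the sign convention used for the results (negative values indicate a decrease), TS is the relative change in encoding time. Expressing it as a percentage is assumed from the way the results are reported:

```latex
\mathrm{TS} = \frac{T_{\mathrm{proposed}} - T_{\mathrm{ref}}}{T_{\mathrm{ref}}} \times 100\ (\%)
```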
where T_ref and T_proposed are the encoding times of the reference software and the proposed algorithm, respectively. For BDBR, BDPSNR, and TS, positive values represent an increase, whereas negative values indicate a decrease.
The simulation results for the training and non-training sequences are presented together in Table 4. In addition, the proposed algorithm is compared with two preceding works, Jeon’s [5, 6] and Martínez-Enríquez’s [16], in Table 5. These are two of the most-cited fast mode decision algorithms. In particular, Jeon’s method has been adopted as the fast mode decision algorithm in the JM reference software, and Martínez-Enríquez’s method was reported recently with outstanding performance.
Table 4 shows that the proposed algorithm reduces the encoding time by 63% on average with negligible coding loss in terms of BDPSNR and BDBR, with values of −0.03 dB and 0.19%, respectively. Table 5 shows that the proposed algorithm also outperforms Jeon’s and Martínez-Enríquez’s methods in terms of TS, by −49% and −17% on average, respectively. In particular, the improvement in TS for sequences with high motion or detailed regions, such as Coastguard, Football, and Mobile, is outstanding compared to the two works. For example, for the Mobile sequence, the TS of the proposed algorithm is −56%, while the TS values of Martínez-Enríquez’s and Jeon’s algorithms are −29% and −6%, respectively.
To evaluate rate-distortion (RD) performance, the RD curves of the reference software and the proposed algorithm are shown in Figure 6. The RD curves of the reference software represent the optimal performance bound. Figure 6 shows that the proposed fast intermode decision algorithm achieves RD performance similar to that of the exhaustive mode decision of the JM reference software.
4 Conclusions
In this paper, a fast intermode decision algorithm is proposed to reduce encoder complexity. In particular, based on the high correlation between the residual error and the optimal mode, GRC and LRC, which can be obtained simply and inexpensively, are defined. The candidate intermodes are determined efficiently according to the MB activity evaluated from GRC and LRC, and RDO is skipped for unnecessary intermodes. As a result, the proposed algorithm greatly reduces the encoding time with negligible degradation of coding performance and achieves significantly better results than the two previous works, Jeon’s and Martínez-Enríquez’s, for all sequences, including those with high motion or detailed regions.
Abbreviations
BDBR: Bjontegaard delta bit rate
BDPSNR: Bjontegaard delta peak signal-to-noise ratio
CABAC: context-based adaptive binary arithmetic coding
FAT: fast adaptive early termination
GRC: general residual complexity
LRC: local residual complexity
MB: macroblock
ME: motion estimation
MV: motion vector
RD: rate-distortion
RDcost: rate-distortion cost
RDO: rate-distortion optimization
TS: time saving
References
1. ISO/IEC, ISO/IEC 14496–10: Information Technology–Coding of Audio–Visual Objects–Part 10. Advanced Video Coding (ISO/IEC, 2003)
2. Wiegand T, Sullivan GJ, Bjontegaard G, Luthra A: Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol. 2003, 13(7):560-576.
3. Ostermann J, Bormans J, List P, Marpe D, Narroschke M, Pereira F, Stockhammer T, Wedi T: Video coding with H.264/AVC: tools, performance and complexity. IEEE Trans. Circuits Syst. Mag. 2004, 4(1):7-28. 10.1109/MCAS.2004.1286980
4. Wiegand T, Schwarz H, Joch A, Kossentini F, Sullivan G: Rate-constrained coder control and comparison of video coding standards. IEEE Trans. Circuits Syst. Video Technol. 2003, 13(7):688-703. 10.1109/TCSVT.2003.815168
5. Jeon B: Fast Mode Decision for H.264 (ISO/IEC, 2003) ISO/IEC JTC1/SC29/WG11 and ITU-T SG16, Input Document JVT-J033. http://wftp3.itu.int/avarch/jvtsite/2003_12_Waikoloa/
6. Choi I, Lee J, Jeon B: Fast coding mode selection with rate distortion optimization for MPEG-4 part-10 AVC/H.264. IEEE Trans. Circuits Syst. Video Technol. 2006, 16(12):1557-1561.
7. Crecos C, Yang MY: Fast inter mode prediction for P slices in the H.264 video coding standard. IEEE Trans. Broadcast. 2005, 51(2):256-263. 10.1109/TBC.2005.846192
8. Saha A, Mallick K, Mukherjee J, Sural S: SKIP prediction for fast rate distortion optimization in H.264. IEEE Trans. Consumer Electron 2007, 53(3):1153-1160.
9. Wu D, Pan F, Lim KP, Wu S, Li ZG, Lin X, Rahardja S, Ko CC: Fast intermode decision in H.264/AVC video coding. IEEE Trans. Circuits Syst. Video Technol. 2005, 15(7):953-958.
10. Choi BD, Nam JH, Hwang MC, Ko SJ: Fast motion estimation and inter mode selection for H.264. EURASIP J. Adv. Signal Process. 2006. 10.1155/ASP/2006/71643
11. Ren J, Kehtarnavaz N, Budagavi M: Computationally efficient mode selection in H.264/AVC video coding. IEEE Trans. Consumer Electron 2008, 54(2):877-886.
12. Kim BG: Novel intermode decision algorithm based on macroblock (MB) tracking for the P-slice in H.264/AVC video coding. IEEE Trans. Circuits Syst. Video Technol. 2008, 18(2):273-279.
13. Zeng H, Cai C, Ma KK: Fast mode decision for H.264/AVC based on macroblock motion activity. IEEE Trans. Circuits Syst. Video Technol. 2009, 19(4):491-499.
14. Liu Z, Shen L, Zhang Z: An efficient intermode decision algorithm based on motion homogeneity for H.264/AVC. IEEE Trans. Circuits Syst. Video Technol. 2009, 19(1):128-132.
15. Bharanitharan K, Liu BD, Yang JF: Classified region algorithm for fast inter mode decision in H.264/AVC encoder. EURASIP J. Adv. Signal Process 2010. 10.1155/2010/150809
16. Martínez-Enríquez E, Jiménez-Moreno A, Diaz-de-Maria F: An adaptive algorithm for fast inter mode decision in the H.264/AVC video coding standard. IEEE Trans. Consumer Electron 2010, 56(2):826-834.
17. Joint Video Team (JVT): H.264/AVC reference software. http://iphome.hhi.de/suehring/tml
18. Chen Z, Zhou P, He Y: Fast motion estimation for JVT. JVT-G016.doc. In 7th Meeting of the Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, Pattaya II, March 2003.
19. Bjontegaard G: Calculation of average PSNR differences between RD-curves. VCEG-M33, 13th Meeting, Austin, Texas 2–4 April 2001.
Acknowledgments
This research was supported by the MKE (The Ministry of Knowledge Economy), South Korea, under the ‘ITRC (Information Technology Research Center)’ support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA2012H0301121008). Also, this work was supported by the National Research Foundation of Korea (NRF) with grant funded by the Korean government (MEST) (no. 2011–0016302).
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords
 H.264/AVC
 Rate-distortion optimization
 Fast intermode decision
 General residual complexity
 Local residual complexity