Fast intermode decision algorithm based on general and local residual complexity in H.264/AVC

Abstract

The state-of-the-art video coding standard H.264/AVC achieves significant coding performance by adopting variable block sizes for motion estimation (ME) and mode decision. However, this technique incurs high computational complexity, since the optimal mode is determined by exhaustively performing rate-distortion optimization (RDO) on each coding mode with different block sizes. In this paper, a fast intermode decision algorithm is proposed to reduce the computational complexity. Based on the high correlation between the residual error of ME and the optimal block size, general residual complexity (GRC) and local residual complexity (LRC) are defined. According to the MB activity evaluated from GRC and LRC, candidate intermodes are determined, and RDO processes are performed only on the selected intermodes. The experimental results demonstrate that the proposed algorithm reduces the encoding time by 63% on average with negligible degradation of coding efficiency.

1 Introduction

The state-of-the-art video coding standard H.264/AVC [1] was developed by the Joint Video Team of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. Compared to previous video coding standards, H.264/AVC achieves higher coding performance through several advanced coding tools, such as variable block sizes for motion estimation (ME) and mode decision, multiple reference frames, motion vectors (MVs) with quarter-pixel accuracy, a de-blocking filter, and context-based adaptive binary arithmetic coding (CABAC) [2]. However, these tools increase the computational complexity, especially the multiple block sizes used for ME and mode decision [3].

In ME and mode decision, a macroblock (MB) can be divided into four block partitions: 16 × 16, 16 × 8, 8 × 16, and 8 × 8. Each 8 × 8 subblock, denoted P8 × 8, can be further divided into 8 × 8, 8 × 4, 4 × 8, and 4 × 4. Figure 1 shows all block sizes and their relationships for ME and mode decision.

Figure 1

The several block sizes and their relationships in H.264/AVC.

To obtain the optimal mode from candidate coding modes with various block sizes, H.264/AVC employs the rate-distortion optimization (RDO) technique [4]. In RDO, the rate-distortion cost (RDcost) of each coding mode is calculated, and the mode with the minimum RDcost is selected as the optimal mode. Since RDO requires encoding and decoding processes to obtain the RDcost, it incurs very high computational complexity. Therefore, it is highly desirable to reduce the encoder complexity without noticeable coding loss for the wide application of H.264/AVC, including video services (video conferencing, digital multimedia broadcasting, and IPTV) and consumer products (mobile devices and DVD or Blu-ray players).
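As a rough illustration of the selection rule described above, the following sketch assumes the Lagrangian form J = D + λ × R that is commonly used for the RDcost in rate-constrained coder control. The mode names, distortion values, bit counts, and λ values are hypothetical and are not taken from any experiment in this paper.

```python
from typing import Dict, Tuple


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits


def best_mode(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Return the mode whose (distortion, rate) pair gives the minimum RDcost."""
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))


# Hypothetical (distortion, rate in bits) pairs for three coding modes.
modes = {
    "16x16": (1200.0, 40.0),
    "16x8": (1000.0, 70.0),
    "8x8": (900.0, 130.0),
}

print(best_mode(modes, lam=5.0))   # costs 1400, 1350, 1550
print(best_mode(modes, lam=0.5))   # costs 1220, 1035, 965
```

In a real encoder, each (distortion, rate) pair is only available after the mode has been encoded and decoded, which is why reducing the number of evaluated modes saves time.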

A number of approaches [5–16] have been reported to reduce the encoder complexity of H.264/AVC. The main idea of these algorithms is to eliminate unnecessary modes and to perform the RDO process only for the likely block sizes. Some approaches [5–8] focus on the SKIP mode. The early SKIP mode decision algorithm [5, 6] is based on four conditions: if these conditions are satisfied, the SKIP mode is selected as the optimal mode, and the remaining modes are excluded from the RDO process. This algorithm has been adopted as the fast mode decision option in the JM reference software. Additional conditions [7] for the SKIP mode, based on spatial and temporal neighborhood information, have been proposed to improve the performance of the early SKIP mode decision algorithm [6]. Saha et al. [8] employed the sum of absolute transformed differences to predict the SKIP mode. Because these algorithms [5–8] treat all coding modes other than SKIP as a single group, the RDO process is executed for all coding modes whenever the SKIP mode is not selected as the optimal mode. Accordingly, the performance improvement is limited for sequences with fast motion or detailed regions.

Therefore, additional approaches [9–16] that address the remaining coding modes have been proposed to further reduce the encoding time. Wu et al. [9] evaluated spatial homogeneity by using the Sobel operator and temporal stationarity by using the difference between the current MB and its co-located MB in the reference frame. Based on homogeneity and stationarity, only a small number of intermodes are selected for the RDO process. In [10], a bottom-up merge method is introduced for fast intermode selection: a 16 × 16 block is split into 4 × 4 blocks, and then 4 × 4 blocks of the same class are merged based on MVs and edge information. Ren et al. [11] proposed a fast adaptive early termination (FAT) mode selection algorithm. FAT consists of three steps: initial prediction, early termination, and refinement. In the initial prediction, the candidate modes are determined based on the mode histogram of neighboring MBs. Early termination and refinement of the candidate modes are used to trade off computational efficiency against accuracy. The MB tracking scheme [12] is based on the temporal correlation between successive frames: the candidate modes of the current MB are determined according to the optimal mode of the co-located MB in the previous frame. Zeng et al. [13] proposed a method that selects the candidate modes according to motion activity, derived from the MVs of spatially and temporally nearby MBs. Liu et al. [14] proposed an efficient intermode decision algorithm based on motion homogeneity evaluated on a normalized MV field, which is determined by the MVs from ME of 4 × 4 blocks; three directional motion homogeneities are exploited to select candidate intermodes. The classified region algorithm [15] is based on the spatial and temporal homogeneity of the block, obtained by using 16 × 16 and 8 × 8 block patterns, and uses this homogeneity to reduce the candidate intermodes for the RDO process. Martínez-Enríquez et al. [16] proposed an adaptive algorithm based on RDcost statistics to decrease the encoding time. The differences between the RDcosts of the modes are used to derive successive adaptive thresholds that reduce the number of evaluated intermodes. Although previous works [9–16] are well designed, there is still a need for a more efficient algorithm, especially for video sequences with high motion or detailed regions.

In this paper, a fast intermode decision algorithm is proposed based on global residual complexity (GRC) and local residual complexity (LRC). MB activity is determined from GRC and LRC, and RDO processes are performed only for the candidate intermodes chosen according to the MB activity.

The features of the proposed algorithm are as follows. First, three subsets of candidate intermodes are defined to represent different MB activities, and only the candidate intermodes of the selected subset are considered for ME and mode decision. Second, to determine MB activity, GRC and LRC are proposed based on the observation that smaller block sizes are likely to be selected as the optimal mode in MBs with a high residual error of ME. Adaptive thresholds for LRC, which depend on GRC, are designed for the available QP range of H.264/AVC. Third, additional computation is needed only to obtain GRC, which is inexpensive to calculate and is computed only once per frame. Since LRC is obtained in the process of ME for 16 × 16, it requires no additional computation. These features contribute to the performance improvement for all sequences, including ones with high motion or detailed regions.

The rest of this paper is organized as follows. In Section 2, the proposed algorithm is introduced in detail. The simulation results are provided in Section 3. Finally, the conclusion is presented in Section 4.

2 The proposed intermode decision algorithm

2.1 Motivation

In ME and mode decision, variable block sizes can reduce the prediction (residual) error efficiently. For example, if an MB includes homogeneous regions with no or slow motion, it is appropriate to select large block sizes such as 16 × 16, since large block sizes result in a sufficiently small residual error. In contrast, large block sizes lead to a large prediction error for an MB with fast motion or detailed regions. In this case, smaller block sizes such as 8 × 8, 8 × 4, 4 × 8, and 4 × 4 can be considered as the optimal mode. Accordingly, smaller block sizes are likely to be selected when an MB includes detailed regions with fast motion.

A high correlation is observed between the optimal block size and the residual error. Figure 2 shows an example that uses the 77th frame of the CIF sequence Silent. Figure 2a shows the original frame, and Figure 2b shows the absolute difference between each pixel in the current MB and the reference block of 16 × 16 ME. The optimal intermodes are drawn as boxes of different sizes overlaid on the corresponding MBs, and larger absolute differences are drawn in darker shades. As seen in Figure 2a, larger block sizes are selected for homogeneous regions, such as the white wall in the background. The painting in the background is also coded with large blocks because it remains still, although this region includes non-homogeneous patterns. On the other hand, regions that include motion boundaries, such as the hair and the right hand of the woman, are coded with smaller blocks: the residual error of ME for a large block is large in these regions, so smaller blocks are selected to achieve a small residual error. Figure 2b shows that when a small block size is selected as the optimal mode for an MB, the residual error of ME for the 16 × 16 block is large (gray or dark gray regions), whereas MBs with a small residual error (white regions) are coded with a large block.

Figure 2

The optimal intermodes in (a) sequence frame image and (b) residual error image.

According to this observation, it is concluded that the optimal block size can be predicted based on the MB residual error. If the residual error is small when ME for large block is performed, a large block size can be used as the optimal mode. In contrast, if the residual error is not small enough to select a large block, smaller block sizes are considered to determine the optimal block size mode.

2.2 Intermode decision based on LRC and GRC

Based on the above analysis, LRC and MB activity are proposed to determine candidate intermodes. After ME is performed for the 16 × 16 block size, LRC is calculated as the sum of absolute differences between the current MB and the reference block of 16 × 16 ME, as follows:

$$\mathrm{LRC} = \sum_{x=0}^{15} \sum_{y=0}^{15} \left| s_{x,y} - r_{x,y} \right|, \tag{1}$$

where s denotes the current MB, r indicates the reference block of 16 × 16 ME, and x and y are the horizontal and vertical pixel positions within the MB. The MB activity is determined according to LRC as follows:

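$$\text{MB activity} = \begin{cases} \text{Low} & \text{if } \mathrm{LRC} \le L_0 \\ \text{Medium} & \text{if } L_0 < \mathrm{LRC} \le L_1 \\ \text{High} & \text{if } L_1 < \mathrm{LRC}. \end{cases} \tag{2}$$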
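Equations (1) and (2) can be sketched in a few lines. This is a minimal illustration, not the JM implementation: the function names, the nested-list block representation, and the example pixel and threshold values are assumptions made for the sketch.

```python
from typing import Sequence


def local_residual_complexity(
    cur_mb: Sequence[Sequence[int]], ref_block: Sequence[Sequence[int]]
) -> int:
    """Eq. (1): sum of absolute differences over a 16 x 16 MB."""
    return sum(
        abs(s - r)
        for cur_row, ref_row in zip(cur_mb, ref_block)
        for s, r in zip(cur_row, ref_row)
    )


def mb_activity(lrc: float, l0: float, l1: float) -> str:
    """Eq. (2): classify an MB by comparing LRC with thresholds L0 <= L1."""
    if lrc <= l0:
        return "low"
    if lrc <= l1:
        return "medium"
    return "high"


# Hypothetical blocks: every pixel differs by 2, so LRC = 256 * 2.
cur = [[10] * 16 for _ in range(16)]
ref = [[12] * 16 for _ in range(16)]
lrc = local_residual_complexity(cur, ref)

# Hypothetical thresholds, chosen only to exercise the three branches.
print(lrc, mb_activity(lrc, l0=400, l1=900))
```

In an encoder, the sum in (1) is already available as the SAD of the best 16 × 16 match, so it does not have to be recomputed as is done here.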

Based on the selected MB activity, the candidate intermodes summarized in Table 1 are considered to determine the optimal mode. For example, if LRC is greater than $L_0$ and less than or equal to $L_1$, the MB activity is medium, and hence RDO processes are performed only for 16 × 16, 16 × 8, and 8 × 16. To design the adaptive thresholds $L_0$ and $L_1$ in (2), GRC is defined as in (3):

$$\mathrm{GRC} = \left\lfloor \frac{1}{N \times M} \times \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} \left| S_{x,y} - R_{x,y} \right| + 0.5 \right\rfloor, \tag{3}$$

where ⌊·⌋ denotes the largest integer less than or equal to its argument, and N and M are the frame height and width, respectively. S is the current frame and R is the previous frame in display order, that is, the candidate reference frame nearest to the current frame. The thresholds $L_i$ are obtained according to GRC and QP as follows:

$$L_i = \alpha_i \times \mu_i, \tag{4}$$

where $i \in \{0, 1\}$, $\alpha_i$ are defined to provide a trade-off between visual quality and time saving, and $\mu_i$ is the average of the LRCs of MBs for class i from Table 1. For example, $\mu_0$ is obtained as the average of LRCs from MBs in which the optimal mode is 16 × 16, and LRCs of MBs in which the optimal mode is 16 × 16, 16 × 8, or 8 × 16 are considered to obtain $\mu_1$.
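The definition of GRC in (3) amounts to the rounded mean absolute difference between two co-located frames. The sketch below assumes frames stored as nested lists of luma samples (N rows of M samples); the function name and the tiny example frames are hypothetical.

```python
from typing import Sequence


def general_residual_complexity(
    cur_frame: Sequence[Sequence[int]], prev_frame: Sequence[Sequence[int]]
) -> int:
    """Eq. (3): floor(mean absolute frame difference + 0.5)."""
    n = len(cur_frame)        # frame height N
    m = len(cur_frame[0])     # frame width M
    total = sum(
        abs(s - r)
        for cur_row, prev_row in zip(cur_frame, prev_frame)
        for s, r in zip(cur_row, prev_row)
    )
    # floor(total / (N*M) + 0.5) evaluated in integer arithmetic.
    return (2 * total + n * m) // (2 * n * m)


# Hypothetical 4 x 6 frames: every sample differs by 3.
prev = [[100] * 6 for _ in range(4)]
cur = [[103] * 6 for _ in range(4)]
print(general_residual_complexity(cur, prev))
```

The integer form of the rounding is a design choice of this sketch; it gives the same result as the floor expression in (3) while avoiding floating-point arithmetic.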

Table 1 Classes and candidate intermodes according to MB activity

Exhaustive experiments are performed to obtain $\mu_i$ for each GRC and QP, and these values are then modeled as a function of GRC and QP. For this, the training sequences from Table 2 have been used with ten QPs: 12, 16, 20, 24, 28, 32, 36, 40, 44, and 48. The training sequences have been selected to cover a range of resolutions and sequence characteristics. Empirically, we selected $\alpha_0 = 1.1$ and $\alpha_1 = 1.5$.

Table 2 Summary of the training sequences and the non-training sequences

Figure 3 shows the thresholds $L_i$, $i \in \{0, 1\}$, as a function of GRC, obtained from the exhaustive experiments. Figure 3a,b shows the results for QP = 24 and QP = 36, respectively. It is observed from Figure 3 that each threshold curve can be divided into two regions according to its slope. For example, in Figure 3a, $L_i$ can be divided into two regions, GRC ≤ 4 and 4 < GRC. For GRC ≤ 4, $L_i$ can be approximated as constants because the slopes are close to zero. For 4 < GRC, $L_i$ are approximated as linear functions. A similar tendency can be observed in Figure 3b. Consequently, $L_i$ is modeled as follows:

$$L_i = \begin{cases} a_i & \text{if } \mathrm{GRC} \le G \\ b_i \times \mathrm{GRC} + c_i & \text{if } G < \mathrm{GRC}, \end{cases} \tag{5}$$

where the parameters $a_i$, $b_i$, and $c_i$ are given in Table 3, and G is defined according to the tendency of the slopes of $L_i$ as follows:

$$G = \max\left(0, \frac{\mathrm{QP} - 16}{4}\right) + 2. \tag{6}$$

Figure 3

Thresholds $L_0$ and $L_1$ according to GRC for (a) QP = 24 and (b) QP = 36.

Table 3 Parameters for QPs 12, 16, 20, 24, 28, 32, 36, 40, 44, and 48

Plotting the parameter values of the ten QPs given in Table 3 in Figure 4 shows that these values can be approximated by an exponential function, $f(\mathrm{QP}) = p \times e^{q \times \mathrm{QP}}$. Accordingly, the parameter values in Table 3, $a_i$, $b_i$, and $c_i$, $i \in \{0, 1\}$, are determined as follows:

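$$\begin{aligned} a_0 &= 93.76 \times e^{0.07060 \times \mathrm{QP}}, & a_1 &= 118.5 \times e^{0.08757 \times \mathrm{QP}}, \\ b_0 &= 6.312 \times e^{0.03842 \times \mathrm{QP}}, & b_1 &= 17.65 \times e^{0.05755 \times \mathrm{QP}}, \\ c_0 &= 110.0 \times e^{0.06210 \times \mathrm{QP}}, & c_1 &= 165.2 \times e^{0.06070 \times \mathrm{QP}}. \end{aligned} \tag{7}$$

Figure 4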

Parameter value approximations for (a) $a_0$ and $a_1$, (b) $b_0$ and $b_1$, (c) $c_0$ and $c_1$.
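The threshold model in (5), (6), and (7) can be evaluated directly from GRC and QP. The sketch below uses the constants of (7) as printed; the function names are hypothetical, and (6) is read here as max(0, (QP − 16)/4) + 2, which is an assumption about the grouping of terms (it gives G = 4 for QP = 24, matching the breakpoint described for Figure 3a).

```python
import math

# (p, q) pairs from Eq. (7): each parameter is p * exp(q * QP).
# Index 0 is used for threshold L0 and index 1 for threshold L1.
PARAMS = {
    "a": [(93.76, 0.07060), (118.5, 0.08757)],
    "b": [(6.312, 0.03842), (17.65, 0.05755)],
    "c": [(110.0, 0.06210), (165.2, 0.06070)],
}


def param(name: str, i: int, qp: float) -> float:
    """Evaluate a_i, b_i, or c_i for the given QP using Eq. (7)."""
    p, q = PARAMS[name][i]
    return p * math.exp(q * qp)


def breakpoint_g(qp: float) -> float:
    """Eq. (6), under the assumed reading max(0, (QP - 16) / 4) + 2."""
    return max(0.0, (qp - 16) / 4) + 2


def threshold(i: int, grc: float, qp: float) -> float:
    """Eq. (5): constant below the breakpoint G, linear in GRC above it."""
    if grc <= breakpoint_g(qp):
        return param("a", i, qp)
    return param("b", i, qp) * grc + param("c", i, qp)


# Thresholds for one frame with a hypothetical GRC of 3 and of 10 at QP = 24.
for grc in (3, 10):
    print(grc, round(threshold(0, grc, 24)), round(threshold(1, grc, 24)))
```

Because GRC and QP are fixed for a frame, both thresholds need to be evaluated only once per frame.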

2.3 Overall algorithm

Figure 5 shows the flowchart of the proposed intermode decision algorithm. In each frame, GRC is calculated, and then the thresholds $L_0$ and $L_1$ are determined. For each MB, LRC is calculated and the MB activity is determined. Finally, according to the MB activity, the candidate intermodes for the RDO process are determined for each MB. The overall procedure of the proposed algorithm is as follows:

Step 1: Calculate GRC according to (3) for the current frame.

Step 2: Determine the two thresholds, $L_0$ and $L_1$, based on GRC according to (5), (6), and (7).

Step 3: Perform ME for the 16 × 16 block size, and then calculate LRC according to (1) for the current MB.

Step 4: Determine the MB activity of the current MB based on LRC and the two thresholds $L_0$ and $L_1$, according to (2).

Step 5: Determine the class of the current MB based on its MB activity from Table 1.

Step 6: Perform the RDO process and obtain RDcosts only for the candidate intermodes that belong to the selected class.

Step 7: Select the mode with the minimum RDcost as the optimal mode.

Step 8: If the current MB is the last MB in the current frame, go to Step 9. Otherwise, go to Step 3 for the next MB in the current frame.

Step 9: If the current frame is the last frame, finalize the encoding process. Otherwise, go to Step 1 for the next frame.

Figure 5

The flowchart of the proposed intermode decision algorithm.
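The per-MB part of the procedure (Steps 4 to 8) can be sketched as a single loop over the MBs of a frame. The sketch assumes that Steps 1 to 3 have already produced the thresholds and one LRC value per MB. The candidate sets are partly assumptions: only the medium set (16 × 16, 16 × 8, 8 × 16) is stated in the text, while the low and high sets below stand in for Table 1. The `rd_cost` callback and the toy costs are hypothetical placeholders for the encoder's RDO routine.

```python
from typing import Callable, Dict, List, Tuple

# Candidate intermodes per MB activity. The "low" and "high" rows are
# assumptions of this sketch; only the "medium" row is given in the text.
CANDIDATES: Dict[str, List[str]] = {
    "low": ["16x16"],
    "medium": ["16x16", "16x8", "8x16"],
    "high": ["16x16", "16x8", "8x16", "P8x8"],
}


def classify(lrc: float, l0: float, l1: float) -> str:
    """Step 4: map LRC to an MB activity using Eq. (2)."""
    if lrc <= l0:
        return "low"
    return "medium" if lrc <= l1 else "high"


def decide_frame(
    lrcs: List[float],
    l0: float,
    l1: float,
    rd_cost: Callable[[int, str], float],
) -> List[Tuple[str, str]]:
    """Steps 4 to 8 for one frame: (activity, optimal mode) for each MB."""
    decisions = []
    for mb_index, lrc in enumerate(lrcs):
        activity = classify(lrc, l0, l1)                          # Step 4
        modes = CANDIDATES[activity]                              # Step 5
        costs = {m: rd_cost(mb_index, m) for m in modes}          # Step 6
        decisions.append((activity, min(costs, key=costs.get)))   # Step 7
    return decisions


# Toy stand-in for RDO that records how often it is called.
rdo_calls: List[Tuple[int, str]] = []
toy_costs = {"16x16": 120.0, "16x8": 100.0, "8x16": 110.0, "P8x8": 90.0}


def toy_rd_cost(mb_index: int, mode: str) -> float:
    rdo_calls.append((mb_index, mode))
    return toy_costs[mode]


result = decide_frame([300, 700, 1500], l0=500, l1=1000, rd_cost=toy_rd_cost)
print(result)
print(len(rdo_calls), "RDO evaluations instead of", 3 * len(toy_costs))
```

The toy run also shows the trade-off behind the scheme: the low-activity MB is restricted to 16 × 16 even though another mode has a lower toy cost, which is the kind of coding loss the thresholds are tuned to keep small.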

3 Simulation results

The proposed intermode decision algorithm is implemented in the H.264/AVC reference software, JM 13.2 [17]. The experiments are performed on a PC with a 3.2 GHz CPU and 6 GB RAM, using both the training sequences used for the threshold modeling and the non-training sequences listed in Table 2. The test conditions are as follows: 100 frames are encoded; the ME search range is 32; five reference frames are used; the motion vector resolution is 1/4 pel; RDO and CABAC are used; fast ME [18] is used; the GOP structure is IPPP; and four QPs (28, 32, 36, 40) are used. Bjontegaard delta peak signal-to-noise ratio (BDPSNR), Bjontegaard delta bit rate (BDBR) [19], and time saving (TS) are used to evaluate the performance. TS is defined as follows:

$$\mathrm{TS} = \frac{T_{\mathrm{proposed}} - T_{\mathrm{ref}}}{T_{\mathrm{ref}}} \times 100\%, \tag{8}$$

where $T_{\mathrm{ref}}$ and $T_{\mathrm{proposed}}$ are the encoding times of the reference software and the proposed algorithm, respectively. For BDBR, BDPSNR, and TS, positive values represent an increase, whereas negative values indicate a decrease.
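As a small worked example of (8), the sketch below computes TS for a pair of encoding times. The times are hypothetical and chosen only to show the sign convention: an encoder that needs 37 s where the reference needs 100 s has a TS of −63%, that is, a 63% reduction in encoding time.

```python
def time_saving(t_proposed: float, t_ref: float) -> float:
    """Eq. (8): relative change in encoding time, in percent."""
    return 100.0 * (t_proposed - t_ref) / t_ref


# Hypothetical encoding times in seconds.
print(time_saving(t_proposed=37.0, t_ref=100.0))
```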

The simulation results for the training and non-training sequences are presented together in Table 4. The proposed algorithm is also compared with two preceding works, Jeon’s [5, 6] and Martínez-Enríquez’s [16], in Table 5. Jeon’s and Martínez-Enríquez’s methods are two of the most-cited fast mode decision algorithms. In particular, Jeon’s method has been adopted as the fast mode decision algorithm in the JM reference software. Martínez-Enríquez’s method has been reported recently, and its performance is outstanding.

Table 4 Performance of the proposed algorithms in terms of BDPSNR, BDBR, and TS
Table 5 Performance comparison of three intermode decision algorithms in terms of BDPSNR, BDBR, and TS

It is observed from Table 4 that the proposed algorithm reduces the encoding time by 63% on average with negligible coding loss in terms of BDPSNR and BDBR, with values of −0.03 dB and 0.19%, respectively. From Table 5, the proposed algorithm also outperforms Jeon’s and Martínez-Enríquez’s methods in terms of TS, by −49% and −17% on average, respectively. In particular, the improvement in TS for sequences with high motion or detailed regions, such as Coastguard, Football, and Mobile, is outstanding compared to the two works. For example, for the Mobile sequence, the TS of the proposed algorithm is −56%, while the TS values of Martínez-Enríquez’s and Jeon’s algorithms are −29% and −6%, respectively.

To evaluate rate-distortion (RD) performance, the RD curves of the reference software and the proposed algorithm are shown in Figure 6. The RD curves of the reference software serve as the optimal performance bounds. Figure 6 shows that the proposed fast intermode decision algorithm achieves RD performance similar to that of the exhaustive mode decision of the JM reference software.

Figure 6

RD curves of reference software and proposed algorithm.

4 Conclusions

In this paper, a fast intermode decision algorithm is proposed to reduce encoder complexity. In particular, based on the high correlation between the residual error and the optimal mode, GRC and LRC, which can be obtained simply and inexpensively, are defined. The candidate intermodes are efficiently determined according to the MB activity evaluated from GRC and LRC, and RDO processes are skipped for unnecessary intermodes. Therefore, the proposed algorithm greatly reduces the encoding time with negligible degradation of coding performance and achieves significantly better results than those of the two previous works, Jeon’s and Martínez-Enríquez’s, for all sequences, including ones with high motion or detailed regions.

Abbreviations

BDBR: Bjontegaard delta bit rate

BDPSNR: Bjontegaard delta peak signal-to-noise ratio

CABAC: context-based adaptive binary arithmetic coding

FAT: fast adaptive early termination

GRC: global residual complexity

LRC: local residual complexity

MB: macroblock

ME: motion estimation

MV: motion vector

RD: rate-distortion

RDcost: rate-distortion cost

RDO: rate-distortion optimization

TS: time saving

References

  1. ISO/IEC, ISO/IEC 14496–10: Information Technology-Coding of Audio–Visual Objects–Part 10. Advanced Video Coding (ISO/IEC, 2003)

  2. Wiegand T, Sullivan GJ, Bjontegaard G, Luthra A: Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol. 2003, 13(7):560-576.

  3. Ostermann J, Bormans J, List P, Marpe D, Narroschke M, Pereira F, Stockhammer T, Wedi T: Video coding with H.264/AVC: tools, performance and complexity. IEEE Circuits Syst. Mag. 2004, 4(1):7-28. 10.1109/MCAS.2004.1286980

  4. Wiegand T, Schwarz H, Joch A, Kossentini F, Sullivan G: Rate-constrained coder control and comparison of video coding standards. IEEE Trans. Circuits Syst. Video Technol. 2003, 13(7):688-703. 10.1109/TCSVT.2003.815168

  5. Jeon B: Fast Mode Decision for H.264 (ISO/IEC, 2003) ISO/IEC JTC1/SC29/WG11 and ITU-T SG16, Input Document JVT-J033. http://wftp3.itu.int/av-arch/jvt-site/2003_12_Waikoloa/

  6. Choi I, Lee J, Jeon B: Fast coding mode selection with rate distortion optimization for MPEG-4 part-10 AVC/H.264. IEEE Trans. Circuits Syst. Video Technol. 2006, 16(12):1557-1561.

  7. Crecos C, Yang MY: Fast inter mode prediction for P slices in the H.264 video coding standard. IEEE Trans. Broadcast. 2005, 51(2):256-263. 10.1109/TBC.2005.846192

  8. Saha A, Mallick K, Mukherjee J, Sural S: SKIP prediction for fast rate distortion optimization in H.264. IEEE Trans. Consumer Electron. 2007, 53(3):1153-1160.

  9. Wu D, Pan F, Lim KP, Wu S, Li ZG, Lin X, Rahardja S, Ko CC: Fast intermode decision in H.264/AVC video coding. IEEE Trans. Circuits Syst. Video Technol. 2005, 15(7):953-958.

  10. Choi BD, Nam JH, Hwang MC, Ko SJ: Fast motion estimation and inter mode selection for H.264. EURASIP J. Adv. Signal Process. 2006. 10.1155/ASP/2006/71643

  11. Ren J, Kehtarnavaz N, Budagavi M: Computationally efficient mode selection in H.264/AVC video coding. IEEE Trans. Consumer Electron. 2008, 54(2):877-886.

  12. Kim B-G: Novel inter-mode decision algorithm based on macroblock (MB) tracking for the P-slice in H.264/AVC video coding. IEEE Trans. Circuits Syst. Video Technol. 2008, 18(2):273-279.

  13. Zeng H, Cai C, Ma K-K: Fast mode decision for H.264/AVC based on macroblock motion activity. IEEE Trans. Circuits Syst. Video Technol. 2009, 19(4):491-499.

  14. Liu Z, Shen L, Zhang Z: An efficient intermode decision algorithm based on motion homogeneity for H.264/AVC. IEEE Trans. Circuits Syst. Video Technol. 2009, 19(1):128-132.

  15. Bharanitharan K, Liu BD, Yang JF: Classified region algorithm for fast inter mode decision in H.264/AVC encoder. EURASIP J. Adv. Signal Process. 2010. 10.1155/2010/150809

  16. Martínez-Enríquez E, Jiménez-Moreno A, Diaz-de-Maria F: An adaptive algorithm for fast inter mode decision in the H.264/AVC video coding standard. IEEE Trans. Consumer Electron. 2010, 56(2):826-834.

  17. Joint Video Team (JVT): H.264/AVC reference software. http://iphome.hhi.de/suehring/tml

  18. Chen Z, Zhou P, He Y: Fast motion estimation for JVT. JVT-G016.doc. In 7th Meeting of the Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, Pattaya II, March 2003.

  19. Bjontegaard G: Calculation of average PSNR differences between RD-curves. VCEG-M33, 13th Meeting, Austin, Texas, 2–4 April 2001.

Acknowledgments

This research was supported by the MKE (The Ministry of Knowledge Economy), South Korea, under the ‘ITRC (Information Technology Research Center)’ support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2012-H0301-12-1008). Also, this work was supported by the National Research Foundation of Korea (NRF) with grant funded by the Korean government (MEST) (no. 2011–0016302).

Author information

Corresponding author

Correspondence to Sangyoun Lee.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Lee, J., Kim, S., Lim, K. et al. Fast intermode decision algorithm based on general and local residual complexity in H.264/AVC. J Image Video Proc 2013, 30 (2013). https://doi.org/10.1186/1687-5281-2013-30

  • DOI: https://doi.org/10.1186/1687-5281-2013-30
