A novel early SKIP mode detection method for coarse grain quality in H.264/SVC
EURASIP Journal on Image and Video Processing volume 2013, Article number: 2 (2013)
Abstract
Various fast mode decision algorithms have been proposed to reduce the computational complexity of the H.264/scalable video coding (SVC) standard. In this article, we focus on skip mode detection for coarse grain quality scalability (CGS). We propose a fast skip mode detection approach that exploits surrounding skip patterns, the rate distortion (RD) cost and the coded block pattern (CBP). Experimental results show that, with 0.589% BDBR and 0.011 dB BDPSNR loss on average, the proposed method reduces encoding time by 35.15% on average compared with the reference software. Compared with the state-of-the-art early skip algorithm, the proposed method also achieves a 9.98% time reduction on average.
1 Introduction
H.264 scalable video coding (SVC) can produce a single bitstream that adapts to various network conditions, and hence it can be used to improve video quality over lossy networks [1]. Apart from its superior error resilience, SVC allows the multipoint control unit (MCU) to be eliminated from video conferencing systems, which greatly reduces deployment cost and transmission delay [2]. Compared with H.264 single layer video coding, SVC requires coding of both a base layer (BL) and an enhancement layer (EL). To limit the additional bits required by multiple layers, SVC employs inter-layer prediction methods, including inter-layer intra prediction, residual prediction and motion prediction [1, 3]. These additional layers and prediction modes greatly increase coding complexity. Therefore, efficient mode decision is necessary for real-time SVC applications.
In the literature, various fast mode decision (FMD) algorithms have been proposed to reduce the computational complexity of H.264 single layer coding [4–8]. FMD has also been studied extensively for H.264 SVC. Li et al. proposed an FMD algorithm in [9] where the RD costs of INTRA4×4 and INTER8×8 are checked to separate intra and inter modes; moreover, the set of candidate modes is reduced by exploiting the mode correlation between BL and EL. In [10], Park et al. proposed an FMD method with initial mode selection. In their method, SKIP, BL_SKIP and INTRA4×4 are checked first; then the mode with the highest expected RD cost (when skipped) is checked, and a threshold is set for early termination. Kim et al. proposed a mode decision algorithm where a parameter is calculated from the weighted sum of the modes of the neighboring MBs in BL and EL, and selected modes are then checked based on this parameter. In [11], an all-zero block (AZB) detection algorithm for the EL is proposed. Based on an analysis of the correlation of AZB occurrence between BL and EL, only some distributions are allowed to be AZB candidates for the EL. Furthermore, for a specific BL MB mode, AZB checking is prioritized to increase the probability of early AZB detection. A new FMD algorithm is proposed in [12], where the quantization parameter (QP) values at both BL and EL and the temporal layer indexes are used for mode decision. In [13], a novel FMD is proposed that exploits zero motion blocks and zero coefficient blocks to reduce the number of modes to be checked. A classification method is proposed in [14], where the modes in the first class (SKIP and INTER16×16) are checked first; further mode checking is performed depending on the temporal layer index and on whether all the resulting coefficients are zero. Lin et al. attacked the FMD problem from another angle in [15]: they proposed a method that reduces the number of bidirectional predictions and thus greatly saves computational time. Lee and Kim proposed an FMD based on statistical hypothesis testing [16], where the mean and variance of pixels are used to decide early termination conditions. Zhao et al. proposed to use optimal stopping theory for early termination decisions [17]. They classified the candidate modes into modes without and with residual prediction (referred to as regular modes and residual modes). Optimal stopping theory is used for the regular modes, and a reduced candidate set is used for the residual modes based on the correlation between regular and residual modes. Zero block detection is also exploited to speed up the decision process for the residual modes. All the above FMD algorithms speed up the full mode decision process from different aspects; for example, mode correlation, expected RD cost and AZB detection are employed in [9–11], respectively. However, although it is an important part of FMD, skip mode decision has not been studied as thoroughly as other parts of mode decision. The scheme proposed in this article is therefore complementary to other mode decision methods and could be combined with them for further speedup. The only method we are aware of that focuses on early skip for H.264 SVC is [18], proposed by Shen et al. for CGS (referred to as Shen’s algorithm). Their approach decides that an MB at the EL is a SKIP MB if the co-located MB at the BL, the top MB and the left MB are all encoded in SKIP mode. Although this approach achieves a considerable speedup through early SKIP mode detection, we find there is still much room for improvement in SKIP mode detection for CGS. In this article, we propose a fast early SKIP mode detection method that also considers other skip patterns, RD costs and CBP values, and we compare it with Shen’s algorithm throughout this article.
The rest of the article is organized as follows. Section 2 proposes a new SKIP detection algorithm based on the SKIP mode distribution, RD cost differences and CBP values. Experimental results and conclusions are given in Sections 3 and 4, respectively.
2 Proposed early SKIP algorithm
2.1 SKIP mode detection based on SKIP patterns
H.264 SVC supports various inter-layer predictions as well as variable block size predictions. RD optimization and motion estimation are performed for each mode and consume a large amount of computational power. SKIP mode is prevalent at the EL for low-bit-rate SVC and requires little computation [18]. Simulations in [18] show that the average percentage of SKIP mode at the EL is 59–71%, while for low or medium motion sequences this percentage can be as high as 70%. Based on this observation, Shen’s algorithm performs early SKIP mode detection using the co-located MB at the BL and the neighboring left and top MBs at the EL. Figure 1 shows the MBs involved in this early SKIP mode detection approach. As defined in [18], the current MB is denoted by $E_c$, the co-located MB at the BL by $B_c$, and the left, top and upper-left MBs at the EL by $E_l$, $E_u$ and $E_{lu}$, respectively. We define a 3-tuple $V_s = (b_{skip}(B_c), b_{skip}(E_l), b_{skip}(E_u))$, where $b_{skip}(x)$ is a binary variable that equals one if MB $x$ is skipped and zero otherwise. Shen’s algorithm thus decides that $E_c$ is a skipped MB if $V_s = (1, 1, 1)$. When this happens, the mode decision process is terminated early, and encoding speed is greatly increased with a small average PSNR loss. However, our analysis shows that for some fast moving sequences this approach may result in non-negligible PSNR losses. Moreover, when $V_s \neq (1, 1, 1)$, a considerable number of MBs at the EL can still be skipped. Table 1 shows a statistical analysis with the Joint Scalable Video Model (JSVM) for eight CIF sequences, in which 32 frames are encoded and the GOP size and search range are both set to 16. In the table, $QP_B$ and $QP_E$ denote the QP at the BL and EL, respectively. For each skipped MB $E_c$ at the EL, the value of $V_s$ is collected. The table shows that, when $E_c$ is skipped, for the two listed QP settings, on average 15% and 20% of the locations, respectively, have only one of $E_l$ and $E_u$ skipped (i.e., $V_s = (1, 1, 0)$ or $(1, 0, 1)$).
For some sequences, such as Foreman with $(QP_B, QP_E) = (25, 20)$, this share is about 34%, which is comparable to that of $V_s = (1, 1, 1)$. If SKIP mode can be predicted with high precision for these extra patterns, the SKIP mode detection ratio at the EL, and hence the encoding speed, could be increased.
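The skip-pattern bookkeeping above can be sketched in a few lines of Python. This is a minimal illustration, not the JSVM implementation: the function names (`skip_pattern`, `shen_early_skip`, `is_extended_candidate`, `pattern_shares`) are hypothetical, and the SKIP flags of $B_c$, $E_l$ and $E_u$ are assumed to be available from already-coded MBs. `pattern_shares` mirrors the kind of tally reported in Table 1, i.e., the share of each $V_s$ among skipped EL MBs.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# V_s = (bskip(B_c), bskip(E_l), bskip(E_u)); 1 means the MB was coded as SKIP.
SkipPattern = Tuple[int, int, int]


def skip_pattern(bc_skipped: bool, el_skipped: bool, eu_skipped: bool) -> SkipPattern:
    """Build V_s from the SKIP status of B_c, E_l and E_u (hypothetical helper)."""
    return (int(bc_skipped), int(el_skipped), int(eu_skipped))


def shen_early_skip(vs: SkipPattern) -> bool:
    """Shen's rule [18]: E_c is skipped only if B_c, E_l and E_u are all skipped."""
    return vs == (1, 1, 1)


def is_extended_candidate(vs: SkipPattern) -> bool:
    """Patterns examined in this article: B_c skipped and at least one of E_l, E_u
    skipped. A candidate must still pass the checks of Sections 2.2 and 2.3."""
    return vs in {(1, 1, 1), (1, 1, 0), (1, 0, 1)}


def pattern_shares(samples: Iterable[Tuple[SkipPattern, bool]]) -> Dict[SkipPattern, float]:
    """Percentage of each V_s among the EL MBs whose E_c was actually skipped."""
    counts = Counter(vs for vs, ec_skipped in samples if ec_skipped)
    total = sum(counts.values())
    return {vs: 100.0 * n / total for vs, n in counts.items()} if total else {}


if __name__ == "__main__":
    vs = skip_pattern(True, True, False)
    print(vs, shen_early_skip(vs), is_extended_candidate(vs))  # (1, 1, 0) False True
    toy = [((1, 1, 1), True), ((1, 1, 0), True), ((1, 1, 1), True), ((0, 1, 1), False)]
    print(pattern_shares(toy))  # about 66.7% for (1, 1, 1) and 33.3% for (1, 1, 0)
```

The toy samples are illustrative only; they are not the Table 1 statistics.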
2.2 SKIP mode detection based on RD cost comparison
The second observation concerns the comparison of SKIP mode costs. For convenience, we define the SKIP RD cost as the RD cost obtained by checking SKIP mode only, and use $C(x)$ to denote the SKIP RD cost of any MB $x$. Note that $C(x)$ is not the RD cost of the best mode (the minimum RD cost) after the mode decision procedure has finished; the two are equal only when SKIP mode is chosen as the best mode.
We observe that, for each skipped MB $E_c$ at the EL, the SKIP RD cost of $E_c$ is usually closer to the SKIP RD cost of the skipped neighboring MB than to that of the non-skipped one. For instance, if the top MB is skipped while the left MB is not, the absolute difference $|C(E_c) - C(E_u)|$ is usually smaller than $|C(E_c) - C(E_l)|$. This observation can be expressed by the following equations:
In these equations, $\alpha$ is a configurable parameter that trades off encoding speed against video quality. Based on many experimental results, we found that setting $\alpha$ to 1.5 balances encoding speed and video quality. Equation (1) is for the case $V_s = (1, 1, 1)$, i.e., $B_c$, $E_l$ and $E_u$ are all skipped. In this case, if $E_c$ is skipped, its SKIP RD cost is usually smaller than the average SKIP RD cost of the top and left MBs. Equations (2) and (3) correspond to $V_s = (1, 1, 0)$ and $V_s = (1, 0, 1)$, respectively.
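Equations (1)–(3) themselves are not reproduced in this text, so the following Python sketch uses *assumed* forms read from the surrounding description: for $V_s = (1, 1, 1)$ it tests $C(E_c) < \alpha \cdot (C(E_l) + C(E_u))/2$, and for the single-neighbor patterns it tests whether $C(E_c)$ is closer to the skipped neighbor's SKIP RD cost than to the non-skipped one's. Where $\alpha$ enters Equations (2) and (3) is not stated here, so this sketch applies $\alpha$ only to the assumed Equation (1). All function names are hypothetical.

```python
from typing import Tuple

SkipPattern = Tuple[int, int, int]  # (bskip(B_c), bskip(E_l), bskip(E_u))
ALPHA = 1.5  # value the text reports as a good speed/quality balance


def assumed_eq1(c_cur: float, c_left: float, c_top: float, alpha: float = ALPHA) -> bool:
    """ASSUMED form of Equation (1): C(E_c) below alpha times the mean of C(E_l), C(E_u)."""
    return c_cur < alpha * (c_left + c_top) / 2.0


def assumed_closer_to_skipped(c_cur: float, c_skipped: float, c_other: float) -> bool:
    """ASSUMED form of Equations (2)/(3): C(E_c) is closer to the skipped neighbour's cost."""
    return abs(c_cur - c_skipped) < abs(c_cur - c_other)


def rd_cost_condition(vs: SkipPattern, c_cur: float, c_left: float, c_top: float,
                      alpha: float = ALPHA) -> bool:
    """Select the cost test that matches the skip pattern; False for other patterns."""
    if vs == (1, 1, 1):
        return assumed_eq1(c_cur, c_left, c_top, alpha)
    if vs == (1, 1, 0):  # E_l skipped, E_u not: Equation (2)
        return assumed_closer_to_skipped(c_cur, c_left, c_top)
    if vs == (1, 0, 1):  # E_u skipped, E_l not: Equation (3)
        return assumed_closer_to_skipped(c_cur, c_top, c_left)
    return False


if __name__ == "__main__":
    # Illustrative SKIP RD costs, not measured data.
    print(rd_cost_condition((1, 0, 1), 1000.0, 1500.0, 1050.0))  # True
    print(rd_cost_condition((1, 1, 1), 1000.0, 600.0, 700.0))    # False
```

Keeping the three tests behind one dispatch function means the exact Equations (1)–(3) could be substituted without touching the per-MB decision logic.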
Moreover, when $V_s = (1, 1, 1)$, $(1, 1, 0)$ or $(1, 0, 1)$, a large number of MBs are coded in BL_SKIP mode in addition to SKIP mode. Table 2 gives the distribution of SKIP and BL_SKIP modes when the cost conditions in Equations (1), (2) and (3) are satisfied. The table shows that the average percentages exceed 95% whenever these conditions are satisfied, which suggests that the proposed early skip approach can predict skipped MBs with high precision.
2.3 Early SKIP by detecting zero CBP for DIRECT16×16 mode
Another way to decide SKIP mode before checking all other modes is to check the CBP after computing DIRECT16×16 mode. We observed that SKIP mode is chosen with high probability if $V_s = (1, 1, 1)$ and the CBP for DIRECT16×16 is zero. This observation is verified by the experimental results in Table 3, which lists the percentages of skipped MBs when the CBP for DIRECT16×16 is zero. For most cases the percentages exceed 90%, and the average values are 94.62% and 97.14% for $(QP_B, QP_E) = (25, 20)$ and $(35, 30)$, respectively. We also integrate this observation into our algorithm.
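As a sketch of the zero-CBP test, the Python below derives an H.264-style coded block pattern for one MB and checks it together with the skip pattern. It assumes the standard H.264 CBP layout (four luma bits, one per 8×8 block, plus a chroma value of 0, 1 or 2 weighted by 16) and takes the quantized DIRECT16×16 residual as input. The function names and the input representation (a 16×16 array of quantized luma coefficients laid out spatially, plus two chroma flags) are illustrative assumptions, not JSVM interfaces.

```python
from typing import Sequence, Tuple

SkipPattern = Tuple[int, int, int]  # (bskip(B_c), bskip(E_l), bskip(E_u))


def luma_cbp(coeffs: Sequence[Sequence[int]]) -> int:
    """Bit b is set when 8x8 luma block b (raster order) holds any nonzero
    quantized coefficient. `coeffs` is a 16x16 array laid out spatially."""
    cbp = 0
    for b8 in range(4):
        y0, x0 = 8 * (b8 // 2), 8 * (b8 % 2)
        if any(coeffs[y][x] != 0 for y in range(y0, y0 + 8) for x in range(x0, x0 + 8)):
            cbp |= 1 << b8
    return cbp


def chroma_cbp(any_dc_nonzero: bool, any_ac_nonzero: bool) -> int:
    """H.264 chroma CBP: 0 = no coefficients, 1 = DC only, 2 = AC present."""
    if any_ac_nonzero:
        return 2
    return 1 if any_dc_nonzero else 0


def coded_block_pattern(luma: Sequence[Sequence[int]], chroma_dc: bool, chroma_ac: bool) -> int:
    """Combined CBP = luma part + 16 * chroma part."""
    return luma_cbp(luma) + 16 * chroma_cbp(chroma_dc, chroma_ac)


def zero_cbp_skip_hint(vs: SkipPattern, direct16x16_cbp: int) -> bool:
    """Section 2.3 observation: V_s = (1,1,1) and zero DIRECT16x16 CBP favour SKIP."""
    return vs == (1, 1, 1) and direct16x16_cbp == 0


if __name__ == "__main__":
    residual = [[0] * 16 for _ in range(16)]
    print(zero_cbp_skip_hint((1, 1, 1), coded_block_pattern(residual, False, False)))  # True
    residual[9][3] = 1  # one nonzero coefficient in the bottom-left 8x8 block
    print(coded_block_pattern(residual, True, False))  # 20 (luma bit 2 + 16 * 1)
```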
The overall skip mode detection method combines all the observations described in Sections 2.1, 2.2 and 2.3. For clarity, the procedures for P slices and B slices are described separately.
Procedure for P slices:

Step 1) For each MB, if $V_s = (1, 1, 1)$, $(1, 1, 0)$ or $(1, 0, 1)$, go to Step 2; otherwise, go to Step 3.

Step 2) If the RD cost condition for the current pattern is not satisfied, i.e., Equation (1) for $V_s = (1, 1, 1)$, Equation (2) for $V_s = (1, 1, 0)$ or Equation (3) for $V_s = (1, 0, 1)$, go to Step 3; otherwise, check BL_SKIP mode and go to Step 4.

Step 3) Conduct the normal exhaustive mode decision for the remaining modes and determine the best mode.

Step 4) Go to Step 1 to process the next MB.
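The P-slice procedure can be sketched as a per-MB function in Python. This is a minimal illustration under stated assumptions: the RD cost test of Step 2 is passed in as a callable (so Equations (1)–(3) can be plugged in), "check BL_SKIP mode" is read as evaluating BL_SKIP and keeping whichever of SKIP and BL_SKIP has the lower RD cost, and the exhaustive search of Step 3 is a caller-supplied stub. All names are hypothetical; Step 4 corresponds to the caller's loop over MBs.

```python
from typing import Callable, Tuple

SkipPattern = Tuple[int, int, int]  # (bskip(B_c), bskip(E_l), bskip(E_u))
CANDIDATE_PATTERNS = {(1, 1, 1), (1, 1, 0), (1, 0, 1)}


def p_slice_mode(
    vs: SkipPattern,
    rd_condition: Callable[[SkipPattern], bool],  # Eq. (1), (2) or (3) for this MB
    skip_cost: float,                              # C(E_c), computed before the test
    bl_skip_cost: Callable[[], float],             # evaluated only on early termination
    exhaustive: Callable[[], str],                 # normal (JSVM-style) mode decision
) -> str:
    """Early SKIP decision for one P-slice MB (Steps 1-3 of the procedure)."""
    # Step 1: only the three candidate patterns proceed to Step 2.
    # Step 2: the pattern-specific RD cost condition must also hold.
    if vs in CANDIDATE_PATTERNS and rd_condition(vs):
        # Early termination: compare SKIP with BL_SKIP and keep the cheaper mode.
        return "SKIP" if skip_cost <= bl_skip_cost() else "BL_SKIP"
    # Step 3: fall back to the exhaustive mode decision.
    return exhaustive()


def always_true(vs: SkipPattern) -> bool:
    """Stand-in for Equations (1)-(3) in the usage example."""
    return True


if __name__ == "__main__":
    modes = [p_slice_mode(vs, always_true, 900.0, lambda: 950.0, lambda: "INTER16x16")
             for vs in [(1, 1, 0), (0, 1, 1)]]
    print(modes)  # ['SKIP', 'INTER16x16']
```

Passing `bl_skip_cost` as a callable keeps the BL_SKIP evaluation lazy, so it only costs time for MBs that actually terminate early.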
Procedure for B slices:

Step 1) For each MB, if $V_s = (1, 1, 1)$, $(1, 1, 0)$ or $(1, 0, 1)$, go to Step 2; otherwise, go to Step 5.

Step 2) If $V_s = (1, 1, 1)$, go to Step 3; otherwise, go to Step 4.

Step 3) If the CBP for DIRECT16×16 is nonzero, go to Step 5; otherwise, check BL_SKIP mode and go to Step 6.

Step 4) If Equation (2) is not satisfied for $V_s = (1, 1, 0)$, or Equation (3) is not satisfied for $V_s = (1, 0, 1)$, go to Step 5; otherwise, check BL_SKIP mode and go to Step 6.

Step 5) Conduct the normal exhaustive mode decision for the remaining modes and determine the best mode.

Step 6) Go to Step 1 to process the next MB.
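The B-slice procedure differs from the P-slice one only in the $V_s = (1, 1, 1)$ branch, which uses the DIRECT16×16 CBP instead of Equation (1). The Python sketch below follows the same assumptions as a P-slice version would: the RD cost test, the DIRECT16×16 CBP, the BL_SKIP cost and the exhaustive search are caller-supplied callables, and "check BL_SKIP mode" is read as keeping the cheaper of SKIP and BL_SKIP. All names are hypothetical.

```python
from typing import Callable, Tuple

SkipPattern = Tuple[int, int, int]  # (bskip(B_c), bskip(E_l), bskip(E_u))
CANDIDATE_PATTERNS = {(1, 1, 1), (1, 1, 0), (1, 0, 1)}


def b_slice_mode(
    vs: SkipPattern,
    direct16x16_cbp: Callable[[], int],            # CBP after computing DIRECT16x16
    rd_condition: Callable[[SkipPattern], bool],  # Eq. (2) or (3) for this MB
    skip_cost: float,                              # C(E_c)
    bl_skip_cost: Callable[[], float],             # evaluated only on early termination
    exhaustive: Callable[[], str],                 # normal (JSVM-style) mode decision
) -> str:
    """Early SKIP decision for one B-slice MB (Steps 1-5 of the procedure)."""
    if vs in CANDIDATE_PATTERNS:                  # Step 1
        if vs == (1, 1, 1):                       # Step 2 -> Step 3
            early = direct16x16_cbp() == 0        # zero-CBP test; Eq. (1) is not used
        else:                                     # Step 2 -> Step 4
            early = rd_condition(vs)              # Eq. (2) or Eq. (3)
        if early:
            return "SKIP" if skip_cost <= bl_skip_cost() else "BL_SKIP"
    return exhaustive()                           # Step 5


if __name__ == "__main__":
    print(b_slice_mode((1, 1, 1), lambda: 0, lambda v: False, 900.0,
                       lambda: 800.0, lambda: "B_16x16"))  # BL_SKIP
    print(b_slice_mode((1, 1, 1), lambda: 3, lambda v: True, 900.0,
                       lambda: 800.0, lambda: "B_16x16"))  # B_16x16
```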
During encoding, the SKIP RD cost of each MB must be stored for use in these computations. Moreover, if $V_s = (1, 1, 0)$ or $(1, 0, 1)$ and the left MB is not available, the right neighbor of the top MB (i.e., the top-right MB) replaces the left MB. Similarly, if the top MB is not available, the left neighbor of the left MB is used instead.
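The bookkeeping described above can be sketched as follows. The Python keeps each EL MB's SKIP RD cost and SKIP flag in a dictionary keyed by MB position and applies the left-neighbor substitution rule (the top-right MB in place of an unavailable left MB). The class name, data layout and the notion of "available" (inside the picture and already coded) are illustrative assumptions; MBs are assumed to be coded in raster order.

```python
from typing import Dict, Optional, Tuple

Pos = Tuple[int, int]  # (mb_x, mb_y) in macroblock units


class SkipHistory:
    """Stores C(x) and the SKIP flag of every already-coded EL macroblock."""

    def __init__(self, width_in_mbs: int) -> None:
        self.width = width_in_mbs
        self.cost: Dict[Pos, float] = {}
        self.skipped: Dict[Pos, bool] = {}

    def record(self, pos: Pos, skip_rd_cost: float, was_skipped: bool) -> None:
        """Save the SKIP RD cost and final SKIP status once an MB is coded."""
        self.cost[pos] = skip_rd_cost
        self.skipped[pos] = was_skipped

    def left_reference(self, pos: Pos) -> Optional[Pos]:
        """E_l if available, else the right neighbour of the top MB (top-right MB)."""
        x, y = pos
        for cand in ((x - 1, y), (x + 1, y - 1)):
            cx, cy = cand
            if 0 <= cx < self.width and cy >= 0 and cand in self.cost:
                return cand
        return None


if __name__ == "__main__":
    h = SkipHistory(width_in_mbs=3)
    h.record((0, 0), 500.0, True)
    h.record((1, 0), 620.0, False)
    print(h.left_reference((0, 1)))  # (1, 0): top-right MB replaces the missing left MB
```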
3 Experimental results
The simulation configurations are listed in Table 4. Sequences with various resolutions and types of motion are used. ‘Akiyo’, ‘Paris’ and ‘Silent’ are in CIF format, while ‘Vidyo1’, ‘Shields’ and ‘Mobcal’ are in HD720p format. ‘Vidyo1’ is a test sequence used in the development of the ongoing High Efficiency Video Coding (HEVC) standard [19]. The QP difference (DQP) between BL and EL is set to 2, 5 and 10; a higher DQP means a larger quality difference between BL and EL.
JSVM 9.19 is used for the two-layer CGS encoding. The simulations are run on a PC with a 2.2 GHz CPU and 8 GB of memory. We use the average bit rate increase (BDBR, %) and the average peak signal-to-noise ratio decrease (BDPSNR, dB), computed with the Bjøntegaard method [20], to measure the average quality change. Encoder speed is measured by the total encoding time of BL and EL. The average encoding time saving (ATS, %), i.e., the average of the time savings (TS) over the tested QPs, measures the reduction in encoding complexity. These metrics are used to compare the proposed fast approach with the reference software and with the state-of-the-art fast algorithm over all test sequences.
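For concreteness, the Python below computes TS and ATS under the usual definition, which the text does not spell out and is therefore an assumption here: $TS = (T_{ref} - T_{prop}) / T_{ref} \times 100\%$, where $T$ is the total BL plus EL encoding time at one QP setting, and ATS is the arithmetic mean of TS over the tested QPs. The timings in the usage example are made-up numbers, not results from Table 5.

```python
from typing import Sequence


def time_saving(t_ref: float, t_prop: float) -> float:
    """TS in percent: relative reduction of total (BL + EL) encoding time."""
    if t_ref <= 0:
        raise ValueError("reference encoding time must be positive")
    return 100.0 * (t_ref - t_prop) / t_ref


def average_time_saving(t_ref: Sequence[float], t_prop: Sequence[float]) -> float:
    """ATS in percent: mean of the per-QP time savings."""
    if len(t_ref) != len(t_prop) or not t_ref:
        raise ValueError("need one (reference, proposed) time pair per QP")
    savings = [time_saving(r, p) for r, p in zip(t_ref, t_prop)]
    return sum(savings) / len(savings)


if __name__ == "__main__":
    # Hypothetical encoding times in seconds for four QP settings.
    jsvm = [100.0, 80.0, 60.0, 50.0]
    fast = [60.0, 52.0, 42.0, 37.5]
    print(average_time_saving(jsvm, fast))  # 32.5
```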
Table 5 lists the results of the proposed approach compared with JSVM. On average, the proposed approach reduces encoding time by 35.15% with a BDPSNR loss of 0.011 dB and a BDBR loss of 0.589%. For slow motion sequences such as “Akiyo”, the reduction reaches 56.52%.
Compared with Shen’s algorithm, the proposed approach achieves a 9.98% time reduction on average, with average BDPSNR and BDBR gain decrements of 0.045 dB and 0.879%, respectively (Table 6). The maximum time reduction, 15.74%, is achieved for “Akiyo” at DQP = 2, with a 0.036 dB BDPSNR gain and a 0.4% BDBR gain. Overall, in half of the cases the time reduction exceeds 10% with negligible quality changes. Shen’s algorithm makes a skip decision at the EL only when both the left and top MBs at the EL and the co-located MB at the BL are skipped, a constraint that excludes many candidates from early skip. The proposed algorithm only requires the co-located MB and at least one of the EL neighboring MBs to be skipped, which enlarges the candidate set. By combining this relaxation with the RD cost comparison and the CBP check, the proposed algorithm achieves better performance. In addition, the proposed algorithm sometimes achieves RD gains over Shen’s algorithm as well as a time reduction. We attribute this to the fact that the proposed algorithm also checks BL_SKIP mode instead of selecting SKIP mode directly.
It is also interesting to examine the separate contribution of each part of the proposed algorithm. Because the RD cost comparison is interleaved with skip pattern detection, we only compare the algorithm with and without zero CBP detection; the results for the HD sequences are listed in Table 7. On average, the algorithm without and with zero CBP detection achieves 23.22% and 24.51% time reduction, respectively, with similar quality losses. Zero CBP detection therefore contributes about 1.29 percentage points of additional time reduction.
In addition to the average performance metrics above, we choose one sequence of each resolution to show its RD performance and encoding time reduction at various bit rates. Figures 2, 3 and 4 show the RD performance curves for the CIF sequence ‘Paris’; all three algorithms have similar RD curves. The encoding time per frame for each DQP is displayed in Figures 2b, 3b and 4b. The proposed algorithm encodes much faster than both JSVM and Shen’s algorithm at low and medium bit rates. The encoding time reduction tends to be smaller at higher bit rates, because fewer MBs are skipped there. Similarly, ‘Shields’ is selected as the representative HD720p sequence, and Figures 5, 6 and 7 show the corresponding curves. For ‘Shields’ at DQP = 5 and 10, neither Shen’s algorithm nor the proposed algorithm shows much improvement at high bit rates. Further study is needed to improve encoding speed in the high bit rate case.
4 Conclusion
We propose a novel early skip algorithm in this article. By exploiting additional skip patterns of the co-located MB at the BL and the spatially neighboring MBs at the EL, further encoding speed improvement is achieved. The results demonstrate that, when combined with RD cost comparison and CBP checking, a larger portion of MBs in SKIP or BL_SKIP mode can be detected early with high precision. Compared with the reference software and the state-of-the-art algorithm, the proposed algorithm yields greater encoding speed improvement with negligible quality losses. For further speedup, this approach could be used as the early skip detection module in other SVC FMD algorithms. In the future, we are interested in exploring other mechanisms to speed up SVC encoders, such as fast motion estimation [21–24].
References
 1.
Schwarz H, Marpe D, Wiegand T: Overview of the Scalable Video Coding Extension of the H.264/AVC Standard. IEEE Trans. Circ. Syst. Video Technol 2007, 17: 1103–1120.
 2.
Eleftheriadis A, Civanlar R, Shapiro O: Multipoint videoconferencing with scalable video coding. J. Zhejiang Univ. Sci. A 2006, 7: 696–705.
 3.
Wien M, Schwarz H, Oelbaum T: Performance analysis of SVC. IEEE Trans. Circ. Syst. Video Technol 2007, 17: 1194–1203.
 4.
Lee J, Jeon B: Fast mode decision for H.264. In IEEE International Conference on Multimedia and Expo (ICME). Taipei, Taiwan; 2004: 1131–1134.
 5.
Choi I, Lee J, Jeon B: Fast coding mode selection with rate-distortion optimization for MPEG-4 Part-10 AVC/H.264. IEEE Trans. Circ. Syst. Video Technol 2006, 16: 1557–1561.
 6.
Kim C, Shih H, Kuo C: Feature-based intra-prediction mode decision for H.264. In IEEE International Conference on Image Processing (ICIP). Singapore; 2004: 769–772.
 7.
Wu D, Pan F, Lim K, Wu S, Li Z, Lin X, Rahardja S, Ko C: Fast intermode decision in H.264/AVC video coding. IEEE Trans. Circ. Syst. Video Technol 2005, 15: 953–958.
 8.
Kannangara C, Richardson I, Bystrom M, Solera J, Zhao Y, MacLennan A, Cooney R: Low-complexity skip prediction for H.264 through Lagrangian cost estimation. IEEE Trans. Circ. Syst. Video Technol 2006, 16: 202–208.
 9.
Li H, Li Z, Wen C: Fast mode decision algorithm for inter-frame coding in fully scalable video coding. IEEE Trans. Circ. Syst. Video Technol 2006, 16: 889–895.
 10.
Park C, Dan B, Choi H, Ko S: A statistical approach for fast mode decision in scalable video coding. IEEE Trans. Circ. Syst. Video Technol 2009, 19: 1915–1920.
 11.
Jung S, Baek S, Park C, Ko S: Fast mode decision using all-zero block detection for fidelity and spatial scalable video coding. IEEE Trans. Circ. Syst. Video Technol 2010, 20: 201–206.
 12.
Lin H, Peng W, Hang H: Fast context-adaptive mode decision algorithm for scalable video coding with combined coarse-grain quality scalability (CGS) and temporal scalability. IEEE Trans. Circ. Syst. Video Technol 2010, 20: 732–748.
 13.
Lee B, Kim M: A low complexity mode decision method for spatial scalability coding. IEEE Trans. Circ. Syst. Video Technol 2011, 21: 88–95.
 14.
Zhao T, Wang H, Kwong S, Kuo CC, Halang W: Hierarchical B-picture mode decision in H.264/SVC. J. Visual Commun. Image Represent 2011, 22: 627–633. 10.1016/j.jvcir.2011.07.004
 15.
Lin H, Hang H, Peng W: Fast bidirectional prediction selection in H.264/MPEG-4 AVC temporal scalable video coding. IEEE Trans. Circ. Syst. Video Technol 2011, 20: 3508–3523.
 16.
Lee B, Kim M: An efficient inter-prediction mode decision method for temporal scalability coding with hierarchical B-picture structure. IEEE Trans. Broadcast 2012, 58(2): 285–290.
 17.
Zhao T, Kwong S, Wang H, Kuo CC: H.264/SVC mode decision based on optimal stopping theory. IEEE Trans. Image Process 2012, 21(5): 2607–2618.
 18.
Shen L, Sun Y, Liu Z, Zhang Z: Efficient SKIP mode detection for coarse grain quality scalable video coding. IEEE Signal Process. Lett 2010, 17: 887–890.
 19.
Sullivan G, Ohm JR: Recent developments in standardization of high efficiency video coding (HEVC). Proc. SPIE 2010, 77987798.
 20.
Bjontegaard G: ITU-T Q6/SG16. Calculation of average PSNR differences between RD-curves. VCEG-M33 2001, 1616.
 21.
Katayama T, Hamamoto T, Song T, Shimamoto T: Improvement of motion estimation with modified search center and search range for H.264/SVC. In International Technical Conference on Circuits/Systems, Computers and Communications. Korea; 2009: 401–404.
 22.
Katayama T, Hamamoto T, Song T, Shimamoto T: Motion based low-complexity algorithm for spatial scalability of H.264/SVC. 2010.
 23.
Na S, Kyung C: Activity-based motion estimation scheme for H.264 scalable video coding. IEEE Trans. Circ. Syst. Video Technol 2010, 20: 1475–1485.
 24.
Shen L, Zhang Z: Content-adaptive motion estimation algorithm for coarse-grain SVC. IEEE Trans. Image Process 2012, 21(5): 2582–2591.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords
 Base Layer
 Mode Decision
 High Efficiency Video Code
 Scalable Video Code
 Enhancement Layer