A novel early SKIP mode detection method for coarse grain quality in H.264/SVC

Various fast mode decision algorithms have been proposed to reduce the computational complexity of the H.264 scalable video coding (SVC) standard. In this article, we focus on skip mode detection for coarse grain quality scalability (CGS). We propose a fast skip mode detection approach that exploits surrounding skip patterns, rate distortion (RD) cost, and the coded block pattern (CBP). Experimental results show that, with 0.589% BDBR and 0.011 dB BDPSNR loss on average, the proposed method reduces encoding time by 35.15% on average compared with the reference software. Compared with the state-of-the-art early skip algorithm, the proposed method also achieves a 9.98% time reduction on average.


Introduction
H.264 scalable video coding (SVC) can produce a single bitstream that adapts to various network conditions, and hence it can be used to improve video quality over lossy networks [1]. Apart from its superior error resilience capabilities, eliminating the multipoint control unit (MCU) in video conference systems greatly reduces deployment cost and transmission delay [2]. Compared with H.264 single layer video coding, SVC requires coding of both a base layer (BL) and an enhancement layer (EL). To offset the additional bits required for multiple layers, SVC employs inter-layer prediction methods, including inter-layer intra prediction, residual prediction and motion prediction [1,3]. These additional layers and prediction modes greatly increase coding complexity. Therefore, efficient mode decision is necessary for real-time SVC applications.
*Correspondence: hao@csu.edu.cn. School of Information Science and Engineering, Central South University, Changsha, Hunan, China

In the literature, various fast mode decision (FMD) algorithms have been proposed to reduce H.264 computational complexity for single layer coding [4][5][6][7][8]. For H.264 SVC, there is also considerable research on FMD. Li et al. proposed an FMD algorithm in [9] where the RD costs of INTRA4×4 and INTER8×8 are checked to separate intra and inter modes. Moreover, the set of candidate modes is reduced by exploiting the mode correlation between BL and EL. In [10], Park et al. proposed an FMD method with initial mode selection. In their method, SKIP, BL SKIP and INTRA4×4 are first checked; then the mode with the highest expected RD cost (when skipped) is checked. A threshold is set for early termination. Kim et al. proposed a mode decision algorithm where a parameter is calculated from the weighted sum of all the modes of the neighboring macroblocks (MBs) at BL and EL. Selected modes are then checked based on this parameter. In [11], an all-zero block (AZB) detection algorithm at EL is proposed. Based on an analysis of the correlation of AZB occurrence between BL and EL, only some distributions are allowed to be AZB candidates for the EL. Furthermore, for a specific BL MB mode, priorities of AZB checking are assigned to increase the probability of early AZB detection. A new FMD algorithm is proposed in [12], where quantization parameter (QP) values at both BL and EL and temporal layer indexes are used for mode decision. In [13], a novel FMD is proposed that exploits zero motion blocks and zero coefficient blocks to reduce the number of modes to be checked. A classification method is proposed in [14], where the modes in the first class (including SKIP and INTER16×16) are checked first. Further mode checking is performed depending on temporal layer indexes and whether all the resulting coefficients are zero. Lin et al. attacked the FMD problem from another angle in [15]. They proposed a method to reduce the number of bi-directional predictions and thus greatly save computational time. Lee and Kim proposed a new FMD based on statistical hypothesis testing [16]. The mean and variance of pixels are used to decide early termination conditions. Zhao et al. proposed to use optimal stopping theory for early termination decisions [17]. They classified all the candidate modes into modes without and with residual prediction (referred to as regular modes and residual modes). Optimal stopping theory is used for the regular modes. A reduced candidate set is used for residual modes based on the correlation between regular and residual modes. Zero block detection is also exploited to speed up the decision process of the residual modes.

All the above FMD algorithms try to speed up the full mode decision process from different aspects. For example, mode correlation, expected RD cost, and AZB detection are employed in [9][10][11], respectively. However, although it is an important part of FMD, skip mode decision has not been studied as thoroughly as other parts of mode decision. Therefore, the scheme proposed in this article is complementary to other mode decision methods and would further increase the performance gain. The only method that focuses on early skip is the one proposed by Shen et al. [18] for CGS (referred to as Shen's algorithm). SKIP mode is prevalent at EL for low-bit-rate SVC and requires little computation [18]. Simulations in [18] show that the average percentage for SKIP mode at EL is 59-71%, while for low or medium motion sequences this percentage can be as high as 70%. Based on this observation, Shen's algorithm performs early SKIP mode detection based on the co-located MB at BL and the neighboring left and top MBs at EL. For example, Figure 1 shows the relevant MBs for this early SKIP mode detection approach.
As defined in [18], the current MB is denoted by E_c, the co-located MB at BL is denoted by B_c, and the left, top and upper-left MBs at EL are denoted by E_l, E_u, and E_lu, respectively. We define a 3-tuple Vs = (bskip(B_c), bskip(E_l), bskip(E_u)), where bskip(B_c) is a binary variable that is set to one if B_c is a skipped MB and zero otherwise. Shen's algorithm determines E_c to be a skipped MB if Vs = (1, 1, 1). When this happens, the mode decision process is terminated early and encoding speed is greatly increased with a small average PSNR loss. However, for some fast moving [...] 1, 1, 1). If we can predict SKIP mode for these extra patterns with high precision, the SKIP mode detection ratio at EL could be increased, and so could the encoding speed.
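The pattern tuple Vs and the decision rule of Shen's algorithm described above can be sketched in a few lines. This is a minimal illustration rather than code from [18] or JSVM: the `Macroblock` record and the function names are hypothetical, and only the skip flag of each MB is modeled.

```python
from typing import NamedTuple, Tuple


class Macroblock(NamedTuple):
    """Hypothetical minimal MB record; only the skip flag is modeled."""
    is_skipped: bool


def bskip(mb: Macroblock) -> int:
    """Return 1 if the MB is a skipped MB, 0 otherwise."""
    return 1 if mb.is_skipped else 0


def skip_pattern(b_c: Macroblock, e_l: Macroblock, e_u: Macroblock) -> Tuple[int, int, int]:
    """Build Vs = (bskip(B_c), bskip(E_l), bskip(E_u))."""
    return (bskip(b_c), bskip(e_l), bskip(e_u))


def shen_early_skip(b_c: Macroblock, e_l: Macroblock, e_u: Macroblock) -> bool:
    """Shen's rule as described in the text: skip E_c only if Vs = (1, 1, 1)."""
    return skip_pattern(b_c, e_l, e_u) == (1, 1, 1)
```

With this rule, a current MB whose top neighbor is not skipped, i.e., Vs = (1, 1, 0), is never detected early, which is the limitation the extra patterns are meant to address.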

SKIP mode detection based on RD cost comparison
The second observation is based on the SKIP mode cost comparison. To facilitate the description, we define the SKIP RD cost as the RD cost resulting from checking only SKIP mode. C(x) is used to represent the SKIP RD cost for any MB x. Please note that C(x) is not the RD cost of the best mode (minimum RD cost) after the mode decision procedure is finished. It is equal to the minimum RD cost only when SKIP mode has been chosen as the best mode. It can be observed that, for each skipped MB E_c at EL, the SKIP RD cost of E_c is usually closer to the SKIP RD cost of the skipped neighboring MB than to that of the non-skipped one. For instance, if the top MB is skipped while the left MB is not, the absolute difference |C(E_c) − C(E_u)| is usually [...] In these equations, α is a configurable parameter that offers a performance tradeoff. Based on the analysis of many experimental results, we found that α can be set to 1.5 to balance encoding speed and video quality. Equation (1) is for the case of Vs = (1, 1, 1), i.e., B_c, E_l, and E_u are all skipped. In this case, if E_c is skipped, its SKIP RD cost is usually smaller than the average SKIP RD cost of the top and left MBs. Equations (2) and (3) correspond to Vs = (1, 1, 0) and Vs = (1, 0, 1), respectively. Second, when Vs = (1, 1, 1), (1, 1, 0), or (1, 0, 1), aside from SKIP mode, there is also a large number of MBs that are of BL SKIP mode. Table 2 [...] satisfied. The table demonstrates that the average percentages are more than 95% when all the above conditions are satisfied. This implies that the proposed early skip approach would predict the skipped MBs with high precision.
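The RD cost comparison can be illustrated with a short sketch. Equations (1) to (3) are not reproduced in this text, so the inequality forms below are assumptions inferred from the description: the SKIP RD cost of the current MB is compared against α times the average SKIP RD cost of the left and top MBs for Vs = (1, 1, 1), and against α times the SKIP RD cost of the single skipped EL neighbor for the other two patterns. The function and parameter names are hypothetical.

```python
from typing import Tuple

ALPHA = 1.5  # value reported in the text as a speed/quality balance


def rd_cost_test(vs: Tuple[int, int, int],
                 c_cur: float,
                 c_left: float,
                 c_top: float,
                 alpha: float = ALPHA) -> bool:
    """Return True if the SKIP RD cost of E_c passes the (assumed) threshold.

    vs     : (bskip(B_c), bskip(E_l), bskip(E_u))
    c_cur  : C(E_c), SKIP RD cost of the current MB
    c_left : C(E_l), SKIP RD cost of the left MB
    c_top  : C(E_u), SKIP RD cost of the top MB
    """
    if vs == (1, 1, 1):
        # Assumed form of Equation (1): compare with the neighbor average.
        reference = (c_left + c_top) / 2.0
    elif vs == (1, 1, 0):
        # Assumed form of Equation (2): only the left MB is skipped.
        reference = c_left
    elif vs == (1, 0, 1):
        # Assumed form of Equation (3): only the top MB is skipped.
        reference = c_top
    else:
        # Any other pattern is not an early skip candidate.
        return False
    return c_cur < alpha * reference
```

A larger α admits more MBs as early skip candidates and therefore trades quality for speed, which is consistent with the tradeoff role the text assigns to this parameter.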

Early SKIP by detecting zero CBP for mode direct16×16
Another way to decide skip mode before checking all other modes is to check the CBP after computing DIRECT16×16 mode. We observed that SKIP mode has a high probability of being chosen if Vs = (1, 1, 1) and the CBP for DIRECT16×16 is zero. The validity of this observation can be verified through the experimental results shown in [...] (35, 30), respectively. We also integrate this observation into our algorithm. The overall skip mode detection method is based on all the observations described in Sections 2.1, 2.2, and 2.3. The whole procedure is described as follows. To make [...]
Procedure for P slices:
Step 1) For each MB, if Vs = (1, 1, 1) or (1, 1, 0) or (1, 0, 1), go to STEP 2; otherwise, go to STEP 3.
Step 3) Conduct the normal exhaustive mode decision for the rest of modes and determine the best mode.
Step 4) Go to STEP 1 to process the next MB.
Step 5) Conduct the normal exhaustive mode decision for the rest of modes and determine the best mode.
Step 6) Go to STEP 1 to process the next MB.
During the encoding process, SKIP RD cost for each MB should be saved for the computation. Moreover, if Vs = (1, 1, 0) or (1, 0, 1) and the left MB is not available, the right neighbor of the top MB would replace the left MB. On the other hand, if the right MB is not available, the left neighbor of the left MB would be used instead.
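The candidate check in Step 1 and the zero-CBP observation for DIRECT16×16 can be sketched as two small predicates. This is an illustration only: the function names and the integer `direct16x16_cbp` argument are hypothetical, and the step that is missing from the procedure above is not reconstructed here.

```python
from typing import Tuple

# Skip patterns (bskip(B_c), bskip(E_l), bskip(E_u)) accepted in Step 1.
CANDIDATE_PATTERNS = {(1, 1, 1), (1, 1, 0), (1, 0, 1)}


def is_early_skip_candidate(vs: Tuple[int, int, int]) -> bool:
    """Step 1 gate: the co-located BL MB and at least one EL neighbor are skipped."""
    return vs in CANDIDATE_PATTERNS


def zero_cbp_early_skip(vs: Tuple[int, int, int], direct16x16_cbp: int) -> bool:
    """Zero-CBP observation: SKIP is likely if Vs = (1, 1, 1) and the coded
    block pattern obtained for DIRECT16x16 is zero."""
    return vs == (1, 1, 1) and direct16x16_cbp == 0
```

MBs that fail the gate fall through to the normal exhaustive mode decision, so the sketch only affects MBs whose neighborhood already suggests SKIP.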

Experimental results
The configurations for the simulation are listed in Table 4. Sequences with various resolutions and motion characteristics are used. Sequences 'Akiyo', 'Paris', and 'Silent' are in CIF format, while 'Vidyo1', 'Shields' and 'Mobcal' are in HD720p format. 'Vidyo1' is a test sequence used for the development of the ongoing standard High Efficiency Video Coding (HEVC) [19]. The QP differences (DQP) are set to 2, 5, and 10, where a higher DQP means a larger quality difference between BL and EL. JSVM 9.19 is used for the two-layer CGS encoding. The simulation environment is a PC with a 2.2 GHz CPU and 8 GB of memory. We use the average bit rate increase (BDBR, %) and the average peak signal-to-noise ratio decrease (BDPSNR, dB) to measure the average quality change [20]. We measure the total encoding time of BL and EL to reveal the speed of the encoder. The average encoding time saving (ATS, %), which is the average of the time savings (TS) for various QPs, is used to measure the reduction in encoding complexity. These metrics are used to compare the proposed fast approach with the reference software and the state-of-the-art fast algorithm. The results are obtained for various sequences. Table 5 lists the results when comparing the proposed approach with JSVM. It demonstrates that, on average, the proposed approach achieves a 35.15% reduction of encoding time with 0.011 dB BDPSNR and 0.589% BDBR loss. For slow motion sequences like 'Akiyo', the reduction can be as high as 56.52%.
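The time-saving metrics described above can be computed as in the following sketch. The text defines ATS as the average of TS over the tested QPs but does not spell out TS itself; the relative form TS = (T_ref − T_fast) / T_ref × 100 used here is an assumption (it is the common definition), and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def time_saving(t_ref: float, t_fast: float) -> float:
    """TS in percent for one QP setting (assumed relative definition).

    t_ref  : total BL + EL encoding time of the reference encoder
    t_fast : total BL + EL encoding time of the fast algorithm
    """
    return (t_ref - t_fast) / t_ref * 100.0


def average_time_saving(times: Iterable[Tuple[float, float]]) -> float:
    """ATS in percent: the mean of TS over all (t_ref, t_fast) pairs, one per QP."""
    savings = [time_saving(t_ref, t_fast) for t_ref, t_fast in times]
    if not savings:
        raise ValueError("at least one (t_ref, t_fast) pair is required")
    return sum(savings) / len(savings)
```

Averaging the per-QP percentages, rather than dividing summed times, gives every QP setting equal weight regardless of its absolute encoding time.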
Compared with Shen's algorithm, the proposed approach achieves a 9.98% time reduction with 0.045 dB BDPSNR [...] (Table 6). Specifically, the maximum time reduction of 15.74% is achieved for sequence 'Akiyo' at DQP = 2 with 0.036 dB BDPSNR and 0.4% BDRATE gain. Overall, in half of the cases, the time reduction is more than 10% with negligible quality changes. Shen's algorithm makes a skip decision at EL only when all the EL neighboring MBs and the co-located MB at BL are skipped. This constraint excludes many candidates from early skip. Our proposed algorithm only requires the co-located MB and one of the EL neighboring MBs to be skipped, which enlarges the candidate set. By combining this relaxation with the CBP check and the RD cost comparison, the proposed algorithm achieves better performance. Besides, the proposed algorithm sometimes achieves RD gains over Shen's algorithm as well as time reduction. We believe this is because the algorithm sometimes checks BL SKIP mode in addition to SKIP mode. It is also interesting to see the separate contributions of each part of the proposed algorithm. Because the RD cost comparison is interleaved with skip pattern detection, we only compare the algorithm with and without zero CBP detection; the results are listed in Table 7 for HD sequences. On average, the algorithm without and with zero CBP detection achieves 23.22% and 24.51% time reduction, respectively, with similar quality losses. Therefore, zero CBP detection contributes about 1.29 percentage points of time reduction.
Aside from the average performance metrics shown above, we choose one sequence from each resolution to show the RD performance and encoding time reduction for various bitrates. Figures 2, 3, and 4 show the RD performance curves for 'Paris' with CIF resolution. It can be seen that all three algorithms have similar RD curves. Encoding time per frame is displayed in Figures 2b, 3b, and 4b for each DQP. The proposed algorithm demonstrates much faster encoding than JSVM and Shen's algorithm at low and medium bitrates. It can be observed that the encoding time reduction tends to be smaller for higher bitrates. This is because fewer MBs are skipped at higher bitrates. Similarly, 'Shields' is selected as the representative sequence for HD720p resolution. Figures 5, 6, and 7 show similar curves for 'Shields'. For 'Shields', when DQP = 5 or 10, neither Shen's algorithm nor the proposed algorithm shows much improvement in high bitrate scenarios. Further study is needed to improve encoding speed for the high bitrate case.

Conclusion
We propose a novel early skip algorithm in this article. By exploiting extra skip patterns in the co-located MB at BL and the spatially neighboring MBs at EL, further encoding speed improvement is achieved. The results demonstrate that, when combined with RD cost comparison and CBP checking, a larger portion of MBs in SKIP or BL SKIP mode can be detected early with high precision. Compared with the reference software and the state-of-the-art algorithm, the proposed algorithm leads to greater encoding speed improvement with negligible quality losses. For further speedup, this approach could be used as the early skip detection module in other SVC FMD algorithms. In the future, we are interested in exploring other mechanisms to speed up SVC encoders, such as fast motion estimation [21][22][23][24].