 Research
 Open Access
Fast and adaptive mode decision and CU partition early termination algorithm for intra-prediction in HEVC
EURASIP Journal on Image and Video Processing volume 2017, Article number: 86 (2017)
Abstract
High Efficiency Video Coding (HEVC, or H.265), the latest international video coding standard, achieves a 50% bit rate reduction at nearly equal quality compared with H.264, at the cost of dramatically higher coding complexity. Unlike other fast algorithms, the algorithm proposed here combines the CU coding bits with the removal of unnecessary intra-prediction modes to decrease computational complexity. We first analyzed the statistical relationship between the best mode and the costs calculated in the Rough Mode Decision (RMD) process and proposed an effective mode decision algorithm for intra-mode prediction. We reduced the computational load by carrying out the RMD process in two stages: the first stage reduces the 35 modes to 11, and the second stage adds the modes adjacent to the most promising modes selected in the first stage. After these two stages, two or three modes are passed to the rate-distortion optimization (RDO) process instead of the three or eight in the original HEVC process, which significantly reduces the number of unnecessary candidate modes in RDO. We then used the coding bits of the current coding unit (CU) as the main basis for judging its complexity and proposed an early termination method for CU partition based on the number of coding bits of the current CU. Experimental results show that the proposed fast algorithm provides an average time reduction of 53% compared with the reference HM16.12, with only a 1.7% Bjontegaard delta rate increase, which is acceptable for rate-distortion performance.
Introduction
High Efficiency Video Coding (HEVC) is the newest video coding standard, developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) [1]. HEVC aims to achieve a substantially higher compression rate than H.264/AVC [2], especially for high-resolution video content. HEVC still uses the hybrid coding framework based on motion compensation that has been adopted since H.261. Under this framework, inter-frame and intra-frame prediction are used to eliminate temporal and spatial correlation. The prediction residual is processed by a discrete cosine transform and quantization to eliminate the remaining spatial correlation. Adaptive entropy coding eliminates statistical redundancy, and a loop filter reduces the quantization noise. Within this framework, HEVC introduces significant improvements in such aspects as the loop filter, image coding unit, context-adaptive binary arithmetic coding (CABAC), directional intra-prediction, advanced motion vector prediction and merge, sub-pixel motion estimation and compensation, and sample adaptive offsets (SAO). In terms of objective metrics, HEVC intra-coding achieves an average bit rate reduction of 22%, and up to 36%, over H.264/AVC [3]. Compared with H.264, HEVC supports a larger range of coding block sizes and adopts a more flexible quadtree coding unit to handle the characteristics of video images, including different colors and textures, reference frame correlation, and partial information. According to their different functions, HEVC defines four block units: the coding tree unit (CTU), coding unit (CU), prediction unit (PU), and transform unit (TU). In the current HEVC test model (HM) [4], video images are first divided into slices, which are then divided into CTUs of equal size, also called largest coding units (LCUs).
An LCU is an N × N (64 × 64) block of luma samples paired with two corresponding blocks of chroma samples, a concept broadly analogous to that of a macroblock (MB) in previous standards such as H.264/AVC [5]. The CTU is the root of a quadtree and can be split recursively into four CUs of equal size. Each CU can then be encoded or split into four sub-CUs of equal size, and so on, with the process ending only when the smallest CU is reached. As a result of this splitting, CU size ranges from 64 × 64 to 8 × 8 and CU depth from 0 to 3. Figure 1 shows the quadtree structure formed by CUs in HEVC. An LCU composed of optimal CU partitions encodes diverse picture content adaptively at a low bit rate. These optimal CUs have the smallest rate distortion (RD) cost generated by the rate-distortion optimization (RDO) process. When a CU is predicted, it may be divided into PUs that carry individual prediction information. PU sizes and depths range from 64 × 64 to 4 × 4 and from 0 to 4, respectively. Each PU has up to 35 prediction modes: PLANAR, DC, and 33 directional modes, as shown in Fig. 2. DC and PLANAR are two non-directional modes applied to predict areas with homogeneous content. The 33 directional modes improve prediction accuracy but add significantly to the computational load.
Figure 3 shows a picture divided into optimal CUs in HEVC. The picture illustrates that smaller CUs cover regions with more information and texture complexity, whereas larger CUs cover regions with less information. The HEVC encoder has to search all possible CUs to obtain the optimal ones, which requires an extremely large amount of computation [6]. The search is highly time-consuming because the final optimal CU can be either the current CU or its four sub-CUs, depending on whether the RD cost of the current CU is greater or less than the sum of the RD costs of the four sub-CUs. Thus, the encoder has to encode CUs with sizes 64 × 64, 32 × 32, 16 × 16, and 8 × 8. An LCU measuring 64 × 64 is divided into four CUs measuring 32 × 32 each, which are then divided into four 16 × 16 CUs, with every 16 × 16 CU divided into four 8 × 8 CUs, yielding a total of 85 CUs (1 + 4 + 16 + 64) for each LCU. Hence, 35 prediction modes have to be tested for each of the 85 CUs to obtain the optimal partition plan. The proposed method seeks to reduce the number of prediction modes tested and to obtain the optimal CU partition as early as possible, reducing the computational burden while keeping prediction accuracy close to that of standard HEVC. Such an algorithm is necessary if HEVC is to be extended to ultra-high-definition formats such as 4K and 8K and to real-time application scenarios.
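The CU count quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `cus_per_depth` and the constants are our own, and the final figure assumes one PU per CU, which ignores the additional 4 × 4 PUs allowed inside 8 × 8 CUs.

```python
# Count the CUs an exhaustive intra search visits inside one 64x64 LCU.
LCU_SIZE = 64
MIN_CU_SIZE = 8
NUM_INTRA_MODES = 35  # PLANAR, DC, and 33 directional modes


def cus_per_depth(lcu_size=LCU_SIZE, min_cu_size=MIN_CU_SIZE):
    """Return a list of (depth, cu_size, cu_count) tuples for one LCU."""
    levels = []
    depth, size = 0, lcu_size
    while size >= min_cu_size:
        levels.append((depth, size, 4 ** depth))  # each split yields 4 sub-CUs
        depth += 1
        size //= 2
    return levels


levels = cus_per_depth()
total_cus = sum(count for _, _, count in levels)
for depth, size, count in levels:
    print(f"depth {depth}: {count:2d} CUs of {size}x{size}")
print("total CUs per LCU:", total_cus)
# Simplified bound: one PU per CU, every mode tested for every CU.
print("mode tests per LCU (simplified):", total_cus * NUM_INTRA_MODES)
```

The sum 1 + 4 + 16 + 64 gives the 85 CUs mentioned in the text, which is why trimming either the mode list or the quadtree search has a large effect on encoding time.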
Related work
Since the emergence of HEVC, many researchers and research institutions have devoted considerable effort to increasing the efficiency of HEVC, including various approaches to reduce the computational load of intra-prediction and overall coding complexity. Other studies address coding problems beyond HEVC, for example [7,8,9,10], and have also contributed to the development of video coding; in this paper, however, we focus on optimizing HEVC encoding. Ahn et al. [11] parallelized the HEVC encoder's interpolation filter, cost function, and transform using single instruction, multiple data (SIMD) operations and data-level parallelism to reduce the complexity of HEVC. Min et al. [12] proposed a distributed video coding method with a hierarchical group of picture structure. Yan et al. [13] proposed a parallel framework to decouple motion estimation for different partitions on many-core processors. Yan et al. [14] also proposed a parallel deblocking filter for HEVC that decreases complexity in the decoding process. Peng et al. [15] introduced a fast coding algorithm that uses depth and color features to extract depth discontinuous regions, depth edge regions, and motion regions as masks for efficient processing and fast coding. Nishikori et al. [16] of the Department of Information and Electronics at Tottori University in Japan proposed a fast CU size selection method that determines CU sizes using the variance of the input image. Bai [17] put forward a fast coding tree unit algorithm that uses Sobel gradient and mean absolute deviation values to analyze the texture of the CU and filter out unnecessary CU candidates, speeding up the original intra-coding in HEVC. Blasi et al. [18] proposed a method that visits the smallest CUs first and continues with the larger CUs up the quadtree, extracting useful information to decide whether a CTU is encoded using this reverse CU visiting order. Belghith et al. [19] proposed a CU partition algorithm based on the Sobel edge detection process to decide on the appropriate CU size early. Goswami et al. [20] proposed an approach that uses the RD costs of the parent and current levels to terminate the quadtree-based structure early, saving 38.03% of encoding time on average with a 1.3% BD-rate increase compared with HM 10.0. Shen et al. [21] proposed a CU-splitting early termination algorithm that makes use of a support vector machine (SVM). Cen et al. [22] introduced a fast adaptive CU depth decision mechanism that uses the spatial correlations in the sequence frame, saving 16% of encoding time on average compared with HM 10.0. The aforementioned papers [15,16,17,18,19,20,21,22] are fast algorithms aimed at speeding up the HEVC intra-prediction encoder through early termination of CU partition. Other algorithms optimize HEVC in different ways; for example, Yan et al. [23] proposed a parallel framework to decide coding unit trees on many-core processors, and Yan et al. [24] used a directed acyclic graph to parallelize CTUs to optimize HEVC intra-prediction.
A significant number of studies have focused on decreasing HEVC computational complexity by optimizing the intra-mode decision process. For example, Liu et al. [25] presented a fast mode decision algorithm that filters out unnecessary prediction units based on texture complexity and direction for HEVC intra-prediction, with texture complexity defined by the difference between the current pixel and its surrounding pixels and by the standard deviation of the current CU block, saving 38% of encoding time on average compared with HM 10.0. Ruiz et al. [26] presented a texture orientation detection algorithm that computes the dominant gradient and removes unnecessary directional candidate modes from the RDO process, saving 30.1% of encoding time on average compared with HM 14.0. Jiang et al. [27] calculated gradient directions and generated a gradient-mode histogram for each CU; the distribution of the histogram leaves only a small number of candidate modes for the RMD and RDO processes. Zhao et al. [28] took advantage of the direction information of the neighboring blocks to reduce the candidates in the RDO process. Yan et al. [29] used early termination and pixel-based edge detection methods to reduce the number of candidates for the RDO process, saving 23.52% of encoding time on average compared with HM 7.0. Silva et al. [30] proposed an algorithm that takes into account edge direction information and explores the correlation of intra-modes across levels of the HEVC hierarchical tree structure. Chen et al. [31] proposed a candidate mode selection algorithm that adds kernel density estimation to the histogram calculation. Chen et al. [32] proposed a fast mode depth decision algorithm based on edge detection and reconfiguration to decrease the computational complexity of intra-prediction. In general, these papers [30,31,32] all use edge detection to speed up the intra-prediction mode decision. Motra et al. [33] reduced the 35 modes to 17 using the direction information of the co-located neighboring blocks of the previous frame together with the neighboring blocks of the current frame to speed up the intra-mode decision process, saving 23% of encoding time compared with HM 6.0.
In addition to these fast intra-mode algorithms, several algorithms combine early CU pruning with intra-mode decision. Shen et al. [34] skipped a number of specific depth levels rarely used in spatially nearby CUs and exploited the RD cost and prediction mode correlations that exist among different depth levels and spatially nearby CUs. Their algorithm achieved an average time saving of 21.1% with a 1.74% BDBR increase compared with HM 5.0. Liao et al. [35] combined CU depth information with the order of the most possible modes (MPMs), skipping some unlikely modes to save 31% of encoder time compared with HM16.0. Tian et al. [36] used the cost of two candidate modes and the texture consistency of the neighboring and current PUs to reduce the modes in the RMD and RDO processes, achieving an average time reduction of 30% compared with HM16.0. Although [33, 35, 36] also decreased the number of modes included in the RMD process, the methods they adopted differ from ours. Unlike previous studies, the current study performs RMD in two stages to decrease mode selection complexity and uses the CU coding bits as the partition criterion to decide whether a CU needs to be subdivided further into smaller CUs. Overall, our algorithm achieves better compression efficiency and is easier to implement.
Proposed method and experimental setup
This study aims to optimize HEVC by speeding up encoding while maintaining nearly equal encoding quality. We speed up intra-prediction in two phases: the first phase is an adaptive mode decision algorithm, and the second phase is a fast CU early termination method. The mode decision algorithm performs the RMD process in two stages, namely (1) finding the most promising modes in the first stage and (2) adding the modes adjacent to the most promising ones in the second stage, with the two or three modes selected over the two stages being passed into the RDO process.
The test material comprises 7988 frames from all sequences (classes A to E) at different resolutions. We first analyzed the statistical data we collected, then derived appropriate thresholds statistically, and finally proposed the fast algorithm and ran the experiments. All frames were coded under the “All Intra-Main” test condition [37], with QP values set to 22, 27, 32, and 37. The test computer is a GreatWall machine running Windows 10 on an Intel Core i5 CPU at 3.2 GHz with 4 GB of RAM. Coding performance is measured in terms of Bjontegaard delta metrics (BD-rate, BD-PSNR) and time reduction rate.
The remainder of this paper is organized as follows: Section 3 describes the whole proposed algorithm in detail. Section 4 illustrates the experiment results and discussion. Finally, the conclusions are presented in Section 5.
Proposed algorithm
This paper presents the adaptive mode decision and CU partition early termination algorithm in two parts. First, the RMD process is divided into two stages according to the statistical results; second, CU partition is terminated early using thresholds based on the total coding bits of the CU. Figure 4 shows the procedure for LCU coding in HM intra-prediction, and Fig. 5 illustrates the architecture of the proposed algorithm; comparing the two figures shows where the proposed algorithm departs from HM.
RMD operation
In the current HEVC test model, two steps are used to decide on the best intra-coding mode [38]. First, a subset of all intra-prediction modes is obtained by calculating the SATD in the RMD process. This first phase is adopted to relieve the burden on the encoder in intra-coding. As shown in Table 1, the number (N) of most promising modes in the subset is predetermined to be 8 for 4 × 4 and 8 × 8 PUs and 3 for 16 × 16, 32 × 32, and 64 × 64 PUs, using the following RMD evaluation cost equation:
J_{RMD} = SATD + λ_{pred} · B_{pred}   (1)
where SATD is the sum of the absolute values of the Hadamard-transformed residual, B_{pred} is the number of bits needed to code the prediction mode information, and λ_{pred} is the Lagrangian constant, whose value varies with the quantization parameter. The most possible modes (MPMs) derived from neighboring blocks are added to the subset. Second, in the best mode decision phase, the RD cost of each mode in the subset is computed to find the best mode, using the following formula:
J_{mode} = SSE + λ_{mode} · B_{mode}   (2)
where SSE is the sum of squared errors between the original input image block and the predicted block, B_{mode} is the number of bits needed for coding the current CU with the corresponding mode, and λ_{mode} is the Lagrange multiplier. The mode with the least RD cost is taken as the best mode and used to find the optimal residual quadtree (RQT) structure. SATD and RDO are the key operations for selecting the best mode [39]. The SATD operation aims in part to reduce RDO complexity. Nevertheless, the computational complexity remains very high because all possible combinations of the mode candidates are evaluated to determine the optimal RD cost using the Lagrange multiplier. To decrease HEVC complexity and speed up the search for the best intra-mode, we collected a considerable amount of data and observed the relationship between the most promising mode and the first N modes in the candidate list yielded by the RMD process. In our algorithm, we test 11 instead of 35 modes in the RMD process, the 11 modes being DC, PLANAR, and nine directional modes selected at equal intervals. These directional modes are marked in red in Fig. 6.
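The two costs described above can be sketched in Python for a single 4 × 4 block. This is a minimal illustration rather than the HM implementation: the function names are our own, the Hadamard transform is left unnormalized (HM applies its own scaling), and the λ and bit values in the example are arbitrary placeholders.

```python
# 4x4 Hadamard matrix (symmetric, so H * X * H equals H * X * H^T).
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]


def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4(original, predicted):
    """Sum of absolute Hadamard-transformed differences (unnormalized)."""
    residual = [[o - p for o, p in zip(row_o, row_p)]
                for row_o, row_p in zip(original, predicted)]
    transformed = matmul4(matmul4(H4, residual), H4)
    return sum(abs(c) for row in transformed for c in row)


def sse(original, predicted):
    """Sum of squared errors between two blocks."""
    return sum((o - p) ** 2
               for row_o, row_p in zip(original, predicted)
               for o, p in zip(row_o, row_p))


def rmd_cost(satd, lambda_pred, bits_pred):
    """Rough mode decision cost: SATD plus weighted mode-signalling bits."""
    return satd + lambda_pred * bits_pred


def rd_cost(sse_value, lambda_mode, bits_mode):
    """Full RD cost: SSE plus weighted bits for coding the CU."""
    return sse_value + lambda_mode * bits_mode


# Placeholder example: a flat block predicted with a constant offset of 2.
original = [[10] * 4 for _ in range(4)]
predicted = [[8] * 4 for _ in range(4)]
satd = satd4(original, predicted)
j_rmd = rmd_cost(satd, lambda_pred=2.0, bits_pred=3)
j_mode = rd_cost(sse(original, predicted), lambda_mode=4.0, bits_mode=20)
print(satd, j_rmd, j_mode)
```

The RMD cost needs only a transform of the prediction residual and an estimate of the mode bits, whereas the full RD cost requires actually coding the CU to obtain its bit count, which is why shrinking the RDO candidate list saves the most time.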
Prepared work
We analyzed five typical sequences with different motion and texture information and collected the best CU modes, as well as the RMD evaluation costs of all 35 modes at each depth. In the candidate list, modes are ranked from the lowest RMD evaluation cost to the highest, and the best CU mode is nearly always one of the modes at the top of the list. The statistics collected from the five sequences in HM are shown in Table 2; the best CU mode is almost always the first mode (FM) or the second mode (SM) in the candidate list, especially when both the first and second candidate modes are DC or PLANAR.
To further illustrate the relationship between the two or three modes in the candidate list from the RMD process and the optimal mode in the RDO process, we consider PUs of different sizes separately. For 64 × 64, 32 × 32, and 16 × 16 PUs, the results show that if the FM is DC, PLANAR, or the vertical directional mode (VDM), one of the three modes selected by the RMD process is the final best mode. If the first candidate mode is another directional mode, the best mode selected by the RDO process is the FM or one of the FM’s neighboring directional modes. For 8 × 8 and 4 × 4 PUs, if the FM is DC or PLANAR, the probability of the FM being selected as the best mode is almost 95%. If the first candidate mode is a directional mode, the best mode selected by the RDO process is most likely the FM or one of the neighboring directional modes of the FM. If the FM is DC or PLANAR and the SM is a directional mode, the best mode selected by the RDO process will most likely be the FM, the SM, or one of the neighboring directional modes of the SM. To make the scheme clearer, we counted the CUs that matched the aforementioned rules at every depth in the first frame of each of the five sequences. The CU matching ratio at every depth is displayed in Fig. 7, with the matching ratio α calculated as follows:
α = (number of CUs matching the rules at a given depth) / (total number of CUs at that depth) × 100%   (3)
Figure 7 shows that the best CU modes match the aforementioned rules at high ratios, with all of the ratios above 75% and fluctuating only minimally across different sequences. The texture of some PUs in natural video images changes slowly and smoothly, and DC and PLANAR modes are effective for intra-prediction in such homogeneous areas. In other areas, the texture of PUs changes in a particular direction; the rough angle can be found in the RMD process, and the optimal direction is decided by the RDO process. These observations led to the proposal of an adaptive mode decision algorithm based on statistical data and analysis of image texture variation characteristics.
Specific implementation of intramode decision algorithm
We divided the RMD process into two steps, following the image texture characteristics and statistical data. The implementation corresponds to the part of the flowchart above the red broken line in Fig. 5 and is detailed as follows.
First, the 33 directional modes are reduced to 9 equally spaced modes, marked in red in Fig. 6. These nine modes plus DC and PLANAR, 11 modes in total, are tested in the RMD process to find the mode with the least J_{RMD}, namely the FM. Second, we check whether the FM is DC or PLANAR. If it is, the best mode of this PU is likely to be DC or PLANAR. If the FM is a directional mode, the best mode of this PU is most likely the FM or a mode adjacent to it. We then add the modes adjacent to the FM to the RMD process, forming a new candidate list from which we select two or three modes for the RDO process to obtain the optimal mode. Depending on the PU size, we give two detailed solutions that find the optimal mode faster than HEVC.
Solution 1 is applied to PUs sized 64 × 64, 32 × 32, and 16 × 16 after the FM is obtained in the first RMD stage. If the FM is DC, PLANAR, or the VDM, we keep the first three modes in the candidate list, and these three modes together with the MPMs are tested in the RDO process. If the FM is any other directional mode, we add the four modes adjacent to the FM, namely FM − 2, FM − 1, FM + 1, and FM + 2, and test them in a second RMD stage. The modes in the candidate list are then rearranged in ascending order of J_{RMD}. Lastly, we choose the two modes at the top of the candidate list for the full RDO process. Hence, only two modes instead of three are tested in the RDO process.
Solution 2 is applied to PUs sized 8 × 8 and 4 × 4. In HEVC, the eight modes selected by the RMD process, together with the MPMs, are passed to the RDO process. In our algorithm, we check whether the FM and the SM are both DC or PLANAR. If they are, we select these two modes instead of eight for the RDO process. If the FM is DC or PLANAR and the SM is a directional mode, we add the four modes adjacent to the SM, namely SM − 2, SM − 1, SM + 1, and SM + 2, test them in a second RMD stage, and then choose the two modes at the top of the candidate list arranged in ascending order of J_{RMD}. If the FM is a directional mode, we add the four modes adjacent to the FM, namely FM − 2, FM − 1, FM + 1, and FM + 2, test them in a second RMD stage, and again choose the two modes at the top of the candidate list arranged in ascending order of J_{RMD}.
Note that when the FM or SM is mode 2 or mode 34, it has only two neighboring modes rather than four.
Table 3 shows how the modes tested in the RMD and RDO processes of the proposed algorithm differ from those in HM.
Combining the two solutions saves significant encoding time over HEVC while the encoding quality remains nearly the same. The pseudocode is given below and corresponds to the part of the flowchart above the red broken line in Fig. 5.
Intramode decision algorithm pseudocode 
First RMD stage:
    Put DC, PLANAR, and the 9 directional modes labeled in Fig. 6 into the first RMD stage to get the candidate list
Second RMD stage:
    Solution 1:
    if (PU size == 64 × 64 || 32 × 32 || 16 × 16)
        if (FM == DC || PLANAR || VDM)
            Maintain first 3 modes in candidate list into RDO
        else
            Select FM − 2, FM − 1, FM + 1, and FM + 2 modes into second RMD stage
            Get reconstructed candidate list
            Retain first 2 modes in candidate list into RDO
    Solution 2:
    else
        if ((FM == DC && SM == PLANAR) || (FM == PLANAR && SM == DC))
            Maintain first 2 modes in candidate list into RDO
        else if (FM == DM)
            Select FM − 2, FM − 1, FM + 1, and FM + 2 modes into second RMD stage
            Get reconstructed candidate list
            Retain first 2 modes in candidate list into RDO
        else
            Select SM − 2, SM − 1, SM + 1, and SM + 2 modes into second RMD stage
            Get reconstructed candidate list
            Retain first 2 modes in candidate list into RDO
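The pseudocode above can be sketched as runnable Python. Several details are our assumptions rather than statements from the paper: the nine first-stage directional modes are taken to be 2, 6, ..., 34 (equal spacing of 4, consistent with the note about modes 2 and 34), the VDM is taken to be HEVC mode 26, the "reconstructed candidate list" is modeled as the first-stage modes plus the newly tested neighbors, and `cost` stands in for any function returning J_{RMD} for a mode. MPM handling is omitted.

```python
PLANAR, DC = 0, 1
NON_DIRECTIONAL = (PLANAR, DC)
VERTICAL = 26  # assumed mode number of the vertical directional mode (VDM)
# Assumed first-stage set: PLANAR, DC, and directional modes 2, 6, ..., 34.
FIRST_STAGE_MODES = [PLANAR, DC] + list(range(2, 35, 4))


def neighbours(mode):
    """Directional modes within +/-2 of `mode`, clipped to the range 2..34."""
    return [m for m in (mode - 2, mode - 1, mode + 1, mode + 2)
            if 2 <= m <= 34]


def rank(modes, cost):
    """Sort modes by ascending RMD cost (ties broken by mode number)."""
    return sorted(modes, key=lambda m: (cost(m), m))


def select_rdo_candidates(pu_size, cost):
    """Return the modes passed on to full RDO for one PU (MPMs excluded)."""
    ranked = rank(FIRST_STAGE_MODES, cost)          # first RMD stage
    fm, sm = ranked[0], ranked[1]
    if pu_size >= 16:                               # Solution 1
        if fm in NON_DIRECTIONAL or fm == VERTICAL:
            return ranked[:3]
        centre = fm
    else:                                           # Solution 2 (8x8, 4x4)
        if fm in NON_DIRECTIONAL and sm in NON_DIRECTIONAL:
            return ranked[:2]
        centre = fm if fm not in NON_DIRECTIONAL else sm
    # Second RMD stage: test the neighbours of the chosen directional mode.
    second_stage = set(FIRST_STAGE_MODES) | set(neighbours(centre))
    return rank(second_stage, cost)[:2]


def toy_cost(mode):
    """Hypothetical cost whose true optimum is directional mode 12."""
    return 50 if mode in NON_DIRECTIONAL else abs(mode - 12)


print(select_rdo_candidates(32, toy_cost))
```

With this toy cost, the first stage cannot see mode 12 directly because it is not in the sparse set, but its neighbor 10 ranks first and the second stage then recovers 12, which is the behavior the two-stage design relies on.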
Fast CU partition early termination algorithm
To save more time, the CU partition rule is studied further. As shown in Fig. 3, every picture is divided into CUs of different sizes, and the best combinations follow some rules. CUs that cover homogeneous areas and contain less complex information require fewer coding bits, whereas CUs that cover heterogeneous areas and contain more dynamic information require more coding bits, as shown in Fig. 8a. The first LCU in Fig. 8a is kept at 64 × 64, meaning that its best depth is 0, and its coding bit count is 95. The second LCU is divided into a combination of CUs of different sizes, with a largest depth of 3. The coding bits of every CU are shown in Fig. 8b. The first LCU is clearly a homogeneous area with less dynamic information that requires fewer coding bits. The second LCU is a heterogeneous area that contains more dynamic information and requires more coding bits. A heterogeneous LCU with more coding bits is usually divided into sub-CUs, and heterogeneous sub-CUs with more coding bits are subdivided further into smaller sub-CUs until the optimal combination is achieved. This optimal combination consumes the fewest resources and requires the least data transfer. Every CU requires a different number of coding bits, and because we have already simplified the search for the best intra-prediction mode, the CU coding bits are obtained sooner.
If CUs at every level are evaluated in the search for the best combination, a considerable amount of time is wasted. Hence, we propose a CU early pruning algorithm that uses the coding bits, which are available early, to speed up the search for the best CU combination. HEVC uses essentially the same uniform reconstruction quantization scheme controlled by a quantization parameter (QP) as H.264/MPEG-4 AVC. QP values range from 0 to 51, and an increase of 6 doubles the quantization step size, so that the mapping of QP values to step sizes is approximately logarithmic. The QP value affects the total number of coding bits a CU requires: the larger the QP, the fewer the coding bits. The QP values used in this paper to collect statistical data and test all sequences are 22, 27, 32, and 37, as recommended in [37]. We counted the coding bits required by CUs sized 64 × 64, 32 × 32, and 16 × 16 under the different QPs and derived thresholds through comparative analysis. By setting a coding bit threshold, we can obtain the best CU combinations more quickly than in HEVC.
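The relationship between QP and step size can be made concrete with a short sketch. The text only states that the step size doubles for every increase of 6 in QP; anchoring the curve so that QP 4 maps to a step size of 1 is the commonly quoted approximation for H.264 and HEVC and is treated here as an assumption, as is the function name `quant_step`.

```python
def quant_step(qp):
    """Approximate quantization step size for a QP in the range 0..51.

    Assumes Qstep = 2 ** ((QP - 4) / 6), so the step doubles every 6 QP.
    """
    if not 0 <= qp <= 51:
        raise ValueError("QP must lie in the range 0..51")
    return 2 ** ((qp - 4) / 6)


# The four QP values used in the experiments.
for qp in (22, 27, 32, 37):
    print(f"QP {qp}: step size ~ {quant_step(qp):.2f}")
```

Because a coarser step discards more of the residual, the coding bits of a CU shrink as QP grows, which is why the bit thresholds have to be set separately for each QP.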
Part (b) of Fig. 8 shows the coding bits of part (a) of Fig. 8, where the LCU in the yellow box shows the CUs of every level with sizes 64 × 64, 32 × 32, and 16 × 16 from top to bottom, with the number of coding bits shown. All demarcated CUs are divided into four subCUs. The CUs outside the yellow mark are the best CU partitions.
We analyzed the coding bits required by CUs of different sizes in different sequences to obtain accurate thresholds that give higher compression efficiency and lower quality loss. The thresholds for CUs sized 64 × 64, 32 × 32, and 16 × 16 under different QPs are shown in Table 4. We analyzed the accuracy of each threshold and report the results in Fig. 9. Errors arise when the coding bits of a CU are smaller than the threshold we set: our algorithm does not divide such a CU into four next-level CUs, whereas HEVC does, as shown in Fig. 8 (32 × 32 CU with 139 coding bits).
Once the number of coding bits a CU requires under its best intra-mode is known, we check whether it is smaller than the threshold derived from our statistical data. If the number of coding bits is smaller than the threshold shown in Table 4, we end the CU partition early. Otherwise, the CU continues to be divided into four sub-CUs, as shown in the detailed implementation process below the red broken line in Fig. 5. The HEVC encoder thus achieves higher encoding efficiency by judging the number of CU coding bits in advance.
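The threshold test described above can be sketched as a recursive partition routine. The threshold values below are placeholders, not the values of Table 4, and `coding_bits` is a hypothetical callback that returns the bits of a CU under its best intra-mode. The sketch models only the early-termination decision; it omits the RD cost comparison between a CU and its four sub-CUs that the encoder performs when a split is actually evaluated.

```python
# Placeholder thresholds keyed by (cu_size, qp); NOT the values of Table 4.
HYPOTHETICAL_THRESHOLDS = {(64, 32): 100, (32, 32): 60, (16, 32): 30}


def partition(cu_size, qp, coding_bits, thresholds, x=0, y=0, min_size=8):
    """Return the leaf CUs as (x, y, size) tuples for one LCU."""
    bits = coding_bits(x, y, cu_size)
    threshold = thresholds.get((cu_size, qp))
    # Stop at the smallest CU, or early when the bits fall below the threshold.
    if cu_size == min_size or (threshold is not None and bits < threshold):
        return [(x, y, cu_size)]
    half = cu_size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += partition(half, qp, coding_bits, thresholds,
                                x + dx, y + dy, min_size)
    return leaves


def toy_bits(x, y, size):
    """Hypothetical content: left half of the LCU is busy, right half flat."""
    return 10 if x >= 32 else 500


leaves = partition(64, 32, toy_bits, HYPOTHETICAL_THRESHOLDS)
print(len(leaves), "leaf CUs")
```

In this toy case the two flat 32 × 32 CUs on the right are kept whole after a single bit count, while the busy left half is split down to 8 × 8, so the savings come entirely from the homogeneous regions.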
Results and discussion
We implemented our proposed fast intra-mode decision and CU partition early termination algorithm on HM16.12 to evaluate its effectiveness. Time reduction is calculated by the following equation:
∆T = (T_{HM16.12} − T_{proposed}) / T_{HM16.12} × 100%   (4)
where T_{HM16.12} is the coding time of HM16.12, T_{proposed} is the coding time of HM16.12 using the proposed algorithm, and ∆T is the time reduction. The decrease in PSNR-Y is calculated using Eq. (5):
A total of 7988 frames from all sequences (classes A to E) at different resolutions were tested to evaluate the proposed algorithm. Table 5 shows that on average [40], the proposed algorithm achieved a 53% time reduction, a 1.7% BD-rate increase, and a 0.08 dB BD-PSNR decrease. Table 5 also lists the experimental results of [35, 36] under the same test condition: [35] achieved a 31% time reduction on average with a 0.7% BD-rate increase compared with HM 16.0, and [36] achieved a 30% time reduction on average with a 1.8% BD-rate increase compared with HM 16.0. The comparison confirms that the proposed algorithm is efficient. Different sequences obtain different time-saving percentages mainly because their frames differ in detail and complexity. Table 5 shows that the proposed algorithm reduces the coding time of “Kimono” by up to 68%, and Fig. 10 shows that the RD curves of “Kimono” are almost the same as those of the original encoder. Our algorithm performs very well on sequences in classes A to D, especially high-resolution sequences. Both the time reduction and the quality are very good, making the algorithm applicable to video compression that requires real-time, high-resolution processing. However, the BD-rate of the three sequences in class E is not as good as that of the others. We also explored why the sequences PartyScene, BlowingBubbles, and BQSquare have time reduction rates below 40%. Compared with sequences with higher time reduction, the CUs of these sequences are more likely to be divided into smaller CUs, meaning that their depths are larger than those of other sequences; hence, the threshold set to end the CU partition early plays a smaller role in the compression process. Overall, the proposed algorithm significantly reduced coding time while maintaining the average RD performance at nearly the same level.
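The time reduction reported above is the relative saving in encoding time with respect to the reference encoder. A minimal sketch, with a function name of our own and hypothetical timings chosen only to illustrate the arithmetic:

```python
def time_reduction(t_reference, t_proposed):
    """Relative encoding-time saving in percent, reference encoder as base."""
    if t_reference <= 0:
        raise ValueError("reference time must be positive")
    return (t_reference - t_proposed) / t_reference * 100.0


# Hypothetical timings in seconds, not measurements from the paper.
t_hm = 100.0
t_fast = 47.0
print(f"time reduction: {time_reduction(t_hm, t_fast):.1f}%")
```

A reduction of 53% therefore means that the modified encoder needs a little under half the time of the reference encoder for the same sequence.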
Conclusions
This paper presents an adaptive mode decision and CU partition early termination algorithm for alleviating the computational complexity of the HEVC intra-encoder. The proposed algorithm is implemented on the HEVC reference software HM16.12. The adaptive mode decision and CU partition early termination algorithms reduce encoder time in different HEVC processes. The experimental results show that by implementing the RMD process as a two-stage process, decreasing the number of promising modes in the RDO process, and using the coding bits generated during the HEVC coding process, our proposed algorithm improves intra-coding efficiency. Our algorithm reduced total coding time by 53% while maintaining coding performance at nearly the same level as that of the original HEVC encoder.
Abbreviations
 BD-rate:

Bjontegaard delta rate
 CABAC:

Context-adaptive binary arithmetic coding
 CTU:

Coding tree unit
 CU:

Coding unit
 FM:

First mode
 HEVC:

High Efficiency Video Coding
 HM:

HEVC test model
 JCT-VC:

Joint Collaborative Team on Video Coding
 LCU:

Largest coding unit
 MB:

Macroblock
 MPEG:

Moving Picture Exports Group
 MPMs:

Most possible modes
 PSNR:

Peak to signal noise ratio
 PU:

Prediction unit
 QP:

Quantization parameters
 RD:

Rate distortion
 RDO:

Ratedistortion operation
 RMD:

Rough Mode Decision
 RQT:

Residual quadtree
 SAO:

Sample adaptive offsets
 SATD:

Sum of the absolute transformed difference
 SM:

Second mode
 SVM:

Support vector machine
 TU:

Transform unit
 VCEG:

Video Coding Experts Group
 VDM:

Vertical directional mode
References
1. GJ Sullivan, JR Ohm, WJ Han, T Wiegand, Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1649–1668 (2012)
2. T Wiegand, GJ Sullivan, The H.264/AVC video coding standard. IEEE Signal Process. Mag. 24(2), 148–153 (2007)
3. J Lainema, F Bossen, WJ Han, J Min, K Ugur, Intra coding of the HEVC standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1792–1801 (2012)
4. GJ Sullivan, JR Ohm, HEVC software guidelines (Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, document JCTVC-H1001, 8th Meeting, San José, CA, USA) (2012)
5. L Shen, Z Zhang, P An, Fast CU size decision and mode decision algorithm for HEVC intra coding. IEEE Trans. Consum. Electron. 59(1), 207–213 (2013)
6. MKJ Kim, JCTVC-C067 TE9: report on large block structure testing (in Proceedings of the Third Meeting of the Joint Collaborative Team on Video Coding (JCT-VC), Guangzhou, China) (2010), pp. 2–4
7. C Yan, H Xie, D Yang, J Yin, Y Zhang, Q Dai, Supervised hash coding with deep neural network for environment perception of intelligent vehicles. IEEE Trans. Intell. Transp. Syst. PP(99), 1–12 (2017)
8. H Bai, C Zhu, Y Zhao, Optimized multiple description lattice vector quantization for wavelet image coding. IEEE Trans. Circuits Syst. Video Technol. 17(7), 912–917 (2007)
9. C Yan, H Xie, S Liu, J Yin, Y Zhang, Q Dai, Effective Uyghur language text detection in complex background images for traffic prompt identification. IEEE Trans. Intell. Transp. Syst. PP(99), 1–10 (2017)
10. H Bai, W Lin, M Zhang, A Wang, Y Zhao, Multiple description video coding based on human visual system characteristics. IEEE Trans. Circuits Syst. Video Technol. 24(8), 1390–1394 (2014)
11. YJ Ahn, TJ Hwang, DG Sim, WJ Han, Implementation of fast HEVC encoder based on SIMD and data-level parallelism. EURASIP J. Image Video Process. 2014, 16 (2014)
12. KY Min, W Lim, J Nam, D Sim, IV Bajić, Distributed video coding supporting hierarchical GOP structures with transmitted motion vectors. EURASIP J. Image Video Process. 2015, 12 (2015)
13. C Yan, Y Zhang, J Xu, F Dai, J Zhang, Q Dai, F Wu, Efficient parallel framework for HEVC motion estimation on many-core processors. IEEE Trans. Circuits Syst. Video Technol. 24(12), 2077–2089 (2014)
14. C Yan, Y Zhang, F Dai, X Wang, L Li, Q Dai, Parallel deblocking filter for HEVC on many-core processor. Electron. Lett. 50(5), 367–368 (2014)
15. Z Peng, H Han, F Chen, G Jiang, M Yu, Joint processing and fast encoding algorithm for multiview depth video. EURASIP J. Image Video Process. 2016, 24 (2016)
16. T Nishikori, T Nakamura, T Yoshitome, K Mishiba, A fast CU decision using image variance in HEVC intra coding (IEEE Symposium on Industrial Electronics & Applications (ISIEA)) (2013), pp. 52–56
17. C Bai, C Yuan, Fast coding tree unit decision for HEVC intra coding (IEEE International Conference on Consumer Electronics, China) (2013), pp. 28–31
18. SG Blasi, I Zupancic, E Izquierdo, E Peixoto, Fast HEVC coding using reverse CU visiting (Picture Coding Symposium (PCS)) (2015), pp. 50–54
19. F Belghith, H Kibeya, MAB Ayed, N Masmoudi, Fast coding unit partitioning method based on edge detection for HEVC intra-coding. SIViP 10(5), 811–818 (2016)
20. K Goswami, BG Kim, D Jun, SH Jung, SC Jin, Early coding unit-splitting termination algorithm for high efficiency video coding (HEVC). ETRI J. 36(3), 407–417 (2014)
21. X Shen, L Yu, CU splitting early termination based on weighted SVM. EURASIP J. Image Video Process. 2013, 4 (2013)
22. YF Cen, WL Wang, XW Yao, A fast CU depth decision mechanism for HEVC. Inf. Process. Lett. 115(9), 719–724 (2015)
23. C Yan, Y Zhang, J Xu, F Dai, L Li, J Zhang, Q Dai, F Wu, A highly parallel framework for HEVC coding unit partitioning tree decision on many-core processors. IEEE Signal Process. Lett. 21(5), 573–576 (2014)
24. C Yan, Y Zhang, F Dai, J Zhang, L Li, Q Dai, Efficient parallel HEVC intra prediction on many-core processor. Electron. Lett. 50(11), 805–806 (2014)
25. Y Liu, X Liu, P Wang, A texture complexity based fast prediction unit size selection algorithm for HEVC intra-coding (IEEE 17th International Conference on Computational Science and Engineering) (2014), pp. 1585–1588
26. D Ruiz, G Fernández-Escribano, JL Martínez, P Cuenca, Fast intra mode decision algorithm based on texture orientation detection in HEVC. Signal Process. Image Commun. 44(C), 12–28 (2016)
27. W Jiang, H Ma, Y Chen, Gradient based fast mode decision algorithm for intra prediction in HEVC (IEEE 2nd International Conference on Consumer Electronics, Communications and Networks (CECNet)) (2012), pp. 1836–1840
28. L Zhao, L Zhang, S Ma, D Zhao, Fast mode decision algorithm for intra prediction in HEVC (Visual Communications and Image Processing (VCIP)) (2011), pp. 1–4
29. S Yan, L Hong, W He, Q Wang, Group-based fast mode decision algorithm for intra prediction in HEVC (Eighth International Conference on Signal Image Technology and Internet Based Systems) (2012), pp. 225–229
30. TLD Silva, LADS Cruz, LV Agostini, HEVC intra mode decision acceleration based on tree depth levels relationship (IEEE Picture Coding Symposium (PCS)) (2013), pp. 277–280
31. G Chen, Z Liu, T Ikenaga, D Wang, Fast HEVC intra mode decision using matching edge detector and kernel density estimation alike histogram generation (IEEE International Symposium on Circuits and Systems (ISCAS)) (2013), pp. 53–56
32. G Chen, L Sun, Z Liu, T Ikenaga, Fast mode and depth decision HEVC intra prediction based on edge detection and partitioning reconfiguration (International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)) (2013), pp. 38–41
33. AS Motra, A Gupta, M Shukla, P Bansal, V Bansal, Fast intra mode decision for HEVC video encoder (SoftCOM 20th International Conference on Software, Telecommunications and Computer Networks) (2012), pp. 1–5
34. L Shen, Z Zhang, P An, Fast CU size decision and mode decision algorithm for HEVC intra coding. IEEE Trans. Consum. Electron. 59(1), 207–213 (2013)
35. W Liao, D Yang, Z Chen, A fast mode decision algorithm for HEVC intra prediction (IEEE Visual Communications and Image Processing (VCIP)) (2016), pp. 1–4
36. R Tian, Y Zhang, R Fan, G Wang, Adaptive fast mode decision for HEVC intra coding (Digital Image Computing Techniques and Applications (DICTA)) (2016), pp. 1–6
37. F Bossen, Common test conditions and software reference configurations (Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Doc. JCTVC-B300, 2nd Meeting, Geneva, CH, 21–28 July) (2010), pp. 14–23
38. M Zhang, J Qu, H Bai, Entropy-based fast largest coding unit partition algorithm in high-efficiency video coding. Entropy 15(6), 2277–2287 (2013)
39. M Zhang, C Zhao, JZ Xu, An adaptive fast intra mode decision in HEVC (IEEE International Conference on Image Processing (ICIP)) (2012), pp. 221–224
40. G Bjontegaard, Calculation of average PSNR differences between RD-curves (doc. VCEG-M33, in ITU-T VCEG 13th Meeting, Austin) (2001), pp. 2–4
Acknowledgements
Not applicable.
Availability of data and materials
The conclusion and comparison data of this article are included within the article.
Funding
This work is supported by the National Natural Science Foundation of China (no. 61370111), Beijing Municipal Natural Science Foundation (no. 4172020), Beijing Nova Programme (Z141101001814032), Beijing Youth Talent Project (CIT&TCD 201504001), and Beijing Municipal Education Commission General Program (KM201610009003).
Author information
Contributions
MZ proposed the framework of this work, and XZ carried out all the experiments and drafted the manuscript. ZL offered useful suggestions and helped to revise the manuscript. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Mengmeng Zhang.
Ethics declarations
Authors’ information
MMZ: Doctor of Engineering, Professor, Master Instructor, Master of Communication and Information Systems. His major research interests include video codecs, embedded systems, image processing, and pattern recognition. He has authored or coauthored more than 30 refereed technical papers in international journals and conferences in the fields of video coding, image processing, and pattern recognition. He holds 16 national patents and has published 2 monographs in the areas of image/video coding and communications.
XJZ: Master's student at North China University of Technology. Her main research interest is HEVC.
ZL: Doctor of Engineering, Master Instructor. His major research interests include video codecs, pattern recognition, and self-organizing networks.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Keywords
 HEVC
 CU partition
 Coding bits
 Mode
 Fast algorithm