 Research
 Open access
Hierarchical complexity control algorithm for HEVC based on coding unit depth decision
EURASIP Journal on Image and Video Processing volume 2018, Article number: 96 (2018)
Abstract
The next-generation High Efficiency Video Coding (HEVC) standard reduces the bit rate by 44% on average compared to the previous-generation H.264 standard, but at the cost of higher encoding complexity. To achieve normal video coding in power-constrained devices and minimize the rate distortion degradation, this paper proposes a hierarchical complexity control algorithm for HEVC on the basis of the coding unit depth decision. First, according to the target complexity and the constantly updated reference time, the coding complexity of the group of pictures layer and the frame layer is allocated and controlled. Second, the maximal depth is adaptively assigned to the coding tree unit (CTU) on the basis of the correlation between the residual information and the optimal depth by establishing the complexity-depth model. Then, the coding unit smoothness decision and adaptive lower bit threshold decision are proposed to constrain the unnecessary traversal process within the maximal depth assigned to the CTU. Finally, the adaptive upper bit threshold decision is used to continue the necessary traversal process at a larger depth than the allocated maximal depth to guarantee the quality of important coding units. Experimental results show that our algorithm can reduce the encoding time by up to 50%, with notable control precision and limited performance degradation. Compared to state-of-the-art algorithms, the proposed algorithm can achieve higher control accuracy.
1 Introduction
With the development of capture and display technologies, high-definition video is being widely adopted in many fields, such as television, movies, and education. To efficiently store and transmit large amounts of high-definition video data, the Joint Collaborative Team on Video Coding (JCT-VC), consisting of ISO/IEC MPEG and ITU-T VCEG, proposed High Efficiency Video Coding (HEVC) [1] as the next-generation international video coding standard in 2013. Compared with the previous-generation video coding standard H.264/AVC [2], HEVC uses new technologies, such as the quadtree encoding structure [3], which reduce the average bit rate by 44% while providing the same objective quality [4]. However, the computational complexity of HEVC is high [5], and HEVC cannot be implemented on all devices, especially mobile multimedia devices with limited power capacity. To reduce the computational complexity of HEVC, many algorithms have been proposed to speed up the motion estimation [6], mode decision [7], and coding unit (CU) splitting [8]. However, the speedup is obtained at the cost of degraded rate distortion performance. In addition, the computational complexity reduction of these algorithms is not consistent across different video sequences. Hence, it is important to control the coding complexity for different multimedia devices and sequences.
The research goals of HEVC complexity control are high control accuracy and good rate distortion performance. High control accuracy can minimize the loss of rate distortion performance. Many researchers have devoted considerable efforts toward achieving these goals. Correa et al. used the spatial-temporal correlation of the coding tree unit (CTU) to limit the maximal depth of restricted CTUs. Thus, they reduced the coding complexity by 50% while incurring a small degradation in the rate distortion performance [9]. Correa et al. further limited the maximal depth of CTUs in restricted frames to achieve complexity control [10]. The above-mentioned algorithms [9, 10] do not fully consider the image characteristics when determining the restricted CTUs and frames. Correa et al. used the CTU rate distortion cost in the previous frame as the basis for determining whether the current CTU should be constrained or unconstrained, and they controlled the coding complexity by limiting the modes and the maximal depth [11]. Furthermore, by adjusting the configuration of the coding parameters, they were able to restrict the target complexity to 20% [12]. However, the relationship between the coding parameters and the complexity is obtained offline and cannot adapt to video with different features. Deng et al. employed the visual perception factor to limit the maximal depth in order to realize complexity allocation [13]. Further, they studied the relationship between the maximal depth and the complexity and limited the maximal CTU depth by combining the temporal correlation and visual weight. Their algorithm not only controls the computational complexity but also guarantees the subjective and objective quality [14]. In addition, they have proposed a complexity control algorithm for video conferencing, which is adaptable to the features of video conferencing [15].
The above-mentioned methods of complexity allocation [13,14,15] are more effective for sequences with less texture, while they result in degradation of the rate distortion performance for sequences with rich texture. Zhang et al. established a statistical model to estimate the complexity of CTU coding and restricted the CTU depth traversal range to achieve complexity control. However, their method cannot achieve accurate complexity control for videos with large scene changes [16]. Amaya et al. proposed a complexity control method based on fast CU decisions [17]. They obtained thresholds for early termination at different depths via online training. These thresholds are used to terminate the recursive CU process in advance. Their algorithm can restrict the target complexity to 60% while guaranteeing the coding performance. However, the control accuracy requires improvement, and the rate distortion performance degrades severely as the target computational complexity decreases.
To further improve the control accuracy and reduce the rate distortion performance degradation, this paper proposes a hierarchical complexity control algorithm based on the CU depth decision. First, according to the target complexity and the constantly updated reference time, the coding complexity of the group of pictures (GOP) layer and frame layer is assigned and controlled. Second, the complexity weight of the current CTU is calculated, and the maximal depth is adaptively allocated according to the encoding complexity-depth model (ECDM) and the video encoding feature. Finally, the rate distortion optimization (RDO) process is terminated early or continued on the basis of the CU smoothness decision and the adaptive upper and lower bit threshold decision. This paper has two main contributions: (1) We propose a method with a periodic updating strategy to predict the reference time. (2) We propose two kinds of adaptive complexity reduction methods, which adapt well to different video contents.
The remainder of this paper is organized as follows. Section 2 describes the quadtree structure and the rate distortion optimization process of HEVC. Section 3 provides a detailed explanation of the proposed method. Section 4 presents and discusses the experimental results. Finally, Section 5 concludes the paper.
2 Quadtree structure and rate distortion optimization process of HEVC
HEVC divides each frame into several CTUs of equal size. If the video is sampled according to the 4:2:0 sampling format, then each CTU contains a luma and two chroma coding tree blocks, which form the root of the quadtree structure. As shown in Fig. 1a, a CTU can be divided into several CUs according to the quadtree structure, which range in size from 8 × 8 to 64 × 64. The CU is the basic unit of intra or inter prediction. Each CU can be divided into 1, 2, or 4 prediction units (PUs), and each PU is a region that uses the same prediction. HEVC supports 11 candidate PU splitting modes: Merge/Skip mode, two intra modes (2N × 2N, N × N), and eight inter modes (2N × 2N, N × N, N × 2N, 2N × N, 2N × nU, 2N × nD, nL × 2N, nR × 2N). The transform unit (TU) is a shared transform and quantization square region defined by a quadtree partitioning of a leaf CU. Each PU contains luma and chroma prediction blocks (PBs) and the corresponding syntax elements. The size of a PB can range from 4 × 4 to 64 × 64. Each TU contains luma and chroma transform blocks (TBs) and the corresponding syntax elements. The size of a TB can range from 4 × 4 to 32 × 32.
The RDO process of the quadtree structure is used to determine the optimal partition of the CTU. RDO needs to traverse all depths in the order shown in Fig. 2 with all the PU splitting modes. Whether a parent CU should be divided into four sub CUs is determined by comparing the minimal rate distortion cost of the parent CU with the sum of the minimal rate distortion costs of the four sub CUs. If the minimal rate distortion cost of the parent CU is smaller, then partitioning is not performed; otherwise, it is performed.
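The parent-versus-children comparison described above can be sketched as a short recursion. This is only an illustration, not the HM implementation: `best_partition` and the `rd_cost` callback (assumed to return the minimal rate distortion cost over all PU modes for one CU) are hypothetical names introduced here.

```python
from typing import Callable, Tuple

def best_partition(x: int, y: int, size: int, depth: int, max_depth: int,
                   rd_cost: Callable[[int, int, int], float]) -> Tuple[float, object]:
    """Return (minimal RD cost, partition) for the CU at (x, y).

    The partition is either the CU size (no split) or a list of four
    sub-partitions. rd_cost is a hypothetical stand-in for the encoder's
    per-CU mode search.
    """
    parent_cost = rd_cost(x, y, size)
    if depth == max_depth:
        return parent_cost, size  # smallest allowed CU: cannot split further
    half = size // 2
    children = [best_partition(x + dx, y + dy, half, depth + 1, max_depth, rd_cost)
                for dy in (0, half) for dx in (0, half)]
    split_cost = sum(cost for cost, _ in children)
    # Keep the parent only if its cost is strictly smaller, as stated in the text.
    if parent_cost < split_cost:
        return parent_cost, size
    return split_cost, [tree for _, tree in children]
```

With a 64 × 64 CTU and `max_depth = 3`, the full traversal evaluates 1 + 4 + 16 + 64 = 85 CUs, which is the cost the complexity control tries to avoid.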
Analysis of the HEVC quadtree structure and RDO process shows that the high computational complexity of HEVC is mainly caused by the depth traversal with the various modes. Considering the limited computational power of multimedia devices, we design a complexity control algorithm that skips the unnecessary CU depth and performs early termination of the mode search according to the video coding feature.
3 Methods
This paper proposes a hierarchical complexity control algorithm based on the coding unit depth decision, as shown in Fig. 3. The proposed algorithm includes the complexity allocation and control of GOP layer and frame layer, the CTU complexity allocation (CCA), the CU smoothness decision (CSD) method, and the adaptive upper and lower bit threshold decision (ABD) method. The CCA divides the complexity weight of the CTU and allocates the maximal depth to the CTU in combination with the ECDM model. The CSD and ABD further restrict the RDO process and reduce the computational complexity.
3.1 Complexity allocation and control of GOP layer and frame layer
In the encoding process, the first GOP has only one I frame and is therefore different from the other GOPs. In the second GOP, the number of reference frames available to the first three frames is smaller than that available to the other frames. Apart from the first two GOPs, the encoding structures of the subsequent GOPs are similar, and their encoding times are nearly consistent. Moreover, every GOP except the first contains G frames, and for convenience of presentation, we refer to the mth frame (m = 1,2,3, …, G) in the jth GOP as frame (m,j). For a given k, the frames (k, j) are called corresponding frames. The encoding parameters of the corresponding frames in consecutive GOPs are consistent. Hence, their proportions of the encoding time are similar. Figure 4 shows the proportion of the encoding time in different GOPs for the BQSquare sequence. The encoding time proportion ρ(k,j) is calculated as:
where t(k,j) is the coding time of frame (k,j). Clearly, ρ(k,j) is nearly constant for a given k. For example, ρ(4,j) varies only within a small range, from 0.314 to 0.337. Inspired by this phenomenon, we can estimate the reference coding time of the entire sequence, T_{o}, by normally coding some frames. T_{o} is the predicted value of the normal coding time of the entire sequence, and normal coding means that frames are encoded without complexity control.
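As a small illustration of the proportion ρ(k,j), the sketch below assumes it is the coding time of frame (k,j) divided by the total coding time of the jth GOP. The displayed formula is not reproduced in this text, so that form, and the function name `time_proportions`, are assumptions.

```python
from typing import List

def time_proportions(gop_frame_times: List[float]) -> List[float]:
    """Assumed form: rho(k, j) = t(k, j) / sum over m of t(m, j)."""
    total = sum(gop_frame_times)
    if total <= 0:
        raise ValueError("GOP coding time must be positive")
    return [t / total for t in gop_frame_times]
```

Under this assumption, a stable ρ(k,j) means that the normal coding time of a whole GOP can be extrapolated from one normally coded frame by dividing its time by the known proportion.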
The first three GOPs are normally coded to obtain the initial T_{o}, and after the third GOP, a frame is normally coded for every four GOPs to update T_{o}, i.e.,
where f denotes the fth frame, f ∈ [0, F − 1], J is the total number of GOPs to be encoded, and t_{f} denotes the actual coding time of the fth frame. Constantly updating T_{o} causes it to further approach its true value.
After encoding the (j − 1)th GOP, the target coding time of the jth GOP \( {T}_{\mathrm{GOP}}^j \) is determined according to the remaining target time and the number of remaining frames to be coded. It is calculated as
where T_{c} is the target complexity proportion, T_{c} ⋅ T_{o} is the target coding time of entire sequence, T_{coded} is the consumed encoding time, F_{coded} is the number of coded frames, and F is the total number of frames to be encoded.
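The GOP-layer allocation can be sketched from the definitions above. The displayed formula is not reproduced in this text, so the sketch assumes the remaining target time is spread evenly over the remaining frames and scaled by the GOP size; `gop_target_time` and `gop_size` are hypothetical names.

```python
def gop_target_time(t_c: float, t_o: float, t_coded: float,
                    f_coded: int, f_total: int, gop_size: int) -> float:
    """Assumed form: (T_c * T_o - T_coded) / (F - F_coded) * G."""
    remaining_frames = f_total - f_coded
    if remaining_frames <= 0:
        raise ValueError("no frames left to encode")
    remaining_time = max(t_c * t_o - t_coded, 0.0)  # never allocate negative time
    frames_in_gop = min(gop_size, remaining_frames)  # the last GOP may be shorter
    return remaining_time / remaining_frames * frames_in_gop
```

For example, with T_c = 0.5, T_o = 100 s, 20 s already spent on 20 of 100 frames, and G = 4, the next GOP would receive (50 − 20) / 80 × 4 = 1.5 s under this assumption.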
In the case of complexity control algorithms, the rate distortion performance of video sequences severely deteriorates as the target encoding time decreases [9,10,11,12,13,14,15]. Moreover, the coding time of each frame in one GOP differs significantly because of different coding parameters. Hence, the time proportion rather than the absolute time is used to regulate the encoding complexity. In this study, to achieve temporally consistent rate distortion performance, the proportion of time saving is kept the same within one GOP. To maintain the same proportion of time saving for each frame in the GOP, we allocate the target encoding time by using the temporal stability of the encoding proportion in the frame layer.
For complexity control of the frame layer, it is important to maintain good rate distortion performance. In the proposed algorithm, we estimate the actual time saving of the coded frame by considering the difference between the normal coding time of the coded frame and the actual coding time. Then, the following strategies are adopted. (1) If the sum of the actual time saving of already coded frames is greater than the target time saving of the entire sequence, normal coding is carried out in time to avoid degradation of the rate distortion performance. (2) When the sum of actual time saving of the coded frames is less than the target time saving of the entire sequence, the remaining frames still need to be encoded under control. (3) When the actual time saving of the previous frame is much greater than the target time saving of the previous frame, the degree of control of the current frame needs to be reduced. Therefore, the current frame only uses the CSD method to save the coding time and achieve better rate distortion performance.
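The three frame-layer strategies can be read as a simple mode selector. The sketch below is one interpretation: the text does not quantify "much greater", so the `slack` factor is a hypothetical parameter, and the mode names are invented for illustration.

```python
def frame_control_mode(saved_so_far: float, target_saving_total: float,
                       prev_saved: float, prev_target: float,
                       slack: float = 1.5) -> str:
    """Pick the control mode for the current frame.

    saved_so_far: sum of actual time savings of the coded frames.
    target_saving_total: target time saving of the entire sequence.
    slack: hypothetical factor standing in for "much greater".
    """
    if saved_so_far > target_saving_total:
        return "normal"        # strategy (1): enough time saved, code normally
    if prev_saved > slack * prev_target:
        return "csd_only"      # strategy (3): relax control, use CSD only
    return "full_control"      # strategy (2): keep coding under control
```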
3.2 CTU complexity allocation
The proportion of the CU at each depth changes according to the characteristics and coding parameters of video sequences. Based on the residual information and the ECDM model, a complexity allocation method for the CTU layer is proposed in this study. The target complexity of the frame layer is reasonably allocated to each CTU by avoiding the RDO process in the unnecessary CU depth.
3.2.1 Complexity weight calculation
In the low-delay configuration of HEVC, a CTU encoded with a large depth often corresponds to a region with strong motion or rich texture. Figure 5a shows the 16th frame of the sequence ChinaSpeed. Figure 5b shows the residual of this frame. Figure 5c shows its optimal partition, in which the blue solid lines indicate the motion vectors of the CUs. Clearly, the residual in motion regions is more pronounced and the corresponding optimal partition is finer. Therefore, the residual at CU depth 0 is measured by the absolute difference between the original pixel and the predicted pixel. The mean absolute difference (MAD) of the CU is used as the basis for judging the pixel level fluctuation, and the absolute differences that are greater than the MAD are accumulated to obtain the effective sum of absolute differences (ESAD). Figure 5d shows the relation between the ESAD and the optimal depth of the CTU. Here, the optimal depth refers to the depth calculated through the RDO process. The digit in each CTU represents the optimal depth, and the color denotes the ESAD. Clearly, the optimal depth is strongly related to the ESAD. Therefore, in the proposed algorithm, the ESAD of the ith CTU, denoted by ω_{i}, is used as the complexity allocation weight of the ith CTU.
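The ESAD described above can be computed directly from the original and predicted blocks. The following is a minimal sketch using plain nested lists; the function name `esad` is introduced here for illustration.

```python
from typing import List

def esad(original: List[List[int]], predicted: List[List[int]]) -> float:
    """Effective sum of absolute differences of one block.

    Absolute differences are compared with their mean (the MAD), and only
    those strictly greater than the MAD are accumulated.
    """
    diffs = [abs(o - p)
             for row_o, row_p in zip(original, predicted)
             for o, p in zip(row_o, row_p)]
    mad = sum(diffs) / len(diffs)
    return sum(d for d in diffs if d > mad)
```

A block with a perfectly uniform residual yields an ESAD of 0, because no difference exceeds the mean, whereas isolated large residuals dominate the sum.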
3.2.2 Encoding complexity-depth model
Statistical analysis of the average coding complexity under different maximal depths d_{max} was conducted to explore the relationship between the coding complexity and d_{max} of the CTU. We encoded five training sequences, as shown in Table 1, with the HM 13.0 software under the low delay P main configuration. Four different quantization parameter (QP) values (i.e., 22, 27, 32, 37) are used. The coding time \( {C}_n\left({d}_{\mathrm{max}}^n\right) \) is obtained statistically when \( {d}_{\mathrm{max}}^n \) is 0, 1, 2, and 3, respectively, in the nth CTU. The coding time when \( {d}_{\mathrm{max}}^n \) is 3 is regarded as the reference time, and the coding time is normalized when \( {d}_{\mathrm{max}}^n \) is 0, 1, 2, and 3, respectively, i.e.,
The normalized coding times when d_{max} is 0, 1, 2, and 3, respectively, are summed over all CTUs and the average normalized coding time \( \overline{C}\left({d}_{\mathrm{max}}\right) \) is obtained. Figure 6 shows the difference in normalized coding complexity among the four QPs for different maximal depths. We find that the difference in normalized encoding complexity among different QPs is small. Thus, we take the mean of the training results over the four QPs, as presented in Table 1.
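The normalization and averaging steps can be sketched as follows, assuming each CTU provides its four measured coding times for d_max = 0, 1, 2, 3 and that normalization divides by the d_max = 3 time, as the text describes. The function names are hypothetical.

```python
from typing import List

def normalized_times(ctu_times: List[float]) -> List[float]:
    """Normalize the coding times for d_max = 0..3 by the d_max = 3 time."""
    reference = ctu_times[3]
    return [t / reference for t in ctu_times]

def average_normalized(all_ctu_times: List[List[float]]) -> List[float]:
    """Average the normalized times over all CTUs, one value per d_max."""
    normalized = [normalized_times(times) for times in all_ctu_times]
    count = len(normalized)
    return [sum(ctu[d] for ctu in normalized) / count for d in range(4)]
```

By construction the averaged value for d_max = 3 is always 1, so the other three entries read directly as fractions of the full-depth coding time.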
From the mean of the training results, the average coding complexity \( \overline{T_{\mathrm{CTU}}} \) under different d_{max} is obtained, and the ECDM is established as follows:
where \( \overline{T_{\mathrm{CTU}}} \) represents the average coding complexity of the CTU under different d_{max}.
The CCA method is summarized as follows:
1) Obtain the target coding time \( {T}_f^t \) and ω_{i} of the fth frame.
2) According to the allocation time of \( {T}_f^t \) and ω_{i}, the target coding time of the ith CTU \( {T}_{\mathrm{CTU}}^i \) is calculated as:
where R_{coded} represents the sum of the actual coding times of all the coded CTUs in the current frame, ω_{m} represents the complexity allocation weight of the mth CTU in the corresponding frame of the last GOP, and I represents the number of CTUs in one frame.
4) Use the normal coding time of the CTU in the corresponding frame in the third GOP as the normalized denominator to normalize \( {T}_{\mathrm{CTU}}^i \) in order to obtain the normalized target coding complexity of CTU \( {\tilde{T}}_{\mathrm{CTU}}^i \).
5) According to the ECDM and \( {\tilde{T}}_{\mathrm{CTU}}^i \), set the maximal depth of the current CTU as:
In the proposed method, the frames in the first three GOPs are normally coded. Subsequently, only one frame out of every four GOPs is normally coded to update T_{o} and the ratio of CTUs whose optimal depth is 0. When the motion is strong or the texture is rich, the maximal CTU depth determined by Eq. (7) will degrade the rate distortion performance. When the ratio is less than 0.4, the CU at depth 0 only tests the Merge/Skip and inter 2N × 2N modes, because a longer coding time is required for traversal of larger depths. Thus, Eq. (7) becomes:
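The CCA steps can be illustrated with a rough sketch. The displayed formulas for the CTU target time and for Eq. (7) are not reproduced in this text, so both functions below are assumptions: the remaining frame budget is shared in proportion to the ESAD weights of the CTUs not yet coded, and the maximal depth is taken as the largest depth whose modeled normalized complexity fits the normalized target. The `ecdm` values are passed in by the caller and are not the values of Table 1.

```python
from typing import List

def ctu_target_time(frame_target: float, coded_time: float,
                    weights: List[float], i: int) -> float:
    """Assumed form: (T_f - R_coded) * w_i / sum of w_m for m >= i."""
    remaining = max(frame_target - coded_time, 0.0)
    weight_left = sum(weights[i:])
    if weight_left == 0:
        return remaining / (len(weights) - i)  # fall back to an even share
    return remaining * weights[i] / weight_left

def max_depth_from_model(norm_target: float, ecdm: List[float]) -> int:
    """Largest depth whose modeled normalized complexity fits the target."""
    depth = 0
    for d, modeled in enumerate(ecdm):
        if modeled <= norm_target:
            depth = d
    return depth
```

Recomputing the share from the remaining budget at every CTU means that time left unused by earlier CTUs is redistributed to later ones, which is one plausible reason for including R_{coded} in the allocation.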
3.3 CU smoothness decision
The CCA method avoids traversal of some unnecessary depths, but after allocating d_{max}, redundant traversal may still occur for CUs with depth d ∈ [0, d_{max}]. It has been observed that when the residual is smooth and the motion is weak, the current CU is more likely to be the optimal partition, as shown in Fig. 5c. Therefore, the current CU no longer proceeds with the deeper RDO process when the following conditions are satisfied: (1) the absolute difference between the original value and the predicted value of every pixel in the CU is smaller than a certain threshold, and (2) the motion vector is 0.
Clearly, the greater the threshold, the greater the probability of falsely terminating the RDO process. The threshold should be set on the basis of a tradeoff between the rate distortion performance and the computational complexity. To obtain the threshold, we perform exploratory experiments by normally coding 150 frames under the low-delay configuration. The training sequences with different features, listed in Table 2, are encoded under QP = 22, 27, 32, and 37. From these runs, we obtain the partitioned quadtree of all the CTUs. We can then directly count the CUs that are not the optimal partition and, among them, those that satisfy the above conditions for each candidate threshold (ranging from 1 to 128). Hence, we can set a reasonable threshold by considering the rate distortion performance and encoding speed. On the one hand, we constrain the false termination ratio within 1% by adjusting the threshold in order to achieve better rate distortion performance. On the other hand, we should maximize the threshold to save more time. Hence, in the proposed method, the reasonable threshold β_{d} at depth level d is given by:
where H_{d} is the number of CUs at depth level d that are not the optimal partition, and \( {E}_d^{\beta } \) is the number of CUs at depth level d that are not the optimal partition but satisfy the above conditions with threshold β (β = 1, 2, 3, …, 128). The β_{d} values for different training sequences under different QPs are listed in Table 2.
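The threshold selection can be sketched as a search for the largest β whose false termination ratio stays within 1%. The displayed formula is not reproduced in this text, so the exact form below, and the name `select_threshold`, are assumptions; the counts are supplied by the caller.

```python
from typing import Dict

def select_threshold(false_counts: Dict[int, int], non_optimal_total: int,
                     max_ratio: float = 0.01) -> int:
    """Return the largest beta with E_d^beta / H_d <= max_ratio.

    false_counts maps each candidate beta to E_d^beta, which is assumed to be
    non-decreasing in beta; non_optimal_total is H_d. Returns 0 if no
    candidate satisfies the constraint.
    """
    best = 0
    for beta in sorted(false_counts):
        if false_counts[beta] <= max_ratio * non_optimal_total:
            best = beta
        else:
            break  # larger thresholds can only terminate more CUs falsely
    return best
```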
According to the average value, we obtain the threshold by Gaussian fitting:
where Q is QP value of current sequence and β is the threshold under different d as Q changes.
According to statistical analysis of the optimal mode of CUs that satisfy the above-mentioned conditions, the probability of the optimal mode being inter 2N × 2N is not less than 93.5%. Hence, the current CU is judged after testing the inter 2N × 2N mode. If the conditions are satisfied, then the traversal of the remaining modes and the RDO process are terminated.
The CSD method is summarized as follows:
1) Test the inter 2N × 2N mode and obtain the absolute difference between the original pixel value and the predicted pixel value of the current CU as well as the motion vector information.
2) Obtain β from the current CU depth and QP value.
3) If the absolute difference between the original value and the predicted value of any pixel in the CU is less than β, and the motion vector is 0, then the traversal of the remaining modes and the RDO process are terminated.
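Step 3 of the CSD method amounts to a per-pixel check plus a motion vector test. The sketch below assumes the condition must hold for every pixel of the CU and that the threshold β has already been obtained for the current depth and QP; `csd_terminate` is a hypothetical name.

```python
from typing import List, Tuple

def csd_terminate(original: List[List[int]], predicted: List[List[int]],
                  motion_vector: Tuple[int, int], beta: float) -> bool:
    """Return True if the remaining modes and deeper RDO can be skipped."""
    if motion_vector != (0, 0):
        return False  # condition (2): the motion vector must be zero
    # Condition (1): every pixel residual must be below the threshold.
    return all(abs(o - p) < beta
               for row_o, row_p in zip(original, predicted)
               for o, p in zip(row_o, row_p))
```

Because the check is run right after the inter 2N × 2N test, its cost is a single pass over residuals that the encoder has already computed.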
3.4 Adaptive upper and lower bit threshold decision
On the one hand, due to the strict conditions of the CSD method, the time saving cannot reach the target. On the other hand, in the CCA method, the CU depth decision will lead to rate distortion performance degradation. Hence, we should further regulate the computational complexity on the basis of the CSD and CCA methods.
In [14], it has been shown that the larger the number of bits of the current CU, the greater the probability that it is not the optimal partition. To further analyze the relationship between the bits of the current CU and the probability that it is not the optimal partition, we used the same experimental environment and training sequences described in Section 3.3. Figure 7a, b shows the statistical results of probability for the 2nd to the 3rd GOP and the 2nd to the 38th GOP, respectively. In these figures, \( {F}_Y^d(Bit) \) and \( {F}_N^d(Bit) \) denote the probability of the CU being the optimal and a nonoptimal partition, respectively, when its number of bits is smaller than or equal to Bit at CU depth level d. The probability functions for other depths (i.e., 1, 2) are similar, and the same holds for the other training sequences. According to the figure, the variation ranges of the probability functions in the two panels are highly consistent. The function \( {F}_Y^d(Bit) \) varies sharply over the interval close to 0, and the function \( {F}_N^d(Bit) \) changes gradually over a wide range. Statistical analysis of different sequences shows the same trend as that in Fig. 7. Therefore, the early termination or continuous partition threshold can be determined adaptively by the functions \( {F}_N^d(Bit) \) and \( {F}_Y^d(Bit) \) using the normal coding information statistics of the 2nd to the 3rd GOP. The lower bit bounds N_{d} and Y_{d} of the extremely smooth interval of functions \( {F}_N^d(Bit) \) and \( {F}_Y^d(Bit) \) at different depths are used as the reference bits of the lower and upper thresholds, respectively. The upper threshold H_{d} is obtained by multiplying Y_{d} by 0.7, and the lower threshold L_{d} is obtained by multiplying N_{d} by μ, which is defined as
The adaptive upper and lower bit threshold decision method is summarized as follows.
1) According to normal coding of the 2nd to the 3rd GOP, Y_{d} and N_{d} under different depths are obtained.
2) μ is obtained by the target complexity proportion; then, H_{d} and L_{d} are obtained.
3) When the depth is d and Bit_{d} corresponding to the optimal mode of the CU is smaller than L_{d}, RDO traversal is terminated. When Bit_{d} is greater than H_{d} and the current depth is not less than d_{max} allocated by the CCA method, the current CU continues the RDO process.
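The three steps above can be sketched as two small functions. The definition of μ is not reproduced in this text, so μ is taken as an input rather than computed; the function names and the `"default"` outcome (meaning the CU simply follows the d_{max} allocated by the CCA method) are assumptions made for illustration.

```python
from typing import List, Tuple

def bit_thresholds(y_ref: List[float], n_ref: List[float],
                   mu: float) -> Tuple[List[float], List[float]]:
    """Return (lower, upper) per depth: L_d = mu * N_d and H_d = 0.7 * Y_d."""
    lower = [mu * n for n in n_ref]
    upper = [0.7 * y for y in y_ref]
    return lower, upper

def abd_decision(bits: float, depth: int, d_max: int,
                 lower: List[float], upper: List[float]) -> str:
    """Decide how the RDO traversal proceeds for one CU."""
    if bits < lower[depth]:
        return "terminate"   # few bits: stop the traversal early
    if depth >= d_max and bits > upper[depth]:
        return "continue"    # many bits: go deeper than the allocated d_max
    return "default"         # otherwise follow the allocated d_max
```

In an encoder loop, `"continue"` would still be bounded by the smallest CU size (depth 3 for a 64 × 64 CTU), since no further split exists beyond it.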
4 Results and discussions
To evaluate the performance of the proposed algorithm, the rate distortion performance and the complexity control precision are verified via implementation on HM 13.0, with QP values of 22, 27, 32, and 37. The test conditions follow the recommendations provided in [18], and all our experiments only consider the low delay P main configuration. The detailed coding parameters are summarized in Table 3.
To verify the effectiveness of the proposed algorithm, the actual time saving TS is used as a measure of complexity reduction:
where T_{Original} denotes the normal encoding time and T_{Proposed} denotes the actual encoding time with a certain T_{c} in our algorithm. The mean control error (MCE) is used as a measure of complexity control accuracy and calculated as follows:
where n is the number of test sequences and TS_{i} is the TS of the ith test sequence.
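The two measures can be sketched as follows. The displayed formulas are not reproduced in this text, so the sketch assumes the usual definitions: TS is the relative reduction in encoding time in percent, and MCE is the mean absolute deviation of TS_{i} from the target time saving 100% − T_{c}. Both function names are hypothetical.

```python
from typing import List

def time_saving(t_original: float, t_proposed: float) -> float:
    """Assumed form: TS = (T_Original - T_Proposed) / T_Original * 100 (%)."""
    return 100.0 * (t_original - t_proposed) / t_original

def mean_control_error(ts_values: List[float], t_c_percent: float) -> float:
    """Assumed form: MCE = mean of |TS_i - (100 - T_c)| over the sequences."""
    target_saving = 100.0 - t_c_percent
    return sum(abs(ts - target_saving) for ts in ts_values) / len(ts_values)
```

Under these assumptions, a sequence encoded in 80 s instead of 100 s has TS = 20%, which matches a target complexity T_{c} of 80% exactly.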
The bit rate increase (∆BR) and PSNR reduction (∆PSNR) are used as measures of the rate distortion performance of the complexity control algorithm. The proposed algorithm tests and analyzes five target complexity levels, T_{c}(%) = {90, 80, 70, 60, 50}.
Table 4 summarizes the performance of the proposed algorithm in terms of ∆PSNR, ∆BR, and TS for different sequences under different T_{c}. The experimental results presented in Table 4 indicate that the actual coding complexity of the proposed algorithm is quite close to the target complexity. This means that our algorithm can smoothly code most of the sequences under limited computing power. Although the individual sequence deviation is large (when T_{c} = 90%, the maximal complexity deviation is 3.77%), the MCE is small, with a maximum of 1.22%. For T_{c} = 90%, 80%, 70%, 60%, and 50%, the average ∆PSNR is − 0.01 dB, − 0.02 dB, − 0.05 dB, − 0.06 dB, and − 0.09 dB, the average ∆BR is 0.50%, 1.04%, 1.86%, 2.60%, and 3.48%, and the MCE is 1.22%, 0.87%, 0.61%, 0.41%, and 0.24%, respectively. From the viewpoint of the degree of attenuation of the average ∆PSNR and ∆BR with decreasing T_{c}, the decrease of our algorithm is relatively smooth; however, the decrease of individual sequences is sharper (e.g., the sequence SlideShow, most of whose frames are smooth, except for some frames that have strong motion). This is because the frames with strong motion influence the CCA method, which depends on the encoding complexity of the previous frame.
Figure 8 shows the rate distortion curves of the sequences BasketballPass and Vidyo1 for the five different T_{c}. The rate distortion performance of the sequence BasketballPass, which has strong motion, is not as good as that of the sequence Vidyo1, which has few scene changes. The same conclusion can also be drawn from Table 4.
To demonstrate the effectiveness of our frame level complexity allocation method, two frame level complexity allocation methods are compared at T_{c} = 90%. One of the methods is the one proposed in this paper, and the other obtains the target encoding time of the frame layer by equally dividing the target encoding time of the GOP layer. The same experimental environment described in the first paragraph of this section was used for analyzing the performance of the two methods, and the experimental results of the comparison method are obtained by modifying the frame level complexity allocation method of the proposed algorithm. As shown in Fig. 9, our method exhibits better rate distortion performance, which indicates that it can balance the complexity and rate distortion effectively in the frame layer.
To evaluate the performance of the proposed algorithm more intuitively, we compared our algorithm with three state-of-the-art algorithms [14, 16, 17]. The results are listed in Tables 5, 6, and 7. Because the minimal controllable target complexity proportions of [17] and our algorithm are 60% and 50%, respectively, the performance is compared under the target complexity proportions of 80% and 60%.
Regarding losses in rate distortion performance, Tables 5 and 7 show that, when T_{c} = 80%, the average ∆BR of our algorithm is slightly higher than those of [14, 17], and the average ∆PSNR difference between our algorithm and the algorithms in [14, 17] is negligible. When T_{c} = 60%, the average rate distortion performance of our algorithm is better than those of [14, 17]. For a few sequences, such as Johnny, the performance of our algorithm is slightly worse than that of [14, 17] in terms of both ∆PSNR and ∆BR. This is mainly because the algorithms in [14, 17] can effectively skip unnecessary higher CU depths for videos with little motion. Moreover, Table 6 shows that the average BDBR [19] of algorithm [16] is better than that of our algorithm, but the rate distortion performance of our algorithm is better for class E sequences such as Johnny and FourPeople.
The control accuracy is an important index for validating the performance of a complexity control algorithm, and the overall control accuracy of our algorithm and the other three algorithms is compared by MCE. Tables 5, 6, and 7 show that the MCE of our algorithm is lower than those of [14, 16, 17], which means that our algorithm can achieve steady complexity control for different test sequences.
5 Conclusions
This paper proposed a hierarchical complexity control algorithm based on the coding unit depth decision to guarantee the rate distortion performance during real-time coding when the computing power of a device is limited. First, we obtain the reference time by a periodic updating strategy. Second, the GOP layer and frame layer complexity allocation and control method based on the target complexity is used to control the coding time of these layers. Then, the RDO process at unnecessary CU depths is skipped by using the correlation between the ESAD and the optimal depth and by establishing the ECDM to adaptively allocate the maximal CTU depth. Next, based on the CU smoothness decision and the adaptive lower bit threshold decision, the redundant traversal process within the allocated maximal depth is reduced to further save time. Finally, the adaptive upper bit threshold is used to guarantee the quality of important CUs by performing the RDO process at depths larger than the maximal depth allocated by the CCA method. The experimental results showed that the minimum target complexity of our algorithm can reach 50% with smooth attenuation of ∆PSNR and ∆BR as T_{c} decreases. Compared with other state-of-the-art complexity control algorithms, our algorithm achieves better control accuracy. In the future, we will design an effective mode decision method to save more time. In addition, we will further investigate the frame layer complexity allocation and improve the frame layer control accuracy.
Abbreviations
 ABD:

Adaptive upper and lower bit threshold decision
 CCA:

CTU complexity allocation
 CSD:

CU smoothness decision
 CTU:

Coding tree unit
 CU:

Coding unit
 ECDM:

Complexitydepth model
 ESAD:

Effective sum of absolute differences
 FDM:

Fast decision for merge ratedistortion cost
 FEN:

Fast encoder decision
 GOP:

Group of picture
 HEVC:

High Efficiency Video Coding
 HM:

HEVC test model
 JCTVC:

Joint Collaborative Team on Video Coding
 MAD:

Mean absolute difference
 MPEG:

Moving Picture Experts Group
 PB:

Prediction block
 PU:

Prediction unit
 QP:

Quantization parameters
 RDO:

Rate distortion optimization
 SAO:

Sample adaptive offsets
 TB:

Transform block
 TU:

Transform unit
 VCEG:

Video Coding Experts Group
References
G.J. Sullivan, J.R. Ohm, W.J. Han, T. Wiegand, Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1649–1668 (2012)
T. Wiegand, G.J. Sullivan, G. Bjøntegaard, A. Luthra, Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol. 13(7), 560–576 (2003)
I.K. Kim, K. McCann, K. Sugimoto, B. Bross, W.J. Han, G. Sullivan, High Efficiency Video Coding (HEVC) Test Model 13 (HM13) Encoder Description, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) Document JCTVC-O1002, in 15th Meeting of JCT-VC (Geneva, CH, 2013)
T.K. Tan, R. Weerakkody, M. Mrak, N. Ramzan, V. Baroncini, J.R. Ohm, G.J. Sullivan, Video quality evaluation methodology and verification testing of HEVC compression performance. IEEE Trans. Circuits Syst. Video Technol. 26(1), 76–90 (2016)
J.R. Ohm, G.J. Sullivan, H. Schwarz, T.K. Tan, T. Wiegand, Comparison of the coding efficiency of video coding standards—including high efficiency video coding (HEVC). IEEE Trans. Circuits Syst. Video Technol. 22(12), 1669–1684 (2012)
R. Fan, Y. Zhang, B. Li, Motion classification-based fast motion estimation for high-efficiency video coding. IEEE Trans. Multimedia. 19(5), 893–907 (2017)
J. Zhang, B. Li, H. Li, An efficient fast mode decision method for inter prediction in HEVC. IEEE Trans. Circuits Syst. Video Technol. 26(8), 1502–1515 (2016)
F. Chen, P. Li, Z. Peng, G. Jiang, M. Yu, F. Shao, A fast inter coding algorithm for HEVC based on texture and motion quadtree models. Signal Process. Image Commun. 44(C), 271–279 (2016)
G. Correa, P. Assuncao, L. Agostini, L.A. da Silva Cruz, Computational complexity control for HEVC based on coding tree spatiotemporal correlation (IEEE 20th International Conference on Electronics, Circuits, and Systems (ICECS), Abu Dhabi, United Arab Emirates) (2013), pp. 937–940
G. Correa, P. Assuncao, L. Agostini, L.A. da Silva Cruz, Coding tree depth estimation for complexity reduction of HEVC (2013 Data Compression Conference, Snowbird, UT, USA) (2013), pp. 43–52
G. Correa, P. Assuncao, L. Agostini, L.A. da Silva Cruz, Complexity scalability for real-time HEVC encoders. J. Real-Time Image Process. 12(1), 107–122 (2016)
G. Correa, P. Assuncao, L. Agostini, L.A. da Silva Cruz, Encoding time control system for HEVC based on rate-distortion-complexity analysis (2015 IEEE International Symposium on Circuits and Systems (ISCAS), Lisbon, Portugal) (2015), pp. 1114–1117
X. Deng, M. Xu, L. Jiang, X. Sun, Z. Wang, Subjective-driven complexity control approach for HEVC. IEEE Trans. Circuits Syst. Video Technol. 26(1), 91–106 (2016)
X. Deng, M. Xu, C. Li, Hierarchical complexity control of HEVC for live video encoding. IEEE Access. 4(99), 7014–7027 (2016)
X. Deng, M. Xu, Complexity control of HEVC for video conferencing (2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA) (2017), pp. 1552–1556
J. Zhang, S. Kwong, T. Zhao, Z. Pan, CTU-level complexity control for high efficiency video coding. IEEE Trans. Multimedia. 20(1), 29–44 (2018)
A. Jiménez-Moreno, E. Martínez-Enríquez, F. Díaz-de-María, Complexity control based on a fast coding unit decision method in the HEVC video coding standard. IEEE Trans. Multimedia. 18(4), 563–575 (2016)
F. Bossen, Common Test Conditions and Software Reference Configurations, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) Document JCTVC-H1100, in 8th Meeting of JCT-VC (San Jose, CA, 2012)
G. Bjontegaard, Calculation of average PSNR differences between RD-curves (doc. VCEG-M33, in ITU-T VCEG 13th meeting, Austin, TX, USA) (2001), pp. 2–4
Acknowledgements
The authors would like to thank the editors and anonymous reviewers for their valuable comments.
Funding
This work is supported by the Natural Science Foundation of China (61771269, 61620106012, 61671258) and the Natural Science Foundation of Zhejiang Province (LY16F010002, LY17F010005). It is also sponsored by the K.C. Wong Magna Fund in Ningbo University.
Availability of data and materials
The conclusion and comparison data of this article are included within the article.
Contributions
FC designed the proposed algorithm and drafted the manuscript. PW carried out the main experiments. ZP supervised the work. GJ participated in the algorithm design. MY performed the statistical analysis. HC offered useful suggestions and helped to modify the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Chen, F., Wen, P., Peng, Z. et al. Hierarchical complexity control algorithm for HEVC based on coding unit depth decision. J Image Video Proc. 2018, 96 (2018). https://doi.org/10.1186/s1364001803413
DOI: https://doi.org/10.1186/s1364001803413