
Gradient-based pre-processing for intra prediction in High Efficiency Video Coding


In order to reach higher coding efficiency compared to its predecessor, a state-of-the-art video compression standard, the High Efficiency Video Coding (HEVC), has been designed to rely on many improved coding tools and sophisticated techniques. The new features are achieving significant coding efficiency but at the cost of huge implementation complexity. This complexity has increased the HEVC encoders’ need for fast algorithms and hardware friendly implementations. In fact, encoders have to perform the different encoding decisions, overcoming the real-time encoding constraint while taking care of coding efficiency. In this sense, in order to reduce the encoding complexity, HEVC encoders rely on look-ahead mechanisms and pre-processing solutions. In this context, we propose a gradient-based pre-processing stage. We investigate particularly the Prewitt operator used to generate the gradient and we propose necessary approaches that enhance the gradient performance of detecting the HEVC intra modes. We also set different probability scenarios, based on the gradient information, in order to speed up the mode search process. Moreover, we propose a gradient-based estimation of the texture complexity that we use for coding unit decision. Results show that the proposed algorithm achieves a reduction of 42.8% in encoding time with an increase in BD rate of only 1.1%.

1 Introduction

Since the emergence of the H.264/AVC standard, video applications have progressed significantly. This progress has led to an increasing demand for better video quality and higher compression, especially for applications and services dealing with high and ultra-high resolutions.

In this context, the Joint Collaborative Team on Video Coding (JCT-VC), a team of experts from the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), standardized HEVC in 2013 as the state-of-the-art video coding standard [1, 2]. The new standard keeps the same high-level design as its predecessor, but it relies on many improved coding tools and techniques that offer higher coding efficiency at the cost of more encoding complexity.

The block structure is one of the most important new features contributing to this complexity, and it directly affects all the other features. HEVC relies on a coding tree block (CTB) structure. Unlike the AVC macroblock, whose size is 16 × 16, the largest coding unit (LCU) defined in HEVC allows block sizes from 8 × 8 up to 64 × 64. The LCU can then be partitioned, following a quadtree structure, into coding units (CUs), where each CU can be recursively partitioned into four sub-CUs. Once the CU partitioning is done, each CU can be split, at the prediction stage, into one or more prediction units (PUs) [2, 3]. Moreover, at the transform stage, each CU can be split recursively into one or more transform units (TUs). Figure 1 illustrates the possible recursive splits of a CU in the intra coding case, on which we particularly focus in this work.

Fig. 1 Example of CUk, PUk, and TUk recursive split structures for intra case with k as depth index

At the intra prediction level, HEVC supports 35 prediction modes, as can be seen in Fig. 2. In addition to DC and planar modes, HEVC supports 33 angular modes, much more than the maximum of eight angular modes proposed by H.264/AVC. Furthermore, the new standard allows deriving a “most probable mode” from neighbor blocks. In the case of the Chroma component, the same mode as the Luma can be used. Moreover, HEVC supports additional reference sample smoothing as well as a boundary smoothing.

Fig. 2 Modes and directional orientations

All these sophisticated prediction features offer better coding efficiency, but at the cost of significant complexity at the encoder side. Thus, HEVC encoders face a real challenge in speeding up the encoding process, and especially the mode decisions, while paying close attention to encoding efficiency.

Many approaches have been studied in order to speed up the encoding decisions. Among them, many-core processing, which relies on the parallelization of encoding algorithms, is a good alternative. It has been used to speed up encoding processes such as the coding unit partitioning [4], the motion estimation [5], the HEVC deblocking filter [6], and the intra prediction [7]. Another approach relies on multi-pass processing. For example, in [8], Wang et al. proposed a two-pass rate control algorithm. Since the feasibility of multi-pass processing depends strongly on the application constraints, it has become important for HEVC encoders to also rely on look-ahead and pre-processing solutions.

In this work, the large number of HEVC supported intra modes presents a motivation to investigate the solution of a pixel gradient-based pre-processing stage that will operate on the original frame. We are interested in intra coding dealing with mode decision as well as CU coding.

Many works have been proposed to deal with these aspects. In [9], a fast pre-processing is proposed to generate estimations of the RD costs. Because it operates on the original frame instead of the reconstructed one, the pre-processing reduces the data dependency on the reconstruction loop. The generated data is then used to reduce the number of tested prediction unit levels as well as the number of tested intra modes.

In [10, 11], a down-sampling approach is applied to the CU in order to reduce the prediction-related computation. The down-sampled prediction is coupled with a progressive search in order to reduce the number of intra candidate modes. In [12], the authors categorize the edge directions into five groups by applying different types of differences to the pixel values. A dominant edge direction is generated for each PU and used to reduce the number of intra modes to be evaluated. Shen et al. [13] used the spatial correlation between neighboring CUs to speed up the CU split decision and to terminate the motion estimation early. A Bayesian rule-based approach has been proposed in [14]. The CU split decision is formulated as a classification problem for which a probability density function is estimated. The Bayesian risk is then minimized in order to approach the optimal CU split decision. The works [15–17] rely on the correlation between the intra RD cost and its estimation based on the Hadamard transform for early termination of the intra mode and CU coding decisions.

Regarding the gradient-based approach, which is of particular interest here, many works have been proposed in video coding, and they can be categorized into two main classes. The first class generates gradient information by computing differences on the pixel blocks. Such work has been conducted by Tsai et al. [18] for H.264 intra prediction, and a similar approach has been proposed by Yongfei [19] for HEVC intra prediction. The second class uses a differential operator to approximate the mathematical gradient values, as in [20], where Pan et al. proposed to measure the edge directions, at a pre-processing level, with the Sobel operator. The generated gradient information is then used to predict the H.264 intra modes. A similar approach, using the same operator, has been proposed by Jiang [21] for HEVC intra prediction. A more recent similar work [22] additionally considers the gaps between the values of the sums of absolute transformed differences (SATD) in order to eliminate less probable modes from the prediction process.

In this work, we focus on the latter class, as it offers a mathematical generation of the gradient direction, which is an interesting way of taking advantage of the large number of HEVC angular intra modes. As we are particularly interested in reducing the implementation complexity compared to [21, 22], we focus on the operator used for the gradient computation. The Sobel operator is widely used in gradient-based intra prediction works, and more generally in many image and video algorithms and applications, mainly because of its strong edge detection performance. In this work, we compare its performance in detecting the HEVC intra directions with that of the Prewitt operator. This comparison is motivated by the fact that the Prewitt operator has simpler coefficients, which can considerably reduce the implementation complexity of a gradient solution. In [23], we conducted a preliminary study using the Prewitt operator with granular pixel coverage, toward a better understanding of the impact of gradient operators on HEVC video coding. We also presented a pixel neighbor extension of the gradient values in order to enhance the performance of the intra mode detection. In [24], we investigated the two-dimensional Roberts operator to simplify the gradient computation even further. In addition, we considered the number of appearances of each mode, as well as the gradient magnitude, in order to optimize the performance of intra mode detection.

In this work, we extend the latter approaches to present a complete pre-processing solution for intra coding. In order to speed up the search for the optimal intra mode, we exploit the gradient information generated at the pre-processing stage to limit the modes to be tested to only the most probable ones, based on different probability scenarios. Moreover, we propose a gradient-based scheme for the CU intra split decision. For this purpose, we propose an approach to measure the texture complexity depending on the CU sizes.

We consider here the work of Jiang as the baseline gradient solution for HEVC. Jiang worked on HM4.0 [21, 25], but since then some features of the intra prediction design have changed. For example, unlike HM4.0, which supports three modes for 64 × 64 PUs, the HEVC standard supports 35 modes, as will be described in more detail in a later section of this paper. Therefore, in this work, we test the gradient-based approach on the recently adopted HEVC intra prediction design.

The remainder of this paper is organized as follows. Section 2 presents the experimental methods. Section 3 presents an overview of the HEVC intra prediction algorithm as well as the proposed gradient-based intra prediction. In addition, it describes the proposed optimization approaches dealing with a pre-selection of intra modes as well as an optimized mode selection at PU level. In Section 4, we present an approach for speeding up the intra prediction based on the gradient information. Section 5 describes the proposed schemes for the CU split decision. Section 6 then presents the experimental results of the proposed algorithm. Finally, we present the conclusions in Section 7.

2 Experimental methods

The aim is to measure the impact of the proposed solution on video coding efficiency as well as on coding time. For that purpose, the proposed algorithm was integrated into the HEVC test model (HM) version 14.0. Simulations were performed in conformance with the common test conditions specified in [30]. As the implemented feature mainly concerns intra coding, we present the results for all intra (AI) coding. We used test video sequences of classes A to E. To measure the coding efficiency, we present the Bjontegaard delta rate (BD-rate) [30]. This metric represents the average difference between the original rate-distortion curve and that obtained after the integration of the proposed features. The rate-distortion curves are obtained by coding each test sequence at four different QPs: 22, 27, 32, and 37. We measure the coding time saving according to Eq. (25), using \( T_{\mathrm{HM14}} \), the encoding time of HM14.0, and \( T_{\mathrm{Prop}} \), the encoding time obtained after integrating the proposed solution into HM14.0.

3 HEVC intra prediction

3.1 Overview of HEVC intra prediction

In order to speed up the intra prediction process, HM [26] adopts a simplified intra prediction algorithm. As presented in Fig. 3, the algorithm goes through four stages for each PU. In the first stage, referred to as the rough mode decision (RMD), HM performs a Hadamard transform for each possible PU size and for all 35 intra modes, generating for each combination the sum of absolute transformed differences (SATD) [27].

Fig. 3 Four stage intra prediction

The SATD will be used in the estimation of the rate-distortion (RD) cost of that PU, as shown in the following equation:

$$ {J}_{\mathrm{HAD}}=\mathrm{SATD}+\lambda \cdot R, $$

where λ is a Lagrangian multiplier and R is the bit consumption estimation.
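As a rough illustration of this cost, the sketch below computes an unnormalized SATD for a single 4 × 4 block and combines it with a rate estimate. The helper names (`satd_4x4`, `j_had`) are hypothetical, and the normalization and block sizes used in HM are deliberately left out.

```python
# 4x4 Hadamard matrix (symmetric, entries are +1 or -1).
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul4(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd_4x4(original, predicted):
    """Sum of absolute values of the Hadamard-transformed residual.

    Assumption: no normalization is applied, unlike a production encoder.
    """
    residual = [[original[i][j] - predicted[i][j] for j in range(4)]
                for i in range(4)]
    transformed = matmul4(matmul4(H4, residual), H4)
    return sum(abs(v) for row in transformed for v in row)


def j_had(satd, lam, rate_bits):
    """Estimated RD cost: J_HAD = SATD + lambda * R."""
    return satd + lam * rate_bits
```

In the RMD stage, a cost of this kind is evaluated for every intra mode of a PU, and the modes with the lowest costs are kept.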

After the RMD step, a mode candidate set \( \psi^R \) is generated from the best intra modes. The number of candidate modes is set to 3, 3, 3, and 8 for PU sizes of 64 × 64, 32 × 32, 16 × 16, and 8 × 8, respectively [28]. To exploit the correlation of direction information between neighboring blocks [29], a second stage checks for additional most probable modes (MPMs) derived from the neighbors. These modes are added, if they are not already included, to form an extended candidate set \( \psi^M \) [30]. At the third stage, a rate-distortion optimized quantization (RDOQ) is performed using the modes of the candidate set at only the maximum TU size. The goal of this step is to pick the optimal intra mode \( m_{\mathrm{opt}} \) for the PU as well as the best PU split structure in the rate-distortion sense. In the last stage, the optimal mode \( m_{\mathrm{opt}} \) found previously is used to determine the optimal residual quadtree (RQT) structure.

3.2 Gradient-based intra prediction

The idea of a gradient-based intra prediction is to estimate the pixel intensity variation in order to approach the best intra mode direction.

The gradient values are computed through a discrete differentiation operator. The operator relies on horizontal and vertical kernels, denoted here by \( S_x \) and \( S_y \), respectively. The operator kernels are detailed in the next section. At each pixel position of the original image, represented here as a two-dimensional matrix A, we perform a convolution with the two kernels, according to Eqs. (2) and (3), generating two matrices \( G_x \) and \( G_y \). These matrices represent approximations of the horizontal and vertical derivatives, respectively, at each pixel position.

$$ {G}_x={S}_x\ast A $$
$$ {G}_y={S}_y\ast A $$
$$ {\varPhi}_G=\arctan \left({G}_y/{G}_x\right) $$

The corresponding gradient direction is then generated according to Eq. (4). The direction generated at each pixel position is expected to represent the strongest variation of pixel intensity. That is, for a pixel located on an edge, the obtained gradient direction goes across that edge, as presented in Fig. 4. In our case, we consider the direction perpendicular to the gradient, as it represents the direction along which the pixel intensity is similar. Equation (4) can be simplified to computing only the value of \( G_y/G_x \), relying on the fact that the arctan function is monotonic, so that comparing angles is equivalent to comparing these ratios [21].

Fig. 4 Example of a gradient at an edge pixel position

Table 1 presents the HEVC intra directions and the corresponding \( G_y/G_x \) values.

Table 1 Look-up table with HEVC intra directions

As expressed in Eq. (5), the HEVC-supported intra direction \( \varPhi_m \) nearest to the obtained \( \varPhi_G \) value is picked from the look-up table, and the corresponding intra mode m is assigned to the current pixel location.

$$ {\varPhi}_m=\underset{m\in \left[2..34\right]}{ \arg\ \min}\left|{\varPhi}_m-{\varPhi}_G\right| $$

For the gradient magnitude, it can be roughly approximated as such:

$$ M=\left|{G}_x\right|+\left|{G}_y\right| $$

At the end of this pre-processing step, we obtain a mode map \( m_i \) as well as a magnitude map \( M_i \), where i denotes a pixel position.
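The per-pixel processing described above can be sketched as follows. This is a minimal illustration rather than the implementation evaluated in this paper: the function names are hypothetical, the kernels are applied as a sliding weighted sum, and `TOY_MODE_ANGLES` is a two-entry stand-in for the look-up table (only the pure horizontal mode 10 and the pure vertical mode 26 are listed, with angles expressed under an assumed convention of 0° for horizontal and 90° for vertical).

```python
import math

PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]

# Assumption: illustrative subset of the look-up table, in degrees.
TOY_MODE_ANGLES = {10: 0.0, 26: 90.0}


def window_sum(image, kernel, row, col):
    """Weighted sum of the 3x3 neighbourhood centred on (row, col)."""
    return sum(kernel[dr + 1][dc + 1] * image[row + dr][col + dc]
               for dr in (-1, 0, 1) for dc in (-1, 0, 1))


def pixel_gradient(image, row, col):
    """Return (G_x, G_y, M) with M = |G_x| + |G_y|."""
    gx = window_sum(image, PREWITT_X, row, col)
    gy = window_sum(image, PREWITT_Y, row, col)
    return gx, gy, abs(gx) + abs(gy)


def similarity_angle(gx, gy):
    """Direction perpendicular to the gradient, folded into [0, 180)."""
    # atan2 is used instead of arctan(gy / gx) to avoid dividing by zero.
    return (math.degrees(math.atan2(gy, gx)) + 90.0) % 180.0


def nearest_mode(angle, table=TOY_MODE_ANGLES):
    """Mode whose tabulated direction is closest to the given angle."""
    def distance(a, b):
        d = abs(a - b) % 180.0
        return min(d, 180.0 - d)  # orientations wrap around at 180 degrees
    return min(table, key=lambda m: distance(table[m], angle))
```

For a vertical edge, the gradient is horizontal, so the similarity direction is vertical and the vertical mode is selected. A flat pixel yields a zero magnitude and therefore carries no weight when magnitudes are accumulated.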

For each PU, a mode histogram with accumulated mode magnitudes will be generated. The modes with highest values will be selected to form the candidate set. We mention here that the generated mode matrix contains only angular modes.

DC and planar modes are not represented in it. As these two modes have a high probability of being the best modes at the end of the rate-distortion evaluation, we include them automatically in the candidate set. Figure 5 summarizes the algorithm flow, including some features that are treated in the rest of this paper.

Fig. 5 Proposed algorithm flow

3.3 Operators analysis

The works that have proposed gradient-based solutions, such as [20] and [21], have used the Sobel operator to compute the gradient. The reason is that Sobel offers one of the best edge detection performances among the existing operators.

The Sobel kernels are exposed in the equations below:

$$ {S}_{x\left(\mathrm{Sobel}\right)}=\left[\begin{array}{ccc} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{array}\right] $$
$$ {S}_{y\left(\mathrm{Sobel}\right)}=\left[\begin{array}{ccc} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{array}\right] $$

In our work, we are particularly interested in reducing the implementation complexity, especially for processing that operates on a pixel basis. The convolution is the heaviest part of the pre-processing stage, which makes its complexity of special interest. For this reason, we propose to investigate the potential of the Prewitt operator, as it offers simpler coefficients. The kernels of the Prewitt operator are given in the following equations:

$$ {S}_{x\left(\mathrm{Prewitt}\right)}=\left[\begin{array}{ccc} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{array}\right] $$
$$ {S}_{y\left(\mathrm{Prewitt}\right)}=\left[\begin{array}{ccc} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{array}\right] $$

With its simpler coefficients, this operator has some key properties that reduce the implementation complexity. The Prewitt filter has only 1 and −1 as nonzero coefficients, so it can be implemented with simple addition and subtraction instructions. The Sobel operator, in contrast, includes 2 and −2 coefficients, so its gradient calculations require additional instructions. In a hardware implementation, the 2 and −2 coefficients require additional masks to isolate the pixels concerned by these coefficients, as well as extra addition/subtraction instructions. Such considerations make the Prewitt filter much simpler, hardware-wise, especially for applications that require a gradient generation at pixel level.
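To make the difference concrete, the following sketch writes out both operators on a single 3 × 3 window `w` using only elementary operations. The function names are hypothetical, and the left shift stands in for the doubling required by the Sobel coefficients; the actual instruction mix depends on the target platform.

```python
def prewitt(w):
    """Prewitt G_x and G_y on a 3x3 window: additions and subtractions only."""
    gx = (w[0][2] + w[1][2] + w[2][2]) - (w[0][0] + w[1][0] + w[2][0])
    gy = (w[0][0] + w[0][1] + w[0][2]) - (w[2][0] + w[2][1] + w[2][2])
    return gx, gy


def sobel(w):
    """Sobel G_x and G_y: the centre row/column samples must be doubled."""
    gx = ((w[0][2] + (w[1][2] << 1) + w[2][2])
          - (w[0][0] + (w[1][0] << 1) + w[2][0]))
    gy = ((w[0][0] + (w[0][1] << 1) + w[0][2])
          - (w[2][0] + (w[2][1] << 1) + w[2][2]))
    return gx, gy
```

Per window, the Prewitt version uses ten additions or subtractions, while the Sobel version needs four extra doubling operations on top of them.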

Besides the implementation complexity, the investigation of the Prewitt based solution is motivated by some aspects:

  • The relation between approximating the pixel gradients and detecting the intra prediction directions is not obvious. The gradient solution is used as an approximation of the pixel intensity direction that would best represent the current PU, but the theoretical optimal direction is defined only in the rate-distortion sense. Hence, the impact of the gradient operator on video coding efficiency should be investigated.

  • There are only 33 angular directions to represent the best direction of each PU. Hence, we have to choose the nearest HEVC-supported direction \( \varPhi_m \) to represent the computed gradient direction \( \varPhi_G \). This difference between \( \varPhi_m \) and \( \varPhi_G \) leaves an additional margin in which a less accurate operator can still reach the same detection performance.

To evaluate the operators' capability of detecting the HEVC intra directions, we consider the hit rate of the theoretical best intra angular modes. For this experiment, we force the RDO step to perform the rate-distortion quantization on all 35 intra modes while ignoring the candidate set. Then, we evaluate whether the theoretical optimal mode is present in the candidate set. We consider the hit rate for the gradient-based algorithm using the Sobel and Prewitt operators as well as for the Hadamard-based prediction used in HM. Table 2 presents the hit rate for different sequences and different QP values, taking into consideration only the angular cases. First, the results show that the Prewitt operator achieves a hit rate of 63.39%, which is close to that of Sobel at 63.01%. This small difference in detection performance between the two operators motivates an investigation of a gradient solution based on the Prewitt operator. We also notice from the table that the detection performance of both the Sobel and Prewitt operators is relatively low compared to the Hadamard-based prediction.

Table 2 Average hit rate of the theoretical optimal mode

This difference should be interpreted taking into account that the Hadamard prediction performs a kind of multi-pass processing. In fact, the Hadamard prediction performs a heavy transform computation for each intra mode in order to estimate the corresponding distortion.

Also, it is estimating the bit consumption of each mode. These estimations are then used into a rate-distortion cost function in order to choose the best intra mode.

The comparison with the results of the Hadamard-based prediction suggests that the gradient-based solution should be optimized for better detection performance. In the next sections, we therefore propose some approaches that improve the performance of the gradient solution in detecting the HEVC directions while keeping a close watch on the implementation complexity.

3.4 Optimization of intra mode detection

3.4.1 Optimal mode selection

To choose the best modes for the candidate set, Jiang [21] considered, as a cost function, the accumulated gradient magnitudes \( M_m \) for each mode m in the current PU.

The cost function is as follows:

$$ {\mathrm{Cost}}_m = {M}_m = {\displaystyle {\sum}_{i\in PU}}{M}_{m,i}, $$

where \( M_{m,i} \) is the gradient magnitude of a point i whose gradient direction corresponds to the mode m. Thereafter, a histogram of \( \mathrm{Cost}_m \) is considered to select the modes with the highest values.

However, the \( M_m \) criterion has some limitations. In some cases, a mode appears at many points in the PU but with small magnitudes, reflecting a widespread but weak variation of pixel intensity. In other cases, a mode appears at only a few points but with high gradient magnitudes, reflecting a localized but strong variation of pixel intensity. Since both the most frequent modes and the modes with high gradient values can approach the optimal mode, we propose to consider, in addition to \( M_m \), the number of appearances \( N_m \) of a mode m in the current PU.

To investigate the impact of these two factors on the criterion, we consider the following cost function for each angular mode m:

$$ {\mathrm{Cost}}_m=\alpha {N}_m+\left(1-\alpha \right)\cdot {\displaystyle {\sum}_{i\in PU}}{M}_{m,i}, $$

where α is a weighting factor that belongs to [0,1].

We consider here the hit rate of the theoretical optimal intra mode when this mode is angular. Table 3 shows the results for different sequences and QP values. We notice first that the highest values are obtained when both \( N_m \) and \( M_m \) are involved in the mode selection function. The explanation is that, in some cases, especially for small block sizes with fewer gradient samples, several modes can have the same number of appearances, so favoring those with higher gradient magnitudes improves the mode detection. In other cases, several modes can have the same accumulated gradient magnitude, and favoring the most frequent ones also improves the decision performance.

Table 3 Average hit rate of the theoretical optimal mode in function of α

The optimal weighting for the different QP values is obtained at an α value of around 0.8. Compared with considering \( N_m \) only, this improves the average hit rate of the best mode by 0.21%. To limit complexity, we prefer an α value of 0.5, which gives both terms the same weight and so removes the need for any weighting. Up to a constant factor, which does not affect the ranking of the modes, the cost function becomes:

$$ {\mathrm{Cost}}_m={N}_m+{\displaystyle {\sum}_{i\in PU}}{M}_{m,i} $$
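A minimal sketch of these cost functions is given below, assuming the per-pixel modes and magnitudes of one PU are available as two flat lists. The names `mode_costs` and `rank_modes` are hypothetical.

```python
from collections import defaultdict


def mode_costs(pu_modes, pu_magnitudes, alpha=None):
    """Per-mode cost over one PU.

    alpha=None gives the unweighted form N_m + sum(M_{m,i});
    otherwise alpha * N_m + (1 - alpha) * sum(M_{m,i}) is used.
    """
    count = defaultdict(int)      # N_m: number of appearances of mode m
    mag_sum = defaultdict(int)    # M_m: accumulated gradient magnitude
    for mode, magnitude in zip(pu_modes, pu_magnitudes):
        count[mode] += 1
        mag_sum[mode] += magnitude
    if alpha is None:
        return {m: count[m] + mag_sum[m] for m in count}
    return {m: alpha * count[m] + (1 - alpha) * mag_sum[m] for m in count}


def rank_modes(costs):
    """Modes ordered from highest to lowest cost (ties: lower mode index)."""
    return sorted(costs, key=lambda m: (-costs[m], m))
```

With `alpha=0.5`, every cost is exactly half of the unweighted one, so `rank_modes` returns the same order in both cases, which is why the multiplication can be dropped.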

3.4.2 Mode pre-selection

Owing to the nature of the gradient computation, the generated modes are only approximations. We therefore propose in this section to extend the detection of a mode m at a pixel position to a range of modes. In addition to the detected mode, we consider the neighbor modes m + 1 and m − 1, provided that they exist. To describe the approach, we express the cost function \( \mathrm{Cost}_m \) as a function of \( \mathrm{Cost}_{m,i} \), the cost related to a point i of the current PU at which the mode m was detected:

$$ {\mathrm{Cost}}_m={\displaystyle {\sum}_{i\in PU}}{\mathrm{Cost}}_{m,i} $$
$$ {\mathrm{Cost}}_{m,i}=1+{M}_{m,i} $$

In addition to increasing \( \mathrm{Cost}_m \) by \( \mathrm{Cost}_{m,i} \) for each pixel position i, the proposed approach also increases \( \mathrm{Cost}_{m-1} \) and \( \mathrm{Cost}_{m+1} \) by \( \mathrm{Cost}_{m-1,i} \) and \( \mathrm{Cost}_{m+1,i} \). As expressed in Eq. (16), \( \mathrm{Cost}_{m,i} \) is weighted by a bonus value \( b_m \) used to favor the detected mode over its two neighbors. Similarly, the costs \( \mathrm{Cost}_{m-1,i} \) and \( \mathrm{Cost}_{m+1,i} \) of the neighbor modes m − 1 and m + 1 are weighted by a neighboring bonus value \( b_n \), used to favor these two neighbor modes over the other modes:

$$ \left\{\begin{array}{l} {\mathrm{Cost}}_{m-1,i}=\left(1+{M}_{m,i}\right)\times {b}_n \\ {\mathrm{Cost}}_{m,i}=\left(1+{M}_{m,i}\right)\times {b}_m \\ {\mathrm{Cost}}_{m+1,i}=\left(1+{M}_{m,i}\right)\times {b}_n \end{array}\right., $$

where \( b_m \) and \( b_n \) are the bonus values, with \( b_m > b_n \).

To investigate the best bonus values, we consider, for different values of \( b_m \) and \( b_n \), the hit rate of the theoretical best mode in the candidate set for each PU. The results show that the hit rate depends on the quotient q expressed in Eq. (17) more than on the values of the couple \( (b_m; b_n) \) themselves. For example, we obtain almost the same results for the couples (2;1) and (4;2), despite the fact that they affect the cost distribution \( \mathrm{Cost}_m \) differently.

$$ q = {b}_m/{b}_n $$

Therefore, we present in Table 4 the hit rate of the best mode for different values of the quotient q on different sequences and different QP values. We mention here that the exposed performance deals with only the cases of angular optimal modes. We see from the table that the rate of best mode matching is clearly improved by the neighbor extension.

Table 4 Average hit rate of the theoretical optimal mode in function of q

In fact, this extension improves the average hit rate by 3.58, 4.22, 4.28, and 4.24% for q values of 1.0, 1.3, 1.5, and 2.0, respectively. The results also show that the best hit rate is obtained for a q value of around 1.5, reached with the bonus couple (3;2). Thus, in the remainder of this paper, we continue working with these bonus values.
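The neighbor extension can be sketched as follows, again assuming flat lists of per-pixel modes and magnitudes for one PU. The function name and the explicit bounds `first_mode` and `last_mode` are illustrative choices; the default bonuses are the couple (3;2) retained above.

```python
def extended_costs(pu_modes, pu_magnitudes, b_m=3, b_n=2,
                   first_mode=2, last_mode=34):
    """Accumulate per-mode costs with the neighbor extension.

    Each pixel adds (1 + M) * b_m to its detected mode and (1 + M) * b_n
    to the adjacent angular modes, when those neighbors exist.
    """
    costs = {m: 0 for m in range(first_mode, last_mode + 1)}
    for mode, magnitude in zip(pu_modes, pu_magnitudes):
        base = 1 + magnitude
        costs[mode] += base * b_m
        for neighbor in (mode - 1, mode + 1):
            if first_mode <= neighbor <= last_mode:
                costs[neighbor] += base * b_n
    return costs
```

In this sketch, multiplying both bonuses by the same factor scales every cost by that factor, so the ordering of the angular modes depends only on the quotient q.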

4 Fast intra mode decision

As mentioned before, all 35 modes are tested in the RMD stage through Hadamard transform encoding in order to choose the best modes for the current PU. Here, the idea is to select the most probable modes in order to limit the number of modes to be tested and so speed up the intra prediction process. The histogram generated for each PU gives the cost value \( \mathrm{Cost}_m \) of each intra mode m.

The value \( \mathrm{Cost}_m \) reflects a kind of probability that the intra mode m is the theoretical optimal mode for the current PU: the higher the value of \( \mathrm{Cost}_m \), the more probable it is that the intra mode m matches the optimal mode. Therefore, instead of going through all the modes, only a limited list of modes is investigated. We refer to this list as the gradient candidate set, \( {\psi}_i^G \), where 0 ≤ i ≤ \( N_G \), \( N_G \) being the number of gradient modes retained for the current PU. The gradient modes are ordered from most probable to least probable in the candidate set. The gradient-generated modes are more precise for larger PU sizes, as more points are available to approximate the most representative gradient in the PU. Thus, the number of modes \( N_G \) has to be set accordingly. We set this number to 15, 14, 8, 6, and 5 for PU sizes of 4 × 4, 8 × 8, 16 × 16, 32 × 32, and 64 × 64, respectively, as we noticed that these settings give a good tradeoff between time saving and encoding performance.

The best modes obtained through the RMD process form the RMD candidate set, referred to as \( {\psi}_i^R \), where 0 ≤ i ≤ \( N_R \), \( N_R \) being the number of modes. We keep the number of modes \( N_R \) as it is set in HM14.0, i.e., 8, 8, 3, 3, and 3 for PU sizes of 4 × 4, 8 × 8, 16 × 16, 32 × 32, and 64 × 64, respectively.
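The two size-dependent settings can be captured in small look-up tables, as in the sketch below. The table values are those stated above; the function `gradient_candidates` and the choice of ignoring modes with a zero cost are assumptions made for illustration.

```python
# Number of gradient modes kept per PU size (N_G) and RMD modes (N_R).
N_G = {4: 15, 8: 14, 16: 8, 32: 6, 64: 5}
N_R = {4: 8, 8: 8, 16: 3, 32: 3, 64: 3}


def gradient_candidates(costs, pu_size):
    """Angular modes ordered from most to least probable, truncated to N_G.

    Assumption: modes that never received any cost are not candidates.
    """
    ranked = sorted((m for m in costs if costs[m] > 0),
                    key=lambda m: (-costs[m], m))
    return ranked[:N_G[pu_size]]
```

Only the modes returned by such a function, together with the DC and planar modes, would then need to go through the Hadamard-based RMD stage.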

In order to speed up the RDO process, the heaviest stage of the intra prediction, we propose to reduce the number \( N_R \) even further for PU sizes of 8 × 8 and 4 × 4, based on how well the gradient stage detects the theoretical optimal mode. We reduce \( N_R \) according to different confidence scenarios. These scenarios are set by comparing the candidate set \( \psi^G \), the result of the gradient stage, to the candidate set \( \psi^R \), the result of the RMD stage. The idea relies on the hypothesis that the more similar the two results are, the closer the gradient stage is to the theoretical optimal mode. The scenarios are listed below, ordered according to the probability of matching the optimal mode:

  • Scenario 1: the best RMD mode is DC (i.e., the DC mode is better than all angular modes. Since the DC mode has a high probability of being the best mode, extensive testing of angular modes is not worthwhile).

  • Scenario 2: the best RMD mode is planar. (The RMD performance in detecting planar is relatively low, so the number \( N_R \) should be reduced carefully.)

  • Scenario 3: the best three RMD modes are matching the three best gradient modes.

  • Scenario 4: the best RMD mode is the best gradient mode.

  • Scenario 5: the best RMD mode and best gradient mode are neighbors.

The algorithm below summarizes how the number of modes \( N_R \) is reduced based on the scenarios:

figure a
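As an illustration of how the five scenarios listed above could be tested in order, the sketch below classifies a pair of ordered candidate lists. It is a hedged reading of the scenario definitions, not the algorithm of the figure: the function name is hypothetical, "matching" in Scenario 3 is assumed to mean the same three modes in any order, "neighbors" in Scenario 5 is assumed to mean mode indices differing by one, and the reduced value of \( N_R \) attached to each scenario is left to the caller.

```python
PLANAR, DC = 0, 1  # HEVC intra mode indices for the non-angular modes


def confidence_scenario(rmd_modes, gradient_modes):
    """Return the first matching scenario number (1..5), or None.

    Both arguments are lists ordered from best to worst candidate.
    """
    best_rmd = rmd_modes[0]
    if best_rmd == DC:
        return 1
    if best_rmd == PLANAR:
        return 2
    if (len(rmd_modes) >= 3 and len(gradient_modes) >= 3
            and set(rmd_modes[:3]) == set(gradient_modes[:3])):
        return 3
    if gradient_modes and best_rmd == gradient_modes[0]:
        return 4
    if gradient_modes and abs(best_rmd - gradient_modes[0]) == 1:
        return 5
    return None
```

A caller would map the returned scenario to a reduced number of RMD candidates and keep the default \( N_R \) when no scenario applies.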

5 CU coding

In HEVC, a split decision has to be made for each CU of depth d and size 2N × 2N. This decision evaluates whether encoding the CU at that depth is preferable to encoding the four sub-CUs of depth d + 1 and size N × N.

So each CU is first encoded to generate a rate-distortion cost of a no split coding:

$$ {J}_{\mathrm{NoSplit}}=D+\lambda \cdot R, $$

where D is the distortion generated by the CU encoding, λ is a Lagrangian multiplier and R is the bit consumption.

The four sub-CUs are then encoded to generate four RD costs \( J_i \). The generated \( J_{\mathrm{NoSplit}} \) is compared to \( J_{\mathrm{Split}} \), the sum of the four sub-CU costs \( J_i \):

$$ {J}_{\mathrm{Split}}={\displaystyle {\sum}_{i=1}^4}{J}_i. $$

The split decision is made based on the smallest cost, and the process is repeated for all the supported depth levels d, where d = 0, 1, 2, or 3, so that an optimal CU structure is generated in the RD sense.
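The exhaustive decision process can be sketched as a recursion over the quadtree. In the sketch below, `cost_fn` is a hypothetical callback standing in for the full encoding of one CU and returning its RD cost, and `min_size` is an assumed smallest CU size at which the recursion stops.

```python
def best_partition(x, y, size, cost_fn, min_size=8):
    """Return (cost, structure) of the cheapest quadtree under this CU.

    structure is (x, y, size) for an unsplit CU, or a list of four
    sub-structures when splitting gives a strictly smaller cost.
    """
    j_no_split = cost_fn(x, y, size)
    if size <= min_size:
        return j_no_split, (x, y, size)
    half = size // 2
    children = [best_partition(x + dx, y + dy, half, cost_fn, min_size)
                for dy in (0, half) for dx in (0, half)]
    j_split = sum(cost for cost, _ in children)
    if j_split < j_no_split:
        return j_split, [structure for _, structure in children]
    return j_no_split, (x, y, size)
```

Every call to `cost_fn` corresponds to a complete CU encoding, which is why skipping the sub-CU evaluations whenever a no-split decision can be predicted saves encoding time.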

To deal with this complexity, the scheme proposed in this section predicts the no-split decisions, which avoids the unnecessary encoding of sub-CUs.

5.1 Gradient-based scheme

To approach the optimal CU split decision, the proposed scheme estimates the spatial texture complexity of each CU and its sub-CUs. The idea relies, on the one hand, on the hypothesis that a detailed texture area suggests small CU sizes, which favors split decisions. On the other hand, a flat area suggests large CU sizes, which favors no split decisions. The complexity estimate is derived from the pixel-based gradient values computed at the pre-processing stage.

We present in this section the design of the split decision adopted in the proposed scheme. Table 5 presents the number of possible CUs to be tested in one LCU of size 64 × 64. As we can see from the table, the split decisions to be investigated at depths 2 and 3 represent almost all the cases. Moreover, the split decisions at lower depths are much more sensitive than those at higher depths. In fact, a no split decision implies that no further decisions will be investigated for the sub-CUs, so a no split decision has to be particularly careful at lower depths. For these reasons, we limit our early split decision to depths 2 and 3 only, i.e., at depths 0 and 1, both the split and no split options are investigated. To estimate the optimal decision, we consider the texture complexity T of the current CU, represented by the median of the 2N × 2N gradient magnitude values of the current unit.

Table 5 Number of possible CUs depending on the size

We notice that, for some CUs, the texture complexity measured by T fails to approach the optimal decision. In fact, some CUs have a low overall texture but relatively different texture complexities in their four sub-CUs. In such a case, a split structure would generate a better RD cost. Therefore, in addition to T, we estimate the texture complexity of each of the four N × N sub-CUs by considering T i, the median of the N × N gradient magnitude values in the ith sub-CU (1 ≤ i ≤ 4).

To estimate the texture difference among the sub-CUs, the proposed scheme uses the texture variation, defined as:

$$ V=\frac{1}{4}\ {\displaystyle {\sum}_{i=1}^4}\left|T-{T}_i\right|. $$

Considering both the texture magnitude and texture variation, we set the split criterion as follows:

$$ \mathrm{SpC}=\alpha T+\beta V, $$

where α and β are two weighting factors.

An early no split decision occurs when the following inequality holds:

$$ \mathrm{SpC}<{T}_d, $$

where T d is a threshold that depends on the depth d of the current CU.
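As an illustration, the criterion can be computed from the gradient magnitudes of one CU as follows. This is a minimal sketch under stated assumptions: the CU is passed as a square list of rows of gradient magnitudes with an even side length, and the function names `split_criterion` and `early_no_split` are hypothetical.

```python
from statistics import median
from typing import Sequence

Block = Sequence[Sequence[float]]  # square block of gradient magnitudes


def split_criterion(grad_mag: Block, alpha: float, beta: float) -> float:
    """Compute SpC = alpha * T + beta * V for one 2N x 2N CU."""
    n = len(grad_mag) // 2
    # T: median gradient magnitude over the whole CU.
    t = median(value for row in grad_mag for value in row)
    # T_i: median gradient magnitude of each N x N sub-CU.
    t_sub = [
        median(grad_mag[r][c]
               for r in range(r0, r0 + n) for c in range(c0, c0 + n))
        for r0 in (0, n) for c0 in (0, n)
    ]
    # V: mean absolute deviation of the sub-CU medians from T.
    variation = sum(abs(t - t_i) for t_i in t_sub) / 4.0
    return alpha * t + beta * variation


def early_no_split(grad_mag: Block, alpha: float, beta: float,
                   threshold: float) -> bool:
    """True when SpC < T_d, i.e. the sub-CUs need not be encoded."""
    return split_criterion(grad_mag, alpha, beta) < threshold
```

A perfectly flat CU gives V = 0, so the criterion reduces to αT, while a CU whose four quadrants differ strongly is penalized through V even when its overall median T is low.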

5.2 CU coding performance

In this section, we evaluate the performance of the proposed split scheme and refine the adopted criteria based on this performance. For that purpose, we compare the new scheme performance to the case of a theoretical optimal decision. The optimal decision, obtained by encoding the current CU twice (without split and with a split), represents the encoding case with the smallest RD cost.

To evaluate the proposed scheme, we need to consider two key aspects that have a direct impact on the coding efficiency.

  • No split matching rate (NS): this rate represents the cases in which we obtain a no split decision through the proposed criterion while the optimal decision is also a no split. In such cases, the proposed scheme succeeds in speeding up the encoding by avoiding unnecessary encoding of the sub-CUs, without any loss in RD performance.

  • Split error rate (SE): this rate represents the cases of a no split decision while the optimal decision is a split. Such cases speed up the encoding but at the cost of an RD loss.
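The two rates can be counted from pairs of predicted and optimal decisions, as in the following sketch. The text does not state the denominator, so this example assumes that both rates are expressed as fractions of all evaluated CUs; the function name `ns_se_rates` is hypothetical.

```python
from typing import Iterable, Tuple


def ns_se_rates(decisions: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (NS, SE) from (predicted_no_split, optimal_no_split) pairs.

    Assumption: both rates are fractions of all evaluated CUs.
    """
    total = ns = se = 0
    for predicted_no_split, optimal_no_split in decisions:
        total += 1
        if predicted_no_split and optimal_no_split:
            ns += 1  # correct early termination, no RD loss
        elif predicted_no_split and not optimal_no_split:
            se += 1  # early termination where a split was optimal
    if total == 0:
        return 0.0, 0.0
    return ns / total, se / total
```

Sweeping the threshold and recording the resulting (NS, SE) pairs gives one curve per choice of α and β.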

To investigate the impact of the two factors T and V involved in the split criterion SpC, we present below the NS-SE relation for different values of α and β (0, 1, and 2) and different values of the threshold T d.

From Fig. 6, in which we present the case of CUs with a depth of 3, we notice that involving the two factors T and V achieves better results than considering only one of the two factors.

Fig. 6 Example case of NS (no split match)/SE (split error) function, depth 3

We also notice that we do not really need to favor either of the two factors in the criterion, which can thus be simplified as:

$$ {\mathrm{SpC}}_{d=3}=T+V $$

For the case of depth 2 presented in Fig. 7, we notice that the best performance is obtained when considering only the texture variation. This can be explained by the fact that T is a relatively global metric, which is more suitable for small CU sizes. In this case, the variation becomes a more precise metric for the split decision than the texture itself. So for depth 2, we adopt the following criterion:

$$ {\mathrm{SpC}}_{d=2}=V $$

Fig. 7 Example case of NS (no split match)/SE (split error) function, depth 2

For the threshold T d, we choose the values 65 and 2.2 for depths 3 and 2, respectively, as we notice that the algorithm achieves favorable results with these values.
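The depth-dependent settings can be collected in a small lookup, as in the sketch below. The weights and thresholds are the values given in the text; the table layout and the function name `early_no_split_at_depth` are assumptions made for illustration, and T and V are expected to be computed beforehand from the gradient magnitudes.

```python
# depth -> (alpha, beta, threshold T_d), using the values given in the text.
SPLIT_PARAMS = {
    2: (0.0, 1.0, 2.2),   # SpC = V
    3: (1.0, 1.0, 65.0),  # SpC = T + V
}


def early_no_split_at_depth(depth: int, t: float, v: float) -> bool:
    """Return True when the early no split decision applies to this CU."""
    params = SPLIT_PARAMS.get(depth)
    if params is None:
        # Depths 0 and 1: both split and no split are always evaluated.
        return False
    alpha, beta, threshold = params
    return alpha * t + beta * v < threshold
```

For example, a depth-3 CU with T = 40 and V = 20 gives SpC = 60, which is below 65, so its sub-CUs would be skipped.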

6 Results and discussion

In this section, we are interested in the overall effect of the proposed features on the encoding performance. For this purpose, the proposed algorithm was integrated into HM 14.0, and simulations were performed in conformance with the common test conditions specified in [30]. To measure the effect of the algorithm on encoding time, we consider the time saving:

$$ \varDelta T=\left({T}_{\mathrm{HM}14}-{T}_{\mathrm{Prop}}\right)/{T}_{\mathrm{HM}14}, $$

where T HM14 is the encoding time of HM14.0 and T Prop is that of the proposed solution integrated into HM14.0.
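For concreteness, the time saving can be evaluated as in this short sketch. The function name `time_saving` and the encoding times in the usage note are hypothetical and are not measurements from the experiments.

```python
def time_saving(t_hm14: float, t_prop: float) -> float:
    """Return the relative time saving (T_HM14 - T_Prop) / T_HM14."""
    if t_hm14 <= 0:
        raise ValueError("the reference encoding time must be positive")
    return (t_hm14 - t_prop) / t_hm14
```

With a hypothetical reference time of 200 s and a proposed-encoder time of 100 s, `time_saving(200.0, 100.0)` returns 0.5, i.e., a time saving of 50%.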

As the implemented features mainly concern intra coding, we present the results for all-intra (AI) coding. We set the number of modes in the candidate set tested in the RMD to 15, 14, 8, 6, and 5 for the PU sizes of 4 × 4, 8 × 8, 16 × 16, 32 × 32, and 64 × 64, respectively. For the RDO, we kept the numbers of tested modes as defined in the HM (8, 8, 3, 3, and 3, accordingly).

We present, in Table 6, the BD-rate (BDBR) [31] and time saving performance of the gradient solution relative to HM14.0. The table shows that the proposed algorithm achieves a time saving for all the sequences, with an average value of 42.8%, at the cost of a BD-rate increase of 1.1%.

Table 6 Encoding efficiency for all intra coding, with a bit depth of 8

To further evaluate the proposed algorithm, we compare it to the work of Jiang et al. [21], a Sobel-based gradient algorithm, which we refer to as SG. To properly measure the effect of each of the proposed features, we present in Table 7 different combinations of the SG algorithm with the proposed features.

Table 7 Performance comparison

We note here that Jiang et al. used different RMD and RDO iteration numbers. In our simulations, for a like-for-like comparison, we use the iteration numbers specified earlier for all the configurations. From the results, we see that SG achieves an average time reduction of 11.6% with a BD-rate increase of 0.6%. These results show a smaller time saving than reported in [21], but with a smaller BD-rate loss. The difference is mainly due to the different iteration numbers used here, and also to the fact that the intra prediction implementation in HM 4.0, used by Jiang et al., differs from that in HM14.0. For example, intra prediction now supports 35 modes for all PU sizes, unlike HM4.0, which supports 3 modes for 64 × 64 PUs.

The combination of SG with our gradient stage optimizations, referred to as the optimized Sobel-based gradient algorithm (OSG), achieves almost the same complexity reduction as the SG configuration, with 11.0%. This configuration gives a BD-rate increase of only 0.3%. This result shows that the optimizations enhance the gradient-based detection of HEVC intra modes and recover around 0.3% in BD-rate. Additionally, we report the performance of the combination of SG with the optimizations as well as the fast RDO feature. This combination, referred to as the fast optimized Sobel gradient algorithm (FOSG), reaches 31.8% in time saving with a BD-rate increase of 0.9%. Such an algorithm thus shows how the gradient information can be exploited to avoid unnecessary processing.

In addition to these combinations, we consider additional configurations for the performance evaluation, since the proposed algorithm deals with both the intra mode decision and the CU decision. The first configuration includes only the intra mode decision (MD) and is denoted Prop-MD.

The second configuration includes only the CU split decision algorithm and will be named Prop-Splt. The configuration combining these two aspects will be noted as Prop-Overall.

As we can see from the results table, the configuration Prop-MD gives 31.8% in time saving with 0.9% in BD-rate. Comparing this result to that of FOSG confirms that the Prewitt operator offers better intra mode detection, and thus better encoding efficiency, than the Sobel operator. This confirms the advantage of a pre-processing solution based on the Prewitt operator, which in addition offers a more hardware-friendly implementation, with better options for multiple data operations.

The configuration Prop-Splt, which presents a solution for CU coding, gives an average time reduction of 31.0% with a BD-rate increase of 0.7%. Finally, the configuration Prop-Overall, combining both the intra MD and the CU coding, presents a time reduction of 42.8% with an average BD-rate increase of 1.1%.

We note here that the execution time profiling computed according to Eq. (25) aims to estimate the complexity reduction at the prediction stage compared to the Hadamard transform-based prediction used in HM. It does not aim to estimate the execution time effects of the two operators at the pre-processing stage, because the pre-processing stage accounts for only about 2% of the whole HM intra encoding.

7 Conclusions

This paper has presented a pixel-based gradient pre-processing stage for HEVC intra coding. The proposed algorithm uses Prewitt as a discrete differentiation operator to approximate the gradient values on the original picture. The algorithm generates a preferred direction for each pixel in each PU, from which we select a candidate set of modes to be tested at the rate-distortion optimization level. The mode selection is optimized through neighbor mode extension and an adapted cost function that takes into account both the most frequently occurring modes and those with higher gradient magnitudes. Moreover, we exploit the gradient information to speed up the search for the best intra mode. For that purpose, we rely on different probability scenarios to limit the modes to be tested to only the most probable ones. In addition to the intra mode decision, we propose a gradient-based CU split scheme in which we set criteria to measure the texture complexity of each CU. The results show that the proposed algorithm achieves a time saving of 42.8% with an average increase in BD-rate of just 1.1%.

As the proposed gradient pre-processing stage shows promising performance, we intend to further optimize the solution for real-time hardware applications. In fact, we are finalizing an investigation that completely avoids the pixel-based search of the intra mode in the look-up table presented in Section 3.2, which is the heaviest step of the pre-processing stage.



Abbreviations

  • AI: All intra

  • BD-rate: Bjontegaard delta rate

  • CTB: Coding tree block

  • CU: Coding unit

  • FOSG: Fast optimized Sobel gradient algorithm

  • HEVC: High Efficiency Video Coding

  • HM: HEVC test model

  • JCT-VC: Joint Collaborative Team on Video Coding

  • LCU: Largest coding unit

  • MPEG: ISO/IEC Moving Picture Experts Group

  • MPM: Most probable mode

  • NS: No split matching rate

  • OSG: Optimized Sobel-based gradient algorithm

  • PU: Prediction unit

  • RD: Rate distortion

  • RDOQ: Rate-distortion optimized quantization

  • RMD: Rough mode decision

  • RQT: Residual quadtree

  • SATD: Sum of absolute transform difference

  • SE: Split error rate

  • TU: Transform unit

  • VCEG: ITU-T Video Coding Experts Group


References

  1. GJ Sullivan, JR Ohm, WJ Han, T Wiegand, Overview of the high efficiency video coding (HEVC) standard. IEEE Trans Circuits Syst Video Technol 22(12), 1649–1668 (2012). doi:10.1109/TCSVT.2012.2221191

  2. B Bross, WJ Han, JR Ohm, GJ Sullivan, YK Wang, T Wiegand, High Efficiency Video Coding (HEVC) Text Specification Draft 10, in Doc. JCTVC-L1003 (rev. 37), JCTVC 13th Meeting of Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 2013

  3. J Lainema, F Bossen, WJ Han, J Min, K Ugur, Intra coding of the HEVC standard. IEEE Trans Circuits Syst Video Technol 22(12), 1792–1801 (2012). doi:10.1109/TCSVT.2012.2221525

  4. C Yan, Y Zhang, J Xu, F Dai, L Li, Q Dai, F Wu, A highly parallel framework for HEVC coding unit partitioning tree decision on many-core processors. IEEE Signal Process Lett 21(5), 573–576 (2014). doi:10.1109/LSP.2014.2310494

  5. C Yan, Y Zhang, J Xu, F Dai, J Zhang, Q Dai, F Wu, Efficient parallel framework for HEVC motion estimation on many-core processors. IEEE Trans Circuits Syst Video Technol 24(12), 2077–2089 (2014). doi:10.1109/TCSVT.2014.2335852

  6. C Yan, Y Zhang, F Dai, X Wang, L Li, Q Dai, Parallel deblocking filter for HEVC on many-core processor. Electron Lett 50(5), 367–368 (2014). doi:10.1049/el.2013.3235

  7. C Yan, Y Zhang, F Dai, J Zhang, L Li, Q Dai, Efficient parallel HEVC intra-prediction on many-core processor. Electron Lett 50(11), 805–806 (2014). doi:10.1049/el.2014.0611

  8. S Wang, A Rehman, K Zeng, Z Wang, SSIM-inspired Two-pass Rate Control for High Efficiency Video Coding (IEEE International Workshop on Multimedia Signal Processing (MMSP), Xiamen, 2015), pp. 19–21

  9. H Sun, D Zhou, S Goto, A Low-complexity HEVC Intra Prediction Algorithm Based on Level and Mode Filtering (IEEE International Conference on Multimedia and Expo (ICME), Melbourne, 2012), pp. 9–13

  10. H Lei, Z Yang, Fast Intra Prediction Mode Decision for High Efficiency Video Coding (2nd International Symposium on Computer, Communication, Control and Automation, Singapore, 2013). doi:10.2991/3ca-13.2013.9

  11. H Zhang, Z Ma, in 13th Pacific-Rim Conference on Multimedia, Singapore, December, 2012. Lecture notes in artificial intelligence, ed. by W Lin, D Xu, A Ho, J Wu, Y He, J Cai, M Kankanhalli, MT Sun, vol. 1114 (Springer, Heidelberg, 2012), p. 157

  12. TD Silva, LV Agostini, LADS Cruz, Fast HEVC Intra Prediction Mode Decision Based on Edge Direction Information (European Signal Processing Conference (Eusipco), Bucharest, 2012), pp. 27–31

  13. SL Shen, Z Liu, X Zhang, W Zhao, Z Zhang, An effective cu size decision method for HEVC encoders. IEEE Trans on Multimedia 15(2), 465–470 (2013). doi:10.1109/TMM.2012.2231060

  14. X Shen, L Yu, J Chen, Fast Coding Unit Size Selection for HEVC Based on Bayesian Decision Rule (Picture Coding Symposium (PCS), Krakow, 2012), pp. 7–9

  15. H Zhang, Z Ma, Early Termination Schemes for Fast Intra Prediction in High-Efficiency Video Coding (IEEE International Symposium on Circuits and Systems (ISCAS), Melbourne, 2013), pp. 1–5

  16. H Zhang, Z Ma, Fast intra mode decision for high efficiency video coding (HEVC). IEEE Trans Circuits Syst Video Technol 24(4), 660–668 (2014). doi:10.1109/TCSVT.2013.2290578

  17. Y Kim, D Jun, S Jung, JS Choi, J Kim, A fast intra-prediction method in HEVC using rate-distortion estimation based on Hadamard transform. ETRI J 35(2), 270–280 (2013). doi:10.4218/etrij.12.0112.0223

  18. AC Tsai, A Paul, JC Wang, JF Wang, Intensity gradient technique for efficient intra-prediction in H.264/AVC. IEEE Trans Circuits Syst Video Technol 18(5), 694–698 (2008). doi:10.1109/tcsvt.2008.919113

  19. Y Zhang, Z Li, B Li, Gradient-based Fast Decision for Intra Prediction in HEVC (IEEE Visual Communications and Image Processing (VCIP), San Diego, 2012), pp. 27–30

  20. F Pan, X Lin, S Rahardja, K Lim, Z Li, D Wu, S Wu, Fast mode decision algorithm for intra prediction in H.264/AVC video coding. IEEE Trans Circuits Syst Video Technol 15(7), 813–822 (2005). doi:10.1109/TCSVT.2005.848356

  21. W Jiang, H Ma, Y Chen, Gradient Based Fast Mode Decision Algorithm for Intra Prediction in HEVC (International Conference on Consumer Electronics, Communications and Networks (CECNet), Yichang, 2012), pp. 21–23

  22. M Jamali, S Coulombe, F Caron, Fast HEVC Intra Mode Decision Based on Edge Detection and SATD Costs Classification (Data Compression Conference (DCC), Snowbird, 2015), pp. 7–9

  23. A BenHajyoussef, T Ezzedine, A Bouallegue, Fast gradient based intra mode decision for high efficiency video coding. Int J Emerg Trends Technol Comput Sci 3(3), 223–228 (2014)

  24. A BenHajyoussef, T Ezzedine, A Bouallegue, Optimized Intra Mode Decision for High Efficiency Video Coding (International Conference on Image Analysis and Processing (ICIAP), Genoa, 2015), pp. 7–11

  25. B Bross, WJ Han, JR Ohm, GJ Sullivan, T Wiegand, WD4: Working Draft 4 of High-Efficiency Video Coding, in Doc. JCTVC-F803_d6 (rev. 6), JCTVC 6th Meeting of Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 2011

  26. HEVC reference model. Accessed 06 Aug 2016

  27. Y Piao, J Min, J Chen, Encoder Improvement of Unified Intra Prediction, Doc. JCTVC-C207 (JCTVC 3rd Meeting, Guangzhou, 2010), pp. 7–15

  28. L Zhao, L Zhang, X Zhao, Further Encoder Improvement of Intra Mode Decision (Doc. JCTVC-D283, in JCTVC 4th Meeting, Daegu, 2011), pp. 20–28

  29. L Zhao, L Zhang, S Ma, D Zhao, Fast Mode Decision Algorithm for Intra Prediction in HEVC (Visual Communications and Image Processing (VCIP), Tainan City, 2011), pp. 6–9

  30. F Bossen, Common HM Test Conditions and Software Reference Configurations (Doc. JCTVC-L1100, in JCTVC 13th Meeting, Genova, 2013), pp. 14–23

  31. G Bjontegaard, Calculation of Average PSNR Differences Between R-D Curves (Doc. VCEG-M33, in ITU-T VCEG 13th Meeting, Austin, 2001), pp. 2–4


No funding sources were available for these research works.

Authors’ contributions

ABH, TE, and AB conceived and designed the research. ABH performed the experiments. ABH and TE analyzed the data. ABH and TE wrote and edited the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Author information


Corresponding author

Correspondence to Anis BenHajyoussef.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

BenHajyoussef, A., Ezzedine, T. & Bouallègue, A. Gradient-based pre-processing for intra prediction in High Efficiency Video Coding. J Image Video Proc. 2017, 9 (2017).
