Gradient-based preprocessing for intra prediction in High Efficiency Video Coding
EURASIP Journal on Image and Video Processing volume 2017, Article number: 9 (2017)
Abstract
To reach higher coding efficiency than its predecessor, the state-of-the-art video compression standard High Efficiency Video Coding (HEVC) relies on many improved coding tools and sophisticated techniques. These new features achieve significant coding efficiency, but at the cost of a large increase in implementation complexity. This complexity has increased the need of HEVC encoders for fast algorithms and hardware-friendly implementations. Encoders have to make the various encoding decisions under real-time constraints while preserving coding efficiency. To reduce the encoding complexity, HEVC encoders therefore rely on look-ahead mechanisms and preprocessing solutions. In this context, we propose a gradient-based preprocessing stage. We investigate in particular the Prewitt operator for generating the gradient, and we propose approaches that improve how well the gradient detects the HEVC intra modes. We also define different probability scenarios, based on the gradient information, to speed up the mode search. Moreover, we propose a gradient-based estimation of the texture complexity, which we use for the coding unit decision. Results show that the proposed algorithm reduces the encoding time by 42.8% with an increase in BD-rate of only 1.1%.
Introduction
Since the emergence of the H.264/AVC standard in particular, significant progress has been made in video applications. This progress has led to an increasing need for better video quality and higher compression, especially for applications and services dealing with high and ultra-high resolutions.
In this context, the Joint Collaborative Team on Video Coding (JCT-VC), a team of experts from the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), standardized in 2013 a state-of-the-art video coding standard, HEVC [1, 2]. The architecture of the new standard keeps the same high-level design as its predecessor. However, HEVC relies on many improved coding tools and techniques that offer higher coding efficiency at the cost of more encoding complexity.
The block structure is one of the most important new features contributing to this complexity, and it directly affects all the other features. HEVC relies on a coding tree block (CTB) structure. Unlike the 16 × 16 macroblock of AVC, the largest coding unit (LCU) defined in HEVC allows block sizes from 8 × 8 up to 64 × 64. The LCU can be partitioned, following a quadtree structure, into coding units (CUs), where each CU can be recursively partitioned into four sub-CUs. Once the CU partitioning is fixed, each CU can be split, at the prediction stage, into one or more prediction units (PUs) [2, 3]. Moreover, at the transform stage, each CU can be recursively split into one or more transform units (TUs). Figure 1 illustrates the possible recursive splits of a CU in the intra coding case, on which we focus in this work.
At the intra prediction level, HEVC supports 35 prediction modes, as can be seen in Fig. 2. In addition to the DC and planar modes, HEVC supports 33 angular modes, far more than the maximum of eight angular modes offered by H.264/AVC. Furthermore, the new standard allows deriving a “most probable mode” from neighboring blocks. For the chroma component, the same mode as for luma can be used. Moreover, HEVC supports additional reference sample smoothing as well as boundary smoothing.
All these sophisticated prediction features offer better coding efficiency, but at the cost of significant complexity on the encoder side. HEVC encoders therefore face a real challenge in speeding up the encoding process, especially the mode decisions, while paying close attention to encoding efficiency.
Many approaches have been studied to speed up the encoding decisions. Among them, many-core processing, which relies on parallelizing the encoding algorithms, is a good alternative. Many works have sped up individual encoding processes such as the coding unit partitioning [4], the motion estimation [5], the HEVC deblocking filter [6], and the intra prediction [7]. Another approach relies on multi-pass processing. For example, in [8], Wang et al. proposed a two-pass rate control algorithm. Since the feasibility of multi-pass processing depends strongly on the application constraints, it has become important for HEVC encoders to also rely on look-ahead and preprocessing solutions.
In this work, the large number of intra modes supported by HEVC motivates us to investigate a pixel gradient-based preprocessing stage that operates on the original frame. We are interested in intra coding, covering both the mode decision and the CU coding.
Many works have addressed these aspects. In [9], a fast preprocessing stage is proposed to generate estimates of the RD costs. Because it operates on the original frame instead of the reconstructed one, the preprocessing reduces the data dependency on the reconstruction loop. The generated data is then used to reduce the number of tested prediction unit levels as well as the number of tested intra modes.
In [10, 11], a downsampling approach is applied to the CU in order to reduce the prediction-related computation. The downsampled prediction is coupled with a progressive search to reduce the number of intra candidate modes. In [12], the authors categorize the edge directions into five groups by applying different types of differences to the pixel values. A dominant edge direction is generated for each PU and used to reduce the number of intra modes to be evaluated. Shen et al. [13] used the spatial correlation between neighboring CUs to speed up the CU split decision and to terminate the motion estimation early. A Bayesian rule-based approach was proposed in [14]: the CU split decision is formulated as a classification problem for which a probability density function is estimated, and the Bayesian risk is minimized in order to approach the optimal CU split decision. The works [15–17] rely on the correlation between the intra RD cost and its estimate based on the Hadamard transform for early termination of the intra mode and CU coding decisions.
Regarding the gradient-based approach, which is of particular interest here, many works have been proposed in video coding, and they can be categorized into two main classes. The first class covers works that generate gradient information by computing differences over the pixel blocks. Such work was conducted by Tsai et al. [18] for H.264 intra prediction, and a similar approach was proposed by Yongfei [19] for HEVC intra prediction. The second class covers works in which a differential operator is used to approximate the mathematical gradient values, such as [20], where Pan et al. proposed to measure the edge directions, at a preprocessing level, with the Sobel operator. The generated gradient information is then used to predict the H.264 intra modes. A similar approach, using the same operator, was proposed by Jiang [21] for HEVC intra prediction. A more recent, similar work [22] additionally considers the gaps between the values of the sums of absolute transformed differences (SATD) in order to eliminate less probable modes from the prediction process.
In this work, we focus on the latter class, as it offers a mathematical derivation of the gradient direction, which is an interesting way of taking advantage of the large number of HEVC angular intra modes. As we are particularly interested in reducing the implementation complexity compared to [21, 22], we focus on the operator used for the gradient computation. The Sobel operator is widely used in gradient-based intra prediction works, and more generally in many image and video algorithms and applications, mainly because of its strong edge detection performance. Here, we compare its performance in detecting the HEVC intra directions with that of the Prewitt operator. This comparison is motivated by the fact that the Prewitt operator has simpler coefficients, which can considerably reduce the implementation complexity of a gradient solution. In [23], we conducted a preliminary study using the Prewitt operator with granular pixel coverage, toward a better understanding of the impact of gradient operators on HEVC video coding. We also presented a pixel neighbor extension of the gradient values to enhance the performance of the intra mode detection. In [24], we investigated the two-dimensional Roberts operator to simplify the gradient computation even further. In addition, we considered the number of appearances of each mode as well as the gradient magnitude in order to optimize the performance of the intra mode detection.
In this work, we extend these approaches to present a complete preprocessing solution for intra coding. To speed up the search for the optimal intra mode, we exploit the gradient information generated at the preprocessing stage to limit the modes to be tested to the most probable ones, based on different probability scenarios. Moreover, we propose a gradient-based scheme for the CU intra split decision. For this purpose, we propose an approach to measure the texture complexity depending on the CU size.
We take the work of Jiang as the baseline gradient solution for HEVC. Jiang worked on HM4.0 [21, 25], but since then some features of the intra prediction design have changed. For example, unlike HM4.0, which supports three modes for 64 × 64 PUs, the HEVC standard supports 35 modes, as will be detailed in a later section of this paper. Therefore, in this work, we test the gradient-based approach on the recently adopted HEVC intra prediction design.
The remainder of this paper is organized as follows. Section 2 presents the experimental methods. Section 3 gives an overview of the HEVC intra prediction algorithm as well as the proposed gradient-based intra prediction. It also presents the proposed optimization approaches, which deal with a preselection of intra modes and an optimized mode selection at the PU level. In Section 4, we present an approach for speeding up the intra prediction based on the gradient information. Section 5 presents the proposed schemes for the CU split decision. Section 6 then presents the experimental results of the proposed algorithm. Finally, we present the conclusions in Section 7.
Experimental methods
The aim is to measure the impact of the proposed solution on video coding efficiency as well as on coding time. For that purpose, the proposed algorithm was integrated into the HEVC test model (HM) version 14.0. Simulations were performed according to the common test conditions specified in [30]. As the implemented features mainly concern intra coding, we present the results for the all-intra (AI) configuration. We used the test video sequences of classes A to E. To measure the coding efficiency, we report the Bjontegaard delta rate (BD-rate) [30]. This metric represents the average difference between the original rate-distortion curve and the one obtained after integrating the proposed features. The rate-distortion curves are obtained by coding each test sequence at four different QPs: 22, 27, 32, and 37. We measure the coding time saving according to Eq. (25), using \( T_{\mathrm{HM14}} \), the encoding time of HM14.0, and \( T_{\mathrm{Prop}} \), the encoding time obtained after integrating the proposed solution into HM14.0.
HEVC intra prediction
Overview of HEVC intra prediction
To speed up the intra prediction process, HM [26] adopts a simplified intra prediction algorithm. As presented in Fig. 3, the algorithm goes through four stages for each PU. In the first stage, referred to as the rough mode decision (RMD), the HM applies a Hadamard transform, for each possible PU size and for all 35 intra modes, to generate for each combination the sum of absolute transformed differences (SATD) [27].
The SATD is used to estimate the rate-distortion (RD) cost of that PU, as shown in the following equation:
\[ J = \mathrm{SATD} + \lambda \cdot R \]
where \( \lambda \) is a Lagrangian multiplier and \( R \) is the estimated bit consumption.
After the RMD step, a candidate mode set \( \psi^R \) is formed from the best intra modes. The number of candidate modes is set to 3, 3, 3, and 8 for PU sizes of 64 × 64, 32 × 32, 16 × 16, and 8 × 8, respectively [28]. To exploit the correlation of direction information between neighboring blocks [29], a second stage checks for additional most probable modes (MPMs) derived from the neighbors. These modes are added, if they are not already included, to form an extended candidate set \( \psi^M \) [30]. In the third stage, a rate-distortion optimized quantization (RDOQ) is performed on the modes of the candidate set, using only the maximum TU size. The goal of this step is to pick the optimal intra mode \( m_{\mathrm{opt}} \) for the PU as well as the best PU split structure in the rate-distortion sense. In the last stage, the optimal mode \( m_{\mathrm{opt}} \) found previously is used to determine the optimal residual quadtree (RQT) structure.
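The first two stages described above can be sketched in a few lines. This is a minimal illustration, not the HM implementation: the function name `rmd_candidates`, its arguments, and the toy numbers are all assumptions made for the example, and the cost is taken to be the SATD plus the rate weighted by the Lagrangian multiplier.

```python
def rmd_candidates(satd, bits, lam, n_best, mpms):
    """Rank modes by SATD + lam * bits, keep the n_best cheapest,
    then append any most probable mode (MPM) that is still missing.

    satd, bits: dicts mapping an intra mode index to its SATD / estimated bits.
    All names are hypothetical; HM organizes this differently.
    """
    cost = {m: satd[m] + lam * bits[m] for m in satd}
    best = sorted(cost, key=cost.get)[:n_best]              # RMD candidate set
    extended = best + [m for m in mpms if m not in best]    # MPM-extended set
    return best, extended


# Toy numbers, for illustration only.
satd = {0: 120, 1: 100, 10: 90, 26: 150}
bits = {0: 2, 1: 2, 10: 6, 26: 3}
best, extended = rmd_candidates(satd, bits, lam=4.0, n_best=3, mpms=[26, 1])
```

Only the extended set is passed on to the more expensive third stage, which is why its size has a direct effect on encoding time.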
Gradient-based intra prediction
The idea of gradient-based intra prediction is to estimate the pixel intensity variation in order to approach the best intra mode direction.
The gradient values are computed with a discrete differentiation operator. The operator relies on a horizontal and a vertical kernel, denoted here \( S_x \) and \( S_y \), respectively. The operator kernels are detailed in the next section. At each pixel position of the original image, represented here as a two-dimensional matrix \( A \), we perform a convolution with the two kernels, according to Eqs. (2) and (3), generating two matrices \( G_x \) and \( G_y \). These matrices approximate the horizontal and vertical derivatives, respectively, at each pixel position:
\[ G_x = S_x * A \tag{2} \]
\[ G_y = S_y * A \tag{3} \]
The corresponding gradient direction is then generated according to Eq. (4):
\[ \Phi_G = \arctan\left( \frac{G_y}{G_x} \right) \tag{4} \]
The direction generated at each pixel position represents the strongest variation of pixel intensity. That is, for a pixel located on an edge, the gradient direction crosses that edge, as presented in Fig. 4. In our case, we consider the direction perpendicular to the gradient, as it represents the direction along which the pixel intensities are similar. Equation (4) can be simplified to computing only the value of \( G_y/G_x \), since the arctan function is monotonic [21].
Table 1 presents the HEVC intra directions and the corresponding \( G_y/G_x \) values.
As expressed in Eq. (5), the HEVC-supported intra direction \( \Phi_m \) nearest to the obtained \( \Phi_G \) value is picked from the lookup table, and the corresponding intra mode \( m \) is assigned to the current pixel location.
The gradient magnitude can be roughly approximated as:
\[ M = |G_x| + |G_y| \]
At the end of this preprocessing step, we obtain a mode map \( m_i \) as well as a magnitude map \( M_i \), where \( i \) denotes a pixel position.
For each PU, a mode histogram of accumulated gradient magnitudes is generated. The modes with the highest values are selected to form the candidate set. Note that the generated mode map contains only angular modes.
The DC and planar modes are not represented, and since these two modes have a high probability of being the best modes at the end of the rate-distortion evaluation, we include them automatically in the candidate set. Figure 5 summarizes the algorithm flow, including some features that are treated in the rest of this paper.
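The preprocessing flow of this section can be sketched as follows. This is an illustrative sketch under several assumptions, not the authors' code: the Prewitt kernels are written in one common orientation, the magnitude is taken as the sum of the absolute horizontal and vertical responses, and the lookup table `mode_angles` (mode index to direction in degrees, measured from the horizontal axis) is a hypothetical stand-in for Table 1, which is not reproduced here. The names `filter3x3` and `candidate_set` are invented for the example.

```python
import numpy as np

# Prewitt kernels in one common orientation (an assumption; sign
# conventions vary and do not change the undirected edge direction).
S_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])
S_Y = S_X.T


def filter3x3(a, k):
    """Slide a 3x3 kernel over `a` (interior pixels only, no kernel flip)."""
    h, w = a.shape
    out = np.zeros((h - 2, w - 2), dtype=np.int64)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * a[dy:dy + h - 2, dx:dx + w - 2]
    return out


def candidate_set(block, mode_angles, n_modes):
    """Return planar (0), DC (1) and the n_modes angular modes with the
    largest accumulated gradient magnitude, plus the full histogram."""
    a = block.astype(np.int64)
    gx, gy = filter3x3(a, S_X), filter3x3(a, S_Y)
    mag = np.abs(gx) + np.abs(gy)
    # Edge direction: gradient direction rotated by 90 degrees, folded to [0, 180).
    edge = (np.degrees(np.arctan2(gy, gx)) + 90.0) % 180.0
    modes = np.array(list(mode_angles))
    angles = np.array([mode_angles[m] for m in modes])
    diff = np.abs(edge[..., None] - angles)
    diff = np.minimum(diff, 180.0 - diff)           # wrap-around distance
    mode_map = modes[np.argmin(diff, axis=-1)]      # nearest supported direction
    hist = {int(m): int(mag[mode_map == m].sum()) for m in modes}
    ranked = sorted(hist, key=hist.get, reverse=True)[:n_modes]
    return [0, 1] + ranked, hist


# Toy usage: an 8x8 block with a vertical edge and a two-entry lookup table
# (mode 10 is horizontal and mode 26 is vertical in HEVC).
block = np.zeros((8, 8), dtype=np.uint8)
block[:, 4:] = 100
cands, hist = candidate_set(block, {10: 0.0, 26: 90.0}, n_modes=1)
```

With a full 33-entry table, the same function would produce the per-PU histogram from which the highest-ranked angular modes are taken.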
Operators analysis
Works proposing gradient-based solutions, such as [20] and [21], have used the Sobel operator to compute the gradient. The reason is that Sobel offers one of the best edge detection performances among the existing operators.
The Sobel kernels are given below:
\[ S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad S_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \]
In our work, we are particularly interested in reducing the implementation complexity, especially for processing that operates on a per-pixel basis. The convolution is the heaviest part of the preprocessing stage, which makes its complexity of special interest. For this reason, we investigate the potential of the Prewitt operator, as it has simpler coefficients. The kernels of the Prewitt operator are given below:
\[ S_x = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}, \qquad S_y = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix} \]
With its simpler coefficients, this operator has some key properties that lower the implementation complexity. The nonzero coefficients of the Prewitt filter are only 1 and −1, so it can be implemented with simple addition and subtraction instructions. The Sobel operator, in contrast, includes the coefficients 2 and −2, so the gradient calculation requires additional instructions. In a hardware implementation, the 2 and −2 coefficients require additional masks to isolate the pixels affected by these coefficients, as well as extra addition/subtraction instructions. These considerations make the Prewitt filter much simpler hardware-wise, especially for applications that require the gradient at the pixel level.
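The difference in arithmetic can be made concrete with a small sketch. It assumes the standard 3 × 3 Prewitt and Sobel kernels in one common orientation and computes the horizontal response over the interior pixels; the function names are invented for the example, and NumPy slicing stands in for what would be a per-pixel datapath in hardware.

```python
import numpy as np


def prewitt_gx(a):
    """Horizontal Prewitt response using only additions and subtractions."""
    d = a[:, 2:] - a[:, :-2]                 # right column minus left column
    return d[:-2] + d[1:-1] + d[2:]          # plain sum over the three rows


def sobel_gx(a):
    """Horizontal Sobel response: the middle row is weighted by 2,
    which costs an extra shift or addition per output sample."""
    d = a[:, 2:] - a[:, :-2]
    return d[:-2] + 2 * d[1:-1] + d[2:]


# Toy 5x5 luma patch, for illustration only.
a = np.array([[3, 1, 4, 1, 5],
              [9, 2, 6, 5, 3],
              [5, 8, 9, 7, 9],
              [3, 2, 3, 8, 4],
              [6, 2, 6, 4, 3]])
gx_prewitt, gx_sobel = prewitt_gx(a), sobel_gx(a)
```

The two functions differ only in the weight on the middle row, which is exactly the extra work the paragraph above attributes to the 2 and −2 coefficients.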
Besides the implementation complexity, the investigation of the Prewitt-based solution is motivated by the following aspects:

The relation between approximating the pixel gradients and detecting the intra prediction directions is not obvious. The gradient solution approximates the pixel intensity direction that would best represent the current PU, whereas the theoretical optimal direction is defined only in the rate-distortion sense. Hence, the impact of the gradient operator on video coding efficiency should be investigated.

There are only 33 angular directions available to represent the best direction of each PU. Hence, we have to choose the nearest HEVC-supported direction \( \Phi_m \) to represent the computed gradient direction \( \Phi_G \). This difference between \( \Phi_m \) and \( \Phi_G \) offers an additional margin that lets a less accurate operator keep up in detection performance.
To evaluate the operators’ ability to detect the HEVC intra directions, we consider the hit rate of the theoretical best intra angular modes. For this experiment, we force the RDO step to perform the rate-distortion quantization on all 35 intra modes while ignoring the candidate set. Then, we evaluate how often the theoretical optimal mode is present in the candidate set. We consider the hit rate for the gradient-based algorithm using the Sobel and Prewitt operators as well as for the Hadamard-based prediction used in HM. Table 2 presents the hit rate for different sequences and different QP values, taking into consideration only the angular cases. First, we notice that the Prewitt operator yields a detection performance of 63.39%, which is relatively close to that of Sobel with 63.01%. This small difference between the two operators motivates investigating a gradient solution based on the Prewitt operator. We also notice from the table that the detection performances of the Sobel and Prewitt operators are relatively low compared to the Hadamard-based prediction.
This difference should be put into perspective: the Hadamard prediction performs a kind of multi-pass processing. It carries out a heavy transform computation for each intra mode to estimate the corresponding distortion, and it also estimates the bit consumption of each mode. These estimates are then fed into a rate-distortion cost function in order to choose the best intra mode.
The comparison with the Hadamard-based prediction suggests optimizing the gradient-based solution for better detection performance. In the next sections, we therefore propose approaches that improve how well the gradient solution detects the HEVC directions while keeping a close watch on the implementation complexity.
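The hit rate used in this comparison can be written down directly. The sketch below assumes one RD-optimal mode and one candidate list per PU; the function name and the toy data are illustrative and do not come from the experiments.

```python
def hit_rate(optimal_modes, candidate_sets):
    """Percentage of PUs whose RD-optimal mode appears in the candidate set.

    optimal_modes: one mode index per PU (from the exhaustive RDO run).
    candidate_sets: one list of candidate modes per PU, in the same order.
    """
    hits = sum(opt in cands for opt, cands in zip(optimal_modes, candidate_sets))
    return 100.0 * hits / len(optimal_modes)


# Toy example: the optimal mode is found in three of four PUs.
rate = hit_rate([26, 10, 18, 5], [[26, 1], [9, 11], [18], [5, 6]])
```

Restricting `optimal_modes` to the PUs whose best mode is angular gives the "angular cases only" variant reported in the tables.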
Optimization of intra mode detection
Optimal mode selection
To choose the best modes for the candidate set, Jiang [21] used as a cost function the accumulated gradient magnitude \( M_m \) of each mode \( m \) in the current PU.
The cost function is as follows:
\[ \mathrm{Cost}_m = M_m = \sum_{i} M_{m,i} \]
where \( M_{m,i} \) is the gradient magnitude at a point \( i \) whose gradient direction corresponds to the mode \( m \). A histogram of \( \mathrm{Cost}_m \) is then used to select the modes with the highest values.
However, the \( M_m \) criterion has some limitations. In some cases, a mode appears at many points of the PU but with small magnitudes, representing a widespread but very weak variation of pixel intensity. In other cases, a mode appears at only a few points but with high gradient magnitudes, reflecting a localized but strong variation of pixel intensity. Since both the most frequent modes and the modes with high gradient values can approach the optimal mode, we propose to consider, in addition to \( M_m \), the number of appearances \( N_m \) of a mode \( m \) in the current PU.
To investigate the impact of these two factors on the criterion, we consider the following cost function for each angular mode m:
where α is a weighting factor that belongs to [0,1].
We consider here the hit rate of the theoretical optimal intra mode when this mode is angular. Table 3 shows the results for different sequences and QP values. We first notice that the highest values are obtained when both \( N_m \) and \( M_m \) are involved in the mode selection function. The explanation is that in some cases, especially for small block sizes, which have fewer gradient samples, several modes appear the same number of times, so favoring those with higher gradient magnitudes improves the mode detection. In other cases, several modes have the same gradient magnitude, and favoring the most frequent ones also improves the decision.
Across the different QP values, the optimal weighting is obtained for an α value of around 0.8. Compared to considering \( N_m \) only, this improves the average hit rate of the best mode by 0.21%. For complexity reasons, we prefer an α value of 0.5, which avoids the weighting, so that the cost function becomes:
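The combined criterion can be sketched as below. The exact form of the weighted cost is not recoverable from the text, so the sketch assumes a convex combination in which α weights the appearance count and 1 − α weights the accumulated magnitude, with no normalization between the two terms; which term α multiplies, and whether the terms are rescaled, are assumptions. With α = 0.5 both terms get the same weight, so the ranking is the same as for their plain sum.

```python
from collections import defaultdict


def mode_costs(mode_map, mag_map, alpha=0.5):
    """Per-mode cost alpha * N_m + (1 - alpha) * M_m (assumed form).

    mode_map: detected angular mode at each pixel of the PU (flat sequence).
    mag_map:  gradient magnitude at the same pixels.
    """
    count = defaultdict(int)     # N_m: number of appearances of mode m
    total = defaultdict(int)     # M_m: accumulated magnitude of mode m
    for mode, mag in zip(mode_map, mag_map):
        count[mode] += 1
        total[mode] += mag
    return {m: alpha * count[m] + (1 - alpha) * total[m] for m in count}


# Toy PU: mode 10 is frequent but weak, mode 26 is rare but strong.
costs = mode_costs([10, 10, 10, 26], [1, 1, 1, 9], alpha=0.5)
```

Setting `alpha=1.0` reduces the criterion to the appearance count alone, and `alpha=0.0` to the accumulated magnitude alone.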
Mode preselection
Owing to the nature of the gradient computation, the generated modes are only approximations. In this section, we therefore propose to extend the detection of a mode \( m \) at a pixel position to a range of modes: in addition to the detected mode, we consider the neighboring modes \( m + 1 \) and \( m - 1 \), provided that they exist. To present the approach, we express the cost function \( \mathrm{Cost}_m \) as a function of \( \mathrm{Cost}_{m,i} \), the cost related to a point \( i \) of the current PU at which the mode \( m \) was detected:
In addition to increasing \( \mathrm{Cost}_m \) by the value \( \mathrm{Cost}_{m,i} \) for each pixel point \( i \), the proposed approach also increases \( \mathrm{Cost}_{m-1} \) and \( \mathrm{Cost}_{m+1} \). As expressed in Eq. (16), \( \mathrm{Cost}_{m,i} \) is weighted by a bonus value \( b_m \) that favors the detected mode over its two neighbors. Similarly, the contributions to \( \mathrm{Cost}_{m-1} \) and \( \mathrm{Cost}_{m+1} \) of the neighboring modes \( m - 1 \) and \( m + 1 \) are weighted by a neighboring bonus value \( b_n \), which favors these two neighbors over the other modes:
where \( b_m \) and \( b_n \) are the bonus values, chosen so that \( b_m > b_n \).
To find the best bonus values, we consider, for different values of \( b_m \) and \( b_n \), the hit rate of the theoretical best mode in the candidate set of each PU. The results show that the hit rate depends more on the quotient \( q = b_m / b_n \), expressed in Eq. (17), than on the values of the pair \( (b_m; b_n) \) themselves. For example, we obtain almost the same results for the pairs (2;1) and (4;2), even though they affect the cost distribution \( \mathrm{Cost}_m \) differently.
Therefore, Table 4 presents the hit rate of the best mode for different values of the quotient \( q \), on different sequences and for different QP values. Note that the reported performance covers only the cases where the optimal mode is angular. The table shows that the rate of best mode matching is clearly improved by the neighbor extension.
The extension improves the average rate by 3.58, 4.22, 4.28, and 4.24% for \( q \) values of 1.0, 1.3, 1.5, and 2.0, respectively. The best hit rate is obtained for a \( q \) value of around 1.5, which corresponds to the bonus pair (3;2). In the remainder of this paper, we therefore keep these bonus values.
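The neighbor extension can be sketched as an accumulation loop. The sketch assumes that each pixel contributes its cost, scaled by the detected-mode bonus, to the detected mode and, scaled by the neighbor bonus, to the two adjacent angular modes when they exist; the function and parameter names are illustrative, and the default bonus pair is the (3;2) retained above.

```python
def extended_costs(mode_map, cost_map, b_m=3, b_n=2, first=2, last=34):
    """Accumulate per-pixel costs into the detected mode and its neighbors.

    mode_map: detected angular mode at each pixel (HEVC angular modes 2..34).
    cost_map: per-pixel cost contribution at the same pixels.
    """
    costs = {m: 0 for m in range(first, last + 1)}
    for m, c in zip(mode_map, cost_map):
        costs[m] += b_m * c                    # detected mode gets the larger bonus
        for nb in (m - 1, m + 1):
            if first <= nb <= last:            # neighbor must be an angular mode
                costs[nb] += b_n * c
    return costs


# Toy example: mode 2 has no lower angular neighbor, mode 10 has both.
costs = extended_costs([2, 10], [5, 1])
```

Because only the ratio of the two bonuses changes the ranking of a mode against its neighbors, pairs such as (2;1) and (4;2) behave almost identically, in line with the observation about the quotient made above.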
Fast intra mode decision
As mentioned before, all 35 modes are tested in the RMD stage, through a Hadamard transform, in order to choose the best modes for the current PU. Here, the idea is to select the most probable modes in order to limit the number of modes to be tested and so speed up the intra prediction process. The histogram generated for each PU gives the cost value \( \mathrm{Cost}_m \) of each intra mode \( m \).
The value \( \mathrm{Cost}_m \) reflects a kind of probability that the intra mode \( m \) is the theoretical optimal mode for the current PU: the higher \( \mathrm{Cost}_m \) is, the more probable it is that \( m \) matches the optimal mode. Therefore, instead of going through all the modes, only a limited list of modes is investigated. We refer to this list as the gradient candidate set, \( {\psi}_i^G \) where 0 ≤ i ≤ \( N_G \), \( N_G \) being the number of modes kept for the current PU. The gradient modes are ordered from most probable to least probable in the candidate set. The gradient-generated modes are more precise for larger PU sizes, as a larger PU has more points from which to approximate its most representative gradient. The number of modes \( N_G \) is set accordingly. We set this number to 15, 14, 8, 6, and 5 for PU sizes of 4 × 4, 8 × 8, 16 × 16, 32 × 32, and 64 × 64, respectively, as we noticed that these settings give a good tradeoff between time saving and encoding performance.
The best modes obtained through the RMD process form the RMD candidate set, referred to as \( {\psi}_i^R \), where 0 ≤ i ≤ \( N_R \), \( N_R \) being the number of modes. We keep the number of modes \( N_R \) as it is set in HM14.0, i.e., 8, 8, 3, 3, and 3 for PU sizes of 4 × 4, 8 × 8, 16 × 16, 32 × 32, and 64 × 64, respectively.
To speed up the RDO process, the heaviest stage of the intra prediction, we propose to further reduce the number \( N_R \) for PU sizes of 8 × 8 and 4 × 4, based on how well the gradient stage detects the theoretical optimal mode. We reduce \( N_R \) according to different confidence scenarios. These scenarios are defined by comparing the candidate set \( \psi^G \), the result of the gradient stage, with the candidate set \( \psi^R \), the result of the RMD stage. The idea relies on the hypothesis that the more similar the two results are, the closer the gradient stage is to the theoretical optimal mode. The scenarios are listed below in order of their probability of matching the optimal mode:

Scenario 1: the best RMD mode is DC (i.e., the DC mode is better than all angular modes; since the DC mode has a high probability of being the best mode, extensive testing of the angular modes is not worthwhile).

Scenario 2: the best RMD mode is planar (the RMD performance in detecting planar is relatively low, so \( N_R \) should be reduced with care).

Scenario 3: the three best RMD modes match the three best gradient modes.

Scenario 4: the best RMD mode is the best gradient mode.

Scenario 5: the best RMD mode and the best gradient mode are neighbors.
The algorithm below summarizes how the number of modes \( N_R \) is reduced based on these scenarios:
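The original listing is not available in this copy, so what follows is only a sketch of how the five scenarios could be tested in order. It assumes that both candidate lists are ordered from best to worst, that the gradient list holds angular modes only, that scenario 3 compares the top three modes as sets, and that "neighbors" means mode indices differing by one. The reduced values of the number of RMD modes are not given in the text: the `table` argument and the values used with it are placeholders, not the paper's settings.

```python
PLANAR, DC = 0, 1


def confidence_scenario(rmd, grad):
    """Return the first matching scenario (1 to 5), or None if none applies.

    rmd:  RMD candidate modes, best first (may contain planar/DC).
    grad: gradient candidate modes, best first (angular modes only).
    """
    if rmd[0] == DC:
        return 1
    if rmd[0] == PLANAR:
        return 2
    if set(rmd[:3]) == set(grad[:3]):          # assumption: order is ignored
        return 3
    if rmd[0] == grad[0]:
        return 4
    if abs(rmd[0] - grad[0]) == 1:             # adjacent angular modes
        return 5
    return None


def reduced_n_r(n_r, scenario, table):
    """Shrink n_r to the value given for the scenario, never enlarging it.
    `table` maps scenario -> reduced count (hypothetical values)."""
    return min(n_r, table.get(scenario, n_r))


# Placeholder table, for illustration only.
n_r = reduced_n_r(8, confidence_scenario([10, 26, 1], [10, 9, 11]), {4: 4})
```

Testing the scenarios in this fixed order mirrors the ordering of the list above, so a PU that satisfies several conditions is handled by the most confident one.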
CU coding
In HEVC, a split decision has to be made for each CU of depth \( d \) and size 2N × 2N. This decision evaluates whether encoding the CU at that depth is preferable to encoding the four sub-CUs of depth \( d + 1 \) and size N × N.
Each CU is therefore first encoded to generate the rate-distortion cost of a no-split coding:
\[ J_{\mathrm{NoSplit}} = D + \lambda \cdot R \]
where \( D \) is the distortion generated by the CU encoding, \( \lambda \) is a Lagrangian multiplier, and \( R \) is the bit consumption.
The four sub-CUs are then encoded to generate four RD costs \( J_i \). The generated \( J_{\mathrm{NoSplit}} \) is compared to \( J_{\mathrm{Split}} \), the sum of the four sub-CU costs \( J_i \):
\[ J_{\mathrm{Split}} = \sum_{i=1}^{4} J_i \]
The split decision is made based on the smallest cost, and the process is applied at all supported depth levels \( d \), where \( d \) = 0, 1, 2, or 3, so that an optimal CU structure in the RD sense is generated.
To deal with such complexity, the scheme proposed in this section predicts the no-split decisions, which avoids unnecessary encoding of sub-CUs.
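The exhaustive decision that the proposed scheme tries to shortcut can be sketched as a recursion. This is an illustration only: `encode_cost` is a hypothetical callback standing in for a full encoding pass that returns the RD cost of coding a block without splitting, and the function name and return convention are invented for the example.

```python
def best_cu_cost(x, y, size, depth, encode_cost, max_depth=3):
    """Return (cost, tree) with the smallest RD cost for the block at (x, y).

    tree is None when the block is coded without split, otherwise a list of
    the four sub-trees. A split is evaluated at depths 0..max_depth.
    """
    j_no_split = encode_cost(x, y, size)
    if depth > max_depth:
        return j_no_split, None
    half = size // 2
    subs = [best_cu_cost(x + dx, y + dy, half, depth + 1, encode_cost, max_depth)
            for dy in (0, half) for dx in (0, half)]
    j_split = sum(cost for cost, _ in subs)    # sum of the four sub-block costs
    if j_split < j_no_split:
        return j_split, [tree for _, tree in subs]
    return j_no_split, None


# Toy cost model (cost grows with the cube of the block size), illustration only.
cost, tree = best_cu_cost(0, 0, 8, 0, lambda x, y, s: s ** 3, max_depth=0)
```

Every level of the recursion encodes both alternatives, which is why an early, reliable no-split prediction saves the encoding of all sub-blocks below it.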
Gradient-based scheme
To approach the optimal CU split decision, the proposed scheme estimates the spatial texture complexity of each CU and of its sub-CUs. The idea relies, on the one hand, on the hypothesis that a detailed texture area calls for small CU sizes, which implies split decisions. On the other hand, a flat area calls for large CU sizes, which implies no-split decisions. The complexity estimate is derived from the pixel-based gradient values computed at the preprocessing stage.
We present in this section the design of the split decision adopted in the proposed scheme. Table 5 gives the number of possible CUs to be tested in one LCU of size 64 × 64. As the table shows, the split decisions to be investigated at depths 2 and 3 represent almost all the cases. Moreover, the split decisions at lower depths are much more sensitive than those at higher depths: a no-split decision implies that no further decisions are investigated for the sub-CUs, so a no-split decision has to be made particularly carefully at lower depths. For these reasons, we limit our early split decision to depths 2 and 3 only; at depths 0 and 1, both the split and no-split options are investigated. To estimate the optimal decision, we consider the texture complexity \( T \) of the current CU, represented by the median magnitude value of the 2N × 2N gradient points of the current unit.
We notice that, for some CUs, the texture complexity measured by \( T \) fails to approach the optimal decision. Some CUs have a low overall texture but relatively different texture complexities in their four sub-CUs. In such a case, a split structure would yield a better RD cost. Therefore, in addition to \( T \), we estimate the texture complexity of each of the four N × N sub-CUs by considering \( T_i \), the median of the N × N gradient magnitude values in the ith sub-CU (1 ≤ i ≤ 4).
To estimate the texture difference between the sub-CUs, the proposed scheme considers the texture variation \( V \), defined as:
Considering both the texture magnitude and the texture variation, we set the split criterion as follows:
\[ \mathrm{SpC} = \alpha \cdot T + \beta \cdot V \]
where \( \alpha \) and \( \beta \) are two weighting factors.
An early no-split decision occurs when the following inequality holds:
\[ \mathrm{SpC} < T_d \]
where \( T_d \) is a threshold that depends on the depth \( d \) of the current CU.
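The early no-split test can be sketched as follows. The texture complexity and the sub-CU complexities follow the median definitions given above. The definition of the texture variation did not survive in this copy, so the sketch substitutes the mean absolute deviation of the four sub-CU medians, which is purely an assumption; the linear combination with two weights and the strict comparison against the depth-dependent threshold are likewise assumed forms, and the function names are invented.

```python
import numpy as np


def split_criterion(mag, alpha=1.0, beta=1.0):
    """Return (T, V, SpC) for one CU given its 2N x 2N gradient magnitudes."""
    n = mag.shape[0] // 2
    t = float(np.median(mag))                               # texture complexity T
    t_sub = [float(np.median(mag[r:r + n, c:c + n]))        # T_i of the 4 sub-CUs
             for r in (0, n) for c in (0, n)]
    # ASSUMPTION: variation = mean absolute deviation of the sub-CU medians.
    v = float(np.mean(np.abs(np.array(t_sub) - np.mean(t_sub))))
    return t, v, alpha * t + beta * v


def early_no_split(mag, threshold, alpha=1.0, beta=1.0):
    """True when the criterion falls below the depth-dependent threshold."""
    return split_criterion(mag, alpha, beta)[2] < threshold


# Toy CU: flat overall (median 0) but one textured quadrant.
mag = np.zeros((8, 8))
mag[4:, 4:] = 40
t, v, spc = split_criterion(mag)
```

The toy block shows the case discussed above: the overall median alone would suggest a flat CU, while the variation term flags that one sub-CU differs from the others.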
CU coding performance
In this section, we evaluate the performance of the proposed split scheme and refine the adopted criteria based on this performance. For that purpose, we compare the performance of the new scheme to that of a theoretical optimal decision. The optimal decision, obtained by encoding the current CU twice (once without a split and once with a split), is the encoding with the smallest RD cost.
To evaluate the proposed scheme, we consider two key measures that have a direct impact on the coding efficiency.

No-split matching rate (NS): this rate represents the cases in which the proposed criterion yields a no-split decision and the optimal decision is also a no split. In such cases, the proposed scheme succeeds in speeding up the encoding by avoiding unnecessary encoding of the sub-CUs without any loss in RD performance.

Split error rate (SE): this rate represents the cases in which the proposed criterion yields a no-split decision while the optimal decision is a split. Such cases speed up the encoding, but at the cost of an RD loss.
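The two rates can be computed from per-CU decisions as in the Python sketch below. The text does not state how the rates are normalized, so the sketch assumes both are fractions of all evaluated CUs; the function name and the sample decisions are hypothetical.

```python
def split_rates(predicted_no_split, optimal_no_split):
    """Return (NS, SE) as fractions of all evaluated CUs.

    predicted_no_split[i] is True when the criterion decides "no split"
    for CU i; optimal_no_split[i] is True when the exhaustive RD check
    also prefers "no split". The normalization by the total number of
    CUs is an assumption.
    """
    pairs = list(zip(predicted_no_split, optimal_no_split))
    if not pairs:
        raise ValueError("at least one CU decision is required")
    n = len(pairs)
    ns = sum(1 for p, o in pairs if p and o) / n      # correct early stops
    se = sum(1 for p, o in pairs if p and not o) / n  # missed splits
    return ns, se


# Hypothetical decisions for eight CUs.
predicted = [True, True, False, True, False, False, True, False]
optimal = [True, False, False, True, True, False, True, False]
ns, se = split_rates(predicted, optimal)
print(f"NS = {ns:.3f}, SE = {se:.3f}")
```

A good threshold raises NS, which saves encoding time, while keeping SE low, which limits the RD loss.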
To investigate the impact of the two factors T and V involved in the split criterion SpC, we present below the NS-SE relation for different values of α and β (0, 1, and 2) and different values of the threshold T_d.
Fig. 6 presents the case of CUs at depth 3. We notice that involving both factors T and V achieves better results than considering only one of them.
We also notice that there is no real need to favor one of the two factors in the criterion, which can thus be simplified as:
SpC = T + V
For depth 2, presented in Fig. 7, the best performance is obtained when considering only the texture variation. This can be explained by the fact that T is a relatively global metric, which is more suitable for small CU sizes. In this case, the variation becomes a more precise metric for the split decision than the texture itself. So for depth 2, we adopt the following criterion:
SpC = V
For the threshold T_d, we choose the values 65 and 2.2 for depths 3 and 2, respectively, as the algorithm achieves favorable results with these values.
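Putting the depth-dependent choices together, the decision can be sketched in Python as below. The thresholds 65 and 2.2 come from the text. The unit-weight sum of T and V at depth 3, the strict "less than" comparison, and the function names are assumptions for illustration. T and V are taken as already computed inputs.

```python
# Thresholds T_d reported in the text for depths 2 and 3.
THRESHOLDS = {2: 2.2, 3: 65.0}


def split_criterion(depth, t, v):
    """Return SpC for the given depth, or None when no early test applies.

    Depth 3 uses both factors with equal weight (assumed to be a sum);
    depth 2 uses the texture variation only.
    """
    if depth == 3:
        return t + v
    if depth == 2:
        return v
    return None  # depths 0 and 1: split and no split are both tested


def decide_no_split(depth, t, v):
    """True when the early no-split decision is taken (assumed SpC < T_d)."""
    spc = split_criterion(depth, t, v)
    if spc is None:
        return False
    return spc < THRESHOLDS[depth]


# Hypothetical (depth, T, V) samples.
for depth, t, v in [(0, 5.0, 0.5), (2, 100.0, 1.5), (2, 1.0, 3.0),
                    (3, 40.0, 20.0), (3, 50.0, 20.0)]:
    print(depth, t, v, decide_no_split(depth, t, v))
```

Returning False at depths 0 and 1 keeps the exhaustive test there, so the encoder never skips sub-CUs at the depths the text identifies as most sensitive.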
Results and discussion
In this section, we examine the overall effect of the proposed features on the encoding performance. For this purpose, the proposed algorithm was integrated in HM14.0, and simulations were performed according to the common test conditions specified in [30]. To measure the effect of the algorithm on encoding time, we consider the time saving:
TS = (T_HM14 − T_Prop) / T_HM14 × 100%
where T_HM14 is the encoding time of HM14.0 and T_Prop is that of the proposed solution integrated in HM14.0.
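As a worked example, the Python sketch below computes the time saving, assuming it is the relative reduction in encoding time with respect to HM14.0, expressed as a percentage. The timing values are hypothetical and chosen only to illustrate the arithmetic.

```python
def time_saving(t_hm14, t_prop):
    """Relative encoding-time reduction versus HM14.0, in percent."""
    if t_hm14 <= 0:
        raise ValueError("reference encoding time must be positive")
    return (t_hm14 - t_prop) / t_hm14 * 100.0


# Hypothetical encoding times in seconds, not measured values.
ts = time_saving(1000.0, 572.0)
print(f"time saving: {ts:.1f}%")
```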
As the implemented features mainly concern intra coding, we present the results for the all-intra (AI) configuration. We set the number of modes in the candidate set to be tested in the RMD to 15, 14, 8, 6, and 5 for the PU sizes of 4 × 4, 8 × 8, 16 × 16, 32 × 32, and 64 × 64, respectively. For the RDO, we kept the numbers of tested modes defined in the HM (8, 8, 3, 3, and 3, respectively).
Table 6 presents the BD-rate (BDBR) [31] and time-saving performance of the gradient solution relative to HM14.0. The table shows that the proposed algorithm achieves a time saving for all sequences, with an average of 42.6%, for a BD-rate increase of 1.1%.
To further evaluate the proposed algorithm, we compare it to the work of Jiang et al. [21], a Sobel-based gradient algorithm, which we refer to as SG. To properly measure the effect of each of the proposed features, Table 7 presents different combinations of the SG algorithm with the proposed features.
Note that Jiang et al. used different RMD and RDO iteration numbers. In our simulations, to allow a like-for-like comparison, we use the iteration numbers specified earlier for all configurations. The results show that SG achieves an average time reduction of 11.6% with a BD-rate increase of 0.6%. This is a smaller time saving than reported in [21], but also a smaller BD-rate loss. The small difference in results is mainly due to the different iteration numbers used here and to differences between the intra prediction implementation in HM4.0, used by Jiang et al., and that in HM14.0. For example, intra prediction now supports 35 modes for all PU sizes, whereas HM4.0 supports only 3 modes for 64 × 64 PUs.
The combination of SG with our gradient stage optimizations, referred to as the optimized Sobel-based gradient algorithm (OSG), achieves almost the same complexity reduction as the SG configuration (11.0%), with a BD-rate increase of only 0.3%. This shows that the optimizations improve the gradient-based detection of HEVC intra modes and save around 0.3% in BD-rate. We also report the performance of SG combined with both the optimizations and the fast RDO feature. This combination, referred to as the fast optimized Sobel gradient algorithm (FOSG), reaches a time saving of 31.8% with a BD-rate increase of 0.9%. It shows how the gradient information can be exploited to avoid unnecessary processing.
In addition to these combinations, we evaluate further configurations, since the proposed algorithm deals with both the intra mode decision (MD) and the CU decision. The first configuration includes only the intra MD and is denoted PropMD.
The second configuration includes only the CU split decision algorithm and is denoted PropSplt. The configuration combining the two is denoted PropOverall.
As the results table shows, the PropMD configuration gives a time saving of 31.8% with a BD-rate increase of 0.9%. Comparing this result to that of FOSG confirms that the Prewitt operator offers better intra mode detection, and thus better encoding efficiency, than the Sobel operator. This confirms the advantage of a preprocessing solution based on the Prewitt operator, which is also more hardware friendly, with better options for multiple-data operations.
The PropSplt configuration, which addresses CU coding, gives an average time reduction of 31.0% with a BD-rate increase of 0.7%. Finally, the PropOverall configuration, combining the intra MD and the CU coding, gives a time reduction of 42.8% with an average BD-rate increase of 1.1%.
Note that the execution time profiling computed according to Eq. (25) aims to estimate the complexity reduction at the prediction stage compared to the Hadamard-transform-based prediction used in HM. It does not aim to estimate the execution time effects of the two operators at the preprocessing stage, because the preprocessing stage accounts for only about 2% of the whole HM intra encoding.
Conclusions
This paper has presented a pixel-based gradient preprocessing stage for HEVC intra coding. The proposed algorithm uses the Prewitt operator as a discrete differentiation operator to approximate the gradient values on the original picture. The algorithm generates a preferred direction for each pixel in each PU, from which we select a candidate set of modes to be tested at the rate-distortion optimization level. The mode selection is optimized through neighbor mode extension and an adapted cost function that takes into account both the most frequent modes and those with higher gradient magnitudes. Moreover, we exploit the gradient information to speed up the search for the best intra mode. For that purpose, we rely on different probability scenarios to limit the modes to be tested to the most probable ones. In addition to the intra mode decision, we propose a gradient-based CU split scheme in which we set criteria to measure the texture complexity of each CU. The results show that the proposed algorithm achieves a time saving of 42.8% with an average BD-rate increase of just 1.1%.
As the proposed gradient preprocessing stage shows promising performance, we intend to further optimize the solution for real-time hardware applications. We are finalizing an investigation that would completely avoid the pixel-based search of the intra mode in the lookup table presented in Section 3.2, which is the heaviest step of the preprocessing stage.
Abbreviations
AI: All intra
BD-rate: Bjontegaard delta rate
CTB: Coding tree block
CU: Coding unit
FOSG: Fast optimized Sobel gradient algorithm
HEVC: High Efficiency Video Coding
HM: HEVC test model
JCT-VC: Joint Collaborative Team on Video Coding
LCU: Largest coding unit
MPEG: ISO/IEC Moving Picture Experts Group
MPM: Most probable mode
NS: No-split matching rate
OSG: Optimized Sobel-based gradient algorithm
PU: Prediction unit
RD: Rate distortion
RDOQ: Rate-distortion optimized quantization
RMD: Rough mode decision
RQT: Residual quadtree
SATD: Sum of absolute transformed differences
SE: Split error rate
TU: Transform unit
VCEG: ITU-T Video Coding Experts Group
References
 1.
GJ Sullivan, JR Ohm, WJ Han, T Wiegand, Overview of the high efficiency video coding (HEVC) standard. IEEE Trans Circuits Syst Video Technol 22(12), 1649–1668 (2012). doi:10.1109/TCSVT.2012.2221191
 2.
B Bross, WJ Han, JR Ohm, GJ Sullivan, YK Wang, T Wiegand, High Efficiency Video Coding (HEVC) Text Specification Draft 10, in Doc. JCTVC-L1003 (rev. 37), JCT-VC 13th Meeting of Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 2013
 3.
J Lainema, F Bossen, WJ Han, J Min, K Ugur, Intra coding of the HEVC standard. IEEE Trans Circuits Syst Video Technol 22(12), 1792–1801 (2012). doi:10.1109/TCSVT.2012.2221525
 4.
C Yan, Y Zhang, J Xu, F Dai, L Li, Q Dai, F Wu, A highly parallel framework for HEVC coding unit partitioning tree decision on many-core processors. IEEE Signal Process Lett 21(5), 573–576 (2014). doi:10.1109/LSP.2014.2310494
 5.
C Yan, Y Zhang, J Xu, F Dai, J Zhang, Q Dai, F Wu, Efficient parallel framework for HEVC motion estimation on many-core processors. IEEE Trans Circuits Syst Video Technol 24(12), 2077–2089 (2014). doi:10.1109/TCSVT.2014.2335852
 6.
C Yan, Y Zhang, F Dai, X Wang, L Li, Q Dai, Parallel deblocking filter for HEVC on many-core processor. Electron Lett 50(5), 367–368 (2014). doi:10.1049/el.2013.3235
 7.
C Yan, Y Zhang, F Dai, J Zhang, L Li, Q Dai, Efficient parallel HEVC intra-prediction on many-core processor. Electron Lett 50(11), 805–806 (2014). doi:10.1049/el.2014.0611
 8.
S Wang, A Rehman, K Zeng, Z Wang, SSIM-inspired Two-pass Rate Control for High Efficiency Video Coding (IEEE International Workshop on Multimedia Signal Processing (MMSP), Xiamen, 2015), pp. 19–21
 9.
H Sun, D Zhou, S Goto, A Low-complexity HEVC Intra Prediction Algorithm Based on Level and Mode Filtering (IEEE International Conference on Multimedia and Expo (ICME), Melbourne, 2012), pp. 9–13
 10.
H Lei, Z Yang, Fast Intra Prediction Mode Decision for High Efficiency Video Coding (2nd International Symposium on Computer, Communication, Control and Automation, Singapore, 2013). doi:10.2991/3ca13.2013.9
 11.
H Zhang, Z Ma, in 13th Pacific-Rim Conference on Multimedia, Singapore, December, 2012. Lecture notes in artificial intelligence, ed. by W Lin, D Xu, A Ho, J Wu, Y He, J Cai, M Kankanhalli, MT Sun, vol. 1114 (Springer, Heidelberg, 2012), p. 157
 12.
TD Silva, LV Agostini, LADS Cruz, Fast HEVC Intra Prediction Mode Decision Based on Edge Direction Information (European Signal Processing Conference (Eusipco), Bucharest, 2012), pp. 27–31
 13.
SL Shen, Z Liu, X Zhang, W Zhao, Z Zhang, An effective cu size decision method for HEVC encoders. IEEE Trans on Multimedia 15(2), 465–470 (2013). doi:10.1109/TMM.2012.2231060
 14.
X Shen, L Yu, J Chen, Fast Coding Unit Size Selection for HEVC Based on Bayesian Decision Rule (Picture Coding Symposium (PCS), Krakow, 2012), pp. 7–9
 15.
H Zhang, Z Ma, Early Termination Schemes for Fast Intra Prediction in High-Efficiency Video Coding (IEEE International Symposium on Circuits and Systems (ISCAS), Melbourne, 2013), pp. 1–5
 16.
H Zhang, Z Ma, Fast intra mode decision for high efficiency video coding (HEVC). IEEE Trans Circuits Syst Video Technol 24(4), 660–668 (2014). doi:10.1109/TCSVT.2013.2290578
 17.
Y Kim, D Jun, S Jung, JS Choi, J Kim, A fast intra-prediction method in HEVC using rate-distortion estimation based on Hadamard transform. ETRI J 35(2), 270–280 (2013). doi:10.4218/etrij.12.0112.0223
 18.
AC Tsai, A Paul, JC Wang, JF Wang, Intensity gradient technique for efficient intra-prediction in H.264/AVC. IEEE Trans Circuits Syst Video Technol 18(5), 694–698 (2008). doi:10.1109/tcsvt.2008.919113
 19.
Y Zhang, Z Li, B Li, Gradient-based Fast Decision for Intra Prediction in HEVC (IEEE Visual Communications and Image Processing (VCIP), San Diego, 2012), pp. 27–30
 20.
F Pan, X Lin, S Rahardja, K Lim, Z Li, D Wu, S Wu, Fast mode decision algorithm for intra prediction in H.264/AVC video coding. IEEE Trans Circuits Syst Video Technol 15(7), 813–822 (2005). doi:10.1109/TCSVT.2005.848356
 21.
W Jiang, H Ma, Y Chen, Gradient Based Fast Mode Decision Algorithm for Intra Prediction in HEVC (International Conference on Consumer Electronics, Communications and Networks (CECNet), Yichang, 2012), pp. 21–23
 22.
M Jamali, S Coulombe, F Caron, Fast HEVC Intra Mode Decision Based on Edge Detection and SATD Costs Classification (Data Compression Conference (DCC), Snowbird, 2015), pp. 7–9
 23.
A BenHajyoussef, T Ezzedine, A Bouallegue, Fast gradient based intra mode decision for high efficiency video coding. Int J Emerg Trends Technol Comput Sci 3(3), 223–228 (2014)
 24.
A BenHajyoussef, T Ezzedine, A Bouallegue, Optimized Intra Mode Decision for High Efficiency Video Coding (International Conference on Image Analysis and Processing (ICIAP), Genoa, 2015), pp. 7–11
 25.
B Bross, WJ Han, JR Ohm, GJ Sullivan, T Wiegand, WD4: Working Draft 4 of High-Efficiency Video Coding, in Doc. JCTVC-F803_d6 (rev. 6), JCT-VC 6th Meeting of Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 2011
 26.
HEVC reference model. https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/. Accessed 06 Aug 2016
 27.
Y Piao, J Min, J Chen, Encoder Improvement of Unified Intra Prediction, Doc. JCTVC-C207 (JCT-VC 3rd Meeting, Guangzhou, 2010), pp. 7–15
 28.
L Zhao, L Zhang, X Zhao, Further Encoder Improvement of Intra Mode Decision (Doc. JCTVC-D283, in JCT-VC 4th Meeting, Daegu, 2011), pp. 20–28
 29.
L Zhao, L Zhang, S Ma, D Zhao, Fast Mode Decision Algorithm for Intra Prediction in HEVC (Visual Communications and Image Processing (VCIP), Tainan City, 2011), pp. 6–9
 30.
F Bossen, Common HM Test Conditions and Software Reference Configurations (Doc. JCTVC-L1100, in JCT-VC 13th Meeting, Genova, 2013), pp. 14–23
 31.
G Bjontegaard, Calculation of Average PSNR Differences Between RD Curves (Doc. VCEG-M33, in ITU-T VCEG 13th Meeting, Austin, 2001), pp. 2–4
Funding
No funding was received for this research.
Authors’ contributions
ABH, TE, and AB conceived and designed the research. ABH performed the experiments. ABH and TE analyzed the data. ABH and TE wrote and edited the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords
 HEVC
 Intra prediction
 Preprocessing
 Image gradient
 Sobel
 Prewitt