CU splitting early termination based on weighted SVM
EURASIP Journal on Image and Video Processing volume 2013, Article number: 4 (2013)
Abstract
High efficiency video coding (HEVC) is the latest video coding standard, developed by the JCT-VC. It employs many efficient coding algorithms (e.g., highly flexible quadtree coding block partitioning) and outperforms H.264/AVC by a 35–43% bitrate reduction. However, it imposes enormous computational complexity on the encoder because of the optimization performed in these coding tools, especially the rate distortion (RD) optimization over coding unit (CU), prediction unit, and transform unit sizes. In this article, we propose a CU splitting early termination algorithm to reduce the heavy computational burden on the encoder. CU splitting is modeled as a binary classification problem, to which a support vector machine (SVM) is applied. To reduce the impact of outliers and to maintain RD performance when a misclassification occurs, the RD loss due to misclassification is introduced as a weight in SVM training. Efficient and representative features are extracted and optimized by a wrapper approach to eliminate dependency on video content as well as on encoding configurations. Experimental results show that the proposed algorithm achieves about 44.7% complexity reduction on average with only a 1.35% BD-rate increase under the “random access” configuration, and 41.9% time saving with a 1.66% BD-rate increase under the “low delay” setting, compared with the HEVC reference software.
1. Introduction
High definition (HD) and ultrahigh definition (UHD) video content has become increasingly popular worldwide, so demand for video compression technologies that provide higher coding efficiency for HD/UHD video can be expected in the near future. In view of this, the high efficiency video coding (HEVC) standard is being developed by the Joint Collaborative Team on Video Coding [1], established by the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. HEVC outperforms the H.264/AVC high profile by a 35–43% bitrate reduction at the same reconstructed video quality [2]. HEVC inherits the well-known block-based hybrid coding scheme [3] used by previous coding standards, e.g., H.264/AVC, and extends the framework by introducing highly flexible quadtree coding block partitioning. This partitioning consists of the newly introduced concepts of coding unit (CU), prediction unit (PU), and transform unit (TU). The CU is the basic unit of region splitting used for inter/intra coding; it extends the traditional concept of the macroblock (MB) with a hierarchical structure in which the block size varies from 64 × 64 to 8 × 8 pixels. A CU can be recursively split into four smaller CUs of equal size. In this manner, a picture is represented by a content-adaptive coding tree structure composed of CU blocks of different sizes. The PU is the basic unit used for the prediction process and is rectangular in shape. A PU can be encoded with one of the modes in a candidate set, which is similar in spirit to the MB mode of H.264/AVC. The pixels in one PU share prediction information, e.g., modes, motion vectors (MV), and reference index. The TU is the basic unit for transform and quantization. It is defined in a similar way to the CU, and its size varies from 4 × 4 to 32 × 32.
As reported in [4, 5], the flexible data structure representation (extending the MB size up to 64 × 64) yields over 10% bitrate saving compared with the 16 × 16-based configuration in H.264/AVC, since flexible block partitioning can deal effectively with the diversity of picture content.
However, the flexibility of block partitioning in HEVC imposes a significant computational burden on the encoder when it searches for the optimal combination of CU, PU, and TU sizes. Thus, reducing the complexity while maintaining the coding performance is crucial for practical implementation of the new standard. Research on accelerating the encoder of the HEVC test model (HM) is emerging. A fast intra mode decision algorithm [6] was proposed, which used the direction information of neighboring blocks to reduce the number of directions taking part in the rate distortion optimization (RDO) process. To reduce the computational complexity of TU size selection, a fast algorithm for residual quadtree mode decision was proposed in [7]. In addition, the depth-first decision process for TU size selection in HM was replaced by a merge-and-split decision process, which also reduces unnecessary computation by using the inheritance property of zero-blocks and early termination schemes for nonzero blocks.
In this article, we focus on CU size selection for HEVC. A content-based fast CU decision algorithm was developed for the HEVC TMuC (test model under consideration) [8]. At the frame level, it analyzed the ratio of utilized CUs to the total number of CUs at each depth and skipped the rarely used depths; at the CU level, it used information from neighboring and co-located CUs to skip unnecessary depths. The algorithm investigated temporal and spatial correlations of CU depth and designed different thresholds to control the number of CU depths to be evaluated. However, the correlations were data dependent and the ratio was affected by encoding configurations, such as the hierarchical depth in a hierarchical prediction structure. In [9], the spatial correlation of CU depth and the probability that neighboring CUs were coded in SKIP mode were used to design an adaptive weighting factor, which adjusted the threshold for early termination of the remaining RD calculations of the current CU. In [10], a complexity control method was proposed that limits the number of coding decision tests and comparisons according to temporal correlations. All these related works explored the spatial and/or temporal correlations of CU depth to eliminate specific CU depths with a trivial impact on RD performance. However, they were not robust enough given the diversity of content. More statistics need to be considered to obtain a more accurate and stable model for simplifying CU splitting.
In the field of accelerating the encoder of H.264/AVC and its extensions, various properties were investigated and employed to simplify mode decision. In [11], a nearly sufficient condition for early zero-block detection was constructed from an analysis of the prediction error to speed up motion estimation in the H.264/AVC JM reference software. It indicated that the prediction error offers a valuable clue for encoder acceleration. Spatial and temporal correlations were exploited to predict the skip mode [12] and reduce encoder complexity. In [13, 14], the distribution of MVs in an MB was chosen as a feature to predict the optimal mode rather than performing an exhaustive search over all modes. A hierarchical algorithm proposed in [15] categorized all modes into three levels, which were triggered by evaluating the SAD between the current MB and its co-located MB, the high-frequency energy in the DCT domain, and the RD cost of mode P8 × 8. In [16], a fast mode decision algorithm named motion activity-based mode decision was proposed. It classified MBs into different classes using predefined thresholds and motion activity. Each class corresponded to a different number of modes to be checked. Tiesong et al. [17] projected encoding modes onto a 2D map, and an optimal 2D map was predicted using spatial and temporal information. Then, a priority-based mode candidate list was constructed from the optimal 2D map, and mode decision was performed starting with the most important mode in the candidate list, subject to early termination conditions. In this way, the number of modes to be evaluated was reduced and acceleration was achieved. Changsung and Kuo [18] presented a feature-based fast inter/intra mode decision algorithm. This algorithm computed three features regarding spatial and temporal correlations, which were used to decide between inter and intra mode.
The feature space was partitioned into three regions, i.e., risk-free, risk-tolerable, and risk-intolerable regions, by checking the RD loss due to a wrong mode decision and the probability distribution of inter/intra modes. Depending on the region, mechanisms of different complexity were applied for the final mode decision. Martinez-Enriquez et al. [19] analyzed the conditional pdfs for every mode and estimated the RD cost to decide the optimal mode. A fast stereo video encoding algorithm based on a hierarchical two-stage neural network was proposed in [20]. Local properties of the input data and the prediction error were extracted as input features to train a neural network designed to predict the optimal partition mode. SVMs were also introduced in the study of fast mode decision [21, 22]. However, MBs were treated equally in the classification problem, and the RD performance of an MB was ignored. In general, these works exploited various mode-related features to predict the optimal mode or to reduce the number of modes to be evaluated. The features included spatial and temporal correlations, the gradient or high-frequency energy, the RD cost of a specific mode, motion activity, and local properties such as the prediction error or SAD/sum of absolute transformed differences (SATD).
As shown in previous research, a CU size selection process that applies RD optimization can be unacceptably time-consuming for practical implementation, which will be further analyzed in Section 2. To solve this problem, we propose a method that uses machine learning to accelerate the CU size selection process. By properly modeling the problem and applying a machine learning algorithm, our method can accurately predict the optimal CU splitting decision instead of searching exhaustively over all possibilities. To derive a more accurate model for predicting the CU splitting decision, the RD difference is introduced as a weight in the SVM training procedure to alleviate the RD performance degradation due to misclassification. Furthermore, various features are extracted from the input video as well as from previously encoded data, and an optimal feature subset is derived by a wrapper feature selection algorithm.
The rest of the article is organized as follows. We briefly go through CU size selection process of HM, and present the motivation of the proposed algorithm in Section 2. In Section 3, we elaborate the modeling of the CU splitting problem and its solution based on a machine learning algorithm, i.e., SVM. Experimental results in Section 4 demonstrate the effectiveness of the proposed algorithm, and Section 5 concludes the article.
2. CU size optimization in HM
To adapt to the diversity of picture content, flexible quadtree coding block partitioning is adopted in HEVC, which enables the use of CU, PU, and TU. The concept of CU is analogous to the MB in previous standards, e.g., H.264/AVC. It is the basic unit for intra/inter coding and is always square in shape. Pictures are divided into largest CUs (LCUs), and each LCU can be split into four equal-sized CUs, which can be further split recursively up to the maximal allowable hierarchical depth. In this manner, the LCU is constructed as a quadtree of CU(s) of different sizes, as shown in Figure 1. At a leaf node of the quadtree, the CU can be encoded in SKIP, inter, or intra mode. The partitioning size of SKIP mode is 2N × 2N, which means that the PU size of SKIP mode equals the CU size; a CU encoded in inter mode can be treated as one PU or partitioned into several PUs, as specified by the partitioning mode: Part_2N × 2N, Part_2N × N, Part_N × 2N, (Part_N × N), Part_2N × nU, Part_2N × nD, Part_nL × 2N, and Part_nR × 2N; and a CU in intra mode can be treated as one PU of size 2N × 2N or partitioned into four N × N PUs. A simple example of PUs in one CU is shown in Figure 1, as highlighted by the green square. The PU, with its different partition sizes, is the basic unit that carries the prediction information. To match the boundaries of real objects in a picture, the shape of a PU is not restricted to being square, e.g., 2N × N is allowed. The TU is defined for the transform and quantization process. The shape of a TU depends on the PU. When the PU is square, the TU is also square and its size varies from 4 × 4 to 32 × 32 luma samples. When the PU is non-square, the TU is also non-square and takes a size of 32 × 8, 8 × 32, 16 × 4, or 4 × 16 luma samples. One CU may contain one or more PUs. Likewise, one CU may contain one or more TUs, which are arranged in a quadtree structure as shown in Figure 1.
As explained in the previous paragraph, one LCU can be coded into a rather complex quadtree to adapt to various video contents. Furthermore, CUs with different depths may be coded in different prediction modes, different partitioning modes, and different transform sizes. To derive the optimal CU-level coding parameters, an exhaustive search method is employed that evaluates the RD costs of all possible combinations of CU size, PU size, and TU size. The RDO of CU size is illustrated in Figure 2. It needs a total of 85 RD calculations when the CU size varies from 64 × 64 to 8 × 8. Obviously, such an RD-based optimization method introduces significant complexity on the encoder. In fact, an exhaustive search over all possible CU sizes is unnecessary, since some CU sizes do not yield much RD improvement, and the encoder can be accelerated by terminating the CU splitting decision process early. As shown in Figure 3, “flat” or “homogeneous” regions, e.g., the floor, are more likely to be encoded in large CUs. Areas containing moving objects or object boundaries, e.g., the net and the basketball, are usually split into small CUs. Motivated by this observation, we model the CU splitting decision as a binary classification problem.
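The figure of 85 RD calculations corresponds to the number of nodes in a full quadtree with four depth levels (64 × 64 down to 8 × 8), i.e., 1 + 4 + 16 + 64. The short sketch below only reproduces that arithmetic; the function name and parameters are illustrative and are not taken from HM.

```python
def count_cu_evaluations(lcu_size=64, min_cu_size=8):
    """Count the CUs visited by an exhaustive quadtree search inside one LCU."""
    total = 0
    size = lcu_size
    cus_at_depth = 1  # a single CU at depth 0 (the LCU itself)
    while size >= min_cu_size:
        total += cus_at_depth
        cus_at_depth *= 4  # each CU splits into four equal-sized sub-CUs
        size //= 2         # the side length halves at each depth
    return total


print(count_cu_evaluations())        # depths 0..3 for a 64x64 LCU
print(count_cu_evaluations(64, 16))  # depths 0..2 only
```

Each of these CU evaluations in turn involves several PU and TU trials, which is why pruning the quadtree early saves a large share of the encoding time.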
3. CU splitting early termination algorithm based on weighted SVM
3.1. Problem formulation
As the flexible representation of coding data places a heavy burden on the encoder, we propose to terminate CU splitting early to avoid unnecessary trials. We model CU splitting as a binary classification problem (i.e., a CU that is not split into four subparts is assigned the label +1; otherwise, the label −1 is assigned) and tackle the classification problem with an SVM [23]. As a widely used machine learning algorithm, the SVM is based on the idea of structural risk minimization (SRM) and has been applied successfully to a number of real-world problems, such as face recognition, text categorization, and object detection in machine vision. The main idea behind the SVM is to derive a unique separating hyperplane that maximizes the margin between two classes. Given $l$ training data points
$$\{(x_i, y_i)\}_{i=1}^{l}, \quad x_i \in \mathbb{R}^n, \quad y_i \in \{+1, -1\}, \tag{1}$$
where $\{x_i, y_i\}$ is the $i$th training sample, i.e., the $i$th CU, $x_i$ is the input feature vector, and $y_i$ is the class label indicating whether the CU is split. The membership decision rule is based on the function defined in Equation (2), where $f(x)$ represents the discriminant function associated with the hyperplane:
$$f(x) = w^T \varphi(x) + b, \tag{2}$$
where $w$ is the weight vector, $b$ is the bias, and $\varphi(\cdot)$ is a nonlinear operator that maps the input $x_i$ into a higher-dimensional space and is associated with the kernel function.
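Mathematically, this hyperplane can be constructed by minimizing the following cost function
$$J(w) = \frac{1}{2} w^T w = \frac{1}{2}\|w\|^2 \tag{3}$$
with constraints
$$y_i\left(w^T \varphi(x_i) + b\right) \ge 1, \quad i = 1, \ldots, l. \tag{4}$$
For a nonseparable case, the classification problem is generalized by introducing slack variables $\xi_i$ and a user-defined regularization parameter $C$. The classification problem is then to minimize the following quantity
$$J(w, \xi) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\xi_i \tag{5}$$
subject to
$$y_i\left(w^T \varphi(x_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l. \tag{6}$$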
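The modified cost function in Equation (5) is the so-called structural risk, which balances the empirical risk (i.e., the training errors reflected by the second term) with model complexity (the first term) [24]. It has been proven that the solution to the optimization problem of Equation (5) under the constraints of Equation (6) is given by the saddle point of the Lagrange function
$$L(w, b, \xi, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\xi_i - \sum_{i=1}^{l}\alpha_i\left[y_i\left(w^T \varphi(x_i) + b\right) - 1 + \xi_i\right] - \sum_{i=1}^{l}\beta_i \xi_i, \tag{7}$$
where $\alpha_i \ge 0$ and $\beta_i \ge 0$ are Lagrange multipliers associated with the constraints in Equation (6).
The Lagrange multipliers are obtained by maximizing
$$Q(\alpha) = \sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{8}$$
subject to
$$\sum_{i=1}^{l}\alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, \ldots, l, \tag{9}$$
where $K(x_i, x) = \varphi^T(x_i)\varphi(x)$. The decision function can equivalently be expressed as
$$f(x) = \sum_{i=1}^{l}\alpha_i y_i K(x_i, x) + b. \tag{10}$$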
It is obvious from Equation (10) that the $\alpha_i$ associated with training point $x_i$ expresses the strength with which that point is embedded in the final decision function. Notice that the nonlinear mapping $\varphi(\cdot)$ never appears explicitly in training or in the decision. In general, the kernel takes a linear, polynomial, radial basis function (RBF), or sigmoid form. In this article, we use the RBF kernel, since it can handle both linear and nonlinear relations between the class labels and the input vector. Furthermore, the model complexity of the RBF kernel is lower than that of the polynomial kernel, and the RBF kernel has fewer numerical difficulties [25].
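To make the kernel form of the decision function concrete, the following minimal sketch evaluates a decision value of the form sum of alpha_i * y_i * K(x_i, x) plus a bias, with an RBF kernel K(u, v) = exp(-gamma * ||u - v||^2). The function names, the parameter `gamma`, and the toy support vectors are assumptions made for illustration; they are not values from the article.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between u and v)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, alphas, labels, bias, gamma):
    """Kernel expansion: sum_i alpha_i * y_i * K(x_i, x) + bias."""
    return bias + sum(
        a * y * rbf_kernel(sv, x, gamma)
        for sv, a, y in zip(support_vectors, alphas, labels)
    )


def predict_label(x, support_vectors, alphas, labels, bias, gamma):
    """Return +1 (do not split) or -1 (split), following the article's labeling."""
    value = decision_value(x, support_vectors, alphas, labels, bias, gamma)
    return 1 if value >= 0 else -1


# Toy model with two support vectors (illustrative numbers only).
svs = [[0.0, 0.0], [2.0, 2.0]]
alphas = [1.0, 1.0]
labels = [1, -1]
print(predict_label([0.1, 0.1], svs, alphas, labels, bias=0.0, gamma=0.5))
print(predict_label([1.9, 1.9], svs, alphas, labels, bias=0.0, gamma=0.5))
```

Only the kernel values are needed at prediction time, which is why the mapping into the higher-dimensional space never has to be computed explicitly.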
3.2. Proposed CU splitting early termination algorithm
The proposed CU splitting early termination algorithm is shown in Figure 4. At each CU depth, the encoder first performs the rate and distortion calculation of SKIP mode and of inter mode with Part_2N × 2N (denoted as inter 2N × 2N mode hereafter), and extracts the required features, i.e., the SVM input vector x, during this evaluation. Then, an offline-trained SVM CU splitting model is loaded, which predicts the class label of the current CU from the extracted input features. Based on the predicted class label, the encoder decides whether to perform RD trials on CU splitting. The offline-trained SVM model is obtained by an SVM training procedure that weights the training samples. The weights are defined as the difference in RD cost caused by misclassification. As long as the CU splitting predictor is accurate, terminating the RD trials on CU splitting early removes a large amount of computation while maintaining RD performance.
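The control flow described above can be sketched as a recursive function. This is a simplified illustration rather than the HM implementation: the three callables (`evaluate_unsplit`, `split_into_four`, `predict_no_split`) are hypothetical stand-ins, and the evaluation of all modes at the current depth is folded into a single call for brevity.

```python
def encode_cu(cu, depth, max_depth, evaluate_unsplit, split_into_four, predict_no_split):
    """Return the RD cost chosen for `cu`, skipping split trials when predicted.

    Hypothetical callables supplied by the caller:
      evaluate_unsplit(cu, depth)       -> (rd_cost, features) for the unsplit CU
      split_into_four(cu)               -> list of the four sub-CUs
      predict_no_split(features, depth) -> True if the predicted label is +1
    """
    cost_unsplit, features = evaluate_unsplit(cu, depth)

    # Early termination: at the maximum depth, or when the classifier
    # predicts "no split", the RD trials on the four sub-CUs are skipped.
    if depth == max_depth or predict_no_split(features, depth):
        return cost_unsplit

    cost_split = sum(
        encode_cu(sub, depth + 1, max_depth,
                  evaluate_unsplit, split_into_four, predict_no_split)
        for sub in split_into_four(cu)
    )
    return min(cost_unsplit, cost_split)
```

With a predictor that always returns False the function degenerates to the exhaustive search; with a predictor that always returns True only the LCU itself is evaluated. A trained model sits between these two extremes.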
3.3. CU splitting early termination algorithm based on weighted SVM
3.3.1. Offline training and weights generation
In the field of machine learning, accuracy is one of the most important measurements for classification algorithms. However, in this scenario, not only the ratio of correct classification, but also the loss of RD performance introduced by misclassifications is important.
For some CUs, the RD costs of coding four sub-CUs and of coding one CU are almost the same. Misclassifying such CUs results in negligible RD degradation. In contrast, for CUs where coding four sub-CUs greatly outperforms coding one CU, misclassification leads to a large RD loss. Clearly, different CUs are of different importance. It is improper to treat samples with different RD performance equally in the training process, because the optimal hyperplane will be shifted by the “unimportant” samples, i.e., these samples act as outliers. The desired SVM predictor should predict the class label as accurately as possible and keep the RD loss as low as possible. Based on this observation, we suggest introducing weights into the SVM training process, i.e., assigning a weight $W_i$ to each training sample, where the weight is defined as the percentage by which the RD cost increases due to misclassification.
Here, $C_i(s)$ and $C_i(n)$ are the RD cost of splitting the CU into four sub-CUs and the RD cost of not splitting the CU, respectively. A CU with a small RD cost difference is assigned a small weight, while a CU with a large RD cost difference is assigned a large weight. Note that the weights are needed only in the training procedure; they are no longer needed when the trained model is used to predict the class label during encoding.
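As an illustration of how a label and a weight could be derived from the two RD costs, the sketch below uses one plausible reading of "percentage of RD cost increased due to misclassification": the absolute cost difference relative to the cost of the better choice. This normalization is an assumption; the exact expression used by the authors is not recoverable from the text here, and the function name is hypothetical.

```python
def label_and_weight(cost_split, cost_nonsplit):
    """Derive a training label and weight from the two RD costs of one CU.

    Label: +1 if not splitting is at least as good, otherwise -1.
    Weight (assumed form): relative RD cost increase incurred when the
    worse of the two options is chosen instead of the better one.
    """
    if cost_split <= 0 or cost_nonsplit <= 0:
        raise ValueError("RD costs are expected to be positive")
    label = 1 if cost_nonsplit <= cost_split else -1
    best = min(cost_split, cost_nonsplit)
    weight = abs(cost_split - cost_nonsplit) / best
    return label, weight


# Nearly equal costs give a small weight; a large gap gives a large weight.
print(label_and_weight(101.0, 100.0))
print(label_and_weight(100.0, 150.0))
```

Under this reading, a CU whose two costs are almost equal contributes little to training, whichever label it receives.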
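Then the standard SVM optimization problem in Equation (5) becomes
$$\min_{w, b, \xi}\; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l} W_i \xi_i$$
subject to
$$y_i\left(w^T \varphi(x_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l.$$
In the dual problem, each $\alpha_i$ is bounded above by the sample-dependent value $C W_i$ instead of the constant $C$. Consequently, CUs with a larger RD cost difference between coding as one CU and coding as four sub-CUs influence the optimal hyperplane more, through their larger weight $W_i$.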
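The effect of per-sample weights can be seen in a small experiment. The sketch below is not the training procedure used in the article (which relies on an RBF-kernel SVM solver); as a simplification it minimizes the weighted primal objective 0.5 * ||w||^2 + C * sum of W_i * hinge_i for a linear model with plain batch subgradient descent. All names and hyperparameters are illustrative assumptions.

```python
def train_weighted_linear_svm(samples, labels, weights, C=1.0, lr=0.01, epochs=500):
    """Minimize 0.5*||w||^2 + C * sum_i W_i * max(0, 1 - y_i*(w.x_i + b))."""
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regularizer 0.5*||w||^2
        grad_b = 0.0
        for x, y, wt in zip(samples, labels, weights):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # hinge loss is active for this sample
                for j in range(dim):
                    grad_w[j] -= C * wt * y * x[j]
                grad_b -= C * wt * y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Two contradictory samples at the same feature value: the one with the
# larger weight determines the sign of the decision value at that point.
w, b = train_weighted_linear_svm([[1.0], [1.0]], [1, -1], [10.0, 0.1])
print("decision value at x=1:", w[0] * 1.0 + b)
```

In this toy case the lightly weighted sample behaves like one of the "unimportant" samples discussed above: it is misclassified, but at a small weighted cost.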
3.3.2. Feature selection
We introduce several representative features related to CU splitting. Selecting effective and relevant features is crucial for classification. Good features help reduce training time as well as utilization time, defy the curse of dimensionality to improve prediction performance, and reduce storage requirements [26]. There are usually two types of feature selection approaches for building a good SVM predictor: filter and wrapper approaches. In this article, we suggest using a wrapper method based on the F-score [27]. Filter methods based on correlation or mutual information ranking [21] are easy to implement; however, selecting the individually most relevant variables is usually suboptimal for building a predictor, particularly if the variables are redundant. A wrapper method assesses a subset of features according to their usefulness to a given predictor, which is better in this scenario. However, the number of subsets grows extremely large as the number of features increases, so an exhaustive search is impractical. Therefore, we propose to rank all features first by F-score and then perform a greedy search based on the ranked results. The F-score, as defined in Equation (16), is a simple metric that measures the discrimination of two sets of real numbers.
$$F(i) = \frac{\left(\bar{x}_i^{+} - \bar{x}_i\right)^2 + \left(\bar{x}_i^{-} - \bar{x}_i\right)^2}{\frac{1}{n_{+} - 1}\sum_{k=1}^{n_{+}}\left(x_{k,i}^{+} - \bar{x}_i^{+}\right)^2 + \frac{1}{n_{-} - 1}\sum_{k=1}^{n_{-}}\left(x_{k,i}^{-} - \bar{x}_i^{-}\right)^2}, \tag{16}$$
where $\bar{x}_i$, $\bar{x}_i^{+}$, and $\bar{x}_i^{-}$ are the averages of the $i$th feature of the input vector x over the whole, positive, and negative training samples, respectively. $x_{k,i}^{+}$ is the $i$th feature of the $k$th positive sample and $x_{k,i}^{-}$ is the $i$th feature of the $k$th negative sample. $n_{+}$ and $n_{-}$ are the total numbers of positive and negative training samples. The larger the F-score, the more discriminative the feature is likely to be. The F-score is easy to calculate and couples well with the SVM training process. The procedure of the wrapper approach is summarized in the following four steps:

(1) Collect training samples by running the HEVC reference software HM6.0.
(2) Calculate the F-score of every feature in the training set and sort the features in descending order of F-score.
(3) Start from a subset F that contains only the feature with the highest F-score.
    (a) Randomly divide the training set into S _{tr} and S _{cv}.
    (b) Train the SVM model using S _{tr}.
    (c) Predict S _{cv} and obtain the cross validation (CV) result (based on accuracy rate).
    (d) Add the remaining feature with the highest F-score to subset F and repeat steps (a) to (c) until all features are evaluated, or terminate the process early by defining a maximum feature number.
(4) Find the optimal feature subset with the lowest validation error.
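The four steps above can be sketched as follows. The F-score function implements the standard definition (between-class separation of the feature means divided by the sum of within-class sample variances). The greedy loop takes a caller-supplied `cv_accuracy` callable, a hypothetical placeholder for steps (3a) to (3c), i.e., splitting the data, training the SVM, and measuring accuracy on the held-out part.

```python
def f_score(pos_values, neg_values):
    """F-score of one feature from its values in positive and negative samples."""
    n_pos, n_neg = len(pos_values), len(neg_values)
    mean_pos = sum(pos_values) / n_pos
    mean_neg = sum(neg_values) / n_neg
    mean_all = (sum(pos_values) + sum(neg_values)) / (n_pos + n_neg)
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    within = (
        sum((v - mean_pos) ** 2 for v in pos_values) / (n_pos - 1)
        + sum((v - mean_neg) ** 2 for v in neg_values) / (n_neg - 1)
    )
    return between / within


def greedy_wrapper(feature_columns, labels, cv_accuracy, max_features=None):
    """Rank features by F-score, then grow the subset one feature at a time.

    feature_columns: dict mapping feature name -> list of values per sample
    labels:          list of +1 / -1 class labels, one per sample
    cv_accuracy:     hypothetical callable(list_of_names) -> validation accuracy
    """
    scores = {}
    for name, column in feature_columns.items():
        pos = [v for v, y in zip(column, labels) if y == 1]
        neg = [v for v, y in zip(column, labels) if y == -1]
        scores[name] = f_score(pos, neg)
    ranked = sorted(scores, key=scores.get, reverse=True)

    limit = len(ranked) if max_features is None else min(max_features, len(ranked))
    best_subset, best_acc = [], float("-inf")
    for k in range(1, limit + 1):
        subset = ranked[:k]  # the k features with the highest F-scores
        acc = cv_accuracy(subset)
        if acc > best_acc:
            best_subset, best_acc = subset, acc
    return ranked, best_subset, best_acc
```

Because the candidates are nested prefixes of the F-score ranking, at most as many models are trained as there are features, rather than one per possible subset.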
To set up a rich feature set, diverse features are introduced and evaluated. Furthermore, the dependency on video content can be reduced by considering as many features as possible and then optimizing the feature subset. The features we consider as potential candidates are summarized as follows.

Prediction error-related features, such as SATD and CBF, denoted as $x_{std}$, $x_{vrs}$, and $x_{cbf}$. $x_{std}$ is defined as the SATD between the prediction and the original pixel values, and $x_{vrs}$ is the variance of the four SATDs of the sub-blocks. $x_{cbf}$ is the coded block flag (CBF) of the inter 2N × 2N mode. The CBF indicates the complexity of the prediction error under a specific quantization parameter (QP). As discussed in [11–15], these features are correlated with CU partitioning.

CU depth information of the context [8], denoted as $x_{sl}$, $x_{sa}$, and $x_{tp}$. $x_{sl}$ and $x_{sa}$ are the CU depths of the left-neighboring and above-neighboring CUs, respectively. $x_{tp}$ is the CU depth of the co-located CU. Since there is substantial correlation in the spatial and temporal domains of a video signal, such context provides very good information.

Gradient magnitude of the current CU [18], denoted as $x_{gm}$. It is the sum of the gradients of all pixels in the current CU, obtained by applying the Sobel operator, and reveals the flatness of the CU.

Motion consistency-related feature [13, 14], denoted as $x_{mc}$, which is defined as the variance of the MVs of the four sub-blocks in inter N × N mode. Regions with inconsistent motion activity are more likely to be encoded in small CUs.

RD cost difference between skip and inter 2N × 2N mode, denoted as $x_{drc}$. If the skip mode is better than inter 2N × 2N, the CU is likely to be background and it may not be necessary to partition the CU into smaller ones. In contrast, if inter 2N × 2N mode is better, a smaller partition mode or a smaller CU size may be preferable.

Side information in RD cost, denoted as $x_{si}$. Small motion partitions provide good RD performance for blocks with high motion activity or rich content. However, more bits must be spent to signal the side information. Therefore, the percentage of side information in the total RD cost of inter 2N × 2N mode gives a good indication of the optimal CU size.

Hierarchical structure-related feature, denoted as $x_{hrc}$. For the hierarchical prediction structure in HEVC, a small CU size is preferred for frames with low temporal depth, and a large CU size is more likely to be optimal for frames with high temporal depth.
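As an example of how one of these features might be computed, the sketch below sums a Sobel-based gradient magnitude over a block of luma samples. Two details are assumptions, since the text does not specify them: the per-pixel magnitude is approximated as |Gx| + |Gy|, and only interior pixels (those with a full 3 × 3 neighborhood) are included.

```python
def sobel_gradient_magnitude(block):
    """Sum of |Gx| + |Gy| over the interior pixels of a 2D block (list of rows)."""
    height, width = len(block), len(block[0])
    total = 0
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            # Horizontal Sobel response: right column minus left column.
            gx = (block[r - 1][c + 1] + 2 * block[r][c + 1] + block[r + 1][c + 1]
                  - block[r - 1][c - 1] - 2 * block[r][c - 1] - block[r + 1][c - 1])
            # Vertical Sobel response: bottom row minus top row.
            gy = (block[r + 1][c - 1] + 2 * block[r + 1][c] + block[r + 1][c + 1]
                  - block[r - 1][c - 1] - 2 * block[r - 1][c] - block[r - 1][c + 1])
            total += abs(gx) + abs(gy)
    return total


flat = [[10] * 4 for _ in range(4)]
edge = [[0, 0, 10, 10] for _ in range(4)]
print(sobel_gradient_magnitude(flat))  # a flat block has no gradient
print(sobel_gradient_magnitude(edge))  # a vertical edge gives a large value
```

A flat block yields zero, while a block crossed by an edge yields a large value, which matches the intuition that flat regions tend to stay in large CUs.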
All the above-mentioned candidate features are evaluated, and an effective feature subset is formed by the proposed wrapper approach based on the F-score. The experimental results on feature selection are presented below. Although some of the features are correlated, the wrapper method can select the features that are useful to the predictor regardless of correlation, as discussed in [26]. The video sequences used in feature selection are “Cactus”, “BQMall”, and “FourPeople”, and the training samples are collected by running HM6.0 [28] under common test conditions. Table 1 presents the F-scores of the different features at different CU depths. The CBF information $x_{cbf}$ and the side information in RD cost $x_{si}$ exhibit relatively high F-scores and give good information about CU splitting. In contrast, the F-score of $x_{hrc}$ is rather low, and this feature is therefore excluded from the input vector in the feature selection. Table 2 presents the feature subsets in the selection procedure and their corresponding CV results. The CV result is nearly the same once the number of features exceeds five. However, extracting more features takes more time, and the SVM predictor becomes more complex as the number of features rises. Setting the number of features to five, as shown in Table 2, is a good balance between accuracy and the additional complexity introduced by feature extraction and SVM prediction. The optimized feature subsets are {$x_{cbf}$, $x_{si}$, $x_{tp}$, $x_{drc}$, $x_{std}$}, {$x_{cbf}$, $x_{si}$, $x_{tp}$, $x_{drc}$, $x_{std}$}, and {$x_{cbf}$, $x_{si}$, $x_{tp}$, $x_{gm}$, $x_{std}$} for CU depth zero (CU 64 × 64), one (CU 32 × 32), and two (CU 16 × 16), respectively. Since the optimal feature subsets differ across CU depths, the proposed CU splitting early termination models are trained separately for each CU depth. The overhead introduced by feature extraction is almost negligible, since most of the features can be derived while calculating the RD cost of the SKIP and inter 2N × 2N modes.
4. Experimental results
4.1. Experimental results on the proposed CU splitting early termination algorithm
To verify the efficiency of the proposed CU splitting early termination algorithm, we conduct comprehensive experiments comparing the proposed algorithm with the HEVC reference software HM6.0. The encoding configuration exactly follows the recommendations in [29], and the test sequences cover a variety of content. The sequences used to train the SVM predictor model are “Cactus”, “BQMall”, and “FourPeople”, denoted as TS1 (training set 1); they are excluded from the performance comparison. The offline training process is carried out with the SVM training software [30], and the proposed CU early termination algorithm is incorporated into the HEVC reference software HM6.0.
To evaluate the performance of the proposed algorithm, two metrics are used in Tables 3 and 4: the average BD-rate (BDBR) [31] difference between the proposed algorithm and HM6.0, and the time reduction ratio, which is defined as
$$\Delta T = \frac{T_{HM} - T_p}{T_{HM}} \times 100\%,$$
where $T_{HM}$ and $T_p$ are the total encoding times of the HM6.0 encoder and the proposed encoder, respectively. The actual encoding time is measured on a workstation with a 2.93 GHz processor and 8 GB of RAM. In Tables 3 and 4, we present the RD performance and the computational complexity of the proposed algorithm and the anchor under the “Random Access, main” and “Low Delay, main” configurations.
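For reference, the time reduction ratio can be computed as in the sketch below, assuming the usual definition (reference time minus proposed time, divided by reference time, in percent). The helper for averaging over several operating points and all timing values are illustrative; they are not measurements from the article.

```python
def time_reduction_percent(t_hm, t_proposed):
    """Time reduction of the proposed encoder relative to the reference, in percent."""
    if t_hm <= 0:
        raise ValueError("reference encoding time must be positive")
    return (t_hm - t_proposed) / t_hm * 100.0


def average_time_reduction(timings):
    """Average reduction over (t_hm, t_proposed) pairs, e.g., one pair per QP."""
    values = [time_reduction_percent(t_hm, t_p) for t_hm, t_p in timings]
    return sum(values) / len(values)


# Made-up encoding times in seconds for two operating points.
example = [(200.0, 100.0), (400.0, 300.0)]
print(average_time_reduction(example))
```

Note that averaging the per-point ratios, as done here, is not the same as taking the ratio of the summed encoding times; which of the two is reported should be stated alongside the results.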
Regarding complexity, the proposed algorithm achieves a maximum running-time reduction of 73.7% with respect to HM6.0, with an average of 44.7%, under the “Random Access, main” configuration, as shown in Tables 3 and 4. In Table 3, the “ΔT” column is the average ΔT over 4 QP points. Concerning RD performance, the algorithm loses 1.35% in terms of BD-rate on average, with a worst case of 1.8% for the sequence “Traffic”. The RD loss is not significant. For the “Low Delay, main” configuration, also shown in Tables 3 and 4, the proposed algorithm behaves very similarly to the “Random Access, main” case: it reduces the complexity by 41.9% with a 1.66% BD-rate loss on average. Table 4 lists part of the experimental results under different QPs. As can be seen, more complexity reduction is achieved in the low bitrate scenario (i.e., with high QP values). In such cases, larger CUs are more efficient in RD performance than smaller CUs, and large CUs account for a high percentage of the picture. The proposed algorithm accurately terminates the RDO procedure at large CU sizes and avoids unnecessary RD calculations at small CU sizes. Therefore, greater complexity reduction can be achieved in the low bitrate case than in the high bitrate case.
To verify that a different training set does not affect the performance of the proposed algorithm, an additional experiment is conducted. Three different sequences (“ParkScene”, “BasketballDrill”, and “Johnny”, denoted as TS2) are used to train the offline model used in the encoding process. The encoding configurations are the same as in the previous experiments. The metrics used in Table 5 are the same as those in Table 3. As shown in Table 5, similar RD performance and complexity reduction are obtained with a different training set.
Both the weighted SVM training algorithm and the wrapper feature selection algorithm have been designed to generalize well. First, the weighted SVM is based on the SRM principle, as opposed to the traditional empirical risk minimization principle employed by conventional learning algorithms. SRM minimizes an upper bound on the expected risk, which gives the SVM a strong ability to generalize. Introducing the RD difference as a weight reduces the influence of outliers. In other words, training samples whose misclassification causes little RD performance degradation are “almost excluded” by being assigned small weights, and more attention is paid to “important” samples. Second, a large number of relevant features are evaluated and assessed. The diversity of features lowers the risk of dependence on the training set. The feature selection algorithm chooses the optimal feature subset based on CV error so that the subset does not depend on a specific training set. Therefore, the algorithm performs stably.
4.2. Additional overhead of SVM classification
SVM classification imposes additional computational complexity on the encoder. Experiments are conducted to investigate this overhead. Table 6 presents the total time spent predicting class labels in the column “Total SVM” and the total time to encode the sequences with the proposed algorithm in the column “Encode Time”. As shown in the column “percentage”, the computational overhead is not critical, especially in the low bitrate cases, where it is less than 5%. Predicting the class labels of 16 × 16 CUs costs slightly more time, as there are more 16 × 16 CUs.
5. Conclusion
In this article, a CU splitting early termination algorithm is proposed. The CU splitting optimization in HEVC is formulated as a binary classification problem and is solved by support vector classification. In order to maintain the RD performance of the early termination algorithm, the RD loss due to misclassification is introduced as the weighting factor of the training samples in the offline training procedure, so that the training method pays special attention to CUs that are prone to degrade RD performance when a suboptimal partition is used. Furthermore, diverse features are considered, such as the correlation between CUs in both the spatial and temporal domains, prediction errors, motion activities, and the RD cost of modes. To select the optimal feature subset, a wrapper feature selection approach is carried out: it embeds the model training into the selection process, and a simple greedy search is performed based on F-score ranking. In this way, the proposed algorithm performs well and stably across different configurations and various video contents. Since the CU splitting early termination model is trained offline and the optimal feature subset is small, the proposed algorithm is computationally simple. The experimental results demonstrate that the proposed algorithm achieves a 44.7% reduction in computational complexity with a 1.35% BD-rate increase in the “Random Access, main” configuration, and a 41.9% complexity reduction with a 1.66% BD-rate increase in the “Low Delay, main” configuration.
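As an illustration of F-score ranking followed by a greedy wrapper search, the sketch below uses the F-score as it is commonly defined for SVM feature ranking (between-class separation of the feature means divided by the sum of the within-class sample variances) and then evaluates the top-k ranked subsets with an error estimator supplied by the caller. This is one common form of such a search and may differ in detail from the procedure used in this article. The feature names, the toy values, and the stand-in `toy_cv_error` function are hypothetical; in practice the estimator would train an SVM and return its cross-validation error.

```python
def f_score(pos, neg):
    """F-score of one feature from its values in the +1 and -1 classes."""
    n_pos, n_neg = len(pos), len(neg)
    mean_pos = sum(pos) / n_pos
    mean_neg = sum(neg) / n_neg
    mean_all = (sum(pos) + sum(neg)) / (n_pos + n_neg)
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    within = (sum((v - mean_pos) ** 2 for v in pos) / (n_pos - 1)
              + sum((v - mean_neg) ** 2 for v in neg) / (n_neg - 1))
    return between / within

def greedy_wrapper(features, labels, cv_error):
    """Rank features by F-score, then keep the top-k prefix with lowest error.

    features: dict mapping feature name -> list of values (one per sample)
    labels:   list of +1 / -1 class labels
    cv_error: callable taking a list of feature names, returning an error
    """
    scores = {}
    for name, values in features.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == -1]
        scores[name] = f_score(pos, neg)
    ranked = sorted(scores, key=scores.get, reverse=True)
    best_subset, best_err = [], float("inf")
    for k in range(1, len(ranked) + 1):
        err = cv_error(ranked[:k])        # model training happens in here
        if err < best_err:
            best_subset, best_err = ranked[:k], err
    return ranked, best_subset

# Hypothetical toy data: one informative feature and one noisy feature.
labels = [-1, -1, -1, 1, 1, 1]
features = {
    "rd_cost": [1.0, 1.2, 0.9, 3.0, 3.2, 2.9],
    "noise":   [1.0, 3.0, 2.0, 1.1, 2.9, 2.1],
}

def toy_cv_error(subset):
    # Stand-in for "train an SVM on `subset` and return its CV error".
    return 0.10 if subset == ["rd_cost"] else 0.15

ranked, best = greedy_wrapper(features, labels, toy_cv_error)
print(ranked, best)
```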
References
 1.
ITU-T SG16 Q6 and ISO/IEC JTC1/SC29/WG11: Joint Call for Proposals on Video Compression Technology. ITU-T SG16 Q6 document VCEG-AM91 and ISO/IEC JTC1/SC29/WG11 document N11113, Kyoto, Japan; 2010.
 2.
Bin L, Sullivan GJ, Jizheng X: Comparison of compression performance of HEVC working draft 5 with AVC high profile. ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) document JCTVC-H0360, in 8th Meeting of JCT-VC, San Jose, USA; 2012.
 3.
Bross B, Han WJ, Sullivan GJ, Ohm JR, Wiegand T: High efficiency video coding (HEVC) text specification draft 6. ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) document JCTVC-H1003, in 8th Meeting of JCT-VC, San Jose, USA; 2012.
 4.
Kim J, Kim M, Kim HY, Sato K, Shen X, Yu L, Choi K, Jang ES, Bross B, Han WJ, Jo JK, Park SN, Sim DG, Oh SJ: JCT-VC TE9: Report on large block structure testing. ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) document JCTVC-C067, in 3rd Meeting of JCT-VC, Guangzhou, China; 2010.
 5.
Qualcomm Inc: Video Coding Using Extended Block Sizes, ITU-T Q.6/SG16 document COM16-C123-E. VCEG 36th Meeting, Geneva, Switzerland; 2009.
 6.
Liang Z, Li Z, Siwei M, Debin Z: Fast mode decision algorithm for intra prediction in HEVC. 2011 IEEE Visual Communications and Image Processing (VCIP), Tainan; 2011:1–4.
 7.
SuWei T, HsuehMing H, YiFu C: Fast mode decision algorithm for residual quadtree coding in HEVC. 2011 IEEE Visual Communications and Image Processing (VCIP), Tainan; 2011:1–4.
 8.
Jie L, Lei S, Ikenaga T, Sakaida S: Content based hierarchical fast coding unit decision algorithm for HEVC. 1st edition. 2011 International Conference on Multimedia and Signal Processing (CMSP), Guilin, Guangxi; 2011:56–59.
 9.
Jongho K, Seyoon J, Sukhee C, Jin Soo C: Adaptive coding unit early termination algorithm for HEVC. 2012 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV; 2012:261–262.
 10.
Correa G, Assuncao P, Agostini L, da Silva Cruz LA: Complexity control of high efficiency video encoders for power-constrained devices. IEEE Trans Consum Electron 2011, 57(4):1866–1874. 10.1109/TCE.2011.6131165
 11.
Lee YM, Tsai YJ, Lin Y: Improved motion estimation using early zero-block detection. EURASIP J Image Video Process 2008, 2008: 524793. 10.1155/2008/524793
 12.
ByungGyu K: Novel inter-mode decision algorithm based on macroblock (MB) tracking for the P-slice in H.264/AVC video coding. IEEE Trans Circuits Syst Video Technol 2008, 18(2):273–279. 10.1109/TCSVT.2008.918121
 13.
TienYing K, ChenHung C: Fast variable block size motion estimation for H.264 using likelihood and correlation of motion field. IEEE Trans Circuits Syst Video Technol 2006, 16(10):1185–1195. 10.1109/TCSVT.2006.883512
 14.
Zhi L, Liquan S, Zhaoyang Z: An efficient inter mode decision algorithm based on motion homogeneity for H.264/AVC. IEEE Trans Circuits Syst Video Technol 2009, 19(1):128–132. 10.1109/TCSVT.2008.2005804
 15.
Yu ACW, Martin GR, Heechan P: Fast inter-mode selection in the H.264/AVC standard using a hierarchical decision process. IEEE Trans Circuits Syst Video Technol 2008, 18(2):186–195. 10.1109/TCSVT.2007.913970
 16.
Huanqiang Z, Canhui C, KaiKuang M: Fast mode decision for H.264/AVC based on macroblock motion activity. IEEE Trans Circuits Syst Video Technol 2009, 19(4):491–499. 10.1109/TCSVT.2009.2014014
 17.
Tiesong Z, Hanli W, Kwong S, Kuo CCJ: Fast mode decision based on mode adaptation. IEEE Trans Circuits Syst Video Technol 2010, 20(5):697–705. 10.1109/TCSVT.2010.2045812
 18.
Changsung K, Kuo CCJ: Feature-based intra/inter coding mode selection for H.264/AVC. IEEE Trans Circuits Syst Video Technol 2007, 17(4):441–453. 10.1109/TCSVT.2006.888829
 19.
MartinezEnriquez D, JimenezMoreno A, DiazdeMaria F: An adaptive algorithm for fast inter mode decision in the H.264/AVC video coding standard. IEEE Trans Consum Electron 2010, 56(2):826–834. 10.1109/TCE.2010.5506008
 20.
JuiChiu C, WeiChih C, LienMing L, KuoFeng H, WenNung L: A fast H.264/AVC-based stereo video encoding algorithm based on hierarchical two-stage neural classification. IEEE J Sel Topics Signal Process 2011, 5(2):309–320. 10.1109/JSTSP.2010.2066956
 21.
ChenKuo C, WeiHau P, Chiuan H, ShinShan Z, ShangHong L: Fast H.264 encoding based on statistical learning. IEEE Trans Circuits Syst Video Technol 2011, 21(9):1304–1315. 10.1109/TCSVT.2011.2147250
 22.
Jaeil K, Munchurl K, Sangjin H, Injoon C, Changsub P: Block-mode classification using SVMs for early termination of block mode decision in H.264|MPEG-4 part 10 AVC. Seventh International Conference on Advances in Pattern Recognition, ICAPR'09, Kolkata; 2009:83–86.
 23.
Corinna C, Vapnik V: Support-vector networks. Mach Learn 1995, 20(3):273–297.
 24.
Scholkopf B, Burges C, Smola A: Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, MA; 1999.
 25.
Hsu CW, Chang CC, Lin CJ: A practical guide to support vector classification, Tech. rep. Department of Computer Science, National Taiwan University; 2003. http://www.csie.ntu.edu.tw/cjlin/guide/guide.pdf
 26.
Isabelle G, André E: An introduction to variable and feature selection. J Mach Learn Res 2003, 3: 1157–1182.
 27.
Chen YW, Lin CJ: Combining SVMs with Various Feature Selection Strategies. Springer, New York; 2006.
 28.
HM Software. https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM6.0
 29.
Bossen F: Common test conditions and software reference configurations, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) document JCTVC-H1100. 8th meeting of JCT-VC, San Jose, USA; 2012.
 30.
ChihChung C, ChihJen L: LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2011, 2(27):1–27.
 31.
Bjontegaard G: Improvements of the BD-PSNR model, ITU-T SG16/Q6 document VCEG-AI11. 35th VCEG Meeting, Berlin, Germany; 2008.
Acknowledgements
This work is supported by the National Basic Research Program of China (973) under Grant No. 2009CB320903 and Specialized Research Fund for the Doctoral Program of Higher Education (SRFDP) No. 20120101110032.
Additional information
Competing interests
The authors declare that they have no competing interests.
Keywords
 HEVC
 fast coding unit decision
 classification
 SVM
 feature selection.