Low-complexity depth map compression in HEVC-based 3D video coding

In this paper, a low-complexity algorithm is proposed to reduce the complexity of depth map compression in the high-efficiency video coding (HEVC)-based 3D video coding (3D-HEVC). Since the depth map and the corresponding texture video represent the same scene in a 3D video, there is a high correlation among the coding information from depth map and texture video. An experimental analysis is performed to study depth map and texture video correlation in the coding information such as the motion vector and prediction mode. Based on the correlation, we propose three efficient low-complexity approaches, including early termination mode decision, adaptive search range motion estimation (ME), and fast disparity estimation (DE). Experimental results show that the proposed algorithm can reduce about 66% computational complexity with negligible rate-distortion (RD) performance loss in comparison with the original 3D-HEVC encoder.


Introduction
The three-dimensional (3D) video coding standard has recently been finalized by the Joint Collaborative Team on 3D Video Coding (JCT-3V), and the high-efficiency video coding (HEVC)-based 3D video coding (3D-HEVC) has been developed as an extension of HEVC [1][2][3]. For the efficient compression of 3D video data comprising multiview texture video and depth maps, a number of coding tools, such as inter-view motion prediction and disparity-compensated prediction, have been investigated for 3D-HEVC [4]. These tools achieve the highest possible coding efficiency in multiview texture video compression, but they result in extremely large encoding time with only a small increase in depth coding efficiency, which obstructs the practical use of 3D-HEVC. Therefore, it is necessary to develop a fast algorithm that can reduce the complexity of multiview depth map compression with minimal loss of coding efficiency in a 3D-HEVC encoder.
Recently, a number of fast algorithms for depth map coding have been explored. A motion vector (MV) sharing algorithm is proposed in [5] to reduce the complexity of depth map coding. An early termination algorithm for depth coding is introduced in [6] based on detecting the differences between the current macroblock (MB) and the co-located MBs in the texture video. An intra prediction algorithm for depth coding is presented in [7] to reduce the number of candidate prediction directions for smooth regions. A low-complexity mode decision and motion estimation algorithm is proposed in [8] to take advantage of the texture motion information, which can be usefully exploited in encoding the corresponding depth map. A novel depth and depth-color codec is proposed in [9] based on a shape-adaptive wavelet transform and an explicit encoding of the locations of major depth edges. A depth map compression algorithm [10] uses the corresponding texture video as side information to improve the coding performance. A fast motion search and mode decision algorithm is proposed in [11] to speed up the motion estimation (ME) stages of the depth coding process, and a fast depth map coding method is proposed in our previous work [12] based on sharing the motion vector and SKIP mode from the texture video. All these algorithms are efficient in reducing computational complexity with acceptable quality degradation for previous video coding standards. However, they are not directly applicable to the new 3D-HEVC standard, whose high computational complexity is intrinsically related to the new prediction coding structures of the 3D-HEVC encoder.
To this end, several fast algorithms [13][14][15][16] have been proposed to reduce the complexity of depth map coding in the 3D-HEVC encoder. A fast mode decision algorithm is proposed in [13] to terminate early the full rate-distortion (RD) cost calculation of unnecessary prediction modes in 3D-HEVC. A low-complexity depth map coding algorithm based on the associated texture video is introduced in [14] to reduce the number of wedgelet candidates. A fast wedgelet partitioning algorithm is proposed in [15] to simplify the intra mode decision in 3D-HEVC depth map coding. A content adaptive complexity reduction algorithm is proposed in [16] to reduce the 3D-HEVC coding complexity by utilizing the correlations between the base view and the dependent views. The aforementioned algorithms are well developed for depth map coding and achieve significant time savings in 3D-HEVC. However, the coding information correlations between the depth map and the texture video are not fully studied, which limits the achievable time saving. There is still room for further reduction of the computational complexity of 3D-HEVC depth map compression.
The depth map represents the geometry of a 3D scene and has the same content, with similar characteristics, as the texture video. Therefore, there is a high correlation between the motion information of the depth map and that of the texture video. In this paper, we propose a low-complexity depth compression algorithm that exploits this correlation. The proposed algorithm consists of three approaches: early termination mode decision, adaptive search range ME, and fast disparity estimation (DE) for depth map coding. Experimental results illustrate that the proposed algorithm significantly reduces the computational complexity of depth map compression while maintaining almost the same coding performance as the original 3D-HEVC encoder.
The rest of the paper is organized as follows. Section 2 analyzes the properties of the depth map and the correlation between the motion information of the depth map and the texture video. A low-complexity depth coding algorithm based on adaptive search range ME and fast DE is presented in Section 3. Experimental results and conclusions are given in Sections 4 and 5, respectively.

Observations and analysis
In the test model of 3D-HEVC, variable-size ME and DE are performed to exploit both the temporal and inter-view correlation within temporally successive pictures and neighboring views. The coding unit (CU) is the basic unit of region splitting used in 3D-HEVC, similar to the macroblock in H.264/AVC; it has a hierarchical quadtree structure with variable sizes from 64 × 64 to 8 × 8. The prediction unit (PU) is the basic unit used for the 3D-HEVC inter/intra prediction processes. At each treeblock, 3D-HEVC performs ME and DE with different PU sizes, including 2N × 2N, 2N × N, N × 2N, and N × N.
Similar to HEVC, the mode decision process in 3D-HEVC is performed for each treeblock by checking all the possible prediction modes and selecting the one with the least RD cost using a Lagrange multiplier. The RD cost function J used in 3D-HEVC is defined as

J = SSE + λ · R, (1)

where SSE is the sum of squared errors between the current treeblock and the matching treeblock, R specifies the bit cost to be considered for the 3D-HEVC mode decision, and λ is the Lagrange multiplier. However, the calculation of the RD cost requires executing both the ME and DE processes in 3D-HEVC, and this 'try all and select the best' method results in high computational complexity and limits the use of 3D-HEVC encoders in practical applications. Therefore, low-complexity algorithms that can reduce the complexity of the ME and DE processes with negligible loss of coding efficiency are extremely necessary for real-time implementation of 3D-HEVC encoders.
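The Lagrangian mode decision above can be sketched as follows. This is a minimal illustration assuming the standard form J = SSE + λ·R; the function names (`sse`, `rd_cost`) are our own and do not come from the reference software.

```python
def sse(current, reference):
    """Sum of squared errors between two equal-sized sample blocks."""
    return sum((c - r) ** 2 for c, r in zip(current, reference))

def rd_cost(current, reference, rate_bits, lam):
    """Lagrangian RD cost: distortion plus lambda-weighted bit cost (Equation 1)."""
    return sse(current, reference) + lam * rate_bits
```

The encoder would evaluate `rd_cost` for every candidate mode of a treeblock and keep the minimum, which is exactly the 'try all and select the best' behavior whose cost the proposed algorithm reduces.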
Since the depth map and its associated texture video are both projections of the same scene from the same viewpoint at the same time instant, the motion characteristics (i.e., block partitioning and corresponding motion vectors) of the depth map and its associated texture video are typically similar. Therefore, a new coding mode, motion parameter inheritance (MPI) [4,17], in which the data already transmitted for the texture video picture can be reused for efficient encoding of the depth map, has been introduced in the 3D-HEVC encoder. This achieves the highest coding efficiency but requires a very high computational complexity. Since the motion vectors of the texture video have quarter-sample accuracy, whereas only full-sample accuracy is used for the depth map, the motion vectors are quantized to their nearest full-sample position in the inheritance process. In addition, the inherited reference picture shall be the one with the same picture order count (POC) and viewpoint as the reference picture of the co-located block in the texture video picture. If no reference picture in the reference lists satisfies this condition, the candidate is treated as invalid and is not inserted into the merge candidate list. However, MPI does not fully exploit the coding information correlations between the depth map and the texture video, which include the reference picture, prediction mode, and motion vector. Since both represent the same scene, the prediction mode of a depth map treeblock is typically similar to that of the corresponding texture video treeblock. Meanwhile, the homogeneous regions in the depth map have a strong spatial correlation, and thus spatially neighboring depth map treeblocks have similar coding information. The relationship among the current depth map treeblock, the co-located texture video treeblock, and the spatially neighboring treeblocks is shown in Figure 1.
The reference picture in the co-located texture view has the same POC value as the reference picture of the current depth map view.
On the basis of these observations, we propose to estimate the depth map prediction mode using the coding information from the spatially neighboring depth map treeblocks and the co-located texture video treeblock. The neighboring depth map treeblocks and the co-located texture video treeblocks are depicted in Figure 2. D_c denotes the current depth map treeblock, and D_l, D_u, D_ul, and D_ur denote the neighboring treeblocks in the depth map. C_col denotes the co-located treeblock in the texture video, and C_l, C_u, C_ul, and C_ur denote its left, up, up-left, and up-right treeblocks, respectively, as shown in Figure 2.
According to the coding information correlation with the mode maps of encoded frames, we define a set of mode predictors P for the depth map treeblock as

P = {D_l, D_u, D_ul, D_ur, C_col, C_l, C_u, C_ul, C_ur}. (2)

Based on this predictor set, a mode complexity parameter C is defined according to the mode context of the spatially neighboring depth map treeblocks and the co-located texture video treeblock, and the mode characteristic of a depth map treeblock is then estimated. The mode complexity of a depth map treeblock is computed as

C = Σ_{i∈P} β_i · η_i, (3)

where i indexes the related treeblocks in the predictor set P of Equation 2, β_i is the treeblock weight factor of each predictor, and η_i is the treeblock mode factor of each predictor.
Only the prediction modes of the available neighboring treeblocks in the predictor set P are used. In 3D-HEVC, various prediction mode sizes are used in the mode decision process. The mode factor η_i of each predictor can be assigned based on the complexity of each mode as follows: when predictor i is SKIP mode, merge mode, inter 2N × 2N, or intra 2N × 2N mode, η_i is assigned the small value 1; when predictor i is inter 2N × N or inter N × 2N mode, η_i is assigned the medium value 2; and when predictor i is a small-size inter mode, intra N × N mode (including the depth modeling modes (DMM) and region boundary chain (RBC) mode in the neighboring depth map treeblocks), or DE mode, η_i is assigned the large value 3. The treeblock weight factors of these nine predictors satisfy Σ_i β_i = 1. β_i is defined according to the effect of the related treeblock on the current treeblock. Since treeblocks in the horizontal and vertical directions have a larger effect on the current treeblock than treeblocks in the diagonal directions, the weight factors β_i for the horizontal and vertical treeblocks (D_l, D_u, C_l, and C_u) are set to 0.1, and those of the diagonal treeblocks (D_ul, D_ur, C_ul, and C_ur) are set to 0.05. For the co-located texture video treeblock, the treeblock weight factor β_Ccol is set to 0.4. Generally, the larger the mode factor, the more complex the treeblock is. According to the value of C, each treeblock can be classified into one of three types. Two thresholds T_1 and T_2 determine whether a treeblock belongs to a simple mode, normal mode, or complex mode region. The criterion is defined as follows:

treeblock ∈ simple mode region, if C < T_1;
treeblock ∈ normal mode region, if T_1 ≤ C ≤ T_2; (4)
treeblock ∈ complex mode region, if C > T_2,

where T_1 and T_2 are mode-weight thresholds.
Figure 1 Co-located texture video and spatial correlations of the current depth map treeblock.
Those threshold settings are crucial for effective depth map compression, and there is always a tradeoff between depth map coding quality and computational complexity reduction. Simulations on various test sequences show that the optimal threshold for each sequence depends on its content. To cope with the different texture characteristics of the test sequences, extensive simulations have been conducted on eight video sequences to analyze the thresholds for the three types of treeblocks. Among these test sequences, Kendo, Balloons, and Newspaper are in 1,024 × 768 resolution, while Undo_Dancer, GT_Fly, Poznan_Street, Poznan_Hall2, and Shark are in 1,920 × 1,088 resolution. The 'Shark' and 'Undo_Dancer' sequences have large global motion or rich texture; the 'Kendo', 'Balloons', 'Newspaper', and 'Poznan_Street' sequences have medium local motion or smooth texture; and 'Poznan_Hall2' is a sequence with small global motion and homogeneous texture. The test conditions are as follows: I-B-P view structure; full-length frames for each sequence; quantization parameter (QP) values of 34, 39, 42, and 45; group of pictures (GOP) size = 8; treeblock size = 64; ME search range of 64; and context-adaptive binary arithmetic coding (CABAC) for entropy coding. We then calculated the average thresholds over the eight test sequences. Table 1 shows the accuracy of the proposed algorithm using various thresholds. The accuracy here is defined as the ratio of the number of simple mode, normal mode, and complex mode treeblocks that select the same best modes under the proposed algorithm as under the original 3D-HEVC encoder. It can be seen from Table 1 that when the threshold values are T_1 = 0.8 and T_2 = 1.2, the average accuracy of the proposed algorithm exceeds 93%, with a maximum of 97% in the 'Shark' sequence.
Based on extensive experiments, T_1 and T_2 are set to 0.8 and 1.2, respectively, which achieves good and consistent performance on a variety of test sequences with different texture characteristics and motion activities; these values are kept fixed for all treeblocks and QP levels in the 3D-HEVC encoder.
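The mode-complexity computation of Equations 2 to 4 can be sketched as follows. The weights and thresholds are the values given above; the dictionary keys, mode labels, and helper names are illustrative stand-ins for the encoder's internal data structures.

```python
# Treeblock weight factors beta_i (summing to 1): horizontal/vertical
# neighbors 0.1, diagonal neighbors 0.05, co-located texture block 0.4.
BETA = {
    "D_l": 0.1, "D_u": 0.1, "D_ul": 0.05, "D_ur": 0.05,
    "C_col": 0.4,
    "C_l": 0.1, "C_u": 0.1, "C_ul": 0.05, "C_ur": 0.05,
}

# Mode factor eta_i: 1 for low-complexity modes, 2 for medium, 3 for complex.
ETA = {
    "SKIP": 1, "MERGE": 1, "INTER_2Nx2N": 1, "INTRA_2Nx2N": 1,
    "INTER_2NxN": 2, "INTER_Nx2N": 2,
    "INTER_SMALL": 3, "INTRA_NxN": 3, "DMM": 3, "RBC": 3, "DE": 3,
}

T1, T2 = 0.8, 1.2  # thresholds fixed from the training simulations

def mode_complexity(predictor_modes):
    """C = sum_i beta_i * eta_i over the available predictors (Equation 3)."""
    return sum(BETA[i] * ETA[mode] for i, mode in predictor_modes.items())

def classify(c):
    """Map mode complexity C to a region type (Equation 4)."""
    if c < T1:
        return "simple"
    if c <= T2:
        return "normal"
    return "complex"
```

For example, if every predictor was encoded with SKIP, C equals the sum of the weights (1.0) and the treeblock falls in the normal mode region; if only C_col is available and it used SKIP, C = 0.4 and the treeblock is classified as simple.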
Proposed low-complexity depth map compression algorithm

Early termination mode decision
The depth map is usually not the ground truth because existing depth map estimation methods still have difficulty generating accurate depths at object edges or in areas with little texture. Distortion may occur during depth map estimation, resulting in a noisy depth map (caused by occlusion and areas of low texture), so it would be inefficient to spend more bits on an accurate representation of the depth map in 3D-HEVC coding. To overcome this problem, this paper proposes an early termination mode decision for 3D-HEVC that takes the correlations between the coding information of texture videos and depth maps into account to speed up the coding process. The depth map content is similar to that of the texture video, and thus the coding modes of the texture and depth map are similar. By utilizing the information of the corresponding treeblock in the texture video, the coding information of previously encoded texture images at the same view can be effectively shared and reused. We therefore propose a novel early termination mode decision considering the co-located texture video. The merge/skip mode provides good coding performance and requires little complexity in the 3D-HEVC encoder, where the motion vector predictor (MVP) is adopted for the current treeblock to generate a compensated block. Meanwhile, the merge/skip mode is the dominant mode at low bitrates (high QPs) in the 3D-HEVC encoder, and its distribution is similar to that in the previous video coding standard, H.264/AVC. Once the merge/skip mode can be predecided, the variable-size ME and DE computation for a treeblock can be entirely saved. Usually, the decision to use merge/skip mode is delayed until the RD costs of all other modes (inter, intra, and DE modes) have been calculated and merge/skip mode is found to have the minimum RD cost.
Thus, if we can exploit previously encoded texture coding information to determine that a depth map treeblock will be encoded in merge/skip mode (this mode, along with the CU partition, is inherited to encode the depth treeblock directly without descending further in the quadtree), we can skip the time-consuming process of computing RD costs on smaller block sizes for a high percentage of treeblocks and thus significantly reduce the computational complexity of the 3D-HEVC mode decision process.
Based on this consideration, the proposed algorithm introduces an early termination mode decision that skips checking unnecessary ME and DE by utilizing the co-located texture video prediction mode information. In our approach, we first take advantage of previously encoded texture images at the same view for the early merge/skip mode decision. Since the depth map and texture video are generally captured at the same time, each treeblock is likely to have the same motion and block partition information. So, when a depth map treeblock is encoded, we consider how the corresponding texture video treeblock (C_col in Figure 2) was encoded. When the merge/skip mode is selected as the best prediction mode for the texture treeblock in the 3D-HEVC mode decision, it indicates that the current texture treeblock is located in a low-motion or static region. The motion of the texture treeblock can be predicted well using the merge/skip mode, which results in a lower-energy residual after motion compensation compared to other prediction modes such as inter 2N × 2N, 2N × N, N × 2N, and N × N. Thus, no further variable-size ME and DE computation is necessary.
However, the proposed early termination mode decision algorithm rests on a few strong assumptions: the depth map content is not always similar to the color content, e.g., in a planar, highly textured area there is high color variance but the depth is constant. Depth acquisition can be unreliable, but the assumption that information can be discarded for this reason is questionable. Finally, if motion estimation on the color data is wrong, errors can propagate to the depth data under the proposed approach even if estimation from depth alone would be correct. Based on this observation, we investigate the effectiveness of the proposed early termination mode decision algorithm. By exploiting the exhaustive mode decision in the 3D-HEVC encoder under the test conditions stated in Section 2, extensive simulations have been conducted on the set of test sequences listed in Table 2. Table 2 shows the hit rate of the early termination mode decision algorithm, defined as the ratio of the number of depth map treeblocks that select the same best prediction mode under the proposed algorithm as under the original 3D-HEVC encoder to the total number of depth map treeblocks. The average hit rate of the proposed algorithm is larger than 93%, with a maximum of 95% at QP = 45 and a minimum of 91% at QP = 34. The simulation results in Table 2 indicate that the proposed early termination mode decision algorithm can accurately eliminate unnecessary depth map CU modes by utilizing the information of the corresponding treeblock in the texture video. Based on this statistical tendency, the proposed early termination algorithm checks the prediction mode of the co-located texture video treeblock: if the texture treeblock (C_col) has no motion but the corresponding depth map treeblock (D_c) appears to have motion, that motion is likely due to unreliable depth estimation and can therefore be ignored.
When the texture video treeblock selects merge/skip as the best mode, the motion of the current depth map treeblock can be efficiently represented by the inherited motion parameters, and the variable-size ME and DE computation for the depth map treeblock can be skipped in the 3D-HEVC mode decision.
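The early-termination test above amounts to a single check on the co-located texture mode. A minimal sketch, with an illustrative function name and mode labels of our own:

```python
def early_terminate(texture_best_mode):
    """Return True when the co-located texture treeblock used merge/skip,
    i.e. when variable-size ME and DE for the depth treeblock can be skipped
    and the mode (with the CU partition) inherited instead."""
    return texture_best_mode in ("MERGE", "SKIP")
```

When this returns True, the encoder would inherit the merge/skip mode and CU partition for the depth treeblock and jump straight to the final mode determination (Step 7 of the overall algorithm).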

Adaptive search range motion estimation
ME is the most computationally expensive task in the 3D-HEVC encoder; it searches for the best-matched treeblock within a predefined region in the reference frame. A larger ME search range produces a higher computational load, while a very small ME search range may reduce the coding performance due to poor matching results. A suitable ME search range can reduce the computational complexity of 3D-HEVC while maintaining good RD performance.
Based on the treeblock classification in Section 2, the ME search range of the current depth map treeblock is adaptively determined as follows: for treeblocks in a simple mode region, the search window is reduced to [SR/8 × SR/8]; for treeblocks in a normal mode region, it is reduced to [SR/4 × SR/4]; otherwise, the full window [SR × SR] is kept, where SR represents the search range defined in the configuration file of 3D-HEVC. To verify the legitimacy of the proposed adaptive search range motion estimation algorithm, extensive simulations have been conducted on eight video sequences to analyze the motion vector distribution for the three types of treeblocks, by exploiting the exhaustive mode decision in 3D-HEVC under the aforementioned test conditions. Table 3 shows the motion vector distribution for each type of treeblock. The results in Table 3 demonstrate that the proposed adaptive search range motion estimation algorithm can accurately remove the unnecessary ME search range in 3D-HEVC. A flowchart of the proposed adaptive search range motion estimation algorithm is given in Figure 3.
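The range-selection rule can be sketched as follows; the function name and the default SR of 64 (the value used in the test conditions) are illustrative assumptions.

```python
def adaptive_search_range(region_type, sr=64):
    """Pick the ME search-range side length for a depth treeblock:
    SR/8 for simple regions, SR/4 for normal regions, SR otherwise."""
    if region_type == "simple":
        return sr // 8
    if region_type == "normal":
        return sr // 4
    return sr  # complex region: keep the full configured range
```

With the default SR = 64, simple, normal, and complex treeblocks search windows of 8, 16, and 64 samples per side, respectively, which is where the bulk of the ME time saving comes from.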

Fast disparity estimation for depth map coding
In the test model of 3D-HEVC, when coding the dependent views, the HEVC codec is modified by including some high-level syntax changes and the disparity-compensated prediction (DCP) technique, similar to the inter-view prediction in the MVC extension of H.264/AVC [4]. In addition, unlike a dependent texture view, the depth map is characterized by sharp edges and large regions with nearly constant values. The eight-tap interpolation filters used for ME interpolation in HEVC can produce ringing artifacts at sharp edges in the depth map, which are visible as disturbing components in synthesized intermediate views. To avoid this issue and to decrease the encoder and decoder complexity, the ME as well as the DE has been modified so that no interpolation is used; that is, for the depth map, inter-picture prediction is always performed with full-sample accuracy. For the actual DE, a block of samples in the reference picture is directly used as the prediction signal without interpolating any intermediate samples. To avoid transmitting motion and disparity vectors with unnecessary accuracy, full-sample accurate motion and disparity vectors are used for coding the depth map, and the transmitted motion vector differences are coded with full-sample instead of quarter-sample precision. This modified technique achieves the highest possible depth map coding efficiency, but it results in extremely large encoding time, which obstructs the practical application of 3D-HEVC. In this paper, a fast DE algorithm for depth map coding is proposed to reduce the 3D-HEVC computational complexity. As mentioned above, disparity prediction searches for the best-matched block in frames from neighboring views. Although temporal prediction is generally the most efficient prediction mode in 3D-HEVC, it is sometimes necessary to use both DE and ME, rather than ME alone, to achieve better predictions.
In general, temporal motion cannot be characterized adequately, especially for regions with non-rigid motion and regions with motion boundaries. For the former, ME based on simple translation movement usually fails and, thus, produces a poor prediction. For the latter, regions with motion boundaries are usually predicted using small mode sizes with a larger magnitude of motion vectors and higher residual energy [18]. Thus, the treeblocks with a simple mode region are more likely to choose temporal prediction (ME), and treeblocks with a complex mode region are more likely to choose inter-view prediction (DE).
By exploiting the exhaustive mode decision in the 3D-HEVC encoder under the experimental conditions stated in Section 3.2, we investigate the probabilities of choosing inter-view prediction and temporal prediction for each type of treeblock in Table 4. For treeblocks in a simple mode region, the average probabilities of choosing temporal prediction and inter-view prediction are 97.7% and 2.2%, respectively. For treeblocks in a normal mode region, they are 89.1% and 11.0%, respectively. For treeblocks in a complex mode region, the probabilities are 63.7% and 36.4%, respectively. We can see from Table 4 that treeblocks in a simple mode region are much more likely to choose temporal prediction. Thus, for a simple mode region, the inter-view prediction procedure can be skipped with only a very low miss detection ratio relative to the optimal prediction mode chosen by the full inter-view and temporal prediction search. For treeblocks in complex and normal mode regions, however, the average probabilities of choosing inter-view prediction are 36.4% and 11.0%, respectively. Although test sequences such as 'Poznan_Hall2' and 'Newspaper' contain large areas of homogeneous texture and low-activity motion, which are more likely to be encoded with temporal prediction, the probability of inter-view prediction for treeblocks in normal and complex mode regions is still significant. Thus, if we disabled inter-view prediction in the normal and complex mode regions, the coding efficiency loss would not be negligible. Based on the aforementioned analysis, we propose a fast disparity estimation algorithm in which the disparity search is selectively enabled. For treeblocks in a simple mode region, the disparity search is skipped (only the RD cost of the MVP is used), while for treeblocks in a normal mode region, the RD cost of the MVP is compared with that of the disparity vector predictor (DVP).
If the RD cost of the MVP is larger than that of the DVP, the disparity search is enabled; otherwise, it is disabled. For treeblocks in a complex mode region, the disparity search is always enabled (the RD costs of both the MVP and the DVP are used). A flowchart of the scheme is given in Figure 4.
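The selective disparity-search rule above can be sketched as a small decision function; the function name and cost arguments are illustrative, not encoder API.

```python
def disparity_search_enabled(region_type, mvp_cost=None, dvp_cost=None):
    """Decide whether to run the disparity search for a depth treeblock:
    skipped for simple regions, cost-gated for normal regions,
    always enabled for complex regions."""
    if region_type == "simple":
        return False  # only the RD cost of the MVP is used
    if region_type == "normal":
        # search only when the MVP is the worse predictor than the DVP
        return mvp_cost > dvp_cost
    return True  # complex region: full disparity search
```

This mirrors Figure 4: the expensive DE stage runs unconditionally only for the minority of treeblocks classified as complex, which is where Table 4 shows inter-view prediction is actually chosen often.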

Overall algorithm
Based on the aforementioned analysis, combining the early termination mode decision, adaptive search range ME, and fast DE approaches for depth map coding, we propose a low-complexity depth map compression algorithm for 3D-HEVC as follows.
Step 1: start mode decision for a depth map treeblock.
Step 2: locate the spatial neighboring depth map treeblock and its co-located texture video treeblocks (shown in Figure 2) at the previously coded data. Derive the coding information from predictors in the depth map and texture video.
Step 3: derive the prediction mode of the co-located texture video treeblock; if the texture treeblock has no motion (merge/skip mode), perform the early merge/skip mode decision and go to Step 7; else go to Step 4.
Step 4: compute C using Equation 3 and classify the current depth map treeblock into the simple, normal, or complex mode region by comparing C with T_1 and T_2 according to Equation 4.
Step 5: perform adaptive search range ME determination: for the treeblocks in a simple mode region, the search range window is reconfigured with [SR/8 × SR/8]; for the treeblock in a normal mode region, the search range window is with [SR/4 × SR/4]; otherwise, the search range window is unchanged.
Step 6: perform variable size DE: for treeblocks with a simple mode region, disparity search is skipped, while for treeblocks with a complex mode region, disparity search is enabled. For treeblocks with a normal mode region, the RD cost of the MVP is compared with that of the DVP.
Step 7: determine the best prediction mode. Go to step 1 and proceed with next depth map treeblock.
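The per-treeblock pipeline of Steps 1 to 7 can be sketched end to end as follows. All inputs and return values are illustrative stand-ins for the encoder's real data structures; `predictor_beta_eta` is assumed to hold the precomputed β_i·η_i terms of the available predictors.

```python
def encode_depth_treeblock(texture_best_mode, predictor_beta_eta, sr=64,
                           mvp_cost=None, dvp_cost=None):
    """Return the decisions made for one depth treeblock: its region type
    (or inherited merge/skip), the ME search range, and whether DE runs."""
    # Step 3: early merge/skip termination from the co-located texture block.
    if texture_best_mode in ("MERGE", "SKIP"):
        return {"mode": "MERGE/SKIP", "me_range": 0, "de": False}
    # Step 4: classify by mode complexity C (Equations 3 and 4).
    c = sum(predictor_beta_eta)
    region = "simple" if c < 0.8 else ("normal" if c <= 1.2 else "complex")
    # Step 5: adaptive ME search range.
    me_range = {"simple": sr // 8, "normal": sr // 4}.get(region, sr)
    # Step 6: selective disparity estimation.
    if region == "simple":
        de = False
    elif region == "normal":
        de = mvp_cost > dvp_cost
    else:
        de = True
    # Step 7 (best-mode RD selection) would follow in the real encoder.
    return {"mode": region, "me_range": me_range, "de": de}
```

For instance, a treeblock whose co-located texture block used SKIP exits immediately with the inherited mode, while one classified as simple (C < 0.8) searches only an 8 × 8 window and skips DE entirely, which is exactly where the reported time savings originate.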

Experimental results
In order to confirm the performance of the proposed low-complexity depth map compression algorithm, which is implemented on the recent 3D-HEVC test model (HTM ver. 5.1), we show the results obtained on eight test sequences released by the JCT-3V group. The detailed information of the test sequences is provided in Table 5. All the experiments follow the common test conditions (CTC) [19] required by JCT-3V. The encoder configuration is as follows: two-view case (coding order: left-right) and three-view case (coding order: center-left-right); GOP length 8 with an intra period of 24; HEVC codecs configured with 8-bit internal processing; a fixed coding treeblock size of 64 × 64 pixels with a maximum CU depth level of 4, resulting in a minimum CU size of 8 × 8 pixels; ME search range of 64; inter-view motion prediction mode on; P-I-P inter-view prediction; and CABAC as the entropy coder. The virtual views are synthesized using the view synthesis reference software (VSRS) algorithm provided by MPEG [20]. Since the depth map sequences are used for rendering instead of being viewed directly, we only compute the peak signal-to-noise ratio (PSNR) between the views synthesized with compressed depth map sequences and those synthesized with uncompressed depth map sequences. The experimental results are presented in Tables 6, 7, and 8. Tables 6 and 7 give the individual evaluation results of the proposed algorithms, i.e., early termination mode decision (ETMD), adaptive search range motion estimation (ASRME), and fast disparity estimation (FDE), applied alone and compared with the original 3D-HEVC encoder (Table 6 for the two-view case, Table 7 for the three-view case). Each of the three proposed algorithms greatly reduces the encoding time with similar encoding efficiency for all sequences.
For the ETMD algorithm, about 18.9% and 19.9% of the coding time is saved in the two-view and three-view conditions, respectively, with the highest gain of 32.7% in 'Poznan_Hall2' (two-view case) and the lowest gain of 7.6% in 'GT_Fly' (three-view case). A consistent gain is observed over all sequences under both conditions. The average PSNR drop for all the test sequences is 0.01 dB, which is negligible, and the bitrate is reduced by 0.02% to 0.03% on average, indicating that the proposed ETMD algorithm can even improve the bitrate performance of depth map compression in 3D-HEVC. For the proposed ASRME algorithm, 32.4% of the encoding time is saved on average in the two-view and three-view conditions, with a maximum of 64.5%. The coding efficiency loss is negligible, with a 0.02-dB PSNR drop or a 0.11% to 0.13% bitrate increase. This result indicates that ASRME can efficiently skip unnecessary ME search range computation in 3D-HEVC depth map coding. As for the FDE algorithm, 25.4% and 26.4% of the coding time is saved in the two-view and three-view conditions, respectively; the average PSNR drop for all the test sequences is 0.02 to 0.03 dB, and the average bitrate increase is 0.27% to 0.3%, which is negligible. This analysis indicates that the FDE algorithm can efficiently remove unnecessary DE computation time while maintaining nearly the same coding efficiency as the original 3D-HEVC encoder.

Combined results
In the following, we analyze the experimental results of the proposed overall algorithm, which incorporates ETMD, ASRME, and FDE. The comparison results of the overall algorithm are shown in Table 8. The proposed overall algorithm reduces the encoding time by 64.3% and 66.3% on average in the two-view and three-view cases, respectively, compared to 3D-HEVC. It achieves a consistent gain in coding speed for depth map compression across all test sequences, with a minimum of 47.3% in 'Kendo' (two-view case) and a maximum of 87.5% in 'GT_Fly' (two-view case). For sequences with a ground-truth depth map, such as 'Shark', 'Undo_Dancer', and 'GT_Fly', the proposed algorithm saves more than 80% of the coding time. The computation reduction is particularly high because the variable-size ME and DE decision processes of a significant number of depth map treeblocks are reasonably skipped. Meanwhile, the coding efficiency loss is negligible: the average PSNR drop over all the test sequences is 0.04 to 0.05 dB, and the average bitrate increase is 0.39% to 0.43%. Therefore, the proposed overall algorithm reduces the depth map coding time by more than 64% with almost the same RD performance as the original 3D-HEVC encoder.
In addition to the comparison with the 3D-HEVC encoder, we also compare the proposed overall algorithm with a state-of-the-art fast algorithm for 3D-HEVC, the content adaptive complexity reduction scheme (CACRS) [16], in Table 9. Compared with CACRS, the proposed overall algorithm performs better on all the sequences and achieves more than 24.5% additional coding time saving, with a minimum of 14.7% in 'Kendo' (two-view case) and a maximum of 38.6% in 'GT_Fly' (three-view case). Meanwhile, the proposed overall algorithm achieves better depth map coding performance, with a 0.05- to 0.06-dB PSNR increase or a 0.95% to 1.21% bitrate decrease over all the test sequences compared to CACRS.