Low-complexity depth map compression in HEVC-based 3D video coding
EURASIP Journal on Image and Video Processing volume 2015, Article number: 2 (2015)
Abstract
In this paper, a low-complexity algorithm is proposed to reduce the complexity of depth map compression in high-efficiency video coding (HEVC)-based 3D video coding (3D-HEVC). Since the depth map and the corresponding texture video represent the same scene in a 3D video, there is a high correlation between the coding information of the depth map and that of the texture video, such as motion vectors and prediction modes. An experimental analysis is performed to study this correlation. Based on it, we propose three efficient low-complexity approaches: early termination mode decision, adaptive search range motion estimation (ME), and fast disparity estimation (DE). Experimental results show that the proposed algorithm reduces computational complexity by about 66% with negligible rate-distortion (RD) performance loss in comparison with the original 3D-HEVC encoder.
Introduction
The three-dimensional video standard has recently been finalized by the Joint Collaborative Team on 3D Video Coding (JCT-3V), and high-efficiency video coding (HEVC)-based 3D video coding (3D-HEVC) has been developed as an extension of HEVC [1-3]. For the efficient compression of 3D video data comprising multiview texture video and depth maps, a number of coding tools have been investigated for 3D-HEVC, such as inter-view motion prediction and disparity-compensated prediction [4]. These techniques achieve the highest possible coding efficiency in multiview texture video compression, but they result in extremely large encoding time with only a small increase in depth coding efficiency, which obstructs the practical use of 3D-HEVC. Therefore, it is necessary to develop a fast algorithm that reduces the complexity of multiview depth map compression with minimal loss of coding efficiency in a 3D-HEVC encoder.
Recently, a number of attempts have been made to explore fast algorithms for depth map coding. A motion vector (MV) sharing algorithm is proposed in [5] to reduce the complexity of depth map coding. An early termination algorithm for depth coding is introduced in [6] based on detecting the differences between the current macroblock (MB) and the co-located MBs in the texture video. An intra prediction algorithm for depth coding is presented in [7] to reduce the number of candidate prediction directions for smooth regions. A low-complexity mode decision and motion estimation algorithm is proposed in [8] to take advantage of the texture motion information, which can be usefully exploited in encoding the corresponding depth map. A novel depth and depth-color codec is proposed in [9] based on a shape-adaptive wavelet transform and an explicit encoding of the locations of major depth edges. A depth map compression algorithm [10] uses the corresponding texture video as side information to improve coding performance. A fast motion search and mode decision algorithm is proposed in [11] to speed up the motion estimation (ME) stage of the depth coding process, and a fast depth map method is proposed in our previous work [12] based on sharing the motion vector and SKIP mode from the texture video to reduce the complexity of depth coding. All these algorithms are efficient in reducing computational complexity with acceptable quality degradation for previous video coding standards. However, they are not directly applicable to the new 3D-HEVC standard, whose high computational complexity is intrinsically related to its new prediction coding structures.
To this end, several fast algorithms [13-16] have been proposed for the 3D-HEVC encoder to reduce the complexity of depth map coding. A fast mode decision algorithm is proposed in [13] to terminate early the full rate-distortion (RD) cost calculation of unnecessary prediction modes in 3D-HEVC. A low-complexity depth map coding algorithm based on the associated texture video is introduced in [14] to reduce the number of wedgelet candidates. A fast wedgelet partitioning algorithm is proposed in [15] to simplify the intra mode decision in 3D-HEVC depth map coding. A content-adaptive complexity reduction algorithm is proposed in [16] to reduce 3D-HEVC coding complexity by utilizing the correlations between the base view and the dependent views. The aforementioned algorithms are well developed for depth map coding and achieve significant time savings in 3D-HEVC. However, the coding information correlations between the depth map and the texture video have not been fully studied, which limits the achievable time saving. There is still room for further reduction of the computational complexity of 3D-HEVC depth map compression.
The depth map represents 3D scene information and has the same content, with similar characteristics, as the texture video. Therefore, there is a high correlation between the motion information of the depth map and that of the texture video. In this paper, we propose a low-complexity depth compression algorithm that exploits this correlation. The proposed algorithm consists of three approaches: early termination mode decision, adaptive search range ME, and fast disparity estimation (DE) for depth map coding. Experimental results illustrate that the proposed algorithm significantly reduces the computational complexity of depth map compression while maintaining almost the same coding performance as the original 3D-HEVC encoder.
The rest of the paper is organized as follows. Section 2 analyzes the properties of the depth map and the correlation between the motion information of the depth map and the texture video. A low-complexity depth coding algorithm based on adaptive search range ME and fast DE is presented in Section 3. Experimental results and conclusions are given in Sections 4 and 5, respectively.
Observations and analysis
In the test model of 3D-HEVC, variable-size ME and DE are employed to exploit both temporal and inter-view correlation within temporally successive pictures and neighboring views. The coding unit (CU) is the basic unit of region splitting in 3D-HEVC, similar to the macroblock in H.264/AVC; it has a hierarchical quadtree structure with variable sizes from 64 × 64 to 8 × 8. The prediction unit (PU) is the basic unit used for the 3D-HEVC inter/intra prediction processes. For each treeblock, 3D-HEVC performs ME and DE with different PU sizes, including 2N × 2N, 2N × N, N × 2N, and N × N.
Similar to HEVC, for each treeblock, the mode decision process in 3D-HEVC is performed by testing all possible prediction modes and selecting the one with the least RD cost, using a Lagrange multiplier. The RD cost function (J) used in 3D-HEVC is defined as follows:

$$ J = \mathrm{SSE} + \lambda \cdot R $$ (1)

where R specifies the bit cost to be considered for the 3D-HEVC mode decision, SSE is the sum of squared differences between the current treeblock and the matching treeblock, and λ is the Lagrange multiplier. However, the calculation of the RD cost needs to execute both the ME and DE processes in 3D-HEVC, and this ‘try all and select the best’ method results in high computational complexity and limits the use of 3D-HEVC encoders in practical applications. Therefore, low-complexity algorithms, which can reduce the complexity of the ME and DE processes with negligible loss of coding efficiency, are extremely necessary for real-time implementation of 3D-HEVC encoders.
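As an illustration, the ‘try all and select the best’ loop described above can be sketched as follows; the function names and candidate tuples are hypothetical, not taken from the HTM reference software:

```python
def rd_cost(sse, rate_bits, lagrange_multiplier):
    """Lagrangian RD cost J = SSE + lambda * R for one candidate mode."""
    return sse + lagrange_multiplier * rate_bits

def best_mode(candidates, lagrange_multiplier):
    """Exhaustive mode decision: evaluate every candidate, keep the cheapest.

    candidates: iterable of (mode_name, sse, rate_bits) tuples.
    Returns the mode name with the least RD cost.
    """
    return min(candidates,
               key=lambda c: rd_cost(c[1], c[2], lagrange_multiplier))[0]
```

Because every candidate requires running ME and DE to obtain its SSE and rate, shrinking the candidate set or the search range directly reduces encoder runtime, which is what the following sections exploit.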
Since the depth map and its associated texture video are both projections of the same scene from the same viewpoint at the same time instant, the motion characteristics (i.e., block partitioning and the corresponding motion vectors) of the depth map and its associated texture video are typically similar. Therefore, a new coding mode, motion parameter inheritance (MPI) [4,17], in which the data already transmitted for the texture video picture can be reused for efficient encoding of the depth map, has been introduced in the 3D-HEVC encoder. This achieves the highest coding efficiency but requires a very high computational complexity. Since the motion vectors of the texture video have quarter-sample accuracy, whereas only full-sample accuracy is used for the depth map, the motion vectors are quantized to their nearest full-sample positions in the inheritance process. In addition, the inherited reference picture shall be the one with the same picture order count (POC) and viewpoint as the reference picture of the co-located block in the texture video picture. If there is no reference picture in the reference lists that satisfies this condition, such a candidate is treated as invalid and is not inserted into the merge candidate list. However, the coding information correlations between the depth map and the texture video, including the reference picture, prediction mode, and motion vector, have not been fully studied.
Therefore, the prediction mode of a depth map treeblock is similar to that of the corresponding texture video treeblock. Meanwhile, the homogeneous regions in the depth map have a strong spatial correlation, and thus spatially neighboring depth map treeblocks have similar coding information. The relationship among the current depth map treeblock, the co-located texture video treeblock, and the spatially neighboring treeblocks is shown in Figure 1. The reference picture in the co-located texture view has the same POC value as the reference picture of the current depth map view.
On the basis of these observations, we propose to analyze the depth prediction mode using the coding information from the spatially neighboring depth map treeblocks and the co-located texture video treeblock, as depicted in Figure 2. D_c denotes the current depth map treeblock; D_l, D_u, D_ul, and D_ur denote the neighboring treeblocks in the depth map. C_col denotes the co-located treeblock in the texture video, and C_l, C_u, C_ul, and C_ur denote its left, up, up-left, and up-right treeblocks, respectively.
According to the coding information correlation with the mode maps of encoded frames, we define a set of mode predictors (P) for a depth map treeblock as follows:

$$ P = \{ D_l, D_u, D_{ul}, D_{ur}, C_{col}, C_l, C_u, C_{ul}, C_{ur} \} $$ (2)
Based on this predictor set, a mode complexity parameter (C) is defined according to the mode context of the spatially neighboring depth map treeblocks and the co-located texture video treeblock, from which the mode characteristic of a depth map treeblock is estimated. The mode complexity of a depth map treeblock is described as follows:

$$ C = \sum_{i \in P} \beta_i \cdot \eta_i $$ (3)
where i indexes the related treeblocks in the predictor set P, β_i is the treeblock weight factor of each predictor in Equation 2, and η_i is the treeblock mode factor of each predictor. Only the prediction modes of the available neighboring treeblocks in P are used. In 3D-HEVC, various prediction mode sizes are used in the mode decision process. The mode factor η_i of each predictor is assigned based on the complexity of each mode as follows: when predictor i is the SKIP mode, merge mode, inter 2N × 2N, or intra 2N × 2N mode, η_i is assigned a small value ‘1’; when predictor i is the inter 2N × N or inter N × 2N mode, η_i is assigned a medium value ‘2’; when predictor i is a small-size inter mode, the intra N × N mode (including the depth modeling modes (DMM) and region boundary chain (RBC) mode in the neighboring depth map treeblocks), or a DE mode, η_i is assigned a large value ‘3’. The treeblock weight factors of these nine predictors satisfy $$ \sum_i \beta_i = 1 $$. β_i is defined according to the influence of the related treeblock on the current treeblock. Since treeblocks in the horizontal and vertical directions have a larger influence on the current treeblock than treeblocks in the diagonal directions, the weight factors β_i for the horizontal and vertical treeblocks (D_l, D_u, C_l, and C_u) are set to 0.1, and those of the diagonal treeblocks (D_ul, D_ur, C_ul, and C_ur) are set to 0.05. For the co-located texture video treeblock, the weight factor β_Ccol is set to 0.4.
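The weighted sum above can be sketched in Python as follows; the predictor and mode labels are illustrative stand-ins for the encoder's internal identifiers, while the β and η values follow the text:

```python
# Treeblock weight factors beta_i (they sum to 1, per the text).
BETA = {"D_l": 0.1, "D_u": 0.1, "C_l": 0.1, "C_u": 0.1,          # horizontal/vertical
        "D_ul": 0.05, "D_ur": 0.05, "C_ul": 0.05, "C_ur": 0.05,  # diagonal
        "C_col": 0.4}                                            # co-located texture

# Treeblock mode factors eta_i, grouped by mode complexity.
ETA = {"SKIP": 1, "MERGE": 1, "INTER_2Nx2N": 1, "INTRA_2Nx2N": 1,
       "INTER_2NxN": 2, "INTER_Nx2N": 2,
       "INTER_SMALL": 3, "INTRA_NxN": 3, "DMM": 3, "RBC": 3, "DE": 3}

def mode_complexity(predictor_modes):
    """C = sum of beta_i * eta_i over the available predictors.

    predictor_modes: dict mapping a predictor name (a key of BETA) to the
    prediction mode chosen for that treeblock; unavailable neighbours are
    simply omitted, since only available predictors are used.
    """
    return sum(BETA[p] * ETA[m] for p, m in predictor_modes.items())
```

For example, a treeblock whose co-located texture block is SKIP and whose left depth neighbour is inter 2N × N contributes 0.4 × 1 + 0.1 × 2 = 0.6 to C.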
Generally, the larger the mode factor, the more complex the treeblock. According to the value of C, each treeblock can be classified into one of three types. Thresholds T_1 and T_2 determine whether a treeblock belongs to a region with the simple mode, normal mode, or complex mode. The criterion is defined as follows:

$$ \text{region} = \begin{cases} \text{simple mode}, & C \le T_1 \\ \text{normal mode}, & T_1 < C \le T_2 \\ \text{complex mode}, & C > T_2 \end{cases} $$ (4)
where T_1 and T_2 are mode-complexity thresholds. These threshold settings are crucial for effective depth map compression, and there is always a trade-off between depth map coding quality and computational complexity reduction. Simulations on various test sequences show that the optimal threshold for each sequence depends on its content. In order to cope with the different texture characteristics of test sequences, extensive simulations have been conducted on eight video sequences to analyze the thresholds for the three types of treeblocks. Among these test sequences, Kendo, Balloons, and Newspaper have 1,024 × 768 resolution, while Undo_Dancer, GT_Fly, Poznan_Street, Poznan_Hall2, and Shark have 1,920 × 1,088 resolution. The ‘Shark’ and ‘Undo_Dancer’ sequences have large global motion or rich texture; the ‘Kendo’, ‘Balloons’, ‘Newspaper’, and ‘Poznan_Street’ sequences have medium local motion or smooth texture; and ‘Poznan_Hall2’ has small global motion and homogeneous texture. The test conditions are as follows: IBP view structure; full-length frames tested for each sequence; quantization parameters (QP) of 34, 39, 42, and 45; group of pictures (GOP) size of 8; treeblock size of 64; ME search range of 64; and context-adaptive binary arithmetic coding (CABAC) for entropy coding. We then calculated the average thresholds over these eight test sequences.
Table 1 shows the accuracies of the proposed algorithm using various thresholds. Accuracy here is defined as the ratio of treeblocks classified into the simple, normal, or complex mode region that select the same best mode with the proposed algorithm as with the original 3D-HEVC encoder. It can be seen from Table 1 that when the threshold values are T_1 = 0.8 and T_2 = 1.2, the average accuracy of the proposed algorithm exceeds 93%, with a maximum of 97% for the ‘Shark’ sequence. Based on these extensive experiments, T_1 and T_2 are set to 0.8 and 1.2, respectively, which achieve good and consistent performance on a variety of test sequences with different texture characteristics and motion activities, and they are kept fixed for all treeblock QP levels in the 3D-HEVC encoder.
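Given C and the fixed thresholds, the three-way classification can be expressed as a small helper; the region labels are illustrative, and the placement of C exactly on a threshold boundary is an assumption:

```python
def classify_treeblock(c, t1=0.8, t2=1.2):
    """Map a mode complexity value C onto the three treeblock types.

    Thresholds default to T1 = 0.8 and T2 = 1.2 as determined in the text.
    """
    if c <= t1:
        return "simple"
    elif c <= t2:
        return "normal"
    return "complex"
```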
Proposed low-complexity depth map compression algorithm
Early termination mode decision
The depth map is usually not the ground truth because existing depth map estimation methods still have difficulty generating accurate depth at object edges or in areas with little texture. Distortion may occur during depth map estimation, resulting in a noisy depth map (caused by occlusion and areas of low texture), so it would be inefficient to spend more bits on an accurate representation of the depth map in 3D-HEVC coding. To overcome this problem, this paper proposes an early termination mode decision for 3D-HEVC, which takes into account the correlations between the coding information of texture videos and depth maps to speed up the coding process.
The depth map content is similar to that of the texture video, and thus the coding modes of the texture video and depth map are similar. By utilizing the information of the corresponding treeblock in the texture video, the coding information of previously encoded texture images of the same view can be effectively shared and reused. We therefore propose a novel early termination mode decision that considers the co-located texture video. The merge/skip mode provides good coding performance and requires little complexity in the 3D-HEVC encoder, where the motion vector predictor (MVP) is adopted for the current treeblock to generate a compensated block. Meanwhile, the merge/skip mode is the dominant mode at low bitrates (high QPs) in the 3D-HEVC encoder, and its distribution is similar to that in the previous video coding standard, H.264/AVC. Once the merge/skip mode can be pre-decided, the variable-size ME and DE computation for a treeblock can be entirely saved. Usually, the decision to use the merge/skip mode is delayed until the RD costs of all other modes (inter, intra, and DE modes) have been calculated and the merge/skip mode is found to have the minimum RD cost. Thus, if we can exploit previously encoded texture coding information to determine that a depth map treeblock should be encoded in merge/skip mode (this mode, along with the CU partition, is then inherited to encode the depth treeblock directly without descending further in the quadtree), we can skip the time-consuming process of computing RD costs on smaller block sizes for a high percentage of treeblocks and thus significantly reduce the computational complexity of the 3D-HEVC mode decision process.
Based on this consideration, the proposed algorithm introduces an early termination mode decision that skips unnecessary ME and DE checks by utilizing the prediction mode information of the co-located texture video. In our approach, we first take advantage of previously encoded texture images of the same view for the early merge/skip mode decision. Since the depth map and texture video are generally captured at the same time, each treeblock is likely to have the same motion and block partition information. So when a depth map treeblock is encoded, we consider how the corresponding texture video treeblock (C_col in Figure 2) was encoded. When the merge/skip mode is selected as the best prediction mode for the texture treeblock in the 3D-HEVC mode decision, it indicates that the current texture treeblock is located in a low-motion or static region. The motion of the texture treeblock can be predicted well using the merge/skip mode, which results in lower residual energy after motion compensation compared with other prediction modes such as inter 2N × 2N, 2N × N, N × 2N, and N × N. Thus, no further variable-size ME and DE computation is necessary.
However, the proposed early termination mode decision rests on a few strong assumptions: the depth map content is not always similar to the color content, e.g., in a planar but highly textured area, there is high color variance while the depth is constant. Depth acquisition can be unreliable, but the assumption that information can be discarded for this reason is questionable. Finally, if the motion estimation on color data is wrong, errors can propagate to the depth data with the proposed approach even if the estimation from depth alone would have been correct. Based on these observations, we investigate the effectiveness of the proposed early termination mode decision. By exploiting the exhaustive mode decision in the 3D-HEVC encoder under the test conditions given in Section 2, extensive simulations have been conducted on the set of test sequences listed in Table 2. Table 2 shows the hit rate of the early termination mode decision algorithm, defined as the ratio of the number of depth map treeblocks that select the same best prediction mode with the proposed algorithm as with the original 3D-HEVC encoder to the total number of depth map treeblocks. The average hit rate of the proposed algorithm is larger than 93%, with a maximum of 95% at QP = 45 and a minimum of 91% at QP = 34. The simulation results in Table 2 indicate that the proposed early termination mode decision can accurately skip unnecessary depth map CU modes by utilizing the information of the corresponding treeblock in the texture video.
Based on this statistical tendency, the proposed depth map early termination algorithm checks the prediction mode of the co-located texture video treeblock: if the texture treeblock (C_col) has no motion while the corresponding depth map treeblock (D_c) appears to have motion, this motion is attributed to unreliable depth estimation and can therefore be ignored. When the texture video treeblock selects merge/skip as its best mode, it indicates that the motion of the current depth map treeblock can also be efficiently represented by merge/skip, and the variable-size ME and DE computation for the depth map treeblock can be skipped in the 3D-HEVC mode decision.
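The early termination test itself reduces to a single check on the co-located texture treeblock; this sketch uses hypothetical mode labels and stage names rather than HTM internals:

```python
def depth_decision_stages(texture_best_mode):
    """Return the encoder stages to run for the current depth treeblock.

    If the co-located texture treeblock chose merge/skip, the depth
    treeblock inherits that decision and all variable-size ME/DE
    stages are skipped.
    """
    if texture_best_mode in ("MERGE", "SKIP"):
        return ["merge_skip_only"]          # early termination path
    # Otherwise fall through to the normal full mode decision.
    return ["merge_skip", "variable_size_me", "de", "intra"]
```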
Adaptive search range motion estimation
ME is the most computationally expensive task in the 3D-HEVC encoder; it searches for the best-matched treeblock within a predefined region of the reference frame. A larger ME search range produces a higher computational load, while a very small ME search range may reduce coding performance due to poor matching results. A suitable ME search range can reduce the computational complexity of 3D-HEVC while maintaining good RD performance.
In 3D video coding, since the depth map and texture video both represent the same scene, the depth map and texture video treeblocks are likely to have similar motion information. Based on this observation, we propose to use the mode complexity parameter C defined in Equation 3 to reduce the computational complexity of the depth map ME search. According to the mode complexity of a depth map treeblock (based on Equation 3), we first classify the treeblock into different categories and assign the ME search range as follows:

$$ \mathrm{SearchRange}_{\text{depth}} = \begin{cases} \mathrm{SR}/8, & \text{simple mode region} \\ \mathrm{SR}/4, & \text{normal mode region} \\ \mathrm{SR}, & \text{complex mode region} \end{cases} $$

where SR represents the search range defined in the configuration file of 3D-HEVC, and SearchRange_depth is the adjusted search range of the corresponding treeblock in the depth map.
To verify the validity of the proposed adaptive search range motion estimation algorithm, extensive simulations have been conducted on the eight video sequences to analyze the motion vector distribution for the three types of treeblocks, again exploiting the exhaustive mode decision in 3D-HEVC under the aforementioned test conditions.
Table 3 shows the motion vector distribution for each type of treeblock. It can be seen from Table 3 that for treeblocks in the simple mode region, more than 97% of all motion vectors lie within the [SR/8 × SR/8] window; in other words, if the maximum search range is set to SR/8, it will cover about 97% of all motion vectors. For treeblocks in the normal mode region, about 97% of all motion vectors lie within the [SR/4 × SR/4] window, so a maximum search range of SR/4 will likewise cover about 97% of motion vectors. For treeblocks in the complex mode region, the percentages of motion vectors that lie within the [SR/16 × SR/16], [SR/8 × SR/8], and [SR/4 × SR/4] windows are relatively low, only about 51%, 72%, and 84%, respectively, and thus the 3D-HEVC motion vector search range cannot be reduced. The results in Table 3 demonstrate that the proposed adaptive search range motion estimation algorithm can accurately remove unnecessary ME search range in 3D-HEVC. A flowchart of the proposed adaptive search range motion estimation algorithm is given in Figure 3.
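The resulting search range rule can be sketched as follows, with SR = 64 as under the test conditions; the region labels and function name are illustrative:

```python
def depth_me_search_range(region, sr=64):
    """Adjust the depth-map ME search range by mode region.

    Per the motion vector statistics in Table 3, SR/8 and SR/4 cover
    about 97% of motion vectors in the simple and normal regions.
    """
    if region == "simple":
        return sr // 8     # [SR/8 x SR/8] window
    if region == "normal":
        return sr // 4     # [SR/4 x SR/4] window
    return sr              # complex region keeps the full range
```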
Fast disparity estimation for depth map coding
In the test model of 3D-HEVC, when coding the dependent views, the HEVC codec is modified by including some high-level syntax changes and the disparity-compensated prediction (DCP) technique, similar to the inter-view prediction in the MVC extension of H.264/AVC [4]. In addition, unlike a dependent texture view, the depth map is characterized by sharp edges and large regions with nearly constant values. The eight-tap interpolation filters used for ME interpolation in HEVC can produce ringing artifacts at sharp edges in the depth map, which are visible as disturbing components in synthesized intermediate views. To avoid this issue and to decrease the encoder and decoder complexity, the ME as well as the DE has been modified so that no interpolation is used; that is, for the depth map, inter-picture prediction is always performed with full-sample accuracy. For the actual DE, a block of samples in the reference picture is directly used as the prediction signal without interpolating any intermediate samples. In order to avoid transmitting motion and disparity vectors with unnecessary accuracy, full-sample accurate motion and disparity vectors are used for coding the depth map, and the transmitted motion vector differences are coded with full-sample instead of quarter-sample precision. This modified technique achieves the highest possible depth map coding efficiency, but it results in extremely large encoding time, which obstructs the practical application of 3D-HEVC. In this paper, a fast DE algorithm for depth map coding is proposed to reduce 3D-HEVC computational complexity.
As mentioned above, disparity prediction searches for the best-matched block in frames from neighboring views. Although temporal prediction is generally the most efficient prediction mode in 3D-HEVC, it is sometimes necessary to use both DE and ME rather than ME alone to achieve better predictions. In general, temporal motion cannot be characterized adequately, especially for regions with non-rigid motion and regions with motion boundaries. For the former, ME based on simple translational movement usually fails and thus produces a poor prediction. For the latter, regions with motion boundaries are usually predicted using small mode sizes with larger-magnitude motion vectors and higher residual energy [18]. Thus, treeblocks in the simple mode region are more likely to choose temporal prediction (ME), and treeblocks in the complex mode region are more likely to choose inter-view prediction (DE).
By exploiting the exhaustive mode decision in the 3D-HEVC encoder under the test conditions described in Section 3.2, we investigate the probabilities of choosing inter-view prediction and temporal prediction for each type of treeblock in Table 4. For treeblocks in the simple mode region, the average probabilities of choosing temporal prediction and inter-view prediction are 97.7% and 2.2%, respectively. For treeblocks in the normal mode region, they are 89.1% and 11.0%, respectively. For treeblocks in the complex mode region, the probabilities are 63.7% and 36.4%, respectively. We can see from Table 4 that treeblocks in the simple mode region are much more likely to choose temporal prediction. Thus, for the simple mode region, the inter-view prediction procedure can be skipped with only a very low miss detection ratio relative to the optimal prediction mode chosen by the full inter-view and temporal mode decision. For treeblocks in the complex and normal mode regions, however, the average probabilities of choosing inter-view prediction are 36.4% and 11.0%, respectively. Although test sequences such as ‘Poznan_Hall2’ and ‘Newspaper’ contain large areas of homogeneous texture and low-activity motion, which are more likely to be encoded with temporal prediction, the probability of inter-view prediction for treeblocks in the normal and complex mode regions is still significant. Thus, if we disabled inter-view prediction in the normal and complex mode regions, the coding efficiency loss would not be negligible.
Based on the aforementioned analysis, we propose a fast disparity estimation algorithm in which the disparity search is selectively enabled. For treeblocks in the simple mode region, the disparity search is skipped (only the RD cost of the MVP is used). For treeblocks in the normal mode region, the RD cost of the MVP is compared with that of the disparity vector predictor (DVP): if the RD cost of the MVP is larger than that of the DVP, the disparity search is enabled; otherwise, it is disabled. For treeblocks in the complex mode region, the disparity search is enabled (the RD costs of both the MVP and the DVP are used). A flowchart of the scheme is given in Figure 4.
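The selective disparity search can be sketched as below; the RD costs are assumed to be precomputed, and the names are illustrative:

```python
def disparity_search_enabled(region, rd_cost_mvp=None, rd_cost_dvp=None):
    """Decide whether the inter-view disparity search runs.

    simple region : always skipped (only the MVP cost is used)
    normal region : enabled only when the MVP costs more than the DVP
    complex region: always enabled (both MVP and DVP costs are used)
    """
    if region == "simple":
        return False
    if region == "complex":
        return True
    # Normal region: run DE only when the MVP is the worse predictor.
    return rd_cost_mvp > rd_cost_dvp
```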
Overall algorithm
Based on the aforementioned analysis, combining the approaches of early termination mode decision, adaptive search range ME, and fast DE for depth map coding, we propose a low-complexity depth map compression algorithm for 3D-HEVC as follows.

Step 1: start mode decision for a depth map treeblock.

Step 2: locate the spatially neighboring depth map treeblocks and the co-located texture video treeblock (shown in Figure 2) in the previously coded data. Derive the coding information from the predictors in the depth map and texture video.

Step 3: derive the prediction mode of the co-located texture video treeblock; if the texture treeblock is coded in merge/skip mode (no motion), perform the early merge/skip mode decision and go to Step 7; otherwise, go to Step 4.

Step 4: compute C based on Equation 3 and compare it with T_1 and T_2 based on Equation 4; classify the current depth map treeblock into the simple, normal, or complex mode region.

Step 5: perform the adaptive search range ME determination: for treeblocks in the simple mode region, the search range window is reconfigured to [SR/8 × SR/8]; for treeblocks in the normal mode region, it is reconfigured to [SR/4 × SR/4]; otherwise, the search range window is unchanged.

Step 6: perform variable-size DE: for treeblocks in the simple mode region, the disparity search is skipped, while for treeblocks in the complex mode region, the disparity search is enabled. For treeblocks in the normal mode region, the RD cost of the MVP is compared with that of the DVP to decide whether the disparity search runs.

Step 7: determine the best prediction mode. Go to Step 1 and proceed with the next depth map treeblock.
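Steps 1 to 7 can be summarized in one hedged sketch for a single depth map treeblock; the thresholds and search range divisors follow the text, while the mode labels, return values, and the treatment of the normal-region DE test (reported here simply as enabled) are illustrative simplifications:

```python
def encode_depth_treeblock(texture_best_mode, c, sr=64, t1=0.8, t2=1.2):
    """Return (decision_path, me_search_range, de_enabled) for one treeblock.

    texture_best_mode: best mode of the co-located texture treeblock.
    c: mode complexity from Equation 3.
    """
    # Step 3: early merge/skip termination from the co-located texture block.
    if texture_best_mode in ("MERGE", "SKIP"):
        return ("inherited_merge_skip", 0, False)
    # Step 4: classify the treeblock by its mode complexity C (Equation 4).
    region = "simple" if c <= t1 else ("normal" if c <= t2 else "complex")
    # Step 5: adaptive ME search range.
    me_range = {"simple": sr // 8, "normal": sr // 4, "complex": sr}[region]
    # Step 6: fast DE (the normal region actually defers to an MVP/DVP
    # RD cost comparison; here it is conservatively reported as enabled).
    de_enabled = region != "simple"
    return (region, me_range, de_enabled)
```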
Experimental results
In order to confirm the performance of the proposed low-complexity depth map compression algorithm, which is implemented in the recent 3D-HEVC test model (HTM ver. 5.1), we show the results obtained on eight test sequences released by the JCT-3V group. Detailed information on the test sequences is provided in Table 5. All experiments follow the common test conditions (CTC) [19] required by JCT-3V. The encoder configuration is as follows: two-view case (coding order: left-right) and three-view case (coding order: center-left-right); GOP length of 8 with an intra period of 24; HEVC codecs configured with 8-bit internal processing; a coding treeblock of fixed size 64 × 64 pixels with a maximum CU depth level of 4, resulting in a minimum CU size of 8 × 8 pixels; ME search range of 64; inter-view motion prediction enabled with the P-I-P inter-view prediction structure; and CABAC as the entropy coder. The proposed algorithm is evaluated with QP combinations for texture video and depth map of (25, 34), (30, 39), (35, 42), and (40, 45) (the first number is the texture video QP; the second is the depth map QP). The experiments test full-length frames for each sequence. After encoding, intermediate views were rendered between each pair of views at the receiver using the view synthesis reference software (VSRS) provided by MPEG [20]. Since the depth map sequences are used for rendering instead of being viewed directly, we compute the peak signal-to-noise ratio (PSNR) between the views synthesized using the compressed depth map sequences and those synthesized using the uncompressed depth map sequences. The experimental results are presented in Tables 6, 7, 8, and 9, in which coding efficiency is measured by rendered PSNR and total bitrate (depth map), and computational complexity is measured by the consumed coding time.
Since the proposed approaches affect only depth map coding, the results for texture video coding are identical; thus, the texture video results are not included in the tables. The Bjontegaard delta PSNR (BD-PSNR) [21] represents the average PSNR gain of the synthesized views, the Bjontegaard delta bitrate (BD-BR) represents the change in total bitrate for depth map coding, and ‘ΔTime (%)’ represents the change in total depth map coding time in percentage.
Individual performance results of the proposed algorithms
Tables 6 and 7 give the individual evaluation results of the proposed algorithms, i.e., early termination mode decision (ETMD), adaptive search range motion estimation (ASRME), and fast disparity estimation (FDE), when each is applied alone, compared with the original 3D-HEVC encoder (Table 6 for the two-view case, Table 7 for the three-view case). All three algorithms greatly reduce the encoding time with similar coding efficiency for all sequences. The ETMD algorithm reduces coding time by about 18.9% and 19.9% in the two-view and three-view conditions, respectively, with the highest gain of 32.7% in ‘Poznan_Hall2’ (two-view case) and the lowest gain of 7.6% in ‘GT_Fly’ (three-view case); a consistent gain is observed over all sequences under both conditions. The average PSNR drop over all test sequences is a negligible 0.01 dB, and the bitrate is reduced by 0.02% to 0.03% on average, which indicates that ETMD can even improve the bitrate performance of depth map compression in 3D-HEVC. The ASRME algorithm reduces encoding time by 32.4% on average across the two-view and three-view conditions, with a maximum of 64.5%. The coding efficiency loss is negligible: a 0.02-dB PSNR drop or a 0.11% to 0.13% bitrate increase. This indicates that ASRME efficiently skips unnecessary ME search range computation in 3D-HEVC depth map coding. As for the FDE algorithm, coding time is reduced by 25.4% and 26.4% in the two-view and three-view conditions, respectively; the average PSNR drop over all test sequences is 0.02 to 0.03 dB, and the average bitrate increase is a negligible 0.27% to 0.3%. These results indicate that FDE efficiently removes unnecessary DE computation while maintaining nearly the same coding efficiency as the original 3D-HEVC encoder.
Combined results
We now analyze the experimental results of the proposed overall algorithm, which incorporates ETMD, ASRME, and FDE; the comparison is shown in Table 8. The overall algorithm reduces encoding time by 64.3% and 66.3% on average in the two-view and three-view cases, respectively, achieving a larger speed gain than any of the three approaches applied alone. It also shows a consistent speed gain for depth map compression across sequences, with a minimum of 47.3% in ‘Kendo’ (two-view case) and a maximum of 87.5% in ‘GT_Fly’ (two-view case). For sequences with ground-truth depth maps such as ‘Shark’, ‘Undo_Dancer’, and ‘GT_Fly’, the proposed algorithm saves more than 80% of the coding time. The reduction is particularly high because the variable-size ME and DE decision processes of a significant number of depth map tree blocks are reasonably skipped. Meanwhile, the coding efficiency loss is negligible: the average PSNR drop over all test sequences is 0.04 to 0.05 dB, and the average bitrate increase is 0.39% to 0.43%. Therefore, the proposed overall algorithm reduces depth map coding time by more than 64% with nearly the same RD performance as the original 3D-HEVC encoder.
In addition to the comparison with the original 3D-HEVC encoder, Table 9 compares the proposed overall algorithm with a state-of-the-art fast algorithm for 3D-HEVC, the content adaptive complexity reduction scheme (CACRS) [16]. Compared with CACRS, the proposed algorithm performs better on all sequences, achieving more than 24.5% additional coding time saving, with a minimum of 14.7% in ‘Kendo’ (two-view case) and a maximum of 38.6% in ‘GT_Fly’ (three-view case). Meanwhile, it achieves better depth map coding performance, with a 0.05- to 0.06-dB PSNR increase or a 0.95% to 1.21% bitrate decrease over all test sequences compared to CACRS.
Figures 5 and 6 give more detailed experimental results (RD and time-saving curves) of the proposed overall algorithm compared to 3D-HEVC in the three-view case. As shown in the figures, the proposed low-complexity depth map compression algorithm achieves a consistent time saving over a large bitrate range with almost negligible PSNR loss and bitrate increase.
Conclusions
This paper presented a low-complexity depth map compression algorithm that reduces the computational complexity of the 3D-HEVC encoder through three fast approaches, i.e., early termination mode decision, adaptive search range motion estimation, and fast disparity estimation for depth map coding. The recent 3D-HEVC test model was used to evaluate the proposed algorithm. The experimental results show that the proposed algorithm significantly reduces the computational complexity of depth map compression while maintaining almost the same RD performance as the original 3D-HEVC encoder.
References
 1.
GJ Sullivan, JM Boyce, Y Chen, JR Ohm, CA Segall, A Vetro, Standardized extensions of high efficiency video coding (HEVC). IEEE J. Sel. Top. Sign. Proces. 7(6), 1001–1016 (2013)
 2.
GJ Sullivan, JR Ohm, WJ Han, T Wiegand, Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1649–1668 (2012)
 3.
K Müller, H Schwarz, D Marpe, C Bartnik, S Bosse, H Brust, T Hinz, H Lakshman, P Merkle, H Rhee, G Tech, M Winken, T Wiegand, 3D high efficiency video coding for multiview video and depth data. IEEE Trans. Image Process. 22(9), 3366–3378 (2013)
 4.
L Zhang, G Tech, K Wegner, S Yea, 3D-HEVC test model 5 (Joint Collaborative Team on 3D Video Coding Extensions (JCT-3V) document JCT3V-E1005, 5th Meeting, Vienna, Austria, 2013)
 5.
H Oh, YS Ho, in Proc. Pacific-Rim Symposium on Image and Video Technology, H.264-based depth map sequence coding using motion information of corresponding texture video. LNCS vol. 4319 (2006), pp. 898–907
 6.
M Wang, X Jin, S Goto, in Proc. 28th Picture Coding Symp., Difference detection based early mode termination for depth map coding in MVC (2010), pp. 502–505
 7.
S Tsang, Y Chan, W Siu, Efficient intra prediction algorithm for smooth regions in depth coding. Electron. Lett. 48(18), 1117–1119 (2012)
 8.
G Cernigliaro, F Jaureguizar, J Cabrera, N García, Low complexity mode decision and motion estimation for H.264/AVC based depth maps encoding in free viewpoint video. IEEE Trans. Circuits Syst. Video Technol. 23(5), 769–783 (2013)
 9.
M Maitre, MN Do, Depth and depth-color coding using shape-adaptive wavelets. J. Vis. Commun. Image Represent. 21(5–6), 513–522 (2010)
 10.
S Milani, P Zanuttigh, M Zamarin, S Forchhammer, in Proc. IEEE Int. Conf. Multimedia and Expo (ICME), Efficient depth map compression exploiting segmented color data (Barcelona, 2011), pp. 1–6
 11.
L Shen, P An, Z Liu, Z Zhang, Low complexity depth coding assisted by coding information from color video. IEEE Trans. Broadcasting 60(1), 128–133 (2014)
 12.
Q Zhang, P An, Y Zhang, L Shen, Z Zhang, Low complexity multiview video plus depth coding. IEEE Trans. Consumer Electron. 57(4), 1857–1865 (2011)
 13.
Z Gu, J Zheng, N Ling, P Zhang, in Proc. 2013 IEEE International Conference on Multimedia and Expo Workshops, Fast depth modeling mode selection for 3D-HEVC depth intra coding (2013), pp. 1–4
 14.
Y Song, Y Ho, in Proc. IEEE 11th IVMSP Workshop, Simplified inter-component depth modeling in 3D-HEVC (2013), pp. 1–4
 15.
M Zhang, C Zhao, J Xu, H Bai, in Proc. IEEE International Symposium on Circuits and Systems (ISCAS), A fast depth-map wedgelet partitioning scheme for intra prediction in 3D video coding (2013), pp. 2852–2855
 16.
HR Tohidypour, MT Pourazad, P Nasiopoulos, V Leung, in Proc. 18th International Conference on Digital Signal Processing (DSP 2013), A content adaptive complexity reduction scheme for HEVC-based 3D video coding (2013), pp. 1–5
 17.
M Winken, H Schwarz, T Wiegand, in Proc. Picture Coding Symp., Motion vector inheritance for high efficiency 3D video plus depth coding (Krakow, Poland, 2012), pp. 53–56
 18.
L Shen, Z Liu, T Yan, Z Zhang, P An, Viewadaptive motion estimation and disparity estimation for low complexity multiview video coding. IEEE Trans. Circuits Syst. Video Technol. 20(6), 925–930 (2010)
 19.
D Rusanovskyy, K Mueller, A Vetro, Common test conditions of 3DV core experiments (Joint Collaborative Team on 3D Video Coding Extensions (JCT-3V) document JCT3V-E1100, 5th Meeting, Vienna, Austria, 2013)
 20.
M Tanimoto, T Fujii, K Suzuki, View synthesis algorithm in view synthesis reference software 2.0 (VSRS 2.0) (ISO/IEC JTC1/SC29/WG11 document M16090, Lausanne, Switzerland, Feb. 2008)
 21.
G Bjontegaard, Calculation of average PSNR differences between RD-curves (ITU-T VCEG document VCEG-M33, 13th Meeting, Austin, TX, 2001)
Acknowledgements
The authors would like to thank the editors and anonymous reviewers for their valuable comments. This work was supported in part by the National Natural Science Foundation of China under grant Nos. 61302118, 61401404, 61340059, and 61272038; the Scientific and Technological Project of Zhengzhou under grant No. 141PPTGG360; and in part by the Doctorate Research Funding of Zhengzhou University of Light Industry under grant No. 2013BSJJ047.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords
 3D video
 3D-HEVC
 Depth coding
 Low complexity