Shape-reserved stereo matching with segment-based cost aggregation and dual-path refinement
EURASIP Journal on Image and Video Processing volume 2020, Article number: 38 (2020)
Abstract
Stereo matching is one of the most important topics in computer vision and aims at generating precise depth maps for various applications. The major challenge of stereo matching is to suppress the inevitable errors occurring in smooth, occluded, and discontinuous regions. In this paper, the proposed stereo matching system uses segment-based superpixels and matching costs. After determining the edge and smooth regions and selecting the matching cost, we apply segment-based adaptive support weights in cost aggregation instead of relying on color similarity and spatial proximity only. The proposed dual-path depth refinement uses the cross-based support region and texture features to correct inaccurate disparities iteratively, which improves the depth maps while preserving object shapes. Especially for the leftmost and rightmost regions, the segment-based refinement can greatly reduce the mismatched disparity holes. The experimental results show that the proposed system achieves more accurate depth maps than the conventional methods.
1 Introduction
With the rapid evolution of natural three-dimensional (3D) technologies, applications such as mixed reality [1], visual entertainment [2,3,4], environment reconstruction [5], autonomous driving [6], object detection [7, 8], and recognition [9] that rely on additional depth information have become increasingly important. In all of these applications, the key step is to retrieve highly accurate depth maps from multiple camera images. Instead of transmitting complex multiple views, a color texture image with its corresponding gray-level depth map can effectively represent the 3D information. The traditional way to deliver 3D vision is to provide multi-view or stereo-view videos directly, but the 2D image plus depth map is now the preferred way to convey 3D sensation. The depth map provides the pixel-wise distance and exhibits stereoscopic vision. On the user side, the depth image-based rendering (DIBR) system [10] can create multi-view videos from the depth information and the texture image.
The concept of stereo vision comes from observing a scene from distinct positions, which leads to a limited displacement between a pair of corresponding pixels, the so-called “disparity”. The disparity becomes larger as the object moves toward the observer [11]. By parsing the disparities of the left and right views, we can extend the geometrical principle to estimate the distance between a viewed object and the observer. To obtain the depth map efficiently, we propose a local stereo matching method that reduces the computation. Since the depth values depend mostly on the objects in the scene, the proposed stereo matching system uses segmentation information not only to enhance the aggregation efficiency but also to refine missing objects. The basic idea of stereo matching is briefly reviewed in Section 2. In Section 3, the design of the proposed stereo matching system is presented. Section 4 shows the experimental results achieved by the proposed and other methods. Conclusions are finally given in Section 5.
2 Local stereo matching methods
Generally, stereo matching algorithms can be classified into three major categories: global, local, and semi-global approaches. The global approach uses a data term and a smoothness term to construct an energy function from which the global depth map is computed. Graph cut [12], belief propagation [13, 14], and dynamic programming [15] are typical global stereo matching algorithms. Recently, deep learning approaches have been proposed to estimate the depth maps [16]; however, they are data dependent. In this paper, we focus on the design of local stereo matching approaches to limit the computation and to avoid the problem of data dependency.
2.1 Local stereo matching process
The typical local stereo matching process shown in Fig. 1 mainly contains matching cost computation, cost aggregation, disparity decision, and disparity refinement stages.
2.1.1 Matching cost computation
To evaluate the pixel-based matching status, several well-known costs are used for disparity estimation. The sum of absolute differences (SAD) [17] over the color components is the most common cost in stereo matching. The SAD cost for finding the left disparity map can be formulated as:
where p = (x, y) is the pixel position in the left image and p’ = (x − d, y) denotes its corresponding pixel position in the right image with disparity d. In this paper, \( {I}_c^l \) and \( {I}_c^r \) represent the color intensities of the left and right images in the RGB domain, respectively.
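As a minimal sketch of the SAD cost, assuming it sums the absolute RGB differences between p = (x, y) in the left image and p’ = (x − d, y) in the right image, the following Python function may be used. The function name, the nested-list image layout, and the infinite cost for positions outside the right image are illustrative assumptions and are not taken from the authors' MATLAB implementation.

```python
def sad_cost(left, right, x, y, d):
    """SAD over the RGB channels between left[y][x] and right[y][x - d]."""
    xr = x - d
    if xr < 0:
        # Assumed convention: no valid match outside the right image.
        return float("inf")
    return sum(abs(a - b) for a, b in zip(left[y][x], right[y][xr]))


# One-row toy images; each pixel is an (R, G, B) tuple.
left = [[(10, 10, 10), (50, 60, 70), (200, 200, 200)]]
right = [[(50, 60, 70), (200, 200, 200), (0, 0, 0)]]

# The left pixel at x = 1 reappears one column to the left in the right image.
print(sad_cost(left, right, 1, 0, 1))  # 0
print(sad_cost(left, right, 1, 0, 0))  # 420
```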
Besides the SAD cost, the gradient similarity can also measure the variations of the texture images. The gradient cost for searching the left disparity map can be expressed by
where the gradient operator, ∇, is the combination of horizontal and vertical differences between the central pixel and its neighboring pixels in the cross relation.
In addition, the census transform, which detects slight variations in a small block, is robust to minor intensity changes. Figure 2 shows the traditional census and modified mini-census transforms [18]. The modified mini-census transform selects only a few representative pixels in the block to discard redundant information.
To describe the above census transforms precisely, the binary result is expressed by comparing the neighboring pixel to the central pixel at p as
where q denotes the position of the neighboring pixel in the block. A bitwise concatenation is applied to obtain the census transform as
where ⊗ expresses the bitwise concatenation operator and W is the block containing the selected neighboring pixels. Thus, the census cost, expressed as the Hamming distance between the two census transforms obtained from the left and right views, is given by:
where ⊕ is the bitwise XOR operation. The modified mini-census transform [18] needs fewer computations and is more robust against noise than the traditional census transform.
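The census steps described above (binary comparison, bitwise concatenation, and Hamming distance via XOR) can be sketched in Python as follows. The bit convention (1 when the neighbor is darker than the central pixel), the full 3 × 3 neighborhood, and all names are assumptions made for illustration; a mini-census variant in the spirit of [18] would simply pass a shorter list of offsets.

```python
def census(img, x, y, offsets):
    """Concatenate one comparison bit per neighbour into an integer code."""
    centre = img[y][x]
    bits = 0
    for dx, dy in offsets:
        # Assumed convention: bit is 1 when the neighbour is darker.
        bit = 1 if img[y + dy][x + dx] < centre else 0
        bits = (bits << 1) | bit
    return bits


def hamming(code_a, code_b):
    """Census cost: number of differing bits (XOR followed by a bit count)."""
    return bin(code_a ^ code_b).count("1")


# All eight neighbours of a 3x3 block, scanned row by row.
OFFSETS_3X3 = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dx, dy) != (0, 0)]

left = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
right = [[9, 2, 3], [4, 5, 6], [7, 8, 1]]

code_l = census(left, 1, 1, OFFSETS_3X3)
code_r = census(right, 1, 1, OFFSETS_3X3)
print(hamming(code_l, code_r))  # 2 (the two corner pixels changed order)
```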
2.1.2 Cost aggregation
Once the costs of the paired pixels in the stereo images are calculated, cost aggregation is applied to achieve more robust results by including more pixels that show the same tendency. For local stereo matching, window-based aggregation considers the similarities of the surrounding pixels in a designated window [19,20,21,22,23,24,25]. The ideal window includes, as far as possible, only the nearby pixels that belong to the same object. For example, the adaptive support weights [21] based on color similarity and spatial proximity are given as
where Δc_{pq} and Δg_{pq} denote the color similarity weight and the geometric distance weight, respectively. γ_{c} and γ_{g} are control factors that map Δc_{pq} and Δg_{pq} to become weights. The color similarity weight is controlled by Δc_{pq}, which can be represented as
while the geometric distance weight is controlled by Δg_{pq}, which is given as
where (x_{p}, y_{p}) and (x_{q}, y_{q}) are the x and y coordinates of pixels p and q, respectively.
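A minimal Python sketch of such a support weight is given below. It assumes the exponential form w(p, q) = exp(−(Δc_{pq}/γ_{c} + Δg_{pq}/γ_{g})) commonly associated with the adaptive support-weight approach [21], with Euclidean distances for both terms; the function names and the γ values used in the example are hypothetical.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equally long tuples."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def asw_weight(color_p, color_q, pos_p, pos_q, gamma_c, gamma_g):
    """Support weight from colour similarity and spatial proximity."""
    delta_c = euclid(color_p, color_q)  # colour difference
    delta_g = euclid(pos_p, pos_q)      # geometric distance in pixels
    return math.exp(-(delta_c / gamma_c + delta_g / gamma_g))


# Same colour, same position: the weight reaches its maximum of 1.
w_same = asw_weight((10, 20, 30), (10, 20, 30), (5, 5), (5, 5), 7.0, 36.0)
# Same colour, 5 pixels away: the weight decays with distance.
w_far = asw_weight((10, 20, 30), (10, 20, 30), (5, 5), (8, 9), 7.0, 36.0)
print(w_same, w_far)
```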
Besides the pixel-wise adaptive support weights, the segmentation concept is also used to modify the weights and thereby increase the matching reliability. The segment-based adaptive support weight [22, 23] can be expressed by
where S_{p} is the segment in which p lies. In (9), the weight is set to its maximum, 1.0, if the neighboring pixel lies in the same segment as the target pixel, while the weights of the remaining pixels are determined by color similarity.
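The rule described for (9) can be illustrated with the short Python sketch below. The exponential color term exp(−Δc_{pq}/γ_{c}) for pixels outside the segment is an assumption consistent with the color-similarity weight above, and the function and parameter names are hypothetical.

```python
import math


def segment_weight(label_p, label_q, color_p, color_q, gamma_c):
    """Weight 1.0 inside the segment of p, colour-based weight elsewhere."""
    if label_p == label_q:
        return 1.0
    delta_c = math.sqrt(sum((u - v) ** 2 for u, v in zip(color_p, color_q)))
    return math.exp(-delta_c / gamma_c)  # assumed form of the colour term


# Same segment: the colour difference is ignored.
print(segment_weight(3, 3, (0, 0, 0), (255, 255, 255), 10.0))  # 1.0
# Different segments: the weight depends on the colour difference only.
print(segment_weight(3, 4, (0, 0, 0), (3, 4, 0), 5.0))
```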
After the weight of each pixel in the window has been calculated, the pixel costs are aggregated as
where C(p, p’) is the initial cost, which could be the SAD cost, the gradient cost, or the census cost stated in (1), (2), or (5), respectively. A combined cost with different weighting ratios is also possible. In (10), q and q’ are the neighboring pixels of p and p’ in the target and reference windows of the target and reference views, respectively.
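As a hedged illustration of the aggregation step, the sketch below assumes the normalized, symmetric form used in the adaptive support-weight literature [21], in which each initial cost C(q, q’) is weighted by the product of the target-window weight and the reference-window weight. The function name and the flat-list window layout are assumptions for illustration.

```python
def aggregate_cost(costs, w_target, w_reference):
    """Weighted average of the initial costs over one support window.

    costs[i]       : initial cost C(q, q') of the i-th window position
    w_target[i]    : weight of q in the target window
    w_reference[i] : weight of q' in the reference window
    """
    num = 0.0
    den = 0.0
    for c, wt, wr in zip(costs, w_target, w_reference):
        num += wt * wr * c
        den += wt * wr
    # Assumed convention: an empty or zero-weight window has no valid cost.
    return num / den if den > 0 else float("inf")


# Equal weights give the plain mean; a zero weight removes that pixel.
print(aggregate_cost([1.0, 3.0], [1.0, 1.0], [1.0, 1.0]))  # 2.0
print(aggregate_cost([1.0, 3.0], [1.0, 0.0], [1.0, 1.0]))  # 1.0
```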
2.1.3 Raw disparity estimation
To obtain the raw disparity map, disparity estimation is executed after cost aggregation. The winner-take-all (WTA) strategy is commonly used as the criterion for disparity estimation. The WTA selection can be formulated as
where R_{d} is the disparity search range. In the WTA process, the raw disparity is estimated by choosing the disparity with the smallest aggregated cost. The raw disparity map d_{p} is then refined in the final disparity refinement process.
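The WTA selection amounts to an arg-min over the disparity search range. A minimal Python sketch, assuming the aggregated costs of one pixel are stored in a non-empty list indexed by disparity (the name and layout are illustrative), is:

```python
def winner_take_all(aggregated_costs):
    """Return the disparity with the smallest aggregated cost.

    aggregated_costs[d] is the cost of disparity d; ties keep the
    smallest disparity (an assumed tie-breaking rule).
    """
    best_d = 0
    for d, cost in enumerate(aggregated_costs):
        if cost < aggregated_costs[best_d]:
            best_d = d
    return best_d


print(winner_take_all([5.0, 2.0, 7.0, 2.0]))  # 1
```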
2.1.4 Disparity refinement
Usually, the raw disparity map contains mismatched disparities near object boundaries, which are caused by occlusion, and in smooth-texture regions, where exact matches are hard to find. Thus, a suitable disparity refinement technique is required to remove the mismatched disparities. First, we identify the mismatched pixels with the left-right consistency (LRC) check, which tests whether the disparities of the left and right views are consistent.
The LRC detection rule is normally stated as
where d_{l} and d_{r} are the disparities of the left and right views, respectively, and σ_{0} is the tolerance for detecting a wrong disparity. To correct the mismatched pixels with L(x, y) = 0, several disparity refinement methods have been proposed [26,27,28,29,30,31]. Usually, the mismatched pixels are classified into big-hole and small-hole regions. For small-hole regions, the background filling algorithm is used to improve the rough disparity map. For big-hole regions, the four-step hole filling method searches for the nearest reliable pixel in the neighboring regions [31].
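A minimal Python sketch of the LRC labeling for the left view is shown below. It assumes the common form in which a left pixel (x, y) with disparity d is consistent when the right disparity at (x − d, y) differs from d by at most the tolerance; pixels that map outside the right image are labeled 0. The names are illustrative.

```python
def lrc_mask(disp_left, disp_right, tol):
    """Label map L(x, y): 1 for consistent pixels, 0 for mismatches."""
    h, w = len(disp_left), len(disp_left[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = disp_left[y][x]
            xr = x - d  # matching column in the right view
            if 0 <= xr < w and abs(d - disp_right[y][xr]) <= tol:
                mask[y][x] = 1
    return mask


disp_left = [[0, 1, 1, 3]]
disp_right = [[1, 1, 0, 0]]
print(lrc_mask(disp_left, disp_right, 0))  # [[0, 1, 1, 0]]
print(lrc_mask(disp_left, disp_right, 1))  # [[1, 1, 1, 0]]
```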
2.2 Simple linear iterative clustering
Pixels within one object tend to share the same disparity values. Precise segmentation of the objects therefore helps to estimate the disparity correctly, since precise object boundaries can guide the estimation of the disparity map. However, precise object segmentation of the left and right images is computationally expensive. For stereo matching, we only need to perform a localized segmentation in small regions. In other words, we only need to identify the superpixels, which are collections of adjacent and homogeneous pixels of the images. The superpixel, as a segment, provides more structural information than a single pixel.
In this paper, we adopt simple linear iterative clustering (SLIC) [32], which adapts the k-means clustering method to group the superpixels efficiently. The SLIC method works in the five-dimensional space {l_{i}, a_{i}, b_{i}, x_{i}, y_{i}} and restricts the search range of the ith pixel to an area associated with the cluster center to reduce the computation, where (l, a, b) is the pixel color vector defined in the CIELAB color space and (x, y) is the pixel position. The SLIC algorithm, which measures the distance between the ith pixel and the cluster center, considers both color similarity and spatial proximity, which are respectively denoted as
and
where {l_{k}, a_{k}, b_{k}, x_{k}, y_{k}} is the cluster center. The k-means clustering is then applied to achieve superpixel segmentation. The segmentation results of the SLIC method provide additional matching information for local stereo matching algorithms.
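The two SLIC distances can be sketched in Python as Euclidean distances in the CIELAB and image-plane subspaces. The combination into a single distance follows the form given in the SLIC paper [32], sqrt(d_lab² + (d_xy/S)² m²), where the grid step S and the compactness m are parameters of that method; their values below, like the function names, are chosen only for illustration.

```python
import math


def slic_distances(pixel, center):
    """Colour and spatial distances between a pixel and a cluster centre.

    Both arguments are (l, a, b, x, y) tuples.
    """
    l_i, a_i, b_i, x_i, y_i = pixel
    l_k, a_k, b_k, x_k, y_k = center
    d_lab = math.sqrt((l_k - l_i) ** 2 + (a_k - a_i) ** 2 + (b_k - b_i) ** 2)
    d_xy = math.sqrt((x_k - x_i) ** 2 + (y_k - y_i) ** 2)
    return d_lab, d_xy


def combined_distance(d_lab, d_xy, grid_step, compactness):
    """Single clustering distance as defined in the SLIC paper."""
    return math.sqrt(d_lab ** 2 + (d_xy / grid_step) ** 2 * compactness ** 2)


d_lab, d_xy = slic_distances((50, 0, 0, 0, 0), (50, 3, 4, 6, 8))
print(d_lab, d_xy)  # 5.0 10.0
print(combined_distance(d_lab, d_xy, grid_step=10, compactness=10))
```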
3 The proposed stereo matching system
Compared with the traditional method depicted in Fig. 1, the functional diagram of the proposed stereo matching system is shown in Fig. 3. The system uses SLIC-based cost aggregation to estimate accurate left and right depth maps.
To show how the SLIC segmentation information is used, Fig. 4 presents two novel kernels: the adaptive stereo matching computation unit first estimates the left and right raw disparity maps, and the dual-path refinement unit then enhances them into accurate disparity maps. The kernels are described in the following subsections.
3.1 Adaptive stereo matching computation
Figure 5 shows the diagram of the proposed adaptive stereo matching computation unit, which includes the adaptive cost selection among the gradient, census, and SAD costs, the SLIC-based cost aggregation with left and right SLIC segmentation information, and the two-level winner-take-all strategy to estimate the left and right raw disparity maps.
3.1.1 Adaptive cost selection
To estimate the similarity between the pixels in the left image and the corresponding pixels in the right image, the initial cost computation is necessary. First, we detect the edge regions in the color image with the Sobel operator so that each pixel can be classified into the edge region or the non-edge region. For edge regions, we use the gradient cost as the initial cost. Each non-edge region is further classified as a smooth or non-smooth region. Here, we utilize the cross-based window [22] to identify the smooth region. The criterion for the adaptive cost selection of the SAD, gradient, or census cost is shown in Fig. 6.
If the pixel is classified into the edge region, the gradient similarity stated in (2) is used since the variation in the color image is large. If the pixel is classified into the non-edge region, we use the cross-based window to verify whether the pixel lies in a smooth region. To find a smooth plane, we calculate the cross-based plane as
where r^{*} denotes the largest left span in one direction and the indicator function is defined by
to evaluate the color similarity of pixels. In (15) and (16), p_{i} is the ith pixel extended from p in the given direction. Once the largest span r^{*} is derived, we define the left arm length h_{p}^{−} = max (r^{*}, 1). Similarly, we process the other three directions to obtain the arm lengths {h_{p}^{−}, h_{p}^{+}, v_{p}^{−}, v_{p}^{+}} for the pixel p. The two orthogonal cross segments are given as
After computing the pixel-wise cross decision results, we obtain a shape-adaptive full support region U(p) for the pixel at p. The support region is the area covered by multiple horizontal segments H(p_{v}) and is defined as
where p_{v} is a support pixel located on the vertical segment V(p). If the area of the cross-based plane is more than 80% of the intact window, we classify the pixel as lying in a smooth region. Once the pixel is classified into the smooth region, we use the census cost defined in (5) for stereo matching. Otherwise, if the pixel is classified into the non-smooth region, the SAD cost stated in (1) is used. Figure 7 shows the raw disparity maps achieved by using the directly combined initial cost and the adaptively selected initial cost. By taking the different texture features into account, the proposed adaptive cost selection achieves better raw disparity maps in both complex-texture regions and smooth regions.
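The cross-based smoothness test and the resulting cost selection can be sketched in Python as follows. For brevity, the sketch works on a grayscale image with an absolute-difference similarity test, whereas the paper evaluates color similarity; the maximum arm length, the threshold tau, and all function names are hypothetical, and the edge flag is assumed to come from a separate Sobel step.

```python
def arm_length(img, x, y, dx, dy, max_arm, tau):
    """Largest span r* of similar pixels along (dx, dy), then max(r*, 1)."""
    h, w = len(img), len(img[0])
    r_star = 0
    for i in range(1, max_arm + 1):
        xi, yi = x + i * dx, y + i * dy
        if not (0 <= xi < w and 0 <= yi < h):
            break  # stop at the image border
        if abs(img[yi][xi] - img[y][x]) > tau:
            break  # the first dissimilar pixel ends the arm
        r_star = i
    return max(r_star, 1)


def support_area(img, x, y, max_arm, tau):
    """Pixel count of U(p): horizontal segments of all pixels on V(p)."""
    h, w = len(img), len(img[0])
    up = arm_length(img, x, y, 0, -1, max_arm, tau)
    down = arm_length(img, x, y, 0, 1, max_arm, tau)
    area = 0
    for yv in range(max(0, y - up), min(h - 1, y + down) + 1):
        left = arm_length(img, x, yv, -1, 0, max_arm, tau)
        right = arm_length(img, x, yv, 1, 0, max_arm, tau)
        area += min(w - 1, x + right) - max(0, x - left) + 1
    return area


def select_cost(is_edge, img, x, y, max_arm=3, tau=10, ratio=0.8):
    """Edge -> gradient; smooth non-edge -> census; otherwise SAD."""
    if is_edge:
        return "gradient"
    window = (2 * max_arm + 1) ** 2  # area of the intact window
    smooth = support_area(img, x, y, max_arm, tau) > ratio * window
    return "census" if smooth else "SAD"


flat = [[100] * 7 for _ in range(7)]
step = [[0 if x < 4 else 255 for x in range(7)] for _ in range(7)]
print(select_cost(False, flat, 3, 3))  # census
print(select_cost(False, step, 3, 3))  # SAD
```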
3.2 Cost aggregation with SLIC-based ASW
For cost aggregation, we use adaptive support weights (ASWs) that are determined by the SLIC segmentation information [32]. Pixels that lie in the same superpixel as the center pixel should receive higher weights in the aggregated cost. Aggregation weights based on superpixels are expected to be more reliable than weights based on pixel-by-pixel geometry and color similarities. First, we segment the color image at K levels with the SLIC segmentation algorithm. A higher level yields a more complex segmentation map. From the low to the high levels, if a neighboring pixel and the center pixel lie in the same superpixel, the two pixels are likely to be similar, and the neighboring pixel should be given a higher weight. Figure 8 shows the segmentation images at different levels.
For a K-level system, the proposed SLIC-based adaptive weight is given as
where N_{s}(p, q) denotes the segmentation dissimilarity defined as
where T [·] is an indicator function whose value equals 1 when p and q are not in the same segment at the kth level, and 0 otherwise. In (21), S^{k}_{p} and S^{k}_{q} are the segmentation labels of pixels p and q at the kth level, respectively. To avoid ambiguity among dissimilar pixels, the adaptive weight additionally depends on the color difference when the dissimilarity exceeds half of the total number of levels, which increases the accuracy. The SLIC-based adaptive weights yield a more reasonable aggregation cost and thus improve the disparity estimation compared with the cost aggregation weights stated in (10).
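The segmentation dissimilarity N_{s}(p, q) simply counts the levels at which p and q carry different labels, which the Python sketch below implements directly. The weight function in the sketch is only a hypothetical stand-in: it applies an exponential decay in N_{s} and adds an exponential color term once N_{s} exceeds K/2, which follows the verbal description above but is not claimed to reproduce the paper's weight equation. The parameters gamma_s and gamma_c are likewise assumed.

```python
import math


def seg_dissimilarity(labels_p, labels_q):
    """N_s(p, q): number of levels where p and q are in different segments.

    labels_p[k] and labels_q[k] are the SLIC labels at level k.
    """
    return sum(1 for sp, sq in zip(labels_p, labels_q) if sp != sq)


def slic_weight(labels_p, labels_q, delta_c, gamma_s, gamma_c):
    """Hypothetical multi-level weight (illustrative form only)."""
    k_levels = len(labels_p)
    n_s = seg_dissimilarity(labels_p, labels_q)
    if n_s > k_levels / 2:
        # Dissimilar at most levels: let the colour difference decide too.
        return math.exp(-(n_s / gamma_s + delta_c / gamma_c))
    return math.exp(-n_s / gamma_s)


print(seg_dissimilarity([1, 4, 9], [1, 4, 7]))  # 1
# Same superpixel at every level: full weight, colour is ignored.
print(slic_weight([1, 4, 9], [1, 4, 9], 50.0, 3.0, 10.0))  # 1.0
```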
The proposed adaptive weights reduce the distortion among similar pixels and remain sensitive in complex-texture regions. If we only use the segmentation similarity part, called SLIC-only, without the adaptive weights controlled by color changes, the variations of the weights cannot capture the detailed differences. Figure 9 shows two distributions of the adaptive weights along the x-axis obtained by the proposed method (blue) and by SLIC-only (red). Where the weights are the same, they are shown in a mixed (purple) color. The weights obtained by SLIC-only can hardly separate the differences in complex regions since they are nearly equal and of low values. As a result, the proposed adaptive weights obtained in (22) avoid the ambiguous conditions by providing weights with large variations.
3.3 Two-level WTA strategy
Normally, the WTA strategy selects the disparity value with the minimum cost. However, more than one disparity may share the same minimum cost, or several disparities may have similar minimum costs. To avoid an inaccurate disparity decision, we modify WTA into a two-level procedure. First, we check every pixel as
where N(·) represents the number of disparities that have the same minimum cost. If more than 3 candidates share the same minimum cost, we replace d(p) by 256 to label the pixel p as an unstable point. To deal with the unstable points, we use window-based histogram voting to select the correct disparity. For each unstable pixel p, a histogram H_{p}(d) of the stable points surrounding p in the window is created. The disparity d_{v}(p) of the histogram bin with the highest value is selected to replace the unstable point as
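The two levels can be sketched in Python as follows. The label 256 and the threshold of more than 3 tied candidates come from the description above; taking the smallest disparity when there are at most 3 ties, the square voting window, and the function names are assumptions made for illustration.

```python
from collections import Counter

UNSTABLE = 256  # label for pixels with too many tied minimum costs


def first_level_wta(costs, max_ties=3):
    """Level 1: arg-min, or UNSTABLE if more than max_ties minima tie."""
    min_cost = min(costs)
    candidates = [d for d, c in enumerate(costs) if c == min_cost]
    if len(candidates) > max_ties:
        return UNSTABLE
    return candidates[0]  # assumed rule: keep the smallest disparity


def histogram_vote(disp, x, y, radius):
    """Level 2: most frequent stable disparity in the window around (x, y)."""
    h, w = len(disp), len(disp[0])
    votes = Counter()
    for j in range(max(0, y - radius), min(h, y + radius + 1)):
        for i in range(max(0, x - radius), min(w, x + radius + 1)):
            if disp[j][i] != UNSTABLE:
                votes[disp[j][i]] += 1
    return votes.most_common(1)[0][0] if votes else UNSTABLE


print(first_level_wta([1, 1, 1, 1, 5]))  # 256 (four tied minima)
print(first_level_wta([1, 1, 1, 5]))     # 0

disp = [[3, 3, 4],
        [3, UNSTABLE, 4],
        [3, 5, 5]]
print(histogram_vote(disp, 1, 1, 1))     # 3
```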
After the disparity of each pixel is found, we can adjust the scale of the disparities to generate the raw depth maps. Generally, the left and right images have slight intensity differences unless the whole object is flat or perpendicular to the paired cameras, so the minimum matching cost alone might not find the correct matching point. With the proposed method, the true disparity can be obtained more precisely.
3.4 Iterative dual-path depth refinement
Since the estimated raw disparity map usually contains mismatches near object boundaries and in smooth regions, it is hard to preserve the shapes in these regions. Thus, we propose an iterative dual-path refinement algorithm that refines the raw disparity maps into high-precision, shape-reserved depth maps.
To find the mismatched disparities, we first label the disparity map by the modified LRC as
In (24), in addition to the disparity similarity, we include a color tolerance to label the pixels. For L(x, y) = 0, the mismatched pixels are further categorized into two types: small holes and big holes. If the mismatching region containing the pixel in the target view is smaller than 2 pixels, we classify it as a small hole. Otherwise, the mismatched pixels are classified as big holes. Figure 10 shows the flow chart of the proposed iterative dual-path refinement.
3.4.1 Small hole filling
Since the mismatching region contains only small holes, the texture information of the color image helps to find the accurate disparity. Here, we use the maximum-weighted candidate to find the correct disparity. With the color similarity and spatial proximity in a correction window, we calculate the weight of each pixel as
where Δc_{pq} and Δs_{pq} are the color difference in the RGB domain and the spatial difference, respectively. We analyze the disparity distribution with the calculated weights. In the weighted histogram, whose bins are the disparities in ascending order, the disparity with the maximum accumulated weight is taken as the final disparity, which is written as
where Ω is the correction window region and d_{out} is the final refined disparity map.
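A minimal Python sketch of the maximum-weighted candidate is given below. It assumes an exponential weight exp(−(Δc_{pq}/γ_{c} + Δs_{pq}/γ_{s})) with Euclidean distances, accumulates the weights of the valid (non-hole) pixels of the window into a histogram over disparities, and returns the disparity with the largest accumulated weight. The window radius, the γ values, and the names are hypothetical.

```python
import math


def fill_small_hole(color, disp, valid, x, y,
                    radius=2, gamma_c=10.0, gamma_s=10.0):
    """Disparity with the largest accumulated weight around the hole (x, y)."""
    h, w = len(disp), len(disp[0])
    hist = {}  # disparity -> accumulated weight
    for j in range(max(0, y - radius), min(h, y + radius + 1)):
        for i in range(max(0, x - radius), min(w, x + radius + 1)):
            if not valid[j][i]:
                continue  # holes do not vote
            delta_c = math.sqrt(sum((a - b) ** 2
                                    for a, b in zip(color[y][x], color[j][i])))
            delta_s = math.hypot(i - x, j - y)
            weight = math.exp(-(delta_c / gamma_c + delta_s / gamma_s))
            hist[disp[j][i]] = hist.get(disp[j][i], 0.0) + weight
    return max(hist, key=hist.get) if hist else None


# The hole at x = 2 is dark, like its left neighbours with disparity 10.
color = [[(0, 0, 0), (0, 0, 0), (0, 0, 0), (200, 200, 200), (200, 200, 200)]]
disp = [[10, 10, 0, 30, 30]]
valid = [[1, 1, 0, 1, 1]]
print(fill_small_hole(color, disp, valid, 2, 0))  # 10
```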
3.4.2 Dual-path big hole filling
For big holes, searching for the correct disparity among the surrounding pixels is not suitable. We first classify the occlusion region into non-border occlusion and leftmost/rightmost border occlusion, and then apply a two-path hole filling designed for the two cases. For non-border regions, the holes, which are induced by the occlusion of the foreground objects, should be filled with the background disparity. For border regions, on the other hand, the holes should be inferred from the target color image only. The flow chart of the occlusion region refinement in two paths is shown in Fig. 11, and the processing details are described as follows.
For non-border hole filling, the usual approach directly uses the background information to fill the pixels with mismatched disparities. To obtain a more accurate disparity map, we make use of the similar pixels in the background of the color image. First, we calculate the color similarity to find the most similar pixel among the Q pixels on the same horizontal line. After finding the similar pixel in the background (extended to the left side), we assign its corresponding disparity to the hole as
where ΔC(x, x − i) denotes the color similarity between the target hole at x and the horizontal-left background pixel at x − i. Although some wrong disparities remain after the proposed non-border hole filling, they can be corrected by the iterative steps. The regular background hole filling and the matched-color background hole filling are illustrated in Fig. 12 a and b, respectively. Instead of filling the hole with the disparity of the nearby background (light blue) pixel, we fill it with the disparity of the matched-color background (yellow) pixel.
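The matched-color background filling for one image row can be sketched in Python as follows. The sketch assumes that ΔC is the sum of absolute RGB differences, that only valid (non-hole) pixels among the Q pixels to the left are candidates, and that ties keep the nearest candidate; these choices and the names are illustrative.

```python
def fill_nonborder_hole(color_row, disp_row, valid_row, x, q):
    """Copy the disparity of the most colour-similar pixel among x-1 .. x-q."""
    best_i, best_diff = None, None
    for i in range(1, q + 1):
        xi = x - i
        if xi < 0:
            break  # reached the left image border
        if not valid_row[xi]:
            continue  # skip other holes
        diff = sum(abs(a - b) for a, b in zip(color_row[x], color_row[xi]))
        if best_diff is None or diff < best_diff:
            best_i, best_diff = i, diff
    return disp_row[x - best_i] if best_i is not None else None


# The nearest left neighbour (x = 2) is bright foreground; the hole at
# x = 3 matches the darker pixels further left, so it receives disparity 4.
color_row = [(20, 20, 20), (22, 22, 22), (200, 200, 200), (21, 21, 21)]
disp_row = [4, 4, 9, 0]
valid_row = [1, 1, 1, 0]
print(fill_nonborder_hole(color_row, disp_row, valid_row, 3, 3))  # 4
```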
For the leftmost border regions in the target (left) disparity map, as shown in Fig. 13 a and b, we can only refer to the target (left) color image to fill the holes since no matching information is available from the reference (right) view. The object in the leftmost region totally disappears from the right image, so there is no clue for finding the corresponding disparity in these unknown regions. Thus, we can only use the leftmost part of the color image to infer the holes as far as possible. Fortunately, the SLIC segments have already been computed on the color image for the determination of the ASWs, as shown in Fig. 13c, which shows the localized superpixels. We use the concept that the pixels in the same superpixel should have the same depth value. For better inference, we associate the localized SLIC superpixels for border big-hole filling with the following procedure. First, we merge the localized superpixels that have similar texture color information, as shown in Fig. 13d, into larger megapixels, which are treated as object-like segments. Second, we horizontally extend the known and reliable disparity leftward to all the hole pixels that share the same megapixel, as far as possible. This step yields some filled megapixels.
Third, we perform disparity histogram voting for the isolated megapixels, which do not contain any known disparity. Starting from the lowest pixel of the isolated megapixel, we choose the disparity with the largest count in the disparity histogram of the adjacent filled megapixel. Finally, the hole regions in the border are reproduced with clear objects and edges. The refined left disparity map is shown in Fig. 13f.
4 Results and discussion
The proposed stereo matching system was implemented in MATLAB R2016a and tested on a PC with an Intel Core i5-8400 CPU and 16 GB RAM. The experimental evaluation uses the 2003 [33], 2005 [34], and 2014 [35] Middlebury datasets. The test images, which include Cones, Teddy, Tsukuba, and Venus, are listed in Table 1, while the test images with higher resolutions and higher disparity levels are listed in Table 2.
4.1 Results achieved by the proposed system
As shown in Fig. 14, the raw and refined disparity maps achieved by the proposed adaptive stereo matching and dual-path refinement methods for the Cones, Teddy, Tsukuba, and Venus test images are exhibited in the first and second rows, respectively.
4.2 Comparisons with other approaches
For performance evaluation, we compare the proposed method with other stereo matching algorithms. The compared methods include adaptive support weight (ASW) [21], segmentation-based adaptive support weight (SASW) [23], plant leaf stereo matching (LPSM) [36], the edge-based stereo matching method (ESM) [37], stereo matching implemented on a GPU platform [31], AdaStereo [38], a comparative evaluation of SGM variants for dense stereo matching (tMGM) [39], learning-based disparity estimation (iResNet) [40], and DeepPruner [41]. Tables 3 and 4 show that the performance of the proposed multi-scale ASW is superior to that of the traditional ASW and the other methods. Table 4 shows that the proposed method performs better than some deep learning-based methods on the training set, even though the training set favors deep learning. Though the average error rate is slightly lower than that of SASW, our method utilizes more information from the segmentation instead of only assigning a weight of 1 within the same segment. Owing to the refinement steps, the edge areas of the depth maps can be reasonably reconstructed. With the proposed algorithm, the disparity maps are accurate, which helps to improve the performance of the DIBR system for multi-view synthesis [42]. Figures 15, 16, 17 and 18 show the results achieved by the referenced methods for the Cones, Teddy, Tsukuba, and Venus images, respectively.
5 Conclusions
In this paper, we proposed a segment-based adaptive stereo matching algorithm and a dual-path segment-based disparity refinement method. The former provides a reasonably good raw disparity map, and the latter effectively enhances the raw disparity map into a high-quality one. The contributions of the proposed method include the adaptive cost selection, the segment-based adaptive weights for cost aggregation, the two-level WTA strategy, and the dual-path depth refinement. For small holes, the depth refinement uses the maximum-weighted candidate for the filling process. For non-border big holes, the background filling strategy considers color and proximity information. For border holes, the megapixel-based filling process is suggested to achieve better results. Tested on the Middlebury stereo datasets, the proposed stereo matching system shows the best performance among all compared methods. Especially in the edge areas of the depth maps, it can reasonably reconstruct the depth values of the objects. The experimental results show that the proposed system can produce high-quality depth maps for 3D video broadcasting with the 3D-HEVC [43, 44] and CTDP-HEVC [45, 46] formats. Compared with deep learning methods, the proposed system can be applied to various databases, whereas learning-based approaches with convolutional neural networks suffer from data dependency and tend to blur depth edges because of the design of their loss functions.
Availability of data and materials
All the data and material are from Middlebury datasets, which have been mentioned in the references.
Abbreviations
3D: Three-dimension
DIBR: Depth image-based rendering
SAD: Summation of absolute differences
WTA: Winner-take-all
LRC: Left-right consistency check
SLIC: Simple linear iterative clustering
ASWs: Adaptive support weights
SASW: Segmentation-based adaptive support weight
LPSM: An improved stereo matching algorithm applied to 3D visualization of plant leaf
ESM: Variable window size for stereo image matching based on edge information
tMGM: SGM variants for dense stereo matching
iResNet: Learning for disparity estimation
CTDP: Centralized texture depth packing
References
R. Kaiser, D. Schatsky, For more companies, new ways of seeing – Momentum is building for augmented and virtual reality in the enterprise. Deloitte, Insights 5 (2017)
L. Zhang, Fast stereo matching algorithm for intermediate view reconstruction of stereoscopic television images, IEEE Trans Circuits Syst Video Technol, 16(10), 1259 – 1270, Oct. (2006).
S. Carmichael, Using 3D immersive technologies for organizational development and collaboration. Organizational Dynamics Working Papers, University of Pennsylvania, May 1 (2011)
KPMG – FOCCI, The future: now streaming, Indian Media and Entertainment Industrial Report, (2016).
J. H. Joung, K. H. An, J. W. Kang, M. J. Chung and W. Yu, 3D environment reconstruction using modified color ICP algorithm by fusion of a camera and a 3D laser range finder, Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems, October 1115, 2009, St. Louis, USA (2009)
S. Kriegel, C. Rink, T. Bodenmüller and M. Suppa, Efficient nextbestscan planning for autonomous 3D surface reconstruction of unknown objects, J. RealTime Image Proc., 10(4), 611631, Dec. (2015).
X. Chen, K. Kundu, Y. Zhu, A. G. Berneshawi, H. Ma, S. Fidler, R. Urtasun, 3D object proposals for accurate object class detection, Proc. of Advances in Neural Information Processing Systems 28 (NIPS), (2015)
S. Song and J. Xiao, Sliding shapes for 3D object detection in depth images, Proc. of European Conference on Computer Vision. Pp.634651, (2014).
E. Zappa, P. Mazzoleni, Y. Hai, Stereoscopy based 3D face recognition system. Proc Comput Sci. 1(1), 2529–2538 (2010)
S.C. Chan, H. Shum, K. Ng, Imagebased rendering and synthesis. IEEE Signal Process. Mag. 24(6), 22–33 (2007)
I.P. Howard, B.J. Rogers, Binocular Vision and Stereopsis (Oxford University Press, USA, 1995)
Y. Boykov, O. Veksler, R. Zabih, Fast approximate energy minimization via graph cuts. IEEE Trans Pattern Anal Machine Intell 23(11), 1222–1239 (2001)
X. Sun, X. Mei, S. Jiao, M. Zhou and H. Wang, Stereo matching with reliable disparity propagation, Proc. of International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission, Hangzhou, 2011, pp. 132139. (2011)
J. Sun, N.N. Zheng and H.Y. Shum, Stereo matching using belief propagation, IEEE Trans Pattern Anal Machine Intell, 25(7), 787800, July (2003).
O. Veksler, Stereo correspondence by dynamic programming on a tree, Proc. of IEEE Conference on Computer Vision and Pattern Recognition, (2005).
W. Luo, A. G. Schwing and R. Urtasun, Efficient deep learning for stereo matching, Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 56955703. (2016)
M. Humenberger, T. Engelke, and W. Kubinger, A censusbased stereo vison algorithm using modified semiglobal matching and plane fitting to improve matching quality, Proc. of IEEE Computer Vision Patter Recognition Conf., pp. 7784, (2010).
N. Y.C. Chang, T.H. Tsai, B.H. Hsu, Y.C. Chen, T.S. Chang, Algorithm and architecture of disparity estimation with minicensus adaptive support weight, IEEE Trans Circuits Syst Video Technol, 20(6), 792 – 805, June (2010).
T. Chen and W. Li, Stereo matching algorithm based on adaptive weight and local entropy, Proc. of the 9th International Conference on Modelling, Identification and Control (ICMIC), Kunming, 2017, pp. 630635. (2017)
O. Veksler, Fast variable window for stereo correspondence using integral images, Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Madison, WI, USA, 2003, pp. II. (2003)
K. J. Yoon and I. S. Kweon, Adaptive support-weight approach for correspondence search, IEEE Trans Pattern Anal Machine Intell, 28(4), 650–656, April (2006).
K. Zhang, J. Lu and G. Lafruit, Cross-based local stereo matching using orthogonal integral images, IEEE Trans Circuits Syst Video Technol, 19(7), 1073–1079, July (2009).
F. Tombari, S. Mattoccia, L. Di Stefano, Segmentation-Based Adaptive Support for Accurate Stereo Correspondence, in Lecture Notes in Computer Science, Berlin, Germany: Springer, 4872, pp. 427–438, Dec. (2007).
D. Chang, S. Wu, H. Hou, L. Chen, Accurate and fast segment-based cost aggregation algorithm for stereo matching. Proc. of IEEE 19th International Workshop on Multimedia Signal Processing, 1–6 (2017)
H. Zhu, J. Yin, D. Yuan, SVCV: Segmentation volume combined with cost volume for stereo matching. IET Comput. Vis. 11(8), 733–743 (2017)
S. B. Kang, R. Szeliski and J. Chai, Handling occlusions in dense multi-view stereo, Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, HI, USA, 2001, pp. II, (2001)
P.C. Kuo, K.L. Lo, H.K. Tseng, K.T. Lee, B.D. Liu, J.F. Yang, Stereo-view to multi-view conversion architecture for auto-stereoscopic 3D displays. IEEE Trans Circuits Syst Video Technol 28(11), 3274–3287 (2017)
H. T. Kuo, VLSI Implementation of real-time stereo matching and centralized texture depth packing for 3D video broadcasting, M.S. Thesis, National Cheng Kung University, Tainan, Taiwan, July (2017).
C. L. Hsieh, A two-view to multi-view conversion system and its VLSI implementation for 3D displays, M. S. Thesis, National Cheng Kung University, Tainan, Taiwan, July (2017).
A. Emlek, M. Peker and K. F. Dilaver, Variable window size for stereo image matching based on edge information, Proc. of International Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, 2017, pp. 1–4. (2017)
T. Y. Sun, Stereo matching and depth refinement on GPU platform, M. S. Thesis, National Cheng Kung University, Tainan, Taiwan, July (2018).
R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, S. Süsstrunk, SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans Pattern Anal Machine Intell 34(11), 2274–2282 (2012)
D. Scharstein and R. Szeliski, High-accuracy stereo depth maps using structured light, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), volume 1, Madison, WI, pp. 195–202, June 2003.
D. Scharstein and C. Pal, Learning conditional random fields for stereo, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), Minneapolis, MN, Jun, (2007).
D. Scharstein, H. Hirschmüller, Y. Kitajima, G. Krathwohl, N. Nesic, X. Wang, P. Westling, High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth, German Conference on Pattern Recognition (GCPR 2014) (Münster, Germany, Sep, 2014)
Liu, Zhichao, Lihong Xu, and Chaofeng Lin. An improved stereo matching algorithm applied to 3D visualization of plant leaf. 2015 8th International Symposium on Computational Intelligence and Design (ISCID). 2. IEEE, (2015).
Emlek, A., Peker, M., & Dilaver, K. F., Variable window size for stereo image matching based on edge information, 2017 International Artificial Intelligence and Data Processing Symposium (IDAP), IEEE, pp. 1–4, September (2017).
Song, X., Yang, G., Zhu, X., Zhou, H., Wang, Z., & Shi, J, AdaStereo: a simple and efficient approach for adaptive stereo matching, arXiv preprint arXiv:2004.04627. (2020)
Patil, Sonali, Tanmay Prakash, Bharath Comandur, and Avinash Kak, A comparative evaluation of SGM variants (including a new variant, tMGM) for dense stereo matching, arXiv preprint arXiv:1911.09800, (2019).
Liang, Z., Feng, Y., Guo, Y., Liu, H., Chen, W., Qiao, L., ... & Zhang, J, Learning for disparity estimation through feature constancy. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2811–2820, (2018).
Duggal, Shivam, Shenlong Wang, Wei-Chiu Ma, Rui Hu, and Raquel Urtasun, DeepPruner: learning efficient stereo matching via differentiable PatchMatch, In Proceedings of the IEEE International Conference on Computer Vision, pp. 4384–4393. (2019).
K. J. Hsu, GPU implementation for centralized texture depth depacking and depth image-based rendering, M. S. Thesis, National Cheng Kung University, Tainan, Taiwan, July (2017).
G. Tech, K. Wegner, Y. Chen, and S. Yea, 3D-HEVC test model 3. Document: JCT3V-C1005. Draft 3 of 3D-HEVC Test Model Description. Geneva, (2013).
D. Rusanovskyy, K. Müller, and A. Vetro, Common test conditions of 3DV core experiments, Joint Collaborative Team on 3D Video Coding Extensions of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Document no. JCT3V-E1100, Vienna, Aug. (2013).
J.F. Yang, K.T. Lee, G.C. Chen, W.J. Yang and Lu Yu, A YCbCr color depth packing method and its extension for 3D video broadcasting services, IEEE Trans. on Circuits and Systems for Video Technology, ISSN: 1051-8215, Online ISSN: 1558-2205, Digital Object Identifier: https://doi.org/10.1109/TCSVT.2019.29342541, pp. 1–11. (2019)
W.J. Yang, J.F. Yang, G.C. Chen, P.C. Chung, M.F. Chung, An assigned color depth packing method with centralized texture depth packing formats for 3D VR broadcasting services. IEEE J Emerg Selected Topics Circuits Systems 9(1), 122–132 (2019)
Acknowledgements
The authors deeply thank the Editor and anonymous reviewers who have spent their valuable time to review this paper and give constructive suggestions for improvements of formatting and readability of the paper.
Funding
This work was supported in part by the Ministry of Science and Technology of Taiwan, under Grant MOST 106-2221-E-006-038-MY3.
Author information
Contributions
C.S. Huang carried out the image processing studies, participated in developing the proposed system, and drafted the manuscript. Y.H. Huang carried out the software simulations and parameter adjustments. D.Y. Chan and J.F. Yang conceived of the study, participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing financial interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Huang, CS., Huang, YH., Chan, DY. et al. Shape-reserved stereo matching with segment-based cost aggregation and dual-path refinement. J Image Video Proc. 2020, 38 (2020). https://doi.org/10.1186/s13640-020-00525-3
DOI: https://doi.org/10.1186/s13640-020-00525-3