 Research
 Open Access
Robust semi-automatic 2D-to-3D image conversion via residual-driven optimization
EURASIP Journal on Image and Video Processing volume 2018, Article number: 66 (2018)
Abstract
Semi-automatic 2D-to-3D conversion provides a cost-effective solution to the shortage of 3D content. The performance of most methods degrades significantly when cross-boundary scribbles are present, because they cannot remove unwanted input. To address this problem, a residual-driven energy function is proposed that removes the unwanted input introduced by cross-boundary scribbles while preserving the expected user input. First, the confidence of the user input is computed from the residuals between the estimated and user-specified depth values and is applied to the data fidelity term. Second, the residual-driven optimization is performed to estimate dense depth from the user scribbles. The procedure is repeated until a maximum number of iterations is reached. Confidence based on residuals prevents unwanted scribbles from propagating and thus makes it possible to generate high-quality depth even with cross-boundary input. Experimental results demonstrate that the proposed method removes unwanted scribbles successfully while preserving the expected input, and that it outperforms the state of the art when presented with cross-boundary scribbles.
Introduction
2D-to-3D conversion estimates depth from 2D images and generates stereoscopic views from that depth; it is a key technology for producing 3D content [1]. Existing approaches fall into two groups: automatic and semi-automatic methods.
Automatic methods create depth from 2D images using various depth cues, such as dark channel [2], motion [3], lighting bias [4], defocus [5], geometry [6], and boundary [7]. Each cue is applicable only to certain scenes [8], so these methods struggle to provide acceptable results for general content. Recently, neural networks have been employed to learn the implicit relation between depth and color values [9–12]. However, these learning-based methods are limited to the image types they were trained on [13].
Semi-automatic methods address these issues by introducing human interaction. Their objective is to produce a dense depth map from user scribbles, which indicate whether the labeled pixels are farther from or closer to the camera [14]. To address the shortage of 3D content, many methods have been developed for depth estimation from user input. Guttmann et al. [15] employed user scribbles to train a support vector machine (SVM) classifier that assigns depth to image patches, but the results may be inaccurate due to misclassifications. Sýkora et al. [16] proposed an interactive method in which the user adds depth (in)equalities and formulated depth propagation as an optimization problem, but it may produce artifacts due to incorrect estimation of contour thickness. Rzeszutek et al. [17] utilized the random-walks (RW) algorithm to generate dense depth maps from user input, but RW has problems preserving strong edges. Phan et al. [18] appended graph-cuts (GC) segmentation to the neighbor cost in RW to preserve depth boundaries. Xu et al. [19] proposed a similar method that replaces GC with a fast watershed segmentation. Zhang et al. [20] combined automatic depth estimation from multiple cues with interactive object segmentation to obtain the final depth. Zeng et al. [21] utilized occlusion cues and shape priors to obtain a rough approximation of depth and refined the estimation using an interactive ground fitting. These segmentation-based methods can preserve strong edges but may generate artifacts due to incorrect segments. Yuan et al. [22] incorporated non-local neighbors into the RW algorithm to improve depth quality. Liang et al. [23] extended this scheme to support video conversion using spatial-temporal information. Wang et al. [24] propagated user-specified sparse depth into dense depth using an optimization method originally used for colorization [25]. Wu et al. [26] improved this method with depth consistency between superpixels. Liao et al. [27] used a diffusion process to generate a depth map from coarse user annotations.
A depth map is typically made of smooth regions separated by sharp transitions along the boundaries between different objects [28]. Therefore, existing semi-automatic methods require that user scribbles do not cross object boundaries; otherwise, the quality of the produced depth degrades significantly. As shown in Fig. 1, when user scribbles cross object boundaries, the state-of-the-art methods [18, 22, 24] produce depth artifacts. In 2D-to-3D conversion, cross-boundary scribbles are typically introduced by careless user input. For a cross-boundary scribble, the longer part is usually the input the user expected and the shorter part is unwanted input. It can be seen from Fig. 1 that the proposed method removes the depth artifacts caused by unwanted user input from cross-boundary scribbles.
Semi-automatic image segmentation methods have addressed the problem of cross-boundary scribbles [29–31]. Although Subr et al. [29] and Bai et al. [30] can reduce artifacts caused by cross-boundary scribbles, they focus on foreground object segmentation and are difficult to apply to 2D-to-3D conversion. Oh et al. [31] used the occurrence and co-occurrence probability (OCP) of color values at labeled pixels to estimate the confidence of user input. This method can be used for 2D-to-3D conversion, but it may mistake expected scribbles for unwanted ones.
Surprisingly, few methods consider the impact of cross-boundary scribbles on 2D-to-3D conversion. To address this problem, we propose a robust method based on the residuals between the user-specified and estimated depth values during the iterative solving process. Because the confidence of user scribbles is measured by these residuals, the proposed method can remove the depth artifacts caused by cross-boundary scribbles, as the experimental results show. The two works most relevant to this paper are Wang et al. [24] and Hong et al. [32]. Unlike the optimization model in Wang et al. [24], the proposed method utilizes residuals to eliminate the depth artifacts caused by cross-boundary scribbles. The main difference from Hong et al. [32] is that they use residuals to determine the relative weight between data fidelity and regularization, whereas this paper leverages residuals to compute the confidence of user scribbles.
Recently, Ham et al. [33] proposed a static dynamic filter (SDF) to reduce artifacts caused by structural differences between guidance and input signals. Although SDF [33] can handle differences in structure, it is not robust to outliers introduced by cross-boundary scribbles. Yuan et al. [34] proposed an ℓ_{1} optimization method to remove erroneous user scribbles. However, the ℓ_{1} norm assumes that the input image can be approximated by the sum of a piecewise-constant function and a smooth function [35]. Depth artifacts are introduced when this assumption does not hold.
The remainder of this paper is organized as follows. In Section 2, the proposed method is described. The experimental results are given in Section 3. Finally, the conclusion is given in Section 4.
Method
The workflow of 2D-to-3D image conversion based on the proposed method is shown in Fig. 2. First, the user specifies sparse depth on an input image, where scribbles indicate whether the labeled pixels are closer to or farther from the camera. Second, a sparse depth map is extracted according to the intensities of the user scribbles. Third, the confidence of the user scribbles is calculated from the residuals between the estimated and user-specified depth values. Then, an energy function constrained by the confidence is designed and minimized to obtain the estimated dense depth map. The procedure is repeated from the confidence computation step until a maximum number of iterations is reached. Finally, the stereoscopic 3D image is generated by depth image-based rendering (DIBR).
Model
Let O be the set of pixels with user-specified depth values. The objective of this paper is to estimate an accurate dense depth map d from the user input and the given image I, even when cross-boundary scribbles are present. This is posed as the energy minimization problem of Formula (1).
In Formula (1), d_{i} and u_{i} denote the estimated and user-specified depth values at pixel i, respectively; n is the number of pixels in the input image I; and \(\mathcal {N}_{i}\) is the set of 8-connected neighbors of pixel i. The weighting function w_{ij}, defined in Formula (2), makes pixels with similar colors take similar depth values.
In Formula (2), I_{i} and I_{j} are the color values of image I at pixels i and j, respectively, and β is a parameter controlling the strength of the weight w_{ij}.
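For reference, Formulas (1) and (2) can be written out as below. This is a reconstruction inferred from the term-by-term description above rather than a verbatim copy: the double sum over \(\mathcal {N}_{i}\) and the squared color difference in the exponent are assumptions, and r_{i} is the confidence of the user input at pixel i given by Formula (3).

```latex
\begin{align}
E(\mathbf{d}) &= \sum_{i \in O} r_{i}\,\left(d_{i} - u_{i}\right)^{2}
  + \sum_{i=1}^{n} \sum_{j \in \mathcal{N}_{i}} w_{ij}\,\left(d_{i} - d_{j}\right)^{2} \tag{1} \\
w_{ij} &= \exp\left(-\beta\,\lVert I_{i} - I_{j} \rVert^{2}\right) \tag{2}
\end{align}
```

In this form, the first sum is the data fidelity term, weighted by the confidence, and the second sum is the regularization term.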
r_{i} in Formula (1) is a confidence measure of the user-specified depth value at pixel i and is defined as
\[ r_{i} = \exp\left(-\eta \left(d_{i} - u_{i}\right)^{2}\right) \tag{3} \]
Here, η is a constant that controls how strongly a difference between the two depth values reduces the confidence. In Formula (1), the data fidelity term enforces the estimated depth values of labeled regions to approximate the user-specified ones. Unlike Wang et al. [24], the proposed method maintains this consistency only when the user input is confident. The confidence r_{i} is low when the residual (d_{i} − u_{i})^{2} is high. The regularization term penalizes the difference in estimated depth between each pixel and its neighbors.
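As a rough numerical illustration of Formula (3), the snippet below evaluates the confidence for a few residual magnitudes. It assumes depth values normalized to [0, 1], which the text does not state, and uses η = 9000, the value reported in the experiments; the function name `confidence` is hypothetical.

```python
import math


def confidence(d_i, u_i, eta=9000.0):
    """Confidence r_i = exp(-eta * (d_i - u_i)^2) of a user-specified depth.

    d_i is the current estimate and u_i the user-specified value; both are
    assumed to lie in [0, 1] for this illustration.
    """
    return math.exp(-eta * (d_i - u_i) ** 2)


# Print the confidence for increasing gaps between estimate and user input.
for gap in (0.0, 0.005, 0.01, 0.02, 0.05):
    print(f"|d - u| = {gap:.3f} -> r = {confidence(0.5 + gap, 0.5):.3g}")
```

Under these assumptions the confidence stays close to 1 only for very small residuals and falls off quickly once the estimate moves away from the scribbled value.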
Solver
Formula (1) is nonlinear in d, because the confidence r_{i} itself depends on d_{i}; minimizing it is therefore an unconstrained nonlinear optimization problem. A fixed-point iteration strategy is adopted to solve Formula (1). Let \(\mathbf {d}^{k} =\left [d_{i}^{k}\right ]_{n \times 1}\) and u denote the vectors representing the estimated depth image in iteration k and the user-specified depth values, respectively. The ith element of u is the user-specified depth value u_{i} if i∈O and 0 otherwise. Then, in iteration k, the objective function to be minimized is the energy \(E\left (\mathbf {d}^{k}\right)\) of Formula (4),
where R^{k − 1} is an n×n diagonal matrix whose ith diagonal element is \(r_{i}^{k-1}\). Here, \(r_{i}^{k-1} = \text {exp}\left (-{\eta } \left (d_{i}^{k-1} - u_{i}\right)^{2}\right)\) if i∈O and 0 otherwise. L is the n×n sparse Laplacian matrix, with elements L_{ij}=−w_{ij} (i≠j) and \(L_{ii} = \sum _{j \in \mathcal {N}_{i}} w_{ij}\). Taking the derivative of the energy function in Formula (4) with respect to d^{k} yields Formula (5).
The energy function in Formula (4) is minimized by setting \(\frac {\partial E\left (\mathbf {d}^{k}\right)}{\partial \mathbf {d}^{k}}\) in Formula (5) equal to zero, which gives the linear system in Formula (6).
The linear system in Formula (6) is sparse, and thus, it can be solved using standard methods such as preconditioned conjugate gradient.
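The three solver steps can be written compactly as follows. This is a sketch reconstructed from the definitions of R^{k − 1}, L, and u above, not a verbatim copy of Formulas (4) to (6). It assumes the regularization term is expressed as the quadratic form \((\mathbf {d}^{k})^{T} \mathbf {L} \mathbf {d}^{k}\), which agrees with a pairwise sum over neighbors up to a constant factor when the weights are symmetric.

```latex
\begin{align}
E\left(\mathbf{d}^{k}\right) &= \left(\mathbf{d}^{k} - \mathbf{u}\right)^{T} \mathbf{R}^{k-1} \left(\mathbf{d}^{k} - \mathbf{u}\right)
  + \left(\mathbf{d}^{k}\right)^{T} \mathbf{L}\,\mathbf{d}^{k} \tag{4} \\
\frac{\partial E\left(\mathbf{d}^{k}\right)}{\partial \mathbf{d}^{k}} &= 2\,\mathbf{R}^{k-1} \left(\mathbf{d}^{k} - \mathbf{u}\right) + 2\,\mathbf{L}\,\mathbf{d}^{k} \tag{5} \\
\left(\mathbf{R}^{k-1} + \mathbf{L}\right) \mathbf{d}^{k} &= \mathbf{R}^{k-1}\,\mathbf{u} \tag{6}
\end{align}
```

Because R^{k − 1} is held fixed within iteration k, the energy is quadratic in d^{k}, which is why each iteration reduces to a single sparse linear solve.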
Analysis
It can be seen from Formula (4) that, in each iteration, user-specified depth values are preserved only if the residuals between the estimated and user-specified depth values are small.
Specifically, the unwanted user input introduced by cross-boundary scribbles makes the depth values of labeled pixels differ from those of their neighbors. Meanwhile, the regularization term forces the estimate to be consistent with the neighbors and thus makes the estimated depth deviate from the user input. As a result, the residual between the estimated and user-specified depth values of an unwantedly labeled pixel increases, and the confidence computed from the residual in Formula (3) decreases toward zero during the iterative solution process. Therefore, the proposed method can remove unwanted user input introduced by cross-boundary scribbles.
For user-expected input, the specified values of labeled pixels are consistent with their neighbors; thus, the estimate mainly depends on the data fidelity term, which forces the estimated depth to approximate the user input. Therefore, the residuals of expectedly labeled pixels are almost 0, and their confidence remains at 1 with a proper setting of η in Formula (3). For this reason, the proposed method can preserve the expected user input.
Figure 3 shows how the confidence of user scribbles in an input image changes during the iterative solution. It can be seen that the confidence of the unwanted input rapidly drops to 0, while the confidence of the expected input remains at 1.
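The mechanism described in this subsection can be reproduced on a toy problem. The sketch below is not the author's implementation, and several choices in it are assumptions made for illustration. It replaces the image by a 1D chain of ten pixels (two neighbors instead of eight). It solves the tridiagonal system (R + L) d = R u directly with the Thomas algorithm instead of preconditioned conjugate gradient. It assumes every scribble starts with confidence 1. It uses η = 100 rather than 9000, so that the tiny example tolerates its coarse first estimate. The function names are hypothetical.

```python
import math


def solve_chain(w, r, u):
    """Solve (R + L) d = R u for a 1D chain with the Thomas algorithm.

    w[i] is the weight between pixels i and i+1, r[i] the confidence of
    pixel i (0 for unlabeled pixels), and u[i] the user-specified depth.
    """
    n = len(r)
    # Diagonal of R + L: confidence plus the sum of adjacent edge weights.
    diag = [r[i] + (w[i - 1] if i > 0 else 0.0) + (w[i] if i < n - 1 else 0.0)
            for i in range(n)]
    rhs = [r[i] * u[i] for i in range(n)]
    # Forward elimination; the off-diagonal entries are -w[i].
    for i in range(1, n):
        m = -w[i - 1] / diag[i - 1]
        diag[i] -= m * (-w[i - 1])
        rhs[i] -= m * rhs[i - 1]
    # Back substitution.
    d = [0.0] * n
    d[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        d[i] = (rhs[i] + w[i] * d[i + 1]) / diag[i]
    return d


def residual_driven_depth(intensity, scribbles, beta=100.0, eta=100.0, max_iter=5):
    """Fixed-point iteration: solve for depth, then update the confidences."""
    n = len(intensity)
    # Color-similarity weights between neighboring pixels (assumed Gaussian form).
    w = [math.exp(-beta * (intensity[i] - intensity[i + 1]) ** 2)
         for i in range(n - 1)]
    u = [scribbles.get(i, 0.0) for i in range(n)]
    # Assumption: every scribbled pixel starts with full confidence.
    r = [1.0 if i in scribbles else 0.0 for i in range(n)]
    d = list(u)
    for _ in range(max_iter):
        d = solve_chain(w, r, u)
        r = [math.exp(-eta * (d[i] - u[i]) ** 2) if i in scribbles else 0.0
             for i in range(n)]
    return d, r


# Two flat regions with an edge between pixels 4 and 5. The 0.2 scribble
# (pixels 1 to 5) crosses the edge, so its value at pixel 5 is unwanted.
intensity = [0.0] * 5 + [1.0] * 5
scribbles = {1: 0.2, 2: 0.2, 3: 0.2, 4: 0.2, 5: 0.2, 7: 0.8, 8: 0.8, 9: 0.8}
d_single, _ = residual_driven_depth(intensity, scribbles, max_iter=1)
d_final, r_final = residual_driven_depth(intensity, scribbles, max_iter=5)
print("depth at crossing pixel, 1 iteration :", round(d_single[5], 3))
print("depth at crossing pixel, 5 iterations:", round(d_final[5], 3))
print("final confidences:", [round(x, 3) for x in r_final])
```

In this toy setting, pixel 5 carries the part of the 0.2 scribble that crosses into the brighter region. After a single pass, which corresponds to an optimization with fixed confidence, its depth stays pulled toward 0.2. After five passes its confidence has collapsed and its depth follows the 0.8 scribbles of its own region, while the confidences of the remaining scribbles stay near 1.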
Experimental results and discussion
Experimental details
RGBZ (red, green, blue plus z-axis depth) datasets [36], which include objects, human figures, and multiple human interaction, are used for comparison. Performance is also evaluated on four Middlebury stereo datasets: Tsukuba, Venus, Teddy, and Cones [37]. The source code and more experimental results can be downloaded from https://github.com/tcyhx/rdopt.
In the proposed method, the bandwidth parameter η is empirically set to 9000. A maximum of five iterations is used to solve Formula (1). β is set to 100 for the RGBZ datasets and 50 for the Middlebury datasets. Results of the proposed method are compared to the state of the art: RW [17], hybrid GC and RW (HGR) [18], non-local RW (NRW) [22], optimization (OPT) [24], OCP [31], SDF [33], and ℓ_{1} [34]. Note that OCP was originally designed for interactive segmentation; this paper applies it to 2D-to-3D conversion by replacing the confidence in Formula (3) with the aggregation of the OCPs in a local neighborhood. Structural similarity (SSIM) [38] is used for performance evaluation since it can predict human perception of image quality. The standard deviation of the SSIM weighting window is set to 4 in the experiments so as to evaluate the similarity of semi-global structure [39].
In the experiments, a trained user is asked to draw scribbles with a standard brush by referring to the ground-truth depth values, where higher intensities indicate that the labeled pixels are closer to the camera. Since depth propagation from user scribbles relies on color or intensity similarity between neighboring pixels, more scribbles are drawn in highly textured areas. To make the comparison as fair as possible, a sparse depth map is extracted from the user scribbles, and each algorithm estimates a dense depth map from this sparse depth map.
Experiments with cross-boundary user scribbles
In this section, a user is asked to assign the initial depth values manually by drawing some scribbles across object boundaries. Tables 1 and 2 show the SSIM values of the proposed algorithm in comparison with other methods on the RGBZ and Middlebury datasets, respectively. As shown in Tables 1 and 2, the proposed method achieves the highest average SSIM among all of the competing methods for both datasets. Except for the comparison with ℓ_{1} in RGBZ_05 and Teddy, the SSIM values of the proposed method are higher than those of the other methods.
For the RGBZ datasets, qualitative comparisons are shown in Figs. 4–12. Qualitative comparisons on the Middlebury datasets are given in Figs. 13–16. To keep the paper concise, the images rendered from depth are shown only for the Middlebury datasets. In each figure, the yellow rectangles on depth maps or synthesized views mark artifacts caused by cross-boundary scribbles, while the purple ones mark artifacts caused by other issues. The cross-boundary scribbles on user-labeled images are marked by yellow rectangles (Figs. 4–12b and 13–16a).
RW [17] assumes that user scribbles do not cross object boundaries and thus generates depth artifacts around cross-boundary labeled regions (see Figs. 4–16e). These artifacts cause distortions when a new view is synthesized from the depth, as shown in Figs. 13–16f. HGR [18] relies on GC to preserve depth boundaries. However, GC is sensitive to outliers, so the quality of the depth maps produced by HGR degrades significantly when user scribbles cross object boundaries (see Figs. 4–12f and 13–16g), which in turn degrades the quality of the synthesized views (see Figs. 13–16h). Although it introduces non-local constraints, NRW [22] has difficulty removing depth artifacts caused by cross-boundary user scribbles (see Figs. 4–12g and 13–16i), which results in distortions in the synthesized views (see Figs. 13–16j). OPT [24] constrains the estimated depth values of labeled pixels to be consistent with the user input; thus, unwanted information propagates to the neighbors (see Figs. 4–12h and 13–16k). Distortions in synthesized views caused by input errors are shown in the yellow rectangles of Figs. 13–16l.
OCP [31] can remove some depth artifacts caused by cross-boundary user input, but it fails when the cross-boundary labeled pixels have similar color distributions; thus, residual artifacts are still visible (see Figs. 4–7i, 10–12i, 13m, and 14m). OCP may also consider some expected scribbles as unwanted ones [31], which yields distortions as shown in the purple rectangles of Figs. 7–9i and 14–16m. SDF [33] can reduce depth artifacts caused by structural differences between color and depth images by using the Welsch function as a regularizer. However, SDF has difficulty handling artifacts introduced by cross-boundary scribbles (see Figs. 4–12j and 13–16o), which leads to distortions in synthesized views as shown in Figs. 13–16p. ℓ_{1} [34] tends to produce a nearly piecewise-constant depth map with sparse structures. Therefore, it generates artifacts when depth discontinuities do not coincide with object boundaries (see the purple rectangles of Figs. 4–9k, 14q, and 16q), which causes distortions in synthesized views (see the purple rectangles of Figs. 14r and 16r).
The proposed method successfully alleviates the influence of cross-boundary user scribbles and produces high-quality depth maps (see Figs. 4–12l and 13–16s). Therefore, the proposed method can reduce the distortions in synthesized views caused by cross-boundary input, as shown in Figs. 13–16t.
Experiments without cross-boundary user scribbles
In this section, the user carefully draws on an input image, ensuring that scribbles do not cross object boundaries. In this case, unwanted scribbles usually lie inside objects where a depth discontinuity occurs. Tables 3 and 4 show the SSIM obtained by the different methods on the RGBZ and Middlebury datasets, respectively. It can be seen from Table 3 that the proposed method gives the highest average SSIM on the RGBZ datasets. As shown in Table 4, the proposed method and OPT [24] both obtain the highest average SSIM on the Middlebury datasets. Therefore, the proposed method performs comparably to the state-of-the-art methods when user scribbles do not cross object boundaries.
Conclusion
To remove unwanted input from cross-boundary scribbles in semi-automatic 2D-to-3D conversion, this paper proposes a residual-driven energy function for depth estimation from user input. The residual between the estimated and user-specified depth values is large at an unwantedly labeled pixel, because the pixel is inconsistent with its neighbors, and small at an expectedly labeled pixel, because the pixel is consistent with its neighbors. Therefore, the residual can distinguish unwanted scribbles from the expected user input. The experimental results demonstrate that the proposed method effectively eliminates the depth artifacts caused by cross-boundary scribbles and outperforms existing methods when cross-boundary input is present.
Abbreviations
RGBZ: Red, green, blue plus z-axis depth
SVM: Support vector machines
RW: Random-walks
GC: Graph-cuts
OCP: Occurrence and co-occurrence probability
DIBR: Depth image-based rendering
HGR: Hybrid GC and RW
NRW: Non-local RW
OPT: Optimization
SSIM: Structural similarity
References
[1] W Huang, X Cao, K Lu, Q Dai, AC Bovik, Toward naturalistic 2D-to-3D conversion. IEEE Trans. Image Process. 24(2), 724–733 (2015).
[2] TY Kuo, YC Lo, CC Lin, in Proceedings of the IEEE Intl. Conf. on Acoustics, Speech and Signal Process. 2D-to-3D conversion for single-view image based on camera projection model and dark channel model (IEEE, Piscataway, 2012), pp. 1433–1436.
[3] YK Lai, YF Lai, YC Chen, An effective hybrid depth-generation algorithm for 2D-to-3D conversion in 3D displays. J. Disp. Technol. 9(3), 154–161 (2013).
[4] H Han, G Lee, J Lee, J Kim, S Lee, A new method to create depth information based on lighting analysis for 2D/3D conversion. J. Cent. South Univ. 20(10), 2715–2719 (2013).
[5] J Lin, X Ji, W Xu, Q Dai, Absolute depth estimation from a single defocused image. IEEE Trans. Image Process. 22(11), 4545–4550 (2013).
[6] CC Han, HF Hsiao, Depth estimation and video synthesis for 2D to 3D video conversion. J. Sign. Process. Syst. 76(1), 33–46 (2014).
[7] TT Tsai, TW Huang, RZ Wang, A novel method for 2D-to-3D video conversion based on boundary information. EURASIP J. Image Video Process. 2 (2018). https://link.springer.com/article/10.1186%2Fs1364001702395.
[8] AH Somaiya, RK Kulkarni, in Proceedings of the Intl. Conf. on Signal Process. Image Process. Pattern Recognition (ICSIPR). Depth cue selection for 3D television (IEEE, Piscataway, 2013), pp. 14–19.
[9] F Liu, C Shen, G Lin, I Reid, Learning depth from single monocular images using deep convolutional neural fields. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 2024–2039 (2016).
[10] C Godard, OM Aodha, GJ Brostow, in Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR). Unsupervised monocular depth estimation with left-right consistency (IEEE, Piscataway, 2017), pp. 6602–6611.
[11] I Laina, C Rupprecht, V Belagiannis, F Tombari, N Navab, in Proceedings of the Intl. Conf. on 3D Vision (3DV). Deeper depth prediction with fully convolutional residual networks (IEEE, Piscataway, 2016), pp. 239–248.
[12] J Xie, R Girshick, A Farhadi, in Proceedings of the European Conf. on Computer Vision (ECCV). Deep3D: fully automatic 2D-to-3D video conversion with deep convolutional neural networks (Springer, Berlin, 2016), pp. 842–857.
[13] A Lopez, E Garces, D Gutierrez, in Proceedings of the Spanish Computer Graphics Conference. Depth from a single image through user interaction (Wiley, Hoboken, 2014), pp. 1–10.
[14] R Rzeszutek, R Phan, D Androutsos, in Proceedings of the ACM Intl. Conf. on Multimedia. Depth estimation for semi-automatic 2D to 3D conversion (ACM, New York, 2012), pp. 817–820.
[15] M Guttmann, L Wolf, D Cohen-Or, in Proceedings of the IEEE Intl. Conf. on Computer Vision (ICCV). Semi-automatic stereo extraction from video footage (IEEE, Piscataway, 2009), pp. 136–142.
[16] D Sýkora, D Sedlacek, S Jinchao, J Dingliana, S Collins, Adding depth to cartoons using sparse depth (in)equalities. Comput. Graph. Forum. 29(2), 615–623 (2010).
[17] R Rzeszutek, R Phan, D Androutsos, in Proceedings of the IEEE Intl. Conf. on Multimedia & Expo. Semi-automatic synthetic depth map generation for video using random walks (IEEE, Piscataway, 2011), pp. 1–6.
[18] R Phan, D Androutsos, Robust semi-automatic depth map generation in unconstrained images and video sequences for 2D to stereoscopic 3D conversion. IEEE Trans. Multimedia. 16(1), 122–136 (2014).
[19] X Xu, LM Po, KW Cheung, KH Ng, in Proceedings of the IEEE Intl. Conf. on Signal Processing, Communication and Computing (ICSPCC). Watershed and random walks based depth estimation for semi-automatic 2D to 3D image conversion (IEEE, Piscataway, 2012), pp. 84–87.
[20] Z Zhang, C Zhou, Y Wang, W Gao, Interactive stereoscopic video conversion. IEEE Trans. Circuits Syst. Video Technol. 23(10), 1795–1807 (2013).
[21] Q Zeng, W Chen, H Wang, C Tu, D Cohen-Or, D Lischinski, B Chen, Hallucinating stereoscopy from a single image. Comput. Graph. Forum. 34(2), 1–12 (2015).
[22] H Yuan, S Wu, P Cheng, P An, S Bao, Nonlocal random walks algorithm for semi-automatic 2D-to-3D image conversion. IEEE Signal Proc. Let. 22(3), 371–374 (2015).
[23] Z Liang, J Shen, in Proceedings of the IEEE Intl. Conf. on Digital Signal Processing. Consistent 2D-to-3D video conversion using spatial-temporal nonlocal random walks (IEEE, Piscataway, 2016), pp. 672–675.
[24] O Wang, M Lang, M Frei, A Hornung, A Smolic, M Gross, in Proceedings of the Eur. Symp. Sketch-Based Interfaces and Modeling. StereoBrush: interactive 2D to 3D conversion using discontinuous warps (Springer, Berlin, 2011), pp. 47–54.
[25] A Levin, D Lischinski, Y Weiss, Colorization using optimization. ACM Trans. Graph. 23(3), 689–694 (2004).
[26] S Wu, H Yuan, P An, P Cheng, Semi-automatic 2D-to-3D conversion using soft segmentation constrained edge-aware interpolation. Acta Electron. Sin. 43(11), 2218–2224 (2015).
[27] J Liao, S Shen, E Eisemann, in Graph. Interface Conf. Depth Map Design and Depth-based Effects With a Single Image (ACM, New York, 2017), pp. 57–63.
[28] M Calemme, P Zanuttigh, S Milani, M Cagnazzo, B Pesquet-Popescu, in Proceedings of the IEEE Intl. Conf. on Image Processing. Depth map coding with elastic contours and 3D surface prediction (IEEE, Piscataway, 2016), pp. 1106–1110.
[29] K Subr, S Paris, C Soler, J Kautz, Accurate binary image selection from inaccurate user input. Comput. Graph. Forum. 32(2pt1), 41–50 (2013).
[30] J Bai, X Wu, in Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR). Error-tolerant scribbles based interactive image segmentation (IEEE, Piscataway, 2014), pp. 392–399.
[31] C Oh, B Ham, K Sohn, Robust interactive image segmentation using structure-aware labeling. Expert Syst. Appl. 79, 90–100 (2017).
[32] BW Hong, JK Koo, H Dirks, M Burger, in Proceedings of the German Conf. on Pattern Recognition (GCPR). Adaptive regularization in convex composite optimization for variational imaging problems (Springer, Berlin, 2017), pp. 268–280.
[33] B Ham, M Cho, J Ponce, Robust guided image filtering using nonconvex potentials. IEEE Trans. Pattern Anal. Mach. Intell. 40(1), 192–207 (2018).
[34] H Yuan, P An, S Wu, Y Zheng, Error-tolerant semi-automatic 2D-to-3D conversion via l1 optimization. Acta Electron. Sin. 46(2), 447–455 (2018).
[35] M Jung, Piecewise-smooth image segmentation models with l1 data-fidelity terms. J. Sci. Comput. 70(3), 1229–1261 (2017).
[36] C Richardt, C Stoll, NA Dodgson, HP Seidel, C Theobalt, Coherent spatiotemporal filtering, upsampling and rendering of RGBZ videos. Comput. Graph. Forum. 31(2), 247–256 (2012).
[37] D Scharstein, R Szeliski, in Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. High-accuracy stereo depth maps using structured light (IEEE, Piscataway, 2003), pp. 195–202.
[38] Z Wang, AC Bovik, HR Sheikh, EP Simoncelli, Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004).
[39] Y Konno, M Tanaka, M Okutomi, Y Yanagawa, K Kinoshita, M Kawade, in 2016 23rd International Conference on Pattern Recognition (ICPR). Depth map upsampling by self-guided residual interpolation (IEEE, Piscataway, 2016), pp. 1394–1399.
Acknowledgements
The author would like to thank the editors and anonymous reviewers for their valuable comments.
Funding
This research was supported by Zhejiang Provincial Natural Science Foundation of China under Grant No. LY16F010014, and Ningbo Natural Science Foundation under Grant No. 2017A610109.
Availability of data and materials
The author can provide the data and source code.
Author information
Contributions
HY designed the research, analyzed the data, then wrote and edited the manuscript. The author read and approved the final manuscript.
Ethics declarations
Competing interests
The author declares that he has no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional information
Authors’ information
Hongxing Yuan is currently an Associate Professor at the School of Electronics and Information Engineering, Ningbo University of Technology, China. He received his doctoral degree from the University of Science and Technology of China in 2010. His current research interests include computer vision, 3D video processing, and 2D-to-3D conversion.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Yuan, H. Robust semi-automatic 2D-to-3D image conversion via residual-driven optimization. J Image Video Proc. 2018, 66 (2018). https://doi.org/10.1186/s136400180310x
DOI: https://doi.org/10.1186/s136400180310x
Keywords
 3D video
 2D-to-3D conversion
 Depth
 Cross-boundary scribbles
 Optimization