An Iterative Surface Evolution Algorithm for Multiview Stereo
© Y. Xi and Y. Duan. 2010
Received: 2 August 2009
Accepted: 3 March 2010
Published: 15 April 2010
We propose a new iterative surface evolution algorithm for multiview stereo. Starting from an embedding space such as the visual hull, we first conduct robust 3D depth estimation (represented as 3D points) based on image correlation. A fast implicit distance function-based region growing method is then employed to extract an initial shape estimate from these 3D points. Next, an explicit surface evolution is conducted to recover the finer geometric details of the shape. The recovered shape is further improved by several iterations between depth estimation and shape reconstruction, similar to the Expectation Maximization (EM) approach. Experiments on the benchmark datasets show that our algorithm obtains high-quality reconstruction results that are comparable with the state-of-the-art methods, with considerably less computational time and complexity.
Despite significant advances in interactive shape modeling, creating complex, high-quality, realistic-looking 3D models from scratch is still a very challenging task. Recent advances in 3D shape acquisition systems such as laser range scanners and encoded light projecting systems have made direct 3D data acquisition feasible. These active 3D acquisition systems, however, remain expensive. Meanwhile, the price of digital cameras and digital video cameras keeps decreasing while their quality keeps improving, partially due to the intense competition in the huge consumer market. Furthermore, huge numbers of images and videos are added every day to internet sites such as Google, many of which could be used for multiview image-based 3D shape reconstruction.
To date, a great deal of research has been conducted in the area of multiview image-based modeling. The recent survey by Seitz et al. gives an excellent review of the state of the art in this area. As summarized in that survey, most of the existing algorithms follow a two-stage approach: (1) conduct depth estimation based on local groups of input images; (2) fuse the estimated depth values into a global watertight 3D surface estimate. The depth estimation step is often based on image correlation. The main differences between existing algorithms lie in the second stage, the data fusion step, which can be divided into two categories. The first type of data fusion reconstructs the 3D surface by conducting volumetric data segmentation using global energy minimization approaches such as graph cut [6–11], level-set [12–16], or deformable models [5, 17–19]. More recently, other types of data fusion algorithms have been proposed that are based on local surface growing and filtering [2, 20, 21]. Without global optimization, these types of data fusion algorithms can be computationally more efficient [22, 23].
Our algorithm also follows this two-stage process. We propose an iterative refinement scheme that alternates between the depth estimation step and the data fusion step. This is similar in spirit to the Expectation Maximization (EM) algorithm. Moreover, we propose a novel outlier removal algorithm based on anisotropic kernel density estimation. Our data fusion algorithm integrates fast implicit region growing with high-quality explicit surface evolution; thus it is both fast and accurate.
The rest of the paper is organized as follows. In Section 1.1 we discuss the main differences between our approach and related existing works. Section 2 describes the details of our algorithm. The benchmark data evaluation is shown in Section 3. The paper concludes in Section 4.
1.1. Comparison with Related Works
Our work is most closely related to the works of Hernández and Schmitt and Quan et al. [16, 24]. Hernández and Schmitt proposed a deformable model-based reconstruction algorithm that achieves some of the highest-quality reconstructions. Their depth estimation is conducted by rectangular window-based normalized cross-correlation (NCC). The estimated depth values are then discretized into an octree-based volumetric grid. Finally, a gradient vector flow-based deformable model is applied to the volumetric grid to reconstruct the 3D surface.
Our depth estimation follows a pipeline similar to that of Hernández and Schmitt, with several modifications to further improve its efficiency. We describe these modifications in Section 2.2. Furthermore, unlike Hernández and Schmitt, we represent the depth estimates as 3D points whose accuracy is not restricted by the resolution of the volumetric grid. Quan et al. [16, 24] also represent the estimated depth values as 3D points. However, unlike our method, they do not have an explicit outlier removal step. Instead they rely on level-set-based surface evolution with high-order smoothness terms such as Gaussian/mean curvature to overcome noise, which may create surfaces that are too smooth to represent the finer geometric details of the original object. Most recently, Campbell et al. proposed an outlier removal algorithm based on the Markov Random Field (MRF) model which can achieve very impressive reconstruction results. In contrast, our outlier removal algorithm is based on kernel density estimation and is conducted on unorganized 3D points instead of the 2D image space used by Campbell et al.
To summarize, the main contributions of this paper are: (1) a novel iterative refinement scheme between the depth estimation and the data fusion, (2) a novel anisotropic kernel density estimation-based outlier removal algorithm, and (3) a novel data fusion algorithm that integrates the fast implicit distance function-based region growing method with the high-quality explicit surface evolution.
visual hull construction,
3D point generation,
implicit surface evolution,
explicit surface evolution.
2.1. Visual Hull Construction
The first step of our algorithm is to obtain an initial shape estimate by constructing a visual hull. The visual hull is an outer approximation of the observed solid, constructed as the intersection of the visual cones associated with all the input cameras. A discrete volumetric representation of the visual hull can be obtained by intersecting the cones generated by back-projecting the object silhouettes from the different camera views. An explicit shape representation can then be obtained by iso-surface extraction algorithms such as Marching Cubes.
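The silhouette intersection described above can be sketched as a simple voxel-carving test. This is only an illustrative sketch, not the authors' implementation: the 3x4 camera matrices, the boolean silhouette masks, and the function names `project`, `inside_silhouette`, and `carve_visual_hull` are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix3x4 = Sequence[Sequence[float]]   # hypothetical camera projection matrix
Mask = Sequence[Sequence[bool]]         # mask[row][col], True = inside the silhouette


def project(P: Matrix3x4, X: Point) -> Tuple[float, float, float]:
    """Apply a 3x4 projection matrix to a 3D point; returns homogeneous (u, v, w)."""
    x, y, z = X
    u, v, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
    return u, v, w


def inside_silhouette(P: Matrix3x4, mask: Mask, X: Point) -> bool:
    """True if X projects onto a silhouette pixel of this view."""
    u, v, w = project(P, X)
    if w <= 0:  # point is behind the camera
        return False
    col, row = int(round(u / w)), int(round(v / w))
    return 0 <= row < len(mask) and 0 <= col < len(mask[0]) and mask[row][col]


def carve_visual_hull(cameras: Sequence[Matrix3x4],
                      masks: Sequence[Mask],
                      grid_points: Sequence[Point]) -> List[Point]:
    """Keep only the grid points that fall inside the silhouette in every view."""
    return [X for X in grid_points
            if all(inside_silhouette(P, m, X) for P, m in zip(cameras, masks))]
```

The surviving grid points form the discrete volumetric hull; an iso-surface extraction step would then turn them into an explicit mesh.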
2.2. 3D Point Generation
Once we have an initial explicit shape estimate, we proceed to 3D depth estimation. First, we need to estimate the visibility of the initial shape with respect to all the cameras. We use OpenGL to render the explicit surface into the image plane of each individual camera and extract the depth values from the Z-buffer. Given a point on the surface, its visibility with respect to a given camera can then be decided by comparing the depth of its projection into the image plane of that camera with the corresponding depth value stored in the Z-buffer.
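The Z-buffer comparison can be illustrated with a minimal sketch. It assumes the depth buffer has already been read back and converted to linear camera-space depth (raw OpenGL depth values are nonlinear), and that a hypothetical `camera` callable maps a 3D point to a pixel and a depth; the tolerance `eps` is likewise an assumption, not a value from the paper.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float, float]
# Hypothetical camera model: maps a 3D point to (column, row, linear depth).
Camera = Callable[[Point], Tuple[int, int, float]]


def is_visible(camera: Camera,
               zbuffer: Sequence[Sequence[float]],
               X: Point,
               eps: float = 1e-3) -> bool:
    """Return True if X is not occluded in this view.

    The point is visible when its own depth is no farther from the camera
    than the nearest rendered surface depth at the same pixel, up to eps.
    """
    col, row, depth = camera(X)
    if not (0 <= row < len(zbuffer) and 0 <= col < len(zbuffer[0])):
        return False  # projects outside the image
    return depth <= zbuffer[row][col] + eps
```

The tolerance absorbs rasterization and quantization error; too small a value would mark surface points as occluded by the surface they lie on.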
Our depth estimation is based on the Lambertian assumption; that is, if a point belongs to the object surface, its corresponding 2D patches in the image planes of its visible cameras should be strongly correlated. Hence, starting from a point on the object surface, we can conduct a line search along a defined search direction to locate the position, within a certain search range, at which the correlation between the corresponding 2D image patches of different visible cameras is maximal. This idea was first proposed by Hernández and Schmitt. Our paper follows the same principle with several modifications. In the following, we briefly describe our depth estimation method as well as the main differences between our method and theirs.
Given a point on the initial surface, we select a set of (up to) five "best-view" visible cameras based on the point's estimated surface normal. Each camera in the selected set serves as the main camera once. The search direction is defined as the optical ray passing through the optical center of the main camera and the given point. We uniformly sample the optical ray within a certain range of the given point, and for each sampled position, we project it into the image planes of the main camera and of another camera in the set. Rectangular image patches centered at the projected locations in the two image planes are extracted, and the correlation between the two image patches is computed by a similarity measure such as the normalized cross-correlation (NCC).
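For reference, the NCC score between two equally sized patches can be computed as in the following minimal sketch. The patches are assumed to be flattened into sequences of gray values, and returning 0.0 for a constant patch is a convention chosen here, not something specified in the paper.

```python
import math
from typing import Sequence


def ncc(patch_a: Sequence[float], patch_b: Sequence[float]) -> float:
    """Normalized cross-correlation of two flattened patches, in [-1, 1]."""
    n = len(patch_a)
    if n == 0 or n != len(patch_b):
        raise ValueError("patches must be non-empty and of equal size")
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    dev_a = [a - mean_a for a in patch_a]
    dev_b = [b - mean_b for b in patch_b]
    denom = math.sqrt(sum(a * a for a in dev_a) * sum(b * b for b in dev_b))
    if denom == 0.0:
        return 0.0  # a constant patch carries no texture to correlate
    return sum(a * b for a, b in zip(dev_a, dev_b)) / denom
```

Because the mean is subtracted and the result is divided by both standard deviations, the score is invariant to affine changes in brightness between the two views, which is why it suits the Lambertian setting.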
For a set of five "best-view" cameras, a total of 20 correlation curves are generated, since each of the five cameras serves once as the main camera and is paired with each of the other four (5 × 4 = 20). For each of the correlation curves, the best position (i.e., the point with the highest correlation value) is selected as the depth estimate. The depth estimates are represented as 3D points, which are processed further to construct a new shape estimate of the object.
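The bookkeeping of this step can be sketched as follows: enumerate the ordered (main, other) camera pairs and pick the highest-correlation sample on each curve. The data layout (a dictionary from camera pair to sampled `(position, correlation)` tuples) and the function names are assumptions made for illustration.

```python
from itertools import permutations
from typing import Dict, Hashable, List, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]        # (main camera, other camera)
Curve = Sequence[Tuple[float, float]]   # (position along the ray, correlation)


def camera_pairs(cameras: Sequence[Hashable]) -> List[Pair]:
    """All ordered pairs: every camera acts as main camera against every other."""
    return list(permutations(cameras, 2))


def best_positions(curves: Dict[Pair, Curve]) -> Dict[Pair, float]:
    """For each correlation curve, keep the sampled position with the highest score."""
    return {pair: max(curve, key=lambda sample: sample[1])[0]
            for pair, curve in curves.items()}
```

With five cameras, `camera_pairs` yields the 20 pairs mentioned in the text, and each pair contributes one candidate 3D point.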
The main differences between our implementation and the method of Hernández and Schmitt are the following. First, we start the line search from every point on the explicit object surface. In their method, the line search is initiated from every image and the correlation is computed with all the other images, which could be computationally more expensive than ours. Secondly, in their method, for each set of correlation curves computed using the same search direction and the same main camera, only one representative depth estimate is used. In our method, we avoid this potentially premature averaging by using the depth estimates from all the correlation curves, and postpone the outlier pruning to the subsequent outlier removal step. Thirdly, in their method, the depth estimates are stored in an octree-based volumetric grid, while we store them as discrete points whose accuracy is not restricted by the grid size.
2.3. Outlier Removal
Points generated by the above depth estimation step may contain outliers (points that do not belong to the object surface) that have to be removed. Since the real object surface is unknown, it is hard to specify a general criterion to detect outliers. In this paper, we propose to employ a Parzen-window-based nonparametric density estimation method for outlier removal.
where ||x|| is the L2 norm (i.e., Euclidean distance metric) of the d-dimensional vector x. There are three types of commonly used spherical kernel functions: the Epanechnikov kernel, the uniform kernel, and the Gaussian kernel.
where r is half of the length of the minimum axis of the ellipsoidal kernel E.
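As an illustration of density-based outlier pruning, the sketch below uses a plain isotropic Parzen-window estimate. It is a simplification under stated assumptions: the paper's kernel is anisotropic (ellipsoidal), whereas this example uses a spherical kernel with a single bandwidth `h`; normalization constants are dropped because only relative densities are compared; and the `keep_ratio` cutoff rule is hypothetical, not the authors' criterion.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def gaussian_profile(u2: float) -> float:
    """Gaussian kernel profile as a function of squared scaled distance."""
    return math.exp(-0.5 * u2)


def epanechnikov_profile(u2: float) -> float:
    """Epanechnikov kernel profile; zero outside the unit ball."""
    return max(0.0, 1.0 - u2)


def density(x: Point, points: Sequence[Point], h: float,
            profile: Callable[[float], float] = gaussian_profile) -> float:
    """Unnormalized Parzen-window density at x with spherical bandwidth h."""
    total = 0.0
    for p in points:
        u2 = sum((a - b) ** 2 for a, b in zip(x, p)) / (h * h)
        total += profile(u2)
    return total / len(points)


def remove_outliers(points: Sequence[Point], h: float, keep_ratio: float = 0.5,
                    profile: Callable[[float], float] = gaussian_profile) -> List[Point]:
    """Drop points whose density is below keep_ratio times the highest density."""
    densities = [density(p, points, h, profile) for p in points]
    cutoff = keep_ratio * max(densities)
    return [p for p, d in zip(points, densities) if d >= cutoff]
```

Isolated points receive almost no support from their neighbors and fall below the cutoff, while points on a densely sampled surface support each other. An anisotropic kernel would additionally stretch the support along the local surface direction.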
2.4. Implicit Surface Evolution
After outlier removal, the remaining 3D points are used to reconstruct the 3D surface of the object. The shape estimation is conducted in two steps. First, a fast implicit distance function-based region growing method, the tagging algorithm, is employed to create a coarse shape estimate from the 3D points. Next, an explicit surface evolution step is applied to recover the finer geometric details of the object. We briefly review the tagging algorithm in the following; for more details please refer to the original paper. The explicit surface evolution method is discussed in the next section.
The basic idea of the tagging algorithm is to identify as many correct exterior grid points as possible and hence provide a good initial implicit surface, which is represented as an interface that separates the exterior grid points from the interior grid points. There are two main steps in the original tagging algorithm. First, we compute a volumetric unsigned distance field based on the 3D points. This is done by the fast sweeping method. Once we have the volumetric unsigned distance field, the tagging algorithm iteratively grows the set of exterior grid points and stops at the boundary of the object. The algorithm can start from any initial exterior region that is a subset of the true exterior region, for example, an outermost corner grid point of the bounding volume, and iteratively tags all the grid points as exterior or interior points by comparing the closeness to the object boundary of the current grid point with that of its neighboring interior grid points.
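The general flavor of growing an exterior region over a distance field can be shown with a deliberately simplified 2D sketch. It is not the tagging algorithm itself: the distance field is computed by brute force rather than fast sweeping, and the growth rule is a plain flood fill that stops where the unsigned distance drops to a hypothetical `threshold`, instead of the neighbor-comparison rule of the original method.

```python
import math
from collections import deque
from typing import List, Sequence, Set, Tuple

Cell = Tuple[int, int]  # (x, y) grid index


def unsigned_distance(width: int, height: int,
                      points: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Brute-force unsigned distance field; dist[y][x] is the distance to the nearest point."""
    return [[min(math.hypot(x - px, y - py) for px, py in points)
             for x in range(width)]
            for y in range(height)]


def grow_exterior(dist: Sequence[Sequence[float]], threshold: float,
                  seed: Cell = (0, 0)) -> Set[Cell]:
    """Flood-fill from a known exterior seed, stopping at cells close to the points."""
    height, width = len(dist), len(dist[0])
    exterior: Set[Cell] = set()
    queue: deque = deque()
    if dist[seed[1]][seed[0]] > threshold:
        exterior.add(seed)
        queue.append(seed)
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in exterior
                    and dist[ny][nx] > threshold):
                exterior.add((nx, ny))
                queue.append((nx, ny))
    return exterior
```

Grid cells never reached by the fill are treated as interior, and the interface between the two sets is the coarse implicit surface. A threshold rule like this leaks through gaps wider than the threshold, which is one reason the original algorithm uses a more careful comparison.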
2.5. Explicit Surface Evolution
where S is the 3D evolving surface, t is the time parameter, and g(S) is the speed function, defined as the derivative of the point-based density estimate calculated by (1); the evolution is directed along the surface normal vector. The final reconstructed 3D shape is then given by the steady-state solution of the evolution equation. Since the speed function g is dynamically calculated at each time step based on the local point distribution, the accuracy of our evolution method is not limited by the grid resolution, unlike other volumetric image-based surface evolution methods.
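A minimal sketch of such a normal-direction evolution is given below. It assumes a flow in which each surface point moves along its normal with speed g, taken here as the directional derivative of an unnormalized Gaussian density estimate along the normal and approximated by central differences. The fixed per-vertex normals, the bandwidth `h`, the step size, and the function names are all assumptions for illustration; a real implementation would recompute normals from the mesh at every step.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def density(x: Vec, points: Sequence[Vec], h: float) -> float:
    """Unnormalized Gaussian kernel density of the 3D points at x."""
    return sum(math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, p)) / (h * h))
               for p in points)


def speed(x: Vec, n: Vec, points: Sequence[Vec], h: float, delta: float = 1e-3) -> float:
    """Central-difference derivative of the density along the normal n at x."""
    ahead = tuple(a + delta * b for a, b in zip(x, n))
    behind = tuple(a - delta * b for a, b in zip(x, n))
    return (density(ahead, points, h) - density(behind, points, h)) / (2.0 * delta)


def evolve(vertices: Sequence[Vec], normals: Sequence[Vec], points: Sequence[Vec],
           h: float, step: float, iterations: int) -> List[Vec]:
    """Explicit Euler steps of the normal flow: x <- x + step * g(x) * n."""
    current = [tuple(v) for v in vertices]
    for _ in range(iterations):
        updated = []
        for v, n in zip(current, normals):
            g = speed(v, n, points, h)
            updated.append(tuple(a + step * g * b for a, b in zip(v, n)))
        current = updated
    return current
```

The flow stops where the density derivative along the normal vanishes, that is, on the ridge of the point density. Because the speed is evaluated directly from the points, the result is not tied to any grid resolution.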
3. Benchmark Data Evaluation
Running time and reconstruction accuracy.
Dataset | No. of input images | Running time (mins : secs)
 | | 33 : 17
Temple sparse ring | | 29 : 06
 | | 36 : 45
Dino sparse ring | | 32 : 01
4. Conclusion and Future Work
In this paper, we propose an iterative surface evolution algorithm for 3D shape reconstruction from multiview images. The proposed iterative refinement between image correlation-based 3D depth estimation and surface evolution-based shape estimation can significantly reduce the computational time and improve the accuracy of the final reconstructed surface. The benchmark evaluation results are comparable with the state-of-the-art methods.
Currently, our method uses the visual hull as the initial estimate. This requires image segmentation, which may be difficult for some images. We would like to relax this requirement in the future. This might be possible since our algorithm uses iterative refinement, which should be able to start from any coarse shape such as a bounding box or a convex hull.
The authors are very grateful to Seitz et al. for providing the datasets used in the paper and to Daniel Scharstein for helping them evaluate the results on the benchmark datasets. Research was supported in part by the Leonard Wood Institute in cooperation with the U.S. Army Research Laboratory and was accomplished under Cooperative Agreement # LWI-281074, and by the NSF Grant no. CMMI-0856206. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Leonard Wood Institute, the Army Research Laboratory, the Army Research Office, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
- Wang Y, Huang X, Lee CS, et al.: High resolution acquisition, learning and transfer of dynamic 3-D facial expressions. Computer Graphics Forum 2004,23(3):677-686. 10.1111/j.1467-8659.2004.00800.x
- Goesele M, Snavely N, Curless B, Hoppe H, Seitz S: Multi-view stereo for community photo collections. Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV '07), October 2007, Rio de Janeiro, Brazil
- Seitz S, Curless B, Diebel J, Scharstein D, Szeliski R: A comparison and evaluation of multi-view stereo reconstruction algorithms. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), July 2006 1: 519-526.
- Campbell N, Vogiatzis G, Hernández C, Cipolla R: Using multiple hypotheses to improve depth-maps for multi-view stereo. Proceedings of the European Conference on Computer Vision (ECCV '08), 2008 766-779.
- Hernández C, Schmitt F: Silhouette and stereo fusion for 3D object modeling. Computer Vision and Image Understanding 2004,96(3):367-392. 10.1016/j.cviu.2004.03.016
- Vogiatzis G, Hernández C, Torr PHS, Cipolla R: Multiview stereo via volumetric graph-cuts and occlusion robust photo-consistency. IEEE Transactions on Pattern Analysis and Machine Intelligence 2007,29(12):2241-2246.
- Goesele M, Curless B, Seitz S: Multi-view stereo revisited. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), July 2006 2402-2409.
- Hornung A, Kobbelt L: Hierarchical volumetric multi-view stereo reconstruction of manifold surfaces based on dual graph embedding. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), July 2006 503-510.
- Vogiatzis G, Torr P, Cipolla R: Multi-view stereo via volumetric graph-cuts. Proceedings of Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), July 2005, San Diego, Calif, USA 391-398.
- Sinha S, Pollefeys M: Multi-view reconstruction using photo-consistency and exact silhouette constraints: a maximum-flow formulation. Proceedings of 10th IEEE International Conference on Computer Vision (ICCV '05), October 2005 349-356.
- Kolmogorov V, Zabih R: Generalized multi-camera scene reconstruction using graph cuts. Proceedings of the European Conference on Computer Vision (ECCV '02), 2002 3: 82-96.
- Jin H, Soatto S, Yezzi AJ: Multi-view stereo reconstruction of dense shape and complex appearance. International Journal of Computer Vision 2005,63(3):175-189. 10.1007/s11263-005-6876-7
- Faugeras O, Keriven R: Variational principles, surface evolution, PDE's, level set methods, and the stereo problem. IEEE Transactions on Image Processing 1998,7(3):336-344. 10.1109/83.661183
- Soatto S, Yezzi A, Jin H: Tales of shape and radiance in multi-view stereo. Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV '03), October 2003, Nice, France 974-981.
- Jin H, Soatto S, Yezzi A: Multi-view stereo beyond Lambert. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '03), July 2003, Madison, Wis, USA 1: 171-178.
- Lhuillier M, Quan L: A quasi-dense approach to surface reconstruction from uncalibrated images. IEEE Transactions on Pattern Analysis and Machine Intelligence 2005,27(3):418-433.
- Duan YE, Yang L, Qin H, Samaras D: Shape reconstruction from 3D and 2D data using pde-based deformable surfaces. Proceedings of the European Conference on Computer Vision (ECCV '04), May 2004 3: 238-251.
- Hernandez C, Schmitt F: Multi-stereo 3D object reconstruction. Proceedings of 3D Data Processing Visualization and Transmission, June 2002, Padova, Italy 159-166.
- Furukawa Y, Ponce J: Carved visual hulls for image-based modeling. Proceedings of the European Conference on Computer Vision (ECCV '06), May 2006, Graz, Austria 3951: 564-577.
- Furukawa Y, Ponce J: Accurate, dense, and robust multi-view stereopsis. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), July 2007
- Habbecke M, Kobbelt L: A surface-growing approach to multi-view stereo reconstruction. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), June 2007
- Merrell P, Akbarzadeh A, Wang L, et al.: Real-time visibility-based fusion of depth maps. Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV '07), October 2007, Rio de Janeiro, Brazil
- Bradley D, Boubekeur T, Heidrich W: Accurate multi-view reconstruction using robust binocular stereo and surface meshing. Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), July 2008, Anchorage, Alaska, USA
- Quan L, Wang J, Tan P, Yuan L: Image-based modeling by joint segmentation. International Journal of Computer Vision 2007,75(1):135-150. 10.1007/s11263-007-0044-1
- The multi-view stereo evaluation http://vision.middlebury.edu/mview
- Laurentini A: The visual hull concept for silhouette-based image understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence 1994,16(2):150-162. 10.1109/34.273735
- Lorensen WE, Cline HE: Marching cubes: a high resolution 3D surface construction algorithm. Computer Graphics 1987,21(4):163-169. 10.1145/37402.37422
- Comaniciu D, Meer P: Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 2002,24(5):603-619. 10.1109/34.1000236
- Zhao HK, Osher S, Fedkiw R: Fast surface reconstruction using the level set method. Proceedings of IEEE Workshop on Variational and Level Set Methods in Computer Vision, July 2001, Vancouver, Canada 194-201.
- Zhao H, Osher S, Merriman B, Kang M: Implicit and nonparametric shape reconstruction from unorganized data using a variational level set method. Computer Vision and Image Understanding 2000,80(3):295-314. 10.1006/cviu.2000.0875
- Caselles V, Kimmel R, Sapiro G, Sbert C: Three dimensional object modeling via minimal surfaces. Proceedings of the European Conference on Computer Vision (ECCV '96), April 1996, Cambridge, UK 1: 97-106.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.