# An Iterative Surface Evolution Algorithm for Multiview Stereo

Yongjian Xi and Ye Duan

**2010**:274269

https://doi.org/10.1155/2010/274269

© Y. Xi and Y. Duan. 2010

**Received:** 2 August 2009

**Accepted:** 3 March 2010

**Published:** 15 April 2010

## Abstract

We propose a new iterative surface evolution algorithm for multiview stereo. Starting from an embedding space such as the visual hull, we first conduct robust 3D depth estimation (represented as 3D points) based on image correlation. A fast implicit distance function-based region growing method is then employed to extract an initial shape estimate from these 3D points. Next, an explicit surface evolution is conducted to recover the finer geometric details of the shape. The recovered shape is further improved by several iterations between depth estimation and shape reconstruction, similar to the Expectation Maximization (EM) approach. Experiments on benchmark datasets show that our algorithm obtains high-quality reconstruction results that are comparable with state-of-the-art methods, with considerably less computational time and complexity.

## 1. Introduction

Despite significant advances in interactive shape modeling, creating complex, high-quality, realistic-looking 3D models from scratch is still a very challenging task. Recent advances in 3D shape acquisition systems such as laser range scanners and encoded light projecting systems have made direct 3D data acquisition feasible [1]. These active 3D acquisition systems, however, remain expensive. Meanwhile, the price of digital cameras and digital video cameras keeps decreasing while their quality keeps improving, partially due to the intense competition in the huge consumer market. Furthermore, huge numbers of images and videos are added every day to internet sites such as Google, many of which could be used for multiview image-based 3D shape reconstruction [2].

To date, a great deal of research has been conducted in the area of multiview image-based modeling. The recent survey by Seitz et al. [3] gives an excellent review of the state of the art in this area. As summarized by [4], most of the existing algorithms follow a two-stage approach: (1) conduct depth estimation based on local groups of input images; (2) fuse the estimated depth values into a global watertight 3D surface estimation. The depth estimation step is often based on image correlation [5]. The main differences between existing algorithms are in the second stage, the data fusion step, which can be divided into two categories. The first type of data fusion reconstructs the 3D surface by conducting volumetric data segmentation using global energy minimization approaches such as graph cut [6–11], level-set [12–16], or deformable models [5, 17–19]. More recently, other types of data fusion algorithms have been proposed that are based on local surface growing and filtering [2, 20, 21]. Without global optimization, these types of data fusion algorithms can be computationally more efficient [22, 23].

Our algorithm also follows this two-stage process. We propose an iterative refinement scheme that iterates between the depth estimation step and the data fusion step, similar in spirit to the Expectation Maximization (EM) algorithm. Moreover, we propose a novel outlier removal algorithm based on anisotropic kernel density estimation. Our data fusion algorithm integrates fast implicit region growing with high-quality explicit surface evolution; thus it is both fast and accurate.

The rest of the paper is organized as follows. In Section 1.1 we discuss the main differences between our approach and related existing works. Section 2 describes the details of our algorithm. The benchmark data evaluation is shown in Section 3. The paper concludes in Section 4.

### 1.1. Comparison with Related Works

Our work is most closely related to the works of Hernández and Schmitt [5] and Quan et al. [16, 24]. Hernández and Schmitt proposed a deformable model-based reconstruction algorithm [5] that achieves one of the highest-quality reconstructions [3]. The depth estimation of [5] is conducted by rectangular window-based normalized cross-correlation (NCC). The estimated depth values are then discretized into an octree-based volumetric grid. Finally, a gradient vector flow-based deformable model is applied to the volumetric grid to reconstruct the 3D surface.

Our depth estimation follows a pipeline similar to that of [5], with several modifications to further improve its efficiency. We describe these modifications in Section 2.2. Furthermore, unlike [5], we represent the depth estimations as 3D points whose accuracy is not restricted by the resolution of the volumetric grid. Quan et al. [16, 24] also represent the estimated depth values as 3D points. However, unlike our method, they do not have an explicit outlier removal step. Instead, they rely on level-set-based surface evolution with high-order smoothness terms such as Gaussian/mean curvature to overcome noise, which may create surfaces that are too smooth to represent the finer geometric details of the original object. Most recently, Campbell et al. [4] proposed an outlier removal algorithm based on the Markov Random Field (MRF) model, which can achieve very impressive reconstruction results. In contrast, our outlier removal algorithm is based on kernel density estimation and is conducted on unorganized 3D points instead of in the 2D image space of [4].

To summarize, the main contributions of this paper are: (1) a novel iterative refinement scheme between the depth estimation and the data fusion; (2) a novel anisotropic kernel density estimation-based outlier removal algorithm; (3) a novel data fusion algorithm that integrates the fast implicit distance function-based region growing method with the high-quality explicit surface evolution.

## 2. Algorithm

Our algorithm consists of the following five steps:

- (1) visual hull construction,
- (2) 3D point generation,
- (3) outlier removal,
- (4) implicit surface evolution,
- (5) explicit surface evolution.
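The way these five steps combine with the iteration between depth estimation and shape reconstruction can be sketched as a small driver loop. This is only an illustrative sketch: the function name `reconstruct`, the four callables passed in, and the default of three iterations are assumptions made here for illustration (the paper only says "several iterations").

```python
from typing import Any, Callable


def reconstruct(visual_hull: Any,
                generate_points: Callable[[Any], Any],
                remove_outliers: Callable[[Any], Any],
                implicit_evolution: Callable[[Any], Any],
                explicit_evolution: Callable[[Any, Any], Any],
                n_iterations: int = 3) -> Any:
    """Iterate depth estimation and shape reconstruction (hypothetical driver).

    All callables are placeholders for steps (2)-(5); n_iterations is an
    assumed value, not one taken from the paper.
    """
    shape = visual_hull                              # step (1): initial shape
    for _ in range(n_iterations):
        points = generate_points(shape)              # step (2): 3D point generation
        points = remove_outliers(points)             # step (3): outlier removal
        coarse = implicit_evolution(points)          # step (4): coarse implicit shape
        shape = explicit_evolution(coarse, points)   # step (5): refined explicit shape
    return shape
```

Each pass uses the current shape estimate to drive the next round of depth estimation, which is the sense in which the scheme resembles EM.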

### 2.1. Visual Hull Construction

The first step of our algorithm is to obtain an initial shape estimation by constructing a visual hull. The visual hull is an outer approximation of the observed solid, constructed as the intersection of the visual cones associated with all the input cameras [26]. A discrete volumetric representation of the visual hull can be obtained by intersecting the cones generated by back-projecting the object silhouettes from different camera views. An explicit shape representation can then be obtained by iso-surface extraction algorithms such as Marching Cubes [27].
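A minimal sketch of the discrete volumetric visual hull is given below. It assumes calibrated cameras given as 3x4 projection matrices and binary silhouette masks; the function name `carve_visual_hull` and the nearest-pixel lookup are illustrative choices, not details from the paper. A voxel is kept only if it projects inside the silhouette in every view.

```python
import numpy as np


def carve_visual_hull(voxels, cameras, silhouettes):
    """Return a boolean array marking voxels inside every silhouette cone.

    voxels:      (N, 3) voxel centers in world coordinates.
    cameras:     list of 3x4 projection matrices (assumed calibrated).
    silhouettes: list of 2D boolean masks, True on object pixels.
    """
    n = len(voxels)
    inside = np.ones(n, dtype=bool)
    homog = np.hstack([voxels, np.ones((n, 1))])        # (N, 4) homogeneous points
    for P, mask in zip(cameras, silhouettes):
        proj = homog @ P.T                              # (N, 3) homogeneous pixels
        w = proj[:, 2]
        in_front = w > 1e-12                            # ignore points behind the camera
        u = np.full(n, -1, dtype=int)
        v = np.full(n, -1, dtype=int)
        u[in_front] = np.round(proj[in_front, 0] / w[in_front]).astype(int)
        v[in_front] = np.round(proj[in_front, 1] / w[in_front]).astype(int)
        height, width = mask.shape
        valid = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
        hit = np.zeros(n, dtype=bool)
        hit[valid] = mask[v[valid], u[valid]]           # nearest-pixel silhouette lookup
        inside &= hit                                   # intersect the visual cones
    return inside
```

The resulting occupancy grid could then be passed to an iso-surface extractor such as Marching Cubes to obtain the explicit mesh.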

### 2.2. Points Generation

Once we have an initial explicit shape estimation, we proceed to 3D depth estimation. First, we need to estimate the visibility of the initial shape with respect to all the cameras. We use OpenGL to render the explicit surface into the image plane of each individual camera and extract the depth values from the Z-buffer. Given a point on the surface, its visibility with respect to a given camera can then be decided by comparing its projected depth in the image plane of that camera with the corresponding depth value stored in the Z-buffer.
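The depth comparison can be sketched as follows. The sketch assumes that the depth buffer has already been converted to linear camera-space depth in the same units as the scene (a raw OpenGL Z-buffer stores nonlinear values and would need to be linearized first), and that the projection matrix is scaled so that the third homogeneous coordinate equals the depth. The function name `is_visible` and the tolerance `tol` are illustrative assumptions.

```python
import numpy as np


def is_visible(point, P, depth_buffer, tol=1e-2):
    """Return True if `point` is not occluded in the view with projection P.

    P is a 3x4 projection matrix whose third homogeneous output is assumed
    to be the camera-space depth; depth_buffer holds linear depths per pixel.
    """
    u_h, v_h, depth = P @ np.append(point, 1.0)
    if depth <= 0:                                   # behind the camera
        return False
    u, v = int(round(u_h / depth)), int(round(v_h / depth))
    height, width = depth_buffer.shape
    if not (0 <= u < width and 0 <= v < height):     # outside the image
        return False
    # Visible if nothing rendered at this pixel is closer than the point.
    return bool(depth <= depth_buffer[v, u] + tol)
```

The tolerance absorbs rasterization and quantization error; without it, points lying exactly on the rendered surface could be rejected as self-occluded.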

Our depth estimation is based on the Lambertian assumption; that is, if a point belongs to the object surface, its corresponding 2D patches in the image planes of its visible cameras should be strongly correlated. Hence, starting from a point on the object surface, we can conduct a line search along a defined search direction to locate the position, within a certain search range, at which the correlation between the corresponding 2D image patches of different visible cameras is maximal. This idea was first proposed by [5]. Our paper follows the same principle with several modifications. In the following, we briefly describe our depth estimation method as well as the main differences between our method and the method of [5].

Given a point on the initial surface, we select a set of (up to) five "best-view" visible cameras based on the point's estimated surface normal. Each camera in the selected set serves once as the main camera. The search direction is defined as the optical ray passing through the optical center of the main camera and the given point. We uniformly sample the optical ray within a certain range of the given point, and each sampled position is projected into the image planes of the main camera and of another camera in the set. Rectangular image patches centered at the projected locations in the two image planes are extracted, and the correlation between the two image patches is computed by a similarity measure such as the normalized cross-correlation (NCC) [5].
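As an illustration of the similarity measure, the following is a minimal sketch of the standard zero-mean normalized cross-correlation between two equally sized patches. The function name `ncc` and the handling of textureless (constant) patches are choices made here, not details from the paper.

```python
import numpy as np


def ncc(patch_a, patch_b, eps=1e-12):
    """Zero-mean normalized cross-correlation of two equally sized patches.

    Returns a value in [-1, 1]; returns 0.0 when either patch is constant
    (an assumed convention for textureless regions).
    """
    a = np.asarray(patch_a, dtype=float).ravel()
    b = np.asarray(patch_b, dtype=float).ravel()
    a = a - a.mean()                      # remove brightness offset
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom < eps:                       # constant patch: correlation undefined
        return 0.0
    return float(a @ b / denom)           # normalize out contrast
```

Because the mean is subtracted and the result is normalized, the score is invariant to affine changes in patch intensity, which makes it robust to moderate exposure differences between cameras.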

For a set of five "best-view" cameras, a total of 20 correlation curves will be generated, since each of the five cameras serves once as the main camera and is paired with each of the other four. For each of the correlation curves, the best position (i.e., the point with the highest correlation value) is selected as the depth estimation. The depth estimations are represented as 3D points, which are processed further to construct a new shape estimation of the object.
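The line search along the optical ray and the count of correlation curves can be sketched as below. The callable `score` is a hypothetical stand-in for the patch correlation between the main camera and one secondary camera at a sampled 3D position; `half_range` and `n_samples` are assumed parameters whose values the paper does not state.

```python
import numpy as np
from itertools import permutations


def line_search(point, cam_center, score, half_range, n_samples):
    """Sample the optical ray through `point` and return the best position.

    score: hypothetical callable mapping a 3D position to a correlation value.
    Returns (best_position, correlation_curve).
    """
    direction = point - cam_center
    direction = direction / np.linalg.norm(direction)     # unit ray direction
    offsets = np.linspace(-half_range, half_range, n_samples)
    candidates = point + offsets[:, None] * direction     # uniform samples on the ray
    curve = np.array([score(c) for c in candidates])      # one correlation curve
    return candidates[np.argmax(curve)], curve


# Ordered (main, secondary) camera pairs for five best-view cameras.
pairs = list(permutations(range(5), 2))   # 5 * 4 = 20 correlation curves
```

Each of the 20 ordered pairs yields its own curve and its own depth estimate, so a single surface point can contribute up to 20 candidate 3D points to the subsequent outlier removal step.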

The main differences between our implementation and the method of [5] are the following. First, we start the line search from every point on the explicit object surface. The line search in [5] is initiated from every image and the correlation is computed with all the other images, which could be computationally more expensive than ours. Second, in [5], for each set of correlation curves computed using the same search direction and the same main camera, only one representative depth estimation is used. In our method, we avoid this potentially premature averaging by using the depth estimations from all the correlation curves, and postpone the outlier pruning to the subsequent outlier removal step. Third, in [5], the depth estimations are stored in an octree-based volumetric grid, while we store them as discrete points whose accuracy is not restricted by the grid size.

### 2.3. Outlier Removal

Points generated by the above depth estimation step may contain outliers (points that do not belong to the object surface) that have to be removed. Since the real object surface is unknown, it is hard to specify a general criterion to detect outliers. In this paper, we propose to employ a Parzen-window-based nonparametric density estimation method for outlier removal.

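Given $n$ data points $x_i$, $i = 1, \ldots, n$, in the *d*-dimensional Euclidean space $R^d$, the multivariate kernel density estimate obtained with kernel $K$ and window radius $h$ (without loss of generality, let us assume $h = 1$ from now on), computed at the point *x*, is defined as

$$f(x) = \frac{1}{n} \sum_{i=1}^{n} K\left(\lVert x - x_i \rVert^2\right), \tag{1}$$

where $\lVert x \rVert$ is the $L_2$ norm (i.e., Euclidean distance metric) of the *d*-dimensional vector *x*. There are three types of commonly used spherical kernel functions $K$: the Epanechnikov kernel, the uniform kernel, and the Gaussian kernel [28].

For an anisotropic kernel, the Euclidean distance between *x* and $x_i$, $\lVert x - x_i \rVert$, will be replaced by the Mahalanobis distance metric $\lVert x - x_i \rVert_M$:

$$\lVert x - x_i \rVert_M = \left( (x - x_i)^T H^{-1} (x - x_i) \right)^{1/2}, \tag{2}$$

where *H* is a covariance matrix. In three dimensions, the set of points *y* at unit Mahalanobis distance from *x*,

$$(y - x)^T H^{-1} (y - x) = 1, \tag{3}$$

is an ellipsoid centered at *x*, with its shape and orientation defined by *H*. Using Singular Value Decomposition (SVD), the covariance matrix *H* can be further decomposed as

$$H = U A U^T, \qquad A = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3), \tag{4}$$

where $\lambda_1 \ge \lambda_2 \ge \lambda_3$ are the three eigenvalues of the matrix *H*, and *U* is an orthonormal matrix whose columns are the eigenvectors of matrix *H*.

We apply an ellipsoidal kernel *E* of equal size and shape on all the data points. The orientation of the ellipsoidal kernel *E* will be determined locally. More specifically, given a point *x*, we will calculate its covariance matrix *H* by points located in its local spherical neighborhood of a fixed radius. (Without loss of generality, we will assume the radius is 1, which can be done by normalizing the data by the radius.) The *U* matrix of (4) calculated by the covariance analysis is kept unchanged to maintain the orientation of the ellipsoid. The size and shape of the ellipsoid will be modified to be the same as the ellipsoidal kernel *E* by modifying the diagonal matrix *A* as

$$A = \mathrm{diag}(1, 1, r^2), \tag{5}$$

where *r* is half of the length of the minimum axis of the ellipsoidal kernel *E*.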
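The anisotropic density estimate can be sketched as follows. This is a sketch under stated assumptions rather than the authors' implementation: it uses an unnormalized Epanechnikov profile, takes the squared semi-axes of the ellipsoidal kernel to be (1, 1, r²) after normalizing by the neighborhood radius, and assigns zero density to points with fewer than four neighbors. The names `epanechnikov` and `anisotropic_density` and the default `r=0.3` are illustrative.

```python
import numpy as np


def epanechnikov(d2):
    """Unnormalized Epanechnikov profile: 1 - d^2 inside the unit ball, else 0."""
    return np.where(d2 < 1.0, 1.0 - d2, 0.0)


def anisotropic_density(points, r=0.3, radius=1.0):
    """Kernel density at every point using a locally oriented ellipsoidal kernel."""
    pts = np.asarray(points, dtype=float) / radius      # normalize so the radius is 1
    a_inv = np.diag([1.0, 1.0, 1.0 / r**2])             # assumed squared semi-axes (1, 1, r^2)
    density = np.zeros(len(pts))
    for i, x in enumerate(pts):
        diff = pts - x
        d2_euclid = np.einsum("ij,ij->i", diff, diff)
        nbrs = pts[d2_euclid <= 1.0]                    # local spherical neighborhood
        if len(nbrs) < 4:
            continue                                    # assumed: too isolated, density 0
        H = np.cov(nbrs.T)                              # local 3x3 covariance matrix
        U, _, _ = np.linalg.svd(H)                      # columns sorted by decreasing variance
        H_inv = U @ a_inv @ U.T                         # keep orientation U, fix size and shape
        d2 = np.einsum("ij,jk,ik->i", diff, H_inv, diff)  # squared Mahalanobis distances
        density[i] = epanechnikov(d2).mean()
    return density
```

Because the shortest axis of the ellipsoid follows the direction of least local variance, which for a well-sampled surface patch is approximately the surface normal, points lying off the local surface contribute little to the density. Points whose density falls below some threshold could then be discarded as outliers; the threshold is not stated in the text, so any specific value would be an assumption.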

### 2.4. Implicit Surface Evolution

After outlier removal, the remaining 3D points are used to reconstruct the 3D surface of the object. The shape estimation is conducted in two steps. First, a fast implicit distance function-based region growing method, the tagging algorithm [29], is employed to create a coarse shape estimation from the 3D points. Next, an explicit surface evolution step is applied to recover the finer geometric details of the object. We briefly review the tagging algorithm below; for more details, please refer to the original paper [29]. The explicit surface evolution method is discussed in the next section.

The basic idea of the tagging algorithm is to identify as many correct exterior grid points as possible and hence provide a good initial implicit surface, which is represented as an interface that separates the exterior grid points from the interior grid points. There are two main steps in the original tagging algorithm. First, we compute a volumetric unsigned distance field based on the 3D points. This is done by the fast sweeping method [30]. Once we have the volumetric unsigned distance field, the tagging algorithm iteratively grows the set of exterior grid points and stops at the boundary of the object. The algorithm can start from any initial exterior region that is a subset of the true exterior region, for example, an outermost corner grid point of the bounding volume, and iteratively tags all the grid points as exterior or interior points by comparing the closeness to the object boundary of the current grid point with that of its neighboring interior grid points.
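The two ingredients, an unsigned distance field and an exterior region grown from a corner, can be illustrated with a deliberately simplified sketch. It is not the tagging algorithm of [29]: the distance field is computed by brute force instead of fast sweeping, and the exterior is grown by a plain flood fill that refuses to enter grid points closer to the data than a fixed `threshold` (an assumed parameter), rather than by the neighbor comparison described above.

```python
import numpy as np
from collections import deque


def unsigned_distance_field(points, axis_coords):
    """Brute-force unsigned distance from every grid point to the point set."""
    gx, gy, gz = np.meshgrid(axis_coords, axis_coords, axis_coords, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    d2 = (np.sum(grid**2, axis=1)[:, None]
          + np.sum(points**2, axis=1)[None, :]
          - 2.0 * grid @ points.T)                      # squared distances, shape (G, M)
    dist = np.sqrt(np.maximum(d2.min(axis=1), 0.0))
    n = len(axis_coords)
    return dist.reshape(n, n, n)


def grow_exterior(dist, seed, threshold):
    """Flood-fill the exterior from `seed`, stopping near the data points."""
    exterior = np.zeros(dist.shape, dtype=bool)
    exterior[seed] = True
    queue = deque([seed])
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        i, j, k = queue.popleft()
        for di, dj, dk in steps:
            nb = (i + di, j + dj, k + dk)
            in_bounds = all(0 <= c < s for c, s in zip(nb, dist.shape))
            if in_bounds and not exterior[nb] and dist[nb] > threshold:
                exterior[nb] = True                     # still far from the data: exterior
                queue.append(nb)
    return exterior
```

Grid points never reached by the fill are treated as interior, and the interface between the two sets gives a coarse implicit surface. The threshold must exceed roughly half the grid spacing plus the sampling gap of the point set, otherwise the fill can leak through the surface; the actual tagging algorithm avoids a fixed threshold by comparing distances between neighboring grid points.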

### 2.5. Explicit Surface Evolution

The explicit surface evolution is formulated as the partial differential equation

$$\frac{\partial S}{\partial t} = g(S)\,\mathbf{N},$$

where $S$ is the 3D evolving surface, *t* is the time parameter, *g*(*S*) is the speed function, defined as the derivative of $f$, the point-based density estimation calculated by (1), and $\mathbf{N}$ is the surface normal vector. The final reconstructed 3D shape is then given by the steady-state solution of the equation, $\partial S / \partial t = 0$. Since the speed function *g* is dynamically calculated at each time step based on the local point distribution, the accuracy of our evolution method is not limited by a grid resolution, unlike volumetric image-based surface evolution methods such as [5].
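A minimal sketch of this evolution for a set of surface vertices is given below. Several choices are assumptions made for illustration: the speed is taken as the directional derivative of the density along the normal (estimated by central differences), the density uses an isotropic Gaussian kernel with bandwidth `h` instead of the anisotropic kernel, the normals are held fixed rather than recomputed from the mesh at each step, and the time integration is explicit Euler with an assumed step `dt`.

```python
import numpy as np


def density(x, points, h):
    """Isotropic Gaussian kernel density at x (assumed stand-in for f)."""
    d2 = np.sum((points - x) ** 2, axis=1)
    return np.mean(np.exp(-d2 / (2.0 * h * h)))


def evolve(vertices, normals, points, h=0.2, dt=1.0, n_steps=50, eps=1e-3):
    """Move each vertex along its normal with speed g = df/dN (explicit Euler)."""
    S = np.array(vertices, dtype=float)
    for _ in range(n_steps):
        for i in range(len(S)):
            n = normals[i]
            # Central-difference estimate of the density derivative along the normal.
            g = (density(S[i] + eps * n, points, h)
                 - density(S[i] - eps * n, points, h)) / (2.0 * eps)
            S[i] = S[i] + dt * g * n
    return S
```

Each vertex climbs the density along its normal and comes to rest where the derivative vanishes, which corresponds to the steady state of the evolution equation. A full implementation would also recompute normals and maintain mesh quality as the surface deforms.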

## 3. Benchmark Data Evaluation

Running time and reconstruction accuracy.

| Dataset | Running time (min:sec) | No. of input images | Accuracy |
|---|---|---|---|
| Temple ring | 33:17 | 47 | 98.9% |
| Temple sparse ring | 29:06 | 16 | 96.8% |
| Dino ring | 36:45 | 48 | 97.7% |
| Dino sparse ring | 32:01 | 16 | 97.6% |

## 4. Conclusion and Future Work

In this paper, we propose an iterative surface evolution algorithm for 3D shape reconstruction from multiview images. The proposed iterative refinement between image correlation-based 3D depth estimation and surface evolution-based shape estimation can significantly reduce the computational time and improve the accuracy of the final reconstructed surface. The benchmark evaluation results are comparable with state-of-the-art methods.

Currently, our method utilizes the visual hull for initial estimation. This requires image segmentation, which may be difficult for some images. We would like to relax this requirement in the future. This might be possible since our algorithm uses iterative refinement, which should be able to start from any coarse shape such as a bounding box or a convex hull.

## Declarations

### Acknowledgments

The authors are very grateful to Seitz et al. [3] for providing the datasets used in the paper and to Daniel Scharstein for helping them evaluate the results on the benchmark datasets. Research was supported in part by the Leonard Wood Institute in cooperation with the U.S. Army Research Laboratory and was accomplished under Cooperative Agreement # LWI-281074, and by the NSF Grant no. CMMI-0856206. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Leonard Wood Institute, the Army Research Laboratory, the Army Research Office, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

## References

1. Wang Y, Huang X, Lee CS, *et al.*: **High resolution acquisition, learning and transfer of dynamic 3-D facial expressions.** *Computer Graphics Forum* 2004, **23**(3):677-686. 10.1111/j.1467-8659.2004.00800.x
2. Goesele M, Snavely N, Curless B, Hoppe H, Seitz S: **Multi-view stereo for community photo collections.** *Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV '07), October 2007, Rio de Janeiro, Brazil*
3. Seitz S, Curless B, Diebel J, Scharstein D, Szeliski R: **A comparison and evaluation of multi-view stereo reconstruction algorithms.** *Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), July 2006* **1:** 519-526.
4. Campbell N, Vogiatzis G, Hernández C, Cipolla R: **Using multiple hypotheses to improve depth-maps for multi-view stereo.** *Proceedings of the European Conference on Computer Vision (ECCV '08), 2008* 766-779.
5. Hernández C, Schmitt F: **Silhouette and stereo fusion for 3D object modeling.** *Computer Vision and Image Understanding* 2004, **96**(3):367-392. 10.1016/j.cviu.2004.03.016
6. Vogiatzis G, Hernández C, Torr PHS, Cipolla R: **Multiview stereo via volumetric graph-cuts and occlusion robust photo-consistency.** *IEEE Transactions on Pattern Analysis and Machine Intelligence* 2007, **29**(12):2241-2246.
7. Goesele M, Curless B, Seitz S: **Multi-view stereo revisited.** *Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), July 2006* 2402-2409.
8. Hornung A, Kobbelt L: **Hierarchical volumetric multi-view stereo reconstruction of manifold surfaces based on dual graph embedding.** *Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), July 2006* 503-510.
9. Vogiatzis G, Torr P, Cipolla R: **Multi-view stereo via volumetric graph-cuts.** *Proceedings of Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), July 2005, San Diego, Calif, USA* 391-398.
10. Sinha S, Pollefeys M: **Multi-view reconstruction using photo-consistency and exact silhouette constraints: a maximum-flow formulation.** *Proceedings of 10th IEEE International Conference on Computer Vision (ICCV '05), October 2005* 349-356.
11. Kolmogorov V, Zabih R: **Generalized multi-camera scene reconstruction using graph cuts.** *Proceedings of the European Conference on Computer Vision (ECCV '02), 2002* **3:** 82-96.
12. Jin H, Soatto S, Yezzi AJ: **Multi-view stereo reconstruction of dense shape and complex appearance.** *International Journal of Computer Vision* 2005, **63**(3):175-189. 10.1007/s11263-005-6876-7
13. Faugeras O, Keriven R: **Variational principles, surface evolution, PDE's, level set methods, and the stereo problem.** *IEEE Transactions on Image Processing* 1998, **7**(3):336-344. 10.1109/83.661183
14. Soatto S, Yezzi A, Jin H: **Tales of shape and radiance in multi-view stereo.** *Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV '03), October 2003, Nice, France* 974-981.
15. Jin H, Soatto S, Yezzi A: **Multi-view stereo beyond Lambert.** *Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '03), July 2003, Madison, Wis, USA* **1:** 171-178.
16. Lhuillier M, Quan L: **A quasi-dense approach to surface reconstruction from uncalibrated images.** *IEEE Transactions on Pattern Analysis and Machine Intelligence* 2005, **27**(3):418-433.
17. Duan YE, Yang L, Qin H, Samaras D: **Shape reconstruction from 3D and 2D data using pde-based deformable surfaces.** *Proceedings of the European Conference on Computer Vision (ECCV '04), May 2004* **3:** 238-251.
18. Hernandez C, Schmitt F: **Multi-stereo 3D object reconstruction.** *Proceedings of 3D Data Processing Visualization and Transmission, June 2002, Padova, Italy* 159-166.
19. Furukawa Y, Ponce J: **Carved visual hulls for image-based modeling.** *Proceedings of the European Conference on Computer Vision (ECCV '06), May 2006, Graz, Austria* **3951:** 564-577.
20. Furukawa Y, Ponce J: **Accurate, dense, and robust multi-view stereopsis.** *Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), July 2007*
21. Habbecke M, Kobbelt L: **A surface-growing approach to multi-view stereo reconstruction.** *Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), June 2007*
22. Merrell P, Akbarzadeh A, Wang L, *et al.*: **Real-time visibility-based fusion of depth maps.** *Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV '07), October 2007, Rio de Janeiro, Brazil*
23. Bradley D, Boubekeur T, Heidrich W: **Accurate multi-view reconstruction using robust binocular stereo and surface meshing.** *Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), July 2008, Anchorage, Alaska, USA*
24. Quan L, Wang J, Tan P, Yuan L: **Image-based modeling by joint segmentation.** *International Journal of Computer Vision* 2007, **75**(1):135-150. 10.1007/s11263-007-0044-1
25. **The multi-view stereo evaluation** http://vision.middlebury.edu/mview
26. Laurentini A: **The visual hull concept for silhouette-based image understanding.** *IEEE Transactions on Pattern Analysis and Machine Intelligence* 1994, **16**(2):150-162. 10.1109/34.273735
27. Lorensen WE, Cline HE: **Marching cubes: a high resolution 3D surface construction algorithm.** *Computer Graphics* 1987, **21**(4):163-169. 10.1145/37402.37422
28. Comaniciu D, Meer P: **Mean shift: a robust approach toward feature space analysis.** *IEEE Transactions on Pattern Analysis and Machine Intelligence* 2002, **24**(5):603-619. 10.1109/34.1000236
29. Zhao HK, Osher S, Fedkiw R: **Fast surface reconstruction using the level set method.** *Proceedings of IEEE Workshop on Variational and Level Set Methods in Computer Vision, July 2001, Vancouver, Canada* 194-201.
30. Zhao H, Osher S, Merriman B, Kang M: **Implicit and nonparametric shape reconstruction from unorganized data using a variational level set method.** *Computer Vision and Image Understanding* 2000, **80**(3):295-314. 10.1006/cviu.2000.0875
31. Caselles V, Kimmel R, Sapiro G, Sbert C: **Three dimensional object modeling via minimal surfaces.** *Proceedings of the European Conference on Computer Vision (ECCV '96), April 1996, Cambridge, UK* **1:** 97-106.

## Copyright

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.