Real-time virtual view synthesis using light field
© The Author(s) 2016
Received: 30 June 2016
Accepted: 6 September 2016
Published: 15 September 2016
Abstract

Virtual view synthesis renders a virtual view image from several pre-collected viewpoint images. The current hotspot in the virtual view synthesis area is depth image-based rendering (DIBR), which suffers from low one-time imaging quality: to achieve high imaging quality, the artifacts and holes left after image warping must be inpainted, which entails high computational complexity. This paper proposes a real-time virtual view synthesis method based on the light field. The light field is parameterized and reconstructed from an image array and then transformed into the frequency domain. The virtual view is rendered by resampling the light field in the frequency domain: a Fourier slice is taken, and the virtual view image is obtained by an inverse Fourier transform. Experiments show that our method achieves high one-time imaging quality in real time.
1 Introduction

For many modern applications, including special effects in film and television, surveillance, and virtual reality, it is often desirable to generate a high-quality virtual view image of a 3D scene from a viewpoint for which no direct information is available. The virtual view is synthesized from a number of known reference views (RVs). Depth image-based rendering (DIBR) is the hotspot technique in the virtual view synthesis field. Each pixel in an RV carries corresponding depth information, and the core of DIBR is the 3D image warping theory proposed by McMillan [1]: the virtual view is synthesized by re-projecting every pixel of the reference images in space.
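To make the warping step concrete, the following is a minimal sketch of depth-based forward warping in Python with NumPy (the function and parameter names are our own illustration, not McMillan's formulation). It re-projects each reference pixel through its depth into a hypothetical virtual camera and leaves unfilled pixels as holes, which is exactly where the repair work discussed below originates.

```python
import numpy as np

def warp_to_virtual_view(ref_image, depth, K_ref, K_virt, R, t):
    """Forward-warp a reference view into a virtual view using its depth map.

    ref_image: (H, W, 3) reference view; depth: (H, W) per-pixel depth;
    K_ref, K_virt: 3x3 camera intrinsics; R, t: rotation and translation
    from the reference camera to the virtual camera.
    """
    H, W = depth.shape
    # Homogeneous pixel grid of the reference view.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])
    # Back-project to 3D with the depth map, move to the virtual camera,
    # and project again (3D image warping in matrix form).
    pts = np.linalg.inv(K_ref) @ pix * depth.ravel()
    proj = K_virt @ (R @ pts + t.reshape(3, 1))
    x = np.round(proj[0] / proj[2]).astype(int)
    y = np.round(proj[1] / proj[2]).astype(int)
    # Scatter the colors; pixels that receive no ray stay black (holes).
    virt = np.zeros_like(ref_image)
    ok = (x >= 0) & (x < W) & (y >= 0) & (y < H)
    virt[y[ok], x[ok]] = ref_image.reshape(-1, 3)[ok]
    return virt
```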
The main drawback of DIBR is low one-time imaging quality. Holes [2, 3], artifacts [4, 5], deformation [6], and abnormal edges [7] appear after image warping, and the repair work for these issues, such as inpainting, has high computational complexity. Although there has been much recent work on improving imaging quality in real time [8], Ho [9] and Xiao [10] reduced computational complexity and improved operational efficiency but still could not achieve real-time performance, while Mori [11] and Zinger [12] enhanced imaging quality at the cost of further increased computational complexity.
Levoy [13], the proposer of light field rendering theory, called the light field the "next-generation display technology" and introduced it into computer graphics. Studies of the light field have produced good results in many research areas, such as microscopic imaging [14], light field cameras [15], 3D reconstruction of complex scenes [16], and accurate depth estimation [17]. The light field has shown great potential in these works.
This paper applies light field theory to virtual view synthesis for the first time. Our method synthesizes a high-quality virtual view from the light field in real time.
2 Parameterization and collection of light field
Traditional cameras capture a 2D digital image in which each pixel is the energy integration of all rays that reach the corresponding scene point; the directions of these rays are not distinguished. The image is a projection of the 3D scene, so the directions and positions of the rays are lost. The light field retains both the directions and the positions of the rays in the scene.
2.2 Light field collection based on camera array
3.1 Light field reconstruction
In the conventional imaging process, light is projected onto the image plane, so the positions of rays on the s-plane are recorded but their positions on the u-plane are not. In a camera array, by contrast, the imaging process yields an array of images of the same scene. This is equivalent to placing many main lenses on the u-plane in Fig. 4, which means the positions of rays on the u-plane are also recorded. Therefore, using the spatial relationships between the cameras and the imaging Eq. (3), a light field can be captured.
The formula expresses that an image can be regarded as a slice of the 4D light field projected onto a 2D plane. Therefore, each image in the image array can be regarded as such a slice of the light field of the scene.
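In code, this relationship makes light field reconstruction from an image array almost trivial. The sketch below (our own illustration, with hypothetical file handling) stacks a U × V grid of camera images into a 4D array L(u, v, s, t), so that camera (u, v)'s image is literally the slice L(u, v, ·, ·).

```python
import numpy as np
import cv2

def build_light_field(image_paths, U, V):
    """Stack a U x V camera array into a 4D light field L(u, v, s, t).

    image_paths: U*V file paths ordered row by row over the camera grid.
    Each camera sits at one (u, v) position on the u-plane, so its image
    supplies the 2D slice L(u, v, :, :) of the light field.
    """
    imgs = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in image_paths]
    H, W = imgs[0].shape
    return np.asarray(imgs, dtype=np.float32).reshape(U, V, H, W)
```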
3.2 Resampling in frequency domain
Although the spatial relationship between an image and the light field can be described visually in the space domain, the description itself is an integral projection, and algorithms for this process usually have high computational complexity, O(n⁴). In contrast, the relationship between images and the light field in the frequency domain can be expressed simply: an image is a 2D slice of the 4D light field in the Fourier domain. This conclusion stems from the Fourier slice theorem proposed by Ron Bracewell [23] in 1956, a classical theorem that later made great contributions to medical imaging, computed tomography, and positron emission imaging.
The process of virtual view synthesis can therefore be transformed equivalently from the space domain to the frequency domain.
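The pipeline can be sketched in a few lines of NumPy. This is only an illustration of the principle under the simplest assumption, a slice through zero angular frequency, which reproduces the view focused on the s-plane; the slice used for an arbitrary virtual viewpoint is a sheared 2D plane through the 4D spectrum, and the names below are ours.

```python
import numpy as np

def precompute_spectrum(light_field):
    """One-off preprocessing: 4D FFT of the light field L(u, v, s, t)."""
    return np.fft.fftn(light_field)

def render_virtual_view(spectrum):
    """Render a view by the Fourier slice theorem: take a 2D slice of the
    4D spectrum, then apply an inverse 2D FFT.

    Fixing the angular frequencies at zero integrates over (u, v), i.e.,
    it yields the image focused on the s-plane. A general viewpoint
    corresponds to a sheared slice, but the per-view cost stays at one
    2D slice plus one inverse 2D FFT instead of an O(n^4) integral
    projection in the space domain.
    """
    slice_2d = spectrum[0, 0, :, :]
    return np.real(np.fft.ifft2(slice_2d))
```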
4 Results and discussion
4.1 Experiment environment
Table 1 The software and hardware environment

CPU: Intel(R) Core(TM) i7-4790K @ 4.00 GHz
GPU: NVIDIA(R) GeForce GTX TITAN X (16 GB)
Memory: 16 GB DDR3 1600 MHz
OS: Windows 10 Professional x64
Software: OpenGL, C++, OpenCV 2.4.11, CUDA 7.0
4.2 Experiment 1: open dataset
Table 2 Image array information

Array size: 17 × 17 (all five datasets)
Image resolution: 768 × 1024, 1400 × 800, 1024 × 1024, 1024 × 1024, 640 × 1024
Single image size (KB): 939~968, 1265~1274, 1657~1687, 1263~1280, 873~883
Total size (MB)
4.2.1 Real-time performance experiment
In this section, real-time performance is evaluated in frames per second (fps): the more frames rendered per second, the better the real-time performance.
- 1) The viewpoint moves along the red path in Fig. 10.
This process lasts 18 s. From start to end, the fps is calculated every second, from which the average fps and the one-time imaging time are obtained.
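For reference, the bookkeeping behind this measurement can be as simple as the following sketch, where render_frame is a hypothetical callback that synthesizes one virtual view.

```python
import time

def measure_fps(render_frame, duration_s=18):
    """Count frames rendered in each wall-clock second and average them."""
    per_second = []
    frames, t0 = 0, time.time()
    while len(per_second) < duration_s:
        render_frame()                 # synthesize one virtual view
        frames += 1
        if time.time() - t0 >= 1.0:    # one second has elapsed
            per_second.append(frames)
            frames, t0 = 0, time.time()
    avg_fps = sum(per_second) / len(per_second)
    return avg_fps, 1000.0 / avg_fps   # average fps, per-frame time (ms)
```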
Figure 10 is a visualization of the (u, v) plane of the parameterized light field. The dark square is the light field we constructed; each dark point in the square represents the center of a camera in the camera array. The white point at the upper left is where the viewpoint starts, and the black point at the lower left is where it ends. The blue point at the upper left means that the viewpoint is closest to that camera. The viewpoint moves along the red broken line, and its trajectory covers all camera points from start to end.
4.2.2 Vision effect experiment
As shown in Fig. 12, the virtual view images of amethyst and chess are similar to the real images, while the synthesized image of ball is obviously blurred.
When evaluating the quality of synthesized images, we usually compare the real view image with the synthesized image: the more similar they are, the better the visual effect. PSNR is a common objective evaluation standard for image quality, measured in dB; the greater the value, the better the synthesis quality.
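As a reference, PSNR for 8-bit images is computed from the mean squared error as follows (a standard definition, not code from the paper).

```python
import numpy as np

def psnr(reference, synthesized, peak=255.0):
    """PSNR in dB between a real view and a synthesized view (8-bit)."""
    diff = reference.astype(np.float64) - synthesized.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # images are identical
    return 10.0 * np.log10(peak ** 2 / mse)
```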
This dataset is a close-up image array: compared with amethyst and chess, which have the same spacing between cameras, the rays captured in ball have larger angles of incidence, which results in an attenuation effect of the incident light.
The scene also contains a relatively large transparent object [25], which has a negative influence on resampling. This is an open problem in this area; our algorithm applies no special treatment to this case, which lowers the imaging quality.
4.3 Experiment 2: simulated dataset
In this experiment, we compare the real-time performance and imaging quality of our algorithm with those of a DIBR-based virtual view synthesis algorithm. Limited by current studies, the data used by the two virtual view synthesis methods are not interchangeable; therefore, a simulated scene is used in this experiment.
Most DIBR-based studies in recent years are improvements on the framework proposed by Do [24], for example, depth image preprocessing [27], improved hole-filling algorithms [28], resource allocation [9], and parallel acceleration. In this experiment, we therefore selected the DIBR synthesis algorithm proposed by Do [24].
4.3.1 Real-time performance experiment
Table 3 Comparison between DIBR and light field

Do 1 (CPU)
Do 2 (CUDA)
4.3.2 Vision effect comparison
As shown in Fig. 14, column (a) shows the real image and details at the virtual view position, column (b) the images synthesized by Do's algorithm, and column (c) the images synthesized by our algorithm. In the cups scene, our result is much more similar to the ground truth at the right edge of the cup (the first and second rows in Fig. 14). In the bowling scene, ghost edges and blurring appear on the bowling pins because of the specular surface, while our result remains good.
Objective evaluation results are shown in rows 3 to 5 of Table 3. The differences in PSNR, MSE, and SSIM between our method and Do's method are small, but our algorithm achieves its visual result with better real-time performance.
4.4 Experiment 3: light field compression
Table 4 Light field compression result

Original light field (GB)
Compressed light field (MB)
Compression rate (%)
5 Conclusions

The traditional DIBR-based virtual view synthesis method has low one-time imaging quality, and dealing with artifacts, holes, and other issues is time-consuming. Our light-field-based virtual view synthesis method provides better one-time imaging quality, and no repair work is needed after first-time imaging, which means lower computational complexity; thus, high-quality, real-time virtual view synthesis is achieved in this paper. Experiments show that the virtual view synthesis method proposed here provides good real-time performance and objective image quality using the light field. However, when the scene contains transparent objects, the imaging quality is not high enough. Tao [25] has made some attempts at this issue, which is also the direction of our next study. Moreover, our method currently operates only on image arrays; the amount of data will be even larger for videos, so our next work is light field compression.
Acknowledgements

This work is supported by the National High-tech R&D Program of China (863 Program) (Grant No. 2015AA015904), the China Postdoctoral Science Foundation funded project (2015M571640), the Special grade of the China Postdoctoral Science Foundation funded project (2016T90408), the CCF-Tencent Open Fund (RAGR20150120), and the Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase).
Authors' contributions

LY implemented the core algorithm and drafted the manuscript. WX participated in light field reconstruction and helped to draft the manuscript. YL participated in the light field compression. All authors read and approved the final manuscript.
Competing interests

The authors declare that they have no competing interests.
Open Access

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References

1. L. McMillan, An Image-Based Approach to Three-Dimensional Computer Graphics (University of North Carolina, Chapel Hill, 1997)
2. K.J. Oh, S. Yea, Y.S. Ho, Hole filling method using depth based in-painting for view synthesis in free viewpoint television and 3-D video, in Picture Coding Symposium (2009), pp. 1-4
3. S. Horst, Q. Matthias, B. Jörg, G. Karsten, M. Peter, Inter-view consistent hole filling in view extrapolation for multi-view image generation, in IEEE International Conference on Image Processing, vol. 22 (2015), p. 426
4. K. Muller, A. Smolic, K. Dix, P. Kauff, Reliability-based generation and view synthesis in layered depth video, in IEEE Workshop on Multimedia Signal Processing (IEEE, 2008), pp. 409-427
5. A. Smolic, K. Mueller, P. Merkle, et al., Multi-view video plus depth (MVD) format for advanced 3D video systems. ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, Doc. 2127 (2007)
6. G. Chaurasia, O. Sorkine, G. Drettakis, Silhouette-aware warping for image-based rendering, in Eurographics Symposium on Rendering, vol. 30 (Eurographics Association, 2011), pp. 1223-1232
7. P. Merkle, Y. Morvan, A. Smolic, D. Farin, K. Müller, P.H.N. de With, et al., The effects of multiview depth video compression on multiview rendering. Signal Process. Image Commun. 24(1-2), 73-88 (2009)
8. M. Sharma, S. Chaudhury, B. Lall, M.S. Venkatesh, A flexible architecture for multi-view 3DTV based on uncalibrated cameras. J. Vis. Commun. Image Represent. 25(4), 599-621 (2014)
9. T.Y. Ho, D.N. Yang, W. Liao, Efficient resource allocation of mobile multi-view 3D videos with depth image-based rendering. IEEE Trans. Mob. Comput. 14(2), 344-357 (2015)
10. J. Xiao, M.M. Hannuksela, T. Tillo, M. Gabbouj, Scalable bit allocation between texture and depth views for 3-D video streaming over heterogeneous networks. IEEE Trans. Circuits Syst. Video Technol. 25(1), 139-152 (2015)
11. Y. Mori, N. Fukushima, T. Yendo, T. Fujii, M. Tanimoto, View generation with 3D warping using depth information for FTV. Signal Process. Image Commun. 24(1-2), 65-72 (2009)
12. S. Zinger, L. Do, P.H.N. de With, Free-viewpoint depth image based rendering. J. Vis. Commun. Image Represent. 21(5-6), 533-541 (2010)
13. M. Levoy, P. Hanrahan, Light field rendering, in Proceedings of SIGGRAPH '96 (ACM, 1996), pp. 31-42
14. M. Levoy, R. Ng, A. Adams, M. Footer, M. Horowitz, Light field microscopy. ACM Trans. Graph. 25(3), 924-934 (2006)
15. R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, P. Hanrahan, Light field photography with a hand-held plenoptic camera. Stanford University Computer Science Tech Report CSTR 2005-02 (2005)
16. C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, M. Gross, Scene reconstruction from high spatio-angular resolution light fields. ACM Trans. Graph. 32(4), 96 (2013)
17. M.W. Tao, P.P. Srinivasan, J. Malik, S. Rusinkiewicz, R. Ramamoorthi, Depth from shading, defocus, and correspondence using light-field angular coherence, in IEEE Conference on Computer Vision and Pattern Recognition (IEEE Computer Society, 2015), pp. 1940-1948
18. E.H. Adelson, J.R. Bergen, The Plenoptic Function and the Elements of Early Vision (Vision and Modeling Group, Media Laboratory, Massachusetts Institute of Technology, 1991)
19. S.J. Gortler, R. Grzeszczuk, R. Szeliski, M.F. Cohen, The lumigraph, in Proceedings of SIGGRAPH '96 (ACM, 1996), pp. 43-54
20. B. Wilburn, N. Joshi, V. Vaish, E.V. Talvala, E. Antunez, A. Barth, et al., High performance imaging using large camera arrays. ACM Trans. Graph. 24(3), 765-776 (2005)
21. L. Stroebel et al., Basic Photographic Materials and Processes (Focal Press)
22. R. Ng, Digital Light Field Photography. PhD dissertation (Stanford University, 2006)
23. R.N. Bracewell, The Fourier Transform and Its Applications (WCB/McGraw-Hill)
24. L. Do, S. Zinger, P.H.N. de With, Quality improving techniques for free-viewpoint DIBR. Proc. SPIE 7524, 1-4 (2009)
25. M.W. Tao, T.C. Wang, J. Malik, R. Ramamoorthi, Depth estimation for glossy surfaces with light-field cameras, in Computer Vision - ECCV 2014 Workshops (Springer International Publishing, 2014)
26. M. Magnor, B. Girod, Data compression for light-field rendering. IEEE Trans. Circuits Syst. Video Technol. 10(3), 338-343 (2000)
27. Y. Cho, K. Seo, K.S. Park, Enhancing depth accuracy on the region of interest in a scene for depth image based rendering. KSII Trans. Internet Inf. Syst. 8(7), 2434-2448 (2014)
28. B. Kim, M. Hong, An adaptive interpolation algorithm for hole-filling in free viewpoint video. J. Meas. Sci. Instrum. 8681(4), 343-345 (2013)