In this paper, we propose a real-time virtual view synthesis method based on the light field; the algorithm flowchart is shown in Fig. 3. First, the light field is reconstructed from the image array captured by a camera array. Then, a virtual view image is synthesized by resampling the reconstructed light field; once the light field has been reconstructed, this synthesis runs in real time.
Light field reconstruction
The photosensitive device inside a camera captures images; each position on the sensor integrates all of the light irradiating it. Figure 4a shows how a pixel on a camera sensor is formed: the u plane is the plane of the camera's main lens and the s plane is the image plane of the camera; the thin lines represent the rays converged onto one pixel of the imaging plane by the main lens. Figure 4b describes a pixel in the s plane formed by overlaying a series of rays passing through the main lens in the u plane. The box in Fig. 4b represents all rays that pass through the main lens and are imaged on the s plane. The line inside the box intersecting the s axis at point (a, 0) means that a series of rays passing through the u plane forms pixel (a, 0), which is the pixel in Fig. 4a. Here, for convenience of explanation, this example uses only one dimension each of plane (s, t) and plane (u, v); in practice, the imaging process also includes the vertical dimensions t and v of the light.
In the conventional imaging process, light is projected onto the image plane, so the position of each ray on the s plane is recorded but its position on the u plane is not. In a camera array, however, the imaging process produces multiple images of the same scene. This is equivalent to placing many main lenses on the u plane in Fig. 4, which means the position information of light on the u plane is also recorded. Therefore, using the cameras' spatial relationships and the imaging Eq. (3), a light field is captured.
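To make this concrete, the sketch below assembles a camera-array image grid into the two-plane representation L(u, v, s, t). The function name build_light_field and the nested-list grid layout are our own illustrative assumptions, not part of the original method; rectification of the views is assumed to have been done already.

```python
import numpy as np

def build_light_field(image_grid):
    """Stack a camera-array image grid into a 4D light field L(u, v, s, t).

    image_grid: nested list of shape [U][V], each entry an (S, T) grayscale
    ndarray (one view per camera). Returns an array of shape (U, V, S, T).
    This sketch assumes the views are already aligned to the two-plane
    (u, v, s, t) parameterization used with Eq. (3).
    """
    U, V = len(image_grid), len(image_grid[0])
    S, T = image_grid[0][0].shape
    L = np.empty((U, V, S, T), dtype=np.float64)
    for u in range(U):
        for v in range(V):
            L[u, v] = image_grid[u][v]
    return L
```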
As shown in Fig. 5, focusing at different distances corresponds to different image planes s and s' and therefore to different values of F in Eq. (3); the result is equivalent to cutting the propagating light along its track at different positions, so that different slices are obtained. When the camera is refocused on the new plane s', F changes into F', and L_{F'}(u, s') can be expressed with L_F(u, s); thus, a new image is synthesized.
As shown in Fig. 5, the light field described by (u, s') can also be described by (u, s). Thus, let α = F'/F; by similar triangles, and then extending the two-dimensional case to four dimensions, the following equation is obtained:
$$ L_{F'}\left(u,v,s',t'\right)=L_F\left(u,v,u+\frac{s'-u}{\alpha},v+\frac{t'-v}{\alpha}\right). $$
(4)
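A minimal numerical sketch of the reparameterization in Eq. (4) is given below, assuming the (u, v) and (s, t) grids share a common scale and using nearest-neighbor rounding with boundary clipping; the function name resample_slice is hypothetical.

```python
import numpy as np

def resample_slice(L, alpha):
    # Evaluate Eq. (4) on the sampled grid:
    # L_{F'}(u, v, s', t') = L_F(u, v, u + (s' - u)/alpha, v + (t' - v)/alpha)
    U, V, S, T = L.shape
    u = np.arange(U)[:, None, None, None]   # aperture coordinate u
    v = np.arange(V)[None, :, None, None]   # aperture coordinate v
    sp = np.arange(S)[None, None, :, None]  # target coordinate s'
    tp = np.arange(T)[None, None, None, :]  # target coordinate t'
    # Nearest-neighbor lookup of the reparameterized (s, t) coordinates,
    # clipped at the light-field boundary (an assumption of this sketch).
    s = np.clip(np.rint(u + (sp - u) / alpha), 0, S - 1).astype(int)
    t = np.clip(np.rint(v + (tp - v) / alpha), 0, T - 1).astype(int)
    return L[u, v, s, t]
```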
Ng [22] used Eq. (4) to achieve light-field image refocusing. In this paper, the pixels of a light-field slice are calculated by Eqs. (3) and (4); thus, any virtual view in the light field is given by Eq. (5):
$$ E_{(\alpha F)}\left(s',t'\right)=\frac{1}{\alpha^2F^2}\iint L_F\left(u,v,u+\frac{s'-u}{\alpha},v+\frac{t'-v}{\alpha}\right)\,du\,dv. $$
(5)
This formula expresses that an image can be regarded as a slice of the 4D light field projected onto a 2D plane. Therefore, each image in the image array can be regarded as such a projected slice of the light field of a given scene.
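For reference, Eq. (5) can be evaluated directly in the space domain by shifting each view according to the reparameterization and summing over the aperture samples; this is the O(n⁴) baseline discussed in the next subsection. The sketch keeps the shared-grid assumption from above, and the 1/(UV) factor is our discrete stand-in for du dv.

```python
import numpy as np

def synthesize_view_spatial(L, alpha, F=1.0):
    # Direct O(n^4) evaluation of Eq. (5): for every view (u, v), shift the
    # view and accumulate the sum approximating the aperture integral.
    U, V, S, T = L.shape
    E = np.zeros((S, T))
    for u in range(U):
        for v in range(V):
            sp = np.clip(np.rint(u + (np.arange(S) - u) / alpha), 0, S - 1).astype(int)
            tp = np.clip(np.rint(v + (np.arange(T) - v) / alpha), 0, T - 1).astype(int)
            E += L[u, v][np.ix_(sp, tp)]
    # 1/(alpha^2 F^2) prefactor of Eq. (5); 1/(U V) approximates du dv.
    return E / (alpha**2 * F**2 * U * V)
```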
Resampling in frequency domain
Although the spatial relationship between an image and the light field can be described intuitively in the space domain, the description itself is an integral projection, and algorithms for this process usually have a high computational complexity of O(n⁴). In contrast, the relationship between images and the light field in the frequency domain can be expressed simply: an image is a 2D slice of the 4D light field in the Fourier domain. This conclusion stems from the Fourier slice theorem, proposed by Ron Bracewell [23] in 1956. This classical theorem later made great contributions in the fields of medical imaging, computed tomography, and positron emission tomography.
The classic Fourier slice theorem relates a 1D projection of a 2D function to a 1D slice of its 2D Fourier transform; this theorem can be extended to 4D [22]. Figure 6 gives a graphical illustration of the relationship between light fields and photographs in both the spatial and Fourier domains. The middle row shows that photographs focused at different depths correspond to slices at different trajectories through the ray space in the spatial domain. The bottom row illustrates the Fourier-domain slices that provide the values of the photograph's Fourier transform; these slices pass through the origin. The slice trajectory is horizontal when focusing on the optical focal plane, and as the chosen virtual film plane deviates from this focus, the slicing trajectory tilts away from horizontal.
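The classic 2D theorem can be checked numerically in a few lines: the 1D Fourier transform of an axis-aligned projection of a 2D function equals the central slice of its 2D Fourier transform. The array sizes below are arbitrary illustrative choices.

```python
import numpy as np

# Numerical check of the classic Fourier slice theorem in 2D:
# the 1D FT of a projection of f(x, y) along y equals the k_y = 0
# slice of the 2D FT of f.
rng = np.random.default_rng(0)
f = rng.random((64, 64))

projection = f.sum(axis=1)        # project along y
lhs = np.fft.fft(projection)      # 1D FT of the projection
rhs = np.fft.fft2(f)[:, 0]        # k_y = 0 slice of the 2D FT
print(np.allclose(lhs, rhs))      # True
```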
We use the Fourier slice theorem to synthesize virtual view images from the image array; Fig. 6 shows this theory in our application. Figure 7a is a synthesized virtual view. Figure 7b shows the relationship of rays between the two planes: a number of rays passing through plane u converge to a point in plane s. Figure 7c shows the Fourier slice in the frequency domain, which coincides with the k_s axis.
The two planes that parameterize the light field are actually finite, and their size is constrained by the camera array size and other parameters. Therefore, in the frequency domain the slice is not a line but a line segment, and in the space domain it is a tangent plane that is not perpendicular to plane (u, v) or plane (s, t). As shown in Fig. 8, if the tangent falls outside the blue range, such as the yellow tangent, the resolution of the virtual view image is reduced; if the tangent lies within the blue region, such as the red tangent, the resolution of the virtual view image is preserved. Therefore, when resampling in the frequency domain, in order to obtain a full-resolution image, the focal length should be shorter than the distance between plane (u, v) and plane (s, t).
The process of virtual view synthesis can be equivalently transformed from the space domain to the frequency domain.
First, the light field is transformed into the frequency domain:
$$ G_F\left(f_u,f_v,f_s,f_t\right)=\mathcal{F}^4\left\{L_F\left(u,v,s,t\right)\right\}. $$
(6)
Second, the Fourier slice is performed:
$$ G_{F'}\left(f_s,f_t\right)=\frac{1}{F^2}G_F\left(B_{\alpha}^{-T}{\left[f_u,f_v,f_s,f_t\right]}^T\right). $$
(7)
Finally, the virtual view 2D image is obtained by the inverse Fourier transform:
$$ E_{F'}\left(s,t\right)=\mathcal{F}^{-2}\left\{G_{F'}\left(f_s,f_t\right)\right\}. $$
(8)
Equations (6) to (8) constitute the resampling imaging process in the frequency domain, in which f_u, f_v, f_s, and f_t are the Fourier-domain coordinates corresponding to u, v, s, and t.
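The pipeline of Eqs. (6) to (8) is sketched below for the simplest case α = 1, where the slicing trajectory is horizontal (Fig. 6) and the slice reduces to the f_u = f_v = 0 plane; the interpolation along a tilted trajectory needed for general α is omitted from this sketch.

```python
import numpy as np

def synthesize_view_fourier(L):
    # Eq. (6): 4D Fourier transform of the light field.
    G = np.fft.fftn(L)
    # Eq. (7): extract the 2D slice through the origin; with alpha = 1 the
    # trajectory is horizontal, i.e., the f_u = f_v = 0 plane.
    G_slice = G[0, 0, :, :]
    # Eq. (8): inverse 2D Fourier transform gives the virtual view.
    E = np.fft.ifft2(G_slice).real
    # Normalize by the number of aperture samples (average over views).
    return E / (L.shape[0] * L.shape[1])
```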
This process reduces the computational complexity of virtual view synthesis, thereby contributing to its real-time performance. Figure 9 is a schematic comparison of the complexity of the algorithms in the spatial and frequency domains [22]. The frequency-domain light field data can be obtained from the capture device, and the real-time claim in this paper refers to the stage after the light field has been reconstructed. Therefore, the computational complexity of operating in the frequency domain is O(n² log n) + O(n²) = O(n² log n), which is greatly reduced compared with O(n⁴) when operating in the space domain.
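A rough timing comparison under the same assumptions, reusing the two sketch functions synthesize_view_spatial and synthesize_view_fourier defined above: following the paper, the 4D FFT is computed once with the reconstruction, so each virtual view costs only a 2D slice plus an inverse 2D FFT. The light-field sizes are arbitrary illustrative choices.

```python
import timeit
import numpy as np

# Small synthetic light field (hypothetical sizes, illustration only).
L = np.random.default_rng(1).random((8, 8, 64, 64))
G = np.fft.fftn(L)  # precomputed once, outside the per-view loop

def view_from_G(G, U, V):
    # Per-view cost in the frequency domain: slice + inverse 2D FFT.
    return np.fft.ifft2(G[0, 0]).real / (U * V)

t_spatial = timeit.timeit(lambda: synthesize_view_spatial(L, alpha=1.0), number=3)
t_fourier = timeit.timeit(lambda: view_from_G(G, 8, 8), number=3)
print(f"spatial: {t_spatial:.3f}s  fourier: {t_fourier:.3f}s")
```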