Calibration and rectification of vertically aligned binocular omnistereo vision systems
EURASIP Journal on Image and Video Processing volume 2017, Article number: 46 (2017)
Abstract
Omnidirectional stereo vision systems have been widely used as primary vision sensors in intelligent robot 3D measurement tasks, which require stereo calibration and rectification. Current stereo calibration and rectification methods suffer from complex calculations or a lack of accuracy. This paper establishes a simple and effective equivalency between an omnidirectional stereo vision system and a perspective vision system by studying stereo calibration and rectification methods. First, we improved the stereo calibration method. By applying the essential matrix, the complicated calibration process of the original method is simplified. By using a manual extraction method to extract corner points, noise error is eliminated and high precision is ensured. Second, we propose a new rectification method. By using the proposed simple rectification model and calibration data, the baseline length and an accurate column-aligned image pair are easily obtained, which reduces the computation time. The proposed stereo calibration and rectification method can simply and effectively obtain two key parameters of the triangulation formula for 3D measurement tasks: baseline length and parallax. Using real data captured by equipment, we performed experiments covering all the necessary stages to obtain a high-performance omnidirectional stereo vision system. Statistical analyses of the experimental results demonstrate the effectiveness of the proposed method.
Introduction
Omnidirectional stereo (omnistereo) vision systems composed of omnidirectional cameras offer the possibility of providing 3D measurement information for a 360° field of view. Several interesting configurations of omnistereo systems, such as binocular omnistereo [1], N-ocular omnistereo [2], circular projection omnistereo [3], and dynamic omnistereo [4], have been designed to achieve different mission requirements. Vertically aligned binocular (V-binocular) omnistereo vision systems, composed of two vertical coaxial catadioptric omnidirectional cameras, provide certain advantages over other types of omnistereo vision systems. (a) These systems possess a simple epipolar geometry correspondence. (b) The depth accuracy of the V-binocular omnistereo vision system is isotropic, and there are no occlusions of the image pair due to the coaxial installation. Due to the above advantages, V-binocular omnistereo vision systems have been widely used in many intelligent robot tasks [5,6,7,8,9]. In our research, to obtain a high-performance V-binocular omnistereo vision system, we focused on stereo calibration and rectification.
For a stereo system, calibration is the process of determining the camera intrinsic parameters and the camera-camera extrinsic relationship. There are two categories of current stereo calibration methods for omnistereo vision systems. One category calibrates the relative parameters between the camera and calibration boards. These parameters are then transformed into the camera-camera relationship [10]. Such methods provide high precision but typically require multiple calibration stereo pairs, and the Levenberg-Marquardt [11] iterative algorithm is required to reduce errors. Thus, significantly more work is required to configure the control points and measurement process. The other category consists of methods that calibrate the absolute parameters in the world coordinates [12] based on epipolar geometry [13]. These methods use only self-point correspondences in one image pair without requiring prior knowledge about the scene. However, accuracy suffers, making self-point correspondences in one image pair unsuitable for 3D information measurement tasks [14].
Stereo rectification aligns the corresponding points on the same column [15]. Current omnistereo rectification models also suffer from various defects. Some are limited to particular mirrors and produce heavily distorted images. Other models are not scan-line methods and thereby lose the important advantage of simplified stereo matching. Y. Wang et al. proposed an omnistereo rectification method [14], which is a scan-line method and avoids heavy distortion; however, the rectification model is complicated and not suitable for real-time 3D-information measurement tasks.
To overcome these difficulties and obtain a high-performance omnistereo system, first, we improved the stereo calibration method [10] based on epipolar geometry [12]; the improved method requires only a few matching points manually extracted from one image pair, which reduces the complexity of the calibration and ensures accuracy. Second, we propose a simple rectification method. The baseline length and accurate column-aligned image pairs are easily obtained by using the proposed simple rectification model and calibration data. The proposed rectification method can reduce effort while ensuring real-time calculation. After the proposed procedure, two key parameters of the triangulation formula for 3D measurement tasks, baseline length and parallax, are easily obtained.
Using real data captured by our system, we performed experiments with the proposed stereo calibration and rectification method and compared the data with those from some existing methods. We also performed other necessary experiments to verify the high performance of the proposed method, including stereo matching, 3D reconstruction and depth estimation. Statistical analyses of the experimental results demonstrate the effectiveness of the system.
Stereo calibration method
Single-viewpoint system calibration
Nayar and Baker proved mathematically that the mirror section of a single-viewpoint catadioptric omnidirectional system must be a quadratic curve [16]. Geyer and Daniilidis [17] also demonstrated that a central catadioptric system is equivalent to the unified sphere-imaging model. The projection process of the unified sphere-imaging model, as shown in Fig. 1, isolates the nonlinear transformation from the projection, substantially simplifying the subsequent analysis and calculation.
Due to the lens distortion of the perspective camera, the resulting omnidirectional camera calibration errors must be considered. For the calibration of internal parameters, we referred to the single-view omnidirectional camera calibration algorithm proposed by Mei and Rives [10], in which camera lens distortion is introduced into the image projection process and the installation error of the structure is compensated. In addition, this method provides the best results with respect to catadioptric omnidirectional systems among existing methods [18]. The first step is the single-camera calibration using the open-source toolbox [10] based on the unified sphere-imaging model [19]. Figure 2 shows the transformation of the omnidirectional camera calibration model's coordinate system. The omnidirectional camera coordinate system is the same as the mirror coordinate system, and the origin of the coordinate system is the internal focus of the mirror.
The calibration model projection process includes unknown parameters, as described below.

1. Extrinsic parameters: The relationship between the planar calibration board coordinate system and the panoramic-camera coordinate system can be expressed by the formula \( x = PX \), where \( P = [R_w, T_w] \) is a 3 × 4 matrix, as shown in Fig. 2. The rotation matrix \( R_w \) is parameterized by a quaternion, W represents this projection process, and \( V_1 = [q_0, q_1, q_2, q_3, t_1, t_2, t_3] \) represents the unknown variables in the projection matrix P.

2. Nonlinear projection transformation: Assuming that the projection coordinates under the mirror coordinate system are known, the projection point coordinates on the metric plane can be calculated. H represents the nonlinear projection equation, and \( V_2 = [\xi] \) represents the unknown variable of the mirror.

3. Distortion: The model introduces two primary distortions: radial distortion, which is caused by changes in radial curvature, and eccentric distortion, which is caused by the imperfect collinearity of the axes of the optical lens. These distortions occur in optical systems as a result of assembly errors. Distortions can be expressed by five parameters, of which three are factors of the radial distortion \( \delta_r \):
$$ \delta_r = 1 + k_1 \rho^2 + k_2 \rho^4 + k_5 \rho^6 $$ (1)
where \( \rho = \sqrt{x^2 + y^2} \). The remaining two parameters are factors of the eccentric distortion \( \delta_d \):
$$ \delta_d = \left[\begin{array}{c} 2 k_3 x y + k_4 \left(\rho^2 + 2 x^2\right) \\ k_3 \left(\rho^2 + 2 y^2\right) + 2 k_4 x y \end{array}\right] $$ (2)
D represents the distortion equation, and \( V_3 = [k_1, k_2, k_3, k_4, k_5] \) represents the distortion variables.

4. Perspective camera model: The projection process from the normalized plane to the image plane can be expressed using the generalized camera projection matrix \( K_c \):
$$ K_c = \left[\begin{array}{ccc} \gamma_1 & \alpha & u_0 \\ 0 & \gamma_2 & v_0 \\ 0 & 0 & 1 \end{array}\right] $$ (3)
where \( \gamma_i = f_i \cdot \eta \), f is the camera focal length, η is the mirror parameter, and \( V_4 = [\alpha, \gamma_1, \gamma_2, u_0, v_0] \) represents the unknown variables.

5. Final projection equation: G represents the complete projection equation, and V includes the 18 unknowns:
$$ G = K_c \times D \times H \times W, \quad V = \left[V_1, V_2, V_3, V_4\right] $$ (4)
Assuming that there are n images of the calibration board and that each image has m corner points, the maximum likelihood solutions of all the unknown parameters can be obtained by minimizing
$$ \sum_{i=1}^{n} \sum_{j=1}^{m} \left\Vert G\left(V_{1i}, V_2, V_3, V_4, X_{ij}\right) - m_{ij} \right\Vert^2 $$ (5)
where \( G(V_{1i}, V_2, V_3, V_4, X_{ij}) \) is the projection of the calibration board's corner point \( X_{ij} \), and \( m_{ij} \) is the corresponding image coordinate. Equation 5 is a nonlinear optimization problem that can be solved using the Levenberg-Marquardt optimization algorithm. The initial-value selection problem has been well analyzed in Mei and Rives [19] and will not be discussed in this study.
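The chain of steps 2 to 4 above can be illustrated with a short sketch. It assumes the commonly used form of the unified sphere projection for H (normalize the point to the unit sphere, then divide by \( z_s + \xi \)), and it assumes that the radial factor of Eq. 1 multiplies the normalized point before the eccentric term of Eq. 2 is added; neither detail is spelled out in the text above. The function name `project_unified` and the parameter packing are hypothetical.

```python
import math

def project_unified(point, xi, k, K):
    """Project a 3D point in the mirror frame to pixel coordinates.

    point: (x, y, z) in the mirror coordinate system
    xi:    mirror parameter (V2)
    k:     (k1, k2, k3, k4, k5) distortion coefficients (V3)
    K:     (gamma1, gamma2, alpha, u0, v0) generalized camera parameters (V4)
    """
    x, y, z = point
    norm = math.sqrt(x * x + y * y + z * z)
    xs, ys, zs = x / norm, y / norm, z / norm  # point on the unit sphere

    # H: nonlinear projection onto the normalized plane (assumed standard form)
    mx, my = xs / (zs + xi), ys / (zs + xi)

    # D: radial factor (Eq. 1) and eccentric term (Eq. 2)
    k1, k2, k3, k4, k5 = k
    rho2 = mx * mx + my * my
    radial = 1.0 + k1 * rho2 + k2 * rho2 ** 2 + k5 * rho2 ** 3
    dx = 2.0 * k3 * mx * my + k4 * (rho2 + 2.0 * mx * mx)
    dy = k3 * (rho2 + 2.0 * my * my) + 2.0 * k4 * mx * my
    xd, yd = radial * mx + dx, radial * my + dy  # assumed combination

    # K_c: generalized camera projection (Eq. 3)
    g1, g2, alpha, u0, v0 = K
    return g1 * xd + alpha * yd + u0, g2 * yd + v0
```

With ξ = 0 and all distortion coefficients set to zero, the sketch reduces to an ordinary pinhole projection, which gives a quick sanity check on any implementation of the model.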
A modification theory
After omnidirectional camera calibration, all variables are known except for the extrinsic parameters. The toolbox proposed by Mei and Rives does not include a function to calculate the extrinsic parameters between the board and the camera. Thus, we developed this function independently of the toolbox. The theory is as follows: \( P' \) is the projection matrix from the planar calibration board's coordinate system to the omnidirectional camera coordinate system, and \( V_1' = [q_0, q_1, q_2, q_3, t_1, t_2, t_3] \) contains the unknown variables. Assuming that the calibration board has m corner points, their homogeneous coordinates are \( X_i' \), and the corresponding image coordinates of \( X_i' \) are \( m_i' \). Then, the maximum likelihood solutions of the external parameters can be obtained by minimizing
$$ \sum_{i=1}^{m} \left\Vert G\left(V_1', V_2, V_3, V_4, X_i'\right) - m_i' \right\Vert^2 $$ (6)
where \( V_2 \), \( V_3 \), and \( V_4 \) are the internal omnidirectional camera parameters, which are calculated during calibration and are used as known values to obtain \( V_1' \) with the Levenberg-Marquardt (LM) nonlinear optimization algorithm. The coordinates of the calibration board's corners in the omnidirectional coordinate system are calculated by
$$ x_i' = P' X_i' $$ (7)
Cylindrical expansion model
A cylindrically expanded panoramic image is based on the unified sphere-imaging model. By cutting the cylindrical surface radially and tiling it, a 2D rectangular cylindrical panoramic image can be obtained. This procedure eliminates the scene distortion in the restored image. Experiments were conducted with stereo vision systems and image processing software that was independently developed by our laboratory, as shown in Fig. 3.
As shown in Fig. 3, the omnidirectional system's effective viewpoint is considered to be the origin of the mirror coordinate system \( O_m X_m Y_m Z_m \), and the virtual imaging plane is considered to be a cylindrical surface of radius f that is coaxial with the omnidirectional system. Assuming that the cylindrically expanded image resolution is W × H and that the pitch angles of the cylinder image's upper and lower edges are \( \alpha_1 \) and \( \alpha_2 \), the cylindrical image height is \( H = f \tan \alpha_1 + f \tan \alpha_2 \), as shown in Fig. 4. \( m' = [i, j]^T \) is a point on the cylindrically expanded image, and the 3D coordinate x of \( m' \) in the mirror coordinate system can be expressed as follows:
where θ = 2πi/W.
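A minimal sketch of a pixel-to-ray mapping that is consistent with the quantities named above (cylinder radius f, azimuth θ = 2πi/W, and edge pitch angles α1 and α2) is given here for illustration. The vertical convention (row 0 at the upper edge, one pixel per unit of cylinder height) and the function names are assumptions, so the sketch may differ in detail from the expression used in the paper.

```python
import math

def cylinder_height(f, alpha1, alpha2):
    """Image height H = f*tan(alpha1) + f*tan(alpha2), as stated in the text."""
    return f * math.tan(alpha1) + f * math.tan(alpha2)

def cylinder_pixel_to_ray(i, j, width, f, alpha1):
    """Map pixel (i, j) of the expanded image to a 3D point on the cylinder.

    Assumed convention: column i sets the azimuth theta = 2*pi*i/width and
    row j counts downward from the upper edge, which sits at z = f*tan(alpha1).
    """
    theta = 2.0 * math.pi * i / width
    return (f * math.cos(theta), f * math.sin(theta), f * math.tan(alpha1) - j)
```

Under this convention the last row, j = H, lands at z = −f tan α2, so the two edge pitch angles are reproduced exactly.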
Epipolar geometry and essential matrix
The epipolar geometry describes a geometric relationship between the positions of the corresponding points in the two images acquired by central cameras [13]. Because epipolar geometry is a property of central projection cameras, it also exists for central catadioptric cameras and is represented by the essential matrix. Figure 5 shows the epipolar geometry model of a general stereo vision system based on the unified sphere-imaging model. This model is a general model and also applies to the vertically arranged stereo omnidirectional system. Without loss of generality, we used this model to introduce the epipolar geometry. The epipolar constraint of the stereo image pair reduces the correspondence search from a 2D search to a 1D search.
\( F_1 \) and \( F_2 \) represent the coordinate systems of the two single-viewpoint vision systems, whose origins are \( O_{m1} \) and \( O_{m2} \). The two corresponding image planes are \( \pi_{p1} \) and \( \pi_{p2} \), and \( x = [x, y, z]^T \) is a point in 3D space. Without loss of generality, it is assumed that the world coordinate system coincides with the coordinate system \( F_1 \); \( l_1 \) and \( l_2 \) are a pair of corresponding epipolar lines; \( m_1 = [u_1, v_1] \) and \( m_2 = [u_2, v_2] \) are the two image points of x on the image planes, where \( m_1 \in l_1 \) and \( m_2 \in l_2 \); and the epipoles are denoted as \( e_{ij} \) (i = 1, 2, j = 1, 2). Projecting x onto the two unit spheres, we obtain \( x_{s1} = [x_{s1}, y_{s1}, z_{s1}]^T \) and \( x_{s2} = [x_{s2}, y_{s2}, z_{s2}]^T \). The points \( O_{m1} \), \( O_{m2} \), and x constitute the epipolar plane, and the line connecting \( O_{m1} \) and \( O_{m2} \) is the baseline. R and T are the rotation matrix and the translation vector between the two omnidirectional coordinate systems, and the vector T also represents the coordinate of \( O_{m2} \) in the coordinate system \( F_1 \). Because T and \( x_{s1} \) lie in the epipolar plane, the normal of the epipolar plane can be expressed as \( N_{F1} = T \times x_{s1} \), where × denotes the cross product. The normal of the epipolar plane in the \( F_2 \) coordinate system can be expressed as \( N_{F2} = R N_{F1} = R (T \times x_{s1}) \). Because the point \( x_{s2} \) also lies on the intersection of the epipolar plane and the unit sphere, \( x_{s2}^T \cdot N_{F2} = 0 \) and
$$ x_{s2}^T R \left(T \times x_{s1}\right) = 0 $$ (9)
Assuming that \( T = [t_x, t_y, t_z]^T \), we obtain
$$ T \times x_{s1} = [T]_{\times} x_{s1}, \quad [T]_{\times} = \left[\begin{array}{ccc} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{array}\right] $$
where \( [T]_{\times} \) is the antisymmetric matrix of the vector T. Therefore, Eq. 9 can be expressed as follows:
$$ x_{s2}^T E x_{s1} = 0 $$ (10)
where \( E = R [T]_{\times} \) is the 3 × 3 essential matrix, which has a rank of 2. Notably, the essential matrix has only 5 degrees of freedom. Based on Eq. 10, each pair of corresponding points provides one linear constraint on the entries of the essential matrix; thus, the linear calculation of the essential matrix requires a minimum of eight pairs of points. Because Eq. 10 is homogeneous, the essential matrix E can be obtained only up to a nonzero scale factor; that is, \( E = R [\overline{T}]_{\times} \), and the motion parameters [R, T] can be recovered only as \( [R, \overline{T}] \), where \( T = \lambda \overline{T} \). Here, the physical meaning of the nonzero factor λ is the baseline length.
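The constraint in Eq. 10 can be checked numerically with a small sketch. It assumes that a point x expressed in \( F_1 \) maps to \( F_2 \) as R(x − T), which is the transform consistent with T being the position of \( O_{m2} \) in \( F_1 \) and with the normal \( N_{F2} = R(T \times x_{s1}) \). The pose values and all helper names are hypothetical.

```python
import math

def skew(t):
    """Antisymmetric matrix [T]x such that skew(t) @ v == t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def matvec(a, v):
    return [sum(a[r][k] * v[k] for k in range(3)) for r in range(3)]

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def essential(R, T):
    """E = R [T]x, the convention used in the text."""
    return matmul(R, skew(T))

def sphere_pair(R, T, x):
    """Unit-sphere projections of x in both frames (assumes x2 = R (x - T))."""
    x_s1 = unit(x)
    x_s2 = unit(matvec(R, [x[k] - T[k] for k in range(3)]))
    return x_s1, x_s2

def epipolar_residual(E, x_s1, x_s2):
    """Left-hand side of Eq. 10; zero for a true correspondence."""
    return sum(a * b for a, b in zip(x_s2, matvec(E, x_s1)))

# Hypothetical pose: small yaw misalignment, lower camera about 330 mm below.
R = rot_z(0.02)
T = [0.3, 1.3, -330.0]
x_s1, x_s2 = sphere_pair(R, T, [800.0, -500.0, 200.0])
res_metric = epipolar_residual(essential(R, T), x_s1, x_s2)
res_unit = epipolar_residual(essential(R, unit(T)), x_s1, x_s2)
```

Both residuals vanish up to rounding, which illustrates why point correspondences alone determine T only up to the scale factor λ.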
Relative positional calculation of the omnidirectional, stereo vision system
Based on the epipolar geometry of the omnidirectional stereo vision system, the relative position of the two omnidirectional cameras is encoded in the essential matrix. Using the improved eight-point algorithm [20], the singular value decomposition of the essential matrix is \( E = U D V^T \), where D = diag(σ, σ, 0); thus, the solution of R and \( \overline{\boldsymbol{T}} \) can be expressed as follows:
where
The symbol ≈ indicates equality up to the proportional constant factor λ, which relates the real translation vector to the solution through \( \boldsymbol{T}=\lambda \overline{\boldsymbol{T}} \), where \( \overline{\boldsymbol{T}} \) is the unit vector of the translation vector solution in Eq. 11. In this case, the physical meaning of λ is the baseline length. The calculation of λ and the real translation vector T is described after the stereo rectification method.
Stereo rectification method
A new rectification model
In the standard V-binocular omnistereo vision system shown in Fig. 6, the triangulation formula simplifies to the standard binocular triangulation formula for perspective stereo systems as follows:
where \( v = v_2 - v_1 \) describes the pixel disparity along the vertical axis, f is the camera focal length, and d is the length of the vertical baseline. Equation 12 shows that the positioning accuracy is isotropic in a vertically aligned stereo vision system and is not affected by the field of view. The epipolar line is the vertical axis of the cylindrically expanded image, which makes the corresponding epipolar line simple to determine.
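As an illustration of how the two parameters are used, the sketch below applies the familiar perspective relation in which the distance equals f·d/v. It assumes that the relation gives the horizontal distance from the common vertical axis and that the disparity is positive; the exact form and sign convention of Eq. 12 may differ. The focal length and disparity in the example are hypothetical, while the 332 mm baseline is the value reported for the experimental system.

```python
def horizontal_range(f_pixels, baseline, disparity):
    """Distance from the vertical axis, assuming range = f * d / v.

    f_pixels:  focal length (cylinder radius) in pixels
    baseline:  vertical baseline d, in the unit wanted for the result
    disparity: vertical pixel disparity v = v2 - v1, assumed positive
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive under this convention")
    return f_pixels * baseline / disparity

# Hypothetical values: f = 300 px, v = 66.4 px; d = 332 mm.
range_mm = horizontal_range(300.0, 332.0, 66.4)
```

Because the distance is inversely proportional to v, a fixed sub-pixel disparity error produces a range error that grows roughly with the square of the distance, which is one reason the rectification accuracy matters.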
An ideal vertical-baseline omnidirectional stereo pair has a linear epipolar geometric relationship in the radial direction; in practice, however, misalignment errors between the two optical axes are inevitable. Our rectification procedure produces a column-aligned image pair for our vertically arranged system. A standard stereo cylindrical image pair can be generated using the rotation matrix R and the unit translation vector \( \overline{\boldsymbol{T}} \), both of which are calculated from the essential matrix. This significantly improves the speed of the algorithm, making it suitable for real-time applications. After applying our proposed procedure, two important parameters, namely, (a) baseline length (obtained via stereo calibration) and (b) pixel disparity along the vertical axis (easily obtained from the column-aligned cylindrical image pair produced by rectification), can be used in Eq. 12 to calculate metric scene measurements for robot tasks.
Figure 7 shows the cylindrical-image rectification model of a stereo vision system with a deviation in the optical axis. When the expansion model described above is applied directly, the epipolar lines of the upper and lower cylindrical images are not vertical lines of the expanded cylindrical images. However, based on the rectification model shown in Fig. 7, the vertical baseline (the line connecting the two origins) of the expanded cylindrical panoramic stereo image pair described in Fig. 6 can be obtained directly. The corresponding epipolar line and triangulation formulas are thus simplified.
The coordinate systems \( F_u \) and \( F_d \) of the two omnidirectional vision systems are established with their effective viewpoints \( O_{m1} \) and \( O_{m2} \) as origins, and \( O_{m1} \) is taken as the origin of the world coordinate system. R and \( \overline{T} \) can be calculated, and the physical meaning of the unit translation vector \( \overline{T} \) is the unit vector pointing to \( O_{m2} \) in \( F_u \). Point \( x_u \) is a point of the upper cylindrical expansion image in \( F_u \), and \( x_d \) is a point of the lower cylindrical expansion image in \( F_d \). Thus, the following holds:
The line connecting the two origins is the axis of the new annular cylindrical expansion image. In Fig. 7, the normal vector of the annular cylindrical top plane π can be expressed as \( n=\left[{n}_x,{n}_y,{n}_z\right]=\overline{T} \) in \( F_u \). \( F_u \) is transformed via a rotation so that the plane π is perpendicular to the \( {Z}_{\mathrm{m}1}^{\prime } \) axis in the new coordinate system \( {F}_{\mathrm{u}}^{\prime } \); this rotation can be expressed as
where
and m = [0, 0, 1]^{T}. \( {F}_u^{\prime } \) is the expanded cylindrical coordinate system shown in Fig. 4, and x _{u} is described by Eq. 8. Similarly, transforming the expanded cylindrical image in the coordinate system F _{ d }, where \( \boldsymbol{n}=\boldsymbol{R}\overline{\boldsymbol{T}} \), the formula for R _{d} can be described by Eq. 15.
In Eq. 15, the calculation of the rotation matrix \( R_u \) does not rely on the real solution of the translation vector T; indeed, the rotation matrix \( R_u \) can be calculated using the unit translation vector \( \overline{\boldsymbol{T}} \), which is obtained directly by decomposing the essential matrix E. Using Eqs. 8 and 13–15 to calculate the 3D coordinates of each point of the expanded cylindrical image in the mirror coordinate system, the unified sphere-imaging model's projection formula can be applied to perform the cylindrical expansion of the upper and lower panoramic stereo image pair and obtain the standard cylindrical stereo image pair. Then, the standard vertical-baseline omnidirectional stereo-vision positioning model in Fig. 6 is used to calculate the 3D coordinates of the spatial points according to Eq. 12.
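One standard way to build a rotation that carries a unit vector n onto m = [0, 0, 1]^T is the Rodrigues construction sketched below. It is offered as an illustration of the idea that only the direction of the translation is needed, and it is not claimed to be identical to Eq. 15. The function names and the example direction are hypothetical.

```python
import math

def rotation_aligning(n, m=(0.0, 0.0, 1.0)):
    """Rotation matrix R with R @ unit(n) == unit(m), via Rodrigues' formula.

    Uses R = I + [v]x + [v]x^2 / (1 + c) with v = n x m and c = n . m,
    which is undefined when n and m are exactly antiparallel.
    """
    def unit(v):
        length = math.sqrt(sum(c * c for c in v))
        return [c / length for c in v]

    n, m = unit(n), unit(m)
    v = (n[1] * m[2] - n[2] * m[1],
         n[2] * m[0] - n[0] * m[2],
         n[0] * m[1] - n[1] * m[0])
    c = sum(a * b for a, b in zip(n, m))
    if c < -1.0 + 1e-12:
        raise ValueError("n and m are antiparallel; the rotation axis is not unique")
    vx = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]
    vx2 = [[sum(vx[r][k] * vx[k][col] for k in range(3)) for col in range(3)]
           for r in range(3)]
    gain = 1.0 / (1.0 + c)
    return [[(1.0 if r == col else 0.0) + vx[r][col] + gain * vx2[r][col]
             for col in range(3)] for r in range(3)]

def apply(R, v):
    return [sum(R[r][k] * v[k] for k in range(3)) for r in range(3)]

# Hypothetical, slightly tilted baseline direction.
n_dir = [0.001, 0.004, 1.0]
n_len = math.sqrt(sum(c * c for c in n_dir))
R_u = rotation_aligning(n_dir)
aligned = apply(R_u, [c / n_len for c in n_dir])
```

Scaling `n_dir` by any positive factor leaves `R_u` unchanged, which mirrors the observation that the rectifying rotation does not depend on the real baseline length.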
Baseline length calculation
The 3D coordinates of one of the calibration board’s corner points are X = [X, Y, Z]^{T}, and the corresponding point coordinates in the cylindrical image are equal to m = [u, v]^{T}. The corresponding point’s incident light vector can be determined using Eq. 8, where x = [x, y, z]^{T}; thus, the corresponding point’s angle of incident light can be expressed as follows:
This angle’s corresponding parallax value is assumed to be v′, and as a result, based on Eq. 12, the corner space coordinates can be expressed as
The length between the two corner points of the calibration board is assumed to be equal to L; then, the coordinates in the omnidirectional coordinate system can be expressed as X _{1} = [X _{1}, Y _{1}, Z _{1}]^{T} and X _{2} = [X _{2}, Y _{2}, Z _{2}]^{T}, and the parallax values are \( {v}_1^{\prime } \) and \( {v}_2^{\prime } \), respectively. Thus, the following holds:
where the proportional coefficient λ is described by
Therefore, an accurate value of the proportional constant factor λ can be calculated from two corner points that are accurately located on the calibration board; the real value of the translation vector T can thus be obtained.
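The scale recovery can be sketched as follows, under the assumption that the triangulated coordinates are linear in the baseline, so that coordinates computed with a unit baseline only need to be multiplied by λ. The known corner spacing L then fixes λ as L divided by the distance between the two unit-baseline corner positions. The function names and sample coordinates are hypothetical.

```python
import math

def baseline_scale(p1_unit, p2_unit, length):
    """Scale factor that makes the two reconstructed corners `length` apart.

    p1_unit, p2_unit: corner coordinates triangulated with a unit baseline
    length:           measured distance L between the two corners
    """
    separation = math.dist(p1_unit, p2_unit)
    if separation == 0.0:
        raise ValueError("the two corner points must be distinct")
    return length / separation

def rescale(point_unit, lam):
    """Metric coordinates of a point triangulated with a unit baseline."""
    return [lam * c for c in point_unit]

# Hypothetical unit-baseline reconstructions of two corners 33 mm apart.
p1, p2 = [3.0, 0.0, 4.0], [3.0, 0.1, 4.0]
lam = baseline_scale(p1, p2, 33.0)
metric_gap = math.dist(rescale(p1, lam), rescale(p2, lam))
```

In practice, averaging λ over several corner pairs would reduce the influence of localization error in any single pair; the text above uses two accurately located corners.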
Results and discussion
V-binocular omnistereo vision system
Figure 8 presents our experimental equipment. A V-binocular omnistereo system was used as a vision sensor in an REVVB32 crawler-type intelligent mobile robot. The robot was used for the tracking and localization of moving targets in our lab. The two single-viewpoint omnidirectional cameras consisted of high-accuracy hyperbolic mirrors and GREY POINT 1394b cameras, whose manufacturer-supplied parameters are shown in Table 1.
The camera base was equipped with a 3-degree-of-freedom adjustment device. By using a single-viewpoint constraint determination method [20] and by adjusting the device, the single-viewpoint constraint was considered to be satisfied during camera-mirror assembly, and the installation accuracy of the mechanical structures was guaranteed. There were no changes in the extrinsic parameters of our hardware configuration. The baseline was defined as the distance between the foci of the two mirrors under the unified sphere-imaging model, which was measured directly with a long Vernier caliper. The installation spacing of the vertical baseline was 332 mm, which was used as a ground truth to validate the accuracy of the stereo calibration method.
Single camera calibration experiment
The open-source calibration toolbox [10] was directly used to independently calibrate the intrinsic parameters of the upper and lower omnidirectional cameras. A total of 20 images were obtained by each camera. The calibration images are shown in Fig. 9.
The calibration results are shown in Table 2. The accuracy of the toolbox is demonstrated in [18]. From Table 2, the calibrated upper mirror parameter was ξ = 0.83176, and the calibrated lower mirror parameter was ξ = 0.84279. Compared with the designed parameter ξ = 0.82 from Table 1, the calibration results indicate that the upper and lower omnidirectional cameras were both correctly installed.
The results of cylindrical expansion using the intrinsic calibration results, the expansion model, and a stereo pair (shown in Fig. 10) are shown in Fig. 11a left. Details are provided in Fig. 11a right, which shows that the spatial imaging pixels are out of alignment due to coaxial installation errors. Therefore, stereo calibration and rectification are required to obtain a pixel-aligned image pair.
Stereo calibration
Seven calibration boards are presented in Fig. 10, and we manually measured the distance from the center of the sensor to the corner points of the seven calibration boards when taking these photos. Three corner matching points from each calibration board were extracted from the image pair. We manually extracted the initial matching points and then used an extraction algorithm with subpixel accuracy in a 9-pixel neighborhood of the initial points to extract the matching points. This can significantly reduce matching errors, thereby improving the calculation accuracy and reducing noise interference. Using the improved method based on the essential matrix, the calculated results for the camera pose using 21 pairs of matching points are shown in Table 3 after the incorrect results are rejected.
The original method [5] was also used for comparison. We used the same image (Fig. 10) in the contrast stereo calibration method. The following experiment was performed. First, we calculated the extrinsic parameter matrices \( [R_1, T_1] \) and \( [R_2, T_2] \). The coordinates of a calibration point in the world coordinate system and in the two omnidirectional coordinate systems were \( X = [X, Y, Z]^T \), \( x_1 = [x_1, y_1, z_1]^T \), and \( x_2 = [x_2, y_2, z_2]^T \). Then, the transformation between X, \( x_1 \), and \( x_2 \) is as follows:
$$ x_1 = R_1 X + T_1, \quad x_2 = R_2 X + T_2 $$ (19)
After eliminating X, the following is obtained:
$$ x_2 = R x_1 + T $$ (20)
where \( \boldsymbol{R}={\boldsymbol{R}}_2{\boldsymbol{R}}_1^{-1} \) and \( \boldsymbol{T}={\boldsymbol{T}}_2-{\boldsymbol{R}}_2{\boldsymbol{R}}_1^{-1}{\boldsymbol{T}}_1 \) are the rotation matrix and translation vector between the two cameras.
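The composition of the two board-to-camera poses into a camera-to-camera pose can be checked with a short sketch. It uses the fact that the inverse of a rotation matrix is its transpose; the pose values and helper names are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def matvec(a, v):
    return [sum(a[r][k] * v[k] for k in range(3)) for r in range(3)]

def transpose(a):
    return [[a[c][r] for c in range(3)] for r in range(3)]

def rot_x(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def relative_pose(R1, T1, R2, T2):
    """R = R2 R1^-1 and T = T2 - R2 R1^-1 T1, so that x2 = R x1 + T."""
    R = matmul(R2, transpose(R1))  # R1^-1 equals R1^T for a rotation
    RT1 = matvec(R, T1)
    return R, [T2[k] - RT1[k] for k in range(3)]

# Hypothetical board poses seen by the upper and lower cameras.
R1, T1 = rot_x(0.10), [10.0, -5.0, 900.0]
R2, T2 = matmul(rot_z(0.02), rot_x(0.10)), [12.0, -4.0, 570.0]
R, T = relative_pose(R1, T1, R2, T2)

# A board point mapped directly and through the relative pose.
X = [150.0, -80.0, 0.0]
x1 = [a + b for a, b in zip(matvec(R1, X), T1)]
x2 = [a + b for a, b in zip(matvec(R2, X), T2)]
x2_from_x1 = [a + b for a, b in zip(matvec(R, x1), T)]
```

Since every board pose carries its own estimation error, the pose obtained this way varies from one stereo pair to the next, which is why the contrast method needs several pairs and a refinement step.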
The comparison results of our stereo calibration method with the contrast method are shown in Table 4.
The proportional constant factor λ was calculated using Eq. 18. The result of the improved method was λ = 330.1302 and \( \boldsymbol{T}=\lambda \overline{\boldsymbol{T}} \); therefore, the final translation vector obtained using our calibration method was T = [0.3301, 1.3205, −330.1302]. The modulus of the vector T is the calibrated baseline length. The true baseline length of our system was 332 mm. Our calibrated value was 1.8698 mm below the true value, an error of 0.56%. The translation vector calculated by the original method in Table 4 was T = [−7.9351, 4.4249, −322.8688], whose modulus is 322.9966. This value was 9.0034 mm below the true value of 332 mm, an error of 2.71%. Thus, the improved method has better performance.
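The arithmetic behind these error figures can be reproduced with a few lines. The sketch takes the modulus of each reported translation vector as the calibrated baseline and compares it with the measured 332 mm; the function name is hypothetical.

```python
import math

TRUE_BASELINE_MM = 332.0  # caliper measurement reported in the text

def baseline_error(translation, truth=TRUE_BASELINE_MM):
    """Return (modulus, signed deviation, relative error in percent)."""
    modulus = math.sqrt(sum(c * c for c in translation))
    deviation = modulus - truth
    return modulus, deviation, 100.0 * abs(deviation) / truth

improved = baseline_error([0.3301, 1.3205, -330.1302])
original = baseline_error([-7.9351, 4.4249, -322.8688])

print(f"improved: {improved[0]:.4f} mm, error {improved[2]:.2f}%")
print(f"original: {original[0]:.4f} mm, error {original[2]:.2f}%")
```

For the improved method, the modulus of the listed vector (about 330.13 mm) agrees with λ to within a few micrometres, so the relative error rounds to the same 0.56% either way.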
When stereo calibration is performed using the contrast method, the calibrated results for each stereo pair will be slightly different. This is because the method [10] does not allow for the manual extraction of grid points. Mei and Rives only considered images wherein the grid points were successfully extracted, which increases noise and rounding errors. The results obtained using Eq. 20 can only be used as an initial approximation of the real results. The Levenberg-Marquardt iterative algorithm is then used to refine the result by minimizing the projection error. Such methods typically require multiple calibration stereo pairs, and thus, significantly more work is required to configure the control points and measurement process, which requires rigorous and complex calculations.
In contrast, our method needs only one stereo pair to obtain the rotation matrix and translation vector. The manual extraction of matching points suppresses noise error and makes the maximum amount of data available. Our calibration method is therefore easier to implement.
Stereo rectification
A rectification experiment of the upper and lower images in Fig. 10 was performed using the calibrated intrinsic parameters, the calculated rotation matrix, the unit translation vector, and our proposed stereo rectification method. We saved the rectification transforms as lookup tables. The results are shown in Fig. 11b left. Figure 11b right presents the details from the rectified image pair. We can observe that the stereo correspondences fall on the same line and that the pixels are aligned. Comparing the details before rectification with those after rectification, we can conclude that our rectification method is effective.
To accurately evaluate the precision of the stereo rectification process, nine corner coordinates were extracted from each calibration board in Fig. 11b left. We manually extracted the initial matching points and used an extraction algorithm in the 9-pixel neighborhood of the initial points with subpixel accuracy to extract the matching points. The abscissa parallax values of the corresponding corner points were also calculated. Selected results are shown in Table 5. The rectification accuracy can be determined from the row coordinate deviation value of corner points between the upper and lower images. Table 5 shows that all the row coordinate deviations are at the subpixel level. The noise error can be eliminated via manual extraction of the corner points. The mean value of the abscissa parallax of 63 corresponding corner points was 0.5875 pixels.
A contrast experiment was performed under the same conditions using the same pictures and the rectification method proposed by Wang et al. [14]. The difference between the algorithm proposed in this paper and the contrast algorithm cannot be seen visually from the image pair, so we compared the quantized data. With the contrast algorithm, the mean value of the abscissa parallax of 63 corresponding corner points was 0.9225 pixels. This finding indicates that our rectification method provides a higher pixel-alignment accuracy. We used 500 pictures to record the computation time of our proposed algorithm and the contrast algorithm. The average time of our algorithm was 97 ms per frame, while the average calculation time of the contrast algorithm was 151 ms per frame. Thus, our algorithm requires less computational time than the contrast algorithm.
There is no ground truth data for the rotation matrix. However, because the rectification method uses R and \( \overline{\boldsymbol{T}} \), the rectification accuracy also indirectly indicates the accuracy of the calculated rotation matrix R.
Offline experiments in practice: stereo matching, 3D reconstruction, and depth estimation
We also conducted stereo matching, 3D reconstruction and offline depth estimation experiments to show the accuracy of our proposed stereo calibration and rectification method.
Stereo matching
Stereo matching was performed on the rectified cylinder stereo image pairs using the semi-global matching algorithm [21], and the matching results are shown in Fig. 12. The gray level of each pixel in the disparity map in Fig. 12 encodes the distance between the corresponding point and the omnidirectional system (white is close; black is farther away). Based on the experimental results, the disparity map represents the distance of most objects in highly textured areas (e.g., the area near the system where the depth information is valid after recovery). When a subpixel-accuracy algorithm was used to extract the parallax values of the corner points from the calibration board, the difference between the determined values and the real values was less than 1 pixel. In image-sampling measurements of the seven calibration boards at distances of approximately 1–2 m, the deviation between the measurement points was less than 60 mm.
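The gray-level convention of the disparity map (near points bright, far points dark) can be illustrated with a short sketch. It assumes that the disparity map is a 2D list of floating-point disparities and that invalid pixels carry a sentinel value; the function name, the sentinel, and the demo values are hypothetical and do not describe how Fig. 12 was actually rendered.

```python
def disparity_to_gray(disparity, invalid=-1):
    """Map disparities to 0-255 gray levels; larger disparity (closer) is brighter."""
    valid = [d for row in disparity for d in row if d != invalid]
    if not valid:
        # No valid disparity at all: return an all-black image.
        return [[0 for _ in row] for row in disparity]
    d_min, d_max = min(valid), max(valid)
    span = (d_max - d_min) or 1.0  # avoid division by zero on a flat map
    return [
        # Invalid pixels are drawn black, like the farthest valid points.
        [0 if d == invalid else round(255 * (d - d_min) / span) for d in row]
        for row in disparity
    ]


# Hypothetical 2x2 disparity map with one invalid pixel.
demo = [[4.0, 6.0], [-1, 12.0]]
gray = disparity_to_gray(demo)
print(gray)
```

Because disparity decreases as distance increases, the linear stretch makes the nearest valid point white and the farthest valid point black.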
3D reconstruction
Following stereo matching, we performed 3D reconstruction of the seven calibration boards in Fig. 11b (left) using the unified sphere-imaging model, with the stereo pair in Fig. 11b (left) as the input images. A total of nine matching points from each of the seven calibration boards were extracted via the manual extraction method. The 3D reconstruction results are shown in Fig. 13.
The distance between the calibration boards and the panorama system was approximately 1–2 m. The origin was the center of the sensor, and the coordinates of all matching points were extracted from the generated picture. The corner points of each calibration board are coplanar following reconstruction. By comparing the real distances of these corner points, which we measured manually when taking the photos, with the coordinates extracted from the generated picture, we obtained an average distance error of 1.16% over the 63 matching points. The 3D calculation results demonstrate the precision of our stereo calibration and rectification method.
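One way to compute such an average distance error is sketched below. It assumes that each reconstructed point is an (x, y, z) tuple in millimeters with the origin at the sensor center, and that the error of a point is the relative difference between its reconstructed distance from the origin and the manually measured distance. The function name and the sample numbers are hypothetical and are not the experimental data.

```python
import math


def mean_relative_distance_error(points_xyz, measured_mm):
    """Mean of |reconstructed distance - measured| / measured, as a percentage."""
    if len(points_xyz) != len(measured_mm) or not points_xyz:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = []
    for point, measured in zip(points_xyz, measured_mm):
        # Euclidean distance of the reconstructed point from the origin.
        reconstructed = math.sqrt(sum(c * c for c in point))
        errors.append(abs(reconstructed - measured) / measured)
    return 100.0 * sum(errors) / len(errors)


# Hypothetical reconstructed points (mm) and manually measured distances (mm).
points = [(600.0, 800.0, 0.0), (0.0, 0.0, 1515.0)]
measured = [1000.0, 1500.0]
print(mean_relative_distance_error(points, measured))
```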
Depth estimation
We also conducted a depth estimation experiment using the experimental depth estimation procedures in reference [22]. The system configuration remains the same as that introduced at the beginning of the experimental section. In our work, depth was defined as the distance from the middle of the cylindrical shell of the stereo vision system to the object point plus the radius of the shell. Figure 14 shows 18 ground truth points in three unexpanded stereo pairs. We manually measured the distance of each chosen point when taking these photos and considered these measurements to be ground truth data. Following stereo calibration and rectification, we manually selected the target point and extracted the column coordinate deviation using an extraction algorithm with subpixel accuracy to eliminate noise error and matching error. Using triangulation Eq. 12, we calculated the estimated values. Table 6 compares the estimated depth with the ground truth depth for the selected points.
According to Table 6, we calculated the average depth estimation error ratio with respect to the ground truth data to be 3.37%. This indicates that depth information can be effectively obtained following stereo calibration and rectification using our method.
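The depth estimation step can be illustrated with a short sketch. It assumes the standard perspective-equivalent triangulation relation, depth = f · b / d, where b is the baseline length, d is the parallax, and f is a focal length in pixels. This form is an assumption made for illustration and may differ from Eq. 12. All names and numbers are hypothetical and are not the system parameters or the Table 6 data.

```python
def depth_from_parallax(baseline_mm, focal_px, parallax_px):
    """Assumed perspective-equivalent triangulation: depth = f * b / d."""
    if parallax_px <= 0:
        raise ValueError("parallax must be positive")
    return focal_px * baseline_mm / parallax_px


def relative_error_percent(estimated, ground_truth):
    """Relative depth error |estimate - truth| / truth, as a percentage."""
    return 100.0 * abs(estimated - ground_truth) / ground_truth


# Hypothetical baseline (mm), focal length (px), and subpixel parallax (px).
estimate = depth_from_parallax(baseline_mm=200.0, focal_px=500.0, parallax_px=100.0)
print(estimate, relative_error_percent(estimate, ground_truth=1025.0))
```

Averaging the relative error over all selected points gives an error ratio of the kind reported above.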
Conclusions
In this paper, we have proposed a general, comprehensive stereo calibration and rectification method suitable for any Vbinocular stereo vision system. We have provided the key techniques required to establish a simple and effective equivalency between an omnidirectional stereo vision system and a perspective vision system, including stereo calibration and rectification. First, the stereo calibration method was improved: the calibration procedure was simplified based on epipolar geometry, the rounding error was reduced, and the accuracy was ensured by using a manual extraction method. The experimental results verified that the improved stereo calibration method is more accurate than the original method and reduces the complexity of the algorithm. Second, we proposed a simple rectification model. The experimental results verified that the proposed rectification method has a shorter computation time and a higher accuracy than the existing method, which makes it more suitable for real-time vision tasks. Other experiments, such as stereo matching, 3D reconstruction, and depth estimation, were also conducted, and their results and analyses further support the effectiveness of our methods. In conclusion, our methods can effectively meet the requirements of high-precision vision sensors for robot tasks.
References
Y Tang, Q Wang, M Zong, J Jiang, Y Zhu, Design of vertically aligned binocular omnistereo vision sensor. EURASIP J Image Video Process 2010, 1–24 (2010). doi:10.1155/2010/624271
S Gulati, T George, Multi parallax exploitation for Omnidirectional imaging electronic eye. United States Patent US 8, 2012, p. 326
G Chen, TD Bui, S Krishnan, S Dai, Circular projection for pattern recognition, in Advances in Neural Networks: 10th International Symposium on Neural Networks, Dalian, China, July 4–6, 2013. Proceedings, part I, ed. by C Guo, ZG Hou, Z Zeng (Springer, Berlin, 2013), pp. 429–436. doi:10.1007/978-3-642-39065-4_52
Z Zhu, G Wolberg, JR Layne, Dynamic pushbroom stereo vision. 3D imaging for safety and security (Springer Verlag, Amsterdam, 2007), pp. 173–199
J Cacace, A Finzi, V Lippiello, G Loianno, D Sanzone, Aerial service vehicles for industrial inspection: task decomposition and plan execution. Appl Intell 42, 49–62 (2015). doi:10.1007/s10489-014-0542-0
Q Zhu, X Liu, C Cai, Feature optimization for long-range visual homing in changing environments. Sensors 14, 3342–3361 (2014). doi:10.3390/s140203342
ZE Kadmiri, OE Kadmiri, SE Joumani, Z Kaddouri, Color based omnidirectional target tracking. Int J Imaging Robotics 16, 1 (2016)
M Mendonça, LVR de Arruda, F Neves Jr, Autonomous navigation system using event driven-fuzzy cognitive maps. Appl Intell 37, 175–188 (2012). doi:10.1007/s10489-011-0320-1
H Korrapati, Y Mezouar, Multi-resolution map building and loop closure with omnidirectional images. Auton Robot 41, 967–987 (2017). doi:10.1007/s10514-016-9560-6
C Mei, P Rives, Single view point omnidirectional camera calibration from planar grids, in Proceedings of the 2007 IEEE International Conference on Robotics and Automation, 2007, pp. 3945–3950. doi:10.1109/ROBOT.2007.364084
JJ Moré, The Levenberg-Marquardt algorithm: implementation and theory. Lecture Notes Math 630, 105–116 (1978). doi:10.1007/BFb0067700
B Micusík, T Pajdla, Structure from motion with wide circular field of view cameras. IEEE Transactions Pattern Anal Mach Intell 28, 1135–1149 (2006). doi:10.1109/TPAMI.2006.151
T Svoboda, T Pajdla, Epipolar geometry for central catadioptric cameras. Int J Comput Vis 49, 23–37 (2002). doi:10.1023/A:1019869530073
Y Wang, X Gong, Y Lin, J Liu, Stereo calibration and rectification for omnidirectional multi-camera systems. Int J Adv Robotic Syst 9, 1–12 (2012). doi:10.5772/50541
YP Tang, CJ Pang, ZS Zhou, YY Chen, Binocular omnidirectional vision sensor and epipolar rectification in its omnidirectional images. J Zhejiang Univ Technol 1, 20 (2011)
S Baker, SK Nayar, A theory of catadioptric image formation, in IEEE International Conference on Computer Vision, Mumbai, India, 1998, pp. 35–42. doi:10.1109/ICCV.1998.710698
C Geyer, K Daniilidis, Conformal rectification of omnidirectional stereo pairs, in Conference on Computer Vision and Pattern Recognition Workshop (Madison Book Company, Madison, 2003)
L Puig, J Bermúdez, P Sturm, JJ Guerrero, Calibration of omnidirectional cameras in practice: a comparison of methods. Comput Vis Image Underst 116, 120–137 (2012). doi:10.1016/j.cviu.2011.08.003
C Geyer, K Daniilidis, A unifying theory for central panoramic systems and practical implications, in Computer Vision ECCV 2000, ed. by D Vernon (Springer, Berlin, 2000), pp. 445–461. doi:10.1007/3-540-45053-X_29
Q Zhu, F Zhang, K Li, L Jing, On a new calibration method for single viewpoint constraint for catadioptric omnidirectional vision. J Hua Zhong Univ Sci Tech 38, 115–118 (2010)
H Hirschmuller, Accurate and efficient stereo processing by semi-global matching and mutual information, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE, 2005), pp. 807–814. doi:10.1109/CVPR.2005.56
ZH Xiong, I Cheng, W Chen, A Basu, MJ Zhang, Depth space partitioning for omnistereo object tracking. IET Comput Vis 6, 153–163 (2012). doi:10.1049/iet-cvi.2010.0115
Funding
This study was supported in part by the National Natural Science Foundation of China via grant number 61673129, the Natural Science Foundation of Heilongjiang Province of China via grant number F201414, and the Fundamental Research Funds for the Central Universities via grant numbers HEUCF160418 and HEUCF041703.
Author information
Contributions
The work presented in this paper was carried out in collaboration between all authors. CC, XW, and QZ conceived the research theme, designed and implemented the feature optimization procedure, and prepared the manuscript. XW and BF wrote the code. XW performed the experiments and analyzed the data. XW reviewed and edited the manuscript. All authors discussed the results and implications, commented on the manuscript at all stages, and approved the final version.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Cai, C., Weng, X., Fan, B. et al. Calibration and rectification of vertically aligned binocular omnistereo vision systems. J Image Video Proc. 2017, 46 (2017). https://doi.org/10.1186/s13640-017-0195-0
DOI: https://doi.org/10.1186/s13640-017-0195-0
Keywords
 Robot
 Stereo calibration
 Rectification
 Stereo vision system