1. Introduction
Accurate calibration of the relative pose between vision sensors is crucial to the final measurement accuracy of multi-vision-sensor (MVS) measurement systems. Although many existing methods can efficiently handle precise pose calibration for vision sensors with a common field of view (FOV), flexible and accurate calibration methods for non-overlapping vision sensors still require further study. Since the intrinsic calibration of a single vision sensor already has comprehensive solutions, such as the methods presented by Tsai [1] and Zhang [2], this paper focuses on the extrinsic parameter calibration of vision sensors without overlapping FOVs.
According to the auxiliary tools used in the calibration process, the published extrinsic calibration methods for non-overlapping vision sensors can be roughly classified into three categories. The first approach utilizes high-precision measuring instruments to establish a global reference coordinate system and completes the global calibration by coordinate transformation. Lu and Li [3] employed two theodolites and additional calibration targets to accomplish the global calibration. To solve the problem of the blind observation zone of dual theodolites, Zhang et al. [4] utilized one theodolite and one planar calibration target to obtain the 3D coordinates and corresponding image coordinates of the feature points. Xie et al. [5] used a high-resolution digital camera instead of a theodolite as the global measurement device. Similarly, Dong et al. [6] realized the extrinsic calibration of a camera network based on close-range photogrammetry. In addition, Liu et al. [7] used a laser tracker to establish a global coordinate system and a precise three-dimensional target to obtain the rotation and translation of each local vision sensor coordinate system relative to the global laser tracker coordinate system, which in turn realized the global calibration among multiple sensors. Lu et al. [8] utilized a coordinate measuring machine to accomplish the calibration of stereo cameras.
The second approach needs no auxiliary measuring instruments, but uses customized calibration targets that contain precise spatial geometric information. Taking advantage of a one-dimensional target, Liu et al. [9] proposed a global calibration method for multiple sensors based on the principle of cross-ratio invariance and the collinearity constraints of the calibration object. Likewise, vanishing points of a one-dimensional target were used to solve for the coordinates of each target point in the camera coordinate systems [10]. Liu et al. [11] further used a laser rangefinder instead of a long one-dimensional target to calibrate vision sensors over a wide range. To improve the acquisition efficiency of feature points for calibration, Zhang et al. [12] formed a dual-planar target by fixing two planar calibration panels on the two ends of a rigid beam. During the calibration process, each camera saw one planar target in its own FOV, and the invariance of the spatial structure between the two planar targets was fully utilized to calculate the relative pose between the two cameras. Additionally, Gong et al. [13] constructed a global target covered with circular features to realize the quick calibration of multiple vision sensors for some specific applications. It is also worth noting that Kumar et al. [14] skillfully acquired calibration images of the same target by means of planar mirroring; the global calibration of multiple sensors was implemented through the transformation between the virtual local vision sensor coordinate system and the real one. Liu et al. [15] designed a three-dimensional target, combined from three planar targets, to calibrate the relative orientation between multiple depth cameras. Ni et al. [16] used Lie algebra optimization to address the relative pose estimation of multiple cameras in the context of motion-based camera calibration. In all of these methods, a planar target is almost indispensable.
The final approach does not use any auxiliary tools during the calibration process; methods of this kind are also called self-calibration methods. They accomplish the calibration of the relative pose between vision sensors based on the invariance of the spatial structure in the observed scenario. Specifically, Esquivel et al. [17] applied a structure-from-motion algorithm to locate the relative position changes within the image sequence acquired by each vision sensor and then implemented the global calibration of multiple sensors based on the rigidity constraints among the sensors. Lebraly et al. [18] presented a similar algorithm. Anjum et al. [19] improved the estimation robustness of the relative orientation and position by taking the unobserved trajectory and the exit-entrance direction of each object into consideration. In addition, Mendikute et al. [20] presented a self-calibration technique for vision systems that uses redundant information from machine measurements to avoid extra mechanical anchoring or calibration means, but the method lacks versatility.
Among the aforementioned methods, the third kind of approach is the most flexible, but it also has the worst calibration repeatability and accuracy; essentially, most self-calibration methods are not designed for measurement applications. Calibration methods using customized targets can achieve moderate calibration accuracy and have a wide range of applications, but they are usually executed in controllable test environments such as laboratories or factories. Despite the high cost of auxiliary measuring instruments, the first approach remains popular in outdoor environments due to its high precision and strong resistance to disturbance.
In some practical measurement applications, the field of view of each vision sensor is so narrow that there is no space to place a large planar target or a similar 3D target. The method presented in this paper was designed to overcome the calibration difficulties on these occasions. Unlike existing approaches, our method does not need additional targets to provide feature points for calibration; the accessorial target sphere of the laser tracker is used to form the calibration points. Since the target sphere diameter can range from 10 mm to about 50 mm, the accurate 3D locations of the calibration points can be easily obtained in most constrained environments by freely moving the small target sphere to several positions. By substituting a small optical target sphere for planar or three-dimensional targets, our method can accomplish the extrinsic calibration of vision sensors installed in a confined workspace. Although the accurate locations of the target sphere center in the global reference frame can be read directly from the laser tracker, the recovered 3D positions in the local camera coordinates are inevitably affected by noise. Thus, distance and reprojection constraints were introduced in our method to restrain the influence of noise. The remainder of this paper is organized as follows.
Section 2 introduces the basic calibration principle and procedure using a laser tracker and its accessorial target sphere. In Section 3, the bundle adjustment of the extrinsic parameters based on distance and reprojection constraints is described in detail. Section 4 presents a physical experiment conducted to verify the feasibility of the proposed approach and, specifically, the accuracy improvement due to the bundle adjustment. Conclusions are drawn in Section 5.
2. Calibration Principle
Since the local coordinate system of a vision sensor is always consistent with its camera coordinate system, we hereinafter refer to the camera coordinate frame as the coordinate frame of the vision sensor. Without loss of generality, the global calibration of multiple vision sensors without overlapping FOVs is represented in this section by the calibration of two non-overlapping cameras. As illustrated in Figure 1, the relative pose between the left and right cameras is to be calibrated. For brevity, the global coordinate system established by the laser tracker is denoted as the global coordinate system (GCS), and the local coordinate systems (LCS) of the left and right cameras are denoted as the LLCS and the RLCS, respectively.
The basic calibration procedure contains three steps. First, the target sphere is placed at several (at least three) different positions and observed by the camera at each of them. The three-dimensional coordinates of the sphere center in the GCS can be read directly from the laser tracker, and its coordinates in the LCS can be reconstructed from the known radius of the sphere and its projection in the image. Second, the relative position and orientation between the LCS and the GCS are calculated from the point correspondences established in the first step. Finally, the relative pose between the LLCS and the RLCS is computed by rigid transformation.
2.1. 3D Localization of the Target Sphere Center in the LCS and the GCS
At each placement position, the coordinates of the target sphere center in the GCS can be read directly from the laser tracker. Meanwhile, the coordinates of the target sphere center in the LCS can be reconstructed from the projection contour of the target sphere on the image plane, following Shiu et al. [21].
Specifically, the equation of the ellipse projected on the image plane by the target sphere is:

$$au^2 + buv + cv^2 + du + ev + f = 0 \quad (1)$$

where (u, v) are the pixel coordinates of a point on the image ellipse. The coefficients a–f of the ellipse can be calculated by elliptic fitting after extracting the contour of the image ellipse. Suppose (x, y, z) are the back-projected coordinates in the LCS of a point on the image ellipse and f0 is the focal length of the camera. Substituting u = f0x/z and v = f0y/z, Equation (1) can be rewritten as:

$$Ax^2 + Bxy + Cy^2 + Dxz + Eyz + Fz^2 = 0 \quad (2)$$

where $A = af_0^2$, $B = bf_0^2$, $C = cf_0^2$, $D = df_0$, $E = ef_0$, and $F = f$. Equation (2) can be further expressed in terms of a quadratic form:

$$\begin{pmatrix} x & y & z \end{pmatrix} Q \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0, \qquad Q = \begin{pmatrix} A & B/2 & D/2 \\ B/2 & C & E/2 \\ D/2 & E/2 & F \end{pmatrix} \quad (3)$$

Since the viewing cone of a sphere is a right circular cone, Q has two equal eigenvalues and a third eigenvalue of the opposite sign. At the ith position, the three-dimensional coordinates $P_c^i$ of the target sphere center in the LCS are:

$$P_c^i = R_0 \sqrt{\frac{\lambda_1 - \lambda_3}{-\lambda_3}}\; e_3 \quad (4)$$

where {λ1, λ2, λ3} are the eigenvalues of the matrix Q in Equation (3) with λ1 = λ2, and e3 = (e3x, e3y, e3z)ᵀ is the unit eigenvector of Q corresponding to the eigenvalue λ3, with its sign chosen so that e3z > 0. Furthermore, R0 is the radius of the target sphere. It can be seen from Equations (3) and (4) that the three-dimensional coordinates of the target sphere centers in the LCS are closely related to the size of the target sphere and its position relative to the camera.
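For concreteness, a minimal numerical sketch of this localization step is given below, assuming the ellipse coefficients a–f from Equation (1), the focal length f0 (in pixels), and the sphere radius R0 are already available; the function name and the eigenvalue bookkeeping are our own illustration of the Shiu-style method, not the authors' code.

```python
import numpy as np

def sphere_center_from_ellipse(a, b, c, d, e, f, f0, R0):
    """Recover the sphere center in the camera frame (LCS) from the
    coefficients of its projected image ellipse (Equations (1)-(4))."""
    # Quadratic form Q of the viewing cone through the image ellipse (Eq. 3).
    Q = np.array([
        [a * f0**2,     b * f0**2 / 2, d * f0 / 2],
        [b * f0**2 / 2, c * f0**2,     e * f0 / 2],
        [d * f0 / 2,    e * f0 / 2,    f         ],
    ])
    lam, vecs = np.linalg.eigh(Q)          # eigenvalues in ascending order
    # For a sphere the viewing cone is circular: two (nearly) equal
    # eigenvalues of one sign and a third of the opposite sign.
    signs = np.sign(lam)
    i3 = 0 if signs[0] != signs[1] else 2  # index of the odd-one-out
    lam3, e3 = lam[i3], vecs[:, i3]
    lam1 = lam[1 if i3 == 0 else 0]
    if lam1 < 0:                           # scale of Q is arbitrary; make
        lam1, lam3 = -lam1, -lam3          # lam1 > 0 > lam3
    # Distance from the camera center to the sphere center (Eq. 4).
    dist = R0 * np.sqrt((lam1 - lam3) / -lam3)
    if e3[2] < 0:                          # axis must point in front of
        e3 = -e3                           # the camera (z > 0)
    return dist * e3
```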
2.2. Local Calibration of the Relative Pose between the LCS and the GCS
Once the three-dimensional coordinates of the target sphere center in the LCS and the GCS at several placements are known, the local calibration of the relative pose between the LCS and the GCS can be accomplished through the following two steps. In the first step, three positions of the target sphere center are randomly selected to establish a transfer coordinate system (TCS) and obtain the initial value of the transformation matrix from the LCS to the GCS. Then, all positions of the target sphere center are involved in the optimization of the transformation matrix using a nonlinear iterative method such as the well-known Levenberg–Marquardt algorithm.
Assume that the coordinates of the target sphere center at three different positions in the LCS (LLCS or RLCS) are $P_c^1$, $P_c^2$, and $P_c^3$, respectively, and that their counterparts in the GCS are $P_g^1$, $P_g^2$, and $P_g^3$. As long as the three points do not lie on a straight line, we can establish a TCS as follows:
- (1) The origin $O_t$ of the TCS in the LCS and the GCS is given by:

$$O_t^c = \left( \frac{x_c^1 + x_c^2 + x_c^3}{3},\; \frac{y_c^1 + y_c^2 + y_c^3}{3},\; \frac{z_c^1 + z_c^2 + z_c^3}{3} \right)^T, \qquad O_t^g = \left( \frac{x_g^1 + x_g^2 + x_g^3}{3},\; \frac{y_g^1 + y_g^2 + y_g^3}{3},\; \frac{z_g^1 + z_g^2 + z_g^3}{3} \right)^T \quad (5)$$

where $(x_c^i, y_c^i, z_c^i)^T = P_c^i$ and $(x_g^i, y_g^i, z_g^i)^T = P_g^i$ for i = 1, 2, and 3.
- (2) The x-axis $X_t$ of the TCS in the LCS and the GCS is represented by:

$$X_t^c = \frac{P_c^2 - P_c^1}{\left\| P_c^2 - P_c^1 \right\|}, \qquad X_t^g = \frac{P_g^2 - P_g^1}{\left\| P_g^2 - P_g^1 \right\|} \quad (6)$$

- (3) The z-axis $Z_t$ of the TCS in the LCS and the GCS is:

$$Z_t^c = \frac{\left( P_c^2 - P_c^1 \right) \times \left( P_c^3 - P_c^1 \right)}{\left\| \left( P_c^2 - P_c^1 \right) \times \left( P_c^3 - P_c^1 \right) \right\|}, \qquad Z_t^g = \frac{\left( P_g^2 - P_g^1 \right) \times \left( P_g^3 - P_g^1 \right)}{\left\| \left( P_g^2 - P_g^1 \right) \times \left( P_g^3 - P_g^1 \right) \right\|} \quad (7)$$

- (4) The y-axis $Y_t$ of the TCS can be calculated by means of the cross product of $Z_t$ and $X_t$, that is:

$$Y_t^c = Z_t^c \times X_t^c, \qquad Y_t^g = Z_t^g \times X_t^g \quad (8)$$
According to Equation (5), the translation vectors $T_{tc}$ and $T_{tg}$ from the TCS to the LCS and to the GCS are respectively given by:

$$T_{tc} = O_t^c, \qquad T_{tg} = O_t^g \quad (9)$$

Based on Equations (6)–(8), the rotation matrices $R_{tc}$ and $R_{tg}$ from the TCS to the LCS and to the GCS can be calculated as follows:

$$R_{tc} = \left[\, X_t^c \;\; Y_t^c \;\; Z_t^c \,\right], \qquad R_{tg} = \left[\, X_t^g \;\; Y_t^g \;\; Z_t^g \,\right] \quad (10)$$

Now, we can calculate the relative pose (that is, the rotation matrix $R_{cg}$ and the translation vector $T_{cg}$) between the LCS and the GCS using Equations (9) and (10):

$$R_{cg} = R_{tg} R_{tc}^{T}, \qquad T_{cg} = T_{tg} - R_{tg} R_{tc}^{T}\, T_{tc} \quad (11)$$
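Equations (5)–(11) condense into a few lines of code; in the sketch below the three sphere-center positions are NumPy 3-vectors, and the helper names (`transfer_frame`, `initial_pose`) are illustrative assumptions rather than names from the paper.

```python
import numpy as np

def transfer_frame(P1, P2, P3):
    """Build the TCS (Equations (5)-(8)) from three non-collinear
    sphere-center positions; returns its rotation and origin in the
    frame the points are expressed in."""
    O = (P1 + P2 + P3) / 3.0                    # origin, Eq. (5)
    X = (P2 - P1) / np.linalg.norm(P2 - P1)     # x-axis, Eq. (6)
    n = np.cross(P2 - P1, P3 - P1)
    Z = n / np.linalg.norm(n)                   # z-axis, Eq. (7)
    Y = np.cross(Z, X)                          # y-axis, Eq. (8)
    return np.column_stack([X, Y, Z]), O        # R (Eq. 10), T (Eq. 9)

def initial_pose(Pc, Pg):
    """Closed-form initial LCS -> GCS pose (Eq. (11)) from three pairs
    of sphere centers Pc (LCS) and Pg (GCS)."""
    R_tc, T_tc = transfer_frame(*Pc)            # TCS expressed in the LCS
    R_tg, T_tg = transfer_frame(*Pg)            # TCS expressed in the GCS
    R_cg = R_tg @ R_tc.T
    T_cg = T_tg - R_cg @ T_tc
    return R_cg, T_cg
```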
Using the rotation matrix and the translation vector given by Equation (11) as the initial values, the relative pose can be further optimized by minimizing the objective function defined by:

$$f\left( R_{cg}, T_{cg} \right) = \sum_{i=1}^{N} d_i^2 \quad (12)$$

where

$$d_i = \left\| P_g^i - \left( R_{cg} P_c^i + T_{cg} \right) \right\| \quad (13)$$

and N is the number of placement positions of the target sphere.
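This refinement could be carried out, for example, with SciPy's Levenberg–Marquardt solver; the rotation-vector parameterization below is our choice for the sketch, not something prescribed by the paper.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def refine_pose(R_init, T_init, Pc, Pg):
    """Refine the LCS -> GCS pose over all N sphere positions by
    minimizing the residuals of Equations (12)-(13).
    Pc, Pg: (N, 3) arrays of sphere centers in the LCS and the GCS."""
    x0 = np.hstack([Rotation.from_matrix(R_init).as_rotvec(), T_init])

    def residuals(x):
        R = Rotation.from_rotvec(x[:3]).as_matrix()
        return (Pg - (Pc @ R.T + x[3:])).ravel()  # stacked d_i components

    sol = least_squares(residuals, x0, method='lm')  # Levenberg-Marquardt
    return Rotation.from_rotvec(sol.x[:3]).as_matrix(), sol.x[3:]
```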
2.3. Global Calibration of the Relative Pose between Vision Sensors
Suppose the rotation matrix and the translation vector between the LLCS and the GCS are $R_{lg}$ and $T_{lg}$, and their counterparts between the RLCS and the GCS are $R_{rg}$ and $T_{rg}$. The rotation matrix $R_{rl}$ and the translation vector $T_{rl}$ from the RLCS to the LLCS can then be given by:

$$R_{rl} = R_{lg}^{T} R_{rg}, \qquad T_{rl} = R_{lg}^{T} \left( T_{rg} - T_{lg} \right) \quad (14)$$

where $R_{lg}$, $T_{lg}$, $R_{rg}$, and $T_{rg}$ can be obtained using the method described in Section 2.2. Thus, the global calibration of the extrinsic parameters between the two vision sensors is accomplished once the two local calibrations have been carried out.
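In code, Equation (14) is a one-line composition of the two local calibrations; the subscript convention (`lg` for LLCS to GCS, `rg` for RLCS to GCS) follows the notation above.

```python
import numpy as np

def relative_pose(R_lg, T_lg, R_rg, T_rg):
    """Pose of the RLCS relative to the LLCS (Equation (14)) from the
    two local calibrations against the common GCS."""
    R_rl = R_lg.T @ R_rg              # rotation RLCS -> LLCS
    T_rl = R_lg.T @ (T_rg - T_lg)     # translation RLCS -> LLCS
    return R_rl, T_rl
```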
3. Bundle Adjustment Based on Distance and Reprojection Constraints
In Section 2, we described in detail the global calibration procedure for multiple vision sensors without an overlapping field of view. Nevertheless, the practical image of the target sphere usually deteriorates to some extent due to variations in the environmental illumination and the observation angle of the vision sensor. Furthermore, uncertainties in the feature extraction during sphere image processing and in the elliptic fitting process also introduce errors into the final three-dimensional location of the target sphere center. Position errors of the target sphere center in the LCS inevitably reduce the extrinsic calibration accuracy.
In order to decrease the adverse impact introduced by the deviation of the three-dimensional positions of the target sphere in the LCS, two constraints are taken into consideration. One is the distance between different positions of the target sphere center in the GCS, and the other involves the image projection of the target sphere center in addition to that of the sphere contour.
Assume that the three-dimensional coordinates of the target sphere center in the GCS and the LCS at the ith position are $P_g^i$ and $P_c^i$, respectively, and that the deviation vector between the reconstructed coordinates of the target sphere center and the real ones at the ith position in the LCS is $\Delta P_c^i$. Then, we can optimize the local calibration by bundle adjustment based on the distance and projection constraints, and the objective function is given by:

$$\min_{R_{cg},\, T_{cg},\, \{\Delta P_c^i\}} \; \sum_{i=1}^{N} \left\| P_g^i - \left[ R_{cg} \left( P_c^i - \Delta P_c^i \right) + T_{cg} \right] \right\|^2 + \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left( D_c^{ij} - D_g^{ij} \right)^2 + \sum_{i=1}^{N} \left( d_i^2 + e_i^2 \right) \quad (15)$$

where $D_c^{ij} = \left\| \left( P_c^i - \Delta P_c^i \right) - \left( P_c^j - \Delta P_c^j \right) \right\|$ and $D_g^{ij} = \left\| P_g^i - P_g^j \right\|$, so that $\left( D_c^{ij} - D_g^{ij} \right)^2$ represents the distance constraint between the three-dimensional coordinates of the target sphere centers in the LCS and the GCS. The term $d_i$ represents the distance between the image projection point of the target sphere center and the major axis of the projection ellipse of the target sphere, and $e_i$ represents the reprojection error of the reconstructed target sphere center positions. Assume that the camera intrinsic parameters fx, fy, u0, and v0 and the distortion parameters k1 and k2 are known, and that the linear equation of the major axis of the projection ellipse of the target sphere is l = (l1, −1, l2). Then, di can be given by:

$$d_i = \frac{\left| l_1 u_i - v_i + l_2 \right|}{\sqrt{l_1^2 + 1}} \quad (16)$$

where $(u_i, v_i)$ is the image projection of the corrected sphere center $P_c^i - \Delta P_c^i$.
Based on the conclusion of Sun et al. [22], the image projection point of the target sphere center should be located on the major axis of the projection ellipse of the target sphere; that is, the ideal value of di should be zero. Additionally, according to Daucher et al. [23], the principal point of the image should also lie on the major axis of the projection ellipse of the target sphere. We can therefore obtain the linear equation of the major axis from the known principal point of the image (u0, v0) and the geometric center of the projection ellipse (uc, vc) as follows:

$$l_1 = \frac{v_c - v_0}{u_c - u_0}, \qquad l_2 = v_0 - l_1 u_0 \quad (17)$$

where the coordinates of the geometric center of the projection ellipse are given by:

$$u_c = \frac{be - 2cd}{4ac - b^2}, \qquad v_c = \frac{bd - 2ae}{4ac - b^2} \quad (18)$$

and the coefficients a, b, c, d, and e are defined in Equation (1).
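Equations (16)–(18) translate directly into small helpers; the sketch below assumes undistorted pixel coordinates and the conic coefficients of Equation (1).

```python
import numpy as np

def ellipse_center(a, b, c, d, e):
    """Geometric center of the conic of Equation (1) (Equation (18))."""
    den = 4 * a * c - b**2
    return (b * e - 2 * c * d) / den, (b * d - 2 * a * e) / den

def major_axis_line(u0, v0, uc, vc):
    """Line l = (l1, -1, l2) through the principal point and the ellipse
    center (Equation (17)); assumes the axis is not vertical (uc != u0)."""
    l1 = (vc - v0) / (uc - u0)
    return l1, v0 - l1 * u0

def axis_distance(u, v, l1, l2):
    """Distance d_i from the projected sphere center (u, v) to the
    major axis (Equation (16))."""
    return abs(l1 * u - v + l2) / np.sqrt(l1**2 + 1)
```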
Additionally, the error function ei represents the mean distance between the extracted ellipse feature points $(u_{ij}, v_{ij})$ and the reprojected ellipse at the ith position, which is defined as follows:

$$e_i = \frac{1}{M} \sum_{j=1}^{M} \operatorname{dist}\!\left( \left( u_{ij}, v_{ij} \right),\; \hat{E}_i \right) \quad (19)$$

where M represents the number of extracted feature points on the ith projection image and $\hat{E}_i$ is the reprojected ellipse of the target sphere. Given the known target sphere center $\left( P_c^i - \Delta P_c^i \right)$, the radius of the sphere R0, and the focal length f0, the reprojected ellipse, and hence ei, can be easily calculated by the method proposed by Shiu et al. [21].
During the bundle adjustment process, each $\Delta P_c^i$ is set to zero as its initial value, and the other parameters can be initialized using the method described in Section 2. From Equations (15)–(19), the optimal estimates of the rotation matrix and translation vector between the LCS and the GCS can be obtained by means of the large-scale trust-region reflective algorithm. After this precise local calibration, the precise calibration between the vision sensors can be realized using the method described in Section 2.3.
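One possible shape of this bundle adjustment, using SciPy's trust-region reflective solver, is sketched below. The stacking of the residual terms and the `image_residuals` callback (which should return the d_i and e_i values for the corrected centers, Equations (16)–(19)) are our assumptions about how Equation (15) would be assembled in practice.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def bundle_adjust(R_init, T_init, Pc, Pg, image_residuals):
    """Jointly refine the LCS -> GCS pose and the per-position center
    corrections delta P_c^i (Equation (15))."""
    N = len(Pc)
    x0 = np.hstack([Rotation.from_matrix(R_init).as_rotvec(),
                    T_init, np.zeros(3 * N)])    # corrections start at zero

    def residuals(x):
        R = Rotation.from_rotvec(x[:3]).as_matrix()
        Pc_hat = Pc - x[6:].reshape(N, 3)        # corrected LCS centers
        align = (Pg - (Pc_hat @ R.T + x[3:6])).ravel()
        iu, ju = np.triu_indices(N, k=1)         # all pairs i < j
        Dc = np.linalg.norm(Pc_hat[iu] - Pc_hat[ju], axis=1)
        Dg = np.linalg.norm(Pg[iu] - Pg[ju], axis=1)
        return np.hstack([align, Dc - Dg, image_residuals(Pc_hat)])

    sol = least_squares(residuals, x0, method='trf')  # trust-region reflective
    R = Rotation.from_rotvec(sol.x[:3]).as_matrix()
    return R, sol.x[3:6], sol.x[6:].reshape(N, 3)
```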
4. Experimental Results
To verify the feasibility of the proposed method, a typical vision measurement system with two non-overlapping vision sensors was established, as illustrated in Figure 2. The two vision sensors were both AVT GC1380H digital cameras with 12 mm Schneider lenses, and their image resolution was 1360 × 1024 pixels. The reference GCS was built on a Leica API-T3 laser tracker, and the radius of its target sphere was 19.5 mm. The LLCS and the RLCS coincided with the corresponding local camera coordinate systems.
During the experiment, the target sphere was moved several times (no less than three) within the FOV of each camera, and its positions were spread out to cover the measurement range of each camera as fully as possible. For each position of the target sphere, the laser tracker took 10 samples of the sphere center coordinates, and their centroid was taken as the precise coordinates of the target sphere center in the GCS, thus reducing the positioning noise of the target sphere center. Similarly, at each position the camera captured 10 projection images of the target sphere to reduce the influence of illumination on the image quality.
In the following subsections, the ellipse detection and fitting results for the projection images of the target sphere are first introduced. Then, the 3D positions of the target sphere center in the LCS and the GCS at each position are given. Finally, the accuracy of the global calibration results before and after the bundle adjustment is analyzed and compared.
4.1. Ellipse Detection and Fitting in the Projection Image of the Target Sphere
After the projection images of the target sphere were captured, edge feature points were selectively extracted using the Harris detector. Then, the ellipse center of the projected sphere was located and an initial estimate of the ellipse equation coefficients was calculated using the method proposed by Bennett et al. [24]. Finally, the parameters of the projection ellipse equation were further optimized using least-squares fitting. Figure 3 and Figure 4 exemplify the ellipse detection and fitting results for the left and right projection images, respectively.
It is noteworthy that the distortion of all extracted feature points should be corrected before they are used to locate the ellipse center and to fit the ellipse equation.
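As an illustration of this preprocessing, the sketch below undistorts the edge points and fits the ellipse with OpenCV's least-squares `fitEllipse`, standing in for the Bennett-initialized fit used in the paper; `K` and `dist_coeffs` denote the intrinsic matrix and distortion coefficients from Table 1.

```python
import cv2
import numpy as np

def fit_projection_ellipse(edge_pts, K, dist_coeffs):
    """Undistort extracted edge points, then fit the projection ellipse.
    edge_pts: (N, 2) array of pixel coordinates from the edge detector."""
    pts = edge_pts.reshape(-1, 1, 2).astype(np.float32)
    # Remove lens distortion; P=K maps the points back to pixel coordinates.
    undist = cv2.undistortPoints(pts, K, dist_coeffs, P=K).reshape(-1, 2)
    # Least-squares ellipse fit: returns center, axis lengths, orientation.
    (cx, cy), (major, minor), angle = cv2.fitEllipse(undist)
    return (cx, cy), (major, minor), angle
```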
4.2. 3D Localization of the Target Sphere Center in the LCS and GCS
The intrinsic parameters of the left and right cameras were obtained using the method presented by Zhang [2] and are shown in Table 1.
For each camera, the target sphere was placed at ten different positions. The 3D coordinates of the target sphere center in the GCS read by the laser tracker, and their counterparts in the LLCS and the RLCS reconstructed by Equation (4) with R0 = 19.5 mm, are shown in Table 2.
4.3. Results Analysis and Accuracy Comparison
Given the three-dimensional coordinates of the target sphere center in the GCS and the LCS, the initial global calibration of the relative pose between the LLCS and the RLCS can be realized using the method described in Section 2.3. The rectified 3D coordinates of the target sphere center in the LLCS and the RLCS after the bundle adjustment based on the distance and reprojection constraints are shown in Table 3. The extrinsic parameters between the LLCS and the RLCS before and after the bundle adjustment are shown in Table 4.
Using the distances between the left and right points in the GCS as reference distances, we can easily compute the corresponding distance errors using the points in the LCS. According to the calibration results, the root mean square error of the 100 distances was about 3.5 mm before optimization and 0.8 mm after the bundle adjustment, an improvement of more than a factor of four. The error curves calculated from the points before and after the bundle adjustment are displayed in Figure 5.
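The distance-error statistic can be reproduced with a few lines, assuming `Pl`/`Pr` hold the ten camera-side sphere centers expressed in a common frame via the calibrated pose and `Gl`/`Gr` their laser-tracker counterparts; the array names are ours.

```python
import numpy as np

def distance_rmse(Pl, Pr, Gl, Gr):
    """RMS error of all left-right point distances, with the
    laser-tracker (GCS) distances as the reference."""
    d_cam = np.linalg.norm(Pl[:, None, :] - Pr[None, :, :], axis=2)
    d_ref = np.linalg.norm(Gl[:, None, :] - Gr[None, :, :], axis=2)
    return np.sqrt(np.mean((d_cam - d_ref) ** 2))  # over all 10 x 10 pairs
```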
Meanwhile, the reprojection errors of the initial and rectified 3D points in the LLCS and the RLCS were also calculated; they are shown in Figure 6 and Figure 7, respectively. From these figures, we can see that the distance errors between the extracted feature points and the reprojected ellipse of each sphere center point were all reduced after the bundle adjustment.