We evaluate the proposed method on synthetic and real data. The synthetic data contain synthetically generated motion information of an IMU–camera system and corresponding calibration pattern points; the two sensors are rigidly connected. Using the synthetic data, we validate the accuracy and stability of the proposed method by running 100 Monte-Carlo simulations with fixed noise parameters. Furthermore, we compare the calibration results with and without noise identification under random noise parameters to demonstrate the effect of noise estimation. In the real-data experiments, we use two setups: a self-designed system equipped with two rolling shutter cameras and one low-cost IMU, and a commercial smartphone with a rolling shutter camera and a low-cost IMU. The first setup is specially designed to evaluate the extrinsic calibration accuracy. Since ground truth cannot be obtained for the real-data experiments, we indirectly measure the performance based on loop-closing errors between the two cameras and the IMU. With these two setups, we first analyze the performance of the proposed method based on the standard deviation of the estimates. For reference, we also compare the proposed method with hand-measured values in terms of extrinsics and with existing calibration methods in terms of intrinsics and extrinsics. These comparisons show that the proposed method produces comparable results for camera intrinsic calibration. In addition, the smartphone experiments show that the proposed method is more robust to measurement noise than the existing calibration method.
6.1. Synthetic Data
We generate synthetic measurements for a frame length of 60 s. For this, we synthetically create 20 corners of a checkerboard pattern in the world coordinate and a smooth trajectory of the mobile platform by applying a sinusoidal function, as shown in Figure 5a. The points in the world coordinate are projected to the image coordinate as in [40] and are used as synthetic visual measurements for IMU–camera calibration. We set the image resolution to 1280 × 960. The focal length is set to 700 and the radial distortion coefficients are set to [0.1 −0.1]. The rolling shutter readout time is set to 41.8 μs. The standard deviation of the Gaussian noise for the visual measurements is set to 1 pixel. The IMU acceleration and angular velocity measurements along the world coordinate are obtained by differentiating the trajectory of the IMU. Then, to generate uncalibrated measurements, the intrinsic parameters, including the scale factors, misalignments, biases, and the gravity in the world coordinate, are applied to the inertial measurements based on Equation (12). The accelerometer and gyroscope scale factors, misalignments, and biases are set to 1.1, 1.1, 0.03, 0.03, 0.1, and 0.1, respectively. Each of these parameters is represented by a 3D vector whose elements are all set to the same value; for instance, every element of the accelerometer scale-factor vector is set to 1.1. Finally, we apply additive white Gaussian noise to the inertial measurements.
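As a rough illustration, the sketch below shows how such uncalibrated inertial measurements could be synthesized under a common measurement model (diagonal scale factors, a lower-triangular misalignment matrix, constant biases, gravity removal for the accelerometer, and additive white noise). The exact form of Equation (12) and the paper's parameterization may differ, and all function and variable names here are illustrative.

```python
import numpy as np

def misalignment_matrix(m):
    # One common lower-triangular 3-parameter misalignment model
    # (the exact parameterization in Equation (12) may differ).
    return np.array([[1.0,  0.0,  0.0],
                     [m[0], 1.0,  0.0],
                     [m[1], m[2], 1.0]])

def simulate_imu(acc_world, gyr_body, R_wb, g_w,
                 s_a, s_g, m_a, m_g, b_a, b_g,
                 noise_std_a, noise_std_g, seed=0):
    """acc_world: (N,3) true acceleration in the world coordinate,
    gyr_body: (N,3) true angular velocity in the body coordinate,
    R_wb: (N,3,3) rotation from body to world at each IMU sample."""
    rng = np.random.default_rng(seed)
    K_a = np.diag(s_a) @ misalignment_matrix(m_a)   # scale * misalignment
    K_g = np.diag(s_g) @ misalignment_matrix(m_g)
    # Specific force sensed by the accelerometer, expressed in the body frame.
    f_body = np.einsum('nji,nj->ni', R_wb, acc_world - g_w)
    acc_meas = f_body @ K_a.T + b_a + rng.normal(0.0, noise_std_a, f_body.shape)
    gyr_meas = gyr_body @ K_g.T + b_g + rng.normal(0.0, noise_std_g, gyr_body.shape)
    return acc_meas, gyr_meas
```

For the values above, s_a = s_g = [1.1, 1.1, 1.1], m_a = m_g = [0.03, 0.03, 0.03], and b_a = b_g = [0.1, 0.1, 0.1].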
Figure 5b,c shows an example of the synthetic inertial measurements. The frame rates of the IMU and the camera measurements are set to 100 and 25 Hz, respectively. For the extrinsic parameters, the relative rotation, relative translation, and time delay between the IMU and the camera are set to [50 −50 50]°, [50 50 −50] mm, and 100 ms, respectively. The calibration parameters are initialized as described in Section 5.3.
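To make the simulated visual pipeline concrete, the following sketch shows one way the checkerboard corners could be projected into a rolling-shutter image given the trajectory, the readout time, and the IMU–camera time delay. The pinhole-plus-radial-distortion model and the fixed-point iteration over the capture row are illustrative assumptions that follow the spirit, not necessarily the exact form, of the projection in [40]; cam_pose_at is a hypothetical interpolator of the synthetic trajectory.

```python
import numpy as np

def distort(xn, yn, k1, k2):
    # Two-coefficient radial distortion, e.g. k1 = 0.1, k2 = -0.1.
    r2 = xn * xn + yn * yn
    d = 1.0 + k1 * r2 + k2 * r2 * r2
    return xn * d, yn * d

def project_rs(p_w, frame_time, cam_pose_at, f, cx, cy, k1, k2,
               line_delay, time_delay, height, iters=3):
    """Project a 3D checkerboard corner p_w (world coordinate) into a
    rolling-shutter image whose rows are exposed sequentially.
    cam_pose_at(t) -> (R_cw, t_cw): interpolated camera pose at time t."""
    v = 0.5 * height                          # initial row guess: image center
    for _ in range(iters):                    # fixed-point iteration on the row time
        t = frame_time + time_delay + v * line_delay
        R_cw, t_cw = cam_pose_at(t)
        p_c = R_cw @ p_w + t_cw               # point in the camera coordinate
        xd, yd = distort(p_c[0] / p_c[2], p_c[1] / p_c[2], k1, k2)
        u, v = f * xd + cx, f * yd + cy
    return u, v                               # 1-pixel Gaussian noise is added afterwards
```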
In the first experiment, we evaluate the proposed method with fixed IMU noise parameters: the accelerometer noise density is set to 0.1 m/s²/√Hz and the gyroscope noise density is set to 0.1 °/s/√Hz. The purpose of this experiment is to show how accurately and stably the proposed method estimates the calibration and noise parameters.
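For interpretation, under the usual assumption that a continuous-time noise density is converted to a per-sample standard deviation by multiplying by the square root of the sampling rate (this conversion is standard practice and is assumed here, not quoted from the paper), these densities correspond to the following discrete noise levels at the 100 Hz IMU rate:

\[
\sigma_{\mathrm{discrete}} = \sigma_{\mathrm{density}}\sqrt{f_{\mathrm{IMU}}},\qquad
0.1\,\tfrac{\mathrm{m/s^2}}{\sqrt{\mathrm{Hz}}} \times \sqrt{100\,\mathrm{Hz}} = 1.0\,\mathrm{m/s^2},\qquad
0.1\,\tfrac{^{\circ}/\mathrm{s}}{\sqrt{\mathrm{Hz}}} \times \sqrt{100\,\mathrm{Hz}} = 1.0\,^{\circ}/\mathrm{s}.
\]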
Figure 6 shows the estimates of the calibration parameters and the standard deviations of their errors, which are the square roots of the diagonal elements of the state error covariance, over the 60-s frame length in the calibration and noise identification step. In our framework, the calibration parameters and their covariances are estimated by the UKF while the noise parameters are estimated by nonlinear optimization. The graphs in Figure 6 show the calibration estimates obtained by the UKF at the optimal noise parameter estimates found by the nonlinear optimization. In Figure 6a, the errors of the states, including the rotation, translation, time delay, accelerometer and gyroscope biases, misalignments, and scale factors, as well as the focal length, the radial distortion, and the rolling shutter readout time, rapidly converge to zero. The errors are computed using the ground truth calibration parameters. The rotation errors are displayed in Euler angle form for intuitive representation. In Figure 6b, the error standard deviations obtained by the UKF also converge rapidly.
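As a schematic of this graybox loop (an outer nonlinear optimization over the IMU noise densities wrapping the inner UKF), the sketch below assumes a maximum-likelihood objective over the filter innovations, which is the standard graybox formulation. The paper's exact cost function, state layout, and optimizer are not reproduced, and run_ukf is a hypothetical stand-in for the calibration filter.

```python
import numpy as np
from scipy.optimize import minimize

def identify_noise(run_ukf, measurements, x0=(0.05, 0.05)):
    """run_ukf(measurements, sigma_acc, sigma_gyr) is assumed to return
    (states, innovations, innovation_covariances) for the calibration UKF."""
    def negative_log_likelihood(log_noise):
        sigma_acc, sigma_gyr = np.exp(log_noise)     # log-space keeps densities positive
        _, innovations, covs = run_ukf(measurements, sigma_acc, sigma_gyr)
        cost = 0.0
        for nu, S in zip(innovations, covs):         # innovation nu_k with covariance S_k
            cost += 0.5 * (np.log(np.linalg.det(S)) + nu @ np.linalg.solve(S, nu))
        return cost

    res = minimize(negative_log_likelihood, np.log(x0), method="Nelder-Mead")
    sigma_acc_hat, sigma_gyr_hat = np.exp(res.x)
    # The reported calibration estimates are the UKF states obtained with the
    # optimal noise densities; these can then be refined in a separate step.
    states, _, _ = run_ukf(measurements, sigma_acc_hat, sigma_gyr_hat)
    return states, (sigma_acc_hat, sigma_gyr_hat)
```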
Table 4 summarizes the final calibration and noise parameter estimates of the proposed method over the 100 Monte-Carlo simulations. We report the mean and standard deviation of the estimates to provide statistically meaningful results.
Table 4d shows the noise parameter estimates obtained by the proposed method. The ground truth of the noise parameters lies within the error bound of the estimates, which indicates that the proposed noise identification estimates the noise parameters reasonably well. Therefore, the estimated noise parameters can be used to further refine the calibration parameters in the refinement step.
Table 4a–c shows the extrinsic and intrinsic parameter estimates of the IMU and the camera. The differences between the estimates and the ground truth are smaller than their standard deviations. This validates that the proposed method accurately estimates all calibration and noise parameters.
In the second experiment, we generate 100 sequences with different IMU noise parameters and use them to analyze the proposed calibration method with and without noise density estimation. The purpose of this experiment is to show how the proposed noise identification robustly estimates the calibration parameters under unknown IMU noise density. The accelerometer and gyroscope noise densities are randomly selected from 0.001 to 0.2, and the generated noises are added to the inertial measurements. These values cover the noise characteristics of most low- to high-cost IMUs. We run the proposed method with noise estimation and with fixed noise parameters (accelerometer/gyroscope noise densities of 0.2/0.2 and 0.001/0.001).
Table 5 shows the calibration parameter estimates of the proposed method with and without noise estimation. With noise estimation, the proposed calibration method produces accurate calibration parameters: the mean error between the estimates and the ground truth is smaller than the standard deviation of the estimates. In contrast, fixed noise densities cause inaccurate and unstable estimates. The mean of the translation estimates without noise estimation is out of the error bound, and their standard deviation is about two times larger than that of the proposed method with noise estimation. For the bias estimates, the mean is not within the error bound and the standard deviation is larger. In addition, the estimates of the focal length and rolling shutter readout time are strongly affected by the IMU noise parameters, whereas the rotation, misalignments, scale factors, and radial distortions are not. These results validate that the proposed noise identification improves the accuracy and stability of the calibration parameter estimation, especially under unknown IMU noise density.
6.2. Real Data
For the real-data experiments, we use two setups: a self-designed system and a commercial smartphone. Since it is difficult to obtain ground truth for real data, we indirectly evaluate the performance of the proposed method using the results of existing methods and the hand-measured values. In addition, we use the standard deviation of the estimates as a metric of algorithm stability. Moreover, the loop-closing error metric involving the two cameras and the IMU is used to evaluate the extrinsic calibration performance.
First, we evaluate the proposed method using the self-designed system, which consists of two rolling shutter cameras and a low-cost IMU. Figure 7a shows our experimental setup. The rolling shutter cameras are Logitech C80 (Logitech, Lausanne, Switzerland) and the low-cost IMU is a Withrobot myAHRS+. The image resolution is 640 × 480, and the frame rates of the camera and the IMU are 30 and 100 Hz, respectively. We record each set of measurements for approximately 90 s and perform experiments on 10 datasets.
Table 6 presents the calibration parameter estimates of camera 1 and the IMU. For the extrinsic and IMU intrinsic parameters, we compare our method with an existing calibration method (the Kalibr-imucam method [19]) and with hand-measured values. The noise density of the IMU for the Kalibr-imucam method is set to the noise density estimated by the proposed method.
Table 6a describes the extrinsic parameter estimates of the proposed method, the Kalibr-imucam method, and the hand-measured values. The rotation is parameterized by Euler angles instead of a unit quaternion for intuitive understanding. The average rotation, translation, and time delay estimates of the proposed and Kalibr-imucam methods are close to each other, and the average rotation and translation estimates of both methods are within the error bound of the hand-measured values. In addition, the standard deviation of the extrinsic parameter estimates from both methods is very small. This comparison shows that the proposed method successfully estimates the extrinsic parameters; moreover, the noise parameters estimated by the proposed method lead to successful calibration with the existing calibration method.
Furthermore, with the two rolling shutter cameras, we evaluate the extrinsic calibration performance using the loop-closing error metric, which is defined as the error of the transformation loop formed by the low-cost IMU and the two cameras. Specifically, the loop is composed of the relative transformation from the IMU to camera 1, the relative transformation from camera 1 to camera 2, and the relative transformation from camera 2 back to the IMU. The camera-to-camera transformation is estimated by the stereo camera calibration algorithm [43], and the two IMU–camera transformations are estimated by the IMU–camera calibration algorithm. We run the 10 sequences with both the left camera–IMU and the right camera–IMU pairs. Then, we compute 10 × 10 loop-closing errors by bipartite matching (i.e., 100 sets in total).
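A minimal sketch of how such a loop-closing error could be computed from the three estimated transforms is given below. Composing the three 4 × 4 homogeneous transforms should ideally give the identity; the residual rotation angle and translation norm are one natural way to quantify the loop error, and this particular error definition is an assumption rather than a quotation from the paper.

```python
import numpy as np

def loop_closing_error(T_c1_from_imu, T_c2_from_c1, T_imu_from_c2):
    # Composing IMU->camera1, camera1->camera2, and camera2->IMU should be ~identity.
    T_loop = T_imu_from_c2 @ T_c2_from_c1 @ T_c1_from_imu
    R, t = T_loop[:3, :3], T_loop[:3, 3]
    # Residual rotation angle (degrees) and translation norm of the loop.
    angle_err = np.degrees(np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)))
    trans_err = np.linalg.norm(t)
    return angle_err, trans_err
```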
Table 6b shows the mean and standard deviation of the loop-closing errors for the rotation and translation components. The proposed and Kalibr-imucam methods produce very small rotation and translation errors. Since the error bound of the stereo-calibrated camera-to-camera translation is about ±2 mm, the errors of our method and the Kalibr-imucam method are negligible.
Table 6c describes the IMU intrinsic parameter estimates of the proposed and Kalibr-imucam methods. We do not compare against classic IMU intrinsic calibration methods [7,8,9] because they require special equipment, such as robot arms, to estimate the intrinsic parameters. In addition, since the Kalibr-imucam method regards the biases as time-varying parameters, unlike the proposed method, we do not compare the bias estimates in this table. In both methods, the standard deviations of the misalignments and the scale factors are small enough, which indicates that the estimates are reliable. The results show that the z-axis bias of the accelerometer is larger than the x- and y-axis biases because the gravity vector is aligned with the z-axis of the IMU coordinate (i.e., the z-axis values of the inertial measurements are much larger than those of the x- and y-axes). Interestingly, the misalignments are close to zero and the scale factors are close to one, which means that the IMU is already somewhat intrinsically calibrated, although it is a low-cost sensor. The average misalignment and scale factor estimates of the two methods differ slightly because the Kalibr-imucam method also considers the effects of linear accelerations on the gyroscopes; however, these numerical differences are negligible.
For the camera intrinsic parameters, we compare our method with the Kalibr-rs method [12] and the MATLAB calibration toolbox, as shown in Table 6d. The Kalibr-rs method uses the same image sequences as the proposed method, and the MATLAB toolbox uses 20 still images of the same checkerboard. The focal length and radial distortion estimates of the three methods are close to each other. In addition, the average rolling shutter readout time estimate of the proposed method is close to that of the Kalibr-rs method. Although there are small differences among the estimates obtained from the three methods, they are negligible. Moreover, the smaller standard deviation of the estimates from the proposed method indicates that it is more reliable than the Kalibr-rs method. This comparison of the intrinsic parameter estimates from the proposed method and the two existing methods, which use different measurements, validates the camera intrinsic calibration of the proposed method.
Table 6e describes the estimated noise parameters. The standard deviation of the IMU noise density estimates is small enough, and their mean can be used in other sensor fusion algorithms such as IMU–camera calibration and visual-inertial SLAM methods. Although we do not compare against the noise densities provided by other algorithms or manufacturers, the noise estimation performance of the proposed method is indirectly verified through the comparisons with the hand-measured extrinsic parameters and the camera intrinsic parameters from the MATLAB toolbox. In addition, the successful calibration of the Kalibr-imucam method with the noise density estimates obtained from the proposed method further validates the noise estimation of the proposed method.
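As an illustration of this reuse, the sketch below writes the estimated densities into a Kalibr-style imu.yaml. The field names follow Kalibr's documented IMU configuration format (which may vary by version); the random-walk terms are not estimated by the proposed method and must be supplied separately, and gyroscope units need to be converted to rad/s if the estimates are in °/s.

```python
import yaml  # PyYAML

def write_kalibr_imu_yaml(path, acc_noise_density, gyr_noise_density,
                          acc_random_walk, gyr_random_walk,
                          update_rate_hz, rostopic="/imu0"):
    cfg = {
        "accelerometer_noise_density": float(acc_noise_density),  # [m/s^2/sqrt(Hz)]
        "accelerometer_random_walk": float(acc_random_walk),      # placeholder, not estimated
        "gyroscope_noise_density": float(gyr_noise_density),      # [rad/s/sqrt(Hz)]
        "gyroscope_random_walk": float(gyr_random_walk),          # placeholder, not estimated
        "rostopic": rostopic,
        "update_rate": float(update_rate_hz),
    }
    with open(path, "w") as f:
        yaml.dump(cfg, f, default_flow_style=False)
```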
Second, we evaluate the proposed method using the smartphone "Samsung Galaxy Alpha", which is equipped with a rolling shutter camera and a low-cost IMU; its configuration is shown in Figure 7b. The image resolution is 720 × 480, and the frame rates of the camera and the IMU are 30 and 50 Hz, respectively. We record 10 sets of measurements, each with a frame length of about 90 s.
Table 7 shows the calibration parameter estimates for the smartphone. We compare the proposed method with the existing calibration method (i.e., Kalibr-imucam [19]) and with hand-measured values, as in the previous experiment. The average rotation estimates of the proposed method and the Kalibr-imucam method are close to the hand-measured values. The standard deviation of the rotation estimates from the proposed method is smaller than that from the Kalibr-imucam method, which indicates that the proposed method is more reliable. Moreover, the translation estimates of the Kalibr-imucam method converge to unrealistic values of zero, whereas our estimates are close to the hand-measured values. The average time delay estimates of the proposed method are close to those of [19]; however, the standard deviation is large because an irregular time delay occurs due to the camera module activation of the smartphone. In summary, in this experiment the extrinsic calibration of the proposed method outperforms the Kalibr-imucam method, unlike in the first experiment with the self-designed setup. We attribute this to the proposed method being more robust to noisy measurements, because our framework is based on the graybox method, which internally uses the Kalman filter. In practice, the rolling shutter camera and the IMU of the commercial smartphone contain larger noises than those of the self-designed setup, as described in Table 7d.
Table 7b demonstrates the IMU intrinsic estimates of the proposed method. Similar to the results of the first experiment, the z-axis bias of the accelerometer is larger than those of the other axes. The misalignment and scale factor estimates of the accelerometer from the proposed and Kalibr-imucam methods are close to zero and one, respectively, and the biases and misalignments of the gyroscope are close to zero. This indicates that they were already intrinsically calibrated. However, the scale factors of the gyroscope are calibrated by the proposed and Kalibr-imucam methods, as shown in the table. Although the scale factor estimates of the two methods differ slightly, the lower standard deviation of our estimates indicates that the proposed method is more reliable.
For the camera intrinsic parameters, we compare our estimates with those of the Kalibr-rs method [12] and the MATLAB calibration toolbox, as shown in Table 7c. The focal length, radial distortion, and rolling shutter readout time estimates of the proposed method are close to those of the two existing methods. Although the estimates of the second radial distortion coefficient differ across the methods, the first coefficient is more important than the second. Moreover, the proposed method yields a lower standard deviation of the rolling shutter readout time estimates, as in the first experiment.
Table 7d describes the estimated noise densities. They are about two times larger than the noise densities estimated for the self-designed setup, which means that the IMU noise of the smartphone is more severe than that of the self-designed setup. This experiment shows that a full calibration of off-the-shelf devices is possible with the proposed method.
Furthermore, we compare the operating time of the existing method (Kalibr [12,19]) and our method. The experiments are performed on an Intel i7-4790K CPU using a single core at 4.00 GHz. Table 8 reports the operating time of Kalibr [12,19] and the proposed method. The timing statistics are measured on the 10 datasets used for the above "Samsung Galaxy Alpha" experiment. For the existing method, both Kalibr-rs [12] and Kalibr-imucam [19] must be run because they estimate the intrinsic and extrinsic parameters separately, whereas the proposed method estimates them jointly. The total operating time of the Kalibr method is about 2 h on average for the full calibration. In particular, the operating time of Kalibr-rs [12] is longer than 1 h because of the iterative batch optimization for adaptive knot placement. Although the proposed method estimates the noise parameters in addition to the calibration parameters, it is 1.733 times faster than the Kalibr method on average.
We also compare the prediction errors of a UKF for motion estimation when using the calibration parameters estimated by the Kalibr method and by the proposed method. In this experiment, the states of the UKF are the position, velocity, and orientation of the IMU; they are predicted from the inertial measurements and corrected with the visual measurements, which are the corners of a checkerboard pattern. We record a set of inertial and visual measurements using the smartphone "Samsung Galaxy Alpha" for 60 s. The prediction errors with the Kalibr parameters accumulate due to the IMU noise and result in motion drift. In contrast, Figure 8 shows that the calibration parameters of the proposed method reduce the mean and variance of the prediction errors. The average RMSE over all frames decreases from 3.03 to 2.45 pixels (about 19.2%).
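As a small illustration of the error metric plotted in Figure 8, the per-frame prediction error can be computed as the RMSE between the checkerboard corners projected with the predicted (pre-update) pose and the detected corners. The projection itself is assumed to use the calibrated camera model; the helper below only shows the RMSE step.

```python
import numpy as np

def prediction_rmse(predicted_corners_px, detected_corners_px):
    # Both arrays have shape (num_corners, 2), in pixels: corners projected with
    # the UKF-predicted pose vs. corners detected in the current image.
    err = predicted_corners_px - detected_corners_px
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))
```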
Finally, we compare the localization accuracy obtained with the calibration parameters estimated by the Kalibr method and by the proposed method. In this experiment, we capture image sequences and inertial measurements using the smartphone "Samsung Galaxy Alpha" in an outdoor vehicle driving environment. The trajectories are estimated by VINS-Mono [44], an open-source visual-inertial SLAM algorithm. In this experiment, we disable loop closure correction and the online calibration of the extrinsic parameters and the temporal delay. Figure 9 shows that the trajectory estimated with the proposed method is close to the ground truth trajectory, whereas the trajectory estimated with the Kalibr method suffers from scale drift as the trajectory becomes longer. These experimental results validate that the proposed method also improves the performance of real-world systems.