1. Introduction
Fringe projection profilometry (FPP) provides a convenient way to measure dense and accurate three-dimensional (3D) surface point clouds of target objects. It plays an increasingly important role in various fields such as industrial quality inspection, prototyping, cultural heritage preservation, and the movie industry [1,2,3,4,5]. Owing to the limited field of view (FOV) and object self-occlusion, a 3D point cloud obtained from a single viewpoint contains only partial surface shape data. To reconstruct complete surface models, 3D measurements from multiple viewpoints are required to cover the whole object, and the sensor poses need to be precisely tracked so that these partial surface point clouds can be transformed into a global coordinate system [6,7,8,9].
Existing sensor pose tracking solutions are mostly based on external assistance, such as attaching artificial markers or using external positioning equipment such as a laser tracker or optical coordinate measuring machines (CMMs) [10]; their usage flexibility is therefore inherently limited. Alternatively, sensor poses can be estimated directly by using 3D registration techniques [11,12,13] to compute the relative pose between two sequential measurements. However, sensor pose estimation drift inevitably exists due to 3D registration inaccuracy. A small sensor pose estimation error, which may seem negligible on a local scale, can drastically accumulate along a long scanning trajectory [12,14]. The accumulated error directly leads to surface point cloud inconsistency between the first and last scans, and finally breaks the reconstruction result.
Different optimization methods have been adopted to solve the accumulated error problem. Among them, bundle adjustment (BA) is one of the most well-known approaches; it performs global optimization by minimizing the reprojection error across different frames. Specifically, BA first identifies the same visual feature points appearing in multiple frames, and then adjusts the estimated 3D locations of the feature points together with the camera poses [7,9]. Nevertheless, BA only optimizes sparse 3D feature points and camera poses, so it does not guarantee local shape consistency of the reconstructed 3D models [14]. Besides, visual feature detection is a prerequisite for BA optimization; it cannot be fulfilled when no valid color image is available or when the target object surface is textureless (e.g., industrial parts).
Instead of optimizing the accumulated error to resolve surface inconsistency, Zhou et al. [15] and Whelan et al. [16] chose to deform the inconsistent local point clouds together using non-rigid 3D registration techniques, taking consumer RGB-D sensors as the depth input. Shape deformation provides a simple yet useful approach to obtain globally consistent models, especially in applications such as indoor reconstruction [12], where surface consistency rather than accuracy is of the most importance. However, shape deformation is not desirable in our problem, because it directly ruins the surface measurement accuracy. Furthermore, since the FPP sensor provides high-accuracy surface point cloud measurements, in theory, once sufficiently accurate sensor poses are recovered, the individual local 3D point clouds should integrate into a globally consistent model using only rigid transformations.
Differently, Cao et al. [17] and Yue et al. [18] optimized the accumulated error by first identifying the loop closures formed through successful 3D registration between each current frame and earlier frames, and then performing a pose graph optimization [19] to reduce the sensor pose drift. However, in their works the loop closures are identified either by manually checking the 3D point cloud overlapping ratio [17], or by using the measurement system setup information [18], which prevents their further usage in a practical 3D scanning system. Moreover, the pose graph optimization in [17,18] only optimizes the inconsistency between two associated sensor poses and their relative pose constraint; it ignores the important surface consistency information in the 3D registration process [6].
According to the above analysis, the key to accurate surface reconstruction lies in reducing the accumulated sensor pose estimation error. In this paper, we present a flexible and accurate method for high-accuracy globally consistent surface reconstruction using a single FPP sensor. The accumulated error problem is addressed from two aspects: (1) Observing the underlying principle that surface curvature remains invariant under measurement viewpoint changes, a novel 3D registration method is proposed which fuses both dense geometric and curvature consistency constraints to jointly optimize the relative sensor pose estimation. The introduced curvature consistency constraint implicitly pays attention to high-curvature surfaces, which helps to generate more accurate 3D registration results [20]. (2) We utilize 6-DOF pose distances for adaptive keyframe determination, and use a two-step checking scheme for automatic loop closure detection. By modelling the surface inconsistency information as a pre-computed covariance matrix and formulating the multi-view point cloud registration problem in a pose graph optimization framework, the accumulated error can be effectively reduced to obtain the final accurate sensor pose estimations.
The effectiveness of the proposed method is demonstrated by reconstructing a 1300 mm × 400 mm workpiece with an FPP sensor. The results show that the proposed method substantially reduces the accumulated error, making the sensor pose estimation accuracy match the measurement accuracy well. Our method shows the ability to accomplish industrial-level surface model reconstruction without any external positional assistance, using only a single FPP sensor.
2. Measurement Principle
In our FPP sensor, a series of sinusoidal fringe patterns with a constant phase shift along the horizontal axis of the projector image frame is projected onto a target object, and two cameras capture the distorted fringe images synchronously. The captured images can be expressed as:

$$ I_n(u, v) = A(u, v) + B(u, v)\cos\left[\varphi(u, v) + n\delta\right], \tag{1} $$

where $(u, v)$ is the pixel coordinate and is omitted in the following expressions, $I_n$ denotes the recorded intensity, $A$ indicates the average intensity, $B$ represents the modulation intensity, $\delta = 2\pi/N$ is the constant phase shift, $n$ is the phase-shift number, and $\varphi$ is the desired phase information. By solving Equation (1) in the least-squares sense over the $N$ phase-shifted images, the phase value $\varphi$ can be obtained according to:

$$ \varphi = -\arctan\frac{\sum_{n=0}^{N-1} I_n \sin(n\delta)}{\sum_{n=0}^{N-1} I_n \cos(n\delta)}. \tag{2} $$

The arctangent function in Equation (2) results in a phase value wrapped within the range of $(-\pi, \pi]$, with $2\pi$ discontinuities. In our sensor, multi-frequency heterodyne technology is adopted to construct the continuous phase map [21], so that the correspondence between the two camera views can be established unambiguously. Finally, the 3D result is obtained according to the pre-calibrated camera intrinsic and extrinsic parameters. The measurement principle of the FPP sensor is shown in Figure 1.
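For illustration, the following is a minimal NumPy sketch of the N-step phase-shifting computation in Equations (1) and (2); the function name and array conventions are our own assumptions, and the actual sensor pipeline may differ.

```python
import numpy as np

def wrapped_phase(images):
    """Recover the wrapped phase map from N phase-shifted fringe images.

    images: (N, H, W) array of recorded intensities I_n, captured with a
    constant phase shift delta = 2*pi/N as in Equation (1).
    Returns the phase of Equation (2), wrapped to one 2*pi period.
    """
    n_steps = images.shape[0]
    delta = 2.0 * np.pi / n_steps
    n = np.arange(n_steps).reshape(-1, 1, 1)
    num = np.sum(images * np.sin(n * delta), axis=0)
    den = np.sum(images * np.cos(n * delta), axis=0)
    # Least-squares phase solution; arctan2 keeps the correct quadrant.
    return -np.arctan2(num, den)
```

The wrapped phase obtained at each fringe frequency would then be passed to the multi-frequency heterodyne unwrapping step [21].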
3. Relative Sensor Pose Estimation
The relative sensor pose estimation between two sequential measurements (also called frames in the following) is the basis for obtaining the initial global sensor pose estimation of each measurement. In this section, we introduce the proposed method, which estimates the relative sensor pose (a rigid transformation) by registering two depth maps in 3D while jointly optimizing the dense geometric and curvature inconsistency errors. The whole process is conducted by first computing the curvature map of each depth map, and then iteratively performing data association and error minimization steps.
3.1. Curvature Map Estimation
Similar to a depth map (also called a depth image), a curvature map is a 2D image in which the value of each pixel is a surface curvature value instead of a depth value. Specifically, for each pixel $p = (u, v)$ in the depth map with valid depth $d$, its corresponding 3D point coordinate $x$ can be computed using the inverse of the projection function $\pi$ as:

$$ x = \pi^{-1}(p, d) = \left( \frac{(u - c_x)\,d}{f_x},\; \frac{(v - c_y)\,d}{f_y},\; d \right)^{\mathsf T}, \tag{3} $$

where $f_x$, $f_y$ are the focal lengths and $c_x$, $c_y$ are the principal point coordinates, respectively. The mean curvature of each point on the surface is approximated using the surface variation notion in [22]. Hence, the surface curvature value $c(p)$ at pixel $p$ is estimated by the eigen-analysis of the covariance matrix of the local neighbor points of point $x$. The covariance matrix is defined as:

$$ C = \sum_{i=1}^{k} (x_i - \bar{x})(x_i - \bar{x})^{\mathsf T}, \tag{4} $$

where $x_i$ is one of the $k$ nearest neighbor points of $x$ and $\bar{x}$ is the centroid of these neighbors. Then $c(p)$ can be computed as:

$$ c(p) = \frac{\lambda_0}{\lambda_0 + \lambda_1 + \lambda_2}, \tag{5} $$

where $\lambda_0 \le \lambda_1 \le \lambda_2$ are the eigenvalues of the covariance matrix $C$.
To speed up the nearest neighbor search, we take advantage of the organized point cloud structure embedded in the depth map, taking only adjacent pixels as candidate neighbors. Meanwhile, geometric continuity constraints are also considered to filter out potential depth gaps by specifying a maximum allowed distance. A pixel $q$ is accepted as a nearest neighbor of pixel $p$ only when it satisfies $\|q - p\|_\infty \le r_p$ and $\|x_q - x_p\| \le r_x$, where $r_p$ and $r_x$ represent the pixel and point nearest-neighbor distance thresholds, respectively. In this paper, the two thresholds are chosen (with an average point cloud density of 0.275 mm) to allow approximately 30 nearest neighbor points for curvature value estimation.
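To make the computation concrete, below is a minimal NumPy sketch of the curvature map estimation under the stated neighborhood rules; the function name, default threshold values, and invalid-pixel conventions are illustrative assumptions, not the exact implementation.

```python
import numpy as np

def curvature_map(depth, fx, fy, cx, cy, r_p=2, r_x=2.0):
    """Per-pixel surface-variation curvature (Equations (3)-(5)).

    depth: (H, W) depth map; invalid pixels are NaN or <= 0.
    r_p, r_x: pixel / point neighbor thresholds (illustrative values).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every pixel with Equation (3).
    pts = np.dstack(((u - cx) * depth / fx, (v - cy) * depth / fy, depth))
    valid = np.isfinite(depth) & (depth > 0)
    curv = np.full((h, w), np.nan)
    for i in range(h):
        for j in range(w):
            if not valid[i, j]:
                continue
            # Candidate neighbors come only from the organized pixel grid.
            win = pts[max(i - r_p, 0):i + r_p + 1,
                      max(j - r_p, 0):j + r_p + 1].reshape(-1, 3)
            win = win[np.isfinite(win[:, 2]) & (win[:, 2] > 0)]
            # Depth-gap filtering with the point distance threshold r_x.
            nb = win[np.linalg.norm(win - pts[i, j], axis=1) <= r_x]
            if len(nb) < 3:
                continue
            d = nb - nb.mean(axis=0)
            lam = np.linalg.eigvalsh(d.T @ d)   # Equation (4), ascending
            s = lam.sum()
            curv[i, j] = lam[0] / s if s > 0 else 0.0  # Equation (5)
    return curv
```

The windowed grid lookup is what makes the organized structure pay off: no k-d tree is needed, and the depth-gap filter naturally handles the discontinuous boundary case.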
Figure 2a shows a depth map measured with the FPP sensor, and Figure 2b shows the curvature map estimated with our method. Figure 2c is the corresponding 3D point cloud, with colors mapped from the curvature map; a local detail is displayed in Figure 2d. It can be seen that the estimated curvature map exhibits high consistency with the point cloud surface variation. Furthermore, by carefully handling the discontinuous boundary case, the curvature values at boundary points can also be robustly estimated, as shown in Figure 2d.
3.2. Data Association
Data association identifies the corresponding points between two sequential frames; the correspondence set is then fed to the optimization process to find the optimal relative sensor pose estimation. Assuming small camera motion between sequential frames, the projective data association algorithm [12] is conducted to produce the point correspondence set. Given the relative sensor pose estimation $T$ between the current frame $i$ and its previous frame $i-1$, for each pixel $p$ with valid depth in frame $i$, we first transform its corresponding 3D point $x_p$ into the local coordinate system of the previous frame as $x'_p = T x_p$. Then the corresponding pixel $q$ of $x'_p$ in frame $i-1$ can be computed with the perspective projection:

$$ q = \pi\!\left(K x'_p\right), \tag{6} $$

where $K$ is the camera intrinsic matrix. Note that for simplicity of notation, we omit the conversions between vectors and their homogeneous counterparts throughout this paper.
With projective data association, multiple pixels in the source depth image $D_i$ may correspond to a common pixel in the target depth image $D_{i-1}$. To solve this many-to-one problem, the z-buffer technique is adopted: for each pixel in the target depth map, we only keep the corresponding pixel in the source depth map with the minimum depth. All corresponding point pairs together construct the correspondence set $\mathcal{K}$ between frames $i$ and $i-1$.
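The following sketch illustrates the projective data association with z-buffering described above; it is a plain per-pixel loop, with the depth-compatibility threshold `max_dist` added as an assumed rejection step not spelled out in the text.

```python
import numpy as np

def projective_data_association(depth_src, depth_tgt, T, K, max_dist=2.0):
    """Build the correspondence set between frame i and frame i-1.

    depth_src: depth map of the current frame i (mm).
    depth_tgt: depth map of the previous frame i-1 (mm).
    T: 4x4 relative pose mapping frame-i points into frame i-1.
    Returns a list of ((v_src, u_src), (v_tgt, u_tgt)) pixel pairs.
    """
    h, w = depth_src.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    zbuf = np.full((h, w), np.inf)  # minimum warped depth per target pixel
    best = {}                       # target pixel -> source pixel
    for v in range(h):
        for u in range(w):
            d = depth_src[v, u]
            if not (np.isfinite(d) and d > 0):
                continue
            x = np.array([(u - cx) * d / fx, (v - cy) * d / fy, d, 1.0])
            xp = T @ x              # warp into the previous frame
            if xp[2] <= 0:
                continue
            ut = int(round(fx * xp[0] / xp[2] + cx))  # Equation (6)
            vt = int(round(fy * xp[1] / xp[2] + cy))
            if not (0 <= ut < w and 0 <= vt < h):
                continue
            dt = depth_tgt[vt, ut]
            if not (np.isfinite(dt) and dt > 0) or abs(xp[2] - dt) > max_dist:
                continue
            # z-buffer: keep only the source pixel with minimum warped depth.
            if xp[2] < zbuf[vt, ut]:
                zbuf[vt, ut] = xp[2]
                best[(vt, ut)] = (v, u)
    return [(src, tgt) for tgt, src in best.items()]
```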
3.3. Minimization
The relative sensor pose optimization function $E(T)$ is defined as:

$$ E(T) = E_g(T) + w_c E_c(T), \tag{7} $$

where $E_g$ denotes the geometric inconsistency error, $E_c$ denotes the curvature inconsistency error, and $w_c$ is the weight of the curvature inconsistency error.
The geometric error is defined as the point-to-plane error [11] between the current and previous frames:

$$ E_g = \sum_{(p, q) \in \mathcal{K}} \left[ \left( \hat{T} T x_p - y_q \right) \cdot n_q \right]^2, \tag{8} $$

where $(p, q)$ is one corresponding pixel pair in the correspondence set $\mathcal{K}$, $x_p$ is the local 3D point in the current frame $i$, and $y_q$ and $n_q$ are the corresponding 3D point and normal in the previous frame, respectively. $T$ is the current estimate of the relative sensor pose between the two frames, and $\hat{T}$ is the incremental transformation to be estimated in each iteration, with which the pose is updated as $T \leftarrow \hat{T} T$.
The curvature inconsistency error $E_c$ is defined as the curvature value inconsistency between the warped curvature map of the current frame $i$ and the curvature map of the previous frame $i-1$:

$$ E_c = \sum_{(p, q) \in \mathcal{K}} \left[ C_i(p) - C_{i-1}(q) \right]^2, \tag{9} $$

where $C_i(p)$ is the curvature value at pixel $p$ of the current frame, and $C_{i-1}(q)$ is the curvature value at pixel $q$ of the previous frame.
Assuming the incremental pose transformation $\hat{T}$ to be optimized at each iteration is small, it can be linearized as $\hat{T} \approx I + \hat{\xi}$, where $\xi = (\alpha, \beta, \gamma, t_x, t_y, t_z)^{\mathsf T}$ is the corresponding Lie algebra element:

$$ \hat{\xi} = \begin{pmatrix} 0 & -\gamma & \beta & t_x \\ \gamma & 0 & -\alpha & t_y \\ -\beta & \alpha & 0 & t_z \\ 0 & 0 & 0 & 0 \end{pmatrix}, \tag{10} $$

where $\,\hat{}\,$ is the linear skew-symmetric operator (see [23] for details).
With this linearization and the shorthand notation $\tilde{x}_p = T x_p$, the error term $E_g$ becomes:

$$ E_g \approx \left\| J_g \xi + r_g \right\|^2, \tag{11} $$

where $J_g$ is the Jacobian matrix and $r_g$ is the residual vector. Similarly, the error term $E_c$ becomes:

$$ E_c \approx \left\| J_c \xi + r_c \right\|^2. \tag{12} $$
With the above linearization, minimizing Equation (7) amounts to solving the following linear system:

$$ \left( J_g^{\mathsf T} J_g + w_c J_c^{\mathsf T} J_c \right) \xi = -\left( J_g^{\mathsf T} r_g + w_c J_c^{\mathsf T} r_c \right). \tag{13} $$
In each iteration, we compute the Jacobians $J_g$, $J_c$ and residuals $r_g$, $r_c$ at the current relative sensor pose estimation $T$, and solve the linear system in Equation (13) to find the $\xi$ that best satisfies the geometric and curvature consistency constraints. The relative pose $T$ is then updated to $\hat{T} T$ and taken as the initialization for the next iteration.
When the optimization converges, $T$ is taken as the final relative sensor pose estimation between the two frames. We fix the sensor pose of the first frame as the identity and regard it as the world coordinate system. Then the initial global sensor pose of frame $i$ is computed by chaining the relative poses, $T_i = T_{i-1} T$.
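A compact sketch of one iteration of this Gauss-Newton scheme is given below; it assumes the per-correspondence Jacobian rows are stacked into matrices upstream, shows the point-to-plane row of Equation (8) explicitly, and leaves out the curvature rows, whose assembly follows the same pattern. Names and signatures are illustrative.

```python
import numpy as np
from scipy.linalg import expm

def se3_hat(xi):
    """Map xi = (alpha, beta, gamma, tx, ty, tz) to the matrix of Equation (10)."""
    a, b, g, tx, ty, tz = xi
    return np.array([[0.0,  -g,   b,  tx],
                     [g,   0.0,  -a,  ty],
                     [-b,    a, 0.0,  tz],
                     [0.0, 0.0, 0.0, 0.0]])

def point_to_plane_row(x_src, y_tgt, n_tgt, T):
    """Jacobian row and residual of one point-to-plane term in Equation (8)."""
    xw = (T @ np.append(x_src, 1.0))[:3]          # source point warped by T
    J = np.hstack([np.cross(xw, n_tgt), n_tgt])   # d r / d (alpha..tz)
    r = np.dot(xw - y_tgt, n_tgt)
    return J, r

def gauss_newton_step(J_g, r_g, J_c, r_c, w_c, T):
    """Solve Equation (13) once and apply the incremental update to T."""
    A = J_g.T @ J_g + w_c * (J_c.T @ J_c)
    b = -(J_g.T @ r_g + w_c * (J_c.T @ r_c))
    xi = np.linalg.solve(A, b)
    return expm(se3_hat(xi)) @ T   # update T <- exp(xi^) T
```

Applying the full (rather than linearized) exponential map in the update keeps the pose estimate on SE(3) at every iteration.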
Figure 3 shows a comparison of the 3D registration results between the proposed method and two other methods. The sensor pose estimation accuracy is directly reflected in the surface shape consistency of the two registered point clouds. When each registration result is visually inspected on its own, every method seems to converge to a correct result. However, when comparing the registration results in Figure 3b–d, it is not hard to see that the relative sensor pose estimation accuracy of our method outperforms the other two methods.
Figure 4a,b shows the curvature value difference maps between the source and target point clouds before and after the 3D registration, respectively. The curvature difference map is built on the target frame $i-1$, with correspondences established using the above data association method. Gray pixels indicate that no correspondence was established. It can be seen that the curvature value difference decreases dramatically over the whole map from Figure 4a to Figure 4b, which demonstrates the significance of introducing curvature map consistency into the 3D registration constraints.
5. Experiment
In the experiment, an FPP sensor is constructed using (1) a Texas Instruments LightCrafter 4500 board (Texas Instruments, Dallas, TX, USA) for fringe pattern projection, and (2) two Basler acA1300-30gm cameras (Basler AG, Ahrensburg, Germany) that synchronously capture the modulated images with a pixel resolution of 1296 × 966. The proposed method is validated by scanning a 1300 mm × 400 mm sheet metal part with the FPP sensor, as shown in Figure 8; the 3D measurement and model reconstruction are conducted on a desktop PC with a 3.3 GHz Intel Xeon CPU and 16 GB RAM. By moving the FPP sensor around the part, a complete scan of the sheet metal is accomplished, with 146 frames (depth maps) acquired in total.
To test and verify the accuracy and effectiveness of the proposed relative sensor pose estimation method and the global optimization method, a ceramic ball bar is placed beside the measured sheet metal. The reconstruction accuracy can then be examined both qualitatively, by observing the surface consistency, and quantitatively, by analyzing the size fitting results of the reconstructed ceramic ball bar.
5.1. Relative Sensor Pose Estimation Accuracy
The accuracy of our proposed relative sensor pose estimation method is tested first. The sensor pose of each frame relative to the world coordinate system (frame 1) is estimated separately by (1) jointly optimizing the geometric and curvature consistency constraints (our method), and (2) optimizing only the geometric consistency constraint, for comparison. With the estimated sensor poses, the 3D point cloud of each frame is transformed into the world coordinate system and further voxel-downsampled into a unified 3D point cloud.
Figure 9a shows the reconstructed surface of the sheet metal with our method; the overall shape of the reconstruction result matches the actual sheet metal shape well. The point clouds are rendered with the Open3D library [24].
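For reference, the fusion step can be expressed in a few lines of Open3D; `clouds` and `poses` stand for the per-frame point clouds and the estimated global poses $T_i$, which are assumed to be loaded beforehand, and the voxel size is set near the measured point cloud density.

```python
import open3d as o3d

# Fuse the per-frame point clouds into the world frame (Section 5.1).
merged = o3d.geometry.PointCloud()
for cloud, pose in zip(clouds, poses):
    cloud.transform(pose)    # move frame i into the world coordinate system
    merged += cloud
merged = merged.voxel_down_sample(voxel_size=0.275)  # ~ point cloud density
o3d.visualization.draw_geometries([merged])
```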
On the other hand, sensor pose estimation error inevitably accumulates in the reconstruction process, which leads to obvious surface shape artifacts, as shown in Figure 9b,c. Figure 9b shows the local surface inconsistency at three different places using our method, while Figure 9c shows the corresponding results using only the geometric consistency constraint. From this comparison, it is not hard to see that introducing the curvature consistency constraint effectively improves the sensor pose estimation accuracy, which provides a good foundation for the subsequent global optimization.
5.2. Global Sensor Pose Optimization Accuracy
Based on the sensor pose estimation results above, the global optimization is performed by (1) keyframe selection, (2) loop closure detection and (3) pose graph optimization. Then the globally optimized reconstruction result is obtained with the optimized sensor poses.
Figure 10a,b show the optimized surface model and its local details, respectively. With the global model optimization, we obtained globally consistent surface model, surface inconsistencies due to the accumulated error are well optimized as shown in
Figure 10b.
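As an illustration of step (3), the sketch below assembles and optimizes a pose graph with Open3D's built-in optimizer [24]; this is not necessarily the solver used in the paper, and the construction of the edges and their 6 × 6 information matrices (which encode the surface inconsistency) is assumed to happen upstream.

```python
import numpy as np
import open3d as o3d

def optimize_pose_graph(init_poses, odometry_edges, loop_edges):
    """Pose graph optimization over the keyframe poses.

    init_poses: list of initial global poses T_i (4x4 numpy arrays).
    odometry_edges / loop_edges: lists of (i, j, T_ij, info) tuples, where
    info is the 6x6 information matrix of the relative pose constraint.
    """
    reg = o3d.pipelines.registration
    graph = reg.PoseGraph()
    for T in init_poses:
        graph.nodes.append(reg.PoseGraphNode(T))
    for i, j, T_ij, info in odometry_edges:
        graph.edges.append(reg.PoseGraphEdge(i, j, T_ij, info, uncertain=False))
    for i, j, T_ij, info in loop_edges:   # loop closures are marked uncertain
        graph.edges.append(reg.PoseGraphEdge(i, j, T_ij, info, uncertain=True))
    reg.global_optimization(
        graph,
        reg.GlobalOptimizationLevenbergMarquardt(),
        reg.GlobalOptimizationConvergenceCriteria(),
        reg.GlobalOptimizationOption(reference_node=0))  # fix frame 1 as world
    return [np.asarray(node.pose) for node in graph.nodes]
```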
To further quantitatively analyze the accuracy improvement brought by the global optimization, we computed the relative translation and rotation changes of each keyframe pose before and after global optimization, as shown in Figure 11; the optimized poses are taken as the reference values here. It demonstrates that even very small translation estimation errors (less than 2.0 mm) and rotation estimation errors, over the reconstruction range of 1300 mm × 400 mm, are enough to cause obvious surface inconsistency (as shown in Figure 9b) and to make the reconstruction results unusable for high-accuracy dimensional inspection.
Meanwhile, the absolute accuracy of the reconstructed surface model can be directly and precisely tested by comparing (1) the fitted diameters of the two spheres, (2) the standard deviations of the Euclidean distances between the sphere surface 3D points and the fitted sphere surfaces, and (3) the Euclidean distance between the two sphere centers. The comparison is made between the not-optimized model, the globally optimized model, and the ground truth. The ground truth is obtained from the fitting values of frame 130: since both spheres are measured in this single frame, its fitting values depend only on the measurement accuracy of our FPP sensor and are not affected by any sensor pose estimation error. Specifically, for each data source, we manually cropped the points belonging to the two sphere surfaces, and fitted the diameter and standard deviation values using the Geomagic software.
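For readers without access to Geomagic, the same three quantities can be estimated with a plain algebraic least-squares sphere fit, as sketched below; this is a generic fit for illustration, not the Geomagic workflow used in the paper.

```python
import numpy as np

def fit_sphere(points):
    """Least-squares sphere fit to cropped ball-bar points.

    points: (N, 3) array of 3D points on one sphere surface.
    Returns (center, diameter, std), where std is the standard deviation
    of the signed point-to-sphere distances.
    """
    # ||p||^2 = 2 c . p + (r^2 - ||c||^2) is linear in (c, r^2 - ||c||^2).
    A = np.hstack([2.0 * points, np.ones((len(points), 1))])
    b = np.sum(points ** 2, axis=1)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = w[:3]
    radius = np.sqrt(w[3] + center @ center)
    dists = np.linalg.norm(points - center, axis=1) - radius
    return center, 2.0 * radius, dists.std()

# The sphere center distance of Table 2 would then be, e.g.:
# c1, _, _ = fit_sphere(points_sphere1); c2, _, _ = fit_sphere(points_sphere2)
# center_distance = np.linalg.norm(c1 - c2)
```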
Table 1 shows the comparison of the fitted diameter and standard deviation values of the two spheres. The standard deviation values directly reflect the surface consistency of our reconstructed model. After the global optimization, the standard deviation decreases from 0.1971 mm to 0.0282 mm for sphere 1, and from 0.2534 mm to 0.0301 mm for sphere 2. Furthermore, the standard deviation of the globally optimized model is very close to that of a single measurement (frame 130), which demonstrates that our reconstructed surface exhibits very good shape consistency.
We also compared the sphere center distances of the not-optimized and globally optimized models, as shown in Table 2. The absolute error of the sphere center distance relative to the ground truth decreases from 0.2080 mm to 0.0205 mm, and the relative error decreases from 0.1387% to 0.0137%.
Both of the above comparisons explain the surface shape consistency refinement from Figure 9a,b to Figure 10a,b, and illustrate that with the global optimization (1) the accumulated error is reduced to a small fraction of that of the not-optimized reconstruction result, and (2) the final sensor pose estimation accuracy matches the measurement accuracy of our FPP sensor well.
6. Conclusions
In this paper, we present a high-accuracy globally consistent surface reconstruction method using fringe projection profilometry. The accumulated sensor pose estimation error problem is solved with a relative sensor pose estimation step followed by a global sensor pose optimization step. The former reduces the accumulated error by maximizing the relative sensor pose estimation accuracy; it helps to ensure that the initial sensor poses lie in the convergence basin of the subsequent global optimization. The latter globally optimizes the sensor poses through multi-view point cloud registration formulated in a pose graph optimization framework. Besides, adaptive keyframe selection and loop closure detection methods are proposed to efficiently and automatically build the point cloud connections and their relative pose constraints, which are the prerequisites of the global sensor pose optimization. By qualitatively observing and quantitatively analyzing the reconstruction results of a 1300 mm × 400 mm workpiece, we validated the effectiveness and accuracy of our method. Our method demonstrates the ability to accomplish industrial-level surface model reconstruction without any external positional assistance, using only a single FPP sensor.
Since our reconstruction method is based on 3D registration, it shares some limitations with most 3D registration based surface reconstruction methods [7,12,16]. For example, when the target object is nearly planar, 3D registration may not converge to a correct result due to insufficient geometric constraints [11], which prevents the sensor poses from being robustly tracked. A possible solution is to further exploit surface texture constraints to help track the sensor poses robustly.