1. Introduction
In recent years, the technology of vehicle-borne mobile mapping systems (MMSs) has seen rapid development. Vehicle-borne MMSs often integrate laser scanners, cameras, an inertial measurement unit (IMU), a global navigation satellite system (GNSS), and other equipment [1]; as the vehicle travels at a certain speed, the GNSS and IMU obtain the position and attitude of the equipment in real time, while the LiDAR and a panoramic camera synchronously acquire a series of point clouds and panoramic images of the scene. This technology can efficiently collect ground information from streets, industrial parks, mining warehouses, forests, and other scenes around the clock, and is widely used in fields such as road maintenance, 3D scene construction, topographic mapping, positioning, and navigation [2,3,4,5,6]. The high-precision fusion of multi-source data combines the complementary advantages of different types of sensors: the point cloud data obtained using a vehicle-borne MMS have rich spatial location information with high accuracy [7], while panoramic data provide imaging with a 360-degree field of view and contain a large amount of textural and color information [8]. Matching these two types of data is therefore an active research direction in vehicle-borne MMS applications.
Matching between point clouds and images often requires several steps, including coordinate transformation, registration, and fusion [9]. In terms of coordinate transformation, matching currently relies mainly on the collinear equation, direct linear, and pyramid methods [10] to establish the relationship between point cloud and pixel data. The important factors directly affecting registration include the quality and synchronization rate of data collection from the various sensors, the accuracy of the relative positions between sensors, and the method of solving and correlating the various types of collected raw data [11]. With the iteration of hardware systems and calibration methods, the errors generated in the first two aspects have reached a relatively low level. In the past few years, research has therefore focused on using algorithms to achieve further fine registration. Yao et al. used feature points to divide the point clouds generated with multiple laser scanners into blocks, and determined the accurate relationship between points and pixels by correlating the visual field of the panoramic camera with the point cloud blocks and applying the principle of the collinear equation [12]. Zhang et al. used the spherical epipolar line method and a spherical absolute orientation model to achieve dense matching between images and point clouds based on Harris corner extraction, thereby obtaining more accurate registration parameters [13]. Wang et al. extracted rod-shaped objects in the panoramic image and the corresponding point clouds, re-projected them onto a virtual image, and obtained refined correspondence by maximizing the overlap area through particle swarm optimization [14]. Zhu et al. proposed a relative orientation model for panoramic images (PROM), which used feature point matching between adjacent panoramic images to calculate their relative poses and, combined with the absolute pose of the starting panoramic image, achieved registration under the condition that the image pose parameters are unknown [15]. Li et al. used Fast-RCNN to extract vehicles in panoramic images and matched them with possible corresponding vehicles in the point clouds through initial poses, thereby improving registration accuracy [16]. Wang et al. used a semantic segmentation method to remove the sky in panoramic images, projected the point clouds after ground removal onto the images to obtain registration primitives, and then achieved fine registration through the whale optimization algorithm [17].
Fusion is often divided into point cloud coloring, based on 3D point clouds, and point cloud projection and depth maps, based on 2D images [18]. Point cloud coloring is the process of assigning actual material colors to each laser point, which is commonly used in point cloud classification or 3D real-scene modeling [19]; its implementation is based on using timestamps to find the optimal image corresponding to the point cloud, searching further for the pixel corresponding to each laser point, and assigning color attributes to the point clouds. Yuan et al. proposed an iterative strategy to construct color and textural information for point clouds from multi-view observations; for the color conflicts of point clouds generated from different viewpoints, they used differential image regions to assign values to the point clouds [20]. Shinohara et al. used PointNet++ to estimate the color of points in the point clouds and used a differentiable renderer to convert the colored point cloud into an image; the difference between the real image and the converted image was used as a loss function with which to train the network [21]. Depth maps are similar to point cloud projection: based on coordinate transformation theory, point clouds are projected onto the image, and the corresponding pixels are given depth-of-field or spatial position information, thereby achieving the measurement and positioning of ground objects through two-dimensional images [22]. The main difference between the two methods is that the depth map assigns an initial depth value to all pixels, replaces them with the depth values of the point clouds, and sets a scaling factor for the panoramic image to increase the continuity of depth measurement [23]. Ku et al. used only basic computer vision operations (such as dilation and smoothing) to solve the problem of generating depth maps from point clouds, achieving good performance [24]. Xiang et al. proposed a deep learning network, 3dDepthNet, to generate accurate dense depth maps from pairs of point clouds and color images, and trained and validated it on the KITTI dataset [25]. Bai et al. proposed a lightweight network with a significantly reduced number of parameters, achieving a depth map generation algorithm applicable to embedded devices [26].
At present, for data-matching methods based on two-dimensional images, a vehicle-borne MMS is usually calibrated after delivery, which provides highly accurate sensor extrinsic parameters and thus reduces the workload of registration. However, some issues remain unresolved. Firstly, if data matching is performed based on deep learning, the matching performance is largely limited by the quality of the dataset, and training requires significant computational resources.
Secondly, because point clouds are discrete data, the projection of an object's point cloud onto the image may be a series of discontinuous pixels. The point cloud map generated using a vehicle-borne MMS contains the 3D point clouds of most objects in the scene, whereas a single panoramic image reflects the imaging from one viewpoint, in which some objects are occluded. Therefore, when transforming point clouds into images, projections may be generated incorrectly in regions representing occluded objects, in addition to the projections of visible points that should exist; this causes pixels with incorrect depth values to appear in the generated point cloud map or depth map, and errors arise when querying location coordinates based on those pixels. In addition, a generated point cloud map contains a large number of noise points and data with no measurement value (such as ground point clouds), which directly interferes with the accuracy of the generated results and reduces computational efficiency. To address these issues, a method for matching point clouds with panoramic images based on occlusion removal, which assigns spatial position information to the main targets in the image, is proposed. The algorithm removes invalid points contained in the generated point clouds and filters erroneous projection points from occluded objects, thus obtaining high-precision matching results.
2. Materials and Methods
In order to achieve accurate matching between point clouds and panoramic images, it is necessary to determine the fusion area based on the coverage range of a single panoramic image and to search for the correspondence between point clouds and images based on the time series. Specifically, the key technology can be divided into three parts: sequence point cloud scene generation, coordinate transformation, and data matching. The implementation of each part is described below.
2.1. Sequence Point Cloud Scene Generation
A vehicle-borne MMS can obtain the original point cloud data from the scanner, and the trajectory data from the IMU, during the acquisition process. This equipment outputs point cloud maps through a preprocessing system. Within the preprocessing system, the inertial navigation data are corrected based on the base station to obtain Position and Orientation System (POS) data that represent the trajectory of the inertial navigation center point. The system integrates the point cloud from each frame in the local coordinate system into a point cloud map in the world coordinate system, based on the scanner data and the POS data.
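As a minimal sketch of this integration step (the data layout and the per-frame association of scans with POS poses are assumptions for illustration, not the preprocessing system's actual interface), per-frame scanner points can be transformed into the world coordinate system as follows:

```python
import numpy as np

def assemble_point_cloud_map(frames, poses, T_imu_scanner):
    """Integrate per-frame scans into one world-coordinate point cloud map.

    frames: list of (N_i, 3) arrays of points in the scanner frame.
    poses: list of (R, t) tuples, the IMU pose in the world frame at
           each frame's timestamp (taken from the corrected POS data).
    T_imu_scanner: 4x4 calibration matrix from the scanner to the IMU frame.
    """
    world_points = []
    for pts, (R_wi, t_wi) in zip(frames, poses):
        # Scanner -> IMU frame (calibration), then IMU -> world frame (POS pose).
        pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])
        pts_imu = (T_imu_scanner @ pts_h.T).T[:, :3]
        pts_world = (R_wi @ pts_imu.T).T + t_wi
        world_points.append(pts_world)
    return np.vstack(world_points)
```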
The point cloud map contains the point cloud data of the entire scanned scene; it is large in volume and also contains useless information such as noise and ground points. In order to improve the efficiency and accuracy of projecting the point cloud into the panoramic images, it is necessary to preprocess the point cloud. In this study, we designed a set of point-cloud-processing algorithms, the pseudocode for which is shown in Figure 1.
During this process, the point cloud data were read, and the uniform sampling algorithm was first used to reduce the density of the point cloud data to an appropriate level. Secondly, the statistical filtering algorithm was used to remove irrelevant points, such as outliers, from the point clouds. Thereafter, the cloth simulation filter (CSF) algorithm was used to separate the non-ground point cloud data [27]. Furthermore, the trajectory data of the camera optical center were read, and, based on the region segmentation algorithm, a suitable scene range was set and the point cloud roughly corresponding to each trajectory point was segmented. Finally, the point cloud data were saved, and the point cloud file corresponding to each panoramic image frame was generated. It should be noted that the threshold settings used in this process were chosen because they achieved good processing effects in the experimental scenario described in this paper; they can be adjusted according to the actual circumstances. The density of the point cloud, as the principal data feature, influences the selection of several thresholds in this process. For sparser point clouds, ‘every_k_points’ in the uniform sampling step and ‘mean_k’ in the statistical filtering step should be reduced, while ‘cloth_resolution’ in the CSF step should be appropriately increased; conversely, for denser point clouds, these parameters should be adjusted in the opposite direction.
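For reference, a minimal sketch of this preprocessing workflow is given below using the Open3D and cloth-simulation-filter Python packages (an assumption about tooling; our own implementation may differ). The parameter values are placeholders, and Open3D's `nb_neighbors`/`std_ratio` play the role of the ‘mean_k’ statistical-filtering parameter mentioned above.

```python
import numpy as np
import open3d as o3d
import CSF  # pip install cloth-simulation-filter

def preprocess_scene(cloud_path, every_k_points=4, nb_neighbors=20,
                     std_ratio=2.0, cloth_resolution=0.5):
    pcd = o3d.io.read_point_cloud(cloud_path)

    # 1. Uniform sampling: reduce the point density to an appropriate level.
    pcd = pcd.uniform_down_sample(every_k_points=every_k_points)

    # 2. Statistical filtering: remove outliers and other irrelevant points.
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=nb_neighbors,
                                            std_ratio=std_ratio)

    # 3. CSF: separate ground points and keep only the non-ground points.
    xyz = np.asarray(pcd.points)
    csf = CSF.CSF()
    csf.params.cloth_resolution = cloth_resolution
    csf.setPointCloud(xyz)
    ground_idx, non_ground_idx = CSF.VecInt(), CSF.VecInt()
    csf.do_filtering(ground_idx, non_ground_idx)
    return pcd.select_by_index(list(non_ground_idx))
```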
2.2. Coordinate Transformation
Determined by the positioning and pose determination method, the initial coordinate system of the 3D point cloud generated by a vehicle-borne MMS is the world geodetic coordinate system [28]. In order to integrate panoramic images with point clouds, a coordinate system transformation is first applied to the point cloud, and then collinear relationships and spherical projection are used to complete the mapping of the point cloud onto the panoramic image [29].
To complete coordinate transformation from the point cloud to the corresponding image, the point cloud coordinates need to be unified to the camera coordinate system, with the camera center as the origin at the time of image capture. Each camera shooting time corresponds to the timestamp of a POS trajectory point, and the pose information of the IMU at that time can be queried in the trajectory. By using this pose and the relative position relationship obtained from calibrating the IMU and camera, the pose of the camera optical center at the time of shooting can be determined.
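A minimal sketch of this lookup-and-composition step is given below; the 4x4 homogeneous-matrix layout and the nearest-timestamp association are assumptions for illustration rather than the system's exact interface.

```python
import numpy as np

def camera_pose_at(t_shot, pos_timestamps, pos_poses, T_imu_cam):
    """Return the 4x4 camera-to-world pose at an image timestamp.

    pos_timestamps: (N,) array of POS trajectory timestamps.
    pos_poses: length-N list of 4x4 IMU-to-world pose matrices.
    T_imu_cam: 4x4 camera-to-IMU extrinsic matrix from calibration.
    """
    # Each shooting time corresponds to one POS trajectory point:
    # take the pose whose timestamp is closest to the exposure time.
    i = int(np.argmin(np.abs(np.asarray(pos_timestamps) - t_shot)))
    T_world_imu = pos_poses[i]
    # Compose the IMU pose with the IMU-camera calibration to obtain
    # the pose of the camera optical center in the world frame.
    return T_world_imu @ T_imu_cam
```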
Given the coordinates of the point cloud and the pose of the camera optical center in the world coordinate system at the time of shooting, the coordinates of the point cloud in the camera optical center coordinate system can be calculated using the following formula:

$$ \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = \mathbf{R}^{-1}\left( \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} - \mathbf{T} \right) \quad (1) $$

In Formula (1), $(X_w, Y_w, Z_w)$ represents the coordinates of the point cloud in the world coordinate system, $(X_c, Y_c, Z_c)$ represents the coordinates of the point cloud in the camera coordinate system, and $\mathbf{R}$ and $\mathbf{T}$ represent the rotation matrix and translation vector of the camera optical center in the world coordinate system, corresponding to the pose of the camera optical center. The camera trajectory pose points are usually represented by a set of Euler angles. To convert Euler angles into a rotation matrix, note that different Euler angle definitions have different rotation matrix generation rules; in this experiment, the order and rules corresponding to the East–North–Up coordinate system are used to generate the rotation matrix.
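The following sketch illustrates Formula (1) together with one common z-y-x Euler-angle composition for an East–North–Up frame; the exact rotation order must match the POS output, so this composition is an assumption rather than the system's definitive convention.

```python
import numpy as np

def enu_euler_to_rotation(roll, pitch, yaw):
    """Rotation matrix from roll/pitch/yaw (radians) in an ENU frame.

    The z-y-x composition below is one common convention for
    heading/pitch/roll; the actual order must match the POS output.
    """
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def world_to_camera(points_w, R, T):
    """Apply Formula (1): camera-frame coordinates of world points.

    points_w: (N, 3) world coordinates; R, T: camera pose in the world frame.
    """
    return (points_w - T) @ R  # equivalent to R.T @ (p - T) per point
```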
After obtaining the coordinates of the point cloud in the camera coordinate system, with the origin of the camera coordinate system as the center of the sphere, the coordinates are converted from the camera coordinate system to the spherical coordinate system using the following formula:

$$ \theta = \arctan\!\left(\frac{Y_c}{X_c}\right), \qquad \varphi = \arcsin\!\left(\frac{Z_c}{\sqrt{X_c^{2}+Y_c^{2}+Z_c^{2}}}\right) \quad (2) $$

In Formula (2), $(\theta, \varphi)$ is the coordinate of the point cloud in the spherical coordinate system, which can be understood as its longitude and latitude on the sphere.
The panoramic image can also be mapped onto a panoramic sphere corresponding to the spherical coordinate system. To convert the point cloud from the spherical coordinate system to the panoramic pixel coordinate system, the calculation formula is as follows:

$$ u = \left(\theta + \pi\right) r, \qquad v = \left(\frac{\pi}{2} - \varphi\right) r \quad (3) $$

In Formula (3), $(u, v)$ is the coordinate of the point cloud in the pixel coordinate system, and $r$ is the spherical radius corresponding to the panoramic image. Using the above formulas, the point cloud in each frame of the scene can be projected onto the corresponding region of the panoramic image.
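The sketch below chains Formulas (2) and (3) for an equirectangular panorama; the mapping of longitude and latitude to pixel columns and rows follows one common convention and is an assumption for illustration.

```python
import numpy as np

def camera_to_panorama_pixels(points_c, image_width):
    """Project camera-frame points onto an equirectangular panorama.

    points_c: (N, 3) coordinates in the camera optical center frame.
    Returns pixel coordinates (u, v) and the distance to the sphere center.
    """
    x, y, z = points_c[:, 0], points_c[:, 1], points_c[:, 2]
    rho = np.linalg.norm(points_c, axis=1)            # distance to the sphere center
    theta = np.arctan2(y, x)                          # longitude in (-pi, pi]
    phi = np.arcsin(np.clip(z / rho, -1.0, 1.0))      # latitude in [-pi/2, pi/2]
    r = image_width / (2 * np.pi)                     # spherical radius of the panorama
    u = (theta + np.pi) * r                           # pixel column, Formula (3)
    v = (np.pi / 2 - phi) * r                         # pixel row, Formula (3)
    return np.stack([u, v], axis=1), rho
```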
2.3. Data Matching
As previously mentioned, the scene data are processed and segmented from a point cloud map scanned in the field and contain the overall 3D information of the region. When converting this information to a two-dimensional image, point cloud information from objects occluded in the camera's perspective may be projected onto the panoramic image; these projected points do not conform to physical reality and should be filtered out. In this study, we designed a filtering algorithm based on vector angles, whose processing logic is divided into the two following main parts:
Nearest point preservation: all point cloud data are traversed during the projection process; if a pixel corresponds to multiple data points, only the data point closest to the camera center is retained.
Filtering out occluded points based on the vector angle: to determine whether a point to be processed is an occluded point, we propose an algorithm based on angle filtering under spherical projection. Within the range of the spherical projection, a pair of points that do not have an occlusion relationship from a certain perspective form a larger angle with the viewpoint, whereas a pair of points that are easily occluded tend to form a smaller angle.
As shown in Figure 2, the angle in (a) between the vector from the visible point to an adjacent point and the vector from the visible point to the spherical center is significantly greater than the angle in (b) between the corresponding pair of vectors, both of which start from the occluded point.
By calculating the angle between these vectors, it can be judged whether a point in 3D space is occluded by other points during the spherical projection. Taking Figure 2a as an example, firstly, the unit vector from the point to be processed to the camera optical center is constructed as follows:

$$ (x_1, y_1, z_1) = \frac{(X_o - X_p,\; Y_o - Y_p,\; Z_o - Z_p)}{\left\| (X_o - X_p,\; Y_o - Y_p,\; Z_o - Z_p) \right\|} \quad (4) $$

In Formula (4), $(X_p, Y_p, Z_p)$ is the coordinate of the point to be processed in the world coordinate system, $(X_o, Y_o, Z_o)$ is the world coordinate of the camera optical center, $x_1$, $y_1$, and $z_1$ are the normalized components of the unit vector from the point to the camera optical center along the three coordinate axes, and the denominator is the modulus of the vector from the point to the camera optical center.
A radius threshold B is set, and all pixels within this range are taken as nearby pixels. For these nearby pixels, their corresponding point clouds are queried, and the unit vectors formed by these points with the point to be processed are calculated as follows:

$$ (x_2, y_2, z_2) = \frac{(X_q - X_p,\; Y_q - Y_p,\; Z_q - Z_p)}{\left\| (X_q - X_p,\; Y_q - Y_p,\; Z_q - Z_p) \right\|} \quad (5) $$

In Formula (5), $(X_q, Y_q, Z_q)$ is the coordinate of the point cloud corresponding to an adjacent pixel in the world coordinate system, $x_2$, $y_2$, and $z_2$ are the normalized components of the unit vector from the point to be processed to the adjacent point along the three coordinate axes, and the denominator is the modulus of the vector from the point to the adjacent point.
Finally, the angle between the two vectors is calculated using the following formula:

$$ \alpha = \arccos\!\left(\frac{(x_1, y_1, z_1) \cdot (x_2, y_2, z_2)}{\left\|(x_1, y_1, z_1)\right\| \left\|(x_2, y_2, z_2)\right\|}\right) \quad (6) $$

Since both vectors are unit vectors, simplification leads to the following formula:

$$ \alpha = \arccos\!\left(x_1 x_2 + y_1 y_2 + z_1 z_2\right) \quad (7) $$

In Formula (7), $\alpha$ is the angle formed at the point to be processed between the camera optical center and the adjacent point. The adjacent points within the radius threshold B are traversed, and the minimum angle between the point to be processed and these adjacent points is recorded. An angle screening threshold A is then set: if the minimum angle is less than this screening threshold, the point is judged to be invisible, filtered out as an occluded point, and is not shown in the panoramic image projection.
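A minimal brute-force sketch of the two-part filter (nearest point preservation followed by the vector-angle screening of Formulas (4)–(7)) is given below; the array layout and the per-pixel dictionary are illustrative choices, not our exact implementation.

```python
import numpy as np

def filter_occluded(pixels, points_w, cam_center, radius_B=5, angle_A=0.1):
    """Keep, per pixel, the nearest point; then drop occluded points.

    pixels: (N, 2) integer pixel coordinates of the projected points.
    points_w: (N, 3) world coordinates of the projected points.
    """
    # 1. Nearest point preservation: one point per pixel, closest to the camera.
    dists = np.linalg.norm(points_w - cam_center, axis=1)
    best = {}
    for i, uv in enumerate(map(tuple, pixels)):
        if uv not in best or dists[i] < dists[best[uv]]:
            best[uv] = i
    keep = np.array(sorted(best.values()))
    px, pts = pixels[keep], points_w[keep]

    # 2. Angle screening: a point is occluded if the minimum angle between
    #    (point -> camera) and (point -> neighbour) falls below threshold A.
    v_cam = (cam_center - pts) / np.linalg.norm(cam_center - pts, axis=1,
                                                keepdims=True)
    visible = []
    for i in range(len(pts)):
        near = np.where(np.linalg.norm(px - px[i], axis=1) <= radius_B)[0]
        near = near[near != i]
        if near.size == 0:
            visible.append(i)
            continue
        v_adj = pts[near] - pts[i]
        v_adj /= np.linalg.norm(v_adj, axis=1, keepdims=True)
        angles = np.arccos(np.clip(v_adj @ v_cam[i], -1.0, 1.0))  # Formula (7)
        if angles.min() >= angle_A:          # min angle below A -> occluded, dropped
            visible.append(i)
    visible = np.array(visible, dtype=int)
    return px[visible], pts[visible]
```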
Based on the research methods described above, we designed and implemented an algorithm for generating the matching results of point clouds and panoramic images, the pseudocode for which is shown in Figure 3.
The algorithm first reads the camera optical center trajectory; for each trajectory point, it reads the corresponding point cloud scene and panoramic image. The projection of the point cloud scene onto the corresponding panoramic image is obtained through the coordinate transformation function in step 10. In step 12, the data-matching function takes the point cloud projection result, the position of the camera optical center, and the parameters of the occlusion removal algorithm as inputs and traverses all points that have passed the nearest point preservation step. The function calculates the vector angles formed by these points and their neighbors within the search radius and determines whether each point is occluded based on the angle threshold. For all points that complete the occlusion determination, the correspondence between the remaining non-occluded points and pixels is retained, and a series of pixel-to-point correspondences is output as a database file, which contains the coordinates of pixels on the panoramic image and the corresponding spatial points in the world coordinate system. The algorithm traverses all trajectory points, generating a series of matching result files and corresponding visualization files in a specified folder, thus realizing the batch processing of point cloud and image matching based on trajectory information.
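As a rough sketch of this batch driver, the loop below writes one matching file per trajectory point; the callables `load_frame`, `project_scene`, and `filter_occluded` are hypothetical stand-ins for the steps sketched above, and the CSV layout is an assumed form of the output database.

```python
import csv
from pathlib import Path

def run_batch_matching(trajectory, load_frame, project_scene, filter_occluded,
                       out_dir, radius_B=5, angle_A=0.1):
    """Batch-generate pixel-to-point matching files along a trajectory.

    trajectory: iterable of (point_id, camera_pose) records, where
    camera_pose is a 4x4 camera-to-world matrix.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for point_id, pose in trajectory:
        scene_points, panorama = load_frame(point_id)
        pixels, points = project_scene(scene_points, pose, panorama)   # coordinate transform
        pixels, points = filter_occluded(pixels, points, pose[:3, 3],
                                         radius_B, angle_A)            # occlusion removal
        # One database file per trajectory point: pixel (u, v) -> world (X, Y, Z).
        with open(out_dir / f"{point_id}_matches.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["u", "v", "X", "Y", "Z"])
            writer.writerows([*uv, *xyz] for uv, xyz in zip(pixels.tolist(),
                                                            points.tolist()))
```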
3. Experimentation and Results
In order to verify the correctness and reliability of the matching method detailed in this article, a complete implementation process for matching point clouds and panoramic images was designed. The specific technical route is shown in Figure 4.
3.1. Preparation of Experiment
In this study, the basic experimental data were collected and generated using the ARS-1000L mobile mapping platform produced by Hi-target DigitalCloud. The system composition is shown in Figure 5.
In the hardware system, the acquisition equipment mainly includes a Riegl VUX-1HA laser scanner, a SPAN-ISA-100C inertial navigation unit, and a Ladybug5Plus panoramic camera. The equipment is fixed and connected via mechanical mounts and calibrated with high precision so that the original sensor data can be obtained. The main performance parameters are shown in Table 1.
The Shandong Xiaoya Group campus was selected as the site for experimental data acquisition in order to verify the proposed method. Located at No. 44, Industrial North Road, Licheng District, Jinan City, Shandong Province, the area features expansive, level roads and numerous well-organized buildings, which facilitated accuracy verification. Data acquisition was carried out in the afternoon, when direct sunlight was not strong. The driving route of the experimental vehicle was planned from the nearby road into the industrial park; the mobile scanning platform was started, and the vehicle was driven along the planned route at a speed of approximately 15 km/h. The original data of the park were acquired synchronously by each sensor, and a point cloud map of the planned route, a trajectory corresponding to the camera optical center, and the panoramic images corresponding to the trajectory points were then generated.
3.2. Experimental Evaluation of Matching Effect
To qualitatively analyze the matching effects, we utilized point cloud maps, camera optical center trajectories, and the panoramic images corresponding to the trajectory points as input data. Data processing was conducted sequentially through the workflows designed previously for generating sequential point cloud scenes and matching results. We investigated threshold selection for occlusion removal algorithms, examined the visual effects of the method, and assessed its processing efficiency. Furthermore, in order to facilitate a comparative analysis of the algorithms, the same experimental dataset was used to generate colored point clouds, using a point cloud coloring algorithm, and to generate point cloud projection, directly based on the original point cloud map. The visual effects produced using both algorithms, as well as their computational efficiencies, were obtained and compared.
3.2.1. Selection of Threshold Value for Occlusion Removal Algorithm
In the filtering algorithm based on vector angles, the setting of the threshold directly affects the result of the processing. The radius threshold represents the search range of the surrounding data points from the points to be evaluated in the image coordinate system. Sufficient surrounding points help to make more accurate judgments. The angle selection threshold determines the screening level of the algorithm for the points to be evaluated. The higher the threshold, the more points will be filtered out. In the matching process, the density and distribution characteristics of point clouds, the sizes of images, and other characteristics are all related to the selection of algorithm thresholds. The appropriate selection of algorithm thresholds should be based on reasonable empirical ranges, with multiple experimental evaluations conducted to select a good set of thresholds as the parameters of subsequent algorithms.
According to the characteristics of the data acquired in this experiment, the range of the radius threshold B was set to within 10 pixels, and the range of the angle threshold A was set to within 0.1. Taking the scene corresponding to a trajectory point in the industrial park as the test dataset, different filtering thresholds were set within these ranges to compare the filtering effects of the algorithm. Table 2 and Figure 6, respectively, show the influence of different thresholds within these ranges on the number of filtered points, and the visualization of matching results at representative positions after occlusion removal.
It can be seen that the angle threshold A determines the overall filtering effect: the larger the threshold value, the more points are filtered out and the more obvious the distinction of occluded objects. The filtering effect on the occluded part at the junction of two surfaces of an object is significantly improved, but too large a threshold value can easily lead to the erroneous filtering of visible points, especially those at the far end, resulting in sparse point cloud projection. Increasing the radius threshold B improves the accuracy of the occlusion judgment, which has a significant impact on the point cloud projection at the boundary between different objects, producing the effect of distinguishing the boundaries of objects; however, too large a value can cause a significant loss of valuable point clouds at the boundary.
When threshold A is set to 0.1 and B is set to 5, the filtering result is more in line with physical reality, effectively filtering out the points of occluded objects and preserving the points of objects within view, with a low loss of position information. Scanning devices of the same type with similar settings tend to have consistent image and point cloud generation characteristics, so the current threshold values can be used as empirical parameters for processing scenes of this type. For scenes with different data characteristics, accurate occlusion removal can be achieved by applying different threshold and search radius values.
It is evident from the algorithmic principles that the density of the point clouds is the data characteristic that most significantly affects the processing outcome. To achieve the desired accuracy in occlusion removal, the thresholds of the algorithm should be adjusted according to the density of the point clouds. In scenarios where the target object is situated at a greater distance from the acquisition device, or where the acquisition vehicle travels at a higher speed, the resulting point cloud density is lower; in these situations, the search radius for each point to be processed should be increased, correspondingly raising the threshold B. The sparseness of the points also increases the angles formed at occluded points, necessitating a corresponding increase in the threshold A. Conversely, when the density of the point clouds increases, the thresholds of the algorithm should be correspondingly reduced.
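As an illustration only, a hypothetical helper of this kind could scale the two thresholds with relative point density; the square-root scaling and the reference density are assumptions, not rules derived from this study.

```python
def suggest_thresholds(point_density, ref_density=1.0,
                       base_radius_B=5.0, base_angle_A=0.1):
    """Illustrative heuristic: scale thresholds with relative point density.

    A density below the reference (sparser clouds) enlarges the search
    radius B and the angle threshold A; denser clouds shrink both.
    """
    scale = (ref_density / max(point_density, 1e-9)) ** 0.5
    return base_radius_B * scale, base_angle_A * scale
```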
Each panoramic image is taken on the ground, and some point clouds are significantly higher than, or far away from, the camera optical center. According to the algorithm principle, the vector angle does not change with the distance between the object and the camera optical center, so the selection of the angle threshold A is not obviously affected by the point cloud distance. However, the width of the edge area resulting from the projection of adjacent objects is likewise not affected by the point cloud distance; for point clouds that are far away from the camera optical center, the edge area therefore occupies a large proportion of the projection area, which may cause the loss of effective spatial information. Consequently, when determining the value of the radius threshold B, the utilization rate of the projection information for tall or far away objects should be considered. When the spatial information corresponding to these two types of objects needs to be retained, B should not be set too large; conversely, the value of B can be appropriately increased to make the projection discrimination of objects close to the camera optical center more obvious.
3.2.2. Visualization of the Processing Effect
In the visualization of sequence point cloud scene generation, a series of point cloud scene data extracted frame by frame is obtained through the designed workflow.
Figure 7a,b shows the generation effect of the point cloud map and the extraction effect of the point cloud scene corresponding to a certain trajectory point. As shown in the figure, the device generates a 3D point cloud scene of the planned road section that contains a large amount of noise and many insignificant points. After processing via the workflow, the valuable point clouds (such as buildings) in the scene are well preserved, the invalid points are mostly filtered out, and the quality of the point cloud is generally good. The point cloud corresponds to the real scene around the panoramic camera at the time of shooting and can be used to match the panoramic image with the corresponding point cloud. Additionally, Figure 7c shows the result of the point cloud coloring algorithm. As can be seen from the figure, this algorithm accurately assigns real color information to the point cloud without affecting the point cloud map data; however, the data benchmark of this method is the point cloud itself, and it lacks the capability to extract spatial information from images.
In the visualization of data matching, point cloud scenes, trajectory data, and the corresponding panoramic images are used as inputs for the matching module, A is set to 0.1, B is set to 5, and the corresponding point cloud projections are generated on the image; the point cloud direct projection result is also acquired using the point cloud map as input.
Figure 8 shows the visual effects of point cloud direct projection and the matching method we proposed, before and after occlusion removal. It is observable that, upon transforming the point cloud onto a two-dimensional plane, the projection of the point cloud exhibits a good degree of coincidence with the corresponding objects in the panoramic imagery; however, some point clouds that should not appear in the image also generate projections. The point cloud direct projection result contains significant numbers of occluded point clouds, noise points, and ground point clouds. If one were to directly query the spatial coordinates corresponding to image pixels, representing objects of surveying value (such as building facades), based on this matching result, erroneous outcomes are likely to be obtained. In the matching method we proposed before occlusion removal, noise points and ground points were effectively removed, while the projection points of some occluded objects still persist, affecting the correct matching relationship between pixels and point clouds. Point cloud matching with occlusion removal significantly improves this phenomenon. Visual objects can be correctly assigned with point cloud information, and the point clouds that should not be displayed on the image are effectively filtered. The edge discrimination between the projections of different objects is obvious, and the spatial information of tall and far away objects is also well preserved. The matching results between 2D image pixels and 3D point cloud coordinates are more consistent with physical reality.
3.2.3. Processing Efficiency of the Method
In terms of processing efficiency, we compared the designed method and the two comparative algorithms in terms of data quantity and time metrics, as shown in Table 3. The sequence point cloud generation workflow processed approximately 460 million data points in 4337.68 s, batch-generating files corresponding to 430 trajectory points and filtering out approximately three-quarters of the insignificant points, thereby conserving computational resources for subsequent processing. The point cloud coloring algorithm consumed 3775 s to generate a colored point cloud map, which was saved as a map file; this commonly used matching method therefore does not demonstrate a significant efficiency advantage. In the matching result generation workflow, taking the trajectory point processed in the previous section as an example, 20,326,339 data points were read and 1,866,860 data points were processed for matching, with a total time consumption of 60.62 s. The point cloud direct projection algorithm consumed 607.31 s because it uses the raw point cloud map as input, indicating relatively inefficient processing. In summary, the method proposed in this paper can perform serialized processing of large data volumes with relatively short time consumption and good processing efficiency.
3.3. Analysis of Matching Accuracy
To quantitatively analyze the matching error, 18 typical feature points in the industrial park were selected as control points. These control points were located 5–30 m from the center of the shooting area. The true world coordinates of the control points were obtained using a reflectorless total station, employing the WGS84 coordinate system, wherein the x, y, and z axes represent the north coordinate, east coordinate, and geodetic height, respectively. On the basis of the data acquisition and algorithm processing results described in the previous section, these coordinates were used as a benchmark for comparative evaluation.
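For clarity, the error metrics reported below can be computed as in the following sketch (hypothetical array names; the plane error is the horizontal distance and the distance error the full 3D distance to the surveyed coordinates).

```python
import numpy as np

def matching_errors(matched_xyz, control_xyz):
    """Plane and 3D distance errors against surveyed control points.

    matched_xyz, control_xyz: (N, 3) arrays in the same WGS84-based
    frame (x = north, y = east, z = geodetic height).
    """
    diff = np.asarray(matched_xyz) - np.asarray(control_xyz)
    plane_err = np.linalg.norm(diff[:, :2], axis=1)   # horizontal (plane) error
    dist_err = np.linalg.norm(diff, axis=1)           # 3D distance error
    return {"mean_plane": plane_err.mean(),
            "mean_distance": dist_err.mean(),
            "max_distance": dist_err.max()}
```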
3.3.1. Generation Accuracy of Point Clouds
In order to evaluate the generation accuracy of the point clouds, we manually selected the point cloud locations corresponding to the control points on the point cloud map and calculated the difference between the generated coordinates of the selected points and the true coordinates of the control points. The results are shown in Table 4 and Figure 9. As shown in the table, the average point cloud generation error is about 3 cm, the mean error in the plane is about 2.5 cm, and the maximum generation error does not exceed 5 cm. The sources of error mainly include the mechanical error of the sensors during data acquisition and the sensor calibration error; the error distribution across the points is relatively uniform, the overall generation accuracy is good, and the point cloud map can well restore the spatial characteristics of the real scene.
3.3.2. Comparison of Pixel Matching Error
To evaluate the accuracy of the projection, the 18 control points were manually labeled with pixel coordinates on the images. Using the real 3D coordinates of each control point as input, the target pixel coordinates of the control point on the corresponding panoramic image were calculated with the point cloud projection formulas, and the difference between the two sets of pixel coordinates was taken as the evaluation of projection accuracy. As shown in Table 5 and Figure 10, the average error in pixel coordinates is 2.82 pixels, with a median error of 3.2 pixels and a maximum error of around 6 pixels. The error source of pixel matching is mainly the accuracy error of the camera optical center trajectory, which is directly related to the calibration of the camera. Moreover, as the distance between a control point and the shooting center increases, the error tends to increase. The overall matching error is at a low level, and the matching between the point clouds and the images can be considered relatively accurate.
3.3.3. Comparison of Position Matching Error
The matching results generated in this study record the correspondence between the pixels representing the primary targets in the panoramic images and their actual spatial coordinates. In order to evaluate the spatial matching errors of this method, the corresponding spatial coordinates were retrieved from the matching results for each control point, according to its pixel coordinates in the image. These coordinates were then compared with the actual coordinates of the control points to calculate the spatial matching errors, the results of which are presented in Table 6 and Figure 11. From the table, it can be seen that the mean square error in the plane is about 3.2 cm, the mean distance error is about 4 cm, and the maximum distance error does not exceed 6 cm. The accuracy of the position matching result is directly related to the accuracy of point cloud generation, the accuracy of the pixel matching result, and the filtering effect on the occluded data points. The overall error of this result is at a low level, with no obvious error fluctuations, indicating that no incorrect match points were queried. It can be considered that the matching relationship between the point clouds and the images generated using this method can well characterize the distribution of spatial information based on the images, and the corresponding position information can be obtained accurately for pixels with spatial information.
5. Conclusions
In this research, we studied a method for generating a matching relationship between the point clouds and panoramic images obtained from a vehicle-borne mobile mapping system, based on occlusion removal. Through the designed point-cloud-processing workflow, the effective point clouds corresponding to the camera scenes were generated; based on the spherical projection model and the logic of filtering occluded points using vector angles, the problem of filtering the projections of occluded points from the perspective of the panoramic image was solved, and the resulting method is streamlined and supports automated batch processing.
An experimental comparison of the characteristics of this method and two commonly used matching techniques (point cloud coloring and point cloud direct projection) was conducted to demonstrate our method's advantages in processing efficiency and accuracy. Furthermore, an analysis was performed to examine the impact of data characteristics, primarily the point cloud density and the distance of points from the camera optical center, on the selection of algorithm thresholds. Under appropriate threshold values, accuracy experiments were also conducted, and the results show that, under the proposed process, the average generation error of the point clouds is around 3 cm, with a maximum error of no more than 5 cm; the average pixel matching error is around 2.8 pixels, with a maximum error of about 6 pixels; and the average position matching error is about 4 cm, with a maximum error of no more than 6 cm.
In light of the preceding discussion, the prospective research directions for this method ought to be concentrated on the following components:
Enhancing the applicability of the algorithms under different acquisition methods and varying data characteristics. Devising multiple algorithmic logics to process input data in diverse forms should be considered. Threshold control functions that take more fundamental variables, such as vehicle speed and the distance to the primary target, as inputs should be designed, enabling the adaptive adjustment of thresholds to achieve the desired processing outcomes.
Acquisition of high-quality point cloud maps. Incorporating an optimization module for point cloud maps should be considered. Based on redundant point cloud data acquired multiple times, a filtering algorithm should be designed to retain high-quality data points, thereby generating an accurate point cloud of the primary target and supplementing the missing portions of the point cloud maps from conventional acquisition methods.
Enhancing registration accuracy and processing efficiency of data. Accurate matching results are derived from precise calibration parameters of the camera and laser scanner. Consideration should be given to employing registration algorithms that integrate point clouds with images to acquire more accurate coordinate transformation parameters between the two data types. Additionally, the computational speed of the method can be further enhanced through programming techniques such as parallelization and CUDA acceleration, thereby reducing the time consumption for generating matching results.
Overall, the proposed method yields satisfactory matching results, offering a reference-value-rich scheme for the fusion of multi-source data based on point clouds and panoramic images. The generated matching results can provide spatial information corresponding to pixel coordinates based on the image, offering robust support for algorithms that acquire object positional information through visual features. This method can be broadly applied in the fields of surveying, mapping, positioning, and navigation.