Next Article in Journal
Multi-Parameter Characterization of Liquid-to-Ice Phase Transition Using Bulk Acoustic Waves
Next Article in Special Issue
DIO-SLAM: A Dynamic RGB-D SLAM Method Combining Instance Segmentation and Optical Flow
Previous Article in Journal
Pipeline Leak Detection: A Comprehensive Deep Learning Model Using CWT Image Analysis and an Optimized DBN-GA-LSSVM Framework
Previous Article in Special Issue
UAV-Borne Mapping Algorithms for Low-Altitude and High-Speed Drone Applications
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Accurate 3D LiDAR SLAM System Based on Hash Multi-Scale Map and Bidirectional Matching Algorithm

1
Guangdong Provincial Key Laboratory of Robotics and Intelligent System, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
2
Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen 518055, China
3
Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen 518055, China
4
Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China
*
Authors to whom correspondence should be addressed.
Sensors 2024, 24(12), 4011; https://doi.org/10.3390/s24124011
Submission received: 30 April 2024 / Revised: 15 June 2024 / Accepted: 18 June 2024 / Published: 20 June 2024
(This article belongs to the Special Issue Sensors and Algorithms for 3D Visual Analysis and SLAM)

Abstract

:
Simultaneous localization and mapping (SLAM) is a hot research area that is widely required in many robotics applications. In SLAM technology, it is essential to explore an accurate and efficient map model to represent the environment and develop the corresponding data association methods needed to achieve reliable matching from measurements to maps. These two key elements impact the working stability of the SLAM system, especially in complex scenarios. However, previous literature has not fully addressed the problems of efficient mapping and accurate data association. In this article, we propose a novel hash multi-scale (H-MS) map to ensure query efficiency with accurate modeling. In the proposed map, the inserted map point will simultaneously participate in modeling voxels of different scales in a voxel group, enabling the map to represent objects of different scales in the environment effectively. Meanwhile, the root node of the voxel group is saved to a hash table for efficient access. Secondly, considering the one-to-many (1 × 10 3 order of magnitude) high computational data association problem caused by maintaining multi-scale voxel landmarks simultaneously in the H-MS map, we further propose a bidirectional matching algorithm (MSBM). This algorithm utilizes forward–reverse–forward projection to balance the efficiency and accuracy problem. The proposed H-MS map and MSBM algorithm are integrated into a completed LiDAR SLAM (HMS-SLAM) system. Finally, we validated the proposed map model, matching algorithm, and integrated system on the public KITTI dataset. The experimental results show that, compared with the ikd tree map, the H-MS map model has higher insertion and deletion efficiency, both having O ( 1 ) time complexity. The computational efficiency and accuracy of the MSBM algorithm are better than that of the small-scale priority matching algorithm, and the computing speed of the MSBM achieves 49 ms/time under a single CPU thread. In addition, the HMS-SLAM system built in this article has also reached excellent performance in terms of mapping accuracy and memory usage.

1. Introduction

Compared with visual SLAM [1,2,3], LiDAR SLAM is widely used in the industrial field due to its robustness in complex scenes (lighting changes and low textures). Currently, most LiDAR SLAM [4,5,6,7] systems use direct point cloud maps with a “point” being the smallest unit for environmental modeling. During the localization process, each measurement point will be associated with a landmark represented by a specific number of neighboring map point sets. However, because the point set with a specified number may not achieve optimal landmark modeling, it will further limit the accuracy of state estimation and subsequent map updating. On the other hand, by maintaining voxels of different scales [8,9], objects of various sizes in the environment can be accurately represented. However, compared with point cloud maps, maintaining voxel maps requires the preset voxel segmentation threshold or the estimation of the scene size, which increases the difficulty of using the map. Therefore, it is important to design a practical mapping method that is convenient to use and has a high map modeling accuracy for diverse, complex scenes.
For point-based map [10] models, researching universal scene representations and improving maintenance efficiency are currently hot topics. The authors of [11,12] proposed a structured feature extraction method based on scan smoothness. Then, the accumulated multi-frame feature sets were used to construct a structured point cloud map. An interesting map modeling method was proposed in [13], in which an implicit moving least squares surface composed of local points was developed. Compared with [11,12,14], the method of [13] is also suitable for unstructured scenes. Cai et al. [15] proposed a modified ikd tree model that supports increment point insertion and deletion. Consequently, compared with the traditional kd tree model, the new model does not need to rebuild a new tree structure when inserting new map points. The ikd tree map has been widely applied in different open-source SLAM systems [16,17].
For voxel-based [18] map models, the current research hotspot is to improve the map model accuracy. Biber et al. [19] considered constructing the environment as a normal distribution model within the fixed-size voxel, which can effectively represent complex scenes and achieve reliable positioning and mapping. Yuan et al. [8] used a specific threshold to segment voxels in the hash octree map. Thus, each voxel in the map contained a unique plane landmark. Undoubtedly, the accuracy and robustness of mapping depend on the threshold selection. In [20], Nguyen et al. used the improved octree model [9] to represent the environment. The plane landmarks of different scales are simultaneously maintained in corresponding voxels. However, this method requires manually presetting the maximum depth of the octree map by estimating the size of the scene. Unlike the above studies, this paper proposes a hash multi-scale map model that does not require presetting the voxel segmentation threshold or adjusting the maximum depth parameter according to the scene size.
On the other hand, to achieve robust and accurate robot localization, the SLAM system requires some corresponding matching algorithms. At present, the matching method based on “point” has been thoroughly studied. For example, some improved iterative closest point (ICP) [21,22] algorithms used the covariance distribution of measurement and target points during registration. Therefore, these algorithms can work well in complex scenarios. In [23,24], different methods were proposed to extract edge or surface feature points from point clouds, and then state estimation was achieved by matching feature points to the local feature map. In addition, the method of [25] introduced the intensity information of point clouds during the registration process, which further improves registration accuracy.
Recently, some research has focused on registering point clouds to voxel landmarks. Biber et al. [19] proposed a method matching each 3D measurement point with 27 neighboring voxels via the normal distribution model. Due to its ability to achieve one-to-many matching, this algorithm had strong robustness but slightly low accuracy. Liu et al. [26] developed a kd tree with the mean of all points in the voxel. Then, using the nearest matching solution, each feature point was associated with a unique voxel landmark by the kd tree. For the matching problem of multi-scale models, Nguyen et al. [20] proposed a small-scale priority matching algorithm to achieve the data association between the measurement and modified octree map. However, the premise of this algorithm is that small-scale map voxels can always achieve higher accuracy in modeling. This condition does not always hold in practice. On the other hand, in the multi-scale map model, the number of small-scale landmarks is usually much larger than that of larger scales. Therefore, traversing and comparing the smaller-scale landmarks will be a time-consuming process. To address the issues above, we propose a novel multi-scale bidirectional matching algorithm (MSBM). The core idea of this algorithm is to construct a candidate set using only a limited number of neighboring voxels at the to-be-searched scale.
The main contributions of this article are concluded below.
  • A novel hash multi-scale (H-MS) map model is proposed. This model can be used without estimating the scene size or presetting the voxel segmentation threshold.
  • A multi-scale bidirectional matching (MSBM) algorithm, adapted to the H-MS map model, is developed. The algorithm can achieve satisfactory performance in the trade-off problem between registration accuracy and efficiency.
  • The proposed map model and matching algorithm are integrated into a unified SLAM (HMS-SLAM) system. The proposed algorithm and system are compared with some state-of-the-art methods based on public datasets.
The remaining part of the article is organized as follows. Section 2 presents the proposed map model and matching algorithm. Section 3 introduces the integrated SLAM system. Section 4 and Section 5 detail the experiments and conclusions, respectively.

2. Method

2.1. Problem Formulation

In this article, we present a novel hash multi-scale (H-MS) map model designed to achieve precise scene modeling by concurrently maintaining voxel landmarks at different scales. Additionally, we employ a hash table to store multi-scale voxel groups, ensuring efficient data queries.
The H-MS map is maintained through point insertion and voxel deletion (single voxel, frame-based) operations. The temporal complexity of point insertion and single-voxel deletion operations is O ( 1 ) . Unlike existing methods such as the hash adaptive voxel map [8], which relies on predetermined thresholds for voxel segmentation, and the octree map [20], which requires estimation of scene size to specify maximum depth parameters, our model overcomes these limitations.
In addition, the proposed H-MS map model has a low matching efficiency because of the high computational complexity. To address this problem, we introduce a multi-scale bidirectional matching (MSBM) algorithm to enhance the efficiency and robustness of data association.

2.2. Hash Multi-Scale Map

This subsection introduces the proposed hash multi-scale map model from three aspects: data structure, map maintenance (insertion, deletion), and data querying.

2.2.1. Data Structure

In the H-MS map, each key value in the hash table connects to a root voxel node (the map voxel with the largest scale s 0 ( u n i t = m ) ) of the multi-scale voxel group. By defining the maximum depth d of the multi-scale map, the scale of voxels in other layers can be calculated by
s n = s 0 s 0 2 n 2 n
where n indicates that the voxel is located at the n-th ( n = 0 , 1 , , d 1 ) scale of the group. The voxel coordinate v 0 at scale s 0 can be derived by forward projecting the 3D point p to the map coordinate system m. Utilizing the hash function, we calculate the mapping relationship between v 0 and the key value k e y in the hash table, thus obtaining the corresponding root node Ø 0 in the H-MS map, and we have
p = [ p x , p y , p z ] T v 0 = p p s 0 s 0 = [ v 0 , x , v 0 , y , v 0 , z ] T k e y = h a s h _ c o m b i n e ( v 0 )
where h a s h _ c o m b i n e ( ) is the function used in the open-source b o o s t library to generate unique key values for the hash table. Based on the specified scale s n , the voxel coordinate v n corresponding to p can be obtained through the forward projection operation, i.e.,
v n = p p s n s n = [ v n , x , v n , y , v n , z ] T .
Because a map point will participate in the creation of voxel landmarks across all scales within the group, increasing the maximum depth d enhances map modeling accuracy. Furthermore, each voxel in the H-MS map maintains statistical features: point number N, mean u , and covariance ∑. The plane, edge, or normal distributed landmarks can be calculated from the statistical features. To manage memory usage, a frame-based voxel deletion method is also implemented in the H-MS map. Consequently, each map voxel needs to maintain an observation count variable l o b s . Figure 1 illustrates an example of the H-MS map.

2.2.2. Map Maintenance

The insertion and deletion of the H-MS map are operated in units of “point” and “voxel”, respectively. The reason for voxel deletion rather than point deletion is that the H-MS map aims to construct a multi-scale map based on voxel landmarks. Therefore, deletion is always performed with the voxel as the smallest unit.
Point insertion: For a given scale s n , assuming that the voxel coordinate of p at that scale is v n , x , v n , y , v n , z , the following equation determines the insertion position l e a f i d of the map voxel at the next scale:
q = p x < ( s n v n , x ) ? 0 : 1 p y < ( s n v n , y ) ? 0 : 1 p z < ( s n v n , z ) ? 0 : 1 l e a f i d = 4 q [ 0 ] + 2 q [ 1 ] + 1 q [ 2 ] .
The process of inserting a map point can be divided into two steps: (1) calculating the corresponding root node pointer Ø 0 in the hash table, and (2) recursively inserting points. The pseudocode is illustrated in Algorithm 1.
Algorithm 1: Map point insertion
Sensors 24 04011 i001
Voxel deletion: The H-MS map supports both single-voxel and frame-based voxel removal methods. Firstly, we use a method for removing a single voxel. Given the scale s n , we define the map coordinate m as the center position of the voxel to be deleted, and the corresponding processing method is introduced below.
(1)
Perform a forward projection of m with scale s 0 to obtain the root voxel coordinate v 0  by
v o = m m s 0 s 0 .
(2)
Utilize Equation (4) recursively to find and delete the voxel node at scale s n .
The frame-based voxel deletion method can be defined if the map is organized by frame. Firstly, when a frame of point cloud enters the map, the “point insertion” operation is used to update the H-MS map. All observable map voxels are stored in vector form, with the observation count l o b s adding 1.
Then, when removing an inserted frame from the map, operations happen in the stored voxel vector. By traversing the voxel vector, the observation count l o b s stored in the voxel will subtract 1. We can observe whether the observation count l 0 o b s of each root voxel is zero to determine whether the voxel group can be removed. The reason why we use the observation count l 0 o b s to delete the voxel group is that a map point is simultaneously inserted into voxels of different scales. The observation count l 0 o b s of the root voxel is always equal to or greater than those of smaller-scale voxels.

2.2.3. Data Query

The H-MS map supports querying k neighboring voxels at a specified scale, where k can take values of 1, 7, or 27. Firstly, the reverse projection operation is defined as projecting the voxel coordinate v n into the map coordinate system m through scale s n to calculate the map coordinate m , which takes the form
m = s n v n .
Taking map point p as an example, the query process for 27 neighboring voxels in scale s n can be outlined as:
(1)
Calculate the nearest voxel coordinate v n c of p through the forward projection of Equation (3). Then, the set of neighboring voxel coordinates V n = { v n 1 , , v n i , , v n 27 } can be represented as stacking 1 , 0 , 1 on each dimension of coordinate v n c .
(2)
Utilize Equation (6) and (5) to perform reverse and forward projection operations on the voxel coordinate set V n . This yields the corresponding map coordinate set M = { m 1 , , m i , , m 27 } and root node voxel coordinate set V 0 = { v 0 1 , , v 0 i , , v 0 27 } .
(3)
Recursively apply Equation (4) to the map coordinate set M to obtain the required voxel set at scale s n .

2.3. Multi-Scale Bidirectional Matching Algorithm

The multi-scale bidirectional matching algorithm is designed based on the H-MS map to associate measurement point clouds with corresponding voxel landmarks. During the data association process, each measurement point undergoes three operations: forward, reverse, and forward projections to obtain the candidate voxel set. The best match pair will be generated from the candidate voxel set. Subsequently, the algorithm is developed considering the matching target, evaluation criteria, and matching method.

2.3.1. Matching Target and Evaluation Criteria

Considering the generality of surface features in real scenes, we use the plane parameter ß m = [ n m , d m ] T calculated from the accumulated measurements in each voxel as the landmark (refer to Section 3.1.3 for the calculation method). Assuming that the point set in the LiDAR coordinate system l to be registered is Ω = p 1 l , , p i l , , p n l , the algorithm’s objective is to obtain the matching set U = p 1 l ß b e s t 1 , , p i l ß b e s t i , , p n l ß b e s t n .
Assuming that the predicted pose that can transform the measurement point p i l to the map coordinate system m, T ^ m l S E ( 3 ) . The corresponding rotation matrix and translation vector are R ^ m l S O ( 3 ) and t ^ m l R 3 , respectively. The distance from the point p i l to the candidate plane landmark ß m can be defined as
e i ß = ß T m T ^ m l p ˜ i l
where p ˜ i l is the homogeneous coordinate of p i l . We use the absolute value e ß as the evaluation criteria for matching.

2.3.2. Matching Method

Taking the measurement point p ˜ i l as an example, Algorithm 2 provides a specific method for obtaining the matching relationship between the point p ˜ i l and the plane landmark ß j , b e s t k .
In Algorithm 2, line 1 calculates the predicted 3D point in the map coordinate system m. Lines 2–3 initialize the scale vector. Lines 4–13 calculate the candidate landmark set at multiple scales and sort them based on evaluation criteria. Lines 14–18 perform a distance check on the obtained matching pairs with minimum criteria.
In addition, it can be seen that the candidate matching set for each measurement point is composed of a specified number of neighboring voxel sets at different scales. For instance, with the maximum depth d of the H-MS map being set to 3 and considering the neighboring set with 27 voxels, there can be up to 81 candidate-matching landmarks in the p r i o r i t y _ q u e u e . Compared with the small-scale priority matching algorithm [20], the proposed method enables an equitable search for the best landmark across all scales and utilizes a smaller candidate set.
Algorithm 2: Multi-scale bidirectional matching algorithm
Sensors 24 04011 i002

3. System

This section introduces the 3D LiDAR SLAM framework that integrates the proposed H-MS map and MSBM algorithm. Figure 2 depicts the system flowchart. The localization and mapping module discussed in Section 3.1 and the loop closure module outlined in Section 3.2 are executed in two separate threads.

3.1. Localization and Mapping

3.1.1. Robot Localization Based on the MSBM Algorithm

To enhance the computational efficiency of localization and mapping, we employ the method proposed in [13] to process the raw point cloud data and obtain a sampling point set Ω = p 1 l , , p i l , , p n l . Then, Algorithm 2 is utilized to achieve data association U between the sampling point set Ω and H-MS map Φ .
Because the robot is typically mobile during the continuous sampling process of the LiDAR sensor, it is necessary to process motion distortion when estimating the robot pose using a raw point cloud [23]. Assume that the initial and final poses of the current frame F j are X j = T j s , T j e SE 3 2 , with corresponding timestamps η j s , η j e . The localization module can utilize the data association U and the following equation to construct a cost function for estimating X j , i.e.,
min X j i = 1 n e ¯ i , j ß X j 2 2 + θ t t j s t j 1 e 2 2
where the second term represents the inter-frame localization consistency constraint. θ t is the corresponding weight coefficient, which is proposed in [27]. e ¯ i , j ß X j denotes a point-to-plane error considering distortion correction,
e ¯ i , j ß = n b e s t i p i , j m + d b e s t i
where point p i , j m in the map coordinate system m is obtained by transforming the measurement point p i , j l with
p i , j m = R j λ i p i , j l + t i λ i R j λ i = s l e r p q j s , q j e , λ i t j λ i = 1 λ i t j s + λ i t j e .
In Equation (10), q j s , q j e SO ( 3 ) 2 represents the rotational component of X j , and t j s , t j e R 3 2 represents the translation component of X j . The s l e r p ( ) function calculates the spherical linear interpolation between two quaternions. λ i denotes the relative timestamp, which has
λ i = η i η s η i η s η e η s η e η s .
Finally, we employ the LM algorithm [28] to solve Equation (8).

3.1.2. Keyframe Detection

When both the time interval and relative motion transformation conditions are satisfied, the system will create a new keyframe:
(1)
Time interval: Assume that the time when the last keyframe was inserted is η l a s t k f . If the interval between η l a s t k f and η j e (the end timestamp of current frame F j ) exceeds the threshold η t h k f , condition (1) holds.
(2)
Relative motion transformation: Calculate the relative transformation T r e l k f based on the end pose T j e of the current frame F j and the pose T l a s t k f of the last inserted keyframe. If the relative translation part t r e l k f exceeds threshold t t h or the relative rotation part R r e l k f represented by Euler angle exceeds threshold R t h , condition (2) holds.
When a keyframe needs to be created, the system utilizes the estimation result X j from Section 3.1.1 to correct the motion distortion of the sampling point set Ω . Then, the end pose T j e of the current frame F j and distortion point cloud Ω ¯ are stored in the new keyframe F j . Finally, F j will be inserted into the sliding window S and keyframe list H .

3.1.3. H-MS Map Updating

The system utilizes keyframes in the sliding window S to maintain a local H-MS map, where each voxel stores the independent statistical feature Λ k = [ N k , u k , k ] and the plane parameter ß k m = [ n k m , d k m ] T . For the newly inserted keyframe F c u r , we employ its pose T c u r to transform Ω ¯ to the map coordinate system m. Then, Algorithm 1 recursively inserts each map point into the H-MS map. All map voxels observed by the point cloud in F c u r are stored as a vector.
To accelerate the H-MS map updating step, the system first updates the statistical features of the smallest-scale map voxels using a “point increment” method. Assuming that the statistical features of the current map voxel are the point number N k , mean u k , and covariance k , then the new statistical features updated by p N k m (keyframe index j is omitted) can be represented as:
N k = N k + 1 u k = ( N k u k + p N k m ) / N k k = 1 N k i = 1 N k ( p i m u k ) ( p i m u k ) T = 1 N k i = 1 N k ( p i m u k + u k u k ) ( p i m u k + u k u k ) T = 1 N k N k k + ( p N k m u k ) ( p N k m u k ) T + ( N k + 1 ) ( u k u k ) ( u k u k ) T + 2 ( u k u k ) ( p N k m u k ) T .
The statistical feature at other scales can be recursively obtained from smaller scales using the method proposed by [20]. Assume that the eigenvalues of covariance k in descending order are [ α 0 , α 1 , α 2 ] . The plane parameters in each voxel can be updated. The normal vector n k m is the feature vector corresponding to α 0 , and the offset is d k m = n k m · u k . The weight of the plane landmark is calculated by W = ( α 1 α 2 ) ( α 1 α 2 ) α 0 α 0 .
When the number of keyframes in the sliding window S exceeds K s l , the system will perform a “minus 1” operation on the observation count l 0 o b s of all visible voxels in the earliest keyframe. When the observation count l 0 o b s of the root voxel returns to zero, the voxel group will be removed from the hash table. Finally, the earliest keyframe is deleted from the sliding window S .

3.2. Loop Detection and Correction

3.2.1. Loop Detection

When a new keyframe F c u r enters, the system constructs a kd tree based on the translation component of each keyframe in the list H . Then, the translation component t c u r of the current keyframe is searched within the kd tree to obtain a set of neighboring historical keyframes. If the difference between the timestamp of a historical keyframe F h i s and the timestamp of the current keyframe exceeds η t h l o o p , the system will create a local point cloud map. The map comprises n l o o p historical keyframes before and after F h i s .
By using the ICP [29] algorithm, the measurement point cloud stored in the current keyframe F c u r can be registered to the local map. The registration error is defined as e l o o p and the correction transformation is T c o r r e c t c u r . When e l o o p is small enough, it means that a loop can be found. Assuming the estimated pose of the historical keyframe F h i s is T h i s , then the relative motion transformation T h , c l o o p from F c u r to F h i s can be computed by
T h , c l o o p = T h i s 1 T c o r r e c t c u r .
The k-th loop id pair ( q k h , q k c ) and the corresponding relative transformation T q k h , q k c l o o p are stored in an independent loop list L .

3.2.2. Pose Graph Optimization and Map Reconstruction

When the loop detection is successfully triggered, we utilize the keyframe list H and loop list L to construct a pose graph optimization [30] model aimed at eliminating accumulated localization errors. The pose error terms formed by the keyframe F j and F j + 1 are defined as
e j , j + 1 = ln ( T ¯ j , j + 1 1 T j 1 T j + 1 )
where T j and T j + 1 represent the odometry poses of keyframes F j and F j + 1 that have not undergone loop correction. T ¯ j , j + 1 1 represents the relative pose transformation between two keyframes in the keyframe list H and loop list L , which is used as a measurement value in pose graph optimization. Then, the cost function corresponding to all error terms can be represented as
min T j k f H j = 0 n k f 1 e j , j + 1 T e j , j + 1 2 2 + q k L e q k h , q k c T e q k h , q k c 2 2 .
After loop correction, we utilize the last n s l n e a r keyframes in the sliding window S for map reconstruction. Based on the corrected keyframe pose and Algorithm 1, the sampling point set Ω ¯ in each keyframe is traversed and reinserted into the H-MS map.

4. Results and Discussion

In this section, we validate the proposed algorithm and the integrated system. Specifically, Section 4.2 compares the insertion and deletion performance of the proposed H-MS map model with the ikd tree [15] model. Then, the modified MSBM algorithm is compared with the matching algorithm, which can also be used for the H-MS map but is based on the small-scale priority idea [20], as well as the commonly used G-ICP [21,22] and NDT [19] matching algorithms. The efficiency and accuracy of the MSBM algorithm are verified. In Section 4.3, we compare the HMS-SLAM system constructed in this article with other advanced LiDAR SLAMs (A-LOAM [31], CT-ICP [27], and VoxelMap [8]) to verify the overall mapping accuracy of the system.

4.1. Experimental Setup and Evaluation Metrics

The validation environment for all experiments in this article contained an AMD® Ryzen 7 5800h (8 cores @ 3.2 GHz) CPU, 16 GB of RAM, and ROS Melodic. The parameter settings and evaluation metrics involved in the experiment are provided as follows.
Parameter settings: In the map performance experiment, the parameters of the H-MS map are set as the maximum depth d = 3 and maximum scale s 0 = 1.0 . For the ikd tree map, the parameters include the delete parameter d e l = 0.3 , balance parameter b a l = 0.6 , and box length b o x = 0.2 .
In the matching algorithm experiment, the parameters of the MSBM algorithm are the number of neighboring voxels k = 27 , the threshold for the accumulated map point number N t h = 10 , the threshold for the plane landmark weight W t h = 0.5 , and the matching criteria threshold D t h = 1.0 m. The voxel search radius for small-scale priority matching is r = 1.0 m; the parameters of N t h , W t h and D t h are as above. G-ICP and NDT follow the default settings of [32]. To mitigate frame loss due to high computational steps in G-ICP and NDT, they leverage the OpenMP [33] library for parallel acceleration processing.
In the integrated system experiment, the parameters of the H-MS map include the maximum depth d = 4 and maximum scale s 0 = 2.0 , and the MSBM algorithm parameters remain consistent with those in the matching algorithm experiment. The parameters for keyframe detection include the time interval η t h k f = 0.3 , relative translation transformation threshold t t h = 1.0 m, and relative rotation transformation threshold R t h = 10 . The sliding window size utilized to maintain the local H-MS map is K s l = 100 . Parameters for the loop closure thread consist of the loop time interval η t h l o o p = 20 s , number of neighboring historical keyframes n l o o p = 5 , ICP error threshold L t h = 0.3 m, and number of reconstructed keyframes K s l n e a r = 7 .
Evaluation metric: In the map performance experiment, we analyzed the computational efficiency of map insertion and deletion operations. All operations are timed in milliseconds. In the matching algorithm and integrated system experiments, we employed standard metrics from the SLAM field, such as ATE [34] (absolute trajectory error) and RPE [35] (relative pose error), to evaluate the proposed methods.

4.2. Algorithm Validation

4.2.1. Map Performance Experiment

The map performance experiment is mainly divided into two parts: point insertion and voxel deletion. For the map model conducting comparative experiments, an ikd tree [15] is an improved kd tree model that can achieve incremental point insertion and deletion operations. When a new point cloud frame is inserted into the map, the ikd tree does not need to be reconstructed, making it more suitable for SLAM tasks.
In the point insertion experiment, we randomly sample 3D points within a space of 100 m × 100 m × 100 m. Subsequently, we insert 2000 points into the H-MS and ikd tree maps in each iteration, calculating the time required for the operation. Figure 3 illustrates the accumulated and single-time consumption of two models during 2 × 10 5 point insertions. It can be seen that the H-MS map has a better computational efficiency in the point insertion operation than the ikd tree map. This efficiency stems from the fact that a map point can access the corresponding root voxel in the H-MS map in O ( 1 ) time through the hash table. Meanwhile, the finite-time recursive insertion of a point also has the constant temporal complexity of O ( 1 ) . In contrast, the temporal complexity of point insertion in the ikd tree is O l o g n .
The voxel deletion experiment is subdivided into random voxel deletion and frame-based voxel deletion experiments. In the random voxel deletion experiment, we first generate an H-MS map using the method outlined in the point insertion experiment. Concurrently, the map coordinates of map voxels are stored separately by scale. Then, we record the accumulated and single computation time required to remove the map voxel set (200 voxels) at different scales. This experiment compares the H-MS map’s computation efficiency in removing voxels of different scales. Figure 4a shows that the voxel removal operation is proportional to the number of voxels being removed, validating the O ( 1 ) temporal complexity of the single voxel deletion operation. Additionally, it is worth noting that voxel deletion at the scale of s 1 always takes longer than at other scales. This observation suggests that when deleting voxels at scale s 1 , the total time required to locate the voxel from the map and remove its voxel groups is the longest.
Then, we conducted frame-based voxel deletion experiments using the outdoor KITTI odometry [35] and indoor Hilti [36] datasets. The KITTI odometry dataset offers pose ground truth and processed point cloud data without motion distortion, making it easy to construct the desired map. The Hilti dataset provides raw point cloud data with timestamps and trajectory ground truth. We applied the linear interpolation model proposed in [23] to process the raw point cloud and obtain the distortion-corrected point cloud in the LiDAR coordinate system.
Starting from frames 0, 50, 100, and so on of the sequential sequence in the dataset, we use continuous 50-frame point clouds to construct the H-MS map. Based on the voxel deletion method described in Section 3.1.3, the earliest keyframe can be removed from the sliding window. Table 1 presents the number of map voxels with observation count l 0 o b s = 0 in the earliest keyframe, along with their operation times for observation maintenance and voxel removal.
From Table 1, we found that performing frame-based voxel deletion operations on indoor Hilti datasets typically takes less time compared with outdoor autonomous driving scenes. This is because more voxels the small-scale scenes in the earliest keyframe can be observed by other keyframes in the sliding window in the small-scale scenes. These voxels do not need to be removed. For the experimental results of the Hilti-lab sequence, it is evident that both the removed number and voxel removal operation time are zero. This indicates that all observed map voxels in the earliest keyframe have a co-view relationship with other historical keyframes. Additionally, the number of voxels deleted in a single frame is always less than 500. The total cost time of the observation maintenance and voxel removal operations is less than 2 ms, which further verified the efficiency of H-MS map maintenance.

4.2.2. Matching Algorithm Experiment

In this section, we conducted comparative experiments on matching algorithms using the KITTI raw [35] and ETHZ [37] datasets. The KITTI raw dataset, collected by vehicles equipped with a Velodyne HDL-64E LiDAR sensor in urban environments, provides raw point cloud data without motion distortion correction. This dataset is useful for testing the registration accuracy and computational efficiency of matching algorithms in structured scenes. The ETHZ dataset contains point cloud data collected by a Hokuyo UTM-30LX LiDAR sensor in various unstructured scenes, such as mountains and woods. This dataset is ideal for verifying the registration accuracy of matching algorithms in more challenging scenarios.
For the matching algorithms conducting comparative experiments, SSPM [20] layers voxel landmarks of different scales within a specified radius range. Then, smaller-scale voxels are prioritized for matching judgments. The algorithm can terminate in advance once the “predicate” (plane landmark weight and point number) condition is satisfied. G-ICP [21] was a modified ICP matching algorithm that considered both measurement and target covariance distribution, exhibiting good scene adaptability. Based on [21], VG-ICP [22] further achieved registration from distribution to multiple distributions using voxels. The 3D-NDT model [19] used the fixed-size grid to segment the environment and stored voxels as the normal distribution model. Table 2 and Table 3 show different matching algorithms’ registration accuracy and computational time on the real-time collected KITTI raw dataset. Table 4 shows the registration accuracy of these algorithms on the offline-collected ETHZ dataset.
As depicted in Table 2, the proposed MSBM algorithm achieved the best registration accuracy in most sequences. Furthermore, compared with the SSPM [20] algorithm, which is also applicable to the H-MS map, the MSBM algorithm yielded superior estimation results on the dataset. In applications such as ground mobile robot navigation, large-scale voxel landmarks typically enable higher precision modeling of the ground. Because the MSBM algorithm evaluates all voxels in the candidate set equally, it ensures that large-scale landmarks are not overlooked, thereby obtaining the optimal matching results.
In Table 3, G-ICP exhibits the highest operational efficiency. This is because both G-ICP and VG-ICP algorithms are implemented based on the OpenMP library. High time-consuming steps such as covariance matrix calculation and data association are accelerated through multi-threaded parallel processing. In contrast, the MSBM and SSPM algorithms only run on a single thread and require fewer computational resources. Furthermore, the computational efficiency of the MSBM algorithm is nearly twice that of the SSPM algorithm, thereby satisfying the real-time requirements of the SLAM system.
To analyze the temporal complexity of the MSBM and SSPM algorithms, we take the registration process of a 3D measurement point as an example. Assuming that the maximum depth of the H-MS map is d and the number of neighboring voxels for search is k. The MSBM algorithm uses multi-scale and finite neighboring search strategies to construct the candidate voxel set. The temporal complexity of the MSBM algorithm is O ( k d ) . The SSPM algorithm employs a small-scale priority search to construct the candidate voxel set, resulting in a temporal complexity of O ( 8 d ) . Notably, as the parameter d increases, the computation time of the MSBM algorithm only increases linearly, whereas the computation time of the SSPM algorithm increases exponentially. This demonstrates the superior efficiency of the MSBM algorithm.
Table 4 presents the registration accuracy of different matching algorithms on the ETHZ dataset. Compared with other matching methods, the proposed MSBM algorithm achieves better registration results in most sequences. It is worth noting that during this experiment, we maintained a unique plane landmark for each voxel in the H-MS map. Although using planar parameters for environmental modeling in unstructured scenarios might seem inappropriate, the H-MS map’s ability to maintain voxels at different scales allows for effective segmentation of irregular objects like trees and grasslands. This ensures that these objects can always be segmented into a reasonably scaled voxel to find an effective local plane. Consequently, the registration accuracy of the MSBM algorithm is guaranteed in unstructured scenarios.

4.3. Integrated System Testing and Analysis

Compared with the KITTI raw [35] dataset, the KITTI-360 [38] dataset offers longer sequences, including point clouds larger than 1 × 10 4 frames, enabling the evaluation of the accuracy and stability of SLAM systems in large-scale scenarios. In this experiment, the proposed HMS-SLAM system, integrating the H-MS map and the MSBM algorithm, is compared with other advanced LiDAR SLAM frameworks using the ATE [34] and RPE [35] metrics. The ATE [34] metric evaluates the system’s absolute localization accuracy, while the RPE [35] metric reflects odometry drift. The loop closure modules were turned off during the experiment to ensure fairness in odometry verification.
For the SLAM frameworks involved in the comparison, A-LOAM [31] and CT-ICP [27] construct landmarks based on “points”, while VoxelMap [8] builds landmarks based on voxels. Specifically, A-LOAM [31] is a modified system based on [23], eliminating the use of the inertial measurement unit (IMU) sensor and relying on edge and plane feature point sets extracted from the point cloud for two-stage mapping (scan-to-scan and scan-to-map). The system organizes point cloud maps using kd trees, and each feature point in the set is matched with a landmark composed of a specified number of neighboring point set found in the kd tree. State estimation is accomplished using the Ceres 2.0.0 [39] non-linear optimizer. CT-ICP [27] maintains map points in fixed-size hash voxels and constructs landmarks using a specified number of neighboring map point sets. Additionally, the proposed continuous-time ICP algorithm considers intra-frame continuity and inter-frame non-continuity for state estimation, registering measurements to the map while removing point cloud motion distortion. Voxelmap [8] used a hash adaptive voxel map that maintained a unique plane landmark in each voxel based on the preset segmentation threshold. Moreover, Voxelmap [8] utilized an estimator based on IEKF [40] that considered both map and measurement uncertainties for robot localization.
As illustrated in Table 5, the proposed HMS-SLAM system achieved the best ATE and RPE evaluation accuracy in most sequences. For the comparison frameworks, we noticed that A-LOAM [31] and VoxelMap [8] performed worse. The A-LOAM [31] system utilized a scan-to-scan method to correct motion distortion and estimate the initial odometry pose for map refinement. However, due to the limited information in a single frame, the feature sampling set of the current frame may fail to construct a reasonable data association with the last frame. Additionally, the scan-to-scan method lacks the capability to filter data associations using the plane landmark weight, as demonstrated in [27] and our method. The hash adaptive voxel map proposed by [8] is adaptable to urban structural environments. However, the system depends on the preset threshold for the map voxel segmentation operation. This may result in the inability to create voxels at the optimal scale, thereby reducing positioning accuracy. Furthermore, we noticed that VoxelMap [8] crashed due to system memory exhaustion in sequence 02. This occurred because VoxelMap utilized all measurement information to construct a complete hash adaptive voxel map during the runtime cycle. Although the adaptive segmentation operation somewhat controls memory usage, voxel-based memory consumption is higher than that of point-based SLAM systems. Moreover, the idea of maintaining multi-scale voxels to ensure modeling accuracy for our proposed H-MS map leads to faster memory growth compared with VoxelMap. Without targeted processing, the HMS-SLAM system may struggle to achieve long-term operation. To address this issue, we introduce the frame-based voxel deletion operation in the H-MS map. Consequently, the localization and mapping tasks of the HMS-SLAM system rely only on the sliding window with a limited number of keyframes. Finally, from the results of the ATE metric, the localization accuracy of CT-ICP [27] is roughly equivalent to that of the HMS-SLAM system. However, CT-ICP [27] relied on a fixed number of neighboring point sets to construct the plane landmark, which limits the map model’s accuracy and further impacts localization results.
Figure 5 illustrates the H-MS map constructed by the HMS-SLAM system under sequence 10 of the KITTI-360 dataset. Figure 5a–c depicts effective map voxels with a plane weight greater than 0.5 at scale s 0 , s 1 , and s 2 , respectively. The inset in the bottom-right corner provides an enlarged view of the highlighted red box. It can be seen that the effective voxels at scale s 0 mainly come from the ground. Meanwhile, most vertical walls are not regular large-scale structural planes, and only a few voxels have been successfully created. In the voxel map at scale s 1 , a greater number of effective wall landmarks can be seen. In the voxel map at scale s 2 , effective voxel landmarks are also generated for the car on the ground. By maintaining a hash multi-scale map, objects of different sizes in the environment can be accurately modeled using voxels at the corresponding scale.

5. Conclusions

In this article, we propose a novel hash multi-scale (H-MS) map for environmental modeling and representation. This model considers allowing a map point to simultaneously participate in creating map voxel at different scales to ensure the accuracy of map modeling. Meanwhile, by storing the root node of the multi-scale voxel group in the hash table, each group can be quickly accessed. Compared with commonly used octree and kd tree maps, the H-MS map can be used without estimating the scene size and has higher map maintenance efficiency. On the other hand, while maintaining multi-scale landmarks improves map completeness, the large number of feasible voxel landmarks also increases the difficulty of data association. To address this issue, we designed a multi-scale bidirectional matching (MSBM) algorithm. The algorithm can quickly obtain the candidate landmark set through forward–reverse–forward projection and achieve efficient data association. Then, the proposed H-MS map and MSBM algorithm are uniformly integrated into the HMS-SLAM system. Based on the publicly available KITTI dataset, we validated the maintenance efficiency of the H-MS map, the accuracy and time consumption of the MSBM algorithm, and the mapping accuracy of the HMS-SLAM system. The experimental results show that the H-MS map can complete insertion and deletion operations with O ( 1 ) temporal complexity, with higher maintenance efficiency than the ikd tree map. Compared with the small-scale priority matching algorithm, which also applies to the H-MS map, the MSBM algorithm has improved accuracy and efficiency. Moreover, the single run time of the MSBM algorithm on the CPU is around 49 ms, which meets the real-time application requirements of the SLAM system. Compared with other advanced LiDAR SLAM frameworks, the HMS-SLAM system built in this article has advantages in memory usage and mapping accuracy. In the future, we will apply a semantic segmentation module to the HMS-SLAM system to enable scene modeling that includes high-level semantic information. It is beneficial for improving the robot’s understanding and adaptability to complex environments such as high dynamics.

Author Contributions

Conceptualization, T.M. and Y.O.; methodology, T.M. and S.X.; software, T.M. and L.K.; validation, T.M. and L.K.; formal analysis, T.M.; investigation, T.M. and Y.O.; resources, Y.O.; data curation, T.M. and L.K.; writing—original draft preparation, T.M.; writing—review and editing, T.M., S.X., and Y.O.; visualization, T.M.; supervision, Y.O. and S.X.; funding acquisition, Y.O. and S.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grants Nos. 62173319, 62063006, and 62273327); in part by the Guangdong Basic and Applied Basic Research Foundation (2022B1515120067); and in part by the Shenzhen Science and Technology Program (KCXFZ20211020165003005).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this work are not publicly available at this time. Raw data can be obtained upon reasonable request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015, 31, 1147–1163. [Google Scholar] [CrossRef]
  2. Chen, W.; Zhu, L.; Loo, S.Y.; Wang, J.; Wang, C.; Meng, M.Q.H.; Zhang, H. Robustness improvement of using pre-trained network in visual odometry for on-road driving. IEEE Trans. Veh. Technol. 2021, 70, 12415–12426. [Google Scholar] [CrossRef]
  3. Liu, G.; Zeng, W.; Feng, B.; Xu, F. DMS-SLAM: A general visual SLAM system for dynamic scenes with multiple sensors. Sensors 2019, 19, 3714. [Google Scholar] [CrossRef] [PubMed]
  4. Yang, L.; Ma, H.; Wang, Y.; Xia, J.; Wang, C. A Tightly Coupled LiDAR-Inertial SLAM for Perceptually Degraded Scenes. Sensors 2022, 22, 3063. [Google Scholar] [CrossRef] [PubMed]
  5. Ma, T.; Jiang, G.; Ou, Y.; Xu, S. Semantic geometric fusion multi-object tracking and lidar odometry in dynamic environment. Robotica 2024, 42, 891–910. [Google Scholar] [CrossRef]
  6. Yokozuka, M.; Koide, K.; Oishi, S.; Banno, A. LiTAMIN2: Ultra light LiDAR-based SLAM using geometric approximation applied with KL-divergence. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 11619–11625. [Google Scholar]
  7. Xu, S.; Doğançay, K. Optimal sensor placement for 3-D angle-of-arrival target localization. IEEE Trans. Aerosp. Electron. Syst. 2017, 53, 1196–1211. [Google Scholar] [CrossRef]
  8. Yuan, C.; Xu, W.; Liu, X.; Hong, X.; Zhang, F. Efficient and probabilistic adaptive voxel mapping for accurate online lidar odometry. IEEE Robot. Autom. Lett. 2022, 7, 8518–8525. [Google Scholar] [CrossRef]
  9. Duberg, D.; Jensfelt, P. UFOMap: An efficient probabilistic 3D mapping framework that embraces the unknown. IEEE Robot. Autom. Lett. 2020, 5, 6411–6418. [Google Scholar] [CrossRef]
  10. Behley, J.; Stachniss, C. Efficient Surfel-Based SLAM using 3D Laser Range Data in Urban Environments. In Proceedings of the Robotics: Science and Systems, Pittsburgh, PA, USA, 26–30 June 2018; Volume 2018, p. 59. [Google Scholar]
  11. Shan, T.; Englot, B.; Meyers, D.; Wang, W.; Ratti, C.; Rus, D. Lio-sam: Tightly-coupled lidar inertial odometry via smoothing and mapping. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 5135–5142. [Google Scholar]
  12. Shan, T.; Englot, B. Lego-loam: Lightweight and ground-optimized lidar odometry and mapping on variable terrain. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 4758–4765. [Google Scholar]
  13. Deschaud, J.E. IMLS-SLAM: Scan-to-model matching based on 3D data. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 2480–2485. [Google Scholar]
  14. Xu, S.; Wu, L.; Doğançay, K.; Alaee-Kerahroodi, M. A hybrid approach to optimal TOA-sensor placement with fixed shared sensors for simultaneous multi-target localization. IEEE Trans. Signal Process. 2022, 70, 1197–1212. [Google Scholar] [CrossRef]
  15. Cai, Y.; Xu, W.; Zhang, F. ikd-tree: An incremental kd tree for robotic applications. arXiv 2021, arXiv:2102.10808. [Google Scholar]
  16. Xu, W.; Cai, Y.; He, D.; Lin, J.; Zhang, F. Fast-lio2: Fast direct lidar-inertial odometry. IEEE Trans. Robot. 2022, 38, 2053–2073. [Google Scholar] [CrossRef]
  17. Lin, J.; Zhang, F. R3LIVE: A Robust, Real-time, RGB-colored, LiDAR-Inertial-Visual tightly-coupled state Estimation and mapping package. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 10672–10678. [Google Scholar]
  18. Ye, H.; Chen, G.; Chen, W.; He, L.; Guan, Y.; Zhang, H. Mapping While Following: 2D LiDAR SLAM in Indoor Dynamic Environments with a Person Tracker. In Proceedings of the 2021 IEEE International Conference on Robotics and Biomimetics (ROBIO), Sanya, China, 27–31 December 2021; pp. 826–832. [Google Scholar]
  19. Biber, P.; Straßer, W. The normal distributions transform: A new approach to laser scan matching. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No. 03CH37453), Las Vegas, NV, USA, 27–31 October 2003; Volume 3, pp. 2743–2748. [Google Scholar]
  20. Nguyen, T.M.; Duberg, D.; Jensfelt, P.; Yuan, S.; Xie, L. SLICT: Multi-input multi-scale surfel-based LiDAR-inertial continuous-time odometry and mapping. IEEE Robot. Autom. Lett. 2023, 8, 2102–2109. [Google Scholar] [CrossRef]
  21. Segal, A.; Haehnel, D.; Thrun, S. Generalized-icp. In Proceedings of the Robotics: Science and Systems, Seattle, WA, USA, 28 June–1 July 2009; Volume 2, p. 435. [Google Scholar]
  22. Koide, K.; Yokozuka, M.; Oishi, S.; Banno, A. Voxelized gicp for fast and accurate 3D point cloud registration. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 11054–11059. [Google Scholar]
  23. Zhang, J.; Singh, S. LOAM: Lidar Odometry and Mapping in Real-time. In Proceedings of the Robotics: Science and Systems, Berkeley, CA, USA, 12–16 July 2014; Volume 2, pp. 1–9. [Google Scholar]
  24. Xu, W.; Zhang, F. Fast-lio: A fast, robust lidar-inertial odometry package by tightly-coupled iterated kalman filter. IEEE Robot. Autom. Lett. 2021, 6, 3317–3324. [Google Scholar] [CrossRef]
  25. Wang, H.; Wang, C.; Xie, L. Intensity-slam: Intensity assisted localization and mapping for large scale environment. IEEE Robot. Autom. Lett. 2021, 6, 1715–1721. [Google Scholar] [CrossRef]
  26. Liu, Z.; Zhang, F. Balm: Bundle adjustment for lidar mapping. IEEE Robot. Autom. Lett. 2021, 6, 3184–3191. [Google Scholar] [CrossRef]
  27. Dellenbach, P.; Deschaud, J.E.; Jacquet, B.; Goulette, F. Ct-icp: Real-time elastic lidar odometry with loop closure. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 5580–5586. [Google Scholar]
  28. Madsen, K.; Nielsen, H.B.; Tingleff, O. Methods for Non-Linear Least Squares Problems; Technical Report; Informatics and Mathematical Modeling, Technical Universityof Denmark: Lyngby, Denmark, 2004. [Google Scholar]
  29. Besl, P.J.; McKay, N.D. Method for registration of 3-D shapes. In Proceedings of the Sensor Fusion IV: Control Paradigms and Data Structures; International Society for Optics and Photonics: Bellingham, WA, USA, 1992; Volume 1611, pp. 586–606. [Google Scholar]
  30. Frese, U.; Larsson, P.; Duckett, T. A multilevel relaxation algorithm for simultaneous localization and mapping. IEEE Trans. Robot. 2005, 21, 196–207. [Google Scholar] [CrossRef]
  31. Qin, T.; Cao, S. A-LOAM: A Lidar Odometry and Mapping. 2019. Available online: https://github.com/HKUST-Aerial-Robotics/A-LOAM (accessed on 20 April 2024).
  32. Koide, K.; Miura, J.; Menegatti, E. A portable three-dimensional LIDAR-based system for long-term and wide-area people behavior measurement. Int. J. Adv. Robot. Syst. 2019, 16, 1729881419841532. [Google Scholar] [CrossRef]
  33. Chandra, R. Parallel Programming in OpenMP; Morgan kaufmann: San Francisco, CA, USA, 2001. [Google Scholar]
  34. Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 573–580. [Google Scholar]
  35. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? the kitti vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar]
  36. Helmberger, M.; Morin, K.; Berner, B.; Kumar, N.; Cioffi, G.; Scaramuzza, D. The hilti slam challenge dataset. IEEE Robot. Autom. Lett. 2022, 7, 7518–7525. [Google Scholar] [CrossRef]
  37. Pomerleau, F.; Liu, M.; Colas, F.; Siegwart, R. Challenging data sets for point cloud registration algorithms. Int. J. Robot. Res. 2012, 31, 1705–1711. [Google Scholar] [CrossRef]
  38. Liao, Y.; Xie, J.; Geiger, A. Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 3292–3310. [Google Scholar] [CrossRef] [PubMed]
  39. Agarwal, S.; Mierle, K. Ceres Solver: Tutorial & Reference; Google Inc.: Mountain View, CA, USA, 2023. [Google Scholar]
  40. Barfoot, T.D. State Estimation for Robotics; Cambridge University Press: Cambridge, UK, 2024. [Google Scholar]
Figure 1. An example of the H-MS map. Ø 0 is a map voxel at scale s 0 , Ø 1 [ 0 ] , Ø 1 [ 1 ] , and Ø 1 [ 2 ] are map voxels at scale s 1 . Map point p participates in the calculation of statistical features for both Ø 0 and Ø 1 [ 1 ] simultaneously.
Figure 1. An example of the H-MS map. Ø 0 is a map voxel at scale s 0 , Ø 1 [ 0 ] , Ø 1 [ 1 ] , and Ø 1 [ 2 ] are map voxels at scale s 1 . Map point p participates in the calculation of statistical features for both Ø 0 and Ø 1 [ 1 ] simultaneously.
Sensors 24 04011 g001
Figure 2. The HMS-SLAM system flowchart.
Figure 2. The HMS-SLAM system flowchart.
Sensors 24 04011 g002
Figure 3. Comparison of point insertion performance between H-MS map and the ikd tree map.
Figure 3. Comparison of point insertion performance between H-MS map and the ikd tree map.
Sensors 24 04011 g003
Figure 4. Time consumption for removing voxels from the H-MS map at different scales.
Figure 4. Time consumption for removing voxels from the H-MS map at different scales.
Sensors 24 04011 g004
Figure 5. The H-MS map created by the HMS-SLAM system in sequence 10 of the KITTI-360 dataset.
Figure 5. The H-MS map created by the HMS-SLAM system in sequence 10 of the KITTI-360 dataset.
Sensors 24 04011 g005
Table 1. Time consumption and removed voxel number for frame-based voxel deletion operation under the KITTI odometry and Hilti datasets.
Table 1. Time consumption and removed voxel number for frame-based voxel deletion operation under the KITTI odometry and Hilti datasets.
SequenceObservation Maintenance [ms]Voxel Removal [ms]Removed Number
KITTI-000.8740.602342
KITTI-011.2020.703385
KITTI-021.0360.399204
KITTI-031.0920.408203
KITTI-040.9460.445210
KITTI-050.6310.17591
KITTI-061.3580.339192
KITTI-070.8740.193121
KITTI-081.1870.479309
KITTI-090.7140.444254
KITTI-100.4150.15673
Hilti-lab0.1270.0000
Hilti-arena0.4280.0136
Hilti-upper10.4450.341240
Hilti-upper30.2050.07665
Hilti-base20.1270.0082
Table 2. Accuracy evaluation for different matching algorithms under the KITTI raw dataset.
Table 2. Accuracy evaluation for different matching algorithms under the KITTI raw dataset.
Absolute Motion Trajectory RMSE [m]
Sequence SSPM [20] G-ICP [21] VG-ICP [22] MSBM (Ours)
1003-002715.4416.98710.0895.582
1003-004217.868--19.816
1003-003443.102--7.681
0930-0016-19.97487.848-
0930-001811.0953.27711.5024.623
0930-002019.1573.6002.7920.846
0930-00278.2820.5572.0341.301
0930-0028-7.2709.29310.975
0930-00334.5167.85746.3332.748
0930-00343.4343.32023.3082.169
Mean15.3126.60524.1506.193
Note: ‘-’ means the sequence was not successfully run entirely. Bold values represent the best results.
Table 3. Average time consumption for different matching algorithms under the KITTI raw dataset.
Table 3. Average time consumption for different matching algorithms under the KITTI raw dataset.
Average Time Consumption [ms]
Sequence SSPM [20] G-ICP [21] VG-ICP [22] MSBM (Ours)
1003-002777.18215.92918.79649.739
1003-004254.055--46.884
1003-003493.571--49.147
0930-0016-24.76619.719-
0930-0018104.12017.50321.72348.977
0930-0020104.02026.29529.84445.751
0930-0027101.78115.69415.67053.205
0930-0028-19.89622.91448.129
0930-003399.03221.28722.80050.645
0930-003495.79316.21620.79349.119
Mean91.19419.69821.53249.066
Note: ‘-’ means the sequence was not successfully run entirely. Bold values represent the best results.
Table 4. Registration evaluation for different matching algorithms under the ETHZ Dataset.
Table 4. Registration evaluation for different matching algorithms under the ETHZ Dataset.
Absolute Motion Trajectory RMSE [cm]
Sequence SSPM [20] G-ICP [21] VG-ICP [22] 3D-NDT [19] MSBM (Ours)
Apart.2.148--14.7350.556
Haupt.1.9620.5331.2934.8920.621
Stairs1.5341.0821.5601.7090.464
Mount.3.07815.407-17.7270.901
Gazebo.S.1.4052.5525.52811.1930.741
Gazebo.W.1.1724.3263.3756.1250.346
Wood.S.1.75211.4354.96321.1790.500
Wood.A.2.3845.3437.69111.9510.744
Mean1.9295.8114.06811.1890.609
Note: ‘-’ means the sequence was not successfully run entirely. Bold values represent the best results.
Table 5. Accuracy evaluation of different SLAM systems under the KITTI-360 dataset.
Table 5. Accuracy evaluation of different SLAM systems under the KITTI-360 dataset.
Seq.A-LOAM [31]CT-ICP [27]VoxelMap [8]HMS-SLAM (Ours)
ATE [m] RPE [%] ATE [m] RPE [%] ATE [m] RPE [%] ATE [m] RPE [%]
00119.7530.63128.8470.810143.4712.84513.4930.498
02227.7850.83877.0140.851--25.5090.619
038.6800.2525.1080.650102.7540.2367.0060.556
04110.0630.633--85.8221.05133.2970.468
0522.3310.77514.0390.86420.1521.4676.8970.631
06135.0200.63719.7950.82537.8531.19312.6450.690
07400.6920.61695.6630.808247.9701.78517.7730.363
09236.3780.67031.1460.865144.8800.50612.1310.582
1034.2230.45034.9740.861101.0532.83710.8870.274
Mean114.3610.61138.3230.816110.4941.4915.5150.520
Note: ‘-’ means the sequence was not successfully run entirely. Bold values represent the best results.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ma, T.; Kong, L.; Ou, Y.; Xu, S. Accurate 3D LiDAR SLAM System Based on Hash Multi-Scale Map and Bidirectional Matching Algorithm. Sensors 2024, 24, 4011. https://doi.org/10.3390/s24124011

AMA Style

Ma T, Kong L, Ou Y, Xu S. Accurate 3D LiDAR SLAM System Based on Hash Multi-Scale Map and Bidirectional Matching Algorithm. Sensors. 2024; 24(12):4011. https://doi.org/10.3390/s24124011

Chicago/Turabian Style

Ma, Tingchen, Lingxin Kong, Yongsheng Ou, and Sheng Xu. 2024. "Accurate 3D LiDAR SLAM System Based on Hash Multi-Scale Map and Bidirectional Matching Algorithm" Sensors 24, no. 12: 4011. https://doi.org/10.3390/s24124011

APA Style

Ma, T., Kong, L., Ou, Y., & Xu, S. (2024). Accurate 3D LiDAR SLAM System Based on Hash Multi-Scale Map and Bidirectional Matching Algorithm. Sensors, 24(12), 4011. https://doi.org/10.3390/s24124011

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop