Article

Graph Matching for Underwater Simultaneous Localization and Mapping Using Multibeam Sonar Imaging

Guangdong Provincial Key Laboratory of Intelligent Morphing Mechanisms and Adaptive Robotics, the School of Mechanical Engineering and Automation, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
*
Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(10), 1859; https://doi.org/10.3390/jmse12101859
Submission received: 16 August 2024 / Revised: 15 September 2024 / Accepted: 15 October 2024 / Published: 17 October 2024
(This article belongs to the Special Issue Autonomous Marine Vehicle Operations—2nd Edition)

Abstract

This paper addresses the challenges of underwater Simultaneous Localization and Mapping (SLAM) using multibeam sonar imaging. The widely used Iterative Closest Point (ICP) algorithm often falls into local optima because of its non-convexity and the scarcity of features available for correct registration. To overcome this, we propose a novel registration algorithm based on Gaussian clustering and Graph Matching with maximal cliques. The proposed approach enhances feature-matching accuracy and robustness in complex underwater environments. Inertial measurements and velocity estimates are also fused for global state estimation. Comprehensive tests in simulated and real-world underwater environments demonstrate that the proposed registration method effectively addresses the tendency of the ICP algorithm to fall into local optima while exhibiting excellent inter-frame registration performance and robustness.

1. Introduction

In recent years, autonomous underwater robots have found extensive application in tasks such as pipeline inspection, geological structure mapping, rescue operations, and resource exploration [1]. Autonomous positioning and navigation control in complex environments require environmental modeling and robot state estimation as inputs. Particularly in ocean environments that have yet to be mapped, limited communication infrastructure and the impracticality of remote operation make autonomous exploration capabilities even more critical. Simultaneous Localization and Mapping (SLAM) technology is vital in this context. Signals from the Global Navigation Satellite System (GNSS) and other radio-based communication systems, such as Wi-Fi, are strongly attenuated underwater and are thus unusable. Autonomous Underwater Vehicles (AUVs) primarily rely on dead reckoning for navigation; however, this method drifts over time. While cameras can capture detailed underwater scenes, their effectiveness is constrained by water turbidity and by the limitations of manual control and data processing in such environments. Sonar is a commonly used alternative sensor, but challenges such as low signal-to-noise ratios, low resolution, uneven insonification, and multi-path reflections hinder the accurate extraction and registration of target features. Therefore, we address these challenges and focus on the positioning and mapping problems that underwater robots face when navigating complex ocean environments.
Sonar-based underwater SLAM has been a subject of research for many years. Earlier studies described sonar-inertial SLAM systems for underwater reconstruction and navigation [2,3,4]. Corke et al. [5] compared acoustic and visual underwater positioning methods, demonstrating their applicability in specific scenarios. Research on sonar-based underwater SLAM subsequently progressed, with Folkesson et al. [6] utilizing blazed array sonar sensors for real-time feature tracking. A system that autonomously relocates previously mapped features using sonar-based SLAM is detailed in [7]. The SUNFISH exploration AUV described in [8] employs multibeam sonar, a Fiber Optic Gyroscope (FOG) Inertial Measurement Unit (IMU), an Acoustic Doppler Velocimeter (ADV), and a pressure depth sensor. While it has been successfully applied in cave exploration, its inter-frame registration performance remains less than optimal, relying primarily on trajectory estimation for position determination.
In this paper, to enhance inter-frame registration performance and address the local optima caused by the non-convexity of the Iterative Closest Point (ICP) algorithm, we extend previously studied Gaussian clustering and maximal clique algorithms to the specific challenges faced by robots equipped with multibeam imaging sonar navigating cluttered underwater environments. Our contributions are as follows:
  • We introduce a novel registration algorithm that effectively addresses the local optima arising in inter-frame registration from the non-convex nature of the Iterative Closest Point (ICP) algorithm.
  • We employ an enhanced Accelerated-KAZE (AKAZE) algorithm to improve feature-matching accuracy in sparse environments and along extended straight lines, thereby enhancing the robustness of SLAM.
  • We integrate Inertial Measurement Unit and Doppler velocimeter data for global state estimation of the robot, incorporating the various pose constraints into a factor graph for back-end optimization.

2. Related Work

2.1. Point Cloud Registration

Point cloud registration algorithms can be categorized into two main approaches: those based on probabilistic fields and those based on point-to-point matching. The probabilistic field approach often employs Gaussian Mixture Models (GMMs) to represent point clouds. In contrast, the point-to-point matching approach encompasses the well-established Iterative Closest Point (ICP) algorithm and its various extensions, as detailed by Segal et al. [9]. Besl and McKay conducted pioneering work on the registration of 3D shapes [10], while Chen and Medioni [11] concentrated on aligning range data for object modeling. Since then, numerous ICP variants have emerged; given the extensive body of literature on ICP algorithms, an exhaustive review is beyond the scope of this paper. Registration methods that guarantee optimality have been published, but they are relatively few and rely primarily on Branch and Bound (BnB) algorithms. For instance, Li and Hartley [12,13] proposed a 3D registration method that utilizes a Lipschitz-regularized L2 error function minimized through BnB.
The widely recognized technique for field-based scan matching is the Normal Distributions Transform (NDT), first introduced by Biber and Strasser for aligning two-dimensional point clouds [14]. This method has since been extended to three dimensions by Magnusson [15]. Saarinen and colleagues developed the Normal Distribution Transform Occupancy Maps (NDT-OM), integrating NDT with the OctoMap framework of Hornung and his team [16]. In response to the challenge of accurately aligning misaligned scans, Bouraine et al. [17] introduced a Particle Swarm Optimization approach for the Point-to-Distribution (P2D) method, known as NDT-P2D-PSO, which ensures a globally optimal solution and is capable of operating in real time for SLAM applications. However, in underwater sonar SLAM, NDT has some limitations. It samples point clouds over a Cartesian grid, which may inadequately capture the underlying structure of the point cloud, particularly when the scanned surface has an irregular shape. Therefore, we propose a new registration algorithm based on ICP, which provides a robust predicted transformation matrix and effectively addresses the local optimum problem associated with ICP.

2.2. Underwater Sonar SLAM

In underwater settings, sonar images typically exhibit low resolution, low signal-to-noise ratios, and high ambiguity, characteristics that pose significant challenges to the crucial processes of automatic feature extraction and data association for underwater SLAM. To address these challenges, various methods have been introduced. Santos et al. [18] introduced a method based on topological Graph Matching, which utilizes Gaussian functions and the maximal clique algorithm to describe and match image features, demonstrating good robustness and rotational invariance. Eustice and colleagues proposed the first SLAM implementation using the Exact Sparse Extended Information Filter (ESEIF) [19], with features manually selected from sonar images, focusing on constructing a feature map to ensure real-time performance. Johannsson, Kaess, and Englot introduced the Normal Distributions Transform (NDT) to detect and match forward-looking sonar (FLS) image features [20], assuming a planar surface to address the ambiguity in elevation measurements. Similarly, Shin, Lee, and Choi engaged in seabed mapping, utilizing Accelerated-KAZE (AKAZE) features to process sonar images, and demonstrated effective data association [21].
Huang and Kaess introduced Acoustic Structure From Motion (ASFM), which reconstructs 3D structures and sensor motion from multiple sonar viewpoints without making specific assumptions about the scene [22]. In their work, statistical analysis was conducted to identify and discuss degenerate cases that lead to ASFM failures, particularly analyzing which motion patterns result in poorly constrained optimization problems. In 2018, Li et al. [23] proposed a machine-learning-based loop closure method that accounts for the sparsity of sonar image features and false data associations, developing a reliable data association scheme and demonstrating the robustness of the system. In 2022, the Stevens Institute of Technology proposed an exploration framework based on virtual maps, which demonstrated favorable performance in simulated experiments [24]. However, practical applications necessitate consideration of the costs associated with constructing and updating virtual maps, as well as the resolution and noise levels of the sonar equipment.
Our research enhances the navigation capabilities of multibeam imaging sonar-equipped robots in complex underwater environments, the robustness of the SLAM system, and the accuracy of global state estimation in three ways: by introducing a novel registration algorithm that addresses the local optima issue of the Iterative Closest Point (ICP) algorithm, by improving feature-matching accuracy with an enhanced Accelerated-KAZE (AKAZE) algorithm, and by integrating Inertial Measurement Unit and Doppler velocimeter data into a factor graph optimization.

3. Inter-Frame Registration

To address the issue of ICP falling into local optima, we employ an inter-frame registration method based on the parallel Accelerated-KAZE (AKAZE) algorithm [25] and a newly proposed Graph Matching (GM) registration algorithm. This approach effectively performs sonar feature matching, adapts to rotational and long straight-line scenes, and accurately estimates the robot’s motion state. The two algorithms complement each other’s weaknesses.

3.1. Graph Matching ICP

The Iterative Closest Point (ICP) is the most widely used inter-frame registration algorithm, which is implemented by minimizing the following value function:
J(x_{ts}) = \sum_{p_t \in P_t,\, p_s \in P_s} \left\| T_{ts}\, p_s - p_t \right\|^2 = \sum_{p_t \in P_t,\, p_s \in P_s} \left\| R_{ts}\, p_s + t_{ts} - p_t \right\|^2 ,
where p_t represents a point in the target point cloud; p_s represents a point in the source point cloud; P_s and P_t represent the point sets of the source and target point clouds, respectively; T_ts denotes the transformation matrix from the source point cloud to the target point cloud; R_ts indicates the rotation matrix that aligns the source point cloud with the target point cloud; and t_ts signifies the translation vector that positions the source point cloud relative to the target point cloud.
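For reference, the cost above can be minimized with a standard point-to-point ICP loop. The following is a minimal NumPy/SciPy sketch under our 2D setting (the function name, parameters, and convergence criterion are illustrative, not our exact implementation); it assumes the point clouds are stored as N×2 arrays and uses an SVD-based closed-form rigid update per iteration.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_point_to_point(P_s, P_t, T0=np.eye(3), max_iter=50, tol=1e-6):
    """Minimal 2D point-to-point ICP; P_s, P_t are (N,2) and (M,2) arrays."""
    T = T0.copy()
    tree = cKDTree(P_t)
    prev_err, err = np.inf, np.inf
    for _ in range(max_iter):
        # Apply the current transform to the source cloud.
        src = (T[:2, :2] @ P_s.T).T + T[:2, 2]
        # Nearest-neighbor correspondences in the target cloud.
        dist, idx = tree.query(src)
        tgt = P_t[idx]
        # Closed-form rigid update (SVD) for the matched pairs.
        mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
        H = (src - mu_s).T @ (tgt - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        dT = np.eye(3)
        dT[:2, :2] = R
        dT[:2, 2] = t
        T = dT @ T
        err = dist.mean()
        if abs(prev_err - err) < tol:     # stop when the mean residual stagnates
            break
        prev_err = err
    return T, err
```

Because each iteration only refines the nearest-neighbor assignment locally, the result depends strongly on the initial transform T0, which is exactly the local-optimum issue the GM algorithm below is designed to mitigate.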
However, ICP can fall into local optimum issues due to its non-convexity [26]. The application of ICP matching for sonar images is often hindered by the low angular resolution of the sonar, the high noise levels of the imaging sonar, and the low accuracy of trajectory calculations. To solve the problem of ICP falling into a local optimal solution, we propose a new Graph Matching (GM) registration algorithm. This algorithm aims to solve the following equation:
\tilde{x}_s, T_\theta = \arg\max_{x_s,\, T_\theta} \sum_{p_s \in P_s} \coprod\!\left( \frac{d(p_s)}{n} \le \varepsilon \right) ,
where x̃_s represents the globally initialized source point cloud pose and x_s denotes the source point cloud pose being optimized. T_θ denotes the predicted transformation matrix of ICP. ε is the reprojection error convergence threshold, and n represents the total number of points in the source point cloud. d is the sum of distances and represents the cumulative measure used to search for the nearest point through the course of the last iterative transformation. ⨿ is an indicator function that equals one if the solution is successful and zero if the solution fails. The definition of d(p_s) is as follows:
d(p_s) = \min_{p_t \in P_t} \left\| R_{ts}\, p_s + t_{ts} - p_t \right\| ,
where p_s refers to an individual point within the source point cloud, and p_t refers to a point within the target point cloud. The rotation matrix R_ts and the translation vector t_ts together form the transformation that maps points from the source coordinate system to the target coordinate system, facilitating the alignment and registration between the two point clouds.

3.1.1. Sonar Image Processing and Feature Extraction

Initially, we employ a Constant False Alarm Rate (CFAR) algorithm [27] for initial filtering of the sonar imagery, followed by the removal of discrete points, and finally apply voxel filtering to downsample the point cloud for feature extraction, as shown in Figure 1. Subsequently, we apply the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm [28] for clustering. DBSCAN is distinguished by its ability to identify dense regions within a dataset and effectively partition them into distinct clusters, even in the presence of noise.
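As a concrete illustration of the clustering and cluster-filtering step described next, a minimal scikit-learn sketch is given below. The parameter values mirror those reported in the following paragraph, while the function name and the point-count threshold value are illustrative assumptions rather than our exact settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_sonar_points(points, eps=0.06, min_samples=4, pts_threshold=10):
    """Cluster a downsampled (N,2) sonar point cloud and drop sparse clusters."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    clusters = []
    for label in set(labels):
        if label == -1:                      # -1 marks DBSCAN noise points
            continue
        cluster = points[labels == label]
        if len(cluster) > pts_threshold:     # keep only sufficiently dense clusters
            clusters.append(cluster)
    return clusters
```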
After downsampling the feature point cloud, clustering is performed using DBSCAN. With a search radius ε = 0.06 and a minimum cluster size of points_min = 4, the results are depicted in Figure 2. After clustering, we filter the clusters based on the number of points they contain to eliminate discrete noise points: clusters containing more than Pts_threshold points are retained for subsequent Gaussian generation, where Pts_threshold is the point-count threshold. Next, a Gaussian block is generated for each cluster by traversing the cluster labels. Gaussian blocks characterize the shape of each cluster, and the parameters of a Gaussian block are defined as follows:
G_A = \left\{ \mu_X, \mu_Y, \mu_I, \sigma_B, \sigma_S, \sigma_I, \theta, N \right\} ,
where (μ_X, μ_Y) denotes the positional center of the Gaussian block in the image, μ_I is the average sonar image intensity within the Gaussian block, σ_B is the length of the major axis of the Gaussian block, σ_S is the length of the minor axis, σ_I is the standard deviation of sonar image intensity within the Gaussian block, N is the number of points in the cluster, and θ is the angle measured clockwise from the positive x-axis of the underwater image to the major axis of the Gaussian block. We restrict θ to the range (−π/2, π/2). Consistent with the standard Gaussian block approach, we employ the eigenvalues and eigenvectors of the covariance matrix Σ_xy to calculate σ_B and σ_S.
The center (μ_X, μ_Y) of the cluster's point cloud and the covariance matrix are computed as follows:
(\mu_X, \mu_Y) = \left( \frac{1}{N} \sum_{i=1}^{N} x_i ,\ \frac{1}{N} \sum_{i=1}^{N} y_i \right) ,
\Sigma_{xy} = \begin{bmatrix} \mathrm{cov}(X, X) & \mathrm{cov}(X, Y) \\ \mathrm{cov}(Y, X) & \mathrm{cov}(Y, Y) \end{bmatrix} ,
where X represents the set of X coordinates of the points in the cluster minus μ_X, and Y represents the set of Y coordinates of the points in the cluster minus μ_Y.
The eigenvalues and eigenvectors of the covariance matrix Σ_xy can be determined using QR decomposition, the Jacobi method, etc. Once solved, the eigenvector ν_max corresponding to the largest eigenvalue λ_max represents the direction of the major axis of the Gaussian block, while the eigenvector ν_min associated with the smallest eigenvalue λ_min indicates the direction of the minor axis, as shown in Figure 3.
Solving for the angle θ and intercept b gives
k = \frac{\nu_{max,y}}{\nu_{max,x}} , \quad b = \mu_Y - k\, \mu_X , \quad \theta = \arctan(k) .
Eventually, the clusters derived from the clustering process are further refined into Gaussian blocks, each endowed with relevant parameter attributes. This facilitates the establishment of relationships between neighbors. Figure 2 illustrates the shapes of Gaussian blocks generated for each cluster. It demonstrates that our algorithm is adaptable to various feature shapes for generating Gaussian blocks.
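The Gaussian-block generation step can be sketched as follows. This is a minimal NumPy illustration assuming each cluster is given as an (N,2) array of (x, y) points plus the corresponding image intensities; taking σ_B and σ_S as the square roots of the covariance eigenvalues is an assumption of this sketch, and the dictionary layout is illustrative.

```python
import numpy as np

def gaussian_block(cluster_xy, intensities):
    """Fit Gaussian-block parameters (mu_X, mu_Y, mu_I, sigma_B, sigma_S, sigma_I, theta, N)."""
    mu_xy = cluster_xy.mean(axis=0)
    cov = np.cov(cluster_xy, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    v_max = eigvecs[:, -1]                     # major-axis direction
    sigma_B, sigma_S = np.sqrt(eigvals[-1]), np.sqrt(eigvals[0])
    theta = np.arctan2(v_max[1], v_max[0])     # orientation of the major axis
    if theta > np.pi / 2:                      # wrap into the (-pi/2, pi/2) restriction
        theta -= np.pi
    elif theta < -np.pi / 2:
        theta += np.pi
    return {
        "mu_X": mu_xy[0], "mu_Y": mu_xy[1],
        "mu_I": intensities.mean(), "sigma_I": intensities.std(),
        "sigma_B": sigma_B, "sigma_S": sigma_S,
        "theta": theta, "N": len(cluster_xy),
    }
```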

3.1.2. Gaussian Descriptors and Matching

In the pursuit of robust feature representation in image motion, especially with the challenge of rotation invariance, we introduce a novel approach leveraging Gaussian descriptors and graph theory. Our method aims to encapsulate the spatial relationships between feature blocks through the construction of an undirected graph G. Maximal cliques within this graph serve as a stable structural framework to describe the complex relationships among adjacent feature blocks. These cliques, and their associated descriptors, are crucial for our method as they allow for a detailed characterization of the spatial configuration of Gaussian blocks.
After generating the Gaussian blocks, we construct an undirected graph G = (V, E), where V = {G_A1, G_A2, G_A3, …, G_An} is the set of vertices representing the centers of the Gaussian blocks, and E = {E_1, E_2, E_3, …, E_m} is the set of edges, with Gaussian vertices connected by edges that describe the direct relationships between Gaussian blocks. The attributes associated with an edge are defined as E_i = {θ_e1, θ_e2, ρ_e, G_A,source, G_A,target}, where θ_e1 is the angle from the positive direction of the major axis of the source Gaussian block to the line joining its center with the center of the target Gaussian block, and θ_e2 is the corresponding angle measured at the target Gaussian block toward the source. These angles are obtained by substituting the characteristic information of the two Gaussian blocks into the following equations, with θ_e limited to the range (0, 2π]:
\theta_{e1} = \arctan\!\left( \mu_{Y,\mathrm{target}} - \mu_{Y,\mathrm{source}},\ \mu_{X,\mathrm{target}} - \mu_{X,\mathrm{source}} \right) - \theta_{source} ,
\theta_{e2} = \arctan\!\left( \mu_{Y,\mathrm{source}} - \mu_{Y,\mathrm{target}},\ \mu_{X,\mathrm{source}} - \mu_{X,\mathrm{target}} \right) - \theta_{target} .
To address the issue of quadrant unification during the actual solution process, we use the coordinate transformation method, as illustrated in the following formulas:
X_{q1} = \mu_{X,\mathrm{target}} - \mu_{X,\mathrm{source}} , \quad Y_{q1} = \mu_{Y,\mathrm{target}} - \mu_{Y,\mathrm{source}} ,
\begin{bmatrix} X_{p1} \\ Y_{p1} \end{bmatrix} = \begin{bmatrix} \cos\theta_{source} & -\sin\theta_{source} \\ \sin\theta_{source} & \cos\theta_{source} \end{bmatrix}^{T} \begin{bmatrix} X_{q1} \\ Y_{q1} \end{bmatrix} .
Furthermore, we obtain θ_e1 from Equation (12). If θ_e1 < 0, we adjust θ_e1 = θ_e1 + 2π, ensuring that the angles are unified into the interval (0, 2π]. The method for obtaining θ_e2 is analogous to that for θ_e1, with details omitted for brevity.
\theta_{e1} = \arctan\!\left( \frac{Y_{p1}}{X_{p1}} \right) .
In addition, in the edge constraints, ρ e represents the Euclidean distance between two Gaussian center points, calculated using the following equation:
\rho_e = \sqrt{ \left( \mu_{X,\mathrm{source}} - \mu_{X,\mathrm{target}} \right)^2 + \left( \mu_{Y,\mathrm{source}} - \mu_{Y,\mathrm{target}} \right)^2 } .
To construct an effective graph, we implement a criterion for establishing constraint edges: an edge is created between two Gaussian blocks if the Euclidean distance ρ_e between their centers is below a predefined threshold R_max. A larger R_max results in a more densely connected graph. This setting is conducive to establishing a Gaussian descriptor that quantifies the adjacency relationships between Gaussian blocks, as illustrated in Figure 4.
Our method is designed to describe the relationship between adjacent feature blocks, addressing the challenge of rotation invariance during image motion. This is achieved by leveraging the angle θ_e stored on the edge between two vertices, which depends on the relationship between the Gaussian block and the edge: when the vertex corresponds to the source Gaussian block (G_A,source), the angle is denoted θ_e1, and when it corresponds to the target Gaussian block (G_A,target), the angle is denoted θ_e2. Stated differently, provided the surrounding environment remains static and maintains its shape, the values of θ_e and ρ_e for adjacent Gaussian blocks remain constant, irrespective of changes in time and the robot's observational perspective. Figure 4 illustrates the effect of different R_max values on graph density, demonstrating that a larger R_max results in denser Gaussian descriptor generation; however, this increased density comes at the cost of longer computation time. Empirical tests indicate that R_max = 1.0 yields optimal results.
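A minimal sketch of this graph-construction step, using NetworkX, is shown below. The helper names and the angle bookkeeping (θ_e1 stored relative to the lower-indexed vertex) are illustrative assumptions of the sketch, not a specification of our implementation; the edge attributes follow the definitions above.

```python
import numpy as np
import networkx as nx

def wrap_0_2pi(angle):
    """Map an angle into the interval (0, 2*pi]."""
    a = angle % (2 * np.pi)
    return a if a > 0 else a + 2 * np.pi

def relative_angle(src, tgt):
    """Angle from src's major axis to the direction of tgt's center (quadrant-unified)."""
    dx, dy = tgt["mu_X"] - src["mu_X"], tgt["mu_Y"] - src["mu_Y"]
    # Rotate the offset into the source block's major-axis frame, then take atan2.
    c, s = np.cos(src["theta"]), np.sin(src["theta"])
    xp, yp = c * dx + s * dy, -s * dx + c * dy
    return wrap_0_2pi(np.arctan2(yp, xp))

def build_gaussian_graph(blocks, R_max=1.0):
    """Connect Gaussian blocks whose centers are closer than R_max."""
    G = nx.Graph()
    G.add_nodes_from(range(len(blocks)))
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            rho = np.hypot(blocks[i]["mu_X"] - blocks[j]["mu_X"],
                           blocks[i]["mu_Y"] - blocks[j]["mu_Y"])
            if rho < R_max:
                G.add_edge(i, j,
                           rho_e=rho,
                           theta_e1=relative_angle(blocks[i], blocks[j]),
                           theta_e2=relative_angle(blocks[j], blocks[i]))
    return G
```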
We construct a graph containing an image's Gaussian information and adjacency relationships through the above process. We then further refine the descriptors by searching for maximal cliques [29], which aids in establishing subsequent matching constraints. Given an undirected graph G = (V, E), a clique is a subset C of vertices in which every two distinct vertices are connected by an edge. A maximal clique is a clique that is not a proper subset of any larger clique within the graph; that is, no additional vertex from G can be added to it to form a larger clique. When a maximal clique contains the largest number of vertices among all maximal cliques in the graph, it is referred to as a maximum clique.
We employ the recursive Bron–Kerbosch algorithm [30] to identify maximal cliques. Our experiments reveal that a clique with three nodes yields the most stable cluster structure. When the number of nodes in a maximal clique is too small, such as two, the clique is insufficient to adequately describe the adjacency relationships among Gaussian blocks, leading to potential similarities that cannot be effectively distinguished. Conversely, identifying maximal cliques may become difficult when the number of nodes reaches four. This is attributed to the disappearance of some obstacles from the robot’s field of view during its movement, making it unsuitable for environments with sparse features. We seek a stable structure to address this issue by targeting a maximal clique with exactly three nodes. Figure 4 illustrates a schematic of the identified maximal clique.
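NetworkX's clique enumeration, which is based on a Bron-Kerbosch variant, can be used to prototype this step. The sketch below simply keeps the three-node maximal cliques discussed above; the function name is illustrative.

```python
import networkx as nx

def three_node_maximal_cliques(G):
    """Return the maximal cliques of G that contain exactly three vertices."""
    # nx.find_cliques enumerates maximal cliques via a Bron-Kerbosch variant.
    return [clique for clique in nx.find_cliques(G) if len(clique) == 3]
```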
Following the search for maximal cliques, we use each clique as a reference to extract the vertices it contains and retrieve the associated edge information. Both the edge length ρ_e and the angle θ_e are encapsulated within a descriptor that characterizes the maximal clique. The descriptor for the i-th vertex within the maximal clique is formalized in the following equation, where the entries of descriptor_i are sorted in ascending order of angle.
\mathrm{descriptor}_i = \left\{ [\rho_1, \theta_1], [\rho_2, \theta_2], \ldots, [\rho_m, \theta_m] \right\} .
The obtained descriptor_1, descriptor_2, …, descriptor_m are then assigned to the corresponding attributes of Vertex_i to facilitate indexing of the Gaussian center points, and are stored in descriptors. We propose an algorithm to align two sets of descriptors, which we call the GM (Graph Matching) algorithm.
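Building on the graph and clique sketches above, descriptor assembly could look as follows. The per-vertex accumulation over all cliques a vertex belongs to, and the rule for picking the source- versus target-side angle, are simplifying assumptions of this sketch.

```python
def build_descriptors(G, cliques):
    """Build per-vertex descriptors [(rho, theta), ...] sorted by ascending angle."""
    descriptors = {}
    for clique in cliques:
        for v in clique:
            entries = descriptors.setdefault(v, [])
            for u in clique:
                if u == v:
                    continue
                e = G.edges[v, u]
                # Pick the angle stored relative to vertex v (source vs. target side),
                # consistent with how build_gaussian_graph stored theta_e1/theta_e2.
                theta = e["theta_e1"] if v < u else e["theta_e2"]
                entries.append((e["rho_e"], theta))
    for v in descriptors:
        descriptors[v] = sorted(descriptors[v], key=lambda rt: rt[1])
    return descriptors
```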
Through the process above, we obtain the descriptors of the corresponding Gaussian blocks for the two frames. Next, we apply matching constraints to these descriptors: during the traversal of the i-th descriptor descriptor_{1,i} of descriptors_1 in frame_1 and the j-th descriptor descriptor_{2,j} of descriptors_2 in frame_2 [31], we maintain the similarity score, the index of the best matching vertex best_match, and the optimal distance d_best. Using a brute-force descriptor matching approach [32], the vertex Vertex_j corresponding to each Vertex_i is found in turn. The specific procedure is given in Algorithm 1.
Initially, we employ Algorithm 2, a similarity scoring algorithm, to calculate the similarity score between the descriptors descriptor_i and descriptor_j of two vertices using Formula (15), where m and n index the m-th and n-th entries of descriptor_i and descriptor_j, respectively. By adjusting the angular threshold θ_diff and distance threshold ρ_diff, we effectively control the adjacency relationship between Gaussian blocks and mitigate the associated uncertainties. After calculating the similarity score, the optimal pairing vertex is identified using Algorithm 1. d_pos is defined as the Euclidean distance threshold for matched vertices. To enhance robustness, we adjust d_pos according to the robot's movement velocity, which influences the keyframe distance generation conditions on the dead reckoning side, as described by Formula (16). This adjustment ensures consistency, preventing matched points from diverging beyond the distance of the dead reckoning keyframe motion. Through these steps, a set of preliminary matching points is obtained. Subsequently, the ICP algorithm [33] is employed to refine this set, yielding the predictive transformation matrix and addressing the issue of the ICP algorithm converging to a local optimum.
\left| \theta_{i,m} - \theta_{j,n} \right| \le \theta_{diff} , \quad \left| \rho_{i,m} - \rho_{j,n} \right| < \rho_{diff} ,
\sqrt{ \left( x_i - x_j \right)^2 + \left( y_i - y_j \right)^2 } \le \xi_{pos} .
Algorithm 1 Matching constraint algorithm
Require: Descriptor sets for two frames, descriptors_1 and descriptors_2, and the abnormal distance threshold d_pos
  1: descriptors_1 = {descriptor_{1,1}, descriptor_{1,2}, …, descriptor_{1,m}}, descriptors_2 = {descriptor_{2,1}, descriptor_{2,2}, …, descriptor_{2,n}}
  2: connections ← ∅
  3: for every descriptor_{1,i} in descriptors_1 do   /* i ∈ [1, m] */
  4:       max_score ← −1
  5:       best_match ← −1
  6:       d_best ← −1
  7:       for every descriptor_{2,j} in descriptors_2 do   /* j ∈ [1, n] */
  8:             /* Calculate the similarity between the two descriptors */
  9:             score ← calculate_score(descriptor_{1,i}, descriptor_{2,j})
10:             /* Calculate the Euclidean distance between the matched vertices */
11:             d* ← distance(descriptor_{1,i}, descriptor_{2,j})
12:             if score > max_score then
13:                   if d* ≤ d_pos then
14:                         max_score ← score
15:                         best_match ← j
16:                         d_best ← d*
17:                   end if
18:             else if score == max_score then
19:                   if d* < d_best and d* < d_pos then
20:                         best_match ← j
21:                         d_best ← d*
22:                   end if
23:             end if
24:       end for
25:       if best_match ≠ −1 then
26:             connections ← connections ∪ {best_match}
27:       end if
28: end for
29: connections ← remove_duplicate_point_pairs(connections)
30: return connections
Algorithm 2 Similarity scoring algorithm
Require: Two descriptors descriptor_{1,i} = {[ρ_{i,1}, θ_{i,1}], [ρ_{i,2}, θ_{i,2}], …, [ρ_{i,p}, θ_{i,p}]} and descriptor_{2,j} = {[ρ_{j,1}, θ_{j,1}], [ρ_{j,2}, θ_{j,2}], …, [ρ_{j,q}, θ_{j,q}]}, angle threshold θ_diff, score distance threshold ρ_diff
  1: score ← 0
  2: m, n ← 0
  3: /* m ∈ [1, p], n ∈ [1, q] */
  4: while m < len(descriptor_{1,i}) and n < len(descriptor_{2,j}) do
  5:       /* calculate score */
  6:       θ* ← |θ_{i,m} − θ_{j,n}|
  7:       ρ* ← |ρ_{i,m} − ρ_{j,n}|
  8:       if θ* ≤ θ_diff then
  9:             if ρ* < ρ_diff then
10:                   score ← score + 1
11:             end if
12:             m ← m + 1, n ← n + 1
13:       else if θ_{i,m} < θ_{j,n} then
14:             m ← m + 1
15:       else
16:             n ← n + 1
17:       end if
18: end while
19: return score
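For reference, the two listings above can be transcribed almost directly into Python. The sketch below assumes the descriptors are dictionaries keyed by vertex index with entries sorted by angle (as produced by the earlier descriptor sketch) and that Gaussian center coordinates are available for the distance gate; the duplicate-pair removal of Algorithm 1 is omitted, and the function and argument names are illustrative.

```python
import numpy as np

def similarity_score(desc_i, desc_j, theta_diff, rho_diff):
    """Algorithm 2: count compatible (rho, theta) entries of two angle-sorted descriptors."""
    score, m, n = 0, 0, 0
    while m < len(desc_i) and n < len(desc_j):
        rho_m, theta_m = desc_i[m]
        rho_n, theta_n = desc_j[n]
        if abs(theta_m - theta_n) <= theta_diff:
            if abs(rho_m - rho_n) < rho_diff:
                score += 1
            m += 1
            n += 1
        elif theta_m < theta_n:
            m += 1
        else:
            n += 1
    return score

def match_descriptors(descs1, descs2, centers1, centers2,
                      theta_diff, rho_diff, d_pos):
    """Algorithm 1: greedy best-score matching gated by the Euclidean distance d_pos."""
    connections = []
    for i, desc_i in descs1.items():
        best_match, max_score, d_best = None, -1, np.inf
        for j, desc_j in descs2.items():
            score = similarity_score(desc_i, desc_j, theta_diff, rho_diff)
            d = np.linalg.norm(np.asarray(centers1[i]) - np.asarray(centers2[j]))
            if (score > max_score and d <= d_pos) or \
               (score == max_score and d < d_best and d < d_pos):
                max_score, best_match, d_best = score, j, d
        if best_match is not None:
            connections.append((i, best_match))
    return connections  # duplicate-pair removal omitted for brevity
```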

3.2. Parallel Inter-Frame Registration Algorithm Using AKAZE and GM Methods

When the environmental feature comprises a long straight line and the robot’s trajectory is parallel to this line, both the GM and ICP algorithms may degenerate and fail. In such scenarios, salient echo points typically emerge at the endpoints of the linear feature, which can be effectively tracked by the AKAZE (Accelerated-KAZE) algorithm [25]. The AKAZE algorithm is proficient in feature detection and description, leveraging the FED (Fast Explicit Diffusion) method to accelerate scale space construction and employing the M-LDB (Modified Local Difference Binary) paradigm to generate descriptors. When the robot’s trajectory is perpendicular to the environmental feature line, the AKAZE algorithm exhibits robust performance and high precision in feature tracking and matching. However, it encounters difficulties matching when the motion involves rotation [34].
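AKAZE detection, description, and brute-force Hamming matching are available in OpenCV and can be prototyped as below. This is an illustrative sketch only; it assumes the sonar frames have already been converted to 8-bit grayscale images, and the cross-check setting stands in for whatever matching configuration is actually used.

```python
import cv2

def akaze_match(img_prev, img_curr):
    """Detect AKAZE features in two 8-bit sonar images and match them by Hamming distance."""
    akaze = cv2.AKAZE_create()
    kp1, des1 = akaze.detectAndCompute(img_prev, None)
    kp2, des2 = akaze.detectAndCompute(img_curr, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    pts_prev = [kp1[m.queryIdx].pt for m in matches]
    pts_curr = [kp2[m.trainIdx].pt for m in matches]
    return pts_prev, pts_curr, matches
```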
The final inter-frame registration process is as follows: after the front end obtains the keyframe information, denoted as frame, we initialize the candidate point clouds, setting the current frame's point cloud as P_s and the previous frame's point cloud as P_t. Following initialization, we employ the GM and AKAZE algorithms to extract effective sparse points. We then apply the ICP algorithm for registration estimation on these sparse points and calculate the reprojection errors ε_GM and ε_AKAZE corresponding to the transformation matrices T_GM and T_AKAZE, respectively. We select the transformation matrix associated with the smaller error and use it to initialize ICP on the dense point clouds P_s and P_t for further refinement, obtaining an accurate solution x̂_ts.
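In terms of the ICP sketch shown in Section 3.1, this selection step could be expressed as follows. The function is a sketch under the assumption that each sparse input is a (source points, target points) pair; the names are illustrative.

```python
def register_frames(P_s, P_t, sparse_gm, sparse_akaze):
    """Seed dense ICP with whichever sparse registration (GM or AKAZE) has the smaller error."""
    T_gm, err_gm = icp_point_to_point(*sparse_gm)          # (sparse source, sparse target)
    T_akaze, err_akaze = icp_point_to_point(*sparse_akaze)
    T0 = T_gm if err_gm <= err_akaze else T_akaze
    # Refine on the dense clouds, starting from the better sparse estimate.
    return icp_point_to_point(P_s, P_t, T0=T0)
```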
Descriptor matching in the traditional AKAZE algorithm is performed using the brute-force method, with the Hamming distance as a similarity metric and a global search to identify the closest descriptor. However, the effectiveness of this approach is diminished when dealing with underwater sonar imagery, which is prone to multipath echoes and higher noise levels. We therefore propose a method to eliminate false matching points and further improve matching accuracy. We sort the distances d_i between matched points in ascending order, recorded as d = {d_1, d_2, …, d_n}, and compute the trimmed mean using the following formula, thereby eliminating mismatches caused by similar features arising from sonar reverberation. In the formula, k is the proportion coefficient used to eliminate data with low confidence, and the scaling factor ξ controls the extent of the culling space.
\bar{d} = \frac{1}{(1 - 2k)\, n} \sum_{i = kn + 1}^{(1 - k)\, n} d_i ,
d_i < \xi\, \bar{d} .
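A minimal NumPy sketch of this culling rule is given below. The values of k and ξ shown here are placeholders, not our tuned parameters, and the integer index handling of the trimmed sum is an approximation of Equation (17).

```python
import numpy as np

def cull_mismatches(distances, k=0.1, xi=1.5):
    """Keep matches whose distance is below xi times the trimmed mean of Eq. (17)."""
    d_sorted = np.sort(np.asarray(distances))
    n = len(d_sorted)
    lo, hi = int(np.floor(k * n)), int(np.ceil((1 - k) * n))
    # Trimmed mean over the central (1 - 2k) fraction of the sorted distances.
    d_bar = d_sorted[lo:hi].mean() if hi > lo else d_sorted.mean()
    keep = np.asarray(distances) < xi * d_bar
    return keep
```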

4. SLAM System

Let x_i represent the state of the robot at the i-th timestamp, where i ∈ {0, …, N}. The robot state is x_i ≜ (R_i, t_i) ∈ SE(2), where R_i ∈ SO(2) represents the orientation of the robot at the i-th timestamp and t_i ∈ R² represents its position [35]. To embed the environmental map into the state space, we represent an entity, such as a point cloud or salient image feature, as a two-dimensional point feature, designated l_j ∈ R², where j ∈ {1, …, M}. A measurement is then represented by z_k, where k ∈ {1, …, K}, and the landmark l_{j_k} observed from the state x_{i_k} during the measurement process corresponds to a pair of associations (i_k, j_k).
The objective of SLAM is to estimate the robot's trajectory X = {x_i} and the landmark map L = {l_j} from the set of measurements Z = {z_k}.

4.1. Sonar’s Field of View

The multibeam sonar M1200d used in this research has a three-dimensional spatial field of view: the horizontal beam angle is θ, the elevation angle is φ, and the effective detection radius is r, as shown in Figure 5a. Imaging sonar projects the observed three-dimensional space onto a two-dimensional plane for visualization, as shown in Figure 5b.

4.2. Underwater Robot Motion Estimation

We harness an Inertial Measurement Unit (IMU), a Doppler Velocity Log (DVL), and a depth meter, three pivotal sensors that contribute to estimating the robot's underwater motion posture. Their measurements provide initial value constraints that are tightly coupled into the state estimation.

4.3. Factor Graph and Mapping

In the front-end calculations, pose estimation and mapping use the relationship between adjacent frames. However, relying solely on local constraints in a chained algorithm can lead to error accumulation and drift. Back-end optimization selects keyframes globally to mitigate this issue, establishing constraints across a larger spatial and temporal scale. This enables us to utilize the factor graph [31] from GTSAM for back-end optimization [36].
As shown in Figure 6, each keyframe’s pose is denoted by a black square and linked to feature points in the sonar imagery. The factor graph includes Sequential Scan Matching (SSM) and Non-Sequential Scan Matching (NSSM) factors. Sequential factors represent the incremental transformation relative to the previous pose, leading to error accumulation. On the contrary, non-sequential factors act as loop closures, connecting two separate poses to correct accumulated drift.
The fundamental principle of the factor graph is to estimate the maximum a posteriori (MAP) probability of the robot’s trajectory and the state of its surrounding environment based on dead reckoning and various sensor measurements [37]. This estimation is typically formulated as follows:
X^* = \arg\max_{X, M} P(X, M \mid U, Z) ,
where U signifies the robot’s control input, Z represents the measurement information, X denotes the robot’s posture, and M indicates the position of the observation point.
We assume that the SLAM system’s system model and observation model follow a Gaussian noise distribution and take into account the scaling factor, while also enhancing the fault tolerance of the SLAM system by incorporating switch variables in the nonlinear iterative process. Furthermore, by assigning weight values to loop closures, the system’s maximum a posteriori (MAP) probability estimation can be formulated as follows:
X^*, S^* = \arg\min_{X, S} \sum_i \left\| f(x_i, u_i) - x_{i+1} \right\|^2_{\Sigma_i} + \sum_{ij} \left\| \psi(S_{ij}) \cdot \left( f(x_i, u_{ij}) - x_j \right) \right\|^2_{\Lambda_{ij}} + \sum_{ij} \left\| r_{ij} - s_{ij} \right\|^2_{\Sigma_{ij}} ,
where X* denotes the optimal estimate of the robot's trajectory; S* represents the optimal estimate of the switch variables; X refers to the trajectory of the robot; S symbolizes the switch variables that determine whether specific loop closure constraints are activated; f(x_i, u_i) is the motion model function that predicts the next state of the robot from its current state x_i and control input u_i at timestamp i; x_{i+1} denotes the actual state of the robot at the subsequent timestamp; Σ_i represents the covariance matrix of the motion model, which characterizes the uncertainty of its predictions; ψ(S_ij) is the function of the switch variable S_ij that adjusts the weight of the loop closure constraint; u_ij is the sequence of control inputs from timestamp i to timestamp j; Λ_ij represents the covariance matrix associated with the loop closure constraint, which describes its uncertainty; r_ij refers to the residual of the loop closure constraint; s_ij denotes the actual residual corresponding to the switch variable of the loop closure constraint; and Σ_ij represents the covariance matrix for the switch prior on the loop closure constraint.
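The back end described above can be prototyped with GTSAM's Python bindings, which the system already uses. The sketch below is illustrative only: the pose values and noise sigmas are made up, and a robust (Cauchy) kernel on the loop-closure factor is used as a stand-in for the switch-variable weighting ψ(S_ij) rather than a faithful reproduction of the switchable-constraints formulation above.

```python
import numpy as np
import gtsam

graph = gtsam.NonlinearFactorGraph()
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.05, 0.05, 0.02]))
loop_base = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))
# Robust (Cauchy) kernel stands in for the switch-variable down-weighting of loop closures.
loop_noise = gtsam.noiseModel.Robust.Create(
    gtsam.noiseModel.mEstimator.Cauchy.Create(1.0), loop_base)

# Prior on the first keyframe pose.
graph.add(gtsam.PriorFactorPose2(0, gtsam.Pose2(0, 0, 0), odom_noise))
# Sequential scan-matching (SSM) factors from inter-frame registration.
graph.add(gtsam.BetweenFactorPose2(0, 1, gtsam.Pose2(1.0, 0.0, 0.0), odom_noise))
graph.add(gtsam.BetweenFactorPose2(1, 2, gtsam.Pose2(1.0, 0.0, 0.1), odom_noise))
# Non-sequential scan-matching (NSSM) factor acting as a loop closure.
graph.add(gtsam.BetweenFactorPose2(2, 0, gtsam.Pose2(-2.0, 0.0, -0.1), loop_noise))

initial = gtsam.Values()
for k, pose in enumerate([gtsam.Pose2(0, 0, 0), gtsam.Pose2(1, 0, 0), gtsam.Pose2(2, 0, 0.1)]):
    initial.insert(k, pose)

result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
```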

5. Results

5.1. Underwater Platform and Datasets

We employ the BlueROV2 robot as the primary motion carrier, integrating the Oculus M1200d multibeam sonar, the DVL-A50 Doppler Velocity Log, and an IMU into its hardware suite, as shown in Figure 7a. Communication is enabled through a mini switch and a Raspberry Pi, with signals relayed to a remote PC via a carrier module. The multibeam sonar operates in a high-frequency mode at 2.1 MHz, offering a maximum detection range of 10 m, a horizontal aperture of 80°, a vertical aperture of 12°, and a maximum update frequency of 40 Hz. It has a range resolution of 0.007 m and an angular resolution of 0.4°. These sensor characteristics are detailed in Table 1. Sensor data are aggregated for seamless communication using the Robot Operating System (ROS), which enables visualization of raw sonar data and image data, as well as the DVL data visualization module, as shown in Figure 7b. This setup is crucial for subsequent validation of the SLAM system.
As shown in Figure 8a, the indoor environmental dataset was collected within a pool, which measures approximately 21 square meters with dimensions of 3.5 m by 8 m by 1.5 m. To enhance the environmental complexity, large water buckets were strategically positioned as underwater obstacles, each measuring 0.31 m by 0.295 m by 0.45 m. An optical positioning tracking system, equipped with 11 optical cameras, was deployed in the experimental setup to accurately capture the robot’s motion trajectory. This setup facilitates error analysis by comparing the captured trajectory with that estimated by the SLAM algorithm. Additionally, a wide-angle camera was installed to monitor the robot’s motion, which is crucial for investigating the factors influencing the quality of sonar images for future enhancements. The sonar’s detection range was configured to 4 m with a gain compensation set at 15. A PID algorithm was implemented to govern the robot’s low-speed movement. To ensure planar motion, the robot received z-depth data from the motion capture system and utilized real-time feedback to maintain its movement at a constant height of 0.55 m above the water surface, enabling it to complete a full circuit.
As shown in Figure 8b, the outdoor environmental dataset was collected in the Dasha River, located within the University Town of Nanshan District, Shenzhen City. The experimental site featured a water depth of 2 m, abundant silt and sand, aquatic life, small harbors along the riverbanks, and various obstacles, including bridge piers, effectively simulating the natural environment for underwater robots. The robot’s buoyancy was adjusted using weight blocks throughout the experiment, and its movement was carefully controlled to maintain a consistent depth. The sonar angle was fine-tuned to face slightly downward to enhance detection capabilities. The robot was maneuvered in a counterclockwise circuit within the river to acquire the dataset.

5.2. Inter-Frame Registration Results

5.2.1. GM Results

After obtaining multiple clusters through clustering, the clusters are further generated as Gaussian blocks, each endowed with relevant parameter attributes to facilitate the establishment of adjacency relationships in the subsequent section. These Gaussian blocks are designed to capture the local geometric properties of the point cloud data, which are crucial for accurate feature matching and pose estimation. Figure 9 shows the shapes of the Gaussian blocks generated by each cluster, which can be seen to adapt to various shapes of features for generating Gaussian blocks. The adaptability of these blocks is a key aspect of our method, as it allows for a more robust representation of the underlying geometry, which is essential for handling complex scenes and varying environmental conditions.
After acquiring the descriptors of the two images, descriptors_1 and descriptors_2 [31], the GM algorithm is employed for descriptor registration. Figure 10 illustrates the matching outcome between the source and target point clouds of the fifth frame, demonstrating high accuracy. This high accuracy results from the robust Gaussian block representation and the effective descriptor-matching strategy of our GM algorithm. The Gaussian blocks are effectively aligned, providing initial points for calculating the predicted transformation matrix T_0. The dense point cloud and the predicted matrix are then passed to ICP for refinement. Figure 11a indicates that the traditional ICP algorithm encounters convergence issues by the 26th iteration, settling into a local minimum with a final error of 0.0623 m; this error refers specifically to the registration accuracy between the two point clouds. Figure 11b displays the result after ICP iterations, where the point clouds are nearly superimposed with an error of 0.036 m, achieving convergence around the 29th iteration. This suggests that the enhanced algorithm effectively addresses the issue of ICP becoming stuck in local optima and failing to converge, thereby strengthening the system's pose estimation accuracy.

5.2.2. AKAZE Results

In a monotonous environment with sparse sonar target features, the quality of adjacency relationship information is compromised, resulting in the generation of descriptor information that is not viable for use, as illustrated in Figure 12. This, consequently, causes the failure of the ICP solution. Moreover, the direct application of the ICP algorithm is typically ineffective in linear environments. Therefore, we employ the AKAZE algorithm to extract features from long, straight-line segments. As depicted in Figure 13b, target point clouds and registered source point clouds are nearly superimposed, indicating successful feature matching. Consequently, employing a frame-by-frame matching approach that leverages both the AKAZE and GM algorithms in tandem effectively addresses sonar feature matching. This method is adept at handling scenarios involving rotation and long, straight lines, facilitating the precise estimation of the robot’s motion state. The complementary nature of these algorithms compensates for the limitations inherent in each.
Figure 14 presents a comparative analysis of the traditional versus the improved culling method at the 37th frame. The prevalent engineering approach for eliminating incorrect matches is two times the Hamming distance method, typically utilized in terrestrial visual matching. However, underwater sonar imagery is subject to multipath echoes and higher noise levels, which render this method less effective, as shown in Figure 14a. Building upon this, our approach further refines the elimination of mismatches induced by echoes, thereby enhancing the registration precision, as shown in Figure 14b. The improved method is observed to effectively eliminate mismatches due to sonar echo characteristics, leading to a marked enhancement in registration accuracy and robustness.

5.3. SLAM Results

5.3.1. Indoor Environment Test Results

The SLAM algorithm presented in this work is denoted as GM SLAM. Additionally, we compare it with other state-of-the-art methods, e.g., Bruce SLAM [38]. Figure 15a illustrates the indoor positioning and mapping outcomes achieved by the GM SLAM algorithm. Here, colored markers represent mapped obstacles, including the pool walls and water tank obstacles. The gray grid delineates the area recognized as unobstructed space. In the indoor test, as the robot navigates two circles around the pool, the mapping results closely mirror the actual environmental layout, despite some observable positional drift. In contrast, as shown in Figure 15b, the Bruce SLAM algorithm’s performance in similar indoor environments is not as optimal, particularly during limited-range movements within complex or feature-rich settings. The GM SLAM demonstrates a superior ability to delineate the environment, offering enhanced positioning accuracy and robustness. Conversely, the Bruce SLAM’s frame-matching capabilities tend to be suboptimal, often failing to converge effectively and exhibiting a propensity for positional drift, leading to less precise mapping and localization outcomes.
Figure 16a presents a comparison between the predicted trajectory generated by GM SLAM, the trajectory produced by the existing Bruce SLAM algorithm, and the actual trajectory obtained from motion capture data. Utilizing the evo tool [39], the trajectories were aligned for a thorough analysis. The analysis reveals that GM SLAM experiences fewer frame-matching failures and exhibits superior robustness in complex environments. Bruce SLAM, however, shows more drift, indicating feature tracking issues and a higher rate of incorrect frame matches, which underscores the higher positioning accuracy of GM SLAM. Figure 16b presents an error analysis for the x, y, and z axes. The operational height of the robot is set at approximately 0.58 m, and the motion capture system boasts a measurement accuracy of 0.02 m. Given the assumption of horizontal plane movement, maintaining the robot at a consistent depth is challenging, resulting in deviations along the z-axis. These deviations, in turn, impact the accuracy of the x and y directions in SLAM.
We compared the positioning trajectories of GM SLAM and Bruce SLAM with the true trajectory, using Equation (21) to calculate the Absolute Pose Error (APE) and Equation (22) to calculate the Root Mean Square Error (RMSE) for analysis [40]. The smaller the RMSE value, the better the fit between the observation model and the actual observations, indicating that the SLAM algorithm’s estimated pose is more accurate.
APE = \frac{1}{N} \sum_{i=1}^{N} \left\| \mathrm{trans}\!\left( T_{gt,i}^{-1}\, T_{est,i} \right) \right\|_2^2 ,
where N represents the total number of observation points, T_{gt,i} denotes the true pose transformation matrix for the i-th observation point, and T_{est,i} refers to its estimated pose transformation matrix.
RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 } ,
where y_i represents the observed value for each sample and ŷ_i represents the actual true value.
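For reference, the per-pose error statistics of Equation (21) can be computed with a few lines of NumPy from two aligned lists of homogeneous pose matrices; in our experiments the evo tool [39] performed this evaluation, so the sketch below is illustrative and its names are assumptions.

```python
import numpy as np

def absolute_pose_error(T_gt, T_est):
    """APE statistics over the translational part of T_gt[i]^{-1} @ T_est[i] (Eq. 21)."""
    errs = []
    for Tg, Te in zip(T_gt, T_est):
        delta = np.linalg.inv(Tg) @ Te
        errs.append(np.linalg.norm(delta[:-1, -1]))   # translation of the relative pose
    errs = np.asarray(errs)
    ape_rmse = np.sqrt(np.mean(errs ** 2))
    return ape_rmse, errs.mean(), errs.std()
```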
The Absolute Pose Error (APE) calculation results are detailed in Table 2, with Figure 17 illustrating the corresponding results for the estimated motion trajectories compared to the true values. Figure 17a illustrates that the main robot errors in GM SLAM occur during turning sections, which require further improvement. By comparing the trajectory errors of the three, it can be seen that the average error (mean) of the predicted trajectory of GM SLAM compared to the true trajectory is 0.118 m, with a root mean square error (rmse) of 0.132 m. In contrast, as shown in Figure 17b, the average error (mean) of the trajectory predicted by the Bruce SLAM algorithm is 0.189 m, with a root mean square error (rmse) of 0.247 m, both of which are greater than the errors of GM SLAM, indicating that the positioning accuracy of the GM SLAM algorithm is superior to that of the Bruce SLAM algorithm. Moreover, the standard deviation (std) of the trajectory of GM SLAM is 0.059 m, while that of the Bruce SLAM algorithm is 0.159 m. The other error items of GM SLAM are also generally smaller than those of Bruce SLAM, further demonstrating that the GM SLAM positioning algorithm has higher robustness and is superior to the Bruce SLAM algorithm.

5.3.2. Outdoor Environment Test Results

The sonar-observed objects from the outdoor experiment are depicted in Figure 18a, with straight lines representing the walls along the shore, circular objects representing underwater bridge piers, and other bright spots representing sand and stones on the seabed. We controlled the robot to navigate counterclockwise within the pool to collect the dataset, verifying the positioning and mapping feasibility using the GM SLAM algorithm. Figure 18b illustrates the mapping process, demonstrating that the algorithm can effectively capture and represent the shape and position of actual walls and bridge piers in the map.
As an outdoor environment lacks a ground truth trajectory, this experiment focuses solely on comparing the mapping effects. Figure 19 presents a comparison between the point cloud mapping results of the GM SLAM and Bruce SLAM algorithms. Figure 19a illustrates that the mapping of GM SLAM closely approximates the actual environment, and the robot’s positioning trajectory aligns closely with the expected controlled path. The trajectory is optimized by incorporating non-sequential factors into the factor graph and is further refined during loop closure detection when the robot revisits known locations. Figure 19b depicts the Bruce SLAM’s mapping, revealing certain discrepancies after a circular traversal, attributed to error accumulation and erroneous frame matching. Comparative analysis indicates that the GM SLAM provides superior mapping quality and greater accuracy than the Bruce SLAM.

6. Discussion and Conclusions

In this paper, we present a robust keyframe-based SLAM framework specifically designed for shallow water environments, employing a multibeam sonar. To combat the propensity of ICP matching to fall into local optima during sonar feature matching, we have introduced a GM algorithm that utilizes Gaussian clustering and maximal clique algorithms, designed to calculate a predictive initial transformation matrix for ICP, effectively reducing the risk of being ensnared in local optima. Furthermore, for cases in which features are sparse, we have implemented an enhanced AKAZE algorithm that filters out erroneous match points and enhances matching precision. Experimental validation has confirmed that our algorithm effectively tackles the ICP local optima issue and exhibits robust performance against rotational motion. A comparative analysis with the Bruce SLAM algorithm highlights the superiority of our approach in terms of both positioning and mapping accuracy, achieving impressive results.
Future endeavors will focus on fully leveraging the textural information of sonar by transforming it into semantic data, as the current method only utilizes edge point information and overlooks the sonar’s textural content.

Author Contributions

Conceptualization, W.L.; Methodology, L.Z., X.C. and W.L.; Software, L.Z.; Validation, L.Z. and X.C.; Formal analysis, L.Z.; Investigation, L.Z. and X.C.; Resources, W.L.; Data curation, X.C.; Writing—original draft, L.Z., X.C. and Y.Y.; Writing—review & editing, L.Z., W.L. and Y.Y.; Visualization, L.Z., X.C. and Y.Y.; Supervision, W.L.; Funding acquisition, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the National Natural Science Foundation of China #62003110, the Shenzhen Science and Technology Innovation Foundations #JCYJ20210324132607018, #JSGG20210420091804012, and #GXWD20220811163649003, the Science Center Program of the National Natural Science Foundation of China under Grant 62188101, and the open research fund of Guangdong Key Laboratory of Modern Control Technology.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, Y.; Ji, Y.; Woo, H.; Tamura, Y.; Tsuchiya, H.; Yamashita, A.; Asama, H. Acoustic camera-based pose graph slam for dense 3-d mapping in underwater environments. IEEE J. Ocean. Eng. 2020, 46, 829–847. [Google Scholar] [CrossRef]
  2. Salvi, J.; Petillo, Y.; Thomas, S.; Aulinas, J. Visual slam for underwater vehicles using video velocity log and natural landmarks. In Proceedings of the OCEANS 2008, Quebec City, QC, Canada, 15–18 September 2008; pp. 1–6. [Google Scholar]
  3. Beall, C.; Dellaert, F.; Mahon, I.; Williams, S.B. Bundle adjustment in large-scale 3d reconstructions based on underwater robotic surveys. In Proceedings of the OCEANS 2011 IEEE, Santander, Spain, 6–9 June 2011; pp. 1–6. [Google Scholar]
  4. Shkurti, F.; Rekleitis, I.; Scaccia, M.; Dudek, G. State estimation of an underwater robot using visual and inertial information. In Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA, 25–30 September 2011; pp. 5054–5060. [Google Scholar]
  5. Corke, P.; Detweiler, C.; Dunbabin, M.; Hamilton, M.; Rus, D.; Vasilescu, I. Experiments with underwater robot localization and tracking. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation, Roma, Italy, 10–14 April 2007; pp. 4556–4561. [Google Scholar]
  6. Folkesson, J.; Leonard, J.; Leederkerken, J.; Williams, R. Feature tracking for underwater navigation using sonar. In Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, CA, USA, 29 October–2 November 2007; pp. 3678–3684. [Google Scholar]
  7. Fallon, M.F.; Folkesson, J.; McClelland, H.; Leonard, J.J. Relocating underwater features autonomously using sonar-based SLAM. IEEE J. Ocean. Eng. 2013, 38, 500–513. [Google Scholar] [CrossRef]
  8. Richmond, K.; Flesher, C.; Lindzey, L.; Tanner, N.; Stone, W.C. SUNFISH®: A human-portable exploration AUV for complex 3D environments. In Proceedings of the OCEANS 2018 MTS/IEEE, Charleston, SC, USA, 22–25 October 2018; pp. 1–9. [Google Scholar]
  9. Segal, A.; Haehnel, D.; Thrun, S. Generalized-icp. In Proceedings of the Robotics: Science and Systems, Seattle, WA, USA, 28 June–1 July 2009; Volume 2, p. 435. [Google Scholar]
  10. Besl, P.; McKay, N. A Method for Registration of 3-D Shapes. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14, 239–256. [Google Scholar] [CrossRef]
  11. Chen, Y.; Medioni, G. Object Modeling by Registration of Multiple Range Images. In Proceedings of the 1992 IEEE International Conference on Robotics and Automation, Nice, France, 12–14 May 1992; pp. 2724–2729. [Google Scholar]
  12. Li, H.; Hartley, R. The 3D-3D registration problem revisited. In Proceedings of the International Conference on Computer Vision, Rio De Janeiro, Brazil, 14–21 October 2007; pp. 1–8. [Google Scholar]
  13. Bazin, J.C.; Seo, Y.; Pollefeys, M. Globally optimal consensus set maximization through rotation search. In Proceedings of the Asian Conference on Computer Vision, Daejeon, Korea, 5–9 November 2012; pp. 539–551. [Google Scholar]
  14. Biber, P.; Strasser, W. The normal distributions transform: A new approach to laser scan matching. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No. 03CH37453), Las Vegas, NV, USA, 27 October–1 November 2003; Volume 3, pp. 2743–2748. [Google Scholar]
  15. Magnusson, M. The Three-Dimensional Normal-Distributions Transform, an Efficient Representation for Registration, Surface Analysis, and Loop Detection. Ph.D. Thesis, Örebro University, Örebro, Sweden, 2009. [Google Scholar]
  16. Hornung, A.; Wurm, K.; Bennewitz, M.; Stachniss, C.; Burgard, W. OctoMap: An efficient probabilistic 3D mapping framework based on Octrees. Auton. Robot. 2013, 34, 189–206. [Google Scholar] [CrossRef]
  17. Bouraine, S.; Bougouffa, A.; Azouaoui, O. Particle swarm optimization for solving a scan matching problem based on the normal distributions transform. Evol. Intell. 2021, 15, 1–12. [Google Scholar] [CrossRef]
  18. Santos, M.M.; Zaffari, G.B.; Ribeiro, P.O.; Drews-Jr, P.L.; Botelho, S.S. Underwater place recognition using forward-looking sonar images: A topological approach. J. Field Robot. 2019, 36, 355–369. [Google Scholar] [CrossRef]
  19. Eustice, R.; Walter, M.; Leonard, J. Sparse extended information filters: Insights into sparsification. In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, AB, Canada, 2–6 August 2005; pp. 3281–3288. [Google Scholar]
  20. Johannsson, H.; Kaess, M.; Englot, B.; Hover, F.; Leonard, J. Imaging sonar-aided navigation for autonomous underwater harbor surveillance. In Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, 18–22 October 2010; pp. 4396–4403. [Google Scholar]
  21. Shin, Y.S.; Lee, Y.; Choi, H.T.; Kim, A. Bundle adjustment from sonar images and SLAM application for seafloor mapping. In Proceedings of the OCEANS 2015-MTS/IEEE, Washington, DC, USA, 19–22 October 2015; pp. 1–6. [Google Scholar]
  22. Jiang, M.; Song, S.; Li, Y.; Jin, W.; Liu, J.; Feng, X. A survey of underwater acoustic SLAM system. In Proceedings of the Intelligent Robotics and Applications: 12th International Conference, ICIRA 2019, Shenyang, China, 8–11 August 2019; Part II 12. Springer: Berlin/Heidelberg, Germany, 2019; pp. 159–170. [Google Scholar]
  23. Li, J.; Kaess, M.; Eustice, R.M.; Johnson-Roberson, M. Pose-graph SLAM using forward-looking sonar. IEEE Robot. Autom. Lett. 2018, 3, 2330–2337. [Google Scholar] [CrossRef]
  24. Joshi, B.; Xanthidis, M.; Roznere, M.; Burgdorfer, N.J.; Mordohai, P.; Li, A.Q.; Rekleitis, I. Underwater exploration and mapping. In Proceedings of the 2022 IEEE/OES Autonomous Underwater Vehicles Symposium (AUV), Singapore, 19–21 September 2022; pp. 1–7. [Google Scholar]
  25. Alcantarilla, P.F.; Solutions, T. Fast explicit diffusion for accelerated features in nonlinear scale spaces. IEEE Trans. Patt. Anal. Mach. Intell 2011, 34, 1281–1298. [Google Scholar]
  26. Yang, J.; Li, H.; Campbell, D.; Jia, Y. Go-ICP: A globally optimal solution to 3D ICP point-set registration. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 2241–2254. [Google Scholar] [CrossRef]
  27. Nitzberg, R. Constant-false-alarm-rate signal processors for several types of interference. IEEE Trans. Aerosp. Electron. Syst. 1972, AES-8, 27–34. [Google Scholar] [CrossRef]
  28. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. Density-based spatial clustering of applications with noise. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, USA, 2–4 August 1996; Volume 240. [Google Scholar]
  29. Leordeanu, M.; Hebert, M.; Sukthankar, R. An integer projected fixed point method for Graph Matching and map inference. Adv. Neural Inf. Process. Syst. 2009, 22, 1114–1122. [Google Scholar]
  30. Johnston, H. Cliques of a graph-variations on the Bron-Kerbosch algorithm. Int. J. Comput. Inf. Sci. 1976, 5, 209–238. [Google Scholar] [CrossRef]
  31. Zass, R.; Shashua, A. Probabilistic graph and hyperGraph Matching. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8. [Google Scholar]
  32. Zhou, F.; De la Torre, F. Factorized Graph Matching. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 1774–1789. [Google Scholar] [CrossRef]
  33. Zhou, F.; De la Torre, F. Deformable Graph Matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2922–2929. [Google Scholar]
  34. Tareen, S.A.K.; Saleem, Z. A comparative analysis of sift, surf, kaze, akaze, orb, and brisk. In Proceedings of the 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, Pakistan, 3–4 March 2018; pp. 1–10. [Google Scholar]
  35. Censi, A. An accurate closed-form estimate of ICP’s covariance. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation, Roma, Italy, 10–14 April 2007; pp. 3167–3172. [Google Scholar]
  36. Nabiyev, V.V.; Yılmaz, S.; Günay, A.; Muzaffer, G.; Ulutaş, G. Shredded banknotes reconstruction using AKAZE points. Forensic Sci. Int. 2017, 278, 280–295. [Google Scholar] [CrossRef]
  37. Dellaert, F.; Kaess, M. Factor graphs for robot perception. Found. Trends® Robot. 2017, 6, 1–139. [Google Scholar] [CrossRef]
  38. Wang, J.; Chen, F.; Huang, Y.; McConnell, J.; Shan, T.; Englot, B. Virtual maps for autonomous exploration of cluttered underwater environments. IEEE J. Ocean. Eng. 2022, 47, 916–935. [Google Scholar] [CrossRef]
  39. Olson, E.B. Real-time correlative scan matching. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009; pp. 4387–4393. [Google Scholar]
  40. Carroll, S.B. Evo-devo and an expanding evolutionary synthesis: A genetic theory of morphological evolution. Cell 2008, 134, 25–36. [Google Scholar] [CrossRef]
Figure 1. Filtering and feature extraction effects: The image on the left shows the raw data, the image in the center depicts the detected target features in orange, and the image on the right illustrates the target features interpolated from polar to Cartesian coordinates.
Figure 2. Point cloud clustered using DBSCAN and segmented into Gaussian blocks: The left column presents the point cloud after downsampling, the center shows the clustering effect, and the right depicts the resulting Gaussian blocks, color-coded for clarity.
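For readers who want to reproduce the clustering step illustrated in Figure 2, the following is a minimal sketch (not the authors' released code) of clustering a downsampled 2D sonar point cloud with DBSCAN and summarizing each cluster as a Gaussian block given by its mean and covariance; the parameter values and function name are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def gaussian_blocks(points, eps=0.3, min_samples=8):
    """Cluster an (N, 2) sonar point cloud and fit one Gaussian per cluster.

    Returns a list of (mean, covariance) pairs; DBSCAN noise points
    (label -1) are discarded. Parameter values are illustrative only.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    blocks = []
    for label in set(labels) - {-1}:          # skip noise points
        cluster = points[labels == label]
        mean = cluster.mean(axis=0)
        cov = np.cov(cluster.T)               # 2x2 covariance of the block
        blocks.append((mean, cov))
    return blocks
```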
Figure 3. Gaussian block parameters and adjacency relationships.
Figure 4. The impact of R_max on graph density.
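A hedged sketch of the relationship Figure 4 depicts: connecting two Gaussian blocks whenever their centers lie within R_max produces an adjacency graph whose density grows with R_max. The helper below is an assumption-based illustration, not the paper's implementation.

```python
import numpy as np
import networkx as nx

def adjacency_graph(centers, r_max):
    """Connect Gaussian-block centers closer than r_max and return the graph.

    `centers` is an (N, 2) array of block means; larger r_max yields a
    denser graph.
    """
    g = nx.Graph()
    g.add_nodes_from(range(len(centers)))
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if np.linalg.norm(centers[i] - centers[j]) < r_max:
                g.add_edge(i, j)
    return g

# Example: inspect density for a few illustrative radii.
# for r in (0.5, 1.0, 2.0):
#     print(r, nx.density(adjacency_graph(centers, r)))
```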
Figure 5. The geometric model of the multibeam sonar M1200d. (a) Sonar coordinate system. (b) Projection effects on sonar’s 2D plane.
Figure 6. Composition of the factor graph for SLAM back-end optimization: The two horizontal lines represent two different time periods. The black squares on the lines indicate the poses of the corresponding keyframes. The green spheres represent Sequential Scan Matching (SSM), while the orange spheres represent Non-Sequential Scan Matching (NSSM).
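The factor graph in Figure 6 can be assembled with a standard library such as GTSAM. The snippet below is a minimal sketch, assuming 2D keyframe poses, a single shared noise model, and illustrative relative-pose measurements; it shows sequential (SSM) and non-sequential (NSSM) constraints as between-factors, not the authors' actual front-end output.

```python
import numpy as np
import gtsam

graph = gtsam.NonlinearFactorGraph()
noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.05, 0.05, 0.02]))  # x, y, yaw sigmas (assumed)

# Anchor the first keyframe.
graph.add(gtsam.PriorFactorPose2(0, gtsam.Pose2(0.0, 0.0, 0.0), noise))

# Sequential scan matching (SSM): relative poses between consecutive keyframes.
for k in range(5):
    graph.add(gtsam.BetweenFactorPose2(k, k + 1, gtsam.Pose2(1.0, 0.0, 0.0), noise))

# Non-sequential scan matching (NSSM): a loop-closure-style constraint between distant keyframes.
graph.add(gtsam.BetweenFactorPose2(0, 5, gtsam.Pose2(5.0, 0.1, 0.0), noise))

# Rough initial guesses for the six keyframe poses, then optimize.
initial = gtsam.Values()
for k in range(6):
    initial.insert(k, gtsam.Pose2(float(k), 0.0, 0.0))
result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
```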
Figure 7. Underwater robot and sonar image example. (a) BlueRov2 Robot with Oculus M1200d sonar, DVL-a50, and IMU installations. (b) Visualization collection of sonar raw data and images.
Figure 8. Data collection environments. (a) The indoor environment. (b) The outdoor environment.
Figure 9. Point cloud segmentation into Gaussian blocks: The left column presents the point cloud after downsampling, the center shows the clustering effect, and the right depicts the resulting Gaussian blocks, color-coded for clarity.
Figure 10. Matching results of the source and target point clouds in the 5th frame: the source point cloud is depicted in blue, the target point cloud is in orange, and the green lines indicate the successfully matched pairs. Additionally, the generated Gaussian blocks are color-coded for clarity.
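One common way to realize maximal-clique correspondence selection of the kind shown in Figure 10 is to build a consistency graph over candidate block pairs and keep the largest clique. The sketch below assumes a simple distance-preservation test with an illustrative tolerance and uses networkx's `find_cliques`; it is an approximation of the idea, not the paper's exact matcher.

```python
import numpy as np
import networkx as nx

def max_clique_matches(src, tgt, candidates, tol=0.2):
    """Select mutually consistent correspondences via a maximal clique.

    `candidates` is a list of (i, j) index pairs between source block
    centers `src` and target block centers `tgt`. Two pairs are compatible
    when they preserve the inter-block distance up to `tol` (a rigid-motion
    invariant); the largest clique of compatible pairs is returned.
    """
    g = nx.Graph()
    g.add_nodes_from(range(len(candidates)))
    for a in range(len(candidates)):
        for b in range(a + 1, len(candidates)):
            (i1, j1), (i2, j2) = candidates[a], candidates[b]
            d_src = np.linalg.norm(src[i1] - src[i2])
            d_tgt = np.linalg.norm(tgt[j1] - tgt[j2])
            if abs(d_src - d_tgt) < tol:
                g.add_edge(a, b)
    best = max(nx.find_cliques(g), key=len)   # largest maximal clique
    return [candidates[k] for k in best]
```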
Figure 11. Comparison of ICP solution results before and after improvement. (a) Point cloud matching effects and error analysis pre-improvement: The first column illustrates the initial state of the source and target point clouds before ICP matching, and the second column in green demonstrates the transformed source point cloud after the initial alignment. (b) Point cloud matching effects and error analysis post-improvement: The first column presents the initial transformation effect of the source point cloud when applied with the predicted matrix, and the second column displays the refined alignment outcome after subsequent ICP iterations.
Figure 12. Failure of sonar feature matching for long straight lines.
Figure 13. AKAZE algorithm for long straight line features. (a) The matching result of the AKAZE algorithm for long straight lines. (b) Solution effects of the AKAZE algorithm for long straight line features: the source point cloud is represented in red, the target point cloud in blue, and the registered source point cloud in green.
Figure 14. Comparison of traditional culling method and improved method used in the 37th frame. (a) Matching results using the 2-times Hamming distance culling method. (b) Matching results using the improved culling method.
Figure 15. The mapping effect performed by GM SLAM and Bruce SLAM, with colored point cloud maps and trajectories. (a) The mapping effect performed by GM SLAM. (b) The mapping effect performed by Bruce SLAM.
Figure 16. Evo trajectory analysis results: The blue line represents the trajectory generated by the GM SLAM algorithm, the orange line indicates the trajectory generated by the Bruce SLAM algorithm, and the dashed line corresponds to the actual robot trajectory captured through motion capture technology. (a) Comparison of GM SLAM trajectory, Bruce SLAM trajectory, and true trajectory. (b) The error analysis for the x, y, and z axes.
Figure 17. Comparison of absolute pose error (APE) calculation results between GM SLAM and Bruce SLAM. (a) Absolute pose error (APE) calculation results for GM SLAM: the left plot shows the APE calculation curve, and the right plot illustrates the error intensity. (b) Absolute pose error (APE) calculation results for Bruce SLAM: the left plot shows the APE calculation curve, and the right plot illustrates the error intensity.
Figure 18. Outdoor SLAM Process. (a) A schematic diagram of objects observed by sonar. (b) Sonar-observed outdoor sand and stone wall environment.
Figure 19. The mapping effect performed by GM SLAM and Bruce SLAM, with colored point cloud maps and trajectories. (a) Point cloud mapping and robot positioning trajectory for GM SLAM. (b) Point cloud mapping and robot positioning trajectory for Bruce SLAM.
Table 1. Sensor parameter characteristics.

| Sensor Type | Model | Frequency | Features |
|---|---|---|---|
| Sonar | Oculus M1200d | 12.6 Hz | High resolution, high update rate |
| Doppler Velocimeter | A50 | 10 Hz | Minimal size, max speed 2 m/s, ±0.1 cm/s accuracy |
| Inertial Measurement Unit | BMI055 | 9.5 Hz | High stability |
| Motion Capture System | NOKOV | 171 Hz | 2048 × 2048 resolution, 4.2 M pixels, 5.2 ms latency, 52° × 52° FOV |
Table 2. APE calculation results.

| Metric | GM SLAM/m | Bruce SLAM/m |
|---|---|---|
| max | 0.360 | 0.944 |
| mean | 0.118 | 0.189 |
| median | 0.112 | 0.143 |
| min | 0.014 | 0.007 |
| rmse | 0.132 | 0.247 |
| sse | 9.559 | 33.796 |
| std | 0.059 | 0.159 |
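The statistics in Table 2 are the standard summary values produced by trajectory-evaluation tools such as evo. As a minimal sketch, assuming already-aligned (N, 3) arrays of estimated and reference positions, they can be computed as follows; the function name is illustrative.

```python
import numpy as np

def ape_stats(est_xyz, ref_xyz):
    """Summary statistics of absolute position error over aligned trajectories.

    `est_xyz` and `ref_xyz` are (N, 3) arrays of estimated and reference
    positions; the returned keys mirror Table 2.
    """
    err = np.linalg.norm(est_xyz - ref_xyz, axis=1)   # per-pose translation error
    return {
        "max": err.max(),
        "mean": err.mean(),
        "median": np.median(err),
        "min": err.min(),
        "rmse": np.sqrt(np.mean(err ** 2)),
        "sse": np.sum(err ** 2),
        "std": err.std(),
    }
```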