1. Introduction
Geological disasters are characterized by extensive destructiveness and widespread distribution, posing significant threats to the natural environment, infrastructure, and the safety of human lives and property [1,2,3]. The investigation of landslide susceptibility primarily involves field survey monitoring [4] and physically based modeling methods that incorporate shallow slope stability analysis and relevant material parameters [5,6]. Prior to complete destabilization, many geological disasters experience a sustained period of deformation. Therefore, the precise measurement of deformation in the early stages of geological disasters is crucial for determining whether landslides, hazardous rock masses, and other such phenomena are in a stable condition [7,8,9].
According to differences in implementation, widely used monitoring technologies can generally be categorized into two types: contact and non-contact techniques. Contact techniques primarily include instruments based on the Global Navigation Satellite System (GNSS), inclinometers, and crack gauges [10,11,12]. These techniques involve the direct installation of sensors and data transmission systems on the surface of the disaster-affected area, enabling the acquisition of local point-wise surface deformation data with millimeter accuracy. However, the monitoring scope and effectiveness of these technologies are limited by the number and distribution of the instruments: not only are the economic costs high, but the spatial resolution is also low, providing information only at sampled locations [13,14].
In contrast, non-contact monitoring methods based on remote sensing technology can acquire comprehensive surface information from a distance in disaster-affected areas [15,16,17,18]. However, due to the complex terrain and severe obstructions typical of landslide regions, ground-based remote sensing methods, including total stations [19] and terrestrial laser scanning [20], often struggle to obtain comprehensive and effective observational data. Satellite-based methods, in turn, face challenges related to long revisit intervals and insufficient resolution [21,22,23]. Although differential InSAR and multi-temporal InSAR methods based on spaceborne SAR satellites are widely used for large-scale geohazard deformation monitoring with high accuracy, they suffer from calculation errors caused by phase decorrelation and from insufficient spatial resolution [24,25]. In recent years, highly mobile rotary-wing unmanned aerial vehicles (UAVs) equipped with real-time kinematic (RTK) positioning have been employed in landslide surveys [26]. Using photogrammetric techniques, UAVs can achieve fine reconstruction of landslide areas, enabling more intuitive and safer monitoring of the overall deformation process in these regions [27,28,29].
UAV remote sensing systems surpass satellite remote sensing in data resolution while offering better terrain adaptability than ground-based technologies [30]. Consequently, many researchers have applied UAV photogrammetry to the monitoring of geological landslides [31,32,33]. UAV-based methods fall into two categories: two-dimensional (2D) and three-dimensional (3D) deformation analysis. Two-dimensional deformation analysis primarily involves generating multi-temporal, high-resolution digital elevation models (DEMs) through airborne LiDAR scanning or photogrammetry [34]. Deformations in the vertical direction are calculated by directly differencing two DEMs [35,36]. This approach is straightforward and effective in minimizing the impact of outliers and high surface roughness, yielding high precision for the calculated vertical deformations. However, when the deformation has components beyond the vertical direction, this method struggles to capture the actual displacement [37].
Three-dimensional deformation analysis methods typically rely on point clouds or 3D meshes. A common practice is to convert a reference point cloud into a 3D mesh and then compute the minimal distance from points to the mesh to describe 3D deformations [38,39], as exemplified by the cloud-to-cloud comparison [40], cloud-to-mesh comparison, and the multiscale model-to-model cloud comparison (M3C2) method integrated within the point cloud processing software CloudCompare v2.10 [41]. However, these methods compute the Euclidean distance between reference points and the nearest target points, or between points and a 3D mesh [42,43], without verifying whether the two points at the shortest distance are truly homologous. Therefore, these methods primarily capture deformations perpendicular to the local surface; when the deformation is parallel to the surface or involves more complex motions, the results may not accurately reflect the actual displacements [44].
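As a minimal illustration of this limitation, the following sketch (a synthetic example, not part of the proposed framework) computes cloud-to-cloud distances by nearest-neighbor search for a planar surface shifted parallel to itself; the reported distances remain far smaller than the true 0.5 m displacement because the nearest target point is generally not the homologous point.

```python
# Minimal illustration (not the original implementation): nearest-neighbor
# cloud-to-cloud distance underestimates displacement parallel to the surface.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
# Reference epoch: points sampled on a planar slope patch (z = 0 plane).
ref = np.column_stack([rng.uniform(0, 10, 5000),
                       rng.uniform(0, 10, 5000),
                       np.zeros(5000)])
# Target epoch: the same surface shifted 0.5 m parallel to itself (along x).
tgt = ref + np.array([0.5, 0.0, 0.0])

# Cloud-to-cloud (C2C) distance: each reference point to its nearest target point.
dist, _ = cKDTree(tgt).query(ref, k=1)
print(f"true displacement: 0.50 m, mean C2C distance: {dist.mean():.3f} m")
# The mean C2C distance stays near the point spacing rather than 0.5 m,
# because the nearest target point is generally not the homologous point.
```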
Utilizing UAV photogrammetry enables the rapid and low-cost acquisition of high-precision point clouds over large disaster-affected areas [45]. Deformation can therefore be calculated by matching corresponding points between mesh point clouds from different time phases using methods based on 3D features. Three-dimensional point matching methods are mainly divided into those based on traditional handcrafted features and those based on deep learning features. Traditional handcrafted features are typically obtained by extracting geometric properties, spatial distributions, or statistical histograms of the 3D shape [46,47]. However, these methods rely primarily on geometric statistics, such as the spatial distribution of neighboring points and the angles between normals [48,49,50,51]. Given the complex and varied geometric shapes in landslide areas, similar geometric information in different areas can lead to mismatches, such as incorrect point locations or one-to-many matches, making it difficult to apply these methods directly to 3D point matching in geological disaster scenarios such as landslides and hazardous rock masses. On the other hand, most deep learning-based methods currently focus on point cloud registration [52,53]. In point cloud registration, mismatches of individual homologous points are not critical, because the goal is simply to find a correct set of keypoint matches; this makes such methods unsuitable for the precise calculation of displacement. Furthermore, deep learning methods are data-dependent and require extensive training for specific scenarios, yet current training datasets are largely concentrated on indoor and autonomous driving environments, posing challenges in generalizing these methods to complex geological landslide contexts.
This study aims to address the identified deficiencies, with the objective of establishing point-to-point correspondences between multi-temporal geological landslide areas and thereby accurately estimating deformation, in order to promote the application of UAV photogrammetry in geological landslide monitoring. In the experiment, the proposed method was applied to perform 3D point matching and deformation estimation on the multi-temporal datasets of the Lijie north hill geological landslide area, and its accuracy was evaluated using 130 pairs of manually measured checkpoints. The results demonstrate that our method outperforms existing state-of-the-art and classic methods in terms of matching accuracy and stability. Additionally, the absolute accuracy of the proposed method was evaluated using actual measurements from GNSS stations, showing that the deformation estimates reach centimeter-level accuracy with an RMSE of 2.26 cm. The main contributions of this paper include the following aspects:
A 3D deformation estimation framework based on fused shape/texture information of real-world models is proposed, which constructs accurate point-to-point correspondences between multi-temporal mesh point clouds, thus enabling the precise calculation of 3D deformations in disaster-affected areas.
A real-scene 3D model feature descriptor that integrates shape and texture information is introduced, solving the issue of inaccurate 3D point matching in real-world models of landslide areas.
A method for extracting interest points using voxel-based thinning is proposed, which ensures uniform coverage of the entire landslide area while effectively improving computational efficiency.
We conducted a rigorous evaluation of the proposed method using numerous manual checkpoints and compared it with state-of-the-art and classical methods, with the results demonstrating our approach’s superior performance in terms of quantitative matching accuracy metrics.
The rest of this paper is organized as follows. In Section 2, the details of the 3D deformation estimation framework are explained. Section 3 presents experiments on a real-world geological landslide dataset that demonstrate the advantages of our framework numerically and visually. In Section 4, the effects of the different features within the framework and the limitations of the method are discussed. Section 5 concludes this study.
2. Methods
The specific workflow of the proposed framework is depicted in Figure 1. Initially, high-resolution UAV images of the landslide area collected at different times were processed through photogrammetry, and the generated multi-temporal 3D models were registered to maintain coordinate consistency [54]. Subsequently, using the proposed voxel-based thinning method, a set of interest points uniformly covering the entire geological landslide area was extracted from the 3D mesh point cloud of the earlier time phase. Next, based on the real-scene 3D model feature descriptor that integrates shape and texture information, 3D homologous point matching of all interest points was conducted in the 3D mesh point cloud of the later time phase. Finally, the 3D deformation was calculated from the matched set of homologous points.
2.1. Voxel-Based Interest Points Extraction
In the conventional homologous point matching process, feature detection is typically applied first to identify feature points, such as corners, edges, or areas with prominent textures, which are then described and matched. However, in geological landslide scenarios, since the areas where deformation occurs are unknown, traditional feature detection methods often fail to cover all potential deformation areas comprehensively. Therefore, a method for extracting interest points through voxel-based thinning of the 3D mesh point cloud is proposed. It is important to emphasize that these interest points are extracted from the mesh point cloud of the earlier time phase and are used for 3D point matching. By performing 3D point matching and deformation calculation on these interest points, the overall deformation of the landslide can be characterized.
Firstly, mesh point clouds were obtained from the reconstructed real-scene 3D models. Because the UAV flight paths are highly regular, the density of the generated mesh point clouds is consistent across different areas. Moreover, since the shooting distances of the UAV images are known, the resolutions of the images can be easily calculated; the density of the generated point clouds can thus be estimated, and the voxel size can be determined accordingly. As shown in Figure 2, using a voxel-based approach [55], the point cloud is divided into multiple cubic regions, and the point closest to the center of each cubic region is retained, thereby uniformly thinning the point cloud. After the initial thinning is completed, the voxel size is increased and the next round of thinning is performed. Through this procedure, interest points of the disaster area thinned at multiple levels are obtained.
This voxel-based multilevel thinning method allows for the convenient selection of all points at a specific level as an interest point set. This method offers two major advantages: first, it significantly reduces the required number of interest points compared to the original mesh point cloud, which markedly enhances computational efficiency; second, the extracted interest point set can uniformly cover the entire geological disaster area, ensuring no potential deformation areas are missed while preserving the structure and shape characteristics of the point cloud data as much as possible. Moreover, once a deformation area is identified, point clouds from other levels, which possess higher point densities, can be utilized to gather additional points surrounding that area, enabling a more comprehensive analysis of the deformation.
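A minimal sketch of the voxel-based multilevel thinning is given below, assuming an initial voxel size of 0.1 m and seven levels (the values later reported in Section 3.3); the implementation details are illustrative rather than the exact production code.

```python
# Illustrative sketch of voxel-based multilevel thinning (assumed parameters;
# not the authors' exact implementation).
import numpy as np

def voxel_thin(points, voxel_size):
    """Keep, for each occupied voxel, the point closest to the voxel center."""
    idx = np.floor(points / voxel_size).astype(np.int64)          # voxel index per point
    centers = (idx + 0.5) * voxel_size                             # voxel centers
    d2 = np.sum((points - centers) ** 2, axis=1)                   # distance to own center
    # Sort by voxel index, then by distance, and keep the first point per voxel.
    order = np.lexsort((d2, idx[:, 2], idx[:, 1], idx[:, 0]))
    idx_sorted = idx[order]
    first = np.ones(len(points), dtype=bool)
    first[1:] = np.any(idx_sorted[1:] != idx_sorted[:-1], axis=1)  # voxel changes here
    return points[order[first]]

def multilevel_interest_points(points, base_voxel=0.1, levels=7):
    """Return thinned point sets for voxel sizes base, 2*base, 4*base, ..."""
    out, current = [], points
    for k in range(levels):
        current = voxel_thin(current, base_voxel * (2 ** k))
        out.append(current)
    return out

# Example with a synthetic cloud:
cloud = np.random.default_rng(1).uniform(0, 50, size=(100000, 3))
levels = multilevel_interest_points(cloud, base_voxel=0.1, levels=7)
print([len(l) for l in levels])  # point count decreases level by level
```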
2.2. Texture Feature Descriptor
Real-scene 3D reconstruction technology has gradually matured, and is capable of providing comprehensive information, including geometric shapes and image textures. In this context, 3D meshes represent the complete 3D geometric information of the reconstructed object, while textures are used to convey radiometric information.
For typical geological disaster scenarios, such as landslides or hazardous rock masses, where the terrain inclination is substantial, orthoimages generated by traditional methods (i.e., by projecting onto a horizontal plane) may incur computational errors due to the compression of complex-shaped targets. To express the geometric structural information of complex-shaped targets more accurately, a local 2D mapping method is employed to orthogonally project the area around each interest point, creating local 2D images and thereby achieving a dimensional reduction of the 3D model. Since the projection surface of the local 2D images is the optimal spatial plane directly facing the ground scene (such as a cross-section of the landslide surface), projection distortion is minimized and the representation of ground information is enhanced. The process of generating texture feature descriptors from local 2D images is illustrated in Figure 3.
Firstly, an appropriate plane for the area around an interest point is determined by fitting a plane to the surrounding mesh point cloud, thereby establishing the normal vector $\mathbf{n}$ of the spatial projection plane; the rotation matrix $\mathbf{R}$ from the object coordinate system $O\text{-}XYZ$ to the projected plane coordinate system $O'\text{-}X'Y'Z'$ is then computed from $\mathbf{n}$. Next, by calculating the coordinates of each vertex of the bounding box of the surrounding area, the starting point coordinates $(X_0, Y_0, Z_0)$ are determined. The transformation of any point in the area around the interest point from coordinates $(X, Y, Z)$ in the original coordinate system to coordinates $(X', Y', Z')$ in the projected plane coordinate system is expressed as:
$$\begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix} = \mathbf{R} \left( \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} - \begin{bmatrix} X_0 \\ Y_0 \\ Z_0 \end{bmatrix} \right)$$
Finally, each triangular facet in the area surrounding the interest points is transformed into the local coordinate system and rasterized to produce a local 2D image of the area. After obtaining the local 2D image, the Histogram of Oriented Gradients (HOG) feature descriptor is adopted to extract features from the neighborhood pixels of the interest point in the 2D image, resulting in a feature vector corresponding to the pixel. Then, for the mesh of the subsequent time phase model, a local 2D image of equal width and length is also constructed centered around the interest point, and the HOG features are extracted from each pixel of this image, excluding those at the boundaries. Finally, feature matching is conducted, resulting in a set of correlation coefficients related to the interest point.
The point clouds used in this study come from the meshes of real-scene models, where the texture information of each triangular facet is taken directly from the original images captured by the UAV; the texture resolution is therefore consistent with that of the original images. When performing local 2D mapping, interpolating from the original images to obtain the local 2D images keeps the resolution of the local 2D images consistent with the original images, thereby enabling more precise point matching. During the matching of the subsequent temporal data, to ensure that the sets of texture descriptors and shape descriptors correspond and remain consistent, thereby facilitating a more effective integration of the two types of descriptors, all pixels that have generated feature vectors in the 2D images are elevated back into the 3D coordinate system. This elevation process yields the candidate point set for the shape descriptors.
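The following simplified sketch illustrates the local 2D mapping and HOG-based texture description. Here, neighborhood points carrying gray values are splatted onto the fitted plane instead of rasterizing textured triangular facets, and the radius, pixel size, and HOG parameters are illustrative assumptions rather than the exact implementation.

```python
# Simplified sketch of the local 2D mapping + HOG texture descriptor
# (points with gray values are binned into a raster instead of rasterizing
# textured triangles; names and parameters are illustrative).
import numpy as np
from skimage.feature import hog

def local_plane_frame(neigh_xyz):
    """Fit a plane to the neighborhood and return a rotation into that plane."""
    centroid = neigh_xyz.mean(axis=0)
    _, _, vt = np.linalg.svd(neigh_xyz - centroid, full_matrices=False)
    # Rows of vt: two in-plane axes and the plane normal (smallest singular value).
    return centroid, vt  # R = vt maps object coordinates into the plane frame

def local_2d_image(neigh_xyz, neigh_gray, radius=2.0, pixel=0.02):
    """Project neighborhood points onto the fitted plane and build a gray image."""
    centroid, R = local_plane_frame(neigh_xyz)
    uvw = (neigh_xyz - centroid) @ R.T          # (u, v) in-plane, w along normal
    size = int(2 * radius / pixel)
    img = np.zeros((size, size), dtype=np.float32)
    cols = np.clip(((uvw[:, 0] + radius) / pixel).astype(int), 0, size - 1)
    rows = np.clip(((uvw[:, 1] + radius) / pixel).astype(int), 0, size - 1)
    img[rows, cols] = neigh_gray                # simple splatting: one gray value per pixel
    return img

def texture_descriptor(img):
    """HOG feature vector of the local 2D image (9 orientations, 8x8 cells, 2x2 blocks)."""
    return hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

# Usage with synthetic data:
rng = np.random.default_rng(2)
pts = rng.uniform(-2, 2, (4000, 3)); pts[:, 2] *= 0.05   # roughly planar patch
gray = rng.uniform(0, 1, 4000)
desc = texture_descriptor(local_2d_image(pts, gray))
print(desc.shape)
```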
2.3. Shape Feature Descriptor
Due to the long intervals between multi-temporal UAV data collections, variations in lighting conditions and seasonal differences among datasets may lead to significant changes in the texture information of the same physical features, causing considerable errors in texture-based matching. To overcome this challenge, a shape feature descriptor is introduced for matching in addition to the texture descriptor. A robust local reference frame (LRF) is constructed using a distance-weighted scheme over neighboring points to reduce sensitivity to occlusions and noise during matching. Furthermore, three different attributes are used to encode the position and orientation of each candidate point, and these details are integrated through histograms to minimize the error in homologous point matching.
Figure 4 illustrates the construction process of the geometric feature descriptors.
For the candidate point set obtained in Section 2.2, all points within a radius $r$ around a candidate point $\mathbf{p}$ are collected to construct an LRF. The specific calculation is as follows. First, within a spherical region with support radius $r$ centered at $\mathbf{p}$, all points in this region are defined as the spherical neighborhood points of $\mathbf{p}$, collectively forming a 3D local surface $\{\mathbf{q}_i \mid \|\mathbf{q}_i - \mathbf{p}\| \le r\}$. To enhance robustness against noise and occlusions, a distance-weighted scheme is employed for the neighboring points, assigning greater weight to points closer to $\mathbf{p}$. Thus, the covariance matrix $\mathbf{C}$ is represented as a weighted linear combination, expressed as:
$$\mathbf{C} = \frac{1}{\sum_{i} w_i} \sum_{i} w_i \left(\mathbf{q}_i - \mathbf{p}\right)\left(\mathbf{q}_i - \mathbf{p}\right)^{T}$$
where $w_i$ represents the distance weight for the neighboring point $\mathbf{q}_i$, normalized by the support radius $r$, with the calculation formula:
$$w_i = \frac{r - \left\|\mathbf{q}_i - \mathbf{p}\right\|}{r}$$
By performing an eigenvalue decomposition on $\mathbf{C}$ and sorting the resulting eigenvectors by the magnitude of their eigenvalues, the eigenvector corresponding to the largest eigenvalue is defined as the Z-axis, the second largest as the X-axis, and the smallest as the Y-axis.
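A compact sketch of the distance-weighted LRF construction is shown below; the weight formula follows the reconstruction given above, and the sign disambiguation of the axes is omitted for brevity.

```python
# Sketch of the distance-weighted LRF construction described above
# (weight formula follows the reconstruction in the text; axis sign
# disambiguation is not detailed here and is left out).
import numpy as np

def local_reference_frame(p, neighbors, r):
    """Weighted-covariance LRF at point p from neighbors within support radius r."""
    diff = neighbors - p
    dist = np.linalg.norm(diff, axis=1)
    mask = dist <= r
    diff, dist = diff[mask], dist[mask]
    w = (r - dist) / r                              # closer points get larger weights
    C = (w[:, None] * diff).T @ diff / w.sum()      # weighted covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)            # eigenvalues in ascending order
    z_axis = eigvecs[:, 2]                          # largest eigenvalue -> Z-axis
    x_axis = eigvecs[:, 1]                          # second largest -> X-axis
    y_axis = eigvecs[:, 0]                          # smallest -> Y-axis
    return np.stack([x_axis, y_axis, z_axis])       # rows: X, Y, Z axes

# Usage with a synthetic neighborhood:
rng = np.random.default_rng(3)
neigh = rng.normal(scale=[1.0, 0.5, 0.1], size=(500, 3))
lrf = local_reference_frame(np.zeros(3), neigh, r=2.0)
print(lrf)
```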
After establishing the LRF, to eliminate the effects of rigid-body transformations, the neighboring point set is transformed into the LRF centered at point $\mathbf{p}$. As shown in Figure 4b, based on the transformed local surface data, the space is first divided into local subspaces along the radial direction, and three geometric attributes are then computed within each subspace.
Figure 4c displays the schematic diagram of the local height for point $\mathbf{p}$. For a neighboring point $\mathbf{q}_i$, the local height $h_i$ is computed from the vector $\mathbf{v}_i$ from $\mathbf{p}$ to $\mathbf{q}_i$ and the axes of the local reference frame at $\mathbf{p}$; its value range is [0, 2π].
For a neighboring point $\mathbf{q}_i$, the two angles $\alpha_i$ and $\beta_i$ are defined as shown in Figure 4d and involve the normal $\mathbf{n}_i$ of $\mathbf{q}_i$. The ranges of both $\alpha_i$ and $\beta_i$ are [0, π].
After computing the three geometric attributes, the values of $h_i$, $\alpha_i$, and $\beta_i$ are accumulated into histograms for each subspace, as shown in Figure 4e. Subsequently, the histograms from all subspaces are normalized and concatenated, as illustrated in Figure 4f, to generate the geometric descriptor vector of the candidate point. The correlation coefficients for the candidate point set are then obtained by calculating the correlation between these histograms.
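The following sketch illustrates how the subspace histograms are assembled into a descriptor; the specific attribute formulas used here (height along the LRF Z-axis and two normal-related angles) are plausible stand-ins for the definitions in Figure 4 rather than the exact ones.

```python
# Sketch of assembling the shape descriptor: attribute values are binned per
# radial subspace and the normalized histograms are concatenated. The attribute
# formulas below are plausible stand-ins, not the authors' exact definitions.
import numpy as np

def shape_descriptor(p, neighbors, normals, lrf, r, n_shells=4, n_bins=10):
    diff = neighbors - p
    dist = np.linalg.norm(diff, axis=1)
    keep = (dist > 0) & (dist <= r)
    diff, dist, normals = diff[keep], dist[keep], normals[keep]

    v = diff / dist[:, None]                       # unit vectors from p to neighbors
    h = diff @ lrf[2]                              # assumed: height along the LRF Z-axis
    alpha = np.arccos(np.clip(normals @ lrf[2], -1, 1))            # assumed: normal vs. Z-axis
    beta = np.arccos(np.clip(np.sum(normals * v, axis=1), -1, 1))  # assumed: normal vs. v

    shell = np.minimum((dist / r * n_shells).astype(int), n_shells - 1)
    ranges = [(h.min(), h.max() + 1e-9), (0, np.pi), (0, np.pi)]
    desc = []
    for s in range(n_shells):                      # one set of histograms per radial subspace
        m = shell == s
        for attr, rng_ in zip((h, alpha, beta), ranges):
            hist, _ = np.histogram(attr[m], bins=n_bins, range=rng_)
            total = hist.sum()
            desc.append(hist / total if total > 0 else hist.astype(float))
    return np.concatenate(desc)                    # normalized, concatenated histograms

# Usage with synthetic data (lrf rows are the X, Y, Z axes, e.g., from the sketch above):
rng = np.random.default_rng(4)
neigh = rng.normal(scale=[1.0, 0.5, 0.1], size=(500, 3))
norms = np.tile([0.0, 0.0, 1.0], (500, 1))
d = shape_descriptor(np.zeros(3), neigh, norms, np.eye(3), r=2.0)
print(d.shape)   # n_shells * 3 attributes * n_bins
```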
2.4. Feature Integration
After calculating the correlation using both the texture feature descriptor and the shape feature descriptor, the correlation coefficients $c_t$ and $c_s$ of the candidate points are obtained. Subsequently, a weighted average is employed to determine the final homologous point:
$$S(\mathbf{q}) = w_t\, c_t(\mathbf{q}) + w_s\, c_s(\mathbf{q}), \qquad \hat{\mathbf{q}} = \underset{\mathbf{q} \in \mathcal{Q}}{\arg\max}\; S(\mathbf{q})$$
where $w_t$ and $w_s$ are the weights assigned to the texture feature and the shape feature, respectively, determined based on factors such as the collection times of the multi-temporal data and the shooting environment, and $\mathcal{Q}$ represents the set of candidate points. The candidate point with the highest weighted score is identified as the final homologous point.
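A minimal sketch of the weighted fusion step is given below, using the weight values reported in Section 3.3 ($w_t$ = 0.7, $w_s$ = 0.3); the candidate coordinates and correlation scores are illustrative.

```python
# Sketch of the weighted fusion of texture and shape correlation scores
# (w_t = 0.7 and w_s = 0.3 follow the parameter settings reported in
# Section 3.3; the candidate values below are illustrative).
import numpy as np

def select_homologous_point(candidates, c_tex, c_shape, w_t=0.7, w_s=0.3):
    """Return the candidate with the highest weighted correlation score."""
    score = w_t * np.asarray(c_tex) + w_s * np.asarray(c_shape)
    best = int(np.argmax(score))
    return candidates[best], score[best]

# Usage: three candidate points with texture/shape correlation coefficients.
cands = np.array([[10.0, 5.0, 2.0], [10.1, 5.0, 2.1], [10.2, 4.9, 2.0]])
pt, s = select_homologous_point(cands, c_tex=[0.62, 0.91, 0.40], c_shape=[0.55, 0.80, 0.70])
print(pt, s)
```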
3. Experiments and Results
3.1. Experiment Data
The performance of the proposed method for estimating 3D deformation was tested by applying it to the Lijie north hill landslide area, as shown in Figure 5a. The Lijie north hill landslide is located in Lijie Town, Zhouqu County, Gansu Province, China. Due to regional tectonic activity and river erosion, the area features deeply incised valleys and steep mountains, with frequent geological disasters. The study area is approximately 300 m long and 400 m wide, with an elevation difference of nearly 300 m. A typical alluvial fan has formed at the mouth of the gully, and some residential buildings of Lijie Town are constructed on these debris-flow alluvial fans.
In this experiment, a DJI M300 RTK UAV equipped with a DJI Zenmuse P1 camera was used to capture images over the Lijie north hill landslide area. UAV images of two temporal phases were acquired in March and September 2022, respectively. The parameters of both flights were kept consistent, with a flight altitude of 100 m, a forward overlap of 80%, and a side overlap of 50%. After data collection, the high-resolution images were processed to obtain the real-scene 3D models. It should be noted that, during multi-temporal model registration, some points were first manually selected from stable areas (i.e., areas without deformation) based on the results of the aerial triangulation of the first period. These points were then used as control points for the aerial triangulation of the second-period images using the method proposed in [54], thereby registering the models of the two periods.
For general geological disaster deformation areas, monitoring with existing methods such as GNSS base stations typically allows displacement monitoring at only a few key points and does not cover most of the disaster area. With such data, it is difficult to conduct a quantitative analysis of the areas outside the key points, and the coverage is not comprehensive enough. Therefore, to meet the need for detailed quantitative analysis of large deformation areas, we used the multi-temporal 3D data of the Lijie north hill landslide to manually collect 130 pairs of homologous points, uniformly distributed over the two phases of real-scene 3D models, as manual checkpoints. These checkpoints served as ground truth for the quantitative evaluation and validation of the different methods. The specific distribution of the manual checkpoints is illustrated in Figure 5c.
The detailed displacement parameters of the manually measured checkpoints are shown in Figure 6. The displacements are concentrated within 2 m, and the displacement magnitudes differ among checkpoints, which enhances the diversity of the test.
3.2. Evaluation Metrics
This paper presents a method for estimating 3D deformations by matching real-scene 3D models using an integration of shape and texture mapping information. Three-dimensional point matching is the core aspect of deformation estimation, and the matching performance is primarily evaluated using the following three metrics:
Root mean square error (RMSE) of matching residual errors in the XYZ directions: For the matched homologous point pairs, the RMSEs of the matching residual errors in the XYZ directions are calculated for quantitative evaluation. For example, the formula for the X direction is:
$$RMSE_X = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \hat{X}_i\right)^{2}}$$
where $n$ is the total number of matched points, $X_i$ is the X coordinate of the $i$-th matched point, and $\hat{X}_i$ is the X coordinate of the corresponding checkpoint.
Standard deviation of matching residual errors in XYZ directions: To evaluate the robustness of different methods in matching 3D homologous points, the standard deviations of the matching residual errors in the XYZ directions are calculated for all matched homologous point pairs.
Correct matching rate (CMR): CMR refers to the ratio of correctly matched point pairs to all point pairs. If the absolute value of the matching error in any of the XYZ directions (i.e., the difference between the deformation values calculated from automatically matched homologous points and those obtained from the manually measured checkpoints) exceeds 10 cm, the point pair is considered incorrectly matched [53,56].
Additionally, in the comparative experiments of the interest point extraction, the evaluation is based on the number of extracted interest points, the extraction time, and the displacement point coverage. Displacement point coverage refers to the proportion of manual checkpoints covered by the extracted interest points. A manual checkpoint is considered covered if an extracted interest point is within a 2 m radius of the checkpoint.
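The metrics can be summarized in a short sketch; variable names and the synthetic example values are illustrative, while the thresholds follow the definitions above (10 cm for the CMR and a 2 m radius for displacement point coverage).

```python
# Sketch of the evaluation metrics described above (per-axis RMSE and standard
# deviation of residuals, correct matching rate with a 10 cm threshold, and
# displacement point coverage with a 2 m radius); names are illustrative.
import numpy as np
from scipy.spatial import cKDTree

def matching_metrics(matched_xyz, checkpoint_xyz, cmr_threshold=0.10):
    residuals = matched_xyz - checkpoint_xyz               # per-axis residuals (m)
    rmse = np.sqrt(np.mean(residuals ** 2, axis=0))        # RMSE_X, RMSE_Y, RMSE_Z
    std = residuals.std(axis=0)                            # per-axis standard deviation
    correct = np.all(np.abs(residuals) <= cmr_threshold, axis=1)
    cmr = correct.mean()                                   # correct matching rate
    return rmse, std, cmr

def displacement_point_coverage(interest_xyz, checkpoint_xyz, radius=2.0):
    """Fraction of checkpoints with at least one interest point within `radius`."""
    d, _ = cKDTree(interest_xyz).query(checkpoint_xyz, k=1)
    return np.mean(d <= radius)

# Usage with synthetic values:
rng = np.random.default_rng(5)
gt = rng.uniform(0, 100, (130, 3))
pred = gt + rng.normal(scale=0.02, size=gt.shape)
print(matching_metrics(pred, gt))
print(displacement_point_coverage(rng.uniform(0, 100, (5000, 3)), gt))
```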
3.3. Implementation Details
All experiments were conducted on a computer equipped with an Intel(R) Core(TM) i7-6700HQ 2.60 GHz CPU and 64 GB RAM. In the interest point extraction experiments, classic methods such as SIFT3D, Harris3D, and ISS3D were selected as baselines. For the 3D homologous point matching experiments, five representative 3D geometric descriptor methods were chosen for comparison: 3DSC [47], SHOT [50], FPFH [49], RoPS [48], and SpinNet [53]. Among these, 3DSC, SHOT, FPFH, and RoPS are widely cited classic local point cloud feature descriptors that have demonstrated superior performance across multiple public datasets. SpinNet, a 3D feature descriptor based on a deep learning network, has recently achieved leading results on both indoor and outdoor public datasets.
In the interest point extraction comparison experiments, the parameters for the SIFT3D, Harris3D, and ISS3D methods were set to conventional values. In SIFT3D, six octaves are defined, with eight scale layers generated under each octave, and the contrast threshold is set at 0.01. The corner response threshold for Harris3D is set at 0.02. For ISS3D, the neighborhood radius is 0.2, the minimum eigenvalue threshold is 0.975, the corner response threshold is 0.24, and the non-maximum suppression radius is 0.16. In our proposed method, the initial voxel size is 0.1, with each subsequent layer doubling in voxel size, and the entire point cloud is thinned into seven layers.
In the 3D point matching comparative experiments, for the texture descriptor of our method, the radius of the local 2D mapping area is set to 2 m, the number of HOG orientations is set to nine, the cell size is set to 8 pixels, and the block size is set to 2 cells. In the shape descriptor, the support radius for establishing the LRF is set to 2 m, and the neighborhood is divided into four subspaces radially. Additionally, the weights for the texture and shape descriptors, $w_t$ and $w_s$, are set to 0.7 and 0.3, respectively. For 3DSC, SHOT, FPFH, and RoPS, the support radius is uniformly set to 3.0 m. For RoPS, the number of rotations and the number of histogram partitions are set to 3 and 5, respectively; for 3DSC, SHOT, and FPFH, the normal vectors are calculated using the 50 nearest neighbor points. In the SpinNet network, the descriptor radius is 1.0 m; the numbers of partitions in the radial, azimuthal, and elevation directions are 9, 80, and 40, respectively; and the voxel radius and the number of sampling points are 0.5 and 30, respectively. It is worth emphasizing that, since our method utilizes geographic coordinate constraints, to ensure a fair comparison, the other methods also search for corresponding points within a 4 m radius of the interest points during the 3D point matching process.
3.4. Comparative Experiment on Interest Points Extraction
When calculating deformations in geological disaster scenarios, the exact locations of deformations are unknown, necessitating the comprehensive consideration of every area within the scene. This requires the uniform distribution of interest points across the entire scene.
Figure 7 presents the results of interest point extraction by other baseline methods and the proposed method, demonstrating that the proposed method extracts more evenly distributed interest points across the entire geological landslide test area and ensures more comprehensive coverage. In contrast, the points extracted by the other methods covered partial areas and did not achieve the complete coverage of the entire region. This may be due to the limited number of points with distinct geometric features within the entire geohazard area, which are primarily concentrated in certain regions.
Table 1 provides a quantitative analysis of the extracted interest points. On the key metric of displacement point coverage, the proposed method achieves complete coverage of all checkpoints, while the other three methods cover only a small number, failing to meet the needs of deformation calculation in large-scale geological disaster scenarios. Moreover, compared to the other representative methods, the proposed method extracts the most interest points in the least amount of time, indicating improved computational efficiency.
3.5. Three-Dimensional Point Matching Comparative Experiment
To quantitatively analyze the matching accuracy of the proposed method relative to the other baselines, this section presents the experiments and evaluations conducted on the Lijie north hill landslide area dataset, based on the metrics described in Section 3.2. The evaluation results are presented in Table 2. It should be noted that, before calculating the RMSE, standard deviation, and CMR for each method, outliers were eliminated to prevent gross errors from biasing the results; the remaining homologous point pairs were then used to compute the RMSE, standard deviation, and CMR values.
As shown in Table 2, all methods completed the 3D point matching process for the test area. On the key performance metrics, namely the RMSE and standard deviation of matching errors in the XYZ directions and the CMR, the proposed method exhibited superior performance, demonstrating greater accuracy and robustness in estimating 3D deformations in the landslide area than the other baseline methods. This is because the proposed method combines texture features with shape features for 3D point matching; the inclusion of texture features allows the matching accuracy to reach the pixel level, which is difficult to achieve with methods that use only shape features.
Among the handcrafted feature descriptors, the 3DSC, SHOT, and FPFH methods showed unsatisfactory RMSE, standard deviation, and CMR values for the matched 3D homologous points, indicating numerous mismatches during the matching process. This may be due to the complex shapes of the landslide area, where manually defined descriptor features struggle to adapt to such complexity. Additionally, the presence of planar point clouds with extremely similar geometric structures in the test area made it difficult for traditional handcrafted features to extract effective shape information, also leading to mismatches. Furthermore, with a six-month interval between the two collections, the growth of weeds in localized areas could also contribute to matching errors. In contrast, the RoPS method considers multiple projections of the point cloud, providing rich information about local shapes and effectively capturing and describing complex geometric structures. When dealing with point clouds of geological landslide areas with complex geometric features, RoPS is able to distinguish different local shapes and structures effectively, and it thus significantly outperforms the other three handcrafted feature descriptors in terms of the RMSE, standard deviation, and CMR metrics. The SpinNet method did not fully realize the potential of deep learning networks in geological landslide scenarios, possibly because the features of the data used to train the network differ significantly from those present in geological landslide scenarios, making it difficult to generalize the method directly to such settings. If training data from geological landslide scenarios were available and the network were fine-tuned (e.g., adjusting the descriptor radius and the numbers of radial, azimuthal, and elevation partitions) so that SpinNet could better adapt to such scenarios, its performance would likely improve.
Figure 8 displays the visualization results of 3D homologous point matching in the Lijie north hill test area. In the visualization, green lines represent correctly matched point pairs, while red lines indicate incorrectly matched point pairs. Given the inherent difficulty of 3D homologous point matching, a matched point pair is considered incorrect if the absolute value of the matching error in any of the XYZ directions exceeds 10 cm [53].
From Figure 8, it can be seen that the majority of the point pairs matched by the proposed method are represented by green lines, with only a few inconspicuous red lines, indicating that the method’s accuracy in 3D homologous point matching is very high and meets the requirements for precise 3D deformation estimation in large geological landslide areas. In contrast, the results of the 3DSC, SHOT, and FPFH methods show a significantly larger number of red lines, indicating numerous mismatches and higher error values. The RoPS method produces fewer red lines than the other handcrafted descriptor methods, demonstrating its advantage in capturing the complex geometric information of the geohazard area. Although the SpinNet method, which utilizes deep learning-based feature descriptors, theoretically has superior capabilities for extracting geometric information, the complex and variable geometric features in the test area differ significantly from its training data; thus, it does not demonstrate a clear advantage in 3D point matching, and its number of red lines is also relatively high.
Figure 9 displays the results of point matching in a local area of the Lijie north hill using the proposed method and the baseline methods. A higher overlap between the yellow and red points indicates greater accuracy in the point matching process. It can be seen that the points matched by the proposed method almost completely coincide with the ground truth points, demonstrating the high accuracy of the proposed method. Similarly, the RoPS method also exhibits relatively high accuracy, while the SpinNet, SHOT, and FPFH methods show a reduced overlap between the matched points and the ground truth points, indicating lower matching accuracy than the proposed and RoPS methods. The homologous points matched by the 3DSC method lie at a greater distance from the ground truth points, indicating poorer matching accuracy.
Figure 10 displays the heat maps of the distribution of residual errors for 3D homologous point matching across the XYZ directions, using different methods. The horizontal axis represents the magnitude of the matching residuals, while different colors indicate the percentage of matched points within that residual range relative to the total number of matches.
It is evident that the residuals obtained by the proposed method are predominantly concentrated within the 0–0.02 m range, a distribution that is significantly better than those of the other baseline methods. The residual distributions of the handcrafted feature descriptor methods 3DSC, SHOT, and FPFH are less favorable, with most residuals concentrated between 0.08 and 0.2 m and the remainder spread across other intervals. This reflects the suboptimal robustness of these methods in 3D matching: the complex geometric structures of the terrain in the test area, coupled with some geometrically nondescript planar areas, lead to unstable matching performance. In some areas these methods may adapt well, while in others they struggle to capture the correct feature information, resulting in larger matching residuals. The RoPS method shows a notably better overall residual distribution in the XYZ directions than the other handcrafted feature descriptor methods, indicating higher accuracy in 3D matching. Limited by geometric feature extraction capabilities that do not generalize well to geological landslide scenarios, the residual distribution of the SpinNet method is less than ideal.
3.6. Comparison with GNSS-Measured Values
To validate the effectiveness of the proposed method, its deformation estimates were compared with the measurements from the GNSS monitoring points distributed across the Lijie north hill landslide area, which are regarded as ground truth. The locations of these GNSS sites are shown in Figure 11.
Interest points were selected at each of the five GNSS sites and matched to their 3D homologous points; the displacement distances of these points were then calculated from the matching results and compared with the measurements of the GNSS stations. The results are shown in Table 3.
The results indicate that the deformation estimates of the proposed method are closely aligned with the GNSS measurements, with an RMSE of 2.26 cm, and the monitoring errors in the XYZ directions are mostly less than 2 cm. This demonstrates that centimeter-level regional deformations can be resolved using the proposed method.