Article

3D Point Cloud Fusion Method Based on EMD Auto-Evolution and Local Parametric Network

School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150006, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(22), 4219; https://doi.org/10.3390/rs16224219
Submission received: 15 September 2024 / Revised: 31 October 2024 / Accepted: 7 November 2024 / Published: 12 November 2024

Abstract

Although the development of high-resolution remote sensing satellite technology has made it possible to reconstruct the 3D structure of object-level features using satellite imagery, the results from a single reconstruction are often insufficient to comprehensively describe the 3D structure of the target. Therefore, developing an effective 3D point cloud fusion method can fully utilize information from multiple observations to improve the accuracy of 3D reconstruction. To this end, this paper addresses the problems of shape distortion and sparse point cloud density in existing 3D point cloud fusion methods by proposing a 3D point cloud fusion method based on Earth mover’s distance (EMD) auto-evolution and local parameterization network. Our method is divided into two stages. In the first stage, EMD is introduced as a key metric for evaluating the fusion results, and a point cloud fusion method based on EMD auto-evolution is constructed. The method uses an alternating iterative technique to sequentially update the variables and produce an initial fusion result. The second stage focuses on point cloud optimization by constructing a local parameterization network for the point cloud, mapping the upsampled point cloud in the 2D parameter domain back to the 3D space to complete the optimization. Through these two steps, the method achieves the fusion of two sets of non-uniform point cloud data obtained from satellite stereo images into a single, denser 3D point cloud that more closely resembles the true target shape. Experimental results demonstrate that our fusion method outperforms other classical comparison algorithms for targets such as buildings, planes, and ships, and achieves a fused RMSE of approximately 2 m and an EMD accuracy better than 0.5.

1. Introduction

Three-dimensional (3D) reconstruction is a critical step in the interpretation and processing of remote sensing images. Three-dimensional data can provide a more comprehensive, detailed, and accurate stereoscopic description, allowing for more intuitive and precise judgment and understanding. The 3D reconstruction of satellite remote sensing images has traditionally been within the research domain of photogrammetry, primarily relying on the results of satellite stereo image (SSI) matching combined with a rational function model to determine the spatial coordinates of object points. However, due to the resolution limitations of SSIs, the accuracy of reconstruction is often insufficient to meet the requirements for high-precision 3D descriptions of targets. With the development of deep learning technology and the use of large-scale training datasets in computer vision, it has become possible to use convolutional neural networks to reconstruct high-precision terrestrial targets in satellite remote sensing images. Sebastien et al. [1] proposed a method for 3D reconstruction of building targets based on SSIs. This method utilizes a U-net to extract the building’s polygonal outlines, combined with a digital terrain model and a digital elevation model, to accurately extract the building model. Yi et al. [2] introduced a machine learning-based method for automatic 3D building reconstruction and vectorization. This approach uses the digital surface model (DSM) and panchromatic images as inputs, combining conditional generative adversarial networks (cGANs) with semantic segmentation networks to extract building target models and construct roof polygons. Chen et al. [3] proposed a 3D extraction algorithm for object targets of SSIs, which leverages RFM constraint deformation reasoning networks and self-similar convolutions. This method integrates DSMs and SSIs to reconstruct the 3D structure of terrestrial targets.
Due to the limitations of the satellite observation perspective, the results obtained from a single pair of stereo images are often insufficient to effectively describe the complete 3D structure of the target. On the other hand, with the increasing volume of satellite image data, both the same and different satellites can perform multiple observations of the same object within a short period. This leads to multiple sets of 3D reconstruction results for the target structure. Therefore, it is necessary to develop an effective 3D point cloud fusion method, utilizing the multiple 3D reconstruction results obtained from different observations to achieve a more accurate fusion of the target’s 3D point cloud, ensuring the precision of the reconstructed 3D structure.
Three-dimensional data fusion methods are primarily divided into traditional methods and deep learning methods. Traditional methods mainly focus on 3D registration, where two sets of 3D point cloud are first unified in the same coordinate system and then fused using a certain measure or direct merging. The most classic algorithm in 3D point cloud registration is the iterative closest point (ICP) algorithm [4]. This algorithm works by finding corresponding points between the source and target point cloud, constructing a rotation and translation matrix based on these corresponding points, and transforming the source point cloud into the target point cloud’s coordinate system using the computed matrix. The error function between the transformed source point cloud and the target point cloud is then estimated. If the value of the error function exceeds a threshold, the above operations are iteratively performed until the specified error requirement is met. Based on this, Peng et al. [5] utilized ICP to register two cross-source point cloud from structure from motion (SfM) and light detection and ranging (LiDAR). However, their approach made several assumptions, including removing outliers and manually selecting targets. Zhou et al. [6] proposed a fast global registration (FGR) method to overcome the limitations of traditional ICP. This method provides precomputed fast point feature histograms (FPFHs) [7] between two surfaces, allowing dense alignment at each iteration without the need for initialization or correspondence searching while optimizing point registration. Tsin et al. [8] proposed a similar kernel-based matching approach where the registration between two sets of point cloud is formulated as maximizing kernel correlation (KC). Due to the weighted contributions of multiple linked dynamic points, the KC objective is smooth and exhibits unique convergence properties. However, this method is susceptible to noise and outliers within objects. Chen et al. [9] proposed a 3D fusion method for combining 3D data from laser scanning and SfM. By using a scale-based PCA-ICP algorithm, the scale differences between two viewpoints are eliminated, feature points are automatically extracted for precise registration, and multiple point cloud are registered using an optimized ICP method, achieving robust and accurate 3D scene reconstruction. Huang et al. [10] proposed a systematic method for registering cross-source point cloud using a scale normalization technique to eliminate scale issues and employing a new graph construction method to jointly reconstruct 3D structures from different sources. András et al. [11] introduced a point cloud blending and volumetric fusion method based on 3D tetrahedralization (3DT) ray projection, which unifies point cloud from SfM or multi-view stereo (MVS) with coarser but more complete point cloud from airborne 3D reconstruction. Hoegner et al. [12] fused point cloud reconstructed from ground-based laser scanners and RGB cameras with thermal infrared images mounted on robots for indoor 3D reconstruction.
Deep learning-based 3D data fusion methods are commonly used in 3D reconstruction across various scenarios. Typically, these methods generate the 3D structure of an entire scene by fusing multiple sets of reconstruction results (such as depth map, point cloud, or mesh) in the final step. Li et al. [13] proposed a dual-layer neural fusion network that leverages recent advances in neural implicit representation and neural rendering to fuse multi-view depth maps into a 3D structure. Jia et al. [14] developed a region attention mechanism for 3D reconstruction to learn high-quality correspondences between two coarse point cloud, which were then fused into a high-precision point cloud. Jia et al. [15] also introduced a gated network for point cloud data fusion that first predicts 3D point cloud from dual-view RGB images and then fuses two sets of point cloud obtained from different views. Liu et al. [16] proposed a method for generating 3D building models based on the fusion of 3D point cloud and mesh data. This method uses a designed multi-source 3D data quality evaluation network (MS3DQE-Net) to assess the quality of 3D meshes and point cloud and guides the fusion of 3D building models based on the evaluation results. Additionally, using deep learning to enhance the ICP algorithm for higher precision 3D registration and fusion has become a new hotspot in the field of 3D fusion in recent years. Wang et al. [17] proposed a deep closest point (DCP) network based on deep learning to address local optima and other challenges in ICP, which is used for point cloud registration. Hu et al. [18] presented a 3D point cloud enhancement optimization method that combines point cloud obtained from a binocular structured light scanner with those reconstructed by the Colmap algorithm. This method generates a more complete and higher-quality dense point cloud by integrating the coarse registration of the PointNetLK network with the fine registration of the ICP algorithm. Yookwan et al. [19] proposed a method for fusing point cloud extracted from depth values, which are directly measured by an infrared camera and estimated from RGB images of the same scene using an improved ResNet-50. They also introduced an information theory alignment strategy using cross entropy ICP for point cloud registration.
The typical 3D data fusion methods mentioned above still face the following challenges when applied to SSIs 3D data fusion:
  • SSIs data often have low resolution and limited observation angles, making it difficult to achieve the level of point cloud accuracy and density required by the typical 3D fusion methods mentioned earlier. Directly applying these methods can lead to a disordered target structure after fusion due to the insufficient accuracy and density of the original data.
  • Most of the typical fusion methods discussed above focus on constructing a measure to minimize the absolute spatial distance between the registration or fusion results and each source of data. However, minimizing absolute spatial distance does not necessarily ensure that the fused shape is closer to the true target. There may exist fusion results with a smaller absolute spatial distance but greater shape distortion. The objects considered in this paper, such as planes, ships, and buildings, have distinct inherent features in human cognition, and therefore require a higher degree of shape authenticity in the fusion results.
  • The typical fusion methods described above do not significantly increase the density of the point cloud after fusion, resulting in a substantial gap between the fused 3D structure and the true target.
To address the aforementioned challenges, this paper proposes a 3D point cloud fusion method based on Earth mover’s distance (EMD) auto-evolution and local parameterization network by combining the advantages of traditional methods and deep learning networks. The proposed method merges two sets of non-equally spaced point cloud data into a single set of 3D point cloud that more closely resemble the actual target shape and have a higher point cloud density. Our approach consists of two main stages. In the first stage, to tackle the fusion measurement problem, we introduce the EMD measure as a key metric for evaluating the fusion results. By incorporating steps such as point cloud geometric structure inheritance, point cloud auto-evolution, adaptive point cloud weighting, and symmetry constraints, we formulate an optimization problem for the EMD auto-evolution-based point cloud fusion method. Then, using an alternating iterative technique, we sequentially optimize the variables to produce preliminary fusion results. The second stage addresses the point cloud density issue. Based on the preliminary fusion results, a local parameterized network is constructed to optimize the fused point cloud. A joint loss is designed to train the network in an end-to-end manner. Specifically, the preliminary fused point cloud is parameterized into a 2D domain, where multi-scale pointwise features of the 3D point cloud are extracted and connected. After upsampling in the 2D parameter domain, the features are mapped back to 3D space through a linear transformation, yielding the optimized point cloud fusion result.
The main contributions of our work can be summarized as follows:
(1)
We propose a 3D point cloud fusion method based on EMD auto-evolution, which achieves the fusion of non-equally spaced point cloud data while ensuring that the fusion results more closely resemble the actual target shape.
(2)
We propose a point cloud optimization method based on local parameterization network that enhances point cloud density while recovering more of the points lost during the initial fusion process.
The remainder of this paper is organized as follows. Section 2 provides a detailed description of the proposed method. Section 3 presents comparative experiments and a discussion on public datasets. Conclusions are drawn in Section 4.

2. The Proposed Method

2.1. 3D Point Cloud Fusion Based on EMD Auto-Evolution

To ensure the effectiveness of 3D point cloud fusion, it is necessary to establish a target-oriented accuracy metric for 3D point cloud. Commonly used metrics for evaluating the accuracy of 3D reconstruction in photogrammetry include mean elevation deviation and root mean square error (RMSE). However, these metrics are often insufficient for effectively assessing the accuracy of the 3D structure of a target. To accurately evaluate the correctness of a target’s 3D structure, this paper introduces the EMD measure, a typical metric in the field of computer vision, as an indicator of the accuracy of 3D point cloud reconstruction. Unlike commonly used distance metrics such as Euclidean distance or Chebyshev distance, the EMD measure is suitable for evaluating dissimilarity in non-uniformly sized data. This means it can be applied to assess the accuracy of 3D reconstruction when there is an inconsistency in the number of points within the point cloud. Additionally, EMD provides a shape dissimilarity measurement that aligns with human perception, making it particularly well suited for evaluating the accuracy of the 3D structure of a target.
Considering both the EMD measure and the practical need for immediate data acquisition and precise reconstruction from satellite data, this paper first constructs a 3D point cloud fusion method based on EMD auto-evolution. The method uses the 3D reconstruction results obtained from SSIs as a basis to achieve the fusion and updating of the target 3D point cloud, thereby ensuring the accuracy of the target’s 3D reconstruction results.
This work uses $P^a = [P_1^a, \ldots, P_{2n}^a]$ to represent one target 3D point cloud (cloud a) reconstructed from a set of SSIs and $P^b = [P_1^b, \ldots, P_{2m}^b]$ to represent the point cloud of the same target (cloud b) reconstructed from a different set of SSIs. The fused target 3D point cloud is denoted by $P = [P_1, \ldots, P_k]$. The fusion algorithm must not only effectively reflect the geometric structure of the target at the current moment but also preserve the original geometric structure of the target. Given that the EMD can effectively measure the dissimilarity in the geometric structure of point clouds, a point cloud dissimilarity measurement based on EMD is embedded into the fusion algorithm to ensure structural similarity between the fused point cloud and the input point clouds. Additionally, due to the limited observation perspectives of SSIs, the target point cloud obtained from satellite observations may contain points with low confidence. To prevent low-confidence outlier points from adversely affecting the 3D fusion of the target, it is necessary to eliminate these outlier points as much as possible during the fusion process. Therefore, this paper proposes an EMD auto-evolution point cloud fusion method, which gradually approximates the geometric structure of the target point cloud during the evolution process and, based on the EMD measure, effectively reduces the influence of potential outlier points on the fusion result. The final output is a preliminary fusion result with high confidence. The framework of the proposed method is illustrated in Figure 1.

2.1.1. EMD Distance

The EMD was first introduced by Stanford University in 2000 [20], where it was demonstrated to effectively capture image dissimilarity in a manner consistent with human perception. EMD is suitable for comparing datasets of unequal length, which makes it particularly useful for assessing the similarity of point clouds when the number of points is inconsistent. EMD is often used as a metric to measure the accuracy of 3D reconstruction. When applied to compare the dissimilarity between two target point clouds, it can be expressed as the following formula:
$$F = \sum_{j=1}^{2m} \sum_{i=1}^{2n} s_{i,j} w_{i,j} \quad \text{s.t.} \quad \sum_{j=1}^{2m} s_{i,j} \le 1, \quad \sum_{i=1}^{2n} s_{i,j} \le 1, \quad \sum_{j=1}^{2m} \sum_{i=1}^{2n} s_{i,j} = \min(2m, 2n), \quad s_{i,j} \ge 0$$
where 2m and 2n denote the total number of points in the two point clouds (the coefficient 2 is introduced for ease of calculation), $w_{i,j}$ represents the cost of transporting the i-th point of the source point cloud to the j-th point of the target point cloud, and $s_{i,j}$ indicates whether the i-th point in the source point cloud corresponds to the j-th point in the target point cloud. Note that the EMD does not assume $m = n$, meaning that the numbers of points in the source and target point clouds may differ, which makes it applicable for comparing data of unequal length. The goal of this formula is to find a flow that minimizes the objective function F.
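As an illustration of this linear program, the following minimal Python sketch evaluates the EMD flow between two clouds with SciPy's linprog; the function name emd_lp and the array shapes are illustrative assumptions rather than the paper's implementation.

```python
# Minimal sketch of the EMD formulation above, solved as a transportation-style
# linear program with SciPy. Names (cloud_a, cloud_b) are illustrative.
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist

def emd_lp(cloud_a: np.ndarray, cloud_b: np.ndarray) -> float:
    """cloud_a: (Na, 3), cloud_b: (Nb, 3); returns the optimal flow cost F."""
    na, nb = len(cloud_a), len(cloud_b)
    w = cdist(cloud_a, cloud_b)            # transport costs w_ij
    c = w.ravel()                          # objective coefficients for s_ij

    # Row constraints: sum_j s_ij <= 1 for every source point i
    a_rows = np.zeros((na, na * nb))
    for i in range(na):
        a_rows[i, i * nb:(i + 1) * nb] = 1.0
    # Column constraints: sum_i s_ij <= 1 for every target point j
    a_cols = np.zeros((nb, na * nb))
    for j in range(nb):
        a_cols[j, j::nb] = 1.0

    a_ub = np.vstack([a_rows, a_cols])
    b_ub = np.ones(na + nb)
    # Total flow must equal min(Na, Nb)
    a_eq = np.ones((1, na * nb))
    b_eq = np.array([min(na, nb)], dtype=float)

    res = linprog(c, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return float(res.fun)
```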

2.1.2. Inheritance of Geometric Structure Information of Input Point Cloud

Considering that objects such as planes, ships, and buildings have distinct inherent characteristics in human cognition, simplified templates for these types of objects are constructed to ensure that the 3D data of objects obtained from different sources can be registered to the same coordinate system through 3D registration algorithms [4].
For the fused 3D point cloud $P = [P_1, \ldots, P_k]$, it needs to have a geometric structure similar to the two input point clouds $P^a = [P_1^a, \ldots, P_{2n}^a]$ and $P^b = [P_1^b, \ldots, P_{2m}^b]$. If the EMD is used to describe the geometric structure similarity of the target, it is necessary to minimize $EMD(P, P^a)$ and $EMD(P, P^b)$, as shown in the following formula:
$$F^{*} = \sum_{j=1}^{k} \sum_{i=1}^{2n} s_{i,j}^{a} w_{i,j}^{a} + \lambda \times \sum_{j=1}^{k} \sum_{i=1}^{2m} s_{i,j}^{b} w_{i,j}^{b} \quad \text{s.t.} \quad \sum_{i=1}^{2n} s_{i,j}^{a} = 1, \quad \sum_{i=1}^{2m} s_{i,j}^{b} = 1, \quad 1 \le \sum_{j=1}^{k} \left( s_{i,j}^{a} + s_{i,j}^{b} \right) \le 2, \quad s_{i,j}^{a} \ge 0, \ s_{i,j}^{b} \ge 0$$
where $\lambda$ is a weight factor, $s_{i,j}^{a}$ represents the correspondence between the i-th point of reconstruction result a and the j-th fused point, $s_{i,j}^{b}$ represents the correspondence between the i-th point of reconstruction result b and the j-th fused point, $w_{i,j}^{a}$ represents the Euclidean distance between the i-th point of reconstruction result a and the j-th fused point, and $w_{i,j}^{b}$ represents the Euclidean distance between the i-th point of reconstruction result b and the j-th fused point. The constraints $\sum_{i=1}^{2n} s_{i,j}^{a} = 1$ and $s_{i,j}^{a} \ge 0$ ensure that the point cloud of reconstruction result a establishes an injective relationship with the fused point cloud, thereby retaining as much of the original point cloud information as possible. Similarly, the constraints $\sum_{i=1}^{2m} s_{i,j}^{b} = 1$ and $s_{i,j}^{b} \ge 0$ ensure that the point cloud of reconstruction result b establishes an injective relationship with the fused point cloud, thereby maximizing the extraction of the information in the newly obtained point cloud. The constraint $1 \le \sum_{j=1}^{k} (s_{i,j}^{a} + s_{i,j}^{b}) \le 2$ ensures that each point in the fused result has a mapping relationship with the input point clouds and that no fused point is linked to more than one point from the same original 3D reconstruction result.
Based on the above formula, the similarity between the fused point cloud P and the two input reconstruction result point cloud P a and P b can be calculated. However, relying solely on the formula cannot determine the specific values of the fused point cloud P. To obtain a fused point cloud with a geometric structure similar to the input point cloud, this paper constructs an auto-evolution point cloud fusion strategy, which iteratively updates the fused point cloud to effectively inherit the geometric structure information of the input point cloud.

2.1.3. Auto-Evolution of Point Cloud

Define the coordinates of the i-th point in the first set of reconstructed point cloud $P^a = [P_1^a, \ldots, P_{2n}^a]$ as $[P_{i,x}^{a}, P_{i,y}^{a}, P_{i,z}^{a}]$, the coordinates of the i-th point in the second set of reconstructed point cloud $P^b = [P_1^b, \ldots, P_{2m}^b]$ as $[P_{i,x}^{b}, P_{i,y}^{b}, P_{i,z}^{b}]$, and the coordinates of the j-th point in the fused point cloud $P = [P_1, \ldots, P_k]$ as $[P_{j,x}, P_{j,y}, P_{j,z}]$. To obtain the precise coordinates of the fused point cloud P, an auto-evolution point cloud fusion problem is formulated based on the evaluation of geometric structure similarity between the fused point cloud and the input point clouds, as shown in the following expression:
$$\min_{s^{a}, s^{b}, P} \ \sum_{j=1}^{k} \sum_{i=1}^{2n} s_{i,j}^{a} \left\| [P_{i,x}^{a}, P_{i,y}^{a}, P_{i,z}^{a}] - [P_{j,x}, P_{j,y}, P_{j,z}] \right\|_{2} + \lambda \times \sum_{j=1}^{k} \sum_{i=1}^{2m} s_{i,j}^{b} \left\| [P_{i,x}^{b}, P_{i,y}^{b}, P_{i,z}^{b}] - [P_{j,x}, P_{j,y}, P_{j,z}] \right\|_{2}$$
The auto-evolution point cloud fusion problem involves the variables $s^a$, $s^b$, and P. The objective of this optimization is to search for coordinates of the point cloud P such that it maintains a geometric structure similar to that of the input point clouds. Given that this is a non-convex problem, an alternating iterative strategy is employed to solve it; while the variable P is updated, the variables $s^a$ and $s^b$ are kept fixed. The term auto-evolution refers to the process in which, at each iteration, the correspondence between the automatically searched point cloud P and the input point clouds is re-established for homologous points, gradually converging to the optimal solution for the point cloud coordinates.

2.1.4. Weight Adaptation for Outlier Point Cloud

Due to errors such as incorrect stereo matching, local deformations may occur in the target point cloud. These locally deformed, or outlier, points cannot accurately reflect the geometric structure of the target. Therefore, during point cloud fusion, smaller weights should be assigned to these outlier points to mitigate their adverse effects. During the evolution of the point cloud, the correspondence between the fused point cloud and the input point clouds is established autonomously. Considering that outlier points deviate significantly from their correct coordinates, the Euclidean distance between an outlier point and its corresponding fused point will be larger than that between other corresponding point pairs. Based on this characteristic, an adaptive weighting technique is embedded in the evolution process of the point cloud to minimize the impact of outlier points on the accuracy of the fused point cloud, as shown in the following equation:
$$\min_{s^{a}, s^{b}, P, c^{a}, c^{b}} \ \sum_{i=1}^{2n} c_{i}^{a} \sum_{j=1}^{k} s_{i,j}^{a} \left\| [P_{i,x}^{a}, P_{i,y}^{a}, P_{i,z}^{a}] - [P_{j,x}, P_{j,y}, P_{j,z}] \right\|_{2} + \lambda \times \sum_{i=1}^{2m} c_{i}^{b} \sum_{j=1}^{k} s_{i,j}^{b} \left\| [P_{i,x}^{b}, P_{i,y}^{b}, P_{i,z}^{b}] - [P_{j,x}, P_{j,y}, P_{j,z}] \right\|_{2} \quad \text{s.t.} \quad (c^{a})^{T} c^{a} = 1, \ (c^{b})^{T} c^{b} = 1, \ c_{i}^{a} \ge 0, \ c_{i}^{b} \ge 0$$
where $c^a$ and $c^b$ are adaptive weighting factors. As fusion variables, they are iteratively updated during the auto-evolution of the point cloud. These factors adaptively identify outlier points and assign them smaller weights to reduce their impact on the accuracy of the fused point cloud, ensuring effective fusion of the input point clouds.

2.1.5. Point Cloud Symmetry Constraint

Many artificial structures exhibit obvious symmetry; planes, certain ships, and building targets can be considered symmetric about their principal planes. To improve the accuracy of target fusion, symmetry constraints are introduced during the point cloud fusion process, ensuring that the fused 3D point cloud retains the inherent symmetric structure of the target. In addition, symmetry constraints reduce the number of points to be fused, thereby improving the execution speed of the point cloud fusion algorithm.
Since the input point cloud has already been aligned to a simplified template point cloud, the input point cloud is symmetric with respect to the cross-sectional plane formed by the x-axis and z-axis. Consequently, the fused point cloud should also maintain symmetry with respect to the cross-sectional plane formed by the x-axis and z-axis. Thus, the fused point cloud should satisfy the constraints shown in the following equation.
$$[P_{i,x}^{left}, P_{i,y}^{left}, P_{i,z}^{left}] = [P_{i,x}^{right}, -P_{i,y}^{right}, P_{i,z}^{right}]$$
where $[P_{i,x}^{left}, P_{i,y}^{left}, P_{i,z}^{left}]$ represents the fused point cloud coordinates for y < 0, and $[P_{i,x}^{right}, P_{i,y}^{right}, P_{i,z}^{right}]$ represents the fused point cloud coordinates for y > 0.
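As a small illustration of this constraint, the sketch below (hypothetical helper enforce_symmetry) builds a symmetric fused cloud by mirroring the left half across the x-z plane:

```python
# Illustrative sketch of enforcing the x-z plane symmetry constraint: the
# fused right half (y > 0) is obtained by mirroring the fused left half
# (y < 0), i.e., by negating the y coordinate.
import numpy as np

def enforce_symmetry(p_left: np.ndarray) -> np.ndarray:
    """p_left: (k/2, 3) fused points with y < 0; returns the full symmetric cloud."""
    p_right = p_left.copy()
    p_right[:, 1] *= -1.0          # mirror y across the x-z plane
    return np.concatenate([p_left, p_right], axis=0)
```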

2.1.6. Point Cloud Fusion Problem Based on EMD Auto-Evolution

Combining the above aspects of geometric structure information inheritance, auto-evolution of point cloud, weight adaptation for outlier point cloud, and point cloud symmetry constraint, the point cloud fusion problem based on EMD auto-evolution is formulated as follows:
$$\begin{aligned} F = {} & \sum_{i=1}^{n} c_{i}^{a,left} \sum_{j=1}^{k} s_{i,j}^{a,left} \left\| [P_{i,x}^{a,left}, P_{i,y}^{a,left}, P_{i,z}^{a,left}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2} \\ & + \lambda \times \sum_{i=1}^{m} c_{i}^{b,left} \sum_{j=1}^{k} s_{i,j}^{b,left} \left\| [P_{i,x}^{b,left}, P_{i,y}^{b,left}, P_{i,z}^{b,left}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2} \\ & + \sum_{i=n+1}^{2n} c_{i}^{a,right} \sum_{j=1}^{k} s_{i,j}^{a,right} \left\| [P_{i,x}^{a,right}, P_{i,y}^{a,right}, P_{i,z}^{a,right}] - [P_{j,x}^{right}, P_{j,y}^{right}, P_{j,z}^{right}] \right\|_{2} \\ & + \lambda \times \sum_{i=m+1}^{2m} c_{i}^{b,right} \sum_{j=1}^{k} s_{i,j}^{b,right} \left\| [P_{i,x}^{b,right}, P_{i,y}^{b,right}, P_{i,z}^{b,right}] - [P_{j,x}^{right}, P_{j,y}^{right}, P_{j,z}^{right}] \right\|_{2} \end{aligned}$$
where $s^{a,left}$, $s^{a,right}$, $s^{b,left}$, and $s^{b,right}$, respectively, represent the correspondence between the left half (i.e., y < 0) of the fused point cloud and input point cloud group a, the right half (i.e., y > 0) of the fused point cloud and input point cloud group a, the left half (y < 0) of the fused point cloud and input point cloud group b, and the right half (y > 0) of the fused point cloud and input point cloud group b.
This formula comprehensively encapsulates the four mechanisms mentioned above. As a result, the fused point cloud not only fully inherits the geometric structure of the input point cloud but also effectively prevents the adverse effects of outlier point cloud on the fusion process while preserving the inherent symmetric structure of the target point cloud. By solving this equation, the coordinates of the fused point cloud can be obtained.
Since Equation (6) is a non-convex optimization problem, an alternating iterative technique is employed to solve it effectively by sequentially updating the variables $s^a$, $s^b$, $P$, $c^a$, and $c^b$.
First, the variables s a and s b are computed while keeping the other variables fixed. At this stage, the fusion problem is transformed into a linear programming problem, which can be solved using the typical interior point method or simplex algorithm.
Next, the variable P is calculated while keeping the remaining variables fixed. The fusion problem at this stage is represented as shown in Equation (7).
$$\begin{aligned} \min_{P} \ & \sum_{i=1}^{n} c_{i}^{a} \sum_{j=1}^{k} s_{i,j}^{a,left} \left\| [P_{i,x}^{a,left}, P_{i,y}^{a,left}, P_{i,z}^{a,left}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2} \\ & + \lambda \times \sum_{i=1}^{m} c_{i}^{b} \sum_{j=1}^{k} s_{i,j}^{b,left} \left\| [P_{i,x}^{b,left}, P_{i,y}^{b,left}, P_{i,z}^{b,left}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2} \\ & + \sum_{i=n+1}^{2n} c_{i}^{a} \sum_{j=1}^{k} s_{i,j}^{a,right} \left\| [P_{i,x}^{a,right}, P_{i,y}^{a,right}, P_{i,z}^{a,right}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2} \\ & + \lambda \times \sum_{i=m+1}^{2m} c_{i}^{b} \sum_{j=1}^{k} s_{i,j}^{b,right} \left\| [P_{i,x}^{b,right}, P_{i,y}^{b,right}, P_{i,z}^{b,right}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2} \end{aligned}$$
The above equation represents an unconstrained quadratic optimization problem, whose optimal solution can be directly obtained by setting the gradient of the objective function with respect to the variable to zero. The expression for the optimal solution of the variable P is given as follows:
$$\begin{aligned} [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] = {} & \frac{1}{\sum_{i=1}^{n} c_{i}^{a} \sum_{j=1}^{k} s_{i,j}^{a,left} + \lambda \times \sum_{i=1}^{m} c_{i}^{b} \sum_{j=1}^{k} s_{i,j}^{b,left} + \sum_{i=n+1}^{2n} c_{i}^{a} \sum_{j=1}^{k} s_{i,j}^{a,right} + \lambda \times \sum_{i=m+1}^{2m} c_{i}^{b} \sum_{j=1}^{k} s_{i,j}^{b,right}} \\ & \times \Bigg( \sum_{i=1}^{n} c_{i}^{a} \sum_{j=1}^{k} s_{i,j}^{a,left} [P_{i,x}^{a,left}, P_{i,y}^{a,left}, P_{i,z}^{a,left}] + \lambda \times \sum_{i=1}^{m} c_{i}^{b} \sum_{j=1}^{k} s_{i,j}^{b,left} [P_{i,x}^{b,left}, P_{i,y}^{b,left}, P_{i,z}^{b,left}] \\ & \quad + \sum_{i=n+1}^{2n} c_{i}^{a} \sum_{j=1}^{k} s_{i,j}^{a,right} [P_{i,x}^{a,right}, P_{i,y}^{a,right}, P_{i,z}^{a,right}] + \lambda \times \sum_{i=m+1}^{2m} c_{i}^{b} \sum_{j=1}^{k} s_{i,j}^{b,right} [P_{i,x}^{b,right}, P_{i,y}^{b,right}, P_{i,z}^{b,right}] \Bigg) \end{aligned}$$
After calculating the point cloud coordinates according to Equation (8), the variables $c^a$ and $c^b$ are updated while keeping the other variables fixed. At this point, the fusion problem is transformed into Equation (9).
$$\begin{aligned} \min_{c^{a,left}, c^{b,left}, c^{a,right}, c^{b,right}} \ & \sum_{i=1}^{n} c_{i}^{a,left} \sum_{j=1}^{k} s_{i,j}^{a,left} \left\| [P_{i,x}^{a,left}, P_{i,y}^{a,left}, P_{i,z}^{a,left}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2} \\ & + \lambda \times \sum_{i=1}^{m} c_{i}^{b,left} \sum_{j=1}^{k} s_{i,j}^{b,left} \left\| [P_{i,x}^{b,left}, P_{i,y}^{b,left}, P_{i,z}^{b,left}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2} \\ & + \sum_{i=n+1}^{2n} c_{i}^{a,right} \sum_{j=1}^{k} s_{i,j}^{a,right} \left\| [P_{i,x}^{a,right}, P_{i,y}^{a,right}, P_{i,z}^{a,right}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2} \\ & + \lambda \times \sum_{i=m+1}^{2m} c_{i}^{b,right} \sum_{j=1}^{k} s_{i,j}^{b,right} \left\| [P_{i,x}^{b,right}, P_{i,y}^{b,right}, P_{i,z}^{b,right}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2} \end{aligned}$$
where $c^a = [c^{a,left}, c^{a,right}]$ and $c^b = [c^{b,left}, c^{b,right}]$. The fusion problem in Equation (9) can be solved using the Lagrangian dual function and the Karush–Kuhn–Tucker (KKT) conditions. The expression for the optimal solution is given in Equation (10).
$$\begin{aligned} c_{i}^{a,left} &= \frac{\sum_{j=1}^{k} s_{i,j}^{a,left} \left\| [P_{i,x}^{a,left}, P_{i,y}^{a,left}, P_{i,z}^{a,left}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2}}{\sum_{i=1}^{n} \sum_{j=1}^{k} s_{i,j}^{a,left} \left\| [P_{i,x}^{a,left}, P_{i,y}^{a,left}, P_{i,z}^{a,left}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2}^{2}} \\ c_{i}^{b,left} &= \frac{\sum_{j=1}^{k} s_{i,j}^{b,left} \left\| [P_{i,x}^{b,left}, P_{i,y}^{b,left}, P_{i,z}^{b,left}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2}}{\sum_{i=1}^{m} \sum_{j=1}^{k} s_{i,j}^{b,left} \left\| [P_{i,x}^{b,left}, P_{i,y}^{b,left}, P_{i,z}^{b,left}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2}^{2}} \\ c_{i}^{a,right} &= \frac{\sum_{j=1}^{k} s_{i,j}^{a,right} \left\| [P_{i,x}^{a,right}, P_{i,y}^{a,right}, P_{i,z}^{a,right}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2}}{\sum_{i=1}^{n} \sum_{j=1}^{k} s_{i,j}^{a,right} \left\| [P_{i,x}^{a,right}, P_{i,y}^{a,right}, P_{i,z}^{a,right}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2}^{2}} \\ c_{i}^{b,right} &= \frac{\sum_{j=1}^{k} s_{i,j}^{b,right} \left\| [P_{i,x}^{b,right}, P_{i,y}^{b,right}, P_{i,z}^{b,right}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2}}{\sum_{i=1}^{m} \sum_{j=1}^{k} s_{i,j}^{b,right} \left\| [P_{i,x}^{b,right}, P_{i,y}^{b,right}, P_{i,z}^{b,right}] - [P_{j,x}^{left}, P_{j,y}^{left}, P_{j,z}^{left}] \right\|_{2}^{2}} \end{aligned}$$
By alternately iterating and updating the variables, the point cloud fusion algorithm terminates when the number of iterations exceeds a predefined threshold or when the value of the objective function converges early, i.e., when the following condition is satisfied; the optimal fused point cloud coordinates P are then output.
$$\left\| F_{t}(s^{a}, s^{b}, c^{a}, c^{b}, P) - F_{t-1}(s^{a}, s^{b}, c^{a}, c^{b}, P) \right\|_{2} \le \varepsilon$$
For asymmetric targets, the point cloud symmetry constraint can be ignored. The variables $s^a$, $s^b$, $P$, $c^a$, and $c^b$ are still updated sequentially, and the algorithm terminates when the number of iterations exceeds the threshold or when the objective function value converges early, outputting the optimal fused point cloud coordinates P.
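For concreteness, the following simplified Python sketch illustrates the alternating update loop without the left/right symmetry split; the s-step uses a hard nearest-fused-point assignment as a cheap stand-in for the linear-programming step, while the P-step and c-step follow the closed-form structure above. All names and the initialization are illustrative assumptions, not the authors' code.

```python
# Simplified sketch of the alternating update for EMD auto-evolution fusion.
import numpy as np
from scipy.spatial.distance import cdist

def fuse_emd_auto_evolution(pa, pb, k, lam=1.0, iters=50, eps=1e-6):
    """pa: (2n, 3), pb: (2m, 3); returns a fused cloud P of shape (k, 3)."""
    rng = np.random.default_rng(0)
    p = pa[rng.choice(len(pa), k, replace=False)].copy()   # initialize fused cloud
    ca = np.full(len(pa), 1.0 / np.sqrt(len(pa)))           # adaptive weights c^a
    cb = np.full(len(pb), 1.0 / np.sqrt(len(pb)))           # adaptive weights c^b
    prev_obj = np.inf

    for _ in range(iters):
        # s-step: each input point links to its nearest fused point
        ja = cdist(pa, p).argmin(axis=1)
        jb = cdist(pb, p).argmin(axis=1)

        # P-step: closed-form weighted average of the points assigned to each j
        num = np.zeros((k, 3))
        den = np.zeros(k)
        np.add.at(num, ja, ca[:, None] * pa)
        np.add.at(den, ja, ca)
        np.add.at(num, jb, lam * cb[:, None] * pb)
        np.add.at(den, jb, lam * cb)
        keep = den > 0
        p[keep] = num[keep] / den[keep][:, None]

        # c-step: residual-based reweighting, following the structure of Equation (10)
        ra = np.linalg.norm(pa - p[ja], axis=1)
        rb = np.linalg.norm(pb - p[jb], axis=1)
        ca = ra / max(np.sum(ra ** 2), 1e-12)
        cb = rb / max(np.sum(rb ** 2), 1e-12)

        obj = np.sum(ca * ra) + lam * np.sum(cb * rb)
        if abs(prev_obj - obj) <= eps:                       # convergence check
            break
        prev_obj = obj
    return p
```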

2.2. Point Cloud Optimization Based on Local Parameterized Network

The aforementioned method enables the fusion of two sets of point clouds. However, the point clouds before fusion are derived from SSIs and are inherently sparse. As a result, the preliminary point cloud obtained through EMD auto-evolution still suffers from low density, losing many points that effectively describe the target's features. Additionally, as a simplified representation of 3D data, point clouds have limitations in visual quality compared to refined 3D products such as meshes. Therefore, it is necessary to optimize the initial fused point cloud obtained in the previous step, enhancing its density and constructing mesh connections.
For targets with simple structures, local geometric optimization methods [21,22,23] can achieve good results on smooth surfaces with few features. However, for more complex targets such as planes, ships, and buildings, the unstructured and irregular nature of point clouds makes direct point cloud upsampling infeasible. Common approaches instead extract point cloud features and perform upsampling at the feature level [24,25,26]. Although these methods can produce better results, they are largely inspired by image processing and rarely consider the target's 3D geometric shape, so various artifacts can be observed in their outputs. Moreover, these methods cannot directly produce a mesh of the optimized point cloud; a mesh model must be constructed using other mesh generation methods, which can introduce additional errors.
To address these issues, this paper proposes a point cloud optimization method based on local parameterization network. This approach enhances point cloud density while ensuring that the points lost during the initial fusion are largely recovered, enabling normal information extraction for mesh construction.
Given a point cloud $P = [P_1, \ldots, P_i, \ldots, P_k]$ with k points obtained from EMD auto-evolution fusion and a specified upsampling factor R, our goal is to generate a dense, uniformly distributed point cloud $P_R = [P_1, \ldots, P_{1+R}, \ldots, P_i, \ldots, P_{i+R}, \ldots, P_k, \ldots, P_{k+R}]$ with a corresponding normal for each point. This point cloud should contain more geometric details and as many points from $P^a$ and $P^b$ as possible. Our method is based on a local parameterization network and comprises four steps: parameterizing the 3D point cloud into a 2D domain, extracting and connecting multi-scale hierarchical point-wise features of the 3D point cloud, upsampling the point cloud in the 2D parameter domain, and mapping the 2D samples back to 3D space through a linear transformation. Local parameterization involves dividing the input sparse point cloud into multiple patches using a farthest point sampling algorithm. The fundamental theorem of the local theory of surfaces states that the local neighborhood of a point on a regular surface is fully determined by the first and second fundamental forms [27]. Therefore, our key idea is to learn a local parameterization for each point rather than computing and learning expensive global parameterizations [28].
First, the local neighborhood of a point $P_i$ in 3D space is parameterized into a 2D domain using the differential mapping $\phi: \mathbb{R}^2 \rightarrow \mathbb{R}^3$, where $\phi(0, 0) = P_i$ and $(u, v)$ represents a point in the 2D domain (as shown in Figure 2). The Jacobian matrix $J_\phi = [\phi_u, \phi_v]$ provides the best first-order approximation of the mapping $\phi$, i.e., $\phi(u, v) = \phi(0, 0) + [\phi_u, \phi_v](u, v)^T + O(u^2 + v^2)$, where $\phi_u$ and $\phi_v$ are the tangent vectors defining the first fundamental form [27]. The normal at point $P_i$ can be calculated using the cross product $N_i = \phi_u(0, 0) \times \phi_v(0, 0)$. For any point $\hat{P} = P_i + J_\phi(u_1, v_1)^T$, since $(\hat{P} - P_i) \cdot N_i = 0$, the point $\hat{P}$ lies on the tangent plane of point $P_i$. This allows a point $(u_1, v_1)$ in the 2D domain to be mapped to a point $\hat{P}$ on the tangent plane of $P_i$ in 3D space. Furthermore, we use the augmented Jacobian matrix $\hat{J} = [\phi_u, \phi_v, \phi_u \times \phi_v]$ to calculate the normal $N_i = \hat{J}(0, 0, 1)^T$ and the point $\hat{P} = P_i + \hat{J}(u_1, v_1, 0)^T$. Finally, the distance between a point $\hat{P}$ on the tangent plane and its corresponding point P on the surface is given by $\delta = \| P - \hat{P} \| = \frac{\kappa_1 u_1^2 + \kappa_2 v_1^2}{2} + O(u_1^3, v_1^3)$, where $\kappa_1$ and $\kappa_2$ are the principal curvatures at $\phi(0, 0)$, i.e., the eigenvalues of the second fundamental form. With this, we can reconstruct the local geometric relationship of point $P_i$ in the 2D parameter domain; in other words, based on the local neighborhood parameterization of $P_i$, any point $(u, v)$ in the 2D domain can be mapped to a corresponding point in 3D space.
Specifically, as shown in Figure 3, given a local patch $P = [P_1, \ldots, P_i, \ldots, P_k]$ of a 3D point cloud, the method first projects the local 3D points onto a 2D parameter domain. In this 2D domain, new points $u = [u_i^1, \ldots, u_i^r, \ldots, u_i^R]$ and $v = [v_i^1, \ldots, v_i^r, \ldots, v_i^R]$ are generated for each point in the patch. The normal $N_i = \hat{J}_i(0, 0, 1)^T$ of the patch is then calculated. The newly generated points in the 2D parameter domain are subsequently mapped onto the tangent plane of $P_i$ using $\hat{P}_i^r = P_i + \hat{J}_i(u_i^r, v_i^r, 0)^T$. Finally, by calculating the displacement distance $\delta_i^r$ along the normal direction, the points $\hat{P}_i^r$ are projected back onto the object surface in 3D space.
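The following NumPy sketch illustrates how one 2D sample is lifted back to 3D under this parameterization; j_hat, k1, and k2 are assumed inputs (the augmented Jacobian and the principal curvatures), not quantities computed here.

```python
# Sketch of the local parameterization described above: a 2D sample (u, v) is
# first mapped onto the tangent plane of P_i via the augmented Jacobian, and
# then displaced along the normal by the second-order curvature term.
import numpy as np

def map_2d_sample_to_3d(p_i, j_hat, u, v, k1, k2):
    """p_i: (3,), j_hat: (3, 3) augmented Jacobian [phi_u, phi_v, phi_u x phi_v],
    (u, v): 2D sample, k1, k2: principal curvatures at phi(0, 0)."""
    normal = j_hat @ np.array([0.0, 0.0, 1.0])          # N_i = J_hat (0, 0, 1)^T
    normal /= np.linalg.norm(normal)
    p_tangent = p_i + j_hat @ np.array([u, v, 0.0])     # point on the tangent plane
    delta = 0.5 * (k1 * u ** 2 + k2 * v ** 2)           # second-order offset
    return p_tangent + delta * normal                   # lift back onto the surface
```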

2.2.1. Multi Scale Hierarchical Feature Extraction

We apply DGCNN [29] to extract hierarchical point-wise features, which can encode both local and global intrinsic geometric information of the input blocks. The hierarchical feature learning module extracts features from low to high levels. Intuitively, as the receptive field increases, skip connections [30], which are widely used in 2D vision tasks to improve feature quality and convergence speed, help retain details at different levels. Additionally, instead of direct feature concatenation, feature reweighting through a self-gating unit is applied to enhance the features.
Let $f_i^l$ denote the features extracted for point $P_i$ at level l ($l = 1, \ldots, L$). We first concatenate all features across the L layers, i.e., $\hat{f}_i = Concat(f_i^1, \ldots, f_i^L)$, where $Concat(\cdot)$ denotes the concatenation operator. The directly concatenated features are then fed into a small multi-layer perceptron (MLP) $h_r(\cdot)$ to obtain the logits $logit_i = (logit_i^1, \ldots, logit_i^L)$, as follows:
$$logit_{i} = h_{r}(\hat{f}_{i})$$
These logits are then fed into a softmax layer to produce the weighting coefficients $w_i = (w_i^1, \ldots, w_i^L)$, as given by:
$$w_{i}^{l} = \frac{e^{logit_{i}^{l}}}{\sum_{j=1}^{L} e^{logit_{i}^{j}}}$$
Finally, the multi-scale features are represented as a weighted concatenation:
$$f_{i} = Concat(w_{i}^{1} f_{i}^{1}, \ldots, w_{i}^{L} f_{i}^{L})$$
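A minimal PyTorch sketch of this self-gating unit is given below; the hidden width of h_r and the module name are illustrative assumptions.

```python
# Minimal PyTorch sketch of the self-gating unit: per-point features from L
# levels are concatenated, a small MLP h_r predicts one logit per level, a
# softmax turns the logits into weights, and the reweighted features are
# concatenated.
import torch
import torch.nn as nn

class SelfGatedFusion(nn.Module):
    def __init__(self, channels_per_level: int, num_levels: int):
        super().__init__()
        self.num_levels = num_levels
        self.h_r = nn.Sequential(                      # small MLP producing L logits
            nn.Linear(channels_per_level * num_levels, 64),
            nn.ReLU(),
            nn.Linear(64, num_levels),
        )

    def forward(self, feats):
        # feats: list of L tensors, each of shape (N, C)
        f_hat = torch.cat(feats, dim=-1)               # direct concatenation
        logits = self.h_r(f_hat)                       # (N, L)
        w = torch.softmax(logits, dim=-1)              # per-level weights
        weighted = [w[:, l:l + 1] * feats[l] for l in range(self.num_levels)]
        return torch.cat(weighted, dim=-1)             # weighted concatenation
```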

2.2.2. Point Cloud Upsampling Based on 2D Parameter Plane

In this module, we expand the input point cloud by R times. By regressing the obtained multi-scale features, a coarse dense point cloud and the corresponding coarse normals are generated. Specifically, the expansion process consists of two steps: first, adaptive upsampling is learned in the 2D parameter domain; then, the results are projected into the 3D tangent space through a learned linear transformation. The process is illustrated in Figure 4.
For each point $P_i$, we apply an MLP $g_1(\cdot)$ to its local surface features $f_i$ to generate R new 2D coordinate points $(u_i^r, v_i^r)$, as follows:
$$(u_{i}^{r}, v_{i}^{r}) = g_{1}(f_{i})$$
where $r = 1, \ldots, R$. Furthermore, to fully utilize the information from the two pre-fusion point clouds $P^a$ and $P^b$ discussed in the previous section, the generation of the 2D coordinate points incorporates the corresponding 2D coordinates from $P^a$ and $P^b$ based on the adaptive weighting factors $c^a$ and $c^b$.
This approach allows the incorporation of points from the original point cloud before fusion that contribute to refining the target shape into the fusion result while ensuring the new 2D points are generated uniformly in the 2D domain. However, due to the transformation relationship between the 2D parameter domain and the 3D point cloud space, these newly upsampled points are non-uniformly distributed on the 3D surface, which more closely aligns with real-world scenarios.
For each point $P_i$ in the 2D parameter domain, the local surface features $f_i$ are used to predict a linear transformation matrix $\hat{J}$, given by:
$$\hat{J} = g_{2}(f_{i})$$
where $g_2(\cdot)$ denotes an MLP. $\hat{J}$ is multiplied by the 2D coordinates $(u_i^r, v_i^r)$ to map the points onto the tangent plane of $P_i$, yielding a coarse point cloud; these points lie on the tangent plane rather than on the target surface:
$$\hat{P}_{i}^{r} = P_{i} + \hat{J}_{i}(u_{i}^{r}, v_{i}^{r}, 0)^{T}$$
At the same time, we also estimate a coarse normal, specifically the normal $N_i$ of the tangent plane for each input point. To achieve this, we multiply the linear transformation matrix $\hat{J}$ by a predefined normal $(0, 0, 1)^T$ that is perpendicular to the 2D parameter domain:
$$N_{i} = \hat{J}(0, 0, 1)^{T}$$
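The expansion step can be sketched in PyTorch as follows, assuming illustrative layer widths for $g_1$ and $g_2$; the module regresses R 2D samples and a per-point linear transformation and maps the samples onto the tangent plane.

```python
# Sketch of the expansion step: an MLP g1 regresses R new 2D samples per point
# from the local feature f_i, an MLP g2 regresses the flattened 3x3 transform
# J_hat, and the samples are mapped onto the tangent plane of each point.
import torch
import torch.nn as nn

class TangentPlaneUpsampler(nn.Module):
    def __init__(self, feat_dim: int, r: int):
        super().__init__()
        self.r = r
        self.g1 = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                nn.Linear(128, 2 * r))       # (u, v) per new sample
        self.g2 = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                nn.Linear(128, 9))            # flattened J_hat

    def forward(self, points: torch.Tensor, feats: torch.Tensor):
        # points: (N, 3), feats: (N, feat_dim)
        n = points.shape[0]
        uv = self.g1(feats).view(n, self.r, 2)                # 2D samples
        zeros = torch.zeros(n, self.r, 1, device=uv.device)
        uv0 = torch.cat([uv, zeros], dim=-1)                  # (u, v, 0)
        j_hat = self.g2(feats).view(n, 3, 3)                  # per-point transform
        coarse = points[:, None, :] + torch.einsum('nij,nrj->nri', j_hat, uv0)
        e3 = torch.tensor([0.0, 0.0, 1.0], device=uv.device)
        coarse_normal = j_hat @ e3                            # N_i = J_hat (0,0,1)^T
        return coarse, coarse_normal
```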

2.2.3. Mapping 2D Samples Back to 3D Space

Since the locally expanded points on the 2D parameter plane lie on the tangent plane, they need to be moved onto the target surface (typically a curved surface) and their normals updated. Specifically, each point $\hat{P}_i^r$ is displaced along its normal $N_i$ by a distance $\delta_i^r$. This distance $\delta_i^r$ is calculated by regressing the features of each point concatenated with the coarse coordinates, as follows:
$$\delta_{i}^{r} = g_{3}(Concat(\hat{P}_{i}^{r}, f_{i}))$$
where $g_3(\cdot)$ denotes an MLP. The updated coordinates are then computed as:
$$P_{i}^{r} = \hat{P}_{i}^{r} + \hat{J}_{i}(0, 0, \delta_{i}^{r})^{T}$$
We update the normals in a similar manner; the normal displacement $\Delta N_i^r$ of point $P_i^r$ is regressed as:
$$\Delta N_{i}^{r} = g_{4}(Concat(\hat{P}_{i}^{r}, f_{i}))$$
This value is further added to the corresponding coarse normal to obtain the refined normal:
$$N_{i}^{r} = \Delta N_{i}^{r} + N_{i}$$
where $g_4(\cdot)$ denotes an MLP.
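A corresponding PyTorch sketch of the refinement step is given below, with $g_3$ and $g_4$ realized as small MLPs of assumed width.

```python
# Sketch of the refinement step: MLPs g3 and g4 regress a per-sample offset
# delta along the coarse normal direction and a normal displacement from the
# concatenation of the coarse coordinates and the point features.
import torch
import torch.nn as nn

class SurfaceRefiner(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.g3 = nn.Sequential(nn.Linear(3 + feat_dim, 128), nn.ReLU(),
                                nn.Linear(128, 1))            # scalar delta per sample
        self.g4 = nn.Sequential(nn.Linear(3 + feat_dim, 128), nn.ReLU(),
                                nn.Linear(128, 3))            # normal displacement

    def forward(self, coarse_pts, coarse_normals, feats):
        # coarse_pts: (N, R, 3), coarse_normals: (N, 3), feats: (N, feat_dim)
        r = coarse_pts.shape[1]
        feats_rep = feats[:, None, :].expand(-1, r, -1)
        x = torch.cat([coarse_pts, feats_rep], dim=-1)
        delta = self.g3(x)                                     # (N, R, 1)
        refined_pts = coarse_pts + delta * coarse_normals[:, None, :]
        refined_normals = coarse_normals[:, None, :] + self.g4(x)
        return refined_pts, refined_normals
```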

2.2.4. Losses

For the loss, we design a combined loss function to train the model end to end. Specifically, $Q_D$ is formed by selecting D points from the corresponding high-density ground truth 3D point cloud of the target, serving as the ground truth for $P_R$. During training, we use the Chamfer distance (CD) to measure the coordinate error between $P_R$ and $Q_D$, which is defined as:
$$L_{CD} = \frac{1}{D}\left( \sum_{P_{i}^{r} \in P_{R}} \left\| P_{i}^{r} - \eta(P_{i}^{r}) \right\|_{2} + \sum_{Q_{d} \in Q_{D}} \left\| Q_{d} - \psi(Q_{d}) \right\|_{2} \right)$$
where $\eta(P_{i}^{r}) = \arg\min_{Q_{d} \in Q_{D}} \left\| P_{i}^{r} - Q_{d} \right\|_{2}$ and $\psi(Q_{d}) = \arg\min_{P_{i}^{r} \in P_{R}} \left\| P_{i}^{r} - Q_{d} \right\|_{2}$.
For the normals, the ground truth normals of the points $P_i$ in the 3D point cloud are denoted as $\tilde{N}_i$, and the ground truth normals of the points in $P_R$ are denoted as $\hat{N}_i$. These are considered the ground truths for the coarse normals $N_i$ and the refined normals $N_i^r$, respectively. During training, we consider the error of both the coarse and refined normals:
$$L_{COARSE} = \sum_{i=1}^{D} L(N_{i}, \tilde{N}_{i})$$
$$L_{REFINED} = \sum_{i=1}^{D} \sum_{r=1}^{R} L(N_{i}^{r}, \hat{N}_{i})$$
where $L(N_{i}, \tilde{N}_{i}) = \max\left\{ \left\| N_{i} - \tilde{N}_{i} \right\|_{2}^{2}, \left\| N_{i} + \tilde{N}_{i} \right\|_{2}^{2} \right\}$ measures the unoriented difference between two normals. Finally, the combined loss function can be expressed as:
$$L_{ALL} = \lambda_{1} L_{CD} + \lambda_{2} L_{COARSE} + \lambda_{3} L_{REFINED}$$
where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are three positive parameters.
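The combined loss can be sketched as follows in PyTorch, using the CD and normal terms as written above; the default weights follow the values reported in Section 3.1, and all function names are illustrative.

```python
# Sketch of the combined loss: pred/gt point clouds of shapes (N, 3) and (D, 3),
# coarse/refined normals and their ground truths.
import torch

def chamfer_distance(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    d = torch.cdist(pred, gt)                     # pairwise distances (N, D)
    return (d.min(dim=1).values.sum() + d.min(dim=0).values.sum()) / gt.shape[0]

def normal_loss(n_pred: torch.Tensor, n_gt: torch.Tensor) -> torch.Tensor:
    # comparison over the two orientations, as written in the text
    diff = ((n_pred - n_gt) ** 2).sum(dim=-1)
    flip = ((n_pred + n_gt) ** 2).sum(dim=-1)
    return torch.maximum(diff, flip).sum()

def combined_loss(pred_pts, gt_pts, coarse_n, coarse_n_gt, refined_n, refined_n_gt,
                  lam1=100.0, lam2=1.0, lam3=1.0):
    return (lam1 * chamfer_distance(pred_pts, gt_pts)
            + lam2 * normal_loss(coarse_n, coarse_n_gt)
            + lam3 * normal_loss(refined_n, refined_n_gt))
```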

3. Experimental Results and Discussion

3.1. Data Sets, Metrics, and Implementation Details

  • Dataset
Testing dataset A and all training data used in this paper were self-constructed through purchases and internet collection. The 3D data used for fusion consist of 3D point clouds reconstructed from SSIs using a target 3D reconstruction algorithm [3]. The SSIs were obtained from the SuperView-1 and Jilin-1 satellites, covering four regions: China (Harbin), Japan, the United States, and Qatar. The 3D ground truth models for evaluating the fusion results were acquired through UAV oblique photogrammetry, Google Earth 3D, and internet collection.
Since there is no publicly available dataset for satellite-based 3D data fusion, we indirectly validated the fusion performance of the proposed method using a publicly available dataset for stereo matching and DSM fusion. Dataset B is a public dataset from the 2019 IEEE GRSS Data Fusion Contest and contains WorldView satellite data with a resolution of 0.35 m and DSM ground truth values obtained by LiDAR.
  • Metrics
The experiment uses the EMD to evaluate the similarity of the fusion result to the actual target shape; a smaller EMD value indicates that the fusion result is closer to the true shape. For both the real 3D point cloud model and the fused 3D point cloud, the distance between every pair of points in the point cloud is calculated, and half of the maximum distance is used as the normalization radius, with the midpoint of the corresponding point pair taken as the sphere center. All point coordinates are then shifted by the sphere center and divided by the normalization radius to obtain the normalized 3D point cloud. The EMD is then calculated as follows:
$$EMD = \min_{\varphi: \hat{P}_{R} \rightarrow \hat{P}_{truth}} \frac{1}{k \times R} \sum_{\hat{p} \in \hat{P}_{R}} \left\| \hat{p} - \varphi(\hat{p}) \right\|_{2}$$
where $\hat{P}_R$ represents the normalized fused point cloud, $\hat{P}_{truth}$ represents the normalized ground truth point cloud, $\varphi$ represents the unidirectional mapping formed by each point pair between $\hat{P}_R$ and $\hat{P}_{truth}$, and $k \times R$ denotes the number of points in the fused point cloud.
While EMD can only evaluate the accuracy of the target shape, the 3D data of ground targets obtained from satellite imagery in this study have clear physical attributes. Therefore, in addition to EMD, the root mean square error metric is added to describe the spatial positional accuracy of the fusion result.
The calculation of the RMSE metric requires selecting all N points from the fused 3D point cloud as test points and comparing them with the corresponding points in the ground truth. The calculation method is shown below:
$$RMSE = \sqrt{\frac{1}{N} \sum \left\| \tilde{P}_{R} - \tilde{P}_{truth} \right\|_{2}^{2}}$$
where P ˜ R and P ˜ t r u t h represent the coordinates of a total of N test points in the fused point cloud and their corresponding nearest neighbor points in the ground truth point cloud, respectively. When calculating point cloud accuracy, P is the 3D coordinate. When calculating DSM accuracy, P is the elevation value.
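A sketch of both metrics is given below, assuming NumPy/SciPy and using the Hungarian algorithm (linear_sum_assignment) to realize the point-to-point mapping $\varphi$; names are illustrative.

```python
# Sketch of the EMD and RMSE evaluation metrics described above.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def normalize_cloud(pts: np.ndarray) -> np.ndarray:
    d = cdist(pts, pts)
    i, j = np.unravel_index(d.argmax(), d.shape)    # farthest point pair
    center = 0.5 * (pts[i] + pts[j])                # sphere center
    radius = 0.5 * d[i, j]                          # normalization radius
    return (pts - center) / radius

def emd_metric(fused: np.ndarray, truth: np.ndarray) -> float:
    a, b = normalize_cloud(fused), normalize_cloud(truth)
    cost = cdist(a, b)
    row, col = linear_sum_assignment(cost)          # point-to-point mapping phi
    return float(cost[row, col].sum() / len(a))

def rmse_metric(fused: np.ndarray, truth: np.ndarray) -> float:
    nn_dist = cdist(fused, truth).min(axis=1)       # nearest ground-truth point
    return float(np.sqrt(np.mean(nn_dist ** 2)))
```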
  • Implementation Details
The experiments were conducted on a PC equipped with an Intel Core i7-10870H CPU, 16GB RAM, and an Nvidia RTX 2080 GPU, without using any parallel computing programs or other specialized hardware. The values of the three parameters in the combined loss function were empirically set to 100, 1, and 1, respectively. In our experiments, we focused exclusively on three types of targets: buildings, planes, and ships.

3.2. Results and Comparisons

3.2.1. Dataset A

To comprehensively demonstrate the effectiveness of the proposed algorithm, several recent algorithms similar to the proposed method were selected for comparison, including two traditional methods (ICP [4] and 3DT [11]) and two deep learning methods (DCP [17] and MS3DQE-Net [16]). For ICP and DCP, the fusion results were obtained by directly merging the registered 3D point clouds. The parameters for all comparison methods were set according to the recommendations in their respective papers, and the network models were implemented using their original architectures. Some experimental results are shown in Figure 5. Since some of the comparison methods are designed specifically for point clouds and this paper only evaluates point cloud accuracy, and considering that 3D models in mesh form are not convenient for observing point positions, only the point clouds are displayed in the result images for visual clarity.
From the experimental results shown in the figures, it is evident that the proposed method achieves better fusion performance on targets such as planes, ships, and buildings, outperforming other comparison methods in terms of point cloud density and target completeness. The quantitative evaluation in Table 1 shows that the proposed method achieves higher accuracy in both EMD and RMSE compared to other methods, indicating that the fusion results are closer to the real target in both spatial position and shape.
Figure 6 shows an enlarged version of some results from Figure 5. The first two sets of data in Figure 5 demonstrate the fusion results for two plane targets. Due to the relatively simple structure of plane targets, all methods achieved satisfactory fusion results. However, the proposed method shows superior fusion performance in the wing and tail areas, with fewer noise points, as shown by the black circle in the first set of results in Figure 6. The third and fourth sets of data illustrate the fusion results for two ship targets. Ship targets are large and structurally complex, particularly the types of ships discussed in this paper, which have a large and prominent bridge structure. Limited by the resolution of SSIs, it is difficult to completely reconstruct the overall structure of the ships using only satellite data, resulting in relatively low-quality source data for fusion. As seen by the black circle in Figure 6, the proposed method ensures that the bridge structure remains prominently raised after fusion, while some methods lose this critical detail. The last two sets of data show the fusion results for two building targets. The proposed method achieves good fusion results in flat areas such as rooftops and sides of buildings, with a significant reduction in noise points. Additionally, it effectively reconstructs unique roof structures and restores the shapes of ornamental structures more completely than other methods. This can be seen from red and black circles in the third set of results in Figure 6.
Regarding the comparison methods involved in the experiments, both ICP and DCP, despite using different approaches, fundamentally aim to register two sets of point cloud data to achieve overlapping fusion, resulting in similar accuracy in the experiments. The 3DT method was primarily proposed to handle 3D data obtained in urban areas with complementary information through different means. However, the 3D data in this paper come from the same data source, even though they are obtained from remote sensing images taken at different times. There is relatively little complementary information in the source data, which does not meet the ideal application scenario for 3DT. MS3DQE-Net is mainly designed for the fusion of LiDAR point cloud data in large-scale areas, and it is prone to errors when dealing with single targets where target information is incomplete.

3.2.2. Dataset B

Since there is no publicly available dataset for satellite-based 3D data fusion, we indirectly validated the fusion performance of our proposed method using a set of publicly available datasets for stereo matching and DSM fusion. Specifically, as shown in Figure 7, we perform stereo matching and coordinate calculation on two sets of stereo image pairs to obtain DSMs. Then, we extract the 3D models of building targets from each DSM and fuse the point clouds. The fused result is reprojected into a DSM and compared with the DSM ground truth provided in the dataset. This method allows us to indirectly validate the effectiveness of the fusion method by evaluating the accuracy of the fused point cloud in the height dimension. The stereo matching and coordinate calculation are achieved through the method proposed in reference [31].
The comparison methods used in the experiments with dataset B are the same as those for dataset A. The experimental results are shown in Figure 8, and the DSM accuracy is presented in Table 2. In Figure 8, it can be observed that the proposed method effectively fuses the auxiliary structures on top of the buildings, and as indicated by the red circles, it is closer to the ground truth. In Table 2, the proposed method achieves better accuracy compared to other methods. Although the RMSE for dataset B is calculated only in the elevation dimension, its performance is worse than in dataset A. This is because the DSM in dataset B was derived from [31], and its data quality cannot be compared to that of dataset A, which was obtained from [3]. This also demonstrates that the proposed method can achieve good results even when fusing low-quality data.

3.3. Ablation Study

To verify the effectiveness of the key modules in the proposed fusion method, an ablation study was conducted by altering or removing certain components to compare the fusion performance. The experimental results are shown in Table 3. When there is no “EMD auto-evolution” module, two sets of 3D point clouds are directly merged through registration.
From Table 3, we can draw the following conclusions: (1) EMD auto-evolution significantly improves fusion performance compared to directly merging point cloud. (2) The absence of point cloud upsampling on the 2D parameter plane not only affects the upsampling accuracy and introduces errors but also fails to recover the point cloud from the source data, resulting in greater errors. (3) While other modules do not have a decisive impact on accuracy, they all contribute to the overall performance.

3.4. Robustness Analysis

3.4.1. Noisy Data

To verify the robustness of the algorithm, Gaussian noise at different levels was added to the two point clouds obtained from SSIs, referred to as point cloud 1 and point cloud 2. The proposed fusion algorithm was then applied to fuse the two noisy point clouds. The fusion results are shown in Figure 9.
As shown in Figure 9, despite the addition of noise to the point cloud, the fusion results still maintain a good shape. The proposed algorithm remains highly effective even with such challenging data, demonstrating its robustness to noise.

3.4.2. Fusion Data Obtained from Different Data Sources

Although the number of satellites is continuously increasing, it can sometimes be challenging to obtain multiple sets of satellite data for the same region within a short period. Therefore, supplementing with other data sources can effectively enhance fusion efficiency. To this end, the point cloud obtained from optical satellite remote sensing images were fused with a coarse 3D model of the target obtained from SAR data to evaluate the effectiveness of the proposed fusion method. The optical and SAR remote sensing images used are shown in Figure 10.
Achieving high-precision 3D reconstruction of individual targets from SAR data alone remains challenging. Therefore, this study extracts 3D information with a building height estimation method based on SAR layover, which directly yields the heights of flat and simple sloped roofs [32]. The 3D models are then generated manually from the roof heights and building outlines. This approach is suitable only for a small number of regularly shaped buildings. Because these simple buildings are difficult to inspect when displayed as point clouds, the buildings in Figure 11 are presented in mesh form.
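The layover-based height estimate relies on a simple geometric relation: for a flat roof, the ground-range extent of the layover region l and the local incidence angle θ give the building height as h = l · tan θ. The function below is a minimal sketch of this first-order relation under these assumptions; the method in [32] includes refinements that are not reproduced here.

```python
import math

def building_height_from_layover(layover_ground_m, incidence_deg):
    """First-order building height estimate from SAR layover.

    layover_ground_m: layover extent measured in the ground-range direction (m).
    incidence_deg: local incidence angle of the acquisition (degrees from nadir).
    Uses h = l * tan(theta); valid for flat roofs on level terrain only.
    """
    return layover_ground_m * math.tan(math.radians(incidence_deg))

# Illustrative numbers: a 10 m layover at a 35 degree incidence angle
# corresponds to a building roughly 7 m tall.
print(building_height_from_layover(10.0, 35.0))  # ~7.0
```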
The experimental results in Figure 11 and Table 4 show that the fused building targets retain their inherent structural characteristics while providing more regular, higher-confidence building structures, resulting in a significant improvement in accuracy. The proposed method thus effectively handles the fusion of 3D data obtained from different data sources.

4. Conclusions

In this work, we propose a 3D point cloud fusion method based on EMD auto-evolution and a local parameterization network. The method consists of two stages. The first stage addresses the fusion measurement problem by introducing EMD as a key metric for evaluating the fusion results. We formulate the fusion as an optimization problem based on EMD auto-evolution that incorporates inheritance of the input point clouds' geometric structure, auto-evolution of the point cloud, adaptive point cloud weighting, and symmetry constraints. An alternating iterative technique is used to optimize the variables sequentially, producing an initial fusion result. This ensures that the fusion result more closely approximates the true target shape while allowing the fusion of point clouds of unequal size. The second stage addresses the point cloud density issue. Based on the initial fusion result, we construct a local parameterization network and design a combined loss function to train it end-to-end. By parameterizing the initial fused point cloud into a 2D domain, extracting and connecting multi-scale hierarchical point-wise features of the 3D point cloud, and resampling in the 2D parameter domain before mapping back to 3D space via a linear transformation, we produce an optimized point cloud fusion result. This approach increases point cloud density while recovering more of the points lost during the initial fusion. Experimental results demonstrate that our fusion method outperforms other classical comparison algorithms for targets such as buildings, planes, and ships. The RMSE of the fusion results reaches approximately 2 m, and the EMD accuracy is better than 0.5. Nevertheless, the proposed method still has the following limitations: (1) the fusion accuracy depends on the quality of the original data before fusion; (2) the fusion method does not take the physical characteristics of the target into account and thus lacks the use of structural information about the target. These issues will be explored in future research.

Author Contributions

Conceptualization, W.C.; methodology, W.C.; Software, W.C.; validation, W.C.; formal analysis, W.C.; investigation, S.Y.; resources, H.C.; data curation, S.Y.; writing—original draft preparation, W.C.; writing—review and editing, W.C.; visualization, S.Y.; supervision, H.C.; project administration, H.C.; funding acquisition, H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data sharing is not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tripodi, S.; Duan, L.; Poujade, V.; Trastour, F.; Bauchet, J.P.; Laurore, L.; Tarabalka, Y. Operational pipeline for large-scale 3D reconstruction of buildings from satellite images. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Waikoloa, HI, USA, 26 September–2 October 2020; pp. 445–448. [Google Scholar]
  2. Wang, Y.; Zorzi, S.; Bittner, K. Machine-learned 3D Building Vectorization from Satellite Imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, 19–25 June 2021; pp. 1072–1081. [Google Scholar]
  3. Chen, W.; Chen, H.; Yang, S. 3D model extraction network based on RFM constrained deformation inference and self-similar convolution for satellite stereo images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 11877–11885. [Google Scholar] [CrossRef]
  4. Kim, C.; Son, H.; Kim, C. Fully automated registration of 3D data to a 3D CAD model for project progress monitoring. Autom. Constr. 2013, 35, 587–594. [Google Scholar] [CrossRef]
  5. Peng, F.; Wu, Q.; Fan, L.; Zhang, J.; You, Y.; Lu, J. Street view cross-sourced point cloud matching and registration. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; pp. 2026–2030. [Google Scholar]
  6. Zhou, Q.Y.; Park, J.; Koltun, V. Fast global registration. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 766–782. [Google Scholar]
  7. Rusu, R.B.; Blodow, N.; Beetz, M. Fast point feature histograms (FPFH) for 3D registration. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan, 12–17 May 2009; pp. 3212–3217. [Google Scholar]
  8. Tsin, Y.; Kanade, T. A correlation-based approach to robust point set registration. In Proceedings of the European Conference on Computer Vision (ECCV), Prague, Czech Republic, 11–14 May 2004; pp. 558–569. [Google Scholar]
  9. Chen, H.; Feng, Y.; Yang, J.; Cui, C. 3D reconstruction approach for outdoor scene based on multiple point cloud fusion. J. Indian Soc. Remote Sens. 2019, 47, 1761–1772. [Google Scholar] [CrossRef]
  10. Huang, X.; Zhang, J.; Wu, Q.; Fan, L.; Yuan, C. A coarse-to-fine algorithm for registration in 3D street-view cross-source point clouds. In Proceedings of the International Conference on Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, Australia, 30 November–2 December 2016; pp. 1–6. [Google Scholar]
  11. Bódis-Szomorú, A.; Riemenschneider, H.; Van Gool, L. Efficient volumetric fusion of airborne and street-side data for urban reconstruction. In Proceedings of the International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 3204–3209. [Google Scholar]
  12. Hoegner, L.; Abmayr, T.; Tosic, D.; Turzer, U.; Stilla, U. Fusion of 3D point clouds with tir images for indoor scene reconstruction. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, 42, 189–194. [Google Scholar] [CrossRef]
  13. Li, K.; Tang, Y.; Prisacariu, V.A.; Torr, P.H.S. BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 6156–6165. [Google Scholar]
  14. Jia, X.; Yang, S.; Wang, Y.; Zhang, J.; Peng, Y.; Chen, S. Dual-view 3D reconstruction via learning correspondence and dependency of point cloud regions. IEEE Trans. Image Process. 2022, 31, 6831–6846. [Google Scholar] [CrossRef] [PubMed]
  15. Jia, X.; Yang, S.; Peng, Y.; Zhang, J.; Chen, S. DV-Net: Dual-view network for 3D reconstruction by fusing multiple sets of gated control point clouds. Pattern Recognit. Lett. 2020, 131, 376–382. [Google Scholar] [CrossRef]
  16. Liu, W.; Zang, Y.; Xiong, Z.; Bian, X.; Wen, C.; Lu, X.; Wang, C.; Junior, J.; Gonçalves, W.; Li, J. 3D building model generation from MLS point cloud and 3D mesh using multi-source data fusion. Int. J. Appl. Earth Obs. Geoinf. 2023, 116, 103171. [Google Scholar] [CrossRef]
  17. Wang, Y.; Solomon, J. Deep Closest Point: Learning Representations for Point Cloud Registration. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3522–3531. [Google Scholar]
  18. Hu, Q.; Wei, X.; Zhou, X.; Yin, Y.; Xu, H.; He, W.; Zhu, S. Point cloud enhancement optimization and high-fidelity texture reconstruction methods for air material via fusion of 3D scanning and neural rendering. Expert Syst. Appl. 2024, 242, 122736. [Google Scholar] [CrossRef]
  19. Yookwan, W.; Chinnasarn, K.; So-In, C.; Horkaew, P. Multimodal fusion of deeply inferred point clouds for 3D scene reconstruction using cross-entropy ICP. IEEE Access. 2022, 10, 77123–77136. [Google Scholar] [CrossRef]
  20. Rubner, Y.; Tomasi, C.; Guibas, L.J. The earth mover’s distance as a metric for image retrieval. Int. J. Comput. Vis. 2000, 40, 99–121. [Google Scholar] [CrossRef]
  21. Huang, H.; Li, D.; Zhang, H.; Ascher, U.; Cohen-Or, D. Consolidation of unorganized point clouds for surface reconstruction. ACM Trans. Graph. 2009, 28, 1–7. [Google Scholar] [CrossRef]
  22. Preiner, R.; Mattausch, O.; Arikan, M.; Pajarola, R.; Wimmer, M. Continuous projection for fast L1 reconstruction. ACM Trans. Graph. 2014, 33, 47. [Google Scholar] [CrossRef]
  23. Huang, H.; Wu, S.; Gong, M.; Cohen-Or, D.; Ascher, U.; Zhang, H. Edge-aware point set resampling. ACM Trans. Graph. 2013, 32, 1–12. [Google Scholar] [CrossRef]
  24. Yu, L.; Li, X.; Fu, C.-W.; Cohen-Or, D.; Heng, P.-A. PU-Net: Point Cloud Upsampling Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 2790–2799. [Google Scholar]
  25. Yu, L.; Li, X.; Fu, C.-W.; Cohen-Or, D.; Heng, P.-A. Ec-net: An edge-aware point set consolidation network. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 2790–2799. [Google Scholar]
  26. Wang, Y.; Wu, S.; Huang, H.; Cohen-Or, D.; Sorkine-Hornung, O. Patch-Based Progressive 3D Point Set Upsampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5951–5960. [Google Scholar]
  27. Toponogov, V.A. Differential Geometry of Curves and Surfaces; Birkhäuser-Verlag: Basel, Switzerland, 2006. [Google Scholar]
  28. Campen, M.; Bommes, D.; Kobbelt, L. Quantized global parametrization. ACM Trans. Graph. 2015, 34, 1–12. [Google Scholar] [CrossRef]
  29. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph cnn for learning on point clouds. ACM Trans. Graph. 2019, 38, 1–12. [Google Scholar] [CrossRef]
  30. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  31. Yang, S.; Chen, H.; Chen, W. Generalized Stereo Matching Method Based on Iterative Optimization of Hierarchical Graph Structure Consistency Cost for Urban 3D Reconstruction. Remote Sens. 2023, 15, 2369. [Google Scholar] [CrossRef]
  32. Giardino, G.A.; Schiavon, G.; Solimini, D. An approach for improving building height estimation from interferometric SAR data. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Melbourne, VIC, Australia, 21–26 July 2013; pp. 4269–4272. [Google Scholar]
Figure 1. Framework of 3D point cloud fusion method based on EMD auto-evolution.
Figure 2. Local parameterization and shape approximation.
Figure 3. Point cloud optimization based on local parameterized network.
Figure 4. Point cloud upsampling architecture based on 2D parameter plane.
Figure 5. 3D point fusion results of different methods in dataset A. (a) The first SSIs; (b) 3D point cloud reconstructed from the first SSIs; (c) the second SSIs; (d) 3D point cloud reconstructed from the second SSIs; (e) ground truth; (f) ICP; (g) 3DT; (h) DCP; (i) MS3DQE-Net; (j) proposed method.
Figure 6. Enlarged version of partial results. (a) ICP; (b) 3DT; (c) DCP; (d) MS3DQE-Net; (e) proposed method; (f) ground truth.
Figure 7. The process of conducting experiments using dataset B.
Figure 8. DSM projection of 3D fusion results using different methods in dataset B. (a) The first SSIs; (b) DSM obtained from the first SSIs; (c) the second SSIs; (d) DSM obtained from the second SSIs; (e) ground truth; (f) ICP; (g) 3DT; (h) DCP; (i) MS3DQE-Net; (j) proposed method.
Figure 9. Comparison of plane targets before and after fusion with different Gaussian noise overlays.
Figure 10. Optical and SAR satellite image data.
Figure 11. The fusion result of 3D data obtained from optical and SAR data sources. Corresponding to buildings 1–4 from top to bottom. (a) Optical reconstruction result; (b) SAR reconstruction result; (c) fusion result.
Table 1. Comparison to different 3D point fusion methods in dataset A.

| Method | EMD (Plane) | EMD (Ship) | EMD (Building) | RMSE (m, Plane) | RMSE (m, Ship) | RMSE (m, Building) |
| 3D point cloud reconstructed from SSIs [3] | 0.14 | 0.48 | 0.27 | 1.69 | 2.94 | 1.71 |
| ICP | 0.12 | 0.43 | 0.23 | 1.58 | 2.70 | 1.58 |
| 3DT | 0.13 | 0.46 | 0.23 | 1.62 | 2.81 | 1.64 |
| DCP | 0.11 | 0.41 | 0.23 | 1.57 | 2.69 | 1.59 |
| MS3DQE-Net | 0.11 | 0.45 | 0.24 | 1.49 | 2.86 | 1.57 |
| Proposed | 0.09 | 0.39 | 0.21 | 1.42 | 2.24 | 1.50 |
Table 2. Comparison to different 3D point fusion methods in dataset B.

| Method | RMSE (m) |
| DSM from SSIs [31] | 2.66 |
| ICP | 2.39 |
| 3DT | 2.42 |
| DCP | 2.25 |
| MS3DQE-Net | 2.18 |
| Proposed | 2.07 |
Table 3. Comparison results with different key modules.

| EMD Auto-Evolution | Multi-Scale Hierarchical Feature Extraction | Point Cloud Upsampling Based on 2D Parameter Plane | Mapping 2D Samples Back to 3D Space | EMD | RMSE (m) |
| | | | | 0.32 | 1.87 |
| | | | | 0.29 | 1.72 |
| | | | | 0.35 | 1.95 |
| | | | | 0.31 | 1.79 |
| | | | | 0.25 | 1.68 |
Table 4. Accuracy of fusion results from optical and SAR data sources.

| Data Source | EMD (Building 1) | EMD (Building 2) | EMD (Building 3) | EMD (Building 4) | RMSE (m, Building 1) | RMSE (m, Building 2) | RMSE (m, Building 3) | RMSE (m, Building 4) |
| Optical SSIs data | 0.13 | 0.14 | 0.14 | 0.17 | 1.75 | 1.82 | 1.84 | 1.65 |
| SAR data | 0.19 | 0.24 | 0.23 | 0.25 | 1.80 | 1.86 | 1.91 | 1.93 |
| Fusion results | 0.09 | 0.08 | 0.12 | 0.14 | 1.75 | 1.78 | 1.79 | 1.57 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
