To address the issues of data redundancy, low classification accuracy, and high time complexity when using HSI and LiDAR data together, we propose the PRDRMF method. To assess the efficacy of the proposed method, experiments were conducted on four publicly available HSI and LiDAR datasets. First, the experimental environment and evaluation metrics are described in detail. Next, detailed information about the four datasets used in the experiments is provided. Finally, experiments on the four datasets compare the classification accuracy and time complexity of the proposed method with those of existing methods. Additionally, ablation experiments demonstrate the contributions of the individual data sources and of the different modules in the proposed method.
The experiments use unified evaluation metrics to assess the classification results: overall accuracy (OA), average accuracy (AA), and the Kappa coefficient. The experimental environment consists of MATLAB 2021a, Keras 2.3.1, an Nvidia RTX 3050 GPU, and 32 GB of memory.
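For reference, the sketch below shows one minimal way to compute these three metrics from ground-truth and predicted label vectors (Python/NumPy; the function name and interface are illustrative assumptions, not code from the paper):

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """OA, AA, and Kappa from integer label vectors in [0, n_classes)."""
    # Confusion matrix: C[i, j] = number of class-i samples predicted as j.
    C = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(C, (y_true, y_pred), 1)
    total = C.sum()

    # OA: fraction of all test samples that are classified correctly.
    oa = np.trace(C) / total

    # AA: mean of the per-class accuracies, weighting every class equally.
    per_class = np.diag(C) / np.maximum(C.sum(axis=1), 1)
    aa = per_class.mean()

    # Kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e),
    # where p_e is the expected agreement from the row/column marginals.
    p_e = (C.sum(axis=0) * C.sum(axis=1)).sum() / total**2
    kappa = (oa - p_e) / (1 - p_e)
    return oa, aa, kappa
```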
3.3. Comparison Experiment and Analysis
To validate the performance of the PRDRMF method, we conducted experiments on the four datasets and compared the results with those of existing methods, including SVM [56], CCNN, EndNet [57], CRNN [58], TBCNN, coupled CNN [21], CNNMRF [59], FusAtNet [60], S2ENet [61], CALC [62], Fusion-HCT [63], SepG-ResNet50 [64], and DSMSC2N [65]. For these methods, the parameter settings are described in the corresponding references.
To ensure the fairness of the experimental results, the training and test samples were kept consistent across all methods; the sample divisions are given in Table 2, Table 3, Table 4 and Table 5. Table 6, Table 7, Table 8, Table 9, Table 10, Table 11, Table 12 and Table 13 list the OA, AA, and Kappa values obtained by the different methods on the 2013 Houston, MUUFL, Trento, and 2018 Houston datasets, with the optimal values shown in bold. To visualize the classification results of the compared methods, Figure 6, Figure 7, Figure 8 and Figure 9 show the classification maps obtained by the different methods on the 2013 Houston, MUUFL, Trento, and 2018 Houston datasets. For comparison, the HSI-generated pseudo-color images, the original LiDAR maps, and the ground truth maps are also presented.
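To make this fixed-split protocol concrete, the following sketch draws the per-class training indices once, under a fixed random seed, and reuses them for every compared method; the function name, per-class count, and seed are illustrative assumptions rather than the paper's actual sampling code:

```python
import numpy as np

def stratified_split(labels, n_train_per_class=50, seed=0):
    """Return (train_idx, test_idx) with identical samples for all methods.

    labels: 1-D array of ground-truth class indices, with unlabeled
    pixels already removed. The fixed seed guarantees that every
    compared method is trained and tested on exactly the same pixels.
    """
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train_idx.append(idx[:n_train_per_class])
        test_idx.append(idx[n_train_per_class:])
    return np.concatenate(train_idx), np.concatenate(test_idx)
```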
The 2013 Houston dataset offers the broadest coverage of urban scenes with multiple feature classes and is primarily utilized to validate the performance of the PRDRMF method for the detailed classification of urban scenes in remote sensing imagery.
On the 2013 Houston dataset, the OA value of the PRDRMF method reached 99.79%. In particular, this OA value was 40.39%, 12.87%, 11.27%, 11.24%, 10.88%, 9.36%, 8.86%, 9.18%, 5.60%, 5.08%, 0.03%, 27.12%, and 8.30% higher than that of SVM, CCNN, EndNet, CRNN, TBCNN, coupled CNN, CNNMRF, FusAtNet, S2ENet, CALC, Fusion-HCT, SepG-ResNet50, and DSMSC2N, respectively. Furthermore, the classification accuracy was 100% for the healthy grass, stressed grass, and artificial grass categories, as well as for highway, railway, parking lot 1, and parking lot 2. This suggests that the multifeature extraction module provides texture information from diverse viewpoints, enabling PRDRMF to perform exceptionally well on these similar categories.
As illustrated in Figure 6d, the conventional machine learning algorithm SVM exhibits the lowest classification accuracy and yields markedly inadequate results for four categories, including highway and parking lot 1. Figure 6e shows that the highway and railway categories are misclassified. This is due to the CCNN method's inability to deal effectively with heterogeneous regions: its exclusive focus on spatial neighborhood information leads to suboptimal classification performance on the various road categories. EndNet achieves 100% classification accuracy on two categories, artificial grass and tennis court. In Figure 6g–j, it can be seen that these CNN-based methods achieve higher per-category classification accuracy, reaching 100% on several categories and recognizing the tennis court category particularly well. Notably, CRNN, TBCNN, and CNNMRF exhibited limited performance in recognizing the road and highway categories, while coupled CNN achieved a classification accuracy of only 41.11% for the water category. Compared to the aforementioned methods, the classification performance of FusAtNet shows a slight improvement; however, the results are still unsatisfactory for highways. In contrast, S2ENet demonstrates a significant enhancement in classification accuracy, achieving a perfect score of 100% for the stressed grass, artificial grass, water, tennis court, and runway categories. CALC achieved a classification accuracy of over 90% in 14 categories, the exception being the road category, where the accuracy requires improvement. The Transformer-based method Fusion-HCT classifies effectively, attaining 100% accuracy in multiple categories, whereas SepG-ResNet50 has difficulty discerning between similar categories, such as roads and grasses. DSMSC2N attains 100% accuracy in two categories, soil and tennis court, although there is scope for improvement in its recognition of highway. In the upper right corner of Figure 6q, it can be observed that the healthy grass region contains almost no internal noise, in contrast to Figure 6d–p. This is attributed to PRDRMF's use of PRTV, which eliminates redundant information.
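PRTV itself is defined in the method section of this paper; purely as a conceptual stand-in, the sketch below applies a generic total-variation smoothing step to an HSI cube to suggest how texture-level redundancy and noise can be suppressed while region boundaries are preserved (denoise_tv_chambolle from scikit-image is an assumed substitute here, not the paper's PRTV implementation):

```python
from skimage.restoration import denoise_tv_chambolle

def smooth_hsi_cube(hsi, weight=0.1):
    """Total-variation smoothing of an H x W x B cube scaled to [0, 1].

    A larger weight removes more fine texture (and noise) at the cost
    of structural detail; channel_axis=-1 smooths the spatial dimensions
    of every spectral band jointly.
    """
    return denoise_tv_chambolle(hsi, weight=weight, channel_axis=-1)
```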
The MUUFL dataset depicts small-scale neighborhood scenes within a city and is primarily utilized to validate the classification effectiveness of the PRDRMF method in localized, everyday scenes.
On the MUUFL dataset, the OA value of the PRDRMF method reached 92.21%. In particular, this OA value was 87.74%, 3.25%, 4.46%, 0.83%, 1.36%, 1.28%, 3.27%, 0.73%, 0.53%, 9.30%, 4.78%, 9.31%, and 1.04% higher than that of SVM, CCNN, EndNet, CRNN, TBCNN, coupled CNN, CNNMRF, FusAtNet, S2ENet, CALC, Fusion-HCT, SepG-ResNet50, and DSMSC2N, respectively. Furthermore, for the sidewalk and grass categories, on which all other methods performed suboptimally, the PRDRMF method achieved superior results, with accuracies of 84.40% and 92.21%, respectively. Significant misclassification is clearly evident in Figure 7d; this is attributed to the absence of spatial information in SVM and its vulnerability to noise, leading to lower classification accuracy on the MUUFL dataset. In Figure 7e, it can be seen that the sidewalk and the yellow roadside markings are misclassified as mud and sandy ground. This misclassification occurs because the CCNN method handles heterogeneous regions inadequately; since it considers only spatial neighborhood information, it struggles to discriminate accurately between similar categories. From Figure 7f, it can be observed that there are numerous noise points in the EndNet map. This is attributed to the limited learning capability of the encoder- and decoder-based feature representations in EndNet, which hinders their ability to counteract noise interference. CRNN exhibits low classification accuracy for the mostly grass category but achieves the highest classification accuracy, up to 96.97%, for the yellow roadside markings category. TBCNN, coupled CNN, CNNMRF, and other CNN-based methods consider both spatial and spectral information, reduce noise, and achieve higher classification accuracy in pixel-level remote sensing image scenes, demonstrating the highest classification accuracy in several categories. However, their focus on spectral feature similarity causes the elevation features of ground objects to be ignored, with the consequence that mostly grass is misclassified as trees. The OA of CALC is lower than that of all the aforementioned CNN-based methods. In comparison, FusAtNet and S2ENet demonstrate increased overall classification accuracy; however, their accuracy is markedly deficient for the roadside yellow curb category. The Transformer-based Fusion-HCT method exhibits suboptimal performance, particularly in identifying pavement. PRDRMF demonstrates the most accurate classification outcomes. A comparison with Figure 7e–p reveals that Figure 7q, which is most similar to the ground truth map, exhibits less classification noise and clearer boundaries. This is attributed to PRDRMF's use of PRTV, which eliminates redundant information.
The Trento dataset showcases farm scenes with fewer crop classes and is mainly utilized to validate the performance of the PRDRMF method for precise agricultural classification over extensive, relatively standardized areas.
On the Trento dataset, the OA value of the PRDRMF method reached 99.73%. In particular, this OA value was improved by 26.84%, 2.44%, 5.56%, 2.51%, 2.27%, 2.04%, 1.33%, 0.67%, 1.19%, 0.35%, 0.13%, 5.91%, and 0.80% compared to SVM, CCNN, EndNet, CRNN, TBCNN, coupled CNN, CNNMRF, FusAtNet, S2ENet, CALC, Fusion-HCT, SepG-ResNet50, and DSMSC2N, respectively. Furthermore, the classification accuracy was 100% for two similar categories, namely woods and vineyards. This suggests that the multifeature extraction module provides texture information from diverse viewpoints, enabling PRDRMF to perform exceptionally well on these similar categories. The large apple tree orchard depicted in Figure 8d is erroneously classified as vineyard because SVM lacks spatial information and is susceptible to noise interference, which can lead to misclassification. From Figure 8f, it can be seen that the total classification accuracy of EndNet is higher only than that of SVM; the many noise points in the map stem from the limited learning ability of the encoder- and decoder-based feature representation in EndNet, which hinders its effectiveness in resisting noise interference. In Figure 8e, it can be seen that the sidewalk and the yellow roadside markers are misclassified as mud and sand. This is due to the limitations of the CCNN method in handling heterogeneous regions; because it considers only spatial neighborhood information, it has difficulty discriminating accurately between similar categories. In Figure 8e,g–j, it can be seen that the total classification accuracies of these CNN-based methods are significantly higher. Specifically, CRNN and TBCNN achieve 100% classification accuracy for both the ground and vineyard categories, while CNNMRF demonstrates the highest classification accuracy for the three categories of apple trees, woods, and vineyard. Compared to the aforementioned methods, FusAtNet and S2ENet show a slight improvement, achieving 100% accuracy in classifying the ground and woods categories, respectively. The classification maps produced by CALC and Fusion-HCT are of superior quality, exhibiting minimal noise points. In comparison, the DSMSC2N classification map displays a few noise points within the apple trees category, while the SepG-ResNet50 map is of inferior quality, displaying significant noise. In addition, there is noticeable noise in the extensive apple tree orchard depicted in Figure 8e–j,o. However, in Figure 8q, the large, well-maintained farm scene displays distinct boundaries with minimal noise. This is attributed to PRDRMF's use of PRTV, which eliminates redundant information; only a slight banding misclassification within the ground category persists.
The 2018 Houston dataset is a selection of the same seven categories as the 2013 Houston dataset, showing areas of the same location at a different point in time, and is mainly used as a multi-temporal complement to the 2013 Houston dataset. The study of this dataset demonstrates the applicability of the proposed method to multi-temporal data.
On the 2018 Houston dataset, the OA value of the PRDRMF method reached 96.93%. In particular, this OA value was 15.44%, 6.84%, 6.21%, 5.77%, 5.72%, 4.72%, 4.58%, 5.35%, 2.34%, 2.13%, 0.25%, 8.63%, and 3.38% higher than that of SVM, CCNN, EndNet, CRNN, TBCNN, coupled CNN, CNNMRF, FusAtNet, S2ENet, CALC, Fusion-HCT, SepG-ResNet50, and DSMSC2N, respectively. In addition, PRDRMF achieved 100% classification accuracy on the water category, better than its recognition of water on the 2013 Houston dataset. This is probably because the 2018 Houston dataset has only seven categories and the water category accounts for a larger share of the total samples; the simplified category set also makes the different target categories easier to distinguish.
As shown in Figure 9d, the traditional machine learning algorithm SVM has the lowest classification accuracy and yields extremely poor results for three categories, including grass healthy and residential buildings. In Figure 9e, it can be seen that the road and non-residential buildings categories are misclassified, which is due to the shortcomings of the CCNN method in dealing with heterogeneous regions and its poor classification performance on spatially neighboring categories. EndNet performs well only on the grass stressed and non-residential buildings categories; its performance on the rest needs improvement. In Figure 9g–j, it can be seen that these CNN-based methods achieve higher per-category classification accuracy, with all of them reaching 100% on the water category. TBCNN recognizes the grass healthy category well, and CNNMRF recognizes the trees category well, with a classification accuracy of 97.37%. FusAtNet, on the other hand, does not perform as well on the trees category. The improvements achieved by the above methods are modest, whereas S2ENet improves classification accuracy considerably, reaching 94.59%. CALC achieves more than 80% accuracy for eight categories. The Transformer-based Fusion-HCT method performs well and is the best among all methods for the residential buildings category. SepG-ResNet50 has a low overall classification accuracy but achieves the highest accuracy in the grass stressed category, at 98.59%. DSMSC2N classifies both the residential buildings and non-residential buildings categories with superior accuracy. Notably, the identification of grass healthy was poor for all methods on the 2018 Houston dataset, unlike on the 2013 Houston dataset. This may be because the total data volume was smaller, which weakened PRDRMF's recognition of specific categories.
Overall, it can be observed from the figures that the other methods exhibit significant noise on the four datasets and do not accurately identify the object types. In contrast, the PRDRMF method produces classification maps with fewer mislabeled pixels across the four datasets and with clearer boundaries that closely align with the corresponding ground truth. Among all the compared methods, PRDRMF demonstrates the best classification performance and is highly competitive in feature recognition when utilizing HSI and LiDAR data simultaneously.