1. Introduction
Because ground objects are complex and diverse, complex-terrain clutter is strongly inhomogeneous, which significantly affects radar target detection, communication, and navigation [1,2]. To understand how clutter varies with different parameters, numerous studies on the characteristics of complex-terrain clutter have been conducted, including several special test programs [3,4,5], theoretical algorithm-based research [6,7,8], and empirical modeling [9,10,11]; these studies have greatly advanced the understanding of complex-terrain clutter characteristics. However, unlike the scattering of single-type ground objects in a uniform terrain, which can be effectively modeled as random rough surfaces [7,12] or multilayer dielectric scattering [13,14], the influence of complex-terrain parameters on scattering is much more intricate and challenging to correlate and represent. The difficulty has two causes: the complexity and diversity of the ground objects cannot be characterized well, and the influence of the environmental factors on the scattering coefficient involves complex nonlinear relationships that are difficult to quantify. As a result, the clutter characteristics of complex terrains cannot be accurately obtained when encountering new ground object environments and test conditions.
In recent years, to break through the limitation of previous models and test results, which only categorize and qualitatively describe terrain clutter [4,10,15], the generation of mixed-terrain clutter using digital elevation maps (DEMs), land-cover classification data, image data, and other remote sensing data to assist traditional clutter models has garnered serious attention. Hellard et al. [16] simulated ground-based radar clutter using digital topographic-relief and land-cover information. Feng et al. [17] studied a low-grazing-angle-scattering model using the inductive reasoning method by combining the frequency, grazing angle, and feature type. Darrah et al. [18] investigated a model that combined digital terrain elevation data, digital feature analysis data, and the empirical model of the terrain clutter amplitude to generate an accurate site-specific clutter reflectivity map. Kurekin et al. [19] proposed a clutter map generation method based on multichannel images and digital terrain data. Capraro et al. [20] developed techniques for the registration of radar echo signals and the correction of Doppler and spatial misalignment using digital terrain classification and elevation data and applied them to improve the performance of space–time adaptive processing. Building on these results and methods, more research has been conducted in the last decade on complex-terrain clutter simulations using digital elevation data and traditional models. Xie et al. [21] proposed an improved knowledge-assisted algorithm based on the Morchin clutter model. Li et al. [22] proposed a nonuniform ground clutter simulation method based on DEMs and digital land feature classification data, which selects different scattering models according to distinctive features to improve accuracy. Oreshkina et al. [23] proposed a method to optimize the azimuthal sampling interval of digital topographic maps for radar ground clutter simulations. Luo et al. [24] proposed an optimal searching strategy based on DEMs to suppress ground clutter and achieve a fast and high-probability search for low-altitude targets. Kim et al. [25] proposed a method for synthesizing clutter based on an empirical model by creating and accumulating clutter block by block to predict clutter in unexplored environments and operational conditions.
Despite the introduction of more accurate terrain and feature classification data, previous studies still relied on the traditional empirical model, which is based on terrain and ground object information that is projected onto the corresponding scattering unit to synthesize the clutter map. The traditional model is only a brief representation of the statistical results and variation rules and does not adequately characterize a complex terrain. In addition, owing to the lack of a perfect association between the measured clutter data and geomorphological and environmental elements, these studies did not perform quantitative analyses of the influencing factors in complex terrains, leading to low accuracy.
Given the nonlinear relationship between complex-terrain clutter and environmental factors, it is crucial to adopt appropriate data-mining and analysis methods. The random forest (RF) method [26,27] is a popular machine learning (ML) algorithm. Its internal estimates can be used to measure the importance of the variables and thus to analyze their multivariate nonlinear weighted influence. The RF method has been widely used in remote sensing image classification, agricultural prediction, regression analysis, and other research [28,29,30,31,32]. In addition, clustering and classification prediction methods based on similar parameters, such as k-means clustering and the k-nearest-neighbors (KNN) algorithm, have also been applied [33,34]. More recently, several scholars have utilized deep learning (DL) networks to predict the characteristics and parameters of sea clutter [35,36,37], achieving promising results. However, there has been limited research on the application of similar techniques to ground clutter. In theory, with a sufficiently large dataset covering a comprehensive range of parameters, both deep learning and machine learning can be effective in predictive modeling. In practice, however, measured data cannot cover all possible scenarios. The intention of this article is to develop a method for estimating the clutter of similar terrains based on actual measured data. This approach aims to extrapolate the clutter characteristics to unmeasured areas with comparable terrain features, especially in situations in which costs and data availability are constraints.
In response to the above research difficulties, we constructed a multifactor-associated clutter dataset by reconstructing the geometric relationship of the measurement scene, triangulating the terrain, and matching the illumination areas. The dataset contains S-band-airborne-radar-measured clutter data, Advanced Land Observing Satellite (ALOS) DEM data with a resolution of 12.5 m downloaded from the Distributed Active Archive Center (DAAC) of the Alaska Satellite Facility (ASF) [38], land-cover classification data from Global Land 30 [39], soil composition data from the Harmonized World Soil Database (HWSD) [40], normalized difference vegetation index (NDVI) data derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) [41], and Google image data. This dataset solves the problem of quantitative parameter representation in the study of large-scene clutter characteristics and enables a better analysis and understanding. To overcome the challenges posed by the large dynamic ranges of the scattering coefficients and numerous parameters in complex terrains across extensive areas, prior-knowledge-based classification and multifactor weight coefficient estimation were explored on this dataset. On this basis, two methods were developed by combining K-means clustering, RF prediction, and the minimum Euclidean distance. The first method, named PCKRF, integrated pre-classification, weighted K-means clustering, and RF prediction. The second method, known as PCKMW, incorporated pre-classification, weighted K-means clustering, and the minimum weighted Euclidean distance. Both methods aimed to enhance the accuracy and reliability of terrain clutter estimation by leveraging the strengths of different algorithms. The pre-classification step helped to identify distinct terrain types, which were then used as inputs for the subsequent clustering and prediction steps. The weighted K-means clustering took into account the importance of different parameters, allowing for a more precise grouping of similar terrain characteristics. Evaluated against the measured data using multiple metrics, such as the mean absolute error (MAE), root-mean-squared error (RMSE), coefficient of determination (R²), and mean value error in the pulse dimension (PMVE), the PCKRF and PCKMW methods demonstrated higher accuracy than RF prediction, the minimum weighted distance (MW) method, and the K-means-clustering minimum weighted distance (KMW) method.
Despite requiring further enhancements in scatter plot regression and prediction accuracy, the current methodologies already possess significant advantages and application value in accurately recognizing the clutter characteristics of untested areas using limited data. These approaches also provide considerable support for evaluating the radar performance in complex-terrain environments where comprehensive data collection may be difficult or impossible.
The remainder of this paper is organized as follows.
Section 2 provides details on the data used and the overall method.
Section 3 shows the associative multiparameter dataset and the results of the weight analysis.
Section 4 details similar terrain estimation methods and discusses the results.
Section 5 presents the conclusions.
4. Discussion
This study is based on airborne-radar data of a large area with complex-terrain clutter combined with ALOS DEM, Global Land 30, HWSD soil distribution, MODIS NDVI, Google image, and other environmental remote sensing data. A dataset of the association and matching between the clutter and environmental parameters was constructed by reconstructing the geometric relationship of the test scene, dividing the terrain, and matching the illuminated areas. Compared with previous clutter data [22,25], the dataset constructed in this study has the characteristics of wide coverage and many ground object parameters. This dataset overcomes the problem of quantitative parameter representation in the study of large-scene clutter characteristics and enables a better analysis and understanding.
Based on this dataset, aiming to solve the practical problems of the high cost of traditional clutter testing in large areas, difficult testing in remote and complex terrain, and lack of understanding of clutter characteristics, this study attempted to use measured data and the underlying logic of similar associations of multidimensional parameters between different regions to implement clutter estimation methods for similar terrains. This included a weight analysis of the influences of the complex terrain and multiple environmental parameters on the clutter, RF regression prediction, prior-knowledge classification, clustering, and hybrid estimation using multiple-method superposition.
Figure 11 and Figure 21b show the significant differences between the predictions of the RF model in the training set, test set, and unknown regions. For the training set, the RMSE was only 3 dB, the R² was 0.8782, and the dynamic range of the scattering coefficient was from −50 to −10 dB, which are relatively good indicators given the large amount of data and the number of multidimensional parameters. For the corresponding test set, the RMSE reached 4.67 dB, the R² was 0.7, and the dynamic range was from −44 to −18 dB, which is in line with expectations. However, in the prediction of the target area, all the evaluation indexes deteriorated: the RMSE was 7 dB, the R² was only 0.09, and the scatter plot of the scattering coefficients was concentrated in the range from −40 to −20 dB. This comparison shows that the RF model alone is ineffective for clutter prediction in unknown regions.
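As an illustration of the evaluation indexes quoted above, the following minimal Python sketch computes the MAE, RMSE, and R² for a set of measured and predicted scattering coefficients in dB. It assumes the conventional definitions, in particular R² = 1 − SS_res/SS_tot; the function names and the sample values are illustrative and are not taken from this study, and the PMVE is omitted because its definition is not given in this section.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted values (dB)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-squared error between measured and predicted values (dB)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical scattering coefficients in dB (illustrative values only).
measured = [-40.0, -30.0, -20.0]
constant_guess = [-30.0, -30.0, -30.0]  # always predicts the measured mean
reversed_guess = [-20.0, -30.0, -40.0]  # trend opposite to the measurement
```

Under this definition, a predictor that always returns the measured mean gives R² = 0, so an R² of 0.09 in the target area indicates only a marginal gain over a constant estimate, and any prediction that is worse than the mean gives a negative R².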
The logic of extrapolating scattering coefficients to untested areas from data differs somewhat from that of traditional ML or DL prediction [39]. First, in traditional ML or DL prediction, the entire dataset is shuffled, most of it is used as the training set to train the model, and a small part is used as the test set to verify the model [30,31]; the test and training sets therefore come from the same data source. In this study, the data for the source and target areas were different: although the two areas had similar environments, they were not located in the same test area. Second, the high complexity of ground-cover scattering increases the data cost of ML prediction models. For example, the data of the target area were not involved in the training process for the ML prediction, and the RF scattering-coefficient prediction (Figure 21b) was not ideal because of the lack of scattering coefficients for the target area. Mixing data of the target region with data of the source region to train the ML model would improve its prediction accuracy in the target region; however, this contradicts the precondition that the scattering coefficient of the target region is unknown. Consequently, relying entirely on ML modeling to predict unknown regions would require a large number of datasets from different regions for training, which entails high costs and difficult measurements in remote areas. The purpose of this study was therefore to use existing data or a small amount of test data to achieve rapid and accurate characterization of unknown regions.
Based on data calculations, it is assumed that the closer the environmental parameters of two scattering units are, the more similar their scattering characteristics should be. Therefore, after calculating the influence weights for the source area, the MW method was used to match the scattering units of the target area to those of the source area and assign scattering coefficients accordingly (Figure 21c,d). The 2D plot is close to the measured data, but the scatter plot shows poor regression, with an RMSE of 10.83 dB and an R² of −1.154, indicating that this estimation method is not appropriate. The KMW method first divided the source data into several clusters and then applied the MW method within the corresponding cluster (Figure 21e,f). Compared with the MW, the KMW shows some improvement, but the regression is still unsatisfactory: the RMSE is 9.47 dB and the R² is −0.6472.
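The matching step of the MW method can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: it assumes that the environmental parameters have already been normalized to comparable scales and that each weight multiplies the squared difference of its parameter; the function names and the sample values are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two environmental parameter vectors.

    Assumption: each weight scales the squared difference of its parameter.
    """
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def mw_estimate(target_params, source_params, source_sigma0, weights):
    """Assign to each target unit the scattering coefficient of the source
    unit with the minimum weighted distance in environmental parameters."""
    estimates = []
    for t in target_params:
        best = min(
            range(len(source_params)),
            key=lambda i: weighted_distance(t, source_params[i], weights),
        )
        estimates.append(source_sigma0[best])
    return estimates


# Hypothetical source units: two normalized environmental parameters each,
# with measured scattering coefficients in dB.
source_params = [[0.0, 0.0], [1.0, 1.0]]
source_sigma0 = [-30.0, -20.0]

# The same target unit is matched to different source units depending on
# which parameter carries the larger weight.
first_heavy = mw_estimate([[0.0, 1.0]], source_params, source_sigma0, [0.9, 0.1])
second_heavy = mw_estimate([[0.0, 1.0]], source_params, source_sigma0, [0.1, 0.9])
```

Because every target unit inherits the value of a single source unit, two units with nearly identical environmental parameters but different measured scattering coefficients can produce large errors, which is consistent with the poor regression noted above.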
The prediction results of the RF, MW, and KMW methods are not ideal, mainly because ground-cover scattering is very complex; in particular, the weight coefficient of each parameter that affects the scattering changes significantly with the ground cover. For example, in mountainous areas, the shading coefficient carries a relatively large weight, whereas, in flat terrains, the distribution of ground objects such as towns, woodlands, and arable land is the main influencing factor. In addition, the dynamic range of the scattering coefficient in a complex terrain is large, and in many cases, scattering units with similar ground-object parameters have significantly different scattering coefficients, leading to the insufficient regression of the prediction method.
To overcome these problems, based on the previous methods, this study used prior knowledge to classify the source data into six classes, calculated the weight coefficients of each class, and performed weighted clustering. In the weighted clustering, the scattering coefficient was also used as one of the clustering parameters, and its weight was set to 30% to increase its influence in the clustering algorithm. In this way, 6 × 12 groups of data were obtained, and RF and MW predictions were performed for each group, giving the PCKRF and PCKMW methods, respectively. When determining the category of the target data, the PCKMW method uses a weighted minimum distance criterion whose weights include only those of the environmental parameters. The comparison results in Figure 21 and the evaluation indicators in Table 6 show that these two methods improved significantly on the other three. Compared with the RF, the two-dimensional plot of the PCKRF is closer to that of the measured results, and the dynamic range extends from −50 to −15 dB. The regression of the scatter plot also improved to a certain extent, with an RMSE of 6.394 dB and an R² of 0.2491, both better than those of the RF. The PCKMW method achieved an RMSE of 4.321 dB and an R² of 0.657, its mean value is very close to that of the measured results, and its PMVE is 0.726 dB; all its indicators are therefore better than those of the other methods. The comparison shows that, after pre-classification and superimposed weighted clustering, the PCKRF and PCKMW provide better clutter estimation in unknown regions. However, because the clustering assigns an increased weight to the scattering coefficient, a considerable part of the PCKMW prediction results are concentrated at a few values, which has a negative impact on the regression.
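The clustering and category-assignment steps inside one pre-classified class can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this study: all parameters, including the scattering coefficient, are assumed to be normalized to [0, 1]; the environmental weights are rescaled to share the remaining 70% of the total weight; the cluster centers are initialized from the first k rows for reproducibility; and all function names and sample values are hypothetical.

```python
def wdist2(a, b, weights):
    """Squared weighted Euclidean distance (weights scale squared differences)."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


def weighted_kmeans(rows, weights, k, iters=20):
    """Plain K-means with a weighted distance; assumes len(rows) >= k."""
    centers = [list(r) for r in rows[:k]]  # deterministic init (assumption)
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign every row to its nearest center under the weighted distance.
        labels = [
            min(range(k), key=lambda c: wdist2(r, centers[c], weights))
            for r in rows
        ]
        # Move every non-empty cluster center to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def cluster_source(env, sigma0, env_weights, sigma_weight=0.3, k=2):
    """Cluster source units on environmental parameters plus the scattering
    coefficient, which receives a fixed share (30%) of the total weight."""
    total = sum(env_weights)
    weights = [(1.0 - sigma_weight) * w / total for w in env_weights]
    weights.append(sigma_weight)
    rows = [list(e) + [s] for e, s in zip(env, sigma0)]
    return weighted_kmeans(rows, weights, k)


def assign_target(target_env, centers, env_weights):
    """Choose the cluster of a target unit from environmental parameters only,
    because the scattering coefficient of the target area is unknown."""
    n = len(env_weights)
    return min(
        range(len(centers)),
        key=lambda c: wdist2(target_env, centers[c][:n], env_weights),
    )


# Hypothetical normalized data: one environmental parameter per source unit.
env = [[0.0], [0.1], [1.0], [1.1]]
sigma0 = [0.0, 0.05, 0.9, 1.0]
centers, labels = cluster_source(env, sigma0, env_weights=[1.0], k=2)
```

Once a target unit has been assigned to a cluster, the estimate is produced within that cluster, either by an RF model trained on the cluster members (PCKRF) or by minimum weighted distance matching against them (PCKMW).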
In summary, the terrain clutter estimation method presented in this paper is at an early stage of research, and several areas warrant improvement. For example, the dataset construction relies on the statistical results of the scattering-unit parameters, whereas a scattering unit should ideally be represented as a dataset associated with multiple sub-units rather than as statistical results. In addition, DL neural networks and hyperparameter optimization algorithms [35] should be applied to improve the estimation methods in the future. Furthermore, by converting multidimensional parameter data into a two-dimensional range–pulse map, image recognition and prediction algorithms could be applied to estimate the range–pulse map of scattering coefficients. The verification dataset, derived from the target area, closely matches the test conditions and parameters of the source area and is therefore a typical case. However, further research is essential to extend the applicability to a broader range of areas and datasets.
5. Conclusions
For complex-terrain clutter over a vast territory, the complexity and diversity of the ground objects cannot be characterized well, and the influence of environmental factors on the scattering coefficient involves complex nonlinear relationships that are difficult to quantify. As a result, the clutter characteristics of complex terrain cannot be accurately obtained when encountering new ground object environments and test conditions. To address these challenges, this study undertook the following research efforts. Firstly, a clutter dataset associated with multisource environmental data was established. Secondly, through data mining employing various methods, two novel clutter estimation techniques for similar terrains were developed.
In this study, combining the measured clutter data, DEM data, land-cover classification data, soil composition data, NDVI data, and Google image data, we constructed a multifactor-associated clutter dataset by reconstructing the geometric relationship of the measurement scene, triangulating the terrain, and matching the illumination areas. There are as many as 16 types of parameters involved in this dataset, which quantitatively characterize the complex environmental information from various aspects, such as the terrain relief, shadowing effects, and surface-cover types and their distribution. Taking the dataset of area B as an example, it covers an extensive region of 90 km × 70 km and contains 691,200 rows and 17 columns. In terms of both the richness of the parameter information and the size of the covered area, the association dataset established in this study is unparalleled. This dataset not only solves the problem of the associated quantitative parameter representation of large-scene clutter, but it also provides a foundation for the development of clutter estimation methods for similar terrains.
Based on the multifactor-associated clutter dataset, to overcome the challenges posed by the large dynamic ranges of scattering coefficients and compound effects of multiple parameters on clutter, two clutter estimation methods, designated as PCKRF and PCKMW, were developed by combining multiple techniques, such as prior-knowledge-based classification, multifactor weight coefficients, K-means clustering, RF prediction, and the minimum Euclidean distance. We compared the prediction results of the PCKRF and PCKMW with the measured data from multiple aspects, including the range–pulse two-dimensional image of the scattering coefficient, the scatter plot of scattering coefficients, and the mean-scattering-coefficient curve. Multiple metrics, such as the MAE, RMSE, R², and PMVE, were used to validate the accuracies of the algorithms. The results showed that both the PCKRF and PCKMW methods predicted the scattering coefficient notably better than the RF, MW, and KMW. Specifically, compared with the RF, the two-dimensional plot of the PCKRF was closer to that of the measured results, and the dynamic range extended from −50 to −15 dB. The regression of the scatter plot also improved to a certain extent, with an RMSE of 6.394 dB and an R² of 0.2491, which were better than those of the RF. Moreover, all the indicators of the PCKMW method surpassed those of the other methods, with an RMSE of 4.321 dB, a PMVE of 0.726 dB, and an R² of 0.657.
These results illustrate that prior-knowledge-based classification and weighted coefficient calculation are particularly beneficial for overcoming the challenges posed by the large dynamic ranges of scattering coefficients and numerous parameters in complex terrains across extensive areas, ultimately enhancing the prediction accuracy. The pre-classification divides the source data into six classes with similar features, which helps to calculate more accurate weight coefficients for each data class. In the weighted clustering of the source data, the scattering coefficient is also used as one of the clustering parameters, and its weight is set to 30% to increase its influence in the clustering algorithm. In addition, by jointly considering the correlation coefficient and the RF out-of-bag parameter error, a comprehensive weight coefficient was formed to characterize the multifactor influence, which was found to give a better fit than either of these two factors individually. Through these operations, a large and complex dataset was divided into 6 × 12 clusters, each containing elements that were similar in terms of their influence on the scattering coefficient. Subsequently, by applying the RF and MW, a relatively good prediction performance was achieved. The current clutter estimation methods offer a viable option, with lower human and economic costs, for accurately recognizing clutter characteristics in complex-terrain environments where data collection may be difficult or impossible.
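One possible way to form a comprehensive weight coefficient from the two indicators is sketched below. How the two quantities are actually combined is not detailed in this section, so the normalization of each indicator and the equal-weight average (alpha = 0.5) are purely assumptions made for illustration; the out-of-bag importance scores are taken as given inputs rather than computed, and all names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def normalize(values):
    """Scale non-negative values so that they sum to one."""
    total = sum(values)
    if not total:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def combined_weights(columns, sigma0, oob_importance, alpha=0.5):
    """Blend normalized |correlation| and normalized out-of-bag importance.

    Assumption: a convex combination with mixing factor alpha; the rule used
    in this study may differ.
    """
    corr = normalize([abs(pearson(col, sigma0)) for col in columns])
    imp = normalize([max(i, 0.0) for i in oob_importance])
    return [alpha * c + (1.0 - alpha) * i for c, i in zip(corr, imp)]


# Hypothetical data: two environmental parameters observed on four units.
columns = [[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]]
sigma0 = [2.0, 4.0, 6.0, 8.0]
oob_importance = [3.0, 1.0]  # placeholder RF out-of-bag importance scores
weights = combined_weights(columns, sigma0, oob_importance)
```

The resulting weights sum to one and can be used directly in a weighted distance or weighted clustering step; the first parameter, which is both strongly correlated with the scattering coefficient and highly ranked by the out-of-bag scores, receives the larger weight.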
Although the current clutter estimation methods are effective to a certain degree, considerable time and effort are still required to refine and optimize the algorithms and further improve their accuracy, robustness, and applicability. In the future, we plan to make improvements in the following aspects. Firstly, we aim to enrich and optimize the clutter dataset associated with multiple elements, making it closer to real-world scenarios and providing more comprehensive representations of the element information. Secondly, we intend to apply deep learning networks to the data training and prediction, leveraging their ability to learn complex patterns and relationships from large datasets to enhance the accuracy of the algorithms. Thirdly, we plan to enrich the prior knowledge used in the pre-classification, making the classification of terrain features more detailed and accurate. This will enable more fine-grained classification schemes that distinguish between different types of terrain features more precisely, leading to more accurate clutter estimations. Lastly, we intend to consider broader principles for weight coefficient computation and to optimize the methodology for determining these weights. Addressing these areas is expected to further advance clutter estimation techniques.