Combining LiDAR, SAR, and DEM Data for Estimating Understory Terrain Using Machine Learning-Based Methods

Huang, Jiapeng; Zhang, Yue; Ding, Jianhuang

doi:10.3390/f15111992


by Jiapeng Huang ¹,*, Yue Zhang ¹ and Jianhuang Ding ²

¹ School of Geomatics, Liaoning Technical University, Fuxin 123000, China

² School of Information Engineering, Jilin Vocational College of Industry and Technology, Jilin 132013, China

* Author to whom correspondence should be addressed.

Forests 2024, 15(11), 1992; https://doi.org/10.3390/f15111992

Submission received: 19 October 2024 / Revised: 7 November 2024 / Accepted: 9 November 2024 / Published: 11 November 2024

(This article belongs to the Special Issue Precise Forestry: Forest Dynamic Change Mapping, Monitoring and Modeling)


Abstract:

Precise estimation of understory terrain remains technically challenging. To address this problem, this paper combines LiDAR, SAR, and DEM data to estimate understory terrain. The high-precision spaceborne LiDAR ICESat-2 data, validated against NEON data, are divided into training and validation sets. The training dataset is used as the dependent variable, and the SRTM DEM and Sentinel-1 SAR data are used as independent variables. A total of 13 feature parameters with high contributions are extracted to construct a Multiple Linear Regression (MLR) model, a BAGGING model, a Random Forest (RF) model, and a Long Short-Term Memory (LSTM) model. The results indicate that the RF model exhibits the highest accuracy among the four models, with R² = 0.999, RMSE = 0.701 m, and MAE = 0.249 m. Then, based on the RF model, the understory terrain at the regional scale is generated, and an accuracy assessment is performed using the validation dataset, yielding R² = 0.999, RMSE = 0.847 m, and MAE = 0.517 m. Furthermore, this paper quantitatively analyzes the effects of slope, vegetation coverage, and canopy height on the estimation accuracy of understory terrain. The results show that as slope and canopy height increase, the estimation accuracy of the RF model for understory terrain gradually decreases. Nevertheless, the accuracy of the understory terrain estimated by the RF model remains relatively stable and is not easily affected by slope, vegetation coverage, and canopy height. Research on the estimation of understory terrain holds significant practical implications for forest resource management, ecological conservation, and biodiversity protection, as well as natural disaster prevention.

Keywords:

understory terrain; ICESat-2/ATLAS; SRTM DEM; RF model; elevation accuracy

1. Introduction

Forests are the largest, most structurally complex, and most biologically diverse ecosystems on Earth and represent a crucial element in maintaining global ecological balance [1]. The term understory terrain refers to the ground surface beneath forest cover, whose complexity and diversity have a profound influence on the function and stability of forest ecosystems [2]. An accurate estimation of understory terrain can help in understanding the structure and function of forest ecosystems, providing necessary data support for scientific research in forestry [3], geology [4], environment [5], and other fields [6].

The Digital Elevation Model (DEM) is an important geographic data model [7] that can digitally simulate ground terrain using mathematical methods [8]. The DEM has been applied to various tasks, including topographic surveying and mapping [9], urban planning [10], disaster monitoring [11], and resource protection [12], providing a significant data foundation for understory terrain estimation [13]. Understory terrain estimation can be realized through photogrammetry, ground surveying, digitization of the existing topographic maps, and using the existing DEM databases [14]. In modern technology, light detection and ranging (LiDAR) [15], optical photogrammetry [16], and Interferometric Synthetic Aperture Radar (InSAR) data have become important means for understory terrain estimation [17].

Spaceborne LiDAR is a high-precision, high-resolution remote sensing measurement device [18]. In the process of understory terrain estimation, laser pulses emitted by a laser emitter form a collection of spots on the surface of the understory terrain. The receiver records the time when a laser pulse is reflected by the terrain and arrives at the receiver, saving the coordinates and elevation information on ground points, which are stored as spot data [19]. Huang et al. [20] used ICESat-2 data to evaluate the accuracy of understory terrain estimation; the RMSE under strong and weak beam conditions was 0.74 m and 0.76 m, respectively. Dong et al. [21] evaluated the performance of GEDI in estimating understory terrain. The results indicated that the RMSE for the terrain in the Xibola coniferous forest area was 2.33 m, while the RMSE for the mixed coniferous and broad-leaved forest in Maoershan was 4.49 m. Wang et al. [22] conducted in-depth validation studies on the accuracy of NEON sites within Maryland. Their results indicated close agreement between NEON data and actual surface elevation data. The aforementioned experiments all demonstrate that spot data can be used to estimate understory terrain. Furthermore, due to the high pulse repetition frequency of spaceborne LiDAR, a higher spot density can be achieved [23]. The high-density distribution of spots over understory terrain enables more accurate estimation of the terrain.

While traditional manual measurement of understory terrain is feasible, it is time-consuming and labor-intensive in complex forest environments and susceptible to human errors, resulting in limited accuracy and reliability of the measurements. In comparison, optical remote sensing data is also widely used in research on understory terrain estimation. Yang et al. [24] combined LiDAR data with optical remote sensing data to create maps of forest canopy height. Liu et al. [25] used multi-band data from TanDEM-X DEM and Sentinel-2 as input variables and high-precision understory terrain data as output variables to estimate understory terrain by RF fitting. Compared to the original TanDEM-X DEM, the extracted terrain accuracy was improved by 76% and 63% in the two experimental areas, respectively. Magruder et al. [26] synergized the ICESat-2 and Landsat 8 data to improve the SRTM DEM of understory terrain by constructing a correction model to estimate the understory terrain. The results indicated that after applying the improved model to radar measurement data, the elevation accuracy of the understory terrain in the study area was improved by nearly 50%. Optical remote sensing data is significantly influenced by weather and time. In forests, canopy occlusion, shadowing, and variations in light intensity and direction all affect the quality of remote sensing images, making it difficult to obtain accurate elevation information of the ground surface [27]. This results in challenges in acquiring topographic data under the forest canopy in densely wooded areas.

Microwave remote sensing technology, particularly synthetic aperture radar (SAR), has brought an opportunity to address the challenges of understory terrain mapping [28]. SAR technology has the advantages of large-scale coverage, high resolution, high accuracy, and all-weather monitoring capabilities, which enable it to penetrate the forest canopy and record information about the vertical structure of the forest [29]. Hu et al. [30] estimated understory terrain using a machine learning approach and the TanDEM-X InSAR, ICESat-2, and Landsat 8 data. The results showed that the RMSE of the estimated understory terrain was 5.45 m, representing an improvement of over 60% compared to the InSAR DEM with an RMSE of 14.70 m. Cai et al. [31] evaluated the accuracy of multi-source DEM data within forest areas in Maryland. Their research results indicated that SRTM had a high penetration rate in the study area. However, despite significant results achieved by the previous research, there are still certain issues that urgently need to be addressed, and they can be summarized as follows:

Although the existing research has demonstrated that ICESat-2 data can be used to estimate understory terrain, its footprint diameter of approximately 17 m and footprint spacing of 0.7 m, along with its discontinuity, limit its ability for regional-scale estimation in forest areas.
Previous studies have estimated understory terrain using areal data, which included more optical remote sensing data, thus lacking research on other types of remote sensing data.
In previous research focused on understory terrain estimation, the effects of forest structural parameters and geographical factors in the forest system on the estimation of understory terrain have not been adequately examined.

It should be noted that the strong concealment of forest-covered areas and interference factors related to the terrain and environment introduce significant challenges to understory terrain estimation [32]. Therefore, this paper selects Maryland as the study area, uses ICESat-2 data as the dependent variable, and employs both SRTM DEM and Sentinel-1 SAR data as independent variables to construct a machine learning-based model for estimating understory terrain. The performance of different models is quantitatively evaluated based on statistical accuracy metrics. Moreover, previous studies estimating understory terrain at regional scales did not fully quantify forest structural parameters and geographical factors within the forest system. To address this limitation, this study analyzes understory terrain from three dimensions, namely slope, vegetation cover, and canopy height, to explore their effects on the accuracy of understory terrain estimation.

2. Study Materials

2.1. Study Area

In this study, Maryland was selected as the study area. Maryland is located on the East Coast of the United States, within the South Atlantic region, with geographic coordinates of 37°69′–39°95′ N and 74°82′–79°71′ W. Maryland’s terrain is highly diverse, ranging from dunes and low marshlands in the east, through rolling hills, to forested mountains in the west. The highest point in Maryland is 1025 m above sea level. The primary vegetation types in this area include shrublands, forests, and herbaceous plants, with tree heights reaching up to 45 m [22,31]. Figure 1 illustrates the SRTM DEM schematic of the study area.

2.2. Study Data

2.2.1. ICESat-2/ATLAS Data

As a follow-up mission to the ICESat (https://nsidc.org/data/icesat, accessed on 12 December 2023), the ICESat-2 carries a photon-counting LiDAR named ATLAS, which is used to measure varying surface elevations [33]. Due to its photon-counting mechanism, the single-pulse energy of the laser is only 40 μJ to 120 μJ. At the same system power consumption, the payload is designed with six beams, a laser repetition frequency as high as 10 kHz, and a footprint spacing of only 0.7 m, which enables continuous detection of six swaths under the satellite. The ICESat-2’s data products include the Global Geolocated Photon Data Product (ATL03), which primarily records the latitude, longitude, and elevation of photons, and the Land and Vegetation Height Data Product (ATL08), which records attributes and other information related to photons [34]. The ICESat-2 data used in this study are from the year 2022 and are publicly available for download through NASA’s official website (https://www.earthdata.nasa.gov/, accessed on 5 February 2024). The downloaded data are presented in Appendix A.

2.2.2. SRTM DEM

The Shuttle Radar Topography Mission (SRTM) was conducted by the National Aeronautics and Space Administration (NASA) and the National Geospatial-Intelligence Agency (NGA), where interferometric synthetic aperture radar (InSAR) was used to produce digital elevation data products [35]. The objective of this mission was to obtain a global DEM by collecting and analyzing data using a C-band InSAR system. The horizontal accuracy of the SRTM is ±20 m, and its vertical accuracy is ±16 m; that is, the vertical error of the DEM is less than 16 m at a 90% confidence level [36]. The SRTM DEM version used in this study was SRTM V3, with a resolution of 30 m, referenced to the World Geodetic System 1984 (WGS-84) and the Earth Gravitational Model 1996 (EGM96). This DEM was also used to extract the slope data used in this study to analyze the influencing factors. This version of the data has been widely used in recent research; for instance, Zhu et al. [37] and Sun et al. [38] used these data in their studies. The SRTM DEM data product was downloaded from the Google Earth Engine (GEE) official website https://developers.google.cn/earth-engine/, accessed on 28 February 2024.

2.2.3. Sentinel-1 SAR

The “Sentinel” series of satellites constitutes a dedicated satellite fleet for the space segment (GSC) of the European Copernicus program (https://www.copernicus.eu/en, accessed on 5 April 2024). Sentinel-1 is a constellation consisting of two satellites, Sentinel-1A and Sentinel-1B, which were launched in 2014 and 2016, respectively. Both satellites are equipped with C-band SAR, which can provide radar imaging of the Earth’s surface under all-weather, day-and-night conditions, executing multiple imaging modes (e.g., SM, IW, EW, and WV) [39]. In this study, Sentinel-1 Interferometric Wide Swath (IW) SLC products were selected for analysis. IW is the primary acquisition mode over land, covering a swath of up to 250 km with a spatial resolution of 5 m × 20 m (single look). In this mode, three sub-swaths are captured using Terrain Observation by Progressive Scans SAR (TOPSAR). However, Sentinel-1B became inoperable due to a malfunction at the end of 2021, and it is expected to be replaced by Sentinel-1C in the near future [40]. Therefore, to avoid the impact of data temporal variability on estimation accuracy, Sentinel-1 IW-SLC data covering the study area in December 2022 were downloaded from the ASF website, https://search.asf.alaska.edu/#, accessed on 5 April 2024. The ASF website manages NASA’s Synthetic Aperture Radar (SAR) data archives, which originate from various satellites and aircraft.

2.2.4. NEON

The National Ecological Observatory Network (NEON) serves as an open source for ecological remote sensing data, and it was funded by the National Science Foundation (NSF) [41]. The NEON provides data collected from both the Terrestrial Observation System (TOS) and the Airborne Observation Platform (AOP) [42]. The TOS has been designed to conduct field monitoring of forest structures within sample plots, whereas the AOP has been tasked with collecting high-resolution and accurate aerial remote sensing data [43]. According to the NEON documentation, the aircraft is flown at an altitude of 1000 m above the ground level and has a speed of 100 knots to achieve meter-scale hyperspectral and LiDAR raster data products, along with sub-meter multispectral imagery. The NEON LiDAR is one of the AOP products used to evaluate the performance of the spaceborne LiDAR data. Liu et al. [44] used NEON as actual measurement data to evaluate the performance of spaceborne LiDAR data. The NEON data used in this study are from August 2021 and were downloaded from the NEON Science Data Hub https://data.neonscience.org/data-products/explore/, accessed on 15 May 2024.

2.2.5. Vegetation Coverage Production

Vegetation coverage, here taken from the Landsat Vegetation Continuous Fields (VCF) product, refers to the proportion of the Earth’s surface covered by vegetation canopies, and it has been commonly used to describe the density or growth status of vegetation in a particular region. According to the official website, the VCF tree cover layers provide estimates of the percentage of horizontal ground in each 30 m pixel that is covered by woody vegetation taller than 5 m. The corresponding dataset is publicly available for four periods centered on the years 2000, 2005, 2010, and 2015. This dataset is derived from the GFCC Surface Reflectance product (GFCC30SR), which is based on the enhanced Global Land Survey (GLS) datasets [45]. The GLS datasets consist of high-resolution Landsat 5 Thematic Mapper (TM) and Landsat 7 Enhanced Thematic Mapper Plus (ETM+) images with a resolution of 30 m. In this study, the Global Forest Cover Change (GFCC) tree cover multi-year global 30 m dataset was used, which was downloaded from https://developers.google.cn/earth-engine/, accessed on 25 May 2024.

2.2.6. Canopy Height Production

As an important component of forest vertical structure parameters, the distribution of forest canopy height has a certain influence on the estimation and research of underlying forest terrain [46]. The Global Land Analysis and Discovery (GLAD) research team at the Department of Geography at the University of Maryland has integrated canopy height data obtained from GEDI LiDAR and Landsat time-series data. These combined data are referenced to the WGS-84 coordinate system and have a spatial resolution of 30 m. The canopy height data can be obtained from https://developers.google.cn/earth-engine, accessed on 19 June 2024.

3. Methods

This study focuses on the estimation of understory terrain and develops a method that combines LiDAR, DEM, and SAR data for this purpose. In this study, the airborne remote sensing data obtained from the NEON were used to validate the accuracy of understory terrain derived from spaceborne LiDAR data from the ICESat-2. Within the coverage area of the spaceborne LiDAR data, the ICESat-2 data were divided into training and validation datasets, containing 70% and 30% of the overall data, respectively. The training dataset served as a dependent variable, whereas the DEM data and SAR data were used as independent variables to construct multiple machine learning models. These models were quantitatively evaluated using accuracy metrics (R², RMSE, MAE). Subsequently, the machine learning model with the highest accuracy was employed to extrapolate the understory terrain at the regional scale, ultimately completing the estimation of the understory terrain. Figure 2 illustrates the flowchart of the method used in this experiment to estimate understory terrain.
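The 70%/30% partition described above can be sketched as follows. This is only a minimal illustration: the text does not state how the ICESat-2 points were assigned to each set, so the random shuffle, the `seed` value, and the function name `split_train_validation` are assumptions.

```python
import random


def split_train_validation(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of `samples` and split it into training and validation lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Toy stand-in for ICESat-2 ground points (here just integer identifiers).
points = list(range(100))
train, validation = split_train_validation(points)
```

Every point ends up in exactly one of the two lists, so the validation set stays independent of the data used for model fitting.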

3.1. Validation of ICESat-2’s Accuracy by NEON

To assess the performance of the spaceborne LiDAR data obtained from the ICESat-2 effectively, this study used reliable and higher-resolution observational data covering various forest ecosystems to validate the elevation accuracy. The dataset from the NEON was used as a reference for evaluation. This dataset was characterized by its comprehensiveness, openness, large scale, and long-term nature [47], making it suitable for applications in various fields, including ecological research, environmental monitoring, and climate change studies.

To quantify the elevation accuracy of the spaceborne LiDAR data ICESat-2 for understory terrain, this study employed the digital terrain model (DTM) from the NEON LiDAR for accuracy validation of ICESat-2. The NEON DTM data were compared with the ICESat-2 data for understory terrain to assess the elevation accuracy of ICESat-2. First, the NEON DTM data were obtained from the downloaded NEON LiDAR dataset, including data on areas covered by spaceborne LiDAR. It was ensured that the NEON LiDAR data had sufficient elevation accuracy and resolution for effective comparison with ICESat-2 data. Second, the NEON DTM dataset and the spaceborne LiDAR ICESat-2 data were spatially matched to ensure precise alignment in terms of geographical coordinates. For each matched point, the correlation between the NEON DTM elevation and the ICESat-2 elevation was calculated, and the accuracy of ICESat-2 was evaluated using accuracy metrics.

3.2. Data Preprocessing

3.2.1. Spaceborne LiDAR Data Preprocessing

The feature parameters were extracted from the spaceborne LiDAR data ICESat-2. The ICESat-2 data products included the Global Geolocated Photon data product (ATL03) and the Land and Vegetation Height data product (ATL08). From the ATL03 data product, parameter information on photon events in the study area was extracted, and it included longitude (lon_ph), latitude (lat_ph), elevation information (h_ph), a unique identifier for each 20-m segment (segment_id), and geoid correction information (geoid). From the ATL08 data product, parameter information, including longitude (longitude), latitude (latitude), the best-fit terrain elevation within a 20-m range (h_te_best_fit_20m), terrain elevation uncertainty (h_te_uncertainty), and the 20-m segment ID corresponding to the photon (ph_segment_id), was obtained. The ATL03 recorded photon coordinates, whereas the ATL08 recorded photon attributes. These two datasets were associated through their segment identifiers and photon indices to match the photon coordinates in ATL03 with the classification attributes in ATL08 [20]. First, by comparing and matching the ph_segment_id in ATL08 with the segment_id in ATL03, the corresponding photon segments were determined. Once the corresponding segments were identified, the sequence number in ATL03 of each photon classified in ATL08 was calculated [48]. This was achieved by adding the classed_pc_indx from ATL08 to the starting photon sequence number of the corresponding segment in ATL03; since indexing typically starts from zero, one was subtracted from the result. Using the calculated photon sequence numbers, the corresponding photons in ATL03 were located, and their longitude, latitude, elevation, and other related data were extracted. These data were combined with the classification data from ATL08 for subsequent analysis and research.
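The index arithmetic used to link ATL08 photons back to ATL03 can be illustrated with a small sketch. It assumes that the starting photon sequence number of each segment (called `atl03_ph_index_beg` here, a name not used in the text) and `classed_pc_indx` are both 1-based, so one is subtracted to obtain the 1-based photon number and one more to obtain a zero-based Python list position. The arrays are toy values, not real granule contents.

```python
def match_atl08_to_atl03(atl03_segment_id, atl03_ph_index_beg,
                         atl08_ph_segment_id, atl08_classed_pc_indx):
    """Return zero-based ATL03 photon positions for each ATL08-classified photon.

    Assumption: both index fields are 1-based; None marks photons whose
    segment is missing or empty.
    """
    first_photon = dict(zip(atl03_segment_id, atl03_ph_index_beg))
    positions = []
    for seg, idx_in_seg in zip(atl08_ph_segment_id, atl08_classed_pc_indx):
        beg = first_photon.get(seg)
        if beg is None or beg <= 0:  # segment absent or holding no photons
            positions.append(None)
            continue
        one_based = beg + idx_in_seg - 1   # photon number within the ATL03 beam
        positions.append(one_based - 1)    # shift to a zero-based list position
    return positions


# Toy data: segment 100 holds photons 1-3, segment 101 photons 4-8, segment 102 photons 9-10.
atl03_segment_id = [100, 101, 102]
atl03_ph_index_beg = [1, 4, 9]
h_ph = [10.0, 10.2, 10.1, 11.0, 11.3, 11.1, 11.2, 11.4, 12.0, 12.1]

atl08_ph_segment_id = [100, 101, 101, 102]
atl08_classed_pc_indx = [2, 1, 5, 2]

positions = match_atl08_to_atl03(atl03_segment_id, atl03_ph_index_beg,
                                 atl08_ph_segment_id, atl08_classed_pc_indx)
matched_h = [h_ph[p] for p in positions]
```

The same positions can be used to pull `lon_ph` and `lat_ph`, which is how the classified photons obtain their coordinates.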

Next, the correction of geoid information for spaceborne LiDAR data ICESat-2 was performed. Based on the official ATLAS data documentation, the geoid parameter value from ATL03 was associated with the h_te_best_fit_20m parameter from ATL08 using the latitude and longitude information from ICESat-2/ATLAS data. Then, the elevation calibration was performed on h_te_best_fit_20m using the geoid, and the detailed formula is given in (1). This process yielded understory elevation values that aligned with the true elevation coordinate system.

H_ICESat-2 = h_te_best_fit_20m − geoid

(1)

In (1), H_ICESat-2 represents the corrected elevation value; h_te_best_fit_20m is the best-fit elevation value provided by ATL08 for the midpoint of a 20-m segment, expressed relative to the reference ellipsoid; geoid is the height of the geoid above the reference ellipsoid provided by ATL03, which is used to convert the ellipsoidal height to a height relative to the reference geoid.
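As a worked illustration of Formula (1), the sketch below subtracts the geoid value from the ATL08 terrain height segment by segment. The fill-value threshold `FILL_THRESHOLD` and the sample numbers are assumptions made for the example, not values taken from the study.

```python
# Assumption: missing terrain heights are stored as a very large fill value.
FILL_THRESHOLD = 1.0e30  # hypothetical cut-off for detecting fill values


def correct_elevation(h_te_best_fit_20m, geoid):
    """Apply Formula (1) element-wise; return None where an input looks like a fill value."""
    corrected = []
    for h, n in zip(h_te_best_fit_20m, geoid):
        if abs(h) > FILL_THRESHOLD or abs(n) > FILL_THRESHOLD:
            corrected.append(None)
        else:
            corrected.append(h - n)  # H_ICESat-2 = h_te_best_fit_20m - geoid
    return corrected


h_ellipsoid = [150.0, 3.4028235e38, 62.5]   # illustrative ellipsoidal heights (m)
geoid_height = [-33.5, -33.4, -34.0]        # illustrative geoid values (m)
h_icesat2 = correct_elevation(h_ellipsoid, geoid_height)
```

A negative geoid value raises the corrected height, as in the first and third segments of this toy input.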

Further, the resampling of spaceborne LiDAR data ICESat-2 was conducted. To ensure that all datasets had the same resolution, the spaceborne LiDAR data were resampled. Based on the resolution of the SRTM DEM used in this study area, the ICESat-2 data were resampled to a 30-m resolution. The resampling method used in this study was bilinear interpolation, which extends linear interpolation in two directions to obtain interpolation results in a two-dimensional space. The raster value of the pixel to be determined was calculated using a weighted distance from the pixel to its four neighboring pixels, and linear interpolation was performed in both the X and Y directions to obtain the gray value of the pixel to be determined. Assuming that α = x − i and β = y − j, the bilinear interpolation formula can be expressed as follows:

f (x, y) = (1 - α) (1 - β) f (i, j) + (1 - α) β f (i, j + 1) + α (1 - β) f (i + 1, j) + α β f (i + 1, j + 1)

(2)

where (x, y) denotes the coordinates of the sampling point, and (i, j), (i + 1, j), (i, j + 1), and (i + 1, j + 1) are the coordinates of its four nearest neighboring points; i and j are integers, and x and y can be either integers or decimals.
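Formula (2) can be written as a short function. The sketch below is a plain-Python illustration on a toy 2 × 2 grid; the function name `bilinear` and the convention that x indexes rows and y indexes columns are assumptions made for the example.

```python
import math


def bilinear(grid, x, y):
    """Evaluate Formula (2) on a 2-D list `grid`, with x along rows and y along columns."""
    n_rows, n_cols = len(grid), len(grid[0])
    if not (0 <= x <= n_rows - 1 and 0 <= y <= n_cols - 1):
        raise ValueError("sampling point lies outside the grid")
    # Clamp so that (i + 1, j + 1) always exists, even on the last row or column.
    i = min(int(math.floor(x)), n_rows - 2)
    j = min(int(math.floor(y)), n_cols - 2)
    alpha = x - i
    beta = y - j
    return ((1 - alpha) * (1 - beta) * grid[i][j]
            + (1 - alpha) * beta * grid[i][j + 1]
            + alpha * (1 - beta) * grid[i + 1][j]
            + alpha * beta * grid[i + 1][j + 1])


grid = [[10.0, 20.0],
        [30.0, 40.0]]
center = bilinear(grid, 0.5, 0.5)       # equal weights on all four neighbors
off_center = bilinear(grid, 0.25, 0.75)
```

At the cell center all four weights equal 0.25, so the result is the mean of the four neighbors.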

3.2.2. SAR Data Preprocessing

Preprocessing of Sentinel-1 SAR data is crucial to enhance the accuracy and reliability of the data by removing noise, correcting errors, and performing other related procedures. The Sentinel-1 SAR data preprocessing was conducted using the official software Sentinel Application Platform (SNAP) developed by the European Space Agency. The SNAP is a Sentinel data processing platform that supports preprocessing and analysis of Sentinel-1 SAR data [49].

The preprocessing of Sentinel-1 SAR data started with importing these data into the SNAP and involved the following steps. In the orbit correction step, the positional deviation in the SAR data was corrected, and the coordinates of the SAR images were adjusted to align with the actual positions on the Earth’s surface. Radiometric calibration converted the original radar pulse intensity observed by the SAR system into backscatter coefficients. Pulse stitching mosaicked multiple adjacent or overlapping SAR images to form a larger continuous area coverage. Pulse stitching can be regarded as a combined process of image registration and image fusion, which involve the calculations of Formulas (3) and (4), respectively.

(\bar{x}, \bar{y}) = (A x + B y + C, D x + E y + F)

(3)

where (x, y) represents the coordinates of the original image, and (\bar{x}, \bar{y}) denotes the image coordinates after transformation; A, B, C, D, E, and F are the transformation parameters.

I_{fused} = \frac{w_{1} I_{1} + w_{2} I_{2}}{w_{1} + w_{2}}

(4)

In (4), I_{fused} is the fused pixel value, I_{1} and I_{2} are the pixel values of adjacent or overlapping image areas, and w_{1} and w_{2} are the weighting coefficients.
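Formulas (3) and (4) can be sketched in a few lines. The parameter values below are purely illustrative (a simple translation and arbitrary weights); in practice SNAP determines the registration and blending internally, and the function names are assumptions made for this example.

```python
def affine_transform(x, y, params):
    """Formula (3): map original image coordinates to transformed coordinates."""
    A, B, C, D, E, F = params
    return (A * x + B * y + C, D * x + E * y + F)


def fuse_pixels(i1, i2, w1, w2):
    """Formula (4): weighted average of two overlapping pixel values."""
    if w1 + w2 == 0:
        raise ValueError("weights must not sum to zero")
    return (w1 * i1 + w2 * i2) / (w1 + w2)


# Illustrative parameters: shift by +5 in x and -3 in y, no rotation or scaling.
translation = (1.0, 0.0, 5.0, 0.0, 1.0, -3.0)
x_bar, y_bar = affine_transform(2.0, 4.0, translation)

# Illustrative overlap: the first image is trusted three times as much as the second.
fused = fuse_pixels(0.2, 0.4, 3.0, 1.0)
```

With equal weights the fusion reduces to a plain average, and a weight of zero on one image returns the other image's pixel unchanged.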

Next, the SLC data were processed with the multi-look function, with the corresponding parameters set, to obtain intensity data. The multi-looked data were then filtered to reduce noise through smoothing while preserving detailed information in the image. The downloaded DEM data covering the SAR data range were used to perform terrain correction on the Sentinel-1 SAR data: the DEM was interpolated to obtain the ground height corresponding to each SAR pixel, and the correct position of each pixel in the geographic coordinate system was then calculated from the terrain height and the geometric relationship of SAR imaging. Finally, texture factors reflecting surface texture characteristics were extracted and output as images or data tables for the subsequent machine learning-based estimation of understory terrain.

3.3. Machine Learning-Based Models

Machine learning-based models can construct predictive models for new understory terrain data by learning and extracting features from historical understory terrain data.

3.3.1. Multiple Linear Regression Model

The Multiple Linear Regression (MLR) model is a statistical method [50] based on correlation analysis between multiple independent variables and one dependent variable. It predicts new understory terrain using an established model. MLR typically uses the least squares method to estimate regression coefficients, minimizing the sum of squared errors between predicted and actual values for an optimal solution. The model can be expressed as:

f (x) = k^{T} x + b

(5)

where x represents the input vector containing multiple features (independent variables), f(x) is the model output (i.e., the predicted target variable), k^T denotes the transposed vector of feature weights, and b is the intercept or bias of the model.

When constructing an MLR model, selecting appropriate independent variables is one of the prerequisites for building the model correctly. In other words, we must not only consider factors that may influence the dependent variable but also ensure that these factors (i.e., independent variables) have a statistically significant impact on the dependent variable and that their relationship is close to linear. Furthermore, the performance of the MLR model heavily relies on the quality of the data, so when collecting and processing data, it is necessary to ensure the data’s accuracy, completeness, and consistency.
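To make the least-squares fit of Formula (5) concrete, the sketch below solves the normal equations with a small Gauss-Jordan elimination in plain Python. It is an illustration under stated assumptions: the two features and the target values are synthetic, and a real workflow would more likely rely on a numerical library.

```python
def fit_mlr(X, y):
    """Least-squares estimate of weights k and intercept b in f(x) = k^T x + b."""
    rows = [list(x) + [1.0] for x in X]  # trailing 1 carries the intercept b
    p = len(rows[0])
    # Augmented normal equations [Z^T Z | Z^T y].
    M = [[sum(r[a] * r[c] for r in rows) for c in range(p)]
         + [sum(r[a] * t for r, t in zip(rows, y))]
         for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; the fit is not unique")
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    coef = [M[r][p] for r in range(p)]
    return coef[:-1], coef[-1]


def predict_mlr(k, b, x):
    """Evaluate Formula (5) for one feature vector."""
    return sum(ki * xi for ki, xi in zip(k, x)) + b


# Synthetic data generated from y = 2*x1 - 1*x2 + 5 without noise.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [5, 7, 4, 6, 6]
k, b = fit_mlr(X, y)
prediction = predict_mlr(k, b, [3, 1])
```

Because the toy data are noise-free, the fit recovers the generating coefficients; with real elevation data the residuals would be non-zero and the collinearity check becomes more relevant.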

3.3.2. BAGGING Model

The main principle of the BAGGING (Bootstrap Aggregating) prediction model is based on the idea of ensemble learning, which aims to improve the accuracy and stability of predictions by combining multiple base learners [51]. The model constructs multiple mutually independent base models and obtains the final prediction result by averaging their predictions. Each base model is trained on a different training subset, obtained by randomly sampling the original training dataset with replacement. The core of this approach lies in bootstrap sampling: because each sample is drawn with replacement, a training subset can contain duplicate samples, and some samples from the original training set may not appear in a particular subset. This training method is robust to noisy data because combining the predictions of multiple models mitigates the impact of noise to a certain extent. Assume that there are n training subsets, and the i-th subset trains the i-th base model, where i = 1, 2, …, n. For a new test sample x, each base model provides a prediction result, and the final prediction result of the BAGGING regression model is obtained by:

y_{final} = \frac{1}{n} (y_{1} + y_{2} + \dots + y_{n})

(6)

where y_final is the final prediction result, and y_i is the prediction result of the i-th base model.
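The bootstrap-and-average procedure can be sketched as follows. The base learner used here (a one-variable least-squares line) and the synthetic data are assumptions chosen to keep the example short; the text does not specify which base learner the study used.

```python
import random


def fit_line(xs, ys):
    """Base learner: least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # bootstrap sample contained a single distinct x value
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train one base model per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # indices may repeat
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Average the base-model predictions, i.e. the mean over all n models."""
    predictions = [slope * x + intercept for slope, intercept in models]
    return sum(predictions) / len(predictions)


# Synthetic, noise-free data following y = 3x + 2.
xs = [float(v) for v in range(10)]
ys = [3.0 * v + 2.0 for v in xs]
models = bagging_fit(xs, ys)
y_final = bagging_predict(models, 4.0)
```

On noisy data the individual lines would differ from one bootstrap sample to the next, and the averaging step is what reduces their variance.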

3.3.3. Random Forest Regression Model

The Random Forest (RF) model, a BAGGING extension [52], introduces random attribute selection. Tree construction selects a subset of attributes for optimal partitioning, addressing overfitting and improving generalization. RF features sample and feature randomness: sample randomness follows BAGGING’s bootstrap sampling, ensuring data diversity. The randomness of features is a unique characteristic of Random Forest. It refers to the mechanism where, when selecting the optimal partitioning feature, Random Forest only considers a randomly selected subset of features [53]. This mechanism further enhances the randomness and generalization ability of the model. Figure 3 illustrates RF’s operational principle.
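The feature-randomness mechanism can be illustrated for a single split. In the sketch below, a random subset of feature indices is drawn and the best threshold is searched only within that subset, using the sum of squared errors as the split criterion. The criterion, the function names, and the toy data are assumptions; a full forest would repeat this at every node of every bootstrap-trained tree and average the tree outputs.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y, feature_ids):
    """Return (feature, threshold, score) with the lowest total SSE among `feature_ids`."""
    best = None
    for f in feature_ids:
        # The largest value is skipped because it would leave the right side empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def random_feature_split(X, y, max_features, rng):
    """Draw a random feature subset, then search for the best split only within it."""
    candidates = rng.sample(range(len(X[0])), max_features)
    return candidates, best_split(X, y, candidates)


# Toy data: the target depends only on feature 0.
X = [[1, 5, 0], [2, 3, 1], [3, 6, 0], [4, 2, 1]]
y = [10.0, 10.0, 20.0, 20.0]

full_search = best_split(X, y, [0, 1, 2])
candidates, restricted = random_feature_split(X, y, 2, random.Random(1))
```

When feature 0 is not among the drawn candidates, the tree has to split on a weaker feature, which is what decorrelates the trees in the ensemble.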

3.3.4. Long Short-Term Memory Model

The Long Short-Term Memory (LSTM) model is used for regression prediction by leveraging the structural characteristics of the LSTM network to learn the mapping relationship between input and output data through network training [54]. The LSTM model controls the information flow and state updates using the three types of gates: forget, input, and output. Specifically, the forget gate decides what information should be discarded from the cell state, as shown in (7). The results of the forget and input gates are combined with the candidate cell state to update the cell state, as given in (8). These gating structures are implemented into the model through nonlinear transformations, such as the Sigmoid and Tanh functions:

f_{t} = σ (w_{f} \times [h_{t - 1}, x_{t}] + b_{f})

(7)

In (7), f_t is the output of the forget gate, which is a vector with elements between zero and one; σ is the sigmoid function, which maps a value to the interval (0, 1); w_f is the weight matrix of the forget gate; h_{t - 1} is the hidden state from the previous time step; x_t is the input at the current time step; b_f is the bias term of the forget gate; and [h_{t - 1}, x_{t}] is the concatenation of h_{t - 1} and x_{t}. The cell state is then updated as follows:

C_{t} = f_{t} * C_{t - 1} + i_{t} * \tilde{C_{t}}

(8)

In (8), C_t denotes the cell state at the current time step, C_{t−1} denotes the cell state from the previous time step, i_t is the output of the input gate, \tilde{C}_t is the candidate cell state, and * denotes element-wise multiplication.
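Equations (7) and (8) can be traced numerically with a small sketch. The weights, biases, and states below are toy values chosen so the result is easy to check by hand; they are assumptions for illustration and do not come from the trained network.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forget_gate(w_f, b_f, h_prev, x_t):
    """Equation (7): sigmoid of w_f applied to the concatenation [h_prev, x_t]."""
    concat = h_prev + x_t  # list concatenation plays the role of [h_{t-1}, x_t]
    return [sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
            for row, b in zip(w_f, b_f)]

def update_cell(f_t, c_prev, i_t, c_tilde):
    """Equation (8): element-wise blend of the old cell state and the candidate."""
    return [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]

# Toy example: two hidden units, one input feature (all values hypothetical).
h_prev = [0.0, 0.0]
x_t = [1.0]
w_f = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # zero weights give sigmoid(0) = 0.5
b_f = [0.0, 0.0]

f_t = forget_gate(w_f, b_f, h_prev, x_t)
c_t = update_cell(f_t, c_prev=[2.0, -4.0], i_t=[1.0, 0.0], c_tilde=[0.5, 0.5])
print(f_t)  # [0.5, 0.5]
print(c_t)  # [1.5, -2.0]
```

With a forget gate of 0.5, half of each previous cell value is retained; the input gate then decides how much of the candidate state is added.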

3.4. Regional Scale Estimation and Verification

As mentioned before, this study used the spaceborne LiDAR training dataset as the dependent variable and the SRTM and SAR data as independent variables. Thirteen feature parameters with significant contributions were selected, and four models (MLR, BAGGING, RF, and LSTM) were constructed. Model performance was evaluated using three accuracy metrics: R², RMSE, and MAE. The best-performing model was then applied at the regional scale, extending the point-based estimates to a new areal representation of the understory terrain. To verify the accuracy of the understory terrain elevation estimated by this model, the ICESat-2 validation dataset was used to assess the regional-scale result.
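The three accuracy metrics can be written out as a short sketch. It assumes the usual definitions (R² as the coefficient of determination, which may differ from a squared-correlation form), and the elevation values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical reference and estimated elevations in metres.
reference = [100.0, 102.0, 104.0, 106.0]
estimate = [101.0, 101.0, 105.0, 105.0]

print(rmse(reference, estimate), mae(reference, estimate), r2(reference, estimate))
```

In this toy case every estimate is off by exactly 1 m, so RMSE and MAE coincide; in general RMSE is at least as large as MAE because it weights large errors more heavily.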

4. Results and Discussion

In this study, we combined LiDAR, SAR, and DEM data to estimate understory terrain and constructed MLR, BAGGING, RF, and LSTM models. We quantitatively evaluated the performance of these four models using accuracy assessment metrics and selected the model with the highest accuracy for regional-scale extrapolation of understory terrain. We then analyzed in depth how three major factors (slope, vegetation coverage, and canopy height) affect understory terrain estimation.

4.1. Results

4.1.1. ICESat-2 Data Accuracy

To ensure the accuracy of the ICESat-2 data, this study used the NEON data to validate its elevation precision. The scatter plot of the correlation between the NEON DTM data and the ICESat-2 understory terrain data in the study area is presented in Figure 4. As shown in Figure 4, there was a strong correlation between the two datasets. Based on the NEON DTM data, the elevation accuracy evaluation metrics for the ICESat-2 understory terrain data were R² = 1, RMSE = 0.868 m, and MAE = 0.517 m. NEON and ICESat-2 thus exhibit a nearly perfect linear correlation, and the elevation differences between them are very small, which demonstrates the high consistency of ICESat-2 and NEON in elevation measurements. Wang et al. [31] also used NEON data from Maryland stations to evaluate the performance of spaceborne LiDAR data. These results show that the elevation precision of the understory terrain extracted from the ICESat-2 data was high, satisfying the requirement for using ICESat-2 understory terrain data as the dependent variable to construct machine learning models for understory terrain estimation.

4.1.2. Model Accuracy Analysis

This study estimated understory terrain using machine learning models combining the LiDAR, SAR, and DEM data. The MLR, BAGGING, RF, and LSTM models were constructed. The elevation accuracy metrics of the understory terrain estimated by the reference DEM (SRTM DEM) and by the MLR, BAGGING, RF, and LSTM models are presented in Table 1. As shown in Table 1, the MLR, BAGGING, and RF models were all more accurate than the reference DEM. Among them, the RF model performed the best, with R² = 0.999, RMSE = 0.701 m, and MAE = 0.249 m. Its RMSE was reduced from 9.847 m to 0.701 m, which represents a 92.8% increase in accuracy. The BAGGING model ranked second; compared to the elevation accuracy of SRTM, its accuracy was improved by 91.7%. The MLR model performed worse than these two models, improving the elevation accuracy by only 34.7%. The results in Table 1 also show that the LSTM model did not improve the accuracy compared to the original DEM but rather decreased it; the RMSE increased from 9.847 m for the original DEM to 13.212 m. This could be because the LSTM model excels at processing and predicting sequential data with long-term dependencies, whereas this study estimated understory terrain from data that are not a time series. Therefore, the LSTM model is not suitable for estimating understory terrain in the study area and could not provide accurate estimates.

Among the three models that improved on the reference DEM (MLR, BAGGING, and RF), the MLR model exhibited the poorest prediction performance. This was mainly because of the limitations imposed by the linear relationship assumption of this model. In complex environments, such as forest areas, understory terrain is influenced by various factors that can exhibit nonlinear relationships with the terrain height. The MLR model might therefore fail to capture these nonlinear relationships accurately, which would lead to inaccurate estimation results. In contrast, the RF and BAGGING models, owing to their random sample selection and the averaging of predictions from multiple decision trees, could significantly improve the accuracy of understory terrain estimation. The RF model, which extends the BAGGING model with feature randomness, lets each decision tree choose its optimal split from a randomly selected subset of features. This ultimately yielded an average prediction that was superior to that of the BAGGING model, making the RF model highly accurate in estimating understory terrain.

4.1.3. Understory Terrain Estimation Accuracy

Based on the accuracy assessment results, the RF model was the best-performing model among the four models developed in this study. Therefore, the extrapolation at the regional scale was conducted using this model, and the areal data results are shown in Figure 5. The accuracy of the regional-scale data generated by the RF model was validated using the validation dataset, which consisted of 30% of the spaceborne LiDAR ICESat-2 data. The evaluation results for the regional-scale estimation accuracy were as follows: R² = 0.999, RMSE = 0.847 m, and MAE = 0.517 m. In comparison, the original SRTM accuracy evaluation metrics were R² = 0.997, RMSE = 3.539 m, and MAE = 4.682 m. Based on the RMSE values, the RF model improved the estimation accuracy of understory terrain by approximately 76%. Furthermore, the MAE of the regional-scale understory terrain estimated by the RF model is only 0.517 m, indicating a small average absolute difference between the ICESat-2 validation data and the newly generated regional-scale areal data. Therefore, the RF model demonstrates excellent regional estimation performance, and when combined with LiDAR, SAR, and DEM data, it achieves high estimation accuracy for understory terrain.
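The improvement percentages quoted in this section appear to be relative reductions in RMSE with respect to SRTM. A minimal sketch of that calculation, using the two RMSE values reported above (the function name is hypothetical), is:

```python
def rmse_improvement(rmse_ref, rmse_model):
    """Percentage reduction in RMSE relative to the reference DEM."""
    return 100.0 * (rmse_ref - rmse_model) / rmse_ref

# RMSE values reported for the regional-scale validation (metres).
srtm_rmse = 3.539
rf_rmse = 0.847

improvement = rmse_improvement(srtm_rmse, rf_rmse)
print(f"{improvement:.0f}%")  # about 76%
```

The same function can be applied to the per-class RMSE values in the following sections; small differences in the last digit may arise from rounding of the reported inputs.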

In this study, Sentinel-1 and SRTM data were used as the independent variables, and a total of 13 characteristic variables were extracted from them: VV_GLCMMean, VV_MAX, VV_Energy, VH_GLCMMean, VV_Entropy, VH_Energy, VH_Entropy, VH_GLCMVariance, VV_GLCMVariance, VV_Homogeneity, VV_ASM, VV_Dissimilarity, and SRTM DEM. The SHAP interpreter was used to quantify the impact of each characteristic variable on the prediction results. The SHAP value schematic diagram of the characteristic variables is presented in Figure 6. Among these variables, the SRTM DEM contributed the most to the estimation model, whereas the contribution of the 12 characteristic parameters from the Sentinel-1 SAR data was relatively small. This was because, in forested areas, vegetation cover and terrain complexity could obstruct the propagation of radar signals, limiting the accuracy of radar images used for estimating understory terrain and, thus, affecting the accuracy of terrain estimation. However, compared to the SRTM data, the high-resolution images provided by Sentinel-1 SAR data could show the detailed characteristics of understory terrain, including terrain fluctuations and vegetation cover, which were crucial for achieving an accurate estimation of the morphology and features of understory terrain. Therefore, the Sentinel-1 SAR data remained a key element in understory terrain estimation.

4.2. Slope Importance Analysis

To analyze the factors affecting the estimation accuracy of understory terrain elevation, this study first examined how the accuracy of the understory terrain estimates varies with slope. The slope data were derived from the SRTM DEM. The results were validated quantitatively using two accuracy evaluation metrics, R² and RMSE. Figure 7 compares the elevation accuracy of SRTM with that of the regional-scale understory terrain retrieved using the RF model under different slope conditions.

As shown in Figure 7a, compared to the original SRTM, the R² of the RF model shows a slight improvement, remaining at about 0.999 in every slope range. This result indicates that under different slope conditions, the understory terrain estimated by the RF model fits the actual terrain closely. However, since a high R² alone may mask overfitting and other potential factors, further analysis is conducted using the RMSE, as shown in Figure 7b. The RMSE line chart for the regional-scale understory terrain estimated by SRTM and the RF model clearly shows that as the slope increases, the estimation accuracy of the understory terrain gradually decreases. The study by Qin et al. [55] also demonstrated that the error of SRTM DEM increases significantly as the slope increases. As the terrain becomes steeper, photon noise gradually increases, leading to a larger discrepancy between the elevation data acquired by sensors and the actual terrain data. Consequently, the accuracy of the understory terrain estimated by both SRTM and the RF model decreases.

In the slope range of 0–10°, the RMSE of the elevation from SRTM is 9.067 m, while the RMSE of the elevation of the understory terrain generated by the RF model is 0.812 m. These two accuracy values represent the highest precision segments for SRTM and RF models, respectively, in terms of understory terrain elevation. Compared to the original SRTM, the RF model has improved the estimation accuracy of understory terrain by 91%. In the slope range of 20–30°, the RMSE of the elevation from SRTM is 29.557 m, and the RMSE of the elevation of the understory terrain generated by the RF model is 2.123 m. These two accuracy values represent the lowest precision segments for SRTM and RF models, respectively, in terms of understory terrain elevation. After using the RF model for regional estimation, the accuracy of the understory terrain elevation is improved by 92.8% compared to SRTM. This result demonstrates the excellent performance of the RF model in estimating understory terrain, and within the slope range of 20–30°, the improvement in estimation accuracy of the RF model over SRTM is even more significant. This suggests that as the terrain slope increases, the improvement in estimation accuracy of the understory terrain by the RF model becomes more pronounced.

From the 0–10° to the 10–20° slope range, the RMSE of SRTM increases from 9.067 m to 16.615 m, an increase of 7.548 m. Similarly, from the 10–20° to the 20–30° range, the RMSE of SRTM rises from 16.615 m to 29.557 m, an increase of 12.942 m. The elevation accuracy of understory terrain estimated by SRTM is therefore significantly affected by slope, and this impact intensifies as the slope increases. In contrast, the RMSE of the RF model increases from 0.812 m to 1.443 m from the 0–10° to the 10–20° range, an increase of 0.631 m, and from 1.443 m to 2.123 m from the 10–20° to the 20–30° range, an increase of 0.680 m. The accuracy of the RF model's estimation of understory terrain is thus far less affected by slope. It can be concluded that, as slope varies, the accuracy of the RF model's estimation of understory terrain is more stable and less susceptible to fluctuations caused by topographical factors, consistently maintaining high precision.
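The stratified comparison above amounts to grouping validation points by slope class and computing the RMSE within each class. The following sketch shows one way to do this; the sample slopes and errors are hypothetical, and the helper name `rmse_by_bin` is an assumption rather than code from this study.

```python
import math
from collections import defaultdict

def rmse_by_bin(values, errors, edges):
    """Return RMSE of errors grouped by the [lo, hi) interval each value falls in."""
    squared = defaultdict(list)
    for v, e in zip(values, errors):
        for lo, hi in zip(edges[:-1], edges[1:]):
            if lo <= v < hi:
                squared[(lo, hi)].append(e * e)
                break  # each point belongs to at most one interval
    return {b: math.sqrt(sum(s) / len(s)) for b, s in squared.items()}

# Hypothetical slope (degrees) and elevation error (metres) per validation point.
slopes = [2.0, 8.0, 12.0, 18.0, 25.0]
errors = [0.5, -0.5, 1.0, -1.0, 2.0]

result = rmse_by_bin(slopes, errors, edges=[0, 10, 20, 30])
for (lo, hi), value in sorted(result.items()):
    print(f"{lo}-{hi} deg: RMSE = {value:.3f} m")
```

The same grouping can be reused for vegetation coverage or canopy height by changing the binned variable and the class edges.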

4.3. Vegetation Coverage Effect Analysis

Considering the significant influence of vegetation coverage on terrain elevation accuracy, this study used the GFCC product to represent the vegetation coverage of the study area. The area was divided into four vegetation coverage categories as follows: (1) low vegetation coverage, 1%–30%; (2) medium vegetation coverage, 31%–50%; (3) medium-high vegetation coverage, 51%–70%; and (4) high vegetation coverage, 71%–100%. Figure 8 compares the elevation accuracy of SRTM with that of the regional-scale understory terrain retrieved using the RF model under different vegetation coverage conditions.

As shown in Figure 8a, compared to the original SRTM, the R² of the understory terrain estimated by the RF model remains consistently at 0.999. This indicates that under different vegetation coverage conditions, the RF model's estimate of the understory terrain is highly consistent with the actual surface. Further analysis is conducted using the RMSE, as illustrated in Figure 8b. The RMSE line graph for SRTM shows that its accuracy first decreases and then increases slightly as vegetation coverage increases, with the lowest accuracy in the 50%–70% range, that is, the medium-high vegetation coverage category. In densely vegetated areas, the vegetation obstructs the radar signals, making it difficult for them to penetrate the vegetation layer and accurately reflect the true ground terrain. Furthermore, as the vegetation coverage increases, the roughness of the ground surface also tends to increase. Consequently, the combined effects of ground roughness and vegetation coverage further complicate the interpretation of radar signals, reducing the estimation accuracy of SRTM. The RMSE line graph for the RF model shows that its accuracy first increases and then gradually decreases as vegetation coverage increases. The elevation accuracy within the 0%–30% range is slightly lower than that within the 30%–50% range, potentially due to steeper slopes in that area. An increase in vegetation coverage reduces the accuracy of the understory terrain data obtained by sensors, which, in turn, introduces errors into the variables input to the model and thereby reduces the accuracy of the understory terrain estimated by the RF model.

Within the vegetation coverage range of 0%–30%, the elevation RMSE for SRTM is 6.361 m, while the elevation RMSE for the understory terrain generated by the RF model is 0.783 m. These two elevation accuracies represent the highest accuracy segments for SRTM and RF, respectively, in terms of understory terrain elevation. After regional estimation using the RF model, the accuracy of the understory terrain elevation is improved by 87.6% compared to SRTM. In the vegetation coverage range of 50%–70%, the elevation RMSE for SRTM is 12.348 m, while the elevation RMSE for the understory terrain generated by the RF model is 0.995 m. These two elevation accuracies correspond to the lowest accuracy segments for SRTM and RF, respectively. After regional estimation using the RF model, the accuracy of the understory terrain elevation is improved by 91.9% compared to SRTM. This result demonstrates the excellent performance of the RF model in estimating understory terrain and indicates that within the vegetation coverage range of 50%–70%, the RF model achieves its largest improvement in estimation accuracy over SRTM.

From the 0%–30% to the 30%–50% vegetation coverage range, the RMSE of SRTM increases from 6.361 m to 8.137 m, a decrease in accuracy of 1.776 m in terms of RMSE. From the 30%–50% to the 50%–70% range, the RMSE of SRTM rises from 8.137 m to 12.348 m, a decrease in accuracy of 4.211 m. However, from the 50%–70% to the 70%–100% range, the RMSE of SRTM decreases from 12.348 m to 11.924 m, an improvement of 0.424 m. The elevation accuracy of understory terrain estimated by SRTM is therefore affected by vegetation coverage, with the 50%–70% range having the greatest impact. For the RF model, the RMSE decreases by 0.052 m from the 0%–30% to the 30%–50% range, and over the other two transitions its accuracy decreases by 0.264 m in terms of RMSE. The accuracy of understory terrain estimated by the RF model is thus less affected by vegetation coverage. It can be concluded that, as vegetation coverage varies, the accuracy of understory terrain estimated by the RF model is more stable and remains high.
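The class-to-class changes in SRTM RMSE discussed above follow directly from the four reported values. A short sketch of the arithmetic (the helper name is hypothetical) is:

```python
classes = ["0%-30%", "30%-50%", "50%-70%", "70%-100%"]
srtm_rmse = [6.361, 8.137, 12.348, 11.924]  # reported SRTM RMSE per class (metres)

def adjacent_changes(values):
    """Difference between each value and its predecessor, rounded to millimetres."""
    return [round(b - a, 3) for a, b in zip(values, values[1:])]

changes = adjacent_changes(srtm_rmse)
for (a, b), d in zip(zip(classes, classes[1:]), changes):
    print(f"{a} -> {b}: {d:+.3f} m")
# 0%-30% -> 30%-50%: +1.776 m
# 30%-50% -> 50%-70%: +4.211 m
# 50%-70% -> 70%-100%: -0.424 m
```

A positive change means a larger RMSE and hence lower accuracy; only the last transition shows a slight recovery.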

4.4. Canopy Height Effect Analysis

In forested study areas, the canopy height of vegetation has a significant influence on the acquisition of topographic elevation data and other characteristics. In this experiment, the vegetation canopy height was divided into three segments for analysis: (1) canopy heights of 0–10 m; (2) canopy heights of 10–20 m; and (3) canopy heights of 20–30 m. The terrain estimates were then evaluated separately for these three segments. Figure 9 compares the elevation accuracy of SRTM with that of the regional-scale understory terrain retrieved using the RF model under different canopy height conditions.

As shown in Figure 9a, under different canopy height conditions, the R² of the understory terrain estimated by the RF model is consistently 0.999, which is higher than the accuracy of the SRTM estimates for the understory terrain. Further analysis is conducted using the RMSE presented in Figure 9b. The line chart of RMSE for the regional-scale forest understory terrain obtained by SRTM and the RF model clearly shows that as canopy height increases, the estimation accuracy of forest understory terrain gradually decreases. This conclusion has also been verified in the studies by Li et al. [56] and the author’s study [20], where the error of SRTM DEM significantly increases. With the increase in canopy height, more interference is generated for the SRTM radar, gradually reducing the likelihood of the SRTM radar’s phase center reaching the ground. For the RF model, as canopy height increases, the shielding effect of the canopy on remote sensing signals also intensifies, resulting in weakened information from the surface in the signals received by the remote sensing sensor. Additionally, photons within the vegetation canopy undergo multiple scattering, which alters their propagation direction and energy distribution. When the remote sensing sensor receives signals containing a large number of photons that have undergone multiple scattering, it leads to a decrease in signal quality, further affecting the accuracy of remote sensing data. As the height of the vegetation canopy increases, the distance between the remote sensing sensor and the ground also increases accordingly. This increases the interference of the atmosphere on remote sensing signals, reducing the accuracy of remote sensing data.

Within the canopy height range of 1–10 m, the elevation RMSE for SRTM is 7.031 m, while the elevation RMSE for the understory terrain generated by the RF model is 0.578 m. These two RMSE values represent the highest accuracy segments for SRTM and RF, respectively, in terms of understory terrain elevation. After regional-scale estimation using the RF model, the accuracy of the understory terrain elevation is improved by 91.7% compared to SRTM. In the canopy height range of 20–30 m, the elevation RMSE for SRTM is 12.176 m, and the elevation RMSE for the understory terrain generated by the RF model is 1.059 m. These two RMSE values represent the lowest accuracy segments for SRTM and RF, respectively, in terms of understory terrain elevation. After regional-scale estimation using the RF model, the accuracy of the understory terrain elevation is improved by 91.3% compared to SRTM. This result demonstrates the excellent performance of the RF model in estimating understory terrain. In terms of accuracy improvement, the accuracy of the RF model's estimates is roughly proportional to that of the SRTM estimates; that is, the relative improvement is generally consistent across different canopy height conditions.

From the 1–10 m to the 10–20 m canopy height range, the RMSE of SRTM increases from 7.031 m to 8.196 m, a decrease in accuracy of 1.165 m in terms of RMSE. Similarly, from the 10–20 m to the 20–30 m range, the RMSE of SRTM rises from 8.196 m to 12.176 m, a decrease in accuracy of 3.980 m. The elevation accuracy of the understory terrain obtained by SRTM is therefore affected by canopy height. In contrast, for the RF model, the RMSE of the estimated understory terrain increases from 0.578 m to 0.803 m from the 1–10 m to the 10–20 m range, a decrease in accuracy of 0.225 m, and from 0.803 m to 1.059 m from the 10–20 m to the 20–30 m range, a decrease in accuracy of 0.256 m. The accuracy of the understory terrain estimated by the RF model is thus less affected by canopy height and remains relatively consistent across the canopy height ranges. It can be concluded that, as canopy height varies, the accuracy of the understory terrain estimated by the RF model is more stable and less susceptible to fluctuations, resulting in high-precision estimates of the understory terrain.

5. Innovation and Limitations

The main innovation of this study can be summarized as follows:

(1) This study utilizes point data from the spaceborne LiDAR ICESat-2 as the dependent variable and SRTM DEM and Sentinel-1 SAR data as the independent variables. Four machine learning-based models are constructed for regional-scale extrapolation, thereby generating new regional data on understory terrain. This process completes the expansion from point data to regional data, enabling precise estimation of understory terrain.
(2) This study introduces the Sentinel-1 SAR data to the understory terrain estimation. The high-resolution images provided by Sentinel-1 enable the acquisition of clear, detailed features of understory terrain, which plays a crucial role in the precise estimation of understory terrain.
(3) The influencing factors of understory terrain estimation are thoroughly analyzed under various conditions by examining the terrain factor slope and ecological factors, including canopy height and vegetation coverage.

However, this study also has certain limitations. First, the NEON data, as benchmark data, despite its widespread station distribution, cannot fully cover the study area, which limits the representativeness of the data. Namely, the NEON stations might not provide sufficient data support. It is hoped that in the future, more comprehensive validation data with higher accuracy will be available for experiments, reducing its impact on the accuracy of experimental results.

6. Conclusions

This paper proposes an efficient method for accurate estimation of understory terrain combining the LiDAR, SAR, and DEM data. The spaceborne LiDAR ICESat-2 data was divided into training and validation sets. The training set consists of 70% of the high-precision spaceborne LiDAR data ICESat-2, and the remaining 30% of the data is used as a validation dataset, which is validated with benchmark NEON data. The training dataset serves as a dependent variable, whereas the SRTM and Sentinel-1 data are regarded as independent variables. A total of 13 feature parameters with high contributions are extracted for the construction of the MLR, BAGGING, RF, and LSTM models. The analysis results show that the RF model has significant advantages in estimating understory terrain compared to the other models. Therefore, this model was selected to perform the regional-scale extrapolation to estimate understory terrain. Additionally, the influencing factors of forest understory terrain estimation are analyzed from the perspectives of terrain factor slope, as well as ecological factors such as vegetation coverage and canopy height.

The main conclusions of this study are as follows:

(1) Among the four machine learning models used in this paper, the RF model demonstrates the highest accuracy, with R² = 0.999, RMSE = 0.701 m, and MAE = 0.249 m. Therefore, the RF model is employed for regional-scale extrapolation of understory terrain. The quantitative evaluation of the regional-scale understory terrain data on the validation dataset yields R² = 0.999, RMSE = 0.847 m, and MAE = 0.517 m.
(2) The RF model maintains high accuracy for different slopes, vegetation coverage, and canopy heights. As the slope and canopy height increase, the estimation accuracy of the RF model for understory terrain gradually decreases. However, compared to the SRTM DEM, the accuracy of the understory terrain estimated by the RF model is less susceptible to the influence of slope, vegetation coverage, and canopy height, remaining in a relatively stable state.

Author Contributions

J.H. and Y.Z.: methodology; Y.Z. and J.D.: formal analysis; J.H.: investigation; J.H.: resources; Y.Z.: data curation; Y.Z.: writing—the original draft preparation; J.H. and J.D.: writing—the review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant Nos. 42401488 and 42071351), the National Key Research and Development Program of China (Grant Nos. 2020YFA0608501 and 2017YFB0504204), the Liaoning Revitalization Talents Program (Grant No. XLYC1802027), the Talent Recruited Program of the Chinese Academy of Science (Grant No. Y938091), the Project Supported Discipline Innovation Team of the Liaoning Technical University (Grant No. LNTU20TD-23), the Liaoning Province Doctoral Research Initiation Fund Program (2023-BS-202), and the Basic Research Projects of Liaoning Department of Education (JYTQN2023202).

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: [https://search.earthdata.nasa.gov/search/free or https://earthengine.google.com/ and https://search.asf.alaska.edu/#/] (accessed on 5 February 2024).

Acknowledgments

The authors thank NASA and GEE for freely providing the data necessary for the analyses conducted in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

ATL03_20220110002155_02861406_006_01.h5	ATL03_20220128110706_05681402_006_01.h5
ATL03_20220201105846_06291402_006_01.h5	ATL03_20220226094311_10101402_006_01.h5
ATL03_20220228215041_10481406_006_01.h5	ATL03_20220522174708_09261506_006_01.h5
ATL03_20220904004623_11321602_006_01.h5	ATL03_20220919120314_13681606_006_01.h5
ATL03_20221022103052_04841706_006_01.h5	ATL03_20221027220641_05681702_006_01.h5
ATL03_20221124085832_09871706_006_02.h5	ATL03_20221125204244_10101702_006_02.h5
ATL03_20221129203423_10711702_006_02.h5	ATL03_20221223073432_00421806_006_02.h5
ATL03_20221228191026_01261802_006_02.h5

References

Brodribb, T.J.; Powers, J.; Cochard, H.; Choat, B. Hanging by a thread? Forests and drought. Science 2020, 368, 261–266.
Hui, G.Y.; Zhang, G.G.; Zhao, Z.H.; Yang, A.M. Methods of Forest Structure Research: A Review. Curr. For. Rep. 2019, 5, 142–154.
Xie, Y.Z.; Zhu, J.J.; Fu, H.Q.; Wang, C.C. A review of underlying topography estimation over forest areas by InSAR: Theory, advances, challenges and perspectives. J. Cent. South Univ. 2020, 27, 997–1011.
Campbell, M.J.; Dennison, P.E.; Hudak, A.T.; Parham, L.M.; Butler, B.W. Quantifying understory vegetation density using small-footprint airborne lidar. Geophys. Res. Lett. 2018, 215, 330–342.
Ma, L.Y.; Li, C.Y.; Wang, X.Q.; Xu, X. Effects of Thinning on the Growth and the Diversity of Undergrowth of Pinus tabulaeformis Plantation in Beijing Mountainous Areas. Sci. Silvae Sin. 2007, 43, 1–9.
Wang, X.H.; Guo, Q.X.; Cai, T.J. Quantitative effect of topography and forest type on snow melting process in spring. J. Beijing For. Univ. 2016, 38, 83–89.
Ciou, T.-S.; Lin, C.-H.; Wang, C.-K. Airborne LiDAR Point Cloud Classification Using Ensemble Learning for DEM Generation. Sensors 2024, 24, 6858.
González-Moradas, M.d.R.; Viveen, W. Evaluation of ASTER GDEM2, SRTMv3.0, ALOS AW3D30 and TanDEM-X DEMs for the Peruvian Andes against highly accurate GNSS ground control points and geomorphological-hydrological metrics. Remote Sens. Environ. 2018, 215, 330342.
Merlot Le-sage, K.T.; Jean Victor, K.; Lucas, K.; Adoua, N.K.; Sylvano, M.S.; Jules, N. Hydrogeological potential estimation of Ngoua watershed, West Cameroon, using petrography, Shuttle Radar Topography Mission (SRTM), and geophysical data. Arab. J. Geosci. 2021, 14, 2207.
Wang, C.X.; Li, M.Q.; Wang, X.F.; Deng, M.T.; Wu, Y.L.; Hong, W.Y. Spatio-Temporal Dynamics of Carbon Storage in Rapidly Urbanizing Shenzhen, China: Insights and Predictions. Land 2024, 13, 1566.
Liu, P.B.; Zhang, G. A Case Study on the Integration of Remote Sensing for Predicting Complicated Forest Fire Spread. Remote Sens. 2024, 16, 3969.
Mendes, N.; Bianchini, N.; Karanikoloudis, G.; Blyth, A.; Scacco, J.; Flores Salazar, L.G.; Cullimore, C.; Jain, L. Preservation and Protection of Cultural Heritage: Vibration Monitoring and Seismic Vulnerability of the Ruins of Carmo Convent (Lisbon). Sensors 2024, 24, 6095.
Yamazaki, D.; Sato, T.; Kanae, S.; Hirabayashi, Y.; Bates, P.D. Regional flood dynamics in a bifurcating mega delta simulated in a global river model. Geophys. Res. Lett. 2014, 41, 3127–3135.
Liu, K.; Song, C.; Ke, L.; Jiang, L.; Pan, Y.; Ma, R. Global open-access DEM performances in Earth’s most rugged region High Mountain Asia: A multi-level assessment. Geomorphology 2019, 338, 16–26.
Huang, J.P.; Zhang, Y.; Yu, Y. Mathematical Model Guided Interpolation for Mapping SRTM Understory Terrain by Integrating ICESat-2 Data. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5.
Singh, A.; Kushwaha, S.K.P. Forest Degradation Assessment Using UAV Optical Photogrammetry and SAR Data. J. Indian Soc. Remote Sens. 2020, 49, 559–567.
Wan, J.; Wang, C.C.; Zhu, J.J.; Fu, H.Q. Research Progress on Tomographic SAR Three Dimension Imaging Methods and Forest Parameters Inversion. Natl. Remote Sens. Bull. 2024, 28, 576–590.
Chen, W.H.; Liu, J.Q.; Zhu, X.P.; Hou, X. Spaceborne Lidar Remote Sensing Progress and Developments. Chin. J. Lasers 2024, 11, 181–200.
Conto, T.D.; Armston, J.; Dubayah, R. Characterizing the structural complexity of the Earth’s forests with spaceborne lidar. Nat. Commun. 2024, 15, 8116.
Huang, J.P.; Xing, Y.Q.; Qin, L.; Xia, T.T. Accuracy verification of terrain under forest estimated from ICESat-2/ATLAS data. Infrared Laser Eng. 2020, 49, 114–123.
Dong, H.Y.; Yu, Y.; Fan, W.Y. Verification of performance of understory terrain inversion from spaceborne lidar GEDI data. J. Nan Jing For. Univ. Nat. 2023, 47, 141–149.
Wang, C.J.; Jia, D.; Lei, S.G.; Numata, I.; Tian, L. Accuracy Assessment and Impact Factor Analysis of GEDI Leaf Area Index Product in Temperate Forest. Remote Sens. 2023, 15, 1535.
Aragoneses, E.; García, M.; Benito, P.R.; Chuvieco, M. Mapping forest canopy fuel parameters at European scale using spaceborne LiDAR and satellite data. Remote Sens. Environ. 2024, 303, 114005.
Yang, T.; Wang, C.; Li, G.C. Mapping Forest Canopy Heights in China Based on Spaceborne LiDAR GLAS and Optical MODIS Data. Sci. China Press 2014, 44, 2487–2498.
Liu, Z.W.; Zhao, R.; Zhu, J.J.; Fu, H.Q.; Zhou, C.; Zhou, Y. Sub-canopy topography extraction via TanDEM-X DEM combined Sentinel-2 multispectral data. Natl. Remote Sens. Bull. 2015, 7, 1–11.
Magruder, L.; Neuenschwander, A.; Klotz, B. Digital terrain model elevation corrections using space-based imagery and ICESat-2 laser altimetry. Remote Sens. Environ. 2021, 264, 112621.
Lu, W.Q. The Method of Underlying Topography Estimation over Forest Areas Based on Polarimetric SAR Interferometry. Master’s Thesis, Shandong University of Science and Technology, Qingdao, China, 2019. [Google Scholar]
Zhu, J.J.; Fu, H.Q.; Wang, C.C. Methods and Research Progress of Underlying Topography Estimation over Forest Areas by InSAR. Geomat. Inf. Sci. Wuhan Univ. 2018, 43, 2030–2038. [Google Scholar] [CrossRef]
Wang, C.; Zhang, W.F.; Ji, Y.J.; Marino, A.; Li, C.; Wang, L.; Zhao, H.; Wang, M.J. Estimation of Aboveground Biomass for Different Forest Types Using Data from Sentinel-1, Sentinel-2, ALOS PALSAR-2, and GEDI. Forests 2024, 15, 215. [Google Scholar] [CrossRef]
Hu, H.C.; Zhu, J.; Fu, H.; Lopez-Sanchez, J.M.; Cristina, G.; Zhang, T.; Liu, K. Large-Scale Sub-Canopy Topography Estimation From Tandem-X InSAR and ICESat-2 Data Using Machine Learning Method. Natl. Remote Sens. Bull. 2023, 10, 1–14. [Google Scholar] [CrossRef]
Cai, S.X.; Yue, L.W.; Yin, C.; Qiu, Z.H. Accuracy evaluation of multi-source DEM data based on the analysis of vegetation-induced penetration rate in the forest area. Natl. Remote Sens. Bull. 2022, 26, 2268–2281. [Google Scholar] [CrossRef]
Yang, Z.D.; Zhou, W.W.; Gu, C.X.; Xu, L. Application of ICESat-2/ATLAS Radar Data in Forestry. Mod. Agric. Technol. 2024, 6, 176–181. [Google Scholar] [CrossRef]
Xing, Y.Q.; Huang, J.P.; Gruen, A.; Qin, L. Assessing the Performance of ICESat-2/ATLAS Multi-Channel Photon Data for Estimating Ground Topography in Forested Terrain. Remote Sens. 2020, 12, 2084. [Google Scholar] [CrossRef]
Shang, D.H.; Zhang, Y.S.; Dai, C.G.; Ma, Q.F.; Wang, Z.Q. Extraction Strategy for ICESat-2 Elevation Control Points Based on ATL08 Product. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5705012. [Google Scholar] [CrossRef]
Rodriguez, E.; Morris, C.S.; Belz, J.E. A global assessment of the SRTM performance. Photogramm. Eng. Remote Sens. 2006, 72, 249–260. [Google Scholar] [CrossRef]
Huang, J.P.; Yu, Y. Vertical Accuracy Assessment of the ASTER, SRTM, GLO-30, and ATLAS in a Forested Environment. Forests 2024, 15, 426. [Google Scholar] [CrossRef]
Zhu, X.X.; Nie, S.; Wang, C.; Xi, X.H.; Li, D.; Li, G.Y.; Wang, P.; Cao, D.; Yang, X.B. Estimating Terrain Slope from ICESat-2 Data in Forest Environments. Remote Sens. 2020, 12, 3300. [Google Scholar] [CrossRef]
Sun, X.P.; Zhou, C.; Xie, J.; Ouyang, Z.D.; Luo, Y.F. SRTM DEM Correction Based on PSO-DBN Model in Vegetated Mountain Areas. Forests 2023, 14, 1985. [Google Scholar] [CrossRef]
Walker, N.; Street, T.; Skagemo-Andreassen, T. The role of near real time Envisat ASAR global monitoring mode data in Arctic and Antarctic Operational ice services. Adv. SAR Oceanogr. Envisat ERS Mission. 2006, 613, 30. [Google Scholar]
Li, Y.; Dai, L.Y.; Che, T. Estimation of sea ice production in wind-blown polynyas along the Ross Sea Ice Shelf using Sentinel-1/SAR and AMSR2 data. J. Glaciol. Geocryol. 2024, 46, 100978. [Google Scholar] [CrossRef]
Keller, M.; Schimel, D.S.; Hargrove, W.W.; Hoffman, F.M. A continental strategy for the National Ecological Observatory Network. Front. Ecol. Environ. 2008, 6, 282–284. [Google Scholar] [CrossRef]
Hutsler, T.; Pricope, N.G.; Gao, P.; Rother, M.T. Detecting Woody Plants in Southern Arizona Using Data from the National Ecological Observatory Network (NEON). Remote Sens. 2023, 15, 98. [Google Scholar] [CrossRef]
Scholl, V.M.; Cattau, M.E.; Joseph, M.B.; Balch, J.K. Integrating National Ecological Observatory Network (NEON) Airborne Remote Sensing and In-Situ Data for Optimal Tree Species Classification. Remote Sens. 2020, 12, 1414. [Google Scholar] [CrossRef]
Liu, A.; Cheng, X.; Chen, Z.; Liu, A. Performance evaluation of GEDI and ICESat-2 laser altimeter data for terrain and canopy height retrievals. Remote Sens. Environ. 2021, 264, 112571. [Google Scholar] [CrossRef]
Sexton, J.O.; Feng, M.; Channan, S.; Song, X.-P. Earth Science Data Records of Global Forest Cover and Change; User Guide; Earth Science 2016. Available online: https://lpdaac.usgs.gov/documents/1371/GFCC_User_Guide_V1.pdf (accessed on 5 February 2024).
Wang, S.F.; Liu, C.; Li, W.Y.; Jia, S.J.; Yue, H. Hybrid model for estimating forest canopy heights using fused multimodal spaceborne LiDAR data and optical imagery. Int. J. Appl. Earth Obs. Geoinf. 2023, 122, 103431. [Google Scholar] [CrossRef]
Schimel, D.; Hargrove, W.; Hoffman, F.M.; MacMahon, J. NEON: A hierarchically designed national ecological network. Front. Ecol. Environ. 2007, 5, 59. [Google Scholar] [CrossRef]
He, L.; Pang, Y.; Zhang, Z.J.; Liang, X.J.; Chen, B.W. ICESat-2data classification and estimation of terrain height and canopy height. Int. J. Appl. Earth Obs. Geoinf. 2023, 118, 103233. [Google Scholar] [CrossRef]
Mullissa, A.; Vollrath, A.; Odongo-Braun, C.; Slagter, B.; Balling, J.; Gou, Y.; Gorelick, N.; Reiche, J. Sentinel-1 SAR Backscatter Analysis Ready Data Preparation in Google Earth Engine. Remote Sens. 2021, 13, 1954. [Google Scholar] [CrossRef]
Zelterman, D. Applied Multivariate Statistics with R; Statistics for Biology and Health; Springer: Cham, Switzerland, 2022; pp. 235–265. [Google Scholar]
Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
Bakır, R.; Orak, C.; Yüksel, A. Optimizing hydrogen evolution prediction: A unified approach using random forests, lightGBM, and Bagging Regressor ensemble model. Int. J. Hydrogen Energy 2024, 67, 101–110. [Google Scholar] [CrossRef]
Leo, B. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar]
Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
Qin, C.C.; Chen, C.F.; Yang, N. Elevation accuracy evaluation and correction of SRTM and ASTER GDEM in Shandong province based on ICESat/GLAS. J. Geo-Inf. Sci. 2020, 22, 351–360. [Google Scholar] [CrossRef]
Li, Y.; Fu, H.Q.; Zhu, J.J.; Wu, K.F.; Yang, P.F.; Wang, L.; Gao, S.J. A Method for SRTM DEM Elevation Error Correction in Forested Areas Using ICESat-2 Data and Vegetation Classification Data. Remote Sens. 2022, 14, 3380. [Google Scholar] [CrossRef]

Figure 1. The schematic diagram of the SRTM DEM data for the study area.

Figure 2. The flowchart of the method for estimating understory terrain using the LiDAR and SAR data.

Figure 3. The schematic diagram of the RF model.

Figure 4. The scatter plot of the correlation between the NEON DTM data and ICESat-2 understory terrain data.

Figure 5. The schematic diagram of the regional-scale estimation using the RF Model.

Figure 6. The schematic diagram of the SHAP values for characteristic variables.

Figure 7. Accuracy indicators for the regional-scale understory terrain obtained from the SRTM DEM and the RF model under different slope conditions: (a) R²; (b) RMSE.

Figure 8. Accuracy indicators for the regional-scale understory terrain obtained from the SRTM DEM and the RF model under different vegetation coverage conditions: (a) R²; (b) RMSE.

Figure 9. Accuracy indicators for the regional-scale understory terrain obtained from the SRTM DEM and the RF model under different canopy height conditions: (a) R²; (b) RMSE.

Table 1. Comparison results of the understory terrain accuracy of the SRTM DEM and the four machine learning-based models constructed in this study.

Model	R²	RMSE (m)	MAE (m)
SRTM	0.995	9.847	7.700
RF	0.999	0.701	0.249
MLR	0.996	6.429	4.771
BAGGING	0.999	0.812	0.269
LSTM	0.982	13.212	8.140
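
The three indicators in Table 1 can be illustrated with a minimal sketch. It assumes that R² is the coefficient of determination and that RMSE and MAE follow their standard definitions; the function names and the four sample elevations are hypothetical and are not taken from the study's data or code. The last helper expresses the RMSE values reported in Table 1 as a relative reduction.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between reference and estimated elevations (m)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between reference and estimated elevations (m)."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination (assumed definition: 1 - SS_res / SS_tot)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline, improved):
    """Fraction by which an error metric drops relative to a baseline."""
    return (baseline - improved) / baseline


# Hypothetical elevations in metres, for illustration only.
reference = [100.0, 102.0, 104.0, 106.0]
estimated = [101.0, 101.0, 105.0, 105.0]

print(rmse(reference, estimated), mae(reference, estimated),
      r_squared(reference, estimated))

# RMSE values reported in Table 1: SRTM = 9.847 m, RF = 0.701 m.
rf_vs_srtm = relative_reduction(9.847, 0.701)
print(f"RF reduces the SRTM RMSE by about {rf_vs_srtm:.1%}")
```

With the Table 1 values, the RF model lowers the RMSE of the SRTM DEM by roughly 93%, while the R² values of the two differ only in the third decimal place. This suggests that RMSE and MAE are the more informative indicators when the elevation range of the study area is large.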

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

