Landslide Susceptibility Mapping Based on Ensemble Learning in the Jiuzhaigou Region, Sichuan, China

An, Bangsheng; Zhang, Zhijie; Xiong, Shenqing; Zhang, Wanchang; Yi, Yaning; Liu, Zhixin; Liu, Chuanqi

doi:10.3390/rs16224218


Bangsheng An ^1,2,3, Zhijie Zhang ^4,*, Shenqing Xiong ^5, Wanchang Zhang ^1,3, Yaning Yi ^6, Zhixin Liu ^1,2,3 and Chuanqi Liu ^1,2,3

¹ Key Laboratory of Digital Earth Science, Aerospace Information Research Institute (AIRCAS), Chinese Academy of Sciences, Beijing 100094, China
² International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China
³ University of Chinese Academy of Sciences, Beijing 100049, China
⁴ Department of Environment and Society, Quinney College of Natural Resources, Utah State University, Logan, UT 84322, USA
⁵ China Aero Geophysical Survey and Remote Sensing Center for Natural Resources, Beijing 100083, China
⁶ National Institute of Natural Hazards, Ministry of Emergency Management of China, Beijing 100085, China
* Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(22), 4218; https://doi.org/10.3390/rs16224218

Submission received: 2 October 2024 / Revised: 2 November 2024 / Accepted: 4 November 2024 / Published: 12 November 2024

(This article belongs to the Special Issue Remote Sensing Data for Modeling and Managing Natural Disasters)


Abstract

Accurate landslide susceptibility mapping is vital for disaster forecasting and risk management. To address the limited accuracy of individual classifiers and the lack of interpretability of machine learning-based models, we propose a coupled multi-model framework for landslide susceptibility mapping. Using Jiuzhaigou County, Sichuan Province, as a case study, we developed an evaluation index system incorporating 14 factors. Three base models (logistic regression, support vector machine, and Gaussian Naive Bayes) were combined using four ensemble methods: Stacking, Voting, Bagging, and Boosting. The decision mechanisms of these models were explained via SHAP (SHapley Additive exPlanations) analysis. The results demonstrate that integrating machine learning with ensemble learning and SHAP yields more reliable landslide susceptibility maps and enhances model interpretability, effectively addressing the challenge of unreliable landslide susceptibility mapping in complex environments.

Keywords:

landslide susceptibility mapping; ensemble learning; machine learning; SHapley Additive exPlanations

1. Introduction

Landslides are a widespread geological hazard, causing significant casualties and property damage annually, thus posing a serious threat to public safety and infrastructure. Factors such as accelerated glacial melting, erratic rainfall patterns, increasing extreme weather events, and intensified human activity are contributing to the rising incidence and severity of landslides [1]. Accurately identifying potential landslide locations is essential for mitigating their impacts and safeguarding communities.

Landslide susceptibility mapping (LSM) methods typically fall into two categories: knowledge-driven and data-driven [2,3]. Knowledge-driven methods include the Analytic Hierarchy Process (AHP) [4,5], Fuzzy AHP [6], Fuzzy Relation AHP [7], and Fuzzy Logic [8,9], among others. Data-driven methods include the frequency ratio [10] and weight of evidence [11] methods, machine learning techniques such as logistic regression [12], support vector machines [13], and random forests [14], and deep learning methods such as recurrent neural networks [15], deep neural networks [16], and convolutional neural networks [17].

Despite their efficacy, these models often suffer from a lack of interpretability, complicating the understanding of their decision-making processes. The complexity and scale of machine learning models render them “black boxes”, making it challenging to discern how inputs are transformed into outputs. Each geographical area exhibits unique geomorphological, topographical, hydrological, and anthropogenic characteristics, limiting the effectiveness of traditional single-model approaches. Thus, developing new methodologies that enhance model interpretability and improve prediction accuracy has become a critical research focus [18,19,20,21].

To thoroughly investigate the potential of ensemble learning and improve model interpretability, this study combines diverse datasets and employs both traditional and ensemble learning techniques for landslide susceptibility mapping in Jiuzhaigou County, Sichuan Province. A SHAP analysis is utilized on the trained models to identify key factors and their values influencing each landslide, thereby enhancing model interpretability.

2. Study Area and Data

2.1. Study Area

Jiuzhaigou County, located in the northwest of the Aba Tibetan and Qiang Autonomous Prefecture of Sichuan Province, China, lies between approximately 32°54′ and 33°19′N and 103°46′ and 104°4′E (Figure 1). Situated at the junction of the Qinghai–Tibet Plateau and the Sichuan Basin, the area has a semi-humid, monsoon-dominated climate and pronounced topographic variation: elevation increases from east to west, with a total relief of 3712 m [22].

Intense neotectonic activity has produced folds and fractures in the region, concentrated mainly in the Minjiang, Huya, Xueshanliangzi, and Tazang Fracture Zones, with peak ground accelerations of 0.04 g to 0.26 g. Jiuzhaigou County is prone to geological disasters and, according to Fan et al. [23], has experienced more than 50 earthquakes of magnitude 5.0 or higher in the past century. Its average annual rainfall of 552.3 mm further contributes to its susceptibility to landslides and other geological hazards.

2.2. Database

2.2.1. Historical Landslides

In the initial phase of the LSM process, we collected high-precision field observations from local geological surveys. By the end of 2020, we documented 164 significant slope failure events through field investigations, historical records, and satellite imagery interpretation. These data include medium-sized landslides, avalanches, debris flows, and unstable slopes, forming the basis of our landslide inventory (Figure 1).

Because detailed data on landslide boundaries were scarce, each event was represented by its center point. To avoid imbalanced training samples, an equal number of non-landslide points was randomly sampled. A 100 m buffer zone was established around each landslide point, and non-landslide points were sampled only outside these zones [24].
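The buffer-constrained sampling of non-landslide points can be sketched as simple rejection sampling. This is a minimal illustration, not the authors' implementation: it assumes projected coordinates in metres and uses a rectangular extent `bounds` as a stand-in for the county boundary, and the function name `sample_non_landslides` is hypothetical.

```python
import numpy as np

def sample_non_landslides(landslide_xy, n_samples, bounds, buffer_m=100.0, seed=0):
    """Randomly draw n_samples points inside a rectangular extent that lie more
    than buffer_m metres from every landslide point (rejection sampling).
    Assumes projected x/y coordinates in metres; bounds = (xmin, ymin, xmax, ymax)."""
    rng = np.random.default_rng(seed)
    xmin, ymin, xmax, ymax = bounds
    landslides = np.asarray(landslide_xy, dtype=float)
    accepted = []
    while len(accepted) < n_samples:
        # Propose a batch of uniformly distributed candidate points
        cand = rng.uniform([xmin, ymin], [xmax, ymax], size=(4 * n_samples, 2))
        # Distance from each candidate to its nearest landslide point
        nearest = np.linalg.norm(cand[:, None, :] - landslides[None, :, :], axis=2).min(axis=1)
        # Keep only candidates that fall outside every buffer zone
        accepted.extend(cand[nearest > buffer_m])
    return np.array(accepted[:n_samples])

# Usage with synthetic landslide points (164, matching the inventory size)
rng = np.random.default_rng(1)
landslides = rng.uniform([0, 0], [5000, 5000], size=(164, 2))
negatives = sample_non_landslides(landslides, 164, (0, 0, 5000, 5000))
```

In a GIS workflow the same result is usually obtained by erasing buffer polygons from the study area and generating random points in the remainder; the loop above simply makes that rule explicit.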

2.2.2. Landslide Conditioning Factors

The success of LSM hinges on the careful selection of conditioning factors [25]. Among these, altitude plays a critical role in the formation and distribution of landslides [26,27]. Using ArcGIS 10.8, we analyzed Digital Elevation Model (DEM) data to compute six key conditioning factors: the slope, aspect, plan curvature, profile curvature, topographic wetness index (TWI), and stream power index (SPI). A buffer analysis was employed to derive distance factors related to roads, historical earthquakes, faults, and human activities [28]. Detailed information about the data sources and nomenclature for these factors is provided in Table 1 and Table 2, as well as Figure 2.

Slope influences pressure distribution and sediment accumulation [2], while differences in solar radiation among slopes of different aspects affect landslide stability [29]. Profile and plan curvature describe the spatial geometry of unstable slopes: profile curvature affects flow velocity, whereas plan curvature influences flow direction [30,31].
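Slope and aspect are first-derivative surfaces of the DEM. The sketch uses NumPy central differences as a stand-in for the ArcGIS tools named above (whose exact neighbourhood weighting differs slightly); it assumes a north-up grid with square cells, and flagging flat cells with an aspect of -1 is an assumed convention.

```python
import numpy as np

def slope_aspect(dem, cellsize):
    """Slope (degrees) and aspect (degrees clockwise from north, facing downslope)
    from a north-up DEM grid using central differences (np.gradient)."""
    dz_drow, dz_dcol = np.gradient(np.asarray(dem, dtype=float), cellsize)
    dz_dx = dz_dcol            # rate of elevation change toward the east
    dz_dy = -dz_drow           # row index increases southward, so negate for north
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Downslope direction is the negative gradient; azimuth = atan2(east, north)
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    aspect[(dz_dx == 0) & (dz_dy == 0)] = -1.0   # flat cells flagged as -1
    return slope, aspect

# A 30 m grid whose elevation rises 30 m per cell toward the east
dem = np.tile(np.arange(5) * 30.0, (5, 1))
slope, aspect = slope_aspect(dem, 30.0)   # slope 45 degrees, facing west (270)
```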

The TWI combines slope with upslope contributing area to indicate where soil moisture is likely to be higher. The SPI measures the erosive power of flowing water, which can destabilize slopes through erosion [32]. Lithological variations in the Jiuzhaigou region affect slope stability through differences in strength, weathering resistance, and permeability [33]. The lithology map for the Jiuzhaigou region is presented in Table 1.
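The standard definitions are TWI = ln(A_s / tan β) and SPI = A_s · tan β, where A_s is the specific catchment area and β the local slope. The sketch assumes a precomputed flow-accumulation raster (upslope cell counts) and approximates A_s as (flow_acc + 1) × cell size; the floor on tan β that keeps flat cells finite is an assumed choice, not a value from the study.

```python
import numpy as np

def twi_spi(flow_acc, slope_deg, cellsize, min_tan=1e-4):
    """Topographic wetness index and stream power index per cell.
    flow_acc: upslope cell count from a flow-accumulation raster (precomputed);
    the specific catchment area is approximated as (flow_acc + 1) * cellsize."""
    a_s = (np.asarray(flow_acc, dtype=float) + 1.0) * cellsize    # area per unit contour width
    tan_b = np.maximum(np.tan(np.radians(slope_deg)), min_tan)     # guard against flat cells
    twi = np.log(a_s / tan_b)    # TWI = ln(As / tan(beta))
    spi = a_s * tan_b            # SPI = As * tan(beta)
    return twi, spi

# A 45-degree cell with 99 upslope cells on a 30 m grid
twi, spi = twi_spi(np.array([99.0]), np.array([45.0]), 30.0)
```

Cells with large contributing areas and gentle slopes get high TWI (wet), while steep cells with large contributing areas get high SPI (erosive).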

The normalized difference vegetation index (NDVI) serves as an indicator of vegetation growth, which indirectly affects slope stability by influencing rock and soil erosion [34]. The proximity to faults and historical seismic sites provides valuable insights into regional tectonic activity and fault instability, both critical factors in landslide formation [35]. Additionally, distances to roads, human activity points, and land use/land cover (LULC) data reflect the effects of anthropogenic activities on slope stability [36]. Meteorological factors, especially annual precipitation, play a key role in triggering landslides [27].
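NDVI is computed from red and near-infrared reflectance as (NIR − Red) / (NIR + Red), giving values in [−1, 1] with dense vegetation near the upper end. A minimal sketch, assuming reflectance arrays as input (the sensor and bands used in the study are not assumed here); returning NaN where both bands are zero is an assumed no-data convention.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); cells with NIR + Red == 0 become NaN."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full_like(total, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Dense vegetation, sparse cover, and a no-data cell
values = ndvi([0.45, 0.10, 0.0], [0.05, 0.08, 0.0])
```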

After a comprehensive analysis, 14 landslide conditioning factors were selected. All factors were converted to raster format and resampled to a 30 m resolution. Lithology and LULC were categorized using established classification schemes, while continuous variables were classified manually based on expert knowledge and previous studies [3,26,27,28] (Figure 2). Each factor was then normalized with the maximum–minimum method, which rescales its values to the range [0, 1].
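Min-max scaling maps each factor to [0, 1] but does not in general produce zero mean and unit variance; that property belongs to z-score standardization. The sketch contrasts the two on a toy matrix (rows are samples, columns are factors); the function names are illustrative.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column (factor) to [0, 1]: (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)      # avoid dividing by zero for constant factors
    return (X - lo) / span

def z_score(X):
    """Standardize each column to zero mean and unit (population) variance."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(sd > 0, sd, 1.0)

X = np.array([[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]])
scaled, standardized = min_max_scale(X), z_score(X)
```

In a train/test setting, the minimum and maximum should be taken from the training data only and reused for the test data and the full map.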

3. Methods

The study comprises three stages, as illustrated in Figure 3. First, we collected data on conditioning factors and historical landslides. Next, we developed three conventional machine learning models and four ensemble models to produce landslide susceptibility maps. Finally, we evaluated and discussed the resulting susceptibility maps.

3.1. Evaluation of Conditioning Factors

Incorporating additional environmental variables and triggers improves the understanding of landslide susceptibility but also increases the computational burden of the model. Redundant information can obscure the importance of key factors and distort the results [37]. Optimizing the selected assessment factors, including diagnosing multicollinearity, is therefore essential.

To evaluate multicollinearity, we employed the Variance Inflation Factor (VIF) and tolerance (TOL) metrics. This analysis examined the linear correlations among selected factors, identifying redundancies that could compromise model accuracy.
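For factor j, VIF_j = 1 / (1 − R_j²), where R_j² is the coefficient of determination from regressing factor j on all other factors, and TOL_j = 1 / VIF_j. A minimal NumPy sketch using ordinary least squares with an intercept; the function name `vif_tol` and the synthetic data are illustrative.

```python
import numpy as np

def vif_tol(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from an OLS regression of factor j
    on all other factors (plus intercept); TOL_j = 1 / VIF_j."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vif = np.empty(p)
    for j in range(p):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        vif[j] = 1.0 / (1.0 - r2) if r2 < 1.0 else np.inf
    return vif, 1.0 / vif

rng = np.random.default_rng(0)
independent = rng.normal(size=(500, 3))                  # uncorrelated factors: VIF near 1
vif, tol = vif_tol(independent)
collinear = np.column_stack([independent[:, 0], independent[:, 1],
                             independent[:, 0] + 0.01 * rng.normal(size=500)])
vif_c, tol_c = vif_tol(collinear)                        # columns 0 and 2 nearly duplicate
```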

3.2. Base Classifiers

3.2.1. Support Vector Machines

A support vector machine (SVM) is a supervised learning method for binary classification that uses the kernel trick to handle data that are not linearly separable [38]. The fundamental idea is to find the optimal hyperplane that maximizes the margin between the two classes, that is, the distance from the hyperplane to the nearest data points of each class, which promotes better generalization to unseen data. SVM performance depends strongly on the choice of kernel and regularization parameters, which are typically tuned through cross-validation.

Given a training set $\mathbf{x} = \{x_{1}, x_{2}, \dots, x_{n}\} \in \mathbb{R}^{B \times N}$ with class labels $y_{i} \in \{-1, +1\}$, the optimal hyperplane is found by solving

$$\min_{w,\,b,\,\xi} \ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\xi_{i} \quad \text{s.t.} \quad y_{i}(w \cdot x_{i} + b) \geq 1 - \xi_{i}, \ \ \xi_{i} \geq 0$$

where $\|w\|$ is the norm of the weight vector $w$ (the normal vector of the hyperplane), $b$ is the bias, $\xi_{i}$ are slack variables that allow for non-separable data, and $C$ is the penalty parameter. The corresponding dual problem is

$$\max_{a} \ \sum_{i=1}^{n} a_{i} - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} a_{i} a_{j} y_{i} y_{j} K(x_{i}, x_{j}) \quad \text{s.t.} \quad 0 \leq a_{i} \leq C, \ \ \sum_{i=1}^{n} a_{i} y_{i} = 0$$

where $a_{i}$ are the Lagrange multipliers and $K(x_{i}, x_{j})$ is the kernel function.
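After training, a new cell x is classified by the sign of Σ a_i y_i K(x_i, x) + b. The sketch uses scikit-learn's `SVC` with an RBF kernel and tunes C and the kernel width by cross-validated grid search, as described above. The synthetic data (328 samples, 14 features) only mimic the size of the study's sample set, and the parameter grid is an assumption, not the study's setting.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for 164 landslide + 164 non-landslide samples with 14 factors
X, y = make_classification(n_samples=328, n_features=14, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# Min-max scaling inside the pipeline so each CV fold is scaled on its own training part
svm = make_pipeline(MinMaxScaler(), SVC(kernel="rbf", probability=True, random_state=0))
grid = GridSearchCV(
    svm,
    {"svc__C": [0.1, 1, 10, 100], "svc__gamma": ["scale", 0.01, 0.1]},   # assumed grid
    cv=5,
    scoring="roc_auc",
)
grid.fit(X_tr, y_tr)
p_te = grid.predict_proba(X_te)[:, 1]      # susceptibility-like probability per sample
auc = roc_auc_score(y_te, p_te)
```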

3.2.2. Logistic Regression

Logistic regression (LR) is a generalized linear model for binary outcomes and a supervised learning method. The LR model is constructed as follows:

$$p = \frac{1}{1 + e^{-y}}$$

where $p$ is the probability of a landslide occurring and $y$ is the linear combination

$$y = b_{0} + b_{1}x_{1} + b_{2}x_{2} + \cdots + b_{n}x_{n}$$

where $(x_{1}, x_{2}, \dots, x_{n})$ are the explanatory variables affecting landslide occurrence, $b_{0}$ is the intercept, and $(b_{1}, b_{2}, \dots, b_{n})$ are the regression coefficients.
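The two LR equations can be checked directly against a fitted model: the linear predictor is the intercept plus the dot product of coefficients and factors, and the probability is its logistic transform. The sketch fits scikit-learn's `LogisticRegression` (which applies L2 regularization with C = 1.0 by default) to synthetic data and recomputes p by hand.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: 328 samples, 14 conditioning factors, label 1 = landslide
X, y = make_classification(n_samples=328, n_features=14, n_informative=6, random_state=1)
lr = LogisticRegression(max_iter=1000).fit(X, y)

b0, b = lr.intercept_[0], lr.coef_[0]
linear = b0 + X @ b                       # y = b0 + b1*x1 + ... + bn*xn
p = 1.0 / (1.0 + np.exp(-linear))         # p = 1 / (1 + e^(-y))
```

The signs and magnitudes of `b` are what make LR directly interpretable, which is one reason it is a common baseline in LSM.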

3.2.3. Gaussian Naive Bayes

Gaussian Naive Bayes (GNB) is a supervised probabilistic classifier that assumes the features are conditionally independent given the class and that each feature follows a Gaussian distribution within each class. By applying Bayes' theorem, GNB calculates the posterior probability of a sample belonging to each class from its feature values. GNB is characterized by stable classification efficiency, strong performance even with small datasets, and reduced sensitivity to missing data. The class-conditional likelihood of each feature is

$$P(x_{i} \mid y_{c}) = \frac{1}{\sqrt{2\pi\sigma_{ci}^{2}}}\exp\left(-\frac{(x_{i} - \mu_{ci})^{2}}{2\sigma_{ci}^{2}}\right)$$

where $x_{i}$ denotes the $i$th feature, and $\sigma_{ci}$ and $\mu_{ci}$ denote the standard deviation and mean of feature $i$ in class $c$. A sample is assigned to the class that maximizes $P(y_{c})\prod_{i} P(x_{i} \mid y_{c})$.
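A from-scratch sketch of the GNB likelihood and the Bayes posterior, working in log space for numerical stability. The class name `SimpleGaussianNB` is hypothetical, and the small variance floor (1e-9) is an assumption that plays a similar role to scikit-learn's `var_smoothing`.

```python
import numpy as np

class SimpleGaussianNB:
    """Gaussian Naive Bayes: per-class feature means/variances plus class priors."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.prior_ = np.array([np.mean(y == c) for c in self.classes_])
        self.mu_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def _log_likelihood(self, X):
        # log P(x_i | y_c) summed over features i (naive independence); shape (n, classes)
        X = np.asarray(X, dtype=float)[:, None, :]
        return (-0.5 * np.log(2 * np.pi * self.var_)
                - (X - self.mu_) ** 2 / (2 * self.var_)).sum(axis=2)

    def predict_proba(self, X):
        log_post = np.log(self.prior_) + self._log_likelihood(X)
        log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
        post = np.exp(log_post)
        return post / post.sum(axis=1, keepdims=True)     # normalize to posteriors

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]

# One factor, two well-separated classes (0 = non-landslide, 1 = landslide)
X = np.array([[0.0], [0.2], [-0.2], [5.0], [5.2], [4.8]])
y = np.array([0, 0, 0, 1, 1, 1])
model = SimpleGaussianNB().fit(X, y)
```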

3.3. Ensemble Learning Methods

3.3.1. Bagging

Bagging was initially proposed for integrating classification and regression trees [39]. It uses bootstrap sampling to draw multiple subsets from the original training set and trains one base model on each subset, yielding several base classifiers that each see different data. At prediction time, the outputs of these models are aggregated by voting, which improves stability and accuracy by reducing variance and mitigating overfitting.

3.3.2. Voting

Voting is an ensemble technique for classification that uses one of two strategies: simple averaging or weighted averaging [40]. Simple averaging averages the predicted probabilities of all base classifiers to produce the final prediction, whereas weighted averaging gives greater weight to models with higher AUC values, which reflect their ability to distinguish the classes, and can therefore yield more accurate predictions.
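Both averaging strategies map onto scikit-learn's `VotingClassifier` with `voting="soft"`, with and without `weights`. In this sketch, the weights are each base model's mean 5-fold cross-validated AUC on the training data; that weighting source is an assumption about how "models with higher AUC" could be operationalized. The last line recomputes the weighted average by hand to show what soft voting does.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=328, n_features=14, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

base = [("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True, random_state=0)),
        ("gnb", GaussianNB())]
# Weight each base model by its mean cross-validated AUC on the training data
weights = [cross_val_score(m, X_tr, y_tr, cv=5, scoring="roc_auc").mean() for _, m in base]

simple = VotingClassifier(base, voting="soft").fit(X_tr, y_tr)                     # simple averaging
weighted = VotingClassifier(base, voting="soft", weights=weights).fit(X_tr, y_tr)  # AUC-weighted

# Soft voting = weighted average of the fitted base models' class probabilities
manual = np.average([est.predict_proba(X_te) for est in weighted.estimators_],
                    axis=0, weights=weights)
```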

3.3.3. Boosting

Boosting improves predictive performance by training homogeneous base learners sequentially on reweighted versions of the dataset [41,42]. The first classifier is trained on the initial dataset and its error rate is evaluated. The sample weights are then adjusted according to this error, typically by increasing the weights of misclassified samples, and the reweighted data are used to train the next classifier. This process is repeated to build a strong ensemble, and final predictions are made by aggregating the outputs of all classifiers, usually with weights that reflect each classifier's performance.
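The reweighting loop can be written out as discrete AdaBoost with SVM weak learners, matching the SVM-based Boosting described here. This is a hand-rolled sketch, not the authors' configuration: the number of rounds, C, the early stop at a weighted error of 0.5, and rescaling weights to mean 1 before passing them to `SVC.fit(sample_weight=...)` are all assumptions, and the function names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def adaboost_svm(X, y, n_rounds=10, C=1.0):
    """Discrete AdaBoost with RBF-SVM weak learners; labels y must be -1/+1."""
    n = len(y)
    w = np.full(n, 1.0 / n)                                  # uniform initial sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        clf = SVC(kernel="rbf", C=C).fit(X, y, sample_weight=w * n)  # weights rescaled to mean 1
        pred = clf.predict(X)
        err = max(w[pred != y].sum(), 1e-10)                 # weighted training error
        if err >= 0.5:                                       # no better than chance: stop early
            break
        alpha = 0.5 * np.log((1.0 - err) / err)              # this learner's vote weight
        w = w * np.exp(-alpha * y * pred)                    # increase weights of misclassified samples
        w /= w.sum()
        learners.append(clf)
        alphas.append(alpha)
    return learners, np.array(alphas)

def adaboost_predict(learners, alphas, X):
    """Sign of the alpha-weighted sum of weak-learner votes."""
    score = sum(a * clf.predict(X) for clf, a in zip(learners, alphas))
    return np.where(score >= 0, 1, -1)

X, y01 = make_classification(n_samples=328, n_features=14, n_informative=6, random_state=0)
y = 2 * y01 - 1                                              # relabel 0/1 as -1/+1
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
learners, alphas = adaboost_svm(X_tr, y_tr)
accuracy = np.mean(adaboost_predict(learners, alphas, X_te) == y_te)
```

Gradient-boosting libraries such as XGBoost follow the same sequential idea but fit trees to loss gradients instead of reweighting samples.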

3.3.4. Stacking

Stacking, introduced by Wolpert (1992), is a heterogeneous ensemble method that uses meta-learning to integrate multiple base classifiers [43]. The Stacking framework consists of two levels: base learners (Level 0) and a meta-learner (Level 1). During training, the dataset is split into two subsets to develop the base learners and the meta-learner. In the base-level training stage, the base models are trained independently, and their predictions serve as inputs for the meta-level training stage [39], in which a meta-model is trained using the outputs of the base models as features.
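A two-level Stacking sketch with the three base models of this study at Level 0 and logistic regression as the Level 1 meta-learner (the choice of meta-learner is an assumption). Note that scikit-learn's `StackingClassifier` builds the meta-features from out-of-fold predictions obtained by k-fold cross-validation rather than from a single two-way split; it is a closely related variant of the procedure described above, not the same one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=328, n_features=14, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),      # Level 0 base learners
                ("svm", SVC(probability=True, random_state=0)),
                ("gnb", GaussianNB())],
    final_estimator=LogisticRegression(),                       # Level 1 meta-learner
    stack_method="predict_proba",
    cv=5,   # meta-features are out-of-fold base-model predictions
).fit(X_tr, y_tr)

meta_features = stack.transform(X_te)   # one positive-class probability column per base model
auc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])
```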

3.4. Model Evaluation Measures

Accurately assessing model performance is crucial. The receiver operating characteristic (ROC) curve is widely used for this purpose, and the area under the ROC curve (AUC) indicates the model's predictive capability on unseen data; higher AUC values signify better predictions.

By applying a threshold (commonly 0.5), continuous predictions can be converted into binary classes. Metrics derived from the confusion matrix, such as accuracy and the Matthews Correlation Coefficient (MCC), are then used to gauge predictive performance [44,45]. In binary classification, each outcome falls into one of four categories, as shown in Table 3.

Cohen's Kappa measures the agreement between predicted and observed classes beyond what would be expected by chance. Precision and recall focus on the positive class and neglect the negative class, whereas the MCC considers both classes equally, which makes it particularly useful for imbalanced datasets. The evaluation metrics are computed from the confusion matrix as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP) \times (TP + FN) \times (TN + FP) \times (TN + FN)}}$$

$$\text{Kappa} = \frac{P_{o} - P_{e}}{1 - P_{e}}$$

where

$$P_{o} = \text{Accuracy}, \qquad P_{e} = \frac{(TP + FP) \times (TP + FN) + (FP + TN) \times (FN + TN)}{(TP + FP + FN + TN)^{2}}$$
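The metric definitions can be checked on a small worked example. The counts below (TP = 40, FP = 5, FN = 10, TN = 45) are hypothetical and only illustrate the arithmetic: accuracy = 85/100 = 0.85, P_e = (45·50 + 55·50)/100² = 0.5, so Kappa = (0.85 − 0.5)/(1 − 0.5) = 0.7. The MCC uses the numerator TP × TN − FP × FN.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, MCC and Cohen's kappa from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n                                          # also P_o
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    p_e = ((tp + fp) * (tp + fn) + (fp + tn) * (fn + tn)) / n ** 2   # chance agreement
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "mcc": mcc, "kappa": kappa}

m = binary_metrics(tp=40, fp=5, fn=10, tn=45)   # hypothetical counts for illustration
```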

3.5. SHapley Additive exPlanations

SHAP is a unified framework for interpreting machine learning models that provides consistent and interpretable feature importance scores [46]. Formally, the SHAP value $\phi_{i}$ of feature $i$ is computed as

$$\phi_{i} = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}\left[f(S \cup \{i\}) - f(S)\right]$$

where $N$ denotes the set of all features, $S$ is a subset of features excluding $i$, and $f(S)$ is the model's prediction using the feature subset $S$. The formula computes the weighted average marginal contribution of feature $i$ over all possible feature subsets, which yields a fair assessment of its impact; summed over all features, the SHAP values equal the difference between the model's prediction and its baseline output.
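The Shapley formula can be evaluated exactly by enumerating subsets when there are only a few features. In this sketch, `toy_model` is a hypothetical value function over three named factors with an interaction between roads and rain; for a real model, f(S) must be defined by marginalizing the absent features over background data, which SHAP libraries approximate because exact enumeration needs 2^(|N|−1) subsets per feature (8192 for 14 factors).

```python
from itertools import combinations
from math import factorial

def shapley_values(f, features):
    """Exact Shapley values: weighted average of f(S + {i}) - f(S) over all
    subsets S of the other features. f maps a frozenset of features to a number."""
    n = len(features)
    phi = {}
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for k in range(n):                                   # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                S = frozenset(subset)
                total += weight * (f(S | {i}) - f(S))
        phi[i] = total
    return phi

# Hypothetical toy "model": base output 0.1, additive effects, plus a roads x rain interaction
effects = {"roads": 0.3, "rain": 0.2, "lithology": 0.1}

def toy_model(S):
    return 0.1 + sum(effects[j] for j in S) + (0.1 if {"roads", "rain"} <= S else 0.0)

phi = shapley_values(toy_model, list(effects))
```

The interaction term is split equally between roads and rain (0.05 each), and the three values sum to toy_model(all) − toy_model(none) = 0.7, which is the additivity that waterfall charts rely on.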

4. Results

4.1. Multicollinearity

We assessed multicollinearity among the 14 selected assessment factors using the VIF and TOL methods. Following common practice, a factor is considered free of serious multicollinearity when its VIF is below 10 and its TOL is above 0.1 [18,47]; because TOL is the reciprocal of VIF, the two criteria are equivalent. All 14 factors had VIF values below 10 and TOL values above 0.1, confirming their suitability for landslide susceptibility modeling and analysis (Figure 4).

4.2. Model Performance and Evaluation

The performance of each model on the training set is shown in Table 4 and Figure 5. SVM served as the base model for Bagging and Boosting because it achieved the highest validation accuracy, while Voting and Stacking used all three models as base classifiers. All models performed well, and among the base models SVM achieved the highest AUC (0.9267). Among the ensemble methods, Bagging outperformed Boosting, Voting, and Stacking. Combining SVM with Bagging improved accuracy by 7.08%, precision by 4.28%, AUC by 5.80%, recall by 10.62%, Kappa by 14.16%, and MCC by 13.99%. Boosting yielded corresponding improvements of 6.64%, 4.85%, 5.22%, 7.97%, 13.27%, and 13.18%. Stacking and Voting also improved performance, achieving AUC values of 0.9516 and 0.9789, respectively.

We then evaluated the generalization ability of the models (Table 5 and Figure 5). Among the ensemble models, the best values of accuracy, precision, AUC, recall, Kappa, and MCC were 0.8857, 0.9285, 0.9340, 0.8571, 0.7628, and 0.7629, respectively, and Bagging showed the best predictive performance. Voting followed closely, Stacking ranked in the middle, and Boosting provided the smallest improvement in predictive capability.

Each ensemble strategy (Bagging, Boosting, Stacking, and Voting) has its own advantages and limitations for landslide susceptibility mapping. Bagging reduces variance by averaging the predictions of models trained on different bootstrap samples, consistent with its 5.80% AUC improvement over SVM. Boosting methods, such as XGBoost, improve accuracy by sequentially minimizing a loss function, although they may overfit. Stacking combines the outputs of multiple base learners through a secondary model and also improved accuracy. Voting aggregates predictions by majority or weighted voting, which increases robustness against overfitting. Overall, combining diverse models reduces prediction errors, but the ensemble strategy must be chosen carefully to optimize predictive performance in complex environments.

4.3. Landslide Susceptibility Mapping

Using well-trained landslide susceptibility models, we calculated a susceptibility index ranging from 0 to 1, with higher values indicating greater landslide susceptibility. To enhance interpretability, we applied the natural breaks method to classify this index into five categories, providing a clearer depiction of susceptibility distribution for informed decision making (Figure 5) [33,48,49]. The generated susceptibility maps reveal consistent spatial patterns across methodologies, with areas classified as very high susceptibility aligning with regions historically prone to landslides (Figure 6).
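Natural breaks (Jenks) classification chooses class limits that minimize the within-class sum of squared deviations. The sketch below is an exact Fisher–Jenks dynamic program, which may differ in detail from the GIS implementation used in the study; it runs in O(k·n²), so for millions of raster cells it would be applied to a random sample of index values. The function name and toy values are illustrative.

```python
import numpy as np

def jenks_breaks(values, n_classes):
    """Fisher-Jenks natural breaks by dynamic programming: choose class limits
    that minimise the total within-class sum of squared deviations.
    Returns the upper bound of each class in ascending order. O(k * n^2)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    s1 = np.concatenate([[0.0], np.cumsum(x)])        # prefix sums of x
    s2 = np.concatenate([[0.0], np.cumsum(x * x)])    # prefix sums of x^2

    def sse(i, j):                                     # squared deviations within x[i:j]
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    cost = np.full((n_classes + 1, n + 1), np.inf)    # cost[k, j]: best SSE of x[:j] in k classes
    start = np.zeros((n_classes + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):                  # last class is x[i:j]
                c = cost[k - 1, i] + sse(i, j)
                if c < cost[k, j]:
                    cost[k, j], start[k, j] = c, i
    bounds, j = [], n
    for k in range(n_classes, 0, -1):                 # walk back through the class starts
        bounds.append(x[j - 1])
        j = start[k, j]
    return bounds[::-1]

values = [0.02, 0.05, 0.08, 0.40, 0.45, 0.50, 0.90, 0.95]
breaks = jenks_breaks(values, 3)                        # upper bound of each class
classes = np.searchsorted(breaks, [0.05, 0.45, 0.93])   # class index 0, 1, 2
```

With `n_classes=5` the same routine yields the very low to very high susceptibility classes.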

The results demonstrate that both machine learning and ensemble learning techniques exhibit similar spatial distribution characteristics in landslide susceptibility mapping. High- and very-high-susceptibility zones correlate with significant landslide occurrences, indicating robust predictive capabilities aligned with terrain features. We also computed landslide density within these five susceptibility zones (Table 6).
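This section does not state how landslide density in Table 6 is defined, so the sketch computes two common variants per susceptibility zone: landslides per km² and the frequency ratio (share of landslides divided by share of area). Zone codes 0 to 4 (very low to very high), the 30 m cell size, and the toy rasters are assumptions for illustration.

```python
import numpy as np

def zone_density(zone, landslide_mask, n_zones=5, cellsize=30.0):
    """Landslides per km^2 and frequency ratio for each susceptibility zone.
    zone: integer class per cell (0 = very low ... 4 = very high);
    landslide_mask: True where a cell contains a landslide point."""
    zone = np.asarray(zone).ravel()
    hit = np.asarray(landslide_mask, dtype=bool).ravel()
    cells = np.bincount(zone, minlength=n_zones)                 # cells per zone
    slides = np.bincount(zone[hit], minlength=n_zones)           # landslide cells per zone
    area_km2 = cells * cellsize ** 2 / 1e6
    with np.errstate(divide="ignore", invalid="ignore"):
        density = slides / area_km2                              # landslides per km^2
        freq_ratio = (slides / slides.sum()) / (cells / cells.sum())   # landslide share / area share
    return density, freq_ratio

zone = np.array([0, 0, 0, 0, 1, 1, 2, 2, 3, 4])
landslide = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=bool)
density, freq_ratio = zone_density(zone, landslide)
```

A frequency ratio above 1 means a zone holds more landslides than its area alone would predict, which is the pattern expected for well-performing high-susceptibility classes.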

High-susceptibility areas are concentrated mainly along roads, rivers, and valleys. Notably, the landslide-prone areas (very-high- and high-susceptibility zones) cover only 16% of the study area yet contain nearly all historical landslides. In Jiuzhaigou County, intensive human activities, particularly highway construction, exacerbate landslide risk by increasing slope exposure. Previous studies highlight the need for extensive infrastructure driven by rising tourism and report 138 geological disasters in Zhangzha Town, 85 of which occurred in scenic areas [50]. Future research should therefore focus on balancing sustainable tourism development in Jiuzhaigou with effective strategies to mitigate landslide risks and ensure the safety of local residents.

5. Discussion

5.1. SHAP Model Interpretation

This study used SHAP to explain the Bagging model and analyze how various factors influence the spatial distribution of landslides (Figure 7). Figure 7a ranks the factors by their mean absolute SHAP values and identifies distance to roads, rainfall, and lithology as the most important contributors to landslides. Figure 7b shows the SHAP values of each feature.

The results indicate that proximity to roads correlates with lower slope stability, while areas further from roads exhibit greater stability. Annual average rainfall initially promotes landslide occurrences, but this effect diminishes at higher rainfall levels. Human activities, including deforestation, urbanization, and mining, significantly destabilize slopes. Deforestation reduces root systems that anchor soil, increasing erosion and landslide risk. Urbanization leads to increased runoff and soil saturation, further destabilizing slopes. Mining creates steep, unsupported slopes and disrupts natural drainage, heightening failure risks.

Rainfall affects slope stability through several mechanisms. Infiltration saturates the soil and reduces its cohesion, heavy rainfall raises pore water pressure, and runoff erodes the slope surface, all of which promote sliding; even moderate rainfall can trigger landslides. Nevertheless, beyond a certain threshold, further increases in rainfall are associated with a decreased likelihood of landslides [51].

Lithology also significantly impacts landslide development by influencing slope stability, water retention, and erosion. Different soil types affect water retention; for example, clay and silt can increase pore pressure, reducing stability, while sandy soils may become loose when wet. Rock types further influence stability: igneous rocks tend to be stable, while sedimentary and metamorphic rocks may exhibit weaknesses due to layered structures or foliation. Weathering processes differ among lithologies, producing loose materials that impact slope integrity.

Local interpretations often provide more precise insights than global explanations for specific regions. The waterfall chart in Figure 8 illustrates the magnitude and trend of each variable within a selected landslide prediction unit. The vertical axis represents the landslide impact factors and their values, while the horizontal axis shows the corresponding SHAP values.

The chart shows that distance to roads and NDVI are the most influential variables for this prediction unit. Notably, the rankings of rainfall and lithology for this individual landslide differ from their global rankings.

5.2. Limitation and Future Work

This study acknowledges that expert knowledge influenced the data partitioning performed with the holdout method. In addition, although a one-to-one ratio of landslide to non-landslide samples was used, we did not fully address the potential imbalance between these datasets, which could affect their relative contributions to the model.

Additionally, the analysis utilized grid cells, which may inadequately represent geological, topographical, and hydrogeological features. Future research should consider using slope units derived from terrain segmentation, as they more effectively capture the factors influencing landslides [52].

Model interpretability is crucial in landslide research due to the significant safety and economic implications of decisions based on these models. Beyond SHAP, techniques like Class Activation Mapping (CAM) can further enhance interpretability [53]. Future studies should explore the physical mechanisms underpinning the relationships identified by these models to strengthen their practical applications.

6. Conclusions

This study combines ensemble learning with SHAP to develop an interpretable LSM model based on 164 landslides and 14 factors in Jiuzhaigou County, Sichuan Province, yielding accurate landslide susceptibility assessments with clear insights into factor importance. Our findings show that ensemble learning improves the accuracy and robustness of individual models, and that SHAP clarifies the roles of precipitation, lithology, and distance to roads in landslide occurrence. This integration of interpretability and ensemble learning offers a novel approach to regional LSM. Future work could explore more advanced interpretable machine learning models to further advance the field.

Author Contributions

B.A. and Z.Z.: Data curation, Methodology, Software, Writing—original draft. S.X. and W.Z.: Conceptualization, Methodology, Supervision. Y.Y., Z.L. and C.L.: Data curation. All authors have read and agreed to the published version of the manuscript.

Funding

This research was jointly funded by the National Key R&D Program of China (Grant Nos. 2023YFC3206202 and 2023YFC3209102) and the Major Science and Technology Projects (Grant No. SKS-2022008) financed by the Ministry of Water Resources, China. Zhijie Zhang was financially supported by the research fund provided by the Department of Environment and Society, Quinney College of Natural Resources, Utah State University.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

All the authors thank the reviewers and editors for their valuable comments and suggestions in improving the quality of the work presented.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Abedini, M.; Ghasemian, B.; Shirzadi, A.; Shahabi, H.; Chapi, K.; Binh, T.P.; Bin Ahmad, B.; Dieu, T.B. A Novel Hybrid Approach of Bayesian Logistic Regression and Its Ensembles for Landslide Susceptibility Assessment. Geocarto Int. 2019, 34, 1427–1457.
2. Aditian, A.; Kubota, T.; Shinohara, Y. Comparison of GIS-Based Landslide Susceptibility Models Using Frequency Ratio, Logistic Regression, and Artificial Neural Network in a Tertiary Region of Ambon, Indonesia. Geomorphology 2018, 318, 101–111.
3. Dong, V.D.; Jaafari, A.; Bayat, M.; Mafi-Gholami, D.; Qi, C.; Moayedi, H.; Tran, V.P.; Hai-Bang, L.; Tien-Thinh, L.; Phan, T.T.; et al. A Spatially Explicit Deep Learning Neural Network Model for the Prediction of Landslide Susceptibility. Catena 2020, 188, 104451.
4. Chen, W.; Zhang, S. GIS-Based Comparative Study of Bayes Network, Hoeffding Tree and Logistic Model Tree for Landslide Susceptibility Modeling. Catena 2021, 203, 105344.
5. Panchal, S.; Shrivastava, A.K. Landslide Hazard Assessment Using Analytic Hierarchy Process (AHP): A Case Study of National Highway 5 in India. Ain Shams Eng. J. 2022, 13, 101626.
6. Mallick, J.; Singh, R.K.; AlAwadh, M.A.; Islam, S.; Khan, R.A.; Qureshi, M.N. GIS-Based Landslide Susceptibility Evaluation Using Fuzzy-AHP Multi-Criteria Decision-Making Techniques in the Abha Watershed, Saudi Arabia. Environ. Earth Sci. 2018, 77, 276.
7. Cengiz, L.D.; Ercanoglu, M. A Novel Data-Driven Approach to Pairwise Comparisons in AHP Using Fuzzy Relations and Matrices for Landslide Susceptibility Assessments. Environ. Earth Sci. 2022, 81, 222.
8. Nanehkaran, Y.A.; Mao, Y.; Azarafza, M.; Kockar, M.K.; Zhu, H.-H. Fuzzy-Based Multiple Decision Method for Landslide Susceptibility and Hazard Assessment: A Case Study of Tabriz, Iran. Geomech. Eng. 2021, 24, 407–418.
9. Wang, Y.; Nanehkaran, Y.A. GIS-Based Fuzzy Logic Technique for Mapping Landslide Susceptibility Analyzing in a Coastal Soft Rock Zone. Nat. Hazards 2024, 120, 10889–10921.
10. Karaman, M.O.; Cabuk, S.N.; Pekkan, E. Utilization of Frequency Ratio Method for the Production of Landslide Susceptibility Maps: Karaburun Peninsula Case, Turkey. Environ. Sci. Pollut. Res. 2022, 29, 91285–91305.
11. Alsabhan, A.H.; Singh, K.; Sharma, A.; Alam, S.; Pandey, D.D.; Rahman, S.A.S.; Khursheed, A.; Munshi, F.M. Landslide Susceptibility Assessment in the Himalayan Range Based along Kasauli-Parwanoo Road Corridor Using Weight of Evidence, Information Value, and Frequency Ratio. J. King Saud. Univ. Sci. 2022, 34, 101759.
12. Sahana, M.; Sajjad, H. Evaluating Effectiveness of Frequency Ratio, Fuzzy Logic and Logistic Regression Models in Assessing Landslide Susceptibility: A Case from Rudraprayag District, India. J. Mt. Sci. 2017, 14, 2150–2167.
13. Huang, Y.; Zhao, L. Review on Landslide Susceptibility Mapping Using Support Vector Machines. Catena 2018, 165, 520–529.
14. Deng, H.; Wu, X.; Zhang, W.; Liu, Y.; Li, W.; Li, X.; Zhou, P.; Zhuo, W. Slope-Unit Scale Landslide Susceptibility Mapping Based on the Random Forest Model in Deep Valley Areas. Remote Sens. 2022, 14, 4245.
15. Rong, G.; Li, K.; Su, Y.; Tong, Z.; Liu, X.; Zhang, J.; Zhang, Y.; Li, T. Comparison of Tree-Structured Parzen Estimator Optimization in Three Typical Neural Network Models for Landslide Susceptibility Assessment. Remote Sens. 2021, 13, 4694.
16. Hua, Y.; Wang, X.; Li, Y.; Xu, P.; Xia, W. Dynamic Development of Landslide Susceptibility Based on Slope Unit and Deep Neural Networks. Landslides 2021, 18, 281–302.
17. Hakim, W.L.; Rezaie, F.; Nur, A.S.; Panahi, M.; Khosravi, K.; Lee, C.-W.; Lee, S. Convolutional Neural Network (CNN) with Metaheuristic Optimization Algorithms for Landslide Susceptibility Mapping in Icheon, South Korea. J. Environ. Manag. 2022, 305, 114367.
18. Dou, J.; Yunus, A.P.; Merghadi, A.; Shirzadi, A.; Hoang, N.; Hussain, Y.; Avtar, R.; Chen, Y.; Binh, T.P.; Yamagishi, H. Different Sampling Strategies for Predicting Landslide Susceptibilities Are Deemed Less Consequential with Deep Learning. Sci. Total Environ. 2020, 720, 137320.
19. Nanehkaran, Y.A.; Zhu, L.; Chen, J.; Azarafza, M.; Mao, Y. Application of Artificial Neural Networks and Geographic Information System to Provide Hazard Susceptibility Maps for Rockfall Failures. Environ. Earth Sci. 2022, 81, 475.
20. Huang, F.; Yan, J.; Fan, X.; Yao, C.; Huang, J.; Chen, W.; Hong, H. Uncertainty Pattern in Landslide Susceptibility Prediction Modelling: Effects of Different Landslide Boundaries and Spatial Shape Expressions. Geosci. Front. 2022, 13, 101317.
21. Zeng, T.; Wu, L.; Hayakawa, Y.S.; Yin, K.; Gui, L.; Jin, B.; Guo, Z.; Peduto, D. Advanced Integration of Ensemble Learning and MT-InSAR for Enhanced Slow-Moving Landslide Susceptibility Zoning. Eng. Geol. 2024, 331, 107436.
22. Qiao, X.; Du, J.; Lugli, S.; Ren, J.; Xiao, W.; Chen, P.; Tang, Y. Are Climate Warming and Enhanced Atmospheric Deposition of Sulfur and Nitrogen Threatening Tufa Landscapes in Jiuzhaigou National Nature Reserve, Sichuan, China? Sci. Total Environ. 2016, 562, 724–731.
23. Fan, X.; Scaringi, G.; Xu, Q.; Zhan, W.; Dai, L.; Li, Y.; Pei, X.; Yang, Q.; Huang, R. Coseismic Landslides Triggered by the 8th August 2017 Ms 7.0 Jiuzhaigou Earthquake (Sichuan, China): Factors Controlling Their Spatial Distribution and Implications for the Seismogenic Blind Fault Identification. Landslides 2018, 15, 967–983.
Hu, X.; Mei, H.; Zhang, H.; Li, Y.; Li, M. Performance Evaluation of Ensemble Learning Techniques for Landslide Susceptibility Mapping at the Jinping County, Southwest China. Nat. Hazards 2021, 105, 1663–1689. [Google Scholar] [CrossRef]
Zeng, T.; Jin, B.; Glade, T.; Xie, Y.; Li, Y.; Zhu, Y.; Yin, K. Assessing the Imperative of Conditioning Factor Grading in Machine Learning-Based Landslide Susceptibility Modeling: A Critical Inquiry. Catena 2024, 236, 107732. [Google Scholar] [CrossRef]
Xia, L.; Shen, J.; Zhang, T.; Dang, G.; Wang, T. GIS-Based Landslide Susceptibility Modeling Using Data Mining Techniques. Front. Earth Sci. 2023, 11, 1187384. [Google Scholar] [CrossRef]
Yi, Y.; Zhang, W.; Xu, X.; Zhang, Z.; Wu, X. Evaluation of Neural Network Models for Landslide Susceptibility Assessment. Int. J. Digit. Earth 2022, 15, 934–953. [Google Scholar] [CrossRef]
Yi, Y.; Zhang, Z.; Zhang, W.; Xu, Q.; Deng, C.; Li, Q. GIS-Based Earthquake-Triggered-Landslide Susceptibility Mapping with an Integrated Weighted Index Model in Jiuzhaigou Region of Sichuan Province, China. Nat. Hazards Earth Syst. Sci. 2019, 19, 1973–1988. [Google Scholar] [CrossRef]
Ciurleo, M.; Cascini, L.; Calvello, M. A Comparison of Statistical and Deterministic Methods for Shallow Landslide Susceptibility Zoning in Clayey Soils. Eng. Geol. 2017, 223, 71–81. [Google Scholar] [CrossRef]
Manzo, G.; Tofani, V.; Segoni, S.; Battistini, A.; Catani, F. GIS Techniques for Regional-Scale Landslide Susceptibility Assessment: The Sicily (Italy) Case Study. Int. J. Geogr. Inf. Sci. 2013, 27, 1433–1452. [Google Scholar] [CrossRef]
Oh, H.-J.; Pradhan, B. Application of a Neuro-Fuzzy Model to Landslide-Susceptibility Mapping for Shallow Landslides in a Tropical Hilly Area. Comput. Geosci. 2011, 37, 1264–1276. [Google Scholar] [CrossRef]
Sameen, M.I.; Pradhan, B.; Lee, S. Application of Convolutional Neural Networks Featuring Bayesian Optimization for Landslide Susceptibility Assessment. Catena 2020, 186, 104249. [Google Scholar] [CrossRef]
Othman, A.A.; Gloaguen, R.; Andreani, L.; Rahnama, M. Improving Landslide Susceptibility Mapping Using Morphometric Features in the Mawat Area, Kurdistan Region, NE Iraq: Comparison of Different Statistical Models. Geomorphology 2018, 319, 147–160. [Google Scholar] [CrossRef]
Wang, Q.; Wang, Y.; Niu, R.; Peng, L. Integration of Information Theory, K-Means Cluster Analysis and the Logistic Regression Model for Landslide Susceptibility Mapping in the Three Gorges Area, China. Remote Sens. 2017, 9, 938. [Google Scholar] [CrossRef]
Kornejady, A.; Ownegh, M.; Bahremand, A. Landslide Susceptibility Assessment Using Maximum Entropy Model with Two Different Data Sampling Methods. Catena 2017, 152, 144–162. [Google Scholar] [CrossRef]
Meneses, B.M.; Pereira, S.; Reis, E. Effects of Different Land Use and Land Cover Data on the Landslide Susceptibility Zonation of Road Networks. Nat. Hazards Earth Syst. Sci. 2019, 19, 471–487. [Google Scholar] [CrossRef]
Liu, Y.; Zhang, W.; Zhang, Z.; Xu, Q.; Li, W. Risk Factor Detection and Landslide Susceptibility Mapping Using Geo-Detector and Random Forest Models: The 2018 Hokkaido Eastern Iburi Earthquake. Remote Sens. 2021, 13, 1157. [Google Scholar] [CrossRef]
Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
Breiman, L. Stacked Regressions. Mach. Learn. 1996, 24, 49–64. [Google Scholar] [CrossRef]
Choubin, B.; Moradi, E.; Golshan, M.; Adamowski, J.; Sajedi-Hosseini, F.; Mosavi, A. An Ensemble Prediction of Flood Susceptibility Using Multivariate Discriminant Analysis, Classification and Regression Trees, and Support Vector Machines. Sci. Total Environ. 2019, 651, 2087–2096. [Google Scholar] [CrossRef]
Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
Schapire, R. The Strength of Weak Learnability. Mach. Learn. 1990, 5, 197–227. [Google Scholar] [CrossRef]
Wolpert, D. Stacked Generalization. Neural Netw. 1992, 5, 241–259. [Google Scholar] [CrossRef]
Fang, Z.; Wang, Y.; Peng, L.; Hong, H. Integration of Convolutional Neural Network and Conventional Machine Learning Classifiers for Landslide Susceptibility Mapping. Comput. Geosci. 2020, 139, 104470. [Google Scholar] [CrossRef]
Wang, Y.; Fang, Z.; Wang, M.; Peng, L.; Hong, H. Comparative Study of Landslide Susceptibility Mapping with Different Recurrent Neural Networks. Comput. Geosci. 2020, 138, 104445. [Google Scholar] [CrossRef]
Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Neural Information Processing Systems (NIPS): La Jolla, CA, USA, 2017; Volume 30. [Google Scholar]
Yi, Y.; Zhang, Z.; Zhang, W.; Jia, H.; Zhang, J. Landslide Susceptibility Mapping Using Multiscale Sampling Strategy and Convolutional Neural Network: A Case Study in Jiuzhaigou Region. Catena 2020, 195, 104851. [Google Scholar] [CrossRef]
Alizadeh, M.; Alizadeh, E.; Asadollahpour Kotenaee, S.; Shahabi, H.; Beiranvand Pour, A.; Panahi, M.; Bin Ahmad, B.; Saro, L. Social Vulnerability Assessment Using Artificial Neural Network (ANN) Model for Earthquake Hazard in Tabriz City, Iran. Sustainability 2018, 10, 3376. [Google Scholar] [CrossRef]
Erener, A.; Mutlu, A.; Sebnem Düzgün, H. A Comparative Study for Landslide Susceptibility Mapping Using GIS-Based Multi-Criteria Decision Analysis (MCDA), Logistic Regression (LR) and Association Rule Mining (ARM). Eng. Geol. 2016, 203, 45–55. [Google Scholar] [CrossRef]
Cao, J.; Zhang, Z.; Wang, C.; Liu, J.; Zhang, L. Susceptibility Assessment of Landslides Triggered by Earthquakes in the Western Sichuan Plateau. Catena 2019, 175, 63–76. [Google Scholar] [CrossRef]
Zhu, Y.; Qiu, H.; Liu, Z.; Ye, B.; Tang, B.; Li, Y.; Kamp, U. Rainfall and Water Level Fluctuations Dominated the Landslide Deformation at Baihetan Reservoir, China. J. Hydrol. 2024, 642, 131871. [Google Scholar] [CrossRef]
Chang, Z.; Catani, F.; Huang, F.; Liu, G.; Meena, S.R.; Huang, J.; Zhou, C. Landslide Susceptibility Prediction Using Slope Unit-Based Machine Learning Models Considering the Heterogeneity of Conditioning Factors. J. Rock Mech. Geotech. Eng. 2023, 15, 1127–1143. [Google Scholar] [CrossRef]
Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning Deep Features for Discriminative Localization. In Proceedings of the 2016 IEEE Conference On Computer Vision And Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929. [Google Scholar]

Figure 1. (a,b) Geographical location of the study area; (c) elevation of the study area.

Figure 2. Data layers of the landslide conditioning factors: (a) slope, (b) aspect, (c) plan curvature, (d) profile curvature, (e) topographic wetness index (TWI), (f) stream power index (SPI), (g) lithology, (h) distance to roads, (i) distance to historical earthquakes, (j) distance to human activity points, (k) distance to faults, (l) land use/land cover (LULC), (m) NDVI, and (n) annual precipitation.

Figure 3. Flowchart of the methodology.

Figure 4. Multicollinearity analysis of the 14 conditioning factors.

Figure 5. ROC curves of the landslide models: (a) training dataset; (b) validation dataset.

Figure 6. Landslide susceptibility zoning results from LSM models: (a) SVM, (b) GNB, (c) LR, (d) Bagging, (e) Boosting, (f) Voting, and (g) Stacking.

Figure 7. (a) The importance of factors and (b) the summary plot of SHAP values.

Figure 8. Local interpretation of a landslide unit.
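Figures 7 and 8 rest on Shapley values, which SHAP uses to split a single prediction among the input factors. As a minimal sketch of that idea (not the authors' implementation), the Python code below computes exact Shapley values for a toy model by enumerating every coalition of features, replacing "absent" features with a single baseline vector. The single-baseline treatment and the names `exact_shapley` and `toy_model` are our assumptions; the `shap` library instead approximates these values (for example with `KernelExplainer`) and averages over a background dataset.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of model f at instance x.

    A feature absent from a coalition takes its value from `baseline`
    (a single reference vector; an illustrative simplification).
    """
    n = len(x)

    def value(coalition):
        # Hybrid input: present features from x, absent ones from baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical score with an interaction between z[0] and z[1].
    return z[0] * z[1] + z[2]


if __name__ == "__main__":
    x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
    phi = exact_shapley(toy_model, x, base)
    print(phi)  # the z0*z1 interaction is shared equally by features 0 and 1
    # Local accuracy: contributions sum to f(x) - f(baseline).
    print(sum(phi), toy_model(x) - toy_model(base))
```

Exhaustive enumeration needs 2^(n-1) coalitions per feature, which for the 14 conditioning factors is 8,192 per feature and per mapping unit; that cost is one reason approximate explainers are used in practice.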

Table 1. Description of lithology of study area.

Legend
Number	Lithology	Number	Lithology
a	Holocene Alluvial	j	Ice Peak Group
b	Pleistocene Glaciers Accumulate	k	Qiongyi Group
c	Oligocene Geru Group	l	Late Carboniferous Hollowa Group
d	Baoding Group or Baiguowan Group	m	Gangou Group
e	Sanzhushan Group–Maso Mountain Group	n	Posongchong Group–Qujing Group
f	Qingtianbao Group–Yanyuan Group	o	Early Devonian Segala Group
g	Dashibao Group–Pinecigou Group	p	Lou Shanguan Group
h	Permian Mafic Rocks	q	Wooden Seat Group–Crystal Group
i	Gunda Overview Group	r	Yang Tianba Group

Table 2. Factors utilized in the present study.

Source	Time	Scale	Format	Factor
Landslides		-	Shapefile	-
ASTER DEM (https://earthexplorer.usgs.gov/)	2011	30 m	TIFF	Slope
				Aspect
				Profile curvature
				Plan curvature
				SPI
				TWI
Basic geographic information data (https://www.webmap.cn/)	2021	1:1,000,000	Shapefile	Distance to roads
				Distance to human activity points
Historical earthquake points (https://data.earthquake.cn/)	2010–2022	-	Shapefile	Distance to historical earthquake points
Geological map	2017	1:1,000,000	TIFF	Distance to faults
				Lithology
LULC map (http://data.ess.tsinghua.edu.cn/)	2020	30 m	TIFF	LULC
Landsat 8	2022	30 m	TIFF	NDVI
Chinese Academy of Sciences, Center for Resource and Environmental Data and Sciences	2002–2022	1000 m	TIFF	Annual precipitation
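Figure 4 reports the multicollinearity of the 14 factors listed above. This section does not state which statistic was used; a common choice is the variance inflation factor (VIF) and its reciprocal, tolerance, with VIF > 10 (tolerance < 0.1) often taken as a warning sign. The sketch below is an assumption rather than the authors' procedure: it computes VIF with NumPy least squares on synthetic stand-in factors, and the function name `vif` is ours.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (rows = samples, cols = factors).

    VIF_j = 1 / (1 - R_j^2), with R_j^2 from an ordinary least-squares fit of
    column j on all remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f1 = rng.normal(size=500)                        # hypothetical factor values
    f2 = rng.normal(size=500)
    f3 = f1 + f2 + 0.05 * rng.normal(size=500)       # nearly a linear combination
    f4 = rng.normal(size=500)                        # unrelated factor
    values = vif(np.column_stack([f1, f2, f3, f4]))
    for name, v in zip(["f1", "f2", "f3 (~ f1 + f2)", "f4"], values):
        print(f"{name}: VIF = {v:.1f}, tolerance = {1 / v:.3f}")
```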

Table 3. Four cases of prediction results of binary classification model.

True Value (rows) / Predicted Value (columns)	Positive Class (1)	Negative Class (0)
Positive Class (1)	True Positive (TP)	False Negative (FN)
Negative Class (0)	False Positive (FP)	True Negative (TN)
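The scores in Tables 4 and 5 (accuracy, precision, recall, Cohen's kappa, and the Matthews correlation coefficient) are all functions of the four cells above. The sketch below writes out the standard formulas; the function name and the counts in the usage example are hypothetical. With the illustrative balanced counts used here (49 positives, 49 negatives), the outputs agree with the SVM row of Table 5 to within 0.0001, but the source does not report its underlying counts, so treat this only as a consistency illustration.

```python
import math


def confusion_metrics(tp, fn, fp, tn):
    """Binary-classification scores from the four confusion-matrix cells."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement, where
    # chance agreement is built from the predicted and true class marginals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0
    # Matthews correlation coefficient (MCC).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "kappa": kappa, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts: 49 landslide and 49 non-landslide validation samples.
    for name, value in confusion_metrics(tp=39, fn=10, fp=4, tn=45).items():
        print(f"{name}: {value:.4f}")
```

Kappa and MCC are included alongside accuracy because they stay informative when the two classes are unbalanced, where accuracy alone can look deceptively high.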

Table 4. Evaluation of the landslide models using the training dataset.

	Model	Accuracy	Precision	AUC	Recall	Kappa	MCC
Base classifiers	SVM	0.8672	0.8952	0.9267	0.8318	0.7345	0.7363
	LR	0.8230	0.8119	0.9020	0.8407	0.6560	0.6464
	GNB	0.8761	0.9126	0.9174	0.8318	0.7522	0.7551
Ensemble learning methods	Voting	0.9070	0.9259	0.9789	0.8849	0.8141	0.8149
	Bagging	0.9380	0.9380	0.9847	0.9380	0.8761	0.8762
	Stacking	0.9161	0.9126	0.9516	0.8718	0.7922	0.8081
	Boosting	0.9336	0.9437	0.9789	0.9115	0.8672	0.8681

Table 5. Evaluation of the landslide models using the validation dataset.

	Model	Accuracy	Precision	AUC	Recall	Kappa	MCC
Base classifiers	SVM	0.8571	0.9069	0.9192	0.7959	0.7142	0.7197
	LR	0.8265	0.8200	0.9042	0.8367	0.6530	0.6531
	GNB	0.8571	0.9069	0.9017	0.7959	0.7142	0.7197
Ensemble learning methods	Voting	0.8775	0.9092	0.9296	0.8571	0.7551	0.7557
	Bagging	0.8775	0.9243	0.9340	0.8571	0.7551	0.7557
	Stacking	0.8673	0.9285	0.9291	0.8124	0.7522	0.7551
	Boosting	0.8857	0.8869	0.9180	0.8163	0.7628	0.7629
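The AUC columns in Tables 4 and 5 summarise the ROC curves of Figure 5. As a dependency-free sketch (function names and sample data are ours), the code below builds ROC points by sweeping the decision threshold and checks that the trapezoidal area under them equals the rank-based definition of AUC: the probability that a randomly chosen landslide sample scores higher than a randomly chosen non-landslide sample, with ties counted as one half.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points from sweeping the threshold over distinct scores, high to low."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at this score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def rank_auc(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0]              # hypothetical landslide (1) / non-landslide (0)
    probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.2]  # hypothetical predicted probabilities
    print(trapezoid_auc(roc_points(labels, probs)), rank_auc(labels, probs))
```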

Table 6. Landslide density (no./km²) and proportion (%) of each susceptibility zone for the LSM models.

Zone	Very Low		Low		Moderate		High		Very High	
Model	Density	Ratio (%)	Density	Ratio (%)	Density	Ratio (%)	Density	Ratio (%)	Density	Ratio (%)
SVM	0.0249	46.23	0.0867	17.76	0.0992	18.56	0.1577	10.22	1.0223	7.23
LR	0.0071	40.93	0.0323	27.23	0.1235	16.52	0.2438	9.22	1.5450	6.10
GNB	0.0035	60.30	0.0218	27.12	0.1280	12.22	0.2022	6.32	1.4233	6.26
Voting	0.0013	41.72	0.0316	30.24	0.1273	12.83	0.2012	7.12	1.2011	6.63
Boosting	0.0025	52.23	0.0225	19.60	0.1125	11.29	0.2532	7.26	1.4324	9.62
Bagging	0.0004	42.57	0.0023	18.31	0.0183	21.95	0.1943	11.29	1.6792	5.68
Stacking	0.0020	56.07	0.0113	17.48	0.0204	15.22	0.1822	5.21	1.6172	6.02
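Table 6 reports, for each susceptibility zone, a landslide density (no./km²) and a percentage. A minimal sketch of that bookkeeping follows, assuming the percentage is the share of mapped area falling in each zone and that the map is a 30 m raster (the DEM resolution in Table 2). The function name, zone cell counts, and landslide counts are hypothetical, and how the five zones were delineated is not covered here.

```python
CELL_AREA_KM2 = 30 * 30 / 1e6  # assumption: 30 m x 30 m raster cells = 0.0009 km²


def zone_statistics(cell_counts, landslide_counts, cell_area_km2=CELL_AREA_KM2):
    """Per-zone area share (%) and landslide density (no./km²).

    cell_counts: number of raster cells assigned to each susceptibility zone.
    landslide_counts: number of landslides falling inside each zone.
    """
    total_cells = sum(cell_counts.values())
    stats = {}
    for zone, cells in cell_counts.items():
        area_km2 = cells * cell_area_km2
        stats[zone] = {
            "ratio_pct": 100.0 * cells / total_cells,
            "density_per_km2": landslide_counts.get(zone, 0) / area_km2 if area_km2 else 0.0,
        }
    return stats


if __name__ == "__main__":
    # Hypothetical zone sizes (cells) and landslide counts, for illustration only.
    cells = {"Very Low": 500_000, "Low": 250_000, "Moderate": 150_000,
             "High": 70_000, "Very High": 30_000}
    slides = {"Very Low": 2, "Low": 4, "Moderate": 10, "High": 15, "Very High": 40}
    for zone, s in zone_statistics(cells, slides).items():
        print(f"{zone:>9}: ratio = {s['ratio_pct']:5.2f} %, "
              f"density = {s['density_per_km2']:.4f} per km²")
```

A density that rises from the Very Low to the Very High zone, as it does in every row of Table 6, is the usual sanity check that a susceptibility map ranks terrain sensibly.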

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

An, B.; Zhang, Z.; Xiong, S.; Zhang, W.; Yi, Y.; Liu, Z.; Liu, C. Landslide Susceptibility Mapping Based on Ensemble Learning in the Jiuzhaigou Region, Sichuan, China. Remote Sens. 2024, 16, 4218. https://doi.org/10.3390/rs16224218
