A New Approach to Spatial Landslide Susceptibility Prediction in Karst Mining Areas Based on Explainable Artificial Intelligence

Fang, Haoran; Shao, Yun; Xie, Chou; Tian, Bangsen; Shen, Chaoyong; Zhu, Yu; Guo, Yihong; Yang, Ying; Chen, Guanwen; Zhang, Ming

doi:10.3390/su15043094

by Haoran Fang ¹,²,³; Yun Shao ¹,²,³; Chou Xie ¹,²,³,*; Bangsen Tian ¹; Chaoyong Shen ⁴; Yu Zhu ¹; Yihong Guo ¹; Ying Yang ¹,²; Guanwen Chen ⁴; and Ming Zhang ¹,²

¹ Aerospace Information Research Institute, University of Chinese Academy of Sciences, Beijing 100094, China

² University of Chinese Academy of Sciences, Beijing 100049, China

³ Laboratory of Target Microwave Properties, Deqing Academy of Satellite Applications, Huzhou 313200, China

⁴ The Third Surveying and Mapping Institute of Guizhou Province, Guiyang 550004, China

* Author to whom correspondence should be addressed.

Sustainability 2023, 15(4), 3094; https://doi.org/10.3390/su15043094

Submission received: 6 January 2023 / Revised: 2 February 2023 / Accepted: 6 February 2023 / Published: 8 February 2023

(This article belongs to the Special Issue Geological Hazards and Risk Management)

Abstract: Landslides are a common and costly geological hazard, with regular occurrences leading to significant damage and losses. To effectively manage land use and reduce the risk of landslides, it is crucial to conduct susceptibility assessments. To date, many machine-learning methods have been applied to the production of landslide susceptibility maps (LSM). However, because landslide susceptibility is a form of risk prediction, applying these methods in practice without good interpretability is itself risky. This study aimed to produce an LSM for the Nayong region of Guizhou, China, and to conduct a comprehensive assessment and evaluation of landslide susceptibility maps using explainable artificial intelligence. The study combines remote sensing data, field surveys, geographic information system techniques, and interpretable machine-learning techniques to analyze landslide susceptibility and to compare the results with those of conventional models. As an interpretable machine-learning method, generalized additive models with structured interactions (GAMI-net) can be used to understand how LSM models make decisions. The results showed that the GAMI-net model was valid and had an area under the curve (AUC) value of 0.91 on the receiver operating characteristic (ROC) curve, which is better than the values of 0.85 and 0.81 for the random forest and SVM models, respectively. Areas with coal mining, rock desertification, and rainfall greater than 1300 mm were more susceptible to landslides in the study area. Additionally, pairwise interaction factors, such as rainfall and mining, lithology and rainfall, and rainfall and elevation, also increased the landslide susceptibility. The results showed that interpretable models could accurately predict landslide susceptibility and reveal the causes of landslide occurrence. The GAMI-net-based model exhibited good predictive capability and significantly increased model interpretability to inform landslide management and decision making, which suggests its great potential for application in LSM.

Keywords:

landslide susceptibility map; explainable AI; GIS; Karst landform; coal mining

1. Introduction

In Karst regions, landslides are a significant natural hazard distinguished by their widespread occurrence, high frequency, and severe damage [1,2]. The frequency, size, and destructiveness of landslides have recently increased due to intensified human activity and frequent extreme weather events [3,4]. Landslides pose a hazard to 74 million people in China alone, with yearly direct economic damages typically running into the hundreds of millions [5]. While landslides may not be entirely avoidable, managers can design effective mitigation measures to reduce casualties and damage by studying landslide conditioning factors, modeling landslide risk, and mapping and interpreting hazard-prone locations. Landslide susceptibility assessments are therefore valuable for disaster prevention, land resource planning, and risk assessment in landslide-prone areas.

Landslide susceptibility assessment is a method of quantitatively and spatially estimating the likelihood of a landslide occurring in a specific area [6]. It considers various factors contributing to landslides, including topographic features such as slope and aspect, and material properties such as soil type and lithology [7]. In addition, environmental factors such as climate, hydrology, ecology, human activities, and seismic activity can also play a role in the occurrence of landslides [8]. Recent advances in remote sensing technology and spatial datasets have provided valuable data for geological hazard monitoring and landslide susceptibility assessment [9]. Remote sensing allows a wide range of data to be collected, including optical and microwave observations, which can provide information on the normalized difference vegetation index (NDVI), soil moisture content, and surface deformation [10,11]. Zhang and Shen et al. discussed the effect of vegetation distribution on LSM implementation and the use of Interferometric Synthetic Aperture Radar (InSAR) surface deformation information to achieve dynamic refinement of LSM [12,13]. However, geography, topography, and climatic conditions differ between areas, and so do landslide patterns, mechanisms, and drivers. Therefore, landslide influencing factors need to be selected according to the characteristics of the study area. This study considered the impact of frequent mining activities on landslides in the Nayong area.

Researchers have used landslide influencing factors for landslide susceptibility assessment through physical models, statistical analysis methods, or machine learning techniques [14,15,16,17,18,19,20,21]. Physically based approaches use numerical simulation, such as finite element models and limit equilibrium methods, to assess slope stability [14]. These approaches rely on simplifying assumptions about environmental conditions (e.g., soil moisture content and rock characteristics) and landslide structures, and they require costly field exploration, which makes them unsuitable for large-scale regional monitoring [15]. Statistical methods are reproducible and objective; typical methods include logistic regression (LR), frequency ratio methods, and index of entropy [16,17,18]. Machine learning is the current mainstream method for LSM and has significantly improved the accuracy and precision of landslide susceptibility assessment models. Common machine learning methods include support vector machines (SVM), random forest (RF), and deep neural networks [19,20,21].

Previous studies have shown that machine learning (ML) approaches and deep learning (DL) methods can assess landslide susceptibility in various areas [22]. However, researchers have tended to neglect the explanatory aspects of models in seeking higher accuracy. Because these models behave as ‘black boxes’, it is difficult to explain how landslide factors contribute to and dominate the occurrence of all or individual landslides in the study area. The lack of interpretation is a significant drawback of the above machine learning models and a fundamental flaw in their application to high-risk events (e.g., landslides and earthquakes, where lives are at stake) [23]. In these applications, decisions can affect lives and property and can generate huge costs. Based on the above analysis, the study applied an explainable artificial intelligence (XAI) method, called generalized additive models with structured interactions (GAMI-net), to generate the LSM by integrating landslide influencing factors, and explained the effects of the variables (features) on the predictions. The GAMI-net model is an interpretable model proposed by Professor Zhang Aijun in 2021, which combines a generalized additive model with several artificial neural network (ANN) sub-networks [24]. The interpretable model results help decision makers better understand the factors contributing to landslide risk and identify areas that may require interventions such as slope stabilization or land management.

The study extracted the corresponding landslide influencing factors from satellite remote sensing data and geological information. Specifically, terrain elevation, slope, aspect, and curvature were derived from the DEM data. The NDVI is a measure of vegetation cover; it reflects vegetation density, soil moisture content, and soil type, and it is directly related to soil mass movement. Lithology is an essential factor indicating the strength of the soil. Heavy rainfall may lead to soil saturation, resulting in rainfall-induced landslides. In addition, mining, roads, and rivers are also contributing factors to landslides. The GAMI-net model constructed the LSM from these factors and provided a global interpretation of landslide susceptibility. Next, the performance of the traditional models (LR, SVM, and RF) and the GAMI-net model was evaluated and analyzed. Finally, the Shapley additive explanations (SHAP) method was used to analyze the key factors leading to a typical landslide, taking the town of Zongling in Nayong County as an example.

2. Materials and Methods

2.1. Study Area and Landslide Inventory

The study area is located in Nayong County, southeast of Bijie City, Guizhou Province, at longitude 104°55′–105°38′ East and latitude 26°30′–27°05′ North (Figure 1). Karst is widely developed in the region, the geological environment is complex, karst processes are pronounced, and long-term mining activities have caused frequent geological disasters that are highly concealed and therefore hard to detect in advance [25]. On 28 August 2017, a large-scale landslide occurred in the Qiaobian group of the Pusha community in Zhangjiawan Town, which caused more than 30 deaths [26]. Since 2010, 293 geological disaster events have been recorded in Nayong, affecting 1716 people [27]. Landslide hazards therefore seriously threaten the safety of people’s lives and property in the study area.

The Third Institute of Surveying and Mapping of Guizhou Province provided the landslide inventory, which includes a total of 293 historical landslides (2010–2020) in Nayong County. In addition, the same number of non-landslide locations was used as a reference group. According to the geological hazard survey results, there were 194 small landslides, 88 medium landslides, eight large landslides, and three massive landslides.

2.2. Data Preparation

Landslides are natural hazards influenced by various factors, including geometric conditions such as elevation, slope, and aspect, as well as hydrological factors such as the topographic wetness index (TWI) and distance from rivers [28,29]. In addition, geological and environmental factors, such as lithology, land use, and human activity, can also affect the likelihood of landslides [30,31]. These factors may vary depending on the region in which the hazard occurs. Accurate detection of landslides requires the consideration of multiple factors, including those related to both natural phenomena and human activities [32,33].

Based on the above considerations and relevant literature, 14 influencing factors were selected for the study, covering topography and geology, hydrology, land cover, and human activities [8,9,11,28,29,30,31,32,33]. The spatial distribution of these influencing factors is shown in Figure 2. Table 1 lists the selected factors with their resolution and data source. The main landslide factors fall into the following categories:

Topography: The steepness and orientation of a slope play a role in its susceptibility to landslides. Steep slopes are generally more prone to landslides than gentle ones, and slopes facing specific directions may be more susceptible depending on the direction of predominant weather patterns and the location of water sources.

Geology: The type of rock and soil present on a slope can affect its stability. For example, slopes composed of weak, poorly consolidated soils or those with a high clay content may be more prone to landslides.

Hydrology: Water’s presence can significantly affect a slope’s stability. Water can seep into the ground and weaken the soil, making it more prone to landslides. Additionally, heavy rain or rapid snowmelt can cause landslides by adding weight to the slope or increasing the soil’s water content.

Land use: Human activities can also contribute to landslide susceptibility. For example, the construction of buildings or roads on steep slopes can increase the weight on the slope and make it more prone to landslides. Additionally, vegetation removal can destabilize a slope by reducing its ability to absorb water.

Climate: Changes in temperature and precipitation patterns can also influence landslide susceptibility. For example, prolonged drought can cause soil to become dry and brittle, making it more prone to landslides, while heavy rain or rapid snowmelt can increase the water content of the soil and lead to landslides.

To gather data on these factors, the study employed a variety of sources, including DEM, ground surveys, Landsat-8, and Global Precipitation Measurement (GPM). The data processing implemented in this study consisted of the following steps.

(1) The NDVI of the study area was produced from Landsat-8 OLI images obtained from the Google Earth Engine;

(2) SRTM DEM data of the study area were obtained from USGS to calculate elevation, slope, aspect, plan curvature, and profile curvature (https://urs.earthdata.nasa.gov/ accessed on 12 August 2022);

(3) The GPM data were obtained from the USGS and converted to spatial raster data using Kriging interpolation together with meteorological station monitoring data;

(4) Soil type, geological data, and land use and land cover (LULC) were all rasterized, and only resampling was required.
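Step (1) relies on the standard NDVI definition, (NIR - Red) / (NIR + Red); for Landsat-8 OLI these are bands 5 and 4. The following is a minimal sketch in plain Python rather than the Google Earth Engine workflow used in the study. The function names and the small reflectance patches are hypothetical and serve only to illustrate the per-pixel calculation.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns None where the sum is zero."""
    denominator = nir + red
    if denominator == 0:
        return None  # no signal in either band, NDVI is undefined
    return (nir - red) / denominator


def ndvi_grid(nir_band, red_band):
    """Apply the NDVI formula pixel by pixel to two equally sized 2D grids."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]


# Hypothetical 2 x 2 surface-reflectance patches (not data from the study)
nir = [[0.5, 0.3], [0.2, 0.0]]
red = [[0.1, 0.1], [0.2, 0.0]]
ndvi_values = ndvi_grid(nir, red)
print(ndvi_values)
```

Averaging such grids over several years would give a multi-year NDVI mean of the kind used as an input factor.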

Table 1. List of landslide factors and data scale and source.

Class of Factors	Factors Selected	Scale	Data Source
Topography	Elevation	30 m	Calculated from DEM (NASA)
Topography	Slope angle	30 m	Calculated from DEM (NASA)
Topography	Aspect	30 m	Calculated from DEM (NASA)
Topography	Topographic wetness index	30 m	Calculated from DEM (NASA)
Topography	Profile curvature	30 m	Calculated from DEM (NASA)
Topography	Plan curvature	30 m	Calculated from DEM (NASA)
Geolithology	Distance to an active fault	30 m	The Third Survey and Mapping Institute of Guizhou Province
Geolithology	Lithology	30 m	The Third Survey and Mapping Institute of Guizhou Province
Land use and land cover	Land use and land cover	30 m	Ministry of Natural Resources Data Center
Morphology	Distance to rivers and roads	30 m	OpenStreetMap
Climate	Rainfall	Vector/0.1°	Meteorological station monitoring data and GPM
Vegetation	NDVI	30 m	Landsat
Human	Coal mining	Vector	The Third Survey and Mapping Institute of Guizhou Province

Figure 2. Influencing factors in Nayong. (a) Elevation; (b) slope; (c) profile curvature; (d) plan curvature; (e) aspect; (f) fault and coal mining; (g) rainfall; (h) road and river; (i) land use and land cover; (j) NDVI; (k) TWI; and (l) lithology.

The NDVI time series was calculated from 2016 to 2021, as shown in Figure 3. The 2018 NDVI mean was the lowest. Therefore, the study chose the 2018–2021 NDVI mean as an input factor. Figure 4 depicts a time series graph of NDVI variations from 2016 to 2021 for landslide (purple) and non-landslide (red) locations. The NDVI values associated with non-landslides were higher than those associated with landslides, which may imply that places with less vegetation are more vulnerable to landslides. For precipitation data, the Global Precipitation Measurement (GPM) mission (IMERG) satellite precipitation products have a spatial resolution of 0.1° (approximately 10 km), which is too coarse for regional landslide susceptibility analysis. Therefore, the topographic data (elevation, slope, and aspect) were used to establish a mapping between precipitation and topographic factors in the study area using the geographically weighted regression approach [34]. Subsequently, downscaling was used to generate precipitation data for the study area at a resolution of 1 km. Finally, data from five meteorological stations in the study area were used for validation.

3. Methods

3.1. Overall Methodology

As shown in Figure 5, the study began by collecting and extracting landslide features from various data sources, covering topography, lithology, land use and land cover, morphology, climate, vegetation, and human activity. These features were essential for understanding the underlying causes of landslides and identifying areas at high risk. The extracted landslide features were then converted into a stacked dataset using band synthesis and resampling to ensure data consistency and accuracy. The study split the landslide dataset (50% landslide samples and 50% non-landslide samples) into training (80%) and test (20%) sets after random permutation. Next, the GAMI-net and other algorithms (LR, SVM, and RF) were applied to the stacked dataset for model calculation and parameter optimization. Grid search was used to select the hyperparameters that gave the best performance, while the weights and biases of each model were fitted during training. After the model calculation and parameter optimization, the model’s performance was evaluated using various metrics, such as accuracy, precision, ROC, and recall. These metrics indicate how well the model can predict landslides from the input data, and the evaluation helps to identify any issues or limitations of the model and to improve it if necessary. Finally, the model was interpreted and used to create a landslide susceptibility map (LSM) that can be used to identify areas at high risk of landslides.
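The sampling and splitting step can be sketched as follows, assuming 293 landslide and 293 non-landslide samples (the inventory size given in Section 2.1) and an 80/20 split after a random permutation. The function name, the sample identifiers, and the random seed are illustrative assumptions, not details from the study.

```python
import random


def balanced_split(n_landslide, n_non_landslide, train_fraction=0.8, seed=42):
    """Shuffle a landslide / non-landslide sample set and split it into train and test."""
    # Label 1 = landslide, 0 = non-landslide; the integer stands in for a sample id.
    samples = [(i, 1) for i in range(n_landslide)]
    samples += [(n_landslide + i, 0) for i in range(n_non_landslide)]
    rng = random.Random(seed)  # the seed is an assumption, used for repeatability
    rng.shuffle(samples)       # random permutation before splitting
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


train, test = balanced_split(293, 293)
print(len(train), len(test))
```

With 586 samples this gives 468 training and 118 test samples. A stratified split would additionally keep the 50/50 class balance exact in both subsets; the plain permutation above only preserves it approximately.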

3.2. Machine Learning Methods

3.2.1. Logistic Regression (LR)

Multiple logistic regression has been widely used for LSM [35]. The basic idea behind logistic regression is to use Equations (1) and (2) to predict the likelihood of landslides based on a set of input factors. By training the model on a dataset of landslides and non-landslides, it is possible to use the trained model to predict the likelihood of a slope failure in a new, unseen location. The model calculated the probability of landslide susceptibility from all landslide factors in the study area. In general, the probability value indicates the landslide susceptibility of the area: the closer it is to 1, the greater the landslide susceptibility. The logistic function is described as:

f (x) = a_{0} + \sum_{k = 1}^{n} a_{k} x_{k}

(1)

P = \frac{1}{1 + e^{- f (x)}}

(2)

where P is the probability of landslide occurrence and f (x) is a linear combination of the causal factors x_{k}, with coefficients a_{k} and intercept a_{0}.
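A minimal sketch of Equations (1) and (2) in plain Python is shown below. The coefficient values and the two input factors are hypothetical; in practice the coefficients would be estimated from the landslide and non-landslide training samples.

```python
import math


def landslide_probability(factors, coefficients, intercept):
    """Return P = 1 / (1 + exp(-f(x))) with f(x) = a_0 + sum_k a_k * x_k."""
    f_x = intercept + sum(a * x for a, x in zip(coefficients, factors))
    return 1.0 / (1.0 + math.exp(-f_x))


# Hypothetical coefficients for two standardized factors (not fitted values)
coefficients = [1.5, -0.8]
p_neutral = landslide_probability([0.0, 0.0], coefficients, intercept=0.0)
p_high = landslide_probability([2.0, 0.5], coefficients, intercept=0.0)
print(p_neutral, p_high)
```

When f(x) is zero the probability is exactly 0.5, and it approaches 1 as f(x) grows, which matches the reading that values closer to 1 indicate higher susceptibility.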

3.2.2. Random Forest (RF)

Random forests are a machine-learning algorithm that can predict the likelihood of a landslide occurring in a particular area [36,37]. The algorithm creates many decision trees, each of which is trained on a random subset of the data. The trees are then combined to form a “forest”, and the predictions of the individual trees are averaged to produce a final prediction. To use random forests for landslide susceptibility analysis, a dataset containing information about the characteristics of an area, such as soil type, slope angle, and vegetation cover, is needed. The algorithm is then trained on this dataset to learn the relationship between the predictor variables and the occurrence of landslides in the area. Once the model has been trained, it can predict the likelihood of a landslide occurring in a new area by inputting the relevant predictor variables. The model’s output is a probability, indicating the likelihood of a landslide in the given area.
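The bootstrap-and-average idea can be illustrated with a deliberately small sketch. It assumes one-split trees (decision stumps) on a single feature, so the random feature selection used by a full random forest does not apply here. The toy slope data and all function names are hypothetical.

```python
import random


def fit_stump(rows):
    """Fit a one-split tree on (feature_value, label) rows and return a predictor."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        if not left or not right:
            continue
        p_left, p_right = sum(left) / len(left), sum(right) / len(right)
        # Squared error when each side predicts its own mean label.
        error = (sum((y - p_left) ** 2 for y in left)
                 + sum((y - p_right) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, p_left, p_right)
    if best is None:  # every feature value in this bootstrap sample is identical
        mean = sum(y for _, y in rows) / len(rows)
        return lambda x: mean
    _, threshold, p_left, p_right = best
    return lambda x: p_left if x <= threshold else p_right


def fit_forest(rows, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict_probability(forest, x):
    """Average the individual tree outputs to obtain the forest's probability."""
    return sum(tree(x) for tree in forest) / len(forest)


# Hypothetical toy data: (slope angle in degrees, landslide label)
data = [(5, 0), (8, 0), (12, 0), (15, 0), (28, 1), (33, 1), (38, 1), (45, 1)]
forest = fit_forest(data)
p_gentle = predict_probability(forest, 6)
p_steep = predict_probability(forest, 40)
print(p_gentle, p_steep)
```

The averaged output lies between 0 and 1 and can be read as the landslide probability for the given input.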

3.2.3. Support Vector Machines (SVM)

Support Vector Machines (SVM) is a supervised learning algorithm that can be adopted for classification and regression tasks [38,39]. In the context of landslide susceptibility, SVM can classify areas into different susceptibility classes (e.g., low, moderate, and high susceptibility) based on a set of input features or predictors.

SVM works by locating the hyperplane in a high-dimensional space that best divides the various classes. The hyperplane is chosen to maximize the distance between the hyperplane and the nearest data points from each class (called the support vectors). This distance is known as the margin. The larger the margin, the lower the generalization error and the better the classifier will perform on unseen data.

The SVM algorithm involves solving a convex optimization problem to find the hyperplane that maximally separates the classes. In its standard soft-margin form, the optimization problem can be written as:

\min_{w, b, ξ} \frac{1}{2} \| w \|^{2} + C \sum_{i = 1}^{n} ξ_{i}

subject to:

y_{i} (w^{T} x_{i} + b) \geq 1 - ξ_{i}, \quad ξ_{i} \geq 0, \quad i = 1, 2, \dots, n

(3)

where w and b are the weight vector and bias of the hyperplane, respectively, x_{i} is the input feature vector, y_{i} is the class label (encoded as −1 or +1 in this formulation), ξ_{i} is the slack variable that allows for misclassification, and C > 0 is the penalty parameter that balances margin width against the total slack. The constraints require every sample to lie on the correct side of the hyperplane with a functional margin of at least 1 − ξ_{i}, and the slack variables to be non-negative.

In the case of non-linearly separable data, SVM can still find a good hyperplane by introducing additional dimensions through the use of kernels. A kernel is a function that transforms the input data into a higher-dimensional space where the data may be more easily separable.
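To make the role of the slack variables concrete, the sketch below evaluates the standard soft-margin objective, 0.5 * ||w||^2 + C * sum of slacks, for a fixed linear hyperplane. It assumes labels encoded as -1 and +1 and a penalty parameter C, neither of which is specified in the text. The samples and the hyperplane are hypothetical, and no kernel is applied.

```python
def slack(w, b, x, y):
    """Slack xi = max(0, 1 - y * (w . x + b)) for a label y in {-1, +1}."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)


def soft_margin_objective(w, b, samples, C=1.0):
    """0.5 * ||w||^2 + C * sum of slacks (C is an assumed penalty parameter)."""
    norm_term = 0.5 * sum(wi * wi for wi in w)
    return norm_term + C * sum(slack(w, b, x, y) for x, y in samples)


# Hypothetical two-feature samples: (features, label in {-1, +1})
samples = [((2.0, 2.0), 1), ((0.0, 0.0), -1), ((1.0, 0.5), -1)]
w, b = (1.0, 1.0), -2.0
slacks = [slack(w, b, x, y) for x, y in samples]
objective = soft_margin_objective(w, b, samples)
print(slacks, objective)
```

The third sample sits inside the margin, so it is the only one with a non-zero slack. A solver would search over w and b to minimize this objective rather than evaluate it at a fixed hyperplane.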

3.2.4. GAMI-Net

The generalized additive model (GAM) is a semi-parametric smoothing model that provides a valuable tool for investigating how individual factors relate to the response, as given by Equation (4) [40].

g (E (y | x)) = μ + \sum_{j \in S_{1}}^{M} h_{j} (x_{j}) + ε

(4)

where g is the link function, h_{j} (x_{j}) is a smooth function of the j-th factor, and j indexes the factors in the set S_{1}. The term μ is the intercept, which is unaffected by the nonlinear transformations, and ε represents the residual.

In a traditional GAM, a feature can be made to have a monotonically increasing or decreasing effect on the target value by setting constraints and penalty functions or by adjusting the number of spline functions [41]. Although GAMs are more explanatory than black-box ML models, they cannot be represented as a single function describing the estimated relationship between the independent and dependent variables. In addition, the GAM does not consider interactions between factors, and the additive spline functions do not provide sufficient predictive accuracy [42].

To address the shortcomings of the GAM, GAMI-net considers factor interactions and replaces the spline functions in the GAM with ANN sub-networks to improve prediction accuracy. Under the additive structure, the effect of a particular variable can be explored and explained while controlling for the other variables. As related research has demonstrated, ANNs can be made explainable under appropriate constraints. This enhancement yields an explainable neural network with a good balance between prediction performance and model interpretation.

g (E (y | x)) = μ + \sum_{j \in S_{1}}^{M} h_{j} (x_{j}) + \sum_{(j, k) \in S_{2}}^{N} f_{j k} (x_{j}, x_{k})

(5)
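The additive structure of Equation (5) can be sketched as below. This is an illustration only: the shape functions are hand-written stand-ins for the trained sub-networks, the factor names and numbers are hypothetical, and a logit link g is assumed because the target is binary.

```python
import math

# Hypothetical main-effect shape functions h_j; in GAMI-net each is a small sub-network.
main_effects = {
    "rainfall": lambda r: 0.002 * (r - 1200.0),
    "mining_distance": lambda d: -0.0005 * d,
}
# Hypothetical pairwise interaction f_jk, also a sub-network in the real model.
interactions = {
    ("rainfall", "mining_distance"):
        lambda r, d: 0.3 if (r > 1300.0 and d < 500.0) else 0.0,
}
intercept = -0.5  # mu in Equation (5); an arbitrary illustrative value


def additive_score(x):
    """Return mu + sum_j h_j(x_j) + sum_jk f_jk(x_j, x_k) and each term's contribution."""
    contributions = {"intercept": intercept}
    for name, h in main_effects.items():
        contributions[name] = h(x[name])
    for (j, k), f in interactions.items():
        contributions[(j, k)] = f(x[j], x[k])
    return sum(contributions.values()), contributions


def probability(x):
    """Invert the assumed logit link to obtain a probability."""
    score, _ = additive_score(x)
    return 1.0 / (1.0 + math.exp(-score))


x = {"rainfall": 1400.0, "mining_distance": 200.0}
score, parts = additive_score(x)
print(score, parts, probability(x))
```

Because the score is a plain sum, each entry of `parts` can be read directly as that factor's or pair's contribution to the prediction, which is the property that makes the model interpretable.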

According to Figure 6, GAMI-net consists of individual-factor sub-networks, interaction sub-networks, and three constraints. The sub-networks estimate the individual and interaction effects, while the three constraints ensure interpretability. The sparsity constraint ranks the features according to their importance and retains only the ridge functions of the more important features. The heredity constraint ensures that each interaction pair includes at least one feature retained as important in the individual-feature sub-network module. The marginal clarity constraint prevents individual features and their corresponding interaction features from absorbing each other, making the model more stable. According to the GAMI-net structure, the hyperparameters are the learning rate, batch size, number of epochs, the maximum number of individual features, and the maximum number of interaction features. In this study, the learning rate and batch size were left at their default values of 0.001 and 200, respectively. The training process was divided into seven stages (Table 2). First, the individual-factor sub-networks were trained iteratively for 1000 epochs, and then the interaction sub-network module was trained for 1000 epochs. In each module, the features were ranked according to their importance; the unimportant tail features were then removed according to the loss function value, and the remaining network was fine-tuned for 500 epochs. In the grid search, the candidate values for the maximum number of individual features were (4, 8, 12, 16, 20), and those for the maximum number of interaction features were (0, 6, 9, 12, 15, 20).
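The grid over the two feature caps can be enumerated as sketched below. The candidate values are those listed above, but `validation_score` is a hypothetical placeholder that stands in for training GAMI-net with the given caps and returning a validation metric; it is not a real training call and its numbers are invented so that the loop runs.

```python
from itertools import product

max_main_effects = (4, 8, 12, 16, 20)
max_interactions = (0, 6, 9, 12, 15, 20)


def validation_score(n_main, n_inter):
    """Placeholder for 'train with these caps and return a validation metric'.

    Hypothetical stand-in: a smooth surface with an arbitrary peak, used only
    to demonstrate the search loop.
    """
    return 1.0 - 0.001 * ((n_main - 16) ** 2 + (n_inter - 15) ** 2)


grid = list(product(max_main_effects, max_interactions))
best = max(grid, key=lambda pair: validation_score(*pair))
print(len(grid), best)
```

The grid contains 5 x 6 = 30 combinations, so an exhaustive search means 30 full training runs.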

3.3. Explainability of GAMI-Net

3.3.1. Importance Ratio

The importance ratio (IR) measures the contribution of each factor to the overall prediction. The definition of IR follows the Sobol index; the main difference is that the Sobol index is derived under the premise that all variables are independent and uniformly distributed, whereas IR is based on the empirical distribution of the predicted variables [43,44]. Each factor has an importance score based on the model, and the importance ratio I R (j) of that factor can be calculated using Equation (6). Correspondingly, the importance of each pairwise factor I R (j, k) can be obtained from Equation (7).

I R (j) = D (h_{j}) / T

(6)

I R (j, k) = D (f_{j k}) / T

(7)

where D (h_{j}) and D (f_{j k}) denote the variation of the main effect h_{j} and of the pairwise interaction f_{j k} over the samples, and T is their sum over all retained effects:

T = \sum_{j \in S_{1}} D (h_{j}) + \sum_{(j, k) \in S_{2}} D (f_{j k})

(8)
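Equations (6) to (8) can be sketched as follows, under the assumption that D(·) is the sample variance of a component's output across the data. The component outputs below are hypothetical numbers, not values from the trained model.

```python
from statistics import variance


def importance_ratios(component_outputs):
    """Map each component to D(component) / T, taking D as the sample variance."""
    d = {name: variance(values) for name, values in component_outputs.items()}
    total = sum(d.values())  # T in Equation (8)
    return {name: value / total for name, value in d.items()}


# Hypothetical outputs of two main effects and one interaction on four samples
outputs = {
    "rainfall": [-0.3, -0.1, 0.1, 0.3],
    "lithology": [-0.1, 0.0, 0.0, 0.1],
    ("rainfall", "lithology"): [-0.2, 0.2, 0.2, -0.2],
}
ratios = importance_ratios(outputs)
print(ratios)
```

By construction the ratios sum to 1, so each can be read as a share of the total variation explained by the retained effects.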

In addition to the IR values, the ridge functions are used to explain the relationship between one or two specific factors and landslides. Each main effect h_{j} (x_{j}) is drawn as a one-dimensional line plot for continuous factors and as a bar chart for categorical factors to show the factor-landslide relationship. Each pairwise interaction f_{j k} (x_{j}, x_{k}) can be visualized using a 2D heat map showing the joint effect of the two factors on landslide occurrence.

3.3.2. Shapley Additive Explanations (SHAP)

For a trained complex machine learning model, SHAP interprets the model by calculating the Shapley values of the features to measure their importance [45,46]. A Shapley value can be calculated for each feature component of each sample, and the Shapley values of a sample's feature components are additive. The magnitude of the Shapley value of a feature component indicates how much the value taken by the sample on that feature affects the prediction for that sample [47]. The overall feature importance can be obtained by averaging the absolute Shapley values of all samples for each feature, which gives the magnitude of each feature’s influence on the predictions of the complex model.
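For a handful of features, Shapley values can be computed exactly by enumerating all coalitions, as sketched below. This is not the SHAP library: it assumes that a feature absent from a coalition is replaced by a fixed baseline value, which is one common convention among several. The toy model and its inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, with absent features set to baseline."""
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their real value; the rest use the baseline.
        return predict([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def model(v):
    """Hypothetical score with one rainfall x mining interaction term."""
    rainfall, mining, slope = v
    return 2.0 * rainfall + 1.0 * mining + 0.5 * slope + 1.0 * rainfall * mining


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)
```

The values sum to the difference between the prediction at `x` and at the baseline, which is the additivity property described above, and the interaction term is shared equally between the two features involved. Enumeration costs grow exponentially with the number of features, so practical tools rely on approximations or model-specific shortcuts.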

With the SHAP method, GAMI-net can directly extract the value of each additive component, such as a main effect or a pairwise interaction, for every sample x. The ordering of these marginal effects helps to clarify how the prediction for input x was reached. The 1D line plots (or bar charts) and 2D heat maps can be used to analyze how sensitive the prediction is to modest changes in the explanatory factors.

4. Results

4.1. Model Assessments for RF, SVM, and GAMI-Net

Figure 7 shows how model accuracy (measured by the ROC curve) varied with the maximum number of individual and interaction features in the grid search. As the maximum number of features increased, the model accuracy improved. The highest accuracy was obtained when the maximum numbers of individual and interaction features were 16 and 15, respectively. Beyond this point, the model accuracy decreased, probably because additive models with too many parameters accumulate error, indicating the need for fine-tuning. Table 3 lists the evaluation metrics obtained by predicting 510 samples (20% of the total sample was the test set) within the study area using each algorithm; LR generally performed worst, while the SVM, RF, and GAMI-net algorithms performed better. Based on the ROC curves, the algorithms ranked from lowest to highest accuracy were LR, SVM, RF, and GAMI-net.

As shown in Figure 8, the GAMI-net method provided performance comparable to that of the RF method, with prediction accuracies all greater than 0.8, F1 scores of 0.7897 and 0.8683, respectively, and ROC values all greater than 0.9. The GAMI-net model outperformed LR and SVM and slightly outperformed RF in predicting landslide susceptibility in the study area, demonstrating that the algorithm can achieve good results.
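For reference, the two headline metrics can be computed as sketched below. AUC is evaluated through its rank interpretation (the probability that a randomly chosen landslide sample scores higher than a randomly chosen non-landslide sample), and F1 is the harmonic mean of precision and recall at an assumed threshold of 0.5. The labels and scores are hypothetical.

```python
def auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def f1_score(labels, scores, threshold=0.5):
    """F1 = 2 * precision * recall / (precision + recall) at the given threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical test labels (1 = landslide) and predicted probabilities
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
auc_value = auc(labels, scores)
f1_value = f1_score(labels, scores)
print(auc_value, f1_value)
```

AUC does not depend on a threshold, whereas F1 does, so the two metrics can rank models differently.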

Figure 9 shows the training and validation losses of the GAMI-net with 25 interaction effects considered. In the first stage, when only the main effects are considered, the loss does not change significantly, indicating that landslide susceptibility cannot be effectively predicted using the main effects alone. According to Table 3, once pairwise interactions were added to the network, the loss fell sharply, demonstrating the need to include pairwise interactions in the GAMI-net. There was a significant surge in training and validation loss at the start of the fine-tuning phase, which coincided with a fine-tuning operation on the paired factors. Figure 10 depicts the validation loss used to establish the ideal number of main effects and pairwise interactions. The numbers of main effects and pairwise interactions included are indicated on the x-axes of the left and right panels, respectively. The red star symbols mark the number of main effects or pairwise interactions that should be used. According to these results, the GAMI-net had 15 main features and 13 pairwise interactions. Adding further main or pairwise effects beyond this point had no significant influence on the loss.

4.2. Comparing Landslide Susceptibility Maps

Landslide susceptibility maps can be generated using a variety of methods, including support vector machines (SVM), random forests, and GAMI-net. Each of these methods has its own advantages and disadvantages, and the most appropriate method will depend on the specific needs of the application.

The map divides Nayong into several categories of landslide susceptibility, including high, moderate, low, and very low. High-susceptibility areas are those with a high probability of landslides occurring, while low-susceptibility areas have a low probability of landslides.

In Figure 11, the map shows that the highest-susceptibility areas are found in the coal mining area, particularly in the central and southern parts of the study area. As coal is extracted from underground mines, the void left behind can cause the overlying rock and soil to collapse, leading to landslides. This is especially true in Karst areas, where the rock is naturally porous and prone to landslides. In addition, the southern region is characterized by a highly weathered layer of loose dolomite, which is subject to intense erosion due to the impact of precipitation. This erosion process significantly disrupts the mechanical equilibrium of the otherwise fragile grass and irrigation-dominated system in a short period, leading to an increase in the regional susceptibility to landslides. This underscores the need for careful management of erosion and landslides in this region. Moderate-susceptibility areas are found in the northern and central parts of the study area, including the Mazong Hill. These areas are characterized by a mix of steep and gentle slopes, naturally porous rock, and collapse-prone slopes, along with some coal mining operations. They are also exposed to mining and heavy rainfall, which can increase the risk of landslides. Low-susceptibility areas are found in the western part of the study area. These areas are characterized by gentle slopes and sandy soils. They are generally less prone to landslides, although they can still be affected by earthquakes and other natural disasters.

4.3. Interpretative Results for GAMI-Net and Shapley Additive Explanations

Figure 12 shows the importance estimates for all influencing factors in the GAMI-net model. The results for the study area indicate that rainfall, mining areas, lithology, elevation, rivers, and faults are the main factors. In brief, human activity and climate have a strong influence on landslides.
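The importance estimates can be illustrated with a short sketch. It assumes the importance ratio is defined as in the GAMI-Net paper [24]: the variance of each fitted effect on the training samples divided by the total variance of all effects. The effect outputs below are made-up numbers, not values from the fitted model.

```python
import numpy as np


def importance_ratios(effect_outputs):
    """Return each effect's share of the total variance of all effects.

    effect_outputs maps an effect name to that effect's output on every
    training sample (one array per main effect or pairwise interaction).
    """
    variances = {name: float(np.var(values)) for name, values in effect_outputs.items()}
    total = sum(variances.values())
    return {name: variance / total for name, variance in variances.items()}


# Illustrative (made-up) effect outputs on four samples.
toy_effects = {
    "rainfall": np.array([1.0, -1.0, 1.0, -1.0]),
    "mining": np.array([1.0, -1.0, -1.0, 1.0]),
    "rainfall x mining": np.array([2.0, -2.0, -2.0, 2.0]),
}
ratios = importance_ratios(toy_effects)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```

Because every effect is evaluated on the same samples, the choice between the population and sample variance cancels in the ratio.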

The global interpretation of the GAMI-net model factors is shown in Figure 13. With 13 main effects and 25 interaction effects included, rainfall, mining, rivers, faults, and lithology account for 6.8%, 5.8%, 2.8%, 1.8%, and 1.7% of the total importance, respectively. Among the interaction effects, the lithology-rainfall, mining-rainfall, and elevation-rainfall pairs are the most important, with reported shares of 32.8% and 20.7%. The other factors and interaction effects each fall below 1%. In relative terms, the main effects account for 18.9% of the total importance and the pairwise effects for 81.1%.
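The reported totals can be checked with simple arithmetic. The sketch below uses only the percentages quoted in the paragraph above; the dictionary keys are shorthand labels chosen here.

```python
# Main-effect importance shares (%) as quoted in the text.
main_effects = {
    "rainfall": 6.8,
    "mining": 5.8,
    "river": 2.8,
    "fault": 1.8,  # printed as "failure" in the original text
    "lithology": 1.7,
}

main_total = round(sum(main_effects.values()), 1)
pairwise_total = round(100.0 - main_total, 1)

print(f"main effects: {main_total}%")
print(f"pairwise effects: {pairwise_total}%")
```

The five listed main effects already sum to the stated main-effect total, which is consistent with the remaining main effects contributing a negligible share after rounding.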

Firstly, rainfall was the most dominant influencing factor. Rainfall can cause landslides in several ways, including by destabilizing soil, which leaves slopes highly susceptible to failure. Extreme rainfall (class 1) has the strongest impact on landslides, and the impact decreases as rainfall values decrease. For the distance to mining areas, the closer the mining area (the smaller the class value in the graph), the higher the model score, indicating higher landslide susceptibility; this suggests that mining activities are a main driver of landslide susceptibility in the area. Certain rock types, such as shale (class 13) and clay (class 7), retain water well because of their high porosity and low permeability. When these rocks become saturated with water, they can become unstable and prone to landslides. In contrast, rocks with low water retention, such as sandstone (class 3) and granite (class 11), are less prone to landslides because they drain excess water more effectively. Earthquake-induced landslides are less common in the region because earthquakes are few and seismic intensities are low. In addition, faults can create cracks and other stresses in the ground, which increase the likelihood of landslides, and this effect is likely to grow under the influence of mining and rainfall.

Secondly, among the interaction effects, the interactions of rainfall with lithology, mining, and elevation are the main factors, and the vulnerability of fragile rocks to landslides under rainfall is evident. In the Nayong area, long-term mining activities have enlarged the fragile surface of the mountains, so landslide susceptibility increases significantly under external conditions such as rainfall.

Overall, the research measured the influence of various factors on landslide susceptibility using 1D line/bar plots and 2D heat maps. For the study area, rainfall, mining, and lithology are the most significant influences.

4.4. Factorial Interpretability Analysis of the Zongling Landslides

The location of the Karst landslide in Zongling Town, Guizhou Province, China, is shown in Figure 14. The area is a typical karst erosion landscape of the Guizhou plateau. Figure 14 presents an optical image and the InSAR deformation of Zongling, with the black polygon marking the landslide area. The InSAR surface deformation monitoring results are from our previous studies [13,48]. Since the 1980s, Zongling Town has gradually become a central coal mining area, and extensive coal mining has created many landslide hazards. In particular, a cluster of landslides runs from west to east for 5.6 km along the southern side of the Mazongling Mountains and seriously threatens people’s safety. The rear edge of the slope is an almost vertical escarpment of over 30 m. There are numerous ground fissures on the summit, ranging from several hundred meters to several kilometers in length and from nearly 2 m to 6 m in width. The study used the Zongling landslides as an example for interpretable analysis based on the landslide susceptibility results above. As shown in Figure 14, points A and B are located in the more strongly deformed areas of the Zongling landslide. The results of the interpretable model analysis in Figure 15 reveal that rainfall, mining, lithology, and their coupling were the key factors in the landslides at these two places, which is consistent with the field survey results. As shown in Figure 16, mining causes stress damage to the Karst hills in the area; lack of vegetation, an extreme climatic background, and rock desertification further exacerbate landslide susceptibility and threaten the safety of the roads and people below. This suggests that our interpretable results are credible and can provide a basis for disaster mitigation and resilience.
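The local explanations for points A and B attribute a single prediction to individual factors. As a minimal sketch of the idea, exact Shapley values can be computed by brute force for a toy score with one interaction term. The toy model, its coefficients, and the all-zero baseline are assumptions for illustration; this is neither the fitted GAMI-net model nor the estimator used by the SHAP library.

```python
from itertools import permutations


def shapley_values(predict, sample, baseline):
    """Exact Shapley values relative to a single baseline sample.

    Averages each feature's marginal contribution over every ordering in
    which features are switched from their baseline value to the sample value.
    """
    n_features = len(sample)
    totals = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        current = list(baseline)
        previous_score = predict(current)
        for j in order:
            current[j] = sample[j]
            score = predict(current)
            totals[j] += score - previous_score
            previous_score = score
    return [total / len(orderings) for total in totals]


def toy_score(values):
    """Hypothetical additive score with a rainfall-mining interaction."""
    rainfall, mining, lithology = values
    return 0.5 * rainfall + 0.3 * mining + 0.4 * rainfall * mining + 0.1 * lithology


phi = shapley_values(toy_score, sample=[1, 1, 1], baseline=[0, 0, 0])
print(dict(zip(["rainfall", "mining", "lithology"], phi)))
```

In this toy case the interaction term is split equally between rainfall and mining, and the attributions sum to the difference between the sample score and the baseline score.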

5. Discussion

In this study, we used a combination of GIS, remote sensing, and machine learning to analyze landslide susceptibility in Nayong County, Guizhou Province. GAMI-net models complex functional relationships through ANN subnetworks for the main effect factors and pairwise interaction factors and accounts for the coupling between landslide factors, which increases the model’s interpretability. In addition, the influence of each landslide factor on landslide susceptibility can be easily assessed using 1D line/bar diagrams and 2D heat maps. Our results showed that landslide susceptibility in the region is significantly correlated with several landslide influence factors, including rainfall, slope, and lithology. By introducing interaction factors, we also found that the coupling of several factors had significant effects on landslide susceptibility, such as rainfall and mining, lithology and rainfall, rainfall and elevation, and elevation and mining.

Validation of the four models showed that GAMI-net was the most reliable method, with the best performance on the landslide dataset, followed by RF and SVM, while LR performed worst. This result suggests that GAMI-net can provide strong interpretability while maintaining better accuracy. Comparing the four machine-learning models: LR classifies quickly and its results are easy to interpret, but it has a certain error rate in classification decisions; RF trains efficiently and is highly practical but ignores the correlation between data; SVM can avoid overfitting but depends on the choice of kernel function; and GAMI-net has the highest accuracy, considers the coupling effects of factors, has good interpretability, and quantifies the influence of landslide features.

In addition, our study uses interpretable algorithms to account for the changing influence of these factors on landslide susceptibility in the region. This approach adds credibility to the landslide susceptibility model and provides a way to quantitatively assess the influence of natural and human factors on landslides. The ranking of rainfall and mining as the more influential features in landslide prediction is in line with a series of regional studies [25,49,50]. Tao et al. used the Jinhaihu landslide event in Bijie as a case for physical experiments and field investigation, showing that lithology, construction, and rainfall (snow) were the main factors contributing to the landslide [49]. Wang et al. used time-series InSAR techniques to illustrate the persistent influence of mining activities on the Zongling landslide [50]. Unusually, the slope factor was classified as a less influential feature in Nayong, contrary to the findings of Zhao et al. in Iran [51]. This may reflect the characteristics and regional effects of landslide events and causal factors, as well as the dependence of the algorithm on the study area. In the present study, although slope was positively correlated with susceptibility, its effect was small (Figure 13). The Nayong region is a Karst landform with steep slopes at high elevations. Slopes in most parts of the region already meet the conditions for landslide occurrence, which makes slope a relatively low-impact feature in the internal modeling process.

Our results suggest that targeted measures, such as rational mining and rock desertification reduction, can effectively reduce landslide risks. In addition, our study could be used to inform land resource managers and policymakers. Several limitations of the study should be acknowledged. First, our analysis is based on remotely sensed data, which may not capture all relevant variables affecting landslides, such as cohesion and the angle of internal friction. Second, our study area is limited to mining areas and karst landscapes, so extending the approach to areas with different climatic and land use patterns will require corresponding changes.

6. Conclusions

LSM is an essential tool for landslide hazard reduction. Existing LSM research has mainly focused on machine-learning algorithms, with less attention paid to interpretability. Therefore, GAMI-net was applied as an interpretable algorithm, alongside three traditional machine-learning models (LR, SVM, RF), to generate landslide susceptibility maps for the Nayong region of Guizhou Province.

Firstly, the performance of the GAMI-net model and the three traditional ML models was examined and compared using several evaluation metrics. The results showed that GAMI-net outperformed the other three models (accuracy = 87.3%, AUC = 0.944). GAMI-net shows excellent potential for susceptibility assessment and interpretation, and its predictive capabilities in landslide susceptibility assessment merit further exploration.

Secondly, the LSM shows that most of the high-susceptibility areas are located in mining and heavy rainfall areas, which are affected by erosion processes and rock desertification. The low-susceptibility areas are mainly areas of gentle slopes and stable lithology.

Thirdly, GAMI-net allows for an assessment of the significance of the characteristics of the sample (global and local interpretation). The findings suggest rainfall, mining, and lithology are the main causal factors for local landslides. In addition, some interaction factors such as rainfall and mining, lithology and rainfall, rainfall and elevation, and elevation and mining are also worth noting.

Overall, GAMI-net could be used to develop LSM and provide clear explanations of why the model made a given prediction. This could help decision-makers and planners better understand the landslide risks in their area, plan land use in landslide-prone areas, and take appropriate actions to prevent or mitigate landslides.

Author Contributions

Conceptualization, Y.S. and C.X.; data curation, C.S. and H.F.; funding acquisition, Y.S. and C.X.; investigation, G.C., Y.Z. and Y.Y.; methodology, H.F. and C.X.; project administration, Y.G. and M.Z.; software, H.F. and B.T.; supervision, Y.S. and B.T.; writing—original draft, H.F.; writing—review and editing, C.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Key R&D Program of China, grant number 2022YFC3005601, Outstanding Youth Science and Technology program of Guizhou Province of China ([2021]5615), and Multi-Source Remote Sensing Regional Landslide Hazard Risk Mapping and Key Landslide Fine Survey, Science and Technology Bureau of Fuzhou City, Fujian Province, grant number E1D7110100.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors acknowledge the PyTorch version of the generalized additive models with structured interactions (https://github.com/SelfExplainML/GamiNet-PyTorch, accessed on 12 August 2022) and the Third Surveying and Mapping Institute of Guizhou Province.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wistuba, M.; Malik, I.; Gärtner, H.; Kojs, P.; Owczarek, P. Application of eccentric growth of trees as a tool for landslide analyses: The example of Picea abies Karst. in the Carpathian and Sudeten Mountains (Central Europe). Catena 2013, 111, 41–55.
2. Chen, L.; Zhao, C.; Li, B.; He, K.; Ren, C.; Liu, X.; Liu, D. Deformation monitoring and failure mode research of mining-induced Jianshanying landslide in karst mountain area, China with ALOS/PALSAR-2 images. Landslides 2021, 18, 2739–2750.
3. Guzzetti, F. Landslide fatalities and the evaluation of landslide risk in Italy. Eng. Geol. 2000, 58, 89–107.
4. Gariano, S.L.; Guzzetti, F. Landslides in a changing climate. Earth-Sci. Rev. 2016, 162, 227–252.
5. Zheng, H.; Liu, B.; Han, S.; Fan, X.; Zou, T.; Zhou, Z.; Gong, H. Research on landslide hazard spatial prediction models based on deep neural networks: A case study of northwest Sichuan, China. Environ. Earth Sci. 2022, 81, 1–15.
6. Yong, C.; Jinlong, D.; Fei, G.; Bin, T.; Tao, Z.; Hao, F.; Li, W.; Qinghua, Z. Review of landslide susceptibility assessment based on knowledge mapping. Stoch. Environ. Res. Risk Assess. 2022, 36, 2399–2417.
7. Dikshit, A.; Sarkar, R.; Pradhan, B.; Acharya, S.; Alamri, A.M. Spatial landslide risk assessment at Phuentsholing, Bhutan. Geosciences 2020, 10, 131.
8. Chen, X.; Chen, W. GIS-based landslide susceptibility assessment using optimized hybrid machine learning methods. Catena 2021, 196, 104833.
9. Kalantar, B.; Ueda, N.; Saeidi, V.; Ahmadi, K.; Halin, A.A.; Shabani, F. Landslide susceptibility mapping: Machine and ensemble learning based on remote sensing big data. Remote Sens. 2020, 12, 1737.
10. Wentao, Y.; Ming, W.; Peijun, S. Using MODIS NDVI Time Series to Identify Geographic Patterns of Landslides in Vegetated Regions. IEEE Geosci. Remote Sens. Lett. 2013, 10, 707–710.
11. Zhao, F.; Meng, X.; Zhang, Y.; Chen, G.; Su, X.; Yue, D. Landslide Susceptibility Mapping of Karakorum Highway Combined with the Application of SBAS-InSAR Technology. Sensors 2019, 19, 2685.
12. Zhang, Y.; Shen, C.; Zhou, S.; Luo, X. Analysis of the Influence of Forests on Landslides in the Bijie Area of Guizhou. Forests 2022, 13, 1136.
13. Shen, C.; Feng, Z.; Xie, C.; Fang, H.; Zhao, B.; Ou, W.; Zhu, Y.; Wang, K.; Li, H.; Bai, H. Refinement of Landslide Susceptibility Map Using Persistent Scatterer Interferometry in Areas of Intense Mining Activities in the Karst Region of Southwest China. Remote Sens. 2019, 11, 2821.
14. Bai, D.; Lu, G.; Zhu, Z.; Zhu, X.; Tao, C.; Fang, J. Using Electrical Resistivity Tomography to Monitor the Evolution of Landslides’ Safety Factors under Rainfall: A Feasibility Study Based on Numerical Simulation. Remote Sens. 2022, 14, 3592.
15. Sengani, F.; Mashao, F.M.; Allopi, D. An integrated approach to develop a slope susceptibility map based on a GIS-based approach, soft computing technique and finite element formulation of the bound theorems. Transp. Geotech. 2022, 36, 100818.
16. Yan, F.; Zhang, Q.; Ye, S.; Ren, B. A novel hybrid approach for landslide susceptibility mapping integrating analytical hierarchy process and normalized frequency ratio methods with the cloud model. Geomorphology 2019, 327, 170–187.
17. Mondal, S.; Mandal, S. Landslide susceptibility mapping of Darjeeling Himalaya, India using index of entropy (IOE) model. Appl. Geomat. 2019, 11, 129–146.
18. Das, G.; Lepcha, K. Application of logistic regression (LR) and frequency ratio (FR) models for landslide susceptibility mapping in Relli Khola river basin of Darjeeling Himalaya, India. SN Appl. Sci. 2019, 1, 1–22.
19. Azarafza, M.; Azarafza, M.; Akgün, H.; Atkinson, P.M.; Derakhshani, R. Deep learning-based landslide susceptibility mapping. Sci. Rep. 2021, 11, 1–16.
20. Sun, D.; Wen, H.; Wang, D.; Xu, J. A random forest model of landslide susceptibility mapping based on hyperparameter optimization using Bayes algorithm. Geomorphology 2020, 362, 107201.
21. Luo, X.; Lin, F.; Zhu, S.; Yu, M.; Zhang, Z.; Meng, L.; Peng, J. Mine landslide susceptibility assessment using IVM, ANN and SVM models considering the contribution of affecting factors. PLoS ONE 2019, 14, e0215134.
22. Mohan, A.; Singh, A.K.; Kumar, B.; Dwivedi, R. Review on remote sensing methods for landslide detection using machine and deep learning. Trans. Emerg. Telecommun. Technol. 2021, 32, e3998.
23. Yin, Z.; Xu, M. Explainable Neural Network Cancer Diagnosis of Unstained Tissue Samples Measured by Chemometric Microscope. In Novel Techniques in Microscopy; Optica Publishing Group: Washington, DC, USA, 2021; p. NM1C. 5.
24. Yang, Z.; Zhang, A.; Sudjianto, A. GAMI-Net: An explainable neural network based on generalized additive models with structured interactions. Pattern Recognit. 2021, 120, 108192.
25. Yao, K.; Yang, S.; Wu, S.; Tong, B. Landslide Susceptibility Assessment Considering Spatial Agglomeration and Dispersion Characteristics: A Case Study of Bijie City in Guizhou Province, China. ISPRS Int. J. Geo-Inf. 2022, 11, 269.
26. Petley, D. Zhangjiawan Landslide: A Massive Rockslope Collapse with 35 Fatalities Caught on a Remarkable Video. Available online: https://blogs.agu.org/landslideblog/2017/08/29/zhangjiawan-landslide-1/ (accessed on 24 January 2022).
27. Shen, C.; Xie, C.; Ji, S.; Zhou, D. Chapter 1. In New Technology for Geological Hazard Potential Identification and Vulnerability Assessment in Karst Mountains; Wang, Y., Ed.; Science Press: Beijing, China, 2022; pp. 2–3. ISBN 978-7-03-071326-1.
28. Liao, M.; Wen, H.; Yang, L. Identifying the essential conditioning factors of landslide susceptibility models under different grid resolutions using hybrid machine learning: A case of Wushan and Wuxi counties, China. Catena 2022, 217, 106428.
29. Tang, Y.; Feng, F.; Guo, Z.; Feng, W.; Li, Z.; Wang, J.; Sun, Q.; Ma, H.; Li, Y. Integrating principal component analysis with statistically-based models for analysis of causal factors and landslide susceptibility mapping: A comparative study from the loess plateau area in Shanxi (China). J. Clean. Prod. 2020, 277, 124159.
30. Chowdhuri, I.; Pal, S.C.; Chakrabortty, R.; Malik, S.; Das, B.; Roy, P.; Sen, K. Spatial prediction of landslide susceptibility using projected storm rainfall and land use in Himalayan region. Bull. Eng. Geol. Environ. 2021, 80, 5237–5258.
31. Huang, F.; Ye, Z.; Jiang, S.-H.; Huang, J.; Chang, Z.; Chen, J. Uncertainty study of landslide susceptibility prediction considering the different attribute interval numbers of environmental factors and different data-based models. Catena 2021, 202, 105250.
32. Li, Y.; Wang, X.; Mao, H. Influence of human activity on landslide susceptibility development in the Three Gorges area. Nat. Hazards 2020, 104, 2115–2151.
33. Chen, L.; Guo, Z.; Yin, K.; Shrestha, D.P.; Jin, S. The influence of land use and land cover change on landslide susceptibility: A case study in Zhushan Town, Xuan’en County (Hubei, China). Nat. Hazards Earth Syst. Sci. 2019, 19, 2207–2228.
34. Sharifi, E.; Saghafian, B.; Steinacker, R. Downscaling satellite precipitation estimates with multiple linear regression, artificial neural networks, and spline interpolation techniques. J. Geophys. Res. Atmos. 2019, 124, 789–805.
35. Yordanov, V.; Brovelli, M. Comparing model performance metrics for landslide susceptibility mapping. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 43, 1277–1284.
36. Dou, J.; Yunus, A.P.; Bui, D.T.; Merghadi, A.; Sahana, M.; Zhu, Z.; Chen, C.-W.; Khosravi, K.; Yang, Y.; Pham, B.T. Assessment of advanced random forest and decision tree algorithms for modeling rainfall-induced landslide susceptibility in the Izu-Oshima Volcanic Island, Japan. Sci. Total Environ. 2019, 662, 332–346.
37. Chen, W.; Xie, X.; Wang, J.; Pradhan, B.; Hong, H.; Bui, D.T.; Duan, Z.; Ma, J. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Catena 2017, 151, 147–160.
38. Yao, X.; Tham, L.G.; Dai, F.C. Landslide susceptibility mapping based on Support Vector Machine: A case study on natural slopes of Hong Kong, China. Geomorphology 2008, 101, 572–582.
39. Pham, B.T.; Prakash, I.; Khosravi, K.; Chapi, K.; Trinh, P.T.; Ngo, T.Q.; Hosseini, S.V.; Bui, D.T. A comparison of Support Vector Machines and Bayesian algorithms for landslide susceptibility modelling. Geocarto Int. 2019, 34, 1385–1407.
40. Ravindra, K.; Rattan, P.; Mor, S.; Aggarwal, A.N. Generalized additive models: Building evidence of air pollution, climate change and human health. Environ. Int. 2019, 132, 104987.
41. Timoori Yansari, Z.; Hosseinzadeh, S.R.; Kavian, A.; Pourghasemi, H.R. Comparison of Landslide Susceptibility Maps using Logistic Regression (LR) and Generalized Additive Model (GAM). J. Watershed Manag. Res. 2019, 9, 208–219.
42. Shi, Y.; Han, Y.; Zhang, Q.; Kuang, X. Adaptive iterative attack towards explainable adversarial robustness. Pattern Recognit. 2020, 105, 107309.
43. Archer, G.; Saltelli, A.; Sobol, I.M. Sensitivity measures, ANOVA-like techniques and the use of bootstrap. J. Stat. Comput. Simul. 1997, 58, 99–120.
44. Pawluszek-Filipiak, K.; Borkowski, A. On the Importance of Train–Test Split Ratio of Datasets in Automatic Landslide Detection by Supervised Classification. Remote Sens. 2020, 12, 3054.
45. Zhou, X.; Wen, H.; Li, Z.; Zhang, H.; Zhang, W. An interpretable model for the susceptibility of rainfall-induced shallow landslides based on SHAP and XGBoost. Geocarto Int. 2022, 1–32.
46. Mangalathu, S.; Hwang, S.-H.; Jeon, J.-S. Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Eng. Struct. 2020, 219, 110927.
47. Van den Broeck, G.; Lykov, A.; Schleich, M.; Suciu, D. On the tractability of SHAP explanations. J. Artif. Intell. Res. 2022, 74, 851–886.
48. Zhu, Y.; Tian, B.; Xie, C.; Guo, Y.; Fang, H.; Yang, Y.; Wang, Q.; Zhang, M.; Shen, C.; Wei, R. Multi-Temporal InSAR Deformation Monitoring Zongling Landslide Group in Guizhou Province Based on the Adaptive Network Method. Sustainability 2023, 15, 894.
49. Tao, T.; Shi, W.; Liang, F.; Wang, X. Failure mechanism and evolution of the Jinhaihu landslide in Bijie City, China, on 3 January 2022. Landslides 2022, 19, 2727–2736.
50. Wang, J.; Wang, C.; Xie, C.; Zhang, H.; Tang, Y.; Zhang, Z.; Shen, C. Monitoring of large-scale landslides in Zongling, Guizhou, China, with improved distributed scatterer interferometric SAR time series methods. Landslides 2020, 17, 1777–1795.
51. Zhao, P.; Masoumi, Z.; Kalantari, M.; Aflaki, M.; Mansourian, A. A GIS-Based Landslide Susceptibility Mapping and Variable Importance Analysis Using Artificial Intelligent Training-Based Methods. Remote Sens. 2022, 14, 211.

Figure 1. Overview of the study area in Nayong (the illustration of the landslide in Nayong was accessed on Google Earth).

Figure 3. The annual mean NDVI maps in Nayong.

Figure 4. Time series chart for NDVI changes (2016–2021).

Figure 5. The process for mapping LSM and explainability.

Figure 6. The network of GAMI-net. The main effects are fitted first, and then the top-K ranked pairwise interactions are selected and fitted to the residuals, subject to the heredity constraint. Finally, the trivial subnetworks are pruned.

Figure 7. The ROC results of the grid search (blue represents high ROC values, red represents low ROC values).

Figure 8. The ROC of LR, RF, SVM and GAMI-net.

Figure 9. The GAMI-net loss function variation on the training and validation datasets (the red part is training main effects, the blue part is training interactions, and the yellow part is fine-tuning).

Figure 10. The validation loss for landslide factors. The red dots indicate the number of features that minimize the loss function and the red stars indicate the number of features selected for retention (the dots and stars have been overlapped in the figure).

Figure 11. The XAI-LSM produced by GAMI-net.

Figure 12. Global feature importance of GAMI-net LSM.

Figure 13. The factor ridge function images output by GAMI-Net global interpretation (reflects changes in IR values due to changes in landslide factors).

Figure 14. Optical and radar interpretation of the Zongling landslide (image and InSAR). Points A and B were the two most prominent landslides in the Zongling landslides, located on the side (A) and the front (B) of Mount Zongling, respectively.

Figure 15. The SHAP results of Point A and B. The horizontal axis is the landslide impact factor and the vertical axis is the IR value (the 1 in the forecast represents the landslide).

Figure 16. Ground survey pictures of the Zongling landslide.

Table 2. The iterative training process of GAMI-net.

Step	Training Process
1	Training all individual factors sub-networks;
2	Selecting the most important $x_{i}$ individual factors sub-networks based on factors importance;
3	Fine-tuning the individual factors sub-network modules;
4	Selecting interaction pairwise factors based on the ranking of interaction factors;
5	Training the interaction factors sub-network modules;
6	Selecting the top $interaction_{i}$ significant interaction factors sub-networks based on feature importance;
7	Fine-tuning the interaction factors sub-network modules.
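
The staged procedure in Table 2 can be illustrated with a deliberately simplified sketch. Each subnetwork is replaced here by a table of per-class means, interactions are ranked by the variance they explain in the residual, candidate pairs need at least one retained parent factor, and the fine-tuning steps (3 and 7) are omitted. These substitutions, the synthetic data, and all function names are assumptions for illustration and do not reproduce the GAMI-net implementation.

```python
import numpy as np


def class_means_1d(codes, target, n_levels):
    """Per-class mean of the target: stand-in for a main-effect subnetwork."""
    effect = np.zeros(n_levels)
    for level in range(n_levels):
        mask = codes == level
        if mask.any():
            effect[level] = target[mask].mean()
    return effect


def class_means_2d(codes_a, codes_b, target, n_levels):
    """Per-cell mean of the target: stand-in for an interaction subnetwork."""
    effect = np.zeros((n_levels, n_levels))
    for a in range(n_levels):
        for b in range(n_levels):
            mask = (codes_a == a) & (codes_b == b)
            if mask.any():
                effect[a, b] = target[mask].mean()
    return effect


def staged_additive_fit(X, y, n_levels, keep_main, keep_pairs):
    n_features = X.shape[1]
    residual = y - y.mean()
    # Step 1: fit every main effect on the running residual.
    main = {}
    for j in range(n_features):
        main[j] = class_means_1d(X[:, j], residual, n_levels)
        residual = residual - main[j][X[:, j]]
    # Step 2: keep the most important main effects (variance of their output).
    importance = {j: np.var(main[j][X[:, j]]) for j in main}
    retained = sorted(importance, key=importance.get, reverse=True)[:keep_main]
    for j in range(n_features):
        if j not in retained:
            residual = residual + main[j][X[:, j]]  # pruned effect returns to residual
    # Step 4: rank candidate pairs that have at least one retained parent.
    scores = {}
    for a in range(n_features):
        for b in range(a + 1, n_features):
            if a in retained or b in retained:
                effect = class_means_2d(X[:, a], X[:, b], residual, n_levels)
                scores[(a, b)] = np.var(effect[X[:, a], X[:, b]])
    # Steps 5-6: fit only the top-ranked interactions on the residual.
    selected = sorted(scores, key=scores.get, reverse=True)[:keep_pairs]
    for a, b in selected:
        effect = class_means_2d(X[:, a], X[:, b], residual, n_levels)
        residual = residual - effect[X[:, a], X[:, b]]
    return retained, selected, residual


# Synthetic data: three categorical factors, one main effect, one interaction.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(2000, 3))
y = (1.0 * (X[:, 0] == 2)
     + 2.0 * ((X[:, 0] == 1) & (X[:, 1] == 1))
     + 0.05 * rng.normal(size=2000))

retained, selected, residual = staged_additive_fit(X, y, n_levels=3, keep_main=2, keep_pairs=1)
print("retained main effects:", retained)
print("selected interactions:", selected)
```

On this synthetic example the procedure should recover the planted interaction between factors 0 and 1, which mirrors how interaction subnetworks are trained only after the main effects have been fitted.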

Table 3. Evaluation of each model.

Algorithm	Accuracy	Precision	F1 Score	Recall	AUC ROC
LR	0.6189	0.6160	0.5754	0.5400	0.6482
SVM	0.6708	0.7013	0.6129	0.5448	0.7530
RF	0.8110	0.8222	0.7897	0.7581	0.9027
GAMI-net	0.8730	0.8648	0.8683	0.8747	0.9442
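
As a quick consistency check on Table 3, the F1 score can be recomputed from precision and recall as F1 = 2PR/(P + R). The sketch below uses only the values printed in the table; the tolerance of 0.005 is an assumption that allows for rounding or averaging across runs.

```python
# Values copied from Table 3.
table3 = {
    "LR": {"precision": 0.6160, "recall": 0.5400, "f1": 0.5754, "auc": 0.6482},
    "SVM": {"precision": 0.7013, "recall": 0.5448, "f1": 0.6129, "auc": 0.7530},
    "RF": {"precision": 0.8222, "recall": 0.7581, "f1": 0.7897, "auc": 0.9027},
    "GAMI-net": {"precision": 0.8648, "recall": 0.8747, "f1": 0.8683, "auc": 0.9442},
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Difference between the reported F1 and the value implied by precision and recall.
differences = {}
for name, row in table3.items():
    recomputed = f1_from(row["precision"], row["recall"])
    differences[name] = abs(recomputed - row["f1"])
    print(f"{name}: reported F1 {row['f1']:.4f}, recomputed {recomputed:.4f}")

# Rank the models by AUC, highest first.
ranking = sorted(table3, key=lambda name: table3[name]["auc"], reverse=True)
print("AUC ranking:", ranking)
```

The resulting AUC ranking matches the ordering stated in the Discussion: GAMI-net, then RF and SVM, with LR last.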

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
