Article

Hybrid Integration Approach of Entropy with Logistic Regression and Support Vector Machine for Landslide Susceptibility Modeling

1 School of Earth Science and Resources, Chang’an University, Key Laboratory of Degraded and Unutilized Land Remediation Engineering, Ministry of Land and Resources, Shaanxi Provincial Key Laboratory of Land Rehabilitation, Xi’an 710064, Shaanxi, China
2 College of Geology & Environment, Xi’an University of Science and Technology, Xi’an 710054, Shaanxi, China
3 Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran
* Author to whom correspondence should be addressed.
Entropy 2018, 20(11), 884; https://doi.org/10.3390/e20110884
Submission received: 7 October 2018 / Revised: 7 November 2018 / Accepted: 7 November 2018 / Published: 17 November 2018

Abstract

The main purpose of the present study is to apply three classification models to produce landslide susceptibility maps for Fugu County, Shaanxi Province, China: the index of entropy (IOE) model, the logistic regression (LR) model, and the support vector machine (SVM) model with a radial basis function (RBF) kernel. First, landslide locations were extracted from field investigations and aerial photographs, and a total of 194 landslide polygons were converted into points to produce a landslide inventory map. Second, the landslide points were randomly split into two groups (70/30) for training and validation, respectively. Then, 10 landslide explanatory variables were selected: slope aspect, slope angle, altitude, lithology, mean annual precipitation, distance to roads, distance to rivers, distance to faults, land use, and normalized difference vegetation index (NDVI). Potential multicollinearity among these factors was assessed using the Pearson correlation coefficient (PCC), the variance inflation factor (VIF), and tolerance (TOL). Subsequently, landslide susceptibility maps of the study region were produced using the IOE, LR–IOE, and SVM–IOE models. Finally, the performance of the three models was verified and compared using the receiver operating characteristic (ROC) curve. For the success rate, the LR–IOE model had the highest accuracy (90.11%), followed by the IOE model (87.43%) and the SVM–IOE model (86.53%). The prediction rate (AUC) showed the same ranking: LR–IOE (81.84%), IOE (76.86%), and SVM–IOE (76.61%). The resulting landslide susceptibility map (LSM) can therefore provide an effective reference for the Fugu County government in land planning and landslide risk mitigation.

1. Introduction

Landslides often occur in mountainous and hilly areas and are among the most dangerous geological disasters [1]. They can cause huge economic losses and large numbers of casualties. Worldwide, landslides cause nearly 1000 deaths and about 4 billion dollars in losses every year [2], and these figures continue to grow. Landslides also occur frequently in China. It has been reported that 7122 geological disasters occurred there in 2017, causing 327 deaths, 173 injuries, 25 missing persons, and losses of 3.54 billion CNY [3]. In northwestern China in particular, landslides pose a serious threat to residents' safety and to transportation because of the harsh environment and the concentration of population. Controlling and remediating every landslide would require enormous manpower and material resources, so predicting where landslides will occur is both valuable and important.
Landslide susceptibility analysis is the first step toward predicting landslide occurrence. It aims to identify hazardous and high-risk regions and thereby help prevent the negative effects of landslides [4]. The landslide susceptibility map (LSM) is the final product of a landslide susceptibility analysis. However, traditional landslide susceptibility mapping based on field investigation and manual analysis is time-consuming and expensive, and its results are imprecise [5,6]. In recent years, the rapid development of geographical information systems (GIS) has made preparing landslide susceptibility maps considerably more convenient [7]. Meanwhile, many studies have combined GIS with statistical and nonstatistical methods to evaluate landslide susceptibility. Binary (bivariate) statistical methods include, for example:

- the frequency ratio (FR) model [8,9,10,11,12,13];
- the certainty factor (CF) model [14,15,16,17];
- the statistical index (SI) [18,19];
- the weights of evidence (WOE) [20,21,22];
- the index of entropy (IOE) model [23,24].

In these methods, the internal coefficients of each factor (such as the certainty factor or the weight of evidence) are determined by the landslide data, but the selection of factors is still influenced by human judgment. Among multivariate statistical methods, the logistic regression (LR) model has been extensively applied [25,26,27,28,29,30].
Because of the limitations of statistical models, machine learning algorithms that reduce human influence have also been introduced for landslide susceptibility analysis. Examples include artificial neural networks (ANN) [31,32,33], neuro-fuzzy models [34,35,36,37], fuzzy logic [38,39], decision trees [40,41,42], kernel logistic regression (KLR) [43,44], and support vector machines (SVM) [45,46,47].
Statistical models and machine learning algorithms each have advantages and disadvantages [48,49]. In binary statistical models, the internal parameters of the explanatory variables are determined by the landslide data, which avoids human interference and makes these models more objective. However, the selection of the explanatory variables is still subject to human judgment. By contrast, multivariate statistical models and machine learning methods can avoid the problem of factor dependence. Because they are computationally intensive, however, they are less widespread and have been applied in relatively few case studies [50,51].

In recent years, many hybrid models have been reported in the literature, for example:

- the fuzzy weight of evidence method [17];
- an adaptive network-based fuzzy inference system (ANFIS) based on the frequency ratio (FR–ANFIS) [52];
- wavelet packet–statistical (WP–SM) models [53];
- the integration of support vector machines with MultiBoost [54].

Many studies have found that hybrid models generally perform better than the individual models, so combining different models and testing them in different regions is worthwhile. This research therefore combined the IOE model with the LR and SVM models to form two hybrid models (LR–IOE and SVM–IOE) for landslide susceptibility mapping in Fugu County, Shaanxi Province, China.

2. Study Area

Fugu County lies between 110°25′ and 111°15′ E and between 38°42′ and 39°33′ N, and covers an area of 3229 km² (Figure 1). Elevations in the study area range from 761 to 1423 m above sea level and increase from east to west. The study region has a temperate, arid continental monsoon climate. The historical maximum and minimum temperatures are 38.9 °C and −24 °C, and the mean annual temperature is 9.1 °C. The mean annual rainfall is 428.6 mm, and rainfall gradually increases from northwest to southwest. Most precipitation falls from July to September, which accounts for 69% of the annual total. There are 62 rivers with drainage areas larger than 1 × 10⁷ m² in the study region, and the average annual runoff is 5.911 × 10⁹ m³.
The overall topography of the study area is high in the northwest and low in the southwest. The landforms fall into four main types: loess girder landforms, loess gully landforms, canyon hilly landforms, and valley terraces. The rock strata dip roughly southwest–northwest at approximately 5–8°, except in a few areas where the dip is about 20°. The Carboniferous–Permian strata in the east and the Jurassic strata in the northwest are coal-bearing. The lithology of the study area is shown in Table 1.
Because of the rich coal resources in the study area, the mining industry is well developed and the population is concentrated. This has caused serious environmental damage and has also led to numerous landslides.

3. Data Used

3.1. Landslide Inventory Map

Preparing a landslide inventory map is the first step in a landslide susceptibility analysis. The inventory includes historical and newly discovered landslides and related information [43], such as their location, date of occurrence, extent in the region, and the types of mass movement that have left discernible traces [55]. To obtain a practical and accurate landslide inventory map, data collection and an adequate field survey were essential in the current study.

A digital elevation model (DEM) of the study region with 30 m resolution was obtained from ASTER GDEM, downloaded from the Geospatial Data Cloud [56]. The geological map and mean annual precipitation data were provided by the government of Fugu County. Based on field investigations, a total of 194 landslide polygons were delineated according to their depletion zones: 162 slides, 29 falls, and 3 debris flows. These landslides were triggered by rainfall and excavation. The smallest and largest landslides in the study area were about 39 m² and 13.5 × 10⁴ m², respectively. Because only 12% of the landslides are larger than 10,000 m², the landslide polygons were converted into points using the centroid method, which gave the landslide inventory map (Figure 1) [57,58].
To avoid overfitting in modeling, 194 nonlandslide points were randomly generated and mapped on the landslide inventory map. All landslide and nonlandslide points were then randomly divided into two groups:

- a training dataset of 272 (70%) points, used to train the models;
- a validation dataset of 116 (30%) points, used for validation.
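The 70/30 division can be sketched in Python with NumPy. The paper does not say whether the split was stratified by class. This sketch assumes a separate 70/30 split of the landslide and nonlandslide points, which reproduces the reported 272/116 counts (136 + 136 and 58 + 58). The function name `stratified_split` and the fixed seed are illustrative choices, not the authors' procedure.

```python
import numpy as np


def stratified_split(labels, train_frac=0.7, seed=0):
    """Randomly split point indices into training/validation sets,
    keeping the landslide/nonlandslide ratio the same in both sets."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)   # indices of this class
        rng.shuffle(idx)                      # random order within the class
        n_train = int(round(train_frac * idx.size))
        train_idx.extend(idx[:n_train].tolist())
        val_idx.extend(idx[n_train:].tolist())
    return np.sort(train_idx), np.sort(val_idx)


# 194 landslide points (label 1) and 194 nonlandslide points (label 0)
labels = np.r_[np.ones(194, dtype=int), np.zeros(194, dtype=int)]
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 272 116
```

Splitting each class separately keeps the training and validation sets balanced, which a single random 70/30 split over all 388 points would only do approximately.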

3.2. Landslide Explanatory Variables

To produce the landslide susceptibility map, 10 landslide explanatory variables were selected: slope aspect, altitude, slope angle, lithology, mean annual precipitation, distance to roads, distance to rivers, distance to faults, land use, and normalized difference vegetation index (NDVI). A data layer with a resolution of 30 × 30 m was prepared for each. The data sources were:

- slope aspect, altitude, and slope angle: extracted from the DEM using ArcGIS software;
- land use and NDVI: extracted from GF-2 satellite images obtained from the China Center for Resources Satellite Data and Application;
- lithology, mean annual precipitation, distance to roads, distance to rivers, and distance to faults: derived from existing data.
Slope aspect is considered a prerequisite condition for landslides and has frequently been used in the literature to produce landslide susceptibility maps [30]. Using the equal interval method, slope aspect was reclassified into nine groups: northwest, west, southwest, south, southeast, east, northeast, north, and flat (Figure 2a).
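The following is a minimal sketch of the nine-class aspect reclassification. It rests on three assumptions:

- aspect is given in degrees clockwise from north;
- the classes are eight equal 45° sectors centred on the cardinal and intercardinal directions;
- flat cells carry a negative value (ArcGIS's Aspect tool assigns −1 to flat cells).

The study does not give its exact sector limits, and `classify_aspect` is a hypothetical helper.

```python
import numpy as np

# Class 0 is "flat"; classes 1-8 are 45-degree sectors centred on N, NE, ..., NW.
ASPECT_CLASSES = ["flat", "north", "northeast", "east", "southeast",
                  "south", "southwest", "west", "northwest"]


def classify_aspect(aspect_deg):
    """Reclassify aspect (degrees clockwise from north) into nine classes.

    Assumes flat cells are coded with a negative value such as -1.
    """
    a = np.asarray(aspect_deg, dtype=float)
    # Shift by 22.5 degrees so that north (337.5-22.5) becomes one contiguous bin.
    sector = np.floor(((a + 22.5) % 360.0) / 45.0).astype(int) + 1
    return np.where(a < 0, 0, sector)
```

For example, `classify_aspect([-1, 0, 90, 180, 270, 350])` yields the classes flat, north, east, south, west, and north.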
Slope angle is another critical factor and has been widely used in related research [59]. In the current research, slope angle was divided into six categories using the Jenks natural break method: 0°–6.65°, 6.65°–11.40°, 11.40°–16.39°, 16.39°–22.09°, 22.09°–29.45°, and 29.45°–60.57° (Figure 2b).
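The Jenks natural break method chooses class limits that minimize the sum of squared deviations of the values from their class means. The sketch below is an exact dynamic-programming version. GIS packages may use their own implementations and typically classify only a sample of cells, so treat this as an assumption-laden illustration. Its O(k·n²) cost makes it suitable only for a sample of raster values. `jenks_breaks` is a hypothetical name.

```python
import numpy as np


def jenks_breaks(values, n_classes):
    """Exact natural-breaks classification by dynamic programming.

    Returns [min, upper bound of class 1, ..., upper bound of class k].
    O(k * n^2): intended for a sample of raster values, not a full raster.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    # Prefix sums give the sum of squared deviations of any slice in O(1).
    s1 = np.concatenate([[0.0], np.cumsum(x)])
    s2 = np.concatenate([[0.0], np.cumsum(x * x)])

    def ssd(i, j):  # sum of squared deviations of x[i:j] from its mean
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    cost = np.full((n_classes + 1, n + 1), np.inf)  # cost[k, j]: best SSD of x[:j] in k classes
    start = np.zeros((n_classes + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):        # last class is x[i:j]
                c = cost[k - 1, i] + ssd(i, j)
                if c < cost[k, j]:
                    cost[k, j], start[k, j] = c, i
    # Walk back through the stored class starts to recover the upper bounds.
    uppers, j = [], n
    for k in range(n_classes, 0, -1):
        uppers.append(float(x[j - 1]))
        j = start[k, j]
    return [float(x[0])] + uppers[::-1]
```

For instance, `jenks_breaks([12, 1, 21, 3, 10, 22, 2, 20, 11], 3)` returns `[1.0, 3.0, 12.0, 22.0]`, which separates the three obvious clusters.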
Altitude is also considered a significant factor for landslide susceptibility mapping [1]. Using the Jenks natural break method, elevation values were classified into seven ranges: 761–903 m, 903–984 m, 984–1054 m, 1054–1124 m, 1124–1194 m, 1194–1262 m, and 1262–1423 m (Figure 2c).
Differences in lithology form the basis of the conditions for landslide formation [60]. Based on field investigations and existing geological data and maps, the lithological units were divided into six categories (Table 1), and the lithology map was produced (Figure 2d).
Previous research has indicated a strong correlation between mean annual precipitation and landslide occurrence [61,62,63]. Based on existing local observation data, mean annual precipitation was divided into seven classes using the equal interval method: <360 mm/y, 360–380 mm/y, 380–400 mm/y, 400–420 mm/y, 420–440 mm/y, 440–460 mm/y, and >460 mm/y (Figure 2e).
Distance to roads is an important landslide explanatory variable [64]. In this study, distance to roads was reclassified into five ranges using the equal interval method: <200 m, 200–400 m, 400–600 m, 600–800 m, and >800 m (Figure 2f).
River erosion of slopes is considered a significant factor inducing landslides, so distance to rivers is used as a quantitative index of river erosion [25]. In this study, distance to rivers was reclassified into five ranges using the equal interval method with an interval of 200 m: <200 m, 200–400 m, 400–600 m, 600–800 m, and >800 m (Figure 2g).
Fault movement is not only a precondition for individual landslides but also a controlling factor for regional landslide occurrence [12]. Numerous field surveys have indicated that the more intense the fault movement, the more landslides are triggered. In the current research, distance to faults was reclassified into five ranges using the equal interval method with an interval of 2000 m: <2000 m, 2000–4000 m, 4000–6000 m, 6000–8000 m, and >8000 m (Figure 2h).
Land use differs from place to place, and different land uses may lead to an uneven distribution of landslides [65]. Land use was therefore also used as an explanatory variable in the study region. It was divided into five categories: water, residential areas, bare land, forest/grassland, and farmland (Figure 2i).
NDVI reflects surface conditions and provides a quantitative estimate of vegetation growth and biomass. The effect of vegetation on slope stability depends on several factors [66,67]:

- the biomass;
- the position within the hillslope profile;
- the root-zone depth;
- the ability of roots to crack rocks and to impede or facilitate water infiltration.

NDVI is therefore also considered a pivotal explanatory variable. It is defined as follows:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - R}{\mathrm{NIR} + R},$$
where R and NIR are the reflectances in the red and near-infrared parts of the electromagnetic spectrum, respectively. Using the Jenks natural break method, the NDVI values were reclassified into five categories: −0.39 to −0.019, −0.019 to 0.063, 0.063 to 0.134, 0.134 to 0.216, and 0.216 to 0.607 (Figure 2j).
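The NDVI formula is applied cell by cell. The minimal NumPy sketch below assumes that the red and NIR inputs are reflectance arrays already aligned on the same 30 m grid. As a design choice of this sketch, cells where NIR + R is zero are returned as NaN rather than raising a division error.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - R) / (NIR + R); cells where NIR + R == 0 become NaN."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Divide only where the denominator is nonzero; other cells keep NaN.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


print(ndvi([0.5, 0.3, 0.0], [0.1, 0.3, 0.0]))
```

The three example cells give approximately 0.667 (vegetated), 0.0 (equal red and NIR), and NaN (no signal).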

4. Methodologies

4.1. Multicollinearity Diagnosis

In the study region, not all explanatory variables necessarily improve the classification results, and multicollinearity among explanatory variables may lead to overfitting. The Pearson correlation coefficient (PCC), the variance inflation factor (VIF), and tolerance (TOL) were therefore used to detect potential multicollinearity problems [68].
The PCC is a statistical linear correlation coefficient that is commonly used to measure the linear relationship between two variables. For two paired samples X = (x1, x2, ..., xn) and Y = (y1, y2, ..., yn), the PCC can be expressed as:
$$\mathrm{PCC} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^{2}\,\sum_{i=1}^{n}(y_i - \bar{y})^{2}}},$$
where xi and yi are the ith values of X and Y, and x̄ and ȳ are the means of X and Y, respectively. In general, the greater the absolute value of the PCC, the higher the risk of multicollinearity between landslide explanatory variables [69]. A PCC > 0.7 indicates a multicollinearity problem [70].
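The PCC screening can be sketched in NumPy. The code computes the pairwise coefficients between explanatory-variable columns and lists the pairs whose absolute value exceeds the 0.7 threshold used here. The helper names and the synthetic data are illustrative only.

```python
import numpy as np


def pearson_matrix(X):
    """Pairwise Pearson correlation coefficients between the columns of X."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)            # deviations from the column means
    cross = Xc.T @ Xc                  # sums of cross-products of deviations
    norms = np.sqrt(np.diag(cross))    # sqrt of sums of squared deviations
    return cross / np.outer(norms, norms)


def collinear_pairs(X, names, threshold=0.7):
    """List variable pairs whose |PCC| exceeds the threshold."""
    r = pearson_matrix(X)
    return [(names[a], names[b], float(r[a, b]))
            for a in range(len(names)) for b in range(a + 1, len(names))
            if abs(r[a, b]) > threshold]


# Synthetic example: "b" is almost a linear function of "a"; "c" is independent.
rng = np.random.default_rng(42)
a = rng.normal(size=200)
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), rng.normal(size=200)])
print(collinear_pairs(X, ["a", "b", "c"]))
```

Only the pair (a, b) is flagged. In this study, all pairwise values were below 0.7.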
VIF and TOL are two important indexes for multicollinearity diagnosis. VIF is the ratio of the variance of a coefficient estimate when the conditioning factors are multicollinear to its variance when they are not, and TOL is the reciprocal of VIF [71]. In general, the larger the VIF and the smaller the TOL, the stronger the multicollinearity between the conditioning factors. In this study, explanatory variables with VIF > 2 or TOL < 0.4 were to be discarded [72].
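VIF is commonly computed as VIF_j = 1/(1 − R_j²), where R_j² is the coefficient of determination from regressing variable j on all other variables, and TOL_j = 1/VIF_j. The sketch below follows that standard definition with ordinary least squares in NumPy and applies the thresholds stated above (VIF > 2 or TOL < 0.4). The function name and the synthetic data are illustrative assumptions.

```python
import numpy as np


def vif_tolerance(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on all
    other columns (plus an intercept); TOL_j = 1 / VIF_j."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vif = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vif[j] = 1.0 / (1.0 - r2)
    return vif, 1.0 / vif


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))          # three independent synthetic variables
vif, tol = vif_tolerance(X)
# Rule used in this study: discard variables with VIF > 2 or TOL < 0.4.
keep = (vif <= 2) & (tol >= 0.4)
print(vif.round(3), keep)
```

Independent columns give VIF values close to 1, so all three are kept. A column built almost exactly from two others would give a very large VIF and be discarded.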

4.2. Index of Entropy (IOE) Method

The first classification model applied in the present study is the index of entropy (IOE) model, a bivariate statistical model. Its outputs are also used as input data for the hybrid models built subsequently. Entropy describes the degree of instability and uncertainty of a system. It also indicates which elements of the natural environment are most related to the development of mass movements [23]. In landslide susceptibility analysis, entropy represents the degree to which different explanatory variables affect the development of landslides. The weight value (Wj) of each landslide explanatory variable is determined by the following equations [73]:
$$FR_{ij} = \frac{y_{ij}}{x_{ij}},$$
$$S_{ij} = \frac{FR_{ij}}{\sum_{i=1}^{N_j} FR_{ij}},$$
$$M_j = -\sum_{i=1}^{N_j} S_{ij}\log_2 S_{ij}, \quad j = 1, 2, 3, \ldots, n,$$
$$M_j^{\max} = \log_2 N_j,$$
$$I_j = \frac{M_j^{\max} - M_j}{M_j^{\max}},$$
$$W_j = I_j \times FR_{ij},$$
where:

- FRij is the frequency ratio of class i of variable j;
- xij and yij are the percentage of the study area (domain) and the percentage of landslides in that class, respectively;
- Sij is the probability density;
- Mj and Mjmax are entropy values;
- Nj is the number of classes of explanatory variable j;
- Ij is the information coefficient.
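Equations (3)–(7) can be traced for a single explanatory variable with a short NumPy sketch. One assumption needs flagging. Equation (8) writes Wj = Ij × FRij, which varies by class. To get a single weight per variable, this sketch follows the common IOE formulation in the literature, in which FR is averaged over the classes of variable j. Classes with no landslides contribute 0·log2(0) = 0 to the entropy. The function name is hypothetical.

```python
import numpy as np


def ioe_weights(area_pct, landslide_pct):
    """Index-of-entropy quantities for one explanatory variable.

    area_pct[i]:      x_ij, percentage of the study area in class i
    landslide_pct[i]: y_ij, percentage of landslides in class i
    """
    x = np.asarray(area_pct, dtype=float)
    y = np.asarray(landslide_pct, dtype=float)
    fr = y / x                              # FR_ij, Equation (3)
    s = fr / fr.sum()                       # S_ij, Equation (4)
    nz = s > 0                              # treat 0 * log2(0) as 0
    m = -np.sum(s[nz] * np.log2(s[nz]))     # M_j, Equation (5)
    m_max = np.log2(s.size)                 # M_j max, Equation (6)
    info = (m_max - m) / m_max              # I_j, Equation (7)
    w = info * fr.mean()                    # W_j with FR averaged over classes (assumption)
    return fr, s, info, w


fr, s, info, w = ioe_weights([50, 50], [100, 0])
print(fr, s, info, w)
```

In the example, all landslides fall in one of two equal-area classes, so the entropy is 0 and Ij = 1. If the landslide percentages matched the area percentages, FR would be 1 in every class, the entropy would reach its maximum, and the weight would be 0.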
The final weight values were then calculated using SPSS software. Three explanatory variables (aspect, lithology, and land use) were generated from vector data without numeric attribute values, so their FR values were used as input data for computing Wj. Finally, the landslide susceptibility map for the IOE model is produced using the following equation:
$$LSI_{IOE} = \sum_{j=1}^{n} \frac{e}{f_j} \times C \times W_j,$$
where:

- LSIIOE is the landslide susceptibility index, obtained by summing over all explanatory variables;
- j indexes the explanatory variable maps, and n is their number;
- e is the number of classes in the explanatory variable map with the greatest number of classes;
- fj is the number of classes in explanatory variable map j;
- C is the class value after secondary classification [74].
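Equation (9) is a weighted sum over the explanatory variables. Each variable's class value C is scaled by e/fj, so variables with fewer classes are not under-weighted. The minimal sketch below assumes each layer is supplied as an array of class values. The helper name and the numbers in the example are hypothetical.

```python
import numpy as np


def lsi_ioe(class_layers, n_classes, weights):
    """Sum over variables of (e / f_j) * C * W_j, where e = max_j f_j."""
    e = max(n_classes)
    lsi = 0.0
    for c, f_j, w_j in zip(class_layers, n_classes, weights):
        lsi = lsi + (e / f_j) * np.asarray(c, dtype=float) * w_j
    return lsi


# Hypothetical two-cell example: a 9-class layer (W = 0.1) and a 3-class layer (W = 0.5)
print(lsi_ioe([[2, 9], [1, 3]], [9, 3], [0.1, 0.5]))  # ≈ [1.7 5.4]
```

The 3-class layer is multiplied by 9/3 = 3, so its class values are stretched to the same 1–9 range as the 9-class layer before weighting.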

4.3. Integration of Logistic Regression and Index of Entropy Model

The logistic regression (LR) model was integrated with the IOE model to build a new hybrid model, the LR–IOE model. Logistic regression is a commonly used statistical method for regression analysis with a binary dependent variable. An advantage of the LR model is that the independent variables can be discrete or continuous and need not follow a normal distribution [75]. In logistic regression analysis, the dependent variable takes the values 0 and 1, representing nonlandslide and landslide occurrence, respectively. The LR model can be expressed as:
$$P = \frac{\exp(Z)}{1 + \exp(Z)},$$
where P is the probability of landslide occurrence, ranging from 0 to 1. Z, which ranges from −∞ to +∞, is calculated by the following equation:
$$Z = B_0 + B_1 X_1 + B_2 X_2 + \cdots + B_n X_n,$$
where n is the number of independent variables; Bi (i = 1, 2, 3, ..., n) is the logistic regression coefficient and Xi are the values of the n explanatory variables; and B0 is a constant.
The Sij values are obtained from the IOE model and are on a uniform (dimensionless) scale. Using them as inputs can therefore avoid a linear correlation between landslides and the explanatory variables and reduce noise in modeling. In this study, the 10 explanatory variables were reclassified with their corresponding Sij values. These values were then used as input data to build the hybrid LR–IOE model, with B0 and Bi estimated by the forward stepwise method.
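The LR–IOE workflow can be sketched in NumPy: class indices are first replaced by Sij values, and then the logistic model above is fitted. The authors used SPSS with forward stepwise selection. This sketch instead fits all variables at once by iteratively reweighted least squares (Newton's method), a standard maximum-likelihood fit, and does not reproduce the stepwise selection. The helper names, Sij tables, and data are synthetic and hypothetical.

```python
import numpy as np


def classes_to_sij(class_idx, sij_table):
    """Replace each cell's class index with that class's S_ij value."""
    return np.asarray(sij_table, dtype=float)[np.asarray(class_idx)]


def fit_logistic_irls(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton/IRLS.

    Returns [B0, B1, ..., Bn] for Z = B0 + B1 X1 + ... + Bn Xn.
    """
    X1 = np.column_stack([np.ones(len(X)), X])   # prepend the constant term B0
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))   # P = exp(Z) / (1 + exp(Z))
        w = p * (1.0 - p)
        hessian = X1.T @ (X1 * w[:, None])
        gradient = X1.T @ (y - p)
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


# Synthetic illustration: two variables with hypothetical S_ij tables
rng = np.random.default_rng(1)
n = 600
X = np.column_stack([
    classes_to_sij(rng.integers(0, 4, n), [0.1, 0.2, 0.3, 0.4]),
    classes_to_sij(rng.integers(0, 3, n), [0.5, 0.3, 0.2]),
])
z_true = -1.0 + 6.0 * X[:, 0] - 2.0 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-z_true))).astype(float)
beta = fit_logistic_irls(X, y)
print(beta)  # estimates of B0, B1, B2
```

IRLS is used here only because it needs nothing beyond NumPy. A statistics package with stepwise selection would be closer to the procedure described above.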

4.4. Integration of Support Vector Machine and Index of Entropy Model

The basic idea of the support vector machine is to use the training data to map the input space into a high-dimensional feature space through an inner-product (kernel) function [76]. The support vectors are the training samples closest to the optimal hyperplane [40]. In this study, SVM is used for binary classification, which requires both positive (landslide) and negative (nonlandslide) samples.
Consider a set of training vectors xi (i = 1, 2, 3, ..., n) belonging to two classes labeled yi = ±1 [77]. SVM seeks an n-dimensional hyperplane that separates the two classes while keeping both classes as far from the hyperplane as possible, that is, while maximizing the margin. Mathematically, this can be expressed as:
$$\min \; \frac{1}{2}\lVert w \rVert^{2},$$
subject to the constraints:
$$y_i\left((w \cdot x_i) + k\right) \ge 1,$$
where w is the normal vector of the hyperplane, ‖w‖ is its norm, and k is a constant. By introducing Lagrange multipliers λi, the cost function can be written as:
$$L = \frac{1}{2}\lVert w \rVert^{2} - \sum_{i=1}^{n} \lambda_i \left( y_i\left((w \cdot x_i) + k\right) - 1 \right).$$
In addition, slack variables ξi are introduced to handle nonseparable problems [76]; Equations (12) and (13) can then be modified as:
$$y_i\left((w \cdot x_i) + k\right) \ge 1 - \xi_i,$$
$$L = \frac{1}{2}\lVert w \rVert^{2} + \frac{1}{v n}\sum_{i=1}^{n} \xi_i,$$
where v is a parameter that penalizes misclassification, with values ranging from 0 to 1. Introducing a kernel function also allows a nonlinear decision boundary to be calculated. In the current research, the radial basis function (RBF), which is considered one of the most powerful kernels [78], was selected to calculate LSISVM and produce the landslide susceptibility map. The RBF is defined as:
$$K(x_i, x_j) = \exp\left(-\delta \lVert x_i - x_j \rVert^{2}\right), \quad \delta > 0,$$
where δ accounts for the width of the Gaussian kernel function [19].
Similarly, the Sij values were used as input data for the SVM model to build the hybrid SVM–IOE model.
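The RBF kernel matrix K(xi, xj) = exp(−δ‖xi − xj‖²) between two sets of samples can be computed with the NumPy sketch below. Here δ plays the role of the `gamma` parameter in LIBSVM and scikit-learn's `SVC`. The value selected by grid search in this study is not reported, so any δ used with this hypothetical helper is a placeholder.

```python
import numpy as np


def rbf_kernel(A, B, delta):
    """K(a, b) = exp(-delta * ||a - b||^2) for every row a of A and row b of B."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 against round-off
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-delta * np.maximum(sq, 0.0))


K = rbf_kernel([[0, 0], [1, 1]], [[0, 0], [1, 1]], delta=0.5)
print(K)
```

Identical samples give K = 1. Larger δ makes the kernel narrower, so similarity decays faster with distance. This is why δ was tuned by grid search.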

4.5. The ROC Curve

The receiver operating characteristic (ROC) curve was applied to test the performance of the LSMs obtained by the three models. Over a series of dichotomies (cutoffs or decision thresholds), the ROC curve plots 1 − specificity on the x-axis against sensitivity on the y-axis, where:
$$x\text{-axis} = 1 - \text{specificity} = 1 - \frac{TN}{TN + FP},$$
$$y\text{-axis} = \text{sensitivity} = \frac{TP}{TP + FN},$$
where TP is the number of true positives, TN true negatives, FP false positives, and FN false negatives [79]. The ability of the three models to predict the occurrence or non-occurrence of landslides can be measured by the area under the ROC curve (AUC) [9]. AUC values range from 0 to 1, and the closer the AUC is to 1, the higher the prediction accuracy of the model. Conversely, an AUC below 0.5 (approaching 0) indicates that the model's predictions have no practical value [80].
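The ROC curve and AUC can be sketched generically in NumPy. Every distinct score is used as a cutoff, sensitivity and 1 − specificity are computed at each cutoff, and the area is integrated with the trapezoidal rule. This is an illustrative implementation with hypothetical function names, not the software used in the study.

```python
import numpy as np


def roc_curve_points(labels, scores):
    """(1 - specificity, sensitivity) pairs over every distinct score cutoff."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    fpr, tpr = [0.0], [0.0]
    for cutoff in np.unique(scores)[::-1]:     # from the strictest cutoff down
        predicted = scores >= cutoff
        tp = np.sum(predicted & labels)        # true positives
        fp = np.sum(predicted & ~labels)       # false positives
        tpr.append(tp / n_pos)                 # sensitivity = TP / (TP + FN)
        fpr.append(fp / n_neg)                 # 1 - specificity = FP / (FP + TN)
    return np.array(fpr), np.array(tpr)


def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


fpr, tpr = roc_curve_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc_trapezoid(fpr, tpr))  # 0.75
```

Applied to the training points this gives the success rate, and applied to the validation points the prediction rate.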

5. Results

5.1. Assessment of Explanatory Variables

The training dataset was used to evaluate the explanatory variables, and the Pearson correlation coefficient between each pair of explanatory variables was calculated (Table 2). The lowest PCC value (−0.009) occurs between altitude and NDVI, and the highest (0.368) between slope aspect and distance to rivers. All PCC values are below 0.7.
The VIF and TOL results are shown in Table 3. The maximum VIF value is 1.926 and the minimum TOL value is 0.519. No variable exceeds VIF = 2 or falls below TOL = 0.4, so all the explanatory variables can be used for landslide susceptibility modeling.

5.2. Result of IOE Model

The calculation of Wj was described in Section 4.2 (Equations (3)–(8)), and the results are shown in Table 4. The FRij values in Table 4 were used as input data for slope aspect, lithology, and land use. For the remaining explanatory variables, the original (continuous) data were used to compute the IOE values. Based on these results, the landslide susceptibility index for the IOE model (LSIIOE) was calculated using Equation (9) as follows:
LSIIOE = (slope aspect × 0.084) + (slope angle × 0.064) + (altitude × 0.874) + (lithology × 0.119) + (mean annual precipitation × 0.232) + (distance to roads × 0.517) + (distance to rivers × 0.127) + (distance to faults × 0.030) + (land use × 0.974) + (NDVI × 0.303)
All 10 explanatory variables were used to build the IOE model, and the resulting LSIIOE values range from −10.37 to 11.67. Higher LSIIOE values indicate a higher likelihood of landslide occurrence: values close to 11.67 indicate a high probability, and values close to −10.37 a low probability. The natural break method was then applied to classify the final LSM produced by the IOE model into four categories (Figure 3a):

- low: −10.37 to −4.33, covering 31.24% of the study area;
- moderate: −4.33 to −1.65, covering 16.39%;
- high: −1.65 to 1.64, covering 33.23%;
- very high: 1.64 to 11.67, covering 19.14%.
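The four susceptibility classes can be assigned from the reported IOE limits with `numpy.digitize`, and each class's share of the area can then be computed. The text does not say which class a value lying exactly on a limit belongs to, so this sketch assumes it joins the lower class. The LSI values and helper names are hypothetical. On a 30 × 30 m grid every cell has the same area, so cell shares equal area shares.

```python
import numpy as np

# Class limits reported for the IOE map (upper bounds of the low, moderate,
# and high classes); the very high class is everything above the last limit.
IOE_BREAKS = [-4.33, -1.65, 1.64]
CLASS_NAMES = ["low", "moderate", "high", "very high"]


def classify_lsi(lsi, breaks):
    """Return class 0..len(breaks); a value equal to a limit joins the lower class."""
    return np.digitize(lsi, breaks, right=True)


def area_percentages(classes, n_classes):
    """Percentage of cells (hence of area, for equal-size cells) in each class."""
    counts = np.bincount(np.ravel(classes), minlength=n_classes)
    return 100.0 * counts / counts.sum()


lsi = np.array([-10.0, -3.0, 0.0, 5.0, 11.0])  # hypothetical LSI_IOE cells
classes = classify_lsi(lsi, IOE_BREAKS)
print([CLASS_NAMES[c] for c in classes])
print(area_percentages(classes, len(CLASS_NAMES)))
```

The same two functions apply to the LR–IOE and SVM–IOE maps once their own class limits are substituted for `IOE_BREAKS`.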

5.3. Result of LR–IOE Model

The Sij values were calculated as described in Section 4.2 (Equations (3)–(8)). The Sij values in Table 4 were assigned to all 10 explanatory variables by reclassification. They were then used as input data to build the LR–IOE model and to compute B0 and Bi using SPSS software. Based on the results, Equation (11) can be written as follows:
Z = 2.345 + (slope aspect × 0.061) + (slope angle × 0.043) + (altitude × −0.252) + (lithology × −0.013) + (mean annual precipitation × 0.239) + (distance to roads × −0.533) + (distance to rivers × −0.269) + (distance to faults × 0.110) + (land use × 0.061) + (NDVI × −0.354)
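The fitted LR–IOE equation can be evaluated per cell together with the logistic function P = exp(Z)/(1 + exp(Z)), as in the following sketch. The function name and dictionary keys are hypothetical. The inputs are assumed to be the reclassified (Sij-based) layer values described in Section 4.3, and the output is the probability LSILR–IOE. The constant and coefficients are copied from the equation above.

```python
import numpy as np

# Constant and coefficients as reported for the LR-IOE model.
B0 = 2.345
COEFFICIENTS = {
    "slope aspect": 0.061, "slope angle": 0.043, "altitude": -0.252,
    "lithology": -0.013, "mean annual precipitation": 0.239,
    "distance to roads": -0.533, "distance to rivers": -0.269,
    "distance to faults": 0.110, "land use": 0.061, "NDVI": -0.354,
}


def lr_ioe_probability(layers):
    """P = exp(Z) / (1 + exp(Z)), with Z from the fitted LR-IOE equation.

    `layers` maps each variable name to its reclassified input value(s).
    """
    z = B0 + sum(b * np.asarray(layers[name], dtype=float)
                 for name, b in COEFFICIENTS.items())
    return 1.0 / (1.0 + np.exp(-z))  # algebraically equal to exp(Z) / (1 + exp(Z))


print(lr_ioe_probability({name: 0.0 for name in COEFFICIENTS}))
```

Writing P as 1/(1 + exp(−Z)) avoids overflow of exp(Z) for large positive Z while giving the same value.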
The resulting LSILR–IOE values range from 0.016 to 0.983 and reflect the probability of landslide occurrence: values closer to 1 indicate a higher probability, and values closer to 0 a lower probability. As for the IOE model, the natural break method was applied to classify the final LSM produced by the LR–IOE model into four categories (Figure 3b):

- low: 0.016–0.248, covering 16.77% of the study area;
- moderate: 0.248–0.445, covering 33.06%;
- high: 0.445–0.688, covering 21.05%;
- very high: 0.688–0.983, covering 29.12%.

5.4. Result of SVM–IOE Model

In the current research, the parameters of the radial basis function were selected by a grid search with 10-fold cross-validation. The IOE-derived values were then used as input data to calculate the LSISVM–IOE values with the SVM–IOE model. The LSISVM–IOE values range from 0.061 to 0.984; values closer to 1 indicate a higher probability of landslide occurrence, and values closer to 0 a lower probability. The natural break method was then applied to classify the final LSM produced by the SVM–IOE model into four categories (Figure 3c):

- low: 0.061–0.271, covering 15.08% of the study area;
- moderate: 0.271–0.437, covering 29.56%;
- high: 0.437–0.658, covering 33.39%;
- very high: 0.658–0.984, covering 21.97%.

5.5. Validation of Landslide Susceptibility Maps

In the current study, the ROC curve was used to validate and compare the performance of the IOE, LR–IOE, and SVM–IOE models. The success rate and prediction rate are the AUC values derived from the training and validation datasets, respectively.

For the success rate, the AUC values of the IOE, LR–IOE, and SVM–IOE models were 0.8743, 0.9011, and 0.8653, respectively (Figure 4a). That is, the training accuracies of the susceptibility maps are 87.43%, 90.11%, and 86.53%. For the prediction rate, the AUC values were 0.7686, 0.8184, and 0.7661, respectively (Figure 4b), corresponding to prediction accuracies of 76.86%, 81.84%, and 76.61%.

Overall, both the success rate and the prediction rate indicate reasonable and practical accuracy for all three models, with the LR–IOE model performing best in the current study.

6. Discussion

Spatial prediction of landslides is a critical step in landslide studies. Its accuracy depends on the models used and on the input data derived from the explanatory variables. However, there is no consensus on how explanatory variables should be selected and evaluated, so it is necessary to investigate methods that lead to reasonable conclusions. In this study, we used the IOE and PCC to assess 10 explanatory variables and evaluated three classification models (IOE, LR–IOE, and SVM–IOE) for landslide susceptibility mapping.
According to Table 2, all pairwise PCC values among the 10 factors are below 0.7, indicating that multicollinearity among these factors is unlikely to introduce noise in landslide susceptibility modeling.

The index of entropy values (Table 4) show the following patterns:

- Land use: the residential areas class has the highest value (7.555), indicating that landslides are most concentrated in this class. We believe the reason is the concentration of population and the intensity of human engineering activities in these areas.
- Distance to roads: landslide frequency increases with proximity to roads.
- Slope aspect: most landslides occurred on south-facing slopes, possibly because of the climate. The same result was reported by the authors of [37] (p. 82).
- Lithology: category C (siltstone, sandstone, mudstone, shale, coal seam, glutenite) contains the largest number of landslides. This may be due to the softness of the sandstone and siltstone structures and to strong weathering and erosion.
- Slope angle and mean annual precipitation: the rate of landslide occurrence increases roughly in proportion to both. A possible reason is that infiltration of large amounts of water increases the water content and weight of the rock and soil mass, and hence its sliding force. The steeper the slope, the stronger this sliding force.

Interestingly, the IOE value decreases gradually as distance to faults, distance to rivers, distance to roads, altitude, and NDVI increase. One explanation is that road construction usually causes instability, and roads in the study region are generally built at low altitudes and away from faults. In addition, vegetation roots help stabilize the soil, whereas river erosion undermines slope stability. These patterns are broadly consistent with field observations.
In this study, the explanatory variables were selected on the basis of previous studies and field observations, which introduces some subjectivity. In addition, although Wj values were calculated for all 10 explanatory variables, it remains unclear how sensitive the method is to the number of classes and to the choice of break points. These questions will be a focus of future research.
As shown in Figure 4, the LR–IOE model has the highest AUC of the three models for both the success rate and the prediction rate, indicating that it performs best for landslide susceptibility mapping in this study. In contrast, the SVM–IOE model has the lowest AUC, possibly because SVM performance depends strongly on the choice of kernel function, for which there is no objective selection method.
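The success and prediction rates are AUC values computed on the training and validation points, respectively. The paper does not say which software produced the ROC curves; as a minimal, dependency-free sketch, the AUC can be computed as the Mann–Whitney probability that a randomly chosen landslide point receives a higher susceptibility score than a randomly chosen nonlandslide point, which equals the area under the empirical ROC curve. The function name `roc_auc` and the scores in the example are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    scores -- susceptibility values predicted at sample points
    labels -- 1 for landslide points, 0 for nonlandslide points
    Ties between a landslide and a nonlandslide score count as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one landslide and one nonlandslide point")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six validation points
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(round(roc_auc(scores, labels), 3))  # 8 of 9 landslide/nonlandslide pairs ranked correctly
```

Applied to the 272 training points this gives the success rate, and applied to the 116 validation points it gives the prediction rate; the double loop is O(n·m), which is ample for a few hundred points.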
Regarding the class proportions of the final susceptibility maps (Figure 5), the high and very high susceptibility classes cover about 52% of the study area for the three models. The LR–IOE model yields the smallest share (50.17%); because it also has the highest AUC, this suggests that it delineates high-risk zones more compactly, which can make decision-making more efficient and reduce mitigation costs.
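The class shares in Figure 5 depend on how the continuous susceptibility index is cut into four classes. The study used a natural break method; the sketch below implements the exact Fisher–Jenks optimisation (minimising the within-class sum of squared deviations by dynamic programming) and then computes the percentage of cells in each class. It runs in O(k·n²), so in practice it would be applied to a sample of raster cells; GIS packages may use different Jenks variants, and the function names and index values here are hypothetical.

```python
def jenks_breaks(values, n_classes):
    """Exact Fisher-Jenks natural breaks by dynamic programming, O(k * n^2).

    Returns the upper bound of each class, in ascending order.
    """
    x = sorted(values)
    n = len(x)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= number of values")
    # Prefix sums give the sum of squared deviations of any run x[i:j] in O(1)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, v in enumerate(x):
        s1[idx + 1] = s1[idx] + v
        s2[idx + 1] = s2[idx] + v * v

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: minimal within-class SSD for x[:j] split into k classes
    # cut[k][j]:  start index of the last of those k classes
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], cut[k][j] = c, i
    # Walk back from the full array to recover each class's upper bound
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(x[j - 1])
        j = cut[k][j]
    return bounds[::-1]


def class_percentages(values, bounds, names):
    """Percentage of values falling in each class (value <= upper bound)."""
    counts = [0] * len(bounds)
    for v in values:
        for idx, upper in enumerate(bounds):
            if v <= upper:
                counts[idx] += 1
                break
    return {name: 100.0 * c / len(values) for name, c in zip(names, counts)}


# Hypothetical susceptibility index values sampled from a map
index_values = [0.05, 0.1, 0.12, 0.4, 0.42, 0.45, 0.7, 0.72, 0.95, 0.97]
bounds = jenks_breaks(index_values, 4)
shares = class_percentages(index_values, bounds, ["Low", "Moderate", "High", "Very high"])
print(bounds)
print(shares)
```

Summing the "High" and "Very high" shares of such a classification gives the kind of percentage compared across models in Figure 5.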

7. Conclusions

In the present study, the IOE, LR–IOE, and SVM–IOE models were used to produce landslide susceptibility maps for Fugu County, Shaanxi Province, China. Ten explanatory variables, namely, altitude, slope aspect, mean annual precipitation, slope angle, lithology, distance to roads, land use, distance to rivers, distance to faults, and NDVI, were selected, and potential multicollinearity among them was assessed using PCC, VIF, and TOL. The analysis showed no multicollinearity problems among these 10 factors, so all of them were retained for landslide susceptibility modeling. A total of 194 landslides were identified from extensive field investigations and historical landslide records, and 194 nonlandslide points were randomly generated. To build the models, 272 (70%) landslide and nonlandslide points were randomly selected, and the remaining 116 (30%) points were used for validation. A natural break method was used to classify the study region into four susceptibility classes: low, moderate, high, and very high. Finally, the performance of the resulting landslide susceptibility maps was evaluated using AUC values.
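The VIF and TOL values in Table 3 are consistent with the standard definitions: for each factor, VIF = 1/(1 − R²), where R² comes from regressing that factor on the other nine, and TOL = 1/VIF (for example, 1/1.523 ≈ 0.657 for slope angle). The following is a minimal NumPy sketch of that computation, run on synthetic data with one deliberately collinear column; the data and the function name `vif_tolerance` are illustrative assumptions. A commonly used rule of thumb treats VIF > 10 (TOL < 0.1) as a sign of serious multicollinearity.

```python
import numpy as np


def vif_tolerance(X):
    """VIF and tolerance for every column of X (rows = samples, columns = factors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from an ordinary least-squares
    regression (with intercept) of column j on all other columns; TOL_j = 1 / VIF_j.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    vif = np.empty(m)
    for j in range(m):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vif[j] = 1.0 / (1.0 - r2)
    return vif, 1.0 / vif


# Synthetic factors: b is independent, c is almost a copy of a
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
X = np.column_stack([a, b, c])

vif, tol = vif_tolerance(X)
print("VIF:", np.round(vif, 2))
print("TOL:", np.round(tol, 3))
```

The independent column gets a VIF close to 1, while the two nearly identical columns get very large VIFs, the situation that the screening in this study was designed to rule out.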
In terms of the success rate (AUC on the training data), the LR–IOE model has the highest accuracy (90.11%), followed by the IOE model (87.43%) and the SVM–IOE model (86.53%). For the prediction rate (AUC on the validation data), the LR–IOE model again has the highest accuracy (81.84%), followed by the IOE model (76.86%) and the SVM–IOE model (76.61%). These results indicate that all three models perform well in landslide susceptibility mapping, and that the LR–IOE model performed best in this research and is the most suitable for landslide susceptibility mapping in the study area.
The results of this study provide useful information for engineers, decision makers, and urban planners in the study region.

Author Contributions

T.Z. established the model and wrote the main manuscript text. L.H. guided the work and analysis. W.C. and H.S. contributed to the adjustment of the article structure. This paper was prepared using the contributions of all authors. All authors have read and approved the final manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (Ecological Safety Guarantee Technology and Demonstration Channel and Slope Treatment Project in Loess Hilly and Gully Area), grant number 2017YFC0504700.

Acknowledgments

We thank the Shaanxi Provincial Key Laboratory of Land Rehabilitation for the data used in this study. Special thanks are given to Zhou Zhao, associate professor at Xi’an University of Science and Technology. We also thank the China Center for Resources Satellite Data and Application.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Akgun, A.; Erkan, O. Landslide susceptibility mapping by geographical information system-based multivariate statistical and deterministic models: In an artificial reservoir area at northern Turkey. Arab. J. Geosci. 2016, 9, 1–15.
  2. Petley, D. Global patterns of loss of life from landslides. Geology 2012, 40, 927–930.
  3. National Statistics on Geological Disasters in 2017. Available online: www.jianzai.gov.cn (accessed on 25 October 2018).
  4. Brabb, E.E. Innovative approaches to landslide hazard mapping. In Proceedings of the IV International Symposium on Landslides, Toronto, Canada, 23–31 August 1985; Volume 1, pp. 307–324.
  5. Yin, K. The computer-assisted mapping of landslide hazard zonation. Hydrogeol. Eng. Geol. 1993, 5, 21–23.
  6. Brabb, E.E. The San Mateo County California GIS Project for Predicting the Consequences of Hazardous Geologic Processes. In Geographical Information Systems in Assessing Natural Hazards; Springer: Dordrecht, The Netherlands, 1995.
  7. Pourghasemi, H.R.; Pradhan, B.; Gokceoglu, C. Application of fuzzy logic and analytical hierarchy process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran. Nat. Hazards 2012, 63, 965–996.
  8. Yilmaz, I. Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: A case study from Kat landslides (Tokat-Turkey). Comput. Geosci. 2009, 35, 1125–1138.
  9. Pradhan, B. Landslide susceptibility mapping of a catchment area using frequency ratio, fuzzy logic and multivariate logistic regression approaches. J. Indian Soc. Remote Sens. 2010, 38, 301–320.
  10. Akinci, H.; Doğan, S.; Kilicoğlu, C.; Temiz, M.S. Production of landslide susceptibility map of Samsun (Turkey) city center by using frequency ratio method. Int. J. Phys. Sci. 2011, 6, 1015–1025.
  11. Mondal, S.; Maiti, R. Integrating the analytical hierarchy process (AHP) and the frequency ratio (FR) model in landslide susceptibility mapping of Shiv-Khola watershed, Darjeeling Himalaya. Int. J. Disaster Risk Sci. 2013, 4, 200–212.
  12. Vakhshoori, V.; Zare, M. Landslide susceptibility mapping by comparing weight of evidence, fuzzy logic, and frequency ratio methods. Geomatics 2016, 7, 1–21.
  13. Dev, T.; Tae, I.; Ha, D. GIS-based landslide susceptibility mapping of Bhotang, Nepal using frequency ratio and statistical index methods. J. Korean Soc. Surv. Geod. Photogramm. Cartogr. 2017, 35, 357–364.
  14. Devkota, K.C.; Regmi, A.D.; Pourghasemi, H.R.; Yoshida, K.; Pradhan, B.; Ryu, I.C.; Dhital, M.R.; Althuwaynee, O.F. Landslide susceptibility mapping using certainty factor, index of entropy and logistic regression models in GIS and their comparison at Mugling-Narayanghat road section in Nepal Himalaya. Nat. Hazards 2013, 65, 135–165.
  15. Pourghasemi, H.R.; Pradhan, B.; Gokceoglu, C.; Mohammadi, M.; Moradi, H.R. Application of weights-of-evidence and certainty factor models and their comparison in landslide susceptibility mapping at Haraz watershed, Iran. Arab. J. Geosci. 2013, 6, 2351–2365.
  16. Wang, Q.; Li, W.; Chen, W.; Bai, H. GIS-based assessment of landslide susceptibility using certainty factor and index of entropy models for the Qianyang County of Baoji City, China. J. Earth Syst. Sci. 2015, 124, 1–17.
  17. Hong, H.; Chen, W.; Xu, C.; Youssef, A.M.; Pradhan, B.; Bui, D.T. Rainfall-induced landslide susceptibility assessment at the Chongren area (China) using frequency ratio, certainty factor, and index of entropy. Geocarto Int. 2017, 32, 139–154.
  18. Bui, D.T.; Lofman, O.; Revhaug, I.; Dick, O. Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression. Nat. Hazards 2011, 59, 1413–1444.
  19. Pourghasemi, H.R.; Moradi, H.R.; Aghda, S.M.F. Landslide susceptibility mapping by binary logistic regression, analytical hierarchy process, and statistical index models and assessment of their performances. Nat. Hazards 2013, 69, 749–779.
  20. Polykretis, C.; Chalkias, C. Comparison and evaluation of landslide susceptibility maps obtained from weight of evidence, logistic regression, and artificial neural network models. Nat. Hazards 2018, 93, 1–26.
  21. Ilia, I.; Tsangaratos, P. Applying weight of evidence method and sensitivity analysis to produce a landslide susceptibility map. Landslides 2016, 13, 379–397.
  22. Chen, W.; Li, H.; Hou, E.; Wang, S.; Wang, G.; Panahi, M.; Li, T.; Peng, T.; Guo, C.; Niu, C. GIS-based groundwater potential analysis using novel ensemble weights-of-evidence with logistic regression and functional tree models. Sci. Total Environ. 2018, 634, 853.
  23. Pourghasemi, H.R.; Mohammady, M.; Pradhan, B. Landslide susceptibility mapping using index of entropy and conditional probability models in GIS: Safarood Basin, Iran. Catena 2012, 97, 71–84.
  24. Razavizadeh, S.; Solaimani, K.; Massironi, M.; Kavian, A. Mapping landslide susceptibility with frequency ratio, statistical index, and weights of evidence models: A case study in northern Iran. Environ. Earth Sci. 2017, 76, 499.
  25. Manzo, G.; Tofani, V.; Segoni, S.; Battistini, A.; Catani, F. GIS techniques for regional-scale landslide susceptibility assessment: The Sicily (Italy) case study. Int. J. Geogr. Inf. Sci. 2013, 27, 1433–1452.
  26. Chen, W.; Shahabi, H.; Shirzadi, A.; Hong, H.Y.; Akgun, A.; Tian, Y.Y.; Liu, J.Z.; Zhu, A.X.; Li, S.J. Novel hybrid artificial intelligence approach of bivariate statistical-methods-based kernel logistic regression classifier for landslide susceptibility modeling. Bull. Eng. Geol. Environ. 2018, 1–23.
  27. Trigila, A.; Iadanza, C.; Esposito, C.; Scarascia-Mugnozza, G. Comparison of logistic regression and random forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy). Geomorphology 2015, 249, 119–136.
  28. Saro, L.; Woo, J.S.; Kwanyoung, O.; Moungjin, L. The spatial prediction of landslide susceptibility applying artificial neural network and logistic regression models: A case study of Inje, Korea. Open Geosci. 2016, 8, 117–132.
  29. Tsangaratos, P.; Ilia, I. Comparison of a logistic regression and naïve bayes classifier in landslide susceptibility assessments: The influence of models complexity and training dataset size. Catena 2016, 145, 164–179.
  30. Mandal, S.; Mandal, K. Modeling and mapping landslide susceptibility zones using GIS based multivariate binary logistic regression (LR) model in the Rorachu river basin of eastern Sikkim Himalaya, India. Model. Earth Syst. Environ. 2018, 4, 69–88.
  31. Lin, H.M.; Chang, S.K.; Wu, J.H.; Juang, C.H. Neural network-based model for assessing failure potential of highway slopes in the Alishan, Taiwan Area: Pre- and post-earthquake investigation. Eng. Geol. 2009, 104, 280–289.
  32. Conforti, M.; Pascale, S.; Robustelli, G.; Sdao, F. Evaluation of prediction capability of the artificial neural networks for mapping landslide susceptibility in the Turbolo river catchment (northern Calabria, Italy). Catena 2014, 113, 236–250.
  33. Aditian, A.; Kubota, T. Causative factors optimization using artificial neural network for GIS-based landslide susceptibility assessments in Ambon, Indonesia. Int. J. Eros. Control Eng. 2017, 10, 120–129.
  34. Oh, H.J.; Pradhan, B. Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area. Comput. Geosci. 2011, 37, 1264–1276.
  35. Lee, M.J.; Park, I.; Lee, S. Forecasting and validation of landslide susceptibility using an integration of frequency ratio and neuro-fuzzy models: A case study of Seorak mountain area in Korea. Environ. Earth Sci. 2015, 74, 413–429.
  36. Chen, W.; Pourghasemi, H.R.; Panahi, M.; Kornejady, A.; Wang, J.; Xie, X. Spatial prediction of landslide susceptibility using an adaptive neuro-fuzzy inference system combined with frequency ratio, generalized additive model, and support vector machine techniques. Geomorphology 2017, 297, 69–85.
  37. Chen, W.; Panahi, M.; Tsangaratos, P.; Shahabi, H.; Ilia, I.; Panahi, S.; Li, S.; Jaafari, A.; Ahmad, B.B. Applying population-based evolutionary algorithms and a neuro-fuzzy system for modeling landslide susceptibility. Catena 2019, 172, 212–231.
  38. Anbalagan, R.; Kumar, R.; Lakshmanan, K.; Parida, S.; Neethu, S. Landslide hazard zonation mapping using frequency ratio and fuzzy logic approach, a case study of Lachung valley, Sikkim. Geoenviron. Disasters 2015, 2, 1–17.
  39. Tsangaratos, P.; Loupasakis, C.; Nikolakopoulos, K.; Angelitsa, V.; Ilia, I. Developing a landslide susceptibility map based on remote sensing, fuzzy logic and expert knowledge of the island of Lefkada, Greece. Environ. Earth Sci. 2018, 77, 363.
  40. Pradhan, B. A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput. Geosci. 2013, 51, 350–365.
  41. Bui, D.T.; Pradhan, B.; Lofman, O.; Revhaug, I. Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and naïve Bayes models. Math. Probl. Eng. 2012, 2012.
  42. Lombardo, L.; Cama, M.; Conoscenti, C.; Märker, M.; Rotigliano, E. Binary logistic regression versus stochastic gradient boosted decision trees in assessing landslide susceptibility for multiple-occurring landslide events: Application to the 2009 storm event in Messina (Sicily, southern Italy). Nat. Hazards 2015, 79, 1621–1648.
  43. Hong, H.; Pradhan, B.; Xu, C.; Bui, D.T. Spatial prediction of landslide hazard at the Yihuang area (China) using two-class kernel logistic regression, alternating decision tree and support vector machines. Catena 2015, 133, 266–281.
  44. Chen, W.; Xie, X.; Peng, J.; Wang, J.; Duan, Z.; Hong, H. GIS-based landslide susceptibility modelling: A comparative assessment of kernel logistic regression, Naïve-Bayes tree, and alternating decision tree models. Geomat. Nat. Hazards Risk 2017, 8, 950–973.
  45. Colkesen, I.; Sahin, E.K.; Kavzoglu, T. Susceptibility mapping of shallow landslides using kernel-based gaussian process, support vector machines and logistic regression. J. Afr. Earth Sci. 2016, 118, 53–64.
  46. Lee, S.; Hong, S.M.; Jung, H.S. A support vector machine for landslide susceptibility mapping in Gangwon province, Korea. Sustainability 2017, 9, 48.
  47. Huang, Y.; Zhao, L. Review on landslide susceptibility mapping using support vector machines. Catena 2018, 165, 520–529.
  48. Chen, W.; Shahabi, H.; Shirzadi, A.; Li, T.; Guo, C.; Hong, H.; Li, W.; Pan, D.; Hui, J.; Ma, M. A Novel Ensemble Approach of Bivariate Statistical Based Logistic Model Tree Classifier for Landslide Susceptibility Assessment. Geocarto Int. 2018, 1–32.
  49. Chen, W.; Xie, X.; Peng, J.; Shahabi, H.; Hong, H.; Bui, D.T. GIS-based landslide susceptibility evaluation using a novel hybrid integration approach of bivariate statistical based random forest method. Catena 2018, 164, 135–149.
  50. Eeckhaut, M.V.D. Statistical modelling of Europe-wide landslide susceptibility using limited landslide inventory data. Landslides 2012, 9, 357–369.
  51. Trigila, A.; Frattini, P.; Casagli, N.; Catani, F.; Crosta, G.; Esposito, C.; Iadanza, C.; Lagomarsino, D.; Mugnozza, G.S.; Segoni, S. Landslide Susceptibility Mapping at National Scale: The Italian Case Study. In Landslide Science and Practice; Springer: Berlin, Germany, 2015; pp. 287–295.
  52. Aghdam, I.N.; Pradhan, B.; Panahi, M. Landslide susceptibility assessment using a novel hybrid model of statistical bivariate methods (FR and WOE) and adaptive neuro-fuzzy inference system (ANFIS) at southern Zagros mountains in Iran. Environ. Earth Sci. 2017, 76, 237.
  53. Moosavi, V.; Niazi, Y. Development of hybrid wavelet packet-statistical models (WP-SM) for landslide susceptibility mapping. Landslides 2016, 13, 1–18.
  54. Pham, B.T.; Jaafari, A.; Prakash, I.; Bui, D.T. A novel hybrid intelligent model of support vector machines and the multiboost ensemble for landslide susceptibility modeling. Bull. Eng. Geol. Environ. 2018, 47, 1–22.
  55. Guzzetti, F.; Mondini, A.C.; Cardinali, M.; Fiorucci, F.; Santangelo, M.; Chang, K.T. Landslide inventory maps: New tools for an old problem. Earth Sci. Rev. 2012, 112, 42–66.
  56. Geospatial Data Cloud. Available online: http://www.gscloud.cn/ (accessed on 25 September 2018).
  57. Dou, J.; Yamagishi, H.; Pourghasemi, H.R.; Yunus, A.P.; Song, X.; Xu, Y. An integrated artificial neural network model for the landslide susceptibility assessment of Osado island, Japan. Nat. Hazards 2015, 78, 1749–1776.
  58. Chen, W.; Peng, J.; Hong, H.; Shahabi, H.; Pradhan, B.; Liu, J. Landslide susceptibility modelling using GIS-based machine learning techniques for Chongren county, Jiangxi province, China. Sci. Total Environ. 2018, 626, 230.
  59. Nakamura, H.; Kubota, T. Landslide susceptibility from the viewpoint of its slope angle and geology. J. Jpn. Landslide Soc. 2010, 23, 6–12_1.
  60. Westen, C.J.V.; Rengers, N.; Soeters, R. Use of geomorphological information in indirect landslide susceptibility assessment. Nat. Hazards 2003, 30, 399–419.
  61. Dahal, R.K.; Hasegawa, S.; Nonomura, A.; Yamanaka, M.; Masuda, T.; Nishino, K. GIS-based weights-of-evidence modelling of rainfall-induced landslides in small catchments for landslide susceptibility mapping. Environ. Geol. 2008, 54, 311–324.
  62. Youssef, A.M. Landslide susceptibility delineation in the Ar-Rayth area, Jizan, kingdom of Saudi Arabia, using analytical hierarchy process, frequency ratio, and logistic regression models. Environ. Earth Sci. 2015, 73, 1–20.
  63. Chang, S.K.; Lee, D.H.; Wu, J.H.; Juang, C.H. Rainfall-based criteria for assessing slump rate of mountainous highway slopes: A case study of slopes along Highway 18 in Alishan, Taiwan. Eng. Geol. 2011, 118, 63–74.
  64. Erener, A.; Mutlu, A.; Düzgün, H.S. A comparative study for landslide susceptibility mapping using GIS-based multi-criteria decision analysis (MCDA), logistic regression (LR) and association rule mining (ARM). Eng. Geol. 2016, 203, 45–55.
  65. Bourenane, H.; Guettouche, M.S.; Bouhadad, Y.; Braham, M. Landslide hazard mapping in the Constantine city, northeast Algeria using frequency ratio, weighting factor, logistic regression, weights of evidence, and analytical hierarchy process methods. Arab. J. Geosci. 2016, 9, 1–24.
  66. Jaafari, A.; Najafi, A.; Pourghasemi, H.R.; Rezaeian, J.; Sattarian, A. GIS-based frequency ratio and index of entropy models for landslide susceptibility assessment in the Caspian forest, northern Iran. Int. J. Environ. Sci. Technol. 2014, 11, 909–926.
  67. Youssef, A.M.; Pourghasemi, H.R.; Pourtaghi, Z.S.; Alkatheeri, M.M. Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah basin, Asir region, Saudi Arabia. Landslides 2016, 13, 839–856.
  68. Su, Q.; Zhang, J.; Zhao, S.; Wang, L.; Liu, J.; Guo, J. Comparative assessment of three nonlinear approaches for landslide susceptibility mapping in a coal mine area. ISPRS Int. J. Geo-Inf. 2017, 6, 228.
  69. Jiang, P.; Chen, J. Displacement prediction of landslide based on generalized regression neural networks with k-fold cross-validation. Neurocomputing 2016, 198, 40–47.
  70. Bui, D.T.; Tuan, T.A.; Klempe, H.; Pradhan, B.; Revhaug, I. Spatial prediction models for shallow landslide hazards: A comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 2016, 13, 361–378.
  71. Menard, S. Applied Logistic Regression Analysis. Technometrics 2002, 38, 192.
  72. Bai, S.B.; Wang, J.; Lü, G.N.; Zhou, P.G.; Hou, S.S.; Xu, S.N. GIS-based logistic regression for landslide susceptibility mapping of the Zhongxian segment in the Three Gorges area, China. Geomorphology 2010, 115, 23–31.
  73. Al-Abadi, A.M.; Al-Temmeme, A.A.; Al-Ghanimy, M.A. A gis-based combining of frequency ratio and index of entropy approaches for mapping groundwater availability zones at Badra–Al al-Gharbi–Teeb areas, Iraq. Sustain. Water Resour. Manag. 2016, 2, 265–283.
  74. Bednarik, M.; Magulová, B.; Matys, M.; Marschalko, M. Landslide susceptibility assessment of the Kraľovany–Liptovský Mikuláš railway case study. Phys. Chem. Earth 2010, 35, 162–171.
  75. Atkinson, P.M.; Massari, R. Generalised linear modelling of susceptibility to landsliding in the central Apennines, Italy. Comput. Geosci. 1998, 24, 373–385.
  76. Vapnik, V.N. Statistics for Engineering and Information Science; Springer: New York, NY, USA, 2000.
  77. Xu, C.; Dai, F.; Xu, X.; Yuan, H.L. GIS-based support vector machine modeling of earthquake-triggered landslide susceptibility in the Jianjiang river watershed, China. Geomorphology 2012, 145–146, 70–80.
  78. Chen, W.; Yan, X.; Zhao, Z.; Hong, H.; Bui, D.T.; Pradhan, B. Spatial prediction of landslide susceptibility using data mining-based kernel logistic regression, naive Bayes and RBFNetwork models for the Long County area (China). Bull. Eng. Geol. Environ. 2018, 1–20.
  79. Chen, W.; Shirzadi, A.; Shahabi, H.; Ahmad, B.B.; Shuai, Z.; Hong, H.; Ning, Z. A novel hybrid artificial intelligence approach based on the rotation forest ensemble and naïve Bayes tree classifiers for a landslide susceptibility assessment in Langao County, China. Geomat. Nat. Hazards Risk 2017, 8, 1–23.
  80. Chen, W.; Zhang, S.; Li, R.; Shahabi, H. Performance evaluation of the GIS-based data mining techniques of best-first decision tree, random forest, and naïve bayes tree for landslide susceptibility modeling method. Sci. Total Environ. 2018, 644, 1006–1018.
Figure 1. Landslide inventory map and the location of study area.
Figure 2. Landslide explanatory variable maps involving: (a) Slope aspect; (b) slope angle; (c) altitude; (d) lithology; (e) mean annual precipitation; (f) distance to roads; (g) distance to rivers; (h) distance to faults; (i) land use; (j) normalized difference vegetation index (NDVI).
Figure 3. Landslide susceptibility map derived from: (a) The IOE model; (b) logistic regression (LR)–IOE model; (c) support vector machine (SVM)–IOE model.
Figure 4. Receiver operating characteristics (ROC) curves of models: (a) Training dataset; (b) validating dataset.
Figure 5. Percentages of different landslide susceptibility classes for the three models.
Table 1. Lithological units of study area.
| Category | Geological Age | Code | Main Lithology |
|---|---|---|---|
| A | Holocene | Q4 | Sand, gravel, loess |
| | Pleistocene | Q3 | Loess, gravel |
| B | Pliocene | N2j | Sandy clay |
| | Pliocene | N2b | Quartz sand, clay |
| C | Middle Jurassic | J2y | Siltstone, sandstone, mudstone, shale, coal seam |
| | Late Jurassic | J1f | Mudstone, glutenite |
| D | Early Triassic | T3w | Mudstone, shale, coal seam |
| | Early Triassic | T2-3y | Glutenite, mudstone, shale, siltstone |
| | Middle Triassic | T2z | Sandstone, mudstone |
| | Late Triassic | T1h | Medium-fine sandstone, siltstone, mudstone |
| | Late Triassic | T1l | Sandstone, mudstone |
| E | Early Permian | P2s | Glutenite, sandstone, mudstone |
| | Early Permian | P2sh | Mudstone, silty mudstone, sandstone, clay minerals, siliceous |
| | Late Permian | P1sh | Feldspar quartz sandstone, conglomerate, sandstone, mudstone, shale |
| | Late Permian | P1s | Mudstone, shale, sandstone, coal seam |
| F | Carboniferous | C2t | Calcareous sandstone, coal seam, mudstone |
Table 2. Pearson correlation coefficient between pairs of explanatory variables.
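| Explanatory Variables | Slope Aspect | Slope Angle | Altitude | Lithology | Mean Annual Precipitation | Distance to Roads | Distance to Rivers | Distance to Faults | Land Use | NDVI |
|---|---|---|---|---|---|---|---|---|---|---|
| Slope aspect | 1 | | | | | | | | | |
| Slope angle | 0.037 | 1 | | | | | | | | |
| Altitude | 0.116 | 0.003 | 1 | | | | | | | |
| Lithology | 0.165 | 0.170 | 0.010 | 1 | | | | | | |
| Mean annual precipitation | 0.140 | 0.100 | −0.021 | 0.025 | 1 | | | | | |
| Distance to roads | 0.280 | 0.067 | 0.079 | 0.048 | 0.205 | 1 | | | | |
| Distance to rivers | 0.368 | 0.104 | 0.112 | −0.010 | 0.004 | 0.160 | 1 | | | |
| Distance to faults | 0.320 | 0.054 | −0.070 | 0.075 | 0.024 | 0.034 | 0.119 | 1 | | |
| Land use | 0.123 | −0.116 | 0.087 | 0.053 | 0.287 | 0.050 | 0.084 | 0.019 | 1 | |
| NDVI | 0.038 | 0.011 | −0.009 | 0.179 | 0.146 | −0.065 | −0.055 | 0.047 | 0.082 | 1 |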
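The multicollinearity screening based on Table 2 can be checked mechanically. The following Python snippet stores the lower triangle of the table as transcribed here (the variable names are our own) and reports the largest absolute coefficient against the 0.7 threshold used in the discussion.

```python
# Lower triangle of Table 2: row i holds r between variable i and variables 0..i-1
names = ["Slope aspect", "Slope angle", "Altitude", "Lithology",
         "Mean annual precipitation", "Distance to roads", "Distance to rivers",
         "Distance to faults", "Land use", "NDVI"]
lower = [
    [],
    [0.037],
    [0.116, 0.003],
    [0.165, 0.170, 0.010],
    [0.140, 0.100, -0.021, 0.025],
    [0.280, 0.067, 0.079, 0.048, 0.205],
    [0.368, 0.104, 0.112, -0.010, 0.004, 0.160],
    [0.320, 0.054, -0.070, 0.075, 0.024, 0.034, 0.119],
    [0.123, -0.116, 0.087, 0.053, 0.287, 0.050, 0.084, 0.019],
    [0.038, 0.011, -0.009, 0.179, 0.146, -0.065, -0.055, 0.047, 0.082],
]
THRESHOLD = 0.7  # |r| at or above this value is treated as problematic

# (|r|, variable i, variable j) for every off-diagonal pair
pairs = [(abs(r), names[i], names[j])
         for i, row in enumerate(lower) for j, r in enumerate(row)]
worst = max(pairs)
print(f"max |r| = {worst[0]:.3f} ({worst[1]} vs {worst[2]})")
print("all pairs below threshold:", all(r < THRESHOLD for r, _, _ in pairs))
```

It reports a maximum |r| of 0.368, between distance to rivers and slope aspect, well below 0.7 across all 45 pairs.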
Table 3. VIF and tolerances for explanatory variables.
| Explanatory Variables | Tolerance | VIF |
|---|---|---|
| Slope angle | 0.657 | 1.523 |
| Slope aspect | 0.962 | 1.040 |
| Altitude | 0.790 | 1.265 |
| Distance to rivers | 0.687 | 1.455 |
| Distance to roads | 0.573 | 1.746 |
| Distance to faults | 0.909 | 1.100 |
| NDVI | 0.770 | 1.298 |
| Land use | 0.910 | 1.099 |
| Lithology | 0.519 | 1.926 |
| Mean annual precipitation | 0.611 | 1.637 |
Table 4. Spatial relationship between each landslide explanatory variable and landslide by the index of entropy (IOE) model.
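| Explanatory Variable | Class | No. of Pixels in Domain | Percentage of Domain (%) | No. of Landslides | Percentage of Landslides (%) | FRij | Sij | Mj | Mjmax | Ij | Wj | Bi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Slope aspect | Flat | 736 | 0.021 | 0 | 0.000 | 0.000 | 0.000 | 2.870 | 3.170 | 0.095 | 0.084 | 0.061 |
| | North | 436,175 | 12.234 | 9 | 6.569 | 0.537 | 0.067 | | | | | |
| | Northeast | 478,233 | 13.413 | 21 | 15.328 | 1.143 | 0.143 | | | | | |
| | East | 453,979 | 12.733 | 9 | 6.569 | 0.516 | 0.065 | | | | | |
| | Southeast | 435,974 | 12.228 | 32 | 23.358 | 1.910 | 0.239 | | | | | |
| | South | 492,245 | 13.806 | 15 | 10.949 | 0.793 | 0.099 | | | | | |
| | Southwest | 471,646 | 13.229 | 25 | 18.248 | 1.379 | 0.173 | | | | | |
| | West | 413,514 | 11.598 | 13 | 9.489 | 0.818 | 0.103 | | | | | |
| | Northwest | 382,820 | 10.737 | 13 | 9.489 | 0.884 | 0.111 | | | | | |
| Slope angle (°) | 0–6.65 | 434,598 | 12.190 | 16 | 11.679 | 0.958 | 0.135 | 2.445 | 2.585 | 0.054 | 0.064 | 0.043 |
| | 6.65–11.40 | 954,012 | 26.758 | 31 | 22.628 | 0.846 | 0.119 | | | | | |
| | 11.40–16.39 | 937,524 | 26.296 | 25 | 18.248 | 0.694 | 0.098 | | | | | |
| | 16.39–22.09 | 640,546 | 17.966 | 28 | 20.438 | 1.138 | 0.161 | | | | | |
| | 22.09–29.45 | 349,550 | 9.804 | 14 | 10.219 | 1.042 | 0.147 | | | | | |
| | 29.45–60.57 | 249,092 | 6.987 | 23 | 16.788 | 2.403 | 0.339 | | | | | |
| Altitude (m) | 761–903 | 71,702 | 2.011 | 26 | 18.978 | 9.437 | 0.675 | 1.577 | 2.807 | 0.438 | 0.874 | −0.252 |
| | 903–984 | 354,938 | 9.955 | 26 | 18.978 | 1.906 | 0.136 | | | | | |
| | 984–1054 | 796,328 | 22.335 | 27 | 19.708 | 0.882 | 0.063 | | | | | |
| | 1054–1124 | 851,004 | 23.869 | 26 | 18.978 | 0.795 | 0.057 | | | | | |
| | 1124–1194 | 989,546 | 27.755 | 28 | 20.438 | 0.736 | 0.053 | | | | | |
| | 1194–1262 | 487,438 | 13.672 | 4 | 2.920 | 0.214 | 0.015 | | | | | |
| | 1262–1423 | 14,366 | 0.403 | 0 | 0.000 | 0.000 | 0.000 | | | | | |
| Lithology | Category A | 80,805 | 2.266 | 1 | 0.730 | 0.322 | 0.109 | 1.963 | 2.585 | 0.240 | 0.119 | −0.013 |
| | Category B | 650,270 | 18.239 | 14 | 10.219 | 0.560 | 0.189 | | | | | |
| | Category C | 2,029,316 | 56.918 | 115 | 83.942 | 1.475 | 0.497 | | | | | |
| | Category D | 736,194 | 20.649 | 6 | 4.380 | 0.212 | 0.072 | | | | | |
| | Category E | 65,704 | 1.843 | 1 | 0.730 | 0.396 | 0.134 | | | | | |
| | Category F | 3033 | 0.085 | 0 | 0.000 | 0.000 | 0.000 | | | | | |
| Mean annual precipitation (mm/y) | <360 | 63,468 | 1.780 | 2 | 1.460 | 0.820 | 0.081 | 2.357 | 2.807 | 0.160 | 0.232 | 0.239 |
| | 360–380 | 630,456 | 17.683 | 5 | 3.650 | 0.206 | 0.020 | | | | | |
| | 380–400 | 537,282 | 15.070 | 20 | 14.599 | 0.969 | 0.096 | | | | | |
| | 400–420 | 850,900 | 23.866 | 22 | 16.058 | 0.673 | 0.066 | | | | | |
| | 420–440 | 999,895 | 28.045 | 44 | 32.117 | 1.145 | 0.113 | | | | | |
| | 440–460 | 451,402 | 12.661 | 39 | 28.467 | 2.248 | 0.222 | | | | | |
| | >460 | 31,919 | 0.895 | 5 | 3.650 | 4.077 | 0.402 | | | | | |
| Distance to roads (m) | <200 | 385,498 | 10.812 | 77 | 56.204 | 5.198 | 0.617 | 1.609 | 2.322 | 0.307 | 0.517 | −0.533 |
| | 200–400 | 311,580 | 8.739 | 20 | 14.599 | 1.670 | 0.198 | | | | | |
| | 400–600 | 282,125 | 7.913 | 9 | 6.569 | 0.830 | 0.099 | | | | | |
| | 600–800 | 248,289 | 6.964 | 4 | 2.920 | 0.419 | 0.050 | | | | | |
| | >800 | 2,337,830 | 65.571 | 27 | 19.708 | 0.301 | 0.036 | | | | | |
| Distance to rivers (m) | <200 | 1,108,722 | 31.097 | 86 | 62.774 | 2.019 | 0.501 | 1.956 | 2.322 | 0.158 | 0.127 | −0.269 |
| | 200–400 | 881,383 | 24.721 | 26 | 18.978 | 0.768 | 0.191 | | | | | |
| | 400–600 | 642,145 | 18.011 | 12 | 8.759 | 0.486 | 0.121 | | | | | |
| | 600–800 | 389,497 | 10.925 | 7 | 5.109 | 0.468 | 0.116 | | | | | |
| | >800 | 543,575 | 15.246 | 6 | 4.380 | 0.287 | 0.071 | | | | | |
| Distance to faults (m) | <2000 | 526,624 | 14.771 | 19 | 13.869 | 0.939 | 0.190 | 2.251 | 2.322 | 0.030 | 0.030 | 0.110 |
| | 2000–4000 | 459,271 | 12.882 | 10 | 7.299 | 0.567 | 0.115 | | | | | |
| | 4000–6000 | 431,651 | 12.107 | 14 | 10.219 | 0.844 | 0.171 | | | | | |
| | 6000–8000 | 344,339 | 9.658 | 20 | 14.599 | 1.512 | 0.307 | | | | | |
| | >8000 | 1,803,437 | 50.583 | 74 | 54.015 | 1.068 | 0.217 | | | | | |
| Land use | Water | 13,266 | 0.372 | 0 | 0.000 | 0.000 | 0.000 | 1.258 | 2.322 | 0.458 | 0.974 | 0.061 |
| | Residential areas | 86,117 | 2.415 | 25 | 18.248 | 7.555 | 0.711 | | | | | |
| | Bare land | 1,780,712 | 49.945 | 71 | 51.825 | 1.038 | 0.098 | | | | | |
| | Forest/Grassland | 1,317,845 | 36.963 | 17 | 12.409 | 0.336 | 0.032 | | | | | |
| | Farmland | 367,382 | 10.304 | 24 | 17.518 | 1.700 | 0.160 | | | | | |
| NDVI | −0.39 to −0.019 | 278,430 | 7.809 | 40 | 29.197 | 3.739 | 0.577 | 1.779 | 2.322 | 0.234 | 0.303 | −0.354 |
| | −0.019 to 0.063 | 988,700 | 27.731 | 38 | 27.737 | 1.000 | 0.154 | | | | | |
| | 0.063–0.134 | 1,233,777 | 34.605 | 43 | 31.387 | 0.907 | 0.140 | | | | | |
| | 0.134–0.216 | 837,512 | 23.491 | 12 | 8.759 | 0.373 | 0.058 | | | | | |
| | 0.216–0.607 | 226,903 | 6.364 | 4 | 2.920 | 0.459 | 0.071 | | | | | |

B0 is 2.345.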

Share and Cite

MDPI and ACS Style

Zhang, T.; Han, L.; Chen, W.; Shahabi, H. Hybrid Integration Approach of Entropy with Logistic Regression and Support Vector Machine for Landslide Susceptibility Modeling. Entropy 2018, 20, 884. https://doi.org/10.3390/e20110884

AMA Style

Zhang T, Han L, Chen W, Shahabi H. Hybrid Integration Approach of Entropy with Logistic Regression and Support Vector Machine for Landslide Susceptibility Modeling. Entropy. 2018; 20(11):884. https://doi.org/10.3390/e20110884

Chicago/Turabian Style

Zhang, Tingyu, Ling Han, Wei Chen, and Himan Shahabi. 2018. "Hybrid Integration Approach of Entropy with Logistic Regression and Support Vector Machine for Landslide Susceptibility Modeling" Entropy 20, no. 11: 884. https://doi.org/10.3390/e20110884

APA Style

Zhang, T., Han, L., Chen, W., & Shahabi, H. (2018). Hybrid Integration Approach of Entropy with Logistic Regression and Support Vector Machine for Landslide Susceptibility Modeling. Entropy, 20(11), 884. https://doi.org/10.3390/e20110884
