Article

Hybrid Machine Learning Approaches for Landslide Susceptibility Modeling

1 Vietnam Academy for Water Resources, 171 Tay Son Street, Ha Noi 100000, Viet Nam
2 Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
3 Department of Geotechnical Engineering, Hydraulic Construction Institute, Vietnam Academy for Water Resources, 3/95 Chua Boc Street, Ha Noi 100000, Viet Nam
4 Department of Science & Technology, Bhaskarcharya Institute for Space Applications and Geo-Informatics (BISAG), Government of Gujarat, Gandhinagar 382007, India
5 School of Computer Engineering, KIIT-Deemed to be University, Odisha 751024, India
6 Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran
7 Department of Rangeland and Watershed Management, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran
8 Department of Geotechnical Engineering, University of Transport and Communication, Ha Noi 100000, Vietnam
9 Computer Science and Engineering Department, LNCT College, Jabalpur-482053, India
10 Department of IT, LBEF(APUTI), Kathmandu, Nepal-44600
11 Geographic Information System group, Department of Business and IT, University of South-Eastern Norway, Bø i Telemark N-3800, Norway
* Author to whom correspondence should be addressed.
Forests 2019, 10(2), 157; https://doi.org/10.3390/f10020157
Submission received: 13 December 2018 / Revised: 19 January 2019 / Accepted: 23 January 2019 / Published: 12 February 2019
(This article belongs to the Special Issue Watershed Scale Forest Restoration and Sustainable Development)

Abstract

This paper presents novel hybrid machine learning models, namely Adaptive Neuro Fuzzy Inference System optimized by Particle Swarm Optimization (PSOANFIS), Artificial Neural Networks optimized by Particle Swarm Optimization (PSOANN), and Best First Decision Trees based Rotation Forest (RFBFDT), for landslide spatial prediction. Landslide modeling of the study area of Van Chan district, Yen Bai province (Vietnam) was carried out with the help of a spatial database of the area, considering past landslides and 12 landslide conditioning factors. The proposed models were validated using different methods such as Area under the Receiver Operating Characteristics (ROC) curve (AUC), Mean Square Error (MSE), Root Mean Square Error (RMSE). Results indicate that the RFBFDT (AUC = 0.826, MSE = 0.189, and RMSE = 0.434) is the best method in comparison to other hybrid models, namely PSOANFIS (AUC = 0.76, MSE = 0.225, and RMSE = 0.474) and PSOANN (AUC = 0.72, MSE = 0.312, and RMSE = 0.558). Thus, it is reasonably concluded that the RFBFDT is a promising hybrid machine learning approach for landslide susceptibility modeling.

1. Introduction

Landslides are gravitational movements of slope-forming materials caused by natural and anthropogenic activities [1]. They are considered one of the major hazards affecting human life, property, infrastructure, and landscape [2]. A landslide susceptibility map is a fundamental tool for landslide hazard management and land use planning. Assessment of landslide susceptibility gauges the spatial probability of landslide occurrences considering a set of geo-environmental parameters [3]. As a landslide is a complex process related to geology, topography, and other geo-environmental factors associated with different conditioning and triggering factors, modeling landslide susceptibility is a difficult task. In recent years, many techniques have been developed for landslide modeling; in general, these methods can be divided into three main approaches, namely expert systems, physical strategies, and information mining techniques [4]. Of these approaches, information mining strategies, which utilize machine learning and statistical methods, are considered the best for landslide hazard assessment and prediction [5].
In the last 10 years, different information mining strategies have been adopted all over the world. Bui et al. [6] applied the Adaptive Neuro-Fuzzy Inference System (ANFIS) for torrential slide mapping and modeling in the Hoa Binh area of Vietnam. Umar et al. [7] utilized an ensemble technique of frequency ratio and logistic regression for landslide susceptibility mapping. Su et al. [8] applied Support Vector Machines (SVM) for mapping precipitation-accentuated landslide susceptibility in the Wencheng territory of Chan Province, China. Chen et al. [9] applied and compared various data mining methods, namely Kernel Logistic Regression, Naive Bayes, and RBF network models. Youssef et al. [10] compared Random Forest, Boosted Regression Tree, Classification and Regression Tree, and General Linear models for landslide susceptibility mapping. In addition, other models have been developed and applied for the assessment of landslide susceptibility, such as Artificial Neural Networks [11], Best First Decision Tree [12], and Kernel Logistic Regression [13].
More recently, many researchers have combined different single methods and techniques to develop various hybrid models for better assessment of landslide susceptibility. Abedini et al. [14] developed a hybrid model that combines Bayesian Logistic Regression with various ensemble techniques, and stated that hybrid models are promising techniques for the assessment of landslide susceptibility. Zhang et al. [15] enhanced the prediction performance of a landslide susceptibility model by developing a novel hybrid approach of Entropy with Logistic Regression and the SVM, and claimed that this hybrid model outperformed the single Entropy model. Chen et al. [16] developed a novel hybrid approach of Bagging Ensemble and Kernel Logistic Regression for modeling landslide susceptibility, and reported that the new model outperformed the benchmark SVM model. Even though the mentioned methods performed well for landslide susceptibility modeling in a given area, there is no conclusive information about which model is the best for other regions. Moreover, the applicability of newly developed techniques and approaches for improving the predictive capability of landslide susceptibility models needs to be further evaluated.
In this study, the main aim is to develop novel hybrid machine learning approaches, namely Adaptive Neuro Fuzzy Inference System optimized by Particle Swarm Optimization (PSOANFIS), Artificial Neural Networks optimized by Particle Swarm Optimization (PSOANN), and Best First Decision Trees based Rotation Forest (RFBFDT), for the evaluation and selection of the best landslide susceptibility model. More specifically, the PSOANFIS is a hybrid of ANFIS and Particle Swarm Optimization (PSO), the PSOANN is a hybrid of Artificial Neural Networks (ANN) and the PSO, and the RFBFDT is a hybrid of Rotation Forest (RF) and Best First Decision Trees (BFDT). The Van Chan district, Yen Bai province, a landslide-prone hilly area in Vietnam, was selected for the present study. The Area under the Receiver Operating Characteristics (ROC) curve (AUC), Mean Square Error (MSE), and Root Mean Square Error (RMSE) were used for model validation.

2. Study Area

The study area is Van Chan district of Yen Bai Province, located between longitudes 104°16′02″ and 104°54′43″ and latitudes 21°19′34″ and 21°48′49″ in the northeast region of Vietnam (Figure 1). The area of the district is approximately 1207 km² and it has a population of about 144,201. The topography of the area is of mountainous and midland type, with elevations ranging from 60 m to 2542 m. High mountains, namely Tay Con Linh and Kieu Lieu Ti, are located on the western side. Bac Ha, Quan Bạ, and Dong Van are the plateaus (highlands) located on the northern side, with an average elevation of 1000–1200 m. Dong Van Plateau is the highest at 1600 m. The midlands (elevation 100–150 m) are on the southwest side. The lowest elevation in the area is in the southeast.
Hills and valleys are generally aligned in the northwest to southeast direction, parallel to the orientation of geological faults. Drainage density in the area is high and most of the drainage is structurally controlled. Hill slopes are very steep in places (up to 84°). Narrow valleys and steep hill slopes are some of the main factors causing landslides, besides heavy rains and anthropogenic activity. Changes in the land use pattern for cultivation of rice on terraces and other developmental activities increased the landslide occurrences in the area. Accumulation of irrigation water on the terraces increases effective weight and reduces the strength of the slope-forming materials, thus adversely affecting the stability of slopes.
Geologically, the study area is occupied by igneous, metamorphic, and sedimentary rocks belonging to the Tu Le–Ngoi Thia complex (21.56%), Tram Tau formation (15.42%), and Ca Vinh complex (13.17%). Rock mass in this area is highly weathered. Depth of weathering varies from 10 m to 18 m. Most of the landslides are observed in the weathered Tu Le–Ngoi Thia complex (10.78%), Tram Tau formation (10.18%), and in gabbro and diabase rocks (11.38%) (Figure 2 and Table 1). Weathered rocks have high permeability and low strength, resulting in slope failure.

3. Materials and Methods

3.1. Data Used

3.1.1. Landslide Inventory

A landslide inventory showing the location and type of landslides occurring in the area is important for the development of landslide models. In this area, 167 landslides were identified from Google Earth images and aerial photographs, checked against the available historical record and limited field investigations. Based on these data, a landslide inventory map was constructed. Translational, rotational, mixed, and debris flow types of landslides occur in the area. Translational landslides are predominant in the study area; hence, only these landslides were taken into account for modeling. National Road No. 32 is most affected by landslide hazards (Figure 3). The size of landslides varies from a few cubic meters to thousands of cubic meters. We represented each landslide by the center of its scar (polygon), taken as one point with a cell size of 20 m for sampling, as we considered that most of the pixels of a landslide polygon have identical conditions for landslide occurrence in similar types of slope-forming materials [17,18].

3.1.2. Landslide Influencing Parameters

In landslide modeling, it is very important to select suitable affecting factors for landslide assessment. In our study, the selection of factors is based on the analysis of the nature of landslide occurrences in relation to the characteristics of geomorphology, geology, hydrology, meteorology, and human impacts in the study area. Thus, we have selected 12 factors, namely slope, aspect, elevation, curvature, slope length, valley depth, distance to rivers, distance to roads, distance to faults, lithology, Topographic Wetness Index (TWI), and Terrain Ruggedness Index (TRI), for landslide analysis and modeling. Each factor was classified into several classes based on the standard classification for lithology and aspect, the natural break method for slope, and the expert's knowledge method for elevation, curvature, slope length, valley depth, distance to rivers, distance to roads, distance to faults, TWI, and TRI [19,20,21,22,23]. In addition, the Frequency Ratio (FR) method, which is defined as the ratio of the percentage of landslide pixels in a class to the percentage of pixels of that class in the study area, was applied to assess the spatial relationship between the landslides and the 12 conditioning factors (Table 2).
Slope is important in landslide susceptibility studies [24]. A slope angle map of the study area was generated from a Digital Elevation Model (DEM) with 20 m spatial resolution, which was generated from the topographic map of 1:50,000 scale. A total of six classes (0–7.92, 7.92–17.82, 17.82–26.07, 26.07–34.65, 34.65–44.88, and 44.88–84.16°) were obtained on the slope map using the natural break method in a GIS application (Figure 4a). According to the FR analysis, slopes in this area between 7.92° and 34.65° had high FR values, ranging from 1.13 to 1.69, which indicates the highest susceptibility to landslide occurrences in these three classes.
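The FR definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `frequency_ratio`, the class labels, and the pixel counts are hypothetical and are not values from Table 2.

```python
def frequency_ratio(landslide_pixels, class_pixels):
    """Return FR per class: (share of landslide pixels) / (share of area pixels)."""
    total_landslides = sum(landslide_pixels.values())
    total_pixels = sum(class_pixels.values())
    ratios = {}
    for name, n_class in class_pixels.items():
        share_landslides = landslide_pixels.get(name, 0) / total_landslides
        share_area = n_class / total_pixels
        ratios[name] = share_landslides / share_area
    return ratios


# Hypothetical counts for three classes of one factor (not values from the study).
landslide_pixels = {"low": 10, "medium": 60, "high": 30}
class_pixels = {"low": 5000, "medium": 3000, "high": 2000}

fr = frequency_ratio(landslide_pixels, class_pixels)
for name, value in fr.items():
    print(f"{name}: FR = {value:.2f}")
```

An FR above 1 means that a class holds a larger share of the landslides than of the area, which is how the class-wise FR values in the following paragraphs are read.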
Aspect is a significant factor in the development of landslide susceptibility maps [25]. A map of aspect was extracted from the DEM with nine slope aspect classes: north (0–22.5°; 337.5–360°), flat (−1°), east (67.5–112.5°), northeast (22.5–67.5°), southeast (112.5–157.5°), south (157.5–202.5°), west (247.5–292.5°), southwest (202.5–247.5°), and northwest (292.5–337.5°) (Figure 4b). The FR analysis showed that slopes facing north, northeast, east, south, and southeast are generally prone to landslides as their FR values are 1.15, 1.12, 1.41, 1.27, and 1.22, respectively.
Elevation is one of the important factors in the occurrence of landslides, as height affects the loading on the slope and thus enhances the chances of landslides when the sliding plane has a dip (orientation) towards the open excavation [26]. The weathering profile also depends on the elevation of the area. An elevation map was extracted from the 20 m DEM with ten classes (0–200, 200–400, 400–600, 600–800, 800–1000, 1000–1200, 1200–1400, 1400–1600, 1600–1800, and 1800–2542 m) (Figure 4c). The FR analysis indicated that the class of 400–600 m above sea level is the most susceptible (FR = 1.66), whereas landslide susceptibility is lowest above an elevation of 1400 m. This might be due to more weathering on the middle-height slopes in comparison to higher levels.
Curvature is an important landslide-affecting factor because it controls the runoff or accumulation of water on the slope, depending on the type of curvature [27]. In this study, a curvature map was extracted from the 20 m DEM and classified as concave, convex, or flat depending on whether its value is below, above, or equal to 0.05, respectively (Figure 4d). The FR analysis showed that 55.69% of landslides occurred on concave slopes, which occupy 41.71% of the area. The occurrence of more landslides on concave surfaces can be related to the accumulation of more water on such slopes.
Slope length is the distance from the origin of the landslide’s flow along its flow path to the place of its runout distance or end. The parameters that control the runout distance of a landslide are geometry, physical property, and frictional coefficients. A slope length map was constructed from the DEM 20 m using SAGA tool with six classes (0–20, 20–50, 50–100, 100–150, 150–200, and 200–2501 m) (Figure 4e). The FR analysis based on the slope length map showed that the highest susceptibility to landslide incidence is in the 200–500 m slope length class (Table 2). This may be due to the topography and structure of the area.
Valley depth controls the weathering process and water transportation and accumulation; thus, it affects landslide occurrences. In this area, a total valley depth map was constructed from the 20 m DEM using the SAGA tool, considering six classes of depth (0–5, 5–30, 30–60, 60–100, 100–150, and 150–656 m) (Figure 4f). The FR analysis showed that the most landslide-susceptible class is 100–150 m (FR = 1.62), whereas the lowest FR value (0.47) was obtained for valley depth >150 m.
Distance to rivers is one of the most important factors for slope stability, as distance from a river affects the degree of saturation of the slope-forming materials (Dai et al., 2001; Saha et al., 2002). A distance to rivers map was constructed by buffering the rivers extracted from the topographic map (1:50,000) with five classes (0–100, 100–200, 200–300, 300–400, and >400 m) (Figure 4g). The FR analysis indicated that the probability of landslide occurrence decreases as the distance to rivers increases. Specifically, the 100–200 m distance class is the most susceptible (FR = 1.56).
Distance to roads is one of the factors that most affects landslide occurrences, as most of the landslides are observed close to roads [28]. In this study, a distance to roads map was constructed by buffering the roads extracted from the topographic map (1:50,000) and divided into five buffer classes (0–100, 100–200, 200–300, 300–400, and >400 m) (Figure 4h). The FR analysis indicated that most landslides occurred within 0–100 m of roads.
Distance to faults is one of the most important affecting factors, as slopes may fail along faults depending on the nature and orientation of the faults [29]. Faults with clay gouge and dipping towards the slope face are the most unfavorable features for slope stability. In the study area, a distance to faults map was constructed with five buffer classes (0–250, 250–500, 500–750, 750–900, and >900 m) by buffering the faults extracted from the geological map (1:50,000) (Figure 4i). The FR analysis indicated that the probability of landslides decreases with increasing distance from the faults. In this area, the fault distance class between 250 m and 500 m was the most vulnerable to landslide occurrence (FR = 1.56).
Lithology plays a very important role in landslide occurrences, as soft and weathered rocks are more vulnerable than hard unjointed rocks; thus, lithological units have different vulnerability to landslides [30]. In the study area, a lithology map was extracted from the Geological and Mineral Resources Map on a scale of 1:50,000 with seven major lithological units (A, B, C, D, E, F, and G) (Figure 4j and Table 3). The FR analysis indicated that group A has the highest FR value (1.46), while group C has the lowest value (0.26) (Table 2).
Topographic Wetness Index (TWI) is a secondary geomorphometric parameter used to describe and quantify local relief [31] as it reveals the diversity and complexity of landslide topographic surface. As the slope-forming material moves, the TWI range increases. In this study, a TWI map was generated from the DEM 20 m using the SAGA tool with different classes (0–8, 8–9, 9–10, 10–11, and 11–24) (Figure 4k). The FR analysis indicated that the class of 9–10 of TWI is the most susceptible (FR = 0.99) (Table 2).
Terrain Ruggedness Index (TRI) proves capable of differentiating landslide population into smaller groups, consistent with their variable origin and mechanism of displacement. As the slope surface moves, the TRI range decreases. However, in the case of slump and rockslide, the calculation is different. In this study, a TRI map was generated from the DEM using the SAGA tool with different classes (0–1, 1–3, 3–5, 5–7, and >7) (Figure 4l). The FR analysis indicated that the class of 3–5 of TRI is the most susceptible class (Table 2).

3.2. Methods Used

3.2.1. Adaptive Neuro Fuzzy Inference System (ANFIS)

The ANFIS was first introduced by Roger Jang [32]. It combines a neural network (ANN) with the reasoning capability of a Fuzzy Inference System (FIS) in order to enhance predictive power compared with the use of a single model [33]. In other words, the ANFIS trains the FIS membership function (MF) parameters on a training dataset using a combination of back-propagation gradient descent and least-squares methods [34]. The FIS is based on the concepts of fuzzy set theory, fuzzy if-then rules, and fuzzy reasoning [35]. Among FIS types, the Sugeno fuzzy model has been widely used due to its high interpretability, computational efficiency, and built-in optimal and adaptive techniques [36]. The flowchart of the ANFIS architecture is shown in Figure 5.
In this figure, a circle indicates a fixed node and a rectangle denotes an adaptive node. We assume that the FIS has two inputs, x and y, and one output, z. Using the Sugeno fuzzy model, four fuzzy if-then rules can be developed:
$$\begin{aligned}
R_1 &: \text{If } x \text{ is } A_1 \text{ and } y \text{ is } B_1, \text{ then } z_1 = p_1 x + q_1 y + r_1 \\
R_2 &: \text{If } x \text{ is } A_1 \text{ and } y \text{ is } B_2, \text{ then } z_2 = p_2 x + q_2 y + r_2 \\
R_3 &: \text{If } x \text{ is } A_2 \text{ and } y \text{ is } B_1, \text{ then } z_3 = p_3 x + q_3 y + r_3 \\
R_4 &: \text{If } x \text{ is } A_2 \text{ and } y \text{ is } B_2, \text{ then } z_4 = p_4 x + q_4 y + r_4
\end{aligned}$$
where $A_i$ and $B_i$ are the fuzzy sets, and $p_i$, $q_i$, and $r_i$ are the parameters obtained during the training process. The ANFIS consists of five layers as follows (Figure 5):
Layer 1 (fuzzification): In this layer, the input variables are fuzzified and each node applies a node function:
$$O_i^1 = \mu_{A_i}(x),\ i = 1, 2; \qquad O_i^1 = \mu_{B_{i-2}}(y),\ i = 3, 4,$$
where any fuzzy membership function (MF), such as triangular, generalized bell (Gbell), or Gaussian, can be adopted for $\mu_{A_i}(x)$ and $\mu_{B_{i-2}}(y)$.
Layer 2 (fuzzy AND): In this layer, each node calculates the firing strength of a rule via multiplication:
$$O_k^2 = \omega_k = \mu_{A_i}(x)\,\mu_{B_j}(y),\quad i = 1, 2;\ j = 1, 2;\ k = 2(i - 1) + j$$
Layer 3 (normalization): In this layer, the firing strength of each node is normalized as the ratio of its firing strength to the sum of the firing strengths of all nodes:
$$O_i^3 = \bar{\omega}_i = \frac{\omega_i}{\omega_1 + \omega_2 + \omega_3 + \omega_4},\quad i = 1, 2, 3, 4,$$
where $\bar{\omega}_i$ is the normalized firing strength.
Layer 4 (fuzzy inference): In this layer, each node has the following function:
$$O_i^4 = \bar{\omega}_i z_i = \bar{\omega}_i\,(p_i x + q_i y + r_i),\quad i = 1, 2, 3, 4,$$
where $\bar{\omega}_i$ is the output of layer 3 and $(p_i, q_i, r_i)$ is the consequent parameter set.
Layer 5 (defuzzification): The overall output of all the rules is obtained in this layer using the defuzzification process of the FIS, which is formulated as follows:
$$O^5 = \sum_{i=1}^{4} \bar{\omega}_i z_i = \frac{\omega_1 z_1 + \omega_2 z_2 + \omega_3 z_3 + \omega_4 z_4}{\omega_1 + \omega_2 + \omega_3 + \omega_4}$$
In addition, the details of the ANFIS model can be observed in various studies including those by Chen, Panahi, and Pourghasemi [34], Jang [32], and Aghdam et al. [37].
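To make the five layers concrete, the following Python sketch runs one forward pass of a two-input, four-rule first-order Sugeno ANFIS. It is a minimal illustration, not the implementation used in the study: the Gaussian membership functions, the function names, and all parameter values are assumptions chosen for readability, and no training step is shown.

```python
import math


def gaussian_mf(value, center, sigma):
    """Gaussian membership function, one of the MF types named in the text."""
    return math.exp(-0.5 * ((value - center) / sigma) ** 2)


def anfis_forward(x, y, mf_a, mf_b, consequents):
    """Forward pass of a two-input, four-rule first-order Sugeno ANFIS.

    mf_a, mf_b: two (center, sigma) pairs each, for fuzzy sets A1, A2 and B1, B2.
    consequents: four (p, q, r) tuples, ordered by k = 2(i - 1) + j.
    """
    # Layer 1: fuzzification of both inputs.
    mu_a = [gaussian_mf(x, c, s) for c, s in mf_a]
    mu_b = [gaussian_mf(y, c, s) for c, s in mf_b]
    # Layer 2: firing strength of each rule (product as fuzzy AND).
    w = [mu_a[i] * mu_b[j] for i in range(2) for j in range(2)]
    # Layer 3: normalized firing strengths.
    total = sum(w)
    w_bar = [wk / total for wk in w]
    # Layer 4: normalized strengths times first-order rule outputs.
    z = [p * x + q * y + r for p, q, r in consequents]
    layer4 = [wb * zk for wb, zk in zip(w_bar, z)]
    # Layer 5: overall output as the sum of the layer 4 outputs.
    return sum(layer4), w_bar


# Hypothetical premise and consequent parameters, chosen only for illustration.
mf_a = [(0.0, 1.0), (1.0, 1.0)]
mf_b = [(0.0, 1.0), (1.0, 1.0)]
consequents = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.0), (0.0, 0.0, 1.0)]

output, w_bar = anfis_forward(0.5, 0.5, mf_a, mf_b, consequents)
print(output, w_bar)
```

In a hybrid such as the PSOANFIS, the premise parameters (here the centers and widths) and the consequent parameters (p, q, r) would be the quantities tuned by the optimizer.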

3.2.2. Multilayer Perceptron Neural Networks

Artificial Neural Networks (ANNs), as a branch of Artificial Intelligence (AI), are nonlinear function approximation algorithms that are suitable for classification and prediction problems such as landslide prediction, based on the degree of membership value of each pixel over the study area [38]. That is, the higher the membership value of a pixel, the higher the probability of landslide occurrence. Two common types of ANNs are the Multilayer Perceptron (MLP) and the Radial Basis Function (RBF) network. Some researchers who have used ANNs for landslide susceptibility mapping reported that the MLP is better than the RBF in the detection of landslide locations [27,39].
The MLP consists of an input layer, one or more hidden layers, and an output layer, and its complexity increases with the number of hidden layers [27]. In landslide susceptibility assessment using the MLP, the conditioning factors form the input layer, the result of landslide modeling (landslide or non-landslide) is the output layer, and the classifying layers are the hidden layers [40].
This approach relies on two datasets, a training dataset and a testing dataset. Training on the training dataset proceeds in two steps. First, the input is propagated forward through the hidden layers to produce an output value, and the error is computed by comparing the predicted value with the target value. Second, the weights are adjusted to achieve the smallest difference between the two [41]. In the testing phase, the validity of the obtained results is checked on unseen samples using error criteria.
Consider that $x = (x_i),\ i = 1, 2, \ldots, n$ is the vector of landslide conditioning factors and $y = (y_i),\ i = 1, 2$ indicates the landslide and non-landslide classes. The MLP neural network function in landslide modeling can be expressed as follows:
$$y = f(x) + b,$$
where $b$ is the bias and $f(x)$ is an unknown function that is optimized by adjusting the network weights during the training process for a given network architecture [40].
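The forward step and error computation described above can be sketched as follows. This is a minimal illustration under stated assumptions: a single hidden layer, sigmoid activations, a hidden size of four, random weights, and made-up factor values. None of these choices is taken from the study, and the weight-adjustment step is not shown.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer and one sigmoid output, read as a landslide membership value."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


random.seed(0)
n_factors, n_hidden = 12, 4  # 12 conditioning factors; the hidden size is an assumption
hidden_weights = [[random.uniform(-1, 1) for _ in range(n_factors)] for _ in range(n_hidden)]
hidden_biases = [0.0] * n_hidden
output_weights = [random.uniform(-1, 1) for _ in range(n_hidden)]
output_bias = 0.0

x = [random.random() for _ in range(n_factors)]  # hypothetical normalized factor values
target = 1  # 1 = landslide, 0 = non-landslide
prediction = mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias)
squared_error = (target - prediction) ** 2  # the error that training would reduce
print(prediction, squared_error)
```

Training, whether by back-propagation or by an optimizer such as PSO, amounts to searching for the weights and biases that make this error small across the training dataset.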

3.2.3. Particle Swarm Optimization (PSO)

The PSO is a meta-heuristic evolutionary algorithm developed by Kennedy et al. (1995). Its design is inspired by the way groups of organisms, such as bird flocks and fish schools, move to find the nearest route to food [42]. In recent years, it has become popular for the optimization of nonlinear problems [34]. In this algorithm, each particle in the swarm represents a potential solution to the problem and searches for the best position. A fitness function is used to assess the merit of each particle. The particles in the PSO move through the feature space according to the following update equations [42]:
$$\begin{cases} v_i(t+1) = w\,v_i(t) + c_1\,rand_1\,\big(pbest - x_i(t)\big) + c_2\,rand_2\,\big(gbest - x_i(t)\big) \\ x_i(t+1) = x_i(t) + v_i(t+1), \end{cases}$$
where $x_i$ and $v_i$ are the position and velocity of the ith particle in the feature space, respectively; $w$ is the inertia weight coefficient; $c_1$ and $c_2$ are learning factors; and $rand_1$ and $rand_2$ are positive random numbers from 0 to 1. $pbest$ is the personal best position of particle i, and $gbest$ is the best position among all of the particles. In this study, the PSO method is used to optimize the ANFIS and ANN modeling parameters to construct the PSOANFIS and PSOANN prediction models for landslide susceptibility assessment.
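The two update equations can be sketched as a small, self-contained Python routine. It is an illustration only: the function name `pso_minimize`, the sphere fitness function, the search bounds, and the values of w, c1, and c2 are assumptions, not the settings or the fitness function used in the study. In the hybrid models, the fitness would presumably be a model error evaluated over the ANFIS or ANN parameters.

```python
import random


def pso_minimize(fitness, dim, n_particles=25, n_iterations=200,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimize `fitness` with a basic global-best PSO."""
    rng = random.Random(seed)
    low, high = bounds
    x = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]
    pbest_val = [fitness(xi) for xi in x]
    best = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best][:], pbest_val[best]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to personal best + pull to global best.
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                # Position update.
                x[i][d] += v[i][d]
            value = fitness(x[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = x[i][:], value
    return gbest, gbest_val


def sphere(position):
    """Hypothetical fitness: sum of squares, minimized at the origin."""
    return sum(p * p for p in position)


best_position, best_value = pso_minimize(sphere, dim=3)
print(best_position, best_value)
```

The random factors are drawn per dimension here, which is one common variant; drawing them once per particle is another.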

3.2.4. Rotation Forest

Rotation Forest (RF) is a meta ensemble algorithm first introduced by Rodriguez et al. [43] to enhance the predictive power of a weak individual classifier compared with using that classifier alone, and to increase the diversity of the base classifiers [44]. In this approach, the feature space of the training dataset is divided into subsets, and Principal Component Analysis [45] is applied to them for learning the base classifiers. Meta classifiers generally achieve higher prediction accuracy than single classifiers [46].
In this study, the RF was applied as a meta classifier to detect landslide occurrence locations. Consider $x = (x_1, x_2, \ldots, x_{12})$ as the vector of 12 landslide conditioning factors, $y = (y_1, y_2)$ as the vector of landslide and non-landslide classes, and D as the training dataset. $C_1, C_2, \ldots, C_L$ are the L classifiers to be learned, and φ is the set of landslide conditioning factors. In the first step, φ is divided into k subsets, each containing 12/k landslide conditioning factors. Let φi,j be the j-th (j = 1, 2, …, k) subset of landslide conditioning factors for classifier Ci, and let Pi,j be the part of D restricted to the landslide conditioning factors in φi,j. Following the bootstrap algorithm, a sample of 75% of the size of Pi,j is randomly selected from Pi,j.
In the next step, the bootstrap sample of Pi,j is transformed to calculate the coefficients $z_{i,j}^{(1)}, z_{i,j}^{(2)}, \ldots, z_{i,j}^{(M_j)}$, each of size T × 1. The RF is thus constituted using a base classifier and the rotation matrix $Z_i^a$, obtained by a transformation technique (rearranging the matrix $Z_i$), where $Z_i$ is given as follows [40]:
$$Z_i = \begin{bmatrix} z_{i,1}^{(1)}, \ldots, z_{i,1}^{(M_1)} & \{0\} & \cdots & \{0\} \\ \{0\} & z_{i,2}^{(1)}, \ldots, z_{i,2}^{(M_2)} & \cdots & \{0\} \\ \vdots & \vdots & \ddots & \vdots \\ \{0\} & \{0\} & \cdots & z_{i,K}^{(1)}, \ldots, z_{i,K}^{(M_K)} \end{bmatrix}.$$
Then, the columns of $Z_i$ are rearranged to match the order of the original feature set, which gives $Z_i^a$. In the next step, the training dataset is transformed by the rotation matrix and used to train classifier $D_i$. Finally, the outputs of all classifiers, which are trained in parallel, are combined [43].
In the classification phase, for a testing sample $\theta$, let $d_{ij}(\theta Z_i^a)$ be the probability determined by classifier $D_i$ for the hypothesis that $\theta$ belongs to class $y_j$. The confidence for each class is then obtained by the average combination method as follows:
$$m_j(\theta) = \frac{1}{L} \sum_{i=1}^{L} d_{ij}(\theta Z_i^a),\quad j = 1, \ldots, c.$$
Lastly, $\theta$ is assigned to the class with the largest confidence.
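The average combination step can be illustrated with a short Python sketch. Only this final step is shown; the feature subsets, the bootstrap sampling, and the rotation itself are omitted. The function name `average_combination` and the probability values are hypothetical.

```python
def average_combination(class_probabilities):
    """class_probabilities[i][j] holds d_ij: probability from classifier i for class j."""
    n_classifiers = len(class_probabilities)
    n_classes = len(class_probabilities[0])
    # m_j: mean probability for class j over all L classifiers.
    confidences = [
        sum(row[j] for row in class_probabilities) / n_classifiers
        for j in range(n_classes)
    ]
    # The sample is assigned to the class with the largest confidence.
    predicted = max(range(n_classes), key=lambda j: confidences[j])
    return confidences, predicted


# Hypothetical outputs of L = 3 base classifiers for one sample;
# column 0 = non-landslide, column 1 = landslide.
d = [[0.4, 0.6], [0.3, 0.7], [0.6, 0.4]]
confidences, predicted = average_combination(d)
print(confidences, predicted)
```

Here two of the three classifiers favor the landslide class, and the averaged confidences lead to the same decision.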

3.2.5. Best First Decision Trees

The main idea of node expansion in the Best First Decision Trees (BFDT) algorithm was introduced by Friedman et al. (2000). In this algorithm, nodes are expanded in best-first order, rather than in the depth-first order used by C4.5 and CART [47]. The best node to split is the one that leads to the maximum reduction of impurity, measured, for example, by the Gini index or information gain. The BFDT creates a binary tree in which each internal node has two outgoing edges.
The growth of the tree continues until the nodes reach maximum homogeneity. This means that a terminal node is not split further once it is pure, that is, once all of its cases have the same value of the dependent variable (landslide or non-landslide). Impurity in this algorithm can be assessed with information gain, which is based on entropy, or with the Gini index. In this study, Information Gain (IG) is used for assessing the impurity. The entropy specifies the purity of a sample set. Consider D as the training dataset, A as a conditioning factor such as slope angle, and i as a class label (landslide or non-landslide). The IG values of factors (e.g., slope angle) are obtained from the entropy, which is defined as:
$$\mathrm{Entropy}(D) = -\sum_{i} p_i \log_2 p_i,$$
where $p_i$ is the proportion of D belonging to class i. The IG leads to splitting the training dataset by a reduction in entropy using the following equation:
$$\mathrm{Information\ Gain}(D, A) = \mathrm{Entropy}(D) - \sum_{i \in \mathrm{values}(A)} \frac{|D_i|}{|D|}\,\mathrm{Entropy}(D_i),$$
where values(A) is the set of all possible values of the slope angle factor (A) and $D_i$ is the subset of D for which attribute A has value i. Tree growth in the BFDT algorithm stops when all instances at a node belong to a single target class (landslide or non-landslide) or when the best IG value is less than zero [48].
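The two formulas can be checked on a toy example with the Python sketch below. The function names and the six sample records are hypothetical. The sketch computes the gain for a categorical factor only and does not build a tree.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in entropy obtained by splitting `labels` on the factor `values`."""
    total = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Hypothetical samples: slope class per location and label (1 = landslide, 0 = non-landslide).
slope_class = ["steep", "steep", "steep", "gentle", "gentle", "gentle"]
labels = [1, 1, 0, 0, 0, 1]
print(entropy(labels), information_gain(slope_class, labels))
```

A factor that separates the two classes perfectly yields a gain equal to the entropy of the whole set, while a factor unrelated to the labels yields a gain near zero.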

3.2.6. Validation Assessment

In this study, the mean square error (MSE), root mean square error (RMSE), and area under the receiver operating characteristic (ROC) curve (AUC) were used to validate the performance of the developed models. The MSE estimates the generalization error of the model, whereas the RMSE measures the forecasting errors of the models [49]. The MSE and RMSE can be expressed as follows:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( X_{obs} - X_{est} \right)^2$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( X_{obs} - X_{est} \right)^2},$$
where Xobs denotes the observed values in the training dataset or validation dataset, Xest represents the estimated (output) values from the landslide susceptibility models, and n is the total number of samples in the training or validation datasets [50]. The result of modeling is effective when the values of RMSE and MSE are small [51].
In addition, another standard technique that has been utilized in almost all landslide susceptibility assessments is the area under the Receiver Operating Characteristic (ROC) curve (AUC) [52]. Generally, the ROC curve is plotted with the sensitivity on the y-axis and 1 − specificity on the x-axis [53]. The AUC summarizes the performance of a model, so that a higher AUC indicates better model performance [52]. It ranges between 0.5 (random model) and 1 (ideal model) [54,55]. The AUC can be formulated as follows:
$$\mathrm{AUC} = \frac{TP + TN}{R},$$
where TP and TN are the numbers of landslides and non-landslides that are correctly classified, respectively, and R is the total number of landslides and non-landslides [53].
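The three criteria can be computed with the short Python sketch below. The observed labels and estimated scores are hypothetical. Note that the AUC here uses the standard rank-based (Mann–Whitney) formulation, which is a substitution made for illustration and not the expression given in the text above.

```python
import math


def mse(observed, estimated):
    return sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed)


def rmse(observed, estimated):
    return math.sqrt(mse(observed, estimated))


def auc_rank(observed, estimated):
    """AUC as the probability that a landslide sample scores above a non-landslide one."""
    positives = [e for o, e in zip(observed, estimated) if o == 1]
    negatives = [e for o, e in zip(observed, estimated) if o == 0]
    # Each positive-negative pair counts 1 if ranked correctly and 0.5 if tied.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical observed classes (1 = landslide, 0 = non-landslide) and model outputs.
observed = [1, 1, 1, 0, 0, 0]
estimated = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]

print(mse(observed, estimated), rmse(observed, estimated), auc_rank(observed, estimated))
```

Lower MSE and RMSE and a higher AUC indicate a better model, which is the reading applied to the results reported in the abstract.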

4. Methodology Adopted for Developing Landslide Susceptibility Maps

The methodology of the present study includes four main steps: (1) generation of the training and testing datasets, (2) building of the hybrid models, (3) validation of the hybrid models, and (4) development of the landslide susceptibility maps (Figure 6). Each step is briefly described below:
Step 1: Training and testing datasets were generated using landslide data of the study area. A training dataset was generated with 70% of landslide inventory (117 locations), whereas a testing dataset was constructed with the 30% remaining landslide inventory (50 locations). In the datasets, non-landslide locations were also taken into account as landslide prediction is considered a binary classification problem. Non-landslide locations were identified based on the study of the area. Out of these, 117 non-landslide locations were used for the training dataset while 50 non-landslide locations were used for testing datasets. For modeling, landslide instances were assigned “1” whereas non-landslide instances were assigned “0”.
Step 2: Using the training dataset, the hybrid models (RFBFDT, PSOANFIS, and PSOANN) were constructed for spatial prediction of landslides in the study area. More specifically, the RFBFDT was constructed by combining the RF ensemble and the BFDT classifier. In the RFBFDT, the RF was trained with 25 iterations and the BFDT was trained with 10 folds in internal cross-validation. The PSOANFIS was constructed by combining the PSO optimization and the ANFIS classifier, while the PSOANN was constructed by combining the PSO and the ANN classifier. In the PSOANFIS, the model was trained with 1500 iterations, 0.99 inertia weight, and 25 populations. In the PSOANN, the number of hidden layers was set to nine.
Step 3: The hybrid models were validated using several criteria, namely MSE, RMSE, and AUC. In this step, the goodness-of-fit of the models was assessed using the training dataset and their predictive capability was assessed using the testing dataset.
Step 4: Mapping landslide susceptibility started with generation of Landslide Susceptibility Index (LSI) values for each pixel of the study area using the hybrid models. Thereafter, the LSIs were assigned to each pixel in the GIS environment and were reclassified using the natural break classification method [19].
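The data preparation in Step 1 and the reclassification in Step 4 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the text does not say how locations were assigned to the two datasets, so a seeded random shuffle is assumed; the location IDs and LSI values are placeholders; and the natural break method is implemented here as an exact search for the class boundaries that minimize the within-class squared deviation.

```python
import random
from bisect import bisect_right

def split_locations(locations, n_train, seed):
    """Shuffle one class of locations and split it into training and testing parts."""
    shuffled = list(locations)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def natural_breaks(values, n_classes):
    """Lower bounds of classes 2..n_classes minimizing within-class squared deviation."""
    data = sorted(values)
    n = len(data)
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def sse(i, j):
        # Sum of squared deviations from the mean for data[i:j].
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j], cut[k][j] = c, i
    breaks, j = [], n
    for k in range(n_classes, 1, -1):
        j = cut[k][j]  # index where class k starts
        breaks.append(data[j])
    return sorted(breaks)

# Step 1: 167 landslide (label 1) and 167 non-landslide (label 0) locations; IDs are placeholders.
landslides = [(f"L{i}", 1) for i in range(167)]
non_landslides = [(f"N{i}", 0) for i in range(167)]
ls_train, ls_test = split_locations(landslides, 117, seed=0)
nl_train, nl_test = split_locations(non_landslides, 117, seed=1)
train, test = ls_train + nl_train, ls_test + nl_test

# Step 4: reclassify hypothetical LSI values (three classes keep the toy data small).
lsi = [0.05, 0.10, 0.12, 0.40, 0.45, 0.50, 0.90, 0.95]
breaks = natural_breaks(lsi, 3)
classes = [bisect_right(breaks, v) for v in lsi]
print(len(train), len(test), breaks, classes)
```

For the five susceptibility classes used in the study, `n_classes` would be set to 5. The exact search is cubic in the number of values, so for a full raster it would normally be run on a sample of pixels or replaced by a GIS package's own classifier.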

5. Results and Discussion

Goodness-of-fit and prediction accuracy of the RFBFDT model are given in Figure 7. This figure has three parts: outputs and targets versus number of samples, errors versus number of samples, and frequency versus errors. In the first part, the output values predicted by the hybrid model (landslide or non-landslide) are plotted against the target values at the landslide and non-landslide locations, which are overlaid with the normalized conditioning factors.
The predicted values range between 0 and 1. The error part of the figure gives the values of MSE and RMSE, and the frequency-versus-errors part gives the error mean and standard deviation (SD). For the RFBFDT model, the values of MSE, RMSE, error mean, and error SD in the training phase are 0.172, 0.414, −1.7 × 10−0.8, and 0.415, respectively. In the validation phase, these values are 0.189, 0.434, 0.017, and 0.436, respectively. For the PSOANFIS model, Figure 8 shows the goodness-of-fit and prediction accuracy using the training and validation datasets. Using the training dataset, the values of MSE, RMSE, error mean, and error SD are 0.14, 0.374, 0.005, and 0.375, respectively. Using the validation dataset, these values are 0.225, 0.474, −0.0298, and 0.476, respectively. For the PSOANN model, the values of MSE, RMSE, error mean, and error SD using the training dataset are 0.168, 0.41, −0.0005, and 0.411, respectively. In the validation phase, the corresponding values are 0.312, 0.558, 0.0003, and 0.561 (Figure 9).
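A quick consistency check on the figures quoted above: reading the first value of each group as the MSE (as in the Conclusions), each reported RMSE should equal the square root of the corresponding MSE up to rounding. The short sketch below, with a hypothetical helper name, confirms this for the values transcribed from the text.

```python
import math

# (MSE, RMSE) pairs transcribed from the text for the training and validation phases.
reported = {
    "RFBFDT": {"training": (0.172, 0.414), "validation": (0.189, 0.434)},
    "PSOANFIS": {"training": (0.14, 0.374), "validation": (0.225, 0.474)},
    "PSOANN": {"training": (0.168, 0.41), "validation": (0.312, 0.558)},
}

def max_rmse_gap(table):
    """Largest absolute difference between sqrt(MSE) and the reported RMSE."""
    return max(abs(math.sqrt(m) - r)
               for phases in table.values()
               for m, r in phases.values())

print(max_rmse_gap(reported))
```

All six pairs agree to within 0.001, which is consistent with values reported to three decimal places.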
The hybrid models were then evaluated through ROC curve analysis. The results are given in Figure 10. On the training dataset, the RFBFDT model achieved the highest AUC value (0.891), followed by the PSOANFIS model (0.890) and the PSOANN model (0.850). On the validation dataset, the RFBFDT model also had the highest prediction accuracy, with an AUC value of 0.826, followed by the PSOANFIS model (AUC = 0.760) and the PSOANN model (AUC = 0.720). The AUC results agree with the MSE and RMSE values obtained in the validation phase. Overall, the RFBFDT ensemble model is the best model for predicting landslide locations compared to the other models (PSOANFIS and PSOANN).
Landslide susceptibility was assessed based on the landslide susceptibility index (LSI), which was generated during the model construction process. The obtained LSI values were then assigned to all pixels of the study area and classified to determine the susceptibility levels of landslides. Landslide susceptibility maps of the study area were finally constructed with five susceptibility classes: very low, low, moderate, high, and very high (Figure 11). The distribution of these susceptibility classes on the maps is shown in Figure 12. The map generated by the RFBFDT model indicates that 48% of the study area falls into the low class, 42% into the moderate class, and 11% into the high class. In the map generated by the PSOANFIS model, 25% of the study area is covered by the low class, 44% by the moderate class, and 31% by the high class. The map generated by the PSOANN model indicates that 25% of the study area falls into the low class, 63% into the moderate class, and 13% into the high class.

6. Conclusions

In this study, three novel hybrid machine learning approaches, namely PSOANFIS, PSOANN, and RFBFDT, were applied for the development of landslide susceptibility maps. A spatial database of 167 past landslides of Van Chan district, Yen Bai province, Vietnam was used to generate the datasets for modeling, considering 12 landslide conditioning factors. Validation of the models was done using the AUC, MSE, and RMSE methods. The results show that the RFBFDT (AUC = 0.826, MSE = 0.189, RMSE = 0.434) is the best model in comparison to other hybrid models, namely PSOANFIS (AUC = 0.76, MSE = 0.225, RMSE = 0.474) and PSOANN (AUC = 0.72, MSE = 0.312, RMSE = 0.558). Thus, it can be reasonably concluded that the RFBFDT model can be used for better landslide susceptibility assessment, land use planning, and hazard management in landslide-prone areas. However, as these proposed models were applied in only one of the areas of Vietnam, their applicability must be tested in other hilly areas of Vietnam as well as other parts of the world. Moreover, another limitation of this research is that we considered a fixed combination of conditioning factors for modeling; therefore, it would be better to test the effectiveness of the models with different combinations of conditioning factors to explore the possibility of further improvement of the models.

Author Contributions

Conceptualization, B.T.P.; Methodology, B.T.P.; Software, S.J.; Validation, H.S. and A.S.; Formal Analysis, I.P.; Investigation, V.V.N., B.T.V. and D.B.N.; Resources, B.T.P.; Data Curation, V.V.N., B.T.V. and D.B.N.; Writing – Original Draft Preparation, S.J., H.S., A.S., R.K. and J.M.C.; Writing – Review & Editing, I.P.; Visualization, I.P., R.K. and J.M.C.; Supervision, B.T.P.; Project Administration, D.T.B.; Funding Acquisition, D.T.B.

Funding

This research received no external funding.

Acknowledgments

This study was supported by a research project named “Study on assessment of causes of landslides and proposal of measures for prevention and mitigation landslide hazards in some provinces in the northern parts of Vietnam” carried out at the Vietnam Academy for Water Resources, Hanoi, Vietnam.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cruden, D.M. A simple definition of a landslide. Bull. Eng. Geol. Environ. 1991, 43, 27–29.
  2. Fell, R.; Corominas, J.; Bonnard, C.; Cascini, L.; Leroi, E.; Savage, W.Z. Guidelines for landslide susceptibility, hazard and risk zoning for land use planning. Eng. Geol. 2008, 102, 85–98.
  3. Ercanoglu, M.; Gokceoglu, C. Use of fuzzy relations to produce landslide susceptibility map of a landslide prone area (West Black Sea Region, Turkey). Eng. Geol. 2004, 75, 229–250.
  4. Song, Y.; Gong, J.; Gao, S.; Wang, D.; Cui, T.; Li, Y.; Wei, B. Susceptibility assessment of earthquake-induced landslides using Bayesian network: A case study in Beichuan, China. Comput. Geosci. 2012, 42, 189–199.
  5. Pradhan, B. A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput. Geosci. 2013, 51, 350–365.
  6. Bui, D.T.; Pradhan, B.; Lofman, O.; Revhaug, I.; Dick, O.B. Landslide susceptibility mapping at Hoa Binh province (Vietnam) using an adaptive neuro-fuzzy inference system and GIS. Comput. Geosci. 2012, 45, 199–211.
  7. Umar, Z.; Pradhan, B.; Ahmad, A.; Jebur, M.N.; Tehrany, M.S. Earthquake induced landslide susceptibility mapping using an integrated ensemble frequency ratio and logistic regression models in West Sumatera Province, Indonesia. Catena 2014, 118, 124–135.
  8. Su, C.; Wang, L.; Wang, X.; Huang, Z.; Zhang, X. Mapping of rainfall-induced landslide susceptibility in Wencheng, China, using support vector machine. Nat. Hazards 2015, 76, 1759–1779.
  9. Chen, W.; Yan, X.; Zhao, Z.; Hong, H.; Bui, D.T.; Pradhan, B. Spatial prediction of landslide susceptibility using data mining-based kernel logistic regression, naive Bayes and RBFNetwork models for the Long County area (China). Bull. Eng. Geol. Environ. 2018, 1–20.
  10. Youssef, A.M.; Pourghasemi, H.R.; Pourtaghi, Z.S.; Al-Katheeri, M.M. Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 2016, 13, 839–856.
  11. Dou, J.; Yamagishi, H.; Pourghasemi, H.R.; Yunus, A.P.; Song, X.; Xu, Y.; Zhu, Z. An integrated artificial neural network model for the landslide susceptibility assessment of Osado Island, Japan. Nat. Hazards 2015, 78, 1749–1776.
  12. Chen, W.; Zhang, S.; Li, R.; Shahabi, H. Performance evaluation of the gis-based data mining techniques of best-first decision tree, random forest, and naïve bayes tree for landslide susceptibility modeling. Sci. Total Environ. 2018, 644, 1006–1018.
  13. Pham, B.T.; Prakash, I. Machine Learning Methods of Kernel Logistic Regression and Classification and Regression Trees for Landslide Susceptibility Assessment at Part of Himalayan Area, India. Indian J. Sci. Technol. 2018, 11.
  14. Abedini, M.; Ghasemian, B.; Shirzadi, A.; Shahabi, H.; Chapi, K.; Pham, B.T.; Bin Ahmad, B.; Tien Bui, D. A novel hybrid approach of bayesian logistic regression and its ensembles for landslide susceptibility assessment. Geocarto Int. 2018, 1–31.
  15. Zhang, T.; Han, L.; Chen, W.; Shahabi, H. Hybrid Integration Approach of Entropy with Logistic Regression and Support Vector Machine for Landslide Susceptibility Modeling. Entropy 2018, 20, 884.
  16. Chen, W.; Shahabi, H.; Zhang, S.; Khosravi, K.; Shirzadi, A.; Chapi, K.; Pham, B.; Zhang, T.; Zhang, L.; Chai, H. Landslide susceptibility modeling based on gis and novel bagging-based kernel logistic regression. Appl. Sci. 2018, 8, 2540.
  17. Hoang, N.-D.; Tien Bui, D. A novel relevance vector machine classifier with cuckoo search optimization for spatial prediction of landslides. J. Comput. Civ. Eng. 2016, 30, 04016001.
  18. Chen, W.; Pourghasemi, H.R.; Naghibi, S.A. A comparative study of landslide susceptibility maps produced using support vector machine with different kernel functions and entropy data mining models in China. Bull. Eng. Geol. Environ. 2018, 77, 647–664.
  19. Pourghasemi, H.R.; Pradhan, B.; Gokceoglu, C. Application of fuzzy logic and analytical hierarchy process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran. Nat. Hazards 2012, 63, 965–996.
  20. Jebur, M.N.; Pradhan, B.; Tehrany, M.S. Manifestation of LiDAR-derived parameters in the spatial prediction of landslides using novel ensemble evidential belief functions and support vector machine models in GIS. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 674–690.
  21. Meinhardt, M.; Fink, M.; Tünschel, H. Landslide susceptibility analysis in central Vietnam based on an incomplete landslide inventory: Comparison of a new method to calculate weighting factors by means of bivariate statistics. Geomorphology 2015, 234, 80–97.
  22. Gomez, H.; Kavzoglu, T. Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela. Eng. Geol. 2005, 78, 11–27.
  23. Bai, S.; Lü, G.; Wang, J.; Zhou, P.; Ding, L. GIS-based rare events logistic regression for landslide-susceptibility mapping of Lianyungang, China. Environ. Earth Sci. 2011, 62, 139–149.
  24. Lee, S.; Min, K. Statistical analysis of landslide susceptibility at Yongin, Korea. Environ. Geol. 2001, 40, 1095–1113.
  25. Galli, M.; Ardizzone, F.; Cardinali, M.; Guzzetti, F.; Reichenbach, P. Comparing landslide inventory maps. Geomorphology 2008, 94, 268–289.
  26. Yilmaz, I. Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: Conditional probability, logistic regression, artificial neural networks, and support vector machine. Environ. Earth Sci. 2010, 61, 821–836.
  27. Ermini, L.; Catani, F.; Casagli, N. Artificial neural networks applied to landslide susceptibility assessment. Geomorphology 2005, 66, 327–343.
  28. Ayalew, L.; Yamagishi, H. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology 2005, 65, 15–31.
  29. Demir, G.; Aytekin, M.; Akgün, A.; Ikizler, S.B.; Tatar, O. A comparison of landslide susceptibility mapping of the eastern part of the North Anatolian Fault Zone (Turkey) by likelihood-frequency ratio and analytic hierarchy process methods. Nat. Hazards 2013, 65, 1481–1506.
  30. Nefeslioglu, H.A.; Duman, T.Y.; Durmaz, S. Landslide susceptibility mapping for a part of tectonic Kelkit Valley (Eastern Black Sea region of Turkey). Geomorphology 2008, 94, 401–418.
  31. Różycka, M.; Migoń, P.; Michniewicz, A. Topographic Wetness Index and Terrain Ruggedness Index in geomorphic characterisation of landslide terrains, on examples from the Sudetes, SW Poland. Z. Für Geomorphol. Suppl. Issues 2017, 61, 61–80.
  32. Jang, J.-S. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 1993, 23, 665–685.
  33. Shirzadi, A.; Shahabi, H.; Chapi, K.; Bui, D.T.; Pham, B.T.; Shahedi, K.; Ahmad, B.B. A comparative study between popular statistical and machine learning methods for simulating volume of landslides. Catena 2017, 157, 213–226.
  34. Chen, W.; Panahi, M.; Pourghasemi, H.R. Performance evaluation of GIS-based new ensemble data mining techniques of adaptive neuro-fuzzy inference system (ANFIS) with genetic algorithm (GA), differential evolution (DE), and particle swarm optimization (PSO) for landslide spatial modelling. Catena 2017, 157, 310–324.
  35. Sarikaya, N.; Guney, K.; Yildiz, C. Adaptive neuro-fuzzy inference system for the computation of the characteristic impedance and the effective permittivity of the micro-coplanar strip line. Prog. Electromagn. Res. 2008, 6, 225–237.
  36. Turkmen, I.; Guney, K. Genetic tracker with adaptive neuro-fuzzy inference system for multiple target tracking. Expert Syst. Appl. 2008, 35, 1657–1667.
  37. Aghdam, I.N.; Varzandeh, M.H.M.; Pradhan, B. Landslide susceptibility mapping using an ensemble statistical index (Wi) and adaptive neuro-fuzzy inference system (ANFIS) model at Alborz Mountains (Iran). Environ. Earth Sci. 2016, 75, 553.
  38. Haykin, S. Neural Networks and Learning Machines, vol. 3; Pearson Prentice Hall: Upper Saddle River, NJ, USA, 2009; ISBN 10: 0-13-147139-2.
  39. Zare, M.; Pourghasemi, H.R.; Vafakhah, M.; Pradhan, B. Landslide susceptibility mapping at Vaz Watershed (Iran) using an artificial neural network model: A comparison between multilayer perceptron (MLP) and radial basic function (RBF) algorithms. Arab. J. Geosci. 2013, 6, 2873–2888.
  40. Pham, B.T.; Bui, D.T.; Prakash, I.; Dholakia, M. Hybrid integration of Multilayer Perceptron Neural Networks and machine learning ensembles for landslide susceptibility assessment at Himalayan area (India) using GIS. Catena 2017, 149, 52–63.
  41. Bui, D.T.; Tuan, T.A.; Klempe, H.; Pradhan, B.; Revhaug, I. Spatial prediction models for shallow landslide hazards: A comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 2016, 13, 361–378.
  42. Eberhart, R.; Kennedy, J. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, 4–6 October 1995; pp. 39–43.
  43. Rodriguez, J.J.; Kuncheva, L.I.; Alonso, C.J. Rotation forest: A new classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1619–1630.
  44. Rodriguez, J.J. Rotation forest and random oracles: Two classifier ensemble methods. In Proceedings of the Twentieth IEEE International Symposium on Computer-Based Medical Systems, Maribor, Slovenia, 20–22 June 2007; p. 3.
  45. Wold, S.; Esbensen, K.; Geladi, P. Principal component analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37–52.
  46. Ozcift, A.; Gulten, A. Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms. Comput. Methods Programs Biomed. 2011, 104, 443–451.
  47. Dufour, D. Finding Cost-Efficient Decision Trees; University of Waterloo: Waterloo, ON, Canada, 2014.
  48. Kumar, N.; Reddy, G.O.; Chatterji, S. Evaluation of best first decision tree on categorical soil survey data for land capability classification. Int. J. Comput. Appl. 2013, 72, 5–8.
  49. Gorum, T.; Gonencgil, B.; Gokceoglu, C.; Nefeslioglu, H. Implementation of reconstructed geomorphologic units in landslide susceptibility mapping: The Melen Gorge (NW Turkey). Nat. Hazards 2008, 46, 323–351.
  50. Chen, W.; Shirzadi, A.; Shahabi, H.; Ahmad, B.B.; Zhang, S.; Hong, H.; Zhang, N. A novel hybrid artificial intelligence approach based on the rotation forest ensemble and naïve Bayes tree classifiers for a landslide susceptibility assessment in Langao County, China. Geomat. Nat. Hazards Risk 2017, 8, 1955–1977.
  51. Zhou, C.; Yin, K. Landslide displacement prediction of WA-SVM coupling model based on chaotic sequence. Electr. J. Geol. Eng. 2014, 19, 2973–2987.
  52. Pham, B.T.; Bui, D.; Prakash, I.; Dholakia, M. Evaluation of predictive ability of support vector machines and naive Bayes trees methods for spatial prediction of landslides in Uttarakhand state (India) using GIS. J. Geomat. 2016, 10, 71–79.
  53. Shirzadi, A.; Bui, D.T.; Pham, B.T.; Solaimani, K.; Chapi, K.; Kavian, A.; Shahabi, H.; Revhaug, I. Shallow landslide susceptibility assessment using a novel hybrid intelligence approach. Environ. Earth Sci. 2017, 76, 60.
  54. Shahabi, H.; Hashim, M. Landslide susceptibility mapping using GIS-based statistical models and Remote sensing data in tropical environment. Sci. Rep. 2015, 5, 9899.
  55. Chen, W.; Peng, J.; Hong, H.; Shahabi, H.; Pradhan, B.; Liu, J.; Zhu, A.-X.; Pei, X.; Duan, Z. Landslide susceptibility modelling using GIS-based machine learning techniques for Chongren County, Jiangxi Province, China. Sci. Total Environ. 2018, 626, 1121–1135.
Figure 1. Location of the Van Chan district, Vietnam.
Figure 2. Geological map of the study area.
Figure 3. Photos of landslides in the Van Chan district (Photographs by Thai Minh Hai, Vo Nguyen Thien, and Nguyen Van Phu).
Figure 4. Thematic maps of the study area: (A) Slope, (B) distance to faults, (C) curvature, (D) slope aspect map, (E) slope length, (F) distance to rivers, (G) elevation, (H) distance to roads, (I) lithology, (J) valley depth, (K) TWI, and (L) TRI.
Figure 5. The architecture of ANFIS.
Figure 6. Methodology chart.
Figure 7. Analysis of errors of the RFBFDT model using (A) the training dataset and (B) the validating dataset.
Figure 8. Analysis of errors of the PSOANFIS model using (A) the training dataset and (B) the validating dataset.
Figure 9. Analysis of errors of the PSOANN model using (A) the training dataset and (B) the validating dataset.
Figure 10. ROC curves and AUC values of: (A) RFBFDT with the training dataset, (B) RFBFDT with the validating dataset, (C) PSOANFIS with the training dataset, (D) PSOANFIS with the validating dataset, (E) PSOANN with the training dataset, and (F) PSOANN with the validating dataset.
Figure 11. Landslide susceptibility maps of different models: (A) RFBFDT, (B) PSOANFIS, and (C) PSOANN.
Figure 12. Distribution of classes on the susceptibility maps.
Table 1. Geological formations and complexes and the main characteristics of the research zone.
| No | Geological Formations and Complexes | Notation | Area (%) | Landslide Pixels (%) | Thickness (m) |
|---|---|---|---|---|---|
| 1 | Ban Cai Formation | D3bc | 0.76 | 1.18 | 810 |
| 2 | Ban Nguon Formation | D1bn | 3.18 | 2.4 | - |
| 3 | Ban Pap Formation | D1-2bp | 1.61 | 3.0 | 560 |
| 4 | Bac Son Formation | C-Pbs | 4.62 | 1.2 | 360–770 |
| 5 | Ba Vi Complex | U/T1bv | 0.04 | 0 | - |
| 6 | Ben Khe Formation | Є-Obk | 1.23 | 0 | 300–500 |
| 7 | Ca Vinh Complex | G/PP-MPcv | 13.17 | 4.19 | - |
| 8 | Cam Duong Formation | Є1 | 4.72 | 4.79 | 500–700 |
| 9 | Nghia Lo Formation | T1-2nl | 0.22 | 6.59 | 500–550 |
| 10 | Phu Sa Phin Complex | sG,Sy/Kpp | 0.42 | 7.18 | - |
| 11 | Quaternary | - | 4.18 | 7.78 | 2–18 |
| 12 | Song Mua Formation | D1sm | 4.01 | 8.98 | 700–800 |
| 13 | Da Dinh Formation | NP-Є1đđ | 0.98 | 0 | 200–400 |
| 14 | Cha Pa Formation | NPcp | 3.07 | 5.39 | 500–700 |
| 15 | Suoi Bang Formation | T3n-rsb | 8.40 | 9.58 | 990 |
| 16 | Tu Le–Ngoi Thia Complex | tR/Ktl, R/Knt | 21.56 | 10.78 | - |
| 17 | Tram Tau Formation | J-Ktt | 15.42 | 10.18 | 200–800 |
| 18 | Unknown in age dykes and veins | - | 0.22 | 11.38 | - |
| 19 | Van Yen Formation | N12vy | 0.04 | 0 | 100 |
| 20 | Vien Nam Formation | T1vn | 0.45 | 0 | 800–1500 |
| 21 | Xom Giau Complex | G/NPxg | 0.25 | 0 | - |
| 22 | Sinh Quyen Formation | PP-MPsq | 9.89 | 8.38 | 1600–1800 |
| 23 | Yen Chau Formation | K2yc | 1.58 | 0 | 300 |
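Table 2. Analysis of frequency of landslides on the thematic maps.
| No. | Parameter | Class | Attribute | Number of Pixels in Class | No. of Landslides | % Class Pixels | % Landslide Pixels | FR |
|---|---|---|---|---|---|---|---|---|
| 1 | Slope (°) | 1 | 0–7.92 | 515,596 | 0 | 17.18 | 0 | 0.00 |
| | | 2 | 7.92–17.82 | 541,470 | 51 | 18.04 | 30.54 | 1.69 |
| | | 3 | 17.82–26.07 | 711,557 | 57 | 23.71 | 34.13 | 1.44 |
| | | 4 | 26.07–34.65 | 668,546 | 42 | 22.27 | 25.15 | 1.13 |
| | | 5 | 34.65–44.88 | 431,726 | 14 | 14.38 | 8.38 | 0.58 |
| | | 6 | 44.88–84.16 | 132,683 | 3 | 4.42 | 1.8 | 0.41 |
| 2 | Aspect | 1 | Flat | 143,317 | 0 | 4.77 | 0 | 0.00 |
| | | 2 | North | 327,283 | 21 | 10.9 | 12.57 | 1.15 |
| | | 3 | Northeast | 418,241 | 26 | 13.93 | 15.57 | 1.12 |
| | | 4 | East | 395,523 | 31 | 13.18 | 18.56 | 1.41 |
| | | 5 | Southeast | 325,218 | 22 | 10.83 | 13.17 | 1.22 |
| | | 6 | South | 339,844 | 24 | 11.32 | 14.37 | 1.27 |
| | | 7 | Southwest | 388,176 | 18 | 12.93 | 10.78 | 0.83 |
| | | 8 | West | 349,264 | 13 | 11.64 | 7.78 | 0.67 |
| | | 9 | Northwest | 314,712 | 12 | 10.48 | 7.19 | 0.69 |
| 3 | Elevation (m) | 1 | 0–200 | 311,586 | 11 | 10.38 | 6.59 | 0.63 |
| | | 2 | 200–400 | 822,680 | 53 | 27.41 | 31.74 | 1.16 |
| | | 3 | 400–600 | 583,190 | 54 | 19.43 | 32.34 | 1.66 |
| | | 4 | 600–800 | 474,387 | 26 | 15.8 | 15.57 | 0.99 |
| | | 5 | 800–1000 | 328,800 | 16 | 10.95 | 9.58 | 0.87 |
| | | 6 | 1000–1200 | 218,799 | 5 | 7.29 | 2.99 | 0.41 |
| | | 7 | 1200–1400 | 122,496 | 2 | 4.08 | 1.2 | 0.29 |
| | | 8 | 1400–1600 | 65,695 | 0 | 2.19 | 0 | 0.00 |
| | | 9 | 1600–1800 | 35,632 | 0 | 1.19 | 0 | 0.00 |
| | | 10 | 1800–2542 | 38,313 | 0 | 1.28 | 0 | 0.00 |
| 4 | Curvature | 1 | Concave (<−0.05) | 1,251,973 | 93 | 41.71 | 55.69 | 1.34 |
| | | 2 | Flat (−0.05–0.05) | 477,452 | 0 | 15.91 | 0 | 0.00 |
| | | 3 | Convex (>0.05) | 1,272,153 | 74 | 42.38 | 44.31 | 1.05 |
| 5 | Lithology | 1 | Group A | 1,156,217 | 94 | 38.52 | 56.29 | 1.46 |
| | | 2 | Group B | 253,577 | 17 | 8.45 | 10.18 | 1.20 |
| | | 3 | Group C | 208,547 | 3 | 6.95 | 1.8 | 0.26 |
| | | 4 | Group D | 335,011 | 18 | 11.16 | 10.78 | 0.97 |
| | | 5 | Group E | 419,594 | 9 | 13.98 | 5.39 | 0.39 |
| | | 6 | Group F | 124,353 | 4 | 4.14 | 2.4 | 0.58 |
| | | 7 | Group G | 504,270 | 22 | 16.8 | 13.17 | 0.78 |
| 6 | Slope length (m) | 1 | 0–20 | 917,077 | 36 | 30.55 | 21.56 | 0.71 |
| | | 2 | 20–50 | 440,296 | 20 | 14.67 | 11.98 | 0.82 |
| | | 3 | 50–100 | 586,102 | 33 | 19.53 | 19.76 | 1.01 |
| | | 4 | 100–150 | 343,241 | 25 | 11.44 | 14.97 | 1.31 |
| | | 5 | 150–200 | 227,146 | 21 | 7.57 | 12.57 | 1.66 |
| | | 6 | 200–2501 | 487,716 | 32 | 16.25 | 19.16 | 1.18 |
| 7 | Valley depth (m) | 1 | 0–5 | 1,379,429 | 80 | 45.96 | 47.9 | 1.04 |
| | | 2 | 5–30 | 538,948 | 34 | 17.96 | 20.36 | 1.13 |
| | | 3 | 30–60 | 320,995 | 16 | 10.69 | 9.58 | 0.90 |
| | | 4 | 60–100 | 272,900 | 10 | 9.09 | 5.99 | 0.66 |
| | | 5 | 100–150 | 221,974 | 20 | 7.4 | 11.98 | 1.62 |
| | | 6 | 150–656 | 267,332 | 7 | 8.91 | 4.19 | 0.47 |
| 8 | Distance (Roads) (m) | 1 | 0–100 | 528,102 | 80 | 17.59 | 47.9 | 2.72 |
| | | 2 | 100–200 | 402,641 | 19 | 13.41 | 11.38 | 0.85 |
| | | 3 | 200–300 | 300,834 | 15 | 10.02 | 8.98 | 0.90 |
| | | 4 | 300–400 | 235,154 | 10 | 7.83 | 5.99 | 0.76 |
| | | 5 | >400 | 1,534,838 | 43 | 51.13 | 25.75 | 0.50 |
| 9 | Distance (Rivers) (m) | 1 | 0–100 | 692,491 | 32 | 23.07 | 19.16 | 0.83 |
| | | 2 | 100–200 | 599,333 | 52 | 19.97 | 31.14 | 1.56 |
| | | 3 | 200–300 | 469,911 | 29 | 15.66 | 17.37 | 1.11 |
| | | 4 | 300–400 | 342,122 | 19 | 11.4 | 11.38 | 1.00 |
| | | 5 | >400 | 897,712 | 35 | 29.91 | 20.96 | 0.70 |
| 10 | Distance (Faults) (m) | 1 | 0–250 | 442,100 | 30 | 14.73 | 17.96 | 1.22 |
| | | 2 | 250–500 | 393,956 | 28 | 13.13 | 16.77 | 1.28 |
| | | 3 | 500–750 | 342,641 | 21 | 11.42 | 12.57 | 1.10 |
| | | 4 | 750–900 | 179,677 | 9 | 5.99 | 5.39 | 0.90 |
| | | 5 | >900 | 1,643,195 | 79 | 54.74 | 47.31 | 0.86 |
| 11 | TWI | 1 | 0–8 | 800,751 | 22 | 26.7 | 13.17 | 0.49 |
| | | 2 | 8–9 | 86,528 | 2 | 2.89 | 1.2 | 0.42 |
| | | 3 | 9–10 | 240,496 | 17 | 8.02 | 10.18 | 1.27 |
| | | 4 | 10–11 | 360,506 | 23 | 12.02 | 13.77 | 1.15 |
| | | 5 | 11–24 | 1,510,529 | 103 | 50.37 | 61.68 | 1.22 |
| 12 | TRI | 1 | 0–1 | 366,542 | 0 | 12.21 | 0 | 0.00 |
| | | 2 | 1–3 | 274,886 | 12 | 9.16 | 7.19 | 0.78 |
| | | 3 | 3–5 | 460,466 | 46 | 15.34 | 27.54 | 1.80 |
| | | 4 | 5–7 | 596,576 | 49 | 19.88 | 29.34 | 1.48 |
| | | 5 | >7 | 1,303,108 | 60 | 43.41 | 35.93 | 0.83 |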
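The FR column in Table 2 appears to be the ratio of the percentage of landslides in a class to the percentage of pixels in that class. A minimal sketch under that assumption, using the slope rows transcribed from the table, reproduces the tabulated values; the function name is hypothetical.

```python
def frequency_ratio(class_pixels, class_landslides):
    """FR per class: share of landslides in the class divided by share of pixels in the class."""
    total_pixels = sum(class_pixels)
    total_landslides = sum(class_landslides)
    ratios = []
    for pixels, landslides in zip(class_pixels, class_landslides):
        pct_class = pixels / total_pixels
        pct_landslide = landslides / total_landslides
        ratios.append(pct_landslide / pct_class)
    return ratios

# Slope classes 1 to 6 from Table 2 (pixel counts and landslide counts).
slope_pixels = [515596, 541470, 711557, 668546, 431726, 132683]
slope_landslides = [0, 51, 57, 42, 14, 3]
fr = [round(r, 2) for r in frequency_ratio(slope_pixels, slope_landslides)]
print(fr)
```

An FR above 1 means that landslides are over-represented in the class relative to its area, while an FR below 1 means that they are under-represented.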
Table 3. Lithology groups and their characteristics.
| No. | Group | Name | Characteristics of Rock Types |
|---|---|---|---|
| 1 | A | Acid-neutral igneous magmatic rocks | Dacite, felsite, rhyolite, and andesite rocks |
| 2 | B | Terrigenous sedimentary rocks with rich aluminosilicate components | Rhyolites, gritstone, siltstone, carbonates, claystone, alternated dacites, sandstone, and andesite sediments |
| 3 | C | Terrigenous sedimentary and transformative rocks with rich quartz segments | Quartz–mica sandstone, gritstone, sandstone, claystone, siltstone, alternated rhyolites, dacites, carbonates, quartzitic sandstone, andesite sediments, cherty shale |
| 4 | D | Carbonate rocks | Cherty limestone, clayish limestone, and dolomitized limestone |
| 5 | E | Acid-neutral intrusive magmatic rocks | Plagioclase–granite, rhyolite, felsite, dacite, andesite rocks, granophyre, granodiorite, granosyenite, diorite, and quartz-diorite |
| 6 | F | Quaternary deposits | Pluvial and alluvial sedimentary: pebbles, cobble, stone, sand, silt |
| 7 | G | Metamorphic rocks with rich aluminosilicate components | Quartz sericite–schist, quartz mica–schist, quartzite, sericite–quartzite |

Share and Cite

MDPI and ACS Style

Nguyen, V.V.; Pham, B.T.; Vu, B.T.; Prakash, I.; Jha, S.; Shahabi, H.; Shirzadi, A.; Ba, D.N.; Kumar, R.; Chatterjee, J.M.; et al. Hybrid Machine Learning Approaches for Landslide Susceptibility Modeling. Forests 2019, 10, 157. https://doi.org/10.3390/f10020157
