Article

A Unique Conditions Model for Landslide Susceptibility Mapping

by Florimond De Smedt * and Prabin Kayastha
Department of Water and Climate, Vrije Universiteit Brussel, 1050 Brussels, Belgium
* Author to whom correspondence should be addressed.
Geosciences 2024, 14(8), 197; https://doi.org/10.3390/geosciences14080197
Submission received: 30 May 2024 / Revised: 18 July 2024 / Accepted: 22 July 2024 / Published: 24 July 2024
(This article belongs to the Special Issue Landslide Monitoring and Mapping II)

Abstract

Several methods and approaches have been proposed to assess landslide susceptibility. The likelihood of landslides occurring can be determined by applying statistical models to historical landslides, taking into account controlling factors. Popular methods for predicting the probability of landslides are weights-of-evidence and logistic regression. We discuss the assumptions and interpretations of these methods, the relationships between them, and their strengths and weaknesses in the case of categorical factors. Of particular interest is the conditional independence of the controlling factors and its effect on model bias. To avoid the model bias caused by a lack of conditional independence of the factors, we present a unique conditions model that is always unbiased. To illustrate the theoretical developments, a practical application is given using observed landslides and geo-environmental factors from a previous study. The unique conditions model appears superior to the other models.

1. Introduction

Landslide susceptibility models are used to predict the spatial occurrence of slope failure given a range of geo-environmental conditions, allowing identification of landslide-prone areas, and supporting spatial planning to reduce landslide risk [1,2,3]. Various methods and approaches have been proposed to assess landslide susceptibility, such as heuristic or index-based zoning techniques, physically-based models, and statistically based classification methods [4,5,6].
The likelihood of occurrence can be determined by fitting a statistical model to historical landslides, taking into account explanatory factors affecting landslides, such as geology, topography, hydrometry, land use, etc. The main advantages of statistical models are high efficiency and a better understanding of the relationships between the spatial factors used to identify areas prone to landslides [7,8]. Lee [5] assessed the status of landslide susceptibility mapping based on 776 papers published over a 20-year period (1999–2018) and found that the commonly used statistical methods were logistic regression, the frequency ratio, and weights-of-evidence. A review of statistically based modeling of landslide susceptibility, including 565 peer-reviewed articles from 1983 to 2016, was presented by Reichenbach et al. [6], who noted that the most frequently applied statistical methods for modeling landslide susceptibility were logistic regression, neural network analysis, and weights-of-evidence.
Weights-of-evidence is a very popular technique for landslide susceptibility mapping because it is easy to use and can easily be incorporated in geographic information systems [7,8,9]; recent examples are [10,11,12,13,14,15,16]. The purpose of weights-of-evidence is to weigh and combine the controlling factors to predict the probability of landslide occurrence. However, weights-of-evidence relies on the assumption that the controlling factors are conditionally independent, which is often untrue in practice. Violation of conditional independence between factors has received much attention in the geosciences, in particular in mineral prospectivity modeling. When there is significant conditional dependency between factors, the probabilities derived from weights-of-evidence are biased and generally too large compared to the observations [9,17]. Several approaches have been proposed to account for model bias or to relax the conditional independence assumption, such as modified weighting [18,19], additive mixed terms [20,21], semi-Naïve Bayes approaches [22,23], or machine learning algorithms such as decision trees, random forest, and artificial neural networks [24,25,26,27]. However, to date, no generally accepted improvement has been found.
Logistic regression is one of the most common methods for modeling landslide susceptibility because it is easy to implement and very efficient for analyzing relationships between a binary response variable and numerical or categorical explanatory variables [28,29,30]. Budimir et al. [31] presented a review of landslide probability mapping using logistic regression, based on 75 peer-reviewed papers, and concluded that there is no consistent methodology for applying logistic regression analysis for landslide susceptibility. In particular, the method by which explanatory factors or factor classes are selected is often not well explained. Furthermore, the majority of the published papers apply a combination of frequency ratio and logistic regression, where factor classes are replaced by their landslide frequency ratio and logistic regression is only applied to the factors; recent examples are [32,33,34,35,36,37].
Most studies using logistic regression to predict landslide susceptibility provide little or no information on the conditional independence of factors and model bias. In mineral prospectivity modeling, on the other hand, it is well known that logistic regression models always produce unbiased estimates, regardless of whether the controlling factors are conditionally independent with respect to the target variable, as opposed to weights-of-evidence [19,38]. Moreover, it is well known that weights-of-evidence and logistic regression produce similar results if the predictor factors are categorical and conditionally independent [18,19,20,21,38,39].
A disadvantage of logistic regression is that estimated regression coefficients can have large variances unless the controlling factors are conditionally independent [38]. However, when the factors are categorical, interaction terms in logistic regression models can compensate for violations of conditional independence [21,39]. Therefore, combinations of factors or factor classes can be added to the model as additional terms to compensate for the lack of conditional independence of the factors. Adding interaction terms results in a hierarchy of models, in which each model is a special case of the next one and is therefore more restrictive [39]. However, the practical application of this method has been questioned because the number of additional terms can increase rapidly, so that estimating the logistic regression coefficients becomes increasingly difficult, if not impossible, given the accuracy of the numerical solution procedure and the limited amount of training data [20,38].
In this study, we focus on statistical methods for landslide susceptibility mapping, which predict the conditional probability of landslides with categorical controlling factors, in particular, weights-of-evidence and logistic regression. We investigate how modeling techniques, conditional independence of the factors, and model bias are related. A unique conditions model is proposed that reproduces observed landslide probabilities for any combination of categorical controlling factors, without any bias. The feasibility, strengths, and weaknesses of the modeling approaches are illustrated and tested through application to a practical case study.

2. Materials and Methods

2.1. Preliminaries

We denote the observation of a landslide in the study area with a binary indicator $x_0$, such that $x_0 = 1$ indicates the presence and $x_0 = 0$ the absence of a landslide. Similarly, landslide controlling factors are denoted with a set of binary indicators $\mathbf{x} = \{x_{ij}\}$, where $i = 1, \dots, n$ refers to $n$ factor types and $j = 1, \dots, n_i$ to the $n_i$ subtypes (classes) of factor $i$, so that $x_{ij} = 1$ indicates the presence and $x_{ij} = 0$ the absence of factor class $x_{ij}$. We assume that all factors completely overlap the study area and that the classes of each factor do not overlap, so
$$\sum_{j=1}^{n_i} x_{ij} = 1 \tag{1}$$
We also define the unconditional landslide probability $p_0 = p(x_0 = 1)$, generally referred to as the prior probability; the conditional landslide probability for a single factor class $p_{ij} = p(x_0 = 1 \mid x_{ij} = 1)$; and the conditional landslide probability for all factors combined $p(\mathbf{x}) = p(x_0 = 1 \mid \mathbf{x})$, which is commonly referred to as the posterior probability. When factor class $x_{ij}$ promotes landslides, $p_{ij}$ exceeds $p_0$, and vice versa. The same applies to the combined set of factors; when $p(\mathbf{x})$ is larger than $p_0$, the environmental conditions are more favorable for landslides to occur.
Estimates of the prior probability and the conditional probability for a given factor class can be obtained directly from landslide observations as follows:
$$p_0 = \frac{1}{A}\int_A x_0 \, dA \tag{2}$$
$$p_{ij} = \frac{1}{A_{ij}}\int_{A_{ij}} x_0 \, dA \tag{3}$$
where $A$ is the total area of the study domain and $A_{ij}$ is the area occupied by factor class $x_{ij} = 1$.
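On a raster of equal-area cells, the integrals in Equations (2) and (3) reduce to fractions of cells. The Python sketch below assumes that the landslide inventory is a 0/1 NumPy array `x0` and that each factor is an integer array of class codes on the same grid; the function names and this encoding are illustrative assumptions, not part of the study.

```python
import numpy as np

def prior_probability(x0):
    """Equation (2) on equal-area cells: fraction of cells with a landslide."""
    return float(np.mean(x0))

def class_probabilities(x0, factor):
    """Equation (3) for every class of one factor.

    x0: 0/1 landslide indicator per cell; factor: integer class code per cell
    (an assumed encoding). Returns {class_code: p_ij}.
    """
    probs = {}
    for code in np.unique(factor):
        in_class = factor == code              # cells where x_ij = 1
        probs[int(code)] = float(np.mean(x0[in_class]))
    return probs

# Toy grid of 8 cells (made-up values): two geology classes, two landslide cells.
x0 = np.array([1, 1, 0, 0, 0, 0, 0, 0])
geology = np.array([1, 1, 1, 1, 2, 2, 2, 2])
print(prior_probability(x0))             # 0.25
print(class_probabilities(x0, geology))  # {1: 0.5, 2: 0.0}
```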
The posterior probability $p(\mathbf{x})$ is not easy to derive from the data, and finding a suitable model to predict posterior probabilities is the main goal of a landslide susceptibility study. Various statistically based methods and approaches have been applied in practice, but little attention is paid to whether estimated probabilities are reliable. Model bias refers to systematic errors, which can result from inaccurate data or from bias in the algorithms used to validate the model. In geosciences, it is common to verify that the mean of the posterior probability is equal to the observed prior probability [9,17], so
$$\frac{1}{A}\int_A p(\mathbf{x})\,dA = p_0 \tag{4}$$
In addition, one can also verify whether the conditional landslide probability for a single factor class agrees with the observations, i.e.,
$$\frac{1}{A_{ij}}\int_{A_{ij}} p(\mathbf{x})\,dA = p_{ij} \tag{5}$$
Note that if Equation (5) holds, then Equation (4) is also satisfied, because
$$\int_A p(\mathbf{x})\,dA = \sum_{j=1}^{n_i}\int_{A_{ij}} p(\mathbf{x})\,dA = \sum_{j=1}^{n_i} p_{ij}\,A_{ij} = \sum_{j=1}^{n_i}\int_{A_{ij}} x_0\,dA = \int_A x_0\,dA = p_0\,A \tag{6}$$
which can be generalized as: any area partitioned into unbiased sub-areas is unbiased.
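Equations (4) to (6) translate directly into a check that can be run on any model output. This is a minimal sketch, assuming a 0/1 landslide array `x0`, an integer class-code array per factor, and a predicted posterior array on the same equal-area grid; the function name `bias_report` is hypothetical. It returns the difference between mean predicted and observed frequencies for the whole domain and for each class of one factor, so a model is unbiased in the sense used here when every difference is zero.

```python
import numpy as np

def bias_report(p_post, x0, factor):
    """Mean predicted minus observed frequency, per Equations (4) and (5).

    p_post: predicted posterior per cell; x0: 0/1 landslide indicator per cell;
    factor: integer class code per cell (assumed encoding).
    Returns (domain_bias, {class_code: class_bias}); all zeros means unbiased.
    """
    domain_bias = float(np.mean(p_post) - np.mean(x0))
    class_bias = {}
    for code in np.unique(factor):
        m = factor == code
        class_bias[int(code)] = float(np.mean(p_post[m]) - np.mean(x0[m]))
    return domain_bias, class_bias

x0 = np.array([1, 1, 0, 0, 0, 0, 0, 0])
geology = np.array([1, 1, 1, 1, 2, 2, 2, 2])
# Predicting each class's observed frequency is unbiased for the classes and,
# as Equation (6) shows, for the whole domain.
print(bias_report(np.array([0.5] * 4 + [0.0] * 4), x0, geology))
# A constant prediction of 0.4 is biased: too high overall and in class 2,
# too low in class 1.
print(bias_report(np.full(8, 0.4), x0, geology))
```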
However, in practice, models for landslide susceptibility mapping are often biased, resulting in incorrect predictions; this bias is generally overlooked or ignored.

2.2. Weights-of-Evidence

Weights-of-evidence is a very popular and widely used method for predicting the probability of landslides. In weights-of-evidence, the posterior probability $p(\mathbf{x})$ is derived from Bayes’ theorem as
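$$p(x_0 = 1 \mid \mathbf{x}) = p(\mathbf{x}) = \frac{p_0\,p(\mathbf{x} \mid x_0 = 1)}{p(\mathbf{x})} \tag{7}$$
Similarly, the posterior probability for the absence of landslides $p(x_0 = 0 \mid \mathbf{x})$ is derived as
$$p(x_0 = 0 \mid \mathbf{x}) = 1 - p(\mathbf{x}) = \frac{(1 - p_0)\,p(\mathbf{x} \mid x_0 = 0)}{p(\mathbf{x})} \tag{8}$$
Combined, this leads to an expression for the odds of the presence versus the absence of a landslide:
$$\frac{p(\mathbf{x})}{1 - p(\mathbf{x})} = \frac{p_0}{1 - p_0}\,\frac{p(\mathbf{x} \mid x_0 = 1)}{p(\mathbf{x} \mid x_0 = 0)} \tag{9}$$
Using the logit function, this can be rewritten as
$$\operatorname{logit} p(\mathbf{x}) = \operatorname{logit} p_0 + \log\frac{p(\mathbf{x} \mid x_0 = 1)}{p(\mathbf{x} \mid x_0 = 0)} \tag{10}$$
Furthermore, weights-of-evidence assumes conditional independence of the controlling factors, so that the joint probabilities on the right-hand side of Equation (10) can be derived from the product of individual probabilities, so
$$\operatorname{logit} p(\mathbf{x}) = \operatorname{logit} p_0 + \sum_{i=1}^{n} w_{ij} \tag{11}$$
where $w_{ij}$ are factor class weights given by
$$w_{ij} = \log\frac{p(x_{ij} \mid x_0 = 1)}{p(x_{ij} \mid x_0 = 0)} \tag{12}$$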
Equation (11) is a statistical equation. To use it as a model for the prediction of landslide probabilities in a domain, it must be reformulated in algebraic form as
$$\operatorname{logit} p(\mathbf{x}) = \operatorname{logit} p_0 + \sum_{i=1}^{n}\sum_{j=1}^{n_i} w_{ij}\,x_{ij} \tag{13}$$
clearly showing that the weights only apply if the corresponding factor class is present.
In practice, estimates of the probabilities in Equation (12) can be obtained from observed landslides, so the weights are derived as
$$w_{ij} = \log\left(\frac{p_{ij}}{p_0} \bigg/ \frac{1 - p_{ij}}{1 - p_0}\right) = \operatorname{logit} p_{ij} - \operatorname{logit} p_0 \tag{14}$$
showing that there is a one-to-one relationship between the weight $w_{ij}$ and the observed landslide probability $p_{ij}$ of a factor class.
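Equations (13) and (14) can be sketched in a few lines of Python. The sketch assumes a 0/1 landslide array and integer class-code arrays on an equal-area grid; the function names are hypothetical. It follows the common practice described in the text of setting the weight to zero when no landslides are observed in a class, and, as an extra assumption of this sketch, treats $p_{ij} = 1$ the same way to avoid an infinite weight.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def woe_weights(x0, factor):
    """Equation (14): w_ij = logit(p_ij) - logit(p0) for every class of one factor.

    Classes with p_ij = 0 get weight 0, the common practice described in the text
    (which biases the model). As an added assumption, p_ij = 1 is treated the same way.
    """
    p0 = np.mean(x0)
    weights = {}
    for code in np.unique(factor):
        p_ij = np.mean(x0[factor == code])
        weights[int(code)] = float(logit(p_ij) - logit(p0)) if 0 < p_ij < 1 else 0.0
    return weights

def woe_posterior(x0, factors):
    """Equation (13): logit p(x) = logit p0 + sum of the weights of the classes present."""
    z = np.full(x0.shape, logit(np.mean(x0)), dtype=float)
    for factor in factors:
        w = woe_weights(x0, factor)
        z += np.array([w[int(c)] for c in factor.ravel()]).reshape(factor.shape)
    return 1.0 / (1.0 + np.exp(-z))

x0 = np.array([1, 0, 0, 0, 1, 1, 0, 0])
geology = np.array([1, 1, 1, 1, 2, 2, 2, 2])
print(woe_weights(x0, geology))
print(woe_posterior(x0, [geology]))  # single factor: equals the class frequencies
```

With a single factor the sum in Equation (13) has one term, so the posterior in each class collapses to $p_{ij}$; with several dependent factors the same code adds weights that double-count shared information, which is the source of the bias discussed next.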
Because landslides are rare, no landslides may be observed in a factor class with a small area, so that $p_{ij} = 0$. This poses a problem for the application of Equation (14), because the logarithm of zero is undefined (the weight tends to minus infinity). In such a case, it is common practice to set the weight equal to zero, although this violates Equation (14) and introduces a model bias, because $w_{ij} = 0$ implies that $p_{ij} = p_0$, which contradicts the observations.
When all factors are conditionally independent, Equations (4) and (5) hold, showing that the posterior probabilities are unbiased; the proof is given in Section 2.3. However, in practice, the conditional independence of controlling factors is generally not guaranteed, so the posterior probabilities obtained by weights-of-evidence are biased. Usually, violation of conditional independence results in posterior probabilities that are too large; conversely, if the model results are found to be biased, this may be due to a lack of conditional independence of the factors.

2.3. Logistic Regression

Multiple logistic regression is probably the most commonly used technique to predict posterior landslide probabilities. Equation (13) suggests deriving the weights by logistic regression. However, there is a complication: the factor classes are linearly dependent, as shown by Equation (1), which is not allowed in multiple regression because the coefficients would then not be uniquely determined. To get around this, one class in each factor must be removed: usually the first class, although any class will do. Therefore, the logistic regression model is formulated as follows:
$$\operatorname{logit} p(\mathbf{x}) = \beta_0 + \sum_{i=1}^{n}\sum_{j=2}^{n_i} \beta_{ij}\,x_{ij} \tag{15}$$
where $\beta_0$ and $\beta_{ij}$ are model parameters to be estimated by maximum likelihood; the likelihood is a measure of fit between the predicted probabilities and the observed data. The log-likelihood is given by [40] as
$$\log L = \int_A \Bigl[x_0 \log p(\mathbf{x}) + (1 - x_0)\log\bigl(1 - p(\mathbf{x})\bigr)\Bigr]\,dA \tag{16}$$
The maximum likelihood is obtained by setting the derivatives of the log-likelihood with respect to each parameter equal to zero, so
$$\frac{\partial \log L}{\partial \beta_0} = \int_A \bigl(x_0 - p(\mathbf{x})\bigr)\,dA = 0 \tag{17}$$
$$\frac{\partial \log L}{\partial \beta_{ij}} = \int_A \bigl(x_0 - p(\mathbf{x})\bigr)\,x_{ij}\,dA = \int_{A_{ij}} \bigl(x_0 - p(\mathbf{x})\bigr)\,dA = 0 \tag{18}$$
which can be solved to determine $\beta_0$ and $\beta_{ij}$. In practice, this requires specialized software because the equations are non-linear in the parameters. Note that these equations are equivalent to Equations (4) and (5), which express the model bias; for the omitted first class of each factor, Equation (5) follows by subtracting Equation (18) for the remaining classes from Equation (17). Therefore, the maximum likelihood solution also ensures that the posterior probabilities predicted by the model are unbiased, which is an important advantage of using logistic regression.
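Because Equations (17) and (18) are the score equations of the likelihood, any converged maximum likelihood fit satisfies Equations (4) and (5), whether or not the factors are conditionally independent. The sketch below illustrates this with a plain Newton-Raphson solver in NumPy, swapped in here for R's `glm` only to keep the example self-contained. It assumes every retained class contains cells both with and without landslides, so that a finite estimate exists; the function names and the synthetic, deliberately dependent factors are made up for illustration.

```python
import numpy as np

def design_matrix(factors):
    """Columns of Equation (15): intercept plus x_ij for j >= 2 (first class dropped)."""
    cols = [np.ones(factors[0].size)]
    for factor in factors:
        for code in np.unique(factor)[1:]:
            cols.append((factor.ravel() == code).astype(float))
    return np.column_stack(cols)

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum likelihood by Newton-Raphson; at convergence X.T @ (y - p) = 0,
    which is Equations (17)-(18) on a grid of equal-area cells."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - p)                           # gradient of log L
        info = X.T @ (X * (p * (1.0 - p))[:, None])     # negative Hessian
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta, 1.0 / (1.0 + np.exp(-X @ beta))

# Synthetic, deliberately dependent factors (made-up data, not the Nepal case).
rng = np.random.default_rng(0)
n = 3000
f1 = rng.integers(0, 3, size=n)
f2 = (f1 + rng.integers(0, 2, size=n)) % 3
y = (rng.random(n) < 0.05 + 0.1 * f1 + 0.05 * f2).astype(float)
beta, p = fit_logistic(design_matrix([f1, f2]), y)
for f in (f1, f2):
    for c in range(3):
        m = f == c
        print(c, round(p[m].mean() - y[m].mean(), 10))  # ~0 for every class
```

Newton-Raphson is used because, for the logistic model, it coincides with iteratively reweighted least squares and converges in a handful of iterations when the data are not separable.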
In the case of conditional independence of the factors, logistic regression and weights-of-evidence are equivalent [21,37,39]. To prove the correspondence between weights-of-evidence and logistic regression, eliminate the first class of each factor in Equation (13) by substituting $x_{i1} = 1 - \sum_{j=2}^{n_i} x_{ij}$, so
$$\operatorname{logit} p(\mathbf{x}) = \operatorname{logit} p_0 + \sum_{i=1}^{n} w_{i1} + \sum_{i=1}^{n}\sum_{j=2}^{n_i} \left(w_{ij} - w_{i1}\right) x_{ij} \tag{19}$$
Comparison with Equation (15) shows the correspondence between the weights and the logistic regression coefficients as
$$\beta_0 = \operatorname{logit} p_0 + \sum_{i=1}^{n} w_{i1} \tag{20}$$
$$\beta_{ij} = w_{ij} - w_{i1} \tag{21}$$
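Equations (20) and (21) can be applied mechanically to convert weights into the coefficients that logistic regression would return, but only when the factors are conditionally independent. A minimal sketch, with a hypothetical function name, using the single-factor geology values of Section 3.2 ($\operatorname{logit} p_0 = -3.949$, which is negative because $p_0 = 0.019$):

```python
def woe_to_logistic(logit_p0, weights):
    """Equations (20)-(21), valid only under conditional independence.

    weights: one list per factor, ordered so that w_i1 (the dropped class) is first.
    Returns (beta_0, [[beta_i2, ..., beta_in_i] for each factor]).
    """
    beta0 = logit_p0 + sum(w[0] for w in weights)
    betas = [[w_j - w[0] for w_j in w[1:]] for w in weights]
    return beta0, betas

# Geology classes 1 and 2 only (one factor).
beta0, betas = woe_to_logistic(-3.949, [[-1.244, -1.780]])
print(round(beta0, 3), [round(b, 3) for b in betas[0]])  # -5.193 [-0.536]
```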
Similar expressions have been presented in the literature, for example [21,37,39].
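A final note about logistic regression is that it cannot handle factor classes without observed landslides: for such a class, Equation (18) requires the predicted probability to vanish throughout $A_{ij}$, which is only approached as $\beta_{ij} \to -\infty$, so no finite maximum likelihood estimate exists. Therefore, factor classes without observed landslides should be excluded from the model.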

2.4. Unique Conditions

One way to avoid conditional dependency is to overlay factors to create combined factor classes, which can improve conditional independence [20]. Ultimately, all factors can be overlaid and all factor classes combined to identify unique conditions, which can be indicated as
$$y_k = \prod_{i=1}^{n} x_{ij} \tag{22}$$
where the subscript $j$ indicates any factor class of factor $i$ in the range 1 to $n_i$, and $y_k$ is a binary indicator such that $y_k = 1$ indicates the presence and $y_k = 0$ the absence of a unique combination of factor classes. There are $\prod_{i=1}^{n} n_i$ possibilities for $y_k$, but most of these will not occur because many factor classes do not overlap. Furthermore, all occurring combinations are categorical and non-overlapping, and fully cover the study area, so
$$\sum_k y_k = 1 \tag{23}$$
Therefore, the unique conditions form a conditionally independent set $\mathbf{y} = \{y_k\}$ of controlling factors, so that a model for predicting the conditional landslide probability can be obtained as
$$\operatorname{logit} p(\mathbf{y}) = \operatorname{logit} p_0 + \sum_k w_k\,y_k \tag{24}$$
where $p(\mathbf{y})$ is the posterior landslide probability given the set of unique conditions $\mathbf{y}$, and $w_k$ are weights that can be obtained from landslide observations, similarly as in Equation (14), so that
$$w_k = \operatorname{logit} p_k - \operatorname{logit} p_0 \tag{25}$$
where $p_k$ is the frequency of landslides observed in the area $A_k$ occupied by unique condition $y_k = 1$, given by
$$p_k = \frac{1}{A_k}\int_{A_k} x_0\,dA \tag{26}$$
When Equation (25) is inserted into Equation (24) and Equation (23) is used, it follows that
$$\operatorname{logit} p(\mathbf{y}) = \sum_k \operatorname{logit}(p_k)\,y_k \tag{27}$$
This shows that the predicted posterior probability $p(\mathbf{y})$ is constant in the area occupied by $y_k = 1$ and equal to the probability $p_k$ observed in that area. Since the study area is completely covered by the set $\mathbf{y}$ and each factor class is covered by a subset of $\mathbf{y}$, Equations (4) and (5) apply, which shows that there is no model bias. Furthermore, the model is unbiased in any area that can be composed of unique condition areas. It is therefore reasonable to assume that no other model based on the same factors can reproduce the observations more closely.
When all factors are conditionally independent, the weights-of-evidence model and the unique conditions model are equivalent, because both methods solve Equation (10) exactly and predict the same posterior probabilities: the latter by combining the observed probabilities of all possible combinations of the factor classes, and the former by combining the observed probabilities of the factor classes, which should lead to the same result if the factors are conditionally independent. Since the unique conditions model is unbiased, weights-of-evidence must also be unbiased because the results are the same if the factors are conditionally independent.
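On a raster, Equations (22) to (27) amount to grouping cells by their combination of class codes and assigning each group its observed landslide frequency. The sketch below uses NumPy's `np.unique` with `axis=0` and `return_inverse=True`, so that only combinations that actually occur are enumerated; the function name and the toy grid are hypothetical, and all cells are assumed to have equal area.

```python
import numpy as np

def unique_conditions(x0, factors):
    """Unique conditions model, Equations (22)-(27), on equal-area cells.

    Returns (p_k per occurring combination, posterior per cell); the posterior
    of every cell is the p_k of its own combination (Equation (27)).
    """
    combos = np.column_stack([f.ravel() for f in factors])
    # k: index of the unique condition of each cell; flattened defensively in
    # case a NumPy version returns the inverse with extra dimensions.
    _, k = np.unique(combos, axis=0, return_inverse=True)
    k = k.ravel()
    cells = np.bincount(k)                                # A_k, in cells
    slides = np.bincount(k, weights=x0.ravel())           # landslide cells in A_k
    p_k = slides / cells                                  # Equation (26)
    return p_k, p_k[k].reshape(x0.shape)

x0 = np.array([1, 0, 1, 1, 0, 0])
geology = np.array([1, 1, 1, 2, 2, 2])
slope = np.array([0, 0, 1, 1, 1, 1])
p_k, posterior = unique_conditions(x0, [geology, slope])
print(p_k)        # one probability per occurring combination
print(posterior)  # each cell receives the p_k of its combination
```

Because each cell inherits the frequency of its own group, the mean posterior over any union of groups (a factor class, or the whole domain) equals the observed frequency there, which is the unbiasedness argument of this section.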

3. Results

3.1. Test Case

For demonstration and discussion, we consider a case study derived from Kayastha et al. [41] and De Smedt et al. [23]. In these studies, conditional probabilities were calculated to predict landslides in a river basin in Nepal. The basin covers an area of 124.26 km² and 295 landslides were observed, ranging in size from about 400 m² to 0.1 km². The total area of landslides is 2.35 km², or 1.9% of the total basin. The prior probability of landslides is therefore $p_0 = 0.019$. The geo-environmental conditions that influence landslides consist of eight factors with three to nine classes, as listed in Table 2. All factors are categorical and given in the form of raster maps with a resolution of 20 m, as shown in Figure 1. Details can be found in the original studies.
Weights-of-evidence results are obtained using the R Information package [42], and for logistic regression, we used the glm function from the R stats package [43], which fits generalized linear models by maximum likelihood. The unique conditions method can be performed using standard GIS techniques or implemented numerically, as done in this study (Figure 2).

3.2. Conditional Independence of the Factors

We first present a simple illustrative application by considering only a single controlling factor, namely geology, which is the most important predictor of landslides, as found in the previous studies [23,41]. Because the nine geology classes do not overlap, they are conditionally independent, so the weights-of-evidence model and the unique conditions model become identical. The results for weights-of-evidence and logistic regression are presented in Table 1. The second column lists the observed landslide probability $p_j$ for each geology class (subscript $i$ is dropped because there is only one factor), which are also the probabilities $p_k$ predicted by the unique conditions model. The third column gives the weights $w_j$ derived from the observed probabilities $p_j$ as given by Equation (14) or (25). Thus, predictions of the posterior landslide probability with weights-of-evidence given geology as the only controlling factor correspond exactly to the observed probabilities, because the geology classes are conditionally independent.
The last column gives the coefficients $\beta_j$ of the logistic regression model obtained by maximum likelihood, which relate to the $w_j$ values as given by Equation (21). For example, $\beta_2 = -0.536$ is equal to $w_2 - w_1 = -1.780 + 1.244 = -0.536$. The last row of the table lists the prior probability $p_0$, $\operatorname{logit}(p_0)$, and the intercept of the logistic regression $\beta_0$, the latter corresponding to Equation (20), since $\beta_0 = -5.193$ is equal to $\operatorname{logit}(p_0) + w_1 = -3.949 - 1.244 = -5.193$. Hence, all model results are equivalent and unbiased due to the conditional independence of the controlling factor.

3.3. Conditional Dependence of the Factors

In the following illustrative application, all controlling factors are considered. The results for weights-of-evidence and logistic regression are presented in Table 2. The unique conditions model is also applied, but these results are too extensive to present in tabular form. The column marked $p_{ij}$ lists the observed landslide probability for all factor classes. The column marked $w_{ij}$ gives the weights of evidence, which relate to the observed probabilities as given by Equation (14). However, predictions of the posterior landslide probability with weights-of-evidence do not match the observed probabilities, because the factors are not conditionally independent. For example, the mean of the posterior probability is 0.027, which is larger than the observed prior probability $p_0 = 0.019$. Likewise, the mean of the posterior probabilities in each factor class area is not equal to the observed probability in that area. The weights-of-evidence model is therefore biased because the controlling factors are not conditionally independent.
The $\beta_{ij}$ coefficients of the logistic regression model are again obtained by maximum likelihood. However, unlike the previous case, the resulting $\beta_{ij}$ values are not related to the $w_{ij}$ values as expressed by Equation (21). For example, for the second geology class, we obtain $\beta_{62} = 0.116$, which is clearly different from $w_{62} - w_{61} = -0.53$. Moreover, the logit of the prior probability, $\operatorname{logit}(p_0)$, and the estimated intercept of the logistic regression $\beta_0$, listed in the last row of Table 2, do not correspond to Equation (20), since $\beta_0 = -7.438$, while $\operatorname{logit}(p_0) + \sum_{i=1}^{n} w_{i1} = -3.949 - 4.793 = -8.742$, because the factors are not conditionally independent. Nevertheless, predictions of the posterior probability with logistic regression for each factor class satisfy Equation (5) and agree with the observed probabilities. Thus, the logistic model is unbiased with respect to the factor classes and the overall domain, while the weights-of-evidence model is not.
The results obtained with the unique conditions model are also unbiased because the unique condition areas do not overlap, and are therefore conditionally independent. This model therefore precisely predicts the observed landslide probabilities in the total domain and in all factor classes.
So, in this case, weights-of-evidence is biased, while logistic regression and the unique conditions model are unbiased. Furthermore, logistic regression is unbiased for the factor classes and the total domain, while the unique conditions model is unbiased for all possible combinations of factor classes. Thus, the models are not equivalent: logistic regression is better than weights-of-evidence, and the unique conditions model performs best.
Maps of the estimated posterior probabilities obtained with the different models are presented in Figure 3. Note that the distribution of the probabilities is very skewed. Most values are lower than the prior probability $p_0 = 0.019$ and correspond to areas where landslide susceptibility is very low. These areas are marked in blue in the maps and cover a large part of the study area. Posterior probabilities that are greater than the prior probability are marked in green and yellow. These are areas prone to landslides, which mainly occur in the eastern and southern parts of the study area. Note that the map obtained with weights-of-evidence shows more of these areas than the other maps, which is the result of the model bias leading to an overprediction of landslide susceptibility. Also note that the unique conditions model indicates some landslide-prone areas in the northeastern part of the domain that are absent or less pronounced on the other maps. Orange and red colors indicate areas with very high landslide susceptibility. Such areas are present in the western part of the map obtained with the unique conditions model and, to a lesser extent, in the map obtained with weights-of-evidence, although in the latter they are likely an overprediction caused by model bias.
There appears to be a close similarity between the logistic regression and unique conditions maps, but the logistic map is more blurred, while the unique conditions map is more crisp. It appears that the map obtained by logistic regression is a smoothed version of the map obtained by the unique conditions model.
The performance of the models for classification of landslide susceptibility is evaluated by the receiver operating characteristic (ROC) curve method and the resulting area under the curve (AUC) [44]. The results are shown in Figure 4. The AUC value is 0.760 for weights-of-evidence, 0.772 for logistic regression, and 0.928 for the unique conditions model. Thus, weights-of-evidence and logistic regression have almost equal discriminatory power for landslide susceptibility classification; the bias of weights-of-evidence has little effect on its discriminatory power. Nevertheless, the resulting landslide susceptibility classification will have little physical meaning due to the model bias. The unique conditions model clearly outperforms the other models, because its discriminatory power is much higher and its AUC value is close to one (i.e., perfect classification).
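The AUC can be computed without drawing the ROC curve, because it equals the probability that a randomly chosen landslide cell receives a higher score than a randomly chosen non-landslide cell (the Mann-Whitney statistic, with ties counted as one half). The sketch below assumes each grid cell is scored by its posterior probability and labeled by $x_0$; it is an illustration, not the procedure of [44] or of the study itself.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    # average 1-based rank for each group of tied scores
    _, first, counts = np.unique(sorted_scores, return_index=True, return_counts=True)
    for start, c in zip(first, counts):
        ranks[order[start:start + c]] = start + (c + 1) / 2.0
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The rank formulation handles the many tied scores produced by the unique conditions model, where all cells of one unique condition share the same posterior.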

4. Discussion

The theoretical developments and illustrative examples show that conditional independence of the controlling factors and model bias are related, as also reported in the literature [17,18,19,20,21,22,38,39]. In this study, it is clearly shown that in the case where controlling factors are conditionally independent, the weights-of-evidence, logistic regression, and unique conditions model are equivalent, meaning they will yield the same posterior probabilities. Therefore, in practice, one can choose any of these methods based on simplicity of the technique or the skill and experience of the user. The equivalence between weights-of-evidence and logistic regression in the case of conditional independence of the factors has been demonstrated in other studies [18,19,20,21,38,39], but the equivalence with the unique conditions method is a new contribution from this study.
Conditional independence of the controlling factors is, in practice, the exception rather than the rule. When there is no conditional independence, weights-of-evidence produces biased posterior probabilities, which is usually ignored or disregarded in practice, especially in landslide susceptibility studies, where weights-of-evidence has proven to be a very popular technique [4,5,6]. On the other hand, logistic regression provides unbiased posterior probabilities for individual factor classes and the overall study area, but not for higher levels when factors are combined. Methods have been proposed in the literature to improve weights-of-evidence and logistic regression by including so-called mixed terms, that is, combinations of factors, based on trial and error or search algorithms that improve the likelihood of the predictions [4,21,33,35]. Such methods may be justified, but it seems likely that their results will never match the results of the unique conditions model.
The above discussion is further illustrated by considering the ROC curves and AUC values obtained with the different models, as shown in Figure 4. When only landslide susceptibility classification is considered, weights-of-evidence and logistic regression perform almost equally. Because weights-of-evidence is easier to perform, it may be preferable in practice if only classification is pursued. However, neither of these methods achieves the discriminatory power of the unique conditions model. Moreover, Figure 3 shows that the posterior probability map obtained with the unique conditions model is more detailed and covers the entire range of probabilities from zero to one, while for logistic regression, the predicted probabilities range only from zero to a maximum of 0.36, and for weights-of-evidence from zero to 0.81, the latter likely an overestimation caused by the model bias.
It is generally accepted that direct estimation of conditional probabilities for all combinations of controlling factors is infeasible due to the excessive computational requirements. When the controlling factors consist of $n$ binary patterns, $2^n$ unique combinations are possible, making it very difficult, if not impossible, to directly estimate the conditional probabilities of all combinations [17,20]. For instance, in the present case, there are a total of 47 factor classes, which would imply more than $10^{14}$ possible combinations. In practice, however, this is not the case, because classes of the same factor do not overlap. In such a case, the number of possible unique combinations reduces to the product of the number of classes in each factor. In the present case, this would amount to 629,856 possible combinations, which is still a large number. However, in addition, not all classes of different factors overlap, so the actual number of unique combinations may be much smaller. Hence, the trick is to consider only the combinations that actually occur and ignore the rest. This can be achieved, as shown in Figure 2, by comparing the combination of factors in a unit area with all other unit areas in the study domain and, if the conditions match, counting the number of unit areas and the number of observed landslides for this combination, making it possible to estimate the landslide probability $p_k$ of each unique condition using Equation (26). Because the unique combinations of all factors do not overlap, they are conditionally independent, such that the posterior probabilities are equal to the observed probabilities, as given in Equation (27). The numerical derivation can be tedious if the unit area is small relative to the total domain. In the present case, there are 310,649 unit cells, which is large but manageable with modern computing power.
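Comparing each unit area with all others, as described for Figure 2, scales with the square of the number of cells. A hash map keyed by the tuple of class codes produces the same counts in a single pass, and only combinations that occur ever become keys. The sketch below is this swapped-in hashing technique, not the authors' implementation; the input format of `(class_codes, x0)` pairs per cell is an assumption.

```python
from collections import defaultdict

def count_conditions(cells):
    """One pass over (class_codes, x0) pairs; only occurring combinations get a key.

    Returns {class_codes: p_k}, the landslide frequency of each unique condition
    (Equation (26), with every cell counting as one unit area).
    """
    n_cells = defaultdict(int)
    n_slides = defaultdict(int)
    for codes, x0 in cells:
        n_cells[codes] += 1
        n_slides[codes] += x0
    return {codes: n_slides[codes] / n_cells[codes] for codes in n_cells}

# Upper bound quoted in the text for 47 binary class indicators.
print(2 ** 47)  # 140737488355328, i.e., more than 10**14

cells = [((1, 0), 1), ((1, 0), 0), ((1, 1), 1), ((2, 1), 1), ((2, 1), 0), ((2, 1), 0)]
print(count_conditions(cells))
```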
The number of unique conditions actually occurring is 28,605, which is much less than the theoretically possible number of combinations. The size of the unique conditions areas ranges from 1 to 467 grid cells, i.e., 400 m² to 18.7 ha. The average size of a unique conditions area is 0.434 ha. In general, the areas with unique conditions are quite small, so in many of these areas, no landslide has been observed. In fact, 66% of the total domain lies in unique conditions areas without observed landslides. This could be interpreted as missing data, and the posterior probability could be set equal to the prior value. However, this would conflict with the unbiasedness of the model, so we chose to be consistent and set the posterior probability equal to zero. Furthermore, 76% of the domain is found to have a posterior probability lower than the prior probability, and thus can be assumed to have low landslide susceptibility. There are also 171 unit areas in unique conditions areas that are entirely covered by observed landslides, implying a landslide probability of one. These represent 3% of all observed landslides in the study area.
At first glance, one might conclude that the unique conditions model provides no information about the importance of each factor class. However, this is not the case. Because the model is unbiased, it precisely predicts the average posterior landslide probability in each factor class area. Because these averages equal the observed landslide probabilities, the $p_{ij}$ values, or the corresponding weights $w_{ij}$, measure the importance and predictive power of each factor class. Such information can be used before modelling to discard factors whose classes exhibit little or negligible discriminatory power. For instance, in the present case, one might decide to remove the slope shape factor because its $w_{ij}$ values are low. This reduces the number of unique conditions to 13,341, which saves computing time. The size of the unique conditions areas now ranges from 1 to 1090 grid cells, i.e., from 400 m² to 43.6 ha, with an average of 0.93 ha. In this case, 54% of the total area is free of landslides, and 74% has a posterior probability lower than the prior probability. Nevertheless, the resulting posterior landslide probabilities differ little from the previous results, and the AUC value becomes 0.90. Removing the slope shape factor therefore has little effect, apart from the gain in computing time.
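The text does not define a threshold for "low" weights. The sketch below assumes that the range of the class weights (maximum minus minimum) serves as a simple screening measure, and applies it to the weights-of-evidence values of Table 2. The screening criterion and the dictionary layout are illustrative assumptions. The weights themselves are copied from the table.

```python
import math

# Weights w_ij per factor, copied from the weights-of-evidence column of Table 2.
WEIGHTS = {
    "Slope aspect": [0.121, 0.427, 0.279, 0.026, -0.242, -0.502, -0.743, -0.051, -2.165],
    "Slope angle": [-1.408, -0.685, -0.063, 0.260, 0.335, 0.335],
    "Slope shape": [0.100, -0.410, 0.068],
    "Relative relief": [-1.447, -0.110, 0.448, -0.457],
    "Drainage distance": [0.331, -0.170, -0.837, -3.130],
    "Geology": [-1.244, -1.780, -0.578, -0.428, 0.088, 1.003, 0.165, 0.270, -2.116],
    "Land use": [0.000, -0.284, 0.221, 0.000, 0.980, 0.223, -0.626, 2.278, 0.000],
    "Annual rainfall": [-0.962, -0.200, 0.480],
}


def weight_spread(weights):
    """Range of the class weights: an assumed proxy for discriminatory power."""
    return max(weights) - min(weights)


# Factors sorted from weakest to strongest discrimination.
ranking = sorted(WEIGHTS, key=lambda factor: weight_spread(WEIGHTS[factor]))
for factor in ranking:
    print(f"{factor:18s} spread = {weight_spread(WEIGHTS[factor]):.3f}")

# Theoretical number of combinations once the weakest factor is dropped.
kept = ranking[1:]
max_combos = math.prod(len(WEIGHTS[f]) for f in kept)
print(f"dropping {ranking[0]!r} leaves at most {max_combos:,} combinations")
```

A range-based measure is only one option. The mean absolute weight per factor would also single out slope shape for these values.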
Landslide probability estimates are not well suited for mapping landslide susceptibility, because their distribution over the study area is very skewed and there is no clear rule for classifying the probability values into landslide susceptibility categories. Therefore, we propose a landslide susceptibility index (LSI), similar to [23], defined as
$$\mathrm{LSI} = \mathrm{logit}(p_{y}) - \mathrm{logit}(p_0),$$
which for the unique conditions model becomes
$$\mathrm{LSI} = \sum_k \left[\mathrm{logit}(p_k) - \mathrm{logit}(p_0)\right] y_k = \sum_k w_k\, y_k .$$
Such an index has the advantage of being easy to interpret. Positive values indicate areas more prone to landslides than the study area on average, and negative values indicate areas less prone to landslides. Moreover, landslide susceptibility classes can easily be obtained without subjective judgment by dividing the LSI values into equal intervals.
The resulting LSI map for the present case is shown in Figure 5. Because the probability values can be zero or one, the weights can become infinite, so they are limited to the range −10 to +10. The LSI map shows much more detail than the posterior landslide probability maps. LSI values around zero are shown in yellow and correspond to areas that are neither more nor less prone to landslides than the study area as a whole. Negative LSI values, shown in green and blue, indicate areas not prone to landslides. These cover a large part of the basin. Conversely, areas shown in orange and red are prone to landslides and cover only a few small parts of the basin. Thus, transforming the posterior landslide probabilities into LSI values allows a simple and clear interpretation of landslide susceptibility.
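The LSI definition, the ±10 cap on the weights, and the equal-interval classification can be combined in a short sketch. The text only states that equal intervals are used. Here five classes spanning the capped range [−10, +10] are assumed, and both the number of classes and the interval bounds are illustrative choices. Because the range is symmetric and the number of classes is odd, LSI = 0 falls in the middle class.

```python
import math

W_LIMIT = 10.0  # weights are capped at -10 and +10, as described in the text


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def lsi(p, p0, limit=W_LIMIT):
    """Landslide susceptibility index logit(p) - logit(p0), capped to [-limit, +limit].

    p = 0 or p = 1 would give an infinite weight, so those cases map to the caps.
    """
    if p <= 0.0:
        return -limit
    if p >= 1.0:
        return limit
    return max(-limit, min(limit, logit(p) - logit(p0)))


def equal_interval_class(value, n_classes=5, limit=W_LIMIT):
    """Class index 0..n_classes-1 for equal-width intervals over [-limit, +limit].

    0 is the least and n_classes-1 the most landslide-prone class (assumed scheme).
    """
    width = 2.0 * limit / n_classes
    return min(int((value + limit) // width), n_classes - 1)


p0 = 0.019  # prior landslide probability reported in Tables 1 and 2
for p in (0.0, 0.005, 0.019, 0.050, 1.0):
    value = lsi(p, p0)
    print(f"p = {p:.3f}  LSI = {value:+.3f}  class = {equal_interval_class(value)}")
```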

5. Conclusions

We examined three statistical methods for landslide susceptibility mapping that predict the conditional probability of landslides from categorical controlling factors: weights-of-evidence, logistic regression, and a unique conditions model that considers all combinations of the controlling factors. The strengths and weaknesses of the models were illustrated and tested by applying them to a practical case study.
It is shown that when all factors are conditionally independent, all models are equivalent and result in unbiased predictions of the posterior landslide probability. When there is a conditional dependency between factors, the posterior probabilities derived from weights-of-evidence are biased and generally too large compared to the observations. However, the bias of the weights-of-evidence has little effect on its discriminatory power. On the other hand, logistic regression produces unbiased estimates with respect to the factor classes and the overall study area, regardless of whether the controlling factors are conditionally independent.
The unique conditions model is always unbiased because the unique conditions areas do not overlap and are therefore conditionally independent. Consequently, this model predicts the landslide probabilities without bias in the total domain, in all factor classes, and in any area that can be composed by combining factor classes. Moreover, the unique conditions model outperforms the other models: its discriminatory power is much higher, with an AUC value close to one. Applying the unique conditions model can become computationally cumbersome if there are too many controlling factors and overlapping classes. This can also cause unique conditions areas to become too small to be meaningful. To avoid this, the most important factors can first be selected based on the observed landslide probabilities.
Because landslide probability estimates are not well suited for landslide susceptibility mapping, we propose a landslide susceptibility index that is easy to interpret and can be classified without subjective judgment.
Although the landslide dataset used in this study does not include all possible landslide-causing factors, the results are promising and show potential for broader practical use. However, the quality and quantity of the input data are important for achieving good results, so further research is necessary. We therefore recommend that future research consider other field cases with more complete thematic layers and include a geomorphological evaluation and cross-validation of the predictions. We also recommend validating the findings of this work against other innovative data-driven methods, such as machine learning and/or deep learning models.

Author Contributions

Conceptualization, F.D.S. and P.K.; methodology, F.D.S. and P.K.; software, F.D.S.; validation, F.D.S. and P.K.; formal analysis, F.D.S. and P.K.; investigation, F.D.S.; resources, P.K.; data curation, F.D.S.; writing—original draft preparation, F.D.S.; writing—review and editing, F.D.S.; visualization, F.D.S.; supervision, F.D.S.; project administration, not applicable; funding acquisition, not applicable. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chung, C.J.F.; Fabbri, A.G. Probabilistic prediction models for landslide hazard mapping. Photogramm. Eng. Remote Sens. 1999, 65, 1389–1399.
  2. Guzzetti, F.; Reichenbach, P.; Cardinali, M.; Galli, M.; Ardizzone, F. Probabilistic landslide hazard assessment at the basin scale. Geomorphology 2005, 72, 272–299.
  3. Chacón, J.; Irigaray, C.; Fernández, T.; El Hamdouni, R. Engineering geology maps: Landslides and geographical information systems. Bull. Eng. Geol. Environ. 2006, 65, 341–411.
  4. Huabin, W.; Gangjun, L.; Gonghui, W. GIS-based landslide hazard assessment: An overview. Prog. Phys. Geog. 2005, 29, 548–567.
  5. Lee, S. Current and future status of GIS-based landslide susceptibility mapping: A literature review. Kor. J. Remote Sens. 2019, 35, 179–193.
  6. Reichenbach, P.; Rossi, M.; Malamud, B.D.; Mihir, M.; Guzzetti, F. A review of statistically-based landslide susceptibility models. Earth-Sci. Rev. 2018, 180, 60–91.
  7. van Westen, C.J. Statistical Landslide Hazard Analysis. In ILWIS 2.1 for Windows; ILWIS Department, International Institute for Aerospace Survey & Earth Sciences: Enschede, The Netherlands, 1997; pp. 73–84.
  8. Lee, S.; Choi, J.; Min, K. Landslide susceptibility analysis and verification using the Bayesian probability model. Environ. Geol. 2002, 43, 120–131.
  9. Bonham-Carter, G.F. Geographic Information Systems for Geoscientists; Pergamon: Oxford, UK, 1994; p. 398.
  10. Regmi, N.R.; Giardino, J.R.; Vitek, J.D. Modeling susceptibility to landslides using the weight of evidence approach: Western Colorado, USA. Geomorphology 2010, 115, 172–187.
  11. Cervi, F.; Berti, M.; Borgatti, L.; Ronchetti, F.; Manenti, F.; Corsini, A. Comparing predictive capability of statistical and deterministic methods for landslide susceptibility mapping: A case study in the northern Apennines (Reggio Emilia Province, Italy). Landslides 2010, 7, 433–444.
  12. Chen, X.; Chen, H.; You, Y.; Chen, X.; Liu, J. Weights-of-evidence method based on GIS for assessing susceptibility to debris flows in Kangding County, Sichuan Province, China. Environ. Earth Sci. 2016, 75, 70.
  13. Rahman, M.S.; Ahmed, B.; Di, L. Landslide initiation and runout susceptibility modeling in the context of hill cutting and rapid urbanization: A combined approach of weights of evidence and spatial multi-criteria. J. Mt. Sci. 2017, 14, 1919–1937.
  14. Polykretis, C.; Chalkias, C. Comparison and evaluation of landslide susceptibility maps obtained from weight of evidence, logistic regression, and artificial neural network models. Nat. Hazards 2018, 93, 249–274.
  15. Jaafari, A. LiDAR-supported prediction of slope failures using an integrated ensemble weights-of-evidence and analytical hierarchy process. Environ. Earth Sci. 2018, 77, 42.
  16. Pecoraro, G.; Nicodemo, G.; Menichini, R.; Luongo, D.; Peduto, D.; Calvello, M. Combining statistical, displacement and damage analyses to study slow-moving landslides interacting with roads: Two case studies in Southern Italy. Appl. Sci. 2023, 13, 3368.
  17. Agterberg, F.P.; Cheng, Q. Conditional independence test for Weights-of-Evidence modelling. Nat. Resour. Res. 2002, 11, 249–255.
  18. Deng, M. A conditional dependence adjusted weights of evidence model. Nat. Resour. Res. 2009, 18, 249–258.
  19. Zhang, D.; Agterberg, F.; Cheng, Q.; Zuo, R. A comparison of modified fuzzy weights of evidence, fuzzy weights of evidence, and logistic regression for mapping mineral prospectivity. Math. Geosci. 2014, 46, 869–885.
  20. Cheng, Q. BoostWofE: A new sequential weights of evidence model reducing the effect of conditional dependency. Math. Geosci. 2015, 47, 591–621.
  21. Schaeben, H.; Semmler, G. The quest for conditional independence in prospectivity modeling: Weights-of-evidence, boost weights-of-evidence, and logistic regression. Front. Earth Sci. 2016, 10, 389–408.
  22. Agterberg, F.P. A modified weights-of-evidence method for regional mineral resource estimation. Nat. Resour. Res. 2011, 20, 95–101.
  23. De Smedt, F.; Kayastha, P.; Dhital, M.R. Naïve and Semi-Naïve Bayesian Classification of Landslide Susceptibility Applied to the Kulekhani River Basin in Nepal as a Test Case. Geosciences 2023, 13, 306.
  24. Sun, D.; Xu, J.; Wen, H.; Wang, Y. An optimized random forest model and its generalization ability in landslide susceptibility mapping: Application in two areas of Three Gorges Reservoir, China. J. Earth Sci. 2020, 31, 1068–1086.
  25. Tehrani, F.S.; Calvello, M.; Liu, Z.; Zhang, L.; Lacasse, S. Machine learning and landslide studies: Recent advances and applications. Nat. Hazards 2022, 114, 1197–1245.
  26. Lima, P.; Steger, S.; Glade, T.; Murillo-Garcia, F.G. Literature review and bibliometric analysis on data-driven assessment of landslide susceptibility. J. Mt. Sci. 2022, 19, 1670–1698.
  27. Yong, C.; Jinlong, D.; Fei, G.; Bin, T.; Tao, Z.; Hao, F.; Li, W.; Qinghua, Z. Review of landslide susceptibility assessment based on knowledge mapping. Stoch. Environ. Res. Risk Assess. 2022, 36, 2399–2417.
  28. Dai, F.C.; Lee, C.F.; Li, J.; Xu, Z.W. Assessment of landslide susceptibility on the natural terrain of Lantau Island, Hong Kong. Environ. Geol. 2001, 40, 381–391.
  29. Ohlmacher, G.C.; Davis, J.C. Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA. Eng. Geol. 2003, 69, 331–343.
  30. Lee, S. Application of likelihood ratio and logistic regression models to landslide susceptibility mapping using GIS. Environ. Manag. 2004, 34, 223–232.
  31. Budimir, M.E.A.; Atkinson, P.M.; Lewis, H.G. A systematic review of landslide probability mapping using logistic regression. Landslides 2015, 12, 419–436.
  32. Du, G.-L.; Zhang, Y.-S.; Iqbal, J.; Yang, Z.-H.; Yao, X. Landslide susceptibility mapping using an integrated model of information value method and logistic regression in the Bailongjiang watershed, Gansu Province, China. J. Mt. Sci. 2017, 14, 249–268.
  33. Lima, P.; Steger, S.; Glade, T. Counteracting flawed landslide data in statistically based landslide susceptibility modelling for very large areas: A national-scale assessment for Austria. Landslides 2021, 18, 3531–3546.
  34. Ozturk, U.; Pittore, M.; Behling, R.; Roessner, S.; Andreani, L.; Korup, O. How robust are landslide susceptibility estimates? Landslides 2021, 18, 681–695.
  35. Ng, C.W.W.; Yang, B.; Liu, Z.Q.; Kwan, J.S.H.; Chen, L. Spatiotemporal modelling of rainfall-induced landslides using machine learning. Landslides 2021, 18, 2499–2514.
  36. Broeckx, J.; Maertens, M.; Isabirye, M.; Vanmaercke, M.; Namazzi, B.; Deckers, J.; Tamale, J.; Jacobs, L.; Thiery, W.; Kervyn, M.; et al. Landslide susceptibility and mobilization rates in the Mount Elgon region, Uganda. Landslides 2019, 16, 571–584.
  37. Zhang, Y.; Wen, H.; Xie, P.; Hu, D.; Zhang, J.; Zhang, W. Hybrid-optimized logistic regression model of landslide susceptibility along mountain highway. Bull. Eng. Geol. Environ. 2021, 80, 7385–7401.
  38. Zhang, D.; Agterberg, F. Modified weights-of-evidence modeling with example of missing geochemical data. Complexity 2018, 2018, 7945960.
  39. Schaeben, H. A mathematical view of weights-of-evidence, conditional independence, and logistic regression in terms of Markov random fields. Math. Geosci. 2014, 46, 691–709.
  40. Taboga, M. Logistic Regression—Maximum Likelihood Estimation; Lectures on Probability Theory and Mathematical Statistics; Kindle Direct Publishing: Seattle, WA, USA, 2021; Available online: https://www.statlect.com/fundamentals-of-statistics/logistic-model-maximum-likelihood (accessed on 5 January 2024).
  41. Kayastha, P.; Dhital, M.R.; De Smedt, F. Evaluation and comparison of GIS based landslide susceptibility mapping procedures in Kulekhani watershed, Nepal. J. Geol. Soc. India 2013, 81, 219–231.
  42. Larsen, K. Information: Data Exploration with Information Theory (Weight-of-Evidence and Information Value). Available online: https://CRAN.R-project.org/package=Information (accessed on 12 August 2023).
  43. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2017; Available online: http://www.R-project.org/ (accessed on 15 November 2021).
  44. Robin, X.; Turck, N.; Hainard, A.; Tiberti, N.; Lisacek, F.; Sanchez, J.C.; Müller, M. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 2011, 12, 77.
Figure 1. Raster maps of the observed landslide indicator $x_0$ and the controlling factors $x_{ij}$; the legend labels of the factor classes are listed in the same order as in Table 2.
Figure 2. Workflow for the unique conditions model explained in pseudocode.
Figure 3. Raster maps of the posterior landslide probability obtained with weights-of-evidence, logistic regression, and unique conditions model.
Figure 4. ROC curves obtained using weights-of-evidence, logistic regression, and the unique conditions model.
Figure 5. Map of the landslide susceptibility index (LSI) obtained with the unique conditions model.
Table 1. Results obtained by weights-of-evidence, logistic regression, and the unique conditions model, using only geology as a controlling factor: geology classes; $p_j$: observed and unique conditions model probabilities; $w_j$: weights-of-evidence; $\beta_j$: logistic regression coefficients. The last row lists the prior probability $p_0$, $\mathrm{logit}(p_0)$, and the intercept of the logistic regression model $\beta_0$, respectively.
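| Geology class | $p_j$ | $w_j$ | $\beta_j$ |
| --- | --- | --- | --- |
| Chitlang Formation | 0.006 | −1.244 | - |
| Chandragiri Limestone | 0.003 | −1.780 | −0.536 |
| Sopyang Formation | 0.011 | −0.578 | 0.666 |
| Tistung Formation | 0.012 | −0.428 | 0.816 |
| Markhu Formation | 0.021 | 0.088 | 1.333 |
| Kulikhani Formation | 0.050 | 1.003 | 2.247 |
| Chisapani Quartzite | 0.022 | 0.165 | 1.410 |
| Palung Granite | 0.025 | 0.270 | 1.515 |
| Quaternary deposits | 0.002 | −2.116 | −0.871 |
| $p_0$ / $\mathrm{logit}(p_0)$ / $\beta_0$ | 0.019 | −3.949 | −5.193 |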
Table 2. Results obtained by weights-of-evidence and logistic regression, using all controlling factors: factor classes; $p_{ij}$: observed probabilities; $w_{ij}$: weights-of-evidence; $\beta_{ij}$: logistic regression coefficients. The last row lists $p_0$, the prior probability; $\mathrm{logit}(p_0)$; and $\beta_0$, the intercept of the logistic regression model, respectively.
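| Factor class | $p_{ij}$ | $w_{ij}$ | $\beta_{ij}$ |
| --- | --- | --- | --- |
| Slope aspect | | | |
| N | 0.021 | 0.121 | - |
| NE | 0.029 | 0.427 | 0.201 |
| E | 0.025 | 0.279 | 0.192 |
| SE | 0.019 | 0.026 | −0.029 |
| S | 0.015 | −0.242 | −0.105 |
| SW | 0.012 | −0.502 | −0.318 |
| W | 0.009 | −0.743 | −0.529 |
| NW | 0.018 | −0.051 | −0.152 |
| Flat | 0.002 | −2.165 | −0.796 |
| Slope angle | | | |
| <5° | 0.005 | −1.408 | - |
| 5–15° | 0.010 | −0.685 | 0.110 |
| 15–25° | 0.018 | −0.063 | 0.274 |
| 25–35° | 0.024 | 0.260 | 0.354 |
| 35–45° | 0.026 | 0.335 | 0.293 |
| >45° | 0.026 | 0.335 | 0.207 |
| Slope shape | | | |
| Convex | 0.021 | 0.100 | - |
| Straight | 0.013 | −0.410 | −0.040 |
| Concave | 0.020 | 0.068 | −0.027 |
| Relative relief | | | |
| <25 m/ha | 0.005 | −1.447 | - |
| 25–50 m/ha | 0.017 | −0.110 | 0.683 |
| 50–100 m/ha | 0.029 | 0.448 | 1.184 |
| >100 m/ha | 0.012 | −0.457 | 0.501 |
| Drainage distance | | | |
| <25 m | 0.026 | 0.331 | - |
| 25–50 m | 0.016 | −0.170 | −0.526 |
| 50–100 m | 0.008 | −0.837 | −1.106 |
| >100 m | 0.001 | −3.130 | −3.147 |
| Geology | | | |
| Chitlang Formation | 0.006 | −1.244 | - |
| Chandragiri Limestone | 0.003 | −1.780 | −0.116 |
| Sopyang Formation | 0.011 | −0.578 | 1.087 |
| Tistung Formation | 0.012 | −0.428 | 1.453 |
| Markhu Formation | 0.021 | 0.088 | 1.797 |
| Kulikhani Formation | 0.050 | 1.003 | 2.423 |
| Chisapani Quartzite | 0.022 | 0.165 | 1.021 |
| Palung Granite | 0.025 | 0.270 | 1.573 |
| Quaternary deposits | 0.002 | −2.116 | 0.378 |
| Land use | | | |
| Built-up area | 0.019 | 0.000 | - |
| Agriculture | 0.014 | −0.284 | - |
| Forest | 0.023 | 0.221 | 0.320 |
| Nursery | 0.019 | 0.000 | - |
| Grassland | 0.049 | 0.980 | 0.241 |
| Bush | 0.024 | 0.223 | 0.507 |
| Swamp | 0.010 | −0.626 | −0.226 |
| Barren land | 0.158 | 2.278 | 1.289 |
| Reservoir | 0.019 | 0.000 | - |
| Annual rainfall | | | |
| <1500 mm/y | 0.007 | −0.962 | - |
| 1500–1750 mm/y | 0.016 | −0.200 | 0.864 |
| >1750 mm/y | 0.030 | 0.480 | 1.418 |
| $p_0$ / $\mathrm{logit}(p_0)$ / $\beta_0$ | 0.019 | −3.949 | −7.438 |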

Share and Cite

MDPI and ACS Style

De Smedt, F.; Kayastha, P. A Unique Conditions Model for Landslide Susceptibility Mapping. Geosciences 2024, 14, 197. https://doi.org/10.3390/geosciences14080197

