Article

Hybrid Models Incorporating Bivariate Statistics and Machine Learning Methods for Flash Flood Susceptibility Assessment Based on Remote Sensing Datasets

1 School of Civil Engineering and Geomatics, Southwest Petroleum University, Chengdu 610500, China
2 Chinese Academy of Sciences, State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China
3 School of Geography and Planning, Sun Yat-sen University, Guangzhou 510275, China
4 School of Geoscience and Technology, Southwest Petroleum University, Chengdu 610500, China
5 School of Geographical Sciences, Northeast Normal University, Changchun 130024, China
* Author to whom correspondence should be addressed.
Remote Sens. 2021, 13(23), 4945; https://doi.org/10.3390/rs13234945
Submission received: 27 October 2021 / Revised: 23 November 2021 / Accepted: 30 November 2021 / Published: 5 December 2021
(This article belongs to the Special Issue Remote Sensing in Natural Resource and Water Environment)

Abstract

Flash floods are considered to be one of the most destructive natural hazards, and they are difficult to accurately model and predict. In this study, three hybrid models were proposed, evaluated, and used for flood susceptibility prediction in the Dadu River Basin. These three hybrid models integrate a bivariate statistical method of the fuzzy membership value (FMV) and three machine learning methods of support vector machine (SVM), classification and regression trees (CART), and convolutional neural network (CNN). Firstly, a geospatial database was prepared comprising nine flood conditioning factors, 485 flood locations, and 485 non-flood locations. Then, the database was used to train and test the three hybrid models. Subsequently, the receiver operating characteristic (ROC) curve, seed cell area index (SCAI), and classification accuracy were used to evaluate the performances of the models. The results reveal the following: (1) The ROC curve highlights the fact that the CNN-FMV hybrid model had the best fitting and prediction performance, and the area under the curve (AUC) values of the success rate and the prediction rate were 0.935 and 0.912, respectively. (2) Based on the results of the three model performance evaluation methods, all three hybrid models had better prediction capabilities than their respective single machine learning models. Compared with their single machine learning models, the AUC values of the SVM-FMV, CART-FMV, and CNN-FMV were 0.032, 0.005, and 0.055 higher; their SCAI values were 0.05, 0.03, and 0.02 lower; and their classification accuracies were 4.48%, 1.38%, and 5.86% higher, respectively. (3) Based on the results of the flood susceptibility indices, between 13.21% and 22.03% of the study area was characterized by high and very high flood susceptibilities. The three hybrid models proposed in this study, especially CNN-FMV, have a high potential for application in flood susceptibility assessment in specific areas in future studies.

1. Introduction

Flash floods arise from interactions between the hydrological and atmospheric systems. They are characterized by a runoff peak that develops within minutes to hours during or after heavy rainfall, and they generally occur in river basins smaller than 200 km² [1]. They are considered to be among the most devastating and frequent natural disasters worldwide [2]. Over the last few decades, the number of such disasters and the damage they cause have increased significantly due to global climate change [3]. For example, ~4.63 million km² of China is susceptible to flash floods, which threaten 560 million people [4]. In Europe, 40% of flood-related casualties between 1950 and 2006 were caused by flash floods [5], and the percentage in southern Europe exceeded 80% [6]. Studying flood susceptibility (defined as the probability of flooding) is therefore indispensable for developing preventive measures and reducing the impacts of flash floods. The accurate identification of flood-prone areas plays a crucial role in this task [7]. Thus, high-precision flood susceptibility maps are widely incorporated into flood risk management as a non-structural measure [8].
In recent years, several types of methods have been applied in flood susceptibility mapping, including knowledge-driven multi-criteria decision analysis (MCDA) methods, physically based simulation methods, and statistical and machine learning methods driven by historical data. The MCDA methods (e.g., the analytical hierarchy process and the analytical network process) are considered to be simple but subjective [9]. The physically based simulation methods (e.g., MIKE 11 [10] and FLO-2D) can describe the details of a flash flood, but they require large amounts of input data and substantial computational resources [11]. In contrast, statistical methods and machine learning methods overcome the shortcomings of the MCDA and physically based simulation methods and are able to predict the susceptibility to flooding quickly and accurately. Therefore, they have been widely used in many recent studies [12,13,14,15]. The main statistical methods used in flood susceptibility mapping include the frequency ratio [15], weights of evidence [16], index of entropy [17], and statistical index [18]. However, due to the complex mechanism of flood occurrence, the accuracies of these statistical methods are not very high [19]. In this context, machine learning methods have been introduced to improve the accuracy of flood susceptibility predictions because they are better at solving nonlinear problems [20]. In recent years, many machine learning algorithms have been successfully applied in the assessment of the susceptibility to natural disasters, such as the support vector machine (SVM) [21], random forest algorithm [22], classification and regression trees (CART) [23], boosted regression trees [24], and artificial neural networks [25]. These methods treat past flood points as the dependent variable and the flood conditioning factors as explanatory variables.
In addition, past flood data can be used to evaluate the performance of these models, which is one of the advantages of these methods.
Although machine learning methods have significantly improved the accuracy of flood susceptibility predictions compared to statistical methods, no single method or technique is considered to be the best in all areas. One reason is that every single method has shortcomings that limit the performance of the flood susceptibility prediction [17]. With a single method, it is difficult to ensure that the input data represent the flooding appropriately, which may cause the model to miss the best-fit function or the true distribution of the sample set [17]. A hybrid model is considered to be an effective technique for solving this problem. Therefore, integrating statistical and machine learning methods to create hybrid models has become a popular trend in recent research. In this regard, Costache et al. used one bivariate statistical method (Statistical Index) and its novel ensemble with the following machine learning models: Logistic Regression, Classification and Regression Trees, Multilayer Perceptron, Random Forest, and Support Vector Machine and Decision Tree CART to predict the flash flood susceptibility in the Bâsca Chiojdului River Basin. Their results show that the proposed Multilayer Perceptron–Statistical Index (MLP-SI) ensemble had the highest efficiency [18]. Wang et al. integrated two independent models, the frequency ratio and the index of entropy, with multilayer perceptron and classification and regression tree models to evaluate the flood susceptibility of Poyang County, China [17]. Tehrany et al. integrated the weights of evidence (WoE) and four kernel types of SVM in flood susceptibility mapping in Terengganu and found that the WoE-RBF-SVM showed the best performance [26]. Many other studies have also made contributions in this regard [27,28].
The common conclusion of the above studies of hybrid models is that these hybrid models have improved flood susceptibility prediction capabilities. Therefore, new hybrid models need to be investigated further.
In this context, based on nine flash flood conditioning factors, three hybrid models were proposed in this study to predict the flood susceptibility in the Dadu River Basin by integrating a bivariate statistical method (FMV) and three machine learning methods (SVM, CART, and CNN). The aims of this study are as follows: (1) to propose and validate the three hybrid models in order to enrich the methods for predicting flood susceptibility and (2) to predict and assess the flood susceptibility of the Dadu River Basin in order to mitigate the negative effects of flash flood disasters in the study area.

2. Materials

2.1. Study Area

The present study focuses on the Dadu River Basin (28°24′–33°65′N, 99°62′–103°77′E), which is situated on the eastern margin of the Tibetan Plateau and to the west of the Sichuan Basin (Figure 1). The Dadu River, a tributary of the Min River and a sub-tributary of the Yangtze River, has a total length of 1062 km, an elevation drop of 4175 m, and a catchment area of 90,016 km². The terrain of the study area is highly undulating, with altitudes ranging from 337 to 7304 m. The precipitation in the Dadu River Basin increases from north to south and reaches 116 mm/day in the south. Eighty percent of the precipitation in the study area occurs from May to October [29]. Despite the high degree of afforestation (78%), the high precipitation, steep slopes, and dense ditches frequently lead to severe flash flood events [30,31].

2.2. Data

2.2.1. Flash Flood Inventory Map

An inventory of the areas previously affected by flash floods is the basic information needed to predict where flash floods could occur in the future [27]. In particular, for machine learning and statistical models, the accuracy of the historical flash flood locations significantly affects the prediction results. In this study, the flash flood inventory maps were obtained from the National Flash Flood Investigation and Evaluation Project (NFFIEP), which was launched by the Ministry of Water Resources of China and the Ministry of Finance of China in 2013 [4]. In this project, the flood disaster areas were determined through data collection and analysis and field surveys, and they were recorded as points [4]. Because many of the flash flood events occurred long ago, it is difficult to determine their boundaries and heights today; thus, the central location of the historical flash flood ditch is used to indicate where each flash flood occurred. The project also recorded the longitude, latitude, time, casualties, and economic losses of historical flash flood events from 1949 to 2015. More importantly, the reliability and attributes of these flash floods were strictly inspected by experts and scholars of the China Institute of Water Resources and Hydropower [32]. Since its establishment, this database has successfully served a large number of studies [4,33,34].
A total of 485 flash floods were recorded in the Dadu River Basin (Figure 1). In addition to the flooded points, we randomly selected an equal number of non-flooded points in the study area. For the machine learning and statistical models, a value of 1 was assigned to the flooded points (i.e., the positive samples), and a value of 0 was assigned to the non-flooded points (i.e., the negative samples). Then, 70% of the positive and negative samples were combined as the training sample (340 flood points and 340 non-flood points), and the remaining 30% were used as the validation sample (145 flood points and 145 non-flood points). To select the best model during training, the training sample was further divided into a sub-training sample (80%; 272 flood points and 272 non-flood points) and a testing sample (20%; 68 flood points and 68 non-flood points). Thus, a total of 272 flood and 272 non-flood points were used to train the models, 68 flood points and 68 non-flood points were used to test the models, and 145 flood points and 145 non-flood points were used to validate the models.
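The sample sizes quoted above can be reproduced with a short sketch. The study itself was run in R and does not describe its sampling code, so the following Python version is only an illustration; the function name `split_indices`, the random seed, and the round-half-up rule are assumptions, not the authors' procedure.

```python
import random


def split_indices(n_points, keep_percent, rng):
    """Shuffle 0..n_points-1 and split it into (kept, held_out) index lists."""
    indices = list(range(n_points))
    rng.shuffle(indices)
    # Integer round-half-up avoids floating-point surprises (485 * 70% = 339.5 -> 340).
    n_kept = (n_points * keep_percent + 50) // 100
    return indices[:n_kept], indices[n_kept:]


rng = random.Random(42)  # arbitrary seed, used only for repeatability
N_PER_CLASS = 485        # flood points, and an equal number of non-flood points

splits = {}
for label in (1, 0):  # 1 = flood (positive), 0 = non-flood (negative)
    training, validation = split_indices(N_PER_CLASS, 70, rng)
    sub_positions, test_positions = split_indices(len(training), 80, rng)
    splits[label] = {
        "sub_train": [training[i] for i in sub_positions],
        "test": [training[i] for i in test_positions],
        "validation": validation,
    }

sizes = {label: {name: len(ids) for name, ids in parts.items()}
         for label, parts in splits.items()}
print(sizes)
```

Splitting each class separately keeps the flood and non-flood samples balanced in every subset, which matches the equal counts reported in the text.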

2.2.2. Flash Flood Conditioning Factors

Based on the formation mechanism of flash floods and on previous studies [35,36,37], we considered nine factors from two perspectives (triggering factors and the disaster-pregnant environment). These flash flood conditioning factors include the altitude, slope, slope aspect, topographic wetness index (TWI), maximum three-day precipitation (M3DP), land cover, soil texture, normalized difference vegetation index (NDVI), and distance to the river (DR). Each of these factors was converted into a gridded database with a spatial resolution of 1 km × 1 km in ArcGIS. The primary sources of the factors are presented in Table 1. Short descriptions of the factors that influence flash flood occurrence are provided below.
Altitude (Figure 2a) is an important factor affecting flood occurrence. In general, areas with lower altitudes experience higher river discharge and a greater likelihood of flooding [38]. The altitude of the study area increases from southeast to northwest, ranging from 337 to 7304 m. In this study, the altitude was represented by a digital elevation model (DEM) obtained from the Geospatial Data Cloud site, Computer Network Information Center, Chinese Academy of Sciences (GDC). The DEM was resampled to a spatial resolution of 1 km × 1 km using the Resample tool in ArcGIS.
Slope (Figure 2b) is considered to be one of the factors with the most influence on flash flood genesis [27]. Because water flows downhill, the slope directly affects vertical percolation and surface runoff [39]. Generally, floods occur more frequently in areas with low slopes. The slopes in the study area are highly variable, reaching 52.6° in some places. The slope factor used in this study was calculated from the DEM.
Slope aspect (Figure 2c) is defined as the direction of the maximum slope of the terrain surface [40], and it was also obtained from the DEM in this study. It is generally accepted that the slope aspect affects flooding indirectly, that is, by controlling various geographic and environmental factors, e.g., vegetation, soils, and rainfall [41]. In this study, the slope aspect map was divided into 10 categories ranging from flat to north zones.
The TWI (Figure 2d), which reflects the geotechnical wetness, is a significant factor in flood susceptibility mapping [42]. As an indicator of water accumulation in a river basin, the TWI is positively correlated with the likelihood of flooding. In this study, the TWI was also calculated from the DEM, and the detailed calculation formula was presented by Ali et al. [43].
The M3DP (Figure 2e), a representative factor for rainfall, is a triggering factor for flooding. It has been shown to have a non-negligible influence on the occurrence of floods [44]. Higher M3DP values generally signify a higher risk of flooding. The rainfall in the study area decreases from south to north, and the M3DP varies from 37 to 167 mm. The M3DP values used in this study were calculated using the Global Precipitation Measurement (GPM) database, which records the average daily precipitation across the world. As a new generation of precipitation observation satellites, GPM was launched in February 2014 by the National Aeronautics and Space Administration, integrating advanced microwave detection technology and data correction algorithms [45]. More importantly, its applicability in many regions of China has been verified [46]. The GPM data were resampled to 1 km × 1 km using the kriging interpolation method [17].
The land cover (Figure 2f) and soil texture (Figure 2g) are considered to have significant effects on flooding and drought incidences [47]. In the study area, there are six types of land cover and 12 types of soil texture. Both factors were derived from the Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences. The land cover data provided by this platform were produced by human visual interpretation of Landsat TM images.
The NDVI (Figure 2h), which represents the status of the vegetation cover, has been used extensively for predicting flood susceptibility [48]. The higher the NDVI, the higher the vegetation cover and the lower the potential for flooding. The study area is largely covered with vegetation, with NDVI values of 0–0.92. The NDVI data used in this study were obtained from the National Earth System Science Data Center.
The DR (Figure 2i) is also a commonly used factor for identifying flood susceptibility because rivers are the main pathways for flood discharge and areas near rivers are susceptible to flooding [23]. The DR values were calculated by creating multiple buffer zones at 1000 m intervals around the four-level river systems.
To prepare the input for the bivariate statistics method, following the method used in previous studies [18,19], all five continuous numerical factors (altitude, slope, TWI, M3DP, and NDVI) were divided into five classes using the natural break method.

3. Methods

3.1. Feature Selection Methods

3.1.1. Information Gain Method

To avoid overfitting, feature selection is an essential step before applying the machine learning algorithms. This is because the use of less redundant data results in a performance boost of the model and leads to less opportunity for noise-based decisions [49]. Therefore, feature selection was used to improve the performance of the flood susceptibility model.
The information gain (IG) method is a popular feature selection method, which was proposed by Hunt et al. (1966). It was selected not only for its ability to eliminate invalid indicators, but also for its ability to rank the importance of the input variables [50]. Information theory is the basis of the IG method, which calculates the amount of gain. The IG value for each flash flood conditioning variable Fi was estimated using the following formula [51]:
$$IG(Y, F_i) = H(Y) - H(Y \mid F_i),$$
where H(Y) is the entropy of the flood label Y, and H(Y|Fi) is the entropy of Y after associating the values of the flash flood conditioning factor Fi.
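The information gain defined above can be illustrated with a minimal Python sketch. The labels and factor classes below are invented toy data, and the base-2 logarithm is an assumption, since the logarithm base used in the study is not stated.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y) in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, factor_classes):
    """IG(Y, F) = H(Y) - H(Y | F) for one categorical conditioning factor."""
    n = len(labels)
    groups = {}
    for y, f in zip(labels, factor_classes):
        groups.setdefault(f, []).append(y)
    # H(Y | F): entropy within each factor class, weighted by class frequency.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data (hypothetical): 1 = flood point, 0 = non-flood point.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
altitude_class = ["low"] * 4 + ["high"] * 4            # separates the labels perfectly
aspect_class = ["N", "S", "N", "S", "N", "S", "N", "S"]  # carries no information

ig_altitude = information_gain(labels, altitude_class)
ig_aspect = information_gain(labels, aspect_class)
print(ig_altitude, ig_aspect)
```

A factor whose classes are unrelated to the flood labels yields an information gain of zero, which is the situation in which a factor would be dropped.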

3.1.2. Variance Inflation and Tolerance

Multi-collinearity analysis was carried out to evaluate the correlations between the flash flood conditioning factors. In general, severe multi-collinearity makes it difficult for a statistical model to estimate the contributions of the relevant factors accurately [52]. The variance inflation factor (VIF) and tolerance (TOL) were used to examine the multi-collinearity among the factors in this study. When VIF > 10 or TOL < 0.1, the factor has a multi-collinearity problem and needs to be eliminated [52].
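The text does not give formulas for these two indices. Under the standard definitions (an assumption here, with R_j² denoting the coefficient of determination obtained by regressing factor j on all of the other factors), they can be written as:

$$TOL_j = 1 - R_j^{2}, \qquad VIF_j = \frac{1}{1 - R_j^{2}} = \frac{1}{TOL_j}.$$

With these definitions the two thresholds are equivalent, since VIF > 10 holds exactly when TOL < 0.1 (i.e., R_j² > 0.9). As a consistency check, the altitude values reported in Section 4.1 satisfy this relationship: 1/5.476 ≈ 0.183.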

3.2. Bivariate Statistics Method

The FMV was selected as the bivariate statistics method in this study. Fuzzy logic was first proposed by Lotfi Zadeh in 1965 as an extension of classical set theory [53]. In reality, the relationship between variables or concepts is often imprecise and ambiguous, which makes it difficult to describe the relationship using only values of 0 (irrelevant) or 1 (relevant). Fuzzy logic addresses this problem by allowing intermediate degrees between these two extremes [53]. For example, if black is represented by 0 and white is represented by 1, then gray is a number between 0 and 1; that is, fuzzy logic expresses degrees of truth as numbers between 0 and 1 [54]. Fuzzy principles have been implemented in several methods, such as the frequency ratio (FR) and weights of evidence [55], among which the FR is one of the most popular [56]. The FR can be calculated as follows [56]:
$$FR_{ij} = \frac{A_{ij} / A_{j}}{B_{ij} / B_{j}},$$
where FRij is the FR value of class i of factor j, Aij is the number of flash floods within class i of factor j, Aj is the number of total flash floods in factor j, Bij is the number of pixels in class i of factor j, and Bj is the total number of pixels in factor j.
After calculating the FRij, the fuzzy membership values are obtained using the following equation:
$$\mu_{ij} = \frac{FR_{ij}}{\max_{i}\left(FR_{ij}\right)},$$
where μij is the FMV of class i of factor j.
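The two equations above can be sketched in Python as follows. The flood and pixel counts are invented for illustration and do not come from Table 3; the function names are likewise hypothetical.

```python
def frequency_ratios(flood_counts, pixel_counts):
    """FR of each class of one factor: share of floods divided by share of pixels."""
    total_floods = sum(flood_counts)  # A_j
    total_pixels = sum(pixel_counts)  # B_j
    return [(a / total_floods) / (b / total_pixels)
            for a, b in zip(flood_counts, pixel_counts)]


def fuzzy_membership_values(fr_values):
    """FMV of each class: its FR normalized by the largest FR of the factor."""
    fr_max = max(fr_values)
    return [fr / fr_max for fr in fr_values]


# Hypothetical factor with four classes (counts are illustrative only).
flood_counts = [60, 30, 10, 0]       # A_ij: flood points per class
pixel_counts = [200, 300, 300, 200]  # B_ij: pixels per class

fr = frequency_ratios(flood_counts, pixel_counts)
fmv = fuzzy_membership_values(fr)
print([round(v, 3) for v in fmv])
```

By construction, the class with the highest FR receives an FMV of 1, and a class containing no flood points receives an FMV of 0.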

3.3. Machine Learning Methods

3.3.1. Support Vector Machine

The SVM algorithm is a robust machine learning method for prediction that is based on the structural risk minimization principle and statistical learning theory [57]. This algorithm implements binary classification by constructing a hyperplane that separates the training data into two classes using an optimization algorithm. This hyperplane is generated by transforming the original input space into a higher-dimensional feature space [58]. After the hyperplane has been found, the support vectors, i.e., the training points closest to the hyperplane, can be identified [59]. Then, a specific kernel function is applied to classify the input data into two classes: non-flood susceptibility and flood susceptibility {0, 1} [23]. Several kernel functions are used in SVM, but numerous studies [19,24] have shown that the radial basis function (RBF) performs better than the other kernels in the context of flood prediction. Thus, the RBF kernel was used in this study. The major steps of the algorithm are as follows:
(1) Assume that T = {x1, x2,..., xn, y} is the training set of known samples where xi is the ith input data, and y is the output data where i = 1, 2, …, n.
(2) Separate the training set into two categories using an n-dimensional hyperplane to obtain the maximum interval:
$$\frac{1}{2}\|w\|^{2},$$
$$y_i\left((w \cdot x_i) + b\right) \geq 1,$$
where ‖w‖ is the norm of the hyperplane normal, b is a scalar base, and (·) represents the product operation.
(3) Using the Lagrange multiplier, the cost function can be defined as follows:
$$L = \frac{1}{2}\|w\|^{2} - \sum_{i=1}^{n} \lambda_i \left( y_i\left((w \cdot x_i) + b\right) - 1 \right),$$
where λi is the Lagrangian multiplier. By using standard procedures, the solution can be obtained by dual minimization of Equation (6) with respect to w and b [60].
(4) For the non-separable case, the constraints can be modified by introducing slack variables, ξi [60]:
$$y_i\left((w \cdot x_i) + b\right) \geq 1 - \xi_i,$$
Thus, Equation (6) becomes
$$L = \frac{1}{2}\|w\|^{2} - \frac{1}{\nu n} \sum_{i=1}^{n} \xi_i,$$
where ν∈(0,1], which is introduced in order to account for misclassifications [61].
The RBF kernel is described as follows:
$$K(x_i, x_j) = e^{-\gamma (x_i - x_j)^{2}}, \quad \gamma > 0,$$
where γ is the parameter of the kernel function. Sometimes kernel functions are parameterized using γ = 1/(2σ²), where σ is an adjustable parameter that governs the performance of the kernel.
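As an illustration of the kernel defined above, the following Python sketch evaluates it for two hypothetical feature vectors, interpreting (xi − xj)² as the squared Euclidean distance. The vectors and function names are assumptions; the two gamma values are the ones reported later for the SVM and SVM-FMV models.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), with gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)


def gamma_from_sigma(sigma):
    """Alternative parameterization: gamma = 1 / (2 * sigma^2)."""
    return 1.0 / (2.0 * sigma ** 2)


# Two hypothetical samples described by three conditioning-factor values each.
x = (1.0, 0.2, 0.6)
z = (0.4, 0.9, 0.1)

k_wide = rbf_kernel(x, z, gamma=0.001)  # small gamma: similarity decays slowly
k_narrow = rbf_kernel(x, z, gamma=0.1)  # larger gamma: similarity decays faster
print(k_wide, k_narrow)
```

A smaller gamma makes distant samples look more alike, which yields a smoother decision boundary; a larger gamma makes the kernel more local.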

3.3.2. Classification and Regression Trees

The CART, proposed by Breiman et al. in 1984 [62], is a nonparametric machine learning method that builds predictions from the input variables [63]. Both classification and regression tasks can be accomplished by the CART algorithm. For the classification problem, the predictors can be numeric, binary, or categorical [19], which is considered to be an advantage of the CART. Another advantage is its resistance to missing data [64], because missing values are not used when determining the optimal tree splits. When the CART is used to make predictions, missing values are handled using substitutes (surrogates) [62]. The predicted values are represented by the average of the response values. The optimal splitting rule of the CART algorithm, i.e., modified twoing, is expressed as follows:
$$I(Split) = \left[ 0.25 \left( q (1 - q) \right)^{u} \sum_{k} \left| P_L(k) - P_R(k) \right| \right]^{2},$$
where k denotes the target classes; PL(k) and PR(k) are the probability distributions of the targets in the left and right child nodes, respectively; q is the fraction of cases sent to the left child node; and the power term u embeds a user-controllable penalty on splits that generate unequal-sized child nodes [65].
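A minimal Python sketch of this splitting criterion is given below. It assumes the usual form of the power-modified twoing rule, in which q is the fraction of cases sent to the left child node; the class counts are invented for illustration.

```python
def modified_twoing(left_counts, right_counts, u=1.0):
    """Power-modified twoing score of a candidate split.

    left_counts / right_counts: number of cases of each target class that the
    split sends to the left / right child node.
    u: penalty exponent on unequal-sized child nodes.
    """
    n_left = sum(left_counts)
    n_right = sum(right_counts)
    if n_left == 0 or n_right == 0:
        return 0.0  # a split that leaves one child empty separates nothing
    q = n_left / (n_left + n_right)  # assumed meaning of q: share of cases going left
    spread = sum(abs(l / n_left - r / n_right)
                 for l, r in zip(left_counts, right_counts))
    return (0.25 * (q * (1.0 - q)) ** u * spread) ** 2


# Hypothetical splits of 100 points; counts are [flood, non-flood].
balanced_pure = modified_twoing([50, 0], [0, 50])    # pure children of equal size
unbalanced_pure = modified_twoing([10, 0], [0, 90])  # pure children of unequal size
uninformative = modified_twoing([25, 25], [25, 25])  # same class mix on both sides
print(balanced_pure, unbalanced_pure, uninformative)
```

The score rewards splits whose children differ in class composition and, through the power term, penalizes splits that produce children of very different sizes.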

3.3.3. Convolutional Neural Network

The CNN was proposed by Bengio et al. in order to overcome the decreased learning speed faced by traditional artificial neural networks (ANNs) when analyzing complex networks [66]. The CNN is still a neural network, and it uses local connections among the different layers. In recent years, several CNN architectures have been established to solve increasingly complicated nonlinear problems, among which the 1D-CNN is regarded as the most typical [67] and was used in this study. In general, the 1D-CNN has five layers: an input layer, a convolutional layer, a pooling layer, a fully connected neural network layer, and an output layer. To simplify the problem, small squares of input data were used to extract the features in the input layer. The convolutional layer applies a filter to the input matrix through a series of mathematical operations. The role of the pooling layer is to reduce the number of parameters while retaining the critical information. The fully connected neural network layer, which is a simple multilayer perceptron, is used to identify the object classes and to learn the weights.
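The layer sequence described above can be made concrete with a forward pass written in plain Python. This is only a sketch of the general 1D-CNN idea: the filter, weights, and input values are arbitrary and untrained, and the network used in the study is not specified at this level of detail.

```python
import math


def conv1d(signal, kernel, bias=0.0):
    """Slide the filter over the input without padding (a 'valid' convolution)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]


def relu(values):
    """Keep positive activations and set negative ones to zero."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Keep the largest activation in each non-overlapping window."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def dense_sigmoid(values, weights, bias=0.0):
    """Fully connected output neuron returning a value between 0 and 1."""
    z = sum(v * w for v, w in zip(values, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Input layer: eight hypothetical factor values for one pixel.
features = [1.0, 0.8, 0.4, 0.9, 0.2, 0.6, 0.7, 0.3]
kernel = [0.5, -0.25, 0.5]  # arbitrary filter of width 3

feature_map = relu(conv1d(features, kernel))  # convolutional layer
pooled = max_pool1d(feature_map, size=2)      # pooling layer
susceptibility = dense_sigmoid(pooled, [0.4, 0.3, 0.3], bias=-0.5)  # dense + output
print(len(feature_map), len(pooled), susceptibility)
```

In a trained network, the filter and dense weights would be learned from the training sample rather than fixed by hand as they are here.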

3.4. Model Performance Evaluation Methods

3.4.1. Statistical Measures

As statistical measures, the sensitivity, specificity, and k-index were used to evaluate the performances of the models. These indexes reflect the classification accuracy of a model, which, in this study, is the ability of the model to correctly differentiate the flood pixels from the non-flood pixels [68]. Specifically, the sensitivity and specificity represent the proportion of flood events that are classified as flood pixels and the proportion of non-flood locations that are classified as non-flood pixels, respectively [69]. The statistical indexes, including the sensitivity, specificity, and accuracy, were calculated using the following equations [69]:
$$Sensitivity = \frac{TP}{TP + FN},$$
$$Specificity = \frac{TN}{FP + TN},$$
$$Accuracy = \frac{TP + TN}{TP + FP + TN + FN},$$
where TP (true positive) and TN (true negative) are the numbers of pixels correctly classified, and FP (false positive) and FN (false negative) are the numbers of pixels erroneously classified. In addition, the root mean squared error (RMSE), defined below, was used to determine the difference between the expected and actual flood results. Although it is generally believed that the lower the RMSE, the better the model performance [70], Singh et al. claimed that RMSE values less than half the standard deviation (SD) of the measured data may be considered low and acceptable for model evaluation [71]. Therefore, the SDs of the prediction values were calculated in this study to provide a reference for interpreting the RMSE. Notably, the SD of each real sample dataset is equal to 0.5 because these datasets contain equal numbers of 0 (non-flood points) and 1 (flood points) values.
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( X_{predicted} - X_{actual} \right)^{2}}.$$
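The statistical measures above can be computed as in the following Python sketch. The labels and model scores are invented, and the 0.5 cut-off used to turn scores into classes is an assumption made for the example.

```python
import math


def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for binary labels (1 = flood, 0 = non-flood)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1
        elif p == 1 and a == 0:
            fp += 1
        elif p == 0 and a == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rmse(actual, predicted_values):
    """Root mean squared error between predicted values and observed labels."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted_values)) / n)


def population_sd(values):
    """Standard deviation computed with divisor n."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


# Hypothetical validation sample with equal numbers of flood and non-flood points.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.4, 0.2, 0.1, 0.7]  # model outputs in [0, 1]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

tp, fp, tn, fn = confusion_counts(actual, predicted)
sensitivity = tp / (tp + fn)
specificity = tn / (fp + tn)
accuracy = (tp + tn) / (tp + fp + tn + fn)
error = rmse(actual, scores)
label_sd = population_sd(actual)
print(sensitivity, specificity, accuracy, round(error, 3), label_sd)
```

The last value confirms the remark in the text: a sample with equal numbers of 0 and 1 labels has an SD of exactly 0.5 when the divisor n is used.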
The seed cell area index (SCAI) was used to compare the ratio of the density of the flood locations and the areas of the susceptibility classes in this study. Lower SCAI values indicate a higher flood susceptibility. This approach is helpful in evaluating the consistency and effectiveness of the models [72]. The SCAI is calculated as follows [73]:
$$SCAI = \frac{\text{Areal extent of susceptibility classes (\%)}}{\text{Inventory of floods in each susceptibility class (\%)}}.$$
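The SCAI can be illustrated with a short Python sketch. The area and flood percentages below are hypothetical and are not results of this study; the function name is likewise an assumption.

```python
def seed_cell_area_index(area_percent, flood_percent):
    """SCAI per class: share of area (%) divided by share of flood points (%)."""
    scai = {}
    for name, area in area_percent.items():
        floods = flood_percent[name]
        # A class with no flood points has an undefined (infinite) SCAI.
        scai[name] = area / floods if floods > 0 else float("inf")
    return scai


# Hypothetical susceptibility map with five classes (percentages sum to 100).
area_percent = {"very low": 40.0, "low": 25.0, "moderate": 15.0,
                "high": 12.0, "very high": 8.0}
flood_percent = {"very low": 2.0, "low": 8.0, "moderate": 15.0,
                 "high": 30.0, "very high": 45.0}

scai = seed_cell_area_index(area_percent, flood_percent)
for name, value in scai.items():
    print(f"{name}: {value:.3f}")
```

For a consistent model, the SCAI should decrease steadily from the very low to the very high susceptibility class, as it does in this toy example.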

3.4.2. ROC Curve

In general, the receiver operating characteristic (ROC) curve is applied to describe the performance of a statistical model that combines different clues and test results for predictive purposes [27]. The area under the curve (AUC) is the most important indicator of a ROC curve and directly reflects the accuracy of a model. The greater the AUC, the better the model. In this study, the training dataset was used to determine the success rate curve, and the validation dataset was used to determine the prediction rate curve. The success rate and the prediction rate reflect the goodness of fit and the prediction power of the models, respectively [74]. The AUC can be calculated as follows:
$$AUC = \frac{TP + TN}{P + N},$$
where TP (true positive) and TN (true negative) are the number of pixels that are correctly classified, P is the total number of pixels with flash floods, and N is the total number of pixels without flash floods.
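The ratio above depends on a single classification threshold. For comparison, the area under the ROC curve can also be estimated directly from the continuous model outputs. The Python sketch below uses the rank-based (Mann-Whitney) estimate, which is a swapped-in standard technique rather than the formula given in the text; the labels and scores are invented.

```python
def roc_auc(actual, scores):
    """Rank-based AUC: the probability that a randomly chosen flood point
    receives a higher score than a randomly chosen non-flood point
    (ties count as one half)."""
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical validation sample: 1 = flood point, 0 = non-flood point.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.4, 0.2, 0.1, 0.7]

auc = roc_auc(actual, scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of flood and non-flood points.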

3.5. Processing

The methodological workflow implemented in this study is schematically shown in Figure 3. First, a database comprising a flood inventory map and nine flash flood conditioning factors was created from different sources. Next, we applied the information gain method to extract the valid factors and used the VIF and TOL to verify that there were no serious collinear relationships between the factors. Then, based on the training dataset, we used the FMV model to calculate the FMV values of the factors’ classes and input them into the three machine learning models for training. In addition, the three single machine learning models were also trained using the training dataset. Meanwhile, the test dataset was used to evaluate the training accuracy of the models and to determine their optimal parameters. Finally, the flood susceptibility maps were generated, and the prediction performances of the models were evaluated using the ROC curve and several statistical measures. It should be noted that the machine learning models were run using packages in the R software.

4. Results

4.1. Feature Selection

The predictive abilities of the nine flash flood conditioning factors in terms of flood susceptibility are shown in Figure 4. The altitude had the highest IG value (0.47), followed by the M3DP (0.38), TWI (0.2), soil texture (0.19), land cover (0.18), DR (0.09), slope (0.09), slope aspect (0.05), and NDVI (0). Thus, all of the factors except the NDVI had IG values greater than 0. The zero IG value of the NDVI can be explained by the fact that the precipitation in this area exceeded the maximum interception capacity of the tree canopy, and thus the protective effect of the forest was eliminated from the hydrological point of view [27]. Therefore, the NDVI was not used to train the six models.
Table 2 presents the results of the multi-collinearity analysis of the eight flash flood conditioning factors (not including the NDVI). It can be seen that the altitude had the highest VIF (5.476) and the lowest TOL (0.183). However, neither value reached its critical threshold (VIF > 10 or TOL < 0.1), indicating the absence of severe multi-collinearity among the eight flash flood conditioning factors. Thus, all eight of the flash flood conditioning factors were taken into account in the modeling.

4.2. Fuzzy Membership Value

The FMV calculation results are presented in Table 3, and the relative distribution of the flood pixels within the factor classes is shown in Figure 5. It can be seen that four classes had FMV values of 0: one altitude class, two soil texture classes, and one slope aspect class. This is because there were no flood pixels in these classes.
The highest FMV values (1.00) for each factor occurred for the altitude class of 337–1494 m, the M3DP class of 116–167 mm, the TWI of −6.50 to 0.36, the DR class of <1000 m, the slope class of 0°–5.36°, the land cover class of built-up areas, the soil texture class of sandy-clay, and the slope aspect class of southeast. These classes had significantly different area ratios and flash flood point density ratios (Figure 5). In addition, as can also be seen from Figure 5, the variation in the FMV values of the five continuous numerical factors (altitude, M3DP, TWI, DR, and slope) exhibited a clear pattern. For the M3DP and TWI, the FMV values were positively correlated with the values of these factors, while for altitude, DR, and slope, the FMV values were inversely correlated with the value of the factors. These results indicate that the incidence of flash floods increased with increasing M3DP and TWI, while it decreased with increasing altitude, DR, and slope.
After the FMV calculations, the FMV values were input to the three hybrid models (SVM-FMV, CART-FMV, and CNN-FMV) for training and prediction.
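The FMV calculation can be illustrated with a minimal sketch. Because Equations (2) and (3) are not reproduced in this section, the normalization used here (each class FR divided by the largest FR of the factor) is an assumption; it is chosen only because it matches the behavior reported above, namely a maximum FMV of 1.00 per factor and an FMV of 0 for classes without flood pixels. The class counts are hypothetical.

```python
def frequency_ratio(flood_counts, area_counts):
    """FR per class: share of flood pixels in the class divided by share of area in the class."""
    total_flood = sum(flood_counts)
    total_area = sum(area_counts)
    return [
        (f / total_flood) / (a / total_area)
        for f, a in zip(flood_counts, area_counts)
    ]

def fuzzy_membership(fr_values):
    """Assumed normalization: scale FR values so that the largest one becomes 1."""
    fr_max = max(fr_values)
    return [fr / fr_max for fr in fr_values]

# Hypothetical factor with four classes (pixel counts)
flood_counts = [60, 30, 10, 0]
area_counts = [100, 200, 300, 400]

fr = frequency_ratio(flood_counts, area_counts)
fmv = fuzzy_membership(fr)
```

In this toy example, the class holding most flood pixels on the least area receives an FMV of 1, and the class without flood pixels receives an FMV of 0; these FMV values would then replace the raw class values as model inputs.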

4.3. Model Training Results

4.3.1. SVM and SVM-FMV

In order to determine the best model structure, the RMSE of the testing dataset (RMSEtesting) was used to reflect the model’s performance. From Table 4, it can be seen that the RMSE of each model is almost equal to the SD of the predicted values and slightly higher than half of the SD of the sample dataset (SD = 0.5). This result indicates that there is a difference between the observed and modeled flood susceptibility, but this difference can be considered acceptable [75]. After conducting the 5-fold cross-validation procedure, the model with the lowest RMSE was identified as the best model structure. In this study, the best structure for the SVM model (RMSEtesting = 0.38) is cost = 100 and gamma = 0.1, and the best structure for the SVM-FMV model (RMSEtesting = 0.29) has the same cost as the SVM but a different gamma (0.001).
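The comparison between the RMSE and the SD can be sketched as follows. The observed labels and predicted values are hypothetical, and the population form of the SD is an assumption; for a balanced binary sample (equal numbers of 0 and 1), this SD is exactly 0.5.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed labels and predicted susceptibility."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def population_sd(values):
    """Population standard deviation (divides by n)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Hypothetical balanced sample: 1 = flood, 0 = non-flood
observed = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.6, 0.3, 0.4, 0.2, 0.1, 0.1]

sd_observed = population_sd(observed)
model_rmse = rmse(observed, predicted)
# Criterion discussed in the text: RMSE compared with half of the SD of the observations
meets_half_sd_rule = model_rmse <= 0.5 * sd_observed
```

In this toy case, the RMSE (about 0.34) exceeds half of the SD (0.25), which mirrors the situation described for the models in this study.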
The statistical indices used to evaluate the performance of the SVM and the SVM-FMV models in terms of the training and the testing dataset are presented in Table 4. As can be seen, 222 flood pixels and 214 non-flood pixels were classified correctly in the training dataset of the SVM model, with a sensitivity and specificity of 79.29% and 81.06%, respectively, while 229 flood pixels and 218 non-flood pixels were classified correctly by the SVM-FMV model, with a sensitivity and specificity of 80.92% and 83.52%, respectively. Overall, the classification accuracies of the SVM and SVM-FMV reached 80.15% and 82.17%, respectively, in terms of the training dataset, and 80.88% and 88.97%, respectively, in terms of the testing dataset. This result indicated that the SVM and SVM-FMV models had a high training accuracy.
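For reference, the commonly used definitions of these statistical indices can be sketched as below. The confusion-matrix counts are hypothetical, and the formulas are the standard ones; the exact formulas applied in this study are defined outside this section and may differ.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard sensitivity, specificity, and accuracy from a binary confusion matrix.

    tp: flood points predicted as flood      fn: flood points predicted as non-flood
    tn: non-flood points predicted as such   fp: non-flood points predicted as flood
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts, not taken from Table 4
sens, spec, acc = classification_metrics(tp=80, fn=20, tn=90, fp=10)
```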
After the normalization, the flood susceptibility indices obtained using the SVM (FSISVM) and SVM-FMV (FSISVM-FMV) models were classified into five classes using the natural break method (Figure 6). In terms of the FSISVM, the first class of values (0–0.11) identified the zones with a very low flood susceptibility, accounting for 42.79% of the study area. The low (0.11–0.30), moderate (0.30–0.53), and high (0.53–0.78) susceptibility zones accounted for 22.43%, 12.86%, and 11.57% of the study area, respectively. Approximately 10.35% of the study area had a very high flood susceptibility. In terms of the FSISVM-FMV, the very low (0–0.07) and low (0.07–0.30) flood susceptibility classes together accounted for 77.02% of the Dadu River Basin, while the moderate (0.30–0.58) flood susceptibility class accounted for only 6.31%. The high (0.58–0.82) and very high (0.82–1.00) classes accounted for 8.74% and 7.92%, respectively.
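The normalization and class-share calculation can be sketched as follows. The break values are the FSISVM class limits reported above; the natural break computation itself is not shown, and the raw index values, the min-max form of the normalization, and the function names are hypothetical.

```python
from bisect import bisect_right

CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]
SVM_BREAKS = [0.11, 0.30, 0.53, 0.78]  # class limits reported for the FSI of the SVM model

def normalize(values):
    """Min-max normalization to the range 0-1 (assumed form of the normalization)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def classify(fsi, breaks=SVM_BREAKS):
    """Return the susceptibility class of a normalized index value."""
    return CLASS_NAMES[bisect_right(breaks, fsi)]

def area_share(fsi_values, breaks=SVM_BREAKS):
    """Percentage of pixels (area) falling into each susceptibility class."""
    counts = {name: 0 for name in CLASS_NAMES}
    for v in fsi_values:
        counts[classify(v, breaks)] += 1
    return {name: 100.0 * c / len(fsi_values) for name, c in counts.items()}

raw_index = [2.0, 2.5, 3.0, 4.0, 5.5, 7.0, 9.0, 12.0]  # hypothetical model outputs
fsi = normalize(raw_index)
shares = area_share(fsi)
```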

4.3.2. CART and CART-FMV

The CART and CART-FMV models were constructed from the training dataset using the 5-fold cross-validation method. After the cross-validation procedure, according to the minimum RMSEtesting, the optimal CART and CART-FMV trees were built (Figure 7). Note that the two trees were not pruned due to their simplicity and good performance.
In terms of the training dataset (Table 4), it can be seen that the CART and CART-FMV correctly classified 253 and 245 flood pixels with sensitivities of 84.47% and 80.59%, respectively. The CART and CART-FMV had specificities of 92.34% and 88.75%, respectively, correctly classifying 229 and 213 non-flood pixels, respectively. Overall, the classification accuracies of the CART and CART-FMV reached 88.60% and 84.49%, respectively. In terms of the testing dataset, the classification accuracies of the CART and CART-FMV models reached 83.09% and 88.97%, respectively. These values showed that the two models had a good training performance.
For the FSICART and FSICART-FMV, the natural break method was also used to reclassify them into five classes (Figure 6). For the FSICART, the very low flood susceptibility class accounted for 59.10% of the study area, followed by the very high (14.75%), low (11.51%), moderate (7.37%), and high (7.28%) classes. However, for the FSICART-FMV, the situation was somewhat different. Although the very low class also accounted for the largest area (53.78%), the moderate class ranked second in area (18.12%), rather than the very high class as in the FSICART. Notably, the very high class also accounted for a large proportion (12.16%) of the study area. The low class and high class accounted for 7.04% and 8.90%, respectively.
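The 5-fold cross-validation and the selection of the structure with the minimum RMSE can be sketched as follows. This is a generic illustration in Python rather than the R workflow used in the study; the sample count, the candidate settings, and their mean RMSE values are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def train_test_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

def select_best(mean_rmse_by_setting):
    """Return the candidate setting with the lowest mean testing RMSE."""
    return min(mean_rmse_by_setting, key=mean_rmse_by_setting.get)

splits = list(train_test_splits(100, k=5))  # 100 is a hypothetical sample count

# Hypothetical mean testing RMSE of three candidate model structures
best_setting = select_best({"setting A": 0.41, "setting B": 0.38, "setting C": 0.44})
```

Each sample is used for testing exactly once, and the structure with the smallest mean testing RMSE is retained.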

4.3.3. CNN and CNN-FMV

Based on the minimum RMSEtesting, after the 5-fold cross-validation and trial-and-error procedure, we determined the final CNN and CNN-FMV model structures (Figure 8). Considering the eight flash flood conditioning factors used in this study, the input shape was determined to be 8 × 1. In the convolutional layer, the rectified linear unit (ReLU) function was applied, which is considered to be the most typical activation function [66]. The number of filters and the kernel size in the convolutional layer were set to 100 and 2, respectively. In addition, the pool size in the pooling layer and the number of units in the fully connected layer were determined to be 2 and 32, respectively.
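The layer sizes implied by this structure can be checked with simple shape arithmetic. The sketch below assumes a stride of 1 and no padding in the convolutional layer, non-overlapping pooling, a flattening step before the fully connected layer, and a single output unit; none of these details are stated in this section, so the resulting numbers are illustrative only.

```python
def conv1d_output_length(input_length, kernel_size, stride=1):
    """Output length of a 1D convolution without padding."""
    return (input_length - kernel_size) // stride + 1

def pool1d_output_length(input_length, pool_size):
    """Output length of 1D pooling with stride equal to the pool size."""
    return input_length // pool_size

# Values reported in the text
input_length, input_channels = 8, 1
filters, kernel_size, pool_size, dense_units = 100, 2, 2, 32

conv_length = conv1d_output_length(input_length, kernel_size)   # 7 positions x 100 filters
pooled_length = pool1d_output_length(conv_length, pool_size)    # 3 positions x 100 filters
flattened = pooled_length * filters                              # 300 features

# Trainable parameters (weights plus biases) under the stated assumptions
conv_params = (kernel_size * input_channels + 1) * filters      # 300
dense_params = (flattened + 1) * dense_units                     # 9632
output_params = dense_units + 1                                  # 33, assumed single output unit
total_params = conv_params + dense_params + output_params       # 9965
```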
Following the training processes of the two models, their classification results were counted and calculated (Table 4). For the CNN model, 187 flood pixels and 237 non-flood pixels were correctly classified, resulting in a sensitivity of 84.23% and a specificity of 73.60%. For the CNN-FMV model, 242 flood pixels and 216 non-flood pixels were correctly classified, resulting in a sensitivity of 81.21% and a specificity of 87.80%. Overall, the classification accuracies of the CNN and CNN-FMV reached 77.94% and 84.19%, respectively, in terms of the training dataset, while they reached 77.21% and 88.24%, respectively, in terms of the testing dataset. Thus, the two models were considered to have good training accuracy.
For the flood susceptibility indices of the CNN (FSICNN) and CNN-FMV (FSICNN-FMV), the range of values was also divided into five classes using the natural break method (Figure 6). In terms of the FSICNN, over 75% of the entire region of the Dadu River Basin had very low and low flood susceptibilities. Notably, the very high class occupied the smallest area, just 5.73%, while the moderate and high classes accounted for 10.41% and 7.48% of the entire study area, respectively. With respect to the FSICNN-FMV, the very low and low classes, between 0 and 0.28, together accounted for 66% of the Dadu River Basin, while the moderate and high classes accounted for 13.76% and 12.12%, respectively. Similar to the FSICNN, the fifth class of the FSICNN-FMV, characterized by very high flood susceptibility, accounted for the smallest area (7.74%).

4.4. Model Validation Results

4.4.1. Statistical Measures

In terms of the validation dataset (Table 5), the CART-FMV had the highest classification accuracy (85.86%) and the highest sensitivity (84.67%) and specificity (87.14%) values, while the classification accuracies of the SVM and CNN were the lowest (78.28%). Notably, the classification accuracy of each of the three individual machine learning models (SVM, CART, and CNN) was lower than that of its corresponding hybrid model (SVM-FMV, CART-FMV, and CNN-FMV).
Based on the SCAI results, the SCAI value of the very high class in the CNN-FMV was the lowest (0.11) and that in the CART was the highest (0.19). In addition, all three individual machine learning models (SVM, CART, and CNN) had higher SCAI values than their corresponding hybrid models (SVM-FMV, CART-FMV, and CNN-FMV).
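The SCAI can be sketched as the ratio between the percentage of the study area occupied by a susceptibility class and the percentage of flood (seed) pixels falling in that class, which is the usual definition of this index. The class percentages below are hypothetical and are not taken from Table 5.

```python
def scai(area_percent, seed_percent):
    """Seed cell area index per class: area share divided by flood (seed) pixel share."""
    return [
        a / s if s > 0 else float("inf")
        for a, s in zip(area_percent, seed_percent)
    ]

# Hypothetical shares for the very low, low, moderate, high, and very high classes
area_percent = [45.0, 25.0, 12.0, 10.0, 8.0]
seed_percent = [5.0, 10.0, 15.0, 25.0, 45.0]

scai_values = scai(area_percent, seed_percent)
# A well-behaved model shows SCAI decreasing from the very low to the very high class
is_decreasing = all(x > y for x, y in zip(scai_values, scai_values[1:]))
```

A low SCAI in the very high class means that many flood pixels are concentrated in a small area, which is why lower values are interpreted as better performance.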

4.4.2. ROC Curve

The success rate curves, which were computed using the training dataset, are shown in Figure 9a. From these success rate curves, it can be seen that the CNN-FMV hybrid model had the highest performance (AUC = 0.935), followed by the CART-FMV hybrid model (AUC = 0.915) and the SVM-FMV model (AUC = 0.884). Compared to the three hybrid models, the three individual machine learning models had lower AUC values in terms of the training datasets, with AUC values of 0.884, 0.887, and 0.857 for the SVM, CART, and CNN, respectively.
Similar to the success rate curves, the prediction rates were graphically represented as curves using the validation samples (Figure 9b). Again, the CNN-FMV hybrid model had the highest performance (AUC = 0.912), followed by the CART-FMV and SVM-FMV hybrid models, both of which had an AUC value of 0.898. Consistent with the success rate results, the AUC values of the three individual machine learning models for the prediction rate were lower than the AUC values of their respective hybrid models. The AUC values of the SVM, CART, and CNN were 0.866, 0.893, and 0.857, respectively.
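The AUC underlying both curves can be computed without plotting, using its rank-based interpretation: the probability that a randomly chosen flood sample receives a higher predicted value than a randomly chosen non-flood sample. The sketch below is a generic illustration with hypothetical data, not the R routine used in the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as one half)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical samples: 1 = flood, 0 = non-flood
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.4, 0.2, 0.1, 0.1]

example_auc = auc(labels, scores)  # 15 of the 16 flood/non-flood pairs are ordered correctly
```

Computing this value on the training samples gives the success rate AUC, and computing it on the validation samples gives the prediction rate AUC.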

5. Discussion

5.1. Assessment of the Methodology

One element of novelty in this study is the first application of machine learning models to assess the flood susceptibility within the Dadu River Basin. A more important novelty is the first application of the following ensemble models to determine flood susceptibility: SVM-FMV, CART-FMV, and CNN-FMV. As a bivariate statistical method, the FMV has been used for landslide susceptibility mapping [56], but it has rarely been applied in flood susceptibility mapping or in ensemble modeling. In this context, exploring the applicability of the FMV in ensemble modeling is one of the novelties of this study. As shown in Equations (2) and (3), one of the advantages of the FMV is that it can be implemented simply because the calculation is based on the FR. Therefore, it has great potential for future practical application. In addition, as with all bivariate statistical models, the FMV provides a good representation of the relationships between the flood conditioning factors and the flood occurrence. However, bivariate statistical methods lack the ability to capture the hidden characteristics of floods because of the complex triggering mechanism of floods [17]. Fortunately, machine learning models can reflect more of the high-dimensional relationships between the non-linearly related input variables [76]. As a popular machine learning model, the SVM is widely used in the assessment of flood and even landslide susceptibility [23,77]. This algorithm has been demonstrated to have an excellent generalization ability [23]. As a decision tree algorithm, the CART is popularly used in the assessment of natural disaster susceptibility. The data distribution and the existence of data outliers do not have a large impact on its results, which is one of the advantages of this algorithm [78].
The CNN, as one of the most popular deep learning techniques, is able to obtain reliable results comparable to or superior to those of conventional machine learning methods [79]. In 2020, Wang et al. proposed two CNN frameworks for flood susceptibility prediction [80]. Gang et al. and Khosravi et al. have applied the CNN to the prediction of flood susceptibility in cities and Iran, respectively [66,67]. However, the application of the CNN in flood susceptibility prediction is still rare [80]. It is unclear whether the CNN can be combined with statistical models to improve the accuracy of flood susceptibility prediction. In order to explore this issue, the CNN was combined with the FMV for the first time in this study. Overall, in the three hybrid models presented in this study, the FMV clearly depicts the relationship between the factors and flooding, and it can provide a more appropriate data representation for the machine learning methods than the raw data. Therefore, the hybrid models potentially have better accuracies and performances in flood susceptibility mapping than the single machine learning models.

5.2. Assessment of the Model Performances

According to the model training results (Table 4 and Figure 9a), the classification accuracies ranged from 77.94% for the CNN to 88.60% for the CART, and the AUC values ranged from 0.884 for CNN to 0.935 for CNN-FMV. According to the model validation results (Table 5 and Figure 9b), the classification accuracies ranged from 78.28% for the SVM and CNN to 85.86% for the CART-FMV, and the AUC values ranged from 0.857 for the CNN to 0.912 for the CNN-FMV. These classification accuracies and AUC values were higher than 75% and 0.80, respectively, which showed that all six models had acceptable fitting accuracies and good prediction performances. Therefore, the applications of these six models in this study were successful. However, in terms of the training dataset (Table 4 and Figure 9a), it can be seen that the CART had the best classification accuracy of 88.60%, while the CNN-FMV hybrid model had the highest AUC value of 0.935. In terms of the validation dataset (Table 5 and Figure 9b), the model with the highest classification accuracy was the CART-FMV hybrid model (85.86%), while the model with the highest AUC value was the CNN-FMV hybrid model (0.912). This difference arises because, when calculating the classification accuracy, this study treated predicted values greater than 0.5 as 1 (flood point) and all other values as 0 (non-flood point), whereas the AUC is calculated over all possible thresholds [81]. Therefore, the ranking of the models based on the classification accuracy and the ranking based on the AUC value may be different, a phenomenon that has also appeared in many studies [17,19,27,28,43,69,70,81]. In other words, the model with the maximum classification accuracy does not necessarily have the highest AUC value.
However, as a useful method for representing the quality of probabilistic natural disaster susceptibility classifiers, the ROC curve indicates that the CNN-FMV hybrid model had the highest success-rate and prediction-rate AUC values, 0.935 and 0.912, respectively. This result indicates that the CNN-FMV hybrid model had the best fitting and prediction performances in this study. In addition, the SCAI results revealed that the CNN-FMV hybrid model had the lowest SCAI value (0.11), which also indicates that it had the best result in terms of the validation results. This is because the FMV can express the degree of correlation between each class in the factors and the development of floods. Moreover, the CNN can consider the topographical information of the surrounding environment to achieve a higher performance [67].
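The point that accuracy at a fixed 0.5 threshold and the AUC can rank models differently is illustrated by the following minimal sketch. The two sets of predicted values are hypothetical and were constructed so that one model orders all samples correctly but is poorly calibrated around 0.5.

```python
def accuracy_at_threshold(labels, scores, threshold=0.5):
    """Share of samples classified correctly when scores above the threshold count as flood."""
    predictions = [1 if s > threshold else 0 for s in scores]
    return sum(p == lab for p, lab in zip(predictions, labels)) / len(labels)

def auc(labels, scores):
    """Rank-based AUC: probability that a flood sample outranks a non-flood sample."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.2, 0.4, 0.3, 0.3, 0.6]          # hypothetical, well spread scores
model_b = [0.45, 0.44, 0.43, 0.42, 0.10, 0.10, 0.10, 0.10]  # hypothetical, perfect ordering

acc_a, auc_a = accuracy_at_threshold(labels, model_a), auc(labels, model_a)
acc_b, auc_b = accuracy_at_threshold(labels, model_b), auc(labels, model_b)
```

Here the first model has the higher accuracy (0.75 versus 0.50), while the second has the higher AUC (1.00 versus 0.75), because the AUC depends only on the ordering of the predicted values and not on the chosen threshold.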
Another important finding was obtained from the model validation results (Table 5 and Figure 9). Compared with their single machine learning models, the AUC values of the SVM-FMV, CART-FMV, and CNN-FMV were 0.032, 0.005, and 0.055 higher; their SCAI values were 0.05, 0.03, and 0.02 lower; and their classification accuracies were 4.48%, 1.38%, and 5.86% higher, respectively. These results suggest that the three hybrid models proposed in this study achieve some degree of accuracy improvement compared to their respective single machine learning models. Therefore, these three models can be used as a benchmark for future studies, the main scope of which will be to assess the flash flood susceptibility in specific areas. However, some studies on ensemble modeling of flash flood susceptibility have shown higher AUC values for their hybrid models. For example, Costache et al. showed that the ADT-IOE has an excellent capability for flood susceptibility prediction, with AUC = 0.972 in the Suha River Basin [27]. At first glance, it seems that the accuracy of the hybrid models proposed in this study is lower than that of the ADT-IOE model. However, the Suha River Basin (363 km²) is much smaller than our study basin (90,016 km²). Therefore, it is reasonable for the models to have lower accuracy in a larger area. For a large area, the study of Dodangeh et al. showed that the SVR-HS model had a lower accuracy (AUC = 0.75) than the hybrid models proposed in this study [81], although the area of the basin studied in their research (18,644 km²) is less than a quarter of that of the Dadu River Basin. Therefore, the proposed hybrid models are promising for flash flood susceptibility prediction, especially in a large basin.
The RMSE results of this study show that these models are not perfect, because the RMSE is not smaller than half of the SD. However, a review of related previous studies [14,81] shows that the SD of the predicted values was also almost equal to the RMSE in those studies. This indicates that the SD and RMSE results obtained in this study are not perfect but are reasonable. This may be one of the limitations of flood susceptibility mapping based on machine learning models compared with hydrological models. As shown in the study of Kastridis et al., results based on a hydrological model may have a better RMSE [75].
In terms of the FSI results of the six models, the percentages of the high and very high FSI values ranged from 13.21% for the CNN to 22.03% for the CART. These zones were mainly distributed in the southern part of the Dadu River Basin, where the terrain is relatively flat and the precipitation is high. In addition, the valley areas in the northern part of the Dadu River Basin are similarly characterized by high or very high flood susceptibilities. The above two spatial distribution characteristics of the high and very high flood susceptibility classes are similar to the results obtained by Costache and Bui [19].
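The AUC improvements quoted above follow directly from the prediction-rate AUC values reported in Section 4.4.2, as the short check below shows. Only the dictionary layout and variable names are introduced here for illustration.

```python
# Prediction-rate AUC values reported for the validation samples
validation_auc = {
    "SVM": 0.866, "SVM-FMV": 0.898,
    "CART": 0.893, "CART-FMV": 0.898,
    "CNN": 0.857, "CNN-FMV": 0.912,
}

# Gain of each hybrid model over its single machine learning counterpart
auc_gain = {
    name: round(validation_auc[f"{name}-FMV"] - validation_auc[name], 3)
    for name in ("SVM", "CART", "CNN")
}

largest_gain_model = max(auc_gain, key=auc_gain.get)
```

The CNN benefits most from the FMV inputs, while the gain for the CART is marginal.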

5.3. Applications and Limitations

The three hybrid models (SVM-FMV, CART-FMV, and CNN-FMV) proposed in this study have been demonstrated to be excellent at predicting flood susceptibility. Thus, they can be applied to any other area as an effective method of identifying flood-prone areas. In addition, these models also have the potential to be applied to the susceptibility assessment of other natural disasters, such as landslides and mudslides. Furthermore, the results of this study (the flood susceptibility map) may be useful in helping the local authorities take the most appropriate measures to mitigate the negative effects of flash floods.
However, several limitations exist in this study. The results generated by these three hybrid models cannot describe the details of flooding, including the flood inundation extent, water depth, and velocity [82]. For more detailed scales, such as a river sector, the development of a combined hydraulic model is recommended, which can more easily take into account anthropogenic influences in order to assess the extent of flooding for different probability estimates of flow [43]. In the modelling process of this study, only the typical CNN architecture (1D-CNN) was used, and the use of higher dimensional architectures (2D-CNN and 3D-CNN) is suggested for future studies. In addition, using binary values (0, 1) for flood absence and presence cannot reflect the flood frequency at each point, which should be considered in future studies where possible [81]. Furthermore, only the location of the flood events was taken into account when preparing the flood sample, not the date of the occurrence. However, the date of the flooding could reflect the effects of the changes in certain factors on the occurrence of flooding, such as land use changes. This topic is also an interesting future research direction. In terms of the flood factor selection, more factors could be considered, such as slope length and curvature, which may contribute to better prediction accuracy. In terms of the results, we did not list the importance of each factor. This is because the main purpose of this study was to compare the performances of the novel hybrid models and because, unlike the CART, neither the SVM nor the CNN algorithm can be used to directly calculate the importance of the factors. Note that the FSI results of the six models have some degree of variation (Figure 6). As was described by Shafizadeh-Moghadam et al., each method creates different results [24]. Thus, the technique of combining the results of individual models, which is considered to generate more generalizable results, could be applied in future studies. In addition, integrating machine learning and physical simulation methods for flood susceptibility mapping is worth considering in future studies.

6. Conclusions

Flash flood events are becoming more frequent worldwide. Therefore, the accurate identification of areas prone to flash floods is particularly important in flash flood prevention and mitigation. In this study, we proposed three hybrid models (SVM-FMV, CART-FMV, and CNN-FMV) to identify the areas prone to flash floods within the Dadu River Basin. Then, we evaluated the capabilities of these three hybrid models for flood susceptibility prediction and compared them with three single machine learning models. The ROC curves revealed that the CNN-FMV hybrid model had the best fitting (AUC value = 0.935) and prediction performance (AUC value = 0.912). This result is attributed to the fact that the FMV clearly depicts the relationship between the factors and flooding, and the CNN achieves a higher performance by considering the topographical information of the surrounding environment. In addition, according to the validation results of the ROC, the statistical measures, and the SCAI, the three novel hybrid models proposed in this study all outperformed their respective single machine learning models in terms of flood susceptibility prediction. Compared with their respective single machine learning models, the AUC values of the SVM-FMV, CART-FMV, and CNN-FMV were 0.032, 0.005, and 0.055 higher; their SCAI values were 0.05, 0.03, and 0.02 lower; and their classification accuracies were 4.48%, 1.38%, and 5.86% higher, respectively. Therefore, these three hybrid models can be used as a reference for future studies involving flood susceptibility predictions and even for predicting other natural disasters in a given area. The FSI results obtained in this study revealed that the proportion of the area with high and very high flood susceptibilities ranged from 13.21% for the CNN to 22.03% for the CART. These zones were mainly distributed in the southern part of the Dadu River Basin, where the terrain is relatively flat and the precipitation is high.
However, flash flood susceptibility mapping cannot describe the details of flooding, such as the flood inundation extent, and cannot quantify the impact of flood management measures. In future studies, the development of a combined hydraulic model is recommended.

Author Contributions

Conceptualization, J.L. and J.X.; methodology, J.L.; software, J.W.; validation, Z.Y. and H.S.; formal analysis, N.W.; investigation, W.C.; resources, J.L.; data curation, J.W.; writing—original draft preparation, J.L.; writing—review and editing, J.X.; visualization, W.C.; supervision, J.X.; project administration, H.S.; funding acquisition, J.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key R & D project of Sichuan Science and Technology Department (Grant No. 2021YFQ0042), Key R & D project of Sichuan Science and Technology Department (Grant No. 21QYCX0016), National Key R&D Program of China (2020YFD1100701), Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA20030302), Science and Technology Project of Xizang Autonomous Region (Grant No. XZ201901-GA-07), National Flash Flood Investigation and Evaluation Project (Grant No. SHZH-IWHR-57), and Project from Science and Technology Bureau of Altay Region in Yili Kazak Autonomous Prefecture (Grant No. Y99M4600AL).

Institutional Review Board Statement

The study did not involve humans or animals.

Informed Consent Statement

The study did not involve humans or animals.

Data Availability Statement

Precipitation data (GPM) are available at https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGDE_06/summary?keywords=GPM (accessed on 26 March 2021). DEM data are available at http://www.gscloud.cn/ (accessed on 8 May 2021). Land use data are available at https://www.resdc.cn/ (accessed on 6 April 2021). NDVI data are available at https://www.geodata.cn/ (accessed on 23 April 2021). River data and flood inventory map are available at the National Flash Flood Investigation and Evaluation Project.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Antonetti, M.; Horat, C.; Sideris, I.V.; Zappa, M. Ensemble flood forecasting considering dominant runoff processes—Part 1: Set-up and application to nested basins (Emme, Switzerland). Nat. Hazards Earth Syst. Sci. 2019, 19, 19–40.
  2. Shen, G.; Hwang, S.N. Spatial–Temporal snapshots of global natural disaster impacts Revealed from EM-DAT for 1900-2015. Geomat. Nat. Hazards Risk 2019, 10, 912–934.
  3. Arnell, N.W.; Gosling, S.N. The impacts of climate change on river flood risk at the global scale. Clim. Chang. 2016, 134, 387–401.
  4. Xiong, J.N.; Ye, C.C.; Cheng, W.M.; Guo, L.; Zhou, C.H.; Zhang, X.L. The Spatiotemporal Distribution of Flash Floods and Analysis of Partition Driving Forces in Yunnan Province. Sustainability 2019, 11, 2926–2944.
  5. Barredo, J.I. Major flood disasters in Europe: 1950–2005. Nat. Hazards 2007, 42, 125–148.
  6. Pereira, S.; Diakakis, M.; Deligiannakis, G.; Zezere, J.L. Comparing flood mortality in Portugal and Greece (Western and Eastern Mediterranean). Int. J. Disaster Risk Reduct. 2017, 22, 147–157.
  7. Ngo, P.-T.T.; Hoang, N.-D.; Pradhan, B.; Nguyen, Q.K.; Tran, X.T.; Nguyen, Q.M.; Nguyen, V.N.; Samui, P.; Tien Bui, D. A Novel Hybrid Swarm Optimized Multilayer Neural Network for Spatial Prediction of Flash Floods in Tropical Areas Using Sentinel-1 SAR Imagery and Geospatial Data. Sensors 2018, 18, 3704.
  8. Vogel, R. Methodology and software solutions for multicriteria evaluation of floodplain retention suitability. Cartogr. Geogr. Inf. Sci. 2016, 43, 301–320.
  9. Chowdary, V.M.; Chakraborthy, D.; Jeyaram, A.; Murthy, Y.V.N.K.; Sharma, J.R.; Dadhwal, V.K. Multi-Criteria Decision Making Approach for Watershed Prioritization Using Analytic Hierarchy Process Technique and GIS. Water Resour. Manag. 2013, 27, 3555–3571.
  10. Knebl, M.R.; Yang, Z.L.; Hutchison, K.; Maidment, D.R. Regional scale flood modeling using NEXRAD rainfall, GIS, and HEC-HMS/RAS: A case study for the San Antonio River Basin Summer 2002 storm event. J. Environ. Manag. 2005, 75, 325–336.
  11. Tehrany, M.; Jones, S.; Shabani, F. Identifying the essential flood conditioning factors for flood prone area mapping using machine learning techniques. Catena 2018, 175, 174–192.
  12. Pham, B.T.; Luu, C.; Dao, D.V.; Phong, T.V.; Nguyen, H.D.; Le, H.V.; von Meding, J.; Prakash, I. Flood risk assessment using deep learning integrated with multi-criteria decision analysis. Knowl.-Based Syst. 2021, 219, 15.
  13. Malik, S.; Pal, S.C.; Arabameri, A.; Chowdhuri, I.; Saha, A.; Chakrabortty, R.; Roy, P.; Das, B. GIS-based statistical model for the prediction of flood hazard susceptibility. Environ. Dev. Sustain. 2021, 23, 16713–16743.
  14. Panahi, M.; Dodangeh, E.; Rezaie, F.; Khosravi, K.; Le, H.; Lee, M.J.; Lee, S.; Pham, B.T. Flood spatial prediction modeling using a hybrid of meta-optimization and support vector regression modeling. Catena 2021, 199, 15.
  15. Natarajan, L.; Usha, T.; Gowrappan, M.; Kasthuri, B.P.; Moorthy, P.; Chokkalingam, L. Flood Susceptibility Analysis in Chennai Corporation Using Frequency Ratio Model. J. Indian Soc. Remote Sens. 2021, 49, 1533–1543.
  16. Rahmati, O.; Pourghasemi, H.R.; Zeinivand, H. Flood susceptibility mapping using frequency ratio and weights-of-evidence models in the Golastan Province, Iran. Geocarto Int. 2016, 31, 42–70.
  17. Wang, Y.; Fang, Z.; Hong, H.; Costache, R.; Tang, X. Flood susceptibility mapping by integrating frequency ratio and index of entropy with multilayer perceptron and classification and regression tree. J. Environ. Manag. 2021, 289.
  18. Costache, R.; Hong, H.; Quoc Bao, P. Comparative assessment of the flash-flood potential within small mountain catchments using bivariate statistics and their novel hybrid integration with machine learning models. Sci. Total Environ. 2020, 711.
  19. Costache, R.; Bui, D.T. Spatial prediction of flood potential using new ensembles of bivariate statistics and artificial intelligence: A case study at the Putna river catchment of Romania. Sci. Total Environ. 2019, 691, 1098–1118.
  20. Zhu, Z.J.; Zhang, Y. Flood disaster risk assessment based on random forest algorithm. Neural Comput. Appl. 2021.
  21. Tehrany, M.S.; Pradhan, B.; Jebur, M.N. Flood susceptibility analysis and its verification using a novel ensemble support vector machine and frequency ratio method. Stoch. Environ. Res. Risk Assess. 2015, 29, 1149–1165.
  22. Zhao, G.; Pang, B.; Xu, Z.X.; Yue, J.J.; Tu, T.B. Mapping flood susceptibility in mountainous areas on a national scale in China. Sci. Total Environ. 2018, 615, 1133–1142.
  23. Choubin, B.; Moradi, E.; Golshan, M.; Adamowski, J.; Sajedi-Hosseini, F.; Mosavi, A. An ensemble prediction of flood susceptibility using multivariate discriminant analysis, classification and regression trees, and support vector machines. Sci. Total Environ. 2019, 651, 2087–2096.
  24. Shafizadeh-Moghadam, H.; Valavi, R.; Shahabi, H.; Chapi, K.; Shirzadi, A. Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping. J. Environ. Manag. 2018, 217, 1–11.
  25. Kia, M.B.; Pirasteh, S.; Pradhan, B.; Mahmud, A.R.; Sulaiman, W.N.A.; Moradi, A. An artificial neural network model for flood simulation using GIS: Johor River Basin, Malaysia. Environ. Earth Sci. 2012, 67, 251–264.
  26. Tehrany, M.S.; Pradhan, B.; Jebur, M.N. Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS. J. Hydrol. 2014, 512, 332–343.
  27. Costache, R.; Bui, D.T. Identification of areas prone to flash-flood phenomena using multiple-criteria decision-making, bivariate statistics, machine learning and their ensembles. Sci. Total Environ. 2020, 712.
  28. Costache, R.D.; Thao, N.T.P.; Bui, D.T. Novel Ensembles of Deep Learning Neural Network and Statistical Learning for Flash-Flood Susceptibility Mapping. Water 2020, 12, 1549–1573.
  29. Yang, Y.; Tang, G.Q.; Lei, X.H.; Hong, Y.; Yang, N. Can Satellite Precipitation Products Estimate Probable Maximum Precipitation: A Comparative Investigation with Gauge Data in the Dadu River Basin. Remote Sens. 2018, 10, 41.
  30. Hou, Y.; Liu, R.; Zhao, J. Characteristics of flood disaster in dry-hot valley of Dadu river in Hanyuan county, Sichuan province in Qing Dynasty. Bull. Soil Water Conserv. 2019, 39, 271–277.
  31. Zhang, D. Research on Uncertainty Analysis of Flood Frequency; China Institute of Water Resources & Hydropower Research: Beijing, China, 2015.
  32. Yuan, X.M.; Liu, Y.S.; Huang, Y.H.; Tian, F.C. An approach to quality validation of large-scale data from the Chinese Flash Flood Survey and Evaluation (CFFSE). Nat. Hazards 2017, 89, 693–704.
  33. Liu, Y.S.; Yuan, X.M.; Guo, L.; Huang, Y.H.; Zhang, X.L. Driving Force Analysis of the Temporal and Spatial Distribution of Flash Floods in Sichuan Province. Sustainability 2017, 9, 1527.
  34. Xiong, J.N.; Pang, Q.; Cheng, W.M.; Wang, N.; Yong, Z.W. Reservoir risk modelling using a hybrid approach based on the feature selection technique and ensemble methods. Geocarto Int. 2020, 1–25.
  35. Hosseini, F.S.; Choubin, B.; Mosavi, A.; Nabipour, N.; Shamshirband, S.; Darabi, H.; Haghighi, A.T. Flash-flood hazard assessment using ensembles and Bayesian-based machine learning models: Application of the simulated annealing feature selection method. Sci. Total Environ. 2020, 711, 14.
  36. Khosravi, K.; Pham, B.T.; Chapi, K.; Shirzadi, A.; Shahabi, H.; Revhaug, I.; Prakash, I.; Bui, D.T. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ. 2018, 627, 744–755.
  37. Khosravi, K.; Shahabi, H.; Pham, B.T.; Adamowski, J.; Shirzadi, A.; Pradhan, B.; Dou, J.; Ly, H.B.; Grof, G.; Ho, H.L.; et al. A comparative assessment of flood susceptibility modeling using Multi-Criteria Decision-Making Analysis and Machine Learning Methods. J. Hydrol. 2019, 573, 311–323.
  38. Bout, B.; Jetten, V.G. The validity of flow approximations when simulating catchment-integrated flash floods. J. Hydrol. 2018, 556, 674–688. [Google Scholar] [CrossRef]
  39. Chaabani, C.; Chini, M.; Abdelfattah, R.; Hostache, R.; Chokmani, K. Flood Mapping in a Complex Environment Using Bistatic TanDEM-X/TerraSAR-X InSAR Coherence. Remote Sens. 2018, 10, 1873. [Google Scholar] [CrossRef] [Green Version]
  40. Regmi, A.D.; Devkota, K.C.; Yoshida, K.; Pradhan, B.; Pourghasemi, H.R.; Kumamoto, T.; Akgun, A. Application of frequency ratio, statistical index, and weights-of-evidence models and their comparison in landslide susceptibility mapping in Central Nepal Himalaya. Arab. J. Geosci. 2014, 7, 725–742. [Google Scholar] [CrossRef]
  41. Arora, A.; Arabameri, A.; Pandey, M.; Siddiqui, M.A.; Shukla, U.K.; Bui, D.T.; Mishra, V.N.; Bhardwaj, A. Optimization of state-of-the-art fuzzy-metaheuristic ANFIS-based machine learning models for flood susceptibility prediction mapping in the Middle Ganga Plain, India. Sci. Total Environ. 2021, 750, 21. [Google Scholar] [CrossRef] [PubMed]
  42. Chapi, K.; Singh, V.P.; Shirzadi, A.; Shahabi, H.; Bui, D.T.; Pham, B.T.; Khosravi, K. A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ. Model. Softw. 2017, 95, 229–245. [Google Scholar] [CrossRef]
  43. Ali, S.A.; Parvin, F.; Pham, Q.B.; Vojtek, M.; Vojtekova, J.; Costache, R.; Linh, N.T.T.; Nguyen, H.Q.; Ahmad, A.; Ghorbani, M.A. GIS-based comparative assessment of flood susceptibility mapping using hybrid multi-criteria decision-making approach, naive Bayes tree, bivariate statistics and logistic regression: A case of Topla basin, Slovakia. Ecol. Indic. 2020, 117, 23. [Google Scholar] [CrossRef]
  44. Liu, J.F.; Wang, X.Q.; Zhang, B.; Li, J.; Zhang, J.Q.; Liu, X.J. Storm flood risk zoning in the typical regions of Asia using GIS technology. Nat. Hazards 2017, 87, 1691–1707. [Google Scholar] [CrossRef]
  45. Jin, X.; Shao, H.; Zhang, C.; Yan, Y. The Applicability Evaluation of Three Satellite Products in Tianshan Mountains. J. Nat. Resour. 2016, 31, 2074–2085. [Google Scholar]
  46. Chen, X.; Zhong, R.; Wang, Z.; Lai, C.; Zhang, J. Evaluation on the accuracy and hydrological performance of the latest-generation GPM IMERG product over South China. J. Hydraul. Eng. 2017, 48, 1147–1156. [Google Scholar]
  47. Peng, Y.; Wang, Q.H.; Wang, H.T.; Lin, Y.Y.; Song, J.Y.; Cui, T.T.; Fan, M. Does landscape pattern influence the intensity of drought and flood? Ecol. Indic. 2019, 103, 173–181. [Google Scholar] [CrossRef]
  48. Powell, S.J.; Jakeman, A.; Croke, B. Can NDVI response indicate the effective flood extent in macrophyte dominated floodplain wetlands? Ecol. Indic. 2014, 45, 486–493. [Google Scholar] [CrossRef]
  49. Bui, D.T.; Ho, T.C.; Pradhan, B.; Pham, B.T.; Nhu, V.H.; Revhaug, I. GIS-based modeling of rainfall-induced landslides using data mining-based functional trees classifier with AdaBoost, Bagging, and MultiBoost ensemble frameworks. Environ. Earth Sci. 2016, 75, 22. [Google Scholar] [CrossRef]
  50. Yang, Q.L.; Shao, J.M.; Scholz, M.; Plant, C. Feature selection methods for characterizing and classifying adaptive Sustainable Flood Retention Basins. Water Res. 2011, 45, 993–1004. [Google Scholar] [CrossRef] [PubMed]
  51. Chen, W.; Peng, J.B.; Hong, H.Y.; Shahabi, H.; Pradhan, B.; Liu, J.Z.; Zhu, A.X.; Pei, X.J.; Duan, Z. Landslide susceptibility modelling using GIS-based machine learning techniques for Chongren County, Jiangxi Province, China. Sci. Total Environ. 2018, 626, 1121–1135. [Google Scholar] [CrossRef] [PubMed]
  52. Dou, J.; Yunus, A.P.; Bui, D.T.; Merghadi, A.; Sahana, M.; Zhu, Z.F.; Chen, C.W.; Khosravi, K.; Yang, Y.; Pham, B.T. Assessment of advanced random forest and decision tree algorithms for modeling rainfall-induced landslide susceptibility in the Izu-Oshima Volcanic Island, Japan. Sci. Total Environ. 2019, 662, 332–346. [Google Scholar] [CrossRef] [PubMed]
  53. Gheshlaghi, H.A.; Feizizadeh, B. An integrated approach of analytical network process and fuzzy based spatial decision making systems applied to landslide risk mapping. J. Afr. Earth Sci. 2017, 133, 15–24. [Google Scholar] [CrossRef]
  54. Yaghoob Nejad Asl, N. Application of fuzzy logic in the evaluation of land suitability for urban. Sci. J. Iran. Geogr. Assoc 2013, 36, 231–249. [Google Scholar]
  55. Hong, H.Y.; Ilia, I.; Tsangaratos, P.; Chen, W.; Xu, C. A hybrid fuzzy weight of evidence method in landslide susceptibility analysis on the Wuyuan area, China. Geomorphology 2017, 290, 1–16. [Google Scholar] [CrossRef]
  56. Gheshlaghi, H.A.; Feizizadeh, B. GIS-based ensemble modelling of fuzzy system and bivariate statistics as a tool to improve the accuracy of landslide susceptibility mapping. Nat. Hazards 2021, 107, 1981–2014. [Google Scholar] [CrossRef]
  57. Choubin, B.; Darabi, H.; Rahmati, O.; Sajedi-Hosseini, F.; Klove, B. River suspended sediment modelling using the CART model: A comparative study of machine learning techniques. Sci. Total Environ. 2018, 615, 272–281. [Google Scholar] [CrossRef] [PubMed]
  58. Naghibi, S.A.; Ahmadi, K.; Daneshi, A. Application of Support Vector Machine, Random Forest, and Genetic Algorithm Optimized Random Forest Models in Groundwater Potential Mapping. Water Resour. Manag. 2017, 31, 2761–2775. [Google Scholar] [CrossRef]
  59. Pham, Q.B.; Yang, T.C.; Kuo, C.M.; Tseng, H.W.; Yu, P.S. Combing Random Forest and Least Square Support Vector Regression for Improving Extreme Rainfall Downscaling. Water 2019, 11, 451. [Google Scholar] [CrossRef] [Green Version]
  60. Vapnik, V.N. The Nature of Statistical Learning Theory. IEEE Trans. Neural Netw. 1995, 8, 1564. [Google Scholar] [CrossRef]
  61. Xu, C.; Dai, F.; Xu, X.; Lee, Y. GIS-based support vector machine modeling of earthquake-triggered landslide susceptibility in the Jianjiang River watershed, China. Geomorphology 2012, 145–146, 70–80. [Google Scholar] [CrossRef]
  62. Ripley, B. Tree: Classification and Regression Trees. Available online: https://cran.r-project.org/web/packages/tree/tree.pdf (accessed on 17 October 2021).
  63. Choubin, B.; Zehtabian, G.; Azareh, A.; Rafiei-Sardooi, E.; Sajedi-Hosseini, F.; Kisi, O. Precipitation forecasting using classification and regression trees (CART) model: A comparative study of different approaches. Environ. Earth Sci. 2018, 77. [Google Scholar] [CrossRef]
  64. Elmahdy, S.; Ali, T.; Mohamed, M. Flash Flood Susceptibility Modeling and Magnitude Index Using Machine Learning and Geohydrological Models: A Modified Hybrid Approach. Remote Sens. 2020, 12, 2695. [Google Scholar] [CrossRef]
  65. Li, H.; Sun, J.; Wu, J. Predicting business failure using classification and regression tree: An empirical comparison with popular classical statistical methods and top classification mining methods. Expert Syst. Appl. 2010, 37, 5895–5904. [Google Scholar] [CrossRef]
  66. Khosravi, K.; Panahi, M.; Golkarian, A.; Keesstra, S.D.; Saco, P.M.; Dieu Tien, B.; Lee, S. Convolutional neural network approach for spatial prediction of flood hazard at national scale of Iran. J. Hydrol. 2020, 591. [Google Scholar] [CrossRef]
  67. Zhao, G.; Pang, B.; Xu, Z.; Peng, D.; Zuo, D. Urban flood susceptibility assessment based on convolutional neural networks. J. Hydrol. 2020, 590. [Google Scholar] [CrossRef]
  68. Baratloo, A.; Hosseini, M.; Negida, A.; El Ashal, G. Evidence Based Emergency Medicine; Part 1: Simple Definition and Calculation of Accuracy, Sensitivity and Specificity. Emergency 2015, 3, 48–49. [Google Scholar] [PubMed]
  69. Costache, R.; Quoc Bao, P.; Avand, M.; Nguyen Thi Thuy, L.; Vojtek, M.; Vojtekova, J.; Lee, S.; Dao Nguyen, K.; Pham Thi Thao, N.; Tran Duc, D. Novel hybrid models between bivariate statistics, artificial neural networks and boosting algorithms for flood susceptibility assessment. J. Environ. Manag. 2020, 265. [Google Scholar] [CrossRef] [PubMed]
  70. Arabameri, A.; Saha, S.; Mukherjee, K.; Blaschke, T.; Chen, W.; Ngo, P.T.T.; Band, S.S. Modeling Spatial Flood using Novel Ensemble Artificial Intelligence Approaches in Northern Iran. Remote Sens. 2020, 12, 3423. [Google Scholar] [CrossRef]
  71. Singh, J.; Knapp, H.V.; Arnold, J.G.; De Missie, M. HYDROLOGICAL MODELING OF THE IROQUOIS RIVER WATERSHED USING HSPF AND SWAT. JAWRA J. Am. Water Resour. Assoc. 2005, 41, 343–360. [Google Scholar] [CrossRef]
  72. Gupta, R.P.; Kanungo, D.P.; Arorac, M.K.; Sarkar, S. Approaches for comparative evaluation of raster GIS-based landslide susceptibility zonation maps. Int. J. Appl. Earth Obs. Geoinf. 2008, 10, 330–341. [Google Scholar] [CrossRef]
  73. Sahana, M.; Rehman, S.; Sajjad, H.; Hong, H. Exploring effectiveness of frequency ratio and support vector machine models in storm surge flood susceptibility assessment: A study of Sundarban Biosphere Reserve, India. Catena 2020, 189. [Google Scholar] [CrossRef]
  74. Termeh, S.V.R.; Kornejady, A.; Pourghasemi, H.R.; Keesstra, S. Flood susceptibility mapping using novel ensembles of adaptive neuro fuzzy inference system and metaheuristic algorithms. Sci. Total Environ. 2018, 615, 438–451. [Google Scholar] [CrossRef] [PubMed]
  75. Kastridis, A.; Kirkenidis, C.; Sapountzis, M. An integrated approach of flash flood analysis in ungauged Mediterranean watersheds using post-flood surveys and unmanned aerial vehicles. Hydrol. Process. 2020, 34, 4920–4939. [Google Scholar] [CrossRef]
  76. Hong, H.; Tsangaratos, P.; Ilia, I.; Liu, J.; Zhu, A.X.; Chen, W. Application of fuzzy weight of evidence and data mining techniques in construction of flood susceptibility map of Poyang County, China. Sci. Total Environ. 2018, 625, 575–588. [Google Scholar] [CrossRef] [PubMed]
  77. Huang, Y.; Zhao, L. Review on landslide susceptibility mapping using support vector machines. Catena 2018, 165, 520–529. [Google Scholar] [CrossRef]
  78. Sutton, C.D. Classification and Regression Trees, Bagging, and Boosting. In Handbook of Statistics; Rao, C.R., Wegman, E.J., Solka, J.L., Eds.; Elsevier: Amsterdam, The Netherlands, 2005; Volume 24, pp. 303–329. [Google Scholar]
  79. Farsal, W.; Anter, S.; Ramdani, M.; Assoc Comp, M. Deep Learning: An Overview. In International Conference on Intelligent Systems, Theories and Applications, Proceedings of the 12th International Conference on Intelligent Systems: Theories and Applications, Rabat, Morocco, 24–25 October 2018; Assoc Computing Machinery: New York, NY, USA, 2018; pp. 1–6. [Google Scholar]
  80. Wang, Y.; Fang, Z.C.; Hong, H.Y.; Peng, L. Flood susceptibility mapping using convolutional neural network frameworks. J. Hydrol. 2020, 582, 15. [Google Scholar] [CrossRef]
  81. Dodangeh, E.; Panahi, M.; Rezaie, F.; Lee, S.; Bui, D.T.; Lee, C.W.; Pradhan, B. Novel hybrid intelligence models for flood-susceptibility prediction: Meta optimization of the GMDH and SVR models with the genetic algorithm and harmony search. J. Hydrol. 2020, 590. [Google Scholar] [CrossRef]
  82. Mazzoleni, M.; Bacchi, B.; Barontini, S.; Baldassarre, G.D.; Pilotti, M.; Ranzi, R. Flooding Hazard Mapping in Floodplain Areas Affected by Piping Breaches in the Po River, Italy. J. Hydrol. Eng. 2013, 19, 717–731. [Google Scholar] [CrossRef]
Figure 1. The study area of the Dadu River Basin: (a) digital elevation model (DEM) and flood inventory map of the Dadu River Basin; (b) location of the Dadu River Basin in China; and (c) flooded area in the city of Ya’an. Note: panel (c) is cited from https://www.sohu.com/a/413756156_583574 (accessed on 18 November 2021).
Figure 2. Flash flood conditioning factors: (a) altitude, (b) slope, (c) slope aspect, (d) topographic wetness index (TWI), (e) maximum three-day precipitation (M3DP), (f) land cover, (g) soil texture, (h) normalized difference vegetation index (NDVI), and (i) distance to the river (DR).
Figure 3. Workflow of the methodology used in this study.
Figure 4. Information Gain (IG) values of the flash flood conditioning factors: altitude, slope, slope aspect, topographic wetness index (TWI), maximum three-day precipitation (M3DP), land cover, soil texture, normalized difference vegetation index (NDVI), and distance to the river (DR).
Figure 5. Frequency distribution of the flood pixels and the fuzzy membership values (FMVs) of the factors: (a) altitude, (b) maximum three-day precipitation (M3DP), (c) topographic wetness index (TWI), (d) distance to the river (DR), (e) slope, (f) land cover, (g) soil texture, and (h) slope aspect.
Figure 6. Spatial distributions and area proportions of the flood susceptibility index (FSI) classes: (a) FSI_SVM, (b) FSI_SVM-FMV, (c) FSI_CART, (d) FSI_CART-FMV, (e) FSI_CNN, and (f) FSI_CNN-FMV.
Figure 7. Optimal tree used for (a) the CART model and (b) the CART-FMV hybrid model.
Figure 8. Architecture of the 1D-CNN used in this study.
Figure 9. The ROC curves and AUC values of the six models: (a) success-rate curve and (b) prediction-rate curve.
Table 1. Primary sources for the datasets used in this study.

| Factors | Sub-Factors | Source of Data | Time | Resolution or Scale |
|---|---|---|---|---|
| Flood inventory map | Historical flash flood points | National Flash Flood Investigation and Evaluation Project (NFFIEP) | 1949–2015 | 1:50,000 |
| DEM | Altitude; slope; slope aspect; TWI | Geospatial Data Cloud (www.gscloud.cn) (accessed on 8 May 2021) | 2010 | 30 m × 30 m |
| GPM | Maximum three-day precipitation (M3DP) | National Aeronautics and Space Administration (https://pmm.nasa.gov/precipitation-measurement-missions) (accessed on 26 March 2021) | 2000–2018 | 0.1° × 0.1° |
| Land use | Land cover | Resource and Environment Data Cloud Platform (https://www.resdc.cn/) (accessed on 6 April 2021) | 2010 | 1 km × 1 km |
| Soil | Soil texture | Resource and Environment Data Cloud Platform (https://www.resdc.cn/) (accessed on 16 April 2021) | 2010 | 1 km × 1 km |
| Vegetation | NDVI | National Earth System Science Data Center (https://www.geodata.cn/) (accessed on 23 April 2021) | 2015 | 1 km × 1 km |
| River | Distance to river (DR) | National Flash Flood Investigation and Evaluation Project (NFFIEP) | 2013 | 1:1,000,000 |
Table 2. The results of the multicollinearity analysis of the flash flood conditioning factors.

| Flash Flood Conditioning Factor | Tolerance | VIF |
|---|---|---|
| Altitude | 0.183 | 5.476 |
| M3DP | 0.235 | 4.252 |
| TWI | 0.570 | 1.756 |
| DR | 0.727 | 1.376 |
| Slope | 0.784 | 1.276 |
| Land Cover | 0.800 | 1.251 |
| Soil Texture | 0.934 | 1.071 |
| Slope Aspect | 0.950 | 1.053 |

M3DP is maximum three-day precipitation; TWI is topographic wetness index; DR is distance to the river.
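The two collinearity columns in Table 2 are linked by the standard identity VIF = 1/tolerance, which the short Python sketch below uses to recompute the VIF values from the tabulated tolerances. The cut-off of 10 is a commonly used rule of thumb and is an assumption here, not a threshold stated in this table; the variable names are illustrative only.

```python
# Tolerance and VIF values copied from Table 2.
tolerance = {
    "Altitude": 0.183, "M3DP": 0.235, "TWI": 0.570, "DR": 0.727,
    "Slope": 0.784, "Land Cover": 0.800, "Soil Texture": 0.934,
    "Slope Aspect": 0.950,
}
reported_vif = {
    "Altitude": 5.476, "M3DP": 4.252, "TWI": 1.756, "DR": 1.376,
    "Slope": 1.276, "Land Cover": 1.251, "Soil Texture": 1.071,
    "Slope Aspect": 1.053,
}

# Variance inflation factor is the reciprocal of tolerance.
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

# Assumed rule-of-thumb limit (not taken from the table itself).
VIF_LIMIT = 10.0
flagged = [name for name, value in vif.items() if value > VIF_LIMIT]

for name, value in vif.items():
    diff = abs(value - reported_vif[name])
    print(f"{name:<13} recomputed VIF = {value:.3f} (table: {reported_vif[name]:.3f}, diff {diff:.3f})")
print("Factors above the assumed limit:", flagged)
```

The recomputed values differ from the table in the second or third decimal because the tolerances are themselves rounded to three decimals.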
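Table 3. The fuzzy membership value (FMV) calculation results.

| Factors | Classes | Flood Pixels | Class Pixels | FR | FMV |
|---|---|---|---|---|---|
| Altitude (m) | 337–1494 | 193 | 12,150 | 5.65 | 1.00 |
| | 1494–2497 | 67 | 15,551 | 1.53 | 0.27 |
| | 2497–3373 | 56 | 16,957 | 1.17 | 0.21 |
| | 3373–4078 | 24 | 34,442 | 0.25 | 0.04 |
| | 4078–7304 | 0 | 41,845 | 0.00 | 0.00 |
| M3DP (mm) | 35.99–52.94 | 65 | 41,668 | 0.55 | 0.09 |
| | 52.94–69.90 | 50 | 44,415 | 0.40 | 0.07 |
| | 69.90–91.48 | 74 | 16,184 | 1.63 | 0.27 |
| | 91.48–116.65 | 71 | 13,955 | 1.81 | 0.30 |
| | 116.65–167 | 80 | 4723 | 6.03 | 1.00 |
| TWI | −6.50 to −0.36 | 15 | 45,278 | 0.12 | 0.02 |
| | −0.36 to 3.48 | 24 | 31,427 | 0.27 | 0.03 |
| | 3.48 to 6.44 | 43 | 19,784 | 0.77 | 0.10 |
| | 6.44 to 9.73 | 80 | 16,359 | 1.74 | 0.22 |
| | 9.73 to 21.47 | 178 | 8097 | 7.82 | 1.00 |
| DR (m) | <1000 | 163 | 12,271 | 4.73 | 1.00 |
| | 1000–2000 | 44 | 11,403 | 1.37 | 0.29 |
| | 2000–3000 | 17 | 10,866 | 0.56 | 0.12 |
| | 3000–5000 | 32 | 19,994 | 0.57 | 0.12 |
| | >5000 | 84 | 66,411 | 0.45 | 0.10 |
| Slope (°) | 0–5.36 | 155 | 30,757 | 1.79 | 1.00 |
| | 5.36–10.32 | 70 | 35,319 | 0.71 | 0.39 |
| | 10.32–15.47 | 65 | 28,303 | 0.82 | 0.46 |
| | 15.47–21.66 | 36 | 19,294 | 0.66 | 0.37 |
| | 21.66–52.61 | 14 | 7272 | 0.68 | 0.38 |
| Land Cover | Agriculture land | 146 | 7772 | 3.55 | 0.42 |
| | Forests | 93 | 47,515 | 0.37 | 0.04 |
| | Grassland | 76 | 61,071 | 0.24 | 0.03 |
| | Water | 10 | 808 | 2.34 | 0.28 |
| | Built-up areas | 14 | 313 | 8.45 | 1.00 |
| | Wasteland | 1 | 3466 | 0.05 | 0.01 |
| Soil Texture | Heavy-Clay | 1 | 2577 | 0.15 | 0.01 |
| | Silty-Clay | 115 | 51,601 | 0.85 | 0.05 |
| | Clay | 26 | 1402 | 7.04 | 0.38 |
| | Silty-Clay-Loam | 36 | 22,955 | 0.60 | 0.03 |
| | Clay-Loam | 0 | 554 | 0.00 | 0.00 |
| | Silty-Loam | 4 | 7575 | 0.20 | 0.01 |
| | Loamy-Clay | 103 | 27,590 | 1.42 | 0.08 |
| | Sandy-Clay | 22 | 453 | 18.43 | 1.00 |
| | Loam | 12 | 1749 | 2.60 | 0.14 |
| | Sandy-Clay-Loam | 5 | 172 | 11.03 | 0.60 |
| | Sandy-Loam | 16 | 4030 | 1.51 | 0.08 |
| | Sandy/Loamy-Sand | 0 | 287 | 0.00 | 0.00 |
| Slope Aspect | Flat zones | 0 | 24 | 0.00 | 0.00 |
| | North | 18 | 7380 | 0.87 | 0.60 |
| | North-East | 45 | 16,311 | 0.98 | 0.68 |
| | East | 52 | 17,402 | 1.06 | 0.73 |
| | South-East | 63 | 15,482 | 1.45 | 1.00 |
| | South | 40 | 13,972 | 1.02 | 0.70 |
| | South-West | 42 | 14,249 | 1.05 | 0.72 |
| | West | 38 | 14,866 | 0.91 | 0.63 |
| | North-West | 26 | 14,317 | 0.65 | 0.45 |
| | North | 16 | 6942 | 0.82 | 0.57 |

M3DP is maximum three-day precipitation; TWI is topographic wetness index; DR is distance to the river.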
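As a worked illustration of how the FR and FMV columns of Table 3 relate to the pixel counts, the Python sketch below assumes that FR denotes the usual frequency ratio (share of flood pixels in a class divided by the share of all pixels in that class) and that FMV is FR divided by the largest FR within the factor. Both formulas are inferred from the tabulated numbers rather than quoted from this section, and the function names are hypothetical. Under these assumptions the altitude rows are reproduced to two decimals; not every factor follows (the land cover FR values, for example, differ by a roughly constant factor), so the sketch should be read as illustrative.

```python
from typing import List


def frequency_ratio(flood_pixels: List[int], class_pixels: List[int]) -> List[float]:
    """Assumed definition: (flood share of the class) / (area share of the class)."""
    total_flood = sum(flood_pixels)
    total_pixels = sum(class_pixels)
    return [
        (flood / total_flood) / (area / total_pixels)
        for flood, area in zip(flood_pixels, class_pixels)
    ]


def fuzzy_membership(fr: List[float]) -> List[float]:
    """Assumed definition: scale FR by its maximum so the top class gets 1.0."""
    peak = max(fr)
    return [value / peak for value in fr]


# Altitude classes from Table 3 (337-1494 m up to 4078-7304 m).
altitude_flood = [193, 67, 56, 24, 0]
altitude_pixels = [12150, 15551, 16957, 34442, 41845]

fr = frequency_ratio(altitude_flood, altitude_pixels)
fmv = fuzzy_membership(fr)

print("FR :", [round(v, 2) for v in fr])
print("FMV:", [round(v, 2) for v in fmv])
```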
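Table 4. Statistical indices used to evaluate the models’ performances.

| Dataset | Model | TP | TN | FP | FN | Sensitivity (%) | Specificity (%) | Accuracy (%) | SD | RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| Training | SVM | 222 | 214 | 50 | 58 | 79.29 | 81.06 | 80.15 | 0.41 | 0.37 |
| | SVM-FMV | 229 | 218 | 43 | 54 | 80.92 | 83.52 | 82.17 | 0.38 | 0.36 |
| | CART | 253 | 229 | 19 | 43 | 85.47 | 92.34 | 88.60 | 0.33 | 0.30 |
| | CART-FMV | 245 | 213 | 27 | 59 | 80.59 | 88.75 | 84.49 | 0.36 | 0.35 |
| | CNN | 187 | 237 | 85 | 35 | 84.23 | 73.60 | 77.94 | 0.40 | 0.41 |
| | CNN-FMV | 242 | 216 | 30 | 56 | 81.21 | 87.80 | 84.19 | 0.35 | 0.33 |
| Testing | SVM | 55 | 55 | 13 | 13 | 80.88 | 80.88 | 80.88 | 0.39 | 0.38 |
| | SVM-FMV | 62 | 59 | 6 | 9 | 87.32 | 90.77 | 88.97 | 0.32 | 0.29 |
| | CART | 55 | 58 | 13 | 10 | 84.62 | 81.69 | 83.09 | 0.34 | 0.35 |
| | CART-FMV | 62 | 59 | 6 | 9 | 87.32 | 90.77 | 88.97 | 0.31 | 0.29 |
| | CNN | 43 | 62 | 25 | 6 | 87.76 | 71.26 | 77.21 | 0.40 | 0.40 |
| | CNN-FMV | 64 | 56 | 4 | 12 | 84.21 | 93.33 | 88.24 | 0.32 | 0.30 |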
Table 5. Prediction performances of the six models used in this study based on the validation dataset.

| Dataset | Model | TP | TN | FP | FN | Sensitivity (%) | Specificity (%) | Accuracy (%) | SD | RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| Validating | SVM | 120 | 107 | 25 | 38 | 75.95 | 81.06 | 78.28 | 0.42 | 0.39 |
| | SVM-FMV | 121 | 119 | 24 | 26 | 82.31 | 83.22 | 82.76 | 0.37 | 0.36 |
| | CART | 124 | 121 | 21 | 24 | 83.78 | 85.21 | 84.48 | 0.36 | 0.35 |
| | CART-FMV | 127 | 122 | 18 | 23 | 84.67 | 87.14 | 85.86 | 0.35 | 0.35 |
| | CNN | 104 | 123 | 41 | 22 | 82.54 | 75.00 | 78.28 | 0.40 | 0.41 |
| | CNN-FMV | 128 | 116 | 17 | 29 | 81.53 | 87.22 | 84.14 | 0.33 | 0.34 |
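The percentage columns of Table 5 can be recomputed from the confusion-matrix counts. The Python sketch below uses the standard definitions (sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), accuracy = (TP + TN)/total) and then derives the accuracy gain of each hybrid model over its single-model counterpart. The class and variable names are illustrative, not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int
    tn: int
    fp: int
    fn: int

    @property
    def sensitivity(self) -> float:
        # Percentage of flood points correctly classified as flood.
        return 100.0 * self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Percentage of non-flood points correctly classified as non-flood.
        return 100.0 * self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Percentage of all points classified correctly.
        total = self.tp + self.tn + self.fp + self.fn
        return 100.0 * (self.tp + self.tn) / total


# Validation counts copied from Table 5.
validation = {
    "SVM": Confusion(120, 107, 25, 38),
    "SVM-FMV": Confusion(121, 119, 24, 26),
    "CART": Confusion(124, 121, 21, 24),
    "CART-FMV": Confusion(127, 122, 18, 23),
    "CNN": Confusion(104, 123, 41, 22),
    "CNN-FMV": Confusion(128, 116, 17, 29),
}

for name, cm in validation.items():
    print(f"{name:<9} sens={cm.sensitivity:.2f} spec={cm.specificity:.2f} acc={cm.accuracy:.2f}")

# Accuracy gain (percentage points) of each hybrid over its base model.
gain = {
    base: validation[f"{base}-FMV"].accuracy - validation[base].accuracy
    for base in ("SVM", "CART", "CNN")
}
print({base: round(delta, 2) for base, delta in gain.items()})
```

The gains computed this way agree with the accuracy improvements of 4.48%, 1.38%, and 5.86% reported in the abstract.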
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Liu, J.; Wang, J.; Xiong, J.; Cheng, W.; Sun, H.; Yong, Z.; Wang, N. Hybrid Models Incorporating Bivariate Statistics and Machine Learning Methods for Flash Flood Susceptibility Assessment Based on Remote Sensing Datasets. Remote Sens. 2021, 13, 4945. https://doi.org/10.3390/rs13234945

