Article

Integrating Satellite Images and Machine Learning for Flood Prediction and Susceptibility Mapping for the Case of Amibara, Awash Basin, Ethiopia

by Gizachew Kabite Wedajo 1, Tsegaye Demisis Lemma 1,2, Tesfaye Fufa 1,2 and Paolo Gamba 3,*

1 Department of Remote Sensing, Entoto Observatory and Research Center, Space Science and Geospatial Institute, Addis Ababa 251, Ethiopia
2 Department of Satellite Operation, Space Science and Geospatial Institute, Addis Ababa 251, Ethiopia
3 Department of Electrical, Biomedical and Computer Engineering, University of Pavia, 27100 Pavia, Italy
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(12), 2163; https://doi.org/10.3390/rs16122163
Submission received: 8 April 2024 / Revised: 1 June 2024 / Accepted: 12 June 2024 / Published: 14 June 2024

Abstract

Flooding is one of the most destructive natural hazards affecting the environment and socioeconomic systems worldwide. Its effects are greater in developing countries because of their higher vulnerability to disasters and limited coping capacity. The Awash basin is one of the flood-prone basins in Ethiopia, where the frequency and severity of flooding have been increasing, and Amibara district is one of the flood-affected areas in the basin. Reliable and up-to-date information on flooding is essential to minimize its effects. However, flood monitoring and forecasting systems are lacking in most basins of Ethiopia, including the Awash basin. Therefore, this study aimed to (i) identify important flood causative factors and (ii) evaluate the performance of random forest (RF), linear regression, support vector machine (SVM), and long short-term memory (LSTM) machine learning models for flood prediction and susceptibility mapping in the Amibara area. Nine causative factors were considered for flood prediction and susceptibility modeling, namely elevation, slope, aspect, curvature, topographic wetness index, soil texture, rainfall, land use/land cover, and curve number. The Pearson correlation coefficient and the information gain ratio (InGR) were used to evaluate the relative importance of the factors. The machine learning models were trained and tested using 400 historic flood points collected from a Sentinel 2 image acquired on 10 September 2020, during a flood event in the area. Multiple metrics, namely precision, recall, F1-score, accuracy, and the area under the receiver operating characteristic curve (AUC), were used to evaluate the performance of the models. The results showed that all the factors considered in this study were important: elevation, rainfall, topographic wetness index, aspect, and slope were more important, while land use/land cover, curve number, curvature, and soil texture were less important.
Furthermore, random forest performed best in predicting and mapping flooding in the study area, followed by the linear regression model, whereas SVM performed poorly in flood prediction and susceptibility mapping. Integrating satellite and field datasets with state-of-the-art machine learning models is a novel approach that improved the accuracy of flood prediction and susceptibility mapping. This methodology advances knowledge in the field and fills gaps left by traditional flood mapping techniques. Thus, the results of the study can provide crucial information for informed decision-making when designing flood control and risk management strategies.

1. Introduction

Floods are among the world's most destructive natural hazards and result in huge casualties and economic losses [1]. According to UNDRR and CRED [2], floods affected 1.65 billion people between 2000 and 2019 alone. Casualties and human displacement are higher in developing countries because of their higher vulnerability to disasters and limited coping capacity [3]. Most African countries, including Ethiopia, have been highly affected by flooding. For example, in the Greater Horn of Africa alone, about 2 million people are affected every year by extreme climate-induced hazards, including flooding [4]. In this region, in addition to the direct effects of flooding, indirect effects have been increasing, such as the spread of water-borne diseases, severe food insecurity, large-scale migration flows, growing social inequality, and political instability [4]. Reed et al. [5], for instance, reported that about 12% of the African population experienced food insecurity attributed to flooding. The frequency and severity of flooding have been increasing due to climate change, extreme weather, and anthropogenic factors [6].
Flooding is a common meteorological hazard in most river basins of Ethiopia, causing water pollution and destruction of infrastructure and livelihoods, and displacing, killing, and injuring many people [7,8]. In Ethiopia, flooding recurs every year in different parts of the country, with various impact levels [9,10]. The Awash basin is one of the most flood-prone basins in Ethiopia [10], and the frequency and severity of its floods are increasing as a result of climate change and socioeconomic developments such as urbanization and industrialization. Flooding in the Awash basin is caused either by excess precipitation or by unexpected releases of excess water from reservoirs. The upper parts of the basin, with elevations above 3000 m and mean annual precipitation of about 2000 mm, experience flooding due to excess precipitation, whereas the lower parts experience flooding mainly due to unexpected reservoir releases or overtopping of the river channel caused by high precipitation in the highland parts of the basin.
The lack of flood monitoring and forecasting systems, inadequate budgets, insufficient data, the lack of a centralized database, conflicts within the region, policy implementation issues, the lack of high-performance computing infrastructure, and complex topography make predicting and monitoring flooding very challenging. Yet proper flood monitoring, forecasting, and early warning systems are crucial to minimize the impact of flooding.
Flood mapping, monitoring, and prediction provide meaningful information for decision-makers in designing flood mitigation and management strategies [11]. Flood mapping techniques include physical hydrological modeling and statistical methods [12,13]. These techniques, however, require large amounts of data, which are difficult to collect, and rely on predefined relationships between controlling factors and flood occurrence. Moreover, physically based models do not capture the nonlinear characteristics of floods, which leads to uncertainty in the results [14,15]. These techniques have evolved with advances in remote sensing, geographic information systems, artificial intelligence, and cloud computing [11]. Recently, machine learning coupled with geospatial technologies has been widely used for flood mapping, monitoring, and prediction [13,16]. Machine learning, a family of data-driven computer algorithms, is effective for flood mapping in data-scarce areas [11] and captures the nonlinear characteristics of flood occurrence [15]. It can serve as an alternative flood mapping technique that overcomes the limitations of physically based and statistical models [17], and it has outperformed physical hydrologic models in mapping and predicting flooding because it describes the nonlinear nature of flooding [18]. Accordingly, machine learning algorithms such as support vector machine [14], artificial neural networks [19,20], and random forest [21] have been widely used for flood mapping.
For example, Zehra [22] employed the nonlinear autoregressive exogenous (NARX) model and support vector machine (SVM) algorithms for flood prediction and reported that the effectiveness of machine learning algorithms for flood mapping and prediction is attributable to their capability to use data from multiple sources to distinguish flood and non-flood classes. Moreover, Nevo et al. [16] reported higher performance of long short-term memory (LSTM) algorithms compared to linear models in modeling and forecasting flooding. However, Motta et al. [15] reported better performance of random forest in detecting flood events, while Nur-Adib et al. [17] reported the best performance for an artificial neural network (ANN) in mapping and predicting flood events. According to Ali et al. [23], the deep recurrent neural network (DRNN) model performed better in mapping and predicting flood events than the long short-term memory (LSTM) and bidirectional long short-term memory (Bi-LSTM) models.
Thus, there is no consensus on which machine learning algorithm is best for mapping and predicting flood events, as the performance of the algorithms depends on the quality and types of datasets used to develop the models and on the biophysical characteristics of an area. Therefore, the objectives of this study were to select the most important flood causative factors and to evaluate the performance of the SVM, RF, linear regression, and LSTM machine learning models for flood mapping and prediction in the Amibara district.

2. Materials and Methods

2.1. Description of the Study Area

The Awash River basin is one of the 12 major river basins of Ethiopia, which is highly populated and the most utilized basin due to availability of land and water resources. As such, it is the most important river basin in Ethiopia. The Awash River originates from the high plateau near Ginchi town west of Addis Ababa and flows along the Rift Valley into the Afar Triangle, and joins the salty Lake Abbe near the Djibouti border. The total area of the basin is about 110,000 km2 with elevation ranging from 4193 to 290 m above sea level (m.a.s.l). The Awash basin is divided into upper, middle, and lower parts based on topographic, climatic, and socioeconomic factors. The basin is characterized by diverse topography, climate, and farming system, and is frequently affected by extreme climate events such as flooding and drought [24].
This study was conducted in the Amibara district of the lower Awash River basin, located in the Afar Regional State, Eastern Ethiopia. Amibara district is part of Administrative Zone 4 of the Afar region and is bordered in the south by the Awash Fentale district, in the west by the Awash River, which separates it from Dulecha, in the northwest by Administrative Zone 5, in the north by Gewane, in the east by the Somali region, and in the southeast by the Oromia region. The study area lies between 09°–10°N and 40°–41°E, with elevation ranging from 563 to 1344 m above sea level (m.a.s.l) (Figure 1).
The area is part of the East Africa Great Rift Valley [25], which is characterized by patches of scattered dry forests, Acacia woodlands, bush land, wooded savanna, and scrubland. About 64% of the region is degraded and bare due to the harsh semi-desert climatic conditions [26]. Awash Arba, Awash Sheleko, Melka Sedi, and Melka Werer are towns that are found in the Amibara district.
The physiography of the study area is dominated by plains, with slopes of 0–8%. The main soil types in the area include Eutric Fluvisols, Eutric Cambisols, and Vertic Cambisols. The climate is generally arid to semiarid, with maximum and minimum temperatures ranging from 25 to 42 °C and from 15.2 to 23.5 °C, respectively, and an average annual rainfall of 560 mm [25,26]. May and June are the driest months, whereas July through September is the main rainy season. During the main rainy season, riverine flash floods originating in the highlands repeatedly breach the banks of the Awash River and cause loss of life, crops, and infrastructure in the surrounding area. The district is identified as one of the most flood-vulnerable sites in the region.

2.2. Datasets

Flood occurrence is determined by various biophysical and environmental factors, including streamflow, rainfall, topographic variables, land use/land cover, soil texture, and other watershed characteristics. Accurate flood prediction and mapping therefore require considering multiple causative factors. Accordingly, datasets from different sources were used in this study, including streamflow, rainfall, ESA WorldCover, historic flood event data, Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) data, and soil datasets. These datasets were used to develop machine learning models for mapping and predicting floods in the study area.
Long-term streamflow time series were obtained from the Ministry of Water and Energy of Ethiopia (MoWE). Climate Hazards Group InfraRed Precipitation with Station (CHIRPS) satellite-derived rainfall data (5 km spatial resolution) were downloaded from the Climate Hazards Center (https://www.chc.ucsb.edu/data/chirps, accessed on 12 December 2023), and the SRTM DEM with 30 m spatial resolution was downloaded from the USGS EarthExplorer website (https://earthexplorer.usgs.gov, accessed on 13 December 2023), from which topographic parameters such as slope, elevation, aspect, curvature, and topographic wetness index (TWI) were extracted. In addition, ESA WorldCover maps (10 m), which were developed by integrating Sentinel 1 and 2 data, were downloaded from the ESA website (https://esa-worldcover.org/en, accessed on 13 December 2023), and the land use/land cover map of the study area was extracted from them. The major land use/land cover classes of the study area include farmland, bare land, waterbody, woodland, grassland, and settlement. Land use/land cover is one of the important environmental factors that determine flood occurrence in a given area. For example, farmland, settlements, and grassland are more susceptible to flooding than forest and shrubland, because in the former the natural surface is disturbed, promoting runoff rather than infiltration. Furthermore, USDA soil texture classes for six depths were downloaded from an open-source dataset (https://developers.google.com/earthengine/datasets/catalog/OpenLandMap_SOL_SOL_TEXTURE-CLASS_USDA-TT_M_v02, accessed on 12 December 2023), from which soil permeability and curve number (CN) were derived.
Historic flood event data are important for training and testing machine learning models. However, spatial flood records are difficult to obtain because they are either unavailable or scarce. Historic satellite images acquired during floods in the study area can therefore be used to collect labeled flood points. According to ReliefWeb [27], severe flooding occurred in most parts of Ethiopia, including the Amibara district of the Afar region, from mid-June to September 2020, with heavy flooding from 7 to 10 September 2020. This flood event displaced more than 38,000 people in Amibara district alone. Figure 2 shows Sentinel images acquired before the flood event (2 June 2020) and during it (10 September 2020). Accordingly, the Sentinel 2 image acquired on 10 September 2020 was used to collect historic flood information: about 400 labeled points were collected and divided into training and testing datasets (Figure 1 and Figure 2). The training dataset was used to train the models and tune their hyperparameters, and the testing dataset was used to evaluate their performance. For a fair evaluation, the testing dataset was independent and was not used in the training process. Table 1 shows the sources and descriptions of the datasets used in this study.

2.3. Methods

2.3.1. Data Preprocessing and Standardization

The datasets required for this study were collected from various sources, so data management and preprocessing were needed to ensure their accuracy, consistency, reliability, and quality. Preprocessing, including removing outliers, filling missing data, and correcting inconsistencies, was carried out with Python libraries (geopandas) before the datasets were used as model inputs. After preprocessing, cleaning, and formatting, the variables relevant to model development were extracted. Finally, the cleaned and formatted datasets were standardized into the format required by the machine learning models used for flood prediction and susceptibility mapping. Nine causative factors were extracted and used to develop the machine learning models.
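The study does not spell out the rules used for outlier removal, gap filling, or standardization. The sketch below is a minimal illustration under assumed rules: values outside 1.5 × IQR of each factor are clipped, missing values (NaN) are replaced by that factor's median, and each factor is then z-score standardized. The function name `preprocess_features` is hypothetical, and only NumPy is used so the logic stays visible.

```python
import numpy as np

def preprocess_features(X: np.ndarray, iqr_factor: float = 1.5) -> np.ndarray:
    """Fill gaps, clip outliers, and z-score each column of a feature matrix.

    Assumed rules (not stated in the study): median imputation,
    IQR-based clipping, and zero-mean / unit-variance scaling.
    """
    X = np.asarray(X, dtype=float).copy()
    for j in range(X.shape[1]):
        col = X[:, j]  # a view: in-place edits below update X
        # Quartiles computed while ignoring missing values (NaN).
        q1, q3 = np.nanpercentile(col, [25, 75])
        iqr = q3 - q1
        lo, hi = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
        # Fill missing values with the column median.
        col[np.isnan(col)] = np.nanmedian(col)
        # Clip outliers to the IQR fences.
        np.clip(col, lo, hi, out=col)
        # Standardize; a constant column is only centred to avoid dividing by zero.
        std = col.std()
        X[:, j] = (col - col.mean()) / (std if std > 0 else 1.0)
    return X
```

Clipping rather than deleting outliers keeps every labeled point in the dataset, which matters when only about 400 points are available; dropping rows would be an equally valid assumption.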

2.3.2. Machine Learning Models Evaluated in this Study

In this study, four machine learning models were evaluated for flood prediction and susceptibility mapping: random forest (RF), support vector machine (SVM), long short-term memory (LSTM), and linear regression. Model training and testing, as well as flood prediction and susceptibility mapping for the Amibara district, Afar region, Ethiopia, were carried out in a Jupyter notebook in the EO-Africa Research and Facility innovation lab.

Random Forest

Random forest is a common ensemble learning model that uses multiple decision trees to learn from and predict labeled data [28]. Each decision tree acts as a single member of the forest, and the ensemble can solve both regression and classification problems. The algorithm introduces randomness into the model, which reduces the correlation between decision trees and improves the prediction accuracy of the ensemble while maintaining the accuracy of each individual tree. The model has two main parameters that affect classification accuracy, namely the number of trees and the depth of the decision trees [29].
The strengths of random forest are that it can (1) generate highly accurate classifiers for high-dimensional feature inputs while keeping training efficient; (2) evaluate feature importance during classification; (3) handle missing information in the samples while achieving high accuracy; and (4) process unbalanced data and account for interactions between features [30]. To run the random forest model in this study, the number of trees was set to 126 and the number of randomly selected predictor variables to 9.
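A minimal configuration sketch of the reported setting (126 trees, 9 predictors sampled per split), assuming scikit-learn's `RandomForestClassifier`; the study does not name the library it used, and the data below are synthetic stand-ins for the nine standardized factors and the flood labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the nine standardized causative factors
# (elevation, slope, aspect, curvature, TWI, soil, rainfall, LULC, CN).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 9))
# Made-up labels: "flood" (1) where a combination of two factors is high.
y = (X[:, 0] + 0.5 * X[:, 6] > 0).astype(int)

rf = RandomForestClassifier(
    n_estimators=126,  # number of trees reported in the study
    max_features=9,    # predictors considered at each split, as reported
    random_state=0,
)
rf.fit(X[:320], y[:320])                            # 80% for training
flood_probability = rf.predict_proba(X[320:])[:, 1]  # P(flood) for the 20% held out
importances = rf.feature_importances_               # impurity-based importance per factor
```

One consequence of the reported setting is worth noting: with only nine factors, sampling 9 predictors per split means every split considers all factors, so the forest behaves like bagged decision trees whose randomness comes only from bootstrap sampling. scikit-learn's default for classifiers (`max_features="sqrt"`, i.e., 3 of 9 here) would add per-split feature randomness.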

Linear Regression

Linear regression is a machine learning algorithm that models a linear relationship between independent variables and a dependent variable to predict the outcome of future events. The model predicts the value of the dependent (response or output) variable. Linear regression is thus a supervised learning algorithm that fits a mathematical relationship between variables and makes predictions for continuous (numeric) variables.
Linear regression fits a linear model with coefficients β = (β1, …, βk) that minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation. The general form of the linear regression model is given by Equations (1)–(3).
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k \tag{1}$$
where $Y$ is the dependent variable, $X_i$ is the $i$-th independent variable ($i = 1, 2, \ldots, k$), $\beta_0$ is the intercept (or bias) term, and $\beta_i$ is the regression coefficient of $X_i$.
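To find the best-fit line, the corresponding cost (or loss) function must be minimized. For example, with the mean squared error (MSE) as the cost function:
$$\underset{\beta_0, \beta_1, \ldots, \beta_k}{\arg\min}\; \frac{1}{N}\sum_{i=1}^{N}\Big(y_i - \beta_0 - \sum_{j=1}^{k} x_{ij}\,\beta_j\Big)^2 \tag{2}$$
where $N$ is the number of samples and $x_{ij}$ is the value of the $j$-th independent variable for sample $i$. The coefficients that minimize this cost function are given by the closed-form (normal equation) solution
$$\hat{\beta} = \left(X^{T}X\right)^{-1}X^{T}Y \tag{3}$$
where $X$ is the $N \times (k+1)$ design matrix whose first column consists of ones (for $\beta_0$) and $Y$ is the vector of observed targets.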
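The closed-form solution in Equation (3) can be checked numerically with a short NumPy sketch. It builds the design matrix with a leading column of ones for β0 and solves the normal equations with `np.linalg.solve` instead of forming the inverse explicitly, which is mathematically equivalent but numerically preferable. The helper names are hypothetical, and how the study converted continuous predictions into flood / non-flood classes is not stated.

```python
import numpy as np

def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return [beta_0, beta_1, ..., beta_k] from the normal equation."""
    # Prepend a column of ones so that beta_0 acts as the intercept.
    X1 = np.column_stack([np.ones(len(X)), X])
    # Solve (X^T X) beta = X^T y rather than computing (X^T X)^{-1}.
    return np.linalg.solve(X1.T @ X1, X1.T @ y)

def predict_ols(beta: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Evaluate Equation (1) for each row of X."""
    return beta[0] + X @ beta[1:]
```

For example, data generated exactly by y = 1 + 2·x1 − 3·x2 are recovered with coefficients [1, 2, −3].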

Support Vector Machine (SVM)

SVM is a supervised binary classifier that can represent complex nonlinear relationships between inputs and outputs [31]. An SVM model is built around an optimal separating hyperplane that divides the data patterns, and it uses a kernel function to map the original, nonlinearly separable data into a high-dimensional feature space in which they become linearly separable [32].
Because the kernel function determines the shape of the decision boundary and has a significant impact on model performance, the SVM can be optimized for flood prediction by changing the kernel. Accordingly, the radial basis function (RBF), polynomial, linear, and sigmoid kernels were compared, and the linear kernel was selected because it performed best in predicting floods.
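The kernel comparison described above can be sketched as follows, assuming scikit-learn's `SVC` and mean 5-fold cross-validated accuracy as the selection criterion; the study does not state its selection procedure. The data are synthetic, so whichever kernel wins here says nothing about the Amibara results.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 9))            # synthetic stand-in for nine standardized factors
y = (X[:, 0] - X[:, 2] > 0).astype(int)  # synthetic flood / non-flood labels

scores = {}
for kernel in ("rbf", "poly", "linear", "sigmoid"):
    # Mean 5-fold cross-validated accuracy for each candidate kernel.
    scores[kernel] = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()

best_kernel = max(scores, key=scores.get)
```

Cross-validation on the training portion keeps the held-out test set untouched during kernel selection, in line with the independent testing dataset described in Section 2.2.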

Long Short-Term Memory (LSTM)

LSTM is a recurrent neural network architecture used in deep learning; it has feedback connections and can process entire data sequences. LSTM networks are therefore effective for classification and prediction of time series datasets [33] and have significant potential for flood forecasting [34]. The LSTM has a unique internal structure capable of processing and memorizing long-term sequential dependencies [35].
To evaluate the performance of the LSTM model for flood prediction, several parameters were carefully considered during the training and testing phases. Parameters crucial for optimization and accuracy improvement included the number of LSTM layers, the number of epochs, the batch size, and the number of dense layers. The number of LSTM layers was balanced so that the model could capture complex spatial patterns without overfitting. Likewise, the number of epochs, which determines how many times the model iterates over the training dataset, was adjusted to improve accuracy while avoiding overfitting; validation loss and accuracy were monitored continuously to identify the optimal number of epochs. Furthermore, choosing the best optimizer and activation function can improve the performance of an LSTM model. Therefore, the Adam, RMSprop, and SGD-with-momentum optimizers were tested to find the one giving the best accuracy, and the ReLU, sigmoid, and tanh activation functions were compared in the same way. The ReLU activation and the Adam optimizer, which provided the best model accuracy, were selected for the model in this study.
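To make the LSTM's internal structure concrete, the sketch below implements one forward step of a standard LSTM cell in NumPy: input, forget, and output gates plus a candidate cell state, with the additive cell-state update that lets information persist across long sequences. It illustrates the textbook cell, not the study's network; the gate ordering and weight shapes are assumptions. The standard cell uses sigmoid gates and a tanh candidate activation; frameworks such as Keras allow the latter to be replaced (e.g., by ReLU) through the LSTM layer's `activation` argument, but the study does not say whether its ReLU choice applied inside the LSTM layers or to the dense layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias,
    stacked in the (assumed) gate order input, forget, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * H:3 * H])  # candidate cell content
    o = sigmoid(z[3 * H:4 * H])  # output gate: how much cell state to expose
    c_t = f * c_prev + i * g     # additive update carries long-term memory
    h_t = o * np.tanh(c_t)       # hidden state passed to the next step
    return h_t, c_t

def lstm_forward(X_seq, W, U, b, hidden_size):
    """Run the cell over a (T, D) sequence and return the final (h, c)."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x_t in X_seq:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h, c
```

When the forget gate saturates near 1 and the input gate near 0, the cell state passes through unchanged, which is the mechanism behind the long-term memory described above.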

2.3.3. Parameters Considered in Developing the Models

Topographic parameters such as slope, elevation, aspect, curvature, and TWI were derived from the Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM). CHIRPS data were used to extract the mean annual rainfall of the study area. In addition, soil texture and CN data were derived from USDA global soil datasets. The land use/land cover classes obtained from the ESA WorldCover maps were reclassified and aggregated into farmland, bare land, waterbody, woodland, grassland, and settlement classes. Finally, all parameters were resampled to the resolution of the SRTM DEM (30 m) and standardized for model development. Table 2 summarizes the parameters considered for Amibara flood prediction and susceptibility mapping.
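The study does not say which method was used to resample the 5 km CHIRPS rainfall and 10 m WorldCover layers onto the 30 m DEM grid. A minimal nearest-neighbour sketch is shown below, assuming north-up rasters that share a projected coordinate system in metres; in practice a GIS library such as GDAL or rasterio would normally do this. The function `nearest_resample` is hypothetical.

```python
import numpy as np

def nearest_resample(src, src_origin, src_res, dst_shape, dst_origin, dst_res):
    """Nearest-neighbour resampling of a north-up raster onto a new grid.

    src_origin / dst_origin: (x_min, y_max) of the upper-left corner.
    src_res / dst_res: square cell size in the same map units (e.g., metres).
    """
    rows, cols = dst_shape
    # Map coordinates of the destination cell centres.
    xs = dst_origin[0] + (np.arange(cols) + 0.5) * dst_res
    ys = dst_origin[1] - (np.arange(rows) + 0.5) * dst_res
    # Index of the source cell that contains each destination centre.
    src_cols = np.floor((xs - src_origin[0]) / src_res).astype(int)
    src_rows = np.floor((src_origin[1] - ys) / src_res).astype(int)
    # Keep indices inside the source raster at its edges.
    src_cols = np.clip(src_cols, 0, src.shape[1] - 1)
    src_rows = np.clip(src_rows, 0, src.shape[0] - 1)
    return src[np.ix_(src_rows, src_cols)]
```

Nearest-neighbour resampling keeps class codes intact, which suits categorical layers such as land use/land cover and soil texture; continuous layers such as rainfall are often resampled bilinearly instead.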

Feature Selection

Feature selection is one of the most important steps in flood prediction and susceptibility mapping. It reduces the size of the input datasets by removing less important features, which improves learning accuracy and reduces computational time during model training [36]. The most influential factors were selected using two techniques: Pearson correlation analysis and the information gain ratio (InGR). The correlation between causative factors was analyzed using the Pearson correlation coefficient r (Equation (4)), whose absolute value ranges from 0 to 1. Two factors with an absolute r greater than 0.6 are considered strongly correlated [21].
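$$r = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}} \tag{4}$$
where $X_i$ and $Y_i$ are samples of two causative factors, $\bar{X}$ and $\bar{Y}$ are the mean values of $X$ and $Y$, respectively, and $n$ is the number of samples.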
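Equation (4) and the 0.6 screening rule can be sketched in NumPy as follows; `pearson_r` and `strongly_correlated_pairs` are hypothetical helper names, and `pearson_r` should agree with NumPy's own `np.corrcoef`.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient, as in Equation (4)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

def strongly_correlated_pairs(X, names, threshold=0.6):
    """Return (name_a, name_b, r) for every factor pair with |r| > threshold."""
    pairs = []
    for a in range(X.shape[1]):
        for b in range(a + 1, X.shape[1]):
            r = pearson_r(X[:, a], X[:, b])
            if abs(r) > threshold:
                pairs.append((names[a], names[b], r))
    return pairs
```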
The information gain ratio (InGR) measures how much information each causative factor contributes to a prediction model. The InGR method was used to analyze the relationship between flood conditioning factors and flood occurrence and to evaluate the importance of the factors [37]. Assuming that the training dataset S consists of n classes, the expected information is calculated as follows (Equations (5)–(8)):
$$H(S) = -\sum_{i=1}^{n} p_i \log_2 p_i \tag{5}$$
where $p_i$ is the probability that a sample belongs to class $C_i$ (flood or non-flood).
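If factor A has m distinct values that divide S into subsets $S_1, \ldots, S_m$, its average (expected) entropy is calculated as follows:
$$E(A) = \sum_{i=1}^{m} \frac{|S_i|}{|S|}\, H(S_i) \tag{6}$$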
The split information value represents the potential information generated by dividing S into m parts corresponding to the m values of attribute A and is calculated as follows:
$$\mathrm{SplitInfo}_A(S) = -\sum_{i=1}^{m} \frac{|S_i|}{|S|}\log_2\frac{|S_i|}{|S|} \tag{7}$$
Finally, the variable importance value (VI) is defined as follows:
$$VI(A) = \frac{H(S) - E(A)}{\mathrm{SplitInfo}_A(S)} \tag{8}$$
The higher the VI value of a factor, the more important it is for prediction models, whereas a factor with a VI of zero contributes nothing to flood occurrence and should be removed from flood susceptibility modeling [33].
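A pure-Python sketch of Equations (5)–(8), assuming the factor has already been discretized into classes; the study does not state how continuous factors such as elevation or rainfall were binned. It uses the standard form of the expected entropy, in which each subset S_i is weighted by |S_i|/|S|. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum p_i log2 p_i over the classes present in labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(factor_values, labels):
    """Information gain ratio VI(A) of a discrete factor A."""
    n = len(labels)
    # Partition the flood / non-flood labels by the factor's class value.
    groups = {}
    for value, label in zip(factor_values, labels):
        groups.setdefault(value, []).append(label)
    # E(A): entropy of each subset S_i, weighted by |S_i| / |S|.
    expected = sum(len(g) / n * entropy(g) for g in groups.values())
    # SplitInfo_A(S): entropy of the partition itself.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    if split_info == 0:  # factor takes a single value: it carries no information
        return 0.0
    return (entropy(labels) - expected) / split_info
```

A factor whose classes separate flood and non-flood points perfectly scores 1, while a factor whose classes contain both labels in the same proportions as the whole dataset scores 0, matching the interpretation of VI given above.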

2.3.4. Machine Learning Accuracy Assessment Metrics

The performance of machine learning models is commonly evaluated using precision, recall, F1-score, and accuracy [29,38]. These metrics range from 0 to 1, with higher values indicating better model performance. They are calculated from the numbers of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) samples in the testing dataset. TP samples are observed water pixels that the model correctly predicts as water, and TN samples are observed non-water pixels that the model correctly predicts as non-water. FP samples are observed non-water pixels that the model incorrectly predicts as water, and FN samples are observed water pixels that the model incorrectly predicts as non-water (see Equations (9)–(12)).
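$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{9}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{10}$$
$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{11}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$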
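A minimal pure-Python sketch of Equations (9)–(12), with 1 denoting a flood (water) pixel and 0 a non-flood pixel. Returning 0.0 when a denominator is zero is an assumed convention, not one stated in the study, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN with 1 = flood (water) and 0 = non-flood."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    """Precision, recall, F1-score, and accuracy (Equations (9)-(12))."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

For example, eight test points with three TP, three TN, one FP, and one FN give 0.75 for all four metrics.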

Receiver Operating Characteristic (ROC) Curve

The ROC curve is a common approach for assessing the overall performance of models and is widely used in spatial modeling. The area under the ROC curve (AUC) quantifies how well a model predicts the occurrence or non-occurrence of events [39] and thus summarizes its performance in flood prediction and mapping. Therefore, the ROC curve and AUC were used to evaluate the machine learning models in addition to the metrics above. AUC values range from 0.5 to 1, indicating an uninformative (random) model and a highly accurate model, respectively [40].
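AUC can be computed without tracing the curve explicitly: it equals the probability that a randomly chosen flood point receives a higher predicted score than a randomly chosen non-flood point, with ties counted as one half. The sketch below uses that rank (Mann–Whitney) formulation; it is quadratic in the number of points, which is acceptable for a test set of about 80 samples. The function name is hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random flood point > score of a random non-flood point)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one flood and one non-flood sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```

A model that ranks every flood point above every non-flood point scores 1.0, and one that assigns identical scores to all points scores 0.5, the two ends of the range given above.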

2.3.5. Flood Prediction Techniques

Historic flood data can be used to train and test machine learning algorithms. In most cases, however, such data are not recorded as spatial data; instead, flood situations are documented in reports with simple descriptive information about where the flood occurred, how far it extended, and how deep it was. Historic flood data were therefore unavailable and difficult to collect from the respective offices and the field, which made satellite images the best alternative source for extracting historic flood occurrences. Accordingly, Sentinel 2 satellite images acquired in 2020 (a flood year in the study area) were used to collect flood data for training and testing the machine learning algorithms. In total, about 400 labeled flood points were collected from the satellite images, of which 80% were used for training and 20% for testing the models.
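A minimal sketch of the 80/20 split of the roughly 400 labeled points, assuming a simple random permutation with a fixed seed; the study does not state whether the split was stratified by class or how it was randomized. The function name and seed are hypothetical.

```python
import numpy as np

def train_test_split_indices(n_samples, train_fraction=0.8, seed=42):
    """Randomly partition sample indices into training and testing subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]

train_idx, test_idx = train_test_split_indices(400)  # 320 training, 80 testing points
```

Because the two index sets are disjoint, the testing points never enter model training, which is what makes the evaluation in Section 2.3.4 independent.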

2.3.6. Flood Susceptibility Mapping

Flood susceptibility mapping is crucial for characterizing flood risk zones and for designing appropriate flood mitigation and management strategies. Flood occurrence depends on various biophysical and environmental parameters, including topography, hydrometeorology, soil, land use/land cover, and anthropogenic factors, and different studies have considered different sets of parameters for mapping flood susceptibility. Based on a literature review and the characteristics of the study area, nine causative factors were therefore considered for flood prediction and susceptibility modeling.
The importance of the factors was evaluated using InGR and the Pearson correlation coefficient. Accordingly, the importance of all nine factors considered in this study, namely elevation (DEM), slope, aspect, curvature, topographic wetness index, soil, rainfall, land use/land cover, and curve number, was assessed. The trained models were then used to map the flood susceptibility (probability of flood occurrence) of the study area, and the resulting maps were compared. Figure 3 shows the general methodology flowchart.

3. Results and Discussion

3.1. Feature Importance

Figure 4 shows the InGR values of the features used for the machine learning models for flood prediction and susceptibility mapping, revealing significant differences in their importance.
The InGR values of all the features were greater than 0, indicating that all the features considered in model development are important, although their importance differs significantly. The closer an InGR value is to 1, the more important the feature; the closer it is to 0, the less important. DEM (elevation) (0.35), Ap-CHI (rainfall) (0.16), and TWI-Am (topographic wetness index) (0.13) were the most important factors for flood prediction and susceptibility mapping. Aspect (0.1) and slope (0.09) were also important in determining flood events, whereas land use/land cover (LULC), curve number (CN), curvature, and soil were less important. Overall, topographic factors and rainfall are the crucial factors that determine flood occurrence in the Amibara district, so these features should be considered during model development for better flood prediction and susceptibility mapping.
Figure 5 shows the Pearson correlation coefficient values between the nine factors: DEM, AP-CHI, TWI-Am, aspect, slope, LULC, CN-New, curvature, and Soil-res. Two pairs of factors had a correlation coefficient greater than 0.6: Soil-res and CN-New (0.76), and elevation (DEM) and slope (0.61). Factors whose pairwise correlation coefficients are below 0.6 are not considered redundant and should therefore be retained in the development of machine learning models. In this regard, most of the factors considered in this study were relevant for determining flood events.
For each highly correlated pair of factors (>0.6), the factor with the lower InGR is considered less important. Accordingly, elevation (DEM) and curve number (CN-New) were considered more important, while slope and soil texture were relatively less important, which is in line with the InGR-based factor importance analysis.
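The two-step screening described above can be sketched as follows: compute pairwise Pearson coefficients and, for any pair above 0.6, flag the factor with the lower InGR as less important. Only the 0.6 threshold comes from the text. Applying it to the absolute value of the coefficient, the function names, and the toy factor values and InGR scores are assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def less_important_factors(factors, ingr, threshold=0.6):
    """For every factor pair with |r| > threshold, flag the member with the
    lower InGR as less important.

    factors: dict of factor name -> values sampled at the same points
    ingr:    dict of factor name -> information gain ratio
    """
    flagged = set()
    for a, b in combinations(sorted(factors), 2):
        if abs(pearson(factors[a], factors[b])) > threshold:
            flagged.add(a if ingr[a] < ingr[b] else b)
    return flagged


if __name__ == "__main__":
    # Hypothetical values at five sample points (not the study's data).
    factors = {
        "DEM": [1, 2, 3, 4, 5],
        "slope": [2, 4, 5, 8, 10],
        "rainfall": [5, 1, 4, 2, 3],
    }
    ingr = {"DEM": 0.35, "slope": 0.09, "rainfall": 0.16}
    print(less_important_factors(factors, ingr))  # -> {'slope'}
```

With these toy values, elevation and slope are almost perfectly correlated. Slope has the lower InGR, so it is the factor flagged, which mirrors the DEM and slope case reported above.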
Generally, the results of the feature importance analysis showed that most of the features considered in this study were important for the occurrence of flooding. Topographic variables such as DEM (elevation), aspect, topographic wetness index (TWI), and slope, and climatic variables such as rainfall, were very important. This result is in line with previous studies [33,41,42]. For instance, according to Bentivoglio et al. [41], slope, land use/land cover, aspect, terrain, and curvature are the most important factors for determining flood occurrence and should thus be considered in flood prediction and susceptibility mapping. Moreover, a study performed in Shangyou County, Jiangxi Province, showed that slope, distance to rivers, land use, and NDVI are more important in influencing flood occurrence [33]. Likewise, Al-Ruzouq et al. [42] reported that elevation, rainfall, and NDVI are more important for determining flood occurrence in Fujairah, United Arab Emirates. Furthermore, Nguyen et al. [37] reported that elevation, land use/land cover, NDVI, curvature, topographic wetness index, and rainfall are more important for flood occurrence, whereas aspect and slope were less important for flood susceptibility mapping. This implies that flood prediction and susceptibility mapping depend on multiple biophysical factors, although their significance varies from region to region and from model to model. Therefore, the types and significance of the factors that affect flooding should be assessed before modeling flood prediction and susceptibility mapping.

3.2. Performances of the Machine Learning Models

Table 3 shows the performance evaluation results of the machine learning models for flood prediction and susceptibility mapping in the Amibara district.
The performances of the models were evaluated using the abovementioned metrics. The precision, recall, F1-score, and accuracy values of the RF model were 0.90, 0.94, 0.91, and 0.90, respectively, similar to those of the linear regression model. In contrast, the precision, recall, F1-score, and accuracy of the SVM model were 0.75, 0.90, 0.81, and 0.75, respectively, indicating poorer performance. Finally, the performance of LSTM in predicting flood events was modest, at least for the study area.
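For reference, the four threshold-based metrics can be computed from the confusion-matrix counts of a binary flood/non-flood classification, as in the sketch below. The counts in the example are hypothetical, not the study's test results. Because the F1-score is the harmonic mean of precision and recall, the SVM's precision of 0.75 and recall of 0.90 give an F1-score of about 0.82, close to the reported 0.81.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score, and accuracy for a binary flood (positive) /
    non-flood (negative) classification, from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted floods that were floods
    recall = tp / (tp + fn)     # share of actual floods that were detected
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts for 80 test points.
    print(classification_metrics(tp=36, fp=4, fn=2, tn=38))
    print(round(f1_score(0.75, 0.90), 3))  # -> 0.818
```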
Figure 6 shows the receiver operating characteristic (ROC) area under the curve (AUC) values for the SVM, LSTM, linear regression, and RF models. The AUC summarizes how well a model discriminates between flood and non-flood points across all classification thresholds. The higher the AUC, the better the model predicts and maps flooding, while an AUC of 0.5 corresponds to random guessing. According to the ROC analysis, the RF (0.94) and linear regression (0.94) models outperformed the LSTM (0.81) and SVM (0.5) models.
The results show that SVM was not accurate in predicting flooding in the Amibara district (Figure 6 and Figure 7, and Table 2). The same results further reveal that, although RF and linear regression were better at predicting flooding (AUC > 0.90), LSTM also achieved reasonably high accuracy (AUC > 0.8). Therefore, the RF, linear regression, and LSTM models were better at predicting flood events, with RF and linear regression being the most robust, while the SVM model performed poorly. Overall, the performance evaluation showed that RF was the most accurate and robust model for predicting flood events in the Amibara district.
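A minimal sketch of the AUC calculation is given below. It uses the pairwise (Mann-Whitney) formulation, which is equivalent to the area under the ROC curve. In this formulation, the AUC is the probability that a randomly chosen flood point receives a higher predicted score than a randomly chosen non-flood point, with ties counted as one half. A model whose scores carry no information about flooding therefore lands at 0.5. The function name and example scores are hypothetical; the paper does not specify the software used.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: 1 for flood points, 0 for non-flood points
    scores: predicted flood probabilities (higher = more likely flooded)
    Runs in O(P * N), which is adequate for a few hundred test points.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # flood point ranked above non-flood point
            elif p == q:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # -> 1.0
    print(roc_auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))  # -> 0.5
```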

3.3. Flood Susceptibility Maps

Figure 7 shows the flood susceptibility maps of the Amibara district obtained by the four models. The susceptibility maps represent flood probability values ranging from 0 to 1. Values closer to 0 correspond to areas that are almost safe from flooding, while values closer to 1 represent a high probability of flood occurrence. It is worth noting that the flood probability values of the RF model spanned the full range from 0 to 1, whereas those of the other models did not exploit the whole range. This may indicate poorer flood susceptibility mapping performance and an underestimation of flood occurrence by those models.
This indicates that RF produced the best flood susceptibility map for the Amibara district (Figure 7C). The flood probability values are color coded. Five intervals are assumed (0 to 0.2, 0.2 to 0.4, 0.4 to 0.6, 0.6 to 0.8, and 0.8 to 1), corresponding to very low, low, moderate, high, and very high flood susceptibility, respectively. Under this classification, the high and very high susceptibility areas are clearly located in flat, low-elevation areas. The flood susceptibility map obtained from the RF model agrees with the areas flooded in 2020, which suffered huge environmental and socioeconomic damage [27]. Heavy flood events caused by heavy rainfall were reported in different parts of Ethiopia, including the Amibara district of the Afar region, from mid-June to September 2020 [27]. The flooding in the Amibara district was caused by the overflow of the Kesem, Tendaho, and Koka dams on the Awash River following heavy rain in the highlands of the Awash basin. The flooding displaced about 38,000 people in the Amibara district alone and destroyed infrastructure and farmland.
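The five-class reclassification of the probability maps can be expressed as a simple lookup, sketched below for a single pixel value. The text does not say which class a value lying exactly on a boundary (e.g., 0.4) belongs to. Assigning it to the upper class, as done here, is an assumption, and the names are hypothetical.

```python
from bisect import bisect_right

# Class boundaries and names from the five equal intervals described above.
BREAKS = [0.2, 0.4, 0.6, 0.8]
CLASSES = ["very low", "low", "moderate", "high", "very high"]


def susceptibility_class(p):
    """Map a flood probability in [0, 1] to a susceptibility class.

    Intervals are [0, 0.2), [0.2, 0.4), [0.4, 0.6), [0.6, 0.8), [0.8, 1];
    a value exactly on a boundary goes to the upper class (an assumption).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("flood probability must lie in [0, 1]")
    return CLASSES[bisect_right(BREAKS, p)]


if __name__ == "__main__":
    for p in (0.05, 0.35, 0.6, 0.95):
        print(p, susceptibility_class(p))
```

In practice the same lookup would be applied to every pixel of a model's probability raster.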
Furthermore, Haile et al. [43] reported that the 2020 flood was the most disastrous of the floods in the Awash basin, displacing 240,000 people, damaging 60,000 ha of agricultural land, and affecting bridges and roads. The 2017‒2018 flood event in the upper Awash basin affected 8477 households, about 18,996 ha of agricultural land, 25,087 head of livestock, and infrastructure and institutions [44].
The findings of this study agree with those of Hasanuzzaman et al. [45], who reported a better performance of RF (AUC = 0.85) in flood susceptibility mapping for the Silabati River, a tropical river in India. The better performance of RF in flood prediction and susceptibility mapping could be attributed to its ability to overcome overfitting while maintaining predictive accuracy, as it is an ensemble of classification and regression trees [46]. Moreover, the model combines straightforward operability with computational speed [47]. Furthermore, RF can generate highly accurate classifiers for high-dimensional feature inputs while keeping training efficient, evaluate feature importance during classification, handle missing values in the samples, and process unbalanced data while accounting for feature interactions [30]. Consequently, RF has been reported to outperform other machine learning models such as support vector machines, artificial neural networks, and regression models [11,31,48].
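The overfitting argument can be illustrated with a toy version of the random forest recipe of Breiman [28]. Many weak trees are combined by majority vote, each fitted to a bootstrap resample of the training points and to a random subset of the features. In the sketch below, each "tree" is only a one-split decision stump. It is therefore a deliberately simplified stand-in rather than the implementation used in this study: a real RF grows deep trees and samples features at every split. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Return the (feature, threshold, label_if_below) split with the fewest
    training errors, searching only the given feature indices."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for below in (1, 0):
                # Predict `below` when the value is <= t, the other class otherwise.
                errors = sum((below if row[f] <= t else 1 - below) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, below)
    return best[1:]


def predict_stump(stump, row):
    f, t, below = stump
    return below if row[f] <= t else 1 - below


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Bagged ensemble of stumps: bootstrap rows and random feature subsets."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        feats = rng.sample(range(p), n_features)     # random feature subset
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps (labels: 0 = no flood, 1 = flood)."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy points: [elevation (m), rainfall (mm)]; 1 = flooded.
    X = [[700, 30], [710, 32], [720, 31], [730, 33],
         [900, 10], [910, 12], [920, 11], [930, 13]]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    forest = fit_forest(X, y)
    print(predict_forest(forest, [650, 40]), predict_forest(forest, [1000, 5]))
```

Because each stump sees a different resample and feature, individual errors tend to be voted down, which is the intuition behind the robustness claims cited above.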
In contrast, Nevo et al. [16] reported a strong performance of LSTM in predicting flood events, and Fang et al. [33] reported the effectiveness of LSTM for predicting flood occurrence and mapping flood susceptibility. Specifically, Fang et al. [33] reported that the local spatial sequential long short-term memory neural network (LSS-LSTM) method achieved satisfactory prediction accuracy (93.75%) and area under the ROC curve (0.965). In addition, Le et al. [50] reported a strong performance of LSTM (Nash‒Sutcliffe efficiency (NSE) > 0.87) in flood forecasting in the Da River basin in Vietnam. Likewise, LSTM was reported to perform best in forecasting flooding in the Xiangjiang River, with NSE > 0.98 and RMSE < 0.2 m [51,52]. On the other hand, El-Magd et al. [49] reported that XGBoost performed best (AUC of 90.2%) in predicting flood events in Wadi El-Laqeita, Central Eastern Desert, Egypt, and SVM was reported to achieve the highest performance in predicting and mapping flood events in California [29].
Several studies have also reported that artificial neural networks perform best in flood inundation modeling and forecasting [18,52,53]. Moreover, other studies found the best performance with artificial neural networks (ANNs) [54], random forest (RF) [39], and support vector machine (SVM) models [55]. In addition, Bentivoglio et al. [41] reported that convolutional layer-based models control inductive biases and process the spatial characteristics of flooding events better than model-based and statistics-based techniques.
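Several of the studies cited above report NSE and RMSE rather than classification metrics. Both scores are sketched below for paired observed and simulated series (for example, water levels in metres). NSE equals 1 minus the ratio of the squared simulation errors to the squared deviations of the observations from their mean. A value of 1 is therefore a perfect fit, while 0 means the model is no better than the observed mean. The function names and example series are hypothetical.

```python
import math


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def rmse(observed, simulated):
    """Root-mean-square error, in the units of the series (e.g., metres)."""
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(sse / len(observed))


if __name__ == "__main__":
    # Hypothetical observed and simulated water levels (m).
    obs = [1.2, 1.5, 2.3, 3.1, 2.4]
    sim = [1.1, 1.6, 2.2, 3.0, 2.6]
    print(nse(obs, sim), rmse(obs, sim))
```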
Studies show that the performance of machine learning models for predicting flood events varies from study to study, which makes it challenging to generalize. This could be due to differences in the types and quality of the datasets used, the model parameters considered, and the biophysical characteristics of the study areas. Moreover, the underlying principles of the models, and hence their predictive performance, vary significantly. For example, RF leverages ensemble learning and aggregates multiple decision trees. As such, it handles complex, nonlinear relationships and high-dimensional data, which makes it robust and often able to outperform other models [28,56]. SVM seeks the optimal hyperplane that separates data points into different classes, tackles nonlinearity with kernels, and is effective for complex datasets with many features; however, it is sensitive to hyperparameter tuning [57]. Linear regression, on the other hand, assumes a linear relationship between inputs and outputs and predicts values as a weighted sum of features. Linear regression models are suitable for simple, linear datasets but less accurate for nonlinear and complex datasets [58]. The LSTM deep learning model is effective for time series and sequential datasets but less accurate for static, non-sequential data [59]. These studies show that choosing an appropriate model depends on the characteristics of the data and the nature of the underlying relationships.
Our findings highlight that carefully choosing suitable models and parameters for effective and accurate flood prediction and mapping is an important prerequisite for designing flood control strategies that minimize the negative consequences of flooding. Flood management planning in the Amibara district is therefore important. The severity, frequency, and geographic coverage of flooding are expected to increase in the future owing to climate change and human activities, endangering the environment and socioeconomic systems worldwide [16,60]. Hence, monitoring and predicting flooding are important for designing flood management strategies.
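The sketch below makes the "weighted sum of features" description of linear regression concrete. It fits ordinary least squares weights through the normal equations and evaluates the weighted sum for a new point. The paper does not describe how the continuous output was turned into flood/non-flood classes. Thresholding the fitted value at 0.5 when the targets are 0/1 labels, as in the usage, is an assumption, and the names and data are hypothetical.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares weights [intercept, w1, ..., wk] obtained by
    solving the normal equations (A^T A) w = A^T y, where A is X with a
    leading column of ones."""
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(k):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(weights, row):
    """Linear regression output: intercept plus weighted sum of the features."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


if __name__ == "__main__":
    # Hypothetical normalized [elevation, rainfall] values; 1 = flooded.
    X = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.7], [0.8, 0.2], [0.9, 0.1], [0.85, 0.3]]
    y = [1, 1, 1, 0, 0, 0]
    w = fit_linear_regression(X, y)
    # Assumed decision rule: classify as flooded when the fitted value >= 0.5.
    print(predict(w, [0.2, 0.75]) >= 0.5)
```

Because the output is a single weighted sum, the decision boundary is a hyperplane, which is why such models struggle when the flood response is strongly nonlinear.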
Hydrological and meteorological model-based flood mapping and forecasting techniques are traditional approaches that do not consider the complex interactions among flood causative factors and have limitations in integrating large datasets from multiple sources. Hence, more sophisticated, adaptive, and data-driven techniques are required for reliable flood mapping and prediction. In this regard, machine learning is an innovative approach for flood mapping and prediction, as it handles multiple datasets from various sources and complicated correlations within large volumes of data. Such capabilities have improved the accuracy of flood mapping and prediction [60]. As such, machine learning offers promising performance for flood inundation modeling. It provides a plausible alternative to physically based hydraulic models, which are highly computationally demanding and challenging to use in operational flood forecasting systems [16,18].
Generally, machine learning, a family of data-driven computer algorithms, is effective for flood mapping in data-scarce areas such as Ethiopia [11] and captures the nonlinear characteristics of flood occurrence [15]. It can also be used as an alternative flood mapping technique that overcomes the limitations of physically based and statistical models [17]. Machine learning is effective and outperforms physical hydrologic models in mapping and predicting flooding because it describes the nonlinear nature of flooding [18]. However, conventional machine learning techniques require specific feature engineering of the raw data before processing. Deep learning, an advanced branch of machine learning, generally performs better than traditional machine learning, although mainly at the cost of higher computational time. Deep learning models are more accurate than traditional approaches and computationally faster than numerical methods. This is because deep learning is a data-driven method with multiple levels of representation, obtained by composing simple but nonlinear modules. Each of these modules transforms the representation at one level into a representation at the next, more abstract level.
Even though studies have generally shown the effectiveness of deep learning for flood mapping and prediction, its ability to generalize across different case studies and to model complex interactions with the natural and built environment remains challenging [41]. A major limitation of machine learning/deep learning models is their difficulty in predicting beyond the data used to train them [53]. Furthermore, machine learning in general, and deep learning in particular, requires large datasets to perform well, whereas ground-measured datasets for model training, testing, and validation are scarce and often not publicly available. A further limitation is the difficulty of embedding expert knowledge, which otherwise requires manual model calibration. In addition, for models that learn mapping functions from grid-based data, the interactions between grid cells are difficult to understand, compounded by the black-box nature of the models [61].

4. Conclusions

Flooding is one of the most destructive natural hazards, affecting the environment and socioeconomic systems of both developing and developed countries. To minimize the negative effects of flooding, monitoring and predicting flood events is important, as it supports informed decision-making. In this regard, advances in geospatial technologies, state-of-the-art machine learning algorithms, and cloud computing have contributed greatly to mapping and predicting flood events more accurately than traditional physically based and statistical modeling techniques. These technologies and techniques are effective in integrating multiple factors from various sources that determine flood occurrence. However, the factors that determine flooding and the performance of machine learning models vary from region to region. Therefore, identifying the relative importance of flood causative factors and evaluating the performance of machine learning models in predicting and mapping flooding are prerequisites for accurately mapping and predicting flood events.
Accordingly, the feature importance results showed that all the factors considered in this study, namely elevation, rainfall, topographic wetness index, aspect, slope, land use/land cover, curve number, curvature, and soil texture, are determining factors for flood occurrence, with different levels of importance. Topography-related factors such as elevation, topographic wetness index, aspect, and slope, together with rainfall, are more important, while land use/land cover, curve number, curvature, and soil texture are less important. Furthermore, the performance evaluation indicates that random forest performed best overall among the four models in flood prediction and susceptibility mapping, with an accuracy of 0.90, precision of 0.90, recall of 0.94, and AUC of 0.94. Linear regression performed second best, with 0.85 precision, 0.96 recall, 0.90 F1-score, 0.87 accuracy, and 0.94 AUC. However, the support vector machine performed poorly, whereas LSTM showed moderate performance.
Overall, the results showed that topographic factors and rainfall were more important in determining flood occurrence. They also showed that the RF model predicted and mapped flood events more accurately than SVM, LSTM, and linear regression for the Amibara area and similar areas. However, caution should be taken in selecting important flood causative factors and the best-performing models for a particular area. The accuracy and reliability of the results depend on the quality of the training data, the types and quality of the features considered in model development, and model parameter estimation. The results of this study can be used as a baseline for informed decision-making and for developing a flood monitoring and early warning system for Amibara and similar areas. In particular, information on flood prediction and susceptibility is crucial for policymakers in disaster risk management.

Author Contributions

Conceptualization, T.D.L.; methodology, T.D.L., P.G. and G.K.W.; software, T.F.; validation, T.D.L., T.F. and G.K.W.; formal analysis, T.D.L. and T.F.; data curation, T.D.L. and T.F.; writing—original draft preparation, G.K.W.; writing—review and editing, P.G. and G.K.W.; supervision, P.G. and G.K.W.; project administration and funding acquisition, P.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by ESA through the EO AFRICA R&D Facility, ESRIN/Contract No. 4000133905/21/I-EF.

Data Availability Statement

Datasets used for this study will be provided upon reasonable request to the authors.

Acknowledgments

We acknowledge the Ministry of Water and Energy of Ethiopia, Amibara District administration and international satellite data providers including NASA and ESA for providing satellite data free of charge.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Giovannettone, J.; Copenhaver, T.; Burns, M.; Choquette, S. A statistical approach to mapping flood susceptibility in the Lower Connecticut River Valley Region. Water Resour. Res. 2018, 54, 7603–7618. [Google Scholar] [CrossRef]
  2. UNDRR; CRED. The Human Cost of Disasters: An Overview of the Last 20 Years (2000–2019). UNDRR. 2021. Towards an IGAD Regional Flood Risk Profile: Preliminary Results of the Regional Flood Risk Assessment. 2020. Available online: https://www.preventionweb.net/files/74124_humancostofdisasters20002019reportu.pdf (accessed on 12 June 2024).
  3. Christian Aid. Counting the Cost 2022: A Year of Climate Breakdown. 2022. Available online: https://www.christianaid.org.uk/resources/our-work/counting-cost-2022-year-climate-breakdown (accessed on 12 June 2024).
  4. FAO; WFP. Special Report—2021 FAO/WFP Crop and Food Security Assessment Mission (CFSAM) to the Republic of South Sudan; FAO: Rome, Italy, 2022. [Google Scholar]
  5. Reed, C.; Anderson, W.; Kruczkiewicz, A.; Nakamura, J.; Gallo, D.; Seager, R.; McDermind, S.S. The impact of flooding on food security across Africa. Proc. Natl. Acad. Sci. USA 2022, 119, e2119399119. [Google Scholar] [CrossRef] [PubMed]
  6. Balgah, R.A.; Buchenrieder, G.R.; Mbue, I.M. When nature frowns: A comprehensive impact assessment of the 2012 Babessi floods on people’s livelihoods in rural Cameroon. Jàmbá J. Disaster Risk Stud. 2015, 7, 197. [Google Scholar] [CrossRef] [PubMed]
  7. Mamo, S.; Berhanu, B.; Melesse, A.M. Chapter 29—Historical flood events and hydrological extremes in Ethiopia. In Extreme Hydrology and Climate Variability; Melesse, A.M., Abtew, W., Senay, G., Eds.; Elsevier: Amsterdam, The Netherlands, 2019; pp. 379–384. [Google Scholar]
  8. Woldegebrael, S.M.; Kidanewold, B.B.; Melesse, A.M. Development and Evaluation of a Web-Based and Interactive Flood Management Tool for Awash and Omo-Gibe Basins, Ethiopia. Water 2022, 14, 2195. [Google Scholar] [CrossRef]
  9. Amede, T.; Van den Akker, E.; Berdel, W.; Keller, C.; Tilahun, G.; Dejen, A.; Legesse, G.; Abebe, H. Facilitating livelihoods diversification through flood-based land restoration in pastoral systems of Afar, Ethiopia. Renew. Agric. Food Syst. 2020, 37, S43–S54. [Google Scholar] [CrossRef]
  10. Tegegne, G.; Melesse, A.M.; Asfaw, D.H.; Worqlul, A.W. Flood Frequency Analyses Over Different Basin Scales in the Blue Nile River Basin, Ethiopia. Hydrology 2020, 7, 44. [Google Scholar] [CrossRef]
  11. Lei, X.; Chen, W.; Panahi, M.; Falah, F.; Rahmati, O.; Uuemaa, E.; Kalantari, Z.; Ferreira, C.S.S.; Rezaie, F.; Tiefenbacher, J.P.; et al. Urban flood modeling using deep-learning approaches in Seoul, South Korea. J. Hydrol. 2021, 601, 126684. [Google Scholar] [CrossRef]
  12. Zhang, S.; Pan, B. An urban storm-inundation simulation method based on GIS. J. Hydrol. 2014, 517, 260–268. [Google Scholar] [CrossRef]
  13. Kalantari, Z.; Ferreira, C.S.S.; Koutsouris, A.J.; Ahlmer, A.K.; Cerdà, A.; Destouni, G. Assessing Flood Probability for Transportation Infrastructure Based on Catchment Characteristics, Sediment Connectivity and Remotely Sensed Soil Moisture. Sci. Total Environ. 2019, 661, 393–406. [Google Scholar] [CrossRef]
  14. Rahmati, O.; Darabi, H.; Panahi, M.; Kalantari, Z.; Naghibi, S.A.; Ferreira, C.S.S.; Kornejady, A.; Karimidastenaei, Z.; Mohammadi, F.; Stefanidis, S.; et al. Development of novel hybridized models for urban flood susceptibility mapping. Sci. Rep. 2020, 10, 12937. [Google Scholar] [CrossRef]
  15. Motta, M.; Neto, M.C.; Sarmento, P. A mixed approach for urban prediction using Machine Learning and GIS. Int. J. Disaster Risk Reduct. 2021, 56, 102154. [Google Scholar] [CrossRef]
  16. Nevo, S.; Morin, E.; Gerzi Rosenthal, A.; Metzger, A.; Barshai, C.; Weitzner, D.; Voloshin, D.; Kratzert, F.; Elidan, G.; Dror, G.; et al. Flood forecasting with machine learning models in an operational framework. Hydrol. Earth Syst. Sci. 2022, 26, 4013–4032. [Google Scholar] [CrossRef]
  17. Nur-Adib, M.; Harun, A.N.B.; Goto, M.; Cheros, F.; Haron, N.A.; Nawi, M.N.M. Evaluation of Machine Learning approach in flood prediction scenarios and its input parameters: A systematic review. IOP Conf. Ser. Earth Environ. Sci. 2020, 479, 012038. [Google Scholar]
  18. Kabir, S.; Patidar, S.; Xia, X.; Liang, Q.; Neal, J.; Pender, G. A deep convolutional neural network model for rapid prediction of fluvial flood inundation. J. Hydrol. 2020, 590, 125481. [Google Scholar] [CrossRef]
  19. Falah, F.; Rahmati, O.; Rostami, M.; Ahmadisharaf, E.; Daliakopoulos, I.N.; Pourghasemi, H.R. Artificial Neural Networks for Flood Susceptibility Mapping in Data-Scarce Urban Areas. In Spatial Modeling in GIS and R for Earth and Environmental Sciences; Elsevier: Amsterdam, The Netherlands, 2019; pp. 323–336. [Google Scholar]
  20. Michielsen, A.; Kalantari, Z.; Lyon, S.W.; Liljegren, E. Predicting and communicating flood risk of transport infrastructure based on watershed characteristics. J. Environ. Manag. 2016, 182, 505–518. [Google Scholar] [CrossRef]
  21. Wang, Y.; Hong, H.; Chen, W.; Li, S.; Panahi, M.; Khosravi, K.; Shirzadi, A.; Shahabi, H.; Panahi, S.; Costache, R. Flood susceptibility mapping in Dingnan County (China) using adaptive neuro-fuzzy inference system with biogeography-based optimization and imperialistic competitive algorithm. J. Environ. Manag. 2019, 247, 712–729. [Google Scholar] [CrossRef] [PubMed]
  22. Zehra, N. Prediction Analysis of Floods using Machine Learning Algorithms (NARX and SVM). Int. J. Sci. Basic Appl. Res. (IJSBAR) 2020, 49, 24–34. [Google Scholar]
  23. Ali, M.H.M.; Asmai, S.A.; Abidin, Z.Z.; Abas, Z.A.; Emran, N.A. Flood prediction using Deep Learning Models. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 972–981. [Google Scholar] [CrossRef]
  24. Karimi, P.; Bastiaanssen, W.G.M.; Molden, D. Water Accounting Plus (WA+)—A water accounting procedure for complex river basins based on satellite measurements. Hydrol. Earth Syst. Sci. 2013, 17, 2459–2472. [Google Scholar] [CrossRef]
  25. Mohr, P. Plate Tectonics of the Red Sea and East Africa. Nature 1970, 228, 547–548. [Google Scholar] [CrossRef]
  26. Shiferaw, H.; Bewket, W.; Alamirew, T.; Zeleke, G.; Teketay, D.; Bekele, K.; Schaffner, U.; Eckert, S. Implications of land use/land cover dynamics and Prosopis invasion on ecosystem service values in Afar Region. Ethiopia 2019, 675, 354–366. [Google Scholar] [CrossRef] [PubMed]
  27. ReliefWeb. Ethiopia Flood Response Plan—2020 Kiremt Season Floods (September 2020), Informing Humanitarians Worldwide 24/7—A Service Provided by UN OCHA. Available online: https://reliefweb.int/report/ethiopia/ethiopia-flood-response-plan-2020-kiremt-season-floods-september-2020 (accessed on 12 June 2024).
  28. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  29. Tanim, A.H.; McRae, C.B.; Tavakol-Davani, H.; Goharian, E. Flood Detection in Urban Areas Using Satellite Imagery and Machine Learning. Water 2022, 14, 1140. [Google Scholar] [CrossRef]
  30. Tang, Y.; Sun, Y.; Han, Z.; Soomro, S.; Wu, Q.; Tan, B.; Hu, C. Flood forecasting based on machine learning pattern recognition and dynamic migration of parameters. J. Hydrol. Reg. Stud. 2023, 47, 101406. [Google Scholar] [CrossRef]
  31. Li, B.; Yang, G.; Wan, R.; Dai, X.; Zhang, Y. Comparison of random forests and other statistical methods for the prediction of lake water level: A case study of the Poyang Lake in China. Hydrol. Res. 2016, 47, 69–83. [Google Scholar] [CrossRef]
  32. Yao, X.; Tham, L.G.; Dai, F.C. Landslide susceptibility mapping based on Support Vector Machine: A case study on natural slopes of Hong Kong, China. Geomorphology 2008, 101, 572–582. [Google Scholar] [CrossRef]
  33. Fang, Z.; Wang, Y.; Peng, L.; Hong, H. Predicting flood susceptibility using LSTM neural networks. J. Hydrol. 2021, 594, 125734. [Google Scholar] [CrossRef]
  34. Granata, F.; Di Nunno, F. Neuro-forecasting of daily stream flows in the UK for short-and medium-term horizons: A novel insight. J. Hydrol. 2023, 624, 129888. [Google Scholar] [CrossRef]
  35. Zou, Y.; Wang, J.; Lei, P.; Li, Y.A. A novel multi-step ahead forecasting model for flood based on time residual LSTM. J. Hydrol. 2023, 620, 129521. [Google Scholar] [CrossRef]
  36. Mahdavi, S.; Salehi, B.; Amani, M.; Granger, J.; Brisco, B.; Huang, W. A dynamic classification scheme for mapping spectrally similar classes: Application to wetland classification. Int. J. Appl. Earth Obs. Geoinf. 2019, 83, 101914. [Google Scholar] [CrossRef]
  37. Nguyen, D.L.; Chou, T.Y.; Hoang, T.V.; Chen, M.H. Flood Susceptibility Mapping Using Machine Learning Algorithms: A Case Study in Huong Khe District, Ha Tinh Province, Vietnam. Int. J. Geoinform. 2023, 19, 1–17. [Google Scholar] [CrossRef]
  38. Han, J.; Pei, J.; Kamber, M. Data Mining: Concepts and Techniques; Elsevier: Waltham, MA, USA, 2011. [Google Scholar]
  39. Chen, W.; Li, Y.; Xue, W.; Shahabi, H.; Li, S.; Hong, H.; Wang, X.; Bian, H.; Zhang, S.; Pradhan, B.; et al. Modeling flood susceptibility using data-driven approaches of naïve Bayes tree, alternating decision tree, and random forest methods. Sci. Total Environ. 2020, 701, 134979. [Google Scholar] [CrossRef] [PubMed]
  40. Shirzadi, A.; Bui, D.T.; Pham, B.T.; Solaimani, K.; Chapi, K.; Kavian, A.; Shahabi, H.; Revhaug, I. Shallow landslide susceptibility assessment using a novel hybrid intelligence approach. Environ. Earth Sci. 2017, 76, 60. [Google Scholar] [CrossRef]
  41. Bentivoglio, R.; Isufi, E.; Jonkman, S.N.; Taormina, R. Deep learning methods for flood mapping: A review of existing applications and future research directions. Hydrol. Earth Syst. Sci. 2022, 26, 4345–4378. [Google Scholar] [CrossRef]
  42. Al-Ruzouq, R.; Shanableh, A.; Jena, R.; Gibril, M.B.A.; Hammouri, N.A.; Lamghari, F. Flood susceptibility mapping using a novel integration of multi-temporal sentinel-1 data and eXtreme deep learning model. Geosci. Front. 2024, 15, 101780. [Google Scholar] [CrossRef]
  43. Haile, A.T.; Bekele, T.W.; Rientjes, T. Interannual comparison of historical floods through flood detection using multi-temporal Sentinel-1 SAR images, Awash River Basin, Ethiopia. Int. J. Appl. Earth Obs. Geoinf. 2023, 124, 103505. [Google Scholar]
  44. Mekonnen, T.M.; Mitiku, A.B.; Woldemichael, A.T. Flood Hazard Zoning of Upper Awash River Basin, Ethiopia, Using the Analytical Hierarchy Process (AHP) as Compared to Sensitivity Analysis. Sci. World J. 2023, 15, 1675634. [Google Scholar]
  45. Hasanuzzaman, M.; Islam, A.; Bera, B.; Shit, P.K. A comparison of performance measures of three machine learning algorithms for flood susceptibility mapping of river Silabati (tropical river, India). Phys. Chem. Earth Parts A/B/C 2022, 127, 103198. [Google Scholar] [CrossRef]
  46. Schoppa, L.; Disse, M.; Bachmair, S. Evaluating the performance of random forest for large-scale flood discharge simulation. J. Hydrol. 2020, 590, 125531. [Google Scholar] [CrossRef]
  47. Mosavi, A.; Ozturk, P.; Chau, K. Flood prediction using machine learning models: Literature review. Water 2018, 10, 1536. [Google Scholar] [CrossRef]
  48. Bachmair, S.; Svensson, C.; Prosdocimi, I.; Hannaford, J.; Stahl, K. Developing drought impact functions for drought risk management. Nat. Hazards Earth Syst. Sci. 2017, 17, 1947–1960. [Google Scholar] [CrossRef]
  49. El-Magd, S.A.; Pradhan, B.; Alamri, A. Machine learning algorithm for flash flood prediction mapping in Wadi El-Laqeita and surroundings, Central Eastern Desert, Egypt. Arab. J. Geosci. 2021, 14, 323. [Google Scholar] [CrossRef]
  50. Le, X.H.; Ho, H.V.; Lee, G.; Jung, S. Application of Long Short-Term Memory (LSTM) Neural Network for Flood Forecasting. Water 2019, 11, 1387. [Google Scholar] [CrossRef]
  51. Liu, Y.; Yang, Y.; Chin, R.J.; Wang, C.; Wang, C. Long Short-Term Memory (LSTM) Based Model for Flood Forecasting in Xiangjiang River. KSCE J. Civ. Eng. 2023, 27, 5030–5040. [Google Scholar] [CrossRef]
  52. Chang, L.C.; Amin, M.Z.M.; Yang, S.N.; Chang, F.J. Building ANN-based regional multi-step-ahead flood inundation forecast models. Water 2018, 10, 1283.
  53. Chu, H.; Wu, W.; Wang, Q.; Nathan, R.; Wei, J. An ANN-based emulation modelling framework for flood inundation modelling: Application, challenges and future directions. Environ. Model. Softw. 2020, 124, 104587.
  54. Rahman, M.; Ningsheng, C.; Islam, M.M.; Dewan, A.; Iqbal, J.; Washakh, R.A.A.; Shufeng, T. Flood Susceptibility Assessment in Bangladesh Using Machine Learning and Multi-criteria Decision Analysis. Earth Syst. Environ. 2019, 3, 585–601.
  55. Mojaddadi, H.; Pradhan, B.; Nampak, H.; Ahmed, N.; bin Ghazali, A.H. Ensemble machine-learning-based geospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS. Geomat. Nat. Hazards Risk 2017, 8, 1080–1102.
  56. Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22.
  57. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
  58. Draper, N.R.; Smith, H. Applied Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 1998.
  59. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  60. Rajab, A.; Farman, H.; Islam, N.; Syed, D.; Elmagzoub, M.A.; Shaikh, A.; Akram, M.; Alrizq, M. Flood Forecasting by Using Machine Learning: A Study Leveraging Historic Climatic Records of Bangladesh. Water 2023, 15, 3970.
  61. Karim, F.; Armin, M.A.; Ahmedt-Aristizabal, D.; Tychsen-Smith, L.; Petersson, L. A Review of Hydrodynamic and Machine Learning Approaches for Flood Inundation Modeling. Water 2023, 15, 566.
Figure 1. Description of the study area.
Figure 2. Sentinel 2 images acquired on 2 June 2020 (left), before the flood event, and on 10 September 2020 (right), during the flood event. Both images are displayed as false color composites (RGB).
Figure 3. Methodology flowchart.
Figure 4. Feature importance measured by the information gain ratio (InGR). Note: Soil-res (soil texture); CN-New (curve number); LULC (land use/land cover); TWI-Am (topographic wetness index); AP-CHI (CHIRPS rainfall); DEM (elevation).
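Figure 4 ranks the factors by information gain ratio (InGR). As a rough illustration of what that score measures, the sketch below computes a C4.5-style gain ratio (information gain divided by split information) for one categorical or pre-binned factor against flood/non-flood labels. It assumes the standard definition; how the authors discretized continuous factors such as elevation or rainfall is not stated here, and the `elev_bins`/`flooded` values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain_ratio(feature, labels):
    """C4.5-style gain ratio of a categorical (or pre-binned) factor.

    feature: one category value per sample point
    labels:  one class label per sample point (e.g. 1 = flooded, 0 = not flooded)
    """
    n = len(labels)
    # Group the labels by the factor's category value.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)

    # Information gain = H(labels) - H(labels | feature)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional

    # Split information = entropy of the factor's own category distribution;
    # dividing by it penalizes factors with many categories.
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # a constant factor carries no information
    return info_gain / split_info


# Hypothetical example: elevation binned into two classes at eight points.
elev_bins = ["low", "low", "low", "low", "high", "high", "high", "high"]
flooded = [1, 1, 1, 0, 0, 0, 0, 0]
print(round(information_gain_ratio(elev_bins, flooded), 3))
```

A higher ratio means the factor separates flooded from non-flooded points more cleanly; plain information gain would otherwise favour factors that are split into many small categories.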
Figure 5. Pearson correlation matrix of the nine flood causative factors. Note: Soil-res (soil texture); CN-New (curve number); LULC (land use/land cover); TWI-Am (topographic wetness index); AP-CHI (CHIRPS rainfall); DEM (elevation).
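A correlation matrix like Figure 5 is commonly used to check that no two causative factors are so strongly correlated that they carry redundant information. The sketch below computes pairwise Pearson coefficients, r = cov(x, y) / (σx σy), for factor values sampled at the same points and flags pairs whose |r| exceeds a threshold. The 0.7 cutoff is a common rule of thumb assumed here, not a value taken from the paper, and the sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Unnormalized sums: the 1/n factors cancel in the ratio below.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(factors):
    """factors: dict mapping factor name -> values sampled at the same points."""
    names = list(factors)
    return {(a, b): pearson(factors[a], factors[b]) for a in names for b in names}


def collinear_pairs(matrix, threshold=0.7):
    """Factor pairs (each listed once) whose |r| exceeds the assumed threshold."""
    return sorted(
        (a, b, round(r, 3))
        for (a, b), r in matrix.items()
        if a < b and abs(r) > threshold
    )


# Hypothetical factor values at five sample points.
factors = {
    "DEM": [700, 720, 750, 780, 800],
    "TWI": [12.0, 11.5, 9.5, 8.0, 7.5],
    "Slope": [1.0, 3.0, 2.0, 5.0, 1.5],
}
print(collinear_pairs(correlation_matrix(factors)))
```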
Figure 6. Receiver operating characteristics (ROC) curve for LSTM (A), RF (B), SVM (C), and linear regression (D).
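Each curve in Figure 6 plots the true positive rate against the false positive rate as the decision threshold on a model's flood score is lowered, and the AUC reported in Table 3 is the area under that curve. The sketch below builds the curve from scores and 0/1 labels and computes the AUC two equivalent ways: by the trapezoidal rule over the curve, and as the probability that a randomly chosen flood point scores higher than a randomly chosen non-flood point (ties count one half). It assumes both classes are present; the scores are hypothetical.

```python
def roc_curve(labels, scores):
    """(FPR, TPR) points from sweeping the threshold down from the highest
    score; labels are 1 (flood) or 0 (non-flood)."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples tied at the same score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(labels, scores):
    """AUC as P(score of a flood point > score of a non-flood point)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for six test points.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc_trapezoid(roc_curve(labels, scores)), auc_rank(labels, scores))
```

Giving every point the same score yields the diagonal from (0, 0) to (1, 1) and an AUC of 0.5, the value expected from a model that ranks flood and non-flood points no better than chance.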
Figure 7. Flood susceptibility maps developed by SVM (A), LSTM (B), RF (C), and linear regression models (D) (1e5 = 1 × 10⁵ and 1e6 = 1 × 10⁶).
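Maps such as those in Figure 7 are generally produced by applying a trained model to every pixel of the co-registered factor rasters (resampled to a common 30 m grid, per Table 2) and mapping the predicted flood probability. The sketch below shows that per-pixel step with plain nested lists. `predict_proba` is a hypothetical stand-in for any trained classifier that returns P(flood) for one feature vector, and the five equal-interval classes are an assumption for illustration, since the class breaks used for Figure 7 are not given here.

```python
from typing import Callable, List, Sequence

Raster = List[List[float]]


def predict_map(stack: Sequence[Raster],
                predict_proba: Callable[[List[float]], float]) -> Raster:
    """Apply a per-pixel model to a stack of co-registered factor rasters.

    stack: one 2-D raster (rows x cols) per factor, all on the same grid
    predict_proba: hypothetical callable returning P(flood) for one pixel
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [predict_proba([layer[r][c] for layer in stack]) for c in range(cols)]
        for r in range(rows)
    ]


def classify(prob_map: Raster,
             breaks: Sequence[float] = (0.2, 0.4, 0.6, 0.8)) -> List[List[int]]:
    """Bin probabilities into classes 0 (very low) to 4 (very high) using
    assumed equal-interval breaks."""
    return [[sum(p >= b for b in breaks) for p in row] for row in prob_map]


def toy_model(features):
    """Hypothetical stand-in for a trained classifier: high flood probability
    for low-lying cells (elevation < 800 m) with high TWI (> 8)."""
    elevation, wetness = features
    return 1.0 if elevation < 800 and wetness > 8 else 0.1


# Hypothetical 2 x 2 rasters for two factors.
elevation = [[700, 900], [750, 1000]]
wetness = [[12, 5], [10, 3]]
probabilities = predict_map([elevation, wetness], toy_model)
print(classify(probabilities))
```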
Table 1. Description and sources of the datasets.
| S. No | Data Type | Source | Description |
|---|---|---|---|
| 1 | Streamflow | MoWE | Used to train the models |
| 2 | Rainfall | Satellite rainfall (CHIRPS, 5 km) | Used to train the models |
| 3 | Sentinel 2 images | ESA | Used to train the models |
| 4 | Land use/land cover | ESA WorldCover (10 m) | Used to derive major LULC classes |
| 5 | DEM | SRTM (30 m) | Used to derive topographic parameters (slope, elevation, aspect, curvature, and TWI) |
| 6 | Soil | USDA | Used for deriving CN and soil texture |
Table 2. Parameters considered for flood susceptibility mapping.
| S. No | Flood Conditioning Factor | Source | Resampled Resolution (m) |
|---|---|---|---|
| 1 | DEM (elevation) | SRTM | 30 |
| 2 | Slope | Derived from DEM | 30 |
| 3 | Aspect | Derived from DEM | 30 |
| 4 | Curvature | Derived from DEM | 30 |
| 5 | Topographic Wetness Index (TWI) | Derived from DEM | 30 |
| 6 | Curve Number (CN) | From soil data | 30 |
| 7 | Soil | USDA (EnvirometriX Ltd.) | 30 |
| 8 | Rainfall | Climate Hazards Center (CHIRPS) | 30 |
| 9 | Land use/land cover (LULC) | Sentinel 2 | 30 |
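Table 2 lists the topographic wetness index (TWI) as derived from the DEM. Its usual definition is TWI = ln(a / tan β), where a is the specific catchment area (upslope contributing area per unit contour width) and β is the local slope. The sketch below evaluates it for one 30 m cell from a flow-accumulation count and a slope in degrees. The flow-routing method that produces the accumulation grid is not specified in this section, and the minimum-slope floor that avoids division by zero on flat cells is an assumption.

```python
import math


def twi(flow_accumulation: float, slope_deg: float, cell_size: float = 30.0,
        min_slope_deg: float = 0.01) -> float:
    """Topographic wetness index ln(a / tan(beta)) for one grid cell.

    flow_accumulation: number of upslope cells draining through this cell
    slope_deg: local slope in degrees
    cell_size: grid resolution in metres (30 m per Table 2)
    min_slope_deg: assumed floor that avoids division by zero on flat cells
    """
    # Specific catchment area: contributing area (upslope cells plus this
    # cell) divided by the contour width, approximated by the cell size.
    a = (flow_accumulation + 1) * cell_size ** 2 / cell_size
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(a / math.tan(beta))


# Hypothetical cells: a steep hillslope cell and a flat valley-floor cell.
print(round(twi(0, 45), 2), round(twi(500, 0.5), 2))
```

Cells with large contributing areas and gentle slopes get high TWI values, which is why the index is used as a proxy for where water tends to accumulate.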
Table 3. Performance evaluation of machine learning models for flood prediction.
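| Model | Precision | Recall | F1-Score | Accuracy | AUC |
|---|---|---|---|---|---|
| SVM | 0.75 | 0.90 | 0.81 | 0.75 | 0.5 |
| LSTM | 0.79 | 0.87 | 0.83 | 0.76 | 0.81 |
| RF | 0.90 | 0.94 | 0.91 | 0.91 | 0.94 |
| Linear regression | 0.85 | 0.96 | 0.90 | 0.87 | 0.94 |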
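The first four metrics in Table 3 follow from the confusion matrix of the test points, with flood as the positive class: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = 2PR / (P + R), and accuracy = (TP + TN) / total. The sketch below computes them from counts and, as a quick consistency check, recomputes F1 from the rounded precision and recall reported in Table 3; each recomputed value agrees with the reported F1 to within about 0.01. The confusion counts in the usage line are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # flood points predicted as flood
    fp: int  # non-flood points predicted as flood
    fn: int  # flood points predicted as non-flood
    tn: int  # non-flood points predicted as non-flood


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def metrics(c: Confusion) -> dict:
    """Precision, recall, F1 and accuracy for the positive (flood) class."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)
    return {"precision": precision, "recall": recall,
            "f1": f1_from(precision, recall), "accuracy": accuracy}


# Hypothetical counts for 80 test points.
print(metrics(Confusion(tp=45, fp=5, fn=3, tn=27)))

# Values reported in Table 3: model -> (precision, recall, reported F1).
TABLE_3 = {
    "SVM": (0.75, 0.90, 0.81),
    "LSTM": (0.79, 0.87, 0.83),
    "RF": (0.90, 0.94, 0.91),
    "Linear regression": (0.85, 0.96, 0.90),
}
for model, (p, r, f1_reported) in TABLE_3.items():
    print(f"{model}: F1 from P and R = {f1_from(p, r):.3f}, reported = {f1_reported}")
```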
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wedajo, G.K.; Lemma, T.D.; Fufa, T.; Gamba, P. Integrating Satellite Images and Machine Learning for Flood Prediction and Susceptibility Mapping for the Case of Amibara, Awash Basin, Ethiopia. Remote Sens. 2024, 16, 2163. https://doi.org/10.3390/rs16122163

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
