1. Introduction
Floods are considered one of the greatest threats among natural disasters, and their frequency is expected to increase in the future due to urban development and climate change [
1]. Southeast Asian countries are particularly vulnerable to flooding hazards, especially during the wet season. Organizations such as the Mekong River Commission (MRC) and the Asian Disaster Preparedness Center (ADPC) are therefore implementing regional flood forecasting systems by synergistically combining hydrological data and modeling outputs [
2]. Satellite data have become an important component of environmental risk management through flood extent mapping, the process of identifying the land areas impacted by flooding. At the same time, the production of land use/land cover (LULC) maps has become a frequently used method for flood risk monitoring [
3,
4]. These maps can be integrated into flood databases to identify risk zones and determine levels of vulnerability.
Floodable zones have become a considerable socio-economic issue at the local level in the Mekong Delta, which is ecologically, economically and socially important. Many studies have therefore been conducted over the years using Earth Observation data, especially to characterize the rice crop. The majority of these studies have focused on delineating the distribution of rice crops using either optical satellite imagery (passive sensors) [
5,
6] or radar satellite images (active sensors) [
7,
8,
9]. Active and passive sensors used for flood risk applications cover a very wide range of the electromagnetic spectrum, and information from these spectral ranges, alone or in combination, contributes significantly to forecasting and risk management. Unlike passive (optical) sensors, which are severely affected by cloud cover, a major obstacle to environmental studies in tropical and equatorial areas, satellite-based Synthetic Aperture Radar (SAR) can penetrate clouds. SAR is therefore a valuable source of information for flood monitoring and soil moisture studies. SAR missions launched since 2000, such as ALOS (L-band, Japan, 2006), TerraSAR-X (X-band, Germany, 2007), RADARSAT-2 (C-band, Canada, 2007), COSMO-SkyMed and Sentinel-1 (C-band, European Union, 2014), offer new avenues of research in flood management and sustainable development, including high (metric) spatial resolution and coherent polarimetric data. With a higher temporal resolution than previous SAR instruments, Sentinel-1 is able to monitor the seasonal cycle of water cover every six days. These latest advances in SAR data acquisition have enabled the development of near real-time and automated flood mapping [
10,
11,
12]. Indeed, the possibility of fully automated services for surface water [
12] has been investigated and several works have focused on the mapping of the extension of flooded areas and the application of automatic methods using satellite images [
13,
14,
15,
16,
17]. However, Sentinel-1 C-band images have not yet been fully exploited for flood mapping.
In addition, the low backscatter value of water, in the absence of wind effects, is frequently used for the detection of flooded areas on radar images. Water surfaces constitute a specular reflector of the radar pulse, which results in a reduced signal returning to the satellite [
18,
19]. However, rain and wind can roughen the water surface, which increases the backscattered SAR signal and can mask flooded areas [
18,
20]. The backscatter of the SAR signal also varies with the angle of incidence (AI) and with the local angle of incidence (LIA), which depends on both the target topography and the AI [
21]. Thus, the backscatter intensity can be influenced by environmental conditions such as landscape topography and shadows.
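The low-backscatter behaviour of calm water described above can be illustrated with a minimal sketch. It assumes that the backscatter coefficient is available as positive linear values, and it uses a purely hypothetical threshold of -18 dB; neither the threshold nor the function names come from this study, and, as noted above, no single threshold is valid under wind, rain or varying incidence angle.

```python
import math


def to_db(sigma0_linear):
    """Convert a linear backscatter coefficient (must be > 0) to decibels."""
    return 10.0 * math.log10(sigma0_linear)


def water_mask(sigma0_db, threshold_db=-18.0):
    """Flag pixels whose backscatter (dB) falls below the threshold.

    threshold_db is a hypothetical illustration value, not a value
    reported in this study.
    """
    return [[value < threshold_db for value in row] for row in sigma0_db]


# Hypothetical 2 x 3 tile of linear backscatter values (not real data).
tile_linear = [
    [0.004, 0.005, 0.080],
    [0.006, 0.090, 0.100],
]

tile_db = [[to_db(value) for value in row] for row in tile_linear]
mask = water_mask(tile_db)

for row in mask:
    print(row)
```

In this sketch, the darkest pixels are labelled as candidate water. In practice, a fixed threshold of this kind would need to be tuned per scene and polarization, which is one reason why learned classifiers are explored in this work.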
Another possible difficulty is the identification of flooding in areas where objects protrude above the water surface and thus interact with the radar signal. It is therefore difficult to determine a general backscatter threshold. The environment plays an important role here: for example, water can be masked by vegetation cover such as lotuses and aquatic grasses, which introduces uncertainty in mapping the extent of flooding. According to some authors [
22], there are normally large areas of widespread aquatic grasses and lotus lakes during the flood season.
In the field of flood mapping, several methods have been applied using satellite images: photo-interpretation and image segmentation, which use mathematical principles such as edge detection, and fuzzy logic with artificial neural network exploration. The most frequently used method is thresholding, which is applied to SAR images in order to discriminate between water and non-water areas. The techniques reported in the literature include the following: image histogram thresholding [
17,
23,
24], image classification algorithms [
25,
26,
27,
28,
29,
30], image texture algorithms [
31] and multi-temporal change detection methods [
28,
29,
32,
33]. The scientific community agrees that machine learning (ML) methods have several advantages in environmental applications, such as improved mapping accuracy, reduced computation time and reduced model development cost [
34,
35,
36,
37]. According to several studies, ML has the potential to fundamentally improve future flood risk and impact assessments [
38,
39,
40,
41]. Moreover, recent developments in ML, especially neural network models, have made advanced applications in the field of environmental and risk analysis possible. Applications of ML methods to flood mapping have emerged in recent years [
4,
42]. Furthermore, convolutional neural networks (CNNs) have demonstrated excellent performance in various domains, including image classification [
43], object-based image analysis (OBIA [
44]) and scene labeling [
45], in the field of computer vision [
46,
47,
48]. Nemni et al., 2020 [
49] proposed a CNN-based method for isolating flooded pixels from Sentinel-1 images without any optical band and with minimal preprocessing. Li et al., 2019 [
50] evaluated the role of interferometric coherence in urban flood detection using multi-temporal TerraSAR-X data. They introduced an active self-learning convolutional neural network (A-SL CNN) framework to mitigate the effect of a limited annotated training data set. Kang et al., 2018 [
51] applied a fully convolutional network (FCN) based on the classical FCN to flood mapping using Gaofen-3 SAR images in China. Shen et al., 2019 [
52] developed a near-real-time (NRT) flood mapping system, named RAPID, based on dual-polarized SAR data.
Overall, the present study establishes a methodology for mapping the floodplain and its land use (in this case, rice paddies) before, during and after the floods, and aims to map the fluctuation of the flooded areas. Specifically, it explores the potential of several robust ML models, namely CNN, multilayer perceptron (MLP) and random forest (RF), by comparing the accuracy of predictive models for flood and floodable area mapping in a complex deltaic environment (An Giang province, Mekong Delta). Moreover, this study analyzes the contribution of Sentinel-1 SAR data in the VV (vertical transmit, vertical receive) and VH (vertical transmit, horizontal receive) polarizations, according to their backscatter characteristics. Furthermore, based on a comparative study of the optimized ML models, this paper proposes an accurate method for deriving complex decision boundaries between flooded and non-flooded areas and for producing reliable detection and mapping of the LULC classes that can potentially be impacted by floods.
4. Discussion
The most frequent flooding in the Mekong Delta is mainly induced by the Mekong River flooding regime as well as by the surface flow. The construction of dyke systems from the early 1990s onwards to protect fields from flooding allows for third crop cultivation in some parts of the delta. However, this is not the case everywhere in our study area. Another category of floods comprises those caused by the dense network of canals and controlled by dykes and lock gates. In this case, flood monitoring is difficult and relies mainly on data from a few meteorological stations in the region and on hydrological models. However, flood forecasting from these models is becoming more difficult due to anthropogenic factors, sea level rise and environmental and climate changes [
70]. This study aimed to provide an accurate method for mapping flood-prone areas in the Mekong Delta using satellite data and ML algorithms.
By testing the accuracy of the ML algorithms, the first objective of this study was to define an effective and validated method for detecting flooded and non-flooded areas in the Mekong Delta. This type of study could enable a wide range of users to reproduce the method in an automatic and standardized way, which should allow a database shared between local institutes to be updated efficiently [
71]. It was therefore important not only to use software, algorithms and data that provide a reliable and accurate result, but also to ensure that they are available and reproducible, so that the mapping can also be undertaken by partners in Vietnam or in other study areas. Regarding the data, we noticed that, although Sentinel-1 SAR data are widely used for flood monitoring due to their high spatial and temporal resolution and free availability, very few studies using Sentinel-1 SAR data for flood mapping and monitoring have been conducted in the Mekong Delta. Regarding the choice of algorithm, we have provided detailed results and the accuracy of three ML algorithms for image classification through a comparative study. In addition, supervised and unsupervised classification methods have been widely applied for surface water mapping using satellite images [
72], and a comparison of supervised and unsupervised machine learning methods reported an overall accuracy of 89.3% for the unsupervised water classification method [
73], whereas in the present study the accuracy was higher (above 95% overall). Although the supervised classification method is able to map water bodies efficiently, creating the training and validation datasets can be tedious and time-consuming.
The flood maps derived from the algorithms tested in this work were validated by overlaying them with meteorological and hydrographic data. We noticed that the accuracy of the image classifications varies with the methods and techniques employed. Few studies have reported minor to moderate fluctuations in the accuracy of classification of flooded and non-flooded areas using different classifiers. We therefore paid particular attention to the accuracy assessment and validation of the classification and mapping model, comparing the results of the CNN model with other robust models such as MLP and RF. All models showed the same trends in the accuracy indicators. In general, the indicators reached their highest values in the months with the maximum extent of flooded areas (October, followed by November) and their lowest values in the month with the smallest extent (July). However, the CNN model performed best, achieving the highest accuracy. This accuracy analysis is one of the added values of this study, as it indicates the performance of each model and allows users to choose the appropriate one for flood and LULC mapping.
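As an illustration of how accuracy indicators of this kind can be derived for a flooded/non-flooded classification, the following minimal sketch computes several common indicators from a binary confusion matrix. The choice of indicators (overall accuracy, precision, recall, F1 score and Cohen's kappa) is an assumption made here for illustration, since this section does not list the indicators used, and the pixel counts are hypothetical rather than results of this study.

```python
def accuracy_indicators(tp, fp, fn, tn):
    """Compute common accuracy indicators from binary confusion-matrix counts.

    tp: flooded pixels correctly labelled as flooded
    fp: non-flooded pixels wrongly labelled as flooded
    fn: flooded pixels wrongly labelled as non-flooded
    tn: non-flooded pixels correctly labelled as non-flooded
    """
    total = tp + fp + fn + tn
    overall = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Agreement expected by chance, needed for Cohen's kappa.
    chance_flood = ((tp + fp) / total) * ((tp + fn) / total)
    chance_dry = ((fn + tn) / total) * ((fp + tn) / total)
    expected = chance_flood + chance_dry
    kappa = (overall - expected) / (1 - expected)

    return {
        "overall_accuracy": overall,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical validation counts (not taken from this study).
indicators = accuracy_indicators(tp=90, fp=10, fn=10, tn=90)
for name, value in indicators.items():
    print(f"{name}: {value:.3f}")
```

Computing the same indicators for each model and each month would make it possible to compare how CNN, MLP and RF behave as the flood extent changes over the season.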
The Flood and Floodable Area Forecasting Model
In the field of flood mapping, the main objective is to distinguish between flooded and non-flooded areas, which can be treated as a binary classification process in which regions are labelled as “flood” or “non-flood.” In this study, the CNN classifier showed a very high overall accuracy of about 99% for flooded and non-flooded areas. It was directly used for binary classification in order to identify the regional floodable and non-floodable areas. In order to provide a simplified and reproducible approach, a 2D-CNN architecture was used for the generalized classification process.
Furthermore, we focused our analysis on the most flood-impacted zones of the study area (Table 9). The flooded area was smaller in 2020 (3923.3 km²) than in 2019 (4478.9 km²). This difference in flood extent suggests that the flooded area should be analyzed in relation to the maximum level of the Mekong River: at Tan Chau station, the recorded level was almost 4 m in 2019, compared with less than 3 m in 2020. At the same time, the uncertainties of SAR-based mapping should be considered. The SAR signal may be affected by speckle, which can lead to under- or over-detection of the flood extent, especially in urban and vegetated areas. In this context, the CNN framework aims to reduce classification errors associated with land cover heterogeneities and underlying complexity. This framework can efficiently distinguish permanent water from flood water even though minor misclassification errors may be observed among land cover classes.
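The inter-annual comparison above can be made explicit with a short worked calculation. The two areas are the values reported in the text. The conversion from pixel counts to area is only a sketch that assumes a 10 m pixel spacing, which is a typical value for Sentinel-1 ground-range products but is not stated in this section; the helper name is hypothetical.

```python
# Flooded areas reported in the text (km^2).
area_2019_km2 = 4478.9
area_2020_km2 = 3923.3

# Absolute and relative decrease from 2019 to 2020.
difference_km2 = area_2019_km2 - area_2020_km2
relative_decrease = difference_km2 / area_2019_km2

print(f"Decrease: {difference_km2:.1f} km^2 ({relative_decrease:.1%} of the 2019 extent)")


def pixel_count_to_km2(pixel_count, pixel_size_m=10.0):
    """Convert a count of flooded pixels to an area in km^2.

    pixel_size_m is an assumed pixel spacing (hypothetical default).
    """
    return pixel_count * pixel_size_m ** 2 / 1_000_000.0
```

Under these assumptions, the 2020 flood extent is about 556 km² (roughly 12%) smaller than the 2019 extent.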
In order to interpret and understand the driving forces behind the onset and progression of flooding in the Mekong Delta, it is important to understand the climate and hydrological regime in this extremely complex flooding environment. Flooding in the Mekong Delta has a series of secondary effects, both undesired and desired. The undesired effects include the destruction of infrastructure and crops. Meanwhile, floodwaters fertilize floodplain soils and can provide a habitat for aquatic animals, and when controlled, they enable irrigation activities and even energy generation. Based on the SAR time series alone, it is not possible to fully differentiate between the individual components, nor is it possible to distinguish between a “desirable flood” and an “undesirable flood” [
26]. Nor is it always possible to distinguish between natural and man-made floods. However, interpretation can be more reliable in this respect if auxiliary data, such as information on the type of land use and human activities, are available.
5. Conclusions
Floods are a recurrent risk in the Vietnamese Mekong Delta. This phenomenon is happening more frequently and with higher intensity due to climate change [
74,
75]. The analysis and monitoring of flood events through the mapping of flooded and floodable areas is therefore becoming a priority in risk management. This study provides a systematic approach by exploring the potential of advanced ML models with an optimal architectural design for flood and flood-prone area mapping from SAR images in tropical deltaic environments. In order to exploit the multi-temporal series of Sentinel-1 images in dual polarization (VV and VH), a backscatter coefficient analysis was performed using a large number of reference images (60 images per year, i.e., 5 images per month). Moreover, the hydrological regime data, the calendar of flooding and the rice cultivation period were incorporated in order to allow a more reliable and accurate detection of changes during floods.
Three robust ML models, namely CNN, MLP and RF, were developed and showed high potential for flood and floodable area mapping in the Mekong Delta. A detailed comparison of the accuracy indicators recorded by the three ML models, correlated with the flooding periods, is especially important for a thorough accuracy assessment. The proposed CNN model demonstrated the highest reliability and flexibility for flood and floodable area mapping. These prediction results provide new insights into the patterns of flood variation in space and time in the Mekong Delta. Furthermore, the use of segmentation parameters adapted to seasonal and annual variations, and the adaptation of CNN models to these variations, is one of the original aspects of our classification method.
According to the results of the flood extent mapping derived from the application of the three ML algorithms, the predictions of the spatiotemporal flood forecast models based on the Sentinel-1 time series appear to be consistent overall. Furthermore, from a qualitative point of view, the magnitude of seasonal and inter-annual variations in flood extent was also consistent with the significant peaks during the wet season and troughs during the dry season highlighted by the hydro-meteorological data. Indeed, peaks and troughs in flood extent are generally well aligned with the CNN mapping of flood events and floodable areas.
Although rice fields were the economic issue addressed in this study, a LULC analysis was also conducted to quantify the impact of flood risk on different land use classes with significant local economic value. This research suggests that the CNN model developed here could be generalized to other deltaic areas for future studies, using other types of remotely sensed images.