1. Introduction
Frequent global flood disasters have become a pressing issue, resulting in millions of people falling into poverty each year. These events not only impose enormous pressure on governments, but also hinder social development significantly. In July 2021, catastrophic rainfall in Zhengzhou, Henan Province, China, led to 302 deaths or disappearances [1]. In February 2022, heavy rains in Queensland, Australia, caused eight deaths. In September 2022, Sudan experienced torrential rain and flooding that resulted in 146 deaths. The economic losses caused by flood disasters worldwide in the 21st century have exceeded USD 46 billion annually [2], and approximately 70% of the deaths caused by floods in China are attributed to flash flood disasters [3,4]. These devastating impacts of flood disasters highlight the urgency and importance of effective water resource management. During floods, the prolonged duration and immense destructive power of the floodwaters necessitate the timely and accurate acquisition of flood inundation extents [5], which is crucial for minimizing disaster losses and monitoring floods.
Traditional methods for water body extraction often require extensive manpower and material resources. These methods include direct comparison approaches, such as the difference method, ratio method, and regression analysis, as well as image transformation approaches, such as Normalized Difference Vegetation Index (NDVI) differencing, change vector analysis, principal component analysis, and texture-based analysis [6]. In the 1990s, with the emergence of machine learning, researchers began to apply more sophisticated approaches to water detection, including artificial neural networks [7], support vector machines (SVM), decision trees, random forests [7,8,9], multi-kernel learning, and various hybrid methods such as spectral mixture analysis, fuzzy clustering analysis, and bio-inspired evolutionary algorithms. In the early 21st century, researchers shifted their focus towards object-based image analysis, introducing techniques such as Markov random fields, conditional random fields, and object-level change vector analysis [6,10,11]. At the same time, object category comparison methods emerged, including hybrid methods operating at both the pixel and object levels. In the 2010s, with the advent of remote sensing big data and artificial intelligence, deep learning techniques such as autoencoders, neural networks [12], recurrent neural networks, and knowledge graphs gradually found application in water body identification.
Deep learning models learn by assessing the differences between predicted outcomes and ground truth labels, and Convolutional Neural Networks (CNNs) are a prominent branch of this domain. Owing to ongoing innovations, CNNs have demonstrated remarkable success across a spectrum of computer vision tasks, and standard CNN models are now widely employed for water body segmentation. Widely recognized algorithms include SegNet [13], U-Net [14], RefineNet [15], PSPNet [16], Mask R-CNN [17], the Deeplab series [18,19,20,21], DUPNet [22], CoANet [23], D-LinkNet [24], and PANet [25]. The advent of deep learning has shifted the segmentation paradigm away from traditional methods, significantly enhancing the accuracy and speed of water body segmentation.
As data-driven algorithms, deep learning models rely heavily on the quantity and quality of training data. For water body segmentation tasks, the primary data sources include optical imagery and Synthetic Aperture Radar (SAR) imagery [26]. Optical remote sensing sensors form images from reflected sunlight and therefore constitute passive imaging systems. However, optical imagery is susceptible to weather conditions, lighting variations, and cloud cover, so it cannot provide all-weather remote sensing capabilities. SAR, by contrast, is an active imaging method based on high-resolution microwave radar sensors: the sensor emits electromagnetic waves towards the Earth’s surface and forms an image from the echoes returned by the surface. SAR images have notable advantages over optical images, since SAR satellites can perform all-weather and all-day data monitoring without being affected by clouds, fog, or illumination conditions [27]. For this reason, researchers and practitioners are often inclined to use SAR images as the data source for water body segmentation tasks. In recent years, with the continuous development of SAR technology, the spatial resolution of SAR images has progressively improved and has now reached sub-meter levels [12,28,29,30], which facilitates high-precision water body extraction and supports the prompt identification of disaster-stricken areas and the mitigation of the associated losses.
Publicly available datasets play a crucial role in driving advancements in the computer vision field and documenting its development process [31]. However, in the context of water body segmentation, there is a lack of accurately annotated high-resolution SAR image datasets. This shortage poses challenges such as unclear targets and inaccurate data, directly impacting the accuracy and generalization capability of models. To address this issue, this study constructs a precisely annotated high-resolution SAR image dataset for water body segmentation based on GF-3 satellite imagery, termed the DaliWS dataset. Furthermore, several state-of-the-art semantic segmentation models are evaluated experimentally on our dataset, and the analysis of the results establishes a performance benchmark.
Our contributions are summarized as follows:
To address the scarcity of publicly available accurately annotated high-resolution SAR image datasets for water body segmentation, this paper provides a manually annotated dataset using GF-3 satellite imagery. The dataset covers large rural water bodies in Dali County and provides pixel-level annotations for the training and validation of deep learning algorithms.
We explore the impact of different polarization modes on water body segmentation, including HH polarization (single co-polarization, horizontal transmit/horizontal receive), HV polarization (single cross-polarization, horizontal transmit/vertical receive), and HH+HV (dual polarization, horizontal transmit with horizontal and vertical receive). This provides a reference for subsequent related research and facilitates the application of our dataset.
To further understand the characteristics of the dataset, this study extensively evaluates several state-of-the-art segmentation algorithms on it. The results demonstrate the dataset’s inherent challenges and open new opportunities for water segmentation research.
This study conducts numerous experiments and establishes performance benchmarks specifically for our dataset, laying the foundation for future research. It is anticipated to provide valuable resources and references for the research and development of water body segmentation.
The structure of this paper is as follows:
Section 2 presents the processing procedure for SAR images and the creation of the DaliWS dataset.
Section 3 describes the segmentation networks used for evaluation, the data augmentation methods, and the evaluation metrics employed in this study.
Section 4 describes the experiments and presents the results. Finally, in Section 5 and Section 6, we discuss and summarize the findings of this research.
4. Results
4.1. Quantitative Analysis of Four Networks on DaliWS Dataset
To comprehensively explore the DaliWS dataset, we train four segmentation networks, namely FCN, SegNeXt, U-Net, and DeeplabV3+, on the training set. Subsequently, we evaluate the trained models using the test set. We then perform a quantitative comparison of the segmentation results of these four models on the DaliWS dataset and investigate the impact of polarization modes on segmentation performance. To ensure experimental fairness, we employ consistent evaluation metrics and evaluation code, and maintain identical network parameter configurations, as described in Section 3.3.2.
In the first experiment, we focus on examining the water body extraction performance of different models on the DaliWS dataset. Results from Table 5 indicate that all four models achieve F1 scores of over 85%, with U-Net exhibiting the best performance. In the dual-polarization mode, U-Net achieves F1, PA, and mIoU scores of 92.110%, 98.371%, and 85.374%, respectively. DeeplabV3+ follows closely, achieving F1, PA, and mIoU scores of 91.199%, 98.188%, and 83.822% in the dual-polarization mode. In comparison, the FCN-8s and SegNeXt networks perform less well in water area segmentation, with F1 scores of 90.361% and 90.192%, respectively, in the dual-polarization mode. Table 6 displays the GFLOPs and Params for these four models with an input size of (1,1,256,256), revealing that the order of GFLOPs from highest to lowest is U-Net, DeeplabV3+, SegNeXt, and FCN. U-Net has the smallest parameter count but the highest computational workload.
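To make the distinction between parameter count and computational workload concrete, the following minimal sketch counts both for a single 3 × 3 convolution applied at two spatial resolutions. The layer sizes are hypothetical and are not taken from any of the four networks, and the sketch assumes the common convention of two floating-point operations per multiply-accumulate (some profiling tools report multiply-accumulates instead). It illustrates why a network that processes large feature maps can combine few parameters with a high GFLOPs figure.

```python
def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Number of learnable parameters in a k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def conv2d_flops(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Approximate forward-pass FLOPs (2 per multiply-accumulate, bias ignored)."""
    return 2 * k * k * c_in * c_out * h_out * w_out


# Hypothetical layer: 64 -> 64 channels with a 3 x 3 kernel.
params = conv2d_params(64, 64, 3)
flops_full = conv2d_flops(64, 64, 3, 256, 256)  # applied at full 256 x 256 resolution
flops_small = conv2d_flops(64, 64, 3, 32, 32)   # same layer after 8x downsampling

print(f"parameters:          {params}")
print(f"GFLOPs at 256 x 256: {flops_full / 1e9:.3f}")
print(f"GFLOPs at 32 x 32:   {flops_small / 1e9:.3f}")
```

The parameter count is identical in both cases, while the operation count differs by a factor of 64, the ratio of the two feature-map areas.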
In the second experiment, we examine the impact of different polarization modes on segmentation performance. FCN, SegNeXt, U-Net, and DeeplabV3+ achieve F1 scores of 90.361%, 90.192%, 92.110%, and 91.199%, respectively, on HH+HV dual-polarization data, outperforming their results on HH and HV single-polarization data. This is understandable, as the fusion of HH and HV dual-polarization data provides more comprehensive and richer features for water body segmentation models. On HH and HV single-polarization data, U-Net consistently performs the best, with F1 scores of 91.173% and 90.198%, respectively. Overall, dual-polarization outperforms single-polarization, with HH polarization being superior to HV polarization.
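The scores quoted in this subsection can be related to one another with a short sketch. It assumes the standard confusion-matrix definitions of F1, pixel accuracy (PA), and intersection over union (IoU) for the water class; the paper's own definitions are given in Section 3 and are not reproduced here, and the pixel counts below are hypothetical. For a single class, IoU and F1 satisfy IoU = F1 / (2 - F1), and the reported U-Net and DeeplabV3+ values are consistent with this identity, which suggests (though the text here does not confirm it) that the reported mIoU corresponds to the water-class IoU.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary segmentation metrics, with water as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "f1": 2 * precision * recall / (precision + recall),
        "pa": (tp + tn) / (tp + fp + fn + tn),
        "iou": tp / (tp + fp + fn),
    }


def iou_from_f1(f1: float) -> float:
    """Single-class identity between F1 (Dice) and IoU."""
    return f1 / (2.0 - f1)


# Hypothetical pixel counts, for illustration only.
m = binary_metrics(tp=900, fp=80, fn=70, tn=8950)
print({name: round(value, 4) for name, value in m.items()})

# Values quoted in the text for the dual-polarization mode.
print(f"U-Net:      F1 92.110% -> IoU {100 * iou_from_f1(0.92110):.3f}%")
print(f"DeeplabV3+: F1 91.199% -> IoU {100 * iou_from_f1(0.91199):.3f}%")
```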
4.2. Qualitative Analysis of Four Networks on DaliWS Dataset
Figure 7 and Figure 8 illustrate a selection of the water body prediction results of FCN-8s, SegNeXt, U-Net, and DeeplabV3+ on the DaliWS test dataset. From the figures, it can be observed that all models are able to predict large water bodies to some extent. However, they exhibit limitations in accurately segmenting boundaries, shadows, and small water bodies.
In the first row, it can be seen that SegNeXt and U-Net incorrectly identify a small tributary as water, while FCN and DeeplabV3+ correctly classify it as background. However, most methods erroneously label farmland as water. In the second row, U-Net misses a portion of the water body in the river. In the last row, none of the methods are able to effectively segment the water body and path boundaries in the shadow areas. Among these four models, FCN produces relatively coarse predictions due to the final feature map output not being of the same size as the original input. It requires 8-fold upsampling to restore the prediction map size. While this approach reduces computational complexity, it results in the loss of significant detail. For the DaliWS dataset, which has a spatial resolution of 5 m, such coarse boundaries severely impact the accuracy of water body extent calculations.
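The effect of predicting on a coarse grid can be illustrated with a deliberately simplified sketch. It assumes that each 8 × 8 block of pixels receives a single majority-vote label that is then replicated back to full size; FCN-8s itself uses learned upsampling and skip connections, so this is an approximation of the stride-8 effect rather than a model of the network. The mask layout is hypothetical; the stride and the 5 m resolution come from the text.

```python
import numpy as np

STRIDE = 8          # upsampling factor discussed in the text
PIXEL_SIZE_M = 5.0  # spatial resolution of DaliWS stated in the text


def coarse_then_upsample(mask: np.ndarray, stride: int = STRIDE) -> np.ndarray:
    """Label each stride x stride block by majority vote, then replicate it."""
    h, w = mask.shape
    blocks = mask.reshape(h // stride, stride, w // stride, stride)
    coarse = blocks.mean(axis=(1, 3)) >= 0.5
    restored = np.repeat(np.repeat(coarse, stride, axis=0), stride, axis=1)
    return restored.astype(mask.dtype)


# Hypothetical 64 x 64 ground truth mask.
mask = np.zeros((64, 64), dtype=np.uint8)
mask[:, 10:34] = 1  # wide water body, 24 px (120 m) across
mask[:, 50:52] = 1  # narrow channel, 2 px (10 m) across

restored = coarse_then_upsample(mask)
print("wide body columns after restoration:", np.flatnonzero(restored[0, :40]))
print("narrow channel pixels retained:", int(restored[:, 48:56].sum()))
print("coarse cell size in metres:", STRIDE * PIXEL_SIZE_M)
```

In this toy case the wide water body survives with its edges shifted by two pixels (10 m), while the narrow channel disappears entirely, since it fills only a quarter of each 40 m cell.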
The four models are trained using three different modes: HH, HV, and HH+HV. By observing the results in the first and second rows of Figure 7 and Figure 8, it can be seen that U-Net achieves the best segmentation performance across all three modes. The models trained with HV single-polarization data perform the worst. In the fourth row of Figure 7 and Figure 8, the models trained with HH single-polarization data show better segmentation performance than those trained with HH+HV dual-polarization data, particularly in accurately segmenting elongated rivers. In the fifth row of Figure 7 and Figure 8, all four models struggle to segment water bodies with indistinct boundaries.
4.3. Generalization Analysis of Four Models Trained on the DaliWS Dataset
To assess the generalization capabilities of models trained on the DaliWS dataset, we conduct generalization experiments on the HISEA-1 dataset. To do this, the four segmentation models (FCN-8s, SegNeXt, U-Net, and DeeplabV3+) are trained on data of different polarizations from DaliWS and then tested on the HISEA-1 dataset to obtain segmentation metrics. The DaliWS dataset has two polarization modes, HH and HV, while the HISEA-1 dataset only has VV polarization. For each model, we perform three sets of experiments: (1) training on HH and testing on VV, (2) training on HV and testing on VV, and (3) training on HH+HV and testing on VV. It is noteworthy that, in experiment (3), we duplicate the VV data to create a dual-channel image.
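The channel duplication used in experiment (3) can be sketched as follows. The function name `to_dual_channel`, the random stand-in tile, and the channel-first layout are assumptions for illustration; the layout is chosen only to be consistent with the (1,1,256,256) input size quoted in Section 4.1.

```python
import numpy as np


def to_dual_channel(vv: np.ndarray) -> np.ndarray:
    """Duplicate a single-polarization tile (H, W) into a (2, H, W) array."""
    if vv.ndim != 2:
        raise ValueError("expected a 2-D single-channel tile")
    return np.stack([vv, vv], axis=0)


# Random stand-in for one 256 x 256 VV tile from the test set.
vv_tile = np.random.default_rng(0).random((256, 256), dtype=np.float32)

# Add a batch dimension so the input matches a model trained on HH+HV pairs.
x = to_dual_channel(vv_tile)[np.newaxis]
print(x.shape)  # (1, 2, 256, 256)
```

Both channels then carry identical co-polarized information, so a model trained on HH+HV receives VV data in the channel where it learned to expect HV statistics.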
A selection of the water body prediction results of FCN-8s, SegNeXt, U-Net, and DeeplabV3+ on the HISEA-1 dataset is shown in Figure 9 and Figure 10. From the figures, it can be observed that models trained on HH polarization data overall outperform those trained on HV and HH+HV data, with the latter two showing instances of interrupted flow in the predicted water bodies. The higher rate of false detections in the U-Net prediction map in Figure 10 indicates that, for HH+HV polarization data, the models do not efficiently extract complementary features from both data types.
The generalization test results in Table 7 show that models trained with HH polarization data from the DaliWS dataset obtain F1 scores higher than 80% on the VV polarization data from the HISEA-1 dataset, showing the best generalization performance. The models trained with HH+HV dual-polarization data come second: except for FCN, the HH+HV dual-polarization data yields commendable performance on the other three models. The generalization capability of HV polarization data is the weakest, significantly lower than that of HH and HH+HV. In terms of overall model generalization performance, SegNeXt exhibits the best results, with F1 scores of 85.276%, 77.307%, and 82.608% for HH, HV, and HH+HV, respectively. DeeplabV3+ follows closely, achieving F1 scores of 85.982%, 77.446%, and 81.650%. FCN demonstrates the poorest generalization performance. In summary, the four models trained on the DaliWS dataset in this study generalize well to the public dataset, affirming the dataset’s value for research purposes.
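The text does not state how "overall" generalization is ranked. One reading, offered here as an assumption, is the unweighted mean F1 across the three training modes. Under that assumption, the scores quoted above give the following comparison for the two models whose values appear in the text (the Table 7 values for U-Net and FCN are not reproduced here).

```python
from statistics import mean

# F1 scores (%) on HISEA-1 as quoted in the text.
f1_on_hisea1 = {
    "SegNeXt": {"HH": 85.276, "HV": 77.307, "HH+HV": 82.608},
    "DeeplabV3+": {"HH": 85.982, "HV": 77.446, "HH+HV": 81.650},
}

# Assumed ranking criterion: unweighted mean over the three training modes.
mean_f1 = {model: mean(scores.values()) for model, scores in f1_on_hisea1.items()}

for model, score in sorted(mean_f1.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: mean F1 = {score:.3f}%")
```

DeeplabV3+ scores higher on HH and HV individually, but the larger SegNeXt margin on HH+HV gives SegNeXt a slightly higher mean, by roughly 0.04 percentage points.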
5. Discussion
To support research on Deep Neural Network (DNN)-based water body segmentation algorithms, this paper creates a high-resolution SAR image water body segmentation dataset. The dataset consists of L1A-level SLC images from the GF-3 satellite, captured on 24 September 2019, 28 February 2020, and 23 May 2020. The original images undergo multi-look processing, Frost filtering, geocoding, and radiometric calibration to generate the final images used for dataset creation. Subsequently, the images are partitioned and sampled: each large image is divided into non-overlapping 256 × 256 image blocks, and the blocks containing water bodies are selected for the dataset. Finally, the Labelme tool is used for image annotation.
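The partition-and-sample step can be sketched as below. The paper selects water-containing blocks before annotation, presumably by inspection; purely for illustration, this sketch substitutes a binary water mask as the selection criterion. The function name `tile_with_water`, the array sizes, and the choice to discard partial blocks at the image border are all assumptions.

```python
import numpy as np

TILE = 256  # block size stated in the text


def tile_with_water(image: np.ndarray, water: np.ndarray, tile: int = TILE) -> list:
    """Split a scene into non-overlapping tiles and keep those containing water.

    Returns a list of (top, left, image_tile, mask_tile) tuples. Partial tiles
    at the right and bottom borders are discarded (an assumption).
    """
    h, w = water.shape
    kept = []
    for top in range(0, h - tile + 1, tile):
        for left in range(0, w - tile + 1, tile):
            mask_tile = water[top:top + tile, left:left + tile]
            if mask_tile.any():
                image_tile = image[top:top + tile, left:left + tile]
                kept.append((top, left, image_tile, mask_tile))
    return kept


# Hypothetical 600 x 800 scene with one horizontal river.
scene = np.zeros((600, 800), dtype=np.float32)
water_mask = np.zeros((600, 800), dtype=np.uint8)
water_mask[300:311, 100:700] = 1

tiles = tile_with_water(scene, water_mask)
print([(top, left) for top, left, _, _ in tiles])
```

In this example only the three tiles in the second tile row intersect the river, so the three tiles in the first row are dropped.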
After completing the construction of the DaliWS dataset, this study selects four segmentation networks, namely FCN, SegNeXt, U-Net, and DeeplabV3+, for training and evaluation on the dataset. Figure 11 shows a full comparison of the segmentation and generalization performance of these four models across the three polarization modes on the DaliWS dataset. All four models demonstrate strong segmentation performance on the DaliWS dataset. Among them, U-Net exhibits the highest segmentation accuracy and the shortest inference time, achieving an F1 score of 92.110%. However, there is still room for improvement in the segmentation results, particularly in capturing details at the edges.
The performance of SegNeXt in Section 4.1 warrants closer examination. Despite ranking highest in the VOC challenge, SegNeXt shows the poorest segmentation performance on the DaliWS dataset. We offer a possible explanation by comparing the differences between the two datasets. Firstly, although the SegNeXt model excelled on the VOC dataset, it cannot be assumed that it would achieve the same performance on SAR images, because domain differences between natural images and SAR images may hinder effective knowledge transfer. Secondly, water body segmentation is a binary classification task, and SAR images are primarily composed of black and gray. A complex SegNeXt network may struggle to train all of its parameters sufficiently under these conditions, whereas simpler segmentation networks like U-Net and DeeplabV3+ are more suitable for binary segmentation tasks on SAR images.
In Table 6, U-Net has the highest GFLOPs among the four models and the shortest inference time per image. In contrast, SegNeXt has significantly lower GFLOPs than U-Net, yet it has the longest inference time per image. Additionally, by comparing the GFLOPs and inference times of the other models in Table 6, we conclude that GFLOPs is only a theoretical indicator of computational cost and cannot represent the actual inference speed of a model.
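Because the theoretical operation count and the measured speed can diverge, per-image inference time has to be measured directly. The sketch below is a generic wall-clock timing harness using only the Python standard library; it is not the measurement procedure used in this study. `dummy_predict` is a hypothetical stand-in for a trained model, and on a GPU the device would additionally need to be synchronized before each clock reading.

```python
import time
from statistics import median


def time_inference(predict, sample, warmup: int = 3, repeats: int = 10) -> float:
    """Return the median wall-clock time in seconds of predict(sample)."""
    for _ in range(warmup):  # warm-up runs are excluded from the timing
        predict(sample)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(sample)
        timings.append(time.perf_counter() - start)
    return median(timings)


def dummy_predict(tile):
    """Hypothetical stand-in for a segmentation model: threshold each pixel."""
    return [[1 if value > 0.5 else 0 for value in row] for row in tile]


tile = [[0.0] * 256 for _ in range(256)]
seconds = time_inference(dummy_predict, tile)
print(f"median time per image: {seconds * 1e3:.3f} ms")
```

Using the median over several repeats reduces the influence of occasional slow runs caused by other processes on the machine.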
In the generalization experiments shown in Table 7, the HH polarization mode performs the best in generalization testing, while models trained using HV polarization data show significantly lower performance than those trained using HH polarization data. We offer an explanation for this phenomenon: the HISEA-1 dataset uses VV polarization, which, like HH, is a co-polarized mode, making the data distributions of these two modes more similar. On the other hand, although the HH+HV polarization mode integrates HH polarization information, the presence of HV polarization interferes with the overall segmentation results.
In summary, the experimental results of this study validate the effectiveness of the DaliWS dataset and compare and analyze the water body segmentation performance under different segmentation networks and polarization modes. This research holds significant implications for water remote sensing image analysis and water resource management, providing valuable insights for related studies and applications.
6. Conclusions
In conclusion, despite the rapid advancements in computer vision technology, its high-precision performance still relies heavily on precisely annotated datasets. This study aims to contribute significantly to SAR image water body segmentation tasks and offer valuable insights for future research endeavors. We meticulously create and extensively explore the DaliWS dataset, assessing it through a range of network models. While achieving satisfactory segmentation results across diverse network models, there remains room for enhancement in capturing finer details at water body edges.
Our investigation also examines the impact of different polarization modes on segmentation performance. The experimental results indicate that models generalize better between datasets that share similar polarization modes. Notably, models trained on HH+HV data perform worse on the VV dataset than models trained on HH data alone. This shows that more research is needed to determine how to combine multi-polarization data effectively.
DaliWS provides multi-polarized information and precise annotations, facilitating the extraction of practical disaster-related information. However, the dataset’s sources are not yet comprehensive enough. In the future, we will continue to upgrade DaliWS by collecting multi-source remote sensing data and conducting pixel-level annotations, thereby offering greater convenience for flood disaster emergency responses. To sum up, our work establishes a high-resolution SAR image water body segmentation dataset, analyzes the performance of various segmentation networks and polarization modes, and provides crucial practical insights and references for water body remote sensing image analysis and water resource management.