1. Introduction
Surface water is important in maintaining the biosphere, is the basis of life [1], and profoundly influences ecosystems and human activities. However, floods are among the most destructive natural disasters in the world, posing tremendous damage and threats to people's safety and social security [2]; about 10 to 20 million people are affected by floods every year [3]. With an increasingly abnormal global climate and more extreme precipitation, flood events are occurring more frequently and causing more serious losses than before [3,4]. Therefore, the rapid acquisition of surface water bodies benefits resource investigation, emergency relief, post-disaster management, etc., and is of great research significance [4,5].
Remote sensing (RS) techniques have the potential for highly detailed, high-temporal-resolution, high-efficiency, and large-scale surface water mapping. Compared to optical RS imagery, synthetic aperture radar (SAR) has clear advantages and is usually the prime candidate for the task [6,7]. First, SAR works in an active mode and is independent of sun illumination and weather conditions (it is capable of penetrating clouds and rain). Second, it can easily differentiate the roughness of land and water surfaces. Through double or multiple scattering on a rough land-cover surface, the sensor receives a strong ground signal, whereas specular backscattering tends to occur when the radar signal reflects off a very smooth water surface, sending most of the signal away from the sensor and creating dark areas in SAR images.
In recent years, SAR technology has advanced by leaps and bounds, and various satellites such as Sentinel-1 and GF-3 have been launched. They can provide many images at low cost, with large coverage, and over a short time period, which greatly reduces the obstacles to using SAR data. Polarimetric SAR (PolSAR) refers to data with additional polarimetric information; polarization is the orientation of the plane in which the radar wave oscillates. Different polarization modes carry information about the different scattering mechanisms and structures of the imaged surface [8,9]. Thus, multi-polarization data usually perform better in various segmentation and classification tasks than single-polarization data, and likewise, they are preferred for the extraction of surface water bodies [2,10,11].
Currently, many extraction methods for surface water bodies are available. Since the scattering intensity of water bodies in PolSAR images is usually significantly lower than that of land covers [12], threshold segmentation (TS) can be applied directly to the intensity or composite images. Some commonly used algorithms include the Otsu algorithm, the adaptive global method, the entropy threshold method, the minimum error threshold method [13,14,15], and the active contour model (ACM) [16]. Owing to requiring no additional information, low computational complexity, and ease of use, TS is still the most frequently used method in disaster emergency scenarios [17]. However, TS methods have some inherent defects: (1) the specific threshold value cannot be calculated quickly and often needs to be adjusted manually for better results, which means the accuracy is affected by the subjective judgment of the operator; (2) the segmentation result contains many noise patches, which requires a series of post-processing steps [18,19].
Supervised machine learning (ML) methods adopt hand-crafted features, such as texture, terrain, and color at the pixel level, and classifiers such as the support vector machine (SVM), random forest (RF), and gradient boosting decision tree (GBDT) to extract surface water [20,21]. In addition, the object-oriented analysis method is also commonly used, extracting features from superpixel objects [22,23]. Superpixel techniques segment images into regions by considering similarity measures defined using perceptual features. These methods often regard classification units as independent individuals and ignore their spatial relationships; therefore, their performance in complex flood areas still needs to be improved. Some studies have proposed conditional random field (CRF) processing at the pixel level [24] or association methods using the adjacency frequency of spatial object categories at the object level [25] to introduce spatial relationships. However, these methods are cumbersome and cannot acquire enough contextual information, which limits further accuracy improvement.
Deep learning (DL) methods perform very well in various segmentation tasks. Convolutional neural networks (CNNs) have been widely used to identify regions of interest in SAR images [26]. With its multi-layer network structure and strong learning ability, a DL model is an end-to-end classifier and does not need manually designed features. However, DL is a data-driven approach requiring many samples in advance, which is time-consuming and costly [17,24]. Another issue is that a dataset is a sampling of the real environment, so a trained network only performs well on data with a similar distribution. This is particularly obvious in remote-sensing tasks, where model performance declines sharply on a totally unfamiliar dataset. Moreover, CNN results are easily confused at boundaries, especially in complex flood-inundated areas. In sum, CNNs are difficult to apply to the mapping of sudden floods in different regions.
In general, the TS method cannot accurately calculate the threshold and needs manual adjustment. ML methods treat each classified object as an isolated individual and ignore their intrinsic relationships, which limits their accuracy. DL methods can achieve the highest accuracy, but they require large sample sets and expensive computing resources, and their performance degrades when migrating across datasets.
In recent years, graph neural networks (GNNs) have received extensive attention and have shown considerable potential for graph data, one-shot learning, and optimizing CNN results. In the field of RS, GNNs have also been successfully applied to image classification [27,28,29,30]. A significant reason for these successes is the ability of GNNs to aggregate global contextual information. The object-oriented analysis method can accurately capture object boundaries while reducing the number of classified objects, which is important for analyzing high-resolution images and beneficial for converting regular raster data into a graph structure.
Thus, a GCN based on the object-oriented analysis method is proposed to overcome the abovementioned problems. The object-oriented method can capture the boundary of the object well, and the GCN establishes the spatial relationship between superpixels [31]. The proposed method is tested in a complex flood area and a lake and compared with traditional methods. This paper is organized as follows: the method we propose is described in Section 2; Section 2.1, Section 2.2 and Section 2.3 describe the superpixel segmentation and feature extraction, sample selection, graph construction, and training, respectively; experimental results and analysis are demonstrated in Section 3; and the discussion and conclusion are provided in Section 4 and Section 5, respectively.
2. Materials and Methods
The flowchart of the proposed method is shown in Figure 1. There are three main sub-processes: (a) superpixel segmentation and feature extraction, (b) selection of training samples, and (c) graph convolutional network construction, training, and prediction of unknown nodes.
First, the PolSAR image was over-segmented using the SLIC method to obtain superpixels as the classification units. The features of each superpixel, comprising scattering features, texture features, and statistical features, were calculated, and their importance for our task was discussed.
Second, for an unbiased selection of training samples (particularly to avoid the misclassification of land with a high water content), t-SNE [32], a popular method for dimensionality reduction and visualization, was applied to roughly judge the number of initial cluster centers among the superpixels. Then an unsupervised clustering algorithm was used to generate coarse classification results that assist in selecting samples by hand. Note that the t-SNE and coarse classification steps are optional in our method and are only used for extremely complex scenarios where manual annotation is difficult.
Third, an undirected graph was constructed according to the relationships between adjacent superpixels and the features of each superpixel. After converting the raster image into a graph, the GCN was trained with the training samples, used to predict the whole image, and evaluated against the ground truth labels.
2.1. Superpixel Segmentation and Feature Extraction
2.1.1. Superpixel Segmentation
An image is a kind of regular data in Euclidean space that can also be regarded as a special kind of graph-structured data. However, using each pixel as a node when training a GNN requires huge computing resources. With the ability to aggregate similar pixels into a more representative large 'element', superpixel segmentation can greatly reduce the number of nodes and the effect of speckle. Additionally, superpixels preserve most object information and adhere well to boundaries [33]. Therefore, the proposed method takes the superpixel as the basic classification unit. Available algorithms include FNEA, simple linear iterative clustering (SLIC) [31], an improved SLIC method [34], and mixture-based methods [35].
The SLIC algorithm clusters pixels in a combined five-dimensional space of color (the L, a, b values of the CIELAB color space) and image-plane position (the x, y coordinates of the pixels). It includes three steps: initializing the cluster centers, local k-means clustering, and postprocessing. Due to its ease of use and effectiveness, it is well suited to segmenting scenes such as ponds and lakes. The improved SLIC method modifies the SLIC clustering function to suit the characteristics of polarimetric statistical measures and renews the initialization method to produce robust cluster centers; it is a better choice in complex scenes such as floods. In the study areas of this paper, the improved version was adopted. Note that over-segmentation is required to capture object boundaries accurately.
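To make the clustering function concrete, the following sketch (names and toy values ours) implements the combined SLIC distance for a single-channel image, where each pixel is a (value, x, y) triple, S is the grid interval, and m the compactness weight:

```python
import math

def slic_distance(p, center, S, m):
    """Combined SLIC distance: intensity distance plus spatial distance
    scaled by the grid interval S and the compactness weight m."""
    dc = abs(p[0] - center[0])                           # intensity difference
    ds = math.hypot(p[1] - center[1], p[2] - center[2])  # spatial distance
    return math.sqrt(dc ** 2 + (ds / S) ** 2 * m ** 2)

def assign_pixels(pixels, centers, S, m):
    """One SLIC assignment step: each (value, x, y) pixel is assigned
    to the nearest cluster center under the combined distance."""
    labels = []
    for p in pixels:
        d = [slic_distance(p, c, S, m) for c in centers]
        labels.append(d.index(min(d)))
    return labels
```

In full SLIC, this assignment alternates with recomputing each center as the mean of its assigned pixels, and the search is restricted to a 2S x 2S window around each center.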
2.1.2. Feature Extraction
After superpixel segmentation, the features of each node were extracted, including:
- (1)
Scattering matrix and Statistical features
Usually, the electromagnetic scattering characteristics of radar targets in the far field are linear. If the scattering space coordinate system and the corresponding polarization basis are selected, there is a polarization transformation between the radar illumination wave and the target scattered wave. Therefore, the variable polarization effect of the target can be expressed as a complex two-dimensional matrix called the scattering matrix, which represents the full polarization information of the target at a specific attitude and observation frequency:

\mathbf{S} = \begin{bmatrix} S_{HH} & S_{HV} \\ S_{VH} & S_{VV} \end{bmatrix}

where H stands for horizontal polarization and V for vertical polarization. S_{HH} and S_{VV} denote the co-polarization components of the scattering matrix; S_{HV} and S_{VH} are the cross-polarization components, which are equal for a reciprocal medium. All matrix elements are acquired directly from the complex values of multi-polarization single-look images. For dual-polarization data, only one co-polarization and one cross-polarization element are kept. These components tend to play different roles in extracting water. For instance, co-polarization images usually have a better signal-to-noise ratio than cross-polarization images over various targets, especially calm surface water bodies, due to the specular reflection mechanism; S_{HV} scarcely differs from S_{VH}; and the cross-polarization component is more sensitive to rough water surfaces, such as flowing rivers or floods.
A scattering matrix is acquired for each pixel; to extract features representing each superpixel, all pixel-level matrix elements need to be translated into superpixel-level statistical values. For example, the mean, median, standard deviation, maximum, and minimum of each scattering matrix element can be calculated over all pixels belonging to the same superpixel. Among these features, the maximum and minimum values are easily affected by speckle noise, and the median measures the central tendency better than the mean.
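The aggregation step can be sketched as follows (a minimal pure-Python version; function name ours), mapping pixel-level values to the five superpixel-level statistics:

```python
from statistics import mean, median, stdev

def superpixel_stats(values, labels):
    """Aggregate pixel-level values into superpixel-level statistics.
    values[i] is the backscatter value of pixel i; labels[i] is its
    superpixel id."""
    groups = {}
    for v, l in zip(values, labels):
        groups.setdefault(l, []).append(v)
    stats = {}
    for l, vs in groups.items():
        stats[l] = {
            "mean": mean(vs),
            "median": median(vs),
            "std": stdev(vs) if len(vs) > 1 else 0.0,
            "max": max(vs),
            "min": min(vs),
        }
    return stats
```

In practice this would be run once per scattering-matrix element (per polarization channel), producing one feature vector per superpixel.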
- (2)
Texture features and covariance matrix
Texture analysis is important for SAR segmentation and has already been investigated in the literature [36]. Textures can be described by first-order metrics, such as the mean, variance, or entropy, as well as second-order metrics, such as gamma distribution features [7]. They can be calculated in superpixel space and serve as a direct indicator of 'disorder' within a superpixel. All of the above are based on single-polarization images; although they can be calculated for each polarization channel, a more sensible approach is to obtain a mixed texture value from the multi-polarization data. Considering that heterogeneous regions are common in high-resolution SAR images, the spherically invariant random vector (SIRV) product model can be adopted [37]. In this model, the maximum likelihood estimator of the texture value is given by:

\hat{\tau} = \frac{\mathbf{k}^{H} [M]^{-1} \mathbf{k}}{p}

where \mathbf{k} is the target vector in the linear basis, \mathbf{k} = [S_{HH}, \sqrt{2}\,S_{HV}, S_{VV}]^{T} for quad-polarization data; [M] is the normalized covariance matrix (\operatorname{Tr}([M]) = p); (\cdot)^{H} denotes the conjugate transpose operator; and p equals 2 (dual-polarization) or 3 (quad-polarization). It is important to note that the covariance matrix is the mean over all pixels in a superpixel rather than that of a single pixel. It contains the full polarization information and is usually taken as the basis of PolSAR processing. Thus, the upper triangular elements of the Hermitian matrix [M] can be used as features. The elements of the main diagonal are highly related to the statistical mean values of the scattering matrix, as they refer to the mean intensity and magnitude values, respectively.
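Under the assumption that the normalized covariance matrix M has already been estimated for the superpixel, the estimator above is a one-liner (a NumPy sketch; function name ours):

```python
import numpy as np

def sirv_texture(k, M):
    """SIRV maximum-likelihood texture estimate for one pixel:
    tau = (k^H M^{-1} k) / p, with k the complex target vector and
    M the (Hermitian) normalized covariance matrix of the superpixel."""
    p = len(k)
    M_inv = np.linalg.inv(M)
    # The quadratic form is real for Hermitian M; .real drops numerical noise.
    return (np.conj(k) @ M_inv @ k).real / p
```

Averaging this value over the pixels of a superpixel gives the mixed texture feature used as a node attribute.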
- (3)
Land to Water pixel Ratio (LWR)
Despite their various defects, TS methods are widely used in water extraction due to their convenience. Thus, their pre-segmentation results can serve as an important input for a GNN, and especially in some complex flood areas, they can facilitate the distinction between high-water-content soil and water bodies. However, almost all TS methods operate at the pixel level and cannot be used directly at the object level. Therefore, this study proposes an indicator called LWR based on superpixels and TS methods, as follows:
The standard Otsu method [38] is first applied to all pixels to obtain a binary image. It is a self-adaptive maximum inter-class variance method and is commonly used in gray-image segmentation. Pixels with intensity values lower than the threshold are marked as 0, and the others are marked as 1. Then, for each superpixel, the ratio of the number of land pixels to water pixels (1 to 0) is counted. The smaller the LWR, the more likely the superpixel is water.
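A sketch of the LWR computation (function name ours; the +1 in the denominator is our assumption to guard against all-water superpixels, which the text does not specify):

```python
def land_water_ratio(binary, labels):
    """Per-superpixel Land-to-Water pixel Ratio: land pixels (marked 1
    by Otsu binarization) over water pixels (marked 0) in each superpixel."""
    land, water = {}, {}
    for b, l in zip(binary, labels):
        land[l] = land.get(l, 0) + (b == 1)
        water[l] = water.get(l, 0) + (b == 0)
    # +1 avoids division by zero for superpixels with no water pixels
    return {l: land[l] / (water[l] + 1) for l in land}
```

Superpixels whose LWR is near 0 are almost entirely below the Otsu threshold and are therefore strong water candidates.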
2.2. Sampling
Only a few samples need to be chosen for GNN model training. Sampling is the operation of manually assigning several superpixels their true ground labels (water or non-water). It is important to note that the non-water label probably covers several different ground features, such as bare soil, grass, and buildings. More problematically, some ground with a high water content has a gray tone that is easily confused with water in SAR images. Thus, the samples should carefully include all features that may affect the extraction of water, as any sample errors or deficiencies reduce the model's precision. An unsupervised method can optionally be used to assist sample selection; the specific procedure is as follows.
First, the feature vector of each superpixel is reduced to two dimensions using the t-SNE method [32], and a scatterplot of the two-dimensional features is drawn after dimensionality reduction. Different cluster centers are regarded as different categories, other scattered points are ignored, and the number of cluster centers is recorded. Next, the K-Means clustering method is used to cluster the features of all superpixels, with the number of cluster centers set to the number recorded in the previous step using t-SNE. Using the K-Means result as a reference, some representative water and non-water objects are selected by hand as samples (training data). It is important to note that the K-Means result cannot be regarded as the final segmentation result, because it still contains many errors, which are removed later by the GNN.
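The K-Means step can be sketched without external libraries (a pure-Python stand-in for the actual implementation; names ours). The number of clusters k would come from inspecting the t-SNE scatterplot:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means over superpixel feature vectors: a stand-in for
    the coarse clustering step that assists manual sample selection."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # assign each point to its nearest center (squared Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        # move each center to the mean of its cluster
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = tuple(sum(v) / len(cl) for v in zip(*cl))
    labels = []
    for p in points:
        d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        labels.append(d.index(min(d)))
    return labels
```

The resulting cluster labels are only a visual aid for picking representative water and non-water superpixels by hand, as stated above.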
2.3. Graph Construction and Classification
2.3.1. Graph Construction
An image is a regular grid structure in which each pixel has four neighbors in its first-order neighborhood and eight in its second-order neighborhood. The regularity of the data in Euclidean space makes it simple to handle, and it can also be expressed as graph-structured data composed of several nodes and the edges encoding the adjacency relationships between them. Graph-structured data can be expressed as G = (V, E), where V is the set of graph nodes with their feature vectors and E is the set of edges connecting nodes, including edge attributes. Based on the result of superpixel segmentation, an undirected graph (in which all edge weights are the same) is constructed by connecting each superpixel with all surrounding superpixels that share a segmentation edge and by taking the features extracted for each superpixel as the features of the corresponding vertex. The undirected graph reflects the adjacency relationships between superpixels, and each node corresponds to a meaningful real entity. Superpixel generation reduces the amount of calculation, which is particularly important for high-resolution images.
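Building the undirected edge set from the superpixel label map reduces to scanning 4-neighborhoods for differing labels (a sketch; names ours):

```python
def build_adjacency(label_img):
    """Collect undirected edges between superpixels that share a boundary,
    by scanning right and down neighbors in the superpixel-id image."""
    edges = set()
    h, w = len(label_img), len(label_img[0])
    for y in range(h):
        for x in range(w):
            a = label_img[y][x]
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbors
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    b = label_img[ny][nx]
                    if a != b:
                        edges.add((min(a, b), max(a, b)))
    return edges
```

Each edge pairs two adjacent superpixels; together with the per-superpixel feature vectors, this yields the G = (V, E) input of the GCN.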
2.3.2. Node Classification via a Graph Convolutional Network
This study adopts the GraphSAGE framework, an inductive learning model in the spatial domain [39]. The framework first aggregates the information of each node and its adjacent nodes through several graph convolution layers and then outputs the classification graph. The output classification graph has the same structure as the input graph, but the feature of each node is its category, as shown in Figure 2.
In the hidden layer, each layer first samples each node and its surrounding neighbors and then aggregates the sampled data of each node to obtain the new characteristics of the node, as shown in Figure 3. There are two key points in the hidden layer: sampling and information aggregation. Taking node V2 as an example, we first randomly sample N neighbor nodes of V2, obtaining a set of feature vectors of dimension f. If a vertex has fewer than N neighbors, sampling with replacement is adopted until N nodes are drawn. The mean of the sampled set of each node is then aggregated, and the new characteristics of the node are obtained after a linear transformation and activation function, as shown in the following formula:

\mathbf{h}_{v}^{(k)} = \sigma\left( \mathbf{W}^{(k)} \cdot \mathrm{MEAN}\left( \{\mathbf{h}_{v}^{(k-1)}\} \cup \{\mathbf{h}_{u}^{(k-1)}, \forall u \in \mathcal{N}(v)\} \right) \right)
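One such hidden layer can be sketched in pure Python (names ours): sampling with replacement from each node's neighbor list, mean aggregation over the node's own feature and the sampled ones, then a linear map and a ReLU activation:

```python
import random

def sage_layer(features, neighbors, W, n_samples, seed=0):
    """One GraphSAGE-style mean-aggregation layer (a sketch):
    features maps node id -> feature vector, neighbors maps node id ->
    adjacent node ids, W is the layer's weight matrix (rows of floats)."""
    rng = random.Random(seed)
    out = {}
    for v, h in features.items():
        nbrs = neighbors.get(v, [])
        # sampling with replacement covers nodes with fewer than n_samples neighbors
        sampled = [rng.choice(nbrs) for _ in range(n_samples)] if nbrs else []
        pool = [h] + [features[u] for u in sampled]
        mean = [sum(col) / len(pool) for col in zip(*pool)]
        z = [sum(w_ij * m for w_ij, m in zip(row, mean)) for row in W]
        out[v] = [max(0.0, zi) for zi in z]  # ReLU activation
    return out
```

Stacking a few such layers lets each superpixel node absorb contextual information from progressively larger neighborhoods.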
In the output layer, the cross entropy between the predicted label and the real label of each node is used as the loss function:

L = -\sum_{c=1}^{M} y_{c} \log(p_{c})

where L is the loss and M is the number of categories. If the real category of sample i is c, we take y_{c} = 1; otherwise, y_{c} = 0. p_{c} is the predicted probability that sample i belongs to category c.
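For a single node, the loss reduces to minus the log-probability assigned to the true class (a sketch; function name ours):

```python
import math

def cross_entropy(y_true, p_pred):
    """Cross-entropy loss for one node: y_true is a one-hot label vector,
    p_pred the predicted class probabilities (summing to 1)."""
    return -sum(y * math.log(p) for y, p in zip(y_true, p_pred) if y > 0)
```

Training averages this loss over the labeled nodes and backpropagates through the graph convolution layers.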
3. Results
Our proposed method was tested on a Sentinel-1 image of a flood event and a GF-3 image for river extraction. The results were compared to those of other pixel-based and superpixel-based methods, such as RF and XGBoost [40].
3.1. Study Areas and Data
3.1.1. Study Areas
The main experimental area is the Xinfa reservoir (see Figure 4a), located in Inner Mongolia, China. It is used for flood control, water supply, the comprehensive utilization of power generation, and aquaculture irrigation. The total storage capacity of the reservoir is 38.08 million cubic meters, including 23.31 million cubic meters of flood-control storage and 5.26 million cubic meters of regulation storage. The standard return period of the design flood is 30 years, and the corresponding peak flow is 303 m³/s. At 15:30 (UTC/GMT +8) on 18 July 2021, the Xinfa reservoir collapsed, causing the downstream highway to be washed away. Figure 4(a1,a2) show the Sentinel-1 SAR images acquired before and after the flood, respectively. The other study area is the Zhujiang River in Guangzhou (see Figure 4b), where dense river branches and fishponds are distributed. Both study areas are challenging and can demonstrate the superiority of the proposed method. In this paper, the comprehensive processing results of study area 1 and only the final classification results of study area 2 are presented.
3.1.2. Experimental Data
The experimental data include Sentinel-1 and GaoFen-3 imagery. The Sentinel-1 data in the experiment are dual-polarization GRD products (VV- and VH-polarized images) in IW mode in the C-band. The final data used in the experiment were obtained after orbit correction, thermal noise removal, radiometric calibration, speckle filtering, and terrain correction [41]. All of this preprocessing was performed with SNAP software. Then, Python code was used for feature extraction, superpixel segmentation, sample selection, GCN training, and accuracy evaluation. GaoFen-3 is a C-band SAR imaging satellite launched by the Chinese government. The data used in the experiment were in full-polarization mode with a resolution of 1 m. The whole study area was labeled through a pre-classification operation followed by manual correction, so each superpixel has a ground label (water or non-water). These labels were used to evaluate the accuracy of the various methods and to validate the efficacy of the proposed method.
3.2. Results of Study Area 1
3.2.1. Superpixel Segmentation
For study area 1, the desired size of the superpixels was set to 225 pixels, the most commonly used value in previous studies. If the value is too small, the superpixels lose statistical significance, introducing a large amount of noise and a heavy computational burden in the subsequent classification task. Likewise, a larger value affects the accuracy of land-water boundaries to some extent. It is important to note that the actual superpixel sizes are variable and adapt to their locations; in this study, most sizes were between 200 and 300 pixels, with some smaller ones reaching the given minimum value of 10. Figure 5 shows the superpixel-segmentation result; even in the most complex regions, the superpixels adhere well to the boundaries of land and water. Thus, it can be concluded that superpixel-segmentation methods are suitable for pre- and over-segmentation in water extraction.
3.2.2. Analysis of Features
All the features discussed in Section 2.1.2 compose a vector representing each superpixel. The LWR was calculated for all superpixels, and the resulting histogram is shown in Figure 6. It shows two obvious peaks, and the peak closer to 0 belongs to the water body.
For the redundancy analysis of the features, their correlations were calculated, as shown in Figure 7a. The texture value is generally independent of the other features. The LWR value is related to the median value in the HV mode and to the mean and standard deviation values in the HH mode; furthermore, the median of the HV mode is related to the standard deviation and mean in the HH mode. In general, not every polarization mode shows the same trend, and the correlation between different polarization modes is not strong, which shows that different polarization modes are of different importance to water extraction. The LWR is independent of the texture value and the scattering matrix and shows a certain correlation with the statistical characteristics, from which it is calculated. On the other hand, we already know the strong correlation between the LWR and water bodies, so features strongly correlated with the LWR should also be conducive to water extraction; in other words, HH polarization is more conducive to water extraction than HV polarization.
To analyze whether the designed features are useful, their importance was evaluated with a tree model, as shown in Figure 7b. According to the results, the most important features are the texture information, followed by the LWR and the scattering matrix. Among the statistical features, the most useful are the standard deviation, minimum, and mean in the HH mode, which further shows that the HH polarization mode is more suitable for water extraction.
In general, the above analysis shows that the features are effective, and the importance of texture and the HH polarization mode is demonstrated. Considering that a neural network can resist the risk caused by feature redundancy (different models assign different weights to different features), all of the above were selected as node features.
3.2.3. Samples
A total of 14,878 superpixels were acquired and then sampled according to the method in Section 2.2. Figure 8a shows that the t-SNE method separated the superpixels into three categories: land, water, and unknown. Unknown superpixels refer to shallow water or wetland areas with backscattering characteristics between those of land and water. Figure 8b shows the spatial distributions of these three categories. It is important to note that this pre-classification cannot replace the final GNN classification, because it still contains many errors, and samples must be selected from it carefully. The pre-classification is only used to assist in selecting training samples and is not a prerequisite. In this experiment, 46 water superpixels and 67 non-water superpixels were used for model training, and all others (already having ground truth labels from pre-classification and manual correction) were used as the validation dataset. Although the training set was small, we still obtained satisfying results through the subsequent model classification.
3.2.4. Classification Results
A small area is shown in Figure 9 to visualize the superpixels and the constructed graph data. To highlight the performance of the GNN, RF and XGBoost classifiers were also used to classify the superpixels based on the selected features. The results show that they all achieved good visual performance (see Figure 10a-c) relative to the ground truth (see Figure 10g). To validate the effectiveness of superpixels, standard TS, RF, and XGBoost were also tested at the pixel level. All pixels in the 46 water superpixels and 67 non-water superpixels (the same as in the superpixel classification) were used for training, and all other labeled pixels in the image were used for validation. The results can be seen in Figure 10d-f. The pixel-based methods produced much more noise than the superpixel-based methods, and standard TS had the worst performance. Of course, extra filtering operations could improve the results to some extent, but in this research, we only discuss the standard processing.
Quantitative evaluations are given in Table 1. Both superpixel-based and pixel-based methods were tested, and scores were calculated in their own dimensions. To compare our method's results with the pixel-based methods, the scores were recalculated from the superpixel level to the pixel level according to their spatial inclusion relations (see the last line of Table 1). The results show that our method had the highest precision, recall, and F1 scores. Among the superpixel-based methods, the recall score of our method was 3 to 4 percent higher than those of RF and XGBoost, with XGBoost obtaining the second-best scores. We also tested another graph convolutional network, GAT [42], whose performance was worse as well. To further visualize the difference between XGBoost and the GNN, regions of TP (true-positive), FN (false-negative), and FP (false-positive) are presented in Figure 11. We can see that XGBoost missed many positive superpixels in small water areas or at land-water boundaries; a GNN deals with this problem better due to its powerful ability to capture contextual information. The second part of Table 1 clearly shows that our proposed method achieved much higher scores than pixel-based RF and XGBoost.
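The scores reported in Table 1 follow the standard definitions; for reference, a minimal computation from the confusion counts (function name ours):

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts (water taken as the positive class)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For superpixel-based methods the counts are over superpixels; recalculating at the pixel level means counting every pixel inside each superpixel instead.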
3.3. Results of Study Area 2
In study area 2, a total of 377,579 superpixels were acquired, and all of them were labeled through pre-classification and manual correction. The ground truth can be seen in Figure 12g. A total of 575 water superpixels and 2717 land superpixels were used for training, and the other superpixels were used for validation. Figure 12 shows the results of the various methods. Overall, the GNN achieved the best performance in detecting small targets and preserving boundaries. The red rectangles in Figure 12a indicate where ships were accurately removed. Similar to study area 1, the pixel-based methods (see Figure 12d-f) generated more noise than the superpixel-based methods.
Quantitative evaluations are given in Table 2. Among the superpixel-based methods, our method achieved an increase of 5 to 6 percent in precision, 12 to 13 percent in recall, and 9 to 10 percent in F1 score compared to RF and XGBoost. The GAT network achieved the second-best performance, with an F1 score about 6 percent lower than ours. The positive effect of the GNN was validated in this experiment. Meanwhile, the gains over the pixel-based methods were much bigger: a maximum of 12 percent in precision, 22 percent in recall, and 17 percent in F1 score. This shows the power of our proposed method in processing complex scenarios.
4. Discussion
Based on superpixel segmentation, this study transforms a regular image into irregular graph structure data. Another more direct method is to take each pixel as the vertex, the pixel value as the features of the node, and the second-order neighborhood as the adjacency relationship. This causes two problems: the first is the network size and computational burden. Taking a single-channel small image (size of 1024 × 1024) as an example, 1,048,576 vertices and nearly 10 million edges are generated. Assuming that the superpixel is composed of 200 pixels on average, the number of vertices is reduced by 200 times, from one million to five thousand, which also means that the number of network calculations is reduced by 200 times. The second problem is the speckle noise. Different from an optical image, any pixel-level algorithm on SAR data is greatly affected by speckle noise. Statistical values on an adjacent area are more robust and meaningful than a single pixel value and are thus more often used for PolSAR classification tasks. Moreover, a prior knowledge graph can be constructed by superpixel segmentation. Thus, superpixels are preferred in a GNN network.
However, the usage of superpixels still causes some problems. First, several segmentation methods have recently been proposed for multi-polarization and single-polarization SAR data. We often need to choose the optimal method according to the data, and its precision and efficiency directly affect the GNN results. On the one hand, some small targets are ignored when the desired size of the superpixels is too large; on the other hand, compared to simple TS methods, over-segmentation is time-consuming, especially for a large area. However, once superpixels are acquired, a GNN can classify them as quickly as other methods. Another problem is how to express the features of the nodes. In this paper, a series of features was designed and analyzed, including statistical values of the scattering matrix, the texture, the covariance matrix, and the LWR. For different data, the feature values probably have different expressions; for the texture value, for example, several calculation methods are available. These features tend to be redundant because some of them are linearly dependent. However, the GNN can deal with this redundancy, and the accuracy does not decrease even when all available features are inputted. Moreover, the importance of the features differs greatly. Through experiments, we determined that the texture and LWR are the most important, followed by the scattering values of the HH polarization. For single-polarization data, the texture and the HH channel are the most important.
The parameters affecting the GNN results mainly include the number of sampled neighbors and the number of layers. A large sampling number introduces redundant information and computation, while a small number leads to a feature deficiency in adjacent nodes; the mean neighbor count of the graph is usually a good choice. The layer number determines how many hops of information are aggregated. With too many layers, the features of nodes in the same area converge and the nodes become inseparable, because they all contain much of the same neighborhood information; in general, graph convolution should not exceed five layers. We also tried the GAT graph convolution network and found that its accuracy on this task is not as good as that of GraphSAGE, possibly because GAT more easily smooths features between different nodes.
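The roles of the two parameters can be seen in a minimal NumPy sketch of one GraphSAGE update with mean aggregation; the sampling size `n_samples`, the identity weight matrices, and the toy graph are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sage_mean_layer(feats, neigh, W_self, W_neigh, n_samples=5):
    """One GraphSAGE update with mean aggregation and neighbour sampling:
    h_v = ReLU(W_self @ x_v + W_neigh @ mean(x_u for sampled u in N(v)))."""
    out = np.zeros((feats.shape[0], W_self.shape[0]))
    for v, nbrs in enumerate(neigh):
        if len(nbrs) > n_samples:
            # cap the neighbourhood at a fixed sample size
            nbrs = rng.choice(nbrs, n_samples, replace=False)
        agg = feats[nbrs].mean(axis=0) if len(nbrs) else np.zeros(feats.shape[1])
        out[v] = np.maximum(0.0, W_self @ feats[v] + W_neigh @ agg)
    return out

# Toy graph: 3 nodes with 2-d features, fully connected.
feats = np.array([[1., 0.], [0., 1.], [1., 1.]])
neigh = [[1, 2], [0, 2], [0, 1]]
h = sage_mean_layer(feats, neigh, np.eye(2), np.eye(2))
print(h.shape)  # (3, 2)
```

Stacking k such layers aggregates information from k-hop neighborhoods, which is why adding too many layers makes the node features within one region converge and hurts separability.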
The proposed method is suitable for all kinds of water-related scenes, such as rivers, lakes, ponds, coastlines, floods, and probably paddies. It is advantageous in uncommon and complicated scenes, where large sample sets are hard to collect and pre-trained DL models cannot be applied directly. High precision can be expected, although for an extremely large area the computation time is considerable compared to simpler TS methods; the method can be sped up by simplifying the superpixel segmentation and feature selection steps. On the other hand, the use of superpixels raises the question of how to express the features of each superpixel reasonably. Manually designed SAR image features are shallow representations and can limit classification accuracy, and missing any key feature is likely to produce poor results. A simple remedy is to design and include as many available features as possible, but this makes the method cumbersome and inefficient.
5. Conclusions
Owing to their all-weather imaging capability, PolSAR images play an increasingly important role in emergency response and disaster relief. This study proposed a method based on superpixel segmentation and graph convolution to extract water areas from PolSAR images, which has two inherent advantages: (1) superpixel segmentation makes each classified object correspond to a real-world object, reducing the influence of noise on the classification; (2) graph convolution no longer considers each classification object separately but considers the spatial relationships between objects, which improves classification accuracy. Moreover, this paper emphatically explains three processing parts: (1) feature extraction, including the statistical values of the scattering matrix, texture values, covariance matrix elements, and LWR, whose correlation and importance are discussed; (2) sample collection, for which an auxiliary unsupervised method was proposed to deal with complex scenes; and (3) the GNN model, including graph construction and node classification, for which a GraphSAGE framework was chosen.
Comparison experiments were conducted in two study areas at both the superpixel level and the pixel level. The results show that superpixel-based methods achieved much better performance than pixel-based methods, demonstrating the advantage of using superpixels in SAR images. Although the performance of pixel-based methods can be improved with extra techniques or more advanced algorithms, superpixels remain preferable. Among the superpixel-based methods, the GNN classifier obtained the highest scores, particularly in recall, which was about 3–4 percentage points higher than that of XGBoost and RF. These experiments show that the proposed method is suitable for finely mapping water bodies in complex areas, such as in the case of floods. The GNN uses an inductive learning framework to mine deeper spatial relationship information and can efficiently classify the nodes (superpixels). Compared with CNN methods, the proposed method maintains high accuracy without requiring high-quality datasets or large amounts of computing power. The proposed method can also serve as a reference for extracting other ground features from SAR images.
6. Patents
Haoming Wan, Panpan Tang, et al. A Refined Flood Area Extraction Method Based on Spaceborne SAR Images: China, ZL202210043155.8 [P]. 14 January 2022.