1. Introduction
Synthetic aperture radar (SAR) plays an essential role in the remote sensing (RS) field. Its all-day and all-weather capabilities make it widely applied in both military and civil fields [1,2,3,4]. Specifically, ship detection in SAR images attracts increased attention due to its application in marine monitoring, shipping management, shipwreck rescue and illegal vessel control [5,6,7,8]. Thus, it is of great significance to obtain accurate ship detection results.
Recently, with the great breakthrough of deep learning (DL) in the computer vision (CV) field, ship detection in SAR images based on convolutional neural networks (CNNs) has attracted an increasing amount of attention. For instance, Mao et al. [9] proposed a lightweight network with an efficient low-cost regression subnetwork for SAR ship detection. Dai et al. [10] proposed a fusion feature extractor network and a refined detection network for the problem of multi-scale ship detection in complex background. Zhang et al. [11] presented a lightweight network for SAR ship detection, where the depth-wise separable convolution was adopted for lightening the model and three other mechanisms were proposed to compensate for the accuracy. Zhao et al. [12] proposed a two-stage SAR ship detector with a receptive field block (RFB) and a convolutional block attention module (CBAM) for the multi-scale ship detection problem. Pan et al. [13] used a multi-stage RBox detector for arbitrary-oriented ship detection in SAR images. Fu et al. [14] offered an anchor-free CNN composed of a feature-balanced pyramid and a feature refinement network to tackle the multi-scale SAR ship detection problem. Zhang et al. [15] proposed a lightweight SAR ship detector, where a feature fusion module, a feature enhancement module and a scale share feature pyramid module were adopted to guarantee its detection performance. Zhang et al. [16] constructed a hyper-light deep learning network to reach high-accuracy and high-speed ship detection in which five contributions are offered, i.e., a multi-receptive-field module, a dilated convolution module, a channel and spatial attention module, a feature fusion module and a feature pyramid module. Han et al. [17] explored the training of a ship detector from scratch, and they established a CNN-based SAR ship detection model with a multi-size convolution module and a feature-reused module to verify the methodology’s effectiveness. Geng et al. [18] proposed a ship detection method where a traditional island filter and a threshold segmentation method were integrated into a CNN model. The above studies have presented promising results in SAR ship detection. However, they all ignore the rich information in the dual-polarization SAR features (i.e., VV polarization features, VH polarization features and polarization coherence features) that have great potential to help achieve better SAR ship detection performance.
A few CNN-based studies focus on ship detection in multi-polarization SAR images [19,20,21,22]. Fan et al. [19] offered a semantic segmentation method for complex scene ship detection. However, in their compact polarimetric (CP) SAR images, only two types of polarization features are used, without considering their polarization coherence features, so their input data do not contain enough SAR polarization information. Jin et al. [20] proposed a pixel-level detector for small-scale ship detection and verified the effectiveness of the network on PolSAR images. Fan et al. [21] aimed to solve the multi-scale ship detection problem and carried out experiments on PolSAR images. They first established a deep convolutional neural network (DCNN)-based sea–coast–ship classifier for ship region extraction, and then proposed a modified Faster-RCNN for ship detection. Hu et al. [22] constructed pseudo-color SAR images composed of rich dual-polarization features and proposed a weakly supervised method for ship detection. However, these works [20,21,22] do not enhance or fuse the different polarization characteristics; they feed them directly into the network without any distinguished treatment, which fails to fully mine the respective characteristics of the different channels. All in all, the above pixel-level ship detection studies [19,20] and object-level ship detection studies [21,22] merely use polarimetric SAR images to verify their respective tasks, without fully excavating polarization SAR features or considering such information when optimizing their network structures.
To tackle the above problems, in this paper, we propose a novel group-wise feature enhancement-and-fusion network with dual-polarization feature enrichment (GWFEF-Net) for better dual-polarization SAR ship detection. We introduce four contributions to guarantee the performance of GWFEF-Net, i.e., (1) a dual-polarization feature enrichment (DFE) proposed for enriching the dual-polarization feature library and suppressing clutter interferences to facilitate the subsequent feature extraction, (2) a group-wise feature enhancement (GFE) designed for autonomously enhancing the various polarization semantic features to highlight various polarization regions, (3) a group-wise feature fusion (GFF) designed for obtaining fused multi-scale polarization features to realize polarization features’ group-wise information interaction, (4) a hybrid pooling channel attention (HPCA) proposed for channel modeling to equalize each polarization feature’s contribution. We also conduct sufficient ablation experiments to verify the effectiveness of each contribution. Finally, extensive experimental results on the Sentinel-1 dual-polarization SAR ship dataset demonstrate the superior dual-polarization SAR ship detection performance of GWFEF-Net, with 94.18% in average precision (AP), compared with the other ten competitive methods. Moreover, it can offer a 4% AP improvement over the baseline Faster R-CNN and a 2.51% AP improvement compared with the second-best method.
Our main contributions are as follows:
We design a novel two-stage deep learning network named “GWFEF-Net” for better dual-polarization SAR ship detection.
To achieve the excellent detection performance of GWFEF-Net, we (1) propose DFE to facilitate subsequent feature extraction; (2) design GFE to highlight each polarization semantic feature region; (3) design GFF to realize polarization features’ group-wise information interaction; and (4) propose HPCA to balance each polarization feature’s contribution.
GWFEF-Net achieves state-of-the-art detection accuracy with AP of up to 94.18% on the Sentinel-1 dual-polarization SAR ship dataset, compared with the other ten competitive methods.
The remaining materials are arranged as follows. Section 2 introduces the materials and methods. Section 3 describes the experimental details. Section 4 shows the quantitative and qualitative experimental results. Section 5 presents ablation studies on four contributions. Section 6 discusses the whole framework. Finally, Section 7 provides the conclusion of the work.
2. Methodology
GWFEF-Net is established based on the mainstream two-stage detector, i.e., Faster R-CNN [23]. Generally speaking, two-stage detectors have superior accuracy performance over one-stage ones [24], so we choose the former as our baseline.
Figure 1 shows the network structure of GWFEF-Net. The raw Faster R-CNN contains a backbone network, a region proposal network (RPN) and a detection subnetwork [25]. DFE is proposed as a preprocessing technology. The GFE, GFF and HPCA are inserted into the detection subnetwork for better polarization feature enhancement and fusion.
Firstly, GWFEF-Net preprocesses the input dual-polarization SAR images by the proposed DFE to enrich the dual-polarization features. The details will be introduced in Section 2.1. Then, GWFEF-Net uses a backbone network to extract ship features from dual-polarization SAR images. Without loss of generality, we use the mainstream ResNet50 [26] as our backbone network. Then, an RPN is used for the extraction of regions of interest (i.e., regions containing ships). Afterward, a RoIAlign [27] layer is used to map the regions proposed by the RPN to the feature maps of the backbone network for subsequent classification and regression in the detection subnetwork. Finally, the feature maps generated by the RoIAlign are input into the detection subnetwork, and the final ship detection results are obtained.
Note that in the detection subnetwork, the proposed GFE, GFF and HPCA are inserted, which are used for better polarization feature enhancement and fusion. Specifically, for better polarization feature enhancement, we insert the GFE into the detection subnetwork. It is used to highlight each polarization semantic feature region by means of enhancing each polarization semantic feature, which will be introduced in detail in Section 2.2. For better polarization feature fusion, we then insert the GFF after the GFE. It is used to realize polarization features’ multi-scale information interaction by means of fusing multi-scale polarization features, whose details will be introduced in Section 2.3. In addition, HPCA is inserted in the GFF to balance each polarization feature’s contribution by channel modeling. It will be introduced in detail in Section 2.4.
The motivation for the core idea of GWFEF-Net can be summarized as follows.
- (1) It has already been demonstrated that dual-polarization features play an important role in improving the detection accuracy of traditional SAR ship detection methods. Inspired by this, employing such features is also likely to improve the performance of DL-based methods in SAR ship detection tasks. However, most of the CNN-based SAR ship detection methods only utilize single-polarization features as the input of networks, ignoring the dual-polarization characteristics with rich structural information of ships. Though a few researchers have tried to utilize polarimetric SAR images to verify their respective ship detection tasks, their networks are not especially designed for the polarimetric characteristics and no special treatments, such as enhancement and fusion, have been applied for different polarization features. Hence, it is of great significance to study how to fully excavate polarization SAR features in a CNN-based network.
- (2) To address the above problems, we propose a group-wise feature enhancement-and-fusion network with dual-polarization feature enrichment (GWFEF-Net) to improve the SAR ship detection performance. Specifically, four contributions (i.e., DFE, GFE, GFF and HPCA) are proposed in GWFEF-Net. DFE enables the enrichment of the feature library with more abundant ship polarization information to facilitate the subsequent feature extraction; GFE adopts group-wise features to learn and enhance the semantic representation of various polarization features so as to highlight various target ship regions; GFF performs information interaction between polarization features and multi-scale ship features, which is helpful to extract more abundant information of polarization features and multi-scale ships; HPCA is designed for channel modeling to further balance the contribution of each polarization feature.
Next, we will introduce the DFE, GFE, GFF and HPCA in detail in the following sub-sections.
2.1. Dual-Polarization Feature Enrichment (DFE)
The Sentinel-1 satellite product contains two polarization modes, namely VV polarization and VH polarization. However, the coherence polarization feature is also useful for identifying ships [28]. Inspired by the work [28], we introduce the coherence polarization feature to characterize ship feature relationships in different polarization channels, which can enrich the dual-polarization feature library and suppress clutter interference to further improve the follow-up detection performance. For brevity, we call the above process the dual-polarization feature enrichment (DFE).
We will describe the feature types mentioned above in detail.
(1) VV feature: In the VV polarization image, a ship often has strong backscattering values in the sea background, which means that the outline and texture of the ship are relatively clear [29,30]. Thus, VV features are widely utilized for SAR ship detection.
(2) VH feature: In the VH polarization image, a ship often has lower backscattering values in the sea background, and the sea clutter is lower than the instrument noise level. However, the signal-to-noise-ratio (SNR) of VH is higher than that of VV [31,32]. Thus, VH features are also applicable to SAR ship detection.
(3) CVV-VH feature: Considering the dual-polarization characteristic in the Sentinel-1 satellite product, a polarization covariance matrix C2 is obtained by the following formula [33]:
where Svh denotes the VH polarization complex data, Svv denotes the VV polarization complex data, |·| denotes the absolute value and * denotes the conjugate operation. The polarization coherence feature is defined by
CVV-VH can effectively represent the dual-polarization channel correlation. In the dual-polarization image, the reflection symmetry effect of the sea scene is significant. In other words, the CVV-VH polarization value can be very low in an image with sea clutter, because the image meets the reflection symmetry; conversely, it can be very high in an image with artificial objects, such as a ship, because the image does not meet the reflection symmetry. In short, the ship-to-clutter-ratio of CVV-VH features is higher than that of the other two features, that is, the clutter interference can be suppressed with CVV-VH features. Thus, CVV-VH features have the potential to improve SAR ship detection.
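As a reading aid, one common way to write the dual-polarization covariance matrix and its cross-channel term is sketched below. The channel ordering and the local averaging operator ⟨·⟩ are assumptions that are not confirmed by the text, so this should be read as a sketch and not as the exact published expressions.

```latex
% Assumed form: the channel order and the averaging operator
% \langle\cdot\rangle are illustrative choices, not taken from the source.
C_2 =
\begin{bmatrix}
\langle |S_{vv}|^{2} \rangle & \langle S_{vv} S_{vh}^{*} \rangle \\
\langle S_{vh} S_{vv}^{*} \rangle & \langle |S_{vh}|^{2} \rangle
\end{bmatrix},
\qquad
C_{VV\text{-}VH} = \langle S_{vv} S_{vh}^{*} \rangle
```

Under reflection symmetry the co-polarized and cross-polarized channels are uncorrelated, so the averaged cross term tends toward zero over sea clutter, which agrees with the behavior described above.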
Briefly speaking, first, we enrich the existing VV features and VH features according to formula (2), and we can obtain the generated CVV-VH features. Then, the amplitude values of VV, VH and CVV-VH polarization complex data are integrated into the R, G, B channels of the pseudo-color images, so we can obtain resulting images with less sidelobe and clutter interference. DFE can be described as in Figure 2.
To summarize, DFE introduces the CVV-VH feature into the feature library characterizing feature relations in different polarization channels. Moreover, it provides richer polarization information and suppresses clutter interferences, therefore facilitating subsequent feature extraction. In our subsequent implementation, in order to make full use of the polarization features provided by DFE, we insert the proposed GFE, GFF and HPCA into the detection subnetwork for better dual-polarization SAR ship detection.
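The DFE steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the cross term is taken as the locally averaged product of the VV data and the conjugated VH data, the window size `win` and the linear stretch to 8 bits are illustrative choices, and all function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def boxcar_mean(x, win):
    """Average x over a win x win sliding window (win odd), reflect-padded."""
    pad = win // 2
    padded = np.pad(x, pad, mode="reflect")
    return sliding_window_view(padded, (win, win)).mean(axis=(-2, -1))


def cross_pol_term(s_vv, s_vh, win=1):
    """Locally averaged cross term <S_vv * conj(S_vh)> (complex-valued).

    Assumption: this stands in for the C_VV-VH feature; the averaging window
    is not specified in the text.
    """
    return boxcar_mean(s_vv * np.conj(s_vh), win)


def stretch_to_uint8(band):
    """Linearly rescale a real-valued band to the 0..255 range (assumed)."""
    lo, hi = float(band.min()), float(band.max())
    scaled = (band - lo) / max(hi - lo, 1e-12)
    return np.round(scaled * 255.0).astype(np.uint8)


def dfe_pseudo_color(s_vv, s_vh, win=1):
    """Stack |VV|, |VH| and |C_VV-VH| into an (H, W, 3) pseudo-color image."""
    bands = (
        np.abs(s_vv),                             # R channel: VV amplitude
        np.abs(s_vh),                             # G channel: VH amplitude
        np.abs(cross_pol_term(s_vv, s_vh, win)),  # B channel: cross-term amplitude
    )
    return np.stack([stretch_to_uint8(b) for b in bands], axis=-1)


# Synthetic complex data standing in for a Sentinel-1 image chip.
rng = np.random.default_rng(0)
s_vv = rng.normal(size=(16, 16)) + 1j * rng.normal(size=(16, 16))
s_vh = 0.3 * (rng.normal(size=(16, 16)) + 1j * rng.normal(size=(16, 16)))
rgb = dfe_pseudo_color(s_vv, s_vh, win=3)
print(rgb.shape, rgb.dtype)
```

With `win=1` no averaging takes place and the third channel reduces to the product of the VV and VH amplitudes; a larger window is what makes the cross term sensitive to the channel correlation.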
2.2. Group-Wise Feature Enhancement (GFE)
Group-wise features are widely used in the CV community and can adaptively learn semantic representations of different interested entities. Thus far, a large number of scholars from the SAR ship detection field have devoted themselves to researching single-channel polarization SAR images [9,10,11,12,13,14,15,16,17]. However, they ignore the exploration of multi-polarization characteristics and further adoption of group-wise features to learn the semantic representation of various polarization features. Thus, different from the former ship detection networks, which are designed for single-polarization SAR images, considering the polarization semantic feature differences of different channels, we adopt group-wise features to autonomously enhance the learned semantic representations of various polarization features.
Specifically, we attempt to conduct feature grouping enhancement along the channel dimension, which is inspired by Sabour et al. [34] and Li et al. [35]. Thus, we propose the GFE to obtain enhanced semantic information for each SAR polarization feature. First, considering that the output data preprocessed by the DFE is composed of three polarization channels, i.e., VV feature, VH feature and CVV-VH feature, we use a group convolution to enrich the polarization features (i.e., the number of feature channels is tripled). Then, we group-wise enhance the spatial information of the three polarization features in the channel dimension. Finally, we can obtain enhanced semantic features of each SAR polarization feature.
Figure 3 shows the detailed structure of the GFE. In the detection subnetwork, the feature map Fin ∈ R^(W×H×C) is first input, where W, H and C represent the width, height and number of channels of the input feature map, respectively. In our implementation, W and H are equal to 7, and C is equal to 256. The number of channels is tripled to obtain richer SAR polarization features (i.e., the VV, VH and CVV-VH features). Specifically, we obtain three group feature maps along the channel dimension through the following operation:
where GC(·) is the group convolution operation and Fi is the i-th group feature map. Note that i = 1, 2, 3, which matches the number of polarization features (i.e., the VV, VH and CVV-VH features).
Because the noise distribution in each polarization image is inconsistent [35], it is necessary to enhance the polarization feature in the group space for highlighting each polarization semantic feature region. Without loss of generality, we first examine a certain group feature map, namely F1 = {f1, …, fm}, m = 256. The global pooling operation (i.e., global average pooling and global max pooling) is conducted to extract the global semantic feature g of the group polarization feature map F1. The operation can be described by
Then, the corresponding importance coefficient ci is obtained by conducting a dot product between the global semantic feature g and local feature fi. The operation formula is defined by
Subsequently, in order to reduce the coefficient deviation caused by the various ship samples (i.e., inshore ones and offshore ones), we conduct the following normalization operations:
where ε (i.e., 1 × 10^−5) is a constant added for numerical stability, which follows the work [35].
Finally, to obtain the final enhanced polarization feature, the original feature fi is weighted by the corresponding importance coefficient ci via a sigmoid function σ(·):
Thus, we can obtain the enhanced polarization feature group corresponding to F1, which contains m = 256 enhanced features. In this way, we can obtain all three resulting enhanced polarization feature groups. The enhanced features highlight each polarization semantic feature region, which strengthens the meaningful ship target areas and helps the network focus on the ship targets of interest. To summarize, with the help of the GFE, the network can detect more meaningful ships and suppress useless clutter interference, which improves the detection performance of GWFEF-Net.
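The enhancement of one feature group can be sketched in NumPy as below. This is only an illustration under assumptions: each local feature f_i is taken to be the channel vector at one spatial position (as in the spatial group-wise enhancement of [35]), average and max pooling are combined by summation, the coefficients are normalized to zero mean and unit standard deviation, and learnable scale and shift parameters are omitted. The function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def enhance_group(group, eps=1e-5):
    """Enhance one polarization feature group of shape (C, H, W)."""
    channels, height, width = group.shape
    # Column i of `local` is the local feature f_i at spatial position i.
    local = group.reshape(channels, height * width)
    # Global semantic feature g: average plus max pooling (assumed combination).
    g = local.mean(axis=1) + local.max(axis=1)
    # Importance coefficient c_i = g . f_i for every position.
    coeff = g @ local
    # Normalize the coefficients; eps keeps the division stable.
    coeff = (coeff - coeff.mean()) / (coeff.std() + eps)
    # Weight every local feature by the sigmoid of its coefficient.
    gated = local * sigmoid(coeff)
    return gated.reshape(channels, height, width)


def gfe(groups, eps=1e-5):
    """Apply the enhancement independently to every feature group."""
    return [enhance_group(group, eps) for group in groups]


# Three groups standing in for the VV, VH and C_VV-VH feature groups.
rng = np.random.default_rng(0)
groups = [rng.normal(size=(256, 7, 7)) for _ in range(3)]
enhanced = gfe(groups)
print([e.shape for e in enhanced])
```

Because the sigmoid gate lies strictly between 0 and 1, positions whose local feature agrees with the global feature are kept almost unchanged while the remaining positions are attenuated.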
2.3. Group-Wise Feature Fusion (GFF)
Fusing features from different scales is important for enhancing the representation capability of object detection CNNs. Most existing methods in the SAR ship detection field attempt to fuse the multi-scale features in the layer-wise dimension [36,37]. The former work [36] aims to fuse the high-layer feature maps and low-layer feature maps to achieve enhanced multi-scale semantic features. The latter [37] aims to transmit the location information of the shallow layer to the deep layer to achieve enhanced multi-scale spatial features. Different from the above works that fuse features from different resolutions, considering various polarization semantic representations of different channels, we aim to obtain the fused multi-scale polarization features at the channel group-wise level. In other words, we aim to achieve multi-scale polarization feature interaction in a channel group-wise dimension in addition to other existing dimensions, i.e., depth [38], width and cardinality [39]. In addition, our inspiration is derived from the works of Gao et al. [40], Lin et al. [41] and Szegedy et al. [42], which are recommended to the readers.
We have obtained each enhanced polarization semantic feature group from Section 2.2; to further utilize the advantages of all polarization features, the information interaction between different polarization features should be considered. Thus, a group-wise feature fusion (GFF) is proposed. Firstly, it can increase the range of receptive fields of each polarization feature group. Secondly, it can fuse the different polarization feature groups. Together, these two factors support the extraction of multi-scale polarization features and the information interaction capability of the GFF.
Figure 4 shows the detailed structure of the GFF. Note that, after the GFE, we obtain three enhanced feature map groups, indexed by i ∈ {1, 2, 3}. Each feature group has the same height, width and number of channels as the original input feature map Fin ∈ R^(W×H×C) in Section 2.2. In addition, in order to balance each polarization feature’s contribution, an HPCA is inserted in the GFF, which will be described in detail in Section 2.4.
We conduct a 3 × 3 convolution operation for each feature map group and feed the results into the next group. In this way, we can obtain the fused polarization feature group with a larger range of receptive fields.
In short, the above can be described by
Note that the 3 × 3 convolution operation Conv3×3(·) applied to the i-th group can receive polarization information from all enhanced feature groups with index j ≤ i. In addition, each application of Conv3×3(·) provides an output with a larger receptive field than its input.
In this way, we can obtain all three fused multi-scale polarization feature groups. In short, GFF can offer excellent information interaction between polarization features and multi-scale ship features, which is helpful to extract more abundant information about ships. Thus, GWFEF-Net can detect more multi-scale ships with the help of the GFF, and the final detection performance will be improved.
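The cascade described above can be sketched in NumPy as follows. The sketch assumes that feeding the results into the next group means element-wise addition before that group's convolution, in the manner of hierarchical residual-like connections; the HPCA step is left out, and the names `conv3x3` and `gff` are hypothetical.

```python
import numpy as np


def conv3x3(x, weight):
    """3 x 3 convolution with zero 'same' padding.

    x: (C_in, H, W); weight: (C_out, C_in, 3, 3).
    """
    _, height, width = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], height, width))
    for dy in range(3):
        for dx in range(3):
            patch = padded[:, dy:dy + height, dx:dx + width]
            # Mix input channels for this kernel offset and accumulate.
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], patch)
    return out


def gff(groups, weights):
    """Fuse feature groups in a cascade (assumed additive hand-over)."""
    fused, previous = [], None
    for group, weight in zip(groups, weights):
        # Group i sees its own features plus the fused output of group i - 1,
        # so it receives information from every group j <= i.
        inputs = group if previous is None else group + previous
        previous = conv3x3(inputs, weight)
        fused.append(previous)
    return fused


# Small random example: three groups with 4 channels on a 7 x 7 grid.
rng = np.random.default_rng(0)
groups = [rng.normal(size=(4, 7, 7)) for _ in range(3)]
weights = [0.1 * rng.normal(size=(4, 4, 3, 3)) for _ in range(3)]
fused = gff(groups, weights)
print([f.shape for f in fused])
```

Every additional convolution in the chain widens the receptive field by one pixel in each direction, so the last group aggregates the largest context.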
2.4. Hybrid Pooling Channel Attention (HPCA)
Attention mechanisms have been widely applied in the CV community and can enhance valuable features and improve the expression ability of a CNN through spatial or channel-wise information. Considering the polarization semantic feature differences of different channels, we attempt to balance each polarization feature’s contribution at the channel-wise dimension to achieve a better polarization feature fusion. Thus, in our implementation, we choose to use the channel attention mechanism to better obtain reasonable channel modeling during the feature fusion described in Section 2.3. There have been a few attempts [43,44] to incorporate channel attention processing into CNNs to obtain the importance of each channel. However, the above channel attention models are all extracted through the global average pooling operation, which could be suboptimal [45], so we utilize both global average pooling and global max pooling operations to achieve channel attention.
Thus, we propose a hybrid pooling channel attention (HPCA), which is inserted in the GFF of Section 2.3 to obtain the channel importance for balancing each polarization feature’s contribution. Figure 5 illustrates the detailed implementation of the HPCA. Next, we describe the principle of the HPCA.
Different from the SE module [43], there are two parallel branches in the HPCA. Specifically, given the input feature map Y ∈ R^(W×H×C), in the first branch, we first conduct the global average pooling operation on each channel to obtain a descriptor with a global receptive field; then, two fully connected layers with excitation functions are used to predict the channel importance weight. In the second branch, we first conduct the global max pooling operation on each channel to obtain a descriptor with a global receptive field; then, two fully connected layers with excitation functions are used to predict the channel importance weight. Next, we add the importance weights of the two branches to obtain the final channel importance coefficient. Finally, the importance coefficient is applied to the corresponding channels to construct the correlation between channels.
The above steps can be described as follows:
X = W ⨀ Y
where X denotes the balanced polarization feature map, Y denotes the input feature map, ⨀ denotes the channel-wise multiplication, and W denotes the channel importance coefficient, i.e.,
W = fencode(GAP(Y)) ⨁ fencode(GMP(Y))
where GAP denotes the global average pooling, GMP denotes the global max pooling, ⨁ denotes the channel-wise summation, and fencode denotes the channel encoder that can assist in the non-linearity and generality of the model, where two fully connected layers with non-linear excitation functions are adopted.
Finally, we can obtain finer channel information, in which each polarization feature’s contribution is more balanced. By inserting HPCA into each polarization feature group, the network can learn the contribution of each polarization feature adaptively in the process of group-wise feature fusion. Therefore, HPCA can equalize the contribution of each group polarization feature, so as to improve the expression ability of the network.
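A minimal NumPy sketch of the two-branch channel attention is given below. Several details are assumptions, since the text does not fix them: ReLU and sigmoid are used as the excitation functions, both branches share one encoder, biases are omitted, and the feature map is stored channel-first as (C, H, W). The name `hpca` and the reduction ratio of 16 in the example are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def hpca(y, w1, w2):
    """Hybrid pooling channel attention on a feature map y of shape (C, H, W).

    w1: (C // r, C) and w2: (C, C // r) are the two fully connected layers.
    Returns the reweighted feature map and the channel importance coefficient.
    """
    def encode(descriptor):
        # FC -> ReLU -> FC -> sigmoid (assumed excitation functions).
        return sigmoid(w2 @ np.maximum(w1 @ descriptor, 0.0))

    gap = y.mean(axis=(1, 2))  # global average pooling, one value per channel
    gmp = y.max(axis=(1, 2))   # global max pooling, one value per channel
    weight = encode(gap) + encode(gmp)  # sum of the two branch weights
    return weight[:, None, None] * y, weight


# Example with 256 channels, a 7 x 7 grid and reduction ratio r = 16.
rng = np.random.default_rng(0)
y = rng.normal(size=(256, 7, 7))
w1 = 0.1 * rng.normal(size=(16, 256))
w2 = 0.1 * rng.normal(size=(256, 16))
x, weight = hpca(y, w1, w2)
print(x.shape, float(weight.min()), float(weight.max()))
```

With this choice each branch contributes a weight in (0, 1), so the summed coefficient lies in (0, 2); applying a single sigmoid after the summation would be an equally plausible reading of the description.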
6. Discussion
The above quantitative results, qualitative results and ablation studies demonstrate the superior dual-polarization SAR ship detection performance of GWFEF-Net. The proposed four contributions (i.e., the DFE, GFE, GFF and HPCA) together support the ship detection results of GWFEF-Net in dual-polarization SAR images. It can be found that GWFEF-Net produces very few missed detections, which makes it well suited to some specific occasions (e.g., illegal ship monitoring, where it is essential not to miss ships).
We also discuss the generalization ability of GWFEF-Net by conducting an experiment in detecting dual-polarization SAR ships from the Singapore Strait. Note that there is no other public dual-polarimetric SAR ship detection dataset, so we choose to construct some dual-polarization images from the Singapore Strait ourselves. Moreover, these images are not included in the DSSDD dataset used in our paper, and therefore can be used to test the generalization performance of GWFEF-Net. The images are from the Sentinel-1 satellite, with the incident angle of 27.6°~34.8°, resolution of 2.3 m × 14.0 m and swathes of ~250 km. Specifically, in this discussion, the images from Shanghai, the Suez Canal, etc., in DSSDD are selected as our training set, and the dual-polarization images from the Singapore Strait serve as the test set.
Figure 8 shows the detection results of GWFEF-Net on the dual-polarization images from the Singapore Strait. As shown in Figure 8, GWFEF-Net can successfully detect many ships in both offshore and inshore scenes. Specifically, GWFEF-Net correctly detects most ships, missing only one inshore ship. This indicates the good generalization ability of GWFEF-Net.
In the future, the typical roll-invariant polarimetric feature will be considered due to its robustness to rotated ships [59]; the quad-polarization SAR (QP SAR) will be considered because it has the most abundant polarization information [60]; the compact polarimetric SAR (CP SAR) will also be considered because it can reach a balance between swath width and polarization information [61]. In short, we will explore the SAR polarimetric features mentioned above to further improve the ship detection performance. In addition, some traditional artificial features with expert knowledge also reflect the scattering mechanism of ships. Thus, we will also consider integrating the traditional artificial features and polarization features into CNNs to further improve GWFEF-Net’s detection performance.
7. Conclusions
In this paper, we present a novel two-stage deep learning network named “GWFEF-Net” for better dual-polarization SAR ship detection. The proposed GWFEF-Net introduces novel contributions on four aspects to achieve better detection performance, i.e., (1) DFE is used to enrich the feature library and suppress clutter interferences to facilitate feature extraction, (2) GFE is used to obtain each enhanced polarization semantic feature to highlight each polarization feature region, (3) GFF is used to obtain fused multi-scale polarization features to realize polarization features’ group-wise information interaction, (4) HPCA is used for channel modeling to balance each polarization feature’s contribution. Finally, extensive experimental results on the Sentinel-1 dual-polarization SAR ship dataset demonstrate the superior dual-polarization SAR ship detection performance of GWFEF-Net (94.18% in AP), compared with the other ten competitive methods. Specifically, GWFEF-Net can achieve a 4% improvement in AP compared to the baseline Faster R-CNN and a 2.51% improvement in AP compared to the second-best model. In brief, GWFEF-Net can offer high-quality dual-polarization SAR ship detection results, especially ensuring very few missed detections, which is of great value.