Ship Detection in SAR Images Based on Feature Enhancement Swin Transformer and Adjacent Feature Fusion
Abstract
1. Introduction
- A FESwin module is proposed as the backbone network for extracting ship feature information. The module retains the Swin transformer's strong spatial feature modeling capability while using a CNN to strengthen the association among feature-map channels. It effectively suppresses the insufficient feature extraction caused by the strong scattering of SAR targets, yields more salient feature information at different scales, and improves the propagation of feature information.
- We construct an AFF module that adaptively and selectively fuses shallow feature information in the feature pyramid into the adjacent higher-level features. The combination of learnable weights and adjacent-level fusion reduces the large information gap between bottom-level and higher-level features and alleviates attentional dispersion in the feature maps (a minimal sketch of this fusion idea follows this list).
- A ship detector for SAR images, ESTDNet, is constructed by combining the FESwin module with the AFF module, and the contribution of each module to detection performance is verified separately. Experiments on the SSDD and SARShip datasets show that ESTDNet detects ships in SAR images with higher accuracy.
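As a rough illustration of the AFF idea (not the paper's exact formulation), the following PyTorch sketch blends each pyramid level with its adjacent shallower level using learnable weights normalized by a softmax. The scalar per-level weights, the adaptive max-pooling downsampler, and the equal channel counts across levels are assumptions of this sketch; the actual operators are defined in Section 2.2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacentFeatureFusion(nn.Module):
    """Minimal sketch: each level i > 0 receives its adjacent shallower
    level i-1, resized to match, blended with learnable normalized weights."""

    def __init__(self, num_levels: int):
        super().__init__()
        # two fusion weights (own level, shallower neighbor) per receiving level
        self.weights = nn.Parameter(torch.ones(num_levels - 1, 2))

    def forward(self, feats):
        # feats: [P2, P3, P4, P5], shallow -> deep, equal channel counts
        out = [feats[0]]
        for i in range(1, len(feats)):
            shallow = F.adaptive_max_pool2d(feats[i - 1], feats[i].shape[-2:])
            w = torch.softmax(self.weights[i - 1], dim=0)  # positive, sums to 1
            out.append(w[0] * feats[i] + w[1] * shallow)
        return out
```

In ESTDNet such a fusion would sit on top of the feature-pyramid outputs, letting each level learn how much shallow detail to absorb rather than fusing all levels indiscriminately.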
2. The Proposed Method
2.1. FESwin Backbone Network
2.2. AFF Module
2.3. Architecture of ESTDNet
3. Results
3.1. Experiment Settings
3.2. Experiment Datasets
3.3. Experiments on the SSDD Dataset
3.4. Experiments on the SARShip Dataset
3.5. Ablation Experiments
3.6. Comparison of Inference Time
4. Discussion
4.1. FESwin Module Effect Validation
4.2. AFF Module Effect Validation
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Fan, Y.; Wang, F.; Wang, H. A Transformer-Based Coarse-to-Fine Wide-Swath SAR Image Registration Method under Weak Texture Conditions. Remote Sens. 2022, 14, 1175.
- Zhang, X.; Wang, H.; Xu, C.; Lv, Y.; Fu, C.; Xiao, H.; He, Y. A Lightweight Feature Optimizing Network for Ship Detection in SAR Image. IEEE Access 2019, 7, 141662–141678.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
- Qian, X.; Cheng, X.; Cheng, G.; Yao, X.; Jiang, L. Two-Stream Encoder GAN With Progressive Training for Co-Saliency Detection. IEEE Signal Process. Lett. 2021, 28, 180–184.
- Lin, S.; Zhang, M.; Cheng, X.; Wang, L.; Xu, M.; Wang, H. Hyperspectral Anomaly Detection via Dual Dictionaries Construction Guided by Two-Stage Complementary Decision. Remote Sens. 2022, 14, 1784.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014.
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving Into High Quality Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: New York, NY, USA, 2020; pp. 213–229.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. arXiv 2021, arXiv:2103.14030.
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv 2020, arXiv:2010.04159.
- Dai, Z.; Cai, B.; Lin, Y.; Chen, J. UP-DETR: Unsupervised Pre-Training for Object Detection with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 1601–1610.
- Wang, J.; Lu, C.; Jiang, W. Simultaneous Ship Detection and Orientation Estimation in SAR Images Based on Attention Module and Angle Regression. Sensors 2018, 18, 2851.
- Chang, Y.-L.; Anagaw, A.; Chang, L.; Wang, Y.; Hsiao, C.-Y.; Lee, W.-H. Ship Detection Based on YOLOv2 for SAR Imagery. Remote Sens. 2019, 11, 786.
- Qian, X.; Lin, S.; Cheng, G.; Yao, X.; Ren, H.; Wang, W. Object Detection in Remote Sensing Images Based on Improved Bounding Box Regression and Multi-Level Features Fusion. Remote Sens. 2020, 12, 143.
- Su, N.; He, J.; Yan, Y.; Zhao, C.; Xing, X. SII-Net: Spatial Information Integration Network for Small Target Detection in SAR Images. Remote Sens. 2022, 14, 442.
- Li, J.; Qu, C.; Shao, J. Ship Detection in SAR Images Based on an Improved Faster R-CNN. In Proceedings of the SAR in Big Data Era (BIGSARDATA), Beijing, China, 13–14 November 2017.
- Wang, Y.; Wang, C.; Zhang, H.; Dong, Y.; Wei, S. A SAR Dataset of Ship Detection for Deep Learning under Complex Backgrounds. Remote Sens. 2019, 11, 765.
- Zhang, T.; Zhang, X. High-Speed Ship Detection in SAR Images Based on a Grid Convolutional Neural Network. Remote Sens. 2019, 11, 1206.
- Zhou, K.; Zhang, M.; Wang, H.; Tan, J. Ship Detection in SAR Images Based on Multi-Scale Feature Extraction and Adaptive Feature Fusion. Remote Sens. 2022, 14, 755.
- Zhang, T.; Zhang, X.; Ke, X. Quad-FPN: A Novel Quad Feature Pyramid Network for SAR Ship Detection. Remote Sens. 2021, 13, 2771.
- Cui, Z.; Li, Q.; Cao, Z.; Liu, N. Dense Attention Pyramid Networks for Multi-Scale Ship Detection in SAR Images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8983–8997.
- Xia, R.; Chen, J.; Huang, Z.; Wan, H.; Wu, B.; Sun, L.; Yao, B.; Xiang, H.; Xing, M. CRTransSar: A Visual Transformer Based on Contextual Joint Representation Learning for SAR Ship Detection. Remote Sens. 2022, 14, 1488.
- Qu, H.; Shen, L.; Guo, W.; Wang, J. Ships Detection in SAR Images Based on Anchor-Free Model With Mask Guidance Features. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 666–675.
- Feng, Y.; Chen, J.; Huang, Z.; Wan, H.; Xia, R.; Wu, B.; Sun, L.; Xing, M. A Lightweight Position-Enhanced Anchor-Free Algorithm for SAR Ship Detection. Remote Sens. 2022, 14, 1908.
- Rostami, M.; Kolouri, S.; Eaton, E.; Kim, K. Deep Transfer Learning for Few-Shot SAR Image Classification. Remote Sens. 2019, 11, 1374.
- Hao, P.; He, M. Ship Detection Based on Small Sample Learning. J. Coast. Res. 2020, 108, 135–139.
- Zhang, H.; Zhang, X.; Meng, G.; Guo, C.; Jiang, Z. Few-Shot Multi-Class Ship Detection in Remote Sensing Images Using Attention Feature Map and Multi-Relation Detector. Remote Sens. 2022, 14, 2790.
- Zhang, Z.; Zhao, J.; Liang, X. Zero-Shot Learning Based on Semantic Embedding for Ship Detection. In Proceedings of the 2020 3rd International Conference on Unmanned Systems (ICUS), Harbin, China, 27–28 November 2020; pp. 1152–1156.
- Zhang, X.; Zhang, H.; Jiang, Z. Few-Shot Object Detection in Remote Sensing Images. In Image and Signal Processing for Remote Sensing XXVII; Bruzzone, L., Bovolo, F., Eds.; International Society for Optics and Photonics, SPIE: Bellingham, WA, USA, 2021; Volume 11862, pp. 76–81.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
- Kim, K.; Lee, H.S. Probabilistic Anchor Assignment with IoU Prediction for Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 355–371.
- Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 9759–9768.
- Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-Aligned One-Stage Object Detection. In Proceedings of the 2021 IEEE International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 3490–3499.
- Chen, Q.; Wang, Y.; Yang, T.; Zhang, X.; Cheng, J.; Sun, J. You Only Look One-Level Feature. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 13039–13048.
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
FESwin backbone architecture for an H × W input image (C denotes the base channel dimension):

Layer Name | Output Size | Layers
---|---|---
Pretreatment | H/4 × W/4 × 48 | Patch partition
Stage 1 | H/4 × W/4 × C | Linear embedding → Swin transformer block × 2 [LayerNorm → W-MSA/SW-MSA → LayerNorm → MLP] → Feature enhancement: (Conv 3 × 3 → LayerNorm → ReLU → Conv 3 × 3 → LayerNorm) and (Conv 1 × 1 → LayerNorm → ReLU → Conv 1 × 1 → LayerNorm) → sigmoid
Stage 2 | H/8 × W/8 × 2C | Patch merging → Swin transformer block × 2 → Feature enhancement (as in Stage 1)
Stage 3 | H/16 × W/16 × 4C | Patch merging → Swin transformer block × 6 → Feature enhancement (as in Stage 1)
Stage 4 | H/32 × W/32 × 8C | Patch merging → Swin transformer block × 2 → Feature enhancement (as in Stage 1)
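To make the feature-enhancement rows concrete, below is a minimal PyTorch sketch of such a block. It assumes the two convolutional branches are summed and the sigmoid output gates the incoming Swin features; the exact combination follows Section 2.1. GroupNorm(1, C) stands in for LayerNorm on channel-first feature maps.

```python
import torch
import torch.nn as nn

class FeatureEnhancement(nn.Module):
    """Feature-enhancement block sketched from the table above: a 3x3-conv
    branch and a 1x1-conv branch (each Conv -> Norm -> ReLU -> Conv -> Norm),
    merged by a sigmoid that gates the incoming Swin features."""

    def __init__(self, channels: int):
        super().__init__()

        # GroupNorm(1, C) normalizes over (C, H, W) per sample, i.e. a
        # LayerNorm equivalent for NCHW tensors
        def branch(kernel: int) -> nn.Sequential:
            pad = kernel // 2
            return nn.Sequential(
                nn.Conv2d(channels, channels, kernel, padding=pad),
                nn.GroupNorm(1, channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, kernel, padding=pad),
                nn.GroupNorm(1, channels),
            )

        self.branch3x3 = branch(3)
        self.branch1x1 = branch(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # assumption: branch outputs are summed, squashed to (0, 1), and
        # applied as a multiplicative gate on the Swin features
        gate = torch.sigmoid(self.branch3x3(x) + self.branch1x1(x))
        return x * gate

# e.g., Stage-1 features of Swin-T (C = 96):
# y = FeatureEnhancement(96)(torch.randn(1, 96, 200, 200))
```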
Comparison of ESTDNet with the baseline Cascade R-CNN Swin on the SSDD dataset. AP is COCO-style average precision over IoU thresholds 0.50–0.95; AP50 and AP75 are AP at IoU 0.50 and 0.75; APS, APM, and APL are AP for small, medium, and large objects:

Methods | AP (%) | AP50 (%) | AP75 (%) | APS (%) | APM (%) | APL (%)
---|---|---|---|---|---|---
Cascade R-CNN Swin | 56.6 | 91.5 | 64.7 | 53.5 | 63.5 | 51.3
ESTDNet | 59.4 | 93.8 | 69.1 | 55.4 | 66.7 | 60.3
Comparison of ESTDNet with other detection methods on the SSDD dataset:

Methods | AP (%) | AP50 (%) | AP75 (%) | APS (%) | APM (%) | APL (%)
---|---|---|---|---|---|---
Faster R-CNN | 53.5 | 90.7 | 56.8 | 52.2 | 56.8 | 53.6
YOLOv3 | 57.7 | 93.8 | 64.8 | 54.5 | 63.6 | 60.2
Cascade R-CNN | 56.9 | 91.8 | 63.3 | 53.4 | 64.8 | 53.1
PAA | 56.0 | 91.6 | 64.0 | 51.1 | 65.7 | 53.1
ATSS | 55.2 | 92.4 | 59.9 | 51.9 | 60.9 | 52.2
DETR | 50.2 | 91.1 | 52.7 | 41.7 | 64.3 | 59.3
Deformable DETR | 52.6 | 93.3 | 55.0 | 46.9 | 61.8 | 58.2
TOOD | 56.4 | 91.1 | 66.0 | 52.0 | 65.2 | 41.0
YOLOF | 57.2 | 93.2 | 62.7 | 51.8 | 67.4 | 56.9
Cascade R-CNN Swin | 56.6 | 91.5 | 64.7 | 53.5 | 63.5 | 51.3
ESTDNet | 59.4 | 93.8 | 69.1 | 55.4 | 66.7 | 60.3
Comparison of ESTDNet with the baseline Cascade R-CNN Swin on the SARShip dataset:

Methods | AP (%) | AP50 (%) | AP75 (%) | APS (%) | APM (%) | APL (%)
---|---|---|---|---|---|---
Cascade R-CNN Swin | 57.3 | 93.4 | 63.1 | 53.0 | 62.8 | 56.4
ESTDNet | 60.8 | 95.0 | 69.8 | 55.9 | 66.5 | 67.5
Comparison of ESTDNet with other detection methods on the SARShip dataset:

Methods | AP (%) | AP50 (%) | AP75 (%) | APS (%) | APM (%) | APL (%)
---|---|---|---|---|---|---
Faster R-CNN | 50.8 | 92.7 | 50.5 | 47.2 | 55.4 | 46.9
YOLOv3 | 46.6 | 90.9 | 42.6 | 42.8 | 52.1 | 43.1
Cascade R-CNN | 58.1 | 93.4 | 65.1 | 53.7 | 63.8 | 57.7
PAA | 53.6 | 93.2 | 56.1 | 49.5 | 58.7 | 50.5
ATSS | 53.7 | 93.5 | 56.2 | 49.4 | 59.2 | 52.7
DETR | 56.5 | 94.5 | 62.7 | 49.1 | 65.2 | 64.2
Deformable DETR | 56.8 | 94.2 | 63.3 | 50.2 | 64.1 | 52.2
TOOD | 57.7 | 94.4 | 64.1 | 53.2 | 63.4 | 66.1
YOLOF | 54.4 | 94.7 | 56.3 | 48.2 | 62.2 | 54.0
Cascade R-CNN Swin | 57.3 | 93.4 | 63.1 | 53.0 | 62.8 | 56.4
ESTDNet | 60.8 | 95.0 | 69.8 | 55.9 | 66.5 | 67.5
Ablation results for the FESwin and AFF modules on the SSDD dataset (√ indicates the module is enabled):

Methods | FESwin | AFF | AP (%) | AP50 (%) | AP75 (%) | APS (%) | APM (%) | APL (%)
---|---|---|---|---|---|---|---|---
Cascade R-CNN Swin | | | 56.6 | 91.5 | 64.7 | 53.5 | 63.5 | 51.3
ESTDNet | √ | | 58.8 | 93.8 | 68.3 | 54.6 | 66.6 | 58.9
ESTDNet | | √ | 57.5 | 92.3 | 66.2 | 54.3 | 64.3 | 54.2
ESTDNet | √ | √ | 59.4 (0.1%) | 93.8 (0.3%) | 69.1 (0.2%) | 55.4 (0.3%) | 66.7 (0.2%) | 60.3 (0.4%)
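Read row-wise, FESwin alone lifts AP over the baseline from 56.6% to 58.8% (+2.2), AFF alone reaches 57.5% (+0.9), and the two together reach 59.4% (+2.8). The combined gain is slightly less than the sum of the individual gains, suggesting the two modules' contributions partially overlap.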
Inference speed comparison on the SSDD and SARShip datasets:

Methods | FPS on SSDD (images/s) | FPS on SARShip (images/s) | Inference time on SSDD (ms/image) | Inference time on SARShip (ms/image)
---|---|---|---|---
Faster R-CNN | 16.9 | 21.0 | 59.1 | 47.6
YOLOv3 | 47.4 | 41.0 | 21.1 | 24.4
Cascade R-CNN | 13.7 | 15.7 | 73.1 | 63.7
PAA | 13.2 | 12.9 | 75.6 | 77.8
ATSS | 18.9 | 23.2 | 53.0 | 43.1
DETR | 15.7 | 15.8 | 63.7 | 63.4
Deformable DETR | 10.6 | 13.1 | 94.8 | 76.2
TOOD | 19.1 | 16.3 | 52.4 | 61.4
YOLOF | 29.0 | 37.2 | 34.5 | 26.9
Cascade R-CNN Swin | 13.7 | 12.8 | 73.0 | 78.1
ESTDNet | 12.3 | 11.5 | 81.3 | 86.9
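As a consistency check, FPS and per-image latency are reciprocals: for ESTDNet on SSDD, 1000 ms ÷ 81.3 ms/image ≈ 12.3 images/s, matching the FPS column, and on SARShip 1000 ÷ 86.9 ≈ 11.5. Relative to the Cascade R-CNN Swin baseline, ESTDNet thus trades roughly 8–9 ms of extra latency per image for its accuracy gains.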