An Improved SAR Image Semantic Segmentation Deeplabv3+ Network Based on the Feature Post-Processing Module
Abstract
1. Introduction
2. Relevant Theories
- (1) Coordinate Information Embedding
- (2) Coordinate Attention Generation
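The two CA (coordinate attention, Hou et al., CVPR 2021) steps listed above can be sketched as follows. This is a simplified illustration of the data flow only: the learned 1×1 convolutions and nonlinearity that the real module applies between the two steps are omitted here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x):
    """Minimal sketch of coordinate attention on a (C, H, W) feature map.

    Step 1 (coordinate information embedding): average-pool along each
    spatial axis separately, so each descriptor keeps position along
    the other axis.
    Step 2 (coordinate attention generation): turn the pooled
    descriptors into per-row and per-column gates and reweight x.
    The shared learned transform of the real module is omitted.
    """
    # Step 1: directional pooling preserves one coordinate each.
    z_h = x.mean(axis=2)             # (C, H): pooled along width
    z_w = x.mean(axis=1)             # (C, W): pooled along height
    # Step 2: gates per row and per column.
    a_h = sigmoid(z_h)[:, :, None]   # (C, H, 1)
    a_w = sigmoid(z_w)[:, None, :]   # (C, 1, W)
    return x * a_h * a_w             # position-aware reweighting

x = np.random.rand(4, 8, 8)
y = coordinate_attention(x)
```

Unlike channel-only attention (e.g., SE), the two directional gates retain spatial position, which is why CA is compared against SE, CBAM, and ECA-Net in Section 4.1.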
3. Method
3.1. Feature Post-Processing Module (FPPM)
3.2. Determination of FPPM Model Parameters
3.2.1. Logarithmic Energy Characteristics
3.2.2. Spatial Feature Extraction
3.2.3. MVG (Multivariate Gaussian) Model
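The MVG model of Section 3.2.3 builds on the standard multivariate Gaussian density. As a reference, a minimal log-density sketch is given below; the paper's specific feature vectors and parameter-estimation procedure are not reproduced here.

```python
import numpy as np

def mvg_logpdf(x, mu, sigma):
    """Log-density of a multivariate Gaussian N(mu, Sigma):
    log N(x) = -0.5 * (k*log(2*pi) + log|Sigma|
                       + (x - mu)^T Sigma^{-1} (x - mu)).
    x, mu: length-k vectors; sigma: (k, k) covariance matrix.
    """
    k = mu.size
    d = x - mu
    _, logdet = np.linalg.slogdet(sigma)       # stable log-determinant
    maha = d @ np.linalg.solve(sigma, d)       # Mahalanobis distance term
    return -0.5 * (k * np.log(2 * np.pi) + logdet + maha)
```

Using `slogdet` and `solve` instead of an explicit inverse keeps the evaluation numerically stable for near-singular covariances.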
3.3. Focal Loss Function
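Section 3.3 adopts the focal loss of Lin et al. (ICCV 2017), which down-weights easy examples so training focuses on hard, misclassified pixels, a useful property for class-imbalanced SAR scenes. A minimal binary sketch follows, using that paper's default γ = 2 and α = 0.25; the weighting actually used in this network may differ.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).
    p: predicted foreground probabilities; y: 0/1 ground-truth labels.
    gamma=0, alpha=1 recovers plain cross-entropy.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)             # numerical safety
    p_t = np.where(y == 1, p, 1 - p)           # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))
```

The modulating factor (1 − p_t)^γ is what distinguishes this from weighted cross-entropy: a confidently correct pixel (p_t ≈ 1) contributes almost nothing to the gradient.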
3.4. Improvement of ASPP Module
4. Results
4.1. Determination of CA Attention Mechanism
4.2. FPPM Model Parameter Experimental Results
4.3. Ablation Experiment
4.4. Results Comparison on Composite Images
4.5. Results on SAR Images
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 3431–3440. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
- Marmanis, D.; Schindler, K.; Wegner, J.D.; Galliani, S.; Datcu, M.; Stilla, U. Classification with an edge: Improving semantic image segmentation with boundary detection. ISPRS J. Photogramm. Remote Sens. 2018, 135, 158–172. [Google Scholar] [CrossRef]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef]
- Lin, G.; Milan, A.; Shen, C.; Reid, I. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1925–1934. [Google Scholar]
- Yu, H.; Zhang, X.; Wang, S.; Hou, B. Context-based hierarchical unequal merging for SAR image segmentation. IEEE Trans. Geosci. Remote Sens. 2013, 51, 995–1009. [Google Scholar] [CrossRef]
- Zhang, Z.J.; Shui, P.L. SAR images segmentation algorithm based on region merging using edge information. Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Syst. Eng. Electron. 2014, 36, 1948–1954. [Google Scholar]
- Krähenbühl, P.; Koltun, V. Efficient inference in fully connected crfs with gaussian edge potentials. In Proceedings of the Advances in Neural Information Processing Systems, Granada, Spain, 12–15 December 2011; pp. 109–117. [Google Scholar]
- Teichmann, M.; Cipolla, R. Convolutional CRFs for Semantic Segmentation. arXiv 2018, arXiv:1805.04777. [Google Scholar]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. arXiv 2014, arXiv:1412.7062. [Google Scholar]
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848. [Google Scholar] [CrossRef] [PubMed]
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 3146–3154. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 7132–7141. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE: Piscataway, NJ, USA, 2020. [Google Scholar]
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007. [Google Scholar]
- Alvarez, J.; Petersson, L. DecomposeMe: Simplifying ConvNets for end-to-end learning. arXiv 2016, arXiv:1606.05426. [Google Scholar]
- Wang, B. Research on Digital Image Scaling and Its Quality Evaluation Method; Harbin Engineering University: Harbin, China, 2015. [Google Scholar]
- Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multiscale structural similarity for image quality assessment. In Proceedings of the Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 9–12 November 2003; Volume 2, pp. 1398–1402. [Google Scholar]
- Brooks, A.C.; Zhao, X.; Pappas, T.N. Structural similarity quality metrics in a coding context: Exploring the space of realistic distortions. IEEE Trans. Image Process. 2008, 17, 1261–1273. [Google Scholar] [CrossRef] [PubMed]
- Sampat, M.P.; Wang, Z.; Gupta, S.; Bovik, A.C.; Markey, M.K. Complex wavelet structural similarity: A new image similarity index. IEEE Trans. Image Process. 2009, 18, 2385–2401. [Google Scholar] [CrossRef] [PubMed]
- Li, C.; Bovik, A.C. Three-component weighted structural similarity index. In Proceedings of the IS&T/SPIE Electronic Imaging, San Jose, CA, USA, 18–22 October 2008; International Society for Optics and Photonics: San Jose, CA, USA, 2009; pp. 72420Q–72420Q-9. [Google Scholar]
- Li, C.; Bovik, A.C. Content-partitioned structural similarity index for image quality assessment. Signal Process. Image Commun. 2010, 25, 517–526. [Google Scholar] [CrossRef]
- Linsley, D.; Dan, S.; Eberhardt, S.; Serre, T. Learning what and where to attend. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Sun, Z.; Meng, C.; Cheng, J.; Zhang, Z.; Chang, S. A Multi-Scale Feature Pyramid Network for Detection and Instance Segmentation of Marine Ships in SAR Images. Remote Sens. 2022, 14, 6312. [Google Scholar] [CrossRef]
- Fu, G.; Huang, J.; Yang, T.; Zheng, S. Improved Lightweight Attention Model Based on CBAM. Comput. Eng. Appl. 2021, 57, 150–156. [Google Scholar]
- Sun, Y.; Hua, Y.; Mou, L.; Zhu, X.X. CG-Net: Conditional GIS-Aware Network for Individual Building Segmentation in VHR SAR Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15. [Google Scholar] [CrossRef]
- Zhao, Z.; Chen, K.; Yamane, S. CBAM-Unet++: Easier to find the target with the attention module "CBAM". In Proceedings of the 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE), Kyoto, Japan, 12–15 October 2021. [Google Scholar]
Method | DICE |  |  |  |  |  |  |  | Time [ms] |
---|---|---|---|---|---|---|---|---|---|
Deeplabv3+ | 0.79 | 88.16% | 96.95% | 90.69% | 95.20% | 93.33% | 54.25% | 85.69% | 28 |
+SE | 0.81 | 88.22% | 96.99% | 90.79% | 95.49% | 93.60% | 54.88% | 85.91% | 37 |
+GALA | 0.81 | 88.34% | 97.03% | 90.97% | 95.86% | 93.79% | 54.97% | 86.07% | 46 |
+CBAM | 0.83 | 89.03% | 97.09% | 91.40% | 96.34% | 94.03% | 55.38% | 86.11% | 59 |
+ECA-Net | 0.83 | 88.41% | 97.10% | 91.48% | 96.46% | 94.06% | 55.75% | 86.12% | 30 |
+MS-FPN | 0.83 | 88.52% | 97.09% | 91.55% | 96.68% | 93.88% | 56.17% | 86.05% | 47 |
+CA | 0.85 | 88.79% | 97.11% | 91.64% | 96.90% | 94.21% | 56.87% | 86.20% | 35 |
X | DICE |  |  |  |  |  |  |  |
---|---|---|---|---|---|---|---|---|
X = 2 | 0.55 | 89.60% | 96.58% | 92.39% | 95.99% | 89.67% | 48.33% | 85.45% |
X = 3 | 0.73 | 90.25% | 97.66% | 93.50% | 97.39% | 94.22% | 58.78% | 90.33% |
X = 4 | 0.63 | 89.15% | 96.17% | 92.92% | 95.77% | 88.85% | 42.60% | 84.13% |
X = 5 | 0.46 | 88.60% | 95.89% | 91.85% | 94.69% | 88.36% | 41.65% | 83.47% |
X = 6 | 0.41 | 88.05% | 95.63% | 91.30% | 94.09% | 87.26% | 37.08% | 82.27% |
No. | Deeplabv3+ | CA | Focal-Loss | Improved ASPP | FPPM |  | Time [ms] |
---|---|---|---|---|---|---|---|
1 | ✓ |  |  |  |  | 85.69% | 28 |
2 | ✓ | ✓ |  |  |  | 86.20% | 35 |
3 | ✓ | ✓ | ✓ |  |  | 87.50% | 35 |
4 | ✓ | ✓ |  | ✓ |  | 88.82% | 20 |
5 | ✓ | ✓ | ✓ | ✓ | ✓ | 90.33% | 37 |
Methods | DICE |  |  |  |  |  |  |  | Time [ms] |
---|---|---|---|---|---|---|---|---|---|
Adaptive threshold | 0.49 | 43.95% | 50.36% | 42.58% | 48.15% | 45.17% | 25.66% | 40.66% | 789 |
Otsu | 0.52 | 50.67% | 55.69% | 50.45% | 54.67% | 50.14% | 22.68% | 47.82% | 541 |
FCN | 0.61 | 80.12% | 94.82% | 88.06% | 94.93% | 91.27% | 13.62% | 78.22% | 71 |
PSPNet | 0.67 | 82.14% | 95.81% | 89.22% | 95.01% | 92.00% | 32.58% | 79.69% | 356 |
Deeplabv3+ | 0.71 | 87.66% | 96.03% | 90.01% | 94.80% | 92.56% | 52.70% | 84.90% | 36 |
Improved Deeplabv3+ | 0.77 | 90.10% | 96.60% | 90.60% | 96.98% | 94.10% | 57.77% | 90.14% | 40 |
Methods | DICE |  |  |  |  |  |  |  | Time [ms] |
---|---|---|---|---|---|---|---|---|---|
Adaptive threshold | 0.50 | 40.22% | 48.33% | 40.74% | 39.85% | 41.09% | 44.52% | 35.62% | 669 |
Otsu | 0.59 | 43.85% | 47.19% | 45.61% | 50.21% | 47.51% | 20.36% | 41.83% | 580 |
FCN | 0.73 | 80.51% | 95.61% | 89.36% | 95.15% | 92.88% | 14.20% | 78.60% | 63 |
PSPNet | 0.77 | 83.57% | 96.31% | 90.12% | 96.62% | 93.10% | 35.45% | 80.30% | 332 |
Deeplabv3+ | 0.79 | 88.16% | 96.95% | 90.69% | 95.20% | 93.33% | 54.25% | 85.69% | 28 |
Improved Deeplabv3+ | 0.83 | 90.25% | 97.66% | 93.50% | 97.39% | 94.22% | 58.78% | 90.33% | 38 |
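The DICE column in the tables above is the Dice similarity coefficient between the predicted and ground-truth masks. A minimal sketch for binary masks:

```python
import numpy as np

def dice_coefficient(pred, gt):
    """Dice similarity between two binary masks:
    DICE = 2|A ∩ B| / (|A| + |B|), in [0, 1], 1 = perfect overlap.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: define as perfect agreement
    return 2.0 * np.logical_and(pred, gt).sum() / denom

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
# |A ∩ B| = 2, |A| = 3, |B| = 3, so DICE = 4/6 ≈ 0.667
```

For multi-class segmentation results such as these, DICE is typically computed per class and averaged; the tables do not state which convention the paper uses.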
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, Q.; Kong, Y. An Improved SAR Image Semantic Segmentation Deeplabv3+ Network Based on the Feature Post-Processing Module. Remote Sens. 2023, 15, 2153. https://doi.org/10.3390/rs15082153