Adaptive Local Cross-Channel Vector Pooling Attention Module for Semantic Segmentation of Remote Sensing Imagery
Abstract
1. Introduction
- By analyzing how current mainstream attention modules pool feature maps, we propose an efficient vector average pooling method that lets an attention module build and retain the spatial information of the feature map.
- By comparing the weight mapping methods of existing mainstream attention modules, we adopt adaptive local cross-channel interaction, the most efficient of the compared mappings, to keep the attention module lightweight and efficient.
- We build the VPA module from vector average pooling and adaptive local cross-channel interaction, so that a single attention module performs the functions of both a channel attention module and a spatial attention module. This avoids the dependency that arises when separate modules are coupled.
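As a rough illustration of the two ingredients named in the contributions above, the sketch below pools a feature map into one vector of row means and one vector of column means per channel, so that some spatial layout survives pooling, and then maps those vectors to weights with a small 1D convolution over neighbouring channels. The pooling layout, the way the two branches are multiplied together, the uniform placeholder kernel, and all function names are assumptions made for illustration and are not taken from the paper; only the kernel-size rule follows the one published for ECA-Net [30].

```python
import math


def adaptive_kernel_size(channels, gamma=2, b=1):
    """Odd kernel size that grows with log2(channels) (ECA-Net rule)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def cross_channel_conv(vectors, kernel):
    """Zero-padded 1D convolution along the channel axis.

    vectors: C lists of equal length (one pooled vector per channel).
    kernel:  odd-length list of weights shared by all positions.
    """
    channels, length = len(vectors), len(vectors[0])
    pad = len(kernel) // 2
    out = [[0.0] * length for _ in range(channels)]
    for c in range(channels):
        for j, weight in enumerate(kernel):
            src = c + j - pad
            if 0 <= src < channels:
                for p in range(length):
                    out[c][p] += weight * vectors[src][p]
    return out


def vector_pool_attention(x, kernel=None):
    """Reweight a [C][H][W] feature map using row and column vectors."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if kernel is None:
        k = adaptive_kernel_size(channels)
        kernel = [1.0 / k] * k  # stand-in for learned weights (assumption)
    # Vector average pooling: one mean per row and one mean per column.
    rows = [[sum(x[c][h]) / width for h in range(height)]
            for c in range(channels)]
    cols = [[sum(x[c][h][w] for h in range(height)) / height
             for w in range(width)] for c in range(channels)]
    # Local cross-channel interaction followed by a sigmoid gate.
    att_h = [[sigmoid(v) for v in vec]
             for vec in cross_channel_conv(rows, kernel)]
    att_w = [[sigmoid(v) for v in vec]
             for vec in cross_channel_conv(cols, kernel)]
    return [[[x[c][h][w] * att_h[c][h] * att_w[c][w] for w in range(width)]
             for h in range(height)] for c in range(channels)]
```

Because the weights depend on both the channel and the row or column index, a single gate of this kind can emphasize channels and locations at once, which is the behaviour the third contribution describes.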
2. Related Work
2.1. Attention Mechanism
2.2. Attention in Semantic Segmentation for Remote Sensing
2.3. ECA Module
- Depthwise Separable Convolution (DW) [42]. When the weights in Equation (3) are learned directly using a 1 × 1 depthwise separable convolution (note: the hyperparameter r is no longer used for dimensionality reduction), the weights can be defined as follows:
- Standard Convolution (SC) [43]. When the weights in Equation (3) are learned directly using a 1 × 1 standard convolution, the weights are written as follows:
- Group Convolution (GC) [44]. As a compromise, a convolution design that achieves both a small number of parameters and high learning performance is necessary. When the weights in Equation (3) are learned using a 1 × 1 group convolution, the weights can be written as follows:
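To make the trade-off between these three mappings concrete, the sketch below counts the weights each one needs to turn C pooled channel descriptors into C attention weights. It is a back-of-the-envelope count under stated assumptions: biases are ignored, the depthwise case is counted as one scalar per channel, the default of 16 groups and the kernel size of 3 are illustrative values, and the `local_1d` entry (a 1D convolution shared across channels, as in ECA-Net [30]) is included only for comparison.

```python
def mapping_parameter_counts(channels, groups=16, kernel_size=3):
    """Weights needed to map C channel descriptors to C attention weights.

    Biases are ignored; `groups` and `kernel_size` are illustrative values.
    """
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    return {
        "DW": channels,                       # one scalar per channel
        "SC": channels * channels,            # dense C x C matrix
        "GC": channels * channels // groups,  # block-diagonal, G blocks
        "local_1d": kernel_size,              # kernel shared by all channels
    }


for c in (64, 256, 512):
    print(c, mapping_parameter_counts(c))
```

The dense mapping grows quadratically with the channel count, the group mapping divides that cost by the number of groups, and the depthwise and local mappings stay linear or constant.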
3. Methodology
3.1. Vector Average Pooling
3.2. VPA Module
4. Experiments and Results
4.1. Datasets
4.1.1. MO-CSSSD
4.1.2. WHU Building Dataset
4.1.3. ISPRS Vaihingen Dataset
4.2. Evaluation Metrics and Experimental Setting
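The result tables report overall accuracy (OA), IoU, mIoU, precision, recall, and F1-score. The sketch below shows how these quantities are conventionally derived from a confusion matrix. It assumes the standard definitions, with rows as ground-truth classes and columns as predictions; the function name and the returned dictionary layout are hypothetical.

```python
def segmentation_metrics(conf):
    """Compute OA, mIoU and per-class scores from a confusion matrix.

    conf[i][j] = number of pixels of ground-truth class i predicted as j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    per_class = []
    for i in range(n):
        tp = conf[i][i]
        fn = sum(conf[i]) - tp                         # missed pixels of class i
        fp = sum(conf[r][i] for r in range(n)) - tp    # wrongly labelled as i
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
        per_class.append({"precision": precision, "recall": recall,
                          "f1": f1, "iou": iou})
    oa = sum(conf[i][i] for i in range(n)) / total if total else 0.0
    miou = sum(c["iou"] for c in per_class) / n
    return {"oa": oa, "miou": miou, "per_class": per_class}


# Toy two-class example (made-up counts, not data from the paper).
result = segmentation_metrics([[50, 10], [5, 35]])
print(round(result["oa"], 4), round(result["miou"], 4))
```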
4.3. Experimental Architecture
4.4. Experimental Results
4.4.1. Ablation Study
4.4.2. Comparison Experiment
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Anilkumar, P.; Venugopal, P. Research Contribution and Comprehensive Review towards the Semantic Segmentation of Aerial Images Using Deep Learning Techniques. Secur. Commun. Netw. 2022, 2022, 6010912.
- Wang, J.J.; Ma, A.L.; Zhong, Y.F.; Zheng, Z.; Zhang, L.P. Cross-sensor domain adaptation for high spatial resolution urban land-cover mapping: From airborne to spaceborne imagery. Remote Sens. Environ. 2022, 277, 113058.
- Zheng, Z.; Zhong, Y.F.; Wang, J.J.; Ma, A.L. Foreground-Aware Relation Network for Geospatial Object Segmentation in High Spatial Resolution Remote Sensing Imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 14–19 June 2020; pp. 4095–4104.
- Huang, X.; Zhang, L.P.; Gong, W. Information fusion of aerial images and LIDAR data in urban areas: Vector-stacking, re-classification and post-processing approaches. Int. J. Remote Sens. 2011, 32, 69–84.
- Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Chen, L.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv 2014, arXiv:1412.7062.
- Chen, L.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Chen, L.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
- Chen, L.C.; Zhu, Y.K.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 833–851.
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep High-Resolution Representation Learning for Human Pose Estimation. In Proceedings of the 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 5686–5696.
- Sun, K.; Zhao, Y.; Jiang, B.; Cheng, T.; Xiao, B.; Liu, D.; Mu, Y.; Wang, X.; Liu, W.; Wang, J. High-resolution representations for labeling pixels and regions. arXiv 2019, arXiv:1904.04514.
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 20 September 2018; pp. 3–11.
- Tsotsos, J.K. Analyzing Vision at the Complexity Level. Behav. Brain Sci. 1991, 14, 768.
- Vikram, T.N. A Computational Perspective on Visual Attention. Cognit. Syst. Res. 2012, 19–20, 88–90.
- Li, W.; Liu, K.; Zhang, L.Z.; Cheng, F. Object detection based on an adaptive attention mechanism. Sci. Rep. 2020, 10, 11307.
- Tian, Z.; Zhan, R.; Hu, J.; Wang, W.; He, Z.; Zhuang, Z. Generating Anchor Boxes Based on Attention Mechanism for Object Detection in Remote Sensing Images. Remote Sens. 2020, 12, 2416.
- Chen, Z.; Tian, S.; Yu, L.; Zhang, L.; Zhang, X. An object detection network based on YOLOv4 and improved spatial attention mechanism. J. Intell. Fuzzy Syst. 2022, 42, 2359–2368.
- Zhang, M.; Su, H.; Wen, J. Classification of flower image based on attention mechanism and multi-loss attention network. Comput. Commun. 2021, 179, 307–317.
- Cao, P.; Xie, F.; Zhang, S.; Zhang, Z.; Zhang, J. MSANet: Multi-scale attention networks for image classification. Multimed. Tools Appl. 2022, 81, 34325–34344.
- Roy, S.K.; Dubey, S.R.; Chatterjee, S.; Baran Chaudhuri, B. FuSENet: Fused squeeze-and-excitation network for spectral-spatial hyperspectral image classification. IET Image Process. 2020, 14, 1653–1661.
- Guo, M.; Xu, T.; Liu, J.; Liu, Z.; Jiang, P.; Mu, T.; Zhang, S.; Martin, R.R.; Cheng, M.; Hu, S. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Li, H.; Qiu, K.; Chen, L.; Mei, X.; Hong, L.; Tao, C. SCAttNet: Semantic Segmentation Network With Spatial and Channel Attention Mechanism for High-Resolution Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2021, 18, 905–909.
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 3141–3149.
- Jin, Z.; Liu, B.; Chu, Q.; Yu, N. ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation. In Proceedings of the 18th IEEE/CVF International Conference on Computer Vision (ICCV), Virtual, 11–17 October 2021; pp. 7169–7178.
- Liu, S.; Cheng, J.; Liang, L.; Bai, H.; Dang, W. Light-Weight Semantic Segmentation Network for UAV Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 8287–8296.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 14–19 June 2020; pp. 11534–11542.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Houlsby, N. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Yuan, Y.; Chen, X.; Wang, J. Object-contextual representations for semantic segmentation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 173–190.
- Wang, Y. Remote Sensing Image Semantic Segmentation Algorithm Based on Improved ENet Network. Sci. Program. 2021, 2021, 5078731.
- Sofla, R.A.D.; Alipour-Fard, T.; Arefi, H. Road extraction from satellite and aerial image using SE-Unet. J. Appl. Remote Sens. 2021, 15, 014512.
- Han, G.; Zhang, M.; Wu, W.; He, M.; Liu, K.; Qin, L.; Liu, X. Improved U-Net based insulator image segmentation method based on attention mechanism. Energy Rep. 2021, 7, 210–217.
- Han, L.; Zhao, Y.; Lv, H.; Zhang, Y.; Liu, H.; Bi, G. Remote Sensing Image Denoising Based on Deep and Shallow Feature Fusion and Attention Mechanism. Remote Sens. 2022, 14, 1243.
- Liu, R.R.; Tao, F.; Liu, X.T.; Na, J.M.; Leng, H.J.; Wu, J.J.; Zhou, T. RAANet: A Residual ASPP with Attention Framework for Semantic Segmentation of High-Resolution Remote Sensing Images. Remote Sens. 2022, 14, 3109.
- Wang, M.Y.; Wang, J.T.; Liu, C.; Li, F.Y.; Wang, Z.Y. Spatial-Coordinate Attention and Multi-Path Residual Block Based Oriented Object Detection in Remote Sensing Images. Int. J. Remote Sens. 2022, 43, 5757–5774.
- Li, Y.; Si, Y.; Tong, Z.; He, L.; Zhang, J.; Luo, S.; Gong, Y. MQANet: Multi-Task Quadruple Attention Network of Multi-Object Semantic Segmentation from Remote Sensing Images. Remote Sens. 2022, 14, 6256.
- Zhao, D.; Wang, C.; Gao, Y.; Shi, Z.; Xie, F. Semantic Segmentation of Remote Sensing Image Based on Regional Self-Attention Mechanism. IEEE Geosci. Remote Sens. Lett. 2022, 19.
- Zhang, Y.J.; Cheng, J.; Bai, H.W.; Wang, Q.; Liang, X.Y. Multilevel Feature Fusion and Attention Network for High-Resolution Remote Sensing Image Semantic Labeling. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6512305.
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
- Xie, S.; Girshick, R.; Dollar, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995.
- Chen, Y.; Yang, X.; Xu, L.; Li, X. Research on multi-scale target semantic segmentation for coastal ecological supervision. Environ. Resour. 2022, 4, 48–61.
- Zhu, Q.; Liao, C.; Hu, H.; Mei, X.; Li, H. MAP-Net: Multiple Attending Path Neural Network for Building Footprint Extraction From Remote Sensed Imagery. IEEE Trans. Geosci. Remote Sens. 2021, 59, 6169–6181.
- Guo, R.; Liu, J.; Li, N.; Liu, S.; Chen, F.; Cheng, B.; Duan, J.; Li, X.; Ma, C. Pixel-Wise Classification Method for High Resolution Remote Sensing Imagery Using Deep Neural Networks. ISPRS Int. J. Geo-Inf. 2018, 7, 110.
- Xu, Z.; Zhang, W.; Zhang, T.; Li, J. HRCNet: High-Resolution Context Extraction Network for Semantic Segmentation of Remote Sensing Images. Remote Sens. 2021, 13, 71.

Method | FLOPs (G) | Parameters (M) | OA (%) | mIoU (%) |
---|---|---|---|---|
Baseline | 30.37 | 37.26 | 91.61 | 79.43 |
+VPA_DW | 30.41 | 37.32 | 92.95 | 82.39 |
+VPA_GC16 | 30.49 | 39.69 | 92.72 | 81.50 |
+VPA_SC | 31.80 | 75.67 | 92.73 | 81.96 |
+VPA_MLPr | 30.58 | 40.89 | 92.60 | 81.22 |
+VPA_C2D(k, 1) | 30.41 | 37.27 | 92.96 | 82.43 |

Metric | Baseline | +SE | +CBAM | +ECA | +SCA | +VPA |
---|---|---|---|---|---|---|
Parameters (M) | 37.26 | 39.66 | 42.06 | 37.27 | 40.89 | 37.27 |
FLOPs (G) | 30.37 | 30.39 | 30.42 | 30.39 | 30.60 | 30.41 |
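
The absolute numbers in the table above differ only in the last digits, so the relative overhead of each attention module over the baseline is easier to read. The short calculation below uses only the values printed in the table; the dictionary layout and function name are illustrative.

```python
# Values copied from the parameter/FLOPs table above.
baseline = {"params_m": 37.26, "flops_g": 30.37}
modules = {
    "SE": {"params_m": 39.66, "flops_g": 30.39},
    "CBAM": {"params_m": 42.06, "flops_g": 30.42},
    "ECA": {"params_m": 37.27, "flops_g": 30.39},
    "SCA": {"params_m": 40.89, "flops_g": 30.60},
    "VPA": {"params_m": 37.27, "flops_g": 30.41},
}


def overhead_percent(value, reference):
    """Relative increase of `value` over `reference`, in percent."""
    return 100.0 * (value - reference) / reference


for name, cost in modules.items():
    p = overhead_percent(cost["params_m"], baseline["params_m"])
    f = overhead_percent(cost["flops_g"], baseline["flops_g"])
    print(f"+{name}: parameters +{p:.2f}%, FLOPs +{f:.2f}%")
```

On these figures, +ECA and +VPA add roughly 0.03% to the parameter count, while +CBAM adds close to 13%.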
Method | Mangrove | Aquaculture Raft | Aquaculture Pond | Background | OA (%) | mIoU (%) |
---|---|---|---|---|---|---|
Baseline | 81.36/76.04 | 80.75/70.62 | 92.10/85.98 | 92.90/85.07 | 91.61 | 79.43 |
+SE | 84.95/78.96 | 80.67/70.50 | 93.20/85.94 | 91.75/84.98 | 91.66 | 80.10 |
+CBAM | 82.66/77.23 | 80.76/71.45 | 93.44/88.29 | 94.00/87.02 | 92.78 | 81.00 |
+ECA | 86.47/80.61 | 83.57/73.28 | 93.12/87.20 | 93.11/86.42 | 92.48 | 81.87 |
+SCA | 87.43/80.82 | 82.36/72.43 | 93.61/87.78 | 93.16/86.82 | 92.73 | 81.96 |
+VPA | 89.24/82.30 | 81.87/72.06 | 94.02/88.18 | 93.15/87.19 | 92.96 | 82.43 |
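
The per-class cells in the table above hold two numbers each, and the column header does not say which metric is which. A quick consistency check, sketched below, suggests that the second number is the per-class IoU, since its mean over the four classes reproduces the mIoU column to within rounding. This reading is an inference from the numbers, not something the table states.

```python
# Second value of each per-class cell, in column order, and the printed mIoU.
rows = {
    "Baseline": ([76.04, 70.62, 85.98, 85.07], 79.43),
    "+SE": ([78.96, 70.50, 85.94, 84.98], 80.10),
    "+CBAM": ([77.23, 71.45, 88.29, 87.02], 81.00),
    "+ECA": ([80.61, 73.28, 87.20, 86.42], 81.87),
    "+SCA": ([80.82, 72.43, 87.78, 86.82], 81.96),
    "+VPA": ([82.30, 72.06, 88.18, 87.19], 82.43),
}


def mean(values):
    return sum(values) / len(values)


# Largest gap between the recomputed mean and the printed mIoU.
max_gap = max(abs(mean(ious) - miou) for ious, miou in rows.values())

for name, (ious, miou) in rows.items():
    print(f"{name}: mean of second values = {mean(ious):.3f}, table mIoU = {miou}")
```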
Method | IoU (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
Baseline | 86.23 | 93.32 | 91.90 | 92.60 |
+SE | 87.26 | 92.40 | 94.02 | 93.20 |
+CBAM | 86.82 | 91.72 | 94.20 | 92.95 |
+ECA | 87.64 | 93.35 | 93.47 | 93.41 |
+SCA | 87.42 | 93.64 | 92.94 | 93.29 |
+VPA | 87.92 | 93.41 | 93.73 | 93.57 |
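
For a single foreground class, F1 is the harmonic mean of precision and recall, and IoU follows from F1 as IoU = F1 / (2 − F1). The sketch below applies both identities to the rows of the table above as a sanity check. It assumes the standard definitions of these metrics; the function names are illustrative.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    return 2 * precision * recall / (precision + recall)


def iou_from_f1(f1_percent):
    """IoU (%) implied by an F1-score (%) for a single class."""
    f1 = f1_percent / 100.0
    return 100.0 * f1 / (2.0 - f1)


# (IoU, precision, recall, F1) copied from the table above, in percent.
rows = {
    "Baseline": (86.23, 93.32, 91.90, 92.60),
    "+SE": (87.26, 92.40, 94.02, 93.20),
    "+CBAM": (86.82, 91.72, 94.20, 92.95),
    "+ECA": (87.64, 93.35, 93.47, 93.41),
    "+SCA": (87.42, 93.64, 92.94, 93.29),
    "+VPA": (87.92, 93.41, 93.73, 93.57),
}

for name, (iou, precision, recall, f1) in rows.items():
    print(f"{name}: F1 from P/R = {f1_from_pr(precision, recall):.2f} "
          f"(table {f1}), IoU from F1 = {iou_from_f1(f1):.2f} (table {iou})")
```

Both recomputed columns agree with the printed ones to within rounding of the inputs.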

Method | Impervious Surfaces | Building | Low Vegetation | Tree | Car | OA (%) | mIoU (%) |
---|---|---|---|---|---|---|---|
Baseline | 76.65 | 82.60 | 72.21 | 61.70 | 47.67 | 84.60 | 68.17 |
+SE | 77.81 | 83.50 | 72.40 | 62.23 | 49.77 | 85.10 | 69.15 |
+CBAM | 77.05 | 82.50 | 72.37 | 61.46 | 49.33 | 84.68 | 68.54 |
+ECA | 78.23 | 83.40 | 72.87 | 62.54 | 49.17 | 85.26 | 69.24 |
+SCA | 77.92 | 84.32 | 73.04 | 62.92 | 48.43 | 85.52 | 69.33 |
+VPA | 77.86 | 83.44 | 72.84 | 62.45 | 50.49 | 85.21 | 69.41 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Wang, X.; Kang, M.; Chen, Y.; Jiang, W.; Wang, M.; Weise, T.; Tan, M.; Xu, L.; Li, X.; Zou, L.; Zhang, C. Adaptive Local Cross-Channel Vector Pooling Attention Module for Semantic Segmentation of Remote Sensing Imagery. Remote Sens. 2023, 15, 1980. https://doi.org/10.3390/rs15081980