A Parameter-Free Pixel Correlation-Based Attention Module for Remote Sensing Object Detection
Abstract
1. Introduction
- We propose a parameter-free, plug-and-play attention module for remote sensing image object detection that evaluates pixel correlations by computing Euclidean distances and Pearson coefficients between pixels.
- We integrate various attention modules into the Oriented RCNN model and conduct comparative experiments. The experimental results on the DOTA and NWPU VHR-10 datasets indicate that the proposed attention module achieves the highest mAP.
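The first bullet can be made concrete with a small sketch. This is a hypothetical NumPy illustration of a parameter-free pixel-correlation attention, assuming each pixel vector is scored against the channel-wise mean pixel via Euclidean distance and the Pearson coefficient and the two scores are fused into a sigmoid weight; the paper's exact reference vector and fusion rule may differ.

```python
import numpy as np

def eupea_attention(x):
    """Hypothetical sketch of a parameter-free Euclidean-Pearson attention.

    x: feature map of shape (C, H, W). Each spatial position is a C-dim
    pixel vector; it is scored against the mean pixel vector via Euclidean
    distance and the Pearson coefficient, and the two scores are fused
    into a sigmoid attention weight. Illustrative only, not the paper's code.
    """
    C, H, W = x.shape
    pixels = x.reshape(C, -1).T            # (H*W, C) pixel vectors
    mean_vec = pixels.mean(axis=0)         # reference pixel vector

    # Euclidean distance of every pixel vector to the reference
    eu = np.linalg.norm(pixels - mean_vec, axis=1)

    # Pearson correlation of every pixel vector with the reference
    px = pixels - pixels.mean(axis=1, keepdims=True)
    mv = mean_vec - mean_vec.mean()
    pea = (px @ mv) / (np.linalg.norm(px, axis=1) * np.linalg.norm(mv) + 1e-8)

    # Normalize both scores to [0, 1]; small distance and high correlation
    # both push the attention weight up
    eu_n = 1.0 - (eu - eu.min()) / (np.ptp(eu) + 1e-8)
    pea_n = (pea - pea.min()) / (np.ptp(pea) + 1e-8)
    weight = 1.0 / (1.0 + np.exp(-(eu_n + pea_n)))  # sigmoid fusion

    return x * weight.reshape(1, H, W)
```

Because the module contains no learnable parameters, it can be dropped into any backbone stage without changing the parameter count, which matches the FLOPs/parameter tables below.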
2. Related Work
2.1. Network Architecture
2.2. Attention Mechanism Module
3. Method
3.1. Euclidean–Pearson Attention Mechanism Module
3.1.1. Euclidean Distance
3.1.2. Pearson Correlation
3.1.3. Information Integration
3.2. Overall Architecture
3.2.1. Feature Extraction Module
3.2.2. Oriented RPN
- Anchor boxes exhibiting an Intersection over Union (IoU) greater than 0.7 with any ground-truth box are categorized as positive samples.
- For each ground-truth box, the anchor box with the highest IoU overlap is also labeled as a positive sample, provided that its IoU exceeds 0.3.
- Anchor boxes with an IoU less than 0.3 are designated as negative samples.
- Anchor boxes failing to meet the aforementioned criteria are excluded from the training process.
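The four assignment rules above can be sketched as a small labeling routine (an illustrative NumPy sketch; the function name, thresholds as parameters, and label encoding are assumptions, not the paper's implementation):

```python
import numpy as np

def label_anchors(ious, pos_thr=0.7, low_pos_thr=0.3, neg_thr=0.3):
    """Sketch of the Oriented RPN anchor assignment rules.

    ious: (num_anchors, num_gt) IoU matrix between anchors and
    ground-truth boxes. Returns one label per anchor:
    1 = positive, 0 = negative, -1 = ignored during training.
    """
    num_anchors, _ = ious.shape
    labels = np.full(num_anchors, -1, dtype=int)

    max_iou = ious.max(axis=1)          # best IoU of each anchor
    labels[max_iou < neg_thr] = 0       # rule 3: negatives
    labels[max_iou > pos_thr] = 1       # rule 1: high-IoU positives

    # Rule 2: for each ground truth, its best-matching anchor is also
    # positive, provided the IoU exceeds the lower threshold.
    best_anchor = ious.argmax(axis=0)
    for gt, a in enumerate(best_anchor):
        if ious[a, gt] > low_pos_thr:
            labels[a] = 1
    # Rule 4: anchors still labeled -1 are excluded from training.
    return labels
```

Rule 2 guarantees that every sufficiently overlapped ground-truth box recruits at least one positive anchor even when no anchor clears the 0.7 threshold.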
3.2.3. Region-of-Interest Detection Module
3.3. Loss Function
4. Experiments
4.1. Datasets
4.2. Evaluation Metrics
4.3. Implementation Details
5. Results
5.1. Results on the DOTA Dataset
5.2. Results on the NWPU VHR-10 Dataset
5.3. Results on the DIOR Dataset
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast r-cnn. In Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef] [PubMed]
- Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed. 2018, 20, 3111–3122. [Google Scholar] [CrossRef]
- Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. Scrdet: Towards more robust detection for small, cluttered and rotated objects. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8232–8241. [Google Scholar]
- Zhang, G.; Lu, S.; Zhang, W. CAD-Net: A context-aware detection network for objects in remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2019, 57, 10015–10024. [Google Scholar] [CrossRef]
- Zheng, G.; Zhang, F.; Zheng, Z.; Xiang, Y.; Yuan, N.J.; Xie, X.; Li, Z. DRN: A deep reinforcement learning framework for news recommendation. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 167–176. [Google Scholar]
- Yang, X.; Yan, J.; Feng, Z.; He, T. R3det: Refined single-stage detector with feature refinement for rotating object. Proc. AAAI Conf. Artif. Intell. 2021, 35, 3163–3171. [Google Scholar] [CrossRef]
- Han, J.; Ding, J.; Xue, N.; Xia, G.S. Redet: A rotation-equivariant detector for aerial object detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 2786–2795. [Google Scholar]
- Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI transformer for oriented object detection in aerial images. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2849–2858. [Google Scholar]
- Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for object detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 3520–3529. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722. [Google Scholar]
- Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. Simam: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 11863–11874. [Google Scholar]
- Liu, Y.; Shao, Z.; Teng, Y.; Hoffmann, N. NAM: Normalization-based attention module. arXiv 2021, arXiv:2111.12419. [Google Scholar]
- Shugar, D.H.; Jacquemart, M.; Shean, D.; Bhushan, S.; Upadhyay, K.; Sattar, A.; Schwanghart, W.; McBride, S.; De Vries, M.V.W.; Mergili, M.; et al. Massive rock and ice avalanche caused the 2021 disaster at Chamoli, Indian Himalaya. Science 2021, 373, 300–306. [Google Scholar] [CrossRef] [PubMed]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Dai, J.; Li, Y.; He, K.; Sun, J. R-fcn: Object detection via region-based fully convolutional networks. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
- Liu, Z.; Hu, J.; Weng, L.; Yang, Y. Rotated region based CNN for ship detection. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 900–904. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Yang, Z.; Liu, S.; Hu, H.; Wang, L.; Lin, S. Reppoints: Point set representation for object detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9657–9666. [Google Scholar]
- Yang, X.; Yan, J. Arbitrary-oriented object detection with circular smooth label. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Part VIII 16. Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 677–694. [Google Scholar]
- Chun, M.M. Visual working memory as visual attention sustained internally over time. Neuropsychologia 2011, 49, 1407–1409. [Google Scholar] [CrossRef] [PubMed]
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929. [Google Scholar]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983. [Google Scholar]
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
- Muhuri, A.; Gascoin, S.; Menzel, L.; Kostadinov, T.S.; Harpold, A.A.; Sanmiguel-Vallelado, A.; López-Moreno, J.I. Performance Assessment of Optical Satellite-Based Operational Snow Cover Monitoring Algorithms in Forested Landscapes. IEEE JSTARS 2022, 14, 7159–7178. [Google Scholar] [CrossRef]
- Zhou, Y.; Yang, X.; Zhang, G.; Wang, J.; Liu, Y.; Hou, L.; Jiang, X.; Liu, X.; Yan, J.; Lyu, C.; et al. Mmrotate: A rotated object detection benchmark using pytorch. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 7331–7334. [Google Scholar]
- Yang, X.; Yan, J. On the arbitrary-oriented object detection: Classification based approaches revisited. Int. J. Comput. Vis. 2022, 130, 1340–1365. [Google Scholar] [CrossRef]
METHOD | PL | BD | BR | GTF | SV | LV | SH | TC | BC | ST | SBF | RA | HA | SP | HC | mAP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
R3Det [8] | 88.8 | 67.4 | 44.1 | 69.0 | 62.9 | 71.7 | 78.7 | 89.9 | 47.3 | 61.2 | 47.4 | 59.3 | 59.2 | 51.7 | 24.3 | 61.5 |
CSL [38] | 88.1 | 72.2 | 39.8 | 63.8 | 64.3 | 71.9 | 78.5 | 89.6 | 52.4 | 61.0 | 50.5 | 66.0 | 56.6 | 50.1 | 27.5 | 62.2 |
RoI Trans [10] | 89.9 | 76.5 | 48.1 | 73.1 | 68.7 | 78.2 | 88.7 | 90.8 | 73.6 | 62.7 | 62.0 | 63.4 | 73.7 | 57.2 | 47.9 | 70.3 |
Oriented R-CNN [11] | 85.9 | 75.5 | 51.8 | 78.0 | 69.1 | 84.0 | 89.3 | 90.7 | 73.9 | 62.6 | 62.8 | 66.4 | 75.4 | 56.9 | 44.0 | 71.3 |
Oriented R-CNN + SE [14] | 89.7 | 80.6 | 50.8 | 74.1 | 67.9 | 84.2 | 89.2 | 90.8 | 72.3 | 62.7 | 61.0 | 65.5 | 75.5 | 57.6 | 50.5 | 71.5 |
Oriented R-CNN + CBAM [15] | 89.8 | 75.4 | 50.9 | 70.8 | 68.7 | 84.4 | 89.1 | 90.6 | 73.8 | 62.7 | 59.3 | 66.9 | 75.1 | 59.0 | 49.6 | 71.6 |
Oriented R-CNN + SimAM [18] | 89.8 | 75.4 | 50.9 | 78.8 | 68.7 | 84.4 | 89.1 | 90.6 | 73.8 | 62.7 | 59.3 | 66.9 | 75.1 | 59.0 | 49.6 | 71.6 |
Oriented R-CNN + Eu | 89.8 | 76.5 | 51.5 | 75.2 | 67.7 | 84.5 | 89.2 | 90.7 | 74.0 | 62.5 | 61.6 | 65.0 | 75.5 | 57.6 | 53.1 | 71.6 |
Oriented R-CNN + Pea | 89.4 | 75.0 | 50.0 | 76.8 | 68.8 | 84.5 | 89.4 | 90.7 | 73.0 | 62.8 | 61.5 | 66.3 | 75.3 | 57.5 | 53.7 | 71.7 |
Oriented R-CNN + EuPea | 89.6 | 77.6 | 51.3 | 76.6 | 69.3 | 85.3 | 89.3 | 90.7 | 74.2 | 62.6 | 63.1 | 66.4 | 75.4 | 58.4 | 55.3 | 72.3 |
METHOD | FPS | FLOPs | Parameters | mAP |
---|---|---|---|---|
Baseline | 15.8 | 211.43G | 41.14M | 71.3 |
Baseline + SE | 15.2 | 211.46G | 41.84M | 71.5 |
Baseline + CBAM | 15.0 | 211.53G | 42.08M | 71.6 |
Baseline + SimAM | 15.4 | 211.43G | 41.14M | 71.6 |
Baseline + EuPea | 15.1 | 211.43G | 41.14M | 72.3 |
METHOD | AP | SH | ST | BD | TC | BC | GTF | HA | BR | VE | mAP |
---|---|---|---|---|---|---|---|---|---|---|---|
R3Det [8] | 97.8 | 87.5 | 80.8 | 88.6 | 71.2 | 80.1 | 85.2 | 72.3 | 84.2 | 82.1 | 82.9 |
CSL [38] | 97.8 | 88.2 | 80.5 | 88.7 | 71.3 | 81.0 | 86.1 | 71.9 | 86.1 | 84.6 | 83.6 |
RoI Trans [10] | 99.9 | 90.1 | 81.0 | 90.7 | 72.4 | 81.8 | 90.9 | 78.0 | 88.6 | 89.3 | 86.3 |
Oriented R-CNN [11] | 99.9 | 90.4 | 81.4 | 90.6 | 72.3 | 81.5 | 98.0 | 76.8 | 90.9 | 85.1 | 86.7 |
Oriented R-CNN + SE [14] | 99.9 | 90.8 | 81.0 | 90.7 | 72.7 | 81.3 | 98.9 | 78.4 | 89.1 | 89.0 | 87.2 |
Oriented R-CNN + CBAM [15] | 99.9 | 90.4 | 80.7 | 90.7 | 72.7 | 88.7 | 100 | 76.1 | 89.7 | 89.7 | 87.9 |
Oriented R-CNN + SimAM [18] | 99.9 | 90.4 | 80.9 | 90.7 | 72.7 | 89.3 | 99.8 | 77.9 | 89.8 | 89.6 | 88.1 |
Oriented R-CNN + Eu | 100 | 90.7 | 80.6 | 90.6 | 72.7 | 81.5 | 99.7 | 86.2 | 88.2 | 89.1 | 87.9 |
Oriented R-CNN + Pea | 99.9 | 90.7 | 80.9 | 90.7 | 72.7 | 87.2 | 100 | 86.3 | 87.8 | 88.5 | 88.5 |
Oriented R-CNN + EuPea | 99.9 | 90.2 | 89.4 | 90.7 | 72.7 | 89.2 | 100 | 85.4 | 84.4 | 89.0 | 89.1 |
METHOD | FPS | FLOPs | Parameters | mAP |
---|---|---|---|---|
Baseline | 21.5 | 211.43G | 41.13M | 86.7 |
Baseline + SE | 21.4 | 211.46G | 41.83M | 87.2 |
Baseline + CBAM | 21.3 | 211.53G | 42.10M | 87.9 |
Baseline + SimAM | 21.4 | 211.43G | 41.13M | 88.1 |
Baseline + EuPea | 21.4 | 211.43G | 41.14M | 89.1 |
METHOD | APL | APT | BF | BC | BR | CM | DA | ESA | EST | GF |
---|---|---|---|---|---|---|---|---|---|---|
R3Det [8] | 89.6 | 6.40 | 89.5 | 71.2 | 14.4 | 81.7 | 8.90 | 26.5 | 48.1 | 31.8 |
CSL [38] | 90.9 | 2.60 | 89.4 | 71.5 | 7.10 | 81.8 | 9.30 | 31.4 | 41.5 | 58.1 |
RoI Trans [10] | 90.8 | 12.1 | 90.8 | 79.8 | 22.9 | 81.8 | 8.20 | 51.3 | 54.1 | 60.7 |
Baseline [11] | 90.9 | 18.3 | 90.8 | 80.7 | 35.0 | 81.8 | 15.4 | 59.8 | 52.9 | 55.3 |
Baseline + SE [14] | 90.9 | 19.5 | 90.8 | 80.4 | 33.8 | 81.8 | 15.3 | 59.7 | 52.6 | 57.9 |
Baseline + SimAM [18] | 90.9 | 20.0 | 90.7 | 80.4 | 34.2 | 81.8 | 17.4 | 60.0 | 53.7 | 55.4 |
Baseline + Eu | 90.9 | 20.2 | 90.8 | 80.1 | 35.4 | 81.8 | 17.6 | 59.7 | 53.7 | 58.9 |
Baseline + Pea | 90.9 | 18.6 | 90.7 | 80.3 | 34.3 | 81.8 | 19.3 | 58.6 | 53.5 | 57.0 |
Baseline + EuPea | 90.9 | 21.6 | 90.8 | 80.9 | 37.0 | 81.8 | 20.5 | 62.0 | 53.7 | 59.7 |
METHOD | GTF | HA | OPS | SP | STD | ST | TC | TS | VEH | WD | mAP |
---|---|---|---|---|---|---|---|---|---|---|---|
R3Det [8] | 65.6 | 8.20 | 33.4 | 69.2 | 51.9 | 72.9 | 81.1 | 21.4 | 54.2 | 44.7 | 48.5 |
CSL [38] | 63.2 | 17.5 | 26.6 | 69.3 | 53.8 | 72.8 | 81.6 | 18.4 | 47.2 | 46.3 | 49.0 |
RoI Trans [10] | 75.8 | 30.6 | 40.5 | 89.9 | 88.2 | 79.5 | 81.8 | 20.6 | 67.9 | 55.1 | 59.1 |
Baseline [11] | 76.3 | 21.8 | 53.2 | 89.8 | 85.6 | 79.3 | 81.8 | 36.3 | 68.8 | 56.2 | 61.5 |
Baseline + SE [14] | 76.4 | 24.6 | 56.1 | 89.7 | 89.0 | 79.7 | 81.8 | 29.6 | 68.4 | 56.0 | 61.7 |
Baseline + SimAM [18] | 76.5 | 23.5 | 54.4 | 89.8 | 88.5 | 80.0 | 81.8 | 40.4 | 68.7 | 55.8 | 62.2 |
Baseline + Eu | 76.2 | 28.3 | 56.1 | 89.9 | 86.3 | 79.8 | 81.8 | 29.7 | 69.0 | 55.6 | 61.9 |
Baseline + Pea | 76.7 | 25.8 | 55.1 | 89.7 | 88.7 | 79.8 | 81.8 | 41.3 | 68.7 | 55.8 | 62.4 |
Baseline + EuPea | 78.9 | 31.6 | 56.6 | 90.0 | 88.3 | 79.7 | 85.4 | 42.6 | 59.2 | 56.5 | 63.3 |
METHOD | FPS | FLOPs | Parameters | mAP |
---|---|---|---|---|
Baseline | 20.9 | 211.44G | 41.14M | 61.5 |
Baseline + SE | 19.4 | 211.46G | 41.79M | 61.7 |
Baseline + SimAM | 20.4 | 211.43G | 41.13M | 62.2 |
Baseline + EuPea | 20.3 | 211.43G | 41.14M | 63.3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Guan, X.; Dong, Y.; Tan, W.; Su, Y.; Huang, P. A Parameter-Free Pixel Correlation-Based Attention Module for Remote Sensing Object Detection. Remote Sens. 2024, 16, 312. https://doi.org/10.3390/rs16020312