OrientedDiffDet: Diffusion Model for Oriented Object Detection in Aerial Images
Abstract
1. Introduction
2. Related Work
2.1. Object Detection
2.2. Diffusion Models
3. Approach
3.1. Preliminaries
3.2. Proposed Method
3.2.1. Oriented Box Representation
3.2.2. Feature Alignment
3.2.3. Accelerated Diffusion
4. Experiments
4.1. Datasets
4.2. Implementation Details
4.3. Comparisons with the State-of-the-Art Methods
4.4. Ablation Studies
4.5. Discussions
- Signal scaling.
- Box renewal threshold.
- Comparison using accelerated diffusion modules.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Cheng, G.; Zhou, P.; Han, J. RIFD-CNN: Rotation-Invariant and Fisher Discriminative Convolutional Neural Networks for Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 2884–2893.
2. Fu, K.; Chen, Z.; Zhang, Y.; Sun, X. Enhanced Feature Representation in Detection for Optical Remote Sensing Images. Remote Sens. 2019, 11, 2095.
3. Wang, G.; Wang, X.; Fan, B.; Pan, C. Feature Extraction by Rotation-Invariant Matrix Representation for Object Detection in Aerial Image. IEEE Geosci. Remote Sens. Lett. 2017, 14, 851–855.
4. Girshick, R.B.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
5. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into High Quality Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
6. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
7. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
8. Han, J.; Ding, J.; Li, J.; Xia, G.S. Align Deep Features for Oriented Object Detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–11.
9. Qiu, H.; Li, H.; Wu, Q.; Meng, F.; Ngan, K.N.; Shi, H. A2RMNet: Adaptively Aspect Ratio Multi-Scale Network for Object Detection in Remote Sensing Images. Remote Sens. 2019, 11, 1594.
10. Li, Y.; Huang, Q.; Pei, X.; Jiao, L.; Shang, R. RADet: Refine Feature Pyramid Network and Multi-Layer Attention Network for Arbitrary-Oriented Object Detection of Remote Sensing Images. Remote Sens. 2020, 12, 389.
11. Liu, Z.; Wang, H.; Weng, L.; Yang, Y. Ship Rotated Bounding Box Space for Ship Extraction From High-Resolution Optical Satellite Images With Complex Backgrounds. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1074–1078.
12. Cheng, G.; Zhou, P.; Han, J. Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415.
13. Liao, W.; Chen, X.; Yang, J.; Roth, S.; Goesele, M.; Yang, M.Y.; Rosenhahn, B. LR-CNN: Local-aware Region CNN for Vehicle Detection in Aerial Imagery. arXiv 2020, arXiv:2005.14264.
14. He, X.; Ma, S.; He, L.; Ru, L.; Wang, C. Multi-Sector Oriented Object Detector for Accurate Localization in Optical Remote Sensing Images. Remote Sens. 2021, 13, 1921.
15. Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. IEEE Trans. Multim. 2018, 20, 3111–3122.
16. Azimi, S.M.; Vig, E.; Bahmanyar, R.; Körner, M.; Reinartz, P. Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery. In Asian Conference on Computer Vision; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2019; pp. 150–165.
17. Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.S.; Bai, X. Gliding Vertex on the Horizontal Bounding Box for Multi-Oriented Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1452–1459.
18. Yang, X.; Yan, J.; Feng, Z.; He, T. R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, 2–9 February 2021; pp. 3163–3171.
19. Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8231–8240.
20. Qian, W.; Yang, X.; Peng, S.; Yan, J.; Guo, Y. Learning Modulated Loss for Rotated Object Detection. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, 2–9 February 2021; pp. 2458–2466.
21. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, Virtual, 6–12 December 2020; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; 2020.
22. Song, Y.; Ermon, S. Generative Modeling by Estimating Gradients of the Data Distribution. In Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada, 8–14 December 2019; Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R., Eds.; 2019; pp. 11895–11907.
23. Kim, B.; Oh, Y.; Ye, J.C. Diffusion Adversarial Representation Learning for Self-supervised Vessel Segmentation. arXiv 2022, arXiv:2209.14566.
24. Wolleb, J.; Sandkühler, R.; Bieder, F.; Valmaggia, P.; Cattin, P.C. Diffusion Models for Implicit Image Segmentation Ensembles. In Proceedings of the International Conference on Medical Imaging with Deep Learning, MIDL 2022, Zurich, Switzerland, 6–8 July 2022; Proceedings of Machine Learning Research. Konukoglu, E., Menze, B.H., Venkataraman, A., Baumgartner, C.F., Dou, Q., Albarqouni, S., Eds.; 2022; Volume 172, pp. 1336–1348.
25. Chen, N.; Zhang, Y.; Zen, H.; Weiss, R.J.; Norouzi, M.; Chan, W. WaveGrad: Estimating Gradients for Waveform Generation. In Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, 3–7 May 2021.
26. Ho, J.; Salimans, T.; Gritsenko, A.A.; Chan, W.; Norouzi, M.; Fleet, D.J. Video Diffusion Models. arXiv 2022, arXiv:2204.03458.
27. Chen, N.; Zhang, Y.; Zen, H.; Weiss, R.J.; Norouzi, M.; Dehak, N.; Chan, W. WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis. In Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August–3 September 2021; Hermansky, H., Cernocký, H., Burget, L., Lamel, L., Scharenborg, O., Motlícek, P., Eds.; 2021; pp. 3765–3769.
28. Amit, T.; Nachmani, E.; Shaharabany, T.; Wolf, L. SegDiff: Image Segmentation with Diffusion Probabilistic Models. arXiv 2021, arXiv:2112.00390.
29. Baranchuk, D.; Rubachev, I.; Voynov, A.; Khrulkov, V.; Babenko, A. Label-Efficient Semantic Segmentation with Diffusion Models. arXiv 2021, arXiv:2112.03126.
30. Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv 2022, arXiv:2204.06125.
31. Salimans, T.; Ho, J. Progressive Distillation for Fast Sampling of Diffusion Models. In Proceedings of the Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, 25–29 April 2022.
32. Lam, M.W.Y.; Wang, J.; Huang, R.; Su, D.; Yu, D. Bilateral Denoising Diffusion Models. arXiv 2021, arXiv:2108.11514.
33. Zhang, Q.; Tao, M.; Chen, Y. gDDIM: Generalized denoising diffusion implicit models. arXiv 2022, arXiv:2206.05564.
34. Sohl-Dickstein, J.; Weiss, E.A.; Maheswaranathan, N.; Ganguli, S. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6–11 July 2015; JMLR Workshop and Conference, Proceedings. Bach, F.R., Blei, D.M., Eds.; 2015; Volume 37, pp. 2256–2265.
35. Xia, G.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.J.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3974–3983.
36. Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
37. Ding, J.; Xue, N.; Long, Y.; Xia, G.; Lu, Q. Learning RoI Transformer for Oriented Object Detection in Aerial Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019; pp. 2849–2858.
38. Prete, R.D.; Graziano, M.D.; Renga, A. RetinaNet: A deep learning architecture to achieve a robust wake detector in SAR images. In Proceedings of the 6th IEEE International Forum on Research and Technology for Society and Industry, RTSI 2021, Naples, Italy, 6–9 September 2021; pp. 171–176.
39. Chen, S.; Sun, P.; Song, Y.; Luo, P. DiffusionDet: Diffusion Model for Object Detection. arXiv 2022, arXiv:2211.09788.
40. Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for Object Detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, 10–17 October 2021; pp. 3500–3509.
41. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. arXiv 2019, arXiv:1904.01355.
42. Yang, Z.; Liu, S.; Hu, H.; Wang, L.; Lin, S. RepPoints: Point Set Representation for Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9656–9665.
43. Song, J.; Meng, C.; Ermon, S. Denoising Diffusion Implicit Models. In Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, 3–7 May 2021. OpenReview.net.
Category | RoI_Trans [37] | R3Det [18] | RRetinaNet [38] | DiffusionDet [39] | S2ANet [8] | Ours |
---|---|---|---|---|---|---|
PL | 88.64 | 88.76 | 88.67 | 88.20 | 89.11 | 88.84 |
BD | 78.52 | 83.09 | 77.62 | 77.03 | 82.84 | 82.69 |
BR | 43.44 | 50.91 | 41.81 | 41.10 | 48.37 | 53.94 |
GTF | 75.92 | 67.27 | 58.17 | 52.09 | 71.11 | 74.98 |
SV | 68.81 | 76.23 | 74.58 | 71.25 | 78.11 | 78.95 |
LV | 73.68 | 80.39 | 71.64 | 70.08 | 78.39 | 84.89 |
SH | 83.59 | 86.72 | 79.11 | 76.55 | 87.25 | 88.86 |
TC | 90.74 | 90.78 | 90.29 | 88.63 | 90.83 | 90.93 |
BC | 77.27 | 84.68 | 82.18 | 80.16 | 84.90 | 87.84 |
ST | 81.46 | 83.24 | 74.32 | 74.43 | 85.64 | 85.75 |
SBF | 58.39 | 61.98 | 54.75 | 53.96 | 60.36 | 61.67 |
RA | 53.54 | 61.35 | 60.60 | 57.37 | 62.60 | 60.45 |
HA | 62.83 | 66.91 | 62.57 | 60.23 | 65.26 | 75.79 |
SP | 58.93 | 70.63 | 69.67 | 61.47 | 69.13 | 68.65 |
HC | 47.67 | 53.94 | 60.64 | 54.92 | 57.94 | 64.67 |
mAP | 69.56 | 73.79 | 68.43 | 67.16 | 74.12 | 76.59 |
FPS | 11.3 | 14.7 | 16.1 | 10.8 | 15.1 | 11.3 |
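As a quick consistency check on the table above, the mAP row can be recomputed as the unweighted mean of the 15 per-category AP values. The sketch below does this for the "Ours" and "S2ANet [8]" columns only; the dictionary names and the `mean_ap` helper are illustrative and not taken from the paper, and the paper's actual evaluation protocol is not reproduced here.

```python
# Per-category AP values copied from the comparison table above.
# AP_OURS, AP_S2ANET and mean_ap are hypothetical names used only for this check.
AP_OURS = {
    "PL": 88.84, "BD": 82.69, "BR": 53.94, "GTF": 74.98, "SV": 78.95,
    "LV": 84.89, "SH": 88.86, "TC": 90.93, "BC": 87.84, "ST": 85.75,
    "SBF": 61.67, "RA": 60.45, "HA": 75.79, "SP": 68.65, "HC": 64.67,
}
AP_S2ANET = {
    "PL": 89.11, "BD": 82.84, "BR": 48.37, "GTF": 71.11, "SV": 78.11,
    "LV": 78.39, "SH": 87.25, "TC": 90.83, "BC": 84.90, "ST": 85.64,
    "SBF": 60.36, "RA": 62.60, "HA": 65.26, "SP": 69.13, "HC": 57.94,
}


def mean_ap(ap_by_category: dict[str, float]) -> float:
    """Return the unweighted mean of the per-category AP values."""
    return sum(ap_by_category.values()) / len(ap_by_category)


if __name__ == "__main__":
    # Rounded to two decimals to match the precision used in the table.
    print("Ours  :", round(mean_ap(AP_OURS), 2))
    print("S2ANet:", round(mean_ap(AP_S2ANET), 2))
```

Both recomputed means agree with the mAP row of the table to two decimal places, which suggests that the reported mAP is a plain average over the 15 categories.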
Method | AP | AP_50 | AP_75 |
---|---|---|---|
DiffusionDet [39] | 0.506 | 0.761 | 0.606 |
RoI_Trans [37] | 0.614 | 0.892 | 0.758 |
ORCNN [40] | 0.547 | 0.896 | 0.637 |
FCOS [41] | 0.529 | 0.883 | 0.620 |
RepPoints [42] | 0.518 | 0.837 | 0.591 |
Ours | 0.724 | 0.903 | 0.895 |
Oriented Box Representation | Feature Alignment | Accelerated Diffusion | mAP | FPS |
---|---|---|---|---|
 | | | 73.20 | 10.8 |
✓ * | 75.93 | 11.7 | ||
✓ | 74.21 | 11.9 | ||
✓ | ✓ | ✓ | 78.96 | 11.3 |
Scale | 0.1 | 1.0 | 2.0 | 3.0 |
---|---|---|---|---|
mAP | 77.88 | 78.96 | 78.21 | 78.21 |
Score Thresh. | 0.0 | 0.3 | 0.5 | 0.7 |
---|---|---|---|---|
mAP | 77.04 | 78.96 | 77.37 | 77.3 |
Metric | DDPM | DDIM | Accelerated Diffusion (Ours) |
---|---|---|---|
FPS | 10.8 | 11.20 | 11.30 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, L.; Jia, J.; Dai, H. OrientedDiffDet: Diffusion Model for Oriented Object Detection in Aerial Images. Appl. Sci. 2024, 14, 2000. https://doi.org/10.3390/app14052000