Domain Feature Decomposition for Efficient Object Detection in Aerial Images
Abstract
1. Introduction
- We propose a fine-grained feature disentanglement (FGFD) network that uses airborne metadata (flying altitude, camera view, and weather condition) as supervision to decompose features into domain-invariant and domain-specific components. Using both components improves cross-domain generalization as well as single-domain detection accuracy.
- We propose a fine-grained domain mix (FDM) augmentation algorithm. It alleviates the imbalance of the fine-grained domain distribution and resolves the incompatibility between Mosaic augmentation and airborne-metadata label learning.
- The proposed method achieves state-of-the-art performance on both the VisDrone and UAVDT datasets, and it processes 720P images at 20 Hz on an Nvidia Jetson Xavier NX airborne computer (Nvidia Corporation, Santa Clara, CA, USA).
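The contributions above do not spell out the training objective. The sketch below is a minimal, hypothetical version of a metadata-supervised disentanglement loss, assuming one metadata classifier per factor on each feature branch: a cross-entropy term asks the domain-specific branch to predict the altitude, view, and weather labels, and a confusion term (cross-entropy against a uniform target) pushes the domain-invariant branch toward chance-level metadata prediction. The per-factor weights mirror the A/V/W sweeps in the ablation tables, which suggests, but does not confirm, one weight per factor. All function names, the confusion term, and the weighting scheme are assumptions, not the authors' formulation.

```python
import math
from typing import Dict, List

# Metadata factors and their class counts, following the groupings used in the
# ablation tables: flying altitude (low/med/high), camera view (front/side/bird),
# and weather condition (day/night).
FACTORS: Dict[str, int] = {"altitude": 3, "view": 3, "weather": 2}


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def specific_loss(logits: List[float], label: int) -> float:
    """Cross-entropy: the domain-specific branch should recover the metadata label."""
    return -log_softmax(logits)[label]


def invariant_loss(logits: List[float]) -> float:
    """Cross-entropy against a uniform target. It reaches its minimum, log(K),
    when the invariant branch's prediction carries no information about the factor."""
    lp = log_softmax(logits)
    return -sum(lp) / len(lp)


def disentanglement_loss(
    det_loss: float,
    specific_logits: Dict[str, List[float]],
    invariant_logits: Dict[str, List[float]],
    labels: Dict[str, int],
    weights: Dict[str, float],
) -> float:
    """Hypothetical total objective: detection loss plus, for each metadata
    factor, a weighted sum of the specific and invariant terms."""
    total = det_loss
    for factor in FACTORS:
        w = weights.get(factor, 0.0)
        total += w * (
            specific_loss(specific_logits[factor], labels[factor])
            + invariant_loss(invariant_logits[factor])
        )
    return total


if __name__ == "__main__":
    spec = {"altitude": [2.0, 0.1, -1.0], "view": [0.0, 0.0, 3.0], "weather": [1.5, -1.5]}
    inv = {"altitude": [0.0, 0.0, 0.0], "view": [0.1, 0.0, -0.1], "weather": [0.0, 0.0]}
    labels = {"altitude": 0, "view": 2, "weather": 0}
    weights = {"altitude": 0.05, "view": 0.05, "weather": 0.05}
    print(disentanglement_loss(1.0, spec, inv, labels, weights))
```

In a real network the logits would come from small heads on the two feature branches. The invariant term shown here is only the feature-extractor side of an adversarial setup; the metadata head on the invariant branch would be trained separately to classify, or the two would be coupled with a gradient-reversal layer.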
2. Related Work
2.1. Domain Shift of Remote Sensing
2.2. Fine-Grained Domain Object Detection
2.3. Disentangled Representation Learning
3. Approach
3.1. Problem Definition
3.2. Framework
3.3. Fine-Grained Domain Mix Augmentation
Algorithm 1: FDM augmentation
Output: A batch of fine-grained domain samples;
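The section lists only the title and output of Algorithm 1. One plausible reading of the second contribution (balance the fine-grained domain distribution, and make Mosaic compatible with metadata labels) is a Mosaic variant that first draws a fine-grained domain uniformly and then tiles four images from that same domain, so that the composite keeps a single, valid altitude/view/weather label. The sketch below implements that reading under stated assumptions: images are represented only by their IDs and boxes, each is pre-resized to a square tile, and the names `fdm_batch`, `domain_mosaic`, and the `domain` tuple are hypothetical. It is not the published algorithm.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Domain = Tuple[str, str, str]            # hypothetical (altitude, view, weather) label
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in tile pixel coordinates


def group_by_domain(samples: List[dict]) -> Dict[Domain, List[dict]]:
    """Bucket samples by their fine-grained domain label."""
    groups: Dict[Domain, List[dict]] = defaultdict(list)
    for s in samples:
        groups[s["domain"]].append(s)
    return dict(groups)


def domain_mosaic(group: List[dict], tile: int, rng: random.Random) -> dict:
    """Tile four samples from ONE domain into a 2x2 mosaic.

    Every source image is assumed to be pre-resized to tile x tile, so each box
    only needs to be shifted into its quadrant. Because all four tiles share a
    domain, the mosaic keeps a single, valid metadata label.
    """
    picks = [rng.choice(group) for _ in range(4)]  # with replacement: small domains still work
    offsets = [(0, 0), (tile, 0), (0, tile), (tile, tile)]
    boxes: List[Box] = []
    for s, (dx, dy) in zip(picks, offsets):
        for x1, y1, x2, y2 in s["boxes"]:
            boxes.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return {"ids": [s["id"] for s in picks], "boxes": boxes, "domain": picks[0]["domain"]}


def fdm_batch(samples: List[dict], batch_size: int, tile: int = 640, seed: int = 0) -> List[dict]:
    """Sample domains uniformly (not images), then build one mosaic per draw."""
    rng = random.Random(seed)
    groups = group_by_domain(samples)
    domains = sorted(groups)
    batch = []
    for _ in range(batch_size):
        domain = rng.choice(domains)  # rare domains are drawn as often as common ones
        batch.append(domain_mosaic(groups[domain], tile, rng))
    return batch
```

Drawing the domain before the image is what counters the imbalance: under uniform image sampling, a domain that holds 1% of the training images would supply roughly 1% of the mosaics, whereas here every domain is drawn equally often. Pixel compositing and box clipping are omitted.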
3.4. Fine-Grained Feature Disentanglement
4. Experiments
4.1. Datasets
4.2. Implementation Details
4.3. Evaluation Metrics
4.4. Ablation Study
4.4.1. Influence of Imaging Conditions on Fine-Grained Feature Disentanglement
4.4.2. Effectiveness of the Fine-Grained Domain Mix
4.4.3. Effectiveness of Invariant and Specific Features
4.4.4. Effectiveness of Loss Functions
4.5. Comparisons with the State of the Art
4.5.1. UAVDT
4.5.2. VisDrone
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Jin, R.; Owais, H.M.; Lin, D.; Song, T.; Yuan, Y. Ellipse proposal and convolutional neural network discriminant for autonomous landing marker detection. J. Field Robot. 2019, 36, 6–16.
2. Shao, W.; Kawakami, R.; Yoshihashi, R.; You, S.; Kawase, H.; Naemura, T. Cattle detection and counting in UAV images based on convolutional neural networks. Int. J. Remote Sens. 2020, 41, 31–52.
3. Tijtgat, N.; Van Ranst, W.; Goedeme, T.; Volckaert, B.; De Turck, F. Embedded real-time object detection for a UAV warning system. In Proceedings of the ICCVW, Venice, Italy, 22–29 October 2017; pp. 2110–2118.
4. Zhou, Y.; Rui, T.; Li, Y.; Zuo, X. A UAV patrol system using panoramic stitching and object detection. Comput. Electr. Eng. 2019, 80, 106473.
5. Song, S.; Yu, H.; Miao, Z.; Zhang, Q.; Lin, Y.; Wang, S. Domain adaptation for convolutional neural networks-based remote sensing scene classification. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1324–1328.
6. Lu, X.; Gong, T.; Zheng, X. Multisource compensation network for remote sensing cross-domain scene classification. IEEE Trans. Geosci. Remote Sens. 2019, 58, 2504–2515.
7. Deng, X.; Yang, H.L.; Makkar, N.; Lunga, D. Large scale unsupervised domain adaptation of segmentation networks with adversarial learning. In Proceedings of the IGARSS, Yokohama, Japan, 28 July–2 August 2019; pp. 4955–4958.
8. Koga, Y.; Miyazaki, H.; Shibasaki, R. A method for vehicle detection in high-resolution satellite images that uses a region-based object detector and unsupervised domain adaptation. Remote Sens. 2020, 12, 575.
9. Tasar, O.; Giros, A.; Tarabalka, Y.; Alliez, P.; Clerc, S. DAugNet: Unsupervised, multisource, multitarget, and life-long domain adaptation for semantic segmentation of satellite images. IEEE Trans. Geosci. Remote Sens. 2020, 59, 1067–1081.
10. Tasar, O.; Tarabalka, Y.; Giros, A.; Alliez, P.; Clerc, S. StandardGAN: Multi-source domain adaptation for semantic segmentation of very high resolution satellite images by data standardization. In Proceedings of the CVPRW, Virtual, 14–19 June 2020; pp. 192–193.
11. Wu, Z.; Suresh, K.; Narayanan, P.; Xu, H.; Kwon, H.; Wang, Z. Delving into robust object detection from unmanned aerial vehicles: A deep nuisance disentanglement approach. In Proceedings of the ICCV, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1201–1210.
12. Lee, C.; Seo, J.; Jung, H. Training Domain-invariant Object Detector Faster with Feature Replay and Slow Learner. In Proceedings of the CVPR, Virtual, 19–25 June 2021; pp. 1172–1181.
13. Kiefer, B.; Messmer, M.; Zell, A. Diminishing Domain Bias by Leveraging Domain Labels in Object Detection on UAVs. In Proceedings of the ICAR, Ljubljana, Slovenia, 6–10 December 2021; pp. 523–530.
14. Jia, D.; Yuan, Y.; He, H.; Wu, X.; Yu, H.; Lin, W.; Sun, L.; Zhang, C.; Hu, H. DETRs with Hybrid Matching. arXiv 2023, arXiv:2207.13080.
15. Pu, Y.; Wang, Y.; Xia, Z.; Han, Y.; Wang, Y.; Gan, W.; Wang, Z.; Song, S.; Huang, G. Adaptive Rotated Convolution for Rotated Object Detection. arXiv 2023, arXiv:2303.07820.
16. Pu, Y.; Liang, W.; Hao, Y.; Yuan, Y.; Yang, Y.; Zhang, C.; Hu, H.; Huang, G. Rank-DETR for High Quality Object Detection. arXiv 2023, arXiv:2310.08854.
17. Shen, Y.; Geng, Z.; Yuan, Y.; Lin, Y.; Liu, Z.; Wang, C.; Hu, H.; Zheng, N.; Guo, B. V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection. arXiv 2023, arXiv:2308.04409.
18. Yang, L.; Zheng, Z.; Wang, J.; Song, S.; Huang, G.; Li, F. AdaDet: An Adaptive Object Detection System Based on Early-Exit Neural Networks. IEEE Trans. Cogn. Dev. Syst. 2024, 16, 332–345.
19. Jocher, G. YOLOv5 in PyTorch. 2022. Available online: https://github.com/ultralytics/yolov5 (accessed on 20 April 2022).
20. Weir, N.; Lindenbaum, D.; Bastidas, A.; Etten, A.V.; McPherson, S.; Shermeyer, J.; Kumar, V.; Tang, H. SpaceNet MVOI: A multi-view overhead imagery dataset. In Proceedings of the ICCV, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 992–1001.
21. Lee, H.Y.; Tseng, H.Y.; Huang, J.B.; Singh, M.; Yang, M.H. Diverse image-to-image translation via disentangled representations. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; pp. 35–51.
22. Ridgeway, K.; Mozer, M.C. Learning deep disentangled embeddings with the F-statistic loss. In Proceedings of the NeurIPS, Montreal, QC, Canada, 3–8 December 2018; pp. 1–10.
23. Peng, X.; Huang, Z.; Sun, X.; Saenko, K. Domain agnostic learning with disentangled representations. In Proceedings of the ICML, Long Beach, CA, USA, 9–15 June 2019; pp. 5102–5112.
24. Wu, A.; Liu, R.; Han, Y.; Zhu, L.; Yang, Y. Vector-Decomposed Disentanglement for Domain-Invariant Object Detection. In Proceedings of the ICCV, Virtual, 11–17 October 2021; pp. 9342–9351.
25. Sugiyama, M.; Kawanabe, M. Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation; MIT Press: Cambridge, MA, USA, 2012.
26. Chen, Y.; Li, W.; Sakaridis, C.; Dai, D.; Van Gool, L. Domain adaptive Faster R-CNN for object detection in the wild. In Proceedings of the CVPR, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3339–3348.
27. Saito, K.; Ushiku, Y.; Harada, T.; Saenko, K. Strong-weak distribution alignment for adaptive object detection. In Proceedings of the CVPR, Long Beach, CA, USA, 15–20 June 2019; pp. 6956–6965.
28. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
29. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
30. Ramamonjison, R.; Banitalebi-Dehkordi, A.; Kang, X.; Bai, X.; Zhang, Y. SimROD: A simple adaptation method for robust object detection. In Proceedings of the ICCV, Virtual, 11–17 October 2021; pp. 3570–3579.
31. Smith, L.N.; Topin, N. Super-convergence: Very fast training of neural networks using large learning rates. In Proceedings of the Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, International Society for Optics and Photonics, Baltimore, MD, USA, 14–18 April 2019; Volume 11006, pp. 1–612.
32. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the ECCV, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 740–755.
33. Du, D.; Qi, Y.; Yu, H.; Yang, Y.; Duan, K.; Li, G.; Zhang, W.; Huang, Q.; Tian, Q. The unmanned aerial vehicle benchmark: Object detection and tracking. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; pp. 370–386.
34. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the NeurIPS, Montreal, QC, Canada, 7–12 December 2015; pp. 1–9.
35. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the CVPR, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
36. Dong, C.; Pang, C.; Li, Z.; Zeng, X.; Hu, X. PG-YOLO: A Novel Lightweight Object Detection Method for Edge Devices in Industrial Internet of Things. IEEE Access 2022, 10, 123736–123745.
37. Froehlich, S.; Klemmer, L.; Große, D.; Drechsler, R. ASNet: Introducing Approximate Hardware to High-Level Synthesis of Neural Networks. In Proceedings of the 2020 IEEE 50th International Symposium on Multiple-Valued Logic (ISMVL), Miyazaki, Japan, 9–11 November 2020; pp. 64–69.
38. Zhu, P.; Wen, L.; Du, D.; Bian, X.; Ling, H.; Hu, Q.; Nie, Q.; Cheng, H.; Liu, C.; Liu, X.; et al. VisDrone-DET2018: The vision meets drone object detection in image challenge results. In Proceedings of the ECCVW, Munich, Germany, 8–14 September 2018; pp. 1–30.

A (flying altitude) | Low | Med | High | Overall |
---|---|---|---|---|
0.0 | 75.5 | 60.4 | 23.2 | 53.5 |
0.02 | 75.4 | 60.2 | 28.4 | 55.9 |
0.05 | 75.4 | 59.6 | 31.5 | 56.5 |
0.1 | 76.7 | 61.2 | 25.6 | 55.6 |
0.2 | 76.8 | 61.6 | 23.8 | 55.4 |

V (camera view) | Front | Side | Bird | Overall |
---|---|---|---|---|
0.0 | 59.8 | 69.1 | 34.6 | 53.5 |
0.02 | 61.0 | 69.3 | 38.1 | 55.8 |
0.05 | 60.8 | 69.6 | 39.9 | 56.3 |
0.1 | 60.0 | 70.2 | 36.0 | 55.6 |
0.2 | 60.3 | 69.3 | 33.3 | 54.4 |

W (weather condition) | Day | Night | Overall |
---|---|---|---|
0.0 | 63.8 | 72.1 | 53.5 |
0.02 | 64.9 | 72.8 | 54.8 |
0.05 | 64.7 | 72.6 | 54.6 |
0.1 | 64.4 | 73.7 | 55.2 |
0.2 | 66.0 | 65.4 | 56.9 |
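The first column of these three tables is not defined in this section. Treating it as a per-factor hyperparameter (plausibly a loss weight, which is an assumption), the snippet below picks the best value of each sweep from the Overall column and reports its gain over the 0.0 row, which in every sweep equals the baseline Overall score of 53.5.

```python
# "Overall" scores copied from the three sweep tables above, keyed by the value in
# each table's first column (0.0 is the shared starting point of every sweep).
SWEEPS = {
    "A": {0.0: 53.5, 0.02: 55.9, 0.05: 56.5, 0.1: 55.6, 0.2: 55.4},
    "V": {0.0: 53.5, 0.02: 55.8, 0.05: 56.3, 0.1: 55.6, 0.2: 54.4},
    "W": {0.0: 53.5, 0.02: 54.8, 0.05: 54.6, 0.1: 55.2, 0.2: 56.9},
}


def best_setting(sweep: dict) -> tuple:
    """Return (best value, its Overall score, gain over the 0.0 setting)."""
    value = max(sweep, key=sweep.get)
    return value, sweep[value], round(sweep[value] - sweep[0.0], 1)


for factor, sweep in SWEEPS.items():
    print(factor, best_setting(sweep))
# A (0.05, 56.5, 3.0)
# V (0.05, 56.3, 2.8)
# W (0.2, 56.9, 3.4)
```

Altitude and view both peak at 0.05, while weather peaks at 0.2, the largest value tested, so the sweep does not show whether a larger weather value would help further.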

Condition | Baseline | A | V | W | A+V+W |
---|---|---|---|---|---|
Flying Altitude | | | | | |
Low | 75.5 | 77.3 | 75.2 | 75.4 | 77.6 |
Med | 59.6 | 61.6 | 59.5 | 59.3 | 61.9 |
High | 23.2 | 26.3 | 23.4 | 23.1 | 28.1 |
Camera View | | | | | |
Front | 59.8 | 59.7 | 60.8 | 59.6 | 61.0 |
Side | 69.1 | 68.9 | 69.6 | 69.0 | 70.3 |
Bird | 34.6 | 34.9 | 39.9 | 34.5 | 38.8 |
Weather Condition | | | | | |
Day | 62.8 | 62.5 | 62.7 | 66.0 | 66.9 |
Night | 70.1 | 70.5 | 70.3 | 73.7 | 73.9 |
Overall | 53.5 | 56.1↑2.6 | 55.2↑1.7 | 56.3↑2.8 | 57.7↑4.2 |
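Two patterns can be read from the table above: each single-factor column (A, V, W) mainly improves the conditions of its own factor, and the combined A+V+W setting gains most on the hardest conditions (high altitude, from 23.2 to 28.1; bird view, from 34.6 to 38.8). The snippet recomputes both observations from the table values; `gain` and `best_group` are illustrative helpers.

```python
# Per-condition scores from the table above, in column order.
COLUMNS = ("Baseline", "A", "V", "W", "A+V+W")
ROWS = {
    "Low": (75.5, 77.3, 75.2, 75.4, 77.6),
    "Med": (59.6, 61.6, 59.5, 59.3, 61.9),
    "High": (23.2, 26.3, 23.4, 23.1, 28.1),
    "Front": (59.8, 59.7, 60.8, 59.6, 61.0),
    "Side": (69.1, 68.9, 69.6, 69.0, 70.3),
    "Bird": (34.6, 34.9, 39.9, 34.5, 38.8),
    "Day": (62.8, 62.5, 62.7, 66.0, 66.9),
    "Night": (70.1, 70.5, 70.3, 73.7, 73.9),
}
GROUPS = {
    "altitude": ("Low", "Med", "High"),
    "view": ("Front", "Side", "Bird"),
    "weather": ("Day", "Night"),
}


def gain(row: str, column: str) -> float:
    """Score change of `column` relative to the baseline for one condition."""
    values = ROWS[row]
    return values[COLUMNS.index(column)] - values[0]


def best_group(column: str) -> str:
    """Condition group with the largest mean gain for a supervision column."""
    return max(GROUPS, key=lambda g: sum(gain(r, column) for r in GROUPS[g]) / len(GROUPS[g]))


for column in ("A", "V", "W"):
    print(column, "->", best_group(column))
print("largest A+V+W gain:", max(ROWS, key=lambda r: gain(r, "A+V+W")))
```

The same numbers also show that the best single-factor Bird score (39.9, V column) is slightly higher than that of the combined setting (38.8).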

Model | Augmentation | AP | ||
---|---|---|---|---|
YOLOv5m | - | 38.8 | 69.3 | 50.9 |
YOLOv5m | Mosaic | 41.5 | 72.9 | 53.5 |
FGFD | - | 40.4 | 71.6 | 53.1 |
FGFD | FDM | 44.1 | 75.8 | 57.7 |

Model | Invariant Features | Specific Features | |||
---|---|---|---|---|---|
YOLOv5m+NDFT [11] | ✓ | 42.6 | 75.1 | 56.8 | |
FGFD | ✓ | 42.4 | 74.9 | 56.9 | |
FGFD | ✓ | 43.8 | 75.3 | 57.1 | |
FGFD | ✓ | ✓ | 44.1 | 75.8 | 57.7 |

Model | ||||||
---|---|---|---|---|---|---|
FGFD | ✓ | 41.6 | 73.0 | 54.1 | ||
FGFD | ✓ | ✓ | 43.0 | 74.8 | 56.6 | |
FGFD | ✓ | ✓ | ✓ | 44.1 | 75.8 | 57.7 |

Model | Input Size | Backbone | AP | Avg. Time (ms) |
---|---|---|---|---|
Faster RCNN [34] | 800 × 1280 | ResNet-101 | 45.6 | 136.2 |
NDFT [11] | 800 × 1280 | ResNet-101 | 47.9 | 138.4 |
NDFT+FPN | 800 × 1280 | ResNet-101 | 52.0 | 106.1 |
A-NDFT [12] | 800 × 1280 | ResNet-101 | 48.1 | 138.1 |
Kiefer et al. [13] | 800 × 1280 | ResNet-101 | 49.4 | 125.7 |
YOLOv5m [19] | 800 × 1280 | CSPDarknet | 53.5 | 28.5 |
FGFD (ours) | 800 × 1280 | CSPDarknet | 57.7↑5.7 | 29.2 |

Model | Input Size | Backbone | AP | Avg. Time (ms) |
---|---|---|---|---|
Faster RCNN+FPN | 800 × 1280 | ResNet-101 | 40.0 | 108.5 |
NDFT-DE-FPN [11] | 800 × 1280 | ResNeXt-101 64-4d | 52.8 | 227.8 |
Kiefer et al. [13] | 800 × 1280 | ResNet-101 | 49.6 | 127.1 |
PG-YOLO [36] | 800 × 1280 | CSPDarknet | 49.6 | 53.3 |
ASNet [37] | 800 × 1280 | ResNet-50 | 52.3 | 92.1 |
YOLOv5s [19] | 800 × 1280 | CSPDarknet | 44.8 | 21.5 |
YOLOv5m [19] | 800 × 1280 | CSPDarknet | 51.1 | 32.7 |
YOLOv5l [19] | 800 × 1280 | CSPDarknet | 52.0 | 51.9 |
FGFD (ours) | 800 × 1280 | CSPDarknet | 55.2↑2.4 | 32.9 |
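The ↑ values in the two comparison tables are not labeled. Checking the arithmetic, they equal FGFD's margin over the strongest entry that is not a YOLOv5 variant (NDFT+FPN in the first table, NDFT-DE-FPN in the second); the margin over the YOLOv5m baseline, which uses the same CSPDarknet backbone, is 4.2 and 4.1 AP respectively. The snippet reproduces these numbers; the reading of the arrow is inferred from the arithmetic, not stated in the tables, and the helper names are illustrative.

```python
# AP values copied from the two comparison tables above (model names shortened).
UAVDT = {
    "Faster RCNN": 45.6, "NDFT": 47.9, "NDFT+FPN": 52.0, "A-NDFT": 48.1,
    "Kiefer et al.": 49.4, "YOLOv5m": 53.5, "FGFD": 57.7,
}
VISDRONE = {
    "Faster RCNN+FPN": 40.0, "NDFT-DE-FPN": 52.8, "Kiefer et al.": 49.6,
    "PG-YOLO": 49.6, "ASNet": 52.3, "YOLOv5s": 44.8, "YOLOv5m": 51.1,
    "YOLOv5l": 52.0, "FGFD": 55.2,
}


def gap_to_strongest_prior(table: dict) -> tuple:
    """Gap between FGFD and the best entry that is neither FGFD nor a YOLOv5 variant."""
    prior = {k: v for k, v in table.items() if k != "FGFD" and not k.startswith("YOLOv5")}
    name = max(prior, key=prior.get)
    return name, round(table["FGFD"] - prior[name], 1)


def gap_to_yolov5m(table: dict) -> float:
    """Gap between FGFD and the YOLOv5m baseline that shares its backbone."""
    return round(table["FGFD"] - table["YOLOv5m"], 1)


print(gap_to_strongest_prior(UAVDT), gap_to_yolov5m(UAVDT))        # ('NDFT+FPN', 5.7) 4.2
print(gap_to_strongest_prior(VISDRONE), gap_to_yolov5m(VISDRONE))  # ('NDFT-DE-FPN', 2.4) 4.1
```

On latency, the same tables put FGFD 0.7 ms (29.2 vs. 28.5) and 0.2 ms (32.9 vs. 32.7) behind YOLOv5m, so the accuracy gain comes at a small runtime cost.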

Model | Input Size | Backbone | TensorRT | INT8 | FPS |
---|---|---|---|---|---|
Faster RCNN+FPN | 720 × 1280 | ResNet-101 | ✓ | ✓ | |
YOLOv5m | 720 × 1280 | CSPDarknet | ✓ | ✓ | |
FGFD (ours) | 720 × 1280 | CSPDarknet | ✓ | ✓ |
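The introduction reports that FGFD processes 720P images at 20 Hz on the Jetson Xavier NX. Converting between frame rate and per-frame latency makes such figures easier to relate to the millisecond timings in the accuracy tables, keeping in mind that this section does not state the hardware those timings were measured on. The helpers below are illustrative and their names are hypothetical.

```python
def frame_budget_ms(rate_hz: float) -> float:
    """Time available per frame at a given processing rate."""
    return 1000.0 / rate_hz


def fps_from_latency(latency_ms: float) -> float:
    """Frames per second implied by an average per-frame latency."""
    return 1000.0 / latency_ms


def sustains(latency_ms: float, rate_hz: float) -> bool:
    """True if the average latency fits inside the per-frame budget."""
    return latency_ms <= frame_budget_ms(rate_hz)


print(frame_budget_ms(20))               # 50.0
print(round(fps_from_latency(29.2), 1))  # 34.2
```

A 20 Hz rate leaves a 50 ms budget per frame. Because `sustains` compares an average latency, a deployed pipeline would also need headroom for image capture, pre- and post-processing, and latency spikes.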
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jin, R.; Jia, Z.; Yin, X.; Niu, Y.; Qi, Y. Domain Feature Decomposition for Efficient Object Detection in Aerial Images. Remote Sens. 2024, 16, 1626. https://doi.org/10.3390/rs16091626
Jin R, Jia Z, Yin X, Niu Y, Qi Y. Domain Feature Decomposition for Efficient Object Detection in Aerial Images. Remote Sensing. 2024; 16(9):1626. https://doi.org/10.3390/rs16091626
Chicago/Turabian Style: Jin, Ren, Zikai Jia, Xingyu Yin, Yi Niu, and Yuhua Qi. 2024. "Domain Feature Decomposition for Efficient Object Detection in Aerial Images" Remote Sensing 16, no. 9: 1626. https://doi.org/10.3390/rs16091626
APA Style: Jin, R., Jia, Z., Yin, X., Niu, Y., & Qi, Y. (2024). Domain Feature Decomposition for Efficient Object Detection in Aerial Images. Remote Sensing, 16(9), 1626. https://doi.org/10.3390/rs16091626