Diverse Feature-Level Guidance Adjustments for Unsupervised Domain Adaptative Object Detection
Abstract
1. Introduction
- A novel feature alignment strategy named Pixel-wise Multi-scale Alignment (PMA) is designed to minimize contextual feature discrepancies between the source-domain and target-domain data distributions. In addition, PMA guides the backbone to generate more domain-agnostic feature representations (a minimal sketch is given after this list).
- An effective sample division module named Adaptative Threshold Confidence Adjustment (ATCA) is proposed to guide the detector in better weighing the importance of foreground and background predictions, effectively steering the model’s attention towards the objects of interest (a hypothetical sketch is also given after this list).
- Extensive experiments were carried out on four benchmark datasets, Cityscapes, Foggy Cityscapes, KITTI, and Sim10k, to validate the effectiveness of the newly proposed modules. Built on the same baseline method, our DFGAs improve mAP by 1.3% on weather adaptation and by 11.7% and 15.7% on cross-camera adaptation.
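To make the first contribution more concrete, the following PyTorch sketch shows one common way to realize pixel-wise, multi-scale domain-adversarial alignment with a gradient reversal layer. It is a minimal illustration under our own assumptions, not the authors' released implementation; the module names (`GradientReversal`, `PixelDomainDiscriminator`), the 1×1-convolution discriminator, and the binary cross-entropy loss form are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; negates and scales gradients in the backward pass."""

    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient so the backbone is pushed towards domain-agnostic features.
        return -ctx.alpha * grad_output, None


class PixelDomainDiscriminator(nn.Module):
    """Predicts a source-vs-target logit at every spatial location of a feature map."""

    def __init__(self, in_channels, hidden_channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden_channels, 1, kernel_size=1),
        )

    def forward(self, feature_map, alpha=1.0):
        reversed_feat = GradientReversal.apply(feature_map, alpha)
        return self.net(reversed_feat)  # shape: (N, 1, H, W)


def multi_scale_alignment_loss(discriminators, source_feats, target_feats, alpha=1.0):
    """Average the pixel-wise adversarial loss over all feature scales.

    `source_feats` / `target_feats` are lists of backbone feature maps, one per scale,
    and `discriminators` is a matching list (e.g., an nn.ModuleList).
    """
    loss = 0.0
    for disc, feat_s, feat_t in zip(discriminators, source_feats, target_feats):
        logits_s = disc(feat_s, alpha)  # source pixels -> label 0
        logits_t = disc(feat_t, alpha)  # target pixels -> label 1
        loss = loss + F.binary_cross_entropy_with_logits(logits_s, torch.zeros_like(logits_s))
        loss = loss + F.binary_cross_entropy_with_logits(logits_t, torch.ones_like(logits_t))
    return loss / len(discriminators)
```

Because the discriminator operates per pixel and is repeated across scales, the reversed gradients encourage the backbone to remove domain-specific cues at every spatial location and resolution, rather than only at the image level.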
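Similarly, the sample-division idea behind ATCA can be illustrated with a hypothetical confidence-threshold scheme: the threshold adapts to the current confidence distribution (here via an exponential moving average of the batch-mean confidence), and foreground/background predictions are weighted accordingly. All function names, the EMA statistic, and the specific weighting below are assumptions for illustration; the paper's actual adjustment rule may differ.

```python
import torch


def adaptive_threshold_split(confidences, running_threshold=None, momentum=0.9):
    """Divide predictions into foreground / background using a threshold that adapts
    to the current confidence distribution (an EMA of the batch-mean confidence).

    confidences: (N,) tensor, e.g., the maximum class score of each predicted box.
    Returns (fg_mask, bg_mask, updated_threshold).
    """
    batch_stat = confidences.mean()
    if running_threshold is None:
        threshold = batch_stat
    else:
        threshold = momentum * running_threshold + (1.0 - momentum) * batch_stat
    fg_mask = confidences >= threshold
    bg_mask = ~fg_mask
    return fg_mask, bg_mask, threshold.detach()


def reweighted_classification_loss(per_sample_loss, fg_mask, fg_weight=1.0, bg_weight=0.5):
    """Emphasize foreground predictions and down-weight background ones."""
    weights = torch.where(
        fg_mask,
        torch.full_like(per_sample_loss, fg_weight),
        torch.full_like(per_sample_loss, bg_weight),
    )
    return (weights * per_sample_loss).sum() / weights.sum().clamp(min=1e-6)
```

In a detector training loop, such a split would let confident foreground predictions contribute more to the classification gradient than uncertain background ones, which is the guidance effect the contribution above describes.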
2. Related Works
2.1. Object Detection
2.2. Unsupervised Domain Adaptation for Object Detection
3. Methods
3.1. Framework Overview
3.2. Pixel-Wise Multi-Scale Alignment
3.3. Adaptative Threshold Confidence Adjustment
4. Experiments
4.1. Implementation Details
4.2. Datasets
4.3. Comparative Experiments
4.3.1. Weather Adaptation
4.3.2. Cross-Camera Adaptation
4.3.3. Synthetic-to-Real Adaptation
4.4. Ablation Study
4.5. Analysis
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Chen, Y.; Li, W.; Sakaridis, C.; Dai, D.; Van Gool, L. Domain adaptive faster r-cnn for object detection in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3339–3348. [Google Scholar]
- Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 2016, 17, 1–35. [Google Scholar]
- Chen, C.; Zheng, Z.; Huang, Y.; Ding, X.; Yu, Y. I3net: Implicit instance-invariant network for adapting one-stage object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12576–12585. [Google Scholar]
- Zhang, S.; Tuo, H.; Hu, J.; Jing, Z. Domain adaptive YOLO for one-stage cross-domain detection. In Proceedings of the Asian Conference on Machine Learning. PMLR, London, UK, 8–11 November 2021; pp. 785–797. [Google Scholar]
- Hsu, C.C.; Tsai, Y.H.; Lin, Y.Y.; Yang, M.H. Every pixel matters: Center-aware feature alignment for domain adaptive object detector. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part IX 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 733–748. [Google Scholar]
- Hnewa, M.; Radha, H. Multiscale domain adaptive yolo for cross-domain object detection. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 3323–3327. [Google Scholar]
- Liu, W.; Ren, G.; Yu, R.; Guo, S.; Zhu, J.; Zhang, L. Image-adaptive YOLO for object detection in adverse weather conditions. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2022; Volume 36, pp. 1792–1800. [Google Scholar]
- Xu, C.D.; Zhao, X.R.; Jin, X.; Wei, X.S. Exploring categorical regularization for domain adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11724–11733. [Google Scholar]
- Zhou, W.; Du, D.; Zhang, L.; Luo, T.; Wu, Y. Multi-granularity alignment domain adaptation for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9581–9590. [Google Scholar]
- Sindagi, V.A.; Oza, P.; Yasarla, R.; Patel, V.M. Prior-based domain adaptive object detection for hazy and rainy conditions. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XIV 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 763–780. [Google Scholar]
- Jiang, J.; Chen, B.; Wang, J.; Long, M. Decoupled adaptation for cross-domain object detection. arXiv 2021, arXiv:2110.02578. [Google Scholar]
- He, Z.; Zhang, L. Domain adaptive object detection via asymmetric tri-way faster-rcnn. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXIV 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 309–324. [Google Scholar]
- Li, S.; Huang, J.; Hua, X.S.; Zhang, L. Category dictionary guided unsupervised domain adaptation for object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 2–9 February 2021; Volume 35, pp. 1949–1957. [Google Scholar]
- Saito, K.; Ushiku, Y.; Harada, T.; Saenko, K. Strong-weak distribution alignment for adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6956–6965. [Google Scholar]
- Vs, V.; Gupta, V.; Oza, P.; Sindagi, V.A.; Patel, V.M. Mega-cda: Memory guided attention for category-aware unsupervised domain adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 4516–4526. [Google Scholar]
- Zhao, L.; Wang, L. Task-specific inconsistency alignment for domain adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 14217–14226. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1–9. [Google Scholar] [CrossRef] [PubMed]
- Xie, S.; Liu, C.; Gao, J.; Li, X.; Luo, J.; Fan, B.; Chen, J.; Pu, H.; Peng, Y. Diverse receptive field network with context aggregation for fast object detection. J. Vis. Commun. Image Represent. 2020, 70, 102770. [Google Scholar] [CrossRef]
- Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276. [Google Scholar] [CrossRef]
- Bulbul, M.F.; Tabussum, S.; Ali, H.; Zheng, W.; Lee, M.Y.; Ullah, A. Exploring 3D human action recognition using STACOG on multi-view depth motion maps sequences. Sensors 2021, 21, 3642. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
- Tian, Z.; Shen, C.; Chen, H.; He, T. Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636. [Google Scholar]
- Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162. [Google Scholar]
- Liu, C.; Xie, S.; Li, X.; Gao, J.; Xiao, W.; Fan, B.; Peng, Y. Mitigate the classification ambiguity via localization-classification sequence in object detection. Pattern Recognit. 2023, 138, 109418. [Google Scholar] [CrossRef]
- Zhou, Q.; Gu, Q.; Pang, J.; Lu, X.; Ma, L. Self-adversarial disentangling for specific domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 8954–8968. [Google Scholar] [CrossRef] [PubMed]
- Menke, M.; Wenzel, T.; Schwung, A. AWADA: Attention-Weighted Adversarial Domain Adaptation for Object Detection. arXiv 2022, arXiv:2208.14662. [Google Scholar]
- Yang, Y.; Soatto, S. Fda: Fourier domain adaptation for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4085–4095. [Google Scholar]
- Liu, R.; Han, Y.; Wang, Y.; Tian, Q. Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection. arXiv 2021, arXiv:2112.08605. [Google Scholar]
- Hsu, H.K.; Yao, C.H.; Tsai, Y.H.; Hung, W.C.; Tseng, H.Y.; Singh, M.; Yang, M.H. Progressive domain adaptation for object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 749–757. [Google Scholar]
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
- Shen, Z.; Maheshwari, H.; Yao, W.; Savvides, M. Scl: Towards accurate domain adaptive object detection via gradient detach based stacked complementary losses. arXiv 2019, arXiv:1911.02559. [Google Scholar]
- Li, C.; Du, D.; Zhang, L.; Wen, L.; Luo, T.; Wu, Y.; Zhu, P. Spatial attention pyramid network for unsupervised domain adaptation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XIII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 481–497. [Google Scholar]
- Sharma, A.; Kalluri, T.; Chandraker, M. Instance level affinity-based transfer for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 5361–5371. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Sakaridis, C.; Dai, D.; Van Gool, L. Semantic foggy scene understanding with synthetic data. Int. J. Comput. Vis. 2018, 126, 973–992. [Google Scholar] [CrossRef]
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223. [Google Scholar]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? the kitti vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 3354–3361. [Google Scholar]
- Johnson-Roberson, M.; Barto, C.; Mehta, R.; Sridhar, S.N.; Rosaen, K.; Vasudevan, R. Driving in the matrix: Can virtual worlds replace human-generated annotations for real world tasks? arXiv 2016, arXiv:1610.01983. [Google Scholar]
- He, Z.; Zhang, L. Multi-adversarial faster-rcnn for unrestricted object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6668–6677. [Google Scholar]
Weather adaptation results on Cityscapes → Foggy Cityscapes: per-class AP (%) and mAP (%).

Method | Bus | Bicycle | Car | Mcycle | Person | Rider | Train | Truck | mAP |
---|---|---|---|---|---|---|---|---|---|
Faster R-CNN | 25.5 | 30.3 | 34.2 | 19.0 | 25.1 | 33.4 | 9.1 | 12.1 | 23.7 |
DA Faster R-CNN | 35.3 | 27.1 | 40.5 | 20.0 | 25.0 | 31.0 | 20.2 | 22.1 | 27.6 |
SWDA | 36.2 | 35.3 | 43.5 | 30.0 | 29.9 | 42.3 | 32.6 | 24.5 | 34.3 |
ATF | 43.3 | 38.8 | 50.0 | 33.4 | 34.6 | 47.0 | 38.7 | 23.7 | 38.7 |
SAP | 46.8 | 40.7 | 59.8 | 30.4 | 40.8 | 46.7 | 37.5 | 24.3 | 40.9 |
MeGA | 49.2 | 39.0 | 52.4 | 34.5 | 37.7 | 49.0 | 46.9 | 25.4 | 42.8 |
TIA | 52.1 | 38.1 | 49.7 | 37.7 | 34.8 | 46.3 | 48.6 | 31.1 | 42.3 |
Ours | 53.6 | 41.6 | 53.0 | 40.1 | 36.8 | 47.9 | 41.3 | 34.3 | 43.6 |
Method | KITTI → Cityscapes (mAP, %) | Cityscapes → KITTI (mAP, %) |
---|---|---|
Faster R-CNN | 30.2 | 53.5 |
DA Faster R-CNN | 38.5 | 64.1 |
SWDA | 37.9 | 71.0 |
SAP | 43.4 | 75.2 |
ATF | 42.1 | 73.5 |
MeGA | 43.0 | 75.5 |
TIA | 44.0 | 75.9 |
DFGAs | 59.5 | 87.6 |
Method | Sim10k → Cityscapes (mAP, %) |
---|---|
DA Faster R-CNN | 34.3 |
TIA | 39.6 |
SWDA | 40.1 |
MAF | 41.1 |
DFGAs | 41.6 |
Model | PMA | ATCA | mAP |
---|---|---|---|
A | − | − | 42.3 |
B | ✓ | − | 43.2 |
C | − | ✓ | 43.0 |
D | ✓ | ✓ | 43.6 |
Model | PMA | ATCA | mAP |
---|---|---|---|
E | − | − | 29.9 |
F | ✓ | − | 32.1 |
G | − | ✓ | 31.8 |
H | ✓ | ✓ | 32.4 |