Learning Lightweight and Superior Detectors with Feature Distillation for Onboard Remote Sensing Object Detection
Abstract
1. Introduction
- (1) A Context-aware Dense Feature Distillation (CDFD) framework is proposed, which takes multiple large-scale object detection networks as teacher models and distils their feature representation capabilities into a lightweight student network, significantly improving the lightweight network's remote sensing object detection performance without introducing any additional computation at inference;
- (2) To address the difficulty of detecting small objects in remote sensing images, a Contextual Feature Generation Module (CFGM) is designed, which enables the student network not only to mimic the generation of the teacher's feature maps but also to extract rich contextual features that assist remote sensing object detection (an illustrative sketch follows this list);
- (3) To integrate the strengths of multiple teachers, an Adaptive Dense Multi-teacher Distillation (ADMD) strategy is proposed, which computes an adaptively weighted loss between the student and multiple teachers so that the detection strengths of several teachers are consolidated into the student detector (see the second sketch below);
- (4) Extensive experiments were conducted on two large-scale remote sensing object detection datasets with various network structures. The results demonstrate that both CFGM and ADMD effectively improve the students' detection performance; students trained with our CDFD not only achieve state-of-the-art performance but even outperform most teacher networks. In addition, CDFD generalises well and is applicable to several two-stage remote sensing object detection approaches.
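To make contribution (2) concrete, the following minimal PyTorch sketch shows one way a CFGM-style generation module can be wired into feature distillation: the student's FPN feature map is passed through a small generator with dilated (context-aggregating) convolutions and trained to reconstruct the teacher's feature map. The layer choices, dilation rates, and MSE objective are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextualGenerator(nn.Module):
    """Illustrative stand-in for a CFGM-style block: dilated convolutions
    aggregate surrounding context, then a 1x1 projection regenerates a
    teacher-like feature map. All layer choices are assumptions."""

    def __init__(self, channels: int):
        super().__init__()
        self.context = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=4, dilation=4),
            nn.ReLU(inplace=True),
        )
        self.project = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, student_feat: torch.Tensor) -> torch.Tensor:
        return self.project(self.context(student_feat))


def feature_imitation_loss(student_feat, teacher_feat, generator):
    """Train the student, through the generator, to reconstruct the
    teacher's feature map at the same FPN level."""
    return F.mse_loss(generator(student_feat), teacher_feat.detach())


# Usage with dummy FPN-level features (batch, channels, height, width):
generator = ContextualGenerator(channels=256)
student_p3 = torch.randn(2, 256, 64, 64)
teacher_p3 = torch.randn(2, 256, 64, 64)
loss = feature_imitation_loss(student_p3, teacher_p3, generator)
```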
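Contribution (3) can be sketched in the same spirit: per-teacher imitation losses are combined with adaptive weights so that teachers whose features the student reproduces more closely contribute more. The softmax-over-negative-losses weighting below is an illustrative assumption; the paper defines its own adaptive weighted loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def adaptive_multi_teacher_loss(student_feat, teacher_feats, generator, tau=1.0):
    """Combine per-teacher feature-imitation losses with adaptive weights.
    Weighting by softmax(-loss) is an illustrative choice, not the paper's
    exact formulation; weights are detached so they are not optimised."""
    per_teacher = torch.stack([
        F.mse_loss(generator(student_feat), t.detach()) for t in teacher_feats
    ])
    weights = torch.softmax(-per_teacher.detach() / tau, dim=0)
    return (weights * per_teacher).sum()


# Usage with three teacher feature maps at the same FPN level; a 1x1
# convolution stands in for the generator from the previous sketch:
generator = nn.Conv2d(256, 256, kernel_size=1)
student_p3 = torch.randn(2, 256, 64, 64)
teachers_p3 = [torch.randn(2, 256, 64, 64) for _ in range(3)]
loss = adaptive_multi_teacher_loss(student_p3, teachers_p3, generator)
```

Detaching the weights means the adaptive weighting redistributes, rather than shrinks, the gradient signal across teachers.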
2. Related Works
2.1. CubeSats
2.2. Remote Sensing Object Detection
2.3. Knowledge Distillation
3. Methods
3.1. Context-Aware Dense Feature Distillation
3.2. Contextual Feature Generation Module
3.3. Adaptive Dense Multi-Teacher Distillation Strategy
3.4. Overall Loss
4. Experiment
4.1. Dataset
4.2. Evaluation Metrics
4.3. Implementation Details
4.4. Experimental Results
4.4.1. Comparison of Object Detection Results
4.4.2. Ablation Studies
4.4.3. Comparison with State-of-the-Art Distillation Approaches on RS Datasets
4.4.4. Distillation of State-of-the-Art Detectors
4.4.5. Comparison with State-of-the-Art Distillation Approaches on General Object Datasets
4.5. Visualisation Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Maskey, A.; Cho, M. CubeSatNet: Ultralight Convolutional Neural Network designed for on-orbit binary image classification on a 1U CubeSat. Eng. Appl. Artif. Intell. 2020, 96, 103952.
2. Lingyun, G.; Popov, E.; Ge, D. Fast Fourier Convolution Based Remote Sensor Image Object Detection for Earth Observation. arXiv 2022, arXiv:2209.00551.
3. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
4. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
5. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1137–1149.
6. Selva, D.; Krejci, D. A survey and assessment of the capabilities of CubeSats for Earth observation. Acta Astronaut. 2012, 74, 50–68.
7. Manning, J.; Langerman, D.; Ramesh, B.; Gretok, E.; Wilson, C.; George, A.; MacKinnon, J.; Crum, G. Machine-Learning Space Applications on Smallsat Platforms with TensorFlow; NASA Goddard Space Flight Center: Greenbelt, MD, USA, 2018.
8. Arechiga, A.P.; Michaels, A.J.; Black, J.T. Onboard image processing for small satellites. In Proceedings of the NAECON 2018-IEEE National Aerospace and Electronics Conference, Dayton, OH, USA, 11 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 234–240.
9. Huq, R.; Islam, M.; Siddique, S. AI-OBC: Conceptual Design of a Deep Neural Network based Next Generation Onboard Computing Architecture for Satellite Systems. In Proceedings of the 1st China Microsatellite Symposium, Xi'an, China, 18–20 November 2018.
10. Giuffrida, G.; Diana, L.; de Gioia, F.; Benelli, G.; Meoni, G.; Donati, M.; Fanucci, L. CloudScout: A deep neural network for on-board cloud detection on hyperspectral images. Remote Sens. 2020, 12, 2205.
11. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
12. Wu, J.; Leng, C.; Wang, Y.; Hu, Q.; Cheng, J. Quantized convolutional neural networks for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4820–4828.
13. Kim, J.; Park, S.; Kwak, N. Paraphrasing complex network: Network compression via factor transfer. In Proceedings of the Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, Montréal, QC, Canada, 3–8 December 2018.
14. Mirzadeh, S.I.; Farajtabar, M.; Li, A.; Levine, N.; Matsukawa, A.; Ghasemzadeh, H. Improved knowledge distillation via teacher assistant. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 5191–5198.
15. Lee, S.; Song, B.C. Graph-based knowledge distillation by multi-head attention network. arXiv 2019, arXiv:1907.02226.
16. Tung, F.; Mori, G. Similarity-preserving knowledge distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1365–1374.
17. Cheng, G.; Si, Y.; Hong, H.; Yao, X.; Guo, L. Cross-scale feature fusion for object detection in optical remote sensing images. IEEE Geosci. Remote Sens. Lett. 2020, 18, 431–435.
18. Saeed, N.; Elzanaty, A.; Almorad, H.; Dahrouj, H.; Al-Naffouri, T.Y.; Alouini, M.S. CubeSat communications: Recent advances and future challenges. IEEE Commun. Surv. Tutor. 2020, 22, 1839–1862.
19. Cooley, S.W.; Smith, L.C.; Ryan, J.C.; Pitcher, L.H.; Pavelsky, T.M. Arctic-Boreal lake dynamics revealed using CubeSat imagery. Geophys. Res. Lett. 2019, 46, 2111–2120.
20. Houborg, R.; McCabe, M.F. Daily Retrieval of NDVI and LAI at 3 m Resolution via the Fusion of CubeSat, Landsat, and MODIS Data. Remote Sens. 2018, 10, 890.
21. Altena, B.; Kääb, A. Glacier ice loss monitored through the Planet CubeSat constellation. In Proceedings of the 2017 9th International Workshop on the Analysis of Multitemporal Remote Sensing Images (MultiTemp), Bruges, Belgium, 27–29 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–4.
22. Huang, L.; Luo, J.; Lin, Z.; Niu, F.; Liu, L. Using deep learning to map retrogressive thaw slumps in the Beiluhe region (Tibetan Plateau) from CubeSat images. Remote Sens. Environ. 2020, 237, 111534.
23. Ghuffar, S. DEM generation from multi satellite PlanetScope imagery. Remote Sens. 2018, 10, 1462.
24. Toorian, A.; Diaz, K.; Lee, S. The CubeSat approach to space access. In Proceedings of the 2008 IEEE Aerospace Conference, Tampa, FL, USA, 8–11 December 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 1–14.
25. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
26. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
27. Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 159, 296–307.
28. Zhang, K.; Shen, H. Multi-Stage Feature Enhancement Pyramid Network for Detecting Objects in Optical Remote Sensing Images. Remote Sens. 2022, 14, 579.
29. Cheng, G.; He, M.; Hong, H.; Yao, X.; Qian, X.; Guo, L. Guiding clean features for object detection in remote sensing images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 801920.
30. Li, Y.; Yao, T.; Pan, Y.; Mei, T. Contextual transformer networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022.
31. Qingyun, F.; Zhaokui, W. Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery. Pattern Recognit. 2022, 130, 108786.
32. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
33. Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge distillation: A survey. Int. J. Comput. Vis. 2021, 129, 1789–1819.
34. Peng, Z.; Li, Z.; Zhang, J.; Li, Y.; Qi, G.J.; Tang, J. Few-shot image recognition with knowledge transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 441–449.
35. Wang, J.; Gou, L.; Zhang, W.; Yang, H.; Shen, H.W. DeepVID: Deep visual interpretation and diagnosis for image classifiers via knowledge distillation. IEEE Trans. Vis. Comput. Graph. 2019, 25, 2168–2180.
36. Dou, Q.; Liu, Q.; Heng, P.A.; Glocker, B. Unpaired multi-modal segmentation via knowledge distillation. IEEE Trans. Med. Imaging 2020, 39, 2415–2425.
37. Hou, Y.; Ma, Z.; Liu, C.; Hui, T.W.; Loy, C.C. Inter-region affinity distillation for road marking segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 12486–12495.
38. Wang, X.; Hu, J.F.; Lai, J.H.; Zhang, J.; Zheng, W.S. Progressive teacher-student learning for early action prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 3556–3565.
39. Wu, M.C.; Chiu, C.T.; Wu, K.H. Multi-teacher knowledge distillation for compressed video action recognition on deep neural networks. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 2202–2206.
40. Cun, X.; Pun, C.M. Defocus blur detection via depth distillation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 747–763.
41. Chawla, A.; Yin, H.; Molchanov, P.; Alvarez, J. Data-free knowledge distillation for object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Nashville, TN, USA, 20–25 June 2021; pp. 3289–3298.
42. Zhao, H.; Sun, X.; Dong, J.; Chen, C.; Dong, Z. Highlight every step: Knowledge distillation via collaborative teaching. IEEE Trans. Cybern. 2020.
43. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
44. Qingyun, F.; Lin, Z.; Zhaokui, W. An efficient feature pyramid network for object detection in remote sensing imagery. IEEE Access 2020, 8, 93058–93068.
45. Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983.
46. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
47. Kang, Z.; Zhang, P.; Zhang, X.; Sun, J.; Zheng, N. Instance-conditional knowledge distillation for object detection. Adv. Neural Inf. Process. Syst. 2021, 34, 16468–16480.
48. Yang, Z.; Li, Z.; Jiang, X.; Gong, Y.; Yuan, Z.; Zhao, D.; Yuan, C. Focal and global knowledge distillation for detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 4643–4652.
49. Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open MMLab detection toolbox and benchmark. arXiv 2019, arXiv:1906.07155.
50. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
51. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
52. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
53. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500.
54. Yang, Z.; Li, Z.; Shao, M.; Shi, D.; Yuan, Z.; Yuan, C. Masked Generative Distillation. arXiv 2022, arXiv:2205.01529.
55. Wang, T.; Yuan, L.; Zhang, X.; Feng, J. Distilling object detectors with fine-grained feature imitation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4933–4942.
56. Dai, X.; Jiang, Z.; Wu, Z.; Bao, Y.; Wang, Z.; Liu, S.; Zhou, E. General instance distillation for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20 June 2021; pp. 7842–7851.
Table 1. Comparison of object detection results on the DIOR and DOTA datasets (T1–T3 denote teacher networks).

| Dataset | Model | Backbone | mAP | Model Size (MB) | GFLOPs |
|---|---|---|---|---|---|
| DIOR | YOLOv3 [50] | Darknet53 | 57.0 | 61.6 | 121.4 |
| DIOR | RetinaNet [51] | ResNet101 [52] | 66.1 | 55.4 | 179.5 |
| DIOR | Faster RCNN [5] | ResNet18 | 68.4 | 28.2 | 101.5 |
| DIOR | Faster RCNN | ResNet50 | 70.6 | 41.2 | 134.5 |
| DIOR | Faster RCNN (T1) | ResNet101 | 70.9 | 60.2 | 182.0 |
| DIOR | Faster RCNN (T2) | ResNeXt32 [53] | 73.0 | 59.9 | 184.4 |
| DIOR | Faster RCNN (T3) | ResNeXt64 | 73.0 | 98.9 | 280.3 |
| DIOR | +Ours | ResNet18 | 70.9 | 28.2 | 101.5 |
| DIOR | +Ours | ResNet50 | 73.1 | 41.2 | 134.5 |
| DOTA | YOLOv3 [50] | Darknet53 | 55.1 | 61.6 | 121.4 |
| DOTA | RetinaNet [51] | ResNet101 [52] | 42.4 | 55.4 | 179.5 |
| DOTA | Faster RCNN [5] | ResNet18 | 48.4 | 28.2 | 101.5 |
| DOTA | Faster RCNN | ResNet50 | 56.6 | 41.2 | 134.5 |
| DOTA | Faster RCNN (T1) | ResNet101 | 57.4 | 60.2 | 182.0 |
| DOTA | Faster RCNN (T2) | ResNeXt32 [53] | 58.1 | 59.9 | 184.4 |
| DOTA | Faster RCNN (T3) | ResNeXt64 | 57.8 | 98.9 | 280.3 |
| DOTA | +Ours | ResNet18 | 56.9 | 28.2 | 101.5 |
| DOTA | +Ours | ResNet50 | 58.0 | 41.2 | 134.5 |
Table 2. Ablation studies of CFGM and ADMD (✓ marks enabled components; mAP in %).

| Dataset | Student | Baseline | CFGM | ADMD | mAP |
|---|---|---|---|---|---|
| DIOR | Faster RCNN-Res50 | ✓ | | | 70.6 |
| DIOR | Faster RCNN-Res50 | ✓ | ✓ | | 72.3 |
| DIOR | Faster RCNN-Res50 | ✓ | | ✓ | 72.5 |
| DIOR | Faster RCNN-Res50 | ✓ | ✓ | ✓ | 73.1 |
| DIOR | Faster RCNN-Res18 | ✓ | | | 68.4 |
| DIOR | Faster RCNN-Res18 | ✓ | ✓ | | 70.6 |
| DIOR | Faster RCNN-Res18 | ✓ | | ✓ | 69.5 |
| DIOR | Faster RCNN-Res18 | ✓ | ✓ | ✓ | 70.9 |
| DOTA | Faster RCNN-Res50 | ✓ | | | 56.6 |
| DOTA | Faster RCNN-Res50 | ✓ | ✓ | | 57.0 |
| DOTA | Faster RCNN-Res50 | ✓ | | ✓ | 57.2 |
| DOTA | Faster RCNN-Res50 | ✓ | ✓ | ✓ | 58.0 |
| DOTA | Faster RCNN-Res18 | ✓ | | | 48.4 |
| DOTA | Faster RCNN-Res18 | ✓ | ✓ | | 56.7 |
| DOTA | Faster RCNN-Res18 | ✓ | | ✓ | 56.3 |
| DOTA | Faster RCNN-Res18 | ✓ | ✓ | ✓ | 56.9 |
Table 3. Comparison with state-of-the-art distillation approaches on the DIOR and DOTA remote sensing datasets. The two +Ours rows per student correspond to CFGM only and the full CDFD, respectively (values consistent with Table 2).

| Dataset | Student | Approach | mAP |
|---|---|---|---|
| DIOR | Faster RCNN-Res50 | No-KD | 70.6 |
| DIOR | Faster RCNN-Res50 | +FGD [48] | 71.9 |
| DIOR | Faster RCNN-Res50 | +MGD [54] | 72.2 |
| DIOR | Faster RCNN-Res50 | +Ours (CFGM only) | 72.3 |
| DIOR | Faster RCNN-Res50 | +Ours (full CDFD) | 73.1 |
| DIOR | Faster RCNN-Res50 | Our gain | +2.5 |
| DIOR | Faster RCNN-Res18 | No-KD | 68.4 |
| DIOR | Faster RCNN-Res18 | +FGD [48] | 68.8 |
| DIOR | Faster RCNN-Res18 | +MGD [54] | 70.9 |
| DIOR | Faster RCNN-Res18 | +Ours (CFGM only) | 70.6 |
| DIOR | Faster RCNN-Res18 | +Ours (full CDFD) | 70.9 |
| DIOR | Faster RCNN-Res18 | Our gain | +2.5 |
| DOTA | Faster RCNN-Res50 | No-KD | 56.6 |
| DOTA | Faster RCNN-Res50 | +FGD [48] | 53.6 |
| DOTA | Faster RCNN-Res50 | +MGD [54] | 57.0 |
| DOTA | Faster RCNN-Res50 | +Ours (CFGM only) | 57.0 |
| DOTA | Faster RCNN-Res50 | +Ours (full CDFD) | 58.0 |
| DOTA | Faster RCNN-Res50 | Our gain | +1.4 |
| DOTA | Faster RCNN-Res18 | No-KD | 48.4 |
| DOTA | Faster RCNN-Res18 | +FGD [48] | 51.6 |
| DOTA | Faster RCNN-Res18 | +MGD [54] | 56.9 |
| DOTA | Faster RCNN-Res18 | +Ours (CFGM only) | 56.7 |
| DOTA | Faster RCNN-Res18 | +Ours (full CDFD) | 56.9 |
| DOTA | Faster RCNN-Res18 | Our gain | +8.5 |
Table 4. Comparison with state-of-the-art distillation approaches on a general object detection dataset (AP_S, AP_M, and AP_L denote AP on small, medium, and large objects).

| Model | mAP | AP_S | AP_M | AP_L |
|---|---|---|---|---|
| Faster RCNN-ResNeXt64 (T1) | 42.1 | 24.8 | 46.2 | 55.3 |
| Faster RCNN-ResNeXt32 (T2) | 41.2 | 24.0 | 45.5 | 53.5 |
| Faster RCNN-Res101 (T3) | 39.8 | 22.5 | 43.6 | 52.8 |
| Faster RCNN-Res50 (S) | 38.4 | 21.5 | 42.1 | 50.3 |
| +FGFI [55] | 39.3 | 22.5 | 42.3 | 52.2 |
| +GID [56] | 40.2 | 22.7 | 44.0 | 53.2 |
| +FGD [48] | 40.5 | 22.6 | 44.7 | 53.2 |
| +Ours | 40.9 | 23.4 | 44.8 | 53.7 |
| Our gain | +2.5 | +1.9 | +2.7 | +3.4 |
Citation: Gu, L.; Fang, Q.; Wang, Z.; Popov, E.; Dong, G. Learning Lightweight and Superior Detectors with Feature Distillation for Onboard Remote Sensing Object Detection. Remote Sens. 2023, 15, 370. https://doi.org/10.3390/rs15020370