A Dense Feature Pyramid Network for Remote Sensing Object Detection
Abstract
1. Introduction
2. Related Work
3. Methods
3.1. Attention-Based Feature Extraction Network
- The feature map is divided into G groups along the channel dimension;
- Global average pooling is performed on each group to obtain the semantic vector g;
- The vector g is element-wise dotted with the group feature at every spatial position, yielding an initial attention map that measures how similar each position is to the group's global semantics;
- The attention map is normalized over the spatial positions and passed through a sigmoid activation;
- The activated attention map is element-wise multiplied with the original group feature;
- Finally, the enhanced feature map is generated (a minimal sketch follows this list).
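The following PyTorch sketch illustrates these steps. It follows the public SGE design by Li et al. rather than the authors' own code; the number of groups and the learnable scale/bias in the normalization step are assumptions carried over from the SGE paper.

```python
import torch
import torch.nn as nn

class SGE(nn.Module):
    """Spatial group-wise enhance: per-group spatial attention."""
    def __init__(self, groups=8):
        super().__init__()
        self.groups = groups
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # learnable per-group scale and bias for the normalized attention map
        self.weight = nn.Parameter(torch.ones(1, groups, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, groups, 1, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        x = x.view(b * self.groups, -1, h, w)      # split channels into G groups
        g = self.avg_pool(x)                       # global average pooling -> g
        attn = (x * g).sum(dim=1, keepdim=True)    # dot g with each position
        attn = attn.view(b * self.groups, -1)
        attn = (attn - attn.mean(dim=1, keepdim=True)) / \
               (attn.std(dim=1, keepdim=True) + 1e-5)   # spatial normalization
        attn = attn.view(b, self.groups, h, w) * self.weight + self.bias
        attn = torch.sigmoid(attn).view(b * self.groups, 1, h, w)
        return (x * attn).view(b, c, h, w)         # rescale the group features
```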
3.2. Detection Layer for Small Objects
3.3. Multiscale Feature Fusion Based on Dense Feature Pyramids
3.4. K-Means for Anchor Boxes
- 1. Randomly select k samples as the initial cluster centroids, each centroid corresponding to the center of a candidate anchor box;
- 2. For each sample in the dataset, calculate the distance from its ground-truth bounding box to each cluster centroid and assign the sample to the cluster with the smallest distance, as shown in Equations (2) and (3), where bbox represents the bounding box and d(bbox, centroid) represents the distance between the cluster centroid and the center of the bbox;
- 3. Recalculate the centroid of each cluster;
- 4. Repeat steps 2 and 3 until the clusters converge (see the sketch after this list).
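Equations (2) and (3) are not reproduced in this outline, so the sketch below assumes the YOLOv2-style distance d(bbox, centroid) = 1 − IoU(bbox, centroid), which clusters bounding boxes by width and height alone; the default k = 12 (three anchors per detection scale over four scales) is likewise our assumption.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (w, h) pairs, with all boxes aligned at a common center."""
    w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
          + (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=12, iters=100, seed=0):
    """Cluster ground-truth (w, h) pairs into k anchor boxes."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]  # step 1
    for _ in range(iters):
        d = 1.0 - iou_wh(boxes, centroids)       # step 2: d = 1 - IoU
        assign = d.argmin(axis=1)
        new = np.array([boxes[assign == i].mean(axis=0)              # step 3
                        if np.any(assign == i) else centroids[i]
                        for i in range(k)])
        if np.allclose(new, centroids):          # step 4: converged
            break
        centroids = new
    return centroids[np.argsort(centroids.prod(axis=1))]  # sort anchors by area
```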
4. Experiments and Results
4.1. Datasets
- 4993 aircraft in 446 images;
- 191 playgrounds in 189 images;
- 180 overpasses in 176 images;
- 1586 oil tanks in 165 images.
4.2. Evaluation Metrics
4.3. Experimental Design
4.4. Results and Analysis
4.4.1. Experimental Results of DFPN-YOLO
4.4.2. Results of the Comparison Experiment
4.4.3. Ablation Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Layers | Filters | Kernel / Stride / Padding | Output Size |
---|---|---|---|
Convolutional | 32 | k = 3, s = 1, p = 1 | 416 × 416 × 32 |
Convolutional | 64 | k = 3, s = 2, p = 1 | 208 × 208 × 64 |
Convolutional | 32 | k = 1, s = 1, p = 0 | 208 × 208 × 32 |
SGEresidual | 64 | k = 3, s = 1, p = 1 | 208 × 208 × 64 |
Convolutional | 128 | k = 3, s = 2, p = 1 | 104 × 104 × 128 |
2 × Convolutional | 64 | k = 1, s = 1, p = 0 | 104 × 104 × 64 |
2 × SGEresidual | 128 | k = 3, s = 1, p = 1 | 104 × 104 × 128 |
Convolutional | 256 | k = 3, s = 2, p = 1 | 52 × 52 × 256 |
8 × Convolutional | 128 | k = 1, s = 1, p = 0 | 52 × 52 × 128 |
8 × SGEresidual | 256 | k = 3, s = 1, p = 1 | 52 × 52 × 256 |
Convolutional | 512 | k = 3, s = 2, p = 1 | 26 × 26 × 512 |
8 × Convolutional | 256 | k = 1, s = 1, p = 0 | 26 × 26 × 256 |
8 × SGEresidual | 512 | k = 3, s = 1, p = 1 | 26 × 26 × 512 |
Convolutional | 1024 | k = 3, s = 2, p = 1 | 13 × 13 × 1024 |
4 × Convolutional | 512 | k = 1, s = 1, p = 0 | 13 × 13 × 512 |
4 × SGEresidual | 1024 | k = 3, s = 1, p = 1 | 13 × 13 × 1024 |
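The table above interleaves 1 × 1 bottleneck convolutions with SGEresidual units. As a reading aid, here is one plausible form of a single unit, pairing the 1 × 1 and 3 × 3 convolutions inside a residual shortcut with the SGE module from Section 3.1 on the branch; the exact placement of SGE within the block is our assumption, not stated in the table.

```python
import torch.nn as nn

def conv_bn(c_in, c_out, k, s, p):
    """Conv + BatchNorm + LeakyReLU, the Darknet-style building block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, p, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),
    )

class SGEResidual(nn.Module):
    """1x1 bottleneck -> 3x3 conv -> SGE, added back to the input."""
    def __init__(self, channels, groups=8):
        super().__init__()
        self.conv1 = conv_bn(channels, channels // 2, k=1, s=1, p=0)
        self.conv2 = conv_bn(channels // 2, channels, k=3, s=1, p=1)
        self.sge = SGE(groups)  # the SGE module sketched in Section 3.1

    def forward(self, x):
        return x + self.sge(self.conv2(self.conv1(x)))
```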
Class | Train | Val | Test |
---|---|---|---|
Aircraft | 268 | 88 | 90 |
Oil tank | 93 | 36 | 36 |
Overpass | 106 | 35 | 35 |
Playground | 113 | 38 | 38 |
Total | 580 | 197 | 199 |
Class | Train | Val | Test |
---|---|---|---|
Airplane | 344 | 338 | 705 |
Airport | 326 | 327 | 657 |
Baseball field | 551 | 557 | 1312 |
Basketball court | 336 | 329 | 704 |
Bridge | 379 | 495 | 1302 |
Chimney | 202 | 204 | 448 |
Dam | 238 | 246 | 502 |
Expressway service area | 279 | 281 | 565 |
Expressway toll station | 285 | 299 | 634 |
Golf field | 216 | 239 | 491 |
Ground track field | 536 | 454 | 1322 |
Harbor | 328 | 332 | 814 |
Overpass | 410 | 510 | 1099 |
Ship | 650 | 652 | 1400 |
Stadium | 289 | 292 | 619 |
Storage tank | 391 | 384 | 839 |
Tennis court | 605 | 630 | 1347 |
Train station | 244 | 549 | 501 |
Vehicle | 1556 | 1558 | 3306 |
Windmill | 403 | 404 | 809 |
Total | 5862 | 5863 | 11,738 |
Method | Faster RCNN | SSD | YOLOv2 | YOLOv3 | YOLOv3-SPP | YOLOv4 | Ours |
---|---|---|---|---|---|---|---|
Class 1 | 54.5 | 60.1 | 58.5 | 76.2 | 76.7 | 79.1 | 80.2 |
Class 2 | 70.2 | 61.8 | 52.4 | 66.9 | 67.2 | 72.7 | 76.8 |
Class 3 | 63.6 | 67.5 | 70.6 | 72.0 | 71.4 | 73.2 | 72.7 |
Class 4 | 82.4 | 59.2 | 66.2 | 85.6 | 86.2 | 88.4 | 89.1 |
Class 5 | 43.1 | 34.5 | 37.1 | 34.2 | 39.6 | 40.2 | 43.4 |
Class 6 | 74.7 | 66.0 | 70.0 | 73.6 | 75.3 | 76.3 | 76.9 |
Class 7 | 59.1 | 46.2 | 51.4 | 55.2 | 62.4 | 66.5 | 72.3 |
Class 8 | 65.4 | 57.8 | 55.7 | 56.7 | 55.1 | 58.8 | 59.8 |
Class 9 | 62.8 | 54.3 | 55.9 | 55.2 | 53.9 | 56.0 | 56.4 |
Class 10 | 74.9 | 66.8 | 68.9 | 64.1 | 68.3 | 68.1 | 74.3 |
Class 11 | 75.3 | 70.1 | 66.2 | 71.4 | 72.8 | 72.4 | 71.6 |
Class 12 | 44.2 | 26.3 | 42.1 | 51.6 | 52.4 | 57.5 | 63.1 |
Class 13 | 52.9 | 47.2 | 50.9 | 54.3 | 56.0 | 57.2 | 58.7 |
Class 14 | 72.2 | 58.4 | 66.2 | 75.2 | 79.6 | 78.8 | 81.5 |
Class 15 | 57.1 | 51.7 | 51.3 | 37.4 | 42.9 | 38.8 | 40.1 |
Class 16 | 51.2 | 50.2 | 49.6 | 66.2 | 62.1 | 70.7 | 74.2 |
Class 17 | 79.8 | 64.5 | 67.4 | 84.3 | 85.5 | 85.4 | 85.8 |
Class 18 | 51.3 | 42.3 | 39.3 | 50.7 | 58.7 | 64.4 | 73.6 |
Class 19 | 45.0 | 37.2 | 40.2 | 41.5 | 42.0 | 46.6 | 49.7 |
Class 20 | 80.7 | 62.2 | 55.8 | 75.8 | 79.1 | 83.5 | 86.5 |
mAP (%) | 63.02 | 54.22 | 55.79 | 62.41 | 64.37 | 66.73 | 69.33 |
Method | Faster RCNN | SSD | YOLOv2 | YOLOv3 | YOLOv3-SPP | YOLOv4 | Ours |
---|---|---|---|---|---|---|---|
Class 1 | 90.4 | 70.6 | 69.3 | 86.2 | 90.6 | 94.3 | 97.6 |
Class 2 | 89.2 | 81.3 | 84.8 | 88.7 | 87.2 | 92.1 | 97.8 |
Class 3 | 87.6 | 77.5 | 70.7 | 86.1 | 91.5 | 94.6 | 93.3 |
Class 4 | 73.2 | 69.2 | 66.1 | 74.6 | 80.3 | 84.0 | 79.3 |
mAP (%) | 85.1 | 74.7 | 72.7 | 83.9 | 87.4 | 91.3 | 92.0 |
Experiment | Exp 1 | Exp 2 | Exp 3 | Exp 4 |
---|---|---|---|---|
SGE | | √ | √ | √ |
Scale 4 | | | √ | √ |
DFPN | | | | √ |
Class 1 | 86.2 | 90.2 | 91.7 | 97.6 |
Class 2 | 88.7 | 91.6 | 92.2 | 97.8 |
Class 3 | 86.1 | 88.4 | 91.4 | 93.3 |
Class 4 | 74.6 | 75.3 | 75.0 | 79.3 |
mAP (%) | 83.9 | 86.4 | 87.8 | 92.0 |