High-Resolution Drone Detection Based on Background Difference and SAG-YOLOv5s
Abstract
1. Introduction
2. High-Resolution Drone Detection
2.1. Extracting Potential Targets through Background Difference
2.2. SAG-YOLOv5s, a Lightweight Detection Network
2.2.1. The Ghost Module
2.2.2. YOLOv5s
2.2.3. SimAM Attention Module
2.2.4. SAG-YOLOv5s
- The CBS basic convolution block is replaced with GhostConv. Ordinary convolution introduces a large number of parameters, whereas the Ghost module greatly reduces the total parameter count of the model. In the Ghost module, to obtain the n-dimensional output feature maps from the input, ordinary convolution is first used to generate m intrinsic feature maps, and cheap linear transformations are then applied to these m intrinsic feature maps to generate the remaining n − m "ghost" feature maps. Finally, the two parts are concatenated to obtain the output. The number of parameters of ordinary convolution is therefore about s = n/m times that of the Ghost module. In this paper, the number of output channels is n, the number of intrinsic feature maps is taken as m = n/2 (i.e., s = 2), and the BN and bias terms are neglected. The acceleration ratio is therefore about 2, and GhostConv has only about half the parameters of CBS. A minimal GhostConv sketch is given after this list.
- The SimAM attention mechanism is introduced, and SAGBottleneck is designed as the basic building block of SAG-YOLOv5s. To better suppress complex backgrounds and focus on drone targets during feature extraction, the SimAM attention module and the Ghost module are integrated and embedded into the Bottleneck structure of YOLOv5s. The SAGBottleneck structure is shown in Figure 6a. It resembles a residual structure, consisting mainly of two 1 × 1 GhostConv modules and a skip connection. The first GhostConv reduces the number of input channels, thereby reducing the feature dimension. The SimAM attention mechanism is embedded in the second GhostConv module to improve feature extraction and to restore the feature dimension to match the input; the input and the output of the two branches are then added through the skip connection. In the second GhostConv, the SimAM attention module is fused before the BN operation so that the output remains properly normalized, and the SiLU activation function is removed, because applying SiLU would make the distributions of the inputs to the current layer and the next layer differ, slowing model convergence during training. In addition, the SAGBottleneck is integrated into the CSP_n structure to obtain the SAGCSP_n structure, shown in Figure 6b, which consists of three 1 × 1 GhostConv modules and n SAGBottlenecks. Sketches of SimAM and a simplified SAGBottleneck also follow this list.
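As a rough illustration of the parameter saving described above, the following PyTorch sketch implements a GhostConv-style block with split ratio s = 2: half of the output channels come from an ordinary convolution and the other half from a cheap depthwise transformation. The module name, the 5 × 5 kernel of the cheap operation, and the activation choice are illustrative assumptions, not the authors' released code.

```python
# Minimal GhostConv sketch (PyTorch), assuming split ratio s = 2 as in the paper:
# half of the output channels come from an ordinary convolution (intrinsic maps),
# the other half from a cheap depthwise transformation (ghost maps).
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    def __init__(self, c_in, c_out, k=1, stride=1):
        super().__init__()
        assert c_out % 2 == 0, "sketch assumes an even number of output channels"
        c_intrinsic = c_out // 2  # m = n / 2 intrinsic feature maps
        # Ordinary convolution generating the intrinsic feature maps.
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_intrinsic, k, stride, k // 2, bias=False),
            nn.BatchNorm2d(c_intrinsic),
            nn.SiLU(),
        )
        # Cheap linear transformation: depthwise convolution producing the ghost maps.
        self.cheap = nn.Sequential(
            nn.Conv2d(c_intrinsic, c_intrinsic, 5, 1, 2,
                      groups=c_intrinsic, bias=False),
            nn.BatchNorm2d(c_intrinsic),
            nn.SiLU(),
        )

    def forward(self, x):
        y = self.primary(x)
        # Concatenate intrinsic and ghost maps to obtain the n-dimensional output.
        return torch.cat([y, self.cheap(y)], dim=1)
```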
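Similarly, a minimal sketch of the SimAM attention module and a simplified SAGBottleneck is shown below, reusing the GhostConv sketch above. SimAM follows the parameter-free energy formulation of Yang et al.; the SAGBottleneck here only approximates the structure described in the text (channel reduction, SimAM fused before BN with SiLU removed, skip connection), with the second block simplified to a plain 1 × 1 convolution and a hidden-channel ratio chosen as an assumption.

```python
# Parameter-free SimAM attention and a simplified SAGBottleneck following the
# description in the text; the 0.5 hidden-channel ratio and the plain 1x1
# convolution in the second block are assumptions, not the authors' exact design.
import torch
import torch.nn as nn

class SimAM(nn.Module):
    def __init__(self, e_lambda=1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w - 1
        # Squared deviation of each activation from its channel mean.
        d = (x - x.mean(dim=[2, 3], keepdim=True)).pow(2)
        v = d.sum(dim=[2, 3], keepdim=True) / n
        # Inverse-energy weights: more distinctive neurons receive larger weights.
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)

class SAGBottleneck(nn.Module):
    def __init__(self, c, hidden_ratio=0.5):
        super().__init__()
        c_hidden = int(c * hidden_ratio)
        self.reduce = GhostConv(c, c_hidden, k=1)             # first 1x1 GhostConv: reduce channels
        self.expand = nn.Conv2d(c_hidden, c, 1, bias=False)   # restore channels (simplified)
        self.simam = SimAM()
        self.bn = nn.BatchNorm2d(c)                            # no SiLU after this BN

    def forward(self, x):
        y = self.reduce(x)
        y = self.bn(self.simam(self.expand(y)))  # SimAM fused before BN
        return x + y                              # skip connection
```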
2.3. α-DIoU Loss for Bounding Box Regression
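Section 2.3 appears here only by its heading; for reference, the α-DIoU loss from the Alpha-IoU family of He et al. takes the following form, where b and b^gt are the centers of the predicted and ground-truth boxes, ρ is their Euclidean distance, c is the diagonal length of the smallest enclosing box, and α (commonly set to 3) is the power parameter:

```latex
\mathcal{L}_{\alpha\text{-DIoU}}
  = 1 - \mathrm{IoU}^{\alpha}
  + \frac{\rho^{2\alpha}\!\left(\mathbf{b},\,\mathbf{b}^{gt}\right)}{c^{2\alpha}}
```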
3. Experiment and Result Analysis
3.1. Dataset
3.2. Experimental Results and Evaluation
- From the data in the first and second rows, it can be seen that after using the background difference, the mAP value increased from 73.3% to 97.0%, an increase of 23.7 percentage points. This is because the original YOLOv5s directly downscales the high-resolution image to a fixed size before feature extraction, which easily causes small targets to be lost and makes it difficult to extract effective features, resulting in poor detection performance (an illustrative sketch of such candidate-region extraction follows this list).
- Comparing the third row with the second row, and the fifth row with the fourth row, introducing the SimAM attention mechanism increases mAP by 0.5 and 1.8 percentage points, respectively, indicating that SimAM pays more attention to drone targets during feature extraction and performs particularly well on lightweight networks.
- Comparing our SAG-YOLOv5s with YOLOv5s, the number of floating-point operations is reduced from 16.4 GFLOPs to 8.8 GFLOPs, showing that our model has lower complexity. After combining the background difference with SAG-YOLOv5s, the detection speed with 416 × 416 candidate regions reaches 13.2 FPS, 4.2 FPS higher than before the introduction of the Ghost module.
- Comparing the sixth row with the fifth row, reducing the candidate region size from 416 × 416 to 96 × 96 improves the detection speed but causes a slight decrease in mAP, mainly because the 96 × 96 input reduces the detection accuracy for larger drones. In practical applications, the candidate region size can be adjusted according to the distance between the fixed camera and the monitored drones.
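As a rough illustration of the candidate-region idea referenced above (the authors' exact background-difference pipeline is described in Section 2.1 and is not reproduced here), the sketch below uses OpenCV frame differencing to extract fixed-size candidate windows around moving blobs in a high-resolution frame before handing them to the detector. The threshold, morphology kernel, and 416 × 416 crop size are illustrative assumptions.

```python
# Hypothetical candidate-region extraction via frame differencing (OpenCV).
# Assumes grayscale frames larger than the crop size.
import cv2
import numpy as np

def extract_candidates(prev_gray, curr_gray, crop=416, thresh=25):
    diff = cv2.absdiff(curr_gray, prev_gray)                  # background difference
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_DILATE, kernel)   # connect fragmented blobs
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    h, w = curr_gray.shape[:2]
    boxes = []
    for cnt in contours:
        x, y, bw, bh = cv2.boundingRect(cnt)
        # Center a fixed-size candidate window on each moving blob,
        # clamped to the image borders.
        cx, cy = x + bw // 2, y + bh // 2
        x0 = int(np.clip(cx - crop // 2, 0, w - crop))
        y0 = int(np.clip(cy - crop // 2, 0, h - crop))
        boxes.append((x0, y0, crop, crop))
    return boxes
```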
3.3. Results Visualization
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Yoshihashi, R.; Kawakami, R.; You, S.; Trinh, T.T.; Iida, M.; Naemura, T. Finding a Needle in a Haystack: Tiny Flying Object Detection in 4K Videos using a Joint Detection-and-Tracking Approach. arXiv 2021, arXiv:2105.08253.
- Samaras, S.; Diamantidou, E.; Ataloglou, D.; Sakellariou, N.; Vafeiadis, A.; Magoulianitis, V.; Lalas, A.; Dimou, A.; Zarpalas, D.; Votis, K.; et al. Deep Learning on Multi Sensor Data for Counter UAV Applications—A Systematic Review. Sensors 2019, 19, 4837.
- Aker, C.; Kalkan, S. Using deep networks for drone detection. In Proceedings of the 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017; pp. 1–6.
- Craye, C.; Ardjoune, S. Spatio-temporal semantic segmentation for drone detection. In Proceedings of the 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan, 18–21 September 2019; pp. 1–5.
- Sommer, L.; Schumann, A.; Müller, T.; Schuchert, T.; Beyerer, J. Flying object detection for automatic UAV recognition. In Proceedings of the 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017; pp. 1–6.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv 2015, arXiv:1506.01497.
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint triplets for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 6569–6578.
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
- Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 11863–11874.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020.
- He, J.; Erfani, S.; Ma, X.; Bailey, J.; Chi, Y.; Hua, X.S. Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression. Adv. Neural Inf. Process. Syst. 2021, 34, 20230–20242.
- Garcia-Garcia, B.; Bouwmans, T.; Silva, A.J.R. Background subtraction in real applications: Challenges, current models and future directions. Comput. Sci. Rev. 2020, 35, 100204.
- Ultralytics YOLOv5. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 20 May 2021).
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Hu, J.; Shen, L.; Sun, G.; Wu, E. Squeeze-and-excitation networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. UnitBox: An advanced object detection network. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 516–520.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
- Coluccia, A.; Fascista, A.; Schumann, A.; Sommer, L.; Dimou, A.; Zarpalas, D.; Akyon, C.F.; Eryuksel, O.; Ozfuttu, A.K.; Altinuc, O.S.; et al. Drone-vs-Bird Detection Challenge at IEEE AVSS2021. In Proceedings of the 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Washington, DC, USA, 16–19 November 2021; pp. 1–8.
- Coluccia, A.; Fascista, A.; Schumann, A.; Sommer, L.; Ghenescu, M.; Piatrik, T.; De Cubber, G.; Nalamati, M.; Kapoor, A.; Saqib, M.; et al. Drone vs. Bird Detection Challenge at IEEE AVSS2019. In Proceedings of the 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan, 18–21 September 2019.
- Van Etten, A. You Only Look Twice: Rapid multi-scale object detection in satellite imagery. arXiv 2018, arXiv:1805.09512.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
| Method | mAP@0.5 (%) | Parameters (×10⁶) | FLOPs (×10⁹) | FPS | Input Size (pixels) |
|---|---|---|---|---|---|
| YOLOv5s | 73.3 | 7.06 | 16.4 | 21.0 | 416 × 416 |
| BD + YOLOv5s | 97.0 | 7.06 | 16.4 | 9.2 | 416 × 416 |
| BD + YOLOv5s + SimAM | 97.5 | 7.06 | 16.4 | 9.0 | 416 × 416 |
| BD + Ghost + YOLOv5s | 95.8 | 3.90 | 8.8 | 13.4 | 416 × 416 |
| BD + SAG-YOLOv5s | 97.6 | 3.90 | 8.8 | 13.2 | 416 × 416 |
| BD + SAG-YOLOv5s | 97.2 | 3.90 | 8.8 | 15.0 | 96 × 96 |
| Method | Precision (%) | Recall (%) | mAP@0.5 (%) | FPS |
|---|---|---|---|---|
| CenterNet | 87.2 | 69.5 | 69.8 | 10.5 |
| YOLT | 92.9 | 88.0 | 91.1 | 4.6 |
| YOLT-YOLOv5s | 95.3 | 92.8 | 95.2 | 6.0 |
| Ours | 97.3 | 95.5 | 97.6 | 13.2 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).