Towards Efficient Detection for Small Objects via Attention-Guided Detection Network and Data Augmentation
Abstract
1. Introduction
- We use a data augmentation strategy that divides images into sub-images, which increases the number of objects in a more targeted way, and we apply non-maximum suppression to reduce the duplicates introduced by the division (a minimal tiling-and-merge sketch follows this list).
- We use an image pyramid mechanism to upsample the divided images, which not only enlarges small objects but also spreads out densely packed objects, making dense small objects easier to detect.
- We add a detection head on the low-level feature map so that the detector can handle objects of multiple sizes.
- An attention mechanism is added to each detection head so that it can exploit contextual information and more easily attend to dense small objects in the image (see the attention-head sketch after this list).
- Experiments verify that the proposed method significantly improves the detection performance on small objects.
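The Python sketch below illustrates the split-upsample-merge idea behind the first two contributions. It is a minimal illustration under stated assumptions, not the released implementation: the `detect` callable, the tile size, the overlap ratio, and the upsampling factor are placeholders, and only `torchvision.ops.nms` is assumed from an existing library.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import nms


def _starts(size, tile, step):
    # Tile start coordinates; the last tile is kept flush with the image border.
    s = list(range(0, max(size - tile, 0) + 1, step))
    if s[-1] + tile < size:
        s.append(size - tile)
    return s


def tiled_detect(image, detect, tile=640, overlap=0.2, scale=2.0, iou_thr=0.5):
    """Split `image` (C, H, W) into overlapping tiles, upsample each tile,
    run the placeholder `detect` on every tile, map boxes back to the full
    image, and merge the duplicates produced by overlapping tiles with NMS."""
    _, h, w = image.shape
    step = int(tile * (1.0 - overlap))
    boxes, scores, labels = [], [], []
    for y0 in _starts(h, tile, step):
        for x0 in _starts(w, tile, step):
            crop = image[:, y0:y0 + tile, x0:x0 + tile]
            # Image-pyramid step: upsample the crop so small objects become
            # larger and densely packed objects are spread over more pixels.
            up = F.interpolate(crop.unsqueeze(0), scale_factor=scale,
                               mode="bilinear", align_corners=False)
            b, s, c = detect(up)        # xyxy boxes (N, 4), scores (N,), classes (N,)
            b = b / scale               # undo the upsampling
            b[:, [0, 2]] += x0          # shift back to full-image coordinates
            b[:, [1, 3]] += y0
            boxes.append(b); scores.append(s); labels.append(c)
    boxes, scores, labels = torch.cat(boxes), torch.cat(scores), torch.cat(labels)
    # Class-agnostic NMS removes detections duplicated across overlapping tiles;
    # torchvision.ops.batched_nms could be used instead for class-aware merging.
    keep = nms(boxes, scores, iou_thr)
    return boxes[keep], scores[keep], labels[keep]
```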
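The attention-augmented head of the last two contributions could look roughly like the following sketch. The channel width, head count, anchor number, and the placement of self-attention before the 1x1 prediction convolution are illustrative assumptions, not the exact Multi-Head-Attention-Yolo design.

```python
import torch
import torch.nn as nn


class AttentionHead(nn.Module):
    """Self-attention over one feature map, followed by a 1x1 convolution that
    predicts (4 box offsets + objectness + class scores) per anchor, YOLO-style."""

    def __init__(self, channels=256, num_heads=8, num_anchors=3, num_classes=10):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)
        self.pred = nn.Conv2d(channels, num_anchors * (5 + num_classes), kernel_size=1)

    def forward(self, x):                    # x: (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)   # (B, H*W, C): one token per spatial location
        ctx, _ = self.attn(seq, seq, seq)    # every location attends to the whole map (context)
        seq = self.norm(seq + ctx)           # residual connection preserves local detail
        x = seq.transpose(1, 2).reshape(b, c, h, w)
        return self.pred(x)                  # (B, num_anchors * (5 + num_classes), H, W)


# Toy usage on a 20x20 feature map; num_classes=10 mirrors the VisDrone classes.
feat = torch.randn(1, 256, 20, 20)
out = AttentionHead()(feat)
print(out.shape)  # torch.Size([1, 45, 20, 20])
```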
2. Related Work
2.1. Data Augmentation
2.2. Small Object Detection
3. Method
3.1. Overview of Yolov5
3.2. Data Augmentation Algorithms
3.3. Image Pyramid Mechanism
3.4. Multi-Head-Attention-Yolo
4. Experiments
4.1. Experimental Setting
4.2. Results
4.3. Ablation Studies
4.3.1. Effect of Data Augmentation
4.3.2. Effect of Multiple Detection Heads
4.3.3. Effect of Attention Mechanism
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Karaoguz, H.; Jensfelt, P. Object detection approach for robot grasp detection. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 4953–4959.
- Gupta, A.; Anpalagan, A.; Guan, L.; Khwaja, A.S. Deep learning for object detection and scene perception in self-driving cars: Survey, challenges, and open issues. Array 2021, 10, 100057.
- Pickering, A.; Kingsbury, N. Object search using wavelet-based polar matching for aerial imagery. In Proceedings of the Sensor Signal Processing for Defence 2011, London, UK, 27–29 September 2011; pp. 1–5.
- Kisantal, M.; Wojna, Z.; Murawski, J.; Naruniec, J.; Cho, K. Augmentation for small object detection. arXiv 2019, arXiv:1902.07296.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
- Everingham, M.; Winn, J. The PASCAL visual object classes challenge 2012 (VOC2012) development kit. Pattern Anal. Stat. Model. Comput. Learn. Tech. Rep. 2011, 8, 5.
- Jocher, G.; Stoken, A.; Borovec, J.; NanoCode012; Chaurasia, A.; TaoXie; Liu, C.; Abhiram, V.; Laughing; tkianai; et al. ultralytics/yolov5: v5.0—YOLOv5-P6 1280 Models, AWS, Supervise.ly and YouTube Integrations (v5.0). Zenodo. 2021. Available online: https://zenodo.org/record/4679653/export/hx#.Y0Ik3D8RWUk (accessed on 5 October 2022).
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
- Gao, M.; Wang, F.; Liu, J.; Song, P.; Chen, J.; Yang, H.; Mu, H.; Qi, D.; Chen, M.; Wang, Y.; et al. Estimation of the volatile neural network with attention mechanism and transfer learning on wood knot defect classification. J. Appl. Phys. 2022, 131, 233101.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
- Wan, L.; Zeiler, M.; Zhang, S.; Le Cun, Y.; Fergus, R. Regularization of neural networks using DropConnect. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1058–1066.
- DeVries, T.; Taylor, G.W. Improved regularization of convolutional neural networks with Cutout. arXiv 2017, arXiv:1708.04552.
- Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. Mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412.
- Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. CutMix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 6023–6032.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Yin, Y.; Lei, L.; Liang, M.; Li, X.; He, Y.; Qin, L. Research on Fall Detection Algorithm for the Elderly Living Alone Based on YOLO. In Proceedings of the 2021 IEEE International Conference on Emergency Science and Information Technology (ICESIT), Chongqing, China, 22–24 November 2021; pp. 403–408.
- Bell, S.; Zitnick, C.L.; Bala, K.; Girshick, R. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2874–2883.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–27 July 2017; pp. 2117–2125.
- Chen, X.; Gupta, A. Spatial memory for context reasoning in object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4086–4096.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 63, 2672–2680.
- Li, J.; Liang, X.; Wei, Y.; Xu, T.; Feng, J.; Yan, S. Perceptual generative adversarial networks for small object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1222–1230.
- Neubeck, A.; Van Gool, L. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; Volume 3, pp. 850–855.
- Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Korea, 27–28 October 2019.
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 15–17 June 2018.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
Method | | | | | | |
---|---|---|---|---|---|---|
Faster R-CNN | 34.1 | 56.7 | 27.2 | 46.9 | 43.2 | 47.5 |
Swin transformer Mask R-CNN | 37.5 | 59.9 | 30.9 | 50.0 | 51.4 | 50.3 |
Yolov5 | 40.3 | 64.1 | 32.5 | 53.3 | 58.2 | 58.5 |
Ours | 41.8 | 66.1 | 33.7 | 54.2 | 59.4 | 59.2 |
Method | | | |
---|---|---|---|
Yolov5 + no data augmentation | 36.8 | 58.5 | 38.9 |
TPH-Yolov5 + no data augmentation | 38.3 | 60.3 | 40.8 |
Ours | 41.8 | 66.1 | 43.6 |
Method | Pedestrian | People | Bicycle | Car | Van | Truck | Tricycle | Awning-Tricycle | Bus | Motor |
---|---|---|---|---|---|---|---|---|---|---|
Yolov5 + no data augmentation | 35.9 | 24.8 | 21.3 | 65.3 | 43.8 | 39.1 | 29.9 | 17.9 | 56.9 | 33.6 |
TPH-Yolov5 + no data augmentation | 39.2 | 27.1 | 23.3 | 68.1 | 45.4 | 39.6 | 29.5 | 18.4 | 56.3 | 36.1 |
Ours | 41.4 | 29.5 | 25.3 | 68.1 | 49.9 | 43.1 | 35.2 | 21.9 | 61.5 | 39.3 |
Method | | | | | | |
---|---|---|---|---|---|---|
Faster R-CNN | 44.1 | 71.0 | 35.7 | 49.3 | 46.4 | 53.8 |
Swin transformer Mask R-CNN | 46.9 | 72.9 | 36.8 | 51.4 | 52.4 | 56.8 |
Yolov5 | 49.8 | 75.6 | 37.0 | 52.9 | 54.9 | 61.1 |
Ours | 50.7 | 76.9 | 37.8 | 53.4 | 54.7 | 61.5 |
Method | | | | | |
---|---|---|---|---|---|
Yolov5 | 36.8 | 58.5 | 28.1 | 49.7 | 59.7 |
Yolov5 + data augmentation + image pyramid | 40.3 | 64.1 | 32.5 | 53.3 | 58.2 |
Yolov5 + previous + multiple detection heads | 41.0 | 65.0 | 33.4 | 53.3 | 57.0 |
Yolov5 + previous + attention | 41.8 | 66.1 | 33.7 | 54.2 | 59.4 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).