YOLO-S: A Lightweight and Accurate YOLO-like Network for Small Target Detection in Aerial Imagery
Abstract
1. Introduction
- We design YOLO-S, a small and fast network with a single fine-grained output scale. It exploits residual connections and the fusion of 4×, 8× and 16× down-sampled feature maps via upsampling and reshape–passthrough layers, in order to strengthen feature propagation and reuse and to improve target localization accordingly.
- We also design YOLO-L, or YOLO-large, a baseline CNN that detects at three resolution levels corresponding to 4×, 8× and 16× down-scaled layers. It is focused on accuracy rather than speed and, with an FPS close to that of YOLOv3, is suitable only for offline data processing.
- We prepared two different vehicle datasets for the experiments: VEDAI and AIRES, a new dataset for cAr detectIon fRom hElicopter imageS. In addition, we run experiments on SARD (SeArch and Rescue image Dataset) [4] to verify how well YOLO-S generalizes to the context of search and rescue (SAR);
- We also propose a double-stage training procedure: the network is first fine-tuned on a dataset composed of DOTA and VEDAI, and then trained on the specific vehicle dataset of interest. Domain knowledge transfer is usually realized by fine-tuning a learning model from weights pre-trained on COCO or ImageNet [35]. However, since large publicly available datasets do not include small vehicles seen from aerial images, the basic single-stage trained model suffers from a domain gap and can under-perform with respect to the proposed double-stage training.
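The reshape–passthrough layer mentioned in the first contribution can be illustrated with a space-to-depth rearrangement: a `block=2` step halves each spatial dimension and quadruples the channel count (e.g., a 104×104×128 map becomes 52×52×512, matching the fusion in YOLO-S). This is a minimal NumPy sketch for illustration only, not the authors' implementation:

```python
import numpy as np

def reshape_passthrough(x, block=2):
    """Space-to-depth: (H, W, C) -> (H//block, W//block, C*block*block)."""
    h, w, c = x.shape
    # Split each spatial axis into (coarse, fine) factors...
    x = x.reshape(h // block, block, w // block, block, c)
    # ...move the fine factors next to the channel axis...
    x = x.transpose(0, 2, 1, 3, 4)
    # ...and fold them into the channels.
    return x.reshape(h // block, w // block, c * block * block)

# Example: 104x104x128 feature map -> 52x52x512, ready to be
# concatenated with an 8x down-sampled branch of matching size.
fm = np.zeros((104, 104, 128), dtype=np.float32)
print(reshape_passthrough(fm).shape)  # (52, 52, 512)
```

Unlike plain striding, no activations are discarded: every input value survives, only rearranged, which is why the layer preserves fine-grained detail for small targets.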
2. Materials and Methods
2.1. AIRES: The Proposed Vehicles Dataset
2.2. DOTAv2 and VEDAI Datasets
2.3. The Search and Rescue Dataset (SARD) for Person Detection
2.4. The Proposed Networks YOLO-L and YOLO-S
2.5. State-of-the-Art Detectors
2.6. The Experiments
2.6.1. Experiment 1 on AIRES Dataset
2.6.2. Experiment 2 on AIRES Dataset
2.6.3. Experiment 1 on VEDAI Dataset, Double-Stage Sliding
2.6.4. Experiment 2 on VEDAI Dataset, Single-Stage Full-Sized
2.6.5. Experiment on SARD Dataset
2.6.6. Evaluation Metrics
3. Results and Discussion
3.1. Experiments on AIRES Dataset
3.1.1. Quantitative Comparison with State-of-the-Art Detectors
3.1.2. Qualitative Evaluation
3.2. Experiments on VEDAI Dataset
3.2.1. Quantitative Comparison with State-of-the-Art Detectors
3.2.2. Qualitative Evaluation
3.3. Experiment on SARD Dataset
3.3.1. Quantitative Comparison with State-of-the-Art Detectors
3.3.2. Qualitative Evaluation
3.4. Summary
- Using a sliding-window method, as reported in Figure 9 for experiment 1 on the AIRES dataset, YOLO-S is almost 37% faster than YOLOv3 with a relative mAP improvement of almost 16.5%. In experiment 1 on the VEDAI dataset, YOLO-S is almost 52% faster than YOLOv3 with a relative mAP improvement of almost 15.6%;
- Compared with Tiny-YOLOv3 on the AIRES dataset, YOLO-S is only about 23% slower, but achieves a relative F1ma@0.5 increase of 104.0%. On the VEDAI dataset, YOLO-S is only about 25% slower than Tiny-YOLOv3, but achieves a relative F1ma@0.5 increase of 17.9%;
- Using weights pre-trained on a source task closer to the target domain (the publicly available DOTAv2 and VEDAI, versus the AIRES target) always yields a relative increase in the overall micro-average metrics @0.5 with respect to the more generic preliminary features extracted on the COCO dataset;
- Inference on full-sized VEDAI images with preliminary features learned only on the generic COCO task leads to an F1ma@0.5 drop for all networks, ranging from ≈−50% for Tiny-YOLOv3 down to ≈−10% for [3] and YOLO-S, the latter having the best performance;
- YOLO-S processes as many as 25.6 FPS on our hardware, being 38% faster than YOLOv3 and only 19% slower than Tiny-YOLOv3 (14% slower than [3]);
- YOLO-S also generalizes well to the SARD dataset, with slightly better detection accuracy than YOLOv3 and a 32% improvement in processing speed. Among the discussed baselines, only YOLO-L attains a better average precision, but at a ≈26% slower speed and with almost 2.6 times more BFLOPs, as can be observed in Figure 9;
- Regardless of the experiment and data, YOLO-S is only 15% to 25% slower than Tiny-YOLOv3, yet even more accurate than YOLOv3 and YOLO-L;
- YOLO-S has about half the BFLOPs of YOLOv3, quite close to SlimYOLOv3-SPP3-50 [25]. Thus, whenever power consumption is a concern, as in low-power UAV applications, it is an interesting and less voluminous alternative to the far less precise Tiny-YOLOv3.
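The micro-averaged metrics quoted above (Rma, Pma, F1ma) pool true positives, false positives and false negatives across all classes before computing precision and recall, so frequent classes dominate. A minimal sketch of this pooling (an illustration of the standard definition, not the paper's evaluation code):

```python
def micro_f1(tp, fp, fn):
    """Micro-averaged precision, recall and F1 from pooled counts.

    tp, fp, fn: detection counts summed over all classes (and images)
    at a fixed IoU threshold, e.g. 0.5 for the @0.5 metrics.
    """
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
    return prec, rec, f1

# Hypothetical pooled counts: 80 correct detections, 20 spurious, 40 missed.
p, r, f = micro_f1(80, 20, 40)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.8 0.667 0.727
```

With this convention, a relative F1ma@0.5 increase of 104.0% (as for YOLO-S over Tiny-YOLOv3 on AIRES) means the pooled F1 roughly doubled.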
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
YOLO | You Only Look Once
AIRES | cAr detectIon fRom hElicopter imageS
VEDAI | Vehicle Detection in Aerial Imagery
DOTA | Dataset for Object deTection in Aerial images
COCO | Common Objects in Context
SARD | Search and Rescue Image Dataset for Person Detection
FLOPs | FLoating point OPerations
BFLOPs | Billions of FLoating point OPerations
UAV | Unmanned Aerial Vehicle
CNN | Convolutional Neural Network
GPU | Graphics Processing Unit
GT | Ground Truth
FHD | Full High-Definition
FPN | Feature Pyramid Network
NMS | Non-Maximum Suppression
mAP | mean Average Precision
AP | Average Precision
wAP | weighted Average Precision
IoU | Intersection over Union
REC | Recall
PREC | Precision
FPS | Frames Per Second
References
- Qu, T.; Zhang, Q.; Sun, S. Vehicle detection from high-resolution aerial images using spatial pyramid pooling-based deep convolutional neural networks. Multimed. Tools Appl. 2017, 76, 21651–21663. [Google Scholar] [CrossRef]
- Tang, T.; Zhou, S.; Deng, Z.; Zou, H.; Lei, L. Vehicle Detection in Aerial Images Based on Region Convolutional Neural Networks and Hard Negative Example Mining. Sensors 2017, 17, 336. [Google Scholar] [CrossRef] [PubMed]
- Ju, M.; Luo, J.; Zhang, P.; He, M.; Luo, H. A Simple and Efficient Network for Small Target Detection. IEEE Access 2019, 7, 85771–85781. [Google Scholar] [CrossRef]
- Sambolek, S.; Ivasic-Kos, M. Automatic Person Detection in Search and Rescue Operations Using Deep CNN Detectors. IEEE Access 2021, 9, 37905–37922. [Google Scholar] [CrossRef]
- Razakarivony, S.; Jurie, F. Vehicle Detection in Aerial Imagery: A small target detection benchmark. J. Vis. Commun. Image Represent. 2016, 34, 187–203. [Google Scholar] [CrossRef]
- Xia, G.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983. [Google Scholar] [CrossRef]
- Liu, K.; Mattyus, G. Fast Multiclass Vehicle Detection on Aerial Images. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1938–1942. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525. [Google Scholar] [CrossRef]
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767v1. [Google Scholar] [CrossRef]
- Bochkovskiy, A.; Wang, C.Y.; Mark Liao, H.Y. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934v1. [Google Scholar] [CrossRef]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. Lect. Notes Comput. Sci. 2016, 9905, 21–37. [Google Scholar] [CrossRef]
- YOLO: Real-Time Object Detection. Available online: https://pjreddie.com/darknet/yolo/ (accessed on 21 December 2022).
- Çintaş, E.; Özyer, B.; Şimşek, E. Vision-based moving UAV tracking by another UAV on low-cost hardware and a new ground control station. IEEE Access 2020, 8, 194601–194611. [Google Scholar] [CrossRef]
- Saribas, H.; Uzun, B.; Benligiray, B.; Eker, O.; Cevikalp, H. A hybrid method for tracking of objects by UAVs. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019. [Google Scholar] [CrossRef]
- Liu, M.; Wang, X.; Zhou, A.; Fu, X.; Ma, Y.; Piao, C. Uav-yolo: Small object detection on unmanned aerial vehicle perspective. Sensors 2020, 20, 2238. [Google Scholar] [CrossRef] [PubMed]
- Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790. [Google Scholar] [CrossRef]
- Jung, H.K.; Choi, G.S. Improved yolov5: Efficient object detection using drone images under various conditions. Appl. Sci. 2022, 12, 7255. [Google Scholar] [CrossRef]
- He, W.; Huang, Z.; Wei, Z.; Li, C.; Guo, B. TF-YOLO: An Improved Incremental Network for Real-Time Object Detection. Appl. Sci. 2019, 9, 3225. [Google Scholar] [CrossRef]
- Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J. Photogramm. Remote. Sens. 2014, 98, 119–132. [Google Scholar] [CrossRef]
- Cheng, G.; Zhou, P.; Han, J. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Trans. Geosci. Remote. Sens. 2016, 54, 7405–7415. [Google Scholar] [CrossRef]
- Everingham, M.; Gool, L.V.; Williams, C.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
- Zhang, P.; Zhong, Y.; Li, X. SlimYOLOv3: Narrower, Faster and Better for Real-Time UAV Applications. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 37–45. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
- Zhu, P.; Wen, L.; Du, D.; Bian, X.; Ling, H.; Hu, Q.; Nie, Q.; Cheng, H.; Liu, C.; Liu, X.; et al. VisDrone-DET2018: The Vision Meets Drone Object Detection in Image Challenge Results. Lect. Notes Comput. Sci. 2019, 11133, 437–468. [Google Scholar] [CrossRef]
- Taheri Tajar, A.; Ramazani, A.; Mansoorizadeh, M. A lightweight Tiny-YOLOv3 vehicle detection approach. J. Real-Time Image Proc. 2021, 18, 2389–2401. [Google Scholar] [CrossRef]
- Cao, C.; Wu, J.; Zeng, X.; Feng, Z.; Wang, T.; Yan, X.; Wu, Z.; Wu, Q.; Huang, Z. Research on Airplane and Ship Detection of Aerial Remote Sensing Images Based on Convolutional Neural Network. Sensors 2020, 20, 4696. [Google Scholar] [CrossRef] [PubMed]
- Yang, G.; Tang, Z.; Huang, J. Vehicle-specific retrieval based on aerial images. In Proceedings of the International Conference on AI and Big Data Application (AIBDA 2019), Guangzhou, China, 20–22 December 2019. [Google Scholar] [CrossRef]
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019. [Google Scholar] [CrossRef]
- Zhong, J.; Lei, T.; Yao, G. Robust Vehicle Detection in Aerial Images Based on Cascaded Convolutional Neural Networks. Sensors 2017, 17, 2720. [Google Scholar] [CrossRef] [PubMed]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar] [CrossRef]
- Mostofa, M.; Ferdous, S.N.; Riggan, B.S.; Nasrabadi, N.M. Joint-SRVDNet: Joint Super Resolution and Vehicle Detection Network. IEEE Access 2020, 8, 82306–82319. [Google Scholar] [CrossRef]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar] [CrossRef]
- Heartexlabs/labelImg—GitHub. Available online: https://github.com/tzutalin/labelImg (accessed on 21 December 2022).
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. Lect. Notes Comput. Sci. 2014, 8693, 740–755. [Google Scholar] [CrossRef]
- Dataset—DOTA. Available online: https://captain-whu.github.io/DOTA/dataset.html (accessed on 21 December 2022).
- Vehicle Detection in Aerial Imagery (VEDAI): A benchmark. Available online: https://downloads.greyc.fr/vedai/ (accessed on 21 December 2022).
- Utah GIS—Utah.gov. Available online: http://gis.utah.gov/ (accessed on 21 December 2022).
- Search and Rescue Image Dataset for Person Detection—SARD. Available online: https://ieee-dataport.org/documents/search-and-rescue-image-dataset-person-detection-sard (accessed on 21 December 2022).
- Lin, T.Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944. [Google Scholar] [CrossRef]
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? arXiv 2014, arXiv:1411.1792. [Google Scholar] [CrossRef]
- Alganci, U.; Soydas, M.; Sertel, E. Comparative Research on Deep Learning Approaches for Airplane Detection from Very High-Resolution Satellite Images. Remote Sens. 2020, 12, 458. [Google Scholar] [CrossRef]
- Everingham, M.; Eslami, S.M.A.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes Challenge: A Retrospective. Int. J. Comput. Vis. 2015, 111, 98–136. [Google Scholar] [CrossRef]
- Padilla, R.; Passos, W.L.; Dias, T.L.B.; Netto, S.L.; Da Silva, E.A.B. A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit. Electronics 2021, 10, 279. [Google Scholar] [CrossRef]
- Véstias, M.P. A Survey of Convolutional Neural Networks on Edge with Reconfigurable Computing. Algorithms 2019, 12, 154. [Google Scholar] [CrossRef]
Class | AIRES Global | AIRES Train | AIRES Test | VEDAI Global | VEDAI Train | VEDAI Test | SARD Global | SARD Train | SARD Test
---|---|---|---|---|---|---|---|---|---
Van | 2012 | 1437 | 575 | 498 | 350 | 148 | - | - | - |
Truck | 755 | 534 | 221 | 307 | 221 | 86 | - | - | - |
Car | 10,518 | 7520 | 2998 | 2332 | 1676 | 656 | - | - | - |
Motorbike | 83 | 61 | 22 | - | - | - | - | - | - |
Person | 990 | 792 | 198 | - | - | - | 6525 | 5220 | 1305 |
Other | 123 | 89 | 34 | 204 | 157 | 47 | - | - | - |
Boat | 498 | 349 | 149 | 171 | 120 | 51 | - | - | - |
Bus | 268 | 194 | 74 | - | - | - | - | - | - |
Tractor | - | - | - | 190 | 137 | 53 | - | - | - |
Plane | - | - | - | 48 | 36 | 12 | - | - | - |
# of GTs | 15,247 | 10,976 | 4271 | 3750 | 2697 | 1053 | 6525 | 5220 | 1305 |
# of Images | 1275 | 898 | 377 | 1246 | 892 | 354 | 1980 | 1601 | 379 |
Layer-by-layer architectures of YOLO-L (left block of columns) and YOLO-S (right block):

# | Type | F | S/S | Input | Output | CS | RF | Type | F | S/S | Input | Output | CS | RF
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | Conv | 32 | 3/1 | 416 × 416 × 3 | 416 × 416 × 32 | 1 | 3 | Conv | 32 | 3/1 | 416 × 416 × 3 | 416 × 416 × 32 | 1 | 3 |
1 | Conv | 64 | 3/2 | 416 × 416 × 32 | 208 × 208 × 64 | 2 | 5 | Conv | 64 | 3/2 | 416 × 416 × 32 | 208 × 208 × 64 | 2 | 5 |
2 | Conv (R1) | 32 | 1/1 | 208 × 208 × 64 | 208 × 208 × 32 | 2 | 5 | Conv (R1) | 32 | 1/1 | 208 × 208 × 64 | 208 × 208 × 32 | 2 | 5 |
3 | Conv (R1) | 64 | 3/1 | 208 × 208 × 32 | 208 × 208 × 64 | 2 | 9 | Conv (R1) | 64 | 3/1 | 208 × 208 × 32 | 208 × 208 × 64 | 2 | 9 |
4 | Conv | 128 | 3/2 | 208 × 208 × 64 | 104 × 104 × 128 | 4 | 13 | Conv | 128 | 3/2 | 208 × 208 × 64 | 104 × 104 × 128 | 4 | 13 |
5 | Conv (R1) | 64 | 1/1 | 104 × 104 × 128 | 104 × 104 × 64 | 4 | 13 | Conv (R1) | 64 | 1/1 | 104 × 104 × 128 | 104 × 104 × 64 | 4 | 13 |
6 | Conv (R1) | 128 | 3/1 | 104 × 104 × 64 | 104 × 104 × 128 | 4 | 21 | Conv (R1) | 128 | 3/1 | 104 × 104 × 64 | 104 × 104 × 128 | 4 | 21 |
7 | Conv (R2) | 64 | 1/1 | 104 × 104 × 128 | 104 × 104 × 64 | 4 | 21 | Conv (R2) | 64 | 1/1 | 104 × 104 × 128 | 104 × 104 × 64 | 4 | 21 |
8 | Conv (R2) | 128 | 3/1 | 104 × 104 × 64 | 104 × 104 × 128 | 4 | 29 | Conv (R2) | 128 | 3/1 | 104 × 104 × 64 | 104 × 104 × 128 | 4 | 29 |
9 | Conv | 256 | 3/2 | 104 × 104 × 128 | 52 × 52 × 256 | 8 | 37 | Conv | 256 | 3/2 | 104 × 104 × 128 | 52 × 52 × 256 | 8 | 37 |
10 | Conv (R1) | 128 | 1/1 | 52 × 52 × 256 | 52 × 52 × 128 | 8 | 37 | Conv (R1) | 128 | 1/1 | 52 × 52 × 256 | 52 × 52 × 128 | 8 | 37 |
11 | Conv (R1) | 256 | 3/1 | 52 × 52 × 128 | 52 × 52 × 256 | 8 | 53 | Conv (R1) | 256 | 3/1 | 52 × 52 × 128 | 52 × 52 × 256 | 8 | 53 |
12 | Conv (R2) | 128 | 1/1 | 52 × 52 × 256 | 52 × 52 × 128 | 8 | 53 | Conv (R2) | 128 | 1/1 | 52 × 52 × 256 | 52 × 52 × 128 | 8 | 53 |
13 | Conv (R2) | 256 | 3/1 | 52 × 52 × 128 | 52 × 52 × 256 | 8 | 69 | Conv (R2) | 256 | 3/1 | 52 × 52 × 128 | 52 × 52 × 256 | 8 | 69 |
14 | Conv (R3) | 128 | 1/1 | 52 × 52 × 256 | 52 × 52 × 128 | 8 | 69 | Conv | 512 | 3/2 | 52 × 52 × 256 | 26 × 26 × 512 | 16 | 85 |
15 | Conv (R3) | 256 | 3/1 | 52 × 52 × 128 | 52 × 52 × 256 | 8 | 85 | Conv (R1) | 256 | 1/1 | 26 × 26 × 512 | 26 × 26 × 256 | 16 | 85 |
16 | Conv (R4) | 128 | 1/1 | 52 × 52 × 256 | 52 × 52 × 128 | 8 | 85 | Conv (R1) | 512 | 3/1 | 26 × 26 × 256 | 26 × 26 × 512 | 16 | 117 |
17 | Conv (R4) | 256 | 3/1 | 52 × 52 × 128 | 52 × 52 × 256 | 8 | 101 | Conv (R2) | 256 | 1/1 | 26 × 26 × 512 | 26 × 26 × 256 | 16 | 117 |
18 | Conv (R5) | 128 | 1/1 | 52 × 52 × 256 | 52 × 52 × 128 | 8 | 101 | Conv (R2) | 512 | 3/1 | 26 × 26 × 256 | 26 × 26 × 512 | 16 | 149 |
19 | Conv (R5) | 256 | 3/1 | 52 × 52 × 128 | 52 × 52 × 256 | 8 | 117 | Conv | 128 | 1/1 | 26 × 26 × 512 | 26 × 26 × 128 | 16 | 149 |
20 | Conv (R6) | 128 | 1/1 | 52 × 52 × 256 | 52 × 52 × 128 | 8 | 117 | Upsample | 2/1 | 26 × 26 × 128 | 52 × 52 × 128 | 16 | 149 | |
21 | Conv (R6) | 256 | 3/1 | 52 × 52 × 128 | 52 × 52 × 256 | 8 | 133 | Route 8 | ||||||
22 | Conv (R7) | 128 | 1/1 | 52 × 52 × 256 | 52 × 52 × 128 | 8 | 133 | Reshape | 104 × 104 × 128 | 52 × 52 × 512 | 4 | 29 | ||
23 | Conv (R7) | 256 | 3/1 | 52 × 52 × 128 | 52 × 52 × 256 | 8 | 149 | Route 22,20,13 | ||||||
24 | Conv (R8) | 128 | 1/1 | 52 × 52 × 256 | 52 × 52 × 128 | 8 | 149 | Conv | 256 | 1/1 | 52 × 52 × 896 | 52 × 52 × 256 | 8 | 69 |
25 | Conv (R8) | 256 | 3/1 | 52 × 52 × 128 | 52 × 52 × 256 | 8 | 165 | Conv | 512 | 3/1 | 52 × 52 × 256 | 52 × 52 × 512 | 8 | 85 |
26 | Conv | 512 | 3/2 | 52 × 52 × 256 | 26 × 26 × 512 | 16 | 181 | Conv | 256 | 1/1 | 52 × 52 × 512 | 52 × 52 × 256 | 8 | 85 |
27 | Conv (R1) | 256 | 1/1 | 26 × 26 × 512 | 26 × 26 × 256 | 16 | 181 | Conv | 512 | 3/1 | 52 × 52 × 256 | 52 × 52 × 512 | 8 | 101 |
28 | Conv (R1) | 512 | 3/1 | 26 × 26 × 256 | 26 × 26 × 512 | 16 | 213 | Conv | 255 | 1/1 | 52 × 52 × 512 | 52 × 52 × 255 | 8 | 101 |
29 | Conv (R2) | 256 | 1/1 | 26 × 26 × 512 | 26 × 26 × 256 | 16 | 213 | Yolo | ||||||
30 | Conv (R2) | 512 | 3/1 | 26 × 26 × 256 | 26 × 26 × 512 | 16 | 245 | |||||||
31 | Conv (R3) | 256 | 1/1 | 26 × 26 × 512 | 26 × 26 × 256 | 16 | 245 | |||||||
32 | Conv (R3) | 512 | 3/1 | 26 × 26 × 256 | 26 × 26 × 512 | 16 | 277 | |||||||
33 | Conv (R4) | 256 | 1/1 | 26 × 26 × 512 | 26 × 26 × 256 | 16 | 277 | |||||||
34 | Conv (R4) | 512 | 3/1 | 26 × 26 × 256 | 26 × 26 × 512 | 16 | 309 | |||||||
35 | Conv (R5) | 256 | 1/1 | 26 × 26 × 512 | 26 × 26 × 256 | 16 | 309 | |||||||
36 | Conv (R5) | 512 | 3/1 | 26 × 26 × 256 | 26 × 26 × 512 | 16 | 341 | |||||||
37 | Conv (R6) | 256 | 1/1 | 26 × 26 × 512 | 26 × 26 × 256 | 16 | 341 | |||||||
38 | Conv (R6) | 512 | 3/1 | 26 × 26 × 256 | 26 × 26 × 512 | 16 | 373 | |||||||
39 | Conv (R7) | 256 | 1/1 | 26 × 26 × 512 | 26 × 26 × 256 | 16 | 373 | |||||||
40 | Conv (R7) | 512 | 3/1 | 26 × 26 × 256 | 26 × 26 × 512 | 16 | 405 | |||||||
41 | Conv (R8) | 256 | 1/1 | 26 × 26 × 512 | 26 × 26 × 256 | 16 | 405 | |||||||
42 | Conv (R8) | 512 | 3/1 | 26 × 26 × 256 | 26 × 26 × 512 | 16 | 437 | |||||||
43 | Conv | 256 | 1/1 | 26 × 26 × 512 | 26 × 26 × 256 | 16 | 437 | |||||||
44 | Conv | 512 | 3/1 | 26 × 26 × 256 | 26 × 26 × 512 | 16 | 469 | |||||||
45 | Conv | 256 | 1/1 | 26 × 26 × 512 | 26 × 26 × 256 | 16 | 469 | |||||||
46 | Conv | 512 | 3/1 | 26 × 26 × 256 | 26 × 26 × 512 | 16 | 501 | |||||||
47 | Conv | 256 | 1/1 | 26 × 26 × 512 | 26 × 26 × 256 | 16 | 501 | |||||||
48 | Conv | 512 | 3/1 | 26 × 26 × 256 | 26 × 26 × 512 | 16 | 533 | |||||||
49 | Conv | 255 | 1/1 | 26 × 26 × 512 | 26 × 26 × 255 | 16 | 533 | |||||||
50 | Yolo | |||||||||||||
51 | Route 47 | |||||||||||||
52 | Conv | 128 | 1/1 | 26 × 26 × 256 | 26 × 26 × 128 | 16 | 501 | |||||||
53 | Upsample | 2/1 | 26 × 26 × 128 | 52 × 52 × 128 | 16 | 501 | ||||||||
54 | Route 53,25 | |||||||||||||
55 | Conv | 256 | 1/1 | 52 × 52 × 384 | 52 × 52 × 256 | 8 | 165 | |||||||
56 | Conv | 512 | 3/1 | 52 × 52 × 256 | 52 × 52 × 512 | 8 | 181 | |||||||
57 | Conv | 256 | 1/1 | 52 × 52 × 512 | 52 × 52 × 256 | 8 | 181 | |||||||
58 | Conv | 512 | 3/1 | 52 × 52 × 256 | 52 × 52 × 512 | 8 | 197 | |||||||
59 | Conv | 256 | 1/1 | 52 × 52 × 512 | 52 × 52 × 256 | 8 | 197 | |||||||
60 | Conv | 512 | 3/1 | 52 × 52 × 256 | 52 × 52 × 512 | 8 | 213 | |||||||
61 | Conv | 255 | 1/1 | 52 × 52 × 512 | 52 × 52 × 255 | 8 | 213 | |||||||
62 | Yolo | |||||||||||||
63 | Route 59 | |||||||||||||
64 | Conv | 128 | 1/1 | 52 × 52 × 256 | 52 × 52 × 128 | 8 | 197 | |||||||
65 | Upsample | 2/1 | 52 × 52 × 128 | 104 × 104 × 128 | 8 | 197 | ||||||||
66 | Route 65,8 | |||||||||||||
67 | Conv | 128 | 1/1 | 104 × 104 × 256 | 104 × 104 × 128 | 4 | 29 | |||||||
68 | Conv | 256 | 3/1 | 104 × 104 × 128 | 104 × 104 × 256 | 4 | 37 | |||||||
69 | Conv | 128 | 1/1 | 104 × 104 × 256 | 104 × 104 × 128 | 4 | 37 | |||||||
70 | Conv | 256 | 3/1 | 104 × 104 × 128 | 104 × 104 × 256 | 4 | 45 | |||||||
71 | Conv | 128 | 1/1 | 104 × 104 × 256 | 104 × 104 × 128 | 4 | 45 | |||||||
72 | Conv | 256 | 3/1 | 104 × 104 × 128 | 104 × 104 × 256 | 4 | 53 | |||||||
73 | Conv | 255 | 1/1 | 104 × 104 × 256 | 104 × 104 × 255 | 4 | 53 | |||||||
74 | Yolo |
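The CS (cumulative stride) and RF (receptive field) columns of the table above follow the standard recurrences for a chain of convolutions: each layer of kernel size k grows the receptive field by (k−1) times the cumulative stride accumulated so far, and each stride-s layer multiplies the cumulative stride by s. A small sketch (an illustration of the standard formula, not the authors' code) that reproduces the table's first entries:

```python
def receptive_field(layers):
    """Trace (RF, CS) through a chain of conv layers.

    layers: list of (kernel_size, stride) pairs, in network order.
    Returns a list of (receptive_field, cumulative_stride) after each layer.
    """
    rf, cs = 1, 1
    trace = []
    for k, s in layers:
        rf += (k - 1) * cs  # each extra tap spans cs input pixels
        cs *= s             # stride compounds multiplicatively
        trace.append((rf, cs))
    return trace

# First five layers shared by YOLO-L and YOLO-S: 3/1, 3/2, 1/1, 3/1, 3/2.
print(receptive_field([(3, 1), (3, 2), (1, 1), (3, 1), (3, 2)]))
# [(3, 1), (5, 2), (5, 2), (9, 2), (13, 4)] -- matching rows 0-4 of the table
```

This is why YOLO-S caps its output at an 8× cumulative stride (RF 101): a finer grid with a modest receptive field suits small targets better than YOLOv3's 16× head with RF 533.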
Net | Par [M] | Vol [MB] | BFLOPs | OS | RF | CS
---|---|---|---|---|---|---
YOLOv3 | 61.6 | 242.9 | 65.86 | 13 / 26 / 52 | 917 / 533 / 213 | 32 / 16 / 8
Tiny-YOLOv3 | 8.7 | 34.7 | 5.58 | 13 / 26 | 318 / 110 | 32 / 16
Ref. [3] | 0.7 | 3.4 | 6.73 | 52 | 133 | 8
YOLO-L | 23.8 | 94.9 | 90.53 | 26 / 52 / 104 | 533 / 213 / 53 | 16 / 8 / 4
YOLO-S | 7.8 | 31.9 | 34.59 | 52 | 101 | 8
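The BFLOPs figures in the table above are dominated by the convolutional layers. Under the common convention of counting each multiply–accumulate as two operations, a conv layer's cost is 2·k²·C_in·C_out·H_out·W_out. A sketch of this per-layer accounting (an illustration of the convention; whether a given framework counts MACs as one or two operations should be checked):

```python
def conv_bflops(h_out, w_out, c_in, c_out, k):
    """Billions of floating-point ops for one conv layer.

    Each output value needs k*k*c_in multiply-accumulates,
    counted here as 2 ops (one multiply + one add).
    """
    return 2 * k * k * c_in * c_out * h_out * w_out / 1e9

# Layer 0 of YOLO-S: 416x416x3 -> 416x416x32 with a 3x3 kernel.
print(round(conv_bflops(416, 416, 3, 32, 3), 3))  # 0.299
```

Summing this over all layers of a network gives totals on the order of the table's values, which is why halving channel widths or dropping a detection head (as YOLO-S does relative to YOLOv3) roughly halves the BFLOPs.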
Metric [%] | YOLOv3 Exp. 1 | YOLOv3 Exp. 2 | Tiny-YOLOv3 Exp. 1 | Tiny-YOLOv3 Exp. 2 | Ref. [3] Exp. 1 | Ref. [3] Exp. 2 | YOLO-L Exp. 1 | YOLO-L Exp. 2 | YOLO-S Exp. 1 | YOLO-S Exp. 2
---|---|---|---|---|---|---|---|---|---|---
Van | 33.1 | 33.5 | 9.1 | 4.4 | 18.4 | 19.4 | 36.6 | 38.2 | 44.0 | 39.2 |
Truck | 47.1 | 47.4 | 17.5 | 12.3 | 17.0 | 18.6 | 47.3 | 50.3 | 50.5 | 48.1 |
Car | 47.8 | 44.9 | 19.1 | 12.2 | 46.5 | 40.6 | 57.3 | 54.8 | 58.7 | 57.0 |
Motorbike | 20.4 | 32.6 | 11.4 | 9.1 | 0.0 | 0.0 | 26.6 | 23.9 | 13.6 | 20.5 |
Person | 35.5 | 44.3 | 6.4 | 6.0 | 15.2 | 12.6 | 47.6 | 47.3 | 42.4 | 41.8 |
Other | 39.7 | 37.6 | 6.5 | 3.8 | 0.0 | 5.9 | 20.3 | 40.5 | 39.9 | 32.1 |
Boat | 39.5 | 34.5 | 13.5 | 10.1 | 44.8 | 34.6 | 52.5 | 60.3 | 65.9 | 59.0 |
Bus | 57.7 | 60.1 | 28.2 | 29.6 | 24.3 | 31.0 | 58.9 | 63.0 | 58.9 | 56.8 |
mAP | 40.1 | 41.9 | 13.9 | 10.9 | 20.8 | 20.3 | 43.4 | 47.3 | 46.7 | 44.3 |
wAP | 44.9 | 43.2 | 16.9 | 11.0 | 38.7 | 34.5 | 52.9 | 52.0 | 55.4 | 53.1 |
APS | 5.5 | 6.8 | 0.8 | 0.4 | 3.8 | 2.8 | 11.9 | 11.5 | 10.4 | 9.4 |
APM | 17.5 | 17.4 | 5.1 | 3.1 | 11.3 | 11.4 | 21.6 | 22.4 | 25.4 | 23.7 |
APL | 28.1 | 31.7 | 11.1 | 10.5 | 10.9 | 10.2 | 29.3 | 31.0 | 22.1 | 22.8 |
Rma | 54.4 | 52.9 | 27.5 | 20.1 | 46.4 | 42.0 | 60.7 | 59.7 | 62.0 | 60.2 |
Pma | 62.9 | 62.6 | 38.6 | 30.9 | 63.4 | 60.9 | 71.4 | 69.6 | 69.5 | 67.3 |
F1ma | 58.3 | 57.3 | 32.1 | 24.3 | 53.6 | 49.7 | 65.6 | 64.3 | 65.5 | 63.6 |
FPS | 5.9 | 5.9 | 10.5 | 10.5 | 9.7 | 9.7 | 5.8 | 5.8 | 8.1 | 8.1 |
Metric [%] | YOLOv3 Exp. 1 | YOLOv3 Exp. 2 | Tiny-YOLOv3 Exp. 1 | Tiny-YOLOv3 Exp. 2 | Ref. [3] Exp. 1 | Ref. [3] Exp. 2 | YOLO-L Exp. 1 | YOLO-L Exp. 2 | YOLO-S Exp. 1 | YOLO-S Exp. 2
---|---|---|---|---|---|---|---|---|---|---
Van | 62.6 | 31.4 | 38.7 | 0.4 | 54.2 | 27.1 | 59.4 | 38.2 | 70.7 | 44.5 |
Truck | 51.1 | 13.9 | 35.4 | 1.4 | 40.3 | 12.4 | 60.3 | 30.1 | 65.5 | 32.6 |
Car | 91.3 | 59.2 | 80.8 | 10.6 | 93.1 | 71.4 | 91.5 | 66.3 | 95.5 | 80.6 |
Tractor | 39.0 | 4.3 | 21.6 | 0.5 | 42.3 | 4.5 | 70.6 | 3.5 | 63.8 | 27.3 |
Other | 30.7 | 15.2 | 33.6 | 0.7 | 25.0 | 9.6 | 39.4 | 16.0 | 47.9 | 19.9 |
Plane | 86.7 | 96.1 | 89.9 | 26.4 | 81.1 | 57.3 | 98.7 | 62.9 | 75.0 | 74.2 |
Boat | 65.2 | 7.9 | 23.9 | 1.0 | 44.5 | 3.9 | 62.5 | 4.0 | 74.2 | 25.9 |
mAP | 60.9 | 32.6 | 46.3 | 5.8 | 54.3 | 26.6 | 68.9 | 31.6 | 70.4 | 43.6 |
wAP | 77.3 | 44.8 | 63.4 | 7.2 | 75.2 | 50.8 | 79.7 | 50.9 | 84.6 | 63.5 |
APS | 24.2 | 6.7 | 9.4 | 0.9 | 16.4 | 6.1 | 22.8 | 8.7 | 30.8 | 9.4 |
APM | 35.4 | 10.0 | 25.0 | 1.4 | 31.4 | 12.1 | 39.6 | 13.5 | 41.8 | 22.2 |
APL | 30.3 | 23.6 | 23.4 | 6.9 | 23.9 | 13.9 | 39.0 | 30.6 | 31.9 | 22.8 |
Rma | 83.4 | 52.8 | 70.4 | 16.7 | 80.3 | 57.3 | 84.9 | 57.5 | 88.1 | 68.8 |
Pma | 81.1 | 71.6 | 71.5 | 33.5 | 73.6 | 74.2 | 82.7 | 78.9 | 79.6 | 78.0 |
F1ma | 82.2 | 60.8 | 70.9 | 22.3 | 76.8 | 64.6 | 83.8 | 66.6 | 83.6 | 73.1 |
FPS | 6.4 | 18.5 | 13.0 | 31.6 | 11.9 | 29.8 | 6.3 | 18.3 | 9.7 | 25.6 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Betti, A.; Tucci, M. YOLO-S: A Lightweight and Accurate YOLO-like Network for Small Target Detection in Aerial Imagery. Sensors 2023, 23, 1865. https://doi.org/10.3390/s23041865