Investigating Training Datasets of Real and Synthetic Images for Outdoor Swimmer Localisation with YOLO
Abstract
1. Introduction
Our Contribution
2. YOLO Models
2.1. YOLOv1
2.2. YOLOv3
2.3. YOLOv5
2.4. YOLOv8
3. Data Preparation and Experimental Setup
3.1. Real Images
3.2. Synthetic Images
3.3. Overview of the Image Datasets as Used in Our Work
3.4. Images Used for Quantitative Assessment: Evaluation Dataset
3.5. Data Augmentation
3.6. Experimental Setup
- The impact of replacing real images with synthetic images.
- The benefits of adding synthetic images to real images.
4. Results and Discussion
4.1. YOLOv3 vs. YOLOv5 vs. YOLOv8
4.2. Experiment 1: Replacing Real Images with Synthetic Images
Conclusions on Experiment 1
4.3. Experiment 2: Adding Synthetic Images to Real Images
Conclusions
4.4. Detection on Real Images with Objects
5. Discussion and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
FPN | Feature Pyramid Network; |
IoU | Intersection over Union; |
mAP | mean Average Precision; |
TP | True Positive; |
FP | False Positive; |
FN | False Negative; |
CNN | Convolutional Neural Network; |
YOLO | You Only Look Once. |
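The evaluation terms abbreviated above (IoU, TP, FP, FN) fit together roughly as in the following minimal sketch. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)`; the helper names `iou` and `precision_recall` are hypothetical and are not taken from the paper's code.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall from counts of true/false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# A prediction usually counts as a TP when its IoU with a ground-truth box
# reaches a chosen threshold (0.5 is a common choice, assumed here).
predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
print(iou(predicted, ground_truth) >= 0.5)
print(precision_recall(tp=8, fp=2, fn=2))
```

Averaging precision over recall levels gives the Average Precision per class, and mAP is its mean over classes; with a single swimmer class the two coincide.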
Dataset | Experiment 1 | Experiment 2 |
---|---|---|
1 | 150 real + 0 synthetic | 150 real + 25 synthetic |
2 | 135 real + 15 synthetic | 150 real + 50 synthetic |
3 | 120 real + 30 synthetic | 150 real + 75 synthetic |
4 | 105 real + 45 synthetic | 150 real + 100 synthetic |
5 | 90 real + 60 synthetic | 150 real + 125 synthetic |
6 | 75 real + 75 synthetic | 150 real + 150 synthetic |
7 | 60 real + 90 synthetic | |
8 | 45 real + 105 synthetic | |
9 | 30 real + 120 synthetic | |
10 | 15 real + 135 synthetic | |
11 | 0 real + 150 synthetic | |
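The dataset compositions in the table above follow a regular pattern, which can be sketched as below. The step sizes (15 images for Experiment 1, 25 for Experiment 2) are read off the table; the function names are hypothetical.

```python
REAL_TOTAL = 150  # number of images in the all-real dataset (row 1, Experiment 1)


def experiment1_splits(step=15):
    """Experiment 1: replace real images with synthetic ones, keeping 150 in total."""
    return [(REAL_TOTAL - k * step, k * step) for k in range(REAL_TOTAL // step + 1)]


def experiment2_splits(step=25, n_datasets=6):
    """Experiment 2: keep all 150 real images and add synthetic ones."""
    return [(REAL_TOTAL, k * step) for k in range(1, n_datasets + 1)]


for i, (real, synthetic) in enumerate(experiment1_splits(), start=1):
    print(f"exp1-{i}: {real} real + {synthetic} synthetic")
for i, (real, synthetic) in enumerate(experiment2_splits(), start=1):
    print(f"exp2-{i}: {real} real + {synthetic} synthetic")
```

The `exp1-N` and `exp2-N` labels printed here match the dataset names used in the results table.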
Dataset | YOLOv3 mAP@0.5 | YOLOv3 mAP@0.5:.95 | YOLOv5 mAP@0.5 | YOLOv5 mAP@0.5:.95 | YOLOv8 mAP@0.5 | YOLOv8 mAP@0.5:.95 |
---|---|---|---|---|---|---|
exp1-1 | 0.960 | 0.770 | 0.935 | 0.725 | 0.979 | 0.780 |
exp1-6 | 0.983 | 0.797 | 0.963 | 0.812 | 0.985 | 0.825 |
exp1-11 | 0.911 | 0.727 | 0.904 | 0.656 | 0.245 | 0.183 |
exp2-1 | 0.983 | 0.797 | 0.666 | 0.520 | 0.949 | 0.751 |
exp2-3 | 0.941 | 0.764 | 0.719 | 0.489 | 0.995 | 0.794 |
exp2-6 | 0.967 | 0.814 | 0.625 | 0.476 | 0.931 | 0.720 |
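As a reading aid for the table above, the following sketch picks the model with the highest mAP@0.5:.95 for each dataset. It assumes the six value columns are pairs of (mAP@0.5, mAP@0.5:.95) for YOLOv3, YOLOv5 and YOLOv8 in that order; the numbers are copied from the table and the helper name `best_model` is hypothetical.

```python
MODELS = ("YOLOv3", "YOLOv5", "YOLOv8")

# mAP@0.5:.95 per dataset, one value per model, copied from the table.
MAP_50_95 = {
    "exp1-1": (0.770, 0.725, 0.780),
    "exp1-6": (0.797, 0.812, 0.825),
    "exp1-11": (0.727, 0.656, 0.183),
    "exp2-1": (0.797, 0.520, 0.751),
    "exp2-3": (0.764, 0.489, 0.794),
    "exp2-6": (0.814, 0.476, 0.720),
}


def best_model(scores):
    """Return the name of the model with the highest score."""
    idx = max(range(len(scores)), key=lambda i: scores[i])
    return MODELS[idx]


best = {name: best_model(scores) for name, scores in MAP_50_95.items()}

for name, model in best.items():
    print(f"{name}: {model}")
```

Under this reading, YOLOv8 leads on exp1-1, exp1-6 and exp2-3, while YOLOv3 leads on exp1-11, exp2-1 and exp2-6, including the purely synthetic dataset exp1-11 where YOLOv8 drops to 0.183.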
Number | Batch Size | Test Dataset mAP@0.5 | Test Dataset mAP@0.5:.95 | Evaluation Dataset mAP@0.5 | Evaluation Dataset mAP@0.5:.95 |
---|---|---|---|---|---|
1 | 128 | 0.995 | 0.995 | 0.907 | 0.682 |
2 | 64 | 0.995 | 0.994 | 0.928 | 0.713 |
3 | 32 | 0.995 | 0.994 | 0.924 | 0.697 |
4 | 16 | 0.995 | 0.995 | 0.983 | 0.797 |
5 | 8 | 0.995 | 0.993 | 0.944 | 0.731 |
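The batch-size table above can be summarised with a short sketch that selects the batch size with the highest evaluation-dataset mAP@0.5:.95 and reports the gap to the test-dataset score. The values are copied from the table; the dictionary layout and variable names are hypothetical.

```python
# batch size -> mAP@0.5:.95 on the test and evaluation datasets (from the table)
ROWS = {
    128: {"test": 0.995, "eval": 0.682},
    64: {"test": 0.994, "eval": 0.713},
    32: {"test": 0.994, "eval": 0.697},
    16: {"test": 0.995, "eval": 0.797},
    8: {"test": 0.993, "eval": 0.731},
}

# Batch size with the best score on the evaluation dataset.
best_batch = max(ROWS, key=lambda b: ROWS[b]["eval"])

# Gap between test and evaluation scores for each batch size.
gaps = {b: round(r["test"] - r["eval"], 3) for b, r in ROWS.items()}

print("best batch size:", best_batch)
for batch, gap in gaps.items():
    print(f"batch {batch}: test - evaluation = {gap}")
```

The test-dataset scores are nearly identical across batch sizes, so the evaluation dataset is what separates the settings; batch size 16 gives both the highest evaluation score and the smallest gap.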
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Khan Mohammadi, M.; Schneidereit, T.; Mansouri Yarahmadi, A.; Breuß, M. Investigating Training Datasets of Real and Synthetic Images for Outdoor Swimmer Localisation with YOLO. AI 2024, 5, 576-593. https://doi.org/10.3390/ai5020030