Enhanced YOLOv8-Based Model with Context Enrichment Module for Crowd Counting in Complex Drone Imagery
Abstract
1. Introduction
- We propose a modified YOLOv8-based framework specifically tailored for crowd counting in aerial images. The model accurately detects, localizes, and counts individuals in complex environments with varying crowd densities and altitudes.
- We enhance YOLOv8 by introducing a Context Enrichment Module (CEM), which significantly improves its ability to detect small targets. The CEM captures multiscale contextual information and strengthens the model’s ability to distinguish tiny targets from complex backgrounds.
- To demonstrate the efficacy of the proposed framework, we evaluate the model on the complex and challenging VisDrone-CC2020 dataset [19]. However, the dataset provides dot annotations, which are incompatible with YOLOv8. To facilitate training, we introduce a method that converts dot annotations into four-tuple bounding-box annotations (see the sketch below).
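For illustration, the snippet below converts dot annotations into YOLO-format box labels by centring a fixed-size square on each dot. The function name, the fixed box size, and the clamping behaviour are assumptions for this sketch rather than the exact procedure detailed in Section 4.3.

```python
def dots_to_yolo_boxes(dots, img_w, img_h, box_size=20):
    """Convert point (dot) annotations into YOLO-format bounding-box labels.

    dots     : list of (x, y) pixel coordinates of annotated heads
    img_w/h  : image width and height in pixels
    box_size : assumed side length (pixels) of the square box centred on each dot
    Returns a list of 'class cx cy w h' strings with coordinates normalized to [0, 1].
    """
    labels = []
    half = box_size / 2
    for x, y in dots:
        # Clamp the box so it stays inside the image.
        x0, x1 = max(0.0, x - half), min(float(img_w), x + half)
        y0, y1 = max(0.0, y - half), min(float(img_h), y + half)
        cx, cy = (x0 + x1) / 2 / img_w, (y0 + y1) / 2 / img_h
        w, h = (x1 - x0) / img_w, (y1 - y0) / img_h
        labels.append(f"0 {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")  # single 'person' class
    return labels

# Example: two annotated heads in a 1920x1080 frame.
for line in dots_to_yolo_boxes([(412.0, 300.5), (988.3, 640.0)], 1920, 1080):
    print(line)
```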
2. Related Work
2.1. Crowd Counting in Natural Images
2.2. Crowd Counting in Drone Images
3. Proposed Methodology
3.1. Context Enrichment Module
3.2. Spatial Pyramid Pooling Fast (SPPF)
4. Experiment Results
4.1. Dataset
4.2. Evaluation Metrics
4.3. Conversion of Dot Annotations to Bounding Boxes
4.4. Comparisons with Different Variants of YOLOv8
4.5. Comparisons with Different Generic Detectors
4.6. Comparisons with Crowd Counting Methods
4.7. Ablation Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Li, T.; Chang, H.; Wang, M.; Ni, B.; Hong, R.; Yan, S. Crowded scene analysis: A survey. IEEE Trans. Circuits Syst. Video Technol. 2014, 25, 367–386. [Google Scholar] [CrossRef]
- Klatt, K.; Serino, R.; Davis, E.; Grimes, J.O. Crowd-Related Considerations at Mass Gathering Events: Management, Safety, and Dynamics. In Mass Gathering Medicine: A Guide to the Medical Management of Large Events; Cambridge University Press: Cambridge, UK, 2024; p. 268. [Google Scholar]
- Kok, V.J.; Lim, M.K.; Chan, C.S. Crowd behavior analysis: A review where physics meets biology. Neurocomputing 2016, 177, 342–362. [Google Scholar] [CrossRef]
- Zhu, F.; Wang, X.; Yu, N. Crowd tracking with dynamic evolution of group structures. In Proceedings of the Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part VI 13. pp. 139–154. [Google Scholar]
- Khan, M.A.; Menouar, H.; Hamila, R. Revisiting crowd counting: State-of-the-art, trends, and future perspectives. Image Vis. Comput. 2023, 129, 104597. [Google Scholar] [CrossRef]
- Basalamah, S.; Khan, S.D.; Felemban, E.; Naseer, A.; Rehman, F.U. Deep learning framework for congestion detection at public places via learning from synthetic data. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 102–114. [Google Scholar] [CrossRef]
- Wang, J.; Guo, X.; Li, Q.; Abdelmoniem, A.M.; Gao, M. SDANet: Scale-deformation awareness network for crowd counting. J. Electron. Imaging 2024, 33, 043002. [Google Scholar] [CrossRef]
- Guo, H.; Wang, R.; Zhang, L.; Sun, Y. Dual convolutional neural network for crowd counting. Multimed. Tools Appl. 2024, 83, 26687–26709. [Google Scholar] [CrossRef]
- Chen, J.; Wang, Z. One-Shot Any-Scene Crowd Counting With Local-to-Global Guidance. IEEE Trans. Image Process. 2024. [Google Scholar] [CrossRef]
- Tripathy, S.K.; Srivastava, S.; Bajaj, D.; Srivastava, R. A Novel cascaded deep architecture with weak-supervision for video crowd counting and density estimation. Soft Comput. 2024, 28, 8319–8335. [Google Scholar] [CrossRef]
- Alhawsawi, A.N.; Khan, S.D.; Ur Rehman, F. Crowd Counting in Diverse Environments Using a Deep Routing Mechanism Informed by Crowd Density Levels. Information 2024, 15, 275. [Google Scholar] [CrossRef]
- Gao, M.; Souri, A.; Zaker, M.; Zhai, W.; Guo, X.; Li, Q. A comprehensive analysis for crowd counting methodologies and algorithms in Internet of Things. Clust. Comput. 2024, 27, 859–873. [Google Scholar] [CrossRef]
- Chavan, R.; Rani, G.; Thakkar, P.; Dhaka, V.S. CrowdDCNN: Deep convolution neural network for real-time crowd counting on IoT edge. Eng. Appl. Artif. Intell. 2023, 126, 107089. [Google Scholar] [CrossRef]
- Ptak, B.; Pieczyński, D.; Piechocki, M.; Kraft, M. On-board crowd counting and density estimation using low altitude unmanned aerial vehicles—Looking beyond beating the benchmark. Remote Sens. 2022, 14, 2288. [Google Scholar] [CrossRef]
- Nag, S.; Khandelwal, Y.; Mittal, S.; Mohan, C.K.; Qin, A.K. ARCN: A real-time attention-based network for crowd counting from drone images. In Proceedings of the 2021 IEEE 18th India Council International Conference (INDICON), Guwahati, India, 19–21 December 2021; pp. 1–6. [Google Scholar]
- Bakour, I.; Bouchali, H.N.; Allali, S.; Lacheheb, H. Soft-CSRNet: Real-time dilated convolutional neural networks for crowd counting with drones. In Proceedings of the 2020 2nd International Workshop on Human-Centric Smart Environments for Health and Well-being (IHSH), Boumerdes, Algeria, 9–10 February 2021; pp. 28–33. [Google Scholar]
- Elharrouss, O.; Almaadeed, N.; Abualsaud, K.; Al-Ali, A.; Mohamed, A.; Khattab, T.; Al-Maadeed, S. Drone-SCNet: Scaled cascade network for crowd counting on drone images. IEEE Trans. Aerosp. Electron. Syst. 2021, 57, 3988–4001. [Google Scholar] [CrossRef]
- Peng, T.; Li, Q.; Zhu, P. Rgb-t crowd counting from drone: A benchmark and mmccn network. In Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan, 30 November–4 December 2020. [Google Scholar]
- Liu, Z.; He, Z.; Wang, L.; Wang, W.; Yuan, Y.; Zhang, D.; Zhang, J.; Zhu, P.; Van Gool, L.; Han, J.; et al. VisDrone-CC2021: The vision meets drone crowd counting challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2830–2838. [Google Scholar]
- Laradji, I.H.; Rostamzadeh, N.; Pinheiro, P.O.; Vazquez, D.; Schmidt, M. Where are the blobs: Counting by localization with point supervision. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 547–562. [Google Scholar]
- Li, Y.; Zhang, X.; Chen, D. Csrnet: Dilated convolutional neural networks for understanding the highly congested scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1091–1100. [Google Scholar]
- Babu Sam, D.; Surya, S.; Venkatesh Babu, R. Switching convolutional neural network for crowd counting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5744–5752. [Google Scholar]
- Wang, B.; Liu, H.; Samaras, D.; Nguyen, M.H. Distribution matching for crowd counting. Adv. Neural Inf. Process. Syst. 2020, 33, 1595–1607. [Google Scholar]
- Wang, G.; Chen, Y.; An, P.; Hong, H.; Hu, J.; Huang, T. UAV-YOLOv8: A small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios. Sensors 2023, 23, 7190. [Google Scholar] [CrossRef] [PubMed]
- Yi, H.; Liu, B.; Zhao, B.; Liu, E. Small object detection algorithm based on improved YOLOv8 for remote sensing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 17, 1734–1747. [Google Scholar] [CrossRef]
- Ma, M.; Pang, H. SP-YOLOv8s: An improved YOLOv8s model for remote sensing image tiny object detection. Appl. Sci. 2023, 13, 8161. [Google Scholar] [CrossRef]
- Zhai, X.; Huang, Z.; Li, T.; Liu, H.; Wang, S. YOLO-Drone: An optimized YOLOv8 network for tiny UAV object detection. Electronics 2023, 12, 3664. [Google Scholar] [CrossRef]
- Chan, A.B.; Vasconcelos, N. Counting people with low-level features and Bayesian regression. IEEE Trans. Image Process. 2011, 21, 2160–2177. [Google Scholar] [CrossRef]
- Chen, K.; Loy, C.C.; Gong, S.; Xiang, T. Feature Mining for Localised Crowd Counting; BMVC: Glasgow, UK, 2012; Volume 1, p. 3. [Google Scholar]
- Wang, Y.; Lian, H.; Chen, P.; Lu, Z. Counting people with support vector regression. In Proceedings of the 2014 10th International Conference on Natural Computation (ICNC), Xiamen, China, 19–21 August 2014; pp. 139–143. [Google Scholar]
- Saqib, M.; Khan, S.D.; Blumenstein, M. Texture-based feature mining for crowd density estimation: A study. In Proceedings of the 2016 International Conference on Image and Vision Computing New Zealand (IVCNZ), Palmerston North, New Zealand, 21–22 November 2016; pp. 1–6. [Google Scholar]
- Zhang, Y.; Zhou, D.; Chen, S.; Gao, S.; Ma, Y. Single-image crowd counting via multi-column convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 589–597. [Google Scholar]
- Boominathan, L.; Kruthiventi, S.S.; Babu, R.V. Crowdnet: A deep convolutional network for dense crowd counting. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 640–644. [Google Scholar]
- Ranjan, V.; Le, H.; Hoai, M. Iterative crowd counting. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 270–285. [Google Scholar]
- Sindagi, V.A.; Patel, V.M. A survey of recent advances in cnn-based single image crowd counting and density estimation. Pattern Recognit. Lett. 2018, 107, 3–16. [Google Scholar] [CrossRef]
- Zeng, L.; Xu, X.; Cai, B.; Qiu, S.; Zhang, T. Multi-scale convolutional neural networks for crowd counting. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 465–469. [Google Scholar]
- Cao, X.; Wang, Z.; Zhao, Y.; Su, F. Scale aggregation network for accurate and efficient crowd counting. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750. [Google Scholar]
- Babu Sam, D.; Sajjan, N.N.; Venkatesh Babu, R.; Srinivasan, M. Divide and grow: Capturing huge diversity in crowd images with incrementally growing cnn. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3618–3626. [Google Scholar]
- Sindagi, V.A.; Patel, V.M. Cnn-based cascaded multi-task learning of high-level prior and density estimation for crowd counting. In Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017; pp. 1–6. [Google Scholar]
- Idrees, H.; Tayyab, M.; Athrey, K.; Zhang, D.; Al-Maadeed, S.; Rajpoot, N.; Shah, M. Composition loss for counting, density map estimation and localization in dense crowds. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 532–546. [Google Scholar]
- Xiong, F.; Shi, X.; Yeung, D.Y. Spatiotemporal modeling for crowd counting in videos. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5151–5159. [Google Scholar]
- Hu, Y.; Jiang, X.; Liu, X.; Zhang, B.; Han, J.; Cao, X.; Doermann, D. NAS-Count: Counting-by-Density with Neural Architecture Search. arXiv 2020, arXiv:2003.00217. [Google Scholar]
- Zhai, W.; Gao, M.; Li, Q.; Jeon, G.; Anisetti, M. FPANet: Feature pyramid attention network for crowd counting. Appl. Intell. 2023, 53, 19199–19216. [Google Scholar] [CrossRef]
- Wang, T.; Zhang, T.; Zhang, K.; Wang, H.; Li, M.; Lu, J. Context attention fusion network for crowd counting. Knowl. Based Syst. 2023, 271, 110541. [Google Scholar] [CrossRef]
- Du, Z.; Shi, M.; Deng, J.; Zafeiriou, S. Redesigning multi-scale neural network for crowd counting. IEEE Trans. Image Process. 2023, 32, 3664–3678. [Google Scholar] [CrossRef]
- Wang, R.; Hao, Y.; Hu, L.; Chen, J.; Chen, M.; Wu, D. Self-supervised learning with data-efficient supervised fine-tuning for crowd counting. IEEE Trans. Multimed. 2023, 25, 1538–1546. [Google Scholar] [CrossRef]
- Zhang, C.; Zhang, Y.; Li, B.; Piao, X.; Yin, B. CrowdGraph: Weakly supervised crowd counting via pure graph neural network. ACM Trans. Multimed. Comput. Commun. Appl. 2024, 20, 1–23. [Google Scholar] [CrossRef]
- Yan, L.; Zhang, L.; Zheng, X.; Li, F. Deep feature network with multi-scale fusion for highly congested crowd counting. Int. J. Mach. Learn. Cybern. 2024, 15, 819–835. [Google Scholar] [CrossRef]
- Küchhold, M.; Simon, M.; Eiselein, V.; Sikora, T. Scale-adaptive real-time crowd detection and counting for drone images. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 943–947. [Google Scholar]
- Zhang, B.; Du, Y.; Zhao, Y.; Wan, J.; Tong, Z. I-MMCCN: Improved MMCCN for RGB-T crowd counting of drone images. In Proceedings of the 2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC), Beijing, China, 17–19 November 2021; pp. 117–121. [Google Scholar]
- Castellano, G.; Cotardo, E.; Mencar, C.; Vessio, G. Density-based clustering with fully-convolutional networks for crowd flow detection from drones. Neurocomputing 2023, 526, 169–179. [Google Scholar] [CrossRef]
- Chen, J.; Xiu, S.; Chen, X.; Guo, H.; Xie, X. Flounder-Net: An efficient CNN for crowd counting by aerial photography. Neurocomputing 2021, 420, 82–89. [Google Scholar] [CrossRef]
- Castellano, G.; Castiello, C.; Mencar, C.; Vessio, G. Crowd detection in aerial images using spatial graphs and fully-convolutional neural networks. IEEE Access 2020, 8, 64534–64544. [Google Scholar] [CrossRef]
- Bai, H.; Wen, S.; Gary Chan, S.H. Crowd counting on images with scale variation and isolated clusters. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
- Zhao, L.; Bao, Z.; Xie, Z.; Huang, G.; Rehman, Z.U. A point and density map hybrid network for crowd counting and localization based on unmanned aerial vehicles. Connect. Sci. 2022, 34, 2481–2499. [Google Scholar] [CrossRef]
- Bahmanyar, R.; Vig, E.; Reinartz, P. MRCNet: Crowd counting and density map estimation in aerial and ground imagery. arXiv 2019, arXiv:1909.12743. [Google Scholar]
- Husman, M.A.; Albattah, W.; Abidin, Z.Z.; Mustafah, Y.M.; Kadir, K.; Habib, S.; Islam, M.; Khan, S. Unmanned aerial vehicles for crowd monitoring and analysis. Electronics 2021, 10, 2974. [Google Scholar] [CrossRef]
- Gu, S.; Lian, Z. A unified multi-task learning framework of real-time drone supervision for crowd counting. arXiv 2022, arXiv:2202.03843. [Google Scholar]
- Almagbile, A. Estimation of crowd density from UAVs images based on corner detection procedures and clustering analysis. Geo Spat. Inf. Sci. 2019, 22, 23–34. [Google Scholar] [CrossRef]
- Zhu, J.; Hu, T.; Zheng, L.; Zhou, N.; Ge, H.; Hong, Z. YOLOv8-C2f-Faster-EMA: An Improved Underwater Trash Detection Model Based on YOLOv8. Sensors 2024, 24, 2483. [Google Scholar] [CrossRef]
- Wang, C.Y.; Liao, H.Y.M.; Yeh, I.H. Designing network design strategies through gradient path analysis. arXiv 2022, arXiv:2211.04800. [Google Scholar]
- Zhang, Z. Drone-YOLO: An efficient neural network method for target detection in drone images. Drones 2023, 7, 526. [Google Scholar] [CrossRef]
- Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 390–391. [Google Scholar]
- Wen, L.; Du, D.; Zhu, P.; Hu, Q.; Wang, Q.; Bo, L.; Lyu, S. Detection, tracking, and counting meets drones in crowds: A benchmark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 7812–7821. [Google Scholar]
- Zhu, P.; Peng, T.; Du, D.; Yu, H.; Zhang, L.; Hu, Q. Graph regularized flow attention network for video animal counting from drones. IEEE Trans. Image Process. 2021, 30, 5339–5351. [Google Scholar] [CrossRef]
- Zhang, C.; Li, H.; Wang, X.; Yang, X. Cross-scene crowd counting via deep convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 833–841. [Google Scholar]
- Xu, C.; Wang, J.; Yang, W.; Yu, H.; Yu, L.; Xia, G.S. Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2022, 190, 79–93. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9759–9768. [Google Scholar]
- Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Single-shot refinement neural network for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2018; pp. 4203–4212. [Google Scholar]
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. Centernet: Keypoint triplets for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 6569–6578. [Google Scholar]
- Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. Yolov9: Learning what you want to learn using programmable gradient information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. Yolov10: Real-time end-to-end object detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
- Deb, D.; Ventura, J. An aggregated multicolumn dilated convolution network for perspective-free counting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 195–204. [Google Scholar]
- Golda, T.; Krüger, F.; Beyerer, J. Temporal Extension for Encoder-Decoder-based Crowd Counting Approaches. In Proceedings of the 2021 17th International Conference on Machine Vision and Applications (MVA), Virtual, 25–27 July 2021; pp. 1–5. [Google Scholar]
- Huang, S.; Li, X.; Cheng, Z.Q.; Zhang, Z.; Hauptmann, A. Stacked pooling: Improving crowd counting by boosting scale invariance. arXiv 2018, arXiv:1808.07456. [Google Scholar]
- Zou, Z.; Su, X.; Qu, X.; Zhou, P. DA-Net: Learning the fine-grained density distribution with deformation aggregation network. IEEE Access 2018, 6, 60745–60756. [Google Scholar] [CrossRef]
- Shen, Z.; Xu, Y.; Ni, B.; Wang, M.; Hu, J.; Yang, X. Crowd counting via adversarial cross-scale consistency pursuit. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2018; pp. 5245–5254. [Google Scholar]
- Zhu, L.; Zhao, Z.; Lu, C.; Lin, Y.; Peng, Y.; Yao, T. Dual path multi-scale fusion networks with attention for crowd counting. arXiv 2019, arXiv:1902.01115. [Google Scholar]
Comparison of the proposed model with different YOLOv8 variants on VisDrone-CC2020.

Model | Parameters (M) | mAP@50 | mAP@70 | Inference Time (ms) | Size (MB) | GFLOPs |
---|---|---|---|---|---|---|
YOLOv8n | 3.01 | 51.27 | 42.51 | 7.20 | 6.5 | 8.2 |
YOLOv8s | 11.12 | 57.65 | 51.41 | 12.42 | 22.6 | 28.4 |
YOLOv8m | 25.84 | 68.82 | 56.34 | 20.50 | 52.1 | 78.7 |
YOLOv8l | 43.61 | 76.10 | 65.29 | 16.40 | 87.8 | 164.8 |
YOLOv8x | 68.12 | 79.86 | 70.46 | 19.00 | 136.9 | 257.4 |
Proposed | 72.54 | 82.10 | 76.23 | 21.10 | 164.2 | 294.5 |
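Since the framework counts by detection, the per-image count behind these comparisons reduces to counting confident person detections. A minimal sketch, with an illustrative detection tuple layout and confidence threshold:

```python
def count_people(detections, conf_thresh=0.25):
    """Count people in one image from detector output.

    detections : iterable of (x1, y1, x2, y2, confidence, class_id) tuples
    Returns the number of 'person' detections above the confidence threshold.
    """
    return sum(1 for *_, conf, cls in detections if cls == 0 and conf >= conf_thresh)

# Example: three raw detections, one below the threshold.
dets = [(10, 20, 30, 60, 0.91, 0), (100, 40, 118, 80, 0.12, 0), (200, 50, 222, 95, 0.77, 0)]
print(count_people(dets))  # -> 2
```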
Comparison with generic object detectors on VisDrone-CC2020.

Model | mAP@50 | mAP@70 |
---|---|---|
Faster R-CNN [68] | 23.74 | 15.32 |
YOLOv3-SPP [69] | 48.52 | 44.28 |
Cascade R-CNN [70] | 42.40 | 32.23 |
RetinaNet [71] | 34.12 | 29.42 |
ATSS [72] | 41.75 | 30.17 |
RefineDet [73] | 28.62 | 19.72 |
YOLOv5s | 54.39 | 49.58 |
CenterNet [74] | 40.64 | 29.25 |
NWD [67] | 77.24 | 72.52 |
YOLOv9n [75] | 68.41 | 64.19 |
YOLOv10n [76] | 74.20 | 69.72 |
Proposed | 82.10 | 76.23 |
Comparison with crowd counting methods on VisDrone-CC2020 (lower MAE and MSE are better).

Model | MAE | MSE |
---|---|---|
LCFCN [20] | 136.90 | 150.60 |
AMDCN [77] | 165.60 | 167.70 |
MSCNN [36] | 58.00 | 75.20 |
StackPooling [79] | 68.80 | 77.20 |
SwitchCNN [22] | 66.50 | 77.80 |
DA-Net [80] | 36.50 | 47.30 |
C-MTL [39] | 56.70 | 65.90 |
ACSCP [81] | 48.10 | 60.20 |
SFANet [82] | 39.70 | 48.30 |
MRCNet [56] | 46.70 | 58.30 |
TE-M2O (MRCNet) [78] | 46.70 | 59.80 |
TE-M2O (SFANet) [78] | 46.00 | 55.50 |
MTE (MRCNet) [78] | 44.30 | 56.90 |
MTE (SFANet) [78] | 33.20 | 41.80 |
CSRNet [21] | 19.80 | 25.60 |
Proposed | 25.42 | 34.73 |
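For reference, the MAE and MSE above compare predicted and ground-truth counts per test image; the sketch below assumes the common crowd-counting convention in which MSE denotes the root of the mean squared count error:

```python
import math

def counting_errors(pred_counts, gt_counts):
    """MAE and MSE over a test set of per-image counts.

    Follows the usual crowd-counting convention in which 'MSE' is the root
    of the mean squared count error (an assumption about this table).
    """
    n = len(gt_counts)
    mae = sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / n
    mse = math.sqrt(sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts)) / n)
    return mae, mse

# Example with three test images.
print(counting_errors([105, 88, 230], [100, 95, 240]))  # ~ (7.33, 7.62)
```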
Ablation study: effect of adding the CEM to different baselines with SPP and SPPF.

Method | mAP@50 | mAP@70 |
---|---|---|
YOLOv3 | 48.52 | 44.28 |
YOLOv3 + CEM | 52.10 | 46.37 |
YOLOv8s + SPPF | 57.65 | 51.41 |
YOLOv8s + SPPF + CEM | 59.28 | 55.62 |
YOLOv8s + SPP | 59.64 | 54.44 |
YOLOv8s + SPP + CEM | 61.12 | 56.42 |
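As background for this comparison, SPPF (Section 3.2) replaces the parallel pooling branches of SPP with chained max-pool operations that are equivalent to kernels of 5, 9, and 13 but cheaper to compute. A simplified PyTorch sketch (the official block also applies batch normalization and SiLU activations, omitted here):

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Spatial Pyramid Pooling - Fast (simplified sketch).

    Three sequential k x k max-pools reuse each other's output; concatenating
    the intermediate results is equivalent to SPP with kernels k, 2k-1, 3k-2.
    """
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_hidden, kernel_size=1)
        self.cv2 = nn.Conv2d(c_hidden * 4, c_out, kernel_size=1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.cv2(torch.cat((x, y1, y2, y3), dim=1))

# Example: a 256-channel feature map keeps its spatial size through SPPF.
print(SPPF(256, 256)(torch.randn(1, 256, 20, 20)).shape)  # torch.Size([1, 256, 20, 20])
```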
Ablation study: effect of dilation rates in the CEM.

Configuration | Dilation Rates | mAP@50 | mAP@70 |
---|---|---|---|
1 | d = 1 | 77.10 | 71.34 |
2 | d = 3 | 79.24 | 73.75 |
3 | d = 5 | 78.68 | 72.32 |
4 | d = 1, d = 3 | 80.35 | 75.15 |
5 | d = 1, d = 5 | 79.45 | 74.01 |
6 | d = 3, d = 5 | 79.90 | 74.45 |
7 (Proposed) | d = 1, d = 3, d = 5 | 82.10 | 76.23 |
8 | d = 1, d = 3, d = 5, d = 7 | 81.90 | 75.80 |
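The best-performing configuration combines parallel branches with dilation rates of 1, 3, and 5. The sketch below illustrates such a multi-dilation block; the concatenation-plus-1x1-convolution fusion is an assumption for this example and may differ from the exact CEM design in Section 3.1.

```python
import torch
import torch.nn as nn

class DilatedContextBlock(nn.Module):
    """Illustrative multi-dilation block in the spirit of the CEM.

    Parallel 3x3 convolutions with dilation rates (1, 3, 5) capture context at
    several receptive-field sizes; their outputs are concatenated and fused by
    a 1x1 convolution. The fusion scheme is an assumption, not the paper's
    exact design.
    """
    def __init__(self, c_in, c_out, dilations=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )
        self.fuse = nn.Conv2d(c_out * len(dilations), c_out, kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]  # same spatial size per branch
        return self.fuse(torch.cat(feats, dim=1))

# Example: context enrichment on a 128-channel feature map.
print(DilatedContextBlock(128, 128)(torch.randn(1, 128, 40, 40)).shape)  # [1, 128, 40, 40]
```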