Enhancing Object Detection in Smart Video Surveillance: A Survey of Occlusion-Handling Approaches
Abstract
1. Introduction
- Thorough literature summary: Provides a comprehensive overview of established occlusion-handling techniques for object detection.
- Extensive algorithm evaluation: Conducts a detailed experimental evaluation of current object detection algorithms in occlusion scenarios, offering analysis and insights.
- Identification of challenges: Highlights the challenges that state-of-the-art deep learning algorithms face with partial occlusion, emphasizing the need for more effective solutions.
- Motivation for future research: Serves as a foundational resource that guides future research on object detection in occluded scenes.
- Utilization of multi-object datasets: Integrates multi-object datasets into the occlusion-handling evaluations, ensuring that the compared techniques remain robust and applicable to diverse real-world scenarios.
2. Related Works
2.1. Generative Algorithms for Occlusion Handling
2.1.1. Generative Adversarial Networks Approach
2.1.2. Probabilistic Occupancy Map (POM) Approach
2.1.3. Compositional Generative Networks Approach
2.2. Deep Learning Strategies for Occlusion Handling
2.2.1. Single-Stage Detector Algorithms
2.2.2. Two-Stage Detector Algorithms
2.3. Alternative Approaches for Occlusion Handling
2.3.1. Graphical Models for Occlusion Handling
2.3.2. Data Augmentation for Occlusion Handling
2.4. Summary
3. Proposed Comparative Analysis
3.1. Occlusion-Handling Databases
3.2. Evaluation Criteria
3.3. Experimental Results
3.4. Discussion
4. Conclusions and Future Directions
Funding
Data Availability Statement
Conflicts of Interest
References
- Pandya, S.; Srivastava, G.; Jhaveri, R.; Babu, M.R.; Bhattacharya, S.; Maddikunta, P.K.R.; Mastorakis, S.; Piran, M.J.; Gadekallu, T.R. Federated learning for smart cities: A comprehensive survey. Sustain. Energy Technol. Assess. 2023, 55, 102987. [Google Scholar] [CrossRef]
- Dhivya, C.; Monika, A. Cutting Edge Technologies in Agriculture—Role and Its Application. In Encyclopedia of Agriculture and Allied Sciences; Royal Book Publishing-International: St Baltimore, MD, USA, 2023. [Google Scholar]
- Wang, T.; Gan, V.J. Automated joint 3D reconstruction and visual inspection for buildings using computer vision and transfer learning. Autom. Constr. 2023, 149, 104810. [Google Scholar] [CrossRef]
- Thiruthaigesan, K.; Nawarathna, R.; Ragel, R. Detection of Suspicious Objects in Unconstrained Environments to Effectively Manage a Security System in Sri Lanka. In Multisectoral Approaches to Accelerate Economic Transformation in the Face of Crisis in Sri Lanka; National Science and Technology Commission, Sri Lanka Young Scientists Forum (YSF): Battaramulla, Sri Lanka, 2023; p. 255. [Google Scholar]
- Aliouat, A.; Kouadria, N.; Maimour, M.; Harize, S.; Doghmane, N. Region-of-interest based video coding strategy for rate/energy-constrained smart surveillance systems using WMSNs. Ad Hoc Netw. 2023, 140, 103076. [Google Scholar] [CrossRef]
- Dellermann, D.; Ebel, P.; Söllner, M.; Leimeister, J.M. Hybrid intelligence. Bus. Inf. Syst. Eng. 2019, 61, 637–643. [Google Scholar] [CrossRef]
- Zhang, X.; Yao, L.; Wang, X.; Monaghan, J.; Mcalpine, D.; Zhang, Y. A survey on deep learning-based non-invasive brain signals: Recent advances and new frontiers. J. Neural Eng. 2021, 18, 031002. [Google Scholar] [CrossRef]
- Gunasekaran, K.P.; Jaiman, N. Now You See Me: Robust approach to Partial Occlusions. arXiv 2023, arXiv:2304.11779. [Google Scholar]
- Beymer, D.; McLauchlan, P.; Coifman, B.; Malik, J. A real-time computer vision system for measuring traffic parameters. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Juan, PR, USA, 17–19 June 1997; IEEE: Piscataway, NJ, USA, 1997; pp. 495–501. [Google Scholar]
- Wen, L.; Du, D.; Cai, Z.; Lei, Z.; Chang, M.C.; Qi, H.; Lim, J.; Yang, M.H.; Lyu, S. UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking. Comput. Vis. Image Underst. 2020, 193, 102907. [Google Scholar] [CrossRef]
- Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.; Chen, J.; Liu, X.; Pietikäinen, M. Deep learning for generic object detection: A survey. Int. J. Comput. Vis. 2020, 128, 261–318. [Google Scholar] [CrossRef]
- Ye, H.; Zhao, J.; Pan, Y.; Chen, W.; He, L.; Zhang, H. Robot Person Following Under Partial Occlusion. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 7591–7597. [Google Scholar]
- Jha, S.; Seo, C.; Yang, E.; Joshi, G.P. Real time object detection and tracking system for video surveillance system. Multimed. Tools Appl. 2021, 80, 3981–3996. [Google Scholar] [CrossRef]
- Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3d object detection network for autonomous driving. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1907–1915. [Google Scholar]
- Kortylewski, A.; Liu, Q.; Wang, A.; Sun, Y.; Yuille, A. Compositional convolutional neural networks: A robust and interpretable model for object recognition under occlusion. Int. J. Comput. Vis. 2021, 129, 736–760. [Google Scholar] [CrossRef]
- Zhan, X.; Pan, X.; Dai, B.; Liu, Z.; Lin, D.; Loy, C.C. Self-supervised scene de-occlusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3784–3792. [Google Scholar]
- Ehsani, K.; Mottaghi, R.; Farhadi, A. Segan: Segmenting and generating the invisible. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6144–6153. [Google Scholar]
- Bagautdinov, T.; Fleuret, F.; Fua, P. Probability occupancy maps for occluded depth images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2829–2837. [Google Scholar]
- Wang, A.; Sun, Y.; Kortylewski, A.; Yuille, A.L. Robust object detection under occlusion with context-aware CompositionalNets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12645–12654. [Google Scholar]
- Bharati, P.; Pramanik, A. Deep learning techniques—R-CNN to mask R-CNN: A survey. In Computational Intelligence in Pattern Recognition: Proceedings of CIPR 2019; Springer: Berlin/Heidelberg, Germany, 2020; pp. 657–668. [Google Scholar]
- Ali, W.; Abdelkarim, S.; Zidan, M.; Zahran, M.; El Sallab, A. Yolo3d: End-to-end real-time 3d oriented object bounding box detection from lidar point cloud. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Takahashi, M.; Ji, Y.; Umeda, K.; Moro, A. Expandable YOLO: 3D object detection from RGB-D images. In Proceedings of the 2020 21st International Conference on Research and Education in Mechatronics (REM), Cracow, Poland, 9–11 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5. [Google Scholar]
- Ouyang, W.; Wang, X.; Zeng, X.; Qiu, S.; Luo, P.; Tian, Y.; Li, H.; Yang, S.; Wang, Z.; Loy, C.C.; et al. Deepid-net: Deformable deep convolutional neural networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2403–2412. [Google Scholar]
- Yang, C.; Ablavsky, V.; Wang, K.; Feng, Q.; Betke, M. Learning to separate: Detecting heavily-occluded objects in urban scenes. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 530–546. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar] [CrossRef] [PubMed]
- Li, Y.; Gu, C.; Dullien, T.; Vinyals, O.; Kohli, P. Graph matching networks for learning the similarity of graph structured objects. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; PMLR: London, UK, 2019; pp. 3835–3845. [Google Scholar]
- Cao, Y.; Wang, Y.; Peng, J.; Zhang, L.; Xu, L.; Yan, K.; Li, L. DML-GANR: Deep metric learning with generative adversarial network regularization for high spatial resolution remote sensing image retrieval. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8888–8904. [Google Scholar] [CrossRef]
- Qiu, F.; Pi, Y.; Liu, K.; Li, X.; Zhang, J.; Wu, Y. Influence of sports expertise level on attention in multiple object tracking. PeerJ 2018, 6, e5732. [Google Scholar] [CrossRef] [PubMed]
- Liu, Z.; Li, X.; Luo, P.; Loy, C.C.; Tang, X. Deep learning markov random field for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1814–1828. [Google Scholar] [CrossRef]
- Singh, K.K.; Yu, H.; Sarmasi, A.; Pradeep, G.; Lee, Y.J. Hide-and-seek: A data augmentation technique for weakly-supervised localization and beyond. arXiv 2018, arXiv:1811.02545. [Google Scholar]
- Li, P.; Li, X.; Long, X. Fencemask: A data augmentation approach for pre-extracted image features. arXiv 2020, arXiv:2006.07877. [Google Scholar]
- DeVries, T.; Taylor, G.W. Improved regularization of convolutional neural networks with cutout. arXiv 2017, arXiv:1708.04552. [Google Scholar]
- Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13001–13008. [Google Scholar]
- Saleh, K.; Szénási, S.; Vámossy, Z. Occlusion handling in generic object detection: A review. In Proceedings of the 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI), Herl’any, Slovakia, 21–23 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 000477–000484. [Google Scholar]
- Saleh, K.; Szénási, S.; Vámossy, Z. Generative Adversarial Network for Overcoming Occlusion in Images: A Survey. Algorithms 2023, 16, 175. [Google Scholar] [CrossRef]
- Ruan, J.; Cui, H.; Huang, Y.; Li, T.; Wu, C.; Zhang, K. A Review of Occluded Objects Detection in Real Complex Scenarios for Autonomous Driving. In Green Energy and Intelligent Transportation; Elsevier: Amsterdam, The Netherlands, 2023; p. 100092. [Google Scholar]
- Gilroy, S.; Jones, E.; Glavin, M. Overcoming Occlusion in the Automotive Environment—A Review. IEEE Trans. Intell. Transp. Syst. 2019, 22, 23–35. [Google Scholar] [CrossRef]
- Pérez-Hernández, F.; Tabik, S.; Lamas, A.; Olmos, R.; Fujita, H.; Herrera, F. Object detection binary classifiers methodology based on deep learning to identify small objects handled similarly: Application in video surveillance. Knowl.-Based Syst. 2020, 194, 105590. [Google Scholar] [CrossRef]
- Kortylewski, A.; He, J.; Liu, Q.; Yuille, A.L. Compositional convolutional neural networks: A deep architecture with innate robustness to partial occlusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8940–8949. [Google Scholar]
- Newell, A.; Yang, K.; Deng, J. Stacked hourglass networks for human pose estimation. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part VIII 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 483–499. [Google Scholar]
- Zoph, B.; Cubuk, E.D.; Ghiasi, G.; Lin, T.Y.; Shlens, J.; Le, Q.V. Learning data augmentation strategies for object detection. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXVII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 566–583. [Google Scholar]
- Wang, K.; Gou, C.; Duan, Y.; Lin, Y.; Zheng, X.; Wang, F.Y. Generative adversarial networks: Introduction and outlook. IEEE/CAA J. Autom. Sin. 2017, 4, 588–598. [Google Scholar] [CrossRef]
- Mumuni, A.; Mumuni, F. Robust appearance modeling for object detection and tracking: A survey of deep learning approaches. Prog. Artif. Intell. 2022, 11, 279–313. [Google Scholar] [CrossRef]
- Mumuni, A.; Mumuni, F. Data augmentation: A comprehensive survey of modern approaches. Array 2022, 16, 100258. [Google Scholar] [CrossRef]
- Berclaz, J.; Shahrokni, A.; Fleuret, F.; Ferryman, J.; Fua, P. Evaluation of probabilistic occupancy map people detection for surveillance systems. In Proceedings of the IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, Miami, FL, USA, 20–25 June 2009. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27. [Google Scholar] [CrossRef]
- Ivamoto, V.; Simões, R.; Kemmer, B.; Lima, C. Occluded Face In-painting Using Generative Adversarial Networks—A Review. In Proceedings of the Brazilian Conference on Intelligent Systems, Belo Horizonte, Brazil, 25–29 September 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 243–258. [Google Scholar]
- Porkodi, S.; Sarada, V.; Maik, V.; Gurushankar, K. Generic image application using GANs (generative adversarial networks): A review. Evol. Syst. 2022, 14, 903–917. [Google Scholar] [CrossRef]
- Sunil, S.; Mozaffari, S.; Singh, R.; Shahrrava, B.; Alirezaee, S. Feature-Based Occupancy Map-Merging for Collaborative SLAM. Sensors 2023, 23, 3114. [Google Scholar] [CrossRef] [PubMed]
- Sun, Y.; Kortylewski, A.; Yuille, A. Amodal Segmentation Through Out-of-Task and Out-of-Distribution Generalization with a Bayesian Model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1215–1224. [Google Scholar]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
- Sharifani, K.; Amini, M. Machine Learning and Deep Learning: A Review of Methods and Applications. World Inf. Technol. Eng. J. 2023, 10, 3897–3904. [Google Scholar]
- Somers, V.; De Vleeschouwer, C.; Alahi, A. Body part-based representation learning for occluded person Re-Identification. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 1613–1623. [Google Scholar]
- Deepa, D.; Kupparu, J. A deep learning based stereo matching model for autonomous vehicle. IAES Int. J. Artif. Intell. 2023, 12, 87. [Google Scholar] [CrossRef]
- Liu, J.; Gao, J.; Ji, S.; Zeng, C.; Zhang, S.; Gong, J. Deep learning based multi-view stereo matching and 3D scene reconstruction from oblique aerial images. ISPRS J. Photogramm. Remote Sens. 2023, 204, 42–60. [Google Scholar] [CrossRef]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Sozzi, M.; Cantalamessa, S.; Cogato, A.; Kayad, A.; Marinello, F. Automatic bunch detection in white grape varieties using YOLOv3, YOLOv4, and YOLOv5 deep learning algorithms. Agronomy 2022, 12, 319. [Google Scholar] [CrossRef]
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475. [Google Scholar]
- Huang, Z.; Li, L.; Krizek, G.C.; Sun, L. Research on Traffic Sign Detection Based on Improved YOLOv8. J. Comput. Commun. 2023, 11, 226–232. [Google Scholar] [CrossRef]
- Sharma, P.; Gupta, S.; Vyas, S.; Shabaz, M. Retracted: Object detection and recognition using deep learning-based techniques. IET Commun. 2023, 17, 1589–1599. [Google Scholar] [CrossRef]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
- Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; LeCun, Y. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv 2013, arXiv:1312.6229. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Washington, DC, USA, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Sucar, L.E. Probabilistic graphical models. In Advances in Computer Vision and Pattern Recognition; Springer: London, UK, 2015; Volume 10, p. 1. [Google Scholar]
- Wang, R.; Yan, J.; Yang, X. Learning combinatorial embedding networks for deep graph matching. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3056–3065. [Google Scholar]
- Salehinejad, H.; Sankar, S.; Barfett, J.; Colak, E.; Valaee, S. Recent advances in recurrent neural networks. arXiv 2017, arXiv:1801.01078. [Google Scholar]
- Gong, C.; Wang, D.; Li, M.; Chandra, V.; Liu, Q. Keepaugment: A simple information-preserving data augmentation approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1055–1064. [Google Scholar]
- Chen, P.; Liu, S.; Zhao, H.; Jia, J. Gridmask data augmentation. arXiv 2020, arXiv:2001.04086. [Google Scholar]
- Xiang, Y.; Mottaghi, R.; Savarese, S. Beyond pascal: A benchmark for 3d object detection in the wild. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Steamboat Springs, CO, USA, 24–26 March 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 75–82. [Google Scholar]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The kitti vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 3354–3361. [Google Scholar]
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11621–11631. [Google Scholar]
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
- Zhang, S.; Benenson, R.; Schiele, B. Citypersons: A diverse dataset for pedestrian detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3213–3221. [Google Scholar]
- Yebes, J.J.; Bergasa, L.M.; Arroyo, R.; Lázaro, A. Supervised learning and evaluation of KITTI’s cars detector with DPM. In Proceedings of the 2014 IEEE Intelligent Vehicles Symposium Proceedings, Ypsilanti, MI, USA, 8–11 June 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 768–773. [Google Scholar]
- Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276. [Google Scholar] [CrossRef]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv 2014, arXiv:1412.7062. [Google Scholar]
- Zhao, J.; Chu, J.; Leng, L.; Pan, C.; Jia, T. RGRN: Relation-aware graph reasoning network for object detection. Neural Comput. Appl. 2023, 35, 16671–16688. [Google Scholar] [CrossRef]
| Dataset | Description | Number of Classes | Number of Images | Data Type |
|---|---|---|---|---|
| KITTI [75] | Real-world urban scenes with varying occlusion levels (high occlusion focus) | 3 (Car, Pedestrian, Cyclist) | 7481 | 3D LIDAR scans and camera images |
| NuScenes [76] | Diverse urban driving scenarios with extensive sensor data (low occlusion focus) | 10+ | 1000+ | 3D LIDAR scans and camera images |
| OccludedPascal3D [74] | Varied indoor and outdoor scenes with varying occlusions (high occlusion focus) | 12 | 2073 | RGB images and point clouds |
| PascalVOC 2012 [77] | Diverse scenes and environments for object detection (low occlusion focus) | 20+ | 17,125 | Camera images |
| CityPersons [78] | Urban pedestrian detection with varying occlusion levels (high occlusion focus) | 1 (Pedestrian) | 5000 | Camera images |
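The occlusion levels referenced for KITTI in this table, and the Easy/Moderate/Hard splits reported in the experimental results further below, derive from the per-object truncation and occlusion flags stored in the KITTI label files. The following is a minimal sketch, not the benchmark's official code: it parses the standard 15-field KITTI label format and assigns each ground-truth box to a difficulty tier using the official threshold values (the label-file path is hypothetical).

```python
# Minimal sketch: bucket KITTI ground-truth boxes into Easy/Moderate/Hard.
# The thresholds below are the official KITTI benchmark criteria
# (minimum box height in pixels, maximum occlusion level, maximum truncation).

from pathlib import Path

DIFFICULTY = {
    "Easy":     (40.0, 0, 0.15),
    "Moderate": (25.0, 1, 0.30),
    "Hard":     (25.0, 2, 0.50),
}

def difficulty_of(truncation, occlusion, box_height):
    """Return the easiest tier a box satisfies, or None if it is ignored by the benchmark."""
    for name, (min_h, max_occ, max_trunc) in DIFFICULTY.items():
        if box_height >= min_h and occlusion <= max_occ and truncation <= max_trunc:
            return name
    return None

def load_kitti_labels(label_file):
    """Parse one KITTI label file (type, truncation, occlusion, alpha, bbox, ...) per line."""
    objects = []
    for line in Path(label_file).read_text().splitlines():
        f = line.split()
        cls, trunc, occ = f[0], float(f[1]), int(f[2])
        left, top, right, bottom = map(float, f[4:8])
        objects.append({
            "class": cls,
            "bbox": (left, top, right, bottom),
            "difficulty": difficulty_of(trunc, occ, bottom - top),
        })
    return objects

# Hypothetical usage:
# print(load_kitti_labels("training/label_2/000123.txt"))
```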
| Model | AP(%) Car (KITTI 2D) | AP(%) Pedestrian (KITTI 2D) | AP(%) Cyclist (KITTI 2D) | AP(%) Person (CityPersons) |
|---|---|---|---|---|
| F-RCNN | 57.2 | 53.8 | 48.5 | 79.3 |
| YOLOv5s | 65.6 | 63.3 | 58.4 | 82.5 |
| YOLOv6s | 66.7 | 64.4 | 58.1 | 84.1 |
| YOLOv7 | 59.2 | 58.4 | 47.6 | 80.9 |
| YOLOv8s | 67.2 | 63.4 | 60.7 | 85.7 |
| YOLO-NAS | 69.2 | 64.3 | 61.7 | 87.3 |
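The AP(%) values in the table above follow the standard detection protocol: detections are matched to ground truth by intersection-over-union (IoU) and the precision-recall curve is integrated into an average precision score. The sketch below is a simplified, single-class illustration of that protocol, assuming greedy matching at an IoU threshold of 0.5 and all-point interpolation; it is not the exact evaluation code behind these numbers (KITTI, VOC, and CityPersons tooling differ in interpolation and difficulty handling).

```python
# Simplified single-class AP: greedy IoU matching + all-point interpolation.
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def average_precision(detections, ground_truth, iou_thr=0.5):
    """detections: list of (image_id, score, box); ground_truth: dict image_id -> list of boxes."""
    detections = sorted(detections, key=lambda d: -d[1])          # highest confidence first
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    n_gt = sum(len(b) for b in ground_truth.values())
    tp, fp = np.zeros(len(detections)), np.zeros(len(detections))
    for i, (img, _, box) in enumerate(detections):
        gts = ground_truth.get(img, [])
        best, best_j = 0.0, -1
        for j, gt in enumerate(gts):                              # best-overlapping ground truth
            o = iou(box, gt)
            if o > best:
                best, best_j = o, j
        if best >= iou_thr and not matched[img][best_j]:
            tp[i] = 1
            matched[img][best_j] = True                           # each GT matched at most once
        else:
            fp[i] = 1
    rec = np.cumsum(tp) / max(n_gt, 1)
    prec = np.cumsum(tp) / np.maximum(np.cumsum(tp) + np.cumsum(fp), 1e-9)
    mrec = np.concatenate(([0.0], rec, [1.0]))                    # all-point interpolation
    mpre = np.concatenate(([0.0], prec, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```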
AP per class:

| Model | aeroplane | bird | bicycle | boat | bottle | bus | car | cat | chair | cow | dining table | dog | horse | motorbike | person | potted plant | sheep | sofa |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepLab-CRF | 79.3 | 76.7 | 78.7 | 77.6 | 76.1 | 78.5 | 79.5 | 74.8 | 76.4 | 73.7 | 78.4 | 73.7 | 78.0 | 78.9 | 79.1 | 76.8 | 74.4 | 77.3 |
| RGRN | 76.3 | 73.7 | 75.9 | 74.3 | 72.6 | 75.9 | 76.9 | 71.8 | 73.6 | 71.0 | 76.1 | 70.9 | 75.6 | 75.9 | 76.2 | 73.9 | 70.7 | 75.1 |
| YOLONAS-Cutout | 92.4 | 88.7 | 89.8 | 88.3 | 87.9 | 90.1 | 91.3 | 86.6 | 88.3 | 85.8 | 91.2 | 85.9 | 91.0 | 89.3 | 91.4 | 87.8 | 86.5 | 88.6 |
| YOLONAS-GridMask | 89.1 | 86.1 | 87.6 | 86.4 | 85.4 | 88.3 | 89.2 | 83.5 | 85.9 | 82.9 | 89.2 | 82.9 | 89.0 | 87.0 | 89.3 | 85.5 | 83.5 | 87.0 |
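YOLONAS-Cutout and YOLONAS-GridMask in the table above appear to denote YOLO-NAS trained with the Cutout [32] and GridMask [73] augmentations, which simulate occlusion by erasing image regions during training. The snippets below are simplified NumPy re-implementations of the two masking schemes, offered as sketches only; the square size, grid period, and drop ratio are illustrative defaults, not the values used in these experiments.

```python
# Simplified Cutout and GridMask occlusion-style augmentations (NumPy sketches).
import numpy as np

def cutout(image, size=50, rng=np.random):
    """Zero out one size x size square at a random position (Cutout)."""
    h, w = image.shape[:2]
    cy, cx = rng.randint(h), rng.randint(w)
    y1, y2 = max(0, cy - size // 2), min(h, cy + size // 2)
    x1, x2 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = image.copy()
    out[y1:y2, x1:x2] = 0
    return out

def gridmask(image, d=64, ratio=0.5, rng=np.random):
    """Drop a regular grid of squares: each d x d cell loses a (ratio*d)-sized block (GridMask)."""
    h, w = image.shape[:2]
    mask = np.ones((h, w), dtype=image.dtype)
    drop = max(1, int(d * ratio))
    oy, ox = rng.randint(d), rng.randint(d)          # random offset of the grid
    for y in range(oy - d, h, d):
        for x in range(ox - d, w, d):
            mask[max(0, y):min(h, y + drop), max(0, x):min(w, x + drop)] = 0
    return image * mask[..., None] if image.ndim == 3 else image * mask
```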
| Network | Data | Car (Easy) | Car (Moderate) | Car (Hard) | Pedestrian (Easy) | Pedestrian (Moderate) | Pedestrian (Hard) | Cyclist (Easy) | Cyclist (Moderate) | Cyclist (Hard) |
|---|---|---|---|---|---|---|---|---|---|---|
| YOLO-NAS | 2D | 97.4 | 80.5 | 69.2 | 92.5 | 76.8 | 64.3 | 89.8 | 73.2 | 61.7 |
| DeepLab-CRF | 2D | 79.3 | 70.7 | 68.7 | 76.1 | 69.5 | 65.5 | 74.8 | 66.4 | 63.7 |
| RGRN | 2D | 76.3 | 69.7 | 67.9 | 74.3 | 67.6 | 63.9 | 71.8 | 64.6 | 60.0 |
| YOLONAS-Cutout | 2D | 98.0 | 85.5 | 73.2 | 94.7 | 80.2 | 69.9 | 90.9 | 75.1 | 70.7 |
| YOLONAS-GridMask | 2D | 97.5 | 82.3 | 70.8 | 93.1 | 79.6 | 68.2 | 89.3 | 74.8 | 69.1 |
| CompNet | 2D | 81.6 | 76.8 | 72.3 | 78.9 | 71.2 | 66.7 | 75.2 | 69.1 | 65.6 |
| YOLO3D | LIDAR | 79.8 | 64.5 | 49.3 | 75.2 | 60.1 | 45.8 | 69.7 | 54.6 | 39.4 |
| MV3D | 2D + LIDAR | 92.5 | 88.4 | 86.1 | 89.7 | 84.5 | 81.2 | 82.6 | 79.1 | 75.4 |
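Reproducing detector predictions for an evaluation like the one above requires only an off-the-shelf single-stage model plus the AP computation sketched earlier. As a hedged example (assuming the `ultralytics` package, a COCO-pretrained `yolov8s.pt` checkpoint, and a hypothetical image folder), the following collects YOLOv8s detections in the (image_id, score, box) format used by the `average_precision()` sketch above. YOLO-NAS itself is distributed through a different package (super-gradients), so YOLOv8s, which also appears in the comparison, is used here for illustration.

```python
# Hedged sketch: run a COCO-pretrained YOLOv8s over a folder of images and
# collect per-class detections. Folder path and class selection are illustrative.
from pathlib import Path
from ultralytics import YOLO

model = YOLO("yolov8s.pt")                      # COCO-pretrained single-stage detector
image_dir = Path("kitti/testing/image_2")       # hypothetical image folder

detections = {"car": [], "person": []}
for img_path in sorted(image_dir.glob("*.png")):
    result = model.predict(str(img_path), conf=0.25, verbose=False)[0]
    for box, score, cls_id in zip(result.boxes.xyxy.tolist(),
                                  result.boxes.conf.tolist(),
                                  result.boxes.cls.tolist()):
        name = result.names[int(cls_id)]        # COCO class name, e.g. "car", "person"
        if name in detections:
            detections[name].append((img_path.stem, float(score), tuple(box)))
```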