Dual-View Single-Shot Multibox Detector at Urban Intersections: Settings and Performance Evaluation
Abstract
1. Introduction
2. Related Works
3. Video Content Analysis System
3.1. Single-Shot Multibox Detector Model
3.1.1. Loss Function
3.1.2. Network Parameters, Training and Testing
3.2. VCA Architecture
3.3. Data Labeling
4. Driver Alert Use Case
4.1. Scenario Definition
4.2. Performance Evaluation
4.2.1. Object Detection Performance
4.2.2. Alert Generation Performance
5. Results
5.1. Object Detection Performance
5.2. Alert Generation Performance
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Parameter | Value |
|---|---|
| Output grids | 24 × 40, 12 × 20, 6 × 10 and 3 × 5 |
| Priors | 1 × 1, 2 × 1, 4 × 1, 1 × 4 and 1 × 2 |
| # trainable parameters | 5000 |
| Learning rate | |
| IoU threshold | |
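The output grids and priors listed above define the detector's anchor set: each grid cell contributes one default box per prior. As an illustration only, a minimal sketch of SSD-style default-box enumeration follows; the base scale per grid and the (cx, cy, w, h) parameterization are assumptions for this sketch, not values given in the table.

```python
from itertools import product

def default_boxes(grid_h, grid_w, priors, base_scale=0.1):
    """Enumerate SSD-style default boxes as (cx, cy, w, h) in [0, 1] coords.

    `priors` are (w_ratio, h_ratio) aspect pairs, e.g. (1, 2) for a 1 x 2 box.
    `base_scale` is an assumed per-grid scale, not taken from the paper.
    """
    boxes = []
    for i, j in product(range(grid_h), range(grid_w)):
        cx = (j + 0.5) / grid_w  # box center at the middle of cell (i, j)
        cy = (i + 0.5) / grid_h
        for wr, hr in priors:
            boxes.append((cx, cy, base_scale * wr, base_scale * hr))
    return boxes

# The five priors from the table, applied to the 12 x 20 grid:
priors = [(1, 1), (2, 1), (4, 1), (1, 4), (1, 2)]
print(len(default_boxes(12, 20, priors)))  # 12 * 20 * 5 = 1200 boxes
```

With this construction, the four listed grids together yield a multi-scale anchor set: coarse grids (3 × 5) cover large, near objects and fine grids (24 × 40) cover small, distant ones.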
| | TLC1 #Real Objects | TLC1 TPobj | TLC1 FPobj | TLC1 FNobj | TLC1 PREobj | TLC1 RECobj | TLC2 #Real Objects | TLC2 TPobj | TLC2 FPobj | TLC2 FNobj | TLC2 PREobj | TLC2 RECobj |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Grid: 12 × 20, Prior: 1 × 2 | 0.19 (0.80) | 0.10 (0.55) | 0.04 (0.20) | 0.08 (0.87) | 55% | 54% | 0.71 (1.43) | 0.24 (0.79) | 0.22 (0.57) | 0.40 (0.96) | 43% | 31% |
| Grid: 12 × 20, Prior: 2 × 1 | 0.02 (0.31) | 0.02 (0.24) | 0.08 (0.36) | 0.005 (0.13) | 11% | 66% | 0.34 (1.18) | 0.24 (0.99) | 0.28 (0.65) | 0.10 (0.61) | 24% | 67% |
| Grid: 6 × 10, Prior: 1 × 2 | 0.05 (0.35) | 0.05 (0.30) | 0.15 (0.42) | 0.02 (0.23) | 19.76% | 63.46% | 0.13 (0.48) | 0.07 (0.36) | 0.07 (0.27) | 0.06 (0.28) | 37% | 43% |
| Grid: 6 × 10, Prior: 2 × 1 | 0.005 (0.08) | 0.005 (0.10) | 0.18 (0.47) | 0.00 (0.00) | 1.6% | 100% | 0.08 (0.40) | 0.01 (0.14) | 0.08 (0.35) | 0.07 (0.43) | 15% | 21% |
| Grid: 3 × 5, Prior: 1 × 2 | 0.01 (0.14) | 0.00 (0.05) | 0.07 (0.25) | 0.01 (0.13) | 4.11% | 42.86% | - | - | 1.70 (0.59) | - | 0% | - |
| Grid: 3 × 5, Prior: 2 × 1 | 0.001 (0.03) | 0.00 (0.00) | 0.08 (0.28) | 0.001 (0.03) | 0% | 0% | 0.07 (0.33) | 0.04 (0.23) | 1.63 (0.56) | 0.03 (0.19) | 1.3% | 61.44% |
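The object-level counts above (TPobj, FPobj, FNobj) result from matching detections against ground-truth boxes at an IoU threshold. A minimal sketch of this standard matching step follows; it uses greedy one-to-one assignment and a 0.5 threshold as assumptions, and is not claimed to reproduce the paper's exact matching rule.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def match(dets, gts, thr=0.5):
    """Greedily match detections to ground truth; return (TP, FP, FN).

    Each ground-truth box can absorb at most one detection.
    """
    unmatched = list(gts)
    tp = 0
    for d in dets:
        best = max(unmatched, key=lambda g: iou(d, g), default=None)
        if best is not None and iou(d, best) >= thr:
            unmatched.remove(best)  # consume this ground-truth box
            tp += 1
    return tp, len(dets) - tp, len(unmatched)
```

Per-frame TP/FP/FN from this step, averaged over the test sequence, would produce mean (SD) entries like those in the table.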
| | TLC1 | TLC2 |
|---|---|---|
| PREobj | 17% | 73% |
| RECobj | 90% | 89% |
| | TLC1 Ground Truth Alerts | TLC1 TPalert | TLC2 Ground Truth Alerts | TLC2 TPalert | Fusion(TLC1,TLC2) Ground Truth Alerts | Fusion(TLC1,TLC2) TPalert |
|---|---|---|---|---|---|---|
| Grid: 12 × 20, Prior: 1 × 2 | 62 | 54 | 41 | 27 | 76 | 61 |
| Grid: 12 × 20, Prior: 2 × 1 | 9 | 6 | 29 | 25 | 34 | 28 |
| Grid: 6 × 10, Prior: 1 × 2 | 66 | 49 | 29 | 23 | 76 | 57 |
| Grid: 6 × 10, Prior: 2 × 1 | 8 | 8 | 3 | 0 | 11 | 8 |
| Grid: 3 × 5, Prior: 1 × 2 | 3 | 2 | 2 | 0 | 5 | 2 |
| Grid: 3 × 5, Prior: 2 × 1 | 7 | 4 | 3 | 0 | 10 | 4 |
| | Ground Truth Alerts | TPalert | TNalert | FPalert | FNalert | FPalert | FNalert |
|---|---|---|---|---|---|---|---|
| TLC1 | 89 | 77 | 865 | 46 | 12 | 2 | 0 |
| TLC2 | 74 | 59 | 908 | 18 | 15 | 1 | 2 |
| Fusion(TLC1,TLC2) | 125 | 105 | 827 | 48 | 20 | 3 | 3 |
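From alert-level counts like those in the table, precision and recall follow directly. The worked example below uses the TLC1 row (TP = 77, FP = 46, FN = 12); the resulting percentages are derived arithmetic on those counts, not values reported by the authors.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

# TLC1 alert counts from the table above:
p, r = precision_recall(tp=77, fp=46, fn=12)
print(round(p, 3), round(r, 3))  # 0.626 0.865
```

Note that recall can equivalently be computed against the ground-truth total: 77 / 89 for TLC1, consistent with TP + FN = 89.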
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lenatti, M.; Narteni, S.; Paglialonga, A.; Rampa, V.; Mongelli, M. Dual-View Single-Shot Multibox Detector at Urban Intersections: Settings and Performance Evaluation. Sensors 2023, 23, 3195. https://doi.org/10.3390/s23063195