Traffic Accident Detection Using Background Subtraction and CNN Encoder–Transformer Decoder in Video Frames
Abstract
1. Introduction
- A framework is proposed that uses YOLOv5 to automatically subtract the road background, minimizing background interference, and integrates a CNN encoder with a Transformer decoder to jointly model the spatiotemporal relationships between objects, giving the model an improved understanding of traffic accidents.
- Unlike previous techniques, this method does not average or concatenate spatiotemporal features. Instead, it extracts features between different time points in the input video frames.
- The framework establishes an end-to-end model that processes the sequential input video frames in parallel, so the model outputs detection results for multiple frames simultaneously. This makes it feasible to apply the framework to large-scale datasets.
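
The background-subtraction step in the first contribution can be illustrated with a minimal sketch. It assumes that an object detector such as YOLOv5 has already returned corner-format boxes for a frame, and that a bounding-box mask is a binary 224 × 224 image (the mask dimension listed in the hyperparameter table). The helper name `boxes_to_mask`, the box layout, and the binary encoding are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2) corner coordinates in source-frame pixels (assumed layout).
Box = Tuple[float, float, float, float]


def boxes_to_mask(boxes: Sequence[Box], frame_w: int, frame_h: int,
                  size: int = 224) -> List[List[int]]:
    """Rasterize detector boxes into a size x size binary mask.

    1 marks pixels inside any box; 0 marks the discarded background.
    """
    mask = [[0] * size for _ in range(size)]
    for x1, y1, x2, y2 in boxes:
        # Rescale to mask resolution, then clamp to the mask bounds.
        c0 = max(0, math.floor(x1 * size / frame_w))
        c1 = min(size, math.ceil(x2 * size / frame_w))
        r0 = max(0, math.floor(y1 * size / frame_h))
        r1 = min(size, math.ceil(y2 * size / frame_h))
        for r in range(r0, r1):
            for c in range(c0, c1):
                mask[r][c] = 1
    return mask


# Hypothetical detections for one 1280x720 frame.
frame_boxes = [(0, 0, 640, 360), (1000, 500, 1280, 720)]
mask = boxes_to_mask(frame_boxes, frame_w=1280, frame_h=720)
foreground = sum(map(sum, mask))  # number of pixels kept as foreground
```

Applying the same function to every frame of a clip would yield the sequence of masks that a spatiotemporal model consumes; a pure-Python grid is used here only to keep the sketch dependency-free.
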
2. Related Work
3. Traffic-Accident-Detection Framework
3.1. Overview
3.2. Bounding-Box-Masks Extractor
3.3. Traffic-Accident Detector
4. Experiment
4.1. Experimental Environment
4.2. Experimental Data
4.3. Experimental Results
4.4. Ablation Experimental Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Road Safety Facts. Available online: http://www.asirt.org/safe-travel/road-safety-facts/ (accessed on 6 April 2023).
- Damjanović, M.; Stević, Ž.; Stanimirović, D.; Tanackov, I.; Marinković, D. Impact of the Number of Vehicles on Traffic Safety: Multiphase Modeling. Facta Univ. Ser. Mech. Eng. 2022, 20, 177–197.
- Qiu, L.; Li, S.; Sung, Y. 3D-DCDAE: Unsupervised Music Latent Representations Learning Method Based on a Deep 3D Convolutional Denoising Autoencoder for Music Genre Classification. Mathematics 2021, 9, 2274–2290.
- Jang, S.; Li, S.; Sung, Y. Fasttext-based Local Feature Visualization Algorithm for Merged Image-based Malware Classification Framework for Cyber Security and Cyber Defense. Mathematics 2020, 8, 460.
- Qiu, L.; Li, S.; Sung, Y. DBTMPE: Deep Bidirectional Transformers-based Masked Predictive Encoder Approach for Music Genre Classification. Mathematics 2021, 9, 530.
- Zhaoyou, M.; Changjun, W.; Shouen, F.; Shuo, L. Comparative Analysis and Control Strategy for Traffic Accidents in Different Types of Tunnels. In Proceedings of the 2019 5th International Conference on Transportation Information and Safety (ICTIS), Liverpool, UK, 14–17 July 2019; pp. 1132–1136.
- Chen, X.; Wu, S.; Shi, C.; Huang, Y.; Yang, Y.; Ke, R.; Zhao, J. Sensing Data Supported Traffic Flow Prediction via Denoising Schemes and ANN: A comparison. IEEE Sens. J. 2020, 20, 14317–14328.
- Ji, S.; Xu, W.; Yang, M.; Kai, Y. 3D Convolutional Neural Networks for Human Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 221–231.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99.
- Jin, X.; Srinivasan, D.; Cheu, R.L. Classification of Freeway Traffic Patterns for Incident Detection using Constructive Probabilistic Neural Networks. IEEE Trans. Neural Netw. 2001, 12, 1173–1187.
- Liu, G.; Jin, H.; Li, J.; Hu, X.; Li, J. A Bayesian Deep Learning Method for Freeway Incident Detection with Uncertainty Quantification. Accid. Anal. Prev. 2022, 176, 106796.
- Hadi, R.A.; George, L.E.; Mohammed, M.J. A Computationally Economic Novel Approach for Real-Time Moving Multi-Vehicle Detection and Tracking Toward Efficient Traffic Surveillance. Arab. J. Sci. Eng. 2017, 42, 817–831.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
- Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J.; et al. Recent Advances in Convolutional Neural Networks. Pattern Recognit. 2018, 77, 354–377.
- Yang, D.; Wu, Y.; Sun, F.; Chen, J.; Zhai, D.; Fu, C. Freeway Accident Detection and Classification based on the Multi-Vehicle Trajectory Data and Deep Learning Model. Transp. Res. Part C Emerg. Technol. 2021, 130, 103303.
- Bortnikov, M.; Khan, A.; Khattak, A.M.; Ahmad, M. Accident Recognition via 3D CNNs for Automated Traffic Monitoring in Smart Cities. In Proceedings of the 2019 Computer Vision Conference (CVC), Las Vegas, NV, USA, 25–26 April 2019; pp. 256–264.
- Tian, D.; Zhang, C.; Duan, X.; Wang, X. An Automatic Car Accident Detection Method based on Cooperative Vehicle Infrastructure Systems. IEEE Access 2019, 7, 127453–127463.
- Ijjina, E.P.; Chand, D.; Gupta, S.; Goutham, K. Computer Vision-Based Accident Detection in Traffic Surveillance. In Proceedings of the 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kanpur, India, 6–8 July 2019; pp. 1–6.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Humeau, S.; Shuster, K.; Lachaux, M.A.; Weston, J. Poly-Encoders: Transformer Architectures and Pre-Training Strategies for Fast and Accurate Multi-Sentence Scoring. arXiv 2019, arXiv:1905.01969.
- Han, K.; Xiao, A.; Wu, E.; Guo, J.; Xu, C.; Wang, Y. Transformer in Transformer. In Proceedings of the 2021 35th Advances in Neural Information Processing Systems (NIPS), Virtual, 7–10 December 2021; Volume 34, pp. 15908–15919.
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Alan, Y.; Zhou, Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv 2021, arXiv:2102.04306.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder–Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, T.; Polosukhin, I. Attention is All You Need. In Proceedings of the 2017 31st Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Chan, F.H.; Chen, Y.T.; Xiang, Y.; Sun, M. Anticipating Accidents in Dashcam Videos. In Revised Selected Papers, Part IV 13, Proceedings of the 13th Asian Conference on Computer Vision (ACCV), Taipei, Taiwan, 20–24 November 2016; Springer International Publishing: Berlin/Heidelberg, Germany; pp. 136–153.
- Li, X.; Porikli, F.M. A Hidden Markov Model Framework for Traffic Event Detection Using Video Features. In Proceedings of the IEEE 11th International Conference on Image Processing (ICIP), Singapore, 24–27 October 2004; pp. 2901–2904.
- Kamijo, S.; Matsushita, Y.; Ikeuchi, K.; Sakauchi, M. Traffic Monitoring and Accident Detection at Intersections. IEEE Trans. Intell. Transp. Syst. 2000, 1, 108–118.
- Zhou, Z. Attention based Stack ResNet for Citywide Traffic Accident Prediction. In Proceedings of the 2019 20th IEEE International Conference on Mobile Data Management (MDM), Hong Kong, China, 10–13 June 2019.
- Jiang, F.; Yuen, K.K.R.; Lee, E.W.M. A Long Short-Term Memory-Based Framework for Crash Detection on Freeways with Traffic Data of Different Temporal Resolutions. Accid. Anal. Prev. 2020, 141, 105520.
- Huang, X.; He, P.; Rangarajan, A.; Ranka, S. Intelligent Intersection: Two-Stream Convolutional Networks for Real-Time Near-Accident Detection in Traffic Video. ACM Trans. Spat. Algorithms Syst. (TSAS) 2020, 6, 1–28.
- Le, T.N.; Ono, S.; Sugimoto, A.; Kawasaki, H. Attention R-CNN for Accident Detection. In Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Melbourne, Australia, 7–11 September 2020; pp. 313–320.
- Kang, M.; Lee, W.; Hwang, K.; Yoon, Y. Vision Transformer for Detecting Critical Situations and Extracting Functional Scenario for Automated Vehicle Safety Assessment. Sustainability 2022, 14, 9680.
- Bao, W.; Yu, Q.; Kong, Y. Uncertainty-Based Traffic Accident Anticipation with Spatio–Temporal Relational Learning. In Proceedings of the 28th ACM International Conference on Multimedia, New York, NY, USA, 12–16 October 2020; pp. 2682–2690.
Recent Research | Neural Networks | Representation of Data | Type of Features | Parallel Structure | Strengthen Understanding |
---|---|---|---|---|---|
GMHMM [26] | - | Traffic Patterns | - | × | × |
MRF [27] | - | Event Patterns | - | × | × |
ASRAP [28] | ResNet, Attention | City Data | Spatiotemporal | × | √ |
LSTMDTR [29] | LSTM | Traffic Data | Temporal | × | × |
Two-Stream CNNs [30] | CNN | Object Trajectories | Spatiotemporal | × | √ |
Attention R-CNN [31] | CNN, Attention | Object Bounding Boxes | Spatial | × | × |
The Proposed Framework | CNN Encoder, Transformer Decoder | Bounding Box Masks | Spatiotemporal | √ | √ |
Hyperparameter | Proposed Framework | DCNN [15] | LSTMDTR [29] | ViT-TA [32] |
---|---|---|---|---|
Bounding Box Masks Dimension | (224, 224) | (224, 224) | (224, 224) | (224, 224) |
Max Sequence Length | 256 | 256 | 256 | 256 |
Attention Heads | 8 | - | - | 8 |
Dropout | 0.3 | 0.3 | 0.4 | 0.3 |
Batch Size | 128 | 128 | 128 | 64 |
Learning Rate | 2 × 10⁻⁶ | 1 × 10⁻⁵ | 3 × 10⁻⁴ | 1 × 10⁻⁶ |
Learning Rate Decay | 2 × 10⁻⁴ | 1 × 10⁻⁴ | 1 × 10⁻⁴ | 1 × 10⁻⁴ |
Total Training Epochs | 1000 | 1000 | 1000 | 1000 |
Optimizer | SGD | SGD | Adam | RAdam |
Objective Function | Softmax | Softmax | Softmax | Softmax |
Training Speed | 4.06 it/s | 1.87 it/s | 2.09 it/s | 3.56 it/s |
Car Crash Dataset | Value |
---|---|
Preprocessed Videos | 179 |
Frames per Video | 50 |
Total Coordinates Files | 179 |
Total Frames | 8950 |
Frames Labeled as Accident | 2496 |
Frames Labeled as Non-Accident | 6454 |
Total Bounding Box Masks | 8950 |
Training Data | 7160 (80%) |
Validation Data | 1790 (20%) |
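
The dataset counts above are mutually consistent, which a few lines of arithmetic confirm. The sketch below only recomputes figures already listed in the table; the variable names are illustrative, and the 80/20 split is taken at the frame level because that is how the table reports it.

```python
# Figures copied from the Car Crash Dataset table.
videos = 179
frames_per_video = 50
accident_frames = 2496
non_accident_frames = 6454

total_frames = videos * frames_per_video            # 179 * 50
labels_cover_all = accident_frames + non_accident_frames == total_frames

train_frames = total_frames * 80 // 100             # 80% training share
val_frames = total_frames - train_frames            # remaining 20%

# Share of frames labeled as accident (class balance).
accident_ratio = accident_frames / total_frames

# True only if the training set could consist of whole 50-frame videos.
split_on_video_boundary = train_frames % frames_per_video == 0

print(total_frames, train_frames, val_frames, round(accident_ratio, 3))
```

About 28% of the frames are labeled as accident, so the two classes are imbalanced. The training count is not a multiple of 50, which suggests (though the table does not say so) that the split was made over individual frames or masks rather than over whole videos.
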
[Table: sample frames for each traffic scenario: daytime, nighttime, snowy, rainy, low traffic volumes, and high traffic volumes.]
Method | Precision | Recall | F1-Score | Accuracy |
---|---|---|---|---|
Subtracting Background | 0.98 | 0.98 | 0.97 | 0.96 |
Without Subtracting Background | 0.95 | 0.95 | 0.96 | 0.91 |
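
For reference, the four metrics in the ablation table are conventionally derived from confusion-matrix counts. The sketch below uses the standard binary-classification definitions; the table does not state how the reported values were averaged or rounded, the function name `binary_metrics` is illustrative, and the example counts are hypothetical rather than taken from the paper.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard precision, recall, F1-score, and accuracy for one positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical counts, chosen only to exercise the formulas.
scores = binary_metrics(tp=90, fp=10, fn=10, tn=90)
print(scores)
```
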
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, Y.; Sung, Y. Traffic Accident Detection Using Background Subtraction and CNN Encoder–Transformer Decoder in Video Frames. Mathematics 2023, 11, 2884. https://doi.org/10.3390/math11132884