Highly Accurate Deep Learning Models for Estimating Traffic Characteristics from Video Data
Abstract
1. Introduction
- This paper introduces a new affine transformation from image space to world space, derived from the referenced literature. A projection matrix is generated from road markings to represent the offset of each vehicle’s position relative to the camera for speed estimation [10].
- A modified fairness multi-object tracking (mFairMOT) network is applied to strengthen the ability of the standard FairMOT network to detect and track traffic conditions. Combined with the spatial affine transformation matrix, mFairMOT can extract traffic parameters such as individual vehicle speed, traffic flow, and vehicle type, and detect abnormal incidents such as fire, congestion, and parking.
- The developed algorithm can extract traffic parameters not only from real-time feeds but also from historical video data, without requiring any calibration.
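The image-to-world mapping described in the first contribution can be sketched as follows: a 3×3 projection (homography) matrix is fitted from pre-labelled road-marking correspondences via the direct linear transform and then used to map pixel coordinates to road-plane coordinates. This is a minimal illustration, not the paper's exact procedure; the point values and lane dimensions are hypothetical.

```python
import numpy as np

def fit_projection_matrix(img_pts, world_pts):
    """Fit a 3x3 projection (homography) matrix from >= 4 labelled
    road-marking correspondences using the direct linear transform."""
    rows = []
    for (x, y), (X, Y) in zip(img_pts, world_pts):
        rows.append([x, y, 1, 0, 0, 0, -X * x, -X * y, -X])
        rows.append([0, 0, 0, x, y, 1, -Y * x, -Y * y, -Y])
    # The homography is the null vector of the stacked constraint matrix.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]          # normalise the arbitrary scale

def image_to_world(H, pixel):
    """Map one pixel coordinate to road-plane (world) coordinates."""
    v = H @ np.array([pixel[0], pixel[1], 1.0])
    return v[:2] / v[2]         # homogeneous divide

# Hypothetical example: four road-marking corners whose real-world
# positions are assumed known from standard lane geometry (metres).
img_pts = [(320, 400), (420, 400), (450, 300), (300, 300)]
world_pts = [(0.0, 0.0), (3.5, 0.0), (3.5, 12.0), (0.0, 12.0)]
H = fit_projection_matrix(img_pts, world_pts)
```

Per-frame world positions of a tracked vehicle can then be differenced and multiplied by the frame rate to estimate its travelling speed.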
2. Literature Review
2.1. Sensor-Dependent Detection Methods
2.2. Video-Based Computer Vision Detection Methods
2.3. Camera Calibration and Object Tracking Algorithms
3. Methodology
3.1. Video Detection and Tracking Algorithm
- Simultaneously detecting and re-identifying (re-ID) tasks
- Effectively fusing aggregated features
- Person re-ID vs. vehicle re-ID
- Faster learning ability
- Generating high-quality re-ID features
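Conclusion (iv) notes that bounding boxes are obtained by applying non-max suppression (NMS) to the predictions of FairMOT's encoder-decoder network. A minimal greedy IoU-based NMS can be sketched as follows; the boxes, scores, and threshold are hypothetical.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    any remaining box overlapping it by more than iou_thresh."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        i = int(order[0])
        keep.append(i)
        rest = order[1:]
        mask = np.array([iou(boxes[i], boxes[int(j)]) < iou_thresh
                         for j in rest], dtype=bool)
        order = rest[mask]
    return keep
```

For two overlapping detections of the same vehicle plus one distant vehicle, only the strongest box of the overlapping pair survives.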
3.2. Vehicle Speed Calculation
3.3. Traffic Volume Calculation
3.4. Zero-Inflated Logarithmic Link for Count Time-Series Model
4. Results Analysis
4.1. Speed Detection Accuracy
4.2. Traffic Volume and Abnormal Traffic Scenarios Detection
- (1) Abnormal parking
- (2) People on the road
- (3) Traffic congestion
- (4) Vehicle fire
4.3. Real-World Application of the Video Detection Algorithm in Monthly Crash Prediction
5. Discussion
- (1) The lane detection method can be integrated into the model. More detailed dynamic traffic parameters, such as lane-changing times and lane speed variances, could then be incorporated into the crash prediction model to enhance the accuracy of real-time crash prediction.
- (2) The causal effect of dynamic traffic parameters on crash occurrence could be analysed. Deep learning computer vision techniques could be applied to extract traffic parameters relating to pre-crash scenarios; the relationship between crash occurrence and its influencing variables can then be modelled, and the influence of individual variables on crashes investigated.
- (3) Incorporating traffic operation parameters, and recognising the importance of the temporal-spatial structure in hourly traffic crash prediction, our ongoing research proposes a new joint model that combines the time-series generalized regression neural network (TGRNN) model with the binomially weighted convolutional neural network (BWCNN) model. The joint model aims to capture all these characteristics in short-term crash prediction.
6. Conclusions
- (i) Contrary to existing techniques, such as drawing virtual boxes or setting reference lines that must be measured in advance, this method requires no prior calibration and can therefore be considered a ‘generic’ approach.
- (ii) Important parameters can be estimated that existing camera-based video processing methods cannot provide, including lane-change behaviour, speed variance, fire detection, and vehicle trajectories.
- (iii) This technique offers higher accuracy, especially in estimating individual vehicle speed. It can therefore be applied not only to real-time video but also to existing footage, regardless of camera pose and recording angle.
- (iv) This paper optimises the fairness multi-object tracking (FairMOT) network with CSPDarknet53 to strengthen its ability to detect and track traffic scenarios. Bounding boxes are obtained by applying non-max suppression to the predictions of the encoder-decoder network within FairMOT, while CSPDarknet53 is integrated and the DLA structure retained to fuse aggregated features. The modified FairMOT associates the same vehicle across frames, yielding a screen-space distance offset per vehicle [33].
- (v) A proportional-integral-derivative (PID) controller was incorporated as a kernel function σ to suppress sporadic speed noise caused by the precision limitations of low-resolution video.
- (vi) Finally, the world-space speed (i.e., the travelling speed of each vehicle) was obtained by transforming the image-space offset with a projection matrix derived from pre-labelled road markings.
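The PID-based noise suppression in conclusion (v) can be illustrated with a minimal sketch in which the smoothed speed is the state of a PID loop tracking noisy per-frame measurements. The gain values are hypothetical, not those used in the paper.

```python
class PIDSmoother:
    """Tracks a noisy measurement with a PID control loop; the
    controlled state serves as the smoothed speed estimate."""

    def __init__(self, kp=0.3, ki=0.05, kd=0.1, dt=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.state = None          # current smoothed speed
        self.integral = 0.0        # accumulated error
        self.prev_error = 0.0

    def update(self, measurement):
        if self.state is None:     # initialise on the first sample
            self.state = measurement
            return self.state
        error = measurement - self.state
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # The PID correction nudges the state toward the measurement,
        # damping single-frame spikes from low-resolution detections.
        self.state += (self.kp * error + self.ki * self.integral
                       + self.kd * derivative)
        return self.state

smoother = PIDSmoother()
# A lone spike at frame 5 (90 km/h in a steady 60 km/h stream) is damped.
trace = [smoother.update(v) for v in [60.0] * 5 + [90.0] + [60.0] * 5]
```

With these gains, a single-frame 30 km/h spike is attenuated rather than reported verbatim, and the estimate settles back towards the steady speed over the following frames.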
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- IHS Markit, Video Surveillance: How Technology and The Cloud is Disrupting The Market. 2016. Available online: https://cdn.ihs.com/www/pdf/IHS-Markit-Technology-Video-surveillance.pdf (accessed on 14 June 2024).
- Sabey, B.; Staughton, G.C. Interacting Roles of Road Environment Vehicle and Road User in Accidents. CESTE 1 Most. 1975. Available online: https://trid.trb.org/View/46132 (accessed on 14 June 2024).
- Hossain, M.; Muromachi, Y. A Bayesian network based framework for real-time crash prediction on the basic freeway segments of urban expressways. Accid. Anal. Prev. 2011, 45, 373–381. [Google Scholar] [CrossRef]
- Zheng, L.; Sayed, T.; Mannering, F. Modeling traffic conflicts for use in road safety analysis: A review of analytic methods and future directions. Anal. Methods Accid. Res. 2021, 29, 100142. [Google Scholar] [CrossRef]
- Formosa, N.; Quddus, M.; Ison, S.; Abdel-Aty, M.; Yuan, J. Predicting real-time traffic conflicts using deep learning. Accid. Anal. Prev. 2020, 136, 105429. [Google Scholar] [CrossRef] [PubMed]
- Zhang, X.; Feng, Y.; Angeloudis, P.; Demiris, Y. Monocular visual traffic surveillance: A review. IEEE Trans. Intell. Transp. Syst. 2022, 23, 14148–14165. [Google Scholar] [CrossRef]
- Zhang, Y.; Wang, C.; Wang, X.; Zeng, W.; Liu, W. FairMOT: On the fairness of detection and re-identification in multiple object tracking. arXiv 2020, arXiv:2004.01888. [Google Scholar] [CrossRef]
- Dailey, D.; Cathey, F.; Pumrin, S. The Use of Uncalibrated Roadside CCTV Cameras to Estimate Mean Traffic Speed. December 2021. Available online: https://rosap.ntl.bts.gov/view/dot/14762 (accessed on 16 June 2024).
- Damulo, J.; Dy, R.; Pestaño, S.; Signe, D.; Vasquez, E.; Saavedra, L.; Cañete, E. Video-based traffic density calculator with traffic light control simulation. AIP Conf. Proc. 2020, 2278, 20046. [Google Scholar]
- Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468. [Google Scholar]
- Cai, B.; Quddus, M.; Miao, Y. A new modelling approach for predicting disaggregated time-series traffic crashes. In Proceedings of the 102nd Transportation Research Board Annual Meeting, Washington, DC, USA, 9–13 January 2022. [Google Scholar]
- Retallack, A.; Ostendorf, B. Relationship between traffic volume and accident frequency at intersections. Int. J. Environ. Res. Public Health 2020, 17, 1393. [Google Scholar] [CrossRef]
- Duivenvoorden, K. The Relationship between Traffic Volume and Road Safety on the Secondary Road Network; SWOV: Den Haag, The Netherlands, 2010. [Google Scholar]
- Eisenberg, D. The mixed effects of precipitation on traffic crashes. Accid. Anal. Prev. 2004, 36, 637–647. [Google Scholar] [CrossRef]
- Shefer, D.; Rietveld, P. Congestion and safety on highways: Towards an analytical model. Urban Stud. 1997, 34, 679–692. [Google Scholar] [CrossRef]
- Milton, J.; Mannering, F. The relationship among highway geometrics, traffic-related elements and motor-vehicle accident frequencies. Transportation 1998, 25, 395–413. [Google Scholar] [CrossRef]
- Ren, H.; Song, Y.; Wang, J.; Hu, Y.; Lei, J. A deep learning approach to the citywide traffic accident risk prediction. In Proceedings of the 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 3346–3351. [Google Scholar]
- Heyns, E.; Uniyal, S.; Dugundji, E.; Tillema, F.; Huijboom, C. Predicting traffic phases from car sensor data using machine learning. Procedia Comput. Sci. 2019, 151, 92–99. [Google Scholar] [CrossRef]
- Nithya, M.; Nagarajan, P.; Deepalakshmi, R.; Rani, M.; Swarna, S. Sensor based accident prevention system. J. Comput. Theor. Nanosci. 2020, 17, 1720–1724. [Google Scholar] [CrossRef]
- Ajao, L.A.; Abisoye, B.O.; Jibril, I.Z.; Jonah, I.Z.; Kolo, J.G. In-vehicle traffic accident detection and alerting system using distance-time based parameters and radar range algorithm. In Proceedings of the 2020 IEEE PES/IAS PowerAfrica, Nairobi, Kenya, 25–28 August 2020; pp. 1–5. [Google Scholar]
- Sable, T.; Parate, N.; Nadkar, D.; Shinde, S. Density and time based traffic control system using video processing. ITM Web Conf. 2020, 32, 3028. [Google Scholar] [CrossRef]
- Nguyen, N.; Do, T.; Ngo, T.; Le, D. An evaluation of deep learning methods for small object detection. J. Electr. Comput. Eng. 2020, 2020, 3189691. [Google Scholar] [CrossRef]
- Tian, D.; Zhang, C.; Duan, X.; Wang, X. An automatic car accident detection method based on cooperative vehicle infrastructure systems. IEEE Access 2019, 7, 127453–127463. [Google Scholar] [CrossRef]
- Feng, Y.; Zhao, Y.; Zhang, X.; Batista, S.; Demiris, Y.; Angeloudis, P. Predicting spatio-temporal traffic flow: A comprehensive end-to-end approach from surveillance cameras. Transp. B Transp. Dyn. 2024, 12, 2380915. [Google Scholar] [CrossRef]
- Wang, C.; Dai, Y.; Zhou, W.; Geng, Y. A vision-based video crash detection framework for mixed traffic flow environment considering low-visibility condition. J. Adv. Transp. 2020, 2020, 9194028. [Google Scholar] [CrossRef]
- Yao, Y.; Xu, M.; Wang, Y.; Crandall, D.; Atkins, E. Unsupervised traffic accident detection in first-person videos. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Venetian Macao, Macau, 4–8 November 2019; pp. 273–280. [Google Scholar]
- Huang, X.; He, P.; Rangarajan, A.; Ranka, S. Intelligent intersection: Two-stream convolutional networks for real-time near-accident detection in traffic video. ACM Trans. Spat. Algorithms Syst. 2020, 6, 10. [Google Scholar] [CrossRef]
- Ozbayoglu, M.; Kucukayan, G.; Dogdu, E. A real-time autonomous highway accident detection model based on big data processing and computational intelligence. In Proceedings of the 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA, 5–8 December 2016; Volume 12, pp. 1807–1813. [Google Scholar]
- Agrawal, A.K.; Agarwal, K.; Choudhary, J.; Bhattacharya, A.; Tangudu, S.; Makhija, N.; Bakthula, B. Automatic traffic accident detection system using ResNet and SVM. In Proceedings of the 2020 Fifth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), Bangalore, India, 26–27 November 2020; pp. 71–76. [Google Scholar]
- Zu, H.; Xu, Y.; Ma, L.; Fang, J. Vision-based real-time traffic accident detection. In Proceedings of the 11th World Congress on Intelligent Control and Automation, Shenyang, China, 29 June–4 July 2014; pp. 1035–1038. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar]
- Lai, H.S. Vehicle Extraction and Modeling, an Effective Methodology for Visual Traffic Surveillance. Ph.D. Thesis, The University of Hong Kong, Hong Kong, 2000. [Google Scholar]
- Bas, E.K.; Crisman, J.D. An easy to install camera calibration for traffic monitoring. In Proceedings of the Conference on Intelligent Transportation Systems, Boston, MA, USA, 9–12 November 1997; pp. 362–366. [Google Scholar]
- Fung, G.; Yung, N.; Pang, G. Camera calibration from road lane markings. Opt. Eng. 2003, 42, 2967–2977. [Google Scholar] [CrossRef]
- Haralick, R.M. Using perspective transformations in scene analysis. Comput. Graph. Image Process. 1980, 13, 191–221. [Google Scholar] [CrossRef]
- Liboschik, T.; Fokianos, K.; Fried, R. Tscount: An R package for analysis of count time series following generalized linear models. J. Stat. Softw. 2017, 82, 1–51. [Google Scholar] [CrossRef]
- An, W.; Wang, H.; Sun, Q.; Xu, J.; Dai, Q.; Zhang, L. A PID controller approach for stochastic optimization of deep networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8522–8531. [Google Scholar]
- Kim, J.; Sung, J.; Park, S. Comparison of Faster-RCNN, YOLO, and SSD for real-time vehicle type recognition. In Proceedings of the 2020 IEEE International Conference on Consumer Electronics—Asia (ICCE-Asia), Seoul, Republic of Korea, 1–3 November 2020; pp. 1–4. [Google Scholar]
Model | Benefits | Drawbacks
---|---|---
Visual average speed computer and recorder of OpenCV (VASCAR) [8] | Light computing | Relies heavily on object detection: if the first crossing of the reference line or the exit point is detected earlier or later, the calculated speed changes accordingly.
YOLO+DeepSORT [9] | Avoids the time-difference problem by not using a reference line | Suffers from the separation of the detection and tracking processes; overall speed accuracy is affected by small disturbances.
FairMOT [7] | Combines the detection and re-ID processes | Re-ID accuracy needs improvement to be more robust.
Modified FairMOT | Can track vehicles and generate their trajectories | No apparent drawback.
Distance (m) | Side View | Right Frontal View
---|---|---
≤20 | 96% (daytime) / 92% (nighttime) | 98% (daytime) / 94% (nighttime)
≤50 | 91% (daytime) / 84% (nighttime) | 95% (daytime) / 88% (nighttime)
>50 | 75% (daytime) / 70% (nighttime) | 85% (daytime) / 80% (nighttime)
Item | Accuracy | Main Error Cause | Optimisation Solution
---|---|---|---
Traffic flow | 96% (daytime), 95% (nighttime) | Influenced by articulated vehicles | Label more data for special vehicles
Parking | 92% (daytime), 90% (nighttime) | Recognises roadside signs as cars, especially at night | Label more data for the specific scenarios
Pedestrian walking | 93% (daytime), 89% (nighttime) | Occasionally recognises roadside trees as people | Label more data for the specific scenarios
Vehicle fires | 96% (daytime), 97% (nighttime) | When fire intensity is low, the algorithm sometimes fails to raise an alarm | Label more data for both smoke and flame
Traffic congestion | 92% (daytime), 90% (nighttime) | Affected by the tracking model; the same ID may be assigned twice | Improve tracking-model stability by changing its structure
Model | Prediction Accuracy | Mean Square Error
---|---|---
Prediction without Dynamic Traffic Data | 78.5% | 0.96
Prediction with Dynamic Traffic Data | 83.1% | 0.88
Model | IDF1 (Fraction of Correctly Identified Detections over the Average Number of True Detections) | MOTP (Multiple Object Tracking Precision: Accuracy of Detection-Box Localisation) | MOTA (Multi-Object Tracking Accuracy: Overall Accuracy of Both Tracking and Detection)
---|---|---|---
mFairMOT | 96.0% | 98.5% | 95.7%
YOLOv8 | 94.5% | 97.2% | 94.3%
SORT | 90.8% | 94.3% | 90.5%
DeepSORT | 93.9% | 95.9% | 93.8%
Kalman Filter | 88.2% | 91.2% | 87.9%
Share and Cite
Cai, B.; Feng, Y.; Wang, X.; Quddus, M. Highly Accurate Deep Learning Models for Estimating Traffic Characteristics from Video Data. Appl. Sci. 2024, 14, 8664. https://doi.org/10.3390/app14198664