Selective Scale-Aware Network for Traffic Density Estimation and Congestion Detection in ITS
Abstract
1. Introduction
- (1) A novel framework is proposed for estimating traffic density and congestion level from surveillance videos.
- (2) The Selective Scale-Aware Network (SSANet) is developed to generate the vehicle density map and to estimate the static congestion factor end to end. Its encoder layers use the Selective Scale Local Self-Attention (SSLSA) mechanism, which handles scale variation effectively.
- (3) A novel holistic traffic flow velocity estimation method is proposed, which uses the density map to guide the analysis of the optical flow map.
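One plausible reading of density-map-guided velocity estimation is a density-weighted average of the optical flow magnitude, so that pixels likely to contain vehicles dominate the estimate and background motion is suppressed. The sketch below illustrates only that idea; the function name `density_weighted_speed`, the weighting scheme, and the toy inputs are assumptions for illustration and are not taken from the paper.

```python
import math


def density_weighted_speed(density, flow_u, flow_v, eps=1e-8):
    """Return the density-weighted mean flow magnitude in pixels per frame.

    density, flow_u, flow_v are equally sized 2D lists (rows of floats).
    This weighting is an assumed illustration, not the paper's formula.
    """
    weighted_sum = 0.0
    total_density = 0.0
    for d_row, u_row, v_row in zip(density, flow_u, flow_v):
        for d, u, v in zip(d_row, u_row, v_row):
            # Flow magnitude at this pixel, weighted by vehicle density.
            weighted_sum += d * math.hypot(u, v)
            total_density += d
    if total_density < eps:
        # No vehicles detected: report zero rather than divide by ~0.
        return 0.0
    return weighted_sum / total_density


# Toy 2x2 example: only the two pixels with non-zero density contribute.
density = [[0.0, 0.5], [0.5, 0.0]]
flow_u = [[9.0, 3.0], [0.0, 9.0]]
flow_v = [[9.0, 4.0], [0.0, 9.0]]
speed = density_weighted_speed(density, flow_u, flow_v)
```

In this toy case the large background flow at zero-density pixels is ignored, and the result is the average of one moving and one stationary vehicle pixel.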
2. Related Work
2.1. Local Attention in Vision Transformers
2.2. Congestion Detection Method
3. Method
3.1. Static Congestion Quantification
3.2. Structure of the Selective Scale-Aware Network
3.2.1. Backbone
3.2.2. Encoder Layer
3.2.3. Selective Scale Local Self-Attention
3.2.4. Decoder Network
3.3. Traffic Flow Velocity Estimation
3.3.1. Optical Flow Estimation
3.3.2. Density-Map-Guided Traffic Velocity Estimation
3.4. Dynamic Traffic Congestion Detection
3.4.1. Dynamic Congestion Quantification
Algorithm 1: Dynamic Congestion Detection.
3.4.2. Traffic Congestion Detection
3.5. Ground Truth Generation
3.5.1. Ground Truth Density Map Generation
3.5.2. Ground Truth Static Congestion Factor Generation
4. Experiment
4.1. Traffic Congestion Detection Dataset Collection
4.2. Experiments for Static Congestion Estimation on COTRS
4.2.1. Evaluation Metrics
4.2.2. Comparison with Object Counting Methods
4.2.3. Comparison with Attention Mechanisms
4.3. Experiments for Congestion Detection on COTRS
4.3.1. Evaluation Metrics
4.3.2. Experimental Results
4.3.3. Evaluation of Processing Speed
4.4. Experiments on the UCSD Dataset
4.4.1. UCSD Dataset
4.4.2. Implementation and Training Process
4.4.3. Experimental Results
5. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Sadollah, A.; Gao, K.; Zhang, Y.; Zhang, Y.; Su, R. Management of traffic congestion in adaptive traffic signals using a novel classification-based approach. Eng. Optim. 2018, 51, 1509–1528.
- Buch, N.; Velastin, S.A.; Orwell, J. A Review of Computer Vision Techniques for the Analysis of Urban Traffic. IEEE Trans. Intell. Transp. Syst. 2011, 12, 920–939.
- Ribeiro, M.V.L.; Samatelo, J.L.A.; Bazzan, A.L.C. A New Microscopic Approach to Traffic Flow Classification Using a Convolutional Neural Network Object Detector and a Multi-Tracker Algorithm. IEEE Trans. Intell. Transp. Syst. 2022, 23, 3797–3801.
- Ke, X.; Shi, L.; Guo, W.; Chen, D. Multi-Dimensional Traffic Congestion Detection Based on Fusion of Visual Features and Convolutional Neural Network. IEEE Trans. Intell. Transp. Syst. 2019, 20, 2157–2170.
- Gao, Y.; Li, J.; Xu, Z.; Liu, Z.; Zhao, X.; Chen, J. A novel image-based convolutional neural network approach for traffic congestion estimation. Expert Syst. Appl. 2021, 180, 115037.
- Wang, Q.; Wan, J.; Yuan, Y. Locality constraint distance metric learning for traffic congestion detection. Pattern Recognit. 2018, 75, 272–281.
- Luo, Z.; Jodoin, P.-M.; Li, S.-Z.; Su, S.-Z. Traffic analysis without motion features. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015.
- Pamula, T. Road traffic conditions classification based on multilevel filtering of image content using convolutional neural networks. IEEE Intell. Transp. Syst. Mag. 2018, 10, 11–21.
- Lin, C.; Hu, X.; Zhan, Y.; Hao, X. MobileNetV2 with Spatial Attention module for traffic congestion recognition in surveillance images. Expert Syst. Appl. 2024, 255, 124701.
- Lin, C.; Hu, X. Efficient crowd density estimation with edge intelligence via structural reparameterization and knowledge transfer. Appl. Soft Comput. 2024, 154, 111366.
- Liu, Z. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021.
- Dong, X. CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022.
- Hassani, A.; Walton, S.; Li, J.; Li, S.; Shi, H. Neighborhood Attention Transformer. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023.
- Li, Y. LocalViT: Analyzing Locality in Vision Transformers. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023.
- Jiao, J. DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition. IEEE Trans. Multimed. 2023, 25, 8906–8919.
- Yang, J.; Li, C.; Zhang, P.; Dai, X.; Xiao, B.; Yuan, L.; Gao, J. Focal self-attention for local-global interactions in vision transformers. arXiv 2021, arXiv:2107.00641.
- Cao, X.; Lan, J.; Yan, P.; Li, X. Vehicle detection and tracking in airborne videos by multi-motion layer analysis. Mach. Vis. Appl. 2011, 23, 921–935.
- Ke, R.; Li, Z.; Tang, J.; Pan, Z.; Wang, Y. Real-Time Traffic Flow Parameter Estimation From UAV Video Based on Ensemble Classifier and Optical Flow. IEEE Trans. Intell. Transp. Syst. 2019, 20, 54–64.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Hu, S.; Wu, J.; Xu, L. Real-time traffic congestion detection based on video analysis. J. Inf. Comput. Sci. 2012, 9, 2907–2914.
- Li, K.Q.; Yin, Z.Y.; Zhang, N.; Li, J. A PINN-based modelling approach for hydromechanical behaviour of unsaturated expansive soils. Comput. Geotech. 2024, 169, 106174.
- Zhang, N.; Xu, K.; Yin, Z.Y.; Li, K.Q.; Jin, Y.F. Finite element-integrated neural network framework for elastic and elastoplastic solids. Comput. Methods Appl. Mech. Eng. 2025, 433, 117474.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Hui, T.-W.; Tang, X.; Loy, C.C. A Lightweight Optical Flow CNN—Revisiting Data Fidelity and Regularization. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 2555–2569.
- Seifnaraghi, N.; Ebrahimi, S.G.; Ince, E.A. Novel traffic lights signaling technique based on lane occupancy rates. In Proceedings of the 2009 24th International Symposium on Computer and Information Sciences, Guzelyurt, Turkey, 14–16 September 2009; pp. 592–596.
- Li, Y.; Zhang, X.; Chen, D. CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Liu, W.; Salzmann, M.; Fua, P. Context-Aware Crowd Counting. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
- Guerrero-Gómez-Olmedo, R.; Torre-Jiménez, B.; López-Sastre, R.; Maldonado-Bascón, S.; Oñoro-Rubio, D. Extremely Overlapping Vehicle Counting. In Pattern Recognition and Image Analysis; Springer: Cham, Switzerland, 2015; pp. 423–431.
- Chan, A.; Vasconcelos, N. Classification and retrieval of traffic video using auto-regressive stochastic processes. In Proceedings of the IEEE Proceedings. Intelligent Vehicles Symposium, 2005, Las Vegas, NV, USA, 6–8 June 2005.
- Sobral, A.; Oliveira, L.; Schnitman, L.; Souza, F.D. Highway Traffic Congestion Classification using Holistic Properties. In Proceedings of the Computer Graphics and Imaging/798: Signal Processing, Pattern Recognition and Applications, Innsbruck, Austria, 12–14 February 2013.
- Derpanis, K.G.; Wildes, R.P. Classification of traffic video based on a spatiotemporal orientation analysis. In Proceedings of the 2011 IEEE Workshop on Applications of Computer Vision (WACV), Kona, HI, USA, 5–7 January 2011.
- Asmaa, O.; Mokhtar, K.; Abdelaziz, O. Road traffic density estimation using microscopic and macroscopic parameters. Image Vis. Comput. 2013, 31, 887–894.
- Szegedy, C. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015.
Operation | Kernel Size | Stride | Filter Depth | Activation |
---|---|---|---|---|
Conv2D | | 1 | 256 | ReLU |
Conv2D | | 1 | 128 | ReLU |
Conv2D | | 1 | 1 | - |
Method | MAE | MSE |
---|---|---|
CSRNet [26] | 0.158 | 0.177 |
CAN [27] | 0.148 | 0.147 |
Repmobilenet [10] | 0.157 | 0.183 |
SSANet (Ours) | 0.117 | 0.158 |
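For readers who want to reproduce the MAE and MSE columns, the following is a minimal sketch under the conventional definitions, assuming both are computed over predicted and ground-truth static congestion factors. The function names and sample values are illustrative only. Note that some counting papers report the root of the mean squared error under the name MSE, so the exact definition should be checked against Section 4.2.1.

```python
def mae(pred, target):
    """Mean absolute error between two equally long sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred, target):
    """Mean squared error (no square root) between two sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Made-up congestion factors in [0, 1], purely for illustration.
predicted = [0.2, 0.5, 0.9]
ground_truth = [0.1, 0.5, 0.7]
mae_value = mae(predicted, ground_truth)
mse_value = mse(predicted, ground_truth)
```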
Method | MAE | MSE |
---|---|---|
Baseline | 0.165 | 0.177 |
Baseline + MHSA [23] | 0.142 | 0.154 |
Baseline + Swin [11] | 0.132 | 0.157 |
Baseline + CSwin [12] | 0.162 | 0.173 |
Baseline + SSLSA (Ours) | 0.117 | 0.158 |
Method | Accuracy | Precision | Recall | F-Measure |
---|---|---|---|---|
Ke et al. [4] | 0.9213 | 0.9169 | 0.9191 | 0.9180 |
Yolo2 + Kalman Filter [3] | 0.8898 | 0.8772 | 0.8925 | 0.8848 |
Faster R-CNN + DeepSort [3] | 0.9134 | 0.9001 | 0.9181 | 0.9090 |
SA-ResNet [9] | 0.9134 | 0.9001 | 0.9181 | 0.9090 |
SSANet | 0.9606 | 0.9614 | 0.9581 | 0.9598 |
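The four columns above follow from confusion-matrix counts. The sketch below assumes a binary congested (1) versus not congested (0) labelling and the standard definitions of each metric; the function name `binary_metrics` and the sample labels are hypothetical, and the paper may average over more than two classes.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F-measure for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_measure


# Made-up labels: 3 true positives, 3 true negatives, 1 FP, 1 FN.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, precision, recall, f_measure = binary_metrics(y_true, y_pred)
```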
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jian, C.; Lin, C.; Hu, X.; Lu, J. Selective Scale-Aware Network for Traffic Density Estimation and Congestion Detection in ITS. Sensors 2025, 25, 766. https://doi.org/10.3390/s25030766