Vehicle Classification Algorithm Based on Improved Vision Transformer
Abstract
1. Introduction
- We propose IND-ViT, an improved vision transformer network for vehicle classification. It introduces a local detail feature extraction module that recovers the fine local features lost when the vision transformer slices the image directly into patches, improving the model's perceptual ability (see the first sketch after this list).
- To address misclassification caused by the high visual similarity between certain vehicle types, we propose a sparse attention module that aggregates the attention-weight information of all encoder layers to locate the discriminative regions of the image, further strengthening the model's fine-grained feature representation (see the second sketch after this list).
- We draw on the contrast loss function to further increase the intra-class consistency and inter-class separation of the features the network learns: the loss maximizes the similarity between classification features that share a label and minimizes the similarity between features with different labels (see the third sketch after this list).
- Extensive experiments on the BIT-Vehicles, CIFAR-10, Oxford Flowers-102, and Caltech-101 datasets show that the proposed method improves accuracy over the original ViT by 1.30%, 1.21%, 7.54%, and 3.60%, respectively, outperforming other mainstream methods.
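Since the paper's full architecture is not reproduced in this outline, the following is a minimal PyTorch sketch of what a convolutional local-detail branch of this kind might look like. The kernel sizes, channel widths, and fusion step here are illustrative assumptions, not the authors' exact Inception D configuration.

```python
# Sketch of an Inception-style local-detail branch (assumed layout, not
# the paper's exact Inception D module). Parallel convolutions at several
# receptive fields capture the fine local structure that direct 16x16
# patch slicing discards.
import torch
import torch.nn as nn

class LocalDetailBranch(nn.Module):
    def __init__(self, in_ch: int = 3, out_ch: int = 64):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1), nn.BatchNorm2d(out_ch), nn.ReLU())
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU())
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, out_ch, 5, padding=2), nn.BatchNorm2d(out_ch), nn.ReLU())
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, 1)  # merge the multi-scale feature maps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1))

# Example: compute local features to combine with the patch embeddings.
x = torch.randn(2, 3, 224, 224)
local_feats = LocalDetailBranch()(x)  # -> (2, 64, 224, 224)
```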
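One common way to realize the second bullet is attention rollout: multiply the residual-corrected attention maps of every encoder layer together and keep the patch tokens that the class token attends to most. The sketch below follows that recipe; the head averaging and the top-k selection are our assumptions, not necessarily the paper's exact sparse attention module.

```python
# Attention-rollout-style token selection (illustrative stand-in for the
# paper's sparse attention module).
import torch

def select_discriminative_tokens(attn_per_layer, tokens, k=8):
    """attn_per_layer: list of (B, heads, N, N) attention maps, one per layer.
    tokens: (B, N, D) last-layer token embeddings, index 0 being the CLS token."""
    B, _, N, _ = attn_per_layer[0].shape
    rollout = torch.eye(N).expand(B, N, N).clone()
    for attn in attn_per_layer:
        a = attn.mean(dim=1)                    # average over heads -> (B, N, N)
        a = a + torch.eye(N)                    # account for the residual connection
        a = a / a.sum(dim=-1, keepdim=True)     # renormalize rows
        rollout = torch.bmm(a, rollout)         # accumulate across layers
    cls_attn = rollout[:, 0, 1:]                # CLS attention over the patch tokens
    idx = cls_attn.topk(k, dim=-1).indices      # indices of the most-attended patches
    idx = (idx + 1).unsqueeze(-1).expand(-1, -1, tokens.size(-1))  # shift past CLS
    return torch.gather(tokens, 1, idx)         # (B, k, D) discriminative tokens
```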
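Finally, the contrast loss in the third bullet can be sketched as a batch-level objective on the classification (CLS) features: pairs with the same label are pulled together and pairs with different labels are pushed apart. The cosine-similarity form and the margin below are illustrative assumptions rather than the paper's exact definition.

```python
# Batch-level contrast loss sketch: maximize intra-class similarity,
# minimize inter-class similarity (assumed cosine/margin formulation).
import torch
import torch.nn.functional as F

def contrast_loss(cls_feats: torch.Tensor, labels: torch.Tensor, margin: float = 0.4):
    """cls_feats: (B, D) classification features; labels: (B,) class ids."""
    z = F.normalize(cls_feats, dim=1)
    sim = z @ z.t()                                     # (B, B) cosine similarities
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-label mask
    eye = torch.eye(len(labels), dtype=torch.bool)
    loss = sim.new_zeros(())
    pos = sim[same & ~eye]                              # same label, different sample
    neg = sim[~same]
    if pos.numel() > 0:
        loss = loss + (1.0 - pos).mean()                # pull same-label features together
    if neg.numel() > 0:
        loss = loss + F.relu(neg - margin).mean()       # push different-label features apart
    return loss
```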
2. Related Works
2.1. Vehicle Classification Based on Convolutional Neural Network
2.2. Vehicle Classification Based on Attention Mechanism
3. Improved Vehicle Classification Model of Vision Transformer (IND-ViT)
3.1. Local Detail Feature Extraction Module: Inception D
3.2. Image Partitioning and Location Coding
3.3. Encoder
3.4. Sparse Attention Module
3.5. Loss Function
4. Experimental Results and Analysis
4.1. Datasets and Evaluation Indicators
4.2. Experimental Environment
4.3. Training Parameters
4.4. Comparative Experiment
4.5. Qualitative Analysis
4.6. Ablation Experiment
5. Conclusions
6. Discussion and Outlook
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Won, M. Intelligent traffic monitoring systems for vehicle classification: A survey. IEEE Access 2020, 8, 73340–73358.
- Wang, P.; Ouyang, T.; Zhao, S.; Wang, X.; Ni, Z.; Fan, Y. Intelligent Vehicle Formation System Based on Information Interaction. World Electr. Veh. J. 2024, 15, 252.
- Dai, Z.; Guan, Z.; Chen, Q.; Xu, Y.; Sun, F. Enhanced Object Detection in Autonomous Vehicles through LiDAR—Camera Sensor Fusion. World Electr. Veh. J. 2024, 15, 297.
- Shi, D.; Chu, F.; Cai, Q.; Wang, Z.; Lv, Z.; Wang, J. Research on a Path Tracking Control Strategy for Autonomous Vehicles Based on State Parameter Identification. World Electr. Veh. J. 2024, 15, 295.
- Ressi, D.; Romanello, R.; Piazza, C.; Rossi, S. AI-enhanced blockchain technology: A review of advancements and opportunities. J. Netw. Comput. Appl. 2024, 225, 103858.
- Chen, Z.; Pears, N.; Freeman, M.; Austin, J. Road vehicle classification using support vector machines. In Proceedings of the 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems, Shanghai, China, 20–22 November 2009; Volume 4, pp. 214–218.
- Kafai, M.; Bhanu, B. Dynamic Bayesian networks for vehicle classification in video. IEEE Trans. Ind. Inform. 2011, 8, 100–109.
- Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J.; et al. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377.
- Kaur, D.; Uslu, S.; Rittichier, K.J.; Durresi, A. Trustworthy artificial intelligence: A review. ACM Comput. Surv. 2022, 55, 1–38.
- Butt, M.A.; Khattak, A.M.; Shafique, S.; Hayat, B.; Abid, S.; Kim, K.I.; Ayub, M.W.; Sajid, A.; Adnan, A. Convolutional neural network based vehicle classification in adverse illuminous conditions for intelligent transportation systems. Complexity 2021, 2021, 6644861.
- Deshpande, S.; Muron, W.; Cai, Y. Vehicle classification. In Computer Vision and Imaging in Intelligent Transportation Systems; John Wiley & Sons: Hoboken, NJ, USA, 2017; pp. 47–79.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2017; Volume 30.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Maungmai, W.; Nuthong, C. Vehicle classification with deep learning. In Proceedings of the 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS), Singapore, 23–25 February 2019; pp. 294–298.
- Yu, S.; Wu, Y.; Li, W.; Song, Z.; Zeng, W. A model for fine-grained vehicle classification based on deep learning. Neurocomputing 2017, 257, 97–103.
- Ma, Z.; Chang, D.; Xie, J.; Ding, Y.; Wen, S.; Li, X.; Si, Z.; Guo, J. Fine-grained vehicle classification with channel max pooling modified CNNs. IEEE Trans. Veh. Technol. 2019, 68, 3224–3233.
- Jo, S.Y.; Ahn, N.; Lee, Y.; Kang, S.J. Transfer learning-based vehicle classification. In Proceedings of the 2018 International SoC Design Conference (ISOCC), Daegu, Republic of Korea, 12–15 November 2018; pp. 127–128.
- Neupane, B.; Horanont, T.; Aryal, J. Real-time vehicle classification and tracking using a transfer learning-improved deep learning network. Sensors 2022, 22, 3813.
- Hasanvand, M.; Nooshyar, M.; Moharamkhani, E.; Selyari, A. Machine learning methodology for identifying vehicles using image processing. Artif. Intell. Appl. 2023, 1, 170–178.
- Zhao, T.; He, J.; Lv, J.; Min, D.; Wei, Y. A comprehensive implementation of road surface classification for vehicle driving assistance: Dataset, models, and deployment. IEEE Trans. Intell. Transp. Syst. 2023, 24, 8361–8370.
- Zhao, D.; Chen, Y.; Lv, L. Deep reinforcement learning with visual attention for vehicle classification. IEEE Trans. Cogn. Dev. Syst. 2016, 9, 356–367.
- Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 568–578.
- Zhu, J.; Fang, L.; Ghamisi, P. Deformable convolutional neural networks for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1254–1258.
- Chen, Z.; Xie, L.; Niu, J.; Liu, X.; Wei, L.; Tian, Q. Visformer: The vision-friendly transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 589–598.
- d’Ascoli, S.; Touvron, H.; Leavitt, M.L.; Morcos, A.S.; Biroli, G.; Sagun, L. ConViT: Improving vision transformers with soft convolutional inductive biases. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 2286–2296.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
- Peng, Z.; Huang, W.; Gu, S.; Xie, L.; Wang, Y.; Jiao, J.; Ye, Q. Conformer: Local features coupling global representations for visual recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 367–376.
- Graham, B.; El-Nouby, A.; Touvron, H.; Stock, P.; Joulin, A.; Jégou, H.; Douze, M. LeViT: A vision transformer in ConvNet’s clothing for faster inference. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 12259–12269.
- Deshmukh, P.; Satyanarayana, G.S.R.; Majhi, S.; Sahoo, U.K.; Das, S.K. Swin transformer based vehicle detection in undisciplined traffic environment. Expert Syst. Appl. 2023, 213, 118992.
- Roy, S.K.; Deria, A.; Shah, C.; Haut, J.M.; Du, Q.; Plaza, A. Spectral-spatial morphological attention transformer for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–15.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Loshchilov, I.; Hutter, F. Fixing weight decay regularization in Adam. arXiv 2018, arXiv:1711.05101.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
- Dong, X.; Bao, J.; Chen, D.; Zhang, W.; Yu, N.; Yuan, L.; Chen, D.; Guo, B. CSWin transformer: A general vision transformer backbone with cross-shaped windows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12124–12134.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
Per-class accuracy on the BIT-Vehicles dataset:

| Class | ResNet-50 [33] | ViT [13] | Visformer-Tiny [24] | LeViT-256 [28] | CSWin-T [34] | IND-ViT |
|---|---|---|---|---|---|---|
| Truck | 96.30% | 95.80% | 96.20% | 96.00% | 97.00% | 97.60% |
| Sedan | 98.50% | 98.40% | 98.50% | 98.90% | 98.20% | 99.20% |
| SUV | 84.60% | 84.30% | 84.70% | 85.60% | 85.80% | 87.20% |
| Minivan | 99.80% | 99.90% | 99.95% | 99.85% | 99.90% | 99.99% |
| Bus | 99.80% | 99.90% | 99.90% | 99.99% | 99.95% | 99.99% |
| Microbus | 96.60% | 96.20% | 97.00% | 96.40% | 96.80% | 98.30% |
| Average accuracy | 95.93% | 95.75% | 96.05% | 96.12% | 96.28% | 97.05% |
Inference time comparison:

| Network Model | Time (ms) |
|---|---|
| ResNet-50 [33] | 8.6372 |
| ViT [13] | 9.5764 |
| Visformer-Tiny [24] | 10.6314 |
| LeViT-256 [28] | 12.0295 |
| CSWin-T [34] | 11.1245 |
| IND-ViT | 10.8236 |
Classification accuracy on public datasets:

| Model | CIFAR-10 | Oxford Flowers-102 | Caltech-101 |
|---|---|---|---|
| ResNet-18 [33] | 0.9328 | 0.7559 | 0.6401 |
| ResNet-50 [33] | 0.9435 | 0.7953 | 0.6419 |
| ViT [13] | 0.9842 | 0.7267 | 0.6859 |
| Visformer-Tiny [24] | 0.9644 | 0.7216 | 0.5304 |
| LeViT-256 [28] | 0.9651 | 0.7185 | 0.4619 |
| CSWin-T [34] | 0.9788 | 0.7678 | 0.5950 |
| IND-ViT | 0.9963 | 0.8021 | 0.7219 |
Ablation results (CNN-In D = Inception D local detail branch; SAM = sparse attention module; √ = enabled, -- = disabled):

| Exp | CNN-In D | SAM | BIT-Vehicles | CIFAR-10 | Oxford Flowers-102 | Caltech-101 |
|---|---|---|---|---|---|---|
| 1 | -- | -- | 0.9428 | 0.9512 | 0.7358 | 0.6475 |
| 2 | √ | -- | 0.9639 | 0.9684 | 0.7537 | 0.6518 |
| 3 | -- | √ | 0.9785 | 0.9626 | 0.7451 | 0.6630 |
| 4 | √ | √ | 0.9842 | 0.9961 | 0.7742 | 0.6854 |
Effect of the contrast loss (√ = with, × = without):

| Method | Contrast Loss | BIT-Vehicles | CIFAR-10 | Oxford Flowers-102 | Caltech-101 |
|---|---|---|---|---|---|
| ViT | × | 0.9535 | 0.9842 | 0.7267 | 0.6859 |
| ViT | √ | 0.9620 | 0.9894 | 0.7414 | 0.6890 |
| IND-ViT | × | 0.9645 | 0.9908 | 0.7834 | 0.7066 |
| IND-ViT | √ | 0.9740 | 0.9962 | 0.8037 | 0.7256 |
Share and Cite
Dong, X.; Shi, P.; Tang, Y.; Yang, L.; Yang, A.; Liang, T. Vehicle Classification Algorithm Based on Improved Vision Transformer. World Electr. Veh. J. 2024, 15, 344. https://doi.org/10.3390/wevj15080344