SoC-VRP: A Deep-Reinforcement-Learning-Based Vehicle Route Planning Mechanism for Service-Oriented Cooperative ITS
Abstract
1. Introduction
2. SoC-VRP and Its Solving Principle with DRL
2.1. Analysis of SoC-VRP
2.2. Reinforcement-Learning-Based Solving Model
2.2.1. Fundamental Theory of DRL
2.2.2. Fundamental DRL Models for VRP and SoC-VRP
3. A Novel DRL-Based SoC-VRP Solution Mechanism
3.1. Design of DRL “Environment”
3.2. Design of DRL “Agent”
3.2.1. The “State” Model
3.2.2. The “Action” Model
3.2.3. Design of “Reward”
3.2.4. Neural Network Model
3.3. Improvement via Rainbow DQN
4. Experiments and Verification
4.1. Establishment of the Simulation Environment
4.2. Experiments and Analysis
4.2.1. Comparison of Variants Addressing the VRP
4.2.2. Optimized Effects in the SoC-VRP
- (i) Single random flow;
- (ii) Ten random flows.
4.2.3. Larger Traffic Network Verification
4.2.4. Vehicle Proportion and Quantity Comparison
- Increasing vehicle proportion experiments;
- Increasing vehicle quantity experiments.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
SoC-VRP | Service-Oriented Cooperative Vehicle Routing Problem
Appendix A
References
Params | Definition
---|---
V | Vehicle agent
S/s | Space of states and a single state
A/a | Space of actions and a single action
R_t | Reward function at step t
P | State transition probability matrix
γ | Attenuation (discount) factor
Q | Value function
T | Total number of steps
G_t | Accumulated discounted reward after step t
w/b | Weights and biases in the neural network
p | Position in the environment
  | Priority of vehicles or roads
  | Length of road
  | Number of vehicles on the road
  | Expected time for vehicles passing this road
v | Velocity
Δt | Time difference
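As a worked illustration of how these symbols combine, the accumulated discounted reward G_t and the value function Q follow the standard reinforcement-learning definitions; the forms below are the textbook equations written in the table's notation, not formulas reproduced from the paper.

```latex
G_t = \sum_{k=0}^{T-t-1} \gamma^{k} R_{t+k+1},
\qquad
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[ G_t \mid S_t = s,\; A_t = a \right]
```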
Scene | Reward Element | Value Illustration
---|---|---
ITS |  | 5
SoC-ITS |  | 1
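The reward-element formulas in the table above did not survive extraction, so the following is only a minimal Python sketch of what a scene-dependent, priority-weighted step reward of this shape could look like. The function name `step_reward`, the time-saving term, and the weighting scheme (5 for the plain ITS scene, priority-scaled for SoC-ITS, mirroring the value column) are all assumptions, not the paper's actual reward design.

```python
def step_reward(scene: str, priority: float,
                travel_time: float, expected_time: float) -> float:
    """Hypothetical priority-weighted step reward (assumed form).

    The agent is rewarded for traversing a road faster than its
    expected passing time; high-priority (service-oriented) vehicles
    are assumed to scale this reward in the SoC-ITS scene.
    """
    time_saving = expected_time - travel_time  # positive if faster
    if scene == "ITS":
        weight = 5.0                # assumed flat weight for plain ITS
    else:                           # "SoC-ITS"
        weight = 1.0 * priority     # assumed priority scaling
    return weight * time_saving
```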
Setting | Network 1 | Network 2 | Network 3 | Network 4
---|---|---|---|---
Edge quantity | 14 | 22 | 36 | 54
Junction quantity | 5 | 8 | 13 | 20
Average edge length (m) | 132.94 | 259.51 | 501.11 | 397.04
Allowed velocity (m/s) | 13 | 20 | 20 | 20
Hyperparameters | Network 1 | Network 2 | Network 3 | Network 4
---|---|---|---|---
T-network update/learning | 500 | 800 | 1000 | 1000
Gamma | 0.5 | 0.6 | 0.7 | 0.8
Multisteps | 2 | 3 | 5 | 8
Learning rate | 0.0001 | 0.0001 | 0.0001 | 0.0001
Batch size | 128 | 128 | 128 | 128
Memory size | 100,000 | 100,000 | 100,000 | 100,000
Number of atoms | 51 | 51 | 51 | 51
PER attenuating (α) | 0.6 | 0.6 | 0.6 | 0.6
PER calculating (β) | 0.4→1 | 0.4→1 | 0.4→1 | 0.4→1
Noisy initial (σ) | 0.5 | 0.5 | 0.5 | 0.5
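For concreteness, the table above maps naturally onto a configuration object for a Rainbow DQN agent, whose standard components (multi-step returns, distributional atoms, prioritized replay with α/β exponents, noisy-layer σ) match the listed rows. The sketch below is a hypothetical arrangement; the class and field names are illustrative, not taken from the paper's code.

```python
from dataclasses import dataclass

@dataclass
class RainbowConfig:
    """Hypothetical container for the per-network hyperparameters above."""
    target_update_interval: int   # T-network update/learning period
    gamma: float                  # discount (attenuation) factor
    n_steps: int                  # multi-step return length
    learning_rate: float = 1e-4
    batch_size: int = 128
    memory_size: int = 100_000
    num_atoms: int = 51           # distributional RL support size
    per_alpha: float = 0.6        # PER priority exponent ("attenuating")
    per_beta: float = 0.4         # PER IS-weight exponent, annealed to 1
    noisy_sigma: float = 0.5      # initial noisy-layer sigma

# Example: the settings listed for Network 4.
net4 = RainbowConfig(target_update_interval=1000, gamma=0.8, n_steps=8)
```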