Real-Time Adjustment Method for Metro Systems with Train Delays Based on Improved Q-Learning
Abstract
1. Introduction
1.1. Literature Review
- (1) The first type uses reinforcement learning to guide the search process of existing search algorithms. For example, Qu et al. employed reinforcement learning within the mixed-integer linear programming solution process, designing a reinforcement-learning-based branching strategy [34]. Tang et al. used reinforcement learning to select appropriate cutting planes for the branch-and-cut process [35].
- (2) The second type is based on data-driven approaches. For instance, Dündar et al. combined genetic algorithms with artificial neural networks to simulate train dispatchers in conflict resolution [36]. However, these methods may not be suitable for solving the TTR problem efficiently due to their extended response times.
- (3) The third type uses reinforcement learning to directly construct solution strategies, which is more suitable for the TTR problem. In the field of railway operations, Šemrov et al. were among the early adopters of the Q-learning algorithm for the single-track train rescheduling problem; however, the large state vector caused challenges, and the approach did not scale well to other scenarios [37]. Khadilkar extended the Q-learning algorithm to real-world instances of single-track and multi-track dispatch, reducing the state vector representation and enhancing scalability, although real-time computation remained a challenge [38]. Zhu et al. employed Q-learning to solve the railway timetable rescheduling problem and demonstrated its effectiveness in finding high-quality solutions with a limited training set [39]. Li et al. utilized a multi-agent deep reinforcement learning approach for the TTR problem; however, their method lacked delay information, had limited state generality, and exhibited insufficient scalability [40]. Ning et al. expanded the state representation to include the actual arrival and departure times of trains, providing more information about delays [41]. Some scholars observed that learned strategies were typically trained on individual problem instances; in response, Ghasempour et al. trained strategies on multiple instances to address delay scenarios at railway junctions [42]. Wang et al. used a policy-based reinforcement learning approach, but their problem allowed for the delay of only one train, which does not capture the simultaneous delays of multiple trains that often occur in reality [43]. In the field of metro systems, Su et al. used Q-learning to simulate metro train operations and adjust schedules; however, their state definition made it difficult to handle large-scale scenarios, leading to a risk of local optima or difficulties in problem-solving [44]. Liao et al. proposed a deep reinforcement learning approach (DRLA) to minimize energy consumption in metro train timetable rescheduling [45].
- (1) Manual adjustment methods rely on the experience and manual intervention of dispatchers when dealing with train delays. However, these methods are constrained by the formulation and adaptability of manual rules, which makes it difficult to cope with the complexity and variability of actual operational scenarios. Optimization model methods can represent train operations through complex mathematical formulations, but they often struggle to accurately capture the many complex factors present in real-world scenarios. Simulation methods attempt to make predictions by simulating the interactions between trains, but they face computational complexity challenges for large-scale networked operations, making it difficult to solve problems rapidly under strict real-time requirements. Heuristic algorithms are prone to getting stuck in local optima and often require significant computation time.
- (2) Reinforcement learning has gained considerable attention. Through interactive learning between intelligent agents and the environment, reinforcement learning gradually optimizes decision-making strategies, exhibiting adaptability and intelligence. In the field of metro systems, using reinforcement learning to directly construct solution strategies can better adapt to the complexity and variability of actual operational scenarios.
- (3) Currently, most reinforcement learning methods focus on cases where only a single train is delayed, neglecting scenarios where multiple trains experience simultaneous delays and failing to consider the interactions between trains. Furthermore, existing methods often involve complex state representations, which can lead to inefficient problem-solving and make it difficult to guarantee real-time performance and accuracy.
1.2. Contribution of This Paper
- (1) The proposed algorithm narrows down the scope of the state variables, providing a clear and concise description of delay situations, which facilitates efficient real-time processing and improves its effectiveness in practical applications.
- (2) A simulated annealing dynamic factor is introduced to improve convergence stability and computation speed.
- (3) The algorithm is trained on scenarios involving interacting delays among multiple trains, improving its universality and portability.
2. Problem Description
2.1. Train Operation and Train Delay
2.2. Measures for Train Delays
3. Model Construction
3.1. Model Notations and Assumptions
- (1) Due to the limited number of tracks at stations on metro lines, overtaking is not allowed; there are no overtaking situations in the model.
- (2) Trains follow a strategy of stopping at each station, and there are no instances where a train bypasses a station without stopping.
- (3) The running time in each block is predetermined by the ATO (automatic train operation) system; that is, the available block running time levels are those defined by the ATO system.
- (4) To ensure efficient passenger service, trains should preferably not stop within blocks. Even if a train does stop, its doors must remain closed within the block, and once the train receives clearance for the block ahead, it departs immediately without any additional stopping time.
- (5) In the case of a long delay, the metro staff may decide to let trains return to the depot. The situation after this operation can be regarded as a short-term delay scenario.
3.2. Objective Function
3.3. Constraint Conditions
- 1. Train Headway Constraint (an illustrative check of this constraint is sketched after this list)
- 2. Block Running Time Constraint
- 3. Station Dwelling Time Constraint
- 4. Turnaround Time Constraint
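Since the constraint equations (2)–(10) are not reproduced in this outline, the headway constraint named above can only be illustrated schematically. The following is a minimal sketch in Python; the function name, the representation of times in seconds, and the exact form of the check are assumptions for illustration rather than the paper's formulation.

```python
def satisfies_headway(prev_departure_s: int, departure_s: int, min_headway_s: int) -> bool:
    """Illustrative (assumed) headway check: two consecutive train services departing
    from the same station must be separated by at least the minimum headway."""
    return departure_s - prev_departure_s >= min_headway_s

# Hypothetical usage: with a 120 s minimum headway, a follower may depart 150 s
# after its predecessor, but not 90 s after.
assert satisfies_headway(28800, 28950, 120)
assert not satisfies_headway(28800, 28890, 120)
```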
4. TTR Using Reinforcement Learning
4.1. Improved Q-Learning Algorithm
- (1) The Q-learning algorithm offers a concise description and representation of actual states by utilizing a limited set of variables. Using a limited number of variables makes the algorithm more practical and reduces the complexity of the state space.
- (2) Pre-training the Q-value state table for various scenarios facilitates quick retrieval during delays or emergencies, enabling real-time, dynamic selection of strategies such as detaining specific trains or modifying running times. This efficient table lookup and reusability enables the algorithm to make accurate and rapid adjustments in the dynamically changing train operation environment, ensuring both system efficiency and safety.
- (3) The incorporation of a simulated annealing dynamic factor enhances the algorithm’s convergence stability and computational speed. This enables the algorithm to achieve the optimal strategy within a relatively small number of training iterations [50].
4.2. State Definition
4.3. Action
- (1) Strategy A corresponds to the normal state, where no action is taken for the train.
- (2) Strategy B represents train movement, where the train chooses a block running time level upon entering the block. The arg max operator represents taking the value of the argument that maximises the function.
- (3) Strategy C involves train detention, where the train determines the duration of detention at the station before its departure. The duration can also be a negative value (i.e., the scheduled dwell time is shortened).
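As a concrete, purely illustrative reading of the three strategies above, the following sketch encodes the state as a per-station delay vector and the actions as a small discrete set. The class name, the number of running time levels, and the detention durations are assumptions for illustration; the actual values follow from the ATO system and the constraints in Section 3.3.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed compact state: the delay (in seconds) of the checked train at each physical station.
State = Tuple[int, ...]          # e.g. (0, 0, 50, 0, ..., 0)

@dataclass(frozen=True)
class Action:
    strategy: str                # "A" (no action), "B" (block running time level), "C" (detention)
    value: int = 0               # level index for Strategy B, detention seconds for Strategy C

# Illustrative action set (levels and durations are assumed values).
ACTIONS = (
    [Action("A")]
    + [Action("B", level) for level in (1, 2, 3)]
    + [Action("C", seconds) for seconds in (-10, 10, 20, 30)]   # negative = shortened dwell
)
```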
4.4. Reward Function
4.5. Algorithm Process
- (1) Set the optimal action (the action with the largest Q-value in the current state) as the current action.
- (2) Select an action at random.
- (3) A random number is generated. According to the Metropolis criterion, the random number is compared with the acceptance probability, which depends on the Q-value difference between the randomly selected action and the current optimal action, on the adjusting factor, and on the iteration count. If the acceptance probability is larger, the random action is accepted as the current action; otherwise, the optimal action is kept unchanged. Here, the adjusting factor and the number of iterations of the algorithm together play the role of the temperature control parameter in the simulated annealing algorithm.
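A minimal sketch of this action-selection step is given below. The acceptance probability exp(ΔQ · p · k) is an assumed form that is consistent with the Metropolis criterion and with the reported behaviour that a larger adjusting factor p speeds up convergence; the paper's exact expression may differ.

```python
import math
import random

def select_action(q_row: dict, p: float = 0.01, k: int = 1, rng=random):
    """Simulated-annealing (Metropolis-style) action selection -- illustrative sketch.

    q_row : mapping from each candidate action to its current Q-value.
    p     : adjusting factor (assumed to control how fast the search cools).
    k     : current iteration count of the algorithm.
    """
    best = max(q_row, key=q_row.get)             # step (1): optimal (greedy) action
    candidate = rng.choice(list(q_row))          # step (2): random candidate action
    delta_q = q_row[candidate] - q_row[best]     # <= 0 whenever the candidate is worse
    # Step (3): accept the candidate with an assumed probability exp(delta_q * p * k),
    # so worse actions are accepted less and less often as the iteration count grows.
    accept_prob = math.exp(min(0.0, delta_q) * p * k)
    return candidate if rng.random() < accept_prob else best
```

Under this assumed form, early iterations still explore clearly suboptimal actions, whereas later iterations become almost greedy, which is the intended annealing behaviour.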
Algorithm 1: Real-time TTR Algorithm based on improved Q-learning with Train Delays
Step 1. Input the basic information of the transit line, including the planned train timetable and parameters.
Step 2. Input the delay conditions and determine the number of training iterations.
Step 3. Set the current checking time and define the checking time interval; increase the checking time by one interval at each iteration.
Step 3.1. Perform a sequential check of all stations along the line to determine the state using Equations (11) and (12).
Step 3.2. For trains that require action, search the Q-value table to identify all candidate actions that satisfy the constraint conditions in Equations (2)–(10).
Step 3.2.1. Calculate the reward value for each candidate action using Equations (15)–(17).
Step 3.2.2. Select the action to be taken based on the simulated annealing concept.
Step 3.3. Update the Q-value table using Equation (19).
Step 4. Repeat the above steps (Step 3) until the desired number of training iterations is reached.
Step 5. Output the optimal rescheduled timetable.
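The control flow of Algorithm 1 can be paraphrased as the Python skeleton below. The helper callables (check_states, candidate_actions, reward_of, update_q, select_action) stand in for Equations (2)–(19), which are not reproduced in this outline, so they are passed in as parameters; only the loop structure is taken from Algorithm 1, and everything else is a placeholder.

```python
def train_rescheduler(line, delay_scenario, n_iterations, interval,
                      check_states, candidate_actions, reward_of, update_q, select_action):
    """Skeleton of Algorithm 1 (improved Q-learning TTR); all helpers are placeholders."""
    q_table = {}                                                 # state -> {action: Q-value}
    for k in range(1, n_iterations + 1):                         # Step 4: repeat the training loop
        t = line.start_time                                      # Step 3: current checking time
        while t <= line.end_time:
            # Step 3.1: determine the state of each station at time t (Eqs. (11)-(12)).
            for train, station, state in check_states(line, delay_scenario, t):
                # Step 3.2: candidate actions satisfying the constraints (Eqs. (2)-(10)).
                actions = candidate_actions(q_table, state, line)
                # Step 3.2.1: reward of every candidate action (Eqs. (15)-(17)).
                rewards = {a: reward_of(state, a, line) for a in actions}
                # Step 3.2.2: simulated-annealing action selection over the stored Q-values.
                q_row = {a: q_table.setdefault(state, {}).setdefault(a, 0.0) for a in actions}
                action = select_action(q_row, k=k)
                # Step 3.3: Q-value update (Eq. (19)).
                update_q(q_table, state, action, rewards[action])
            t += interval                                        # advance the checking time by one interval
    return q_table                                               # Step 5: basis for the rescheduled timetable
```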
5. Case Study
5.1. Data Input
5.2. Case Results
5.3. Results Analysis
5.4. Sensitivity Analysis
5.5. Analysis of Testing on Different Lines
6. Conclusions
- (1) The improved Q-learning algorithm exhibited stable convergence and a rapid computation speed. Compared with the no-adjustment, manual adjustment, and FIFO methods, it achieved average reductions in total train delay of approximately 71.82%, 50.16%, and 39.09%, respectively. Compared with the traditional Q-learning algorithm, the average computation time decreased by around 55.89%, and the average iteration count decreased by approximately 65.22%.
- (2) The optimized method resulted in a more evenly rescheduled train timetable and aimed to minimize the total delay time caused by train delays. It effectively coordinated the detention of subsequent trains and the adjustment of section running times, ensuring the smooth operation of subsequent trains.
- (3) The transferability of the proposed algorithm is verified through a case study of Shenzhen Metro Line 1. The method demonstrates real-time adjustment capability and promotes the utilization of pre-trained Q-value tables in various scenarios.
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Latif, K.; Sharafat, A.; Seo, J. Digital Twin-Driven Framework for TBM Performance Prediction, Visualization, and Monitoring through Machine Learning. Appl. Sci. 2023, 13, 11435. [Google Scholar] [CrossRef]
- Melo, P.C.; Harris, N.G.; Graham, D.J.; Anderson, R.J.; Barron, A. Determinants of delay incident occurrence in urban metros. Transp. Res. Rec. 2011, 2216, 10–18. [Google Scholar] [CrossRef]
- Li, W.; Peng, Q.; Wen, C.; Wang, P.; Lessan, J.; Xu, X. Joint optimization of delay-recovery and energy-saving in a metro system: A case study from China. Energy 2020, 202, 117699. [Google Scholar] [CrossRef]
- Zhou, P.; Chen, L.; Dai, X.; Li, B.; Chai, T. Intelligent prediction of train delay changes and propagation using RVFLNs with improved transfer learning and ensemble learning. IEEE Trans. Intell. Transp. Syst. 2020, 22, 7432–7444. [Google Scholar] [CrossRef]
- Cacchiani, V.; Huisman, D.; Kidd, M.; Kroon, L.; Toth, P.; Veelenturf, L.; Wagenaar, J. An overview of recovery models and algorithms for real-time railway rescheduling. Transp. Res. Part B Methodol. 2014, 63, 15–37. [Google Scholar] [CrossRef]
- Cheng, R.; Song, Y.; Chen, D.; Chen, L. Intelligent localization of a high-speed train using LSSVM and the online sparse optimization approach. IEEE Trans. Intell. Transp. Syst. 2017, 18, 2071–2084. [Google Scholar] [CrossRef]
- Pellegrini, P.; Marlière, G.; Pesenti, R.; Rodriguez, J. RECIFE-MILP: An effective MILP-based heuristic for the real-time railway traffic management problem. IEEE Trans. Intell. Transp. Syst. 2015, 16, 2609–2619. [Google Scholar] [CrossRef]
- Lamorgese, L.; Mannino, C. An exact decomposition approach for the real-time train dispatching problem. Oper. Res. 2015, 63, 48–64. [Google Scholar] [CrossRef]
- Yue, P.; Jin, Y.; Dai, X.; Feng, Z.; Cui, D. Reinforcement learning for online dispatching policy in real-time train timetable rescheduling. IEEE Trans. Intell. Transp. Syst. 2023, 25, 478–490. [Google Scholar] [CrossRef]
- Wang, P.; Ma, L.; Goverde RM, P.; Wang, Q. Rescheduling trains using Petri nets and heuristic search. IEEE Trans. Intell. Transp. Syst. 2015, 17, 726–735. [Google Scholar] [CrossRef]
- Fang, W.; Yang, S.; Yao, X. A survey on problem models and solution approaches to rescheduling in railway networks. IEEE Trans. Intell. Transp. Syst. 2015, 16, 2997–3016. [Google Scholar] [CrossRef]
- Walraevens, J.; Bruneel, H.; Fiems, D.; Wittevrongel, S. Delay analysis of multiclass queues with correlated train arrivals and a hybrid priority/FIFO scheduling discipline. Appl. Math. Model. 2017, 45, 823–839. [Google Scholar] [CrossRef]
- Zhang, C.; Gao, Y.; Yang, L.; Gao, Z.; Qi, J. Joint optimization of train scheduling and maintenance planning in a railway network: A heuristic algorithm using Lagrangian relaxation. Transp. Res. Part B Methodol. 2020, 134, 64–92. [Google Scholar] [CrossRef]
- Zhu, Y.; Goverde, R.M.P. Dynamic railway timetable rescheduling for multiple connected disruptions. Transp. Res. Part C Emerg. Technol. 2021, 125, 103080. [Google Scholar] [CrossRef]
- Pellegrini, P.; Marlière, G.; Rodriguez, J. Optimal train routing and scheduling for managing traffic perturbations in complex junctions. Transp. Res. Part B Methodol. 2014, 59, 58–80. [Google Scholar] [CrossRef]
- Kersbergen, B.; van den Boom, T.; De Schutter, B. Distributed model predictive control for railway traffic management. Transp. Res. Part C Emerg. Technol. 2016, 68, 462–489. [Google Scholar] [CrossRef]
- D’ariano, A.; Pacciarelli, D.; Pranzo, M. A branch and bound algorithm for scheduling trains in a railway network. Eur. J. Oper. Res. 2007, 183, 643–657. [Google Scholar] [CrossRef]
- Min, Y.H.; Park, M.J.; Hong, S.P.; Hong, S.H. An appraisal of a column-generation-based algorithm for centralized train-conflict resolution on a metropolitan railway network. Transp. Res. Part B Methodol. 2011, 45, 409–429. [Google Scholar] [CrossRef]
- Zhan, S.; Wong, S.C.; Shang, P.; Peng, Q.; Xie, J.; Lo, S.M. Integrated railway timetable rescheduling and dynamic passenger routing during a complete blockage. Transp. Res. Part B Methodol. 2021, 143, 86–123. [Google Scholar] [CrossRef]
- Zhou, L.; Tong, L.C.; Chen, J.; Tang, J.; Zhou, X. Joint optimization of high-speed train timetables and speed profiles: A unified modeling approach using space-time-speed grid networks. Transp. Res. Part B Methodol. 2017, 97, 157–181. [Google Scholar] [CrossRef]
- Peng, S.; Yang, X.; Ding, S.; Wu, J.; Sun, H. A dynamic rescheduling and speed management approach for high-speed trains with uncertain time-delay. Inf. Sci. 2023, 632, 201–220. [Google Scholar] [CrossRef]
- Liu, X.; Zhou, M.; Dong, H.; Wu, X.; Li, Y.; Wang, F.Y. ADMM-based joint rescheduling method for high-speed railway timetabling and platforming in case of uncertain perturbation. Transp. Res. Part C Emerg. Technol. 2023, 152, 104150. [Google Scholar] [CrossRef]
- Kumar, N.; Mishra, A. A multi-objective and dictionary-based checking for efficient rescheduling trains. Alex. Eng. J. 2021, 60, 3233–3241. [Google Scholar] [CrossRef]
- Wang, R.; Zhang, Q.; Dai, X.; Yuan, Z.; Zhang, T.; Ding, S.; Jin, Y. An efficient evolutionary algorithm for high-speed train rescheduling under a partial station blockage. Appl. Soft Comput. 2023, 145, 110590. [Google Scholar] [CrossRef]
- Li, K.P.; Gao, Z.Y. An improved car-following model for railway traffic. J. Adv. Transp. 2013, 47, 475–482. [Google Scholar] [CrossRef]
- Corman, F.; Trivella, A.; Keyvan-Ekbatani, M. Stochastic process in railway traffic flow: Models, methods and implications. Transp. Res. Part C Emerg. Technol. 2021, 128, 103167. [Google Scholar] [CrossRef]
- Ketphat, N.; Whiteing, A.; Liu, R. State movement for controlling trains operating under the virtual coupling system. Proc. Inst. Mech. Eng. Part F J. Rail Rapid Transit 2022, 236, 172–182. [Google Scholar] [CrossRef]
- Liu, Y.; Zhou, Y.; Su, S.; Xun, J.; Tang, T. An analytical optimal control approach for virtually coupled high-speed trains with local and string stability. Transp. Res. Part C Emerg. Technol. 2021, 125, 102886. [Google Scholar] [CrossRef]
- Saidi, S.; Koutsopoulos, H.N.; Wilson NH, M.; Zhao, J. Train following model for urban rail transit performance analysis. Transp. Res. Part C Emerg. Technol. 2023, 148, 104037. [Google Scholar] [CrossRef]
- Corman, F.; D’Ariano, A.; Pacciarelli, D.; Pranzo, M. A tabu search algorithm for rerouting trains during rail operations. Transp. Res. Part B Methodol. 2010, 44, 175–192. [Google Scholar] [CrossRef]
- Zhou, M.; Liu, Y.; Mo, H.; Shang, J.; Dong, H. Timetable Rescheduling for High-Speed Railways under Temporary Segment Blockage based on Genetic Simulated Annealing Algorithm. In Proceedings of the 2022 China Automation Congress (CAC), Xiamen, China, 25–27 November 2022; pp. 6867–6872. [Google Scholar]
- He, Z.; Liu, T.; Liu, H. Improved particle swarm optimization algorithms for aerodynamic shape optimization of high-speed train. Adv. Eng. Softw. 2022, 173, 103242. [Google Scholar] [CrossRef]
- Eaton, J.; Yang, S. Dynamic railway junction rescheduling using population based ant colony optimisation. In Proceedings of the 2014 14th UK Workshop on Computational Intelligence (UKCI), Bradford, UK, 8–10 September 2014; pp. 1–8. [Google Scholar]
- Qu, Q.; Li, X.; Zhou, Y.; Zeng, J.; Yuan, M.; Wang, J.; Lv, J.; Liu, K.; Mao, K. An improved reinforcement learning algorithm for learning to branch. arXiv 2022, arXiv:2201.06213. [Google Scholar]
- Tang, Y.; Agrawal, S.; Faenza, Y. Reinforcement learning for integer programming: Learning to cut. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 13–18 July 2020; pp. 9367–9376. [Google Scholar]
- Dündar, S.; Şahin, İ. Train re-scheduling with genetic algorithms and artificial neural networks for single-track railways. Transp. Res. Part C Emerg. Technol. 2013, 27, 1–15. [Google Scholar] [CrossRef]
- Šemrov, D.; Marsetič, R.; Žura, M.; Todorovski, L.; Srdic, A. Reinforcement learning approach for train rescheduling on a single-track railway. Transp. Res. Part B Methodol. 2016, 86, 250–267. [Google Scholar] [CrossRef]
- Khadilkar, H. A scalable reinforcement learning algorithm for scheduling railway lines. IEEE Trans. Intell. Transp. Syst. 2018, 20, 727–736. [Google Scholar] [CrossRef]
- Zhu, Y.; Wang, H.; Goverde, R.M.P. Reinforcement learning in railway timetable rescheduling. In Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, 20–23 September 2020; p. 9294188. [Google Scholar]
- Li, W.; Ni, S. Train timetabling with the general learning environment and multi-agent deep reinforcement learning. Transp. Res. Part B Methodol. 2022, 157, 230–251. [Google Scholar] [CrossRef]
- Ning, L.; Li, Y.; Zhou, M.; Song, H.; Dong, H. A deep reinforcement learning approach to high-speed train timetable rescheduling under disturbances. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 3469–3474. [Google Scholar]
- Ghasempour, T.; Nicholson, G.L.; Kirkwood, D.; Fujiyama, T.; Heydecker, B. Distributed approximate dynamic control for traffic management of busy railway networks. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3788–3798. [Google Scholar] [CrossRef]
- Wang, Y.; Lv, Y.; Zhou, J.; Yuan, Z.; Zhang, Q.; Zhou, M. A policy-based reinforcement learning approach for high-speed railway timetable rescheduling. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 2362–2367. [Google Scholar]
- Su, B.; Wang, Z.; Su, S.; Tang, T. Metro train timetable rescheduling based on q-learning approach. In Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, 20–23 September 2020; pp. 1–6. [Google Scholar]
- Liao, J.; Yang, G.; Zhang, S.; Zhang, F.; Gong, C. A deep reinforcement learning approach for the energy-aimed train timetable rescheduling problem under disturbances. IEEE Trans. Transp. Electrif. 2021, 7, 3096–3109. [Google Scholar] [CrossRef]
- Ran, X.C.; Chen, S.K.; Liu, G.H.; Bai, Y. Energy-efficient approach combining train speed profile and timetable optimisations for metro operations. IET Intell. Transp. Syst. 2020, 14, 1967–1977. [Google Scholar] [CrossRef]
- Zhao, Y.; Ding, X. The Research on Delay Propagation of Urban Rail Transit Operation under Sudden Failure. J. Adv. Transp. 2021, 2021, 8984474. [Google Scholar] [CrossRef]
- Liu, R.; Li, S.; Yang, L. Collaborative optimization for metro train scheduling and train connections combined with passenger flow control strategy. Omega 2020, 90, 101990. [Google Scholar] [CrossRef]
- Peng, J.; Williams, R.J. Incremental multi-step Q-learning. In Machine Learning Proceedings 1994; Morgan Kaufmann: Burlington, MA, USA, 1994; pp. 226–232. [Google Scholar]
- Li, J.; Li, Y.; Su, Q. Sequential recovery of cyber-physical power systems based on improved q-learning. J. Frankl. Inst. 2023, 360, 13692–13711. [Google Scholar] [CrossRef]
Work | Problem Setting | Training Instances | State Definition |
---|---|---|---|
Šemrov et al. [37] | Multiple train delays | Single instance | Train locations, current time and track availability |
Khadilkar et al. [38] | Multiple train delays | Single instance | Track availability around the train to be controlled |
Zhu et al. [39] | Multiple train delays | Single instance | Delay time, location, and local track availability of the current train |
Li et al. [40] | Multiple train delays | Single instance | Departure time, whether to stop at stations, and running direction of the current train |
Ning et al. [41] | Multiple train delays | Single instance | Actual arrival time and departure time of trains |
Ghasempour et al. [42] | Single train delay | Multiple instances | Arrival and departure time of trains entering the junction |
Wang et al. [43] | Single train delay | Multiple instances | Planned/actual arrival time and departure time |
Su et al. [44] | Single train delay | Multiple instances | Actual arrival time and the number of passengers onboard |
Liao et al. [45] | Single train delay | Multiple instances | The speed, position, and current driving status of the train |
This paper | Multiple train delays | Multiple instances | Train delays |
Notations | Definition |
---|---|
Set | |
is the maximum index of train services | |
is the end index of stations. | |
Set of physical stations. | |
is the maximum index of block running time levels. | |
Input Parameter | |
Scheduled arrival time of train service at physical station , | |
Scheduled departure time of train service at physical station , | |
The running time of train service using block running time level from station | |
The Boolean variable to decide whether block running time level from station for train service is selected. | |
Minimum train headway on the trunk line |
The block where train service is located |
Minimum station dwelling time on the trunk line |
Time for door opening after train’s arrival | |
Time for door closing before train’s departure | |
Minimum boarding and alighting time for passengers | |
Maximum station passenger capacity | |
Number of passengers entering the station | |
Minimum turnaround time at the turnaround station | |
Intermediate variables | |
Rescheduled arrival time of train service at station | |
Rescheduled departure time of train service at station | |
Boarding time for passengers | |
Alighting time for passengers | |
Additional boarding and alighting time for passengers | |
Delay time of train service at physical station | |
Current checking time | |
Additional dwelling time of train service at physical station | |
The random action for train service at physical station | |
The optimal action for train service at physical station |
Physical Stations | State | Strategy A | Strategy B (Level 1) | Strategy B (Level 2) | Strategy B (…) | Strategy C (−10 s) | Strategy C (+10 s) | Strategy C (…) |
---|---|---|---|---|---|---|---|---|
(1401,1402,…,1417) | (0,0,…,0) | 0 | 0 | 0 | … | 0 | 0 | … |
(1401,1402,…,1417) | (0,0,…,50,…,0) | 0 | 0 | 0 | … | 35 | 20 | … |
(1401,1402,…,1417) | (0,0,…,0) | 20 | 0 | 10 | … | 0 | 10 | … |
(1401,1402,…,1417) | (0,0,…,0) | 30 | 10 | 0 | … | 40 | 0 | … |
(1401,1402,…,1417) | (0,…,50,…,100,…,0) | 15 | 10 | 0 | … | 10 | 10 | … |
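To illustrate how such a pre-trained table might be consulted during real-time operation, the sketch below looks up the stored Q-values for an observed delay state and returns the highest-valued action. The dictionary layout, the shortened four-station delay vectors, and the action labels are assumptions chosen to mirror the structure of the table above.

```python
# Hypothetical pre-trained Q-value table: delay-state tuple -> {action label: Q-value}.
q_table = {
    (0, 0, 50, 0): {"A": 0.0, "B-1": 0.0, "C-10": 35.0, "C+10": 20.0},
    (0, 0, 0, 0):  {"A": 20.0, "B-1": 0.0, "C-10": 0.0, "C+10": 10.0},
}

def best_action(delay_state):
    """Return the highest-valued stored action for an observed delay state (sketch)."""
    row = q_table.get(tuple(delay_state))
    if row is None:                       # unseen state: fall back to taking no action
        return "A"
    return max(row, key=row.get)

print(best_action((0, 0, 50, 0)))         # -> "C-10" in this illustrative table
```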
Indicator | No Adjustment | Manual Adjustment | FIFO | Traditional Q-Learning (ε = 0.6) | The Proposed Method |
---|---|---|---|---|---|
Scenario 1 | |||||
Total delay time (s) | 885 | 510 | 400 | 256 | 256 |
Affected trains | 5 | 5 | 3 | 2 | 2 |
Affected physical stations | 5 | 6 | 3 | 2 | 2 |
Average computation time (min) | 8 | 5.1 | 0.1 | 2.5 | 1.1 |
Average Convergence Iterations | - | - | - | 634 | 235 |
Scenario 2 | |||||
Total delay time (s) | 1750 | 970 | 830 | 480 | 480 |
Affected trains | 10 | 11 | 6 | 4 | 4 |
Affected physical stations | 13 | 12 | 6 | 4 | 4 |
Average computation time (min) | 16 | 12.3 | 0.1 | 5.2 | 2.3 |
Average Convergence Iterations | - | - | - | 763 | 312 |
Delay Scenario | Proposed Method (p = 0.002) | Proposed Method (p = 0.005) | Proposed Method (p = 0.01) | Proposed Method (p = 0.02) | Traditional Q-Learning (ε = 0.6) | Traditional Q-Learning (ε = 0.4) |
---|---|---|---|---|---|---|
Delay scenario 1 | 256 (591) * | 256 (363) | 256 (251) | 256 (235) | 256 (634) | 256 (661) |
Delay scenario 2 | 480 (640) | 480 (484) | 480 (334) | 480 (248) | 480 (763) | 480 (782) |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).