Multi-Dimensional Double Deep Dynamic Q-Network with Aligned Q-Fusion for Dual-Ring Barrier Traffic Signal Control
Abstract
1. Introduction
- How to define an action and when to select an action for two concurrent phases in different rings. For this challenge, we present an action definition based on “decision point aligning” in Section 4.
- How to effectively learn hundreds of combinatorial actions for two concurrent phases. For this challenge, we present a training algorithm based on a “multi-dimensional Q-network” in Section 6.2.
- How to ensure that two concurrent lagging phases end simultaneously. For this challenge, we present an action selection mechanism based on “aligned Q-fusion” in the algorithm in Section 6.3.
2. Literature Review
- Phase structure. Nearly all existing studies employed a phase structure incompatible with DRBPS, which has limited the adoption of their methods in engineering practice.
- Many studies restricted phase durations to multiples of a fixed interval, such as 5 or 10 seconds. This constraint prevented finer tuning of green times to better match traffic demand. No training algorithm had yet been developed that fully supports DRBPS with second-by-second signal operation.
2.1. Phase Structure: Incompatibility with DRBPS
2.2. Action Space and Training Algorithm
3. RTSC Problem Within DRBPS and Its Complexities
- Fixed phase sequence. The sequence, i.e., the order of phases in each ring, should be pre-determined and remain fixed during certain times of the day. Otherwise, the signals for all movements would turn green or red arbitrarily and unpredictably, which can confuse or even surprise drivers and increase the risks of red-light running and rear-end collisions.
- Simultaneous ending of two concurrent lagging phases. Given that the two lagging phases precede the same barrier, this constraint ensures that both rings cross the barrier together. Otherwise, conflicting vehicle movements would occur, increasing the risk of angle collisions, because each lagging phase conflicts with both phases that follow the barrier on the intersecting street (see the sketch after this list).
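The two constraints can be made concrete with a short sketch. This is our own illustration under simplifying assumptions (NEMA-style phase numbering, equal change-and-clearance intervals in both rings); the function name and the minimum-green values are hypothetical and not taken from the paper.

```python
# Illustrative sketch of the two DRBPS constraints; names and values are assumed.

MIN_GREEN = {1: 5, 2: 15, 3: 5, 4: 15, 5: 5, 6: 15, 7: 5, 8: 15}  # seconds (assumed)

def barrier_group_ok(green, pair_ring1, pair_ring2, min_green=MIN_GREEN):
    """Check one barrier group, e.g. pair_ring1=(1, 2) and pair_ring2=(5, 6).

    green -- dict mapping phase number to its planned green duration in seconds.
    Each pair lists the (leading, lagging) phases that precede the same barrier.
    """
    phases = list(pair_ring1) + list(pair_ring2)
    # Every phase must receive at least its minimum green.
    if any(green[p] < min_green[p] for p in phases):
        return False
    # The fixed sequence is implied by serving each pair in order; the remaining
    # requirement is that both rings reach the barrier at the same moment, i.e.,
    # the two lagging phases end simultaneously (equal clearance times assumed).
    return sum(green[p] for p in pair_ring1) == sum(green[p] for p in pair_ring2)

# Example: 20 s + 30 s in ring 1 equals 25 s + 25 s in ring 2, so both rings
# cross the barrier together.
assert barrier_group_ok({1: 20, 2: 30, 5: 25, 6: 25}, (1, 2), (5, 6))
```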
4. Action Definition with Decision Point Aligning
- A case of two leading phases (one in each ring). DRBPS allows each phase to terminate independently, subject only to its own minimum green. For these two phases, the ADP is the moment at which either phase’s minimum green expires (i.e., the “ADP1” moment in the figure). It is the last possible moment to decide whether to give that phase any time beyond its minimum green. At this point, the action represents the remaining green times counted from the respective ends of the two phases’ minimum greens; the two components may be unequal. They denote the lengths of time marked in the figure.
- A case of two lagging phases (one in each ring). DRBPS requires the two phases to terminate simultaneously, subject to both of their minimum greens. For these two phases, the ADP is the moment when the minimum greens of both phases have expired (i.e., the “ADP2” moment in the figure). It is the earliest possible moment at which the two phases are allowed to terminate simultaneously. Here, the action represents the remaining green times for the two phases counted from this moment, rather than from the ends of their own minimum greens; the two components must be equal. They denote the lengths of time marked in the figure. (A minimal sketch of both cases follows this list.)
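The following sketch illustrates the two kinds of aligned decision points. It is our own illustration, not the paper’s implementation; the function names and the example numbers are assumptions.

```python
# Sketch of the action at the two kinds of aligned decision points (ADP).
# "extra_r1"/"extra_r2" are the remaining green times chosen by the agent for
# the phases in ring 1 and ring 2; all names here are illustrative.

def action_at_adp1(extra_r1: int, extra_r2: int) -> tuple[int, int]:
    """Leading phases: each phase gets its own remaining green beyond its
    minimum green, so the two components may differ."""
    return (extra_r1, extra_r2)

def action_at_adp2(shared_extra: int) -> tuple[int, int]:
    """Lagging phases: one remaining green, counted from the moment both
    minimum greens have expired, is shared so both phases end together."""
    return (shared_extra, shared_extra)

# Example: at ADP1 the agent may extend ring 1 by 8 s and ring 2 by 12 s,
# whereas at ADP2 both lagging phases must receive the same extension.
assert action_at_adp1(8, 12) == (8, 12)
assert action_at_adp2(10) == (10, 10)
```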
5. Reward and State Definitions
6. Training Algorithm with Multi-Dimensional Aligned Q-Fusion
6.1. Dynamic N-Step Update
6.2. Multi-Dimensional Q-Network
6.3. Aligned Q-Fusion
6.4. Bootstrap Action Value
Algorithm 1: Training algorithm for MD4-AQF
7. Experiments
Experiment | Involved Method | Analytical Tool | Purpose and Expected Outcome |
---|---|---|---|
#1: agent learning | MD4-AQF | Line graph (Figure 8) | To verify whether the proposed method enables the agent to learn and improve its policy. The intersection performance is expected to gradually improve throughout the learning process. |
#2: versus MFDRL method | MD4-AQF; D3ynQN | Line graph (Figure 8); Box plot (Figure 9) | To verify whether the proposed method trains the agent more effectively than another MFDRL method, D3ynQN. The proposed method is expected to achieve better intersection performance than D3ynQN, both during training and at the end of training. |
#3: versus conventional method | MD4-AQF; FASC | Box plot (Figure 9); Scatter plot (Figure 10); Statistical test (Table 2) | To verify whether the policy learned via the proposed method outperforms the conventional method FASC. The learned policy is expected to yield better intersection performance than FASC in most of the tested traffic demand scenarios. |
8. Discussion
8.1. Experiment 1: Agent Learning
8.2. Experiment 2: Versus D3ynQN Method (Ablation Study)
8.3. Experiment 3: Versus Conventional Control Method
9. Conclusions
- We present a “decision point aligning”-based action definition for the concurrent phases in two rings. This approach produces a consistent action space, which is used to determine the green times for both phases simultaneously.
- We present a “multi-dimensional Q-network”-based training algorithm to handle the large action space containing 600+ permutations of green times for two phases. It transforms hundreds of combinatorial actions into tens of sub-actions.
- We present “aligned Q-fusion” embedded in the algorithm to ensure that two concurrent lagging phases end simultaneously. This approach can fuse paired sub-action values and generate one remaining green time shared by the two lagging phases.
- Determining the green time for two concurrent phases is essentially a multi-degree-of-freedom (DOF) control problem, where some DOFs operate in parallel but not strictly simultaneously. Our solution is to align the decision points of DOFs with the latest moment at which any DOF must make a decision. Then, the action should be defined as the move of each DOF after its own decision deadline. Therefore, an agent can control several concurrent DOFs at the same time.
- Learning too many combinatorial actions for two phases is essentially a “curse of dimensionality” problem caused by controlling multiple DOFs at once: the number of actions grows exponentially with the number of DOFs. Our solution is to perform dimension expansion on a DQN-style algorithm to produce multi-dimensional sub-action values, where each sub-action corresponds to the choice of a specific DOF. The number of Q-network outputs then grows only linearly with the number of DOFs.
- Ensuring the simultaneous ending of two lagging phases is essentially a problem in which all DOFs are required to make identical moves in some states. Our solution is to align and fuse sub-action values by averaging the values of the same sub-action over different dimensions. All DOFs then share the sub-action that maximizes this average, allowing them to move in sync (a minimal sketch of the multi-dimensional Q-network and aligned Q-fusion follows this list).
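The sketch below illustrates the two ideas. It is our own illustration rather than the authors’ code: the layer sizes, the state dimension, and the number of candidate green extensions (26 sub-actions per ring, so 26 × 26 ≈ 600+ combinations) are assumed values.

```python
# Sketch of a two-dimensional Q-network head and aligned Q-fusion (assumptions, not
# the paper's code). Each output dimension corresponds to one DOF (one ring).

import torch
import torch.nn as nn

N_SUBACTIONS = 26   # candidate remaining green times, e.g. 0..25 s (assumed)
STATE_DIM = 64      # assumed state feature size

class MultiDimQNetwork(nn.Module):
    """Outputs 2 x N_SUBACTIONS sub-action values instead of N_SUBACTIONS**2
    combinatorial action values, so the output size grows linearly with the
    number of DOFs."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU())
        self.head_ring1 = nn.Linear(128, N_SUBACTIONS)  # sub-action values, ring 1
        self.head_ring2 = nn.Linear(128, N_SUBACTIONS)  # sub-action values, ring 2

    def forward(self, state):
        h = self.trunk(state)
        return torch.stack([self.head_ring1(h), self.head_ring2(h)], dim=1)  # (B, 2, N)

def select_action(q_net, state, lagging: bool):
    q = q_net(state.unsqueeze(0)).squeeze(0)           # (2, N_SUBACTIONS)
    if lagging:
        # Aligned Q-fusion: average the value of the same sub-action over the
        # two dimensions, then let both phases share the maximizing sub-action.
        fused = q.mean(dim=0)                          # (N_SUBACTIONS,)
        shared = int(fused.argmax())
        return shared, shared
    # Leading phases: each dimension picks its own sub-action independently.
    return int(q[0].argmax()), int(q[1].argmax())
```

In the lagging-phase case the fused argmax yields one shared remaining green, which is what the simultaneous-ending constraint requires; in the leading-phase case each ring’s extension is chosen independently.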
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
MFDRL | model-free deep reinforcement learning |
RTSC | real-time traffic signal control |
DRBPS | dual-ring barrier phase structure |
MD4-AQF | multi-dimensional double deep dynamic Q-network with aligned Q-fusion |
DQN | deep Q-network |
ADP | aligned decision point |
D3ynQN | double deep dynamic Q-network |
SGD | stochastic gradient descent |
FASC | fully actuated signal control |
DOF | degree-of-freedom |
References
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
- Urbanik, T.; Tanaka, A.; Lozner, B.; Lindstrom, E.; Lee, K.; Quayle, S.; Beaird, S.; Tsoi, S.; Ryus, P.; Gettman, D.; et al. NCHRP Report 812: Signal Timing Manual, 2nd ed.; Transportation Research Board: Washington, DC, USA, 2015. [Google Scholar]
- Ma, J.; Wu, F. Learning to coordinate traffic signals with adaptive network partition. IEEE Trans. Intell. Transp. Syst. 2024, 25, 263–274. [Google Scholar] [CrossRef]
- Kim, G.; Kang, J.; Sohn, K. A meta-reinforcement learning algorithm for traffic signal control to automatically switch different reward functions according to the saturation level of traffic flows. Comput.-Aided Civ. Infrastruct. Eng. 2023, 38, 779–798. [Google Scholar] [CrossRef]
- Kim, G.; Sohn, K. Area-wide traffic signal control based on a deep graph Q-Network (DGQN) trained in an asynchronous manner. Appl. Soft Comput. 2022, 119, 108497. [Google Scholar] [CrossRef]
- Gu, H.; Wang, S.; Ma, X.; Jia, D.; Mao, G.; Lim, E.G.; Wong, C.P.R. Large-scale traffic signal control using constrained network partition and adaptive deep reinforcement learning. IEEE Trans. Intell. Transp. Syst. 2024, 25, 7619–7632. [Google Scholar] [CrossRef]
- Gu, H.; Wang, S.; Ma, X.; Jia, D.; Mao, G.; Lim, E.G.; Wong, C.P.R. Traffic signal optimization for partially observable traffic system and low penetration rate of connected vehicles. Comput.-Aided Civ. Infrastruct. Eng. 2022, 37, 2070–2092. [Google Scholar] [CrossRef]
- Huang, H.; Hu, Z.; Wang, Y.; Lu, Z.; Wen, X. Intersec2vec-TSC: Intersection representation learning for large-scale traffic signal control. IEEE Trans. Intell. Transp. Syst. 2024, 25, 7044–7056. [Google Scholar] [CrossRef]
- Yang, T.; Fan, W. Enhancing robustness of deep reinforcement learning based adaptive traffic signal controllers in mixed traffic environments through data fusion and multi-discrete actions. IEEE Trans. Intell. Transp. Syst. 2024, 25, 14196–14208. [Google Scholar] [CrossRef]
- Yang, S.; Yang, B.; Zeng, Z.; Kang, Z. Causal inference multi-agent reinforcement learning for traffic signal control. Inf. Fusion 2023, 94, 243–256. [Google Scholar] [CrossRef]
- Luo, H.; Bie, Y.; Jin, S. Reinforcement learning for traffic signal control in hybrid action space. IEEE Trans. Intell. Transp. Syst. 2024, 25, 5225–5241. [Google Scholar] [CrossRef]
- Mukhtar, H.; Afzal, A.; Alahmari, S.; Yonbawi, S. CCGN: Centralized collaborative graphical transformer multi-agent reinforcement learning for multi-intersection signal free-corridor. Neural Netw. 2023, 166, 396–409. [Google Scholar] [CrossRef]
- Liu, J.; Qin, S.; Su, M.; Luo, Y.; Zhang, S.; Wang, Y.; Yang, S. Traffic signal control using reinforcement learning based on the teacher-student framework. Expert Syst. Appl. 2023, 228, 120458. [Google Scholar] [CrossRef]
- Yang, S.; Yang, B. An inductive heterogeneous graph attention-based multi-agent deep graph infomax algorithm for adaptive traffic signal control. Inf. Fusion 2022, 88, 249–262. [Google Scholar] [CrossRef]
- Haddad, T.A.; Hedjazi, D.; Aouag, S. A deep reinforcement learning-based cooperative approach for multi-intersection traffic signal control. Eng. Appl. Artif. Intell. 2022, 114, 105019. [Google Scholar] [CrossRef]
- Liu, Y.; Luo, G.; Yuan, Q.; Li, J.; Jin, L.; Chen, B.; Pan, R. GPLight: Grouped multi-agent reinforcement learning for large-scale traffic signal control. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence, Macao, China, 19–25 August 2023; Volume 1, pp. 199–207. [Google Scholar] [CrossRef]
- Mei, H.; Li, J.; Shi, B.; Wei, H. Reinforcement learning approaches for traffic signal control under missing data. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence, Macao, China, 19–25 August 2023; Volume 3, pp. 2261–2269. [Google Scholar] [CrossRef]
- Fang, Z.; Zhang, F.; Wang, T.; Lian, X.; Chen, M. MonitorLight: Reinforcement learning-based traffic signal control using mixed pressure monitoring. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, New York, NY, USA, 17–21 October 2022; pp. 478–487. [Google Scholar] [CrossRef]
- Liang, E.; Su, Z.; Fang, C.; Zhong, R. OAM: An option-action reinforcement learning framework for universal multi-intersection control. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 4550–4558. [Google Scholar] [CrossRef]
- Han, G.; Liu, X.; Han, Y.; Peng, X.; Wang, H. CycLight: Learning traffic signal cooperation with a cycle-level strategy. Expert Syst. Appl. 2024, 255, 124543. [Google Scholar] [CrossRef]
- Agarwal, A.; Sahu, D.; Mohata, R.; Jeengar, K.; Nautiyal, A.; Saxena, D.K. Dynamic traffic signal control for heterogeneous traffic conditions using max pressure and reinforcement learning. Expert Syst. Appl. 2024, 254, 124416. [Google Scholar] [CrossRef]
- Zhu, R.; Ding, W.; Wu, S.; Li, L.; Lv, P.; Xu, M. Auto-learning communication reinforcement learning for multi-intersection traffic light control. Knowl.-Based Syst. 2023, 275, 110696. [Google Scholar] [CrossRef]
- Zhang, W.; Yan, C.; Li, X.; Fang, L.; Wu, Y.J.; Li, J. Distributed signal control of arterial corridors using multi-agent deep reinforcement learning. IEEE Trans. Intell. Transp. Syst. 2023, 24, 178–190. [Google Scholar] [CrossRef]
- Guo, X.; Yu, Z.; Wang, P.; Jin, Z.; Huang, J.; Cai, D.; He, X.; Hua, X.S. Urban traffic light control via active multi-agent communication and supply-demand modeling. IEEE Trans. Knowl. Data Eng. 2023, 35, 4346–4356. [Google Scholar] [CrossRef]
- Zheng, Q.; Xu, H.; Chen, J.; Zhang, D.; Zhang, K.; Tang, G. Double deep Q-network with dynamic bootstrapping for real-time isolated signal control: A traffic engineering perspective. Appl. Sci. 2022, 12, 8641. [Google Scholar] [CrossRef]
- Lee, H.; Han, Y.; Kim, Y. Reinforcement learning for traffic signal control: Incorporating a virtual mesoscopic model for depicting oversaturated traffic conditions. Eng. Appl. Artif. Intell. 2023, 126, 107005. [Google Scholar] [CrossRef]
- Han, Y.; Lee, H.; Kim, Y. Extensible prototype learning for real-time traffic signal control. Comput.-Aided Civ. Infrastruct. Eng. 2023, 38, 1181–1198. [Google Scholar] [CrossRef]
- Bie, Y.; Ji, Y.; Ma, D. Multi-agent deep reinforcement learning collaborative traffic signal control method considering intersection heterogeneity. Transp. Res. Part C-Emerg. Technol. 2024, 164, 104663. [Google Scholar] [CrossRef]
- Wang, M.; Chen, Y.; Kan, Y.; Xu, C.; Lepech, M.; Pun, M.O.; Xiong, X. Traffic signal cycle control with centralized critic and decentralized actors under varying intervention frequencies. IEEE Trans. Intell. Transp. Syst. 2024, 25, 20085–20104. [Google Scholar] [CrossRef]
- Yazdani, M.; Sarvi, M.; Bagloee, S.A.; Nassir, N.; Price, J.; Parineh, H. Intelligent vehicle pedestrian light (IVPL): A deep reinforcement learning approach for traffic signal control. Transp. Res. Part C-Emerg. Technol. 2023, 149, 103991. [Google Scholar] [CrossRef]
- Li, D.; Zhu, F.; Wu, J.; Wong, Y.D.; Chen, T. Managing mixed traffic at signalized intersections: An adaptive signal control and CAV coordination system based on deep reinforcement learning. Expert Syst. Appl. 2024, 238, 121959. [Google Scholar] [CrossRef]
- Ren, F.; Dong, W.; Zhao, X.; Zhang, F.; Kong, Y.; Yang, Q. Two-layer coordinated reinforcement learning for traffic signal control in traffic network. Expert Syst. Appl. 2024, 235, 121111. [Google Scholar] [CrossRef]
- Ma, D.; Zhou, B.; Song, X.; Dai, H. A deep reinforcement learning approach to traffic signal control with temporal traffic pattern mining. IEEE Trans. Intell. Transp. Syst. 2021, 23, 1–12. [Google Scholar] [CrossRef]
- NEMA TS 2-2021; Traffic Controller Assemblies with NTCIP Requirements, 03.08 ed. National Electrical Manufacturers Association: Rosslyn, VA, USA, 2021.
- Kang, L.; Huang, H.; Lu, W.; Liu, L. Optimizing gate control coordination signal for urban traffic network boundaries using multi-agent deep reinforcement learning. Expert Syst. Appl. 2024, 255, 124627. [Google Scholar] [CrossRef]
- Zhao, Z.; Wang, K.; Wang, Y.; Liang, X. Enhancing traffic signal control with composite deep intelligence. Expert Syst. Appl. 2024, 244, 123020. [Google Scholar] [CrossRef]
- Fang, J.; You, Y.; Xu, M.; Wang, J.; Cai, S. Multi-objective traffic signal control using network-wide agent coordinated reinforcement learning. Expert Syst. Appl. 2023, 229, 120535. [Google Scholar] [CrossRef]
- Bouktif, S.; Cheniki, A.; Ouni, A.; El-Sayed, H. Deep reinforcement learning for traffic signal control with consistent state and reward design approach. Knowl.-Based Syst. 2023, 267, 110440. [Google Scholar] [CrossRef]
- Wang, M.; Wu, L.; Li, M.; Wu, D.; Shi, X.; Ma, C. Meta-learning based spatial-temporal graph attention network for traffic signal control. Knowl.-Based Syst. 2022, 250, 109166. [Google Scholar] [CrossRef]
- Ye, Y.; Zhou, Y.; Ding, J.; Wang, T.; Chen, M.; Lian, X. InitLight: Initial model generation for traffic signal control using adversarial inverse reinforcement learning. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence, Macao, China, 19–25 August 2023; Volume 5, pp. 4949–4958. [Google Scholar] [CrossRef]
- Lin, J.; Zhu, Y.; Liu, L.; Liu, Y.; Li, G.; Lin, L. DenseLight: Efficient control for large-scale traffic signals with dense feedback. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence, Macao, China, 19–25 August 2023; Volume 6, pp. 6058–6066. [Google Scholar] [CrossRef]
- Han, X.; Zhao, X.; Zhang, L.; Wang, W. Mitigating action hysteresis in traffic signal control with traffic predictive reinforcement learning. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023; pp. 673–684. [Google Scholar] [CrossRef]
- Zhang, L.; Wu, Q.; Shen, J.; Lü, L.; Du, B.; Wu, J. Expression might be enough: Representing pressure and demand for reinforcement learning based traffic signal control. In Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; Volume 162, pp. 26645–26654. [Google Scholar]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
- Hessel, M.; Modayil, J.; Van Hasselt, H.; Schaul, T.; Ostrovski, G.; Dabney, W.; Horgan, D.; Piot, B.; Azar, M.; Silver, D. Rainbow: Combining improvements in deep reinforcement learning. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32, pp. 3215–3222. [Google Scholar] [CrossRef]
- Christodoulou, P. Soft actor-critic for discrete action settings. arXiv 2019, arXiv:1910.07207. [Google Scholar]
- Tavakoli, A.; Pardo, F.; Kormushev, P. Action branching architectures for deep reinforcement learning. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32, pp. 4131–4138. [Google Scholar] [CrossRef]
- FHWA. Manual on Uniform Traffic Control Devices for Streets and Highways (MUTCD), 11th ed.; Federal Highway Administration: Washington, DC, USA, 2023. [Google Scholar]
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 1861–1870. [Google Scholar]
- Roderick, M.; MacGlashan, J.; Tellex, S. Implementing the deep Q-network. In Proceedings of the 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2017; pp. 1–9. [Google Scholar]
- National Academies of Sciences, Engineering, and Medicine. Highway Capacity Manual: A Guide for Multimodal Mobility Analysis, 7th ed.; National Academies Press: Washington, DC, USA, 2022.
FASC − MD4-AQF | N | Mean Rank | Sum of Ranks |
---|---|---|---|
Negative Ranks | 49 a | 530.35 | 25,987.00 |
Positive Ranks | 1951 b | 1012.31 | 1,975,013.00 |
Ties | 0 c | | |
Z | −37.729 | | |
Sig. (two-tailed) | 0.000 ** | | |