Research on Deep Reinforcement Learning Control Algorithm for Active Suspension Considering Uncertain Time Delay
Abstract
1. Introduction
- To our knowledge, this study is the first to apply DRL techniques to time-delay control. Its primary aim is to mitigate the effects of uncertain time delays through DRL-based active suspension control, and it shows how the high-dimensional representational capacity of DRL can be exploited in an infinite-dimensional delay control system, with good results.
- Multiple simulation sets covering deterministic, semi-regular, and uncertain delays are designed to test the control performance of the algorithm. Different delay characteristics, and the uncertainty they induce in the control system, are considered so that the simulations better approximate actual working conditions.
- The proposed control algorithm maintains good control performance across multiple MATLAB/Simulink simulations spanning different working conditions and speeds, demonstrating its application potential.
2. Dynamics Model of Active Suspension System Considering Time Delay
2.1. Active Suspension Quarter Model
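As a reference point, the standard quarter-car equations of motion with a lumped input delay, consistent with the notation listed in Appendix A (the paper's exact formulation may differ in detail), are:

$$
\begin{aligned}
m_s \ddot{z}_s &= -k_s\,(z_s - z_u) - c_s\,(\dot{z}_s - \dot{z}_u) + u(t - \tau),\\
m_u \ddot{z}_u &= k_s\,(z_s - z_u) + c_s\,(\dot{z}_s - \dot{z}_u) - k_t\,(z_u - z_r) - u(t - \tau),
\end{aligned}
$$

where $\tau = \tau_1 + \tau_2$ lumps the inherent and actuator delays.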
2.2. Road Model
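Given the symbols in Appendix A (speed $v$, lower cutoff frequency $f_0$, road unevenness coefficient $G_q(n_0)$, spatial cutoff frequency $n_0$, white noise $w$), the road input is presumably the common filtered white-noise model:

$$\dot{z}_r(t) = -2\pi f_0 v\, z_r(t) + 2\pi n_0 \sqrt{G_q(n_0)\, v}\; w(t).$$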
3. Controller Algorithm
3.1. Reinforcement Learning
3.2. Deep Reinforcement Learning
3.3. Twin-Delayed Deep Deterministic Policy Gradient
Algorithm 1 TD3 | |
1 | Randomly initialize critic networks $Q_{\theta_1}$, $Q_{\theta_2}$, and actor network $\pi_{\phi}$ with random parameters $\theta_1$, $\theta_2$, and $\phi$. |
2 | Initialize target networks $\theta'_1 \leftarrow \theta_1$, $\theta'_2 \leftarrow \theta_2$, $\phi' \leftarrow \phi$. |
3 | Initialize replay buffer $\mathcal{B}$. |
4 | for episode = 1 to Max episodes, do |
5 | Initialize a Gaussian random process for action exploration. |
6 | Receive the initial observation state $s_1$ of the environment. |
7 | for t = 1 to Max steps, do |
8 | Select action $a_t = \pi_{\phi}(s_t) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$, according to the current policy and exploration noise. |
9 | Execute action $a_t$ and observe reward $r_t$ and the new state $s_{t+1}$. |
10 | Store transition tuple $(s_t, a_t, r_t, s_{t+1})$ in $\mathcal{B}$. |
11 | Sample mini-batch of $N$ transitions $(s, a, r, s')$ from $\mathcal{B}$. |
12 | Set $\tilde{a} \leftarrow \pi_{\phi'}(s') + \tilde{\epsilon}$, $\tilde{\epsilon} \sim \operatorname{clip}(\mathcal{N}(0, \tilde{\sigma}), -c, c)$, and $y \leftarrow r + \gamma \min_{i=1,2} Q_{\theta'_i}(s', \tilde{a})$. |
13 | Update critic parameters: $\theta_i \leftarrow \arg\min_{\theta_i} N^{-1} \sum \left( y - Q_{\theta_i}(s, a) \right)^2$. |
14 | if t mod (delayed update frequency) = 0, then |
15 | Update the actor parameters $\phi$ by the deterministic policy gradient: $\nabla_{\phi} J(\phi) = N^{-1} \sum \nabla_{a} Q_{\theta_1}(s, a)\big|_{a=\pi_{\phi}(s)} \nabla_{\phi}\pi_{\phi}(s)$. |
16 | Soft update target networks: $\theta'_i \leftarrow \tau \theta_i + (1 - \tau)\theta'_i$, $\phi' \leftarrow \tau \phi + (1 - \tau)\phi'$. |
17 | end if |
18 | end for |
19 | end for |
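For concreteness, the following is a minimal PyTorch sketch of the update in lines 11–16 of Algorithm 1. The network classes, optimizer layout, and batch format are assumptions for illustration, not the authors' implementation (the paper's simulations were built in MATLAB/Simulink).

```python
import torch
import torch.nn.functional as F

def td3_update(step, batch, actor, actor_targ, critic1, critic2,
               critic1_targ, critic2_targ, actor_opt, critic_opt,
               gamma=0.99, tau=1e-3, policy_noise=0.2, noise_clip=0.5,
               policy_delay=2, max_action=1.0):
    """One TD3 update (lines 11-16 of Algorithm 1). All networks are
    assumed to be torch.nn.Module instances: critics map (state, action)
    to a Q-value, the actor maps state to action; `critic_opt` optimizes
    the parameters of both critics. Termination masking is omitted."""
    s, a, r, s_next = batch

    # Line 12: target-policy smoothing and clipped double-Q target.
    with torch.no_grad():
        eps = (torch.randn_like(a) * policy_noise).clamp(-noise_clip, noise_clip)
        a_next = (actor_targ(s_next) + eps).clamp(-max_action, max_action)
        y = r + gamma * torch.min(critic1_targ(s_next, a_next),
                                  critic2_targ(s_next, a_next))

    # Line 13: regress both critics toward the shared target y.
    critic_loss = F.mse_loss(critic1(s, a), y) + F.mse_loss(critic2(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Lines 14-16: delayed policy update and soft target updates.
    if step % policy_delay == 0:
        actor_loss = -critic1(s, actor(s)).mean()  # deterministic policy gradient
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
        with torch.no_grad():
            for net, targ in ((critic1, critic1_targ),
                              (critic2, critic2_targ),
                              (actor, actor_targ)):
                for p, p_t in zip(net.parameters(), targ.parameters()):
                    p_t.mul_(1 - tau).add_(tau * p)
```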
4. Controller Model
4.1. State
4.2. Action
4.3. Reward
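The notation in Appendix A (normalized coefficient $\lambda$, maximum suspension travel $S_{\max}$, static tire load $F_s$, weight coefficients $w_i$) suggests a reward of the usual normalized quadratic form for ride comfort, suspension travel, and road holding. The sketch below is an assumption, not the authors' exact function:

$$r_t = -\left( w_1\, \lambda\, \ddot{z}_s^{\,2} + w_2 \left( \frac{z_s - z_u}{S_{\max}} \right)^{2} + w_3 \left( \frac{F_d}{F_s} \right)^{2} \right),$$

where $F_d$ denotes the dynamic tire load.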
5. Simulation and Results
5.1. Implementation Details
5.2. Deterministic Delayed Conditions
5.3. Semi-Regular Delayed Conditions
5.4. Uncertain Delay Conditions
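As an illustrative sketch (not the authors' implementation), an uncertain actuator delay is commonly realized in a discrete-time simulation with a FIFO buffer whose read-back index is resampled every step:

```python
import random
from collections import deque

class UncertainDelayActuator:
    """Illustrative FIFO model of an uncertain actuator delay: each
    simulation step, the force actually applied is one commanded
    between 0 and max_delay_steps steps ago, chosen at random."""

    def __init__(self, max_delay_steps: int):
        self.max_delay_steps = max_delay_steps
        # Pre-fill with zeros so early reads are well defined.
        self.history = deque([0.0] * (max_delay_steps + 1),
                             maxlen=max_delay_steps + 1)

    def step(self, commanded_force: float) -> float:
        self.history.append(commanded_force)      # newest is history[-1]
        delay = random.randint(0, self.max_delay_steps)
        return self.history[-1 - delay]           # force from `delay` steps ago

# With a 0.01 s sample time, max_delay_steps = 3 spans a 0-30 ms delay.
actuator = UncertainDelayActuator(max_delay_steps=3)
```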
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Notations and Abbreviations
Parameters | Notation | Unit
---|---|---
Inherent delay | $\tau_1$ | ms
Actuator delay | $\tau_2$ | ms
Sprung mass | $m_s$ | kg
Unsprung mass | $m_u$ | kg
Spring stiffness | $k_s$ | N/m
Equivalent tire stiffness | $k_t$ | N/m
Equivalent damping factor | $c_s$ | N·s/m
Vertical displacement of the sprung mass | $z_s$ | m
Vertical displacement of the unsprung mass | $z_u$ | m
Road displacement | $z_r$ | m
Active suspension actuator control force | $u$ | N
Speed | $v$ | m/s
Lower cutoff frequency | $f_0$ | —
Road unevenness coefficient | $G_q(n_0)$ | —
Uniformly distributed white noise | $w$ | —
Spatial cutoff frequency | $n_0$ | —
State at time t | $s_t$ | —
Action at time t | $a_t$ | —
State at time t + 1 | $s_{t+1}$ | —
Reward at time t | $r_t$ | —
Critic network | $Q_{\theta}$ | —
Actor network | $\pi_{\phi}$ | —
Target critic network | $Q_{\theta'}$ | —
Target actor network | $\pi_{\phi'}$ | —
Discount factor | $\gamma$ | —
Soft update factor | $\tau$ | —
Exploration noise | $\epsilon$ | —
Smoothing noise | $\tilde{\epsilon}$ | —
Normalized coefficient | $\lambda$ | —
Maximum travel value of the suspension | $S_{\max}$ | —
Static load of the tire | $F_s$ | —
Weight coefficient | $w_i$ | —
Unit delay amount | $\Delta\tau$ | ms
Unit amount determined force | $\Delta u$ | N
Abbreviation | Full Name |
---|---|
DDPG | Deep Deterministic Policy Gradient |
DL | Deep Learning |
DQN | Deep Q Network |
DRL | Deep Reinforcement Learning |
LQR | Linear Quadratic Regulator |
MDP | Markov Decision Process |
PPO | Proximal Policy Optimization |
RL | Reinforcement Learning |
RMS | Root Mean Square |
SAC | Soft Actor–Critic |
TD3 | Twin-Delayed Deep Deterministic Policy Gradient |
Appendix B. Hyperparameter Combination Experiments
Hidden-layer sizes (layer 1–layer 2 neurons) | 32–64 | 64–128 | 128–256 | 256–512
---|---|---|---|---
Optimization | 24.99% | 30.30% | 35.62% | 32.82%
Optimization (rows: critic learning rate; columns: actor learning rate) | 0.1 | 0.01 | 0.001
---|---|---|---
0.1 | 16.77% | 23.79% | 23.72%
0.01 | 28.59% | 31.98% | 28.82%
0.001 | 31.00% | 35.62% | 32.81%
Parameters | Value | Parameters | Value
---|---|---|---
Sprung mass (kg) | 400 | Unsprung mass (kg) | 40
Spring stiffness (N/m) | 20,000 | Equivalent tire stiffness (N/m) | 200,000
Speed (m/s) | 20 | — | 0
Road unevenness coefficient | 256 | — | 0
— | 0 | Equivalent damping factor (N·s/m) | 1500
Hyperparameter | Item | Value
---|---|---
Critic | Learning rate | 1 × 10⁻³
 | Gradient threshold | 1
 | L2 regularization factor | 1 × 10⁻⁴
Actor | Learning rate | 1 × 10⁻²
 | Gradient threshold | 1
Agent | Sample time (s) | 0.01
 | Target smoothing factor | 1 × 10⁻³
 | Experience buffer length | 1 × 10⁶
 | Discount factor | 0.99
 | Mini-batch size | 128
 | Soft update factor | 1 × 10⁻³
 | Delayed update frequency | 2
 | Noise clipping | 0.5
 | Noise variance | 0.6
 | Decay rate of noise variance | 1 × 10⁻⁵
Training process | Max episodes | 2000
 | Max steps | 1000
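For reference, the agent settings above collected as a plain Python mapping (key names are illustrative; values are copied from the table):

```python
# TD3 agent configuration mirroring the hyperparameter table above.
TD3_HYPERPARAMETERS = {
    "critic": {"learning_rate": 1e-3, "gradient_threshold": 1.0,
               "l2_regularization": 1e-4},
    "actor": {"learning_rate": 1e-2, "gradient_threshold": 1.0},
    "agent": {"sample_time_s": 0.01, "target_smoothing_factor": 1e-3,
              "buffer_length": int(1e6), "discount_factor": 0.99,
              "mini_batch_size": 128, "soft_update_factor": 1e-3,
              "delayed_update_frequency": 2, "noise_clipping": 0.5,
              "noise_variance": 0.6, "noise_variance_decay": 1e-5},
    "training": {"max_episodes": 2000, "max_steps": 1000},
}
```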
Delay | Passive | Proposed | DDPG
---|---|---|---
10 ms | 1.8778 | 1.0595 (+43.58%) | 0.7772 (+58.61%)
20 ms | 1.8778 | 1.0347 (+44.90%) | 1.3901 (+25.97%)
30 ms | 1.8778 | 1.2716 (+32.28%) | 1.5439 (+17.78%)
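The percentages in this and the following table are consistent with the usual improvement over the passive baseline,

$$\text{Optimization} = \frac{\text{RMS}_{\text{passive}} - \text{RMS}_{\text{controlled}}}{\text{RMS}_{\text{passive}}} \times 100\%,$$

e.g., for the 10 m/s row below: $(1.265 - 0.8853)/1.265 \approx 30.02\%$.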
Speed (m/s) | Passive | Proposed | Optimization |
---|---|---|---|
10 | 1.265 | 0.8853 | 30.02% |
15 | 1.5461 | 0.9885 | 36.06% |
20 | 1.7803 | 1.1116 | 37.56% |
25 | 1.9837 | 1.2049 | 39.26% |
30 | 2.1645 | 1.2794 | 40.89% |
35 | 2.3275 | 1.353 | 41.87% |
40 | 2.476 | 1.4122 | 42.96% |
45 | 2.6123 | 1.4705 | 43.71% |
50 | 2.7383 | 1.5175 | 44.58% |