The Adaptive Optimal Output Feedback Tracking Control of Unknown Discrete-Time Linear Systems Using a Multistep Q-Learning Approach
Abstract
1. Introduction
- Compared with the results reported in [23,39], the proposed approach does not rely on an actor–critic structure, which requires actor and critic NNs to approximate the control policy and value function. Moreover, the proposed model-free learning approach removes the requirement that the system state be measurable by collecting past input, output, and reference trajectory data (a standard parameterization of this idea is sketched after this list). This is particularly advantageous in practical scenarios in which obtaining full state information is difficult or costly;
- Using the proposed multistep Q-learning technique [41], which combines the advantages of the PI and VI methods, we are able to remove the requirement for an initial stabilizing control policy. Moreover, this combination improves the convergence speed of the algorithm, leading to more efficient control performance.
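For readers unfamiliar with output-feedback ADP, the first point rests on a standard state parameterization used in the cited literature (e.g., [24,25]): for an observable system, the state can be written exactly in terms of a finite history of inputs and outputs, so any state-feedback law can be realized from measured data. A minimal sketch in generic notation (the symbols M_u, M_y and the horizon N are not taken from this paper) is:

```latex
% State reconstruction from measured input/output data (cf. [24,25]).
% N is assumed to be at least the observability index of the pair (A, C);
% M_u and M_y are constant matrices determined by (A, B, C) but never
% identified explicitly, since Q-learning estimates the quadratic kernel directly from data.
x_k = M_u \, \bar{u}_{k-1,k-N} + M_y \, \bar{y}_{k-1,k-N},
\qquad
\bar{u}_{k-1,k-N} =
\begin{bmatrix} u_{k-1} \\ u_{k-2} \\ \vdots \\ u_{k-N} \end{bmatrix},
\qquad
\bar{y}_{k-1,k-N} =
\begin{bmatrix} y_{k-1} \\ y_{k-2} \\ \vdots \\ y_{k-N} \end{bmatrix}.
```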
2. Problem Statement
2.1. Offline Solution for LQT
2.2. Q-Function Bellman Equation
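The equations of this subsection did not survive extraction. For reference, the discounted Q-function Bellman equation for the LQT problem in the closely related literature (e.g., [40]) has the following standard form, where X_k = [x_k; r_k] is the augmented state, Q_1 and R are the tracking-error and input weights, gamma is the discount factor, and H is the symmetric Q-function kernel; the paper's exact notation may differ:

```latex
% Standard discounted LQT Q-function Bellman equation (cf. [40]); generic notation.
Q_H(X_k, u_k)
  = \begin{bmatrix} X_k \\ u_k \end{bmatrix}^{\top} H \begin{bmatrix} X_k \\ u_k \end{bmatrix}
  = X_k^{\top} Q_1 X_k + u_k^{\top} R u_k
    + \gamma \begin{bmatrix} X_{k+1} \\ u_{k+1} \end{bmatrix}^{\top} H \begin{bmatrix} X_{k+1} \\ u_{k+1} \end{bmatrix},
\qquad u_{k+1} = -K X_{k+1},
```
with the greedy policy obtained from the partitioned kernel as
```latex
K = H_{uu}^{-1} H_{uX},
\qquad
H = \begin{bmatrix} H_{XX} & H_{Xu} \\ H_{uX} & H_{uu} \end{bmatrix}.
```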
2.3. PI-Based Q-Learning for LQT
Algorithm 1: PI Q-learning algorithm for LQT.
Initialization: Start with an admissible (stabilizing) control policy and set the iteration index to zero.
Procedure:
1: Policy evaluation: Collect samples under the current policy and solve for the corresponding Q-function kernel using the Q-function Bellman equation.
2: Policy improvement: Compute the improved control policy from the kernel obtained in Step 1.
3: Stopping criterion: Stop the iteration if the change in the kernel between two successive iterations is below some specified small positive number. Otherwise, increment the iteration index and return to Step 1.
End Procedure
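To make Algorithm 1 concrete, the following is a minimal numerical sketch of data-driven PI Q-learning for the discounted LQT problem. It is written against the standard formulation recalled in Section 2.2, not against the paper's exact implementation; the function names, the transition-tuple data format, and the augmented state X_k = [x_k; r_k] are illustrative assumptions.

```python
import numpy as np

def triu_idx(n):
    """Index pairs (i, j) with i <= j, used to vectorize an n x n symmetric kernel."""
    return np.triu_indices(n)

def quad_basis(z):
    """Quadratic basis such that z^T H z = h . quad_basis(z), where h stacks the
    upper-triangular entries of H with off-diagonal terms implicitly doubled."""
    i, j = triu_idx(z.size)
    return z[i] * z[j]

def kernel_from_params(h, n):
    """Rebuild the symmetric kernel H from the regression parameter vector h."""
    H = np.zeros((n, n))
    H[triu_idx(n)] = h
    return (H + H.T) / 2.0  # halving undoes the implicit doubling of off-diagonal terms

def policy_evaluation(data, K, Q1, R, gamma):
    """Policy evaluation: solve the Q-function Bellman equation for the kernel of
    the policy u = -K X by least squares from measured transitions (X_k, u_k, X_{k+1})."""
    Phi, b = [], []
    for X, u, Xn in data:
        z = np.concatenate([X, u])
        zn = np.concatenate([Xn, -K @ Xn])        # next input generated by the evaluated policy
        Phi.append(quad_basis(z) - gamma * quad_basis(zn))
        b.append(X @ Q1 @ X + u @ R @ u)          # one-step utility
    h, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(b), rcond=None)
    return kernel_from_params(h, z.size)

def policy_improvement(H, n_X):
    """Greedy policy improvement: K = H_uu^{-1} H_uX, giving u = -K X."""
    return np.linalg.solve(H[n_X:, n_X:], H[n_X:, :n_X])

def pi_q_learning(data, K0, Q1, R, gamma=0.95, tol=1e-6, max_iter=50):
    """Algorithm 1: alternate evaluation and improvement from an admissible K0."""
    K, H_prev = K0, None
    for _ in range(max_iter):
        H = policy_evaluation(data, K, Q1, R, gamma)
        K = policy_improvement(H, K.shape[1])
        if H_prev is not None and np.linalg.norm(H - H_prev) < tol:
            break
        H_prev = H
    return K, H
```

In practice, the transitions would be collected with a probing noise added to the applied input so that the least-squares problem in the evaluation step is well conditioned (persistent excitation).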
3. Methods
3.1. Multistep Q-Learning
- Step 1. Initialization: Set the iteration index to zero and start from an arbitrary initial control policy, which does not need to be stabilizing or admissible, together with an arbitrary initial Q-function kernel.
- Step 2. Multistep policy evaluation: Use the multistep Q-function Bellman equation to solve for the updated Q-function kernel, in which the evaluation target accumulates the one-step utilities over the next N steps and bootstraps from the kernel of the previous iteration thereafter (a numerical sketch of this step is given after the list).
- Step 3. Policy improvement: Update the control policy by the greedy improvement with respect to the kernel obtained in Step 2.
- Step 4. Termination condition: Check whether the change in the Q-function kernel between two successive iterations is below l, where l is a small positive threshold that sets the prescribed algorithmic accuracy. If this condition is satisfied, terminate the iteration and output the resulting near-optimal control policy. Otherwise, return to Step 2 and repeat the iteration.
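The following is a minimal sketch of the multistep policy-evaluation step and of the overall iteration in Steps 1–4, reusing the quad_basis, kernel_from_params, and policy_improvement helpers from the sketch after Algorithm 1. The trajectory-segment data layout, the function names, and the fixed horizon N are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np
# quad_basis, kernel_from_params, and policy_improvement are the helpers
# defined in the PI Q-learning sketch after Algorithm 1.

def multistep_policy_evaluation(segments, H_prev, K, Q1, R, gamma, N):
    """Fit a new kernel H so that, in a least-squares sense,
    Q_H(X_k, u_k) = sum_{i=0}^{N-1} gamma^i r(X_{k+i}, u_{k+i})
                    + gamma^N Q_{H_prev}(X_{k+N}, -K X_{k+N}).
    Each segment is a pair (Xs, us): Xs holds X_k, ..., X_{k+N} and us holds
    the applied inputs u_k, ..., u_{k+N-1}."""
    Phi, b = [], []
    for Xs, us in segments:
        z0 = np.concatenate([Xs[0], us[0]])
        # discounted N-step return of the measured utilities
        ret = sum(gamma**i * (Xs[i] @ Q1 @ Xs[i] + us[i] @ R @ us[i]) for i in range(N))
        zN = np.concatenate([Xs[N], -K @ Xs[N]])   # bootstrap input from the current policy
        Phi.append(quad_basis(z0))
        b.append(ret + gamma**N * zN @ H_prev @ zN)
    h, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(b), rcond=None)
    return kernel_from_params(h, z0.size)

def multistep_q_learning(segments, n_X, n_u, Q1, R, gamma=0.95, N=5, tol=1e-6, max_iter=200):
    """Steps 1-4: start from an arbitrary (not necessarily stabilizing) policy and kernel."""
    K = np.zeros((n_u, n_X))                    # Step 1: arbitrary initial policy
    H = np.zeros((n_X + n_u, n_X + n_u))        # Step 1: arbitrary initial kernel
    for _ in range(max_iter):
        H_new = multistep_policy_evaluation(segments, H, K, Q1, R, gamma, N)  # Step 2
        K = policy_improvement(H_new, n_X)      # Step 3: greedy improvement
        if np.linalg.norm(H_new - H) < tol:     # Step 4: termination condition
            return K, H_new
        H = H_new
    return K, H
```

Because the evaluation target bootstraps from the kernel of the previous iteration rather than solving for the fixed point of the current policy, the recursion can start from an arbitrary policy, while increasing N makes each update behave more like a full policy evaluation; this is the sense in which the multistep scheme sits between the VI and PI extremes.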
3.2. Adjustment Rules for Step Size
3.3. Implementation
3.4. Convergence Analysis
- (i) For any j, ;
- (ii) The sequence of Q-function kernels generated by the algorithm converges to the optimal solution of the Q-function Bellman equation, as expressed below.
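In the generic notation of the sketches above, item (ii) is the standard convergence statement for such Q-learning iterations (the paper's own symbols are not recoverable from this extraction):

```latex
\lim_{j \to \infty} H^{j} = H^{*},
\qquad
K^{*} = \left(H^{*}_{uu}\right)^{-1} H^{*}_{uX},
```
where H^* is the unique kernel satisfying the Q-function Bellman equation and K^* is the corresponding optimal tracking gain.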
4. Simulation Experiment
4.1. Controlled Object
4.2. Experiment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Lewis, F.L.; Vrabie, D.; Syrmos, V.L. Optimal Control, 3rd ed.; Wiley: Hoboken, NJ, USA, 2012.
2. Luo, R.; Peng, Z.; Hu, J. On model identification based optimal control and its applications to multi-agent learning and control. Mathematics 2023, 11, 906.
3. Chen, Y.H.; Chen, Y.Y. Trajectory tracking design for a swarm of autonomous mobile robots: A nonlinear adaptive optimal approach. Mathematics 2022, 10, 3901.
4. Banholzer, S.; Herty, M.; Pfenninger, S.; Zügner, S. Multiobjective model predictive control of a parabolic advection-diffusion-reaction equation. Mathematics 2020, 8, 777.
5. Hewer, G. An Iterative Technique for the Computation of the Steady State Gains for the Discrete Optimal Regulator. IEEE Trans. Autom. Control 1971, 16, 382–384.
6. Lancaster, P.; Rodman, L. Algebraic Riccati Equations; Oxford University Press: Oxford, UK, 1995.
7. Dai, S.; Wang, C.; Wang, M. Dynamic Learning From Adaptive Neural Network Control of a Class of Nonaffine Nonlinear Systems. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 111–123.
8. He, W.; Dong, Y.; Sun, C. Adaptive Neural Impedance Control of a Robotic Manipulator With Input Saturation. IEEE Trans. Syst. Man Cybern. Syst. 2016, 46, 334–344.
9. Luy, N.T. Robust adaptive dynamic programming based online tracking control algorithm for real wheeled mobile robot with omni-directional vision system. Trans. Inst. Meas. Control 2017, 39, 832–847.
10. He, W.; Meng, T.; He, X.; Ge, S.S. Unified iterative learning control for flexible structures with input constraints. Automatica 2018, 96, 326–336.
11. Radac, M.B.; Precup, R.E. Data-driven model-free tracking reinforcement learning control with VRFT-based adaptive actor-critic. Appl. Sci. 2019, 9, 1807.
12. Wang, Z.; Liu, D. Data-Based Controllability and Observability Analysis of Linear Discrete-Time Systems. IEEE Trans. Neural Netw. 2011, 22, 2388–2392.
13. Sutton, R.S.; Barto, A.G. Reinforcement Learning; MIT Press: Cambridge, MA, USA, 1998.
14. Sutton, R.S.; Barto, A.G.; Williams, R.J. Reinforcement learning is direct adaptive optimal control. IEEE Control Syst. Mag. 1992, 12, 19–22.
15. Lewis, F.L.; Vrabie, D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 2009, 9, 32–50.
16. Wang, F.; Zhang, H.; Liu, D. Adaptive Dynamic Programming: An Introduction. IEEE Comput. Intell. Mag. 2009, 4, 39–47.
17. Jiang, Z.P.; Jiang, Y. Robust adaptive dynamic programming for linear and nonlinear systems: An overview. Eur. J. Control 2013, 19, 417–425.
18. Zhang, K.; Zhang, H.; Cai, Y.; Su, R. Parallel Optimal Tracking Control Schemes for Mode-Dependent Control of Coupled Markov Jump Systems via Integral RL Method. IEEE Trans. Autom. Sci. Eng. 2020, 17, 1332–1342.
19. Zhang, K.; Zhang, H.; Mu, Y.; Liu, C. Decentralized Tracking Optimization Control for Partially Unknown Fuzzy Interconnected Systems via Reinforcement Learning Method. IEEE Trans. Fuzzy Syst. 2020, 29, 917–926.
20. Vrabie, D.; Pastravanu, O.; Abu-Khalaf, M.; Lewis, F.L. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 2009, 45, 477–484.
21. Jiang, Y.; Jiang, Z.P. Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics. Automatica 2012, 48, 2699–2704.
22. Modares, H.; Lewis, F.L. Linear Quadratic Tracking Control of Partially-Unknown Continuous-Time Systems Using Reinforcement Learning. IEEE Trans. Autom. Control 2014, 59, 3051–3056.
23. Li, X.; Xue, L.; Sun, C. Linear quadratic tracking control of unknown discrete-time systems using value iteration algorithm. Neurocomputing 2018, 314, 86–93.
24. Lewis, F.L.; Vamvoudakis, K.G. Reinforcement Learning for Partially Observable Dynamic Processes: Adaptive Dynamic Programming Using Measured Output Data. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2011, 41, 14–25.
25. Kiumarsi, B.; Lewis, F.L.; Naghibi-Sistani, M.B.; Karimpour, A. Optimal Tracking Control of Unknown Discrete-Time Linear Systems Using Input-Output Measured Data. IEEE Trans. Cybern. 2015, 45, 2770–2779.
26. Gao, W.; Huang, M.; Jiang, Z.; Chai, T. Sampled-data-based adaptive optimal output-feedback control of a 2-degree-of-freedom helicopter. IET Control Theory Appl. 2016, 10, 1440–1447.
27. Xiao, G.; Zhang, H.; Zhang, K.; Wen, Y. Value iteration based integral reinforcement learning approach for H∞ controller design of continuous-time nonlinear systems. Neurocomputing 2018, 285, 51–59.
28. Chen, C.; Sun, W.; Zhao, G.; Peng, Y. Reinforcement Q-Learning Incorporated With Internal Model Method for Output Feedback Tracking Control of Unknown Linear Systems. IEEE Access 2020, 8, 134456–134467.
29. Zhao, F.; Gao, W.; Liu, T.; Jiang, Z.P. Adaptive optimal output regulation of linear discrete-time systems based on event-triggered output-feedback. Automatica 2022, 137, 110103.
30. Radac, M.B.; Lala, T. Learning Output Reference Model Tracking for Higher-Order Nonlinear Systems with Unknown Dynamics. Algorithms 2019, 12, 121.
31. Shi, P.; Shen, Q.K. Observer-based leader-following consensus of uncertain nonlinear multi-agent systems. Int. J. Robust Nonlinear Control 2017, 27, 3794–3811.
32. Rizvi, S.A.A.; Lin, Z. Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 1523–1536.
33. Zhu, L.M.; Modares, H.; Peen, G.O.; Lewis, F.L.; Yue, B. Adaptive suboptimal output-feedback control for linear systems using integral reinforcement learning. IEEE Trans. Control Syst. Technol. 2015, 23, 264–273.
34. Moghadam, R.; Lewis, F.L. Output-feedback H∞ quadratic tracking control of linear systems using reinforcement learning. Int. J. Adapt. Control Signal Process. 2019, 33, 300–314.
35. Valadbeigi, A.P.; Sedigh, A.K.; Lewis, F.L. H∞ Static Output-Feedback Control Design for Discrete-Time Systems Using Reinforcement Learning. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 396–406.
36. Rizvi, S.A.A.; Lin, Z. Output feedback Q-learning for discrete-time linear zero-sum games with application to the H-infinity control. Automatica 2018, 95, 213–221.
37. Peng, Y.; Chen, Q.; Sun, W. Reinforcement Q-Learning Algorithm for H∞ Tracking Control of Unknown Discrete-Time Linear Systems. IEEE Trans. Syst. Man Cybern. Syst. 2019, 50, 4109–4122.
38. Rizvi, S.A.A.; Lin, Z. Experience replay-based output feedback Q-learning scheme for optimal output tracking control of discrete-time linear systems. Int. J. Adapt. Control Signal Process. 2019, 33, 1825–1842.
39. Luo, B.; Liu, D.; Huang, T.; Liu, J. Output Tracking Control Based on Adaptive Dynamic Programming with Multistep Policy Evaluation. IEEE Trans. Syst. Man Cybern. Syst. 2019, 49, 2155–2165.
40. Kiumarsi, B.; Lewis, F.L.; Modares, H.; Karimpour, A.; Naghibi-Sistani, M.B. Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics. Automatica 2014, 50, 1167–1175.
41. Luo, B.; Wu, H.N.; Huang, T. Optimal output regulation for model-free Quanser helicopter with multistep Q-learning. IEEE Trans. Ind. Electron. 2017, 65, 4953–4961.
42. Lewis, F.L.; Vrabie, D.; Vamvoudakis, K.G. Reinforcement Learning and Feedback Control: Using Natural Decision Methods to Design Optimal Adaptive Controllers. IEEE Control Syst. Mag. 2012, 32, 76–105.
43. Kiumarsi, B.; Lewis, F.L. Output synchronization of heterogeneous discrete-time systems: A model-free optimal approach. Automatica 2017, 84, 86–94.