Error Dynamics Based Dual Heuristic Dynamic Programming for Self-Learning Flight Control
Abstract
1. Introduction
- An ED-DHP controller is designed for the fast loop to control the attitude system of a morphing air vehicle at different operating points without offline learning;
- The error dynamics of the fast loop are derived. Based on them, the control effectiveness matrix and the total disturbance term are identified by recursive least squares (RLS), which effectively improves learning efficiency. The ADP system state is augmented with the tracking error and the reference trajectory, so the actor NN can learn both the feedforward and feedback control terms without complete knowledge of the system dynamics or the reference trajectory dynamics;
- The output of ED-DHP combines a rough trim surface deflection, the learned feedforward and feedback terms, and a compensation term, which enhances the robustness of the method.
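The RLS identification step described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: it assumes a local linear model in which the change in angular rate is explained by a control effectiveness matrix acting on the input increment plus a lumped disturbance term, and all class and variable names are illustrative.

```python
import numpy as np

class RLSIdentifier:
    """Recursive least squares with a forgetting factor (illustrative sketch).

    Assumed local model:  delta_omega ~= Theta.T @ phi,  phi = [delta_u; 1],
    so Theta stacks a control effectiveness gain and a lumped disturbance term.
    """

    def __init__(self, n_phi, n_out, forgetting=0.99, p0=1e3):
        self.lam = forgetting
        self.theta = np.zeros((n_phi, n_out))  # parameter matrix estimate
        self.P = p0 * np.eye(n_phi)            # estimation covariance matrix

    def update(self, phi, y):
        """One RLS step on regressor phi and measured output y."""
        phi = phi.reshape(-1, 1)
        gain = self.P @ phi / (self.lam + phi.T @ self.P @ phi)
        err = y - (self.theta.T @ phi).ravel()  # innovation (prediction error)
        self.theta += gain @ err.reshape(1, -1)
        self.P = (self.P - gain @ phi.T @ self.P) / self.lam
        return err
```

For example, feeding the identifier noiseless samples generated by a fixed effectiveness gain and a constant bias recovers both parameters within a few hundred updates.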
2. Mathematical Model of Air Vehicle
3. Online Tracking Controller Based on ED-DHP
3.1. Control Structure of the System
3.2. The Fast Loop Error Dynamics of Attitude System and Model Identification
3.3. ED-DHP Framework
Algorithm 1 Learning Framework
Require:
- Simulation parameters: the sampling time and the total simulation time;
- The controller parameters of the slow-loop subsystem;
- The forgetting factor of the RLS;
- The positive definite matrices;
- The discount factor and the learning rates of the two neural networks.
Initialize:
- The initial states and inputs of the air vehicle;
- The parameter matrix and the estimation covariance matrix of the RLS;
- The parameters of the two neural networks, randomly.
Online Learning:
1: for t = 0 to the total simulation time do
2: Calculate the reference angular rate vector of the fast-loop subsystem with (4);
3: Obtain the angular rate vector through the sensors;
4: Update the RLS model with (11)–(13);
5: Update the parameters of the critic NN with (33) and (34);
6: Update the parameters of the actor NN with (35) and (36);
7: Calculate the output vector of the actor NN with (26)–(28);
8: Calculate the control input vector of the air vehicle with (19);
9: Get the next system states by applying the control input;
10: end for
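The loop of Algorithm 1 can be illustrated on a toy problem. The sketch below is a deliberately simplified scalar stand-in, not the paper's ED-DHP: the plant, the linear "networks", and all gains are assumptions made for illustration. It keeps the defining DHP structure, in which the critic approximates the derivative of the value function with respect to the tracking error and the actor is updated through that derivative (steps 5–8 of Algorithm 1).

```python
import numpy as np

def train_dhp(iters=20000, gamma=0.9, lr_actor=0.01, lr_critic=0.02, seed=0):
    """Toy scalar DHP loop (illustrative, not the paper's ED-DHP).

    Assumed error dynamics: e_next = e + b*u with b = 1, stage cost c = e**2.
    Linear "networks": actor u = -w_a * e, critic lambda_hat = w_c * e, where
    the critic approximates dV/de.  Analytic optimum: w_a = 1, w_c = 2.
    """
    rng = np.random.default_rng(seed)
    b = 1.0
    w_a, w_c = 0.0, 0.0
    for _ in range(iters):
        e = rng.uniform(-1.0, 1.0)   # sample a tracking error (excitation)
        u = -w_a * e                 # actor output (feedback term)
        e_next = e + b * u           # one-step error dynamics
        lam_next = w_c * e_next      # critic estimate of dV/de at e_next
        # Critic target: lambda(e) = dc/de + gamma * lambda(e_next) * d(e_next)/de
        target = 2.0 * e + gamma * lam_next * (1.0 - b * w_a)
        w_c -= lr_critic * (w_c * e - target) * e
        # Actor update: descend gamma * lambda(e_next) * d(e_next)/du = gamma*lam*b,
        # with du/dw_a = -e, giving the sign below
        w_a += lr_actor * gamma * lam_next * b * e
    return w_a, w_c
```

For the assumed plant, the loop converges to approximately the analytic optimum, i.e. a feedback gain of 1 (deadbeat tracking) and a critic slope of 2.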
4. Numerical Simulation and Discussion
4.1. Example 1
4.2. Example 2
5. Conclusions
- The augmented state increases the dimension of the system state space, which may slow the convergence of the algorithm when no offline training is used;
- Systems with input constraints can be considered in future work by designing a new cost function, so that the method applies better to real vehicle systems;
- ED-DHP is suited to controlling fast-loop subsystems; the error dynamics and the RLS method need to be further optimized to broaden the application scenarios of ED-DHP.
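Regarding the cost function design for input constraints mentioned above, a common device in the ADP literature (given here as an illustration, not as part of this paper's derivation) replaces the quadratic control penalty with a non-quadratic integrand that saturates the control at a bound $\bar{\lambda}$:

```latex
% Non-quadratic control penalty enforcing |u_i| <= \bar{\lambda}
U(u) = 2 \int_{0}^{u} \bar{\lambda}\, \tanh^{-1}\!\left( v/\bar{\lambda} \right)^{\top} R \,\mathrm{d}v
```

Minimizing a cost built from this penalty yields a tanh-saturated optimal policy, so the input constraint is respected by construction.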
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
| Case 0 | Case 1 | Case 2 |
|---|---|---|

| Time | Case 0 | Case 1 | Case 2 |
|---|---|---|---|
| 2 s~10 s | | | |
| 10 s~20 s | | | |
| 20 s~25 s | | | |
| 25 s~30 s | | | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Huang, X.; Zhang, Y.; Liu, J.; Zhong, H.; Wang, Z.; Peng, Y. Error Dynamics Based Dual Heuristic Dynamic Programming for Self-Learning Flight Control. Appl. Sci. 2023, 13, 586. https://doi.org/10.3390/app13010586