Online Inverse Optimal Control for Time-Varying Cost Weights
Abstract
1. Introduction
- We provide a method for recovering time-varying cost weights, which is essential for analyzing real-world animal or human motion.
- Our method operates online, making it suitable for a broad range of real-time computation problems. This contrasts with previous online IOC methods, which mainly focused on constant cost weights for discrete-time system control.
- We introduce a framework based on a neural network and a state observer for online verification and refinement of the estimated cost weights. This addresses the need for solution uniqueness and for robustness against data noise in IOC applications.
2. Problem Formulation
2.1. System Description and Problem Statement
2.2. Maximum Principle in Forward Optimal Control
2.3. Analysis of the IOC Problem
- What happens when a different feature function is selected?
- Whether the given set in the IOC problem has a unique solution.
3. Adaptive Observer-Based Neural Network Approximation of Time-Varying Cost Weights
Construction of the Observer
4. Neural Network-Based Approximation of Time-Varying Cost Weights
- become UUB after a time point (, and )
- The change in approaches zero
- Matrix C defined below will become a full row rank matrix.
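The full-row-rank condition in the last item can be checked numerically. The following is a minimal sketch in plain Python, not the authors' procedure: the helper names `matrix_rank` and `has_full_row_rank`, the tolerance, and the example matrices are all hypothetical, and the actual matrix C is the one defined in the article.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix (list of rows) via Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in r] for r in rows]
    n_rows = len(a)
    n_cols = len(a[0]) if a else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        # Eliminate the entries below the pivot.
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank


def has_full_row_rank(rows, tol=1e-9):
    """True when the rank equals the number of rows (hypothetical helper)."""
    return len(rows) > 0 and matrix_rank(rows, tol) == len(rows)


# Hypothetical example matrices, not taken from the article.
independent_rows = [[1, 0, 2], [0, 1, 3]]
dependent_rows = [[1, 2, 3], [2, 4, 6]]
```

In an online setting such a check would presumably be repeated as new data arrive, since the condition is stated to hold only after some time point; the tolerance would need tuning to the noise level of the data.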
4.1. Construction of the Neural Network
4.2. Tuning Law of the Neural Network for the Estimation of
5. Simulations
5.1. Basic Simulation Conditions
Algorithm 1 Online implementation
- In the first case, we apply the optimal control of the sample system with cost weights as the signal ( and ). The proposed IOC method is employed online to estimate the cost weights, with the simultaneous online recovery of the original system trajectory. Parameters and in the updating law are set to and , respectively. Parameters k and are set to and , respectively. The initial values of and are set to matrices with all elements equal to zero. The original and are set to and , respectively. The simulation also uses 49 nodes in the neural network.
- In the second case, we perform the simulation of our IOC method, but with the original and set to and , respectively. All other simulation settings are the same as in the first case.
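To make the role of the 49-node network and its zero initialisation more concrete, here is a rough sketch of a function approximator that tracks a time-varying scalar signal online. Only the node count and the zero initial weights come from the setup above. Everything else is an assumption made for illustration: the Gaussian basis functions, their centers and width, the normalised gradient update in `tune_step` (a stand-in, not the tuning law of Section 4.2), the gain, and the sinusoidal target signal.

```python
import math

N_NODES = 49  # node count stated in the simulation setup


def make_centers(t_start, t_end, n=N_NODES):
    """Evenly spaced basis centers over the time window (assumed layout)."""
    step = (t_end - t_start) / (n - 1)
    return [t_start + i * step for i in range(n)]


def basis(t, centers, width):
    """Gaussian basis activations at time t (assumed activation function)."""
    return [math.exp(-((t - c) / width) ** 2) for c in centers]


def predict(weights, phi):
    """Network output: weighted sum of the basis activations."""
    return sum(w * p for w, p in zip(weights, phi))


def tune_step(weights, phi, target, gain=0.5, eps=1e-8):
    """One normalised gradient update; returns new weights and the pre-update error."""
    err = target - predict(weights, phi)
    norm = eps + sum(p * p for p in phi)
    new_weights = [w + gain * err * p / norm for w, p in zip(weights, phi)]
    return new_weights, err


centers = make_centers(0.0, 10.0)
width = 10.0 / (N_NODES - 1)
weights = [0.0] * N_NODES  # all-zero initial values, as in the setup

# Repeated passes over a hypothetical time-varying weight signal.
for sweep in range(20):
    for k in range(201):
        t = 0.05 * k
        target = 1.0 + 0.5 * math.sin(t)  # hypothetical signal, not from the article
        weights, _ = tune_step(weights, basis(t, centers, width), target)
```

The normalised update is chosen here only because each step provably shrinks the error on the current sample for gains between 0 and 2; the method described in the article instead drives the weights through the observer error.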
5.2. Results
6. Discussion
6.1. Robustness of the Proposed Method to Noisy Data
6.2. Calculation Complexity and Real-Time Calculation
6.3. Advantages of Using
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Proof of Theorem 1
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cao, S.; Luo, Z.; Quan, C. Online Inverse Optimal Control for Time-Varying Cost Weights. Biomimetics 2024, 9, 84. https://doi.org/10.3390/biomimetics9020084