Model-Free Optimized Tracking Control Heuristic
Abstract
1. Introduction
2. Control Architecture
2.1. Tracking Control
2.2. Cost Function Optimization
2.3. Aggregate Control System
3. Tracking Control Algorithms
3.1. RL-Based Tracking Control Algorithm
Algorithm 1 Reinforcement Learning: Offline Computation of the Q-Table
Input:
Output: Q-Table
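The extracted algorithm block above lost its mathematical symbols, but its structure — an offline, model-free computation of a Q-table over discretized tracking-error states and control actions — follows standard tabular Q-learning. The sketch below is an illustrative reconstruction, not the paper's exact procedure: the `reward` and `transition` callables stand in for the simulated tracking dynamics, and all hyperparameter values are assumptions.

```python
import numpy as np

def offline_q_table(n_states, n_actions, reward, transition,
                    episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1,
                    seed=0):
    """Tabular Q-learning over a discretized state/action space.

    `reward(s, a)` and `transition(s, a)` are placeholders for the
    simulated tracking-error dynamics evaluated offline.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = int(rng.integers(n_states))
        for _ in range(100):  # steps per episode (assumed value)
            # epsilon-greedy exploration over the discrete actions
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(q[s]))
            r = reward(s, a)
            s_next = transition(s, a)
            # standard Q-learning temporal-difference update
            q[s, a] += alpha * (r + gamma * np.max(q[s_next]) - q[s, a])
            s = s_next
    return q
```

Once trained, the table is used greedily online: the controller measures the discretized state and applies `argmax(q[s])`, with no model of the plant required.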
3.2. NLTA-Based Tracking Control Algorithm
Algorithm 2 NLTA: Offline Tuning of the PID Gains
Input: Search ranges of the three PID gains. A nominal frequency for the low-pass filter. A frequency discount value. An initial value. N_ITERATIONS: number of iterations per episode. N_EPISODES: number of episodes.
Output: Optimized PID gains.
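Threshold accepting differs from simulated annealing in that a candidate is accepted deterministically whenever its cost exceeds the current cost by less than a shrinking threshold. The sketch below illustrates this search over the three PID gains under stated assumptions: the `cost` callable is assumed to simulate the closed loop and return a scalar tracking cost, and the exponential threshold decay is a simplified stand-in for the paper's nonlinear threshold schedule, whose exact form is not reproduced here.

```python
import math
import random

def nlta_tune_pid(cost, kp_range, ki_range, kd_range,
                  n_episodes=10, n_iterations=200, seed=0):
    """Threshold-accepting search for PID gains (illustrative sketch)."""
    rng = random.Random(seed)
    ranges = (kp_range, ki_range, kd_range)

    def sample():
        return tuple(rng.uniform(lo, hi) for lo, hi in ranges)

    current = best = sample()
    c_cur = c_best = cost(*current)
    for ep in range(n_episodes):
        # shrinking acceptance threshold (exponential decay assumed)
        threshold = c_best * math.exp(-ep)
        for _ in range(n_iterations):
            # Gaussian perturbation of each gain, clipped to its range
            cand = tuple(
                min(max(g + rng.gauss(0, 0.1 * (hi - lo)), lo), hi)
                for g, (lo, hi) in zip(current, ranges))
            c_cand = cost(*cand)
            if c_cand - c_cur < threshold:  # accept mild uphill moves too
                current, c_cur = cand, c_cand
                if c_cur < c_best:
                    best, c_best = current, c_cur
    return best
```

Accepting bounded uphill moves early lets the search escape local minima of the tracking cost; as the threshold decays, the behavior approaches a pure descent over the gain space.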
4. Neural Network Optimization Algorithm
Algorithm 3 Neural Network: Offline Energy Optimization Scheme
Input: Minimum and maximum bounds of every state and its discretization step; minimum and maximum bounds of the control input and its discretization step.
Output: Two trained neural networks.
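The networks here are small feed-forward regressors; the parameter table later in the article reports a single hidden layer of 13 neurons trained with Levenberg-Marquardt. The sketch below is a simplified stand-in: it uses the same one-hidden-layer tanh architecture but plain full-batch gradient descent instead of Levenberg-Marquardt, and the learning rate and epoch count are assumptions.

```python
import numpy as np

def train_ff_network(x, y, n_hidden=13, epochs=2000, lr=0.2, seed=0):
    """One-hidden-layer feed-forward regressor (tanh hidden units).

    Gradient descent is used for brevity; the paper reports
    Levenberg-Marquardt training, which converges faster on small nets.
    """
    rng = np.random.default_rng(seed)
    n_in = x.shape[1]
    w1 = rng.normal(0, 0.5, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    w2 = rng.normal(0, 0.5, (n_hidden, 1));    b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)      # hidden-layer activations
        out = h @ w2 + b2             # linear output layer
        err = out - y                 # residuals, shape (n, 1)
        # backpropagate mean-squared-error gradients
        g_out = 2 * err / len(x)
        w2 -= lr * h.T @ g_out
        b2 -= lr * g_out.sum(0)
        g_h = (g_out @ w2.T) * (1 - h**2)
        w1 -= lr * x.T @ g_h
        b1 -= lr * g_h.sum(0)
    return lambda xx: np.tanh(xx @ w1 + b1) @ w2 + b2
```

Training data would be generated by sweeping the discretized state/control grid described in the algorithm's input and recording the quantity each network is meant to approximate.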
5. Aggregate Control Scheme
6. Simulation Results and Analysis
6.1. Simulation Setup
6.2. Performance Analysis
7. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
Variables | Description
---|---
v | Linear side velocity
| Roll and yaw angles in the wing's frame of motion

Abbreviations | Meaning
---|---
RL | Reinforcement Learning
NLTA | Nonlinear Threshold Accepting
NN | Neural Network
NP | Non-deterministic Polynomial-time
PID | Proportional-Integral-Derivative
References
- Ernst, D.; Glavic, M.; Wehenkel, L. Power systems stability control: Reinforcement learning framework. IEEE Trans. Power Syst. 2004, 19, 427–435. [Google Scholar] [CrossRef] [Green Version]
- Luy, N.T. Reinforcement learning-based optimal tracking control for wheeled mobile robot. In Proceedings of the 2012 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), Bangkok, Thailand, 27–31 May 2012; pp. 371–376. [Google Scholar]
- Figueroa, R.; Faust, A.; Cruz, P.; Tapia, L.; Fierro, R. Reinforcement learning for balancing a flying inverted pendulum. In Proceedings of the 11th World Congress on Intelligent Control and Automation, Shenyang, China, 29 June–4 July 2014; pp. 1787–1793. [Google Scholar]
- Peters, J.; Vijayakumar, S.; Schaal, S. Reinforcement learning for humanoid robotics. In Proceedings of the Third IEEE-RAS International Conference on Humanoid Robots, Karlsruhe, Germany, 29–30 September 2003. [Google Scholar]
- Côté, D. Using machine learning in communication networks. J. Opt. Commun. Netw. 2018, 10, D100–D109. [Google Scholar] [CrossRef]
- Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260. [Google Scholar] [CrossRef]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
- Nahas, N.; Abouheaf, M.; Sharaf, A.M.; Gueaieb, W. A Self-Adjusting Adaptive AVR-LFC Scheme for Synchronous Generators. IEEE Trans. Power Syst. 2019, 34, 5073–5075. [Google Scholar] [CrossRef]
- Kober, J.; Bagnell, J.A.; Peters, J. Reinforcement learning in robotics: A survey. Int. J. Robot. Res. 2013, 32, 1238–1274. [Google Scholar] [CrossRef] [Green Version]
- Kiumarsi, B.; Vamvoudakis, K.G.; Modares, H.; Lewis, F.L. Optimal and Autonomous Control Using Reinforcement Learning: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2042–2062. [Google Scholar] [CrossRef]
- Watkins, C.J.C.H. Learning from Delayed Rewards. Ph.D. Thesis, King’s College, Cambridge, UK, May 1989. [Google Scholar]
- Abouheaf, M.I.; Lewis, F.L. Multi-agent differential graphical games: Nash online adaptive learning solutions. In Proceedings of the 52nd IEEE Conference on Decision and Control, Florence, Italy, 10–13 December 2013; pp. 5803–5809. [Google Scholar] [CrossRef]
- Abouheaf, M.I.; Lewis, F.L.; Mahmoud, M.S. Model-free adaptive learning solutions for discrete-time dynamic graphical games. In Proceedings of the 53rd IEEE Conference on Decision and Control, Los Angeles, CA, USA, 15–17 December 2014; pp. 3578–3583. [Google Scholar] [CrossRef]
- Abouheaf, M.I.; Lewis, F.L.; Mahmoud, M.S. Differential graphical games: Policy iteration solutions and coupled Riccati formulation. In Proceedings of the 2014 European Control Conference (ECC), Strasbourg, France, 24–27 June 2014; pp. 1594–1599. [Google Scholar]
- Abouheaf, M.; Lewis, F.; Vamvoudakis, K.; Haesaert, S.; Babuska, R. Multi-Agent Discrete-Time Graphical Games And Reinforcement Learning Solutions. Automatica 2014, 50, 3038–3053. [Google Scholar] [CrossRef] [Green Version]
- Abouheaf, M.I.; Lewis, F.L. Dynamic Graphical Games: Online Adaptive Learning Solutions Using Approximate Dynamic Programming. In Frontiers of Intelligent Control and Information Processing; World Scientific: Singapore, 2015; pp. 1–48. [Google Scholar]
- Abouheaf, M.I.; Haesaert, S.; Lee, W.; Lewis, F.L. Q-Learning with Eligibility Traces to solve Non-Convex economic dispatch problems. Int. J. Electr. Sci. Eng. 2012, 7, 1390–1396. [Google Scholar]
- Slotine, J.J.; Sastry, S.S. Tracking control of non-linear systems using sliding surfaces, with application to robot manipulators. Int. J. Control 1983, 38, 465–492. [Google Scholar] [CrossRef] [Green Version]
- Jiang, Z.P.; Nijmeijer, H. Tracking control of mobile robots: A case study in backstepping. Automatica 1997, 33, 1393–1399. [Google Scholar] [CrossRef] [Green Version]
- Chen, B.; Liu, X.; Tong, S. Adaptive fuzzy output tracking control of MIMO nonlinear uncertain systems. IEEE Trans. Fuzzy Syst. 2007, 15, 287–300. [Google Scholar] [CrossRef]
- Zhang, H.; Wei, Q.; Luo, Y. A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2008, 38, 937–942. [Google Scholar] [CrossRef] [PubMed]
- Cheah, C.C.; Liu, C.; Slotine, J.J.E. Adaptive tracking control for robots with unknown kinematic and dynamic properties. Int. J. Robot. Res. 2006, 25, 283–296. [Google Scholar] [CrossRef]
- Repoulias, F.; Papadopoulos, E. Planar trajectory planning and tracking control design for underactuated AUVs. Ocean Eng. 2007, 34, 1650–1667. [Google Scholar] [CrossRef]
- O’Dwyer, A. Handbook of PI and PID Controller Tuning Rules; Imperial College Press: London, UK, 2009. [Google Scholar]
- Singh, S.; Mitra, R. Comparative analysis of robustness of optimally offline tuned PID controller and Fuzzy supervised PID controller. In Proceedings of the 2014 Recent Advances in Engineering and Computational Sciences (RAECS), Chandigarh, India, 6–8 March 2014; pp. 1–6. [Google Scholar] [CrossRef]
- Fu, H.; Chen, X.; Wang, W.; Wu, M. MRAC for unknown discrete-time nonlinear systems based on supervised neural dynamic programming. Neurocomputing 2020, 384, 130–141. [Google Scholar] [CrossRef]
- Radac, M.B.; Lala, T. Learning Output Reference Model Tracking for Higher-Order Nonlinear Systems with Unknown Dynamics. Algorithms 2019, 12, 121. [Google Scholar] [CrossRef] [Green Version]
- Tang, D.; Chen, L.; Tian, Z.F.; Hu, E. Modified value-function-approximation for synchronous policy iteration with single-critic configuration for nonlinear optimal control. Int. J. Control 2019, 1–13. [Google Scholar] [CrossRef]
- Mu, C.; Ni, Z.; Sun, C.; He, H. Data-Driven Tracking Control With Adaptive Dynamic Programming for a Class of Continuous-Time Nonlinear Systems. IEEE Trans. Cybern. 2017, 47, 1460–1470. [Google Scholar] [CrossRef]
- Mu, C.; Zhang, Y. Learning-Based Robust Tracking Control of Quadrotor with Time-Varying and Coupling Uncertainties. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 259–273. [Google Scholar] [CrossRef]
- Nahas, N.; Nourelfath, M. Nonlinear threshold accepting meta-heuristic for combinatorial optimisation problems. Int. J. Metaheuristics 2014, 3, 265–290. [Google Scholar] [CrossRef]
- Nahas, N.; Darghouth, M.N.; Kara, A.Q.; Nourelfath, M. Non-linear threshold algorithm based solution for the redundancy allocation problem considering multiple redundancy strategies. J. Qual. Maint. Eng. 2019, 25, 397–411. [Google Scholar] [CrossRef]
- Nahas, N.; Darghouth, M.N.; Abouheaf, M. A non-linear-threshold-accepting function based algorithm for the solution of economic dispatch problem. RAIRO Oper. Res. 2020, 54. [Google Scholar] [CrossRef] [Green Version]
- Jain, A.K.; Mao, J.; Mohiuddin, K.M. Artificial neural networks: A tutorial. Computer 1996, 29, 31–44. [Google Scholar] [CrossRef] [Green Version]
- Svozil, D.; Kvasnicka, V.; Pospichal, J. Introduction to multi-layer feed-forward neural networks. Chemom. Intell. Lab. Syst. 1997, 39, 43–62. [Google Scholar] [CrossRef]
- Yao, X. Evolving artificial neural networks. Proc. IEEE 1999, 87, 1423–1447. [Google Scholar]
- Beigzadeh, R.; Rahimi, M.; Shabanian, S.R. Developing a feed forward neural network multilayer model for prediction of binary diffusion coefficient in liquids. Fluid Phase Equilib. 2012, 331, 48–57. [Google Scholar] [CrossRef]
- Oczak, M.; Viazzi, S.; Ismayilova, G.; Sonoda, L.T.; Roulston, N.; Fels, M.; Bahr, C.; Hartung, J.; Guarino, M.; Berckmans, D.; et al. Classification of aggressive behaviour in pigs by activity index and multilayer feed forward neural network. Biosyst. Eng. 2014, 119, 89–97. [Google Scholar] [CrossRef]
- Liang, P.; Bose, N. Neural Network Fundamentals with Graphs, Algorithms and Applications; McGraw-Hill: New York, NY, USA, 1996. [Google Scholar]
- Golomb, B.A.; Lawrence, D.T.; Sejnowski, T.J. SEXnet: A neural network identifies sex from human faces. In Advances in Neural Information Processing Systems; Touretzky, D.S., Lippmann, R., Eds.; Morgan Kaufmann: San Mateo, CA, USA, 1991; Volume 3. [Google Scholar]
- Rowley, H.A.; Baluja, S.; Kanade, T. Neural network-based face detection. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 23–38. [Google Scholar] [CrossRef]
- Park, D.C.; El-Sharkawi, M.; Marks, R.; Atlas, L.; Damborg, M. Electric load forecasting using an artificial neural network. IEEE Trans. Power Syst. 1991, 6, 442–449. [Google Scholar] [CrossRef] [Green Version]
- Chen, F.C.; Khalil, H.K. Adaptive control of nonlinear systems using neural networks. Int. J. Control 1992, 55, 1299–1317. [Google Scholar] [CrossRef]
- Kim, B.S.; Calise, A.J. Nonlinear flight control using neural networks. J. Guid. Control Dyn. 1997, 20, 26–33. [Google Scholar] [CrossRef]
- Minemoto, T.; Isokawa, T.; Nishimura, H.; Matsui, N. Feed forward neural network with random quaternionic neurons. Signal Process. 2017, 136, 59–68. [Google Scholar] [CrossRef]
- Hagan, M.T.; Menhaj, M.B. Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Netw. 1994, 5, 989–993. [Google Scholar] [CrossRef]
- Fun, M.H.; Hagan, M.T. Levenberg-Marquardt training for modular networks. In Proceedings of the International Conference on Neural Networks (ICNN’96), Washington, DC, USA, 3–6 June 1996; pp. 468–473. [Google Scholar]
- Bansal, S.; Calandra, R.; Xiao, T.; Levine, S.; Tomlin, C.J. Goal-Driven Dynamics Learning via Bayesian Optimization. In Proceedings of the 2017 IEEE 56th Annual Conference on Decision and Control (CDC), Melbourne, VIC, Australia, 12–15 December 2017. [Google Scholar]
- Zhao, J.; Na, J.; Gao, G. Adaptive dynamic programming based robust control of nonlinear systems with unmatched uncertainties. Neurocomputing 2020, 395, 56–65. [Google Scholar] [CrossRef]
- Wang, N.; Pan, X. Path Following of Autonomous Underactuated Ships: A Translation–Rotation Cascade Control Approach. IEEE/ASME Trans. Mechatron. 2019, 24, 2583–2593. [Google Scholar] [CrossRef]
- Yao, D.; Li, H.; Lu, R.; Shi, Y. Distributed Sliding-Mode Tracking Control of Second-Order Nonlinear Multiagent Systems: An Event-Triggered Approach. IEEE Trans. Cybern. 2020, 1–11. [Google Scholar] [CrossRef] [PubMed]
- Lewis, F.L.; Vrabie, D.; Syrmos, V.L. Optimal Control; John Wiley & Sons: Hoboken, NJ, USA, 2012. [Google Scholar]
- Abouheaf, M.; Gueaieb, W. Neurofuzzy Reinforcement Learning Control Schemes for Optimized Dynamical Performance. In Proceedings of the 2019 IEEE International Symposium on Robotic and Sensors Environments (ROSE), Ottawa, ON, Canada, 17–18 June 2019; pp. 1–7. [Google Scholar]
- de Matteis, G. Dynamics of Hang Gliders. J. Guid. Control Dyn. 1991, 14, 1145–1152. [Google Scholar] [CrossRef]
- Cook, M.; Spottiswoode, M. Modelling the Flight Dynamics of the Hang Glider. Aeronaut. J. 2006, 109, I–XX. [Google Scholar] [CrossRef] [Green Version]
- de Matteis, G. Response of Hang Gliders to Control. Aeronaut. J. 1990, 94, 289–294. [Google Scholar] [CrossRef]
Parameters | Values
---|---
| (0,30)
| (0,15)
| (0,11)
| []
| []
| []
N_EPISODES | 10
N_ITERATIONS |
Parameters | Values
---|---
, |
, |
, |
Parameters | Values
---|---
, |
, |
Number of hidden layers | 1
Number of hidden neurons | 13
Size of data set | samples
Training data set |
Validation data set |
Testing data set |
Learning algorithm | Levenberg-Marquardt
Model 1 | Model 2 | Model 3
---|---|---

Method | Model 1 | Model 2 | Model 3
---|---|---|---
RL | | |
RL+NN | | |
NLTA | | |
NLTA+NN | | |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, N.; Abouheaf, M.; Gueaieb, W.; Nahas, N. Model-Free Optimized Tracking Control Heuristic. Robotics 2020, 9, 49. https://doi.org/10.3390/robotics9030049