Accelerating Fuzzy Actor–Critic Learning via Suboptimal Knowledge for a Multi-Agent Tracking Problem
Abstract
1. Introduction
2. Preliminaries
2.1. Description of the Ground Tracking Problem
2.2. The Fuzzy Actor–Critic Learning Algorithm
3. Overall Design of the Proposed SK-FACL
4. Accelerating Fuzzy Actor–Critic Learning Algorithm via Suboptimal Knowledge
4.1. Guided Policy Based on the Apollonius Circle
4.2. Accelerating Fuzzy Actor–Critic Learning via Suboptimal Knowledge
Algorithm 1 Accelerating fuzzy actor–critic learning algorithm via suboptimal knowledge
1: Initialize the state and the discount factor for each pursuer and the evader
2: Initialize the membership functions of the fuzzy inputs
3: Initialize the consequent sets of the actor and the critic
4: Initialize the learning rates of the actor and the critic
5: Estimate the velocities of the pursuers and the evader
6: For each iteration i do
7:  Calculate the proper guided angles for P0 and P1 according to the Apollonius circle
8:  For each step do
9:   Calculate the output of the critic in the current state
10:   Obtain the actions generated by the actor
11:   Combine the guided action and the actor action to form the applied action for each pursuer
12:   Perform the action for each pursuer, and obtain the next state and the reward
13:   Calculate the temporal-difference (TD) error
14:   Update the consequent parameters of the critic
15:   Update the consequent parameters of the actor
16:  End for
17: End for
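To make the steps above concrete, the following is a minimal Python sketch of one possible realization of Algorithm 1. It is reduced to a single pursuer and a scalar fuzzy state, with Gaussian membership functions, a distance-progress reward, and a simple noise-based actor update; these details are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

# Minimal sketch of the SK-FACL loop (Algorithm 1), assuming one pursuer,
# a 1-D fuzzy state (angle between the line of sight and the evader's
# heading), and Gaussian memberships. The paper's exact memberships,
# reward shaping, and two-pursuer coordination will differ.

rng = np.random.default_rng(0)

# Fuzzy rule base: one rule per Gaussian membership center over the state.
centers = np.linspace(-np.pi, np.pi, 7)
sigma = 0.8

def firing_strengths(x):
    """Normalized rule firing strengths for scalar state x."""
    phi = np.exp(-0.5 * ((x - centers) / sigma) ** 2)
    return phi / phi.sum()

def guided_heading(p, e, e_heading, v_p_hat, v_e_hat):
    """Suboptimal guided policy: aim along the line of sight with a lead
    angle from the Apollonius-circle sine relation sin(a) = (ve/vp) sin(theta)."""
    los = np.arctan2(e[1] - p[1], e[0] - p[0])
    theta = e_heading - los  # evader heading relative to the line of sight
    lead = np.arcsin(np.clip(v_e_hat / max(v_p_hat, 1e-6) * np.sin(theta), -1.0, 1.0))
    return los + lead

# Consequent parameters of the actor (FLC) and the critic, one per rule.
w_actor = np.zeros_like(centers)
w_critic = np.zeros_like(centers)
alpha_actor, alpha_critic, gamma = 0.05, 0.10, 0.95
v_p, v_e, capture_dist, noise_std = 1.0, 0.6, 0.5, 0.05

for episode in range(50):
    p, e = np.array([0.0, 0.0]), np.array([8.0, 6.0])
    e_heading = rng.uniform(-np.pi, np.pi)
    prev_dist = np.linalg.norm(e - p)
    for step in range(300):
        # State: angle between the line of sight and the evader's heading, wrapped.
        los = np.arctan2(e[1] - p[1], e[0] - p[0])
        x = (e_heading - los + np.pi) % (2 * np.pi) - np.pi
        phi = firing_strengths(x)

        value = phi @ w_critic              # critic output V(s)
        u_mean = phi @ w_actor              # actor (FLC) output
        noise = rng.normal(0.0, noise_std)  # exploration (white noise)
        heading = guided_heading(p, e, e_heading, v_p, v_e) + u_mean + noise

        # Apply the combined action; the evader moves on a straight line here.
        p = p + v_p * np.array([np.cos(heading), np.sin(heading)])
        e = e + v_e * np.array([np.cos(e_heading), np.sin(e_heading)])
        dist = np.linalg.norm(e - p)
        reward = prev_dist - dist           # reward: progress toward the evader
        prev_dist = dist

        los_next = np.arctan2(e[1] - p[1], e[0] - p[0])
        x_next = (e_heading - los_next + np.pi) % (2 * np.pi) - np.pi
        phi_next = firing_strengths(x_next)

        td_error = reward + gamma * (phi_next @ w_critic) - value
        w_critic += alpha_critic * td_error * phi         # critic consequents
        w_actor += alpha_actor * td_error * noise * phi   # actor consequents

        if dist < capture_dist:
            break
    print(f"episode {episode:2d}: final distance {dist:.2f} after {step + 1} steps")
```

The guided term dominates early on, so the pursuer behaves sensibly from the first episode, while the TD-driven updates gradually refine the fuzzy consequents; this is the sense in which the suboptimal knowledge accelerates learning.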
5. Numerical Results
5.1. Case 1: Pure Knowledge Control
5.2. Case 2: Accelerating Fuzzy Actor–Critic Learning via Suboptimal Knowledge
5.3. Case 3: Tracking a Smart Evader
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Nomenclature
Symbol | Description
P0, P1 | Pursuers
E0 | Evader
– | Initial positions of the pursuers
– | Initial position of the evader
– | Capture distance
– | Speed of the pursuers
– | Speed of the evader
– | Heading angle of the evader
t | Current time step
– | Discount factor
– | Immediate reward
– | TD error
– | Consequent parameter in the FLC
– | Membership degree of the input in the fuzzy rule
– | White noise
– | Consequent parameter of the critic
– | Learning rate for the critic
– | Learning rate for the FLC
– | Guided policy for P0
– | Guided policy for P1
– | Reinforcement learning policy for P0
– | Reinforcement learning policy for P1
– | Constant scale factor
– | Radius of the Apollonius circle
– | Center of the Apollonius circle
– | Angle between the line of sight and the evader's heading direction
– | Angle that ensures the capture of the high-speed evader
– | Estimated speed of the pursuer
– | Estimated speed of the evader
– | State vector
– | Relative distance between the pursuer and the evader
– | Heading angle difference between the pursuer and the evader
– | Derivative of the heading angle difference
– | Relative distance at the current step in the kth training
– | Relative distance at the next step in the kth training
– | Minimum distance between the pursuer and the evader in the kth training
– | Minimum distance between the pursuer and the evader in the (k − 1)th training
– | Immediate reward
– | Coefficients in the reward design
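For reference, the Apollonius-circle and TD-error entries above correspond to standard relations; the LaTeX sketch below uses generic symbols (P, E, k for the speed ratio, a and theta for the lead and line-of-sight angles) that are assumptions rather than the paper's exact notation.

```latex
% Apollonius circle for pursuer P, evader E, and speed ratio k = v_E / v_P (k \neq 1):
% the locus of points X with |XE| / |XP| = k has center c and radius r:
\[
  c = \frac{E - k^{2}P}{1 - k^{2}}, \qquad
  r = \frac{k\,\lVert E - P\rVert}{\lvert 1 - k^{2}\rvert}
\]
% Interception lead angle a, given the angle \theta between the line of sight
% and the evader's heading (the guided policy used in Section 4.1):
\[
  \sin a = k \sin\theta
\]
% Temporal-difference error driving the critic and actor consequent updates:
\[
  \delta_{t} = r_{t+1} + \gamma\, V(s_{t+1}) - V(s_{t})
\]
```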