Improving Agent Decision Payoffs via a New Framework of Opponent Modeling
Abstract
1. Introduction
2. Preliminaries
2.1. Reinforcement Learning
2.2. Q-Learning
2.3. DQN
3. Opponent Modeling Based on Deep Reinforcement Learning
3.1. Decision Process with Opponent Modeling
3.2. Deep Reinforcement Network with Opponent Modeling
4. Reinforcement Learning Algorithm for the Opponent Network
Algorithm 1: DecoupleRlO
5. Experiment and Analysis
5.1. Soccer Game
5.1.1. Experimental Parameters
5.1.2. Analysis of the Results
5.2. Quiz Bowl
5.2.1. Experimental Setting
5.2.2. Results
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Player | Maximum Reward | Average Reward |
|---|---|---|
| DQN | 0.6658 | 0.5479 |
| DRON-Fc2 | 0.6826 | 0.6439 |
| DualFc2 | 0.7154 | 0.6723 |
| DecoupleFc2 | 0.7242 | 0.6472 |
| DecoupleDualFc2 | 0.7116 | 0.6590 |

| Player | Maximum Reward | Average Reward |
|---|---|---|
| DQN | 0.6658 | 0.5479 |
| DRON-Moe | 0.6764 | 0.6286 |
| DualMoe | 0.7022 | 0.6206 |
| DecoupleMoe | 0.7056 | 0.6758 |
| DecoupleDualMoe | 0.722 | 0.6949 |

| Player | Maximum Reward | Average Reward |
|---|---|---|
| DQN | 0.7709 | −0.2171 |
| DRON-Fc2 | 0.7918 | −0.0736 |
| DualFc2 | 0.8788 | 0.3426 |
| DecoupleFc2 | 0.8331 | 0.2781 |
| DecoupleDualFc2 | 0.7910 | 0.4099 |

| Player | Maximum Reward | Average Reward |
|---|---|---|
| DQN | 0.7709 | −0.2171 |
| DRON-Moe | 0.7819 | 0.5292 |
| DualMoe | 0.8606 | 0.6174 |
| DecoupleMoe | 0.8017 | 0.5911 |
| DecoupleDualMoe | 0.8307 | 0.6346 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, C.; Cong, J.; Zhao, T.; Zhu, E. Improving Agent Decision Payoffs via a New Framework of Opponent Modeling. Mathematics 2023, 11, 3062. https://doi.org/10.3390/math11143062