Transfer Learning Method from Parameter Scene to Physical Scene Based on Self-Game Theory
Abstract
1. Introduction
2. Materials and Methods
2.1. Algorithm Migration Task
2.2. Self-Game Training and Group Training
2.3. Virtual Self-Game
2.4. Multi-Agent Virtual Self-Game
2.5. League Training
3. Results
- (1) Mini Map → Medium Map → Large Map
- (2) Avoiding obstacles and moving → Discovering and safely approaching competitors → Encircling and blocking to defeat competitors
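A minimal sketch of this two-axis curriculum follows, in Python. The stage names, the `make_env` factory, the `train_one_stage` routine, and the promotion threshold are illustrative assumptions, not the platform's actual API:

```python
# Hypothetical stage names mirroring the two progressions above:
# map scale and task difficulty advance together.
CURRICULUM = [
    ("mini_map", "avoid_obstacles_and_move"),
    ("medium_map", "discover_and_approach_competitor"),
    ("large_map", "encircle_and_block_competitor"),
]

def train_with_curriculum(make_env, train_one_stage, win_rate_target=0.8):
    """Promote the agent to the next (map, task) stage only after its
    evaluation win rate clears the threshold."""
    policy = None
    for map_name, task_name in CURRICULUM:
        env = make_env(map_name, task_name)
        win_rate = 0.0
        while win_rate < win_rate_target:
            policy, win_rate = train_one_stage(env, policy)
    return policy
```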
3.1. Overall Architecture
3.2. Platform Features
3.2.1. Hardware Computing Power Layer
3.2.2. Computing Platform Layer
3.2.3. Intelligent Engine Layer
3.2.4. Algorithm Training Layer
- (1) Core algorithm library
- (2) Network model library
- (3) Basic neural network
- (4) Composite neural network
- (5) Intelligent agent network
- (6) Training method library
- (7) Comprehensive scheduling technology for large-scale heterogeneous computing resources
- (8) Distributed reinforcement learning engine technology (see the sketch after this list)
- (9) Asynchronous learning algorithm optimization technology
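The distributed engine and asynchronous optimization in items (8) and (9) both rest on the actor/learner split: sampling loops push experience into a shared buffer while the learner consumes it without blocking them. A minimal single-machine sketch follows; `env`, `policy`, and `sgd_update` are illustrative placeholders rather than the platform's SDK:

```python
import queue
import threading

def actor(env, policy, buffer, steps=10_000):
    """Sampling loop: run the policy in the simulation environment and
    push transitions without waiting on the learner."""
    obs = env.reset()
    for _ in range(steps):
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        buffer.put((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

def learner(buffer, sgd_update, batch_size=256, updates=1_000):
    """Optimization loop: consume transitions as they arrive, so the
    gradient step never stalls the actors."""
    for _ in range(updates):
        batch = [buffer.get() for _ in range(batch_size)]
        sgd_update(batch)

# Wiring: several actor threads feed one learner through a shared queue.
# buffer = queue.Queue(maxsize=100_000)
# for env, policy in worker_setups:
#     threading.Thread(target=actor, args=(env, policy, buffer), daemon=True).start()
# learner(buffer, sgd_update)
```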
3.2.5. User Layer
- (1) Intelligent agent development IDE components
- (2) Algorithm secondary development SDK components
- (3) Simulation environment integration SDK components
- (4) Platform management web interface components
4. Simulation
Intelligent Agent Training
- (1) Agent programming development/optimization process
- (2) Scenario loading process
- (3) Parallel sampling and agent model updating process in the simulation environment (sketched below)
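Process (3) reduces to a loop of parallel episode sampling followed by a single model update over the pooled trajectories. The sketch below assumes a picklable `run_episode(scenario, params)` returning a list of transitions and an `update(params, transitions)` returning new parameters; both names are hypothetical:

```python
from concurrent.futures import ProcessPoolExecutor

def training_loop(run_episode, update, scenario, params,
                  n_workers=8, iterations=100):
    """Each iteration samples episodes from n_workers simulation
    environments in parallel, then applies one model update to the
    pooled trajectories."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(iterations):
            futures = [pool.submit(run_episode, scenario, params)
                       for _ in range(n_workers)]
            transitions = [t for f in futures for t in f.result()]
            params = update(params, transitions)
    return params
```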
5. Conclusions
- (1) The training model is designed around a parameterized environment and incorporates a feedback loop built on a digital environment evaluation model, allowing decision-making strategies to be continuously verified and their transferability assessed (see the sketch after this list).
- (2) Experimental tests were conducted to investigate the transfer learning approach from parametric scenes to physical scenes, drawing on foundational methods such as algorithm migration tasks, virtual self-games, multi-agent virtual self-games, and league training.
- (3) An internal interface transmission content model was established to facilitate the transfer of information between the components of the training model.
- (4) Using a digital environment as a testing ground significantly improves the iterative efficiency of virtual-to-real conversion, while also reducing deployment costs and improving the security of virtual–real migration experiments.
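As an illustration of the verification loop in conclusion (1), the sketch below gates migration to the physical scene on a transferability score from the digital environment evaluation model. `train_step`, `evaluate_in_digital_twin`, `deploy`, and the threshold are assumed names for this sketch, not the authors' implementation:

```python
def virtual_to_real(train_step, evaluate_in_digital_twin, deploy,
                    policy, transfer_score_min=0.9, max_iters=1_000):
    """Train in the parameterized scene, score each candidate policy in
    the digital environment evaluation model, and migrate to the
    physical scene only once the transferability score clears the gate."""
    for _ in range(max_iters):
        policy = train_step(policy)                # parameterized scene
        score = evaluate_in_digital_twin(policy)   # feedback loop
        if score >= transfer_score_min:
            deploy(policy)                         # physical scene
            return policy
    raise RuntimeError("no policy met the transferability threshold")
```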
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Serial Number | Hardware Devices | Software or Algorithm Library
---|---|---
1 | Master control equipment | Intelligent agent development IDE components
2 | Master control equipment | Algorithm secondary development SDK components
3 | Master control equipment | Simulation environment integration SDK components
4 | Master control equipment | Platform management web interface components
5 | Cloud management equipment | Container management components
6 | Cloud management equipment | Heterogeneous resource scheduling components
7 | Cloud management equipment | Distributed storage components
8 | Cloud management equipment | Virtual network management components
9 | Training and management equipment | Large-scale data generation engine components
10 | Training and management equipment | Distributed continuous learning engine components
11 | Training and management equipment | High-performance prediction inference engine components
12 | Training and management equipment | Core algorithm library
13 | Training and management equipment | Network model library
14 | Training and management equipment | Training method library
15 | Hardware server cluster | Simulation and deduction environment
16 | Hardware server cluster | Large-scale data generation engine components
17 | Hardware server cluster | Distributed continuous learning engine components
18 | Hardware server cluster | High-performance prediction inference engine components
Serial Number | Algorithm | Abbreviation | Task Category | Example Applicable Scenarios
---|---|---|---|---
1 | Deep Q-network learning algorithm | DQN | Decision problems in discrete spaces | Single-agent small-scale combat scenarios
2 | Double deep Q-network learning algorithm | DDQN | Decision problems in discrete spaces | Single-agent small-scale combat scenarios
3 | Dueling Q-network learning algorithm | Dueling DQN | Decision problems in discrete spaces | Single-agent small-scale combat scenarios
4 | Noisy Q-network learning algorithm | Noisy DQN | Decision problems in discrete spaces | Single-agent small-scale combat scenarios
5 | Prioritized-replay Q-network learning algorithm | Prioritized DQN | Decision problems in discrete spaces | Single-agent small-scale combat scenarios
6 | Rainbow algorithm (combined DQN improvements) | Rainbow | Decision problems in discrete spaces | Single-agent small-scale combat scenarios
7 | Deep deterministic policy gradient algorithm | DDPG | Decision problems in continuous spaces | Single-agent small-scale combat scenarios
8 | Proximal policy optimization algorithm (enhanced version) | PPO | Decision problems in mixed spaces | Single-agent medium-scale tactical confrontation scenarios
9 | Asynchronous proximal policy optimization algorithm | A-PPO | Decision problems in mixed spaces | Single-agent medium-scale tactical confrontation scenarios
10 | Multi-agent deep deterministic policy gradient algorithm | MADDPG | Decision problems in multi-agent discrete spaces | Multi-agent small-scale tactical confrontation scenarios
11 | Multi-agent proximal policy optimization algorithm | MAPPO | Decision problems in multi-agent mixed spaces | Multi-agent medium-scale tactical confrontation scenarios
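As a sketch of how a core algorithm library might dispatch on the task categories in the table above, the registry below maps an action-space type and agent count to an algorithm name. The keys and the function are illustrative, not the platform's actual interface:

```python
# Hypothetical registry keyed by (action-space type, agent count),
# following the Task Category column of the table.
ALGORITHM_REGISTRY = {
    ("discrete", "single"): "DQN",    # or DDQN/Dueling/Noisy/Prioritized/Rainbow variants
    ("continuous", "single"): "DDPG",
    ("mixed", "single"): "PPO",       # A-PPO for asynchronous training
    ("discrete", "multi"): "MADDPG",
    ("mixed", "multi"): "MAPPO",
}

def select_algorithm(action_space: str, multi_agent: bool) -> str:
    """Map a task category to the library algorithm suggested by the table."""
    key = (action_space, "multi" if multi_agent else "single")
    try:
        return ALGORITHM_REGISTRY[key]
    except KeyError:
        raise ValueError(f"no library algorithm registered for {key!r}") from None

print(select_algorithm("mixed", multi_agent=True))  # MAPPO
```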
Serial Number | Sender | Receiver | Transferred Content
---|---|---|---
1 | Simulation environment | Situation input module | Real-time situational data from the simulation environment
2 | Situation input module | Simulation environment | Command data of the intelligent agents in the simulation environment
3 | Situation input module | Decision result generation module | State information in a format the neural network accepts
4 | Decision result generation module | Situation input module | Instructions in a format the simulation environment accepts
5 | Large-scale data generation engine components | Training data buffer pool | <State, Action, Reward> tuples: the state observed in the simulation environment, the action issued by the agent, and the reward returned by the environment
6 | Training data buffer pool | Distributed continuous learning engine components | <State, Action, Reward> tuples drawn from the buffer pool for training
7 | Distributed continuous learning engine components | Distributed continuous learning engine components | Neural network parameters of the decision result generation module, synchronized with the Ring-AllReduce method (see the sketch after this table)
8 | Distributed continuous learning engine components | High-performance prediction inference engine components | Neural network parameters of the decision result generation module
9 | Distributed continuous learning engine components | Distributed storage component (agent model) | Neural network parameters of the decision result generation module; the phased neural network model constitutes the intelligent agent
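Row 7 names Ring-AllReduce for synchronizing learner parameters. The following single-process simulation shows the communication pattern (reduce-scatter, then all-gather) and its arithmetic result; production engines run the chunked sends concurrently across learners, which this sketch does not attempt:

```python
import numpy as np

def ring_allreduce(arrays):
    """arrays: one 1-D float array per learner. Returns per-learner
    copies of the element-wise sum, computed ring-style."""
    n = len(arrays)
    chunks = [list(np.array_split(a.astype(float), n)) for a in arrays]
    # Reduce-scatter: at step s, learner i sends chunk (i - s) % n to its
    # ring neighbor; after n - 1 steps, learner i holds the complete sum
    # of chunk (i + 1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, chunks[i][(i - step) % n].copy())
                 for i in range(n)]
        for i, c, data in sends:
            chunks[(i + 1) % n][c] += data
    # All-gather: circulate the completed chunks around the ring until
    # every learner has all of them.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n].copy())
                 for i in range(n)]
        for i, c, data in sends:
            chunks[(i + 1) % n][c] = data
    return [np.concatenate(c) for c in chunks]

# Three learners end up with identical summed parameters.
out = ring_allreduce([np.ones(6), 2 * np.ones(6), 3 * np.ones(6)])
print(out[0])  # [6. 6. 6. 6. 6. 6.]
```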