SC-Phi2: A Fine-Tuned Small Language Model for StarCraft II Build Order Prediction
Abstract
1. Introduction
1.1. LLMs in StarCraft
1.2. MSC Dataset
- Each replay contains at least 10,000 frames.
- Both players have an APM (actions per minute) of at least 10.
- Both players have an MMR (match-making rating) of at least 1000.
- Broken or incomplete replays are excluded (a minimal filtering sketch follows this list).
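The criteria above translate directly into a replay filter. Below is a minimal sketch, assuming each replay's metadata has already been extracted into a dictionary; the field names (`frames`, `apm`, `mmr`, `broken`) are hypothetical and not taken from the MSC tooling.

```python
# Hedged sketch of the MSC replay filter described above.
# Field names are assumptions, not taken from the MSC preprocessing scripts.

def is_valid_replay(meta: dict) -> bool:
    """Return True if a replay passes all four MSC quality criteria."""
    if meta.get("broken", False):   # broken or incomplete replays are excluded
        return False
    if meta["frames"] < 10_000:     # each match must span at least 10,000 frames
        return False
    if min(meta["apm"]) < 10:       # both players need an APM of at least 10
        return False
    if min(meta["mmr"]) < 1000:     # both players need an MMR of at least 1000
        return False
    return True

# Example: one synthetic replay record.
example = {"frames": 14_250, "apm": [42, 87], "mmr": [3100, 2950], "broken": False}
assert is_valid_replay(example)
```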
2. Method
2.1. SC2 Text Dataset
2.2. Stage-1 Fine-Tuning the SLM
2.3. Stage-2 Fine-Tuning
2.3.1. Visual Backbone
2.3.2. Global Features for Prompt Generation
2.3.3. Dynamic Prompt Generation
2.3.4. Prompt Strategy
2.3.5. Final Fine-Tuning of SLM
2.4. Training
LoRA and QLoRA Adaptation for Language Backbone
3. Results
3.1. Ablation Results
3.1.1. Impact of One-Stage Fine-Tuning
3.1.2. Impact of Two-Stage Fine-Tuning
3.1.3. Comparison with Phi-3.5-Vision
4. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
SC2 | StarCraft II |
SLM | Small Language Model |
LLM | Large Language Model |
ViT | Vision Transformer |
BLIP | Bootstrapping Language-Image Pre-training |
GPT | Generative Pre-trained Transformer |
LLaMA | Large Language Model Meta AI |
LoRA | Low-Rank Adaptation |
QLoRA | Quantized Low-Rank Adaptation |
Appendix A
Appendix A.1. Stage-1 Self-Supervised Fine-Tuning
LoRA r | LoRA Alpha | Training Time | Training Loss | Epochs | 4 Bit | 8 Bit | Warmup Steps | Optimizer |
---|---|---|---|---|---|---|---|---|
32 | 64 | 9,625 | 1.58 | 40 | Yes | No | 0 | Paged AdamW 8 bit |
64 | 128 | 10,587 | 2.0427 | 20 | Yes | No | 20 | Paged AdamW 8 bit |
64 | 128 | 20,946 | 1.57 | 40 | Yes | No | 20 | Paged AdamW 8 bit |
64 | 128 | 22,157 | 1.56 | 40 | No | Yes | 20 | Paged AdamW 8 bit |
64 | 128 | 43,329 | 1.06 | 80 | No | Yes | 30 | AdamW 8 bit |
64 | 128 | 54,705 | 0.8854 | 100 | No | Yes | 30 | AdamW 8 bit |
64 | 128 | 54,277 | 0.9866 | 100 | No | Yes | 30 | Paged AdamW 32 bit |
64 | 128 | 78,544 | 0.6355 | 140 | No | Yes | 50 | AdamW 8 bit |
64 | 128 | 87,153 | 0.5564 | 160 | No | Yes | 80 | AdamW 8 bit |
Base Model Params | LoRA r | LoRA Alpha | Trainable Params (%) | Trainable Params | Token Length | GPU Memory |
---|---|---|---|---|---|---|
2,889,784,320 | 32 | 64 | 1.31% | 36,700,160 | 820 | 17.1 GB |
2,889,784,320 | 64 | 128 | 2.57% | 74,400,320 | 820 | 17.5 GB |
2,889,784,320 | 96 | 192 | 3.81% | 110,100,480 | 820 | 17.9 GB |
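To make the table concrete, here is a minimal sketch of how such a configuration could be expressed with the HuggingFace `transformers`, `peft`, and `bitsandbytes` stack. The r/alpha values, quantization, warmup steps, epochs, and optimizer name come from the rows above; the dropout value and the Phi-2 `target_modules` list are assumptions not reported in the tables.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

# 8-bit quantization, as in the best-performing Stage-1 rows above.
bnb = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", quantization_config=bnb, device_map="auto"
)

# r = 64, alpha = 128 is the most common setting in the Stage-1 table.
lora = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,  # assumption: dropout is not reported in the table
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # assumed Phi-2 module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # should roughly match the percentages tabulated above

args = TrainingArguments(
    output_dir="sc-phi2-stage1",  # hypothetical output path
    optim="paged_adamw_8bit",     # optimizer named in several Stage-1 rows
    warmup_steps=30,
    num_train_epochs=80,
)
```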
Appendix A.2. Stage-2 Self-Supervised Fine-Tuning
Base Model Params | LoRA r | LoRA Alpha | Trainable Params (%) | Trainable Params | Token Length | Batch Size | GPU Memory |
---|---|---|---|---|---|---|---|
2,889,784,320 | 96 | 192 | 3.81% | 110,100,480 | 288 | 4 | 23.44 GB |
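The parameter counts in both appendix tables follow from the standard LoRA decomposition (Hu et al.); the sketch below restates it and checks the 3.81% figure, under the assumption that the percentage is taken over the 2,889,784,320 base parameters listed.

```latex
% Standard LoRA update: only A and B are trained, W_0 stays frozen.
\[
W = W_0 + \frac{\alpha}{r}\, B A, \qquad
B \in \mathbb{R}^{d \times r}, \quad A \in \mathbb{R}^{r \times k}
\]
% Every row above sets alpha = 2r, so the update is scaled by alpha/r = 2.
% Each adapted weight matrix contributes r(d + k) trainable parameters.
% Sanity check against the Stage-2 row (r = 96):
\[
\frac{110{,}100{,}480}{2{,}889{,}784{,}320} \approx 0.0381 = 3.81\%
\]
```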
References
- Čertický, M.; Churchill, D.; Kim, K.J.; Čertický, M.; Kelly, R. StarCraft AI Competitions, Bots, and Tournament Manager Software. IEEE Trans. Games 2019, 11, 227–237.
- Vinyals, O.; Babuschkin, I.; Czarnecki, W.M.; Mathieu, M.; Dudzik, A.; Chung, J.; Choi, D.H.; Powell, R.; Ewalds, T.; Georgiev, P.; et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 2019, 575, 350–354.
- Huang, R.; Wu, X.; Yu, H.; Fan, Z.; Fu, H.; Fu, Q.; Yang, W. A Robust and Opponent-Aware League Training Method for StarCraft II. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; Volume 36, pp. 47554–47574.
- Churchill, D.; Buro, M. Build order optimization in StarCraft. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Palo Alto, CA, USA, 10–14 October 2011; pp. 14–19.
- Gallotta, R.; Todd, G.; Zammit, M.; Earle, S.; Liapis, A.; Togelius, J.; Yannakakis, G.N. Large Language Models and Games: A Survey and Roadmap. arXiv 2024, arXiv:2402.18659.
- Wang, G.; Xie, Y.; Jiang, Y.; Mandlekar, A.; Xiao, C.; Zhu, Y.; Fan, L.; Anandkumar, A. Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv 2023, arXiv:2305.16291.
- Hu, S.; Huang, T.; Ilhan, F.; Tekin, S.; Liu, G.; Kompella, R.; Liu, L. A Survey on Large Language Model-Based Game Agents. arXiv 2024, arXiv:2404.02039.
- Zhu, X.; Chen, Y.; Tian, H.; Tao, C.; Su, W.; Yang, C.; Huang, G.; Li, B.; Lu, L.; Wang, X.; et al. Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory. arXiv 2023, arXiv:2305.17144.
- Zhou, E.; Qin, Y.; Yin, Z.; Huang, Y.; Zhang, R.; Sheng, L.; Qiao, Y.; Shao, J. MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control. arXiv 2024, arXiv:2403.12037.
- Yuan, H.; Zhang, C.; Wang, H.; Xie, F.; Cai, P.; Dong, H.; Lu, Z. Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks. In Proceedings of the NeurIPS 2023 Foundation Models for Decision Making Workshop, New Orleans, LA, USA, 15 December 2023.
- Kambhampati, S. Can large language models reason and plan? Ann. N. Y. Acad. Sci. 2024, 1534, 15–18.
- Ma, W.; Mi, Q.; Zeng, Y.; Yan, X.; Wu, Y.; Lin, R.; Zhang, H.; Wang, J. Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach. arXiv 2024, arXiv:2312.11865.
- Gunasekar, S.; Zhang, Y.; Aneja, J.; Mendes, C.C.T.; Giorno, A.D.; Gopi, S.; Javaheripi, M.; Kauffmann, P.; de Rosa, G.; Saarikivi, O.; et al. Textbooks Are All You Need. arXiv 2023, arXiv:2306.11644.
- Shao, X.; Jiang, W.; Zuo, F.; Liu, M. SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models. arXiv 2024, arXiv:2401.17749.
- Li, J.; Li, D.; Savarese, S.; Hoi, S. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023.
- Wu, H.; Zhang, J.; Huang, K. MSC: A Dataset for Macro-Management in StarCraft II. arXiv 2017, arXiv:1710.03131.
- Vinyals, O.; Ewalds, T.; Bartunov, S.; Georgiev, P.; Vezhnevets, A.S.; Yeo, M.; Makhzani, A.; Küttler, H.; Agapiou, J.P.; Schrittwieser, J.; et al. StarCraft II: A New Challenge for Reinforcement Learning. arXiv 2017, arXiv:1708.04782.
- Khan, M.J.; Hassan, S.; Sukthankar, G. Leveraging Transformers for StarCraft Macromanagement Prediction. In Proceedings of the IEEE International Conference on Machine Learning and Applications (ICMLA), Pasadena, CA, USA, 13–16 December 2021; pp. 1229–1234.
- Liquipedia. 2024. Available online: https://liquipedia.net/starcraft/Main_Page (accessed on 16 February 2024).
- StarCraft Wiki. 2024. Available online: https://starcraft.fandom.com/wiki/StarCraft_Wiki# (accessed on 18 February 2024).
- StarCraft-Wikipedia. 2024. Available online: https://en.wikipedia.org/wiki/StarCraft (accessed on 20 February 2024).
- Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. HuggingFace’s Transformers: State-of-the-art Natural Language Processing. arXiv 2020, arXiv:1910.03771.
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. In Proceedings of the International Conference on Learning Representations, Online, 25–29 April 2022.
- Dettmers, T.; Pagnoni, A.; Holtzman, A.; Zettlemoyer, L. QLoRA: Efficient Finetuning of Quantized LLMs. In Proceedings of the 37th Annual Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023.
- Yuan, Z.; Li, Z.; Sun, L. TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones. arXiv 2023, arXiv:2312.16862.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models From Natural Language Supervision. arXiv 2021, arXiv:2103.00020.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations, Online, 3–7 May 2021.
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Ichter, B.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Volume 35, pp. 24824–24837.
- Abdin, M.; Aneja, J.; Awadalla, H.; Awadallah, A.; Awan, A.A.; Bach, N.; Bahree, A.; Bakhtiari, A.; Bao, J.; Behl, H.; et al. Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone. arXiv 2024, arXiv:2404.14219.
- Ranella, N.; Eger, M. Towards Automated Video Game Commentary Using Generative AI. In Proceedings of the EXAG@AIIDE, Salt Lake City, UT, USA, 18 October 2023.
Race | No. of Actions | No. of Units |
---|---|---|
Terran | 75 | 336 |
Protoss | 61 | 246 |
Zerg | 74 | 714 |
Question | Answer |
---|---|
What is the role of Extractor in Zerg? | The Extractor is the Zerg building from which Drones collect Vespene Gas. The Extractor is the Zerg equivalent to the Terran Refinery and the Protoss Assimilator. Besides being the fundamental Zerg building for collecting the Vespene Gas resource, the Extractor is sometimes used early game to bypass the 14 food limit by using the Extractor Trick. Players can build one or two Extractors at their main base, start the production of one or two Drones and then cancel the Extractor(s) as a way of avoiding the delay of a morphing Overlord when supply capped. This temporary decrease in supply is only possible for Zerg due to the fact that a morphing Extractor, along with all other Zerg buildings, requires the sacrifice of one Drone. Note that performing this supply trick will only refund 19 of the 25 minerals (75% rounded) used to morph the Extractor. Scouting or expanding drones that are under siege can be morphed into extractors on the fly to prevent the drone from being killed. They can later be canceled to continue scouting or expanding once the threats have been nullified. |
What is the role of Terran Medivac vs. Protoss? | Medivacs are brought with a Terran bio army to provide healing support. They are also frequently used to drop units in the Protoss base and snipe important infrastructure (Mining Probes, Pylons, Nexus, tech structures). They are also used in TvP for Hellion/Hellbat drops. |
What are the Training actions in Protoss? | “Train_Adept_quick”, “Train_Carrier_quick”, “Train_Colossus_quick”, “Train_DarkTemplar_quick”, “Train_Disruptor_quick”, “Train_HighTemplar_quick”, “Train_Immortal_quick”, “Train_MothershipCore_quick”, “Train_Observer_quick”, “Train_Oracle_quick”, “Train_Phoenix_quick”, “Train_Probe_quick”, “Train_Sentry_quick”, “Train_Stalker_quick”, “Train_Tempest_quick”, “Train_VoidRay_quick”, “Train_WarpPrism_quick”, “Train_Zealot_quick” |
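Entries like those above can be serialized into the plain `Question: ... Answer: ...` strings used by the Stage-1 prompts elsewhere in the paper. A minimal sketch follows; the dict field names are assumptions.

```python
# Minimal sketch: turn SC2 text-dataset entries into Stage-1 training strings.
# The "Question: ... Answer: ..." layout mirrors the prompts shown in the
# ablation tables; the field names are assumptions.

qa_pairs = [
    {
        "question": "What is the role of Terran Medivac vs. Protoss?",
        "answer": "Medivacs are brought with a Terran bio army to provide healing support.",
    },
]

def to_training_text(pair: dict) -> str:
    return f"Question: {pair['question']} Answer: {pair['answer']}"

corpus = [to_training_text(p) for p in qa_pairs]
print(corpus[0])
```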
Match-Up | No. of Replays | No. of Replays Used |
---|---|---|
Terran vs. Terran (TvT) | 4897 | 1000 |
Terran vs. Protoss (TvP) | 7894 | 1000 |
Terran vs. Zerg (TvZ) | 9996 | 1000 |
Protoss vs. Protoss (PvP) | 4334 | 1000 |
Protoss vs. Zerg (PvZ) | 6509 | 1000 |
Zerg vs. Zerg (ZvZ) | 2989 | 1000 |
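Drawing 1000 replays per matchup, as in the right-hand column above, amounts to a simple balanced subsample. A sketch, with a fixed seed as an assumption since the paper does not state one:

```python
import random

def subsample(replays_by_matchup: dict, k: int = 1000, seed: int = 0) -> dict:
    """Draw k replays per matchup, e.g. {'TvT': [...], 'TvP': [...], ...}."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    return {matchup: rng.sample(pool, k) for matchup, pool in replays_by_matchup.items()}
```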
Games | GRU [16] | Transformer [18] | Ours |
---|---|---|---|
Mirror matchups | |||
TvT | 73.1% | 74.38% | 76.82% |
PvP | 74.2% | 74.6% | 78.49% |
ZvZ | 74.9% | 74.6% | 77.07% |
Non-mirror matchups | |||
PvT | 69.6% | 77.58% | 79.62% |
PvZ | 74.2% | 77.92% | 80.37% |
TvZ | 74.8% | 75.22% | 78.74% |
Games | Fine-Tuned | Transfer Learning (Zero-Shot) | Transfer Learning (5-Shot) |
---|---|---|---|
TvT to TvZ | 78.74% | 53.34% | 74.63% |
PvP to PvT | 79.62% | 51.37% | 75.0% |
PvP to PvZ | 80.37% | 51.74% | 75.81% |
Ground Truth Actions | ‘Research_RavenCorvidReactor_quick’, ‘Research_AdvancedBallistics_quick’, ‘Research_RavenCorvidReactor_quick’, ‘Research_AdvancedBallistics_quick’ |
Ground Truth Outcome | win |
Generated Actions and Outcome | Instruct: As an expert StarCraft II Terran player, playing against the Terran, predict the next 4 actions and also the result of the game, given the following resources: Game Stage: Mid, Army Count: low, Army Units/Buildings: 5 buildings; Minerals collected: low, Minerals used: low, Vespene gas collected: low, Vespene gas used: low; Food used: low, Food cap: low, Food for Army: low, Food for Workers: low; Idle Workers: low, Warp gates count: low, Larva count: low. Output: Action 1: Research_RavenCorvidReactor_quick, Action 2: Research_AdvancedBallistics_quick, Action 3: Research_RavenCorvidReactor_quick, Action 4: Research_RavenCorvidReactor_quick, Result: win |
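The "Instruct: ..." text above is assembled from bucketed game-state features. The sketch below reconstructs that assembly; the two-way low/high bucketing, its threshold, and the feature names are assumptions, since the table only shows the rendered prompt.

```python
# Hedged reconstruction of the dynamic prompt shown above.
# Bucketing scheme and feature names are assumptions.

def bucket(value: float, threshold: float) -> str:
    return "low" if value < threshold else "high"  # hypothetical two-way bucketing

def build_prompt(race: str, opponent: str, stage: str, feats: dict) -> str:
    head = (f"Instruct: As an expert StarCraft II {race} player, playing against "
            f"the {opponent}, predict the next 4 actions and also the result of "
            f"the game, given the following resources:\n")
    body = (f"Game Stage: {stage}, Army Count: {feats['army_count']}, "
            f"Minerals collected: {feats['minerals_collected']}, "
            f"Food used: {feats['food_used']}, "
            f"Idle Workers: {feats['idle_workers']}.\nOutput:")
    return head + body

print(build_prompt("Terran", "Terran", "Mid",
                   {k: "low" for k in ("army_count", "minerals_collected",
                                       "food_used", "idle_workers")}))
```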
Prompt | Question: What are the Build actions for Terran? Answer: |
Phi-2 Output | Question: What are the Build actions for Terran? Answer: Build a barracks Build a research lab Build a space station Build a space station |
Stage 1 | Question: What are the available Build actions for Terran? Answer: actions = [‘Build_Barracks_screen’, ‘Build_CommandCenter_screen’, ‘Build_EngineeringBay_screen’] |
8-bit | Output: Terran: low, Terran: low, Terran units: low, Terran units: low, Terran units: low, Terran units: low, Terran units: low, Terran units: low, Terran units: low, Terran units: low |
16-bit | Output: Enemy Units: low, Enemy Units: low, Enemy Units: low, Enemy Units: low, Enemy Units: low, Enemy Units: low, Enemy Units: low, Enemy Units: low, Enemy Units: low, Enemy |
32-bit | Output: Enemy Units: low, Enemy Units: low, Enemy Units: low, Enemy Units: low, Enemy Units: low, Enemy Units: low, Enemy Units: low, Enemy Units: low, Enemy Units: low, Enemy |
Stage 1 | Output: Terran: low, Terran units: low, Terran units: low, Terran units: low, Terran units: low, Terran units: low, Terran units: low, Terran units: low, Terran units: low |
Stage 2 | Output: Action 1: Research_RavenCorvidReactor_quick, Action 2: Research_AdvancedBallistics_quick, Action 3: Research_RavenCorvidReactor_quick, Action 4: Research_RavenCorvidReactor_quick |
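For completeness, a hedged inference sketch that could reproduce outputs like those above, assuming the Stage-2 LoRA adapter is saved at a hypothetical local path `sc-phi2-stage2`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tok = AutoTokenizer.from_pretrained("microsoft/phi-2")
base = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "sc-phi2-stage2")  # hypothetical adapter path

prompt = "Question: What are the Build actions for Terran? Answer:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```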
Prompt | Output |
---|---|
As an expert StarCraft II Terran player, playing against the Terran, predict the next 4 actions and also the result of the game, given the following resources: Game Stage: Mid, Army Count: low, Army Units/Buildings: 5 buildings Minerals collected: low, Minerals used: low, Vespene gas collected: low, Vespene gas used: low Food used: low, Food cap: low, Food for Army: low, Food for Workers: low Idle Workers: low, Warp gates count: low, Larva count: low. Output: | As an AI, I cannot predict future events or outcomes in a game of StarCraft II. My responses are based on the information provided and do not include speculation or predictions. |
Consider yourself an expert StarCraft II Terran Player. You are playing against the Protoss opponent. Currently you have enough resources to build army units, upgrade or train your army. What 4 actions will you perform next? | As an AI, I cannot provide real-time strategic decisions for a game of StarCraft II. My responses are based on the information provided and do not include speculation or predictions.
The image is the map features of Starcraft game where each circle represents a building. Now assume yourself as the Starcraft Terran player and your opponent is Zerg. You have enough resources to perform next actions including developing new building, upgrading or researching. What actions will you do next? | As a Starcraft Terran player, my next actions would depend on the current state of the game and my opponent’s actions. However, some general options could include: 1. Building a new structure, such as a Barracks or a Factory, to increase my production capabilities. 2. Upgrading existing structures to improve their efficiency and output. 3. Researching new technologies to gain an advantage in the game. 4. Expanding my base to secure more resources and defend against potential attacks. 5. Focusing on resource management to ensure a steady supply of resources for future actions.