Imitating Human Go Players via Vision Transformer
Abstract
1. Introduction
- Application of Vision Transformers (ViTs) in Go: We introduce a ViT-based model that captures long-range dependencies in Go and predicts future moves, providing deeper insights into game progression and improving gameplay understanding.
- Bridging AI and Human Intuition: Our model emulates human decision-making, providing a more intuitive training tool for Go players to engage with the AI as if playing against a professional.
- Improved Top-one Accuracy Compared to Existing Studies: By utilizing the ViT model, we achieved higher top-one accuracy compared to previous studies, demonstrating the superior capability of ViT in Go move prediction.
2. Background and Related Work
2.1. Deep Learning and Go
2.2. Vision Transformer (ViT)
3. Materials and Methods
3.1. Dataset
3.1.1. Dataset Formation
3.1.2. Feature Preprocessing
3.2. Vision Transformer for Go
- Multi-head Attention Module
- Multi-layer Perceptron Module
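To illustrate how these two modules fit together, the following is a minimal sketch of a ViT encoder block and a small Go move-prediction model, assuming PyTorch. The token scheme (one token per board intersection), the 10 input planes matching the feature table below, and all layer sizes are illustrative assumptions rather than the paper's exact architecture or hyperparameters; a real model might also add a "pass" output.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One ViT encoder block: pre-norm multi-head attention followed by an MLP."""
    def __init__(self, dim=256, heads=8, mlp_ratio=4, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(dim * mlp_ratio, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # Multi-head attention module with residual connection
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        # Multi-layer perceptron module with residual connection
        x = x + self.mlp(self.norm2(x))
        return x

class GoViT(nn.Module):
    """Treats each of the 19x19 intersections as one token and predicts the next move."""
    def __init__(self, in_planes=10, dim=256, depth=12, heads=8):
        super().__init__()
        self.embed = nn.Linear(in_planes, dim)                   # per-intersection embedding
        self.pos = nn.Parameter(torch.zeros(1, 19 * 19, dim))    # learned positional encoding
        self.blocks = nn.ModuleList([EncoderBlock(dim, heads) for _ in range(depth)])
        self.head = nn.Linear(dim, 1)                            # one move logit per intersection

    def forward(self, planes):                                   # planes: (B, 10, 19, 19)
        x = planes.flatten(2).transpose(1, 2)                    # (B, 361, 10) tokens
        x = self.embed(x) + self.pos
        for blk in self.blocks:
            x = blk(x)
        return self.head(x).squeeze(-1)                          # (B, 361) move logits

logits = GoViT(depth=4)(torch.zeros(1, 10, 19, 19))
print(logits.shape)  # torch.Size([1, 361])
```

The depth parameter here corresponds to the number of encoder blocks, which is how the L = 4/8/12 variants in the result tables are read in this sketch.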
4. Experiments and Results
4.1. Experimental Settings
4.1.1. Training Settings for ViT
4.1.2. Training Settings for CNNs
4.1.3. Test-Time Augmentation
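The result tables report 1 versus 8 symmetries, which corresponds to averaging predictions over the eight rotations and reflections of the board at test time. Below is a minimal sketch of that averaging, assuming a predict_fn that maps a (C, 19, 19) feature stack to a (19, 19) probability map; the function names and interface are illustrative, not the paper's implementation.

```python
import numpy as np

def predict_with_symmetries(predict_fn, planes):
    """Average move probabilities over the 8 symmetries of the Go board.

    predict_fn: maps a (C, 19, 19) feature stack to a (19, 19) probability map
    planes:     (C, 19, 19) input features (constant planes are unaffected by the transforms)
    """
    total = np.zeros((19, 19), dtype=np.float32)
    for k in range(4):                      # 4 rotations
        for flip in (False, True):          # x 2 reflections = 8 symmetries
            x = np.rot90(planes, k, axes=(1, 2))
            if flip:
                x = np.flip(x, axis=2)
            p = predict_fn(x.copy())
            if flip:                        # undo the transform on the output
                p = np.flip(p, axis=1)
            total += np.rot90(p, -k)
    return total / 8.0
```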
4.2. Performance Evaluation for Imitating Human Moves
4.2.1. Top-k Accuracy
- N is the total number of instances in the dataset;
- y_i is the true label (human expert's move) for the i-th instance;
- ŷ_i represents the predicted scores or probabilities for the i-th input;
- Top_k(ŷ_i) is the set of k labels with the highest predicted scores or probabilities for the i-th instance;
- I(·) is the indicator function, returning 1 if the condition inside is true and 0 otherwise.
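The displayed equation for this metric did not survive extraction; a standard formulation consistent with the definitions above is:

```latex
\mathrm{Top\text{-}}k\ \mathrm{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} I\!\left( y_i \in \mathrm{Top}_k(\hat{y}_i) \right)
```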
4.2.2. Extended Top-k Accuracy
- N is the total number of instances in the dataset;
- y_i is the true label for the i-th instance;
- ŷ_i represents the predicted scores or probabilities for the i-th input;
- Top_k(ŷ_i) represents the set of k labels with the highest predicted scores or probabilities for the i-th instance, considering label y_i;
- E(y_i) is the set of labels considered equivalent to the true label y_i;
- I(·) is the indicator function, which returns 1 if the condition inside is true and 0 otherwise.
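As above, the displayed equation is missing; one plausible reconstruction, consistent with the definitions and using E(y_i) for the equivalence set, counts a prediction as correct when any of the top-k moves falls in that set:

```latex
\mathrm{Extended\ Top\text{-}}k\ \mathrm{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} I\!\left( \mathrm{Top}_k(\hat{y}_i) \cap E(y_i) \neq \emptyset \right)
```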
4.2.3. Inference Time Analysis on Different Devices
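No timing code is given in this outline; the following is a rough sketch of how per-position inference latency could be measured with PyTorch on different devices. The function and parameter names are illustrative assumptions, not the paper's benchmarking procedure.

```python
import time
import torch

def measure_latency(model, device="cpu", batch=1, warmup=10, iters=100):
    """Rough per-batch inference latency for a (10, 19, 19) Go position."""
    model = model.to(device).eval()
    x = torch.zeros(batch, 10, 19, 19, device=device)
    with torch.no_grad():
        for _ in range(warmup):                # warm-up runs are not timed
            model(x)
        if device.startswith("cuda"):
            torch.cuda.synchronize()           # wait for queued GPU work before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device.startswith("cuda"):
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters
```

For example, calling measure_latency on the GoViT sketch above with device="cuda" versus device="cpu" would give a rough GPU-versus-CPU comparison of per-position latency.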
5. Conclusions and Future Work
5.1. Conclusions
5.2. Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Input feature planes (Section 3.1.2):

| Feature | Shape | Description |
| --- | --- | --- |
| Black | (1, 19, 19) | The positions of black stones are marked with 1. The rest are marked with 0. |
| White | (1, 19, 19) | The positions of white stones are marked with 1. The rest are marked with 0. |
| Invalid | (1, 19, 19) | The positions of invalid moves are marked with 1. The rest are marked with 0. |
| Turn | (1, 19, 19) | The entire plane is marked with 1 if it is black's turn to play, or 0 if it is white's turn to play. |
| Ones | (1, 19, 19) | The entire plane is marked with 1. |
| Empty | (1, 19, 19) | The positions not occupied by a stone are marked with 1. The rest are marked with 0. |
| Recent moves | (4, 19, 19) | Stores the four most recent moves. The i-th plane marks the position of the i-th most recent move with 1. The rest are marked with 0. |
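To make the table concrete, here is a minimal sketch of assembling these ten planes, assuming NumPy. The function name, the board encoding, and passing a precomputed invalid-move mask are illustrative assumptions; the paper's actual preprocessing pipeline is not reproduced here.

```python
import numpy as np

def build_feature_planes(board, to_play_black, recent_moves, invalid_mask):
    """Assemble the (10, 19, 19) input tensor described in the feature table.

    board:         (19, 19) array with 0 = empty, 1 = black stone, 2 = white stone
    to_play_black: True if it is black's turn to play
    recent_moves:  list of up to 4 (row, col) moves, most recent first
    invalid_mask:  (19, 19) boolean array of illegal moves (occupied, suicide, ko)
    """
    planes = np.zeros((10, 19, 19), dtype=np.float32)
    planes[0] = (board == 1)                      # Black
    planes[1] = (board == 2)                      # White
    planes[2] = invalid_mask                      # Invalid
    planes[3] = 1.0 if to_play_black else 0.0     # Turn
    planes[4] = 1.0                               # Ones
    planes[5] = (board == 0)                      # Empty
    for i, (r, c) in enumerate(recent_moves[:4]): # Recent moves: i-th plane = i-th most recent
        planes[6 + i, r, c] = 1.0
    return planes
```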
Top-k accuracy results (Section 4.2.1):

| Model | Symmetries | Top-1 Acc. | Top-5 Acc. |
| --- | --- | --- | --- |
| 12-layer CNN | 1 | 0.4911 | 0.8250 |
| 12-layer CNN | 8 | 0.4921 | 0.8259 |
| Policy network | 1 | 0.4924 | 0.8134 |
| Policy network | 8 | 0.5045 | 0.8209 |
| ViT (L = 4) | 1 | 0.4513 | 0.7674 |
| ViT (L = 4) | 8 | 0.4726 | 0.7973 |
| ViT (L = 8) | 1 | 0.4860 | 0.8031 |
| ViT (L = 8) | 8 | 0.5050 | 0.8214 |
| ViT (L = 12) | 1 | 0.4973 | 0.8154 |
| ViT (L = 12) | 8 | 0.5149 | 0.8312 |
Extended top-k accuracy results (Section 4.2.2):

| Model | Symmetries | Ext. Top-1 Acc. | Ext. Top-5 Acc. |
| --- | --- | --- | --- |
| 12-layer CNN | 1 | 0.4943 | 0.8274 |
| 12-layer CNN | 8 | 0.4962 | 0.8272 |
| Policy network | 1 | 0.4930 | 0.8144 |
| Policy network | 8 | 0.5091 | 0.8237 |
| ViT (L = 4) | 1 | 0.4533 | 0.7704 |
| ViT (L = 4) | 8 | 0.4751 | 0.7993 |
| ViT (L = 8) | 1 | 0.4892 | 0.8063 |
| ViT (L = 8) | 8 | 0.5106 | 0.8278 |
| ViT (L = 12) | 1 | 0.5013 | 0.8259 |
| ViT (L = 12) | 8 | 0.5187 | 0.8349 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).