Czech, Johannes
Checkmating One, by Using Many: Combining Mixture of Experts with MCTS to Improve in Chess
Helfenstein, Felix, Blüml, Jannis, Czech, Johannes, Kersting, Kristian
This paper presents a new approach that integrates deep learning with computational chess, using both the Mixture of Experts (MoE) method and Monte-Carlo Tree Search (MCTS). Our methodology employs a suite of specialized models, each designed to respond to specific changes in the game's input data. This results in a framework with sparsely activated models, which provides significant computational benefits. Our framework combines the MoE method with MCTS in order to align it with the strategic phases of chess, thus departing from the conventional "one-for-all" model. Instead, we utilize distinct game phase definitions to effectively distribute computational tasks across multiple expert neural networks. Our empirical research shows a substantial improvement in playing strength, surpassing the traditional single-model framework. This validates the efficacy of our integrated approach and highlights the potential of incorporating expert knowledge and strategic principles into neural network design. The fusion of MoE and MCTS offers a promising avenue for advancing machine learning architectures.
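To make the routing idea concrete, here is a minimal Python sketch of phase-conditioned expert selection paired with an MCTS evaluation call. The phase thresholds, the phase names, and the expert interface are illustrative assumptions, not the paper's exact definitions.

    # Minimal sketch of sparse, phase-based expert routing for MCTS node
    # evaluation. Phase boundaries below are placeholders, not the paper's.
    import chess  # python-chess

    def game_phase(board: chess.Board) -> str:
        """Classify a position into opening/middlegame/endgame by material."""
        # Count all non-king, non-pawn pieces as a crude phase signal.
        pieces = sum(
            len(board.pieces(pt, color))
            for pt in (chess.KNIGHT, chess.BISHOP, chess.ROOK, chess.QUEEN)
            for color in (chess.WHITE, chess.BLACK)
        )
        if board.fullmove_number <= 10:
            return "opening"
        return "endgame" if pieces <= 6 else "middlegame"

    class PhaseMoE:
        """Routes each position to exactly one expert network (sparse activation)."""

        def __init__(self, experts):
            # e.g. {"opening": net_o, "middlegame": net_m, "endgame": net_e}
            self.experts = experts

        def evaluate(self, board: chess.Board):
            # Only the selected expert runs a forward pass; the others stay idle,
            # which is where the computational benefit comes from.
            expert = self.experts[game_phase(board)]
            return expert(board)  # -> (policy, value) for the MCTS node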
Know your Enemy: Investigating Monte-Carlo Tree Search with Opponent Models in Pommerman
Weil, Jannis, Czech, Johannes, Meuser, Tobias, Kersting, Kristian
In combination with Reinforcement Learning, Monte-Carlo Tree Search has been shown to outperform human grandmasters in games such as Chess, Shogi and Go with little to no prior domain knowledge. However, most classical use cases only feature up to two players. Scaling the search to an arbitrary number of players presents a computational challenge, especially if decisions have to be planned over a longer time horizon. In this work, we investigate techniques that transform general-sum multiplayer games into single-player and two-player games in which the other agents are assumed to act according to given opponent models. For our evaluation, we focus on the challenging Pommerman environment, which involves partial observability, a long time horizon and sparse rewards. In combination with our search methods, we investigate opponent modeling using heuristics and self-play. Overall, we demonstrate the effectiveness of our multiplayer search variants in both a supervised learning and a reinforcement learning setting.
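As an illustration of the reduction, the following Python sketch collapses an n-player step into a single-player one by letting fixed opponent models choose all other agents' moves; env, agent_ids, and opponent_policies are hypothetical stand-ins, not the Pommerman API.

    # Minimal sketch of reducing an n-player game to a single-player search:
    # only our own action is a search decision, opponents follow given models.
    def single_player_step(env, state, my_action, my_id, opponent_policies):
        """Advance the joint environment with our action plus modeled opponents."""
        joint_action = {}
        for agent_id in env.agent_ids(state):
            if agent_id == my_id:
                joint_action[agent_id] = my_action
            else:
                # Opponents act according to their given model (a heuristic or
                # a policy learned via self-play), not via search.
                joint_action[agent_id] = opponent_policies[agent_id](state)
        return env.step(state, joint_action)  # -> (next_state, reward, done)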
Representation Matters: The Game of Chess Poses a Challenge to Vision Transformers
Czech, Johannes, Blüml, Jannis, Kersting, Kristian
While transformers have gained the reputation as the "Swiss army knife of AI", no one has challenged them to master the game of chess, one of the classical AI benchmarks. Simply using vision transformers (ViTs) within AlphaZero does not master the game of chess, mainly because ViTs are too slow. Even making them more efficient using a combination of MobileNet and NextViT does not beat what actually matters: a simple change of the input representation and value loss, resulting in a greater boost of up to 180 Elo points over AlphaZero.
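To illustrate what "input representation" means here, the sketch below encodes a position into stacked binary planes of the kind AlphaZero-style networks consume; the 13-channel layout is a simplified assumption, not the paper's exact encoding.

    # Minimal sketch of a plane-based chess input representation.
    import numpy as np
    import chess

    def board_to_planes(board: chess.Board) -> np.ndarray:
        """Encode a position as 13 binary 8x8 planes: 12 piece planes + side to move."""
        planes = np.zeros((13, 8, 8), dtype=np.float32)
        for square, piece in board.piece_map().items():
            # Channels 0-5 hold white pieces, 6-11 black pieces, by piece type.
            channel = (piece.piece_type - 1) + (0 if piece.color == chess.WHITE else 6)
            planes[channel, square // 8, square % 8] = 1.0
        # Channel 12 is a constant plane marking the side to move.
        planes[12, :, :] = float(board.turn == chess.WHITE)
        return planes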
Monte-Carlo Graph Search for AlphaZero
Czech, Johannes, Korus, Patrick, Kersting, Kristian
The AlphaZero algorithm has been successfully applied in a range of discrete domains, most notably board games. It utilizes a neural network that learns a value and policy function to guide the exploration in a Monte-Carlo Tree Search. Although many search improvements have been proposed for Monte-Carlo Tree Search in the past, most of them refer to an older variant of the Upper Confidence bounds for Trees algorithm that does not use a policy for planning. We introduce a new, improved search algorithm for AlphaZero which generalizes the search tree to a directed acyclic graph. This enables information flow across different subtrees and greatly reduces memory consumption. Along with Monte-Carlo Graph Search, we propose a number of further extensions, such as the inclusion of epsilon-greedy exploration, a revised terminal solver and the integration of domain knowledge as constraints. In our evaluations, we use the CrazyAra engine on chess and crazyhouse as examples to show that these changes bring significant improvements to AlphaZero.
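The core mechanism can be sketched in a few lines of Python: a transposition table maps each unique position to a single node, so different parents reaching the same state share one child and the tree becomes a directed acyclic graph. The helper names below are illustrative, not CrazyAra's implementation.

    # Minimal sketch of the node-sharing idea behind Monte-Carlo Graph Search.
    class Node:
        def __init__(self):
            self.visits = 0
            self.value_sum = 0.0
            self.children = {}  # action -> Node (possibly shared across parents)

    def get_or_create_node(table: dict, state_key) -> Node:
        """Return the unique node for this position, creating it on first visit."""
        if state_key not in table:
            table[state_key] = Node()
        return table[state_key]

    def expand(table, node, state, legal_actions, apply_action, hash_state):
        for action in legal_actions(state):
            child_state = apply_action(state, action)
            # Transpositions map to the same node, so value statistics gathered
            # in one subtree become visible to every parent reaching this state,
            # and duplicate subtrees no longer consume memory.
            node.children[action] = get_or_create_node(table, hash_state(child_state))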
Learning to play the Chess Variant Crazyhouse above World Champion Level with Deep Neural Networks and Human Data
Czech, Johannes, Willig, Moritz, Beyer, Alena, Kersting, Kristian, Fürnkranz, Johannes
Deep neural networks have been successfully applied to learning the board games Go, chess and shogi without prior knowledge by making use of reinforcement learning. Although starting from zero knowledge has been shown to yield impressive results, it is associated with high computational costs, especially for complex games. With this paper, we present CrazyAra, a neural-network-based engine trained solely in a supervised manner for the chess variant crazyhouse. Crazyhouse is a game with a higher branching factor than chess, and only limited data of lower quality is available compared to what was available for AlphaGo. Therefore, we focus on improving efficiency in multiple aspects while relying on low computational resources. These improvements include modifications to the neural network design and training configuration, the introduction of a data normalization step and a more sample-efficient Monte-Carlo tree search which has a lower chance to blunder. After training on 569,537 human games for 1.5 days, we achieve a move prediction accuracy of 60.4%. During development, versions of CrazyAra played professional human players. Most notably, CrazyAra achieved a four-to-one win over the 2017 crazyhouse world champion Justin Tan (aka LM Jann Lee), who is rated more than 400 Elo higher than the average player in our training set. Furthermore, we test the playing strength of CrazyAra on CPU against all participants of the second Crazyhouse Computer Championships 2017, winning against twelve of the thirteen participants. Finally, for CrazyAraFish, we continue training our model on generated engine games. In ten long-time-control matches against Stockfish 10, CrazyAraFish wins three games and draws one.
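For orientation, here is a minimal PyTorch-style Python sketch of a supervised training step with a combined policy and value loss, in the spirit of the setup described above; the model interface, the target format, and the value-loss weight are placeholder assumptions, not CrazyAra's configuration.

    # Minimal sketch of one supervised AlphaZero-style gradient step on human
    # game data; hyperparameters below are illustrative placeholders.
    import torch
    import torch.nn.functional as F

    def training_step(model, optimizer, planes, target_moves, target_value,
                      value_weight=0.01):
        """One gradient step on the combined policy + value loss."""
        optimizer.zero_grad()
        # Assumed model interface: planes -> (move logits, scalar value in [-1, 1]).
        policy_logits, value = model(planes)
        # target_moves holds the index of the move played in each human game.
        policy_loss = F.cross_entropy(policy_logits, target_moves)
        # target_value holds the game outcome from the mover's perspective.
        value_loss = F.mse_loss(value.squeeze(-1), target_value)
        loss = policy_loss + value_weight * value_loss
        loss.backward()
        optimizer.step()
        return loss.item()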