simultaneous move game
Convergence of Monte Carlo Tree Search in Simultaneous Move Games
In this paper, we study Monte Carlo tree search (MCTS) in zero-sum extensive-form games with perfect information and simultaneous moves. We present a general template of MCTS algorithms for these games, which can be instantiated by various selection methods. We formally prove that if a selection method is $\epsilon$-Hannan consistent in a matrix game and satisfies additional requirements on exploration, then the MCTS algorithm eventually converges to an approximate Nash equilibrium (NE) of the extensive-form game. We empirically evaluate this claim using regret matching and Exp3 as the selection methods on randomly generated and worst case games. We confirm the formal result and show that additional MCTS variants also converge to approximate NE on the evaluated games.
1579779b98ce9edb98dd85606f2c119d-Reviews.html
"NIPS 2013 Neural Information Processing Systems December 5 - 10, Lake Tahoe, Nevada, USA",,, "Paper ID:","1046" "Title:","Convergence of Monte Carlo Tree Search in Simultaneous Move Games" Reviews First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper studies Monte Carlo tree search in zero-sum extensive form games with perfect information and simultaneous moves. It is proved that the MCTS algorithm converges to an approximate Nash equilibrium under certain conditions. Empirical study confirms the formal result. The detailed comments are as follows. The result is useful and the presentation is clear.
Convergence of Monte Carlo Tree Search in Simultaneous Move Games
In this paper, we study Monte Carlo tree search (MCTS) in zero-sum extensive-form games with perfect information and simultaneous moves. We present a general template of MCTS algorithms for these games, which can be instantiated by various selection methods. We formally prove that if a selection method is $\epsilon$-Hannan consistent in a matrix game and satisfies additional requirements on exploration, then the MCTS algorithm eventually converges to an approximate Nash equilibrium (NE) of the extensive-form game. We empirically evaluate this claim using regret matching and Exp3 as the selection methods on randomly generated and worst case games. We confirm the formal result and show that additional MCTS variants also converge to approximate NE on the evaluated games.
Convergence of Monte Carlo Tree Search in Simultaneous Move Games
Lisy, Viliam, Kovarik, Vojta, Lanctot, Marc, Bosansky, Branislav
In this paper, we study Monte Carlo tree search (MCTS) in zero-sum extensive-form games with perfect information and simultaneous moves. We present a general template of MCTS algorithms for these games, which can be instantiated by various selection methods. We formally prove that if a selection method is $\epsilon$-Hannan consistent in a matrix game and satisfies additional requirements on exploration, then the MCTS algorithm eventually converges to an approximate Nash equilibrium (NE) of the extensive-form game. We empirically evaluate this claim using regret matching and Exp3 as the selection methods on randomly generated and worst case games. We confirm the formal result and show that additional MCTS variants also converge to approximate NE on the evaluated games.
Using Double-Oracle Method and Serialized Alpha-Beta Search for Pruning in Simultaneous Move Games
Bosansky, Branislav (Czech Technical University in Prague) | Lisy, Viliam (Czech Technical University in Prague) | Cermak, Jiri (Czech Technical University in Prague) | Vitek, Roman (Czech Technical University in Prague) | Pechoucek, Michal (Czech Technical University in Prague)
We focus on solving two-player zero-sum extensive-form games with perfect information and simultaneous moves. In these games, both players fully observe the current state of the game where they simultaneously make a move determining the next state of the game. We solve these games by a novel algorithm that relies on two components: (1) it iteratively solves the games that correspond to a single simultaneous move using a double-oracle method, and (2) it prunes the states of the game using bounds on the sub-game values obtained by the classical Alpha-Beta search on a serialized variant of the game. We experimentally evaluate our algorithm on the Goofspiel card game, a pursuit-evasion game, and randomly generated games. The results show that our novel algorithm typically provides significant running-time improvements and reduction in the number of evaluated nodes compared to the full search algorithm.