Smooth UCT Search in Computer Poker
Heinrich, Johannes (University College London) | Silver, David (Google DeepMind)
Self-play Monte Carlo Tree Search (MCTS) has been successful in many perfect-information two-player games. Although these methods have been extended to imperfect-information games, so far they have not achieved the same level of practical success or theoretical convergence guarantees as competing methods. In this paper we introduce Smooth UCT, a variant of the established Upper Confidence Bounds Applied to Trees (UCT) algorithm.

They concluded that UCT quickly finds a good but suboptimal policy, while Outcome Sampling initially learns more slowly but converges to the optimal policy over time. In this paper, we address the question whether the inability of UCT to converge to a Nash equilibrium can be overcome while retaining UCT's fast initial learning rate. We focus on the full-game MCTS setting, which is an important step towards developing sound variants of online MCTS in imperfect-information games. In particular, we introduce Smooth UCT, which combines
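For readers unfamiliar with the UCT algorithm that Smooth UCT builds on, the core of UCT is its action-selection rule: at each tree node it picks the action maximizing a UCB1 score that trades off the estimated action value against an exploration bonus. Below is a minimal sketch of that selection step; the dict-based node structure (`n` for visit counts, `q` for mean value, `children` for the action map) is a hypothetical layout chosen for illustration, not the paper's implementation.

```python
import math

def uct_select(node, c=math.sqrt(2)):
    """Select the action maximizing the UCB1 score used by UCT:
    Q(s, a) + c * sqrt(ln N(s) / N(s, a)).

    `node` is assumed to be a dict with a visit count `n` and a
    `children` dict mapping each action to a dict holding that
    action's mean value `q` and visit count `n`.
    """
    def score(child):
        if child["n"] == 0:
            return float("inf")  # try every unvisited action first
        exploration = c * math.sqrt(math.log(node["n"]) / child["n"])
        return child["q"] + exploration

    return max(node["children"], key=lambda a: score(node["children"][a]))

# Example: with equal value estimates, the less-visited action wins
# on its larger exploration bonus.
node = {
    "n": 10,
    "children": {
        "fold": {"q": 0.1, "n": 5},
        "call": {"q": 0.1, "n": 2},
    },
}
```

Here `uct_select(node)` returns `"call"`, since its smaller visit count gives it the larger exploration bonus. Smooth UCT modifies this step by sometimes sampling from the average policy instead of always maximizing, which is what enables convergence towards a Nash equilibrium.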
Jul-15-2015