Deep counterfactual value networks combined with continual resolving provide a way to conduct depth-limited search in imperfect-information games. However, since their introduction in the DeepStack poker AI, deep counterfactual value networks have not seen widespread adoption. In this paper we introduce several improvements to deep counterfactual value networks, as well as counterfactual regret minimization, and analyze the effects of each change. We combined these improvements to create the poker AI Supremus. We show that while a reimplementation of DeepStack loses head-to-head against the strong benchmark agent Slumbot, Supremus successfully beats Slumbot by an extremely large margin and also achieves a lower exploitability than DeepStack against a local best response. Together, these results show that with our key improvements, deep counterfactual value networks can achieve state-of-the-art performance.
Schmid, Martin, Moravcik, Matej, Burch, Neil, Kadlec, Rudolf, Davidson, Josh, Waugh, Kevin, Bard, Nolan, Timbers, Finbarr, Lanctot, Marc, Holland, Zach, Davoodi, Elnaz, Christianson, Alden, Bowling, Michael
Games have a long history of serving as a benchmark for progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect information games, and approaches using game-theoretic reasoning and learning have shown strong performance for specific imperfect information poker variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games -- an important step towards truly general algorithms for arbitrary environments. We prove that Player of Games is sound, converging to perfect play as available computation time and approximation capacity increases. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.
A fundamental challenge in imperfect-information games is that states do not have well-defined values. As a result, depth-limited search algorithms used in single-agent settings and perfect-information games do not apply. This paper introduces a principled way to conduct depth-limited solving in imperfect-information games by allowing the opponent to choose among a number of strategies for the remainder of the game at the depth limit. Each one of these strategies results in a different set of values for leaf nodes. This forces an agent to be robust to the different strategies an opponent may employ. We demonstrate the effectiveness of this approach by building a master-level heads-up no-limit Texas hold'em poker AI that defeats two prior top agents using only a 4-core CPU and 16 GB of memory. Developing such a powerful agent would have previously required a supercomputer.
Moravcik, Matej (Charles University in Prague) | Schmid, Martin (Charles University in Prague) | Ha, Karel (Charles University in Prague) | Hladik, Milan (Charles University in Prague) | Gaukrodger, Stephen J. (Koypetition)
The leading approach to solving large imperfect information games is to pre-calculate an approximate solution using a simplified abstraction of the full game; that solution is then used to play the original, full-scale game. The abstraction step is necessitated by the size of the game tree. However, as the original game progresses, the remaining portion of the tree (the subgame) becomes smaller. An appealing idea is to use the simplified abstraction to play the early parts of the game and then, once the subgame becomes tractable, to calculate a solution using a finer-grained abstraction in real time, creating a combined final strategy. While this approach is straightforward for perfect information games, it is a much more complex problem for imperfect information games. If the subgame is solved locally, the opponent can alter his play in prior to this subgame to exploit our combined strategy. To prevent this, we introduce the notion of subgame margin, a simple value with appealing properties. If any best response reaches the subgame, the improvement of exploitability of the combined strategy is (at least) proportional to the subgame margin. This motivates subgame refinements resulting in large positive margins. Unfortunately, current techniques either neglect subgame margin (potentially leading to a large negative subgame margin and drastically more exploitable strategies), or guarantee only non-negative subgame margin (possibly producing the original, unrefined strategy, even if much stronger strategies are possible). Our technique remedies this problem by maximizing the subgame margin and is guaranteed to find the optimal solution. We evaluate our technique using one of the top participants of the AAAI-14 Computer Poker Competition, the leading playground for agents in imperfect information setting
The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of a successes in single-agent settings and perfect-information games, best exemplified by the success of AlphaZero. However, algorithms of this form have been unable to cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search for imperfect-information games. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results show ReBeL leads to low exploitability in benchmark imperfect-information games and achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI. We also prove that ReBeL converges to a Nash equilibrium in two-player zero-sum games in tabular settings.