AAAI Conferences

Self-play Monte Carlo Tree Search (MCTS) has been successful in many perfect-information two-player games. Although these methods have been extended to imperfect-information games, so far they have not achieved the same level of practical success or theoretical convergence guarantees as competing methods. In this paper we introduce Smooth UCT, a variant of the established Upper Confidence Bounds Applied to Trees (UCT) algorithm. Smooth UCT agents mix in their average policy during self-play and the resulting planning process resembles game-theoretic fictitious play. When applied to Kuhn and Leduc poker, Smooth UCT approached a Nash equilibrium, whereas UCT diverged. In addition, Smooth UCT outperformed UCT in Limit Texas Hold'em and won 3 silver medals in the 2014 Annual Computer Poker Competition.
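The phrase "mix in their average policy" can be made concrete with a minimal Python sketch of Smooth UCT-style action selection at one information-state node. It assumes that, with probability `eta`, the agent picks the UCB1-maximizing action as in plain UCT, and otherwise samples from its average policy. Here the average policy is approximated by normalized action visit counts. The `Node` class, the fixed `eta = 0.9`, and the exploration constant `c = 1.4` are illustrative assumptions, not values taken from the paper, which may use a decaying mixing schedule.

```python
import math
import random


class Node:
    """Per-information-state statistics (hypothetical structure for this sketch)."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.n = 0                                # total visits to this node
        self.n_a = {a: 0 for a in self.actions}   # visits per action
        self.q = {a: 0.0 for a in self.actions}   # running mean return per action


def ucb_action(node, c=1.4):
    """Plain UCT (UCB1) selection; untried actions are taken first."""
    for a in node.actions:
        if node.n_a[a] == 0:
            return a
    return max(node.actions,
               key=lambda a: node.q[a] + c * math.sqrt(math.log(node.n) / node.n_a[a]))


def average_policy(node):
    """Average policy approximated by normalized visit counts (an assumption)."""
    total = sum(node.n_a.values())
    if total == 0:
        return {a: 1.0 / len(node.actions) for a in node.actions}
    return {a: node.n_a[a] / total for a in node.actions}


def smooth_uct_select(node, eta=0.9, rng=random):
    """With probability eta use UCB1; otherwise sample from the average policy."""
    if rng.random() < eta:
        return ucb_action(node)
    policy = average_policy(node)
    return rng.choices(list(policy), weights=list(policy.values()))[0]


def update(node, action, ret):
    """Back up one simulated return through this node."""
    node.n += 1
    node.n_a[action] += 1
    node.q[action] += (ret - node.q[action]) / node.n_a[action]


if __name__ == "__main__":
    random.seed(0)
    root = Node(["fold", "call", "raise"])
    payoff = {"fold": 0.0, "call": 0.5, "raise": 1.0}  # toy deterministic returns
    for _ in range(1000):
        act = smooth_uct_select(root)
        update(root, act, payoff[act])
    print(average_policy(root))
```

Setting `eta=1.0` recovers plain UCT selection. Lower values put more weight on the average policy, which is the ingredient that makes the self-play process resemble fictitious play.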

Player of Games

Artificial Intelligence

Games have a long history of serving as a benchmark for progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect-information games, and approaches using game-theoretic reasoning and learning have shown strong performance for specific imperfect-information poker variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect- and imperfect-information games, an important step towards truly general algorithms for arbitrary environments. We prove that Player of Games is sound, converging to perfect play as available computation time and approximation capacity increase. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect-information game that illustrates the value of guided search, learning, and game-theoretic reasoning.

On Strategy Stitching in Large Extensive Form Multiplayer Games

Neural Information Processing Systems

Computing a good strategy in a large extensive-form game often demands an extraordinary amount of computer memory, necessitating the use of abstraction to reduce the game size. Typically, strategies from abstract games perform better in the real game as the granularity of abstraction is increased. This paper investigates two techniques for stitching a base strategy in a coarse abstraction of the full game tree to expert strategies in fine abstractions of smaller subtrees. We provide a general framework for creating static experts, an approach that generalizes some previous strategy-stitching efforts. In addition, we show that static experts can create strong agents for both 2-player and 3-player Leduc and Limit Texas Hold'em poker, and that a specific class of static experts can be preferred among a number of alternatives.

Facebook and CMU's 'superhuman' poker AI beats human pros


AI has definitively beaten humans at another of our favorite games. A poker bot, designed by researchers from Facebook's AI lab and Carnegie Mellon University, has bested some of the world's top players in a series of games of six-person no-limit Texas Hold'em poker. Over 12 days and 10,000 hands, the AI system named Pluribus faced off against 12 pros in two different settings. In one, the AI played alongside five human players; in the other, five versions of the AI played with one human player (the computer programs were unable to collaborate in this scenario). Pluribus won an average of $5 per hand with hourly winnings of around $1,000 -- a "decisive margin of victory," according to the researchers.

AI beats professionals at six-player Texas Hold 'Em poker

New Scientist

Artificial intelligence has finally cracked the biggest challenge in poker: beating top professionals in six-player no-limit Texas Hold'Em, the most popular variant of the game. Over 20,000 hands of online poker, the AI beat fifteen of the world's top poker players, each of whom has won more than $1 million USD playing the game professionally. The AI, called Pluribus, was tested in 10,000 games against five human players, as well as in 10,000 rounds where five copies of Pluribus played against one professional – and did better than the pros in both. Pluribus was developed by Noam Brown of Facebook AI Research and Tuomas Sandholm at Carnegie Mellon University in the US. It is an improvement on their previous poker-playing AI, called Libratus, which in 2017 outplayed professionals at Heads-Up Texas Hold'Em, a variant of the game that pits two players head to head.

Bet On The Bot: AI Beats The Professionals At 6-Player Texas Hold 'Em

NPR Technology

During one experiment, the poker bot Pluribus played against five professional players. In artificial intelligence, it's a milestone when a computer program can beat top players at a game like chess. But a game like poker, specifically six-player Texas Hold'em, has been too tough for a machine to master, until now. Researchers say they have designed a bot called Pluribus capable of taking on poker professionals in the most popular form of poker and winning.

Why it's a big deal that AI knows how to bluff in poker


As the great Kenny Rogers once said, a good gambler has to know when to hold'em and know when to fold'em. At the Rivers Casino in Pittsburgh this week, a computer program called Libratus may finally prove that computers can do this better than any human card player. Libratus is playing thousands of games of heads-up, or two-player, no-limit Texas hold'em against several expert professional poker players. Now a little more than halfway through the 20-day contest, Libratus is up by almost $800,000 against its human opponents. So victory, while far from guaranteed, may well be in the cards.

Depth-Limited Solving for Imperfect-Information Games

Neural Information Processing Systems

A fundamental challenge in imperfect-information games is that states do not have well-defined values. As a result, depth-limited search algorithms used in single-agent settings and perfect-information games do not apply. This paper introduces a principled way to conduct depth-limited solving in imperfect-information games by allowing the opponent to choose among a number of strategies for the remainder of the game at the depth limit. Each one of these strategies results in a different set of values for leaf nodes. This forces an agent to be robust to the different strategies an opponent may employ. We demonstrate the effectiveness of this approach by building a master-level heads-up no-limit Texas hold'em poker AI that defeats two prior top agents using only a 4-core CPU and 16 GB of memory. Developing such a powerful agent would have previously required a supercomputer.
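The central mechanism, letting the opponent choose among several continuation strategies at the depth limit, can be illustrated with a minimal Python sketch of one best-response step. It assumes a zero-sum game and a fixed strategy for us, summarized as `our_reach[h]`, the probability of reaching the leaf with each of our private hands `h`. Hands are assumed dealt independently, so any chance weighting is folded into `our_reach`. It also assumes precomputed leaf values `values[k][h][o]` for each opponent continuation strategy `k` and opponent hand `o`. Because the opponent cannot see our hand, it picks, separately for each of its own hands, the continuation that is worst for us across our whole range. The function name and data layout are hypothetical. In the paper this choice is embedded as an opponent decision inside the subgame, which is then solved by an equilibrium-finding algorithm; the sketch does not reproduce that solver.

```python
from typing import List, Sequence


def opponent_leaf_choices(values: Sequence[Sequence[Sequence[float]]],
                          our_reach: Sequence[float]) -> List[int]:
    """For each opponent hand o, return the index k of the continuation strategy
    that minimizes our reach-weighted value at this depth-limited leaf.

    values[k][h][o]: our expected payoff if the opponent follows continuation
        strategy k for the rest of the game, we hold hand h, it holds hand o.
    our_reach[h]: probability that our fixed strategy reaches this leaf with hand h.
    """
    num_k = len(values)
    num_h = len(our_reach)
    num_o = len(values[0][0])
    choices = []
    for o in range(num_o):
        # The opponent evaluates each continuation against our whole range,
        # weighted by how likely we are to arrive here with each hand.
        per_k = [sum(our_reach[h] * values[k][h][o] for h in range(num_h))
                 for k in range(num_k)]
        # Zero-sum: the opponent's best choice is the one that is worst for us.
        choices.append(min(range(num_k), key=per_k.__getitem__))
    return choices


# Two continuation strategies, two of our hands, one opponent hand.
leaf_values = [
    [[1.0], [-1.0]],   # k = 0
    [[-0.5], [0.5]],   # k = 1
]
print(opponent_leaf_choices(leaf_values, our_reach=[0.8, 0.2]))  # -> [1]
print(opponent_leaf_choices(leaf_values, our_reach=[0.2, 0.8]))  # -> [0]
```

Shifting our range from hand 0 toward hand 1 flips the opponent's preferred continuation. This is why a depth-limited agent cannot assume a single fixed value at each leaf and must instead remain robust to every continuation strategy the opponent might pick.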

Dynamic Adaptation and Opponent Exploitation in Computer Poker

AAAI Conferences

As a classic example of imperfect-information games, heads-up no-limit Texas hold'em (HUNL) has been studied extensively in recent years. While state-of-the-art approaches based on Nash equilibrium have been successful, they lack the ability to model and exploit opponents effectively. This paper presents an evolutionary approach to discovering opponent models based on Long Short-Term Memory (LSTM) neural networks and Pattern Recognition Trees. Experimental results showed that poker agents built with this method can adapt to opponents they have never seen in training and exploit weak strategies far more effectively than Slumbot 2017, one of the cutting-edge Nash-equilibrium-based poker agents. In addition, agents evolved through playing against relatively weak rule-based opponents tied statistically with Slumbot in heads-up matches. Thus, the proposed approach is a promising new direction for building high-performance adaptive agents in HUNL and other imperfect-information games.

AIVAT: A New Variance Reduction Technique for Agent Evaluation in Imperfect Information Games

AAAI Conferences

Evaluating agent performance when outcomes are stochastic and agents use randomized strategies can be challenging when there is limited data available. The variance of sampled outcomes may make the simple approach of Monte Carlo sampling inadequate. This is the case for agents playing heads-up no-limit Texas hold'em poker, where man-machine competitions typically involve multiple days of consistent play by multiple players but can still (and sometimes did) produce statistically insignificant conclusions. In this paper, we introduce AIVAT, a low-variance, provably unbiased value assessment tool that exploits an arbitrary heuristic estimate of state value, as well as the explicit strategy of a subset of the agents. Unlike existing techniques, which either reduce only the variance from chance events or consider only game-ending actions, AIVAT reduces the variance from both choices by nature and choices by players with a known strategy. The resulting estimator significantly outperforms previous state-of-the-art techniques. It was able to reduce the standard deviation of a Texas hold'em poker man-machine match by 85% and consequently requires 44 times fewer games to draw the same statistical conclusion. AIVAT enabled the first statistically significant AI victory against professional poker players in no-limit hold'em. Furthermore, the technique was powerful enough to produce statistically significant results versus individual players, not just an aggregate pool of the players. We also used AIVAT to analyze a short series of AI vs. human poker tournaments, producing statistically significant results with as few as 28 matches.
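The two headline figures for AIVAT are consistent with each other. For an unbiased estimator averaged over independent games, the number of games needed to reach a fixed standard error scales with the variance, that is, with the square of the standard deviation. A short derivation, assuming independent games and a target standard error $\varepsilon$ (the subscripts MC for plain Monte Carlo sampling and AIVAT are notation introduced here):

```latex
\begin{align*}
\sigma_{\text{AIVAT}} &= (1 - 0.85)\,\sigma_{\text{MC}} = 0.15\,\sigma_{\text{MC}} \\
n &= \frac{\sigma^2}{\varepsilon^2}
  \qquad \text{(games needed for standard error } \varepsilon\text{)} \\
\frac{n_{\text{MC}}}{n_{\text{AIVAT}}}
  &= \frac{\sigma_{\text{MC}}^2}{\sigma_{\text{AIVAT}}^2}
   = \frac{1}{0.15^2} = \frac{1}{0.0225} \approx 44.4
\end{align*}
```

An 85% cut in standard deviation therefore corresponds to roughly 44 times fewer games, matching the figure reported above.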