"Industry executives and analysts often mistakenly talk about strategy as if it were some kind of chess match. But in chess, you have just two opponents, each with identical resources, and with luck playing a minimal role. The real world is much more like a poker game, with multiple players trying to make the best of whatever hand fortune has dealt them." (David Moschella, Computerworld, 1999)
Poker is the canonical example of a stochastic game with imperfect information. These are games with random chance (when cards are dealt from the deck) and some information is hidden from the players (their opponents' private cards). Poker is famous for its psychological aspects: bluffing with bad cards, feigning weakness with strong ones, and reading an opponent's facial expressions and "tells" to discover what cards they are hiding. In spite of these "human" elements, computers can still play poker at a high level of skill, including bluffing and other tricky plays, by approaching the game with game theory.
Poker is a family of wagering card games with different numbers of players, betting structures, and rules. The current most popular variant of poker, played in casinos and seen on television, is no-limit Texas hold'em. This game and a smaller variant, limit Texas hold'em, have been used as a testbed for artificial intelligence research since 1997. Since 2006, the Annual Computer Poker Competition has allowed researchers, programmers, and poker players to play their poker programs against each other, allowing us to find out which artificial intelligence techniques work best in practice. The competition has resulted in significant advances in fields such as computational game theory, and resulted in algorithms that can find optimal strategies for games six orders of magnitude larger than was possible using earlier techniques.
At the 2007 AAAI conference, a program called Polaris made by the University of Alberta's Computer Poker Research Group played heads-up limit Texas hold'em against two human professionals, Phil Laak and Ali Eslami, in the First Man-Machine Poker Championship. Over four duplicate poker matches, Polaris won once, tied once, and lost twice, losing the match overall by a small margin. In Las Vegas in 2008, an updated version of Polaris played against six human heads-up limit specialists in the Second Man-Machine Poker Championship. Over six matches, Polaris won three, lost two and tied one to win the event by a small margin. This marked the first time that a computer program defeated human poker professionals in a meaningful competition. Computer programs have continued to improve, and the top competitors in the Annual Computer Poker Competition are much stronger than the version of Polaris used in the 2008 competition.
Challenges for Artificial Intelligence Research:
Poker offers a rich set of research challenges that are not seen in popular AI research testbeds such as checkers, chess, and Go. These include:
- Imperfect information, where one or more players cannot determine the exact state of the game.
- Stochastic outcomes, where random chance affects the course of the game.
- Multiple players (two to ten). Games with more than one opponent are strategically very different from two-player games.
- Repeated interaction, where the players play a long series of games and can learn and adapt to their opponent over time.
- Variable payouts, so that players must adapt to exploit their opponents' mistakes in order to maximize their winnings.
In two-player poker games, the most successful approach to date is based on game theory. A Nash equilibrium is a set of strategies, one for each player, where no player can increase the amount they win by changing to a different strategy. In two-player zero-sum games, such as two-player poker games, one player using a Nash equilibrium strategy is guaranteed to not lose against any adversary, even a worst-case adversary that knows what the strategy is. If the adversary makes mistakes, then a Nash equilibrium strategy can win. If a Nash equilibrium strategy is computed before the game is played, it can be used against any adversary without needing to know anything about their strategy. This is a nice property in poker, because it means that we can use it to at worst tie the game, while learning what mistakes the opponent is making so that we can change our strategy in response. The Annual Computer Poker Competition has driven significant effort towards approximating Nash equilibrium strategies in very large games, and the best programs in two-player limit Texas hold'em are now very close to a Nash equilibrium strategy. This approach was used in the 2007 and 2008 Man-vs-Machine Poker Championships, and is also used by most of the top competitors in the Annual Computer Poker Competition.
The other popular approach in the Computer Poker Competition involves observing strong players, such as good humans or good computer programs, and learning a strategy that mimics their behavior. Recently, this has involved case-based reasoning, in which a program's current decision is compared to a set of examples stored in memory. The most similar examples are used to choose a response.
While computer programs are now very strong in the simplest real poker game played in the competition, two-player limit Texas hold'em poker, there are many open problems and new challenges that need to be addressed:
- State-space abstraction. Human-scale poker games are far too large to exactly compute optimal strategies. Instead, a technique is used to simplify the game down to a size where we can compute good (but no longer perfect) strategies. This usually means finding decision points in the game that are strategically similar and merging them together. Automated techniques for doing this will become more and more important as we move from two-player limit Texas hold'em to larger games, like no-limit or multi-player games.
- Multi-player games. The most successful approach to date for making strong computer poker agents uses the mathematics of game theory to decide how to pick actions so that the strategy cannot lose much, or at all, against a perfect opponent. In games with more than one opponent, however, the mathematical guarantees of this approach are no longer useful. Instead of using a single precomputed strategy, a good computer poker agent will instead have to observe its opponents and choose a strategy that works well in response. This is largely an open problem.
- Adapting to opponents during a match. The goal of poker is to maximize the amount of money won from the opponents. However, the game theoretic approach to the game pursues a different goal, by trying to minimize how much money is lost to a perfect adversary. In order to win as much as possible from an opponent, a player has to observe the opponent's actions, find out what mistakes the opponent is making, and change their strategy in response to take advantage of those mistakes. Computer poker research has led to effective techniques that work when the opponent is observed for millions of games, but human players are able to adjust their strategy after only tens or hundreds of games.