In recent years there have been great strides in artificial intelligence (AI), with games often serving as challenge problems, benchmarks, and milestones for progress. Poker has served for decades as such a challenge problem. Past successes in such benchmarks, including poker, have been limited to two-player games. However, poker in particular is traditionally played with more than two players. Multiplayer games present fundamental additional issues beyond those in two-player games, and multiplayer poker is a recognized AI milestone. In this paper we present Pluribus, an AI that we show is stronger than top human professionals in six-player no-limit Texas hold'em poker, the most popular form of poker played by humans. Poker has served as a challenge problem for the fields of artificial intelligence (AI) and game theory for decades (1). In fact, the foundational papers on game theory used poker to illustrate their concepts (2, 3). The reason for this choice is simple: no other popular recreational game captures the challenges of hidden information as effectively and as elegantly as poker. Although poker has been useful as a benchmark for new AI and game-theoretic techniques, the challenge of hidden information in strategic settings is not limited to recreational games.
Pluribus is the first AI bot capable of beating human experts in six-player no-limit Hold'em, the most widely played poker format in the world. This is the first time an AI bot has beaten top human players in a complex game with more than two players or two teams. We tested Pluribus against professional poker players, including two winners of the World Series of Poker Main Event. Pluribus succeeds because it can very efficiently handle the challenges of a game with both hidden information and more than two players. It uses self-play to teach itself how to win, with no examples or guidance on strategy. Pluribus uses far fewer computing resources than the bots that have defeated humans in other games. The bot's success will advance AI research, because many important AI challenges involve many players and hidden information. For decades, poker has been a difficult and important grand challenge problem for the field of AI.
Although games of skill like Go and chess have long been touchstones for intelligence, programmers have gotten steadily better at crafting programs that can now beat even the best human opponents. Only recently, however, has artificial intelligence (AI) begun to successfully challenge humans in the much more popular (and lucrative) game of poker. Part of what makes poker difficult is that the luck of the draw in this card game introduces an intrinsic randomness (although randomness is also an element of games like backgammon, at which software has beaten humans for decades). More important, though, is that in the games where computers previously have triumphed, players have "perfect information" about the state of the play up until that point. "Randomness is not nearly as hard a problem," said Michael Bowling of the University of Alberta in Canada.
The machines have proven their superiority in one-on-one games like chess and go, and even poker -- but in complex multiplayer versions of the card game humans have retained their edge… until now. An evolution of the last AI agent to flummox poker pros individually is now decisively beating them in championship-style 6-person game. As documented in a paper published in the journal Science today, the CMU/Facebook collaboration they call Pluribus reliably beats five professional poker players in the same game, or one pro pitted against five independent copies of itself. It's a major leap forward in capability for the machines, and amazingly is also far more efficient than previous agents as well. One-on-one poker is a weird game, and not a simple one, but the zero-sum nature of it (whatever you lose, the other player gets) makes it susceptible to certain strategies in which computer able to calculate out far enough can put itself at an advantage.
Facebook researchers have developed a general AI framework called Recursive Belief-based Learning (ReBeL) that they say achieves better-than-human performance in heads-up, no-limit Texas hold'em poker while using less domain knowledge than any prior poker AI. They assert that ReBeL is a step toward developing universal techniques for multi-agent interactions -- in other words, general algorithms that can be deployed in large-scale, multi-agent settings. Potential applications run the gamut from auctions, negotiations, and cybersecurity to self-driving cars and trucks. Combining reinforcement learning with search at AI model training and test time has led to a number of advances. Reinforcement learning is where agents learn to achieve goals by maximizing rewards, while search is the process of navigating from a start to a goal state.