Extensive games are often used to model the interactions of multiple agents within an environment. Much recent work has focused on increasing the size of an extensive game that can be feasibly solved. Despite these improvements, many interesting games are still too large for such techniques. A common approach for computing strategies in these large games is to first employ an abstraction technique to reduce the original game to an abstract game that is of a manageable size. This abstract game is then solved and the resulting strategy is used in the original game. Most top programs in recent AAAI Computer Poker Competitions use this approach. The trend in this competition has been that strategies found in larger abstract games tend to beat strategies found in smaller abstract games. These larger abstract games have more expressive strategy spaces and therefore contain better strategies. In this paper we present a new method for computing strategies in large games. This method allows us to compute more expressive strategies without increasing the size of abstract games that we are required to solve. We demonstrate the power of the approach experimentally in both small and large games, while also providing a theoretical justification for the resulting improvement.
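The abstract-then-solve pipeline described above can be sketched in a few lines. This is an illustrative toy, not the paper's method: `bucket` stands in for a card abstraction that groups hands by a hypothetical strength score, and the abstract strategy is assumed to have been computed offline by some equilibrium solver.

```python
# Toy sketch of the standard abstraction pipeline (assumed structure):
# 1) map original-game states into a smaller abstract game,
# 2) solve the abstract game offline (not shown),
# 3) play the abstract strategy back in the original game.

def bucket(hand_strength, n_buckets=5):
    """Card abstraction: collapse a hand-strength score in [0, 1]
    into one of n_buckets coarse buckets (hypothetical helper)."""
    return min(int(hand_strength * n_buckets), n_buckets - 1)

def play(hand_strength, abstract_strategy):
    """Use the abstract game's strategy in the original game by
    looking up the bucket the current hand falls into."""
    return abstract_strategy[bucket(hand_strength)]

# Example: a (made-up) strategy computed for the 5-bucket abstract game.
strategy = {0: "fold", 1: "fold", 2: "call", 3: "call", 4: "raise"}
```

The expressiveness limit the paper targets is visible here: every hand in the same bucket is forced to play identically, so a finer abstraction (more buckets) permits better strategies but yields a larger game to solve.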
The world's best artificial intelligence poker player seems to know exactly when to hold'em and when to fold'em. An artificial-intelligence program known as Libratus has beaten some of the world's best human poker players in a 20-day No-Limit Texas Hold'em tournament, defeating four opponents by about $1.77 million in poker chips, according to Pittsburgh's Rivers Casino, where the "Brains vs. Artificial Intelligence" poker tournament was held. At the end of some days, at least one of the human players was beating the AI program. But in the end, it was not enough. "We appreciate their hard work, but unfortunately, the computer won," said Craig Clark, general manager of Rivers Casino.
Carnegie Mellon's No-Limit Texas Hold'em software made short work of four of the world's best professional poker players in Pittsburgh at the grueling "Brains vs. Artificial Intelligence" poker tournament. Poker now joins chess, Jeopardy, Go, and many other games at which programs outplay people. But poker is different from all the others in one big way: players have to guess based on partial, or "imperfect," information. "Chess and Go are games of perfect information," explains Libratus co-creator Noam Brown, a Ph.D. candidate at Carnegie Mellon. "All the information in the game is available for both sides to see."
Twelve days into the strangest poker tournament of their lives, Jason Les and his companions returned to their hotel, browbeaten and exhausted. Huddled over a pile of tacos, they strategized, as they had done every night. With about 60,000 hands played -- and 60,000 to go -- they were losing badly to an unusual opponent: a computer program called Libratus, which was up nearly $800,000 in chips. That wasn't supposed to happen. In 2015, Les and a crew of poker pros had beaten a similar computer program, winning about $700,000.
Multi-agent decision problems can often be formulated as extensive-form games. We focus on imperfect information extensive-form games in which one or more actions at many decision points have an associated continuous or many-valued parameter. A stock trading agent, in addition to deciding whether to buy or not, must decide how much to buy. In no-limit poker, in addition to selecting a probability for each action, the agent must decide how much to bet for each betting action. Selecting values for these parameters makes these games extremely large. Two-player no-limit Texas Hold'em poker with stacks of 500 big blinds has approximately 10^71 states, more than 10^50 times as many states as two-player limit Texas Hold'em. The main contribution of this paper is a technique that abstracts a game's action space by selecting one, or a small number, of the many values for each parameter. We show that strategies computed using this new algorithm for no-limit Leduc poker exhibit significant utility gains over epsilon-Nash equilibrium strategies computed with standard, hand-crafted parameter value abstractions.
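A hand-crafted parameter-value abstraction of the kind this abstract contrasts against can be sketched as follows. This is a minimal illustration under assumed conventions (a {half-pot, pot, all-in} menu is a common choice but is not taken from the paper, and the helper names are hypothetical): the continuous bet-size parameter is replaced by a small fixed menu, and any bet observed in the full game is mapped to its nearest abstract size.

```python
# Hand-crafted action abstraction for no-limit bet sizing (illustrative).
# The abstract game keeps only a few bet sizes per decision point instead
# of the full continuum, shrinking the game to a solvable size.

def abstract_actions(pot, stack):
    """A fixed, hand-crafted menu of bet sizes: half-pot, pot, all-in.
    Bets are capped at the remaining stack."""
    sizes = [0.5 * pot, 1.0 * pot, stack]
    return sorted({min(s, stack) for s in sizes})

def map_to_abstraction(bet, pot, stack):
    """Translate a bet observed in the original game to the nearest
    size available in the abstract game."""
    candidates = abstract_actions(pot, stack)
    return min(candidates, key=lambda s: abs(s - bet))
```

For example, with a pot of 100 and a stack of 500, an opponent's bet of 70 would be treated as the half-pot action (50). The paper's contribution is to select these parameter values algorithmically rather than fixing such a menu by hand.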