The current most popular variant of poker, played in casinos and seen on television, is no-limit Texas hold'em. This game and a smaller variant, limit Texas hold'em, have been used as a testbed for artificial intelligence research since 1997. Since 2006, the Annual Computer Poker Competition has let researchers, programmers, and poker players pit their poker programs against each other, revealing which artificial intelligence techniques work best in practice. The competition has driven significant advances in fields such as computational game theory, producing algorithms that can find optimal strategies for games six orders of magnitude larger than was possible using earlier techniques.
Self-play Monte Carlo Tree Search (MCTS) has been successful in many perfect-information two-player games. Although these methods have been extended to imperfect-information games, so far they have not achieved the same level of practical success or theoretical convergence guarantees as competing methods. In this paper we introduce Smooth UCT, a variant of the established Upper Confidence Bounds Applied to Trees (UCT) algorithm. Smooth UCT agents mix in their average policy during self-play and the resulting planning process resembles game-theoretic fictitious play. When applied to Kuhn and Leduc poker, Smooth UCT approached a Nash equilibrium, whereas UCT diverged. In addition, Smooth UCT outperformed UCT in Limit Texas Hold'em and won 3 silver medals in the 2014 Annual Computer Poker Competition.
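The node-level selection rule described above can be sketched in a few lines. This is a minimal illustration under our own assumptions (a constant mixing parameter `eta` and dictionary-based node statistics), not the authors' implementation; in particular, the paper anneals the mixing parameter as a node accumulates visits, whereas here it is fixed for simplicity:

```python
import math
import random

def smooth_uct_select(node, c=1.8, eta=0.1, rng=random):
    """Select an action at a tree node during self-play.

    With probability eta, act like plain UCT via the UCB formula;
    otherwise, sample from the average policy, i.e. the empirical
    distribution of past action selections (visit counts). Mixing in
    the average policy is what makes self-play resemble fictitious
    play. `node` is assumed to hold per-action visit counts and
    cumulative values.
    """
    actions = list(node["visits"])
    total = sum(node["visits"].values())
    if rng.random() < eta:
        # UCT branch: maximise mean value plus exploration bonus.
        def ucb(a):
            n = node["visits"][a]
            if n == 0:
                return math.inf  # always try unvisited actions first
            return node["value"][a] / n + c * math.sqrt(math.log(total) / n)
        return max(actions, key=ucb)
    # Average-policy branch: sample in proportion to visit counts.
    if total == 0:
        return rng.choice(actions)
    r = rng.uniform(0, total)
    acc = 0.0
    for a in actions:
        acc += node["visits"][a]
        if r <= acc:
            return a
    return actions[-1]
```

With `eta=1.0` this degenerates to plain UCT; with `eta=0.0` it plays only the average policy, which is the fictitious-play-like component.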
Martin Schmid, Matej Moravcik, Neil Burch, Rudolf Kadlec, Josh Davidson, Kevin Waugh, Nolan Bard, Finbarr Timbers, Marc Lanctot, Zach Holland, Elnaz Davoodi, Alden Christianson, Michael Bowling
Games have a long history of serving as a benchmark for progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect information games, and approaches using game-theoretic reasoning and learning have shown strong performance for specific imperfect information poker variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games -- an important step towards truly general algorithms for arbitrary environments. We prove that Player of Games is sound, converging to perfect play as available computation time and approximation capacity increase. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.
From the very dawn of the field, search with value functions was a fundamental concept of computer games research. Turing's chess algorithm from 1950 was able to think two moves ahead, and Shannon's work on chess from 1950 includes an extensive section on evaluation functions to be used within a search. Samuel's checkers program from 1959 already combines search and value functions that are learned through self-play and bootstrapping. TD-Gammon improved upon those ideas, using neural networks to learn complex value functions -- only for them to be used again within search. The combination of decision-time search and value functions has been present in the remarkable milestones where computers bested their human counterparts in long-standing challenge games -- Deep Blue for chess and AlphaGo for Go. Until recently, this powerful framework of search aided by (learned) value functions was limited to perfect information games. As many interesting problems do not provide the agent with perfect information about the environment, this was an unfortunate limitation. This thesis introduces the reader to sound search for imperfect information games.
The 32-year-old is the only person to have won four World Poker Tour titles and has earned more than $7 million at tournaments. Despite his expertise, he learned something new this spring from an artificial intelligence bot. Elias was helping test new software from researchers at Carnegie Mellon University and Facebook. He and another pro, Chris "Jesus" Ferguson, each played 5,000 hands over the internet in six-way games against five copies of a bot called Pluribus. At the end, the bot was ahead by a good margin.
Computing a good strategy in a large extensive form game often demands an extraordinary amount of computer memory, necessitating the use of abstraction to reduce the game size. Typically, strategies from abstract games perform better in the real game as the granularity of abstraction is increased. This paper investigates two techniques for stitching a base strategy, computed in a coarse abstraction of the full game tree, to expert strategies computed in fine abstractions of smaller subtrees. We provide a general framework for creating static experts, an approach that generalizes some previous strategy stitching efforts. In addition, we show that static experts can create strong agents for both 2-player and 3-player Leduc and Limit Texas Hold'em poker, and that a specific class of static experts is preferable to a number of alternatives.
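The stitching idea can be illustrated as a simple strategy lookup: if the current information set lies inside a subtree covered by an expert, use the expert's fine-grained table; otherwise fall back to the base strategy in the coarse abstraction. The function names, string-prefix subtree test, and dictionary representation below are our own illustrative assumptions, not the paper's implementation:

```python
def stitched_strategy(infoset, experts, base_strategy, abstract):
    """Return an action distribution for the given information set.

    experts:       maps a subtree root (here, a betting-sequence prefix)
                   to a fine-grained strategy table for that subtree
    base_strategy: coarse-abstraction strategy covering the full game
    abstract:      maps a real-game infoset to its bucket in either the
                   fine or the coarse abstraction
    """
    for subtree_root, expert_table in experts.items():
        if infoset.startswith(subtree_root):
            # Inside an expert's subtree: prefer the fine abstraction.
            key = abstract(infoset, fine=True)
            if key in expert_table:
                return expert_table[key]
    # Everywhere else: fall back to the coarse base strategy.
    return base_strategy[abstract(infoset, fine=False)]
```

The appeal of static experts in this scheme is that each expert table is computed once, offline, for its subtree, and the dispatch at play time is just a lookup.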
In two-player zero-sum games, a Nash equilibrium strategy is guaranteed to win (or tie) in expectation against any opposing strategy by the minimax theorem. In games with more than two players there can be multiple equilibria with different values to the players, and following one has no performance guarantee; however, it was shown that a Nash equilibrium strategy defeated a variety of agents submitted for a class project in a 3-player imperfect-information game, Kuhn poker. This demonstrates that Nash equilibrium strategies can be successful in practice despite the fact that they do not have a performance guarantee. While a Nash equilibrium can be computed in polynomial time for two-player zero-sum games, it is PPAD-hard to compute for nonzero-sum games and games with 3 or more agents, and it is widely believed that no efficient algorithms exist [8, 9]. Counterfactual regret minimization (CFR) is an iterative self-play procedure that has been proven to converge to a Nash equilibrium in two-player zero-sum games.
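The regret-matching rule at the core of CFR, and the way iterated self-play drives the average strategy toward equilibrium, can be shown in a small self-contained example. Rock-paper-scissors, the full-width updates, and all names below are our own simplifications rather than CFR proper, which traverses a game tree and weights regrets counterfactually:

```python
ACTIONS = 3  # rock, paper, scissors
# Payoff for the acting player: UTILITY[my_move][opp_move].
UTILITY = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def regret_matching(cum_regret):
    """Play each action with probability proportional to its positive
    cumulative regret; play uniformly if no regret is positive."""
    pos = [max(r, 0.0) for r in cum_regret]
    total = sum(pos)
    if total <= 0:
        return [1.0 / ACTIONS] * ACTIONS
    return [p / total for p in pos]

def self_play(iterations):
    """Both players repeatedly update regrets against each other's
    current mixed strategy and accumulate their average strategy."""
    regret = [[0.0] * ACTIONS for _ in range(2)]
    # Small asymmetric perturbation so the dynamics are non-trivial
    # (two exactly uniform players would sit at the fixed point).
    regret[1][0] = 1.0
    strategy_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [regret_matching(regret[p]) for p in range(2)]
        for p in range(2):
            opp = strategies[1 - p]
            # Expected payoff of each action vs the opponent's mix.
            ev = [sum(UTILITY[a][o] * opp[o] for o in range(ACTIONS))
                  for a in range(ACTIONS)]
            node_ev = sum(strategies[p][a] * ev[a] for a in range(ACTIONS))
            for a in range(ACTIONS):
                regret[p][a] += ev[a] - node_ev
                strategy_sum[p][a] += strategies[p][a]
    # The *average* strategy, not the final one, approaches equilibrium.
    return [[s / sum(strategy_sum[p]) for s in strategy_sum[p]]
            for p in range(2)]
```

After enough iterations both players' average strategies approach the uniform Nash equilibrium of rock-paper-scissors, mirroring the convergence guarantee the excerpt describes for two-player zero-sum games.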
In this tutorial, you will learn step-by-step how to implement a poker bot in Python. First, we need an engine in which we can simulate our poker bot. It also has a GUI available which can graphically display a game. Both the engine and the GUI have excellent tutorials on their GitHub pages on how to use them. The choice of engine (and/or GUI) is arbitrary; it can be replaced by any engine (and/or GUI) you like.
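As a rough sketch of what such a bot looks like, here is a minimal player class written against a hypothetical engine interface. The method name `declare_action` and its arguments mirror common Python poker engines, but they are assumptions here; the real callback signature depends on whichever engine you pick:

```python
class SimpleRuleBot:
    """A deliberately naive preflop rule: raise premium hands,
    call anything else if possible, otherwise fold."""

    def declare_action(self, valid_actions, hole_card, round_state):
        """Assumed to be called by the engine when it is our turn.

        valid_actions: list of dicts such as {"action": "fold"}
        hole_card:     two cards as suit+rank strings, e.g. ["SA", "HK"]
        round_state:   engine-specific description of the current hand
        """
        ranks = [card[1] for card in hole_card]
        is_pair = ranks[0] == ranks[1]
        both_high = all(r in "AKQJ" for r in ranks)
        available = [a["action"] for a in valid_actions]
        if (is_pair or both_high) and "raise" in available:
            return "raise"
        if "call" in available:
            return "call"
        return "fold"
```

A real engine would repeatedly call a method like this during simulated hands; swapping the rule body for something learned is where the interesting work begins.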
Poker is considered a good challenge for AI, as it is seen as a combination of mathematical/strategic play and human intuition, especially about the strategies of others. I would consider the game a cross between the two extremes of technical vs. human skill: chess and rock-paper-scissors. In chess, the technically superior player will almost always win; an amateur would lose essentially 100% of their games to the top chess-playing AI. In rock-paper-scissors, if the top AI plays the equilibrium strategy of choosing each option 1/3rd of the time, it will be unbeatable, but by definition also incapable of beating anyone. To see why, let's analyse how it plays against the Bart Simpson strategy of always playing rock: you will play rock 1/3rd of the time, paper 1/3rd, and scissors 1/3rd, meaning you will tie 1/3rd, win 1/3rd, and lose 1/3rd of the time.
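The arithmetic above can be checked directly: the uniform strategy's expected payoff is zero against always-rock, and in fact against any opposing mixture. The payoff matrix and function below are a straightforward encoding, not taken from any particular library:

```python
# Payoff matrix for the row player; rows/cols are rock, paper, scissors.
UTILITY = [
    [0, -1, 1],   # rock  vs rock / paper / scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

def expected_value(my_strategy, opp_strategy):
    """Expected payoff of one mixed strategy against another."""
    return sum(
        my_strategy[a] * opp_strategy[o] * UTILITY[a][o]
        for a in range(3)
        for o in range(3)
    )

uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]  # the "Bart Simpson" strategy

# Wins 1/3rd (+1), loses 1/3rd (-1), ties 1/3rd (0): net zero.
print(expected_value(uniform, always_rock))  # prints 0.0
```

Because every column of the payoff matrix sums to zero, the uniform strategy nets zero against *every* opponent, which is exactly why it is unbeatable yet incapable of winning on average.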