Collaborating Authors


Human strategic decision making in parametrized games Artificial Intelligence

Strong algorithms have been developed for game classes with many elements of complexity. For example, algorithms were recently able to defeat human professional players in 2-player [16, 3] and 6-player no-limit Texas hold'em [4]. These games have imperfect information, sequential actions, and very large state spaces, and the latter involves more than two players (solving multiplayer games is more challenging than solving two-player zero-sum games from a complexity-theoretic perspective). However, these algorithms all require an extremely large amount of computational resources for offline and/or online computation and for optimizing neural network hyperparameters. The algorithms have a further limitation: they use all these resources to solve just one very specific version of the game (e.g., Libratus and DeepStack assumed that all players start the hand with 200 times the big blind, and Pluribus assumed that all players start the hand with 100 times the big blind).

ELO System for Skat and Other Games of Chance Artificial Intelligence

Assessing the skill level of players, in order to predict outcomes and to rank the players over a longer series of games, is of critical importance for tournament play. Despite weaknesses, such as an observed continuous rating inflation caused by a steadily growing player body, the ELO ranking system, named after its creator Arpad Elo, has proven to be a reliable method for calculating the relative skill levels of players in zero-sum games. The evaluation of player strength in trick-taking card games like Skat or Bridge, however, is not obvious. Firstly, these are partially observable games of incomplete information with more than one player, where opponent strength should influence the scoring as it does in existing ELO systems. Secondly, they are games of both skill and chance, so that, besides playing strength, the outcome of a game also depends on the deal. Last but not least, there are internationally established scoring systems to which players are accustomed and with which ELO should align. Based on a tournament scoring system, we propose a new ELO system for Skat to overcome these weaknesses.
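The standard Elo update that the abstract builds on can be sketched in a few lines. This is the classic two-player formula, not the Skat-specific system the paper proposes; the K-factor of 20 is an illustrative assumption.

```python
# Classic Elo rating update: expected score from the logistic model,
# then a linear adjustment proportional to (actual - expected).
def expected_score(rating_a, rating_b):
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a, rating_b, score_a, k=20):
    """Return updated ratings; score_a is 1 (win), 0.5 (draw), 0 (loss)."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - exp_a))
    return new_a, new_b

# Equal-rated players: a win moves the winner up by k/2.
print(elo_update(1500, 1500, 1.0))  # → (1510.0, 1490.0)
```

The difficulty the abstract points to is visible here: the formula assumes a two-player zero-sum result, with no term for the luck of the deal or for multiplayer tournament scoring.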

Facebook Open Sources ReBeL, a New Reinforcement Learning Agent - KDnuggets


I recently started a new newsletter focused on AI education. TheSequence is a no-BS (meaning no hype, no news, etc.) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Poker has been considered by many the core inspiration for the formalization of game theory. John von Neumann was reportedly an avid poker fan and used many analogies from the card game while creating the foundations of game theory.

AI Algorithm From Facebook Can Play Chess & Poker With Equal Ease


In recent news, the research team at Facebook has introduced a general AI bot, ReBeL, that can play both perfect-information games, such as chess, and imperfect-information games, like poker, with equal ease, using reinforcement learning. As the company says, it is a big step towards creating a general AI algorithm that could perform well over a range of games. The researchers believe that this algorithm will have real-world applications, including negotiations, fraud detection, and even cybersecurity. AlphaZero from DeepMind rapidly caught the fancy of the AI research community when it was released back in 2017. An AI-based program that could play games like chess, shogi, and Go is not unheard of, but AlphaZero is different: it uses reinforcement learning with search (RL+Search) to 'learn on its own' and reach the level of world-class players.

Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies in Extensive-Form Zero-Sum Games Artificial Intelligence

We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfect-information zero-sum extensive-form game. Team members are not allowed to communicate during play but can coordinate before the game. In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game. In this paper, we first provide new modeling results about computing such an optimal distribution by drawing a connection to a different literature on extensive-form correlation. Second, we provide an algorithm that computes such an optimal distribution using only profiles in which a single team member randomizes. We can also cap the number of such profiles allowed in the solution; increasing the cap yields an anytime algorithm. We find that often a handful of well-chosen profiles suffices to reach optimal utility for the team. This enables team members to reach coordination through a relatively simple and understandable plan. Finally, inspired by this observation and leveraging theoretical concepts that we introduce, we develop an efficient column-generation algorithm for finding an optimal distribution for the team. We evaluate it on a suite of common benchmark games. It is three orders of magnitude faster than the prior state of the art on games that the latter can solve, and it can also solve several games that were previously unsolvable.
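The ex-ante coordination mechanism described above can be sketched simply: before play begins, the team draws one joint strategy profile from a correlated distribution, and each member then follows their own component without further communication. The profiles and probabilities below are made up purely for illustration; computing the optimal distribution is the hard part the paper addresses.

```python
import random

# Hypothetical correlated distribution over joint strategy profiles:
# each entry pairs a (per-player) strategy assignment with a probability.
profiles = [
    ({"p1": "aggressive", "p2": "passive"}, 0.6),
    ({"p1": "passive", "p2": "aggressive"}, 0.4),
]

def sample_profile(profiles, rng=random):
    """Draw one joint profile according to its probability (inverse CDF)."""
    r = rng.random()
    acc = 0.0
    for profile, prob in profiles:
        acc += prob
        if r < acc:
            return profile
    return profiles[-1][0]  # guard against floating-point round-off

# The single shared draw is what correlates the two players' behavior.
chosen = sample_profile(profiles)
print("p1 plays:", chosen["p1"], "| p2 plays:", chosen["p2"])
```

Note that each profile here lets only one stylized "randomizing" role vary per draw, mirroring the paper's observation that profiles in which a single member randomizes can suffice.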

Interpreting the Scope of AI job Market in the US in current times.


We are living in a time where everything is digital. Disruptive technologies like artificial intelligence (AI) have become central to this transformation. From retail to fintech and cybersecurity to predictive analytics, tech pundits avow that AI now plays an essential role in the future of these industries and disciplines. However, though some alarmists argue that AI is stealing jobs through automation and robotics, it has been observed that, on the contrary, AI is also adding new job roles to the existing employment pool every day. Researchers have tracked new job roles, occupations, and emerging industries in the AI landscape that can help us understand the job market better.

RLCFR: Minimize Counterfactual Regret by Deep Reinforcement Learning Machine Learning

Counterfactual regret minimization (CFR) is a popular method for decision-making problems in two-player zero-sum games with imperfect information. Unlike existing studies, which mostly aim at solving larger-scale problems or accelerating solution efficiency, we propose a framework, RLCFR, which aims at improving the generalization ability of the CFR method. In RLCFR, the game strategy is solved by CFR within a reinforcement learning framework, and the dynamic procedure of iterative strategy updating is modeled as a Markov decision process (MDP). Our method then learns a policy that selects the appropriate regret-updating rule at each step of the iteration. In addition, a stepwise reward function, proportional to how good the iteration strategy is at each step, is formulated to learn the action policy. Extensive experimental results on various games show that the generalization ability of our method is significantly improved compared with existing state-of-the-art methods.
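The per-information-set update at the heart of CFR is regret matching: cumulative regrets are clipped at zero and normalized into a strategy. The sketch below shows just that core step, not the RLCFR framework itself, which learns which update rule to apply.

```python
# Regret matching: play each action in proportion to its positive
# cumulative regret; fall back to uniform when no regret is positive.
def regret_matching(cumulative_regrets):
    """Map a list of cumulative regrets to a probability distribution."""
    positives = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positives)
    n = len(cumulative_regrets)
    if total <= 0:
        return [1.0 / n] * n  # uniform fallback
    return [p / total for p in positives]

# Regret of 3 for action 0, negative for action 1, 1 for action 2.
print(regret_matching([3.0, -1.0, 1.0]))  # → [0.75, 0.0, 0.25]
```

In full CFR this update runs at every information set on every iteration, and the time-averaged strategy converges to a Nash equilibrium in two-player zero-sum games; RLCFR treats the choice among such update variants as the action space of an MDP.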

The Changing Face of AI Research


However, it is by no means clear yet whether this will prove a game-changer in the world ahead. Computer programmers have long worked to find the right, relevant patterns in data so that their systems become extremely good at multiplayer games. A whitepaper published by researchers at Facebook and Carnegie Mellon University said their software is good at embracing randomness and can reliably beat humans at games. Artificial intelligence is heralded as a solution to the complex problems faced by many industries and organizations. The prime concern for businesses today is how to gain better insights by harnessing big data.

These online courses teach you how to win at online poker


TL;DR: The Ultimate Poker Pro Blueprint Mastery Bundle is on sale for £16.08 as of August 14, saving you 99% on the list price. Playing poker online is a totally different game than playing in real life. You aren't playing other people so much as you are playing the algorithm. Therefore, it requires a touch less skill and a touch more pattern recognition and smarts. In the Ultimate Poker Pro Blueprint Mastery Bundle, you'll learn exactly what it takes to win money playing poker online.

The Deck Is Not Rigged: Poker and the Limits of AI


Tuomas Sandholm, a computer scientist at Carnegie Mellon University, is not a poker player--or much of a poker fan, in fact--but he is fascinated by the game for much the same reason as the great game theorist John von Neumann before him. Von Neumann, who died in 1957, viewed poker as the perfect model for human decision making, for finding the balance between skill and chance that accompanies our every choice. He saw poker as the ultimate strategic challenge, combining as it does not just the mathematical elements of a game like chess but the uniquely human, psychological angles that are more difficult to model precisely--a view shared years later by Sandholm in his research with artificial intelligence. "Poker is the main benchmark and challenge program for games of imperfect information," Sandholm told me on a warm spring afternoon in 2018, when we met in his offices in Pittsburgh. The game, it turns out, has become the gold standard for developing artificial intelligence.