 counterfactual regret minimization



Monopoly Deal: A Benchmark Environment for Bounded One-Sided Response Games

arXiv.org Artificial Intelligence

Card games are widely used to study sequential decision-making under uncertainty, with real-world analogues in negotiation, finance, and cybersecurity. These games typically fall into three categories based on the flow of control: strictly sequential (players alternate single actions), deterministic response (some actions trigger a fixed outcome), and unbounded reciprocal response (alternating counterplays are permitted). A less-explored but strategically rich structure is the bounded one-sided response, where a player's action briefly transfers control to the opponent, who must satisfy a fixed condition through one or more moves before the turn resolves. We term games featuring this mechanism Bounded One-Sided Response Games (BORGs). We introduce a modified version of Monopoly Deal as a benchmark environment that isolates this dynamic, where a Rent action forces the opponent to choose payment assets. The gold-standard algorithm, Counterfactual Regret Minimization (CFR), converges on effective strategies without novel algorithmic extensions. A lightweight full-stack research platform unifies the environment, a parallelized CFR runtime, and a human-playable web interface. The trained CFR agent and source code are available at https://monopolydeal.ai.


Analysis of Bluffing by DQN and CFR in Leduc Hold'em Poker

arXiv.org Artificial Intelligence

In the game of poker, being unpredictable, or bluffing, is an essential skill. When humans play poker, they bluff. However, most works on computer-poker focus on performance metrics such as win rates, while bluffing is overlooked. In this paper we study whether two popular algorithms, DQN (based on reinforcement learning) and CFR (based on game theory), exhibit bluffing behavior in Leduc Hold'em, a simplified version of poker. We designed an experiment where we let the DQN and CFR agent play against each other while we log their actions. We find that both DQN and CFR exhibit bluffing behavior, but they do so in different ways. Although both attempt to perform bluffs at different rates, the percentage of successful bluffs (where the opponent folds) is roughly the same. This suggests that bluffing is an essential aspect of the game, not of the algorithm. Future work should look at different bluffing styles and at the full game of poker.


Robust Deep Monte Carlo Counterfactual Regret Minimization: Addressing Theoretical Risks in Neural Fictitious Self-Play

arXiv.org Machine Learning

Monte Carlo Counterfactual Regret Minimization (MCCFR) has emerged as a cornerstone algorithm for solving extensive-form games, but its integration with deep neural networks introduces scale-dependent challenges that manifest differently across game complexities. This paper presents a comprehensive analysis of how neural MCCFR component effectiveness varies with game scale and proposes an adaptive framework for selective component deployment. We identify that theoretical risks such as nonstationary target distribution shifts, action support collapse, variance explosion, and warm-starting bias have scale-dependent manifestation patterns, requiring different mitigation strategies for small versus large games. Our proposed Robust Deep MCCFR framework incorporates target networks with delayed updates, uniform exploration mixing, variance-aware training objectives, and comprehensive diagnostic monitoring. Through systematic ablation studies on Kuhn and Leduc Poker, we demonstrate scale-dependent component effectiveness and identify critical component interactions. The best configuration achieves final exploitability of 0.0628 on Kuhn Poker, representing a 60% improvement over the classical framework (0.156). On the more complex Leduc Poker domain, selective component usage achieves exploitability of 0.2386, a 23.5% improvement over the classical framework (0.3703), highlighting the importance of careful component selection over comprehensive mitigation. Our contributions include: (1) a formal theoretical analysis of risks in neural MCCFR, (2) a principled mitigation framework with convergence guarantees, (3) comprehensive multi-scale experimental validation revealing scale-dependent component interactions, and (4) practical guidelines for deployment in larger games.
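The abstract names uniform exploration mixing as one mitigation for action support collapse but does not spell out its form. A minimal sketch of one common reading follows, assuming the mixed policy is a convex combination of the learned policy and the uniform policy. The function name `mix_with_uniform` and the parameter `epsilon` are hypothetical and are not taken from the paper.

```python
def mix_with_uniform(policy, epsilon):
    """Blend a policy with the uniform policy.

    Assumption (not from the paper): the mixture is
    (1 - epsilon) * policy + epsilon * uniform, so every action keeps
    probability of at least epsilon / len(policy).
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    n = len(policy)
    return [(1.0 - epsilon) * p + epsilon / n for p in policy]


# A collapsed policy that puts all mass on action 0.
collapsed = [1.0, 0.0, 0.0, 0.0]
mixed = mix_with_uniform(collapsed, epsilon=0.2)
```

Under this assumption, no action's probability can fall to zero, which is the property that guards against support collapse.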


Monte Carlo Sampling for Regret Minimization in Extensive Games

Neural Information Processing Systems

Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect-information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent, sample-based CFR algorithms called Monte Carlo counterfactual regret minimization (MCCFR), of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR in expectation.
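The claim that sampled updates match full updates in expectation rests on importance weighting. The sketch below illustrates that idea on a single decision point and is not the paper's algorithm. It assumes one action is sampled with known probability and that its value is divided by that probability, while unsampled actions get an estimate of zero. All names and numbers are hypothetical.

```python
from fractions import Fraction


def sampled_estimate(values, sample_probs, sampled_action):
    """Importance-weighted estimate: v[a] / q[a] for the sampled action, 0 elsewhere."""
    return [
        values[a] / sample_probs[a] if a == sampled_action else Fraction(0)
        for a in range(len(values))
    ]


def expected_estimate(values, sample_probs):
    """Exact expectation of the estimate, enumerating every possible sample."""
    n = len(values)
    expectation = [Fraction(0)] * n
    for sampled_action in range(n):
        estimate = sampled_estimate(values, sample_probs, sampled_action)
        for a in range(n):
            expectation[a] += sample_probs[sampled_action] * estimate[a]
    return expectation


# Hypothetical per-action values and sampling distribution.
values = [Fraction(3), Fraction(-1), Fraction(2)]
sample_probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

# Averaged over the sampling distribution, the estimate recovers the true values.
recovered = expected_estimate(values, sample_probs)
```

Exact fractions are used so the equality holds without floating-point tolerance. The estimate is unbiased only when every action has nonzero sampling probability.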


LiteEFG: An Efficient Python Library for Solving Extensive-form Games

arXiv.org Artificial Intelligence

LiteEFG is an efficient library with easy-to-use Python bindings, which can solve multiplayer extensive-form games (EFGs). LiteEFG enables the user to express computation graphs in Python to define updates on the game tree structure. The graph is then executed by the C++ backend, leading to significant speedups compared to running the algorithm in Python. Moreover, in LiteEFG, the user needs to only specify the computation graph of the update rule in a decision node of the game, and LiteEFG will automatically distribute the update rule to each decision node and handle the structure of the imperfect-information game.


Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

Neural Information Processing Systems

Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing strategies in extensive-form games. The Monte Carlo CFR (MCCFR) variants reduce the per-iteration time cost of CFR by traversing a smaller, sampled portion of the tree. The most effective previous instances of MCCFR can still be very slow in games with many player actions, since they sample every action for a given player. In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player's actions according to the player's average strategy. Our new algorithm is inspired by a new, tighter bound on the number of iterations required by CFR to converge to a given solution quality. In addition, we prove a similar, tighter bound for AS and other popular MCCFR variants.
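The abstract does not give the sampling rule. As a rough sketch of sampling a subset of actions according to the average strategy, the code below assumes each action is included independently, with a probability that grows with its cumulative average-strategy weight and never drops below a floor. The parameters `eps`, `beta`, and `tau` and the exact formula are assumptions made for illustration and should be checked against the paper.

```python
import random


def inclusion_probs(avg_weights, eps=0.05, beta=0.0, tau=1.0):
    """Per-action inclusion probability (assumed form, see lead-in).

    avg_weights holds the cumulative average-strategy weight of each action.
    eps is a floor so that every action can still be explored.
    """
    denom = beta + sum(avg_weights)
    if denom == 0:
        # No information yet: include every action.
        return [1.0] * len(avg_weights)
    return [min(1.0, max(eps, (beta + tau * w) / denom)) for w in avg_weights]


def sample_actions(avg_weights, rng, **params):
    """Return the indices of the actions chosen for traversal on this iteration."""
    probs = inclusion_probs(avg_weights, **params)
    return [a for a, p in enumerate(probs) if rng.random() < p]


rng = random.Random(0)
probs = inclusion_probs([8.0, 2.0, 0.0])
subset = sample_actions([8.0, 2.0, 0.0], rng)
```

Actions that the average strategy rarely plays are traversed rarely, which is where the savings in games with many actions would come from. Nothing in this sketch prevents an empty subset, so a real implementation would need to handle that case.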


Pure Monte Carlo Counterfactual Regret Minimization

arXiv.org Artificial Intelligence

Counterfactual Regret Minimization (CFR) and its variants are the best algorithms so far for solving large-scale incomplete information games. However, we believe that there are two problems with CFR. First, matrix multiplication is required in each CFR iteration, so the time complexity of one iteration is too high. Second, real-world games differ in their characteristics, so a single CFR algorithm will not suit every game. To address these two problems, this paper proposes a new algorithm called Pure CFR (PCFR) based on CFR. PCFR can be seen as a combination of CFR and Fictitious Play (FP), inheriting the concept of counterfactual regret (value) from CFR, and using the best response strategy instead of the regret matching strategy for the next iteration. This algorithm has three advantages. First, PCFR can be combined with any CFR variant. The resulting Pure MCCFR (PMCCFR) can significantly reduce the time and space complexity of one iteration. Second, our experiments show that the convergence speed of PMCCFR is 2 to 3 times that of MCCFR. Finally, there is a type of game that is very suitable for PCFR. We call this type of game a clear-game, which is characterized by a high proportion of dominated strategies. Experiments show that in clear-games, the convergence rate of PMCCFR is two orders of magnitude higher than that of MCCFR.
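The contrast between a regret matching strategy and a best response strategy can be sketched at a single information set. Regret matching below is the standard rule. The pure variant is only an assumption about how a best response could be chosen (argmax over cumulative regrets, ties broken by lowest index), since the abstract does not give PCFR's exact update. Function names are hypothetical.

```python
def regret_matching(cum_regrets):
    """Standard regret matching: play in proportion to positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret: fall back to the uniform strategy.
    return [1.0 / len(cum_regrets)] * len(cum_regrets)


def pure_best_response(cum_regrets):
    """Assumed pure rule: all probability on the highest-regret action."""
    best = max(range(len(cum_regrets)), key=lambda a: cum_regrets[a])
    return [1.0 if a == best else 0.0 for a in range(len(cum_regrets))]


cum_regrets = [1.0, 3.0, -2.0]
mixed = regret_matching(cum_regrets)      # [0.25, 0.75, 0.0]
pure = pure_best_response(cum_regrets)    # [0.0, 1.0, 0.0]
```

A pure strategy is fully described by one action index per information set, which is consistent with the abstract's claim of lower time and space cost per iteration.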


Hierarchical Deep Counterfactual Regret Minimization

arXiv.org Artificial Intelligence

Imperfect Information Games (IIGs) offer robust models for scenarios where decision-makers face uncertainty or lack complete information. Counterfactual Regret Minimization (CFR) has been one of the most successful families of algorithms for tackling IIGs. The integration of skill-based strategy learning with CFR could potentially mirror a more human-like decision-making process and enhance the learning performance for complex IIGs. It enables the learning of a hierarchical strategy, wherein low-level components represent skills for solving subgames and the high-level component manages the transition between skills. In this paper, we introduce the first hierarchical version of Deep CFR (HDCFR), an innovative method that boosts learning efficiency in tasks involving extensively large state spaces and deep game trees. A notable advantage of HDCFR over previous works is its ability to facilitate learning with predefined (human) expertise and foster the acquisition of skills that can be transferred to similar tasks. To achieve this, we initially construct our algorithm in a tabular setting, encompassing hierarchical CFR updating rules and a variance-reduced Monte Carlo sampling extension. Notably, we offer theoretical justifications, including the convergence rate of the proposed updating rule, the unbiasedness of the Monte Carlo regret estimator, and ideal criteria for effective variance reduction. Then, we employ neural networks as function approximators and develop deep learning objectives to adapt our proposed algorithms for large-scale tasks, while maintaining the theoretical support.


CFR-p: Counterfactual Regret Minimization with Hierarchical Policy Abstraction, and its Application to Two-player Mahjong

arXiv.org Artificial Intelligence

Counterfactual Regret Minimization (CFR) has shown its success in Texas Hold'em poker. We apply this algorithm to another popular incomplete information game, Mahjong. Compared to poker, Mahjong is much more complex and has many variants. We study two-player Mahjong by conducting game-theoretical analysis and making a hierarchical abstraction to CFR based on winning policies. This framework can be generalized to other imperfect information games.