Monte Carlo Sampling for Regret Minimization in Extensive Games
Lanctot, Marc, Waugh, Kevin, Zinkevich, Martin, Bowling, Michael
–Neural Information Processing Systems
Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR) of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR on expectation. Then, we introduce two sampling schemes: {\it outcome sampling} and {\it external sampling}, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new tighter bound on the regret for the original CFR algorithm and relate this new bound to MCCFRs bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games.
Neural Information Processing Systems
Dec-31-2009
- Country:
- North America
- Canada > Alberta (0.29)
- United States > Pennsylvania
- Allegheny County > Pittsburgh (0.14)
- North America
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Leisure & Entertainment > Games (1.00)
- Technology: