Monte Carlo Sampling for Regret Minimization in Extensive Games
Lanctot, Marc, Waugh, Kevin, Zinkevich, Martin, Bowling, Michael
Neural Information Processing Systems
Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect-information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent, sample-based CFR algorithms called Monte Carlo counterfactual regret minimization (MCCFR), of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR in expectation.
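The "same updates in expectation" claim can be illustrated with a toy importance-sampling sketch. This is not the paper's algorithm: the function names, the three-outcome game, and the numbers below are hypothetical. The sketch only assumes that a full update sums a weighted utility over all terminal outcomes, and that a sampled update visits one outcome drawn with probability `q[z] > 0` and divides by that probability.

```python
import random

def full_value(weights, utilities):
    """Exact value: sum the weighted utility over every terminal outcome."""
    return sum(w * u for w, u in zip(weights, utilities))

def sampled_value(weights, utilities, q, rng):
    """One-sample estimate: draw outcome z with probability q[z],
    then reweight its contribution by 1 / q[z]."""
    z = rng.choices(range(len(q)), weights=q, k=1)[0]
    return weights[z] * utilities[z] / q[z]

def expected_sampled_value(weights, utilities, q):
    """Expectation of sampled_value, computed by enumerating outcomes."""
    return sum(q[z] * (weights[z] * utilities[z] / q[z]) for z in range(len(q)))

# Hypothetical toy numbers (assumptions, not taken from the paper).
weights = [0.5, 0.3, 0.2]      # contribution weight of each outcome
utilities = [1.0, -2.0, 4.0]   # utility at each outcome
q = [0.2, 0.5, 0.3]            # sampling distribution, all entries > 0

if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000
    mc_mean = sum(sampled_value(weights, utilities, q, rng) for _ in range(n)) / n
    print("full value:         ", full_value(weights, utilities))
    print("expected estimate:  ", expected_sampled_value(weights, utilities, q))
    print("Monte Carlo average:", mc_mean)
```

The `q[z]` factors cancel inside the expectation, so the estimator is unbiased for any sampling distribution that gives every outcome positive probability. The choice of `q` affects only the variance of individual samples.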
Feb-15-2020, 02:28:18 GMT