Zinkevich, Martin
The Annual Computer Poker Competition
Bard, Nolan (University of Alberta) | Hawkin, John (Verafin) | Rubin, Jonathan (PARC) | Zinkevich, Martin (Google)
Now entering its eighth year, the Annual Computer Poker Competition (ACPC) is the premier event within the field of computer poker. With both academic and nonacademic competitors from around the world, the competition provides an open and international venue for benchmarking computer poker agents. We describe the competition's origins and evolution, current events, and winning techniques.
Parallelized Stochastic Gradient Descent
Zinkevich, Martin, Weimer, Markus, Li, Lihong, Smola, Alex J.
With the increase in available data, parallel machine learning has become an increasingly pressing problem. In this paper we present the first parallel stochastic gradient descent algorithm, including a detailed analysis and experimental evidence. Unlike prior work on parallel optimization algorithms, our variant comes with parallel acceleration guarantees and poses no overly tight latency constraints, which might only be satisfiable in the multicore setting. Our analysis introduces a novel proof technique, contractive mappings, to quantify the speed of convergence of parameter distributions to their asymptotic limits. As a side effect, this answers the question of how quickly stochastic gradient descent algorithms reach the asymptotically normal regime.
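A minimal sketch of the parameter-averaging scheme the abstract describes, in Python with NumPy on a synthetic least-squares problem: each simulated worker runs plain SGD over its own shard of the data, and the resulting weight vectors are averaged once at the end. All function names, data, and step sizes below are illustrative, not taken from the paper.

    import numpy as np

    def sgd_shard(X, y, eta, seed):
        # plain SGD for least squares on one worker's shard
        rng = np.random.default_rng(seed)
        w = np.zeros(X.shape[1])
        for i in rng.permutation(len(y)):
            w -= eta * (X[i] @ w - y[i]) * X[i]   # gradient of 0.5*(x.w - y)^2
        return w

    def simu_parallel_sgd(X, y, k, eta):
        # split the data across k simulated workers, run SGD independently,
        # then average the resulting parameter vectors once at the end
        shards = np.array_split(np.arange(len(y)), k)
        ws = [sgd_shard(X[idx], y[idx], eta, seed=s) for s, idx in enumerate(shards)]
        return np.mean(ws, axis=0)

    # toy usage on synthetic data; the average recovers w_true = (0, 1, 2, 3, 4)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4000, 5))
    y = X @ np.arange(5.0) + 0.1 * rng.normal(size=4000)
    print(simu_parallel_sgd(X, y, k=4, eta=0.01))

The single round of communication at the end is what keeps the latency requirements loose: workers never exchange gradients during training.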
Slow Learners are Fast
Zinkevich, Martin, Langford, John, Smola, Alex J.
Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design, which prevents them from taking advantage of modern multi-core architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online learning.
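A minimal simulation of the delayed-update model, assuming a least-squares loss: the gradient applied at step t was computed from parameters that are tau steps stale, exactly as if tau workers were computing gradients asynchronously against a shared parameter vector. Names and constants are illustrative.

    from collections import deque
    import numpy as np

    def delayed_sgd(X, y, eta, tau):
        # SGD where each applied gradient is tau steps stale,
        # mimicking tau asynchronous workers sharing one parameter vector
        w = np.zeros(X.shape[1])
        pending = deque()                    # gradients awaiting application
        for t in range(len(y)):
            pending.append((X[t] @ w - y[t]) * X[t])   # gradient at current w
            if len(pending) > tau:
                w -= eta * pending.popleft() # apply the tau-step-old gradient
        while pending:                       # drain the remaining updates
            w -= eta * pending.popleft()
        return w

    # staleness degrades convergence gracefully, which is the paper's point
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 5))
    y = X @ np.arange(5.0) + 0.1 * rng.normal(size=5000)
    print(delayed_sgd(X, y, eta=0.01, tau=8))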
Monte Carlo Sampling for Regret Minimization in Extensive Games
Lanctot, Marc, Waugh, Kevin, Zinkevich, Martin, Bowling, Michael
Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent sample-based CFR algorithms called Monte Carlo counterfactual regret minimization (MCCFR), of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR in expectation. Then, we introduce two sampling schemes, outcome sampling and external sampling, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new, tighter bound on the regret of the original CFR algorithm and relate this new bound to MCCFR's bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games.
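As a rough illustration of outcome sampling in the simplest possible setting, the sketch below runs importance-weighted regret matching in self-play on a one-shot zero-sum matrix game. The epsilon-mixed sampling policy and the opponent-reach correction mirror the general scheme, but the real algorithm applies these updates at every information set of an extensive game; everything here is an illustrative toy.

    import numpy as np

    def regret_matching(regret):
        # current strategy proportional to positive cumulative regret
        pos = np.maximum(regret, 0.0)
        return pos / pos.sum() if pos.sum() > 0 else np.full(len(regret), 1.0 / len(regret))

    def outcome_sampling_selfplay(payoff, iters=100_000, eps=0.6, seed=0):
        # one joint action sampled per iteration; dividing by the sampling
        # probability q keeps the estimated regrets unbiased
        rng = np.random.default_rng(seed)
        n, m = payoff.shape
        regret = [np.zeros(n), np.zeros(m)]
        avg = [np.zeros(n), np.zeros(m)]
        for _ in range(iters):
            sig = [regret_matching(r) for r in regret]
            q = [eps / n + (1 - eps) * sig[0], eps / m + (1 - eps) * sig[1]]
            a, b = rng.choice(n, p=q[0]), rng.choice(m, p=q[1])
            u = [payoff[a, b], -payoff[a, b]]          # zero-sum utilities
            for p, o, act, oact in [(0, 1, a, b), (1, 0, b, a)]:
                w = sig[o][oact] / q[o][oact]          # opponent reach correction
                v_act = u[p] * w / q[p][act]           # sampled counterfactual value
                tilde = np.zeros(len(regret[p]))
                tilde[act] = v_act
                regret[p] += tilde - sig[p][act] * v_act
                avg[p] += sig[p]                       # reach weights are 1 here
        return [s / s.sum() for s in avg]

    # rock-paper-scissors: both average strategies approach (1/3, 1/3, 1/3)
    rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
    print([s.round(3) for s in outcome_sampling_selfplay(rps)])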
Computing Robust Counter-Strategies
Johanson, Michael, Zinkevich, Martin, Bowling, Michael
Adaptation to other initially unknown agents often requires computing an effective counter-strategy. In the Bayesian paradigm, one must find a good counter-strategy to the inferred posterior of the other agents' behavior. In the experts paradigm, one may want to choose experts that are good counter-strategies to the other agents' expected behavior. In this paper we introduce a technique for computing robust counter-strategies for adaptation in multiagent scenarios under a variety of paradigms. The strategies can take advantage of a suspected tendency in the decisions of the other agents, while bounding the worst-case performance when the tendency is not observed. The technique involves solving a modified game, and therefore can make use of recently developed algorithms for solving very large extensive games. We demonstrate the effectiveness of the technique in two-player Texas Hold'em. We show that the computed poker strategies are substantially more robust than best response counter-strategies, while still exploiting a suspected tendency. We also compose the generated strategies in an experts algorithm, showing a dramatic improvement in performance over using simple best responses.
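A matrix-game sketch of the modified game, assuming SciPy is available: with probability p the column player is forced to play a fixed model, and with probability 1-p they play adversarially, so the robust counter-strategy is the solution of a single linear program. The paper works in extensive form; this toy version only conveys the exploitation-versus-robustness trade-off, and all names below are illustrative.

    import numpy as np
    from scipy.optimize import linprog

    def restricted_nash_response(A, y_fix, p):
        # maximize  p * x^T A y_fix + (1 - p) * min_j (x^T A)_j
        # A[i, j] is the row player's payoff; y_fix is the opponent model
        n, m = A.shape
        c_fix = A @ y_fix                              # payoff vs. the fixed model
        c = np.concatenate([-p * c_fix, [-(1 - p)]])   # linprog minimizes
        A_ub = np.hstack([-A.T, np.ones((m, 1))])      # v <= (x^T A)_j for all j
        A_eq = np.concatenate([np.ones(n), [0.0]])[None]   # x sums to 1
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m), A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * n + [(None, None)])
        return res.x[:n]

    # rock-paper-scissors against a model that over-plays rock:
    # p=0 recovers the equilibrium, p=1 the (brittle) best response
    rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
    model = np.array([0.6, 0.2, 0.2])
    for p in (0.0, 0.5, 1.0):
        print(p, restricted_nash_response(rps, model, p).round(3))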
Regret Minimization in Games with Incomplete Information
Zinkevich, Martin, Johanson, Michael, Bowling, Michael, Piccione, Carmelo
Extensive games are a powerful model of multiagent decision-making scenarios with incomplete information. Finding a Nash equilibrium for very large instances of these games has received a great deal of recent attention. In this paper, we describe a new technique for solving large games based on regret minimization. In particular, we introduce the notion of counterfactual regret, which exploits the degree of incomplete information in an extensive game. We show how minimizing counterfactual regret minimizes overall regret, and therefore in self-play can be used to compute a Nash equilibrium. We demonstrate this technique in the domain of poker, showing we can solve abstractions of limit Texas Hold'em with as many as 10^12 states, two orders of magnitude larger than previous methods.
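The core update is easy to exhibit in a one-shot game. The sketch below runs regret matching in self-play on a zero-sum matrix game; CFR applies this same update independently at every information set, substituting counterfactual values for the exact expected values used here. The example game is illustrative.

    import numpy as np

    def regret_matching(regret):
        pos = np.maximum(regret, 0.0)
        return pos / pos.sum() if pos.sum() > 0 else np.full(len(regret), 1.0 / len(regret))

    def regret_matching_selfplay(payoff, iters=10_000):
        # payoff[i, j] is the row player's utility in a zero-sum game
        n, m = payoff.shape
        r1, r2 = np.zeros(n), np.zeros(m)
        s1, s2 = np.zeros(n), np.zeros(m)
        for _ in range(iters):
            x, y = regret_matching(r1), regret_matching(r2)
            u1 = payoff @ y          # expected payoff of each row action
            u2 = -(x @ payoff)       # expected payoff of each column action
            r1 += u1 - x @ u1        # regret relative to the current strategy
            r2 += u2 - y @ u2
            s1, s2 = s1 + x, s2 + y
        return s1 / s1.sum(), s2 / s2.sum()

    # the *average* strategies converge to a Nash equilibrium,
    # here (3/7, 4/7) for the row player and (2/7, 5/7) for the column player
    game = np.array([[3., -1.], [-2., 1.]])
    print(regret_matching_selfplay(game))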
iLSTD: Eligibility Traces and Convergence Analysis
Geramifard, Alborz, Bowling, Michael, Zinkevich, Martin, Sutton, Richard S.
In this paper, we generalize the previous iLSTD algorithm and present three new results: (1) the first convergence proof for an iLSTD algorithm; (2) an extension to incorporate eligibility traces without changing the asymptotic computational complexity; and (3) the first empirical results with an iLSTD algorithm on a problem (mountain car) with feature vectors large enough (n ≈ 10,000) to show substantial computational advantages over LSTD.
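A sketch of a single iLSTD(lambda) transition, assuming linear value-function approximation with NumPy. Variable names follow common LSTD conventions (statistics A and b, eligibility trace z) rather than the paper's exact notation; the dimension-selection rule, updating only the m coordinates with the largest residual, is the source of the computational advantage over a full LSTD solve.

    import numpy as np

    def ilstd_step(A, b, theta, mu, z, phi, phi_next, r, gamma, lam, alpha, m):
        # one transition (phi, r, phi_next) of iLSTD(lambda)
        z = gamma * lam * z + phi                 # eligibility trace
        d = np.outer(z, phi - gamma * phi_next)   # rank-one update to A
        A += d
        b += r * z
        mu += r * z - d @ theta                   # maintain mu = b - A @ theta
        for i in np.argsort(-np.abs(mu))[:m]:     # the m most urgent dimensions
            step = alpha * mu[i]
            theta[i] += step
            mu -= step * A[:, i]                  # keep mu consistent with theta
        return A, b, theta, mu, z

    # toy usage with random features, just to exercise the update
    n = 10
    A, b = np.zeros((n, n)), np.zeros(n)
    theta, mu, z = np.zeros(n), np.zeros(n), np.zeros(n)
    rng = np.random.default_rng(0)
    phi = rng.random(n)
    for _ in range(100):
        phi_next = rng.random(n)
        A, b, theta, mu, z = ilstd_step(A, b, theta, mu, z, phi, phi_next,
                                        r=rng.normal(), gamma=0.9, lam=0.8,
                                        alpha=0.01, m=2)
        phi = phi_next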
Cyclic Equilibria in Markov Games
Zinkevich, Martin, Greenwald, Amy, Littman, Michael L.
Although variants of value iteration have been proposed for finding Nash or correlated equilibria in general-sum Markov games, these variants have not been shown to be effective in general. In this paper, we demonstrate by construction that existing variants of value iteration cannot find stationary equilibrium policies in arbitrary general-sum Markov games. Instead, we propose an alternative interpretation of the output of value iteration based on a new (non-stationary) equilibrium concept that we call "cyclic equilibria." We prove that value iteration identifies cyclic equilibria in a class of games in which it fails to find stationary equilibria. We also demonstrate empirically that value iteration finds cyclic equilibria in nearly all examples drawn from a random distribution of Markov games.
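A sketch of one member of the family the paper studies: value iteration for a two-player general-sum Markov game in which every stage game is solved for a utilitarian correlated equilibrium (a linear program), with the twist that successive value vectors are recorded and the first repeat is reported. A repeat at distance one is a stationary fixed point; a longer repeat is a cyclic equilibrium. SciPy's linprog does the stage-game solves, and the random Markov game at the end is purely illustrative.

    import numpy as np
    from scipy.optimize import linprog

    def utilitarian_ce(A, B):
        # utilitarian correlated equilibrium of a bimatrix stage game via LP
        n, m = A.shape
        rows = []
        for i in range(n):                 # player 1: no gain deviating i -> i2
            for i2 in range(n):
                if i2 != i:
                    C = np.zeros((n, m)); C[i, :] = A[i2, :] - A[i, :]
                    rows.append(C.ravel())
        for j in range(m):                 # player 2: no gain deviating j -> j2
            for j2 in range(m):
                if j2 != j:
                    C = np.zeros((n, m)); C[:, j] = B[:, j2] - B[:, j]
                    rows.append(C.ravel())
        res = linprog(-(A + B).ravel(), A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                      A_eq=np.ones((1, n * m)), b_eq=[1.0], bounds=(0, None))
        p = res.x.reshape(n, m)
        return (A * p).sum(), (B * p).sum()

    def ce_value_iteration(R1, R2, P, gamma, max_iter=500):
        # R1, R2: (S, n, m) rewards; P: (S, n, m, S) transition probabilities
        S = R1.shape[0]
        V1, V2 = np.zeros(S), np.zeros(S)
        seen = {}
        for k in range(max_iter):
            key = (np.round(V1, 4).tobytes(), np.round(V2, 4).tobytes())
            if key in seen:
                return k - seen[key]       # 1 = stationary, >1 = cyclic equilibrium
            seen[key] = k
            Q1 = R1 + gamma * (P @ V1)     # (S, n, m) continuation values
            Q2 = R2 + gamma * (P @ V2)
            for s in range(S):
                V1[s], V2[s] = utilitarian_ce(Q1[s], Q2[s])
        return None                        # no repeat found within max_iter

    rng = np.random.default_rng(1)
    S, n, m = 3, 2, 2
    R1, R2 = rng.normal(size=(S, n, m)), rng.normal(size=(S, n, m))
    P = rng.random((S, n, m, S)); P /= P.sum(axis=-1, keepdims=True)
    print(ce_value_iteration(R1, R2, P, gamma=0.8))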