Industry
Tree-Based Solution Methods for Multiagent POMDPs with Delayed Communication
Oliehoek, Frans Adriaan (Maastricht University) | Spaan, Matthijs T. J. (Delft University of Technology)
Multiagent Partially Observable Markov Decision Processes (MPOMDPs) provide a powerful framework for optimal decision making under the assumption of instantaneous communication. We focus on a delayed communication setting (MPOMDP-DC), in which broadcasted information is delayed by at most one time step. This model allows agents to act on their most recent (private) observation. Such an assumption is a strict generalization over having agents wait until the global information is available and is more appropriate for applications in which response time is critical. In this setting, however, value function backups are significantly more costly, and naive application of incremental pruning, the core of many state-of-the-art optimal POMDP techniques, is intractable. In this paper, we overcome this problem by demonstrating that computation of the MPOMDP-DC backup can be structured as a tree and introducing two novel tree-based pruning techniques that exploit this structure in an effective way. We experimentally show that these methods have the potential to outperform naive incremental pruning by orders of magnitude, allowing for the solution of larger problems.
Characterizing Multi-Agent Team Behavior from Partial Team Tracings: Evidence from the English Premier League
Lucey, Patrick (Disney Research Pittsburgh) | Bialkowski, Alina (Queensland University of Technology and Disney Research Pittsburgh) | Carr, Peter (Disney Research Pittsburgh) | Foote, Eric (Disney Research Pittsburgh) | Matthews, Iain (Disney Research Pittsburgh)
Real-world AI systems have been recently deployed which can automatically analyze the plan and tactics of tennis players. As the game-state is updated regularly at short intervals (i.e. point-level), a library of successful and unsuccessful plans of a player can be learnt over time. Given the relative strengths and weaknesses of a player’s plans, a set of proven plans or tactics from the library that characterize a player can be identified. For low-scoring, continuous team sports like soccer, such analysis for multi-agent teams does not exist as the game is not segmented into “discretized” plays (i.e. plans), making it difficult to obtain a library that characterizes a team’s behavior. Additionally, as player tracking data is costly and difficult to obtain, we only have partial team tracings in the form of ball actions which makes this problem even more difficult. In this paper, we propose a method to overcome these issues by representing team behavior via play-segments, which are spatio-temporal descriptions of ball movement over fixed windows of time. Using these representations we can characterize team behavior from entropy maps, which give a measure of predictability of team behaviors across the field. We show the efficacy and applicability of our method on the 2010-2011 English Premier League soccer data.
Competing with Humans at Fantasy Football: Team Formation in Large Partially-Observable Domains
Matthews, Tim (University of Southampton) | Ramchurn, Sarvapali D. (University of Southampton) | Chalkiadakis, Georgios (Technical University of Crete)
We present the first real-world benchmark for sequentially-optimal team formation, working within the framework of a class of online football prediction games known as Fantasy Football. We model the problem as a Bayesian reinforcement learning one, where the action space is exponential in the number of players and where the decision maker's beliefs are over multiple characteristics of each footballer. We then exploit domain knowledge to construct computationally tractable solution techniques in order to build a competitive automated Fantasy Football manager. Thus, we are able to establish the baseline performance in this domain, even without complete information on footballers' performances (accessible to human managers), showing that our agent is able to rank at around the top percentile when pitched against 2.5M human players.
Computing Optimal Strategies to Commit to in Stochastic Games
Letchford, Joshua (Duke University) | MacDermed, Liam (Georgia Institute of Technology) | Conitzer, Vincent (Duke University) | Parr, Ronald (Duke University) | Isbell, Charles L. (Georgia Institute of Technology)
Significant progress has been made recently in the following two lines of research in the intersection of AI and game theory: (1) the computation of optimal strategies to commit to (Stackelberg strategies), and (2) the computation of correlated equilibria of stochastic games. In this paper, we unite these two lines of research by studying the computation of Stackelberg strategies in stochastic games. We provide theoretical results on the value of being able to commit and the value of being able to correlate, as well as complexity results about computing Stackelberg strategies in stochastic games. We then modify the QPACE algorithm (MacDermed et al. 2011) to compute Stackelberg strategies, and provide experimental results.
Finding Optimal Abstract Strategies in Extensive-Form Games
Johanson, Michael (University of Alberta) | Bard, Nolan (University of Alberta) | Burch, Neil (University of Alberta) | Bowling, Michael (University of Alberta)
Extensive-form games are a powerful model for representing interactions between agents. Nash equilibrium strategies are a common solution concept for extensive-form games and, in two-player zero-sum games, there are efficient algorithms for calculating such strategies. In large games, this computation may require too much memory and time to be tractable. A standard approach in such cases is to apply a lossy state-space abstraction technique to produce a smaller abstract game that can be tractably solved, while hoping that the resulting abstract game equilibrium is close to an equilibrium strategy in the unabstracted game. Recent work has shown that this assumption is unreliable, and an arbitrary Nash equilibrium in the abstract game is unlikely to be even near the least suboptimal strategy that can be represented in that space. In this work, we present for the first time an algorithm which efficiently finds optimal abstract strategies --- strategies with minimal exploitability in the unabstracted game. We use this technique to find the least exploitable strategy ever reported for two-player limit Texas hold'em.
The Deployment-to-Saturation Ratio in Security Games
Jain, Manish (University of Southern California) | Leyton-Brown, Kevin (University of British Columbia) | Tambe, Milind (University of Southern California)
Stackelberg security games form the backbone of systems like ARMOR, IRIS and PROTECT, which are in regular use by the Los Angeles International Police, US Federal Air Marshal Service and the US Coast Guard respectively. An understanding of the runtime required by algorithms that power such systems is critical to furthering the application of game theory to other real-world domains. This paper identifies the concept of the deployment-to-saturation ratio in random Stackelberg security games, and shows that problem instances for which this ratio is 0.5 are computationally harder than instances with other deployment-to-saturation ratios for a wide range of different equilibrium computation methods, including (i) previously published different MIP algorithms, and (ii) different underlying solvers and solution mechanisms. This finding has at least two important implications. First, it is important for new algorithms to be evaluated on the hardest problem instances. We show that this has often not been done in the past, and introduce a publicly available benchmark suite to facilitate such comparisons. Second, we provide evidence that this computationally hard region is also one where optimization would be of most benefit to security agencies, and thus requires significant attention from researchers in this area. Furthermore, we use the concept of phase transitions to better understand this computationally hard region. We define a decision problem related to security games, and show that the probability that this problem has a solution exhibits a phase transition as the deployment-to-saturation ratio crosses 0.5. We also demonstrate that this phase transition is invariant to changes both in the domain and the domain representation, and that the phase transition point corresponds to the computationally hardest instances.
Generalized Sampling and Variance in Counterfactual Regret Minimization
Gibson, Richard (University of Alberta) | Lanctot, Marc (University of Alberta) | Burch, Neil (University of Alberta) | Szafron, Duane (University of Alberta) | Bowling, Michael (University of Alberta)
In large extensive form games with imperfect information, Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing approximate Nash equilibria. While the base algorithm performs a full tree traversal on each iteration, Monte Carlo CFR (MCCFR) reduces the per iteration time cost by traversing just a sampled portion of the tree. On the other hand, MCCFR's sampled values introduce variance, and the effects of this variance were previously unknown. In this paper, we generalize MCCFR by considering any generic estimator of the sought values. We show that any choice of an estimator can be used to probabilistically minimize regret, provided the estimator is bounded and unbiased. In addition, we relate the variance of the estimator to the convergence rate of an algorithm that calculates regret directly from the estimator. We demonstrate the application of our analysis by defining a new bounded, unbiased estimator with empirically lower variance than MCCFR estimates. Finally, we use this estimator in a new sampling algorithm to compute approximate equilibria in Goofspiel, Bluff, and Texas hold'em poker. Under each of our selected sampling schemes, our new algorithm converges faster than MCCFR.
Symmetric Subgame Perfect Equilibria in Resource Allocation
Cigler, Ludek (EPFL, Lausanne) | Faltings, Boi (EPFL, Lausanne)
We analyze symmetric protocols to rationally coordinate on an asymmetric, efficient allocation in an infinitely repeated N-agent, C-resource allocation problems. (Bhaskar 2000) proposed one way to achieve this in 2-agent, 1-resource allocation games: Agents start by symmetrically randomizing their actions, and as soon as they each choose different actions, they start to follow a potentially asymmetric "convention" that prescribes their actions from then on. We extend the concept of convention to the general case of infinitely repeated resource allocation games with N agents and C resources. We show that for any convention, there exists a symmetric subgame perfect equilibrium which implements it. We present two conventions: bourgeois, where agents stick to the first allocation; and market, where agents pay for the use of resources, and observe a global coordination signal which allows them to alternate between different allocations. We define price of anonymity of a convention as the ratio between the maximum social payoff of any (asymmetric) strategy profile and the expected social payoff of the convention. We show that while the price of anonymity of the bourgeois convention is infinite, the market convention decreases this price by reducing the conflict between the agents.
Computing the Nucleolus of Matching, Cover and Clique Games
Chen, Ning (Nanyang Technological University) | Lu, Pinyan (Microsoft Research Asia) | Zhang, Hongyang (Shanghai Jiao Tong University)
In cooperative games, a key question is to find a division of payoffs to coalition members in a fair manner. Nucleolus is one of such solution concepts that provides a stable solution for the grand coalition. We study the computation of the nucleolus of a number of cooperative games, including fractional matching games and fractional edge cover games on general weighted graphs, as well as vertex cover games and clique games on weighted bipartite graphs. Our results are on the positive side---we give efficient algorithms to compute the nucleolus, as well as the least core, of all of these games.
A Multivariate Complexity Analysis of Lobbying in Multiple Referenda
Bredereck, Robert (Technische Universität Berlin) | Chen, Jiehua (Technische Universität Berlin) | Hartung, Sepp (Technische Universität Berlin) | Niedermeier, Rolf (Technische Universität Berlin) | Suchý, Ondřej (Technische Universität Berlin) | Kratsch, Stefan (Universiteit Utrecht, Utrecht)
We extend work by Christian et al. [Review of Economic Design 2007] on lobbying in multiple referenda by first providing a more fine-grained analysis of the computational complexity of the NP-complete Lobbying problem. Herein, given a binary matrix, the columns represent issues to vote on and the rows correspond to voters making a binary vote on each issue. An issue is approved if a majority of votes has a 1 in the corresponding column. The goal is to get all issues approved by modifying a minimum number of rows to all-1-rows. In our multivariate complexity analysis, we present a more holistic view on the nature of the computational complexity of Lobbying, providing both (parameterized) tractability and intractability results, depending on various problem parameterizations to be adopted. Moreover, we show non-existence results concerning efficient and effective preprocessing for Lobbying and introduce natural variants such as Restricted Lobbying and Partial Lobbying.