Goto

Collaborating Authors

 Agents


Potential-Aware Imperfect-Recall Abstraction with Earth Mover's Distance in Imperfect-Information Games

AAAI Conferences

There is often a large disparity between the size of a game we wish to solve and the size of the largest instances solvable by the best algorithms; for example, a popular variant of poker has about $10^{165}$ nodes in its game tree, while the currently best approximate equilibrium-finding algorithms scale to games with around $10^{12}$ nodes. In order to approximate equilibrium strategies in these games, the leading approach is to create a sufficiently small strategic approximation of the full game, called an abstraction, and to solve that smaller game instead. The leading abstraction algorithm for imperfect-information games generates abstractions that have imperfect recall and are distribution aware, using $k$-means with the earth mover's distance metric to cluster similar states together. A distribution-aware abstraction groups states together at a given round if their full distributions over future strength are similar (as opposed to, for example, just the expectation of their strength). The leading algorithm considers distributions over future strength at the final round of the game. However, one might benefit by considering the trajectory of distributions over strength in all future rounds, not just the final round. An abstraction algorithm that takes all future rounds into account is called potential aware. We present the first algorithm for computing potential-aware imperfect-recall abstractions using earth mover's distance. Experiments on no-limit Texas Hold'em show that our algorithm improves performance over the previously best approach.


On the Axiomatic Characterization of Runoff Voting Rules

AAAI Conferences

Runoff voting rules such as single transferable vote (STV) and Baldwin's rule are of particular interest in computational social choice due to their recursive nature and hardness of manipulation, as well as in (human) practice because they are relatively easy to understand. However, they are not known for their compliance with desirable axiomatic properties, which we attempt to rectify here. We characterize runoff rules that are based on scoring rules using two axioms: a weakening of local independence of irrelevant alternatives and a variant of population-consistency. We then show, as our main technical result, that STV is the only runoff scoring rule satisfying an independence-of-clones property. Furthermore, we provide axiomatizations of Baldwin's rule and Coombs' rule.


Binary Aggregation by Selection of the Most Representative Voters

AAAI Conferences

Examples range from multiagent planning, That is, we look for the most representative voter and return to crowdsourcing and human computation, to collaborative her ballot as the outcome. In our example, a natural choice filtering for recommender systems, to rank aggregation would be any of the voters voting (0, 1, 1). The distance of for search engines, to coordination and resource allocation this choice to the individual ballots is 42 (21 voters disagree in multiagent systems. Several frameworks have been on 2 issues each), i.e., this solution is only marginally worse proposed in the literature on computational social choice than the solution returned by the distance-based rule--and it (Chevaleyre et al. 2007; Brandt, Conitzer, and Endriss 2013) is optimal in case (1, 1, 1) is infeasible.


On Detecting Nearly Structured Preference Profiles

AAAI Conferences

Structured preference domains, such as, for example, the domains of single-peaked and single-crossing preferences, are known to admit efficient algorithms for many problems in computational social choice. Some of these algorithms extend to preferences that are close to having the respective structural property, i.e., can be made to enjoy this property by performing minor changes to voters' preferences, such as deleting a small number of voters or candidates. However, it has recently been shown that finding the optimal number of voters or candidates to delete in order to achieve the desired structural property is NP-hard for many such domains. In this paper, we show that these problems admit efficient approximation algorithms. Our results apply to all domains that can be characterized in terms of forbidden configurations; this includes, in particular, single-peaked and single-crossing elections. For a large range of scenarios, our approximation results are optimal under a plausible complexity-theoretic assumption. We also provide parameterized complexity results for this class of problems.


Using Response Functions to Measure Strategy Strength

AAAI Conferences

Extensive-form games are a powerful tool for representing complex multi-agent interactions. Nash equilibrium strategies are commonly used as a solution concept for extensive-form games, but many games are too large for the computation of Nash equilibria to be tractable. In these large games, exploitability has traditionally been used to measure deviation from Nash equilibrium, and thus strategies are aimed to achieve minimal exploitability. However, while exploitability measures a strategy's worst-case performance, it fails to capture how likely that worst-case is to be observed in practice. In fact, empirical evidence has shown that a less exploitable strategy can perform worse than a more exploitable strategy in one-on-one play against a variety of opponents. In this work, we propose a class of response functions that can be used to measure the strength of a strategy. We prove that standard no-regret algorithms can be used to learn optimal strategies for a scenario where the opponent uses one of these response functions. We demonstrate the effectiveness of this technique in Leduc Hold'em against opponents that use the UCT Monte Carlo tree search algorithm.


Mechanism Design for Scheduling with Uncertain Execution Time

AAAI Conferences

We study the problem where a task (or multiple unrelated tasks) must be executed, there are multiple machines/agents that can potentially perform the task, and our objective is to minimize the expected sum of the agents' processing times. Each agent does not know exactly how long it will take him to finish the task; he only knows the distribution from which this time is drawn. These times are independent across agents and the distributions fulfill the monotone hazard rate condition. Agents are selfish and will lie about their distributions if this increases their expected utility. We study different variations of the Vickrey mechanism that take as input the agents' reported distributions and the players' realized running times and that output a schedule that minimizes the expected sum of processing times, as well as payments that make it an ex-post equilibrium for the agents to both truthfully report their distributions and exert full effort to complete the task. We devise the ChPE mechanism, which is uniquely tailored to our problem, and has many desirable properties including: not rewarding agents that fail to finish the task and having non-negative payments.


Modal Ranking: A Uniquely Robust Voting Rule

AAAI Conferences

Motivated by applications to crowdsourcing, we study voting rules that output a correct ranking of alternatives by quality from a large collection of noisy input rankings. We seek voting rules that are supremely robust to noise, in the sense of being correct in the face of any "reasonable" type of noise. We show that there is such a voting rule, which we call the modal ranking rule. Moreover, we establish that the modal ranking rule is the unique rule with the preceding robustness property within a large family of voting rules, which includes a slew of well-studied rules.


Biased Games

AAAI Conferences

We present a novel extension of normal form games that we call biased games. In these games, a player's utility is influenced by the distance between his mixed strategy and a given base strategy. We argue that biased games capture important aspects of the interaction between software agents. Our main result is that biased games satisfying certain mild conditions always admit an equilibrium. We also tackle the computation of equilibria in biased games.


Solving Imperfect Information Games Using Decomposition

AAAI Conferences

Decomposition, i.e. independently analyzing possible subgames, has proven to be an essential principle for effective decision-making in perfect information games. However, in imperfect information games, decomposition has proven to be problematic. To date, all proposed techniques for decomposition in imperfect information games have abandoned theoretical guarantees. This work presents the first technique for decomposing an imperfect information game into subgames that can be solved independently, while retaining optimality guarantees on the full-game solution. We can use this technique to construct theoretically justified algorithms that make better use of information available at run-time, overcome memory or disk limitations at run-time, or make a time/space trade-off to overcome memory or disk limitations while solving a game. In particular, we present an algorithm for subgame solving which guarantees performance in the whole game, in contrast to existing methods which may have unbounded error. In addition, we present an offline game solving algorithm, CFR-D, which can produce a Nash equilibrium for a game that is larger than available storage.


Regret Transfer and Parameter Optimization

AAAI Conferences

Regret matching is a widely-used algorithm for learning how to act. We begin by proving that regrets on actions in one setting (game) can be transferred to warm start the regrets for solving a different setting with same structure but different payoffs that can be written as a function of parameters. We prove how this can be done by carefully discounting the prior regrets. This provides, to our knowledge, the first principled warm-starting method for no-regret learning. It also extends to warm-starting the widely-adopted counterfactual regret minimization (CFR) algorithm for large incomplete-information games; we show this experimentally as well. We then study optimizing a parameter vector for a player in a two-player zero-sum game (e.g., optimizing bet sizes to use in poker). We propose a custom gradient descent algorithm that provably finds a locally optimal parameter vector while leveraging our warm-start theory to significantly save regret-matching iterations at each step. It optimizes the parameter vector while simultaneously finding an equilibrium. We present experiments in no-limit Leduc Hold'em and no-limit Texas Hold'em to optimize bet sizing. This amounts to the first action abstraction algorithm (algorithm for selecting a small number of discrete actions to use from a continuum of actions---a key preprocessing step for solving large games using current equilibrium-finding algorithms) with convergence guarantees for extensive-form games.