We investigate Kantian equilibria in finite normal form games, a class of non-Nashian, morally motivated courses of action that was recently proposed in the economics literature. We highlight a number of problems with such equilibria, including computational intractability, a high price of miscoordination, and expensive/problematic extension to general normal form games. We point out that such a proper generalization will likely involve the concept of program equilibrium. Finally we propose some general, intuitive, computationally tractable, other-regarding equilibria related to Kantian equilibria, as well as a class of courses of action that interpolates between purely self-regarding and Kantian behavior.
Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games or those with more than two players than in two-player, zero-sum games. An appealing alternative is to consider adaptive algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior. This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium. We develop and advocate for this hindsight rationality framing of learning in general sequential decision-making settings. To this end, we re-examine mediated equilibrium and deviation types in extensive-form games, thereby gaining a more complete understanding and resolving past misconceptions. We present a set of examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature, and prove that no tractable concept subsumes all others. This line of inquiry culminates in the definition of the deviation and equilibrium classes that correspond to algorithms in the counterfactual regret minimization (CFR) family, relating them to all others in the literature. Examining CFR in greater detail further leads to a new recursive definition of rationality in correlated play that extends sequential rationality in a way that naturally applies to hindsight evaluation.
We describe mechanisms for the allocation of a scarce resource among multiple users in a way that is efficient, fair, and strategy-proof, but when users do not know their resource requirements. The mechanism is repeated for multiple rounds and a user's requirements can change on each round. At the end of each round, users provide feedback about the allocation they received, enabling the mechanism to learn user preferences over time. Such situations are common in the shared usage of a compute cluster among many users in an organisation, where all teams may not precisely know the amount of resources needed to execute their jobs. By understating their requirements, users will receive less than they need and consequently not achieve their goals. By overstating them, they may siphon away precious resources that could be useful to others in the organisation. We formalise this task of online learning in fair division via notions of efficiency, fairness, and strategy-proofness applicable to this setting, and study this problem under three types of feedback: when the users' observations are deterministic, when they are stochastic and follow a parametric model, and when they are stochastic and nonparametric. We derive mechanisms inspired by the classical max-min fairness procedure that achieve these requisites, and quantify the extent to which they are achieved via asymptotic rates. We corroborate these insights with an experimental evaluation on synthetic problems and a web-serving task.
Owning to the unremitting efforts by a few institutes, significant progress has recently been made in designing superhuman AIs in No-limit Texas Hold'em (NLTH), the primary testbed for large-scale imperfect-information game research. However, it remains challenging for new researchers to study this problem since there are no standard benchmarks for comparing with existing methods, which seriously hinders further developments in this research area. In this work, we present OpenHoldem, an integrated toolkit for large-scale imperfect-information game research using NLTH. OpenHoldem makes three main contributions to this research direction: 1) a standardized evaluation protocol for thoroughly evaluating different NLTH AIs, 2) three publicly available strong baselines for NLTH AI, and 3) an online testing platform with easy-to-use APIs for public NLTH AI evaluation. We have released OpenHoldem at http://holdem.ia.ac.cn/, hoping it facilitates further studies on the unsolved theoretical and computational issues in this area and cultivate crucial research problems like opponent modeling, large-scale equilibrium-finding, and human-computer interactive learning.
Generative adversarial network (GAN) is among the most popular deep learning models for learning complex data distributions. However, training a GAN is known to be a challenging task. This is often attributed to the lack of correlation between the training progress and the trajectory of the generator and discriminator losses and the need for the GAN's subjective evaluation. A recently proposed measure inspired by game theory - the duality gap, aims to bridge this gap. However, as we demonstrate, the duality gap's capability remains constrained due to limitations posed by its estimation process. This paper presents a theoretical understanding of this limitation and proposes a more dependable estimation process for the duality gap. At the crux of our approach is the idea that local perturbations can help agents in a zero-sum game escape non-Nash saddle points efficiently. Through exhaustive experimentation across GAN models and datasets, we establish the efficacy of our approach in capturing the GAN training progress with minimal increase to the computational complexity. Further, we show that our estimate, with its ability to identify model convergence/divergence, is a potential performance measure that can be used to tune the hyperparameters of a GAN.
In many real world situations, collective decisions are made using voting and, in scenarios such as committee or board elections, employing voting rules that return multiple winners. In multi-winner approval voting (AV), an agent submits a ballot consisting of approvals for as many candidates as they wish, and winners are chosen by tallying up the votes and choosing the top-$k$ candidates receiving the most approvals. In many scenarios, an agent may manipulate the ballot they submit in order to achieve a better outcome by voting in a way that does not reflect their true preferences. In complex and uncertain situations, agents may use heuristics instead of incurring the additional effort required to compute the manipulation which most favors them. In this paper, we examine voting behavior in single-winner and multi-winner approval voting scenarios with varying degrees of uncertainty using behavioral data obtained from Mechanical Turk. We find that people generally manipulate their vote to obtain a better outcome, but often do not identify the optimal manipulation. There are a number of predictive models of agent behavior in the COMSOC and psychology literature that are based on cognitively plausible heuristic strategies. We show that the existing approaches do not adequately model real-world data. We propose a novel model that takes into account the size of the winning set and human cognitive constraints, and demonstrate that this model is more effective at capturing real-world behaviors in multi-winner approval voting scenarios.
We consider a scenario in which two reinforcement learning agents repeatedly play a matrix game against each other and update their parameters after each round. The agents' decision-making is transparent to each other, which allows each agent to predict how their opponent will play against them. To prevent an infinite regress of both agents recursively predicting each other indefinitely, each agent is required to give an opponent-independent response with some probability at least epsilon. Transparency also allows each agent to anticipate and shape the other agent's gradient step, i.e. to move to regions of parameter space in which the opponent's gradient points in a direction favourable to them. We study the resulting dynamics experimentally, using two algorithms from previous literature (LOLA and SOS) for opponent-aware learning. We find that the combination of mutually transparent decision-making and opponent-aware learning robustly leads to mutual cooperation in a single-shot prisoner's dilemma. In a game of chicken, in which both agents try to manoeuvre their opponent towards their preferred equilibrium, converging to a mutually beneficial outcome turns out to be much harder, and opponent-aware learning can even lead to worst-case outcomes for both agents. This highlights the need to develop opponent-aware learning algorithms that achieve acceptable outcomes in social dilemmas involving an equilibrium selection problem.
We study convergence properties of the mixed strategies that result from a general class of optimal no regret learning strategies in a repeated game setting where the stage game is any 2 by 2 competitive game (i.e. game for which all the Nash equilibria (NE) of the game are completely mixed). We consider the class of strategies whose information set at each step is the empirical average of the opponent's realized play (and the step number), that we call mean based strategies. We first show that there does not exist any optimal no regret, mean based strategy for player 1 that would result in the convergence of her mixed strategies (in probability) against an opponent that plays his Nash equilibrium mixed strategy at each step. Next, we show that this last iterate divergence necessarily occurs if player 2 uses any adaptive strategy with a minimal randomness property. This property is satisfied, for example, by any fixed sequence of mixed strategies for player 2 that converges to NE. We conjecture that this property holds when both players use optimal no regret learning strategies against each other, leading to the divergence of the mixed strategies with a positive probability. Finally, we show that variants of mean based strategies using recency bias, which have yielded last iterate convergence in deterministic min max optimization, continue to lead to this last iterate divergence. This demonstrates a crucial difference in outcomes between using the opponent's mixtures and realizations to make strategy updates.
The Shapley value solution concept from cooperative game theory has become popular for interpreting ML models, but efficiently estimating Shapley values remains challenging, particularly in the model-agnostic setting. We revisit the idea of estimating Shapley values via linear regression to understand and improve upon this approach. By analyzing KernelSHAP alongside a newly proposed unbiased estimator, we develop techniques to detect its convergence and calculate uncertainty estimates. We also find that that the original version incurs a negligible increase in bias in exchange for a significant reduction in variance, and we propose a variance reduction technique that further accelerates the convergence of both estimators. Finally, we develop a version of KernelSHAP for stochastic cooperative games that yields fast new estimators for two global explanation methods.
How can we assess the value of data objectively, systematically and quantitatively? Pricing data, or information goods in general, has been studied and practiced in dispersed areas and principles, such as economics, marketing, electronic commerce, data management, data mining and machine learning. In this article, we present a unified, interdisciplinary and comprehensive overview of this important direction. We examine various motivations behind data pricing, understand the economics of data pricing and review the development and evolution of pricing models according to a series of fundamental principles. We discuss both digital products and data products. We also consider a series of challenges and directions for future work.