Goto

Collaborating Authors

 psne


Instance-Dependent Regret Bounds for Learning Two-Player Zero-Sum Games with Bandit Feedback

arXiv.org Artificial Intelligence

No-regret self-play learning dynamics have become one of the premier ways to solve large-scale games in practice. Accelerating their convergence via improving the regret of the players over the naive $O(\sqrt{T})$ bound after $T$ rounds has been extensively studied in recent years, but almost all studies assume access to exact gradient feedback. We address the question of whether acceleration is possible under bandit feedback only and provide an affirmative answer for two-player zero-sum normal-form games. Specifically, we show that if both players apply the Tsallis-INF algorithm of Zimmert and Seldin (2018, arXiv:1807.07623), then their regret is at most $O(c_1 \log T + \sqrt{c_2 T})$, where $c_1$ and $c_2$ are game-dependent constants that characterize the difficulty of learning -- $c_1$ resembles the complexity of learning a stochastic multi-armed bandit instance and depends inversely on some gap measures, while $c_2$ can be much smaller than the number of actions when the Nash equilibria have a small support or are close to the boundary. In particular, for the case when a pure strategy Nash equilibrium exists, $c_2$ becomes zero, leading to an optimal instance-dependent regret bound as we show. We additionally prove that in this case, our algorithm also enjoys last-iterate convergence and can identify the pure strategy Nash equilibrium with near-optimal sample complexity.


Computing Nash Equilibria in Generalized Interdependent Security Games

Neural Information Processing Systems

We study the computational complexity of computing Nash equilibria in generalized interdependent-security (IDS) games. Like traditional IDS games, originally introduced by economists and risk-assessment experts Heal and Kunreuther about a decade ago, generalized IDS games model agents' voluntary investment decisions when facing potential direct risk and transfer-risk exposure from other agents. A distinct feature of generalized IDS games, however, is that full investment can reduce transfer risk. As a result, depending on the transfer-risk reduction level, generalized IDS games may exhibit strategic complementarity (SC) or strategic substitutability (SS). We consider three variants of generalized IDS games in which players exhibit only SC, only SS, and both SC+SS.


PSNE: Efficient Spectral Sparsification Algorithms for Scaling Network Embedding

arXiv.org Artificial Intelligence

Network embedding has numerous practical applications and has received extensive attention in graph learning, which aims at mapping vertices into a low-dimensional and continuous dense vector space by preserving the underlying structural properties of the graph. Many network embedding methods have been proposed, among which factorization of the Personalized PageRank (PPR for short) matrix has been empirically and theoretically well supported recently. However, several fundamental issues cannot be addressed. (1) Existing methods invoke a seminal Local Push subroutine to approximate \textit{a single} row or column of the PPR matrix. Thus, they have to execute $n$ ($n$ is the number of nodes) Local Push subroutines to obtain a provable PPR matrix, resulting in prohibitively high computational costs for large $n$. (2) The PPR matrix has limited power in capturing the structural similarity between vertices, leading to performance degradation. To overcome these dilemmas, we propose PSNE, an efficient spectral s\textbf{P}arsification method for \textbf{S}caling \textbf{N}etwork \textbf{E}mbedding, which can fast obtain the embedding vectors that retain strong structural similarities. Specifically, PSNE first designs a matrix polynomial sparser to accelerate the calculation of the PPR matrix, which has a theoretical guarantee in terms of the Frobenius norm. Subsequently, PSNE proposes a simple but effective multiple-perspective strategy to enhance further the representation power of the obtained approximate PPR matrix. Finally, PSNE applies a randomized singular value decomposition algorithm on the sparse and multiple-perspective PPR matrix to get the target embedding vectors. Experimental evaluation of real-world and synthetic datasets shows that our solutions are indeed more efficient, effective, and scalable compared with ten competitors.


Computing Nash Equilibria in Generalized Interdependent Security Games

Neural Information Processing Systems

We study the computational complexity of computing Nash equilibria in generalized interdependent-security (IDS) games. Like traditional IDS games, originally introduced by economists and risk-assessment experts Heal and Kunreuther about a decade ago, generalized IDS games model agents' voluntary investment decisions when facing potential direct risk and transfer-risk exposure from other agents. A distinct feature of generalized IDS games, however, is that full investment can reduce transfer risk. As a result, depending on the transfer-risk reduction level, generalized IDS games may exhibit strategic complementarity (SC) or strategic substitutability (SS). We consider three variants of generalized IDS games in which players exhibit only SC, only SS, and both SC+SS.


Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits

arXiv.org Artificial Intelligence

We study the sample complexity of identifying the pure strategy Nash equilibrium (PSNE) in a two-player zero-sum matrix game with noise. Formally, we are given a stochastic model where any learner can sample an entry $(i,j)$ of the input matrix $A\in[-1,1]^{n\times m}$ and observe $A_{i,j}+\eta$ where $\eta$ is a zero-mean 1-sub-Gaussian noise. The aim of the learner is to identify the PSNE of $A$, whenever it exists, with high probability while taking as few samples as possible. Zhou et al. (2017) presents an instance-dependent sample complexity lower bound that depends only on the entries in the row and column in which the PSNE lies. We design a near-optimal algorithm whose sample complexity matches the lower bound, up to log factors. The problem of identifying the PSNE also generalizes the problem of pure exploration in stochastic multi-armed bandits and dueling bandits, and our result matches the optimal bounds, up to log factors, in both the settings.


Instance-dependent Sample Complexity Bounds for Zero-sum Matrix Games

arXiv.org Artificial Intelligence

We study the sample complexity of identifying an approximate equilibrium for two-player zero-sum $n\times 2$ matrix games. That is, in a sequence of repeated game plays, how many rounds must the two players play before reaching an approximate equilibrium (e.g., Nash)? We derive instance-dependent bounds that define an ordering over game matrices that captures the intuition that the dynamics of some games converge faster than others. Specifically, we consider a stochastic observation model such that when the two players choose actions $i$ and $j$, respectively, they both observe each other's played actions and a stochastic observation $X_{ij}$ such that $\mathbb E[ X_{ij}] = A_{ij}$. To our knowledge, our work is the first case of instance-dependent lower bounds on the number of rounds the players must play before reaching an approximate equilibrium in the sense that the number of rounds depends on the specific properties of the game matrix $A$ as well as the desired accuracy. We also prove a converse statement: there exist player strategies that achieve this lower bound.


On Nash Equilibria in Normal-Form Games With Vectorial Payoffs

arXiv.org Artificial Intelligence

We provide an in-depth study of Nash equilibria in multi-objective normal form games (MONFGs), i.e., normal form games with vectorial payoffs. Taking a utility-based approach, we assume that each player's utility can be modelled with a utility function that maps a vector to a scalar utility. In the case of a mixed strategy, it is meaningful to apply such a scalarisation both before calculating the expectation of the payoff vector as well as after. This distinction leads to two optimisation criteria. With the first criterion, players aim to optimise the expected value of their utility function applied to the payoff vectors obtained in the game. With the second criterion, players aim to optimise the utility of expected payoff vectors given a joint strategy. Under this latter criterion, it was shown that Nash equilibria need not exist. Our first contribution is to provide a sufficient condition under which Nash equilibria are guaranteed to exist. Secondly, we show that when Nash equilibria do exist under both criteria, no equilibrium needs to be shared between the two criteria, and even the number of equilibria can differ. Thirdly, we contribute a study of pure strategy Nash equilibria under both criteria. We show that when assuming quasiconvex utility functions for players, the sets of pure strategy Nash equilibria under both optimisation criteria are equivalent. This result is further extended to games in which players adhere to different optimisation criteria. Finally, given these theoretical results, we construct an algorithm to compute all pure strategy Nash equilibria in MONFGs where players have a quasiconvex utility function.


Altruism Design in Networked Public Goods Games

arXiv.org Artificial Intelligence

Many collective decision-making settings feature a strategic tension between agents acting out of individual self-interest and promoting a common good. These include wearing face masks during a pandemic, voting, and vaccination. Networked public goods games capture this tension, with networks encoding strategic interdependence among agents. Conventional models of public goods games posit solely individual self-interest as a motivation, even though altruistic motivations have long been known to play a significant role in agents' decisions. We introduce a novel extension of public goods games to account for altruistic motivations by adding a term in the utility function that incorporates the perceived benefits an agent obtains from the welfare of others, mediated by an altruism graph. Most importantly, we view altruism not as immutable, but rather as a lever for promoting the common good. Our central algorithmic question then revolves around the computational complexity of modifying the altruism network to achieve desired public goods game investment profiles. We first show that the problem can be solved using linear programming when a principal can fractionally modify the altruism network. While the problem becomes in general intractable if the principal's actions are all-or-nothing, we exhibit several tractable special cases.


Resource Allocation Polytope Games: Uniqueness of Equilibrium, Price of Stability, and Price of Anarchy

AAAI Conferences

We consider a two-player resource allocation polytope game, in which the strategy of a player is restricted by the strategy of the other player, with common coupled constraints. With respect to such a game, we formally introduce the notions of independent optimal strategy profile, which is the profile when players play optimally in the absence of the other player; and common contiguous set, which is the set of top nodes in the preference orderings of both the players that are exhaustively invested on in the independent optimal strategy profile. We show that for the game to have a unique PSNE, it is a necessary and sufficient condition that the independent optimal strategies of the players do not conflict, and either the common contiguous set consists of at most one node or all the nodes in the common contiguous set are invested on by only one player in the independent optimal strategy profile. We further derive a socially optimal strategy profile, and show that the price of anarchy cannot be bound by a common universal constant. We hence present an efficient algorithm to compute the price of anarchy and the price of stability, given an instance of the game. Under reasonable conditions, we show that the price of stability is 1. We encounter a paradox in this game that higher budgets may lead to worse outcomes.


On the Sample Complexity of Learning Graphical Games

arXiv.org Machine Learning

We analyze the sample complexity of learning graphical games from purely behavioral data. We assume that we can only observe the players' joint actions and not their payoffs. We analyze the sufficient and necessary number of samples for the correct recovery of the set of pure-strategy Nash equilibria (PSNE) of the true game. Our analysis focuses on directed graphs with $n$ nodes and at most $k$ parents per node. Sparse graphs correspond to ${k \in O(1)}$ with respect to $n$, while dense graphs correspond to ${k \in O(n)}$. By using VC dimension arguments, we show that if the number of samples is greater than ${O(k n \log^2{n})}$ for sparse graphs or ${O(n^2 \log{n})}$ for dense graphs, then maximum likelihood estimation correctly recovers the PSNE with high probability. By using information-theoretic arguments, we show that if the number of samples is less than ${\Omega(k n \log^2{n})}$ for sparse graphs or ${\Omega(n^2 \log{n})}$ for dense graphs, then any conceivable method fails to recover the PSNE with arbitrary probability.