Goto

Collaborating Authors

 polymatrix game



AApproximate Target Maximum Welfare Minimum Relative Entropy Equilbiria We use a Minimum Relative Entropy (RME) (also known as minimum KL divergence) Pa (a)ln

Neural Information Processing Systems

This objective is similar to Maximum Entropy Correlated Equilibrium (MECE) [48], and the proofs here are similar to the framework set out there. A drawback of MECE is that it is not easy to determine the minimum p permissible. If we choose p that does not permit a valid solution, then the parameters will diverge. We can circumvent this problem by optimizing the distance to a target ห† p. And ยตis for balancing the linear objective.




From Average-Iterate to Last-Iterate Convergence in Games: A Reduction and Its Applications

arXiv.org Artificial Intelligence

The convergence of online learning algorithms in games under self-play is a fundamental question in game theory and machine learning. Among various notions of convergence, last-iterate convergence is particularly desirable, as it reflects the actual decisions made by the learners and captures the day-to-day behavior of the learning dynamics. While many algorithms are known to converge in the average-iterate, achieving last-iterate convergence typically requires considerably more effort in both the design and the analysis of the algorithm. Somewhat surprisingly, we show in this paper that for a large family of games, there exists a simple black-box reduction that transforms the average iterates of an uncoupled learning dynamics into the last iterates of a new uncoupled learning dynamics, thus also providing a reduction from last-iterate convergence to average-iterate convergence. Our reduction applies to games where each player's utility is linear in both their own strategy and the joint strategy of all opponents. This family includes two-player bimatrix games and generalizations such as multi-player polymatrix games. By applying our reduction to the Optimistic Multiplicative Weights Update algorithm, we obtain new state-of-the-art last-iterate convergence rates for uncoupled learning dynamics in multi-player zero-sum polymatrix games: (1) an $O(\frac{\log d}{T})$ last-iterate convergence rate under gradient feedback, representing an exponential improvement in the dependence on the dimension $d$ (i.e., the maximum number of actions available to either player); and (2) an $\widetilde{O}(d^{\frac{1}{5}} T^{-\frac{1}{5}})$ last-iterate convergence rate under bandit feedback, improving upon the previous best rates of $\widetilde{O}(\sqrt{d} T^{-\frac{1}{8}})$ and $\widetilde{O}(\sqrt{d} T^{-\frac{1}{6}})$.



Aggregate Fictitious Play for Learning in Anonymous Polymatrix Games (Extended Version)

arXiv.org Artificial Intelligence

Fictitious play (FP) is a well-studied algorithm that enables agents to learn Nash equilibrium in games with certain reward structures. However, when agents have no prior knowledge of the reward functions, FP faces a major challenge: the joint action space grows exponentially with the number of agents, which slows down reward exploration. Anonymous games offer a structure that mitigates this issue. In these games, the rewards depend only on the actions taken; not on who is taking which action. Under such a structure, we introduce aggregate fictitious play (agg-FP), a variant of FP where each agent tracks the frequency of the number of other agents playing each action, rather than these agents' individual actions. We show that in anonymous polymatrix games, agg-FP converges to a Nash equilibrium under the same conditions as classical FP. In essence, by aggregating the agents' actions, we reduce the action space without losing the convergence guarantees. Using simulations, we provide empirical evidence on how this reduction accelerates convergence.


Differentially Private Equilibrium Finding in Polymatrix Games

arXiv.org Artificial Intelligence

We study equilibrium finding in polymatrix games under differential privacy constraints. To start, we show that high accuracy and asymptotically vanishing differential privacy budget (as the number of players goes to infinity) cannot be achieved simultaneously under either of the two settings: (i) We seek to establish equilibrium approximation guarantees in terms of Euclidean distance to the equilibrium set, and (ii) the adversary has access to all communication channels. Then, assuming the adversary has access to a constant number of communication channels, we develop a novel distributed algorithm that recovers strategies with simultaneously vanishing Nash gap (in expected utility, also referred to as exploitability and privacy budget as the number of players increases.


Guarantees for Self-Play in Multiplayer Games via Polymatrix Decomposability

arXiv.org Artificial Intelligence

Self-play is a technique for machine learning in multi-agent systems where a learning algorithm learns by interacting with copies of itself. Self-play is useful for generating large quantities of data for learning, but has the drawback that the agents the learner will face post-training may have dramatically different behavior than the learner came to expect by interacting with itself. For the special case of two-player constant-sum games, self-play that reaches Nash equilibrium is guaranteed to produce strategies that perform well against any post-training opponent; however, no such guarantee exists for multiplayer games. We show that in games that approximately decompose into a set of two-player constant-sum games (called constant-sum polymatrix games) where global $\epsilon$-Nash equilibria are boundedly far from Nash equilibria in each subgame (called subgame stability), any no-external-regret algorithm that learns by self-play will produce a strategy with bounded vulnerability. For the first time, our results identify a structural property of multiplayer games that enable performance guarantees for the strategies produced by a broad class of self-play algorithms. We demonstrate our findings through experiments on Leduc poker.


Approximating Nash Equilibria in Normal-Form Games via Stochastic Optimization

arXiv.org Artificial Intelligence

For example, the first column indicates the payoff when all background players play action 0. The second column indicates all background players play action 0 except for one which plays action 1, and so on. The last column indicates all background players play action 1. These 2n scalars uniquely define the payoffs of a symmetric game. Given that this game only has two actions, we represent a mixed strategy by a single scalar p [0, 1], i.e., the probability of the first action. Furthermore, this game is symmetric and we seek a symmetric equilibrium, so we can represent a full Nash equilibrium by this single scalar p. This reduces our search space from 7 2 = 14 variables to 1 variable (and obviates any need for a map s from the unit hypercube to the simplex--see Lemma 25).