Goto

Collaborating Authors

 quilibria


$\widetilde{O}(T^{-1})$ Convergence to (Coarse) Correlated Equilibria in Full-Information General-Sum Markov Games

arXiv.org Artificial Intelligence

No-regret learning has a long history of being closely connected to game theory. Recent works have devised uncoupled no-regret learning dynamics that, when adopted by all the players in normal-form games, converge to various equilibrium solutions at a near-optimal rate of $\widetilde{O}(T^{-1})$, a significant improvement over the $O(1/\sqrt{T})$ rate of classic no-regret learners. However, analogous convergence results are scarce in Markov games, a more generic setting that lays the foundation for multi-agent reinforcement learning. In this work, we close this gap by showing that the optimistic-follow-the-regularized-leader (OFTRL) algorithm, together with appropriate value update procedures, can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations. Numerical results are also included to corroborate our theoretical findings.


Computing Bayes-Nash Equilibria in Combinatorial Auctions with Verification

Journal of Artificial Intelligence Research

We present a new algorithm for computing pure-strategy ε-Bayes-Nash equilibria (ε-BNEs) in combinatorial auctions with continuous value and action spaces. An essential innovation of our algorithm is to separate the algorithm's search phase (for finding the ε-BNE) from the verification phase (for computing the ε). Using this approach, we obtain an algorithm that is both very fast and provides theoretical guarantees on the ε it finds. Our main technical contribution is a verification method which allows us to upper bound the ε across the whole continuous value space without making assumptions about the mechanism. Using our algorithm, we can now compute ε-BNEs in multi-minded domains that are significantly more complex than what was previously possible to solve. We release our code under an open-source license to enable researchers to perform algorithmic analyses of auctions, to enable bidders to analyze different strategies, and to facilitate many other applications.


Selfish Robustness and Equilibria in Multi-Player Bandits

arXiv.org Machine Learning

Motivated by cognitive radios, stochastic multi-player multi-armed bandits gained a lot of interest recently. In this class of problems, several players simultaneously pull arms and encounter a collision -- with 0 reward -- if some of them pull the same arm at the same time. While the cooperative case where players maximize the collective reward (obediently following some fixed protocol) has been mostly considered, robustness to malicious players is a crucial and challenging concern. Existing approaches consider only the case of adversarial jammers whose objective is to blindly minimize the collective reward. We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare. We provide the first algorithm robust to selfish players (a.k.a. Nash equilibrium) with a logarithmic regret, when the arm reward is observed. When collisions are also observed, Grim Trigger type of strategies enable some implicit communication-based algorithms and we construct robust algorithms in two different settings: in the homogeneous case (with a regret comparable to the centralized optimal one) and in the heterogeneous case (for an adapted and relevant notion of regret). We also provide impossibility results when only the reward is observed or when arm means vary arbitrarily among players.