Approximate Nash Equilibria


Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Neural Information Processing Systems

Finding approximate Nash equilibria in zero-sum imperfect-information games is challenging when the number of information states is large. Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in game theory that is guaranteed to converge to an approximate Nash equilibrium. However, PSRO requires training a reinforcement learning policy at each iteration, making it too slow for large games. We show through counterexamples and experiments that DCH and Rectified PSRO, two existing approaches to scaling up PSRO, fail to converge even in small games. We introduce Pipeline PSRO (P2SRO), the first scalable PSRO-based method for finding approximate Nash equilibria in large zero-sum imperfect-information games. P2SRO is able to parallelize PSRO with convergence guarantees by maintaining a hierarchical pipeline of reinforcement learning workers, each training against the policies generated by lower levels in the hierarchy. We show that, unlike existing methods, P2SRO converges to an approximate Nash equilibrium, and does so faster as the number of parallel workers increases, across a variety of imperfect-information games. We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of 10^50. P2SRO is able to achieve state-of-the-art performance on Barrage Stratego and beats all existing bots.
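
To make the underlying loop concrete, here is a minimal, self-contained sketch of the basic (serial) PSRO iteration on a toy matrix game. This is not the authors' implementation: fictitious play stands in for the meta-Nash solver, an exact argmax stands in for the reinforcement learning best-response oracle, and rock-paper-scissors stands in for the large game. P2SRO's contribution is parallelizing the oracle step across a hierarchy of workers, which this serial sketch omits.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (symmetric zero-sum).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def fictitious_play(payoff, iters=2000):
    # Approximate Nash mixture of a symmetric zero-sum restricted game.
    counts = np.ones(payoff.shape[0])
    for _ in range(iters):
        mix = counts / counts.sum()
        counts[np.argmax(payoff @ mix)] += 1
    return counts / counts.sum()

def psro(payoff, iters=5):
    population = [0]  # start from a single arbitrary pure policy
    for _ in range(iters):
        # Meta-step: solve the restricted game over the current population.
        sub = payoff[np.ix_(population, population)]
        sigma = fictitious_play(sub)
        meta_mix = np.zeros(payoff.shape[0])
        meta_mix[population] = sigma
        # Oracle step: best respond to the meta-strategy (an RL policy in
        # the real algorithm; an exact argmax in this matrix-game toy).
        br = int(np.argmax(payoff @ meta_mix))
        if br not in population:
            population.append(br)
    return sorted(population)

print(psro(A))  # -> [0, 1, 2]
```

Starting from "rock", the loop adds "paper" (best response to rock) and then "scissors" (best response to the restricted Nash, which is nearly pure paper), after which the population supports the uniform Nash equilibrium of the full game.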


Review for NeurIPS paper: Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Neural Information Processing Systems

Weaknesses: The paper is missing a comparison with the most relevant previous work, namely XFP [1] (Heinrich, Johannes, and David Silver) and Deep CFR. Both of these works are mentioned in the Background and Related Work, but: 1) XFP is only mentioned and never compared against in the experiments; 2) Deep CFR is dismissed with "However, Deep CFR uses external sampling, which may be impractical for games with a large branching factor such as Stratego and Barrage Stratego." Moreover, there are newer variants based on this work, and it is not limited to a particular form of sampling. The paper only really compares against other variants from the PSRO family. Furthermore, the theory and algorithms (as described in the text) deal only with matrix games, while the experiments are on extensive-form games. If the goal is to run on top of the exponentially large matrix game, this should be discussed.


On adversarial robustness and the use of Wasserstein ascent-descent dynamics to enforce it

Trillos, Camilo Garcia, Trillos, Nicolas Garcia

arXiv.org Artificial Intelligence

We propose iterative algorithms to solve adversarial problems in a variety of supervised learning settings of interest. Our algorithms, which can be interpreted as suitable ascent-descent dynamics in Wasserstein spaces, take the form of a system of interacting particles. These interacting particle dynamics are shown to converge toward appropriate mean-field limit equations in suitable regimes where the number of particles is large. In turn, we prove that, under certain regularity assumptions, these mean-field equations converge, in the large-time limit, toward approximate Nash equilibria of the original adversarial learning problems. We present results for nonconvex-nonconcave settings, as well as for nonconvex-concave ones. Numerical experiments illustrate our results.
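
As a toy illustration of the ascent-descent particle idea (a hypothetical, purely Euclidean sketch, far simpler than the paper's Wasserstein mean-field setting), consider a single descent variable x playing against a population of ascent particles y_i on a convex-concave saddle f(x, y) = x^2 + xy - y^2, whose unique saddle point is (0, 0):

```python
import numpy as np

def ascent_descent(n_particles=8, steps=2000, lr=0.05, seed=0):
    # min_x max_y f(x, y) with f(x, y) = x^2 + x*y - y^2:
    # x descends the gradient averaged over particles; each
    # particle y_i independently ascends its own gradient.
    rng = np.random.default_rng(seed)
    x = 3.0                            # learner parameter (descent)
    ys = rng.normal(size=n_particles)  # adversarial particles (ascent)
    for _ in range(steps):
        grad_x = 2 * x + ys.mean()     # d f / d x, averaged over particles
        ys = ys + lr * (x - 2 * ys)    # each particle ascends d f / d y_i
        x = x - lr * grad_x
    return x, ys

x, ys = ascent_descent()
print(x, np.abs(ys).max())  # both converge to ~0, the saddle point
```

Because f is concave in y, every particle is drawn toward the inner maximizer, and the coupled linear dynamics contract to the saddle; the nonconvex-nonconcave settings treated in the paper are precisely where such simple contraction arguments fail and the mean-field analysis is needed.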


Game Theory reveals the Future of Deep Learning

#artificialintelligence

If you've been following my articles up to now, you'll have begun to perceive what is apparent to many advanced practitioners of Deep Learning (DL): the emergence of game-theoretic concepts in the design of newer architectures. This makes intuitive sense for two reasons. The first intuition is that DL systems will eventually need to tackle situations with imperfect knowledge. In fact, we've already seen this in DeepMind's AlphaGo, which uses partial knowledge to tactically and strategically best the world's best human in the game of Go. The second intuition is that systems will not remain monolithic as they are now, but rather will involve multiple coordinating (or competing) cliques of DL systems.


Efficient Nash Computation in Large Population Games with Bounded Influence

Kearns, Michael, Mansour, Yishay

arXiv.org Artificial Intelligence

We introduce a general representation of large-population games in which each player's influence on the others is centralized and limited, but may otherwise be arbitrary. This representation significantly generalizes the class known as congestion games in a natural way. Our main results are provably correct and efficient algorithms for computing and learning approximate Nash equilibria in this general framework.
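
For intuition about the congestion-game baseline this work generalizes, here is a hypothetical toy (not the paper's algorithm): an atomic congestion game with two identical resources whose cost equals their load. Because congestion games are exact potential games, simple best-response dynamics reaches a pure Nash equilibrium:

```python
# Toy atomic congestion game: n players each pick one of two resources;
# a resource's cost is the number of players using it. Best-response
# dynamics terminates because every improving move decreases Rosenthal's
# potential function.
def best_response_dynamics(n_players=7, max_rounds=100):
    choice = [0] * n_players          # everyone starts on resource 0
    for _ in range(max_rounds):
        changed = False
        for i in range(n_players):
            load = [choice.count(0), choice.count(1)]
            mine, other = choice[i], 1 - choice[i]
            # Switch only if the other resource is strictly cheaper,
            # counting the +1 load my own arrival would add.
            if load[other] + 1 < load[mine]:
                choice[i] = other
                changed = True
        if not changed:               # no player wants to deviate: Nash
            break
    return sorted([choice.count(0), choice.count(1)])

print(best_response_dynamics())  # -> [3, 4], a balanced pure Nash split
```

In the equilibrium, the loads differ by at most one, so no player can reduce their cost by switching; the paper's contribution is extending efficient equilibrium computation well beyond this congestion structure.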


On the performance of approximate equilibria in congestion games

Christodoulou, George, Koutsoupias, Elias, Spirakis, Paul

arXiv.org Artificial Intelligence

We study the performance of approximate Nash equilibria for linear congestion games. We consider how much the price of anarchy worsens and how much the price of stability improves as a function of the approximation factor $\epsilon$. We give (almost) tight upper and lower bounds for both the price of anarchy and the price of stability for atomic and non-atomic congestion games. Our results not only encompass and generalize the existing results of exact equilibria to $\epsilon$-Nash equilibria, but they also provide a unified approach which reveals the common threads of the atomic and non-atomic price of anarchy results. By expanding the spectrum, we also cast the existing results in a new light. For example, the Pigou network, which gives tight results for exact Nash equilibria of selfish routing, remains tight for the price of stability of $\epsilon$-Nash equilibria but not for the price of anarchy.
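
The Pigou network mentioned above can be checked numerically. In this standard example, one unit of nonatomic flow chooses between two links with costs c1(x) = 1 and c2(x) = x; the sketch below (a simple grid search, purely illustrative) recovers the classic price of anarchy of 4/3 for exact equilibria:

```python
def total_cost(p):
    # p = fraction of flow on the variable-cost link c2(x) = x;
    # the rest pays the constant cost c1(x) = 1.
    return p * p + (1 - p) * 1.0

# Nash flow: link 2's cost never exceeds 1, so every user takes it (p = 1)
# and total cost is 1.
nash_cost = total_cost(1.0)

# Social optimum: minimize total cost over a fine grid of splits
# (the exact optimum is p = 1/2 with cost 3/4).
opt_cost = min(total_cost(k / 1000) for k in range(1001))

print(nash_cost / opt_cost)  # price of anarchy = 4/3
```

The abstract's point is that this worst-case ratio degrades further for epsilon-Nash flows in the price-of-anarchy sense, while the Pigou bound remains tight only for the price of stability.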