Learning with Bandit Feedback in Potential Games
Heliou, Amélie, Cohen, Johanne, Mertikopoulos, Panayotis
–Neural Information Processing Systems
This paper examines the equilibrium convergence properties of no-regret learning with exponential weights in potential games. To establish convergence with minimal information requirements on the players' side, we focus on two frameworks: the semi-bandit case (where players have access to a noisy estimate of their payoff vectors, including strategies they did not play), and the bandit case (where players are only able to observe their in-game, realized payoffs). In the semi-bandit case, we show that the induced sequence of play converges almost surely to a Nash equilibrium at a quasi-exponential rate. In the bandit case, the same result holds for approximate Nash equilibria if we introduce a constant exploration factor that guarantees that action choice probabilities never become arbitrarily small. In particular, if the algorithm is run with a suitably decreasing exploration factor, the sequence of play converges to a bona fide Nash equilibrium with probability 1.
Neural Information Processing Systems
Dec-31-2017
- Country:
- Europe
- France
- Auvergne-Rhône-Alpes > Isère
- Grenoble (0.04)
- Grand Est > Bas-Rhin
- Strasbourg (0.04)
- Auvergne-Rhône-Alpes > Isère
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- France
- North America > United States
- California > Los Angeles County
- Long Beach (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- New Jersey > Mercer County
- Princeton (0.04)
- California > Los Angeles County
- Europe
- Genre:
- Research Report (0.34)
- Technology: