Learning with Bandit Feedback in Potential Games

Heliou, Amélie; Cohen, Johanne; Mertikopoulos, Panayotis

Dec-31-2017–Neural Information Processing Systems

This paper examines the equilibrium convergence properties of no-regret learning with exponential weights in potential games. To establish convergence with minimal information requirements on the players' side, we focus on two frameworks: the semi-bandit case (where players have access to a noisy estimate of their payoff vectors, including the payoffs of strategies they did not play) and the bandit case (where players can only observe their in-game, realized payoffs). In the semi-bandit case, we show that the induced sequence of play converges almost surely to a Nash equilibrium at a quasi-exponential rate. In the bandit case, the same result holds for approximate Nash equilibria if we introduce a constant exploration factor that guarantees that action choice probabilities never become arbitrarily small. Moreover, if the algorithm is run with a suitably decreasing exploration factor, the sequence of play converges to a bona fide Nash equilibrium with probability 1.
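The two feedback models can be illustrated with a minimal sketch of exponential-weights learning in a small potential game. Everything here is a hypothetical illustration rather than the authors' exact algorithm or parameters: the 2x2 common-interest game (whose shared payoff matrix, with entries in [0, 1], is an exact potential), the constant step size, the Gaussian noise in the semi-bandit case, the uniform-mixing form of exploration, the importance-weighted payoff estimator used in the bandit case, and all function names (`run`, `logit`, `mix`, `payoff_vector`).

```python
import math
import random

# Hypothetical 2x2 common-interest (coordination) game: both players receive
# PAYOFF[a1][a2], so PAYOFF itself is an exact potential for the game.
PAYOFF = [[1.0, 0.0],
          [0.0, 0.5]]


def logit(scores):
    """Exponential-weights (softmax) choice map; subtracting the max avoids overflow."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    return [wi / total for wi in w]


def mix(probs, eps):
    """Blend with the uniform distribution so every action keeps probability >= eps/n."""
    n = len(probs)
    return [(1.0 - eps) * p + eps / n for p in probs]


def sample(probs, rng):
    # random.Random.choices is available in Python 3.6+.
    return rng.choices(range(len(probs)), weights=probs)[0]


def payoff_vector(player, opp_action):
    """Payoff of each of `player`'s actions against a fixed opponent action."""
    if player == 0:
        return [PAYOFF[a][opp_action] for a in range(2)]
    return [PAYOFF[opp_action][a] for a in range(2)]


def run(T=2000, step=0.1, eps_schedule=lambda t: 0.05,
        feedback="bandit", noise=0.1, seed=0):
    """Hypothetical exponential-weights loop; returns both players' final mixed strategies."""
    rng = random.Random(seed)
    scores = [[0.0, 0.0], [0.0, 0.0]]  # accumulated payoff estimates per player
    x = [None, None]
    for t in range(1, T + 1):
        eps = eps_schedule(t)  # exploration factor at stage t
        x = [mix(logit(scores[i]), eps) for i in range(2)]
        actions = [sample(x[i], rng) for i in range(2)]
        for i in range(2):
            v = payoff_vector(i, actions[1 - i])
            if feedback == "semi":
                # Semi-bandit: a noisy copy of the whole payoff vector,
                # including actions the player did not choose.
                v_hat = [va + rng.gauss(0.0, noise) for va in v]
            else:
                # Bandit: only the realized payoff is observed; dividing by the
                # choice probability makes the estimate unbiased.
                a = actions[i]
                v_hat = [0.0, 0.0]
                v_hat[a] = v[a] / x[i][a]
            for a in range(2):
                scores[i][a] += step * v_hat[a]
    return x


if __name__ == "__main__":
    print(run(feedback="bandit"))
    print(run(feedback="semi", eps_schedule=lambda t: 0.0))
```

With the default constant exploration factor of 0.05, each action keeps probability at least 0.025, which bounds the importance-weighted estimates by 40 but also means play can only approximate a pure equilibrium. Passing a decreasing schedule, for example `eps_schedule=lambda t: 0.05 / t ** 0.25`, lets that probability floor shrink over time; whether a given schedule meets the paper's convergence conditions is not checked by this sketch.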

artificial intelligence, equilibrium, machine learning

Country:
- North America > United States (0.46)
- Europe > France (0.28)

Genre:
- Research Report (0.34)

Technology:
- Information Technology
  - Game Theory (1.00)
  - Artificial Intelligence > Machine Learning (1.00)
