No-Regret Learning in Unknown Games with Correlated Payoffs

Sessa, Pier Giuseppe, Bogunovic, Ilija, Kamgarpour, Maryam, Krause, Andreas

Mar-19-2020, 02:16:14 GMT–Neural Information Processing Systems

We consider the problem of learning to play a repeated multi-agent game with an unknown reward function. Single player online learning algorithms attain strong regret bounds when provided with full information feedback, which unfortunately is unavailable in many real-world scenarios. Bandit feedback alone, i.e., observing outcomes only for the selected action, yields substantially worse performance. In this paper, we consider a natural model where, besides a noisy measurement of the obtained reward, the player can also observe the opponents' actions. This feedback model, together with a regularity assumption on the reward function, allows us to exploit the correlations among different game outcomes by means of Gaussian processes (GPs).

Neural Information Processing Systems

Mar-19-2020, 02:16:14 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)