Learning in Games: Robustness of Fast Convergence
Dylan J. Foster, Zhiyuan Li, Thodoris Lykouris, Karthik Sridharan, Éva Tardos
Neural Information Processing Systems
We show that learning algorithms satisfying a low approximate regret property experience fast convergence to approximate optimality in a large class of repeated games. Our property, which simply requires that each learner has small regret compared to a (1+ε)-multiplicative approximation to the best action in hindsight, is ubiquitous among learning algorithms; it is satisfied even by the vanilla Hedge forecaster. Our results improve upon recent work of Syrgkanis et al. [28] in a number of ways. We require only that players observe payoffs under other players' realized actions, as opposed to expected payoffs. We further show that convergence occurs with high probability, and show convergence under bandit feedback.
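For a concrete anchor: the low approximate regret property asks, roughly, that the learner's cumulative loss be at most (1+ε) times that of the best fixed action in hindsight, plus a small additive term (on the order of log K / ε for Hedge over K actions). The sketch below is a minimal vanilla Hedge forecaster in NumPy, included only as orientation; the function name `hedge`, the learning rate `eta`, and the scaling of losses to [0, 1] are illustrative choices, not details from the paper.

```python
import numpy as np

def hedge(losses, eta=0.1):
    """Vanilla Hedge: keep a weight per action and shrink it
    multiplicatively in that action's observed loss.

    losses : (T, K) array of per-round losses in [0, 1] for K actions.
    eta    : learning rate; tuning it trades the (1 + eps) multiplicative
             factor against the additive log K / eps term.

    Returns the (T, K) sequence of distributions the forecaster plays.
    """
    T, K = losses.shape
    weights = np.ones(K)
    plays = np.empty((T, K))
    for t in range(T):
        plays[t] = weights / weights.sum()   # normalize to a distribution
        weights *= np.exp(-eta * losses[t])  # multiplicative weight update
    return plays

# Example: 1000 rounds, 5 actions with random losses; the played
# distribution concentrates on the action with smallest cumulative loss.
rng = np.random.default_rng(0)
losses = rng.uniform(size=(1000, 5)) * np.array([1.0, 0.9, 0.8, 1.0, 0.95])
print(hedge(losses)[-1].round(3))
```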