Uncoupled Learning Dynamics with O(log T) Swap Regret in Multiplayer Games

Neural Information Processing Systems 

In this paper we establish efficient and uncoupled learning dynamics so that, when employed by all players in a general-sum multiplayer game, the swap regret of each player after T repetitions of the game is bounded by O(logT), improving over the prior best bounds of O(log4(T)). At the same time, we guarantee optimal O( T) swap regret in the adversarial regime as well. To obtain these results, our primary contribution is to show that when all players follow our dynamics with a time-invariant learning rate, the second-order path lengths of the dynamics up to time T are bounded by O(logT), a fundamental property which could have further implications beyond near-optimally bounding the (swap) regret. Our proposed learning dynamics combine in a novel way optimistic regularized learning with the use of self-concordant barriers. Further, our analysis is remarkably simple, bypassing the cumbersome framework of higher-order smoothness recently developed by Daskalakis, Fishelson, and Golowich (NeurIPS'21).

Similar Docs  Excel Report  more

TitleSimilaritySource
None found