Efficient Learning in Polyhedral Games via Best Response Oracles

Darshan Chakrabarti, Gabriele Farina, Christian Kroer

arXiv.org Artificial Intelligence 

Learning in games is a well-studied framework in which agents iteratively refine their strategies through repeated interactions with their environment. One natural way to refine a strategy is by best-responding. This idea can be applied in many forms, the simplest and earliest instance of which is fictitious play (FP) [Brown, 1951]. In FP, each agent observes the strategies played by its opponent and then plays a best response to the average of the observed strategies. Fictitious play was shown to converge [Robinson, 1951], but its convergence rate can, in the worst case, scale quite poorly with the number of actions available to each player [Daskalakis and Pan, 2014]. It is then natural to ask for the best convergence guarantees obtainable for computing Nash equilibria in two-player zero-sum games, or coarse correlated equilibria in multiplayer games, when agents learn through a best-response oracle.

In the online learning community, methods based only on best-response oracles are special cases of methods based on a linear minimization oracle (LMO), which can be queried for a point that minimizes a given linear objective over the feasible set. Such methods are known as projection-free methods because they avoid potentially expensive projections onto the feasible set. Projection-free online learning algorithms may perform multiple LMO calls per iteration, so our paper and the related literature are concerned not only with the number of iterations T of online learning but also with the total number of LMO calls, which we denote by N. Because an LMO for a polyhedral decision set essentially corresponds to a best-response oracle (BRO), we use the two terms interchangeably.
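To make the fictitious play dynamic described above concrete, here is a minimal sketch for a two-player zero-sum matrix game. It is an illustrative implementation under our own assumptions (NumPy, a payoff matrix A where the row player minimizes x^T A y and the column player maximizes it), not code from the paper.

```python
import numpy as np

def fictitious_play(A, T):
    """Fictitious play [Brown, 1951] in a zero-sum game with payoff
    matrix A: the row player minimizes x^T A y, the column player
    maximizes it. Returns the players' empirical average strategies,
    which converge to a Nash equilibrium [Robinson, 1951]."""
    m, n = A.shape
    x_counts = np.zeros(m)  # how often each row action was played
    y_counts = np.zeros(n)  # how often each column action was played
    x_counts[0] += 1        # arbitrary initial actions
    y_counts[0] += 1
    for t in range(1, T):
        x_avg = x_counts / x_counts.sum()
        y_avg = y_counts / y_counts.sum()
        # Each player best-responds to the opponent's empirical average.
        # Over the simplex, the best response is a pure action, i.e., a
        # single query to a best-response (linear minimization) oracle.
        x_counts[np.argmin(A @ y_avg)] += 1
        y_counts[np.argmax(A.T @ x_avg)] += 1
    return x_counts / x_counts.sum(), y_counts / y_counts.sum()
```

For instance, on matching pennies, fictitious_play(np.array([[0.0, 1.0], [1.0, 0.0]]), 10000) should return average strategies close to the uniform equilibrium (1/2, 1/2) for both players.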
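The equivalence between LMOs and best-response oracles noted above is easy to see for the probability simplex: a linear objective over the simplex is minimized at a vertex, i.e., at a pure strategy. The small hypothetical helper below (our illustration, not from the paper) makes this explicit.

```python
import numpy as np

def lmo_simplex(c):
    """Linear minimization oracle over the probability simplex:
    returns argmin_{x in simplex} <c, x>, which is attained at a
    vertex of the simplex, i.e., a pure strategy."""
    x = np.zeros_like(c, dtype=float)
    x[np.argmin(c)] = 1.0
    return x

# With c = A @ y (the row player's expected losses against the
# opponent's mixed strategy y), one LMO call is exactly one
# best-response computation, so counting LMO calls N is the same
# as counting best-response oracle (BRO) queries.
```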