Mirror Descent and the Information Ratio
Lattimore, Tor, György, András
The combination of minimax duality and the information-theoretic machinery developed by Russo and Van Roy [2014] has yielded a series of elegant arguments bounding the minimax regret for a variety of regret minimisation problems. The downside is that the application of minimax duality makes the approach non-constructive. The existence of certain policies is established without identifying what those policies are. Our main contribution is to show that the information-theoretic machinery can be translated in a natural way to the language of online linear optimisation, yielding explicit policies. Before you get too excited, these policies are not guaranteed to be efficient - they must solve a convex optimisation problem that may be infinite dimensional. Nevertheless, it provides a clear path towards algorithm design and/or improved bounds, as we illustrate with an application to finite-armed bandits. To maximise generality, our results are stated using the linear partial monitoring framework, which is flexible enough to model most classical setups.
Sep-25-2020