Mirror Descent and the Information Ratio

Lattimore, Tor, György, András

arXiv.org Machine Learning 

The combination of minimax duality and the information-theoretic machinery developed by Russo and Van Roy [2014] has yielded a series of elegant arguments bounding the minimax regret for a variety of regret minimisation problems. The downside is that the application of minimax duality makes the approach non-constructive. The existence of certain policies is established without identifying what those policies are. Our main contribution is to show that the information-theoretic machinery can be translated in a natural way to the language of online linear optimisation, yielding explicit policies. Before you get too excited, these policies are not guaranteed to be efficient - they must solve a convex optimisation problem that may be infinite dimensional. Nevertheless, it provides a clear path towards algorithm design and/or improved bounds, as we illustrate with an application to finite-armed bandits. To maximise generality, our results are stated using the linear partial monitoring framework, which is flexible enough to model most classical setups.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found