Connections Between Mirror Descent, Thompson Sampling and the Information Ratio
Zimmert, Julian, Lattimore, Tor
The information-theoretic analysis by Russo and Van Roy (2014) in combination with minimax duality has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings. In most applications there is a tantalising similarity to the classical analysis based on mirror descent. We make a formal connection, showing that the information-theoretic bounds in most applications can be derived from existing techniques for online convex optimisation. Besides this, for $k$-armed adversarial bandits we provide an efficient algorithm with regret that matches the best information-theoretic upper bound and improve best known regret guarantees for online linear optimisation on $\ell_p$-balls and bandits with graph feedback.
May-28-2019
- Country:
- Europe
- Denmark > Capital Region
- Copenhagen (0.04)
- France > Île-de-France
- Spain > Canary Islands (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Denmark > Capital Region
- Europe
- Genre:
- Research Report (0.50)
- Industry:
- Education (0.34)
- Technology: