Tor Lattimore
Bandit Phase Retrieval
Tor Lattimore
We prove an upper bound on the minimax cumulative regret in this problem of (d p n), which matches known lower bounds up to logarithmic factors and improves on the best known upper bound by a factor of p d. We also show that the minimax simple regret is (d/ p n) and that this is only achievable by an adaptive algorithm. Our analysis shows that an apparently convincing heuristic for guessing lower bounds can be misleading and that uniform bounds on the information ratio for information-directed sampling [Russo and Van Roy, 2014] are not sufficient for optimal regret.
Bounded Regret for Finite-Armed Structured Bandits
Tor Lattimore, Remi Munos
We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms. We present a new algorithm for this general class of problems and show that under certain circumstances it is possible to achieve finite expected cumulative regret. We also give problemdependent lower bounds on the cumulative regret showing that at least in special cases the new algorithm is nearly optimal.
Bandit Phase Retrieval
Tor Lattimore
We prove an upper bound on the minimax cumulative regret in this problem of Θ(d n), which matches known lower bounds up to logarithmic factors and improves on the best known upper bound by a factor of d. We also show that the minimax simple regret is Θ(d/ n) and that this is only achievable by an adaptive algorithm. Our analysis shows that an apparently convincing heuristic for guessing lower bounds can be misleading and that uniform bounds on the information ratio for information-directed sampling [Russo and Van Roy, 2014] are not sufficient for optimal regret.
Connections Between Mirror Descent, Thompson Sampling and the Information Ratio
Julian Zimmert, Tor Lattimore
The information-theoretic analysis by Russo and Van Roy [25] in combination with minimax duality has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings. In most applications there is a tantalising similarity to the classical analysis based on mirror descent. We make a formal connection, showing that the information-theoretic bounds in most applications can be derived from existing techniques for online convex optimisation.