Fixed-Confidence Guarantees for Bayesian Best-Arm Identification

Shang, Xuedong, de Heide, Rianne, Kaufmann, Emilie, Ménard, Pierre, Valko, Michal

arXiv.org Machine Learning 

In particular, we justify its use for fixed-confidence best-arm identification . We further propose a variant of TTTS called Top-Two Transportation Cost ( T3C), which disposes of the computational burden of TTTS . As our main contribution, we provide the first sample complexity analysis of TTTS and T3C when coupled with a very natural Bayesian stopping rule, for bandits with Gaussian rewards, solving one of the open questions raised by Russo (2016). We also provide new posterior convergence results for TTTS under two models that are commonly used in practice: bandits with Gaussian and Bernoulli rewards and conjugate priors. 1 Introduction In multi-armed bandits, a learner repeatedly chooses an arm to play, and receives a reward from the associated unknown probability distribution. When the task is best-arm identification (BAI), the learner is not only asked to sample an arm at each stage, but is also asked to output a recommendation (i.e., a guess for the arm with the largest mean reward) after a certain period.