Bandits with many optimal arms

Rianne de Heide, James Cheshire, Pierre Ménard, Alexandra Carpentier

Mar-23-2021, arXiv.org Machine Learning

We consider a stochastic bandit problem with a possibly infinite number of arms. We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal learning rates in both the cumulative regret setting and the best-arm identification setting, in terms of the problem parameters $T$ (the budget), $p^*$, and $\Delta$. For the objective of minimizing the cumulative regret, we provide a lower bound of order $\Omega(\log(T)/(p^*\Delta))$ and a UCB-style algorithm whose upper bound matches it up to a factor of $\log(1/\Delta)$. Our algorithm needs $p^*$ to calibrate its parameters, and we prove that this knowledge is necessary, since adapting to $p^*$ in this setting is impossible. For best-arm identification, we provide a lower bound of order $\Omega(\exp(-cT\Delta^2p^*))$ on the probability of outputting a sub-optimal arm, where $c>0$ is an absolute constant. We also provide an elimination algorithm with an upper bound matching the lower bound up to a factor of order $\log(1/\Delta)$ in the exponential, and that does not need $p^*$ or $\Delta$ as parameters.
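To make the role of $p^*$ in the regret setting concrete, here is a minimal Python sketch of a subsample-then-UCB strategy. It is not the authors' algorithm. It draws $\lceil \log(T)/p^* \rceil$ arms, so that the chance of drawing no optimal arm is at most $(1-p^*)^{\log(T)/p^*} \le 1/T$, and then runs the standard UCB1 index on those arms. The function name `subsampled_ucb`, the two-point reservoir (means `mu_star` or `mu_star - delta`), the Bernoulli rewards, and the default values are all assumptions made for illustration.

```python
import math
import random


def subsampled_ucb(horizon, p_star, delta, mu_star=0.8, seed=0):
    """Illustrative sketch only: subsample arms, then run standard UCB1.

    Assumptions (not from the paper): a two-point reservoir where an arm is
    optimal (mean mu_star) with probability p_star and otherwise has mean
    mu_star - delta, Bernoulli rewards, and the classical UCB1 index.
    """
    rng = random.Random(seed)

    # Draw about log(T)/p* arms, so that at least one optimal arm is
    # included with probability at least 1 - 1/T.
    n_arms = max(1, math.ceil(math.log(horizon) / p_star))
    means = [mu_star if rng.random() < p_star else mu_star - delta
             for _ in range(n_arms)]

    counts = [0] * n_arms   # number of pulls per sampled arm
    sums = [0.0] * n_arms   # cumulative reward per sampled arm
    regret = 0.0            # pseudo-regret against the true optimum mu_star

    def ucb_index(i, t):
        # Empirical mean plus the UCB1 exploration bonus.
        return sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initial round-robin: pull each sampled arm once
        else:
            arm = max(range(n_arms), key=lambda i: ucb_index(i, t))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += mu_star - means[arm]

    return regret, counts, means
```

Calling `subsampled_ucb(horizon=2000, p_star=0.2, delta=0.3)` returns the accumulated pseudo-regret, the pull counts, and the sampled means. The sketch takes `p_star` as an input, mirroring the abstract's point that this knowledge is needed to calibrate the algorithm.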



