Adaptation to the Range in $K$-Armed Bandits

Nov-12-2020–arXiv.org Machine Learning

We consider stochastic bandit problems with $K$ arms, each associated with a bounded distribution supported on the range $[m,M]$. We do not assume that the range $[m,M]$ is known and show that there is a cost for learning this range. Indeed, a new trade-off between distribution-dependent and distribution-free regret bounds arises, which prevents from simultaneously achieving the typical $\ln T$ and \smash{$\sqrt{T}$} bounds. For instance, a \smash{$\sqrt{T}$} distribution-free regret bound may only be achieved if the distribution-dependent regret bounds are at least of order \smash{$\sqrt{T}$}. We exhibit a strategy achieving the rates for regret indicated by the new trade-off.

adaptation, artificial intelligence, big data, (18 more...)

arXiv.org Machine Learning

Nov-12-2020

arXiv.org PDF

Add feedback

Country:
- Europe (0.14)

Genre:
- Research Report (0.50)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Data Science > Data Mining
    - Big Data (0.49)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found