Bandits with Side Observations: Bounded vs. Logarithmic Regret

Rémy Degenne, Evrard Garcelon, Vianney Perchet

Jul-10-2018–arXiv.org Machine Learning

We consider the classical stochastic multi-armed bandit problem with one twist: from time to time, roughly with frequency $\epsilon$, the agent gathers an extra observation for free. We prove that, no matter how small $\epsilon$ is, the agent can ensure a regret that is uniformly bounded in time. More precisely, we construct an algorithm whose regret is smaller than $\sum_i \frac{\log(1/\epsilon)}{\Delta_i}$, up to a multiplicative constant and $\log\log$ terms. We also prove a matching lower bound, stating that no reasonable algorithm can outperform this quantity.
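To make the setting concrete, here is a minimal simulation sketch. It is not the algorithm constructed in the paper: it uses plain UCB1 as a stand-in learner, and it assumes that each free observation is a Bernoulli sample from an arm chosen uniformly at random, received independently with probability $\epsilon$ per round. The abstract does not specify how the free observations are chosen, so that is an illustrative assumption, and the names `regret_scale` and `simulate` are hypothetical. The helper `regret_scale` evaluates $\sum_i \frac{\log(1/\epsilon)}{\Delta_i}$ over the suboptimal arms, ignoring the multiplicative constant and $\log\log$ terms.

```python
import math
import random


def regret_scale(means, epsilon):
    """Sum over suboptimal arms of log(1/epsilon) / gap_i.

    Constants and loglog terms from the abstract's bound are ignored.
    """
    best = max(means)
    return sum(math.log(1.0 / epsilon) / (best - m) for m in means if m < best)


def simulate(means, epsilon, horizon, seed=0):
    """Run UCB1 (a stand-in, not the paper's algorithm) with free side observations.

    Assumption: with probability epsilon per round, the agent receives one
    free sample from a uniformly random arm. Returns the pseudo-regret.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of samples seen per arm (pulled or free)
    sums = [0.0] * k   # sum of observed rewards per arm
    best = max(means)
    regret = 0.0

    def observe(arm):
        # Draw a Bernoulli reward and update that arm's statistics.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    for t in range(1, horizon + 1):
        if 0 in counts:
            # Sample every arm at least once before using the index.
            arm = counts.index(0)
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        observe(arm)
        regret += best - means[arm]  # only pulled arms cost regret

        # Free side observation: updates statistics but costs no regret.
        if rng.random() < epsilon:
            observe(rng.randrange(k))

    return regret


if __name__ == "__main__":
    means = [0.9, 0.8, 0.5]
    for eps in (0.0, 0.01, 0.1):
        print(eps, simulate(means, eps, horizon=20000))
    print("scale for eps=0.01:", regret_scale(means, 0.01))
```

Running the script for several values of $\epsilon$ gives a rough feel for how free samples affect the learner's regret, though a single seeded run of a stand-in algorithm says nothing rigorous about the bounds stated above.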
