Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

May-27-2016–arXiv.org Machine Learning

I analyse the frequentist regret of the famous Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon. Remarkably, it turns out that this approach leads to finite-time regret guarantees comparable to those available for the popular UCB algorithm. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss some computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is a modest improvement on existing algorithms with finite-time regret guarantees such as UCB and Thompson sampling.

big data, Gittins index, health & medicine

Country:
- North America > Canada > Alberta (0.28)

Genre:
- Research Report > New Finding (0.66)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.55)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning (1.00)
    - Representation & Reasoning (1.00)
  - Data Science > Data Mining
    - Big Data (1.00)
