Practical Evaluation and Optimization of Contextual Bandit Algorithms

Alberto Bietti, Alekh Agarwal, John Langford

arXiv.org Machine Learning 

We study and empirically optimize contextual bandit learning, exploration, and problem encodings across 500+ datasets, creating a reference for practitioners and discovering or reinforcing a number of natural open problems for researchers. Across these experiments we show that minimizing the amount of exploration is a key design goal for practical performance. Remarkably, many problems can be solved purely via the implicit exploration imposed by the diversity of contexts. For practitioners, we introduce a number of practical improvements to common exploration algorithms including Bootstrap Thompson sampling, Online Cover, and $\epsilon$-greedy. We also detail a new form of reduction to regression for learning from exploration data. Overall, this is a thorough study and review of contextual bandit methodology.
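To make the exploration-and-reduction idea concrete, here is a minimal sketch of $\epsilon$-greedy contextual bandit learning with a per-action regression model, one of the standard baselines discussed in the paper. This is not the authors' implementation; the linear reward model, synthetic data, learning rate, and all variable names are illustrative assumptions.

```python
# Minimal epsilon-greedy contextual bandit sketch (illustrative, not the paper's code).
# Assumptions: a linear reward model per action, synthetic contexts and rewards,
# and plain SGD updates on squared loss for the observed action only.
import numpy as np

rng = np.random.default_rng(0)

n_actions, n_features, n_rounds = 4, 10, 5000
epsilon, lr = 0.05, 0.05

# Hidden true reward weights, used only to simulate bandit feedback.
true_w = rng.normal(size=(n_actions, n_features))

# Learned per-action linear reward estimators.
w = np.zeros((n_actions, n_features))

total_reward = 0.0
for t in range(n_rounds):
    x = rng.normal(size=n_features)

    # Epsilon-greedy choice: explore uniformly with probability epsilon,
    # otherwise exploit the current reward estimates.
    if rng.random() < epsilon:
        a = int(rng.integers(n_actions))
    else:
        a = int(np.argmax(w @ x))

    # Bandit feedback: the reward is observed only for the chosen action.
    r = true_w[a] @ x + rng.normal(scale=0.1)
    total_reward += r

    # Regression update (squared-loss SGD) on the observed (context, reward) pair.
    w[a] += lr * (r - w[a] @ x) * x

print(f"average reward over {n_rounds} rounds: {total_reward / n_rounds:.3f}")
```

The sketch highlights the design point the abstract emphasizes: with a small (or even zero) $\epsilon$, most of the useful exploration can come implicitly from the diversity of contexts rather than from explicit randomization.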
