Meta-Thompson Sampling

Kveton, Branislav, Konobeev, Mikhail, Zaheer, Manzil, Hsu, Chih-wei, Mladenov, Martin, Boutilier, Craig, Szepesvari, Csaba

Feb-11-2021–arXiv.org Machine Learning

Efficient exploration in multi-armed bandits is a fundamental online learning problem. In this work, we propose a variant of Thompson sampling that learns to explore better as it interacts with problem instances drawn from an unknown prior distribution. Our algorithm meta-learns the prior and thus we call it Meta-TS. We propose efficient implementations of Meta-TS and analyze it in Gaussian bandits. Our analysis shows the benefit of meta-learning the prior and is of a broader interest, because we derive the first prior-dependent upper bound on the Bayes regret of Thompson sampling. This result is complemented by empirical evaluation, which shows that Meta-TS quickly adapts to the unknown prior.

big data, educational setting, thompson, (18 more...)

arXiv.org Machine Learning

Feb-11-2021

arXiv.org PDF

Add feedback

Country:
- North America > Canada > Alberta (0.14)

Genre:
- Research Report (0.64)

Industry:
- Education > Educational Setting (0.34)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Data Science > Data Mining
    - Big Data (0.67)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found