Parallelizing Thompson Sampling
Amin Karbasi, Vahab Mirrokni, Mohammad Shadravan
Many problems in machine learning and artificial intelligence are sequential in nature and require making decisions over a long period of time and under uncertainty. Examples include A/B testing [Graepel et al., 2010], hyper-parameter tuning [Kandasamy et al., 2018], adaptive experimental design [Berry and Fristedt, 1985], ad placement [Schwartz et al., 2017], clinical trials [Villar et al., 2015], and recommender systems [Kawale et al., 2015], to name a few. Bandit problems provide a simple yet expressive view of sequential decision making under uncertainty. In such problems, the learner and the environment play a repeated game in which, at each round, the learner selects an action, a so-called arm, and the environment then reveals a reward. The goal of the learner is to maximize the accumulated reward over a horizon T. The main challenge faced by the learner is that the environment is unknown, so the learner must follow a policy that strikes an efficient trade-off between exploration (i.e., trying new actions) and exploitation (i.e., choosing among actions already known to perform well).
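To make the exploration-exploitation trade-off concrete, the following is a minimal sketch of (sequential, non-parallel) Thompson Sampling for a Bernoulli bandit with Beta(1, 1) priors: each round, the learner samples a mean estimate from every arm's posterior, plays the arm with the largest sample, and updates that arm's posterior with the observed reward. The function name, the two-armed instance, and the horizon are illustrative choices, not taken from the paper.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Thompson Sampling for a Bernoulli bandit with Beta(1, 1) priors.

    true_means: unknown-to-the-learner success probability of each arm.
    horizon:    number of rounds T.
    Returns the total accumulated reward.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # posterior of arm i is Beta(successes[i]+1, failures[i]+1)
    failures = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Exploration and exploitation via posterior sampling: draw one
        # plausible mean per arm, then act greedily on the draws.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

# With one clearly better arm, the posterior of the inferior arm is
# sampled less and less often as evidence accumulates.
print(thompson_sampling([0.2, 0.8], horizon=1000))
```

Note that this loop is inherently sequential: round t+1's posterior depends on round t's reward, which is exactly the dependency that parallelizing Thompson Sampling must relax.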
June 2, 2021