Prior-free and prior-dependent regret bounds for Thompson Sampling

Mar-13-2024, 15:52:48 GMT–Neural Information Processing Systems

We consider the stochastic multi-armed bandit problem with a prior distribution on the reward distributions. We are interested in studying prior-free and priordependent regret bounds, very much in the same spirit than the usual distributionfree and distribution-dependent bounds for the non-Bayesian stochastic bandit. We first show that Thompson Sampling attains an optimal prior-free bound in the sense that for any prior distribution its Bayesian regret is bounded from above by 14 nK.

best arm, reward distribution, thompson sampling, (14 more...)

Neural Information Processing Systems

Mar-13-2024, 15:52:48 GMT

Conferences PDF

Add feedback

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Data Science > Data Mining
    - Big Data (1.00)