Prior-free and prior-dependent regret bounds for Thompson Sampling
–Neural Information Processing Systems
We consider the stochastic multi-armed bandit problem with a prior distribution on the reward distributions. We are interested in studying prior-free and priordependent regret bounds, very much in the same spirit than the usual distributionfree and distribution-dependent bounds for the non-Bayesian stochastic bandit. We first show that Thompson Sampling attains an optimal prior-free bound in the sense that for any prior distribution its Bayesian regret is bounded from above by 14 nK.
Neural Information Processing Systems
Mar-13-2024, 15:52:48 GMT
- Technology: