Thompson Sampling with Information Relaxation Penalties

Seungki Min (Columbia Business School), Costis Maglaras (Columbia Business School), Ciamac C. Moallemi (Columbia Business School)

Neural Information Processing Systems 

We consider a finite-horizon multi-armed bandit (MAB) problem in a Bayesian setting, for which we propose an information relaxation sampling framework.
