Thompson Sampling with Information Relaxation Penalties Seungki Min Columbia Business School Costis Maglaras Columbia Business School Ciamac C. Moallemi Columbia Business School
–Neural Information Processing Systems
We consider a finite-horizon multi-armed bandit (MAB) problem in a Bayesian setting, for which we propose an information relaxation sampling framework.
Neural Information Processing Systems
Aug-20-2025, 07:27:44 GMT