Thompson Sampling with Information Relaxation Penalties
–Neural Information Processing Systems
We consider a finite-horizon multi-armed bandit (MAB) problem in a Bayesian setting, for which we propose an information relaxation sampling framework.
Feb-14-2026, 19:58:27 GMT