Thompson Sampling with Information Relaxation Penalties

Neural Information Processing Systems 

We consider a finite-horizon multi-armed bandit (MAB) problem in a Bayesian setting, for which we propose an information relaxation sampling framework.
