Fast Asymptotically Optimal Algorithms for Non-Parametric Stochastic Bandits

Neural Information Processing Systems 

We consider the problem of regret minimization in non-parametric stochastic bandits. When the rewards are known to be bounded from above, there exist asymptotically optimal algorithms, whose asymptotic regret depends on an infimum of Kullback-Leibler (KL) divergences.
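
For readers unfamiliar with the quantity mentioned above, the following is a sketch of how such an infimum of KL divergences is usually written in the bandit literature. The notation is an assumption and is not taken from this abstract: $\mathcal{F}$ denotes the family of reward distributions bounded from above, $F_a$ and $\mu_a$ the distribution and mean of arm $a$, $\mu^\star$ the largest mean, and $R_T$ the expected regret after $T$ rounds.

```latex
% Assumed notation (not from the abstract): \mathcal{F}, F_a, \mu_a, \mu^\star, R_T.
% Infimum of KL divergences over distributions in the family with mean above \mu:
\[
  \mathcal{K}_{\inf}(F, \mu)
  = \inf \bigl\{ \mathrm{KL}(F, G) \;:\; G \in \mathcal{F},\ \mathbb{E}_{X \sim G}[X] > \mu \bigr\}.
\]
% An algorithm is called asymptotically optimal when its regret satisfies
\[
  \limsup_{T \to \infty} \frac{R_T}{\log T}
  \le \sum_{a \,:\, \mu_a < \mu^\star} \frac{\mu^\star - \mu_a}{\mathcal{K}_{\inf}(F_a, \mu^\star)},
\]
% which matches the classical asymptotic lower bound for this type of problem.
```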
