Fast Asymptotically Optimal Algorithms for Non-Parametric Stochastic Bandits
Neural Information Processing Systems
We consider the problem of regret minimization in non-parametric stochastic bandits. When the rewards are known to be bounded from above, there exist asymptotically optimal algorithms, whose asymptotic regret depends on an infimum of Kullback-Leibler (KL) divergences.
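The "infimum of KL divergences" is usually written K_inf(F, mu) = inf { KL(F, G) : G in the model, E_G[X] > mu }, and in the classical form of the lower bound each suboptimal arm k contributes Delta_k / K_inf(F_k, mu*) to the coefficient of log T. For rewards bounded above by a known B, K_inf admits the dual form K_inf(F, mu) = max over 0 <= lam <= 1/(B - mu) of E_F[log(1 - lam (X - mu))], known from the work of Honda and Takemura. The sketch below evaluates this dual on an empirical distribution. It is an illustration under that assumption, not the paper's algorithm; the function name `kinf_upper_bounded` and the choice of ternary search are hypothetical. Ternary search is valid here because the objective is concave in lam (a sum of logs of affine functions).

```python
import math
from typing import Sequence


def kinf_upper_bounded(samples: Sequence[float], mu: float, upper: float,
                       iters: int = 200) -> float:
    """Hypothetical helper: empirical K_inf(F_n, mu) for rewards bounded above
    by a known `upper`, using the dual form
        K_inf(F, mu) = max_{0 <= lam <= 1/(upper - mu)} E_F[log(1 - lam * (X - mu))].
    """
    if not samples:
        raise ValueError("need at least one sample")
    if mu >= upper:
        # No distribution bounded above by `upper` has mean > mu, so the infimum is +inf.
        return math.inf
    if any(x > upper for x in samples):
        raise ValueError("samples exceed the upper bound")
    mean = sum(samples) / len(samples)
    if mean >= mu:
        # The empirical mean already reaches mu: the infimum is 0.
        return 0.0

    def objective(lam: float) -> float:
        # Empirical average of log(1 - lam * (x - mu)); -inf where the log argument is <= 0,
        # which can only happen at the right endpoint lam = 1/(upper - mu) when some x == upper.
        total = 0.0
        for x in samples:
            arg = 1.0 - lam * (x - mu)
            if arg <= 0.0:
                return -math.inf
            total += math.log(arg)
        return total / len(samples)

    # Ternary search for the maximum of a concave function on [0, 1/(upper - mu)].
    lo, hi = 0.0, 1.0 / (upper - mu)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return max(0.0, objective((lo + hi) / 2))


if __name__ == "__main__":
    # Two-point sample on {0, 1} with upper bound 1 and target mean 0.7.
    print(kinf_upper_bounded([0.0, 1.0], mu=0.7, upper=1.0))
```

For samples supported on {0, 1} with upper bound 1, the dual optimum can be computed in closed form and coincides with the Bernoulli divergence kl(p, mu), where p is the fraction of ones; the example above therefore evaluates kl(0.5, 0.7).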
Feb-9-2026, 02:49:37 GMT