Sub-sampling for Efficient Non-Parametric Bandit Exploration

Neural Information Processing Systems 

In this paper, we propose the first re-sampling-based algorithm that is asymptotically optimal for several classes of possibly unbounded parametric distributions.