Stochastic Multi-armed Bandits: Optimal Trade-off among Optimality, Consistency, and Tail Risk

Neural Information Processing Systems 

Such decaying rate is proved to be best achievable.