Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

Neural Information Processing Systems 

Specifically, we extend the lower bound of Alfers and Dinges [1984] for Beta distributions, which is based on a normal approximation, to Dirichlet distributions.