Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

Shipra Agrawal, Randy Jia

Neural Information Processing Systems 
