Optimistic posterior sampling for reinforcement learning: worst-case regret bounds
