Optimistic posterior sampling for reinforcement learning: worst-case regret bounds