Worst-Case Regret Bounds for Exploration via Randomized Value Functions

Daniel Russo

Neural Information Processing Systems

This paper studies a recent proposal to use randomized value functions to drive exploration in reinforcement learning.