Worst-Case Regret Bounds for Exploration via Randomized Value Functions
–Neural Information Processing Systems
This paper studies a recent proposal to use randomized value functions to drive exploration in reinforcement learning.
Feb-12-2026, 01:03:50 GMT