Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning
David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek
–Neural Information Processing Systems
Specifically,becauseaQfunctionis defined with respect toaparticular policy,constructingPˆQ requires selection ofareference policy or distribution over policies.
Neural Information Processing Systems
Feb-11-2026, 14:47:15 GMT
- Country:
- Technology: