Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek

Neural Information Processing Systems 

Specifically, because a Q function is defined with respect to a particular policy, constructing $\hat{P}_Q$ requires selection of a reference policy or distribution over policies.
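As a toy illustration of why a reference policy is needed, the sketch below runs tabular policy evaluation of $Q^\pi$ on one small MDP under two different policies. It is not taken from the paper. The two-state MDP, its rewards, the discount `GAMMA = 0.9`, and the names `step` and `evaluate_q` are all hypothetical. In the MDP, action `a` moves deterministically to state `a`, and a reward of 1 is paid only for taking action 1 in state 1. The same state-action pair `(1, 1)` receives value 1 under "always take action 0" and value 10 under "always take action 1". So a distribution over Q functions is only well defined once the policy, or distribution over policies, it refers to has been fixed.

```python
from typing import Dict, List, Tuple

# Hypothetical toy MDP used only for illustration (not from the paper).
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]


def step(s: int, a: int) -> Tuple[int, float]:
    """Deterministic transition: action a moves to state a.
    Reward is 1 only for taking action 1 in state 1."""
    reward = 1.0 if (s == 1 and a == 1) else 0.0
    return a, reward


def evaluate_q(policy: Dict[int, List[float]], tol: float = 1e-10) -> Dict[Tuple[int, int], float]:
    """Iterative policy evaluation of Q^pi via the Bellman expectation backup:
    Q(s, a) <- r(s, a) + gamma * sum_a' pi(a' | s') * Q(s', a').
    policy[s][a] is the probability of choosing action a in state s."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    while True:
        delta = 0.0
        new_q = {}
        for s in STATES:
            for a in ACTIONS:
                s_next, r = step(s, a)
                # Expected next-state value under the evaluated policy.
                v_next = sum(policy[s_next][a2] * q[(s_next, a2)] for a2 in ACTIONS)
                new_q[(s, a)] = r + GAMMA * v_next
                delta = max(delta, abs(new_q[(s, a)] - q[(s, a)]))
        q = new_q
        if delta < tol:
            return q


# Two deterministic reference policies on the same MDP.
always_0 = {0: [1.0, 0.0], 1: [1.0, 0.0]}
always_1 = {0: [0.0, 1.0], 1: [0.0, 1.0]}

q0 = evaluate_q(always_0)
q1 = evaluate_q(always_1)
print(round(q0[(1, 1)], 6), round(q1[(1, 1)], 6))  # prints: 1.0 10.0
```

The MDP, its rewards, and the transitions are identical in both runs. Only the policy that the Q function is defined with respect to changes, and with it the value assigned to `(1, 1)`.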