Conditional Importance Sampling for Off-Policy Learning

Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney

arXiv.org Machine Learning 

The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
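The idea of replacing an importance sampling ratio by a conditional expectation can be illustrated with a small, self-contained sketch. This is not the paper's code: the policies, the grouping function, and the target function below are invented for illustration, and the example is a one-step (bandit-style) estimate of E_pi[f(A)] from actions drawn under a behaviour policy mu. When f is measurable with respect to the conditioning statistic, the conditional ratio keeps the estimate unbiased while its variance can only decrease.

```python
import numpy as np

# Illustrative one-step setting (assumed, not from the paper):
# 4 discrete actions, target policy pi, behaviour policy mu.
pi = np.array([0.1, 0.4, 0.3, 0.2])      # target policy
mu = np.array([0.25, 0.25, 0.25, 0.25])  # behaviour policy
f = np.array([1.0, 1.0, 5.0, 5.0])       # f depends only on the group a // 2

rho = pi / mu                            # ordinary IS ratios pi(a) / mu(a)

# Conditional ratio: replace rho(a) by E_mu[rho(A) | g(A)], here g(a) = a // 2.
groups = np.array([0, 0, 1, 1])
rho_bar = np.array([
    (mu[groups == g] * rho[groups == g]).sum() / mu[groups == g].sum()
    for g in groups
])

target = (pi * f).sum()              # ground truth E_pi[f(A)]
est_is = (mu * rho * f).sum()        # E_mu[rho(A) f(A)]      -> unbiased
est_cis = (mu * rho_bar * f).sum()   # E_mu[rho_bar(A) f(A)]  -> still unbiased

# Exact per-sample variances under mu (no Monte Carlo noise):
var_is = (mu * (rho * f - target) ** 2).sum()
var_cis = (mu * (rho_bar * f - target) ** 2).sum()
```

Both estimators have expectation `target`, and `var_cis <= var_is`: averaging the ratio within each group is a Rao–Blackwell-style smoothing, which is the kind of variance reduction the framework in the paper studies in the sequential, multi-step setting.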
