Multi-Step Generalized Policy Improvement by Leveraging Approximate Models
Lucas N. Alegre, Ana L. C. Bazzan, Ann Nowé

Neural Information Processing Systems 

We introduce a principled method for performing zero-shot transfer in reinforcement learning (RL) by exploiting approximate models of the environment. Zero-shot transfer in RL has been investigated by leveraging methods rooted in generalized policy improvement (GPI) and successor features (SFs). Although computationally efficient, these methods are model-free: they analyze a library of policies, each solving a particular task, and identify the action the agent should take. We investigate the more general setting in which, in addition to a library of policies, the agent has access to an approximate environment model. Even though model-based RL algorithms can identify near-optimal policies, they are typically computationally intensive.
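The model-free GPI action selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the array shapes, the function name `gpi_action`, and the tabular SF representation are all assumptions made for clarity.

```python
import numpy as np

def gpi_action(sf_library, state, w):
    """Pick the action GPI prescribes for a library of policies.

    sf_library: array of shape (n_policies, n_states, n_actions, d),
        where sf_library[i, s, a] holds the successor features
        psi^{pi_i}(s, a) of policy i (illustrative tabular layout).
    state: current state index.
    w: task weight vector of shape (d,), so Q^{pi_i}(s, a) = psi . w.
    """
    # Q-values of every library policy for each action at this state.
    q = sf_library[:, state] @ w        # shape (n_policies, n_actions)
    # GPI: act greedily w.r.t. the maximum over the policy library.
    return int(np.argmax(q.max(axis=0)))
```

The key point is that no environment model is consulted: the agent only evaluates the stored policies' SFs under the new task's reward weights `w` and takes the best action among them.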