Multi-Step Generalized Policy Improvement by Leveraging Approximate Models
Lucas N. Alegre, Ana L. C. Bazzan, Ann Nowé
–Neural Information Processing Systems
We introduce a principled method for performing zero-shot transfer in reinforcement learning (RL) by exploiting approximate models of the environment. Zero-shot transfer in RL has been investigated by leveraging methods rooted in generalized policy improvement (GPI) and successor features (SFs). Although computationally efficient, these methods are model-free: they analyze a library of policies, each solving a particular task, and identify which action the agent should take. We investigate the more general setting where, in addition to a library of policies, the agent has access to an approximate environment model. Even though model-based RL algorithms can identify near-optimal policies, they are typically computationally intensive.
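To make the model-free step concrete, here is a minimal sketch of the standard one-step GPI action rule with successor features: each policy's action values are computed as the dot product of its successor features with a task weight vector, and the agent picks the action whose best value across the library is highest. This illustrates only the GPI/SF baseline the abstract refers to, not the multi-step, model-based method the paper introduces. The function names, the array layout, and the numbers in `psi` are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def sf_q_values(psi: Sequence[Sequence[Sequence[float]]],
                w: Sequence[float]) -> List[List[float]]:
    """Return q[i][a] = psi[i][a] . w for the current state.

    psi[i][a] is the (assumed, illustrative) successor-feature vector of
    policy i for action a; w is the weight vector describing the target task.
    """
    return [
        [sum(p * wj for p, wj in zip(psi_ia, w)) for psi_ia in psi_i]
        for psi_i in psi
    ]


def gpi_action(q_values: Sequence[Sequence[float]]) -> int:
    """One-step GPI: argmax over actions of the max over policies."""
    n_actions = len(q_values[0])
    # For each action, keep the best value any policy in the library offers.
    best_per_action = [max(q[a] for q in q_values) for a in range(n_actions)]
    # Pick the action with the highest of those values.
    return max(range(n_actions), key=lambda a: best_per_action[a])


# Hypothetical library: 2 policies, 3 actions, 2-dimensional features.
psi = [
    [[1.0, 0.0], [0.5, 0.5], [0.0, 0.2]],  # policy 0
    [[0.0, 1.0], [0.2, 0.9], [0.1, 0.1]],  # policy 1
]

action_task_a = gpi_action(sf_q_values(psi, [0.0, 1.0]))
action_task_b = gpi_action(sf_q_values(psi, [1.0, 1.0]))
```

Changing only the task vector `w` changes the selected action without retraining any policy, which is what makes this rule usable for zero-shot transfer.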
Feb-10-2025, 22:49:46 GMT