Multi-Step Generalized Policy Improvement by Leveraging Approximate Models

Jan-19-2025, 09:01:53 GMT–Neural Information Processing Systems

We introduce a principled method for performing zero-shot transfer in reinforcement learning (RL) by exploiting approximate models of the environment. Zero-shot transfer in RL has previously been studied using methods rooted in generalized policy improvement (GPI) and successor features (SFs). Although computationally efficient, these methods are model-free: they analyze a library of policies, each solving a particular task, and identify which action the agent should take. We investigate the more general setting where, in addition to a library of policies, the agent has access to an approximate environment model. Even though model-based RL algorithms can identify near-optimal policies, they are typically computationally intensive.
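To make the model-free baseline concrete, the following is a minimal sketch of standard GPI action selection with successor features, not the multi-step method this paper introduces. It assumes each library policy i supplies successor features psi_i(s, a) at the current state and that the new task's reward is linear in a weight vector w; the function name `gpi_action` and the toy numbers are hypothetical.

```python
import numpy as np


def gpi_action(psi, w):
    """Pick an action by generalized policy improvement (GPI).

    psi: array of shape (n_policies, n_actions, d) holding the successor
         features of every library policy at the current state (assumed given).
    w:   array of shape (d,) with the reward weights of the new task.
    """
    # Q-value of each (policy, action) pair on the new task: psi_i(s, a) . w
    q = psi @ w                      # shape (n_policies, n_actions)
    # GPI: take the best policy per action, then the best action overall.
    return int(np.argmax(q.max(axis=0)))


# Toy library: 2 policies, 2 actions, 2-dimensional features (made-up values).
psi = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # policy 0
    [[0.5, 0.5], [0.2, 2.0]],   # policy 1
])
w = np.array([1.0, 0.5])
action = gpi_action(psi, w)
```

The selection costs one matrix-vector product per state and never queries a model, which is the efficiency the abstract attributes to GPI-based methods.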

large language model, machine learning, reinforcement learning, (8 more...)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (0.60)
  - Natural Language > Large Language Model (0.51)