When does return-conditioned supervised learning work for offline reinforcement learning?

Oct-9-2024, 14:02:33 GMT–Neural Information Processing Systems

Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL). RCSL algorithms learn the distribution of actions conditioned on both the state and the return of the trajectory. They then define a policy by conditioning on achieving a high return. In this paper, we provide a rigorous study of the capabilities and limitations of RCSL, something that is crucially missing from previous work. We find that RCSL returns the optimal policy under a set of assumptions that are stronger than those needed for the more traditional dynamic programming-based algorithms.
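
The two RCSL steps described above (fit p(a | s, g), then act by conditioning on a high g) can be illustrated with a minimal tabular sketch. This is not the paper's implementation. Several details are illustrative assumptions: the trajectory format, the helper names `fit_rcsl` and `rcsl_policy`, the use of counting in place of a learned conditional model, and the choice of the highest return seen in the data as the target. The toy dataset is deterministic and made up. It shows the intended contrast with plain behavior cloning: the majority action in the data is "left", but conditioning on the high return selects "right".

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Tuple

# Hypothetical data format: a trajectory is a list of (state, action, reward) steps.
Step = Tuple[Hashable, Hashable, float]
Trajectory = List[Step]


def fit_rcsl(trajectories: List[Trajectory]) -> Dict[Tuple[Hashable, float], Counter]:
    """Estimate p(a | s, g) by counting, where g is the return of the whole trajectory."""
    counts: Dict[Tuple[Hashable, float], Counter] = defaultdict(Counter)
    for traj in trajectories:
        g = sum(r for _, _, r in traj)  # undiscounted return of this trajectory
        for s, a, _ in traj:
            counts[(s, g)][a] += 1  # every step is labeled with its trajectory's return
    return counts


def rcsl_policy(counts, state, target_return) -> Optional[Hashable]:
    """Act greedily under p(a | s, g = target_return); None if (s, g) never appears in the data."""
    dist = counts.get((state, target_return))  # .get does not insert a default entry
    if not dist:
        return None
    return dist.most_common(1)[0][0]


# Toy offline dataset (made up): one-step episodes from a single state "s0".
data = [
    [("s0", "left", 0.0)],
    [("s0", "right", 1.0)],
    [("s0", "left", 0.0)],
]
model = fit_rcsl(data)
best_g = max(sum(r for _, _, r in t) for t in data)  # condition on the highest observed return
print(rcsl_policy(model, "s0", best_g))  # prints: right
```

Exact float keys work here only because the toy returns are discrete. A practical RCSL method would replace the count table with a neural model that takes g as an input. Also, this example is deterministic by construction. As the abstract notes, RCSL's optimality guarantee depends on assumptions stronger than those needed by dynamic programming-based methods, so a sketch like this should not be expected to behave well in general.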

algorithm, offline reinforcement learning, return-conditioned supervised learning work

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (0.66)
  - Inductive Learning (0.66)