A Minimalist Approach to Offline Reinforcement Learning
Scott Fujimoto (1, 2), Shixiang Shane Gu (2)
(1) Mila, McGill University; (2) Google Research, Brain Team
scott.fujimoto@mail.mcgill.ca
Neural Information Processing Systems
We find that we can match the performance of state-of-the-art offline RL algorithms by simply adding a behavior cloning term to the policy update of an online RL algorithm and normalizing the data.
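To make the summary above concrete, here is a minimal sketch of the two ingredients it names: a behavior cloning term added to the policy (actor) loss, and normalization of the dataset states. The function names, the weight `alpha=2.5`, the rescaling of the Q term by its mean magnitude, and the small `eps` constant are assumptions for illustration and are not stated in this summary. Plain Python lists stand in for tensors so the example runs without any deep learning library.

```python
from statistics import fmean, pstdev


def normalize_states(states, eps=1e-3):
    """Standardize each state dimension with dataset-wide mean and std.

    `eps` (an assumed constant) avoids division by zero for constant features.
    """
    columns = list(zip(*states))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) + eps for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in states
    ]


def policy_loss_with_bc(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Actor loss = -(scaled Q term) + behavior cloning term.

    q_values:        Q(s, pi(s)) for each state in the batch
    policy_actions:  pi(s) for each state (list of action vectors)
    dataset_actions: actions stored in the offline dataset
    alpha:           assumed trade-off weight between RL and imitation
    """
    # Assumption: rescale the Q term by its mean magnitude so that `alpha`
    # has a comparable effect across tasks with different reward scales.
    lam = alpha / (fmean([abs(q) for q in q_values]) + 1e-8)
    rl_term = -lam * fmean(q_values)

    # Behavior cloning term: mean squared error to the dataset actions.
    sq_errors = [
        (p - a) ** 2
        for pa, da in zip(policy_actions, dataset_actions)
        for p, a in zip(pa, da)
    ]
    bc_term = fmean(sq_errors)
    return rl_term + bc_term


if __name__ == "__main__":
    states = normalize_states([[0.0, 10.0], [2.0, 30.0]])
    loss = policy_loss_with_bc([1.0, 1.0], [[0.5], [0.5]], [[1.5], [1.5]])
    print(states, loss)
```

In this sketch the behavior cloning term penalizes the policy for choosing actions far from those in the dataset, so the loss rises as `policy_actions` drift away from `dataset_actions` even when the Q estimates are unchanged.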
Aug-16-2025, 15:13:34 GMT
- Country:
- North America
- United States > California
- Santa Clara County > Palo Alto (0.04)
- Canada > Quebec
- Montreal (0.40)
- Genre:
- Research Report > New Finding (0.46)
- Technology: