A Minimalist Approach to Offline Reinforcement Learning

Scott Fujimoto (1, 2), Shixiang Shane Gu (2)
(1) Mila, McGill University; (2) Google Research, Brain Team
scott.fujimoto@mail.mcgill.ca

Neural Information Processing Systems 

We find that we can match the performance of state-of-the-art offline RL algorithms by simply adding a behavior cloning term to the policy update of an online RL algorithm and normalizing the data.
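The two ingredients named above, a behavior cloning term in the policy update and data normalization, can be illustrated with a minimal NumPy sketch. The summary does not specify the exact form of either, so everything here is an assumption made for illustration: the function names `normalize_states` and `policy_loss_with_bc`, the mean-squared-error form of the cloning term, the `alpha` weighting scaled by the mean absolute Q-value, and the small `eps` constant.

```python
import numpy as np


def normalize_states(states, eps=1e-3):
    """Standardize each state feature over the whole dataset.

    Assumption: per-feature zero mean and unit variance; eps avoids
    division by zero for constant features.
    """
    mean = states.mean(axis=0, keepdims=True)
    std = states.std(axis=0, keepdims=True) + eps
    return (states - mean) / std, mean, std


def policy_loss_with_bc(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Policy loss = scaled RL term + behavior cloning term.

    Assumption: the RL term maximizes Q (so it enters with a minus sign),
    and the cloning term is the mean squared error between the policy's
    actions and the actions stored in the dataset.
    """
    # Scale the RL term so its magnitude is comparable to the cloning term.
    lam = alpha / (np.abs(q_values).mean() + 1e-8)
    rl_term = -lam * q_values.mean()
    bc_term = np.mean((policy_actions - dataset_actions) ** 2)
    return rl_term + bc_term
```

In a real training loop the same computation would be written in an automatic differentiation framework so that the loss can be minimized with respect to the policy parameters; the NumPy version only shows how the two terms combine.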
