RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning

Neural Information Processing Systems

Offline reinforcement learning (RL) aims to find performant policies from logged data without further environment interaction. Model-based algorithms, which learn a model of the environment from the dataset and perform conservative policy optimisation within that model, have emerged as a promising approach to this problem.