Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Gaon An

Neural Information Processing Systems 

To this end, offline RL algorithms adopt either a constraint or a penalty term that explicitly guides the policy to stay close to the given dataset.
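
As a rough illustration of the penalty-term variant, the sketch below adds a squared-distance penalty between the policy's action and the logged dataset action to a standard "maximize Q" policy loss. This is a generic behavior-regularization example, not the method proposed in this paper. The function name `penalized_policy_loss`, the weight `alpha`, and the choice of squared Euclidean distance are all assumptions made for illustration.

```python
def penalized_policy_loss(q_values, policy_actions, dataset_actions, alpha=1.0):
    """Illustrative policy loss with a stay-close-to-data penalty.

    q_values:        Q(s, pi(s)) for each sampled state (list of floats)
    policy_actions:  pi(s) for each sampled state (list of action vectors)
    dataset_actions: logged action for each sampled state (list of action vectors)
    alpha:           hypothetical penalty weight; larger values keep the
                     policy closer to the dataset actions
    """
    total = 0.0
    for q, a_pi, a_data in zip(q_values, policy_actions, dataset_actions):
        # Squared distance between the policy's action and the logged action.
        penalty = sum((x - y) ** 2 for x, y in zip(a_pi, a_data))
        # Minimizing -Q maximizes value; the penalty discourages leaving the data.
        total += -q + alpha * penalty
    return total / len(q_values)


loss = penalized_policy_loss(
    q_values=[1.0, 2.0],
    policy_actions=[[0.0], [1.0]],
    dataset_actions=[[0.0], [0.0]],
    alpha=0.5,
)
```

With `alpha=0` the loss reduces to plain value maximization. Increasing `alpha` trades estimated value for proximity to the dataset. A constraint-based approach would instead bound the deviation rather than add it to the objective.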