Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble
Gaon An
Neural Information Processing Systems
To this end, offline RL algorithms adopt either a constraint or a penalty term that explicitly guides the policy to stay close to the given dataset.
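As a rough illustration of the penalty-term idea in the sentence above, the sketch below scores a policy by its estimated Q-value minus a weighted squared distance between the policy's actions and the dataset's actions. This is a generic behavior-regularized objective, not the method proposed in this paper; the function name `penalized_actor_loss`, the weight `alpha`, and the squared-distance penalty are all assumptions made for the example.

```python
def penalized_actor_loss(q_values, policy_actions, dataset_actions, alpha=1.0):
    """Illustrative (hypothetical) penalized objective for offline RL.

    q_values:        list of Q-estimates for the policy's actions, one per sample
    policy_actions:  list of action vectors chosen by the policy
    dataset_actions: list of action vectors recorded in the dataset
    alpha:           assumed weight of the stay-close-to-data penalty
    """
    total = 0.0
    for q, a_pi, a_data in zip(q_values, policy_actions, dataset_actions):
        # Squared Euclidean distance between the policy action and the dataset action.
        penalty = sum((x - y) ** 2 for x, y in zip(a_pi, a_data))
        # Minimizing this term maximizes Q while discouraging deviation from the data.
        total += -q + alpha * penalty
    return total / len(q_values)


q = [1.0, 2.0]
data_actions = [[0.0, 0.0], [1.0, 1.0]]

# A policy that reproduces the dataset actions pays no penalty.
on_data = penalized_actor_loss(q, data_actions, data_actions)

# A policy that deviates on the first sample is penalized for it.
off_data = penalized_actor_loss(q, [[1.0, 0.0], [1.0, 1.0]], data_actions)
```

With equal Q-estimates, the policy that stays on the dataset actions gets the lower loss; raising `alpha` tightens how closely the policy is held to the data.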