In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning
