Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards
