Provable Reward-Agnostic Preference-Based Reinforcement Learning
