Benchmarks and Algorithms for Offline Preference-Based Reward Learning