Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback

Open in new window