Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback