Best Policy Learning from Trajectory Preference Feedback