Exploiting Unlabeled Data for Feedback Efficient Human Preference based Reinforcement Learning