Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration