Sample-Efficient Reinforcement Learning from Human Feedback via Information-Directed Sampling