Towards Efficient Online Exploration for Reinforcement Learning with Human Feedback