Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration
