Active teacher selection for reinforcement learning from human feedback

Open in new window