Optimal Design for Reward Modeling in RLHF
