Optimal Design for Reward Modeling in RLHF