Learning a Pessimistic Reward Model in RLHF
