Prototypical Reward Network for Data-Efficient RLHF
