RRM: Robust Reward Model Training Mitigates Reward Hacking
