The Trickle-down Impact of Reward (In-)consistency on RLHF
