Residual Reward Models for Preference-based Reinforcement Learning
