Beyond Reward: Offline Preference-guided Policy Optimization
