Group Robust Preference Optimization in Reward-free RLHF
