Preference-free Alignment Learning with Regularized Relevance Reward
