Towards Understanding the Influence of Reward Margin on Preference Model Performance
