Towards Understanding the Influence of Reward Margin on Preference Model Performance