Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization