Gradient Imbalance in Direct Preference Optimization