2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision