2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
