Rethinking DPO: The Role of Rejected Responses in Preference Misalignment
