Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
