Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
