$f$-PO: Generalizing Preference Optimization with $f$-divergence Minimization
