Principled Foundations for Preference Optimization

Wenxuan Zhou, Shujian Zhang, Brice Magdalou, John Lambert, Ehsan Amid, Richard Nock, Andrew Hard

arXiv.org Artificial Intelligence

The connection is established for all of Savage's […] DPO framework to generalize its functional parts (Alfano et al., 2025; Azar et al., 2024; Chen et al., […]). The latter involves elements from Doignon-Falmagne's stochastic choice […]. These many design elements lead to a generalization that makes the most of the connection, since we encompass all of properness on Savage's side (regardless of optional properties like symmetry, […]). We also encompass all of the modelling's power on the Krantz, Luce, Suppes and […] side. Notably, our level of generalization is able to support "for free" important […]. This is an important task because DPO was designed with the objective of simplifying RLHF, and getting "above" DPO is necessary to improve results by gaining more freedom on reward shapes, trajectories and preference behaviours (Gupta et al., 2025), all of which needs to be done while […]. One perhaps unexpected pitfall comes from the RLHF/DPO-inherited "gold […]". To preserve readability, all proofs are given in an appendix. We adopt many definitions from Rafailov et al. (2023).
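For context, the standard DPO objective of Rafailov et al. (2023) — the starting point this paper generalizes — can be sketched as below. This is a minimal per-example sketch, not the paper's generalized loss; the log-probabilities of the preferred and rejected responses (under the trained policy and a frozen reference policy) are assumed to be given.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * implicit-reward margin).

    logp_w / logp_l         : policy log-probs of the preferred (w) / rejected (l) response
    ref_logp_w / ref_logp_l : the same log-probs under the frozen reference policy
    beta                    : temperature controlling deviation from the reference policy
    """
    # Implicit rewards are the log-ratios to the reference policy.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log sigmoid(margin), written via log1p for numerical stability.
    return math.log1p(math.exp(-margin))
```

When policy and reference agree, the margin is zero and the loss is log 2; the loss shrinks as the policy assigns relatively more mass to the preferred response. The paper's point is that this sigmoid/log-ratio pairing is only one instance of a much broader design space of proper losses and stochastic-choice models.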