Principled Foundations for Preference Optimization
Wenxuan Zhou, Shujian Zhang, Brice Magdalou, John Lambert, Ehsan Amid, Richard Nock, Andrew Hard
The connection is established for all of Savage's losses, building on recent work that extends the DPO framework to generalize its functional parts (Alfano et al., 2025; Azar et al., 2024; Chen et al.). The latter involves elements from Doignon-Falmagne's stochastic choice theory. These design elements lead to a generalization that makes the most of the connection: we encompass all of properness on Savage's side (regardless of optional properties like symmetry), and all of the modelling power on the side of Krantz, Luce, Suppes and Tversky. Notably, our level of generalization is able to support "for free" important extensions of the DPO setting. This is an important task because DPO was designed to simplify RLHF, and getting "above" DPO is necessary to improve results by gaining more freedom on reward shapes, trajectories and preference behaviours (Gupta et al., 2025). One perhaps unexpected pitfall comes from the "gold" reward assumption inherited from RLHF/DPO. To preserve readability, all proofs are given in an appendix. We adopt many definitions from Rafailov et al. (2023).
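For context, here is a minimal sketch of the standard DPO objective from Rafailov et al. (2023) that the paper starts from and generalizes; the function name and tensor arguments below are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss (Rafailov et al., 2023).

    Each argument holds summed token log-probabilities for a batch of
    (prompt, chosen, rejected) triples; beta scales the implicit reward,
    i.e. the log policy/reference probability ratio.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Negative log-sigmoid of the reward margin: the logistic pairwise loss.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

DPO fixes the pairwise link to this logistic (log-sigmoid) loss; the paper's framework covers the full family of Savage's proper losses, of which this is one instance.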
arXiv.org Artificial Intelligence
Aug-6-2025
- Country:
- Asia > Thailand
- Europe
- Austria > Vienna (0.14)
- France > Occitanie
- Hérault > Montpellier (0.04)
- Italy > Calabria
- Catanzaro Province > Catanzaro (0.04)
- Spain > Valencian Community
- Valencia Province > Valencia (0.04)
- United Kingdom > England
- Oxfordshire > Oxford (0.04)
- North America
- Canada > British Columbia
- United States
- Florida > Miami-Dade County
- Miami (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Maryland > Baltimore (0.04)
- New York (0.04)
- Genre:
- Research Report (0.82)