Preference Models assume Proportional Hazards of Utilities

Nagpal, Chirag

arXiv.org Machine Learning 

Modelling of human preferences is an important step in modern post-training pipelines for AI alignment. One popular approach to building such models of human preference is to assume that human preference rankings follow a Plackett-Luce (Plackett, 1975; Luce, 1959) distribution. In this monograph, I draw a somewhat remarkable connection between the Cox Proportional Hazards model (Cox, 1972), a popular statistical model for estimating lifetimes, and the Plackett-Luce model, and consequently to algorithms such as Direct Preference Optimization, a popular algorithm for aligning modern Artificial Intelligence (Ouyang et al., 2022). To the best of my knowledge, at the time of writing the connection between the Proportional Hazards model and the Plackett-Luce model is relatively little known, and the subsequent connections to AI alignment algorithms such as Direct Preference Optimization (Rafailov et al., 2023) are not well appreciated. I believe that explicitly stating this connection will help the AI research community draw on existing research in semi-parametric statistics to build better models of human preference.
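
To make the stated connection concrete, the following is a brief sketch of the shared likelihood form; the notation below is mine rather than the paper's. Writing $r(y)$ for the latent utility of a response $y$, the Plackett-Luce probability of observing the ranking $y_1 \succ y_2 \succ \cdots \succ y_K$ is

\[ P(y_1 \succ y_2 \succ \cdots \succ y_K) = \prod_{k=1}^{K} \frac{\exp\{r(y_k)\}}{\sum_{j=k}^{K} \exp\{r(y_j)\}}, \]

which has exactly the form of the Cox partial likelihood, where the log-hazard ratio $x_{(i)}^\top \beta$ of the subject failing at time $t_i$ plays the role of the utility, and the risk set $R(t_i)$ shrinks after each event just as the Plackett-Luce choice set shrinks after each selection:

\[ L(\beta) = \prod_{i} \frac{\exp\{x_{(i)}^\top \beta\}}{\sum_{j \in R(t_i)} \exp\{x_j^\top \beta\}}. \]

For pairwise comparisons ($K = 2$) the Plackett-Luce likelihood reduces to the Bradley-Terry form $\sigma(r(y_w) - r(y_l))$, which is the preference probability that Direct Preference Optimization fits.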
