Preference Learning Algorithms Do Not Learn Preference Rankings
Neural Information Processing Systems
Preference learning algorithms (e.g., RLHF and DPO) are frequently used to steer LLMs to produce generations that are more preferred by humans, but our understanding of their inner workings is still limited.
Nov-20-2025, 02:47:26 GMT