Preference Learning Algorithms Do Not Learn Preference Rankings

Neural Information Processing Systems 

Preference learning algorithms (e.g., RLHF and DPO) are frequently used to steer LLMs to produce generations that are more preferred by humans, but our understanding of their inner workings is still limited.
