Preference Learning Algorithms Do Not Learn Preference Rankings

Neural Information Processing Systems

Preference learning algorithms (e.g., RLHF and DPO) are frequently used to steer LLMs toward generations that humans prefer, but our understanding of their inner workings remains limited.