What's In My Human Feedback? Learning Interpretable Descriptions of Preference Data
