Influencing Humans to Conform to Preference Models for RLHF
