Alignment as Distribution Learning: Your Preference Model is Explicitly a Language Model

Open in new window