Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward
