Mind the Gap: Data Rewriting for Stable Off-Policy Supervised Fine-Tuning
