Mind the Gap: Data Rewriting for Stable Off-Policy Supervised Fine-Tuning