Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved)