UFT: Unifying Supervised and Reinforcement Fine-Tuning
