Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning