Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering

Open in new window