Bridging Supervised Learning and Reinforcement Learning in Math Reasoning