Teaching Large Language Models to Reason with Reinforcement Learning