How Reinforcement Learning After Next-Token Prediction Facilitates Learning