On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models
