The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
