The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
Neural Information Processing Systems
Reinforcement learning with verifiable rewards (RLVR) is a promising approach for training language models (LMs) on reasoning tasks that elicit emergent long chains of thought (CoTs).
Jun-22-2026, 06:58:00 GMT
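The minimal Python sketch below makes the "verifiable reward" in RLVR concrete. It assumes a math-style task where the model writes its final answer as `\boxed{...}`, and it checks that answer by exact string match against a reference. The answer format, the +1/-1 reward scale, and the helper names (`extract_answer`, `verifiable_reward`, `split_by_reward`) are hypothetical and do not come from the source. Splitting sampled completions by the sign of their reward only illustrates what "negative" samples are under this assumed reward. It is not the authors' training method.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical convention: the final answer appears inside \boxed{...}.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a completion, or None."""
    matches = BOXED.findall(completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward from an automatic check (assumed scale: +1 correct, -1 otherwise).

    A completion with no extractable answer counts as incorrect.
    """
    answer = extract_answer(completion)
    return 1.0 if answer is not None and answer == reference.strip() else -1.0


def split_by_reward(
    completions: List[str], reference: str
) -> Tuple[List[str], List[str]]:
    """Partition sampled completions into positively and negatively rewarded sets."""
    positives: List[str] = []
    negatives: List[str] = []
    for c in completions:
        target = positives if verifiable_reward(c, reference) > 0 else negatives
        target.append(c)
    return positives, negatives


if __name__ == "__main__":
    samples = [
        "Adding the terms gives \\boxed{42}",
        "I think it is \\boxed{41}",
        "No final answer given.",
    ]
    pos, neg = split_by_reward(samples, "42")
    print(len(pos), len(neg))
```

Exact string match is the simplest possible verifier. Practical checkers often normalize answers first (whitespace, equivalent numeric or symbolic forms) so that correct answers written differently are not penalized.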