Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Neural Information Processing Systems
We show that reinforcement learning with verifiable reward using one training example (1-shot RLVR) is effective in incentivizing the mathematical reasoning capabilities of large language models (LLMs).
Jun-21-2026, 23:32:23 GMT
- Country:
  - North America > United States > California (0.28)
- Genre:
  - Research Report
    - New Finding (1.00)
    - Experimental Study (1.00)
- Industry:
  - Education (0.45)
- Technology: