SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
–Neural Information Processing Systems
Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for training large language models (LLMs) on complex reasoning tasks, such as mathematical problem solving. A prerequisite for the scalability of RLVR is a high-quality problem set with precise and verifiable answers.
Jun-17-2026, 07:03:27 GMT
- Country:
- North America > United States > California (0.28)
- Genre:
- Overview (0.67)
- Research Report
- Experimental Study (1.00)
- New Finding (0.67)
- Industry:
- Education > Educational Setting (0.67)
- Technology: