DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Neural Information Processing Systems
Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in the OpenAI o1 blog and the DeepSeek R1 technical report), so the community still struggles to reproduce their RL training results.
Jun-13-2026, 15:52:27 GMT
- Technology: