DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Jun-21-2026, 04:16:33 GMT–Neural Information Processing Systems

Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (for example, in the OpenAI o1 blog and the DeepSeek R1 technical report), so the community still struggles to reproduce their RL training results. We propose the Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO) algorithm and fully open-source a state-of-the-art large-scale RL system that achieves 50 points on AIME 2024 using Qwen2.5-32B.

large language model, machine learning, natural language

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Education (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)