Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
–Neural Information Processing Systems
We introduce Open-Reasoner-Zero, the first open-source implementation of large-scale reasoning-oriented RL training on the base model, focusing on scalability, simplicity, and accessibility. Through extensive experiments, we demonstrate that a minimalist approach, vanilla PPO with GAE (λ = 1, γ = 1) and straightforward rule-based rewards, without any KL regularization, is sufficient to scale up both benchmark performance and response length, replicating the scaling phenomenon observed in DeepSeek-R1-Zero.
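To make the GAE setting concrete, the following is a minimal sketch of the standard generalized advantage estimation recursion, not the authors' implementation. It assumes a single finished response with one reward per token, a value of zero after the final token, and a hypothetical reward of 1 on the last token standing in for a rule-based correctness check; the function name `gae_advantages` and the example numbers are illustrative only. With λ = 1 and γ = 1, each advantage reduces to the undiscounted sum of future rewards minus the value estimate at that token.

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    gamma: float = 1.0,
    lam: float = 1.0,
) -> List[float]:
    """Standard GAE recursion over one finished episode.

    Assumption: values[t] is the critic's estimate at step t, and the
    value after the final step is 0 because the episode has ended.
    """
    advantages = [0.0] * len(rewards)
    next_value = 0.0  # value after the terminal step
    running = 0.0     # accumulated GAE term from later steps
    for t in reversed(range(len(rewards))):
        # One-step TD error at step t.
        delta = rewards[t] + gamma * next_value - values[t]
        # GAE recursion: A_t = delta_t + gamma * lam * A_{t+1}.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# Hypothetical 4-token response: reward 1 only at the final token.
rewards = [0.0, 0.0, 0.0, 1.0]
values = [0.2, 0.4, 0.5, 0.9]
advantages = gae_advantages(rewards, values)

# With gamma = lam = 1 the result equals (sum of future rewards) - value.
returns_minus_values = [sum(rewards[t:]) - values[t] for t in range(len(rewards))]
```

In this setting the critic acts purely as a baseline: the TD terms telescope, so no bootstrapped value estimate enters the return itself.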
Jun-23-2026, 01:48:37 GMT
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Industry:
- Education (0.46)
- Technology: