CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models
Neural Information Processing Systems
This paper introduces Completion Pruning Policy Optimization (CPPO) to accelerate the training of reasoning models based on Group Relative Policy Optimization (GRPO).
Jun-23-2026, 07:45:01 GMT
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.68)
- Industry:
- Education (0.46)
- Technology: