CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models

Neural Information Processing Systems 

This paper introduces Completion Pruning Policy Optimization (CPPO) to accelerate the training of reasoning models based on Group Relative Policy Optimization (GRPO).
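This passage does not spell out how CPPO decides which completions to prune. The sketch below is therefore only an illustration under a stated assumption. It computes GRPO-style group-relative advantages, where each sampled completion's reward is normalized by the mean and standard deviation of its group. It then keeps only the completions whose advantages have the largest magnitude, on the reasoning that near-zero-advantage completions contribute little to the policy gradient. The helper names `group_relative_advantages` and `prune_completions`, the `keep` budget, the `eps` constant, and the use of the population standard deviation are all hypothetical choices made for this example. They are not taken from the paper.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward by its group's mean and std.

    Uses the population standard deviation; `eps` avoids division by zero
    when every completion in the group received the same reward.
    """
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def prune_completions(rewards, keep, eps=1e-8):
    """Hypothetical pruning rule: keep the `keep` completions with largest |advantage|.

    Returns the kept indices (in original order) and their advantages.
    Ties are broken by original position because Python's sort is stable.
    """
    adv = group_relative_advantages(rewards, eps)
    ranked = sorted(range(len(adv)), key=lambda i: abs(adv[i]), reverse=True)
    kept = sorted(ranked[:keep])
    return kept, [adv[i] for i in kept]


if __name__ == "__main__":
    # One prompt, six sampled completions with scalar rewards.
    rewards = [1.0, 0.0, 0.0, 0.0, 1.0, 0.5]
    kept, adv = prune_completions(rewards, keep=3)
    print(kept, [round(a, 3) for a in adv])
```

In this toy group, the completion with reward 0.5 sits closest to the group mean. It has the smallest advantage magnitude and is the first to be dropped, so only the retained completions would go through the gradient computation.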