PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment
