Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
