Group Causal Policy Optimization for Post-Training Large Language Models
