GVPO: Group Variance Policy Optimization for Large Language Model Post-Training
