sGPO: Trading Inference FLOPs for Training Efficiency in RLVR
