On the optimization dynamics of RLVR: Gradient gap and step size thresholds
