Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training
