G$^2$RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance
