Training-Free Group Relative Policy Optimization
