Group-in-Group Policy Optimization for LLM Agent Training
