Group-in-Group Policy Optimization for LLM Agent Training
