Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents
