Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs
