MiGrATe: Mixed-Policy GRPO for Adaptation at Test-Time
Phan, Peter, Agarwal, Dhruv, Srinivas, Kavitha, Samulowitz, Horst, Kapanipathi, Pavan, McCallum, Andrew
arXiv.org Artificial Intelligence
Large language models (LLMs) are increasingly being applied to black-box optimization tasks, from program synthesis to molecule design. Prior work typically leverages in-context learning to iteratively guide the model towards better solutions. Such methods, however, often struggle to balance exploration of new solution spaces with exploitation of high-reward ones. Recently, test-time training (TTT) with synthetic data has shown promise in improving solution quality. However, the need for hand-crafted training data tailored to each task limits feasibility and scalability across domains. To address this problem, we introduce MiGrATe, a method for online TTT that uses GRPO as a search algorithm to adapt LLMs at inference without requiring external training data. MiGrATe operates via a mixed-policy group construction procedure that combines on-policy sampling with two off-policy data selection techniques: greedy sampling, which selects top-performing past completions, and neighborhood sampling (NS), which generates completions structurally similar to high-reward ones. Together, these components bias the policy gradient towards exploitation of promising regions in solution space, while preserving exploration through on-policy sampling. We evaluate MiGrATe on three challenging domains (word search, molecule optimization, and hypothesis+program induction on the Abstraction and Reasoning Corpus (ARC)) and find that it consistently outperforms both inference-only and TTT baselines, demonstrating the potential of online TTT as a solution for complex search tasks without external supervision.
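The mixed-policy group construction described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the callables `sample_on_policy` and `sample_neighbors`, the group-size parameters, and the choice to seed neighborhood sampling from the single best past completion are all assumptions made for illustration. The advantage function uses the standard GRPO group normalization (reward minus group mean, divided by group standard deviation), which may differ in detail from what the paper uses.

```python
from typing import Callable, List, Tuple

Completion = str


def build_mixed_group(
    sample_on_policy: Callable[[int], List[Completion]],              # hypothetical: draws from the current policy
    sample_neighbors: Callable[[Completion, int], List[Completion]],  # hypothetical: variants of a given completion
    history: List[Tuple[Completion, float]],                          # (completion, reward) pairs seen so far
    n_on_policy: int,
    n_greedy: int,
    n_neighbors: int,
) -> List[Completion]:
    """Assemble one training group from on-policy and off-policy sources."""
    # Exploration: fresh samples from the current policy.
    group = list(sample_on_policy(n_on_policy))

    # Greedy sampling: reuse the top-performing past completions.
    ranked = sorted(history, key=lambda item: item[1], reverse=True)
    group.extend(completion for completion, _ in ranked[:n_greedy])

    # Neighborhood sampling: completions structurally similar to a high-reward one.
    # Assumption: only the best past completion is used as the seed.
    if ranked and n_neighbors > 0:
        group.extend(sample_neighbors(ranked[0][0], n_neighbors))

    return group


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standard GRPO-style advantages: normalize rewards within the group."""
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = variance ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

In this sketch, every completion in the group would be scored by the task's reward function, and the resulting advantages would weight the policy-gradient update, so that high-reward off-policy members pull the policy towards promising regions while the on-policy members preserve exploration.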
Aug-13-2025