semantle
MiGrATe: Mixed-Policy GRPO for Adaptation at Test-Time
Phan, Peter; Agarwal, Dhruv; Srinivas, Kavitha; Samulowitz, Horst; Kapanipathi, Pavan; McCallum, Andrew
Large language models (LLMs) are increasingly being applied to black-box optimization tasks, from program synthesis to molecule design. Prior work typically leverages in-context learning to iteratively guide the model towards better solutions. Such methods, however, often struggle to balance exploration of new solution spaces with exploitation of high-reward ones. Recently, test-time training (TTT) with synthetic data has shown promise in improving solution quality. However, the need for hand-crafted training data tailored to each task limits feasibility and scalability across domains. To address this problem, we introduce MiGrATe, a method for online TTT that uses GRPO as a search algorithm to adapt LLMs at inference without requiring external training data. MiGrATe operates via a mixed-policy group construction procedure that combines on-policy sampling with two off-policy data selection techniques: greedy sampling, which selects top-performing past completions, and neighborhood sampling (NS), which generates completions structurally similar to high-reward ones. Together, these components bias the policy gradient towards exploitation of promising regions in solution space, while preserving exploration through on-policy sampling. We evaluate MiGrATe on three challenging domains: word search, molecule optimization, and hypothesis+program induction on the Abstraction and Reasoning Corpus (ARC). We find that it consistently outperforms both inference-only and TTT baselines, demonstrating the potential of online TTT as a solution for complex search tasks without external supervision.
- North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
- Asia > Middle East > Jordan (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Energy (0.46)
- Materials > Chemicals (0.46)
- Education (0.46)
- Transportation (0.35)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
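The mixed-policy group construction described in the MiGrATe abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_mixed_group`, `group_relative_advantages`), the callables `sample_on_policy`, `sample_neighbor`, and `reward`, and the group-size parameters are all hypothetical, and the advantage normalization is the standard group-relative form (reward minus group mean, divided by group standard deviation), which may differ in detail from what the paper uses.

```python
import statistics
from typing import Callable, List, Tuple

Scored = Tuple[str, float]  # (completion, reward)


def build_mixed_group(
    sample_on_policy: Callable[[], str],    # hypothetical: draws from the current policy
    sample_neighbor: Callable[[str], str],  # hypothetical: varies a high-reward completion
    reward: Callable[[str], float],         # hypothetical: black-box task score
    history: List[Scored],
    n_on_policy: int,
    n_greedy: int,
    n_neighbor: int,
) -> List[Scored]:
    """Assemble one group from on-policy, greedy, and neighborhood samples."""
    group: List[Scored] = []

    # On-policy sampling: fresh completions preserve exploration.
    for _ in range(n_on_policy):
        completion = sample_on_policy()
        group.append((completion, reward(completion)))

    # Greedy sampling: reuse the top-performing past completions (off-policy).
    ranked = sorted(history, key=lambda item: item[1], reverse=True)
    group.extend(ranked[:n_greedy])

    # Neighborhood sampling: generate completions similar to high-reward ones.
    # Assumption: seeds are taken from the same top-ranked completions.
    seeds = [completion for completion, _ in ranked[: max(1, n_greedy)]]
    if seeds:
        for i in range(n_neighbor):
            completion = sample_neighbor(seeds[i % len(seeds)])
            group.append((completion, reward(completion)))

    return group


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within a group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

In a sketch like this, the off-policy members raise the share of high-reward completions in each group, so the normalized advantages push the update towards promising regions, while the on-policy members keep some probability mass on unexplored candidates.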
Semantle is like the Dark Souls of Wordle
While the concept of guessing the "correct" word is similar, Semantle's answer can be any number of letters long. The only way a player knows if their guess is on the right path is through Word2Vec, the Google-owned technology underlying the game, which produces a number representing how close the guessed word's meaning is to the answer's. It also provides a "Getting close?" indicator of "hot" or "cold."
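A closeness number of this kind can be sketched as a cosine similarity between word vectors. The sketch below assumes the score is the cosine similarity scaled to 100; the function names are hypothetical, and a real game would look vectors up in a trained Word2Vec model rather than take them as arguments.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def semantle_style_score(guess_vec: Sequence[float], answer_vec: Sequence[float]) -> float:
    """Assumed scoring rule: cosine similarity scaled to the range [-100, 100]."""
    return 100.0 * cosine_similarity(guess_vec, answer_vec)
```

Under this assumption, a guess whose vector points in the same direction as the answer's scores near 100, an unrelated guess scores near 0, and the "hot" or "cold" indicator would be a threshold on that number.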