Mode-Conditioning Unlocks Superior Test-Time Scaling
Chen Henry Wu, Sachin Goyal, Aditi Raghunathan
arXiv.org Artificial Intelligence
Parallel sampling promises substantial gains in test-time scaling, but its effectiveness is sharply limited by diversity collapse, where models concentrate on a few modes and repeated samples produce the same mistakes. We propose the mode-conditioning (ModC) framework, which explicitly allocates test-time compute across reasoning modes using either specialist models or mode-specific prefixes. ModC consistently improves scaling across controlled graph-search tasks and large-scale reasoning benchmarks, spanning model families and sizes from 0.5B to 7B. On OpenThoughts, fine-tuning Qwen2.5-7B with ModC achieves a 4x efficiency gain over standard training while also improving the maximum attainable Pass@k. We further show that gradient clustering enables ModC without explicit mode labels, yielding up to 10% gains on datasets such as NuminaMath. Finally, we show that ModC improves reinforcement learning (RL) and can further boost diversity-inducing RL methods. These results demonstrate that standard training underutilizes the diversity in data, and that ModC provides a simple, effective remedy for unlocking the full benefits of diversity in test-time scaling.
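The abstract's two key quantities can be made concrete: Pass@k is the standard unbiased estimator of the probability that at least one of k parallel samples is correct, and ModC allocates a sample budget across reasoning modes. A minimal sketch follows; `split_budget` is a hypothetical even-allocation policy for illustration only, and the paper's actual allocation strategy may differ.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k: probability that at least one of k samples
    drawn without replacement from n generations is correct, given
    that c of the n generations are correct."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples exist, so a hit is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

def split_budget(k: int, num_modes: int) -> list[int]:
    """Hypothetical even split of a k-sample budget across reasoning
    modes, in the spirit of ModC's explicit allocation (illustrative
    assumption, not the paper's policy)."""
    base, rem = divmod(k, num_modes)
    return [base + (1 if i < rem else 0) for i in range(num_modes)]
```

For example, `split_budget(10, 3)` yields `[4, 3, 3]`, and diversity collapse corresponds to the case where extra samples leave `c/n` unchanged, so Pass@k plateaus as k grows.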
Dec-2-2025