Algorithmic Thinking Theory
Bateni, MohammadHossein, Cohen-Addad, Vincent, Gu, Yuzhou, Lattanzi, Silvio, Meierhans, Simon, Mohri, Christopher
arXiv.org Artificial Intelligence
Initial challenges, such as grade-school mathematics (GSM8K) and standard competition math (MATH dataset), have largely been surmounted, pushing the frontier of AI reasoning toward "grand challenge" problems, such as those found in the International Mathematical Olympiad (IMO). These problems, renowned for their demand for deep insight, creativity, and rigorous proof, expose a fascinating weakness in modern LLMs. While a model's performance on a single attempt (termed pass@1) may be very low, its ability to produce a correct answer within k attempts (pass@k) can be significantly higher. This pass@1 versus pass@k gap, especially pronounced when sampling with high temperature to produce diverse outputs, suggests that models possess a vast, latent capability that is not accessible in a single, high-confidence generation. Interestingly, to recover the full power of the model it is not sufficient to simply use multiple attempts. In fact, even the pass@k metric fails to capture the full story. On the most difficult problems, simply sampling k times and selecting the best answer (e.g., "best-of-32") still yields poor results. For instance, Huang and Yang (2025) report that a best-of-32 baseline on the IMO 2025 problems achieved an accuracy of only 31.6-38.1% for leading models [HY25]. This paradox lies at the heart of our work: the latent capability of LLMs is not merely a matter of selection (finding one correct needle in a haystack of k attempts), but one of synthesis.
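To make the pass@1 versus pass@k gap concrete, here is a minimal sketch of how pass@k is commonly estimated from n sampled attempts of which c are correct. The excerpt above does not say how the metric is computed, so the combinatorial estimator used here (the probability that a random size-k subset of the n samples contains at least one correct answer) is an assumption, and the function name `pass_at_k` and the sample counts are illustrative rather than taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts, c of which are correct.

    Assumed estimator: 1 - C(n - c, k) / C(n, k), i.e. one minus the
    probability that k attempts drawn without replacement are all wrong.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset has a hit.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers only: 32 samples on one problem, exactly 1 correct.
single_attempt = pass_at_k(n=32, c=1, k=1)   # chance one random draw is right
all_attempts = pass_at_k(n=32, c=1, k=32)    # chance any of the 32 is right
print(f"pass@1  = {single_attempt:.5f}")
print(f"pass@32 = {all_attempts:.5f}")
```

Note that pass@k credits a problem as solved if any of the k attempts is correct, which presumes an oracle that can recognise the correct one. A best-of-k baseline must instead pick a single answer without that oracle, which is one reason it can score well below pass@k.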
Dec-5-2025