Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search
Kou Misaki, Yuichi Inoue, Yuki Imajuku, So Kuroki, Taishi Nakamura, Takuya Akiba
arXiv.org Artificial Intelligence
Recent advances demonstrate that increasing inference-time computation can significantly boost the reasoning capabilities of large language models (LLMs). Although repeated sampling (i.e., generating multiple candidate outputs) is a highly effective strategy, it does not use external feedback signals to refine those outputs, even though such signals are often available in tasks like coding. In this work, we propose Adaptive Branching Monte Carlo Tree Search (AB-MCTS), a novel inference-time framework that generalizes repeated sampling with principled multi-turn exploration and exploitation. At each node in the search tree, AB-MCTS uses external feedback signals to decide dynamically whether to "go wider" by expanding new candidate responses or "go deeper" by revisiting existing ones. We evaluate our method on complex coding and engineering tasks using frontier models. Empirical results show that AB-MCTS consistently outperforms both repeated sampling and standard MCTS, underscoring the importance of combining the response diversity of LLMs with multi-turn solution refinement for effective inference-time scaling.

Recent work (Li et al., 2022; Lewkowycz et al., 2022; Brown et al., 2024; Wu et al., 2025) has begun to reveal that scaling inference-time computation can significantly boost the performance of LLMs on complex tasks. Traditionally, improvements in LLM performance have stemmed from training-time scaling, namely increasing the size of training datasets, the number of model parameters, and the computational resources used during training (Kaplan et al., 2020; Hoffmann et al., 2022). In contrast, inference-time scaling seeks to improve the performance of an LLM by allocating more computational resources at inference time. As we outline in Section 2, there are broadly three types of approaches to achieving inference-time scaling: post-training tuning, reward-guided chain-of-thought (CoT), and multiple answer generation.
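The abstract describes the "wider or deeper" decision only at a high level. The sketch below shows one minimal way such a decision could work, assuming Thompson sampling over Beta posteriors as the selection rule and external feedback scores in [0, 1]; the paper's actual AB-MCTS selection rule may differ. All names (`Node`, `choose`, `adaptive_branching_search`, and the `generate` and `score` callbacks) are hypothetical: `generate(None)` stands in for a fresh LLM sample, `generate(answer)` for a refinement of an existing answer, and `score` for an external evaluator such as a test runner.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# ASSUMPTION: illustrative sketch only, not the paper's AB-MCTS algorithm.
# The wider/deeper choice uses Thompson sampling over Beta posteriors,
# and every feedback score is assumed to lie in [0, 1].


@dataclass
class Node:
    answer: Optional[str]  # None only for the root, which holds no answer
    score: float = 0.0     # external feedback for this node's own answer
    sub_sum: float = 0.0   # sum of feedback scores inside this node's subtree
    sub_n: int = 0         # number of scored answers inside this node's subtree
    children: List["Node"] = field(default_factory=list)


def choose(node: Node, rng: random.Random) -> Optional[Node]:
    """Return None to "go wider" at `node`, or a child to "go deeper" into."""
    # Wider arm (assumed rule): posterior over the score of a brand-new child,
    # estimated from the scores that the existing children received.
    s = sum(c.score for c in node.children)
    k = len(node.children)
    best_draw = rng.betavariate(1.0 + s, 1.0 + k - s)
    pick: Optional[Node] = None
    # Deeper arms: posterior over the scores already found in each child's subtree.
    for c in node.children:
        draw = rng.betavariate(1.0 + c.sub_sum, 1.0 + c.sub_n - c.sub_sum)
        if draw > best_draw:
            best_draw, pick = draw, c
    return pick


def adaptive_branching_search(
    generate: Callable[[Optional[str]], str],  # hypothetical LLM call
    score: Callable[[str], float],             # hypothetical external evaluator
    budget: int,
    seed: int = 0,
) -> Node:
    """Spend `budget` generate/score calls and return the best-scoring node."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    rng = random.Random(seed)
    root = Node(answer=None)
    best: Optional[Node] = None
    for _ in range(budget):
        # Descend while nodes prefer going deeper; stop at the first "wider" choice.
        path = [root]
        nxt = choose(root, rng)
        while nxt is not None:
            path.append(nxt)
            nxt = choose(nxt, rng)
        parent = path[-1]
        # Expand: a fresh sample at the root, a refinement of parent.answer elsewhere.
        answer = generate(parent.answer)
        child = Node(answer=answer, score=min(max(score(answer), 0.0), 1.0))
        parent.children.append(child)
        # Back-propagate the new feedback to every node on the path and the child.
        for n in path + [child]:
            n.sub_sum += child.score
            n.sub_n += 1
        if best is None or child.score > best.score:
            best = child
    assert best is not None  # guaranteed because budget >= 1
    return best
```

If `choose` always returned `None` at the root, every iteration would draw a fresh sample, and the sketch would reduce to plain repeated sampling. Letting the feedback posteriors steer the choice is what allows the search to spend part of its budget refining promising answers instead.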
Mar-6-2025