Goto

Collaborating Authors

 monte carlo tree search


Accelerating Monte Carlo Tree Search with Probability Tree State Abstraction

Neural Information Processing Systems

Monte Carlo Tree Search (MCTS) algorithms such as AlphaGo and MuZero have achieved superhuman performance in many challenging tasks. However, the computational complexity of MCTS-based algorithms is influenced by the size of the search space. To address this issue, we propose a novel probability tree state abstraction (PTSA) algorithm to improve the search efficiency of MCTS. A general tree state abstraction with path transitivity is defined. In addition, the probability tree state abstraction is proposed for fewer mistakes during the aggregation step. Furthermore, the theoretical guarantees of the transitivity and aggregation error bound are justified. To evaluate the effectiveness of the PTSA algorithm, we integrate it with state-of-the-art MCTS-based algorithms, such as Sampled MuZero and Gumbel MuZero. Experimental results on different tasks demonstrate that our method can accelerate the training process of state-of-the-art algorithms with 10%-45% search space reduction.


LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios

Neural Information Processing Systems

Building agents based on tree-search planning capabilities with learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari.However, it has been deemed challenging or even infeasible to extend Monte Carlo Tree Search (MCTS) based algorithms to diverse real-world applications, especially when these environments involve complex action spaces and significant simulation costs, or inherent stochasticity.In this work, we introduce LightZero, the first unified benchmark for deploying MCTS/MuZero in general sequential decision scenarios.


Gaussian Process Aggregation for Root-Parallel Monte Carlo Tree Search with Continuous Actions

Xiao, Junlin, Darvariu, Victor-Alexandru, Lacerda, Bruno, Hawes, Nick

arXiv.org Artificial Intelligence

Monte Carlo Tree Search is a cornerstone algorithm for online planning, and its root-parallel variant is widely used when wall clock time is limited but best performance is desired. In environments with continuous action spaces, how to best aggregate statistics from different threads is an important yet underexplored question. In this work, we introduce a method that uses Gaussian Process Regression to obtain value estimates for promising actions that were not trialed in the environment. We perform a systematic evaluation across 6 different domains, demonstrating that our approach outperforms existing aggregation strategies while requiring a modest increase in inference time.


MOTIF: Multi-strategy Optimization via Turn-based Interactive Framework

Kiet, Nguyen Viet Tuan, Van Tung, Dao, Dao, Tran Cong, Binh, Huynh Thi Thanh

arXiv.org Artificial Intelligence

Designing effective algorithmic components remains a fundamental obstacle in tackling NP-hard combinatorial optimization problems (COPs), where solvers often rely on carefully hand-crafted strategies. Despite recent advances in using large language models (LLMs) to synthesize high-quality components, most approaches restrict the search to a single element - commonly a heuristic scoring function - thus missing broader opportunities for innovation. In this paper, we introduce a broader formulation of solver design as a multi-strategy optimization problem, which seeks to jointly improve a set of interdependent components under a unified objective. To address this, we propose Multi-strategy Optimization via Turn-based Interactive Framework (MOTIF) - a novel framework based on Monte Carlo Tree Search that facilitates turn-based optimization between two LLM agents. At each turn, an agent improves one component by leveraging the history of both its own and its opponent's prior updates, promoting both competitive pressure and emergent cooperation. This structured interaction broadens the search landscape and encourages the discovery of diverse, high-performing solutions. Experiments across multiple COP domains show that MOTIF consistently outperforms state-of-the-art methods, highlighting the promise of turn-based, multi-agent prompting for fully automated solver design.


Parallelizing Tree Search with Twice Sequential Monte Carlo

Oren, Yaniv, de Vries, Joery A., van der Vaart, Pascal R., Spaan, Matthijs T. J., Böhmer, Wendelin

arXiv.org Artificial Intelligence

Model-based reinforcement learning (RL) methods that leverage search are responsible for many milestone breakthroughs in RL. Sequential Monte Carlo (SMC) recently emerged as an alternative to the Monte Carlo Tree Search (MCTS) algorithm which drove these breakthroughs. SMC is easier to parallelize and more suitable to GPU acceleration. However, it also suffers from large variance and path degeneracy which prevent it from scaling well with increased search depth, i.e., increased sequential compute. To address these problems, we introduce Twice Sequential Monte Carlo Tree Search (TSMCTS). Across discrete and continuous environments TSMCTS outperforms the SMC baseline as well as a popular modern version of MCTS. Through variance reduction and mitigation of path degeneracy, TSMCTS scales favorably with sequential compute while retaining the properties that make SMC natural to parallelize.


SMRC: Aligning Large Language Models with Student Reasoning for Mathematical Error Correction

Zeng, Biaojie, Zhang, Min, Zhou, Juan, Liu, Fengrui, Huang, Ruiyang, Lin, Xin

arXiv.org Artificial Intelligence

Large language models (LLMs) often make reasoning errors when solving mathematical problems, and how to automatically detect and correct these errors has become an important research direction. However, existing approaches \textit{mainly focus on self-correction within the model}, which falls short of the ``teacher-style`` correction required in educational settings, \textit{i.e.}, systematically guiding and revising a student's problem-solving process. To address this gap, we propose \texttt{SMRC} (\textit{\underline{S}tudent \underline{M}athematical \underline{R}easoning \underline{C}orrection}), a novel method that aligns LLMs with student reasoning. Specifically, \texttt{SMRC} formulates student reasoning as a multi-step sequential decision problem and introduces Monte Carlo Tree Search (MCTS) to explore optimal correction paths. To reduce the cost of the annotating process-level rewards, we leverage breadth-first search (BFS) guided by LLMs and final-answer evaluation to generate reward signals, which are then distributed across intermediate reasoning steps via a back-propagation mechanism, enabling fine-grained process supervision. Additionally, we construct a benchmark for high school mathematics, MSEB (Multi-Solution Error Benchmark), consisting of 158 instances that include problem statements, student solutions, and correct reasoning steps. We further propose a dual evaluation protocol centered on \textbf{solution accuracy} and \textbf{correct-step retention}, offering a comprehensive measure of educational applicability. Experiments demonstrate that \texttt{SMRC} significantly outperforms existing methods on two public datasets (ProcessBench and MR-GSM8K) and our MSEB in terms of effectiveness and overall performance. The code and data are available at https://github.com/Mind-Lab-ECNU/SMRC.


T^2Agent A Tool-augmented Multimodal Misinformation Detection Agent with Monte Carlo Tree Search

Cui, Xing, Zou, Yueying, Li, Zekun, Li, Peipei, Xu, Xinyuan, Liu, Xuannan, Huang, Huaibo

arXiv.org Artificial Intelligence

Real-world multimodal misinformation often arises from mixed forgery sources, requiring dynamic reasoning and adaptive verification. However, existing methods mainly rely on static pipelines and limited tool usage, limiting their ability to handle such complexity and diversity. To address this challenge, we propose \method, a novel misinformation detection agent that incorporates an extensible toolkit with Monte Carlo Tree Search (MCTS). The toolkit consists of modular tools such as web search, forgery detection, and consistency analysis. Each tool is described using standardized templates, enabling seamless integration and future expansion. To avoid inefficiency from using all tools simultaneously, a greedy search-based selector is proposed to identify a task-relevant subset. This subset then serves as the action space for MCTS to dynamically collect evidence and perform multi-source verification. To better align MCTS with the multi-source nature of misinformation detection, \method~ extends traditional MCTS with multi-source verification, which decomposes the task into coordinated subtasks targeting different forgery sources. A dual reward mechanism containing a reasoning trajectory score and a confidence score is further proposed to encourage a balance between exploration across mixed forgery sources and exploitation for more reliable evidence. We conduct ablation studies to confirm the effectiveness of the tree search mechanism and tool usage. Extensive experiments further show that \method~ consistently outperforms existing baselines on challenging mixed-source multimodal misinformation benchmarks, demonstrating its strong potential as a training-free detector.


Top-Down Semantic Refinement for Image Captioning

Zhang, Jusheng, Cai, Kaitong, Yang, Jing, Wang, Jian, Tang, Chengpei, Wang, Keze

arXiv.org Artificial Intelligence

Large Vision-Language Models (VLMs) face an inherent contradiction in image captioning: their powerful single-step generation capabilities often lead to a myopic decision-making process. This makes it difficult to maintain global narrative coherence while capturing rich details, a limitation that is particularly pronounced in tasks that require multi-step and complex scene description. To overcome this fundamental challenge, we redefine image captioning as a goal-oriented hierarchical refinement planning problem, and further propose a novel framework, named Top-Down Semantic Refinement (TDSR), which models the generation process as a Markov Decision Process (MDP). However, planning within the vast state space of a VLM presents a significant computational hurdle. Our core contribution, therefore, is the design of a highly efficient Monte Carlo Tree Search (MCTS) algorithm tailored for VLMs. By incorporating a visual-guided parallel expansion and a lightweight value network, our TDSR reduces the call frequency to the expensive VLM by an order of magnitude without sacrificing planning quality. Furthermore, an adaptive early stopping mechanism dynamically matches computational overhead to the image's complexity. Extensive experiments on multiple benchmarks, including DetailCaps, COMPOSITION-CAP, and POPE, demonstrate that our TDSR, as a plug-and-play module, can significantly enhance the performance of existing VLMs (e.g., LLaV A-1.5, Qwen2.5-VL) by achieving state-of-the-art or highly competitive results in fine-grained description, compositional generalization, and hallucination suppression.


VisuoAlign: Safety Alignment of LVLMs with Multimodal Tree Search

Li, MingSheng, Zhao, Guangze, Liu, Sichen

arXiv.org Artificial Intelligence

Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal perception and generation, yet their safety alignment remains a critical challenge.Existing defenses and vulnerable to multimodal jailbreaks, as visual inputs introduce new attack surfaces, reasoning chains lack safety supervision, and alignment often degrades under modality fusion.To overcome these limitation, we propose VisuoAlign, a framework for multi-modal safety alignment via prompt-guided tree search.VisuoAlign embeds safety constrains into the reasoning process through visual-textual interactive prompts, employs Monte Carlo Tree Search(MCTS) to systematically construct diverse safety-critical prompt trajectories, and introduces prompt-based scaling to ensure real-time risk detection and compliant responses.Extensive experiments demonstrate that VisuoAlign proactively exposes risks, enables comprehensive dataset generation, and significantly improves the robustness of LVLMs against complex cross-modal threats.


GammaZero: Learning To Guide POMDP Belief Space Search With Graph Representations

Mangannavar, Rajesh, Tadepalli, Prasad

arXiv.org Artificial Intelligence

We introduce an action-centric graph representation framework for learning to guide planning in Partially Observable Markov Decision Processes (POMDPs). Unlike existing approaches that require domain-specific neural architectures and struggle with scalability, GammaZero leverages a unified graph-based belief representation that enables generalization across problem sizes within a domain. Our key insight is that belief states can be systematically transformed into action-centric graphs where structural patterns learned on small problems transfer to larger instances. We employ a graph neural network with a decoder architecture to learn value functions and policies from expert demonstrations on computationally tractable problems, then apply these learned heuristics to guide Monte Carlo tree search on larger problems. Experimental results on standard POMDP benchmarks demonstrate that GammaZero achieves comparable performance to BetaZero when trained and tested on the same-sized problems, while uniquely enabling zero-shot generalization to problems 2-4 times larger than those seen during training, maintaining solution quality with reduced search requirements. Partially observable Markov decision processes (POMDPs) provide a principled framework for sequential decision-making under uncertainty, where agents must act based on incomplete information about the true state of the environment Kaelbling et al. (1998). This partial observability arises naturally in many real-world applications, from autonomous driving where sensors provide limited field-of-view Hoel et al. (2019), to robotic manipulation where object properties must be inferred through interaction Lauri et al. (2022), to subsurface exploration where underground structures can only be observed at sparse drilling locations Mern & Caers (2023).