ARIES: Autonomous Reasoning with LLMs on Interactive Thought Graph Environments
Gimenes, Pedro, Cao, Zeyu, Wong, Jeffrey, Zhao, Yiren
–arXiv.org Artificial Intelligence
Recent research has shown that LLM performance on reasoning tasks can be enhanced by scaling test-time compute. One promising approach, particularly with decomposable problems, involves arranging intermediate solutions as a graph on which transformations are performed to explore the solution space. However, prior works rely on pre-determined, task-specific transformation schedules which are subject to a set of searched hyperparameters. In this work, we view thought graph transformations as actions in a Markov decision process, and implement policy agents to drive effective action policies for the underlying reasoning LLM agent. In particular, we investigate the ability of an LLM to act as a policy agent on thought graph environments and introduce ARIES, a multi-agent architecture for reasoning with LLMs. In ARIES, reasoning LLM agents solve decomposed subproblems, while policy LLM agents maintain visibility of the thought graph states and dynamically adapt the problem-solving strategy. Through extensive experiments, we observe that using off-the-shelf LLMs as policy agents with no supervised fine-tuning (SFT) can yield up to 29% higher accuracy on HumanEval relative to static transformation schedules, while also reducing inference costs by 35% and eliminating any search requirements. We also conduct a thorough analysis of observed failure modes, highlighting that limitations on LLM size and the depth of problem decomposition pose challenges to scaling LLM-guided reasoning.

Prior works have shown that Large Language Models (LLMs) are subject to the emergence of abilities as their parameter count grows (Wei et al., 2022), which spurred significant interest in training increasingly larger models. However, recent work showed that under a fixed compute budget for training and inference, LLM performance on reasoning tasks can be enhanced by allocating a higher proportion of compute to inference rather than training (Snell et al., 2024).
This shift towards inference-time compute scaling can be intuitively understood through the Dual Process Theory, which postulates the existence of two distinct modes of reasoning in humans: (1) a fast, intuitive mode and (2) a slow, deliberate mode (Evans & Frankish, 2009).
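The abstract's framing of thought graph transformations as actions in a Markov decision process can be sketched in miniature. The snippet below is a hypothetical illustration, not the paper's implementation: the class and function names (`ThoughtGraph`, `llm_policy`, the action set `decompose`/`refine`/`aggregate`) are assumptions, and the rule-based policy merely stands in for the policy LLM that would observe the graph state and select the next transformation.

```python
# Hypothetical sketch: thought-graph transformations as MDP actions,
# with a stand-in policy in place of a policy LLM. All names here are
# illustrative assumptions, not taken from the ARIES paper.

ACTIONS = ["decompose", "refine", "aggregate"]

class ThoughtGraph:
    """Minimal thought-graph environment: nodes are partial solutions."""
    def __init__(self, problem):
        self.nodes = [problem]  # root thought
        self.steps = 0

    def state(self):
        # The observable state a policy agent would condition on.
        return {"num_nodes": len(self.nodes), "steps": self.steps}

    def apply(self, action):
        # Each action transforms the graph; here we only track structure.
        if action == "decompose":
            self.nodes += [f"sub({self.nodes[-1]})", f"sub'({self.nodes[-1]})"]
        elif action == "refine":
            self.nodes[-1] = f"refined({self.nodes[-1]})"
        elif action == "aggregate":
            self.nodes = [f"agg({len(self.nodes)} thoughts)"]
        self.steps += 1

def llm_policy(state):
    # Stand-in for a policy LLM choosing an action from the observed state.
    if state["num_nodes"] == 1 and state["steps"] == 0:
        return "decompose"
    if state["num_nodes"] > 2:
        return "aggregate"
    return "refine"

def solve(problem, max_steps=4):
    # Dynamic schedule: the policy picks transformations step by step,
    # rather than following a pre-determined, task-specific sequence.
    env = ThoughtGraph(problem)
    trace = []
    for _ in range(max_steps):
        action = llm_policy(env.state())
        trace.append(action)
        env.apply(action)
        if action == "aggregate":  # final solution assembled
            break
    return env.nodes[0], trace
```

The point of the sketch is the control split described in the abstract: the environment (`ThoughtGraph`) holds the reasoning state, while a separate policy agent observes it and adapts the transformation schedule at each step instead of executing a fixed one.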
Feb-28-2025