ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning

Yu, Xiao, Peng, Baolin, Vajipey, Vineeth, Cheng, Hao, Galley, Michel, Gao, Jianfeng, Yu, Zhou

Oct-17-2024–arXiv.org Artificial Intelligence

Autonomous agents have demonstrated significant potential in automating complex multistep decision-making tasks. However, even state-of-the-art vision-language models (VLMs), such as GPT-4o, still fall short of human-level performance, particularly in intricate web environments and long-horizon tasks. To address these limitations, we present ExACT, an approach to combine test-time search and self-learning to build o1-like models for agentic applications. We first introduce Reflective Monte Carlo Tree Search (R-MCTS), a novel test time algorithm designed to enhance AI agents' ability to explore decision space on the fly. R-MCTS extends traditional MCTS by 1) incorporating contrastive reflection, allowing agents to learn from past interactions and dynamically improve their search efficiency; and 2) using multi-agent debate for reliable state evaluation. Next, we introduce Exploratory Learning, a novel learning strategy to teach agents to search at inference time without relying on any external search algorithms. On the challenging VisualWebArena benchmark, our GPT-4o based R-MCTS agent achieves a 6% to 30% relative improvement across various tasks compared to the previous state-of-the-art. Additionally, we show that the knowledge and experience gained from test-time search can be effectively transferred back to GPT-4o via fine-tuning. After Exploratory Learning, GPT-4o 1) demonstrates the ability to explore the environment, evaluate a state, and backtrack to viable ones when it detects that the current state cannot lead to success, and 2) matches 87% of R-MCTS's performance while using significantly less compute. Notably, our work demonstrates the compute scaling properties in both training - data collection with R-MCTS - and testing time. These results suggest a promising research direction to enhance VLMs' capabilities for agentic applications via test-time search and self-learning.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

Oct-17-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - West Virginia (0.04)
  - Pennsylvania (0.04)
  - Virginia (0.04)
- Asia > Japan
  - Honshū
    - Kantō > Tokyo Metropolis Prefecture
      - Tokyo (0.14)
    - Chūbu > Toyama Prefecture
      - Toyama (0.04)

Genre:
- Research Report > New Finding (0.87)

Industry:
- Automobiles & Trucks (0.47)
- Transportation > Ground
  - Road (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning
    - Search (1.00)
    - Agents (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found