WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback

Hu, Minda, Fang, Tianqing, Zhang, Jianshu, Ma, Junyu, Zhang, Zhisong, Zhou, Jingyan, Zhang, Hongming, Mi, Haitao, Yu, Dong, King, Irwin

Sep-19-2025–arXiv.org Artificial Intelligence

Web agents powered by Large Language Models (LLMs) show promise for next-generation AI, but their limited reasoning in uncertain, dynamic web environments hinders robust deployment. In this paper, we identify key reasoning skills essential for effective web agents, i.e., reflection & lookahead, branching, and rollback, and curate trajectory data that exemplifies these abilities by reconstructing the agent's (inference-time) reasoning algorithms into chain-of-thought rationales. We conduct experiments in the agent self-improving benchmark, OpenWebVoyager, and demonstrate that distilling salient reasoning patterns into the backbone LLM via simple fine-tuning can substantially enhance its performance. Our approach yields significant improvements across multiple benchmarks, including WebVoyager, Mind2Web-Live, and SimpleQA (web search), highlighting the potential of targeted reasoning skill enhancement for web agents.

large language model, machine learning, trajectory

Country:
- North America > United States (0.93)
- Asia (0.67)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Media (0.46)
- Information Technology (0.46)

Technology:
- Information Technology
  - Communications > Web (1.00)
  - Artificial Intelligence
    - Representation & Reasoning > Agents (1.00)
    - Natural Language > Large Language Model (1.00)
    - Cognitive Science > Problem Solving (1.00)
    - Machine Learning
      - Neural Networks > Deep Learning (0.71)
      - Learning Graphical Models > Undirected Networks > Markov Models (0.46)