Large Language Models Can Self-Improve At Web Agent Tasks

Patel, Ajay, Hofmarcher, Markus, Leoveanu-Condrei, Claudiu, Dinu, Marius-Constantin, Callison-Burch, Chris, Hochreiter, Sepp

May-30-2024–arXiv.org Artificial Intelligence

Training models to act as agents that can effectively navigate and perform actions in a complex environment, such as a web browser, has typically been challenging due to lack of training data. Large language models (LLMs) have recently demonstrated some capability to navigate novel environments as agents in a zero-shot or few-shot fashion, purely guided by natural language instructions as prompts. Recent research has also demonstrated LLMs have the capability to exceed their base performance through self-improvement, i.e. fine-tuning on data generated by the model itself. In this work, we explore the extent to which LLMs can self-improve their performance as agents in long-horizon tasks in a complex environment using the WebArena benchmark. In WebArena, an agent must autonomously navigate and perform actions on web pages to achieve a specified objective. We explore fine-tuning on three distinct synthetic training data mixtures and achieve a 31\% improvement in task completion rate over the base model on the WebArena benchmark through a self-improvement procedure. We additionally contribute novel evaluation metrics for assessing the performance, robustness, capabilities, and quality of trajectories of our fine-tuned agent models to a greater degree than simple, aggregate-level benchmark scores currently used to measure self-improvement.

objective, subreddit, trajectory, (16 more...)

arXiv.org Artificial Intelligence

May-30-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Massachusetts (0.04)
  - Pennsylvania (0.04)
- Europe
  - Monaco (0.04)
  - Austria > Upper Austria (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (1.00)

Industry:
- Government > Regional Government > North America Government > United States Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.93)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found