Agent S: An Open Agentic Framework that Uses Computers Like a Human

Saaket Agashe, Jiuzhou Han, Shuyu Gan, Jiachen Yang, Ang Li, Xin Eric Wang

Oct-10-2024–arXiv.org Artificial Intelligence

We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI), with the goal of transforming human-computer interaction by automating complex, multi-step tasks. Agent S aims to address three key challenges in automating computer tasks: acquiring domain-specific knowledge, planning over long task horizons, and handling dynamic, non-uniform interfaces. To this end, Agent S introduces experience-augmented hierarchical planning, which learns from external knowledge search and internal experience retrieval at multiple levels, facilitating efficient task planning and subtask execution. In addition, it employs an Agent-Computer Interface (ACI) to better elicit the reasoning and control capabilities of GUI agents based on Multimodal Large Language Models (MLLMs). Evaluation on the OSWorld benchmark shows that Agent S outperforms the baseline by 9.37 percentage points in success rate (an 83.6% relative improvement) and achieves a new state of the art. Comprehensive analysis highlights the effectiveness of individual components and provides insights for future improvements. Furthermore, Agent S demonstrates broad generalizability to different operating systems on the newly released WindowsAgentArena benchmark. Code is available at https://github.com/simular-ai/Agent-S.
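The two improvement figures quoted in the abstract are linked by simple arithmetic, and the minimal sketch below makes that link explicit. It assumes the 9.37 figure is an absolute gain in success rate (percentage points) and the 83.6% figure is that gain divided by the baseline success rate. The function name `implied_rates` is hypothetical and is not part of the Agent S codebase, and the printed rates are values implied by the abstract's two numbers, not figures quoted from it.

```python
def implied_rates(absolute_gain_pts: float, relative_gain: float) -> tuple[float, float]:
    """Return (baseline, improved) success rates in percent.

    Assumes relative_gain = absolute_gain_pts / baseline, so the baseline
    can be recovered by dividing the absolute gain by the relative gain.
    """
    if relative_gain <= 0:
        raise ValueError("relative_gain must be positive")
    baseline = absolute_gain_pts / relative_gain
    return baseline, baseline + absolute_gain_pts


# Figures taken from the abstract: +9.37 points, 83.6% relative improvement.
baseline, agent = implied_rates(9.37, 0.836)
print(f"implied baseline success rate: {baseline:.2f}%")
print(f"implied Agent S success rate:  {agent:.2f}%")
```

Under these assumptions the baseline comes out at roughly 11.2% and Agent S at roughly 20.6%, which is consistent with a gain of 9.37 points amounting to an improvement of about 84% over the baseline.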

agent, international conference, subtask

Country:
- North America > United States
  - Maryland > Baltimore (0.04)
  - Louisiana > Orleans Parish
    - New Orleans (0.05)
  - California > San Francisco County
    - San Francisco (0.14)
- Europe
  - Austria > Vienna (0.14)
  - United Kingdom > England
    - Greater London > London (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - France > Île-de-France
    - Paris > Paris (0.04)
- Asia
  - Singapore (0.04)
  - Indonesia > Bali (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
  - Japan > Honshū
    - Chūbu > Toyama Prefecture > Toyama (0.04)
- Africa > Rwanda
  - Kigali > Kigali (0.04)

Genre:
- Workflow (0.68)
- Research Report (0.64)

Technology:
- Information Technology
  - Human Computer Interaction > Interfaces (1.00)
  - Artificial Intelligence
    - Representation & Reasoning > Agents (1.00)
    - Natural Language > Large Language Model (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (0.47)
