Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights

Andreux, Mathieu, Skuk, Breno Baldas, Benchekroun, Hamza, Biré, Emilien, Bonnet, Antoine, Bordie, Riaz, Bout, Nathan, Brunel, Matthias, Cedoz, Pierre-Louis, Chassang, Antoine, Chen, Mickaël, Constantinou, Alexandra D., d'Andigné, Antoine, de La Jonquière, Hubert, Delfosse, Aurélien, Denoyer, Ludovic, Deprez, Alexis, Derupti, Augustin, Eickenberg, Michael, Federico, Mathïs, Kantor, Charles, Koegler, Xavier, Labbé, Yann, Lee, Matthew C. H., de Kergaradec, Erwan Le Jumeau, Mahla, Amir, Manevich, Avshalom, Maret, Adrien, Masson, Charles, Maurin, Rafaël, Mena, Arturo, Modard, Philippe, Moyal, Axel, Kerbel, Axel Nguyen, Revelle, Julien, Richter, Mats L., Santos, María, Sifre, Laurent, Theillard, Maxime, Thibault, Marc, Thiry, Louis, Tronchon, Léo, Usunier, Nicolas, Wu, Tony

Jun-12-2025–arXiv.org Artificial Intelligence

Building AI agents requires designing systems capable of acting in and adapting to dynamic digital environments in real time. In this context, Large Language Models (LLMs) have made remarkable progress in reasoning and problem solving, rivaling or even surpassing human experts in domain-specific tasks [12, 32]. However, in their most fundamental form, LLMs are confined to a static, pre-trained world: they cannot act, verify, or access up-to-date information. For instance, they cannot answer questions about current events or book a restaurant table, and they remain prone to hallucination [30, 35]. To circumvent these limitations, research has focused on enhancing LLMs with tool-use capabilities, enabling them to execute code snippets [7, 29], query Application Programming Interfaces (APIs) [18, 31], or retrieve information at scale with multi-step reasoning [33, 38, 24, 26]. These systems, often referred to as agents, extend LLMs into more capable virtual assistants [36]. However, their real-world utility remains bounded by the available predefined tools and the engineering effort required to expand them [13]. Approaching this problem from another angle, computer-use agents have recently emerged as a new paradigm in which agents interact with software directly through Graphical User Interfaces (GUIs) [1, 8, 11, 15, 17, 23, 39], i.e., through the same interface that humans use. This approach avoids relying on custom integrations or APIs, opening the door to more adaptable general-purpose agents with higher potential and broader real-world utility.

large language model, machine learning, natural language


