BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents

Wei, Jason, Sun, Zhiqing, Papay, Spencer, McKinney, Scott, Han, Jeffrey, Fulford, Isa, Chung, Hyung Won, Passos, Alex Tachard, Fedus, William, Glaese, Amelia

Apr-18-2025–arXiv.org Artificial Intelligence

Although the internet has transformed the way we access information, human navigation of the internet to find information is clunky for several reasons: (1) our memory and world knowledge are limited; (2) our browsing abilities are hindered by distraction and fatigue; and (3) human brains can only attend to one thing at a time and cannot be parallelized. Machine intelligence, on the other hand, has much more extensive recall and can operate tirelessly without getting distracted. A sufficiently capable machine intelligence should be able to, in principle, retrieve any well-specified piece of information from the open web, even if retrieving it would require browsing thousands of web pages. As AI progresses from chatbots to reasoners and to agents, there has been increased interest in models that can browse the internet beyond simple queries (Google, 2024; OpenAI, 2025b, a; perplexity.AI, 2025; x.AI, 2025). While past benchmarks have measured the ability to retrieve information (Joshi et al., 2017; Yang et al., 2018; Thorne et al., 2018; Dinan et al., 2019; Fan et al., 2019; Mialon et al., 2023), most of these benchmarks focus on retrieving information that can be found easily, and hence have become saturated by recent language models. Here we introduce a new benchmark called BrowseComp, which stands for "Browsing Competition" and comprises 1,266 challenging problems that require browsing a large number of websites to solve. Three example questions are shown below.

large language model, machine learning, natural language

Genre:
- Research Report (0.50)

Industry:
- Leisure & Entertainment (0.94)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Chatbot (1.00)
    - Large Language Model (0.95)
  - Machine Learning > Neural Networks
    - Deep Learning (0.55)
