BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

Oct-10-2025, 15:30:59 GMT, Neural Information Processing Systems

To bridge this gap, we introduce the BABILong benchmark, designed to test language models' ability to reason across facts distributed in extremely

arxiv preprint arxiv, babilong, dataset

Country:
- North America
  - United States (0.04)
  - Dominican Republic (0.04)
  - Puerto Rico > San Juan
    - San Juan (0.04)
- Europe
  - United Kingdom > England
    - Greater London > London (0.04)
  - Russia > Central Federal District
    - Moscow Oblast > Moscow (0.04)
  - Italy > Calabria
    - Catanzaro Province > Catanzaro (0.04)
- Asia
  - Russia (0.14)
  - Singapore (0.04)
  - Indonesia > Bali (0.04)
  - China (0.04)
  - Japan > Kyūshū & Okinawa
    - Kyūshū > Nagasaki Prefecture > Nagasaki (0.04)
- Africa > Rwanda
  - Kigali > Kigali (0.04)

Genre:
- Research Report > New Finding (0.92)

Industry:
- Law (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
