How Many Instructions Can LLMs Follow at Once?

Jaroslawicz, Daniel, Whiting, Brendan, Shah, Parth, Maamari, Karime

Jul-16-2025–arXiv.org Artificial Intelligence

Production-grade LLM systems require robust adherence to dozens or even hundreds of instructions simultaneously. However, the instruction-following capabilities of LLMs at high instruction densities have not yet been characterized, as existing benchmarks only evaluate models on tasks with a single or few instructions. We introduce IFScale, a simple benchmark of 500 keyword-inclusion instructions for a business report writing task to measure how instruction-following performance degrades as instruction density increases. We evaluate 20 state-of-the-art models across seven major providers and find that even the best frontier models only achieve 68% accuracy at the max density of 500 instructions. Our analysis reveals model size and reasoning capability to correlate with 3 distinct performance degradation patterns, bias towards earlier instructions, and distinct categories of instruction-following errors. Our insights can help inform design of instruction-dense prompts in real-world applications and highlight important performance-latency tradeoffs. We open-source the benchmark and all results for further analysis at https://distylai.github.io/IFScale.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

Jul-16-2025

arXiv.org PDF

Add feedback

Country:
- Asia (0.46)
- North America > United States (0.46)

Genre:
- Research Report (1.00)
- Financial News (1.00)

Industry:
- Law (1.00)
- Health & Medicine (1.00)
- Energy > Renewable (1.00)
- Information Technology > Security & Privacy (0.93)
- Banking & Finance > Trading (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.71)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found