AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
Neural Information Processing Systems
Large Language Models (LLMs) have demonstrated advanced capabilities in real-world agentic applications. Growing research efforts aim to develop LLM-based agents to address practical demands, introducing a new challenge: agentic scenarios often involve lengthy instructions with complex constraints, such as extended system prompts and detailed tool specifications. While adherence to such instructions is crucial for agentic applications, whether LLMs can reliably follow them remains underexplored. In this paper, we introduce AGENTIF, the first benchmark for systematically evaluating the instruction-following ability of LLMs in agentic scenarios. AGENTIF features three key characteristics: (1) Realistic, constructed from 50 real-world agentic applications.
Jun-23-2026, 07:25:34 GMT