GTA: A Benchmark for General Tool Agents
Neural Information Processing Systems
In developing general-purpose agents, significant focus has been placed on integrating large language models (LLMs) with various tools. This poses a challenge to the tool-use capabilities of LLMs. However, there are evident gaps between existing tool evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only inputs, and therefore fail to effectively reveal agents' real-world problem-solving abilities.
Dec-26-2025, 13:22:50 GMT