GTA: A Benchmark for General Tool Agents

Mar-21-2026, 12:40:16 GMT–Neural Information Processing Systems

In developing general-purpose agents, significant focus has been placed on integrating large language models (LLMs) with various tools. This poses a challenge to the tool-use capabilities of LLMs. However, there are evident gaps between existing tool evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only inputs, which fail to reveal the agents' real-world problem-solving abilities effectively.

artificial intelligence, large language model, natural language

