Agent-Testing Agent: A Meta-Agent for Automated Testing and Evaluation of Conversational AI Agents

Aug-26-2025–arXiv.org Artificial Intelligence

LLM agents are increasingly deployed to plan, retrieve, and write with tools, yet evaluation still leans on static benchmarks and small human studies. We present the Agent-Testing Agent (ATA), a meta-agent that combines static code analysis, designer interrogation, literature mining, and persona-driven adversarial test generation whose difficulty adapts via judge feedback. Each dialogue is scored with an LLM-as-a-Judge (LAAJ) rubric and used to steer subsequent tests toward the agent's weakest capabilities. On a travel planner and a Wikipedia writer, the ATA surfaces more diverse and severe failures than expert annotators while matching severity, and finishes in 20-30 minutes versus ten-annotator rounds that took days.
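The feedback loop described in the abstract (score each dialogue, then steer the next test toward the weakest capability and adapt difficulty) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `pass_mark` threshold, the integer difficulty levels, the placeholder prompt, and the assumption that the judge returns a score in [0, 1] are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import mean


def pick_weakest(scores, capabilities):
    """Return the capability with the lowest mean judge score.

    Capabilities that have not been tested yet are returned first,
    so every capability gets at least one test.
    """
    untested = [c for c in capabilities if not scores[c]]
    if untested:
        return untested[0]
    return min(capabilities, key=lambda c: mean(scores[c]))


def run_adaptive_tests(agent, judge, capabilities, rounds=10, pass_mark=0.7):
    """Run a judge-steered test loop (hypothetical sketch).

    agent: callable taking a prompt string and returning a reply string.
    judge: callable taking (prompt, reply) and returning a score in [0, 1]
           (assumed scale; the paper's rubric may differ).
    """
    scores = defaultdict(list)                  # capability -> judge scores
    difficulty = {c: 1 for c in capabilities}   # capability -> current level
    failures = []                               # (capability, level, score)

    for _ in range(rounds):
        cap = pick_weakest(scores, capabilities)
        level = difficulty[cap]
        # Placeholder prompt; a real system would generate a persona-driven test.
        prompt = f"[{cap} | difficulty {level}] persona-driven test prompt"
        reply = agent(prompt)
        score = judge(prompt, reply)
        scores[cap].append(score)
        if score >= pass_mark:
            difficulty[cap] = level + 1         # agent passed: make it harder
        else:
            failures.append((cap, level, score))

    return scores, failures
```

With a stub agent and a stub judge that consistently scores one capability low, the loop tests each capability once and then concentrates the remaining rounds on the low-scoring one, which is the steering behavior the abstract describes.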

artificial intelligence, large language model, natural language

Country:
- Europe > Austria (0.28)
- North America
  - Mexico (0.28)
  - United States > New Mexico (0.14)

Genre:
- Research Report (0.40)

Industry:
- Consumer Products & Services > Travel (0.95)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (1.00)
  - Natural Language > Large Language Model (0.90)
