Agent-Testing Agent: A Meta-Agent for Automated Testing and Evaluation of Conversational AI Agents
Komoravolu, Sameer; Mrini, Khalil
arXiv.org Artificial Intelligence
LLM agents are increasingly deployed to plan, retrieve, and write with tools, yet evaluation still leans on static benchmarks and small human studies. We present the Agent-Testing Agent (ATA), a meta-agent that combines static code analysis, designer interrogation, literature mining, and persona-driven adversarial test generation whose difficulty adapts via judge feedback. Each dialogue is scored with an LLM-as-a-Judge (LAAJ) rubric, and the scores are used to steer subsequent tests toward the agent's weakest capabilities. On a travel planner and a Wikipedia writer, the ATA surfaces more diverse and severe failures than expert annotators while matching severity, and it finishes in 20–30 minutes, whereas ten-annotator rounds took days.
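The abstract describes the judge-driven feedback loop only at a high level. The following is a minimal sketch of that idea under stated assumptions: a judge returns a rubric score in [0, 1] for each test dialogue, the tester always targets the capability with the lowest mean score so far, raises difficulty when the agent passes, and logs a failure otherwise. The class `AdaptiveTester`, the `pass_threshold` and `max_difficulty` values, the integer difficulty scale, and the tie-breaking rule are all hypothetical illustrations, not the ATA's actual implementation.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable

# Hypothetical judge interface: (capability, difficulty) -> rubric score in [0, 1].
# In the ATA this role is played by an LLM-as-a-Judge; here it is any callable.
Judge = Callable[[str, int], float]


@dataclass
class AdaptiveTester:
    capabilities: list[str]
    judge: Judge
    pass_threshold: float = 0.7  # assumed cut-off separating a pass from a failure
    max_difficulty: int = 5      # assumed upper bound of the difficulty scale
    difficulty: dict[str, int] = field(default_factory=dict)
    scores: dict[str, list[float]] = field(default_factory=dict)
    failures: list[tuple[str, int, float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every capability starts at the easiest level with no scores yet.
        for cap in self.capabilities:
            self.difficulty.setdefault(cap, 1)
            self.scores.setdefault(cap, [])

    def weakest(self) -> str:
        """Pick an untested capability first, else the lowest mean judge score
        (ties broken in favour of the capability tested least often)."""
        for cap in self.capabilities:
            if not self.scores[cap]:
                return cap
        return min(
            self.capabilities,
            key=lambda c: (mean(self.scores[c]), len(self.scores[c])),
        )

    def step(self) -> tuple[str, int, float]:
        """Run one test against the weakest capability and adapt its difficulty."""
        cap = self.weakest()
        level = self.difficulty[cap]
        score = self.judge(cap, level)
        self.scores[cap].append(score)
        if score >= self.pass_threshold:
            # The agent coped, so the next test for this capability gets harder.
            self.difficulty[cap] = min(level + 1, self.max_difficulty)
        else:
            # A failure was found: record it and keep probing at this level.
            self.failures.append((cap, level, score))
        return cap, level, score


if __name__ == "__main__":
    # Hypothetical stub judge for a travel planner: each capability
    # passes only up to an assumed skill level.
    skill = {"booking": 3, "budget": 1, "itinerary": 5}

    def stub_judge(cap: str, level: int) -> float:
        return 1.0 if level <= skill[cap] else 0.2

    tester = AdaptiveTester(["booking", "budget", "itinerary"], stub_judge)
    for _ in range(6):
        print(tester.step())
    print("failures:", tester.failures)
```

With this stub judge, once "budget" fails at difficulty 2 its mean score drops below the others, so the remaining tests concentrate on it. Selecting by mean score is only one plausible steering rule; a real system might instead weight by failure severity or sample capabilities stochastically.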
Aug-26-2025