Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey
Guan, Shengyue; Xiong, Haoyi; Wang, Jindong; Bian, Jiang; Zhu, Bin; Lou, Jian-guang
arXiv.org Artificial Intelligence
This survey examines evaluation methods for large language model (LLM)-based agents in multi-turn conversational settings. Using a PRISMA-inspired framework, we systematically reviewed nearly 250 scholarly sources, capturing the state of the art from various venues of publication, and establishing a solid foundation for our analysis. Our study offers a structured approach by developing two interrelated taxonomy systems: one that defines "what to evaluate" and another that explains "how to evaluate". The first taxonomy identifies key components of LLM-based agents for multi-turn conversations and their evaluation dimensions, including task completion, response quality, user experience, memory and context retention, as well as planning and tool integration. These components ensure that the performance of conversational agents is assessed in a holistic and meaningful manner. The second taxonomy system focuses on the evaluation methodologies. It categorizes approaches into annotation-based evaluations, automated metrics, hybrid strategies that combine human assessments with quantitative measures, and self-judging methods utilizing LLMs. This framework not only captures traditional metrics derived from language understanding, such as BLEU and ROUGE scores, but also incorporates advanced techniques that reflect the dynamic, interactive nature of multi-turn dialogues.
Mar-28-2025
- Genre:
- Research Report (1.00)
- Overview (1.00)
- Industry:
- Consumer Products & Services (0.46)
- Technology: