ClarQ-LLM: A Benchmark for Models Clarifying and Requesting Information in Task-Oriented Dialog
Yujian Gan, Changling Li, Jinxia Xie, Luou Wen, Matthew Purver, Massimo Poesio
arXiv.org Artificial Intelligence
We introduce ClarQ-LLM, an evaluation framework consisting of bilingual English-Chinese conversation tasks, conversational agents and evaluation metrics, designed to serve as a strong benchmark for assessing agents' ability to ask clarification questions in task-oriented dialogues. The benchmark includes 31 different task types, each with 10 unique dialogue scenarios (310 in total) between information seeker and provider agents. The scenarios require the seeker to ask questions to resolve uncertainty and gather the information needed to complete tasks. Unlike traditional benchmarks that evaluate agents on fixed dialogue content, ClarQ-LLM includes a provider conversational agent that replicates the original human provider in the benchmark. This allows both current and future seeker agents to test their ability to complete information-gathering tasks through dialogue by interacting directly with our provider agent. In experiments, a LLAMA3.1 405B seeker agent achieved a maximum success rate of only 60.05%, showing that ClarQ-LLM presents a strong challenge for future research.
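The key difference from fixed-transcript benchmarks is that the seeker is scored by interacting live with a provider agent. The sketch below illustrates that kind of interactive evaluation loop under simplifying assumptions: each scenario is treated as a set of hidden facts held by the provider, the seeker either asks a question or submits a final answer, and success means recovering every hidden fact exactly within a turn budget. The names `Scenario`, `run_dialogue`, `success_rate`, the `max_turns` budget, the exact-match success criterion, and the toy agents are all hypothetical. This is not the ClarQ-LLM authors' implementation or metric definition, and real seekers and providers would be LLM-backed agents rather than lookup functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical formulation: dialogue history as (seeker question, provider reply) pairs.
History = List[Tuple[str, str]]


@dataclass
class Scenario:
    task_type: str               # hypothetical label for one of the benchmark's task types
    hidden_info: Dict[str, str]  # facts known only to the provider


# A seeker returns ("ask", question) or ("answer", {slot: value}).
Seeker = Callable[[History], Tuple[str, object]]
# A provider replies to a question using the scenario's hidden facts.
Provider = Callable[[Scenario, str], str]


def run_dialogue(scenario: Scenario, seeker: Seeker, provider: Provider,
                 max_turns: int = 10) -> bool:
    """Let the seeker query the provider until it answers or runs out of turns."""
    history: History = []
    for _ in range(max_turns):
        action, payload = seeker(history)
        if action == "answer":
            # Assumed criterion: success only if every hidden fact matches exactly.
            return payload == scenario.hidden_info
        reply = provider(scenario, str(payload))
        history.append((str(payload), reply))
    return False  # turn budget exhausted without a final answer


def success_rate(scenarios: List[Scenario], seeker: Seeker, provider: Provider,
                 max_turns: int = 10) -> float:
    """Percentage of scenarios the seeker completes successfully."""
    wins = sum(run_dialogue(s, seeker, provider, max_turns) for s in scenarios)
    return 100.0 * wins / len(scenarios)


# Toy agents for illustration only (hypothetical stand-ins for LLM agents).
def make_toy_seeker(slots: List[str]) -> Seeker:
    def seeker(history: History) -> Tuple[str, object]:
        if len(history) < len(slots):
            return ("ask", slots[len(history)])  # ask the next slot name
        return ("answer", {q: a for q, a in history})
    return seeker


def toy_provider(scenario: Scenario, question: str) -> str:
    return scenario.hidden_info.get(question, "unknown")


scenarios = [
    Scenario("hotel_booking", {"date": "Sep 14", "city": "Valencia"}),
    Scenario("train_ticket", {"date": "Sep 15", "time": "09:00"}),
]
rate = success_rate(scenarios, make_toy_seeker(["date", "city"]), toy_provider)
print(f"{rate:.2f}%")  # 50.00%: the fixed seeker never asks about "time"
```

Because the provider is an agent rather than a transcript, a seeker that asks the wrong questions simply fails to collect the facts it needs, which is what the second toy scenario shows.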
Sep-14-2024
- Country:
- Oceania > Australia
- North America
- Dominican Republic (0.04)
- United States
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- New York > New York County
- New York City (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- Europe
- Netherlands > Utrecht (0.04)
- Croatia (0.04)
- Spain
- Valencian Community > Valencia Province
- Valencia (0.04)
- Catalonia > Barcelona Province
- Barcelona (0.04)
- Slovenia > Central Slovenia
- Municipality of Ljubljana > Ljubljana (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Asia
- Genre:
- Research Report (1.00)
- Industry:
- Education > Educational Setting (0.46)
- Technology: