RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue
Zhengliang Shi, Weiwei Sun, Shuo Zhang, Zhen Zhang, Pengjie Ren, Zhaochun Ren
arXiv.org Artificial Intelligence
Evaluating open-domain dialogue systems is challenging for reasons such as the one-to-many problem, i.e., a dialogue context admits many appropriate responses besides the gold response. Automatic evaluation methods still need better consistency with human judgments, while reliable human evaluation can be time- and cost-intensive. To this end, we propose the Reference-Assisted Dialogue Evaluation (RADE) approach under the multi-task learning framework, which leverages a pre-created utterance as a reference, rather than only the gold response, to relieve the one-to-many problem. Specifically, RADE explicitly compares the reference and the candidate response to predict their overall scores. Moreover, an auxiliary response generation task enhances prediction via a shared encoder. To support RADE, we extend three datasets, through human annotation, with additional rated responses beyond the gold response. Experiments on our three datasets and two existing benchmarks demonstrate the effectiveness of our method: its Pearson, Spearman, and Kendall correlations with human evaluation exceed those of state-of-the-art baselines.
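The abstract reports agreement with human evaluation as Pearson, Spearman, and Kendall correlations. As a minimal sketch of how such agreement is commonly computed (not the authors' evaluation code), the snippet below correlates an automatic metric's scores with human ratings over the same responses. The function names and the toy scores are hypothetical and chosen only for illustration; Kendall is shown in its simple tau-a form, which assumes no tie correction.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical toy data: human ratings and metric scores for five responses.
human_ratings = [1, 2, 3, 4, 5]
metric_scores = [0.1, 0.4, 0.35, 0.8, 0.9]

print("Pearson :", round(pearson(human_ratings, metric_scores), 3))
print("Spearman:", round(spearman(human_ratings, metric_scores), 3))
print("Kendall :", round(kendall_tau_a(human_ratings, metric_scores), 3))
```

In this toy case the metric swaps the order of only one pair of responses, so both rank correlations stay high. A metric that agrees better with human judgments yields values closer to 1 on all three measures.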
Sep-17-2023
- Country:
- Oceania > Australia
- North America
- Dominican Republic (0.04)
- United States
- Texas (0.04)
- Pennsylvania (0.04)
- Michigan (0.04)
- Louisiana (0.04)
- Washington > King County
- Seattle (0.04)
- New York > New York County
- New York City (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Canada > British Columbia
- Europe
- United Kingdom
- Scotland > City of Aberdeen
- Aberdeen (0.04)
- England > Greater London
- London (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Italy > Tuscany
- Florence (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Asia
- Taiwan > Taiwan Province
- Taipei (0.04)
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- China
- Shandong Province > Qingdao (0.04)
- Hong Kong (0.04)
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Genre:
- Research Report > New Finding (0.93)
- Industry:
- Retail (0.68)
- Technology: