DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation

Park, ChaeHun, Lee, Seungil Chad, Rim, Daniel, Choo, Jaegul

May-25-2023–arXiv.org Artificial Intelligence

Despite the recent advances in open-domain dialogue systems, building a reliable evaluation metric is still a challenging problem. Recent studies proposed learnable metrics based on classification models trained to distinguish the correct response. However, neural classifiers are known to make overly confident predictions for examples from unseen distributions. We propose DEnsity, which evaluates a response by utilizing density estimation on the feature space derived from a neural classifier. Our metric measures how likely a response would appear in the distribution of human conversations. Moreover, to improve the performance of DEnsity, we utilize contrastive learning to further compress the feature space. Experiments on multiple response evaluation datasets show that DEnsity correlates better with human evaluations than the existing metrics. Our code is available at https://github.com/ddehun/DEnsity.
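The core idea in the abstract, scoring a response by how likely its feature vector is under the distribution of human responses, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes feature vectors have already been extracted from a classifier, uses toy 2-D vectors in place of real features, and fits a diagonal Gaussian as a simple stand-in density estimator. The function names `fit_diagonal_gaussian` and `log_density` are hypothetical.

```python
import math


def fit_diagonal_gaussian(features, eps=1e-6):
    """Estimate a per-dimension mean and variance from reference feature vectors.

    A diagonal Gaussian is an assumption made for brevity; it ignores
    correlations between feature dimensions.
    """
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    # eps keeps the variance strictly positive for constant dimensions.
    var = [
        sum((f[d] - mean[d]) ** 2 for f in features) / n + eps
        for d in range(dim)
    ]
    return mean, var


def log_density(x, mean, var):
    """Log-density of x under the fitted diagonal Gaussian (higher = more typical)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


# Toy stand-ins for features of human-written responses (hypothetical values).
human_features = [
    [0.9, 0.1],
    [1.1, -0.1],
    [1.0, 0.0],
    [1.0, 0.2],
    [1.0, -0.2],
]
mean, var = fit_diagonal_gaussian(human_features)

# A response whose features sit near the human cluster, and one far from it.
score_near = log_density([1.0, 0.0], mean, var)
score_far = log_density([3.0, 2.0], mean, var)
```

Under this sketch, a response whose features fall close to the human cluster receives a higher score than one far away, which is the behaviour a density-based metric relies on. Replacing the diagonal Gaussian with a full-covariance estimate would also account for correlated feature dimensions.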

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty (0.62)
  - Natural Language > Discourse & Dialogue (0.47)
  - Machine Learning
    - Neural Networks > Deep Learning (0.48)
    - Statistical Learning (0.46)
