On the Effectiveness of Automated Metrics for Text Generation Systems

von Däniken, Pius, Deriu, Jan, Tuggener, Don, Cieliebak, Mark

Oct-24-2022–arXiv.org Artificial Intelligence

A major challenge in the field of Text Generation is evaluation because we lack a sound theory that can be leveraged to extract guidelines for evaluation campaigns. In this work, we propose a first step towards such a theory that incorporates different sources of uncertainty, such as imperfect automated metrics and insufficiently sized test sets. The theory has practical applications, such as determining the number of samples needed to reliably distinguish the performance of a set of Text Generation systems in a given setting. We showcase the application of the theory on the WMT 21 and Spot-The-Bot evaluation data and outline how it can be leveraged to improve the evaluation protocol regarding the reliability, robustness, and significance of the evaluation outcome.

machine learning, metric, natural language, (18 more...)

arXiv.org Artificial Intelligence

Oct-24-2022

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Pennsylvania (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
- Europe
  - Italy > Trentino-Alto Adige/Südtirol
    - Trentino Province > Trento (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia > Japan
  - Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Generation (1.00)
  - Machine Learning
    - Performance Analysis > Accuracy (0.94)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found