Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices

Schmidtová, Patrícia, Mahamood, Saad, Balloccu, Simone, Dušek, Ondřej, Gatt, Albert, Gkatzia, Dimitra, Howcroft, David M., Plátek, Ondřej, Sivaprasad, Adarsa

Aug-17-2024–arXiv.org Artificial Intelligence

There is now a significant body of contributions presenting experimental research, meta-analyses and/or best practice guidelines, on issues ranging from statistical significance testing (Dror and Reichart, 2018), to human evaluation methods (Howcroft et al., 2020a; van der Lee et al., 2021; Hämäläinen and Alnajjar, 2021; Shimorina and Belz, 2022a), error analysis (van Miltenburg et al., 2021a, 2023) and replicability of ...

Given the well-documented shortcomings of automatic metrics, our goal in this paper is to survey the current state of play in metric-based evaluations of natural language generation (NLG). As with the above-mentioned studies focusing on other facets of evaluation, we aim to both understand how metrics are currently used in NLG, and to identify gaps and possible ways forward in an effort to improve the scientific quality of NLG research.

Country:
- North America
  - United States
    - Pennsylvania (0.04)
    - Michigan (0.04)
    - Washington > King County
      - Seattle (0.04)
    - Ohio > Franklin County
      - Columbus (0.04)
    - California
      - San Francisco County > San Francisco (0.04)
      - San Diego County > San Diego (0.04)
  - Canada > Ontario
    - Toronto (0.08)
- Europe
  - Czechia > Prague (0.06)
  - Netherlands > Utrecht (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Italy
    - Tuscany > Florence (0.04)
    - Liguria > Genoa (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Germany > North Rhine-Westphalia
    - Düsseldorf Region > Düsseldorf (0.04)
  - Croatia > Dubrovnik-Neretva County
    - Dubrovnik (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - United Kingdom
    - Scotland
      - City of Aberdeen > Aberdeen (0.04)
      - City of Edinburgh > Edinburgh (0.04)
    - England > East Sussex
      - Brighton (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.04)
  - Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
  - China
    - Hong Kong (0.04)
    - Beijing > Beijing (0.04)

Genre:
- Overview (1.00)
- Research Report
  - New Finding (0.46)
  - Experimental Study (0.34)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
