Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices
Schmidtová, Patrícia; Mahamood, Saad; Balloccu, Simone; Dušek, Ondřej; Gatt, Albert; Gkatzia, Dimitra; Howcroft, David M.; Plátek, Ondřej; Sivaprasad, Adarsa
–arXiv.org Artificial Intelligence
There is now a significant body of contributions presenting experimental research, meta-analyses and/or best practice guidelines, on issues ranging from statistical significance testing (Dror and Reichart, 2018), to human evaluation methods (Howcroft et al., 2020a; van der Lee et al., 2021; Hämäläinen and Alnajjar, 2021; Shimorina and Belz, 2022a), error analysis (van Miltenburg et al., 2021a, 2023) and replicability of ...

Given the well-documented shortcomings of automatic metrics, our goal in this paper is to survey the current state of play in metric-based evaluations of natural language generation (NLG). As with the above-mentioned studies focusing on other facets of evaluation, we aim to both understand how metrics are currently used in NLG, and to identify gaps and possible ways forward in an effort to improve the scientific quality of NLG research.
Aug-17-2024
- Country:
  - North America
    - United States
      - Pennsylvania (0.04)
      - Michigan (0.04)
      - Washington > King County
        - Seattle (0.04)
      - Ohio > Franklin County
        - Columbus (0.04)
      - California
        - San Francisco County > San Francisco (0.04)
        - San Diego County > San Diego (0.04)
    - Canada > Ontario
      - Toronto (0.08)
  - Europe
    - Czechia > Prague (0.06)
    - Netherlands > Utrecht (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
    - Portugal > Lisbon
      - Lisbon (0.04)
    - Italy
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Germany > North Rhine-Westphalia
      - Düsseldorf Region > Düsseldorf (0.04)
    - Croatia > Dubrovnik-Neretva County
      - Dubrovnik (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - United Kingdom
      - Scotland
        - City of Aberdeen > Aberdeen (0.04)
        - City of Edinburgh > Edinburgh (0.04)
      - England > East Sussex
        - Brighton (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
  - Asia
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
    - Japan > Honshū
      - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
    - China
- Genre:
  - Overview (1.00)
  - Research Report
    - New Finding (0.46)
    - Experimental Study (0.34)
- Technology: