COLING 2022 Highlights

Recent metrics for natural language generation rely on pre-trained language models; examples include BERTScore, BLEURT, and COMET. These metrics correlate well with human judgments on standard benchmarks, but it is unclear how they perform on styles and domains that aren't well represented in their training data. In other words, are these metrics robust? The authors found that BERTScore isn't robust to character-level perturbations such as typos.
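
To make the robustness question concrete, here is a minimal sketch of a character-level perturbation probe, using the open-source `bert_score` package. The `perturb_chars` helper, the swap-based perturbation, and the 10% rate are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of a character-level robustness probe for BERTScore.
# Assumes `pip install bert-score`; the perturbation scheme below
# (random adjacent-character swaps) is a hypothetical stand-in for
# whatever perturbations the paper actually uses.
import random

from bert_score import score

def perturb_chars(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Swap adjacent characters with probability `rate` (typo-like noise)."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

references = ["The quick brown fox jumps over the lazy dog."]
clean = ["The quick brown fox jumps over the lazy dog."]
noisy = [perturb_chars(clean[0], rate=0.10)]

# Score clean and perturbed candidates against the same reference.
_, _, f1_clean = score(clean, references, lang="en", verbose=False)
_, _, f1_noisy = score(noisy, references, lang="en", verbose=False)

print(f"clean F1:     {f1_clean.mean().item():.4f}")
print(f"perturbed F1: {f1_noisy.mean().item():.4f}")
```

A robust metric should degrade gracefully under this kind of typo-level noise; a sharp drop in F1 at small perturbation rates is the kind of brittleness the authors report for BERTScore.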
