CLEME2.0: Towards More Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction
Ye, Jingheng, Xu, Zishan, Li, Yinghui, Cheng, Xuxin, Song, Linlin, Zhou, Qingyu, Zheng, Hai-Tao, Shen, Ying, Su, Xin
arXiv.org Artificial Intelligence
The paper focuses on improving the interpretability of Grammatical Error Correction (GEC) metrics, which has received little attention in previous studies. To bridge the gap, we propose CLEME2.0, a reference-based evaluation strategy that describes four elementary dimensions of GEC systems: hit-correction, error-correction, under-correction, and over-correction. Together, these dimensions reveal the critical characteristics of GEC systems and locate their drawbacks. Evaluating systems by combining these dimensions achieves higher human consistency than other reference-based and reference-less metrics. Extensive experiments on 2 human judgement datasets and 6 reference datasets demonstrate the effectiveness and robustness of our method. All code will be released after the peer review.
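To make the four dimensions concrete, below is a minimal sketch of how hypothesis edits might be disentangled against reference edits. The `Edit` representation, the span-overlap rule, and the matching logic are illustrative assumptions, not the paper's exact formulation (zero-width insertion edits are ignored here for brevity).

```python
# Illustrative sketch (assumed formulation, not CLEME2.0's exact definitions):
# classify each hypothesis edit against reference edits into the four
# dimensions the abstract names: hit-, error-, under-, and over-correction.
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    start: int        # token index where the edit begins (inclusive)
    end: int          # token index where the edit ends (exclusive)
    replacement: str  # corrected text for the span


def overlaps(a: Edit, b: Edit) -> bool:
    """True if two edits touch overlapping source spans."""
    return a.start < b.end and b.start < a.end


def disentangle(hyp_edits: list[Edit], ref_edits: list[Edit]) -> dict[str, int]:
    """Count hit-, error-, under-, and over-corrections for one sentence."""
    counts = {"hit": 0, "error": 0, "under": 0, "over": 0}
    matched_refs: set[Edit] = set()
    for h in hyp_edits:
        touched = [r for r in ref_edits if overlaps(h, r)]
        if not touched:
            counts["over"] += 1   # edited a span no reference marks as wrong
        elif any(h == r for r in touched):
            counts["hit"] += 1    # corrected an error, and correctly so
            matched_refs.update(r for r in touched if r == h)
        else:
            counts["error"] += 1  # edited an erroneous span, but wrongly
            matched_refs.update(touched)
    # reference edits never touched by any hypothesis edit were missed
    counts["under"] = sum(1 for r in ref_edits if r not in matched_refs)
    return counts


# Example: source "He go to school yesterday ."
ref = [Edit(1, 2, "went")]
hyp_good = [Edit(1, 2, "went")]                  # hit-correction
hyp_bad = [Edit(1, 2, "goes"), Edit(4, 5, "")]   # error- + over-correction
print(disentangle(hyp_good, ref))  # {'hit': 1, 'error': 0, 'under': 0, 'over': 0}
print(disentangle(hyp_bad, ref))   # {'hit': 0, 'error': 1, 'under': 0, 'over': 1}
```

Combining these per-sentence counts into a single system score (e.g., rewarding hits while penalizing the other three dimensions) is what the abstract refers to as "combining these dimensions"; the paper's actual weighting scheme is not reproduced in this toy setup.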
Jun-30-2024