CLEME2.0: Towards More Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction
Ye, Jingheng, Xu, Zishan, Li, Yinghui, Cheng, Xuxin, Song, Linlin, Zhou, Qingyu, Zheng, Hai-Tao, Shen, Ying, Su, Xin
arXiv.org Artificial Intelligence
This paper focuses on improving the interpretability of Grammatical Error Correction (GEC) metrics, which has received little attention in previous studies. To bridge this gap, we propose CLEME2.0, a reference-based evaluation strategy that describes four elementary dimensions of GEC systems: hit-correction, error-correction, under-correction, and over-correction. Together, these dimensions reveal the critical characteristics of GEC systems and help locate their drawbacks. Evaluating systems by combining these dimensions yields higher consistency with human judgements than other reference-based and reference-less metrics. Extensive experiments on 2 human judgement datasets and 6 reference datasets demonstrate the effectiveness and robustness of our method. All code will be released after peer review.
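The four dimensions named in the abstract can be illustrated with a minimal sketch. The abstract does not define them, so everything below is an assumption made for illustration: the `(start, end, replacement)` edit representation, the exact-span matching rule, the function name `classify_edits`, and the reading of each category (hit: same span and same replacement as the reference; error: same span, different replacement; over: a hypothesis edit with no reference edit on that span; under: a reference edit the hypothesis never attempts). It should not be taken as the CLEME2.0 algorithm itself.

```python
# Illustrative sketch only: the edit format and category rules are assumptions,
# not the procedure described in the paper.

def classify_edits(hyp_edits, ref_edits):
    """Count edits in four assumed categories.

    Each edit is a tuple (start, end, replacement), meaning that source
    tokens [start, end) are replaced by the string `replacement`.
    """
    ref_by_span = {(start, end): rep for start, end, rep in ref_edits}
    counts = {"hit": 0, "error": 0, "under": 0, "over": 0}
    attempted = set()  # reference spans the hypothesis tried to edit

    for start, end, rep in hyp_edits:
        span = (start, end)
        if span not in ref_by_span:
            counts["over"] += 1      # edit where the reference has none
        elif ref_by_span[span] == rep:
            counts["hit"] += 1       # same span, same replacement
            attempted.add(span)
        else:
            counts["error"] += 1     # same span, different replacement
            attempted.add(span)

    # Reference edits the hypothesis never touched.
    counts["under"] = sum(1 for span in ref_by_span if span not in attempted)
    return counts


# Source tokens: She(0) go(1) to(2) school(3) yesterday(4) and(5) buyed(6) a(7) books(8) .(9)
ref_edits = [(1, 2, "went"), (6, 7, "bought"), (8, 9, "book")]
hyp_edits = [(1, 2, "went"), (6, 7, "buy"), (3, 4, "the school")]

counts = classify_edits(hyp_edits, ref_edits)
print(counts)
```

In this toy example each category receives exactly one edit. A real evaluation would also need an alignment step that handles overlapping or differently segmented edits, which this sketch deliberately leaves out.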
Jun-30-2024
- Country:
  - North America
    - United States
      - Illinois (0.04)
      - Texas > Travis County
        - Austin (0.04)
    - Canada
      - Quebec > Montreal (0.04)
      - Ontario > Toronto (0.04)
      - British Columbia > Metro Vancouver Regional District
        - Vancouver (0.04)
  - Europe
    - Austria > Vienna (0.14)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Italy > Piedmont
      - Turin Province > Turin (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Bulgaria > Sofia City Province
      - Sofia (0.04)
  - Asia
    - Singapore (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
    - China > Beijing
      - Beijing (0.04)
- Genre:
  - Research Report (0.82)
- Technology: