JELV: A Judge of Edit-Level Validity for Evaluation and Automated Reference Expansion in Grammatical Error Correction
Zhan, Yuhao, Zhang, Yuqing, Yuan, Jing, Ma, Qixiang, Yang, Zhiqi, Gu, Yu, Liu, Zemin, Wu, Fei
–arXiv.org Artificial Intelligence
Existing Grammatical Error Correction (GEC) systems suffer from limited reference diversity, which leads to underestimated evaluation scores and restricted model generalization. To address this issue, we introduce the Judge of Edit-Level Validity (JELV), an automated framework that validates correction edits along three dimensions: grammaticality, faithfulness, and fluency. Using our proposed human-annotated Pair-wise Edit-level Validity Dataset (PEVData) as a benchmark, JELV offers two implementations: a multi-turn LLM-as-Judges pipeline that achieves 90% agreement with human annotators, and a distilled DeBERTa classifier with 85% precision on valid edits. We then apply JELV to reclassify misjudged false positives in evaluation and derive a comprehensive evaluation metric that integrates false positive decoupling and fluency scoring, resulting in state-of-the-art correlation with human judgments. We also apply JELV to filter LLM-generated correction candidates, expanding BEA19's single-reference dataset, which contains 38,692 source sentences. Retraining top GEC systems on this expanded dataset yields measurable performance gains. JELV provides a scalable solution for enhancing reference diversity and strengthening both evaluation and model generalization.
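To make the idea of reclassifying misjudged false positives concrete, the following is a minimal sketch and not the paper's actual metric. It assumes edits are represented as hashable tuples, that an edit judged valid but absent from the reference is counted as a true positive, and that the score is a plain F0.5 over edit counts. The names `f_beta`, `score_with_judge`, and `is_valid` are hypothetical; `is_valid` stands in for any edit-validity judge, and the fluency component mentioned in the abstract is not modeled.

```python
from typing import Callable, Hashable, Iterable


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Standard F-beta from edit counts; returns 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def score_with_judge(
    hyp_edits: Iterable[Hashable],
    ref_edits: Iterable[Hashable],
    is_valid: Callable[[Hashable], bool],
    beta: float = 0.5,
) -> float:
    """Score hypothesis edits against one reference, rescuing valid unmatched edits.

    Assumption (not from the source): an unmatched edit that the judge
    accepts is moved from the false-positive count to the true-positive count.
    """
    hyp, ref = set(hyp_edits), set(ref_edits)
    tp = len(hyp & ref)                      # edits that match the reference
    unmatched = hyp - ref                    # would normally all be false positives
    rescued = sum(1 for e in unmatched if is_valid(e))
    fp = len(unmatched) - rescued            # only edits the judge rejects remain FPs
    fn = len(ref - hyp)                      # reference edits the system missed
    return f_beta(tp + rescued, fp, fn, beta)


# Hypothetical edits as (start, end, replacement) tuples.
reference = {(0, 1, "The"), (3, 4, "went")}
hypothesis = {(0, 1, "The"), (5, 6, "quickly"), (7, 8, "xx")}

baseline = score_with_judge(hypothesis, reference, lambda e: False)
judged = score_with_judge(hypothesis, reference, lambda e: e == (5, 6, "quickly"))
```

With a judge that rejects everything, the function reduces to ordinary single-reference F0.5, so `baseline` is the conventional score and `judged` shows the effect of rescuing one valid edit that the reference did not contain.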
Dec-9-2025
- Country:
- Asia
- China
- Beijing > Beijing (0.04)
- Zhejiang Province > Hangzhou (0.04)
- India > Karnataka
- Bengaluru (0.04)
- Indonesia > Bali (0.04)
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Singapore (0.04)
- Europe
- North America
- Canada (0.04)
- Dominican Republic (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- Maryland > Baltimore (0.04)
- New Mexico > Santa Fe County
- Santa Fe (0.04)
- Texas > Travis County
- Austin (0.04)
- Oceania > Australia
- Genre:
- Overview (0.68)
- Research Report (0.82)
- Technology: