Enhancing Text Editing for Grammatical Error Correction: Arabic as a Case Study
Bashar Alhafni, Nizar Habash
arXiv.org Artificial Intelligence
Text editing frames grammatical error correction (GEC) as a sequence tagging problem: edit tags are assigned to input tokens, and applying these edits yields the corrected text. This approach has gained attention for its efficiency and interpretability. However, while it has been extensively explored for English, text editing remains largely underexplored for morphologically rich languages like Arabic. In this paper, we introduce a text editing approach that derives edit tags directly from data, eliminating the need for language-specific edits. We demonstrate its effectiveness on Arabic, a diglossic and morphologically rich language, and investigate the impact of different edit representations on model performance. Our approach achieves state-of-the-art results on two Arabic GEC benchmarks and performs on par with the state of the art on two others. Additionally, our models are over six times faster than existing Arabic GEC systems, making our approach more practical for real-world applications. Finally, we explore ensemble models and demonstrate that combining different models leads to further performance improvements. We make our code, data, and pretrained models publicly available.
Mar-2-2025
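The tagging formulation described in the abstract can be illustrated with a minimal sketch. It assumes whitespace-free tokens, a non-empty source sentence, and a hypothetical three-label scheme (KEEP, DELETE, REPLACE_...) chosen only for illustration; it is not the edit representation used in the paper. The function names derive_tags and apply_tags are likewise illustrative, and the alignment relies on Python's standard difflib module rather than the authors' alignment method. The example sentence is English for readability.

```python
from difflib import SequenceMatcher


def derive_tags(source, target):
    """Derive one edit tag per source token from a (source, target) pair.

    Hypothetical scheme: each tag states what its source token becomes.
    Assumes `source` is non-empty and tokens contain no whitespace.
    """
    # outputs[i] holds the target tokens that source[i] is rewritten to.
    outputs = [[tok] for tok in source]
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        new_tokens = list(target[j1:j2])
        if op == "insert":
            # Inserted tokens have no source token of their own, so attach
            # them to the preceding token (or prepend to the first one).
            if i1 > 0:
                outputs[i1 - 1].extend(new_tokens)
            else:
                outputs[0][0:0] = new_tokens
        else:  # "replace" or "delete"
            for i in range(i1, i2):
                outputs[i] = []
            # The first token of the span carries the whole replacement.
            outputs[i1] = new_tokens

    tags = []
    for tok, out in zip(source, outputs):
        if out == [tok]:
            tags.append("KEEP")
        elif not out:
            tags.append("DELETE")
        else:
            tags.append("REPLACE_" + " ".join(out))
    return tags


def apply_tags(source, tags):
    """Apply per-token edit tags to recover the corrected token sequence."""
    corrected = []
    for tok, tag in zip(source, tags):
        if tag == "KEEP":
            corrected.append(tok)
        elif tag == "DELETE":
            continue
        elif tag.startswith("REPLACE_"):
            corrected.extend(tag[len("REPLACE_"):].split(" "))
        else:
            raise ValueError(f"unknown tag: {tag}")
    return corrected


# Round trip on a toy sentence pair: tags derived from the data
# reproduce the target when applied to the source.
source = ["She", "go", "to", "to", "school"]
target = ["She", "goes", "to", "school", "."]
tags = derive_tags(source, target)
assert apply_tags(source, tags) == target
```

In a tagging-based GEC system, a model would predict such tags for each input token at inference time; here they are derived from a known correction only to show that applying the edits yields the corrected text.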