Bangla Grammatical Error Detection Leveraging Transformer-based Token Classification
Islam, Shayekh Bin, Tanvir, Ridwanul Hasan, Afnan, Sihat
arXiv.org Artificial Intelligence
Bangla is the seventh most spoken language in the world by total number of speakers, yet the development of an automated grammar checker for the language remains understudied. Bangla grammatical error detection is the task of detecting substrings of a Bangla text that contain grammatical, punctuation, or spelling errors, which is crucial for developing an automated Bangla typing assistant. Our approach frames the task as a token classification problem and uses state-of-the-art transformer-based models. We then combine the outputs of these models and apply rule-based post-processing to produce a more reliable and comprehensive result. Our system is evaluated on a dataset of over 25,000 texts from various sources. Our best model achieves a Levenshtein distance score of 1.04. Finally, we provide a detailed analysis of the different components of our system.
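The Levenshtein distance used as the evaluation score counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A minimal Python sketch follows. The abstract does not say how the score is aggregated over the dataset, so averaging per-text distances in `mean_levenshtein` is an assumption, and both function names are hypothetical rather than taken from the authors' code.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn `a` into `b`."""
    # Keep the shorter string in the inner loop so the row buffer stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of `a` and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete ch_a
                    current[j - 1] + 1,                   # insert ch_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def mean_levenshtein(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Average edit distance over paired texts (assumed aggregation, lower is better)."""
    if len(predictions) != len(references) or not predictions:
        raise ValueError("predictions and references must be non-empty and equal in length")
    total = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    return total / len(predictions)
```

Under this reading, a score of 0 would mean every predicted string matches its reference exactly. Python strings are sequences of Unicode code points, so Bangla text is compared code point by code point here.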
Nov-13-2024
- Country:
  - Asia
    - Bangladesh > Dhaka Division
      - Dhaka District > Dhaka (0.04)
    - India > Maharashtra
      - Mumbai (0.04)
  - North America > United States
    - Washington > King County > Seattle (0.04)
- Genre:
  - Research Report (0.64)
- Technology: