arabart
Transformer Models in Education: Summarizing Science Textbooks with AraBART, MT5, AraT5, and mBART
Masri, Sari, Raddad, Yaqeen, Khandaqji, Fidaa, Ashqar, Huthaifa I., Elhenawy, Mohammed
Recently, with the rapid development in the fields of technology and the increasing amount of text t available on the internet, it has become urgent to develop effective tools for processing and understanding texts in a way that summaries the content without losing the fundamental essence of the information. Given this challenge, we have developed an advanced text summarization system targeting Arabic textbooks. Relying on modern natu-ral language processing models such as MT5, AraBART, AraT5, and mBART50, this system evaluates and extracts the most important sentences found in biology textbooks for the 11th and 12th grades in the Palestinian curriculum, which enables students and teachers to obtain accurate and useful summaries that help them easily understand the content. We utilized the Rouge metric to evaluate the performance of the trained models. Moreover, experts in education Edu textbook authoring assess the output of the trained models. This approach aims to identify the best solutions and clarify areas needing improvement. This research provides a solution for summarizing Arabic text. It enriches the field by offering results that can open new horizons for research and development in the technologies for understanding and generating the Arabic language. Additionally, it contributes to the field with Arabic texts through creating and compiling schoolbook texts and building a dataset.
- Asia > Middle East > Palestine (0.04)
- Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)
- Oceania > Australia > Queensland > Brisbane (0.04)
- (2 more...)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.48)
Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation
Alhafni, Bashar, Inoue, Go, Khairallah, Christian, Habash, Nizar
Grammatical error correction (GEC) is a well-explored problem in English with many existing models and datasets. However, research on GEC in morphologically rich languages has been limited due to challenges such as data scarcity and language complexity. In this paper, we present the first results on Arabic GEC using two newly developed Transformer-based pretrained sequence-to-sequence models. We also define the task of multi-class Arabic grammatical error detection (GED) and present the first results on multi-class Arabic GED. We show that using GED information as an auxiliary input in GEC models improves GEC performance across three datasets spanning different genres. Moreover, we also investigate the use of contextual morphological preprocessing in aiding GEC systems. Our models achieve SOTA results on two Arabic GEC shared task datasets and establish a strong benchmark on a recently created dataset. We make our code, data, and pretrained models publicly available.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Washington > King County > Seattle (0.14)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- (32 more...)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)