AITopics | Alhafni, Bashar

Collaborating Authors

Alhafni, Bashar

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection

Qwaider, Chatrine, Alhafni, Bashar, Chirkunov, Kirill, Habash, Nizar, Briscoe, Ted

arXiv.org Artificial IntelligenceMar-22-2025

Automated Essay Scoring (AES) plays a crucial role in assessing language learners' writing quality, reducing grading workload, and providing real-time feedback. Arabic AES systems are particularly challenged by the lack of annotated essay datasets. This paper presents a novel framework leveraging Large Language Models (LLMs) and Transformers to generate synthetic Arabic essay datasets for AES. We prompt an LLM to generate essays across CEFR proficiency levels and introduce controlled error injection using a fine-tuned Standard Arabic BERT model for error type prediction. Our approach produces realistic human-like essays, contributing a dataset of 3,040 annotated essays. Additionally, we develop a BERT-based auto-marking system for accurate and scalable Arabic essay evaluation. Experimental results demonstrate the effectiveness of our framework in improving Arabic AES performance.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2503.17739

Country:

Asia > Thailand (0.14)
Europe > Ukraine (0.14)
Europe > Germany (0.14)
Europe > France (0.14)

Genre:

Overview (0.93)
Research Report > New Finding (0.48)
Personal > Interview (0.46)

Industry:

Education > Assessment & Standards > Student Performance (1.00)
Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.71)
Education > Educational Technology > Educational Software > Computer Based Training (0.61)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Enhancing Text Editing for Grammatical Error Correction: Arabic as a Case Study

Alhafni, Bashar, Habash, Nizar

arXiv.org Artificial IntelligenceMar-2-2025

Text editing frames grammatical error correction (GEC) as a sequence tagging problem, where edit tags are assigned to input tokens, and applying these edits results in the corrected text. This approach has gained attention for its efficiency and interpretability. However, while extensively explored for English, text editing remains largely underexplored for morphologically rich languages like Arabic. In this paper, we introduce a text editing approach that derives edit tags directly from data, eliminating the need for language-specific edits. We demonstrate its effectiveness on Arabic, a diglossic and morphologically rich language, and investigate the impact of different edit representations on model performance. Our approach achieves SOTA results on two Arabic GEC benchmarks and performs on par with SOTA on two others. Additionally, our models are over six times faster than existing Arabic GEC systems, making our approach more practical for real-world applications. Finally, we explore ensemble models, demonstrating how combining different models leads to further performance improvements. We make our code, data, and pretrained models publicly available.

computational linguistic, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2503.00985

Country:

Europe (1.00)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Data Science > Data Quality > Data Cleaning (0.64)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.64)

Add feedback

Strategies for Arabic Readability Modeling

Liberato, Juan Piñeros, Alhafni, Bashar, Khalil, Muhamed Al, Habash, Nizar

arXiv.org Artificial IntelligenceJul-3-2024

Automatic readability assessment is relevant to building NLP applications for education, content analysis, and accessibility. However, Arabic readability assessment is a challenging task due to Arabic's morphological richness and limited readability resources. In this paper, we present a set of experimental results on Arabic readability assessment using a diverse range of approaches, from rule-based methods to Arabic pretrained language models. We report our results on a newly created corpus at different textual granularity levels (words and sentence fragments). Our results show that combining different techniques yields the best results, achieving an overall macro F1 score of 86.7 at the word level and 87.9 at the fragment level on a blind test set. We make our code, data, and pretrained models publicly available.

computational linguistic, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2407.03032

Country:

Europe (1.00)
Asia (1.00)
North America > United States > New York (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Exploiting Dialect Identification in Automatic Dialectal Text Normalization

Alhafni, Bashar, Al-Towaity, Sarah, Fawzy, Ziyad, Nassar, Fatema, Eryani, Fadhl, Bouamor, Houda, Habash, Nizar

arXiv.org Artificial IntelligenceJul-3-2024

Dialectal Arabic is the primary spoken language used by native Arabic speakers in daily communication. The rise of social media platforms has notably expanded its use as a written language. However, Arabic dialects do not have standard orthographies. This, combined with the inherent noise in user-generated content on social media, presents a major challenge to NLP applications dealing with Dialectal Arabic. In this paper, we explore and report on the task of CODAfication, which aims to normalize Dialectal Arabic into the Conventional Orthography for Dialectal Arabic (CODA). We work with a unique parallel corpus of multiple Arabic dialects focusing on five major city dialects. We benchmark newly developed pretrained sequence-to-sequence models on the task of CODAfication. We further show that using dialect identification information improves the performance across all dialects. We make our code, data, and pretrained models publicly available.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2407.0302

Country:

Europe (1.00)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Washington > King County > Seattle (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.95)
Information Technology > Communications > Social Media (0.86)

Add feedback

The SAMER Arabic Text Simplification Corpus

Alhafni, Bashar, Hazim, Reem, Liberato, Juan Piñeros, Khalil, Muhamed Al, Habash, Nizar

arXiv.org Artificial IntelligenceApr-29-2024

We present the SAMER Corpus, the first manually annotated Arabic parallel corpus for text simplification targeting school-aged learners. Our corpus comprises texts of 159K words selected from 15 publicly available Arabic fiction novels most of which were published between 1865 and 1955. Our corpus includes readability level annotations at both the document and word levels, as well as two simplified parallel versions for each text targeting learners at two different readability levels. We describe the corpus selection process, and outline the guidelines we followed to create the annotations and ensure their quality. Our corpus is publicly available to support and encourage research on Arabic text simplification, Arabic automatic readability assessment, and the development of Arabic pedagogical language technologies.

machine learning, natural language, simplification, (19 more...)

arXiv.org Artificial Intelligence

2404.18615

Country:

Europe (1.00)
North America > United States > Colorado (0.28)
North America > United States > California (0.28)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback

mEdIT: Multilingual Text Editing via Instruction Tuning

Raheja, Vipul, Alikaniotis, Dimitris, Kulkarni, Vivek, Alhafni, Bashar, Kumar, Dhruv

arXiv.org Artificial IntelligenceApr-17-2024

We introduce mEdIT, a multi-lingual extension to CoEdIT -- the recent state-of-the-art text editing models for writing assistance. mEdIT models are trained by fine-tuning multi-lingual large, pre-trained language models (LLMs) via instruction tuning. They are designed to take instructions from the user specifying the attributes of the desired text in the form of natural language instructions, such as Grammatik korrigieren (German) or Parafrasee la oraci\'on (Spanish). We build mEdIT by curating data from multiple publicly available human-annotated text editing datasets for three text editing tasks (Grammatical Error Correction (GEC), Text Simplification, and Paraphrasing) across diverse languages belonging to six different language families. We detail the design and training of mEdIT models and demonstrate their strong performance on many multi-lingual text editing benchmarks against other multilingual LLMs. We also find that mEdIT generalizes effectively to new languages over multilingual baselines. We publicly release our data, code, and trained models at https://github.com/vipulraheja/medit.

computational linguistic, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2402.16472

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Personalized Text Generation with Fine-Grained Linguistic Control

Alhafni, Bashar, Kulkarni, Vivek, Kumar, Dhruv, Raheja, Vipul

arXiv.org Artificial IntelligenceFeb-7-2024

As the text generation capabilities of large language models become increasingly prominent, recent studies have focused on controlling particular aspects of the generated text to make it more personalized. However, most research on controllable text generation focuses on controlling the content or modeling specific high-level/coarse-grained attributes that reflect authors' writing styles, such as formality, domain, or sentiment. In this paper, we focus on controlling fine-grained attributes spanning multiple linguistic dimensions, such as lexical and syntactic attributes. We introduce a novel benchmark to train generative models and evaluate their ability to generate personalized text based on multiple fine-grained linguistic attributes. We systematically investigate the performance of various large language models on our benchmark and draw insights from the factors that impact their performance. We make our code, data, and pretrained models publicly available.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2402.04914

Country:

North America > United States > New Mexico (0.14)
Asia > Middle East > UAE (0.14)
North America > United States > New York (0.14)
Europe > Italy > Tuscany (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation

Alhafni, Bashar, Inoue, Go, Khairallah, Christian, Habash, Nizar

arXiv.org Artificial IntelligenceNov-9-2023

Grammatical error correction (GEC) is a well-explored problem in English with many existing models and datasets. However, research on GEC in morphologically rich languages has been limited due to challenges such as data scarcity and language complexity. In this paper, we present the first results on Arabic GEC using two newly developed Transformer-based pretrained sequence-to-sequence models. We also define the task of multi-class Arabic grammatical error detection (GED) and present the first results on multi-class Arabic GED. We show that using GED information as an auxiliary input in GEC models improves GEC performance across three datasets spanning different genres. Moreover, we also investigate the use of contextual morphological preprocessing in aiding GEC systems. Our models achieve SOTA results on two Arabic GEC shared task datasets and establish a strong benchmark on a recently created dataset. We make our code, data, and pretrained models publicly available.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2305.14734

Country:

Europe (1.00)
Asia > Middle East (0.93)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Add feedback

Perception, performance, and detectability of conversational artificial intelligence across 32 university courses

Ibrahim, Hazem, Liu, Fengyuan, Asim, Rohail, Battu, Balaraju, Benabderrahmane, Sidahmed, Alhafni, Bashar, Adnan, Wifag, Alhanai, Tuka, AlShebli, Bedoor, Baghdadi, Riyadh, Bélanger, Jocelyn J., Beretta, Elena, Celik, Kemal, Chaqfeh, Moumena, Daqaq, Mohammed F., Bernoussi, Zaynab El, Fougnie, Daryl, de Soto, Borja Garcia, Gandolfi, Alberto, Gyorgy, Andras, Habash, Nizar, Harris, J. Andrew, Kaufman, Aaron, Kirousis, Lefteris, Kocak, Korhan, Lee, Kangsan, Lee, Seungah S., Malik, Samreen, Maniatakos, Michail, Melcher, David, Mourad, Azzam, Park, Minsu, Rasras, Mahmoud, Reuben, Alicja, Zantout, Dania, Gleason, Nancy W., Makovi, Kinga, Rahwan, Talal, Zaki, Yasir

arXiv.org Artificial IntelligenceMay-7-2023

The emergence of large language models has led to the development of powerful tools such as ChatGPT that can produce text indistinguishable from human-generated work. With the increasing accessibility of such technology, students across the globe may utilize it to help with their school work -- a possibility that has sparked discussions on the integrity of student evaluations in the age of artificial intelligence (AI). To date, it is unclear how such tools perform compared to students on university-level courses. Further, students' perspectives regarding the use of such tools, and educators' perspectives on treating their use as plagiarism, remain unknown. Here, we compare the performance of ChatGPT against students on 32 university-level courses. We also assess the degree to which its use can be detected by two classifiers designed specifically for this purpose. Additionally, we conduct a survey across five countries, as well as a more in-depth survey at the authors' institution, to discern students' and educators' perceptions of ChatGPT's use. We find that ChatGPT's performance is comparable, if not superior, to that of students in many courses. Moreover, current AI-text classifiers cannot reliably detect ChatGPT's use in school work, due to their propensity to classify human-written answers as AI-generated, as well as the ease with which AI-generated text can be edited to evade detection. Finally, we find an emerging consensus among students to use the tool, and among educators to treat this as plagiarism. Our findings offer insights that could guide policy discussions addressing the integration of AI into educational frameworks.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1038/s41598-023-38964-3

2305.13934

Country:

Asia (0.96)
North America > United States (0.69)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Industry: Education > Educational Setting > Higher Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Add feedback