ALHD: A Large-Scale and Multigenre Benchmark Dataset for Arabic LLM-Generated Text Detection
Khairallah, Ali, Zubiaga, Arkaitz
arXiv.org Artificial Intelligence
Misuse of LLM-generated texts. Modern LLMs are increasingly capable of generating highly fluent, humanlike texts and of adapting to multiple dialects across genres [21]. While this progress opens many opportunities for addressing everyday challenges, it also makes it harder to distinguish between human- and machine-generated texts [22]. Undetected machine-generated content can enable serious cyber threats, including misinformation and academic dishonesty, as well as more aggressive attacks such as phishing, smishing, and social engineering, where convincing texts are often used to manipulate individuals and organizations [23, 24]. For instance, the Anti-Phishing Working Group (APWG) reported in its phishing activity trends report that 932,923 phishing attacks were recorded worldwide in the third quarter of 2024, highlighting a significant increase in smishing of 22% during the same period. APWG also reported that social media platforms were the most targeted sector, representing 30.5% of all phishing attacks [24]. Furthermore, according to the FBI Internet Crime Report, losses due to cybercrime in the United States exceeded $12.5 billion in 2023 [25]. These examples illustrate how malicious actors can exploit machine-generated texts to scale deception. Hence, robust methods for detecting LLM-generated texts are urgently needed, particularly for linguistically sensitive scripts such as Arabic, where formal, legal, and religious texts require extra reliability.
Oct-23-2025
- Country:
  - Europe
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Middle East > Malta
      - Eastern Region > Northern Harbour District > St. Julian's (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Ukraine > Kyiv Oblast
      - Kyiv (0.04)
    - United Kingdom > England
      - Greater London > London (0.04)
  - North America
    - Canada
      - British Columbia > Metro Vancouver Regional District
        - Vancouver (0.04)
      - Ontario > Toronto (0.04)
    - United States
      - California > Los Angeles County
        - Long Beach (0.04)
      - Florida > Duval County
        - Jacksonville (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.04)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
- Technology: