A Survey of Large Language Models for Arabic Language and its Dialects
Mashaabi, Malak; Al-Khalifa, Shahad; Al-Khalifa, Hend
arXiv.org Artificial Intelligence
This survey offers a comprehensive overview of Large Language Models (LLMs) designed for the Arabic language and its dialects. It covers the key architectures (encoder-only, decoder-only, and encoder-decoder models) along with the datasets used for pre-training, which span Classical Arabic, Modern Standard Arabic, and Dialectal Arabic. The study also examines monolingual, bilingual, and multilingual LLMs, analyzing their architectures and their performance on downstream tasks such as sentiment analysis, named entity recognition, and question answering. Furthermore, it assesses the openness of Arabic LLMs based on factors such as source code availability, training data, model weights, and documentation. The survey highlights the need for more diverse dialectal datasets and underscores the importance of openness for research reproducibility and transparency. It concludes by identifying key challenges and opportunities for future research, stressing the need for more inclusive and representative models.
Oct-26-2024