Arabic Dialect Identification under Scrutiny: Limitations of Single-label Classification
arXiv.org Artificial Intelligence
Automatic Arabic Dialect Identification (ADI) of text has gained great popularity since it was introduced in the early 2010s. Multiple datasets have been developed, and yearly shared tasks have been running since 2018. However, ADI systems are reported to fail to distinguish between the micro-dialects of Arabic. We argue that the currently adopted framing of the ADI task as a single-label classification problem is one of the main reasons for this. We highlight one limitation of this framing, namely the incompleteness of the dialect labels, and demonstrate how it affects the evaluation of ADI systems. A manual error analysis of the predictions of an ADI model, performed by 7 native speakers of different Arabic dialects, revealed that approximately 66% of the validated errors are not true errors. Consequently, we propose framing ADI as a multi-label classification task and give recommendations for designing new ADI datasets.
Oct-20-2023
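To make the abstract's argument concrete, here is a minimal Python sketch of how the same predictions are scored under the single-label framing and under the proposed multi-label framing. It is not the authors' code, and the data is invented: the class Sample, the functions single_label_errors, multi_label_errors and false_error_share, and the sets of dialects in which each sentence is assumed to be valid are all hypothetical, chosen only for illustration. The sketch assumes every gold label is also one of the valid dialects.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Sample:
    gold: str             # the single label stored in a single-label dataset
    valid: FrozenSet[str]  # hypothetical: every dialect where the sentence is acceptable


def single_label_errors(samples: List[Sample], preds: List[str]) -> int:
    # Single-label view: any prediction that differs from the gold label is an error.
    return sum(p != s.gold for s, p in zip(samples, preds))


def multi_label_errors(samples: List[Sample], preds: List[str]) -> int:
    # Multi-label view: an error only if the prediction is outside the valid set.
    return sum(p not in s.valid for s, p in zip(samples, preds))


def false_error_share(samples: List[Sample], preds: List[str]) -> float:
    # Share of single-label errors that are not true errors. Because gold is
    # assumed to be in `valid`, multi-label errors are a subset of single-label ones.
    single = single_label_errors(samples, preds)
    multi = multi_label_errors(samples, preds)
    return (single - multi) / single if single else 0.0


# Hypothetical toy data, for illustration only.
samples = [
    Sample("Jordan", frozenset({"Jordan", "Palestine", "Syria"})),
    Sample("Egypt", frozenset({"Egypt"})),
    Sample("Kuwait", frozenset({"Kuwait", "Bahrain", "Qatar"})),
]
preds = ["Syria", "Sudan", "Kuwait"]

print(single_label_errors(samples, preds))  # 2
print(multi_label_errors(samples, preds))   # 1
print(false_error_share(samples, preds))    # 0.5
```

Here a prediction of Syria for a sentence labelled Jordan counts as an error under the single-label view but not under the multi-label view, so half of the single-label errors in this toy set would not be true errors. The roughly 66% figure in the abstract comes from manual validation by native speakers, not from a computation like this one.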
- Country:
- North America
- Canada (0.04)
- United States
- Oregon (0.04)
- New Mexico > Santa Fe County
- Santa Fe (0.04)
- Indian Ocean > Arabian Sea
- Gulf of Aden (0.04)
- Europe
- Switzerland (0.04)
- Belgium (0.04)
- Bulgaria > Varna Province
- Varna (0.04)
- Iceland > Capital Region
- Reykjavik (0.04)
- Hungary > Budapest
- Budapest (0.04)
- Italy > Tuscany
- Florence (0.04)
- Spain
- Valencian Community > Valencia Province
- Valencia (0.04)
- Catalonia > Barcelona Province
- Barcelona (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- France
- Île-de-France > Paris
- Paris (0.04)
- Provence-Alpes-Côte d'Azur > Bouches-du-Rhône
- Marseille (0.04)
- Ukraine > Kyiv Oblast
- Kyiv (0.04)
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- Asia
- Singapore (0.04)
- South Korea > Gyeonggi-do
- Suwon (0.04)
- Middle East
- Jordan (0.06)
- Kuwait (0.05)
- Bahrain (0.05)
- Yemen > Amanat Al Asimah
- Sanaa (0.04)
- Saudi Arabia
- Riyadh Province > Riyadh (0.04)
- Mecca Province > Jeddah (0.04)
- Israel > Jerusalem District
- Jerusalem (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Palestine > Gaza Strip
- Gaza Governorate > Gaza (0.04)
- Lebanon > Beirut Governorate
- Beirut (0.04)
- Qatar > Ad-Dawhah
- Doha (0.04)
- Oman > Muscat Governorate
- Muscat (0.04)
- Syria
- Damascus Governorate > Damascus (0.04)
- Aleppo Governorate > Aleppo (0.04)
- Iraq
- Nineveh Governorate > Mosul (0.04)
- Basra Governorate > Basra (0.04)
- Baghdad Governorate > Baghdad (0.04)
- Japan
- Kyūshū & Okinawa > Kyūshū
- Miyazaki Prefecture > Miyazaki (0.04)
- Honshū > Kansai
- Osaka Prefecture > Osaka (0.04)
- China > Shanghai
- Shanghai (0.04)
- Africa
- North Africa (0.04)
- Mauritania (0.04)
- Comoros (0.04)
- Sudan
- Khartoum State > Khartoum (0.04)
- Middle East
- Somalia (0.14)
- Djibouti (0.14)
- Morocco (0.06)
- Tunisia > Tunis Governorate
- Tunis (0.04)
- Libya > Benghazi District
- Benghazi (0.04)
- Egypt
- Cairo Governorate > Cairo (0.04)
- Aswan Governorate > Aswan (0.04)
- Algeria > Annaba Province
- Annaba (0.04)
- Genre:
- Research Report (0.82)
- Technology: