Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset
Alwajih, Fakhraddin, Magdy, Samar M., Mekki, Abdellah El, Nacar, Omer, Nafea, Youssef, Abdelfadil, Safaa Taher, Yahya, Abdulfattah Mohammed, Luqman, Hamzah, Almarwani, Nada, Aloufi, Samah, Qawasmen, Baraah, Atou, Houdaifa, Sibaee, Serry, Alsayadi, Hamzah A., Al-Dhabyani, Walid, Al-shaibani, Maged S., Aatar, Aya El, Qandos, Nour, Alhamouri, Rahaf, Ahmad, Samar, Al-Ghrawi, Mohammed Anwar, Yacoub, Aminetou, AbuHweidi, Ruwa, Lemin, Vatimetou Mohamed, Abdel-Salam, Reem, Bashiti, Ahlam, Alansari, Aisha, Ashraf, Ahmed, Alturayeif, Nora, Inciarte, Alcides Alcoba, Ammar, Adel, Elmadany, Abdelrahim A., Tourad, Mohamedou Cheikh, Berrada, Ismail, Jarrar, Mustafa, Shehata, Shady, Abdul-Mageed, Muhammad
Mainstream large vision-language models (LVLMs) inherently encode cultural biases, highlighting the need for diverse multimodal datasets. To address this gap, we introduce PEARL, a large-scale Arabic multimodal dataset and benchmark explicitly designed for cultural understanding. Constructed through advanced agentic workflows and extensive human-in-the-loop annotations by 37 annotators from across the Arab world, PEARL comprises over 309K multimodal examples spanning ten culturally significant domains covering all Arab countries. We further provide two robust evaluation benchmarks (PEARL and PEARL-LITE) along with a specialized subset (PEARL-X) explicitly developed to assess nuanced cultural variations. Comprehensive evaluations on state-of-the-art open and proprietary LVLMs demonstrate that reasoning-centric instruction alignment substantially improves models' cultural grounding compared to conventional scaling methods. PEARL establishes a foundational resource for advancing culturally-informed multimodal modeling research. All datasets and benchmarks are publicly available.
- Africa > Sudan (0.28)
- Asia > Middle East > Saudi Arabia (0.14)
- Asia > Middle East > Yemen (0.14)
- (26 more...)
- Overview (0.92)
- Research Report > New Finding (0.67)
- Information Technology (0.67)
- Leisure & Entertainment (0.45)
NativQA Framework: Enabling LLMs with Native, Local, and Everyday Knowledge
Alam, Firoj, Hasan, Md Arid, Laskar, Sahinur Rahman, Kutlu, Mucahid, Darwish, Kareem, Chowdhury, Shammur Absar
The rapid advancement of large language models (LLMs) has raised concerns about cultural bias, fairness, and their applicability in diverse linguistic and underrepresented regional contexts. To enhance and benchmark the capabilities of LLMs, there is a need to develop large-scale resources focused on multilingual, local, and cultural contexts. In this study, we propose the NativQA framework, which can seamlessly construct large-scale, culturally and regionally aligned QA datasets in native languages. The framework utilizes user-defined seed queries and leverages search engines to collect location-specific, everyday information. It has been evaluated across 39 locations in 24 countries and in 7 languages -- ranging from extremely low-resource to high-resource languages -- resulting in over 300K Question-Answer (QA) pairs. The developed resources can be used for LLM benchmarking and further fine-tuning. The framework has been made publicly available for the community (https://gitlab.com/nativqa/nativqa-framework).
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Asia > Middle East > Yemen > Amanat Al Asimah > Sanaa (0.04)
- (45 more...)
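The NativQA pipeline described above can be sketched in a few lines: user-defined seed queries are localized per location, and each (query, snippet) pair returned by a search backend becomes a QA candidate. This is a minimal illustration only; the function names, record fields, and example data here are hypothetical, not the framework's actual API.

```python
def expand_seed_queries(seeds, location):
    """Localize each user-defined seed query for a target location."""
    return [f"{seed} in {location}" for seed in seeds]

def build_qa_pairs(search_results):
    """Turn search hits into QA candidates: the query becomes the
    question, the retrieved snippet the answer."""
    return [
        {"question": hit["query"], "answer": hit["snippet"], "url": hit["url"]}
        for hit in search_results
        if hit["snippet"]  # drop empty results before human review
    ]

# Stubbed search output (a real run would call a search-engine API here).
hits = [
    {"query": "public transport fares in Doha",
     "snippet": "Standard bus fare is 2.5 QAR with a Karwa smartcard.",
     "url": "https://example.org/doha-transport"},
    {"query": "public transport fares in Doha", "snippet": "", "url": ""},
]

queries = expand_seed_queries(["public transport fares"], "Doha")
qa = build_qa_pairs(hits)
print(queries[0])  # localized seed query
print(len(qa))     # empty snippets filtered out
```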
Casablanca: Data and Models for Multidialectal Arabic Speech Recognition
Talafha, Bashar, Kadaoui, Karima, Magdy, Samar Mohamed, Habiboullah, Mariem, Chafei, Chafei Mohamed, El-Shangiti, Ahmed Oumar, Zayed, Hiba, Tourad, Mohamedou Cheikh, Alhamouri, Rahaf, Assi, Rwaa, Alraeesi, Aisha, Mohamed, Hour, Alwajih, Fakhraddin, Mohamed, Abdelrahman, Mekki, Abdellah El, Nagoudi, El Moatez Billah, Saadia, Benelhadj Djelloul Mama, Alsayadi, Hamzah A., Al-Dhabyani, Walid, Shatnawi, Sara, Ech-Chammakhy, Yasir, Makouar, Amal, Berrachedi, Yousra, Jarrar, Mustafa, Shehata, Shady, Berrada, Ismail, Abdul-Mageed, Muhammad
Arabic encompasses a diverse array of linguistic varieties, many of which are nearly mutually unintelligible (Watson, 2007; Abdul-Mageed et al., 2024). This diversity includes three primary categories: Classical Arabic, historically used in literature and still employed in religious contexts; Modern Standard Arabic (MSA), used in media, education, and governmental settings; and numerous colloquial dialects, which are the main forms of daily communication across the Arab world and often involve code-switching (Abdul-Mageed et al., 2020; Mubarak et al., 2021).

... for a select few languages. This bias towards resource-rich languages leaves behind the majority of the world's languages (Bartelds et al., 2023; Talafha et al., 2023; Meelen et al., 2024; Tonja et al., 2024). In this work, we report our efforts to alleviate this challenge for Arabic--a collection of languages and dialects spoken by more than 450 million people. We detail a year-long community effort to collect and annotate a novel dataset for eight Arabic dialects spanning both Africa and Asia. This new dataset, dubbed Casablanca, is rich ...
- Africa > Middle East > Morocco > Casablanca-Settat Region > Casablanca (0.65)
- North America > United States (0.28)
- Asia > Middle East > UAE (0.05)
- (12 more...)
Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty
Zhou, Kaitlyn, Hwang, Jena D., Ren, Xiang, Sap, Maarten
As natural language becomes the default interface for human-AI interaction, there is a critical need for LMs to appropriately communicate uncertainties in downstream applications. In this work, we investigate how LMs incorporate confidence about their responses via natural language and how downstream users behave in response to LM-articulated uncertainties. We examine publicly deployed models and find that LMs are reluctant to express uncertainties when answering questions, even when they produce incorrect responses. LMs can be explicitly prompted to express confidences, but tend to be overconfident, resulting in high error rates (on average 47%) among confident responses. We test the risks of LM overconfidence by running human experiments and show that users rely heavily on LM generations, whether or not they are marked by certainty. Lastly, we investigate the preference-annotated datasets used in RLHF alignment and find that humans have a bias against texts with uncertainty. Our work highlights a new set of safety harms facing human-LM interactions and proposes design recommendations and mitigating strategies moving forward.
- Africa > Mauritania > Nouakchott (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Singapore (0.04)
- (11 more...)
- Education (0.68)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
- Health & Medicine > Therapeutic Area > Immunology (0.46)
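The overconfidence measurement in the abstract above (error rate among responses the model marks as confident) can be sketched as follows; the records here are illustrative, not the paper's data.

```python
def error_rate_among_confident(responses):
    """Fraction of confidently-marked responses that are wrong."""
    confident = [r for r in responses if r["confidence"] == "certain"]
    if not confident:
        return 0.0
    wrong = sum(1 for r in confident if not r["correct"])
    return wrong / len(confident)

responses = [
    {"confidence": "certain", "correct": True},
    {"confidence": "certain", "correct": False},
    {"confidence": "uncertain", "correct": False},  # excluded: not confident
    {"confidence": "certain", "correct": False},
]
rate = error_rate_among_confident(responses)
print(f"{rate:.0%}")  # 2 of the 3 confident answers are wrong
```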
Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data
Naggita, Keziah, LaChance, Julienne, Xiang, Alice
Biases in large-scale image datasets are known to influence the performance of computer vision models as a function of geographic context. To investigate the limitations of standard Internet data collection methods in low- and middle-income countries, we analyze human-centric image geo-diversity on a massive scale using geotagged Flickr images associated with each nation in Africa. We report the quantity and content of available data with comparisons to population-matched nations in Europe as well as the distribution of data according to fine-grained intra-national wealth estimates. Temporal analyses are performed at two-year intervals to expose emerging data trends. Furthermore, we present findings for an "othering" phenomenon as evidenced by a substantial number of images from Africa being taken by non-local photographers. The results of our study suggest that further work is required to capture image data representative of African people and their environments and, ultimately, to improve the applicability of computer vision models in a global context.
- Asia > Brunei (0.14)
- North America > Canada > Quebec > Montreal (0.06)
- Africa > Sierra Leone (0.06)
- (142 more...)
- Health & Medicine (0.92)
- Information Technology > Services (0.75)
- Government > Regional Government (0.46)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
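The core per-nation tally behind the Flickr Africa analysis above can be sketched as a simple aggregation: count geotagged, human-centric images by country, then compare an African nation against a European comparison nation. The records and counts below are made up for illustration; the study itself works over large-scale Flickr metadata.

```python
from collections import Counter

def images_per_country(records):
    """Tally image records by their country geotag."""
    return Counter(r["country"] for r in records)

# Stubbed metadata records (a real pipeline would read Flickr geotags).
records = [{"country": "Nigeria"}] * 3 + [{"country": "Germany"}] * 12

counts = images_per_country(records)
ratio = counts["Germany"] / counts["Nigeria"]
print(counts["Nigeria"], counts["Germany"], f"{ratio:.1f}x")
```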
NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task
Abdul-Mageed, Muhammad, Zhang, Chiyu, Bouamor, Houda, Habash, Nizar
We present the results and findings of the First Nuanced Arabic Dialect Identification Shared Task (NADI). This Shared Task includes two subtasks: country-level dialect identification (Subtask 1) and province-level sub-dialect identification (Subtask 2). The data for the shared task covers a total of 100 provinces from 21 Arab countries and is collected from the Twitter domain. As such, NADI is the first shared task to target naturally-occurring fine-grained dialectal text at the sub-country level. A total of 61 teams from 25 countries registered to participate in the tasks, thus reflecting the interest of the community in this area. We received 47 submissions for Subtask 1 from 18 teams and 9 submissions for Subtask 2 from 9 teams.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Africa > Middle East > Djibouti (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- (63 more...)
Toward Micro-Dialect Identification in Diaglossic and Code-Switched Environments
Abdul-Mageed, Muhammad, Zhang, Chiyu, Elmadany, AbdelRahim, Ungar, Lyle
Although the prediction of dialects is an important language processing task, with a wide range of applications, existing work is largely limited to coarse-grained varieties. Inspired by geolocation research, we propose the novel task of Micro-Dialect Identification (MDI) and introduce MARBERT, a new language model with striking abilities to predict a fine-grained variety (as small as that of a city) given a single, short message. For modeling, we offer a range of novel spatially and linguistically-motivated multi-task learning models. To showcase the utility of our models, we introduce a new, large-scale dataset of Arabic micro-varieties (low-resource) suited to our tasks. MARBERT predicts micro-dialects with 9.9% F1, ~76X better than a majority class baseline. Our new language model also establishes new state-of-the-art on several external tasks.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Middle East > Oman (0.05)
- Asia > Middle East > Saudi Arabia (0.05)
- (30 more...)
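A quick arithmetic check of the abstract's claim above: if MARBERT reaches 9.9% F1 and that is ~76x a majority-class baseline, the implied baseline macro-F1 is roughly 0.13%, which is plausible for a task with on the order of a hundred fine-grained city-level classes.

```python
marbert_f1 = 9.9   # percent, from the abstract
improvement = 76   # "~76X better than a majority class baseline"
baseline_f1 = marbert_f1 / improvement
print(f"implied baseline macro-F1: {baseline_f1:.2f}%")
```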
Finding Generalizable Evidence by Learning to Convince Q&A Models
Perez, Ethan, Karamcheti, Siddharth, Fergus, Rob, Weston, Jason, Kiela, Douwe, Cho, Kyunghyun
We plot the judge's probability of the target answer given a sentence against how often humans also select that target answer given the same sentence. Humans tend to find a sentence to be strong evidence for an answer when the judge model does: strong evidence to a model tends to be strong evidence to humans, as shown in Figure 7. Combined with the previous result, this shows that learned agents are more accurate at predicting which sentences humans find to be strong evidence.

F Model Evaluation of Evidence on DREAM

Figure 8 shows how convincing various judge models find each evidence agent. Our findings on DREAM are similar to those from RACE in §4.2.

Figure 8: On DREAM, how often each judge selects an agent's answer when given a single agent-chosen sentence. The black line divides learned agents (right) and search agents (left), with human evidence selection in the leftmost column. All agents find evidence that convinces judge models more often than a no-evidence baseline (33%). Learned agents predicting p(i) or p(i) find the most broadly convincing evidence.
- Oceania > New Zealand (0.04)
- Oceania > Australia (0.04)
- North America > Canada (0.04)
- (8 more...)