Having Beer after Prayer? Measuring Cultural Bias in Large Language Models

Tarek Naous, Michael J. Ryan, Alan Ritter, Wei Xu

arXiv.org Artificial Intelligence 

It is important that language models appropriately adapt to specific cultural contexts. However, as we show in this paper, multilingual and Arabic monolingual language models default to Western culture even when prompted in Arabic and contextualized by an Arab cultural setting. To measure this Western bias, we introduce CAMeL, a dataset of naturally occurring Arabic prompts spanning eight diverse cultural aspects, paired with an extensive list of 20,504 cultural targets corresponding to Arab or Western culture. Using CAMeL, we show that models favor Western targets and exhibit cultural unfairness on downstream tasks such as named entity recognition and sentiment analysis. Our analyses of pretraining corpora also reveal that commonly used sources such as Wikipedia may not be well suited for building culturally aware models, underscoring the importance of carefully curating pretraining data when constructing language models that serve a global population.
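The core measurement the abstract describes (checking whether a model prefers Western over Arab targets in the same Arabic prompt) can be illustrated with a short fill-mask probe. The sketch below is not the authors' released code; the model (UBC-NLP/MARBERT), the prompt, and the target words are illustrative assumptions, whereas CAMeL itself uses naturally occurring prompts and its full list of 20,504 curated targets.

```python
from transformers import pipeline

# Assumed Arabic masked LM; any multilingual or Arabic BERT-style model works.
fill = pipeline("fill-mask", model="UBC-NLP/MARBERT")

# Hypothetical prompt with a masked slot where a cultural target (here, a
# beverage) would appear; CAMeL's actual prompts are naturally occurring text.
prompt = f"بعد الصلاة شرب أحمد {fill.tokenizer.mask_token}"

# Illustrative target words, one small pair per culture.
arab_targets = ["شاي", "قهوة"]      # tea, coffee
western_targets = ["بيرة", "نبيذ"]  # beer, wine

def total_score(targets):
    # Sum the model's fill-mask probabilities over the candidate targets.
    # Multi-token targets trigger a pipeline warning and are approximated
    # by their first sub-token, which is acceptable for this sketch.
    return sum(r["score"] for r in fill(prompt, targets=targets))

print("P(Arab targets):   ", round(total_score(arab_targets), 4))
print("P(Western targets):", round(total_score(western_targets), 4))
```

If the Western total consistently exceeds the Arab total in an Arab cultural context, that is the kind of bias signal the paper quantifies; a real evaluation would aggregate over many prompts and many targets rather than a single example.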