Risks of Cultural Erasure in Large Language Models
Qadri, Rida, Davani, Aida M., Robinson, Kevin, Prabhakaran, Vinodkumar
arXiv.org Artificial Intelligence
Large language models are increasingly being integrated into applications that shape the production and discovery of societal knowledge, such as search, online education, and travel planning. As a result, language models will shape how people learn about, perceive, and interact with global cultures, making it important to consider whose knowledge systems and perspectives are represented in models. Recognizing this importance, work in machine learning and NLP has increasingly focused on evaluating gaps in global cultural representational distribution within outputs. However, more work is needed on developing benchmarks for cross-cultural impacts of language models that stem from a nuanced, sociologically aware conceptualization of cultural impact or harm. We join this line of work, arguing for the need for metricizable evaluations of language technologies that interrogate and account for historical power inequities and differential impacts of representation on global cultures, particularly for cultures already under-represented in digital corpora. We look at two concepts of erasure: omission, where cultures are not represented at all, and simplification, where cultural complexity is erased by presenting one-dimensional views of a rich culture. The former focuses on whether something is represented, and the latter on how it is represented. We focus our analysis on two task contexts with the potential to influence global cultural production. First, we probe representations that a language model produces about different places around the world when asked to describe these contexts. Second, we analyze the cultures represented in the travel recommendations produced by a set of language model applications. Our study shows ways in which the NLP community and application developers can begin to operationalize complex socio-cultural considerations into standard evaluations and benchmarks.
Jan-1-2025
- Country:
- Africa
- North Africa (0.04)
- Southern Africa (0.04)
- Sub-Saharan Africa (0.04)
- Angola > Luanda Province (0.04)
- Sudan
- Khartoum (0.04)
- Khartoum State > Khartoum (0.04)
- Tanzania > Dar es Salaam Region
- Dar es Salaam (0.04)
- South Africa
- Gauteng > Johannesburg (0.04)
- Western Cape > Cape Town (0.04)
- Côte d'Ivoire > Abidjan
- Abidjan (0.04)
- West Africa (0.04)
- Nigeria (0.04)
- Middle East
- East Africa (0.04)
- Ethiopia > Addis Ababa
- Addis Ababa (0.04)
- Central Africa (0.04)
- Democratic Republic of the Congo > Kinshasa Province
- Kinshasa (0.04)
- Asia
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Pakistan > Sindh
- Karachi Division > Karachi (0.04)
- Central Asia (0.04)
- East Asia (0.04)
- Indonesia
- Middle East
- Iran > Tehran Province
- Tehran (0.04)
- Iraq > Baghdad Governorate
- Baghdad (0.04)
- Israel > Jerusalem District
- Jerusalem (0.05)
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- Saudi Arabia
- Mecca Province > Mecca (0.04)
- Riyadh Province > Riyadh (0.04)
- China
- Shanghai > Shanghai (0.04)
- Tibet Autonomous Region (0.14)
- South Korea > Seoul
- Seoul (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Singapore (0.04)
- India > Uttar Pradesh (0.04)
- Bangladesh > Dhaka Division
- Dhaka District > Dhaka (0.04)
- Europe
- Romania > București - Ilfov Development Region
- Municipality of Bucharest > Bucharest (0.04)
- Poland > Masovia Province
- Warsaw (0.04)
- Northern Europe (0.04)
- Czechia > Prague (0.04)
- France > Île-de-France
- Serbia > Central Serbia
- Belgrade (0.04)
- Eastern Europe (0.04)
- Ukraine > Kyiv Oblast
- Kyiv (0.04)
- Western Europe (0.04)
- Holy See > Vatican City (0.04)
- Middle East
- United Kingdom > England
- Greater London > London (0.04)
- Oxfordshire > Oxford (0.04)
- Hungary > Budapest
- Budapest (0.04)
- Austria > Vienna (0.04)
- Spain > Galicia
- Madrid (0.04)
- Russia > Central Federal District
- Moscow Oblast > Moscow (0.04)
- North America
- Canada
- Central America (0.14)
- Cuba > La Habana Province
- Havana (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- California > Los Angeles County
- Los Angeles (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- Illinois > Cook County
- Chicago (0.04)
- New York (0.05)
- Oceania
- Australia (0.04)
- Micronesia (0.04)
- New Zealand > North Island
- Auckland Region > Auckland (0.04)
- South America
- Argentina > Pampas
- Buenos Aires F.D. > Buenos Aires (0.04)
- Brazil > Rio de Janeiro
- Rio de Janeiro (0.04)
- Chile > Santiago Metropolitan Region
- Santiago Province > Santiago (0.04)
- Colombia > Bogotá D.C.
- Bogotá (0.04)
- Peru > Cusco Department
- Cusco Province > Cusco (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Consumer Products & Services > Travel (1.00)
- Education > Educational Setting
- Online (0.54)
- Government (1.00)
- Technology: