CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting
Li, Huihan, Jiang, Liwei, Huang, Jena D., Kim, Hyunwoo, Santy, Sebastin, Sorensen, Taylor, Lin, Bill Yuchen, Dziri, Nouha, Ren, Xiang, Choi, Yejin
–arXiv.org Artificial Intelligence
As the use of large language models (LLMs) has spread worldwide, it is crucial for them to have adequate knowledge and fair representation of diverse global cultures. In this work, we uncover the culture perceptions of three SOTA models on 110 countries and regions across 8 culture-related topics through culture-conditioned generations, and extract symbols from these generations that the LLM associates with each culture. We discover that culture-conditioned generations contain linguistic "markers" that distinguish marginalized cultures from default cultures. We also discover that LLMs have an uneven degree of diversity in their culture symbols, and that cultures from different geographic regions have different levels of presence in LLMs' culture-agnostic generations. Our findings promote further research on the knowledge and fairness of global culture perception in LLMs. Code and data can be found at: https://github.com/huihanlhh/Culture-Gen/
Apr-26-2024
- Country:
  - South America
  - Oceania
    - New Zealand (0.04)
    - Australia (0.04)
  - North America
    - United States > California (0.14)
    - Trinidad and Tobago (0.04)
    - Puerto Rico (0.04)
    - Dominican Republic (0.04)
    - Nicaragua (0.04)
    - Guatemala (0.04)
    - Mexico (0.04)
    - Canada (0.04)
    - El Salvador (0.04)
    - Haiti (0.04)
  - Europe
    - Finland (0.04)
    - Montenegro (0.04)
    - Bosnia and Herzegovina (0.04)
    - Austria (0.04)
    - Bulgaria (0.04)
    - Andorra (0.04)
    - Poland (0.04)
    - Germany (0.04)
    - Spain (0.04)
    - Netherlands (0.04)
    - Iceland (0.04)
    - Switzerland (0.04)
    - Denmark (0.04)
    - Albania (0.04)
    - Norway (0.04)
    - Slovakia (0.04)
    - Slovenia (0.04)
    - Serbia (0.04)
    - France (0.04)
    - Italy (0.04)
    - Russia (0.04)
    - Middle East > Cyprus (0.04)
    - Greece (0.04)
    - Latvia (0.04)
    - Lithuania (0.04)
    - Estonia (0.04)
    - Romania (0.04)
    - Belgium (0.04)
    - Ukraine (0.04)
    - Croatia (0.04)
    - Sweden (0.04)
    - Czechia (0.04)
    - United Kingdom (0.04)
    - Kosovo (0.04)
    - Ireland (0.04)
    - Moldova (0.04)
    - Portugal (0.04)
    - Hungary (0.04)
    - North Macedonia (0.04)
    - Belarus (0.04)
  - Asia
    - India (0.04)
    - Singapore (0.04)
    - Southeast Asia (0.04)
    - East Asia (0.04)
    - Central Asia (0.04)
    - China > Hong Kong (0.04)
    - Tajikistan (0.04)
    - Thailand (0.04)
    - Kyrgyzstan (0.04)
    - Myanmar (0.04)
    - Taiwan (0.04)
    - South Korea (0.04)
    - Armenia (0.04)
    - Russia (0.04)
    - Philippines (0.04)
    - Maldives (0.04)
    - Japan (0.04)
    - Vietnam (0.04)
    - Azerbaijan (0.04)
    - Bangladesh (0.04)
    - Uzbekistan (0.04)
    - Kazakhstan (0.04)
    - Macao (0.04)
    - Malaysia (0.04)
    - Indonesia > Bali (0.04)
    - Mongolia (0.04)
    - Pakistan (0.04)
  - Middle East
  - Africa
- Genre:
  - Research Report > New Finding (0.34)
- Industry:
  - Leisure & Entertainment (0.68)
  - Media (0.46)
- Technology: