SeeGULL Multilingual: a Dataset of Geo-Culturally Situated Stereotypes
Bhutani, Mukul, Robinson, Kevin, Prabhakaran, Vinodkumar, Dave, Shachi, Dev, Sunipa
arXiv.org Artificial Intelligence
While generative multilingual models are rapidly being deployed, their safety and fairness evaluations are largely limited to resources collected in English. This is especially problematic for evaluations targeting inherently socio-cultural phenomena such as stereotyping, where it is important to build multilingual resources that reflect the stereotypes prevalent in respective language communities. However, gathering these resources at scale, in varied languages and regions, poses a significant challenge, as it requires broad socio-cultural knowledge and can also be prohibitively expensive. To overcome this critical gap, we employ a recently introduced approach that couples LLM generations for scale with culturally situated validations for reliability, and build SeeGULL Multilingual, a global-scale multilingual dataset of social stereotypes, containing over 25K stereotypes, spanning 20 languages, with human annotations across 23 regions, and demonstrate its utility in identifying gaps in model evaluations. Content warning: Stereotypes shared in this paper can be offensive.
Mar-8-2024
- Country:
- Oceania > New Zealand (0.04)
- South America
- North America
- Mexico > Estado de México (0.04)
- Bermuda (0.04)
- United States
- Washington > King County
- Seattle (0.04)
- New York > New York County
- New York City (0.04)
- Canada > Ontario
- Toronto (0.04)
- Europe
- Spain (0.05)
- Portugal (0.05)
- Germany (0.05)
- Italy (0.05)
- Netherlands (0.05)
- Albania (0.04)
- United Kingdom > Northern Ireland (0.04)
- Switzerland (0.04)
- Denmark (0.04)
- Ukraine > Crimea (0.04)
- Slovakia (0.04)
- Slovenia (0.04)
- Serbia (0.04)
- Greece (0.04)
- Romania (0.04)
- Gibraltar (0.04)
- Ireland (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- Asia
- India (0.06)
- South Korea (0.05)
- Japan (0.05)
- Bangladesh (0.05)
- Vietnam (0.05)
- Thailand (0.05)
- Malaysia (0.05)
- Singapore (0.04)
- Afghanistan (0.04)
- Armenia (0.04)
- Maldives (0.04)
- Bhutan (0.04)
- North Korea (0.04)
- Nepal (0.04)
- Pakistan (0.04)
- Indonesia > New Guinea
- Western New Guinea > Papua (0.04)
- Middle East
- Republic of Türkiye (0.05)
- Syria (0.04)
- Palestine (0.04)
- Lebanon (0.04)
- Israel (0.04)
- Iraq (0.04)
- UAE
- Sharjah Emirate > Sharjah (0.04)
- Dubai Emirate > Dubai (0.04)
- Ajman Emirate > Ajman (0.04)
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Africa
- Kenya (0.05)
- Rwanda (0.04)
- South Sudan (0.04)
- South Africa (0.04)
- Seychelles (0.04)
- Nigeria (0.04)
- Genre:
- Research Report (0.40)
- Technology: