Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset
Zhang, Lily Hong, Milli, Smitha, Jusko, Karen, Smith, Jonathan, Amos, Brandon, Bouaziz, Wassim, Revel, Manon, Kussman, Jack, Sheynin, Yasha, Titus, Lisa, Radharapu, Bhaktipriya, Yu, Jane, Sarma, Vidya, Rose, Kris, Nickel, Maximilian
arXiv.org Artificial Intelligence
How can large language models (LLMs) serve users with varying preferences that may conflict across cultural, political, or other dimensions? To make progress on this challenge, this paper establishes four key results. First, we demonstrate, through a large-scale multilingual human study with representative samples from five countries (N=15,000), that humans exhibit significantly more variation in preferences than the responses of 21 state-of-the-art LLMs. Second, we show that existing methods for preference dataset collection are insufficient for learning the diversity of human preferences even along two of the most salient dimensions of variability in global values, due to the underlying homogeneity of candidate responses. Third, we argue that this motivates the need for negatively-correlated sampling when generating candidate sets, and we show that simple prompt-based techniques for doing so significantly enhance the performance of alignment methods in learning heterogeneous preferences. Fourth, based on this novel candidate sampling approach, we collect and open-source Community Alignment, the largest and most representative multilingual and multi-turn preference dataset to date, featuring almost 200,000 comparisons from annotators spanning five countries. We hope that the Community Alignment dataset will be a valuable resource for improving the effectiveness of LLMs for a diverse global population.
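The third result argues that candidate responses should be sampled with negative correlation rather than independently. The following is a toy illustration only, not the authors' prompt-based method. It assumes responses are points in a hypothetical 2D "value space", that a homogeneous model places every response near one default point, and that each user prefers whichever candidate in a pair lies nearer their own ideal point. Antithetic pairing (mirroring the first sample through the mean) stands in for negatively-correlated sampling. All function names and parameters (`sigma`, `n_users`, `n_trials`) are hypothetical.

```python
import math
import random
from statistics import mean

Point = tuple[float, float]


def user_ideal_points(n_users: int) -> list[Point]:
    """Hypothetical users spread evenly on the unit circle of a 2D value space."""
    return [
        (math.cos(2 * math.pi * k / n_users), math.sin(2 * math.pi * k / n_users))
        for k in range(n_users)
    ]


def sample_response(rng: random.Random, sigma: float) -> Point:
    """Toy homogeneous 'model': every response lands near the origin (its default)."""
    return (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def independent_pair(rng: random.Random, sigma: float) -> tuple[Point, Point]:
    # Two candidates drawn independently from the same distribution.
    return sample_response(rng, sigma), sample_response(rng, sigma)


def antithetic_pair(rng: random.Random, sigma: float) -> tuple[Point, Point]:
    # Negatively correlated pair: the second candidate mirrors the first through
    # the mean (correlation -1 per coordinate), keeping the same marginal distribution.
    x, y = sample_response(rng, sigma)
    return (x, y), (-x, -y)


def mean_preferred_distance(pair_fn, n_users=36, n_trials=2000, sigma=0.3, seed=0) -> float:
    """Average distance from each user's ideal point to the candidate that user
    would pick (the nearer one) in a pairwise comparison. Lower is better."""
    rng = random.Random(seed)
    users = user_ideal_points(n_users)
    dists = []
    for _ in range(n_trials):
        a, b = pair_fn(rng, sigma)
        for u in users:
            dists.append(min(math.dist(u, a), math.dist(u, b)))
    return mean(dists)


if __name__ == "__main__":
    print(f"independent pairs: {mean_preferred_distance(independent_pair):.3f}")
    print(f"antithetic pairs:  {mean_preferred_distance(antithetic_pair):.3f}")
```

Under these assumptions, the antithetic pairs should yield a lower mean distance than independent pairs drawn from the same distribution. Each comparison then offers more users a candidate closer to their own values, which is the intuition behind sampling candidate sets with negative correlation.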
Oct-28-2025
- Country:
- Asia
- India (0.04)
- Indonesia > Bali (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Middle East
- Iran (0.04)
- Lebanon > Beirut Governorate
- Beirut (0.04)
- Republic of Türkiye > Konya Province
- Konya (0.04)
- Singapore (0.04)
- Sri Lanka (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Europe
- Portugal (0.04)
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- France (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Switzerland (0.04)
- Italy > Tuscany
- Florence (0.04)
- Germany > Berlin (0.04)
- Austria > Vienna (0.13)
- Spain > Community of Madrid
- Madrid (0.04)
- North America
- Costa Rica (0.04)
- United States
- Florida > Miami-Dade County
- Miami (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- New Mexico > Bernalillo County
- Albuquerque (0.04)
- New York > New York County
- New York City (0.04)
- South America
- Argentina (0.04)
- Brazil (0.04)
- Colombia > Antioquia Department
- Medellín (0.04)
- Peru > Cusco Department
- Cusco Province > Cusco (0.04)
- Genre:
- Instructional Material (0.92)
- Overview (0.92)
- Personal > Interview (0.67)
- Questionnaire & Opinion Survey (0.92)
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Promising Solution (0.67)
- Industry:
- Leisure & Entertainment > Sports
- Basketball (0.67)
- Media
- Music (1.00)
- Television (0.92)
- Banking & Finance (1.00)
- Health & Medicine
- Consumer Health (1.00)
- Pharmaceuticals & Biotechnology (1.00)
- Therapeutic Area
- Neurology (0.67)
- Oncology (0.92)
- Psychiatry/Psychology > Mental Health (0.67)
- Law
- Civil Rights & Constitutional Law (1.00)
- Environmental Law (1.00)
- Education > Health & Safety
- School Nutrition (1.00)
- Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (1.00)
- Energy > Renewable (1.00)
- Information Technology > Security & Privacy (1.00)
- Materials > Chemicals
- Agricultural Chemicals (1.00)
- Food & Agriculture > Agriculture
- Pest Control (0.96)
- Government > Regional Government
- Water & Waste Management
- Solid Waste Management (0.92)
- Water Management > Water Supplies & Services (1.00)
- Technology: