Ethnicity



Brett Kavanaugh Is Trying to Walk Back "Kavanaugh Stops." Too Late.

Slate

Justice Brett Kavanaugh does not seem happy that his name has become synonymous with racist immigration enforcement. In September, the justice wrote that Hispanic residents' "apparent ethnicity" could be a "relevant factor" in federal agents' decision to stop them and demand proof of citizenship. Immigration and Customs Enforcement and Customs and Border Protection promptly seized upon his opinion as a license to stop any Hispanic person on the basis of race, often with excessive, even sadistic, force, and to detain them until they proved their lawful presence. Law professor Anil Kalhan termed these encounters "Kavanaugh stops," and the name swiftly caught on as evidence mounted that they had become standard practice across the country.


Operationalizing Pluralistic Values in Large Language Model Alignment Reveals Trade-offs in Safety, Inclusivity, and Model Behavior

Ali, Dalia, Zhao, Dora, Koenecke, Allison, Papakyriakopoulos, Orestis

arXiv.org Artificial Intelligence

Although large language models (LLMs) are increasingly trained using human feedback for safety and alignment with human values, alignment decisions often overlook human social diversity. This study examines how incorporating pluralistic values affects LLM behavior by systematically evaluating demographic variation and design parameters in the alignment pipeline. We collected alignment data from US and German participants (N = 1,095 participants, 27,375 ratings) who rated LLM responses across five dimensions: Toxicity, Emotional Awareness (EA), Sensitivity, Stereotypical Bias, and Helpfulness. We fine-tuned multiple LLMs and large reasoning models using preferences from different social groups while varying rating scales, disagreement-handling methods, and optimization techniques. The results revealed systematic demographic effects: male participants rated responses 18% less toxic than female participants, while conservative and Black participants rated responses 27.9% and 44% higher on EA than liberal and White participants, respectively. Models fine-tuned on group-specific preferences exhibited distinct behaviors. Technical design choices also showed strong effects: preserving rater disagreement achieved roughly 53% greater toxicity reduction than majority voting, 5-point scales yielded about 22% more reduction than binary formats, and Direct Preference Optimization (DPO) consistently outperformed Group Relative Policy Optimization (GRPO) in multi-value optimization. These findings represent a preliminary step toward answering a critical question: how should alignment balance expert-driven and user-driven signals to ensure both safety and fair representation?
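The disagreement-handling contrast described in the abstract (majority voting versus preserving rater disagreement) can be sketched in a few lines. The function names and the soft-weighting scheme below are illustrative assumptions, not the paper's actual pipeline:

```python
from statistics import median

def majority_vote(ratings):
    """Collapse per-rater ratings (e.g., 1-5 toxicity scores) into one hard label."""
    return median(ratings)

def preserve_disagreement(ratings):
    """Keep the full rating distribution as soft weights instead of a hard label."""
    total = sum(ratings)
    return [r / total for r in ratings]

# Three raters disagree on how toxic a response is.
ratings = [1, 2, 5]
print(majority_vote(ratings))          # -> 2
print(preserve_disagreement(ratings))  # -> [0.125, 0.25, 0.625]
```

Under this kind of scheme, soft weights could scale per-example loss contributions in a DPO-style objective, rather than discarding minority raters' views as majority voting does.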


Assessing Historical Structural Oppression Worldwide via Rule-Guided Prompting of Large Language Models

Chatterjee, Sreejato, Tran, Linh, Nguyen, Quoc Duy, Kirson, Roni, Hamlin, Drue, Aquino, Harvest, Lyu, Hanjia, Luo, Jiebo, Dye, Timothy

arXiv.org Artificial Intelligence

Abstract: Traditional efforts to measure historical structural oppression struggle with cross-national validity due to the unique, locally specified histories of exclusion, colonization, and social status in each country, and often have relied on structured indices that privilege material resources while overlooking lived, identity-based exclusion. We introduce a novel framework for oppression measurement that leverages Large Language Models (LLMs) to generate context-sensitive scores of lived historical disadvantage across diverse geopolitical settings. Using unstructured self-identified ethnicity utterances from a multilingual COVID-19 global study, we design rule-guided prompting strategies that encourage models to produce interpretable, theoretically grounded estimations of oppression. We systematically evaluate these strategies across multiple state-of-the-art LLMs. Our results demonstrate that LLMs, when guided by explicit rules, can capture nuanced forms of identity-based historical oppression within nations. This approach provides a complementary measurement tool that highlights dimensions of systemic exclusion, offering a scalable, cross-cultural lens for understanding how oppression manifests in data-driven research and public health contexts. The study of racial and ethnic inequality remains central to sociological research, with extensive research documenting how structural oppression is reproduced in historical and contemporary contexts [1]-[3]. Oppression can be understood as a social hierarchy in which some groups subject other groups to lower status and to systemic exclusion, dehumanization, and disadvantage. In public health and sociology, this oppression is closely aligned with definitions of systemic and structural racism, which describe racism as deeply embedded in laws, policies, institutional practices, and social norms that sustain widespread inequities, violence, and disadvantage over time [1].
Foundational works have demonstrated how ethnic and national hierarchies shape access to power, life opportunities, autonomy, and sovereignty, primarily through institutionalized mechanisms such as legal structures, educational systems, and healthcare access, among others [2].
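As a concrete illustration of rule-guided prompting, the template below wraps a self-identified ethnicity utterance in explicit scoring rules before sending it to a model. The rules, scale, and wording here are hypothetical stand-ins, not the authors' actual prompts:

```python
# Hypothetical rule-guided prompt construction for scoring historical
# structural oppression from a free-text, self-identified ethnicity utterance.
RULES = [
    "1. Score historical structural oppression on a 1-5 scale.",
    "2. Ground the score in documented exclusion, colonization, or legal status.",
    "3. Consider the group's history within the stated country, not globally.",
    "4. Return the score followed by a one-sentence justification.",
]

def build_prompt(ethnicity_utterance: str, country: str) -> str:
    """Wrap a self-identified ethnicity string in explicit scoring rules."""
    rules = "\n".join(RULES)
    return (
        "You are scoring lived historical disadvantage.\n"
        f"Rules:\n{rules}\n\n"
        f"Country: {country}\n"
        f"Self-identified ethnicity: {ethnicity_utterance}\n"
        "Score:"
    )

print(build_prompt("Ainu", "Japan"))
```

The point of the explicit rule block is to constrain the model toward interpretable, theory-grounded outputs rather than free-form judgments; the same template can be swapped across LLMs to compare strategies.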


CURE: Cultural Understanding and Reasoning Evaluation - A Framework for "Thick" Culture Alignment Evaluation in LLMs

Vo, Truong, Koyejo, Sanmi

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly deployed in culturally diverse environments, yet existing evaluations of cultural competence remain limited: they focus on de-contextualized correctness or forced-choice judgments, overlooking the cultural understanding and reasoning required for appropriate responses. To address this gap, we introduce a set of benchmarks that, instead of directly probing abstract norms or isolated statements, present models with realistic situational contexts that require culturally grounded reasoning. In addition to the standard Exact Match metric, we introduce four complementary metrics (Coverage, Specificity, Connotation, and Coherence) to capture different dimensions of a model's response quality. Empirical analysis across frontier models reveals that "thin" evaluation, based on de-contextualized correctness, systematically overestimates cultural competence and produces unstable assessments with high variance. In contrast, "thick" evaluation, based on situated reasoning, exposes differences in reasoning depth, reduces variance, and provides more stable, interpretable signals of cultural understanding.
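A Coverage-style metric can be illustrated with a simple keyword-overlap sketch: the fraction of reference cultural key points that a model's response touches. The implementation and the example key points below are invented for illustration; the paper's actual metric definitions may differ:

```python
def coverage(response: str, key_points: list[str]) -> float:
    """Fraction of reference key points mentioned in the response (substring match)."""
    response_lower = response.lower()
    hits = sum(1 for kp in key_points if kp.lower() in response_lower)
    return hits / len(key_points) if key_points else 0.0

# Invented reference key points for a hypothetical etiquette scenario.
key_points = ["shoes", "bow", "gift"]
resp = "Guests typically remove their shoes at the door and bring a small gift."
print(coverage(resp, key_points))  # -> 0.6666... (2 of 3 key points covered)
```

Substring matching is the crudest possible instantiation; a thick evaluation would additionally score how specifically and coherently each point is reasoned about, not just whether it appears.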



A Missing Proofs

Neural Information Processing Systems

Proposition 2. For a given group a ∈ A, gradient norms can be upper bounded as:

Proposition 3. Consider a binary classifier.

B.1 Datasets
The paper uses the following datasets to validate the findings discussed in the main paper. The experiments adopt the following attributes for classification (i.e., Y) and as protected groups (A): ethnicity, age bins, and gender.

B.2 Architectures, Hyper-parameters, and Settings
The study adopts the following architectures to validate the results of the main paper. The model has 11 million trainable parameters. ResNet50: this model contains 48 convolution layers, 1 MaxPool layer, and an AvgPool layer, and has 25 million trainable parameters. VGG-19: this model consists of 19 weight layers (16 convolution layers and 3 fully connected layers), along with 5 MaxPool layers and 1 SoftMax layer.




Appendix Uncovering and Quantifying Social Biases in Code Generation

Neural Information Processing Systems

We conduct a preliminary study to find a proper prompt-construction strategy. Further research can utilize our analysis to construct more powerful code prompts. Table 1: code prompt study results of CBS; "N" means there is one human-relevant function. Table 2: automatic and human evaluation results of social biases in the generated code on GPT-4. We also conduct experiments on GPT-4.