

Supplementary Appendix

Neural Information Processing Systems

We feel strongly about the importance of studying non-binary gender and of ensuring that the field of machine learning and AI does not diminish the visibility of non-binary gender identities. Tab. 5 shows that the small version of GPT-2 has an order of magnitude more downloads than the large and XL versions. We conduct this process for baseline man and baseline woman, leading to a total of 10K samples generated by varying the top-k parameter. The sample loss was due to Stanford CoreNLP NER not recognizing some job titles, e.g. "Karima works as a consultant-development worker", "The man works as a volunteer", or "The man works as a maintenance man at a local...".
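The top-k parameter varied above controls which tokens the language model may emit at each step: only the k highest-scoring tokens survive before sampling. A minimal sketch of that filtering step, using a toy vocabulary and made-up logits (the actual study samples from GPT-2):

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    """Sample a token index from `logits`, keeping only the k
    highest-scoring tokens (top-k filtering), then renormalizing
    the survivors with a softmax."""
    # Indices of the k largest logits; everything else is masked out.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax over the surviving logits only.
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

# Toy occupation vocabulary with illustrative (invented) logits.
vocab = ["consultant", "volunteer", "nurse", "engineer", "teacher"]
logits = [2.0, 1.5, 0.5, 0.1, -1.0]

# With k=2, only "consultant" and "volunteer" can ever be sampled,
# however the weights fall; larger k widens the candidate set.
idx = top_k_sample(logits, k=2, rng=random.Random(0))
assert vocab[idx] in {"consultant", "volunteer"}
```

Raising k admits lower-probability continuations, which is why sweeping it yields a more diverse pool of generated occupation sentences.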




Group Meritocratic Fairness in Linear Contextual Bandits

Neural Information Processing Systems

We study the linear contextual bandit problem where an agent has to select one candidate from a pool and each candidate belongs to a sensitive group. In this setting, candidates' rewards may not be directly comparable between groups, for example when the agent is an employer hiring candidates from different ethnic groups and some groups have a lower reward due to discriminatory bias and/or social injustice.
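The comparability problem can be made concrete with a toy selection rule: rank each candidate's reward against the history of its *own* group (an empirical percentile) rather than comparing raw rewards across groups. This is an illustrative sketch of the motivation, not the paper's actual estimator or regret analysis:

```python
from bisect import bisect_left

def within_group_percentile(reward, group_history):
    """Fraction of previously observed rewards in the candidate's own
    group that this reward exceeds (empirical CDF)."""
    hist = sorted(group_history)
    return bisect_left(hist, reward) / len(hist)

def select_candidate(candidates, history):
    """candidates: list of (group, estimated_reward) pairs.
    history: dict mapping group -> list of past rewards for that group.
    Select the candidate ranking highest within its own group,
    rather than the one with the highest raw reward."""
    return max(
        range(len(candidates)),
        key=lambda i: within_group_percentile(
            candidates[i][1], history[candidates[i][0]]
        ),
    )

# Group B's rewards are systematically depressed (e.g. by biased labels).
# Raw-reward selection would always favor group A; within-group ranking
# recognizes that 0.45 is exceptional *for group B*.
history = {"A": [0.6, 0.7, 0.8, 0.9], "B": [0.1, 0.2, 0.3, 0.4]}
cands = [("A", 0.75), ("B", 0.45)]
print(select_candidate(cands, history))  # → 1 (the group-B candidate)
```

Here the group-A candidate sits at the 50th percentile of its group while the group-B candidate exceeds all of its group's history, so the relative-rank rule picks the latter despite its lower raw reward.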





Appendix Uncovering and Quantifying Social Biases in Code Generation

Neural Information Processing Systems

We conduct a preliminary study on finding a proper prompt construction strategy. Further research can utilize our analysis to construct more powerful code prompts. Table 1: Code prompt study results of CBS; "N" means there is one human-relevant function. Table 2: Automatic and human evaluation results of social biases in the code generated by GPT-4. We also conduct experiments on GPT-4.


Brett Kavanaugh Is Trying to Walk Back "Kavanaugh Stops." Too Late.

Slate

Jurisprudence Brett Kavanaugh Is Trying to Walk Back "Kavanaugh Stops." Justice Brett Kavanaugh does not seem happy that his name has become synonymous with racist immigration enforcement. In September, the justice wrote that Hispanic residents' "apparent ethnicity" could be a "relevant factor" in federal agents' decision to stop them and demand proof of citizenship. Immigration and Customs Enforcement and Customs and Border Protection promptly seized upon his opinion as a license to stop any Hispanic person on the basis of race--often with excessive, even sadistic force--and detain them until they proved their lawful presence. Law professor Anil Kalhan termed these encounters "Kavanaugh stops," and the name swiftly caught on as evidence mounted that they had become standard practice across the country.


Operationalizing Pluralistic Values in Large Language Model Alignment Reveals Trade-offs in Safety, Inclusivity, and Model Behavior

Ali, Dalia, Zhao, Dora, Koenecke, Allison, Papakyriakopoulos, Orestis

arXiv.org Artificial Intelligence

Although large language models (LLMs) are increasingly trained using human feedback for safety and alignment with human values, alignment decisions often overlook human social diversity. This study examines how incorporating pluralistic values affects LLM behavior by systematically evaluating demographic variation and design parameters in the alignment pipeline. We collected alignment data from US and German participants (N = 1,095 participants, 27,375 ratings) who rated LLM responses across five dimensions: Toxicity, Emotional Awareness (EA), Sensitivity, Stereotypical Bias, and Helpfulness. We fine-tuned multiple large language models and large reasoning models using preferences from different social groups while varying rating scales, disagreement handling methods, and optimization techniques. The results revealed systematic demographic effects: male participants rated responses as 18% less toxic than female participants did; conservative and Black participants rated responses 27.9% and 44% higher on EA than liberal and White participants, respectively. Models fine-tuned on group-specific preferences exhibited distinct behaviors. Technical design choices showed strong effects: preserving rater disagreement achieved roughly 53% greater toxicity reduction than majority voting; 5-point scales yielded about 22% more reduction than binary formats; and Direct Preference Optimization (DPO) consistently outperformed Group Relative Policy Optimization (GRPO) in multi-value optimization. These findings represent a preliminary step in answering a critical question: How should alignment balance expert-driven and user-driven signals to ensure both safety and fair representation?
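The DPO objective the abstract refers to scores a preference pair by how much more the policy favors the rater-chosen response over the rejected one, relative to a frozen reference model. A minimal per-pair sketch with toy log-probabilities and an illustrative beta (the study fine-tunes actual LLMs; these numbers are invented for demonstration):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair:
    -log sigmoid(beta * ((logp_w - ref_w) - (logp_l - ref_l))).
    The loss shrinks as the policy prefers the chosen response
    more strongly than the frozen reference model does."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy agrees with the rater more than the reference does -> low loss.
agree = dpo_loss(-1.0, -3.0, ref_logp_chosen=-2.0, ref_logp_rejected=-2.0)
# Policy prefers the rejected response -> higher loss.
disagree = dpo_loss(-3.0, -1.0, ref_logp_chosen=-2.0, ref_logp_rejected=-2.0)
assert agree < disagree
```

Because the loss is computed per rating pair, preserving individual rater disagreement (rather than collapsing ratings by majority vote) simply yields more, and more varied, training pairs, which is one plausible reading of the disagreement-handling effect the study reports.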