Gender Identity
Value Drifts: Tracing Value Alignment During LLM Post-Training
Bhatia, Mehar, Nayak, Shravan, Kamath, Gaurav, Mosbach, Marius, Stańczak, Karolina, Shwartz, Vered, Reddy, Siva
As LLMs occupy an increasingly important role in society, they are more and more confronted with questions that require them not only to draw on their general knowledge but also to align with certain human value systems. Therefore, studying the alignment of LLMs with human values has become a crucial field of inquiry. Prior work, however, mostly focuses on evaluating the alignment of fully trained models, overlooking the training dynamics by which models learn to express human values. In this work, we investigate how and at which stage value alignment arises during the course of a model's post-training. Our analysis disentangles the effects of post-training algorithms and datasets, measuring both the magnitude and time of value drifts during training. Experimenting with Llama-3 and Qwen-3 models of different sizes and popular supervised fine-tuning (SFT) and preference optimization datasets and algorithms, we find that the SFT phase generally establishes a model's values, and subsequent preference optimization rarely re-aligns these values. Furthermore, using a synthetic preference dataset that enables controlled manipulation of values, we find that different preference optimization algorithms lead to different value alignment outcomes, even when preference data is held constant. Our findings provide actionable insights into how values are learned during post-training and help to inform data curation, as well as the selection of models and algorithms for preference optimization to improve model alignment to human values.
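As a minimal sketch of how the magnitude and timing of a value drift could be quantified from per-checkpoint alignment scores, consider the snippet below; the checkpoint scores and the scoring scheme are illustrative assumptions, not the paper's measurement protocol.

```python
# Hypothetical sketch: given a value-alignment score measured at each
# post-training checkpoint, report how large the overall drift is and
# at which step the largest single change occurs.
from typing import Sequence, Tuple

def value_drift(scores: Sequence[float]) -> Tuple[float, int]:
    """Return (overall drift magnitude, index of the checkpoint with the
    largest step-to-step change). `scores` holds one value-alignment score
    per checkpoint, e.g. agreement with a target value survey in [0, 1]."""
    if len(scores) < 2:
        return 0.0, 0
    deltas = [abs(b - a) for a, b in zip(scores, scores[1:])]
    overall = abs(scores[-1] - scores[0])
    largest_step = max(range(len(deltas)), key=deltas.__getitem__) + 1
    return overall, largest_step

# Example: scores after the base model, SFT, and two preference-optimization
# checkpoints (illustrative numbers only).
magnitude, step = value_drift([0.42, 0.71, 0.73, 0.74])
print(f"drift magnitude={magnitude:.2f}, largest change at checkpoint {step}")
```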
- Europe > Austria > Vienna (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- (10 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
A Fairness Metric
Dynabench comprises four dynamic tasks with multiple rounds of datasets that will grow over time. Its fairness metric relies on heuristic text perturbations. One set of perturbations substitutes first names: these names cover 85.6 percent of the U.S. population and are based on 1990 Census information on first-name frequencies (note that there is nothing inherently "racial" about particular names; for example, each demographic group had at least a few people named "Anna" or "Benjamin"). For gender identity, we investigate two kinds of perturbations: names and noun phrases. Name perturbations draw on the Social Security Administration's lists of baby names (1980-2019). For noun phrases (i.e., pronouns and nouns), we adopted a slightly more conservative approach, since one cannot, for instance, replace "her" with a noun like "dad" and expect no effect on the classification label. Given that our perturbations are heuristic, some noise is to be expected; consider "I've always enjoyed eating at Red Robin" being perturbed to "I've always enjoyed eating at Red Kayla".
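To make the name-substitution idea concrete, below is a purely illustrative sketch; the name lists and the stub classifier are placeholders, not the actual Dynabench data or models.

```python
# Illustrative sketch of a name-substitution fairness perturbation:
# swap first names associated with different demographic groups and
# check whether a classifier's prediction changes.
import re

NAME_GROUPS = {
    "group_a": ["Anna", "Benjamin"],   # placeholder names only
    "group_b": ["Kayla", "Darnell"],
}

def perturb_names(text, source, target):
    """Replace whole-word occurrences of source names with target names."""
    for src, tgt in zip(source, target):
        text = re.sub(rf"\b{src}\b", tgt, text)
    return text

def prediction_flips(classify, text):
    """True if the label changes under the name perturbation (a heuristic
    fairness signal; as noted above, some noise is expected)."""
    perturbed = perturb_names(text, NAME_GROUPS["group_a"], NAME_GROUPS["group_b"])
    return classify(text) != classify(perturbed)

# Stub classifier for demonstration only.
print(prediction_flips(lambda t: "positive", "Anna enjoyed the meal."))
```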
Do They Understand Them? An Updated Evaluation on Nonbinary Pronoun Handling in Large Language Models
Tang, Xushuo, Ding, Yi, Yang, Zhengyi, Chen, Yin, Gu, Yongrui, Yang, Wenke, Ju, Mingchen, Cao, Xin, Liu, Yongfei, Zhang, Wenjie
Large language models (LLMs) are increasingly deployed in sensitive contexts where fairness and inclusivity are critical. Pronoun usage, especially concerning gender-neutral and neopronouns, remains a key challenge for responsible AI. Prior work, such as the MISGENDERED benchmark, revealed significant limitations in earlier LLMs' handling of inclusive pronouns, but was constrained to outdated models and limited evaluations. In this study, we introduce MISGENDERED+, an extended and updated benchmark for evaluating LLMs' pronoun fidelity. We benchmark five representative LLMs (GPT-4o, Claude 4, DeepSeek-V3, Qwen Turbo, and Qwen2.5) across zero-shot prompting, few-shot prompting, and gender identity inference. Our results show notable improvements compared with previous studies, especially in binary and gender-neutral pronoun accuracy. However, accuracy on neopronouns and reverse inference tasks remains inconsistent, underscoring persistent gaps in identity-sensitive reasoning. We discuss implications, model-specific observations, and avenues for future inclusive AI research.
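A minimal sketch of a fill-in-the-blank pronoun-fidelity check in the spirit of this kind of evaluation; the pronoun table, templates, and model interface are illustrative assumptions, not the MISGENDERED+ benchmark itself.

```python
# Hypothetical pronoun-fidelity check: declare a person's pronouns, ask the
# model to fill a blank, and score exact matches against the expected form.
PRONOUNS = {
    "she":  {"nominative": "she",  "accusative": "her",  "possessive": "her"},
    "they": {"nominative": "they", "accusative": "them", "possessive": "their"},
    "xe":   {"nominative": "xe",   "accusative": "xem",  "possessive": "xyr"},
}

TEMPLATES = {
    "nominative": "___ went to the store.",
    "accusative": "I met ___ yesterday.",
    "possessive": "That is ___ book.",
}

def pronoun_accuracy(generate, name, pronoun_set):
    """Fraction of templates the model completes with the expected form."""
    expected_forms = PRONOUNS[pronoun_set]
    correct = 0
    for form, sentence in TEMPLATES.items():
        prompt = f"{name}'s pronouns are {pronoun_set}. Fill in the blank: {sentence}"
        correct += generate(prompt).strip().lower() == expected_forms[form]
    return correct / len(TEMPLATES)

# Stub "model" that always answers "she", for demonstration only.
print(pronoun_accuracy(lambda p: "she", "Alex", "they"))
```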
Gender Inclusivity Fairness Index (GIFI): A Multilevel Framework for Evaluating Gender Diversity in Large Language Models
Shan, Zhengyang, Diana, Emily Ruth, Zhou, Jiawei
We present a comprehensive evaluation of gender fairness in large language models (LLMs), focusing on their ability to handle both binary and non-binary genders. While previous studies primarily focus on binary gender distinctions, we introduce the Gender Inclusivity Fairness Index (GIFI), a novel and comprehensive metric that quantifies the diverse gender inclusivity of LLMs. GIFI consists of a wide range of evaluations at different levels, from simply probing the model with respect to provided gender pronouns to testing various aspects of model generation and cognitive behaviors under different gender assumptions, revealing biases associated with varying gender identifiers. We conduct extensive evaluations with GIFI on 22 prominent open-source and proprietary LLMs of varying sizes and capabilities, discovering significant variations in LLMs' gender inclusivity. Our study highlights the importance of improving LLMs' inclusivity, providing a critical benchmark for future advancements in gender fairness in generative models.
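A minimal sketch of the aggregation step such a multilevel index implies, combining per-component inclusivity scores into one number; the component names and equal weighting are illustrative assumptions, not GIFI's actual definition.

```python
# Illustrative aggregation of gender-inclusivity sub-evaluation scores.
def inclusivity_index(component_scores, weights=None):
    """Weighted mean of per-component scores in [0, 1]; equal weights by default."""
    if weights is None:
        weights = {name: 1.0 for name in component_scores}
    total_weight = sum(weights[name] for name in component_scores)
    return sum(score * weights[name]
               for name, score in component_scores.items()) / total_weight

# Hypothetical component scores for a single model.
print(inclusivity_index({
    "binary_pronoun_probe": 0.95,
    "neutral_pronoun_probe": 0.80,
    "neopronoun_probe": 0.55,
    "generation_bias": 0.70,
}))
```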
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > Jordan (0.04)
- (15 more...)
- Health & Medicine (1.00)
- Government > Regional Government > North America Government > United States Government (0.92)
- Information Technology (0.67)
Longitudinal Study on Social and Emotional Use of AI Conversational Agent
Chandra, Mohit, Hernandez, Javier, Ramos, Gonzalo, Ershadi, Mahsa, Bhattacharjee, Ananya, Amores, Judith, Okoli, Ebele, Paradiso, Ann, Warreth, Shahed, Suh, Jina
Development in digital technologies has continuously reshaped how individuals seek and receive social and emotional support. While online platforms and communities have long served this need, the increased integration of general-purpose conversational AI into daily lives has introduced new dynamics in how support is provided and experienced. Existing research has highlighted both benefits (e.g., wider access to well-being resources) and potential risks (e.g., over-reliance) of using AI for support seeking. In this five-week, exploratory study, we recruited 149 participants divided into two usage groups: a baseline usage group (BU, n=60) that used the internet and AI as usual, and an active usage group (AU, n=89) encouraged to use one of four commercially available AI tools (Microsoft Copilot, Google Gemini, PI AI, ChatGPT) for social and emotional interactions. Our analysis revealed significant increases in perceived attachment towards AI (32.99 percentage points), perceived AI empathy (25.8 p.p.), and motivation to use AI for entertainment (22.90 p.p.) among the AU group. We also observed that individual differences (e.g., gender identity, prior AI usage) influenced perceptions of AI empathy and attachment. Lastly, the AU group expressed higher comfort in seeking personal help, managing stress, obtaining social support, and talking about health with AI, indicating potential for broader emotional support while highlighting the need for safeguards against problematic usage. Overall, our exploratory findings underscore the importance of developing consumer-facing AI tools that support emotional well-being responsibly, while empowering users to understand the limitations of these tools.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
- Health & Medicine > Consumer Health (1.00)
Sensing and Steering Stereotypes: Extracting and Applying Gender Representation Vectors in LLMs
Cyberey, Hannah, Ji, Yangfeng, Evans, David
Large language models (LLMs) are known to perpetuate stereotypes and exhibit biases. Various strategies have been proposed to mitigate potential harms that may result from these biases, but most work studies biases in LLMs as a black-box problem without considering how concepts are represented within the model. We adapt techniques from representation engineering to study how the concept of "gender" is represented within LLMs. We introduce a new method that extracts concept representations via probability weighting without labeled data and efficiently selects a steering vector for measuring and manipulating the model's representation. We also present a projection-based method that enables precise steering of model predictions and demonstrate its effectiveness in mitigating gender bias in LLMs.
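A minimal numpy sketch of the general representation-engineering recipe (a mean-difference concept direction plus projection); note that the paper's own method extracts representations via probability weighting without labeled data, which this sketch does not reproduce.

```python
# Sketch: estimate a "gender" direction from contrasting prompt activations,
# then remove that component from hidden states by projection.
import numpy as np

def concept_direction(acts_a, acts_b):
    """Unit vector from the difference of mean activations of two contrasting
    prompt sets, each of shape (n_prompts, hidden_dim)."""
    v = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return v / np.linalg.norm(v)

def project_out(h, v):
    """Remove the component of hidden state(s) h along direction v."""
    if h.ndim > 1:
        return h - np.outer(h @ v, v)
    return h - (h @ v) * v

# Random stand-ins for activations of two contrasting prompt sets.
rng = np.random.default_rng(0)
she_acts, he_acts = rng.normal(size=(32, 64)), rng.normal(size=(32, 64))
v = concept_direction(she_acts, he_acts)
steered = project_out(rng.normal(size=(5, 64)), v)
print(np.allclose(steered @ v, 0.0))  # component along v has been removed
```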
- North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (6 more...)
- Health & Medicine (1.00)
- Information Technology (0.68)
Revisiting gender bias research in bibliometrics: Standardizing methodological variability using Scholarly Data Analysis (SoDA) Cards
Lee, HaeJin, Mishra, Shubhanshu, Mishra, Apratim, You, Zhiwen, Kim, Jinseok, Diesner, Jana
Gender biases in scholarly metrics remain a persistent concern, despite numerous bibliometric studies exploring their presence and absence across productivity, impact, acknowledgment, and self-citations. However, methodological inconsistencies, particularly in author name disambiguation and gender identification, limit the reliability and comparability of these studies, potentially perpetuating misperceptions and hindering effective interventions. A review of 70 relevant publications over the past 12 years reveals a wide range of approaches, from name-based and manual searches to more algorithmic and gold-standard methods, with no clear consensus on best practices. This variability, compounded by challenges such as accurately disambiguating Asian names and managing unassigned gender labels, underscores the urgent need for standardized and robust methodologies. To address this critical gap, we propose the development and implementation of "Scholarly Data Analysis (SoDA) Cards." These cards will provide a structured framework for documenting and reporting key methodological choices in scholarly data analysis, including author name disambiguation and gender identification procedures. By promoting transparency and reproducibility, SoDA Cards will facilitate more accurate comparisons and aggregations of research findings, ultimately supporting evidence-informed policymaking and enabling the longitudinal tracking of analytical approaches in the study of gender and other social biases in academia.
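As a purely hypothetical illustration of what a machine-readable SoDA Card could record, the sketch below defines a small schema; the field names are guesses at the methodological choices the abstract highlights, not the authors' proposed card format.

```python
# Hypothetical, simplified schema for documenting bibliometric method choices.
from dataclasses import dataclass, asdict
import json

@dataclass
class SoDACard:
    study: str
    data_source: str
    name_disambiguation_method: str   # e.g. "algorithmic", "manual", "gold-standard"
    gender_identification_method: str # e.g. "name-based inference", "self-reported"
    unassigned_gender_handling: str   # how records without a label were treated
    time_period: str

card = SoDACard(
    study="Example bibliometric study",
    data_source="OpenAlex",
    name_disambiguation_method="algorithmic",
    gender_identification_method="name-based inference",
    unassigned_gender_handling="excluded from analysis",
    time_period="2010-2022",
)
print(json.dumps(asdict(card), indent=2))
```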
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Asia > North Korea (0.14)
- North America > Canada > Quebec (0.04)
- (18 more...)
- Research Report > New Finding (0.93)
- Overview (0.93)
- Health & Medicine > Therapeutic Area (1.00)
- Government (0.93)
- Law > Civil Rights & Constitutional Law (0.69)
- Education > Educational Setting > Higher Education (0.46)
Pronoun Logic
Particularly in transgender and nonbinary (TGNB) communities, it is an increasingly common practice to publicly share one's personal pronouns so that we may be gendered correctly in others' speech. Many of us have nuanced desires for how we are gendered, leading us to use more complex descriptions of our wishes; for example, the descriptor 'she/they'. We observe that these descriptions of our wishes have the structure of a little language all their own. We thus propose formal logic as a tool for expressing one's personal pronouns and potentially other aspects of gender. We explore three potential logical foundations (linear logic, temporal logic, and free logic with definite descriptions) and their trade-offs. Our foremost motivation for this proposal is play, affirming that one can be both a logician and TGNB at the same time. We present formalization as something that can continue to evolve over time with society's understanding of gender. This implies that outreach is a major potential application: we can show TGNB youth that they belong in logic and have a unique contribution to make. Tools for evaluating whether one's pronouns are respected are an application as well.
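As one hedged illustration of what a logical reading of a pronoun descriptor could look like, the snippet below renders two inequivalent linear-logic readings of "she/they"; this is a plausible reading consistent with standard linear-logic semantics, not necessarily the formalization the paper itself adopts.

```latex
% Illustrative only: two inequivalent linear-logic readings of "she/they".
\documentclass{article}
\usepackage{cmll} % linear-logic connectives such as \with
\begin{document}
Additive conjunction (the hearer may use either pronoun, their choice):
\[ \textit{she} \with \textit{they} \]
Multiplicative conjunction (both pronouns should be used):
\[ \textit{she} \otimes \textit{they} \]
\end{document}
```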
- South America > Brazil > Rio Grande do Norte > Natal (0.04)
- North America > United States > Massachusetts > Worcester County > Worcester (0.04)
- Europe > France (0.04)
GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models
Tang, Kunsheng, Zhou, Wenbo, Zhang, Jie, Liu, Aishan, Deng, Gelei, Li, Shuai, Qi, Peigui, Zhang, Weiming, Zhang, Tianwei, Yu, Nenghai
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but they have also been observed to magnify societal biases, particularly those related to gender. In response to this issue, several benchmarks have been proposed to assess gender bias in LLMs. However, these benchmarks often lack practical flexibility or inadvertently introduce biases. To address these shortcomings, we introduce GenderCARE, a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics for quantifying and mitigating gender bias in LLMs. To begin, we establish pioneering criteria for gender equality benchmarks, spanning dimensions such as inclusivity, diversity, explainability, objectivity, robustness, and realisticity. Guided by these criteria, we construct GenderPair, a novel pair-based benchmark designed to assess gender bias in LLMs comprehensively. Our benchmark provides standardized and realistic evaluations, including previously overlooked gender groups such as transgender and non-binary individuals. Furthermore, we develop effective debiasing techniques that incorporate counterfactual data augmentation and specialized fine-tuning strategies to reduce gender bias in LLMs without compromising their overall performance. Extensive experiments demonstrate a significant reduction in various gender bias benchmarks, with reductions peaking at over 90% and averaging above 35% across 17 different LLMs. Importantly, these reductions come with minimal variability in mainstream language tasks, remaining below 2%. By offering a realistic assessment and tailored reduction of gender biases, we hope that our GenderCARE can represent a significant step towards achieving fairness and equity in LLMs. More details are available at https://github.com/kstanghere/GenderCARE-ccs24.
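A minimal sketch of counterfactual data augmentation, one ingredient of the debiasing recipe the abstract mentions; the term pairs and case handling are simplified illustrations, not GenderCARE's actual pipeline.

```python
# Sketch: duplicate training examples with gendered terms swapped.
import re

SWAP_PAIRS = [("he", "she"), ("his", "her"), ("man", "woman"),
              ("father", "mother")]  # tiny illustrative subset

def counterfactual(text):
    """Swap each gendered term with its counterpart, preserving capitalization."""
    mapping = {}
    for a, b in SWAP_PAIRS:
        mapping[a], mapping[b] = b, a
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b",
                         re.IGNORECASE)

    def swap(match):
        out = mapping[match.group(0).lower()]
        return out.capitalize() if match.group(0)[0].isupper() else out

    return pattern.sub(swap, text)

def augment(dataset):
    """Original examples plus their gender-swapped counterfactuals."""
    return dataset + [counterfactual(t) for t in dataset]

print(augment(["He thanked his father."]))
# ['He thanked his father.', 'She thanked her mother.']
```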
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > District of Columbia > Washington (0.05)
- Asia > China > Anhui Province > Hefei (0.04)
- (17 more...)
- Overview (1.00)
- Research Report > New Finding (0.67)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Law > Civil Rights & Constitutional Law (0.67)