Goto

Collaborating Authors

 gender bia


Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models

Neural Information Processing Systems

Pre-training large language models (LLMs) on vast text corpora enhances natural language processing capabilities but risks encoding social biases, particularly gender bias. While parameter-modification methods like fine-tuning mitigate bias, they are resource-intensive, unsuitable for closed-source models, and lack adaptability to evolving societal norms. Instruction-based approaches offer flexibility but often compromise general performance on normal tasks. To address these limitations, we propose FaIRMaker, an automated and model-independent framework that employs an auto-search and refinement paradigm to adaptively generate Fairwords, which act as instructions to reduce gender bias and enhance response quality. FaIRMaker enhances the debiasing capacity by enlarging the Fairwords search space while preserving the utility and making it applicable to closed-source models by training a sequence-to-sequence model that adaptively refines Fairwords into effective debiasing instructions when facing gender-related queries and performance-boosting prompts for neutral inputs. Extensive experiments demonstrate that FaIRMaker effectively mitigates gender bias while preserving task integrity and ensuring compatibility with both open-and closed-source LLMs.


Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models

Neural Information Processing Systems

Pre-training large language models (LLMs) on vast text corpora enhances natural language processing capabilities but risks encoding social biases, particularly gender bias. While parameter-modification methods like fine-tuning mitigate bias, they are resource-intensive, unsuitable for closed-source models, and lack adaptability to evolving societal norms. Instruction-based approaches offer flexibility but often compromise general performance on normal tasks. To address these limitations, we propose $\textit{FaIRMaker}$, an automated and model-independent framework that employs an $\textbf{auto-search and refinement}$ paradigm to adaptively generate Fairwords, which act as instructions to reduce gender bias and enhance response quality.



Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

Neural Information Processing Systems

The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving the its useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.



InvestigatingGenderBiasinLanguageModels UsingCausalMediationAnalysis

Neural Information Processing Systems

A popular class of analysis methods, often called structural analysis, aims to extract this information using probing classifiers that predict linguistic properties from representations of trained models (e.g., Adi et al., 2017; Conneau et al., 2018; Hupkes et al., 2018; Tenney et al., 2019).


VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution

Neural Information Processing Systems

We introduce VisoGender, a novel dataset for benchmarking gender bias in vision-language models. We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas, where each image is associated with a caption containing a pronoun relationship of subjects and objects in the scene. VisoGender is balanced by gender representation in professional roles, supporting bias evaluation in two ways: i) resolution bias, where we evaluate the difference between pronoun resolution accuracies for image subjects with gender presentations perceived as masculine versus feminine by human annotators and ii) retrieval bias, where we compare ratios of professionals perceived to have masculine and feminine gender presentations retrieved for a gender-neutral search query. We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes. While the direction and magnitude of gender bias depends on the task and the model being evaluated, captioning models are generally less biased than Vision-Language Encoders.


Investigating Gender Bias in Language Models Using Causal Mediation Analysis

Neural Information Processing Systems

Many interpretation methods for neural models in natural language processing investigate how information is encoded inside hidden representations. However, these methods can only measure whether the information exists, not whether it is actually used by the model. We propose a methodology grounded in the theory of causal mediation analysis for interpreting which parts of a model are causally implicated in its behavior. The approach enables us to analyze the mechanisms that facilitate the flow of information from input to output through various model components, known as mediators. As a case study, we apply this methodology to analyzing gender bias in pre-trained Transformer language models. We study the role of individual neurons and attention heads in mediating gender bias across three datasets designed to gauge a model's sensitivity to gender bias. Our mediation analysis reveals that gender bias effects are concentrated in specific components of the model that may exhibit highly specialized behavior.


The Gender Code: Gendering the Global Governance of Artificial Intelligence

arXiv.org Artificial Intelligence

This paper examines how international AI governance frameworks address gender issues and gender-based harms. The analysis covers binding regulations, such as the EU AI Act; soft law instruments, like the UNESCO Recommendations on AI Ethics; and global initiatives, such as the Global Partnership on AI (GPAI). These instruments reveal emerging trends, including the integration of gender concerns into broader human rights frameworks, a shift toward explicit gender-related provisions, and a growing emphasis on inclusivity and diversity. Yet, some critical gaps persist, including inconsistent treatment of gender across governance documents, limited engagement with intersectionality, and a lack of robust enforcement mechanisms. However, this paper argues that effective AI governance must be intersectional, enforceable, and inclusive. This is key to moving beyond tokenism toward meaningful equity and preventing reinforcement of existing inequalities. The study contributes to ethical AI debates by highlighting the importance of gender-sensitive governance in building a just technological future.


What Triggers my Model? Contrastive Explanations Inform Gender Choices by Translation Models

arXiv.org Artificial Intelligence

Interpretability can be implemented as a means to understand decisions taken by (black box) models, such as machine translation (MT) or large language models (LLMs). Yet, research in this area has been limited in relation to a manifested problem in these models: gender bias. With this research, we aim to move away from simply measuring bias to exploring its origins. Working with gender-ambiguous natural source data, this study examines which context, in the form of input tokens in the source sentence, influences (or triggers) the translation model choice of a certain gender inflection in the target language. To analyse this, we use contrastive explanations and compute saliency attribution. We first address the challenge of a lacking scoring threshold and specifically examine different attribution levels of source words on the model gender decisions in the translation. We compare salient source words with human perceptions of gender and demonstrate a noticeable overlap between human perceptions and model attribution. Additionally, we provide a linguistic analysis of salient words. Our work showcases the relevance of understanding model translation decisions in terms of gender, how this compares to human decisions and that this information should be leveraged to mitigate gender bias.