AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.63)

Sivakumar, Nivedha, Mackraz, Natalie, Khorshidi, Samira, Patel, Krishna, Theobald, Barry-John, Zappella, Luca, Apostoloff, Nicholas

Bias after Prompting: Persistent Discrimination in Large Language Models

arXiv.org Artificial IntelligenceNov-20-2025

A dangerous assumption that can be made from prior work on the bias transfer hypothesis (BTH) is that biases do not transfer from pre-trained large language models (LLMs) to adapted models. We invalidate this assumption by studying the BTH in causal models under prompt adaptations, as prompting is an extremely popular and accessible adaptation strategy used in real-world applications. In contrast to prior work, we find that biases can transfer through prompting and that popular prompt-based mitigation methods do not consistently prevent biases from transferring. Specifically, the correlation between intrinsic biases and those after prompt adaptation remain moderate to strong across demographics and tasks -- for example, gender (rho >= 0.94) in co-reference resolution, and age (rho >= 0.98) and religion (rho >= 0.69) in question answering. Further, we find that biases remain strongly correlated when varying few-shot composition parameters, such as sample size, stereotypical content, occupational distribution and representational balance (rho >= 0.90). We evaluate several prompt-based debiasing strategies and find that different approaches have distinct strengths, but none consistently reduce bias transfer across models, tasks or demographics. These results demonstrate that correcting bias, and potentially improving reasoning ability, in intrinsic models may prevent propagation of biases to downstream tasks.

artificial intelligence, large language model, natural language, (18 more...)

2509.08146

Country:

North America > United States (1.00)
Europe (0.93)
Asia > Middle East > UAE (0.28)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.46)

Industry: Government (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Neural Information Processing SystemsOct-10-2025, 06:05:02 GMT

72462f451057f9e782af2c3c04c5a4b0-Paper-Conference.pdf

arxiv preprint arxiv, dataset, query, (13 more...)

Country:

North America > United States > Virginia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Neural Information Processing SystemsMay-27-2025, 13:27:48 GMT

DeNetDM: Debiasing by Network Depth Modulation

Neural networks trained on biased datasets tend to inadvertently learn spurious correlations, hindering generalization. We formally prove that (1) samples that exhibit spurious correlations lie on a lower rank manifold relative to the ones that do not; and (2) the depth of a network acts as an implicit regularizer on the rank of the attribute subspace that is encoded in its representations. Leveraging these insights, we present DeNetDM, a novel debiasing method that uses network depth modulation as a way of developing robustness to spurious correlations. Using a training paradigm derived from Product of Experts, we create both biased and debiased branches with deep and shallow architectures and then distill knowledge to produce the target debiased model. Our method requires no bias annotations or explicit data augmentation while performing on par with approaches that require either or both.

debiasing, denetdm, network depth modulation, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsMay-26-2025, 18:44:06 GMT

A Near-Optimal Algorithm for Debiasing Trained Machine Learning Models

artificial intelligence, machine learning model, near-optimal algorithm, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)

arXiv.org Artificial IntelligenceJan-28-2025

DebiasPI: Inference-time Debiasing by Prompt Iteration of a Text-to-Image Generative Model

Bonna, Sarah, Huang, Yu-Cheng, Novozhilova, Ekaterina, Paik, Sejin, Shan, Zhengyang, Feng, Michelle Yilin, Gao, Ge, Tayal, Yonish, Kulkarni, Rushil, Yu, Jialin, Divekar, Nupur, Ghadiyaram, Deepti, Wijaya, Derry, Betke, Margrit

Ethical intervention prompting has emerged as a tool to counter demographic biases of text-to-image generative AI models. Existing solutions either require to retrain the model or struggle to generate images that reflect desired distributions on gender and race. We propose an inference-time process called DebiasPI for Debiasing-by-Prompt-Iteration that provides prompt intervention by enabling the user to control the distributions of individuals' demographic attributes in image generation. DebiasPI keeps track of which attributes have been generated either by probing the internal state of the model or by using external attribute classifiers. Its control loop guides the text-to-image model to select not yet sufficiently represented attributes, With DebiasPI, we were able to create images with equal representations of race and gender that visualize challenging concepts of news headlines. We also experimented with the attributes age, body type, profession, and skin tone, and measured how attributes change when our intervention prompt targets the distribution of an unrelated attribute type. We found, for example, if the text-to-image model is asked to balance racial representation, gender representation improves but the skin tone becomes less diverse. Attempts to cover a wide range of skin colors with various intervention prompts showed that the model struggles to generate the palest skin tones. We conducted various ablation studies, in which we removed DebiasPI's attribute control, that reveal the model's propensity to generate young, male characters.

debiaspi, machine learning, natural language, (19 more...)

2501.18642

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Media > News (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.69)

arXiv.org Artificial IntelligenceNov-6-2024

BendVLM: Test-Time Debiasing of Vision-Language Embeddings

Gerych, Walter, Zhang, Haoran, Hamidieh, Kimia, Pan, Eileen, Sharma, Maanas, Hartvigsen, Thomas, Ghassemi, Marzyeh

Vision-language model (VLM) embeddings have been shown to encode biases present in their training data, such as societal biases that prescribe negative characteristics to members of various racial and gender identities. VLMs are being quickly adopted for a variety of tasks ranging from few-shot classification to text-guided image generation, making debiasing VLM embeddings crucial. Debiasing approaches that fine-tune the VLM often suffer from catastrophic forgetting. On the other hand, fine-tuning-free methods typically utilize a "one-size-fits-all" approach that assumes that correlation with the spurious attribute can be explained using a single linear direction across all possible inputs. In this work, we propose Bend-VLM, a nonlinear, fine-tuning-free approach for VLM embedding debiasing that tailors the debiasing operation to each unique input. This allows for a more flexible debiasing approach. Additionally, we do not require knowledge of the set of inputs a priori to inference time, making our method more appropriate for online, open-set tasks such as retrieval and text guided image generation.

arxiv preprint arxiv, dataset, query, (13 more...)

2411.0442

Country:

North America > United States > Virginia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia (0.04)

Genre: Research Report (0.64)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Neural Information Processing SystemsOct-10-2024, 04:18:55 GMT

A Near-Optimal Algorithm for Debiasing Trained Machine Learning Models

debiasing, machine learning model, near-optimal algorithm

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)

Lee, Yeon-Chang, Shin, Hojung, Kim, Sang-Wook

Disentangling, Amplifying, and Debiasing: Learning Disentangled Representations for Fair Graph Neural Networks

arXiv.org Artificial IntelligenceAug-23-2024

Graph Neural Networks (GNNs) have become essential tools for graph representation learning in various domains, such as social media and healthcare. However, they often suffer from fairness issues due to inherent biases in node attributes and graph structure, leading to unfair predictions. To address these challenges, we propose a novel GNN framework, DAB-GNN, that Disentangles, Amplifies, and deBiases attribute, structure, and potential biases in the GNN mechanism. DAB-GNN employs a disentanglement and amplification module that isolates and amplifies each type of bias through specialized disentanglers, followed by a debiasing module that minimizes the distance between subgroup distributions to ensure fairness. Extensive experiments on five datasets demonstrate that DAB-GNN significantly outperforms ten state-of-the-art competitors in terms of achieving an optimal balance between accuracy and fairness.

disentangling, fair graph neural network, learning disentangled representation, (2 more...)

2408.12875

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

Jain, Saachi, Hamidieh, Kimia, Georgiev, Kristian, Ilyas, Andrew, Ghassemi, Marzyeh, Madry, Aleksander

Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection

arXiv.org Machine LearningJun-24-2024

Machine learning models can fail on subgroups that are underrepresented during training. While techniques such as dataset balancing can improve performance on underperforming groups, they require access to training group annotations and can end up removing large portions of the dataset. In this paper, we introduce Data Debiasing with Datamodels (D3M), a debiasing approach which isolates and removes specific training examples that drive the model's failures on minority groups. Our approach enables us to efficiently train debiased classifiers while removing only a small number of examples, and does not require training group annotations or additional hyperparameter tuning.

accuracy, arxiv preprint arxiv, dataset, (13 more...)

arXiv.org Machine Learning

2406.16846

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)