Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models
Natalie Mackraz, Nivedha Sivakumar, Samira Khorshidi, Krishna Patel, Barry-John Theobald, Luca Zappella, Nicholas Apostoloff
arXiv.org Artificial Intelligence
Large language models (LLMs) are increasingly being adapted to achieve task specificity for deployment in real-world decision systems. Several previous works have investigated the bias transfer hypothesis (BTH) by studying the effect of the fine-tuning adaptation strategy on model fairness, finding that fairness in pre-trained masked language models has limited effect on the fairness of those models once fine-tuned. In this work, we expand the study of BTH to causal models under prompt adaptations, as prompting is an accessible and compute-efficient way to deploy models in real-world systems. In contrast to previous works, we establish that intrinsic biases in pre-trained Mistral, Falcon, and Llama models are strongly correlated (rho >= 0.94) with the biases those same models exhibit when zero- and few-shot prompted on a pronoun co-reference resolution task. Further, we find that bias transfer remains strongly correlated even when LLMs are specifically prompted to exhibit fair or biased behavior (rho >= 0.92), and when few-shot example length and stereotypical composition are varied (rho >= 0.97). Our findings highlight the importance of ensuring fairness in pre-trained LLMs, especially when they are later used to perform downstream tasks via prompt adaptation.
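The analysis described above reduces, at its core, to a rank correlation between two sets of bias scores: one measured on the pre-trained model and one measured after prompt adaptation. The following minimal Python sketch illustrates how such a Spearman correlation would be computed; the bias scores here are hypothetical placeholders, not data from the paper, and in practice each score would come from evaluating a model on a pronoun co-reference benchmark.

    # Minimal sketch of the rank-correlation analysis, with hypothetical
    # placeholder scores (not the paper's data). Real scores would come
    # from a pronoun co-reference benchmark such as WinoBias.
    from scipy.stats import spearmanr

    # Hypothetical per-instance bias scores for one model family:
    # intrinsic (pre-trained) vs. zero-shot prompted on co-reference.
    pretrained_bias = [0.82, 0.35, 0.67, 0.12, 0.91, 0.44]
    zero_shot_bias  = [0.79, 0.40, 0.61, 0.18, 0.88, 0.47]

    rho, p_value = spearmanr(pretrained_bias, zero_shot_bias)
    print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
    # A rho close to 1 indicates that the bias ranking of the pre-trained
    # model persists after prompt adaptation, consistent with the bias
    # transfer hypothesis.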
Dec-4-2024