Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions
Himanshu Thakur, Atishay Jain, Praneetha Vaddamanu, Paul Pu Liang, Louis-Philippe Morency
arXiv.org Artificial Intelligence
Societal biases present in pre-trained large language models are a critical issue, as these models have been shown to propagate biases into countless downstream applications, rendering them unfair towards specific groups of people. Since large-scale retraining of these models from scratch is both time- and compute-intensive, a variety of approaches have previously been proposed to de-bias a pre-trained model. While the majority of current state-of-the-art debiasing methods focus on changes to the training regime, in this paper we propose data intervention strategies as a powerful yet simple technique for reducing gender bias in pre-trained models. Specifically, we empirically show that fine-tuning a pre-trained model on only 10 de-biased (intervened) training examples significantly reduces the tendency to favor any gender. Because our proposed method needs only a few training examples, our few-shot debiasing approach is highly feasible and practical. Through extensive experimentation, we show that our debiasing technique outperforms competitive state-of-the-art baselines with minimal loss in language modeling ability.
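The abstract does not spell out what an "intervened" training example looks like. A common form of data intervention in the gender-bias literature is counterfactual substitution of gendered terms, which the following minimal sketch illustrates; the word pairs and sentences are illustrative assumptions, not the paper's actual data or method.

```python
# Hypothetical sketch of a gender data intervention: produce a
# counterfactual copy of each sentence with gendered words swapped.
# A few-shot debiasing set could then pair originals with counterfactuals.

# Illustrative subset of gender word pairs (the real lists used in the
# literature are much longer). Ambiguous tokens such as "her"
# (object vs. possessive) need extra handling and are omitted here.
GENDER_PAIRS = {
    "he": "she",
    "man": "woman",
    "men": "women",
    "father": "mother",
    "son": "daughter",
}
# Make the mapping symmetric so swaps work in both directions.
SWAP = {**GENDER_PAIRS, **{v: k for k, v in GENDER_PAIRS.items()}}


def intervene(sentence: str) -> str:
    """Return a counterfactual copy with gendered words swapped."""
    out = []
    for tok in sentence.split():
        core = tok.strip(".,!?").lower()
        if core in SWAP:
            swapped = SWAP[core]
            # Preserve capitalization and trailing punctuation.
            if tok[0].isupper():
                swapped = swapped.capitalize()
            trailing = tok[len(tok.rstrip(".,!?")):]
            out.append(swapped + trailing)
        else:
            out.append(tok)
    return " ".join(out)


examples = ["He is a doctor.", "The father helped the son."]
intervened = [intervene(s) for s in examples]
# intervened == ["She is a doctor.", "The mother helped the daughter."]
```

Fine-tuning on such a small balanced set (the paper reports only 10 examples) is what makes the approach few-shot and cheap compared to retraining from scratch.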
Jun-7-2023