Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models

Tong, Schrasing, Zemour, Eliott, Lohanimit, Rawisara, Kagal, Lalana

Dec-2-2024–arXiv.org Artificial Intelligence

Although large language models (LLMs) have demonstrated their effectiveness in a wide range of applications, they have also been observed to perpetuate unwanted biases present in the training data, potentially leading to harm for marginalized communities. In this paper, we mitigate bias by leveraging small biased and anti-biased expert models to obtain a debiasing signal that will be added to the LLM output at decoding-time. This approach combines resource efficiency with interpretability and can be optimized for mitigating specific types of bias, depending on the target use case. Experiments on mitigating gender, race, and religion biases show a reduction in bias on several local and global bias metrics while preserving language model performance.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

Dec-2-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Washington > King County
    - Seattle (0.04)
  - Massachusetts > Middlesex County
    - Cambridge (0.05)
- Europe
  - Italy
    - Tuscany > Florence (0.04)
    - Calabria > Catanzaro Province
      - Catanzaro (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.30)