"Im not Racist but...": Discovering Bias in the Internal Knowledge of Large Language Models

Salinas, Abel, Penafiel, Louis, McCormack, Robert, Morstatter, Fred

Oct-12-2023, arXiv.org Artificial Intelligence

Large language models (LLMs) have garnered significant attention for their remarkable performance in a continuously expanding set of natural language processing tasks. However, these models have been shown to harbor inherent societal biases, or stereotypes, which can adversely affect their performance in many downstream applications. In this paper, we introduce a novel, purely prompt-based approach to uncover hidden stereotypes within an arbitrary LLM. Our approach dynamically generates a knowledge representation of internal stereotypes, enabling the identification of biases encoded within the LLM's internal knowledge. By illuminating the biases present in LLMs and offering a systematic methodology for their analysis, our work contributes to advancing transparency and promoting fairness in natural language processing systems.

discovering bias, internal knowledge, language model

Genre:
- Research Report (0.40)

Industry:
- Law > Civil Rights & Constitutional Law (0.40)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
