Augmenting Bias Detection in LLMs Using Topological Data Analysis
Keshav Varadarajan, Tananun Songdechakraiwut
arXiv.org Artificial Intelligence
Recently, many bias detection methods have been proposed to determine the level of bias a large language model captures. However, tests to identify which parts of a large language model are responsible for bias towards specific groups remain underdeveloped. In this study, we present a method using topological data analysis to identify which heads in GPT-2 contribute to the misrepresentation of identity groups present in the StereoSet dataset. We find that biases for particular categories, such as gender or profession, are concentrated in attention heads that act as hot spots. The metric we propose can also be used to determine which heads capture bias for a specific group within a bias category, and future work could extend this method to help de-bias large language models.
Aug-12-2025
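The abstract describes scoring individual GPT-2 attention heads with a topological summary to locate bias "hot spots". The paper's exact metric is not given here, so the sketch below is illustrative only: it computes 0-dimensional persistence (connected-component deaths under a Kruskal-style filtration) of an attention graph, and a hypothetical `head_bias_score` that compares the persistence of a head's attention on a stereotype sentence versus its anti-stereotype counterpart. All function names and the total-persistence comparison are assumptions, not the authors' method.

```python
import numpy as np

def zero_dim_persistence(weights):
    """0-dimensional persistence of a weighted graph (symmetric distance
    matrix): process edges in increasing order of weight and record the
    filtration value at which two connected components merge (a death)."""
    n = weights.shape[0]
    parent = list(range(n))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = [(weights[i, j], i, j)
             for i in range(n) for j in range(i + 1, n)]
    deaths = []
    for w, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(w)  # one component dies per merge
    return np.array(deaths)

def head_bias_score(attn_stereo, attn_anti):
    """Hypothetical per-head bias score: compare topological summaries of
    a head's attention graphs for a stereotype / anti-stereotype pair.
    Total persistence is used as a simple stand-in for a distance
    between persistence diagrams (e.g. Wasserstein)."""
    def to_dist(a):
        s = (a + a.T) / 2.0      # symmetrize attention weights
        return 1.0 - s           # high attention -> small distance
    d_s = zero_dim_persistence(to_dist(attn_stereo))
    d_a = zero_dim_persistence(to_dist(attn_anti))
    return abs(d_s.sum() - d_a.sum())
```

In practice the attention matrices would come from GPT-2 (e.g. the per-head attention tensors exposed by a transformer implementation) for matched StereoSet sentence pairs; heads with consistently large scores across a bias category would be the candidate hot spots.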
- Country:
- Asia > Middle East
- Lebanon (0.04)
- Europe > Norway (0.04)
- North America
- Dominican Republic (0.04)
- United States
- Minnesota > Hennepin County
- Minneapolis (0.14)
- North Carolina (0.04)
- Genre:
- Research Report > New Finding (0.34)
- Industry:
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)