Auditing Large Language Models for Enhanced Text-Based Stereotype Detection and Probing-Based Bias Evaluation
Wu, Zekun, Bulathwela, Sahan, Perez-Ortiz, Maria, Koshiyama, Adriano Soares
arXiv.org Artificial Intelligence
Recent advancements in Large Language Models (LLMs) have significantly increased their presence in human-facing Artificial Intelligence (AI) applications. However, LLMs can reproduce and even exacerbate stereotypes present in their training data. This work introduces the Multi-Grain Stereotype (MGS) dataset, encompassing 51,867 instances across gender, race, profession, religion, and stereotypical text, collected by fusing multiple publicly available stereotype detection datasets. We explore different machine learning approaches to establish baselines for stereotype detection, and fine-tune several language models of various architectures and sizes, presenting a series of stereotype classifiers for English text trained on MGS. To understand whether our stereotype detectors capture relevant features (that is, features aligning with human common sense), we utilise a variety of explainable AI tools, including SHAP, LIME, and BertViz, and analyse a series of example cases, discussing the results. Finally, we develop a series of stereotype elicitation prompts and evaluate the presence of stereotypes in text generation tasks with popular LLMs, using one of our best-performing stereotype detectors. Our experiments yielded several key findings: i) Training stereotype detectors in a multi-dimension setting yields better results than training multiple single-dimension classifiers. ii) The integrated MGS dataset enhances both the in-dataset and cross-dataset generalisation ability of stereotype detectors compared to using the datasets separately.
Apr-2-2024