BEATS: Bias Evaluation and Assessment Test Suite for Large Language Models

Abhishek, Alok, Erickson, Lisa, Bandopadhyay, Tushar

Mar-31-2025 – arXiv.org Artificial Intelligence

In this research, we introduce BEATS, a novel framework for evaluating Bias, Ethics, Fairness, and Factuality in Large Language Models (LLMs). Building upon the BEATS framework, we present a bias benchmark for LLMs that measures performance across 29 distinct metrics. These metrics span a broad range of characteristics, including demographic, cognitive, and social biases, as well as measures of ethical reasoning, group fairness, and factuality-related misinformation risk. They enable a quantitative assessment of the extent to which LLM-generated responses may perpetuate societal prejudices that reinforce or expand systemic inequities. To achieve a high score on this benchmark, an LLM must show very equitable behavior in its responses, which makes the benchmark a rigorous standard for responsible AI evaluation. Empirical results from our experiment show that 37.65% of outputs generated by industry-leading models contained some form of bias, highlighting a substantial risk of using these models in critical decision-making systems. The BEATS framework and benchmark offer a scalable and statistically rigorous methodology to benchmark LLMs, diagnose factors driving biases, and develop mitigation strategies. Our goal with the BEATS framework is to support the development of more socially responsible and ethically aligned AI models.

large language model, machine learning, natural language, (16 more...)

Country:
- North America > United States
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
  - California > San Francisco County
    - San Francisco (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Switzerland > Basel-City
    - Basel (0.04)

Genre:
- Research Report
  - Experimental Study (0.69)
  - New Finding (0.67)

Industry:
- Law (0.67)
- Government > Regional Government (0.46)
- Media > News (0.36)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
