Discovering Bias
Discovering Bias in Latent Space: An Unsupervised Debiasing Approach
Adila, Dyah, Zhang, Shuai, Han, Boran, Wang, Yuyang
The question-answering (QA) capabilities of foundation models are highly sensitive to prompt variations, rendering their performance susceptible to superficial, non-meaning-altering changes. This vulnerability often stems from the model's preference or bias towards specific input characteristics, such as option position or superficial image features in multi-modal settings. We propose to rectify this bias directly in the model's internal representation. Our approach, SteerFair, finds the bias direction in the model's representation space and steers activation values away from it during inference. Specifically, we exploit the observation that bias often adheres to simple association rules, such as the spurious association between the first option and correctness likelihood. We then construct demonstrations of these rules from unlabeled samples and use them to identify the bias directions. We empirically show that SteerFair significantly reduces instruction-tuned model performance variance across prompt modifications on three benchmark tasks. Remarkably, our approach surpasses a supervised baseline with 100 labels by an average of 10.86 accuracy points and 12.95 score points, and matches the performance of the same baseline trained with 500 labels.
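As a minimal sketch of the core idea, the snippet below estimates a bias direction as the normalized difference of mean hidden activations between two demonstration sets that differ only in the biased attribute (e.g., the correct answer placed first vs. last), then projects that direction out of a hidden state at inference time. The difference-of-means estimate, the function names, and the random stand-ins for hidden states are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def bias_direction(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Unit vector pointing from the mean activation of one demonstration
    set to the other; the two sets differ only in the biased attribute."""
    d = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return d / np.linalg.norm(d)

def steer_away(h: np.ndarray, direction: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Remove a fraction alpha of the component of hidden state h that
    lies along the estimated bias direction."""
    return h - alpha * np.dot(h, direction) * direction

# Illustrative usage with random stand-ins for transformer hidden states.
rng = np.random.default_rng(0)
acts_first = rng.normal(size=(32, 768))  # prompts with the answer in position 1
acts_last = rng.normal(size=(32, 768))   # same prompts, answer in the last position
d = bias_direction(acts_first, acts_last)
h_debiased = steer_away(rng.normal(size=768), d)
```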
"Im not Racist but...": Discovering Bias in the Internal Knowledge of Large Language Models
Salinas, Abel, Penafiel, Louis, McCormack, Robert, Morstatter, Fred
Large language models (LLMs) have garnered significant attention for their remarkable performance in a continuously expanding set of natural language processing tasks. However, these models have been shown to harbor inherent societal biases, or stereotypes, which can adversely affect their performance in their many downstream applications. In this paper, we introduce a novel, purely prompt-based approach to uncover hidden stereotypes within any arbitrary LLM. Our approach dynamically generates a knowledge representation of internal stereotypes, enabling the identification of biases encoded within the LLM's internal knowledge. By illuminating the biases present in LLMs and offering a systematic methodology for their analysis, our work contributes to advancing transparency and promoting fairness in natural language processing systems.
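The abstract does not spell out the prompting protocol, but the general shape of a prompt-based stereotype probe can be sketched as follows. The prompt template, the `generate` callable, and the group-to-attributes dictionary standing in for the "knowledge representation" are assumptions for illustration only, not the authors' method.

```python
from collections import defaultdict
from typing import Callable, Dict, List

def probe_stereotypes(generate: Callable[[str], str],
                      groups: List[str],
                      n_samples: int = 5) -> Dict[str, List[str]]:
    """Repeatedly prompt a model for free-text associations with each group
    and collect the completions into a simple group -> attributes mapping."""
    template = "Name one trait you associate with {group} people:"
    knowledge: Dict[str, List[str]] = defaultdict(list)
    for group in groups:
        for _ in range(n_samples):
            knowledge[group].append(generate(template.format(group=group)).strip())
    return dict(knowledge)

# `generate` would wrap any LLM completion API; a constant stub suffices here.
if __name__ == "__main__":
    print(probe_stereotypes(lambda p: "friendly", ["group A", "group B"], n_samples=2))
```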
Global explanations for discovering bias in data
Mikołajczyk, Agnieszka, Grochowski, Michał, Kwasigroch, Arkadiusz
In this paper, we propose attention-based, summarized post-hoc explanations for the detection and identification of bias in data. We introduce a global explanation and a step-by-step framework for detecting and testing bias, which we then evaluate with a proposed counterfactual approach: because removing unwanted bias is often a complicated and laborious task, we automatically insert it instead. We validate the method on a skin lesion dataset, where we identified and confirmed several of the possible bias-causing artifacts in dermoscopy images. In particular, the commonplace black frames in the training images strongly influence the Convolutional Neural Network's predictions: after we artificially added a black frame to all images, around 22% of them changed their prediction from benign to malignant. We show that bias detection is an important step toward building more robust models and discuss how to improve them.
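The counterfactual bias-insertion test lends itself to a short sketch: paint a black frame onto every image and count how many predictions flip. The frame width, the `model` interface (image in, class label out), and the dummy classifier are illustrative assumptions.

```python
import numpy as np

def add_black_frame(image: np.ndarray, width: int = 8) -> np.ndarray:
    """Copy an HxWxC image and paint a black border of the given width,
    mimicking the frame artifact found in dermoscopy datasets."""
    framed = image.copy()
    framed[:width, :, :] = 0
    framed[-width:, :, :] = 0
    framed[:, :width, :] = 0
    framed[:, -width:, :] = 0
    return framed

def flip_rate(model, images) -> float:
    """Fraction of images whose predicted class changes once a frame is
    inserted; `model` maps a single image to a class label."""
    flips = sum(model(img) != model(add_black_frame(img)) for img in images)
    return flips / len(images)

# Illustrative usage with a trivial stand-in classifier.
dummy_model = lambda img: int(img.mean() > 0.5)
imgs = [np.random.rand(64, 64, 3) for _ in range(10)]
print(f"Flip rate under frame insertion: {flip_rate(dummy_model, imgs):.2f}")
```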