AITopics | absolute bias

UDA: Unsupervised Debiasing Alignment for Pair-wise LLM-as-a-Judge

Zhang, Yang, Wang, Cunxiang, Wu, Lindong, Yu, Wenbo, Wang, Yidong, Bao, Guangsheng, Tang, Jie

arXiv.org Artificial Intelligence, Nov-18-2025

Pairwise evaluation of Large Language Models (LLMs) is a common paradigm, but it is prone to preference bias, where judges systematically favor certain outputs, such as their own. This bias leads to inconsistent and skewed rankings across different judges. To address this, we first empirically demonstrate significant and heterogeneous biases in cross-model evaluations. We then propose UDA (Unsupervised Debiasing Alignment), a framework that reduces inter-judge disagreement by dynamically adjusting the Elo rating system. For each pairwise comparison, a compact neural network learns to adaptively set the K-factor and refine win probabilities. Crucially, UDA operates in a fully unsupervised manner, guided solely by the objective of minimizing the dispersion among the Elo trajectories of all judges. This forces an alignment towards a collective consensus, which serves as an unsupervised proxy for a more stable and reproducible evaluation. In addition, we provide theoretical motivation demonstrating how alignment towards a consensus can reduce aggregate system bias. Experiments show that UDA significantly reduces the inter-judge rating standard deviation by up to 63.4% and improves the average correlation with human judgments by 24.7%. Notably, UDA elevates the performance of poorly performing judges to achieve parity with high-quality ones, fostering a more robust and reliable evaluation ecosystem. Code and data are available at https://anonymous.4open.science/r/62AB93CD-23B4.
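For readers unfamiliar with the Elo system that the abstract builds on, here is a minimal sketch of the standard update with a fixed K-factor, plus one possible way to measure inter-judge dispersion. This is not the authors' implementation: UDA sets the K-factor and refines the win probability with a learned network, whereas the fixed `k=32.0`, the 400-point scale, and the helper `judge_dispersion` are illustrative assumptions.

```python
from statistics import pstdev


def expected_score(r_a, r_b):
    """Standard Elo win probability of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, score_a, k=32.0):
    """Return updated (r_a, r_b) after one comparison.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    k is a fixed K-factor here (assumption); UDA would predict it per comparison.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def judge_dispersion(ratings_by_judge):
    """Hypothetical dispersion measure: for each rated model, take the
    population standard deviation of its rating across judges, then average.

    ratings_by_judge is a list of dicts mapping model name -> Elo rating,
    one dict per judge, all with the same keys.
    """
    models = ratings_by_judge[0].keys()
    per_model = [pstdev(j[m] for j in ratings_by_judge) for m in models]
    return sum(per_model) / len(per_model)


# Two equally rated models; A wins one comparison.
new_a, new_b = elo_update(1500.0, 1500.0, 1.0)
print(new_a, new_b)
```

With equal starting ratings the expected score is 0.5, so a win moves each rating by half the K-factor, and the sum of the two ratings is unchanged. A lower `judge_dispersion` value means the judges' ratings agree more closely.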

large language model, machine learning, natural language, (19 more...)

2508.09724

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Chess (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

BiasAsker: Measuring the Bias in Conversational AI System

Wan, Yuxuan, Wang, Wenxuan, He, Pinjia, Gu, Jiazhen, Bai, Haonan, Lyu, Michael

arXiv.org Artificial Intelligence, May-21-2023

Powered by advanced Artificial Intelligence (AI) techniques, conversational AI systems, such as ChatGPT and digital assistants like Siri, have been widely deployed in daily life. However, such systems may still produce content containing biases and stereotypes, causing potential social problems. Due to the data-driven, black-box nature of modern AI techniques, comprehensively identifying and measuring biases in conversational systems remains a challenging task. In particular, it is hard to generate inputs that comprehensively trigger potential bias, because data containing both social groups and biased properties is lacking. In addition, modern conversational systems can produce diverse responses (e.g., chatting and explanation), so existing bias detection methods based simply on sentiment and toxicity can hardly be adopted. In this paper, we propose BiasAsker, an automated framework to identify and measure social bias in conversational AI systems. To obtain social groups and biased properties, we construct a comprehensive social bias dataset, containing a total of 841 groups and 8,110 biased properties. Given the dataset, BiasAsker automatically generates questions and adopts a novel method based on existence measurement to identify two types of biases (i.e., absolute bias and related bias) in conversational systems. Extensive experiments on 8 commercial systems and 2 famous research models, such as ChatGPT and GPT-3, show that 32.83% of the questions generated by BiasAsker can trigger biased behaviors in these widely deployed conversational systems. All the code, data, and experimental results have been released to facilitate future research.

biasasker, machine learning, natural language, (20 more...)

2305.12434

Country:

North America > United States > California > San Francisco County > San Francisco (0.05)
Asia > China > Hong Kong (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
(4 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance

Okasa, Gabriel

arXiv.org Machine Learning, Jan-29-2022

In recent years there has been a growing interest in the estimation of causal effects using machine learning algorithms, particularly in the field of economics (Athey, 2018). The newly emerging synthesis of machine learning methods with causal inference has a large potential for a more comprehensive estimation of causal effects (Lechner, 2018). On the one hand, it enables a more flexible estimation of average effects which are of main interest in microeconometrics (Imbens & Wooldridge, 2009). On the other hand, it advances the estimation beyond the average effects and allows for a systematic analysis of effect heterogeneity (Athey & Imbens, 2017). Both of these aspects contribute to a better description of the causal mechanisms and thus to a possibly more efficient treatment allocation (Zhao, Zeng, Rush, & Kosorok, 2012; Kitagawa & Tetenov, 2018; Athey & Wager, 2021; Nie, Brunskill, & Wager, 2021). Hence, applied empirical researchers can greatly benefit from the use of machine learning methods, in applications ranging from the evaluation of public policies and business decisions to the design of personalized interventions (Andini, Ciani, de Blasio, D'Ignazio, & Salvestrini, 2018; Bansak et al., 2018). Machine learning estimators as such are, however, primarily designed for prediction problems and thus cannot be used directly for causal inference. Therefore, new approaches for the estimation of causal parameters using machine learning emerged (see Athey & Imbens, 2019, for an overview). In particular, the development of the so-called meta-learners has received considerable attention (see e.g.

estimation, sample size, treatment effect, (17 more...)

2201.12692

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Italy (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Government (0.65)
Health & Medicine > Epidemiology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Decision Making with Differential Privacy under a Fairness Lens

Fioretto, Ferdinando, Tran, Cuong, Van Hentenryck, Pascal

arXiv.org Artificial Intelligence, May-16-2021

Agencies, such as the U.S. Census Bureau, release data sets and statistics about groups of individuals that are used as input to a number of critical decision processes. To conform with privacy and confidentiality requirements, these agencies are often required to release privacy-preserving versions of the data. This paper studies the release of differentially private data sets and analyzes their impact on some critical resource allocation tasks under a fairness perspective. The paper shows that, when the decisions take as input differentially private data, the noise added to achieve privacy disproportionately impacts some groups over others. The paper analyzes the reasons for these disproportionate impacts and proposes guidelines to mitigate these effects. The proposed approaches are evaluated on critical decision problems that use differentially private census data.
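One simple way to see how privacy noise can affect groups unequally is sketched below. It assumes the Laplace mechanism with sensitivity 1, which the abstract does not specify, and it only compares relative errors of noisy counts; it does not reproduce the paper's allocation problems or its fairness analysis. All function names and parameter values are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two
    independent exponential draws with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_relative_error(true_count, epsilon, trials=2000, seed=0):
    """Average |noisy - true| / true over repeated private releases."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        noisy = private_count(true_count, epsilon, rng)
        total += abs(noisy - true_count) / true_count
    return total / trials


# Same privacy budget, very different population sizes (illustrative values).
small = mean_relative_error(100, epsilon=0.1)
large = mean_relative_error(100_000, epsilon=0.1)
print(f"small group: {small:.4f}, large group: {large:.6f}")
```

Because the noise scale depends only on the sensitivity and epsilon, not on the size of the group, the same absolute noise is a much larger fraction of a small count than of a large one.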

absolute bias, differential privacy, fairness, (15 more...)

2105.07513

Country:

North America > United States > Texas > Loving County (0.04)
North America > United States > New York (0.04)
North America > United States > Texas > Terrell County (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Education (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Missing Mass of Rank-2 Markov Chains

Chandra, Prafulla, Thangaraj, Andrew, Rajaraman, Nived

arXiv.org Machine Learning, Feb-6-2021

Estimation of missing mass with the popular Good-Turing (GT) estimator is well-understood in the case where samples are independent and identically distributed (iid). In this article, we consider the same problem when the samples come from a stationary Markov chain with a rank-2 transition matrix, which is one of the simplest extensions of the iid case. We develop an upper bound on the absolute bias of the GT estimator in terms of the spectral gap of the chain and a tail bound on the occupancy of states. Borrowing tail bounds from known concentration results for Markov chains, we evaluate the bound using other parameters of the chain. The analysis, supported by simulations, suggests that, for rank-2 irreducible chains, the GT estimator has bias and mean-squared error falling with number of samples at a rate that depends loosely on the connectivity of the states in the chain.
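As background for the estimator discussed above, the Good-Turing estimate of missing mass is the fraction of samples whose symbol was observed exactly once. The sketch below computes it for an arbitrary sequence; it is a generic illustration, not the paper's analysis, and the function name is an assumption.

```python
from collections import Counter


def good_turing_missing_mass(samples):
    """Good-Turing estimate of the total probability of unseen symbols:
    (number of symbols seen exactly once) / (number of samples)."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


# 'b' and 'c' each appear once among 4 samples.
print(good_turing_missing_mass(["a", "b", "a", "c"]))
```

The estimator itself does not depend on how the samples were generated; the paper's question is how its bias behaves when the samples come from a Markov chain rather than an iid source.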

estimator, inequality, markov chain, (15 more...)

2102.01938

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Asia > India > Tamil Nadu > Chennai (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Statistical Estimation of Malware Detection Metrics in the Absence of Ground Truth

Du, Pang, Sun, Zheyuan, Chen, Huashan, Cho, Jin-Hee, Xu, Shouhuai

arXiv.org Machine Learning, Sep-23-2018

The accurate measurement of security metrics is a critical research problem because an improper or inaccurate measurement process can ruin the usefulness of the metrics, no matter how well they are defined. This is a highly challenging problem particularly when the ground truth is unknown or noisy. In contrast to the well perceived importance of defining security metrics, the measurement of security metrics has been little understood in the literature. In this paper, we measure five malware detection metrics in the absence of ground truth, which is a realistic setting that imposes many technical challenges. The ultimate goal is to develop principled, automated methods for measuring these metrics at the maximum accuracy possible. The problem naturally calls for investigations into statistical estimators by casting the measurement problem as a statistical estimation problem. We propose statistical estimators for these five malware detection metrics. By investigating the statistical properties of these estimators, we are able to characterize when the estimators are accurate, and what adjustments can be made to improve them under what circumstances. We use synthetic data with known ground truth to validate these statistical estimators. Then, we employ these estimators to measure five metrics with respect to a large dataset collected from VirusTotal. We believe our study touches upon a vital problem that has not been paid due attention and will inspire many future investigations.

artificial intelligence, detector, machine learning, (18 more...)

1810.0726

Country: North America > United States (1.00)

Genre:

Research Report (1.00)
Personal > Honors (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.46)
