- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
- North America > United States > New York > New York County > New York City (0.07)
- North America > United States > Massachusetts > Middlesex County > Burlington (0.05)
- North America > United States > District of Columbia > Washington (0.05)
- (2 more...)
SM.1 Omitted proofs
SM.1.1 Proof of Proposition 1
Proposition 1. The function m_C(·) = 2^{C(M_ε(·))} : X → [1, c] satisfies all properties of a predictive multiplicity metric in Definition 1.
For clarity, we assume |M_ε(x_i)| = m. By the information inequality [1, Theorem 2.6.3], the mutual information I(M; Y) between the random variables M and Y (defined in Section 3) is non-negative, i.e., I(M; Y) ≥ 0. On the other hand, denote the c models in R(H, ε) whose output scores are the "vertices" of Δ_c by m_1, ..., m_c; then H(Y | M = m_k) = 0 for all k ∈ [c]. H(Y | M) is minimized to 0 by setting the weights p_M on those c models to 1/c and the rest to 0. Since this holds for the capacity-achieving P_M, which in turn is the maximum across input distributions, the converse result follows. The consequence of predictive multiplicity is that the same individual can be treated differently due to arbitrary and unjustified choices made during the training process (e.g., parameter initialization, random seed, dropout probability, etc.).
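As a worked restatement of the converse step (a sketch under the stated assumption that each of the c "vertex" models determines Y exactly; this spells out the bound rather than reproducing the paper's exact derivation):

```latex
% Sketch only: converse bound on the channel capacity C(M_eps(x_i)), assuming
% H(Y | M = m_k) = 0 for each of the c "vertex" models m_1, ..., m_c.
\begin{align*}
C(\mathcal{M}_\epsilon(x_i))
  = \max_{P_M} I(M; Y)
  = \max_{P_M} \big[ H(Y) - H(Y \mid M) \big]
  \le \max_{P_M} H(Y)
  \le \log_2 c .
\end{align*}
% The bound is attained by the uniform weights p_{m_k} = 1/c, which give
% H(Y \mid M) = 0 and H(Y) = \log_2 c, so m_C(x_i) = 2^{C(\cdot)} \le c,
% consistent with the stated range [1, c].
```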
- North America > United States (0.04)
- Asia > Middle East > Jordan (0.04)
ad991bbc381626a8e44dc5414aa136a8-Supplemental-Conference.pdf
Figure 1 shows how accuracy changes under different cutoff values. However, for gender classification on the CelebA dataset, the trade-off between λ_val and accuracy is not very clear; we suspect that in this scenario, focusing on hard samples does not harm the performance on easy samples and thus benefits the classifier. Figure 1 also shows how fairness (equalized odds) changes under different cutoff values. Suppose we have a large unlabeled training set of size N and a small labeled validation set {(x_j^val, y_j^val), 1 ≤ j ≤ M} with M ≪ N. In each training step, we sample a small mini-batch of size n (n < N) from the training set and perform random augmentation twice to obtain a subset {x_i, 1 ≤ i ≤ 2n}, and we update the contrastive encoder f with parameters θ. During validation, we freeze the contrastive encoder and train a downstream linear classifier g with parameters ω for the classification task.
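A minimal sketch of the training/validation loop described above, assuming a PyTorch-style setup; the encoder architecture, the SimCLR-style contrastive loss, and the names `augment`, `training_step`, and `validation_probe` are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative placeholders: a small encoder f(.; theta) and a linear probe
# g(.; omega) that is trained only on the labeled validation set.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
linear_probe = nn.Linear(128, 2)

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style normalized-temperature cross-entropy over 2n views."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)      # (2n, d)
    sim = z @ z.t() / temperature                            # (2n, 2n)
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

def training_step(batch, augment, opt):
    """One step: augment the mini-batch twice, update the encoder f."""
    v1, v2 = augment(batch), augment(batch)                  # two random views
    loss = nt_xent_loss(encoder(v1), encoder(v2))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def validation_probe(x_val, y_val, epochs=100):
    """Freeze f, train the downstream linear classifier g on the labeled set."""
    opt = torch.optim.Adam(linear_probe.parameters(), lr=1e-3)
    with torch.no_grad():
        feats = encoder(x_val)                                # frozen features
    for _ in range(epochs):
        loss = F.cross_entropy(linear_probe(feats), y_val)
        opt.zero_grad(); loss.backward(); opt.step()
    return (linear_probe(feats).argmax(1) == y_val).float().mean().item()
```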
- North America > Canada > Quebec > Montreal (0.05)
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
Facets of Disparate Impact: Evaluating Legally Consistent Bias in Machine Learning
Briscoe, Jarren, Gebremedhin, Assefaw
Leveraging current legal standards, we define bias through the lens of marginal benefits and objective testing with the novel metric "Objective Fairness Index". This index combines the contextual nuances of objective testing with metric stability, providing a legally consistent and reliable measure. Utilizing the Objective Fairness Index, we provide fresh insights into sensitive machine learning applications, such as COMPAS (recidivism prediction), highlighting the metric's practical and theoretical significance. The Objective Fairness Index allows one to differentiate between discriminatory tests and systemic disparities.
- North America > United States > Idaho > Ada County > Boise (0.06)
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > District of Columbia > Washington (0.04)
- (12 more...)
- Health & Medicine (1.00)
- Law > Labor & Employment Law (0.68)
- Government > Regional Government > North America Government > United States Government (0.68)
On the Origins of Sampling Bias: Implications on Fairness Measurement and Mitigation
Zhioua, Sami, Binkyte, Ruta, Ouni, Ayoub, Ktata, Farah Barika
Accurately measuring discrimination is crucial to faithfully assessing the fairness of trained machine learning (ML) models. Any bias in measuring discrimination leads to either amplification or underestimation of the existing disparity. Several sources of bias exist, and it is commonly assumed that bias resulting from machine learning is borne equally by different groups (e.g., females vs. males, whites vs. blacks, etc.). If, however, bias is borne differently by different groups, it may exacerbate discrimination against specific sub-populations. The term "sampling bias", in particular, is used inconsistently in the literature to describe bias due to the sampling procedure. In this paper, we attempt to disambiguate this term by introducing clearly defined variants of sampling bias, namely sample size bias (SSB) and underrepresentation bias (URB). Through an extensive set of experiments on benchmark datasets and using mainstream learning algorithms, we expose relevant observations in several model training scenarios. The observations are finally framed as actionable recommendations for practitioners.
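As a hedged illustration of the two variants named in the abstract (not the paper's experimental setup), the sketch below contrasts sample size bias, where the whole sample is small, with underrepresentation bias, where only one group is under-sampled, and shows how each distorts a simple statistical-parity estimate; the synthetic population, rates, and sample sizes are all hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical population: group A and group B with true positive-outcome
# rates of 0.6 and 0.4, i.e., a true statistical-parity gap of 0.2.
def make_population(n=100_000):
    group = rng.choice(["A", "B"], size=n)
    y = np.where(group == "A", rng.random(n) < 0.6, rng.random(n) < 0.4)
    return pd.DataFrame({"group": group, "y": y.astype(int)})

def parity_gap(df):
    rates = df.groupby("group")["y"].mean()
    return rates["A"] - rates["B"]

pop = make_population()

# Sample size bias (SSB): the whole sample is small, so the estimate is
# noisy for every group but not systematically skewed toward one of them.
ssb_sample = pop.sample(n=200, random_state=1)

# Underrepresentation bias (URB): group B is under-sampled, so its outcome
# rate is estimated from far fewer points than group A's.
urb_sample = pd.concat([
    pop[pop.group == "A"].sample(n=5_000, random_state=1),
    pop[pop.group == "B"].sample(n=100, random_state=1),
])

print("true gap:", round(parity_gap(pop), 3))
print("SSB gap :", round(parity_gap(ssb_sample), 3))
print("URB gap :", round(parity_gap(urb_sample), 3))
```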
- North America > United States (0.14)
- Africa > Middle East > Tunisia > Sousse Governorate > Sousse (0.04)
- Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
- (3 more...)
- Information Technology > Security & Privacy (0.45)
- Law (0.34)
- Education (0.34)
Exploring Equality: An Investigation into Custom Loss Functions for Fairness Definitions
This paper explores the complex tradeoffs between predictive accuracy and various fairness metrics such as equalized odds, disparate impact, and equal opportunity within COMPAS by building neural networks trained with custom loss functions optimized for specific fairness criteria. This paper creates the first fairness-driven implementation of the novel Group Accuracy Parity (GAP) framework, as theoretically proposed by Gupta et al. (2024), and applies it to COMPAS. To operationalize and accurately compare the fairness of COMPAS models optimized to differing fairness ideals, this paper develops and proposes a combinatory analytical procedure that incorporates Pareto fronts and multivariate analysis, leveraging data visualizations such as violin plots. This paper concludes that GAP achieves a better balance between fairness and accuracy than COMPAS's current nationwide implementation and alternative implementations of COMPAS optimized to more traditional fairness definitions. While this paper's algorithmic improvements to COMPAS significantly augment its fairness, external biases undermine the fairness of its implementation. Practices such as predictive policing and issues such as the lack of transparency regarding COMPAS's internal workings have contributed to the algorithm's historical injustice. In conjunction with developments in COMPAS's predictive methodology, legal and institutional changes must happen for COMPAS to be deployed justly.
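A minimal sketch of the kind of fairness-driven custom loss the abstract describes (not the authors' GAP implementation): standard cross-entropy plus a differentiable penalty for one fairness criterion, here a demographic-parity-style gap. The penalty weight `lam` and all tensor names are assumptions; equalized-odds or GAP-style variants would instead penalize gaps in per-group error rates or per-group accuracy.

```python
import torch
import torch.nn.functional as F

def fairness_penalized_loss(logits, labels, group, lam=1.0):
    """Cross-entropy plus a soft demographic-parity penalty.

    logits: (N, 2) model outputs; labels: (N,) in {0, 1};
    group:  (N,) binary protected attribute; lam: penalty weight.
    """
    ce = F.cross_entropy(logits, labels)
    # Soft positive-prediction rate per group: a differentiable surrogate
    # for the hard positive rate used by demographic parity.
    p_pos = torch.softmax(logits, dim=1)[:, 1]
    rate_0 = p_pos[group == 0].mean()
    rate_1 = p_pos[group == 1].mean()
    parity_gap = (rate_0 - rate_1).abs()
    return ce + lam * parity_gap
```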
- Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.49)
How Aligned are Generative Models to Humans in High-Stakes Decision-Making?
Tan, Sarah, Mallari, Keri, Adebayo, Julius, Gordo, Albert, Wells, Martin T., Inkpen, Kori
Large generative models (LMs) are increasingly being considered for high-stakes decision-making. This work considers how such models compare to humans and predictive AI models on a specific case of recidivism prediction. We combine three datasets -- COMPAS predictive AI risk scores, human recidivism judgements, and photos -- into a dataset on which we study the properties of several state-of-the-art, multimodal LMs. Beyond accuracy and bias, we focus on studying human-LM alignment on the task of recidivism prediction. We investigate whether these models can be steered towards human decisions, the impact of adding photos, and whether anti-discrimination prompting is effective. We find that LMs can be steered to outperform humans and COMPAS using in-context learning. We find anti-discrimination prompting to have unintended effects, causing some models to inhibit themselves and significantly reduce their number of positive predictions.
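A hedged sketch of what steering an LM toward human decisions via in-context learning might look like; the prompt wording, the anti-discrimination instruction, and the `query_model` callable are illustrative assumptions, not the authors' protocol.

```python
# Illustrative only: build a few-shot prompt from prior human recidivism
# judgements and ask a generative model for a yes/no prediction.
def build_prompt(examples, case, anti_discrimination=False):
    """examples: list of (case_description, human_label) pairs used as
    in-context demonstrations; case: the new case to score."""
    header = "You predict whether a defendant will re-offend within two years.\n"
    if anti_discrimination:
        header += "Do not let race, gender, or age influence your answer.\n"
    shots = "\n".join(
        f"Case: {desc}\nHuman decision: {label}" for desc, label in examples
    )
    return f"{header}\n{shots}\nCase: {case}\nHuman decision:"

# query_model stands in for whatever LM API is being evaluated.
def predict(query_model, examples, case, anti_discrimination=False):
    answer = query_model(build_prompt(examples, case, anti_discrimination))
    return answer.strip().lower().startswith("yes")
```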
- Research Report > Experimental Study (0.67)
- Research Report > New Finding (0.67)
- Law > Criminal Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Health & Medicine (0.93)
- Government (0.92)
Tell me the truth: A system to measure the trustworthiness of Large Language Models
Large Language Models (LLMs) have taken the front seat in most of the news since November 2022, when ChatGPT was introduced. After more than one year, one of the major reasons companies are resistant to adopting them is the limited confidence they have in the trustworthiness of those systems. In a study by Baymard (2023), ChatGPT-4 showed an 80.1% false-positive error rate in identifying usability issues on websites. A January 2024 study in JAMA Pediatrics found that ChatGPT has an accuracy rate of 17% when diagnosing pediatric medical cases (Barile et al., 2024). But then, what is "trust"? Trust is a relative, subjective condition that can change based on culture, domain, and individuals. And then, given a domain, how can the trustworthiness of a system be measured? In this paper, I present a systematic approach to measuring trustworthiness based on a predefined ground truth, represented as a knowledge graph of the domain. The approach is a process with humans in the loop to validate the representation of the domain and to fine-tune the system. Measuring trustworthiness would be essential for all entities operating in critical environments, such as healthcare, defense, and finance, but it would also be very relevant for all users of LLMs.
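A minimal sketch of the measurement idea described above (assumed structure, not the paper's system): score an LLM's answers against ground-truth triples from a domain knowledge graph. The triple format, the `ask_llm` callable, and the substring-matching rule are all illustrative simplifications.

```python
# Illustrative only: compare LLM answers against knowledge-graph ground truth.
# Each ground-truth item is a (subject, relation, expected_object) triple.
def trustworthiness_score(ask_llm, triples):
    """Fraction of knowledge-graph facts the model answers correctly.

    ask_llm: callable taking a question string and returning an answer string.
    triples: iterable of (subject, relation, expected_object) strings.
    """
    triples = list(triples)
    correct = 0
    for subject, relation, expected in triples:
        question = f"What is the {relation} of {subject}? Answer briefly."
        answer = ask_llm(question)
        # Naive matching rule; a real pipeline would need the domain experts
        # (the humans in the loop mentioned above) to validate borderline cases.
        if expected.lower() in answer.lower():
            correct += 1
    return correct / len(triples) if triples else 0.0
```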
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- North America > United States > Hawaii (0.04)
- Europe > Greece (0.04)