A Pseudocode of SLDG
Algorithm 1: Training and Inference for SLDG
Tab. 4 provides detailed statistics of the two datasets.
B.2 Clinical Predictive Tasks
We focus on two common clinical predictive tasks: readmission prediction and mortality prediction. For the eICU dataset, predictions are made 12 hours after admission. The overall prevalence for these tasks is 15% for readmission and 4% for mortality. For the MIMIC-IV dataset, predictions are made at the time of discharge.
Actionable Recourse for Subgroups (NeurIPS camera-ready)
In order to prove that the objective function in Eqn. 1 is non-normal, non-negative, non-monotone, and submodular, we need to show the following:
(i) at least one of the terms in the objective is non-normal;
(ii) all the terms in the objective are non-negative;
(iii) at least one of the terms in the objective is non-monotone;
(iv) all the terms in the objective are submodular.
Non-normality. Let us consider the term f
This metric can never be negative by definition.
This is clearly a diminishing-returns function, i.e., more additional instances in the data are covered
Before we prove Theorem 2.2, we first discuss how the objective in Eqn. 1 can be reduced to the objectives employed by prior approaches that provide instance-level recourse.
Subsuming other objective functions: The objective optimized by Wachter et al. is
Higher values of recourse accuracy are desired; lower values of mean fcost are desired.
Figure: Explanation vs. recourse accuracy for the COMPAS (left), Credit (middle), and Bail (right) datasets.
A.2.2 User Study
We manually constructed a two-level recourse set (as our black-box model) for the bail application. We deliberately ensured that this black box was biased against individuals who are not Caucasian. This two-level recourse set (black box) is shown in Figure 4. We used AR-LIME as a comparison point in our user study.
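The submodularity argument above rests on a diminishing-returns property: adding an element to a smaller set gains at least as much as adding it to a larger superset. A minimal sketch, assuming a toy coverage function where each hypothetical "rule" covers a set of instance ids (this is not the actual term from Eqn. 1, whose definition is truncated here), checks that property by brute force.

```python
from itertools import combinations

# Hypothetical toy data: each rule covers a set of instance ids.
# Illustrates diminishing returns only; not the objective term from Eqn. 1.
RULES = {
    "r1": {0, 1, 2},
    "r2": {2, 3},
    "r3": {3, 4, 5},
    "r4": {0, 5},
}


def coverage(selected):
    """Number of distinct instances covered by the selected rules."""
    covered = set()
    for rule in selected:
        covered |= RULES[rule]
    return len(covered)


def is_submodular(universe, f):
    """Brute-force check that f(A + x) - f(A) >= f(B + x) - f(B)
    for every A that is a subset of B and every x not in B."""
    items = sorted(universe)
    subsets = [frozenset(c) for k in range(len(items) + 1)
               for c in combinations(items, k)]
    for a in subsets:
        for b in subsets:
            if not a <= b:
                continue
            for x in universe - b:
                if f(a | {x}) - f(a) < f(b | {x}) - f(b):
                    return False
    return True


print(coverage({"r1", "r3"}))               # 6
print(is_submodular(set(RULES), coverage))  # True
```

Brute force is exponential in the number of rules, so it is only useful for sanity-checking small examples; the proof itself must argue the property in general.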
Leveraging Diffusion Perturbations for Measuring Fairness in Computer Vision
Lui, Nicholas; Chia, Bryan; Berrios, William; Ross, Candace; Kiela, Douwe
Computer vision models have been known to encode harmful biases, leading to the potentially unfair treatment of historically marginalized groups, such as people of color. However, there remains a lack of datasets balanced along demographic traits that can be used to evaluate the downstream fairness of these models. In this work, we demonstrate that diffusion models can be leveraged to create such a dataset. We first use a diffusion model to generate a large set of images depicting various occupations. Subsequently, each image is edited using inpainting to generate multiple variants, where each variant refers to a different perceived race. Using this dataset, we benchmark several vision-language models on a multi-class occupation classification task. We find that images generated with non-Caucasian labels have a significantly higher occupation misclassification rate than images generated with Caucasian labels, and that several misclassifications are suggestive of racial biases. We measure a model's downstream fairness by computing the standard deviation in the probability of predicting the true occupation label across the different perceived identity groups. Using this fairness metric, we find significant disparities between the evaluated vision-and-language models. We hope that our work demonstrates the potential value of diffusion methods for fairness evaluations.
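The fairness metric described above is the standard deviation, across perceived identity groups, of the probability assigned to the true occupation label. A minimal sketch, assuming scores are first averaged within each group and the population standard deviation is then taken across group means (the exact aggregation is an assumption, not stated in the abstract):

```python
import statistics
from collections import defaultdict


def occupation_fairness_std(records):
    """records: iterable of (group, p_true) pairs, where p_true is the model's
    probability for the true occupation label on one image variant.

    Assumed aggregation: mean p_true per group, then population standard
    deviation across groups. Lower values indicate smaller disparities."""
    by_group = defaultdict(list)
    for group, p_true in records:
        by_group[group].append(p_true)
    group_means = [statistics.fmean(ps) for ps in by_group.values()]
    return statistics.pstdev(group_means)


# Hypothetical scores for two perceived groups.
print(occupation_fairness_std([("A", 0.9), ("A", 0.7), ("B", 0.6), ("B", 0.6)]))
```

Using `pstdev` treats the evaluated groups as the full population of interest; `statistics.stdev` would be the choice if they were viewed as a sample.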
Evaluating Machine Perception of Indigeneity: An Analysis of ChatGPT's Perceptions of Indigenous Roles in Diverse Scenarios
Solorzano, Cecilia Delgado; Hernandez, Carlos Toxtli
Large Language Models (LLMs), like ChatGPT, are fundamentally tools trained on vast data, reflecting diverse societal impressions. This paper aims to investigate LLMs' self-perceived bias concerning indigeneity when simulating scenarios of indigenous people performing various roles. Through generating and analyzing multiple scenarios, this work offers a unique perspective on how technology perceives and potentially amplifies societal biases related to indigeneity in social computing. The findings offer insights into the broader implications of indigeneity in critical computing.
Equal Confusion Fairness: Measuring Group-Based Disparities in Automated Decision Systems
Gursoy, Furkan; Kakadiaris, Ioannis A.
As artificial intelligence plays an increasingly substantial role in decisions affecting humans and society, the accountability of automated decision systems has been receiving increasing attention from researchers and practitioners. Fairness, which is concerned with eliminating unjust treatment and discrimination against individuals or sensitive groups, is a critical aspect of accountability. Yet, for evaluating fairness, the literature offers a plethora of fairness metrics that employ different, often incompatible, perspectives and assumptions. This work focuses on group fairness. Most group fairness metrics require parity between selected statistics computed from the confusion matrices of different sensitive groups. Generalizing this intuition, this paper proposes a new equal confusion fairness test to check an automated decision system for fairness and a new confusion parity error to quantify the extent of any unfairness. To further analyze the source of potential unfairness, an appropriate post hoc analysis methodology is also presented. The usefulness of the test, metric, and post hoc analysis is demonstrated via a case study on the controversial case of COMPAS, an automated decision system employed in the US to assist judges with assessing recidivism risks. Overall, the methods and metrics provided here can be used to assess automated decision systems' fairness as part of a more extensive accountability assessment, such as those based on the system accountability benchmark.
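One natural way to test whether confusion matrices differ across sensitive groups is a chi-square test of independence on a table with one row per group and one column per confusion-matrix cell. The sketch below computes that statistic and its degrees of freedom in plain Python; treating this as the paper's exact test is an assumption, and the confusion parity error is not reproduced here because its formula is not given above.

```python
def equal_confusion_chi2(counts):
    """counts: mapping group -> (TP, FP, FN, TN).

    Returns (chi-square statistic, degrees of freedom) for independence
    between group membership and confusion-matrix cell."""
    # One row per non-empty group.
    rows = [list(cells) for cells in counts.values() if sum(cells) > 0]
    # Drop cells that are empty in every group (their expected count is 0).
    keep = [j for j in range(len(rows[0])) if any(r[j] for r in rows)]
    rows = [[r[j] for j in keep] for r in rows]
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(col) for col in zip(*rows)]
    grand = sum(row_tot)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, observed in enumerate(r):
            expected = row_tot[i] * col_tot[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(keep) - 1)
    return stat, dof


# Hypothetical counts for two groups.
stat, dof = equal_confusion_chi2({"A": (10, 10, 10, 10), "B": (20, 0, 0, 20)})
print(round(stat, 3), dof)  # 26.667 3
```

With 3 degrees of freedom the 5% critical value is about 7.81, so this hypothetical pair of groups would fail an equal-confusion test at that level.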
An information-theoretic learning model based on importance sampling
Zhang, Jiangshe; Ji, Lizhen; Gao, Fei; Li, Mengyao
A crucial assumption underlying most current theory of machine learning is that the training distribution is identical to the test distribution. However, this assumption may not hold in some real-world applications. In this paper, we develop a learning model based on principles of information theory by minimizing the worst-case loss at prescribed levels of uncertainty. We reformulate the empirical estimation of the risk functional and the distribution deviation constraint based on the importance sampling method. The objective of the proposed approach is to minimize the loss under maximum degradation; hence the resulting problem is a minimax problem, which can be converted to an unconstrained minimization problem using the Lagrange method with the Lagrange multiplier $T$. We show that minimizing the objective function under a logarithmic transformation is equivalent to minimizing the p-norm loss with $p=\frac{1}{T}$. We applied the proposed model to the face verification task on the Racial Faces in the Wild datasets and showed that it performs better under large distribution deviations.
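The importance sampling step mentioned above estimates a risk under one distribution using samples drawn from another, by reweighting each loss with the density ratio. A minimal sketch of the standard estimator, assuming the per-sample densities under the training distribution (`p_train`) and a target distribution (`q_test`) are known; these names are hypothetical, and the paper's exact reformulation is not shown here.

```python
def importance_weighted_risk(losses, p_train, q_test, self_normalize=False):
    """Estimate E_q[loss] from samples drawn under p, using w_i = q(x_i) / p(x_i).

    losses:  per-sample losses for samples drawn from the training distribution p
    p_train: density of each sample under p
    q_test:  density of each sample under the target distribution q
    """
    weights = [q / p for p, q in zip(p_train, q_test)]
    weighted_sum = sum(w * loss for w, loss in zip(weights, losses))
    if self_normalize:
        # Self-normalized variant: divides by the total weight instead of n.
        return weighted_sum / sum(weights)
    return weighted_sum / len(losses)


print(importance_weighted_risk([1, 2, 3], [0.5, 0.25, 0.25], [0.25, 0.25, 0.5]))
```

The plain estimator is unbiased when the densities are exact; the self-normalized variant trades a small bias for lower variance when the weights vary widely.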
An Examination of Fairness of AI Models for Deepfake Detection
Recent studies have demonstrated that deep learning models can discriminate based on protected classes like race and gender. In this work, we evaluate bias present in deepfake datasets and detection models across protected subgroups. Using facial datasets balanced by race and gender, we examine three popular deepfake detectors and find large disparities in predictive performance across races, with up to a 10.7% difference in error rate between subgroups. A closer look reveals that the widely used FaceForensics++ dataset is overwhelmingly composed of Caucasian subjects, the majority of them female. Our investigation of the racial distribution of deepfakes reveals that the methods used to create deepfakes as positive training signals tend to produce "irregular" faces when a person's face is swapped onto a person of a different race or gender. This causes detectors to learn spurious correlations between the foreground faces and fakeness. Moreover, when detectors are trained with the Blended Image (BI) dataset from Face X-Rays, we find that those detectors develop systematic discrimination towards certain racial subgroups, primarily female Asians.
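The disparity reported above compares error rates between subgroups. A minimal sketch, assuming the disparity is measured as the largest absolute gap between per-subgroup error rates (the abstract does not specify the aggregation), with hypothetical labels and subgroup names:

```python
from collections import defaultdict


def subgroup_error_rates(y_true, y_pred, groups):
    """Fraction of misclassified samples within each subgroup."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        errors[g] += int(t != p)
    return {g: errors[g] / totals[g] for g in totals}


def max_error_rate_gap(rates):
    """Assumed disparity measure: highest minus lowest subgroup error rate."""
    return max(rates.values()) - min(rates.values())


# Hypothetical real (0) / fake (1) labels for two subgroups.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
rates = subgroup_error_rates(y_true, y_pred, groups)
print(rates, max_error_rate_gap(rates))  # {'A': 0.25, 'B': 0.5} 0.25
```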
Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP
Chen, John; Berlot-Atwell, Ian; Hossain, Safwan; Wang, Xindi; Rudzicz, Frank
Clinical machine learning is increasingly multimodal, with data collected both in structured tabular formats and in unstructured forms such as free text. We propose a novel task of exploring fairness on a multimodal clinical dataset, adopting equalized odds for the downstream medical prediction tasks. To this end, we investigate a modality-agnostic fairness algorithm (equalized odds post-processing) and compare it to a text-specific fairness algorithm: debiased clinical word embeddings. Although debiased word embeddings do not explicitly address equalized odds of protected groups, we show that a text-specific approach to fairness may simultaneously achieve a good balance of performance and classical notions of fairness. We hope that our paper inspires future contributions at the critical intersection of clinical NLP and fairness. The full source code is available here: https://github.com/johntiger1/multimodal_fairness
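Equalized odds asks that true-positive and false-positive rates be equal across protected groups. A minimal sketch of how the violation can be measured on binary predictions, with hypothetical data; it only computes the gaps and does not implement the post-processing algorithm or the repository's code.

```python
from collections import defaultdict


def equalized_odds_gaps(y_true, y_pred, groups):
    """Return (TPR gap, FPR gap): the largest between-group difference in
    true-positive rate and false-positive rate. Both are 0 under equalized odds."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if t == 1:
            c["tp" if p == 1 else "fn"] += 1
        else:
            c["fp" if p == 1 else "tn"] += 1
    tprs, fprs = [], []
    for c in counts.values():
        if c["tp"] + c["fn"]:  # group has positives
            tprs.append(c["tp"] / (c["tp"] + c["fn"]))
        if c["fp"] + c["tn"]:  # group has negatives
            fprs.append(c["fp"] / (c["fp"] + c["tn"]))
    tpr_gap = max(tprs) - min(tprs) if tprs else 0.0
    fpr_gap = max(fprs) - min(fprs) if fprs else 0.0
    return tpr_gap, fpr_gap


# Hypothetical outcomes for two protected groups.
print(equalized_odds_gaps([1, 1, 0, 0, 1, 1, 0, 0],
                          [1, 1, 0, 0, 1, 0, 1, 0],
                          ["A"] * 4 + ["B"] * 4))  # (0.5, 0.5)
```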