Collaborating Authors

 Morgenstern, Jamie


Evaluating Fairness of Machine Learning Models Under Uncertain and Incomplete Information

arXiv.org Machine Learning

Training and evaluation of fair classifiers is a challenging problem. This is partly because most fairness metrics of interest depend on both the sensitive attribute information and the label information of the data points. In many scenarios it is not possible to collect large datasets with such information. A commonly used alternative is to separately train an attribute classifier on data with sensitive attribute information, and then use it later in the ML pipeline to evaluate the bias of a given classifier. While such decoupling helps alleviate the problem of demographic scarcity, it raises several natural questions: how should the attribute classifier be trained, and how should one use a given attribute classifier for accurate bias estimation? In this work we study these questions from both theoretical and empirical perspectives. We first experimentally demonstrate that the test accuracy of the attribute classifier is not always correlated with its effectiveness in bias estimation for a downstream model. To further investigate this phenomenon, we analyze an idealized theoretical model and characterize the structure of the optimal classifier. Our analysis has surprising and counter-intuitive implications: in certain regimes one might want to distribute the error of the attribute classifier as unevenly as possible among the different subgroups. Based on our analysis we develop heuristics for both training and using attribute classifiers for bias estimation in the data-scarce regime. We empirically demonstrate the effectiveness of our approach on real and simulated data.
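
To make the decoupled pipeline concrete, here is a minimal sketch of estimating the bias of a fixed classifier twice: once with the true sensitive attribute and once with the output of an attribute classifier. The choice of metric (the gap in true positive rates between two groups), the function name `tpr_gap`, and the toy data are all assumptions made for illustration; the abstract does not specify them.

```python
def tpr_gap(y_true, y_pred, group):
    """Absolute difference in true positive rate between groups 0 and 1."""
    rates = []
    for g in (0, 1):
        # Predictions on the positive examples assigned to group g.
        preds = [p for t, p, a in zip(y_true, y_pred, group) if a == g and t == 1]
        if not preds:
            raise ValueError(f"group {g} has no positive examples")
        rates.append(sum(preds) / len(preds))
    return abs(rates[0] - rates[1])


# Toy data, made up for illustration only.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 1]
true_group = [0, 0, 0, 1, 1, 1, 0, 1]
# Output of a hypothetical attribute classifier that mislabels the third point.
proxy_group = [0, 0, 1, 1, 1, 1, 0, 1]

true_bias = tpr_gap(y_true, y_pred, true_group)        # uses the real attribute
estimated_bias = tpr_gap(y_true, y_pred, proxy_group)  # uses the predicted attribute
```

In this toy case a single attribute error moves the estimate from 2/3 to 1/2, which illustrates why the way the attribute classifier's errors are distributed can matter more than how many errors it makes.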


Adaptive Sampling to Reduce Disparate Performance

arXiv.org Machine Learning

Existing methods for reducing disparate performance of a classifier across different demographic groups assume that one has access to a large data set, thereby focusing on the algorithmic aspect of optimizing overall performance subject to additional constraints. However, poor data collection and imbalanced data sets can severely affect the quality of these methods. In this work, we consider a setting where data collection and optimization are performed simultaneously. In such a scenario, a natural strategy to mitigate the performance difference of the classifier is to provide additional training data drawn from the demographic groups that are worse off. In this paper, we propose to consistently follow this strategy throughout the whole training process and to guide the resulting classifier towards equal performance on the different groups by adaptively sampling each data point from the group that is currently disadvantaged. We provide a rigorous theoretical analysis of our approach in a simplified one-dimensional setting and an extensive experimental evaluation on numerous real-world data sets, including a case study on the data collected during the Flint water crisis.
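
The strategy described above can be sketched as a simple loop. This is an outline under stated assumptions, not the paper's procedure: the callables `train` and `group_error`, the `pools` structure, and retraining from scratch in every round are hypothetical choices made to keep the example short.

```python
import random
from collections import Counter


def adaptive_sampling(pools, train, group_error, rounds, seed=0):
    """Grow a training set by always drawing from the currently worse-off group.

    pools:       dict mapping group name -> list of examples to draw from
    train:       callable(list of examples) -> model  (hypothetical interface)
    group_error: callable(model, group name) -> current error on that group
    """
    rng = random.Random(seed)
    # Start with one example per group so that a first model can be trained.
    data = [rng.choice(pool) for pool in pools.values()]
    model = train(data)
    history = []
    for _ in range(rounds):
        # The group with the highest current error is the disadvantaged one.
        worst = max(pools, key=lambda g: group_error(model, g))
        data.append(rng.choice(pools[worst]))
        history.append(worst)
        model = train(data)  # retraining from scratch keeps the sketch simple
    return model, history


# Toy stand-ins: the "model" counts examples per group, and a group's
# error shrinks as more of its examples are seen.
def toy_train(data):
    return Counter(tag for tag, _ in data)


def toy_error(model, group):
    return 1.0 / model[group]


pools = {"a": [("a", 1)], "b": [("b", 1)]}
model, history = adaptive_sampling(pools, toy_train, toy_error, rounds=4)
```

With these toy stand-ins the loop alternates between the two groups, because each draw makes the sampled group the better-off one for the next round.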


A Notion of Individual Fairness for Clustering

arXiv.org Machine Learning

A common distinction in fair machine learning, in particular in fair classification, is between group fairness and individual fairness. In the context of clustering, group fairness has been studied extensively in recent years; however, individual fairness for clustering has hardly been explored. In this paper, we propose a natural notion of individual fairness for clustering. Our notion asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster. We study several questions related to our proposed notion of individual fairness. On the negative side, we show that deciding whether a given data set allows for such an individually fair clustering in general is NP-hard. On the positive side, for the special case of a data set lying on the real line, we propose an efficient dynamic programming approach to find an individually fair clustering. For general data sets, we investigate heuristics aimed at minimizing the number of individual fairness violations and compare them to standard clustering approaches on real data sets.
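
The notion above can be checked directly for a given clustering. The following sketch lists the points that violate it. Two details are assumptions rather than statements from the abstract: the average over a point's own cluster excludes the point itself, and a point in a singleton cluster is treated as having own-cluster average 0. The function name `fairness_violations` is hypothetical.

```python
import math


def fairness_violations(points, labels):
    """Return indices of points that are, on average, closer to some other
    cluster than to their own cluster."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)

    violations = []
    for i, (p, c) in enumerate(zip(points, labels)):
        own = clusters[c]
        # Average distance to the other members of the point's own cluster
        # (the point's distance to itself contributes 0 to the sum).
        if len(own) > 1:
            own_avg = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        else:
            own_avg = 0.0
        for other, members in clusters.items():
            if other == c:
                continue
            other_avg = sum(math.dist(p, q) for q in members) / len(members)
            if own_avg > other_avg:
                violations.append(i)
                break
    return violations


# Points on the real line, written as 1-tuples so math.dist applies.
points = [(0,), (1,), (2,), (10,), (11,)]
fair = fairness_violations(points, [0, 0, 0, 1, 1])
unfair = fairness_violations(points, [0, 0, 1, 1, 1])
```

In the second clustering the point at 2 is grouped with 10 and 11 although it is much closer on average to 0 and 1, so it is reported as a violation. Counting such violations is the quantity the heuristics mentioned above aim to minimize.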


Effectiveness of Equalized Odds for Fair Classification under Imperfect Group Information

arXiv.org Machine Learning

Most approaches for ensuring or improving a model's fairness with respect to a protected attribute (such as race or gender) assume access to the true value of the protected attribute for every data point. In many scenarios, however, perfect knowledge of the protected attribute is unrealistic. In this paper, we ask to what extent fairness interventions can be effective even with imperfect information about the protected attribute. In particular, we study this question in the context of the prominent equalized odds method of Hardt et al. (2016). We claim that as long as the perturbation of the protected attribute is somewhat moderate, one should still run equalized odds if one would run it knowing the true protected attribute: the bias of the classifier that we obtain using the perturbed attribute is smaller than the bias of the original classifier, and its error is not larger than the error of the equalized odds classifier obtained when working with the true protected attribute.


FairVis: Visual Analytics for Discovering Intersectional Bias in Machine Learning

arXiv.org Machine Learning

The growing capability and accessibility of machine learning has led to its application to many real-world domains and data about people. Despite the benefits algorithmic systems may bring, models can reflect, inject, or exacerbate implicit and explicit societal biases into their outputs, disadvantaging certain demographic subgroups. Discovering which biases a machine learning model has introduced is a great challenge, due to the numerous definitions of fairness and the large number of potentially impacted subgroups. We present FairVis, a mixed-initiative visual analytics system that integrates a novel subgroup discovery technique for users to audit the fairness of machine learning models. Through FairVis, users can apply domain knowledge to generate and investigate known subgroups, and explore suggested and similar subgroups. FairVis' coordinated views enable users to explore a high-level overview of subgroup performance and subsequently drill down into detailed investigation of specific subgroups. We show how FairVis helps to discover biases in two real datasets used in predicting income and recidivism. As a visual analytics system devoted to discovering bias in machine learning, FairVis demonstrates how interactive visualization may help data scientists and the general public in understanding and creating more equitable algorithmic systems.


Predictive Inequity in Object Detection

arXiv.org Machine Learning

In this work, we investigate whether state-of-the-art object detection systems have equitable predictive performance on pedestrians with different skin tones. This work is motivated by many recent examples of ML and vision systems displaying higher error rates for certain demographic groups than others. We annotate an existing large-scale dataset that contains pedestrians, BDD100K, with Fitzpatrick skin tones in the ranges [1-3] or [4-6]. We then provide an in-depth comparative analysis of performance between these two skin tone groupings, finding that neither time of day nor occlusion explains the performance disparity between them, suggesting that this disparity is not merely the result of pedestrians in the 4-6 range appearing in scenes that are more difficult for detection. We investigate to what extent time of day, occlusion, and reweighting the supervised loss during training affect this predictive bias.


Fair k-Center Clustering for Data Summarization

arXiv.org Machine Learning

In data summarization we want to choose k prototypes in order to summarize a data set. We study a setting where the data set comprises several demographic groups and we are restricted to choose k_i prototypes belonging to group i. A common approach to the problem without the fairness constraint is to optimize a centroid-based clustering objective such as k-center. A natural extension then is to incorporate the fairness constraint into the clustering objective. Existing algorithms for doing so run in time super-quadratic in the size of the data set. This is in contrast to the standard k-center objective that can be approximately optimized in linear time. In this paper, we resolve this gap by providing a simple approximation algorithm for the k-center problem under the fairness constraint with running time linear in the size of the data set and k. If the number of demographic groups is small, the approximation guarantee of our algorithm only incurs a constant-factor overhead. We demonstrate the applicability of our algorithm on both synthetic and real data sets.
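
For orientation, here is a sketch of the setting rather than of the paper's algorithm. It shows the well-known greedy farthest-first heuristic for the standard k-center objective without the fairness constraint, plus a helper that checks whether a set of chosen prototypes meets the per-group quotas k_i. The function names, the toy points, and the group labels are assumptions made for illustration.

```python
import math
from collections import Counter


def greedy_k_center(points, k, first=0):
    """Farthest-first traversal for unconstrained k-center.

    Returns the indices of the chosen centers and the resulting radius,
    i.e. the largest distance from any point to its nearest center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [first]
    # nearest[i] is the distance from point i to its closest chosen center.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        centers.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return centers, max(nearest)


def satisfies_quota(centers, groups, quota):
    """Check that exactly quota[g] centers belong to each group g."""
    counts = Counter(groups[c] for c in centers)
    return (sum(quota.values()) == len(centers)
            and all(counts[g] == q for g, q in quota.items()))


points = [(0,), (1,), (5,), (6,), (20,)]
groups = ["a", "a", "b", "b", "b"]
centers, radius = greedy_k_center(points, k=2)
meets_quota = satisfies_quota(centers, groups, {"a": 1, "b": 1})
```

Each round of the greedy heuristic makes one pass over the data, so its running time grows linearly in the number of points and in k. The heuristic itself ignores group membership, so whether its output meets a given quota is a matter of luck, which is the gap that a fairness-aware algorithm has to close.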


Guarantees for Spectral Clustering with Fairness Constraints

arXiv.org Machine Learning

Given the widespread popularity of spectral clustering (SC) for partitioning graph data, we study a version of constrained SC in which we try to incorporate the fairness notion proposed by Chierichetti et al. (2017). According to this notion, a clustering is fair if every demographic group is approximately proportionally represented in each cluster. To this end, we develop variants of both normalized and unnormalized constrained SC and show that they help find fairer clusterings on both synthetic and real data. We also provide a rigorous theoretical analysis of our algorithms. While there have been efforts to incorporate various constraints into the SC framework, theoretically analyzing them is a challenging problem. We overcome this by proposing a natural variant of the stochastic block model where h groups have strong inter-group connectivity, but also exhibit a "natural" clustering structure which is fair. We prove that our algorithms can recover this fair clustering with high probability.
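
One simple way to quantify "approximately proportionally represented" is to compare each group's share inside every cluster with its share in the whole data set. The measure below, the largest such deviation, is an assumption chosen for illustration and is not necessarily the balance measure used in the paper or by Chierichetti et al. (2017). The function name `max_proportion_gap` is hypothetical.

```python
from collections import Counter


def max_proportion_gap(labels, groups):
    """Largest absolute difference between a group's share within a cluster
    and its share in the full data set (0.0 means perfectly proportional)."""
    n = len(labels)
    overall = Counter(groups)          # group -> count in the whole data set
    cluster_sizes = Counter(labels)    # cluster -> size
    joint = Counter(zip(labels, groups))
    gap = 0.0
    for c, size in cluster_sizes.items():
        for g, total in overall.items():
            gap = max(gap, abs(joint[(c, g)] / size - total / n))
    return gap


groups = ["a", "b", "a", "b"]
balanced_gap = max_proportion_gap([0, 1, 1, 0], groups)    # each cluster has one of each
segregated_gap = max_proportion_gap([0, 1, 0, 1], groups)  # clusters coincide with groups
```

A value of 0 corresponds to a clustering that is fair in the above sense, while a clustering that coincides with the demographic groups attains the largest gap for this toy data.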


The Price of Fair PCA: One Extra Dimension

arXiv.org Machine Learning

We investigate whether the standard dimensionality reduction technique of PCA inadvertently produces data representations with different fidelity for two different populations. We show that on several real-world data sets, PCA has higher reconstruction error on population A than on population B (for example, women versus men, or lower- versus higher-educated individuals). This can happen even when the data set has a similar number of samples from A and B. This motivates our study of dimensionality reduction techniques that maintain similar fidelity for A and B. We define the notion of Fair PCA and give a polynomial-time algorithm for finding a low-dimensional representation of the data that is nearly optimal with respect to this measure. Finally, we show on real-world data sets that our algorithm can be used to efficiently generate a fair low-dimensional representation of the data.
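
The phenomenon that motivates this work can be reproduced on a toy example. The sketch below fits ordinary PCA (not Fair PCA) to two-dimensional data, projects onto the single top principal direction, and reports the average reconstruction error separately for each population. The restriction to two dimensions, the closed-form principal axis, the function names, and the data are all assumptions made for illustration.

```python
import math


def top_direction_2d(points):
    """Mean and top principal direction of 2-D points (closed form)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the major axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def avg_reconstruction_error(points, mean, direction):
    """Average squared distance between points and their 1-D reconstructions."""
    (mx, my), (ux, uy) = mean, direction
    total = 0.0
    for x, y in points:
        cx, cy = x - mx, y - my
        t = cx * ux + cy * uy              # coordinate along the direction
        rx, ry = cx - t * ux, cy - t * uy  # residual left after projection
        total += rx * rx + ry * ry
    return total / len(points)


# Two populations of equal size whose variation lies along different axes.
pop_a = [(0.0, 1.0), (0.0, -1.0)]
pop_b = [(-3.0, 0.0), (3.0, 0.0)]
mean, direction = top_direction_2d(pop_a + pop_b)
err_a = avg_reconstruction_error(pop_a, mean, direction)
err_b = avg_reconstruction_error(pop_b, mean, direction)
```

Here population B dominates the overall variance, so the principal direction follows B and reconstructs it perfectly, while population A loses all of its variation even though both populations contribute the same number of samples.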


Datasheets for Datasets

arXiv.org Artificial Intelligence

Currently there is no standard way to identify how a dataset was created, and what characteristics, motivations, and potential skews it represents. To begin to address this issue, we propose the concept of a datasheet for datasets, a short document to accompany public datasets, commercial APIs, and pretrained models. The goal of this proposal is to enable better communication between dataset creators and users, and help the AI community move toward greater transparency and accountability. By analogy, in computer hardware, it has become industry standard to accompany everything from the simplest components (e.g., resistors), to the most complex microprocessor chips, with datasheets detailing standard operating characteristics, test results, recommended usage, and other information. We outline some of the questions a datasheet for datasets should answer. These questions focus on when, where, and how the training data was gathered, its recommended use cases, and, in the case of human-centric datasets, information regarding the subjects' demographics and consent as applicable. We develop prototypes of datasheets for two well-known datasets: Labeled Faces in the Wild and the Pang & Lee Polarity Dataset.