A principled approach for data bias mitigation
How do you know if your data is fair? And if it isn't, what can you do about it? Machine learning models are increasingly used to make high-stakes decisions, from predicting who gets a loan to estimating the likelihood that someone will reoffend. But these models are only as good as the data they learn from [Shahbazi 2023]. If the training data is biased, the model's decisions will likely be biased too [Hort 2024, Pagano 2023].
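A concrete starting point is to measure group disparities directly in the training data, before any model is fit. Below is a minimal sketch of one such check, the demographic parity difference: the gap in positive-outcome rates between groups. The column names `approved` and `group` and the toy data are illustrative, not from any dataset discussed here.

```python
import pandas as pd

def demographic_parity_difference(df: pd.DataFrame,
                                  outcome: str,
                                  protected: str) -> float:
    """Gap between the highest and lowest positive-outcome rates
    across the groups defined by the `protected` column."""
    rates = df.groupby(protected)[outcome].mean()
    return float(rates.max() - rates.min())

# Illustrative usage with made-up loan data.
data = pd.DataFrame({
    "group":    ["a", "a", "a", "b", "b", "b"],
    "approved": [1, 1, 0, 1, 0, 0],
})
print(demographic_parity_difference(data, "approved", "group"))  # ~0.333
```

A value of zero means every group receives the positive outcome at the same rate; larger values flag a disparity worth investigating before training.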
Optimized Pre-Processing for Discrimination Prevention
Non-discrimination is a recognized objective in algorithmic decision making. In this paper, we introduce a novel probabilistic formulation of data pre-processing for reducing discrimination. We propose a convex optimization for learning a data transformation with three goals: controlling discrimination, limiting distortion in individual data samples, and preserving utility. We characterize the impact of limited sample size in accomplishing this objective. Two instances of the proposed optimization are applied to datasets, including one on real-world criminal recidivism. Results show that discrimination can be greatly reduced at a small cost in classification accuracy.
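In outline, and with notation adapted here rather than taken verbatim from the paper, the pre-processing step can be read as learning a randomized mapping from the original pairs (X, Y) and protected attribute D to transformed pairs (X̂, Ŷ), chosen to solve a constrained problem of roughly this shape:

```latex
% Schematic of the pre-processing optimization (notation adapted):
% learn a randomized mapping p_{\hat X, \hat Y \mid X, Y, D} solving
\begin{align*}
\min_{p_{\hat X,\hat Y \mid X,Y,D}} \quad
  & \Delta\big(p_{\hat X,\hat Y},\; p_{X,Y}\big)
  && \text{(preserve utility)} \\
\text{s.t.} \quad
  & \left|\frac{p_{\hat Y \mid D}(y \mid d)}{p_{\hat Y \mid D}(y \mid d')} - 1\right| \le \epsilon
  \quad \forall\, y, d, d'
  && \text{(control discrimination)} \\
  & \mathbb{E}\!\left[\delta\big((x,y),(\hat X,\hat Y)\big) \,\middle|\, X{=}x, Y{=}y, D{=}d\right] \le c
  && \text{(limit per-sample distortion)}
\end{align*}
```

Each of the three stated goals maps onto one piece of the program: the objective keeps the transformed distribution close to the original, the first constraint bounds how much the transformed outcome can depend on group membership, and the second bounds how far any individual record may be moved.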
Equality of Opportunity in Supervised Learning
We propose a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features. Assuming data about the predictor, target, and membership in the protected group are available, we show how to optimally adjust any learned predictor so as to remove discrimination according to our definition. Our framework also improves incentives by shifting the cost of poor classification from disadvantaged groups to the decision maker, who can respond by improving the classification accuracy.
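As a simplified sketch of that post-hoc adjustment: Hardt et al. derive the optimal adjustment from group-wise ROC curves, possibly with randomized thresholds; the version below just picks a deterministic per-group score threshold matching a target true-positive rate, which enforces the equal-opportunity condition P(Ŷ=1 | Y=1, A=a) ≈ const across groups. Names and the target value are illustrative.

```python
import numpy as np

def equal_opportunity_thresholds(scores, labels, groups, target_tpr=0.8):
    """For each group, pick the score threshold whose true-positive rate
    (computed on Y=1 examples) is closest to `target_tpr`, so that
    P(Yhat = 1 | Y = 1, A = a) is approximately equal across groups."""
    thresholds = {}
    for g in np.unique(groups):
        pos = np.sort(scores[(groups == g) & (labels == 1)])
        # Thresholding at pos[i] (inclusive) accepts the fraction of
        # positive examples with score >= pos[i], i.e. TPR = 1 - i/len.
        tprs = 1.0 - np.arange(len(pos)) / len(pos)
        thresholds[g] = pos[np.argmin(np.abs(tprs - target_tpr))]
    return thresholds

# Illustrative usage with synthetic scores.
rng = np.random.default_rng(0)
scores = rng.random(1000)
labels = (scores + 0.2 * rng.standard_normal(1000) > 0.5).astype(int)
groups = rng.integers(0, 2, size=1000)
print(equal_opportunity_thresholds(scores, labels, groups))
```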
Equality of Opportunity in Classification: A Causal Approach
The Equalized Odds (for short, EO) is one of the most popular measures of discrimination used in the supervised learning setting. It ascertains fairness through the balance of the misclassification rates (false positive and false negative) across the protected groups -- e.g., in the context of law enforcement, an African-American defendant who would not commit a future crime will have an equal opportunity of being released, compared to a non-recidivating Caucasian defendant. Despite this noble goal, it has been acknowledged in the literature that statistical tests based on the EO are oblivious to the underlying causal mechanisms that generated the disparity in the first place (Hardt et al. 2016). This leads to a critical disconnect between statistical measures readable from the data and the meaning of discrimination in the legal system, where compelling evidence that the observed disparity is tied to a specific causal process deemed unfair by society is required to characterize discrimination. The goal of this paper is to develop a principled approach to connect the statistical disparities characterized by the EO and the underlying, elusive, and frequently unobserved causal mechanisms that generated such inequality. We start by introducing a new family of counterfactual measures that allows one to explain the misclassification disparities in terms of the underlying mechanisms in an arbitrary, non-parametric structural causal model. This will, in turn, allow legal and data analysts to interpret currently deployed classifiers through a causal lens, linking the statistical disparities found in the data to the corresponding causal processes. Leveraging the new family of counterfactual measures, we develop a learning procedure to construct a classifier that is statistically efficient, interpretable, and compatible with the basic human intuition of fairness. We demonstrate our results through experiments on both real (COMPAS) and synthetic datasets.
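For reference, the EO criterion being balanced here can be written as follows (notation adapted):

```latex
% Equalized odds: a predictor \hat Y satisfies EO with respect to
% protected attribute A and outcome Y if, for y \in \{0, 1\},
P(\hat Y = 1 \mid Y = y,\; A = a) \;=\; P(\hat Y = 1 \mid Y = y,\; A = a')
\quad \text{for all groups } a, a'.
```

The counterfactual family introduced in the paper can then be read, roughly, as decomposing any observed violation of this balance into components attributable to direct, indirect (mediated), and spurious (confounded) pathways in the underlying structural causal model.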
Excerpt from supplemental material (fe248e22b241ae5a9adf11493c8c12bc-Supplemental-Conference.pdf):
In practice, however, the runtime is much smaller because of pruning. With this change in place, the solver can search for incomplete trees. KamPost differs from CART by using a different splitting criterion and by its post-relabelling of the leaf nodes. The results also confirm the findings from Figure 1 that the variance in the discrimination value is often high, specifically for the small datasets. This means that for those instances it is difficult to generalize, and overfitting in terms of discrimination is still happening.
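The post-relabelling referenced here can be illustrated with a greedy sketch. This is an illustrative simplification, not the exact KamPost procedure, and it does not explicitly trade off the accuracy lost per flip: after fitting a tree, repeatedly flip the prediction of whichever leaf most reduces the demographic disparity on the training data, until the discrimination value drops below a threshold.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def relabel_leaves(tree: DecisionTreeClassifier, X, groups,
                   max_disc: float = 0.05):
    """Greedily flip binary leaf predictions to push the demographic
    disparity (gap in positive-prediction rates between groups) on the
    training data below `max_disc`. Returns the set of flipped leaf ids."""
    leaf_of = tree.apply(X)          # leaf index of each training sample
    flipped = set()

    def predictions():
        pred = tree.predict(X).astype(int)
        if flipped:
            mask = np.isin(leaf_of, list(flipped))
            pred[mask] = 1 - pred[mask]
        return pred

    def disparity(pred):
        rates = [pred[groups == g].mean() for g in np.unique(groups)]
        return max(rates) - min(rates)

    while disparity(predictions()) > max_disc:
        base = predictions()
        base_disc = disparity(base)
        best_leaf, best_gain = None, 0.0
        for leaf in set(leaf_of) - flipped:
            trial = base.copy()
            m = leaf_of == leaf
            trial[m] = 1 - trial[m]
            gain = base_disc - disparity(trial)
            if gain > best_gain:
                best_leaf, best_gain = leaf, gain
        if best_leaf is None:        # no single flip reduces disparity
            break
        flipped.add(best_leaf)
    return flipped
```

On a fitted DecisionTreeClassifier, `tree.apply(X)` gives each sample's leaf, so candidate flips can be evaluated without retraining; only the leaf-to-label mapping changes.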