Well File:


Two-stage Algorithm for Fairness-aware Machine Learning

arXiv.org Machine Learning

Algorithmic decision making process now affects many aspects of our lives. Standard tools for machine learning, such as classification and regression, are subject to the bias in data, and thus direct application of such off-the-shelf tools could lead to a specific group being unfairly discriminated. Removing sensitive attributes of data does not solve this problem because a \textit{disparate impact} can arise when non-sensitive attributes and sensitive attributes are correlated. Here, we study a fair machine learning algorithm that avoids such a disparate impact when making a decision. Inspired by the two-stage least squares method that is widely used in the field of economics, we propose a two-stage algorithm that removes bias in the training data. The proposed algorithm is conceptually simple. Unlike most of existing fair algorithms that are designed for classification tasks, the proposed method is able to (i) deal with regression tasks, (ii) combine explanatory attributes to remove reverse discrimination, and (iii) deal with numerical sensitive attributes. The performance and fairness of the proposed algorithm are evaluated in simulations with synthetic and real-world datasets.

On formalizing fairness in prediction with machine learning

arXiv.org Machine Learning

Machine learning algorithms for prediction are increasingly being used in critical decisions affecting human lives. Various fairness formalizations, with no firm consensus yet, are employed to prevent such algorithms from systematically discriminating against people based on certain attributes protected by law. The aim of this article is to survey how fairness is formalized in the machine learning literature for the task of prediction and present these formalizations with their corresponding notions of distributive justice from the social sciences literature. We provide theoretical as well as empirical critiques of these notions from the social sciences literature and explain how these critiques limit the suitability of the corresponding fairness formalizations to certain domains. We also suggest two notions of distributive justice which address some of these critiques and discuss avenues for prospective fairness formalizations.

Exposing the Probabilistic Causal Structure of Discrimination

arXiv.org Artificial Intelligence

Discrimination discovery from data is an important task aiming at identifying patterns of illegal and unethical discriminatory activities against protected-by-law groups, e.g., ethnic minorities. While any legally-valid proof of discrimination requires evidence of causality, the state-of-the-art methods are essentially correlation-based, albeit, as it is well known, correlation does not imply causation. In this paper we take a principled causal approach to the data mining problem of discrimination detection in databases. Following Suppes' probabilistic causation theory, we define a method to extract, from a dataset of historical decision records, the causal structures existing among the attributes in the data. The result is a type of constrained Bayesian network, which we dub Suppes-Bayes Causal Network (SBCN). Next, we develop a toolkit of methods based on random walks on top of the SBCN, addressing different anti-discrimination legal concepts, such as direct and indirect discrimination, group and individual discrimination, genuine requirement, and favoritism. Our experiments on real-world datasets confirm the inferential power of our approach in all these different tasks.