Performance Analysis
Accenture Unveils Tool to Help Companies Ensure Their AI Is Fair
Consulting firm Accenture has a new tool to help businesses detect and eliminate gender, racial and ethnic bias in artificial intelligence software. Companies and governments are increasingly turning to machine-learning algorithms to help make critical decisions, including whom to hire, who gets insurance or a mortgage, who receives government benefits and even whether to grant a prisoner parole. One of the arguments for using such software is that, if correctly designed and trained, it can potentially make decisions free from the prejudices that often affect human choices. But in a number of well-publicized examples, algorithms have been found to discriminate against minorities and women. For instance, according to a 2016 investigation by ProPublica, an algorithm that many U.S. cities and states used to help make bail decisions was twice as likely to falsely label black prisoners as high-risk for re-offending as it was to falsely label white prisoners.
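A disparity like the one ProPublica reported is a gap in false positive rates between groups: among people who did not re-offend, the share who were nevertheless labelled high-risk. The minimal sketch below computes that rate per group and the ratio between groups. The function name and the toy data are hypothetical illustrations, not ProPublica's data or analysis code.

```python
from collections import defaultdict


def false_positive_rates(y_true, y_pred, groups):
    """Return {group: FPR}, where FPR = FP / (FP + TN) among true negatives."""
    fp = defaultdict(int)   # labelled high-risk but did not re-offend
    neg = defaultdict(int)  # everyone who did not re-offend
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 0:
            neg[g] += 1
            if p == 1:
                fp[g] += 1
    return {g: fp[g] / neg[g] for g in neg}


# Hypothetical toy data: 1 = re-offended / labelled high-risk, 0 = otherwise
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0, 0, 1]
groups = ["A"] * 5 + ["B"] * 5

rates = false_positive_rates(y_true, y_pred, groups)
print(rates)                    # {'A': 0.5, 'B': 0.25}
print(rates["A"] / rates["B"])  # 2.0
```

A ratio of 2.0 is the "twice as likely to falsely label" pattern: both groups share the same base rate here, yet group A's non-re-offenders are flagged twice as often.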
On the Relationship between Data Efficiency and Error for Uncertainty Sampling
Mussmann, Stephen, Liang, Percy
While active learning offers potential cost savings, the data efficiency actually observed in practice (the reduction in the amount of labeled data needed to obtain the same error rate) is mixed. This paper poses a basic question: when is active learning actually helpful? We provide an answer for logistic regression with the popular active learning algorithm, uncertainty sampling. Empirically, on 21 datasets from OpenML, we find a strong inverse correlation between data efficiency and the error rate of the final classifier. Theoretically, we show that for a variant of uncertainty sampling, the asymptotic data efficiency is within a constant factor of the inverse error rate of the limiting classifier.
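Uncertainty sampling itself is simple to state: fit a model on the labeled points, then request a label for the unlabeled point the model is least sure about, which for logistic regression is the point whose predicted probability is closest to 0.5. The sketch below is the basic textbook loop in plain Python, assuming 1-D toy data, a synthetic labeling oracle and a simple gradient-descent logistic regression; it is not the variant analyzed theoretically in the paper, and all names are hypothetical. Data efficiency would then be measured by comparing how many labels this loop and random sampling need to reach the same error.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logreg(X, y, lr=0.5, epochs=500):
    """Plain batch gradient descent on the logistic loss (no regularization)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def uncertainty_sampling(X_pool, oracle, seed_idx, budget):
    """Repeatedly label the pool point whose predicted probability is closest to 0.5."""
    labeled = list(seed_idx)
    for _ in range(budget):
        model = fit_logreg([X_pool[i] for i in labeled], [oracle(i) for i in labeled])
        unlabeled = [i for i in range(len(X_pool)) if i not in labeled]
        query = min(unlabeled, key=lambda i: abs(predict_proba(model, X_pool[i]) - 0.5))
        labeled.append(query)
    return labeled


# Hypothetical pool on [-1, 1]; the true decision boundary sits at x = 0.3
pool = [[i / 50 - 1] for i in range(101)]
oracle = lambda i: int(pool[i][0] > 0.3)
labeled = uncertainty_sampling(pool, oracle, seed_idx=[0, 100], budget=8)
print([round(pool[i][0], 2) for i in labeled[2:]])
```

The queried points cluster around the true boundary, which is where labels are most informative for a linear classifier.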
Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees
Celis, L. Elisa, Huang, Lingxiao, Keswani, Vijay, Vishnoi, Nisheeth K.
Developing classification algorithms that are fair with respect to sensitive attributes of the data has become an important problem due to the growing deployment of classification algorithms in various social contexts. Several recent works have focused on fairness with respect to a specific metric, modeled the corresponding fair classification problem as a constrained optimization problem, and developed tailored algorithms to solve them. Despite this, there still remain important metrics for which we do not have fair classifiers, and many of the aforementioned algorithms do not come with theoretical guarantees, perhaps because the resulting optimization problem is non-convex. The main contribution of this paper is a new meta-algorithm for classification that takes as input a large class of fairness constraints, with respect to multiple non-disjoint sensitive attributes, and which comes with provable guarantees. This is achieved by first developing a meta-algorithm for a large family of classification problems with convex constraints, and then showing that classification problems with general types of fairness constraints can be reduced to those in this family. We present empirical results that show that our algorithm can achieve near-perfect fairness with respect to various fairness metrics, and that the loss in accuracy due to the imposed fairness constraints is often small. Overall, this work unifies several prior works on fair classification, presents a practical algorithm with theoretical guarantees, and can handle fairness metrics that previous methods could not.
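To make "fairness constraint" concrete, the sketch below checks one common metric of the kind such a meta-algorithm could be asked to enforce: a statistical-parity-style ratio, the lowest group positive-prediction rate divided by the highest, required to be at least a threshold τ. The threshold τ = 0.8, the function names and the data are illustrative assumptions. The sketch handles only disjoint groups and only evaluates a constraint; the paper's meta-algorithm, which optimizes a classifier subject to such constraints, is not reproduced here.

```python
def positive_rates(y_pred, groups):
    """Fraction of positive predictions within each group."""
    totals, positives = {}, {}
    for p, g in zip(y_pred, groups):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + p
    return {g: positives[g] / totals[g] for g in totals}


def statistical_rate(y_pred, groups):
    """Min over group pairs of rate_i / rate_j, i.e. min rate / max rate (1.0 = parity)."""
    rates = positive_rates(y_pred, groups).values()
    highest = max(rates)
    return 1.0 if highest == 0 else min(rates) / highest


def satisfies_constraint(y_pred, groups, tau=0.8):
    """True if the classifier's predictions meet the (hypothetical) threshold tau."""
    return statistical_rate(y_pred, groups) >= tau


y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 4
print(positive_rates(y_pred, groups))        # {'a': 0.75, 'b': 0.25}
print(satisfies_constraint(y_pred, groups))  # False
```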
Non-Negative Networks Against Adversarial Attacks
Fleshman, William, Raff, Edward, Sylvester, Jared, Forsyth, Steven, McLean, Mark
Adversarial attacks against neural networks are a problem of considerable importance, for which effective defenses are not yet readily available. We make progress toward this problem by showing that non-negative weight constraints can be used to improve resistance to such attacks in specific scenarios. In particular, we show that they can provide an effective defense for binary classification problems with asymmetric cost, such as malware or spam detection. We also show how non-negativity can be leveraged to reduce an attacker's ability to perform targeted misclassification attacks in other domains, such as image processing.
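The intuition for the malware case: if features only record the presence of content and an attacker can only add content, then with non-negative weights every addition can only raise (never lower) the "malicious" score. A minimal sketch, assuming binary presence features and, for brevity, logistic regression (a single-layer model) in place of the paper's neural networks; the constraint is enforced by projected gradient descent, clipping negative weights to zero after each step. The data and feature meanings are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_nonneg_logreg(X, y, lr=0.5, epochs=300):
    """Logistic regression trained by projected gradient descent:
    after every step, negative weights are clipped to zero."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * n_features
        gb = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                gw[j] += err * xi[j]
            gb += err
        # Gradient step, then projection onto the non-negative orthant
        w = [max(0.0, wj - lr * g / len(X)) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)  # the bias is left unconstrained
    return w, b


def score(model, x):
    """Probability that x is malicious."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: [malicious indicator present, benign-looking content present]
X = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]]
y = [1, 1, 1, 0, 0, 0]
model = train_nonneg_logreg(X, y)
# Adding benign-looking content to a malicious sample cannot lower its score
print(score(model, [1, 0]), score(model, [1, 1]))
```

Without the clipping, the benign-looking feature would tend to receive a negative weight, which is exactly the lever an attacker pulls by padding malware with benign content.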
Fairness Behind a Veil of Ignorance: A Welfare Analysis for Automated Decision Making
Heidari, Hoda, Ferrari, Claudio, Gummadi, Krishna P., Krause, Andreas
We draw attention to an important, yet largely overlooked, aspect of evaluating fairness for automated decision-making systems: namely, risk and welfare considerations. Our proposed family of measures corresponds to the long-established formulations of cardinal social welfare in economics. We arrive at this proposal by taking the perspective of a rational, risk-averse individual who is going to be subject to algorithmic decision making and must choose between several algorithmic alternatives behind a Rawlsian veil of ignorance. The convex formulation of our measures allows us to integrate them as a constraint into any convex loss minimization pipeline. Our empirical analysis reveals interesting trade-offs between our proposal and (a) prediction accuracy, (b) group discrimination, and (c) Dwork et al.'s notion of individual fairness. Furthermore, and perhaps most importantly, our work provides both theoretical and empirical evidence suggesting that a lower bound on our measures often leads to bounded inequality in algorithmic outcomes, hence presenting the first computationally feasible mechanism for bounding individual-level (un)fairness.
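Why a risk-averse individual behind the veil of ignorance cares about inequality can be shown with a concave utility: not knowing which individual they will be, they value an outcome distribution by its average utility, and concavity (Jensen's inequality) makes a spread-out distribution worth less than an equal one with the same mean. The sketch assumes the standard constant-relative-risk-aversion (CRRA) utility from economics and treats each individual's "benefit" as a positive number; the paper's exact definitions of benefit and utility may differ, and the data are hypothetical.

```python
import math


def crra_utility(b, alpha):
    """Constant-relative-risk-aversion utility; concave (risk-averse) for alpha < 1."""
    if alpha == 0:
        return math.log(b)
    return b ** alpha / alpha


def welfare(benefits, alpha=0.5):
    """Expected utility of a randomly chosen individual behind the veil of ignorance."""
    return sum(crra_utility(b, alpha) for b in benefits) / len(benefits)


equal = [1.0, 1.0, 1.0, 1.0]
unequal = [0.2, 0.6, 1.4, 1.8]  # same mean benefit (1.0), more spread
print(welfare(equal), welfare(unequal))  # the equal distribution scores higher
```

Requiring such a welfare measure to stay above a lower bound therefore rules out outcome distributions that are too unequal, which is the link to bounded inequality described above.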
Artificial Intelligence for Healthcare | Accenture UK
Jeremy Howard was pioneering ways for deep learning to help physicians interpret medical data better when the challenge he was tackling suddenly hit close to home. When Howard's wife, Rachel, was diagnosed with a brain cyst while she was pregnant with their first child three years ago, Jeremy and Rachel did what comes naturally to data scientists like them. It contained possible treatments, their known likelihood of success and failure, the value they assigned to different outcomes, and the potential problems if things went wrong. When most of us fall ill, we find ourselves thrust into a world of frantic Googling, confusing choices, and fear of the unknown. We place trust in our doctors to know what's best. When it comes to making decisions, from investing money to raising children to taking medicine, Howard uses probabilities, priors and statistics.
Extract, Explore, Transform, Model: How Machine Learning Projects Unfold
In the majority of cases, you should compare the performance of a few algorithms before choosing the final model. The decision is not a trivial one, but it boils down to finding an algorithm that balances strong performance with accomplishing the task in the most straightforward way. You can compare algorithms by training each of them on your data and scoring their performance on the task with a technique called cross-validation. The metric used to score performance will vary by context, but some common ones are R-squared for regression, and accuracy, precision, and recall for classification. For classification tasks, quantifying the cost of both a false positive prediction and a false negative prediction can help you choose the most appropriate metric.
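The comparison step can be sketched as a small k-fold cross-validation harness: split the data into k folds, hold each fold out once, train on the rest and average the held-out scores. Everything here is an illustrative assumption: the two toy "algorithms" (a majority-class baseline and a single-threshold rule), the 1-D data, and the false-positive and false-negative costs (1 and 5) used to show how cost weighting can serve as the scoring metric. In practice a library routine such as scikit-learn's `cross_val_score` would normally be used instead.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def k_fold_score(fit, predict, X, y, metric, k=5):
    """Average metric over k train/test splits; each fold is held out once."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in fold]
        scores.append(metric([y[i] for i in fold], preds))
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def expected_cost(y_true, y_pred, fp_cost=1.0, fn_cost=5.0):
    """Average misclassification cost (lower is better); costs are hypothetical."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 0:
            cost += fp_cost
        elif p == 0 and t == 1:
            cost += fn_cost
    return cost / len(y_true)


# Two toy candidate algorithms on 1-D data
def fit_majority(X, y):
    return int(sum(y) * 2 >= len(y))


def predict_majority(model, x):
    return model


def fit_threshold(X, y):
    """Pick the cut point on x that maximizes training accuracy."""
    candidates = sorted(x[0] for x in X)
    return max(candidates, key=lambda c: accuracy(y, [int(x[0] >= c) for x in X]))


def predict_threshold(model, x):
    return int(x[0] >= model)


X = [[i] for i in range(20)]
y = [int(i >= 12) for i in range(20)]
for name, fit, pred in [("majority", fit_majority, predict_majority),
                        ("threshold", fit_threshold, predict_threshold)]:
    print(name,
          k_fold_score(fit, pred, X, y, accuracy),
          k_fold_score(fit, pred, X, y, expected_cost))
```

Because every candidate is scored on the same folds with the same metric, the numbers are directly comparable; swapping `accuracy` for `expected_cost` changes which mistakes are penalized most without touching the harness.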
Static Malware Detection & Subterfuge: Quantifying the Robustness of Machine Learning and Current Anti-Virus
Fleshman, William, Raff, Edward, Zak, Richard, McLean, Mark, Nicholas, Charles
As machine-learning (ML) based systems for malware detection become more prevalent, it becomes necessary to quantify their benefits compared to the more traditional anti-virus (AV) systems widely used today. It is not practical to build an agreed-upon test set to benchmark malware detection systems on pure classification performance. Instead, we tackle the problem by creating a new testing methodology, in which we evaluate the change in performance on a set of known benign and malicious files as adversarial modifications are performed. The change in performance, combined with the evasion techniques, then quantifies a system's robustness against that approach. Through these experiments we are able to show, in a quantifiable way, how purely ML-based systems can be more robust than AV products at detecting malware that attempts evasion through modification, but may be slower to adapt in the face of significantly novel attacks.
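The evaluation idea (measure how a detector's performance changes when a particular modification is applied) can be sketched as a small harness that compares detection rates before and after each evasion technique. The detector, the byte pattern and both modifications below are hypothetical stand-ins, not the AV products, models or evasion techniques used in the paper; they only show how a per-technique robustness number falls out of the before/after comparison.

```python
def detection_rate(detector, samples):
    """Fraction of samples the detector flags as malicious."""
    return sum(1 for s in samples if detector(s)) / len(samples)


def robustness_drop(detector, malicious, modify):
    """Detection rate before minus after applying one evasion modification."""
    before = detection_rate(detector, malicious)
    after = detection_rate(detector, [modify(s) for s in malicious])
    return before - after


# Hypothetical exact-match signature detector over raw bytes
SIGNATURE = b"\xde\xad\xbe\xef"


def signature_detector(data):
    return SIGNATURE in data


# Two hypothetical modifications
def append_padding(data):
    return data + b"\x00" * 16  # leaves the signature intact


def patch_signature(data):
    return data.replace(SIGNATURE, b"\xde\xad\xbe\xee")  # changes one byte of it


malicious = [b"MZ" + b"\x90" * 8 + SIGNATURE + bytes([i]) for i in range(4)]
print(robustness_drop(signature_detector, malicious, append_padding))   # 0.0
print(robustness_drop(signature_detector, malicious, patch_signature))  # 1.0
```

Running the same harness over several detectors and modifications yields a table of drops, one per (system, technique) pair, which is the kind of quantitative robustness comparison the methodology aims for.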
A One-Sided Classification Toolkit with Applications in the Analysis of Spectroscopy Data
This dissertation investigates the use of one-sided classification algorithms in the application of separating hazardous chlorinated solvents from other materials, based on their Raman spectra. The experimentation is carried out using a new one-sided classification toolkit that was designed and developed from the ground up. In the one-sided classification paradigm, the objective is to separate elements of the target class from all outliers. These one-sided classifiers are generally chosen, in practice, when there is a deficiency of some sort in the training examples. Sometimes outlier examples can be rare, expensive to label, or even entirely absent. However, this author would like to note that they can be equally applicable when outlier examples are plentiful but nonetheless not statistically representative of the complete outlier concept. It is this scenario that is explicitly dealt with in this research work. In these circumstances, one-sided classifiers have been found to be more robust than conventional multi-class classifiers. The term "unexpected" outliers is introduced to represent outlier examples, encountered in the test set, that have been taken from a different distribution to the training set examples. These are examples that are a result of an inadequate representation of all possible outliers in the training set. It can often be impossible to fully characterise outlier examples given the fact that they can represent the immeasurable quantity of "everything else" that is not a target. The findings from this research have shown the potential drawbacks of using conventional multi-class classification algorithms when the test data come from a completely different distribution to that of the training samples.
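A one-sided classifier is fitted on target examples only, so it rejects anything sufficiently unlike the targets, including "unexpected" outliers from directions never seen in training. A minimal sketch of one simple one-sided classifier, a distance-to-centroid rule whose threshold accepts roughly 95% of the training targets. The method, the 95% quantile and the 2-D points are assumptions for illustration (real inputs would be preprocessed spectra); this is not the dissertation's toolkit.

```python
import math


def fit_one_class(targets, quantile=0.95):
    """Fit on target-class examples only: centroid plus a distance threshold
    that accepts roughly `quantile` of the training targets."""
    dim = len(targets[0])
    centroid = [sum(t[j] for t in targets) / len(targets) for j in range(dim)]
    dists = sorted(math.dist(t, centroid) for t in targets)
    threshold = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return centroid, threshold


def is_target(model, x):
    """Accept x as a target if it lies within the learned distance threshold."""
    centroid, threshold = model
    return math.dist(x, centroid) <= threshold


# Hypothetical target examples clustered around the origin; no outliers seen in training
targets = [(0.1, 0.0), (-0.1, 0.1), (0.0, -0.1), (0.05, 0.05), (-0.05, -0.05)]
model = fit_one_class(targets)
print(is_target(model, (0.05, 0.0)))   # True
print(is_target(model, (3.0, 3.0)))    # False
print(is_target(model, (-3.0, 0.0)))   # False
```

A multi-class boundary trained on a narrow set of outliers only separates targets from those outliers, so an unexpected outlier on the far side of the targets can still be labelled a target; the closed region above rejects it regardless of direction.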
Partial AUC Maximization via Nonlinear Scoring Functions
Ueda, Naonori, Fujino, Akinori
We propose a method for maximizing a partial area under a receiver operating characteristic (ROC) curve (pAUC) for binary classification tasks. In binary classification tasks, accuracy is the most commonly used measure of classifier performance. In some applications, such as anomaly detection and diagnostic testing, accuracy is not an appropriate measure since prior probabilities are often heavily skewed. Although the pAUC has been utilized as a performance measure in such cases, few methods have been proposed for directly maximizing it. This optimization is achieved by using a scoring function. The conventional approach uses a linear function as the scoring function; in contrast, we introduce nonlinear scoring functions for this purpose. Specifically, we present two types of nonlinear scoring functions based on generative models and deep neural networks. We show experimentally that nonlinear scoring functions improve on the conventional methods through the application of a binary classification of real and bogus objects obtained with the Hyper Suprime-Cam on the Subaru telescope.
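The quantity being maximized can be made concrete by estimating it from a scoring function's outputs. The sketch below computes a standard empirical pAUC over the false-positive-rate range [0, β]: only the ⌈β·n_neg⌉ highest-scoring negatives matter, because those are the negatives a decision threshold crosses first as FPR rises from 0 to β, so each positive is compared against just those. The function name and the normalization (dividing by the maximum possible area, so a perfect ranking scores 1) are choices of this sketch; the paper's nonlinear scoring functions and optimization procedure are not shown.

```python
import math


def partial_auc(pos_scores, neg_scores, beta):
    """Empirical pAUC over FPR in [0, beta], normalized to [0, 1].

    Each positive is compared with the ceil(beta * n_neg) highest-scoring
    negatives; ties count as half a correctly ordered pair."""
    k = max(1, math.ceil(beta * len(neg_scores)))
    top_neg = sorted(neg_scores, reverse=True)[:k]
    total = 0.0
    for p in pos_scores:
        for n in top_neg:
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos_scores) * k)


neg = [0.1, 0.2, 0.3, 0.4]
print(partial_auc([0.9, 0.8], neg, beta=0.5))   # 1.0  (perfect ranking)
print(partial_auc([0.9, 0.35], neg, beta=0.5))  # 0.75
```

In the second call the positive scored 0.35 is ranked below the top negative (0.4), which is precisely the low-FPR region where anomaly detection and diagnostic tests need the ranking to be right; errors among lower-scoring negatives would not affect this pAUC at all.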