Law
Conscientious Classification: A Data Scientist's Guide to Discrimination-Aware Classification
d'Alessandro, Brian, O'Neil, Cathy, LaGatta, Tom
Recent research has helped to cultivate growing awareness that machine learning systems fueled by big data can create or exacerbate troubling disparities in society. Much of this research comes from outside of the practicing data science community, leaving its members with little concrete guidance to proactively address these concerns. This article introduces issues of discrimination to the data science community on its own terms. In it, we tour the familiar data mining process while providing a taxonomy of common practices that have the potential to produce unintended discrimination. We also survey how discrimination is commonly measured, and suggest how familiar development processes can be augmented to mitigate systems' discriminatory potential. We advocate that data scientists should be intentional about modeling and reducing discriminatory outcomes. Without doing so, their efforts will result in perpetuating any systemic discrimination that may exist, but under a misleading veil of data-driven objectivity.
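As a rough illustration of how discrimination in a classifier's output might be measured, the sketch below computes a ratio of favorable-outcome rates between two groups. The abstract does not say which measures the article surveys, so the choice of this ratio, the function names, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def positive_rate(decisions: Sequence[int], groups: Sequence[str], group: str) -> float:
    """Share of favorable decisions (1) received by members of `group`."""
    outcomes = [d for d, g in zip(decisions, groups) if g == group]
    if not outcomes:
        raise ValueError(f"no records for group {group!r}")
    return sum(outcomes) / len(outcomes)


def disparate_impact_ratio(
    decisions: Sequence[int],
    groups: Sequence[str],
    protected: str,
    reference: str,
) -> float:
    """Favorable-outcome rate of the protected group divided by that of the reference group."""
    ref_rate = positive_rate(decisions, groups, reference)
    if ref_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return positive_rate(decisions, groups, protected) / ref_rate


# Hypothetical toy data: 1 = favorable decision, 0 = unfavorable decision.
decisions = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

ratio = disparate_impact_ratio(decisions, groups, protected="a", reference="b")
print(f"disparate impact ratio: {ratio:.2f}")
```

A ratio near 1 means both groups receive favorable outcomes at similar rates; a ratio well below 1 means the protected group receives them less often.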
AI for Earth: a gamechanger for our planet
On 11 December 2017, at the One Planet Summit in Paris, Microsoft announced our $50m, five-year commitment to using AI to improve sustainability, known as AI for Earth. In the past year, the program has grown to support 233 grantees doing work with impact in more than 50 countries and all seven continents. We have also seen the science, from the IPCC and others, which indicates that progress is still too slow and uneven to achieve the 2-degree future agreed to in the Paris Accord. Below, you'll see our vision for the program and in following pieces, you'll see how we're continuing to accelerate our efforts. On the two-year anniversary of the Paris climate accord, the world's political, civic and business leaders came together in Paris to discuss one of the most important issues and opportunities of our time: climate change.
Audits as Evidence: Experiments, Ensembles, and Enforcement
Kline, Patrick, Walters, Christopher
We develop tools for utilizing correspondence experiments to detect illegal discrimination by individual employers. Employers violate US employment law if their propensity to contact applicants depends on protected characteristics such as race or sex. We establish identification of higher moments of the causal effects of protected characteristics on callback rates as a function of the number of fictitious applications sent to each job ad. These moments are used to bound the fraction of jobs that illegally discriminate. Applying our results to three experimental datasets, we find evidence of significant employer heterogeneity in discriminatory behavior, with the standard deviation of gaps in job-specific callback probabilities across protected groups averaging roughly twice the mean gap. In a recent experiment manipulating racially distinctive names, we estimate that at least 85% of jobs that contact both of two white applications and neither of two black applications are engaged in illegal discrimination. To assess the tradeoff between type I and II errors presented by these patterns, we consider the performance of a series of decision rules for investigating suspicious callback behavior under a simple two-type model that rationalizes the experimental data. Though, in our preferred specification, only 17% of employers are estimated to discriminate on the basis of race, we find that an experiment sending 10 applications to each job would enable accurate detection of 7-10% of discriminators while falsely accusing fewer than 0.2% of non-discriminators. A minimax decision rule acknowledging partial identification of the joint distribution of callback rates yields higher error rates but more investigations than our baseline two-type model. Our results suggest illegal labor market discrimination can be reliably monitored with relatively small modifications to existing audit designs.
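To make the tradeoff between detecting discriminators and falsely accusing non-discriminators concrete, here is a minimal sketch of one possible decision rule: investigate a job when white callbacks exceed black callbacks by at least a chosen gap. The gap rule, the assumption of independent callbacks, and every callback rate below are hypothetical; none of them are taken from the paper, and its decision rules and estimates may differ.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def flag_probability(n_per_group: int, p_white: float, p_black: float, min_gap: int) -> float:
    """P(white callbacks - black callbacks >= min_gap) for one job.

    Assumes each application is called back independently (an illustrative assumption).
    """
    total = 0.0
    for w in range(n_per_group + 1):
        for b in range(n_per_group + 1):
            if w - b >= min_gap:
                total += binom_pmf(w, n_per_group, p_white) * binom_pmf(b, n_per_group, p_black)
    return total


# Hypothetical design: 5 white and 5 black applications per job (10 in total).
N_PER_GROUP = 5

for min_gap in (2, 3, 4):
    # Hypothetical discriminating job: callback rate 0.40 for white, 0.10 for black.
    detection = flag_probability(N_PER_GROUP, 0.40, 0.10, min_gap)
    # Hypothetical non-discriminating job: callback rate 0.25 for both groups.
    false_accusation = flag_probability(N_PER_GROUP, 0.25, 0.25, min_gap)
    print(f"gap >= {min_gap}: detection={detection:.3f}, false accusation={false_accusation:.4f}")
```

Raising the required gap lowers the false accusation rate at the cost of detecting fewer discriminators, which is the type I versus type II tradeoff the abstract describes.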
An AI-based, Multi-stage detection system of banking botnets
Ling, Li, Gao, Zhiqiang, Silas, Michael A, Lee, Ian, Doeuff, Erwan A Le
Banking Trojans and botnets are primary drivers of financially motivated cybercrime. In this paper, we first analyze, step by step, how an APT-based banking botnet works through its whole lifecycle. We then present a multi-stage system that detects malicious banking botnet activities that potentially target organizations. The system leverages Cyber Data Lake as well as multiple artificial intelligence techniques at different stages. Evaluation on public datasets showed that Deep Learning based detections were highly successful compared with baseline models. The proposed detections are partially in production on Cyber Data Lake within the organization, and we are continuing to work with internal security teams on further operational challenges.
(Podcast) Chief data officer in government
SONAL SHAH: It's also about how do we make data more useful for people to use and to solve problems in their communities? TANYA OTT: Okay, that is a big job. Who is this superhuman who fills it? TANYA OTT: We'll tell you, in a moment. But first, let me say, you're listening to the Press Room, where we talk about some of the biggest issues facing businesses today. I'm Tanya Ott and joining me today are Bill Eggers … I am the executive director and a professor of practice at Georgetown University's Beeck Center. TANYA OTT: Bill and Sonal are coauthors of The CDO Playbook – a guide for Chief Data Officers. For the last decade, government has been focused on making data more open and easily [accessible] to the public.
Why automation is a feminist issue
According to a new study from the Institute for Public Policy Research (IPPR), nearly 10% of women work in jobs with a high potential for automation, compared with only 4% of men. So what, I hear you say. Substitute "robots" for "austerity", "the demise of unionisation", "public-sector pay freezes", "modern life" – pick any of these and women will always come off worst. Except maybe this time the pointy heads are on to something: perhaps better understanding what the risks are will give us all some agency, and even allow us to change the future. As Carys Roberts, the author of the IPPR report, tells me: "We don't even talk about risks in this area, because there are so many different factors.
How The Software Industry Must Marry Ethics With Artificial Intelligence
Intelligent, learning, autonomous machines are about to change the way we do business forever. But in a world where corporations or even executives may be liable in a civil or even criminal court for their decisions, who is responsible for decisions made by artificial intelligence (AI)? In the United States, courts are already having to wrestle with this science fiction scenario after an Arizona woman was killed by an experimental autonomous Uber vehicle. The European Commission recently shared ethical guidelines, requiring AI to be transparent, have human oversight and be subject to privacy and data protection rules. This sounds really good, but how will any of this be applied in practical situations?
How To Prevent Discriminatory Outcomes In Machine Learning - Liwaiwai
As machine learning (ML) systems continue to improve, their integration into the systems that make up society becomes more seamless. Right now, ML is involved in making critical decisions such as court rulings and hiring. Without a doubt, using ML in these processes will lead to more efficiency. With good design, ML systems can also eliminate the biases humans bring to their decisions. At the other extreme, this integration could end up really ugly.
Mediation Challenges and Socio-Technical Gaps for Explainable Deep Learning Applications
Brandão, Rafael, Carbonera, Joel, de Souza, Clarisse, Ferreira, Juliana, Gonçalves, Bernardo, Leitão, Carla
The presumed data owners' right to explanations brought about by the General Data Protection Regulation in Europe has shed light on the social challenges of explainable artificial intelligence (XAI). In this paper, we present a case study with Deep Learning (DL) experts from a research and development laboratory focused on the delivery of industrial-strength AI technologies. Our aim was to investigate the social meaning (i.e. meaning to others) that DL experts assign to what they do, given a richly contextualized and familiar domain of application. Using qualitative research techniques to collect and analyze empirical data, our study has shown that participating DL experts did not spontaneously engage in considerations about the social meaning of the machine learning models that they build. Moreover, when explicitly stimulated to do so, these experts expressed expectations that, with real-world DL applications, there will be mediators available to bridge the gap between the technical meanings that drive DL work and the social meanings that AI technology users assign to it. We concluded that current research incentives and values guiding the participants' scientific interests and conduct are at odds with those required to face some of the scientific challenges involved in advancing XAI, and thus responding to the alleged data owners' right to explanations or similar societal demands emerging from current debates. As a concrete contribution to mitigate what seems to be a more general problem, we propose three preliminary XAI Mediation Challenges with the potential to bring together technical and social meanings of DL applications, as well as to foster much needed interdisciplinary collaboration among AI and Social Sciences researchers.
Fairness-enhancing interventions in stream classification
Iosifidis, Vasileios, Tran, Thi Ngoc Han, Ntoutsi, Eirini
The widespread use of automated data-driven decision support systems has raised many concerns regarding the accountability and fairness of the employed models in the absence of human supervision. Existing fairness-aware approaches tackle fairness as a batch learning problem and aim at learning a fair model which can then be applied to future instances of the problem. In many applications, however, the data comes sequentially and its characteristics might evolve with time. In such a setting, it is counter-intuitive to "fix" a (fair) model over the data stream, as changes in the data might incur changes in the underlying model, thereby affecting its fairness. In this work, we propose fairness-enhancing interventions that modify the input data so that the outcome of any stream classifier applied to that data will be fair. Experiments on real and synthetic data show that our approach achieves good predictive performance and low discrimination scores over the course of the stream.
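One way to picture a discrimination score tracked over the course of a stream is a sliding-window gap in favorable-prediction rates between groups. The sketch below only monitors such a score; it does not implement the paper's data-modifying interventions. The abstract does not define the score, so the rate-gap definition, the class name, and the toy stream are assumptions.

```python
from collections import deque


class WindowedDiscrimination:
    """Tracks the gap in favorable-prediction rates over the most recent `window` instances."""

    def __init__(self, window: int) -> None:
        # Each entry is an (is_protected, prediction) pair; old entries drop out automatically.
        self.recent = deque(maxlen=window)

    def update(self, is_protected: bool, prediction: int) -> None:
        self.recent.append((is_protected, prediction))

    def score(self) -> float:
        """Favorable rate of the non-protected group minus that of the protected group."""
        prot = [p for is_prot, p in self.recent if is_prot]
        rest = [p for is_prot, p in self.recent if not is_prot]
        if not prot or not rest:
            # The gap is undefined without both groups; 0.0 is a placeholder choice.
            return 0.0
        return sum(rest) / len(rest) - sum(prot) / len(prot)


# Hypothetical stream of (is_protected, prediction) pairs; prediction 1 is favorable.
stream = [
    (True, 0), (False, 1), (True, 0), (False, 1),
    (True, 1), (False, 1), (True, 1), (False, 1),
]

monitor = WindowedDiscrimination(window=4)
scores = []
for is_protected, prediction in stream:
    monitor.update(is_protected, prediction)
    scores.append(monitor.score())

print(scores)
```

Because the window forgets old instances, the score reflects the current behavior of the classifier and can shift as the data distribution changes.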