

Why So Many Data Science Projects Fail to Deliver


This article is based on an in-depth study of the data science efforts in three large, private-sector Indian banks with collective assets exceeding $200 million. The study included onsite observations; semistructured interviews with 57 executives, managers, and data scientists; and the examination of archival records. The five obstacles and the solutions for overcoming them emerged from an inductive analytical process based on the qualitative data.

More and more companies are embracing data science as a function and a capability. But many of them have not been able to consistently derive business value from their investments in big data, artificial intelligence, and machine learning.1 Moreover, evidence suggests that the gap is widening between organizations successfully gaining value from data science and those struggling to do so.2

Beyond Traditional Assumptions in Fair Machine Learning

After challenging the validity of these traditional assumptions in real-world applications, we propose ways to move forward when they are violated. First, we show that group fairness criteria based purely on statistical properties of observed data are fundamentally limited. Revisiting this limitation from a causal viewpoint, we develop a more versatile conceptual framework, causal fairness criteria, and the first algorithms to achieve them. We also provide tools to analyze how sensitive an algorithm believed to be causally fair is to misspecifications of the causal graph. Second, we overcome the assumption that sensitive data is readily available in practice. To this end, we devise protocols based on secure multi-party computation to train, validate, and contest fair decision algorithms without requiring users to disclose their sensitive data or decision makers to disclose their models. Finally, we accommodate the fact that outcome labels are often observed only when a certain decision has been made. To relax the traditional assumption that labels can always be recorded, we suggest a paradigm shift away from training predictive models and toward directly learning decisions. The main contribution of this thesis is the development of theoretically substantiated and practically feasible methods that move research on fair machine learning closer to real-world applications.
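To make "group fairness criteria based purely on statistical properties of observed data" concrete, here is a minimal sketch that computes two widely used examples: the demographic parity gap and the equal opportunity (true-positive-rate) gap between two groups. The choice of these two criteria, the function names, and the toy data are assumptions made for illustration; the abstract does not say which specific criteria the thesis analyzes.

```python
from typing import Sequence


def positive_rate(decisions: Sequence[int], groups: Sequence[str], group: str) -> float:
    """Estimate P(D=1 | A=group) from observed decisions (1 = positive)."""
    selected = [d for d, g in zip(decisions, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_gap(decisions, groups, a: str, b: str) -> float:
    """|P(D=1 | A=a) - P(D=1 | A=b)|, computed only from observed (A, D) pairs."""
    return abs(positive_rate(decisions, groups, a) - positive_rate(decisions, groups, b))


def true_positive_rate(decisions, labels, groups, group: str) -> float:
    """Estimate P(D=1 | Y=1, A=group), the rate compared by equal opportunity."""
    hits = [d for d, y, g in zip(decisions, labels, groups) if g == group and y == 1]
    return sum(hits) / len(hits)


def equal_opportunity_gap(decisions, labels, groups, a: str, b: str) -> float:
    """Absolute difference in true-positive rates between groups a and b."""
    return abs(true_positive_rate(decisions, labels, groups, a)
               - true_positive_rate(decisions, labels, groups, b))


# Hypothetical toy data: decisions D, observed outcome labels Y, group membership A.
decisions = [1, 0, 1, 1, 0, 1, 0, 0]
labels = [1, 0, 1, 0, 1, 1, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(demographic_parity_gap(decisions, groups, "a", "b"))         # 0.5
print(equal_opportunity_gap(decisions, labels, groups, "a", "b"))  # about 0.667
```

Both quantities depend only on the observed joint distribution of group, label, and decision. Two data-generating processes with different causal structure but the same joint distribution therefore receive identical scores, which is one way to read the limitation the abstract motivates with a causal viewpoint. Note also that `equal_opportunity_gap` needs the labels `Y`, which the abstract points out are often observed only after a positive decision.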