Despite the hype around AI, most machine learning (ML) projects focus on predicting outcomes rather than understanding causality. Indeed, after several AI projects, I realized that ML is great at finding correlations in data, but not causation. In our projects, we try not to fall into the trap of equating correlation with causation. This issue significantly limits our ability to rely on ML for decision-making. From a business perspective, we need tools that can understand the causal relationships in data and ML solutions that generalize well.
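The gap between correlation and causation can be illustrated with a small simulation. The sketch below is a toy model, not anything taken from the projects described above: a hidden variable Z drives both X and Y, while X has no effect on Y at all. The function names, noise scale, and sample size are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n, intervene, rng):
    """Draw n samples of (X, Y) from a toy model where Z causes both X and Y.

    X never influences Y. With intervene=True, X is drawn independently of Z,
    which mimics setting X in a randomized experiment.
    """
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                 # hidden common cause
        if intervene:
            x = rng.gauss(0.0, 1.0)             # X assigned at random, ignoring Z
        else:
            x = z + 0.5 * rng.gauss(0.0, 1.0)   # X depends on Z
        y = z + 0.5 * rng.gauss(0.0, 1.0)       # Y depends on Z only
        xs.append(x)
        ys.append(y)
    return xs, ys


rng = random.Random(0)
obs_corr = pearson(*simulate(5000, intervene=False, rng=rng))
int_corr = pearson(*simulate(5000, intervene=True, rng=rng))
print(f"observational corr(X, Y):  {obs_corr:.2f}")
print(f"interventional corr(X, Y): {int_corr:.2f}")
```

In this model the observational correlation is 0.8 in theory (covariance 1, each variance 1.25), so a purely predictive model would happily use X to predict Y. Once X is assigned independently of Z, the correlation should fall to roughly zero, which is what a decision-maker who changes X would actually experience.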

Much of artificial intelligence (AI) in common use is dedicated to predicting people's behavior. It tries to anticipate your next purchase, your next mouse-click, your next job move. But such techniques can run into problems when they are used to analyze data for health and development programs. If we do not know the root causes of behavior, we could easily make poor decisions and support ineffective and prejudicial policies. AI, for example, has made it possible for health-care systems to predict which patients are likely to have the most complex medical needs. In the United States, risk-prediction software is being applied to roughly 200 million people to anticipate which patients would benefit from extra medical care now, based on how much they are likely to cost the health-care system in the future. It employs predictive machine learning, a class of self-adaptive algorithms that improve their accuracy as they are provided new data. But as health researcher Ziad Obermeyer and his colleagues showed in a recent article in Science magazine, this particular tool had an unintended consequence: black patients who had more chronic illnesses than white patients were not flagged as needing extra care. The algorithm used insurance claims data to predict patients' future health needs based on their recent health costs.

Acharya, Jayadev, Bhattacharyya, Arnab, Daskalakis, Constantinos, Kandasamy, Saravanan

We consider testing and learning problems on causal Bayesian networks as defined by Pearl (Pearl, 2009). Given a causal Bayesian network M on a graph with n discrete variables and bounded in-degree and bounded "confounded components", we show that O(log n) interventions on an unknown causal Bayesian network X on the same graph, and O(n/epsilon^2) samples per intervention, suffice to efficiently distinguish whether X=M or whether there exists some intervention under which X and M are farther than epsilon in total variation distance. We also obtain sample/time/intervention efficient algorithms for: (i) testing the identity of two unknown causal Bayesian networks on the same graph; and (ii) learning a causal Bayesian network on a given graph. Although our algorithms are non-adaptive, we show that adaptivity does not help in general: Omega(log n) interventions are necessary for testing the identity of two unknown causal Bayesian networks on the same graph, even adaptively. Our algorithms are enabled by a new subadditivity inequality for the squared Hellinger distance between two causal Bayesian networks.
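The abstract above measures closeness in total variation distance and relies on the squared Hellinger distance. As a minimal sketch of those two quantities only (not of the paper's testing algorithm or its subadditivity inequality), the code below computes both for discrete distributions given as dictionaries. The function names and the example distributions are hypothetical, and the 1/2 normalization of the squared Hellinger distance is one common convention that the source does not confirm.

```python
import math


def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def squared_hellinger(p, q):
    """Squared Hellinger distance with the 1/2 normalization (assumed convention)."""
    support = set(p) | set(q)
    return 0.5 * sum(
        (math.sqrt(p.get(x, 0.0)) - math.sqrt(q.get(x, 0.0))) ** 2
        for x in support
    )


# Hypothetical distributions of one binary variable under two different models.
p = {0: 0.5, 1: 0.5}
q = {0: 0.9, 1: 0.1}

tv = total_variation(p, q)
h2 = squared_hellinger(p, q)
print(f"TV = {tv:.4f}, H^2 = {h2:.4f}")
```

Under this normalization the two distances satisfy the standard relation H^2 <= TV <= sqrt(2 * H^2), so a bound on the squared Hellinger distance can be converted into a bound on total variation distance.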

Sinha, Gaurav, Chauhan, Ayush, Maiti, Aurghya, Poddar, Naman, Goel, Pulkit

We study the problem of separating a mixture of distributions, all of which come from interventions on a known causal Bayesian network. Given oracle access to marginals of all distributions resulting from interventions on the network, and estimates of marginals from the mixture distribution, we want to recover the mixing proportions of different mixture components. We show that in the worst case, mixing proportions cannot be identified using marginals only. If exact marginals of the mixture distribution were known, under a simple assumption of excluding a few distributions from the mixture, we show that the mixing proportions become identifiable. Our identifiability proof is constructive and gives an efficient algorithm recovering the mixing proportions exactly. When exact marginals are not available, we design an optimization framework to estimate the mixing proportions. Our problem is motivated by a real-world scenario of an e-commerce business, where multiple interventions occur at a given time, leading to deviations in expected metrics. We conduct experiments on the well-known, publicly available ALARM network and on a proprietary dataset from a large e-commerce company, validating the performance of our method.
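To make the recovery problem concrete: each marginal of a mixture is a weighted average of the component marginals, so with exact marginals the mixing proportions satisfy a linear system. The sketch below solves a tiny instance of that system with exact fractions. It is a toy illustration under assumed numbers, not the constructive algorithm or the optimization framework from the abstract, and `solve_linear`, `mixture_marginals`, and all marginal values are hypothetical.

```python
from fractions import Fraction as F


def solve_linear(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination, exactly."""
    n = len(A)
    M = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            # Singular system: these marginals do not pin down the proportions.
            raise ValueError("mixing proportions not identifiable")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def mixture_marginals(pi, comps):
    """Marginals of the mixture: proportion-weighted average of component marginals."""
    n_vars = len(comps[0])
    return [sum(p * c[j] for p, c in zip(pi, comps)) for j in range(n_vars)]


# Hypothetical P(V1=1), P(V2=1) under three interventional distributions.
component_marginals = [
    [F(1, 5), F(1, 2)],
    [F(4, 5), F(1, 2)],
    [F(1, 2), F(9, 10)],
]
true_pi = [F(1, 2), F(3, 10), F(1, 5)]  # assumed ground-truth proportions

mix = mixture_marginals(true_pi, component_marginals)

# One equation per observed marginal, plus the constraint that proportions sum to 1.
A = [[c[j] for c in component_marginals] for j in range(2)] + [[F(1)] * 3]
b = mix + [F(1)]
recovered = solve_linear(A, b)
print(recovered)
```

If two components shared identical marginals, the system would be singular and `solve_linear` would raise, which is a small-scale version of the worst-case non-identifiability the abstract mentions.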

Machine learning engineers work around bias, or offsets in a model, by drawing insights from the output, gauging the losses, going through tonnes of data, and repeating until acceptable results are obtained. This traditional process takes time but works decently. An alternative is the Lagrangian approach, a mathematical method for finding the local maxima and minima of a function subject to equality constraints. This, too, comes with its own set of complexities. The unfairness of machine learning algorithms was exposed when they were deployed for critical tasks previously handled by people, such as hiring and surveillance, where the damage can be irreversible.
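For readers unfamiliar with the Lagrangian approach mentioned above, a small worked example may help. The objective and constraint below are a generic textbook-style toy problem chosen for illustration, not a fairness constraint from the source: minimize the squared distance from the origin subject to a single linear equality constraint.

```latex
\begin{aligned}
&\text{minimize } f(x, y) = x^2 + y^2 \quad \text{subject to } g(x, y) = x + y - 1 = 0 \\
&\mathcal{L}(x, y, \lambda) = x^2 + y^2 - \lambda\,(x + y - 1) \\
&\frac{\partial \mathcal{L}}{\partial x} = 2x - \lambda = 0, \qquad
 \frac{\partial \mathcal{L}}{\partial y} = 2y - \lambda = 0, \qquad
 \frac{\partial \mathcal{L}}{\partial \lambda} = -(x + y - 1) = 0 \\
&\Rightarrow\; x = y = \tfrac{1}{2}, \quad \lambda = 1, \quad
 f\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \tfrac{1}{2}
\end{aligned}
```

Setting all partial derivatives of the Lagrangian to zero turns the constrained problem into a system of equations. In a fairness setting the constraint would presumably encode a fairness criterion instead, which is where the added complexity comes from.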

One of the arguments regularly used in favor of machine learning systems is that they can arrive at decisions without being vulnerable to human subjectivity. However, that argument is only partially true. While machine learning systems don't make decisions based on feelings or emotions, they do inherit a lot of human biases via the training datasets. Bias is relevant because it leads to unfairness. In the last few years, there has been a lot of progress in developing techniques that can mitigate the impact of bias and improve the fairness of machine learning systems.

Lattimore, Finnian, Ong, Cheng Soon

We provide a conceptual map to navigate causal analysis problems. Focusing on discrete random variables, we consider causal effect estimation from observational data. The presented approaches also apply to continuous variables, but the issue of estimation becomes more complex. We then introduce the four schools of thought for causal analysis