bayesian network

The Case for Causal AI (SSIR)


Much of artificial intelligence (AI) in common use is dedicated to predicting people's behavior. It tries to anticipate your next purchase, your next mouse-click, your next job move. But such techniques can run into problems when they are used to analyze data for health and development programs. If we do not know the root causes of behavior, we could easily make poor decisions and support ineffective and prejudicial policies. AI, for example, has made it possible for health-care systems to predict which patients are likely to have the most complex medical needs. In the United States, risk-prediction software is being applied to roughly 200 million people to anticipate which patients would benefit from extra medical care now, based on how much they are likely to cost the health-care system in the future. It employs predictive machine learning, a class of self-adaptive algorithms that improve their accuracy as they are provided new data. But as health researcher Ziad Obermeyer and his colleagues showed in a recent article in Science magazine, this particular tool had an unintended consequence: black patients who had more chronic illnesses than white patients were not flagged as needing extra care. The algorithm used insurance claims data to predict patients' future health needs based on their recent health costs.

Causal AI & Bayesian Networks


We are all familiar with the dictum that "correlation does not imply causation". Furthermore, given a data file with samples of two variables x and z, we all know how to calculate the correlation between x and z. But it's only an elite minority, the few, the proud, the Bayesian Network aficionados, who know how to calculate the causal connection between x and z. Neural Net aficionados are incapable of doing this. Their neural nets are just too wimpy to cut it.
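For reference, the correlation mentioned above can be computed in a few lines. This is only a minimal sketch of the Pearson correlation coefficient; the function name `pearson_correlation` is a hypothetical choice for illustration, and the result says nothing about whether x causes z or the reverse.

```python
import math

def pearson_correlation(x, z):
    """Pearson correlation between two equally long samples x and z."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_z = sum((b - mean_z) ** 2 for b in z)
    return cov / math.sqrt(var_x * var_z)
```

Swapping the two arguments gives the same value, which is one way to see that correlation alone carries no causal direction.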

Generative Adversarial Networks (GANs) & Bayesian Networks


Generative Adversarial Network (GAN) software is software for producing forgeries and imitations of data (aka synthetic data, fake data). Human beings have been making fakes, with good or evil intent, of almost everything they possibly can, since the beginning of the human race. Thus, perhaps not too surprisingly, GAN software has been widely used since it was first proposed in this amazingly recent 2014 paper. To gauge how widely GAN software has been used so far, see, for example, this 2019 article entitled "18 Impressive Applications of Generative Adversarial Networks (GANs)". The forgeries cover sounds (voices, music, ...), images (realistic pictures, paintings, drawings, handwriting, ...), text, etc. They can be tweaked so that they range from being very similar to the originals, to being whimsical exaggerations thereof.

Learning DAGs with continuous optimization


As datasets continually increase in size and complexity, our ability to uncover meaningful insights from unstructured and unlabeled data is crucial. At the same time, a premium has been placed on delivering simple, human-interpretable, and trustworthy inferential models of data. Graphical models are one promising class of such models; they have been used to extract relational information from massive datasets arising from a wide variety of domains including biology, medicine, business, and finance, just to name a few. Graphical models are families of multivariate distributions with compact representations expressed as graphs. In both undirected (Markov networks) and directed (Bayesian networks) graphical models, the graph structure guides the factorization of the joint distribution into smaller local specifications such as clique potentials or local conditionals of a variable given its "parent" variables.
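To make the directed factorization concrete, here is a minimal sketch for a tiny Bayesian network over binary variables. The network (rain, sprinkler, wet), its probability values, and the function names are all hypothetical and chosen only for illustration; they do not come from the article. The joint probability is the product of each variable's local conditional given its parents.

```python
from itertools import product

# Hypothetical graph: each variable maps to the tuple of its parents.
parents = {
    "rain": (),
    "sprinkler": ("rain",),
    "wet": ("rain", "sprinkler"),
}

# Hypothetical local conditionals: P(var = True | parent values).
cpt = {
    "rain": {(): 0.2},
    "sprinkler": {(True,): 0.01, (False,): 0.4},
    "wet": {
        (True, True): 0.99,
        (True, False): 0.8,
        (False, True): 0.9,
        (False, False): 0.0,
    },
}

def joint_probability(assignment):
    """P(assignment) as the product of local conditionals P(var | parents)."""
    p = 1.0
    for var, pa in parents.items():
        pa_values = tuple(assignment[q] for q in pa)
        p_true = cpt[var][pa_values]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

def total_mass():
    """Sum the joint over all assignments; a valid factorization sums to 1."""
    names = list(parents)
    return sum(
        joint_probability(dict(zip(names, values)))
        for values in product([True, False], repeat=len(names))
    )
```

The three local tables hold 1 + 2 + 4 = 7 numbers, the same as a full joint table over three binary variables; the savings from factorization only appear in larger, sparser graphs where each variable has few parents.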

Three Modern Roles for Logic in AI

Artificial Intelligence

We consider three modern roles for logic in artificial intelligence, which are based on the theory of tractable Boolean circuits: (1) logic as a basis for computation, (2) logic for learning from a combination of data and knowledge, and (3) logic for reasoning about the behavior of machine learning systems.

Learning Bayesian Networks that enable full propagation of evidence

Machine Learning

This paper builds on recent developments in Bayesian network (BN) structure learning under the controversial assumption that the input variables are dependent. This assumption is geared towards real-world datasets that incorporate variables which are assumed to be dependent. It aims to address the problem of learning multiple disjoint subgraphs which do not enable full propagation of evidence. A novel hybrid structure learning algorithm is presented in this paper for this purpose, called SaiyanH. The results show that the algorithm discovers satisfactorily accurate connected DAGs in cases where all other algorithms produce multiple disjoint subgraphs for dependent variables. This problem is highly prevalent in cases where the sample size of the input data is low with respect to the dimensionality of the model, which is often the case when working with real data. Based on six case studies, five different sample sizes, three different evaluation metrics, and other state-of-the-art or well-established constraint-based, score-based and hybrid learning algorithms, the results rank SaiyanH 4th out of 13 algorithms for overall performance.
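The "multiple disjoint subgraphs" problem can be checked mechanically on any learned graph. The sketch below is not SaiyanH and is not taken from the paper; it is a plain union-find pass, with a hypothetical function name, that groups the nodes of a learned DAG into weakly connected components (edge direction is ignored). More than one component means evidence entered in one subgraph cannot propagate to the others.

```python
def weakly_connected_components(nodes, edges):
    """Group nodes into components, ignoring edge direction (union-find)."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v in edges:
        # Merge the components that contain the two endpoints of the edge.
        parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

For example, with nodes A to E and directed edges A→B, C→B, D→E, the function returns two components, so evidence on D or E would not reach A, B, or C.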

Learning Bayesian Networks with Low Rank Conditional Probability Tables

Neural Information Processing Systems

In this paper, we provide a method to learn the directed structure of a Bayesian network using data. The data is accessed by making conditional probability queries to a black-box model. We introduce a notion of simplicity of representation of conditional probability tables for the nodes in the Bayesian network, that we call "low rankness". We connect this notion to the Fourier transformation of real valued set functions and propose a method which learns the exact directed structure of a low rank Bayesian network using very few queries. We formally prove that our method correctly recovers the true directed structure, runs in polynomial time and only needs polynomial samples with respect to the number of nodes.

Causal datasheet: An approximate guide to practically assess Bayesian networks in the real world

Artificial Intelligence

In solving real-world problems like changing healthcare-seeking behaviors, designing interventions to improve downstream outcomes requires an understanding of the causal links within the system. Causal Bayesian Networks (BN) have been proposed as one such powerful method. In real-world applications, however, confidence in the results of BNs is often moderate at best. This is due in part to the inability to validate against some ground truth, as the DAG is not available. This is especially problematic if the learned DAG conflicts with pre-existing domain doctrine. At the policy level, one must justify insights generated by such analysis, preferably accompanying them with uncertainty estimation. Here we propose a causal extension to the datasheet concept proposed by Gebru et al. (2018) to include approximate BN performance expectations for any given dataset. To generate the results for a prototype Causal Datasheet, we constructed over 30,000 synthetic datasets with properties mirroring characteristics of real data. We then recorded the results given by state-of-the-art structure learning algorithms. These results were used to populate the Causal Datasheet, and recommendations were automatically generated dependent on expected performance. As a proof of concept, we used our Causal Datasheet Generation Tool (CDG-T) to assign performance expectations to a maternal health survey we conducted in Uttar Pradesh, India.

An Incremental Explanation of Inference in Hybrid Bayesian Networks for Increasing Model Trustworthiness and Supporting Clinical Decision Making

Artificial Intelligence

Various AI models are increasingly being considered as part of clinical decision-support tools. However, the trustworthiness of such models is rarely considered. Clinicians are more likely to use a model if they can understand and trust its predictions. Key to this is whether its underlying reasoning can be explained. A Bayesian network (BN) model has the advantage that it is not a black box and its reasoning can be explained. In this paper, we propose an incremental explanation of inference that can be applied to hybrid BNs, i.e. those that contain both discrete and continuous nodes. The key questions that we answer are: (1) which important evidence supports or contradicts the prediction, and (2) through which intermediate variables does the information flow. The explanation is illustrated using a real clinical case study. A small evaluation study is also conducted.

BARD: A structured technique for group elicitation of Bayesian networks to support analytic reasoning

Artificial Intelligence

In many complex, real-world situations, problem solving and decision making require effective reasoning about causation and uncertainty. However, human reasoning in these cases is prone to confusion and error. Bayesian networks (BNs) are an artificial intelligence technology that models uncertain situations, supporting probabilistic and causal reasoning and decision making. However, to date, BN methodologies and software require significant upfront training, do not provide much guidance on the model building process, and do not support collaboratively building BNs. BARD (Bayesian ARgumentation via Delphi) is both a methodology and an expert system that utilises (1) BNs as the underlying structured representations for better argument analysis, (2) a multi-user web-based software platform and Delphi-style social processes to assist with collaboration, and (3) short, high-quality e-courses on demand, a highly structured process to guide BN construction, and a variety of helpful tools to assist in building and reasoning with BNs, including an automated explanation tool to assist effective report writing. The result is an end-to-end online platform, with associated online training, for groups without prior BN expertise to understand and analyse a problem, build a model of its underlying probabilistic causal structure, validate and reason with the causal model, and use it to produce a written analytic report. Initial experimental results demonstrate that BARD aids in problem solving, reasoning and collaboration.