Most Relevant Explanation in Bayesian Networks

Journal of Artificial Intelligence Research

A major inference task in Bayesian networks is explaining why some variables are observed in their particular states using a set of target variables. Existing methods for solving this problem often generate explanations that are either too simple (underspecified) or too complex (overspecified). In this paper, we introduce a method called Most Relevant Explanation (MRE) which finds a partial instantiation of the target variables that maximizes the generalized Bayes factor (GBF) as the best explanation for the given evidence. Our study shows that GBF has several theoretical properties that enable MRE to automatically identify the most relevant target variables in forming its explanation. In particular, conditional Bayes factor (CBF), defined as the GBF of a new explanation conditioned on an existing explanation, provides a soft measure on the degree of relevance of the variables in the new explanation in explaining the evidence given the existing explanation. As a result, MRE is able to automatically prune less relevant variables from its explanation. We also show that CBF is able to capture well the explaining-away phenomenon that is often represented in Bayesian networks. Moreover, we define two dominance relations between the candidate solutions and use the relations to generalize MRE to find a set of top explanations that is both diverse and representative. Case studies on several benchmark diagnostic Bayesian networks show that MRE is often able to find explanatory hypotheses that are not only precise but also concise.
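
As a rough illustration of the quantity MRE maximizes, the sketch below computes the generalized Bayes factor, taken here as GBF(x; e) = P(e | x) / P(e | not x), by brute-force enumeration on a tiny network with two binary targets and one evidence variable. The network, its probabilities, and the names `joint`, `gbf`, and `candidates` are assumptions made for this example and are not taken from the paper.

```python
from itertools import product

# Hypothetical priors for two binary target variables A and B.
P_A = 0.1
P_B = 0.2

# Hypothetical conditional table P(E=1 | A=a, B=b).
P_E_GIVEN = {(0, 0): 0.01, (1, 0): 0.80, (0, 1): 0.10, (1, 1): 0.82}


def joint(a, b, e):
    """Joint probability P(A=a, B=b, E=e) of the toy network."""
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pe = P_E_GIVEN[(a, b)]
    return pa * pb * (pe if e else 1 - pe)


def gbf(partial, e=1):
    """GBF(x; e) = P(e | x) / P(e | not x) for a partial assignment x."""
    p_x = p_x_e = p_notx = p_notx_e = 0.0
    for a, b in product((0, 1), repeat=2):
        full = {"A": a, "B": b}
        mass = joint(a, b, 0) + joint(a, b, 1)
        if all(full[k] == v for k, v in partial.items()):
            p_x += mass
            p_x_e += joint(a, b, e)
        else:
            p_notx += mass
            p_notx_e += joint(a, b, e)
    return (p_x_e / p_x) / (p_notx_e / p_notx)


def candidates():
    """All non-empty partial instantiations of the targets A and B."""
    for a, b in product((None, 0, 1), repeat=2):
        x = {k: v for k, v in (("A", a), ("B", b)) if v is not None}
        if x:
            yield x


# Exhaustive search for the partial instantiation with the largest GBF.
best = max(candidates(), key=gbf)
print(best, gbf(best))
```

In this toy setting the single-variable explanation A=1 scores higher than any explanation that also fixes B, which mirrors the abstract's point that less relevant variables are pruned. Exhaustive enumeration is only feasible for very small networks.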


Most Relevant Explanation in Bayesian Networks

AAAI Conferences

A major inference task in Bayesian networks is explaining why some variables are observed in their particular states using a set of target variables. Existing methods for solving this problem often generate explanations that are either too simple (underspecified) or too complex (overspecified). In this paper, we introduce a method called Most Relevant Explanation (MRE) which finds a partial instantiation of the target variables that maximizes the generalized Bayes factor (GBF) as the best explanation for the given evidence. Our study shows that GBF has several theoretical properties that enable MRE to automatically identify the most relevant target variables in forming its explanation. In particular, conditional Bayes factor (CBF), defined as the GBF of a new explanation conditioned on an existing explanation, provides a soft measure on the degree of relevance of the variables in the new explanation in explaining the evidence given the existing explanation. As a result, MRE is able to automatically prune less relevant variables from its explanation. We also show that CBF is able to capture well the explaining-away phenomenon that is often represented in Bayesian networks. Moreover, we define two dominance relations between the candidate solutions and use the relations to generalize MRE to find a set of top explanations that is both diverse and representative. Case studies on several benchmark diagnostic Bayesian networks show that MRE is often able to find explanatory hypotheses that are not only precise but also concise.


Using Virtual Patients to Train Clinical Interviewing Skills

AAAI Conferences

Virtual patients are viewed as a cost-effective alternative to standardized patients for role-play training of clinical interviewing skills. However, training studies produce mixed results. Students give high ratings to practice with virtual patients and feel more self-confident, but they show little improvement in objective skills. This confidence-competence gap matches a common cognitive illusion, in which students overestimate the effectiveness of training that is too easy. We hypothesize that cost-effective training requires virtual patients that emphasize functional and psychological fidelity over physical fidelity. We discuss 12 design decisions aimed at cost-effective training and their application in virtual patients for practicing brief intervention in alcohol abuse. Our STAR Workshop includes 3 such patients and a virtual coach. A controlled experiment evaluated STAR and compared it to an easier E-Book and a no-training Control. E-Book subjects displayed the illusion, giving high ratings to their training and self-confidence, but performing no better than Control subjects on skills. STAR subjects gave high ratings to their training and self-confidence and scored higher than E-Book or Control subjects on skills. We invite other researchers to use the underlying Imp technology to build virtual patients for their own work.


Further Optimal Regret Bounds for Thompson Sampling

arXiv.org Machine Learning

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated that it has better empirical performance than state-of-the-art methods. In this paper, we provide a novel regret analysis for Thompson Sampling that simultaneously proves both the optimal problem-dependent bound of $(1+\epsilon)\sum_i \frac{\ln T}{\Delta_i}+O(\frac{N}{\epsilon^2})$ and the first near-optimal problem-independent bound of $O(\sqrt{NT\ln T})$ on the expected regret of this algorithm. Our near-optimal problem-independent bound solves a COLT 2012 open problem of Chapelle and Li. The optimal problem-dependent regret bound for this problem was first proven recently by Kaufmann et al. [ALT 2012]. Our novel martingale-based analysis techniques are conceptually simple, easily extend to distributions other than the Beta distribution, and also extend to the more general contextual bandits setting [Manuscript, Agrawal and Goyal, 2012].
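
For readers unfamiliar with the algorithm being analyzed, the following is a minimal sketch of Thompson Sampling for Bernoulli rewards with uniform Beta(1, 1) priors. The function name `thompson_sampling`, the arm means, the horizon, and the seed are illustrative assumptions, and the regret tracked here is the pseudo-regret of one simulated run, not the expected regret that the bounds above refer to.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling on a stochastic bandit.

    true_means: hypothetical success probability of each arm (unknown to
    the algorithm, used only to draw rewards and to measure regret).
    Returns (pseudo_regret, successes, failures).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        # Posterior update: a Beta prior is conjugate to Bernoulli rewards.
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best_mean - true_means[arm]

    return regret, successes, failures


regret, wins, losses = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
print(f"pseudo-regret after 2000 rounds: {regret:.1f}")
```

The gaps $\Delta_i$ in the problem-dependent bound correspond to `best_mean - true_means[i]` in this sketch, and $N$ and $T$ correspond to `n_arms` and `horizon`.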


Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks

Neural Information Processing Systems

The curse of dimensionality is severe when modeling high-dimensional discrete data: the number of possible combinations of the variables explodes exponentially. In this paper we propose a new architecture for modeling high-dimensional data that requires resources (parameters and computations) that grow only at most as the square of the number of variables, using a multi-layer neural network to represent the joint distribution of the variables as the product of conditional distributions. The neural network can be interpreted as a graphical model without hidden random variables, but in which the conditional distributions are tied through the hidden units. The connectivity of the neural network can be pruned by using dependency tests between the variables. Experiments on modeling the distribution of several discrete data sets show statistically significant improvements over other methods such as naive Bayes and comparable Bayesian networks, and show that significant improvements can be obtained by pruning the network.

1 Introduction

The curse of dimensionality hits particularly hard on models of high-dimensional discrete data because there are many more possible combinations of the values of the variables than can possibly be observed in any data set, even the large data sets now common in data-mining applications. In this paper we are dealing in particular with multivariate discrete data, where one tries to build a model of the distribution of the data. This can be used for example to detect anomalous cases in data-mining applications, or it can be used to model the class-conditional distribution of some observed variables in order to build a classifier. A simple multinomial maximum likelihood model would give zero probability to all of the combinations not encountered in the training set, i.e., it would most likely give zero probability to most out-of-sample test cases.
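
The zero-probability problem of the plain multinomial maximum likelihood model can be seen in a small sketch. The training set and the names `train` and `mle_prob` are made up for illustration: with three binary variables there are 8 possible configurations, and any configuration absent from the training data receives probability exactly zero.

```python
from collections import Counter
from itertools import product

# Hypothetical training set over three binary variables.
train = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (0, 0, 1)]
counts = Counter(train)


def mle_prob(x):
    """Maximum likelihood estimate: relative frequency of configuration x."""
    return counts[x] / len(train)


# Configurations that never appear in the training set get zero mass.
unseen = [x for x in product((0, 1), repeat=3) if mle_prob(x) == 0.0]
print(f"{len(unseen)} of 8 configurations have zero probability")
```

The fraction of unseen configurations grows quickly with the number of variables, since the number of configurations doubles with every added binary variable while the training set size stays fixed.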
Smoothing the model by assigning the same nonzero probability to all the unobserved cases would not be satisfactory either because it would not provide much generalization from the training set. This could be obtained by using a multivariate multinomial model whose parameters $\theta$ are estimated by the maximum a-posteriori (MAP) principle, i.e., those that have the greatest probability, given the training data D, and using a diffuse prior $P(\theta)$ (e.g.