Bayesian Learning
Active Learning for Undirected Graphical Model Selection
Vats, Divyanshu, Nowak, Robert D., Baraniuk, Richard G.
Conventional graphical model selection algorithms are passive, i.e., they require all the measurements to have been collected before processing begins. We propose an active learning algorithm that uses junction tree representations to adapt future measurements based on the information gathered from prior measurements. We prove that, under certain conditions, our active learning algorithm requires fewer scalar measurements than any passive algorithm to reliably estimate a graph. A range of numerical results validate our theory and demonstrates the benefits of active learning.
Open problem: Tightness of maximum likelihood semidefinite relaxations
Bandeira, Afonso S., Khoo, Yuehaw, Singer, Amit
We have observed an interesting, yet unexplained, phenomenon: Semidefinite programming (SDP) based relaxations of maximum likelihood estimators (MLE) tend to be tight in recovery problems with noisy data, even when MLE cannot exactly recover the ground truth. Several results establish tightness of SDP based relaxations in the regime where exact recovery from MLE is possible. However, to the best of our knowledge, their tightness is not understood beyond this regime. As an illustrative example, we focus on the generalized Procrustes problem.
A comparison of linear and non-linear calibrations for speaker recognition
Brümmer, Niko, Swart, Albert, van Leeuwen, David
In recent work on both generative and discriminative score to log-likelihood-ratio calibration, it was shown that linear transforms give good accuracy only for a limited range of operating points. Moreover, these methods required tailoring of the calibration training objective functions in order to target the desired region of best accuracy. Here, we generalize the linear recipes to non-linear ones. We experiment with a non-linear, non-parametric, discriminative PAV solution, as well as parametric, generative, maximum-likelihood solutions that use Gaussian, Student's T and normal-inverse-Gaussian score distributions. Experiments on NIST SRE'12 scores suggest that the non-linear methods provide wider ranges of optimal accuracy and can be trained without having to resort to objective function tailoring.
Data mining for censored time-to-event data: A Bayesian network model for predicting cardiovascular risk from electronic health record data
Bandyopadhyay, Sunayan, Wolfson, Julian, Vock, David M., Vazquez-Benitez, Gabriela, Adomavicius, Gediminas, Elidrisi, Mohamed, Johnson, Paul E., O'Connor, Patrick J.
Models for predicting the risk of cardiovascular events based on individual patient characteristics are important tools for managing patient care. Most current and commonly used risk prediction models have been built from carefully selected epidemiological cohorts. However, the homogeneity and limited size of such cohorts restricts the predictive power and generalizability of these risk models to other populations. Electronic health data (EHD) from large health care systems provide access to data on large, heterogeneous, and contemporaneous patient populations. The unique features and challenges of EHD, including missing risk factor information, non-linear relationships between risk factors and cardiovascular event outcomes, and differing effects from different patient subgroups, demand novel machine learning approaches to risk model development. In this paper, we present a machine learning approach based on Bayesian networks trained on EHD to predict the probability of having a cardiovascular event within five years. In such data, event status may be unknown for some individuals as the event time is right-censored due to disenrollment and incomplete follow-up. Since many traditional data mining methods are not well-suited for such data, we describe how to modify both modelling and assessment techniques to account for censored observation times. We show that our approach can lead to better predictive performance than the Cox proportional hazards model (i.e., a regression-based approach commonly used for censored, time-to-event data) or a Bayesian network with {\em{ad hoc}} approaches to right-censoring. Our techniques are motivated by and illustrated on data from a large U.S. Midwestern health care system.
A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data
Wolfson, Julian, Bandyopadhyay, Sunayan, Elidrisi, Mohamed, Vazquez-Benitez, Gabriela, Musgrove, Donald, Adomavicius, Gediminas, Johnson, Paul, O'Connor, Patrick
Predicting an individual's risk of experiencing a future clinical outcome is a statistical task with important consequences for both practicing clinicians and public health experts. Modern observational databases such as electronic health records (EHRs) provide an alternative to the longitudinal cohort studies traditionally used to construct risk models, bringing with them both opportunities and challenges. Large sample sizes and detailed covariate histories enable the use of sophisticated machine learning techniques to uncover complex associations and interactions, but observational databases are often ``messy,'' with high levels of missing data and incomplete patient follow-up. In this paper, we propose an adaptation of the well-known Naive Bayes (NB) machine learning approach for classification to time-to-event outcomes subject to censoring. We compare the predictive performance of our method to the Cox proportional hazards model which is commonly used for risk prediction in healthcare populations, and illustrate its application to prediction of cardiovascular risk using an EHR dataset from a large Midwest integrated healthcare system.
Pseudo-Marginal Bayesian Inference for Gaussian Processes
Filippone, Maurizio, Girolami, Mark
The main challenges that arise when adopting Gaussian Process priors in probabilistic modeling are how to carry out exact Bayesian inference and how to account for uncertainty on model parameters when making model-based predictions on out-of-sample data. Using probit regression as an illustrative working example, this paper presents a general and effective methodology based on the pseudo-marginal approach to Markov chain Monte Carlo that efficiently addresses both of these issues. The results presented in this paper show improvements over existing sampling methods to simulate from the posterior distribution over the parameters defining the covariance function of the Gaussian Process prior. This is particularly important as it offers a powerful tool to carry out full Bayesian inference of Gaussian Process based hierarchic statistical models in general. The results also demonstrate that Monte Carlo based integration of all model parameters is actually feasible in this class of models providing a superior quantification of uncertainty in predictions. Extensive comparisons with respect to state-of-the-art probabilistic classifiers confirm this assertion.
Structural Intervention Distance (SID) for Evaluating Causal Graphs
Peters, Jonas, Bühlmann, Peter
Causal inference relies on the structure of a graph, often a directed acyclic graph (DAG). Different graphs may result in different causal inference statements and different intervention distributions. To quantify such differences, we propose a (pre-) distance between DAGs, the structural intervention distance (SID). The SID is based on a graphical criterion only and quantifies the closeness between two DAGs in terms of their corresponding causal inference statements. It is therefore well-suited for evaluating graphs that are used for computing interventions. Instead of DAGs it is also possible to compare CPDAGs, completed partially directed acyclic graphs that represent Markov equivalence classes. Since it differs significantly from the popular Structural Hamming Distance (SHD), the SID constitutes a valuable additional measure. We discuss properties of this distance and provide an efficient implementation with software code available on the first author's homepage (an R package is under construction).
Inapproximability of Treewidth and Related Problems
Wu, Y., Austrin, P., Pitassi, T., Liu, D.
Graphical models, such as Bayesian Networks and Markov networks play an important role in artificial intelligence and machine learning. Inference is a central problem to be solved on these networks. This, and other problems on these graph models are often known to be hard to solve in general, but tractable on graphs with bounded Treewidth. Therefore, finding or approximating the Treewidth of a graph is a fundamental problem related to inference in graphical models. In this paper, we study the approximability of a number of graph problems: Treewidth and Pathwidth of graphs, Minimum Fill-In, One-Shot Black (and Black-White) pebbling costs of directed acyclic graphs, and a variety of different graph layout problems such as Minimum Cut Linear Arrangement and Interval Graph Completion. We show that, assuming the recently introduced Small Set Expansion Conjecture, all of these problems are NP-hard to approximate to within any constant factor in polynomial time.
An Efficient Search Strategy for Aggregation and Discretization of Attributes of Bayesian Networks Using Minimum Description Length
Corcoran, Jem, Tran, Daniel, Levine, Nicholas
Bayesian networks are convenient graphical expressions for high dimensional probability distributions representing complex relationships between a large number of random variables. They have been employed extensively in areas such as bioinformatics, artificial intelligence, diagnosis, and risk management. The recovery of the structure of a network from data is of prime importance for the purposes of modeling, analysis, and prediction. Most recovery algorithms in the literature assume either discrete of continuous but Gaussian data. For general continuous data, discretization is usually employed but often destroys the very structure one is out to recover. Friedman and Goldszmidt suggest an approach based on the minimum description length principle that chooses a discretization which preserves the information in the original data set, however it is one which is difficult, if not impossible, to implement for even moderately sized networks. In this paper we provide an extremely efficient search strategy which allows one to use the Friedman and Goldszmidt discretization in practice.
Constrained speaker linking
van Leeuwen, David A., Brümmer, Niko
In this paper we study speaker linking (a.k.a.\ partitioning) given constraints of the distribution of speaker identities over speech recordings. Specifically, we show that the intractable partitioning problem becomes tractable when the constraints pre-partition the data in smaller cliques with non-overlapping speakers. The surprisingly common case where speakers in telephone conversations are known, but the assignment of channels to identities is unspecified, is treated in a Bayesian way. We show that for the Dutch CGN database, where this channel assignment task is at hand, a lightweight speaker recognition system can quite effectively solve the channel assignment problem, with 93% of the cliques solved. We further show that the posterior distribution over channel assignment configurations is well calibrated.