Bayesian Learning
A Model for Quality of Schooling
Moussavi, Massoud (Causal Links, LLC) | McGinn, Noel (Causal Links, LLC)
A key challenge for policymakers in many developing countries is to decide which intervention or collection of interventions works best to improve learning outcomes in their schools. Our aim is to develop a causal model that explains student learning outcomes in terms of observable characteristics as well as conditions and processes difficult to observe directly. We start with a theoretical model based on the results of previous research, direct experience and experts’ knowledge in the field. This model is then refined through application of supervised learning methods to available data sets. Once calibrated with local data in a country, the model estimates the probability that a given intervention would affect learning outcomes.
Join-Graph Propagation Algorithms
Mateescu, R., Kask, K., Gogate, V., Dechter, R.
The paper investigates parameterized approximate message-passing schemes that are based on bounded inference and are inspired by Pearls belief propagation algorithm (BP). We start with the bounded inference mini-clustering algorithm and then move to the iterative scheme called Iterative Join-Graph Propagation (IJGP), that combines both iteration and bounded inference. Algorithm IJGP belongs to the class of Generalized Belief Propagation algorithms, a framework that allowed connections with approximate algorithms from statistical physics and is shown empirically to surpass the performance of mini-clustering and belief propagation, as well as a number of other state-of-the-art algorithms on several classes of networks. We also provide insight into the accuracy of iterative BP and IJGP by relating these algorithms to well known classes of constraint propagation schemes.
What does Newcomb's paradox teach us?
Wolpert, David H., Benford, Gregory
In Newcomb's paradox you choose to receive either the contents of a particular closed box, or the contents of both that closed box and another one. Before you choose, a prediction algorithm deduces your choice, and fills the two boxes based on that deduction. Newcomb's paradox is that game theory appears to provide two conflicting recommendations for what choice you should make in this scenario. We analyze Newcomb's paradox using a recent extension of game theory in which the players set conditional probability distributions in a Bayes net. We show that the two game theory recommendations in Newcomb's scenario have different presumptions for what Bayes net relates your choice and the algorithm's prediction. We resolve the paradox by proving that these two Bayes nets are incompatible. We also show that the accuracy of the algorithm's prediction, the focus of much previous work, is irrelevant. In addition we show that Newcomb's scenario only provides a contradiction between game theory's expected utility and dominance principles if one is sloppy in specifying the underlying Bayes net. We also show that Newcomb's paradox is time-reversal invariant; both the paradox and its resolution are unchanged if the algorithm makes its `prediction' after you make your choice rather than before.
Supervised Topic Models
Blei, David M., McAuliffe, Jon D.
We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive an approximate maximum-likelihood procedure for parameter estimation, which relies on variational methods to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and the political tone of amendments in the U.S. Senate based on the amendment text. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression.
Feature Importance in Bayesian Assessment of Newborn Brain Maturity from EEG
Jakaite, L., Schetinin, V., Maple, C.
The methodology of Bayesian Model Averaging (BMA) is applied for assessment of newborn brain maturity from sleep EEG. In theory this methodology provides the most accurate assessments of uncertainty in decisions. However, the existing BMA techniques have been shown providing biased assessments in the absence of some prior information enabling to explore model parameter space in details within a reasonable time. The lack in details leads to disproportional sampling from the posterior distribution. In case of the EEG assessment of brain maturity, BMA results can be biased because of the absence of information about EEG feature importance. In this paper we explore how the posterior information about EEG features can be used in order to reduce a negative impact of disproportional sampling on BMA performance. We use EEG data recorded from sleeping newborns to test the efficiency of the proposed BMA technique.
Syntactic Topic Models
Boyd-Graber, Jordan, Blei, David M.
The syntactic topic model (STM) is a Bayesian nonparametric model of language that discovers latent distributions of words (topics) that are both semantically and syntactically coherent. The STM models dependency parsed corpora where sentences are grouped into documents. It assumes that each word is drawn from a latent topic chosen by combining document-level features and the local syntactic context. Each document has a distribution over latent topics, as in topic models, which provides the semantic consistency. Each element in the dependency parse tree also has a distribution over the topics of its children, as in latent-state syntax models, which provides the syntactic consistency. These distributions are convolved so that the topic of each word is likely under both its document and syntactic context. We derive a fast posterior inference algorithm based on variational methods. We report qualitative and quantitative studies on both synthetic data and hand-parsed documents. We show that the STM is a more predictive model of language than current models based only on syntax or only on topics.
Convergence of Bayesian Control Rule
Ortega, Pedro A., Braun, Daniel A.
Recently, new approaches to adaptive control have sought to reformulate the problem as a minimization of a relative entropy criterion to obtain tractable solutions. In particular, it has been shown that minimizing the expected deviation from the causal input-output dependencies of the true plant leads to a new promising stochastic control rule called the Bayesian control rule. This work proves the convergence of the Bayesian control rule under two sufficient assumptions: boundedness, which is an ergodicity condition; and consistency, which is an instantiation of the sure-thing principle.
Modeling of Human Criminal Behavior using Probabilistic Networks
Pillai, Ramesh Kumar Gopala, P, Dr. Ramakanth Kumar .
Currently, criminal's profile (CP) is obtained from investigator's or forensic psychologist's interpretation, linking crime scene characteristics and an offender's behavior to his or her characteristics and psychological profile. This paper seeks an efficient and systematic discovery of non-obvious and valuable patterns between variables from a large database of solved cases via a probabilistic network (PN) modeling approach. The PN structure can be used to extract behavioral patterns and to gain insight into what factors influence these behaviors. Thus, when a new case is being investigated and the profile variables are unknown because the offender has yet to be identified, the observed crime scene variables are used to infer the unknown variables based on their connections in the structure and the corresponding numerical (probabilistic) weights. The objective is to produce a more systematic and empirical approach to profiling, and to use the resulting PN model as a decision tool.
A Generalization of the Chow-Liu Algorithm and its Application to Statistical Learning
We extend the Chow-Liu algorithm for general random variables while the previous versions only considered finite cases. In particular, this paper applies the generalization to Suzuki's learning algorithm that generates from data forests rather than trees based on the minimum description length by balancing the fitness of the data to the forest and the simplicity of the forest. As a result, we successfully obtain an algorithm when both of the Gaussian and finite random variables are present.
Confidence Sets Based on Penalized Maximum Likelihood Estimators in Gaussian Regression
Pötscher, Benedikt M., Schneider, Ulrike
Confidence intervals based on penalized maximum likelihood estimators such as the LASSO, adaptive LASSO, and hard-thresholding are analyzed. In the known-variance case, the finite-sample coverage properties of such intervals are determined and it is shown that symmetric intervals are the shortest. The length of the shortest intervals based on the hard-thresholding estimator is larger than the length of the shortest interval based on the adaptive LASSO, which is larger than the length of the shortest interval based on the LASSO, which in turn is larger than the standard interval based on the maximum likelihood estimator. In the case where the penalized estimators are tuned to possess the `sparsity property', the intervals based on these estimators are larger than the standard interval by an order of magnitude. Furthermore, a simple asymptotic confidence interval construction in the `sparse' case, that also applies to the smoothly clipped absolute deviation estimator, is discussed. The results for the known-variance case are shown to carry over to the unknown-variance case in an appropriate asymptotic sense.