Uncertainty
Probabilistic detection of short events, with application to critical care monitoring
Aleks, Norm, Russell, Stuart J., Madden, Michael G., Morabito, Diane, Staudenmayer, Kristan, Cohen, Mitchell, Manley, Geoffrey T.
We describe an application of probabilistic modeling and inference technology to the problem of analyzing sensor data in the setting of an intensive care unit (ICU). In particular, we consider the arterial-line blood pressure sensor, which is subject to frequent data artifacts that cause false alarms in the ICU and make the raw data almost useless for automated decision making. The problem is complicated by the fact that the sensor data are averaged over fixed intervals, whereas the events causing data artifacts may occur at any time and often have durations significantly shorter than the data collection interval. We show that careful modeling of the sensor, combined with a general technique for detecting sub-interval events and estimating their duration, enables detection of artifacts and accurate estimation of the underlying blood pressure values. In identifying artifacts, our model outperforms two other classifiers and performs about as well as a physician.
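The central difficulty here is that events can be shorter than the averaging interval. A deliberately simplified observation model makes this concrete. In this sketch we assume, as our own assumption and not the paper's sensor model, that during an artifact of duration d inside an interval of length T the sensor reads a constant level a instead of the true pressure b. The reported mean is then (1 - d/T)·b + (d/T)·a. If b and a were known, d could be recovered by inverting this average. The paper's probabilistic approach instead has to infer these quantities jointly. The function names are hypothetical.

```python
def observed_mean(true_bp, artifact_level, duration, interval):
    """Average reading over one interval containing a sub-interval artifact.

    Hypothetical model: the sensor reports `artifact_level` for `duration`
    and `true_bp` for the remainder of the interval.
    """
    frac = duration / interval
    return (1.0 - frac) * true_bp + frac * artifact_level


def estimate_duration(obs_mean, true_bp, artifact_level, interval):
    """Invert the averaging model to recover the event duration, clamped to [0, interval]."""
    if artifact_level == true_bp:
        # The artifact leaves no trace in the average, so duration is unidentifiable.
        raise ValueError("artifact level equals true pressure")
    frac = (obs_mean - true_bp) / (artifact_level - true_bp)
    return min(max(frac, 0.0), 1.0) * interval
```

Consider hypothetical values: a true pressure of 90, an artifact level of 300, and a 15-second event in a 60-second window. These give a mean of 142.5, and the inversion recovers 15 seconds from it.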
Mixed Membership Stochastic Blockmodels
Airoldi, Edo M., Blei, David M., Fienberg, Stephen E., Xing, Eric P.
In many settings, such as protein interactions and gene regulatory networks, collections of author-recipient email, and social networks, the data consist of pairwise measurements, e.g., presence or absence of links between pairs of objects. Analyzing such data with probabilistic models requires nonstandard assumptions, since the usual independence or exchangeability assumptions no longer hold. In this paper, we introduce a class of latent variable models for pairwise measurements: mixed membership stochastic blockmodels. Models in this class combine a global model of dense patches of connectivity (blockmodel) with a local model to instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodels with applications to social networks and protein interaction networks.
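Sampling from the generative process, as it is commonly stated for MMSB, makes the mixed membership idea concrete:

- Each node p draws a membership vector π_p ~ Dirichlet(α).
- For each ordered pair (p, q), p picks a sender role from π_p and q picks a receiver role from π_q.
- The link is then Bernoulli with probability B[role_p][role_q].

The sketch below uses only the standard library. The names `alpha` and `B` are our notation. It covers sampling only, not the variational inference algorithm.

```python
import random


def dirichlet(alpha, rng):
    """Sample a probability vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(probs, rng):
    """Sample an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def sample_mmsb(n_nodes, alpha, B, rng):
    """Sample memberships and a directed 0/1 adjacency matrix from the MMSB process."""
    pis = [dirichlet(alpha, rng) for _ in range(n_nodes)]   # node-specific mixed memberships
    Y = [[0] * n_nodes for _ in range(n_nodes)]
    for p in range(n_nodes):
        for q in range(n_nodes):
            if p == q:
                continue                                     # no self-links
            z_send = categorical(pis[p], rng)                # role p adopts toward q
            z_recv = categorical(pis[q], rng)                # role q adopts toward p
            Y[p][q] = int(rng.random() < B[z_send][z_recv])  # block-level link probability
    return pis, Y
```

Each pair draws fresh role indicators, so the same node can act in different blocks toward different partners. A single-membership blockmodel cannot express this.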
The Gaussian Process Density Sampler
Murray, Iain, MacKay, David, Adams, Ryan P.
We present the Gaussian Process Density Sampler (GPDS), an exchangeable generative model for use in nonparametric Bayesian density estimation. Samples drawn from the GPDS are consistent with exact, independent samples from a fixed density function that is a transformation of a function drawn from a Gaussian process prior. Our formulation allows us to infer an unknown density from data using Markov chain Monte Carlo, which gives samples from the posterior distribution over density functions and from the predictive distribution on data space. We can also infer the hyperparameters of the Gaussian process. We compare this density modeling technique to several existing techniques on a toy problem and a skull-reconstruction task.
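As we read it, the GPDS defines a density proportional to π(x)·Φ(g(x)). Here π is a base density, g is a draw from a Gaussian process, and Φ squashes g into (0, 1). Exact samples can then be drawn by rejection:

1. Propose x from π.
2. Sample g(x) conditioned on every value of g drawn so far.
3. Accept x with probability Φ(g(x)).

The sketch assumes a standard normal base density, a squared-exponential kernel with unit hyperparameters, a logistic Φ, and a small jitter term for numerical stability. The jitter makes the conditioning very slightly approximate. None of these choices come from the abstract.

```python
import math
import random


def se_kernel(a, b, lengthscale=1.0, amplitude=1.0):
    """Squared-exponential covariance (an assumed kernel choice)."""
    return amplitude ** 2 * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gpds_sample(n_samples, rng, jitter=1e-6):
    """Rejection-sample from pi(x) * sigmoid(g(x)) / Z with g ~ GP(0, k).

    Every proposal, accepted or rejected, fixes g at that point, so later
    draws are conditioned on all earlier ones (one consistent function g).
    """
    xs, gs, accepted = [], [], []
    while len(accepted) < n_samples:
        x = rng.gauss(0.0, 1.0)                  # proposal from base density pi = N(0, 1)
        if xs:
            K = [[se_kernel(a, b) + (jitter if i == j else 0.0)
                  for j, b in enumerate(xs)] for i, a in enumerate(xs)]
            k = [se_kernel(a, x) for a in xs]
            w = solve(K, k)                      # K^{-1} k
            mean = sum(wi * gi for wi, gi in zip(w, gs))
            var = max(se_kernel(x, x) - sum(wi * ki for wi, ki in zip(w, k)), 1e-12)
        else:
            mean, var = 0.0, se_kernel(x, x)
        g = rng.gauss(mean, math.sqrt(var))      # GP value at x given previous values
        xs.append(x)
        gs.append(g)
        if rng.random() < 1.0 / (1.0 + math.exp(-g)):  # accept with probability sigmoid(g(x))
            accepted.append(x)
    return accepted
```

Rejected points must stay in the conditioning set. They pin down g at those locations, so all accepted samples come from one function rather than from independently redrawn ones.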
Nonparametric Bayesian Texture Learning and Synthesis
Zhu, Long, Chen, Yuanhao, Freeman, Bill, Torralba, Antonio
We present a nonparametric Bayesian method for texture learning and synthesis. A texture image is represented by a 2D Hidden Markov Model (2D-HMM) in which the hidden states correspond to the cluster labels of textons and the transition matrix encodes their spatial layout (the compatibility between adjacent textons). The 2D-HMM is coupled with the Hierarchical Dirichlet process (HDP), which allows the number of textons and the complexity of the transition matrix to grow as the input texture becomes more irregular. The HDP uses a Dirichlet process prior, which favors regular textures by penalizing model complexity. This framework (HDP-2D-HMM) learns the texton vocabulary and its spatial layout jointly and automatically. The HDP-2D-HMM yields a compact representation of textures that allows fast texture synthesis, with rendering quality comparable to state-of-the-art image-based rendering methods. We also show that the HDP-2D-HMM can be applied to image segmentation and synthesis.
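Once a transition table is given, a 2D-HMM over texton labels can be sampled in raster order. The sketch below assumes a causal neighborhood in which each cell's label depends only on its left and upper neighbors. That neighborhood, the table format, and the function names are our assumptions. The sketch omits the HDP prior that lets the paper learn both the number of textons and the table. As a usage example, a hand-made two-state table reproduces a checkerboard.

```python
import random


def synthesize_labels(height, width, n_states, trans, rng):
    """Raster-scan sampling of a grid of texton labels from a causal 2D-HMM.

    trans[(left, up)] lists n_states probabilities for the current cell;
    None stands for a missing neighbor at the image border.
    """
    grid = [[None] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            left = grid[i][j - 1] if j > 0 else None
            up = grid[i - 1][j] if i > 0 else None
            grid[i][j] = rng.choices(range(n_states), weights=trans[(left, up)])[0]
    return grid


def checkerboard_table():
    """Hypothetical deterministic table: each label is the opposite of its neighbor."""
    table = {}
    for left in (None, 0, 1):
        for up in (None, 0, 1):
            if left is not None:
                nxt = 1 - left
            elif up is not None:
                nxt = 1 - up
            else:
                nxt = 0
            table[(left, up)] = [1.0 - nxt, float(nxt)]
    return table
```

A learned table with softer probabilities would yield stochastic but spatially coherent layouts. The texton appearance is then rendered per label.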
Hierarchical Mixture of Classification Experts Uncovers Interactions between Brain Regions
Yao, Bangpeng, Walther, Dirk, Beck, Diane, Fei-Fei, Li
The human brain can be described as containing a number of functional regions. For a given task, these regions, as well as the connections between them, play a key role in information processing in the brain. However, most existing multi-voxel pattern analysis approaches either treat multiple functional regions as one large uniform region or as several independent regions, ignoring the connections between regions. In this paper, we propose to model such connections in a Hidden Conditional Random Field (HCRF) framework, where the classifier of one region of interest (ROI) makes predictions based not only on its own voxels but also on the classifier predictions from the ROIs it connects to. Furthermore, we propose a structural learning method in the HCRF framework to automatically uncover the connections between ROIs. Experiments on fMRI data acquired while human subjects viewed images of natural scenes show that our model can improve both top-level (the classifier combining information from all ROIs) and ROI-level prediction accuracy, as well as uncover some meaningful connections between ROIs.
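A feed-forward toy can illustrate the coupling described above, in which an ROI's classifier sees its own voxels plus the predictions of the ROIs it connects to. This is not HCRF inference. We assume logistic classifiers, an acyclic connection graph evaluated in a fixed `order`, and hand-set weights, all of them hypothetical. The paper's model learns the structure and handles these quantities within an HCRF. A `top` node whose parents are all ROIs plays the role of the top-level classifier.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def roi_predictions(features, weights, bias, parents, coupling, order):
    """Class probability per ROI from its own voxels plus connected ROIs' predictions.

    ROIs are processed in `order`, which must list every parent before its children.
    """
    probs = {}
    for roi in order:
        score = bias[roi] + sum(w * x for w, x in zip(weights[roi], features[roi]))
        for src in parents.get(roi, []):
            score += coupling[(src, roi)] * probs[src]   # influence of a connected ROI
        probs[roi] = sigmoid(score)
    return probs
```

Setting a coupling weight to zero removes that connection. Structure learning in the paper's sense amounts to deciding which couplings should be present.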
Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units
Yan, Feng, Xu, Ningyi, Qi, Yuan
The recent emergence of Graphics Processing Units (GPUs) as general-purpose parallel computing devices provides us with new opportunities to develop scalable learning methods for massive data. In this work, we consider the problem of parallelizing two inference methods for latent Dirichlet allocation (LDA) models on GPUs: collapsed Gibbs sampling (CGS) and collapsed variational Bayesian (CVB) inference. To address the limited memory available on GPUs, we propose a novel data partitioning scheme that effectively reduces the memory cost. Furthermore, the partitioning scheme balances the computational cost across multiprocessors and enables us to easily avoid memory access conflicts. We also use data streaming to handle extremely large datasets. Extensive experiments showed that our parallel inference methods consistently produced LDA models with the same predictive power as sequential training methods, but with a 26x speedup for CGS and a 196x speedup for CVB on a GPU with 30 multiprocessors; moreover, the speedup scales almost linearly with the number of available multiprocessors. The proposed partitioning scheme and data streaming can easily be ported to many other models in machine learning.
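The sketch below shows one common way to obtain a conflict-free partitioning for parallel collapsed Gibbs sampling. It is an illustration, not the paper's exact scheme.

- Split documents into P groups and the vocabulary into P groups, giving P-by-P blocks of tokens.
- In round l, worker p processes block (p, (p + l) mod P). No two workers in a round touch the same document-topic or word-topic counts.
- The per-topic totals are still shared and need separate handling, for example local copies merged after each round.

Balancing groups by token count keeps the work per worker roughly equal. Here we use a simple greedy heuristic of our own choosing. All function names are hypothetical.

```python
def diagonal_schedule(P):
    """P rounds of (doc_group, word_group) blocks; worker p handles entry p of each round."""
    return [[(p, (p + l) % P) for p in range(P)] for l in range(P)]


def is_conflict_free(schedule):
    """True if, within every round, no document group or word group is used twice."""
    for rnd in schedule:
        docs = [d for d, _ in rnd]
        words = [w for _, w in rnd]
        if len(set(docs)) != len(docs) or len(set(words)) != len(words):
            return False
    return True


def balanced_groups(sizes, P):
    """Greedily assign items (documents or words) to P groups, largest first,
    always into the currently lightest group, to even out token counts."""
    loads = [0] * P
    groups = [[] for _ in range(P)]
    for idx in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        g = loads.index(min(loads))
        groups[g].append(idx)
        loads[g] += sizes[idx]
    return groups, loads
```

After P rounds, every one of the P-by-P blocks has been processed exactly once, which completes one sweep over the corpus.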
A Neural Implementation of the Kalman Filter
There is a growing body of experimental evidence to suggest that the brain is capable of approximating optimal Bayesian inference in the face of noisy input stimuli. Despite this progress, the neural underpinnings of this computation are still poorly understood. In this paper, we focus on the problem of Bayesian filtering of stochastic time series. In particular, we introduce a novel neural network, derived from a line attractor architecture, whose dynamics map directly onto those of the Kalman Filter in the limit where the prediction error is small. When the prediction error is large, we show that the network responds robustly to change-points in a way that is qualitatively compatible with the optimal Bayesian model. The model suggests ways in which probability distributions are encoded in the brain and makes a number of testable experimental predictions.
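For reference, the network is said to match the standard Kalman recursion when the prediction error is small. Below is a textbook scalar filter for the dynamics x_t = a·x_{t-1} + noise with variance q, observed as y_t = x_t + noise with variance r. The parameter values are illustrative, and this is the classical filter, not the proposed neural network.

```python
def kalman_1d(observations, a=1.0, q=0.01, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter; returns the filtered means and variances."""
    x, p = x0, p0
    means, variances = [], []
    for y in observations:
        # Predict: propagate the mean and inflate the variance by process noise.
        x_pred = a * x
        p_pred = a * a * p + q
        # Update: the gain k weights the prediction error (innovation) y - x_pred.
        k = p_pred / (p_pred + r)
        x = x_pred + k * (y - x_pred)
        p = (1.0 - k) * p_pred
        means.append(x)
        variances.append(p)
    return means, variances
```

The update is linear in the prediction error. A large, sudden error caused by a change-point is exactly where the abstract says the network departs from this linear behavior.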
Sequential effects reflect parallel learning of multiple environmental regularities
Wilder, Matthew, Jones, Matt, Mozer, Michael C.
Across a wide range of cognitive tasks, recent experience influences behavior. For example, when individuals repeatedly perform a simple two-alternative forced-choice task (2AFC), response latencies vary dramatically based on the immediately preceding trial sequence. These sequential effects have been interpreted as adaptation to the statistical structure of an uncertain, changing environment (e.g., Jones and Sieck, 2003; Mozer, Kinoshita, and Shettel, 2007; Yu and Cohen, 2008). The Dynamic Belief Model (DBM) (Yu and Cohen, 2008) explains sequential effects in 2AFC tasks as a rational consequence of a dynamic internal representation that tracks second-order statistics of the trial sequence (repetition rates) and predicts whether the upcoming trial will be a repetition or an alternation of the previous trial. Experimental results suggest that first-order statistics (base rates) also influence sequential effects. We propose a model that learns both first- and second-order sequence properties, each according to the basic principles of the DBM but under a unified inferential framework. This model, the Dynamic Belief Mixture Model (DBM2), obtains precise, parsimonious fits to data. Furthermore, the model predicts dissociations in behavioral (Maloney, Martello, Sahm, and Spillmann, 2005) and electrophysiological studies (Jentzsch and Sommer, 2002), supporting the psychological and neurobiological reality of its two components.
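The DBM update referred to above can be sketched on a discretized repetition rate γ, following the DBM as we understand it from Yu and Cohen (2008). Before each trial, the belief over γ is a mixture of two parts: the previous posterior, with weight α, and a fixed prior, with weight 1 - α. The predicted probability of a repetition is the mean of that belief. The uniform fixed prior, α = 0.8, and the grid size are our assumptions. DBM2 would run a similar tracker on first-order statistics in parallel; that part is not shown.

```python
def dbm_predictions(stimuli, alpha=0.8, grid_size=101):
    """Predicted probability that each trial repeats the previous one,
    using a grid approximation of the Dynamic Belief Model.

    stimuli: sequence of trial outcomes (e.g. 0/1); one prediction per trial after the first.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]   # candidate repetition rates
    p0 = [1.0 / grid_size] * grid_size                       # assumed uniform fixed prior
    post = p0[:]
    preds = []
    for prev, cur in zip(stimuli, stimuli[1:]):
        # The environment may have changed: mix the last posterior with the fixed prior.
        prior = [alpha * p + (1.0 - alpha) * b for p, b in zip(post, p0)]
        preds.append(sum(g * p for g, p in zip(grid, prior)))  # P(repetition)
        rep = prev == cur
        unnorm = [p * (g if rep else 1.0 - g) for g, p in zip(grid, prior)]
        z = sum(unnorm)
        post = [u / z for u in unnorm]
    return preds
```

Because α < 1, old evidence is continually discounted. Predictions therefore track recent runs of repetitions or alternations rather than the whole history.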
Variational Inference for the Nested Chinese Restaurant Process
The nested Chinese restaurant process (nCRP) is a powerful nonparametric Bayesian model for learning tree-based hierarchies from data. Since its posterior distribution is intractable, current inference methods have all relied on MCMC sampling. In this paper, we develop an alternative inference technique based on variational methods. To employ variational methods, we derive a tree-based stick-breaking construction of the nCRP mixture model, and a novel variational algorithm that efficiently explores a posterior over a large set of combinatorial structures. We demonstrate this approach on modeling text and handwritten digits, where we show that the nCRP can also be adapted to continuous data.
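The paper's variational method works with a stick-breaking construction, but the nCRP prior it approximates is easiest to see generatively. Each data point chooses a root-to-leaf path of fixed depth. At every node, it follows an existing branch with probability proportional to how many earlier points took that branch. Alternatively, it opens a new branch with probability proportional to a concentration parameter γ. The sketch below samples such paths. The depth, `gamma`, and the tuple encoding of nodes are our choices, and the sketch does not implement the variational algorithm.

```python
import random


def ncrp_paths(n_customers, depth, gamma, rng):
    """Sample root-to-leaf paths of length `depth` from a nested CRP.

    A node is identified by the tuple of child indices leading to it.
    """
    counts = {}   # node tuple -> list of customer counts per child
    paths = []
    for _ in range(n_customers):
        node = ()
        for _ in range(depth):
            children = counts.setdefault(node, [])
            weights = children + [gamma]        # existing branches, then a new one
            k = rng.choices(range(len(weights)), weights=weights)[0]
            if k == len(children):
                children.append(0)              # open a new branch
            children[k] += 1
            node = node + (k,)
        paths.append(node)
    return paths
```

Popular branches attract more points (a rich-get-richer effect), while γ controls how readily the tree grows new branches at every level.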
Rethinking LDA: Why Priors Matter
Wallach, Hanna M., Mimno, David M., McCallum, Andrew
Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such "smoothing parameters" have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document-topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic-word distributions provides no real benefit. Approximating this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling.
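The standard collapsed Gibbs conditional for LDA shows where an asymmetric document-topic prior enters. This version uses a per-topic α_k and a symmetric β, matching the configuration the abstract recommends:

p(z = k | rest) ∝ (n_dk + α_k)(n_kw + β) / (n_k + Wβ)

All counts exclude the current token, and W is the vocabulary size. The function name and argument layout are ours. The hyperparameter optimization steps the abstract mentions are not shown.

```python
def topic_conditional(n_dk, n_kw, n_k, alpha, beta, vocab_size):
    """Normalized p(z = k | rest) for one token in collapsed Gibbs sampling for LDA.

    n_dk: per-topic counts in the token's document (current token excluded)
    n_kw: per-topic counts of the token's word type (current token excluded)
    n_k: total tokens assigned to each topic (current token excluded)
    alpha: per-topic document-topic concentrations (asymmetric if entries differ)
    beta: symmetric topic-word concentration
    """
    weights = [(d + a) * (w + beta) / (t + vocab_size * beta)
               for d, w, t, a in zip(n_dk, n_kw, n_k, alpha)]
    total = sum(weights)
    return [x / total for x in weights]
```

When the document is empty and the word statistics are equal across topics, the conditional reduces to the normalized α. A larger α_k therefore makes topic k more common across documents a priori.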