Directed Networks
Simple approximate MAP Inference for Dirichlet processes
Raykov, Yordan P., Boukouvalas, Alexis, Little, Max A.
Simple approximate MAP Inference for Dirichlet processes Yordan P. Raykov, Alexis Boukouvalas, Max A. Little October 27, 2014 Abstract The Dirichlet process mixture (DPM) is a ubiquitous, flexible Bayesian nonparametric statistical model. However, full probabilistic inference in this model is analytically intractable, so that computationally intensive techniques such as Gibb's sampling are required. As a result, DPM-based methods, which have considerable potential, are restricted to applications in which computational resources and time for inference is plentiful. For example, they would not be practical for digital signal processing on embedded hardware, where computational resources are at a serious premium. Here, we develop simplified yet statistically rigorous approximate maximum a-posteriori (MAP) inference algorithms for DPMs. This algorithm is as simple asK -means clustering, performs in experiments as well as Gibb's sampling, while requiring only a fraction of the computational effort. Unlike related small variance asymptotics, our algorithm is non-degenerate and so inherits the "rich get richer" property of the Dirichlet process. It also retains a non-degenerate closed-form likelihood which enables standard tools such as cross-validation to be used. This is a well-posed approximation to the MAP solution of the probabilistic DPM model. 1 Introduction Bayesian nonparametric (BNP) models have been successfully applied on a wide range of domains but despite significant improvements in computational hardware, statistical inference in most BNP models remains infeasible in the context of large datasets. The flexibility gained by such models is paid for with severe decreases in computational efficiency, and this makes these models somewhat impractical.
Bayesian feature selection with strongly-regularizing priors maps to the Ising Model
Fisher, Charles K., Mehta, Pankaj
Identifying small subsets of features that are relevant for prediction and/or classification tasks is a central problem in machine learning and statistics. The feature selection task is especially important, and computationally difficult, for modern datasets where the number of features can be comparable to, or even exceed, the number of samples. Here, we show that feature selection with Bayesian inference takes a universal form and reduces to calculating the magnetizations of an Ising model, under some mild conditions. Our results exploit the observation that the evidence takes a universal form for strongly-regularizing priors --- priors that have a large effect on the posterior probability even in the infinite data limit. We derive explicit expressions for feature selection for generalized linear models, a large class of statistical techniques that include linear and logistic regression. We illustrate the power of our approach by analyzing feature selection in a logistic regression-based classifier trained to distinguish between the letters B and D in the notMNIST dataset.
Variational Gaussian Process State-Space Models
Frigola, Roger, Chen, Yutian, Rasmussen, Carl E.
State-space models have been successfully used for more than fifty years in different areas of science and engineering. We present a procedure for efficient varia-tional Bayesian learning of nonlinear state-space models based on sparse Gaussian processes. The result of learning is a tractable posterior over nonlinear dynamical systems. In comparison to conventional parametric models, we offer the possibility to straightforwardly trade off model capacity and computational cost whilst avoiding overfitting. Our main algorithm uses a hybrid inference approach combining variational Bayes and sequential Monte Carlo.
Deterministic Bayesian Information Fusion and the Analysis of its Performance
Sensor networks are ubiquitous across many different domains, including wireless communications, temperature and process control, area surveillance, object tracking and numerous other fields [2, 6]. Large performance gains can be achieved in such networks by performing data fusion between the sensors, or combining information from the individual sensors to reach system-level decisions [9, 16, 24, 26]. The sensors are typically connected by wireless links to either a separate information collector (centralized fusion) or to each other (distributed fusion). Elementary fusion rules based on Boolean logic are used in many contexts due to their simplicity and ease of implementation. On the other hand, in most situations we have some knowledge of the statistical properties of the sensors' outputs, and designing fusion rules that take this into account can provide much better performance [17, 24]. The fusion rule can be built to satisfy any of various statistical optimality criteria, such as achieving the maximum likelihood or the minimum Bayes risk, under any other constraints of the problem [17].
Compressed Sensing for Energy-Efficient Wireless Telemonitoring of Noninvasive Fetal ECG via Block Sparse Bayesian Learning
Zhang, Zhilin, Jung, Tzyy-Ping, Makeig, Scott, Rao, Bhaskar D.
Fetal ECG (FECG) telemonitoring is an important branch in telemedicine. The design of a telemonitoring system via a wireless body-area network with low energy consumption for ambulatory use is highly desirable. As an emerging technique, compressed sensing (CS) shows great promise in compressing/reconstructing data with low energy consumption. However, due to some specific characteristics of raw FECG recordings such as non-sparsity and strong noise contamination, current CS algorithms generally fail in this application. This work proposes to use the block sparse Bayesian learning (BSBL) framework to compress/reconstruct non-sparse raw FECG recordings. Experimental results show that the framework can reconstruct the raw recordings with high quality. Especially, the reconstruction does not destroy the interdependence relation among the multichannel recordings. This ensures that the independent component analysis decomposition of the reconstructed recordings has high fidelity. Furthermore, the framework allows the use of a sparse binary sensing matrix with much fewer nonzero entries to compress recordings. Particularly, each column of the matrix can contain only two nonzero entries. This shows the framework, compared to other algorithms such as current CS algorithms and wavelet algorithms, can greatly reduce code execution in CPU in the data compression stage.
Extension of SBL Algorithms for the Recovery of Block Sparse Signals with Intra-Block Correlation
Zhang, Zhilin, Rao, Bhaskar D.
We examine the recovery of block sparse signals and extend the framework in two important directions; one by exploiting signals' intra-block correlation and the other by generalizing signals' block structure. We propose two families of algorithms based on the framework of block sparse Bayesian learning (BSBL). One family, directly derived from the BSBL framework, requires knowledge of the block structure. Another family, derived from an expanded BSBL framework, is based on a weaker assumption on the block structure, and can be used when the block structure is completely unknown. Using these algorithms we show that exploiting intra-block correlation is very helpful in improving recovery performance. These algorithms also shed light on how to modify existing algorithms or design new ones to exploit such correlation and improve performance.
Noisy Matrix Completion under Sparse Factor Models
Soni, Akshay, Jain, Swayambhoo, Haupt, Jarvis, Gonella, Stefano
This paper examines a general class of noisy matrix completion tasks where the goal is to estimate a matrix from observations obtained at a subset of its entries, each of which is subject to random noise or corruption. Our specific focus is on settings where the matrix to be estimated is well-approximated by a product of two (a priori unknown) matrices, one of which is sparse. Such structural models - referred to here as "sparse factor models" - have been widely used, for example, in subspace clustering applications, as well as in contemporary sparse modeling and dictionary learning tasks. Our main theoretical contributions are estimation error bounds for sparsity-regularized maximum likelihood estimators for problems of this form, which are applicable to a number of different observation noise or corruption models. Several specific implications are examined, including scenarios where observations are corrupted by additive Gaussian noise or additive heavier-tailed (Laplace) noise, Poisson-distributed observations, and highly-quantized (e.g., one-bit) observations. We also propose a simple algorithmic approach based on the alternating direction method of multipliers for these tasks, and provide experimental evidence to support our error analyses.
Risk Event and Probability Extraction for Modeling Medical Risks
Jochim, Charles (IBM Research โ ย Ireland) | Sacaleanu, Bogdan (IBM Research โ ย Ireland) | Deleris, Lรฉa A. (IBM Research โ ย Ireland)
In this paper we address the task of extracting risk events and probabilities from free text, focusing in particular on the biomedical domain. While our initial motivation is to enable the determination of the parameters of a Bayesian belief network, our approach is not specific to that use case. We are the first to investigate this task as a sequence tagging problem where we label spans of text as events A or B that are then used to construct probability statements of the form P(A|B)=x. We show that our approach significantly outperforms an entity extraction baseline on a new annotated medical risk event corpus. We also explore semi-supervised methods that lead to modest improvement, encouraging further work in this direction.
Altitude Training: Strong Bounds for Single-Layer Dropout
Wager, Stefan, Fithian, William, Wang, Sida, Liang, Percy
Dropout training, originally designed for deep neural networks, has been successful on high-dimensional single-layer natural language tasks. This paper proposes a theoretical explanation for this phenomenon: we show that, under a generative Poisson topic model with long documents, dropout training improves the exponent in the generalization bound for empirical risk minimization. Dropout achieves this gain much like a marathon runner who practices at altitude: once a classifier learns to perform reasonably well on training examples that have been artificially corrupted by dropout, it will do very well on the uncorrupted test set. We also show that, under similar conditions, dropout preserves the Bayes decision boundary and should therefore induce minimal bias in high dimensions.
Causal Inference through a Witness Protection Program
One of the most fundamental problems in causal inference is the estimation of a causal effect when variables are confounded. This is difficult in an observational study, because one has no direct evidence that all confounders have been adjusted for. We introduce a novel approach for estimating causal effects that exploits observational conditional independencies to suggest "weak" paths in a unknown causal graph. The widely used faithfulness condition of Spirtes et al. is relaxed to allow for varying degrees of "path cancellations" that imply conditional independencies but do not rule out the existence of confounding causal paths. The outcome is a posterior distribution over bounds on the average causal effect via a linear programming approach and Bayesian inference. We claim this approach should be used in regular practice along with other default tools in observational studies.