Bayesian Learning
Compressed Sensing for Energy-Efficient Wireless Telemonitoring of Noninvasive Fetal ECG via Block Sparse Bayesian Learning
Zhang, Zhilin, Jung, Tzyy-Ping, Makeig, Scott, Rao, Bhaskar D.
Fetal ECG (FECG) telemonitoring is an important branch in telemedicine. The design of a telemonitoring system via a wireless body-area network with low energy consumption for ambulatory use is highly desirable. As an emerging technique, compressed sensing (CS) shows great promise in compressing/reconstructing data with low energy consumption. However, due to some specific characteristics of raw FECG recordings such as non-sparsity and strong noise contamination, current CS algorithms generally fail in this application. This work proposes to use the block sparse Bayesian learning (BSBL) framework to compress/reconstruct non-sparse raw FECG recordings. Experimental results show that the framework can reconstruct the raw recordings with high quality. Especially, the reconstruction does not destroy the interdependence relation among the multichannel recordings. This ensures that the independent component analysis decomposition of the reconstructed recordings has high fidelity. Furthermore, the framework allows the use of a sparse binary sensing matrix with much fewer nonzero entries to compress recordings. Particularly, each column of the matrix can contain only two nonzero entries. This shows the framework, compared to other algorithms such as current CS algorithms and wavelet algorithms, can greatly reduce code execution in CPU in the data compression stage.
Extension of SBL Algorithms for the Recovery of Block Sparse Signals with Intra-Block Correlation
Zhang, Zhilin, Rao, Bhaskar D.
We examine the recovery of block sparse signals and extend the framework in two important directions; one by exploiting signals' intra-block correlation and the other by generalizing signals' block structure. We propose two families of algorithms based on the framework of block sparse Bayesian learning (BSBL). One family, directly derived from the BSBL framework, requires knowledge of the block structure. Another family, derived from an expanded BSBL framework, is based on a weaker assumption on the block structure, and can be used when the block structure is completely unknown. Using these algorithms we show that exploiting intra-block correlation is very helpful in improving recovery performance. These algorithms also shed light on how to modify existing algorithms or design new ones to exploit such correlation and improve performance.
Noisy Matrix Completion under Sparse Factor Models
Soni, Akshay, Jain, Swayambhoo, Haupt, Jarvis, Gonella, Stefano
This paper examines a general class of noisy matrix completion tasks where the goal is to estimate a matrix from observations obtained at a subset of its entries, each of which is subject to random noise or corruption. Our specific focus is on settings where the matrix to be estimated is well-approximated by a product of two (a priori unknown) matrices, one of which is sparse. Such structural models - referred to here as "sparse factor models" - have been widely used, for example, in subspace clustering applications, as well as in contemporary sparse modeling and dictionary learning tasks. Our main theoretical contributions are estimation error bounds for sparsity-regularized maximum likelihood estimators for problems of this form, which are applicable to a number of different observation noise or corruption models. Several specific implications are examined, including scenarios where observations are corrupted by additive Gaussian noise or additive heavier-tailed (Laplace) noise, Poisson-distributed observations, and highly-quantized (e.g., one-bit) observations. We also propose a simple algorithmic approach based on the alternating direction method of multipliers for these tasks, and provide experimental evidence to support our error analyses.
Risk Event and Probability Extraction for Modeling Medical Risks
Jochim, Charles (IBM Research โ ย Ireland) | Sacaleanu, Bogdan (IBM Research โ ย Ireland) | Deleris, Lรฉa A. (IBM Research โ ย Ireland)
In this paper we address the task of extracting risk events and probabilities from free text, focusing in particular on the biomedical domain. While our initial motivation is to enable the determination of the parameters of a Bayesian belief network, our approach is not specific to that use case. We are the first to investigate this task as a sequence tagging problem where we label spans of text as events A or B that are then used to construct probability statements of the form P(A|B)=x. We show that our approach significantly outperforms an entity extraction baseline on a new annotated medical risk event corpus. We also explore semi-supervised methods that lead to modest improvement, encouraging further work in this direction.
Altitude Training: Strong Bounds for Single-Layer Dropout
Wager, Stefan, Fithian, William, Wang, Sida, Liang, Percy
Dropout training, originally designed for deep neural networks, has been successful on high-dimensional single-layer natural language tasks. This paper proposes a theoretical explanation for this phenomenon: we show that, under a generative Poisson topic model with long documents, dropout training improves the exponent in the generalization bound for empirical risk minimization. Dropout achieves this gain much like a marathon runner who practices at altitude: once a classifier learns to perform reasonably well on training examples that have been artificially corrupted by dropout, it will do very well on the uncorrupted test set. We also show that, under similar conditions, dropout preserves the Bayes decision boundary and should therefore induce minimal bias in high dimensions.
Causal Inference through a Witness Protection Program
One of the most fundamental problems in causal inference is the estimation of a causal effect when variables are confounded. This is difficult in an observational study, because one has no direct evidence that all confounders have been adjusted for. We introduce a novel approach for estimating causal effects that exploits observational conditional independencies to suggest "weak" paths in a unknown causal graph. The widely used faithfulness condition of Spirtes et al. is relaxed to allow for varying degrees of "path cancellations" that imply conditional independencies but do not rule out the existence of confounding causal paths. The outcome is a posterior distribution over bounds on the average causal effect via a linear programming approach and Bayesian inference. We claim this approach should be used in regular practice along with other default tools in observational studies.
Faster graphical model identification of tandem mass spectra using peptide word lattices
Wang, Shengjie, Halloran, John T., Bilmes, Jeff A., Noble, William S.
Liquid chromatography coupled with tandem mass spectrometry, also known as shotgun proteomics, is a widely-used high-throughput technology for identifying proteins in complex biological samples. Analysis of the tens of thousands of fragmentation spectra produced by a typical shotgun proteomics experiment begins by assigning to each observed spectrum the peptide hypothesized to be responsible for generating the spectrum, typically done by searching each spectrum against a database of peptides. We have recently described a machine learning method---Dynamic Bayesian Network for Rapid Identification of Peptides (DRIP)---that not only achieves state-of-the-art spectrum identification performance on a variety of datasets but also provides a trainable model capable of returning valuable auxiliary information regarding specific peptide-spectrum matches. In this work, we present two significant improvements to DRIP. First, we describe how to use word lattices, which are widely used in natural language processing, to significantly speed up DRIP's computations. To our knowledge, all existing shotgun proteomics search engines compute independent scores between a given observed spectrum and each possible candidate peptide from the database. The key idea of the word lattice is to represent the set of candidate peptides in a single data structure, thereby allowing sharing of redundant computations among the different candidates. We demonstrate that using lattices in conjunction with DRIP leads to speedups on the order of tens across yeast and worm data sets. Second, we introduce a variant of DRIP that uses a discriminative training framework, performing maximum mutual entropy estimation rather than maximum likelihood estimation. This modification improves DRIP's statistical power, enabling us to increase the number of identified spectrum at a 1% false discovery rate on yeast and worm data sets.
Sparse Estimation using Bayesian Hierarchical Prior Modeling for Real and Complex Linear Models
Pedersen, Niels Lovmand, Manchรณn, Carles Navarro, Badiu, Mihai-Alin, Shutin, Dmitriy, Fleury, Bernard Henri
In sparse Bayesian learning (SBL), Gaussian scale mixtures (GSMs) have been used to model sparsity-inducing priors that realize a class of concave penalty functions for the regression task in real-valued signal models. Motivated by the relative scarcity of formal tools for SBL in complex-valued models, this paper proposes a GSM model - the Bessel K model - that induces concave penalty functions for the estimation of complex sparse signals. The properties of the Bessel K model are analyzed when it is applied to Type I and Type II estimation. This analysis reveals that, by tuning the parameters of the mixing pdf different penalty functions are invoked depending on the estimation type used, the value of the noise variance, and whether real or complex signals are estimated. Using the Bessel K model, we derive a sparse estimator based on a modification of the expectation-maximization algorithm formulated for Type II estimation. The estimator includes as a special instance the algorithms proposed by Tipping and Faul [1] and by Babacan et al. [2]. Numerical results show the superiority of the proposed estimator over these state-of-the-art estimators in terms of convergence speed, sparseness, reconstruction error, and robustness in low and medium signal-to-noise ratio regimes.
On the Challenges of Physical Implementations of RBMs
Dumoulin, Vincent, Goodfellow, Ian J., Courville, Aaron, Bengio, Yoshua
Restricted Boltzmann machines (RBMs) are powerful machine learning models, but learning and some kinds of inference in the model require sampling-based approximations, which, in classical digital computers, are implemented using expensive MCMC. Physical computation offers the opportunity to reduce the cost of sampling by building physical systems whose natural dynamics correspond to drawing samples from the desired RBM distribution. Such a system avoids the burn-in and mixing cost of a Markov chain. However, hardware implementations of this variety usually entail limitations such as low-precision and limited range of the parameters and restrictions on the size and topology of the RBM. We conduct software simulations to determine how harmful each of these restrictions is. Our simulations are based on the D-Wave Two computer, but the issues we investigate arise in most forms of physical computation. Our findings suggest that designers of new physical computing hardware and algorithms for physical computers should focus their efforts on overcoming the limitations imposed by the topology restrictions of currently existing physical computers.
Non-parametric Bayesian Learning with Deep Learning Structure and Its Applications in Wireless Networks
Statistical models have been applied to the classification and prediction problems in machine learning and data analysis [1]. Some statistical methods make the hypothesis of mathematical models that are controlled by certain parameters to fit the latent structure of observed data [2]. The observed data are assumed to be generated by complex structures which have hierarchical layers and hidden causes [3]. One key challenge faced by modeling the data structure in this way is thus the determination of the numbers of layers and hidden variables. However, it is sometimes impractical and challenging to choose any fixed number for the model structure when making the hypothesis.