Goto

Collaborating Authors

 Bayesian Inference


Thinning Measurement Models and Questionnaire Design

Neural Information Processing Systems

Inferring key unobservable features of individuals is an important task in the applied sciences. In particular, an important source of data in fields such as marketing, social sciences and medicine is questionnaires: answers in such questionnaires are noisy measures of target unobserved features. While comprehensive surveys help to better estimate the latent variables of interest, aiming at a high number of questions comes at a price: refusal to participate in surveys can go up, as well as the rate of missing data; quality of answers can decline; costs associated with applying such questionnaires can also increase. In this paper, we cast the problem of refining existing models for questionnaire data as follows: solve a constrained optimization problem of preserving the maximum amount of information found in a latent variable model using only a subset of existing questions. The goal is to find an optimal subset of a given size. For that, we first define an information theoretical measure for quantifying the quality of a reduced questionnaire. Three different approximate inference methods are introduced to solve this problem. Comparisons against a simple but powerful heuristic are presented.


Robust Multi-Class Gaussian Process Classification

Neural Information Processing Systems

Multi-class Gaussian Process Classifiers (MGPCs) are often affected by over-fitting problems when labeling errors occur far from the decision boundaries. To prevent this, we investigate a robust MGPC (RMGPC) which considers labeling errors independently of their distance to the decision boundaries. Expectation propagation is used for approximate inference. Experiments with several datasets in which noise is injected in the class labels illustrate the benefits of RMGPC. This method performs better than other Gaussian process alternatives based on considering latent Gaussian noise or heavy-tailed processes. When no noise is injected in the labels, RMGPC still performs equal or better than the other methods. Finally, we show how RMGPC can be used for successfully identifying data instances which are difficult to classify accurately in practice.


Inferring Interaction Networks using the IBP applied to microRNA Target Prediction

Neural Information Processing Systems

Determining interactions between entities and the overall organization and clustering of nodes in networks is a major challenge when analyzing biological and social network data. Here we extend the Indian Buffet Process (IBP), a nonparametric Bayesian model, to integrate noisy interaction scores with properties of individual entities for inferring interaction networks and clustering nodes within these networks. We present an application of this method to study how microRNAs regulate mRNAs in cells. Analysis of synthetic and real data indicates that the method improves upon prior methods, correctly recovers interactions and clusters, and provides accurate biological predictions.


Estimating time-varying input signals and ion channel states from a single voltage trace of a neuron

Neural Information Processing Systems

State-of-the-art statistical methods in neuroscience have enabled us to fit mathematical models to experimental data and subsequently to infer the dynamics of hidden parameters underlying the observable phenomena. Here, we develop a Bayesian method for inferring the time-varying mean and variance of the synaptic input, along with the dynamics of each ion channel from a single voltage trace of a neuron. An estimation problem may be formulated on the basis of the state-space model with prior distributions that penalize large fluctuations in these parameters. After optimizing the hyperparameters by maximizing the marginal likelihood, the state-space model provides the time-varying parameters of the input signals and the ion channel states. The proposed method is tested not only on the simulated data from the Hodgkin-Huxley type models but also on experimental data obtained from a cortical slice in vitro.


Unifying Non-Maximum Likelihood Learning Objectives with Minimum KL Contraction

Neural Information Processing Systems

When used to learn high dimensional parametric probabilistic models, the clas- sical maximum likelihood (ML) learning often suffers from computational in- tractability, which motivates the active developments of non-ML learning meth- ods. Yet, because of their divergent motivations and forms, the objective func- tions of many non-ML learning methods are seemingly unrelated, and there lacks a unified framework to understand them. In this work, based on an information geometric view of parametric learning, we introduce a general non-ML learning principle termed as minimum KL contraction, where we seek optimal parameters that minimizes the contraction of the KL divergence between the two distributions after they are transformed with a KL contraction operator. We then show that the objective functions of several important or recently developed non-ML learn- ing methods, including contrastive divergence [12], noise-contrastive estimation [11], partial likelihood [7], non-local contrastive objectives [31], score match- ing [14], pseudo-likelihood [3], maximum conditional likelihood [17], maximum mutual information [2], maximum marginal likelihood [9], and conditional and marginal composite likelihood [24], can be unified under the minimum KL con- traction framework with different choices of the KL contraction operators.


On the computability of conditional probability

arXiv.org Machine Learning

As inductive inference and machine learning methods in computer science see continued success, researchers are aiming to describe even more complex probabilistic models and inference algorithms. What are the limits of mechanizing probabilistic inference? We investigate the computability of conditional probability, a fundamental notion in probability theory and a cornerstone of Bayesian statistics, and show that there are computable joint distributions with noncomputable conditional distributions, ruling out the prospect of general inference algorithms, even inefficient ones. Specifically, we construct a pair of computable random variables in the unit interval such that the conditional distribution of the first variable given the second encodes the halting problem. Nevertheless, probabilistic inference is possible in many common modeling settings, and we prove several results giving broadly applicable conditions under which conditional distributions are computable. In particular, conditional distributions become computable when measurements are corrupted by independent computable noise with a sufficiently smooth density.


Finding Consensus Bayesian Network Structures

Journal of Artificial Intelligence Research

Suppose that multiple experts (or learning algorithms) provide us with alternative Bayesian network (BN) structures over a domain, and that we are interested in combining them into a single consensus BN structure. Specifically, we are interested in that the consensus BN structure only represents independences all the given BN structures agree upon and that it has as few parameters associated as possible. In this paper, we prove that there may exist several non-equivalent consensus BN structures and that finding one of them is NP-hard. Thus, we decide to resort to heuristics to find an approximated consensus BN structure. In this paper, we consider the heuristic proposed by Matzkevich and Abramson, which builds upon two algorithms, called Methods A and B, for efficiently deriving the minimal directed independence map of a BN structure relative to a given node ordering. Methods A and B are claimed to be correct although no proof is provided (a proof is just sketched). In this paper, we show that Methods A and B are not correct and propose a correction of them.


Convergent Expectation Propagation in Linear Models with Spike-and-slab Priors

arXiv.org Machine Learning

Exact inference in the linear regression model with spike and slab priors is often intractable. Expectation propagation (EP) can be used for approximate inference. However, the regular sequential form of EP (R-EP) may fail to converge in this model when the size of the training set is very small. As an alternative, we propose a provably convergent EP algorithm (PC-EP). PC-EP is proved to minimize an energy function which, under some constraints, is bounded from below and whose stationary points coincide with the solution of R-EP. Experiments with synthetic data indicate that when R-EP does not converge, the approximation generated by PC-EP is often better. By contrast, when R-EP converges, both methods perform similarly.


Discriminant Analysis with Adaptively Pooled Covariance

arXiv.org Machine Learning

Linear and Quadratic Discriminant analysis (LDA/QDA) are common tools for classification problems. For these methods we assume observations are normally distributed within group. We estimate a mean and covariance matrix for each group and classify using Bayes theorem. With LDA, we estimate a single, pooled covariance matrix, while for QDA we estimate a separate covariance matrix for each group. Rarely do we believe in a homogeneous covariance structure between groups, but often there is insufficient data to separately estimate covariance matrices. We propose 1-PDA, a regularized model which adaptively pools elements of the precision matrices. Adaptively pooling these matrices decreases the variance of our estimates (as in LDA), without overly biasing them. In this paper, we propose and discuss this method, give an efficient algorithm to fit it for moderate sized problems, and show its efficacy on real and simulated datasets. Keywords: Lasso, Penalized, Discriminant Analysis, Interactions, Classification 1. Introduction Consider the usual two class problem: our data consists ofn observations, each observation with a known class label { 1, 2}, and p covariates measured per observation.


Dimension adaptability of Gaussian process models with variable selection and projection

arXiv.org Machine Learning

It is now known that an extended Gaussian process model equipped with rescaling can adapt to different smoothness levels of a function valued parameter in many nonparametric Bayesian analyses, offering a posterior convergence rate that is optimal (up to logarithmic factors) for the smoothness class the true function belongs to. This optimal rate also depends on the dimension of the function's domain and one could potentially obtain a faster rate of convergence by casting the analysis in a lower dimensional subspace that does not amount to any loss of information about the true function. In general such a subspace is not known a priori but can be explored by equipping the model with variable selection or linear projection. We demonstrate that for nonparametric regression, classification, density estimation and density regression, a rescaled Gaussian process model equipped with variable selection or linear projection offers a posterior convergence rate that is optimal (up to logarithmic factors) for the lowest dimension in which the analysis could be cast without any loss of information about the true function. Theoretical exploration of such dimension reduction features appears novel for Bayesian nonparametric models with or without Gaussian processes.