 Learning Graphical Models


Learning graphical models from the Glauber dynamics

arXiv.org Machine Learning

In this paper we consider the problem of learning undirected graphical models from data generated according to the Glauber dynamics. The Glauber dynamics is a Markov chain that sequentially updates individual nodes (variables) in a graphical model and it is frequently used to sample from the stationary distribution (to which it converges given sufficient time). Additionally, the Glauber dynamics is a natural dynamical model in a variety of settings. This work deviates from the standard formulation of graphical model learning in the literature, where one assumes access to i.i.d. samples from the distribution. Much of the research on graphical model learning has been directed towards finding algorithms with low computational cost. As the main result of this work, we establish that the problem of reconstructing binary pairwise graphical models is computationally tractable when we observe the Glauber dynamics. Specifically, we show that a binary pairwise graphical model on $p$ nodes with maximum degree $d$ can be learned in time $f(d)p^2\log p$, for a function $f(d)$, using nearly the information-theoretic minimum number of samples.
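To make the data-generating process concrete, here is a minimal sketch of the Glauber dynamics for a binary pairwise (Ising) model with spins in {-1, +1} and no external field. The function names, the edge-list input format, and the example weights are illustrative assumptions, not taken from the paper; the sketch only simulates the chain and does not implement the paper's learning algorithm.

```python
import math
import random


def glauber_step(state, neighbors, theta, rng):
    """Resample one uniformly chosen node from its conditional distribution."""
    i = rng.randrange(len(state))
    # Local field at node i: weighted sum of the neighboring spins.
    field = sum(theta[(min(i, j), max(i, j))] * state[j] for j in neighbors[i])
    # For p(x) proportional to exp(sum_{ij} theta_ij x_i x_j) with x_i in {-1, +1},
    # P(x_i = +1 | rest) = 1 / (1 + exp(-2 * field)).
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
    state[i] = 1 if rng.random() < p_plus else -1
    return i


def run_glauber(num_nodes, edges, num_steps, seed=0):
    """Simulate the chain; edges is a list of (i, j, weight) triples (assumed format)."""
    rng = random.Random(seed)
    neighbors = {i: [] for i in range(num_nodes)}
    theta = {}
    for i, j, w in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
        theta[(min(i, j), max(i, j))] = w
    # Start from a uniformly random configuration.
    state = [rng.choice([-1, 1]) for _ in range(num_nodes)]
    trajectory = []
    for _ in range(num_steps):
        i = glauber_step(state, neighbors, theta, rng)
        # Record which node was updated and the configuration after the update.
        trajectory.append((i, tuple(state)))
    return trajectory


if __name__ == "__main__":
    # Hypothetical 4-node chain graph with made-up edge weights.
    traj = run_glauber(4, [(0, 1, 0.5), (1, 2, -0.5), (2, 3, 0.5)], num_steps=10)
    for node, config in traj:
        print(node, config)
```

Each recorded step changes at most one coordinate, which is the sequential, correlated structure that distinguishes this setting from i.i.d. sampling.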


Efficient inference of overlapping communities in complex networks

arXiv.org Machine Learning

We discuss two views on extending existing methods for complex network modeling which we dub the communities first and the networks first view, respectively. Inspired by the networks first view that we attribute to White, Boorman, and Breiger (1976)[1], we formulate the multiple-networks stochastic blockmodel (MNSBM), which seeks to separate the observed network into subnetworks of different types and where the problem of inferring structure in each subnetwork becomes easier. We show how this model is specified in a generative Bayesian framework where parameters can be inferred efficiently using Gibbs sampling. The result is an effective multiple-membership model without the drawbacks of introducing complex definitions of "groups" and how they interact. We demonstrate results on the recovery of planted structure in synthetic networks and show very encouraging results on link prediction performances using multiple-networks models on a number of real-world network data sets.


A Nonparametric Bayesian Approach to Uncovering Rat Hippocampal Population Codes During Spatial Navigation

arXiv.org Machine Learning

Rodent hippocampal population codes represent important spatial information about the environment during navigation. Several computational methods have been developed to uncover the neural representation of spatial topology embedded in rodent hippocampal ensemble spike activity. Here we extend our previous work and propose a nonparametric Bayesian approach to infer rat hippocampal population codes during spatial navigation. To tackle the model selection problem, we leverage a nonparametric Bayesian model. Specifically, to analyze rat hippocampal ensemble spiking activity, we apply a hierarchical Dirichlet process-hidden Markov model (HDP-HMM) using two Bayesian inference methods, one based on Markov chain Monte Carlo (MCMC) and the other based on variational Bayes (VB). We demonstrate the effectiveness of our Bayesian approaches on recordings from a freely-behaving rat navigating in an open field environment. We find that MCMC-based inference with Hamiltonian Monte Carlo (HMC) hyperparameter sampling is flexible and efficient, and outperforms VB and MCMC approaches with hyperparameters set by empirical Bayes.


The Poisson transform for unnormalised statistical models

arXiv.org Machine Learning

Contrary to standard statistical models, unnormalised statistical models only specify the likelihood function up to a constant. While such models are natural and popular, the lack of normalisation makes inference much more difficult. Here we show that inferring the parameters of an unnormalised model on a space $\Omega$ can be mapped onto an equivalent problem of estimating the intensity of a Poisson point process on $\Omega$. The unnormalised statistical model now specifies an intensity function that does not need to be normalised. Effectively, the normalisation constant may now be inferred as just another parameter, at no loss of information. The result can be extended to cover non-IID models, which include, for example, unnormalised models for sequences of graphs (dynamical graphs), or for sequences of binary vectors. As a consequence, we prove that unnormalised parametric inference in non-IID models can be turned into a semi-parametric estimation problem. Moreover, we show that the noise-contrastive divergence of Gutmann & Hyvärinen (2012) can be understood as an approximation of the Poisson transform, and extended to non-IID settings. We use our results to fit spatial Markov chain models of eye movements, where the Poisson transform allows us to turn a highly non-standard model into vanilla semi-parametric logistic regression.
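The claim that the normalisation constant can be inferred as just another parameter can be checked with a short derivation in the IID case. This is a sketch under assumed notation ($f_\theta$ for the unnormalised density, $Z(\theta)$ for its integral, $\nu$ for the extra log-scale parameter), using the standard Poisson process log-likelihood; the paper's own parameterisation may differ.

```latex
% Observations y_1, ..., y_n; unnormalised model f_\theta with
% Z(\theta) = \int_\Omega f_\theta(y) \, dy.
% Usual log-likelihood of the normalised model:
\[
  L(\theta) = \sum_{i=1}^{n} \log f_\theta(y_i) - n \log Z(\theta).
\]
% Treat the data as a Poisson point process on \Omega with intensity
% \lambda(y) = e^{\nu} f_\theta(y). Its log-likelihood is
\[
  M(\theta, \nu) = \sum_{i=1}^{n} \log f_\theta(y_i) + n\nu - e^{\nu} Z(\theta).
\]
% Maximising over \nu:
\[
  \frac{\partial M}{\partial \nu} = n - e^{\nu} Z(\theta) = 0
  \quad\Longrightarrow\quad
  \nu^{*}(\theta) = \log n - \log Z(\theta).
\]
% Substituting back:
\[
  M(\theta, \nu^{*}(\theta)) = L(\theta) + n \log n - n .
\]
```

Since $n \log n - n$ does not depend on $\theta$, maximising $M$ jointly over $(\theta, \nu)$ yields the same $\theta$ as maximising $L$, and $Z(\theta)$ never has to be computed explicitly during the joint optimisation beyond the integral of the intensity.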


PLUTO: Penalized Unbiased Logistic Regression Trees

arXiv.org Machine Learning

We propose a new algorithm called PLUTO for building logistic regression trees for binary response data. PLUTO can capture the nonlinear and interaction patterns in messy data by recursively partitioning the sample space. It fits a simple or a multiple linear logistic regression model in each partition. PLUTO employs the cyclical coordinate descent method for estimation of multiple linear logistic regression models with elastic net penalties, which allows it to deal with high-dimensional data efficiently. The tree structure comprises a graphical description of the data. Together with the logistic regression models, it provides an accurate classifier as well as a piecewise smooth estimate of the probability of "success". PLUTO controls selection bias by: (1) separating split variable selection from split point selection; (2) applying an adjusted chi-squared test to find the split variable instead of exhaustive search. A bootstrap calibration technique is employed to further correct selection bias. Comparison on real datasets shows that on average, the multiple linear PLUTO models predict more accurately than other algorithms.


Noise Benefits in Expectation-Maximization Algorithms

arXiv.org Machine Learning

This dissertation shows that careful injection of noise into sample data can substantially speed up Expectation-Maximization algorithms. Expectation-Maximization algorithms are a class of iterative algorithms for extracting maximum likelihood estimates from corrupted or incomplete data. The convergence speed-up is an example of a noise benefit or "stochastic resonance" in statistical signal processing. The dissertation presents derivations of sufficient conditions for such noise-benefits and demonstrates the speed-up in some ubiquitous signal-processing algorithms. These algorithms include parameter estimation for mixture models, the $k$-means clustering algorithm, the Baum-Welch algorithm for training hidden Markov models, and backpropagation for training feedforward artificial neural networks. This dissertation also analyses the effects of data and model corruption on the more general Bayesian inference estimation framework. The main finding is a theorem guaranteeing that uniform approximators for Bayesian model functions produce uniform approximators for the posterior pdf via Bayes theorem. This result also applies to hierarchical and multidimensional Bayesian models.


bartMachine: Machine Learning with Bayesian Additive Regression Trees

arXiv.org Machine Learning

Ensemble-of-trees methods have become popular choices for forecasting in both regression and classification problems. Algorithms such as random forests (Breiman 2001) and stochastic gradient boosting (Friedman 2002) are two well-established and widely employed procedures. Recent advances in ensemble methods include dynamic trees (Taddy, Gramacy, and Polson 2011) and Bayesian additive regression trees (BART, Chipman, George, and McCulloch 2010), which depart from predecessors in that they rely on an underlying Bayesian probability model rather than a pure algorithm. BART has demonstrated substantial promise in a wide variety of simulations and real world applications such as predicting avalanches on mountain roads (Blattenberger and Fowles 2014), predicting how transcription factors interact with DNA (Zhou and Liu 2008) and predicting movie box office revenues (Eliashberg 2010). This paper introduces bartMachine, a new R (R Core Team 2014) package available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package


A Greedy, Flexible Algorithm to Learn an Optimal Bayesian Network Structure

arXiv.org Machine Learning

In this report we first describe the Advanced Machine Learning Course Project on the provided data set and then present a novel heuristic algorithm for exact Bayesian network (BN) structure discovery that uses decomposable scoring functions. Our algorithm takes a different approach to BN structure discovery than previously used methods such as Dynamic Programming (DP) and Branch and Bound to reduce the search space and find the global optimum. The algorithm has a degree of flexibility that can make it more or less greedy: the greedier the setting, the faster the algorithm runs and the less optimal the final structure is. Our algorithm runs in much less time than the previously known methods and guarantees an optimality of close to 99%.


Target Fishing: A Single-Label or Multi-Label Problem?

arXiv.org Machine Learning

According to Cobanoglu et al. and Murphy, it is now widely acknowledged that the single target paradigm (one protein or target, one disease, one drug) that has been the dominant premise in drug development in the recent past is untenable. More often than not, a drug-like compound (ligand) can be promiscuous; that is, it can interact with more than one target protein. In recent years, in silico target prediction methods have approached the promiscuity issue computationally in different ways. In this study we confine attention to the so-called ligand-based target prediction machine learning approaches, commonly referred to as target-fishing. With a few exceptions, the target-fishing approaches that are currently ubiquitous in the cheminformatics literature can essentially be viewed as single-label multi-classification schemes; these approaches inherently bank on the single target paradigm assumption that a ligand can home in on one specific target. In order to address the ligand promiscuity issue, one might be able to cast target-fishing as a multi-label multi-class classification problem. For illustrative and comparison purposes, single-label and multi-label Naive Bayes classification models (denoted here by SMM and MMM, respectively) for target-fishing were implemented. The models were constructed and tested on 65,587 compounds and 308 targets retrieved from the ChEMBL17 database. SMM and MMM performed differently: for 16,344 test compounds, the MMM model returned recall and precision values of 0.8058 and 0.6622, respectively; the corresponding recall and precision values yielded by the SMM model were 0.7805 and 0.7596, respectively. However, a McNemar test (significance level 0.05, one degree of freedom) performed on the target prediction results returned by SMM and MMM for the 16,344 test ligands gave a chi-squared value of 15.656, in favour of the MMM approach.
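As a rough illustration of how a McNemar comparison of two classifiers works, the sketch below computes the chi-squared statistic from the two discordant counts: cases one model gets right and the other gets wrong. The function name and the example counts are hypothetical; the abstract does not report the discordant counts, nor how a "correct" prediction was defined for the multi-label model, so this does not reproduce the 15.656 figure.

```python
# Critical value of the chi-squared distribution with 1 degree of freedom
# at significance level 0.05 (0.95 quantile).
CRITICAL_0_05_DF1 = 3.841


def mcnemar_chi2(b, c, continuity=False):
    """McNemar statistic from discordant counts.

    b: cases model A predicted correctly and model B incorrectly
    c: cases model B predicted correctly and model A incorrectly
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    diff = abs(b - c)
    if continuity:
        # Continuity-corrected form: (|b - c| - 1)^2 / (b + c).
        diff = max(diff - 1, 0)
    return diff ** 2 / (b + c)


if __name__ == "__main__":
    # Hypothetical discordant counts, for illustration only.
    stat = mcnemar_chi2(30, 10)
    print(stat, stat > CRITICAL_0_05_DF1)
```

For reference, the reported value of 15.656 exceeds the 3.841 critical value, which is why the difference between SMM and MMM is treated as significant at the 0.05 level.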


Graph-Sparse LDA: A Topic Model with Structured Sparsity

arXiv.org Machine Learning

Originally designed to model text, topic modeling has become a powerful tool for uncovering latent structure in domains including medicine, finance, and vision. The goals for the model vary depending on the application: in some cases, the discovered topics may be used for prediction or some other downstream task. In other cases, the content of the topic itself may be of intrinsic scientific interest. Unfortunately, even using modern sparse techniques, the discovered topics are often difficult to interpret due to the high dimensionality of the underlying space. To improve topic interpretability, we introduce Graph-Sparse LDA, a hierarchical topic model that leverages knowledge of relationships between words (e.g., as encoded by an ontology). In our model, topics are summarized by a few latent concept-words from the underlying graph that explain the observed words. Graph-Sparse LDA recovers sparse, interpretable summaries on two real-world biomedical datasets while matching state-of-the-art prediction performance.