Directed Networks
Sparse Stochastic Inference for Latent Dirichlet allocation
Mimno, David, Hoffman, Matt, Blei, David
We present a hybrid algorithm for Bayesian topic models that combines the efficiency of sparse Gibbs sampling with the scalability of online stochastic inference. We used our algorithm to analyze a corpus of 1.2 million books (33 billion words) with thousands of topics. Our approach reduces the bias of variational inference and generalizes to many Bayesian hidden-variable models.
An Infinite Latent Attribute Model for Network Data
Palla, Konstantina, Knowles, David, Ghahramani, Zoubin
Latent variable models for network data extract a summary of the relational structure underlying an observed network. The simplest possible models subdivide nodes of the network into clusters; the probability of a link between any two nodes then depends only on their cluster assignment. Currently available models can be classified by whether clusters are disjoint or are allowed to overlap. These models can explain a "flat" clustering structure. Hierarchical Bayesian models provide a natural approach to capture more complex dependencies. We propose a model in which objects are characterised by a latent feature vector. Each feature is itself partitioned into disjoint groups (subclusters), corresponding to a second layer of hierarchy. In experimental comparisons, the model achieves significantly improved predictive performance on social and biological link prediction tasks. The results indicate that models with a single layer hierarchy over-simplify real networks.
Communications Inspired Linear Discriminant Analysis
Chen, Minhua, Carson, William, Rodrigues, Miguel, Calderbank, Robert, Carin, Lawrence
We study the problem of supervised linear dimensionality reduction, taking an information-theoretic viewpoint. The linear projection matrix is designed by maximizing the mutual information between the projected signal and the class label (based on a Shannon entropy measure). By harnessing a recent theoretical result on the gradient of mutual information, the above optimization problem can be solved directly using gradient descent, without requiring simplification of the objective function. Theoretical analysis and empirical comparison are made between the proposed method and two closely related methods (Linear Discriminant Analysis and Information Discriminant Analysis), and comparisons are also made with a method in which Renyi entropy is used to define the mutual information (in this case the gradient may be computed simply, under a special parameter setting). Relative to these alternative approaches, the proposed method achieves promising results on real datasets.
Feature Selection via Probabilistic Outputs
Danyluk, Andrea, Arnosti, Nicholas
This paper investigates two feature-scoring criteria that make use of estimated class probabilities: one method proposed by \citet{shen} and a complementary approach proposed below. We develop a theoretical framework to analyze each criterion and show that both estimate the spread (across all values of a given feature) of the probability that an example belongs to the positive class. Based on our analysis, we predict when each scoring technique will be advantageous over the other and give empirical results validating our predictions.
Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring
Ahn, Sungjin, Korattikara, Anoop, Welling, Max
In this paper we address the following question: "Can we approximately sample from a Bayesian posterior distribution if we are only allowed to touch a small mini-batch of data-items for every sample we generate?". An algorithm based on the Langevin equation with stochastic gradients (SGLD) was previously proposed to solve this, but its mixing rate was slow. By leveraging the Bayesian Central Limit Theorem, we extend the SGLD algorithm so that at high mixing rates it will sample from a normal approximation of the posterior, while for slow mixing rates it will mimic the behavior of SGLD with a pre-conditioner matrix. As a bonus, the proposed algorithm is reminiscent of Fisher scoring (with stochastic gradients) and as such an efficient optimizer during burn-in.
Inference in Hybrid Bayesian Networks Using Mixtures of Gaussians
The main goal of this paper is to describe a method for exact inference in general hybrid Bayesian networks (BNs) (with a mixture of discrete and continuous chance variables). Our method consists of approximating general hybrid Bayesian networks by a mixture of Gaussians (MoG) BNs. There exists a fast algorithm by Lauritzen-Jensen (LJ) for making exact inferences in MoG Bayesian networks, and there exists a commercial implementation of this algorithm. However, this algorithm can only be used for MoG BNs. Some limitations of such networks are as follows. All continuous chance variables must have conditional linear Gaussian distributions, and discrete chance nodes cannot have continuous parents. The methods described in this paper will enable us to use the LJ algorithm for a bigger class of hybrid Bayesian networks. This includes networks with continuous chance nodes with non-Gaussian distributions, networks with no restrictions on the topology of discrete and continuous variables, networks with conditionally deterministic variables that are a nonlinear function of their continuous parents, and networks with continuous chance variables whose variances are functions of their parents.
A simple approach for finding the globally optimal Bayesian network structure
Silander, Tomi, Myllymaki, Petri
We study the problem of learning the best Bayesian network structure with respect to a decomposable score such as BDe, BIC or AIC. This problem is known to be NP-hard, which means that solving it becomes quickly infeasible as the number of variables increases. Nevertheless, in this paper we show that it is possible to learn the best Bayesian network structure with over 30 variables, which covers many practically interesting cases. Our algorithm is less complicated and more efficient than the techniques presented earlier. It can be easily parallelized, and offers a possibility for efficient exploration of the best networks consistent with different variable orderings. In the experimental part of the paper we compare the performance of the algorithm to the previous state-of-the-art algorithm. Free source-code and an online-demo can be found at http://b-course.hiit.fi/bene.
Bayesian Inference for Gaussian Mixed Graph Models
Silva, Ricardo, Ghahramani, Zoubin
We introduce priors and algorithms to perform Bayesian inference in Gaussian models defined by acyclic directed mixed graphs. Such a class of graphs, composed of directed and bi-directed edges, is a representation of conditional independencies that is closed under marginalization and arises naturally from causal models which allow for unmeasured confounding. Monte Carlo methods and a variational approximation for such models are presented. Our algorithms for Bayesian inference allow the evaluation of posterior distributions for several quantities of interest, including causal effects that are not identifiable from data alone but could otherwise be inferred where informative prior knowledge about confounding is available.
Recognizing Activities and Spatial Context Using Wearable Sensors
Subramanya, Amarnag, Raj, Alvin, Bilmes, Jeff A., Fox, Dieter
We introduce a new dynamic model with the capability of recognizing both activities that an individual is performing as well as where that ndividual is located. Our model is novel in that it utilizes a dynamic graphical model to jointly estimate both activity and spatial context over time based on the simultaneous use of asynchronous observations consisting of GPS measurements, and measurements from a small mountable sensor board. Joint inference is quite desirable as it has the ability to improve accuracy of the model. A key goal, however, in designing our overall system is to be able to perform accurate inference decisions while minimizing the amount of hardware an individual must wear. This minimization leads to greater comfort and flexibility, decreased power requirements and therefore increased battery life, and reduced cost. We show results indicating that our joint measurement model outperforms measurements from either the sensor board or GPS alone, using two types of probabilistic inference procedures, namely particle filtering and pruned exact inference.
A Non-Parametric Bayesian Method for Inferring Hidden Causes
Wood, Frank, Griffiths, Thomas, Ghahramani, Zoubin
We present a non-parametric Bayesian approach to structure learning with hidden causes. Previous Bayesian treatments of this problem define a prior over the number of hidden causes and use algorithms such as reversible jump Markov chain Monte Carlo to move between solutions. In contrast, we assume that the number of hidden causes is unbounded, but only a finite number influence observable variables. This makes it possible to use a Gibbs sampler to approximate the distribution over causal structures. We evaluate the performance of both approaches in discovering hidden causes in simulated data, and use our non-parametric approach to discover hidden causes in a real medical dataset.