Learning Graphical Models
A Dictionary Learning Approach for Factorial Gaussian Models
Subakan, Y. Cem, Traa, Johannes, Smaragdis, Paris, Stein, Noah
In this paper, we develop a parameter estimation method for factorially parametrized models such as Factorial Gaussian Mixture Model and Factorial Hidden Markov Model. Our contributions are two-fold. First, we show that the emission matrix of the standard Factorial Model is unidentifiable even if the true assignment matrix is known. Secondly, we address the issue of identifiability by making a one component sharing assumption and derive a parameter learning algorithm for this case. Our approach is based on a dictionary learning problem of the form $X = O R$, where the goal is to learn the dictionary $O$ given the data matrix $X$. We argue that due to the specific structure of the activation matrix $R$ in the shared component factorial mixture model, and an incoherence assumption on the shared component, it is possible to extract the columns of the $O$ matrix without the need for alternating between the estimation of $O$ and $R$.
Non-Stationary Gaussian Process Regression with Hamiltonian Monte Carlo
Heinonen, Markus, Mannerström, Henrik, Rousu, Juho, Kaski, Samuel, Lähdesmäki, Harri
We present a novel approach for fully non-stationary Gaussian process regression (GPR), where all three key parameters -- noise variance, signal variance and lengthscale -- can be simultaneously input-dependent. We develop gradient-based inference methods to learn the unknown function and the non-stationary model parameters, without requiring any model approximations. We propose to infer full parameter posterior with Hamiltonian Monte Carlo (HMC), which conveniently extends the analytical gradient-based GPR learning by guiding the sampling with model gradients. We also learn the MAP solution from the posterior by gradient ascent. In experiments on several synthetic datasets and in modelling of temporal gene expression, the nonstationary GPR is shown to be necessary for modeling realistic input-dependent dynamics, while it performs comparably to conventional stationary or previous non-stationary GPR models otherwise.
Scalable Bayesian Non-Negative Tensor Factorization for Massive Count Data
Hu, Changwei, Rai, Piyush, Chen, Changyou, Harding, Matthew, Carin, Lawrence
We present a Bayesian non-negative tensor factorization model for count-valued tensor data, and develop scalable inference algorithms (both batch and online) for dealing with massive tensors. Our generative model can handle overdispersed counts as well as infer the rank of the decomposition. Moreover, leveraging a reparameterization of the Poisson distribution as a multinomial facilitates conjugacy in the model and enables simple and efficient Gibbs sampling and variational Bayes (VB) inference updates, with a computational cost that only depends on the number of nonzeros in the tensor. The model also provides a nice interpretability for the factors; in our model, each factor corresponds to a "topic". We develop a set of online inference algorithms that allow further scaling up the model to massive tensors, for which batch inference methods may be infeasible. We apply our framework on diverse real-world applications, such as \emph{multiway} topic modeling on a scientific publications database, analyzing a political science data set, and analyzing a massive household transactions data set.
Zero-Truncated Poisson Tensor Factorization for Massive Binary Tensors
Hu, Changwei, Rai, Piyush, Carin, Lawrence
We present a scalable Bayesian model for low-rank factorization of massive tensors with binary observations. The proposed model has the following key properties: (1) in contrast to the models based on the logistic or probit likelihood, using a zero-truncated Poisson likelihood for binary data allows our model to scale up in the number of \emph{ones} in the tensor, which is especially appealing for massive but sparse binary tensors; (2) side-information in form of binary pairwise relationships (e.g., an adjacency network) between objects in any tensor mode can also be leveraged, which can be especially useful in "cold-start" settings; and (3) the model admits simple Bayesian inference via batch, as well as \emph{online} MCMC; the latter allows scaling up even for \emph{dense} binary data (i.e., when the number of ones in the tensor/network is also massive). In addition, non-negative factor matrices in our model provide easy interpretability, and the tensor rank can be inferred from the data. We evaluate our model on several large-scale real-world binary tensors, achieving excellent computational scalability, and also demonstrate its usefulness in leveraging side-information provided in form of mode-network(s).
Partial Optimality by Pruning for MAP-Inference with General Graphical Models
Swoboda, Paul, Shekhovtsov, Alexander, Kappes, Jörg Hendrik, Schnörr, Christoph, Savchynskyy, Bogdan
We consider the energy minimization problem for undirected graphical models, also known as MAP-inference problem for Markov random fields which is NP-hard in general. We propose a novel polynomial time algorithm to obtain a part of its optimal non-relaxed integral solution. Our algorithm is initialized with variables taking integral values in the solution of a convex relaxation of the MAP-inference problem and iteratively prunes those, which do not satisfy our criterion for partial optimality. We show that our pruning strategy is in a certain sense theoretically optimal. Also empirically our method outperforms previous approaches in terms of the number of persistently labelled variables. The method is very general, as it is applicable to models with arbitrary factors of an arbitrary order and can employ any solver for the considered relaxed problem. Our method's runtime is determined by the runtime of the convex relaxation solver for the MAP-inference problem.
Tree-Width and the Computational Complexity of MAP Approximations in Bayesian Networks
The problem of finding the most probable explanation to a designated set of variables given partial evidence (the MAP problem) is a notoriously intractable problem in Bayesian networks, both to compute exactly and to approximate. It is known, both from theoretical considerations and from practical experience, that low tree-width is typically an essential prerequisite to efficient exact computations in Bayesian networks. In this paper we investigate whether the same holds for approximating MAP. We define four notions of approximating MAP (by value, structure, rank, and expectation) and argue that all of them are intractable in general. We prove that efficient value-approximations, structure-approximations, and rank-approximations of MAP instances with high tree-width will violate the Exponential Time Hypothesis. In contrast, we show that MAP can sometimes be efficiently expectation-approximated, even in instances with high tree-width, if the most probable explanation has a high probability. We introduce the complexity class FERT, analogous to the class FTP, to capture this notion of fixed-parameter expectation-approximability. We suggest a road-map to future research that yields fixed-parameter tractable results for expectation-approximate MAP, even in graphs with high tree-width.
A Generative Model for Multi-Dialect Representation
In the era of deep learning several unsupervised models have been developed to capture the key features in unlabeled handwritten data. Popular among them is the Restricted Boltzmann Machines RBM. However, due to the novelty in handwritten multidialect data, the RBM may fail to generate an efficient representation. In this paper we propose a generative model, the Mode Synthesizing Machine MSM for on-line representation of real life handwritten multidialect language data. The MSM takes advantage of the hierarchical representation of the modes of a data distribution using a two-point error update to learn a sequence of representative multidialects in a generative way. Experiments were performed to evaluate the performance of the MSM over the RBM with the former attaining much lower error values than the latter on both independent and mixed data set.
A Deep Learning Approach to Structured Signal Recovery
Mousavi, Ali, Patel, Ankit B., Baraniuk, Richard G.
Abstract--In this paper, we develop a new framework for sensing and recovering structured signals. In contrast to compressive sensing (CS) systems that employ linear measurements, sparse representations, and computationally complex convex/greedy algorithms, we introduce a deep learning framework that supports both linear and mildly nonlinear measurements, that learns a structured representation from training data, and that efficiently computes a signal estimate. In particular, we apply a stacked denoising autoencoder (SDA), as an unsupervised feature learner. SDA enables us to capture statistical dependencies between the different elements of certain signals and improve signal recovery performance as compared to the CS approach. Many configurations for x and Γ(.) have been explored in the literature for this problem; however, one of the most useful ones is to have a sparse signal x and a linear Γ(.), i.e., y Γ(x) Φx. Compressive sensing (CS) [1]-[3] is a field that tries to solve this linear inverse problem in case that x has a sparse representation, i.e., there exists an N N basis matrix Ψ [ψ Department of Electrical and Computer Engineering Rice University Houston, TX 77005 (i) How to recover the signal x from a given measurement vector y and operator Γ(.)? (ii) How to design the measurement operator Γ(.)? (iii) If we are concerned with any type of structure, How could we find a representation in which the signal x has that structure? Although there has been a considerable progress in CS and particularly in the answers of aforementioned questions, our goal is to go beyond the state-of-the-art results.
A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution
Li, Shaohua, Zhu, Jun, Miao, Chunyan
Most existing word embedding methods can be categorized into Neural Embedding Models and Matrix Factorization (MF)-based methods. However some models are opaque to probabilistic interpretation, and MF-based methods, typically solved using Singular Value Decomposition (SVD), may incur loss of corpus information. In addition, it is desirable to incorporate global latent factors, such as topics, sentiments or writing styles, into the word embedding model. Since generative models provide a principled way to incorporate latent factors, we propose a generative word embedding model, which is easy to interpret, and can serve as a basis of more sophisticated latent factor models. The model inference reduces to a low rank weighted positive semidefinite approximation problem. Its optimization is approached by eigendecomposition on a submatrix, followed by online blockwise regression, which is scalable and avoids the information loss in SVD. In experiments on 7 common benchmark datasets, our vectors are competitive to word2vec, and better than other MF-based methods.
Causal Decision Trees
Li, Jiuyong, Ma, Saisai, Le, Thuc Duy, Liu, Lin, Liu, Jixue
Uncovering causal relationships in data is a major objective of data analytics. Causal relationships are normally discovered with designed experiments, e.g. randomised controlled trials, which, however are expensive or infeasible to be conducted in many cases. Causal relationships can also be found using some well designed observational studies, but they require domain experts' knowledge and the process is normally time consuming. Hence there is a need for scalable and automated methods for causal relationship exploration in data. Classification methods are fast and they could be practical substitutes for finding causal signals in data. However, classification methods are not designed for causal discovery and a classification method may find false causal signals and miss the true ones. In this paper, we develop a causal decision tree where nodes have causal interpretations. Our method follows a well established causal inference framework and makes use of a classic statistical test. The method is practical for finding causal signals in large data sets.