Bayesian Learning
Deep Tempering
Desjardins, Guillaume, Luo, Heng, Courville, Aaron, Bengio, Yoshua
Restricted Boltzmann Machines (RBMs) are one of the fundamental building blocks of deep learning. Approximate maximum likelihood training of RBMs typically necessitates sampling from these models. In many training scenarios, computationally efficient Gibbs sampling procedures are crippled by poor mixing. In this work we propose a novel method of sampling from Boltzmann machines that demonstrates a computationally efficient way to promote mixing. Our approach leverages an under-appreciated property of deep generative models such as the Deep Belief Network (DBN), where Gibbs sampling from deeper levels of the latent variable hierarchy results in dramatically increased ergodicity. Our approach is thus to train an auxiliary latent hierarchical model, based on the DBN. When used in conjunction with parallel-tempering, the method is asymptotically guaranteed to simulate samples from the target RBM. Experimental results confirm the effectiveness of this sampling strategy in the context of RBM training.
A Bayesian Tensor Factorization Model via Variational Inference for Link Prediction
Ermis, Beyza, Cemgil, A. Taylan
Probabilistic approaches for tensor factorization aim to extract meaningful structure from incomplete data by postulating low rank constraints. Recently, variational Bayesian (VB) inference techniques have successfully been applied to large scale models. This paper presents full Bayesian inference via VB on both single and coupled tensor factorization models. Our method can be run even for very large models and is easily implemented. It exhibits better prediction performance than existing approaches based on maximum likelihood on several real-world datasets for missing link prediction problem.
The automatic creation of concept maps from documents written using morphologically rich languages
Zubrinic, Krunoslav, Kalpic, Damir, Milicevic, Mario
Concept map is a graphical tool for representing knowledge. They have been used in many different areas, including education, knowledge management, business and intelligence. Constructing of concept maps manually can be a complex task; an unskilled person may encounter difficulties in determining and positioning concepts relevant to the problem area. An application that recommends concept candidates and their position in a concept map can significantly help the user in that situation. This paper gives an overview of different approaches to automatic and semi-automatic creation of concept maps from textual and non-textual sources. The concept map mining process is defined, and one method suitable for the creation of concept maps from unstructured textual sources in highly inflected languages such as the Croatian language is described in detail. Proposed method uses statistical and data mining techniques enriched with linguistic tools. With minor adjustments, that method can also be used for concept map mining from textual sources in other morphologically rich languages.
Order-invariant prior specification in Bayesian factor analysis
In (exploratory) factor analysis, the loading matrix is identified only up to orthogonal rotation. For identifiability, one thus often takes the loading matrix to be lower triangular with positive diagonal entries. In Bayesian inference, a standard practice is then to specify a prior under which the loadings are independent, the off-diagonal loadings are normally distributed, and the diagonal loadings follow a truncated normal distribution. This prior specification, however, depends in an important way on how the variables and associated rows of the loading matrix are ordered. We show how a minor modification of the approach allows one to compute with the identifiable lower triangular loading matrix but maintain invariance properties under reordering of the variables.
Beyond Maximum Likelihood: from Theory to Practice
Jiao, Jiantao, Venkat, Kartik, Han, Yanjun, Weissman, Tsachy
Maximum likelihood is the most widely used statistical estimation technique. Recent work by Jiao, Venkat, Han, and Weissman [1] introduced a general methodology for the construction of estimators for functionals in parametric models, and demonstrated improvements - both in theory and in practice - over the maximum likelihood estimator (MLE), particularly in high dimensional scenarios involving parameter dimension comparable to or larger than the number of samples. This approach to estimation, building on results from approximation theory, is shown to yield minimax rate-optimal estimators for a wide class of functionals, implementable with modest computational requirements. In a nutshell, a message of this recent work is that, for a wide class of functionals, the performance of these essentially optimal estimators with n samples is comparable to that of the MLE with nlnn samples. In the present paper, we highlight the applicability of the aforementioned methodology to statistical problems beyond functional estimation, and show that it can yield substantial gains. For example, we demonstrate that for learning tree-structured graphical models, our approach achieves a significant reduction of the required data size compared with the classical Chow-Liu algorithm, which is an implementation of the MLE, to achieve the same accuracy. The key step in improving the Chow-Liu algorithm is to replace the empirical mutual information with the estimator for mutual information proposed in [1]. Further, applying the same replacement approach to classical Bayesian network classification, the resulting classifiers uniformly outperform the previous classifiers on 26 widely used datasets.
Identification of jump Markov linear models using particle filters
Svensson, Andreas, Schรถn, Thomas B., Lindsten, Fredrik
Jump Markov linear models consists of a finite number of linear state space models and a discrete variable encoding the jumps (or switches) between the different linear models. Identifying jump Markov linear models makes for a challenging problem lacking an analytical solution. We derive a new expectation maximization (EM) type algorithm that produce maximum likelihood estimates of the model parameters. Our development hinges upon recent progress in combining particle filters with Markov chain Monte Carlo methods in solving the nonlinear state smoothing problem inherent in the EM formulation. Key to our development is that we exploit a conditionally linear Gaussian substructure in the model, allowing for an efficient algorithm.
Unsupervised learning of regression mixture models with unknown number of components
Regression mixture models are widely studied in statistics, machine learning and data analysis. Fitting regression mixtures is challenging and is usually performed by maximum likelihood by using the expectation-maximization (EM) algorithm. However, it is well-known that the initialization is crucial for EM. If the initialization is inappropriately performed, the EM algorithm may lead to unsatisfactory results. The EM algorithm also requires the number of clusters to be given a priori; the problem of selecting the number of mixture components requires using model selection criteria to choose one from a set of pre-estimated candidate models. We propose a new fully unsupervised algorithm to learn regression mixture models with unknown number of components. The developed unsupervised learning approach consists in a penalized maximum likelihood estimation carried out by a robust expectation-maximization (EM) algorithm for fitting polynomial, spline and B-spline regressions mixtures. The proposed learning approach is fully unsupervised: 1) it simultaneously infers the model parameters and the optimal number of the regression mixture components from the data as the learning proceeds, rather than in a two-fold scheme as in standard model-based clustering using afterward model selection criteria, and 2) it does not require accurate initialization unlike the standard EM for regression mixtures. The developed approach is applied to curve clustering problems. Numerical experiments on simulated data show that the proposed robust EM algorithm performs well and provides accurate results in terms of robustness with regard initialization and retrieving the optimal partition with the actual number of clusters. An application to real data in the framework of functional data clustering, confirms the benefit of the proposed approach for practical applications.
On tensor rank of conditional probability tables in Bayesian networks
Vomlel, Jiลรญ, Tichavskรฝ, Petr
A difficult task in modeling with Bayesian networks is the elicitation of numerical parameters of Bayesian networks. A large number of parameters is needed to specify a conditional probability table (CPT) that has a larger parent set. In this paper we show that, most CPTs from real applications of Bayesian networks can actually be very well approximated by tables that require substantially less parameters. This observation has practical consequence not only for model elicitation but also for efficient probabilistic reasoning with these networks.
Expectation Propagation
Raymond, Jack, Manoel, Andre, Opper, Manfred
Variational inference is a powerful concept that underlies many iterative approximation algorithms; expectation propagation, mean-field methods and belief propagations were all central themes at the school that can be perceived from this unifying framework. The lectures of Manfred Opper introduce the archetypal example of Expectation Propagation, before establishing the connection with the other approximation methods. Corrections by expansion about the expectation propagation are then explained. Finally some advanced inference topics and applications are explored in the final sections.
On the Maximum Entropy Property of the First-Order Stable Spline Kernel and its Implications
A new nonparametric approach for system identification has been recently proposed where the impulse response is seen as the realization of a zero--mean Gaussian process whose covariance, the so--called stable spline kernel, guarantees that the impulse response is almost surely stable. Maximum entropy properties of the stable spline kernel have been pointed out in the literature. In this paper we provide an independent proof that relies on the theory of matrix extension problems in the graphical model literature and leads to a closed form expression for the inverse of the first order stable spline kernel as well as to a new factorization in the form $UWU^\top$ with $U$ upper triangular and $W$ diagonal. Interestingly, all first--order stable spline kernels share the same factor $U$ and $W$ admits a closed form representation in terms of the kernel hyperparameter, making the factorization computationally inexpensive. Maximum likelihood properties of the stable spline kernel are also highlighted. These results can be applied both to improve the stability and to reduce the computational complexity associated with the computation of stable spline estimators.