Goto

Collaborating Authors

 Uncertainty


Continuous Semantic Topic Embedding Model Using Variational Autoencoder

arXiv.org Machine Learning

This paper proposes the continuous semantic topic embedding model (CSTEM) which finds latent topic variables in documents using continuous semantic distance function between the topics and the words by means of the vari-ational autoencoder(V AE). The semantic distance could be represented by any symmetric bell-shaped geometric distance function on the Euclidean space, for which the Mahalanobis distance is used in this paper. In order for the semantic distance to perform more properly, we newly introduce an additional model parameter for each word to take out the global factor from this distance indicating how likely it occurs regardless of its topic. It certainly improves the problem that the Gaussian distribution which is used in previous topic model with continuous word embedding could not explain the semantic relation correctly and helps to obtain the higher topic coherence. Through the experiments with the dataset of 20 Newsgroup, NIPS papers and CNN/Dailymail corpus, the performance of the recent state-of-the-art models is accomplished by our model as well as generating topic embedding vectors which makes possible to observe where the topic vectors are embedded with the word vectors in the real Euclidean space and how the topics are related each other semantically.


Inference via low-dimensional couplings

arXiv.org Machine Learning

We investigate the low-dimensional structure of deterministic transformations between random variables, i.e., transport maps between probability measures. In the context of statistics and machine learning, these transformations can be used to couple a tractable "reference" measure (e.g., a standard Gaussian) with a target measure of interest. Direct simulation from the desired measure can then be achieved by pushing forward reference samples through the map. Yet characterizing such a map---e.g., representing and evaluating it---grows challenging in high dimensions. The central contribution of this paper is to establish a link between the Markov properties of the target measure and the existence of low-dimensional couplings, induced by transport maps that are sparse and/or decomposable. Our analysis not only facilitates the construction of transformations in high-dimensional settings, but also suggests new inference methodologies for continuous non-Gaussian graphical models. For instance, in the context of nonlinear state-space models, we describe new variational algorithms for filtering, smoothing, and sequential parameter inference. These algorithms can be understood as the natural generalization---to the non-Gaussian case---of the square-root Rauch-Tung-Striebel Gaussian smoother.


Diversity-Promoting Bayesian Learning of Latent Variable Models

arXiv.org Machine Learning

To address three important issues involved in latent variable models (LVMs), including capturing infrequent patterns, achieving small-sized but expressive models and alleviating overfitting, several studies have been devoted to "diversifying" LVMs, which aim at encouraging the components in LVMs to be diverse. Most existing studies fall into a frequentist-style regularization framework, where the components are learned via point estimation. In this paper, we investigate how to "diversify" LVMs in the paradigm of Bayesian learning. We propose two approaches that have complementary advantages. One is to define a diversity-promoting mutual angular prior which assigns larger density to components with larger mutual angles and use this prior to affect the posterior via Bayes' rule. We develop two efficient approximate posterior inference algorithms based on variational inference and MCMC sampling. The other approach is to impose diversity-promoting regularization directly over the post-data distribution of components. We also extend our approach to "diversify" Bayesian nonparametric models where the number of components is infinite. A sampling algorithm based on slice sampling and Hamiltonian Monte Carlo is developed. We apply these methods to "diversify" Bayesian mixture of experts model and infinite latent feature model. Experiments on various datasets demonstrate the effectiveness and efficiency of our methods.


Bayesian Nonparametric Models

@machinelearnbot

Bayesian Nonparametrics is a class of models with a potentially infinite number of parameters. High flexibility and expressive power of this approach enables better data modelling compared to parametric methods.


Dynamic classifier chains for multi-label learning

arXiv.org Machine Learning

In this paper, we deal with the task of building a dynamic ensemble of chain classifiers for multi-label classification. To do so, we proposed two concepts of classifier chains algorithms that are able to change label order of the chain without rebuilding the entire model. Such modes allows anticipating the instance-specific chain order without a significant increase in computational burden. The proposed chain models are built using the Naive Bayes classifier and nearest neighbour approach as a base single-label classifiers. To take the benefits of the proposed algorithms, we developed a simple heuristic that allows the system to find relatively good label order. The heuristic sort labels according to the label-specific classification quality gained during the validation phase. The heuristic tries to minimise the phenomenon of error propagation in the chain. The experimental results showed that the proposed model based on Naive Bayes classifier the above-mentioned heuristic is an efficient tool for building dynamic chain classifiers.


An Efficient ADMM Algorithm for Structural Break Detection in Multivariate Time Series

arXiv.org Machine Learning

We present an efficient alternating direction method of multipliers (ADMM) algorithm for segmenting a multivariate non-stationary time series with structural breaks into stationary regions. We draw from recent work where the series is assumed to follow a vector autoregressive model within segments and a convex estimation procedure may be formulated using group fused lasso penalties. Our ADMM approach first splits the convex problem into a global quadratic program and a simple group lasso proximal update. We show that the global problem may be parallelized over rows of the time dependent transition matrices and furthermore that each subproblem may be rewritten in a form identical to the log-likelihood of a Gaussian state space model. Consequently, we develop a Kalman smoothing algorithm to solve the global update in time linear in the length of the series.


Variational Bayesian Inference For A Scale Mixture Of Normal Distributions Handling Missing Data

arXiv.org Machine Learning

In this paper, a scale mixture of Normal distributions model is developed for classification and clustering of data having outliers and missing values. The classification method, based on a mixture model, focuses on the introduction of latent variables that gives us the possibility to handle sensitivity of model to outliers and to allow a less restrictive modelling of missing data. Inference is processed through a Variational Bayesian Approximation and a Bayesian treatment is adopted for model learning, supervised classification and clustering.


Adversarial Phenomenon in the Eyes of Bayesian Deep Learning

arXiv.org Machine Learning

Deep Learning models are vulnerable to adversarial examples, i.e.\ images obtained via deliberate imperceptible perturbations, such that the model misclassifies them with high confidence. However, class confidence by itself is an incomplete picture of uncertainty. We therefore use principled Bayesian methods to capture model uncertainty in prediction for observing adversarial misclassification. We provide an extensive study with different Bayesian neural networks attacked in both white-box and black-box setups. The behaviour of the networks for noise, attacks and clean test data is compared. We observe that Bayesian neural networks are uncertain in their predictions for adversarial perturbations, a behaviour similar to the one observed for random Gaussian perturbations. Thus, we conclude that Bayesian neural networks can be considered for detecting adversarial examples.


The De-Biased Whittle Likelihood

arXiv.org Machine Learning

The Whittle likelihood is a widely used and computationally efficient pseudo-likelihood. However, it is known to produce biased parameter estimates for large classes of models. We propose a method for de-biasing Whittle estimates for second-order stationary stochastic processes. The de-biased Whittle likelihood can be computed in the same $\mathcal{O}(n\log n)$ operations as the standard approach. We demonstrate the superior performance of the method in simulation studies and in application to a large-scale oceanographic dataset, where in both cases the de-biased approach reduces bias by up to two orders of magnitude, achieving estimates that are close to exact maximum likelihood, at a fraction of the computational cost. We prove that the method yields estimates that are consistent at an optimal convergence rate of $n^{-1/2}$, under weaker assumptions than standard theory, where we do not require that the power spectral density is continuous in frequency. We describe how the method can be easily combined with standard methods of bias reduction, such as tapering and differencing, to further reduce bias in parameter estimates.


Domain Generalization by Marginal Transfer Learning

arXiv.org Machine Learning

Domain generalization is the problem of assigning class labels to an unlabeled test data set, given several labeled training data sets drawn from similar distributions. This problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a distribution-free, kernel-based approach that predicts a classifier from the marginal distribution of features, by leveraging the trends present in related classification tasks. This approach involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over the space. We present generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on synthetic data and three real data applications demonstrate the superiority of the method with respect to a pooling strategy.