Bayesian Learning
Collaborative Filtering using Denoising Auto-Encoders for Market Basket Data
Abad, Andres G., Reyes-Castro, Luis I.
Recommender systems (RS) help users navigate large sets of items in the search for "interesting" ones. One approach to RS is Collaborative Filtering (CF), which is based on the idea that similar users are interested in similar items. Most model-based approaches to CF seek to train a machine-learning/data-mining model on sparse data; the model is then used to provide recommendations. While most of the proposed approaches are effective for small instances, the combinatorial nature of the problem makes them impractical for medium-to-large instances. In this work we present a novel approach to CF that works by training a Denoising Auto-Encoder (DAE) on corrupted baskets, i.e., baskets from which one or more items have been removed. The DAE is then forced to learn to reconstruct the original basket given its corrupted input. Due to recent advancements in optimization and other technologies for training neural-network models (such as DAEs), the proposed method results in a scalable and practical approach to CF. The contribution of this work is twofold: (1) to identify missing items in observed baskets, thus directly providing a CF model; and (2) to construct a generative model of baskets which may be used, for instance, in simulation analysis or as part of a more complex analytical method.
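The corruption step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy catalog, and the choice to remove a fixed number of items uniformly at random are all assumptions. It only builds the (corrupted input, original target) pairs on which a DAE could be trained.

```python
import random

def corrupt_basket(basket, n_remove, rng):
    """Return a copy of `basket` with `n_remove` randomly chosen items removed.

    At least one item is always kept, so very small baskets lose fewer items.
    """
    items = sorted(basket)
    n_remove = min(n_remove, max(len(items) - 1, 0))
    removed = set(rng.sample(items, n_remove))
    return set(items) - removed

def to_binary(basket, catalog):
    """Encode a basket as a 0/1 vector over the item catalog."""
    return [1 if item in basket else 0 for item in catalog]

def make_training_pairs(baskets, catalog, n_remove=1, seed=0):
    """Build (corrupted input, original target) vector pairs."""
    rng = random.Random(seed)
    pairs = []
    for basket in baskets:
        corrupted = corrupt_basket(basket, n_remove, rng)
        pairs.append((to_binary(corrupted, catalog), to_binary(basket, catalog)))
    return pairs

# Hypothetical toy data.
catalog = ["bread", "butter", "milk", "eggs"]
baskets = [{"bread", "butter"}, {"milk", "eggs", "bread"}]
pairs = make_training_pairs(baskets, catalog)
```

At recommendation time, an observed basket would be encoded the same way and the items with the highest reconstructed scores that are absent from the input would be the candidates for the "missing" items.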
Semi-supervised emotion lexicon expansion with label propagation and specialized word embeddings
There exist two main approaches to automatically extract affective orientation: lexicon-based and corpus-based. In this work, we argue that these two methods are compatible and show that combining them can improve the accuracy of emotion classifiers. In particular, we introduce a novel variant of the Label Propagation algorithm that is tailored to distributed word representations, we apply batch gradient descent to accelerate the optimization of label propagation and to make the optimization feasible for large graphs, and we propose a reproducible method for emotion lexicon expansion. We conclude that label propagation can expand an emotion lexicon in a meaningful way and that the expanded emotion lexicon can be leveraged to improve the accuracy of an emotion classifier.
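As a rough illustration of lexicon expansion by label propagation over word embeddings, the sketch below uses the classic iterative scheme with clamped seed labels. It is not the variant introduced in the paper (which is tailored to distributed representations and optimized with batch gradient descent); the toy 2-d embeddings, the cosine-similarity graph, and all names are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def propagate_labels(embeddings, seeds, n_iter=50):
    """Spread seed scores over a similarity graph.

    embeddings: dict word -> vector; seeds: dict word -> score in [0, 1].
    Unlabelled words start at 0.5 and repeatedly take the similarity-weighted
    average of their neighbours' scores; seed words stay clamped.
    """
    words = list(embeddings)
    # Edge weights: non-negative cosine similarity, no self-loops.
    weights = {
        a: {b: max(cosine(embeddings[a], embeddings[b]), 0.0)
            for b in words if b != a}
        for a in words
    }
    scores = {word: seeds.get(word, 0.5) for word in words}
    for _ in range(n_iter):
        updated = {}
        for a in words:
            if a in seeds:
                updated[a] = seeds[a]  # clamp the seed lexicon
                continue
            total = sum(weights[a].values())
            if total > 0:
                updated[a] = sum(w * scores[b] for b, w in weights[a].items()) / total
            else:
                updated[a] = scores[a]
        scores = updated
    return scores

# Hypothetical toy embeddings and a two-word seed lexicon (1 = positive, 0 = negative).
embeddings = {
    "happy": (1.0, 0.1), "joyful": (0.9, 0.2),
    "sad": (0.1, 1.0), "gloomy": (0.2, 0.9),
}
scores = propagate_labels(embeddings, seeds={"happy": 1.0, "sad": 0.0})
```

In this toy graph "joyful" ends up above 0.5 and "gloomy" below it, which is the sense in which a small seed lexicon is expanded to unlabelled words.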
Model-Based Multiple Instance Learning
Vo, Ba-Ngu, Phung, Dinh, Tran, Quang N., Vo, Ba-Tuong
While Multiple Instance (MI) data are point patterns -- sets or multi-sets of unordered points -- appropriate statistical point pattern models have not been used in MI learning. This article proposes a framework for model-based MI learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed.
Automatic Selection of t-SNE Perplexity
In practice, proper tuning of t-SNE perplexity requires users to understand the inner working of the method as well as to have hands-on experience. We propose a model selection objective for t-SNE perplexity that requires negligible extra computation beyond that of the t-SNE itself. We empirically validate that the perplexity settings found by our approach are consistent with preferences elicited from human experts across a number of datasets. The similarities of our approach to Bayesian information criteria (BIC) and minimum description length (MDL) are also analyzed.
A probabilistic model for the numerical solution of initial value problems
Schober, Michael, Särkkä, Simo, Hennig, Philipp
In recent years, the search for numerical algorithms which return probability distributions over the solution for a given numerical problem has become an active area of research [25]. Several models and methods have been proposed for the solution of initial value problems (IVPs) [57, 7, 51, 9, 31, 61]. However, these probabilistic algorithms have no immediate connection to the extensive literature on this task in numerical analysis. Most importantly, such inference algorithms do not come with convergence analysis out of the box. The methods in [7, 9, 61] have convergence results, but their respective implementations are based on sampling schemes and, thus, do not offer guarantees for individual runs. The methods in [51, 31] offer a deterministic execution and an analytical guarantee for the first step, but we will show that this guarantee is lacking for the whole integration domain. In this paper, we present a class of probabilistic solvers which combine properties of the standard and the probabilistic algorithms. We formulate desiderata that users might have for a probabilistic numerical algorithm.
The Multivariate Generalised von Mises distribution: Inference and applications
Navarro, Alexandre K. W., Frellsen, Jes, Turner, Richard E.
Circular variables arise in a multitude of data-modelling contexts ranging from robotics to the social sciences, but they have been largely overlooked by the machine learning community. This paper partially redresses this imbalance by extending some standard probabilistic modelling tools to the circular domain. First, we introduce a new multivariate distribution over circular variables, called the multivariate Generalised von Mises (mGvM) distribution. This distribution can be constructed by restricting and renormalising a general multivariate Gaussian distribution to the unit hyper-torus. Previously proposed multivariate circular distributions are shown to be special cases of this construction. Second, we introduce a new probabilistic model for circular regression that is inspired by Gaussian Processes, and a method for probabilistic principal component analysis with circular hidden variables. These models can leverage standard modelling tools (e.g. covariance functions and methods for automatic relevance determination). Third, we show that the posterior distribution in these models is an mGvM distribution, which enables development of an efficient variational free-energy scheme for performing approximate inference and approximate maximum-likelihood learning.
Variational Bayesian inference for linear and logistic regression
The article describes the model, derivation, and implementation of variational Bayesian inference for linear and logistic regression, both with and without automatic relevance determination. It has the dual function of acting as a tutorial for the derivation of variational Bayesian inference for simple models, and of documenting, and providing brief examples for, the MATLAB functions that implement this inference. These functions are freely available online. 1. Introduction Linear and logistic regression are essential workhorses of statistical analysis, whose Bayesian treatment has received much recent attention (Gelman et al., 2013; Bishop, 2006; Murphy, 2012; Hastie et al., 2011). Such a treatment allows specifying a priori uncertainty and inferring a posteriori uncertainty about regression coefficients explicitly and hierarchically, for example by specifying how uncertain we are a priori that these coefficients are small. However, Bayesian inference in such hierarchical models quickly becomes intractable, so recent effort has focused on approximate inference, such as Markov Chain Monte Carlo methods (Gilks et al., 1995) or variational Bayesian approximation (Beal, 2003; Bishop, 2006; Murphy, 2012). Here, we describe such a variational treatment and implementation of Bayesian hierarchical models for both linear and logistic regression. Even though neither the statistical models nor their Bayesian approximation are particularly novel, the article provides a tutorial-style introduction to the derivation of their algorithms, together with a MATLAB implementation of these algorithms.
Delayed acceptance ABC-SMC
Everitt, Richard G., Rowińska, Paulina A.
Approximate Bayesian computation (ABC) is now an established technique for statistical inference used in cases where the likelihood function is computationally expensive or not available. It relies on the use of a model that is specified in the form of a simulator, and approximates the likelihood at a parameter $\theta$ by simulating auxiliary data sets $x$ and evaluating the distance of $x$ from the true data $y$. However, ABC is not computationally feasible in cases where using the simulator for each $\theta$ is very expensive. This paper investigates this situation in cases where a cheap, but approximate, simulator is available. The approach is to employ delayed acceptance Markov chain Monte Carlo (MCMC) within an ABC sequential Monte Carlo (SMC) sampler in order to, in a first stage of the kernel, use the cheap simulator to rule out parts of the parameter space that are not worth exploring, so that the "true" simulator is only run (in the second stage of the kernel) where there is a reasonable chance of accepting proposed values of $\theta$. We show that this approach can be used quite automatically, with the only tuning parameter choice additional to ABC-SMC being the number of particles we wish to carry through to the second stage of the kernel. Applications to stochastic differential equation models and latent doubly intractable distributions are presented.
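The idea of screening proposals with a cheap simulator before paying for the expensive one can be illustrated in the simplest ABC setting. The sketch below is a two-stage ABC rejection sampler, not the delayed-acceptance MCMC kernel inside an SMC sampler that the paper studies, and unlike a proper delayed-acceptance scheme it makes no correction for the screening stage. Both simulators, the uniform prior, and the tolerances `eps_cheap` and `eps` are hypothetical stand-ins.

```python
import random

def cheap_simulator(theta, rng):
    """Hypothetical cheap, noisy approximation: mean of 5 Gaussian draws."""
    return sum(rng.gauss(theta, 1.0) for _ in range(5)) / 5

def expensive_simulator(theta, rng):
    """Hypothetical stand-in for the 'true' simulator: mean of 50 draws."""
    return sum(rng.gauss(theta, 1.0) for _ in range(50)) / 50

def two_stage_abc(y_obs, n_proposals, eps_cheap, eps, seed=0):
    """ABC rejection with a cheap pre-screen.

    Stage 1 discards proposals whose cheap simulation lands far from y_obs;
    stage 2 runs the expensive simulator only on the survivors.
    """
    rng = random.Random(seed)
    accepted = []
    expensive_calls = 0
    for _ in range(n_proposals):
        theta = rng.uniform(-5.0, 5.0)  # draw from an assumed uniform prior
        if abs(cheap_simulator(theta, rng) - y_obs) > eps_cheap:
            continue  # ruled out without touching the expensive simulator
        expensive_calls += 1
        if abs(expensive_simulator(theta, rng) - y_obs) <= eps:
            accepted.append(theta)
    return accepted, expensive_calls

accepted, expensive_calls = two_stage_abc(
    y_obs=1.0, n_proposals=2000, eps_cheap=1.5, eps=0.3
)
```

Comparing `expensive_calls` with `n_proposals` shows how many runs of the expensive simulator the first stage avoids; a loose `eps_cheap` keeps the screen conservative at the price of fewer savings.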
7 Machine Learning Algorithms Every Engineer Should Know
Machine Learning, a branch of Artificial Intelligence, is based on the idea that machines should be able to learn and adapt through experience. It has been gaining popularity over the last couple of years. Machine learning is one approach to achieving Artificial Intelligence by using algorithms. It is predicted that Machine Learning algorithms may replace a wealth of jobs in the coming years. Logistic Regression is a statistical method for estimating discrete values (usually binary values) from a set of independent variables.
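To make the logistic regression description concrete, here is a minimal sketch that fits a binary classifier by batch gradient descent on the log-loss. The toy data, the learning rate, and the function names are assumptions chosen for illustration; in practice a library implementation would normally be used.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """Estimated probability that the label of x is 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def fit_logistic(xs, ys, lr=0.1, n_iter=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_features = len(xs[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(x, weights, bias) - y  # gradient of log-loss w.r.t. the score
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / len(xs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(xs)
    return weights, bias

# Hypothetical toy data: one independent variable, binary outcome.
xs = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
ys = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic(xs, ys)
```

Calling `predict_proba([0.5], weights, bias)` then gives a probability below 0.5 and `predict_proba([4.0], weights, bias)` one above 0.5, which is how the continuous output is turned into a discrete prediction.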
A Latent Variable Model for Two-Dimensional Canonical Correlation Analysis and its Variational Inference
Safayani, Mehran, Momenzadeh, Saeid
Describing dimension reduction (DR) techniques by means of probabilistic models has recently received special attention. Probabilistic models, in addition to offering better interpretability of DR methods, provide a framework for further extensions of such algorithms. One of the new approaches to probabilistic DR methods is to preserve the internal structure of the data; that is, the data need not first be converted from matrix or tensor format to vector format in the process of dimensionality reduction. In this paper, a latent variable model for matrix-variate data for canonical correlation analysis (CCA) is proposed. Since in general there is no analytical maximum likelihood solution for this model, we present two approaches for learning the parameters. The proposed methods are evaluated on synthetic data in terms of convergence and quality of mappings. A real data set is also employed to compare the proposed methods with several probabilistic and non-probabilistic CCA-based approaches. The results confirm the superiority of the proposed methods with respect to the competing algorithms. Moreover, this model can be considered as a framework for further extensions.