Uncertainty
Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models
Vehtari, Aki, Mononen, Tommi, Tolvanen, Ville, Sivula, Tuomas, Winther, Ole
The future predictive performance of a Bayesian model can be estimated using Bayesian cross-validation. In this article, we consider Gaussian latent variable models where the integration over the latent values is approximated using the Laplace method or expectation propagation (EP). We study the properties of several Bayesian leave-one-out (LOO) cross-validation approximations that in most cases can be computed with a small additional cost after forming the posterior approximation given the full data. Our main objective is to assess the accuracy of the approximative LOO cross-validation estimators. That is, for each method (Laplace and EP) we compare the approximate fast computation with the exact brute force LOO computation. Secondarily, we evaluate the accuracy of the Laplace and EP approximations themselves against a ground truth established through extensive Markov chain Monte Carlo simulation. Our empirical results show that the approach based upon a Gaussian approximation to the LOO marginal distribution (the so-called cavity distribution) gives the most accurate and reliable results among the fast methods.
Automatic Wordnet Development for Low-Resource Languages using Cross-Lingual WSD
Taghizadeh, Nasrin, Faili, Hesham
Wordnets are an effective resource for natural language processing and information retrieval, especially for semantic processing and meaning related tasks. So far, wordnets have been constructed for many languages. However, the automatic development of wordnets for low-resource languages has not been well studied. In this paper, an Expectation-Maximization algorithm is used to create high quality and large scale wordnets for poorresource languages. The proposed method benefits from possessing cross-lingual word sense disambiguation and develops a wordnet by only using a bi-lingual dictionary and a monolingual corpus. The proposed method has been executed with Persian language and the resulting wordnet has been evaluated through several experiments. The results show that the induced wordnet has a precision score of 90% and a recall score of 35%.
ATD: Anomalous Topic Discovery in High Dimensional Discrete Data
Soleimani, Hossein, Miller, David J.
We propose an algorithm for detecting patterns exhibited by anomalous clusters in high dimensional discrete data. Unlike most anomaly detection (AD) methods, which detect individual anomalies, our proposed method detects groups (clusters) of anomalies; i.e. sets of points which collectively exhibit abnormal patterns. In many applications this can lead to better understanding of the nature of the atypical behavior and to identifying the sources of the anomalies. Moreover, we consider the case where the atypical patterns exhibit on only a small (salient) subset of the very high dimensional feature space. Individual AD techniques and techniques that detect anomalies using all the features typically fail to detect such anomalies, but our method can detect such instances collectively, discover the shared anomalous patterns exhibited by them, and identify the subsets of salient features. In this paper, we focus on detecting anomalous topics in a batch of text documents, developing our algorithm based on topic models. Results of our experiments show that our method can accurately detect anomalous topics and salient features (words) under each such topic in a synthetic data set and two real-world text corpora and achieves better performance compared to both standard group AD and individual AD techniques. All required code to reproduce our experiments is available from https://github.com/hsoleimani/ATD
On the estimation of initial conditions in kernel-based system identification
Risuleo, Riccardo Sven, Bottegal, Giulio, Hjalmarsson, Hรฅkan
Recent developments in system identification have brought attention to regularized kernel-based methods, where, adopting the recently introduced stable spline kernel, prior information on the unknown process is enforced. This reduces the variance of the estimates and thus makes kernel-based methods particularly attractive when few input-output data samples are available. In such cases however, the influence of the system initial conditions may have a significant impact on the output dynamics. In this paper, we specifically address this point. We propose three methods that deal with the estimation of initial conditions using different types of information. The methods consist in various mixed maximum likelihood--a posteriori estimators which estimate the initial conditions and tune the hyperparameters characterizing the stable spline kernel. To solve the related optimization problems, we resort to the expectation-maximization method, showing that the solutions can be attained by iterating among simple update steps. Numerical experiments show the advantages, in terms of accuracy in reconstructing the system impulse response, of the proposed strategies, compared to other kernel-based schemes not accounting for the effect initial conditions.
Blind system identification using kernel-based methods
Bottegal, Giulio, Risuleo, Riccardo S., Hjalmarsson, Hรฅkan
We propose a new method for blind system identification. Resorting to a Gaussian regression framework, we model the impulse response of the unknown linear system as a realization of a Gaussian process. The structure of the covariance matrix (or kernel) of such a process is given by the stable spline kernel, which has been recently introduced for system identification purposes and depends on an unknown hyperparameter. We assume that the input can be linearly described by few parameters. We estimate these parameters, together with the kernel hyperparameter and the noise variance, using an empirical Bayes approach. The related optimization problem is efficiently solved with a novel iterative scheme based on the Expectation-Maximization method. In particular, we show that each iteration consists of a set of simple update rules. We show, through some numerical experiments, very promising performance of the proposed method.
Variational Gaussian Copula Inference
Han, Shaobo, Liao, Xuejun, Dunson, David B., Carin, Lawrence
We utilize copulas to constitute a unified framework for constructing and optimizing variational proposals in hierarchical Bayesian models. For models with continuous and non-Gaussian hidden variables, we propose a semiparametric and automated variational Gaussian copula approach, in which the parametric Gaussian copula family is able to preserve multivariate posterior dependence, and the nonparametric transformations based on Bernstein polynomials provide ample flexibility in characterizing the univariate marginal posteriors.
Recurrent Exponential-Family Harmoniums without Backprop-Through-Time
Makin, Joseph G., Dichter, Benjamin K., Sabes, Philip N.
Exponential-family harmoniums (EFHs), which extend restricted Boltzmann machines (RBMs) from Bernoulli random variables to other exponential families (Welling et al., 2005), are generative models that can be trained with unsupervised-learning techniques, like contrastive divergence (Hinton et al., 2006; Hinton, 2002), as density estimators for static data. Methods for extending RBMs--and likewise EFHs--to data with temporal dependencies have been proposed previously (Sutskever and Hinton, 2007; Sutskever et al., 2009), the learning procedure being validated by qualitative assessment of the generative model. Here we propose and justify, from a very different perspective, an alternative training procedure, proving sufficient conditions for optimal inference under that procedure. The resulting algorithm can be learned with only forward passes through the data--backprop-through-time is not required, as in previous approaches. The proof exploits a recent result about information retention in density estimators (Makin and Sabes, 2015), and applies it to a "recurrent EFH" (rEFH) by induction. Finally, we demonstrate optimality by simulation, testing the rEFH: (1) as a filter on training data generated with a linear dynamical system, the position of which is noisily reported by a population of "neurons" with Poisson-distributed spike counts; and (2) with the qualitative experiments proposed by Sutskever et al. (2009).
Classification of Big Data with Application to Imaging Genetics
Ulfarsson, Magnus O., Palsson, Frosti, Sigurdsson, Jakob, Sveinsson, Johannes R.
ECENT technological achievements and globalization have increased data acquisition capability in almost all corners of human activities, ranging from scientific and engineering endeavors such as genomics, medical imaging, remote sensing, economics and finance, and all the way to people's personal lives with the emergence of social media through the world wide web and mobile networks. The enormous growth of data creates daunting challenges, not only in finding out how to store and access the data, but more importantly, how to process and make sense of it. Also, since data collection is expensive, we are somehow obliged to make good use of the data at hand, so it is obvious that for further progress, the development of efficient algorithms for processing big data is very important. Big data is usually considered in terms of the number of observations n and the number of variables p measured on each observation. In many branches of science such as genetics and medical imaging, the number of variables is very large and is often much larger than the number of observations. This scenario is often denoted as p n.
Arimo Predictive Engine (tm) Shows Opportunity to Improve Investor Returns in Peer-to-Peer Lending - Arimo
Random forest model using Lending Club public dataset shows opportunity to improve adjusted return by 2.75% Arimo recently performed a study using a public dataset provided by Lending Club with the goal of showing how machine learning could improve investor returns. To do this we used the PredictiveEngine component of our Data Intelligence Platform, which provides the ability to easily build a variety of predictive machine learning models which scale transparently when deployed on distributed parallel computing platforms. Lending Club is an online peer-to-peer lending company that connects borrowers with investors who have capital to lend. When a loan application is submitted by a borrower, Lending Club reviews and decides whether to offer a loan at a risk-adjusted rate or to reject the application. As of the 3rd quarter of 2015, more than 12 billion in loans have been issued through Lending Club.
How To Think Real Good
First, it is a brain dump: too long, epsilon-baked, and unpolished. Second, it is not obviously relevant to the topic of this site. Third, parts are more technical than most readers would want. However, a quick, bad post may be better than none. This post was prompted by discussions about Bayesianism and the LessWrong rationalist community, with Scott Alexander, Catharine G. Evans, muflax, and St. Rev. (among others). They are each brilliant, quirky, articulate, and fascinating; consider following them online! They might disagree with much of this post, though, and are not implicated in its defects.] This site concerns ways of thinking about some particularly important things: purpose, self, ethics, authority, and meaning, for instance. My aim is to point out common mistakes in thinking about those things, and how to do better. I enjoy thinking about thinking. That's one reason I spent a dozen years in artificial intelligence research. To make a computer think, you'd need to understand how you think. So AI research is a way of thinking about thinking that forces you to be specific. It calls your bluff if you think you understand thinking, but don't. I thought a lot about how to do AI. 1 In 1988, I put together "How to do research at the MIT AI Lab," a guide for graduate students. Although I edited it, it was a collaboration of many people. There are now many similar guides, some of them better, but this was the first.