Goto

Collaborating Authors

 Bayesian Learning


Generation of discrete random variables in scalable frameworks

arXiv.org Machine Learning

Abstract: In this paper, we face the problem of simulating discrete random variables with general and varying distributions in a scalable framework, where fully parallelizable operations should be preferred. The new paradigm is inspired by the context of discrete choice models. Compared to classical algorithms, we add parallelized randomness, and we leave the final simulation of the random variable to a single associative operation. We characterize the set of algorithms that work in this way, and those algorithms that may have an additive or multiplicative local noise. As a consequence, we could define a natural way to solve some popular simulation problems.


Introduction to Machine Learning

#artificialintelligence

About โ€ข subfield of Artificial Intelligence (AI) โ€ข name is derived from the concept that it deals with "construction and study of systems that can learn from data" โ€ข can be seen as building blocks to make computers learn to behave more intelligently โ€ข It is a theoretical concept. There are various techniques with various implementations.


A case study of Empirical Bayes in User-Movie Recommendation system

arXiv.org Machine Learning

In this article we provide a formulation of empirical bayes described by Atchade (2011) to tune the hyperparameters of priors used in bayesian set up of collaborative filter. We implement the same in MovieLens small dataset. We see that it can be used to get a good initial choice for the parameters. It can also be used to guess an initial choice for hyper-parameters in grid search procedure even for the datasets where MCMC oscillates around the true value or takes long time to converge.


Bayesian Models of Data Streams with Hierarchical Power Priors

arXiv.org Machine Learning

Making inferences from data streams is a pervasive problem in many modern data analysis applications. But it requires to address the problem of continuous model updating, and adapt to changes or drifts in the underlying data generating distribution. In this paper, we approach these problems from a Bayesian perspective covering general conjugate exponential models. Our proposal makes use of non-conjugate hierarchical priors to explicitly model temporal changes of the model parameters. We also derive a novel variational inference scheme which overcomes the use of non-conjugate priors while maintaining the computational efficiency of variational methods over conjugate models. The approach is validated on three real data sets over three latent variable models.


Exhaustive search for sparse variable selection in linear regression

arXiv.org Machine Learning

We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search method (AES-K) for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as density of states. With this density of states, we can compare different methods for selecting sparse variables such as relaxation and sampling. For large problems where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables density of states to be effectively reconstructed by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found the difficulty to determine K from the data. Using virtual measurement and analysis, we argue that this is caused by data shortage.


Note Value Recognition for Piano Transcription Using Markov Random Fields

arXiv.org Artificial Intelligence

This paper presents a statistical method for use in music transcription that can estimate score times of note onsets and offsets from polyphonic MIDI performance signals. Because performed note durations can deviate largely from score-indicated values, previous methods had the problem of not being able to accurately estimate offset score times (or note values) and thus could only output incomplete musical scores. Based on observations that the pitch context and onset score times are influential on the configuration of note values, we construct a context-tree model that provides prior distributions of note values using these features and combine it with a performance model in the framework of Markov random fields. Evaluation results show that our method reduces the average error rate by around 40 percent compared to existing/simple methods. We also confirmed that, in our model, the score model plays a more important role than the performance model, and it automatically captures the voice structure by unsupervised learning.


Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

arXiv.org Machine Learning

Extracting meaningful knowledge from large and nonlinearly-connected data structures is of primary importance for efficiently utilizing data. Big data problems (e.g. 1 GB/s) often contain superpositions of multiple distinct processes, sources, or latent factors. Estimating or inferring the component distributions or statistical factors is called the mixture problem. Methods for solving mixture problems are known as mixture models [Everitt, 1996], and in machine learning they are used to define Bayes classifiers [Bishop, 2006]. Mixture models are a widely applicable pattern recognition and dimensionality reduction approach for extracting meaningful content from large and complex datasets. Only finite mixture models are described here, although countably or uncountably infinite numbers of mixture components are also possible [McAuliffe et al., 2006]. In terms of dimensionality reduction methods, Laplacian mixture models provide global and nonhierarchical analyses of massive datasets using scalable algorithms.


Generalized Random Forests

arXiv.org Machine Learning

We propose generalized random forests, a method for non-parametric statistical estimation based on random forests (Breiman, 2001) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method operates at a particular point in covariate space by considering a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian, and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: non-parametric quantile regression, conditional average partial effect estimation, and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN.


How Bayesian Inference Works

@machinelearnbot

Brandon is an author and deep learning developer. He has worked as Principal Data Scientist at Microsoft, as well as for DuPont Pioneer and Sandia National Laboratories. Brandon earned a Ph.D. in Mechanical Engineering from the Massachusetts Institute of Technology. Bayesian inference is a way to get sharper predictions from your data. It's particularly useful when you don't have as much data as you would like and want to juice every last bit of predictive strength from it. Although it is sometimes described with reverence, Bayesian inference isn't magic or mystical. And even though the math under the hood can get dense, the concepts behind it are completely accessible. In brief, Bayesian inference lets you draw stronger conclusions from your data by folding in what you already know about the answer. Bayesian inference is based on the ideas of Thomas Bayes, a nonconformist Presbyterian minister in London about 300 years ago. He wrote two books, one on theology, and one on probability.


Structured Black Box Variational Inference for Latent Time Series Models

arXiv.org Machine Learning

Continuous latent time series models are prevalent in Bayesian modeling; examples include the Kalman filter, dynamic collaborative filtering, or dynamic topic models. These models often benefit from structured, non mean field variational approximations that capture correlations between time steps. Black box variational inference with reparameterization gradients (BBVI) allows us to explore a rich new class of Bayesian non-conjugate latent time series models; however, a naive application of BBVI to such structured variational models would scale quadratically in the number of time steps. We describe a BBVI algorithm analogous to the forward-backward algorithm which instead scales linearly in time. It allows us to efficiently sample from the variational distribution and estimate the gradients of the ELBO. Finally, we show results on the recently proposed dynamic word embedding model, which was trained using our method.