Goto

Collaborating Authors

 Uncertainty


Variational Boosting: Iteratively Refining Posterior Approximations

arXiv.org Machine Learning

We propose a black-box variational inference method to approximate intractable distributions with an increasingly rich approximating class. Our method, termed variational boosting, iteratively refines an existing variational approximation by solving a sequence of optimization problems, allowing the practitioner to trade computation time for accuracy. We show how to expand the variational approximating class by incorporating additional covariance structure and by introducing new components to form a mixture. We apply variational boosting to synthetic and real statistical models, and show that resulting posterior inferences compare favorably to existing posterior approximation algorithms in both accuracy and efficiency.


Startup Unveils Machine Learning Products Based on Novel Approach to AI

#artificialintelligence

Gamalon Inc, emerged from stealth mode this week, announced two machine learning products, based on an in-house technology known as Bayesian Program Synthesis (BPS). The company claims BPS can perform machine learning tasks 100 times faster than conventional deep learning techniques, while providing more accurate results. "We call our way of doing this Bayesian program learning," said Gamalon founder and CEO, Ben Vigoda at a recent TED talk. He believes using Bayesian probabilistic modeling is a much more efficient way, that is, a much less computationally intensive way, to infuse intelligence into machines. Unlike deep learning, which often needs millions of data examples to train a neural network, a Bayesian model can be built with much fewer examples.


Causal inference via algebraic geometry: feasibility tests for functional causal structures with two binary observed variables

arXiv.org Machine Learning

We provide a scheme for inferring causal relations from uncontrolled statistical data based on tools from computational algebraic geometry, in particular, the computation of Groebner bases. We focus on causal structures containing just two observed variables, each of which is binary. We consider the consequences of imposing different restrictions on the number and cardinality of latent variables and of assuming different functional dependences of the observed variables on the latent ones (in particular, the noise need not be additive). We provide an inductive scheme for classifying functional causal structures into distinct observational equivalence classes. For each observational equivalence class, we provide a procedure for deriving constraints on the joint distribution that are necessary and sufficient conditions for it to arise from a model in that class. We also demonstrate how this sort of approach provides a means of determining which causal parameters are identifiable and how to solve for these. Prospects for expanding the scope of our scheme, in particular to the problem of quantum causal inference, are also discussed.


Approximate Bayes learning of stochastic differential equations

arXiv.org Machine Learning

Gaussian processes are used as flexible models for these functions and estimates are calculated directly from dense data sets using Gaussian process regression. We also develop an approximate expectation maximization algorithm to deal with the unobserved, latent dynamics between sparse observations. The posterior over states is approximated by a piecewise linearized process of the Ornstein-Uhlenbeck type and the maximum a posteriori estimation of the drift is facilitated by a sparse Gaussian process approximation. I. INTRODUCTION Dynamical systems in the physical world evolve in continuous time and often the (noisy) dynamics is described naturally in terms of (stochastic) differential equations [1]. However, due to missing information and/or the complexity of a system it may be difficult to derive such a model from first principles. Instead, the goal often is to fit it to observations of the state at discrete points in time [2]. So far most inference approaches for these systems have dealt with the estimation of parameters contained in the drift function (e.g. Assumptions for the stochastic part were often simple: additive noise with the diffusion constant as the only parameter to estimate. But as both drift and diffusion can be nonlinear functions of the state vector, a nonparametric estimation would be a natural generalization, when a large number of data points is available. Previous nonparametric approaches were based on solving the adjoint Fokker-Planck equation [5] and on kernel estimators [6] and are effectively restricted to one-dimensional models. An alternative would be a Bayesian nonparametric approach, where prior knowledge on the unknown functions--such as smoothness, variability, or periodicity--can be encoded in a probability distribution. A recent result by [7, 8] presented an important step in this direction. The authors have shown that Gaussian processes (GPs) provide a natural family of prior probability measures over drift functions. If a path of the stochastic dynamics is observed densely, the posterior process over the drift is also a GP. Unfortunately, this simplicity is lost, when observations are not dense, but separated by larger time intervals. In [7] the case of sparse observations has been treated by a Monte Carlo approach, which alternates between sampling complete diffusion paths of the stochastic differential equation (SDE) and sampling from GP for the drift given a philipp.batz@tu-berlin.de


The Mathematics of Machine Learning

@machinelearnbot

In the last few months, I have had several people contact me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I've observed that some actually lack the necessary mathematical intuition and framework to get useful results. This is the main reason I decided to write this blog post. Recently, there has been an upsurge in the availability of many easy-to-use machine and deep learning packages such as scikit-learn, Weka, Tensorflow etc. Machine Learning theory is a field that intersects statistical, probabilistic, computer science and algorithmic aspects arising from learning iteratively from data and finding hidden insights which can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary for a good grasp of the inner workings of the algorithms and getting good results. There are many reasons why the mathematics of Machine Learning is important and I'll highlight some of them below: The main question when trying to understand an interdisciplinary field such as Machine Learning is the amount of maths necessary and the level of maths needed to understand these techniques.


Completing a joint PMF from projections: a low-rank coupled tensor factorization approach

arXiv.org Machine Learning

There has recently been considerable interest in completing a low-rank matrix or tensor given only a small fraction (or few linear combinations) of its entries. Related approaches have found considerable success in the area of recommender systems, under machine learning. From a statistical estimation point of view, the gold standard is to have access to the joint probability distribution of all pertinent random variables, from which any desired optimal estimator can be readily derived. In practice high-dimensional joint distributions are very hard to estimate, and only estimates of low-dimensional projections may be available. We show that it is possible to identify higher-order joint PMFs from lower-order marginalized PMFs using coupled low-rank tensor factorization. Our approach features guaranteed identifiability when the full joint PMF is of low-enough rank, and effective approximation otherwise. We provide an algorithmic approach to compute the sought factors, and illustrate the merits of our approach using rating prediction as an example.


Semi-supervised Learning for Discrete Choice Models

arXiv.org Machine Learning

We introduce a semi-supervised discrete choice model to calibrate discrete choice models when relatively few requests have both choice sets and stated preferences but the majority only have the choice sets. Two classic semi-supervised learning algorithms, the expectation maximization algorithm and the cluster-and-label algorithm, have been adapted to our choice modeling problem setting. We also develop two new algorithms based on the cluster-and-label algorithm. The new algorithms use the Bayesian Information Criterion to evaluate a clustering setting to automatically adjust the number of clusters. Two computational studies including a hotel booking case and a large-scale airline itinerary shopping case are presented to evaluate the prediction accuracy and computational effort of the proposed algorithms. Algorithmic recommendations are rendered under various scenarios.


An Empirical Bayes Approach for High Dimensional Classification

arXiv.org Machine Learning

We propose an empirical Bayes estimator based on Dirichlet process mixture model for estimating the sparse normalized mean difference, which could be directly applied to the high dimensional linear classification. In theory, we build a bridge to connect the estimation error of the mean difference and the misclassification error, also provide sufficient conditions of sub-optimal classifiers and optimal classifiers. In implementation, a variational Bayes algorithm is developed to compute the posterior efficiently and could be parallelized to deal with the ultra-high dimensional case.


Distance-Penalized Active Learning Using Quantile Search

arXiv.org Machine Learning

Adaptive sampling theory has shown that, with proper assumptions on the signal class, algorithms exist to reconstruct a signal in $\mathbb{R}^{d}$ with an optimal number of samples. We generalize this problem to the case of spatial signals, where the sampling cost is a function of both the number of samples taken and the distance traveled during estimation. This is motivated by our work studying regions of low oxygen concentration in the Great Lakes. We show that for one-dimensional threshold classifiers, a tradeoff between the number of samples taken and distance traveled can be achieved using a generalization of binary search, which we refer to as quantile search. We characterize both the estimation error after a fixed number of samples and the distance traveled in the noiseless case, as well as the estimation error in the case of noisy measurements. We illustrate our results in both simulations and experiments and show that our method outperforms existing algorithms in the majority of practical scenarios.


The Mathematics of Machine Learning

#artificialintelligence

In the last few months, I have had several people contact me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I've observed that some actually lack the necessary mathematical intuition and framework to get useful results. This is the main reason I decided to write this blog post. Recently, there has been an upsurge in the availability of many easy-to-use machine and deep learning packages such as scikit-learn, Weka, Tensorflow etc. Machine Learning theory is a field that intersects statistical, probabilistic, computer science and algorithmic aspects arising from learning iteratively from data and finding hidden insights which can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary for a good grasp of the inner workings of the algorithms and getting good results.