Goto

Collaborating Authors

 Bayesian Inference


Bayesian Machine Learning in Python: A/B Testing

#artificialintelligence

Link: Bayesian Machine Learning in Python: A/B Testing coupon code udemy Traditional A/B testing has been around for a long time, and it's full of approximations and confusing definitions. In this course, while we will do traditional A/B testing in order to appreciate its complexity, what we will eventually get to is the Bayesian machine learning way of doing things. First, we'll see if we can improve on ... Bestseller by Lazy Programmer Inc. What you'll learn Use adaptive algorithms to improve A/B testing performance Understand the difference between Bayesian and frequentist statistics Apply Bayesian methods to A/B testing Description This course is all about A/B testing. A/B testing is used everywhere.


Amortised Learning by Wake-Sleep

arXiv.org Machine Learning

Models that employ latent variables to capture structure in observed data lie at the heart of many current unsupervised learning algorithms, but exact maximum-likelihood learning for powerful and flexible latent-variable models is almost always intractable. Thus, state-of-the-art approaches either abandon the maximum-likelihood framework entirely, or else rely on a variety of variational approximations to the posterior distribution over the latents. Here, we propose an alternative approach that we call amortised learning. Rather than computing an approximation to the posterior over latents, we use a wake-sleep Monte-Carlo strategy to learn a function that directly estimates the maximum-likelihood parameter updates. Amortised learning is possible whenever samples of latents and observations can be simulated from the generative model, treating the model as a "black box". We demonstrate its effectiveness on a wide range of complex models, including those with latents that are discrete or supported on non-Euclidean spaces.


Stochastic Gradient MCMC with Repulsive Forces

arXiv.org Machine Learning

We propose a unifying view of two different Bayesian inference algorithms, Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) and Stein Variational Gradient Descent (SVGD), leading to improved and efficient novel sampling schemes. We show that SVGD combined with a noise term can be framed as a multiple chain SG-MCMC method. Instead of treating each parallel chain independently from others, our proposed algorithm implements a repulsive force between particles, avoiding collapse and facilitating a better exploration of the parameter space. We also show how the addition of this noise term is necessary to obtain a valid SG-MCMC sampler, a significant difference with SVGD. Experiments with both synthetic distributions and real datasets illustrate the benefits of the proposed scheme.


Split-BOLFI for for misspecification-robust likelihood free inference in high dimensions

arXiv.org Machine Learning

Likelihood-free inference for simulator-based statistical models has recently grown rapidly from its infancy to a useful tool for practitioners. However, models with more than a very small number of parameters as the target of inference have remained an enigma, in particular for the approximate Bayesian computation (ABC) community. To advance the possibilities for performing likelihood-free inference in high-dimensional parameter spaces, here we introduce an extension of the popular Bayesian optimisation based approach to approximate discrepancy functions in a probabilistic manner which lends itself to an efficient exploration of the parameter space. Our method achieves computational scalability by using separate acquisition procedures for the discrepancies defined for different parameters. These efficient high-dimensional simulation acquisitions are combined with exponentiated loss-likelihoods to provide a misspecification-robust characterisation of the marginal posterior distribution for all model parameters. The method successfully performs computationally efficient inference in a 100-dimensional space on canonical examples and compares favourably to existing Copula-ABC methods. We further illustrate the potential of this approach by fitting a bacterial transmission dynamics model to daycare centre data, which provides biologically coherent results on the strain competition in a 30-dimensional parameter space.


Preference Modeling with Context-Dependent Salient Features

arXiv.org Machine Learning

We consider the problem of estimating a ranking on a set of items from noisy pairwise comparisons given item features. We address the fact that pairwise comparison data often reflects irrational choice, e.g. intransitivity. Our key observation is that two items compared in isolation from other items may be compared based on only a salient subset of features. Formalizing this framework, we propose the "salient feature preference model" and prove a sample complexity result for learning the parameters of our model and the underlying ranking with maximum likelihood estimation. We also provide empirical results that support our theoretical bounds and illustrate how our model explains systematic intransitivity. Finally we demonstrate strong performance of maximum likelihood estimation of our model on both synthetic data and two real data sets: the UT Zappos50K data set and comparison data about the compactness of legislative districts in the US.


Leveraging Cross Feedback of User and Item Embeddings for Variational Autoencoder based Collaborative Filtering

arXiv.org Machine Learning

Matrix factorization (MF) has been widely applied to collaborative filtering in recommendation systems. Its Bayesian variants can derive posterior distributions of user and item embeddings, and are more robust to sparse ratings. However, the Bayesian methods are restricted by their update rules for the posterior parameters due to the conjugacy of the priors and the likelihood. Neural networks can potentially address this issue by capturing complex mappings between the posterior parameters and the data. In this paper, we propose a variational auto-encoder based Bayesian MF framework. It leverages not only the data but also the information from the embeddings to approximate their joint posterior distribution. The approximation is an iterative procedure with cross feedback of user and item embeddings to the others' encoders. More specifically, user embeddings sampled in the previous iteration, alongside their ratings, are fed back into the item-side encoders to compute the posterior parameters for the item embeddings in the current iteration, and vice versa. The decoder network then reconstructs the data using the MF with the currently re-sampled user and item embeddings. We show the effectiveness of our framework in terms of reconstruction errors across five real-world datasets. We also perform ablation studies to illustrate the importance of the cross feedback component of our framework in lowering the reconstruction errors and accelerating the convergence.


An Advance on Variable Elimination with Applications to Tensor-Based Computation

arXiv.org Artificial Intelligence

We present new results on the classical algorithm of variable elimination, which underlies many algorithms including for probabilistic inference. The results relate to exploiting functional dependencies, allowing one to perform inference and learning efficiently on models that have very large treewidth. The highlight of the advance is that it works with standard (dense) factors, without the need for sparse factors or techniques based on knowledge compilation that are commonly utilized. This is significant as it permits a direct implementation of the improved variable elimination algorithm using tensors and their operations, leading to extremely efficient implementations especially when learning model parameters. Moreover, the proposed technique does not require knowledge of the specific functional dependencies, only that they exist, so can be used when learning these dependencies. We illustrate the efficacy of our proposed algorithm by compiling Bayesian network queries into tensor graphs and then learning their parameters from labeled data using a standard tool for tensor computation.


Deep Sigma Point Processes

arXiv.org Machine Learning

We introduce Deep Sigma Point Processes, a class of parametric models inspired by the compositional structure of Deep Gaussian Processes (DGPs). Deep Sigma Point Processes (DSPPs) retain many of the attractive features of (variational) DGPs, including mini-batch training and predictive uncertainty that is controlled by kernel basis functions. Importantly, since DSPPs admit a simple maximum likelihood inference procedure, the resulting predictive distributions are not degraded by any posterior approximations. In an extensive empirical comparison on univariate and multivariate regression tasks we find that the resulting predictive distributions are significantly better calibrated than those obtained with other probabilistic methods for scalable regression, including variational DGPs--often by as much as a nat per datapoint.


Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences

arXiv.org Machine Learning

Bayesian reward learning from demonstrations enables rigorous safety and uncertainty analysis when performing imitation learning. However, Bayesian reward learning methods are typically computationally intractable for complex control problems. We propose a highly efficient Bayesian reward learning algorithm that scales to high-dimensional imitation learning problems by first pre-training a low-dimensional feature encoding via self-supervised tasks and then leveraging preferences over demonstrations to perform fast Bayesian inference. We evaluate our proposed approach on the task of learning to play Atari games from demonstrations, without access to the game score. For Atari games our approach enables us to generate 100,000 samples from the posterior over reward functions in only 5 minutes using a personal laptop. Furthermore, our proposed approach achieves comparable or better imitation learning performance than state-of-the-art methods that only find a point estimate of the reward function. Finally, we show that our approach enables efficient high-confidence policy performance bounds. We show that these high-confidence performance bounds can be used to rank the performance and risk of a variety of evaluation policies, despite not having samples of the reward function. We also show evidence that high-confidence performance bounds can be used to detect reward hacking in complex imitation learning problems.


Meta-learning for mixed linear regression

arXiv.org Machine Learning

Recent advances in machine learning highlight successes on a small set of tasks where a large number of labeled examples have been collected and exploited. These include image classification with 1.2 million labeled examples Deng et al. (2009) and French-English machine translation with 40 million paired sentences Bojar et al. (2014). For common tasks, however, collecting clean labels is costly, as they require human expertise (as in medical imaging) or physical interactions (as in robotics), for example. Thus collected real-world datasets follow a long-tailed distribution, in which a dominant set of tasks only have a small number of training examples Wang et al. (2017). Inspired by human ingenuity in quickly solving novel problems by leveraging prior experience, meta-learning approaches aim to jointly learn from past experience to quickly adapt to new tasks with little available data Schmidhuber (1987); Thrun & Pratt (2012). This has had a significant impact in few-shot supervised learning, where each task is associated with only a few training examples. By leveraging structural similarities among those tasks, one can achieve accuracy far greater than what can be achieved for each task in isolation Finn et al. (2017); Ravi & Larochelle (2016); Koch et al. (2015); Oreshkin et al. (2018); Triantafillou et al. (2019); Rusu et al. (2018). The success of such approaches hinges on the following fundamental question: When can we jointly train small data tasks to achieve the accuracy of large data tasks? We investigate this tradeoff under a canonical scenario where the tasks are linear regressions in d-dimensions and the regression parameters are drawn i.i.d.