Goto

Collaborating Authors

 Bayesian Inference


Stochastic tree ensembles for regularized nonlinear regression

arXiv.org Machine Learning

Tree-based algorithms for supervised learning, such as Classification and Regression Trees (CART) (Breiman et al., 1984), random forests (Breiman, 1996, 2001), adaBoost (Freund and Schapire, 1997), and gradient boosting (Breiman, 1997; Friedman, 2001, 2002), are widely used for applied supervised learning. As a whole, these methods are popular in applied settings due to their speed and accuracy in mean estimation and out-of-sample prediction tasks. One limitation of such methods is their well-known sensitivity to tuning parameters, which require costly cross-validation to optimize. Bayesian additive regression trees (BART) (Chipman et al., 2007, 2010) is a popular model-based alternative that is often more accurate than other treebased methods; specifically, BART boasts valuable robustness to the choice of tuning-parameters. However, relative to random forests and boosting, BART's wider adoption has been slowed by its more severe computational demands, owing to its reliance on a random walk Metropolis-Hastings Markov chain Monte Carlo (MCMC) algorithm. Despite this limitation, BART has inspired a considerable body of research in recent years.


Projective Preferential Bayesian Optimization

arXiv.org Machine Learning

Bayesian optimization is an effective method for finding extrema of a black-box function. We propose a new type of Bayesian optimization for learning user preferences in high-dimensional spaces. The central assumption is that the underlying objective function cannot be evaluated directly, but instead a minimizer along a projection can be queried, which we call a projective preferential query. The form of the query allows for feedback that is natural for a human to give, and which enables interaction. This is demonstrated in a user experiment in which the user feedback comes in the form of optimal position and orientation of a molecule adsorbing to a surface. We demonstrate that our framework is able to find a global minimum of a high-dimensional black-box function, which is an infeasible task for existing preferential Bayesian optimization frameworks that are based on pairwise comparisons.


Inferential Induction: Joint Bayesian Estimation of MDPs and Value Functions

arXiv.org Machine Learning

Bayesian reinforcement learning (BRL) offers a decision-theoretic solution to the problem of reinforcement learning. However, typical model-based BRL algorithms have focused either on ma intaining a posterior distribution on models or value functions and combining this with approx imate dynamic programming or tree search. This paper describes a novel backwards induction pri nciple for performing joint Bayesian estimation of models and value functions, from which many new BRL algorithms can be obtained. We demonstrate this idea with algorithms and experiments in discrete state spaces.


Overcoming Mode Collapse and the Curse of Dimensionality

#artificialintelligence

Machine Learning Lecture at CMU by Ke Li, Ph.D. Candidate at the University of California, Berkeley Lecturer: Ke Li Carnegie Mellon University Abstract: In this talk, Li presents his team's work on overcoming two long-standing problems in machine learning and algorithms: 1. Mode collapse in generative adversarial nets (GANs) Generative adversarial nets (GANs) are perhaps the most popular class of generative models in use today. Unfortunately, they suffer from the well-documented problem of mode collapse, which the many successive variants of GANs have failed to overcome. I will illustrate why mode collapse happens fundamentally and show a simple way to overcome it, which is the basis of a new method known as Implicit Maximum Likelihood Estimation (IMLE). It turns out that this problem is not insurmountable - I will explain how the curse of dimensionality arises and show a simple way to overcome it, which gives rise to a new family of algorithms known as Dynamic Continuous Indexing (DCI). Bio: Ke Li is a recent Ph.D. graduate from UC Berkeley, where he was advised by Prof. Jitendra Malik, and will join Google as a Research Scientist and the Institute for Advanced Study (IAS) as a Member hosted by Prof. Sanjeev Arora.


Extended Stochastic Gradient MCMC for Large-Scale Bayesian Variable Selection

arXiv.org Machine Learning

Stochastic gradient Markov chain Monte Carlo (MCMC) algorithms have received much attention in Bayesian computing for big data problems, but they are only applicable to a small class of problems for which the parameter space has a fixed dimension and the log-posterior density is differentiable with respect to the parameters. This paper proposes an extended stochastic gradient MCMC lgoriathm which, by introducing appropriate latent variables, can be applied to more general large-scale Bayesian computing problems, such as those involving dimension jumping and missing data. Numerical studies show that the proposed algorithm is highly scalable and much more efficient than traditional MCMC algorithms. The proposed algorithms have much alleviated the pain of Bayesian methods in big data computing.


The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks

arXiv.org Machine Learning

Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights. Recent work developing this class of methods has explored ever richer parameterizations of the approximate posterior in the hope of improving performance. In contrast, here we share a curious experimental finding that suggests instead restricting the variational distribution to a more compact parameterization. For a variety of deep Bayesian neural networks trained using Gaussian mean-field variational inference, we find that the posterior standard deviations consistently exhibit strong low-rank structure after convergence. This means that by decomposing these variational parameters into a low-rank factorization, we can make our variational approximation more compact without decreasing the models' performance. Furthermore, we find that such factorized parameterizations improve the signal-to-noise ratio of stochastic gradient estimates of the variational lower bound, resulting in faster convergence.


Constructing a variational family for nonlinear state-space models

arXiv.org Machine Learning

Mathematical models of system dynamics are a core technology in most model-based engineered systems acting and interacting with their environment. Examples include GPS, autonomous vehicles, passenger aircraft and robotics, to name just a few. The remarkable utility of mathematical models stems from the fact that, inter alia, they enable decision making based on prediction of system behaviour under new scenarios, accelerate the analysis and design processes, are fundamental to detecting faults or changes, and they are capable of handling uncertainty that is present in data, assumptions and algorithms. Motivated by the broad applicability and utility of modelling, the scientific community has devoted significant research attention towards learning dynamical models from data. Importantly, for dynamic systems, the sequence or ordering of the data must be maintained as future outcomes are deemed to be fundamentally related to the past. This is sometimes called sequence learning (Sun and Giles, 2001) or system identification (Ljung, 1999). In essence, these approaches search over a space of models and determine the model that best (in some sense) fits the data while maintaining the time ordering. The current paper is directed towards solving this important problem. To make these ideas more concrete, here we assume that data from the system of interest is available in the form of a data record y 1:T {y 1,...,y T }, where each measurementy k is potentially multidimensional and the number of available measurements is denoted as T 0. We further assume that the data may be adequately described as an instance from a joint distribution that is parametrized by an unknown vectorฮธ (called the parameter vector), that is (with abuse of notation)


Consistency of a Recurrent Language Model With Respect to Incomplete Decoding

arXiv.org Machine Learning

Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving infinite-length sequences from a recurrent language model when using common decoding algorithms. To analyze this issue, we first define inconsistency of a decoding algorithm, meaning that the algorithm can yield an infinite-length sequence that has zero probability under the model. We prove that commonly used incomplete decoding algorithms - greedy search, beam search, top-k sampling, and nucleus sampling - are inconsistent, despite the fact that recurrent language models are trained to produce sequences of finite length. Based on these insights, we propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model. Empirical results show that inconsistency occurs in practice, and that the proposed methods prevent inconsistency.


Product Kanerva Machines: Factorized Bayesian Memory

arXiv.org Machine Learning

An ideal cognitively-inspired memory system would compress and organize incoming items. The Kanerva Machine (Wu et al., 2018b;a) is a Bayesian model that naturally implements online memory compression. However, the organization of the Kanerva Machine is limited by its use of a single Gaussian random matrix for storage. Here we introduce the Product Kanerva Machine, which dynamically combines many smaller Kanerva Machines. Its hierarchical structure provides a principled way to abstract invariant features and gives scaling and capacity advantages over single Kanerva Machines. We show that it can exhibit unsupervised clustering, find sparse and combinatorial allocation patterns, and discover spatial tunings that approximately factorize simple images by object.


Macroscopic Traffic Flow Modeling with Physics Regularized Gaussian Process: A New Insight into Machine Learning Applications

arXiv.org Machine Learning

Despite the wide implementation of machine learning (ML) techniques in traffic flow modeling recently, those data-driven approaches often fall short of accuracy in the cases with a small or noisy dataset. To address this issue, this study presents a new modeling framework, named physics regularized machine learning (PRML), to encode classical traffic flow models (referred as physical models) into the ML architecture and to regularize the ML training process. More specifically, a stochastic physics regularized Gaussian process (PRGP) model is developed and a Bayesian inference algorithm is used to estimate the mean and kernel of the PRGP. A physical regularizer based on macroscopic traffic flow models is also developed to augment the estimation via a shadow GP and an enhanced latent force model is used to encode physical knowledge into stochastic processes. Based on the posterior regularization inference framework, an efficient stochastic optimization algorithm is also developed to maximize the evidence lowerbound of the system likelihood. To prove the effectiveness of the proposed model, this paper conducts empirical studies on a real-world dataset which is collected from a stretch of I-15 freeway, Utah. Results show the new PRGP model can outperform the previous compatible methods, such as calibrated pure physical models and pure machine learning methods, in estimation precision and input robustness.