Sparse GPs
Export Reviews, Discussions, Author Feedback and Meta-Reviews
NIPS (Neural Information Processing Systems), 8-11 December 2014, Montreal, Canada
Paper ID: 1660
Title: Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models

Current Reviews

First provide a summary of the paper, and then address the following criteria: quality, clarity, originality and significance.

The authors have come up with a novel way to distribute the optimisation of variational Bayesian sparse GPs (for regression and LVMs). By reformulating Titsias' (2009) variational lower bound so that the training data become independent given the inducing points, training can be parallelised across nodes on a cluster or network via a Map-Reduce implementation. There is of course some overhead from the optimisation of global parameters, but it is negligible relative to the overall speed-up and scaling, and the authors demonstrate this with a simple experiment. It is nice that the reformulation of the variational lower bound unifies both cases (regression becomes a special case of the LVM when the inputs are fixed).
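The property the review highlights, that data blocks are independent given the inducing points, means the global terms of the bound are sums of per-block statistics, which is exactly the shape Map-Reduce needs. A minimal numpy sketch under that assumption (the kernel choice and the way statistics are packaged are illustrative, not the paper's implementation):

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def map_step(Xb, yb, Z):
    """Per-node statistics for one data block (can run in parallel)."""
    Knm = rbf(Xb, Z)
    # For this kernel the prior variance k(x, x) is 1 at every point,
    # so the trace statistic reduces to the block size.
    return Knm.T @ Knm, Knm.T @ yb, float(yb @ yb), float(len(yb))

def reduce_step(partials):
    """Sum per-node statistics into the global quantities of the bound."""
    A = sum(p[0] for p in partials)
    b = sum(p[1] for p in partials)
    yy = sum(p[2] for p in partials)
    tr = sum(p[3] for p in partials)
    return A, b, yy, tr
```

These four quantities are all the collapsed bound needs from the data, so a master node can assemble the objective and gradients from them and each node only communicates O(M^2) numbers regardless of its block size.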
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.37)
The main idea builds upon the inducing-point formalism underpinning most sparse methods for GP inference. As the computational cost of traditional sparse GP methods based on inducing points is O(NM^2), where N is the number of observations and M is the number of inducing points, the paper addresses the problem of large-scale inference by making conditional independence assumptions across inducing points. More specifically, the method proposed in the paper can be seen as a modified version of the partially independent conditional (PIC) approach, where not only the latent functions are grouped in blocks but also the inducing points are clustered in blocks (corresponding to those latent functions) and statistical dependencies across inducing-point blocks are modeled with a tree. These additional independence assumptions make the resulting inference algorithm much more scalable, as it scales (potentially) only linearly with the number of observations and the number of inducing points. The method is evaluated on 1D and 2D problems, showing that it outperforms standard sparse GP approximations.
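The O(NM^2) cost the review cites comes from forming the M x M statistic Knm^T Knm over all N observations. A toy subset-of-regressors-style predictive mean makes the bottleneck explicit; the kernel and names are illustrative assumptions, and this is the standard baseline, not the tree-structured method under review:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def sparse_posterior_mean(X, y, Z, Xs, noise=0.2):
    """Predictive mean of a standard inducing-point approximation.
    Forming Knm.T @ Knm below is the O(N M^2) bottleneck."""
    Kmm = rbf(Z, Z) + 1e-8 * np.eye(len(Z))   # M x M, with jitter
    Knm = rbf(X, Z)                            # O(N M) kernel evaluations
    Sigma = Kmm + Knm.T @ Knm / noise ** 2     # O(N M^2) flops
    mu_u = np.linalg.solve(Sigma, Knm.T @ y) / noise ** 2
    return rbf(Xs, Z) @ mu_u
```

With Z equal to the training inputs the approximation recovers the exact GP posterior mean, which is a convenient sanity check when experimenting with the trade-off between M and accuracy.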
We will report the NLLs in the final version of the paper, in addition to reporting averages and standard deviations in all of our other three tables by running more trials.
Thank you for your comments and suggestions. We agree with all three reviewers that evaluating the predictive variances is important. We were unaware of Nguyen's paper at submission and will add this discussion to the paper. We note that the precomputation, like CG, can be run to a specified desired tolerance. Hensman et al. (2013) used 1000 inducing points on the massive Airline dataset. Finally, we will clarify that SGPR is by Titsias (2009) and SVGP is by Hensman et al. (2013).
Amortized Variational Inference for Deep Gaussian Processes
Gaussian processes (GPs) are Bayesian nonparametric models for function approximation with principled predictive uncertainty estimates. Deep Gaussian processes (DGPs) are multilayer generalizations of GPs that can represent complex marginal densities as well as complex mappings. As exact inference is either computationally prohibitive or analytically intractable in GPs and extensions thereof, some existing methods resort to variational inference (VI) techniques for tractable approximations. However, the expressivity of conventional approximate GP models critically relies on independent inducing variables that might not be informative enough for some problems. In this work we introduce amortized variational inference for DGPs, which learns an inference function that maps each observation to variational parameters. The resulting method enjoys a more expressive prior conditioned on fewer input-dependent inducing variables and a flexible amortized marginal posterior that is able to model more complicated functions. We show with theoretical reasoning and experimental results that our method performs comparably to or better than previous approaches at lower computational cost.
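The contrast with free per-point variational parameters can be shown with a toy inference function: one shared weight set maps any observation to its variational parameters, so the number of learnable parameters no longer grows with the dataset size N. The architecture and names below are illustrative assumptions, not the paper's model:

```python
import numpy as np

def make_encoder(d_in, d_h=8, seed=0):
    """A toy amortized inference function: shared weights map any
    observation x to variational parameters (mu, log_var).
    Mean-field VI would instead keep 2 free parameters per data point."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(d_h, d_in)); b1 = np.zeros(d_h)
    W2 = rng.normal(size=(2, d_h));    b2 = np.zeros(2)

    def encode(x):
        h = np.tanh(W1 @ x + b1)       # one hidden layer, for illustration
        mu, log_var = W2 @ h + b2
        return mu, log_var

    n_params = W1.size + b1.size + W2.size + b2.size
    return encode, n_params
```

The parameter count returned here is fixed by the architecture, whereas a non-amortized mean-field posterior carries O(N) variational parameters; that is the saving (and the potential amortization gap) the abstract alludes to.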
Trust Your Robots! Predictive Uncertainty Estimation of Neural Networks with Sparse Gaussian Processes
Lee, Jongseok, Feng, Jianxiang, Humt, Matthias, Müller, Marcus G., Triebel, Rudolph
This paper presents a probabilistic framework to obtain both reliable and fast uncertainty estimates for predictions with Deep Neural Networks (DNNs). Our main contribution is a practical and principled combination of DNNs with sparse Gaussian Processes (GPs). We prove theoretically that DNNs can be seen as a special case of sparse GPs, namely mixtures of GP experts (MoE-GP), and we devise a learning algorithm that brings the derived theory into practice. In experiments on two different robotic tasks -- inverse dynamics of a manipulator and object detection on a micro-aerial vehicle (MAV) -- we show the effectiveness of our approach in terms of predictive uncertainty, improved scalability, and run-time efficiency on a Jetson TX2. We thus argue that our approach can pave the way towards reliable and fast robot learning systems with uncertainty awareness.
A Tutorial on Sparse Gaussian Processes and Variational Inference
Leibfried, Felix, Dutordoir, Vincent, John, ST, Durrande, Nicolas
Gaussian processes (GPs) provide a framework for Bayesian inference that can offer principled uncertainty estimates for a large range of problems. For example, if we consider regression problems with Gaussian likelihoods, a GP model enjoys a posterior in closed form. However, identifying the posterior GP scales cubically with the number of training examples and requires storing all examples in memory. In order to overcome these obstacles, sparse GPs have been proposed that approximate the true posterior GP with pseudo-training examples. Importantly, the number of pseudo-training examples is user-defined and enables control over computational and memory complexity. In the general case, sparse GPs do not enjoy closed-form solutions and one has to resort to approximate inference. In this context, a convenient choice for approximate inference is variational inference (VI), where the problem of Bayesian inference is cast as an optimization problem -- namely, to maximize a lower bound of the log marginal likelihood. This paves the way for a powerful and versatile framework, where pseudo-training examples are treated as optimization arguments of the approximate posterior that are jointly identified together with hyperparameters of the generative model (i.e. prior and likelihood). The framework can naturally handle a wide scope of supervised learning problems, ranging from regression with heteroscedastic and non-Gaussian likelihoods to classification problems with discrete labels, as well as multilabel problems. The purpose of this tutorial is to provide access to the basic matter for readers without prior knowledge in both GPs and VI. A proper exposition of the subject also enables access to more recent advances (such as importance-weighted VI as well as interdomain, multioutput and deep GPs) that can serve as an inspiration for new research ideas.
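For regression with a Gaussian likelihood, the lower bound described here has the well-known collapsed form of Titsias (2009): L = log N(y | 0, Qnn + sigma^2 I) - tr(Knn - Qnn) / (2 sigma^2), with Qnn = Knm Kmm^{-1} Kmn. A compact numpy sketch, assuming a squared-exponential kernel purely for illustration:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def log_mvn(y, K):
    """log N(y | 0, K) via a Cholesky factorization."""
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L, y)
    return -0.5 * (a @ a) - np.log(np.diag(L)).sum() \
           - 0.5 * len(y) * np.log(2 * np.pi)

def titsias_bound(X, y, Z, noise=0.2):
    """Collapsed lower bound on the log marginal likelihood:
    data fit under the Nystrom covariance minus a trace penalty."""
    Kmm = rbf(Z, Z) + 1e-8 * np.eye(len(Z))
    Knm = rbf(X, Z)
    Qnn = Knm @ np.linalg.solve(Kmm, Knm.T)
    fit = log_mvn(y, Qnn + noise ** 2 * np.eye(len(y)))
    slack = np.trace(rbf(X, X) - Qnn)
    return fit - slack / (2 * noise ** 2)
```

Maximizing this bound over the pseudo-inputs Z and kernel hyperparameters is the optimization problem the tutorial refers to; when Z covers the training inputs, the trace penalty vanishes and the bound becomes tight.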
- Education (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Sparse within Sparse Gaussian Processes using Neighbor Information
Tran, Gia-Lac, Milios, Dimitrios, Michiardi, Pietro, Filippone, Maurizio
Approximations to Gaussian processes based on inducing variables, combined with variational inference techniques, enable state-of-the-art sparse approaches that infer GPs at scale through mini-batch-based learning. In this work, we address one limitation of sparse GPs: the challenge of dealing with a large number of inducing variables without imposing a special structure on the inducing inputs. In particular, we introduce a novel hierarchical prior, which imposes sparsity on the set of inducing variables. We treat our model variationally, and we experimentally show considerable computational gains compared to standard sparse GPs when sparsity on the inducing variables is realized by considering the nearest inducing inputs of a random mini-batch of the data. We perform an extensive experimental validation that demonstrates the effectiveness of our approach compared to the state-of-the-art. Our approach makes it possible to use sparse GPs with a large number of inducing points without incurring a prohibitive computational cost.
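A minimal sketch of the neighbor-based selection idea, assuming Euclidean nearest neighbors and leaving out the hierarchical prior and variational treatment entirely; the function name and interface are hypothetical:

```python
import numpy as np

def active_inducing_set(Z, X_batch, k=3):
    """Indices of inducing inputs 'switched on' for one mini-batch:
    the union of each batch point's k nearest inducing inputs.
    Only this small active set enters the update for the step."""
    d2 = ((X_batch[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    nearest = np.argsort(d2, axis=1)[:, :k]    # (batch_size, k)
    return np.unique(nearest)
```

Because the active set has at most batch_size * k elements, the per-step cost depends on the batch and on k rather than on the total number of inducing points, which is what allows a very large global set of inducing variables.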