Bayesian Warped Gaussian Processes
Warped Gaussian processes (WGP) [1] model output observations in regression tasks as a parametric nonlinear transformation of a Gaussian process (GP). The use of this nonlinear transformation, which is included as part of the probabilistic model, was shown to enhance performance by providing a better prior model on several data sets. In order to learn its parameters, maximum likelihood was used. In this work we show that it is possible to use a non-parametric nonlinear transformation in WGP and variationally integrate it out. The resulting Bayesian WGP is then able to work in scenarios in which the maximum likelihood WGP failed: the low-data regime, data with censored values, classification, etc. We demonstrate the superior performance of Bayesian warped GPs on several real data sets.
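The generative model described here, observations obtained by passing a latent GP through a monotonic warp, can be sketched in a few lines. This is a minimal illustration only: it assumes a fixed log warp and a squared-exponential kernel, whereas the papers above learn a parametric (or, here, Bayesian non-parametric) warp; all names below are illustrative, not the authors' notation.

```python
import numpy as np

def rbf_kernel(X, Xp, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input sets X and Xp."""
    d2 = (X[:, None] - Xp[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def wgp_log_marginal(y, X, noise=0.1):
    """Log marginal likelihood of a warped GP with the fixed warp z = log(y).

    log p(y) = log N(z | 0, K + noise*I) + sum_i log dz_i/dy_i,
    where the Jacobian term dz/dy = 1/y accounts for the change of
    variables from observation space to latent GP space.
    """
    z = np.log(y)  # warp positive observations to the latent GP space
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, z))
    gp_term = (-0.5 * z @ alpha
               - np.sum(np.log(np.diag(L)))
               - 0.5 * len(z) * np.log(2 * np.pi))
    jacobian_term = np.sum(-np.log(y))  # d log(y)/dy = 1/y
    return gp_term + jacobian_term

X = np.linspace(0.1, 2.0, 5)
y = np.exp(np.sin(X))  # positive data that is Gaussian after a log warp
print(wgp_log_marginal(y, X))
```

In the full WGP the warp has free parameters fitted by maximum likelihood; the Bayesian variant above instead places a (non-parametric) prior on the warp and integrates it out variationally.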
Scalable Bayesian Transformed Gaussian Processes
Xinran Zhu, Leo Huang, Cameron Ibrahim, Eric Hans Lee, David Bindel
The Bayesian transformed Gaussian process (BTG) model, proposed by Kedem and Oliveira, is a fully Bayesian counterpart to the warped Gaussian process (WGP) and marginalizes out a joint prior over input warping and kernel hyperparameters. This fully Bayesian treatment of hyperparameters often provides more accurate regression estimates and superior uncertainty propagation, but is prohibitively expensive. The BTG posterior predictive distribution, itself estimated through high-dimensional integration, must be inverted in order to perform model prediction. To make the Bayesian approach practical and comparable in speed to maximum-likelihood estimation (MLE), we propose principled and fast techniques for computing with BTG. Our framework uses doubly sparse quadrature rules, tight quantile bounds, and rank-one matrix algebra to enable both fast model prediction and model selection. These scalable methods allow us to regress over higher-dimensional datasets and apply BTG with layered transformations that greatly improve its expressiveness. We demonstrate that BTG achieves superior empirical performance over MLE-based models.
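The core idea of the fully Bayesian treatment, averaging predictions over hyperparameters weighted by their posterior rather than fixing a single MLE point estimate, can be sketched with a toy quadrature grid. The flat prior, the three-point lengthscale grid, and the kernel below are illustrative assumptions, not the paper's doubly sparse quadrature rules.

```python
import numpy as np

def rbf(X, Xp, ls):
    """Squared-exponential kernel with lengthscale ls (unit variance)."""
    return np.exp(-0.5 * (X[:, None] - Xp[None, :]) ** 2 / ls**2)

def gp_fit(X, y, ls, noise=0.1):
    """Return the GP weight vector and log marginal likelihood for one ls."""
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_ml = (-0.5 * y @ alpha - np.sum(np.log(np.diag(L)))
              - 0.5 * len(y) * np.log(2 * np.pi))
    return alpha, log_ml

def bayesian_predict_mean(X, y, Xs, grid):
    """Predictive mean averaged over a hyperparameter grid.

    Each lengthscale is weighted by its (flat-prior) posterior, which is
    proportional to the marginal likelihood; this crude grid average stands
    in for the sparse quadrature used to marginalize BTG hyperparameters.
    """
    alphas, log_mls = zip(*(gp_fit(X, y, ls) for ls in grid))
    w = np.exp(np.array(log_mls) - max(log_mls))  # stabilized weights
    w /= w.sum()
    means = [rbf(Xs, X, ls) @ a for ls, a in zip(grid, alphas)]
    return sum(wi * m for wi, m in zip(w, means))

X = np.linspace(0.0, 1.0, 8)
y = np.sin(3.0 * X)
Xs = np.linspace(0.0, 1.0, 3)
mu = bayesian_predict_mean(X, y, Xs, grid=[0.1, 0.3, 1.0])
print(mu)
```

The resulting prediction is a mixture over hyperparameter settings, which is what makes the posterior predictive in BTG expensive to evaluate and invert at scale.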
Learning non-Gaussian Time Series using the Box-Cox Gaussian Process
A Gaussian process (GP) [1] is a prior distribution over functions with a support that includes a wide class of phenomena via the design of its mean and covariance functions, the parameters of which provide meaningful interpretation of the process at hand. Beyond regression [2], GPs have been extensively used in the last two decades for classification [3], density estimation [4], filter design [5], model identification [6] and optimisation [7]. In general terms, all these generative models have two stages: the latent process is modelled as a GP and the observation is modelled (conditionally on the latent process) as a non-Gaussian variable. This class of models is referred to as GPs with non-Gaussian likelihoods, or as Generalised GPs. These usually consider likelihood functions from the exponential family such as the Laplace, Poisson, beta and gamma distributions [8]. A well-known example is the GP classification model, where the classes are represented by the output of an activation neuron into which a latent GP is fed. A slightly different approach to non-Gaussian models, which is not constrained to the exponential family, is the warped GP (WGP, [9]). The WGP models non-Gaussian data by assuming that there is a transformation φ such that the observations can be passed through φ to yield a GP; therefore, the likelihood function of this model is not designed directly but, rather, induced by the transformation (a.k.a.
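The Box-Cox transformation named in this title is one concrete choice of the warp φ. A minimal sketch, assuming the standard Box-Cox form with shape parameter λ (function and parameter names below are illustrative):

```python
import numpy as np

# Box-Cox warp: phi(y; lam) = (y^lam - 1) / lam for lam != 0, log(y) at lam = 0.
# It is monotonic for y > 0, so it is a valid warping function for a WGP.

def box_cox(y, lam):
    """Warp positive observations y into (approximately) Gaussian space."""
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam

def box_cox_inverse(z, lam):
    """Map latent GP values z back to observation space."""
    if lam == 0:
        return np.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

y = np.array([0.5, 1.0, 2.0, 4.0])
z = box_cox(y, lam=0.5)
y_back = box_cox_inverse(z, lam=0.5)  # exact round trip: the warp is invertible
```

Because the warp is invertible, the likelihood of the observations is induced by the transformation (a Gaussian density times the Jacobian of φ), rather than designed directly as in exponential-family Generalised GPs.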