Álvarez, Mauricio A.
Adaptive RKHS Fourier Features for Compositional Gaussian Process Models
Shi, Xinxing, Baldwin-McDonald, Thomas, Álvarez, Mauricio A.
Gaussian Processes (GPs) provide a principled Bayesian framework for function approximation, making them particularly useful in applications requiring uncertainty calibration [Rasmussen and Williams, 2006], such as Bayesian optimisation [Snoek et al., 2012] and time-series analysis [Roberts et al., 2013]. Despite offering reasonable uncertainty estimation, shallow GPs often struggle to model complex, non-stationary processes present in practical applications. To overcome this limitation, Deep Gaussian Processes (DGPs) employ a compositional architecture by stacking multiple GP layers, thereby enhancing representational power while preserving the model's intrinsic capability to quantify uncertainty [Damianou and Lawrence, 2013]. However, the conventional variational formulation of DGPs depends heavily on local inducing point approximations across intermediate GP layers [Titsias, 2009, Salimbeni and Deisenroth, 2017], which hinder the model from capturing the global structures commonly found in real-world scenarios. Incorporating Fourier features into GP models has shown promise in addressing this challenge, owing to the global, periodic nature of these features. One line of research uses Random Fourier Features (RFFs, [Rahimi and Recht, 2007]) of stationary kernels to convert the original (deep) GPs into Bayesian networks in weight space [Lázaro-Gredilla et al., 2010, Gal and Turner, 2015, Cutajar et al., 2017]. Building on this concept within a sparse variational GP framework, recent advancements in inter-domain GPs [Lázaro-Gredilla and Figueiras-Vidal, 2009a, Van der Wilk et al., 2020] directly approximate the posterior of the original GPs by introducing fixed Variational Fourier Features (VFFs), derived by projecting the process onto a Reproducing Kernel Hilbert Space (RKHS) [Hensman et al., 2018, Rudner et al., 2020].
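As a point of reference for the RFF construction mentioned above, the following minimal NumPy sketch (illustrative only, not code from the paper) shows how frequencies drawn from a squared-exponential kernel's spectral density yield a feature map whose inner products approximate the kernel:

# Minimal sketch of Random Fourier Features (Rahimi and Recht, 2007) for the
# squared-exponential kernel k(x, x') = exp(-||x - x'||^2 / (2 * ell^2)).
# By Bochner's theorem its spectral density is Gaussian, so sampling
# frequencies from it gives an unbiased Monte Carlo kernel approximation.
import numpy as np

def rff_features(X, n_features=500, lengthscale=1.0, seed=0):
    """Map inputs X (n, d) to features Phi (n, 2 * n_features) such that
    Phi @ Phi.T approximates the kernel matrix."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Frequencies drawn from the spectral density N(0, I / ell^2).
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    proj = X @ W
    # Paired cos/sin features avoid the extra variance of the random-phase variant.
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n_features)

X = np.random.default_rng(1).normal(size=(100, 3))
Phi = rff_features(X)
K_approx = Phi @ Phi.T
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
print(np.abs(K_approx - K_exact).max())  # small once n_features is large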
Deep Latent Force Models: ODE-based Process Convolutions for Bayesian Deep Learning
Baldwin-McDonald, Thomas, Álvarez, Mauricio A.
Effectively modeling phenomena present in highly nonlinear dynamical systems whilst also accurately quantifying uncertainty is a challenging task, which often requires problem-specific techniques. We outline the deep latent force model (DLFM), a domain-agnostic approach to tackling this problem, which consists of a deep Gaussian process architecture where the kernel at each layer is derived from an ordinary differential equation using the framework of process convolutions. Two distinct formulations of the DLFM are presented, utilising weight-space and variational inducing-point-based Gaussian process approximations respectively, both of which are amenable to doubly stochastic variational inference. We provide evidence that our model is capable of capturing highly nonlinear behaviour in real-world multivariate time series data. In addition, we find that our approach achieves comparable performance to a number of other probabilistic models on benchmark regression tasks. We also empirically assess the negative impact of the inducing points framework on the extrapolation capabilities of LFM-based models.
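To make the process-convolution construction concrete, here is a toy NumPy sketch (not the paper's implementation; the dynamics, gamma value and grid are illustrative choices) in which a latent GP sample is convolved with the Green's function of a first-order ODE:

# Illustrative process convolution: the output is the Green's function of
# df/dt + gamma * f = u(t), i.e. G(t) = exp(-gamma * t) for t >= 0,
# convolved with a latent force u(t) sampled from a GP.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)
dt = t[1] - t[0]

# Draw the latent force u(t) from a squared-exponential GP prior.
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 0.5 ** 2)
u = rng.multivariate_normal(np.zeros(t.size), K + 1e-8 * np.eye(t.size))

gamma = 2.0
G = np.exp(-gamma * t)                 # causal impulse response of the ODE
f = np.convolve(u, G)[: t.size] * dt   # Riemann-sum approximation of the integral

# f now carries the ODE's dynamics; stacking such layers gives the DLFM idea.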
Thin and Deep Gaussian Processes
de Souza, Daniel Augusto, Nikitin, Alexander, John, ST, Ross, Magnus, Álvarez, Mauricio A., Deisenroth, Marc Peter, Gomes, João P. P., Mesquita, Diego, Mattos, César Lincoln C.
Gaussian processes (GPs) can provide a principled approach to uncertainty quantification with easy-to-interpret kernel hyperparameters, such as the lengthscale, which controls the correlation distance of function values. However, selecting an appropriate kernel can be challenging. Deep GPs avoid manual kernel engineering by successively parameterizing kernels with GP layers, allowing them to learn low-dimensional embeddings of the inputs that explain the output data. Following the architecture of deep neural networks, the most common deep GPs warp the input space layer-by-layer but lose all the interpretability of shallow GPs. An alternative construction is to successively parameterize the lengthscale of a kernel, improving the interpretability but ultimately giving up the notion of learning lower-dimensional embeddings. Unfortunately, both methods are susceptible to particular pathologies, which may hinder fitting and limit their interpretability. This work proposes a novel synthesis of both previous approaches: Thin and Deep GP (TDGP). Each TDGP layer defines locally linear transformations of the original input data, maintaining the concept of latent embeddings while also retaining the interpretability of a kernel's lengthscales. Moreover, unlike the prior solutions, TDGP induces non-pathological manifolds that admit learning lower-dimensional representations. We show with theoretical and experimental results that i) TDGP is, unlike previous models, tailored to specifically discover lower-dimensional manifolds in the input data, ii) TDGP behaves well when increasing the number of layers, and iii) TDGP performs well on standard benchmark datasets.
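The following toy sketch illustrates the locally linear idea only (it is not the paper's exact kernel, which includes additional structure): region-specific projection matrices are blended into an input-dependent linear map before a standard kernel acts on the embedding.

# Toy illustration of a locally linear warping: each input is mapped by
# W(x) @ x, where W(x) varies smoothly with x, and a standard kernel is
# evaluated on the low-dimensional warped coordinates.
import numpy as np

def local_projection(x, centers, W_list, ls=1.0):
    """Blend per-center projection matrices with smooth weights, then
    project x to a low-dimensional embedding."""
    w = np.exp(-0.5 * ((x - centers) ** 2).sum(-1) / ls ** 2)
    w = w / w.sum()
    W = sum(wi * Wi for wi, Wi in zip(w, W_list))  # locally linear map W(x)
    return W @ x

rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 5))                      # 3 local regimes in 5-D
W_list = [rng.normal(size=(2, 5)) for _ in range(3)]   # 5-D -> 2-D maps

x, x2 = rng.normal(size=5), rng.normal(size=5)
h = local_projection(x, centers, W_list)
h2 = local_projection(x2, centers, W_list)
k = np.exp(-0.5 * ((h - h2) ** 2).sum())               # kernel on the embedding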
Modular Gaussian Processes for Transfer Learning
Moreno-Muñoz, Pablo, Artés-Rodríguez, Antonio, Álvarez, Mauricio A.
We present a framework for transfer learning based on modular variational Gaussian processes (GPs). We develop a module-based method in which, given a dictionary of well-fitted GPs, one can build ensemble GP models without revisiting any data. Each model is characterised by its hyperparameters, pseudo-inputs and their corresponding posterior densities. Our method avoids undesired data centralisation, reduces growing computational costs and allows the transfer of learned uncertainty metrics after training. We exploit the augmentation of high-dimensional integral operators based on the Kullback-Leibler divergence between stochastic processes to introduce an efficient lower bound that holds across all the sparse variational GPs, even when they differ in complexity and likelihood distribution. The method is also valid for multi-output GPs, learning correlations a posteriori between independent modules. Extensive results illustrate the usability of our framework in large-scale and multi-task experiments, including comparisons with exact inference methods from the literature.
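As a schematic of what a "module" might store, the sketch below uses hypothetical names, and the naive moment-averaged ensemble at the end merely stands in for the paper's KL-based lower bound; only per-module summaries are reused, never the raw data.

# Each module is a fitted sparse GP summarised by inducing inputs,
# variational moments and hyperparameters.
from dataclasses import dataclass
import numpy as np

@dataclass
class GPModule:
    Z: np.ndarray        # inducing inputs (m, d)
    q_mu: np.ndarray     # variational mean (m,)
    q_cov: np.ndarray    # variational covariance (m, m)
    lengthscale: float

    def predict(self, Xs):
        """Standard sparse GP predictive mean/variance at test inputs Xs."""
        def k(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-0.5 * d2 / self.lengthscale ** 2)
        Kzz = k(self.Z, self.Z) + 1e-6 * np.eye(len(self.Z))
        Ksz = k(Xs, self.Z)
        A = np.linalg.solve(Kzz, Ksz.T).T                 # Ksz @ Kzz^{-1}
        mean = A @ self.q_mu
        var = 1.0 - (A * Ksz).sum(-1) + (A @ self.q_cov * A).sum(-1)
        return mean, var

def ensemble_predict(modules, Xs):
    # Naive moment matching of the module predictions; no data revisited.
    preds = [m.predict(Xs) for m in modules]
    mean = np.mean([p[0] for p in preds], axis=0)
    var = np.mean([p[1] + p[0] ** 2 for p in preds], axis=0) - mean ** 2
    return mean, var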
Compositional Modeling of Nonlinear Dynamical Systems with ODE-based Random Features
McDonald, Thomas M., Álvarez, Mauricio A.
Effectively modeling phenomena present in highly nonlinear dynamical systems whilst also accurately quantifying uncertainty is a challenging task, which often requires problem-specific techniques. We present a novel, domain-agnostic approach to tackling this problem, using compositions of physics-informed random features derived from ordinary differential equations. The architecture of our model leverages recent advances in approximate inference for deep Gaussian processes, such as layer-wise weight-space approximations that allow us to incorporate random Fourier features, and stochastic variational inference for approximate Bayesian inference. We provide evidence that our model is capable of capturing highly nonlinear behaviour in real-world multivariate time series data. In addition, we find that our approach achieves comparable performance to a number of other probabilistic models on benchmark regression tasks.
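A minimal weight-space view of one such layer, with generic random Fourier features standing in for the paper's ODE-derived features and point-estimate weights in place of their variational posteriors:

# One compositional layer: random features of the layer input times a
# weight vector; stacking layers gives the composition. In the full model
# the weights W carry variational posteriors rather than fixed samples.
import numpy as np

def layer(F, n_feat, d_out, rng):
    d_in = F.shape[1]
    Omega = rng.normal(size=(d_in, n_feat))            # spectral frequencies
    Phi = np.hstack([np.cos(F @ Omega), np.sin(F @ Omega)]) / np.sqrt(n_feat)
    W = rng.normal(size=(2 * n_feat, d_out))           # weight-space parameters
    return Phi @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
H1 = layer(X, n_feat=100, d_out=3, rng=rng)
H2 = layer(H1, n_feat=100, d_out=1, rng=rng)           # two-layer composition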
Learning Nonparametric Volterra Kernels with Gaussian Processes
Ross, Magnus, Smith, Michael T., Álvarez, Mauricio A.
This paper introduces a method for the nonparametric Bayesian learning of nonlinear operators, through the use of the Volterra series with kernels represented using Gaussian processes (GPs), which we term the nonparametric Volterra kernels model (NVKM). When the input function to the operator is unobserved and has a GP prior, the NVKM constitutes a powerful method for both single and multiple output regression, and can be viewed as a nonlinear and nonparametric latent force model. When the input function is observed, the NVKM can be used to perform Bayesian system identification. We use recent advances in efficient sampling of explicit functions from GPs to map process realisations through the Volterra series without resorting to numerical integration, allowing scalability through doubly stochastic variational inference, and avoiding the need for Gaussian approximations of the output processes. We demonstrate the performance of the model for both multiple output regression and system identification using standard benchmarks.
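To fix ideas, the discrete-time sketch below evaluates a second-order Volterra series by replacing the integrals with Riemann sums; the kernels and input are fixed toy functions here, whereas in the NVKM they are themselves GP samples.

# Second-order Volterra series on a grid:
# y(t) = int G1(s) u(t - s) ds + int int G2(s1, s2) u(t - s1) u(t - s2) ds1 ds2
import numpy as np

tau = np.linspace(0, 2, 50)
dt = tau[1] - tau[0]
G1 = np.exp(-3.0 * tau)                               # first-order kernel
G2 = np.exp(-3.0 * (tau[:, None] + tau[None, :]))     # second-order kernel

def volterra(u_fun, t):
    u_lag = u_fun(t - tau)                             # u(t - tau) on the grid
    first = (G1 * u_lag).sum() * dt
    second = (G2 * np.outer(u_lag, u_lag)).sum() * dt ** 2
    return first + second

u = lambda s: np.sin(2 * np.pi * s) * (s >= 0)         # causal toy input
y = np.array([volterra(u, t) for t in np.linspace(0, 5, 200)])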
Recyclable Gaussian Processes
Moreno-Muñoz, Pablo, Artés-Rodríguez, Antonio, Álvarez, Mauricio A.
We present a new framework for recycling independent variational approximations to Gaussian processes. The main contribution is the construction of variational ensembles given a dictionary of fitted Gaussian processes, without revisiting any subset of observations. Our framework allows for regression, classification and heterogeneous tasks, i.e., a mix of continuous and discrete variables over the same input domain. We exploit infinite-dimensional integral operators based on the Kullback-Leibler divergence between stochastic processes to re-combine arbitrary numbers of sparse variational approximations with different complexities, likelihood models and pseudo-input locations. Extensive results illustrate the usability of our framework in large-scale distributed experiments, including comparisons with exact inference models from the literature.
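In practice, process-level KL terms of this kind are typically evaluated through finite-dimensional Gaussian marginals at inducing points; the generic Gaussian KL below is a standard building block (not the paper's full bound):

# KL( N(mu0, S0) || N(mu1, S1) ) for full covariance matrices:
# 0.5 * [ tr(S1^{-1} S0) + (mu1 - mu0)^T S1^{-1} (mu1 - mu0) - k + ln(det S1 / det S0) ]
import numpy as np

def gauss_kl(mu0, S0, mu1, S1):
    k = mu0.size
    L1 = np.linalg.cholesky(S1)
    trace_term = np.trace(np.linalg.solve(S1, S0))
    diff = mu1 - mu0
    maha = diff @ np.linalg.solve(S1, diff)
    logdet = 2 * np.log(np.diag(L1)).sum() - np.linalg.slogdet(S0)[1]
    return 0.5 * (trace_term + maha - k + logdet)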
Continual Multi-task Gaussian Processes
Moreno-Muñoz, Pablo, Artés-Rodríguez, Antonio, Álvarez, Mauricio A.
We address the problem of continual learning in multi-task Gaussian process (GP) models for handling sequential input-output observations. Our approach extends the existing prior-posterior recursion of online Bayesian inference, i.e. past posterior discoveries become future prior beliefs, to the infinite functional space setting of GPs. For scalability, we introduce variational inference together with a sparse approximation based on inducing inputs. As a consequence, we obtain tractable continual lower bounds in which two novel Kullback-Leibler (KL) divergences intervene in a natural way. The key technical property of our method is the recursive reconstruction of GP priors conditioned on the variational parameters learned so far. To achieve this goal, we introduce a novel factorization of past variational distributions, where the predictive GP equation propagates the posterior uncertainty forward. We then demonstrate that it is possible to derive GP models over many types of sequential observations, either discrete or continuous, amenable to stochastic optimization. The continual inference approach is also applicable to scenarios where potential multi-channel or heterogeneous observations might appear. Extensive experiments demonstrate that the method is fully scalable, delivers reliable performance and is robust to the propagation of uncertainty errors across a wide range of synthetic and real-world datasets.
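The prior-posterior recursion is easiest to see in a conjugate toy model, as in the sketch below (scalar Gaussian mean estimation with known noise; the paper lifts the same recursion to sparse GP posteriors over functions):

# The posterior after batch t becomes the prior for batch t + 1,
# so no earlier data ever needs to be revisited.
import numpy as np

def gaussian_update(prior_mu, prior_var, y_batch, noise_var):
    """Posterior over a scalar mean parameter after observing y_batch."""
    n = y_batch.size
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mu = post_var * (prior_mu / prior_var + y_batch.sum() / noise_var)
    return post_mu, post_var

mu, var = 0.0, 10.0                      # initial prior beliefs
rng = np.random.default_rng(0)
for _ in range(5):                       # sequential batches
    batch = rng.normal(loc=1.5, scale=0.5, size=20)
    mu, var = gaussian_update(mu, var, batch, noise_var=0.25)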
Variational Bridge Constructs for Grey Box Modelling with Gaussian Processes
Ward, Wil O. C., Ryder, Tom, Prangle, Dennis, Álvarez, Mauricio A.
This paper introduces a method for inference in heterogeneous dynamical systems where part of the dynamics are known, in the form of an ordinary differential equation (ODE), with some functional input that is unknown. Inference in such systems can be difficult, particularly when the dynamics are non-linear and the input is unknown. In this work, we place a Gaussian process (GP) prior over the input function, which results in a stochastic Itô process. Using an autoregressive variational approach, we simulate samples from the resulting process and conform them to the dynamics of the system, conditioned on some observation model. We evaluate the method by applying the approach to non-linear ODEs. As a simulation-based inference method, it also extends to models with non-Gaussian likelihoods, such as count data.
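A toy forward simulation of such a system (illustrative dynamics and parameters, not the paper's examples): draw the unknown input from its GP prior, then step a non-linear ODE with it.

# An ODE dx/dt = f(x, u(t)) whose unknown input u(t) has a GP prior defines
# a stochastic process over trajectories; one realisation via Euler steps:
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 5, 400)
dt = t[1] - t[0]

# Sample the unknown input u(t) from a squared-exponential GP prior.
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 0.3 ** 2)
u = rng.multivariate_normal(np.zeros(t.size), K + 1e-8 * np.eye(t.size))

x = np.empty(t.size)
x[0] = 0.0
for i in range(t.size - 1):
    x[i + 1] = x[i] + dt * (-x[i] ** 3 + u[i])   # toy non-linear dynamics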
Sparse Gaussian Process Audio Source Separation Using Spectrum Priors in the Time-Domain
Alvarado, Pablo A., Álvarez, Mauricio A., Stowell, Dan
Gaussian process (GP) audio source separation is a time-domain approach that circumvents the inherent phase approximation issue of spectrogram-based methods. Furthermore, through its kernel, a GP elegantly incorporates prior knowledge about the sources into the separation model. Despite these compelling advantages, the computational complexity of GP inference scales cubically with the number of audio samples. As a result, source separation GP models have been restricted to the analysis of short audio frames. We introduce an efficient application of GPs to time-domain audio source separation, without compromising performance. For this purpose, we use GP regression together with spectral mixture kernels and variational sparse GPs. We compare our method with LD-PSDTF (positive semi-definite tensor factorization), KL-NMF (Kullback-Leibler non-negative matrix factorization) and IS-NMF (Itakura-Saito NMF). Results show that the proposed method outperforms these techniques.
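The spectral mixture kernel has a standard closed form (Wilson and Adams, 2013); the small sketch below uses illustrative parameters for a pitched source, with spectral peaks placed at its fundamental and one overtone:

# Stationary spectral mixture kernel:
# k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi mu_q tau)
import numpy as np

def sm_kernel(tau, weights, means, variances):
    tau = np.asarray(tau)[..., None]
    k = weights * np.exp(-2 * np.pi ** 2 * tau ** 2 * variances) \
                * np.cos(2 * np.pi * tau * means)
    return k.sum(-1)

taus = np.linspace(0, 0.05, 1000)                  # lags in seconds
k = sm_kernel(taus,
              weights=np.array([1.0, 0.5]),
              means=np.array([220.0, 440.0]),      # spectral peaks (Hz)
              variances=np.array([5.0, 5.0]))      # peak widths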