Collaborating Authors: Durrande, Nicolas


Kernel Identification Through Transformers

arXiv.org Machine Learning

Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models, as the chosen kernel determines both the inductive biases and prior support of functions under the GP prior. This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models. Drawing inspiration from recent progress in deep learning, we introduce a novel approach named KITT: Kernel Identification Through Transformers. KITT exploits a transformer-based architecture to generate kernel recommendations in under 0.1 seconds, which is several orders of magnitude faster than conventional kernel search algorithms. We train our model using synthetic data generated from priors over a vocabulary of known kernels. By exploiting the nature of the self-attention mechanism, KITT is able to process datasets with inputs of arbitrary dimension. We demonstrate that kernels chosen by KITT yield strong performance over a diverse collection of regression benchmarks.
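The data-generation recipe is amenable to a compact sketch: sample functions from GP priors over a small kernel vocabulary and use the labelled draws as training data for a classifier. The snippet below shows only this step in plain NumPy; the vocabulary and hyperparameters are illustrative stand-ins rather than the paper's, and KITT's transformer would replace a generic classifier trained on these draws.

```python
import numpy as np

# Hypothetical 1-D kernel vocabulary; KITT's actual vocabulary and
# hyperparameter priors differ.
def rbf(x, ell=1.0):
    return np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell**2)

def matern12(x, ell=1.0):
    return np.exp(-np.abs(x[:, None] - x[None, :]) / ell)

def periodic(x, ell=1.0, p=1.0):
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / p) ** 2 / ell**2)

VOCAB = {"rbf": rbf, "matern12": matern12, "periodic": periodic}

def sample_labelled_task(rng, n=64):
    """Draw (inputs, function values, kernel label) from a random prior."""
    name = rng.choice(list(VOCAB))
    x = np.sort(rng.uniform(-3.0, 3.0, n))
    K = VOCAB[name](x) + 1e-6 * np.eye(n)   # jitter for numerical stability
    y = rng.multivariate_normal(np.zeros(n), K)
    return x, y, name

rng = np.random.default_rng(0)
tasks = [sample_labelled_task(rng) for _ in range(1000)]
```

Because self-attention is permutation-invariant over the (x, y) tokens, a classifier built this way can ingest datasets of varying size and input dimension, which is the property KITT exploits.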


Deep Neural Networks as Point Estimates for Deep Gaussian Processes

arXiv.org Machine Learning

Bayesian inference has the potential to improve deep neural networks (DNNs) by providing 1) uncertainty estimates for robust prediction and downstream decision-making, and 2) an objective function (the marginal likelihood) for hyperparameter selection [MacKay, 1992a; 1992b; 2003]. The recent success of deep learning [Krizhevsky et al., 2012; Vaswani et al., 2017; Schrittwieser et al., 2020] has also renewed interest in large-scale Bayesian neural networks (BNNs), with effort mainly focused on obtaining useful uncertainty estimates [Blundell et al., 2015; Kingma et al., 2015; Gal and Ghahramani, 2016]. Although current approximations already provide usable uncertainty estimates, there is substantial evidence that the approximate uncertainty on neural network weights can still be significantly improved [Hron et al., 2018; Foong et al., 2020]. The accuracy of the uncertainty approximation is also linked to the quality of the marginal likelihood estimate [Blei et al., 2017]. Since hyperparameter learning using the marginal likelihood fails for most common approximations [e.g., Blundell et al., 2015], the accuracy of the uncertainty estimates is also questionable. Damianou and Lawrence [2013] used Gaussian processes [Rasmussen and Williams, 2006] as layers to create a different Bayesian analogue of a DNN: the deep Gaussian process (DGP). Gaussian processes (GPs) offer an alternative representation of a single-layer neural network, which is promising because they admit high-quality approximations to the uncertainty [Titsias, 2009; Burt et al., 2019].
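To make the second ingredient concrete: for an exact GP the marginal likelihood is available in closed form, and maximising it over hyperparameters is the model-selection objective MacKay advocates. A minimal NumPy sketch for an RBF kernel (the hyperparameter names and defaults are illustrative):

```python
import numpy as np

def log_marginal_likelihood(X, y, ell=1.0, var=1.0, noise_var=0.1):
    """Exact GP log marginal likelihood log N(y | 0, K + noise_var * I)
    for an RBF kernel; optimising this over (ell, var, noise_var)
    performs hyperparameter selection."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = var * np.exp(-0.5 * d2 / ell**2) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(X) * np.log(2.0 * np.pi))
```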


The Minecraft Kernel: Modelling correlated Gaussian Processes in the Fourier domain

arXiv.org Machine Learning

In the univariate setting, using the kernel spectral representation is an appealing approach for generating stationary covariance functions. However, performing the same task for multiple-output Gaussian processes is substantially more challenging. We demonstrate that current approaches to modelling cross-covariances with a spectral mixture kernel possess a critical blind spot. For a given pair of processes, the cross-covariance is not reproducible across the full range of permitted correlations, aside from the special case where their spectral densities are of identical shape. We present a solution to this issue by replacing the conventional Gaussian components of a spectral mixture with block components of finite bandwidth (i.e. rectangular step functions). The proposed family of kernels is the first multi-output generalisation of the spectral mixture kernel that can approximate any stationary multi-output kernel to arbitrary precision.
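In the univariate case the construction is easy to verify by hand: the inverse Fourier transform of a unit-mass pair of rectangular spectral blocks of width Δ centred at ±μ is a sinc envelope modulated by a cosine. A small sketch (μ and Δ are arbitrary illustrative values):

```python
import numpy as np

def block_sm_kernel(tau, mu=2.0, delta=0.5):
    """Stationary covariance whose spectral density is a symmetric pair
    of rectangular blocks of width `delta` centred at +/- `mu`.
    The Fourier transform of a rectangle is a sinc, so
    k(tau) = sinc(delta * tau) * cos(2*pi*mu*tau),
    where np.sinc is the normalised sinc, sin(pi x) / (pi x)."""
    return np.sinc(delta * tau) * np.cos(2.0 * np.pi * mu * tau)

tau = np.linspace(-5, 5, 401)
k = block_sm_kernel(tau)   # k(0) == 1: a valid correlation function
```

The multi-output case then chooses per-output block heights and phases so that any admissible cross-correlation is reachable; that construction is the paper's contribution and is not reproduced here.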


A Tutorial on Sparse Gaussian Processes and Variational Inference

arXiv.org Machine Learning

Gaussian processes (GPs) provide a framework for Bayesian inference that can offer principled uncertainty estimates for a large range of problems. For example, if we consider regression problems with Gaussian likelihoods, a GP model enjoys a posterior in closed form. However, identifying the posterior GP scales cubically with the number of training examples and requires storing all examples in memory. To overcome these obstacles, sparse GPs have been proposed that approximate the true posterior GP with pseudo-training examples. Importantly, the number of pseudo-training examples is user-defined and enables control over computational and memory complexity. In the general case, sparse GPs do not enjoy closed-form solutions and one has to resort to approximate inference. In this context, a convenient choice for approximate inference is variational inference (VI), where the problem of Bayesian inference is cast as an optimization problem -- namely, to maximize a lower bound of the log marginal likelihood. This paves the way for a powerful and versatile framework, where pseudo-training examples are treated as optimization arguments of the approximate posterior that are jointly identified together with hyperparameters of the generative model (i.e. prior and likelihood). The framework can naturally handle a wide scope of supervised learning problems, ranging from regression with heteroscedastic and non-Gaussian likelihoods to classification with discrete labels, as well as multilabel problems. The purpose of this tutorial is to provide access to the basic matter for readers without prior knowledge of either GPs or VI. A proper exposition of the subject also opens the door to more recent advances (like importance-weighted VI as well as interdomain, multioutput and deep GPs) that can serve as an inspiration for new research ideas.
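For Gaussian likelihoods the variational treatment collapses to a closed-form bound (Titsias, 2009). The sketch below evaluates that collapsed ELBO for an RBF kernel in plain NumPy; the kernel and noise settings are illustrative, and a real implementation would optimise the inducing inputs `z` and the hyperparameters by gradient ascent on this quantity:

```python
import numpy as np

def rbf(a, b, ell=1.0, var=1.0):
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def collapsed_elbo(x, y, z, noise_var=0.1):
    """Titsias' collapsed bound:
    log N(y | 0, Qnn + noise_var * I) - tr(Knn - Qnn) / (2 * noise_var),
    with Qnn = Knm Kmm^{-1} Kmn, evaluated in O(n m^2)."""
    n, m = len(x), len(z)
    Kmm = rbf(z, z) + 1e-6 * np.eye(m)
    Kmn = rbf(z, x)
    L = np.linalg.cholesky(Kmm)
    A = np.linalg.solve(L, Kmn)                    # Qnn = A.T @ A
    B = np.eye(m) + (A @ A.T) / noise_var
    LB = np.linalg.cholesky(B)
    c = np.linalg.solve(LB, A @ y) / noise_var
    log_det = 2.0 * np.log(np.diag(LB)).sum() + n * np.log(noise_var)
    quad = (y @ y) / noise_var - c @ c             # y^T (Qnn + noise I)^{-1} y
    log_marg = -0.5 * (n * np.log(2.0 * np.pi) + log_det + quad)
    trace = (rbf(x, x).trace() - (A * A).sum()) / (2.0 * noise_var)
    return log_marg - trace

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200); y = np.sin(x) + 0.1 * rng.normal(size=200)
z = np.linspace(-3, 3, 15)                         # m = 15 inducing inputs
print(collapsed_elbo(x, y, z))
```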


Matérn Gaussian Processes on Graphs

arXiv.org Machine Learning

Gaussian processes are a versatile framework for learning unknown functions in a manner that permits one to utilize prior information about their properties. Although many different Gaussian process models are readily available when the input space is Euclidean, the choice is much more limited for Gaussian processes whose input space is an undirected graph. In this work, we leverage the stochastic partial differential equation characterization of Matérn Gaussian processes - a widely-used model class in the Euclidean setting - to study their analog for undirected graphs. We show that the resulting Gaussian processes inherit various attractive properties of their Euclidean and Riemannian analogs and provide techniques that allow them to be trained using standard methods, such as inducing points. This enables graph Matérn Gaussian processes to be employed in mini-batch and non-conjugate settings, thereby making them more accessible to practitioners and easier to deploy within larger learning frameworks.
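The SPDE view leads to a covariance that is a simple spectral transform of the graph Laplacian, K ∝ (2ν/κ² I + L)^(−ν). A minimal NumPy sketch (the normalisation and parameter values are illustrative choices):

```python
import numpy as np

def graph_matern(L, nu=1.5, kappa=1.0):
    """Graph Matérn covariance K ∝ (2*nu/kappa^2 * I + L)^(-nu),
    computed in the graph Fourier basis (the eigenvectors of L)."""
    lam, U = np.linalg.eigh(L)                   # graph Fourier transform
    psi = (2.0 * nu / kappa**2 + lam) ** (-nu)   # Matérn spectral filter
    K = (U * psi) @ U.T
    return K / K.diagonal().mean()               # unit average variance

# Example: path graph on 5 nodes.
A = np.diag(np.ones(4), 1); A = A + A.T          # adjacency matrix
L = np.diag(A.sum(axis=1)) - A                   # combinatorial Laplacian
K = graph_matern(L)
```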


Sparse Gaussian Processes with Spherical Harmonic Features

arXiv.org Machine Learning

We introduce a new class of inter-domain variational Gaussian processes (GPs) where data are mapped onto the unit hypersphere in order to use spherical harmonic representations. Our inference scheme is comparable to variational Fourier features, but it does not suffer from the curse of dimensionality and leads to diagonal covariance matrices between inducing variables. This enables a speed-up in inference, because it bypasses the need to invert large covariance matrices. Our experiments show that our model is able to fit a regression model to a dataset with 6 million entries two orders of magnitude faster than standard sparse GPs, while retaining state-of-the-art accuracy. We also demonstrate competitive performance on classification with non-conjugate likelihoods.
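The first step, projecting Euclidean data onto the hypersphere, is an append-a-bias-and-normalise map; the payoff is that zonal kernels on the sphere diagonalise in the spherical harmonic basis, which is what makes Kuu diagonal. A minimal sketch of the projection (the bias value is an illustrative choice):

```python
import numpy as np

def map_to_sphere(X, bias=1.0):
    """Lift inputs from R^d to the unit hypersphere S^d by appending a
    bias coordinate and normalising each row. Spherical-harmonic
    inducing features then yield a diagonal Kuu for zonal kernels."""
    Xb = np.concatenate([X, bias * np.ones((len(X), 1))], axis=1)
    return Xb / np.linalg.norm(Xb, axis=1, keepdims=True)

X = np.random.default_rng(0).normal(size=(4, 3))
S = map_to_sphere(X)          # rows have unit norm, shape (4, 4)
```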


Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation

arXiv.org Machine Learning

Many machine learning models require a training procedure based on running stochastic gradient descent. A key element in the efficiency of these algorithms is the choice of the learning rate schedule. While finding good learning rate schedules using Bayesian optimisation has been tackled by several authors, adapting the schedule dynamically in a data-driven way remains an open question. This is of high practical importance to users who need to train a single, expensive model. To tackle this problem, we introduce an original probabilistic model for traces of optimisers, based on latent Gaussian processes and an autoregressive formulation, that flexibly adjusts to abrupt changes in behaviour induced by new learning rate values. As illustrated, this model is well suited to several problems: the online adaptation of the learning rate for a cold-started run; tuning the schedule for a set of similar tasks (in a classical BO setup); and warm-starting it for a new task.
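To fix ideas about what an autoregressive trace model looks like, here is a loose caricature (not the paper's model): the next loss value depends on the current loss and the current learning rate through an unknown improvement function, over which the paper would place latent GP priors; here a fixed stand-in `g` is used just to generate a plausible trace.

```python
import numpy as np

rng = np.random.default_rng(1)

def g(loss, lr):
    """Hypothetical improvement function: too-small and too-large
    learning rates both yield little progress."""
    return lr * loss / (1.0 + 50.0 * lr**2)

def simulate_trace(lrs, loss0=2.0, noise=0.02):
    """Autoregressive trace: loss[t+1] = loss[t] - g(loss[t], lr[t]) + eps."""
    losses = [loss0]
    for lr in lrs:
        losses.append(losses[-1] - g(losses[-1], lr) + noise * rng.normal())
    return np.array(losses)

trace = simulate_trace(np.full(100, 0.05))
```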


Physically-Inspired Gaussian Process Models for Post-Transcriptional Regulation in Drosophila

arXiv.org Machine Learning

The regulatory process of Drosophila has been thoroughly studied for understanding a great variety of systems biology principles. While pattern-forming gene networks are typically analysed at the transcription step, post-transcriptional events (e.g. translation, protein processing) play an important role in establishing protein expression patterns and levels. Since post-transcriptional regulation of gap genes in Drosophila depends on spatiotemporal interactions between mRNAs and gap proteins, suitable physically-inspired stochastic models are required to study the link between the two quantities. Previous work has shown that combining Gaussian processes (GPs) with differential equations leads to promising predictions when analysing regulatory networks. Here we further investigate two types of physically-inspired GP models based on a reaction-diffusion equation, the main difference between them being where the GP prior is placed. While one of them has been studied previously using gap protein data only, the other is novel and yields a simpler approach requiring only the differentiation of kernel functions. In contrast to other stochastic frameworks, discretising the spatial domain is not required here. Both GP models are tested under different conditions depending on the availability of gap gene mRNA expression data. Finally, their performance is assessed on a high-resolution dataset describing the blastoderm stage of the early embryo of Drosophila melanogaster.
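The "differentiation of kernel functions" trick is standard for GPs linked by a linear operator: if two quantities are related by a linear differential equation, placing a GP prior on one determines all cross-covariances by applying the operator to the kernel. A minimal sketch with an RBF prior on the protein p and a linear ODE m = dp/dt + λp, used here in place of the paper's reaction-diffusion PDE purely for brevity (λ and the lengthscale are illustrative):

```python
import numpy as np

LAM = 0.5    # hypothetical decay rate
ELL = 1.0    # hypothetical RBF lengthscale

def k_pp(t, s):
    """Prior covariance of the protein process p ~ GP(0, k_pp)."""
    return np.exp(-0.5 * (t - s) ** 2 / ELL**2)

def dk_dt(t, s):
    """Analytic derivative of k_pp with respect to its first argument."""
    return -(t - s) / ELL**2 * k_pp(t, s)

def k_mp(t, s):
    """Cross-covariance cov(m(t), p(s)) implied by m = dp/dt + LAM * p:
    apply the operator to the kernel in its first argument."""
    return dk_dt(t, s) + LAM * k_pp(t, s)
```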


Gaussian Process Modulated Cox Processes under Linear Inequality Constraints

arXiv.org Machine Learning

Point processes are used in a variety of real-world problems for modelling temporal or spatiotemporal point patterns in fields such as astronomy, geography, and ecology (Baddeley et al., 2015; Møller and Waagepetersen, 2004). In reliability analysis, they are used as renewal processes to model the lifetime of items or failure (hazard) rates (Cha and Finkelstein, 2018). Poisson processes are the foundation for modelling point patterns (Kingman, 1992). Their extension to stochastic intensity functions, known as doubly stochastic Poisson processes or Cox processes (Cox, 1955), enables nonparametric inference on the intensity function and allows uncertainties to be expressed (Møller and Waagepetersen, 2004). Moreover, previous studies have shown that other classes of point processes may also be seen as Cox processes. For example, Yannaros (1988) proved that Gamma renewal processes are Cox processes under non-increasing conditions. A similar analysis was later carried out for Weibull renewal processes (Yannaros, 1994).
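A GP-modulated Cox process is straightforward to simulate by thinning (Lewis-Shedler style): draw candidate points from a homogeneous Poisson process with rate λ_max, evaluate one GP draw at those points, and keep each candidate with probability λ(t)/λ_max. The sigmoid link below, which keeps the intensity bounded, is a common illustrative choice and not this paper's specific construction (the paper additionally imposes linear inequality constraints on the GP):

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, ell=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def sample_cox(T=10.0, lam_max=20.0):
    """Thinning: candidates ~ Poisson(lam_max) on [0, T], each kept with
    probability lambda(t) / lam_max, where lambda = lam_max * sigmoid(f)."""
    n = rng.poisson(lam_max * T)
    t = np.sort(rng.uniform(0.0, T, n))
    K = rbf(t, t) + 1e-6 * np.eye(n)
    f = rng.multivariate_normal(np.zeros(n), K)   # GP draw at candidates
    lam = lam_max / (1.0 + np.exp(-f))            # bounded intensity
    keep = rng.uniform(0.0, lam_max, n) < lam
    return t[keep]

events = sample_cox()
```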


Banded Matrix Operators for Gaussian Markov Models in the Automatic Differentiation Era

arXiv.org Machine Learning

Banded matrices can be used as precision matrices in several models including linear state-space models, some Gaussian processes, and Gaussian Markov random fields. The aim of the paper is to make modern inference methods (such as variational inference or gradient-based sampling) available for Gaussian models with banded precision.

These two limitations have been thoroughly studied over the past decades and several approaches have been proposed to overcome them. The most popular method for reducing computational complexity is the sparse GP framework (Candela and Rasmussen, 2005; Titsias, 2009), where computations are focussed on a set of "inducing variables", allowing a tradeoff between computational requirements and the accuracy of the approximation.
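The computational payoff is that Cholesky factorisation, solves, and log-determinants for banded precisions are all O(n) rather than O(n³). A minimal SciPy sketch with a tridiagonal precision (here the stationary AR(1) chain with coefficient φ, an illustrative Gaussian Markov model):

```python
import numpy as np
from scipy.linalg import cholesky_banded, cho_solve_banded

n, phi = 1000, 0.9

# Tridiagonal precision Q of a stationary AR(1) chain with unit innovation
# variance, in LAPACK upper-banded storage:
#   row 0 = superdiagonal (shifted right), row 1 = main diagonal.
ab = np.zeros((2, n))
ab[1] = 1.0 + phi**2
ab[1, 0] = ab[1, -1] = 1.0
ab[0, 1:] = -phi

cb = cholesky_banded(ab)                       # banded Cholesky in O(n)
logdet_Q = 2.0 * np.log(cb[1]).sum()           # log|Q| from factor diagonal
x = cho_solve_banded((cb, False), np.ones(n))  # solves Q x = b in O(n)
```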