
Collaborating Authors: Deisenroth, Marc Peter


Matérn Gaussian processes on Riemannian manifolds

arXiv.org Machine Learning

Gaussian processes are an effective model class for learning unknown functions, particularly in settings where accurately representing predictive uncertainty is of key importance. Motivated by applications in the physical sciences, the widely-used Matérn class of Gaussian processes has recently been generalized to model functions whose domains are Riemannian manifolds, by re-expressing said processes as solutions of stochastic partial differential equations. In this work, we propose techniques for computing the kernels of these processes on compact Riemannian manifolds via spectral theory of the Laplace-Beltrami operator in a fully constructive manner, thereby allowing them to be trained via standard scalable techniques such as inducing point methods. We also extend the generalization from the Matérn to the widely-used squared exponential Gaussian process. By allowing Riemannian Matérn Gaussian processes to be trained using well-understood techniques, our work enables their use in mini-batch, online, and non-conjugate settings, and makes them more accessible to machine learning practitioners.
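
As a concrete illustration of the spectral construction, here is a minimal sketch on the circle S¹, where the Laplace-Beltrami eigenpairs are the Fourier modes. The truncation level, the lengthscale parameter kappa, and the unit-variance normalisation are choices made for this sketch, not specifics taken from the paper:

```python
import numpy as np

def matern_kernel_circle(theta1, theta2, nu=1.5, kappa=1.0, n_max=100):
    """Truncated spectral Matern kernel on the circle S^1.

    The Laplace-Beltrami eigenpairs on S^1 are lambda_n = n^2 with
    eigenfunctions cos(n t), sin(n t); the Matern kernel applies the
    spectral filter Phi(lambda) = (2 nu / kappa^2 + lambda)^(-nu - d/2),
    d = 1, to this eigenbasis. Scaling is chosen so that k(x, x) = 1.
    """
    ns = np.arange(n_max + 1, dtype=float)
    spec = (2.0 * nu / kappa ** 2 + ns ** 2) ** (-(nu + 0.5))
    diff = np.asarray(theta1) - np.asarray(theta2)
    k = spec[0] + sum(2.0 * spec[n] * np.cos(n * diff) for n in range(1, n_max + 1))
    return k / (spec[0] + 2.0 * spec[1:].sum())

# The kernel depends only on geodesic distance and is periodic:
print(matern_kernel_circle(0.0, np.pi / 4.0))
print(matern_kernel_circle(0.0, np.pi / 4.0 + 2.0 * np.pi))  # same value
```

On a general compact manifold the same recipe applies with numerically computed eigenpairs, which is what makes the construction compatible with standard inducing-point training.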


Probabilistic Active Meta-Learning

arXiv.org Machine Learning

Data-efficient learning algorithms are essential in many practical applications where data collection is expensive, e.g., in robotics due to wear and tear. To address this problem, meta-learning algorithms use prior experience about tasks to learn new, related tasks efficiently. Typically, a set of training tasks is assumed given or randomly chosen. However, this setting does not take into account the sequential nature that naturally arises when training a model from scratch in real life: how do we collect a set of training tasks in a data-efficient manner? In this work, we introduce task selection based on prior experience into a meta-learning algorithm by conceptualizing the learner and the active meta-learning setting using a probabilistic latent variable model. We provide empirical evidence that our approach improves data-efficiency when compared to strong baselines on simulated robotic experiments.
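
To make the task-selection loop concrete, here is a hedged sketch: candidate tasks are scored in a latent embedding space and the most novel one is proposed next. The greedy farthest-point score below is an illustrative stand-in; the paper derives its selection criterion from the probabilistic latent variable model itself:

```python
import numpy as np

def select_next_task(candidate_embeddings, seen_embeddings):
    """Pick the candidate task whose latent embedding is farthest from
    every previously trained task (an illustrative novelty score, not
    the paper's exact criterion)."""
    # Distance from each candidate to its nearest seen task embedding.
    d = np.linalg.norm(
        candidate_embeddings[:, None, :] - seen_embeddings[None, :, :], axis=-1
    )
    novelty = d.min(axis=1)
    return int(np.argmax(novelty))

rng = np.random.default_rng(0)
seen = rng.normal(size=(5, 2))        # embeddings of tasks trained so far
candidates = rng.normal(size=(8, 2))  # embeddings of proposable tasks
print(select_next_task(candidates, seen))
```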


A Foliated View of Transfer Learning

arXiv.org Machine Learning

Transfer learning considers a learning process where a new task is solved by transferring relevant knowledge from known solutions to related tasks. While this has been studied experimentally, the field lacks a foundational description of the transfer learning problem, one that exposes what related tasks are and how they can be exploited. In this work, we present a definition of relatedness between tasks and identify foliations as a mathematical framework to represent such relationships.
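
For reference, the standard differential-geometric definition that the foliation framework rests on, written out in LaTeX; how leaves map onto related tasks is the paper's contribution and is only gestured at in the closing comment:

```latex
% A k-dimensional foliation of an n-dimensional manifold M is a partition
% of M into disjoint, connected, immersed k-dimensional submanifolds
% (the leaves):
M \;=\; \bigsqcup_{\alpha \in A} L_\alpha, \qquad \dim L_\alpha = k,
% together with charts \varphi : U \to \mathbb{R}^k \times \mathbb{R}^{n-k}
% in which every leaf meets U in level sets of the last n - k coordinates.
% (Informally, one may read each leaf as a family of related tasks --
% a reading suggested by the abstract, not a precise claim.)
```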


Estimating Barycenters of Measures in High Dimensions

arXiv.org Machine Learning

Barycentric averaging is a principled way of summarizing populations of measures. Existing algorithms for estimating barycenters typically parametrize them as weighted sums of Diracs and optimize their weights and/or locations. However, these approaches do not scale to high-dimensional settings due to the curse of dimensionality. In this paper, we propose a scalable and general algorithm for estimating barycenters of measures in high dimensions. The key idea is to turn the optimization over measures into an optimization over generative models, introducing inductive biases that allow the method to scale while still accurately estimating barycenters. We prove local convergence under mild assumptions on the discrepancy, showing that the approach is well-posed. We demonstrate that our method is fast, achieves good performance on low-dimensional problems, and scales to high-dimensional settings. In particular, our approach is the first to be used to estimate barycenters in thousands of dimensions.
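
The core move, optimizing a generative model rather than Dirac locations, can be sketched in a few lines of PyTorch. The energy distance here is an illustrative stand-in for the paper's choice of discrepancy, and the architecture, sample sizes, and learning rate are arbitrary:

```python
import torch

def energy_distance(x, y):
    """Sample-based energy distance between point clouds x and y
    (a stand-in discrepancy; the framework allows others)."""
    dxy = torch.cdist(x, y).mean()
    dxx = torch.cdist(x, x).mean()
    dyy = torch.cdist(y, y).mean()
    return 2.0 * dxy - dxx - dyy

# Generator: pushes Gaussian noise through a small network, so the
# barycenter is parametrized by network weights instead of Diracs.
dim, latent = 2, 8
gen = torch.nn.Sequential(
    torch.nn.Linear(latent, 64), torch.nn.ReLU(), torch.nn.Linear(64, dim)
)
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

# Toy input measures: samples from two shifted Gaussians.
measures = [torch.randn(256, dim) + 2.0, torch.randn(256, dim) - 2.0]
weights = [0.5, 0.5]

for step in range(500):
    bary_samples = gen(torch.randn(256, latent))
    loss = sum(w * energy_distance(bary_samples, m)
               for w, m in zip(weights, measures))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, samples from gen approximate the barycenter, so the dimensionality is limited by the generator rather than by a grid of Diracs.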


Stochastic Differential Equations with Variational Wishart Diffusions

arXiv.org Machine Learning

We present a Bayesian non-parametric way of inferring stochastic differential equations for both regression tasks and continuous-time dynamical modelling. The work places particular emphasis on the stochastic part of the differential equation, also known as the diffusion, and models it by means of Wishart processes. Further, we present a semi-parametric approach that allows the framework to scale to high dimensions. This leads naturally to modelling both latent and auto-regressive temporal systems with conditionally heteroskedastic noise. We provide experimental evidence that modelling the diffusion often improves performance and that this randomness in the differential equation can be essential to avoid overfitting.
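
A minimal numpy sketch of the object being modelled: an SDE whose diffusion matrix is assembled, Wishart-style, as L(x) L(x)^T from a matrix of state-dependent factors, simulated with Euler-Maruyama. The drift and factor functions below are toy placeholders, not the paper's GP-based construction:

```python
import numpy as np

def simulate_sde(x0, drift, factor, T=1.0, n_steps=1000, rng=None):
    """Euler-Maruyama simulation of dx = f(x) dt + L(x) dW_t, where the
    diffusion matrix Sigma(x) = L(x) L(x)^T is built Wishart-style from
    a D x nu matrix L(x) of state-dependent factors."""
    rng = rng or np.random.default_rng(0)
    dt = T / n_steps
    x = np.asarray(x0, dtype=float).copy()
    path = [x.copy()]
    for _ in range(n_steps):
        L = factor(x)                                   # shape (D, nu)
        dW = rng.normal(scale=np.sqrt(dt), size=L.shape[1])
        x = x + drift(x) * dt + L @ dW
        path.append(x.copy())
    return np.array(path)

# Toy 2-D system: mean-reverting drift, state-dependent diffusion factors.
path = simulate_sde(
    x0=[1.0, -1.0],
    drift=lambda x: -x,
    factor=lambda x: 0.3 * np.column_stack([np.cos(x), np.sin(x)]),
)
print(path.shape)  # (n_steps + 1, 2)
```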


Aligning Time Series on Incomparable Spaces

arXiv.org Machine Learning

Dynamic time warping (DTW) is a useful method for aligning, comparing and combining time series, but it requires them to live in comparable spaces. In this work, we consider a setting in which time series live on different spaces without a sensible ground metric, causing DTW to become ill-defined. To alleviate this, we propose Gromov dynamic time warping (GDTW), a distance between time series on potentially incomparable spaces that avoids the comparability requirement by instead considering intra-relational geometry. We derive a Frank-Wolfe algorithm for computing it and demonstrate its effectiveness at aligning, combining and comparing time series living on incomparable spaces. We further propose a smoothed version of GDTW as a differentiable loss and assess its properties in a variety of settings, including barycentric averaging, generative modeling and imitation learning.
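
The Frank-Wolfe scheme mentioned above has a clean structure: the GDTW objective is quadratic in the alignment, so each iteration linearizes it, and the linear subproblem is exactly a classical DTW solved by dynamic programming. Below is a compact sketch under simplifying assumptions (a fixed step-size schedule in place of line search, squared differences as the intra-space discrepancy):

```python
import numpy as np

def dtw_path(C):
    """Classical DTW on cost matrix C; returns a 0/1 alignment matrix.
    Serves as the linear minimization oracle inside Frank-Wolfe."""
    n, m = C.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1],
                                            D[i - 1, j - 1])
    W = np.zeros((n, m))
    i, j = n, m
    while i > 0 and j > 0:                 # backtrack the optimal path
        W[i - 1, j - 1] = 1.0
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return W

def gdtw_grad(A, B, W):
    """Linearization G(W)[i,j] = sum_{k,l} (A[i,k] - B[j,l])^2 W[k,l]
    of the quadratic objective E(W) = <G(W), W>; the gradient is 2 G(W)."""
    r, c = W.sum(axis=1), W.sum(axis=0)
    return ((A ** 2) @ r)[:, None] - 2.0 * A @ W @ B.T + ((B ** 2) @ c)[None, :]

def gdtw(A, B, n_iters=30):
    """Gromov DTW between intra-space distance matrices A (n x n), B (m x m)."""
    W = dtw_path(np.zeros((A.shape[0], B.shape[0])))  # feasible start
    for t in range(n_iters):
        S = dtw_path(gdtw_grad(A, B, W))   # LMO: DTW on the linearized cost
        gamma = 2.0 / (t + 2.0)            # fixed schedule (paper: line search)
        W = (1.0 - gamma) * W + gamma * S
    return W, (gdtw_grad(A, B, W) * W).sum()

# Toy usage: trajectories living in different spaces (2-D vs 3-D).
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(20, 2)), rng.normal(size=(30, 3))
A = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # intra-space distances
B = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
W, cost = gdtw(A, B)
```

The returned alignment lives in the convex hull of warping paths; in this sketch, one final DTW call on the linearized cost would round it back to a hard alignment.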


Variational Integrator Networks for Physically Meaningful Embeddings

arXiv.org Machine Learning

Learning workable representations of dynamical systems is becoming an increasingly important problem in a number of application areas. By leveraging recent work connecting deep neural networks to systems of differential equations, we propose variational integrator networks, a class of neural network architectures designed to ensure faithful representations of the dynamics under study. This class of network architectures facilitates accurate long-term prediction, interpretability, and data-efficient learning, while still remaining highly flexible and capable of modeling complex behavior. We demonstrate that they can accurately learn dynamical systems from both noisy observations in phase space and from image pixels within which the unknown dynamics are embedded.
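
The architectural idea can be sketched directly: instead of a free-form recurrent update, each step applies a discrete variational integrator, here the position form of Störmer-Verlet, with a neural network supplying the learned acceleration. The discretization choice and network shape below are illustrative assumptions:

```python
import torch

class VariationalIntegratorStep(torch.nn.Module):
    """One discrete step q_{t+1} = 2 q_t - q_{t-1} + h^2 a(q_t): the
    position form of a Stormer-Verlet variational integrator with a
    learned acceleration a (a minimal sketch of the VIN idea)."""

    def __init__(self, dim, h=0.1):
        super().__init__()
        self.h = h
        self.accel = torch.nn.Sequential(
            torch.nn.Linear(dim, 64), torch.nn.Tanh(), torch.nn.Linear(64, dim)
        )

    def forward(self, q_prev, q_curr):
        q_next = 2 * q_curr - q_prev + self.h ** 2 * self.accel(q_curr)
        return q_curr, q_next

def rollout(step, q0, q1, n_steps):
    """Unroll the integrator: long-term structure comes from the update
    rule itself, not from the network memorizing trajectories."""
    qs = [q0, q1]
    for _ in range(n_steps):
        _, q_next = step(qs[-2], qs[-1])
        qs.append(q_next)
    return torch.stack(qs)

step = VariationalIntegratorStep(dim=2)
q0, q1 = torch.zeros(1, 2), 0.1 * torch.ones(1, 2)
trajectory = rollout(step, q0, q1, n_steps=50)   # shape (52, 1, 2)
```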


Deep Gaussian Processes with Importance-Weighted Variational Inference

arXiv.org Machine Learning

Deep Gaussian processes (DGPs) can model complex marginal densities as well as complex mappings. Non-Gaussian marginals are essential for modelling real-world data, and can be generated from the DGP by incorporating uncorrelated variables into the model. Previous work on DGP models has introduced noise additively and used variational inference with a combination of sparse Gaussian processes and mean-field Gaussians for the approximate posterior. Additive noise attenuates the signal, and the Gaussian form of the variational distribution may lead to an inaccurate posterior. We instead incorporate noisy variables as latent covariates, and propose a novel importance-weighted objective, which leverages analytic results and provides a mechanism to trade off computation for improved accuracy. Our results demonstrate that the importance-weighted objective works well in practice and consistently outperforms classical variational inference, especially for deeper models.
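
The importance-weighted objective has a generic shape worth seeing in code: average K importance weights inside the log, which recovers the ELBO at K = 1 and trades extra computation for a tighter bound as K grows. The sketch below shows this shape only; log_joint, log_q, and sample_q are model-specific placeholders, not the paper's DGP machinery:

```python
import math
import torch

def iw_bound(log_joint, log_q, sample_q, x, K=10):
    """Generic importance-weighted lower bound
    L_K = E[ log (1/K) sum_k exp( log p(x, z_k) - log q(z_k | x) ) ],
    where z_k are sampled latent covariates. K = 1 is the standard ELBO;
    larger K tightens the bound at extra computational cost."""
    log_w = torch.stack(
        [log_joint(x, z) - log_q(z, x) for z in (sample_q(x) for _ in range(K))],
        dim=0,
    )                                              # shape (K, batch)
    return (torch.logsumexp(log_w, dim=0) - math.log(K)).mean()
```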


Differentially Private Empirical Risk Minimization with Sparsity-Inducing Norms

arXiv.org Machine Learning

Differentially private learning is concerned with maintaining prediction quality while limiting the privacy impact on individuals whose information is contained in the data. We consider differentially private risk minimization problems with regularizers that induce structured sparsity. These regularizers are known to be convex, but they are often non-differentiable. We analyze standard differentially private algorithms, such as output perturbation, Frank-Wolfe and objective perturbation. Output perturbation is a differentially private algorithm that is known to perform well for minimizing risks that are strongly convex, and previous works have derived excess risk bounds for it that are independent of the dimensionality. In this paper, we assume a particular class of convex but non-smooth regularizers that induce structured sparsity, and loss functions for generalized linear models. We also consider differentially private Frank-Wolfe algorithms to optimize the dual of the risk minimization problem. We derive excess risk bounds for both these algorithms. Both bounds depend on the Gaussian width of the unit ball of the dual norm. We also show that objective perturbation of the risk minimization problem is equivalent to output perturbation of a dual optimization problem. This is the first work to analyze the dual optimization problems of risk minimization problems in the context of differential privacy.
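
Output perturbation, the simplest of the analyzed mechanisms, is easy to sketch for a strongly convex case: fit the regularized model, then add noise calibrated to the classical L2 sensitivity bound 2L/(n·lambda) of the minimizer (L = 1 for logistic loss on unit-norm features). This is a sketch of the generic mechanism with a Gaussian-mechanism calibration, not the paper's algorithms for non-smooth sparsity-inducing norms:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def output_perturbation(X, y, lam=0.1, eps=1.0, delta=1e-5):
    """(eps, delta)-DP output perturbation for L2-regularized logistic
    regression: the non-private minimizer plus Gaussian noise scaled to
    its L2 sensitivity 2L / (n * lam). Assumes rows of X have unit norm,
    so the per-example loss is 1-Lipschitz in the weights."""
    n = X.shape[0]
    # sklearn's C is the inverse regularization strength: C = 1 / (lam * n)
    # matches the objective (1/n) sum loss + (lam/2) ||w||^2.
    clf = LogisticRegression(C=1.0 / (lam * n), fit_intercept=False).fit(X, y)
    w = clf.coef_.ravel()
    sensitivity = 2.0 / (n * lam)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    rng = np.random.default_rng()
    return w + rng.normal(scale=sigma, size=w.shape)

# Toy usage with unit-norm features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = (X @ rng.normal(size=5) > 0).astype(int)
w_private = output_perturbation(X, y)
```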


GPdoemd: a Python package for design of experiments for model discrimination

arXiv.org Machine Learning

Model discrimination identifies a mathematical model that usefully explains and predicts a given system's behaviour. Researchers will often have several models, i.e., hypotheses, about an underlying system mechanism, but insufficient experimental data to discriminate between the models, i.e., to discard inaccurate ones. Given rival mathematical models and an initial experimental data set, optimal design of experiments suggests the experimental observations that maximise a design criterion weighted by prediction uncertainty. Evaluating this model uncertainty requires gradients, which may not be readily available for black-box models. This paper (i) proposes a new design criterion using the Jensen-Rényi divergence, and (ii) develops a novel method replacing black-box models with Gaussian process surrogates. Using the surrogates, we marginalise out the model parameters with approximate inference. Results show these contributions working well for both classical and new test instances. We also (iii) introduce and discuss GPdoemd, the open-source implementation of the Gaussian process surrogate method.
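
The closed-form structure that makes a Jensen-Rényi criterion attractive for Gaussian predictive distributions can be sketched directly: the quadratic (alpha = 2) Rényi entropy of a Gaussian mixture is analytic because products of Gaussian densities integrate in closed form. The snippet below is an illustrative computation under that alpha = 2 assumption, not the GPdoemd API:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def renyi2_entropy_mixture(ws, means, covs):
    """Quadratic Renyi entropy H2 = -log int p^2 of a Gaussian mixture,
    closed-form because int N_i N_j dx = N(m_i; m_j, S_i + S_j)."""
    q = sum(
        wi * wj * mvn.pdf(mi, mean=mj, cov=Si + Sj)
        for wi, mi, Si in zip(ws, means, covs)
        for wj, mj, Sj in zip(ws, means, covs)
    )
    return -np.log(q)

def jensen_renyi2(ws, means, covs):
    """JR divergence: entropy of the mixture minus the weighted mean
    of the component entropies."""
    h_mix = renyi2_entropy_mixture(ws, means, covs)
    h_comp = sum(w * renyi2_entropy_mixture([1.0], [m], [S])
                 for w, m, S in zip(ws, means, covs))
    return h_mix - h_comp

# Two rival models' predictive distributions at one candidate experiment:
means = [np.array([0.0, 0.0]), np.array([1.0, 0.5])]
covs = [np.eye(2) * 0.1, np.eye(2) * 0.2]
print(jensen_renyi2([0.5, 0.5], means, covs))
```

In a design loop, one would evaluate this divergence at each candidate experiment, with each rival model's predictive mean and covariance as a mixture component, and query the maximiser, where the models disagree most relative to their uncertainty.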