Damoulas, Theodoros


Multi-resolution Multi-task Gaussian Processes

Neural Information Processing Systems

We consider evidence integration from potentially dependent observation processes under varying spatio-temporal sampling resolutions and noise levels. We offer a multi-resolution multi-task (MRGP) framework that allows for both inter-task and intra-task multi-resolution and multi-fidelity. We develop shallow Gaussian Process (GP) mixtures that approximate the difficult-to-estimate joint likelihood with a composite one and deep GP constructions that naturally handle biases. In doing so, we generalize existing approaches and offer information-theoretic corrections and efficient variational approximations. We demonstrate the competitiveness of MRGPs on synthetic settings and on the challenging problem of hyper-local estimation of air pollution levels across London from multiple sensing modalities operating at disparate spatio-temporal resolutions.
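
A minimal sketch of the multi-resolution idea, under toy assumptions (the latent function, grid, resolutions, and noise levels below are illustrative, not the paper's setup): each observation process sees block averages of a shared latent signal at its own resolution, and a composite likelihood simply sums the per-process Gaussian terms.

    # Toy multi-resolution composite likelihood (illustrative, not MRGP code).
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 120)
    f = np.sin(2 * np.pi * x)                # stand-in for a latent GP draw

    def block_average(signal, width):
        # Each low-resolution observation aggregates `width` fine-grid values.
        return signal.reshape(-1, width).mean(axis=1)

    # Process A: fine resolution, low noise; process B: coarse and noisy.
    y_a = block_average(f, 2) + rng.normal(0, 0.05, size=60)
    y_b = block_average(f, 12) + rng.normal(0, 0.20, size=10)

    def gauss_loglik(y, mean, sigma):
        return np.sum(-0.5 * ((y - mean) / sigma) ** 2
                      - np.log(sigma * np.sqrt(2 * np.pi)))

    # Composite likelihood: treat the processes as conditionally independent
    # given f; the paper's corrections account for the error this introduces.
    composite = (gauss_loglik(y_a, block_average(f, 2), 0.05)
                 + gauss_loglik(y_b, block_average(f, 12), 0.20))
    print(composite)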


Structured Variational Inference in Continuous Cox Process Models

Neural Information Processing Systems

We propose a scalable framework for inference in a continuous sigmoidal Cox process that assumes the corresponding intensity function is given by a Gaussian process (GP) prior transformed with a scaled logistic sigmoid function. We present a tractable representation of the likelihood through augmentation with a superposition of Poisson processes. This view enables a structured variational approximation capturing dependencies across variables in the model. Our framework avoids discretization of the domain, does not require accurate numerical integration over the input space and is not limited to GPs with squared exponential kernels. We evaluate our approach on synthetic and real-world data showing that its benefits are particularly pronounced on multivariate input settings where it overcomes the limitations of mean-field methods and sampling schemes.
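
A minimal sketch of the generative model, with an illustrative latent function standing in for a GP draw (lam_max, f, and the unit-square domain below are toy choices): the intensity lambda(x) = lam_max * sigmoid(f(x)) lets the process be simulated by thinning, and the rejected points are exactly the superposed Poisson process exploited in the augmentation.

    # Simulating a sigmoidal Cox process on [0, 1]^2 by thinning (toy sketch).
    import numpy as np

    rng = np.random.default_rng(1)
    lam_max = 200.0                          # scale of the logistic sigmoid

    def f(x):                                # stand-in for a latent GP draw
        return 3.0 * np.sin(4 * np.pi * x[:, 0]) * np.cos(2 * np.pi * x[:, 1])

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Step 1: homogeneous Poisson process with rate lam_max on the unit square.
    n = rng.poisson(lam_max)
    candidates = rng.uniform(size=(n, 2))

    # Step 2: keep each candidate with probability sigmoid(f(x)); the rejected
    # points form the complementary Poisson process of the augmentation.
    keep = rng.uniform(size=n) < sigmoid(f(candidates))
    events, thinned = candidates[keep], candidates[~keep]
    print(len(events), "events,", len(thinned), "thinned points")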


Nonasymptotic estimates for Stochastic Gradient Langevin Dynamics under local conditions in nonconvex optimization

arXiv.org Machine Learning

Within the context of empirical risk minimization, see Raginsky, Rakhlin, and Telgarsky (2017), we are concerned with a non-asymptotic analysis of sampling algorithms used in optimization. In particular, we obtain non-asymptotic error bounds for a popular class of algorithms called Stochastic Gradient Langevin Dynamics (SGLD). These results are derived in appropriate Wasserstein distances in the absence of the log-concavity of the target distribution. More precisely, the local Lipschitzness of the stochastic gradient $H(\theta, x)$ is assumed, and furthermore, the dissipativity and convexity-at-infinity conditions are relaxed by removing the uniform dependence on $x$.
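
A minimal sketch of the SGLD iteration the bounds apply to, on an assumed toy Gaussian location model (the dataset, step size, and batch size are illustrative): theta_{k+1} = theta_k - eta * H(theta_k, x_k) + sqrt(2 * eta) * xi_k, with xi_k standard normal and H a minibatch estimate of the gradient.

    # Stochastic Gradient Langevin Dynamics on a toy posterior (illustrative).
    import numpy as np

    rng = np.random.default_rng(2)
    data = rng.normal(1.5, 1.0, size=1000)
    eta, n_steps, batch = 1e-4, 2000, 32

    def stoch_grad(theta, xb):
        # Minibatch estimate of grad U(theta) for a Gaussian location model,
        # rescaled so it is unbiased for the full-data gradient.
        return len(data) / len(xb) * np.sum(theta - xb)

    theta = 0.0
    for _ in range(n_steps):
        xb = rng.choice(data, size=batch, replace=False)
        theta += -eta * stoch_grad(theta, xb) + np.sqrt(2 * eta) * rng.normal()
    print(theta)  # iterates concentrate near the posterior mode (approx. 1.5)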


Probabilistic sequential matrix factorization

arXiv.org Machine Learning

We introduce the probabilistic sequential matrix factorization (PSMF) method for factorizing time-varying and non-stationary datasets consisting of high-dimensional time series. In particular, we consider nonlinear-Gaussian state-space models in which sequential approximate inference results in the factorization of a data matrix into a dictionary and time-varying coefficients with (possibly nonlinear) Markovian dependencies. The assumed Markovian structure on the coefficients enables us to encode temporal dependencies into a low-dimensional feature space. The proposed inference method is solely based on an approximate extended Kalman filtering scheme, which makes the resulting method particularly efficient. The PSMF can account for temporal nonlinearities and, more importantly, can be used to calibrate and estimate generic differentiable nonlinear subspace models. We show that the PSMF can be used in multiple contexts: modelling time series with a periodic subspace, robustifying changepoint detection methods, and imputing missing data in high-dimensional time series of air pollutants measured across London.
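
A minimal linear-Gaussian sketch of the sequential factorization (the dimensions, dynamics, and the crude dictionary step below are toy assumptions, not the PSMF implementation): a Kalman filter tracks the low-dimensional coefficients x_t under a random-walk prior while the dictionary C is adapted by a stochastic gradient step after each observation.

    # Sequential matrix factorization via Kalman filtering (toy sketch).
    import numpy as np

    rng = np.random.default_rng(3)
    d, r, T = 20, 3, 500                     # data dim, rank, time steps
    C_true = rng.normal(size=(d, r))
    x, Y = np.zeros(r), np.empty((d, T))
    for t in range(T):                       # slowly varying coefficients
        x = 0.99 * x + 0.1 * rng.normal(size=r)
        Y[:, t] = C_true @ x + 0.1 * rng.normal(size=d)

    C = rng.normal(size=(d, r))              # dictionary estimate
    m, P = np.zeros(r), np.eye(r)            # filter mean and covariance
    Q, R, lr = 0.01 * np.eye(r), 0.01 * np.eye(d), 1e-3
    for t in range(T):
        # Predict under the random-walk transition, then Kalman update.
        m_pred, P_pred = 0.99 * m, 0.99**2 * P + Q
        S = C @ P_pred @ C.T + R
        K = np.linalg.solve(S, C @ P_pred).T
        innov = Y[:, t] - C @ m_pred
        m, P = m_pred + K @ innov, P_pred - K @ C @ P_pred
        C += lr * np.outer(innov, m)         # crude dictionary gradient step
    print(np.linalg.norm(Y[:, -1] - C @ m))  # final reconstruction error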


On the Constrained Least-cost Tour Problem

arXiv.org Artificial Intelligence

We introduce the Constrained Least-cost Tour (CLT) problem: given an undirected graph with weight and cost functions on the edges, minimise the total cost of a tour rooted at a start vertex such that the total weight lies within a given range. CLT is related to the family of Travelling Salesman Problems with Profits, but differs by defining the weight function on edges instead of vertices, and by requiring the total weight to be within a range instead of being at least some quota. We prove CLT is $\mathcal{NP}$-hard, even in the simple case when the input graph is a path. We derive an informative lower bound by relaxing the integrality of edges and propose a heuristic motivated by this relaxation. For the case that requires the tour to be a simple cycle, we develop two heuristics which exploit Suurballe's algorithm to find low-cost, weight-feasible cycles. We demonstrate our algorithms by addressing a real-world problem that affects urban populations: finding routes that minimise air pollution exposure for walking, running and cycling in the city of London.
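
A minimal brute-force sketch of the problem statement on a tiny assumed graph (illustration only; the paper's contribution is the relaxation bound and the Suurballe-based heuristics, not enumeration): every undirected edge carries a (weight, cost) pair, and we search simple cycles through the start vertex whose total weight lies in [w_lo, w_hi], minimising total cost.

    # Exhaustive search for the Constrained Least-cost Tour on a toy graph.
    import math

    edges = {                                # (u, v): (weight, cost)
        (0, 1): (2.0, 1.0), (1, 2): (3.0, 2.0), (2, 0): (2.0, 4.0),
        (1, 3): (1.0, 1.0), (3, 0): (4.0, 1.0),
    }
    adj = {}
    for (u, v), wc in edges.items():
        adj.setdefault(u, []).append((v, wc))
        adj.setdefault(v, []).append((u, wc))

    def best_tour(start, w_lo, w_hi):
        best = (math.inf, None)
        def dfs(v, visited, w, c, path):
            nonlocal best
            for nxt, (ew, ec) in adj[v]:
                if nxt == start and len(path) > 2:
                    # Close the cycle only if the weight window is satisfied.
                    if w_lo <= w + ew <= w_hi and c + ec < best[0]:
                        best = (c + ec, path + [start])
                elif nxt not in visited:
                    dfs(nxt, visited | {nxt}, w + ew, c + ec, path + [nxt])
        dfs(start, {start}, 0.0, 0.0, [start])
        return best

    print(best_tour(0, 5.0, 8.0))            # -> (3.0, [0, 1, 3, 0])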


Structured Variational Inference in Continuous Cox Process Models

arXiv.org Machine Learning

We propose a scalable framework for inference in an inhomogeneous Poisson process modeled by a continuous sigmoidal Cox process that assumes the corresponding intensity function is given by a Gaussian process (GP) prior transformed with a scaled logistic sigmoid function. We present a tractable representation of the likelihood through augmentation with a superposition of Poisson processes. This view enables a structured variational approximation capturing dependencies across variables in the model. Our framework avoids discretization of the domain, does not require accurate numerical integration over the input space and is not limited to GPs with squared exponential kernels. We evaluate our approach on synthetic and real-world data showing that its benefits are particularly pronounced on multivariate input settings where it overcomes the limitations of mean-field methods and sampling schemes. We provide the state-of-the-art in terms of speed, accuracy and uncertainty quantification trade-offs.


Generalized Variational Inference

arXiv.org Artificial Intelligence

This paper introduces a generalized representation of Bayesian inference. It is derived axiomatically, recovering existing Bayesian methods as special cases. We use it to prove that variational inference (VI) based on the Kullback-Leibler divergence with a variational family $Q$ produces the uniquely optimal $Q$-constrained approximation to the exact Bayesian inference problem. Surprisingly, this implies that standard VI dominates any other $Q$-constrained approximation to the exact Bayesian inference problem. This means that alternative $Q$-constrained approximations, such as VI targeted at minimizing other divergences and Expectation Propagation, can produce better posteriors than VI only by implicitly targeting more appropriate Bayesian inference problems. Inspired by this, we introduce Generalized Variational Inference (GVI), a modular approach for instead solving such alternative inference problems explicitly. We explore some applications of GVI, including robustness and better marginals. Lastly, we derive black-box GVI and apply it to Bayesian Neural Networks as well as Deep Gaussian Processes, where GVI comprehensively outperforms competing methods.
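
A minimal Monte Carlo sketch of the GVI objective on an assumed toy model (the Gaussian family, prior, and data are illustrative, not the paper's black-box setup): a GVI posterior solves argmin over q in Q of E_q[sum_i loss(theta, x_i)] + D(q, prior), and standard VI is recovered by choosing the negative log-likelihood loss and the KL divergence.

    # The modular GVI objective with a mean-field Gaussian family (toy sketch).
    import numpy as np

    rng = np.random.default_rng(4)
    data = rng.normal(2.0, 1.0, size=200)

    def nll_loss(theta, x):                  # the standard-VI choice of loss
        return 0.5 * (x - theta) ** 2

    def kl_gauss(m, s, m0=0.0, s0=10.0):     # KL(q || prior), both Gaussian
        return np.log(s0 / s) + (s**2 + (m - m0) ** 2) / (2 * s0**2) - 0.5

    def gvi_objective(m, s, loss=nll_loss, div=kl_gauss, n_mc=256):
        theta = m + s * rng.normal(size=n_mc)    # reparameterised samples
        exp_loss = np.mean([np.sum(loss(t, data)) for t in theta])
        return exp_loss + div(m, s)

    # Swapping `loss` (e.g. a robust beta-loss) or `div` (e.g. a Renyi
    # divergence) changes which inference problem is being solved -- the
    # modularity that GVI makes explicit.
    print(gvi_objective(2.0, 0.1), gvi_objective(0.0, 1.0))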


Doubly Robust Bayesian Inference for Non-Stationary Streaming Data with $\beta$-Divergences

Neural Information Processing Systems

We present the very first robust Bayesian Online Changepoint Detection algorithm through General Bayesian Inference (GBI) with $\beta$-divergences. The resulting inference procedure is doubly robust for both the predictive and the changepoint (CP) posterior, with linear time and constant space complexity. We provide a construction for exponential models and demonstrate it on the Bayesian Linear Regression model. In so doing, we make two additional contributions: Firstly, we make GBI scalable using Structural Variational approximations that are exact as $\beta \to 0$. Secondly, we give a principled way of choosing the divergence parameter $\beta$ by minimizing expected predictive loss on-line. Reducing False Discovery Rates of CPs from up to 99% to 0% on real-world data, this offers the state of the art.
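
A minimal sketch of the $\beta$-divergence loss underlying GBI for a Gaussian model (a stand-in illustration with toy numbers, not the BOCPD code): the log-likelihood term of Bayes' rule is replaced by -(1/beta) * p(x|theta)^beta + (1/(1+beta)) * integral of p(y|theta)^(1+beta) dy, which bounds the influence of any single outlying observation.

    # The beta-divergence loss for a Gaussian likelihood (toy illustration).
    import numpy as np

    def beta_loss(x, mu, sigma, beta):
        dens = (np.exp(-0.5 * ((x - mu) / sigma) ** 2)
                / (sigma * np.sqrt(2 * np.pi)))
        # Closed form of the integral term for the Gaussian density.
        integral = (2 * np.pi * sigma**2) ** (-beta / 2) / np.sqrt(1 + beta)
        return -dens**beta / beta + integral / (1 + beta)

    # Outliers change the beta-loss by a bounded amount, whereas the negative
    # log-likelihood (second column, up to a constant) grows quadratically.
    for x in (0.0, 3.0, 30.0):
        print(x, beta_loss(x, mu=0.0, sigma=1.0, beta=0.25), 0.5 * x**2)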

