Bayesian Inference


Representing Additive Gaussian Processes by Sparse Matrices

arXiv.org Artificial Intelligence

Among generalized additive models, additive Matérn Gaussian Processes (GPs) are one of the most popular for scalable high-dimensional problems. Thanks to their additive structure and stochastic differential equation representation, back-fitting-based algorithms can reduce the time complexity of computing the posterior mean from $O(n^3)$ to $O(n\log n)$, where $n$ is the data size. However, generalizing these algorithms to efficiently compute the posterior variance and maximum log-likelihood remains an open problem. In this study, we demonstrate that for additive Matérn GPs, not only the posterior mean, but also the posterior variance, the log-likelihood, and the gradients of these three functions can be represented by formulas involving only sparse matrices and sparse vectors. We show how to use these sparse formulas to generalize back-fitting-based algorithms to efficiently compute the posterior mean, posterior variance, log-likelihood, and the gradients of these three functions for additive GPs, all in $O(n \log n)$ time. We apply our algorithms to Bayesian optimization and propose efficient algorithms for posterior updates, hyperparameter learning, and computation of the acquisition function and its gradient. Given the posterior, our algorithms significantly reduce the time complexity of computing the acquisition function and its gradient from $O(n^2)$ to $O(\log n)$ for a general learning rate, and even to $O(1)$ for a small learning rate.
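A minimal sketch of the kind of sparsity the abstract relies on, assuming the simplest case: one one-dimensional Matérn-1/2 (exponential) component evaluated at sorted inputs. Because that process is Markov, its covariance matrix is dense but its inverse (the precision matrix) is tridiagonal. The helper `matern12_cov` and the input points are illustrative choices; the paper's sparse formulas for the posterior variance and log-likelihood are not reproduced here.

```python
import numpy as np

def matern12_cov(x, lengthscale=1.0, variance=1.0):
    """Matérn-1/2 (exponential) covariance matrix for 1-D inputs."""
    d = np.abs(x[:, None] - x[None, :])
    return variance * np.exp(-d / lengthscale)

# Sorted, irregularly spaced inputs (illustrative values)
x = np.array([0.0, 0.3, 1.1, 2.0, 2.2, 4.5, 6.0, 9.5])
K = matern12_cov(x, lengthscale=2.0)   # dense covariance, no zero entries
Q = np.linalg.inv(K)                   # precision matrix

# Entries two or more places off the diagonal vanish up to round-off,
# so Q is tridiagonal even though K is dense.
print("max |Q_ij| for |i-j| >= 2:", np.abs(np.triu(Q, k=2)).max())
print("first off-diagonal:", np.round(np.diag(Q, k=1), 3))
```

Only the Matérn-1/2 case is shown; for higher half-integer orders a comparable sparse structure appears in the augmented state of the stochastic differential equation rather than in the function values alone.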


Further analysis of multilevel Stein variational gradient descent with an application to the Bayesian inference of glacier ice models

arXiv.org Artificial Intelligence

Bayesian inference is a ubiquitous and flexible tool for updating a belief (i.e., learning) about a quantity of interest when data are observed, which ultimately can be used to inform downstream decision-making. In particular, Bayesian inverse problems allow one to derive knowledge from data through the lens of physics-based models. These problems can be formulated as follows: given observational data, a physics-based model, and prior information about the model inputs, find a posterior probability distribution for the inputs that reflects the knowledge about the inputs in terms of the observed data and prior. Typically, the physics-based models are given in the form of an input-to-observation map that is based on a system of partial differential equations (PDEs). The computational task underlying Bayesian inference is approximating posterior probability distributions to compute expectations and to quantify uncertainties. There are multiple ways of computationally exploring posterior distributions to gain insights, ranging from Markov chain Monte Carlo to variational methods [24, 42, 28]. In this work, we make use of Stein variational gradient descent (SVGD) [32], a method for particle-based variational inference, to approximate posterior distributions. It builds on Stein's identity to formulate an update step for the particles that can be realized numerically in an efficient manner via …
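The SVGD particle update can be sketched in a few lines of NumPy. This is a minimal one-dimensional version, assuming a fixed-bandwidth RBF kernel and a Gaussian target N(2, 1) chosen purely for illustration; the multilevel variant and the glacier ice model are not reproduced. Each particle moves along a kernel-weighted average of the score (the gradient of log p, which pulls particles toward high density) plus the kernel gradient (which pushes particles apart).

```python
import numpy as np

def rbf_kernel(x, h):
    """RBF kernel k(x_j, x_i) = exp(-(x_j - x_i)^2 / h) and its gradient w.r.t. x_j."""
    diff = x[:, None] - x[None, :]          # diff[j, i] = x_j - x_i
    k = np.exp(-diff**2 / h)
    grad_k = -2.0 * diff / h * k            # d k(x_j, x_i) / d x_j
    return k, grad_k

def svgd_step(x, grad_log_p, step=0.1, h=1.0):
    """One SVGD update for 1-D particles x."""
    k, grad_k = rbf_kernel(x, h)
    # phi(x_i) = mean_j [ k(x_j, x_i) * grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    phi = np.mean(k * grad_log_p(x)[:, None] + grad_k, axis=0)
    return x + step * phi

# Illustrative target N(2, 1), whose score is -(x - 2)
grad_log_p = lambda x: -(x - 2.0)

rng = np.random.default_rng(0)
particles = rng.normal(-3.0, 0.5, size=50)   # start far from the target
for _ in range(2000):
    particles = svgd_step(particles, grad_log_p)

print(particles.mean(), particles.std())
```

A fixed bandwidth keeps the sketch short; practical implementations often adapt `h` to the particle spread (for example with a median heuristic).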


Decentralized Inference via Capability Type Structures in Cooperative Multi-Agent Systems

arXiv.org Artificial Intelligence

This work studies the problem of ad hoc teamwork in teams composed of agents with differing computational capabilities. We consider cooperative multi-player games in which each agent's policy is constrained by a private capability parameter, and agents with higher capabilities are able to simulate the behavior of agents with lower capabilities (but not vice versa). To address this challenge, we propose an algorithm that maintains a belief over the other agents' capabilities and incorporates this belief into the planning process. Our primary innovation is a novel framework based on capability type structures, which ensures that the belief updates remain consistent and informative without constructing the infinite hierarchy of beliefs. We also extend our techniques to settings where the agents' observations are subject to noise. We provide examples of games in which deviations in capability between oblivious agents can lead to arbitrarily poor outcomes, and experimentally validate that our capability-aware algorithm avoids the anti-cooperative behavior of the naive approach in these toy settings as well as in a more complex cooperative checkers environment.
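At its simplest, maintaining a belief over another agent's capability is a Bayesian update. The sketch below assumes a hypothetical finite set of capability levels, each with a known policy giving action probabilities in the current state. It shows only this first-order update, not the paper's capability type structures that keep nested beliefs consistent.

```python
import numpy as np

def update_capability_belief(belief, action, policy_by_capability):
    """
    Bayes' rule over a teammate's (hypothetical) capability levels.
    belief: prior probabilities, one per capability level.
    policy_by_capability[c][a]: probability that an agent of level c plays action a.
    """
    likelihood = np.array([policy[action] for policy in policy_by_capability])
    posterior = np.asarray(belief, dtype=float) * likelihood
    total = posterior.sum()
    if total == 0.0:
        raise ValueError("observed action is impossible under every capability level")
    return posterior / total

# Two illustrative levels: a low-capability agent plays action 1 half the time,
# a high-capability agent plays it 90% of the time.
policies = [np.array([0.5, 0.5]), np.array([0.1, 0.9])]
belief = np.array([0.5, 0.5])
belief = update_capability_belief(belief, action=1, policy_by_capability=policies)
print(belief)
```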


Dependent Latent Class Models

arXiv.org Machine Learning

Latent Class Models (LCMs) are used to cluster multivariate categorical data (e.g., grouping participants based on survey responses). Traditional LCMs assume a property called conditional independence. This assumption can be restrictive, leading to model misspecification and overparameterization. To combat this problem, we developed a novel Bayesian model called a Dependent Latent Class Model (DLCM), which permits conditional dependence. We verify the identifiability of DLCMs. We also demonstrate the effectiveness of DLCMs in both simulations and real-world applications. Compared to traditional LCMs, DLCMs are effective in applications with time series, overlapping items, and structural zeroes.
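To make the conditional-independence assumption concrete, the sketch below computes a respondent's class-membership posterior under a traditional LCM, where the likelihood within a class is a product over items. The array layout and example numbers are illustrative assumptions; a DLCM would replace this product with a model that allows dependence between items within a class.

```python
import numpy as np

def lcm_class_posterior(x, class_weights, item_probs):
    """
    Class-membership posterior for one respondent under a traditional LCM.
    x: (J,) integer responses, one per item.
    class_weights: (C,) prior class probabilities.
    item_probs: (C, J, K) with item_probs[c, j, k] = P(item j = k | class c).
    Returns (posterior over classes, marginal likelihood of x).
    """
    x = np.asarray(x)
    J = x.shape[0]
    # Conditional independence: P(x | c) = prod_j P(x_j | c)
    lik = np.prod(item_probs[:, np.arange(J), x], axis=1)
    joint = class_weights * lik
    marginal = joint.sum()
    return joint / marginal, marginal

# Two illustrative classes and three binary items
item_probs = np.array([
    [[0.2, 0.8]] * 3,   # class 0 tends to answer 1
    [[0.8, 0.2]] * 3,   # class 1 tends to answer 0
])
posterior, marginal = lcm_class_posterior([1, 1, 0], np.array([0.6, 0.4]), item_probs)
print(posterior, marginal)
```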


Computationally-efficient initialisation of GPs: The generalised variogram method

arXiv.org Artificial Intelligence

We present a computationally efficient strategy to initialise the hyperparameters of a Gaussian process (GP) that avoids computing the likelihood function. Our strategy can be used as a pretraining stage to find initial conditions for maximum-likelihood (ML) training, or as a standalone method to compute hyperparameter values to be plugged directly into the GP model. Motivated by the fact that training a GP via ML is equivalent (on average) to minimising the KL-divergence between the true and learnt model, we set out to explore different metrics/divergences among GPs that are computationally inexpensive and provide hyperparameter values that are close to those found via ML. In practice, we identify the GP hyperparameters by projecting the empirical covariance or (Fourier) power spectrum onto a parametric family, thus proposing and studying various measures of discrepancy operating on the temporal and frequency domains. Our contribution extends the variogram method developed by the geostatistics literature and, accordingly, it is referred to as the generalised variogram method (GVM). In addition to the theoretical presentation of GVM, we provide experimental validation in terms of accuracy, consistency with ML and computational complexity for different kernels using synthetic and real-world data.
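As a minimal sketch of projecting an empirical covariance onto a parametric family, the code below computes the biased sample autocovariance of an evenly sampled series and fits a squared-exponential kernel by least squares, using a closed-form variance for each candidate lengthscale on a grid. The function names, the squared-error discrepancy and the grid search are illustrative choices; GVM as described studies several discrepancy measures in both the temporal and frequency domains.

```python
import numpy as np

def empirical_autocov(y, max_lag):
    """Biased sample autocovariance of an evenly sampled series at lags 0..max_lag."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    n = len(y)
    return np.array([np.dot(y[: n - h], y[h:]) / n for h in range(max_lag + 1)])

def fit_se_kernel(lags, cov, lengthscales):
    """
    Least-squares projection of a covariance sequence onto the SE family
    k(tau) = s2 * exp(-tau^2 / (2 l^2)). For each candidate l the optimal s2
    has a closed form; return the (s2, l) pair with the smallest squared error.
    """
    best = None
    for ell in lengthscales:
        g = np.exp(-lags**2 / (2.0 * ell**2))
        s2 = max(np.dot(cov, g) / np.dot(g, g), 0.0)   # variance cannot be negative
        err = np.sum((cov - s2 * g) ** 2)
        if best is None or err < best[0]:
            best = (err, s2, ell)
    return best[1], best[2]

# Sanity check on an exact SE covariance with s2 = 2 and l = 0.5
lags = 0.1 * np.arange(20)
true_cov = 2.0 * np.exp(-lags**2 / (2 * 0.5**2))
s2, ell = fit_se_kernel(lags, true_cov, np.linspace(0.1, 2.0, 191))
print(s2, ell)
```

No likelihood is evaluated, so the cost is dominated by the autocovariance estimate. For a stationary process the semivariogram and covariance are related by γ(τ) = k(0) − k(τ), so the same fit can be phrased in variogram terms.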


Learning battery model parameter dynamics from data with recursive Gaussian process regression

arXiv.org Artificial Intelligence

Demand for battery systems is increasing rapidly as efforts to decarbonise electricity grids and electrify mobility gather pace [1]. Due to their long lifetime and high energy density, Li-ion cells have become the workhorse in battery systems [2]. Although the cost of these has dramatically decreased in the last decade [3], the economics of storage needs to further improve to increase take-up, notably in applications where battery systems are not yet competitive in terms of levelized cost [4]. Also, given the risks of Li-ion cell demand outpacing the supply of the required raw materials [5], it is crucial that the performance of existing systems, especially in terms of lifetime, is maximised. A key element in improving the overall cost-effectiveness of Li-ion batteries is accurate estimation and prediction of battery state-of-health (SOH), which can improve lifetime, warranty and insurance costs, system safety …

… Prognosis (i.e., future prediction) in this framework is achieved using a separate model for the evolution of parameters over battery lifetime, and this can range from a random walk [8]-[10] to semi-empirical curve fits of trajectories that may be re-parameterised over lifetime using adaptive methods such as particle filtering [13], [14], a Bayesian approach that also provides parameter uncertainty estimates. Model-driven approaches tend to use rather simple equivalent-circuit models because they have relatively few parameters that need to be fitted, whereas parameterising physics-based models, such as those within the Doyle-Fuller-Newman framework [15], [16], is plagued by poor identifiability [17]. This is mainly due to a lack of reference electrodes in commercial cells which means that decoupling the positive and negative half-cell potentials is very difficult.
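A random-walk model for parameter evolution, one of the options listed above, is itself a Gaussian process (a discretely observed Wiener process), and its posterior can be computed recursively, one measurement at a time, with a scalar Kalman filter. The sketch below assumes a single slowly drifting parameter observed directly with Gaussian noise; the noise levels and the ageing interpretation are illustrative, and this is not the paper's recursive GP formulation.

```python
import numpy as np

def track_random_walk_parameter(measurements, q, r, m0=0.0, p0=1.0):
    """
    Kalman filter for a scalar parameter theta that drifts as a random walk:
        theta_k = theta_{k-1} + w_k,  w_k ~ N(0, q)   (e.g. slow ageing)
        z_k     = theta_k + v_k,      v_k ~ N(0, r)   (noisy per-cycle estimate)
    Returns the posterior mean and variance after each measurement.
    """
    m, p = m0, p0
    means, variances = [], []
    for z in measurements:
        p = p + q                # predict: drift inflates the uncertainty
        gain = p / (p + r)       # Kalman gain
        m = m + gain * (z - m)   # correct the mean with the innovation
        p = (1.0 - gain) * p     # shrink the variance
        means.append(m)
        variances.append(p)
    return np.array(means), np.array(variances)

# Illustrative data: a constant true value of 5.0 seen through noise of std 0.1
rng = np.random.default_rng(1)
z = 5.0 + 0.1 * rng.standard_normal(300)
means, variances = track_random_walk_parameter(z, q=1e-4, r=0.01)
print(means[-1], np.sqrt(variances[-1]))
```

The process-noise variance `q` sets how quickly the estimate is allowed to follow genuine parameter drift, at the price of a noisier estimate.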


Vehicle State Estimation and Prediction

arXiv.org Artificial Intelligence

Autonomous driving feedback control loops [2], [3], [4], [5], [6], [7], [8], [9], [10], [11] and decision-making systems [12], [13], [14], [15] depend on effective information collection and on learning about vehicle motions, both of the ego vehicle and of other nearby vehicles. With this information, autonomous vehicles can estimate the behaviors and future positions of other vehicles and decide how to behave in the current traffic scenario. Therefore, knowledge of the current motions and states of vehicles is particularly essential for autonomous driving. For autonomous vehicles driving on the road, the deployed sensor suite commonly includes GPS, IMU, LiDARs, cameras and radars. With the information collected from GPS and IMU, the ego vehicle can measure its states, including the global position, the heading angle that gives its orientation, the linear and angular velocities, and the acceleration.
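Once GPS and IMU provide the ego vehicle's position, heading, speed and yaw rate, a common way to predict its near-future pose is the constant turn rate and velocity (CTRV) motion model. The sketch below implements that textbook model as an illustration; the paper may rely on different motion models or learned predictors. Units are assumed to be metres, seconds and radians.

```python
import math

def predict_ctrv(x, y, heading, speed, yaw_rate, dt):
    """
    Constant turn rate and velocity (CTRV) prediction of a planar vehicle pose.
    x, y in metres; heading in radians; speed in m/s; yaw_rate in rad/s; dt in s.
    Returns the predicted (x, y, heading).
    """
    if abs(yaw_rate) < 1e-9:
        # Negligible turning: straight-line motion along the current heading
        return (x + speed * math.cos(heading) * dt,
                y + speed * math.sin(heading) * dt,
                heading)
    new_heading = heading + yaw_rate * dt
    radius = speed / yaw_rate    # signed turning radius
    return (x + radius * (math.sin(new_heading) - math.sin(heading)),
            y + radius * (math.cos(heading) - math.cos(new_heading)),
            new_heading)

# Illustrative state: 15 m/s heading east, turning left at 0.1 rad/s, 2 s ahead
print(predict_ctrv(0.0, 0.0, 0.0, 15.0, 0.1, 2.0))
```

The same prediction applies to nearby vehicles once their states are estimated, for example from radar or LiDAR tracks.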


Controlling Posterior Collapse by an Inverse Lipschitz Constraint on the Decoder Network

arXiv.org Artificial Intelligence

However, in practice, VAEs suffer from a problem called posterior collapse, which occurs when the encoder coincides, or collapses, with the prior, taking no information from the latent structure of the input data into consideration. In this work, we introduce an inverse Lipschitz neural network into the decoder and, based on this architecture, provide a new method, equipped with a concrete theoretical guarantee, that controls the degree of posterior collapse in a simple and clear manner for a wide range of VAE models. We also illustrate the effectiveness of our method through several numerical experiments.

While VAEs are nowadays omnipresent in the field of machine learning, it is also widely recognized that there remain in practice some major challenges that still require effective solutions. Notably, they suffer from the problem of posterior collapse, which occurs when the distribution corresponding to the encoder coincides, or collapses, with the prior, taking no information from the latent structure of the input data into consideration. Also known as KL vanishing or over-pruning, this phenomenon makes VAEs incapable of producing pertinent representations and has reportedly been observed in many fields (e.g., Bowman et al. (2016); Fu et al. (2019); Wang & Ziyin (2022); Yeung et al. (2017)). There now exists a large body of literature that examines its underlying …
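Posterior collapse ("KL vanishing") is usually diagnosed through the KL term of the VAE objective. For a diagonal Gaussian encoder and a standard normal prior this KL divergence has a closed form, and values near zero for every input indicate that the encoder output has collapsed onto the prior. The sketch below assumes that common Gaussian setup; it is a diagnostic only and does not implement the paper's inverse Lipschitz decoder.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    """
    KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis.
    mu, log_var: encoder outputs of shape (..., latent_dim).
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var, axis=-1)

# A collapsed encoder returns the prior (mu = 0, var = 1) for every input,
# so its KL term is zero regardless of the data.
collapsed = gaussian_kl_to_standard_normal(np.zeros((4, 8)), np.zeros((4, 8)))
informative = gaussian_kl_to_standard_normal(np.full((4, 8), 1.5), np.full((4, 8), -2.0))
print(collapsed, informative)
```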


Quantum Gaussian Process Regression for Bayesian Optimization

arXiv.org Artificial Intelligence

Gaussian process regression is a well-established Bayesian machine learning method. We propose a new approach to Gaussian process regression using quantum kernels based on parameterized quantum circuits. By employing a hardware-efficient feature map and careful regularization of the Gram matrix, we demonstrate that the variance information of the resulting quantum Gaussian process can be preserved. We also show that quantum Gaussian processes can be used as a surrogate model for Bayesian optimization, a task that critically relies on the variance of the surrogate model. To demonstrate the performance of this quantum Bayesian optimization algorithm, we apply it to the hyperparameter optimization of a machine learning model which performs regression on a real-world dataset. We benchmark the quantum Bayesian optimization against its classical counterpart and show that the quantum version can match its performance.
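GP regression only needs kernel evaluations, so any kernel, including one estimated on quantum hardware, can be plugged into the same posterior formulas. The sketch below computes the posterior mean and variance from precomputed kernel matrices, with a diagonal (Tikhonov) term `reg` added to the Gram matrix. A classical RBF kernel stands in for the quantum kernel, and this generic regularisation is an assumption that may differ from the paper's scheme.

```python
import numpy as np

def gp_posterior(K_train, K_cross, k_test_diag, y, reg=1e-6):
    """
    GP posterior mean and variance from precomputed kernel matrices.
    K_train: (n, n) Gram matrix of the training inputs.
    K_cross: (m, n) kernel between test and training inputs.
    k_test_diag: (m,) prior variances k(x*, x*) at the test inputs.
    reg: value added to the Gram diagonal to keep it well conditioned.
    """
    n = K_train.shape[0]
    L = np.linalg.cholesky(K_train + reg * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # (K + reg I)^{-1} y
    mean = K_cross @ alpha
    v = np.linalg.solve(L, K_cross.T)                      # L^{-1} K_cross^T
    var = k_test_diag - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)                      # clip round-off negatives

def rbf(a, b, ell=1.0):
    """Classical RBF kernel used here as a stand-in for a quantum kernel."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * ell**2))

x_train = np.array([0.0, 1.0, 2.0])
y_train = np.sin(x_train)
x_test = np.array([1.0, 100.0])
mean, var = gp_posterior(rbf(x_train, x_train), rbf(x_test, x_train), np.ones(2), y_train)
print(mean, var)
```

The variance is near zero at a training input and returns to the prior value far from the data; this is the information a Bayesian optimization acquisition function depends on, and an ill-conditioned Gram matrix without `reg` can destroy it.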


Maximum Likelihood Estimation in Gaussian Process Regression is Ill-Posed

arXiv.org Artificial Intelligence

Gaussian process regression underpins countless academic and industrial applications of machine learning and statistics, with maximum likelihood estimation routinely used to select appropriate parameters for the covariance kernel. However, it remains an open problem to establish the circumstances in which maximum likelihood estimation is well-posed, that is, when the predictions of the regression model are insensitive to small perturbations of the data. This article identifies scenarios where the maximum likelihood estimator fails to be well-posed, in that the predictive distributions are not Lipschitz in the data with respect to the Hellinger distance. These failure cases occur in the noiseless data setting, for any Gaussian process with a stationary covariance function whose lengthscale parameter is estimated using maximum likelihood. Although the failure of maximum likelihood estimation is part of Gaussian process folklore, these rigorous theoretical results appear to be the first of their kind. The implication of these negative results is that well-posedness may need to be assessed post hoc, on a case-by-case basis, when maximum likelihood estimation is used to train a Gaussian process model.
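The estimator under study maximises the marginal likelihood over the lengthscale. The sketch below evaluates the negative log marginal likelihood of noiseless data under a zero-mean GP with a unit-variance Matérn-1/2 (exponential) kernel, a stationary kernel chosen here for numerical stability, and picks the best lengthscale on a grid. The small `jitter` and the grid search are illustrative assumptions, and the sketch does not by itself demonstrate the ill-posedness result.

```python
import numpy as np

def neg_log_marginal_likelihood(x, y, lengthscale, jitter=1e-10):
    """
    -log p(y | x, lengthscale) for a zero-mean GP with a unit-variance
    Matérn-1/2 kernel k(x, x') = exp(-|x - x'| / lengthscale).
    Noiseless data: only a tiny jitter is added for numerical stability.
    """
    K = np.exp(-np.abs(x[:, None] - x[None, :]) / lengthscale)
    L = np.linalg.cholesky(K + jitter * np.eye(len(x)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # K^{-1} y
    return (0.5 * y @ alpha                                  # data-fit term
            + np.sum(np.log(np.diag(L)))                     # 0.5 * log det K
            + 0.5 * len(x) * np.log(2.0 * np.pi))

def mle_lengthscale(x, y, grid):
    """Grid-search maximum likelihood estimate of the lengthscale."""
    nlls = [neg_log_marginal_likelihood(x, y, ell) for ell in grid]
    return grid[int(np.argmin(nlls))]

x = np.linspace(0.0, 1.0, 5)
y = np.sin(3.0 * x)
print(mle_lengthscale(x, y, np.geomspace(0.05, 5.0, 50)))
```

Re-running `mle_lengthscale` on slightly perturbed `y` and comparing the resulting predictions is one crude way to carry out the post hoc, case-by-case check of well-posedness suggested above.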