gp classification
Decentralized Gaussian Process Classification and an Application in Subsea Robotics
Gao, Yifei, He, Hans J., Stilwell, Daniel J., McMahon, James
Teams of cooperating autonomous underwater vehicles (AUVs) rely on acoustic communication for coordination, yet this communication medium is constrained by limited range, multi-path effects, and low bandwidth. One way to address the uncertainty associated with acoustic communication is to learn the communication environment in real-time. We address the challenge of a team of robots building a map of the probability of communication success from one location to another in real-time. This is a decentralized classification problem -- communication events are either successful or unsuccessful -- where AUVs share a subset of their communication measurements to build the map. The main contribution of this work is a rigorously derived data sharing policy that selects measurements to be shared among AUVs. We experimentally validate our proposed sharing policy using real acoustic communication data collected from teams of Virginia Tech 690 AUVs, demonstrating its effectiveness in underwater environments.
Towards Efficient Occupancy Mapping via Gaussian Process Latent Field Shaping
Gentil, Cedric Le, Pradalier, Cedric, Barfoot, Timothy D.
Occupancy mapping has been a key enabler of mobile robotics. Originally based on a discrete grid representation, occupancy mapping has evolved towards continuous representations that can predict the occupancy status at any location and account for occupancy correlations between neighbouring areas. Gaussian Process (GP) approaches treat this task as a binary classification problem using both observations of occupied and free space. Conceptually, a GP latent field is passed through a logistic function to obtain the output class without actually manipulating the GP latent field. In this work, we propose to act directly on the latent function to efficiently integrate free space information as a prior based on the shape of the sensor's field-of-view. A major difference with existing methods is the change in the classification problem, as we distinguish between free and unknown space. The `occupied' area is the infinitesimally thin location where the class transitions from free to unknown. We demonstrate in simulated environments that our approach is sound and leads to competitive reconstruction accuracy.
Reviews: Dirichlet-based Gaussian Processes for Large-scale Calibrated Classification
The authors of this paper introduce a novel approach to GP classification, called GPD. The authors use a GP to produce the parameters of a Dirichlet distribution, and use a categorical likelihood for multi-class classification problems. After applying a log-normal approximation to the Dirichlet distribution, inference for GPD is the same as exact-GP inference (i.e. The authors show that GPD has competitive accuracy, is well calibrated, and offers a speedup over existing GP-classificaiton methods. Quality: The method introduced by this paper is a clever probabilistic formulation of Bayesian classification.
Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity
Miller, Jack, O'Neill, Charles, Bui, Thang
In some settings neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as Gaussian process (GP) classification, GP regression and linear regression. We also uncover a mechanism by which to induce grokking on algorithmic datasets via the addition of dimensions containing spurious information. The presence of the phenomenon in non-neural architectures provides evidence that grokking is not specific to SGD or weight norm regularisation. Instead, grokking may be possible in any setting where solution search is guided by complexity and error. Based on this insight and further trends we see in the training trajectories of a Bayesian neural network (BNN) and GP regression model, we make progress towards a more general theory of grokking. Specifically, we hypothesise that the phenomenon is governed by the accessibility of certain regions in the error and complexity landscapes.
Self-Distillation for Gaussian Process Regression and Classification
Borup, Kenneth, Andersen, Lars Nørvang
We propose two approaches to extend the notion of knowledge distillation to Gaussian Process Regression (GPR) and Gaussian Process Classification (GPC); data-centric and distribution-centric. The data-centric approach resembles most current distillation techniques for machine learning, and refits a model on deterministic predictions from the teacher, while the distribution-centric approach, re-uses the full probabilistic posterior for the next iteration. By analyzing the properties of these approaches, we show that the data-centric approach for GPR closely relates to known results for self-distillation of kernel ridge regression and that the distribution-centric approach for GPR corresponds to ordinary GPR with a very particular choice of hyperparameters. Furthermore, we demonstrate that the distribution-centric approach for GPC approximately corresponds to data duplication and a particular scaling of the covariance and that the data-centric approach for GPC requires redefining the model from a Binomial likelihood to a continuous Bernoulli likelihood to be well-specified. To the best of our knowledge, our proposed approaches are the first to formulate knowledge distillation specifically for Gaussian Process models.
Scalable Variational Gaussian Process Classification
Hensman, James, Matthews, Alex, Ghahramani, Zoubin
Gaussian process classification is a popular method with a number of appealing properties. We show how to scale the model within a variational inducing point framework, outperforming the state of the art on benchmark datasets. Importantly, the variational formulation can be exploited to allow classification in problems with millions of data points, as we demonstrate in experiments.
Pseudo-Marginal Bayesian Inference for Gaussian Processes
Filippone, Maurizio, Girolami, Mark
The main challenges that arise when adopting Gaussian Process priors in probabilistic modeling are how to carry out exact Bayesian inference and how to account for uncertainty on model parameters when making model-based predictions on out-of-sample data. Using probit regression as an illustrative working example, this paper presents a general and effective methodology based on the pseudo-marginal approach to Markov chain Monte Carlo that efficiently addresses both of these issues. The results presented in this paper show improvements over existing sampling methods to simulate from the posterior distribution over the parameters defining the covariance function of the Gaussian Process prior. This is particularly important as it offers a powerful tool to carry out full Bayesian inference of Gaussian Process based hierarchic statistical models in general. The results also demonstrate that Monte Carlo based integration of all model parameters is actually feasible in this class of models providing a superior quantification of uncertainty in predictions. Extensive comparisons with respect to state-of-the-art probabilistic classifiers confirm this assertion.
Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood
Riihimäki, Jaakko, Jylänki, Pasi, Vehtari, Aki
We consider probabilistic multinomial probit classification using Gaussian process (GP) priors. The challenges with the multiclass GP classification are the integration over the non-Gaussian posterior distribution, and the increase of the number of unknown latent variables as the number of target classes grows. Expectation propagation (EP) has proven to be a very accurate method for approximate inference but the existing EP approaches for the multinomial probit GP classification rely on numerical quadratures or independence assumptions between the latent values from different classes to facilitate the computations. In this paper, we propose a novel nested EP approach which does not require numerical quadratures, and approximates accurately all between-class posterior dependencies of the latent values, but still scales linearly in the number of classes. The predictive accuracy of the nested EP approach is compared to Laplace, variational Bayes, and Markov chain Monte Carlo (MCMC) approximations with various benchmark data sets. In the experiments nested EP was the most consistent method with respect to MCMC sampling, but the differences between the compared methods were small if only the classification accuracy is concerned.