Adaptive sparse variational approximations for Gaussian process regression
Department of Decision Sciences, Bocconi Institute for Data Science and Analytics, Bocconi University, Milan

Abstract

Accurate tuning of hyperparameters is crucial to ensure that models generalise effectively across different settings. We construct a variational approximation to a hierarchical Bayes procedure and derive upper bounds for the contraction rate of the variational posterior in an abstract setting. The theory is applied to various Gaussian process priors and variational classes, resulting in minimax optimal rates. Our theoretical results are accompanied by numerical analysis on both synthetic and real-world data sets.

Keywords: variational inference, Bayesian model selection, Gaussian processes, nonparametric regression, adaptation, posterior contraction rates

1 Introduction

A core challenge in Bayesian statistics is scalability, i.e. the computation of the posterior for large sample sizes. Variational Bayes approximation is a standard approach to speed up inference. Variational posteriors are probability measures that minimise, over a suitable class of distributions, the Kullback-Leibler divergence to the otherwise hard-to-compute posterior. Typically, the variational class over which the optimisation takes place does not contain the original posterior, so the variational procedure can be viewed as a projection onto this class. The projected variational distribution then approximates the posterior. Since the approximation inevitably loses information, it is important to characterise the accuracy of the approach. Despite the wide use of variational approximations, their theoretical underpinnings have started to emerge only recently; see for instance Alquier and Ridgway (2020); Yang et al. (2020); Zhang and Gao (2020a); Ray and Szabó (2022). In a Bayesian procedure, the choice of prior reflects the presumed properties of the unknown parameter.
In contrast to regular parametric models, where in view of the Bernstein–von Mises theorem the posterior is asymptotically normal regardless of the prior, in nonparametric models the prior plays a crucial role in the asymptotic behaviour of the posterior. In fact, the large-sample behaviour of the posterior typically depends intricately on the choice of prior hyperparameters, so it is vital that these are tuned correctly. The two classical approaches are hierarchical and empirical Bayes methods.
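The Kullback-Leibler projection described above can be written compactly as follows; the notation (variational class $\mathcal{Q}$, data $X^{(n)}$, posterior $\Pi(\cdot \mid X^{(n)})$) is chosen here for illustration and may differ from the notation used later in the paper:

```latex
\hat{\Pi}_n \;=\; \operatorname*{arg\,min}_{Q \in \mathcal{Q}} \,
\mathrm{KL}\big( Q \,\big\|\, \Pi(\cdot \mid X^{(n)}) \big),
```

so that $\hat{\Pi}_n$ is the element of the tractable class $\mathcal{Q}$ closest to the posterior in Kullback-Leibler divergence; when $\Pi(\cdot \mid X^{(n)}) \notin \mathcal{Q}$, some information about the posterior is necessarily lost.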
April 4, 2025