Goto

Collaborating Authors

 stationary kernel


Kriging via variably scaled kernels

arXiv.org Machine Learning

Classical Gaussian processes and Kriging models are commonly based on stationary kernels, whereby correlations between observations depend exclusively on the relative distance between scattered data. While this assumption ensures analytical tractability, it limits the ability of Gaussian processes to represent heterogeneous correlation structures. In this work, we investigate variably scaled kernels as an effective tool for constructing non-stationary Gaussian processes by explicitly modifying the correlation structure of the data. Through a scaling function, variably scaled kernels alter the correlations between data and enable the modeling of targets exhibiting abrupt changes or discontinuities. We analyse the resulting predictive uncertainty via the variably scaled kernel power function and clarify the relationship between variably scaled kernels-based constructions and classical non-stationary kernels. Numerical experiments demonstrate that variably scaled kernels-based Gaussian processes yield improved reconstruction accuracy and provide uncertainty estimates that reflect the underlying structure of the data


Regular Fourier Features for Nonstationary Gaussian Processes

arXiv.org Machine Learning

Simulating a Gaussian process requires sampling from a high-dimensional Gaussian distribution, which scales cubically with the number of sample locations. Spectral methods address this challenge by exploiting the Fourier representation, treating the spectral density as a probability distribution for Monte Carlo approximation. Although this probabilistic interpretation works for stationary processes, it is overly restrictive for the nonstationary case, where spectral densities are generally not probability measures. We propose regular Fourier features for harmonizable processes that avoid this limitation. Our method discretizes the spectral representation directly, preserving the correlation structure among spectral weights without requiring probability assumptions. Under a finite spectral support assumption, this yields an efficient low-rank approximation that is positive semi-definite by construction. When the spectral density is unknown, the framework extends naturally to kernel learning from data. We demonstrate the method on locally stationary kernels and on harmonizable mixture kernels with complex-valued spectral densities.



Thin and Deep Gaussian Processes Daniel Augusto de Souza

Neural Information Processing Systems

Gaussian processes (GPs) can provide a principled approach to uncertainty quantification with easy-to-interpret kernel hyperparameters, such as the lengthscale, which controls the correlation distance of function values. However, selecting an appropriate kernel can be challenging. Deep GPs avoid manual kernel engineering by successively parameterizing kernels with GP layers, allowing them to learn low-dimensional embeddings of the inputs that explain the output data. Following the architecture of deep neural networks, the most common deep GPs warp the input space layer-by-layer but lose all the interpretability of shallow GPs. An alternative construction is to successively parameterize the lengthscale of a kernel, improving the interpretability but ultimately giving away the notion of learning lower-dimensional embeddings.


Thin and Deep Gaussian Processes Daniel Augusto de Souza

Neural Information Processing Systems

Gaussian processes (GPs) can provide a principled approach to uncertainty quantification with easy-to-interpret kernel hyperparameters, such as the lengthscale, which controls the correlation distance of function values. However, selecting an appropriate kernel can be challenging. Deep GPs avoid manual kernel engineering by successively parameterizing kernels with GP layers, allowing them to learn low-dimensional embeddings of the inputs that explain the output data. Following the architecture of deep neural networks, the most common deep GPs warp the input space layer-by-layer but lose all the interpretability of shallow GPs. An alternative construction is to successively parameterize the lengthscale of a kernel, improving the interpretability but ultimately giving away the notion of learning lower-dimensional embeddings.



Author response: 'Stationary Activations for Uncertainty Calibration in Deep Learning ' NeurIPS: # 5154

Neural Information Processing Systems

We thank the anonymous reviewers for their enthusiasm and detailed comments on the manuscript. We start by addressing R2's concerns as they had the lowest score. This link is explicit as shown in this paper. The choice of kernel/activation function is up to the modelling task and'expert We merely provide a building block. The Matรฉrn is a widely used prior, and worth adding to the NN tool set.



Novel Pivoted Cholesky Decompositions for Efficient Gaussian Process Inference

arXiv.org Artificial Intelligence

The Cholesky decomposition is a fundamental tool for solving linear systems with symmetric and positive definite matrices which are ubiquitous in linear algebra, optimization, and machine learning. Its numerical stability can be improved by introducing a pivoting strategy that iteratively permutes the rows and columns of the matrix. The order of pivoting indices determines how accurately the intermediate decomposition can reconstruct the original matrix, thus is decisive for the algorithm's efficiency in the case of early termination. Standard implementations select the next pivot from the largest value on the diagonal. In the case of Bayesian nonparametric inference, this strategy corresponds to greedy entropy maximization, which is often used in active learning and design of experiments. We explore this connection in detail and deduce novel pivoting strategies for the Cholesky decomposition. The resulting algorithms are more efficient at reducing the uncertainty over a data set, can be updated to include information about observations, and additionally benefit from a tailored implementation. We benchmark the effectiveness of the new selection strategies on two tasks important to Gaussian processes: sparse regression and inference based on preconditioned iterative solvers. Our results show that the proposed selection strategies are either on par or, in most cases, outperform traditional baselines while requiring a negligible amount of additional computation.


Spectral Mixture Kernels for Bayesian Optimization

arXiv.org Artificial Intelligence

Bayesian Optimization (BO) is a widely used approach for solving expensive black-box optimization tasks. However, selecting an appropriate probabilistic surrogate model remains an important yet challenging problem. In this work, we introduce a novel Gaussian Process (GP)-based BO method that incorporates spectral mixture kernels, derived from spectral densities formed by scale-location mixtures of Cauchy and Gaussian distributions. This method achieves a significant improvement in both efficiency and optimization performance, matching the computational speed of simpler kernels while delivering results that outperform more complex models and automatic BO methods. We provide bounds on the information gain and cumulative regret associated with obtaining the optimum. Extensive numerical experiments demonstrate that our method consistently outperforms existing baselines across a diverse range of synthetic and real-world problems, including both low- and high-dimensional settings.