Despite the success of Gaussian processes (GPs) in modelling spatial stochastic processes, dealing with large datasets is still challenging. The problem arises by the need to invert a potentially large covariance matrix during inference. In this paper we address the complexity problem by constructing a new stationary covariance function (Mercer kernel) that naturally provides a sparse covariance matrix. The sparseness of the matrix is defined by hyper-parameters optimised during learning. The new covariance function enables exact GP inference and performs comparatively to the squared-exponential one, at a lower computational cost. This allows the application of GPs to large-scale problems such as ore grade prediction in mining or 3D surface modelling. Experiments show that using the proposed covariance function, very sparse covariance matrices are normally obtained which can be effectively used for faster inference and less memory usage.
Li, Tianyang (University of Texas at Austin) | Liu, Liu (University of Texas at Austin) | Kyrillidis, Anastasios ( IBM T.J. Watson Research Center, Yorktown Heights ) | Caramanis, Constantine (University of Texas at Austin)
We present a novel method for frequentist statistical inference in M-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. To show the merits of our scheme, we apply it to both synthetic and real data sets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.
We introduce a class of nonstationary covariance functions for Gaussian process (GP) regression. Nonstationary covariance functions allow the model to adapt to functions whose smoothness varies with the inputs. The class includes a nonstationary version of the Matérn stationary covariance, in which the differentiability of the regression function is controlled by a parameter, freeing one from fixing the differentiability in advance. In experiments, the nonstationary GP regression model performs well when the input space is two or three dimensions, outperforming a neural network model and Bayesian free-knot spline models, and competitive with a Bayesian neural network, but is outperformed in one dimension by a state-of-the-art Bayesian free-knot spline model.
In this contribution we describe an approach to evolve composite covariance functions for Gaussian processes using genetic programming. A critical aspect of Gaussian processes and similar kernel-based models such as SVM is, that the covariance function should be adapted to the modeled data. Frequently, the squared exponential covariance function is used as a default. However, this can lead to a misspecified model, which does not fit the data well. In the proposed approach we use a grammar for the composition of covariance functions and genetic programming to search over the space of sentences that can be derived from the grammar. We tested the proposed approach on synthetic data from two-dimensional test functions, and on the Mauna Loa CO2 time series. The results show, that our approach is feasible, finding covariance functions that perform much better than a default covariance function. For the CO2 data set a composite covariance function is found, that matches the performance of a hand-tuned covariance function.