Support Collapse of Deep Gaussian Processes with Polynomial Kernels for a Wide Regime of Hyperparameters
Daryna Chernobrovkina, Steffen Grünewälder
Deep Gaussian processes (DGPs) were introduced by [1] as a natural extension of Gaussian processes (GPs) inspired by deep neural networks. Like deep neural networks, DGPs consist of multiple layers, each of which corresponds to an individual GP. It has recently been noted by [2] that, for certain compositional regression problems, traditional GPs attain a strictly slower rate of convergence than the minimax optimal rate. This is demonstrated in [2] by showing that, for a class of generalized additive models, any GP is suboptimal, independently of the kernel function used. Generalized additive models can be regarded as a simple form of compositional model with two layers. In contrast, [3] have shown that carefully tuned DGPs can attain the minimax optimal rate of convergence (up to logarithmic factors) for such problems. In fact, they show that DGPs attain optimal rates of convergence for many compositional problems. Along similar lines, [4] show that, for nonlinear inverse problems in which the unknown parameter has a compositional structure, DGPs can attain a rate of convergence that is polynomially faster than the rate attainable by GPs with Matérn kernel functions. One well-known downside of DGPs is the difficulty of sampling from the posterior distribution.
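To make the layered, compositional structure of a DGP concrete, the following is a minimal illustrative sketch (not taken from the paper) that draws one sample from a two-layer DGP prior with polynomial kernels in NumPy: the output of the first GP layer is fed into the second GP layer as its input. The kernel degree, jitter value, and function names are assumptions made purely for the illustration.

```python
import numpy as np

def polynomial_kernel(X, Y, degree=3, sigma0=1.0):
    """Polynomial kernel k(x, y) = (<x, y> + sigma0^2)^degree."""
    return (X @ Y.T + sigma0 ** 2) ** degree

def sample_gp(X, kernel, jitter=1e-6, rng=None):
    """Draw one sample path of a zero-mean GP with the given kernel at inputs X."""
    rng = np.random.default_rng() if rng is None else rng
    K = kernel(X, X) + jitter * np.eye(len(X))   # jitter for numerical stability
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal(len(X))

# Two-layer DGP prior sample: composition f2(f1(x)).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)

f1 = sample_gp(x, polynomial_kernel, rng=rng)                    # first layer
f2 = sample_gp(f1.reshape(-1, 1), polynomial_kernel, rng=rng)    # second layer
# f2 is one draw from the two-layer DGP prior evaluated at x.
```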
Mar-15-2025