Gaussian Process Regression with Mismatched Models

Sollich, Peter

Neural Information Processing Systems 

I derive approximations to the learning curves for the more generic case of mismatched models, and find very rich behaviour: for large input space dimensionality, where the results become exact, there are universal (student-independent) plateaux in the learning curve, with transitions in between that can exhibit arbitrarily many over-fitting maxima; over-fitting can occur even if the student estimates the teacher noise level correctly. In lower dimensions, plateaux also appear, and the learning curve remains dependent on the mismatch between student and teacher even in the asymptotic limit of a large number of training examples. Learning with excessively strong smoothness assumptions can be particularly dangerous: for example, a student with a standard radial basis function covariance function will learn a rougher teacher function only logarithmically slowly. All predictions are confirmed by simulations.

1 Introduction

There has in the last few years been a good deal of excitement about the use of Gaussian processes (GPs) as an alternative to feedforward networks [1]. GPs make prior assumptions about the problem to be learned very transparent, and even though they are nonparametric models, inference, at least in the case of regression considered below, is relatively straightforward. One crucial question for applications is then how 'fast' GPs learn, i.e. how many training examples are needed to achieve a certain level of generalization performance.
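To make the notion of a learning curve concrete, the following is a minimal Monte Carlo sketch of how one might estimate the generalization error of a mismatched student as a function of the number of training examples. It is not the analytical approximation derived in the paper. The function names (`rbf_kernel`, `ou_kernel`, `learning_curve`), the one-dimensional uniform inputs, the length scales, the noise levels, and the choice of an exponential (Ornstein-Uhlenbeck) covariance as the "rougher" teacher are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length=0.1):
    """Radial basis function (squared exponential) covariance; assumed student prior."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def ou_kernel(a, b, length=0.1):
    """Exponential (Ornstein-Uhlenbeck) covariance; assumed rougher teacher prior."""
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-d / length)

def learning_curve(student_kernel, teacher_kernel, n_values,
                   teacher_noise=0.01, student_noise=0.01,
                   n_test=50, n_trials=20, seed=0):
    """Estimate the mean squared generalization error for each training set size.

    teacher_noise and student_noise are noise variances; they may differ,
    which is one form of student-teacher mismatch.
    """
    rng = np.random.default_rng(seed)
    errors = []
    for n in n_values:
        trial_errors = []
        for _ in range(n_trials):
            # Inputs drawn uniformly from [0, 1] (an illustrative assumption).
            x = rng.uniform(0.0, 1.0, size=n + n_test)
            # Sample one teacher function jointly at training and test inputs.
            K_teacher = teacher_kernel(x, x) + 1e-8 * np.eye(n + n_test)
            f = np.linalg.cholesky(K_teacher) @ rng.standard_normal(n + n_test)
            x_train, f_train = x[:n], f[:n]
            x_test, f_test = x[n:], f[n:]
            # Training outputs are corrupted with the teacher's noise level.
            y_train = f_train + np.sqrt(teacher_noise) * rng.standard_normal(n)
            # Student posterior mean, computed with the student's own kernel and noise.
            K = student_kernel(x_train, x_train) + student_noise * np.eye(n)
            k_star = student_kernel(x_test, x_train)
            mean = k_star @ np.linalg.solve(K, y_train)
            # Error is measured against the clean teacher function.
            trial_errors.append(np.mean((mean - f_test) ** 2))
        errors.append(np.mean(trial_errors))
    return np.array(errors)

if __name__ == "__main__":
    sizes = [2, 4, 8, 16, 32, 64]
    mismatched = learning_curve(rbf_kernel, ou_kernel, sizes)
    matched = learning_curve(ou_kernel, ou_kernel, sizes)
    for n, e_mis, e_match in zip(sizes, mismatched, matched):
        print(f"n={n:3d}  mismatched={e_mis:.4f}  matched={e_match:.4f}")
```

Passing the same kernel and noise variance for student and teacher gives the matched case, so the two curves can be compared directly. The sketch samples the teacher from the exponential kernel because that matrix stays well conditioned; sampling from an RBF teacher this way would likely need a larger jitter term.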
