Learning Curves for Gaussian Processes

Neural Information Processing Systems 

Within the neural networks community, there has been a good deal of excitement in the last few years about the use of Gaussian processes as an alternative to feedforward networks [1]. The advantages of Gaussian processes are that prior assumptions about the problem to be learned are encoded in a very transparent way, and that inference (at least in the case of regression that I will consider) is relatively straightforward. One crucial question for applications is then how 'fast' Gaussian processes learn, i.e., how many training examples are needed to achieve a certain level of generalization performance. The typical (as opposed to worst case) behaviour is captured in the learning curve, which gives the average generalization error ε as a function of the number of training examples n. Several workers have derived bounds on ε(n) [2, 3, 4] or studied its large n asymptotics. As I will illustrate below, however, the existing bounds are often far from tight; and asymptotic results will not necessarily apply for realistic sample sizes n. My main aim in this paper is therefore to derive approximations to ε(n) which get closer to the true learning curves than existing bounds, and apply both for small and large n.

In its simplest form, the regression problem that I am considering is this: We are trying to learn a function θ* which maps inputs x (real-valued vectors) to (real-valued scalar) outputs θ*(x).
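To make the notion of a learning curve concrete, the following is a minimal numerical sketch, not the method developed in this paper. It assumes that the target function is drawn from the same Gaussian process prior that the learner uses, so that the expected squared error at a test input equals the posterior variance there. The squared-exponential kernel, the length scale, the noise variance, the uniform input distribution on [0, 1] and all function names are illustrative assumptions. Training sets are nested (the first n points of one random sequence), and the error is averaged over test inputs and over several such sequences.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    # Squared-exponential covariance between 1-D input arrays a and b
    # (an assumed kernel; prior variance k(x, x) is 1).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def empirical_learning_curve(n_max=30, n_sets=20, n_test=200,
                             noise_var=0.05, seed=0):
    # Estimates the average generalization error for n = 0, ..., n_max.
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.0, 1.0, n_test)   # grid standing in for the input distribution
    prior_var = np.ones(n_test)               # k(x, x) = 1 for this kernel
    curve = np.zeros(n_max + 1)
    for _ in range(n_sets):
        x_train = rng.uniform(0.0, 1.0, size=n_max)
        curve[0] += prior_var.mean()          # no data: error equals prior variance
        for n in range(1, n_max + 1):
            xs = x_train[:n]                  # nested training sets
            K = rbf_kernel(xs, xs) + noise_var * np.eye(n)
            k_star = rbf_kernel(xs, x_test)   # shape (n, n_test)
            solved = np.linalg.solve(K, k_star)
            # Posterior variance of the noise-free function at each test input.
            post_var = prior_var - np.sum(k_star * solved, axis=0)
            curve[n] += post_var.mean()
    return curve / n_sets

curve = empirical_learning_curve()
for n in (0, 1, 5, 10, 30):
    print(n, curve[n])
```

Because the training sets are nested, the estimated curve cannot increase with n (up to rounding), which gives a simple sanity check on any such simulation.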