Collaborating Authors

 Sollich, Peter


Learning Curves for Gaussian Processes

Neural Information Processing Systems

I consider the problem of calculating learning curves (i.e., average generalization performance) of Gaussian processes used for regression. A simple expression for the generalization error in terms of the eigenvalue decomposition of the covariance function is derived, and used as the starting point for several approximation schemes. I identify where these become exact, and compare with existing bounds on learning curves; the new approximations, which can be used for any input space dimension, generally get substantially closer to the truth.

1 INTRODUCTION: GAUSSIAN PROCESSES
Within the neural networks community, there has in the last few years been a good deal of excitement about the use of Gaussian processes as an alternative to feedforward networks [1]. The advantages of Gaussian processes are that prior assumptions about the problem to be learned are encoded in a very transparent way, and that inference (at least in the case of regression that I will consider) is relatively straightforward. One crucial question for applications is then how 'fast' Gaussian processes learn, i.e., how many training examples are needed to achieve a certain level of generalization performance.
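As a rough numerical illustration of how an eigenvalue-based learning-curve estimate can be computed, the sketch below assumes a squared-exponential covariance function on 1-D inputs, a Nyström-style discretisation of its eigenvalues, and one simple approximation of the form ε(n) = Σ_i λ_i / (1 + n λ_i / σ²). The kernel, length scale, noise level, and this particular formula are illustrative assumptions, not the specific approximation schemes derived in the paper.

```python
# Hypothetical sketch: an eigenvalue-based learning-curve estimate for GP regression,
# in the spirit of (but not identical to) the approximation schemes in the paper.
import numpy as np

def rbf_kernel(x1, x2, length_scale=0.3):
    """Squared-exponential covariance function on 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

# Approximate the eigenvalues of the covariance function for inputs uniform on [0, 1]
# by a Nystrom-style discretisation on a dense grid.
grid = np.linspace(0.0, 1.0, 500)
K = rbf_kernel(grid, grid)
eigvals = np.linalg.eigvalsh(K)[::-1] / len(grid)   # eigenvalues of the kernel operator
eigvals = np.clip(eigvals, 0.0, None)

def learning_curve_approx(n, lam, noise_var=0.05):
    """Simple eigenvalue-based approximation: eps(n) = sum_i lam_i / (1 + n lam_i / sigma^2)."""
    return np.sum(lam / (1.0 + n * lam / noise_var))

for n in [0, 10, 50, 200, 1000]:
    print(f"n = {n:4d}   approx. generalization error = {learning_curve_approx(n, eigvals):.4f}")
```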


On-Line Learning with Restricted Training Sets: Exact Solution as Benchmark for General Theories

Neural Information Processing Systems

Calculation of Q(t) and R(t) using (4, 5, 7, 9) to execute the path average and the average over sets is relatively straightforward, albeit tedious. We find that γt(1 − γt) ...
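A minimal simulation sketch of the quantities involved: Q(t) and R(t) below are the usual student self-overlap J·J/N and student-teacher overlap J·B/N, estimated by Monte Carlo for online updates drawn from a fixed (restricted) training set of size p = αN. The Hebbian learning rule, system size, and parameter values are illustrative assumptions, not the paper's exact solution.

```python
# Hypothetical sketch: Monte-Carlo estimate of the order parameters Q(t) = J.J/N and
# R(t) = J.B/N for online learning from a restricted training set of size p = alpha*N.
# Online Hebbian updates are assumed here purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
N, alpha, eta, steps = 500, 0.5, 0.1, 2000
p = int(alpha * N)

B = rng.standard_normal(N)               # teacher vector
B /= np.linalg.norm(B) / np.sqrt(N)      # normalise so that B.B/N = 1
X = rng.standard_normal((p, N))          # fixed (restricted) training set
y = np.sign(X @ B)                       # teacher labels

J = np.zeros(N)                          # student starts from tabula rasa
for t in range(steps):
    i = rng.integers(p)                  # draw one example from the fixed set
    J += (eta / np.sqrt(N)) * y[i] * X[i]    # online Hebbian update
    if t % 500 == 0:
        Q = J @ J / N                    # student self-overlap
        R = J @ B / N                    # student-teacher overlap
        print(f"t={t:5d}  Q={Q:.3f}  R={R:.3f}")
```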


On-line Learning from Finite Training Sets in Nonlinear Networks

Neural Information Processing Systems

Online learning is one of the most common forms of neural network training. We present an analysis of online learning from finite training sets for nonlinear networks (namely, soft-committee machines), advancing the theory to more realistic learning scenarios. Dynamical equations are derived for an appropriate set of order parameters; these are exact in the limiting case of either linear networks or infinite training sets. Preliminary comparisons with simulations suggest that the theory captures some effects of finite training sets, but may not yet account correctly for the presence of local minima.
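For concreteness, a small simulation sketch of the setting described above: a soft-committee student (erf hidden units, hidden-to-output weights fixed to 1) trained by online gradient descent on a finite set of examples generated by a teacher of the same type. Network sizes, learning rate, and the Monte-Carlo error estimate are illustrative assumptions; the paper's contribution is the analytical order-parameter dynamics, which this sketch does not implement.

```python
# Hypothetical sketch: online gradient descent for a soft-committee machine
# (K hidden units, erf activations, hidden-to-output weights fixed to 1),
# trained on a finite set of p examples generated by a teacher of the same type.
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(1)
N, K, M, p, eta, steps = 100, 2, 2, 400, 0.05, 20000

def soft_committee(W, x):
    """Output of a soft-committee machine with weight matrix W (hidden units x inputs)."""
    return np.sum(erf(W @ x / np.sqrt(2 * N)), axis=0)

W_teacher = rng.standard_normal((M, N))          # teacher weights
X = rng.standard_normal((p, N))                  # finite training set
y = np.array([soft_committee(W_teacher, x) for x in X])

W = 0.1 * rng.standard_normal((K, N))            # student weights
for t in range(steps):
    i = rng.integers(p)
    x, target = X[i], y[i]
    h = W @ x / np.sqrt(N)                       # hidden-unit fields
    out = np.sum(erf(h / np.sqrt(2)))
    delta = out - target
    # gradient of 0.5*delta^2 w.r.t. W; d/dh erf(h/sqrt(2)) = sqrt(2/pi) * exp(-h^2/2)
    grad = delta * np.sqrt(2 / np.pi) * np.exp(-h ** 2 / 2)
    W -= (eta / np.sqrt(N)) * np.outer(grad, x)

# crude Monte-Carlo estimate of the generalization error on fresh inputs
X_test = rng.standard_normal((2000, N))
err = np.mean([(soft_committee(W, x) - soft_committee(W_teacher, x)) ** 2 for x in X_test]) / 2
print(f"estimated generalization error: {err:.4f}")
```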


Online Learning from Finite Training Sets: An Analytical Case Study

Neural Information Processing Systems

By an extension of statistical mechanics methods, we obtain exact results for the time-dependent generalization error of a linear network with a large number of weights N. We find, for example, that for small training sets of size p ≈ N, larger learning rates can be used without compromising asymptotic generalization performance or convergence speed. Encouragingly, for optimal settings of the learning rate η (and, less importantly, the weight decay λ) at given final learning time, the generalization performance of online learning is essentially as good as that of offline learning.
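A small simulation sketch of the scenario analysed above: online gradient descent for a linear network trained on a fixed set of p examples with learning rate η and weight decay λ, tracking the generalization error against a linear teacher over time. All parameter values and scalings below are illustrative assumptions rather than the paper's exact statistical-mechanics calculation.

```python
# Hypothetical sketch: online gradient descent for a linear network trained on a
# fixed set of p examples, with learning rate eta and weight decay lam, tracking
# the generalization error against a linear teacher over time.
import numpy as np

rng = np.random.default_rng(2)
N, p, eta, lam, noise_std, steps = 200, 200, 0.3, 0.01, 0.2, 10000

w_teacher = rng.standard_normal(N) / np.sqrt(N)
X = rng.standard_normal((p, N))
y = X @ w_teacher + noise_std * rng.standard_normal(p)   # noisy teacher outputs

w = np.zeros(N)
for t in range(steps):
    i = rng.integers(p)                              # one example per update (online)
    err = X[i] @ w - y[i]
    w -= eta * (err * X[i] + lam * w) / N            # SGD step with weight decay
    if t % 2000 == 0:
        gen_err = 0.5 * np.sum((w - w_teacher) ** 2)  # generalization error for isotropic inputs
        print(f"t={t:5d}  generalization error = {gen_err:.4f}")
```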


Learning with ensembles: How overfitting can be useful

Neural Information Processing Systems

We study the characteristics of learning with ensembles. Solving exactly the simple model of an ensemble of linear students, we find surprisingly rich behaviour. For learning in large ensembles, it is advantageous to use under-regularized students, which actually over-fit the training data. Globally optimal performance can be obtained by choosing the training set sizes of the students appropriately. For smaller ensembles, optimization of the ensemble weights can yield significant improvements in ensemble generalization performance, in particular if the individual students are subject to noise in the training process. Choosing students with a wide range of regularization parameters makes this improvement robust against changes in the unknown level of noise in the training data.

1 INTRODUCTION
An ensemble is a collection of a (finite) number of neural networks or other types of predictors that are trained for the same task.
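A minimal sketch of the flavour of result described above: an ensemble of weakly regularized ('under-regularized') linear ridge students, each trained on a different random subset of a noisy data set, with predictions averaged. The subset scheme, ridge parameter, and equal ensemble weights are illustrative assumptions; the paper's analysis is for an exactly solvable linear-student model with optimized training-set sizes and ensemble weights.

```python
# Hypothetical sketch: an ensemble of under-regularized linear (ridge) students,
# each trained on a random subset of the data, with predictions averaged.
import numpy as np

rng = np.random.default_rng(3)
N, p, n_students, subset_frac, ridge = 50, 100, 10, 0.6, 1e-3   # small ridge = under-regularized

w_teacher = rng.standard_normal(N) / np.sqrt(N)
X = rng.standard_normal((p, N))
y = X @ w_teacher + 0.3 * rng.standard_normal(p)         # noisy targets

X_test = rng.standard_normal((5000, N))
y_test = X_test @ w_teacher                              # noise-free test targets

preds = []
for _ in range(n_students):
    idx = rng.choice(p, size=int(subset_frac * p), replace=False)
    Xs, ys = X[idx], y[idx]
    w = np.linalg.solve(Xs.T @ Xs + ridge * np.eye(N), Xs.T @ ys)   # ridge solution
    preds.append(X_test @ w)
preds = np.array(preds)

individual = np.mean([(pr - y_test) ** 2 for pr in preds])   # average single-student error
ensemble = np.mean((preds.mean(axis=0) - y_test) ** 2)       # error of the averaged prediction
print(f"average individual error: {individual:.4f}")
print(f"ensemble error:           {ensemble:.4f}")
```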

