Online Learning from Finite Training Sets: An Analytical Case Study
Neural Information Processing Systems
By an extension of statistical mechanics methods, we obtain exact results for the time-dependent generalization error of a linear network with a large number of weights N. We find, for example, that for small training sets of size p ≈ N, larger learning rates can be used without compromising asymptotic generalization performance or convergence speed. Encouragingly, for optimal settings of η (and, less importantly, weight decay λ) at given final learning time, the generalization performance of online learning is essentially as good as that of offline learning.
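To make the setting concrete, here is a minimal simulation sketch, not the authors' exact formulation: a linear student learns a linear teacher from a fixed training set of p examples in N dimensions, by online (single-example) gradient descent with learning rate η and weight decay λ, compared against offline (full-batch) gradient descent on the same data. The teacher-student setup, standard-normal inputs, the 1/N scaling of the online update, and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N, p = 200, 200          # number of weights and training-set size, p comparable to N
eta, lam = 0.5, 0.01     # learning rate eta and weight decay lam (illustrative values)
steps = 20 * p           # total number of single-example online updates

w_star = rng.standard_normal(N) / np.sqrt(N)   # teacher weights (assumed setup)
X = rng.standard_normal((p, N))                # fixed, finite training set
y = X @ w_star                                 # noiseless teacher outputs

def gen_error(w):
    # For standard-normal test inputs, E[(x.(w - w_star))^2] / 2
    # reduces to ||w - w_star||^2 / 2.
    return 0.5 * np.sum((w - w_star) ** 2)

# --- Online learning: one randomly drawn training example per update.
# The eta/N scaling (an assumption, common in large-N analyses) keeps the
# per-step update stable since ||x||^2 is of order N.
w = np.zeros(N)
for t in range(steps):
    i = rng.integers(p)
    err = X[i] @ w - y[i]
    w -= (eta / N) * (err * X[i] + lam * w)

# --- Offline (batch) comparison: full gradient on the same training set,
# with steps / p batch updates to match the number of passes through the data.
w_off = np.zeros(N)
for t in range(steps // p):
    grad = X.T @ (X @ w_off - y) / p + lam * w_off
    w_off -= 0.4 * grad    # batch learning rate, illustrative

print(f"online  generalization error: {gen_error(w):.4f}")
print(f"offline generalization error: {gen_error(w_off):.4f}")
```

With η and λ tuned for a given number of passes through the data, one would expect the two printed errors to be close, which is the kind of online-versus-offline comparison the abstract describes; the analytical results in the paper make this exact in the large-N limit.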