Finite training set


Online Learning from Finite Training Sets: An Analytical Case Study

Sollich, Peter, Barber, David

Neural Information Processing Systems

By an extension of statistical mechanics methods, we obtain exact results for the time-dependent generalization error of a linear network with a large number of weights N. We find, for example, that for small training sets of size p ≈ N, larger learning rates can be used without compromising asymptotic generalization performance or convergence speed. Encouragingly, for optimal settings of the learning rate η (and, less importantly, the weight decay λ) at given final learning time, the generalization performance of online learning is essentially as good as that of offline learning.
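
As an illustration of the setting this paper analyses, the sketch below runs online gradient descent on a linear student network trained on a fixed finite set of p examples drawn from a random teacher. The constants, the 1/N step scaling, and names such as eta and weight_decay are illustrative assumptions; the paper's exact results come from a statistical mechanics calculation, not from simulation.

    import numpy as np

    # Student-teacher setup for a linear network with N weights and a
    # finite training set of p examples (here p = N).
    rng = np.random.default_rng(0)
    N, p = 100, 100
    eta, weight_decay = 0.5, 1e-3                   # learning rate and weight decay

    teacher = rng.standard_normal(N) / np.sqrt(N)   # target weight vector
    X = rng.standard_normal((p, N))                 # fixed, finite training set
    y = X @ teacher                                 # noiseless teacher outputs

    w = np.zeros(N)
    for t in range(50 * p):                         # online updates, one example each
        i = rng.integers(p)
        err = X[i] @ w - y[i]
        # per-example squared-error gradient; the 1/N scaling is assumed here
        # to keep the effective step size of order eta for large N
        w -= eta * (err * X[i] / N + weight_decay * w)

    # For Gaussian inputs the generalization error is half the squared
    # distance between student and teacher weights.
    print("generalization error:", 0.5 * np.sum((w - teacher) ** 2))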


On-line Learning from Finite Training Sets in Nonlinear Networks

Sollich, Peter, Barber, David

Neural Information Processing Systems

Online learning is one of the most common forms of neural network training. We present an analysis of online learning from finite training sets for nonlinear networks (namely, soft-committee machines), advancing the theory to more realistic learning scenarios. Dynamical equations are derived for an appropriate set of order parameters; these are exact in the limiting case of either linear networks or infinite training sets. Preliminary comparisons with simulations suggest that the theory captures some effects of finite training sets, but may not yet account correctly for the presence of local minima.
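
The model class referred to above can be made concrete with a short simulation: a soft-committee machine sums K hidden units g(w_k · x) with hidden-to-output weights fixed to one, and online gradient descent repeatedly draws examples from a fixed finite training set. The constants and the erf activation below are illustrative assumptions; the paper's contribution is the order-parameter dynamics, which this sketch does not implement.

    import numpy as np
    from scipy.special import erf

    def g(h):
        # soft-committee hidden-unit activation
        return erf(h / np.sqrt(2.0))

    def g_prime(h):
        # derivative of g
        return np.sqrt(2.0 / np.pi) * np.exp(-0.5 * h ** 2)

    rng = np.random.default_rng(1)
    N, K, p, eta = 50, 3, 200, 0.05

    teacher = rng.standard_normal((K, N)) / np.sqrt(N)   # teacher committee
    X = rng.standard_normal((p, N))                      # finite training set
    y = g(X @ teacher.T).sum(axis=1)                     # teacher outputs

    W = 0.01 * rng.standard_normal((K, N))               # student weights
    for t in range(50 * p):
        i = rng.integers(p)                              # one stored example per step
        h = W @ X[i]                                     # student hidden fields
        delta = g(h).sum() - y[i]                        # output error
        # gradient of 0.5 * delta**2 for each hidden weight vector,
        # with an assumed 1/N scaling to keep step sizes O(1) in large N
        W -= eta * delta * np.outer(g_prime(h), X[i]) / N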


A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

Roux, Nicolas L., Schmidt, Mark, Bach, Francis R.

Neural Information Processing Systems

We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms, both in terms of optimizing the training error and reducing the test error quickly.
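
The gradient-memory idea described in the abstract can be sketched as follows: keep the most recently computed gradient of each of the p component functions and step along the average of the stored gradients. The example below applies this to L2-regularized least squares (a strongly convex finite sum); the data, step size, and zero initialization of the memory are illustrative assumptions rather than the paper's recommended settings.

    import numpy as np

    rng = np.random.default_rng(2)
    p, d = 500, 20
    X = rng.standard_normal((p, d))
    y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(p)
    lam = 0.1                                   # L2 regularization strength

    L = (np.sum(X ** 2, axis=1) + lam).max()    # per-example Lipschitz bound
    alpha = 1.0 / (16.0 * L)                    # conservative step size (assumed)

    w = np.zeros(d)
    g_mem = np.zeros((p, d))                    # stored gradient per example
    g_sum = np.zeros(d)                         # running sum of stored gradients

    for t in range(20 * p):
        i = rng.integers(p)
        g_new = (X[i] @ w - y[i]) * X[i] + lam * w
        g_sum += g_new - g_mem[i]               # refresh the memory for example i
        g_mem[i] = g_new
        w -= alpha * g_sum / p                  # step along the averaged gradient

    loss = 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * w @ w
    print(f"regularized training loss: {loss:.4f}")

Each update costs only one fresh gradient evaluation yet moves along an average over all p stored gradients, which is the mechanism the abstract credits for the linear convergence rate.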

