Removing Noise in On-Line Search using Adaptive Batch Sizes
Neural Information Processing Systems, Dec-31-1997
Stochastic (online) learning can be faster than batch learning. However, at late times, the learning rate must be annealed to remove the noise present in the stochastic weight updates. In this annealing phase, the convergence rate (in mean square) is at best proportional to 1/T, where T is the number of input presentations. An alternative is to increase the batch size to remove the noise. In this paper we explore convergence for LMS using 1) small but fixed batch sizes and 2) an adaptive batch size. We show that the best adaptive batch schedule is exponential and has a rate of convergence which is the same as for annealing, i.e., at best proportional to 1/T.

1 Introduction

Stochastic (online) learning can speed learning over batch training, particularly when data sets are large and contain redundant information [Mol93]. However, at late times in learning, noise present in the weight updates prevents complete convergence from taking place. To reduce the noise, the learning rate is slowly decreased (annealed) at late times. The optimal annealing schedule is asymptotically proportional to 1/t, where t is the iteration [Gol87, LO93, Orr95].
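The two noise-removal schedules can be sketched for a one-dimensional LMS problem. The Python snippet below is a minimal illustration only, not the paper's experimental setup: the noisy linear model, the constants mu0 and mu, and the growth factor r are assumed values chosen so that both runs are stable, and the exponential batch schedule stands in for the optimal schedule the paper derives.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D LMS problem: y = w* x + noise, quadratic cost 0.5 * E[(y - w x)^2]
w_star, sigma = 2.0, 0.5

def sample(n):
    x = rng.normal(size=n)
    y = w_star * x + sigma * rng.normal(size=n)
    return x, y

def lms_grad(w, x, y):
    # gradient of the quadratic cost, averaged over the batch
    return np.mean((w * x - y) * x)

T = 100_000  # total number of input presentations (assumed budget)

# (1) online learning with annealed rate mu_t = mu0 / t
w, mu0 = 0.0, 1.0
for t in range(1, T + 1):
    x, y = sample(1)
    w -= (mu0 / t) * lms_grad(w, x, y)
print(f"annealed online:   |w - w*| = {abs(w - w_star):.4f}")

# (2) fixed rate with exponentially growing batch size n_k = n0 * r**k
w, mu = 0.0, 0.3
n, t, r = 1.0, 0, 1.1  # r is an assumed growth factor
while t < T:
    m = min(int(np.ceil(n)), T - t)  # presentations consumed this step
    x, y = sample(m)
    w -= mu * lms_grad(w, x, y)
    t += m
    n *= r
print(f"exponential batch: |w - w*| = {abs(w - w_star):.4f}")
```

Under this setup both runs spend the same number of input presentations T, and both drive the squared weight error down at roughly the 1/T scale, consistent with the claim that the exponential batch schedule matches annealing in convergence rate.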