Goto

Collaborating Authors

 Country


Removing Noise in On-Line Search using Adaptive Batch Sizes

Neural Information Processing Systems

Stochastic (online) learning can be faster than batch learning. However, at late times, the learning rate must be annealed to remove thenoise present in the stochastic weight updates. In this annealing phase, the convergence rate (in mean square) is at best proportional to l/T where T is the number of input presentations. An alternative is to increase the batch size to remove the noise. In this paper we explore convergence for LMS using 1) small but fixed batch sizes and 2) an adaptive batch size. We show that the best adaptive batch schedule is exponential and has a rate of convergence whichis the same as for annealing, Le., at best proportional to l/T. 1 Introduction Stochastic (online) learning can speed learning over its batch training particularly,,,,hen data sets are large and contain redundant information [M0l93J. However, at late times in learning, noise present in the weight updates prevents complete convergence fromtaking place. To reduce the noise, the learning rate is slowly decreased (annealed{ at late times. The optimal annealing schedule is asymptotically proportional toT where t is the iteration [GoI87, L093, Orr95J.


A Mean Field Algorithm for Bayes Learning in Large Feed-forward Neural Networks

Neural Information Processing Systems

In the Bayes approach to statistical inference [Berger, 1985] one assumes that the prior uncertainty about parameters of an unknown data generating mechanism can be encoded in a probability distribution, the so called prior. Using the prior and the likelihood of the data given the parameters, the posterior distribution of the parameters can be derived from Bayes rule. From this posterior, various estimates for functions ofthe parameter, like predictions about unseen data, can be calculated. However, in general, those predictions cannot be realised by specific parameter values, but only by an ensemble average over parameters according to the posterior probability. Hence,exact implementations of Bayes method for neural networks require averages over network parameters which in general can be performed by time consuming 226 M.Opper and O. Winther Monte Carlo procedures.


On the Effect of Analog Noise in Discrete-Time Analog Computations

Neural Information Processing Systems

Wolfgang Maass Institute for Theoretical Computer Science Technische Universitat Graz* PekkaOrponen Department of Mathematics University of Jyvaskylat Abstract We introduce a model for noise-robust analog computations with discrete time that is flexible enough to cover the most important concrete cases, such as computations in noisy analog neural nets and networks of noisy spiking neurons. We show that the presence of arbitrarily small amounts of analog noise reduces the power of analog computational models to that of finite automata, and we also prove a new type of upper bound for the VC-dimension of computational models with analog noise. 1 Introduction Analog noise is a serious issue in practical analog computation. However there exists no formal model for reliable computations by noisy analog systems which allows us to address this issue in an adequate manner. The investigation of noise-tolerant digital computations in the presence of stochastic failures of gates or wires had been initiated by [von Neumann, 1956]. We refer to [Cowan, 1966] and [Pippenger, 1989] for a small sample of the nllmerous results that have been achieved in this direction. The same framework (with stochastic failures of gates or wires) hac; been applied to analog neural nets in [Siegelmann, 1994].


Noisy Spiking Neurons with Temporal Coding have more Computational Power than Sigmoidal Neurons

Neural Information Processing Systems

Furthermore it is shown that networks of noisy spiking neurons with temporal coding have a strictly larger computational power than sigmoidal neural nets with the same number of units. 1 Introduction and Definitions We consider a formal model SNN for a ยงpiking neuron network that is basically a reformulation of the spike response model (and of the leaky integrate and fire model) without using 6-functions (see [Maass, 1996a] or [Maass, 1996b] for further backgrou nd).



Support Vector Regression Machines

Neural Information Processing Systems

A new regression technique based on Vapnik's concept of support vectors is introduced. We compare support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high dimensionality space because SVR optimization does not depend on the dimensionality of the input space.


Multilayer Neural Networks: One or Two Hidden Layers?

Neural Information Processing Systems

The number of hidden layers is a crucial parameter for the architecture of multilayer neural networks. Early research, in the 60's, addressed the problem of exactly realizing Booleanfunctions with binary networks or binary multilayer networks. On the one hand, more recent work focused on approximately realizing real functions with multilayer neural networks with one hidden layer [6, 7, 11] or with two hidden units [2]. On the other hand, some authors [1, 12] were interested in finding bounds on the architecture of multilayer networks for exact realization of a finite set of points.


Dynamics of Training

Neural Information Processing Systems

A new method to calculate the full training process of a neural network isintroduced. No sophisticated methods like the replica trick are used. The results are directly related to the actual number of training steps. Some results are presented here, like the maximal learning rate, an exact description of early stopping, and the necessary numberof training steps. Further problems can be addressed with this approach.


Neural Learning in Structured Parameter Spaces - Natural Riemannian Gradient

Neural Information Processing Systems

Shun-ichi Amari RIKEN Frontier Research Program, RIKEN, Hirosawa 2-1, Wako-shi 351-01, Japan amari@zoo.riken.go.jp Abstract The parameter space of neural networks has a Riemannian metric structure.The natural Riemannian gradient should be used instead of the conventional gradient, since the former denotes the true steepest descent direction of a loss function in the Riemannian space. The behavior of the stochastic gradient learning algorithm is much more effective if the natural gradient is used. The present paper studies the information-geometrical structure of perceptrons and other networks, and prove that the online learning method based on the natural gradient is asymptotically as efficient as the optimal batch algorithm. Adaptive modification of the learning constant is proposed and analyzed in terms of the Riemannian measure andis shown to be efficient. The natural gradient is finally applied to blind separation of mixtured independent signal sources. 1 Introd uction Neural learning takes place in the parameter space of modifiable synaptic weights of a neural network.


A Model of Recurrent Interactions in Primary Visual Cortex

Neural Information Processing Systems

A general feature of the cerebral cortex is its massive interconnectivity -it has been estimated anatomically [19] that cortical neurons receive upwards of 5,000 synapses, the majority of which originate from other nearby cortical neurons. Numerous experiments inprimary visual cortex (VI) have revealed strongly nonlinear interactions between stimulus elements which activate classical and nonclassical receptive field regions.