AITopics

Stochastic (online) learning can be faster than batch learning. However, at late times, the learning rate must be annealed to remove thenoise present in the stochastic weight updates. In this annealing phase, the convergence rate (in mean square) is at best proportional to l/T where T is the number of input presentations. An alternative is to increase the batch size to remove the noise. In this paper we explore convergence for LMS using 1) small but fixed batch sizes and 2) an adaptive batch size. We show that the best adaptive batch schedule is exponential and has a rate of convergence whichis the same as for annealing, Le., at best proportional to l/T. 1 Introduction Stochastic (online) learning can speed learning over its batch training particularly,,,,hen data sets are large and contain redundant information [M0l93J. However, at late times in learning, noise present in the weight updates prevents complete convergence fromtaking place. To reduce the noise, the learning rate is slowly decreased (annealed{ at late times. The optimal annealing schedule is asymptotically proportional toT where t is the iteration [GoI87, L093, Orr95J.

artificial intelligence, batch size, machine learning, (12 more...)

Country:

North America > United States > Oregon (0.14)
North America > United States > California (0.14)

Industry: Education > Educational Setting > Online (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Opper, Manfred, Winther, Ole

A Mean Field Algorithm for Bayes Learning in Large Feed-forward Neural Networks

In the Bayes approach to statistical inference [Berger, 1985] one assumes that the prior uncertainty about parameters of an unknown data generating mechanism can be encoded in a probability distribution, the so called prior. Using the prior and the likelihood of the data given the parameters, the posterior distribution of the parameters can be derived from Bayes rule. From this posterior, various estimates for functions ofthe parameter, like predictions about unseen data, can be calculated. However, in general, those predictions cannot be realised by specific parameter values, but only by an ensemble average over parameters according to the posterior probability. Hence,exact implementations of Bayes method for neural networks require averages over network parameters which in general can be performed by time consuming 226 M.Opper and O. Winther Monte Carlo procedures.

approximation, artificial intelligence, machine learning, (17 more...)

Country:

Europe > Germany (0.14)
Europe > Denmark (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Maass, Wolfgang, Orponen, Pekka

On the Effect of Analog Noise in Discrete-Time Analog Computations

Wolfgang Maass Institute for Theoretical Computer Science Technische Universitat Graz* PekkaOrponen Department of Mathematics University of Jyvaskylat Abstract We introduce a model for noise-robust analog computations with discrete time that is flexible enough to cover the most important concrete cases, such as computations in noisy analog neural nets and networks of noisy spiking neurons. We show that the presence of arbitrarily small amounts of analog noise reduces the power of analog computational models to that of finite automata, and we also prove a new type of upper bound for the VC-dimension of computational models with analog noise. 1 Introduction Analog noise is a serious issue in practical analog computation. However there exists no formal model for reliable computations by noisy analog systems which allows us to address this issue in an adequate manner. The investigation of noise-tolerant digital computations in the presence of stochastic failures of gates or wires had been initiated by [von Neumann, 1956]. We refer to [Cowan, 1966] and [Pippenger, 1989] for a small sample of the nllmerous results that have been achieved in this direction. The same framework (with stochastic failures of gates or wires) hac; been applied to analog neural nets in [Siegelmann, 1994].

artificial intelligence, computation, machine learning, (17 more...)

Country:

North America > United States (0.28)
Europe > Austria > Styria > Graz (0.25)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Noisy Spiking Neurons with Temporal Coding have more Computational Power than Sigmoidal Neurons

Maass, Wolfgang

Furthermore it is shown that networks of noisy spiking neurons with temporal coding have a strictly larger computational power than sigmoidal neural nets with the same number of units. 1 Introduction and Definitions We consider a formal model SNN for a §piking neuron network that is basically a reformulation of the spike response model (and of the leaky integrate and fire model) without using 6-functions (see [Maass, 1996a] or [Maass, 1996b] for further backgrou nd).

artificial intelligence, machine learning, neuron, (14 more...)

Country: Europe > Austria (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Krzyzak, Adam, Linder, Tamás

Radial Basis Function Networks and Complexity Regularization in Function Learning

In this paper we apply the method of complexity regularization to derive estimationbounds for nonlinear function estimation using a single hidden layer radial basis function network.

artificial intelligence, machine learning, neural network, (12 more...)

Country:

Europe > Hungary (0.15)
North America > Canada (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Drucker, Harris, Burges, Christopher J. C., Kaufman, Linda, Smola, Alex J., Vapnik, Vladimir

Support Vector Regression Machines

A new regression technique based on Vapnik's concept of support vectors is introduced. We compare support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high dimensionality space because SVR optimization does not depend on the dimensionality of the input space.

artificial intelligence, machine learning, representation, (17 more...)

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Brightwell, Graham, Kenyon, Claire, Paugam-Moisy, Hélène

Multilayer Neural Networks: One or Two Hidden Layers?

The number of hidden layers is a crucial parameter for the architecture of multilayer neural networks. Early research, in the 60's, addressed the problem of exactly realizing Booleanfunctions with binary networks or binary multilayer networks. On the one hand, more recent work focused on approximately realizing real functions with multilayer neural networks with one hidden layer [6, 7, 11] or with two hidden units [2]. On the other hand, some authors [1, 12] were interested in finding bounds on the architecture of multilayer networks for exact realization of a finite set of points.

artificial intelligence, hyperplane, machine learning, (17 more...)

Country: Europe (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Bös, Siegfried, Opper, Manfred

Dynamics of Training

A new method to calculate the full training process of a neural network isintroduced. No sophisticated methods like the replica trick are used. The results are directly related to the actual number of training steps. Some results are presented here, like the maximal learning rate, an exact description of early stopping, and the necessary numberof training steps. Further problems can be addressed with this approach.

artificial intelligence, machine learning, training step, (15 more...)

Country: Asia > Japan (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Learning in Structured Parameter Spaces - Natural Riemannian Gradient

Amari, Shun-ichi

Shun-ichi Amari RIKEN Frontier Research Program, RIKEN, Hirosawa 2-1, Wako-shi 351-01, Japan amari@zoo.riken.go.jp Abstract The parameter space of neural networks has a Riemannian metric structure.The natural Riemannian gradient should be used instead of the conventional gradient, since the former denotes the true steepest descent direction of a loss function in the Riemannian space. The behavior of the stochastic gradient learning algorithm is much more effective if the natural gradient is used. The present paper studies the information-geometrical structure of perceptrons and other networks, and prove that the online learning method based on the natural gradient is asymptotically as efficient as the optimal batch algorithm. Adaptive modification of the learning constant is proposed and analyzed in terms of the Riemannian measure andis shown to be efficient. The natural gradient is finally applied to blind separation of mixtured independent signal sources. 1 Introd uction Neural learning takes place in the parameter space of modifiable synaptic weights of a neural network.

artificial intelligence, gradient, machine learning, (17 more...)

Country:

Asia > Japan (0.24)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Industry: Education > Educational Setting > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.38)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.37)

Todorov, Emanuel, Siapas, Athanassios, Somers, David

A Model of Recurrent Interactions in Primary Visual Cortex

A general feature of the cerebral cortex is its massive interconnectivity -it has been estimated anatomically [19] that cortical neurons receive upwards of 5,000 synapses, the majority of which originate from other nearby cortical neurons. Numerous experiments inprimary visual cortex (VI) have revealed strongly nonlinear interactions between stimulus elements which activate classical and nonclassical receptive field regions.

artificial intelligence, interaction, response function, (13 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (0.34)