Technology
Removing Noise in On-Line Search using Adaptive Batch Sizes
Stochastic (online) learning can be faster than batch learning. However, at late times, the learning rate must be annealed to remove the noise present in the stochastic weight updates. In this annealing phase, the convergence rate (in mean square) is at best proportional to 1/T, where T is the number of input presentations. An alternative is to increase the batch size to remove the noise. In this paper we explore convergence for LMS using 1) small but fixed batch sizes and 2) an adaptive batch size. We show that the best adaptive batch schedule is exponential and has a rate of convergence that is the same as for annealing, i.e., at best proportional to 1/T.
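A minimal sketch of LMS with a growing batch size, assuming a linear target with Gaussian input and output noise. The schedule batch_k = ceil(b0 * growth**k), the function name `lms_adaptive_batches`, and every parameter value are illustrative assumptions, not the optimal schedule derived in the paper. Averaging the LMS correction over a larger batch is what suppresses the update noise instead of shrinking the learning rate.

```python
import numpy as np

def lms_adaptive_batches(w_true, n_steps=60, lr=0.1, b0=1.0, growth=1.1,
                         noise_std=0.5, seed=0):
    """Hypothetical sketch: LMS with an exponentially growing batch size.

    Returns the final weights and a history of
    (total input presentations T, squared weight error) pairs.
    """
    rng = np.random.default_rng(seed)
    w_true = np.asarray(w_true, dtype=float)
    d = w_true.size
    w = np.zeros(d)
    presentations = 0
    history = []
    for k in range(n_steps):
        batch = int(np.ceil(b0 * growth ** k))  # assumed exponential schedule
        X = rng.standard_normal((batch, d))
        y = X @ w_true + noise_std * rng.standard_normal(batch)
        # LMS step using the batch-averaged error signal (fixed learning rate).
        w = w + lr * X.T @ (y - X @ w) / batch
        presentations += batch
        history.append((presentations, float(np.sum((w - w_true) ** 2))))
    return w, history
```

Setting `growth=1.0` recovers the fixed-batch case, which makes it easy to compare how the squared error behaves as a function of T under the two schedules.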
A Mean Field Algorithm for Bayes Learning in Large Feed-forward Neural Networks
Opper, M., Winther, O.
In the Bayes approach to statistical inference [Berger, 1985], one assumes that the prior uncertainty about the parameters of an unknown data-generating mechanism can be encoded in a probability distribution, the so-called prior. Using the prior and the likelihood of the data given the parameters, the posterior distribution of the parameters can be derived from Bayes' rule. From this posterior, various estimates for functions of the parameters, such as predictions about unseen data, can be calculated. In general, however, these predictions cannot be realised by specific parameter values, but only by an ensemble average over parameters according to the posterior probability. Hence, exact implementations of Bayes' method for neural networks require averages over network parameters, which in general can be performed by time-consuming Monte Carlo procedures.
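To make the posterior ensemble average concrete, here is a small sketch of the kind of Monte Carlo procedure the abstract refers to; it is not the paper's mean field algorithm. As an assumption, a single-layer logistic model with a Gaussian prior stands in for the neural network, and a random-walk Metropolis sampler (hypothetical names `log_posterior`, `metropolis_predictive`) approximates the predictive probability by averaging the model output over posterior samples rather than plugging in a single weight vector.

```python
import numpy as np

def log_posterior(w, X, y, prior_var=1.0):
    """Unnormalised log posterior: Gaussian prior N(0, prior_var * I) plus
    logistic likelihood with labels y in {0, 1}."""
    z = X @ w
    log_lik = np.sum(y * z - np.logaddexp(0.0, z))
    return log_lik - 0.5 * np.dot(w, w) / prior_var

def metropolis_predictive(X, y, x_new, n_samples=5000, burn_in=1000,
                          step=0.3, seed=0):
    """Posterior-averaged P(y=1 | x_new) via random-walk Metropolis (assumed sampler)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    lp = log_posterior(w, X, y)
    probs = []
    for t in range(n_samples + burn_in):
        proposal = w + step * rng.standard_normal(w.shape)
        lp_prop = log_posterior(proposal, X, y)
        if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis acceptance test
            w, lp = proposal, lp_prop
        if t >= burn_in:
            # Ensemble average: accumulate the prediction of each sampled weight vector.
            probs.append(1.0 / (1.0 + np.exp(-(x_new @ w))))
    return np.mean(probs, axis=0)
```

The cost grows with the number of samples needed for a stable average, which is the expense a deterministic mean field approximation aims to avoid.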
On the Effect of Analog Noise in Discrete-Time Analog Computations
Maass, Wolfgang, Orponen, Pekka
We introduce a model for noise-robust analog computations with discrete time that is flexible enough to cover the most important concrete cases, such as computations in noisy analog neural nets and networks of noisy spiking neurons. We show that the presence of arbitrarily small amounts of analog noise reduces the power of analog computational models to that of finite automata, and we also prove a new type of upper bound for the VC-dimension of computational models with analog noise. 1 Introduction Analog noise is a serious issue in practical analog computation. However, there exists no formal model for reliable computations by noisy analog systems that allows us to address this issue adequately. The investigation of noise-tolerant digital computations in the presence of stochastic failures of gates or wires was initiated by [von Neumann, 1956]. We refer to [Cowan, 1966] and [Pippenger, 1989] for a small sample of the numerous results that have been achieved in this direction. The same framework (with stochastic failures of gates or wires) has been applied to analog neural nets in [Siegelmann, 1994].
Noisy Spiking Neurons with Temporal Coding have more Computational Power than Sigmoidal Neurons
Furthermore, it is shown that networks of noisy spiking neurons with temporal coding have strictly larger computational power than sigmoidal neural nets with the same number of units. 1 Introduction and Definitions We consider a formal model SNN for a spiking neuron network that is basically a reformulation of the spike response model (and of the leaky integrate-and-fire model) without using δ-functions (see [Maass, 1996a] or [Maass, 1996b] for further background).
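A rough sketch of temporal coding in a spike-response-style neuron: the membrane potential is a weighted sum of response functions triggered by presynaptic firing times, and the output is the first time the potential reaches a threshold. The piecewise-linear response shape, the time grid, the optional additive Gaussian noise on the potential, and the names `epsp` and `firing_time` are all assumptions for illustration; the paper's SNN model and its noise model are defined more generally.

```python
import numpy as np

def epsp(s, delta=1.0):
    """Assumed piecewise-linear response: 0 for s < 0, rises to 1 at s = delta,
    falls back to 0 at s = 2 * delta."""
    s = np.asarray(s, dtype=float)
    rise = np.clip(s / delta, 0.0, 1.0)
    fall = np.clip(2.0 - s / delta, 0.0, 1.0)
    return np.minimum(rise, fall)

def firing_time(spike_times, weights, theta=1.0, delta=1.0, t_max=10.0,
                dt=1e-3, noise_std=0.0, seed=None):
    """First grid time at which the potential P(t) = sum_i w_i * epsp(t - t_i)
    reaches theta, or None if it never does. noise_std adds hypothetical
    Gaussian noise to the potential."""
    rng = np.random.default_rng(seed)
    t = np.arange(0.0, t_max, dt)
    potential = np.zeros_like(t)
    for w, t_i in zip(weights, spike_times):
        potential += w * epsp(t - t_i, delta)
    if noise_std > 0:
        potential += noise_std * rng.standard_normal(t.shape)
    crossed = np.nonzero(potential >= theta)[0]
    return float(t[crossed[0]]) if crossed.size else None
```

Shifting all input spikes earlier by the same amount shifts the output spike earlier by that amount, so information is carried by spike timing rather than by a firing rate.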
An Apobayesian Relative of Winnow
Littlestone, Nick, Mesterharm, Chris
We study a mistake-driven variant of an online Bayesian learning algorithm (similar to one studied by Cesa-Bianchi, Helmbold, and Panizza [CHP96]). This variant only updates its state (learns) on trials in which it makes a mistake. The algorithm makes binary classifications using a linear-threshold classifier and runs in time linear in the number of attributes seen by the learner. We have been able to show, theoretically and in simulations, that this algorithm performs well under assumptions quite different from those embodied in the prior of the original Bayesian algorithm. It can handle situations that we do not know how to handle in linear time with Bayesian algorithms. We expect our techniques to be useful in deriving and analyzing other apobayesian algorithms. 1 Introduction We consider two styles of online learning.
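The abstract does not give the apobayesian update rule, so the sketch below instead shows classic Winnow2 (Littlestone), the algorithm named in the title as its relative. It shares the properties described above: a linear-threshold classifier, updates only on mistakes, and per-trial cost linear in the number of attributes. The class name, the promotion factor alpha = 2 and the threshold equal to the number of attributes are illustrative choices.

```python
import numpy as np

class Winnow:
    """Classic Winnow2 (a stand-in, not the paper's algorithm):
    multiplicative, mistake-driven updates on binary attributes."""

    def __init__(self, n_attributes, alpha=2.0, theta=None):
        self.w = np.ones(n_attributes)
        self.alpha = alpha
        # Assumed default: threshold equal to the number of attributes.
        self.theta = float(n_attributes) if theta is None else theta
        self.mistakes = 0

    def predict(self, x):
        """Linear-threshold classification of a 0/1 attribute vector."""
        return int(self.w @ x >= self.theta)

    def observe(self, x, label):
        """Predict on x, then update the weights only if the prediction was wrong."""
        guess = self.predict(x)
        if guess != label:
            self.mistakes += 1
            active = x == 1
            if label == 1:
                self.w[active] *= self.alpha  # promotion of active attributes
            else:
                self.w[active] /= self.alpha  # demotion of active attributes
        return guess
```

For example, feeding it random sparse 0/1 vectors labelled by a two-literal disjunction such as x[0] OR x[3] over 50 attributes, the learner stops making mistakes after a few dozen updates.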
Statistical Mechanics of the Mixture of Experts
Kang, K., Oh, J.
The mixture of experts [1, 2] is a well-known example that elegantly implements the philosophy of divide-and-conquer. Although this model is gaining popularity in various applications, there has been little effort to evaluate the generalization capability of these modular approaches theoretically. Here we present the first analytic study of generalization in the mixture of experts from the statistical physics perspective. Statistical mechanics formulations have so far focused on feedforward neural network architectures close to the multilayer perceptron [5, 6], together with the VC theory [8]. We expect that the statistical mechanics approach can also be used effectively to evaluate more advanced architectures, including mixture models.
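For readers unfamiliar with the architecture, a minimal forward pass of a mixture of experts: a softmax gating network assigns each input a probability over experts, and the output is the gate-weighted sum of the expert outputs. Linear experts, a linear softmax gate and the names `softmax` and `mixture_of_experts` are assumptions for illustration; the abstract does not specify the expert or gating architecture analysed in the paper.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mixture_of_experts(x, expert_W, gate_V):
    """Assumed linear experts with a softmax gate.

    x: (n, d) inputs; expert_W: (m, d) expert weights; gate_V: (m, d) gate weights.
    Returns the (n,) combined outputs and the (n, m) gating probabilities.
    """
    g = softmax(x @ gate_V.T)   # divide: how much each expert is responsible
    f = x @ expert_W.T          # conquer: each expert's own prediction
    return (g * f).sum(axis=1), g
```

When the gate is confident, the mixture output reduces to the selected expert's output, which is how the input space ends up partitioned among the experts.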
Practical Confidence and Prediction Intervals
We propose a new method to compute prediction intervals. Especially for small data sets, the width of a prediction interval depends not only on the variance of the target distribution, but also on the accuracy of our estimator of the mean of the target, i.e., on the width of the confidence interval. The confidence interval follows from the variation in an ensemble of neural networks, each of them trained and stopped on a bootstrap replicate of the original data set. A second improvement is the use of residuals on validation patterns instead of on training patterns to estimate the variance of the target distribution. As illustrated on a synthetic example, our method is better than existing methods with regard to extrapolation and interpolation in data regimes with a limited amount of data, and yields prediction intervals whose actual confidence levels are closer to the desired confidence levels. 1 STATISTICAL INTERVALS In this paper we will consider feedforward neural networks for regression tasks: estimating an underlying mathematical function between input and output variables based on a finite number of data points possibly corrupted by noise.
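The two ingredients described above can be sketched as follows: the spread of a bootstrap ensemble gives the model (confidence) variance, and residuals on the patterns left out of each bootstrap replicate give the noise variance. This is a simplified illustration under several assumptions: `np.polyfit` replaces the early-stopped neural networks, the noise variance is a single constant rather than input dependent, the interval uses a normal quantile z = 1.96, and the function name is hypothetical.

```python
import numpy as np

def bootstrap_prediction_interval(x, y, x_new, n_boot=50, degree=5, z=1.96, seed=0):
    """Sketch: ensemble spread -> confidence part; out-of-bag residuals -> noise part.

    np.polyfit is a stand-in for training and stopping a neural network.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    preds_new = np.empty((n_boot, len(x_new)))
    preds_train = np.empty((n_boot, n))
    oob_mask = np.zeros((n_boot, n), dtype=bool)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)           # bootstrap replicate
        oob_mask[b] = ~np.isin(np.arange(n), idx)  # left-out patterns = validation set
        coef = np.polyfit(x[idx], y[idx], degree)
        preds_new[b] = np.polyval(coef, x_new)
        preds_train[b] = np.polyval(coef, x)

    # Confidence part: ensemble mean and variance of the ensemble members.
    mean_new = preds_new.mean(axis=0)
    var_model_new = preds_new.var(axis=0, ddof=1)
    var_model_train = preds_train.var(axis=0, ddof=1)

    # Noise part: residuals of validation (out-of-bag) predictions only.
    counts = oob_mask.sum(axis=0)
    usable = counts > 0
    oob_pred = (preds_train * oob_mask).sum(axis=0)[usable] / counts[usable]
    resid_sq = (y[usable] - oob_pred) ** 2
    # Assumption: one constant noise variance, with the model variance subtracted.
    var_noise = max(float(np.mean(resid_sq - var_model_train[usable])), 1e-12)

    half_width = z * np.sqrt(var_model_new + var_noise)
    return mean_new - half_width, mean_new + half_width
```

Using out-of-bag residuals rather than training residuals avoids the optimistic bias of residuals on data the model was fitted to, which would otherwise shrink the interval.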