
Collaborating Authors: Rohwer, Richard


The Generalisation Cost of RAMnets

Neural Information Processing Systems

Neural Computing Research Group, Aston University, Aston Triangle, Birmingham B4 7ET, UK.

Abstract: Given unlimited computational resources, it is best to use a criterion of minimal expected generalisation error to select a model and determine its parameters. However, it may be worthwhile to sacrifice some generalisation performance for higher learning speed. A method for quantifying sub-optimality is set out here, so that this choice can be made intelligently. Furthermore, the method is applicable to a broad class of models, including ultra-fast memory-based methods such as RAMnets. This brings the added benefit of providing, for the first time, the means to analyse the generalisation properties of such models in a Bayesian framework.

1 Introduction

In order to quantitatively predict the performance of methods such as the ultra-fast RAMnet, which are not trained by minimising a cost function, we develop a Bayesian formalism for estimating the generalisation cost of a wide class of algorithms.
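The cost criterion at issue can be made concrete. For a squared-error cost and a Gaussian-process posterior over the unknown function f, the expected generalisation cost of any candidate model g decomposes as below; the squared-error assumption and the notation are ours, not the paper's.

```latex
% Expected generalisation cost of a model g under the GP posterior,
% assuming a squared-error cost; notation ours, not the paper's.
\mathbb{E}_{f\mid D}\!\left[\int p(x)\,\bigl(g(x)-f(x)\bigr)^{2}\,dx\right]
  \;=\; \int p(x)\,\Bigl[\bigl(g(x)-\bar f(x)\bigr)^{2}
        \;+\; \operatorname{Var}\bigl(f(x)\mid D\bigr)\Bigr]\,dx
```

Here \bar f is the posterior mean given the data D. The variance term does not depend on g, so the choice of model affects only the first term.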


The Generalisation Cost of RAMnets

Neural Information Processing Systems

We follow a similar approach to (Zhu & Rohwer, to appear 1996) in using a Gaussian process to define a prior over the space of functions, so that the expected generalisation cost under the posterior can be determined. The optimal model is defined in terms of the restriction of this posterior to the subspace defined by the model. The optimum is easily determined for linear models over a set of basis functions. We go on to compute the generalisation cost (with an error bar) for all models of this class, which we demonstrate to include the RAMnets.
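For linear models the optimum referred to above has a closed form: project the posterior mean onto the span of the basis functions, weighted by the input density. The numpy sketch below is a minimal illustration under assumed choices (a squared-exponential kernel, a uniform input density, monomial basis functions); none of these details are taken from the paper.

```python
import numpy as np

# Minimal sketch: GP posterior mean, then the optimal linear model over
# fixed basis functions, estimated by Monte Carlo over the input density.
# Kernel, basis, and all constants are illustrative, not the paper's.

rng = np.random.default_rng(0)

def kernel(a, b, scale=0.5):
    """Squared-exponential covariance (Gaussian process prior)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / scale**2)

# Training data D = (X, y) with observation noise variance sigma2.
X = rng.uniform(-1, 1, 12)
y = np.sin(3 * X) + 0.05 * rng.standard_normal(12)
sigma2 = 0.05**2

K = kernel(X, X) + sigma2 * np.eye(len(X))
alpha = np.linalg.solve(K, y)            # K^{-1} y

def posterior_mean(x):
    return kernel(x, X) @ alpha          # \bar{f}(x)

def phi(x):
    """Basis functions of the linear model (here: low-order monomials)."""
    return np.stack([x**0, x, x**2, x**3], axis=1)

# Monte Carlo estimates over p(x) (uniform on [-1, 1]) of
# G_ij = E[phi_i phi_j] and b_i = E[phi_i * posterior mean];
# the optimal coefficients solve G a = b.
xs = rng.uniform(-1, 1, 5000)
P = phi(xs)
G = P.T @ P / len(xs)
b = P.T @ posterior_mean(xs) / len(xs)
a_opt = np.linalg.solve(G, b)

print("optimal linear-model coefficients:", a_opt)
```

The residual of this projection, plus the posterior variance term, gives the generalisation cost of the restricted model.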


Time Trials on Second-Order and Variable-Learning-Rate Algorithms

Neural Information Processing Systems

The performance of seven minimization algorithms is compared on five neural network problems. These include a variable-step-size algorithm, conjugate gradient, and several methods with explicit analytic or numerical approximations to the Hessian.


Time Trials on Second-Order and Variable-Learning-Rate Algorithms

Neural Information Processing Systems

In 4 of these methods the gradient is divided component-wise by a decaying average of either the second derivatives or their absolute values.
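This component-wise rule is easy to state in code. The sketch below assumes a separable quadratic objective whose diagonal second derivatives are known analytically; the decay constant, step size, and epsilon are illustrative choices, not values from the paper.

```python
import numpy as np

# Sketch of the component-wise rule described above: each gradient
# component is divided by a decaying average of the absolute second
# derivative for that component. The quadratic test problem, decay
# constant, and step size are illustrative assumptions.

def f(w):
    """Separable quadratic with known diagonal Hessian."""
    curv = np.array([1.0, 10.0, 100.0])       # diagonal second derivatives
    return 0.5 * np.sum(curv * w**2), curv * w, curv

w = np.array([1.0, 1.0, 1.0])
_, _, d2 = f(w)
h = np.abs(d2)                                 # initialise the average
beta, lr, eps = 0.9, 0.5, 1e-8

for _ in range(50):
    loss, grad, d2 = f(w)
    h = beta * h + (1 - beta) * np.abs(d2)     # decaying average of |d2|
    w = w - lr * grad / (h + eps)              # component-wise scaled step

print("final loss:", f(w)[0])
```

Dividing by the curvature average equalises the effective step size across directions of very different curvature, which is why such methods can outperform plain gradient descent on ill-conditioned problems.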


The "Moving Targets" Training Algorithm

Neural Information Processing Systems

A simple method for training the dynamical behavior of a neural network is derived. It is applicable to any training problem in discrete-time networks with arbitrary feedback. The algorithm resembles back-propagation in that an error function is minimized using a gradient-based method, but the optimization is carried out in the hidden part of state space either instead of, or in addition to, weight space. Computational results are presented for some simple dynamical training problems, one of which requires response to a signal 100 time steps in the past.

1 INTRODUCTION

This paper presents a minimization-based algorithm for training the dynamical behavior of a discrete-time neural network model. The central idea is to treat hidden nodes as target nodes with variable training data.
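The idea is easiest to see as an error function over an enlarged variable set. The sketch below uses scipy's general-purpose minimizer in place of the paper's gradient method, and treats the hidden activations at each time step as free target variables optimized jointly with the weights; the network size, data, and exact form of the error function are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from scipy.optimize import minimize

# Minimal sketch of the "moving targets" idea: hidden activations at each
# time step are treated as free variables ("targets") and optimized jointly
# with the weights. Sizes, data, and error weighting are illustrative.

rng = np.random.default_rng(1)
T, n_in, n_hid, n_out = 5, 1, 2, 1            # time steps and layer sizes
inputs  = rng.standard_normal((T, n_in))       # given input sequence
outputs = 0.5 * rng.standard_normal((T, n_out))  # desired output sequence

n_w = (n_in + n_hid) * (n_hid + n_out)         # weight matrix size
n_t = T * n_hid                                # one hidden target per step

def unpack(p):
    W = p[:n_w].reshape(n_in + n_hid, n_hid + n_out)
    H = p[n_w:].reshape(T, n_hid)              # the moving hidden targets
    return W, H

def error(p):
    W, H = unpack(p)
    e, h_prev = 0.0, np.zeros(n_hid)
    for t in range(T):
        s = np.tanh(np.concatenate([inputs[t], h_prev]) @ W)
        h, y = s[:n_hid], s[n_hid:]
        e += np.sum((h - H[t])**2)             # hidden units miss targets
        e += np.sum((y - outputs[t])**2)       # output units miss the data
        h_prev = H[t]                          # next step runs FROM targets
    return e

p0 = 0.1 * rng.standard_normal(n_w + n_t)
res = minimize(error, p0, method="L-BFGS-B")
print("final error:", res.fun)
```

Because each step of the dynamics is run from the target values rather than the network's own previous state, the long-range credit-assignment problem is converted into a set of local, single-step fitting problems coupled through the shared targets.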

