
Collaborating Authors

 Saad, David




Dynamics of Supervised Learning with Restricted Training Sets

Neural Information Processing Systems

We study the dynamics of supervised learning in layered neural networks, in the regime where the size p of the training set is proportional to the number N of inputs. Here the local fields are no longer described by Gaussian distributions.
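As a toy illustration of this regime (not the paper's analytical method), one can hold the ratio alpha = p/N fixed and sample a training set whose size scales with the input dimension; all constants below are illustrative assumptions. A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 500          # number of inputs
alpha = 0.5      # ratio p/N held fixed as N grows
p = int(alpha * N)

# Random training inputs and teacher-generated labels
xi = rng.standard_normal((p, N))
teacher = rng.standard_normal(N)
labels = np.sign(xi @ teacher / np.sqrt(N))

# Student local fields restricted to the finite training set
student = rng.standard_normal(N)
fields = xi @ student / np.sqrt(N)
print(fields.shape)
```

With p proportional to N, the same p examples are revisited during training, which is what breaks the Gaussian description of the local fields that holds in the infinite-training-set limit.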




Globally Optimal On-line Learning Rules

Neural Information Processing Systems

We present a method for determining the globally optimal online learning rule for a soft committee machine under a statistical mechanics framework. This work complements previous results on locally optimal rules, where only the rate of change in generalization error was considered. We maximize the total reduction in generalization error over the whole learning process and show how the resulting rule can significantly outperform the locally optimal rule.

1 Introduction

We consider a learning scenario in which a feed-forward neural network model (the student) emulates an unknown mapping (the teacher), given a set of training examples produced by the teacher. The performance of the student network is typically measured by its generalization error, which is the expected error on an unseen example. The aim of training is to reduce the generalization error by adapting the student network's parameters appropriately. A common form of training is online learning, where training patterns are presented sequentially and independently to the network at each learning step. This form of training can be beneficial in terms of both storage and computation time, especially for large systems.
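The basic online scenario can be sketched as plain online gradient descent for a soft committee machine student learning from a matching teacher (this is the generic setup, not the paper's globally optimal rule; the tanh activation, learning rate, and sizes below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

N, K, M = 100, 3, 3          # inputs, student and teacher hidden units
eta = 0.5                    # learning rate (illustrative value)
g = np.tanh                  # activation (tanh here; erf is also common)

B = rng.standard_normal((M, N))        # fixed teacher weights
J = rng.standard_normal((K, N)) * 0.1  # student weights, small init

def output(W, x):
    # soft committee machine: unweighted sum of hidden-unit activations
    return g(W @ x / np.sqrt(N)).sum()

errors = []
for step in range(5000):
    x = rng.standard_normal(N)          # fresh example at each step
    y = output(B, x)                    # teacher label
    h = J @ x / np.sqrt(N)              # student local fields
    delta = output(J, x) - y            # output error on this example
    # gradient step on 0.5*delta^2; (1 - tanh^2) is the activation derivative
    J -= eta / np.sqrt(N) * delta * (1 - g(h) ** 2)[:, None] * x[None, :]
    errors.append(0.5 * delta ** 2)
```

Averaging `errors` over windows gives a rough picture of the learning curve that the statistical mechanics analysis describes exactly in the large-N limit.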


Two Approaches to Optimal Annealing

Neural Information Processing Systems

We employ both master equation and order parameter approaches to analyze the asymptotic dynamics of online learning with different learning rate annealing schedules. We examine the relations between the results obtained by the two approaches and obtain new results on the optimal decay coefficients and their dependence on the number of hidden nodes in a two-layer architecture.

The latter studies are based on examining the Kramers-Moyal expansion of the master equation for the weight-space probability densities. A different approach, based on the deterministic dynamics of macroscopic quantities called order parameters, has been recently presented [6, 7].
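The role of the decay coefficient can be seen in a toy problem (a scalar illustration, not the paper's master-equation or order-parameter analysis): online estimation of a mean with annealed rate eta_t = c/t, where the asymptotic error decays as 1/t only when c is large enough. All values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def run(c, T=20000, reps=200, sigma=1.0):
    # online estimation of a scalar mean mu = 0 with annealed rate eta_t = c/t
    w = np.zeros(reps)
    for t in range(1, T + 1):
        x = sigma * rng.standard_normal(reps)   # noisy samples of the mean
        w -= (c / t) * (w - x)
    return np.mean(w ** 2)                      # MSE across repetitions

# for c > 1/2 the MSE decays like c^2 sigma^2 / ((2c - 1) t);
# too small a decay coefficient (c <= 1/2) gives slower-than-1/t decay
mse_good, mse_small = run(1.0), run(0.1)
print(mse_good, mse_small)
```

The simulation shows the qualitative effect that the annealing analyses quantify: an under-sized decay coefficient leaves a much larger residual error at the same number of steps.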


Learning with Noise and Regularizers in Multilayer Neural Networks

Neural Information Processing Systems

We study the effect of noise and regularization in an online gradient-descent learning scenario for a general two-layer student network with an arbitrary number of hidden units. Training examples are randomly drawn input vectors labeled by a two-layer teacher network with an arbitrary number of hidden units; the examples are corrupted by Gaussian noise affecting either the output or the model itself. We examine the effect of both types of noise and that of weight-decay regularization on the dynamical evolution of the order parameters and the generalization error in various phases of the learning process.

1 Introduction

One of the most powerful and commonly used methods for training large layered neural networks is that of online learning, whereby the internal network parameters {J} are modified after the presentation of each training example so as to minimize the corresponding error.
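A minimal sketch of this setup (with a linear student and teacher rather than the paper's general two-layer networks; the learning rate, decay strength, and noise level are illustrative assumptions): online gradient descent on examples whose teacher outputs carry Gaussian noise, with a weight-decay pull toward zero:

```python
import numpy as np

rng = np.random.default_rng(3)

N = 200
eta, gamma, sigma = 0.1, 0.01, 0.5   # rate, weight decay, output noise

B = rng.standard_normal(N)           # teacher weights
J = np.zeros(N)                      # student weights

for step in range(20000):
    x = rng.standard_normal(N)
    y = B @ x / np.sqrt(N) + sigma * rng.standard_normal()  # noisy label
    delta = J @ x / np.sqrt(N) - y
    # gradient step on 0.5*delta^2 plus the weight-decay term
    J -= (eta / np.sqrt(N)) * delta * x + (eta * gamma / N) * J

# noise-free generalization error via the student-teacher overlap
eg = 0.5 * ((J - B) @ (J - B)) / N
print(eg)
```

Even in this linear caricature the two effects in the abstract are visible: output noise sets a residual error floor that scales with the learning rate, while weight decay biases the student slightly away from the teacher in exchange for suppressing noise-driven fluctuations.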