Saad, David
The Learning Dynamics of a Universal Approximator
West, Ansgar H. L., Saad, David, Nabney, Ian T.
The learning properties of a universal approximator, a normalized committee machine with adjustable biases, are studied for online back-propagation learning. Within a statistical mechanics framework, numerical studies show that this model has features which do not exist in previously studied two-layer network models without adjustable biases, e.g., attractive suboptimal symmetric phases even for realizable cases and noiseless data. 1 INTRODUCTION Recently there has been much interest in the theoretical breakthrough in the understanding of the online learning dynamics of multi-layer feedforward perceptrons (MLPs) using a statistical mechanics framework. In the seminal paper (Saad & Solla, 1995), a two-layer network with an arbitrary number of hidden units was studied, allowing insight into the learning behaviour of neural network models whose complexity is of the same order as those used in real world applications. The model studied, a soft committee machine (Biehl & Schwarze, 1995), consists of a single hidden layer with adjustable input-hidden, but fixed hidden-output weights. The average learning dynamics of these networks are studied in the thermodynamic limit of infinite input dimensions in a student-teacher scenario, where a student network is presented serially with training examples (ξ^μ, ζ^μ) labelled by a teacher network of the same architecture but a possibly different number of hidden units.
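The setup described in this abstract can be illustrated with a short simulation sketch. The Python/NumPy code below is not taken from the paper; the erf activation, the 1/K output normalization, the network sizes and the learning rate are all assumptions chosen to illustrate a normalized committee machine with adjustable biases trained online on teacher-labelled examples.

    import numpy as np
    from scipy.special import erf

    N, K = 100, 3          # input dimension and number of hidden units (illustrative)
    eta = 0.5              # learning rate (assumed)
    g = lambda x: erf(x / np.sqrt(2.0))                          # activation used in this literature
    dg = lambda x: np.sqrt(2.0 / np.pi) * np.exp(-x ** 2 / 2.0)  # its derivative

    def committee(x, W, b):
        """Normalized committee machine: sigma = (1/K) sum_i g(W_i.x/sqrt(N) - b_i)."""
        a = W @ x / np.sqrt(N) - b
        return np.mean(g(a)), a

    rng = np.random.default_rng(0)
    W_t, b_t = rng.standard_normal((K, N)), rng.standard_normal(K)   # teacher (fixed)
    W_s, b_s = 0.1 * rng.standard_normal((K, N)), np.zeros(K)        # student

    for step in range(10000):                     # one example per step (online learning)
        x = rng.standard_normal(N)
        y_t, _ = committee(x, W_t, b_t)           # teacher label
        y_s, a = committee(x, W_s, b_s)           # student output and hidden pre-activations
        delta = (y_t - y_s) / K * dg(a)           # back-propagated error per hidden unit
        W_s += eta / np.sqrt(N) * np.outer(delta, x)   # gradient step on the input-hidden weights
        b_s -= eta * delta                             # and on the adjustable biases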
Adaptive Back-Propagation in On-Line Learning of Multilayer Networks
West, Ansgar H. L., Saad, David
This research has been motivated by the dominance of the suboptimal symmetric phase in online learning of two-layer feedforward networks trained by gradient descent [2]. This trapping is emphasized for inappropriately small learning rates but exists in all training scenarios, affecting the learning process considerably. We proposed an adaptive back-propagation training algorithm [Eq.
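The adaptive rule itself is referenced above only by a truncated equation, so it is not reproduced here. The sketch below is a hypothetical illustration of how a single adaptation parameter beta might enter the standard back-propagated error of a soft committee machine, namely by rescaling the argument of the hidden-unit derivative; it should not be read as the published algorithm.

    import numpy as np

    def dg(x):
        """Derivative of g(x) = erf(x / sqrt(2))."""
        return np.sqrt(2.0 / np.pi) * np.exp(-x ** 2 / 2.0)

    def adaptive_delta(error, a, K, beta=1.0):
        """Hypothetical adaptive back-propagated hidden-unit errors.

        error: teacher output minus student output for the current example,
        a:     student hidden-unit pre-activations,
        beta:  adaptation parameter; beta = 1 recovers plain back-propagation.
        """
        return error / K * dg(beta * a)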
Dynamics of On-Line Gradient Descent Learning for Multilayer Neural Networks
Saad, David, Solla, Sara A.
We consider the problem of online gradient descent learning for general two-layer neural networks. An analytic solution is presented and used to investigate the role of the learning rate in controlling the evolution and convergence of the learning process. Two-layer networks with an arbitrary number of hidden units have been shown to be universal approximators [1] for N-to-one dimensional maps. We investigate the emergence of generalization ability in an online learning scenario [2], in which the couplings are modified after the presentation of each example so as to minimize the corresponding error. The resulting changes in {J} are described as a dynamical evolution; the number of examples plays the role of time.
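As an illustration of the scenario described (not the analytic solution of the paper), the following Python/NumPy sketch runs online gradient descent for a soft committee machine student against a same-sized teacher and compares two learning rates; alpha, the number of examples per input dimension, plays the role of time, and the generalization error is estimated by Monte Carlo on a held-out test set. All sizes and rates are assumptions.

    import numpy as np
    from scipy.special import erf

    N, K, alpha_max = 200, 2, 50
    g = lambda x: erf(x / np.sqrt(2.0))
    dg = lambda x: np.sqrt(2.0 / np.pi) * np.exp(-x ** 2 / 2.0)
    rng = np.random.default_rng(1)

    B = rng.standard_normal((K, N)) / np.sqrt(N)        # teacher weights (fixed)
    X_test = rng.standard_normal((2000, N))             # test set for a Monte Carlo estimate of eps_g
    y_test = g(X_test @ B.T).sum(axis=1)

    for eta in (0.5, 2.0):                              # compare two learning rates
        J = 0.1 * rng.standard_normal((K, N)) / np.sqrt(N)   # student initial conditions
        for step in range(alpha_max * N):               # one example per step; alpha = step / N
            xi = rng.standard_normal(N)
            x, y = J @ xi, B @ xi                       # student / teacher pre-activations
            delta = (g(y).sum() - g(x).sum()) * dg(x)   # back-propagated error per hidden unit
            J += (eta / N) * np.outer(delta, xi)        # online gradient descent step
        eps_g = 0.5 * np.mean((g(X_test @ J.T).sum(axis=1) - y_test) ** 2)
        print(f"eta={eta}: estimated generalization error at alpha={alpha_max} is {eps_g:.4f}")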
Learning from queries for maximum information gain in imperfectly learnable problems
Sollich, Peter, Saad, David
In supervised learning, learning from queries rather than from random examples can improve generalization performance significantly. We study the performance of query learning for problems where the student cannot learn the teacher perfectly, which occur frequently in practice. As a prototypical scenario of this kind, we consider a linear perceptron student learning a binary perceptron teacher. Two kinds of queries for maximum information gain, i.e., minimum entropy, are investigated: Minimum student space entropy (MSSE) queries, which are appropriate if the teacher space is unknown, and minimum teacher space entropy (MTSE) queries, which can be used if the teacher space is assumed to be known, but a student of a simpler form has deliberately been chosen. We find that for MSSE queries, the structure of the student space determines the efficacy of query learning, whereas MTSE queries lead to a higher generalization error than random examples, due to a lack of feedback about the progress of the student in the way queries are selected.
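For a Gaussian student-space posterior, the minimum-entropy (maximum information gain) query has a simple closed form, which the sketch below illustrates. It captures only the query-selection geometry for MSSE-style queries under assumed conditions (Gaussian weight posterior, fixed query norm, Gaussian output noise), not the linear-student/binary-teacher analysis of the paper.

    import numpy as np

    # The entropy of a Gaussian posterior is (1/2) log det(Sigma) + const; observing a
    # query x reduces it by (1/2) log(1 + x^T Sigma x / sigma2), which is maximized by
    # querying along the leading eigenvector of the posterior covariance Sigma.
    def next_query(Sigma, norm=1.0):
        """Return the input of fixed norm that maximizes the information gain."""
        eigvals, eigvecs = np.linalg.eigh(Sigma)
        return norm * eigvecs[:, -1]            # direction of largest posterior variance

    def update_posterior(Sigma, x, sigma2=0.1):
        """Rank-one covariance update (Sherman-Morrison) after observing the query x."""
        Sx = Sigma @ x
        return Sigma - np.outer(Sx, Sx) / (sigma2 + x @ Sx)

    N = 20
    Sigma = np.eye(N)                           # prior covariance of the linear student
    for t in range(5):
        x = next_query(Sigma)
        Sigma = update_posterior(Sigma, x)
        print(f"query {t}: posterior entropy term = {0.5 * np.linalg.slogdet(Sigma)[1]:.3f}")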
Hyperparameters Evidence and Generalisation for an Unrealisable Rule
Marion, Glenn, Saad, David
Using a statistical mechanical formalism we calculate the evidence, generalisation error and consistency measure for a linear perceptron trained and tested on a set of examples generated by a nonlinear teacher. The teacher is said to be unrealisable because the student can never model it without error. Our model allows us to interpolate between the known case of a linear teacher, and an unrealisable, nonlinear teacher. A comparison of the hyperparameters which maximise the evidence with those that optimise the performance measures reveals that, in the nonlinear case, the evidence procedure is a misleading guide to optimising performance. Finally, we explore the extent to which the evidence procedure is unreliable and find that, despite being sub-optimal, in some circumstances it might be a useful method for fixing the hyperparameters. 1 INTRODUCTION The analysis of supervised learning or learning from examples is a major field of research within neural networks.
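The evidence procedure can be written down compactly for a standard Bayesian linear regression model, which the sketch below assumes (a Gaussian weight prior of variance 1/alpha and Gaussian output noise of variance sigma2); the tanh teacher is only an illustrative stand-in for an unrealisable nonlinear rule, not the paper's specific teacher.

    import numpy as np

    # Marginal likelihood of the outputs under a linear student: with prior w ~ N(0, I/alpha)
    # and noise variance sigma2, y ~ N(0, C) with C = sigma2*I + X X^T / alpha. The evidence
    # procedure fixes the hyperparameters by maximizing log p(y | X, alpha, sigma2).
    def log_evidence(X, y, alpha, sigma2):
        P = len(y)
        C = sigma2 * np.eye(P) + X @ X.T / alpha
        sign, logdet = np.linalg.slogdet(C)
        return -0.5 * (y @ np.linalg.solve(C, y) + logdet + P * np.log(2 * np.pi))

    rng = np.random.default_rng(2)
    N, P = 20, 50
    X = rng.standard_normal((P, N))
    teacher = rng.standard_normal(N) / np.sqrt(N)
    y = np.tanh(X @ teacher)                     # an unrealisable nonlinear teacher (illustrative)

    grid = [0.1, 0.3, 1.0, 3.0, 10.0]
    best = max(((a, s) for a in grid for s in grid), key=lambda h: log_evidence(X, y, *h))
    print("evidence-maximizing (alpha, sigma2):", best)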