Tesauro, Gerald
Can neural networks do better than the Vapnik-Chervonenkis bounds?
Cohn, David, Tesauro, Gerald
These experiments are designed to test whether average generalization performance can surpass the worst-case bounds obtained from formal learning theory using the Vapnik-Chervonenkis dimension (Blumer et al., 1989). We indeed find that, in some cases, the average generalization is significantly better than the VC bound: the approach to perfect performance is exponential in the number of examples m, rather than the 1/m result of the bound. In other cases, we do find the 1/m behavior of the VC bound, and in these cases, the numerical prefactor is closely related to the prefactor contained in the bound.
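For context, a hedged sketch of the form such a worst-case bound takes (the constants below are indicative only and are not quoted from Blumer et al., 1989):

```latex
% Indicative form of a worst-case VC-style bound: a consistent hypothesis
% drawn from a class of VC dimension d, trained on m random examples,
% generalizes with error \epsilon_m satisfying, with probability at least
% 1 - \delta,
\[
  \epsilon_m \;=\; O\!\left(\frac{1}{m}\left(d \ln \frac{m}{d} + \ln \frac{1}{\delta}\right)\right),
\]
% so the guaranteed approach to perfect performance is only of order 1/m,
% in contrast with the exponential convergence observed in some of the
% experiments above.
```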
Asymptotic Convergence of Backpropagation: Numerical Experiments
Ahmad, Subutai, Tesauro, Gerald, He, Yu
We have calculated, both analytically and in simulations, the rate of convergence at long times in the backpropagation learning algorithm for networks with and without hidden units. Our basic finding for units using the standard sigmoid transfer function is 1/t convergence of the error for large t, with at most logarithmic corrections for networks with hidden units. Other transfer functions may lead to a slower polynomial rate of convergence. Our analytic calculations were presented in (Tesauro, He & Ahmad, 1989). Here we focus in more detail on our empirical measurements of the convergence rate in numerical simulations, which confirm our analytic results.
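As a rough illustration of the kind of 1/t behavior described above (not the paper's simulations), the following minimal sketch trains a single sigmoid unit by gradient descent on a quadratic error with target 1; if the error E(t) decays as 1/t, the product t*E(t) should level off at a constant. The learning rate and horizons are arbitrary choices.

```python
import numpy as np

# Minimal sketch (not the paper's code): gradient descent on a single
# sigmoid unit with its input clamped to 1 and target output 1, using
# quadratic error E = (y - 1)^2 / 2. Asymptotically E(t) should decay
# roughly as 1/t for the sigmoid transfer function.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

w = 0.0                   # single weight (input clamped to 1)
lr = 0.5                  # learning rate (assumed, not from the paper)
horizons = [10_000, 100_000, 1_000_000]
errors = {}

t = 0
for horizon in horizons:
    while t < horizon:
        y = sigmoid(w)
        err = y - 1.0                  # target is 1
        w -= lr * err * y * (1.0 - y)  # gradient of E with respect to w
        t += 1
    errors[horizon] = 0.5 * (sigmoid(w) - 1.0) ** 2

# If E ~ c/t, then t * E should approach a constant at large t.
for horizon, e in errors.items():
    print(f"t = {horizon:>9d}   E = {e:.3e}   t*E = {horizon * e:.3f}")
```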
Neural Network Visualization
Wejchert, Jakub, Tesauro, Gerald
We have developed graphics to visualize static and dynamic information in layered neural network learning systems. Emphasis was placed on creating new visuals that make use of spatial arrangements, size information, animation and color. We applied these tools to the study of back-propagation learning of simple Boolean predicates, and have obtained new insights into the dynamics of the learning process.
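One common way to encode size information and color for a weight matrix is a Hinton-style diagram (square area for magnitude, color for sign); the sketch below illustrates that general idea and is not a reproduction of the visualization tools described here.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hedged sketch: a Hinton-style weight diagram for a hypothetical
# hidden-by-input weight matrix. Square side ~ sqrt(|w|), color = sign.

rng = np.random.default_rng(0)
weights = rng.normal(size=(6, 8))   # hypothetical weight matrix

fig, ax = plt.subplots()
max_w = np.abs(weights).max()
for (i, j), w in np.ndenumerate(weights):
    size = np.sqrt(abs(w) / max_w)            # side length from magnitude
    color = 'white' if w > 0 else 'black'     # sign encoded as color
    ax.add_patch(plt.Rectangle((j - size / 2, i - size / 2), size, size,
                               facecolor=color, edgecolor='gray'))
ax.set_xlim(-1, weights.shape[1])
ax.set_ylim(-1, weights.shape[0])
ax.set_aspect('equal')
ax.set_facecolor('lightgray')
plt.show()
```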
Scaling and Generalization in Neural Networks: A Case Study
Ahmad, Subutai, Tesauro, Gerald
The issues of scaling and generalization have emerged as key issues in current studies of supervised learning from examples in neural networks. Questions such as how many training patterns and training cycles are needed for a problem of a given size and difficulty, how to represent the input, and how to choose useful training exemplars, are of considerable theoretical and practical importance. Several intuitive rules of thumb have been obtained from empirical studies, but as yet there are few rigorous results. In this paper we summarize a study of generalization in the simplest possible case: perceptron networks learning linearly separable functions. The task chosen was the majority function (i.e. return a 1 if a majority of the input units are on), a predicate with a number of useful properties. We find that many aspects of generalization in multilayer networks learning large, difficult tasks are reproduced in this simple domain, in which concrete numerical results and even some analytic understanding can be achieved.
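A toy sketch of this setup is given below: a single-layer perceptron trained on the majority function from random examples and tested on fresh ones. The input size, training-set size, and use of the classic perceptron rule are illustrative assumptions, not the case study's protocol.

```python
import numpy as np

# Toy sketch: a perceptron learning the majority function on n binary
# inputs from m random training examples, then tested on fresh examples.

rng = np.random.default_rng(0)
n, m, m_test, epochs = 25, 200, 2000, 50   # assumed sizes (n odd)

def majority(x):
    # Return 1 if a majority of the inputs are on, else 0.
    return (x.sum(axis=-1) > n / 2).astype(int)

X_train = rng.integers(0, 2, size=(m, n))
y_train = majority(X_train)
X_test = rng.integers(0, 2, size=(m_test, n))
y_test = majority(X_test)

w = np.zeros(n)
b = 0.0
for _ in range(epochs):
    for x, y in zip(X_train, y_train):
        pred = int(w @ x + b > 0)
        if pred != y:                  # classic perceptron update
            w += (y - pred) * x
            b += (y - pred)

test_acc = (np.array([int(w @ x + b > 0) for x in X_test]) == y_test).mean()
print(f"generalization on {m_test} fresh examples: {test_acc:.3f}")
```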
Connectionist Learning of Expert Preferences by Comparison Training
Tesauro, Gerald
A new training paradigm, called the "comparison paradigm," is introduced for tasks in which a network must learn to choose a preferred pattern from a set of n alternatives, based on examples of human expert preferences. In this paradigm, the input to the network consists of two of the n alternatives, and the trained output is the expert's judgement of which pattern is better. This paradigm is applied to the learning of backgammon, a difficult board game in which the expert selects a move from a set of legal moves.
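The comparison idea can be sketched roughly as follows: learn a scoring function from expert preferences over pairs by pushing the preferred alternative's score above the other's. The linear scorer, logistic loss on the score difference, and random "positions" below are simplifying assumptions; the paper's network architecture and backgammon encoding are not reproduced here.

```python
import numpy as np

# Minimal sketch of pairwise comparison training with a linear scorer.

rng = np.random.default_rng(0)
d = 10                                    # hypothetical position-encoding size
w_true = rng.normal(size=d)               # hidden "expert" scoring function

def expert_prefers_first(a, b):
    return float(a @ w_true > b @ w_true)

# Training data: pairs of candidate positions plus the expert's choice.
pairs = [(rng.normal(size=d), rng.normal(size=d)) for _ in range(2000)]
labels = [expert_prefers_first(a, b) for a, b in pairs]

w = np.zeros(d)
lr = 0.1
for (a, b), y in zip(pairs, labels):
    p = 1.0 / (1.0 + np.exp(-(w @ a - w @ b)))  # P(model prefers a over b)
    w -= lr * (p - y) * (a - b)                 # gradient of logistic loss

# At play time, score every legal alternative and pick the highest-scoring one.
candidates = rng.normal(size=(8, d))
best_move = candidates[np.argmax(candidates @ w)]

agreement = np.mean([(w @ a > w @ b) == (y == 1.0) for (a, b), y in zip(pairs, labels)])
print(f"training-pair agreement with the expert: {agreement:.3f}")
```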