Collaborating Authors

Fine, Terrence L.


Asymptotics of Gradient-based Neural Network Training Algorithms

Neural Information Processing Systems

We study the asymptotic properties of the sequence of iterates of weight-vector estimates obtained by training a multilayer feedforward neural network with a basic gradient-descent method using a fixed learning constant and no batch-processing. In the one-dimensional case, an exact analysis establishes the existence of a limiting distribution that is not Gaussian in general. For the general case and small learning constant, a linearization approximation permits the application of results from the theory of random matrices to again establish the existence of a limiting distribution. We study the first few moments of this distribution to compare and contrast the results of our analysis with those of techniques of stochastic approximation.

1 INTRODUCTION

The wide applicability of neural networks to problems in pattern classification and signal processing has been due to the development of efficient gradient-descent algorithms for the supervised training of multilayer feedforward neural networks with differentiable node functions. A basic version uses a fixed learning constant and updates all weights after each training input is presented (online mode) rather than after the entire training set has been presented (batch mode). The properties of this algorithm as exhibited by the sequence of iterates are not yet well understood. There are at present two major approaches.
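For concreteness, here is a minimal sketch of the update rule the abstract describes: a fixed learning constant and a weight update after every presented example, with no batching. The model, loss, and data below are illustrative placeholders chosen here, not taken from the paper.

```python
import numpy as np

def online_gradient_descent(grad, w0, samples, eta=0.01):
    """Online gradient descent with a fixed learning constant eta.

    grad(w, x, y) returns the gradient of the per-example loss;
    the weights are updated after every example (no batching).
    """
    w = np.asarray(w0, dtype=float)
    iterates = [w.copy()]
    for x, y in samples:
        w = w - eta * grad(w, x, y)   # fixed step size; no decay schedule
        iterates.append(w.copy())
    return np.array(iterates)          # the sequence of iterates studied above

# Hypothetical example: scalar linear model y = w * x with squared loss.
rng = np.random.default_rng(0)
data = [(x, 2.0 * x + 0.1 * rng.standard_normal()) for x in rng.standard_normal(500)]
grad_sq = lambda w, x, y: (w * x - y) * x  # d/dw of 0.5 * (w*x - y)^2
trace = online_gradient_descent(grad_sq, w0=0.0, samples=data, eta=0.05)
print(trace[-1])
```

With a fixed eta the iterates do not converge to a point; they keep fluctuating around the minimizer, which is why the paper studies their limiting distribution rather than a limit.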


Sample Size Requirements for Feedforward Neural Networks

Neural Information Processing Systems

We estimate the number of training samples required to ensure that the performance of a neural network on its training data matches that obtained when fresh data is applied to the network. Existing estimates are orders of magnitude higher than practice indicates. This work seeks to narrow the gap between theory and practice by transforming the problem into determining the distribution of the supremum of a random field in the space of weight vectors, which is in turn attacked by applying a recent technique called the Poisson clumping heuristic.
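As a rough formalization of the reduction the abstract sketches (notation chosen here, not taken from the paper): writing nu-hat for the empirical error of the network with weight vector w on m samples and nu for its true error, the required sample size m is the one that controls the tail of a supremum over weight space,

```latex
% Hedged sketch of the quantity being controlled; notation is ours, not the paper's.
% \hat{\nu}_m(w): empirical error on m samples; \nu(w): true error; W: weight space.
P\!\left( \sup_{w \in W} \bigl| \hat{\nu}_m(w) - \nu(w) \bigr| > \epsilon \right) \le \delta
```

and the Poisson clumping heuristic is used to approximate this tail probability for the random field indexed by w.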


Designing Linear Threshold Based Neural Network Pattern Classifiers

Neural Information Processing Systems

The three problems that concern us are identifying a natural domain of pattern classification applications of feedforward neural networks, selecting an appropriate feedforward network architecture, and assessing the tradeoff between network complexity, training set size, and statistical reliability as measured by the probability of incorrect classification. We close with some suggestions for improving the bounds that come from Vapnik-Chervonenkis theory; these can narrow, but not close, the chasm between theory and practice. Neural networks are appropriate as pattern classifiers when the pattern sources are ones of which we have little understanding, beyond perhaps a nonparametric statistical model, but we have been provided with classified samples of features drawn from each of the pattern categories. Neural networks should be able to provide rapid and reliable computation of complex decision functions. The issue in doubt is their statistical response to new inputs.
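For reference, this is the flavor of Vapnik-Chervonenkis bound being improved upon (a standard textbook form, not the paper's own statement): for a class H of classifiers with growth function Pi_H, the uniform deviation between empirical and true misclassification probability satisfies

```latex
% Classical VC uniform-convergence bound (standard form; not quoted from the paper).
% \hat{\nu}_m(h): empirical error of classifier h on m samples; \nu(h): true error.
P\!\left( \sup_{h \in H} \bigl| \hat{\nu}_m(h) - \nu(h) \bigr| > \epsilon \right)
  \le 4\, \Pi_H(2m)\, e^{-m\epsilon^{2}/8}
```

Since Pi_H(2m) grows only polynomially, of order m^d for VC dimension d, the right side vanishes as m grows, but the sample sizes it demands are the source of the chasm between theory and practice noted above.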

