Asymptotics of Gradient-based Neural Network Training Algorithms
Mukherjee, Sayandev, Fine, Terrence L.
Neural Information Processing Systems
We study the asymptotic properties of the sequence of weight-vector iterates obtained by training a multilayer feedforward neural network with a basic gradient-descent method using a fixed learning constant and no batch-processing. In the one-dimensional case, an exact analysis establishes the existence of a limiting distribution that is not Gaussian in general. For the general case and small learning constant, a linearization approximation permits the application of results from the theory of random matrices to again establish the existence of a limiting distribution. We study the first few moments of this distribution to compare and contrast the results of our analysis with those of techniques of stochastic approximation.

1 INTRODUCTION

The wide applicability of neural networks to problems in pattern classification and signal processing has been due to the development of efficient gradient-descent algorithms for the supervised training of multilayer feedforward neural networks with differentiable node functions. A basic version uses a fixed learning constant and updates all weights after each training input is presented (online mode) rather than after the entire training set has been presented (batch mode); a minimal sketch of this update rule is given below. The properties of this algorithm, as exhibited by the sequence of iterates, are not yet well understood. There are at present two major approaches.
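To make the update rule concrete, the following is a minimal sketch of online gradient descent with a fixed learning constant on a small feedforward network. The single-hidden-layer architecture, squared-error loss, target function, and parameter values are illustrative assumptions for this sketch, not the paper's setup.

import numpy as np

# Minimal sketch: online gradient descent with a fixed learning constant.
# Network, loss, and data below are placeholder assumptions, not the paper's setup.
rng = np.random.default_rng(0)
eta = 0.05                        # fixed learning constant (no annealing)
W1 = rng.normal(size=(3, 2))      # hidden-layer weights (3 hidden units, 2 inputs)
w2 = rng.normal(size=3)           # output-layer weights

def forward(x):
    h = np.tanh(W1 @ x)           # differentiable node function
    return h, w2 @ h

for t in range(1000):             # one weight update per training input (online mode)
    x = rng.normal(size=2)
    y = np.sin(x[0])              # illustrative target
    h, y_hat = forward(x)
    err = y_hat - y
    # Gradients of the squared error with respect to the weights
    grad_w2 = err * h
    grad_W1 = np.outer(err * w2 * (1.0 - h**2), x)
    # Fixed-step update after each presentation (no batch-processing)
    w2 -= eta * grad_w2
    W1 -= eta * grad_W1

The sequence of weight vectors (W1, w2) produced by this loop is the object whose limiting distribution the paper analyzes.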
Dec-31-1995