Gradient Descent: Second Order Momentum and Saturating Error
We regard gradient descent with momentum as a dynamical system and explore a non-quadratic error surface, showing that saturation of the error accounts for a variety of effects observed in simulations and justifies some popular heuristics.

1 INTRODUCTION

Gradient descent is the bread-and-butter optimization technique of neural networks; some groups even build special-purpose hardware to accelerate gradient descent optimization of backpropagation networks. Understanding the dynamics of gradient descent on neural network error surfaces is therefore of great practical value. Here we briefly review the known results on the convergence of batch gradient descent; show that second-order momentum does not give any speedup; simulate a real network and observe some effects not predicted by theory; and account for these effects by analyzing gradient descent with momentum on a saturating error surface.
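To make the dynamical-system view concrete, the following minimal Python sketch runs gradient descent with momentum on a saturating surface. The surface E(w) = 1 - exp(-w^2/2), the interpretation of "second-order momentum" as an extra term proportional to the update two steps back, and all parameter values are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def saturating_error(w):
    # Illustrative saturating surface: quadratic near the minimum,
    # flattening to a plateau far away (assumed, not the paper's surface).
    return 1.0 - np.exp(-0.5 * w**2)

def grad(w):
    # dE/dw for the surface above; note it vanishes far from the minimum.
    return w * np.exp(-0.5 * w**2)

def descend(w0, eta=0.1, alpha=0.9, beta=0.0, steps=200):
    # Gradient descent with first-order momentum alpha and an assumed
    # second-order momentum beta applied to the update two steps back:
    #   dw(t) = -eta * dE/dw + alpha * dw(t-1) + beta * dw(t-2)
    w, dw1, dw2 = w0, 0.0, 0.0
    trace = [w]
    for _ in range(steps):
        dw = -eta * grad(w) + alpha * dw1 + beta * dw2
        w, dw1, dw2 = w + dw, dw, dw1
        trace.append(w)
    return np.array(trace)

# Starting on the plateau, the saturated gradient makes early progress
# slow with or without the second-order term; compare trajectories:
for beta in (0.0, 0.05):
    t = descend(w0=4.0, beta=beta)
    print(f"beta={beta}: final w = {t[-1]:.4f}, "
          f"E = {saturating_error(t[-1]):.4f}")
```

Treating the update as a dynamical system in (w, dw1, dw2) is what lets one study its fixed points and convergence rates, which is the perspective the analysis above takes.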
Neural Information Processing Systems
Dec-31-1992