Gradient Descent: Second Order Momentum and Saturating Error
We regard gradient descent with momentum as a dynamical system and explore a non-quadratic error surface, showing that saturation of the error accounts for a variety of effects observed in simulations and justifies some popular heuristics.

1 INTRODUCTION

Gradient descent is the bread-and-butter optimization technique of neural networks; some groups even build special-purpose hardware to accelerate gradient descent optimization of backpropagation networks. Understanding the dynamics of gradient descent on neural network error surfaces is therefore of great practical value. Here we briefly review the known results on the convergence of batch gradient descent; show that second-order momentum does not give any speedup; simulate a real network and observe some effects not predicted by theory; and account for these effects by analyzing gradient descent with momentum on a saturating error surface.
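To make the dynamical-system view concrete, the following minimal Python sketch runs gradient descent with momentum on a saturating surface. The surface E(w) = 1 - exp(-w^2/2), the interpretation of "second-order momentum" as an extra term proportional to the update two steps back, and all parameter values are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def saturating_error(w):
    # Illustrative saturating surface: quadratic near the minimum,
    # flattening to a plateau far away (assumed, not the paper's surface).
    return 1.0 - np.exp(-0.5 * w**2)

def grad(w):
    # dE/dw for the surface above; note it vanishes far from the minimum.
    return w * np.exp(-0.5 * w**2)

def descend(w0, eta=0.1, alpha=0.9, beta=0.0, steps=200):
    # Gradient descent with first-order momentum alpha and an assumed
    # second-order momentum beta applied to the update two steps back:
    #   dw(t) = -eta * dE/dw + alpha * dw(t-1) + beta * dw(t-2)
    w, dw1, dw2 = w0, 0.0, 0.0
    trace = [w]
    for _ in range(steps):
        dw = -eta * grad(w) + alpha * dw1 + beta * dw2
        w, dw1, dw2 = w + dw, dw, dw1
        trace.append(w)
    return np.array(trace)

# Starting on the plateau, the saturated gradient makes early progress
# slow with or without the second-order term; compare trajectories:
for beta in (0.0, 0.05):
    t = descend(w0=4.0, beta=beta)
    print(f"beta={beta}: final w = {t[-1]:.4f}, "
          f"E = {saturating_error(t[-1]):.4f}")
```

Treating the update as a dynamical system in (w, dw1, dw2) is what lets one study its fixed points and convergence rates, which is the perspective the analysis above takes.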
Neural Information Processing Systems
Dec-31-1992