Using Curvature Information for Fast Stochastic Search

Feb-17-2024, 06:16:37 GMT – Neural Information Processing Systems

We present an algorithm for fast stochastic gradient descent that uses a nonlinear adaptive momentum scheme to optimize the late-time convergence rate. The algorithm makes effective use of curvature information, requires only O(n) storage and computation, and delivers convergence rates close to the theoretical optimum. We demonstrate the technique on linear and large nonlinear backprop networks. Learning algorithms that perform gradient descent on a cost function can be formulated in either stochastic (on-line) or batch form. Stochastic learning provides several advantages over batch learning.
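The distinction between batch and stochastic (on-line) gradient descent can be illustrated with a minimal sketch on a noise-free linear least-squares problem. This is not the paper's adaptive momentum algorithm: the stochastic variant below uses a plain, fixed momentum term, and all names (`make_data`, `batch_step`, `stochastic_step`, `train_batch`, `train_stochastic`) as well as the learning rates and momentum value are assumptions chosen for illustration. Both variants keep only O(n) state for n weights.

```python
import random


def make_data(n_samples, true_w, seed=0):
    """Generate noise-free linear data y = true_w . x (hypothetical toy problem)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = sum(wi * xi for wi, xi in zip(true_w, x))
        data.append((x, y))
    return data


def batch_step(w, data, lr):
    """Batch form: one update from the gradient averaged over all samples."""
    grad = [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi / len(data)
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def stochastic_step(w, v, sample, lr, momentum):
    """Stochastic form: one update from a single sample, with fixed momentum."""
    x, y = sample
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    # Velocity accumulates past single-sample gradients (O(n) extra storage).
    v = [momentum * vi - lr * err * xi for vi, xi in zip(v, x)]
    w = [wi + vi for wi, vi in zip(w, v)]
    return w, v


def train_batch(data, dim, lr=0.5, steps=500):
    w = [0.0] * dim
    for _ in range(steps):
        w = batch_step(w, data, lr)
    return w


def train_stochastic(data, dim, lr=0.1, momentum=0.5, epochs=50, seed=1):
    rng = random.Random(seed)
    w, v = [0.0] * dim, [0.0] * dim
    for _ in range(epochs):
        # Visit the samples in a fresh random order each epoch.
        for sample in rng.sample(data, len(data)):
            w, v = stochastic_step(w, v, sample, lr, momentum)
    return w


if __name__ == "__main__":
    true_w = [2.0, -1.0]
    data = make_data(200, true_w)
    print("batch:     ", train_batch(data, len(true_w)))
    print("stochastic:", train_stochastic(data, len(true_w)))
```

In this sketch the batch update touches every sample before changing the weights, while the stochastic update changes the weights after each sample. The fixed momentum value here is a stand-in only; the adaptive, curvature-aware choice of momentum is the subject of the paper and is not reproduced.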

algorithm, fast stochastic search, learning

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (0.54)
  - Machine Learning > Statistical Learning
    - Gradient Descent (0.80)