Scalable Adaptive Stochastic Optimization Using Random Projections

Gabriel Krummenacher, Brian McWilliams, Yannic Kilcher, Joachim M. Buhmann, Nicolai Meinshausen

Mar-23-2026, 20:51:58 GMT–Neural Information Processing Systems

Adaptive stochastic gradient methods such as ADAGRAD have gained popularity, in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second-order information by accumulating past gradients, which are used to tune the step size adaptively. In certain situations, the full-matrix variant of ADAGRAD is expected to attain better performance; however, in high dimensions it is computationally impractical.
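The contrast between the two variants can be sketched in a few lines of NumPy. This is a minimal illustration of the standard ADAGRAD updates, not the paper's random-projection method, which this excerpt does not describe. The function names, the step size `eta`, and the regulariser `eps` are our own choices. The diagonal variant rescales each coordinate by the root of its accumulated squared gradients. The full-matrix variant accumulates outer products and applies $(\epsilon I + G^{1/2})^{-1}$ to the gradient.

```python
import numpy as np


def adagrad_diag_step(x, g, G_diag, eta=0.1, eps=1e-8):
    """One diagonal ADAGRAD step: O(d) memory and time per step."""
    G_diag = G_diag + g * g                        # accumulate squared gradients per coordinate
    x = x - eta * g / (np.sqrt(G_diag) + eps)      # per-coordinate adaptive step size
    return x, G_diag


def adagrad_full_step(x, g, G_full, eta=0.1, eps=1e-8):
    """One full-matrix ADAGRAD step: O(d^2) memory, O(d^3) time per step."""
    G_full = G_full + np.outer(g, g)               # accumulate outer products g g^T
    # Apply (eps*I + G^{1/2})^{-1} through a symmetric eigendecomposition of G.
    w, V = np.linalg.eigh(G_full)
    scale = 1.0 / (np.sqrt(np.maximum(w, 0.0)) + eps)  # clip tiny negative round-off
    x = x - eta * (V @ (scale * (V.T @ g)))
    return x, G_full


# Usage: an ill-conditioned quadratic f(x) = 0.5 x^T A x with rotated axes,
# so a coordinate-wise (diagonal) preconditioner cannot align with them.
A = np.array([[50.5, 49.5], [49.5, 50.5]])         # eigenvalues 100 and 1


def f(x):
    return 0.5 * x @ A @ x


x_diag, G_diag = np.array([1.0, 0.0]), np.zeros(2)
x_full, G_full = np.array([1.0, 0.0]), np.zeros((2, 2))
for _ in range(200):
    x_diag, G_diag = adagrad_diag_step(x_diag, A @ x_diag, G_diag)
    x_full, G_full = adagrad_full_step(x_full, A @ x_full, G_full)

print(f(x_diag), f(x_full))
```

The diagonal variant stores only d accumulated squares. The full-matrix variant stores a d × d matrix and eigendecomposes it at every step. That per-step cost is what becomes impractical when d is the number of weights in a deep network.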

artificial intelligence, deep learning, machine learning

Country:
- Europe > Switzerland (0.14)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Neural Networks > Deep Learning (1.00)
