Spectral-factorized Positive-definite Curvature Learning for NN Training
Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Roger B. Grosse
Many training methods, such as Adam(W) and Shampoo, learn a positive-definite curvature matrix and apply an inverse root before preconditioning. Recently, non-diagonal training methods, such as Shampoo, have gained significant attention; however, they remain computationally inefficient and are limited to specific types of curvature information due to the costly matrix root computation via matrix decomposition. To address this, we propose a Riemannian optimization approach that dynamically adapts spectral-factorized positive-definite curvature estimates, enabling the efficient application of arbitrary matrix roots and generic curvature learning. We demonstrate the efficacy and versatility of our approach in positive-definite matrix optimization and covariance adaptation for gradient-free optimization, as well as its efficiency in curvature learning for neural net training.
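To illustrate why a spectral factorization makes arbitrary matrix roots cheap, the following is a minimal NumPy sketch, not the authors' algorithm. It assumes the curvature estimate is held as S = B diag(d) Bᵀ with B orthogonal and d positive, so that S^(-1/p) = B diag(d^(-1/p)) Bᵀ for any p > 0. The names `apply_inverse_root`, `B`, `d`, `G`, and `p` are hypothetical, and `np.linalg.eigh` is used here only to produce example factors; the paper instead adapts the factors directly with Riemannian updates.

```python
import numpy as np

def apply_inverse_root(B, d, G, p=2.0):
    """Return (B diag(d) B^T)^(-1/p) @ G using only the spectral factors.

    B: (n, n) orthogonal eigenbasis, d: (n,) positive eigenvalues,
    G: (n, k) matrix to precondition (e.g. a gradient), p: root order.
    """
    # Rotate into the eigenbasis, rescale each coordinate, rotate back.
    return B @ ((d ** (-1.0 / p))[:, None] * (B.T @ G))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
S = A @ A.T + 4.0 * np.eye(4)    # positive-definite stand-in for a curvature estimate
d, B = np.linalg.eigh(S)         # illustration only: obtain example factors once
G = rng.standard_normal((4, 3))  # stand-in gradient

# Any root order reuses the same factors; no new decomposition is needed.
half_step = apply_inverse_root(B, d, G, p=2.0)
full_step = apply_inverse_root(B, d, half_step, p=2.0)  # equals S^{-1} G
```

Changing the root only changes the exponent applied to `d`, which is the property that lets a factorized estimate support an inverse square root (as in Adam-style updates) or an inverse fourth root (as in Shampoo) at the same cost.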
Feb-10-2025