Towards Practical Second-Order Optimizers in Deep Learning: Insights from Fisher Information Analysis
arXiv.org Artificial Intelligence
First-order optimization methods remain the standard for training deep neural networks (DNNs). Optimizers like Adam incorporate limited curvature information by preconditioning the stochastic gradient with a diagonal matrix. Despite the widespread adoption of first-order methods, second-order optimization algorithms often exhibit superior convergence compared to methods like Adam and SGD. However, their practicality in training DNNs is still limited by a significantly higher per-iteration computational cost compared to first-order methods. In this thesis, we present AdaFisher, a novel adaptive second-order optimizer that leverages a diagonal block-Kronecker approximation of the Fisher information matrix to adaptively precondition gradients. AdaFisher aims to bridge the gap between the improved convergence and generalization of second-order methods and the computational efficiency needed for training DNNs. Despite the traditionally slower speed of second-order optimizers, AdaFisher is effective for tasks such as image classification and language modeling, exhibiting remarkable stability and robustness during hyperparameter tuning. We demonstrate that AdaFisher outperforms state-of-the-art optimizers in both accuracy and convergence speed. The code is available from https://github.com/AtlasAnalyticsLab/AdaFisher.
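To make the idea of preconditioning the stochastic gradient with a diagonal matrix concrete, here is a minimal NumPy sketch. It is not AdaFisher's algorithm: it assumes the simplest possible curvature estimate, the diagonal of the empirical Fisher (the mean of squared per-sample gradients), and ignores the block-Kronecker structure described in the abstract. The function name `diag_fisher_step` and the parameters `lr` and `damping` are hypothetical, chosen only for illustration.

```python
import numpy as np


def diag_fisher_step(w, per_sample_grads, lr=0.1, damping=1e-3):
    """One gradient step preconditioned by a diagonal empirical Fisher estimate.

    w                : parameter vector, shape (d,)
    per_sample_grads : per-example loss gradients, shape (n, d)
    lr, damping      : illustrative hyperparameters (assumed, not from the paper)
    """
    # Mini-batch gradient: average of the per-sample gradients.
    g = per_sample_grads.mean(axis=0)
    # Diagonal empirical Fisher: average of the element-wise squared gradients.
    fisher_diag = (per_sample_grads ** 2).mean(axis=0)
    # Precondition by the inverse of the damped diagonal; damping avoids division by zero.
    return w - lr * g / (fisher_diag + damping)


# Tiny usage example: two per-sample gradients for a two-parameter model.
w0 = np.array([1.0, 1.0])
grads = np.array([[1.0, 2.0],
                  [3.0, 2.0]])
w1 = diag_fisher_step(w0, grads, lr=1.0, damping=0.0)
```

Because the preconditioner is diagonal, the step costs only O(d) extra memory and time per iteration, which is the same order as Adam's second-moment estimate. Richer approximations such as Kronecker-factored ones trade some of that cheapness for more curvature information.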
Apr-30-2025
- Country:
  - Europe > France (0.28)
  - North America > Canada (0.27)
- Genre:
  - Overview (1.00)
  - Research Report
    - New Finding (1.00)
    - Experimental Study (0.67)
- Industry:
  - Energy (0.45)
- Technology: