HesScale: Scalable Computation of Hessian Diagonals

Elsayed, Mohamed, Mahmood, A. Rupam

arXiv.org Artificial Intelligence 

Second-order optimization uses curvature information about the objective function, which can help in faster convergence. However, such methods typically require expensive computation of the Hessian matrix, preventing their usage in a scalable way. The absence of efficient ways of computation drove the most widely used methods to focus on first-order approximations that do not capture the curvature information. In this paper, we develop HesScale, a scalable approach to approximating the diagonal of the Hessian matrix, to incorporate second-order information in a computationally efficient manner. We show that HesScale has the same computational complexity as backpropagation. Our results on supervised classification show that HesScale achieves high approximation accuracy, allowing for scalable and efficient second-order optimization. First-order optimization offers a cheap and efficient way of performing local progress in optimization problems by using gradient information. However, their performance suffers from instability or slow progress when used in ill-conditioned landscapes. Such a problem is present because firstorder methods do not capture curvature information which causes two interrelated issues. First, the updates in first-order have incorrect units (Duchi et al. 2011), which creates a scaling issue.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found