Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks

Adeoye, Adeyemi D., Petersen, Philipp Christian, Bemporad, Alberto

arXiv.org Artificial Intelligence 

Despite their superior convergence rates compared to first-order methods, (approximate) second-order methods are still rarely used, and hence underexplored, for training large-scale machine learning and neural network (NN) models. This is due to their prohibitive computational cost and memory footprint at each iteration. Several past and recent works have nevertheless sought to reduce this overhead by proposing different approximations to the Hessian of the loss function, the curvature information that these methods ultimately exploit to achieve their impressive convergence properties (see, e.g., [1, 2, 3, 4, 5, 6, 7, 8, 9]). One of the most appealing approximations to the Hessian matrix in practical deep learning, and in nonlinear optimization in general, is the generalized Gauss-Newton (GGN) approximation of [10], which uses a positive semi-definite (PSD) matrix to model the curvature of an arbitrary convex loss function. In fact, the Fisher information matrix (FIM), a curvature-approximating matrix that most other approximate second-order methods seek to estimate, has been shown to have direct connections with the GGN matrix in many practical cases [4, 11]. Despite this close connection, the FIM, unlike the GGN matrix, can over-approximate the second-order terms for more general loss functions, thereby discarding relevant curvature information [10]. In addition to the desirable property of maintaining positive-definiteness throughout the training procedure, other useful properties of the GGN matrix, in comparison with the Hessian matrix, are discussed in [12, Section 8.1]; see also [13] for discussions in the context of nonlinear least-squares estimation and [14] for efficient training of (deep) recurrent neural networks with a GGN approach.
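To make the GGN approximation concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a hypothetical one-layer model f(w) = tanh(Xw) and a squared-error loss, for which the Hessian of the loss with respect to the model outputs is the identity, so the GGN matrix reduces to J^T J, where J is the Jacobian of the outputs with respect to the parameters. The function name `ggn_matrix`, the data sizes, and the damping value `lam` are illustrative choices, not taken from the paper.

```python
import numpy as np

def ggn_matrix(X, w):
    """GGN matrix J^T H_L J for f(w) = tanh(X w) with loss 0.5 * ||f - y||^2."""
    z = X @ w
    # Jacobian of the model outputs with respect to w: diag(1 - tanh(z)^2) X
    J = (1.0 - np.tanh(z) ** 2)[:, None] * X
    # Hessian of the squared-error loss with respect to the outputs is the identity
    H_L = np.eye(X.shape[0])
    return J.T @ H_L @ J

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))   # 5 samples, 8 parameters (more parameters than samples)
w = rng.normal(size=8)

G = ggn_matrix(X, w)
eigvals = np.linalg.eigvalsh(G)          # eigenvalues of the symmetric GGN matrix

lam = 1e-2                               # assumed damping value, for illustration only
eigvals_damped = np.linalg.eigvalsh(G + lam * np.eye(8))
```

With more parameters than samples, J has at most 5 linearly independent rows, so G is PSD but singular; adding a small multiple of the identity, as in the last line, is one common way to obtain a strictly positive-definite matrix.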
