On Iterative Krylov-Dogleg Trust-Region Steps for Solving Neural Networks Nonlinear Least Squares Problems
Our algorithm exploits the special structure of the sum-of-squared-error measure in Equation (1); hence, other objective functions are outside the scope of this paper. The gradient vector and Hessian matrix are given by $g \equiv g(\theta) = J^T r$ and $H \equiv H(\theta) = J^T J + S$, where $J$ is the $m \times n$ Jacobian matrix of $r$, and $S$ denotes the matrix of second-derivative terms. If $S$ is simply omitted based on the "small residual" assumption, then the Hessian matrix reduces to the Gauss-Newton model Hessian: i.e., $J^T J$. Furthermore, a family of quasi-Newton methods can be applied to approximate the term $S$ alone, leading to the augmented Gauss-Newton model Hessian (see, for example, Mizutani [2] and references therein).
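To make these quantities concrete, the following NumPy sketch (illustrative only; the residual vector and Jacobian values are arbitrary placeholders, not taken from the paper) forms the gradient $g = J^T r$ and the Gauss-Newton model Hessian $J^T J$ obtained by dropping the second-derivative term $S$.

```python
import numpy as np

# Illustrative sketch, assuming an objective E(theta) = 0.5 * r(theta)^T r(theta).
# m = 4 residuals, n = 3 parameters; the values below are placeholders.
rng = np.random.default_rng(0)
J = rng.standard_normal((4, 3))   # m x n Jacobian of r at the current theta
r = rng.standard_normal(4)        # residual vector r(theta)

g = J.T @ r        # gradient vector: g = J^T r
H_gn = J.T @ J     # Gauss-Newton model Hessian: J^T J
                   # (the second-derivative term S is omitted here,
                   #  per the "small residual" assumption)
```

In practice one would never form $H$ explicitly for large networks; the sketch only spells out the algebra behind the Gauss-Newton approximation discussed above.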
Neural Information Processing Systems
Dec-31-2001