On Iterative Krylov-Dogleg Trust-Region Steps for Solving Neural Networks Nonlinear Least Squares Problems

Mizutani, Eiji, Demmel, James

Neural Information Processing Systems 

Our algorithm exploits the special structure of the sum-of-squared-error measure in Equation (1); hence, other objective functions are outside the scope of this paper. The gradient vector and Hessian matrix are given by $g \equiv g(\theta) = J^T r$ and $H \equiv H(\theta) = J^T J + S$, where $J$ is the $m \times n$ Jacobian matrix of $r$, and $S$ denotes the matrix of second-derivative terms. If $S$ is simply omitted based on the "small residual" assumption, then the Hessian matrix reduces to the Gauss-Newton model Hessian, i.e., $J^T J$. Furthermore, a family of quasi-Newton methods can be applied to approximate the term $S$ alone, leading to the augmented Gauss-Newton model Hessian (see, for example, Mizutani [2] and references therein).
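As a minimal illustration of these quantities (not the paper's Krylov-dogleg algorithm), the sketch below forms the gradient $g = J^T r$ and the Gauss-Newton model Hessian $J^T J$ for a small least-squares problem and takes a plain Gauss-Newton step. The residual model, the finite-difference Jacobian, and all names (`residuals`, `jacobian`, `theta`) are assumptions introduced only for this example.

```python
import numpy as np

def residuals(theta, x, y):
    # Hypothetical residual vector r(theta) for a tiny nonlinear model
    # (illustrative only): r_i = y_i - tanh(theta[0]*x_i + theta[1]).
    return y - np.tanh(theta[0] * x + theta[1])

def jacobian(theta, x, y, eps=1e-6):
    # Forward-difference m x n Jacobian of r with respect to theta.
    r0 = residuals(theta, x, y)
    J = np.empty((r0.size, theta.size))
    for j in range(theta.size):
        t = theta.copy()
        t[j] += eps
        J[:, j] = (residuals(t, x, y) - r0) / eps
    return J

# Toy data and starting parameters (assumed for the example).
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = np.tanh(1.5 * x - 0.3) + 0.01 * rng.normal(size=20)
theta = np.array([1.0, 0.0])

r = residuals(theta, x, y)
J = jacobian(theta, x, y)

g = J.T @ r      # gradient of (1/2)||r||^2:  g = J^T r
H_gn = J.T @ J   # Gauss-Newton model Hessian: J^T J (S omitted)

# A plain Gauss-Newton step (no trust region), solving J^T J p = -g.
step = np.linalg.solve(H_gn + 1e-10 * np.eye(theta.size), -g)
print("gradient:", g)
print("Gauss-Newton step:", step)
```

In the full Hessian $H = J^T J + S$, the omitted matrix $S$ collects the residual-weighted second derivatives; the code above corresponds to dropping $S$ under the small-residual assumption described in the text.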
