Practical Quasi-Newton Methods for Training Deep Neural Networks

Neural Information Processing Systems 

In our proposed methods, we approximate the Hessian by a block-diagonal matrix and use the structure of the gradient and Hessian to further approximate these blocks, each of which corresponds to a layer, as the Kronecker product of two much smaller matrices.
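To illustrate why a Kronecker-factored block is cheap to work with, the following is a minimal sketch, not the paper's algorithm. It assumes each layer's block has the form A ⊗ B with small symmetric positive-definite factors, and it uses random stand-ins for those factors and for the layer gradient V. The layer shapes and the helper names `random_spd` and `kron_solve` are hypothetical. The point is only that the block system can be solved with two small solves, without ever forming the full block.

```python
import numpy as np

def random_spd(n, rng):
    """Return a random symmetric positive-definite n x n matrix (stand-in factor)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

def kron_solve(A, B, V):
    """Solve (A kron B) x = vec(V) for x, returned with V's shape.

    Uses row-major vec (NumPy's default flatten order), for which
    (A kron B) vec(X) = vec(A X B^T), hence X = A^{-1} V B^{-T}.
    """
    left = np.linalg.solve(A, V)           # A^{-1} V
    return np.linalg.solve(B, left.T).T    # A^{-1} V B^{-T}

rng = np.random.default_rng(0)
layer_shapes = [(3, 4), (2, 3)]  # hypothetical (d_out, d_in) per layer

max_err = 0.0
for d_out, d_in in layer_shapes:
    A = random_spd(d_out, rng)               # assumed small factor, d_out x d_out
    B = random_spd(d_in, rng)                # assumed small factor, d_in x d_in
    V = rng.standard_normal((d_out, d_in))   # stand-in for the layer gradient

    # Cheap path: two small solves, one per Kronecker factor.
    step = kron_solve(A, B, V)

    # Reference path: form the full (d_out*d_in) x (d_out*d_in) block explicitly.
    dense = np.linalg.solve(np.kron(A, B), V.reshape(-1)).reshape(d_out, d_in)

    max_err = max(max_err, float(np.abs(step - dense).max()))

print("max deviation between factored and dense solves:", max_err)
```

Because the approximation is block-diagonal, each layer is handled independently in the loop. The factored solve works with a d_out x d_out and a d_in x d_in matrix, whereas the dense reference works with a single matrix of side d_out * d_in.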
