Practical Quasi-Newton Methods for Training Deep Neural Networks
Neural Information Processing Systems
In our proposed methods, we approximate the Hessian by a block-diagonal matrix and use the structure of the gradient and Hessian to further approximate these blocks, each of which corresponds to a layer, as the Kronecker product of two much smaller matrices.
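To illustrate why a Kronecker-factored block is convenient, here is a minimal NumPy sketch, not the paper's algorithm. It assumes each layer's block has the form `kron(left, right)` with two small symmetric positive definite factors. The names `left`, `right`, `kron_block_solve`, and `block_diagonal_step` are hypothetical, and the random factors are placeholder data; the excerpt above does not say how the factors are estimated or updated.

```python
import numpy as np


def random_spd(n, rng):
    """Return a random symmetric positive definite n x n matrix (demo data only)."""
    m = rng.standard_normal((n, n))
    return m @ m.T + n * np.eye(n)


def kron_block_solve(left, right, grad):
    """Solve kron(left, right) @ vec(D) = vec(grad) without forming the product.

    Relies on the row-major identity kron(L, R) @ vec(X) = vec(L @ X @ R.T),
    so D = L^{-1} @ grad @ R^{-T}. Shapes: left (o, o), right (i, i), grad (o, i).
    """
    tmp = np.linalg.solve(left, grad)        # L^{-1} @ grad
    return np.linalg.solve(right, tmp.T).T   # (L^{-1} @ grad) @ R^{-T}


def block_diagonal_step(factors, grads):
    """Apply the per-layer solve independently to each block of the block-diagonal matrix."""
    return [kron_block_solve(l, r, g) for (l, r), g in zip(factors, grads)]


# Two hypothetical layers with weight shapes (out, in).
rng = np.random.default_rng(0)
layer_shapes = [(4, 3), (2, 4)]
factors = [(random_spd(o, rng), random_spd(i, rng)) for o, i in layer_shapes]
grads = [rng.standard_normal(shape) for shape in layer_shapes]
steps = block_diagonal_step(factors, grads)

# Cross-check each layer against the dense solve with the explicit Kronecker product.
for (l, r), g, d in zip(factors, grads, steps):
    dense = np.linalg.solve(np.kron(l, r), g.ravel()).reshape(g.shape)
    assert np.allclose(d, dense)
```

For a layer with an `o x i` weight matrix, the dense block is `(o*i) x (o*i)`, whereas the factored solve only touches an `o x o` and an `i x i` matrix, which is what makes the approximation practical for wide layers.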
Oct-2-2025, 07:12:15 GMT