WoodFisher: Efficient Second-Order Approximation for Neural Network Compression
Second-order information, in the form of Hessian- or inverse-Hessian-vector products, is a fundamental tool for solving optimization problems. Recently, there has been significant interest in utilizing this information in the context of deep neural networks; however, relatively little is known about the quality of existing approximations in this context. Our work considers this question, examines the accuracy of existing approaches, and proposes a method called WoodFisher to compute a faithful and efficient estimate of the inverse Hessian. Our main application is neural network compression, where we build on the classic Optimal Brain Damage/Surgeon framework. We demonstrate that WoodFisher significantly outperforms popular state-of-the-art methods for one-shot pruning. Even when iterative, gradual pruning is allowed, our method yields a gain in test accuracy over state-of-the-art approaches on popular image-classification datasets such as ImageNet (ILSVRC). Finally, we show how our method can be extended to take first-order information into account, and illustrate its ability to automatically set layer-wise pruning thresholds or to perform compression in the limited-data regime.
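The Optimal Brain Surgeon framework that the abstract builds on can be sketched in a few lines. This is a generic textbook illustration, not the paper's implementation: the function name `obs_prune_one` and the dense inverse-Hessian argument are assumptions for the sketch.

```python
import numpy as np

def obs_prune_one(w, Hinv):
    """One Optimal Brain Surgeon step (generic sketch, not the paper's code).

    w    : (d,) flat weight vector
    Hinv : (d, d) estimate of the inverse Hessian at w

    Picks the weight whose removal is predicted to increase the loss least,
    zeroes it, and applies the compensating update to the remaining weights.
    """
    # OBS saliency: rho_q = w_q^2 / (2 [H^{-1}]_{qq})
    saliency = w**2 / (2.0 * np.diag(Hinv))
    q = int(np.argmin(saliency))
    # Optimal correction delta_w = -(w_q / [H^{-1}]_{qq}) * H^{-1} e_q
    w_new = w - (w[q] / Hinv[q, q]) * Hinv[:, q]
    w_new[q] = 0.0  # enforce an exact zero against floating-point round-off
    return w_new, q, saliency[q]
```

With `Hinv` set to the identity, the saliency reduces to `w**2 / 2`, so the step degenerates to magnitude pruning; the interest of the method lies in supplying a faithful non-trivial inverse-Hessian estimate.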
Author Feedback
We would like to thank the reviewers for their comments and take the opportunity to answer their questions below. We thank the reviewer for the relevant [Amari et al., 2000] reference, which we will cite and discuss; notably, [Amari et al., 2000] considers single-layer networks. Further, we examined the method's accuracy relative to recent techniques, and extended it to ... We are open to changing the term "WoodFisher", which we used as a mnemonic. Please see Appendix S5 for ablation studies. For simplicity, we consider the scaling constant as 1 here. Thanks for the suggestions; we will correct the font sizes and the broken references.
Review for NeurIPS paper: WoodFisher: Efficient Second-Order Approximation for Neural Network Compression
Weaknesses: --- Missing details about lambda. While mentioned on line 138, the dampening parameter lambda does not appear in the experimental section of the main body, and I only found a value of 1e-5 in the appendix (l. 799). How do you select its value? I expect your final algorithm to be very sensitive to lambda, since \delta_L as defined in Eq. 4 will select directions with the smallest curvature. Another comment about lambda: if you set it to a very large value k, then it becomes dominant compared to the eigenvalues of F, and your technique basically amounts to magnitude pruning. In that regard, it means that MP is just a special case of your technique when using a large dampening value.
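The reviewer's large-lambda observation can be checked numerically. The sketch below assumes the OBS saliency takes the form w_i^2 / (2 [F + lambda*I]^{-1}_{ii}); the function `obs_ranking` and the random data are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
G = rng.standard_normal((20, d))
F = G.T @ G / 20            # toy empirical Fisher built from sample gradients
w = rng.standard_normal(d)

def obs_ranking(w, F, damp):
    """Order of weights by OBS saliency under dampened inverse Fisher."""
    Hinv = np.linalg.inv(F + damp * np.eye(len(w)))
    return np.argsort(w**2 / (2.0 * np.diag(Hinv)))

# With a huge dampening value k, (F + k I)^{-1} is approximately I/k,
# so the OBS ordering collapses to the magnitude-pruning ordering |w|.
assert (obs_ranking(w, F, 1e8) == np.argsort(w**2)).all()
```

This supports the reviewer's point: magnitude pruning is recovered in the limit of dominant dampening, which is why the choice of lambda matters.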
Review for NeurIPS paper: WoodFisher: Efficient Second-Order Approximation for Neural Network Compression
The focus of the submission is training neural networks using 2nd-order information. In particular, the goal of the work is approximating the inverse of the empirical Fisher matrix as defined in the displayed equation under (1). The authors notice that the empirical Fisher is an average of dyads (a a^T, where ^T denotes transposition), hence its inverse can be computed recursively by the Woodbury matrix identity. The resulting inverse is applied to pruning convolutional neural networks (CNNs) and is compared against other unstructured pruning methods. Training and pruning neural networks are central problems of machine learning.
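The recursion the review describes can be sketched as follows: since the dampened empirical Fisher is lambda*I plus an average of rank-one terms, each term can be folded into the running inverse with the rank-one (Sherman-Morrison) case of the Woodbury identity. This is a dense toy version for small dimensions; the function name and signature are illustrative assumptions, and a practical implementation would exploit block structure rather than a full d x d matrix.

```python
import numpy as np

def inverse_empirical_fisher(grads, damp=1e-5):
    """Recursive inverse of  lambda*I + (1/N) * sum_n g_n g_n^T  (toy sketch).

    grads : list of N gradient vectors g_n, each of shape (d,)
    damp  : dampening constant lambda added to the diagonal

    Uses the rank-one Woodbury (Sherman-Morrison) update:
      (A + g g^T / N)^{-1} = A^{-1} - (A^{-1} g)(A^{-1} g)^T / (N + g^T A^{-1} g)
    """
    N = len(grads)
    d = grads[0].shape[0]
    Finv = np.eye(d) / damp          # inverse of the initial dampened matrix lambda*I
    for g in grads:
        Fg = Finv @ g
        Finv -= np.outer(Fg, Fg) / (N + g @ Fg)
    return Finv
```

Each update costs O(d^2), so the full recursion is O(N d^2) and never materializes the d x d inverse from scratch, which is the computational appeal of the Woodbury route over direct inversion.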