Reviews: Global Sparse Momentum SGD for Pruning Very Deep Neural Networks
–Neural Information Processing Systems
Update: The authors justified the choice of the competitor in the empirical evaluation (though it would be better to add this to the body of the paper in the camera-ready, if accepted). I find the technique interesting. Although the results are exploratory and somewhat preliminary, I think it is important for the NeurIPS community to become familiar with them. The authors identify and address major issues with current approaches, namely 1) pruning then fine-tuning to recover accuracy, and 2) pruning via custom learning (mostly custom regularizers). They introduce GSM, a new approach that requires no fine-tuning afterwards and can be solved by means of vanilla SGD. GSM updates only the top-Q entries of the gradient, ranked by the suggested first-order Taylor metric |dL/dw * w|.
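As a minimal sketch of the update rule described above: only the weights with the top-Q first-order Taylor saliency |dL/dw * w| receive the gradient term in the momentum update, while the rest simply let their momentum decay. The function name `gsm_step`, the per-step top-Q selection, and the toy values are illustrative assumptions, not the authors' implementation.

```python
def gsm_step(w, grad, velocity, lr=0.01, momentum=0.9, q=2):
    """One hypothetical GSM-style SGD step on flat lists of weights."""
    # First-order Taylor saliency metric |dL/dw * w| for each weight
    metric = [abs(g * x) for g, x in zip(grad, w)]
    # Indices of the top-q weights by saliency (the "active" set)
    top_q = set(sorted(range(len(w)), key=lambda i: metric[i], reverse=True)[:q])
    new_w, new_v = [], []
    for i in range(len(w)):
        # Only active weights get the gradient term; passive ones only decay
        v_i = momentum * velocity[i] + (grad[i] if i in top_q else 0.0)
        new_v.append(v_i)
        new_w.append(w[i] - lr * v_i)
    return new_w, new_v

# Toy usage: with q=2, only the two most salient weights are updated
w = [0.5, -0.1, 2.0, 0.01, -0.8]
grad = [0.3, 0.9, -0.2, 0.4, 0.1]
v = [0.0] * 5
w, v = gsm_step(w, grad, v, q=2)
```

Because passive weights receive no gradient signal, their momentum (and hence their movement) shrinks toward zero over many steps, which is how the approach drives most weights toward prunable values without a separate fine-tuning phase.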
Feb-5-2025, 08:22:39 GMT