Reviews: Global Sparse Momentum SGD for Pruning Very Deep Neural Networks
–Neural Information Processing Systems
Update: The authors justified the choice of the competitor in the empirical evaluation (though it would be better to add this to the body of the paper in the camera-ready, if accepted). I find the technique interesting. Although the results are exploratory and somewhat preliminary, I think it is important for the NeurIPS community to become familiar with them. The authors identify and address major issues with current approaches, namely 1) pruning then fine-tuning to recover accuracy, and 2) pruning via custom learning (mostly custom regularizers). They introduce GSM, a new approach that requires no fine-tuning afterwards and can be solved by means of vanilla SGD. GSM updates only the top-Q entries of the gradient, ranked by the suggested first-order Taylor metric |dL/dw * w|.
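As a minimal sketch of the update rule described above: only the weights with the top-Q first-order Taylor saliency |dL/dw * w| receive the gradient term in the momentum update, while the rest simply let their momentum decay. The function name `gsm_step`, the per-step top-Q selection, and the toy values are illustrative assumptions, not the authors' implementation.

```python
def gsm_step(w, grad, velocity, lr=0.01, momentum=0.9, q=2):
    """One hypothetical GSM-style SGD step on flat lists of weights."""
    # First-order Taylor saliency metric |dL/dw * w| for each weight
    metric = [abs(g * x) for g, x in zip(grad, w)]
    # Indices of the top-q weights by saliency (the "active" set)
    top_q = set(sorted(range(len(w)), key=lambda i: metric[i], reverse=True)[:q])
    new_w, new_v = [], []
    for i in range(len(w)):
        # Only active weights get the gradient term; passive ones only decay
        v_i = momentum * velocity[i] + (grad[i] if i in top_q else 0.0)
        new_v.append(v_i)
        new_w.append(w[i] - lr * v_i)
    return new_w, new_v

# Toy usage: with q=2, only the two most salient weights are updated
w = [0.5, -0.1, 2.0, 0.01, -0.8]
grad = [0.3, 0.9, -0.2, 0.4, 0.1]
v = [0.0] * 5
w, v = gsm_step(w, grad, v, q=2)
```

Because passive weights receive no gradient signal, their momentum (and hence their movement) shrinks toward zero over many steps, which is how the approach drives most weights toward prunable values without a separate fine-tuning phase.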
Feb-5-2025, 08:22:39 GMT