Reviews: Global Sparse Momentum SGD for Pruning Very Deep Neural Networks

Neural Information Processing Systems 

The paper proposes a method for pruning deep networks that selects which weights to keep based on the largest values of the gradient vector. The idea is new relative to previous attempts: although it is somewhat related to Fisher pruning, which also relies on gradient magnitudes, the method here is an SGD variant that prunes during training rather than a post-training evaluation method. The technique does not come with rigorous guarantees, but the reviewers agree that the experiments and accompanying studies are interesting enough to motivate future research on this method.
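
For concreteness, below is a minimal PyTorch sketch of the kind of update rule the summary describes, not the paper's exact algorithm: all weights are ranked globally by gradient magnitude (the saliency named above; the paper's actual criterion may differ), and only the top fraction drives the momentum update while the rest coast. The function name, the `keep_ratio` parameter, and the hyperparameter values are illustrative assumptions.

    import torch

    def global_sparse_momentum_step(params, momenta, lr=0.01, beta=0.9, keep_ratio=0.1):
        """One hand-rolled SGD-with-momentum step in which only the globally
        top-k parameters (ranked here by gradient magnitude, as a stand-in
        for the paper's saliency criterion) receive an active gradient
        update; the remaining parameters move on accumulated momentum only.
        """
        with torch.no_grad():
            # Flatten all gradients so the ranking is global across layers,
            # not per-layer.
            flat = torch.cat([p.grad.abs().flatten() for p in params])
            k = max(1, int(keep_ratio * flat.numel()))
            # Smallest magnitude among the top-k entries serves as the cutoff
            # (ties at the threshold may admit slightly more than k entries).
            threshold = torch.topk(flat, k).values.min()

            for p, m in zip(params, momenta):
                mask = (p.grad.abs() >= threshold).float()
                # Momentum accumulates only the masked (salient) gradients.
                m.mul_(beta).add_(p.grad * mask)
                p.sub_(lr * m)

    # Illustrative usage on a toy model:
    model = torch.nn.Linear(20, 5)
    momenta = [torch.zeros_like(p) for p in model.parameters()]
    loss = model(torch.randn(8, 20)).pow(2).mean()
    loss.backward()
    global_sparse_momentum_step(list(model.parameters()), momenta)

Over many such steps, parameters that rarely clear the threshold stop receiving gradient signal and drift toward values that can be pruned, which is what distinguishes this style of training-time sparsification from a post-training evaluation pass.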