Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems 

It is good that the authors study reasonably modern and well-known deep nets (AlexNet and VGG), so that the reported compression ratios are directly relevant to recent CNN work. However, the authors do not compare their work against ANY of the now-large literature on related recent methods for compressing CNNs and other deep nets. They dismiss OBD in Section 2 as unsuitable for "today's large scale neural networks", but that seems unfair: OBD's saliencies only require the Hessian diagonals, and several modern methods for approximating those diagonals could have been used as baselines. But even without OBD-related methods, there are many other baselines they could have and should have reported for comparison, including methods that reduce weights to fewer bits, methods that approximate the CNN filter matrices using cheap SVD-based compression, and methods that approximate the fully connected layers using randomized projections. The authors claim in Section 2 that such work is often "orthogonal to network pruning", but that seems to miss the point: once those methods are applied, is there any real advantage left to the proposed method?
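To make the requested comparison concrete, here is a minimal NumPy sketch of the kind of SVD-based compression baseline the review has in mind: replace a weight matrix with its best rank-r approximation and compare storage. The helper name `svd_compress` and the layer dimensions are illustrative assumptions, not anything from the paper under review.

```python
import numpy as np

def svd_compress(W, rank):
    """Truncated-SVD approximation of a weight matrix W (m x n).

    Keeping only the top-`rank` singular values gives a factorization
    W ~= U_r @ V_r that stores rank*(m+n) values instead of m*n --
    the cheap low-rank baseline discussed in the review.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

# Example: compress a hypothetical 256 x 512 fully connected layer to rank 32.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))
U_r, V_r = svd_compress(W, rank=32)
W_hat = U_r @ V_r

storage_ratio = (U_r.size + V_r.size) / W.size
print(storage_ratio)  # 32*(256+512) / (256*512) = 0.1875
```

A pruning paper could report such a baseline at matched storage ratios, which is exactly the comparison the review finds missing.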