Reviews: TETRIS: TilE-matching the TRemendous Irregular Sparsity

Neural Information Processing Systems 

This paper deals with sparsifying the weights of a neural network to reduce memory requirements and speed up inference. Simply pruning small weights yields unstructured sparsity, which is hard to exploit with standard libraries and hardware. This paper instead imposes block sparsity, where each weight tensor is divided into fixed blocks (of size 32 x 32, for example) and non-zero weights are retained in only a fraction of the blocks. The paper's innovation is an iterative algorithm that reorders the rows and columns of a tensor to cluster the large weights together, reducing the accuracy loss incurred by block pruning. Experiments on the VGG16 network for ImageNet show that this method achieves better speed-accuracy trade-offs than either unstructured weight pruning or block pruning without reordering.

Update: I appreciate the authors' responses.
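To make the reviewed idea concrete, here is a toy numpy sketch of block pruning preceded by a row/column reordering step. The function names and the reordering heuristic (a single magnitude-sort pass) are my own illustration, not the paper's actual iterative tile-matching algorithm; it only shows how a permutation can concentrate large weights into blocks before block-level pruning.

```python
import numpy as np

def block_mask(W, block=4, keep_frac=0.5):
    """Keep the blocks of W with the largest L1 norm; zero the rest.
    Assumes W's dimensions are divisible by `block`."""
    r, c = W.shape
    norms = np.abs(W).reshape(r // block, block, c // block, block).sum(axis=(1, 3))
    k = max(1, int(keep_frac * norms.size))
    thresh = np.sort(norms.ravel())[-k]  # k-th largest block norm
    mask = norms >= thresh
    # expand the block-level mask back to element level
    return mask.repeat(block, axis=0).repeat(block, axis=1)

def reorder_then_prune(W, block=4, keep_frac=0.5):
    """Sort rows and columns by total magnitude (a crude stand-in for the
    paper's iterative reordering), then apply block pruning."""
    row_perm = np.argsort(-np.abs(W).sum(axis=1))            # heaviest rows first
    col_perm = np.argsort(-np.abs(W[row_perm]).sum(axis=0))  # then heaviest columns
    Wp = W[row_perm][:, col_perm]
    return Wp * block_mask(Wp, block, keep_frac), row_perm, col_perm
```

In a real network the inverse permutations would have to be folded into the adjacent layers so that the function computed by the network is unchanged; the sketch above omits that bookkeeping.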