Play and Prune: Adaptive Filter Pruning for Deep Model Compression

arXiv.org Artificial Intelligence

While convolutional neural networks (CNNs) have achieved impressive performance on various classification/recognition tasks, they typically contain a massive number of parameters. This results in significant memory requirements as well as computational overhead. Consequently, there is a growing need for filter-level pruning approaches for compressing CNN-based models that not only reduce the total number of parameters but also reduce the overall computation. We present a new min-max framework for filter-level pruning of CNNs. Our framework, called Play and Prune (PP), jointly prunes and fine-tunes CNN model parameters, with an adaptive pruning rate, while maintaining the model's predictive performance. It consists of two modules: (1) an adaptive filter pruning (AFP) module, which minimizes the number of filters in the model; and (2) a pruning rate controller (PRC) module, which maximizes the accuracy during pruning. Moreover, unlike most previous approaches, ours allows directly specifying the desired error tolerance instead of a pruning level. Our compressed models can be deployed at run-time without requiring any special libraries or hardware. Our approach reduces the number of parameters of VGG-16 by an impressive factor of 17.5X, and the number of FLOPs by 6.43X, with no loss of accuracy, significantly outperforming other state-of-the-art filter pruning methods.
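To make the interplay of the two modules concrete, here is a minimal, self-contained Python/NumPy sketch of the adaptive loop described above: an AFP-style step removes the least-important filters, and a PRC-style step shrinks the pruning rate whenever the accuracy drop exceeds the user-specified tolerance. The random importance scores, the toy accuracy proxy, and the rate-halving schedule are illustrative assumptions, not the paper's actual min-max objective or implementation.

```python
import numpy as np

# Hypothetical stand-ins: in a real setting these would wrap a CNN, its
# per-filter importance scores, and a validation pass over held-out data.
rng = np.random.default_rng(0)
filter_importance = rng.random(512)          # one score per filter (toy)
baseline_accuracy = 0.93                     # accuracy of the unpruned model (toy)

def estimate_accuracy(n_remaining, n_total, base=baseline_accuracy):
    """Toy proxy: accuracy degrades as a larger fraction of filters is removed."""
    return base - 0.1 * (1.0 - n_remaining / n_total) ** 2

error_tolerance = 0.01        # user-specified allowed accuracy drop
prune_step = 32               # filters removed per round (adapted below)
remaining = np.sort(filter_importance)[::-1]  # scores sorted descending; prune the tail

while prune_step >= 1:
    candidate = remaining[:-prune_step]       # AFP-style step: drop least-important filters
    acc = estimate_accuracy(len(candidate), len(filter_importance))
    if baseline_accuracy - acc <= error_tolerance:
        remaining = candidate                 # accept this round, keep the current rate
    else:
        prune_step //= 2                      # PRC-style step: accuracy slipped, slow down

print(f"kept {len(remaining)} of {len(filter_importance)} filters")
```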


Deep Compression and Pruning for Machine Learning in AI Self-Driving Cars: Using Convolutional Neural Networks (CNN) - AI Trends

#artificialintelligence

From a cognition and growth perspective, I played the game with my son and my daughter not only as a means to have fun, but also because I figured it would be a good learning tool for them. One aspect of learning in this particular game involves the effects of compression. When you compress items together, you need to think about how the physics of compression affects other objects. In some instances, the compression would bear upon a handful of the pegs while the other pegs were not under any pressure at all. This was at first counterintuitive to my children, who initially assumed that applying pressure would cause all of the pegs to be under pressure.


Frequency-Domain Dynamic Pruning for Convolutional Neural Networks

Neural Information Processing Systems

Deep convolutional neural networks have demonstrated their power in a variety of applications. However, their storage and computational requirements have largely restricted their further extension to mobile devices. Recently, pruning of unimportant parameters has been used for both network compression and acceleration. Considering that there is spatial redundancy within most filters in a CNN, we propose a frequency-domain dynamic pruning scheme to exploit the spatial correlations. The frequency-domain coefficients are pruned dynamically in each iteration, and different frequency bands are pruned discriminatively, given their different importance to accuracy. Experimental results demonstrate that the proposed scheme outperforms previous spatial-domain counterparts by a large margin. Specifically, it achieves a compression ratio of 8.4x and a theoretical inference speed-up of 9.2x for ResNet-110, while the accuracy is even better than that of the reference model on CIFAR-10.
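As an illustration of the basic idea, the following Python sketch (NumPy/SciPy) transforms each convolutional kernel into the frequency domain with a 2-D DCT, zeros out small-magnitude coefficients, and reconstructs the filters. The DCT choice, the single global threshold, and the keep ratio are assumptions made for illustration; the paper's dynamic, band-wise discriminative pruning during training is not reproduced here.

```python
import numpy as np
from scipy.fft import dctn, idctn

# Toy stand-in for one conv layer's filters: (out_channels, in_channels, k, k).
rng = np.random.default_rng(0)
filters = rng.normal(size=(64, 32, 3, 3))

# Transform each k x k kernel to the frequency domain via a 2-D DCT.
coeffs = dctn(filters, axes=(-2, -1), norm="ortho")

# Pruning step: zero out the smallest-magnitude coefficients. A single global
# threshold is used here for simplicity; the band-wise, per-iteration dynamic
# pruning described in the abstract is not reproduced.
keep_ratio = 0.3
threshold = np.quantile(np.abs(coeffs), 1.0 - keep_ratio)
mask = np.abs(coeffs) >= threshold
pruned_coeffs = coeffs * mask

# Reconstruct spatial-domain filters from the surviving coefficients.
pruned_filters = idctn(pruned_coeffs, axes=(-2, -1), norm="ortho")
print(f"kept {mask.mean():.1%} of frequency coefficients")
```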


Leveraging Filter Correlations for Deep Model Compression

arXiv.org Machine Learning

We present a filter-correlation-based model compression approach for deep convolutional neural networks. Our approach iteratively identifies pairs of filters with the largest pairwise correlations and discards one filter from each such pair. However, instead of discarding a filter from each pair naïvely, we first optimize the model so that the two filters in each pair become as highly correlated as possible, so that discarding one of them results in as little information loss as possible. After discarding the filters in each round, we further fine-tune the model to recover from the small loss potentially incurred by the compression. We evaluate our proposed approach using a comprehensive set of experiments and ablation studies. Our compression method yields state-of-the-art FLOPs compression rates on various benchmarks, such as LeNet-5, VGG-16, and ResNet-50/56, while still achieving excellent predictive performance for tasks such as object detection on benchmark datasets.
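A minimal sketch of the pair-selection step, assuming Pearson correlation between the flattened filters of a single layer, is shown below in Python/NumPy. The correlation-strengthening optimization and the subsequent fine-tuning described in the abstract are omitted; only the "find the most correlated pair" step is illustrated.

```python
import numpy as np

# Toy stand-in for one conv layer's filters: (out_channels, in_channels, k, k).
rng = np.random.default_rng(0)
filters = rng.normal(size=(16, 8, 3, 3))

# Flatten each filter and compute absolute pairwise Pearson correlations.
flat = filters.reshape(filters.shape[0], -1)
corr = np.abs(np.corrcoef(flat))        # (out_channels, out_channels)
np.fill_diagonal(corr, 0.0)             # ignore self-correlation

# Pick the most correlated pair; one member of the pair would be discarded
# after the correlation-strengthening optimization described in the abstract.
i, j = np.unravel_index(np.argmax(corr), corr.shape)
print(f"most correlated pair: filters {i} and {j}, |r| = {corr[i, j]:.3f}")
```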


Progressive Weight Pruning of Deep Neural Networks using ADMM

arXiv.org Machine Learning

Deep neural networks (DNNs), although achieving human-level performance in many domains, have very large model sizes that hinder their broader application on edge computing devices. Extensive research has been conducted on DNN model compression and pruning. However, most previous work took heuristic approaches. This work proposes a progressive weight pruning approach based on ADMM (Alternating Direction Method of Multipliers), a powerful technique for dealing with non-convex optimization problems with potentially combinatorial constraints. Motivated by dynamic programming, the proposed method reaches extremely high pruning rates by using partial prunings with moderate pruning rates. It thereby resolves the accuracy degradation and long convergence time problems encountered when pursuing extremely high pruning ratios. It achieves up to a 34x pruning rate on the ImageNet dataset and a 167x pruning rate on the MNIST dataset, significantly higher than those reported in the literature. Under the same number of epochs, the proposed method also achieves faster convergence and higher compression rates. The code and pruned DNN models are released at bit.ly/2zxdlss
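For readers unfamiliar with ADMM-based pruning, here is a toy Python/NumPy sketch on a least-squares objective with a k-sparsity constraint: a W-step descends the penalized loss, a Z-step projects onto the sparse set, and the dual variable U is updated. The quadratic loss, step sizes, and fixed sparsity level are illustrative assumptions; the paper's progressive schedule of partial prunings and its actual DNN training loop are not shown.

```python
import numpy as np

# Toy ADMM pruning sketch on a least-squares "loss" ||XW - y||^2 with the
# constraint that W keeps at most k nonzero entries. Real DNN pruning would
# replace the W-step with SGD on the network loss plus the quadratic penalty.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 20)), rng.normal(size=100)
W = np.zeros(20)
Z, U = W.copy(), np.zeros_like(W)
rho, k, lr = 1.0, 5, 0.01

def project_topk(v, k):
    """Euclidean projection onto the k-sparse set: keep the k largest magnitudes."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

for _ in range(500):
    # W-step: gradient descent on loss + (rho/2) * ||W - Z + U||^2
    grad = 2 * X.T @ (X @ W - y) / len(y) + rho * (W - Z + U)
    W -= lr * grad
    # Z-step: project W + U onto the sparsity constraint
    Z = project_topk(W + U, k)
    # Dual update
    U += W - Z

W_pruned = project_topk(W, k)   # final hard prune to satisfy the constraint
print(f"nonzeros: {np.count_nonzero(W_pruned)} of {W.size}")
```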