sparsity ratio
Appendix: Representation Learning Process
Here we provide more experimental results. Specifically, we evaluate representational similarity using the CKA value: at each epoch, we take the same layer from models (ResNet-32) trained with different sparsity and compare it with the corresponding layer of the final model. In our work, we evaluate four different types of freezing schemes (Sec. …). In this case, the single-shot & resume scheme keeps the same FLOPs reduction as the single-shot scheme, and the entire network can be fine-tuned at the end of training with a small learning rate. For the periodically freezing scheme, we let the selected layers freeze periodically with a given frequency so that all the layers/blocks can be updated at different stages of the training process.
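The CKA comparison described above can be sketched as follows. This is a minimal illustration, assuming the linear variant of CKA and activations flattened to a (samples x features) matrix; the excerpt does not say which CKA variant or feature layout was used, and the function name `linear_cka` is hypothetical.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices.

    x: array of shape (n_samples, d1), y: array of shape (n_samples, d2).
    Assumption: the linear-kernel form of CKA; the source only says "CKA value".
    """
    # Center each feature column so the score ignores mean offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


# Illustrative use with random stand-ins for layer activations:
# `final_feats` plays the role of the final model's layer output and
# `epoch_feats` the same layer at an earlier epoch.
rng = np.random.default_rng(0)
final_feats = rng.normal(size=(64, 16))
epoch_feats = final_feats + 0.5 * rng.normal(size=(64, 16))
similarity = linear_cka(epoch_feats, final_feats)
```

A value near 1 would indicate that the layer's representation at that epoch is already close to its final form, which is the kind of signal a freezing scheme could plausibly use.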
Global Sparse Momentum SGD for Pruning Very Deep Neural Networks
Deep neural networks (DNNs) are powerful but computationally expensive and memory intensive, which impedes their practical use on resource-constrained front-end devices. DNN pruning is an approach to deep model compression that aims to eliminate some parameters with tolerable performance degradation. In this paper, we propose a novel momentum-SGD-based optimization method that reduces network complexity by on-the-fly pruning. Concretely, given a global compression ratio, we categorize all the parameters into two parts at each training iteration, and the two parts are updated using different rules. In this way, we gradually zero out the redundant parameters, as we update them using only the ordinary weight decay but no gradients derived from the objective function. As a departure from prior methods that require heavy human effort to tune the layer-wise sparsity ratios, prune by solving complicated non-differentiable problems, or finetune the model after pruning, our method is characterized by 1) global compression that automatically finds the appropriate per-layer sparsity ratios; 2) end-to-end training; 3) no need for a time-consuming re-training process after pruning; and 4) superior capability to find better winning tickets which have won the initialization lottery.
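The two-part update rule described above might look roughly like the following sketch. Several details are assumptions not stated in the abstract: parameters are ranked globally by the score |w * g|, the top fraction `keep_ratio` receives the objective gradient, and weight decay is folded into the momentum buffer. The function name `gsm_step` and all of its arguments are hypothetical.

```python
import numpy as np


def gsm_step(params, grads, velocities, keep_ratio,
             lr=0.1, momentum=0.9, weight_decay=1e-4):
    """One illustrative update over a list of float NumPy arrays (modified in place).

    Assumption: importance is scored as |w * g| and ranked across all layers,
    so per-layer sparsity is not set by hand.
    """
    # Score every parameter in every layer with one global ranking.
    scores = np.concatenate([np.abs(w * g).ravel() for w, g in zip(params, grads)])
    k = max(1, int(round(keep_ratio * scores.size)))
    # Threshold is the k-th largest score; ties may keep slightly more than k.
    cut = scores.size - k
    threshold = np.partition(scores, cut)[cut]

    for w, g, v in zip(params, grads, velocities):
        active = np.abs(w * g) >= threshold
        # Active parameters see gradient plus weight decay;
        # the rest see weight decay only and therefore shrink toward zero.
        v *= momentum
        v += weight_decay * w + active * g
        w -= lr * v


# Tiny two-"layer" example with hand-picked values.
params = [np.array([1.0, 0.1]), np.array([2.0, 0.01])]
grads = [np.array([1.0, 1.0]), np.array([1.0, 1.0])]
velocities = [np.zeros(2), np.zeros(2)]
gsm_step(params, grads, velocities, keep_ratio=0.5, weight_decay=0.1)
```

In this example the two largest-score entries (one per layer) are updated with their gradients, while the other two are only decayed. Repeating such steps would drive the decayed entries toward zero, which matches the gradual zeroing-out that the abstract describes.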