Lossless CNN Channel Pruning via Gradient Resetting and Convolutional Re-parameterization
Ding, Xiaohan, Hao, Tianxiang, Liu, Ji, Han, Jungong, Guo, Yuchen, Ding, Guiguang
Inspired by the neurobiology research about the independence of remembering and forgetting, we propose to re-parameterize a CNN into the remembering parts and forgetting parts, where the former learn to maintain the performance and the latter learn for efficiency. By training the re-parameterized model using regular SGD on the former but a novel update rule with penalty gradients on the latter, we realize structured sparsity, enabling us to equivalently convert the re-parameterized model into the original architecture with narrower layers.

However, as CNN's representational capacity depends on the width of conv layers, it is difficult to reduce the width without performance drops. On practical CNN architectures like ResNet-50 [16] and large-scale datasets like ImageNet [6], lossless pruning with high compression ratio has long been considered challenging. For reasonable tradeoff between compression ratio and performance, a typical paradigm (Figure 1.A) [2, 3, 9, 30, 33, 56, 57] seeks to train the model with magnitude-related penalty loss (e.g., group Lasso [51, 54]) on the conv kernels to produce structured ...
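The equivalent conversion into narrower layers can be illustrated with a minimal NumPy sketch. It assumes, hypothetically, that the forgetting part is a 1x1 convolution placed directly after a conv layer, and that some of its rows have already been driven to zero by training; the excerpt states neither detail. Batch normalization and biases are ignored, and the names `conv2d`, `merge_forgetting_part`, and the `1e-5` threshold are illustrative choices rather than the authors' code.

```python
import numpy as np


def conv2d(x, kernel):
    """Naive 'valid' cross-correlation.

    x: (C, H, W), kernel: (D, C, k, k) -> output: (D, H-k+1, W-k+1)
    """
    d, _, k, _ = kernel.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((d, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]  # (C, k, k)
            out[:, i, j] = np.tensordot(kernel, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def merge_forgetting_part(kernel, pointwise):
    """Fold a 1x1 layer (E, D) into the preceding kernel (D, C, k, k)."""
    return np.einsum("ed,dchw->echw", pointwise, kernel)


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8, 8))           # input with 3 channels
kernel = rng.normal(size=(4, 3, 3, 3))   # original conv: 4 output channels

# Hypothetical forgetting part: a 4x4 pointwise matrix near identity,
# with rows 1 and 3 assumed to have been pushed to zero during training.
pointwise = np.eye(4) + 0.1 * rng.normal(size=(4, 4))
pointwise[[1, 3]] = 0.0

# Drop the rows whose norm is (close to) zero: those output channels vanish.
kept = np.linalg.norm(pointwise, axis=1) > 1e-5
pruned_pointwise = pointwise[kept]       # shape (2, 4)

# Two-layer form: conv followed by the pruned 1x1 layer.
two_step = np.einsum("ed,dhw->ehw", pruned_pointwise, conv2d(x, kernel))

# Single narrower conv obtained by merging the two layers.
narrow_kernel = merge_forgetting_part(kernel, pruned_pointwise)
one_step = conv2d(x, narrow_kernel)

print(narrow_kernel.shape, np.allclose(one_step, two_step))
```

Because convolution is linear, composing it with a 1x1 layer is itself a convolution, so the merged layer computes the same function while keeping only the surviving output channels.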
Sep-1-2020