Hybrid Pruning: Thinner Sparse Networks for Fast Inference on Edge Devices

Xu, Xiaofan; Park, Mi Sun; Brick, Cormac

arXiv.org Artificial Intelligence 

We introduce hybrid pruning, which combines coarse-grained channel pruning with fine-grained weight pruning to reduce model size, computation, and power demands with little to no loss in accuracy, enabling the deployment of modern networks on resource-constrained devices such as always-on security cameras and drones. Additionally, to perform channel pruning effectively, we propose a fast sensitivity test that quickly identifies how sensitive the output accuracy is to pruning within and across the layers of a network, given a target number of multiply-accumulate operations (MACs) or an accuracy tolerance. Our experiment shows significantly better results for ResNet50 on ImageNet compared to existing work, even with the additional constraint that channel counts be hardware-friendly numbers.
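
To make the two granularities concrete, the following is a minimal NumPy sketch of the general idea and not the implementation evaluated in this work. It rests on several assumptions that the abstract does not specify: output channels are ranked by the L1 norm of their filters, the kept channel count is rounded up to a multiple of 8 as a stand-in for a "hardware-friendly number", and the fine-grained step is plain magnitude pruning. The function names (`prune_channels`, `prune_weights`, `round_up_to_multiple`) are hypothetical, and the sensitivity test and any retraining are omitted.

```python
import numpy as np


def round_up_to_multiple(n, multiple):
    """Round n up to the nearest multiple (assumed hardware-friendly size)."""
    return ((n + multiple - 1) // multiple) * multiple


def prune_channels(weights, keep_ratio, multiple=8):
    """Coarse-grained step: drop whole output channels of a conv layer.

    weights has shape (out_channels, in_channels, kh, kw).
    Ranking by L1 norm is an assumption made for this sketch.
    """
    out_channels = weights.shape[0]
    target = max(1, int(round(out_channels * keep_ratio)))
    n_keep = min(out_channels, round_up_to_multiple(target, multiple))
    # Score each output channel by the L1 norm of its filter.
    scores = np.abs(weights).reshape(out_channels, -1).sum(axis=1)
    # Keep the highest-scoring channels, preserving their original order.
    keep = np.sort(np.argsort(scores)[-n_keep:])
    return weights[keep], keep


def prune_weights(weights, sparsity):
    """Fine-grained step: zero the smallest-magnitude individual weights.

    At least a `sparsity` fraction of entries is zeroed (more if ties occur).
    """
    n_zero = int(weights.size * sparsity)
    if n_zero == 0:
        return weights.copy()
    flat = np.abs(weights).ravel()
    # Magnitude of the n_zero-th smallest weight serves as the threshold.
    threshold = np.partition(flat, n_zero - 1)[n_zero - 1]
    mask = np.abs(weights) > threshold
    return weights * mask


def hybrid_prune(weights, keep_ratio, sparsity, multiple=8):
    """Channel pruning first (thinner layer), then weight pruning (sparser)."""
    thin, kept = prune_channels(weights, keep_ratio, multiple)
    return prune_weights(thin, sparsity), kept
```

For example, calling `hybrid_prune(w, keep_ratio=0.6, sparsity=0.5)` on a layer with 64 output channels would keep 40 channels under these assumptions (38 rounded up to the next multiple of 8) and then zero roughly half of the remaining weights. In practice the per-layer `keep_ratio` would come from a sensitivity analysis rather than being fixed by hand.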

