Model Compression via Pruning
To obtain fast and accurate inference on edge devices, a model has to be optimized for real-time inference. Fine-tuned state-of-the-art models like VGG16/19, ResNet50 have 138 million and 23 million parameters respectively and inference is often expensive on resource-constrained devices. Previously I've talked about one model compression technique called "Knowledge Distillation" using a smaller student network to mimic the performance of a larger teacher network (Both student and teacher network has different network architecture). Today, the focus will be on "Pruning" one model compression technique that allows us to compress the model to a smaller size with zero or marginal loss of accuracy. In short, pruning eliminates the weights with low magnitude (That does not contribute much to the final model performance).
Nov-23-2020, 19:50:57 GMT
- Technology: