Magnificent Minified Models
Harang, Rich, Sanders, Hillary
arXiv.org Artificial Intelligence
There are many ways to make a deep neural network smaller. In this paper, we focus on three categories of model size reduction: pruning, quantization, and training smaller models from scratch. Quantization converts model parameters to lower-precision formats, for example casting all 32-bit floating point parameters to 16-bit, which roughly halves file size. Pruning deletes parameters or groups of parameters (such as entire neurons) from a trained model to make it smaller, often followed by a fine-tuning round of training, as done in our experiments. Parameter-level pruning (also called unstructured pruning) removes individual parameters one at a time, whereas neuron-level pruning (also called structured pruning) removes all parameters associated with a given neuron at once. To simplify terminology across multiple methods, we use the term 'damage' to refer broadly to the undesired impact on network performance of removing a node or zeroing a weight. Different compression methods either estimate damage directly, or rank neurons or weights in order of increasing assumed damage according to some other metric that does not directly evaluate the impact on loss or performance.
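The three reduction styles above can be illustrated with a minimal NumPy sketch on a toy weight matrix. The magnitude threshold and L2-norm ranking below are common illustrative proxies for damage, not the specific criteria evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weight matrix for one dense layer: 4 inputs -> 3 neurons.
W = rng.normal(size=(4, 3)).astype(np.float32)

# --- Quantization: cast 32-bit floats to 16-bit, halving storage. ---
W_fp16 = W.astype(np.float16)
assert W_fp16.nbytes == W.nbytes // 2

# --- Unstructured (parameter-level) pruning: zero individual weights
# with the smallest magnitude, here the bottom half. ---
threshold = np.quantile(np.abs(W), 0.5)
W_unstructured = np.where(np.abs(W) < threshold, 0.0, W)

# --- Structured (neuron-level) pruning: rank neurons (columns) by a
# cheap damage proxy -- the L2 norm of their incoming weights -- and
# remove the weakest neuron entirely, shrinking the layer. ---
neuron_norms = np.linalg.norm(W, axis=0)
keep = np.sort(np.argsort(neuron_norms)[1:])  # drop the lowest-norm neuron
W_structured = W[:, keep]
```

Note the structural difference: unstructured pruning leaves the matrix shape intact (relying on sparse storage for savings), while structured pruning yields a genuinely smaller dense matrix.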
Jun-16-2023