compression rate
GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking
Model compression is essential for serving large deep neural nets on devices with limited resources or in applications that require real-time responses. For advanced NLP problems, a neural language model usually consists of recurrent layers (e.g., using LSTM cells), an embedding matrix for representing input tokens, and a softmax layer for generating output tokens. For problems with a very large vocabulary size, the embedding and softmax matrices can account for more than half of the model size. For instance, the bigLSTM model achieves state-of-the-art performance on the One-Billion-Word (OBW) dataset with a vocabulary of around 800k words; its word embedding and softmax matrices use more than 6 GB of space and are responsible for over 90\% of the model parameters. In this paper, we propose GroupReduce, a novel compression method for neural language models, based on vocabulary-partition (block) based low-rank matrix approximation and the inherent frequency distribution of tokens (the power-law distribution of words). We start by grouping words into $c$ blocks based on their frequency, and then refine the clustering iteratively by constructing a weighted low-rank approximation for each block, where the weights are based on the frequencies of the words in the block. The experimental results show that our method can significantly outperform traditional compression methods such as low-rank approximation and pruning. On the OBW dataset, our method achieved a 6.6x compression rate for the embedding and softmax matrices, and when combined with quantization, it can achieve a 26x compression rate without losing prediction accuracy.
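The block-wise idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `group_reduce_sketch`, the rank schedule (halving the rank for each less frequent block), and the use of square-root frequencies as row weights are all assumptions made for illustration, and the iterative refinement of the clustering is omitted.

```python
import numpy as np

def group_reduce_sketch(E, freq, num_blocks=4, base_rank=8):
    """Block-wise weighted low-rank approximation of a (V x d) matrix E.

    freq holds one strictly positive frequency per row (word) of E.
    Returns the approximation and the resulting compression rate.
    """
    order = np.argsort(-freq)                   # most frequent words first
    blocks = np.array_split(order, num_blocks)  # c blocks of similar size
    approx = np.zeros_like(E, dtype=float)
    n_params = 0
    for i, idx in enumerate(blocks):
        # Assumption: halve the rank for each less frequent block.
        rank = min(max(1, base_rank >> i), len(idx), E.shape[1])
        # Assumption: weight each row by the square root of its frequency.
        w = np.sqrt(freq[idx])[:, None]
        U, s, Vt = np.linalg.svd(w * E[idx], full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        approx[idx] = low_rank / w              # undo the row weighting
        n_params += rank * (len(idx) + E.shape[1])
    return approx, E.size / n_params

# Toy usage with a power-law frequency profile (synthetic data).
rng = np.random.default_rng(0)
E = rng.normal(size=(100, 16))
freq = 1.0 / np.arange(1, 101)
approx, rate = group_reduce_sketch(E, freq)
```

Scaling the rows before the SVD and unscaling afterwards solves the row-weighted least-squares problem exactly for a fixed rank, which is why frequent words are reconstructed more faithfully than rare ones within the same block.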
Supplementary Materials A Complexity Analysis
Our proposed method significantly reduces communication overhead in federated learning. This method poses a trade-off between time and memory complexity. We also provide detailed information about the optimization hyperparameters, e.g. ... In this section, we explore the effect of fitness sparsification, i.e. selecting the top-k fitness values from the ... To enable a fair and insightful comparison between the two population sizes, we assess performance based on the number of members remaining after sparsification rather than directly contrasting sparsification rates. Our results underline the crucial role that population size plays in exploring optimal solutions, which outweighs even the significance of the compression rate.
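As a rough illustration of the top-k fitness sparsification mentioned above, the following sketch keeps only the k largest fitness values of a population. The function name `sparsify_fitness` and the convention that higher fitness is better are assumptions; the excerpt does not specify how the values are ranked or transmitted.

```python
def sparsify_fitness(fitness, k):
    """Keep the k largest fitness values as (member_index, value) pairs.

    Assumption: a higher fitness value is better.
    """
    ranked = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    kept = sorted(ranked[:k])  # report survivors in their original order
    return [(i, fitness[i]) for i in kept]

# Toy population of four members, keeping the best two.
survivors = sparsify_fitness([0.1, 0.9, 0.5, 0.7], k=2)
```

Comparing two population sizes at the same k, rather than at the same sparsification rate k / population size, matches the comparison by number of remaining members described in the text.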