GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking
Patrick Chen, Si Si, Yang Li, Ciprian Chelba, Cho-Jui Hsieh
Neural Information Processing Systems
For problems with a very large vocabulary, the embedding and softmax matrices can account for more than half of the model size. For instance, the bigLSTM model achieves strong performance on the One-Billion-Word (OBW) dataset with a vocabulary of around 800k words, and its word embedding and softmax matrices take more than 6 GB of space and account for over 90% of the model parameters. In this paper, we propose GroupReduce, a novel compression method for neural language models based on vocabulary-partition (block-wise) low-rank matrix approximation and the inherent frequency distribution of tokens (the power-law distribution of words).
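To make the storage argument concrete, the following minimal sketch counts parameters for a dense embedding matrix and for a block-wise low-rank factorization of it. It is not the authors' algorithm: the embedding dimension, the equal-size frequency blocks, the per-block ranks, and all function names are illustrative assumptions. Only the roughly 800k vocabulary size comes from the abstract.

```python
def dense_params(vocab_size, dim):
    """Parameters in a dense (vocab_size x dim) embedding or softmax matrix."""
    return vocab_size * dim


def low_rank_params(rows, dim, rank):
    """Parameters when a (rows x dim) block is stored as U (rows x rank) times V (rank x dim)."""
    return rank * (rows + dim)


def blockwise_params(block_sizes, dim, ranks):
    """Total parameters when each vocabulary block gets its own low-rank factorization."""
    return sum(low_rank_params(n, dim, r) for n, r in zip(block_sizes, ranks))


vocab_size = 800_000   # approximate OBW vocabulary size, from the abstract
dim = 1024             # assumed embedding dimension (hypothetical)

# Assumption: words are sorted by frequency and split into four equal blocks.
# Frequent blocks receive a higher rank, rare blocks a lower one (hypothetical values).
block_sizes = [200_000] * 4
ranks = [256, 128, 64, 32]

dense = dense_params(vocab_size, dim)
compressed = blockwise_params(block_sizes, dim, ranks)
ratio = dense / compressed

print(f"dense: {dense:,} parameters")
print(f"block-wise low-rank: {compressed:,} parameters")
print(f"compression ratio: {ratio:.1f}x")
```

Giving higher ranks to frequent-word blocks reflects the idea that the power-law frequency distribution makes a small set of words matter most; how the paper actually chooses blocks and ranks is not stated in the abstract.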