GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking

Patrick Chen, Si Si, Yang Li, Ciprian Chelba, Cho-Jui Hsieh

Nov-20-2025, 18:57:04 GMT–Neural Information Processing Systems

For problems with a very large vocabulary, the embedding and softmax matrices can account for more than half of the model size. For instance, the bigLSTM model achieves strong performance on the One-Billion-Word (OBW) dataset with a vocabulary of around 800k words, and its word embedding and softmax matrices occupy more than 6 GB and account for over 90% of the model parameters. In this paper, we propose GroupReduce, a novel compression method for neural language models based on block-wise low-rank matrix approximation over a partition of the vocabulary, which exploits the inherent frequency distribution of tokens (the power-law distribution of words).
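To make the idea of a vocabulary-partitioned low-rank approximation concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that tokens are grouped into equal-sized blocks by descending frequency and that each block is compressed with a truncated SVD. The function names (`blockwise_low_rank`, `reconstruct`, `num_params`), the parameters `num_blocks` and `base_rank`, and the rank schedule (halving the rank for each rarer block) are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np

def blockwise_low_rank(E, freqs, num_blocks=4, base_rank=2):
    """Approximate E (vocab x dim) block by block with truncated SVD.

    freqs holds one frequency per row of E. Rows are grouped by
    descending frequency (an assumption of this sketch).
    """
    order = np.argsort(freqs)[::-1]            # most frequent tokens first
    blocks = np.array_split(order, num_blocks)
    factors = []
    for i, idx in enumerate(blocks):
        # Hypothetical rank schedule: rarer blocks get half the rank
        # of the previous block, down to base_rank for the rarest.
        rank = base_rank * 2 ** (num_blocks - 1 - i)
        rank = max(1, min(rank, len(idx), E.shape[1]))
        U, s, Vt = np.linalg.svd(E[idx], full_matrices=False)
        # Store the two thin factors A (rows x rank) and B (rank x dim).
        factors.append((idx, U[:, :rank] * s[:rank], Vt[:rank]))
    return factors

def reconstruct(factors, shape):
    """Rebuild the approximate matrix from the per-block factors."""
    E_hat = np.zeros(shape)
    for idx, A, B in factors:
        E_hat[idx] = A @ B
    return E_hat

def num_params(factors):
    """Count the stored parameters across all block factors."""
    return sum(A.size + B.size for _, A, B in factors)
```

A caller would compare `num_params(factors)` against `E.size` to see the storage saving, and compare `reconstruct(factors, E.shape)` against `E` to see the approximation error. Giving frequent words a higher rank is one plausible way to use a power-law frequency distribution, since most of the probability mass sits on a small set of tokens.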

artificial intelligence, machine learning, natural language



Country:
- North America
  - United States > California
    - Los Angeles County > Los Angeles (0.14)
    - Santa Clara County > Mountain View (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Asia > Vietnam
  - Hanoi > Hanoi (0.04)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Statistical Learning (0.91)
