frequency
GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking
Model compression is essential for serving large deep neural nets on devices with limited resources or applications that require real-time responses. For advanced NLP problems, a neural language model usually consists of recurrent layers (e.g., using LSTM cells), an embedding matrix for representing input tokens, and a softmax layer for generating output tokens. For problems with a very large vocabulary size, the embedding and the softmax matrices can account for more than half of the model size. For instance, the bigLSTM model achieves state-of-the-art performance on the One-Billion-Word (OBW) dataset with around 800k vocabulary, and its word embedding and softmax matrices use more than 6GBytes space, and are responsible for over 90\% of the model parameters. In this paper, we propose GroupReduce, a novel compression method for neural language models, based on vocabulary-partition (block) based low-rank matrix approximation and the inherent frequency distribution of tokens (the power-law distribution of words). We start by grouping words into $c$ blocks based on their frequency, and then refine the clustering iteratively by constructing weighted low-rank approximation for each block, where the weights are based the frequencies of the words in the block. The experimental results show our method can significantly outperform traditional compression methods such as low-rank approximation and pruning. On the OBW dataset, our method achieved 6.6x compression rate for the embedding and softmax matrices, and when combined with quantization, our method can achieve 26x compression rate without losing prediction accuracy.
A Bayesian Nonparametric View on Count-Min Sketch
The count-min sketch is a time-and memory-efficient randomized data structure that provides a point estimate of the number of times an item has appeared in a data stream. The count-min sketch and related hash-based data structures are ubiquitous in systems that must track frequencies of data such as URLs, IP addresses, and language n-grams. We present a Bayesian view on the count-min sketch, using the same data structure, but providing a posterior distribution over the frequencies that characterizes the uncertainty arising from the hash-based approximation. In particular, we take a nonparametric approach and consider tokens generated from a Dirichlet process (DP) random measure, which allows for an unbounded number of unique tokens. Using properties of the DP, we show that it is possible to straightforwardly compute posterior marginals of the unknown true counts and that the modes of these marginals recover the count-min sketch estimator, inheriting the associated probabilistic guarantees. Using simulated data with known ground truth, we investigate the properties of these estimators. Lastly, we also study a modified problem in which the observation stream consists of collections of tokens (i.e., documents) arising from a random measure drawn from a stable beta process, which allows for power law scaling behavior in the number of unique tokens.
Regular Fourier Features for Nonstationary Gaussian Processes
Jawaid, Arsalan, Karatas, Abdullah, Seewig, Jörg
Simulating a Gaussian process requires sampling from a high-dimensional Gaussian distribution, which scales cubically with the number of sample locations. Spectral methods address this challenge by exploiting the Fourier representation, treating the spectral density as a probability distribution for Monte Carlo approximation. Although this probabilistic interpretation works for stationary processes, it is overly restrictive for the nonstationary case, where spectral densities are generally not probability measures. We propose regular Fourier features for harmonizable processes that avoid this limitation. Our method discretizes the spectral representation directly, preserving the correlation structure among spectral weights without requiring probability assumptions. Under a finite spectral support assumption, this yields an efficient low-rank approximation that is positive semi-definite by construction. When the spectral density is unknown, the framework extends naturally to kernel learning from data. We demonstrate the method on locally stationary kernels and on harmonizable mixture kernels with complex-valued spectral densities.
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- North America > United States (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- North America > United States > Oregon > Washington County > Hillsboro (0.04)
- North America > United States > California > Los Angeles County > Claremont (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Asia > Japan > Honshū > Tōhoku (0.04)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- North America > Dominican Republic (0.04)
- (14 more...)
- Information Technology (0.46)
- Health & Medicine (0.46)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.66)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Information Technology (0.68)
- Asia > China (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Health & Medicine (1.00)
- Energy (0.94)
- Information Technology > Security & Privacy (0.67)
- Information Technology > Data Science (0.93)
- Information Technology > Communications > Networks (0.93)
- Information Technology > Artificial Intelligence > Robots (0.93)
- (3 more...)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > China (0.04)