Reviews: GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking
–Neural Information Processing Systems
The sizes of embedding matrices in NLP tasks have long posed difficult computational problems, whether from the inefficiency of operating (softmaxing) over them or from the sheer difficulty of storing them. In this paper the authors take on the latter problem, introducing a method that uses multiple low-rank approximations to reduce the size of these matrices. They rely on frequency binning -- the same observation underlying the hierarchical-softmax solution to the former problem -- to group words, prioritizing the most frequent words to receive higher-rank approximations. This alone yields significant compression rates with little loss in accuracy, and when further combined with quantization it produces large reductions in memory. Importantly, quantization appears to play nicely with their methodology, and the combination yields much smaller models overall while performing at least as well as naive quantization on large data sets.
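The core idea the review describes -- binning embedding rows by word frequency and giving the more frequent bins higher-rank approximations, optionally followed by quantization -- can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation; the group count, rank schedule, and the uniform 8-bit quantizer are all assumptions for demonstration.

```python
import numpy as np

def groupwise_lowrank(E, freqs, n_groups=3, base_rank=2):
    """Block-wise low-rank sketch: bin rows of the embedding matrix E by
    word frequency, then approximate each bin with a truncated SVD whose
    rank grows with frequency. Parameter choices are illustrative only."""
    order = np.argsort(-freqs)               # most frequent words first
    groups = np.array_split(order, n_groups)
    E_hat = np.zeros_like(E, dtype=float)
    for g, idx in enumerate(groups):
        rank = base_rank * (n_groups - g)    # higher rank for more frequent bins
        U, s, Vt = np.linalg.svd(E[idx], full_matrices=False)
        r = min(rank, len(s))
        E_hat[idx] = (U[:, :r] * s[:r]) @ Vt[:r]
    return E_hat

def quantize_uint8(M):
    """Uniform 8-bit quantization, a simple stand-in for the quantization
    step the review mentions being combined with the low-rank factors."""
    lo, hi = float(M.min()), float(M.max())
    q = np.round((M - lo) / (hi - lo) * 255).astype(np.uint8)
    return q, lo, hi

def dequantize_uint8(q, lo, hi):
    return q.astype(float) / 255 * (hi - lo) + lo
```

In this sketch the per-bin factors would be stored instead of the full matrix, so memory scales with the chosen ranks rather than the vocabulary size; quantizing the stored factors (or the reconstruction, as shown) compounds the savings.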
Oct-7-2024, 22:13:25 GMT