LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
Recurrent neural networks (RNNs) have achieved state-of-the-art performance in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model becomes very big (e.g., possibly beyond the memory capacity of a GPU device) and its training becomes very inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use a 2-Component (2C) shared embedding for word representations. We allocate every word in the vocabulary into a table, each row of which is associated with a vector and each column of which is associated with another vector.
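The table idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `TwoComponentEmbedding`, the helper names, the square table shape, and the index-order allocation of words to cells are all assumptions made for illustration. The sketch only shows that a word is looked up as a (row vector, column vector) pair, and that the number of stored vectors then grows with the table's side length rather than with the vocabulary size.

```python
import math
import random


def build_table(vocab):
    """Assign each word a (row, col) cell in a roughly square table.

    Assumption: words are placed in index order. This is a placeholder,
    not necessarily how the paper allocates words.
    """
    side = math.ceil(math.sqrt(len(vocab)))
    position = {word: divmod(idx, side) for idx, word in enumerate(vocab)}
    return position, side


def init_vectors(count, dim, rng):
    """Create `count` small random vectors of length `dim`."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(count)]


class TwoComponentEmbedding:
    """Hypothetical illustration of a shared row/column embedding table."""

    def __init__(self, vocab, dim, seed=0):
        rng = random.Random(seed)
        self.position, self.side = build_table(vocab)
        self.row_vectors = init_vectors(self.side, dim, rng)  # one per row
        self.col_vectors = init_vectors(self.side, dim, rng)  # one per column

    def lookup(self, word):
        """Return the (row vector, column vector) pair that represents `word`."""
        row, col = self.position[word]
        return self.row_vectors[row], self.col_vectors[col]

    def num_vectors(self):
        """Number of stored vectors, shared across the whole vocabulary."""
        return len(self.row_vectors) + len(self.col_vectors)


vocab = [f"w{i}" for i in range(10_000)]
emb = TwoComponentEmbedding(vocab, dim=8)
row_vec, col_vec = emb.lookup("w123")
print(emb.num_vectors(), "shared vectors for", len(vocab), "words")
```

In this sketch a 10,000-word vocabulary fits in a 100 x 100 table, so 200 vectors are stored instead of 10,000. Words in the same row share a row vector, and words in the same column share a column vector.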
Reviews: LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
This work provides a novel and effective way to reduce the number of parameters in models that must handle large vocabularies. The drop in model size by several orders of magnitude could allow some large models to be ported to phones, which may not have been possible previously. I find it really interesting that a single method can reduce both the input and the output parameter size, whereas previous work on softmaxes has only tackled the output side. However, some technical details are lacking and the description is confusing in places. In particular, I find Figure 2 and the unnumbered equation after Eq. 1 confusing.
LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
Li, Xiang, Qin, Tao, Yang, Jian, Liu, Tie-Yan