Reviews: SVD-Softmax: Fast Softmax Approximation on Large Vocabulary Neural Networks

Neural Information Processing Systems

This paper proposes an efficient way to approximate the softmax computation for large-vocabulary applications. The idea is to decompose the output weight matrix with singular value decomposition (SVD). Then, by keeping only the largest singular values, one can cheaply identify the most probable words and compute the partition-function terms of this limited set of words in full. These words are expected to account for most of the sum. For the remaining words, the contributions to the partition function are only approximated.
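The procedure the review summarizes can be written out as below. The notation is ours, not the paper's: A is the output weight matrix (one row per vocabulary word, D columns), b the bias, h the hidden state, W the number of leading singular components used for a cheap preview, and 𝒞 a hypothetical candidate set holding the N words with the largest preview logits.

```latex
\begin{align*}
A &= U \Sigma V^\top, \qquad B = U \Sigma = A V, \qquad \tilde{h} = V^\top h, \\
z_i &= \sum_{j=1}^{D} B_{ij}\, \tilde{h}_j + b_i \quad \text{(exact logit, because } A h = B \tilde{h}\text{)}, \\
\tilde{z}_i &= \sum_{j=1}^{W} B_{ij}\, \tilde{h}_j + b_i \quad \text{(preview logit, } W \ll D\text{)}, \\
Z &\approx \sum_{i \in \mathcal{C}} e^{z_i} + \sum_{i \notin \mathcal{C}} e^{\tilde{z}_i}, \qquad p_i \approx e^{z_i} / Z \ \text{ for } i \in \mathcal{C}.
\end{align*}
```

Because the columns of B are ordered by decreasing singular value, the first W columns carry most of each logit when the singular values decay quickly, which is what lets the preview pick out the candidates.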



SVD-Softmax: Fast Softmax Approximation on Large Vocabulary Neural Networks

Shim, Kyuhong, Lee, Minjae, Choi, Iksoo, Boo, Yoonho, Sung, Wonyong

Neural Information Processing Systems

We propose a fast method for approximating the softmax function over a very large vocabulary using singular value decomposition (SVD). SVD-softmax targets fast and accurate probability estimation of the most probable words during inference of neural network language models. The proposed method uses SVD to transform the weight matrix used to compute the output vector. For most words, the approximate probability can then be estimated from only a small part of the transformed weight matrix, using a few large singular values and the corresponding elements. We applied the technique to language modeling and neural machine translation and present guidelines for achieving a good approximation. The algorithm requires only about 20% of the arithmetic operations of the full softmax for an 800K-word vocabulary and achieves more than a three-fold speedup on a GPU.
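A minimal pure-Python sketch of this kind of inference follows. It assumes a preview-then-refine scheme consistent with the review's summary. Every word gets a preview logit from the first W columns of the transformed matrix B = AV (equal to UΣ). The N words with the largest previews are then recomputed at full width D, and the partition function mixes the refined and preview terms. The function names, the parameters `preview_width` (W) and `num_full` (N), and the use of a Jacobi eigendecomposition of AᵀA to obtain V are illustrative assumptions, not the authors' implementation. A real system would run these products with a BLAS or GPU library.

```python
import math
import random


def jacobi_eigh(S, sweeps=100, tol=1e-18):
    """Eigendecomposition of a small symmetric matrix S (list of rows) by the
    cyclic Jacobi method. Returns (eigvals, V); column k of V pairs with eigvals[k]."""
    n = len(S)
    a = [row[:] for row in S]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Standard Jacobi rotation chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for m in (a, v):  # right-multiply a and v by the rotation (columns p, q)
                    for k in range(n):
                        mkp, mkq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(n):  # left-multiply a by the transposed rotation (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)], v


def svd_softmax_setup(A):
    """Precompute B = A V (= U Sigma) and V^T, components sorted by decreasing
    singular value. V comes from the eigenvectors of A^T A (sigma_k^2 = eigval_k)."""
    dim = len(A[0])
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(dim)] for i in range(dim)]
    eigvals, V = jacobi_eigh(AtA)
    order = sorted(range(dim), key=lambda k: -eigvals[k])
    Vt = [[V[r][k] for r in range(dim)] for k in order]  # row j: j-th right singular vector
    B = [[sum(x * y for x, y in zip(row, vj)) for vj in Vt] for row in A]
    return B, Vt


def _softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def svd_softmax(B, Vt, bias, h, preview_width, num_full):
    """Approximate softmax: preview logits from the first preview_width columns of B
    for every word, full-width logits for the num_full words with the largest previews."""
    h_t = [sum(x * y for x, y in zip(vj, h)) for vj in Vt]  # h~ = V^T h
    logits = [sum(row[j] * h_t[j] for j in range(preview_width)) + b
              for row, b in zip(B, bias)]
    top = sorted(range(len(B)), key=lambda i: -logits[i])[:num_full]
    for i in top:  # refine the candidates with all D components
        logits[i] = sum(x * y for x, y in zip(B[i], h_t)) + bias[i]
    return _softmax(logits)  # partition function mixes refined and preview terms


def dense_softmax(A, bias, h):
    """Reference: exact softmax over all words."""
    return _softmax([sum(x * y for x, y in zip(row, h)) + b for row, b in zip(A, bias)])


if __name__ == "__main__":
    random.seed(0)
    num_words, dim = 200, 16
    # Hypothetical output layer with shrinking columns, so its singular values roughly decay.
    A = [[random.gauss(0.0, 1.0) * 0.7 ** j for j in range(dim)] for _ in range(num_words)]
    bias = [random.gauss(0.0, 0.1) for _ in range(num_words)]
    h = [random.gauss(0.0, 1.0) for _ in range(dim)]
    B, Vt = svd_softmax_setup(A)
    exact = dense_softmax(A, bias, h)
    approx = svd_softmax(B, Vt, bias, h, preview_width=4, num_full=20)
    best = max(range(num_words), key=lambda i: exact[i])
    print("top word, exact vs approx:", exact[best], approx[best])
```

Setting `preview_width` to D, or `num_full` to the vocabulary size, makes every logit exact, so the output matches the dense softmax up to rounding. The savings come from W ≪ D. Under this sketch's own accounting, one call costs roughly D·D (projection) + num_words·W (preview) + N·D (refinement) multiply-adds, versus num_words·D for the dense product.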