word2vec Skip-Gram with Negative Sampling is a Weighted Logistic PCA
Andrew J. Landgraf, Jeremy Bellay
Mikolov et al. (2013) introduced the skip-gram formulation for neural word embeddings, in which one tries to predict the context of a given word, and their negative-sampling algorithm made training the embeddings computationally feasible. Due to the embeddings' state-of-the-art performance on a number of tasks, there has been much research aimed at better understanding them. Goldberg and Levy (2014) showed that the skip-gram with negative-sampling algorithm (SGNS) maximizes a different likelihood than the one posed by the skip-gram formulation, and further showed that it is implicitly related to pointwise mutual information (Levy and Goldberg, 2014). We show that SGNS is a weighted logistic PCA, a special case of exponential family PCA with a binomial likelihood. Cotterell et al. (2017) showed that the skip-gram formulation itself can be viewed as exponential family PCA with a multinomial likelihood, but they did not make the connection between the negative-sampling algorithm and the binomial likelihood. Li et al. (2015) showed that SGNS is an explicit matrix factorization related to representation learning, but the factorization objective they derived was complicated, and they did not make the connection to the binomial distribution or to exponential family PCA.
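The claimed equivalence can be illustrated with a short sketch: the SGNS objective, summed over word-context pairs, is a logistic (Bernoulli) log-likelihood in the inner products u_w · v_c, with positive observations weighted by the co-occurrence count #(w, c) and negative observations weighted by the expected number of negative samples, k #(w) #(c) / |D|. The function below is an illustrative assumption of one such weighting scheme, not the paper's implementation; all variable names are ours.

```python
import numpy as np

def sgns_loss(U, V, cooc, k):
    """SGNS objective written as a weighted logistic log-likelihood.

    U: (vocab, d) word vectors; V: (vocab, d) context vectors.
    cooc: (vocab, vocab) co-occurrence counts #(w, c).
    k: number of negative samples per positive pair.
    Illustrative sketch under assumed notation, not the paper's code.
    """
    logits = U @ V.T                        # natural parameters u_w . v_c
    n_w = cooc.sum(axis=1, keepdims=True)   # marginal word counts #(w)
    n_c = cooc.sum(axis=0, keepdims=True)   # marginal context counts #(c)
    D = cooc.sum()                          # total number of pairs |D|
    neg_weight = k * n_w * n_c / D          # expected negative-sample weight

    # Numerically stable log sigma(x) = -log(1 + exp(-x))
    log_sig = lambda x: -np.logaddexp(0.0, -x)

    # Weighted Bernoulli log-likelihood: observed pairs count as successes,
    # sampled negatives count as failures.
    return (cooc * log_sig(logits) + neg_weight * log_sig(-logits)).sum()
```

Because both weights are nonnegative and each log-sigmoid term is nonpositive, the objective is bounded above by zero, as expected for a (weighted) log-likelihood of binary outcomes.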
May-26-2017