Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems 

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper shows that the skip-gram model of Mikolov et al., when trained with their negative sampling approach, can be understood as a weighted matrix factorization of a word-context matrix with cells weighted by pointwise mutual information (PMI), which has long been empirically known to be a useful way of constructing word-context matrices for learning semantic representations of words. This is an important result since it provides a link between two (apparently) very different methods for constructing word embeddings that both performed well empirically but seemed on the surface to have nothing to do with each other. Using this insight, the authors then propose a new matrix construction and find that it performs very well on standard tasks. The paper is mostly admirably clear (see below for a few suggestions on where citations could be added to make the relevant related work clear) and is a very nice contribution that helps explain what is going on in these neural language model embedding models.