Language Models, Word2Vec, and Efficient Softmax Approximations

@machinelearnbot 

The Word2Vec model has become a standard method for representing words as dense vectors. This is typically done as a preprocessing step, after which the learned vectors are fed into a discriminative model (typically an RNN) to generate predictions such as movie review sentiment, do machine translation, or even generate text, character by character. Previously, the bag of words model was commonly used to represent words and sentences as numerical vectors, which could then be fed into a classifier (for example Naive Bayes) to produce output predictions. Given a vocabulary of words and a document of words, a -dimensional vector would be created to represent the vector, where index denotes the number of times the th word in the vocabulary occured in the document. This model represented words as atomic units, assuming that all words were independent of each other.