Demystifying Word2Vec
Research into word embeddings is one of the most interesting areas in the deep learning world at the moment, even though word embeddings were introduced as early as 2003 by Bengio et al. Most prominent among the newer techniques is a group of related algorithms commonly referred to as Word2Vec, which came out of Google Research.[2] In this report we investigate the significance of Word2Vec for NLP research going forward and how it relates and compares to prior art in the field. In particular, we look at some desired properties of word embeddings and at generally popular approaches centered on the concept of a Bag of Words (referred to below simply as BoW), in particular Latent Semantic Analysis, and explore their shortcomings. This motivates a detailed exposition of how and why Word2Vec works, and of whether the word embeddings derived from it can remedy some of the shortcomings of BoW-based approaches.
Feb-5-2017, 22:30:12 GMT