Estimator Vectors: OOV Word Embeddings based on Subword and Context Clue Estimates

Raj Patel, Carlotta Domeniconi

arXiv.org Machine Learning 

Abstract

Semantic representations of words have been successfully extracted from unlabeled corpora using neural network models like word2vec. These representations are generally high quality and computationally inexpensive to train, which has made them popular. However, these approaches generally fail to approximate out-of-vocabulary (OOV) words, a task humans can do quite easily using word roots and context clues. This paper proposes a neural network model that jointly learns high-quality word representations, subword representations, and context clue representations. Learning all three types of representations together enhances the learning of each, leading to enriched word vectors, along with strong estimates for OOV words via the combination of the corresponding context clue and subword embeddings. Our model, called Estimator Vectors (EV), learns strong word embeddings and is competitive with state-of-the-art methods for OOV estimation.

1 Introduction

Semantic representations of words are useful for many natural language processing (NLP) tasks. While there exist many ways to learn them, models like word2vec [11] and GloVe [15] have been shown to be very efficient at producing high-quality word embeddings. These embeddings capture not only similarity between words, but also some algebraic relationships between words.

These models, though, also have some downsides. One major drawback is that they can only learn embeddings for words in the vocabulary, which is determined by the corpus they were trained on. Although common words are typically captured, most existing approaches are unable to learn the meaning of new words, known as out-of-vocabulary (OOV) words, a task humans can do easily.
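
To make the OOV limitation concrete, the following is a minimal Python sketch of a fixed-vocabulary lookup that has no entry for a new word, together with a naive fallback that averages a subword estimate and a context estimate. It is an illustration only, not the EV model: the helper names (`char_ngrams`, `mean_vector`, `estimate_oov`), the character trigram scheme, the toy vectors, and the plain averaging rule are all assumptions made for this example, whereas EV learns its representations jointly.

```python
def char_ngrams(word, n=3):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def estimate_oov(word, context, word_vecs, subword_vecs):
    """Estimate a vector for an unseen word (assumed rule: plain averaging).

    The subword estimate is the mean of the known n-gram vectors, the
    context estimate is the mean of the known context word vectors, and
    the result is the mean of whichever estimates are available.
    """
    grams = [subword_vecs[g] for g in char_ngrams(word) if g in subword_vecs]
    ctx = [word_vecs[c] for c in context if c in word_vecs]
    parts = [mean_vector(group) for group in (grams, ctx) if group]
    if not parts:
        raise KeyError(f"no subword or context evidence for {word!r}")
    return mean_vector(parts)


# Toy tables with hand-picked values (hypothetical, for illustration only).
word_vecs = {"the": [1.0, 0.0], "sat": [0.0, 1.0]}
subword_vecs = {"<ca": [2.0, 2.0], "cat": [4.0, 0.0]}

# "cats" is out of vocabulary: a plain table lookup has nothing to return.
assert "cats" not in word_vecs

# The fallback still produces a vector from word parts and neighbouring words.
estimate = estimate_oov("cats", ["the", "sat"], word_vecs, subword_vecs)
```

The sketch degrades gracefully: if only subword evidence or only context evidence is available, the estimate is built from that source alone, and it raises `KeyError` when neither is present.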