Estimator Vectors: OOV Word Embeddings based on Subword and Context Clue Estimates

Oct-18-2019–arXiv.org Machine Learning

Raj Patel, Carlotta Domeniconi

Abstract

Semantic representations of words have been successfully extracted from unlabeled corpora using neural network models like word2vec. These representations are generally of high quality and are computationally inexpensive to train, making them popular. However, these approaches generally fail to approximate out-of-vocabulary (OOV) words, a task humans can do quite easily using word roots and context clues. This paper proposes a neural network model that jointly learns high-quality word representations, subword representations, and context clue representations. Learning all three types of representations together enhances the learning of each, leading to enriched word vectors, along with strong estimates for OOV words via the combination of the corresponding context clue and subword embeddings. Our model, called Estimator Vectors (EV), learns strong word embeddings and is competitive with state-of-the-art methods for OOV estimation.

1 Introduction

Semantic representations of words are useful for many natural language processing (NLP) tasks. While there exist many ways to learn them, models like word2vec [11] and GloVe [15] have been shown to be very efficient at producing high-quality word embeddings. These embeddings not only capture similarity between words, but also capture some algebraic relationships between words. These models, though, also have some downsides. One major drawback is that they can only learn embeddings for words in the vocabulary, which is determined by the corpus they were trained on.
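To make the idea of estimating an OOV word from subword and context clue embeddings more concrete, the following is a minimal sketch, not the EV model itself. The excerpt does not say how EV combines the two sources, so plain averaging is used here purely as an assumption. The helper names (`char_ngrams`, `mean`, `estimate_oov`), the use of character trigrams as subwords, and the lookup tables passed in are all hypothetical.

```python
def char_ngrams(word, n=3):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def estimate_oov(word, context, subword_vecs, context_vecs):
    """Estimate a vector for a word that has no entry in the word table.

    Assumption (not taken from the paper): average the known subword
    vectors, average the known context-word vectors, then average the
    two results. Returns None when neither source has any known entry.
    """
    parts = []

    # Subword estimate: only n-grams present in the subword table contribute.
    known_subwords = [subword_vecs[g] for g in char_ngrams(word) if g in subword_vecs]
    if known_subwords:
        parts.append(mean(known_subwords))

    # Context estimate: only surrounding words present in the context table contribute.
    known_context = [context_vecs[w] for w in context if w in context_vecs]
    if known_context:
        parts.append(mean(known_context))

    return mean(parts) if parts else None
```

Called as `estimate_oov("cat", ["the", "pet"], subword_vecs, context_vecs)`, the function falls back to whichever source is available, which mirrors the intuition in the abstract that word roots and context clues can each stand in for a missing word vector. A trained model would learn the tables and the combination jointly rather than averaging fixed vectors.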



Country:
- North America
  - United States > Virginia
    - Fairfax County > Fairfax (0.04)
  - Canada > Alberta
    - Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Genre:
- Research Report > Promising Solution (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Processing (1.00)
  - Machine Learning > Neural Networks (1.00)
