Impact of a New Word Embedding Cost Function on Farsi-Spanish Low-Resource Neural Machine Translation

Ahmadnia, Benyamin (Tulane University) | Dorr, Bonnie J. (Institute for Human and Machine Cognition)

AAAI Conferences 

Neural Machine Translation (NMT) relies heavily on word embeddings, which are continuous representations of words in a vector space, obtained from large monolingual data and, independently, from the bilingual data used for NMT model training. Word embeddings have proven invaluable for improving performance in natural language analysis tasks that otherwise suffer from data scarcity. This paper defines a new cost function, demonstrated on Farsi-Spanish low-resource attention-based NMT, that encodes word similarity as distances within a word embedding space. The novelty of this cost function is that it encourages our attentional NMT model to generate words that are close to their references in the embedding space. As a result, the decoder is encouraged to select acceptably similar words when potential candidates are found to be Out-Of-Vocabulary (OOV). Experimental results demonstrate improvements of our attentional NMT model over a community-standard NMT baseline model.
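
The abstract does not give the exact form of the cost function, so the following is only an illustrative sketch of the general idea: penalizing the decoder by the expected embedding-space distance between the words it predicts and the reference word. The use of cosine distance, the function names, and the toy vocabulary are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def cosine_distance_matrix(E):
    """Pairwise cosine distances between the rows of E (V x d)."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    unit = E / np.maximum(norms, 1e-12)  # guard against zero vectors
    return 1.0 - unit @ unit.T

def embedding_distance_cost(probs, ref_ids, E):
    """Expected embedding distance to the reference, averaged over time steps.

    probs   : (T, V) decoder output distributions (each row sums to 1)
    ref_ids : (T,)   indices of the reference words
    E       : (V, d) word embedding matrix (assumed fixed/pretrained)
    """
    dist = cosine_distance_matrix(E)                   # (V, V)
    per_step = np.sum(probs * dist[ref_ids], axis=1)   # (T,)
    return float(per_step.mean())

# Toy vocabulary (hypothetical): word 1 is close to word 0, word 2 is orthogonal.
E = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0]])
ref_ids = np.array([0])

cost_exact = embedding_distance_cost(np.array([[1.0, 0.0, 0.0]]), ref_ids, E)
cost_near = embedding_distance_cost(np.array([[0.0, 1.0, 0.0]]), ref_ids, E)
cost_far = embedding_distance_cost(np.array([[0.0, 0.0, 1.0]]), ref_ids, E)
```

In this sketch, predicting a near neighbor of the reference incurs a much smaller penalty than predicting an unrelated word, which is the behavior the abstract describes. In practice such a term would presumably be combined with a standard likelihood objective, but how the two are weighted is not stated here.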
