Word-based Domain Adaptation for Neural Machine Translation

Shen Yan, Leonard Dahlmann, Pavel Petrushkov, Sanjika Hewavitharana, Shahram Khadivi

Jun-7-2019–arXiv.org Artificial Intelligence

In this paper, we empirically investigate applying word-level weights to adapt neural machine translation to e-commerce domains, where small e-commerce datasets and large out-of-domain datasets are available. To mine in-domain-like words in the out-of-domain datasets, we compute word weights using a domain-specific and a non-domain-specific language model, followed by smoothing and binary quantization. The baseline model is trained on mixed in-domain and out-of-domain datasets. Experimental results on English-to-Chinese e-commerce domain translation show that continued training with word weights improves MT quality by up to 2.11% BLEU absolute and 1.59% TER over continued training without word weights. We have also trained models using fine-tuning on the in-domain data. Pre-training a model with word weights improves fine-tuning by up to 1.24% BLEU absolute and 1.64% TER.
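The word-weighting pipeline named in the abstract (two language models, smoothing, binary quantization) can be illustrated with a minimal sketch. The abstract does not specify the language model type, the scoring function, the smoothing method, or the threshold, so everything below is an assumption made for illustration: add-one unigram models stand in for the two language models, the score is a log-probability difference, smoothing is a moving average over neighbouring words, and the names `train_unigram_lm`, `word_weights`, and `weighted_nll` are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def train_unigram_lm(sentences, vocab):
    """Add-one smoothed unigram LM (assumed stand-in for a real language model).

    Returns a dict mapping each vocabulary word to its log-probability.
    """
    counts = Counter(tok for sent in sentences for tok in sent)
    total = sum(counts.values()) + len(vocab)
    return {w: math.log((counts[w] + 1) / total) for w in vocab}


def word_weights(sentence, in_lm, out_lm, window=1, threshold=0.0):
    """Score, smooth, and binarize per-word domain relevance (all assumed)."""
    # Raw score: how much more likely the in-domain LM finds each word.
    scores = [in_lm[w] - out_lm[w] for w in sentence]
    # Smoothing (assumed): average each score with up to `window` neighbours
    # on either side, so isolated words borrow evidence from their context.
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - window), min(len(scores), i + window + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    # Binary quantization: 1.0 for in-domain-like words, 0.0 otherwise.
    return [1.0 if s > threshold else 0.0 for s in smoothed]


def weighted_nll(token_log_probs, weights):
    """Negative log-likelihood in which each target word's term is weighted."""
    return -sum(w * lp for w, lp in zip(weights, token_log_probs))


# Toy data, invented purely for illustration.
in_domain = [["wireless", "phone", "case"], ["phone", "charger", "cable"]]
out_domain = [["the", "parliament", "voted"], ["the", "phone", "rang"]]
vocab = {tok for sent in in_domain + out_domain for tok in sent}

in_lm = train_unigram_lm(in_domain, vocab)
out_lm = train_unigram_lm(out_domain, vocab)
weights = word_weights(["the", "parliament", "phone", "case"], in_lm, out_lm)
```

In this sketch, words with weight 0.0 contribute nothing to `weighted_nll`, so only the words judged in-domain-like in an out-of-domain sentence would drive the training signal. How the paper actually combines the weights with the training objective is not stated in the abstract.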

artificial intelligence, machine translation, natural language, (15 more...)

Country:
- North America > United States
  - Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Vietnam
  - Da Nang > Da Nang (0.04)

Genre:
- Research Report (0.64)

Industry:
- Information Technology > Services > e-Commerce Services (0.76)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
