Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation
Kenton Murray, Jeffery Kinnison, Toan Q. Nguyen, Walter Scheirer, David Chiang
Neural sequence-to-sequence models, particularly the Transformer, are the state of the art in machine translation. Yet these neural networks are very sensitive to architecture and hyper-parameter settings. Optimizing these settings by grid or random search is computationally expensive because it requires many training runs. In this paper, we incorporate architecture search into a single training run through auto-sizing, which uses regularization to delete neurons in a network over the course of training. On very low-resource language pairs, we show that auto-sizing can improve BLEU scores by up to 3.9 points while removing one-third of the parameters from the model.
October 1, 2019
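As a rough illustration of the kind of mechanism the abstract describes, and not the paper's actual implementation, the sketch below applies a row-wise group (L2,1) proximal shrinkage step to a weight matrix: rows whose norm falls below the shrinkage threshold are set exactly to zero, which amounts to deleting the corresponding neuron during training. The function name, the regularization strength, and the toy dimensions are all illustrative assumptions.

```python
import numpy as np

def prox_group_l2(W, strength, lr):
    """Proximal step for a row-wise group (L2,1) penalty.

    Each row of W corresponds to one hidden neuron; rows whose L2 norm
    is at most lr * strength are shrunk exactly to zero, effectively
    removing that neuron from the layer.
    """
    thresh = lr * strength
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - thresh / np.maximum(norms, 1e-12), 0.0)
    return W * scale

# Toy usage: after an ordinary gradient update, apply the proximal operator.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 16))   # 8 hidden neurons, 16 inputs each
W = prox_group_l2(W, strength=0.5, lr=0.1)
alive = int((np.linalg.norm(W, axis=1) > 0).sum())
print(f"{alive} of {W.shape[0]} neurons remain after the proximal step")
```

In a full training loop, a step like this would run after each optimizer update, so that neurons whose weights the regularizer drives to zero stay deleted for the rest of training.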