Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation
Kenton Murray, Jeffery Kinnison, Toan Q. Nguyen, Walter Scheirer, David Chiang
Neural sequence-to-sequence models, particularly the Transformer, are the state of the art in machine translation. Yet these neural networks are very sensitive to architecture and hyper-parameter settings. Optimizing these settings by grid or random search is computationally expensive because it requires many training runs. In this paper, we incorporate architecture search into a single training run through auto-sizing, which uses regularization to delete neurons in a network over the course of training. On very low-resource language pairs, we show that auto-sizing can improve BLEU scores by up to 3.9 points while removing one-third of the parameters from the model.
October 1, 2019
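As a rough illustration of the kind of mechanism the abstract describes, and not the paper's actual implementation, the sketch below applies a row-wise group (L2,1) proximal shrinkage step to a weight matrix: rows whose norm falls below the shrinkage threshold are set exactly to zero, which amounts to deleting the corresponding neuron during training. The function name, the regularization strength, and the toy dimensions are all illustrative assumptions.

```python
import numpy as np

def prox_group_l2(W, strength, lr):
    """Proximal step for a row-wise group (L2,1) penalty.

    Each row of W corresponds to one hidden neuron; rows whose L2 norm
    is at most lr * strength are shrunk exactly to zero, effectively
    removing that neuron from the layer.
    """
    thresh = lr * strength
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - thresh / np.maximum(norms, 1e-12), 0.0)
    return W * scale

# Toy usage: after an ordinary gradient update, apply the proximal operator.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 16))   # 8 hidden neurons, 16 inputs each
W = prox_group_l2(W, strength=0.5, lr=0.1)
alive = int((np.linalg.norm(W, axis=1) > 0).sum())
print(f"{alive} of {W.shape[0]} neurons remain after the proximal step")
```

In a full training loop, a step like this would run after each optimizer update, so that neurons whose weights the regularizer drives to zero stay deleted for the rest of training.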