Fully Quantized Transformer for Improved Translation

Gabriele Prato, Ella Charlaix, Mehdi Rezagholizadeh

arXiv.org Machine Learning 

ABSTRACT

State-of-the-art neural machine translation methods employ a massive number of parameters. Drastically reducing the computational cost of such methods without affecting performance has so far remained unsolved. In this work, we propose a quantization strategy tailored to the Transformer (Vaswani et al., 2017) architecture. We evaluate our method on the WMT14 EN-FR and WMT14 EN-DE translation tasks and achieve state-of-the-art quantization results for the Transformer, with no loss in BLEU score compared to the non-quantized baseline. We further compress the Transformer by showing that, once the model is trained, a good portion of the nodes in the encoder can be removed without causing any loss in BLEU.

1 INTRODUCTION

Neural machine translation methods have achieved impressive results lately (Ahmed et al., 2017; Ott et al., 2018; Edunov et al., 2018). Although the approach was proposed only recently (Kalchbrenner & Blunsom, 2013; Sutskever et al., 2014; Cho et al., 2014), many strong contributions have moved the field forward quickly. Bahdanau et al. (2014) introduced an attention mechanism that allows the decoder to attend to any hidden state generated by the encoder. Multiple improvements to their approach have been proposed, such as multiplicative attention (Luong et al., 2015) and, more recently, multi-head self-attention (Vaswani et al., 2017).
